From: An academic Arabic corpus for plagiarism detection: design, construction and experimentation
Segment | Untampered test dataset: count | Untampered test dataset: unique count | Test dataset with plagiarized paragraphs: count | Test dataset with plagiarized paragraphs: unique count | Test dataset with plagiarized sentences: count | Test dataset with plagiarized sentences: unique count |
---|---|---|---|---|---|---|
unigram | 632 | 413 | 735 | 487 | 678 | 441 |
bigram | 631 | 586 | 734 | 682 | 677 | 626 |
trigram | 630 | 618 | 733 | 718 | 676 | 662 |
4-g | 629 | 624 | 732 | 725 | 675 | 672 |
5-g | 628 | 627 | 732 | 730 | 674 | 673 |
6-g | 627 | 627 | 731 | 731 | 673 | 673 |
7-g | 626 | 626 | 729 | 729 | 672 | 672 |
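
As an illustration of how the count and unique count columns relate, the following minimal sketch counts contiguous n-grams with a sliding window. It assumes the segments are word-level n-grams taken over a single token sequence, which is consistent with the untampered column, where the total drops by one for each increase in n. The function name `ngram_counts` and the toy English sentence are hypothetical and are not taken from the corpus or the authors' code.

```python
from typing import Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Tuple[int, int]:
    """Return (total, unique) counts of contiguous n-grams in `tokens`.

    A sequence of N tokens yields N - n + 1 n-grams (0 if n > N).
    """
    # Slide a window of width n over the token sequence.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # Total count keeps repeats; the set keeps one copy of each distinct n-gram.
    return len(grams), len(set(grams))


# Toy input (an assumption for illustration, not corpus data).
tokens = "the cat sat on the mat the cat".split()
for n in (1, 2, 3):
    total, unique = ngram_counts(tokens, n)
    print(f"{n}-gram: count={total}, unique={unique}")
```

As n grows, repeated n-grams become rarer, so the unique count approaches the total count, the same trend the table shows from the unigram row to the 7-g row.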