Table 8 Characteristics of the test dataset

From: An academic Arabic corpus for plagiarism detection: design, construction and experimentation

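| Segment | Untampered Test Dataset: count | Untampered Test Dataset: unique count | Test Dataset with Plagiarized Paragraphs: count | Test Dataset with Plagiarized Paragraphs: unique count | Test Dataset with Plagiarized Sentences: count | Test Dataset with Plagiarized Sentences: unique count |
|---|---|---|---|---|---|---|
| unigram | 632 | 413 | 735 | 487 | 678 | 441 |
| bigram | 631 | 586 | 734 | 682 | 677 | 626 |
| trigram | 630 | 618 | 733 | 718 | 676 | 662 |
| 4-gram | 629 | 624 | 732 | 725 | 675 | 672 |
| 5-gram | 628 | 627 | 732 | 730 | 674 | 673 |
| 6-gram | 627 | 627 | 731 | 731 | 673 | 673 |
| 7-gram | 626 | 626 | 729 | 729 | 672 | 672 |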
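
As a reading aid, each dataset has two figures per segment type: the total number of n-grams and the number of distinct ones. The untampered counts drop by one for each increase in n (632, 631, 630, ...), which is what a sliding window over a single sequence of N tokens would give (N - n + 1). The sketch below illustrates that reading under stated assumptions: whitespace-separated word tokens and a window of width n. The function name `ngram_counts` and the toy sentence are hypothetical, and the authors' actual tokenization and preprocessing are not described in this table.

```python
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Tuple[int, int]:
    """Return (total count, unique count) of word n-grams in a token list.

    Assumption: n-grams are taken with a sliding window of width n,
    so a list of N tokens yields N - n + 1 n-grams (0 if n > N).
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(grams), len(set(grams))


# Hypothetical toy segment; not taken from the corpus.
tokens = "the cat sat on the mat and the cat slept".split()

for n in range(1, 4):
    total, unique = ngram_counts(tokens, n)
    print(f"{n}-gram: count={total}, unique count={unique}")
```

As in the table, the gap between the count and the unique count narrows as n grows, because longer word sequences repeat less often.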