Table 8 Characteristics of the test dataset

From: An academic Arabic corpus for plagiarism detection: design, construction and experimentation

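| Segment | Untampered test dataset: count | Untampered test dataset: unique count | Test dataset with plagiarized paragraphs: count | Test dataset with plagiarized paragraphs: unique count | Test dataset with plagiarized sentences: count | Test dataset with plagiarized sentences: unique count |
|---|---|---|---|---|---|---|
| unigram | 632 | 413 | 735 | 487 | 678 | 441 |
| bigram | 631 | 586 | 734 | 682 | 677 | 626 |
| trigram | 630 | 618 | 733 | 718 | 676 | 662 |
| 4-gram | 629 | 624 | 732 | 725 | 675 | 672 |
| 5-gram | 628 | 627 | 732 | 730 | 674 | 673 |
| 6-gram | 627 | 627 | 731 | 731 | 673 | 673 |
| 7-gram | 626 | 626 | 729 | 729 | 672 | 672 |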
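
The count and unique count columns can be illustrated with a minimal sketch of word n-gram counting. It rests on assumptions the table does not confirm: the text is already split into a flat list of tokens, and the helper name `ngram_stats` and the toy input are hypothetical, not taken from the paper. A sequence of N tokens yields N - n + 1 n-grams, which matches the untampered column falling by one per row (632, 631, ..., 626).

```python
from typing import List, Tuple


def ngram_stats(tokens: List[str], n: int) -> Tuple[int, int]:
    """Return (total count, unique count) of word n-grams in `tokens`.

    Hypothetical helper for illustration; the paper's own tokenization
    and counting procedure are not described in this table.
    """
    # Slide a window of width n over the token list.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # A set keeps one copy of each distinct n-gram.
    return len(grams), len(set(grams))


# Toy input, not corpus data.
tokens = "a b a b c".split()
for n in range(1, 4):
    total, unique = ngram_stats(tokens, n)
    print(f"{n}-gram: count={total}, unique={unique}")
```

As n grows, repeated n-grams become rarer, so the unique count approaches the total count, the same trend the table shows from the unigram row to the 7-gram row.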