Paraphrase type identification for plagiarism detection using contexts and word embeddings

Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common paraphrasing strategies used by plagiarists. However, similarity reports generated by most plagiarism detection systems only provide a similarity score and matching sections of text with their possible sources. In this research we propose methods to identify two important paraphrase types, synonymous substitution and word reordering, in paraphrased, plagiarised sentence pairs. We propose a three staged approach that uses context matching and pretrained word embeddings for identifying synonymous substitution and word reordering. Our experiments indicate that the use of the Smith Waterman Algorithm for Plagiarism Detection and ConceptNet Numberbatch pretrained word embeddings produces the best performance in terms of F1 scores. This research can be used to complement similarity reports generated by currently available plagiarism detection systems by incorporating methods to identify paraphrase types for plagiarism detection.

In past work researchers have identified several types of plagiarism, such as no obfuscation (copy and paste), translation obfuscation and summarisation plagiarism (Alzahrani et al. 2012; Potthast et al. 2014). Paraphrase plagiarism (Carmona et al. 2018) refers to situations in which text is copied from sources and obfuscated using lexical and semantic transformations, such as synonymous substitutions, word reordering and rephrasing. These modifications change the surface form of a text but preserve its overall meaning, thereby making it difficult for computers and humans to identify and prove that plagiarism has occurred.
In the context of paraphrase plagiarism, several researchers (Barrón-Cedeño et al. 2013; Bhagat and Hovy 2013; Sun and Yang 2015) have identified a number of different paraphrase types: specific categories of text rewrite operations a plagiarist might use in order to obfuscate copied text. In a study based on analysing a collection of simulated cases of plagiarism from the PAN PC-10 corpus, Barrón-Cedeño et al. (2013) have reported that same polarity or synonymous substitution, i.e., the substitution of synonymous words and phrases, forms the largest proportion of paraphrase types in plagiarised text. These findings have been corroborated by Bhagat and Hovy (2013) and Sun and Yang (2015), who have also stated that synonym substitution forms a large proportion of rewrite operations in paraphrased texts. Furthermore, these research works have also identified word reordering as a less frequently used but important paraphrase type in the context of plagiarism.
Surveys on plagiarism detection tools (Kanjirangat and Gupta 2016; Weber-Wulff 2014) provide useful information on the current state of effectiveness of these tools. A recent survey on testing of plagiarism detection tools by Foltýnek et al. (2020) stated that out of 15 available plagiarism detection tools, none satisfied their criteria of being labelled as a useful system. In their words, "the results... indicate insufficient systems. The performance on plagiarism from Wikipedia disguised by a synonym replacement was generally poorer and almost no system was able to satisfyingly identify manual paraphrase plagiarism." These tests indicate that despite advances in educational technology for plagiarism detection, synonym replacement and paraphrasing remain a challenge for plagiarism detection systems.
In this work we propose a three staged approach to identify synonymous substitutions and word reordering in paraphrased, plagiarised sentence pairs. The primary motivation for this research is to develop methods that identify paraphrase types used in plagiarism. This information about detected paraphrase types can be useful to a human evaluator in making an informed decision about the occurrence of plagiarism. We present a novel approach that uses context matching and pretrained word embeddings to identify paraphrase types in plagiarised sentence pairs. To the best of our knowledge, identifying synonymous substitutions and word reorderings using approaches reported in this paper has not been carried out in previous work.
Our dataset consists of pairs of paraphrased sentences annotated for paraphrase types from the Corpus of Plagiarised Short Answers (Clough and Stevenson 2011). Our proposed three staged approach begins with preprocessing which includes sentence filtering. These pairs of paraphrased sentences are then processed as inputs to two parallel paths for identifying the two paraphrase types. For identifying reordered word segments (word reordering) we use paraphrase patterns and permutations of words and text segments. For the detection of synonymous substitutions we use the Smith Waterman Algorithm and ConceptNet Numberbatch pretrained word embeddings. Our experiments report an F1 score of 0.906 for identifying word reorderings and an F1 score of 0.802 for identifying synonymous substitutions for the entire dataset.
The rest of this article is organised as follows: Section "Background" provides a review of prior work on plagiarism detection, paraphrase plagiarism detection and monolingual text alignment. Section "Experimental setup" gives the experimental setup by identifying measurement parameters and the dataset used. Section "Proposed approach" provides details of the proposed approach using the three staged framework for detection of word reordering and synonymous substitution. Section "Results and discussion" states the results of our evaluations and comparison by varying alignment method and word embeddings. Finally, Section "Conclusions and future work" concludes the paper by highlighting our contributions and future work.

Background
In this section we provide the background pertinent to the problem of identifying paraphrase types in plagiarism detection. We begin with a brief overview of plagiarism and its various forms (Section "Forms of plagiarism") and briefly discuss the motivations for plagiarism. This is followed by a description of paraphrase plagiarism and its position in various plagiarism taxonomies in research surveys (Section "Paraphrase Plagiarism"). We then present an overview of paraphrase types identified in plagiarised texts by describing various paraphrase typologies (Section "Paraphrase typologies"). We conclude this section by presenting an overview of methods for detecting paraphrase plagiarism (Section "Plagiarism detection") and a brief description of monolingual textual alignment (Section "Monolingual Textual Alignment").

Forms of plagiarism
Text reuse is the reuse of text either in its original or modified form (Clough 2010). Plagiarism is a case of text reuse in which proper attribution is lacking, and can be defined as "the use of ideas and/or words from sources without giving due acknowledgement" (Meuschke and Gipp 2013). Barrón-Cedeño (2012, p. 18) states some of the reasons why students engage in plagiarism, which can be classified as teacher oriented, student oriented and educational system oriented. These reasons can be attributed to (a) a lack of commitment by teachers (such as repeating the same assignments), (b) students' attitude to school and the learning process (such as investing the least amount of time and effort), and (c) a lack of clear rules from the educational institution.
Several types of plagiarism have been identified in the literature from various perspectives, which include: no obfuscation (copy and paste), translation obfuscation, random obfuscation and summary obfuscation (Potthast et al. 2014, 2015). Both translation obfuscation and summary obfuscation involve the use of paraphrasing.

Paraphrase plagiarism
Paraphrase plagiarism can be defined as a form of plagiarism wherein rephrasing, substitution and restructuring of words and phrases may be used to obfuscate copied text.
Researchers have identified paraphrase plagiarism in surveys ranging over two decades (Alzahrani et al. 2012;Foltýnek et al. 2019;Maurer et al. 2006;Weber-Wulff 2014). Table 1 provides a classification of paraphrasing as a form of plagiarism from selected past surveys. From Table 1 it can be observed that paraphrase plagiarism has been classified as a sub-type of various plagiarism types such as semantics-preserving (Foltýnek et al. 2019), disguised and structural (Weber-Wulff 2014), intelligent plagiarism (Alzahrani et al. 2012) and as a form of plagiarism (Maurer et al. 2006).
An example of plagiarism in journal articles using source and paraphrased text segments (Sun and Yang 2015) as a form of substitution is stated as follows (substitutions are marked with bracketed numbers):
1 (Source Text) These findings are relevant [1] ... In particular, the interactive virtual-labs [2] are effective... to monitor [3] the learning process and to determine [4] whether learning is taking place as planned.
2 (Paraphrased Text) These findings are important [1] ... In particular, the interactive VLs [2] are effective... in monitoring [3] the learning process and determining [4] whether learning is taking place as planned.

Paraphrase typologies
Paraphrase typologies have been proposed from various perspectives such as discourse analysis and computational linguistics (Vila et al. 2014). From a plagiarism detection perspective, Barrón-Cedeño et al. (2013) have proposed a paraphrase typology comprising twenty paraphrase types classified into four categories. Some of the more important paraphrase types in their work are: same polarity substitution, opposite polarity substitution and modal verb change. Sun and Yang (2015) have also identified paraphrase types as paraphrasing strategies, particularly in the context of plagiarism in journal articles. Some of the important paraphrasing strategies in their proposed list are: substitution, insertion/deletion and reordering. Likewise, Bhagat and Hovy (2013) have proposed a comprehensive list of paraphrase types, with important types being synonym substitution, antonym substitution and change of modality. We present a partial mapping of these paraphrase types in Table 2 between the works of Barrón-Cedeño et al. (2013) and Bhagat and Hovy (2013). It can be observed that the mappings generally represent  Bhagat and Hovy (2013) and Sun and Yang (2015) show that substitution of synonymous words and phrases is a frequently occurring paraphrase type. This can be corroborated from the percentages of synonymous substitutions, which are 45% (Barrón-Cedeño et al. 2013), 37% (Bhagat 2009) and 32% (Sun and Yang 2015) of all paraphrase types in their respective datasets. These findings highlight the importance of synonym substitution as a frequently used paraphrase mechanism underlying acts of plagiarism.

Plagiarism detection
Plagiarism detection refers to techniques, tools and methods used for automated detection of plagiarism, since manual detection becomes infeasible with large amounts of information. In this section we provide a brief overview of various approaches proposed for the detection of plagiarism and paraphrase plagiarism. In particular, approaches based on character and word n-gram similarity (Bensalem et al. 2019; Sánchez-Vega et al. 2017), vector space models (Sanchez-Perez et al. 2014), natural language processing (Chong 2013; Kanjirangat and Gupta 2018), machine translation similarity metrics (Madnani et al. 2012) and alignment algorithms (Nichols et al. 2019) have been successfully applied to plagiarism detection. Despite these advances, plagiarism detection when text has been paraphrased remains a challenge due to limited success in measuring semantic overlap (Carmona et al. 2018).
A number of recent research works on paraphrase plagiarism detection have adopted various approaches. Carmona et al. (2018) introduce two new semantically informed distance measures between texts, which are based on the Jaccard similarity measure and Levenshtein edit distance by merging WordNet and Word2Vec based similarity measures. Sanchez-Perez (2018) use WordNet similarity metrics, as well as similarity metrics from Word2Vec and GloVe pretrained word embeddings. Sánchez-Vega et al. (2017) combine six different character level features to compute textual similarity based on the Dice coefficient. Kanjirangat and Gupta (2018) propose a new syntactic-semantic similarity measure based on the WUP WordNet similarity, while Chitra and Rajkumar These research works have used novel approaches for paraphrase plagiarism detection and therefore represent state of the art. However, paraphrase type identification can be considered as a different problem as compared to paraphrase plagiarism detection. This is because in paraphrase type identification we aim to identify paraphrase types within text pairs that have been marked as plagiarised.
From a research perspective, a proposed method to identify paraphrase types can be integrated within an existing plagiarism detection system. This has been modeled in Fig. 1, where a proposed method for identifying paraphrase types is appended to a textual alignment module for extrinsic plagiarism detection. The output of the plagiarism detection module (matching sections of text) can be sent as input to the paraphrase type identification module. This will subsequently identify paraphrase types within matched sections of text, thereby achieving successful integration of paraphrase type identification within a plagiarism detection system.

Monolingual textual alignment
Textual alignment is the task of linking similar textual entities between two textual units. Bitext Alignment (Tiedemann 2011) is a particular application of textual alignment in machine translation, where words or phrases in sentences of different languages are linked. Monolingual textual alignment can be defined as "the task of discovering and aligning similar semantic units in a pair of sentences expressed in a natural language" (Sultan et al. 2014). Monolingual textual alignment is of particular interest to us as it is similar to the problem of identifying paraphrase types in plagiarised text. Figure 2 illustrates monolingual textual alignment between two sentences (Wang et al. 2013) as a word similarity matrix with shaded squares representing alignments. Here, 'read into' ↔ 'interpreted' can be considered as a word to phrase alignment in addition to other word and phrase alignments.
Alignment tools, such as Meteor, GIZA++ and Berkeley Aligner, are readily available for monolingual textual alignment. These tools have been used for text reuse detection, such as the Berkeley Word Aligner having been used for aligning sentences from a parallel corpus on a token level (Moritz et al. 2018). Sultan et al. (2014) have proposed a successful pipeline architecture for aligning words between source and target sentences as follows: (i) aligning identical word sequences, (ii) aligning named entities, then (iii) aligning content words, and finally (iv) aligning stopwords. This sentence aligner was one of the best performing aligners at Semeval-2015.

Experimental setup
This section provides details about the experimental setup. We begin with a formal description of the problem, followed by details of the dataset used and the measurement parameters.

Problem description
In this research our objective is the detection of paraphrase types in text pairs that have been marked as plagiarised. We focus on detecting two fundamental paraphrase types: (a) change of order (or word reordering), and (b) synonymous substitution. Our input data consists of pairs of source and paraphrased sentences available as the Subcorpus of Paraphrased Sentences (Alvi et al. 2012). In Figure 3 we provide examples of various paraphrase types on a pair of source and paraphrased sentences, including change of order, synonym substitution and insertion. It can be observed that there are two changes of order (word reorderings), one word insertion and one synonym substitution in the sentence pair. Figure 4 gives further examples of synonym substitutions, which include three word substitutions and one phrase substitution. Both examples are on sentence pairs from Task A of the Corpus of Plagiarised Short Answers.
The objective of this research is to detect matching segments of text as paraphrase types from a pair of source and paraphrased sentences. We also aim to identify whether an extracted text segment (as a paraphrase type) is a synonymous substitution or a word reordering.

Dataset used
We use the Corpus of Plagiarised Short Answers (Clough and Stevenson 2011) as the data source for the detection of paraphrase types. The Corpus of Plagiarised Short Answers is a collection of simulated cases of plagiarism divided into five tasks and four levels of revision. The five tasks correspond to five questions posed to university students from Wikipedia, while the four levels of revision are (a) near copy, (b) light revision, (c) heavy revision, and (d) no plagiarism. Alvi et al. (2012) have extracted a subcorpus of paraphrased pairs of sentences from the Corpus of Plagiarised Short Answers. This subcorpus consists of 101 files, each with a given source sentence from Wikipedia and the corresponding light and heavily revised paraphrased sentences across the five tasks. These sentence pairs represent actual instances of paraphrasing by university students, thereby simulating plagiarism. Figure 5 shows a sample file with a given source and two paraphrased sentences from Task E.

Selection of sentence pairs
We extract our collection of sentence pairs from the Subcorpus of Paraphrased Sentences. We follow the filtering criteria for the Microsoft Research Paraphrase Corpus (Dolan et al. 2004) outlined in detail in Dolan and Brockett (2005). Criteria 1 and 2 are adopted exactly from the Microsoft Research Paraphrase Corpus, while criteria 3 and 4 are slight modifications. The criteria are stated as follows:
1 Sentence Length Criteria: We include sentences having 5-40 words only. Sentences having fewer than 5 words or more than 40 words are excluded. The rationale here is to exclude sentences that are too short or too long.
2 Overlap Criteria: We apply upper and lower limits of word overlap between sentences. To ensure minimal word overlap, sentence pairs that share at least 3 words in common are included. To ensure some word diversity, we include sentence pairs whose edit distance is at least 3.
3 Similarity Metric Criteria: Sentence pairs are included whose sentence cosine similarity in terms of words is at least 0.25.
4 Length Ratio: Finally, we include sentence pairs such that the shorter sentence is at least 60% of the longer one in terms of the number of words.
These filtering criteria result in 211 pairs of source and paraphrased sentences which serve as our collection of sentence pairs for the annotation as well as the detection step. In the annotation phase, we annotate the sentence pairs for the presence of each of the two paraphrase types i.e., change of order (word reordering) and synonymous substitution.
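The four filtering criteria can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names are hypothetical, tokenisation is plain whitespace splitting, and the edit distance is assumed to be computed over words (the criteria above do not state the unit).

```python
import math
from collections import Counter


def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (word level is an assumption)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity of two token lists treated as word-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def keep_pair(source, paraphrase):
    """Apply criteria 1-4 to one (source, paraphrased) sentence pair."""
    s, t = source.split(), paraphrase.split()
    if not (5 <= len(s) <= 40 and 5 <= len(t) <= 40):   # 1: sentence length
        return False
    if len(set(s) & set(t)) < 3:                        # 2: at least 3 shared words
        return False
    if word_edit_distance(s, t) < 3:                    # 2: edit distance at least 3
        return False
    if cosine_similarity(s, t) < 0.25:                  # 3: cosine similarity
        return False
    return min(len(s), len(t)) / max(len(s), len(t)) >= 0.6  # 4: length ratio
```

Under these assumptions an exact copy of a sentence is rejected (edit distance 0), while a fully reordered copy is kept, which is the behaviour the overlap criterion is meant to produce.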

Word reordering (change of order)
Word reordering or change of order has been identified as a paraphrase type in several research works in the context of plagiarism (Barrón-Cedeño et al. 2013; Sun and Yang 2015). However, the definition of change of order is quite general in these works and may span multiple paraphrase types such as addition of content words. We refer to Sousa-Silva (2014) for a more specific definition of word reordering in the context of plagiarism as follows: "Word reordering is used to describe the linguistic operations whereby the original words are reused, but in a different order". Likewise, Bhagat and Hovy (2013) refer to reordering of words in the context of paraphrasing as follows: "The words in the new sentence were allowed to be reordered (permuted) if needed and only function words (and no content words) were allowed to be added to the new sentence." We define word reordering to be a permutation of the words in a phrase or a sentence with the addition of stopwords only, such as prepositions, conjunctions or determiners. In the context of statistical machine translation, Bisazza and Federico (2016) present two examples from English that highlight our definition of word reordering as follows:
1 "I saw the cat" ↔ "The cat I saw"
2 "the tail of the cat" ↔ "the cat's tail"
In these examples, the same set of content words is used along with addition or removal of stopwords. Using the above definition of word reordering, we have annotated our collection of source and paraphrased sentence pairs from the Corpus of Plagiarised Short Answers. The statistics of our annotation are given in Table 3. We observe that the number of word reorderings is much smaller than the number of substitutions, which agrees with the observation in Sousa-Silva (2014), i.e., "this linguistic strategy (word reordering) is not as common as word substitution".

Synonymous substitution
We use the term synonymous substitution to refer to both word and phrasal substitutions in plagiarism. Our definition of synonymous substitution generally agrees with the definition of same polarity substitutions defined by Barrón-Cedeño et al. (2013). In general, a synonymous substitution can be considered as the replacement of a word or phrase with another one having exact or approximate meaning such that the overall sense of the sentence remains the same. Examples of change of order and synonymous substitutions appear in both Figs. 3 and 4. Table 3 provides the statistics of annotation of synonymous substitutions and word reordering.
From Table 3 it can be observed that the number of synonymous substitutions is much larger than the number of word reorderings. From a quantitative perspective, word reorderings (28 annotations) are 8.43% of the entire set of annotations and 9.21% of substitutions. This is in general agreement with other datasets, such as the paraphrasing strategies counted by Sun and Yang (2015), where word reorderings are approximately 2.49% of the entire dataset and 6.64% of substitutions.

Measurement parameters
We use the information retrieval measures of precision, recall and F1 score to measure the effectiveness of our proposed approach. The application of precision and recall for identifying paraphrase types is similar to that used in the PAN plagiarism detection evaluation labs (Potthast et al. 2014, 2015). In this method, each instance of a paraphrase type annotation is considered as a four-tuple $s = (s_{start}, s_{size}, t_{start}, t_{size})$, where
• $s_{start}$ = starting index of the paraphrase type annotation in the source sentence,
• $s_{size}$ = length of the paraphrase type annotation in the source sentence,
• $t_{start}$ = starting index of the paraphrase type annotation in the paraphrased sentence,
• $t_{size}$ = length of the paraphrase type annotation in the paraphrased sentence.
Similarly, we identify a detection as a four-tuple $r = (s'_{start}, s'_{size}, t'_{start}, t'_{size})$ detected using an algorithm or approach, with definitions similar to those described above. A match between an annotation $s = (s_{start}, s_{size}, t_{start}, t_{size})$ and a detection $r = (s'_{start}, s'_{size}, t'_{start}, t'_{size})$ is the number of overlapping character positions, represented as $|s \cap r|$. For the entire dataset, we calculate precision by dividing the size of each match by the size of the corresponding detection; for recall, match size is divided by the size of the corresponding annotation. These individual quantities are then summed and further normalised by dividing by the number of instances for the respective measures (i.e., the number of detections |R| for precision, and the number of annotations |S| for recall). This gives us averaged precision and recall. The F1 score is the harmonic mean of precision and recall. Following this description, the measures are given by the following equations:

$$\mathrm{precision}(S, R) = \frac{1}{|R|} \sum_{r \in R} \frac{|s \cap r|}{|r|}$$

$$\mathrm{recall}(S, R) = \frac{1}{|S|} \sum_{s \in S} \frac{|s \cap r|}{|s|}$$

$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

In the preceding equations R is the set of detections, while S is the set of annotations; in each term, s and r denote an annotation and its corresponding detection. The above computed F1 score is macro-averaged, where each paraphrase type annotation is given an equal weight irrespective of its size in terms of the number of characters. The rationale for choosing character matches instead of word matches is that character matches account for partial overlaps, whereas word matches match whole words only.
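The character-level measures described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used in this work: the function names are hypothetical, the "corresponding" annotation or detection is assumed to be the one with the largest character overlap, and every tuple is assumed to have a non-zero total size.

```python
def positions(case):
    """Character positions covered by a four-tuple, tagged by side (source or paraphrase)."""
    s_start, s_size, t_start, t_size = case
    src = {("s", i) for i in range(s_start, s_start + s_size)}
    tgt = {("t", i) for i in range(t_start, t_start + t_size)}
    return src | tgt


def best_match(case, candidates):
    """Largest overlap between `case` and any candidate (assumed pairing rule)."""
    own = positions(case)
    return max((len(own & positions(c)) for c in candidates), default=0)


def precision_recall_f1(annotations, detections):
    """Macro-averaged precision, recall and F1 over four-tuples."""
    precision = (sum(best_match(r, annotations) / len(positions(r)) for r in detections)
                 / len(detections)) if detections else 0.0
    recall = (sum(best_match(s, detections) / len(positions(s)) for s in annotations)
              / len(annotations)) if annotations else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, a detection that covers exactly half of the characters of a single annotation, and nothing else, scores a precision of 1.0 and a recall of 0.5, which illustrates how partial overlaps are credited.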

Proposed approach
In this section we present our proposed approach for the detection of word reorderings and synonymous substitutions in paraphrased, plagiarised sentence pairs. Starting with the paraphrased sentence pairs from the Corpus of Plagiarised Short Answers, our proposed approach consists of the following three steps:
1. Preprocessing: In this stage, we apply punctuation removal and case folding. We then filter the sentence pairs according to the criteria set out in Section "Selection of sentence pairs". These pairs of sentences are then sent as input for detection of paraphrase types.
2. Identification of Word Reorderings: In this stage, we detect identical textual segments from the sentence pairs to be used as inputs for the identification of word reorderings. We use permutations of identical textual segments and paraphrase patterns for the detection of word reorderings.
3. Identification of Synonymous Substitutions: Finally, we identify synonymous substitutions within the sentence pairs using contexts, word alignment and word embeddings. We use ConceptNet Numberbatch pretrained word embeddings (Speer and Lowry-Duda 2017) and the Smith Waterman Algorithm for Plagiarism Detection (Glinos 2014) to detect these substitutions.
The output of these three stages consists of detected text segments as paraphrase types, identified as word reorderings or synonymous substitutions. Figure 6 shows the overall block diagram for our proposed approach. In the following subsections, we describe these stages in more detail.

Preprocessing
The preprocessing step is carried out to achieve the following tasks: (a) punctuation removal and case folding, and (b) filtering of sentences.
1 Punctuation Removal and Case Folding: We begin by removing all punctuation signs except for the apostrophe (') from the sentence pairs. The apostrophe is not removed as it represents the possessive form of several words such as Google's and Bayes' in the corpus. Furthermore, we also carry out case folding, i.e. all uppercase letters are converted to lowercase.
2 Sentence Filtering: In the second step we filter out sentences according to the criteria presented in Section "Selection of sentence pairs". In particular, for criterion 3, cosine similarity is calculated on sentence pairs by considering each sentence as a vector of words. More formally, let S and T be vector representations of two sentences. Therefore,

$$\mathrm{sim}(S, T) = \frac{S \cdot T}{\lVert S \rVert \, \lVert T \rVert}$$

This value of sim(S, T) is used for sentence filtering.

Fig. 6 Block Diagram of the Proposed Three Staged Approach
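The punctuation removal and case folding step can be sketched as below. This is a minimal sketch, not the preprocessing code used in this work: the function name `preprocess` is hypothetical, punctuation is assumed to be replaced by a space (so hyphenated words are split), and only the plain ASCII apostrophe is preserved.

```python
import re


def preprocess(sentence):
    """Remove punctuation except the apostrophe, then lowercase (case folding)."""
    # Replace anything that is not a word character, whitespace or an apostrophe.
    # The underscore counts as a word character in regex, so it is listed explicitly.
    no_punct = re.sub(r"[^\w\s']|_", " ", sentence)
    # Lowercase and collapse the whitespace left behind by removed punctuation.
    return " ".join(no_punct.lower().split())
```

Keeping the apostrophe means possessive forms such as "google's" survive as single tokens, so the later matching stage can decide whether to treat them as near identical to "google".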

Detection of word reorderings
The next stage in our approach is the detection of word reordering (or change of order) paraphrase type. Word reordering can be considered as a rearrangement of words along with addition or removal of function words as discussed in Section "Word Reordering". In our proposed approach, we detect word reorderings as the second stage before synonymous substitutions. This is because a word reordering is subject to a rearrangement of words from the source sentence only. In contrast, a synonymous substitution may involve replacement of words using words from the source sentence as well as from external sources. In this sense, a word reordering is more specific as compared to a substitution, hence we prioritise the detection of word reorderings. We use permutations of identical textual segments and paraphrase patterns to detect word reorderings. This proceeds in the following steps:

Detection of identical text segments
We use the Greedy String Tiling Algorithm (Wise 1995) to find identical textual segments between the two sentences. We relax the definition of identical to near identical by considering words that end in an 's' or an apostrophe-s ('s or s') to be identical (e.g. "Google" and "Google's" are considered near identical). The Greedy String Tiling Algorithm (Wise 1995) identifies all matching string tiles between two strings starting with the longest matching substring and subsequently reducing the sizes of the matching substring. We find all matching string tiles of n-grams, (where n ≥ 2) and align them between the source and the paraphrased sentences. These large identical fragments represent exactly reproduced text from the source sentence. However, we do not align identical unigrams as there is a high probability of misalignment due to multiple occurrences. Figure 7 shows the result of aligning identical textual segments between the source and the paraphrased sentences. We can observe that the word ("that") might result in misalignment due to multiple occurrences in the paraphrased sentence.
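A simplified form of greedy string tiling over word tokens is sketched below. It omits the optimisations of the published algorithm and is only meant to illustrate the idea: in each round the longest unmarked common run of words is found and marked as a tile, until no run of at least two words remains. The function names and the suffix-stripping rule used for near-identical words are assumptions made for this illustration.

```python
def normalise(word):
    """Strip a trailing 's, s' or s so that e.g. "google's" and "google" compare equal."""
    for suffix in ("'s", "s'", "s"):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def greedy_string_tiling(source, paraphrase, min_match=2):
    """Return tiles (source_start, paraphrase_start, length) with length >= min_match."""
    a = [normalise(w) for w in source]
    b = [normalise(w) for w in paraphrase]
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    tiles = []
    while True:
        max_len, matches = min_match, []
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                # Extend the match while tokens agree and are not part of a tile yet.
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    max_len, matches = k, [(i, j, k)]
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            break
        for i, j, k in matches:
            # Skip matches that overlap a tile created earlier in this round.
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            marked_a[i:i + k] = [True] * k
            marked_b[j:j + k] = [True] * k
            tiles.append((i, j, k))
    return sorted(tiles)
```

Note that the suffix rule is deliberately crude: it also strips the final "s" of words such as "is" or "this", which is harmless when both sentences contain the same word but could be tightened with a word list in a fuller implementation.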

Permutations of identical text segments
In this step, we search for permutation patterns of (near) identical textual segments found in the previous stage. Any permutation of words or text segments between the source and paraphrased sentences is considered as a word reordering. For example, the sequence 'apples carrots bananas' (ACB) is a 3-permutation of 'bananas carrots apples' (BCA) and hence can be considered as a word reordering. However, since permutations might repeat similar elements as left or right contexts, the search space for permutation patterns can be significantly reduced, leading to efficient search. This is illustrated in Table 4, where out of 3! = 6 permutations for 3 elements, almost all 3-permutations can be considered as 2-permutations except for the last one.
Due to this reduction in the number of permutation patterns, we search for the patterns AB ↔ BA (2-permutation) and ABC ↔ CBA (3-permutation) in the sentence pairs.
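The search for the two permutation patterns can be sketched over a list of matched segments. This is an illustrative sketch only: it assumes each segment is given as a tuple (source_start, paraphrase_start, length), treats segments as adjacent by their order in the list and ignores any stopwords between them, and the function name is hypothetical.

```python
def find_permutations(tiles):
    """Find AB<->BA and ABC<->CBA patterns among matched segments."""
    by_source = sorted(tiles)                       # order in the source sentence
    by_target = sorted(tiles, key=lambda t: t[1])   # order in the paraphrased sentence
    rank = {tile: pos for pos, tile in enumerate(by_target)}
    found, i = [], 0
    while i < len(by_source):
        window = by_source[i:i + 3]
        ranks = [rank[t] for t in window]
        if len(window) == 3 and ranks[0] - ranks[1] == 1 and ranks[1] - ranks[2] == 1:
            # Three consecutive source segments appear fully reversed in the paraphrase.
            found.append(("ABC<->CBA", window))
            i += 3
        elif len(window) >= 2 and ranks[0] - ranks[1] == 1:
            # Two consecutive source segments appear swapped in the paraphrase.
            found.append(("AB<->BA", window[:2]))
            i += 2
        else:
            i += 1
    return found
```

Because a 2-permutation with an unchanged neighbour covers most of the six orderings of three segments, checking only these two patterns keeps the search linear in the number of segments.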

Paraphrase patterns
Paraphrase patterns can be considered as sets of semantically equivalent paraphrases, with placeholders for words. Zhao et al. (2008) define paraphrase patterns as "sets of semantically equivalent patterns, in which a pattern generally contains two parts i.e. the pattern word and the slots. For example, in the pattern 'X solves Y' , 'solves' is the pattern word while 'X' and 'Y' are slots". The Paraphrase Database (PPDB) (Ganitkevich et al. 2013) is a collection of over 140 million paraphrase patterns in the English language.
For the detection of word reorderings, we consider paraphrase patterns in which the slots X and Y interchange their positions within the pattern. For example, the following pattern: "X announced by Y ↔ Y announced a X" appearing in (Bhagat 2009, p. 200) can be considered as a word reordering.
Some of the paraphrase patterns used for the detection of word reorderings in our sentence pairs are shown with examples as follows:
• X of Y ↔ Y X (University of Stanford ↔ Stanford University)
• X and Y ↔ Y and X (apples and oranges ↔ oranges and apples)
• X and Y ↔ X, Y (cars and trucks ↔ cars, trucks)
Google has PageRank Algorithm)
The above patterns are also reversible, i.e., given the pattern 'X of Y ↔ Y X', the pattern 'X Y ↔ Y of X' is also a paraphrase pattern corresponding to a word reordering. It can also be observed that the pattern 'X and Y ↔ Y and X' can be considered as a 3-permutation of the form ABC ↔ CBA. Although we use just a few patterns since the number of annotations is small, the design of the approach is flexible enough to accommodate a large number of patterns, depending on the dataset.
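Slot-swapping patterns of this kind can be matched with a small lookup over token triples. The sketch below is an assumption-laden illustration rather than the matcher used in this work: slots are restricted to single words, only the two patterns 'X of Y ↔ Y X' and 'X and Y ↔ Y and X' are encoded, and the names `PATTERNS` and `find_pattern_reorderings` are hypothetical.

```python
# Each pattern is (link word in source, link word in paraphrase or None if dropped).
PATTERNS = [("of", None),     # X of Y  <->  Y X
            ("and", "and")]   # X and Y <->  Y and X


def find_pattern_reorderings(source, paraphrase, patterns=PATTERNS):
    """Return (source fragment, paraphrase fragment) pairs matching a pattern."""
    hits = []
    for x, link, y in zip(source, source[1:], source[2:]):
        for src_link, tgt_link in patterns:
            if link != src_link:
                continue
            # Build the expected paraphrase fragment with the slots swapped.
            target = [y, x] if tgt_link is None else [y, tgt_link, x]
            n = len(target)
            if any(paraphrase[k:k + n] == target
                   for k in range(len(paraphrase) - n + 1)):
                hits.append((" ".join([x, link, y]), " ".join(target)))
    return hits
```

Since the patterns are reversible, the reverse direction can be checked by calling the function again with the two sentences swapped after adding the mirrored pattern entries.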

Table 4 (permutations of the elements A B C, excerpt)
A B C is an identical form of A B C
B A C is a 2-permutation with C as the right context
C B A is the only 3-permutation in the list
Using our approach, a wide variety of word reordering text segments can be successfully detected, such as the examples shown in Figure 3. Even entire sentences written by a plagiarist using word reordering can be detected using this approach, as shown in the following example from Task B of the corpus:
• (Source) It is intended to help reuse existing code with little or no modification.
• (Paraphrased) With little or no modification it is intended to help reuse existing code.
This completes the description of our proposed approach for the detection of word reorderings.

Detection of synonymous substitutions
In this subsection, we present the details of the third stage of our proposed approach, i.e. the detection of synonymous substitutions in paraphrased, plagiarised sentence pairs. A synonymous substitution is the substitution of a word or phrase in a sentence such that the overall sense of the sentence remains the same (Section "Synonymous substitution"). From a detection perspective, we classify synonymous substitutions into two types: contextual substitutions and non-contextual substitutions. These are described as follows:
1 Contextual Substitutions: Contextual substitutions are synonymous substitutions where the left and right contexts of a given word or phrasal substitution are identical in both the source and the paraphrased sentences. For example, given the fragments (Fig. 4) '...with little or...' ↔ '...with minimal or...', the pair 'little ↔ minimal' can be considered a contextual synonymous substitution, since both the left context ('with') and the right context ('or') of the words 'little' and 'minimal' match.
2 Non-contextual Substitutions: Non-contextual substitutions are synonymous substitutions for which the left context, the right context, or both do not match. For example (Fig. 4), in the text fragments 'help reuse existing code' ↔ 'existing code to be used again', the phrase pair 'reuse ↔ to be used again' can be considered a non-contextual synonymous substitution, since the corresponding left and right contexts do not match.
For the detection of synonymous substitutions, we begin with the alignment of sentences. We use the Smith Waterman Algorithm for Plagiarism Detection (Glinos 2014) for the alignment of words in sentences. For sentences having m and n words, the Smith Waterman Algorithm begins by constructing an alignment matrix M of size (m + 1) × (n + 1).
We construct the scoring scheme for the Smith Waterman Algorithm such that a match scores higher than a mismatch or a gap. In particular, we use the following parameters in the scoring equation: sim(a, b) = 10 for a match and -1 for a mismatch, and gap = -1 for the gap penalty. In order to match words or phrases at the beginning or end of a sentence, matching contexts are added at the beginning and end of the sentence. These correspond to sentinel rows and columns which indicate a match.
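The matrix construction with these parameters can be sketched as follows. The recurrence used here is the textbook Smith Waterman local alignment recurrence, which is an assumption: the variant of Glinos (2014) may differ in its details. The function name smith_waterman_matrix is hypothetical, and sentinel tokens are not added in this sketch.

```python
from typing import List


def smith_waterman_matrix(source: List[str], paraphrase: List[str],
                          match: int = 10, mismatch: int = -1,
                          gap: int = -1) -> List[List[int]]:
    """Fill the (m + 1) x (n + 1) alignment matrix M for two token lists.

    Textbook recurrence (assumed):
        M[i][j] = max(0,
                      M[i-1][j-1] + sim(a_i, b_j),
                      M[i-1][j] + gap,
                      M[i][j-1] + gap)
    """
    m, n = len(source), len(paraphrase)
    # Row 0 and column 0 are initialised to zero.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sim = match if source[i - 1] == paraphrase[j - 1] else mismatch
            M[i][j] = max(0,
                          M[i - 1][j - 1] + sim,  # align the two words
                          M[i - 1][j] + gap,      # gap in the paraphrase
                          M[i][j - 1] + gap)      # gap in the source
    return M
```

With a match score of 10 and penalties of -1, a single substituted word lowers the running score only slightly, so the alignment continues through the substitution rather than restarting after it.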
For the detection of synonymous substitutions, we divide our approaches based on contextual and non-contextual substitutions as follows:

Contextual substitutions
For contextual substitutions, our proposed approach is based on the distributional hypothesis, which states that "words are similar if their contexts are similar" (Freitag et al. 2005). Our proposed approach proceeds as follows:
1 We begin with the Smith Waterman alignment matrix, with rows corresponding to words from the source sentence and columns corresponding to words from the paraphrased sentence. Furthermore, we mark matrix elements as 'order' if they have already been identified as a word reordering.
2 Given a word alignment in matrix form, we consider a word or phrase pair to be a contextual synonymous substitution if their left contexts are identical and their right contexts are identical. This can be observed in Fig. 8, where 'the documents ↔ they' has been marked as a synonymous substitution. In terms of the alignment matrix representation, this appears as a match in the top-left and bottom-right cells of the synonymous substitution.
3 Furthermore, sentinel rows and columns in the alignment matrix ensure that substitutions at the beginning and end of the sentence are also detected. This can be observed for 'Longer ↔ Long' in Fig. 8, which is detected as a contextual synonymous substitution.
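The context test in the steps above can be illustrated with a short sketch. Note the substitution of technique: Python's difflib.SequenceMatcher is used here as a stand-in aligner, not the Smith Waterman Algorithm, and the 'order' marking of step 1 is omitted. The sentinel tokens "<s>" and "</s>" and the function name are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def contextual_substitutions(source: List[str],
                             paraphrase: List[str]) -> List[Tuple[str, str]]:
    """Return (source span, paraphrased span) pairs whose contexts match."""
    # Sentinels give sentence-initial and sentence-final spans a matching
    # left or right context (assumed to be absent from the sentences).
    a = ["<s>"] + source + ["</s>"]
    b = ["<s>"] + paraphrase + ["</s>"]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)

    subs: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' span lies between two runs of identical words, so
        # its left and right contexts are the same in both sentences.
        if tag == "replace":
            subs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return subs
```

Because the unique sentinels always align with each other, every replaced span is flanked by matching words on both sides, which is the condition stated in step 2.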

Non-contextual substitutions
Non-contextual substitutions can be considered as synonymous substitutions which do not share identical contexts. We use the following methods for the detection of non-contextual synonymous substitutions:
1 We use the cosine similarity score of two word vectors (from pretrained word embeddings) to decide whether a word pair is a non-contextual synonymous substitution. A threshold value is chosen for this similarity score, which is 0.50 in the case of ConceptNet Numberbatch pretrained word embeddings. However, we consider content words (non-stopwords) only, as the cosine similarity scores of stopwords can be quite high. Treating stopwords as non-contextual synonymous substitutions may result in a large number of false positives due to the frequency of occurrence of stopwords.
2 Apart from these, a number of non-contextual synonymous substitutions are derived from punctuation changes to a word, resulting in a word to phrase substitution. For example, the pairs 'subproblem' ↔ 'sub problem' and 'webpage' ↔ 'web page' can be considered punctuation based non-contextual synonymous substitutions.
Figure 9 gives an example of the detection of non-contextual substitutions using word embedding similarity scores from ConceptNet Numberbatch pretrained word embeddings. It can be seen that the word pairs 'need' ↔ 'required' and 'anymore' ↔ 'longer' have high similarity values (≥ 0.500 threshold) and hence can be considered non-contextual substitutions. We can also observe high similarity values for stopword pairs such as 'won't' ↔ 'will' and 'won't' ↔ 'be'. Due to the high similarity values of stopword pairs, we detect non-contextual synonymous substitutions between content words (non-stopwords) only. This completes the description of our approach for the detection of non-contextual synonymous substitutions.
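Both checks can be sketched in a few lines. The sketch below is illustrative only: the two-dimensional vectors are toy values invented for the example and are not ConceptNet Numberbatch embeddings, the stopword set is a small assumed subset, and all function names are hypothetical.

```python
import math
from typing import Dict, List

THRESHOLD = 0.50  # similarity threshold stated in the text
STOPWORDS = {"will", "be", "won't", "the", "a", "to"}  # assumed subset


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_noncontextual_substitution(w1: str, w2: str,
                                  vectors: Dict[str, List[float]],
                                  threshold: float = THRESHOLD) -> bool:
    """Content-word pair whose embedding similarity reaches the threshold."""
    if w1 == w2 or w1 in STOPWORDS or w2 in STOPWORDS:
        return False  # identical words and stopwords are skipped
    if w1 not in vectors or w2 not in vectors:
        return False  # out-of-vocabulary words cannot be compared
    return cosine(vectors[w1], vectors[w2]) >= threshold


def is_punctuation_variant(word: str, phrase: str) -> bool:
    """True for pairs such as 'webpage' and 'web page'."""
    def squash(s: str) -> str:
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return word != phrase and squash(word) == squash(phrase)


# Toy vectors, invented for illustration only.
toy = {"need": [1.0, 0.2], "required": [0.9, 0.4], "code": [0.0, 1.0],
       "won't": [1.0, 0.0], "will": [1.0, 0.0]}
```

With these toy vectors, 'need' ↔ 'required' passes the threshold, while 'won't' ↔ 'will' is rejected by the stopword filter even though the two vectors are identical.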

Results and discussion
In this section we state the results of our proposed approaches for the detection of word reorderings and synonymous substitutions. Table 5 gives the results of the detection of word reorderings in terms of precision, recall and F1 score. It can be observed that the overall F1 score is 0.905, while task-wise the F1 scores vary from 0.727 to 1.000. Figure 10 presents the results along with the percentage frequency of word reorderings for each task in a bar chart. It can be observed that the precision, recall and F1 score are generally high for tasks with a higher percentage of instances, such as tasks A, C and E. However, for tasks B and D the results are somewhat lower.
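For reference, precision, recall and F1 score can be computed from counts of true positives, false positives and false negatives, as in the sketch below. The function name prf1 is hypothetical, and the counts in the usage line are invented for illustration; they are not taken from Table 5.

```python
from typing import Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false positives, 0 misses.
precision, recall, f1 = prf1(tp=8, fp=2, fn=0)
```

In this hypothetical case every reordering is found (recall of 1.0), but the two false positives reduce precision to 0.8, which is the kind of effect discussed for tasks B and D.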

Word reordering
Fig. 10 Precision, Recall and F1 scores taskwise and overall for detecting word reorderings
While our approach detects a large number of word reorderings correctly, Table 5 and Fig. 10 show that the precision is somewhat lower for tasks B and D. This is due to false positives, which can arise from the misalignment of single terms. Let us consider one such false positive case from a heavily revised example of Task B:
1 (Source) Votes cast by pages that are themselves important weigh more heavily and help to make other pages important.
2 (Paraphrased) Expanding on this theory we can then say that the links from important pages are themselves more important.
In the above example, although "pages important ↔ important pages" is a permutation, it does not correspond to a change of order. This is because the last occurrence of 'important' in both sentences should align, which makes "pages important ↔ important pages" a false positive. In other words, "pages that are themselves important" in the source sentence corresponds to "important pages" in the paraphrased sentence.
From an educational perspective, we observe that our approach of using permutations, paraphrase patterns and string tiling has resulted in high F1 scores overall and across all five tasks. This result can be attributed to the nature of the word reordering paraphrase type. Our proposed approach detects word reordering where students may (a) reorder words within a phrase, (b) reorder words and insert function words, thereby mimicking a paraphrase pattern, or (c) reorder entire phrases within a sentence. Several examples of these types of word reordering can be found within the Corpus of Plagiarised Short Answers, thus providing a plausible explanation for the result.

Comparison with other approaches
To the best of our knowledge, this work is a first attempt at defining and proposing an approach for the detection of the word reordering paraphrase type. There is a wide variety of definitions available for word reordering in the literature (Barrón-Cedeño et al. 2013; Sousa-Silva 2014; Sun and Yang 2015), hence a direct comparison with other approaches is not possible. Furthermore, most research articles deal with the detection of plagiarism, as opposed to the detection of individual instances of paraphrase types. Kumar (2014) has proposed a graph based approach for detecting plagiarism specifically in the context of artificial word reordering. The technique uses a graph representation of word patterns. The reported detection scores for the Corpus of Plagiarised Short Answers are Precision = 0.698, Recall = 0.672 and F1 = 0.674. These scores reflect the challenge of detecting plagiarism in the presence of artificial word reordering.

Synonymous substitutions
For the detection of synonymous substitutions in paraphrased sentence pairs, we present a comparison by varying the alignment method and the pretrained word embeddings. Tables 6, 7 and 8 present the results of using various alignment methods and pretrained word embeddings for the overall dataset and for each task.
From the perspective of alignment methods, we have used the Smith Waterman Algorithm for Plagiarism Detection (Glinos 2014). We have also used the Meteor monolingual aligner (Denkowski and Lavie 2014) and the Semeval-2015 monolingual aligner by Sultan et al. (2014) as alignment tools for performance comparison. These alignment methods are easy to implement and use, and can be readily utilized for paraphrase type identification. For word embeddings, we have used ConceptNet Numberbatch, FastText (Mikolov et al. 2018) and GloVe (Pennington et al. 2014) pretrained word embeddings. We have used a word similarity threshold of 0.500 as the cutoff score for considering a pair of words as similar. These pretrained word embeddings have proven useful for a variety of NLP tasks such as sentiment analysis and question answering.
Tables 6, 7 and 8 outline the precision, recall and F1 scores for the overall dataset and for each of the tasks. It can be observed that the combination of the Smith Waterman Algorithm (for Plagiarism Detection) and ConceptNet Numberbatch pretrained word embeddings outperforms all other combinations for almost all of the tasks (except Task B) as well as for the overall dataset. If we consider the performance of these methods on the overall dataset, we observe that the Smith Waterman Algorithm with ConceptNet Numberbatch produces an F1 score of 0.80184, followed by F1 scores of 0.76911 and 0.74906 using the Meteor aligner and the aligner by Sultan et al. (2014), respectively. By varying the pretrained word embeddings, we observe a gradual reduction in F1 scores using FastText and then GloVe, as compared to ConceptNet Numberbatch pretrained word embeddings. Furthermore, we observe a high recall (0.84658) using the Smith Waterman Algorithm and ConceptNet Numberbatch.
From a taskwise analysis perspective, we observe highest F1 scores of 0.72947, 0.70408, 0.88979, 0.89058 and 0.76696 for tasks A, B, C, D and E, respectively. In particular, F1 scores for Task B are low for all of the alignment methods and pretrained word embeddings. This is due to the low precision reported for this task. This is entirely expected, as this task has the highest percentage of sentence pairs in the category of high revision (23/28 ≈ 82.14%) as compared to other tasks. Another point worth observing is that the Meteor monolingual aligner outperforms the Smith Waterman Algorithm and the aligner by Sultan et al. (2014) in terms of F1 scores for this task, owing to a higher precision despite a lower recall.
From an educational perspective, we observe that our approach of using the Smith Waterman Algorithm with ConceptNet Numberbatch pretrained word embeddings produces the best detection scores in terms of precision, recall and F1 score. This can be attributed to the generally well-aligned sentences within the Corpus of Plagiarised Short Answers, in which students replace both words and phrases with synonymous substitutions to simulate plagiarism. Furthermore, the application of cosine similarity over word embeddings is an effective way of finding pairs of similar words in simulated plagiarised text. This is true whether the substitution is a change of form of the same word ('Longer' ↔ 'Long'), as shown in Fig. 8, or a synonymous substitution ('need' ↔ 'required'), as shown in Fig. 9.
In summary, we observe that the combination of pretrained word embeddings and alignment methods achieves high detection performance for paraphrase types in plagiarised, paraphrased sentence pairs. This, coupled with the ease with which these methods can be implemented and used, gives rise to an opportunity for enriching plagiarism detection methods. Such an addition may result in additional information (paraphrase types) being detected, which may prove useful for a human evaluator in making an informed decision about the actual occurrence of plagiarism.

Conclusions and future work
In this work we proposed methods to identify paraphrase types in paraphrased, plagiarised sentence pairs. The contributions of this paper are outlined here. The central idea of this paper, i.e. methods to detect paraphrase types for plagiarism detection, complements several research papers that propose paraphrase types and report their frequency in plagiarised text. We also proposed methods to identify word reorderings using permutations and paraphrase patterns, which have not been presented in earlier work. For the detection of synonymous substitutions, our proposed method of using the Smith Waterman Algorithm and ConceptNet Numberbatch pretrained word embeddings outperformed other combinations of alignment methods and word embeddings.
This research can be used to enhance existing plagiarism detection methods and systems by incorporating methods to detect paraphrase types for plagiarism detection. Such an addition would provide valuable information to a human evaluator in making an informed decision about the actual occurrence of plagiarism.
For future work, methods to detect other paraphrase types can be proposed. In particular, methods to detect the insertion/deletion paraphrase type would be an interesting addition to the proposed collection of methods. Furthermore, an integrated framework can be designed which brings together the various approaches for the detection of a multitude of paraphrase types.
It is also worthwhile to have a broader view of the implications of this research from a wider education perspective. This can be initiated by considering the wider concept of academic integrity which encompasses among other aspects, plagiarism detection. Bretag (2018) identifies Academic Integrity as "an interdisciplinary concept that provides the foundation for every aspect and all levels of education". Academic integrity is based on the values of honesty, trust, fairness, respect, responsibility, and courage as outlined by the International Center for Academic Integrity (International Center for Academic Integrity 2021). The current research and its focus on plagiarism detection provides support for building on the values of honesty, fairness and trust in the pursuit of academic integrity.
The implications of this research on the wider academic community such as teachers and researchers are manifold. From the perspective of teachers, it provides additional support for the detection of plagiarism by highlighting paraphrase types, thereby assisting in the detection of plagiarism. This aspect can also be used for the promotion of originality by educating students on methods of paraphrasing. From the viewpoint of researchers, data on paraphrase types and their frequencies from the current research as well as from past research works (Barrón-Cedeño et al. 2013;Sun and Yang 2015) provides valuable insights into paraphrase types used in plagiarism.
Although English is the most widely used language worldwide, the findings of this research can be extended to languages other than English. Kopotev et al. (2021) provide an excellent overview of plagiarism and its detection in the Russian language. Cross Language Plagiarism Detection (CLPD) (Foltýnek et al. 2019) is a widely researched area of plagiarism detection where the objective is to detect plagiarism from a wide range of multilingual resources (Potthast et al. 2011). Our proposed research methods based on textual alignment and word embeddings can naturally be extended to other languages, since alignment methods have a strong foundation in a multilingual context (Tiedemann 2011). Furthermore, word embeddings for other languages (Wang et al. 2020) can be utilized for the detection of paraphrase types in multiple languages.
It is also important to emphasize the limitations of the current research. With its emphasis on paraphrase type identification in the context of plagiarism detection, the current research might have limitations in cases of contract cheating or ghostwriting (Meuschke and Gipp 2013). This is because in contract cheating, a plagiarist utilises the services of an external entity for generating academic content. In cases where the external entity writes completely new content, paraphrase type identification will have a negligible effect in assisting the detection of plagiarism.
In summary, we can conclude that the proposed methods of paraphrase type identification in this research can have a wide variety of applications in the academic context. This includes not only assistance in plagiarism detection but also emphasis on enforcing good academic practice.