Category | Feature name | Feature description | Quantity |
---|---|---|---|
Discussion contextual features | mes.depth | The numeric position (chronological order) within a thread | 1 |
Discussion contextual features | mes.replies | The total number of replies beneath each message in a thread | 1 |
Discussion contextual features | mes.start | A binary number to indicate whether the message is the start of a thread | 1 |
Discussion contextual features | mes.end | A binary number to indicate whether the message is the end of a thread | 1 |
Linguistic features | cm* | Cohesion measure features from the Coh-Metrix tool | 108 |
Linguistic features | liwc* | Word-collection based features from the LIWC tool | 90 |
Semantic similarity | sim.cos.pre | Cosine similarity between the current and the previous message, each represented as a TF-IDF-weighted vector | 1 |
Semantic similarity | sim.cos.next | Cosine similarity between the current and the next message, each represented as a TF-IDF-weighted vector | 1 |
Semantic similarity | sim.bert.pre | Similarity between the current and the previous message, each represented as a pre-trained BERT embedding vector | 1 |
Semantic similarity | sim.bert.next | Similarity between the current and the next message, each represented as a pre-trained BERT embedding vector | 1 |
Named entities | ner* | The number of occurrences in each message of each of 18 named-entity types, such as Person, ORG, Date, GPE, Location, and Time | 18 |
Named entities | ner.total | The total number of named entities of all the above types in a message | 1 |
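
To make the positional and TF-IDF similarity features in the table concrete, the following is a minimal sketch of how `mes.depth`, `mes.start`, `mes.end`, `sim.cos.pre`, and `sim.cos.next` could be computed for one thread. It is not the original pipeline. The function names (`tfidf_vectors`, `cosine`, `thread_features`) are hypothetical, and several choices are assumptions made only for illustration: whitespace tokenization, a smoothed IDF computed over the messages of the thread itself, 1-based depth, and a similarity of 0.0 when there is no previous or next message.

```python
import math
from collections import Counter


def tfidf_vectors(messages):
    """Return one sparse TF-IDF vector (dict: term -> weight) per message."""
    # Assumption: lowercase whitespace tokenization.
    tokenized = [m.lower().split() for m in messages]
    n_docs = len(tokenized)
    # Document frequency: number of messages containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def thread_features(messages):
    """Compute positional and TF-IDF similarity features for one thread.

    `messages` is a list of message texts in chronological order.
    """
    vectors = tfidf_vectors(messages)
    last = len(messages) - 1
    rows = []
    for i in range(len(messages)):
        rows.append({
            "mes.depth": i + 1,            # assumption: 1-based position
            "mes.start": int(i == 0),
            "mes.end": int(i == last),
            # Assumption: 0.0 when the neighbouring message does not exist.
            "sim.cos.pre": cosine(vectors[i], vectors[i - 1]) if i > 0 else 0.0,
            "sim.cos.next": cosine(vectors[i], vectors[i + 1]) if i < last else 0.0,
        })
    return rows
```

Calling `thread_features(["how do I install the package", "install the package with pip", "thanks"])` returns one feature dictionary per message. The BERT-based, Coh-Metrix, LIWC, and named-entity features depend on external tools and are not covered by this sketch.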