Multimodal learning analytics of collaborative patterns during pair programming in higher education

Pair programming (PP), as a mode of collaborative problem solving (CPS) in computer programming education, asks two students to work in a pair to co-construct knowledge and solve problems. Considering the complex multimodality of pair programming arising from students’ discourses, behaviors, and socio-emotions, it is critically important to examine their collaborative patterns from a holistic, multimodal, dynamic perspective. However, little research has investigated the collaborative patterns generated by this multimodality. This research applied multimodal learning analytics (MMLA) to collect 19 undergraduate student pairs’ multimodal process and product data and to examine different collaborative patterns based on their quantitative, structural, and transitional characteristics. The results revealed four collaborative patterns (i.e., a consensus-achieved pattern, an argumentation-driven pattern, an individual-oriented pattern, and a trial-and-error pattern), associated with different levels of process and summative performance. Theoretical, pedagogical, and analytical implications are provided to guide future research and practice.


Introduction
*Correspondence: fanouyang@zju.edu.cn
Grounded in the sociocultural perspective of learning (Vygotsky, 1978), collaborative problem solving (CPS) focuses on group members' knowledge construction and meaningful practices through continuous interactions and idea improvement with technological and pedagogical supports (Hmelo-Silver & DeSimone, 2013; Stahl, 2009). Pair programming (PP), as a mode of CPS in computer programming education, asks students to work together to solve challenging programming tasks, improve computational thinking, and enhance real-world problem-solving ability (Beck & Chizhik, 2013; Chittum et al., 2017; Sun et al., 2020). However, PP is a complex phenomenon in which multiple modalities (e.g., communication, behavior, socio-emotion) interact constantly to form different collaborative patterns and ultimately influence the quality of collaboration (Stahl & Hakkarainen, 2021). Considering the complex factors that may influence PP, it is [...] students' collaborative patterns and the quality of collaboration (e.g., Han & Ellis, 2021; Lin et al., 2014; Webb et al., 2021). For instance, Lin et al. (2014) detected 45 college students' CPS patterns in an online forum based on their cognitive engagement; the manipulation-centered pattern demonstrated deeper cognition in collaboration, while the discussion-centered pattern showed more off-topic discussion. Webb et al. (2021) identified 45 students' collaborative patterns in a third-grade mathematics course based on their interaction characteristics: groups that took turns initiating a strategy, groups whose students generated their own strategies, and groups in which one student took responsibility for generating the strategies. The results indicated that no single pattern was better than the others at leading students to success in collaboration.
Moreover, these works mostly focused on a single aspect of CPS (e.g., cognitive process, interaction type) without considering the complexity, multimodality, and dynamics of collaboration, which might lead to an incomplete understanding of collaborative patterns (Borge & Mercier, 2019). Overall, exploring collaborative patterns in PP, especially from a multimodal, dynamic, holistic perspective, is necessary to help researchers, instructors, and students unpack the complex factors that influence collaborative quality and how they exert that influence (Lu & Churchill, 2014; Perera et al., 2009).
From an analytical perspective, given the complexity and multimodality of CPS, multidimensional, temporal, and fine-grained approaches are called for to explore students' collaborative patterns in computer programming education. Multimodal learning analytics (MMLA), as a new trend in learning analytics, leverages advances in multimodal data (e.g., speech, eye gaze, heart rate, body movement data) to capture and mine the learning process and to address the challenges of investigating multiple, complex learning-relevant constructs in learning scenarios (Mu et al., 2020; Ochoa & Worsley, 2016; Wiltshire et al., 2019). Recently, relevant research has applied MMLA to reveal the complex, multimodal, and dynamic characteristics of CPS. For example, Sun et al. (2021) utilized discourse analysis, click stream analysis, and video analysis to analyze 63 junior high school students' discourses, behaviors, and perceptions during collaborative programming. Kawamura et al. (2021) modeled 48 students' wakefulness states on e-learning platforms and detected drowsy students from their multimodal data (i.e., face recognition, seat pressure, and heart rate). Wiltshire et al. (2019) collected multimodal data (i.e., gesture, speech, mouse and keyboard movement) from 42 pairs of undergraduate students and used growth curve modelling to investigate how students' multimodal movement coordination changed dynamically during collaboration. Overall, compared to traditional statistical analysis (e.g., of questionnaire data or performance assessment data), MMLA has the potential to reveal the complex, multimodal, dynamic collaborative patterns in PP from a multidimensional, temporal, and fine-grained perspective.
To address these gaps, the current study applied MMLA to examine students' collaborative patterns in a face-to-face, computer-supported PP environment in higher education. Specifically, we collected students' multimodal process-oriented data (including verbal audio, computer screen recordings, and facial expression recordings) and programming product data. We proposed an analytical framework that integrated MMLA methods to identify students' collaborative clusters in PP and further revealed the characteristics of the clusters. Two main research questions were proposed:
Page 4 of 20 Xu et al. Int J Educ Technol High Educ (2023)

Research context, participants, and programming procedures
The participants were 40 undergraduate students (23 males, 17 females) with no prior programming foundation or experience. They were randomly assigned to 20 pairs (2 students per pair). Specifically, the 20 pairs included 5 male-only pairs, 6 female-only pairs, and 9 mixed pairs. Data from one pair (a mixed pair) were damaged and excluded, so the research dataset consisted of 19 datasets. The research environment was a computer-supported collaborative problem solving activity. The two students in each pair sat opposite each other, and each controlled a computer (see Fig. 1a). The computer screens were connected and shared through remote screen control software. Student pairs were asked to collaborate and learn programming on an online programming platform, Minecraft Hour of Code (https://code.org/minecraft) (see Fig. 1b). The platform is designed for novice programming learners and features gamification and graphical programming.
Two sections were designed to support student pairs' PP process (each section lasted 25 min). In the first section, students watched instructional videos and learned to use the coding blocks (i.e., loop, if) on the platform by completing a series of programming tasks together. In the second section, pair members collaborated to complete a final programming task within 25 min using the coding skills they had learned. The final programming task included two requirements: (1) creating a five-by-five brick building with at least four bricks over water, and (2) constructing the foundation of the building first with boulders and then with wood. Pairs were asked to use at least two loop blocks, two if blocks, and one loop-if nested block, and fewer than 30 coding blocks in total, to complete the above task requirements. During the final task, both students had the right to control and operate the platform. All participants signed consent forms and agreed to participate in the research.
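The block-usage requirements of the final task can be read as a simple rule check. The Python sketch below is illustrative only: the `Block` tree, the block names, and the reading of a "loop-if nested block" as an if block placed anywhere inside a loop block are assumptions made for this example and are not taken from the Hour of Code platform or from the study's procedure.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Block:
    """Hypothetical stand-in for one coding block and the blocks nested in it."""
    kind: str  # e.g. "loop", "if", "move", "place" (illustrative names)
    children: List["Block"] = field(default_factory=list)


def walk(blocks: List[Block]) -> Iterator[Block]:
    """Yield every block in the program, including nested ones."""
    for block in blocks:
        yield block
        yield from walk(block.children)


def meets_block_requirements(program: List[Block]) -> bool:
    """Check: >= 2 loops, >= 2 ifs, >= 1 if nested in a loop, < 30 blocks."""
    all_blocks = list(walk(program))
    loops = [b for b in all_blocks if b.kind == "loop"]
    ifs = [b for b in all_blocks if b.kind == "if"]
    # Assumption: "loop-if nested" means an if somewhere inside a loop.
    nested = [lp for lp in loops
              if any(b.kind == "if" for b in walk(lp.children))]
    return (len(loops) >= 2 and len(ifs) >= 2
            and len(nested) >= 1 and len(all_blocks) < 30)


# A toy program with 7 blocks: two loops, two ifs, one if inside a loop.
example = [
    Block("loop", [Block("if", [Block("place")])]),
    Block("loop", [Block("move")]),
    Block("if", [Block("move")]),
]
```

Such a check covers only the block-usage constraints; whether the building itself satisfies the two task requirements still has to be judged from the final product.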

Data collection and dataset
The research dataset consisted of 19 datasets collected from 19 pairs of participants. This research collected the multimodal process-oriented data and programming product data of student pairs in two ways. First, video recorders (with audio) were used to capture student pairs' verbal communications and facial expressions. Second, computer screen videos (with audio) were recorded to capture student pairs' behavioral operations on the platform as well as their final programming products. Each dataset included audio recordings of the pair's verbal communication (about 475 min in total across pairs), computer screen recordings of click stream data (about 475 min in total), video recordings of facial expressions (about 475 min in total), and the final product of the pair programming task.

The analytical framework, procedures and methods
An overall analytical framework was proposed to examine the multimodal characteristics of collaborative patterns. The framework consisted of two steps: assessment and clustering, followed by collaborative pattern analysis. In the first step, K-means clustering was conducted to detect the collaborative clusters based on student pairs' process and summative assessment. In the second step, quantitative content analysis (QCA), click stream analysis (CSA), and video analysis (VA) were used to code student pairs' verbal communication, operational behavior, and facial expression dimensions. Statistical analysis (SA), epistemic network analysis (ENA), and process mining (PM) were then applied to these three dimensions to reveal the quantitative, structural, and transitional characteristics of the different clusters.

Assessment and clustering
First, process assessment was conducted based on the video recordings of the PP processes. Based on a previously validated assessment framework (Meier et al., 2007), process assessment covered nine dimensions: (1) sustaining mutual understanding, (2) dialogue management, (3) information pooling, (4) reaching consensus, (5) task division, (6) time management, (7) technical coordination, (8) reciprocal interaction, and (9) individual task orientation (see Table 1). Specifically, a three-level assessment framework (1 = almost not, 3 = partially, 5 = completely) was used to measure the collaborative quality during students' PP process. Two raters completed the process assessment of the student pairs. They first watched the video recordings and rated 30% of the dataset independently, and then discussed and resolved their differences. Finally, they rated the remaining data independently and cross-checked each other's ratings. The inter-rater reliability, measured with Krippendorff's (2004) alpha, was 0.892.

Table 1 The process assessment framework of collaborative quality (Meier et al., 2007)
Dimension | Rating rules: Low | Medium | High
[...]
8. Reciprocal interaction | [...] | [...] | Students respected each other equally and encouraged one another to make contributions
9. Individual task orientation | Both students showed little interest in the task and usually became distracted | One student concentrated on the task, while the other usually became distracted | Both students focused on the task most of the time and avoided distractions

Second, summative assessment was conducted to measure the final products of PP. Drawing on the relevant literature (Wang et al., 2021; Xu et al., 2022; Zheng et al., 2022), we proposed a three-level summative assessment framework (1 = low, 3 = medium, 5 = high) with two dimensions, problem solving and coding skill (see Table 2). Specifically, on the dimension of problem solving, two sub-dimensions (i.e., finish time, completeness) were used to assess whether the student pair completed the PP task correctly as required (Zheng et al., 2022). The two requirements of the final task were each rated on the completeness dimension. On the dimension of coding skill, two sub-dimensions (i.e., coding structure, coding complexity) were used to assess whether the student pair could appropriately apply the coding skills they had learned to solve the task (Wang et al., 2021; Xu et al., 2022). Summative assessment of the final programming products was completed by two raters. Rater 1 first rated 25% of the dataset; Rater 2 then rated the same data and discussed the results with Rater 1 until they reached an agreed assessment framework. Finally, the two raters independently rated the remaining data and reached an inter-rater reliability (Krippendorff's (2004) alpha) of 0.959.
Then, K-means clustering was used to extract clusters of student pairs with similar PP based on the process and summative assessment. K-means clustering, an unsupervised algorithm, is designed to partition two-way, two-mode data (i.e., N objects with measurements on P variables) into K classes (MacQueen, 1967; Steinley, 2006). K-means clustering was run with the R package factoextra (Kassambara & Mundt, 2017). To put the two assessments on a common scale, the process and summative assessment scores of student pairs were transformed into standard scores before K-means clustering.
The elbow method was used to determine the optimal number of clusters K. This method computes the total within-cluster sum of squares (TWSS) for each candidate value of K; the value of K is considered optimal at the inflection point (i.e., the elbow) that TWSS reaches after dropping sharply (Kodinariya & Makwana, 2013).
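The standardization, K-means, and elbow steps described above can be sketched as follows. This is a pure-Python stand-in and not the R factoextra workflow used in the study; the random-restart initialization, the function names, and the toy scores are assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Convert each column to z-scores (population SD; constant columns stay 0)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (TWSS, labels, centers)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            labels = assign(points, centers)
            updated = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Keep the old center if a cluster happens to be empty.
                updated.append([mean(col) for col in zip(*members)]
                               if members else centers[j])
            if updated == centers:
                break
            centers = updated
        labels = assign(points, centers)
        twss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or twss < best[0]:
            best = (twss, labels, centers)
    return best


# Hypothetical (process score, summative score) values for six pairs.
toy_scores = [[1, 1], [1, 2], [2, 1], [5, 5], [5, 4], [4, 5]]
z_scores = standardize(toy_scores)
# TWSS for each candidate K; the elbow is read off this curve.
twss_by_k = {k: kmeans(z_scores, k)[0] for k in range(1, 5)}
```

Plotting `twss_by_k` against K and looking for the point where the curve flattens corresponds to the elbow inspection described in the text.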

Collaborative pattern analysis
Quantitative content analysis (QCA), click stream analysis (CSA), and video analysis (VA) were used to analyze the process data of students' PP. The computer screen recordings and video recordings (with audio) were transcribed by two researchers to record students' verbal communications, operational behaviors, and facial expressions on the same time scale. During the transcription, the unit of analysis for audio recording data was a sentence spoken by a student; the unit of analysis for operations was a click stream behavior, that is, a student moving or clicking the mouse on the platform; and the unit of analysis for facial expression was one facial expression shown while a student was speaking or operating the computer. After the transcription, 19 datasets included 10, [...] (see Table 3). The coding procedure was completed by three raters. Rater 1 first coded 30% of the dataset according to the proposed coding scheme. Next, rater 2 coded the same data and discussed with rater 1 to resolve discrepancies. At this phase, Krippendorff's (2004) alpha reliability between the two raters was 0.853. Finally, rater 1 coded the rest of the dataset, and rater 3 double-checked the coding results for any problems.

Table 2 The summative assessment framework
Dimension | Low | Medium | High
1. Problem solving (Zheng et al., 2022)
1a. Finish time | The students failed to complete the task in 25 min | The students completed the task in 20-25 min | The students completed the task in less than 20 min
1b. Completeness | None of the task requirements were completed | Parts of the task requirements were completed | All of the task requirements were completed
2. Coding skill (Wang et al., 2021; Xu et al., 2022)
2a. Coding structure | The students did not use the coding blocks correctly | The students used more than 30 coding blocks to finish the task correctly | The students used less than 30 coding blocks to finish the task correctly
2b. Coding function | The students used functional blocks if and loop incorrectly | The students used functional blocks if and loop correctly but did not use them effectively | The students used functional blocks if and loop correctly and used them effectively
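The inter-rater reliabilities reported for the assessments and the coding are Krippendorff's alpha values. As a minimal sketch of how such a value is computed for categorical codes, the function below implements the nominal-data form of alpha from a coincidence matrix. The choice of the nominal metric and the example labels are assumptions made for illustration; the 1/3/5 ratings might instead call for an ordinal metric, and the study does not state which variant was used.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the codes that the
    raters assigned to one unit (None marks a missing rating).
    """
    coincidences = Counter()
    for values in units:
        values = [v for v in values if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two raters is not pairable
        for a, b in permutations(range(m), 2):
            coincidences[(values[a], values[b])] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k) / (n - 1)
    if expected == 0:
        return float("nan")  # only one category in use: alpha is undefined
    return 1.0 - observed / expected


# Hypothetical binary codes from two raters on ten units.
rater_1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal([list(pair) for pair in zip(rater_1, rater_2)])
```

For these toy ratings the two raters disagree on four of ten units, and alpha comes out close to 0.095, far below the values of 0.853 to 0.959 reported in this study.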
Next, three analytic methods were used to reveal the quantitative, structural, and transitional characteristics of the collaborative patterns. From a quantitative perspective, statistical analysis (SA) was used to analyze the frequencies of the verbal communication, operational behavior, and facial expression codes, and a one-way analysis of variance (ANOVA) was then conducted to test whether the differences among clusters were significant.
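The one-way ANOVA compares the variation of code frequencies between clusters with the variation within clusters. The sketch below computes only the F statistic and its degrees of freedom; the p-value, the Bonferroni correction, and the post-hoc comparisons are left out, and the frequencies are hypothetical values arranged in four groups of sizes 5, 5, 6, and 3 purely for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical frequencies of one code for pairs in four clusters.
code_frequencies = [
    [30, 28, 35, 32, 31],
    [25, 27, 22, 26, 24],
    [15, 18, 14, 16, 17, 13],
    [10, 12, 9],
]
f_value, df_between, df_within = one_way_anova_f(code_frequencies)
```

With 19 pairs in four clusters the degrees of freedom are 3 and 15; the F value is then compared against the F distribution with those degrees of freedom.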
From a structural perspective, epistemic network analysis (ENA) was used to demonstrate the structure of connections among the verbal communication, operational behavior, and facial expression dimensions in different clusters. ENA can detect and represent the cumulative connections between elements of coded data as dynamic networks (Csanadi et al., 2018; Shaffer et al., 2016). In this research, ENA was conducted on all codes of the three dimensions. ENA Webkit (epistemicnetwork.org) was used to conduct the ENA and visualize its results (Marquart et al., 2018). Following the threshold value used in previous research (Shaffer et al., 2016), we set the edge weight threshold to 0.25 in ENA and showed only the strong and representative connections rather than all connections, in order to clearly interpret the structural characteristics of the different clusters.
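The core idea of accumulating code connections and then keeping only the strong edges can be sketched in a few lines. This is a simplified illustration and not the ENA Webkit algorithm: the moving-window size, the unit-length normalization of the connection vector, and the toy event sequence are assumptions, and the dimensional reduction that ENA uses to place networks and centroids in a plane is omitted.

```python
from collections import Counter
from math import sqrt


def connection_counts(events, window=4):
    """Count, per event, which code pairs co-occur within a moving window.

    `events` is a time-ordered list of sets of codes. Each event links its
    own codes to every other code seen in the current window.
    """
    counts = Counter()
    for i, current in enumerate(events):
        recent = set().union(*events[max(0, i - window + 1): i + 1])
        pairs = {tuple(sorted((a, b)))
                 for a in current for b in recent if a != b}
        counts.update(pairs)
    return counts


def normalized_weights(counts):
    """Scale the connection vector to unit length."""
    norm = sqrt(sum(v * v for v in counts.values()))
    return {pair: v / norm for pair, v in counts.items()} if norm else {}


def strong_edges(weights, threshold=0.25):
    """Keep only edges whose weight reaches the threshold."""
    return {pair: w for pair, w in weights.items() if w >= threshold}


# Hypothetical coded sequence for one pair (codes as used in this study).
toy_events = [{"MO"}, {"KC"}, {"MO"}, {"PO"}]
toy_counts = connection_counts(toy_events, window=2)
toy_weights = normalized_weights(toy_counts)
toy_edges = strong_edges(toy_weights, threshold=0.25)
```

In this toy sequence the KC-MO connection is counted twice and MO-PO once, so both edges pass a 0.25 threshold after normalization, while a stricter threshold of 0.5 would keep only KC-MO.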
From a transitional perspective, process mining (PM) was used to detect and visualize the transitional processes of the verbal communication, operational behavior, and facial expression dimensions among the different collaborative clusters. PM is a temporal data mining and analysis method that focuses exclusively on transitions between events or activities (Reimann, 2009; Schoor & Bannert, 2012). The software Disco 3.1.4 was used to build the PM models that examine and visualize the code transitions (Rozinat & Günther, 2012).
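The quantities that a process model displays, namely how often each code occurs and how often one code directly follows another, can be sketched as a directly-follows count. This is a simplified illustration and not the mining algorithm implemented in Disco; the function name and the two toy sequences are assumptions made for this example.

```python
from collections import Counter


def transition_counts(sequences):
    """Return (code frequencies, directly-follows transition frequencies).

    `sequences` is a list of time-ordered code lists, one per student pair.
    """
    code_freq = Counter()
    transitions = Counter()
    for seq in sequences:
        code_freq.update(seq)
        # Each adjacent pair (code A, code B) is one observed transition A -> B.
        transitions.update(zip(seq, seq[1:]))
    return code_freq, transitions


# Hypothetical coded sequences for two pairs in the same cluster.
toy_sequences = [
    ["MO", "CR", "AC", "MO"],
    ["MO", "ST", "AC", "MO"],
]
code_freq, transitions = transition_counts(toy_sequences)
```

Here `code_freq` plays the role of the box labels in a process model and `transitions` the role of the arrow labels; counting is done within each pair's sequence so that no transition is created across pairs.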

Results
After the process and summative assessment of student pairs' PP, the K-means clustering results were generated from the distribution of the corresponding standard scores. With the value of K suggested by the elbow method (K = 4) (see Fig. 2), the optimal clustering results revealed four clusters of collaborative types, consisting of 5, 5, 6, and 3 student pairs for Cluster 1 (i.e., the yellow section), Cluster 2 (i.e., the green section), Cluster 3 (i.e., the blue section), and Cluster 4 (i.e., the orange section), respectively (see Fig. 3).

From a quantitative perspective
From a quantitative perspective, ANOVA with the Bonferroni correction was conducted to test for significant differences between the four collaborative clusters on the three dimensions. Levene's tests were conducted before the ANOVAs, and the results confirmed homogeneity of variance. Moreover, post-hoc pairwise comparisons were conducted to further reveal significant differences between clusters (see Table 4). Because some codes (i.e., NR, RP, NE) were not normally distributed, a non-parametric Kruskal-Wallis test was conducted to cross-check the ANOVA results. It showed significant differences in the frequency of KC, CR, and PO (p < 0.05) with the Bonferroni correction. Specifically, on the verbal communication dimension, there were statistically significant differences on both KC and CR, where Cluster 1 had the highest frequency, followed by Cluster 2, Cluster 3, and Cluster 4. However, there were no statistically significant differences on the other codes (i.e., ST, QP, SR, OE, AG, FM, NR) (p > 0.05). In addition, OE and ST appeared frequently while NR appeared infrequently in all four clusters. On the operational behavior dimension, no statistically significant differences were found on any code (i.e., AP, AC, RP, DB). Moreover, all four clusters had a low frequency of AP and a high frequency of AC. On the facial expression dimension, statistically significant differences were found on PO (Cluster 1 > Cluster 3 > Cluster 4; Cluster 2 > Cluster 4). Moreover, there were no statistically significant differences on MO (all four clusters had a high level of MO) or NE (all four clusters had a low level of NE).

From a structural perspective
From a structural perspective, different characteristics were identified among the four clusters of collaborative types, reflected by the locations of the centroids in the epistemic networks (shown as red nodes in Fig. 4). In Cluster 1, the centroid of the epistemic network was located at the upper left corner, mainly focusing on PO, KC, and CR (i.e., connection value of MO-PO = 0.55, connection value of MO-KC = 0.45, connection value of MO-CR = 0.31). In Cluster 2, the centroid of the epistemic network was located at the lower left corner, mainly focusing on AG, SR, FM, and QP (i.e., [...]
Fig. 5 The process mining results of the four collaborative clusters. In the process models, the boxes show the absolute frequencies of codes and the arrows show the observed directional transitions from code A to code B

From a transitional perspective
From a transitional perspective, the characteristics of the four clusters were reflected by the code transitions in the process models (see Fig. 5). In general, the processes in all four clusters began with verbal communication (SR, QP in Cluster 1; OE, FM in Cluster 2; QP in Cluster 3; KC, NR, FM, QP in Cluster 4) and moderate emotion (MO in all four clusters), then moved to operational behavior (AC, RP in Cluster 1; DB, AC in both Cluster 2 and Cluster 3; RP, DB in Cluster 4), and finally ended with verbal communication (FM in Cluster 1 and Cluster 3; AG, KC, ST in Cluster 2; ST in Cluster 4). Different transitional characteristics were found among the four clusters (see Fig. 5). In Cluster 1, student pairs were more likely to start with one of two paths, SR → AC → MO and QP → RP → MO/PO. Student pairs mainly ended with FM, which indicated that they regulated their work to keep the pair functioning at the end of PP. Moreover, three loops appeared frequently in Cluster 1: MO → CR → AC → MO, MO → ST → AC → MO, and MO → OE → DB → MO. These results indicated that students tended to adjust coding blocks through self-talk and reaching consensus, and to debug the programs through expressing new opinions. In Cluster 2, students usually started their collaboration with OE and then followed one of two paths, namely OE → FM → AC → MO/PO and OE → DB → MO. Compared to the other three clusters, Cluster 2 ended with more codes (i.e., AG, ST, KC, MO, PO). Two loops often appeared during the PP processes: MO → KC → (SR → QP) → AC → MO and MO → KC → AC → PO → AG → DB → MO. These results indicated that students were more likely to construct knowledge to drive their coding behaviors, but usually argued with each other when debugging programs. In Cluster 3, students had a high probability of starting their collaboration with QP, and then followed the path NE → OE → DB or moved directly to AC. They mainly ended with FM, which also indicated that they regulated their work to keep the pair functioning at the end.
Two loops usually appeared in Cluster 3: MO → NR → MO and MO → CR → DB → MO. These results indicated that students sometimes replied to their peer negatively and sometimes reached a consensus to debug programs. In Cluster 4, the code transitions and loops started with MO and ended with ST, AC, and DB. Specifically, the loop MO → OE → DB → AG → MO appeared most frequently among all loops, which indicated that students expressed opinions to debug and solve problems but usually argued with each other. In addition, the loops MO → KC → MO, MO → NR → MO, and MO → FM → MO sometimes appeared, which also implied that pairs not only regulated their work, but also had negative interactions when constructing knowledge.

Discussions and implications
This research applied MMLA to examine students' collaborative patterns in a face-to-face, computer-supported PP environment in higher education. Specifically, we collected students' multimodal process-oriented data and programming product data, and proposed an analytical framework integrating MMLA methods to detect and examine student pairs' collaborative patterns. Based on the process and summative assessment results, four clusters were detected from 19 pairs through K-means clustering, namely Cluster 1 (5 pairs), Cluster 2 (5 pairs), Cluster 3 (6 pairs), and Cluster 4 (3 pairs). Cluster 1, with high performance in both process and summative assessment, was characterized as a positively-engaged, knowledge-constructed, and consensus-achieved pattern.
Cluster 2, with a relatively high performance in process assessment but a low performance in summative assessment, was characterized as a moderately-engaged, argumentation-driven, and opinion-divergent pattern. Cluster 3, with low performance in both process assessment and summative assessment, was characterized as a negatively-engaged, individual-oriented, and problems-unsolved pattern. Cluster 4, with a low performance in process assessment but a relatively higher performance in summative assessment, was characterized as a negatively-engaged, opinion-centered, and trial-and-error pattern. Overall, this research revealed four clusters of student pairs with distinct collaborative patterns and performances, which provides initial evidence of the complexity, multimodality, and dynamics of CPS as well as their relations with collaborative quality. From a theoretical perspective, this research contributed to the extant literature on CPS by revealing how complex connections among modalities gave rise to different collaborative patterns, which in turn influenced the collaborative quality of the final products. First, regarding the high-performing collaborative pattern (i.e., Cluster 1), we found that opinion expression after a series of operations and trials could form a foundation for deep-level knowledge construction and group regulation to achieve high-quality collaboration (Ouyang & Chang, 2019; Park et al., 2015). Moreover, compared to negative emotions (i.e., Clusters 3 and 4), students' positive emotions might contribute to a high quality of collaboration, as in Cluster 1 (Törmänen et al., 2021). Furthermore, reaching consensus in argumentation is also key to achieving high-quality collaboration (Straus, 2002).
Second, previous research verified that argumentation contributed to CPS through cognitive elaboration and knowledge construction (Stegmann et al., 2012), but constant argumentation without consensus between peers might result in divergence of opinions and inefficient collaboration (i.e., Cluster 2). Third, inconsistent with previous research that highlighted the role of self-talk in promoting self-regulation in CPS (DiDonato, 2013), the frequent use of self-talk (i.e., Cluster 3) might result in too much individual-oriented opinion expression and too little group negotiation, which may in turn lead to the failure of collaboration. Students in Cluster 3 also spent most of their time debugging, which suggested that they encountered difficulties and did not program successfully in collaboration (Klahr & Carver, 1988). Fourth, compared to Cluster 3, students in Cluster 4 tended to express opinions together and showed more program-running behaviors during debugging, achieving a relatively higher summative performance. Hence, running and debugging programs could together reflect students' persistence and productive struggle in PP, which help them learn from failures (Kapur, 2008; Kim et al., 2022).
From a pedagogical perspective, instructors should concentrate on the collaborative process and provide appropriate scaffoldings and interventions to support a high quality of collaborative programming. First, instructors should provide scaffoldings to enhance student pairs' collaboration quality based on the characteristics of their collaborative patterns. For example, students in Cluster 3 were more likely to be individual-oriented rather than group-oriented, which led to low performance in both process and summative assessment; therefore, instructors can regulate their collaboration through metacognitive scaffoldings (e.g., planning the group's goal) and socio-emotional scaffoldings (e.g., encouraging students to collaborate) to achieve group cohesion within student pairs (Molenaar et al., 2014; Ouyang et al., 2021). In addition, students in Cluster 3 and Cluster 4 showed constant debugging and frequent errors, which might indicate that they were not familiar with the programming skills; therefore, cognitive scaffoldings (e.g., task-relevant information or hints) can be provided to help them solve problems in programming (Ouyang & Xu, 2022; Zhong & Si, 2021). Second, most of the students mainly expressed moderate emotions rather than positive emotions during the PP processes. However, positive social emotion plays an important role in motivating learning interest, lessening tension, and improving social cohesion in collaboration (Rogat & Adams-Wiggins, 2015), as Cluster 1 showed in this research. Hence, the engagement of instructors as social supporters during students' collaborative programming might improve the collaborative atmosphere and help reach the goal of high-quality PP (Ouyang & Scharber, 2017; Ouyang & Xu, 2022).
Third, since constant argumentation and opinion divergence are critical factors that resulted in low-quality PP (e.g., Cluster 2), instructors should pay attention to the conflicting moments in argumentation and make appropriate interventions (e.g., easing the atmosphere, providing new ideas) to guide the co-construction of knowledge and problem-solving (Barron, 2000). Overall, instructors should be aware of student pairs' collaborative patterns as well as their complex characteristics, and support their work appropriately with varied scaffoldings. From an analytical perspective, since CPS is a complex and adaptive phenomenon (Stahl & Hakkarainen, 2021), multimodal data collection and learning analytics are suggested for future work exploring the complex problems and phenomena in CPS (Jacobson et al., 2016). Compared to traditional performance evaluation (e.g., test score, product data) and self-report data (e.g., questionnaire, interview), process-based multimodal data and learning analytics methods provide a holistic, complementary, fine-grained perspective for understanding the complex nature of CPS (Hilpert & Marchand, 2018; Kapur, 2011). Recently, much research has used multimodal data (e.g., speech rate, gesture, body movement, eye movement) as well as learning analytics methods to examine the complex, synergistic, and dynamic collaborative patterns and characteristics in CPS (e.g., Mu et al., 2020; Wiltshire et al., 2019). Echoing this trend, this research collected student pairs' multimodal data (i.e., verbal audio, computer screen recordings, facial expression recordings, final product data) and applied multiple learning analytics methods (e.g., content analysis, epistemic network analysis, process mining) to investigate the collaborative patterns in PP as well as their quantitative, structural, and transitional characteristics.
Furthermore, advanced and automated artificial intelligence (AI) algorithms (e.g., hidden Markov models, natural language processing, recurrence quantification analysis) are recommended for analyzing the complexity and dynamics of collaboration in future research (Gorman et al., 2020; Hoppe et al., 2021). Compared to traditional learning analytics methods, AI-driven methods have the potential to analyze multimodal and nonlinear data and extract the complex and dynamic structure of CPS (de Carvalho & Zárate, 2020). Overall, due to the complexity of CPS, it is critical to capture fine-grained process data and utilize multimodal learning analytics to reveal the collaborative patterns as well as their implicit characteristics (Kapur, 2011; Reimann, 2009).

Conclusions, limitations, and future directions
Since it is challenging for novice programmers to succeed in collaborative programming, it is necessary to investigate how their multimodality can form different collaborative patterns and how different patterns contribute to the quality of collaborative programming. Using MMLA, the current research collected and analyzed multimodal data to understand the collaborative patterns during student pairs' PP in higher education. The results revealed four collaborative patterns associated with different levels of process and summative performance. Based on these findings, the current research proposed theoretical, pedagogical, and analytical implications to guide future practice and research. There are two limitations in the current research, which lead to future research directions. First, since the current study aimed to explore collaborative clusters and patterns, the research design may pose threats to validity (Drost, 2011; Humphry & Heldsinger, 2014), which should be addressed in future research. For example, regarding internal validity, we did not control the gender distribution of student pairs, which might partially influence the collaborative processes. In addition, although participants did not have prior programming foundations or experience, no pre-test was administered to measure and control for students' prior programming knowledge. Moreover, the difficulty of the programming tasks may also have affected student collaboration. Regarding external validity, the sample of student pairs was small and covered a limited range of demographic backgrounds. Therefore, future CPS research should strictly control threats to internal validity (e.g., gender, prior knowledge, task) and expand the sample size and the range of pair structures and arrangements to test, validate, or modify the implications.
Second, this MMLA research collected only students' discourse, online behaviors, and facial expressions from video data to analyze the CPS processes; other multimodal data, such as physiological and psychological data, were not collected. In addition, the facial expressions were coded manually rather than identified automatically by software, which might reduce the efficiency and accuracy of data analysis. Therefore, AI-driven data collection and analysis methods as well as more modalities of data (e.g., physiological, eye tracking data) can provide further insights into CPS research. Overall, it is valuable to examine different collaborative patterns of novice programmers through MMLA in order to tease out fine-grained and complex features, which can serve as data-driven evidence for promoting the quality of computer programming in higher education.