Research design and participants
The research methodology behind the development of the LIS system follows a mixed research design: an action research methodology (Oates, 2006) combined with a design and creation approach (Kuechler & Vaishnavi, 2012). The former is used because a product needs to be developed and tested in real-case learning scenarios; in this case, the product is a system that aims to solve a problem in teaching–learning environments. The principles of an action research methodology are a focus on practical issues, an iterative plan-act-reflect cycle, collaboration with practitioners, and data generation methods to evaluate the product outcomes. The latter involves the creation of new Information Technology (IT) artifacts. It is an iterative problem-solving approach composed of five steps (Kuechler & Vaishnavi, 2012): awareness of the problem to be solved; suggestion of tentative ideas to solve it; development of an IT artifact based on the suggested ideas; evaluation of whether the artifact meets expectations; and conclusions about the results to gain knowledge.
This paper focuses on the second iteration of developing the dropout intervention mechanism integrated within the LIS system. In the first iteration, the dropout predictive model was designed and compared with previous approaches proposed by other authors. The second iteration involves integrating the model into the teaching–learning process by designing an intervention mechanism for learners at risk of dropping out. In this cycle, the dropout intervention mechanism has been tested in Markets and behavior, a 6-ECTS online course of the Faculty of Economics and Business.
Markets and behavior is an introductory course in the microeconomics specialty and mandatory within the Faculty of Economics and Business. This course facilitates the comprehension of the characteristics of and adjustment measures in the modern economy based on the interaction between supply (companies and their costs) and demand (consumers and their preferences). The assessment model comprises five CAA combined with a Validation Test (VT) for learners who pass the CAA. To be eligible for the VT, a learner must pass the CA with a score greater than five and submit four out of the five CAA. The final mark FM of the course is computed as FM = 70% CA + 30% VT. If the CA is not passed, or the learner decides not to follow the course through the CA, the learner can take a final exam whose mark becomes the FM. When performing the CAA, a learner receives individual feedback when she fails an activity or upon request. This course has been selected because it is a first-year mandatory course within different faculty degrees (i.e., BSc. in Economics, BSc. in Business Management and Administration, BSc. in Tourism, and BSc. in Market Research and Marketing) with a large number of enrolled learners. The course also offers a flexibility strategy (i.e., only four out of the five CAA have to be submitted) that, combined with the proposed mechanism, could further mitigate dropout. Finally, reducing dropout in this course can also be advantageous at the degree level. However, this last assumption is out of the scope of this paper and requires a longitudinal analysis of cohorts over several semesters.
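As an illustration of this assessment model with hypothetical grades, a learner who passes the CA with a score of 8 and obtains a 6 in the VT would receive:

$$FM = 0.7 \times 8 + 0.3 \times 6 = 7.4$$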
The research design is illustrated in Fig. 1. The dropout intervention mechanism has been tested on the Markets and behavior course during the 2021 Spring semester (i.e., from February to June 2021). Learners interested in participating signed a consent form to join the pilot because the institutional Research Ethical Committee requires an explicit acceptance to be included in any study, following the General Data Protection Regulation (GDPR, https://gdpr-info.eu/). The consent form describes the data the system collects and processes and its capabilities. After acceptance, learners receive a daily prediction about the likelihood of dropping out (i.e., not submitting the ongoing CAA) in a personalized dashboard. If a learner is identified early as at risk, the intervention mechanism automatically provides the learner with meaningful information to help her submit the ongoing CAA. Figure 1 shows the high participation of 60.71% of the learners (i.e., 581 out of 957). Note that the LIS system also incorporates a course failure identification mechanism, whose impact was analyzed in (Guerrero-Roldán et al., 2021); it is the baseline system used in the previous semester (i.e., the 2020 Fall semester). We gathered the performance results of learners from the previous semester who consented to use the course failure identification mechanism. The participants' conditions were the same, but those learners only received information from the intervention mechanism to avoid course failure. There was also high participation of 69.16% (i.e., 1036 out of 1498). After the course finishes, the performance of the dropout risk-level identification mechanism is analyzed (i.e., the percentage of correct dropout identifications regarding the submission event) to answer RQ1; the impact of the intervention mechanism is evaluated by comparing dropouts among participants versus learners who did not consent and learners from the previous semester who only tested the failure identification mechanism to answer RQ2; and the correlation between both risk-level identification mechanisms is measured to answer RQ3.
Study procedure and instruments
As previously described, this iteration involves integrating the predictive approach to detect potential dropout learners with the intervention mechanism within the LIS system.
The predictive approach involves a two-step method. In the first step, a predictive model denoted as PDAR (Profiled Dropout At Risk) is trained for each day of the course. Such a model takes into account the learner's profile information (i.e., number of enrolled courses in the current semester, number of repeated enrollments in the target course, whether she is a novice learner, and the grade point average of the academic record), performance data within the course (i.e., the scores of the already assessed CAA and whether previous CAA have been submitted), and daily clickstream data about VLE utilization (i.e., accesses to the VLE, the classroom of the target course, the resources, and the tools; and the number of messages read in the forum and the teacher's blackboard). The model's outcome is the likelihood of not submitting the ongoing CAA. As stated above, we define the learner's dropout risk as the risk of not submitting the ongoing CAA, since not submitting an activity in a CA model is evidence that the learner may eventually drop out of the course. The outcome is a binary variable: 1—not submitting, 0—submitting. However, such a model suffers from low accuracy in detecting non-dropout learners because learners typically do not access the VLE daily.
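The per-day training scheme can be sketched as follows. The paper specifies the feature groups (profile, in-course performance, daily clickstream) and the binary target, but not the learning algorithm; the gradient boosting classifier and the column names below are assumptions introduced for illustration only.

```python
# Minimal sketch of training one PDAR classifier per course day.
# The algorithm choice and all column names are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

PROFILE = ["n_enrolled_courses", "n_repeated_enrollments", "is_novice", "gpa"]
PERFORMANCE = ["assessed_caa_scores_mean", "prev_caa_submitted"]
CLICKSTREAM = ["vle_accesses", "classroom_accesses", "resource_accesses",
               "tool_accesses", "forum_msgs_read", "blackboard_msgs_read"]

def train_pdar_for_day(history: pd.DataFrame, day: int) -> GradientBoostingClassifier:
    """Fit a classifier on historical learner data for a given course day."""
    rows = history[history["course_day"] == day]
    X = rows[PROFILE + PERFORMANCE + CLICKSTREAM]
    y = rows["not_submitted"]  # binary target: 1 = did not submit the ongoing CAA
    return GradientBoostingClassifier().fit(X, y)

# One model per day of the course, as described in the text:
# pdar = {d: train_pdar_for_day(history, d) for d in range(1, n_days + 1)}
```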
Therefore, a DTWin is proposed for each CAA, which improves the identification accuracy by reducing the false positive cases (i.e., non-dropout learners incorrectly identified as dropouts). Recall that the DTWin is an interval of consecutive days defined for each CAA. A learner is considered a dropout in a CAA when she is predicted as a dropout by the PDAR model for a number of consecutive days that reaches the specified DTWin size. Although teachers can manually define the DTWin size based on their preferences and experience, the LIS system can provide a recommended DTWin size based on an optimization procedure that maximizes the correct identification of dropout and non-dropout learners. The optimization process searches the testing dataset (i.e., learners of the previous semester) for each learner's largest interval of consecutive days predicted as a dropout by the PDAR model, and records whether she submitted the CAA. Then, the process identifies, for each possible DTWin, which learners would have been predicted as dropouts by varying the window size from one day to the total number of days of the CAA. When the explored window size fits in the largest dropout interval of a learner (i.e., the largest interval of consecutive days predicted as a dropout is at least the window size), the learner's prediction is set to the likelihood of dropping out for such a window. Finally, the TPR and TNR metrics can be computed from the predicted and actual dropout events for each window size. The best window size (i.e., the DTWin size) is selected by maximizing the sum of the TPR (i.e., accuracy when detecting at-risk dropout learners) and the TNR (i.e., accuracy when detecting non-at-risk ones).
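This optimization can be summarized by the following sketch, which assumes, for each learner in the testing dataset, a boolean sequence of daily PDAR dropout predictions for one CAA and the actual outcome; all names are illustrative.

```python
# Sketch of the DTWin size optimization described above.
def longest_run(daily_preds: list[bool]) -> int:
    """Largest number of consecutive days predicted as dropout."""
    best = run = 0
    for predicted_dropout in daily_preds:
        run = run + 1 if predicted_dropout else 0
        best = max(best, run)
    return best

def best_dtwin(learners: list[tuple[list[bool], bool]], n_days: int) -> int:
    """Select the window size that maximizes TPR + TNR on the testing dataset."""
    best_size, best_score = 1, -1.0
    for w in range(1, n_days + 1):            # explore every possible window size
        tp = fn = tn = fp = 0
        for daily_preds, dropped_out in learners:
            flagged = longest_run(daily_preds) >= w   # window fits in the run
            if dropped_out:
                tp += flagged
                fn += not flagged
            else:
                fp += flagged
                tn += not flagged
        tpr = tp / (tp + fn) if tp + fn else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        if tpr + tnr > best_score:
            best_size, best_score = w, tpr + tnr
    return best_size
```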
Based on the PDAR predictive model, the LIS system computes a dropout prediction for each learner and day of the course. This information is transformed into a color-coded risk level based on the decision tree shown in Fig. 2. This decision tree has been built based on experimentation with the failure identification system. It applies to all university courses and differentiates between certain and possible dropout predictions depending on the DTWin accuracy in the target course. Note that course personalization is achieved through the course's corresponding dropout model and the teacher's intervention messages. The risk level is set to an intermediate level when the accuracy is low due to possible misprediction. Additionally, the decision tree considers the non-submission of the previous CAA as a relevant event. Even if a learner has not submitted one CAA, she can still be active and receive interventions. However, when the learner does not submit more than one CAA, she has probably dropped out of the course, and no intervention is triggered. From previous pilots, we observed that there is room for improvement when a learner does not submit one CAA, but not when she misses more than one (Rodríguez et al., 2022). The colors differ for learners and teachers. While learners only see a three-color risk level following a traffic light metaphor, teachers have more colors to know each learner's status better. The green color is given to low-risk learners and to learners who have not reached 30% of the DTWin size; after different experiments, we observed that most active learners tended to remain below this limit. The yellow color for learners represents a medium risk with different meanings. This color is set for learners transitioning toward the DTWin size limit, where not having submitted some previous CAA is also considered (i.e., the Y30%, YSNS, and YSNS2 levels). It is also assigned to activities with low-quality predictive models (i.e., where the accuracy of detecting at-risk learners—TPR—and non-at-risk learners—TNR—is lower than a threshold of 70%). A low-quality model cannot guarantee the prediction's correctness; therefore, high and low risks are set to medium. Finally, the red color represents a high risk level when the DTWin size is reached and the predictive model is a high-quality one (i.e., TPR larger than 70%).
Note that low-quality models for detecting dropouts are shown in orange to teachers to distinguish the at-risk levels. The red color (black for teachers) is also assigned to at-risk learners who did not submit previous CAA (i.e., represented in the decision nodes as not submitted—NS and NS2).
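A simplified sketch of this risk-level assignment is shown below; the branching order and the 30%/70% thresholds follow the description above, but the exact tree of Fig. 2 and the full set of level codes are approximated.

```python
# Approximate sketch of the risk-level assignment of Fig. 2 (subset of levels).
def risk_level(consec_days: int, dtwin: int, tpr: float, tnr: float,
               missed_caa: int) -> str:
    if missed_caa > 1:
        return "NS2"                             # probably dropped out; no intervention
    low_quality = tpr < 0.70 or tnr < 0.70       # low-quality predictive model
    if consec_days >= dtwin:                     # DTWin size reached
        if missed_caa == 1:
            return "NS"                          # red (black for teachers)
        return "YLowD" if low_quality else "R"   # orange for teachers vs. red
    if consec_days < 0.3 * dtwin:
        return "G"                               # green: low risk
    return "YSNS" if missed_caa == 1 else "Y30%" # yellow: transitioning to the limit
```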
This information is provided to teachers and learners in dashboards. Figure 3 illustrates the teacher's dashboard, where the two risk-level mechanisms are provided: course failure and dropout. For each CAA, the risk level of course failure (Baneres et al., 2020) is shown in a green-amber-red-black color distribution with the percentage of learners passing the course based on historical data. The number of learners in the historical data with a similar risk level, the predicted grade needed to have a medium or low risk of failing, and the grade finally obtained by the learner are also shown. The course failure risk level for the ongoing CAA is computed when the previous CAA is assessed. For example, we can observe in the figure that the risk level for the CAA3 has been computed; this information is ready while the learners are performing the CAA3, once the teacher has assessed the CAA2. Thus, the learner knows before submitting the CAA which grade she must obtain to likely pass the course (i.e., reach a medium or low risk level). This goal-setting information intends to motivate the learner to reach the minimum suggested grade.
The dropout risk level is based on the dropout decision tree shown in Fig. 2. A dropout risk level is provided for each day of the ongoing CAA, showing its evolution through a sparkline chart. Each prediction has contextual help to inform the teacher about the detailed description of the risk. Figure 3 shows many of the risk levels. For instance, taking into account that the CAA1 has a low-quality model for Markets and behavior (i.e., TPR smaller than 70%), the YLowD risk level (i.e., orange color) is assigned to the second learner instead of the R high-risk level when the DTWin size is reached. Other CAA have high-quality models, which allows assigning the R high-risk level (i.e., red color) to the first and third learners in the CAA2. Those learners did not submit the CAA2; therefore, the YSNS and NS risk levels (i.e., yellow and black colors, respectively) are assigned in the CAA3 on the current day. Note that when the learner submits the CAA, the sparkline shows the event in golden color, and the dropout risk-level computation is deactivated for such a learner until the next CAA.
A learner also receives information about her potential risk levels. Figure 4 illustrates the main part of the dashboard for the first learner of Fig. 3, where the semaphores for each risk level are shown together with information about personal progression and a comparison with other course learners. Each piece of information also provides additional explanations to inform the learner about her status.
The dashboard also provides detailed information about risk-level identification. Figure 5 illustrates this part. The risk level for activities section summarizes the information for the risk of course failure: for assessed CAA, the learner observes the risk level where her grade lies, and for the ongoing CAA, the grade distribution for each risk level. The dropout risk is shown in the risk level of not submitting the current activity section. Here, the risk is drawn on an at-risk bar divided into as many points as the DTWin size. The DTWin size, and therefore the number of points in the bar, is not disclosed to the learner to avoid technical details. The learner only observes the proximity of the at-risk level marked by two triangles, which, in fact, indicate the number of consecutive days the learner has been predicted as a dropout by the corresponding PDAR model of the CAA. In the example, the DTWin size of the CAA3 is four days, and the learner has been predicted as a dropout for three consecutive days. If the PDAR model predicts her as a dropout on the following day, the DTWin size will be reached; the learner will then be considered a potential dropout, and the red color will be set in the dropout risk semaphore.
Finally, the system has a built-in intervention mechanism that only sends messages to learners at the YLowD, R, and NS risk levels, as shown in Fig. 2. As previously discussed, these are the risk levels where the messages may have some impact on reverting the potential dropout. The messages are triggered automatically by the system, on behalf of the teacher, when such risk levels are detected. However, the content of the messages has been previously designed and written by the teachers because their expertise helps to enhance the goal-setting objectives of the messages. To increase personalization, there is a different message for each risk level and CAA.
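The triggering logic can be sketched as follows; one teacher-written message per (risk level, CAA) pair is assumed, and send_message is a hypothetical stub, since the paper does not describe the delivery channel.

```python
# Sketch of the automated intervention trigger for the YLowD, R, and NS levels.
INTERVENTION_LEVELS = {"YLowD", "R", "NS"}

def send_message(learner_id: str, body: str) -> None:
    print(f"to {learner_id}: {body}")  # hypothetical stand-in for the LIS channel

def maybe_intervene(learner_id: str, caa: int, level: str,
                    messages: dict[tuple[str, int], str]) -> None:
    if level in INTERVENTION_LEVELS:                      # triggered automatically
        send_message(learner_id, messages[(level, caa)])  # teacher-authored content
```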
The LIS system has been tested in the Markets and behavior course during the 2021 Spring semester. Learners receive a daily prediction about their risk of not submitting the next CAA, jointly with the prediction of passing the course after each CAA is assessed. In case of being at risk, the intervention mechanism sends the corresponding message to the learner. This paper analyses the dropout risk-level identification and intervention mechanisms by answering the research questions stated before.
Data analysis
We used data from different sources to answer the research questions posed in this paper. First, records from the LIS system have been used to analyze the performance of the DTWin approach (RQ1) and its relationship with the risk level of course failure (RQ3). Such data have also been combined with anonymized data from the institutional data mart to analyze the engagement of participants after receiving an intervention message (RQ1) and the impact on dropout (RQ2).
The performance of the DTWin approach is evaluated with the following metrics:
$$TPR = \frac{TP}{TP+FN} \qquad TNR = \frac{TN}{TN+FP}$$
$$ACC = \frac{TP+TN}{TP+FP+TN+FN} \qquad F_{1.5} = \frac{\left(1+1.5^{2}\right)TP}{\left(1+1.5^{2}\right)TP+1.5^{2}\,FN+FP}$$
where TP denotes the number of at-risk students correctly identified, TN the number of non-at-risk students correctly identified, FP the number of non-at-risk students incorrectly identified as at risk, and FN the number of at-risk students not identified. These four metrics evaluate the global accuracy of the model (ACC), the accuracy when detecting at-risk learners (true positive rate—TPR), the accuracy when distinguishing non-at-risk learners (true negative rate—TNR), and a weighted harmonic mean of the positive predictive value and the TPR that emphasizes correct at-risk identification (F score—F1.5). The accuracy of the dropout risk-level identification is computed by checking the percentage of correct dropout identifications with respect to the learners who ultimately did not take the VT or the final exam.
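The following sketch is a direct transcription of the four formulas above from confusion-matrix counts, with β = 1.5 weighting recall (TPR) over precision in the F score.

```python
# The four evaluation metrics computed from confusion-matrix counts.
def metrics(tp: int, tn: int, fp: int, fn: int, beta: float = 1.5) -> dict[str, float]:
    b2 = beta ** 2
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "F":   (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp),  # F_1.5
    }
```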
Statistical analysis has also been used to answer the research questions. Results have been computed with R language scripts embedded in the LIS system. Boschloo's unconditional test has been performed to test the association between submitting the CAA and participating in the pilot, both represented as binary variables (RQ2). Note that no association would indicate that the submission event is not conditioned by participating in the pilot and, therefore, by using the intervention mechanism. Moreover, a Chi-squared test is performed to check the independence between both risk-level identification mechanisms (RQ3). In the case of dependency, the Chi-squared test does not quantify the strength of the association between variables (i.e., the risk-level identification mechanisms) with more than two levels. Therefore, Cramer's V is used to measure the strength of the correlation between both risk-level identification mechanisms.
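Although the analyses were computed with R scripts embedded in the LIS system, the same tests can be reproduced, for illustration, with SciPy; both contingency tables below are hypothetical.

```python
# Illustrative reproduction of the statistical analyses (the paper uses R).
import numpy as np
from scipy.stats import boschloo_exact, chi2_contingency
from scipy.stats.contingency import association

# RQ2: association between participating in the pilot (rows) and
# submitting the CAA (columns), both binary; counts are hypothetical.
table_rq2 = np.array([[450, 131],    # participants: submitted / not submitted
                      [230, 146]])   # non-participants: submitted / not submitted
print(boschloo_exact(table_rq2).pvalue)

# RQ3: independence between the two risk-level mechanisms (more than two levels).
table_rq3 = np.array([[30, 12,  5],
                      [10, 25, 14],
                      [ 4,  9, 28]])
chi2, p, dof, _ = chi2_contingency(table_rq3)
print(p)                                        # dependence suggested if p < alpha
print(association(table_rq3, method="cramer"))  # Cramer's V: strength of association
```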