Experiences in the use of an adaptive intelligent system to enhance online learners' performance: a case study in Economics and Business courses

Several tools and resources have been developed in recent years to enhance the teaching and learning process. Most of them focus on the process itself, but few focus on the assessment process to detect at-risk learners and then act through feedback to help them succeed and pass the course. This research paper presents a case study using an adaptive system called Learning Intelligent System (LIS). The system includes an Early Warning System and was tested in a fully online university to increase learners' performance, reduce dropout, and ensure proper feedback to guide learners. LIS also aims to help teachers detect critical cases so that they can act on time with learners. The system has been tested in two first-year courses in the fully online BSc of Economics and Business at the Universitat Oberta de Catalunya. A total of 552 learners participated in the case study. On the one hand, results show that performance is better than in previous semesters when using it. On the other hand, results show that learners' perception of effectiveness is higher, and learners are willing to continue using the system in the following semesters because they find it beneficial.

Page 2 of 27 Guerrero-Roldán et al. Int J Educ Technol High Educ (2021) 18:36 competence acquisition while attempting to mitigate dropout. Dropout has been one of the most analyzed topics in online Higher Education Institutions (HEIs) (Kemper et al. 2020; Lee and Chung 2019; Mubarak et al. 2020; Xing and Du 2019). Learners tend to leave courses when they feel unmotivated, insecure about passing the course, or overwhelmed by the next assignments or competences to acquire. Learners usually feel alone when studying in a fully online environment, and isolation or poor feedback is one of the main reasons for not remaining active across online courses (Bakar et al. 2020; Ross and McNealy 2020). Focusing on feedback, several works analyze when feedback should be provided to learners and which type of feedback is most suitable in each case (Angus and Watson 2009; Voelkel 2013). Several examples can be found in the literature (Espasa and Meneses 2010; Esterhazy and Damşa 2019; Gibbs and Simpson 2005; Guasch et al. 2010; Winstone and Carless 2019). Feedback becomes the cornerstone to engage and motivate learners, and it also becomes a way to better track them during the whole learning process. Personalization can be achieved by knowing the learners better. The learning analytics and educational data mining research areas explore how data collected and analyzed from digital systems can enhance the teaching and learning process (Siemens and Baker 2012). Both research areas focus on tracking learners in the VLE rather than on their assessment process or engagement level. Thus, there are few insights about reducing dropout or increasing learners' motivation. However, in recent years, several adaptive systems have emerged to support teachers and learners across their courses.
Some of them just collect navigational data when learners interact with the VLE in communication spaces like virtual classrooms, debates, or forums, but few of them focus on learners' actions when performing learning activities and assessments (Mousavinasab et al. 2021). An adaptive system has been developed and tested in a fully online HEI in Spain to support learners, reduce dropout, and engage them within the courses. Most learners at our university (82%) are workers and/or adults with families. 44% of them are male, while 56% are female. Regarding the age distribution, learners range between 25 and 45 years old. The system, called Learning Intelligent System (LIS), was developed considering two main premises.
On the one hand, the LIS system processes data from an institutional data mart where historical data, as well as learners' current data about their academic life, are stored in an anonymized way. The system stores the learners' online behavior in terms of delivered assessment activities, level of engagement (i.e., navigational data, resources and tools utilization), and marks, among others. On the other hand, the LIS system runs a predictive analysis using classification algorithms based on Artificial Intelligence (AI) techniques trained with the historical data of past learners available in the data mart. The results are also stored in anonymized form, and they are provided to teachers through a dashboard about learners' status using an institutional deanonymizer. Learners also have a graphical representation (a three-light semaphore indicator, similar to Arnold and Pistilli (2012)) with personalized information about their learning status, likelihood to pass the course, and feedback with recommendations such as contacting the teacher to solve doubts or checking previous content to consolidate prior knowledge for the next learning activities. Taking advantage of the LIS system, the aim of this research is twofold. First, the impact on learners' performance is analyzed when using the LIS system in the courses they are enrolled in. Second, learners' perception of the usefulness and effectiveness of LIS utilization is analyzed. This paper is organized as follows. The next section provides the theoretical framework and background behind this research paper by describing the main aspects of the educational model, adaptive Intelligent Tutoring Systems (ITS), the study context, and the research questions to be answered.
The second section is related to the methodological approach followed in this research and describes the participants, the study procedure, the instruments used to collect data, and the performed analysis. The third section shows the results obtained and the main findings related to the research questions. Finally, the last section provides an overview of the study's conclusions, limitations, and future research.

Theoretical framework and background
Adaptive intelligent tutoring systems
Psotka et al. (1988) stated that ITS technology is the convergence of computer-based instruction, advances in cognitive science, AI, and the increase of the computer's power. Such intersection produced in the past decades many powerful ITS such as LEARN-SQL (Abelló et al. 2016) (databases), VerilUOC (digital systems design) (Baneres et al. 2014), ACME (Soler et al. 2010) (mathematics, statistics, databases and software engineering, among others), Jutge.org (Giménez et al. 2012) and Mooshak (Fernandez Aleman 2011) (programming languages), among others. ITS are mostly autonomous with a low intervention of teachers and tend to be specific to an educational field since better recommendations and feedback can be delivered. A big challenge in this research field is to develop an adaptive ITS that is applicable to any academic field (Phobun and Vicheanpanya 2010), and to develop accurate models to predict learners' behavior. There are different attempts with relevant findings (Cem 2008; Ramírez-Noriega et al. 2017) where AI plays a fundamental role in the development. The LIS project proposed in this paper aims to develop an adaptive system to be globally applicable in our institutional VLE to help learners succeed in their learning process. It should be widely applicable to all types of courses and independent of our institution's learning resources and contents. In the first step of the project, an Early Warning System (EWS) has been developed for tracking purposes to provide feedback about learners' status, to prevent dropout, and to detect at-risk learners.
Probably the most cited work on EWS is Course Signals at Purdue University (Arnold and Pistilli 2012), because of its impact on learners' performance, retention, and satisfaction. The tool was applied to a complete learners' cohort, and the impact was analyzed after three years. The tool informed the different stakeholders (i.e., learners and teachers) with meaningful dashboards and also provided different intervention mechanisms ranging from sending short messages by email to face-to-face meetings with tutors or the teachers of the courses. There are different kinds of EWS depending on the focus: retention in face-to-face environments (Knowles 2014; Márquez-Vera et al. 2016), retention in online courses (Lykourentzou et al. 2009; Srilekshmi et al. 2017; Xing et al. 2016), or early detection of the risk of failing (Casey and Azcona 2017; Macfadyen and Dawson 2010; Vandamme et al. 2007; You 2016). As stated in different works (Freitas and Salgado 2020; Ortigosa et al. 2019), many EWS approaches focus on defining predictive models from the available datasets to identify such at-risk conditions (Cerezo et al. 2016; Huang and Fang 2013; López-Zambrano et al. 2020), and few full-fledged developments can be found. However, the number of developments applied in real educational settings has been increasing in recent years. Some systems only focus on showing dashboards for teachers (Najdi and Er-Raha 2016; Wolff et al. 2014). Other systems additionally provide information to learners (Hu et al. 2014; Ortigosa et al. 2019) since it is essential to inform and empower each stakeholder group. The LIS system (Baneres et al. 2020) provides both features by informing learners and teachers about the risk. The system also provides capabilities to easily define, explore, and select custom predictive models based on the available data.
Simultaneously, personalized feedback is provided to learners based on their risk level, with recommendations about resources and guidance on preventing subsequent risk situations. Currently, feedback is written by the teachers, and the system automatically handles the distribution based on the learners' risk level. The LIS Project seeks eventually to develop a fully adaptive ITS capable of recommending learning resources, exercises, and additional tools to support and engage learners. This paper focuses on this early stage of the project where only teacher recommendations are provided, but a significant impact is found in performance and satisfaction. There is a positive relationship between motivational factors such as self-efficacy and self-regulation and learning engagement (Bates and Khasawneh 2007; Sun and Rueda 2012).

The assessment process as a cornerstone to track learners
One of the main pillars of any educational institution is its educational model. It is understood as a complete description of how the teaching and learning process takes place, which assessment model the institution prefers, and which types of learning resources, learning tools, and sets of activities are recommended. It is also convenient to describe how and where the teaching and learning process takes place. The educational model is the rationale behind any institution, and it is the key to success during the teaching and learning process. When an educational model guides an institution, all courses must follow the rationale through their syllabus. The syllabus summarizes the main competences to be achieved by learners, the methodology they are going to follow across the course, a short description of learning activities, and the tools and resources needed. Learners' activities become crucial to later conduct a proper assessment (Gurvitch and Metzler 2013; Melton 2002). In a syllabus, the other core piece is the assessment model. From the first days, learners should be informed about the assessment model in each course, how many learning activities they should deliver, how they will be assessed, which feedback they will receive, and how it will be provided (Espasa et al. 2018). The most valuable feedback is qualitative rather than quantitative (Terzis et al. 2012), although providing both is the best recommendation (Martínez-Argüelles et al. 2015; Tekian et al. 2017).
In the case of distance, blended, or fully online universities, the syllabus becomes the main piece for learners to understand how they will be evaluated (that is, the assessment model) and which kind of feedback they are going to receive after delivering the learning activities. The assessment model and the feedback sent are a cornerstone to help learners succeed in their courses while being continuously informed about their progress. Another study (Guerrero-Roldán and Noguera 2018) stated, in this sense, that adopting e-assessment involves much more than introducing online technologies into the assessment process; it means supporting effective learning. Considering this approach and focusing on e-assessment and feedback to learners, there is a lack of systems and tools to predict when a particular learner is at risk of failing the course. Teachers usually make efforts to analyze data coming from the VLE to have a better understanding of learners' behavior. Teachers aim to detect at-risk learners early to support, guide, and encourage them to continue the course by delivering subsequent learning activities. Wandber and Rohwer (2010) stated that learning activities have a more substantial relation to learners' performance. At this point, an EWS is very helpful and seems quite effective for both teachers and learners to succeed by addressing shortcomings during the learning and assessment process. The LIS system is centered on learners' successes and failures in individual courses to detect at-risk learners and provide them with rich feedback to ensure they reach the educational goals.

Background: a fully online university
The Universitat Oberta de Catalunya is a fully online university founded in 1994. Its educational model is centered on the learners and the expected competences to be achieved. The university's assessment model is based on Continuous Assessment (CA), and learners receive qualitative feedback after delivering learning activities. Teachers and tutors communicate with the learners through personal and group-class communication spaces in the VLE. The VLE includes all communication spaces to promote social interaction, a digital library, and teachers and tutors to support learners across courses. As mentioned above, our university learners mostly have a full-time job and family commitments, so there are some time constraints to overcome when enrolling in online courses. Learning activities, tools, resources, and learning materials are designed with such conditions in mind. Teachers design activities and provide feedback to learners to reduce isolation and guide learners as much as possible across their learning path. However, new systems such as LIS are being developed to reinforce learners' self-knowledge about their actual status in a course and enhance personalization. Following this rationale, the LIS system was developed and tested in the 2020 spring semester in two first-year courses to analyze the impact on learners' engagement and motivation. A full description of the technical architecture can be found in Baneres et al. (2021), while capabilities and the model described in the next section can be found in Baneres et al. (2020). Thus, this paper aims to analyze the impact on the learners' performance when using the LIS system in the courses they are enrolled in and the learners' perception of the usefulness and effectiveness of LIS utilization. With these twofold aims, the research questions are: RQ1. Is learners' performance increasing when using the LIS system? RQ2. Do learners consider that the LIS system is effective and useful?

Research design and participants
The development and testing of the LIS system are based on a mixed research methodology. It follows the principles of an action research methodology (Oates 2005) combined with a design and creation approach (Vaishnavi and Kuechler 2013). This is because the system is created to solve a problem in a teaching-learning environment, where learners are involved by using the virtual classroom to interact and learn. It also generates a product that deals with market needs and will be tested by pilots in real learning scenarios. This methodology has been used mainly by professionals who want to investigate and improve their own practices. It follows the following principles (Oates 2005): concentration on practical issues, an iterative cycle plan-act-reflect, an emphasis on change, collaboration with practitioners, multiple data generation methods, and finally, action outcomes plus research outcomes and research. The other related approach behind LIS is the design and creation approach. It focuses on developing new Information Technology (IT) products. Design and creation methodology is typically a problem-solving approach. It uses an iterative process involving five steps (Vaishnavi and Kuechler 2013): Awareness (the recognition of a problem where actors identify areas for further work looking at findings in other disciplines or from clients expressing the need for something); Suggestion (a creative leap from curiosity about the problem offering very tentative ideas of how the problem might be addressed); Development (where the idea is implemented, depending on the kind of the proposed IT artifact); Evaluation (examines the developed artifact and looks for an evaluation of its worth and deviations from expectations); and Conclusion (where the results of the design process are consolidated and the gained knowledge is identified).
These approaches are followed in the development of the LIS system but also in this research paper. First, the problem to solve (learners' success prediction) is detected and shared by teachers and educational institutions. The educational purpose is described to identify the problem to address in our daily tasks as teachers in online environments. Secondly, a solution (the LIS system) is provided. Thirdly, it is implemented and tested in different real learning scenarios following the iterative cycle of plan-act-reflect. This cycle will be repeated by analyzing experiences conducted across courses over several academic semesters. At the end of each cycle, an evaluation process will be done to correct deviations and introduce improvements. At the end of these cycles, several enhancements will have been introduced, and the LIS system will be finished. Based on this approach, this work belongs to the second cycle, where improvements to the learners' dashboards, the detection of at-risk learners, and the feedback system were made. The Conclusion section provides a tentative solution to enhance the system for the next cycle. In this cycle, LIS was tested in two online courses of 6 ECTS, Markets and behavior and Introduction to enterprise, of the BSc of Economics.
The Introduction to enterprise course gives learners their first contact with the world of organizations. It offers an overview of the fundamental concepts of the enterprise economy and focuses on enterprise management and administration. Several areas related to enterprises are introduced: human resources, production, marketing, and accounting. It is a fundamental and mandatory course of the Economics bachelor's degree. The continuous assessment (CA) model comprises five assessable learning activities (ALA1, ALA2, …, ALA5), of which the learner should submit at least four. If the learner passes the CA, a Validation Test (VT) must be taken at the end of the course, and the final mark (FM) is computed as FM = 70% CA + 30% VT. Otherwise, the learner takes a final exam at the end of the semester, and the final mark of the course is the exam score. When following the CA model, learners are provided with individual feedback on ALA1. For the subsequent activities, feedback is offered only if the learner fails the activity or is close to failing it. However, learners can individually request personalized feedback for each ALA.
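The final-mark rule described above can be illustrated with a short sketch. It is not the institution's implementation: the function name, the numeric mark scale, and the boolean flag for passing the CA are assumptions introduced here for illustration; only the 70%/30% weighting and the exam fallback come from the text.

```python
def final_mark(ca_passed, ca_mark=None, vt_mark=None, exam_mark=None):
    """Final mark (FM) following the rule FM = 70% CA + 30% VT.

    Assumption: all marks are numeric and share one scale (for example 0-10).
    If the learner did not pass the CA, the FM is the final exam score.
    """
    if ca_passed:
        if ca_mark is None or vt_mark is None:
            raise ValueError("CA and VT marks are required when the CA is passed")
        return 0.7 * ca_mark + 0.3 * vt_mark
    if exam_mark is None:
        raise ValueError("an exam mark is required when the CA is not passed")
    return exam_mark


# A learner who passed the CA and took the Validation Test.
print(final_mark(True, ca_mark=8.0, vt_mark=6.0))
# A learner who did not pass the CA and took the final exam instead.
print(final_mark(False, exam_mark=5.5))
```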
The Markets and behavior course is also an introductory course, included in the specialty of microeconomics. This course facilitates the comprehension of cost-value mechanisms in the modern economy from the interaction between supply (companies) and demand (consumers). The final mark is computed in the same way as in the previous course, but feedback is only provided when the learner fails or is close to failing the activity.
The participants involved in this research are the learners from both courses who agreed to be part of the pilot. Note that the institutional Research Ethical Committee requires learners to give their explicit consent to be included in any study, following the General Data Protection Regulation (GDPR, https://gdpr-info.eu/) that entered into force in 2018. After the consent form is accepted, the system is allowed to process the anonymized data of the data mart for that learner. Table 1 shows the total number of participants who signed the consent form and the percentage they represent. As can be observed, gender participation is balanced, and global participation was 44.52%. Participation in the Markets and behavior course was higher than in Introduction to enterprise.

Study procedure and instruments
Following the principles stated in the methodology section, the LIS system was created to improve educational practices in real online educational settings. The system has been developed for online scenarios where the teaching and learning process is entirely conducted online. The LIS system is composed of an EWS used to track learners' progress and detect at-risk learners. Learning activities play a cornerstone role in ensuring the learners' learning process.
The EWS runs a predictive model called Gradual At-Risk model (abbreviated as GAR model) tailored for each course (Baneres et al. 2020). The GAR model is trained before the course starts using historical anonymized data of learners (available at the data mart) that have enrolled in past semesters and likely share characteristics with future learners. The model is trained with four classification machine learning algorithms: Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Following the training, a validation test is performed. As a result, an optimized GAR model, where the best classifier and training set are selected, is ready to be used in a course. Figure 1 shows the workflow for a course C.
The GAR model for a course consists of several submodels, specifically as many submodels as ALAs the course proposes (see the right part of Fig. 1). Each submodel considers several features of the learner: the marks obtained for the already graded ALAs; the number of courses in which the learner is enrolled; whether he or she is new to the university; how many times the learner has enrolled in the course; and his or her Grade Point Average (GPA), which measures how well the learner has scored at the university. Based on these features, a prediction for the learner is issued. This outcome is a binary variable with two possible values: fail or pass. When the prediction indicates that a learner is likely to fail the course, the EWS identifies him or her as an at-risk learner. Each ALA is graded on a qualitative scale (i.e., A, B, C+, C−, and D) where the learner passes the activity with marks from A to C+. The grade N (non-submitted) is used when the learner does not deliver the activity. The LIS system informs about the minimum mark the learner should obtain on each ALA to have chances to pass the course. The minimum mark is obtained by computing the prediction using the ALA's submodel for all possible marks of the corresponding ALA. The mark that makes the prediction change from fail to pass is identified as the minimum mark (Baneres et al. 2020). The risk level is updated on each ALA depending on whether the learner's mark is higher or lower than this minimum mark, following the decision tree of Fig. 2. Green is assigned when the mark is higher than the minimum mark, Red when the learner fails the ALA, and Yellow when the ALA is passed but the mark is lower than the predicted minimum mark. Learners who do not submit the ALA are classified into another risk level (i.e., NS1 and NS2) depending on the number of consecutive non-submitted ALAs.
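The minimum-mark search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LIS code: `predict` is a hypothetical stand-in for an ALA's trained submodel, the profile dictionary and the toy rule are invented for the example, and the mark C− is written as `"C-"`.

```python
# Qualitative scale from the text, ordered from worst to best.
# "N" (non-submitted) is left out because it is not a mark a learner can aim for.
MARKS = ["D", "C-", "C+", "B", "A"]


def minimum_mark(predict, profile, previous_marks):
    """Return the lowest mark for the next ALA at which the prediction is "pass".

    `predict` is a hypothetical callable standing in for the ALA's submodel:
    it receives the learner profile and the list of marks (including the
    candidate mark for the next ALA) and returns "pass" or "fail".
    Returns None when no candidate mark leads to a "pass" prediction.
    """
    for mark in MARKS:
        if predict(profile, previous_marks + [mark]) == "pass":
            return mark
    return None


def toy_predict(profile, marks):
    # Toy rule for illustration only: the latest mark must be at least C+,
    # or at least B for learners who are repeating the course.
    needed = "B" if profile["n_rep"] > 0 else "C+"
    return "pass" if MARKS.index(marks[-1]) >= MARKS.index(needed) else "fail"


print(minimum_mark(toy_predict, {"n_rep": 0}, ["B"]))  # C+
print(minimum_mark(toy_predict, {"n_rep": 1}, ["B"]))  # B
```

The toy example also shows why two learners with the same marks can receive different minimum marks: the prediction depends on the profile as well as on the marks.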
Note that the risk level classification also takes into account the submodels' accuracy in detecting at-risk learners (true positive rate, TPR) and in detecting non-at-risk learners (true negative rate, TNR). For submodels with a TPR or TNR lower than 75%, the risk level is also set to Yellow to avoid errors in classifying learners. Note that, given that predictions consider several learner features, two learners having the same mark for an ALA can have different demand levels because they have different profiles.
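The risk-level rules described in the text can be summarized in a short sketch. The actual decision tree is the one in Fig. 2, which this sketch does not reproduce; the function name, the order in which the rules are checked, and the handling of a mark equal to the minimum mark are assumptions made here for illustration.

```python
MARK_ORDER = ["D", "C-", "C+", "B", "A"]  # worst to best; "N" means non-submitted
PASSING = {"A", "B", "C+"}                # an ALA is passed with marks from A to C+


def risk_level(mark, min_mark, tpr, tnr, consecutive_non_submitted=1, threshold=0.75):
    """Assign a risk level after an ALA has been graded.

    Assumptions of this sketch (not confirmed by the text):
    - non-submission is checked first, so NS1/NS2 do not depend on accuracy;
    - a submodel with TPR or TNR below the threshold turns every submitted
      case into Yellow;
    - a mark equal to the minimum mark counts as Green.
    `consecutive_non_submitted` counts non-submitted ALAs in a row, including
    the current one, and is only used when the mark is "N".
    """
    if mark == "N":
        return "NS1" if consecutive_non_submitted <= 1 else "NS2"
    if tpr < threshold or tnr < threshold:
        return "Yellow"
    if mark not in PASSING:
        return "Red"
    if MARK_ORDER.index(mark) >= MARK_ORDER.index(min_mark):
        return "Green"
    return "Yellow"


print(risk_level("A", "B", tpr=0.80, tnr=0.90))   # Green
print(risk_level("C+", "B", tpr=0.80, tnr=0.90))  # Yellow
print(risk_level("D", "B", tpr=0.80, tnr=0.90))   # Red
print(risk_level("A", "B", tpr=0.60, tnr=0.90))   # Yellow (low-accuracy submodel)
```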
On the one hand, the system can detect at-risk learners and provide this information to teachers through a dashboard (Fig. 3). On the other hand, it gives learners the risk level (Fig. 4), showing them their current status and the minimum mark for each activity. Simultaneously, they receive personalized, written information by email about how to improve for the next learning activity and subsequent activities in order to pass the course. The email is sent to learners once the mark has been entered in the VLE and the prediction process has been run to classify learners as at-risk or not. This information is provided to both teachers and learners, as shown in Figs. 3 and 4, respectively. It is worth noting that the LIS system provides information to all learners, both the at-risk ones and the successful ones, to encourage the latter to continue in the same way in the next activities. This is one of the most distinctive aspects of the system.
At the Green risk level, the learner receives a message congratulating him or her on the learning activity. At Yellow (or amber), the feedback message alerts the learner about his or her chances to pass and gives some recommendations to progress successfully through the course. Red indicates that the learner is at serious risk of failing or dropping out (at-risk learner). Given that learners can be at the Yellow or Red risk levels for different reasons, as we can observe in Fig. 2, the message changes depending on the specific case. To sum up, the LIS system's messaging system, which can be triggered automatically or by teachers' actions, provides learners with detailed and personalized feedback to better support them.
Fig. 3 Teacher dashboard when ALA1 has been graded, the risk level has been assigned, and the prediction for activity two has been generated
Note that the predictive model currently does not distinguish between failing and dropping out learners. However, these are different situations, and learners from each group will have different needs (Grau-Valldosera and Minguillón 2014). For this reason, although both situations are displayed as red to learners, the LIS system provides different feedback depending on these at-risk conditions. As mentioned above, the LIS system has been tested in the two first-year courses described in the Research design and participants section during one academic semester (from February to June 2020). During the whole semester, the LIS system sent emails to learners and showed them the dashboard with the predictive status after each learning activity. This was also combined with teachers' feedback provided through the VLE. Moreover, the performance rate of the previous semester has been analyzed to check whether using the LIS system in these two courses has improved it. After finishing the course, a post-questionnaire was administered to learners to find out their perceptions about the effectiveness of the system and its usefulness. The post-questionnaire was divided into three sections. The first one was devoted to demographic data (such as age and gender). The second section was dedicated to collecting learners' perceptions, including questions about effectiveness and usefulness and their experience with the LIS system. The third section was related to rating the system. A total of 10 questions (including all the sections) were sent through an online questionnaire to all learners taking part in piloting the LIS system. Learners responded to each question based on a 7-point Likert scale, ranging from 1 (strongly disagree) to 7 (strongly agree), or a 5-point Likert scale from 1 (Dislike a great deal) to 5 (Like a great deal), depending on the section.
Fig. 4 Learner dashboard for the second learner of Fig. 3

Data analysis
To answer the research question RQ1, data from the LIS system and the VLE records about learners' performance have been collected in both courses across two semesters and stored as CSV files, that is, the semester where the case study was conducted and the previous semester, where the LIS system was not used. Additionally, anonymized data from the data mart have also been collected from the semester where the case study was conducted to obtain the learners' profile information. We gathered the data the LIS system uses to perform the predictions about the likelihood to pass the course. Specifically, we collected the ALA grades (ALAn), the FM for the course, whether the learner is new at the institution (NEW), the grade point average (GPA), the number of enrolled subjects (N_ENR), and the number of times the current course has been repeated (N_REP). Finally, we computed the GAR model accuracy and the predictions performed by the LIS system for all learners participating in the pilot. The R language (https://www.r-project.org) has been used to analyze all quantitative data. The statistical analysis of the performance was done using the unpaired two-sample Wilcoxon test due to the final mark's non-normal distribution (Kruskal 1957). Also, a descriptive analysis of the dropout and of the performance at each learning activity was performed to show the impact of the LIS system's utilization. In addition, a multivariate regression analysis was conducted on the anonymized data to get insights about which variables are more relevant to predict the FM of the course. Finally, the accuracy of the GAR model has been analyzed by two methods. First, the accuracy of the submodels for each ALA has been analyzed by training on historical data from the 2017 fall semester until the 2019 spring semester and running a validation test on the 2019 fall semester.
The metrics used are computed from four counts: TP denotes the number of at-risk learners correctly identified, TN the number of non-at-risk learners correctly identified, FP the number of learners incorrectly identified as at-risk, and FN the number of learners incorrectly identified as non-at-risk. These four counts are used for evaluating the global accuracy of the model (ACC), the accuracy when detecting at-risk learners (true positive rate, TPR), the accuracy when distinguishing non-at-risk learners (true negative rate, TNR), and a weighted harmonic mean of the precision (positive predictive value) and the TPR that gives more weight to correct at-risk identification (F score, F1.5). Second, the accuracy has been computed by checking the percentage of correct predictions regarding the learners' final mark.
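The four metrics named above can be computed from the confusion-matrix counts as in the following sketch. It assumes the standard definitions with at-risk as the positive class and the usual F-beta formula with beta = 1.5; the function name and the example counts are hypothetical and are not taken from the study's data.

```python
def gar_metrics(tp, tn, fp, fn, beta=1.5):
    """ACC, TPR, TNR and F-beta from confusion-matrix counts.

    Assumption: standard definitions with at-risk as the positive class, so
    fp counts learners wrongly flagged as at-risk and fn counts at-risk
    learners that were missed. The sketch does not guard against empty
    classes (zero denominators).
    """
    b2 = beta ** 2
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        # Weighted harmonic mean of precision and TPR; beta > 1 favors TPR.
        "F1.5": (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp),
    }


# Hypothetical counts for one submodel, for illustration only.
metrics = gar_metrics(tp=30, tn=144, fp=16, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With beta = 1.5, a missed at-risk learner (FN) lowers the score more than a false alarm (FP), which matches the stated aim of weighting correct at-risk identification.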

System?
Before analyzing the performance, Tables 2 and 3 summarize the trained submodels' accuracy. Recall that each ALA has a submodel to predict the likelihood of passing the course based on different learner features. As we can observe, the global accuracy (ACC) in both courses is already high from the first activity (i.e., higher than 78%) and reaches a value higher than 90% in the last one. However, the GAR model detects learners who are not at-risk more easily (i.e., in both courses, the TNR value is higher than 89%) because most learners passed the courses in the past and, thus, the models are trained mostly with such cases. Detecting at-risk learners is more difficult in the first activity in both courses. The learner's profile and only the mark from ALA1 are not enough to identify them. However, the TPR rapidly increases to a value higher than 70% in ALA2. Nevertheless, it is interesting to compare such theoretical accuracy with the percentage of correct detections in the semester where the case study was conducted. Tables 4 and 5 summarize the risk level distribution and performance on each ALA in the Introduction to enterprise and Markets and behavior courses, respectively. Column No. shows the number of learners classified at each level, P. the number of learners at that level who finally passed the course, and F. the number of learners who failed. Note that 882 messages were sent in Introduction to enterprise, while 1880 messages were sent in Markets and behavior. As we can observe, most learners were assigned to the low-risk level in both courses, but the LIS system can identify at-risk learners during the semester. In Introduction to enterprise, the percentage of learners identified as at-risk ranges from 6.80% to 15.90%, while this rate ranges from 9.60% to 14.10% in Markets and behavior. This is relevant because we have representatives at all risk levels in the study.
Before the LIS system, teachers did not have any tool to detect at-risk learners and contact them as a group. The only option was to check marks manually and send messages individually, but this method is neither efficient nor scalable. The system offers awareness of the learners' status and allows teachers to intervene more frequently in the learners' learning process to guide them in the courses. When analyzing the accuracy, we can observe that learners are mostly classified correctly with respect to their final performance. The green risk level correctly identifies learners who pass in both courses in all activities with an accuracy higher than 90%. The yellow risk level contains a mix of cases (i.e., predictions from models with low accuracy in ALA1 and ALA2, and learners who passed the activity but with a mark lower than predicted in all ALA). Thus, this risk level includes learners both at risk and not at risk in ALA1 and ALA2. However, the accuracy improves from ALA3 onwards (the accuracy of the model is higher than 75%), when the yellow risk level is only assigned to learners who passed the activity. There are few cases at the red risk level because both courses increase the chance to pass by dropping the activity with the worst mark from the final mark computation. However, from ALA3 onwards the models mostly identify learners who finally failed the courses. Finally, learners who did not submit an activity are correctly identified in the NS1 and NS2 risk levels. Here, the incorrect classifications arise because some learners moved back to lower risk levels when they submitted and passed the remaining activities.

Table 4 Risk level distribution and performance on Introduction to enterprise course. RL is the risk level (G, Green; Y, Yellow; R, Red; NS1, Not submitted = 1; and NS2, Not submitted consecutive ≥ 2), No. is the number of learners identified in that level, P. is the percentage of learners identified in that level that passed the course, and F. is the percentage of learners in that level that failed the course. Columns: RL, followed by No., P. (%), and F. (%) for each of ALA 1 to ALA 5.
To answer this research question about performance, three groups of learners have been analyzed. The first two groups are the learners who signed the consent to take part in the LIS pilot (Signed group) and those who did not (Not signed group). The third group comprises learners enrolled in the previous semester, when the LIS system was not tested (Previous semester group). In all groups, learners who did not submit any learning activity were removed because they never started the courses.
To analyze performance, we first checked the dropout rate after ALA4. Recall that passing four of the five learning activities is enough to pass the CA in both courses. Thus, not submitting the fifth activity is not sufficient to determine that a learner has dropped out. Results are summarized in Table 6. The table shows the number of learners in each group (n), the number of learners after removing those who did not submit any activity (Engaged), and the dropout rate (Dropout). As we can observe, the dropout rate is significantly lower among learners using the LIS system than in the Not signed group, and also lower than in the Previous semester group. Learners who signed the consent form receive additional feedback, are more engaged because they feel they belong to a learning group, and feel that the teacher cares about them. Previous research on dropout (Rodríguez et al. 2019) revealed that dropout in our university is related to failing or not submitting the ALA proposed in the CA. Several factors affected dropout. Professional and family commitments were the most relevant ones. In turn, they were related to time management issues (workload) and the number of enrolled courses, which can lead to coinciding deadlines for ALA of different courses. Pedagogical factors included difficulties with the course contents, the difficulty (or perceived difficulty) of the proposed ALA, the appropriateness of the learning resources, and the length of the ALA. These factors are even more relevant in first-year courses and for newcomers to online education.
The performance rate for each activity is also analyzed. Table 7 summarizes the results. Since marks on an ALA are qualitative at the university (A, B, C+, C−, and D), the performance rate has been calculated based on whether a learner passed the activity (marks from A to C+). Learners' performance in the CA has been computed for the two groups in the current semester and for the previous semester group in both courses. Results show an increase in the passing rate when the LIS system is used compared with the other two groups. We can also observe two additional insights. First, the ALA5 passing rate decreases in all groups because many learners have already passed the CA by passing the previous activities (from ALA1 to ALA4). Second, the risk level distribution in Tables 4 and 5 is consistent with the activities' passing rate. However, the passing rate of the ALA is not enough to prove that the LIS system impacted performance. After passing the CA, learners must take a validation test to demonstrate the acquired knowledge. Performance is therefore tested based on the learners' final mark. A statistical analysis of the performance was done using the unpaired two-sample Wilcoxon test (Kruskal 1957) due to the final mark's non-normal distribution. Here, we assume as the null hypothesis that the marks are worse or equal when the LIS system is used. A comparison among all groups has been performed. Table 8 summarizes the group comparison for the Introduction to enterprise course. As a result of the analysis, learners' performance in the group that signed to take part in testing LIS is significantly better than that of those who did not sign and also better than the learners' performance in the previous semester. In this sense, we can reject the null hypothesis, and this research question can be answered affirmatively.
Learners' performance in the Introduction to enterprise course increases when using the LIS system. Results related to the final mark (i.e., p-value among groups, median and mean values) are summarized with a notched box-and-whisker plot in Fig. 5.
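The one-sided group comparison described above can be sketched with a standard-library implementation of the unpaired Wilcoxon rank-sum (Mann-Whitney U) test. This is a minimal sketch under stated assumptions, not the procedure actually run in the study: it uses the normal approximation with tie and continuity corrections, the function name is hypothetical, and the marks below are invented. In practice, an established routine such as R's wilcox.test or SciPy's mannwhitneyu would normally be used.

```python
import math
from itertools import groupby


def rank_sum_test_greater(x, y):
    """One-sided Wilcoxon rank-sum test (normal approximation).

    H0: marks in x are worse than or equal to those in y.
    H1: marks in x tend to be higher than those in y.
    Returns (U statistic of x, one-sided p-value).
    Assumes the pooled sample is not made of identical values only.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0  # sum of the (tie-averaged) ranks of group x
    tie_term = 0.0    # sum of t^3 - t over groups of tied values
    position = 0      # number of observations already ranked
    for _, tied in groupby(pooled, key=lambda item: item[0]):
        tied = list(tied)
        t = len(tied)
        avg_rank = position + (t + 1) / 2  # mean of ranks position+1..position+t
        rank_sum_x += avg_rank * sum(1 for _, label in tied if label == 0)
        tie_term += t ** 3 - t
        position += t

    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u - 0.5) / math.sqrt(var_u)    # continuity correction
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)
    return u, p_value


# Invented final marks, for illustration only.
signed = [7.5, 8.0, 6.5, 9.0, 7.0, 8.5, 6.0, 7.8]
not_signed = [5.0, 6.2, 4.5, 7.1, 5.5, 3.0, 6.8, 5.9]
u_stat, p = rank_sum_test_greater(signed, not_signed)
print(f"U = {u_stat}, one-sided p = {p:.4f}")
```

A small p-value leads to rejecting the null hypothesis that marks are worse or equal in the first group. The rank-based statistic does not require normally distributed marks, which matches the motivation given for choosing this test.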
In the case of the Markets and behavior course, the same analysis has been performed. Table 9 indicates that the performance of the learners who signed to take part in LIS is better than the performance of those who did not sign and better than the previous semester's performance. Thus, we can also reject the null hypothesis, and, therefore, the answer to RQ1 is positive, which means that learners' performance is better when using the LIS system. Results related to the final mark are also summarized with a notched box-and-whisker plot in Fig. 6.
Table 9 Results of the unpaired two-sample Wilcoxon test on final mark distribution for Markets and behavior

Group     Group comparison     p-value    Significance a
Signed    Not signed           2e−6       *****
Signed    Previous semester    2.2e−6     *****

a Significance: * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001, **** p ≤ 0.0001, ***** p ≤ 0.00001

Fig. 6 Notched box-and-whisker plot of the final mark distribution with the corresponding p-value of the unpaired two-sample Wilcoxon test, median and mean (diamond sign) values for each group for Markets and behavior course

After checking that the final mark is significantly better when using the LIS system, we considered it relevant to check which variables were more important for the final mark distribution. The analysis has been done using a multivariate regression on the variables used by the LIS system to predict the likelihood of passing the course. Due to the anonymization of the data mart, this study has been done without knowing the learners' identity. After an initial analysis using multivariate linear regression, we observed a heavy-tailed residual distribution in both courses. After examining the model and the data, we observed that there are three groups of learners. The first group contains the learners who passed the CA and performed similarly in the validation activity. The model correctly predicts such learners. However, outliers appear in the remaining groups. On the one hand, some learners passed the CA but did not pass the validation test or did not even submit it. On the other hand, some learners did not pass the CA, took the alternative final exam, and passed the course. Due to such outliers, we used another regression approach to better fit the data: the non-parametric Multivariate Adaptive Regression Splines (MARS) (Friedman 1991), which automatically models nonlinearities and interactions between variables and better handles outliers. The regression divides the dataset into multiple bins and fits each bin with a separate polynomial model. Furthermore, the approach also prunes the number of variables by evaluating their relative importance in the model.
All this is done by performing several iterations until the model converges to a subset of variables with the best cuts and polynomial terms (i.e., coefficients and degrees). Appendices 1 and 2 summarize the performed study and the final equation of the regression model for each course. For the Introduction to enterprise course, the proposed model has a final coefficient of determination (R²) of 0.628 compared to the adjusted R² of 0.586 of the linear regression. We consider the model quite good given the large number of outliers, which are difficult to predict. The interesting part of the model is the pruning process. We can observe that only the profile variable N_ENR and the ALA grades are selected to predict the final mark. Checking the relative variable importance, we see the following order: ALA3 > ALA4 > ALA2 > ALA5 > N_ENR > ALA1. From this selection, we can see that the mid-course ALA have more impact than the constraint that at least four activities must be submitted. Also, learners tend to drop out when low grades are obtained in the initial activities. The N_ENR variable also has a positive effect, as its coefficient shows. Learners who are enrolled in more courses tend to be full-time learners and therefore tend to perform better; meanwhile, learners with fewer enrolled courses tend to have family or professional commitments that limit their performance. Finally, we observe that the grade of the first activity is the least important, although it has a relevant negative impact for non-passing grades (i.e., values D and C−). Learners who do not pass the first activity may have low motivation to continue and probably drop out of the course. Note that this is not observed for learners who have not submitted the activity. Some of these learners are still motivated to start the course in the second activity and try to pass it.
The model is slightly different for the Markets and behavior course (see Appendix 2). The coefficient of determination of the model is 0.613 compared to the adjusted R² of 0.552 of the linear regression. The pruning process removes only the NEW and GPA variables from the model, and the relative variable importance is ALA3 > ALA4 > ALA1 > ALA5 > N_REP > ALA2 > N_ENR. Similar conclusions can be drawn about ALA grades and the number of enrolled courses. Middle activities have more impact on the final mark, and learners with more enrolled courses perform better. However, a larger number of repetitions has a negative impact on the final grade. Learners who stay in the course for multiple semesters enter a loop of repetitions.

Related to RQ2. Do learners consider that the Learning Intelligent System is effective and useful?
For this research question, a qualitative analysis has been conducted. Of the 552 learners from both courses taking part in the pilot, 205 answered the post-questionnaire. Data from both courses have been merged to obtain the learners' perceptions of the LIS system's effectiveness and usefulness as a whole.
The demographic information is shown in Table 10, where we can observe that the largest group of respondents is between 21 and 35 years old. The learners in these courses are younger than is usual at the university: as mentioned above, the average age is usually between 21 and 45, and learners between 41 and 45 years old are only a small group in these courses. This fact is essential to understand how these learner profiles perceive the usefulness and effectiveness of the LIS system. Regarding gender, the distribution is almost balanced: 50.73% of the learners who answered the post-questionnaire were male, while 48.78% were female. Just 0.49% preferred not to state their gender. Accordingly, there is no bias regarding gender.
The second section of the questionnaire asked learners about effectiveness and usefulness (Table 11). Effectiveness was addressed by the following items: Using the LIS system enhanced my effectiveness in passing the course, and The LIS system supported me to pass the course in an effective way. Usefulness was addressed by the following items: I found it easy to use the LIS system dashboard, and The LIS system provided me access to relevant information over my learning process, so it was useful. Table 11 summarizes the main results regarding the LIS system's effectiveness and usefulness according to learners' opinions. Regarding effectiveness, learners consider that the LIS system helps to pass the course. Only 27.80% of learners' answers fall within strongly disagree, disagree, and somewhat disagree. Similar results were found for the second question: only 24.78% of learners considered that LIS did not provide effective support during the course. By contrast, 42% and 44.38% of learners considered that the LIS system is effective (somewhat agree, agree, strongly agree) to pass the course and to provide support, respectively. Turning to usefulness, 75.12% of learners pointed out that the dashboard was useful, while just 47.81% of respondents agreed that the LIS system is a valuable system for providing relevant information over the learning process. This last value is understood as positive because learners consider the teacher as the main actor supporting them and the tool as the vehicle that transfers relevant information to them. Teachers stated that, when they send feedback messages with a set of recommendations through the system, learners react by contacting them. According to these data, we can conclude that the answer to RQ2 is positive: learners considered that the LIS system is effective and useful for them.
The third section of the questionnaire asked three questions: Please rate globally feedback messages received from the LIS system; Please rate the LIS system's dashboard globally; Please rate the LIS system globally. Table 12 shows the results. For all questions, the highest scale options (Like somewhat, Like a great deal) received more than 60% of the responses. These results show a high acceptance of the functionalities of the system and of the system as a whole. Regarding the future usage of LIS in upcoming courses, we asked learners to answer the following question: Would you consent to the activation of the LIS system for the next semester in other courses in which you will enroll? As Table 13 shows, 68.29% of learners agree or strongly agree, while only 9.76% strongly or somewhat disagree. These values show that a majority of respondents prefer to keep using the LIS system. According to the third section results, we can also conclude that the system is well rated.

Conclusions, limitations, and future research
The results obtained in the case study have made it possible to answer both research questions affirmatively. Concerning RQ1 (Is learner's performance increasing when using the LIS system?), providing predictive feedback to learners who participated in the pilot resulted in better performance in both courses. Regarding dropout, the LIS system improved engagement during the semester, reducing the dropout significantly. These findings show that this type of feedback, combined with the learner's dashboard, had a positive impact and complemented the regular feedback mechanisms available in the courses. This is particularly important, given that both courses are fundamental ones (and hence, mandatory) that learners enroll in at the beginning of the bachelor's degree. For many learners, it was probably the first experience with online learning, and the LIS system enhanced their learning engagement, increased motivation, and helped them in aspects such as self-efficacy and self-regulation. Other authors who have deployed EWS in real learning environments have also detected an increase in learners' success, i.e., better learner performance and dropout reduction. For example, Arnold and Pistilli (2012) detected a strong increase in satisfactory grades, and a decrease in unsatisfactory grades and dropout. Similar results were reported in the specific case of online learning and distance education (Hu et al. 2014; Mubarak et al. 2020; Ortigosa et al. 2019). From the teachers' perspective, the interaction with at-risk learners provides interesting insights about the types of situations that compromise the learners' continuity in the courses (Ortigosa et al. 2019), as well as the detection of learners' needs (Hu et al. 2014). The results obtained are also consistent with other investigations such as Bates and Khasawneh (2007), Sun and Rueda (2012), and Yun et al. (2020). Likewise, in this case study the LIS system helped teachers to better detect at-risk learners.
Instead of basing their decisions exclusively on their experience, they could make decisions based on data evidence from the beginning of the course. By checking the relative variable importance, we can also observe that the ALA grades are the variables with the most impact on the learners' success, as also stated in other works (Wandber and Rohwer 2010). Thus, it is crucial to help learners during the continuous assessment and give them a feeling of belonging to a study group to improve engagement and retention (Masika and Jones 2016). Moreover, the analysis of the accuracy of the predictive model embedded in the LIS system (i.e., the GAR model) and of the issued predictions shows the capacity of the LIS system to detect potential at-risk learners effectively from the early stages in both courses. A more detailed discussion of the GAR model accuracy and its comparison with other predictive models can be found in Baneres et al. (2020). The results derived from the multivariate analysis are aligned with the results provided by the risk level classification analysis. In both courses, the multivariate analysis reveals that the third activity (ALA3) has the highest impact on the learners' final performance. The GAR model reaches an accuracy above 80% when detecting at-risk learners in the same activity. Thus, from this activity learners can be classified into all risk levels. Engaged learners who passed the first three activities have a high chance of passing the courses; meanwhile, for at-risk learners who failed or did not submit any of the previous activities, one of the at-risk levels (i.e., Red, NS1, or NS2) is triggered correctly. Therefore, the recommendation messages are sent appropriately according to the learners' risk level. Unfortunately, the GAR model (and, therefore, the LIS system) has a significant limitation, which is shared by other EWS based on learners' performance (Arnold and Pistilli 2012; You 2016).
When the continuous assessment process of a course changes drastically, by adding or removing activities or changing the typology of the exercises included, the GAR model becomes invalid because the historical data are useless for predicting learners' performance under the new assessment model. This highlights the need to develop contingency models, based on other features such as VLE interaction or learning resource utilization, for when the GAR model is not valid.
In the light of these results, we can also explain learners' high appraisal of the LIS system regarding effectiveness and usefulness and their willingness to use the LIS system in future courses in which they enroll (68.29% of learners answered positively about using the LIS system in the future). These aspects allowed us to answer RQ2 positively (Do learners consider that the LIS system is effective and useful?). Here, it is worth noting that opinions were collected through a post-questionnaire, and the answers were mostly from younger age ranges. Specifically, 31.71% of the respondents (the largest group) were between 21 and 25 years old. Younger learners may have more problems in managing self-efficacy and self-regulation, and systems such as LIS can support them (Shane and Heckhausen 2016; Williams and Hellman 2004). A high appraisal of the usefulness of EWS was also found in related research. For example, Arnold and Pistilli (2012) reported that most learners perceive the messages generated by the EWS as personal communication with their teachers, and as an opportunity to change their behavior. Additionally, the availability of dashboards has been positively perceived by learners as an efficient way to quickly understand their learning status in the course (Hu et al. 2014).
Despite the positive results, the case study has two main limitations. First, a self-selection bias is induced by the constraints imposed by the Ethical Committee. Learners who participate in pilots tend to have better performance and a predisposition to test innovative learning tools. Second, a mortality bias may affect our qualitative analysis because not all the pilot participants answered the post-questionnaire. Specifically, many of the learners who failed or dropped out of the course probably did not answer it. We cannot obtain this information because the post-questionnaire was anonymously processed according to the Ethical Committee's rules. Consequently, a deeper qualitative analysis including learner interviews as well as focus groups should be performed in the next iterations to dig into learners' mindsets, motivations, and beliefs for using or rejecting the EWS and its perceived usefulness in helping them to achieve their learning goals. It would also be interesting to conduct a longitudinal study to better analyze learners' cohorts and see whether results are consolidated across semesters. As future work, the LIS system will continue evolving toward its aim of being an adaptive ITS through a new cycle iteration. To provide specific and automatic recommendations, we first need to collect information about competence acquisition at the activity level. Such information can give finer-grained insight into learners' needs and gaps in knowledge. If we can classify resources, activities, and tools within courses by competence coverage, we will be able to develop the final stage of the LIS system. Additionally, we observed that the feedback messages are highly appraised, but such messages are only sent after each activity. This means that there is a time window of 2-3 weeks between messages. In some cases, teachers may lose some learners during such intervals.
The time window should be reduced to further increase the system's effectiveness, and one option that will be considered is to build a (near) real-time model to predict potential dropout. Such a model could reach learners daily with specific recommendations to reinforce engagement and, in the end, help them to pass the course.

Appendix 1
Here, we show the resulting model based on multivariate adaptive regression splines (MARS) for the Introduction to enterprise course. The computation has been done with the earth R package (Milborrow et al. 2014). Table 14 shows the basis functions (BFs) with the corresponding coefficients produced by the earth command with default arguments and the summary command. Note that qualitative grades from ALA have been transformed to numerical values (i.e., N = 0, D = 1, C− = 2, C+ = 3, B = 4, A = 5). The relative importance of the variables in the model, computed by the evimp command, is shown in Table 15.
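To make the structure of such a model concrete, the following minimal sketch shows how a prediction is obtained as an intercept plus a weighted sum of hinge basis functions over the numerically encoded grades. The grade encoding follows the one stated above (written with an ASCII minus sign), while the intercept, knots, and coefficients are hypothetical placeholders and are not the values reported in the tables. Reading N as a non-submitted activity is also an assumption.

```python
# Grade encoding taken from the appendix text ("C-" written with an ASCII minus).
# Assumption: "N" denotes an activity that was not submitted.
GRADE_TO_NUM = {"N": 0, "D": 1, "C-": 2, "C+": 3, "B": 4, "A": 5}


def hinge(x, knot, direction=1):
    """MARS hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


# Hypothetical model: intercept plus (coefficient, variable, knot, direction) terms.
INTERCEPT = 5.0
TERMS = [
    (0.8, "ALA3", 2, +1),   # reward grades above the knot on the third activity
    (-1.2, "ALA1", 2, -1),  # penalize grades below the knot on the first activity
    (0.3, "N_ENR", 1, +1),  # small bonus per enrolled course beyond the first
]


def predict_final_mark(learner):
    """Evaluate the piecewise-linear model for one learner record."""
    total = INTERCEPT
    for coef, variable, knot, direction in TERMS:
        value = learner[variable]
        if isinstance(value, str):  # qualitative grade -> numeric code
            value = GRADE_TO_NUM[value]
        total += coef * hinge(value, knot, direction)
    return total


# Invented learner record, for illustration only.
learner = {"ALA1": "D", "ALA3": "B", "N_ENR": 3}
print(round(predict_final_mark(learner), 2))
```

Each hinge term is zero on one side of its knot, so a variable only contributes in the range where the fitted model found an effect. This is how a single variable can matter mainly for non-passing grades, as reported for the first activity.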

Appendix 2
Here, we show the resulting model based on multivariate adaptive regression splines (MARS) for the Markets and behavior course. Table 16 shows the BFs with the corresponding coefficients.
The relative importance of the model computed by the evimp command is shown in Table 17.