Based on our results, this article’s novelty lies in providing the following contributions, which have not been addressed in prior research, to the best of our knowledge:
Empirical evidence showing the long-term effectiveness of an innovative gamification design, which features fictional and competitive-collaborative game elements, in improving positive learning behaviors;
Empirical evidence revealing how the effect of gamification decreases over time (novelty effect) and then naturally increases, with no intervention (familiarization effect);
A heuristic on a ‘safe period’ in which educators can use gamification before its effect starts to decrease;
A heuristic on how long gamification studies should last to ensure their findings are roughly stabilized.
To further support these contributions, the remainder of this section discusses our findings, their implications, this article’s limitations, and future research recommendations.
First, our findings revealed that the effect of gamification decreased after four weeks for all behavioral measures, although the magnitude of that change differed between measures. The effect of gamification either was maintained from the first time-point or increased between time-points one and two. However, it then decreased from moderate-to-large to negligible, from large to small, and diminished in size for attempts, IDE usage, and system access, respectively. These findings support the novelty effect: during the first four weeks, the effects were positive, after which they decreased. These findings corroborate most previous research showing gamification’s impact decreasing over a period of time (e.g., Mitchell et al. (2017); Putz et al. (2020); Sanchez et al. (2020)). Compared to these, our study expands the literature by providing evidence that the novelty effect holds in a different context, as well as for a distinct gamification design. In addition, this study expands the literature by demonstrating the extent to which the novelty effect influences gamification’s impact, in terms of changes in effect size magnitudes. We also show that changes in magnitudes differ among similar measures, similar to what Mavletova (2015) found in the context of online surveys, yet extended to our context and gamification design. On the other hand, results by Grangeia et al. (2019) suggested that gamification’s impact was positive throughout the whole intervention. Whereas we show that this impact decreased after four weeks, their intervention lasted only four weeks. Therefore, considering that the design we employed features elements likely to enhance gamification’s potential Sailer and Homner (2019), their results might be due to the limited intervention duration, or to the different gamification design.
Second, our results indicate that after the decrease, the gamification effect showed a trend of increasingly positive impacts. While the decrease first appeared from time-point two to three, it promptly ended for attempts, but lasted until time-points four and five for system access and IDE usage, respectively. After the end of the decline, our results show an increase in gamification impact for all measures, which either continued to increase (attempts) or achieved a rough plateau (IDE usage and system access). These findings support the familiarization effect, in which users need some time to fully familiarize themselves with gamification. This effect was previously discussed by Van Roy and Zaman (2018), who found that learner motivation appeared to follow a polynomial behavior over time. Similarly, Tsay et al. (2020) noted that, over time, the impact of gamification decreased in the first term, but not in the second one, which also suggests a familiarization effect. However, neither of these studies performed such analyses by comparing data from the gamification intervention to a control, non-gamified group. Therefore, our contribution here is providing empirical support for the familiarization effect, based on a quasi-experimental study comparing data from gamified and non-gamified interventions, besides extending it to the context of Brazilian students in CS1 classes of STEM programs, as well as to our gamification design. Additionally, it is worth noting that other long-term studies (e.g., Hanus and Fox (2015); Sanchez et al. (2020)) are limited in the number of time-points considered (three) and in the data analysis method (linear modeling), which prevented them from detecting such a non-linear pattern. Thus, our findings suggest that the decrease in gamification impact, likely due to the novelty effect, might happen because learners need some time to familiarize themselves with it.
Third, our findings revealed that, overall, the gamification effect was positive and, importantly, that it was not negative at any point in time. Considering the whole intervention period, all behavioral measures of participants in the gamified condition outperformed those in the control setting. Considering each time-point separately, effect sizes estimated in the analyses corroborate that positive impact, with the exception of a few points at which the gamification impact was nonsignificant. This finding is aligned with the overall gamification literature, in which most studies report positive or null results Koivisto and Hamari (2019). Additionally, these findings are aligned with the gamified learning literature Sailer and Homner (2019), revealing average small-to-moderate positive impacts on behavioral learning outcomes. The literature has demonstrated cases in which gamification applied to education is associated with negative outcomes Toda et al. (2018); Hyrynsalmi et al. (2017), which is harmful to students’ learning experience and reinforces the need to carefully design gamification to prevent such cases. A possible reason we avoided these pitfalls might be the inclusion of fictional and competitive-collaborative elements, which have been suggested as positive moderators of gamification’s impact on behavioral outcomes Sailer and Homner (2019). In this sense, this study contributes by scrutinizing the effects of the employed gamification design over several time periods, finding that, besides being positive on average, it was not harmful to learners at any point in time.
Finally, these findings additionally demonstrate the effectiveness of adding an innovative design to the educational process of teaching programming. On one hand, gamification has been seen as an educational innovation with the potential to improve learning experiences Hernández et al. (2021); Palomino et al. (2020). However, most research on gamification has been limited to standard designs, featuring elements such as points, badges, and leaderboards Bai et al. (2020); Koivisto and Hamari (2014); prior gamification research also lacked long-term studies Nacke and Deterding (2017); Alsawaier (2018); Kyewski and Krämer (2018). On the other hand, unlike most prior research, this paper analyzed a gamification design featuring fictional game elements Toda et al. (2019). Additionally, that design also features competitive-collaborative elements, whereas most research is limited to leaderboard-based competition Mustafa and Karimi (2021). This distinction is important because, despite being rarely used, such fictional and competitive-collaborative game elements are considered maximizers of gamification’s effects Sailer and Homner (2019). Therefore, Codebench provides students with an innovative educational design, not only in terms of being gamified, but also because its gamification design itself is innovative compared to prior work. Thus, this article contributes with evidence demonstrating that adding an innovative educational design to programming lessons, implemented through gamification featuring both fictional and competitive-collaborative game elements, is able to improve learning behaviors in the long run.
Nevertheless, in addition to the gamification design, this study differs from previous research in terms of context. Compared to Mavletova (2015), Mitchell et al. (2017), and Putz et al. (2020), contexts differ because we focused on the learning domain and they did not. Compared to Hanus and Fox (2015), Grangeia et al. (2019), Sanchez et al. (2020), and Tsay et al. (2020), contexts differ because none of them involved CS1. Consequently, our study context differs from theirs in terms of the task: we gamified programming assignments, whereas coding is not part of the classes involved in their research (e.g., psychology, medicine, and communication). Compared to Rodrigues et al. (2021), who also gamified CS1 activities, our study differs because most of their activities were not coding assignments. Additionally, their participants were Software Engineering students, a major closely tied to coding. In our study, most participants were students of STEM programs such as Electrical Engineering and Mathematics. That difference is key, as such STEM students often struggle to see the value of coding for their future occupations Fonseca et al. (2020); Pereira et al. (2020). Therefore, we can summarize that our study context differs from those of prior research in terms of domain, learning task, and major. Acknowledging such differences is important because research suggests that one’s experience with gamification might change depending on the task Liu et al. (2017), and that different designs should be adopted depending on the learning activity Rodrigues et al. (2019); Hallifax et al. (2019). Consequently, the difference in contexts might have affected our findings when compared to similar studies.
Thus, we also contribute to the discussions on the role of contextual aspects (e.g., domain, task type, and major) in gamification, raising the question of how those factors affect gamification effectiveness.
Our findings have major implications for higher education. That is, they provide consistent support for the effectiveness of an increasingly popular approach, which has, nevertheless, been rarely analyzed in the long term: gamification. From our findings, we have empirical evidence that gamifying programming assignments has an overall positive effect on students’ submissions (attempts), IDE usage, and system access. Such behaviors are valuable for learning, as empirical evidence shows that the more one solves questions and spends time on a task, the better their learning is Rowland (2014); Rodrigues et al. (2021); Sanchez et al. (2020). Importantly, our study demonstrates that the innovative design was able to improve positive learning behaviors throughout the course of a whole semester. Thus, our findings’ educational implication is that if practitioners deploy a gamification design similar to ours, they are likely to be adopting an innovative educational design that will improve their students’ learning.
From that main contribution, we have derived five additional implications that contribute to educational practice and gamification research. First, despite the fact that the gamification effect is likely to decrease over time, it is unlikely to start decreasing before four weeks. Beyond supporting the novelty effect, our findings open the debate on when it starts to act. Our results suggest that the effect of gamification only started to decrease after four weeks of use. Although the literature often discusses that the gamification effect suffers from the novelty effect, little has been explored in terms of when the decrease starts. Therefore, this finding has both practical and theoretical implications, informing practitioners of a safe period in which gamification can be used without losing its power, as well as providing researchers with a threshold that can be assessed in further research to ground the extent to which the novelty effect starts acting.
Second, whereas the novelty effect is often present, the familiarization effect seems to naturally address it. Our findings demonstrate that the impact of gamification on all measures started to decrease at some point in time. However, we also showed that this impact then shifted back to an increasing trend, without any intervention (e.g., changing the gamification design). This behavior, which resembles a U-shaped curve, has been called the familiarization effect: after some (familiarization) time, gamification’s effect is enhanced. Additionally, it should be noted that this effect appeared after the downtrend of the novelty effect. Providing empirical support for the familiarization effect has practical implications: despite the fact that the impact of gamification might decrease after a period of time, a recovery is likely to happen after users familiarize themselves with the game elements. Furthermore, empirically supporting the familiarization effect has theoretical implications. From our findings, the familiarization effect tackles the drawback of the novelty effect within a few weeks (two to six) after the end of the downtrend. To the best of our knowledge, however, such an analysis, based on a comparison to a control group, has not been performed in previous research. Hence, our findings call for future long-term, longitudinal gamification studies to better understand the familiarization effect.
Third, gamification likely suffers from the novelty effect, but benefits from the familiarization effect, which contributes to an overall positive impact. Our findings corroborate the gamification literature on three perspectives. They demonstrate that i) using gamification positively impacted learner behavior, ii) gamification’s impact suffered from the novelty effect, and iii) some time after the end of the novelty effect, gamification’s effect was enhanced via the familiarization effect. Therefore, from a practical point of view, despite a gamified intervention seemingly losing power or failing to work after some time, it is likely to gain power again in the future, although the effect might not return to its initial values. From a theoretical point of view, more research is needed to understand the magnitude of both the novelty and the familiarization effects, as well as to shed light on how long the uptrend of the familiarization effect lasts.
Fourth, studies lasting less than 12 weeks are likely to be insufficient for truly revealing gamification’s impact. Our findings indicate that the novelty effect starts to act only after four weeks. In addition, our findings show that the final uptrend of the familiarization effect might start only after 10 or 12 weeks. Therefore, studies lasting less than this period might yield unreliable findings, as they might fail to capture the changes that happen during learners’ experiences with a gamified intervention. Nevertheless, it might be that even longer studies are necessary, given that, for attempts, the increase continued until the end of the intervention (14 weeks). These findings have practical and theoretical implications as well, informing researchers on the duration of the experiments they will conduct, and practitioners on how long they should use gamification before assessing its impacts.
Lastly, our article has implications for how researchers can deploy similar studies in other contexts, learning activities, and countries, and with other student types, to further underpin our findings. As our evidence is limited to the context of STEM students completing programming assignments, our results should be interpreted within this context. Because the literature will benefit from understanding how the gamification effect varies in other settings, future research should extend our study design to that end. For instance, one could use gamification to motivate completing different learning activities (e.g., multiple-choice quizzes) or following other behaviors (e.g., class attendance). In exploring other behaviors and learning activities, researchers can develop similar studies outside the STEM context. Thus, we contribute to future research with directions on how to ground our findings in other samples and populations.
This study has several limitations that must be acknowledged when interpreting its findings. First, participants were not randomly assigned to experimental conditions, because the gamified version of the system used in this research was only available after 2016. To mitigate that, we compared participants in terms of demographic characteristics and, through preliminary analyses, showed that the roughly one-year difference among groups is unlikely to confound our results. Additionally, meta-analytic evidence suggests that the lack of random assignment does not affect the effect of gamification on behavioral learning outcomes Sailer and Homner (2019), further mitigating the impact of this limitation.
Second, the disciplines involved in this study did not have the same instructors throughout the three years of data collection. This limitation arises especially because UFAM has a policy that professors take turns teaching disciplines. Despite that, the pedagogical plans, learning materials, and the problems within the tasks participants completed were similar. In addition, meta-analytic evidence suggests that experimental conditions having different instructors does not affect gamification’s impact Bai et al. (2020), further mitigating this limitation’s impact.
Third, group sizes were highly unbalanced. From the perspective of sample sizes, this limitation is mitigated by the fact that the number of participants in each condition (\(\ge\) 138) is above the average sample size of gamification studies, as reported by secondary studies (see Koivisto and Hamari (2019) and Sailer and Homner (2019), for instance). From the perspective of conclusion validity, we addressed this limitation by answering our research question using robust statistical analyses that, among other advantages, handle groups with different sizes Wilcox (2017).
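To illustrate what such a robust comparison looks like (the article does not specify the exact procedure, so this is only a sketch), Yuen’s test on trimmed means, one of the robust methods Wilcox (2017) describes, compares two groups without assuming equal variances or equal sizes. The sample data below are hypothetical:

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's test comparing 20%-trimmed means of two independent
    groups; robust to outliers and unequal group sizes."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Effective sample sizes after trimming `trim` from each tail
    h1 = len(x) - 2 * int(trim * len(x))
    h2 = len(y) - 2 * int(trim * len(y))
    # Winsorized variances scaled by effective sample sizes
    d1 = (len(x) - 1) * stats.mstats.winsorize(x, (trim, trim)).var(ddof=1) / (h1 * (h1 - 1))
    d2 = (len(y) - 1) * stats.mstats.winsorize(y, (trim, trim)).var(ddof=1) / (h2 * (h2 - 1))
    diff = stats.trim_mean(x, trim) - stats.trim_mean(y, trim)
    t = diff / np.sqrt(d1 + d2)
    # Welch-style degrees of freedom
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p

# Hypothetical unbalanced groups: a gamified condition (n=150)
# versus a larger control condition (n=300)
rng = np.random.default_rng(42)
gamified = rng.normal(0.8, 1.0, 150)
control = rng.normal(0.0, 1.0, 300)
t, df, p = yuen_test(gamified, control)
```

Because the test trims each tail and uses winsorized variances, a handful of extreme submission counts cannot dominate the comparison, which is the property that makes such methods safer than a plain t-test for unbalanced behavioral data.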
Fourth, our discussions regarding the shape (change) of the gamification effect over time are based on visual inference. Although we selected robust statistical methods to guarantee conclusion validity, we did not perform, for instance, growth curve analysis. We chose this approach to enable comparing the gamification effect at each time-point, as well as determining the effect’s magnitude, rather than modeling the effect’s curve. Consequently, we provided evidence on each time-point’s difference and inferred the curve shape from visual analysis of that evidence. While that is a limitation, we opted for it instead of more conclusive approaches (such as regression analysis based on p-values) due to our exploratory analysis goal, as the literature recommends Vornhagen et al. (2020); Cairns (2019); Dragicevic (2016).
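The per-time-point procedure can be sketched as follows; the weekly data and the U-shaped pattern are hypothetical, and Cohen’s d merely stands in for whichever effect-size estimate a given analysis uses:

```python
import numpy as np

def cohens_d(a, b):
    """Pooled-SD standardized mean difference (Cohen's d);
    valid for groups of different sizes."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

# Hypothetical per-time-point data: the effect dips mid-course
# (novelty effect) and recovers later (familiarization effect)
rng = np.random.default_rng(7)
true_shift = [0.6, 0.7, 0.2, 0.1, 0.3, 0.5, 0.5]  # assumed U-shaped pattern
d_series = [cohens_d(rng.normal(s, 1.0, 150), rng.normal(0.0, 1.0, 300))
            for s in true_shift]
# Plotting d_series against time-points is what supports the visual
# inference of the curve shape discussed above.
```

The point of the sketch is that each time-point yields its own standardized difference between conditions, so the trajectory of those magnitudes, rather than a single fitted line, is what gets inspected.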
Fifth, regarding our study design, as this was a quasi-experimental study conducted over three academic years, its internal validity is likely affected. Indeed, gamification studies are often criticized due to a lack of methodological rigor Hamari et al. (2014); Dichev and Dicheva (2017). However, we made this choice in exchange for higher external validity, as the study was conducted using a real system within the context of real classrooms. Another aspect that could be discussed is the lack of a pre-test, meaning that pre-existing differences, rather than the experimental manipulation, could be the source of the observed differences, especially given the lack of random assignment. Nevertheless, it must be noted that we analyzed the gamification effect in terms of actual student behavior, which mitigates the lack of a pre-test, as such a test would have to be based on learners’ intentions (e.g., to access the system) rather than on behavior. An additional positive point is that this study features a control group.
Lastly, in terms of study context, this research concerned STEM students completing programming assignments in either a gamified or non-gamified setting. That is, the study is limited to a single type of learning activity. Additionally, we deployed only a single gamification design throughout the data collection period. In contrast, research suggests that one’s experience with gamification might change depending on the task Liu et al. (2017) and that different designs should be adopted depending on the learning activity Rodrigues et al. (2019); Hallifax et al. (2019). Therefore, we cannot rule out the possibility that the observed effect is due to the combination of the gamification design and the task type (i.e., programming assignments). However, given that the task type was invariant in our study, the observed effect can only be attributed to the gamification. Nevertheless, we opted for a single-factor study (i.e., only manipulating the design, gamified or not, within a single task) to maximize the findings’ internal validity, following literature recommendations Cairns (2019).
Based on our results and their implications, we call for further similar research (i.e., long-term, longitudinal gamification studies featuring a control group) in different contexts, based on other measures, to further ground our findings and determine whether they generalize to new contexts. Future research is also needed to confirm how long gamification acts positively before the novelty effect sets in (a four-week period according to this study). The literature also demands such longitudinal research to ground the familiarization effect. We found that the uptrend takes between six and 10 weeks to start. However, the lack of longitudinal studies and appropriate data analysis methods leads to little evidence on whether this period differs in other contexts. Additionally, future research should seek further evidence on how much of gamification’s initial effect the familiarization can recover, for how long its uptrend lasts, and, mainly, whether there is a point at which the gamification impact actually reaches a plateau. In doing so, researchers could explore mixed-methods approaches. Specifically, qualitative data would allow understanding what maintains user motivation and persistence over time, based on their subjective experiences. Then, researchers could triangulate quantitative and qualitative data to advance the overall understanding of gamification’s effect over time.
Furthermore, once researchers are aware that these changes in the gamification effect happen over time, interventions to mitigate them should be sought. Although the familiarization effect appears to naturally address the novelty effect’s negative impact, it does not seem to recover gamification’s initial benefit. Hence, the period in which the gamification impact decreases should be tackled to maximize its contributions. To that end, a promising research direction is tailored gamification, which can be accomplished through personalization or adaptation Klock et al. (2020); Rodrigues et al. (2020). In this approach, game elements are tailored based on user and/or contextual information, with the goal of enhancing gamification’s potential (e.g., Rodrigues et al. (2021); Lopez and Tucker (2021)). The rationale is that the same gamification design is unlikely to work for all users (i.e., one size does not fit all), because people have different preferences, perceptions, and experiences even under the same conditions Van Roy and Zaman (2018); Rodrigues et al. (2020). Accordingly, in future studies, tailoring can be triggered once a user logs into the system (i.e., personalization), providing game elements that better suit the user and potentially mitigating the magnitude of the novelty effect, or during system usage (i.e., adaptation), changing the gamification design when its effect starts to decrease Tondello (2019). As personalized gamification is a recent research field Klock et al. (2020); Tondello et al. (2017), long-term experimental studies assessing its effects compared to general gamification approaches would be beneficial.