Benefits of immersive collaborative learning in CAVE-based virtual reality

How to make the learning of complex subjects engaging, motivating, and effective? The use of immersive virtual reality offers exciting, yet largely unexplored solutions to this problem. Taking neuroanatomy as an example of a visually and spatially complex subject, the present study investigated whether academic learning using a state-of-the-art Cave Automatic Virtual Environment (CAVE) yielded higher learning gains compared to conventional textbooks. The present study leveraged a combination of CAVE benefits including collaborative learning, rich spatial information, embodied interaction and gamification. Results indicated significantly higher learning gains after collaborative learning in the CAVE with large effect sizes compared to a textbook condition. Furthermore, low spatial ability learners benefitted most from the strong spatial cues provided by immersive virtual reality, effectively raising their performance to that of high spatial ability learners. The present study serves as a concrete example of the effective design and implementation of virtual reality in CAVE settings, demonstrating learning gains and thus opening opportunities to more pervasive use of immersive technologies for education. In addition, the study illustrates how immersive learning may provide novel scaffolds to increase performance in those who need it most.


Introduction
Unlike traditional media such as textbooks, immersive virtual reality and related technologies allow educational content to be projected all around the learner. By being virtual, the content is no longer bound to the laws of physical reality and can thus be presented in novel ways, with the potential to benefit learners. For example, a visually and spatially complex subject such as neuroanatomy may be hard to comprehend when using textbooks, as readers have to make do with a restricted number of 2D images of anatomical structures (Jeffrey et al., 2002). Real anatomical models do not have this limitation, but consist of a finite number of parts and cannot be enlarged to inspect details of interest. Informed use of immersive technologies can remove these restrictions and may offer custom-tailored, highly interactive student-centered education, thus providing new avenues of support for learning. Yet, few empirical studies have investigated differences in learning benefits from immersive learning compared to more conventional study methods.
Non-immersive virtual learning environments, such as those displayed on regular 2D desktop monitors, have already been indicated to yield a range of learning benefits, and may facilitate engagement, spatial awareness, contextual and collaborative learning (Dalgarno & Lee, 2010), constructivist learning (Lee, Wong, & Fung, 2010;Mikropoulos & Natsis, 2011) and learning transfer (Dede, 2009).

Virtual reality headsets for learning
Virtual reality learning environments that are more immersive, such as those experienced via virtual reality headsets, likely enhance the benefits of non-immersive virtual learning environments. A prominent example of this is presence, or the feeling of existing inside a computer-generated environment (Heeter, 1992;Steuer, 1992). Presence may be limited when using regular 2D desktop monitors, but greatly increases when using virtual reality headsets which highly immerse the viewer in the simulated content. Presence was found to be associated with positive learning outcomes in three studies in a 10-year review of educational virtual environments (Mikropoulos & Natsis, 2011). Immersive virtual reality using virtual reality headsets additionally allows objects and environments to be experienced in stereoscopic 3D and at actual size, and has been found to support spatial understanding (Bacim et al., 2013;Ragan, Kopper, Schuchardt, & Bowman, 2013;Schuchardt & Bowman, 2007) and navigation, especially when coupled with physical movement (Ruddle, 2013). In addition, virtual reality headsets enable users to embody a virtual self or avatar, allowing them to engage in collaborative learning within a shared social space, deemed to be an important part of the social constructivist learning process of Vygotsky (Dalgarno, 2002;Huang, Rauch, & Liaw, 2010). While users interact with avatars on 2D desktop monitors indirectly using keyboard and mouse inputs, users can interact with avatars on virtual reality headsets directly using natural head and hand motions, increasing avatar fidelity. These and other types of intuitive interaction additionally support the learning of procedural tasks (Ragan, Sowndararajan, Kopper, & Bowman, 2010).

Prior virtual reality headset studies
With the technological advancement and improved availability of virtual reality solutions, recent years have seen an increase in studies on the benefits of virtual reality headsets for educational purposes compared to those of more conventional learning methods. Investigated areas are varied and cover topics including academic learning (Makransky, Terkildsen, & Mayer, 2017; Maresky et al., 2019; Parong & Mayer, 2018; Webster, 2016), motion learning (Chen et al., 2019; LaFortune & Macuga, 2018) and skills training (Buttussi & Chittaro, 2018; Li, Liang, Quigley, Zhao, & Yu, 2017; Sankaranarayanan et al., 2018; Yoganathan, Finch, Parkin, & Pollard, 2018). Results of these and other studies are mixed, with some reporting learning benefits of the use of virtual reality headsets over conventional methods (e.g., Maresky et al., 2019; Yoganathan et al., 2018), while others did not (e.g., Makransky et al., 2017; Parong & Mayer, 2018). While these studies employed multiple learning benefits of immersive virtual reality, they did not examine multi-user collaboration for educational purposes. A rare exception is Šašinka et al. (2018), in which participant pairs wearing virtual reality headsets engaged in collaborative learning while performing geospatial tasks. Because the collected data were interpreted qualitatively rather than quantitatively, however, no definitive conclusions could be drawn about the effect of virtual collaboration on learning. To our knowledge, virtual reality headset studies on collaborative learning in larger groups are near non-existent. As such, the potential of virtual reality headsets for collaborative learning remains largely unexplored. Perhaps the limited number of studies on collaborative learning using virtual reality headsets is no surprise, given the limitations of these devices. Even though they seem to provide a collaborative experience, they are not able to detect and convey the facial expressions required for natural face-to-face interaction. The absence of facial gestures and other social cues when using current virtual reality headsets was reported to be problematic, depending on the task type, in a study on collaborative noneducational games (Greenwald, Wang, Funk, & Maes, 2017). These limitations of virtual reality headsets for collaborative learning are compensated for in Cave Automatic Virtual Environments (CAVEs).
de Back et al. International Journal of Educational Technology in Higher Education (2020) 17:51

CAVEs for learning
A CAVE is a room which immerses one or multiple persons into a virtual environment projected onto its walls (Cruz-Neira, Sandin, DeFanti, Kenyon, & Hart, 1992). The virtual environment is typically viewed in stereoscopic 3D using small see-through glasses similar to those worn in a movie theater. These glasses, along with a 3D mouse, are tracked in space. This allows users to explore and interact with the virtual environment using the full range of natural head and hand motions.
CAVEs possess a number of features which make them uniquely positioned for immersive collaborative learning. Due to the use of see-through glasses, there is no need for an avatar to represent the user, as CAVEs allow for a mix of the virtual and the real by enabling users to simultaneously view their own physical body, the bodies of others, and the virtual environment. This is clearly different from contemporary virtual reality headset setups that isolate the viewer from the physical surroundings and typically only track head and hand positions, with the consequence of the remaining body information being either lost or needing to be inferred. In contrast, when using CAVEs, groups of learners can be jointly immersed in educational virtual environments while preserving body language, including facial expressions. This allows for natural group interaction, thus creating a strong sense of co-presence as the learners can see and interact with each other as they normally would.

Model elaboration
Previous research has argued that co-presence may enhance collaborative learning, as well as mediate a number of other learning benefits, including spatial, experiential, and contextual learning (Dalgarno & Lee, 2010). Additionally, learners can be immersed into the virtual environment across the full human field of view when using CAVEs, whereas this is greatly reduced in the "goggle vision" of contemporary virtual reality headsets. Importantly, increased field of view has been reported to aid memory (Lin, Duh, Parker, Abi-Rached, & Furness, 2002; Ragan et al., 2010). Dalgarno and Lee (2010) presented a comprehensive model encapsulating the majority of the aforementioned learning benefits, drawing from a range of examples of virtual learning environments, while referring to theories including cognitive and social constructivism, flow theory, context-dependent memory and situated learning. The model details how pedagogical benefits may indirectly arise from the unique characteristics of virtual learning environments by affording learning tasks. The characteristics of virtual learning environments are grouped into representational fidelity and learner interaction, with representational fidelity comprising such characteristics as the realism of the displayed environment, the quality of the representation of the user within it, and the smoothness of view changes. In turn, learner interaction mainly consists of the level of embodiment of the user when looking around, navigating, and manipulating objects, as well as when engaging in verbal and non-verbal communication. While the model of Dalgarno and Lee (2010) is comprehensive, it does not contain examples of the technical elements of which representational fidelity and learner interaction consist. Adding these elements to the model would, however, facilitate a comparison of the differential ways in which 2D monitors, virtual reality headsets and CAVEs may afford learning tasks.
Drawing on the literature discussed previously, in Table 1 we elaborate the model of Dalgarno and Lee (2010) to contain examples of these differentiating elements including field of view, stereoscopic 3D, facial expressions and gestures. In addition, we indicate how these elements differ between 2D monitors, virtual reality headsets and CAVEs. In accordance with Dalgarno and Lee (2010), in the elaborated model we expressly do not map individual differentiating elements of representational fidelity and learner interaction directly to learning benefits. We argue that it depends both on the configuration of the employed elements, as well as on the task in question which learning benefits may potentially occur. From the model it can be understood how, through afforded tasks, spatial awareness may for instance benefit from the enhanced spatial cues of immersive virtual reality, and how engagement and collaborative learning may benefit from increased embodiment and the higher level of expressiveness it enables. The elaborated model thus provides further support for the myriad ways in which immersive technologies may increase the representational fidelity and learner interaction of non-immersive methods, affording learning tasks and ultimately strengthening learning benefits. Yet while the use of CAVEs and virtual reality headsets may potentially increase learning benefits, these platforms are not without possible downsides compared to 2D desktop monitors, exemplified by increased cost and adverse physiological effects such as dizziness. These factors are thus to be considered together with the potential gain in learning benefits of immersive technologies to arrive at an optimal decision for the platform to use.

Prior CAVE studies
As with the number of studies on collaborative learning in virtual reality headsets, very few studies have examined the learning effects of CAVEs compared to other learning methods. O'Brien and Levy (2008) conducted a study in college students in which knowledge of grammar was applied using a computer game in 2D desktop monitor, regular projector screen and multi-person CAVE conditions. While the CAVE condition was rated to be the most expansive, a posttest did not lead to clear conclusions. Limniou, Roberts, and Papadopoulos (2008) conducted a similar study, in which a CAVE condition about chemistry was performed after all students learned about the same topic in an equivalent 2D desktop monitor condition. In both conditions, teacher instruction was used and the students were given the opportunity to ask questions. Knowledge was assessed using the same test items after each condition. Learning outcomes were reported to be higher after the CAVE condition, but it is difficult to determine whether this was due to the benefits of the CAVE, or the additional teacher instruction received by that time. Moreover, one of the major benefits of virtual reality in headsets and CAVEs, embodied learner interaction, was left unused, as it was the instructor and not the students who controlled the game at all times.
The benefit of interaction with a CAVE was also investigated for conceptual learning in children, focusing on the topic of math fractions (Roussou & Slater, 2017). Two CAVE conditions were conducted (active interaction, passive observation) along with an equivalent 'reality' condition. In all conditions, the children directly interacted with the learning environment. While the quantitative results showed no differences between the CAVE conditions, learning gains were higher for the CAVE than the 'reality' condition. Regarding CAVE benefits, children participated individually, without co-presence with other learners and the potential benefits for collaborative learning it may mediate. Alhalabi (2016) came to similar conclusions when learning gains were measured across CAVE, virtual reality headsets with and without tracking and non-immersive learning conditions. Knowledge scores were highest in the tracked virtual reality headset condition, followed by the CAVE condition, with lowest scores for the non-immersive condition. In a study on martial art motion learning in CAVE, virtual reality headsets and 2D desktop monitor conditions, learning outcomes were highest for the CAVE condition (Chen et al., 2019). The study employed collaborative learning, yet this was restricted to the display of pre-recorded avatar motions of other students, making interactivity between learners impossible. In contrast, Leder, Horlitz, Puschmann, Wittstock, and Schütz (2019) in two studies on safety training did not find significantly higher learning outcomes of a non-interactive immersive CAVE condition compared to a PowerPoint condition, neither immediately after the experimental manipulation, nor after a 6-month interval. In the studies of Alhalabi (2016), Roussou and Slater (2017) and Leder et al. (2019), students were all tested individually. Not only does this forego the unique collaborative aspect of CAVEs, it also causes the use of CAVEs in class settings to be costly.
In sum, in the few CAVE studies published, benefits of CAVE conditions over more conventional learning methods were obtained for those studies reporting such differences. Yet, not all studies reported quantitative results (O'Brien & Levy, 2008), compared CAVE and conventional learning conditions (Limniou et al., 2008), utilized the unique interactive aspects of virtual reality in 3D settings (Leder et al., 2019;Limniou et al., 2008) or took advantage of the collaborative aspect of the CAVE system (Alhalabi, 2016;Leder et al., 2019;Roussou & Slater, 2017). Additionally, compared to CAVEs, studies employing virtual reality headsets are higher in number, and results of these studies on the benefits for learning have been mixed. This suggests that benefits of virtual environments are not automatic, and require informed design choices to be obtained successfully. This is reflected in Dalgarno and Lee (2010), who purport that when using 3D virtual learning environments, their unique benefits are to be employed for unique learning gains to occur in comparison to using 2D environments.

The importance of individual differences
Even if we were to reach the tentative conclusion that virtual reality in 3D settings (headsets and CAVEs) yields higher learning gains than less immersive virtual reality in 2D settings, and that such 2D settings yield higher learning gains than traditional non-immersive learning methods, we would miss out on an important aspect of learning: that of individual differences. After all, individual differences in cognitive ability may affect performance, particularly in the area of academic learning (Chen, Whiteman, Gully, & Kilcullen, 2000). This raises the issue whether learners benefit equally from virtual reality settings such as CAVEs, and whether specific cognitive ability differences modulate learning benefits. The question, then, is what aspects of individual differences one should focus on. Established major subdomains of cognitive ability are working memory, processing speed and spatial ability (Rohde & Thompson, 2007). Of these subdomains, spatial ability seems to be particularly relevant for learning using CAVEs and other immersive display technologies. This can be understood as such technologies offer rich spatial information, especially when compared to more traditional media (Castronovo, Nikolic, Liu, & Messner, 2013; Ragan et al., 2010).

Spatial ability
Spatial ability encompasses a set of cognitive functions, and has been defined as the ability to accurately perceive a scene, to reconstruct it in the mind's eye, and to be able to alter and reconfigure it dynamically (Carroll, 1993;Höffler, 2010). Spatial ability has been shown to be an important individual difference affecting academic learning (Shea, Lubinski, & Benbow, 2001;Wai, Lubinski, & Benbow, 2009). However, it remains unclear whether low or high spatial ability learners benefit more from animations generated by multimedia (Höffler & Leutner, 2011). It may be the case that for low spatial ability learners, constructing a mental representation of a subject presented using still images requires substantial cognitive effort, and that this process may be facilitated when animations are provided. High spatial ability learners, on the other hand, are readily able to form mental representations without additional visual support structures, and for them no substantial benefits are expected. These predictions are part of the ability-as-compensator hypothesis (Huk, 2006;Mayer & Sims, 1994).
Evidence in favor of this hypothesis comes from a meta-analysis of 27 experiments, in which Höffler (2010) showed low spatial ability learners benefitted more from animations versus static pictures, or 3D versus 2D illustrations compared to high spatial ability learners. However, more recent studies showed mixed results, with some reporting benefits of additional spatial information for low spatial ability (Barrett & Hegarty, 2016;Kühl, Stebner, Navratil, Fehringer, & Münzer, 2018;Lee & Wong, 2014;Münzer, 2015;Sanchez & Wiley, 2014), and others reporting benefits for high spatial ability groups (Vindenes, de Gortari, & Wasson, 2018;Wu, Lin, & Hsu, 2013). When virtual learning environments were used in these studies, these were predominantly of the non-immersive kind, with a few notable exceptions (e.g. Barrett & Hegarty, 2016;Vindenes et al., 2018). For research focusing on the learning benefits of CAVEs, individual differences due to spatial visualizations are expected to be critical because of users spatially moving around in the virtual world.

Research objectives of the present study
The present study addressed the current gap in knowledge regarding the educational value of CAVEs compared to more conventional learning methods when a combination of key CAVE benefits is utilized. To this end, we aimed to investigate whether the use of a CAVE yielded increased learning gains as compared to traditional textbook learning, leveraging the unique interactive aspects of virtual reality in 3D settings and the collaborative aspect of a CAVE system. Given the importance of employing unique benefits of virtual learning environments to obtain unique learning gains (Dalgarno & Lee, 2010), the focus here is not on strict equivalence of the two conditions but rather on the potential for learning when using CAVEs, and on comparing this to traditional textbook learning. Textbooks have commonly been used as a comparison condition in measuring learning gains in educational technologies (Allcoat & von Mühlenen, 2018; Barab et al., 2009; Graesser et al., 2004). Moreover, we took into account the individual differences in spatial ability and examined to what extent they would modulate any learning gains in CAVE settings. Fig. 1.

Materials
We developed a virtual reality game for CAVEs on the subject of neuroanatomy involving the understanding of brain structures, their interconnections and broader spatial relationships. The subject of neuroanatomy was selected as it is a visually and spatially complex subject which may be hard to grasp when using conventional textbooks which are inherently 2D and do not allow learners to dynamically explore anatomical structures from multiple viewpoints. Virtual learning environments displayed by CAVEs can be designed to have neither of these restrictions and support spatial understanding using strong spatial cues, in accordance with the first potential learning benefit of the model by Dalgarno and Lee (2010) of spatial knowledge representation. The virtual reality game we developed consisted of an interactive virtual learning environment which employed all four walls of a CAVE to provide an immersive and engaging experience to groups of learners. Educational content regarding neuroanatomy was distributed by type along the CAVE walls. Most prominently, the virtual learning environment contained a large-size 3D model of a whole human brain which served to indicate the precise spatial location of individual brain areas, which were shown on a separate wall. The game incorporated (social) constructivist elements including free exploration, knowledge construction and collaboration. To further increase the chances of facilitating the learning process, multiple additional task-relevant CAVE benefits were utilized. The game gives full autonomy to the students without requiring the presence of an experimenter. It employs the use of head tracking, allowing participants to observe brain areas and their spatial relations from a broad range of angles. The game additionally leverages co-presence by allowing groups of users to play together, and promotes active participation by requiring players to take turns in interacting directly with the game. 
Interaction is embodied and intuitive, allows exploration, and requires the players to physically draw connections between related parts of information, distributed across the virtual environment shown in the CAVE in order to foster collaborative learning. Prominent gamification elements such as scores, stages and audiovisual feedback (Hamari, Koivisto, & Sarsa, 2014) were incorporated into the game for the purpose of increasing motivation and enhancing learning. For the CAVE condition, names and function descriptions of brain areas were obtained from a textbook chapter on neuroanatomy (Friedenberg & Silverman, 2006). The textbook information was incorporated into the game without alteration. Illustrations of brain areas contained in the chapter were substituted with equivalent 3D representations, taken from the human anatomy database BodyParts3D/Anatomography (The Database Center for Life Science, CC Attribution-Share Alike 2.1 Japan). The content of the CAVE condition was segmented into five stages, which respectively dealt with brain sectioning, anatomical directions, cortical areas, human memory systems, and lastly areas involved in attentional processes.
For the textbook condition, we used the newer 2015 edition of the textbook chapter by Friedenberg and Silverman (2015) instead of the 2006 edition used for the development of the game. In order to retain the content of the game, the discrepancy of information present in the game but absent from the 2015 edition was compensated for by adding one page of the 2006 edition to the textbook condition. Screenshots of the neuroanatomy game are depicted in Fig. 2. The game was presented in stereoscopic 3D with audio produced by wall-mounted speakers consisting of sound effects and prerecorded voice instructions.

Tests
Two question tests were created using content obtained from the textbook, which was consistent with the content of the virtual reality game. Each of the question tests contained 20 four-option multiple-choice questions. Test difficulty was balanced between the two tests by ensuring they contained an equal number of questions of the same type (brain area name, function, location). The two tests were used as pretests and posttests to assess learning gains, counterbalanced in the experiment so that differences between pre- and post-learning scores could not be attributed to the questions themselves. Answer keys were used to determine the number of correct answers for each of the two tests. Spatial ability was measured using a written self-assessment test containing 26 questions of increasing difficulty. The questions were obtained from subsection "Shapes and Blocks" from a chapter on spatial ability of a psychometric test book (Barrett, 2008). The questions are a revised version of those from the 2003 version of the book (Barrett & Williams, 2003), which have been shown to be effective in determining spatial ability in a 2D desktop monitor study on anatomy (Lee & Wong, 2014).
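The scoring and counterbalancing described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the form labels "A" and "B", and the alternating assignment rule are assumptions made for illustration, since the text does not state how counterbalancing was implemented. Counting unanswered questions as errors follows the instruction given to participants in the Procedure.

```python
def score_test(responses, answer_key):
    """Count correct answers against an answer key.

    `responses` holds one entry per question; None marks an unanswered
    question, which is counted as an error.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(
        1
        for given, correct in zip(responses, answer_key)
        if given is not None and given == correct
    )


def assign_forms(participant_index):
    """Return (pretest_form, posttest_form) for a participant.

    Hypothetical rule: alternate which of the two forms is used as the
    pretest, so that each form serves equally often as pretest and posttest.
    """
    return ("A", "B") if participant_index % 2 == 0 else ("B", "A")


# Illustrative four-question key; the actual tests had 20 questions.
key = ["a", "b", "c", "d"]
print(score_test(["a", None, "c", "a"], key))
print(assign_forms(0), assign_forms(1))
```

Alternating by participant index is only one way to balance the forms; randomly assigning the order within each condition would serve the same purpose.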

Procedure
We used a balanced, between-subjects design. Participants were randomly assigned to one of two experimental conditions on the topic of neuroanatomy: immersive learning (CAVE) or traditional textbook learning (textbook), noting that for some sessions participants registered for the same session were assigned to the CAVE condition due to allocation constraints. The experimental procedure of both conditions is depicted in Fig. 3.
In the CAVE condition, between two and four persons participated simultaneously to enable efficient use of the virtual learning environment displayed by the CAVE. Upon entering the lab, participants received information about the study, signed an informed consent form and completed a spatial ability test. In order to assess preexisting knowledge of neuroanatomy, participants were presented with a written pretest, and were instructed to make an educated guess if they did not know the answer to a question, as unanswered questions would be treated as errors. Next, participants received see-through glasses and followed the experimenter into the CAVE. Inside, the experimenter gave a scripted explanation of the game, and allowed each participant to briefly familiarize him/herself with its use in a practice stage. Then, the experimenter started the first actual stage of the game and exited the room. In the game, participants collaboratively learned about brain region shape, position, name and function. These elements were arranged by type and were shown on different walls of the virtual environment of the CAVE (i.e. one type per wall). Participants were instructed to discuss among themselves which of the displayed elements belonged together and to demonstrate their knowledge by using a wand (3D mouse) to draw large-size lines connecting the different elements. Collaboration and active participation were stimulated by having participants take turns at set intervals in directly interacting with the virtual environment while engaging in discussion with the other participant(s) about the correct answers. By pulling a virtual lever, participants received audiovisual feedback about right/wrong connections which had been made, followed by an opportunity to jointly reflect on the information provided. Examples of key game actions are depicted in Fig. 4.
The activities of the participants while playing the game inside the CAVE were observed by the experimenter via a monitor in the adjacent lab room. Time allotted per stage was not fixed, which could otherwise have reduced learning effectiveness in groups needing more time to learn all the provided educational content. The average total playtime was 40 min. After finishing the game, the participants left the CAVE and completed a written posttest, concluding the session.
In the textbook condition between one and five persons participated in any one session. Instead of experiencing the collaborative neuroanatomy game, participants individually studied the textbook chapter from which the educational content of the CAVE condition was obtained. Forty minutes were allotted to study the chapter, equal to the average total playing time of the CAVE condition. The remainder of the procedure was identical to that of the CAVE condition.

Data analysis
All statistical tests were conducted using analysis of variance (ANOVA), and were performed using SPSS 25 (IBM Corp., Armonk, NY). Simple gain scores were obtained by calculating the raw difference between the pretest and posttest scores. In addition, normalized gain scores were calculated which adjust for the pretest score, thus accounting for possible differences in prior knowledge (Hake, 1998, 2002). Normalized gain scores were calculated as follows: (Posttest - Pretest) / (1 - Pretest). A one-way between-subject ANOVA was used to assess the main effect of condition (textbook, CAVE) on learning gains. To assess whether low and high spatial ability learners benefit differentially from immersive virtual learning environments, low and high spatial ability groups were determined using a median split of the scores on the spatial ability test, consistent with Lee and Wong (2014). A 2 × 2 ANOVA with variables condition (textbook, CAVE) and spatial ability (low, high) as between-subject factors was used to assess the effect of condition on learning gains for each of the four split groups. Normality and homogeneity of variance were respectively tested with the Kolmogorov-Smirnov test and Levene's test. Effect size is reported using partial eta-squared (ηp²), with ηp² values of .01, .06, and .14 respectively characterized as small, medium and large effects (Cohen, 1988). Statistical significance is reported two-tailed (α = .05), with all pairwise comparisons being Bonferroni-corrected.
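The gain scores, the median split and the effect size measure described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the SPSS analysis the authors ran. The function names are hypothetical, and the rule that scores equal to the median fall into the low group is an assumption, because the text does not say how ties were handled. The normalized gain is computed from raw counts; dividing numerator and denominator by the number of items gives the proportion form (Posttest - Pretest) / (1 - Pretest). Partial eta-squared is recovered from an F statistic using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error).

```python
from statistics import median


def simple_gain(pre_correct, post_correct):
    """Raw difference between posttest and pretest scores."""
    return post_correct - pre_correct


def normalized_gain(pre_correct, post_correct, n_items):
    """Normalized gain (Hake): achieved gain divided by the maximum possible gain."""
    if pre_correct >= n_items:
        raise ValueError("normalized gain is undefined for a perfect pretest score")
    return (post_correct - pre_correct) / (n_items - pre_correct)


def median_split(scores):
    """Label each score 'low' or 'high' relative to the sample median.

    Assumption: scores equal to the median are labeled 'low'.
    """
    cut = median(scores)
    return ["high" if score > cut else "low" for score in scores]


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical participant: 8 of 20 correct before, 14 of 20 after.
print(simple_gain(8, 14))            # 6
print(normalized_gain(8, 14, 20))    # 0.5
print(median_split([10, 15, 18, 22]))
print(round(partial_eta_squared(1.51, 1, 38), 3))  # 0.038
```

As a check on the last function, entering the reported F(1, 38) = 1.51 for the main effect of spatial ability on simple gain gives .038, which matches the effect size reported for that test.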
The CAVE condition thus resulted in higher learning gains compared to traditional textbook learning, with large effect sizes. This is consistent with the findings of Roussou and Slater (2017) and Alhalabi (2016) who compared learning outcomes after CAVE and more traditional study methods.

Spatial ability
We next examined whether learning gains were modulated by spatial ability. The reliability of the spatial ability test was high, Cronbach's α = .84. There was no main effect of spatial ability on learning gain, for either simple gain, F(1, 38) = 1.51, p = .227, ηp² = .038, 95% CI [.00, .20], or normalized gain, F(1, 38) = 1.53, p = .224, ηp² = .039 [...]. A median split on the spatial ability test divided participants in the two conditions into low and high spatial ability groups (textbook: low spatial ability: n = 14, high spatial ability: n = 6; CAVE: low spatial ability: n = 6, high spatial ability: n = 14). This ensured participants were assigned to the same low or high spatial ability group for both the analysis of the main effect of spatial ability on learning gain and the pairwise comparisons of the effect of condition on learning gain for the low and high spatial ability groups. Pairwise comparisons indicated that the learning gain of the low spatial ability group was significantly higher in the CAVE condition than in the textbook condition with ηp² indicating a large effect, both for simple gain, [...] .119. A plot of the simple and normalized learning gains of the spatial ability groups in the CAVE and textbook conditions is presented in Fig. 5. As predicted by the ability-as-compensator hypothesis, the analysis of the gain difference between the spatial ability groups in the conditions revealed that the high group had a significantly higher normalized learning gain in the textbook condition, whereas the simple gain difference was only approaching significance, simple gain: [...] Thus, the spatial ability disadvantage of the low spatial ability group in the textbook condition was effectively mitigated in the CAVE condition, raising the learning gain to the level of the high group.
This is consistent with the ability-as-compensator hypothesis, indicating that low spatial ability learners benefitted more from the affordances of the immersive CAVE system as compared to the high spatial ability learners.
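As a consistency check on the effect sizes reported for the main effect of spatial ability, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηₚ² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal Python sketch of that check; the function name is hypothetical and is not taken from the original analysis, whose software and code are not described here.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Hypothetical helper for illustration; not part of the original analysis.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of spatial ability as reported in the text, F(1, 38).
simple_gain_effect = partial_eta_squared(1.51, df_effect=1, df_error=38)
normalized_gain_effect = partial_eta_squared(1.53, df_effect=1, df_error=38)

print(f"simple gain:     {simple_gain_effect:.3f}")      # 0.038
print(f"normalized gain: {normalized_gain_effect:.3f}")  # 0.039
```

Both values reproduce the reported ηₚ² = .038 and ηₚ² = .039, which suggests that the F statistics and effect sizes for the main effect are internally consistent.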

General discussion
Despite the benefits of CAVEs, few studies have investigated their use for education. Studies comparing the benefit of CAVEs for academic learning with that of traditional learning methods are especially lacking. The present study aimed to fill this knowledge gap by incorporating key CAVE benefits, such as embodied, natural multi-person interaction, into a serious game in order to directly compare learning gains after exposure to CAVE and traditional textbook study conditions.
Results indicated that learning gains were higher after the immersive CAVE experience than after textbook study, with large effect sizes. This is consistent with the magnitude of effects in a study leveraging collaborative CAVE learning by Limniou et al. (2008), in which higher learning outcomes on the topic of chemistry were reported after a CAVE condition than after a 2D desktop monitor condition. Similarly, the findings are in line with Alhalabi (2016), who reported improved learning performance on engineering-related topics after an immersive CAVE condition compared to a non-immersive control condition. Additionally, our findings are consistent with Chen et al. (2019), who observed higher performance for motion learning after an immersive CAVE condition compared to virtual reality headset and 2D desktop monitor conditions. The results of the present study contrast with those of Leder et al. (2019), who focused on safety training and did not obtain significantly higher learning gains after a CAVE condition compared to a PowerPoint condition. Leder et al. (2019) did not utilize collaboration and interactivity, both of which are important components according to the model of Dalgarno and Lee (2010) on the learning benefits of 3D virtual learning environments. It is therefore possible that their result was (in part) due to forgoing these CAVE benefits.
Regarding individual differences in spatial ability, a modulating effect on performance was observed, indicating that those with low spatial ability benefitted most from immersive learning, as only after the CAVE condition did their performance match that of the high spatial ability learners. This suggests that the immersive properties of the condition are an important factor contributing to the learning process. The finding of higher learning gains in low spatial ability learners is in line with the meta-analysis of 27 experiments by Höffler (2010), and provides evidence in favor of the ability-as-compensator hypothesis, which posits that visual support structures are of special benefit to low spatial ability learners, for whom creating a mental representation of a subject without such scaffolds may require more concerted effort. The finding is additionally in line with the results of recent studies showing that low spatial ability learners benefit from more extensive spatial information, provided by stereoscopic displays viewed with see-through glasses in Barrett and Hegarty (2016), 2D yet interactive desktop virtual reality in Lee and Wong (2014), and 2D animations of educational material in Kühl et al. (2018), Münzer (2015) and Sanchez and Wiley (2014), as compared to less spatially rich conditions predominantly restricted to the use of static imagery. The present study demonstrates the potential of incorporating multiple task-relevant benefits of 3D virtual learning environments, which is important for increasing the likelihood of obtaining higher learning gains compared to traditional paradigms (Dalgarno & Lee, 2010). The combined use of immersive, active and highly collaborative learning is particularly under-researched, and the present study may thus serve as a concrete example of its effective implementation for the design of future CAVE and virtual reality headset studies alike.
Additionally, the findings stress the importance of controlling for task-relevant individual differences, as was done here for spatial ability, in order to gain more insight into the characteristics of those who stand to gain the highest potential benefit of immersive educational experiences.
To investigate the potential for learning with CAVEs, the use of a virtual learning environment leveraging not one but multiple CAVE benefits was compared to traditional textbook learning. Besides stereoscopic 3D and interactivity, the CAVE condition featured collaboration to enable efficient use of CAVEs and incorporated gamification to foster learning, while both elements were absent from the textbook condition. The consequence of these differences is that, at present, it is difficult to assess which (combination of) benefits contributed most to the observed differential learning gains of the CAVE condition. The results of the present study are therefore a starting point demonstrating the potential for learning with CAVEs. Now that this potential has been established, an interesting avenue for future research is to experimentally manipulate individual benefits of CAVEs to gain further understanding of the optimal use of these systems for educational purposes.
Recognizing the importance of individual differences and their potential modulating effect on learning performance, the present study accounted for individual differences in spatial ability. No data were collected regarding the sense of presence in the virtual environment or the occurrence of side effects such as dizziness and nausea, which could potentially have affected learning gains as well. Taking these factors into account may help to explain why learning gains in immersive environments may be higher for some learners than for others, and may inform the design of virtual learning environments that minimize the occurrence of unwanted side effects. Moreover, given the focus on short-term learning gains in the present study, it would be interesting for future studies to investigate whether knowledge acquired in immersive settings is retained for longer periods of time than knowledge obtained through conventional learning methods.

Conclusion and future work
The present study contributes to the literature on learning in virtual environments by demonstrating higher learning gains after immersive learning compared to textbook study, obtained through informed use of a combination of CAVE benefits. The higher learning gains after immersive, collaborative and active learning demonstrated in the present study are in agreement with social constructivism and experiential learning theory. Additionally, in showing the feasibility of interaction and collaborative learning using a CAVE-based system, the present study is informative for the design of virtual learning environments in CAVEs. New evidence was presented for a modulating role of spatial ability in performance, indicating the importance of accounting for this factor in immersive settings. Besides adding to the literature on the potential effectiveness of immersive learning when leveraging multiple task-relevant benefits, the obtained results are of interest to knowledge institutions seeking new ways to motivate their students, while offering an exciting look into the future of education.
The present study opens up new avenues for research on the use of immersive technologies for education. Increases in group size are generally posited to be linked to decreases in performance (Mullen, 1994; Petty, Harkins, Williams, & Latane, 1977), yet the effect of group size on collaborative immersive learning has rarely been investigated. Studies on immersive learning that manipulate group size would therefore potentially provide a significant contribution to this area. Even if learning in immersive settings were found to be reduced in large groups, this would still be of interest if the gains exceeded those of non-immersive settings. To gain a deeper understanding of the conditions under which immersive education is most effective, an additional worthwhile endeavor would be to examine the effect of immersive learning at different stages of a curriculum, so as to determine the optimal timing of exposure to immersive educational environments. Augmented reality headsets, wearable see-through devices that present virtual objects in the physical surroundings of the wearer, can potentially be networked to enable collaborative learning as is possible in CAVEs, while having the benefit of not requiring a dedicated physical space. As current limitations such as a restricted field of view (Blattgerste, Strenge, Renner, Pfeiffer, & Essig, 2017) are addressed, future studies should investigate the potential of augmented reality headsets to support collaborative immersive learning. The results of these studies are likely to be of interest to knowledge institutions considering the use of CAVEs and other immersive technologies for educational purposes, for which engaging large numbers of students is a pressing issue, and which stand to benefit from empirical findings regarding the optimal circumstances of their use.