Multimodal Math Data Corpus (see below for how to access the dataset)
Advances in learning analytics are expected to contribute new empirical findings, theories, methods, and metrics for understanding how students learn. They could also help improve pedagogical support for students’ learning through the assessment of new digital tools, teaching strategies, and curricula. The most recent direction within this area is multimodal learning analytics, which emphasizes the analysis of natural, rich modalities of communication across a variety of learning contexts.
These modalities include students’ speech, writing, and nonverbal interaction (e.g., gestures, facial expressions, gaze, sentiment). A primary objective of multimodal learning analytics is to uncover new learning theories while also making many traditional qualitative research methodologies more scalable and systematic.
The Second International Workshop on Multimodal Learning Analytics will bring together researchers in multimodal interaction and systems, cognitive and learning sciences, educational technologies, and related areas to advance research on multimodal learning analytics. Following the First International Workshop on Multimodal Learning Analytics in Santa Monica in 2012, this second workshop will be a data-driven “Grand Challenge” event held at ICMI 2013 in Sydney, Australia, on December 9, 2013.
Workshop participants will be considered in three categories: (1) participants in the grand challenge dataset competition, who report their prediction results; (2) authors of an independent research paper on a topic related to multimodal learning analytics; or (3) observers and discussants, who submit a related position paper. Space will be limited, with the first two categories receiving priority.
ICMI Grand Challenge Workshop on Multimodal Learning Analytics:
Multimodal learning analytics, learning analytics, and educational data mining are emerging disciplines concerned with developing techniques to explore the unique data generated in learning settings more deeply, and with using the results of these analyses to understand how students learn. Among other things, this includes how students communicate, collaborate, and use digital and non-digital tools during learning activities, and how these interactions affect the development of new skills and the construction of knowledge. Advances in learning analytics are expected to contribute new empirical findings, theories, methods, and metrics for understanding how students learn. They can also help improve pedagogical support for students’ learning through new digital tools, teaching strategies, and curricula. The most recent direction within this area is multimodal learning analytics, which emphasizes the analysis of natural, rich modalities of communication during situated interpersonal and computer-mediated learning activities. These include students’ speech, writing, and nonverbal interaction (e.g., gestures, facial expressions, gaze, sentiment). The First International Workshop on Multimodal Learning Analytics (http://tltl.stanford.edu/mla2012/) represented the first intellectual gathering of multidisciplinary scientists interested in this new topic.
Grand Challenge Workshop and Participation Levels
The Second International Workshop on Multimodal Learning Analytics will bring together researchers in multimodal interaction and systems, cognitive and learning sciences, educational technologies, and related areas to advance research on multimodal learning analytics. Following the First International Workshop on Multimodal Learning Analytics in Santa Monica in 2012, this second workshop will be organized as a data-driven “Grand Challenge” event, to be held at ICMI 2013 in Sydney, Australia, on December 9, 2013. There will be three levels of workshop participation, for attendees who wish to:
1. Participate in the grand challenge dataset competition and report results (using either your own dataset or the Math Data Corpus described below, which is available to participants)
2. Submit an independent research paper on multimodal learning analytics (MMLA), covering topics such as learning-oriented behaviors related to the development of domain expertise, prediction techniques, data resources, and others
3. Observe and discuss new topics and challenges in MMLA with other attendees, in which case a position paper should be submitted
Those wishing to participate in the competition using the Math Data Corpus will be asked to contact the workshop organizers and sign a “collaborator agreement” for IRB purposes in order to access the dataset (see the data corpus section below). The dataset used for the competition is well structured for investigating different aspects of multimodal learning analytics. It involves high-school students collaborating while solving mathematics problems.
Tentative Workshop Timeline:
May 15, 2013: Distribution of workshop announcement to email lists
July 1, 2013: MMLA database available for grand challenge participants
August 30, 2013 (extended from August 21, 2013): Deadline for submitting grand challenge papers with results, and other workshop and position papers
September 15, 2013: Notification of acceptance
October 8, 2013: Camera-ready papers due
December 9, 2013: Workshop event
Available Data Corpus and Multimodal Analysis Tools
Existing Dataset: A data corpus is available for analysis during the multimodal learning analytics competition. It involves 12 sessions in which small groups of three students collaborated while solving mathematics problems (i.e., geometry, algebra). Data were collected on their natural multimodal communication and activity patterns during these problem-solving and peer tutoring sessions, including students’ speech, digital pen input, facial expressions, and physical movements. In total, approximately 15-18 hours of multimodal data are available from these situated problem-solving sessions.
Participants were 18 high-school students, organized into 3-person groups that included both male and female groups. Each group of three students met for two sessions. The groups varied in performance characteristics, with some made up of low-to-moderate performers and others of high-performing students. During the sessions, students engaged in authentic problem solving and peer tutoring as they worked on 16 mathematics problems, four at each of the easy, moderate, hard, and very hard difficulty levels. Each problem had a canonical correct answer. Students were motivated to solve the problems correctly because one student was randomly called upon to explain the answer after solving it.
During each session, natural multimodal data were captured from 12 independent audio, visual, and pen signal streams. These high-fidelity streams included: (1) close-up camera views of each student, showing the face and hand movements while working at the table (waist-up view), as well as a wide-angle view for context and a top-down view of students’ writing and artifacts on the table; (2) close-talking microphone capture of each student’s speech, plus a room microphone for recording group discussion; and (3) digital pen input for each student, who used an Anoto-based digital pen and a large sheet of digital paper to stream written input. Software was developed for accurate time synchronization of all twelve media streams during collection and playback. The data have been segmented by the start and end time of each problem, scored for solution correctness, and scored for which student solved the problem correctly. The data available for analysis include students’:
1. Speech signals
2. Digital pen signals
3. Video signals showing activity patterns (e.g., gestures, facial expressions)
In addition, for each student group one session of digital pen data has been coded for written representations, including (1) type of written representation (e.g., linguistic, symbolic, numeric, diagrammatic, marking), (2) meaning of representation, (3) start/end time of each representation, and (4) presence of written disfluencies. Furthermore, students’ speech has been transcribed for lexical content.
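The segmentation described above can be pictured as one record per problem attempt. The sketch below is a minimal illustration under assumptions: the `ProblemSegment` class, its field names, the seconds-based time unit, and the `samples_in_segment` helper are all hypothetical and do not reflect the corpus's actual file format. It also double-checks the corpus arithmetic, assuming 16 problems per session (a reading consistent with the 32 problems mentioned in the competition aims) and the 5 video, 4 audio, and 3 pen streams listed above.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-problem record; the real corpus may use other names and formats.
@dataclass
class ProblemSegment:
    group_id: int
    session: int                 # 1 or 2 (each group met twice)
    problem_id: int
    difficulty: str              # "easy", "moderate", "hard", or "very hard"
    start_s: float               # problem start, seconds from session start (assumed unit)
    end_s: float                 # problem end, seconds from session start
    solved_correctly: bool
    solver: Optional[str]        # student credited with the correct solution, if any

def samples_in_segment(timestamps: List[float], seg: ProblemSegment) -> range:
    """Indices of time-sorted samples (e.g., pen points) within [start_s, end_s]."""
    lo = bisect_left(timestamps, seg.start_s)
    hi = bisect_right(timestamps, seg.end_s)
    return range(lo, hi)

# Corpus arithmetic implied by the description (16 problems per session assumed).
STUDENTS, GROUP_SIZE, SESSIONS_PER_GROUP = 18, 3, 2
DIFFICULTY_LEVELS, PROBLEMS_PER_LEVEL = 4, 4
groups = STUDENTS // GROUP_SIZE                                 # 6 groups
sessions = groups * SESSIONS_PER_GROUP                          # 12 sessions
problems_per_session = DIFFICULTY_LEVELS * PROBLEMS_PER_LEVEL   # 16 problems
problems_per_group = problems_per_session * SESSIONS_PER_GROUP  # 32 problems

# Streams: 3 close-up cameras + wide-angle + top-down; 3 close-talking mics + room mic; 3 pens.
video_streams, audio_streams, pen_streams = 3 + 2, 3 + 1, 3
total_streams = video_streams + audio_streams + pen_streams     # 12 streams
```

Because the timestamps are sorted, binary search keeps slicing a long pen or audio stream by problem boundaries cheap; for example, samples at 0.0 to 4.0 seconds and a segment from 1.0 to 3.0 seconds yield indices 1, 2, and 3.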
To access the dataset and its full description, workshop participants must first download and sign a Collaborator Agreement. The signed agreement can be emailed to Sharon Oviatt at: firstname.lastname@example.org, who will then provide a username and password for accessing the dataset and its description at: http://mla.ucsd.edu/. Technical questions about the database, or about using the ChronoViz multimodal analysis tools to analyze it, should be directed to Nadir Weibel at: email@example.com
Data-Driven Grand Challenge Competition and Evaluation
The purpose of the data-driven grand challenge event organized as part of the Second International Workshop on Multimodal Learning Analytics is to use the Math Data Corpus to explore and develop new techniques that can successfully:
- Predict which student in a 3-person group is the dominant expert
- Predict which problems will be solved correctly versus incorrectly by the group (out of the 32 available)
As examples, these predictive techniques could aim to identify this information (1) with high reliability, (2) as rapidly as possible (i.e., from a minimal set of solved problems), (3) using a technique that generalizes across all problem difficulty levels, and/or (4) in a way that accurately distinguishes the domain expert from the group leader.
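One way to probe aim (2), predicting as rapidly as possible, is to measure how accuracy grows with the number of problems a predictor has seen. The sketch below is a hypothetical naive baseline, not a method proposed by the organizers: it assumes the per-problem "which student solved it" scores are available in problem order, and it simply names the student credited with the most correct solutions so far. The ground-truth expert labels, the data layout, and the toy values are illustrative assumptions. A solver-count heuristic like this does not, by itself, address aim (4), separating the domain expert from the group leader.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

def predict_expert(solvers: Sequence[Optional[str]], k: int) -> Optional[str]:
    """Hypothetical baseline: the student credited with the most correct
    solutions among the first k problems (ties go to whoever appeared first).
    Returns None if none of those problems was solved correctly."""
    counts = Counter(s for s in solvers[:k] if s is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def accuracy_by_prefix(
    data: Dict[str, Tuple[List[Optional[str]], str]], max_k: int
) -> List[float]:
    """For k = 1..max_k, the fraction of groups whose predicted expert
    matches the (assumed) ground-truth expert label."""
    curve = []
    for k in range(1, max_k + 1):
        hits = sum(predict_expert(solvers, k) == expert for solvers, expert in data.values())
        curve.append(hits / len(data))
    return curve

# Toy values for illustration only; they are not taken from the Math Data Corpus.
toy = {
    "g1": (["A", "B", "A", None, "A"], "A"),
    "g2": (["C", "D", "D", "D", None], "D"),
}
print(accuracy_by_prefix(toy, 5))  # [0.5, 0.5, 1.0, 1.0, 1.0]
```

The smallest k at which the curve reaches an acceptable level is one concrete way to report "how rapidly" a predictor works; any real technique would be compared against a simple baseline of this kind and against the one-in-three chance rate for picking the expert.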
Participants can use empirical or engineering techniques, and any source of signal or coded information available in these multimodal data. Participants can focus on developing predictors of any one or more of the aims listed above. For evaluation purposes, each participant will be required in their workshop paper to describe:
(1) Replicable methods for achieving their reported level of predictive success with any one or more of the above aims,
(2) A clear quantitative summary of their method’s precise predictive capabilities, and
(3) A description of the scope of their predictive capabilities, if more than one objective is included.
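For item (2), one possible quantitative summary for the problem-correctness aim is accuracy overall and per difficulty level, reported next to a majority-class baseline so that a high score can be judged against always guessing the more common outcome. The sketch below assumes predictions are stored as hypothetical (difficulty, predicted_correct, actually_correct) tuples; the metric choice is an illustration, not an evaluation requirement stated by the organizers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, bool, bool]  # (difficulty, predicted_correct, actually_correct)

def summarize(records: Iterable[Record]) -> Tuple[float, Dict[str, float]]:
    """Overall accuracy plus accuracy broken down by difficulty level."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for difficulty, predicted, actual in records:
        totals[difficulty] += 1
        hits[difficulty] += int(predicted == actual)
    per_level = {d: hits[d] / totals[d] for d in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_level

def majority_baseline(records: List[Record]) -> float:
    """Accuracy of always predicting the more frequent actual outcome."""
    share_correct = sum(actual for _, _, actual in records) / len(records)
    return max(share_correct, 1 - share_correct)

# Toy values for illustration only; they are not taken from the Math Data Corpus.
toy = [("easy", True, True), ("easy", True, True), ("hard", True, False), ("hard", False, False)]
print(summarize(toy))          # (0.75, {'easy': 1.0, 'hard': 0.5})
print(majority_baseline(toy))  # 0.5
```

Reporting the per-level breakdown speaks directly to aim (3), generalization across difficulty levels. Both functions assume non-empty input and raise ZeroDivisionError on an empty list.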
Reporting is on the honor system. However, participants may be asked to present their data and techniques to the evaluation committee or to others at the workshop, so they should be prepared to reveal and defend the specifics of any prediction methods, techniques, or results presented.
Please direct questions about the Data Corpus and the evaluation criteria to the Data Resources and Grand Challenge Evaluation Committee Members: Sharon Oviatt and Nadir Weibel.
Workshop Organizing Committee & Roles
Members of the committee are:
1. Dr. Stefan Scherer
USC Institute for Creative Technologies
12015 Waterfront Dr.
Playa Vista, CA, 90094-2536
2. Dr. Nadir Weibel
Department of Computer Science and Engineering
University of California San Diego
La Jolla, CA, 92093-0414
3. Marcelo Worsley
Transformative Learning Technologies Lab
520 Galvez Mall
Stanford, CA, 94305
4. Dr. Louis-Philippe Morency
USC Institute for Creative Technologies
12015 Waterfront Dr.
Playa Vista, CA, 90094-2536
5. Dr. Sharon Oviatt
President & Research Director, Incaa Designs Nonprofit
11140 Wing Point Drive N.E.
Bainbridge Island, WA, 98110