Research Updates - Emma Caputo

Winter 2024-25: Research Progress Update

The winter months have focused on refining and expanding our research tools while developing new applications for language education. Our conversational agent has undergone significant technical improvements to enhance its performance and reliability. We've also expanded our collaboration with the Arizona Games in Language Education group, leading to several exciting new projects that bridge research and practical classroom applications.

Survey Development

Working with Dr. Reinhardt, we've completed a comprehensive revision of our research survey, building on Sundqvist and Uztosun's 2023 validated survey in their article "Extramural English in Scandinavia and Asia: Scale Development, Learner Engagement, and Perceived Speaking Ability". Our latest refinements focus on capturing detailed insights into how learners interact with English in gaming environments. The survey now includes more nuanced analysis of gaming preferences and communication patterns, alongside detailed assessment of speaking and listening habits in gaming contexts. We've also integrated new tools for self-assessment of gaming fluency and expanded our metrics for understanding player motivation and interaction styles. These improvements will provide a more complete picture of how language learners engage with English through gaming.

Conversational Agent Improvements

Our AI-powered agent has undergone significant technical enhancements to create more natural and responsive interactions. We've implemented Whisper.cpp for speech-to-text processing, which has improved recognition accuracy and reduced latency. Response generation now uses the Mistral 2 7B model, and text-to-speech uses Piper for more natural voice output. The agent runs in a containerized deployment for reliable scaling and performance, with security protocols in place to protect research data integrity. Together, these changes have markedly improved the naturalness and timing of conversations while staying true to our commitment to free/libre software.
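To make the flow of one conversational turn concrete, here is a minimal Python sketch of the three stages described above (speech-to-text, response generation, text-to-speech) chained together, with per-stage timing so latency can be inspected. It is an illustration only, not the agent's actual code: the names run_turn and TurnResult and the three stand-in stage functions are hypothetical, and the real Whisper.cpp, Mistral, and Piper integrations are not shown.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TurnResult:
    """Outputs and timings for one participant turn (hypothetical structure)."""
    transcript: str
    reply_text: str
    reply_audio: bytes
    timings: Dict[str, float] = field(default_factory=dict)


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> TurnResult:
    """Run speech-to-text, response generation, and text-to-speech in order,
    recording how long each stage takes in seconds."""
    timings: Dict[str, float] = {}

    def timed(name, stage, value):
        start = time.perf_counter()
        output = stage(value)
        timings[name] = time.perf_counter() - start
        return output

    transcript = timed("stt", transcribe, audio)
    reply_text = timed("llm", generate, transcript)
    reply_audio = timed("tts", synthesize, reply_text)
    timings["total"] = sum(timings.values())
    return TurnResult(transcript, reply_text, reply_audio, timings)


# Stand-in stages (assumptions for illustration). In a real deployment these
# would wrap the speech recognizer, the language model, and the synthesizer.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = run_turn(b"hello", fake_transcribe, fake_generate, fake_synthesize)
    print(result.reply_text)
    print({name: round(seconds, 6) for name, seconds in result.timings.items()})
```

Passing the stages in as plain callables keeps each component replaceable, so one model can be swapped for another without touching the timing logic.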

Arizona Games in Language Education Projects

We've launched several new initiatives:

1. Educational AI Language Partners

Building on our research technology, we've created AI language partners that enable authentic, unpredictable communication practice in a low-anxiety environment.

Role-play scenarios with consistent AI personas
Real-world communication tasks
Comprehensive progress tracking

2. Hybrid Game Development

Our new game projects explore the combination of physical and digital elements to create more engaging language learning experiences that maintain valuable social interaction aspects.

Transhumancia Trail: Spanish language learning game combining physical map gameplay with digital resource management
Battle for the Brain: Team-based language production game with physical and digital components

3. Game Concept Generator Modernization

We are updating Dr. Reinhardt's game deck (https://uagamedeck.weebly.com/game-deck) with enhanced features for language learning integration and game design components.

Fall 2024: Update at University of Arizona

As a visiting scholar in the Second Language Acquisition & Teaching (SLAT) program, I'm developing my research on game-based language learning through coursework, collaboration, and study of game design.

Through weekly collaboration with the AGILE research group, I've been deepening my understanding of game design principles and mechanics for language acquisition. This work has involved studying how different game elements support language learning, and developing frameworks for analyzing player interactions in gaming environments. The AGILE group's expertise has been particularly valuable in helping refine my conversational agent design and methodology for audio tasks.

While at Arizona, I've been attending the SLAT Proseminar course, which has strengthened my understanding of language learning foundations. I've also presented my research at two conferences:

Developing a Generative AI Tool for Dialogue Data Collection in Online Environments

LatinCALL: CALL Challenges and New Horizons in the Age of AI, November 2024

A Generative AI tool for Dialogue Data Collection in Online Environments

2024 Technology for Second Language Learning Conference at Iowa State University, October 2024

Moving forward, I will begin pilot testing the conversational agent, followed by the primary data collection phase. These steps will build directly on the theoretical and methodological developments from my time at Arizona.

Fall 2024: Collaboration with University of Arizona

I'm excited to announce my upcoming three-month stay as a visiting scholar at the University of Arizona, Second Language Acquisition & Teaching (SLAT) program.

During my stay, I will be focusing on the following activities:

Collaborating with Dr. Reinhardt and the Games in L2 Learning and Teaching (GL2TL) Research Group to refine my theoretical framework and research methods.
Improving data instruments for analyzing fluency, intelligibility, and comprehensibility in online audio data collected from gamers.
Developing and piloting qualitative data collection methods like game observation techniques.
Exploring specific game design elements and their potential impact on language learning outcomes.

This opportunity will provide valuable feedback, unique collaboration, and access to new research contexts, significantly contributing to the development of my project.

Summer 2024: Ongoing Instrument Development

Our current research focuses on developing two tools for language learning research: a conversational agent for L2 English dialogue data collection, and a specialized dataset, or corpus, of L2 English usage in online gaming contexts.

Conversational Agent Development

Our research project involves developing a conversational agent to facilitate L2 English dialogue data collection. Utilizing a libre LLM and the Elixir programming language, we aim to create an agent capable of conducting brief, natural conversations with participants. A key focus is optimizing response times to closely simulate human interaction. Despite time constraints limiting proper validation, we anticipate that this tool will yield valuable data and increase participant engagement. The project is being developed in collaboration with a computer science student at the University of London. While not without limitations, this approach offers a practical method for gathering online dialogue data in support of our research objectives.

L2 English Dataset from Online Gamers

We are looking to create and analyze a corpus of L2 English from online gaming contexts. We plan to collect 30,000-50,000 words from recorded gaming sessions of non-native English speakers. Using the Natural Language Toolkit (NLTK), we will examine vocabulary usage, grammatical structures, and gaming-specific language across different gaming genres. This research will contribute to our understanding of L2 English usage in computer-mediated communication contexts, potentially informing both theory and pedagogy.
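As a rough illustration of the kind of per-genre vocabulary summary described above, the following Python sketch counts tokens, distinct word types, type-token ratio, and occurrences of gaming-specific terms. It rests on several assumptions: it uses a simple regular-expression tokenizer from the standard library in place of NLTK's tokenizers, the function names tokenize and genre_profile are hypothetical, and the sample sentences and term list are invented toy data, not study data.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Simplified tokenizer (assumption): lowercase alphabetic words with an
# optional apostrophe part, e.g. "don't". A full tokenizer would replace this.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def genre_profile(
    transcripts_by_genre: Dict[str, List[str]],
    gaming_terms: Iterable[str],
) -> Dict[str, dict]:
    """Summarize vocabulary per genre: token count, type count,
    type-token ratio, and counts of the given gaming-specific terms."""
    terms = list(gaming_terms)
    profile: Dict[str, dict] = {}
    for genre, transcripts in transcripts_by_genre.items():
        tokens = [tok for text in transcripts for tok in tokenize(text)]
        counts = Counter(tokens)
        total = len(tokens)
        profile[genre] = {
            "tokens": total,
            "types": len(counts),
            "ttr": len(counts) / total if total else 0.0,
            "gaming_terms": {t: counts[t] for t in terms if counts[t]},
        }
    return profile


# Invented toy transcripts, for illustration only.
SAMPLE = {
    "fps": ["Push mid, I will flank left.", "Nice shot, push push!"],
    "mmo": ["I need a healer for the raid."],
}
TERMS = ("push", "flank", "raid", "healer")

if __name__ == "__main__":
    for genre, stats in genre_profile(SAMPLE, TERMS).items():
        print(genre, stats)
```

Type-token ratio is sensitive to text length, so comparisons across genres are more meaningful when the samples are of similar size.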

Early-Mid 2024: Survey Development and Pilot Studies

The first quarter of 2024 was dedicated to developing and testing the research instruments. We conducted several pilot studies to test our survey framework, refine methodologies and gain initial insights into participant preferences and technical requirements.

Custom Survey Framework

This project involved developing custom programs in JavaScript, PHP, HTML5, and CSS3 to enhance website functionality and implement a survey framework. This approach overcame the limitations of existing libre survey software, particularly in design customization and functionality. The resulting program offers improved control over survey layout, navigation, and data storage, and notably supports collecting and storing audio data, a feature uncommon in standard survey software. For future development, we are considering rewriting the system in Go to reduce potential runtime errors, improve data accessibility and interpretability, and establish a foundation for future survey instruments.

Pilot results:

1. Topic preferences and task order (n=60)

Preferred task order

Survey
Native language task
Three L2 English tasks

Preferred order for L2 tasks

Monologue task
Story re-telling task with video
Dialogue

Topic preferences for monologue tasks

L2 English monologue: Hobbies
Native Language monologue: Hometown

2. Participation preferences (n=205)

~75% prefer computer-human voice interaction in a second language
~66% are more likely to participate in online, self-paced studies without researcher interaction

3. Audio quality and participation challenges (n=15)

Tested survey framework and audio data collection
Revealed challenges in recruiting participants for dialogue tasks

4. Microphone and recording quality (n=15)

Tested microphone quality across various devices
Confirmed good overall audio quality for fluency studies
Identified potential issues with environmental noise

Software and Tools List

To support our research goals, we're continuously enhancing our technical skills, focusing on tools and technologies that will enable more sophisticated data analysis and improve our research methodologies. We are also committed to using libre software to help protect participant privacy and security.

Planned software and tools:

Praat: For effective speech data analysis and interpretation
Python libraries:
- NLTK (Natural Language Toolkit): For language processing, tokenization, and sentiment analysis
- Pandas
- NumPy
- Parselmouth