MyVoice Final Four: Students Compete in the 2021 MyVoice Data Challenge

The 4th annual Data for Public Good Symposium: Data for Good in Changing Times

Student teams competed in the two-month-long MyVoice Data Challenge, culminating in presentations at the 2021 Data for Public Good Symposium.

The Michigan Institute for Data Science (MIDAS), in partnership with faculty and research staff from Michigan Medicine, helped us host the 2021 MyVoice Data Challenge on Youth Health and Wellness. The goal of the virtual team event was to get students and learners involved in how MyVoice makes sense of the hundreds of thousands of text messages we receive as part of our weekly survey of American youth. Teams were tasked with developing an improved, natural language processing-enhanced qualitative research process to quickly extract insights from MyVoice’s text message polling data. They were given about two months to develop their best ideas.

Ten teams submitted their work, and the final four were invited to present at the 2021 Data for Public Good Symposium, hosted by MIDAS in February 2021. The winners were announced at the conference. Watch the finalists’ presentations; the teams included undergraduate and graduate students studying applied data science, engineering, mathematics, design, and information science.

The winners of the 2021 MyVoice Data Challenge were a team of students from the University of Michigan School of Information: Christine Gregg, Nhan Le, Michael McManus, and Liu Jason Tan, all master’s degree students in Applied Data Science. Their solution was titled “Natural Language Processing for Qualitative Research: A Human-Centered Approach.”

Process overview: The input is raw Excel survey data, which is fed into a Python script that documents the researcher's steps in a log in real time. The process runs as follows:

1. Researcher input: select the file and the research question.
2. Data preparation: clean, lemmatize, and remove generic stop words.
3. Researcher input: review the top 20 words and add custom stop words.
4. Generate model: generate a language model via a BERT Sentence Transformer.
5. Iteration cycle: binary clustering (generate a binary cluster via hierarchical clustering), followed by researcher input (examine a cluster sample for semantic consistency). When the theme is consistent, the researcher assigns a theme and then returns to view the remaining clusters.
6. Finalize themes: display the final theme tree and format the coded data.
7. Theme output: the process documentation log and theme visualizations.
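To make the data preparation and stop-word review steps in the overview more concrete, here is a minimal Python sketch. It is not the team's actual script: the function names, the tiny stop-word list, and the sample responses are all hypothetical, and lemmatization is left out to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny generic stop-word list (an assumption for
# illustration; a real pipeline would use a much fuller list).
GENERIC_STOP_WORDS = {"a", "an", "the", "i", "it", "is", "to", "and", "of", "my", "in", "for"}


def prepare(responses, custom_stop_words=()):
    """Lowercase each response, keep only word tokens, and drop stop words."""
    stop_words = GENERIC_STOP_WORDS | set(custom_stop_words)
    cleaned = []
    for text in responses:
        tokens = re.findall(r"[a-z']+", text.lower())
        cleaned.append([t for t in tokens if t not in stop_words])
    return cleaned


def top_words(cleaned, n=20):
    """Return the n most frequent remaining words for the researcher to review."""
    counts = Counter(token for tokens in cleaned for token in tokens)
    return counts.most_common(n)


# Made-up responses, not MyVoice data.
responses = [
    "I think school is stressful.",
    "School takes up most of my time",
    "I think sleep matters for health",
]

first_pass = prepare(responses)
review_list = top_words(first_pass, n=20)

# Suppose the researcher decides "think" carries no thematic meaning
# and adds it as a custom stop word before the next pass.
second_pass = prepare(responses, custom_stop_words={"think"})
```

The second call mirrors the researcher's review step: the frequency list is shown first, and only then are the custom stop words applied.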

Let’s hear from the winners, in a Q&A:

Can you summarize your winning solution for us?

We developed a clustering tool utilizing Google BERT to support and accelerate researchers’ task of deriving common themes from text message responses to MyVoice survey questions. The tool leverages the researchers’ domain expertise by incorporating their review throughout the cluster labeling process. By utilizing a semi-automated human-in-the-loop approach, our tool has the potential to save researchers hours of work, facilitate full reproducibility of analysis, and provide a robust understanding of underlying sentiments within the text message responses.
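The human-in-the-loop clustering cycle described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's code: word-count vectors stand in for BERT sentence embeddings so the example runs without extra libraries, and `embed`, `binary_split`, `code_themes`, and the simulated reviewer are hypothetical names.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def binary_split(texts):
    """Average-linkage agglomerative clustering, stopped at two clusters."""
    vectors = [embed(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 2:
        # Merge the two closest clusters until only two remain.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [[texts[i] for i in c] for c in clusters]


def code_themes(texts, review):
    """Keep splitting clusters until the reviewer assigns each one a theme."""
    themes = {}
    pending = [texts]
    while pending:
        cluster = pending.pop()
        label = review(cluster)  # a theme name, or None if not yet consistent
        if label is None and len(cluster) > 1:
            pending.extend(binary_split(cluster))
        else:
            themes.setdefault(label or "unassigned", []).extend(cluster)
    return themes


def simulated_reviewer(cluster):
    """Hypothetical stand-in for the researcher's judgment of a cluster."""
    for theme in ("school", "sleep"):
        if all(theme in text for text in cluster):
            return theme
    return None


# Made-up responses, not MyVoice data.
texts = [
    "school is stressful",
    "school stress is high",
    "sleep helps health",
    "more sleep better health",
]
themes = code_themes(texts, simulated_reviewer)
```

In an interactive tool the `review` callback would prompt the researcher with a sample from the cluster rather than apply a fixed rule, which is where the domain expertise enters the loop.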

What was surprising about the process?

One major surprise was how well we were able to derive semantic meaning from short text message responses. Coupled with human-in-the-loop tagging, we were able to cluster themes based on thematic content with overall great results.

What was one of the biggest takeaways from participating in the challenge?

Nhan: Engage with your stakeholders: our work takes effect by leveraging their ability.

Christine: The MyVoice Data Challenge reinforced how important it is to have an interdisciplinary team with diverse backgrounds and perspectives. We naturally gravitated to different parts of the project and we accomplished so much more than we could have individually.

My background before starting the MADS program was environmental engineering which is often one or two steps removed from people. I thought the public policy and human-centered aspects of this challenge were really exciting and this helped me reframe what to focus on after graduation.

Michael: I think the biggest takeaway for me is to keep the end user and other stakeholders in mind and at the heart of the process. By keeping our focus on what would be most beneficial for the researchers, I feel we provided something that may be a value-add for them in their endeavor to help inform policy that ultimately impacts the lives of youth in the United States.

Secondly, go to each meeting with an agenda and assign action items with due dates to help keep the project on track and respect everyone’s time.

Jason: My biggest takeaway was that if you put your mind to it anything is possible. All of us had very minimal background in NLP [natural language processing], but we overcame those obstacles and used our strengths to produce a great product. We all come from different backgrounds, having our own strengths and weaknesses. It was very important that our strengths complement each other. We set a goal in mind very early on, and I am still amazed by how far we’ve come and everything that each of us learned.

Do you have advice for students who’ll participate in the future?

Nhan: Have no fear trying state-of-the-art tools: be goal-oriented and try whatever techniques to get to the results first, then think about what you’re doing later.

Absorb what your team members have to say as much as possible. Find things to agree with and say yes to as many ideas as reasonably possible.

Michael: As Stephen Covey says, “Begin with the end in mind.” I think having a clear direction of where we wanted to go and setting goals from the beginning helped us stay focused and produce good results.

Assume positive intent. As we are all students working remotely, unique challenges can arise. These could involve anything from communication and scheduling to sharing deliverables and meeting deadlines. Understanding this and assuming positive intent from the team goes a long way in creating a team environment conducive to collaboration.

Take time to get to know your team personally, at least a little bit, at the very beginning of any project. It will help communication and collaboration and pay dividends in the end.

Jason: Teamwork makes the dream work. Everyone has their own strengths, and your team should work together in a way that utilizes your strengths. Listen to your teammates and bounce ideas off each other.

****

Thank you to all the participants, judges, viewers, and event organizers for helping us make our goal of more rapid, usable, and actionable insights and our wish for student-led, youth-centered data science methods more of a reality with the 2021 MyVoice Data Challenge.