Exploring Electrodermal Activity Signals in Child-Robot Learning Interactions
This past summer, I had the amazing opportunity to intern in the Personal Robots group at the MIT Media Lab through the MIT Summer Research Program on a rather unique data science project. This post is intended to be an overview of what I did for this project. You can read more about my general experience here:
Also, if you are interested in the MIT Summer Research Program, here is a short fun-filled video:
My project centered on quantifying the difference between two learning conditions, personalized (P) and non-personalized (NP), assigned to kids interacting with Tega (a social robot), using electrodermal activity (EDA) signals as the metric. The P condition adapts to the kid’s current aptitude, while the NP condition follows a curriculum created for the kid’s grade level. I worked entirely in MATLAB and wrote custom code to perform all the operations highlighted.
Dataset and Study Details
For this study, 32 kids from two schools in the local Boston area (16 from school A and 16 from school B) participated over a three-month period. The kids wore four E4 sensors (two for each school). On average, each kid in each school took part in six or seven learning sessions. (Note that the E4 sensors also record temperature, heart rate, blood volume pulse (BVP), and acceleration (ACC).)
Each study session was divided into six stages: Opening, Story 1, Retell 1, Story 2, Retell 2, and Closing. These are highlighted in the infographic below.
The data for each session, for each kid, in each school is stored in its own folder. The recorded EDA data is stored in a CSV file. The first row contains the start time, and the second row contains the sampling frequency.
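To make that file layout concrete, here is a minimal Python sketch of reading such a file. It is not the MATLAB script I used. It assumes the start time is in seconds, the frequency is in Hz, and every remaining row holds a single EDA sample; the function name `read_eda_csv` and the example values are made up for illustration.

```python
import csv
import io


def read_eda_csv(text):
    """Parse EDA CSV text: row 1 = start time, row 2 = sampling frequency, rest = samples."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    start_time = float(rows[0][0])        # assumed to be in seconds
    sample_rate_hz = float(rows[1][0])    # assumed to be in Hz
    samples = [float(row[0]) for row in rows[2:]]
    # Reconstruct a timestamp for each sample from the start time and the rate.
    timestamps = [start_time + i / sample_rate_hz for i in range(len(samples))]
    return start_time, sample_rate_hz, samples, timestamps


# Made-up file contents, not data from the study.
example = "1500000000.0\n4.0\n0.10\n0.12\n0.15\n0.11\n"
start, rate, eda, times = read_eda_csv(example)
print(start, rate, eda[:2], times[:2])
```

Rebuilding per-sample timestamps this way is what makes it possible to line the signal up with the stage timestamps later on.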
Additionally, a separate Excel sheet holds the timestamps for when each storytelling stage (Opening, Story 1, and so on) begins and ends, along with its total duration. Also note that during the story stages, Tega asks the kids questions. The questions asked, the time each was asked, and whether it was answered are logged in another Excel sheet.
Data Trimming and Preparation
To begin exploring the data, I had to take the raw data in about 215 Excel sheets and collate the necessary information into a format suitable for further analysis. I created a custom script in MATLAB that did the following:
- Collect all the data files in the current directory and keep only the EDA.csv files
- Extract both the start timestamp and the sampling frequency from each file
- Apply a one-dimensional nth-order median filter to the data (n is the sampling frequency of the data)
- Using the timestamp Excel sheet, section the EDA data into separate columns, one per storytelling stage
- Based on the user id of the kid, create a column entry that says either P or NP
- Calculate the mean value for each kid for each storytelling stage
The same procedure is applied to the kids from both schools.
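The filtering, sectioning, and averaging steps above can be sketched roughly as follows. This is a Python illustration, not my MATLAB script. The edge handling in the median filter (the window is simply truncated at the ends of the signal) is an assumption, and the toy signal, stage boundaries, and kid id are all made up.

```python
from statistics import mean, median


def median_filter(signal, window):
    """Sliding median; the window is truncated at the signal edges (an assumption)."""
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i - half + window)
        filtered.append(median(signal[lo:hi]))
    return filtered


def stage_means(timestamps, samples, stages):
    """Mean of the samples falling in each stage's [start, end) interval.

    `stages` maps a stage name to (start, end); stages with no samples are omitted.
    """
    result = {}
    for name, (start, end) in stages.items():
        values = [v for t, v in zip(timestamps, samples) if start <= t < end]
        if values:
            result[name] = mean(values)
    return result


# Made-up toy data: a 4 Hz signal with a single spike.
rate_hz = 4
raw = [0.10, 0.10, 0.90, 0.10, 0.20, 0.20, 0.20, 0.20]
times = [i / rate_hz for i in range(len(raw))]
smoothed = median_filter(raw, window=rate_hz)

stages = {"Opening": (0.0, 1.0), "Story 1": (1.0, 2.0)}   # hypothetical boundaries
condition_by_kid = {"kid01": "P"}                         # hypothetical id lookup
row = {"kid": "kid01", "condition": condition_by_kid["kid01"]}
row.update(stage_means(times, smoothed, stages))
print(row)
```

Omitting stages with no samples keeps sessions that ended early from contributing empty columns to the later analysis.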
Visualizing the data
A. Graph of EDA, ACC, and BVP
For the first set of visualizations, I plotted the EDA, ACC, and BVP data for each kid for each session and superimposed vertical lines marking the different storytelling stages. I used MATLAB’s built-in plotting tools and extended them with my own custom code.
B. Graph of EDA with Question Timestamp superimposed
I also created graphs of the EDA signals with the question timestamps superimposed on them. This was simply a different way to view the data.
Data Analysis: T-Test
The main analysis performed on the data was a series of unpaired two-tailed t-tests. The most important one compared the mean of the P group against the mean of the NP group, across all the kids, for each storytelling stage. I assumed unequal variances based on an F-test (null hypothesis: the variances are equal) that showed insignificant p-values.
Hypothesis: the mean EDA of the P group and the mean EDA of the NP group are unequal
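For readers unfamiliar with it, an unpaired t-test with unequal variances is commonly known as Welch's t-test. Below is a minimal Python sketch of its test statistic and degrees of freedom. It is not my MATLAB code, and the two groups are made-up numbers rather than study data. The two-tailed p-value would then come from a t-distribution with the computed degrees of freedom, which a statistics library can provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    va = variance(a) / len(a)   # squared standard error of the mean of a
    vb = variance(b) / len(b)   # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up per-kid mean EDA values for one stage (not study data).
p_group = [0.42, 0.51, 0.47, 0.60, 0.55]
np_group = [0.30, 0.35, 0.28, 0.41, 0.33]
t_stat, dof = welch_t(p_group, np_group)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A positive t here only says the first group's sample mean is larger; whether the difference is significant depends on the p-value.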
The table below shows the important results from the test. Note that all the EDA p-values are significant (less than 0.05) except for the Story 2 and Retell 2 stages. This is because there were very few data points in these stages, since not all kids had data recorded for them.
To better visualize and gain a deeper understanding, I created a boxplot to show this.
In summary, I was able to show that kids in the personalized learning condition have a higher mean EDA than kids in the non-personalized learning condition. My results were included in a research paper from the Personal Robots Group, which I was very thankful for and excited about.
This was an amazing summer project that definitely challenged my programming skills, since programming is essentially what I did the entire time. I wasn’t able to go into as much depth here as I would have liked, since there is only so much a blog post can cover.
Thank you. If you have questions or need to reach me, email me at firstname.lastname@example.org