But did you see the gorilla?

For a course at Cornell Tech called Psychological and Social Aspects of Technology, taught by Mor Naaman, we were asked to replicate an experiment from one of the papers we studied in class.

We decided to replicate the paper "Gorillas in our midst: Sustained inattentional blindness for dynamic events" by Simons, D. J., & Chabris, C. F. (1999), Perception, 28(9), 1059–1074.

We chose this paper for our replication study because its experimental procedure is explained in detail and the material required to conduct the study is easily available. Specifically, we used only the opaque gorilla video, as there is a strong contrast in participants' performance on this video compared to the other videos used in the study. We also changed the medium in which the video is shown. The experimental conditions are also easy to replicate in a college environment.

Our Hypothesis


For this study, we used the opaque video, with a gorilla as the unexpected event. The video was displayed on two different media: a computer screen and VR. Our hypothesis was that participants watching on a computer screen would detect the unexpected event less often than participants watching in VR, as VR allows participants to have a more immersive experience.

Video: Opaque (Gorilla)

Sampling/Recruitment:


30 participants, who either volunteered, were paid in candy bars, or were paid a small fee.

We divided the experiment into two conditions. One was done in the traditional way, where participants watched the video on a monitor. The other was done in VR using Google Cardboard. For the monitor condition, we recruited participants from Cornell Tech housing: we went door to door and asked students to take part. For the VR condition, we recruited participants around the neighborhood in Queens where some of us lived. It was somewhat difficult to recruit neighbors as participants, as they did not know the context and had difficulty using VR.


We used a single video, the opaque gorilla video, which lasts about 75 seconds. The video shows two teams of three players. One team wears white and the other black, and the players move around randomly, passing a basketball to their teammates. The passes are either bounce passes or aerial passes. About 44–48 seconds into the video, an unexpected event occurs, which lasts for 5 seconds.

Treatment Conditions:

Unexpected event: Gorilla: a short woman wearing a gorilla costume walks from left to right.

Video Creation: Opaque: all 7 actors are filmed together.

Participant Tasks: Count how many times the team wearing white passes the ball on the screen.

Independent vs Dependent variable

Independent variable: viewing medium, with two levels (VR condition, computer condition).

Dependent variable: a Bernoulli (yes/no) variable indicating whether the participant detected the unexpected event.


Participants were asked to note the number of passes. Then they were asked whether they saw the unexpected event (the dependent variable) during the experiment. After a Yes response, the participant was asked to describe what they saw.

*If a participant already knew of such a study, their results were discarded.
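The recording and exclusion rules above can be sketched in a few lines of Python. This is only an illustration: the `Trial` record, its field names, and the `detection_rate` helper are our own hypothetical choices, and the example rows are made up to show the mechanics, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One participant's session (hypothetical record layout)."""
    condition: str    # "desktop" or "vr"
    knew_study: bool  # True if the participant already knew of such a study
    pass_count: int   # number of passes the participant reported
    noticed: bool     # Bernoulli outcome: did they report the unexpected event?


def detection_rate(trials, condition):
    """Share of retained participants in one condition who noticed the event.

    Participants who already knew the study are discarded first.
    Returns None if nobody is left in that condition.
    """
    kept = [t for t in trials if t.condition == condition and not t.knew_study]
    if not kept:
        return None
    return sum(t.noticed for t in kept) / len(kept)


# Made-up rows for illustration only (not the study's data).
example_trials = [
    Trial("desktop", knew_study=False, pass_count=15, noticed=True),
    Trial("desktop", knew_study=True, pass_count=14, noticed=True),  # discarded
    Trial("vr", knew_study=False, pass_count=16, noticed=False),
    Trial("vr", knew_study=False, pass_count=15, noticed=True),
]

for medium in ("desktop", "vr"):
    print(medium, detection_rate(example_trials, medium))
```

Keeping the exclusion inside the rate calculation means the raw interview log can stay complete while only eligible participants count toward the result.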

Usage of Connective Media

We are reimagining the original experiment by incorporating modern technology such as computers and VR. We hope to test whether the viewing medium influences how users perform in different environments.

Experimental procedure:

For this study, we replicated the experiment on inattentional blindness carried out by Simons and Chabris, with a few changes to the experimental conditions. We used only the opaque gorilla video and asked participants to count the number of passes made by the team wearing white. We chose the gorilla video because performance on it was higher in the original experiment. We also asked participants to perform only the easy task from the original experiment. We then divided the participants randomly into two groups and varied the medium used to watch the video: half were assigned to watch it on a laptop, and the other half to watch it in VR on a Google Cardboard.
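A random split into two equal groups could look like the sketch below. The write-up does not say how the split was actually carried out, so the `assign_conditions` function, the participant IDs, and the fixed seed are all assumptions made for illustration.

```python
import random


def assign_conditions(participant_ids, conditions=("laptop", "vr"), seed=None):
    """Shuffle participants, then deal them round-robin into the conditions.

    Group sizes differ by at most one. Passing a seed makes the
    assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[i % len(conditions)]
        for i, pid in enumerate(shuffled)
    }


# Hypothetical IDs 1..20, matching the 20 retained participants.
assignment = assign_conditions(range(1, 21), seed=42)
for pid in sorted(assignment):
    print(pid, assignment[pid])
```

Shuffling first and dealing afterwards keeps the two groups the same size, which a per-participant coin flip would not guarantee.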

Participant characteristics:

Of the 20 participants, 9 were female and 11 were male. Ages ranged from 22 to 62 years. More older people performed the task in VR than on a monitor. Most people who performed the experiment on a monitor were young, tech-savvy students.

We faced a few problems while recruiting participants for our VR condition. Some older participants were not comfortable watching the video in VR and said they preferred watching it on a laptop or a tablet. It was also hard to find participants in the neighborhood in Queens, as people would either not open the door or close it in our faces. Some of the university students had already watched the video, as it is a well-known experiment. Hence, some of the people we interviewed were not included in the dataset.


For the desktop condition, 16 people were interviewed, but only 10 of them hadn't watched the video before. Of these 10, 3 people saw the gorilla and 7 completely missed it. Among the 7 who missed it, 4 said, when given a hint about the gorilla, that they roughly remembered something like that, while the other 3 had no recollection even with the hint. All 3 people who correctly identified the gorilla could accurately describe what they had seen.

For the VR condition, 14 people were interviewed, but only 10 hadn't watched the video before. We discarded the data of the people who had watched the video before. Of the remaining 10, 2 people saw the gorilla and 8 missed it. Of these 8, 3 mentioned seeing something unusual in the video but failed to identify the gorilla; the other 5 missed it completely.

In the original study, 42% of the participants who watched the opaque gorilla video and performed the easy task noticed the gorilla. In our experiment, 30% of the participants (3 of 10) noticed the gorilla when watching the video on the laptop, and only 20% (2 of 10) noticed it when watching the video in VR. This variation in results, at least for the laptop condition, may be due to our small participant pool of 10 people watching the video on the laptop.
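As a rough check on these numbers, the counts reported above (3 of 10 noticing on the desktop, 2 of 10 in VR) can be turned into rates and compared. We did not run a significance test as part of the study; the two-sided Fisher's exact test below is an assumption on our part, chosen because the groups are small, and the function name is ours. It uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are conditions, columns are noticed / missed. The p-value sums
    the probabilities of every table with the same margins that is no
    more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Unnormalised hypergeometric weights, kept as integers so the
    # "no more likely than observed" comparison is exact.
    weights = {
        k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)


# Counts reported in this write-up.
desktop_saw, desktop_missed = 3, 7
vr_saw, vr_missed = 2, 8

desktop_rate = desktop_saw / (desktop_saw + desktop_missed)
vr_rate = vr_saw / (vr_saw + vr_missed)
p_value = fisher_exact_two_sided(desktop_saw, desktop_missed, vr_saw, vr_missed)

print(f"desktop: {desktop_rate:.0%}, VR: {vr_rate:.0%}, p = {p_value:.2f}")
```

With these counts the p-value evaluates to 1.0, so a one-person difference between two groups of 10 gives no statistical evidence that the two media differ.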

The desktop condition performed better than the VR condition.

A possible explanation is that the participants who watched the video in VR, given their demographics, may have had a harder time focusing. Also, some of them had never experienced VR before, which might have made them uncomfortable or distracted while watching the video.

A potential improvement is to continue testing the VR condition with people who are familiar with the technology. If our hypothesis is indeed supported by participants familiar with VR, it may mean that a new technology goes through adoption stages. At each stage, it is our responsibility to figure out how people interact with it and how we can maximize our desired effect given that.


Our hypothesis was not supported by our experiment, and we think one of the main reasons is the wide demographic range of our participants. Most people who performed the task in VR did not have prior experience with it, which may have caused discomfort and left less attention for the task itself. On the other hand, most young people were comfortable with technology, and their focus was on doing the task. This leads us to the observation that different stages of technology adoption may have an impact on the results. It is our responsibility to understand how people interact with the technology at each stage and to use those interaction habits effectively to achieve our desired effect. When designing experiments that rely on technology, we need to keep in mind the age groups we are recruiting and their tech savviness.



Noshin Anjum Nisa, Utsav Vakil and Yuhui Dai

Follow me on Twitter or shoot me an email (nisanoshin@gmail.com).
Always looking for feedback and cool projects to work on.
