Computer Vision in the EdTech Industry: What Can AI See?

Diana (Fangyuan) Yin (she/her/hers)
Alef Education
Nov 16, 2019


In my previous article, we explored the four active participants in AI in education:

  • Higher-ed and research institutions.
  • Tech giants.
  • Educational giants.
  • EdTech startups.

In this and the following articles, we will look at how various AI technologies are applied in EdTech, starting with computer vision.

In May 2019, HolonIQ published the “2019 Artificial Intelligence and Global Education Report”, which mapped five core applications of AI technology in education: voice, vision, natural language, algorithms, and hardware. These five areas span a broad spectrum of educational activities, from knowledge acquisition and school logistics to assessment and academic integrity, and are driving a deeper transformation of education. Building on this framework, I aim to take a closer look at each category, covering how each type of technology is used along with some example companies and products.

Five Core Applications of AI (from HolonIQ 2019 Artificial Intelligence & Global Education Report)

Computer vision applications such as emotion detection, eye tracking, and posture detection have been made possible by:

  • Increasingly powerful computing.
  • The broader use of camera-equipped electronic devices.
  • A growing number of standalone image-acquisition devices (for example, Microsoft Kinect sensors).

Overall, the most prominent advantage of computer vision in education is that it can capture and assess learning unobtrusively, without continuous human effort. It can be (and is being) applied in a plethora of educational scenarios, and this article covers only four of them:

  • Engagement detection in distance learning.
  • Learning management in physical schools.
  • Automated proctoring of online exams.
  • Handwriting recognition.

1. Distance Learning: Engagement detection and enhancement

Distance learning breaks down the barriers of geography and wealth, offering high-quality, democratized education to interested learners. Nevertheless, MOOC data raise concerns about low retention (only 7% to 13% of students enrolled in a MOOC actually complete it) and low engagement. Compared with students in brick-and-mortar classrooms, distance learners can’t benefit from the physical proximity of instructors or support from learning communities. Additionally, online instruction and learning don’t happen at the same time, and different learners opt in and out of their studies at different times, adding to the asynchronicity and isolation of learning. That’s why engagement is even more important in online learning environments than in traditional classrooms.

AI technology has advanced to the point where it can not only recognize your identity but also decode your emotions. Source: Shutterstock

In a physical class, teachers can easily tell when a student is bored, stressed, or distracted by paying attention to body language and facial expressions. Online learning rules this approach out, and that’s where AI could help. With computer vision, distance learning platforms could collect real-time data on learner behavior, such as eye movement, body position (slouching or sitting up), and facial expression (yawning, frowning, or squinting). Such data can be harnessed either for immediate intervention (such as suggesting easier or more engaging materials, or grouping students) or for reflective action (redesigning lessons, for example).
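To make the “immediate intervention” idea concrete, here is a minimal Python sketch. It assumes an upstream computer vision model has already labeled each video frame with a few cues (gaze on screen, slouching, yawning). The class `FrameCues`, both functions, the engagement rule, and the 0.6 threshold are hypothetical choices for illustration, not taken from any product or paper mentioned in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameCues:
    """Per-frame cues, assumed to come from an upstream vision model (hypothetical)."""
    gaze_on_screen: bool
    slouching: bool
    yawning: bool


def engagement_score(frames: List[FrameCues]) -> float:
    """Fraction of frames in which the learner looks engaged (illustrative rule)."""
    if not frames:
        return 0.0
    engaged = sum(
        1 for f in frames
        if f.gaze_on_screen and not f.slouching and not f.yawning
    )
    return engaged / len(frames)


def suggest_intervention(frames: List[FrameCues], threshold: float = 0.6) -> str:
    """Return "intervene" when engagement over the window drops below the threshold."""
    if engagement_score(frames) < threshold:
        return "intervene"  # e.g. offer easier or more engaging material
    return "continue"
```

For example, a window of ten frames in which the learner looks engaged in only four gives a score of 0.4, which falls below the threshold and triggers a suggestion. A real system would more likely learn such rules from labeled data than hard-code them.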

Unfortunately, this concept largely remains at the research stage. Several research papers [1] [2] have found facial recognition to be a promising approach to engagement detection in online learning environments, but few have been turned into commercial products. One company that has implemented it is Emotuit, a California-based e-learning analytics start-up that uses facial recognition analytics to improve student engagement in e-learning.

2. Physical schools: Learning behavior detection, attendance monitoring, and campus security

For a long time, researchers and educators have relied heavily on teacher observation and student self-reporting (surveys, questionnaires, and interviews) to gauge student learning behaviors in class. This approach is time-consuming and hard to scale across multiple classes or schools. Besides, human eyes are less adept than machines at picking up subtle changes in micro-expressions, processing hundreds of faces at once, or detecting behavioral patterns. Recent advances in computer vision and algorithms have considerably improved accuracy for both still images and streaming video. Recent years have also seen a growing number of products and companies specializing in facial analysis, emotion detection, and identity recognition, such as Megvii, Kairos, Amazon Rekognition, Microsoft Emotion Detector, and Affectiva.

In theory, computer vision could enhance educators’ ability to detect, measure, and respond to student learning behaviors, and the results could in turn give tutors relevant feedback on their instructional methods. In other words, once a learner’s level of involvement and acceptance can be assessed in real time, both instructors and learners benefit from a more effective and adaptive learning experience. Similar technology could be used more broadly to take attendance and ensure campus security, considerably easing the logistical burden on teachers and schools.

Would you feel comfortable if every movement students make in a classroom were captured by cameras? Photo by Nicole Honeywill / Sincerely Media on Unsplash

However, the application of computer vision to physical learning environments is highly controversial in the Western world.

On the one hand, practitioners are concerned that the technology is not yet mature and delivers inaccurate and racially biased results. Even a slight misreading of student emotions and behaviors could lead to inappropriate decisions that wrongly target students. On the other hand, there are serious ethical concerns about student privacy and sensitive data (although researchers have been exploring privacy-preserving approaches that train AI models without requiring a copy of the data; for details, see “federated learning”). In autumn 2018, two Swedish high schools launched a trial of facial recognition to check student IDs and take attendance automatically. However, in August 2019, the schools’ local authority was fined around $20,700 by the Swedish Data Protection Authority under the GDPR. According to the Authority, students can reasonably expect a certain level of privacy, and the schools did not have adequate grounds to collect sensitive student data, even with parental consent to monitor students.

China appears to be more open to approving such initiatives.

By 2018, cameras had been installed in several high schools and universities to scan students’ faces for emotions and to improve attendance. The cameras were also paired with a smart learning management system that gave students behavior scores and suggested interventions to teachers to keep students focused and actively learning. But the details of such systems have sparked broad debate both in China and overseas.

3. Online exams: Automated proctoring

As more universities move their curricula online, higher education also needs to scale up online examinations. But with large numbers of students taking exams remotely, and potentially at different times, how can faculties ensure exam integrity and accountability? Traditionally, a proctor must be physically present in the test environment to verify test-takers’ identities and prevent cheating. This model clearly doesn’t scale and doesn’t fit online learning. Meanwhile, AI technology has advanced enough to flag test-takers who use a false identity or cheat, either during the exam or in post-exam review, and even to block them while the exam is in progress. The process requires no more than a webcam, an Internet-connected laptop, and a microphone, and the proctoring system can also keep video records for post-exam review.

Since false identities can be detected by recognizing faces and voices, and cheating usually involves suspicious eye or facial movements, an AI-based online proctoring system can detect inappropriate behavior as it happens and shut down the exam. In their paper “Automated Online Exam Proctoring”, researchers from Michigan State University demonstrated a “multimedia analytics system” that achieved a “nearly 87% segment-based detection rate” across four types of cheating behavior. In general, the more data the algorithm is trained on, the more accurately it can red-flag suspicious behavior.
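The idea of segment-based detection can be illustrated with a short Python sketch. It assumes hypothetical per-second “suspicion scores” produced by upstream face, gaze, and audio detectors, splits them into fixed-length segments, flags each segment whose average score exceeds a threshold, and measures the detection rate as the share of truly cheating segments that get flagged. The function names, segment length, and threshold are illustrative assumptions; this is one plausible reading of a segment-based metric, and the Michigan State paper’s actual features and evaluation protocol may differ.

```python
from typing import List


def segment_flags(scores: List[float], segment_len: int, threshold: float) -> List[bool]:
    """Split per-second suspicion scores (hypothetical detector output) into
    fixed-length segments and flag each segment whose mean score exceeds threshold."""
    flags = []
    for start in range(0, len(scores), segment_len):
        segment = scores[start:start + segment_len]  # last segment may be shorter
        flags.append(sum(segment) / len(segment) > threshold)
    return flags


def detection_rate(flags: List[bool], truth: List[bool]) -> float:
    """Share of truly cheating segments that were flagged (i.e. recall)."""
    cheating = [flag for flag, is_cheating in zip(flags, truth) if is_cheating]
    if not cheating:
        return 0.0
    return sum(cheating) / len(cheating)
```

With scores `[0.1, 0.2, 0.9, 0.8, 0.4, 0.5, 0.7, 0.9]`, two-second segments, and a 0.5 threshold, the second and fourth segments are flagged; if the second, third, and fourth segments actually involved cheating, the detection rate is 2/3. Averaging within a segment smooths out one-off glances, at the cost of possibly missing brief cheating events.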

AI can detect false identities when students attempt to cheat on an exam. Photo by Sergey Zolkin on Unsplash

In terms of acceptance, automated proctoring has already been widely adopted by the industry. In 2017, ProctorU launched UAuto, an automated online proctoring solution that uses AI to verify test-taker identity and flag suspicious behavior in real time. Respondus Monitor and Mettl also provide AI-based solutions for proctoring online exams remotely. Some of these technologies are compatible with the most common Learning Management Systems (Canvas, Blackboard, Moodle, Brightspace, and Schoology), and some providers partner with EdTech and educational publishing companies such as Cengage Learning, Pearson, McGraw-Hill Education, Wiley, and ALEKS.

4. Handwriting recognition of student work (essays, short answers, and notes)

In most schools and universities, paper-based tests and written exams are still a primary means of student evaluation. Students write responses to short open-ended questions or long essays in tests administered school-wide and nation-wide. Traditionally, a select group of graders is responsible for scoring them, a process that can be time-consuming, inefficient, and laborious.

Nowadays, artificial intelligence has given computers the ability to recognize messy handwriting and even to score responses against a given rubric. A team at CEDAR (Center of Excellence for Document Analysis and Recognition) at the University at Buffalo, SUNY, has conducted several studies on this topic [1] [2]. Despite some errors, the system’s performance was comparable to that of human graders.
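Once handwriting has been transcribed to text, rubric-based scoring can be sketched very simply. The Python below is a deliberately naive, hypothetical keyword-matching scorer: each rubric item lists acceptable keywords and earns its points if any keyword appears in the transcribed answer. The function name and rubric format are assumptions for illustration; this is not the CEDAR team’s method, only a sketch of the shape of the problem.

```python
import re
from typing import Dict, List


def score_answer(transcribed: str,
                 rubric: Dict[str, List[str]],
                 points: Dict[str, int]) -> int:
    """Award each rubric item's points if any of its keywords appears in the text.

    `transcribed` is assumed to be the output of a handwriting recognizer;
    keywords are expected to be lowercase single words (hypothetical rubric format).
    """
    # Normalize the transcription into a set of lowercase words.
    words = set(re.findall(r"[a-z]+", transcribed.lower()))
    total = 0
    for item, keywords in rubric.items():
        if any(keyword in words for keyword in keywords):
            total += points[item]
    return total
```

For instance, with a rubric awarding 2 points for mentioning “sunlight” or “chlorophyll” and 1 point for “oxygen” or “glucose”, the answer “Plants use sunlight to make glucose.” scores 3. Keyword matching ignores meaning and word order, which is exactly why research systems go well beyond it.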

Besides essays and short answers, the same technique can also be applied to students’ written notes. One good example is INK-12, a collaborative project between TERC and MIT that allows students to take notes and write out math solutions in software, and enables teachers to view and share student work with the class to support learning.


So far, we have looked at one specific domain of AI technology, computer vision, and four examples of its application in the EdTech industry. Technology never ceases to amaze us, and computer vision is being, and will continue to be, applied to more aspects of education. For example, MIT has developed machine learning robots that can perceive and respond to human emotions and has adapted them to support therapy for children with autism. Smart robots like Little Sophia can recognize and track human faces when “communicating” with kids. Tools like Osmo pair their software with a camera to recognize student activities in the physical world and creatively reflect them on screen, and the list goes on.

As technology becomes increasingly able to “see” for itself, whether it should be adopted broadly to observe, monitor, and report on student learning behaviors remains a controversial question.

Do you know of any other educational product or company that utilizes computer vision? What do you think of them? Are you an advocate or a critic of such an attempt? Share your experience with the community by commenting below.




Product Manager. Harvard GSE. Michigan Ross MBA Candidate. CFA. In the tech industry for 6 years. I write about tech for fun, to fulfill my childhood dream.
