AI.EDU: tuning GPT 3.5 into GPA 4.0

Mar 7, 2023

The Role of ChatGPT and Generative Models in K-12 Education

In my two decades of experience in EdTech, education applications of popular technologies have rarely made it into the public consciousness. It took a global pandemic for otherwise informed citizens to realize their kids could study 100% online, even though such models had existed for 30 years and had surpassed 10% of all students many years earlier.

So imagine my bemusement this winter to see that within a month of its launch, ChatGPT had already inspired breathless commentary on AI's impact on education from tech bloggers (“AI Homework”, Stratechery, December 2022), think tanks (“ChatGPT: Educational friend or foe”, Brookings, January 2023), politics and culture magazines (“The College Essay Is Dead,” The Atlantic, December 2022) and newspapers (“Don’t Ban ChatGPT in Schools. Teach with It”, NY Times, January 2023). Adding to my amusement, several tech influencers called for new approaches to learning in the face of AI disruption that sort of ended up sounding like the long-established pedagogical practice of “flipped learning.”

Most of us focused on education innovation have been thinking about AI in education for a few decades now. Indeed, the decade from 2008 to 2017 saw $1 billion invested in AI-driven “adaptive” or “personalized” learning, including over $200M for Knewton alone, with its “mind-reading robo tutors from the sky” (to borrow its founder’s hyperbolic pitch from 2015), before it quietly sold for $15M in 2019. Older researchers may even recall NLP Intelligent Tutoring Systems from forty years ago (Sleeman and Brown, HAL Open Science 1981).

Source: my own DALL-E image of “mind reading robo-tutors in the sky.”

To be clear, the current excitement has centered on OpenAI and its GPT-3.5-based ChatGPT, along with a handful of other big generative models (e.g., the related Microsoft Bing/Sydney, Google’s LaMDA and its Bard chatbot, Meta’s Galactica and BlenderBot, Baidu’s ErnieBot, etc.). But for educators and edtech founders (and investors), focusing on these models and their derivative applications carries real risks.

Generative Won’t Generate VC Returns

I will start with founders and investors by pointing to Reach Capital’s generative AI edtech market landscape from December, “GPT and a New Generation of AI for Education” (Tony Wan), which, while comprehensive, sort of strikes me as classifying all the different types of deck chairs on the Titanic.

Venture influencer Tyler Tringas wrote a Twitter thread criticizing tech start-ups building on ChatGPT, and his critique applies equally to vertical market applications focused on education: there is little to no proprietary, defensible technology, nor even a unique go-to-market. In Tyler’s words, “the vast majority of these products are just (a) an API call to GPT-3 or similar platforms and (b) a thin layer of ‘prompt engineering’ … where you are quite literally teaching the underlying model how to directly deliver [your intended] value via future API calls. All margin will inevitably flow to the platforms or be competed away by infinite competition.”

I am reminded of the 2007–2012 flood of Salesforce-based Higher Ed CRM SaaS plays, “Moneyball for Education” retention analytics services, curation engines for iOS and Android education apps, and eTextbook platforms digitizing titles from the Big 5 publishers. There is no shame in building a product instead of a platform, but when your product is so directly derivative of a larger company’s code base, data set, algorithm or content, you are ultimately just proving that the market is attractive enough to be disintermediated by that larger company or outcompeted by a horde of fast followers. Whether it’s ASU unplugging from Civitas and Pearson eCollege or Pearson publishing its own eTexts (killing off Kno and forcing Inkling to pivot), new media and modalities (e.g., online, cloud, mobile, AI) become integrated into the base product or platform. If your business model is a message board for Blackboard, you had better have bootstrapped it so you can at least exit in some buy-vs-build type deal.

This is already happening, as seen in survey results from HolonIQ’s recent report, Artificial Intelligence in Education 2023.

Within this context, I suppose there are a couple of examples of potential scaled success in the Reach Capital GPT/Generative AI EdTech Landscape, namely Duolingo and Quizlet, which use GPT-3 to generate test-bank items and for grammar correction. But to me, these are just examples of incumbent platforms with massive existing data sets of user-generated and open-source content and learning outcomes, which allow them to train their own AI and integrate a new modality as a feature invisible to the user. To be fair, I do like several of the concepts profiled by Reach, such as tools that allow any subject matter expert to create their own course or storytellers to create narratives (with a little skill at OpenAI prompting), but I just don’t see how they scale to venture returns.

Can ChatGPT Deliver GPA 4.0?

As we turn next to the value of generative AI and ChatGPT for educators and students, I want to express my gratitude to the team at Whooo’s Reading, a start-up I advised this past year that has been immersed in AI/NLP for many years. Whooo’s Reading developed AI that can evaluate student writing about any text they have read. Their models are designed to work with any text and any question, and they focus on formative-style short responses (i.e., from one sentence to one paragraph long). To do so, they built their own machine learning technology to analyze successful writing as well as teachers’ lesson plans and assigned reading.

Whooo’s Reading CEO Raphael Menko noted that in his conversations with schools about AI, three questions came up most often:

How can generative models best be leveraged to help students and teachers?
What problems could generative models create in the classroom?
Now that models like ChatGPT are accessible, should I begin building a learning product centered on these models?

I asked Raphael and Whooo’s Reading CTO Gilles Ferone to share their thoughts on these questions:

What exciting opportunities can we leverage in the literacy space?

“If you have a product where you frequently release new content, the process of writing thoughtful questions can eventually become time-consuming and expensive. Generative models like ChatGPT could be leveraged to automatically create questions based on the content you are showing students. For example, if your product has hundreds of hours of video content or thousands of articles, you could in theory use a generative model to produce questions for you automatically. Technically, you do not need a generative model to achieve this. At Whooo’s Reading, we have created a specific model that is able to identify which sentences in a piece of content carry the most meaning, and we have used that model to automatically identify promising questions to ask students.”

“We think that generative models are great tools for ‘augmenting’ content creation tasks, but they aren’t suited for unsupervised use. For example, leaving a student to interact with an unvetted AI-generated question could be a problem, since nonsensical questions only serve to confuse the student.”

“Perhaps the area where we are most excited about leveraging generative models is in helping scaffold various learning skills for students. Using generative models, we can take what students have produced as input and then guide them, using their own original work, along a roadmap for developing it further. This type of feature would certainly save teachers a lot of time and help accelerate students’ growth with richer and more personalized feedback.”
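Whooo’s Reading has not published how its sentence-importance model works, so the following is only a hedged illustration of the general idea of ranking sentences by how much meaning they carry. It swaps in a classic frequency-based extractive heuristic from text summarization: a sentence scores higher when its content words recur throughout the passage. The stopword list, the function names (`split_sentences`, `content_words`, `rank_sentences`) and the sample passage are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; production systems use larger ones).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "that", "this", "for", "with", "as", "by", "at",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def rank_sentences(text: str, top_k: int = 2) -> list[str]:
    """Return the top_k sentences whose content words recur most across the passage."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average rather than total frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable (even with reverse=True), so tied sentences keep their original order.
    return sorted(sentences, key=score, reverse=True)[:top_k]


passage = (
    "Bees pollinate many crops. "
    "Without bees, crops that depend on pollination would fail. "
    "My cousin likes honey. "
    "Farmers rent bees to pollinate their crops each spring."
)
for sentence in rank_sentences(passage):
    print(sentence)
```

On the sample passage, the off-topic sentence about honey ranks last. In keeping with the point above about unsupervised use, the top-ranked sentences would be a starting point for a teacher-vetted question, not a question shown directly to students.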

What problems could generative models create in the classroom?

“I think the issue that most of us intuitively think about is how generative models like ChatGPT could make it much easier for students to cheat. We can imagine a tech-savvy student inputting the essay prompt and getting an essay back that is at the very least a solid rough draft, and at worst a nearly finished product.”

“This is a valid concern. First, it is worth noting that the writing of younger students looks considerably different from what ChatGPT produces. We know this because we have seen tens of millions of student-written responses, and their writing rarely matches the type of writing you’d see from a generative model trained on content written by adults who happen to be good writers. During our first NSF SBIR grant, we worked with 20-plus teachers to analyze student writing from 1st grade to early high school and to develop an 18-part rubric for each grade level, and we can assure you that the writing of students in elementary through early high school looks categorically different from writing produced by ChatGPT. So a teacher should quickly be able to see when the writing doesn’t match what their students usually produce. This can be built into AI through a ‘confidence score’ of whether the writing appears to have come from a student or a generative model.”
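The interview does not describe how such a confidence score is computed. As a rough, hypothetical sketch of the underlying intuition (a new response that doesn’t match what a student usually produces), the code below compares two simple style features of a new response against the same student’s prior responses and reports how many standard deviations away it sits. This is a heuristic deviation score, not a calibrated probability; a real detector would need a trained and validated classifier. The features, the 0.25 spread floor, the 3.0 threshold and all function names are assumptions for illustration only.

```python
import re
import statistics


def style_features(text: str) -> tuple[float, float]:
    """Return (average words per sentence, average characters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    return (len(words) / len(sentences), sum(len(w) for w in words) / len(words))


def style_deviation(new_text: str, prior_texts: list[str], min_spread: float = 0.25) -> float:
    """Mean absolute z-score of new_text's features relative to the student's own history.

    min_spread floors each standard deviation (an arbitrary assumption) so a short or
    very uniform history does not make every small difference look extreme.
    """
    if len(prior_texts) < 2:
        raise ValueError("need at least two prior responses to estimate spread")
    history = [style_features(t) for t in prior_texts]
    new = style_features(new_text)
    z_scores = []
    for i, value in enumerate(new):
        column = [features[i] for features in history]
        spread = max(statistics.stdev(column), min_spread)
        z_scores.append(abs(value - statistics.mean(column)) / spread)
    return statistics.mean(z_scores)


def looks_unlike_student(new_text: str, prior_texts: list[str], threshold: float = 3.0) -> bool:
    """Flag a response for teacher review; the 3.0 threshold is illustrative, not calibrated."""
    return style_deviation(new_text, prior_texts) > threshold
```

Comparing against each student’s own history, rather than a single global cutoff, reflects the point above that teachers notice when writing doesn’t match what their students usually produce. A flag like this would only prompt a teacher to look more closely, not serve as proof of cheating.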

Now that models like ChatGPT are accessible, should educators/education companies/founders begin building a learning product centered on these models?

“While models like ChatGPT provide some remarkable opportunities for scaffolding skills using students’ own work as the input, at this stage (and, clearly, AI is evolving very quickly), we have a hard time imagining a product built around ChatGPT addressing the needs of a K-12 classroom.”

“Services like OpenAI allow users to generate coherent text based on a prompt. However, they are very generic, hard to troubleshoot when things go wrong, not easily customizable, and overall hard to audit.”

“OpenAI’s general approach will yield the following four problems, as we know from working with teachers across nearly 30,000 schools.

Evaluating an answer against specific criteria. We have learned through six years of AI evaluations that teachers want to know the reason behind a score; without more detail, they will be skeptical of that score. Rubrics provide the solution, since they allow a score to be explained in terms of skills that are relevant to the task. A generative model is not set up to do this.
Generating feedback ordered by importance. We have learned that providing feedback is complex. You don’t want to overwhelm students with feedback, and there is an ideal sequence that makes feedback most constructive. Thoughtful sequencing and prioritization require a more custom approach.
Outputting confidence intervals. As we have learned, educators want a transparent AI. Confidence levels both build trust with teachers and help minimize showing students false-negative feedback.
Personalization. We anticipate that it will be hard for generic API services to adapt their output predictively to the grade level or level of the student, which means that evaluation and feedback would not be differentiated and adaptive.”
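To make these four requirements concrete, here is a minimal sketch of the kind of output structure they imply, assuming some upstream model (hypothetical, not shown) has already produced a score and a confidence for each rubric criterion. It explains a total score criterion by criterion, orders student feedback by priority, caps how many items a student sees, and withholds low-confidence items. A per-criterion confidence cutoff stands in for the confidence intervals mentioned above, and the priority field is where grade-level personalization could plug in. This is not Whooo’s Reading’s system; the names `CriterionResult`, `explain_score` and `student_feedback`, the sample rubric, and the 0.8 and 2 defaults are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One rubric criterion, scored by some upstream model (hypothetical, not shown)."""
    name: str
    score: int
    max_score: int
    confidence: float  # upstream model's confidence in this score, 0.0 to 1.0
    priority: int      # lower = more important; could be set per grade level
    feedback: str      # message shown to the student if the criterion is not met


def explain_score(results: list[CriterionResult]) -> str:
    """Total score plus a per-criterion breakdown, so a teacher can see why."""
    total = sum(r.score for r in results)
    possible = sum(r.max_score for r in results)
    lines = [f"Total: {total}/{possible}"]
    lines += [f"  {r.name}: {r.score}/{r.max_score}" for r in results]
    return "\n".join(lines)


def student_feedback(
    results: list[CriterionResult], min_confidence: float = 0.8, max_items: int = 2
) -> list[str]:
    """Feedback on unmet criteria, most important first, capped so students aren't overwhelmed.

    Criteria scored below min_confidence are withheld rather than risk showing wrong feedback.
    """
    unmet = [r for r in results if r.score < r.max_score and r.confidence >= min_confidence]
    unmet.sort(key=lambda r: r.priority)
    return [r.feedback for r in unmet[:max_items]]


sample = [
    CriterionResult("Answers the question", 2, 2, 0.95, 1, "Answer the question directly."),
    CriterionResult("Uses text evidence", 0, 2, 0.90, 2, "Add a detail from the text that supports your answer."),
    CriterionResult("Explains reasoning", 1, 2, 0.60, 3, "Explain how your evidence supports your answer."),
    CriterionResult("Conventions", 1, 2, 0.85, 4, "Start each sentence with a capital letter."),
]
print(explain_score(sample))
print(student_feedback(sample))
```

In this sample, the reasoning criterion is left out of the student’s feedback because its confidence (0.60) falls below the cutoff, while the teacher still sees it in the score breakdown.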

“In principle, you could build a feedback engine on top of a generative model, but we would argue that it would require a very convoluted system to achieve a very limited purpose. Again, it would be hard to interpret the system’s results, and you would have no guarantee that the feedback is consistent. Nor would the feedback be tailored to a student’s grade or academic level.”

Conclusion

GPT-3.5 has come a long way from GPT-3, so perhaps GPT-4 will take us closer still to producing a full, comprehensive rubric for education. However, these models are by nature very hard to control, interpret and develop in-house. Their massive development costs have left most such models in the hands of conglomerates like OpenAI/Microsoft, Google, Meta, Alibaba and Baidu, whose aim is generality. We may one day see such models produced specifically to support educational applications, but from my work with Whooo’s Reading, we believe that is still many years and millions of dollars away. Whooo’s Reading has been able to achieve its results with a model footprint that is a tiny fraction of GPT-3.5’s (OpenAI has not published GPT-3.5’s parameter count, but its predecessor GPT-3 has 175 billion parameters). As Raphael said, “the current generative models are like going grocery shopping with an airliner. It may not land precisely in the parking lot and it burns a heck of a lot of resources along the way.”

While Whooo’s Reading was not built on ChatGPT, much of my analysis above of the market for AI start-ups applies to it as well. So it should come as no surprise that this past week we ultimately chose to sell the company to Savvas Learning, one of the leading publishers in K-12 education. It was an honor to advise Raphael and Gilles in this transaction, and I hope this post helps other entrepreneurs and educators considering AI.EDU.

Written by Christopher Nyren