Synthetic Content

Przemek Chojecki
Jun 17 · 5 min read

We have entered the age of fake news and deepfakes. It is harder than ever to find a useful piece of information among millions of websites with irrelevant or simply wrong content. How bad can it get, and are there any upsides?

This dog doesn’t exist

Fake News vs Synthetic Content

As an AI entrepreneur and a scientist, I follow machine learning research on a daily basis. With the recent outcry about fake news and deepfakes, I wanted to test what is really possible if you generate an entire website, with every piece of content on it, by artificial intelligence. The whole process, which I describe below, led me to the concept of synthetic content: content that is made purely through AI and machine generation.

First of all, not all synthetic content is fake news, and not all fake news is synthetic. Secondly, it is almost impossible to determine whether a given piece of content is synthetic, especially if it was generated in a narrow knowledge domain. Hence the basic criteria for evaluating a piece of content should be its quality and whether it is true.

If you think about it, synthetic content is not necessarily bad. Imagine synthetic research: new scientific discoveries made by machines, which would enrich our civilisation and boost our growth. This is the good side, and it should be the true goal of building complex AI systems.

The bad side is that you can synthetically create fake news, misinformation or spam on a scale never seen before. And this is what this article is about. We have not yet reached machine learning 2.0, that is, machine learning combined with logical reasoning, which would allow us to boost our scientific understanding via AI research or AI research assistants. However, I believe that we are now able to create new pieces of content, regardless of whether they are true or false, on a massive scale and in a form that the human eye cannot distinguish from human writing.

To test this hypothesis, I decided to use state-of-the-art text and vision machine learning models to create two websites on popular subjects purely automatically. No content was to be added by me or other humans; everything had to come from AI, even the website itself, on every single level.

I chose to create two separate websites: one about a healthy lifestyle and the other about money. According to various statistics, both are among the most popular topics searched on the Internet.

1. WallStreetHack.com

This website was to be about money: earning, saving, insuring, taking out a loan. It covers everything from advice on how to get rich to texts about the best loans and mortgages, written by experts in their fields (who don’t exist). ‘Loan’, ‘mortgage’ and ‘insurance’ are among the most googled keywords and the highest-paid ads, so this choice was obvious.

2. PerfectLifeHack.com

A large part of the Internet is about selling goods, especially those related to being fit and healthy. Beauty products advertised by celebrities, lotions for quick weight loss, powders for growing your muscles, you name it. Skilled affiliate marketers earn millions of dollars through well-played campaigns and ad management. So a website with beauty, health and fitness advice seemed like another obvious choice.

Automating content production

After deciding which sites I wanted to create, the rest was about setting up a machine to deliver the content I wanted. The challenge was to make it automatic on every single level. I bought the two domains and installed WordPress manually (with a little effort that can be automated too); everything after that was automated and written in Python. Roughly, it consists of four components:

  1. Scraping and organizing the most googled questions on the topics I had chosen.
  2. Generating a short text based on a question from the scraped database.
  3. Generating an image to accompany the generated short text.
  4. Putting it all together and posting it on WordPress.
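
The four components above can be sketched as a small Python skeleton. This is only an illustration of how such a pipeline might be wired together, not the code behind the two websites: the models are deliberately not named in this article, so the text generator, the image generator and the publishing step are passed in as placeholder callables. All names here (Post, organize_questions, build_post, to_wordpress_payload, run_pipeline) are hypothetical. The payload fields title, content and status follow the WordPress REST API for posts, which is an assumption about how the posting step could be done.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    """One generated article: the source question, the text and an image."""
    question: str
    body: str
    image_path: str


def organize_questions(raw: Iterable[str]) -> List[str]:
    """Step 1 (organizing only): normalize whitespace, drop blanks and
    case-insensitive duplicates. Scraping itself is out of scope here."""
    seen = set()
    result = []
    for question in raw:
        cleaned = " ".join(question.split())
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


def build_post(
    question: str,
    generate_text: Callable[[str], str],
    generate_image: Callable[[str], str],
) -> Post:
    """Steps 2 and 3: generate a text from the question, then an image
    from the text. Both generators are placeholders for real models."""
    body = generate_text(question)
    image_path = generate_image(body)
    return Post(question=question, body=body, image_path=image_path)


def to_wordpress_payload(post: Post) -> dict:
    """Step 4 (first half): shape the post for publishing.
    Saved as a draft so nothing goes live by accident."""
    return {"title": post.question, "content": post.body, "status": "draft"}


def run_pipeline(
    raw_questions: Iterable[str],
    generate_text: Callable[[str], str],
    generate_image: Callable[[str], str],
    publish: Callable[[dict, str], None],
) -> int:
    """Run all four steps and return the number of posts handed to publish."""
    count = 0
    for question in organize_questions(raw_questions):
        post = build_post(question, generate_text, generate_image)
        publish(to_wordpress_payload(post), post.image_path)
        count += 1
    return count
```

With stub generators (for example, a lambda that returns a fixed string) the whole loop runs without any model or website, which makes it easy to test the plumbing separately from the expensive parts.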

Long story short, technical details aside: the project succeeded! You can see the results on those two websites: WallStreetHack.com and PerfectLifeHack.com. And while you skim through those sites, remember that none of this content was written by a human, and none of the people, animals or vegetables depicted in the images exist in the real world. It is all artificial, generated by AI. Judge the results for yourself. Disclaimer: I don’t take any responsibility for the advice found on those websites. Please don’t follow it; find another source of information.

In the end, what I got was a perpetuum mobile: a machine for continuous content creation on any topic I want, which needs no further supervision. In other words, a flood of content coming in quantities limited only by computing power. With the limited power I gave it, creating a single blog post with relevant pictures and posting it on a website took up to a couple of minutes. With the simpler models I tested, and lower quality of text and images, the same process took less than a second. If you were to put the whole machine on a sufficiently powerful cloud, you would be able to create hundreds of new genuine websites with unique human-level content every single hour. That means millions of articles per day if you perform the computations in parallel.
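
To make the scale concrete, here is a back-of-the-envelope calculation in Python. It is a rough sketch under stated assumptions: "a couple of minutes" is read as 120 seconds per post, "less than a second" is bounded by 1 second, and scheduling, network and publishing overhead are ignored. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def posts_per_day(seconds_per_post: float, workers: int = 1) -> int:
    """Posts produced in 24 hours by `workers` parallel machines,
    each finishing one post every `seconds_per_post` seconds."""
    return int(workers * SECONDS_PER_DAY // seconds_per_post)


def workers_needed(target_posts_per_day: int, seconds_per_post: float) -> int:
    """Smallest number of parallel workers that reaches the daily target."""
    per_worker = SECONDS_PER_DAY / seconds_per_post
    return math.ceil(target_posts_per_day / per_worker)


# Assumed: 120 seconds per post with the heavier models.
print(posts_per_day(120))              # 720 posts per day on one worker
print(workers_needed(1_000_000, 120))  # 1389 workers for a million per day

# Assumed: 1 second per post with the simpler models.
print(workers_needed(1_000_000, 1))    # 12 workers for a million per day
```

Under these assumptions, a million articles per day takes about 1,400 parallel workers with the slower models and only about a dozen with the faster ones.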

Depending on your point of view, that’s either fascinating or terrifying. It’s fascinating if you believe that it shows how much progress we have made and how much more there is to come. It’s terrifying if you are scared about the potential malicious use of those algorithms to produce smart spam on an enormous scale. For those reasons I have decided not to share which models I used. That said, if you are a researcher in machine learning, you shouldn’t have a problem figuring them out or building similar ones. With time this knowledge will become more widespread, and thus we should prepare for a world with synthetic content available on a massive scale. Let us ensure that it will be of good quality and fun to read.

Summing up, if you’d like to discern synthetic content from human-created content, I have bad news. It might not be possible at all. Synthetic content has passed the point where it can be distinguished from human work, and there’s no reason to believe that we will be able to tell the difference between the two. On the other hand, it doesn’t really matter, because what is important is whether a given piece of content brings any value to humanity, whether there’s an original thought or point of view in it. And toward this goal, AI will provide us with more valuable content, showing novel perspectives and compilations of ideas.

This is the future that is already here.

The Startup

Written by Przemek Chojecki, AI entrepreneur with a PhD in mathematics, Forbes 30 under 30

