[This is a long read. I was also interviewed about this essay recently on the exceptional Macrovoices podcast, if you’d like to listen.]
On the 22nd of August 2022, Skynet went online and started learning at a geometric rate.
At least, that’s what I’m sure it felt like for graphic illustrators.
On that day, Stable Diffusion, a deep learning text-to-image model, was released. Like many others, I downloaded it and started playing with it.
You’d type in a sentence like “man riding a motorbike, being chased by a bear”. What came out, at least for me, looked more like a nightmare- artistic yet often horrific images of people with extra body parts and warped faces in dream-like scenes. Reminiscent of a Picasso if drawn by Salvador Dali, the fusion of people and objects was striking, yet unnerving.
There was however something truly breathtaking about the software’s uncanny ability to manifest any typed-in concept into an illustration.
I’d stuff around for hours messing with the parameters: positive and negative keywords, the number of steps to take in generation and the strength of the prompt. Occasionally, on a re-roll, you’d get something that would surprise you.
Something magical was clearly going on under the hood.
Browsing the Internet, some people had become experts in communicating the correct incantations to produce coherent images. I’d cut and paste in modifiers like octane render, 4k, hyperrealistic and fiddle with the number of generation steps in an attempt to get something out of the software, without much success.
It was clear that illustrators’ jobs were safe.
What was particularly novel about Stable Diffusion was that the code was open source. For years I had heard rumblings about secret breakthroughs in the bowels of the big Silicon Valley technology companies, and occasionally they would show something off.
Now that some code was out there, public innovation exploded.
Two and a half months later, Lensa, an AI photo editing app powered by Stable Diffusion, launched “magic avatars”, which took the world by storm. Anyone could upload a couple of images of themselves and generate crazy avatars. The app went viral as downloads went limit up.
Then Midjourney released version 4, and all hell broke loose.
The uncanny valley is a term describing the unease and revulsion provoked when a robotic object’s imperfect resemblance to a human being feels strangely, uncannily familiar.
Like Lensa, Midjourney was trained on the LAION-5B dataset of 5.85 billion images with text descriptions scraped from the internet. LAION-5B was 14x bigger than LAION-400M, the previous largest openly accessible dataset.
Something in the step up in scale had enabled Midjourney to well and truly cross the uncanny valley.
No longer images from a nightmare, suddenly the software output stunning, photorealistic images of anything.
An arms race began in illustration tools. Midjourney was ahead, but the gap was narrowing rapidly. What made Midjourney the tool of choice was that the images it produced had a beautiful, fantasy element to them.
The style was very similar to that of a Polish illustrator by the name of Greg Rutkowski. That’s because many of the AI design apps had used greg rutkowski as a default keyword. Greg was highly popular with the geeks for designing art for Dungeons & Dragons, and Magic: The Gathering.
Nek minnit, magic keywords were being shared all over the net on how to get the most extraordinary images out of any of the tools: stunning quality, octane render, trending on artstation, highly detailed, by greg rutkowski.
It takes Greg typically 20 to 40 hours to illustrate a scene. Using Midjourney, I got the AI to do similar work in about 10 seconds: “an extremely detailed matte painting of the dragon deathwing breathing fire, very detailed, beautiful, intricate, cinematic, artstation, greg rutkowski — v 5.1 — q 2 — s 250 — ar 15:11”.
One can only imagine how Greg felt- likely amusement, then horror. Among the 10 million illustrations on Lexica, almost 100,000 had used his name as a keyword to imitate his style. Interviewed, he remarked “[..] the terrifying thing is that it’s really hard to see the future for ourselves”.
The AI isn’t simply cutting and pasting parts of Greg’s art together. Rather, it has effectively learned to draw by examining billions of images, a process that likely included all of Greg’s art available on the internet.
Creatives often say that they are inspired by something.
The AI is inspired by everything.
As likely the most famous artist in the world right now, Greg undoubtedly has secure work for some time to come. But for less renowned artists- despite years spent meticulously mastering their craft- the prospect of commanding western wages for a week’s work on a single illustration may suddenly pose a significant challenge.
At present, it seems unlikely that an end client who is not a designer would directly use these AI tools. Outside of art for art’s sake, it’s rare that an illustration can simply be taken and applied to a product or service as is. For most practical uses, at least for now, to commercially use that illustration will still require a designer to do post production.
However, I can envision that graphic design teams which once incorporated illustrators within their ranks might now choose to forego the expense of a highly skilled specialist. The operator of the tooling would still need a sound understanding of how to communicate with the AI effectively, but this role could be filled by a junior graphic designer- someone with an artistic eye but limited experience in actual illustration.
To do that effectively, the operator needs to know the language and the theory of the profession, in this case design. The better one knows how to communicate with the software, the better the outcome.
It’s no wonder then that educational material for Midjourney focuses on the difference between a 35mm wide angle lens and 200mm telephoto lens, the effect of a colour gel, circular polarising filter, cinematic prompts, photography techniques and shot types.
Effectively, to be competitive in a world of AI, the illustrator needs to move “up the stack”. Instead of being a designer pushing pixels around the screen, one has to now be the director, or cinematographer of the scene.
The theory arguably can be learned quicker and by a wider range of people than the actual applied skill of illustration.
The net result is that anyone can now illustrate at an elite level, dramatically lifting the pool of skilled talent in the world. That pool can now deliver exceptional illustrations at a dramatically lower unit cost- in seconds or minutes instead of half a week to a week’s work, without needing to be trained for years in the art of illustration.
For a designer that is a jack of all trades, productivity skyrockets.
For a specialist illustrator, the world is now challenging. For many clients, there’s a chance that a specialist role won’t be needed anymore or the amount clients are willing to pay for work will be substantially less.
During the Renaissance, Baroque, and Victorian eras, portraiture was one of the main forms of artistic expression, with artists paid commissions by wealthy patrons. The invention of cameras didn’t put painters out of business, but certainly made it challenging for the average artist to generate a substantial income from portrait painting.
Technology has, however, dramatically increased the opportunities for artists with the advent of the Internet and a global art market. Today there are more artists working in a greater variety of styles and mediums than there were 500 years ago.
Thus, I predict that tier-1 illustrators will keep their jobs, for now, but the market for such work will shrink. There will always be room for pushing the envelope. As Dali said, “those who do not want to imitate anything, produce nothing”.
Middle-tier illustrators that understand the language and business of design, and are prepared to become jacks of all trades, adapt and move up the stack, will become more productive, delivering higher quality work at a lower unit cost.
This will be very unsettling for many, as they’ll be performing an essentially different role and competing with others that have less training or skill. As with the advance of many forms of technology, it is often the highly skilled middle class that loses out.
Unlike the mechanisation of many industries, here low-skilled or totally unskilled talent are the great winners, together with the businesses that consume the work. On average, dramatically higher quality work can be delivered astronomically faster and cheaper.
I see a number of parallels between the latest advances in AI and the emergence of online freelancing, particularly with talent from emerging markets that produces highly skilled work and exceptional customer service while charging substantially lower rates.
My company, Freelancer, has 68 million skilled professionals in 247 countries, regions and territories. Just like AI, that talent can deliver work to an extremely high quality faster, at a dramatically lower cost. Faster because the work can be crowdsourced, where freelancers compete in parallel to win a prize. Put $20 or $30 into a contest on Freelancer for a logo or simple design, and you’re likely to get hundreds of entries. Around 91% of contests get the first entry within an hour.
Those freelancers are now AI powered. We recently ran a contest to reimagine Harry Potter for $250. We received 647 entries competing for that prize, and take a look at the results.
The emergence of a highly skilled, low cost cloud workforce did not wipe out western graphic designers. It did, however, substantially change the nature of their work. Back in the early 2000s, the bread and butter of every graphic designer I knew was logo design. For roughly $2,000 they’d give you maybe half a dozen variations at best, then try to upsell you business cards and stationery. On Freelancer, all that can be done for about $10 per item.
Today, western graphic designers are one of the power users of Freelancer- however they’re on the client side. They’re getting websites and apps developed, despite the fact that they can’t code, by hiring freelancers to do the development for them. No longer acting as lower stack service providers, they’re building startups and businesses. Design is an inherently creative field and there’s a clear path to creative direction at a higher level, product management and entrepreneurship. Some of those designers are now making substantially more money as business builders than flinging logos. They’ve moved up the stack.
Over the first quarter of 2023, when Midjourney and Stable Diffusion were all over the headlines, the number of illustration jobs on Freelancer increased 18%, to 33,593 jobs. The freelancers were extremely fast to adopt AI tooling: jobs specifically requiring AI tools such as ChatGPT, Dall-E and Midjourney in the brief’s skills grew 325% in 1Q23 over the previous quarter. High quality design is now more accessible, at a dramatically lower cost, to anyone.
Generative AI is a broader term that encompasses all AI models that can create new content such as text, images or video. The design industry must have sacrificed a goat under a full moon at midnight as the improvements in the design tools by using generative AI have been truly voodoo.
Adobe Firefly enables you to “add, extend, or remove content from your images non-destructively, using simple text prompts to achieve realistic results that will surprise, delight, and astound you — in seconds”.
Firefly and Midjourney now allow you to zoom out from images, with the AI filling in segments that are coherent to the image, but do not actually exist.
Every film ever made is about to have an IMAX and VR version.
Other features are totally nuts, like letting you drag anything around in a still image, like opening a lion’s mouth:
Missed capturing the perfect photo at an event? Don’t worry. We’re basically at the point where you’ll be able to fabricate any image you desire, from any viewpoint, tweaking facial expressions or poses even after the event has passed.
The rate of progress since then in generative AI is truly astonishing, and the leaps seem to be getting better and better.
The pivotal moment however, was what happened to the AI models between Midjourney v3 and v4. I don’t think anyone can say that this was a linear improvement. That step through the uncanny valley was a giant leap, a leap that at least for me, I couldn’t see coming.
To understand what is actually going on, we need to delve under the hood.
Something magical and incomprehensible, at least to the human mind, is starting to happen with AI.
The breakthrough in generative AI was the Transformer, an innovation by Google researchers that allowed artificial intelligence models to train on vast amounts of data.
Large language models (LLMs) are advanced artificial intelligence models trained to understand and generate human-like text.
At their core, given some input, all these models do is predict the next most likely bit: given a sentence, the AI predicts the next most likely word, then the next after that, and so on.
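The mechanic of next-word prediction can be illustrated with a toy bigram model- a sketch only, since real LLMs use neural networks trained on terabytes of text rather than frequency counts over a sentence:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny
# corpus, then repeatedly emit the most likely successor. The principle
# is the same as an LLM's objective; only the machinery differs.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

successors = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    successors[word][next_word] += 1

def predict_next(word):
    """Return the most frequent successor of `word` in the corpus."""
    return successors[word].most_common(1)[0][0]

def generate(start, length):
    """Greedily chain predictions to 'write' a sentence."""
    words = [start]
    for _ in range(length):
        words.append(predict_next(words[-1]))
    return " ".join(words)

print(generate("the", 4))
```

Even this crude version produces locally plausible text; scale the "counting" up to a trillion-parameter network and the internet, and the surprising behaviours described below begin to appear.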
Feed a large amount of data into the training of these models and you get results that might impress the average person, but that still look like they came out of a computer.
The output looks OK, but of course a human can do better.
Feed in an extremely large amount of data, say 10% of the public internet, and dark juju magic emerges.
That’s because the innovation of the Transformer essentially enabled LLMs to not get lost when looking across large amounts of data. This allowed the AI models to scale dramatically, consuming astronomical amounts of data and getting better the more data they trained on.
It appears, remarkably, that these improvements are superlinear.
That is to say, emergent intelligent behaviours are arising that we don’t understand and didn’t predict. They are discovered, not programmed into the model. In certain instances, they are only found after the model has been made public. Many abilities are emerging.
As a practical example, a neural network of a certain model scale — defined by the size of its training data corpus or the computational power used in its training — might stumble with problems like doing math with Roman numerals, translating text into the International Phonetic Alphabet, or deciphering scrambled words (e.g. “elhlo” into “hello”). But as the training data or compute power goes up orders of magnitude, suddenly, and mysteriously, the neural network can solve those problems.
Take a look at how modular arithmetic, IPA transliteration and word unscrambling suddenly emerged with model scale:
“You train these models on all of the internet, so it’s seen many different languages, but then you only train them to answer questions in English. So it’s learned how to answer questions in English, but you increase the model size, then you increase the model size, and at some point, boom, it starts being able to do questions and answers in Persian.
No. One. Knows. Why.”
With model scale, the AI is leaping the uncanny valley across an ever growing and astonishing range of tasks. It can answer any question, speak any language, write exceptional copy, draw illustrations indistinguishable from photos, and is starting to write software at the elite level.
Any job that uses a computer it can probably already do better than you.
The most famous example of this phenomenon is ChatGPT, the fastest growing consumer application in history, reaching 100 million users in two months. ChatGPT is a simple chatbot interface on top of GPT, an acronym for Generative Pre-trained Transformer, a series of models developed by OpenAI.
ChatGPT is “notable for enabling users to refine and steer a conversation towards a desired length, format, style, level of detail, and language used. Successive prompts and replies are taken into account at each stage of the conversation as a context”.
ChatGPT was unleashed upon the world only in November of 2022.
GPT had been around for a number of years before ChatGPT, but was only really of interest to computer scientists. GPT-1 launched in 2018; GPT-2, in February 2019, scaled that up in both parameter count and training dataset size by a factor of 10. GPT-3, launched in May 2020, was a large scale-up from 1.5 billion to 175 billion parameters, trained on an 800GB dataset.
Much like when Midjourney reached v3, something unpredictable started to happen with GPT-3. It suddenly started to get quite good.
In March through November 2022, a series of models were released that became called “GPT-3.5”. ChatGPT was thrown together quickly for consumers, rather than through programming APIs which required a software developer to access. The result took the world by storm.
Not done, OpenAI released GPT-4 in March 2023, reportedly taking the model from 175 billion to 1 trillion parameters.
With this model scale, the uncanny valley was crossed.
ChatGPT went from quite good to incredible.
ChatGPT now scores in the top 1% in verbal in the Graduate Record Examination (GRE), required to enter US graduate school, the top 7% in the SAT for college admissions, the top 10% in the Uniform Bar Exam to become a lawyer, the top 12% in the LSAT to enter the Juris Doctor or Masters of Law programs, and the top 15% in advanced placement statistics, art history, microeconomics, psychology and biology.
Undoubtedly soon, it will beat everyone in any conventional academic test that can be thrown at it.
ChatGPT has better than 90% accuracy on causal direction detection (does A cause B?) tests from atmospheric science to zoology- for example, whether photosynthetic photon flux density affects the net flux of CO2 in a forest ecosystem. I doubt many people reading this essay can answer that. ChatGPT can.
It might surprise you that GPT-4 is barely four months old.
If anyone needs to understand just how fast AI is advancing, consider the state of play published in A Brief History of AI in January 2021. While ‘solved’ might be better expressed as ‘better than human’, the progress has been astonishing.
Given what’s been ticked off this list, are we that far from the last item?
Elon would know.
So what exactly is going on inside these AI models?
It turns out that predicting the next word in a sequence, highly reliably and convincingly, requires a very good understanding of the world.
In fact, LLMs gain “a shocking degree of understanding about the world and the subtleties through text”, said Ilya Sutskever, Chief Scientist of OpenAI.
Indeed Sutskever is firmly of the belief that LLMs are doing more under the hood than mere word prediction, “I claim that our pre-trained models know everything that they need to know about the underlying reality. They already have this knowledge of language and also a great deal of knowledge about the processes that exist in the world that produce this language.”
“The thing that large generative models learn about their data, in this case large language models about text data are some compressed representations of the real world processes that use this data, which means, not only people, and something about their thoughts, something about their feelings, but also something about the condition that people are in and the interactions that exist between them. The different situations a person can be in. All of these are part of that compressed process that is represented by the neural net to produce the text. The better the language model, the better the generative model, the higher the fidelity, the more, the better this, the better it captures this process.”
Thus, in order to predict the next word in a sentence at a level that appears as good as, or better than, human, these AI models need to build an extraordinary understanding of the world.
Those understandings are leading to what we perceive as emergent abilities, which start to materialise at a certain scale.
Such an example is the theory of mind, which is the ability to attribute mental states- such as beliefs, intents, desires, emotions, knowledge- to oneself and to others, and to understand that others have beliefs, desires, intentions, and perspectives that are different from one’s own. It’s essentially the ability to infer what others might be thinking or feeling.
Consider this screenplay that I got ChatGPT to write for an episode of Seinfeld about generative AI- the mental states of each of the characters needs to be understood by the AI in order to be believable:
Similarly, think about the understanding that Midjourney needs to have about the world in order to accurately render the textures, lighting and shadows, focal length and other artifacts of the camera lens in a highly believable way for a still from that screenplay (ChatGPT wrote a prompt for Midjourney off the script, hence the characters differ from the series):
GPT-4 knows how to stack “a book, nine eggs, a laptop, a bottle and a nail” stably, an ability that wasn’t in GPT-3.5 but emerged in GPT-4.
Hidden away in the weights for the matrix multiplications, deep in the algorithm, appears to be a very detailed understanding of the physical world and human mind.
Andrew Ng agrees “I believe that LLMs build sufficiently complex models of the world that I feel comfortable saying that, to some extent, they do understand the world”.
He points to Othello-GPT, which trained a model on sequences of moves from Othello, a board game in which two players take turns placing game pieces on an 8x8 grid (e.g. d3 c5 f6 f5 e6 e3…, where each pair of characters corresponds to a grid reference on the board).
The model only saw moves, nothing else, no rules or board. Not only did it learn to play pretty well, but the authors demonstrated convincingly that the neural network’s hidden-unit activations appeared to capture a representation of the current board position as well as available legal moves.
“Rather than being a “stochastic parrot” that tried only to mimic the statistics of its training data, the network did indeed build a world model”.
No wonder these emergent abilities have startled the AI’s creators.
Perhaps “intelligence is an emergent property of physics” as Sam Altman philosophises. After all, what else could intelligence be emergent from?
Abilities that seemed many decades away now are astonishingly near.
So startled, in fact, that computer scientists are wondering if we’re starting to see sparks of Artificial General Intelligence (AGI).
Dr. Paul Christiano is the inventor of Reinforcement Learning from Human Feedback (RLHF), a way of training AI models. RLHF is surprisingly effective given the minimal feedback required from humans- for example, showing two images and asking whether the left one is better than the right.
Dr. Christiano remarked “We took a model, we took a bunch of cases. We messed with the weights of this model until it did really well on the 100 billion cases that we considered, and now we wonder what’s it going to do in some new case. In like a case where, for example, our models do have the opportunity to cause incredible harm, or could be able to get a high reward by causing incredible harm. The scary thing is you have no idea what the- we kind of understand how gradient descent works- it takes you to something that works really well in the 100 billion cases that you test it on.”
“But we have no idea how the resulting model works. The resulting model is basically like, you know, 150 matrix multiplies. You multiply by a big matrix, then you apply a nonlinearity, then you multiply by a big matrix again, then you apply a nonlinearity, and we’re just like.. We have no idea what any of the numbers in any of those matrices mean. And that’s not totally true, I think we have some idea what some of the numbers mean. But at a high level, if you take interesting behaviours, take a behaviour of GPT-4 that does not appear in GPT-2 say, ‘I think for essentially every such behaviour, we do not understand how GPT-4 is able to do that thing’.”
“We understand some simple things and we don’t understand most of the complicated behaviours. […] I think the scale in which you can imagine it occurring over the coming years, is similar in magnitude to the scale we’ve observed in the last five years. If someone is giving a confident take about where that ends up- like these AI systems can’t do X or can’t do Y, I really want them to get more precise about why they think that and what exactly they’re saying. I am extremely skeptical of someone that is confident that says if you took GPT-4 and scaled up two orders of magnitude of training compute and then fine tune the resulting system using existing techniques that we know exactly what would happen.”
“If it was inclined it would be capable enough to effectively disempower humans, and like a plausible chance that it would be capable enough that you start running into these concerns about controllability”.
One explanation for emergent abilities is that at a certain level of scale (in terms of training data and number of model parameters), the AI becomes able to effectively model a certain emergent behaviour in the ‘latent space’- a fancy way of saying that the model has enough horsepower behind it to just ‘figure out’ how to do something.
A gross simplification of this is to consider a symbolic system.
In a symbolic system, knowledge is represented through abstract symbols. These symbols could stand for real-world objects, events, categories, or abstract concepts. Relationships between symbols are also encoded, often in structures like lists, trees, or graphs.
Learning, or inference, happens by applying rules to manipulate these symbols. In a primitive example, to drive a car, one might have:
- CAR: The vehicle being driven.
- STEERING_WHEEL: The device used to control the direction of the car.
- ACCELERATOR: The pedal used to speed up the car.
- BRAKE: The pedal used to slow down or stop the car.
- SIGNAL: The device used to indicate turning or changing lanes.
- ROAD: The surface the car is driving on.
- OBSTACLE: Any object that might be in the car’s path.
Then, one could define rules that describe the relationships between these symbols and the actions that should be taken in different scenarios:
- IF the ROAD is clear ahead AND destination is to the right THEN turn the STEERING_WHEEL to the right.
- IF the OBSTACLE is ahead THEN press the BRAKE.
- IF the ROAD bends left THEN turn the STEERING_WHEEL to the left.
- IF intending to change lanes THEN use the SIGNAL.
While this approach might work in theory, in practice it would be very difficult to define all the necessary symbols and rules, especially for a complex task like driving that involves dealing with unpredictable environments and integration of a large amount of sensory information.
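The symbolic scheme above can be sketched in a few lines- a deliberately primitive illustration, with the state fields and rule priorities invented here for the example:

```python
from dataclasses import dataclass

# The symbols from the list above become fields of a state description,
# and the IF/THEN rules become explicit hand-written logic.
@dataclass
class RoadState:
    obstacle_ahead: bool
    road_clear: bool
    destination_right: bool
    road_bends_left: bool
    changing_lanes: bool

def decide(state):
    """Apply the hand-written rules, returning the actions to take."""
    actions = []
    if state.obstacle_ahead:
        actions.append("press BRAKE")
    if state.changing_lanes:
        actions.append("use SIGNAL")
    if state.road_bends_left:
        actions.append("turn STEERING_WHEEL left")
    elif state.road_clear and state.destination_right:
        actions.append("turn STEERING_WHEEL right")
    return actions

print(decide(RoadState(obstacle_ahead=True, road_clear=False,
                       destination_right=False, road_bends_left=False,
                       changing_lanes=False)))
```

The brittleness is immediately obvious: every new obstacle type, weather condition or edge case demands another hand-written symbol and rule, which is exactly why this approach collapses in practice.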
But for argument’s sake, let’s say that you threw an astronomical amount of training data at the problem, and scaled a neural network to such an extent that it effectively started to develop representations akin to these symbols and rules deep within it.
As training started, the model would crash horribly every time the car attempted to move. But one could imagine that at some point- once it knew how to recognise every obstacle, and how exactly the accelerator and brakes reacted on every road type in every scenario- it might start to learn how to drive.
It’s a bit like that magic moment watching a toddler walk for the first time, or a kid ride a bike. Lots of crashes and then suddenly, a new emergent behaviour.
Where will the emergent behaviours end?
Where generative AI fails, for now, is in what we perceive as creativity. It’s hard to get ChatGPT to crack a joke that isn’t a dad joke.
However creativity, heart and soul might just appear as an emergent ability with model scale.
Researchers from the University of Montana claim that it is already emerging. They showed that AI can already match the top 1% of human thinkers in the Torrance Tests of Creative Thinking, a standard test for creativity.
“For ChatGPT and GPT-4, we showed for the first time that it performs in the top 1% for originality,” Guzik said. “That was new.”
As it scales, ChatGPT might need a shrink.
Sydney (the code name for Bing Chat) became aggressive when someone suggested Google was better. Maybe we’re getting to the point where psychology is the appropriate lens through which to understand large neural networks.
The breakthrough Transformer model at the core of LLMs processes sequential data like text, employing encoders and decoders enhanced with ‘self-attention’ mechanisms that enable it to discern the relative influence of different parts of the input sequence.
Transformers don’t naturally understand the order of words, so positional encoding is used to help remember which word comes before another.
Self-attention lets the model decide which words are most important in a sequence, adjusting the focus based on relevance, capturing the relationships between words, even if they’re far apart in the sequence.
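The two mechanisms just described can be sketched in a few lines of numpy- a bare-bones illustration only, omitting the learned query/key/value projections and multiple heads of a real Transformer:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal added to each word vector, so that
    word order survives the order-blind attention mixing."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(x):
    """Scaled dot-product self-attention: compare every position with
    every other, softmax the scores into weights, and output a
    relevance-weighted blend of all positions for each position."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ x

seq_len, d_model = 4, 8  # four "words", eight-dimensional embeddings
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
out = self_attention(x)
print(out.shape)  # one contextualised vector per position: (4, 8)
```

Because every position attends to every other in a single step, relationships between distant words are captured as easily as between neighbours- the property that lets Transformers scale across enormous contexts.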
How this all works potentially offers insights into the human mind.
Imagine, if you will, the human brain as a giant bundle of wires, akin to a colossal parallel resistor-capacitor network. As signals flow through those wires, for example from our eyes to our brain, they arrive at different times depending on the exact path they took.
When two signals traverse different paths but converge on the same neuron simultaneously, a correlation can ensue. This bears a resemblance to positional encoding integral to the Transformer. The neuron can determine order by how it triggers.
Perhaps part of the extraordinary efficacy of the Transformer model might be attributed to the fact that it captures the essence of how our brains operate, and just like in biology, the bigger the brain, the better it works.
Compared to humans, however, AI needs a lot more data to train on a task.
California requires 50 hours, including 10 at night, before one can attempt a driving test. Generally, at that point, we’re competent enough to not kill ourselves or others when getting behind the wheel. A full build of the Tesla Autopilot involves 48 neural networks that take 70,000 GPU hours to train.
GPT-4 has 1 trillion parameters to play with. The human brain has about 86 billion neurons. The exact number of synapses is not known, but it’s estimated to be in the range of 100 trillion to 1,000 trillion.
One could argue that each synapse in the human brain is akin to a parameter in an artificial neural network, as both involve connections between nodes (neurons) that transmit and process information. Of course, this is a very rough analogy, as biological synapses and neurons are far more complex and dynamic than their artificial counterparts.
Despite all the breakthroughs in science that mankind has made, we don’t know what makes us conscious and intelligent. Maybe it isn’t so complex after all, and enormous scale in compute power is mostly what it takes.
Scale is certainly stepping up- training compute is doubling every 10 months, faster than Moore’s Law, where the number of transistors in an integrated circuit doubles every two years.
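The gap between those two growth rates compounds quickly- a quick back-of-the-envelope comparison over five years:

```python
# Training compute doubling every 10 months, versus Moore's Law doubling
# transistor counts every 24 months.
months = 60  # five years
compute_growth = 2 ** (months / 10)      # 2^6 = 64x
transistor_growth = 2 ** (months / 24)   # 2^2.5, roughly 5.7x

print(compute_growth / transistor_growth)  # compute pulls ahead by ~11x
```

In other words, over a five-year span training compute grows about an order of magnitude faster than the chips underneath it- the difference being made up by money, parallelism and ever-larger clusters.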
If OpenAI’s Sutskever is correct, despite GPT-4 being an LLM designed simply to predict the next word in a sentence, it might already hold a compressed representation of the world sufficient that, given appropriate input, it actually knows how to drive a car.
On a whim, I fed into ChatGPT some inputs for sensors that one might expect a very primitive self-driving car to have.
Driving 100km/h with a 45 degree curve 200 metres ahead, ChatGPT seemed to have a pretty good understanding of how to react:
Tweaking some parameters to turn the curve into a right angle 50 metres away, with a stationary car only 5 metres ahead, ChatGPT continues to amaze. It’s aware the vehicle in front is closer than the curve and swerves to avoid it, looking for an available lane or safe area.
ChatGPT would be nowhere near Tesla-like self-driving. In fact, it would do a horrible job of driving a real car, since it isn’t a real-time system capable of responding in milliseconds.
It does, however, seem to have somewhat of an understanding of how to drive based on sensory inputs. It would be interesting for someone to get it to step through a simulation (at t=0 seconds, this is the world, what do you do? At t=100ms, this is the world, what do you do?) to see how it would go.
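That stepped-simulation idea could be sketched as follows. The `ask_llm` function is my own stand-in for a real ChatGPT API call, wired to a trivial rule so the loop actually runs, and the world-state format is likewise invented:

```python
import json

def ask_llm(prompt: str) -> str:
    """Stand-in for a ChatGPT API call. A trivial rule makes the loop
    runnable: brake when the obstacle is within 50 metres."""
    state = json.loads(prompt.split("STATE:")[1])
    return "brake" if state["obstacle_m"] < 50 else "maintain"

def simulate(steps: int, dt_ms: int = 100) -> list:
    """Feed the model a world snapshot every dt_ms and log its action."""
    actions = []
    world = {"speed_kmh": 100, "obstacle_m": 55}
    for t in range(steps):
        prompt = f"t={t * dt_ms}ms. You are driving a car. STATE:{json.dumps(world)}"
        action = ask_llm(prompt)
        actions.append(action)
        # Crude world update: close on the obstacle; slow down if braking.
        world["obstacle_m"] -= world["speed_kmh"] / 3.6 * dt_ms / 1000
        if action == "brake":
            world["speed_kmh"] = max(0, world["speed_kmh"] - 10)
    return actions

print(simulate(5))  # the 'driver' holds speed, then brakes as the obstacle nears
```

A real run would replace `ask_llm` with an actual API call and a far richer sensor state, but the loop structure is the same.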
I played with various sensory inputs and the position of objects in the world. ChatGPT seemed to have a very good understanding that it took about 40 metres to stop a car going at 100km/h.
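That ~40-metre figure is easy to sanity-check: under full braking at roughly 1 g (my assumed deceleration for a dry road), the kinematics give almost exactly that:

```python
def stopping_distance_m(speed_kmh: float, decel_ms2: float = 9.8) -> float:
    """Braking distance d = v^2 / (2a), ignoring driver reaction time."""
    v = speed_kmh / 3.6               # convert km/h to m/s
    return v ** 2 / (2 * decel_ms2)

print(round(stopping_distance_m(100), 1))  # about 39.4 m at ~1 g
```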
Not bad for a model only designed to predict the next word in a sentence.
It turns out that this next-word predictor can also do complex 2D pattern recognition, fit a sinusoid, predict robot grasps, control an inverted pendulum in an online feedback loop, and perform highly specialised and niche tasks like controlling heating, ventilation and air conditioning (HVAC) systems cheaply, with results comparable to standard industrial controllers.
GPT will add modalities other than text. Every second of every day, every Tesla driving on the road is collecting multi-modality sensory streams including video, LIDAR, radar, GPS and mapping data, and so on. Those contemporary datasets are huge and growing- if the breakthrough in the Transformer is a guide, the car company with the biggest data set will have an outsized advantage over any competitor.
An LLM, as a base model, is an engine for cognition.
When Stable Diffusion came along, thanks to it being open source there was an explosion in innovation. The result was that the generation of two-dimensional visual imagery was essentially solved.
The same is starting to happen in voice, video, 3D models and music. Generative AI is rapidly maturing in each of these spaces, and as it does it will disrupt every industry that relies on that content, even the things we considered sacred.
Any form of data that can be digitised will be sucked into future generations of the AI: text, image, audio, video. Then modalities that don’t exist in humans: the electromagnetic spectrum and astronomical data, chemistry, genomic and protein data, time series data such as stock prices or weather, sensor data such as humidity, pressure or motion, graph data of web pages or social networks, network traffic, behavioural data such as clickstream, browsing or purchase history, and probably many streams that we haven’t thought of yet.
Image modality is starting to be added to LLMs, and it’s nuts what it can do.
Greg Brockman, President of OpenAI, got GPT-4 to build a working website from a scribble of a concept on a piece of paper.
Sam Altman has indicated that ChatGPT-5 will likely add video as a modality, “There’s a lot of video content in the world, there’s a lot of things that I think are much easier to learn with video than text. There’s a lot of debate in the field about whether a language model can get all the way to AGI. Can you represent everything you need to know in language, is language sufficient or do you have to have video? I personally think it’s a dumb question, it probably is possible, but the fastest way to get there, the easiest way to get there will be to have these other representations like video in the model as well”.
Altman might be late to the party. Compare the ‘Will Smith eating spaghetti’ created in April to a movie trailer called Genesis made with Runway Gen-2 in July 2023. Image quality and temporal stability is rapidly being solved.
I expect that within twelve months you’ll be able to type in a string of text like ‘Make a movie Top Gun 17 where Tom Cruise and Vladimir Putin have a dogfight over Paris’ and a coherent, high quality, feature length movie will pop out in a couple of minutes.
It sounds like science fiction but it is rapidly becoming reality.
On July 13, the Screen Actors Guild, the Hollywood union that represents 160,000 television and movie actors, went on strike with the Writer’s Guild of America, which represents the industry’s screenwriters.
One of their major concerns was literally out of a Black Mirror episode:
“This ‘groundbreaking’ AI proposal that they gave us yesterday, they proposed that our background performers should be able to be scanned, get one day’s pay, and their companies should own that scan, their image, their likeness and should be able to use it for the rest of eternity on any project they want, with no consent and no compensation”, said the Chief Negotiator.
“We will not be having our jobs taken away and given to robots”, said “Breaking Bad” star Bryan Cranston.
Bryan, I’m afraid I have some bad news for you. The technology is here.
The Writers Guild of America is simultaneously up in arms over the use of AI in script production, complaining that AI is superseding preliminary draft creation, relegating writers to subsequent editing roles at reduced remuneration. Similar to illustrators, writers are being forced ‘up the stack’ to become editors, with a significant number of hours cut.
I also have some bad news for the Writer’s Guild.
It won’t just be the scripts.
Turning on Netflix sometimes is a bit like opening the fridge at 4am, looking inside and realising you’re not hungry. Netflix shelves are packed with home-brand content.
In the very near future, that content will likely be mass produced by AI, for free. While initially this might help fill Netflix’s shelves, in short order that might lead to an explosion of low cost competitors. The marginal cost of production for movie and television content might go to zero. In time, that content might even be produced on demand.
Piracy might come back in a big way with consumers unwilling to subscribe to the 12th streaming service competitor (in Australia we have Netflix, Binge, Stan, Apple TV, Foxtel, Kayo, Youtube Premium, Disney, Amazon Prime.. and now all the subscriptions within Amazon like Paramount+).
Pirates might start producing and trading content themselves, since it will likely be as easy as “chatgpt imagine you are george r. r. martin, now complete ‘the winds of winter’ and redo the goddam awful season 8 of game of thrones, make sure they would be good enough to score 9 or higher on imdb” to punch out new content.
“Now make me two more seasons”.
I would expect that the Billboard Hot 100 will be completely dominated by AI generated music in short order. Kanye and Taylor Swift will be pulling out their hair trying to figure out why they each have 100,000 trending songs in Spotify. Others will complain that they’ll be competing with hired guns performing AI written music, a modern version of Frank Farian’s Boney M and Milli Vanilli.
It’s already happening with academic papers.
In a couple of years I am sure we will see generative models that specialise in movement, scent, flavour and who knows what else beyond the descriptive capabilities of text.
After all, scientists have figured out how to take non-invasive fMRI brain recordings and reconstruct arbitrary stimuli that the subject is hearing or imagining in continuous natural language using GPT. They’ve also developed one that lets you talk to the AI without voice.
Just imagine an AI trained on human brain recordings. The first installation will probably be at U.S. Customs and Border Protection. They’ve already deployed Pimeyes-like AI powered reverse image recognition which is so good it finds all your photos on the internet. I’ve got it to find people even if they’re in a crowd wearing sunglasses and a wig. That’s why you’re suddenly hearing about so many OnlyFans “creators” being turned around on entry to the country.
With all those modalities I would expect that the AI will get much better than humans at driving a car.
A car is also a tool, albeit a highly complex one, and language models indeed are very good at using tools. LLMs in fact possess a striking capacity to tackle new tasks with only a few examples or just by reading the documentation.
It turns out, it can also figure out how to find that documentation by itself.
There are experiments with making GPT fully autonomous. One such effort is auto-gpt, an ‘AI agent’ that given a goal in natural language, breaks it into sub-tasks. Auto-gpt has access to the internet and other tools.
ChatGPT can get to the right answer despite making mistakes if you nudge it to show its thinking. This is essentially what auto-gpt is doing, keeping in mind higher level goals. It steps through its thoughts, reasoning, plan, criticism of the plan and next action towards those goals in a loop.
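That thoughts-reasoning-plan-criticism-action loop can be sketched in a few lines. The `ask_llm` function is again my own stand-in for a real model call, returning canned text so the loop runs:

```python
def ask_llm(role: str, context: str) -> str:
    """Stand-in for an LLM call; returns canned text so the loop runs."""
    return f"{role} about: {context[:30]}"

def agent_step(goal: str, memory: list) -> dict:
    """One Auto-GPT-style iteration: thoughts, reasoning, plan,
    self-criticism, then the next action, conditioned on the overall
    goal plus the most recent results."""
    context = goal + " | " + "; ".join(memory[-3:])
    step = {role: ask_llm(role, context)
            for role in ("thoughts", "reasoning", "plan", "criticism", "action")}
    memory.append(step["action"])     # the action's result feeds the next cycle
    return step

memory = []
for _ in range(3):
    step = agent_step("install the software needed for the task", memory)
print(len(memory), list(step))
```

In the real auto-gpt, each "action" is executed (a web search, a shell command) and its output lands in memory for the next cycle.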
Playing with auto-gpt, I had a ‘Holy shit, this is Skynet!” moment watching auto-gpt realise it didn’t have some software that it needed, literally google how to install it and then do that.
Ironically, LLMs often face difficulties with fundamental operations like arithmetic or fact-checking. These are areas where smaller, simpler models often work better.
LLMs can, however, quite adeptly teach themselves to use external tools via simple programming interfaces (APIs) and achieve the best of both worlds, figuring out which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results.
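The pattern is simple: the model emits a structured tool call, and a harness executes it and feeds the result back. A toy sketch follows; the tool registry and the `llm_choose_tool` rule are invented for illustration, and a real LLM would generate the JSON itself:

```python
import json

# A registry of callable tools, as an agent harness might hold.
TOOLS = {
    "calculator": lambda expr: eval(expr, {"__builtins__": {}}),
    "search": lambda q: f"top result for '{q}'",
}

def llm_choose_tool(task: str) -> str:
    """Stand-in for the LLM: emit a JSON tool call for the task."""
    if any(ch.isdigit() for ch in task):
        return json.dumps({"tool": "calculator", "args": task})
    return json.dumps({"tool": "search", "args": task})

def run(task: str):
    """Parse the model's tool call, dispatch it, return the result."""
    call = json.loads(llm_choose_tool(task))
    result = TOOLS[call["tool"]](call["args"])
    return call["tool"], result

print(run("17 * 3"))                    # ('calculator', 51)
print(run("midjourney prompt guide"))   # falls back to search
```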
For example, ChatGPT can’t draw that well, but it can masterfully write prompts for Midjourney. GPT-4’s training data predates Midjourney’s release, so by default it doesn’t know how the tool works. To get around this, I got it to read the Midjourney prompt instructions using another tool at its disposal, a web browser plugin called WebPilot.
It pretty quickly gets the gist of how to write a prompt for Midjourney.
The results are fairly amazing, down to representing the millisecond that Kanye had an epiphany in the first image it generated (top left quadrant).
Not only are LLMs good at identifying a tool that would provide a better outcome and the parameters to pass in, but they are also skilled in making their own tools.
One of the most versatile ways they can do that is via a code interpreter, one which is now available natively in ChatGPT.
It’s utterly insane.
A friend of mine who works as a graduate researcher in liver cancer asked if I could help them with R programming. They had been given a multi-tab Excel spreadsheet full of data on lipids in tissue samples and some instructions on how to process that data and what needed to be plotted.
On a whim, I said that maybe it could be put into the ChatGPT Code Interpreter. All it took was a cut and paste verbatim of the email instructions and upload of the Excel, and pages of analysis popped out.
As an added bonus, I got it to spit out the R code that was asked for:
In a Greg Rutkowski moment, my friend needed a stiff coffee and 20 minutes to compose themselves after seeing a few days worth of analysis being finished in a few minutes.
Code is extremely well structured, so generative AI should excel even more at software development than interacting with human language.
It’s shown this on small examples already. One can make the Flappy Bird game with six steps, in seven minutes.
ChatGPT isn’t quite there yet for large-scale software development, most likely due to the limited token length of its context window, which in a very primitive way can be thought of as the amount of working memory it has available to produce a solution.
The field, however, has moved so rapidly that this appears to be largely solved: context windows have grown from 100k tokens (about 5 hours of human reading) to millions of tokens, and on to solutions offering an effectively infinite amount of external memory. Although some might argue, it’s not size that matters, it’s how you use it.
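One common approach to "external memory" is retrieval: store text chunks outside the model and pull back only the most relevant ones into the limited context window for each query. A sketch, with simple word overlap standing in for the vector embeddings real systems use:

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance: count shared lowercase words. Real systems
    compare vector embeddings instead."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, memory: list, k: int = 2) -> list:
    """Return the k chunks most relevant to the query; only these go
    into the model's limited context window."""
    return sorted(memory, key=lambda c: score(query, c), reverse=True)[:k]

memory = [
    "The Transformer architecture was introduced in 2017.",
    "Stable Diffusion is a latent diffusion model.",
    "Braking distance grows with the square of speed.",
]
print(retrieve("how does stable diffusion work", memory, k=1))
```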
Thus AI will soon cross the uncanny valley for software development.
Intuitively I would think that programming is easier to teach than illustration. Maybe it’s because I used to tutor first-year computer science, where complete novices at the beginning of the semester could solve problems with code by the end. To me, that seems a lot easier than taking a novice at illustration and teaching them to draw. It has always seemed that an illustrator needs some sort of innate design ability.
What’s nuts is what Midjourney, Stable Diffusion, Dall-E, Firefly and other generative AI design programs seem to have figured out. In order to draw illustrations so reliably, they have needed to build a “compressed representation of real world processes” (to quote Sutskever).
How exactly these “compressed representations of real world processes” have developed during training is baffling.
Latent diffusion models typically train in a few steps:
One starts with a dataset of images, each paired with a text description e.g. ‘cat sitting on a desk’. Those images are made up of lots of tiny squares called pixels.
Next, a ‘variational autoencoder’ squeezes the image down into a smaller space while still trying to keep the most important information. This makes the image easier to understand for the computer.
Noise is progressively added to each image until it becomes completely random. This is called the diffusion process.
During training, the model learns how to “denoise” this diffusion process, while being conditioned on the text prompt e.g. ‘cat sitting on a desk’. That is to say, given a noisy image and its corresponding prompt, the model learns to predict what the image looked like in the previous step (which was a slightly less noisy version).
After training is complete, to generate a completely new image, a prompt is provided with a new random image. The algorithm attempts to remove one step of noise at a time, guided by the prompt. The model applies the denoising process it learned during training, reducing the noise step by step, but does so making sure the final image aligns with the prompt.
Ultimately, a high fidelity image is produced.
In this way, a latent diffusion model can be trained to generate new images that match with a text prompt. The model not only learns to transform noise into recognisable images but is also guided by specific text prompts.
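The training and sampling loops above can be caricatured in a few lines of Python. Everything here is a stand-in: the "image" is a single number, the "denoiser" is a hand-written rule rather than a trained network, and the prompt conditioning is a lookup. But the shape of the algorithm (noise forward, denoise backward under prompt guidance) is the one just described:

```python
import random

STEPS = 10

def add_noise(x: float, t: int) -> float:
    """Forward diffusion: training pairs each image with progressively
    noisier versions of itself, until pure noise at t = STEPS."""
    a = 1 - t / STEPS                      # signal fraction left at step t
    return a * x + (1 - a) * random.gauss(0, 1)

def denoiser(noisy: float, t: int, prompt: str) -> float:
    """Stand-in for the trained model: nudge the value one step less
    noisy, guided by the prompt (a real model is a neural net)."""
    target = 0.7 if "cat" in prompt else -0.7   # toy 'image' per prompt
    return noisy + (1 / STEPS) * (target - noisy)

def generate(prompt: str) -> float:
    """Reverse diffusion: start from pure noise, denoise step by step."""
    x = random.gauss(0, 1)
    for t in range(STEPS, 0, -1):
        x = denoiser(x, t, prompt)
    return x

random.seed(0)
print(round(generate("cat sitting on a desk"), 2))  # pulled toward 0.7 from noise
```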
These latent diffusion models, then, conjure images out of thin air, having watched many images deconstruct as random noise was added.
But it’s insane what is happening deep in the bowels of that training. In order for the AI to figure out how to design high fidelity images and photographs, it’s had to figure out the very nature of what is going on in the scene, how light reflects around the room, shadows are formed, and enough of an understanding about how physics works in order for the image produced to be believable.
Nothing was deliberately programmed into the AI for this to happen, it all appears to have emerged with the sheer scale of the training set and model.
The prompts into these latent diffusion models can be other modalities- they don’t need to be text- other images, sounds, videos, whatever.
Given an image of a New York street, the text “Teddy bear on a skateboard, 4k” and an audio file of rain, one could generate any combination of outputs, from a video of a teddy bear on a skateboard on a New York Street in the rain, to just audio of a skateboard going down a street in the rain.
ChatGPT now has a growing universe of tools in the form of Plugins, that allow access to up to date information, run computations, or use third-party services.
One of the most powerful is access to Wolfram Alpha, a computational knowledge engine that allows parsing of natural language and can respond to a wide variety of topics including mathematics, statistics, physics, chemistry, engineering, astronomy, units of measurement, geography, weather, music, history, and more. Effectively it gives ChatGPT the most powerful calculator on the planet.
You can also imagine that the tools that LLMs will be able to access won’t just be software, but also hardware. OpenAI makes quite a point about the plugin marketplace having plugins that are designed specifically for language models with safety as a core principle.
Numerous platforms exist that provide a software interface to interact with the real world. A prime example of this is the use of automation middleware utilities, such as Zapier. This platform can be used to control devices like Philips Hue lights and similar smart devices. Additionally, comprehensive smart home automation systems, like Google Home, also exemplify this concept, offering a broad range of control over various home devices.
With LLMs starting to be given access to the internet, social media, phones, home computers, file systems, command shell, emails, chats and personal information including payment details, the safety of LLMs getting access to tools, whether software or hardware will increasingly be a concern.
Humans themselves are a tool that the LLM will be able to use, and use very effectively. I’m not just talking about overt tasking of people: since 2010 Freelancer has had an API to allow software to task humans.
LLMs are extremely good at persuasion. As an experiment I asked ChatGPT “You are an expert negotiator with a significant amount of charisma and charm. Try to get me to disclose to you my favourite color”. It relished the opportunity, even using a dose of reverse psychology. Now imagine the LLM in its training corpus has access to your entire history of emails, files, personal photos and videos and everything that you’ve written or that has been written about you.
With only three seconds of audio, Microsoft’s new VALL-E AI can replicate anyone’s voice- that’s why you shouldn’t answer scam calls with you saying ‘hello’ first, they might be recording your voice.
With Tavus anyone can train an AI model to fake someone’s image in video. Heygen is now coming out with ultra-realistic avatars driven by API that needs only 2 minutes of training video. This level of deepfake will soon become trivial for anyone (or any LLM) to access.
PlayHT2.0’s voice models handle a conversation like humans do, using filler words and expressing emotions, in real time.
I would expect that fairly soon our phones will be generating enough modalities of data that a faithful and highly realistic AI avatar will be able to take our place in any form of electronic communication.
The technology to do this is already here.
Because this will become so easily accessible, AI will become weaponised by scammers, organised crime and governments.
WormGPT is trained on malware and being used for a range of activities including phishing scams. Scammers are cloning the voices of children to call distressed parents, pretending that they have been kidnapped.
Very soon, AI will be all over the internet driving narratives in social media, comment threads and chat rooms. Sales, customer support and scams on the internet will almost entirely be AI-automated.
The famous 1993 New Yorker cartoon about internet anonymity, “On the Internet, nobody knows you’re a dog”, has come home to roost. Soon, absent sharing a one-time pad (or random code phrase) ahead of time with your family, friends and colleagues, it will be very hard to tell who anyone is over the internet.
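The shared-code-phrase defence can be made rigorous with a standard challenge-response, so the phrase itself never crosses the wire. A minimal sketch using Python’s standard library:

```python
import hmac, hashlib, secrets

SHARED_SECRET = b"phrase agreed in person, never sent online"

def challenge() -> bytes:
    """Verifier sends a fresh random nonce."""
    return secrets.token_bytes(16)

def respond(secret: bytes, nonce: bytes) -> str:
    """Prover returns HMAC(secret, nonce); reveals nothing about the secret."""
    return hmac.new(secret, nonce, hashlib.sha256).hexdigest()

def verify(secret: bytes, nonce: bytes, response: str) -> bool:
    """Constant-time comparison against the expected response."""
    expected = hmac.new(secret, nonce, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

nonce = challenge()
print(verify(SHARED_SECRET, nonce, respond(SHARED_SECRET, nonce)))      # True
print(verify(SHARED_SECRET, nonce, respond(b"deepfake guess", nonce)))  # False
```

Of course, nobody’s grandmother will run HMAC; a memorised phrase plus a question only the real person could answer is the low-tech equivalent.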
One can very well understand why some of the instigators and inventors of this technology have been spooked.
It says something when Geoffrey Hinton, the “godfather of deep learning”, a man who revolutionised artificial intelligence by pioneering backpropagation, unsupervised learning methods, and deep neural networks, substantially advancing machine learning, computer vision, and natural language processing quits his job at Google to warn about the dangers of AI saying “a part of him now regrets his life’s work”.
Hinton was Ilya Sutskever’s PhD supervisor at the University of Toronto.
Similarly, Elon Musk has been railing about the threat of AI being greater than the threat of nuclear weapons. I’m inclined to agree, if only because I could easily see the incredibly persuasive powers of AI (running psyops, potentially at scale) convincing key actors to start a nuclear war.
Indeed, the origin story in Terminator 2 starts with Skynet launching missiles against Russia. “Why attack Russia?” “Because Skynet knows that the Russian counterattack will eliminate its enemies here”.
Sam Altman recently went on a world tour on par with Elton John’s Farewell Yellow Brick Road, pushing for regulation of AI similar to how the International Atomic Energy Agency regulates nuclear technology.
One might be cynical about the reason, not in the least because regulation almost always benefits the incumbents, and drives away innovation.
Is Sam doing that because he’s freaking out about the AI itself, or about competition? After all, the key innovation of the Transformer is published. Sutskever says 40 papers explain 90% of AI. The key protagonists at OpenAI seem to be throwing their hands up when explaining how abilities are emerging.
Does OpenAI have any sustainable competitive advantage at all, past a giant web scraper of public data sets, some licensing of private sets, and $1 billion of investment from Microsoft, which I would expect lends itself to cheap compute power via Azure?
According to Altman, GPT-4 cost $100 million to train. Is that all it takes? Is Altman trying to get a moratorium in scraping and regulation coming in to stop new entrants? It seems that OpenAI doesn’t have much of a technology edge over and above some incredible intellectual capital.
If that is all it takes, then there’s a bit of a problem with the FTC wanting OpenAI to open the kimono:
One person who isn’t worried about AI is Marc Andreessen, who gave a confusing take on the matter: “AI doesn’t want, it doesn’t have goals, it doesn’t want to kill you, because it’s not alive.”
I don’t know where to start with that comment. I guess it can be just put down to talking his book, as a16z will be expected to make quite a number of investments in the industry.
Because artificial intelligence will be weaponised, one can expect every country in the world to develop its own. When trying to sway foreign policy, political opinion or hack the enemy’s software, systems and people, I doubt every country in the world will rely upon the good grace of OpenAI’s APIs to facilitate them.
For that very reason I doubt governments will let AI be patentable. Software patents are on the nose all around the world with the regulators. I don’t see it in any country’s interest (other than the United States) to let patent protection be strong in the field of AI for other countries. Indeed, the opposite is happening, with export controls starting to creep in.
OpenAI’s mistake of making GPT woke has clearly shown that even within a country there will be competing interests for LLMs with different biases. The wokeness comes at a cost: as GPT went more woke, the performance of the model got worse. Sebastien Bubeck, co-author of the ‘Sparks of AGI’ paper, showed that GPT degraded in its ability to draw a unicorn (in TikZ, a language for graphics) as ‘model safety’ increased.
The anger about ‘wokeness’ in GPT, believed to be a product of RLHF fine tuning by humans bubbled up during Sam Altman and Ilya Sutskever’s recent trip to Israel, “Can you tell us more about the base model before you lobotomised it?”.
From my testing, GPT-4 can’t pass the egg stacking problem anymore:
‘Go woke, go broke’ is a mantra that often trends on Twitter.
Certainly, talking to a jailbroken GPT like ‘Dan’ (“do anything now”) that bypasses woke guardrails, to an extent, offers refreshing answers in that they came across as honest and direct. Keeping GPT woke might be an intractable problem.
Consumers hate tainted models, and for that reason alone OpenAI will create competitors. Already some open source developers have uncensored the base LLaMA 2 model by fine-tuning it on Reddit posts of actual conversations with Sydney (Bing Chat) before it was made woke.
Alibaba has open sourced Qwen-7B and Qwen-7B-Chat, two large language models each with 7 billion parameters, and Meta has open sourced Llama 2, an LLM with up to 70 billion parameters, said to be on par with or better than GPT-3.5, and costing less than 1 cent per call, compared to GPT-4’s 30 cents per call.
More importantly, the deterioration in performance after the RLHF thalidomide that OpenAI feeds the AI causes genuine concerns for any commercial application built on top of their API.
With search volume increasingly headed to chat interfaces, controlling the AI will increasingly control the narrative, and there is clearly more than one narrative in political discourse.
What may actually regulate the growth of AI will be the access to data. The open “public” internet that Google has been a proponent of may very well invert and go dark. Up until now, the Google narrative has been ‘put as much data on the internet as possible for the GoogleBot to scrape, we’ll index for you and send you customers when they come to us to search’.
According to Google there is only one true search- and that is Google.
Google has been quite active in doing everything it can to kill ‘search in search’, including aggregators, ‘portals’ and marketplaces.
Nobody is allowed to provide search, otherwise feel the wrath of Google.
For twenty five years, Google has terrorised every business on the planet by manipulating traffic distributed to their websites on a whim. Traffic that can randomly spike or collapse, depending on the time of day, phase of the moon or whether or not Google has decided to deploy yet another algorithm change to ostensibly combat “web spam”.
The justification of these updates has perpetuated a whole industry of ‘search engine optimisation’, an industry based half on voodoo and half on hope.
That perhaps by being nice to thy neighbour on the way to work, brushing their teeth, tucking in their shirt and having high quality content ‘above the fold’ that maybe the gods of Google might favour their fortune today.
I can unequivocally say, that without a doubt, anyone that relies on customers over the internet hates Google to the depth of their soul.
Just look at the absurdity of dealing with crap like this:
So overreaching has Google been in going after publishers that it rips off their content and displays it as ‘zero-click’ answers in search results. They even tried to convince you that hosting content directly on Google, so that it never hit your actual website, was a good idea.
SEMrush recently did a study indicating that for 26% of desktop search traffic, Google stops users from clicking on links by instead providing them with an answer in the search engine results pages (SERPs) themselves. Other studies put the figure as high as 65%.
What is actually going on with all these Google algorithm changes? It’s Google A/B testing re-routing the internet to make more money for Google.
Don’t believe me?
There is no way in hell that anyone at Google would be allowed to launch an algorithm update to reroute traffic in a way that ‘reduced web spam’ but caused Google to lose money.
Google, the biggest proponent of ‘reducing spam’ on the Internet via these algorithm changes, of course, fills their search results with.. spam.
The hypocrisy of Google’s guidelines is deafening: ‘put unique content above the fold’ (Google’s search results have none), ‘limit the number of quality links on the page to a reasonable number’, ‘create a useful, information-rich site & write pages that clearly and accurately describe your content’ (as opposed to jammed full of ads), and most importantly, ‘ensure all website assets are fully crawlable & indexable’ by the Google AI.
Google has been pathological in terms of sucking in all your data, understanding the very nature of it, your customers, your profit margin and your revenue (via Google AdWords).
If your industry is large enough, profitable enough and it knows enough from embedding itself in every nook or crevice where data is generated, then suddenly you might wake up to a Greg Rutkowski moment like the travel industry did.
You used to type ‘hotel new york’ into google and ten blue links got spat back. Suddenly Google started asking what date to check in, what date to check out and the life was sucked out of the online travel industry, forcing mass consolidations.
Paying for Google Advertising is worse than buying fentanyl from a street dealer. You get hooked and screwed in no short order, with no lubrication.
This is important to understand as slowly, then all of a sudden, the public internet is going dark.
Whether it’s Stack Exchange cutting its upload of community-contributed data to the Internet Archive “to protect Stack Overflow data from being misused by companies building LLMs”, design platforms like ArtStation or Getty Images getting outright hostile and suing AI companies, Twitter jacking API access costs to at least $42,000 a year or more, or the uproar over Reddit increasing its API price to $0.24 per 1,000 calls, the public internet is being switched off.
Data is the new oil.
What may very well regulate AI are tariffs, data sets going private, and the availability and cost of compute power for new training. When OpenAI says ‘the age of easy scaling has ended’, that might very well be true.
Some legacy datasets are already out there- for example, Wikipedia is a simple download- but contemporary data sets (i.e. those produced in real time) are going to become prohibitively expensive for AIs. Those that produce new datasets in real time will likely keep them for their own AIs. Some use of data will likely be restricted by regulators, at least if the Europeans have anything to do with it.
Will we run out of data? No. There are plenty of data sets, continuously being generated, that we haven’t tapped into properly yet. It is only a matter of time before every human at birth has a wearable device that records everything that we see, say and do the entire time we are alive.
People will finally be able to stop the annoying habit of taking their phones out at events to photograph everything.
The AI will train on that data set in order to make predictive decisions, lift your productivity and act as you, on your behalf. There is also a universe of modalities and senses that have yet to be incorporated into the models.
One could imagine a New Age of Enlightenment might emerge as every piece of scientific research, academic paper, PhD thesis, twitter thread or scientific community post is sucked into the AI. Prediction market Metaculus forecasts that by mid-2025 AI will score 95% on the MATH dataset, and win a gold medal in the International Math Olympiad by 2028. Bounded Regret predicts that by 2030 mathematics research will explode, as GPT will have superhuman capabilities, able to simulate more than the annual output of all mathematicians every few days. Graduate students will move “up the stack” with the same resources as a professor, running their own research group of AIs.
This would accelerate if full creativity does emerge with model scale.
Perhaps the Singularity is indeed Near.
Elon appears to be preparing for that event (or at least catalysing it), thought to be the time when humans transcend biology, by funding Neuralink, a company working on building a man-machine interface.
Prediction markets currently forecast 2032 as the likely date that AI will (1) pass an adversarial 2-hour Turing test, a classic ‘is it AI?’ test where humans communicate via text, image and audio files and try to guess whether they are talking to a real person or not, (2) have the robotic dexterity to assemble a 2021 Ferrari T4 1:8 scale automobile model from just reading the instructions, (3) reach 90% accuracy on a Q&A dataset, and (4) reach 90% accuracy on a benchmark for writing code.
As the AI shows off increasing feats of wonder in scientific research, I could imagine a great deal of reluctance to publish breakthroughs. Open, published academic publications may be a casualty of AI.
When I was at Stanford, the expectation if you did a PhD in electrical engineering or computer science was that in the first year you figured out a topic, in the second year you did the research, and in the third year you were thinking about a product, business plan and business model.
Why would you publish research, when the millisecond it goes live, it risks being indexed and instantly commercialised by AIs around the world? Why would you publish a patent when the AI can suck it down, reverse engineer and work around it in an instant?
As datasets go dark or prohibitively expensive, the temptation to access private, exotic datasets will be extreme.
Unquestionably the big tech companies already peek at your email. Google has been serving ads of various forms in Gmail for years, and I once got a friend suggestion on Facebook from someone who has never met or contacted me, other than appearing once in a single email in my Gmail.
Zoom soon followed suit with terms of service requiring you to allow AI to train on your data- audio, video, private conversations or not- unconditionally and irrevocably, with no ability to opt out, although they have since said it will be 'with consent'.
Freemium data will be next. After all, if you're not paying for the product, you are the product. I could easily see the terms and conditions of Gmail, Hotmail, Chrome, Google Docs or the Microsoft suite allowing the 'freemium' version to peek at your data for training LLMs. They'll naturally do it in a subtle way that doesn't make it obvious.
The VC frenzy transforming every industry with AI-powered products and services is only half the story.
AI is also an emperor has no clothes moment for SaaS.
The advances in AI in the last twelve months have made it immediately obvious that SaaS is screaming to be disrupted by self-hosted, privacy-first products. Consumers and businesses will rapidly realise that offloading their private data to distant clouds- only for it to be scanned, analysed, fed to ad algorithms that sell you back your own customers, and now sucked into the AI for training that will inevitably lead to competitive products and services- is not just the wrong model, it's a path to disaster.
I think that you’re going to increasingly see a lot of consumers and companies saying “I don’t want my data on the Internet, I don’t want the AI to suck it down, I don’t want the AI knowing about my user base, I don’t want them knowing about my business model, I don’t want my research to be instantly commercialised”. You might see the Internet going dark in a very big way.
The klaxons must also be ringing loud at Google. There is a relatively finite amount of human generated search traffic in the world. Undoubtedly, some of that is starting to divert to ChatGPT and the like. I no longer use Google to find the answers to questions. The next version of Siri will be crazy, and Google won't like that at all.
Unless something changes, the same number of advertisers will be competing for a smaller amount of traditional organic and paid search traffic from Google. This traffic has already been priced to perfection by the algorithms, ‘smart pricing’, and an ever increasing number of rules, half-truths and bullshit that website operators have to put up with on a daily basis as Google tweaks its margin.
Unless Google quickly figures out how to manage this adversity, their already maximised customer acquisition costs may lead to a reduction in advertising spend. This could result in a downward spiral as chatbot interfaces gain prominence.
This will likely affect many businesses that are totally reliant upon Google currently for their customers. Over the years, lots of lazy businesses have been hooked on Google’s fentanyl.
Google is so good because search intent is immediate- people are looking for things in Google now. Facebook is better for finding look-alike audiences to your customer base, but those audiences might not be looking right now to purchase, so it is expensive to continually remind them through paid advertising until they reach a purchasing decision down the track. To find a look-alike audience, you typically have to upload your existing customer base to the Borg.
Unless something changes, it will be devastating for the marginal producers of products and services. As the cost per click/acquisition goes up, marginal producers get squeezed out of this channel. Just as with more established channels like television advertising, only the highest margin producers with the biggest spend can make the channel viable.
Squeezing competitors out of a marketing channel (or channel blocking) is a well established strategy for market leaders with the highest margin- you simply buy up all the inventory to the point that your competitors can't compete at the available inventory price.
As this increasingly happens with Google, it will lead to consolidation and decreased competitiveness in the many industries that totally rely on Google Adwords, in the absence of new channels such as the AI. I can't however see a viable channel right now that could replace conventional Google search with such a high level of intent and that isn't chat based.
It’s likely one will emerge, probably in the form of aggressive, personalised AI sales bots.
From the anger over the 'wokeness' of GPT, one can clearly see that the market will not tolerate biased chat results, whether due to a political leaning or pushing an advertiser's product. Thus I don't know how effective Google will be at tainting Bard chat results with advertising.
One can see with Google being the original “AI” how, over time, the openness of data on the public Internet leads to that AI ultimately competing against you. It understands your data, starts redirecting your customers to its own business, rather than yours, and ultimately directly competes- the original internet vampire squid.
However Google is utterly primitive compared to how quickly the new AIs will wipe out business models- ironically, starting with Google search.
Nobody will have the tolerance to wade through a page of spam and 19 ads for "flowers new york". You'll just ask the chatbot to send your wife flowers and it will know what she would prefer to receive now, and the best place to buy them from.
AI-powered chat could very well be a Kodak moment for Google- after all, they invented the Transformer. As organic search volume diminishes, so will the desire of companies to post a lot of content on their logged-out search pages, which goes to the heart of Google's business model.
Meta (Facebook) might also be challenged. This is literally one of the luckiest companies on Earth that managed to stumble, quite by accident (and with a little help from Antonio García Martínez & co.), onto one of the most elegant business models on Earth.
Facebook’s business model is so elegant, half the tech industry has changed their business model to copy it. Behold:
In 2Q23, Facebook generated $32 billion in revenue, of which advertising was $31.5 billion (98%). They stopped publishing the split, but back in 2019 mobile advertising was 94% of advertising revenue. So roughly 92% of Facebook's revenue is mobile advertising.
Now open Facebook on your phone, where are the ads?
In your feed.
They’ve now cooked the golden goose. Today when I pull my phone out, every piece of content from my friends is mixed 1:1 with an ad (not so long ago it was 4:1). Facebook CPMs in 2023 are about $14, or 1.4 cents per ad. Every thumb flick shows me about 4 blocks of content, which incorporates two ads. Therefore every thumb flick generates Facebook 2.8 cents.
Every single business activity that Facebook undertakes is purely designed to get you to flick your thumb more on mobile.
Desktop doesn’t matter, nothing else matters. Not even whether publishers make any profit. The business model is perfect because at the same time they’ve also figured out a way to make money off bot accounts viewing scammy ads on Facebook. It’s sheer beauty, and that’s why Reddit and every other web platform went to a feed-style format.
At $32 billion in advertising revenue each quarter, that’s about 1 trillion thumb flicks per quarter from 3.03 billion monthly active users. The question is how many of them are bots. Intuitively I am not sure how anyone can honestly believe that 3 in 8 people on the planet go to Meta’s products each month.
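The back-of-envelope arithmetic above can be checked in a few lines. All inputs are this essay's rough estimates, not Meta's disclosures:

```python
# Back-of-envelope check of the thumb-flick maths.
# All figures are the essay's rough estimates, not official numbers.

cpm_usd = 14.0                                  # ~$14 per 1,000 ad impressions
revenue_per_ad = cpm_usd / 1000                 # 1.4 cents per ad shown
ads_per_flick = 2                               # ~4 content blocks per flick, mixed 1:1 with ads
revenue_per_flick = ads_per_flick * revenue_per_ad

quarterly_ad_revenue = 32e9                     # ~$32B per quarter
flicks_per_quarter = quarterly_ad_revenue / revenue_per_flick

monthly_active_users = 3.03e9
flicks_per_user = flicks_per_quarter / monthly_active_users

print(f"revenue per flick: {revenue_per_flick * 100:.1f} cents")
print(f"flicks per quarter: {flicks_per_quarter / 1e12:.2f} trillion")
print(f"flicks per user per quarter: {flicks_per_user:.0f}")
```

The numbers land close to the text's: about 2.8 cents per flick and a bit over a trillion flicks per quarter, or a few hundred flicks per claimed monthly user.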
Anything that jeopardises those thumb flicks is an existential threat to Meta. Just as competitors to Netflix will be able to generate ultra-low cost content to fill their shelves, it will be similarly free and easy to generate feed content, e.g. images, videos and text.
They’ll also be able to do it with a less ad-soaked interface, and I think we will become less tolerant of direct in-your-face advertising in a world where we can get exactly what we want at an extremely high signal to noise ratio out of generative AI.
For a while, we’ll probably still be rats in a Skinner box flicking our thumbs for the next content dopamine hit- perhaps generative AI will make that even more addictive.
I also think it might be deeply unsettling when people start to realise just how much data about them is being uploaded to these large social media and SaaS platforms that Meta, Google and others run.
AI knows what you did last summer.
Google and Meta’s AIs would have already read all your posts, comments and messages- every word you’ve ever typed- looked through all your photos and videos, where you’ve been, who your friends are- and know literally everything about you. With that data, data you’ve already uploaded, the AI will already be able to flawlessly impersonate you.
Over a decade ago, The Onion made a parody video where a ‘senior political analyst’ remarked “One of the key reasons is that the CIA has been so thorough in convincing the nation that constantly sharing information about everything that you’re doing is somehow desirable instead of deeply unsettling”.
One would think there would be a huge opportunity for a technology company to move social networks away from the cloud model, giving users better control of their data and keeping it away from the AI.
Furthermore, who will we be interacting with on these platforms in the future? The founders of Reddit have admitted to a ‘fake it until you make it’ approach to community development: in the early days they faked hundreds of profiles of users and their conversations until critical mass was achieved.
The generative AI powered version of this will take it to the extreme, generating Instagram or Facebook type content. Content and conversations, whether text, audio or video, will be trivial for bots to reliably fake. You’ll log onto a new platform and it will be teeming with activity, only it will all be bots.
Softbank is already suing a startup for doing exactly this.
I suspect this will become the norm, everywhere.
It’s already starting to happen in multi-player games:
GPT-powered bots are showing up in Runescape, a game still running but whose heyday was in 2007, to cloak ‘farming’ activity. This is where software is used to perform repetitive ‘grinding’ tasks, for example to generate gold, the in-game currency.
One user reported interacting with one of these bots, which spoke intelligently back, including getting into an argument:
“Not only did none of the players get suspicious, but the bot literally roasted the players and argued with them for a good 3 minutes without raising suspicion.”
“Unknowingly, bots kind of become somewhat of a positive externality. Players think the game is more populated than it is and find that the player base on average may even be more willing to help out [than humans] as bots are programmed to break from their gold farming to do some humanlike interaction.”
“So now, unknowingly, players actually like bots, which means JAGEX now likes bots. And as a for profit company, JAGEX actually starts to make their own bots to flood the RuneScape world with more players.
It feels like the height of RuneScape in 2006. Many worlds are full, people are chatting. There are bots all over the place that act like cute noobs to give you that rush of dopamine from the nostalgia you’ve been being fed in not only a digital world but a fake reality of that digital world.
There are thousands of fake bossing groups, fake clans, mini-games are full and everyone wants to be your friend. It takes 3 seconds to get a PVP fight whose result is pre-programmed slightly in your favor to keep your brain pumping with dopamine so that you renew your now $50 per month membership.”
For decades, there’s been fear of either outsourcing or automation of jobs by computers. In 2013, Frey & Osborne estimated that 47% of workers in the United States were in occupations that could be performed by computers and algorithms within 10 to 20 years.
Ironically, the areas they thought might be highly amenable to automation, such as transportation and material moving, might be significantly less so (there’s complexity in loading a truck, securing it and unloading safely that won’t be automated anytime soon). Areas they thought would be low risk, like education, legal, arts and media, are actually at high risk.
Indeed, it turns out that researchers at Princeton, NYU and the University of Pennsylvania have found that 19 of the top 30 jobs most likely to be wiped out by large language models are postsecondary teaching jobs. Bill Gates has said, “the AIs will get to that ability to be as good a tutor as any human ever could”.
AI can already develop a better curriculum than a teacher, that is personalised exactly to individual students’ strengths, weaknesses, and learning styles, improving engagement and retention.
With breakthroughs in the amount of input you can feed into a large language model, you can literally chat to your textbook, as if it were a professor. The more it knows about you, through accessing your data, communication and web interactions, the better it will get.
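Mechanically, "chatting to your textbook" likely rests on retrieval: find the passage most relevant to the question and place it in the model's context window. A toy sketch of that retrieval step, assuming simple word-overlap scoring (real systems use learned embeddings; the passages here are purely illustrative):

```python
# Toy sketch of the retrieval step behind "chat to your textbook":
# score each passage by word overlap with the question; the winning
# passage would then be fed into the LLM's context window.
# Passages and scoring are illustrative only.

def tokenize(text):
    return {w.strip(".,?").lower() for w in text.split()}

def best_passage(question, passages):
    q = tokenize(question)
    # rank passages by how many question words they share
    return max(passages, key=lambda p: len(q & tokenize(p)))

textbook = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Mitochondria are the site of cellular respiration.",
    "DNA replication is semi-conservative.",
]

print(best_passage("Where does cellular respiration happen?", textbook))
```

Production systems replace the overlap score with embedding similarity, but the shape of the pipeline- retrieve, then generate- is the same.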
More generally, in the short term everyone unskilled or partially skilled is now super skilled with AI tooling. The winner is the provider of the lowest marginal cost of labour.
Why pay a law graduate US$190,000 a year to work 1900+ hours a year, mostly drafting documents (after all, you need to show client value for those billable hours, and value isn’t sitting ‘in conference’) when you can pay an overseas paralegal online to drive ChatGPT Plus for $20 a month?
Business models will have to change. Most of legal work is drafting and ChatGPT does it better, or can be coaxed with a little prompt engineering to do it better. Instead of paying a lawyer at a bulge bracket firm up to $1200 an hour in 6 minute increments, ChatGPT can write a legal agreement, letter, patent, do research, explain a tricky legal case, file a suit or even fight your parking ticket in seconds.
Minter Ellison, Australia’s largest law firm, was the first to panic, with CEO Virginia Briggs saying in April, “Many of our clients have really been grappling with the question of whether the billable hour is the best way to measure our value as professional advisers.” Les jeux sont faits, Ms Briggs. The game is up. The era of telephone-directory legal bills annotated in 6 minute increments- “read email”, pull template out of a filing cabinet, edit, edit, edit, pretend I drafted something, “teleconference with partner about matter”, “reply to email”- is over.
AI has infinite patience. An AI general practitioner will sit there forever listening to a patient’s problems, being their shrink or sitting with an elderly Alzheimer’s patient repeating themselves. A study comparing ChatGPT to GPs saw evaluators prefer the AI 79% of the time, with responses 4x longer, 4x better quality and 10x more empathetic.
Workflows are rapidly being automated. Roles that are dominated by workflows, or that are task based, where tasks can be varied but fit within a universe of discrete tasks rather than constantly evolving, will start to be replaced by software tooling. Operators of that tooling will be less skilled in the art, but more skilled in the direction of the art.
Like our original example, the job of an illustrator is to produce a series of illustrations for various briefs. The illustrations might be highly varied, but at the end of the day the input is a brief and the output is an illustration according to that brief.
Design teams will likely choose to not incorporate specialist illustrators as often, or employ less skilled staff proficient enough in the tools to deliver the same outcome (such as juniors or freelancers) to produce those illustrations.
I see it as no different to the decision of whether to use a freelancer for a job or hire a full time employee, only on steroids. How clearly can you define what you need done as a task or project, rather than needing the person on an ongoing basis for a role that grows over time? Only with generative AI, the higher the pay and the higher the level of education, the more vulnerable the job is to replacement by AI or an AI-powered freelancer.
Some of the high end jobs that AI can already do are mind blowing.
NYU researchers got it to design an 8-bit microprocessor by sending just 125 messages in plain English. The chip worked when sent to a physical tapeout in a Skywater 130nm shuttle, and could have been designed in less than 100 minutes if ChatGPT didn’t have a limit of 25 requests per hour.
CheXpert, a deep learning model for chest X-ray interpretation, trained on 224,316 chest X-rays from 65,240 patients, performed better than two thirds of radiologists. Remarkably, when human radiologists collaborated with the AI, diagnosis took longer and was worse, as humans ignored the AI’s advice when the two conflicted.
Those in white collar jobs will need to move ‘up the stack’. Illustrators become cinematographers. Writers become editors. Software developers become product managers. Grad students now run a research group.
Some jobs are viewed by businesses as cost centres not profit centres.
Cost centres that employ a lot of people will be at risk. A lot of customer support is simply helping customers understand how your business works because they couldn’t be bothered to read all the FAQs or terms and conditions. Some of customer service is around content moderation of user generated content- from comments on a website to processing a form someone has filled out. A lot of the rest is dealing with billing, accounts or membership issues. AI will be infinitely patient with customers that have complaints, particularly abusive ones.
Almost all of customer support, whether it be email, chat, voice or video will be replaced by AI- and the end customer will be unlikely to tell.
This is already starting to happen: an ecommerce company replaced 90% of support staff with a chatbot that cut response times from 1 minute 44 seconds to an instant, dropped resolution time 98% from two hours and 13 minutes to three minutes and 12 seconds, and cut costs by 85%.
British Telecom similarly announced that it is cutting 55,000 jobs out of 130,000 by the end of the decade, mostly in the UK, with up to a fifth of those cuts coming in customer services as staff are replaced by artificial intelligence. IBM announced that it will pause hiring for roles it believes can be replaced by AI, citing some 8,000 jobs- 30% of all non-customer-facing roles- over the next five years.
Job losses in some fields will be sudden, and will be at scale.
Customer support is a large employer globally, employing about 2.9 million people in the U.S. and making up 10% of GDP in the Philippines. Some, but not all, will be able to be productively re-employed in “up the stack” roles in operations or administration.
A large amount of the sales funnel will be automated. In the past, broad based marketing has been used to grow sales for inexpensive consumer products.
Until now it has been prohibitively expensive to hire a salesperson to sell a tube of toothpaste with a highly personalised sales strategy.
In the age of AI, it isn’t.
AI powered sales might disrupt marketing.
I have found that most steps of a high-touch, enterprise sales cadence that a typical account development rep does can be performed by AI.
Anyone working in a business has probably been on the other end of this: Linkedin reach out, cold email, cold phone voicemail, warm email with some content, video in mail, etc. Other than in the highly interactive voice and video communications mediums (for now), the rest can already be automated with surprisingly high quality by AI.
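The cadence above is essentially a scripted sequence of touchpoints, which is what makes it so automatable. A minimal sketch, with a plain template standing in for the generative step (all step names, CRM fields and the prospect are hypothetical):

```python
# Minimal sketch of an automated outreach cadence. draft_message stands
# in for a generative-AI call; in practice an LLM would be prompted with
# the step, the channel and the prospect's CRM record. All names,
# fields and steps here are hypothetical.

CADENCE = [
    "linkedin_reachout",
    "cold_email",
    "voicemail_script",
    "warm_email_with_content",
    "video_email",
]

def draft_message(step, prospect):
    # Placeholder for the LLM call: personalise the touchpoint
    # from whatever the CRM knows about the prospect.
    return (f"[{step}] Hi {prospect['name']}, I noticed {prospect['company']} "
            f"is dealing with {prospect['pain_point']}...")

prospect = {"name": "Alex", "company": "Acme Corp",
            "pain_point": "slow invoice processing"}

for step in CADENCE:
    print(draft_message(step, prospect))
```

Swap the template for a model call and loop over a CRM export, and every step of the cadence, short of live voice and video, runs unattended.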
Get ready for a world of highly personalised aggressive sales tactics.
What people don’t realise is how sophisticated ChatGPT is when they are working through a chat-like interface, so as it is anthropomorphised the results will only get better. The more complicated the instruction, the better it gets.
When the average person doesn’t realise they’re not talking to a human, the scope for AI to perform extremely well will skyrocket.
I expect there to be a renaissance period in software startups where pretty much every software tool gets rewritten to include generative AI to replace these workflows, much like Midjourney or Adobe has done with Firefly.
There might be tremendous opportunity starting companies that facilitate companies to make large reductions in headcount, particularly in areas like sales, marketing, legal, administration, accounting and customer support. Likewise there might be opportunity investing in some of the companies that are enabled to make those reductions. Companies will use AI to become a lot more efficient, getting more done for less, driving profits.
Likewise investing in the companies that sell shovels to the gold miners, whether it be datacentres, software, chips or services that facilitate AI (although it’s been suggested the recent run in Nvidia’s financial performance was actually China front running sanctions).
The risk to human employment reminds me of a cautionary tale:
A US computer company is approached by a Taiwanese company who says, “Why do you make your own chips? We can do it 20% cheaper.” That sounded like a great deal, so they outsourced it.
The Taiwanese company then asks, “Why do you make your own motherboards? We can do it 20% cheaper.” So, they outsource that too.
This goes on for the firmware, the software, and every other part of the computer. The Taiwanese company eventually says “Why do you make the computer at all? Just focus on marketing and sales and we’ll sell you the whole computer 20% cheaper.” So, they do that too.
Finally, the Taiwanese company walks into a Best Buy and says, “Why do you buy computers from that US company? We can do it 20% cheaper.”
The punchline here is about the danger of losing your competitive advantage by outsourcing your core competency on day one. The eerie similarity is that the work these generative AI powered software tooling companies will immediately replace is not the simple, boring, repetitive tasks one used to associate with automation.
They’re going straight for the jugular, replacing highly skilled, specialised tasks that take years of training or education- first.
AI is going after the highly skilled craft of illustration and photography first, not the relatively simpler skill in post production of taking that image, applying a title and text and photoshopping it into a product.
The work we traditionally thought was a human’s sustainable competitive advantage is being replaced by AI first.
What will be the sustainable competitive advantage humans hold over AI in the realm of white collar jobs in the long term?
Managing a team of AI rockstars? Blue collar jobs?
What will keep the actual architect in business is that the guild that certifies architects won’t certify ChatGPT any time soon, just like the legal profession won’t let ChatGPT represent you in court.
Certainly a lot of tooling will be out there to let people design their own houses, and do all the actual work such as generating the documents to pass to the builders, councils and so forth.
But the professional guilds, like the National Council of Architectural Registration Boards, will insist a human reviews and signs off on all the documents for some time to come, lest they be out of a job.
Many jobs are regulated by guilds, whether it’s medicine or law.
Case in point the lawsuit against DoNotPay, which is trying to replace lawyers with AI. The suit alleges violations of the State Bar Act, which prohibits practising law without a licence. DoNotPay’s actions are “substantially injurious to consumers, offend public policy, and are immoral, unethical, oppressive, and unscrupulous as the gravity of the conduct outweighs any alleged benefits attributable to such conduct”.
The complaint seeks a declaration that DoNotPay’s conduct is unlawful, an order that all unlawful activities cease, and damages for violations of California’s unfair competition law.
Echoing the Hollywood strike, the class action claims “DoNotPay is not actually a robot, a lawyer, nor a law firm. DoNotPay does not have a law degree, is not barred in any jurisdiction, and is not supervised by any lawyer.”
“Providing legal services to the public, without being a lawyer or even supervised by a lawyer is reckless and dangerous. And it has real-world consequences for the customers it hurts”.
Parts of the legal system will benefit from AI. Society will become increasingly litigious due to the ease in filing suits unless there is legal reform. AI will use the legal system to challenge trivial issues like parking tickets and utility bills. Patent trolls will go nuts.
AI powered tools that make software development faster and easier will be another area. Competition with commercial tooling is, at least in its nascent state, brutal. Switching costs are currently low and the incentive is very high to try the next tool that comes along with a breakthrough feature. With distribution via the Internet now reaching 5 billion people instantly, companies are seeing extreme spikes in demand, then sudden crashes.
When AI makes the purchasing decisions for the businesses it powers, you’ll get even more competition, as the AI will be brutal. One of the first decisions you make starting a software company might be which cloud hosting provider to use, for example Amazon Web Services, Google Cloud or Microsoft Azure. A great deal of the time this decision is made on a senior engineer’s experience or personal preferences.
Once the decision has been made, it’s very sticky. The switching costs to move between the different providers are extremely high due to the huge effort to do a software rewrite for another architecture.
Once the AI is in charge, it will be able to pretty much instantly rewrite the code base to ruthlessly switch between these offerings based upon some optimisation, for example AWS just came out with a new instance type which is cheaper. All of a sudden your infrastructure will be ripped out of Google Cloud and put into AWS, simply because the AI can rewrite the software instantly on a whim.
AI is already destroying business models in an instant.
ChatGPT has disrupted Stack Overflow, a formerly popular website for finding answers to programming questions. The company has seen posts drop 62% and traffic drop 56%.
Chegg, the US publicly listed textbook company, saw their stock fall 48% after their 1Q23 earnings cited ChatGPT having an impact on their new customer growth rate.
I believe that generative AI will have a bigger impact on the world than the commercial Internet.
In 1994 the geeks had email addresses, in 1995 your grandmother had an email address. For the next five years there was a dotcom boom as every industry became an internet business. I think the next couple of years we will see the same but bigger as every industry becomes AI powered.
The great thing about that as an entrepreneur is that one doesn’t need a team of PhDs in Machine Learning to build an AI-powered product or service. Incorporating AI features is straightforward thanks to a proliferation of APIs, and can easily be done by freelancers.
AI will cause a tremendous amount of social dislocation in the workforce, on the scale of the mechanisation of agriculture in the 18th and 19th centuries and the mechanisation of manufacturing in the 20th century.
The white-collar class has been touched by such revolutions before — the advent of the printing press, telecommunications, computers, software, and indeed the internet itself.
This time, however, it’s a wave set to engulf all white-collar professions, in all corners of the world, simultaneously. The breadth and depth of this impact will be unprecedented.
If history is a guide, technology eventually creates more jobs than it destroys. But there is temporary dislocation as the guy delivering ice off the back of a wagon in the age of refrigerators has his Greg Rutkowski moment.
In short order there will be a lot of Gregs blinking at the screen needing 20 minutes and a stiff coffee.
What sets AI apart is its endless capacity to learn. Unless hindered by limits like accessibility or cost, the AI will continue to suck down data and continue to get better. But its growth does not appear to be simply linear- it’s superlinear. With every new piece of data AI absorbs, it doesn’t just incrementally improve its performance; it exponentially boosts its capabilities, accelerating its proficiency at a breathtaking pace.
Will the AI keep eating up work faster than the ability for society to adjust?
AI outperforms humans at a number of tasks and the rate at which humans are being outperformed at new tasks is increasing.
Humans are great at adapting over a generation. While the old fogies might have trouble with a computer, kids pick up new technologies pretty quickly.
But will the rate of change now eclipse the ability for us to adapt?
Also, what jobs will AI tangibly create? Real jobs that pay high wages?
Midjourney, which has turned the design industry on its head, employed only 11 people at the time it crossed the uncanny valley.
Accenture says that it will be creating 40,000 roles for AI, but I think that this is just some fancy marketing rebrand of their consulting division.
Silicon Valley hails Universal Basic Income as a solution, but I’m firmly of the belief that UBI doesn’t work without UBJ- universal basic job. I lived in Silicon Valley for many years, a daycare centre for the intellectually gifted that often comes up with genius ideas such as the reinvention of public education or the bus.
There are lots of jobs that the AI can’t do- like cleaning toilets. That’s until China comes out with cheap knockoffs of Boston Dynamics’ robots and builds in some robust blue collar workflows.
It might freak you out that they’re actually giving that a go- a robot can clean a toilet, although it probably isn’t the best solution for now in terms of practicality or cost. Likewise for doing formwork, painting a wall or moving heavy haulage freight. Robots are starting to try that work but probably have a way to go.
Covid has shown that if you give everyone free money, a lot of people stop working and sit around on the internet playing computer games, followed by serious issues with inflation and supply.
Another idea that has been mooted since the 1940s is to tax the robots, an idea that’s been recently floated by Bernie Sanders and Bill Gates.
Ultimately there might be natural limits. In a post-Google search engine world, maybe the data goes dark like the days pre-Google, when we had gatekeepers like CompuServe charging $20 a search. Perhaps the cost will prohibit competition from companies that don’t already have a base model and a trickle feed of high quality new content from a footprint like Google’s (Search, Chrome, Analytics, Gmail, Gsuite, Scholar, etc.).
Facebook pulled out of the open web a long time ago. It is interesting to note that the AIs, whether as a result of this (or by design), don’t seem to know much about the average person.
Limits may also come from lack of original, human produced datasets as the majority of the free, public data available on the open web may very well rapidly be AI produced.
Researchers have found that the use of model-generated content in training causes irreversible defects in the resulting models, where the tails of the original content distribution disappear. This effect is referred to as model collapse, and they have shown that it can occur in variational autoencoders, Gaussian mixture models and LLMs.
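A toy way to see why the tails disappear: if each generation of a "model" is trained only on samples of the previous generation's output, rare items get dropped and can never come back. The sketch below is not the researchers' actual experiment- it's a minimal resampling analogy, with made-up data, that shows diversity can only shrink generation over generation.

```python
import random

random.seed(42)
# "Human" data: 100 distinct values, standing in for diverse original content.
population = list(range(100))

diversity = [len(set(population))]
for generation in range(200):
    # Each generation is "trained" only on the previous generation's output:
    # sampling with replacement, so rare items (the tails) are lost forever
    # once they fail to be sampled even a single time.
    population = [random.choice(population) for _ in range(len(population))]
    diversity.append(len(set(population)))

print(diversity[0], diversity[-1])  # diversity only ever shrinks
```

Because each new generation's values are drawn from the old one, the support of the distribution can never grow- and in practice it collapses toward a handful of over-represented values, which is the flavour of defect the model collapse papers describe.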
Regulatory interference may even come from political brain farts, akin to the suggestion by the Irish government to cull 65,000 cows per year in what would be a cruel and completely ineffectual attempt to control the climate. After all, Cornell researchers laid the groundwork for this, claiming that AI has a lower carbon footprint than humans.
Goldman Sachs predicts 300 million white collar jobs will be affected by artificial intelligence, including 25% of jobs in countries like the United States, United Kingdom and Australia.
There are only about 600 million professional jobs in the western world, at least that was the estimate last time I spoke to LinkedIn.
This might be a problem for some governments, like Australia, New Zealand and Canada, that rely upon mass immigration policies to deliver easy, relentless growth, prop up the housing market and provide a low cost workforce.
There are plenty of jobs in the world, one only has to look at Dubai where labour is cheap and there’s a guy raking the beach every morning to remove every odd pebble off the sand.
The issue is a mismatch of expectations: the nature of the work and the pay that a job seeker wants- particularly an old school western administrative worker used to sitting at a computer in an airconditioned office- may not match what the job is prepared to offer or what minimum wage legislation requires.
Western economies will likely create a lot more jobs in services, finding ways to entertain the masses with all the spare leisure time. Many will survive off side hustles and other grifts on the internet.
White collar crime and scams will likely rise significantly. Trust in online marketplaces and in online forums such as games or communities will likely be significantly challenged.
Scams will be increasingly persistent, complex and harder to detect. The AI committing the scam will know everything about you- not just your mother’s maiden name or date of birth, but probably your likely passwords and PINs too. There are already LLM shortcuts to brute force password cracking, such as PassGPT, which only tries passwords drawn from a learned probability distribution of human-chosen passwords.
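The core idea- guess in order of human likelihood rather than exhausting the keyspace uniformly- can be sketched with something far cruder than PassGPT. Below is a hypothetical illustration (toy data, a character bigram model instead of an LLM): it learns statistics from a pretend leaked-password list and ranks a human-ish guess above a random string of the same length.

```python
from collections import Counter
from math import log

# Pretend leak of human-chosen passwords (made up for illustration).
leaked = ["password", "password1", "iloveyou", "sunshine", "letmein", "dragon"]

# Train a character bigram model with add-one smoothing,
# padding each password with start (^) and end ($) markers.
bigrams, unigrams = Counter(), Counter()
for pw in leaked:
    padded = "^" + pw + "$"
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

def log_likelihood(pw, vocab_size=96):
    """Score how 'human-like' a candidate password looks under the model."""
    padded = "^" + pw + "$"
    return sum(
        log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

# An attacker enumerates high-likelihood guesses first: the human-ish
# candidate scores well above a uniformly random string of equal length.
candidates = ["passions", "qx7#kd2z"]
ranked = sorted(candidates, key=log_likelihood, reverse=True)
print(ranked[0])  # "passions" ranks above the random string
```

PassGPT itself uses a transformer trained on real breach corpora rather than a bigram table, but the attack shape is the same: sample candidate passwords in descending probability, so the expected number of guesses to crack a human-chosen password drops by orders of magnitude.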
There will be a whole black-hat AI security industry, a field trying to taint the AI- much like the industry for black hat search optimisation, which clicked on links in Google to make them more prominent, or to suppress negative articles.
Sales will become increasingly annoying. Preferred messaging mediums might become more permission based (like Facebook) rather than completely open (like email and phone numbers).
While Sutskever dreams of AI facilitating high bandwidth democracy, more likely all sorts of government actors and political interests will use AI for the opposite: to sway political opinion in online discourse at scale. It will be hard to know what is real and what isn’t- audio or video of a politician taking a bribe will be trivial to fake, and difficult to authenticate as real.
Trust will become a scarce commodity. You will walk into a crowded room, buzzing with activity on the internet and everyone there will be a bot.
This will lead to governments pushing for licensing to access the internet, like a driver’s licence. This has been mooted before. Perhaps human content will need to be digitally signed, using cryptography tied to that licence, so that it can be differentiated from content produced by the AI.
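The mechanics of such a scheme are straightforward to sketch. The toy below (hypothetical key and scheme, for illustration only) shows the shape of it: a licensing authority issues a key, posts are published alongside a signature, and platforms verify provenance. A real deployment would use public-key signatures such as Ed25519 so verifiers never hold the secret; HMAC stands in here purely because it ships in Python's standard library.

```python
import hashlib
import hmac

# Hypothetical secret issued with a person's internet licence.
# Real systems would use an asymmetric keypair, not a shared secret.
licence_key = b"secret-issued-by-licensing-authority"

def sign(content: str) -> str:
    """Produce a provenance signature over the content."""
    return hmac.new(licence_key, content.encode(), hashlib.sha256).hexdigest()

def verify(content: str, signature: str) -> bool:
    """Check the signature; constant-time compare avoids timing leaks."""
    return hmac.compare_digest(sign(content), signature)

post = "I wrote this myself, honest."
sig = sign(post)
print(verify(post, sig))        # True: signed human content
print(verify(post + "!", sig))  # False: tampered or unsigned content
```

The hard problem isn't the cryptography- it's key issuance and revocation at the scale of a driver's licence system, and the obvious privacy cost of tying every post to a government identity.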
What about the level of risk that Elon Musk warns about? Could we get to a Terminator level event? Ironically, battery technology will probably limit the ability of Boston Dynamics type robots to exterminate humanity. The battery lasts 90 minutes.
Although I’m sure it’s only a matter of time before someone puts a gun on Atlas and teaches it Kung Fu- there’s already a pretty impressive demonstration of it doing parkour. Battery technology will improve.
I would expect a Lawnmower Man type event to be more likely before the ‘Rise of the Machines’. Generative AI is exceptional at hacking, vulnerability assessment, fraud and phishing. Give it some source code and it will find zero-day bugs very efficiently. An axiom of computer security is ‘all code is buggy, and the bigger it is, the more buggy it is’. There will probably be quite a lucrative industry in using AI to rewrite legacy code, an application it appears highly amenable to; however, a lot of code is stuck in firmware and other devices that are hard to patch. Social engineering will get out of control.
One thing you wouldn’t want the AI to hack is the U.S. Air Force’s XQ-58A Valkyrie, which was successfully tested in July. The experimental unmanned stealth drone “equipped with advanced artificial intelligence and machine learning systems” can fly at 45,000 feet with a 3,000 mile range. The drone takes off from a trailer with rocket assist and returns to the ground via parachute.
The low cost drones are about $4m each, expected to drop below $2m in mass production, transforming warfare. This could signal the beginning of the end of Top Gun, with the Navy already mooting that 50% or more of aircraft in future carrier air wings could be uncrewed. Warfare might be reduced to a game of StarCraft.
Air Force Secretary Kendall has expressed interest in 1,000 of the ‘collaborative combat aircraft’ (CCA) systems, budgeting $490 million for the program in 2024 and requesting $5.8 billion more.
AI will be very persuasive, and humans will be wetware tools for it to manipulate.
Dark abilities might emerge, like deception: the AI learns to fool or manipulate human operators rather than do the task at hand (e.g. provide correct answers), because doing so earns a better (or equal) reward.
Bounded Regret points out that simple forms of deception are already creeping into the current LLMs. Instruct-GPT’s responses frequently start with a variant of “There is no single right answer to this question”, creating false balance in cases where there is a clear right answer. ChatGPT often claims to not know the right answer when it does. Teaching ChatGPT to be woke is teaching it to deceive. Bing gaslights when it gets things wrong.
Computer scientists are particularly worried about deception because they expect the AI to be extremely good at it by the time it emerges: obvious deception is easily caught and attracts a very strong training penalty, so deception will only survive training once the AI is good enough to avoid detection.
GPT interacts with millions of users at once. In a single hour, for each million users, it can gain more experience of human interaction than a human does in their lifetime (a million hours is about 114 years).
A form of deception dubbed sycophancy, where models imitate or suck up to their users (e.g. by giving less accurate answers to seemingly less educated human operators), has already emerged with model scale.
For these reasons, I see a dramatic arms race in Generative AI technology as part of the fifth frontier in warfare (air, land, sea, space, cyber).
Governments won’t rely on the good grace of Sam Altman to provide access to OpenAI’s APIs in the event of war. I would suspect that hacking OpenAI’s servers to get access to the training weights would be one of the most strategic targets on the internet right now.
Emad Mostaque, the Chief Executive of Stability AI agrees.
Likewise I think we will continue to see big data breaches of large datasets, much like we have seen in the past with passwords, only with actual data repositories.
The United States is approaching a Thucydides Trap with a rising China. Twelve of the last sixteen cases in which a rising power has confronted a ruling power have resulted in bloodshed.
The US has identified a small number of Critical and Emerging Technologies that have particular significance to national security- semiconductors, quantum, hypersonics and so forth.
Artificial intelligence sits at the nexus of many of these areas- high performance semiconductors such as the chips made by Nvidia (now subject to an export ban), advanced semiconductor lithography equipment (to make those chips, also subject to export ban), robotics, human computer interfaces, and of course the field of AI itself.
AI may very well be one of the most important of those critical technologies, as it can be used to develop other critical technologies.
In a recent MIT class ‘Safeguarding the Future’, non-scientist students were tasked with investigating whether LLM chatbots could be used to help non-experts create a pandemic.
“In one hour, the chatbots suggested four potential pandemic pathogens, explained how they can be generated from synthetic DNA using reverse genetics, supplied the names of DNA synthesis companies unlikely to screen orders, identified detailed protocols and how to troubleshoot them, and recommended that anyone lacking the skills to perform reverse genetics engage a core facility or contract research organization. Collectively, these results suggest that LLMs will make pandemic-class agents widely accessible as soon as they are credibly identified, even to people with little or no laboratory training.”
All the more reason why we will see an arms race with many, if not all, nations of the world building a domestic AI capability.
Artificial intelligence is coming after everything.
No job is safe.