The Visual Summary — Apple Intelligence 🤖✨
A sketchnote overview of Apple’s WWDC24 AI announcements
Last month, Apple opened the box containing its strategy for AI: Apple Intelligence. Let’s dive into a visual summary of the announcement, looking at the overall strategy, the capabilities, and the architecture. Let’s go!
Apple Intelligence: Personal & Feature-driven
After Microsoft and Google, Apple has now entered the AI business. As the developer of several of the world’s most used operating systems, they could not stay quiet on the AI front. Microsoft is adding Windows features under the Copilot moniker, and Google is integrating Gemini into its Android OS and Google Workspace offering. This is Apple’s move.
Apple Intelligence stands out in a few ways:
- the focus on features rather than generic generative AI, and
- the focus on a personal experience.
Building features with Generative AI is rather challenging. Systems like ChatGPT are very good in a teacher role: someone who can explain world knowledge in a manner that suits a particular individual. However, two aspects remain very challenging with Generative AI models: connecting your own knowledge and making them do something useful. Apple makes a solid effort to tackle both challenges.
First, Apple can search personal information via a system called the semantic index. Apps can expose their data to Apple Intelligence via specific APIs. Exposing personal information to AI services typically comes with a backlash, as Microsoft’s Recall feature has taught us. But Apple takes a slightly different route.
Apple’s focus on security in the past decade seems to be paying off. They leverage their trusted on-device processing, but this time stretch that trust into the (private) cloud (see the architecture section later on). The result: a trusted environment for your personal data, which AI models can access only momentarily.
The second challenge, making AI do something, making it actionable, is approached via frameworks Apple already has in place. Apps on iOS provide hooks for creating shortcuts using so-called App Intents. There are 12 app intent domains, which AI models (Siri) can now leverage to perform actions on the user’s behalf. While this, of course, limits the diversity of actions the system can perform, it gives Apple much more control. Ultimately, that control is key to increasing the reliability of AI features.
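Concretely, apps describe their actions to the system through the AppIntents framework. Below is a minimal sketch of what such an intent can look like in Swift; the intent name and parameter are hypothetical, and exposing an action to Apple Intelligence will likely require additional metadata beyond this bare minimum.

```swift
import AppIntents

// A minimal, hypothetical App Intent. "OpenNoteIntent" and its parameter are
// invented for illustration; a real app would expose its own actions this way.
struct OpenNoteIntent: AppIntent {
    static var title: LocalizedStringResource = "Open Note"

    @Parameter(title: "Note Title")
    var noteTitle: String

    func perform() async throws -> some IntentResult {
        // A real implementation would look up the note and navigate to it here.
        return .result()
    }
}
```

Because the intent declares its title and parameters up front, the system (and now Siri’s models) can reason about what the action does and what input it needs.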
While systems like ChatGPT are now only gently exploring personalization, Apple taps into its security reputation to provide a real personal experience. Next to that, they leverage the App Intents foundation to provide AI capabilities that can tap into apps.
The result: Apple presents a personal AI experience that is integrated with their OS’es and app ecosystem.
Let’s explore the concrete capabilities that will be offered!
Capabilities: Siri, Text & Images
Siri
With the new operating systems, we’ll get a completely new Siri experience. Instead of the colorful “orb,” we get a colorful, glowing device edge that activates with a subtle animation that warps the screen.
Next to the visual overhaul, we get a Siri that understands us better, connects the dots between separate requests, and offers Type to Siri as an input option.
Siri’s knowledge has been bumped. Siri can now tap into Apple product knowledge, so you can ask it how to perform tasks on your device. It also gets on-screen awareness, which allows you to say things like “Add this address to contact X.” As Siri can now tap into apps via App Intents, it can perform the requested actions.
Siri’s personal knowledge is constructed by building something called a “semantic index” (via Core Spotlight). This allows Siri to understand the actual meaning of your request and connect your request to (personal) information stored in apps. That information can then be processed to formulate a response or perform the desired action.
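For context, the semantic index builds on content that apps donate to the system index, with Core Spotlight as the existing entry point. The sketch below shows the long-standing donation pattern; the identifiers and field values are hypothetical, and any extra annotations Apple Intelligence adds on top of this are not shown.

```swift
import CoreSpotlight
import UniformTypeIdentifiers

// A minimal sketch of an app donating a piece of content to the system index.
// The identifiers and values are hypothetical.
func indexNote(id: String, title: String, body: String) {
    let attributes = CSSearchableItemAttributeSet(contentType: .text)
    attributes.title = title
    attributes.contentDescription = body

    let item = CSSearchableItem(
        uniqueIdentifier: id,
        domainIdentifier: "com.example.notes",
        attributeSet: attributes
    )

    CSSearchableIndex.default().indexSearchableItems([item]) { error in
        if let error {
            print("Indexing failed: \(error)")
        }
    }
}
```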
Language & Text
Another capability that will be baked into the OS is Writing Tools. Rewriting text with a different tone, proofreading, and summarization will be possible in any app that uses Apple’s standard text components. This systemwide functionality provides a consistent experience across apps without each app needing to reinvent the same text processing tools. There’s obviously still a place for dedicated AI components within apps, as they can offer more domain-specific processing (e.g., turning text into specific output, such as diagrams).
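To illustrate why this matters for developers: an app that sticks to the standard text components should get Writing Tools without shipping any AI code of its own. The SwiftUI sketch below is an assumption-level example; the view is hypothetical, and the point is simply that a plain system text view is enough to surface the systemwide tools on supported OS versions.

```swift
import SwiftUI

// A hypothetical note editor. Because TextEditor is a standard system text
// component, the systemwide Writing Tools (rewrite, proofread, summarize)
// should appear automatically when the user selects text on supported devices.
struct NoteEditor: View {
    @State private var text = ""

    var body: some View {
        TextEditor(text: $text)
            .padding()
    }
}
```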
Next to the writing tools, we’ll get notification improvements. Notification lists typically grow quickly and require active management, so users can easily lose track of what’s really important. Apple will leverage summarization and prioritization features to let important notifications bubble up. We’ll also get a new Focus mode called “Reduce Interruptions,” where your device actively decides which notifications can interrupt your focus.
Besides systemwide functionality that developers can tap into, Apple also showcases how AI can be used inside apps.
An example of AI inside an app is Mail, where we’ll see Smart Replies that you can quickly fill in with a form-like interface. We’ll also get email categories, priorities, and summaries. The summaries will be shown instead of the old email snippet (first n characters) to provide more information at a glance.
Another example of AI inside apps can be found in live transcriptions and summaries, which are new features for both the Notes and Phone apps.
Images
Image generation using GenAI models has been quite a hot topic in the last two years. It’s also a dangerous topic when you consider deep fakes and fake news. However, Apple decided to jump into this AI subfield, while still imposing some guardrails to avoid abuse.
There’s a new app called “Image Playground” where users can create images by adding contacts, themes, and styling information. In essence, this app streamlines prompting for end users. Styles are limited to animation, sketch, or illustration, so photorealistic images are impossible. That might be a good call, as you can generate images based on your personal contacts’ pictures.
Messages gets an extension to emojis called “GenMoji”. This feature lets you generate custom emojis from a prompt, in an interface similar to Image Playground. You can use the resulting unique emoji as a sticker, as a tapback, or even inline. On non-Apple systems, we can expect a descriptive text instead; this is especially relevant since iOS 18 also brings RCS, which opens up communication with Android devices.
Finally, Notes gets an image wand to turn sketches into generated images. Photos gets image cleanup (removing people from your photos), better search (even inside videos!), and a mechanism to quickly make memory movies with audio from Apple Music, all based on a prompt.
Architecture — A focus on privacy
Even though Apple offers features rather than a generic AI service, privacy remains an extremely important aspect for end users.
Apple has been focusing heavily on privacy for many years now, distancing itself from data- and ad-driven competitors like Google. Their main strategy has been on-device processing, which might not be sufficient for some large ML models. So, with the introduction of Apple Intelligence, they extend their security perimeter into the cloud with Private Cloud Compute.
Apple’s Private Cloud Compute consists of servers in Apple’s private cloud that:
- cannot store information,
- can be verified by your device when communicating with them, and
- run components that can be verified by independent third parties.
So, whenever Siri needs a more powerful AI model, it will
- select the data required to perform the task at hand,
- contact a server in Apple’s private cloud that spins up for you,
- verify the integrity of that service,
- securely transfer the data,
- receive a response from the server, after which the server deletes all info and shuts down.
This process is designed to make sure none of your private data is ever stored or used for other purposes.
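To make that flow concrete, here is a purely hypothetical sketch in Swift. Every type and function in it is invented for illustration; Apple’s actual attestation and transport mechanisms are far more involved.

```swift
import Foundation

// A purely hypothetical sketch of the Private Cloud Compute flow described above.
// None of these types exist in Apple's SDKs; they only illustrate the steps.
struct PrivateCloudNode {
    let attestation: Data   // cryptographic proof of the software the node runs

    // The device checks the node against published, auditable measurements
    // before sending anything (a stand-in check here).
    func isTrusted() -> Bool {
        !attestation.isEmpty
    }
}

// Stand-in for the model inference that would happen on the server.
func process(_ payload: Data) -> Data { payload }

func performOnPrivateCloud(taskPayload: Data, node: PrivateCloudNode) -> Data? {
    // 1. Only the data required for this specific task is selected (taskPayload).
    // 2. Verify the integrity of the node before transferring anything.
    guard node.isTrusted() else { return nil }

    // 3. Securely transfer the payload and receive a response
    //    (stand-in for the encrypted round trip).
    let response = process(taskPayload)

    // 4. After responding, the node deletes all request data and shuts down;
    //    that part is enforced server-side, so there is nothing to do here.
    return response
}
```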
Sometimes, world knowledge or very specific domain knowledge that goes beyond Apple’s own models might be needed. This brings us to the final architectural component: external models. The first integration to be offered is ChatGPT, powered by GPT-4o, OpenAI’s most recently released model. It can be used to extend Siri’s power at no cost and is opt-in.
Apple’s agreement with OpenAI (for the free tier) stipulates that OpenAI cannot store or use any of the personal information sent to the service from Apple Intelligence. You also have the option to connect your ChatGPT Plus account, which enables more advanced features, such as image generation. Data processing terms might be different in that case.
The result is that whenever you ask Siri to perform a task, it decides whether to use an on-device, Private Cloud Compute, or external model. In the last case, you will be asked for permission to share data with the external model every single time. Apple also warns about the reliability of the output of these external models.
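As a mental model, that decision can be pictured as a simple routing function. The sketch below is an assumption on my part, not Apple’s implementation; the inputs and heuristics are invented.

```swift
// A hypothetical sketch of the routing decision; the enum and the heuristics
// are invented for illustration and are not Apple's actual logic.
enum ModelTarget {
    case onDevice
    case privateCloudCompute
    case external(requiresUserConsent: Bool)
}

func routeRequest(needsWorldKnowledge: Bool, fitsOnDevice: Bool) -> ModelTarget {
    if needsWorldKnowledge {
        // External models (e.g., ChatGPT) are only consulted after asking the user.
        return .external(requiresUserConsent: true)
    }
    if fitsOnDevice {
        // Simple, personal tasks stay on device.
        return .onDevice
    }
    // Heavier tasks overflow to Apple's Private Cloud Compute.
    return .privateCloudCompute
}
```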
The architecture Apple presented allows them to add more dedicated models in the future, like Google’s Gemini, or maybe even very specific models, such as legal or medical AI systems, whose knowledge goes beyond general world knowledge. Overall, it seems like a solid approach, as it allows Apple to focus on certain use cases and outsource the riskier ones without limiting end-user functionality.
Availability
If all of this got you excited, there might be a few caveats that can spoil the fun: hardware limitations, language, and living in the European Union.
Devices that will support Apple Intelligence are Macs and iPads with Apple Silicon, plus the iPhone 15 Pro (Max). That’s right: only one iPhone line will be supported. The 8 GB of RAM required seems to be the main reason. Apple has chosen not to offload all tasks to the cloud for less capable devices. This might be for various reasons: a push towards newer, more powerful devices, or maybe avoiding a degraded user experience?
Languages are also limited in this first iteration: all of the text and writing tools will only be available in US English at first.
Finally, Apple Intelligence features will not be available in the EU at first. Apple seems to be figuring out the Digital Markets Act, and so does the EU.
Some Personal Thoughts
My personal thoughts on the introduction of Apple Intelligence are somewhat mixed. I would love to try out the text features, both the in-app and the systemwide ones.
The Siri revamp seems like a big step: finally, away with the ugly orb! The use of in-app actions shows again how Apple lays the groundwork for things way in advance (it might still be a happy coincidence). I think it’s the right approach, as it’s very difficult to make language models actually do something. Leveraging an existing contract with apps simplifies technical challenges a lot and is potentially a solid starting ground for Siri to execute more complex tasks.
The image generation features… I’m not yet fond of these. They seem like a big stylistic mismatch in Apple’s presentations. Even more, the uncanny AI images feel like a departure from what Apple stood for regarding creativity (Think Different?). I’m happy this remains somewhat limited to iMessage, where we already have generic memojis, which can now be complemented by generated images and GenMoji.
Apple still plays it safe by limiting the number of styles, which I think is a good call. You don’t want millions of users generating fake pictures of their contacts. Still, allowing you to mix other people’s pictures into an image feels a bit un-Apple.
As always, these seemingly awkward features might lead up to something we cannot envision yet (see also the path to the Vision Pro). So, all in all, I’d still like to try these tools myself.
Ultimately, Apple bets heavily on personalization and features, which I believe is the right approach. Instead of just releasing an open-ended tool, we get actual features that aim to improve productivity. It also makes sense for Apple to start exploring AI territory: if generative AI really takes off without them, it could make their offering less relevant (are apps still needed? are iPhones still needed?).
Besides the image features, I think they’re heading in the right direction with this approach. I’m hoping Apple can offer these features in the EU as well, and that this does not end in a tedious power struggle. For now, at least my Siri will look better, and I have some other nice OS updates to look forward to!
One more thing to wrap it up: solid naming choice, that “Apple Intelligence.”