Understanding Applebot: Apple’s Web Crawler
How to stop Apple Intelligence scraping your data
Applebot is the web crawler Apple developed to power search and indexing features across its ecosystem, including Spotlight, Siri, and Safari. If your robots.txt allows Applebot to access your website, your content can appear in search results for Apple users worldwide, increasing your site’s visibility and reach.
What Does Applebot Access?
Applebot can crawl a wide range of resources from web servers, including:
- robots.txt
- sitemaps
- RSS feeds
- HTML documents
- Sub-resources needed to render pages, such as JavaScript, Ajax requests, and images.
Identifying Applebot
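Applebot can be identified through a reverse DNS lookup: its hostnames are in the *.applebot.apple.com domain. Its IP addresses can also be matched against the CIDR prefixes listed in a JSON file that Apple publishes. Here’s an example of using the host command to verify Applebot’s identity: a reverse lookup of the IP address, followed by a forward lookup of the returned hostname, which should resolve back to the same address: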
$ host 17.58.101.179
179.101.58.17.in-addr.arpa domain name pointer 17-58-101-179.applebot.apple.com
$ host 17-58-101-179.applebot.apple.com
17-58-101-179.applebot.apple.com has address 17.58.101.179
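The same checks can be automated in a log-processing or firewall script. The Python sketch below performs the forward-confirmed reverse DNS lookup with the standard socket module and, separately, matches an address against CIDR prefixes with the ipaddress module. All function names are hypothetical, and the JSON layout it parses (a "prefixes" list of "ipv4Prefix"/"ipv6Prefix" entries) is an assumption; check it against the file Apple actually publishes.

```python
import ipaddress
import json
import socket

APPLEBOT_SUFFIX = ".applebot.apple.com"


def is_applebot_hostname(hostname: str) -> bool:
    """True if a reverse-DNS name lies under applebot.apple.com."""
    return hostname.rstrip(".").lower().endswith(APPLEBOT_SUFFIX)


def reverse_lookup(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list[str]:
    return socket.gethostbyname_ex(hostname)[2]  # IPv4 addresses only


def verify_applebot_dns(ip: str, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> the same IP."""
    try:
        hostname = reverse(ip)
        # Short-circuits: no forward query unless the name is under applebot.apple.com
        return is_applebot_hostname(hostname) and ip in forward(hostname)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


def load_prefixes(json_text: str) -> list:
    """ASSUMED layout: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}."""
    networks = []
    for entry in json.loads(json_text).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_prefixes(ip: str, networks: list) -> bool:
    """True if the address is inside any listed prefix (v4/v6 mismatches are False)."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

The resolvers are parameters so the logic can be tested without network access. In production you would call verify_applebot_dns(ip) with the defaults and cache results per IP, since each check costs two DNS queries, and load the prefix list from Apple’s file rather than hard-coding it.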
User Agents
User agents help webmasters identify and manage crawler traffic. Applebot uses different user agents for search and podcast indexing.
The user-agent string contains “Applebot” along with other information:
- Search:
  - For desktop:
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
  - For mobile:
    Mozilla/5.0 (iPhone; CPU iPhone OS 17_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4.1 Mobile/15E148 Safari/604.1 (Applebot/0.1; +http://www.apple.com/go/applebot)
- Podcasts:
  - iTMS traffic is identified by:
    User-Agent: iTMS
The iTMS user agent does not follow robots.txt rules, because it only crawls URLs associated with content registered with Apple Podcasts.
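A log-analysis script can use these strings to tag requests. The minimal Python sketch below keys on the “Applebot”, “iPhone” and “iTMS” tokens visible in the strings above; that matching rule and the category labels are assumptions, not an Apple-specified parser. Because a User-Agent header is trivially spoofed, treat the result as a hint and confirm Applebot with the DNS or IP-prefix check described earlier.

```python
def classify_user_agent(user_agent: str) -> str:
    """Rough, spoofable classification of Apple crawler traffic (hypothetical labels)."""
    if "Applebot" in user_agent:
        # The mobile Applebot string carries the iPhone platform token
        return "applebot-mobile" if "iPhone" in user_agent else "applebot-desktop"
    if "iTMS" in user_agent:
        return "apple-podcasts"  # not bound by robots.txt
    return "other"


DESKTOP = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
           "(KHTML, like Gecko) Version/17.4 Safari/605.1.15 "
           "(Applebot/0.1; +http://www.apple.com/go/applebot)")
MOBILE = ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_4_1 like Mac OS X) AppleWebKit/605.1.15 "
          "(KHTML, like Gecko) Version/17.4.1 Mobile/15E148 Safari/604.1 "
          "(Applebot/0.1; +http://www.apple.com/go/applebot)")

for ua in (DESKTOP, MOBILE, "iTMS"):
    print(classify_user_agent(ua))
```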
Customising robots.txt Rules
Applebot adheres to standard robots.txt directives. Here’s an example of a robots.txt file that restricts Applebot’s access to certain directories:
User-agent: Applebot
Allow: /
Disallow: /private/
User-agent: *
Disallow: /not-allowed/
If robots.txt contains no rules for Applebot but does contain rules for Googlebot, Applebot follows the Googlebot rules.
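To check what a given robots.txt means for Applebot, the Python sketch below selects the Applebot group, falls back to Googlebot and then to *, and applies the longest matching Allow/Disallow prefix, with Allow winning a tie. The function names are hypothetical, and it deliberately ignores the * and $ wildcards. It avoids the standard library’s urllib.robotparser because that module applies the first matching rule in file order, so for the example above it would let Applebot into /private/ (Allow: / comes first).

```python
def parse_robots(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []
    seen_rule = False  # has the current group had an Allow/Disallow line yet?
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a user-agent line after rules opens a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append((field, value))
    return groups


def applebot_allowed(robots_txt: str, path: str) -> bool:
    """Longest-match check for Applebot: its own group, else Googlebot, else '*'."""
    groups = parse_robots(robots_txt)
    rules: list[tuple[str, str]] = []
    for token in ("applebot", "googlebot", "*"):
        if token in groups:
            rules = groups[token]
            break
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            # The longest matching prefix wins; Allow wins a tie
            if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
                best_len, allowed = len(prefix), directive == "allow"
    return allowed


ROBOTS = """\
User-agent: Applebot
Allow: /
Disallow: /private/
User-agent: *
Disallow: /not-allowed/
"""
print(applebot_allowed(ROBOTS, "/private/report.html"))
print(applebot_allowed(ROBOTS, "/not-allowed/page.html"))
```

Note the second call: because Applebot has its own group, the * group’s Disallow: /not-allowed/ does not apply to it.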
Rendering and Robot Rules
For Applebot to index your website effectively, make sure all resources needed to render your pages are accessible to it. Blocking resources such as JavaScript and CSS in robots.txt can prevent Applebot from rendering pages properly. Your site should also still work when some resources are unavailable, a practice known as graceful degradation.
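One way to catch blocked sub-resources before Applebot does is to list a page’s scripts, stylesheets and images and compare their paths with the Disallow prefixes that apply to Applebot. The Python sketch below uses the standard html.parser module; the names are hypothetical, and it simplifies robots.txt by treating any path under a Disallow prefix as blocked, ignoring Allow rules and wildcards. Resources on other hosts are skipped, since each host has its own robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class SubresourceCollector(HTMLParser):
    """Collect URLs of scripts, stylesheets and images referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            if "stylesheet" in (attrs.get("rel") or "").lower().split():
                self.urls.append(attrs["href"])


def blocked_subresources(page_url: str, html: str, disallowed: list[str]) -> list[str]:
    """Return same-host sub-resource URLs whose path starts with a Disallow prefix."""
    collector = SubresourceCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    blocked = []
    for ref in collector.urls:
        absolute = urlparse(urljoin(page_url, ref))  # resolve relative URLs
        if absolute.netloc != site:
            continue  # other hosts are governed by their own robots.txt
        if any(absolute.path.startswith(prefix) for prefix in disallowed):
            blocked.append(ref)
    return blocked
```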
Customising Indexing Rules for Applebot
Applebot supports robots meta tags in HTML documents to control indexing. These meta tags should be placed in the <head> section:
<html><head>
<meta name="robots" content="noindex, nosnippet"/>
...
</head>
<body>...</body>
</html>
Available directives include:
- noindex: Do not index this page.
- nosnippet: Do not generate a snippet for this page.
- nofollow: Do not follow any links on this page.
- none: Do not index this page, generate a snippet, or follow its links.
- all: Index this page, generate a snippet, and follow links as usual.
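Because the content attribute is a comma-separated list and two of the values (none and all) are shorthands, an audit script has to expand them before deciding what a page allows. A minimal Python sketch using the standard html.parser module; the class and function names are hypothetical, and it reads only meta tags named robots.

```python
from html.parser import HTMLParser

# Expansion of the shorthand values listed above into individual restrictions
SHORTHANDS = {"none": {"noindex", "nosnippet", "nofollow"}, "all": set()}


class RobotsMetaParser(HTMLParser):
    """Gather restrictions from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lower-cases tag and attribute names
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives |= SHORTHANDS.get(token, {token})


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


print(sorted(robots_directives('<head><meta name="robots" content="noindex, nosnippet"/></head>')))
```

An empty result means no restrictions, which is also what all expands to.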
Controlling Data Usage by Apple’s AI Models
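Apple offers an additional user agent, Applebot-Extended, which gives web publishers control over whether their content is used to train Apple’s AI models. To opt out, add rules for Applebot-Extended to robots.txt. For example, the following rule opts out everything under /private/ (use Disallow: / to opt out the entire site):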
User-agent: Applebot-Extended
Disallow: /private/
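Applebot-Extended does not crawl webpages. Its rules only determine how data crawled by Applebot may be used for AI training, so pages that disallow Applebot-Extended can still appear in search results.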
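Since search crawling and AI-training use are controlled by separate groups, a site can stay in Apple’s search results while opting out of training entirely. The sketch below generates such a robots.txt in Python; the function name and its two parameters are hypothetical, chosen only to keep the two concerns separate.

```python
def build_robots_txt(search_disallow: list[str], training_disallow: list[str]) -> str:
    """Emit one group for Applebot (search) and one for Applebot-Extended (AI training)."""
    lines = ["User-agent: Applebot", "Allow: /"]
    lines += [f"Disallow: {path}" for path in search_disallow]
    lines += ["", "User-agent: Applebot-Extended"]
    training_rules = [f"Disallow: {path}" for path in training_disallow]
    lines += training_rules or ["Allow: /"]  # no training opt-out requested
    return "\n".join(lines) + "\n"


# Stay searchable, but opt the whole site out of AI training
print(build_robots_txt([], ["/"]))
```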
Search Rankings
Apple Search ranks web results based on:
- User engagement
- Relevancy to search terms
- Quality and number of links
- User location signals
- Webpage design
Apple combines these factors to rank results; no single factor has a fixed, predetermined weight.
Conclusion
Understanding Applebot and configuring your site for it correctly can significantly increase your website’s visibility within Apple’s ecosystem. Properly managing robots.txt and meta tags ensures that Applebot indexes your content efficiently and appropriately.