Stop AI scraping your Internet data

Andrew Zolnai
Published in Zolnai.ca
6 min read · Sep 2, 2023
Internet-scraped & hand-created illustrations, 45: Fulton County jail & Sting: “Every breath you take”

Following “When is AI not AI?”, let’s see how AI crept into our own digital backyards! My blog post shows how my non-profit started using “AI for good”. Follow-on post: “When is AI good AI?”

You have perhaps heard of the controversy surrounding big Artificial Intelligence players scraping data off the internet, while playing fast & loose with copyright laws that haven’t kept up with the digital era. While the top illustration pokes innocent fun at current affairs, I post all my tech work under Creative Commons (CC BY-SA 4.0, should you ask), and I post nothing on social media I wouldn’t want to end up somewhere nefarious. Even so, I was lured into scams that deprived me of my private pension… after just shy of 40 years on the internet and just over that on computers (see this story). Does familiarity breed contempt? Here are ways to stop your info being scraped, too! This follows on from The Guardian’s article: “The Guardian blocks ChatGPT owner OpenAI from trawling its content”.
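If you run your own website, a block like The Guardian’s is typically expressed in a robots.txt file. Here is a minimal Python sketch, for that computer-savvy someone, that checks what a given robots.txt would tell a crawler. It assumes the crawler identifies itself as GPTBot (the user agent OpenAI documents for its web crawler); the sample rules, the example.com address and the is_allowed helper are made up for illustration. Bear in mind that robots.txt is only a request: it works for crawlers that choose to honour it, and it does nothing for content you post on somebody else’s platform.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask OpenAI's crawler to stay out,
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post.html"  # placeholder address
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

To check a live site rather than a sample string, the same parser can be pointed at the real file with set_url() followed by read().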

So I put the following questions to Bing AI Search. The process is rather involved, but it can be done… if you have someone computer-savvy near you! The same queries to Google Bard follow further down. Also, don’t you think that the conversational tone gets a little, um, creepy?

Q: How to ask Meta not to use personal content for AI scraping

If you want to ask Meta not to use your personal content for AI scraping, you can fill out the “Generative AI Data Subject Rights” form on Meta’s privacy policy page ¹. Once you’re on the page, click “Learn more and submit requests here” and choose the second option to delete any of the personal information from those third-party data sources used for training ¹.

This will allow you to remove some of your personal information from that training data to prevent it from being ingested by these models, if you qualify ².

Please note that this policy only covers third-party sources and doesn’t extend to any personal data uploaded to Meta’s social media platforms ². In other words, Meta can use the text contained in posts or comments, or selfies and photos submitted by users to Facebook or Instagram, to train its AI models ².

If you have any further questions or concerns, please let me know.

Source: Conversation with Bing, 02/09/2023

(1) https://bing.com/search?q=how+to+ask+meta+not+to+use+personal+content+for+ai+scraping.

(2) How to ask Meta to not train its AI on your personal info. https://www.theregister.com/2023/08/31/meta_personal_data/.

(3) How to stop Meta from using personal data to train generative AI — CNBC. https://www.cnbc.com/2023/08/30/how-to-stop-meta-from-using-personal-data-to-train-generative-ai-.html.

(4) Instagram: Users can stop Meta from using their personal data to train …. https://timesofindia.indiatimes.com/gadgets-news/users-can-stop-meta-from-using-their-personal-data-to-train-generative-ai-models/articleshow/103228808.cms.

(5) https://www.news18.com/tech/stop-meta-from-using-your-personal-data-to-train-its-ai-models-heres-how-8559254.html.

(6) https://www.indiatoday.in/technology/news/story/facebook-users-can-now-stop-meta-from-using-their-personal-data-in-ai-training-2429573-2023-09-01.

Q: How to ask Alphabet not to use personal content for AI scraping

I’m sorry, but I’m not sure what you mean by “Alphabet”. Could you please clarify? If you’re referring to Google’s parent company, Alphabet Inc., then you can visit their privacy policy page ¹ to learn more about how they handle personal data.

If you want to ask Google not to use your personal content for AI scraping, you can visit the “My Activity” page on your Google account ². From there, you can delete any activity that you don’t want Google to use for training its AI models.

Please note that this policy only covers personal data that is uploaded to Google’s services and doesn’t extend to any third-party sources ². In other words, Google can use the text contained in posts or comments, or selfies and photos submitted by users to its services, to train its AI models ².

If you have any further questions or concerns, please let me know.

Source: Conversation with Bing, 02/09/2023

(1) How to ask OpenAI for your personal data to be deleted or not used to …. https://techcrunch.com/2023/05/02/chatgpt-delete-data/.

(2) How to Block AI Chatbots From Scraping Your Website’s Content — MUO. https://www.makeuseof.com/block-ai-chatbot-scraping-website/.

(3) ChatGPT Should Worry Google and Alphabet. Why Search When You Can Ask …. https://www.bloomberg.com/opinion/articles/2022-12-07/chatgpt-should-worry-google-and-alphabet-why-search-when-you-can-ask-ai.

Then I put the same questions to Google Bard, and the same applies:

Q: How to ask Meta not to use personal content for AI scraping

To ask Meta not to use your personal content for AI scraping, you can follow these steps:

  1. Go to Meta’s Privacy Center: https://www.facebook.com/privacy/explanation.
  2. Click on the “Learn more and submit requests here” link under the “Generative AI Data Subject Rights” section.
  3. Select the option that best describes your request. The options are:
    - Access, download, or correct your personal information gleaned from third-party sources that’s used to train generative AI models
    - Delete your personal information from third-party data sources used for training.
    - Have a different issue.
  4. Complete the security check test.
  5. In the “Reason for Request” field, enter the following: “I do not want my personal content to be used for AI scraping.”
  6. Click on the “Submit Request” button.

Meta will review your request and will get back to you within 30 days.

Please note that this process only applies to personal content that Meta has scraped from third-party sources. It does not apply to personal content that you have uploaded to Meta’s social media platforms, such as Facebook or Instagram.

If you have any questions or concerns, you can contact Meta’s Privacy Help Center.

Q: How to ask Alphabet not to use personal content for AI scraping

Alphabet is the parent company of Google, so the process for asking them not to use your personal content for AI scraping is similar to the process for asking Meta not to do the same. Here are the steps:

  1. Go to Alphabet’s Privacy Policy: https://policies.google.com/privacy.
  2. Click on the “Request a copy of your data” link under the “Your data” section.
  3. Complete the security check test.
  4. In the “Reason for Request” field, enter the following: “I do not want my personal content to be used for AI scraping.”
  5. Click on the “Submit Request” button.

Alphabet will review your request and will get back to you within 30 days.

Please note that this process only applies to personal content that Alphabet has scraped from third-party sources. It does not apply to personal content that you have uploaded to Google’s services, such as Gmail or Google Drive.

If you have any questions or concerns, you can contact Alphabet’s Privacy Help Center.

Here is an example of a letter you can send to Alphabet to request that they not use your personal content for AI scraping:

Dear Alphabet,

I am writing to request that you do not use my personal content for AI scraping. I understand that you may collect personal information from third-party sources, such as websites and social media platforms, to train your AI models. However, I do not want my personal information to be used for this purpose.

I am concerned about the privacy implications of AI scraping. I am worried that my personal information could be used to create harmful or misleading content, or to target me with advertising. I also do not want my personal information to be used without my consent.

I would appreciate it if you would respect my request and not use my personal content for AI scraping.

Thank you for your time and consideration.

Sincerely, [Your name]
