Published in The Startup

Have You Been a Victim of Blog Scraping?

A curious discovery led me down a weird rabbit hole of confusion

Blogging is new to me

Until this last year, I primarily wrote my thoughts down the old-school way, in pen on paper. After enrolling in and completing a coding bootcamp, however, I found blogging to be a way to reinforce my learning and to open up and express who I am behind the screen. In the midst of the most difficult but transformative career change and job search, I noticed something very strange while periodically checking the many analytics I track for my personal website.

A screenshot of the Google Search Results from https://gabrielhicks.dev

I was honored and confused to see that some rogue website had linked to my personal website more times than I had shamelessly self-promoted it through my Medium blogs!

That was until I followed the links and saw that it was indeed my own shameless self-promotion after all. In this case, however, my shameless self-promotion had been translated into 13 different languages! Amazing. I had been wondering why I was getting seemingly sporadic engagement from various other places that linked directly to my portfolio, and this seemed like a strong contender for the explanation.

Who are they and what do they want?

Ichi seems to be some sort of bot or script that specifically scrapes content from Medium publications, runs the articles through translators for 13 languages, and strips any identifiable attribution from them. For some reason, one of my blogs slipped through the cracks and properly linked to my website, exposing ichi to me. I was flattered until I found every one of my other blogs posted there, including photos of me and reproductions of my journey into tech.

I sought help from my network and used my Googling skills to discover that this was not an unknown problem. Last summer, a few other authors reported directly to Medium staff that their content was being stolen. In August, many people thought it was the end when ichi appeared to go offline. It is clear that ichi is in the business of blatantly plagiarizing and reprinting blog posts, but is there anything we can do about it?

What can we do about it?

As of this writing, the most promising option besides writing a blog post may be to email Medium for guidance. This article outlining Medium’s terms and privacy policy states that we can reach them directly at yourfriends@medium.com.

How can I see if my blog posts have been republished?

The key identifying features I can find of what ichi deems plagiarizable are:

1. The blog is published to a publication on Medium

That’s it. From my investigation, they literally just repost every blog post that has ever been written and added to a publication. If you would like to check whether your posts have been stolen, I also discovered a surefire method for finding out!

Step 1: Copy the headline of your published blog post

Copy the heading of a published blog post

Step 2: Use Google Translate to translate your heading to Japanese

Translate to Japanese (the most common language of the plagiarized translations)

Step 3: Search for your translated heading in Google as a quote

Throw some quotes around that translation

Step 4: ????????

First search result!
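If you have a lot of headlines to check, Step 3 is easy to script. Below is a minimal Python sketch, assuming you have already translated the heading yourself (it does not call any translation API). It wraps the heading in double quotes for an exact-phrase search and builds a standard Google search URL. The function name `quoted_search_url` and the script name `find_copies.py` are hypothetical, chosen only for illustration.

```python
import sys
import webbrowser
from urllib.parse import quote_plus

# Standard Google search address; the query goes after "q="
GOOGLE_SEARCH = "https://www.google.com/search?q="


def quoted_search_url(translated_heading: str) -> str:
    """Build a Google search URL for an exact-phrase (quoted) query.

    Hypothetical helper: pass in the heading you already translated
    (for example, the Japanese text from Google Translate).
    """
    # Wrap the heading in double quotes so Google matches the exact phrase
    phrase = f'"{translated_heading.strip()}"'
    # quote_plus percent-encodes quotes and non-ASCII (e.g. Japanese) text
    # as UTF-8, and turns spaces into "+"
    return GOOGLE_SEARCH + quote_plus(phrase)


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python find_copies.py "<translated heading>"
    url = quoted_search_url(sys.argv[1])
    print(url)
    webbrowser.open(url)  # opens the search in your default browser
```

Percent-encoding the quotes matters here: without the surrounding quotes, Google treats the translated heading as loose keywords and the copied article is much less likely to show up as the first result.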

Step 5: Ichi profits?

A photo of me as a child; this post alone contains several photos of me
Incredible, I just stumbled across this!

It appears they are not only removing external links. In the original version of this blog post, I linked to another one of my Medium articles. They stripped that link and replaced it with a link to their own plagiarized version of that article!!!

Step 6: Write a blog post, or send an email, or both!

Right now, this seems to be what we can do to at least bring some awareness to the issue. At first sight, I thought it might be a cool feature, or an interesting and humbling republication. It was not republished because they found it interesting and wanted to share my content; it was stolen and stripped of external links to me and my webpage (except for one). I will return to update this blog as I get responses to my emails and if I am able to successfully remove any of my content!

Thanks for checking this out! In the event this blog post is stolen, I want to riddle it with as many links to my content and as much shameless self-promotion as I can! You can find me on LinkedIn, Twitter, my portfolio, here, at another blog, or on my own blog site.

Gabriel Hicks

Software Engineer from Iowa, living in NYC. Interested in new technologies and exploring opportunities to grow. https://gabrielhicks.dev