Five Years of Hadoop Weekly

Joe Crobak
Jan 20, 2018 · 5 min read

It’s been five years since the first edition of Hadoop Weekly went out on January 20, 2013 (see the 245+ issue archives at HadoopWeekly.com). Five years ago, the Hadoop ecosystem didn’t consist of much more than HDFS, MapReduce, Pig, Hive, HBase, and Oozie. Streaming systems were just starting to take off, with Twitter open sourcing Storm in late 2011 (other projects that started around the same time, including S4 and HStreaming, have since become defunct). Spark wouldn’t enter the Apache incubator for another six months.

Now, five years later, Hadoop distros include many more projects (Amazon EMR, for one, supports over 18), Apache Kafka is at the center of many applications, and we’re seeing tons of innovation in the container orchestration space. While Hadoop Weekly has always covered the “Hadoop ecosystem,” that definition has shifted significantly over the past five years. (See my brief analysis below.)

As new technologies arrive and Hadoop plays less of a role, I’ve resisted renaming the newsletter for one reason or another. But I finally have a new name and the time to make the change. In early 2018, Hadoop Weekly will become Data Engineering Weekly (dataengweekly.com; website coming soon). Don’t expect drastic changes to the content, but I’ll be phasing in the new name over the next few weeks.

Hadoop Weekly now has well over 10,000 subscribers. Growth has been consistent and organic (I haven’t run marketing or advertising campaigns to drive subscriptions). Thanks to all the readers, especially those who send me content, recommend Hadoop Weekly to their friends and coworkers, and otherwise encourage me to keep the newsletter going.

Trends

Maintaining a high-quality newsletter is a lot of work, and there have been several times when I nearly gave up. As part of my reflection on whether or not to keep up the newsletter, I’ve analyzed the data that I’ve captured over the years. Through weekly snapshots of technical topics, it’s easy to see some high-level trends.


Caveat: Hadoop Weekly very much expresses my editorial bias, and thus the following analysis also reflects that bias. What follows is not a scientific analysis, but hopefully you’ll still find it interesting!


Keyword appearance in Hadoop Weekly. More details of the data capture can be found in Source Data. See also an Interactive Chart.

Some brief analysis of the above trend data:

  • YARN started off strong, but its coverage has been trending downwards since 2014.
  • Spark coverage grew like mad from late 2013 until late 2017, when Apache Kafka became more widely written about.
  • Coverage of Hive, HDFS, and MapReduce has been slowly trailing off over the past five years.
  • Apache Drill had a burst in 2015–2016, but its coverage dropped in 2017.
  • Apache Flink came onto the scene in 2015 (before that, it had a different name) and has maintained steady coverage since then.
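The post doesn’t describe how the keyword counts were produced (those details are in the linked Source Data), so the following is only a hypothetical sketch of one way to tally keyword appearances per year. The function name `count_mentions_by_year`, the keyword list, and the toy issue texts are all assumptions made for illustration, not the actual pipeline or data.

```python
import re
from collections import Counter, defaultdict

# Keywords taken from the trend notes above; the real list may differ.
KEYWORDS = ["YARN", "Spark", "Kafka", "Hive", "HDFS", "MapReduce", "Drill", "Flink"]


def count_mentions_by_year(issues, keywords=KEYWORDS):
    """Count, per year, how many issues mention each keyword at least once.

    `issues` is an iterable of (year, issue_text) pairs.
    Matching is case-insensitive and on whole words, so "archive"
    does not count as a mention of "Hive".
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    counts = defaultdict(Counter)
    for year, text in issues:
        for kw, pattern in patterns.items():
            if pattern.search(text):
                counts[year][kw] += 1
    return counts


# Toy input, invented purely to show the call shape.
toy_issues = [
    (2013, "YARN and Hive updates"),
    (2013, "More YARN news"),
    (2014, "See the archive for older issues"),
    (2017, "Kafka and Spark releases"),
]

for year, counter in sorted(count_mentions_by_year(toy_issues).items()):
    print(year, dict(counter))
```

Counting issues rather than raw occurrences keeps one keyword-heavy issue from dominating a year, which seems closer to what a “keyword appearance” chart would show, though that is a guess about the original analysis.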

Firsts

Here’s a look at the first appearances of various technologies (and more!) in Hadoop Weekly over the past five years.

Costs

I’m often asked how much time Hadoop Weekly takes as well as other details about operating it (and what they cost). So if you’re curious, here’s a peek into those details.

I use Mailchimp for sending emails and AWS to host hadoopweekly.com. During the first ~13 months of Hadoop Weekly, I qualified for Mailchimp’s free tier. Since then, prices have steadily increased from $27/month to $72/month. In just under four years, the total cost of sending email has been around $2,750. Hadoopweekly.com is built with Jekyll and hosted on Amazon S3. In early 2016, I also added CloudFront to the mix to enable HTTPS. It’s a bit difficult to estimate exactly, but it costs around $1/month to host Hadoop Weekly (the majority of that cost is the $0.50 for a Route 53 hosted zone). So over five years, the hosting cost is well under $50.

Other costs include:

  • PO Box for compliance with CAN-SPAM (has gone from $76/year to $90/year): $412 (76+80+80+86+90)
  • DNS: $55 ($11/year)

So in total, over five years, my out-of-pocket cost has been around $3,300. Of course, the real cost is my time: I spend 4+ hours per week curating content.
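As a rough check, the figures quoted in this section can be tallied in a few lines of Python. This is a sketch under stated assumptions: the email figure is the post’s rounded estimate, hosting is taken at its $50 upper bound, and the dictionary labels are made up for illustration.

```python
# Figures come from the post; email and hosting are rounded estimates.
PO_BOX_BY_YEAR = [76, 80, 80, 86, 90]  # yearly PO box fees in dollars
DNS_PER_YEAR = 11
YEARS = 5

costs = {
    "email (Mailchimp)": 2750,       # "around $2,750"
    "hosting (AWS)": 50,             # assumed upper bound: "well under $50"
    "PO box": sum(PO_BOX_BY_YEAR),   # 412
    "DNS": DNS_PER_YEAR * YEARS,     # 55
}

total = sum(costs.values())

for item, amount in costs.items():
    print(f"{item:<20} ${amount:>5,}")
print(f"{'total':<20} ${total:>5,}")
```

With hosting at its upper bound the sum is $3,267, which is consistent with the “around $3,300” figure.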

Sponsorship

As you can see, the cost of operating a weekly newsletter is non-trivial. Starting in 2018, I’m going to be looking into including sponsored content (something that several other weekly newsletters do) to help offset some of my costs. If you’re interested in advertising a job, a webinar, or another type of sponsored post, please get in touch by mailing info@dataengweekly.com.

A look ahead

2018 should be an exciting year for data engineering. I’m excited to cover that news, and I hope you stay on for the ride (and help spread the word). As always, I can be reached on Twitter and via email (info@dataengweekly.com) if you have any news or posts to share. Until the transition is complete, signup still lives at hadoopweekly.com.

Written by Joe Crobak

Distributed and complex systems, healthcare and gov tech. Prev @USDS @Foursquare & some defunct startups. I run dataengweekly.com
