Big Data and Data Science Recap 2016
As the year draws to a close, I’d like to take the opportunity to reflect on what I’ve learnt in 2016. Towards the start of this year, I started a blog as a tool to simply store information on topics I’d learnt so that I could revisit them when needed.
I then had a revelation that this information would be useful not only to me but also to others in my field, so I began sharing it!
I integrated Google Analytics, which has allowed me to track how the blog has grown from a couple of hundred sessions a month to over two thousand, a number that is still slowly rising!
The increase is down to the greater volume of blog posts, after I set myself a challenge to write a post a week from around June/July.
So what have I learnt and what could you learn from my posts? And what will I be doing with the blog next year?
End of Year Recap
10 Steps to Big Data:
- Learn how to ingest data from your Data Warehouse into Hadoop with Apache Sqoop and Apache Flume!
- Transform your data into Hive structures ready for analytics using Pentaho Data Integration for ETL
- Find out what I learnt from the Strata and Hadoop World Conference in London
- Understand real-time streaming concepts through Apache Kafka and how to scale Spark Streaming applications
- Use Apache Kudu for RDBMS-like data storage alongside your Hadoop cluster
- My most popular post of the year! Learn how to improve the performance of your Spark applications
- Build a search engine with Elasticsearch
- Begin performing analytics on your data in real time with Apache Spark
- Follow the Big Data Journey to learn the whole Big Data pipeline from ingestion to analytics — Part 1, Part 2, Part 3 and Part 4
- Understand the differences between RDBMS and NoSQL data stores and gain knowledge of how to use Apache HBase
The road to Data Science:
- Take the Data Science 101 class
- See how we ran our first Data Science Hackathon at Capgemini
- Step into Machine Learning with this simple intro on Naive Bayes
- Step 2 of classification — Support Vector Machines
- Finishing off classification with Decision Trees
- Moving into Clustering for recommendations with K-Means clustering
- Sarcasm Detection using Machine Learning in Spark
- How machines can better understand textual data with Natural Language Processing
- What I learnt from being part of an evolving agile team at a large company
- I went to SAS Forum in the UK for their new product launch
- Some Android programming to build a fitness app: this was a neat algorithm to calculate how far you’ve run without using GPS
I will continue to issue my newsletter in the new year, so please sign up by clicking here if you haven’t already.
I will also look to introduce new types of content to the blog. Don’t worry, the technical big data and data science posts will still exist; however, I will look to bring in posts on other topics, such as process and behavior improvement and the tools and techniques I use to improve how I live and work.
Thanks for a great year and I hope you all enjoy your holidays.