Top startup jobs: Data scientist
Quick guide for data scientists, analysts and big data managers
If you have ever searched for a job at an internet or software startup, chances are you have seen openings for positions such as data analyst, data scientist or data engineer.
Such jobs tend to be crucial for modern companies, because every digital product gathers a lot of data which, when analysed, can tell us a great deal about the product and its users.
According to research done by RJMetrics, ‘There are 11,400 self-identified data scientists in the world today’ and ‘52% of all data scientists have earned that title within the past 4 years.’
The whole report, “We Had 4 Questions About Data Scientists: Millions of data points later, we have the answers”, is on medium.com.
Back in 2012, HBR called data scientist the sexiest job of the 21st century, and Forbes pretty much confirmed it in 2016.
The demand for data scientists is pretty big: LinkedIn tells me that as of today there are 8 453 data analyst jobs and 12 148 data scientist jobs. That’s not bad. For comparison, there are 19 831 marketing manager jobs but also 174 687 jobs for software engineers. Nevertheless, if you are a data scientist, there is a job for you out there, and the pool is getting bigger every year.
What are data scientists working on?
I gathered some descriptions from LinkedIn profiles:
Working on Revenue prediction models, and query item proximity measures to re-order search results
Mentoring team members working on the search components like spell-correction, query typeahead suggestions, related search queries, etc.
Worked on Price Optimization, and product/user recommendation algorithms.
Data analysts tend to work a lot with machine learning stuff:
Apply Machine Learning for improving search relevance.
Working on CTR prediction using regression techniques. The regression engine is used to learn relative importance (non-negative coefficients) of item fields by minimizing the weighted-mean-square-error in CTR prediction weighted by query frequency.
Provided a contextual Ad-matching solution using the contextual information derived from publisher and advertiser information (without crawling the pages); a patent has been applied for on this process. The context plus the network information are used to define user segments, which provide a deeper understanding of user click behavior between sections of sites for different types of Advertisements. The context also provides an effective way of generalizing and bootstrapping for new publishers and advertisers.
Provided a collaborative filtering based solution to promote mobile applications to the users based on the user mobile application usage profile.
Provided an innovative mobile device fingerprinting solution with a probability of device uniqueness exceeding 0.99.
Designed algorithms for Ad budget management towards yield optimization.
Analyzed data and implemented backend data processing systems using Hadoop Map-Reduce framework.
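To make the CTR regression description above a bit more concrete, here is a minimal sketch of the idea: a least-squares fit where each observation is weighted by query frequency and the coefficients are kept non-negative. This is only my illustration, not the method from that profile. The field names, the numbers and the `fit_nonneg_weighted` helper are all made up, and the optimiser (projected gradient descent with a step size tuned for this toy data) is a simple stand-in for whatever a real system would use.

```python
# Sketch: weighted least squares with non-negative coefficients,
# fitted by projected gradient descent. All data below is made up.

def fit_nonneg_weighted(X, y, w, lr=0.5, steps=500):
    """Minimise sum_i w_i * (x_i . beta - y_i)^2 / sum_i w_i subject to beta >= 0."""
    n_features = len(X[0])
    total_w = float(sum(w))
    beta = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for xi, yi, wi in zip(X, y, w):
            # Prediction error for this observation
            err = sum(b * x for b, x in zip(beta, xi)) - yi
            for j, x in enumerate(xi):
                grad[j] += 2.0 * wi * err * x / total_w
        # Gradient step, then clip at zero to keep coefficients non-negative
        beta = [max(0.0, b - lr * g) for b, g in zip(beta, grad)]
    return beta


# Hypothetical item fields: [title match score, description match score]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
# Observed CTR per query (constructed as 0.2 * title + 0.05 * description)
y = [0.20, 0.05, 0.25, 0.11]
# Query frequency, used as the weight of each observation
w = [100, 50, 10, 5]

beta = fit_nonneg_weighted(X, y, w)
print([round(b, 3) for b in beta])
```

The learned coefficients can be read as the relative importance of each field: frequent queries pull the fit harder than rare ones, and a field can never receive a negative importance.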
There is no getting away from product insights and development:
Product analysis (& A/B testing) for a website to find churn possibility, reasons, most/least used features, KPIs performance & changing trends which affect growth directly or indirectly
Campaign management and Campaign performance analysis (millions of unique customers)
Customer segmentation based upon their behaviour, recency, frequency, engagement & other attributes
Generating reports/dashboards to facilitate decisions of management.
Building forecasting solution based on 70 billion events per day data stream.
Improving decision making capabilities of the management team, PMO, HR and project teams through delivering high quality metrics and reports.
Programming (mostly in R), data cleaning, data lake development, meetings with stakeholders, developing frequent code and design reviews within our team, as well as gradual improvement of our own process.
Predictive modeling to optimise marketing campaigns on search engines. Built machine learning model with historical SEM data to improve marketing efficiency.
Designed and implemented recommendation algorithms which leverage customers’ feedback and deal-specific features to improve the recommendation product.
Built dashboard to monitor the performance of recommendation algorithms.
Developed analyses around social data which facilitate better advertisement performance (ROI, CPL, CPA).
Developed algorithms which automatically manage campaign budgets according to performance.
Automated data retrieval and processing to create actionable and insightful plots for client-facing dashboards.
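A/B testing shows up in several of these descriptions, so here is a small sketch of what the analysis behind a simple test could look like: a two-proportion z-test comparing the conversion rates of two variants. The visitor and conversion counts are invented, and `two_proportion_z_test` is just a name I picked for the example. Real experiments need more care (sample size planning, multiple metrics and so on).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rates of exactly 0% or 100% cannot be tested")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: variant A converts 200 of 1000 visitors, variant B 260 of 1000
z, p_value = two_proportion_z_test(200, 1000, 260, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be pure chance; whether it matters for the business is a separate question.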
Samuel Noriega also wrote a nice article about what data scientists do: https://medium.com/@shugert/what-do-data-scientists-do-exactly-d4bc0d7fbb78#.u44zdjkkm
Another great way to learn what is being accomplished with big data is to read Airbnb’s tech blog on Medium: https://medium.com/airbnb-engineering
What skills do you need?
A lot of analytical skills, of course, but which are the most important?
The RJMetrics report says that ‘The top five skills of a data scientist are data analysis, R, Python, data mining, and machine learning.’
If you dig deeper into the topic, you will find that other important skills are (data also from LinkedIn profiles):
Technical Skills: Machine Learning, Deep Learning, Python, SQL.
Libraries: TensorFlow, NumPy, Pandas, scikit-learn.
Experience in Data Science, Machine Learning, Deep Learning, Convolutional Neural Networks, Natural Language Processing, Pattern Recognition, Big Data ML, Big Data Analytics, GPU Computing, etc.
Ample experience with Big Data — Spark, Shark, MLLib, Mahout, Weka, Hadoop, Crunch, MapReduce, Amazon EMR, AWS-EMR, HDFS, Caffe, Theano, TensorFlow, CNTK, etc.
Proficient in Python, Pyspark, Scala, R, Matlab and Octave
Excellent Quantitative and Analytical Skills
Aptitude for learning new things and working with new technologies
Specialties:
Data mining modelling (focused on e-commerce)
Data analytics in online advertising context (Google Analytics, Google AdWords)
A/B testing (focused on e-commerce)
Big data (Hadoop, Hive, Spark, SQL Server, Google BigQuery)
Data visualization (d3.js)
Business analytics (customized KPI reporting)
Large-scale data mining of terabytes of data.
Providing statistical analysis of audience.
Programming in SQL and Hadoop.
Preparation of various ad-hoc analyses to support business.
A/B tests.
Creating reports in Excel files.
Expertise in statistical modeling and visualization, data analysis, writing data pipelines, and experimental design and survey methodology.
Tools: R, SAS, Matlab; Hadoop, Hive, Presto, SQL; Dagger, Dataswarm.
What do the job offers look like?
Good examples are the currently open positions at Stripe or Airbnb.
How to become a data scientist?
You are probably wondering: OK, but what should I do to become a data ninja?
The easiest answer is: start by doing some small projects.
I would start with recommendations from Quora:
There is a ton of data available out there — just search for something like ‘open data’ https://en.wikipedia.org/wiki/Open_data
The biggest resource is probably here:
and here: https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
Use any source of big data, for example Google Keyword Planner: https://adwords.google.pl/KeywordPlanner or open data from companies: https://baremetrics.com/open
For more technical knowledge explore sites like: http://rpubs.com/ and https://github.com/trending/r
What’s the first step?
Start a blog, try to optimise its SEO ranking and analyse the data from Google Analytics. Create some small one-hour projects but also a longer one; this Quora thread may be a good inspiration:
Check out some freely available APIs and do cool things with them.
Get some online education:
In 2016 there is no excuse: if you want to learn something, there should be a way to do it online, and data science is no exception. Coursera and Udacity have a lot of courses, and I can tell you from my LinkedIn research that people really use them to gain the knowledge and land an actual job.
Here are some links:
- https://www.coursera.org/learn/exploratory-data-analysis
- https://www.coursera.org/specializations/jhu-data-science
- https://www.coursera.org/learn/data-products
- https://www.coursera.org/learn/machine-learning
- https://www.coursera.org/learn/practical-machine-learning
- https://www.coursera.org/learn/data-cleaning
- https://www.coursera.org/learn/r-programming
- https://www.coursera.org/learn/data-scientists-tools
- https://www.coursera.org/learn/reproducible-research
- https://www.coursera.org/learn/statistical-inference
- http://blog.udacity.com/2014/11/data-science-job-skills.html
Follow some great people:
I can recommend:
But most importantly, keep an eye on the job market with websites like https://www.kaggle.com/jobs, LinkedIn or pretty much any other modern job board.
Thank you for reading!