Twitter Data Science Interview

Vimarsh Karbhari
Published in Acing AI, Jul 10, 2018

On Twitter's most recent earnings, Bloomberg reported: "Twitter Inc. soared the most since its market debut in 2013 after it posted the first revenue growth in four quarters, driven by improvements to its app and added video content that are persuading advertisers to boost spending on the social network."

Twitter has one of the biggest data sets in the world. Unlike Facebook, Twitter is real time. Its data sets are rich troves of information that can yield great insights, so analyzing a Twitter data set and presenting valuable findings makes a good portfolio project. Twitter data is available through Twitter's developer APIs.

Interview Process

The interview process usually includes a phone interview with the hiring manager. The onsite interviews consist of meetings with engineers and data scientists. The questions are usually algorithmic in nature, including some machine learning questions, math/application-based questions, and one system design question about building a distributed system that delivers machine learning at high scale.

Important Reading

  1. Tips for using the Twitter APIs: Twitter Data Developers Blog
  2. All Twitter Dev Libraries (Including Python): Twitter Developer Utilities
  3. Twitter Data Case Studies: Use Cases to inform business decisions

AI/Data Science Related Questions

  • Given a 2-column file with user codes and counts, retrieve the top-k users based on a score that is a function of the number of times they appear in the file and their counts.
  • Given a list of all followers in the format 123, 345;234, 678;345, 123;…, where the first column contains the ID of the follower and the second contains the ID of the user being followed, find all mutual follows (the pair 123, 345 in the example above). Then solve the same problem when the list does not fit into memory.
  • Design a system to find the top 10 Twitter hashtags in the most recent 1 minute, 10 minutes, 1 hour…
  • Given Twitter user data, how would you measure engagement?
  • How can you illustrate a tree-based system with a SQL query?
  • How would you combine two datasets?
  • What features would you use to build a recommendation algorithm for users?
  • What would you change in the Twitter app?
  • How would you test whether the proposed change is effective? (follow-up to the previous question)
  • Find the median of a large dataset.
  • If you got the job at Twitter and got access to all of its data what kind of data analysis would you like to perform?
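For the top-k users question, one common approach is to aggregate per-user statistics in a single pass and then select the k best with a bounded heap. The sketch below assumes each line looks like `user_code,count`, and the scoring function is a hypothetical placeholder, since the question leaves the exact score open.

```python
import heapq
from collections import defaultdict

def top_k_users(lines, k, score_fn):
    """Return the k users with the highest score_fn(appearances, total_count).

    Assumes each line has the (hypothetical) form "user_code,count".
    """
    appearances = defaultdict(int)  # how many lines mention each user
    totals = defaultdict(int)       # sum of that user's counts
    for line in lines:
        user, count = line.strip().split(",")
        user = user.strip()
        appearances[user] += 1
        totals[user] += int(count)
    # nlargest keeps a heap of size k: O(n log k) rather than a full sort.
    return heapq.nlargest(
        k, appearances, key=lambda u: score_fn(appearances[u], totals[u])
    )

# Hypothetical score: number of appearances times the total count.
rows = ["a,3", "b,10", "a,4", "c,1", "c,2", "c,5"]
print(top_k_users(rows, 2, lambda n, total: n * total))  # ['c', 'a']
```

If the file is too large for one machine, the same aggregation can be done per shard, with the per-user totals merged before the final heap selection.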
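For the mutual-follows question, a pair is mutual when both directions appear. A minimal sketch: in memory, a set lookup suffices. For the out-of-memory case, one standard technique (not necessarily what interviewers expect) is hash partitioning on the canonical pair `(min, max)`, so both directions of any pair land in the same partition file and each partition can be processed independently. The partition count and file layout here are illustrative assumptions.

```python
import os
import tempfile

def parse_edges(text):
    """Parse 'follower, followee;follower, followee;...' into int pairs."""
    edges = []
    for record in text.split(";"):
        record = record.strip()
        if record:
            follower, followee = record.split(",")
            edges.append((int(follower), int(followee)))
    return edges

def mutual_follows(edges):
    """In-memory: keep pairs whose reverse edge also exists."""
    edge_set = set(edges)
    return sorted({(min(a, b), max(a, b))
                   for a, b in edge_set if a != b and (b, a) in edge_set})

def mutual_follows_external(edges, num_partitions=4):
    """Out-of-memory sketch: edges may be a stream read from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.txt") for i in range(num_partitions)]
        files = [open(p, "w") for p in paths]
        try:
            for a, b in edges:
                # Both directions share this key, so they share a partition.
                key = (min(a, b), max(a, b))
                files[hash(key) % num_partitions].write(f"{a},{b}\n")
        finally:
            for f in files:
                f.close()
        result = []
        for p in paths:  # each partition is assumed to fit in memory
            with open(p) as f:
                part = [tuple(map(int, line.split(","))) for line in f]
            result.extend(mutual_follows(part))
        return sorted(result)

edges = parse_edges("123, 345;234, 678;345, 123")
print(mutual_follows(edges))           # [(123, 345)]
print(mutual_follows_external(edges))  # [(123, 345)]
```

In a distributed setting the same idea maps onto a shuffle keyed by the canonical pair.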
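For the trending-hashtags design question, a single-process sketch of one common building block is shown below: per-second count buckets kept in a deque, merged on demand for any window up to one hour. It assumes events arrive in non-decreasing timestamp order; the class and method names are hypothetical, and a real system would likely shard this across machines and precompute per-window totals.

```python
from collections import Counter, deque

class HashtagWindow:
    """Top hashtags over recent windows, using one Counter per second."""

    def __init__(self, max_window_seconds=3600):
        self.max_window = max_window_seconds
        self.buckets = deque()  # (second, Counter), oldest first

    def add(self, timestamp, hashtag):
        second = int(timestamp)  # assumes non-decreasing timestamps
        if self.buckets and self.buckets[-1][0] == second:
            self.buckets[-1][1][hashtag] += 1
        else:
            self.buckets.append((second, Counter({hashtag: 1})))
        # Evict buckets older than the largest window we serve.
        while self.buckets and self.buckets[0][0] <= second - self.max_window:
            self.buckets.popleft()

    def top(self, now, window_seconds, n=10):
        """Top n hashtags with timestamps in (now - window_seconds, now]."""
        total = Counter()
        for second, counts in reversed(self.buckets):
            if second <= now - window_seconds:
                break
            total.update(counts)
        return total.most_common(n)

w = HashtagWindow()
for ts, tag in [(0, "#a"), (0, "#b"), (30, "#a"), (70, "#c"), (70, "#c")]:
    w.add(ts, tag)
print(w.top(now=70, window_seconds=60))  # [('#c', 2), ('#a', 1)]
```

Querying merges up to 3,600 buckets, which is fine for a sketch; keeping a running Counter per window (adding new buckets, subtracting expired ones) would make each query cheaper.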
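One reading of the tree-based SQL question is how to store a hierarchy in a table and query it. A minimal sketch, assuming an adjacency-list table (each row points at its parent) and a recursive common table expression, run here with Python's built-in `sqlite3`; the table and column names are hypothetical.

```python
import sqlite3

def build_demo_tree():
    """Create an in-memory table storing a tree as an adjacency list."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES node(id),  -- NULL for the root
            name      TEXT
        );
        INSERT INTO node (id, parent_id, name) VALUES
            (1, NULL, 'root'), (2, 1, 'a'), (3, 1, 'b'), (4, 2, 'a1'), (5, 4, 'a1x');
    """)
    return conn

def subtree(conn, root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, sub.depth + 1
            FROM node AS n JOIN sub ON n.parent_id = sub.id
        )
        SELECT id, name, depth FROM sub ORDER BY depth, id
    """, (root_id,)).fetchall()

conn = build_demo_tree()
print(subtree(conn, 2))  # [(2, 'a', 0), (4, 'a1', 1), (5, 'a1x', 2)]
```

The recursive CTE starts from the chosen node and repeatedly joins children onto the rows found so far, tracking depth as it goes.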
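For the question on testing whether a change is effective, one standard option is a randomized A/B test on a binary metric, such as whether a user engaged, evaluated with a two-proportion z-test. The sketch below assumes large samples and a hypothetical engagement metric; the numbers are illustrative, not real Twitter data.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    conv_* = users who engaged, n_* = users exposed, per variant.
    Assumes large samples and a pooled rate strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative: control engages at 2.0%, treatment at 2.6%.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z is roughly 2.83, so the difference would be significant at the 5% level. In an interview it is also worth discussing sample size, test duration, and guardrail metrics.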
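For the median of a large dataset, one approach that avoids loading everything into memory is a two-pass histogram: first count values per bucket, then re-read the data keeping only the bucket(s) that contain the middle rank. The sketch assumes numeric values in a known range `[lo, hi]` and a data source that can be read twice. Heavily skewed data may still concentrate in one bucket; approximate quantile sketches are the usual alternative in that case.

```python
def external_median(read_values, lo, hi, num_buckets=1024):
    """Median of numeric data in [lo, hi], read in two streaming passes.

    read_values: zero-argument callable returning a fresh iterable each
    time (e.g. re-opening a file), so the data need not fit in memory.
    """
    width = (hi - lo) / num_buckets

    def bucket_of(x):
        if not lo <= x <= hi:
            raise ValueError(f"value {x} outside [{lo}, {hi}]")
        # Clamp so that x == hi falls in the last bucket.
        return min(int((x - lo) / width), num_buckets - 1)

    # Pass 1: count how many values fall into each bucket.
    counts = [0] * num_buckets
    n = 0
    for x in read_values():
        counts[bucket_of(x)] += 1
        n += 1
    if n == 0:
        raise ValueError("empty dataset")

    def locate(rank):
        """(bucket, offset within bucket) of the 0-based sorted rank."""
        seen = 0
        for b, c in enumerate(counts):
            if seen + c > rank:
                return b, rank - seen
            seen += c

    # Lower and upper middle ranks (equal when n is odd).
    targets = [locate((n - 1) // 2), locate(n // 2)]

    # Pass 2: keep only values from the bucket(s) holding the median.
    kept = {b: [] for b, _ in targets}
    for x in read_values():
        b = bucket_of(x)
        if b in kept:
            kept[b].append(x)
    for vals in kept.values():
        vals.sort()

    middle = [kept[b][offset] for b, offset in targets]
    return (middle[0] + middle[1]) / 2

data = [5, 1, 9, 3, 7]
print(external_median(lambda: iter(data), lo=0, hi=10, num_buckets=4))  # 5.0
```

Because bucket indices increase with value, every value in a lower bucket is no larger than any value in a higher one, which is what makes the rank lookup valid.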

Reflecting on the Questions

Twitter asks complex coding questions from a data science perspective. The Twitter Data Blog has a collection of great use cases and GitHub repos that are useful for hands-on work with the platform. Working through them will help you learn more about the platform and answer some of the Twitter-specific questions. I strongly encourage checking them out.

Subscribe to our Acing AI newsletter. I promise not to spam, and it's FREE!

Thanks for reading! 😊

The sole motivation of this article is to learn about Twitter and its AI technologies and to help people get into the field. All data is sourced from public online sources. I aim to make this a living document, so updates and suggested changes can always be included. Please provide relevant feedback.
