Data skills for the average Software Engineer

Daniel Sager
2 min read · Feb 13, 2017

This list is probably far from comprehensive, but here are some of the basic techniques you can use to leverage data within a software application. Each is a huge topic in its own right and can be tackled in many different ways and at almost inconceivably large scale.

Search & Filtering

Search and filtering enable users to find content of interest.

  • All French-speaking consultants with USAID experience in Uganda
  • All education projects funded by the World Bank in South East Asia
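
As a rough illustration of the first example, here is a minimal in-memory filter. The `Consultant` record, its field names and the sample data are all made up for this sketch, and a real application would more likely push such queries to a database or search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Consultant:
    # Hypothetical record; the field names are illustrative only.
    name: str
    languages: set = field(default_factory=set)
    donors: set = field(default_factory=set)
    countries: set = field(default_factory=set)


def filter_consultants(consultants, language=None, donor=None, country=None):
    """Return the consultants matching every criterion that is not None."""
    results = []
    for c in consultants:
        if language is not None and language not in c.languages:
            continue
        if donor is not None and donor not in c.donors:
            continue
        if country is not None and country not in c.countries:
            continue
        results.append(c)
    return results


# Made-up sample data.
consultants = [
    Consultant("Amina", {"French", "English"}, {"USAID"}, {"Uganda"}),
    Consultant("Ben", {"English"}, {"USAID"}, {"Uganda", "Kenya"}),
    Consultant("Chloe", {"French"}, {"World Bank"}, {"Senegal"}),
]

matches = filter_consultants(
    consultants, language="French", donor="USAID", country="Uganda"
)
print([c.name for c in matches])
```

Note the simplification: donor and country are checked independently, so this finds consultants with USAID experience and Uganda experience, not necessarily USAID experience in Uganda. Modelling that properly would need one record per assignment.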

Metrics & Statistics

Collecting continuous data points from all kinds of sources lets you monitor performance, discover trends or create beautiful visualizations. #datadriven

  • Number of jobs posted per day over the last month
  • Average error rate of “REST-API version 3”
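
Both examples boil down to counting and averaging. The sketch below uses made-up posting dates and HTTP status codes, and it assumes that "error" means a 5xx response, which is only one possible definition.

```python
from collections import Counter
from datetime import date

# Made-up posting dates, one entry per posted job.
posted_on = [
    date(2017, 2, 1),
    date(2017, 2, 1),
    date(2017, 2, 2),
    date(2017, 2, 4),
]

# Number of jobs posted per day.
jobs_per_day = Counter(posted_on)


def error_rate(status_codes):
    """Share of responses with a 5xx status code (assumed definition of 'error')."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


for day, count in sorted(jobs_per_day.items()):
    print(day.isoformat(), count)

print(error_rate([200, 200, 500, 503]))
```

At real scale you would usually compute these in a time-series store or with database aggregations rather than in application memory.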


Recommendations

Recommendations let your users discover content they did not know about. The better the recommendation algorithm, the more likely users are to find the recommended content useful.

One Technique: Collaborative Filtering

Collaborative filtering makes use of collective user behavior to recommend content among the like-minded (People who bought this item also bought…).

  • News articles that were read by users with similar reading behavior
  • Jobs that were viewed by similar groups of people
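
One very simple flavour of this is co-occurrence counting: for a given job, count which other jobs were viewed by the same users. The view history and the `also_viewed` helper below are made up for this sketch. Production recommenders tend to use more robust methods, such as matrix factorization.

```python
from collections import Counter

# Made-up view history: user id -> set of job ids the user viewed.
views = {
    "u1": {"job_a", "job_b", "job_c"},
    "u2": {"job_a", "job_b"},
    "u3": {"job_b", "job_d"},
    "u4": {"job_a", "job_c"},
}


def also_viewed(item, views, top_n=3):
    """Items most often viewed by the users who also viewed `item`."""
    counts = Counter()
    for items in views.values():
        if item in items:
            counts.update(items - {item})
    # Sort by descending count, then by id so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


print(also_viewed("job_a", views))
```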

Another one: Similarity

Recommendation of similar content is based on the characteristics of an object (like its title, content or category).

  • News articles that talk about similar topics
  • Jobs that have a similar description and the same location
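
A small sketch of content-based similarity is the Jaccard index over the words of two titles, which is the number of shared words divided by the number of distinct words overall. The tokenizer and the sample titles are assumptions made for this example. Real systems often use TF-IDF weighting or embeddings instead.

```python
import re


def tokens(text):
    """Lower-cased set of alphabetic words in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(query, candidates):
    """Candidate text with the highest Jaccard similarity to `query`."""
    return max(candidates, key=lambda c: jaccard(query, c))


# Made-up job titles.
titles = ["Urban water supply engineer", "Primary school teacher"]
print(most_similar("Rural water supply specialist", titles))
```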


Categorization

Categorization takes an object (e.g. an article or a project) and assigns it one or more predefined categories (e.g. news categories or sectors). New categories can also be derived from the data itself (e.g. by clustering).

  • Assign a sector to a project based on the project’s data
  • Label user-created content as spam or not-spam (classification)
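
The simplest possible take on the first example is a keyword lookup, sketched below. The sectors and keyword lists are invented for illustration. This is a rule-based stand-in, not a trained classifier, which is what you would typically reach for once you have labeled data.

```python
import re

# Invented sectors and keywords, for illustration only.
SECTOR_KEYWORDS = {
    "Education": {"school", "teacher", "curriculum", "literacy"},
    "Health": {"clinic", "vaccine", "hospital", "nutrition"},
    "Water": {"water", "sanitation", "irrigation"},
}


def assign_sectors(description, min_hits=1):
    """Sectors whose keyword list matches at least `min_hits` words."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    scores = {
        sector: len(words & keywords)
        for sector, keywords in SECTOR_KEYWORDS.items()
    }
    return sorted(sector for sector, hits in scores.items() if hits >= min_hits)


print(assign_sectors("Building a school with clean water and sanitation"))
```

Exact word matching is brittle: "teachers" would not match "teacher" here. Stemming or a learned model would handle such variations better.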

Entity Extraction

Entity extraction helps to create structured data (e.g. companies or people) from unstructured data (e.g. a free-text description).

  • Extract the relevant locations from a project description
  • Extract mentioned donors from a news article
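
A minimal sketch of the location example is a dictionary (gazetteer) lookup: scan the text for names from a known list. The list below is made up and tiny. Statistical named-entity recognition, as offered by NLP libraries, can also find names that are not on any list.

```python
import re

# Made-up gazetteer of known location names.
KNOWN_LOCATIONS = ["Uganda", "Kenya", "South East Asia", "Kampala"]


def extract_locations(text, gazetteer=KNOWN_LOCATIONS):
    """Known locations mentioned in `text`, in order of first appearance."""
    found = []
    for name in gazetteer:
        # Whole-word, case-insensitive match of the location name.
        pattern = r"\b" + re.escape(name) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), name))
    return [name for _, name in sorted(found)]


description = "The project trains teachers in Kampala and northern Uganda."
print(extract_locations(description))
```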


Predictions

Predictions estimate future outcomes based on historical data. Knowing the future tends to be beneficial for obvious reasons.

  • How likely is a user to sign up considering their recent activity?
  • How many applications will a job get?
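
As a toy version of the second question, the sketch below fits a straight line through historical data with ordinary least squares and uses it to extrapolate. The choice of feature (views on the first day) and all numbers are made up. A real model would use more features and should be validated on held-out data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: views on the first day -> total applications received.
first_day_views = [10, 20, 30, 40]
applications = [3, 5, 7, 9]

slope, intercept = fit_line(first_day_views, applications)

# Predicted applications for a new job with 50 views on its first day.
predicted = slope * 50 + intercept
print(round(predicted, 1))
```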