The Data Science team at Synthesio is mostly made up of what we like to call Data Science Engineers.
In short, these are people who know enough about software engineering and Data Science to bring great AI work into production, taking scalability and reliability concerns on board.
If you are a Data Science Engineer at Synthesio, the real work begins when you send your algorithm to production.
There are many definitions of a Data Scientist, depending on the organisation, but one could describe Data Scientists as people with a PhD in Data Science who work on algorithms: they create them, then modify and improve them over time.
Typically, they design algorithms and develop prototypes on their laptops.
The Data Science Engineer is the “applied” version of the Data Scientist. They are keen to deploy their work in production and analyse its behaviour on real use cases. Data Science Engineers master the use of algorithms, but even though they know them well, they don’t necessarily have the finest-grained view of how exactly they work internally. However, they excel at choosing the best one for each use case they handle.
They are able to take a prototype that runs on a laptop and make it run reliably in production, sometimes with a little help from Data Engineers.
Here are some of the key characteristics of a Data Science Engineer:
- Data Science Engineers have strong knowledge of the Data Science field
- They are able to work with Data Engineers and Site Reliability Engineers, who evolve and maintain the production systems
- They understand software development methodology and are pretty skilled with the tools developers use daily (IDEs, continuous deployment pipelines…)
- In order to build data products that work in production at scale, they focus on holistic design and make use of components such as logging and A/B testing infrastructure
- As data pipelines and models can go stale and need to be retrained, Data Science Engineers need to be up to speed on the issues specific to monitoring data products in production, and know how to detect data smells