What does a data scientist do?
My contribution to a popular question on Quora.
Earlier this year, two data scientists joined Vizzuality’s growing team. As I wrote their staff profiles and welcome blogs, I tried to understand better what they would be doing each day and how their work would fit into the process of creating and developing data visualisations, maps, and other data-based applications. I searched Quora for an answer but struggled to relate to what had already been written. I got the impression that the role was fairly solitary, clinical, and involved a lot of ‘data cleaning’, number crunching and complicated mathematics.
Realising that my perception might be a little off (OK, a long way off), I decided to spend a day learning the ways of one of my own teammates, Benjamin Laken, so I could add a new answer to Quora. My intention was to help other people like me understand what data scientists do and how their role interacts with designers, developers, and clients. I submitted the answer below to Quora and thought it would also make an interesting blog post.
Benjamin joined our team in January 2017 after spending many years as a researcher on projects aiming to better understand the Earth’s climate. When I talk to Benjamin, it’s clear that he loves talking about science and he has a knack for explaining things, two qualities I came to see are important for a data scientist to have. Let me explain why by giving you a peek into Benjamin’s day.
Benjamin’s usually in the office by 8.30am, and after sharing an update on his work with the rest of the team via Slack, he dives into some data analysis tasks. For instance, today he’s exploring a dataset with Jupyter Notebook and Python, and thinking about what kind of interactive statistical visualisations would allow people to view the data without revealing any details specific enough to identify an individual person. Since the dataset contains confidential information, Benjamin is also considering where the data could be safely stored and how queries could be made without exposing the database. At a later date he’ll discuss the options with the client to see which one they prefer.
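To make the privacy idea a little more concrete, here is a minimal sketch of one common approach: only show grouped counts, and hide any group that is too small to be safe. This is my own illustration rather than Benjamin's actual method. The function name `suppressed_counts`, the `min_count` threshold of 5, and the example records are all hypothetical.

```python
from collections import Counter


def suppressed_counts(records, key, min_count=5):
    """Count records per group, hiding groups smaller than min_count.

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (n if n >= min_count else None)  # None marks a suppressed group
        for group, n in counts.items()
    }


# Made-up example records; the real dataset is confidential and not shown here.
records = (
    [{"region": "north"}] * 12
    + [{"region": "south"}] * 7
    + [{"region": "east"}] * 2
)

summary = suppressed_counts(records, key="region", min_count=5)
print(summary)
```

A chart built from `summary` would show the larger groups but leave out the two-person one, where an individual might otherwise be identifiable.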
Mid-way through the morning, Benjamin gets an urgent request to help resolve an issue ASAP. Our developers need a way to extract location and time information from pixels in a custom web map. The web map is part of an application that will help people monitor forests. If we’re able to extract precise longitude and latitude coordinates from a pixel that indicates where deforestation might have occurred, it becomes much easier for people to investigate it in real life. Benjamin and Alicia, another member of our data science team, set about creating an example software pipeline to extract the information. It takes them a little while, so they stop at lunchtime to refuel.
A few hours past lunchtime, Benjamin and Alicia arrive at a solution they think will work. They test it out and, feeling satisfied with it, formalise the notes they’ve been writing in a Jupyter Notebook and push it to GitHub so anyone can see it and replicate it. The next step is to review the solution with Álvaro, one of the developers working on the app, to make sure it can be integrated into the application without any issues. Although it works, the developers decide to go with a different solution that extracts the information at an earlier stage. However, this time and effort won’t go to waste, as the solution might be suitable for another project in the future. So, for now, it’s saved in our ‘tutorials’ folder.
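For the curious, here is a rough sketch of what turning a map pixel into longitude and latitude can look like. It is not the pipeline Benjamin and Alicia wrote, which I haven't reproduced here. It assumes the web map uses standard Web Mercator "XYZ" tiles that are 256 pixels wide, which may not match our custom map, and the function name `pixel_to_lonlat` is made up for this example. It only covers location, not the time information.

```python
import math

TILE_SIZE = 256  # assumed tile width and height in pixels


def pixel_to_lonlat(zoom, tile_x, tile_y, px, py, tile_size=TILE_SIZE):
    """Convert a pixel inside a Web Mercator XYZ tile to (lon, lat) in degrees.

    zoom, tile_x, tile_y identify the tile; px, py are pixel offsets
    measured from the tile's top-left corner.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = tile_x + px / tile_size  # fractional tile column
    y = tile_y + py / tile_size  # fractional tile row
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lon, lat


# The centre pixel of the single zoom-0 tile sits where the equator
# meets the prime meridian.
centre_lon, centre_lat = pixel_to_lonlat(zoom=0, tile_x=0, tile_y=0, px=128, py=128)
print(centre_lon, centre_lat)
```

With coordinates like these in hand, a flagged pixel can be matched to a real place that someone can go and check.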
Benjamin’s last appointment of the day is with Ariadna, one of our designers. Together they are designing a webpage that aims to communicate complex climate science to an audience that probably hasn’t studied science since high school. Ariadna is a perfect example of the target audience; she cares about climate change but doesn’t know much about the science behind it. By combining Benjamin’s climate knowledge and Ariadna’s design skills, they hope to create something that’s accessible, user-friendly, and packed full of facts and figures that help people understand the role of carbon in climate change.
This afternoon, Benjamin is teaching Ariadna about the carbon cycle, explaining concepts such as the movement of carbon between the atmosphere and the biosphere. I asked Ariadna what stood out for her in her conversation with Benjamin, and she said, “I’m learning lots of things I didn’t know before. One of the most amazing things I’ve discovered with him is how much carbon storage varies between the seasons, and how there are people who can actually work out the amount of carbon on Earth by looking at the dark spots on the moon or stars.”
Data and design.
Realising that there was more to this lesson than just teaching a colleague how the carbon cycle works, I asked Benjamin for his perspective on why data scientists work with designers. He explained that using the wrong visualisations could be misleading. Ariadna might create the most beautiful design you’ve ever seen, but if it doesn’t convey the facts, or if it stretches them, it becomes meaningless. Data scientists and designers have to work together to ensure the data leads the design, and that it’s communicated in a way that’s not confusing or overwhelming to the user. If a designer understands the details and the context of the data they are designing for, it’s more likely the design will achieve its objectives.
By 5.30pm Benjamin has wrapped up his tasks for the day and is heading home. My day of learning from him has made me realise a data scientist does so much more than simply review and analyse data. They spend a lot of time listening to and talking with people:
- with clients to understand their domain and problem-space, to ensure they get the product they really want;
- with designers to create visualisations that best balance aesthetics with scientific precision;
- and with developers to ensure the data ultimately required are optimally packaged and ready to use.
In a role that encompasses teacher, advisor, and problem-solver, a data scientist needs to be great at communicating and passionate about their subject. After what I’ve heard today, I’d say Benjamin fits that description perfectly, and he’s definitely changed my preconceptions about what data scientists do all day!
Many thanks go to Benjamin for sharing his day with me!
If you found this blog helpful, please recommend it and help other people find it too. Thanks!