Driving Innovations @ Socialbakers

Published in Emplifi, Mar 10, 2020

First of all, I have to say I’m fortunate to manage a team of super clever folks, each keen to solve hard problems creatively.

Yes, I’m talking about the research, or data science, team at Socialbakers. The department is actually called Innovations.

At Socialbakers, we work with social media data: posts, photos, and videos from all social networks, including ones you’ve never heard of.

This article describes our daily tasks as well as the atmosphere in the team and our approach to the problems we’re working on. It’s also something I would usually explain to candidates during the interview, so it might be handy for anyone interested in joining our team.

Innovations is a team dedicated to, and responsible for, delivering the most advanced features for our product. From simple, clever tweaks and hacks to deep learning systems processing billions of data points every day, we deal with it all.

So, what kinds of things do I talk to candidates about?

I’d tell them that the variety of projects we work on is quite wide. In fact, I believe there is not a single person that could cover all the things we’re working on — and that’s ok. No one expects you to be an expert in every field. What we expect, on the other hand, is that you’re able to dive into the problem and find the solution together with the team. So, instead of just having in-depth knowledge of some tools, I definitely prefer candidates who have common sense and the ability to find creative solutions.

Important note: we’re not a research team in the academic sense. Our work is much closer to applied research: we apply existing state-of-the-art approaches to the problems we’re working on, or tweak them based on our domain knowledge.

Team structure

Although data scientists are quite often spread across different teams, at Socialbakers we prefer to keep them all together in one team. I’m not saying it won’t change in the future, but I think for smaller teams it makes more sense and, moreover, it makes a lot of things easier, such as training or education.

The team is relatively small, currently seven people, but we’re aiming for organic growth of about 1–2 people every six months.

The team belongs to the “data teams”, currently six teams working on analytical, research, data engineering, and business intelligence tasks.

We operate from two locations, roughly one hour apart by train.

Agile approach

We don’t think it makes sense to run daily scrum, especially with research-oriented tasks. And I personally hate daily standup meetings :), so instead we meet twice a week for a slightly longer time, which leaves the rest of the week (ideally) for uninterrupted work.

But otherwise, our approach is kanban-oriented: there is a backlog of tasks and user stories sorted by priority, and we pick them up one after the other.

Teamwork

One thing I find really crucial is a constant review of ideas. Developers review code very often and we also do something we could call “idea reviews”.

For each feature we’re working on, we usually create a small team. It consists of one to three researchers, one of them being the head of the team. They are all responsible for discussing various ideas, coming up with the best ones, and defending them in front of the rest of the team. This ensures everybody is aware of what the team is doing and, moreover, people are more confident about the next steps, as they have already been discussed and approved by the team.

Continuous education

This is, in fact, an integral part of our work. It’s quite clear that data science work means a lot of studying to be able to solve a variety of tasks, but on top of that, we support individual education plans for each member of the team.

Every week team members have dedicated study time (whitepapers, MOOCs, books, online study) and this is supported and encouraged by all managers, right up to our CEO.

From time to time, we also like to publish some materials on our company’s Medium account, such as our recent project related to sentiment analysis.

To share the knowledge, we regularly educate our other colleagues through presentations, hands-on workshops, or a dedicated machine learning study group.

We also organize two local meetups: one in Prague, focused more on Spark and AI, and the other in Pilsen, oriented towards AI+ML. Our researchers often present our projects at these meetups.

We also regularly attend events such as ML Prague or Spark Summit, with talks, workshops, or posters. The easiest way to get to a conference is to submit something to it :).

Career structure

Where do you see yourself in five years? Do you also hate this typical job interview question? So do I, because it’s quite difficult to explain that I see myself lying on a beach in Bali, drinking something good :)

Obviously, the team consists of junior as well as senior researchers. Seniors are responsible for maintaining the high quality of our solutions and mentoring our juniors, from simple code reviews to making sure they are able to present their ideas and outcomes clearly to our top management.

Each member of the innovations team has their own development plan for the next 6–12 months and we periodically check in on their progress. Based on fulfilling the plan, and feedback from the team and managers, the member can be promoted to a higher role.

Personal development is a very important topic that we can cover in more detail in a future article.

Example of the junior researcher development plan

Software stack

This section is intentionally not higher up in the article, as I don’t think it’s of paramount importance. Obviously, as with most data science teams, Python is our language of choice, backed by Spark for processing the terabytes of data we have.

We mostly work in Jupyter or Databricks notebooks.

Otherwise, we use a common portfolio of tools such as scikit-learn, pandas, NumPy, Keras, and so on.

Data

A key thing, right? We’re dealing with billions of social media posts. It means a lot of NLP challenges, such as sentiment analysis or topic detection and various language models. It also means a lot of computer vision challenges — our customers want to know what’s in the images and videos.

Topics (interests) detected in images

Socialbakers also helps customers manage their pages, monitoring them through our solution and replying to comments and questions. An ideal use case for anomaly detection is when a customer wants to know that the rate of negative comments on their post is exceptionally high (above what could be expected), and is therefore notified when this happens.

Anomaly detection on sentiment comments for one customer
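
To make the idea concrete, here is a minimal sketch in Python. It is an illustration under simple assumptions, not a description of the actual production system: it assumes each post already has a count of negative and total comments, and it flags a post when its negative-comment rate lies more than a chosen number of standard deviations above the rates seen on earlier posts (a plain z-score rule). The function names, the threshold of 3.0, and the sample numbers are all hypothetical.

```python
from statistics import mean, stdev


def negative_rate(negative: int, total: int) -> float:
    """Share of comments classified as negative (0.0 if there are no comments)."""
    return negative / total if total else 0.0


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Return True if `current` is more than `threshold` standard deviations
    above the mean of the historical rates (assumed z-score rule)."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All historical rates are identical; any increase stands out.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical negative-comment rates for one customer's earlier posts.
history = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.12]

# A post where 45 of 100 comments are negative, and one close to the usual rate.
print(is_anomalous(history, negative_rate(45, 100)))
print(is_anomalous(history, negative_rate(11, 100)))
```

A real system would likely need more than this, for example handling posts with very few comments or seasonal patterns, but the core question stays the same: is the current rate above what could be expected from the past?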

There is an almost never-ending stream of ideas coming to us from our wonderful product team as well as our top management. We also have our own ideas and projects, so the final list of priorities is a mixture of all these.

Conclusion

Are there any disadvantages? Yes, there are: we have so many projects and ideas in our backlog that we need to carefully choose the most important ones.

Maybe you have some good ideas you’d like to share with us!

Sounds interesting? Check out more of our stories & don’t forget we’re hiring!

Written by Peter Krejzl