If there is a decision to be made, there is a risk of bias.

Image by Coffee Bean from Pixabay

In the last decade, advances in data science and engineering have made it possible to develop a wide range of data products across industries. Problems that not long ago were considered very difficult for machines to tackle are now solved (to some extent) and available at large scale.
These include many perception-like tasks in computer vision, speech recognition, and natural language processing (NLP). Nowadays, we can use commercially available, large-scale deep learning-based vision systems that recognize and verify faces in images and videos. Likewise, we can take advantage of large-scale language models to build conversational bots, analyze large…

A new web-based tool to get insights from time series forecasting

Time series forecasting comprises a set of algorithms designed to predict future behavior based on historical data. Here at Daitan, time series forecasting has been one of the most important applications of machine learning, and today we are pleased to announce the first (to our knowledge) time series forecasting playground. Inspired by the Neural Network Playground and GAN Lab, the Time Series Playground is an interactive open-source tool designed to build intuition on how to train AutoRegressive Feed Forward Neural Networks for time series forecasting.

In the tool, one can define, configure, and train Neural Networks…
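An autoregressive model predicts the next value of a series from a window of its own past values. The sketch below illustrates only that framing, under stated assumptions: it uses plain Python lists, and `model` is a hypothetical callable standing in for a trained feed-forward network. It is not the Playground's code.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(series: Sequence[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (inputs, targets): each input holds `lags`
    consecutive past values, and the target is the value that follows them."""
    inputs, targets = [], []
    for t in range(lags, len(series)):
        inputs.append(list(series[t - lags:t]))
        targets.append(series[t])
    return inputs, targets


def recursive_forecast(model: Callable[[List[float]], float],
                       history: Sequence[float], lags: int, steps: int) -> List[float]:
    """Forecast `steps` values ahead by feeding each prediction back in as input.
    `model` is a hypothetical stand-in for a trained network."""
    window = list(history[-lags:])
    predictions = []
    for _ in range(steps):
        next_value = model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return predictions


# Usage with a naive placeholder "model" that averages its input window.
X, y = make_windows([1, 2, 3, 4, 5], lags=2)
# X == [[1, 2], [2, 3], [3, 4]], y == [3, 4, 5]
forecast = recursive_forecast(lambda w: sum(w) / len(w), [1, 2, 3, 4], lags=2, steps=2)
# forecast == [3.5, 3.75]
```

The windowing step produces the supervised training pairs a feed-forward network would be fit on. The recursive loop shows one common way, though not the only one, to produce multi-step forecasts from a one-step model.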

How can we learn good representations without relying on human-annotated data?

Image by Ahmed Gad from Pixabay

The following brief abstract highlights this featured article on Representation Learning, written by Thalles Silva and published in Towards Data Science.

In this article, the author discusses the importance of learning good representations from images without labels to guide the learning process. These representations are particularly valuable because they can be used to solve a large number of related problems. This process is called transfer learning, and it is widely used in both academia and industry. Transfer learning relates to the idea of using the knowledge a deep neural network has gained by solving a particular task…

A step-by-step guide on how to use Rasa to create a chatbot assistant

Chatbots are everywhere, and so are the tools that promise easy development and deployment of these applications. Frameworks like Google Dialogflow, Microsoft LUIS, and Amazon Lex are fiercely competing to control this growing market.

In this blog post, we describe an experience we had at Daitan building a conversational agent for handling calendar appointments. We explore an open-source alternative — Rasa — a machine learning-based framework to build contextual AI assistants. We describe Rasa’s benefits and drawbacks and how we managed to build and deploy a calendar scheduler agent from scratch.

The Rasa Framework

As stated on the Rasa website:

The Rasa…

Sharing our experiences building an audio denoiser using GANs

Photo by Jason Rosewell on Unsplash

An article by Jacob Boness, Jamie Thomassen, and Colton Davenport

One of the main goals of the Innovation team at Daitan is to keep our eyes open to emerging technologies that can positively impact our clients. Voice-based applications are undoubtedly among them. In the last few years, large and mid-sized companies have relied more and more on these applications to address problems ranging from recognition and identification to enhancement. In this piece, we expand on our previous work on noise removal from audio signals. We believe that exploring different strategies to solve a still-open problem can…

It’s not only about Agility; it’s much bigger than that


I like to think holistically about the business impact of engineering’s role on a company by always connecting development projects to business value.

Unfortunately, all too often I see engineering organizations that operate transactionally, meaning the dominant factors framing a project are typically time, resource skills, and project cost. Those are important, but they are not the best levers for creating business value.

I believe that when engineering measures success in terms of growth-related business outcomes, its contribution to business value is significant, making it important to identify targeted outcomes at project onset, and not in hindsight as a justification for a new…

Contrastive learning for supervised and self-supervised tasks

Photo by Ivan Bandura on Unsplash


Biometric authentication methods are gaining importance in times of social distancing, remote working, and remote collaboration, as they can deliver stronger security and a better customer experience at the same time. One such technique is voice recognition, that is, identifying whether a given voice input comes from someone previously registered. Voice authentication offers one of the best user experiences of all authentication methods, so advances in this area could improve application security in many industries without hurting the user experience.

In this piece, we describe how we built a reasonably well-performing voice recognition system with PyTorch, using deep learning…
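The verification question described above ("is this voice input from someone previously registered?") reduces to a yes/no decision. One common way to make that decision, assumed here rather than taken from the article, is to compare fixed-length speaker embeddings from a trained network using cosine similarity and a threshold. The embedding vectors and the default threshold of 0.7 below are hypothetical placeholders.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled: Sequence[float], probe: Sequence[float],
                    threshold: float = 0.7) -> bool:
    """Accept the probe if its embedding is close enough to the enrolled one.
    The threshold is a hypothetical value; in practice it is tuned on held-out data."""
    return cosine_similarity(enrolled, probe) >= threshold


# Hypothetical 2-D embeddings, just to show the decision rule.
print(is_same_speaker([1.0, 0.0], [0.9, 0.1]))  # True
print(is_same_speaker([1.0, 0.0], [0.0, 1.0]))  # False
```

Raising the threshold trades more false rejections for fewer false acceptances, which is the usual knob when balancing security against user experience.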

Local Differential Privacy, Randomized Response, and Global DP in a 10-minute read

In the last two decades, with the increasing availability of sensors and the popularity of the internet, data has never been so ubiquitous. Yet getting access to personal data to perform statistical analysis is hard. In fact, that is one of the main reasons we, as data analysts, spend so much time doing research on “toy” datasets instead of real-world data. In areas like healthcare, it is common to see specialized startups spend years working with medical centers to obtain quality datasets.

But what if there were a way for medical centers…
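Randomized response, one of the techniques named in the title, lets each respondent add noise to their own answer before sharing it. Below is a minimal simulation of the classic coin-flip variant, chosen for illustration since the article's exact parameters are not shown here. With probability 1/2 the respondent answers truthfully; otherwise they answer "yes" with probability 1/2. The result is P(yes) = π/2 + 1/4, so the analyst can estimate the true rate as π = 2·P(yes) − 1/2 without learning any individual's true answer. Because P(yes | true) = 3/4 and P(yes | false) = 1/4, this mechanism satisfies local differential privacy with ε = ln 3.

```python
import random
from typing import List


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """First coin heads: answer truthfully. Tails: answer with a second fair coin.
    P(yes | true) = 3/4 and P(yes | false) = 1/4, so epsilon = ln 3."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(yes_fraction: float) -> float:
    """Invert P(yes) = 0.5 * pi + 0.25 to get an unbiased estimate of pi."""
    return 2 * yes_fraction - 0.5


def simulate(true_rate: float, n: int, seed: int = 0) -> float:
    """Draw n respondents with the given true 'yes' rate, privatize each answer
    locally, and return the analyst's estimate computed from noisy answers only."""
    rng = random.Random(seed)
    answers: List[bool] = [randomized_response(rng.random() < true_rate, rng)
                           for _ in range(n)]
    return estimate_true_rate(sum(answers) / n)


print(simulate(true_rate=0.3, n=100_000))  # an estimate close to 0.3
```

No single noisy answer reveals much about its respondent, yet the aggregate estimate converges to the true rate as n grows. That trade-off between individual privacy and population-level accuracy is the core of local DP.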

How to train a deep learning system to estimate Mean Opinion Score (MOS) using TensorFlow 2.0

Photo by Matthieu A on Unsplash


If you’ve ever used VoIP (Voice over IP) applications like Skype or Hangouts, you know that audio degradation can be a problem. In video or audio conferences, especially with clients and prospects, audio quality matters.

“Speech quality” might sound like a subjective concept, but there are some well-known types of degradation that hurt speech intelligibility. By intelligibility, I mean how comprehensible and “pleasant” speech is. Some of the degradations that reduce intelligibility include echo, reverberation, and background noise (usually from your colleagues).

One commonly used metric to assess the quality of an audio signal is the Mean Opinion Score…
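For reference, MOS is conventionally the arithmetic mean of listener ratings on the five-point Absolute Category Rating scale defined in ITU-T Recommendation P.800 (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad). The sketch below computes that score from raw ratings, which is the kind of label a model like the one in the article would learn to predict. The ratings in the usage example are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average listener ratings on the 1-5 ACR scale
    (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1-5")
    return float(mean(ratings))


# Hypothetical ratings from four listeners for one audio clip.
print(mean_opinion_score([4, 5, 3, 4]))  # 4.0
```

Collecting such ratings requires many human listeners per clip, which is why learning to estimate MOS automatically is attractive.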

How business requirements can prevent you from using available machine learning tools and what to do about that

When we search the internet for the technical skills a successful data scientist needs, we find variations of the following list:

  • Analytical skills
  • Programming
  • Linear algebra
  • Multivariate calculus
  • Statistics
  • Machine learning
  • Data visualization
  • Software engineering

And so on.

But one subject common to several items on this list does not get the same attention, despite being the engine behind many of the tools a data scientist uses: convex optimization.

When hearing this term, most people will immediately start talking about how gradient descent is the most awesome thing there is, how we can add momentum…
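To make the two ideas mentioned above concrete, here is a minimal sketch of gradient descent with heavy-ball momentum on a simple convex function, f(x) = (x − 3)², whose unique minimum is at x = 3. The function, learning rate, momentum coefficient, and step count are illustrative choices, not taken from the article. Because f is convex, the minimum the method converges to is the global one.

```python
from typing import Callable


def gd_momentum(grad: Callable[[float], float], x0: float,
                lr: float = 0.1, beta: float = 0.9, steps: int = 500) -> float:
    """Heavy-ball momentum: the velocity accumulates past gradients,
    v <- beta * v - lr * grad(x), then x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x


def grad_f(x: float) -> float:
    """Gradient of the convex function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


print(gd_momentum(grad_f, x0=0.0))  # a value very close to 3.0
```

Setting `beta=0.0` recovers plain gradient descent. With momentum, the iterates overshoot and oscillate around the minimum before settling, but on well-conditioned problems they typically get there in fewer steps.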


We build core technologies, data solutions and software products that scale with real-time performance. Visit https://careers-br.daitan.com: We’re Hiring!
