This Week in Data Preparation (Feb 28, 2020)

Nikolaos Konstantinou
The Data Value Factory
Feb 28, 2020

In this week’s blog post: one report (on tech salaries), five articles (on data preparation for AI, bias in machine learning, master data management, data integration in Industry 4.0, and data preparation for cognitive computing), and two announcements (one from Infor and Snowflake, and one from Databricks).

Image by Markus Spiske from Pixabay

This report answers the question: “Which occupations saw the biggest increases in salary and job postings between 2018 and 2019?” The short answer is: “Those that allowed businesses to wrangle and analyze data, build applications, and make sure those applications went into the world relatively bug-free.”

In this blog post, Rodrigo Ceron, a senior managing consultant at IBM, discusses data preparation for AI. “If I were to give you one hint about the AI game, it is to invest in data preparation!” he comments.

In this article, Davide Zilli, Client Services Director at Mind Foundry, discusses bias in machine learning algorithms. “There are hundreds of parameters to take into consideration during data preparation, so it can often be difficult to strike a balance between removing bias and retaining useful data,” he comments.

In this article, Bill Connelly, founder of the Byte Bell news site, discusses Master Data Management (MDM): what it is and why you need it.

In this article, the sixth in a series, Georg Frey and Sepp Gmeiner of Lignum Consulting discuss data integration in the context of Industry 4.0: “The Connected Factory”.

In this article, Marty Loughlin, SVP and head of global sales at Cambridge Semantics, comments on data preparation for cognitive computing models.

In this press release, Infor, a global leader in business cloud software specialized by industry, announced it is partnering with Snowflake, the cloud data platform, to help enterprises build automated data warehouses.

Databricks, the company founded by the creators of Apache Spark, announced a data integration partner program. Ali Ghodsi, Databricks’ CEO, feels that running data warehouses separately from data lake platforms leads to siloed data. That’s why Databricks is heavily pushing the “data lakehouse”, its concept for a converged data lake/data warehouse platform.

Thank you for taking the time to read this blog every week.
