This Week in Data Preparation (August 10, 2020)

Nikolaos Konstantinou
The Data Value Factory
5 min read · Aug 10, 2020

This weekly post with news items from the data preparation market is brought to you by The Data Value Factory, the company offering Data Preparer.

14 links in this week’s post: 5 articles (on data democratization, data catalogs, master data management, data warehouses/lakes/hubs, and global transformation, by Dataiku, SolarWinds, Collibra, Naveego, Lore IO, Profisee, Gartner, and Forbes), 1 research study on data engineering (by Ascend.io), 1 tutorial on data prep for Machine Learning (by Microsoft Research), 6 company announcements (by Smarten, NRMA, Striim, Qubole, Datafold, and Samasource), and 1 capital raising announcement (by Tetra).

The Data Value Factory: This Week in Data Preparation, August 2020. Image by Gerd Altmann from Pixabay.

How to navigate the rise of data democratization: A guide for IT architects. In the face of data complexity, IT architects are responsible for making sure the backend data plumbing is well-conceived, so data teams across the organization can effectively leverage data. François Sergot of Dataiku offers IT architects a guide to navigating the rise of data democratization.

Top 7 data catalog use cases for enterprises. The underlying goal of a data catalog is to capture and store metadata, which is data about data. “In many ways, data catalogs haven’t changed much in 20 years,” said Thomas LaRock, head geek at SolarWinds, an IT service management tools provider. All modern BI tools, cloud platforms and data discovery applications include some type of data cataloging capability that provides basic visibility within their own environments. “But rarely are all of your data assets stored and managed in a single environment or repository,” said Chandra Papudesu, vice president of product management, catalog and lineage at Collibra, a data intelligence company.

Mastering the Internet of Things with Master Data Management. The symbiotic relationship between MDM and the IoT is mutually beneficial, as explained by Naveego CTO Derek Smith, Lore IO CEO Digvijay Lamba, and Profisee CTO Eric Melcher, in this analysis.

How data warehouses, data lakes and data hubs differ in focus and work better together. Ted Friedman, Research Vice President, Gartner, discusses data hubs, lakes and warehouses and how to use them effectively in your organisation.

Strategies And Practices For Transformation To A Subscription Business Model. Ramanujam Rao, Senior Technology Executive with expertise in Global Transformation, Enterprise Architecture, Analytics, AI/ML and Cloud Services, shares some core principles based on things he’s learned that can make the transformation a bit easier and eventually successful.

Infographic: Data Engineering Evolved. Ascend.io, the data engineering company, announced results from a new research study about the work conditions of data scientists, data engineers, and enterprise architects in the U.S. “Organizations are quickly discovering that data engineers are essential to unlocking the value of data and to removing bottlenecks across the entire data team,” said Sean Knapp, CEO and founder of Ascend.io.

Data Prep for Machine Learning: Normalization. Dr. James McCaffrey of Microsoft Research uses a full code sample and screenshots to show how to programmatically normalize numeric data for use in a machine learning system such as a deep neural network classifier or clustering algorithm.
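As a rough illustration of what normalizing numeric data can look like in code, here is a minimal sketch of two common techniques, min-max scaling and z-score scaling. It is not taken from Dr. McCaffrey's tutorial: the function names and the sample values are hypothetical, and only the Python standard library is assumed.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Shift to mean 0 and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid division by zero for a constant column.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    ages = [20, 30, 40, 60]  # made-up sample column
    print(min_max_normalize(ages))
    print(z_score_normalize(ages))
```

Min-max scaling keeps the original ordering and bounds every value between 0 and 1, while z-score scaling is often preferred when a column has no natural minimum or maximum. Which one suits a given model is a judgment call that depends on the data.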

Smarten Announces an Innovative eLearning Course, ‘Be a Citizen Data Scientist: Aspire, Inspire, Become’. Smarten CEO Kartik Patel says, “By transforming business users into Citizen Data Scientists, the organization will enjoy support for data-driven, fact-based decision making, and will gain insight and perspective and clarity.”

NRMA resets its rules for data wrangling. Harris Hutkin, general manager of digital and data, said data governance projects work best with whole-of-organisation buy-in and a clear understanding of the project’s goals — and sometimes that means not using the term “data governance.”

Striim Expands Cloud Support, Bolsters Manageability and Diagnostics. “In today’s challenging business environment, cloud adoption initiatives have become the number one priority for many organizations looking to optimize their operations, enhance their customers’ online experience, and gain the real-time insights required for digital transformations,” said Alok Pareek, founder and EVP of Products at Striim.

R Works Its Way Into Qubole’s Data Lake. “The idea is to make it simple for data scientists to get up and running with fast and secure R environments in Qubole, without requiring them to get their hands dirty with the technical details,” says Mohit Bhatnagar, SVP of products at Qubole.

Datafold is solving the chaos of data engineering. Datafold is a brand-new platform for managing the quality assurance of data. Much in the way that a software platform has QA and continuous integration tools to ensure that code functions as expected, Datafold integrates across data sources to ensure that changes in the schema of one table don’t knock out functionality somewhere else. Founder Gleb Mezhanskiy knows these problems firsthand from his time at Lyft, where he was a data scientist and data engineer and later became a product manager “focused on the productivity of data professionals.”

AI bias detection (aka: the fate of our data-driven world). One solution that’s becoming more visible in the market is validation software. Samasource, a prominent supplier of solutions to a quarter of the Fortune 50, is launching AI Bias Detection, a solution that helps to detect and combat systemic bias in artificial intelligence across a number of industries. “Our AI Bias Detection solution proves the need for a symbiotic relationship between technology and a human-in-the-loop team when it comes to AI projects,” says Wendy Gonzalez, President and Interim CEO of Samasource.

Tetra Insights raises $1.5 million to transform customer audio and video into business insights. The round is led by Active Capital, a top-tier seed firm for B2B SaaS companies outside of Silicon Valley, with other participants including HNVR Technology Investment Management, an early-stage venture capital firm in Menlo Park, CA. “In a study of CEOs across industries, 39% report that customer experience is their number one strategic differentiator,” said Michael Bamberger, Co-founder and CEO of Tetra Insights. “Leading organizations are increasingly relying on qualitative data to generate deep customer insights,” said Pat Matthews, CEO & Founder of Active Capital. “Michael and the team at Tetra have built a machine for turning voluminous video and audio into usable, searchable data no matter where it originated,” said Joe Malchow, founding partner at HNVR. “User research is critical for our company to inform product and commercialization decisions,” said Colin Swindells, who heads research of digital products for professional agriculture at Yara, a company deploying Tetra across his international team. “Tetra has been a game-changer for incorporating insights from qualitative research into our product,” said Kevin Niparko, Product Lead at Segment.

The Data Value Factory. A week’s worth of manual data preparation in minutes.

Thank you for reading our weekly post with news items from the data preparation market. Have you tried Data Preparer?
