Swift solutions to common data problems in the enterprise

Our fortnightly selection of must-reads from the community, for the community

Roberto Cadili
Low Code for Data Science

--

Businesses need to access, extract, and analyze large amounts of data that come in different formats and shapes and sit in different places. Governing and engineering this bulk of data can be complex, overwhelming, and time-consuming. Low-code tools can simplify these operations. For example, they make it possible to extract text from PDF files through a user-friendly interface and pre-built components that minimize the need for extensive programming knowledge. Similarly, low-code tools can streamline the automation of monitoring activities, such as collecting and analyzing data about the performance of the company’s website compared with competitors’ websites. In a nutshell, low-code tools can help businesses gain a faster yet still accurate understanding of their data and operations, driving better outcomes and competitive advantage.

The articles we selected for this edition of the Workflow focus on KNIME orchestration, low-code text extraction, and REST capabilities. From a great story about extracting text and tables from PDF files by blending Python and KNIME, to an insightful tutorial on monitoring the performance of your website by querying the Google PageSpeed Insights API, these pieces propose swift solutions to data problems companies often face. The last story provides great food for thought on the very hot topic of understanding and overcoming bias in AI systems. Happy reading!

Photo by Damon Hall on Unsplash.

KNIME — Extract Text and Tables from PDF Files with Python in a Low-Code Environment

By Markus Lauber

Text extraction from PDFs converts the text content embedded in a PDF document into a format that can be easily manipulated and analyzed. This very useful operation is often cumbersome and messy, especially when the text we want to extract has a fancy layout or is stored in tables. In this article, Markus Lauber shows how to extract data tables and text from a PDF file using a low-code approach with Python in KNIME. The use of dedicated Python libraries, such as Camelot and PyMuPDF, combined with the masterful use of KNIME for looping, smooth data access, and data orchestration, makes textual data extraction as swift as it can be. Check out this great tutorial!
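To give a flavor of the Python side, here is a minimal sketch of how the two libraries named above are commonly used. It is not the author's workflow: the function names and the `pages_argument` helper are hypothetical, and it assumes PyMuPDF (imported as `fitz`) and Camelot are installed in the Python environment that KNIME uses.

```python
def pages_argument(page_numbers):
    """Format 1-based page numbers as the comma-separated string Camelot accepts."""
    return ",".join(str(n) for n in sorted(set(page_numbers)))


def extract_text(pdf_path):
    """Return the plain text of every page, in page order, using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helper above works without it

    with fitz.open(str(pdf_path)) as doc:
        return [page.get_text() for page in doc]


def extract_tables(pdf_path, page_numbers):
    """Return one pandas DataFrame per table Camelot detects on the given pages."""
    import camelot  # assumed to be installed alongside its PDF dependencies

    tables = camelot.read_pdf(str(pdf_path), pages=pages_argument(page_numbers))
    return [table.df for table in tables]
```

In a KNIME workflow, functions like these would typically live inside a Python Script node, with a loop feeding in one file path per iteration. For a hypothetical file, `extract_tables("report.pdf", [1, 2])` would return the tables found on the first two pages.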

How to monitor the performance of your websites with KNIME and the Google PageSpeed Insights API

By Giovanni Battisti

Website performance refers to how fast and efficiently a website loads and functions for users. It encompasses factors like page load times, responsiveness, and overall user experience. Having a performant website is crucial for any business as it directly impacts user satisfaction, engagement, and ultimately, business success. In this article, Giovanni Battisti provides an insightful tutorial on how to monitor the performance of your website (in both “Desktop” and “Mobile” mode) using the Google PageSpeed Insights API and KNIME. After a short intro to the PageSpeed Insights API, the author presents a KNIME workflow that automates data collection from the API, data parsing, and the creation of an interactive dashboard to compare website performance. A great low-code solution to automate benchmarking your website’s performance against that of your competitors. Don’t miss it!
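For readers curious about what such a request looks like outside KNIME, the sketch below builds a PageSpeed Insights v5 request URL and reads the performance score from the JSON reply. It is an illustration under assumptions, not the workflow from the article: the function names are hypothetical, and the response path `lighthouseResult.categories.performance.score` reflects the v5 response format as we understand it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_request_url(page_url, strategy="mobile", api_key=None):
    """Build the GET URL for one page and one strategy ("mobile" or "desktop")."""
    if strategy not in ("mobile", "desktop"):
        raise ValueError("strategy must be 'mobile' or 'desktop'")
    params = {"url": page_url, "strategy": strategy}
    if api_key:  # an API key is optional for occasional requests
        params["key"] = api_key
    return f"{API_ENDPOINT}?{urlencode(params)}"


def performance_score(response):
    """Return the performance score on a 0-100 scale, or None if it is missing."""
    score = (
        response.get("lighthouseResult", {})
        .get("categories", {})
        .get("performance", {})
        .get("score")
    )
    return None if score is None else round(score * 100)


def fetch_score(page_url, strategy="mobile", api_key=None, timeout=60):
    """Call the API over the network and return the parsed performance score."""
    request_url = build_request_url(page_url, strategy, api_key)
    with urlopen(request_url, timeout=timeout) as reply:
        return performance_score(json.load(reply))
```

Calling `fetch_score` once per site and per strategy yields the kind of table a dashboard can compare. In KNIME, the same request would usually be issued with a GET Request node and parsed with JSON nodes rather than code.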

Overcoming AI Bias: Understanding Bias in Machine Learning and Humans

By Jamie Crossman-Smith

Bias, whether in human decision-making or machine learning algorithms, represents a systematic deviation from objectivity or fairness. In humans, it often stems from preconceived notions, cultural influences, or incomplete information, leading to skewed judgments or outcomes. In machine learning, it usually comes from errors or prejudices present in the data or algorithms that can lead to unfair or inaccurate predictions or decisions. In this article, Jamie Crossman-Smith shares his views on the topic of bias, and especially bias in machine learning and AI. He offers food for thought on overcoming human and ML bias by exercising awareness, critical thinking, understanding limitations, and designing curated data processes.

We love learning about new creative solutions built with KNIME from the articles that we publish, and we love to share them with you. We are proud to be building together a thriving community whose members support each other, share experiences, and shape the future of low-code data science.

See you in the next Workflow,

The Editors of Low Code for Data Science

PS: 📅 #HELPLINE. Want to discuss your article? Need help structuring your story? Make a date with the editors via Calendly (every second Thursday).

--

Roberto Cadili
Low Code for Data Science

Data scientist at KNIME, NLP enthusiast, and history lover. Editor for Low Code for Data Science.