DATA STORIES | AUTOMATION | KNIME ANALYTICS PLATFORM

Ask me about the main limitations of KNIME

My successful data journey with KNIME to accomplish the assigned task

Kazimierz Grabski
Low Code for Data Science

Figure 1. List of files to work with.

A typical conversation/task at work:

‘But how much time does it take to launch a new product, and to withdraw a legacy product from the offer?’ my boss asked.

‘Well…’ was the smartest answer I had at that moment, while another question was popping up in my mind: how much time would it take to get an evidence-based answer?

Wham, that was my new task: extract data from about 100 Excel files, track the life cycle of over 4000 products, calculate statistics on product introduction pace and retirement periods, visualize the results, and derive actionable business insights. My initial instinct was to resort to the comfort of VBA in Excel, a familiar terrain. However, a daring thought lingered: what if I embraced the analytical prowess of KNIME?

This article encapsulates my metamorphosis from an Excel-centric individual to a KNIME user who conquered his own limitations in using this analytical platform to accomplish the assigned task.

Excel or KNIME?

I was confident in my Excel proficiency, having developed dozens of macros, so the decision to switch to KNIME was not made lightly. Developing VBA macros, while familiar, posed challenges in terms of coding complexity, testing, and subsequent modifications.

The allure of KNIME lay in its promise of a more streamlined, efficient, and adaptable solution. Thus, I embarked on a deep dive into the KNIME waters, anticipating a task completion timeframe of up to two weeks.

KNIME: Familiar Yet Uncharted

1. Loops — Navigating Data Sources

My data resided in numerous source files, so I needed a loop to read them one after another. Loops were not second nature to me, but after a few attempts I succeeded in compiling a single consolidated dataset.
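For readers who think in code rather than in nodes, the idea behind such a loop can be sketched in a few lines of Python. This is only an illustration of the logic, not the KNIME workflow itself: it assumes the source files have been exported to CSV, and the function name `consolidate` and the extra `source_file` column are my own hypothetical choices.

```python
import csv
from pathlib import Path


def consolidate(folder, pattern="*.csv"):
    """Read every matching file in `folder` and stack the rows into one list."""
    rows = []
    # sorted() keeps the reading order stable from run to run
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra column: remember which file a row came from
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Keeping the file name next to each row matters for a task like this one, because the file in which a product first or last appears is what marks the start or end of its life cycle.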

2. The Power of the Column Expressions node

Next in line was the challenge of tallying occurrences of different sequences. Accustomed to chaining successive String Manipulation nodes for this purpose, I had hesitated to explore the Column Expressions node. Surprisingly, using it proved simpler than anticipated, and it changed my approach to data manipulation.

3. Statistics for Data Subsets

The core value of my tool rested on computing basic statistics: mean and median. Despite my awareness of KNIME’s statistical capabilities, my past experiences hadn’t involved leveraging them. Implementing Math Formula nodes for statistical calculations proved straightforward and delivered the desired results.
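As a rough illustration of what "statistics for data subsets" means, here is a minimal Python sketch that computes the mean and median per group. It is not what the workflow does internally; the function name `stats_by_group`, the group labels, and the sample durations are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def stats_by_group(records):
    """Return {group: (mean, median)} for an iterable of (group, value) pairs."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: (mean(values), median(values)) for group, values in grouped.items()}


# Hypothetical data: (product group, months between first and last appearance)
sample = [("A", 12), ("A", 18), ("A", 30), ("B", 6), ("B", 10)]
result = stats_by_group(sample)
```

Reporting both figures is useful because a few unusually long-lived products pull the mean upward while leaving the median largely unaffected.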

4. Visualizing Results

With two statistics and dozens of input files containing data for approximately 4000 products, presenting the results as a table seemed impractical. I needed a visualization, which was territory I had not yet explored in KNIME. A brief encounter with the Bar Chart node, and the workflow was ready.

Figure 2. Workflow that overcomes the main limitations.

Limitations in Using KNIME…

Would you guess that my limited familiarity with KNIME’s functionalities, specifically loops, the Column Expressions node, statistical computations, or result visualization posed the greatest challenges in creating this workflow? Close, but the correct answer lies elsewhere.

The primary limitation in using KNIME was my own preconception about my KNIME proficiency. Ultimately, not only did I overcome my self-imposed limitations, but I also gained a new conviction: that KNIME is irresistibly attractive and worth embracing.

P.S. The task was completed in approximately 4 hours.
