Of open-source LLMs and ever-green ML principles

Our fortnightly selection of must-reads from the community, for the community

Roberto Cadili
Low Code for Data Science

--

As data scientists in the age of AI, it’s imperative to keep our knowledge and skills up to date, especially now that open-source technology and models are flourishing and becoming increasingly easy to access. With the rapid growth of open-source models, the gap between open-source and closed-source solutions is also narrowing significantly. At the same time, amid the AI revolution, we should not forget a few fundamental concepts in data science. To ensure the generalizability of our predictive models, it’s crucial to avoid pitfalls in the way we partition the data. After all, Mixtral 8x7B or Llama 3 wouldn’t be as advanced as they are if their developers and data scientists had neglected basic machine learning principles.

The articles that we selected for this edition of the Workflow focus on KNIME’s ability to integrate with cutting-edge, open-source LLMs, its use as a teaching tool, and its geospatial features. They range from an inspiring tutorial on connecting to Llama 3 via the Ollama wrapper and designing a low-code chatbot, to an illustration of three common pitfalls (and their solutions) when splitting data into training and testing sets for machine learning. The last story is a brilliant analysis of the impact of fire station relocation in the Netherlands using geospatial analytics. Happy reading!

Photo by Milad Fakurian on Unsplash.

Chat with local Llama 3 Model via Ollama in KNIME Analytics Platform — Also extract Logs into structured JSON Files

By Markus Lauber

LLMs are still going strong and development is moving fast, especially when it comes to open-source and local LLMs, such as the GPT4All project. Contributing to the conversation around local and open-source LLMs, in this article Markus Lauber explores how to leverage the much-hyped Llama 3 by Meta via the Ollama wrapper and prompt the model as a low-code chatbot using KNIME. How? By creating a small JSON prompt in a Table Creator node and then sending it to Ollama via a POST Request (to a localhost URL). The results are returned in JSON format, which you can then use and inspect. A brilliant tutorial that brings together advanced LLMs, orchestration and low-code!
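For readers who think in code, the same request can be sketched in Python instead of KNIME nodes. This is a minimal sketch under stated assumptions, not the workflow from the article: it assumes Ollama is running locally on its default port 11434, that the model is available under the name `llama3`, and that the `/api/generate` endpoint is the one being called. The helper names `build_payload` and `ask_ollama` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama3"):
    """Build the small JSON prompt (the role played by the Table Creator node)."""
    # "stream": False asks for one complete JSON reply instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local server via POST and parse the JSON reply."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running, `ask_ollama("Explain data leakage in one sentence.")` should return a dictionary whose generated text sits under the `response` key, alongside metadata that can be inspected or written to a file.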

Avoid Common Pitfalls When Splitting Data Into Training and Testing

By Carsten Lange

In machine learning, splitting data into training and testing sets is crucial. The training set is used to train the model, allowing it to learn patterns and make predictions. The testing set, which the model has not seen before, is used to evaluate its performance and its ability to generalize to new data. And yet, although this is a broadly accepted practice, a number of conceptual pitfalls can creep into the process. In this article, Carsten Lange uses example workflows built in KNIME to highlight three common pitfalls when splitting data into training and testing sets. From data leakage to incorrect normalization, the author shows how to correct those pitfalls and ensure the generalizability of your ML model predictions. Check it out!
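One of the pitfalls mentioned above, incorrect normalization, can be illustrated in a few lines of plain Python. This is a minimal sketch, assuming z-score normalization and made-up feature values; the article itself works with KNIME workflows, and the function names here are hypothetical. The key point is that the mean and standard deviation are learned from the training set only and then reused on the testing set, so no information from the test data leaks into preprocessing.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Learn mean and standard deviation from the training values only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # avoid division by zero for constant features
    return mean, std


def apply_scaler(values, scaler):
    """Apply previously learned statistics; nothing is re-estimated here."""
    mean, std = scaler
    return [(v - mean) / std for v in values]


values = [float(v) for v in range(1, 101)]  # made-up feature values
train, test = train_test_split(values)

scaler = fit_scaler(train)                 # statistics come from train only
train_scaled = apply_scaler(train, scaler)
test_scaled = apply_scaler(test, scaler)   # test reuses the train statistics
```

The scaled testing set will generally not have a mean of exactly zero, and that is expected: it reflects how unseen data would look to a model that was fitted on the training set alone.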

The Impact of Fire Station Relocation Using Geospatial Analysis in KNIME

By Hans Samson

A fire is never a welcome occurrence, but having a responsive fire department can make all the difference. In the Netherlands, the standard dictates that the fire department should arrive within 8 minutes of receiving a call, a crucial factor in minimizing damage and ensuring safety. However, what happens when this response time is jeopardized by the closure of a fire station? In this article, Hans Samson uses KNIME to explore the implications of such an event, delving into a low-code geospatial analysis to uncover the impact on community safety. The author identifies points of interest (POIs), computes routing distance matrices and displays the findings on an interactive map. A great piece that you should not miss!
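The idea behind such an analysis can be sketched in Python as a small travel-time matrix between stations and POIs. This is only a rough illustration under loud assumptions: it swaps the routing distances used in the article for straight-line (haversine) distances, assumes a constant average speed of 50 km/h, and uses invented coordinates and names. Only the 8-minute target comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0
TARGET_MINUTES = 8.0       # response standard quoted in the text
ASSUMED_SPEED_KMH = 50.0   # assumption: average speed of a fire engine


def haversine_km(a, b):
    """Straight-line distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def travel_minutes_matrix(stations, pois):
    """Estimated minutes from every station to every POI."""
    return {
        poi: {
            station: haversine_km(s_xy, p_xy) / ASSUMED_SPEED_KMH * 60
            for station, s_xy in stations.items()
        }
        for poi, p_xy in pois.items()
    }


def covered(matrix, open_stations):
    """POIs that at least one open station reaches within the target time."""
    return {
        poi for poi, row in matrix.items()
        if min(row[s] for s in open_stations) <= TARGET_MINUTES
    }


# Hypothetical coordinates, for illustration only.
stations = {"A": (52.00, 5.00), "B": (52.10, 5.00)}
pois = {"school": (52.01, 5.00), "farm": (52.05, 5.00), "hospital": (52.09, 5.00)}

matrix = travel_minutes_matrix(stations, pois)
before = covered(matrix, {"A", "B"})
after = covered(matrix, {"A"})   # scenario: station B closes
lost = before - after            # POIs that fall outside the 8-minute target
```

Comparing `before` and `after` shows which POIs lose coverage when a station closes. A real analysis would replace `haversine_km` with road-network travel times, which is what routing distance matrices provide.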

We love learning new creative solutions using KNIME from the articles that we publish, and we love to share them with you. We are proud to be building together a thriving community whose members support each other, share experiences, and shape the future of low-code data science.

See you in the next Workflow,

The Editors of Low Code for Data Science

PS: 📅 #HELPLINE. Want to discuss your article? Need help structuring your story? Make a date with the editors via Calendly (every second Thursday).

--

Roberto Cadili
Low Code for Data Science

Data scientist at KNIME, NLP enthusiast, and history lover. Editor for Low Code for Data Science.