DATA STORIES | CHEMINFORMATICS | KNIME ANALYTICS PLATFORM

From Molecules to Data: A Dialogue with Cheminformatic Experts

My Data Guest — An Interview with Christophe Molina & Heather Lambert

Rosaria Silipo
Low Code for Data Science
9 min read · Sep 13, 2023


It was my pleasure to interview Christophe Molina and Heather Lambert from Pikaïros in this new episode of My Data Guest. In this interview, we discuss the role of cheminformaticians and the impact of data science on their field. Our guests explained what a cheminformatician is, shared their biggest career successes, and offered predictions for the future of cheminformatics and data science.

Our Guests

Heather, a Cheminformatician based in Toulouse, France, is one of the world’s top Just KNIME It! KNinjas, and works for Pikaïros. She holds an MSc in Chemistry from Imperial College London.

Christophe, also based in Toulouse, France, is a Data Analyst with extensive research experience in pharmaceutical-related fields, including Cheminformatics, Bioinformatics, and Genomics. He is the founder of Pikaïros, a KNIME Certified Trainer, and excels as a cheminformatician, scientist, mentor, KNIME expert, and machine learning expert.

Rosaria: Let’s start with the question that is now on everybody’s mind. What is a cheminformatician?

Heather: A cheminformatician is a scientist who blends chemistry with informatics, extending into other sciences like biology, pharmacology, and medicine. We’re trying to replicate what happens in a lab in silico, on a computer. One of the biggest parts we focus on is the discovery process. As cheminformaticians, we hope to aid that process for pharmaceutical companies and give them a more targeted research focus.

Christophe: Computer science has become more and more important in all sciences, including the pharmaceutical industry. To excel in this field, you need to have combined skills in chemistry, biology and informatics.

Rosaria: What do you do specifically in your job as a cheminformatician?

Christophe: We can be asked to solve many different tasks. Our basic data is chemical structures. There is also the biology aspect of this research: the proteins. We’re all made of proteins and it’s important to know how chemicals interact with them. You can think of proteins as the lock of the door and the chemical, drug or medicine, as the key. Every day, we manipulate millions of molecules. Eventually, there are maybe a few molecules that can be compatible with the protein. It’s our daily work to find out the best chemicals to solve medicinal problems that are related to proteins.

Rosaria: Would you like to say a few words about Pikaïros, the company you work for? How long have you been working there, and what is the company’s main business?

Heather: Pikaïros is an analytics company, focusing on research, training and consulting. Originally, it was just Christophe. Then I joined the company this year in February. Recently, we welcomed a third colleague, Gabriela. We’re very happy to have her with us.

Christophe: We specialize in life sciences. We train people in companies on introductory subjects about data science applied to life science. Once they master the basics, we continue to upskill them in more advanced training.

We also offer consulting services, where we help clients solve specific problems. If they eventually ask us to do some development, we can do that too, most of it with KNIME Analytics Platform.

Rosaria: You are now KNIME experts and have been around in the community for some time. How did KNIME Software help you with your work? Did it make it faster, more accurate, more agile, or what else?

Heather: You could mention all of those. What I like in particular is being able to see the results at the end of each node. That really helps with collaboration, as it’s a simple way to see what is wrong, what the results were and how they can be improved. Additionally, the visual aspect of KNIME Analytics Platform really allows you to plan out your project.

Christophe: KNIME is the main tool in our company. For training, consulting and providing the clients with the final product, KNIME is used 99% of the time. We can’t imagine Pikaïros without KNIME.

Rosaria: How did you come across KNIME Analytics Platform?

Christophe: I first encountered an early version of this software while working at a pharmaceutical company, around 2006 or 2007. During that time, I was already using another tool, but I remember your visit and the introduction to KNIME Analytics Platform, which we loved.

Rosaria: So you’ve seen the whole evolution and story of KNIME. What do you think changed the most?

Christophe: Plenty of things. There were no variables or components in the early versions. You’ve been so proactive, improving this tool so much. What’s striking is that this tool has coped with the evolution over the years. My biggest challenge in KNIME is to keep up with it.

Heather: I started using KNIME about 7 years ago. Christophe was my first KNIME Trainer and I was his first English-speaking KNIME student. We worked on a project together when Christophe was the consultant for a company that I was working for. I come from a chemistry background and KNIME has really allowed me to get into data science.

Rosaria: When was the “wow” effect, the turning point when you decided that KNIME is the tool to go with?

Heather: For me, right from the start: it’s very intuitive. Before I could work fully independently with KNIME, I used to send some of my work to Christophe, asking him to check it for me. When I could finally create my own workflows and check my work, that was the moment where I felt the “buzz” and wanted to keep going. With Just KNIME It! challenges, I still get those “wow” moments when I do something I’ve never done before.

Rosaria: How long did it take you to gain “workflow independence”?

Heather: I did a few days of training with Christophe. The key was to get active on the platform, and I was also lucky to work in a company that allowed the learning to really sink in. In that regard, it was very quick to build my first full workflow.

Rosaria: Could you name 3 KNIME features or nodes you couldn’t do without?

Heather: I’ll go with the nodes. The Joiner node lets you combine two tables — it’s super useful!

The String Manipulation node is very versatile, as it allows you to perform a wide range of manipulations on strings (e.g., search, replace, capitalization, character removal). Lastly, the GroupBy node is very powerful for computing a wide range of aggregations.

Christophe: I especially like the visual aspect of KNIME, which is a huge advantage compared to other tools and programming languages. Secondly, you can keep track of what’s happening in every step of the process with KNIME. This is useful not only for debugging but also for communicating to others what the workflow does.
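For readers who think in code, the three nodes Heather names have rough equivalents in most scripting languages. The sketch below is only an illustration in plain Python, not how KNIME implements these nodes: the tables, column names, and the helper functions `inner_join`, `clean_name`, and `mean_by` are all hypothetical and invented for this example.

```python
from collections import defaultdict

# Hypothetical toy tables: compound records and assay measurements.
compounds = [
    {"id": "C1", "name": " aspirin "},
    {"id": "C2", "name": "caffeine"},
]
assays = [
    {"id": "C1", "activity": 4.0},
    {"id": "C1", "activity": 6.0},
    {"id": "C2", "activity": 3.0},
]


def inner_join(left, right, key):
    """Combine rows of two tables that share a key value (Joiner-like)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def clean_name(name):
    """Trim whitespace and capitalize a string (String Manipulation-like)."""
    return name.strip().capitalize()


def mean_by(rows, group_key, value_key):
    """Average a numeric column within each group (GroupBy-like)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Join the two tables, tidy the names, then aggregate per compound.
joined = inner_join(compounds, assays, "id")
for row in joined:
    row["name"] = clean_name(row["name"])
mean_activity = mean_by(joined, "name", "activity")
print(mean_activity)  # {'Aspirin': 5.0, 'Caffeine': 3.0}
```

In a KNIME workflow, each of these three steps would be one node, and the intermediate table could be inspected at each node's output.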

Rosaria: Let’s ask a few questions to Christophe as a teacher and to Heather as a learner. Christophe, do you use KNIME Analytics Platform in your courses? If yes, why?

Christophe: Yes, I exclusively use KNIME to teach my courses, which are centered around life sciences or cheminformatics. I always tell my students that KNIME is like a board game: you just need to learn the rules and have fun with it. KNIME is also a platform you can swiftly integrate other technologies with. As such, you can teach many different subjects with it, knowing that the tool will be able to accommodate your needs. Finally, it also has a lot of features that make teaching very easy and enjoyable.

Rosaria: How far can a low-code tool like KNIME go? Where do you stop implementing things?

Christophe: It’s endless. You can do anything: from deep learning, to connecting to databases or programs in R, Python and Java. You can capitalize on the power of these scripting languages, if you ever need them.

Rosaria: Heather, was it still useful to take part in the Just KNIME It! challenges after all the experience you have with KNIME?

Heather: Absolutely. Just KNIME It! is a weekly challenge format by KNIME itself, where you have one week to complete the challenge on the platform. You share your solution on the KNIME Community Hub, post your write-up on the KNIME Community Forum, and discuss it among other community members. I’m currently trying to transition my KNIME skills to more research-focused skills. Doing these challenges has allowed me to learn various techniques, tips, and tricks that are highly transferable to the field of cheminformatics as well as other fields you might not encounter in your everyday work.

Rosaria: Would you advise newbies to try Just KNIME It! challenges? Are they too hard, or perhaps too easy?

Heather: I definitely recommend them, along with KNIME courses and the KNIME Forum.

Rosaria: What would you advise a beginner in data science and KNIME Analytics Platform?

Christophe: We offer on-site KNIME courses at Pikaïros that are available in French, English, and Spanish. Additionally, I would recommend visiting the KNIME courses website to explore the online offerings. Accessing educational resources, such as the KNIME Forum, the KNIME Community Hub, the KNIME Blog, data stories on Low Code for Data Science (KNIME’s community journal on Medium), and whitepapers, is also highly advisable. Finding information is not difficult at all.

Heather: After completing a beginner’s course, the best approach is to actively incorporate KNIME Analytics Platform into your daily routine. You can start by attempting tasks with KNIME that you would typically perform using other software. If you don’t have the chance to use KNIME at work, consider integrating it into your personal life, such as tracking sports team results or pursuing other engaging projects that require you to develop solutions from beginning to end.

Rosaria: Heather, in Season 2 of the weekly Just KNIME It! challenges, you’re always on top of the game, proposing polished and advanced solutions. What’s the aspect of the challenges you like the most?

Heather: I like that they are different every week. Sometimes, it gets technical, while other times, it’s intuitive and interactive. I also like the community spirit and sharing on the forum. It keeps you accountable and motivates you to submit your work every week.

Rosaria: Could you tell us about the biggest success of your career? The one project you are the most proud of.

Heather: For me, it was the first project that Christophe and I worked on together. We were making a search website for chemicals. I worked on a section that allowed users to combine multiple chemical motifs to narrow down their search. It was a great feeling to see people actually using it.

Christophe: Heather built the first Google-like engine to look for chemicals. Personally, I’m very proud of having created my company 8 years ago. I worked alone for some time, not even thinking of hiring people. Recently, I’ve had the opportunity to start hiring people. I hope to continue developing this company, definitely with the help of KNIME and the supportive community.

Rosaria: We are slowly approaching the end of this interview. Before saying goodbye, we have a couple of final questions. How do you see the evolution of data science in general or in cheminformatics in the next few years?

Heather: There’s always a new buzzword going on every year. Nowadays, it’s ChatGPT and AI influence and improvements in that field.

Christophe: Recently, a fascinating in-silico discovery has been made regarding the folding of proteins. While we have knowledge of the protein’s sequence, determining its actual 3D shape has always been a challenge. In the past, there were tools that made educated guesses, but they weren’t particularly accurate in research. However, about two years ago, AlphaFold-2 emerged in the field, allowing us to predict the shapes of various types of proteins more accurately. This capability is complemented by generative models for molecules and the potential to analyze the docking of billions of virtual molecules on thousands of proteins. These advancements are progressing at a remarkable pace.

Rosaria: Did you have a chance to test K-AI, KNIME’s built-in artificial assistant?

Christophe: We tested the nightly release that already allowed us to connect ChatGPT with KNIME. It was impressive, and we’re looking forward to using it. This will assist us in automatically gathering information digested from various life science sources and consequently, accelerate our research. While resource verification remains essential and mostly manual work, relying on AI for resource collection saves an enormous amount of time.

Rosaria: Thank you, Christophe and Heather, for the great conversation and for sharing your advice.

Watch the original interview with Christophe Molina & Heather Lambert on YouTube.


Rosaria Silipo
Low Code for Data Science

Rosaria has been mining data since her master’s degree, through her doctorate and the job positions that followed. She is now a data scientist and KNIME evangelist.