Integrate quickly and simply from all different data sources

Published in

ManoMano Tech team

5 min read · Jul 12, 2022


Every company faces integration challenges when collecting data from different sources such as files, APIs, and queues. Looking for a simple integration framework to handle this? Apache Camel can help!
Apache Camel home page.

This article is mainly addressed to backend developers, but also to anyone curious about a way to manage many different sources with an easy integration. I will introduce Apache Camel through some basic examples, and then help you implement the demo project successfully.

Apache Camel demo project

The demo project shows different ways to consume data, and also how easy and fast it is to code them with Apache Camel.

In this project you will find:

File consumption with different formats (CSV, JSON)
API consumption
Processors that stand in for business code
Storing data into a file in the right format

Some examples and explanations of basic methods and architecture

Reminder: keep in mind that the aim is to give an overview of some functionalities, so if some choices do not look like the cleverest way to do things, that is expected. The little project is available here.

You have to customize the application.yml with your own properties, and if you have any questions, don’t hesitate to add a comment on the GitLab project.

So first, let’s talk about the context:

And here is the file system dedicated to this project:

In the csv folder I have just one file, which contains 3 rows, each with a seller name and a product id. In the json folder I have two files: one represents a v1 with seller and product_id fields, and the other represents a v2 with seller_name and product_id fields.

Here is a diagram showing the full process of this demo project:

The project simply takes each product id given in the input files and compares the seller name found in those files with the one returned by the channel management API.

We will proceed step by step, starting with consuming the files and ending with formatting the output.

Step 1 : Consuming files (InputRoute.class)

Here is the code to consume files from the two folders (csv and json):

The from method declares your source. The route then listens for events from this source: here, when a file is present in the given folder, it is processed automatically.

The choice method lets you declare several possible branches. Here it says that if the file name contains “_V2”, the file is processed differently.

For V1 we just log it and let jsonProcessor do its job, whereas for V2 we unmarshal it first and then call another processor.

At the end, in all cases, we send it to the next route using the to method.
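To make the branching concrete without pulling in Camel, here is a minimal plain-Java sketch of the decision the choice block takes. The class, enum, and file names are hypothetical and only for illustration; in the Camel DSL this kind of decision is usually written with choice(), when() and otherwise().

```java
// Plain-Java model of the routing decision; it runs without any Camel dependency.
// JsonVersionChoice, Branch and the sample file names are hypothetical.
final class JsonVersionChoice {

    enum Branch { V1_LOG_AND_PROCESS, V2_UNMARSHAL_AND_PROCESS }

    // Mirrors "if the file name contains _V2, process it differently".
    static Branch branchFor(String fileName) {
        return fileName.contains("_V2")
                ? Branch.V2_UNMARSHAL_AND_PROCESS
                : Branch.V1_LOG_AND_PROCESS;
    }

    public static void main(String[] args) {
        System.out.println(branchFor("sellers_V2.json"));
        System.out.println(branchFor("sellers.json"));
    }
}
```

Whatever branch is taken, the result then continues to the same next step, which is what the final to call expresses in the route.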

The second route contains a split method because here, for example, we want to handle the data element by element. So the file is processed and split. The streaming method sends each piece to the next route as soon as it is split, whereas without it the split method has to split everything first in order to know the size.
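The difference between the two split modes can be sketched in plain Java. This is not Camel code, only a model of the idea under the assumption that the parts are lines of text; SplitModes and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical model of "split everything first" versus "split in streaming mode".
final class SplitModes {

    // Non-streaming: every part is read first, so the total size is known up front.
    static List<String> splitAll(Reader source) {
        return new BufferedReader(source).lines().collect(Collectors.toList());
    }

    // Streaming: each part is handed to the next step as soon as it is read;
    // the total count is only known once the source is exhausted.
    static int splitStreaming(Reader source, Consumer<String> nextStep) {
        Iterator<String> parts = new BufferedReader(source).lines().iterator();
        int count = 0;
        while (parts.hasNext()) {
            nextStep.accept(parts.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "row1\nrow2\nrow3"; // sample content, invented for the demo
        List<String> all = splitAll(new StringReader(content));
        System.out.println("size known before processing: " + all.size());
        int seen = splitStreaming(new StringReader(content),
                part -> System.out.println("processing " + part));
        System.out.println("size known after processing: " + seen);
    }
}
```

The streaming variant keeps memory usage flat on large files, at the cost of not knowing the number of parts until the end.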

A processor in Apache Camel is an interface that lets you consume the exchange and run custom code. Let’s look at the csvProcessor to understand it.

Here is the code that receives the CSV file and extracts data to push it to the next step:

The class has to implement the Processor interface and your entry point will be the process method.

Here we just parse the CSV file, put the data we found in a map (sellerName, productId), and set that map as the body of the exchange.
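As a rough idea of the parsing part alone, here is a minimal sketch in plain Java. It assumes a comma separator, no header row, and the seller name in the first column; none of this is confirmed by the demo project, and CsvSellerParser is a hypothetical name. Inside a Camel processor, the content would typically be read with exchange.getIn().getBody(String.class) and the map written back with exchange.getIn().setBody(map).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the parsing logic a CSV processor could delegate to.
final class CsvSellerParser {

    // Returns sellerName -> productId, keeping the row order of the file.
    // Assumption: two comma-separated columns per row, no header, no quoting.
    static Map<String, String> parse(String csvContent) {
        Map<String, String> productIdBySeller = new LinkedHashMap<>();
        for (String line : csvContent.split("\\R")) {
            if (line.isBlank()) {
                continue; // ignore empty lines, e.g. a trailing newline
            }
            String[] columns = line.split(",", -1);
            if (columns.length < 2) {
                throw new IllegalArgumentException("Expected 2 columns but got: " + line);
            }
            // A seller that appears twice keeps only its last product id.
            productIdBySeller.put(columns[0].trim(), columns[1].trim());
        }
        return productIdBySeller;
    }

    public static void main(String[] args) {
        // Sample rows invented for the demo.
        System.out.println(parse("Seller A,1001\nSeller B,1002\nSeller C,1003"));
    }
}
```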

An exchange is the main object: it contains a body, which is the payload of your “event”, and headers. The body type depends on your from: here it is a File, for example, but it could be a simple String or anything else.

Step 2 : Calling API and merging data (Merge.class)

Here is the code that receives, as body, a map of seller name and product id:

The architecture is the same: the route starts with a from, then a processor makes the REST API call to the channel management endpoint with the given product id. The output of this route is an object that contains the seller name from the input files, the product id, and a boolean telling us whether the name matched or not.
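The shape of that output object and of the comparison can be sketched as follows. This is only an illustration: the real API call is replaced by a lookup function passed as a parameter, the comparison is assumed to be an exact string match, and SellerMerge and SellerCheck are hypothetical names.

```java
import java.util.function.Function;

// Hypothetical sketch of the merge step; the REST call is abstracted as a Function.
final class SellerMerge {

    // Output object: seller name from the file, product id, and the match flag.
    static final class SellerCheck {
        final String sellerName;
        final String productId;
        final boolean matches;

        SellerCheck(String sellerName, String productId, boolean matches) {
            this.sellerName = sellerName;
            this.productId = productId;
            this.matches = matches;
        }
    }

    // sellerLookup stands in for the channel management API: productId -> seller name.
    static SellerCheck check(String sellerFromFile, String productId,
                             Function<String, String> sellerLookup) {
        String sellerFromApi = sellerLookup.apply(productId);
        // Assumption: exact, case-sensitive equality; a null API answer never matches.
        return new SellerCheck(sellerFromFile, productId, sellerFromFile.equals(sellerFromApi));
    }

    public static void main(String[] args) {
        // Stubbed lookup with invented data instead of a real HTTP call.
        Function<String, String> stub = id -> "1001".equals(id) ? "Seller A" : null;
        System.out.println(check("Seller A", "1001", stub).matches);
        System.out.println(check("Seller B", "1002", stub).matches);
    }
}
```

Passing the lookup as a function keeps the comparison testable without a running API.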

Step 3 : Formatting output

Here is the code that computes the name of the output file and stores it in the correct output directory:
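The naming rule used by the demo project is not described here, so the sketch below uses an invented one (product id plus a match or mismatch suffix) just to show the two parts of the step: computing a name and writing into an output directory. OutputNaming, the file name pattern, and the directory are all hypothetical. With Camel's file component, the computed name is typically passed through the CamelFileName header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the naming pattern below is invented for illustration.
final class OutputNaming {

    // Builds a file name such as "product-1001-match.txt".
    static String fileNameFor(String productId, boolean matches) {
        return "product-" + productId + (matches ? "-match" : "-mismatch") + ".txt";
    }

    // Writes the content under outputDir, creating the directory if needed.
    static Path store(Path outputDir, String productId, boolean matches, String content) {
        try {
            Files.createDirectories(outputDir);
            Path target = outputDir.resolve(fileNameFor(productId, matches));
            return Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as store(Path.of("output"), "1001", true, "Seller A;1001;true") would then write one result file per compared product, under the assumed pattern.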

Conclusion

Even if the example is not a complex one, it shows that the Apache Camel framework is really simple and lets you do almost everything you need, either with the framework’s methods or with a custom processor. Its simplicity is a real strength for maintainability, and it also makes features such as a double run easy to set up.

I used this framework in my last 2 jobs. The first time was on the application @CTES (https://www.collectivites-locales.gouv.fr/institutions/ctes-dematerialisation-de-la-transmission-des-actes), a government application for the “Dematerialisation of the transmission of legal documents”. To make it scalable and fast enough to consume all the flows in good time, we chose Apache Camel, and it was a successful choice.

The second time I used it was when we had to process more than 1 million PDF bills, extracting data and storing it in a database in less than 1 day. Apache Camel did the job successfully there too.

Bonus points with hawtio

It is hard to present hawtio better than a quick tutorial video does. In a few words, hawtio automatically discovers all your routes and lets you monitor them, with a lot of metrics, pre-configured dashboards, and a really useful graphical view. Hawtio / Hawtio example

We ❤️ learning and sharing

If you’d like to get in touch on any of the subjects above or about QA in general, I’m always reachable through my LinkedIn profile. Drop me a line! Whether you had a similar or totally different experience, I’d love to hear about it.

Oh, and by the way: we are hiring in France and Spain.


Written by Robin Lamberte