Smooks processing recipes
In one of our customer projects we had a requirement to import CSV, fixed length and Excel files in different formats and store records in the database. We chose Smooks to accomplish this task.
Smooks is a Java framework to read, process and transform data from various sources (CSV, fixed length, XML, EDI, …) to various destinations (XML, Java objects, database). It convinced me because:
- it brings out-of-the-box components to read CSV and fixed length files
- it integrates smoothly with an ORM library (Hibernate, JPA)
- processing is configured using an XML configuration file — you need only a few lines of code to do the transformations
- extensibility — implementing a custom Excel reader was relatively easy
- low added filtering overhead — reading 100,000 CSV lines and storing them in the database using Hibernate took us less than 30 seconds
During development we had to overcome some hurdles imposed by the Smooks processing model. In this post I would like to share the practical experience we gained working with Smooks. First, I'm going to present a sample transformation use case with requirements similar to a real-world assignment. Then I will present solutions to these requirements in a 'how-to' style.
We are developing a ticketing application. The heart of our application is
We have to write an import and conversion module for an external ticketing system. Data comes in the CSV format (for the sake of simplicity). The domain model of the external system is slightly different than ours; however, issues coming from the external issue tracker can be mapped to our Issues.
The external system's exchange format defines the following fields: description, priority, reporter, assignee, createdDate, createdTime, updatedDate, updatedTime. They should be mapped to our Issue in the following manner:
- description property — description field
- project property — there is no project field. Project should be assigned manually
- priority property — priority field; P1 and P2 priorities should be mapped to Priority.LOW, P3 to Priority.MEDIUM, P4 and P5 to
- involvedPersons property — reporter field plus assignee field if not empty (append assignee using ‘;’ separator)
- created property — merge createdDate and createdTime fields
- updated property — merge updatedDate and updatedTime fields
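To make the mapping rules above concrete, here is a minimal plain-Java sketch of the three non-trivial conversions. It is not Smooks configuration and not the code from our project; it only illustrates the intended behaviour. The class and method names (IssueMappingSketch, mapPriority, involvedPersons, merge) are hypothetical. Two things are assumptions: the target for P4 and P5 is taken to be a Priority.HIGH constant, and the date and time fields are taken to arrive in ISO form (yyyy-MM-dd and HH:mm:ss), since the exchange format's actual patterns are not stated here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Plain-Java illustration of the mapping rules; all names are hypothetical.
class IssueMappingSketch {

    // Stand-in for the application's Priority enum.
    // HIGH is an assumption: the target for P4/P5 is not spelled out above.
    enum Priority { LOW, MEDIUM, HIGH }

    // P1 and P2 -> LOW, P3 -> MEDIUM, P4 and P5 -> HIGH (assumed).
    static Priority mapPriority(String external) {
        switch (external.trim()) {
            case "P1":
            case "P2":
                return Priority.LOW;
            case "P3":
                return Priority.MEDIUM;
            case "P4":
            case "P5":
                return Priority.HIGH;
            default:
                throw new IllegalArgumentException("Unknown priority: " + external);
        }
    }

    // Reporter, plus the assignee appended with ';' when the assignee is not empty.
    static String involvedPersons(String reporter, String assignee) {
        if (assignee == null || assignee.trim().isEmpty()) {
            return reporter;
        }
        return reporter + ";" + assignee;
    }

    // Merges separate date and time fields into one timestamp.
    // Assumes ISO input such as "2011-03-15" and "14:30:00".
    static LocalDateTime merge(String date, String time) {
        return LocalDateTime.of(LocalDate.parse(date), LocalTime.parse(time));
    }
}
```

For example, involvedPersons("alice", "bob") yields "alice;bob", while an empty assignee leaves only the reporter. Each of these rules needs more than a one-to-one field copy, which is why they are worth treating separately.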