Smooks processing recipes

Introduction

Paul Parsons
The Server Labs
Dec 17, 2010


In one of our customer projects, we had to import CSV, fixed-length and Excel files in several different formats and store the records in a database. We chose Smooks to accomplish this task.

Smooks is a Java framework for reading, processing and transforming data from various sources (CSV, fixed-length, XML, EDI, …) into various destinations (XML, Java objects, a database). It convinced me because:

  • it provides out-of-the-box components for reading CSV and fixed-length files
  • it integrates smoothly with ORM libraries (Hibernate, JPA)
  • processing is configured in an XML configuration file, so only a few lines of Java code are needed to run the transformations
  • it is extensible: implementing a custom Excel reader was relatively easy
  • its filtering overhead is low: reading 100,000 CSV lines and storing them in the database with Hibernate took us less than 30 seconds

During development, we had to overcome some hurdles imposed by the Smooks processing model. In this post, I would like to share the practical experience we gained working with Smooks. First, I’m going to present a sample transformation use case with requirements similar to a real-world assignment. Then I will present solutions to these requirements in a ‘how-to’ style.

Use case

We are developing a ticketing application. The heart of our application is the Issue class.
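A minimal sketch of what such a class might look like follows. The property names come from the mapping rules listed further down; the types are assumptions chosen for illustration (String for project and involvedPersons, java.time.LocalDateTime for the timestamps, and a Priority enum with LOW, MEDIUM and HIGH).

```java
import java.time.LocalDateTime;

// Priority levels referenced by the mapping rules (LOW, MEDIUM, HIGH).
enum Priority { LOW, MEDIUM, HIGH }

// Sketch of the Issue domain class. Property names follow the mapping rules;
// the field types are assumptions made for illustration.
class Issue {
    private String description;
    private String project;          // not part of the import; assigned manually
    private Priority priority;
    private String involvedPersons;  // e.g. "reporter" or "reporter;assignee"
    private LocalDateTime created;
    private LocalDateTime updated;

    // JavaBean-style accessors, so a bean-binding tool can populate the object.
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }

    public String getProject() { return project; }
    public void setProject(String project) { this.project = project; }

    public Priority getPriority() { return priority; }
    public void setPriority(Priority priority) { this.priority = priority; }

    public String getInvolvedPersons() { return involvedPersons; }
    public void setInvolvedPersons(String involvedPersons) { this.involvedPersons = involvedPersons; }

    public LocalDateTime getCreated() { return created; }
    public void setCreated(LocalDateTime created) { this.created = created; }

    public LocalDateTime getUpdated() { return updated; }
    public void setUpdated(LocalDateTime updated) { this.updated = updated; }
}
```

Plain getters and setters are used because bean-binding frameworks typically populate objects through setters.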

We have to write an import and conversion module for an external ticketing system. For the sake of simplicity, the data arrives in CSV format. The domain model of the external system differs slightly from ours; however, issues coming from the external issue tracker can be mapped to our Issues.
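Conceptually, Smooks’ CSV reader turns each line into a record with named fields before any mapping happens. To make that step concrete, here is a naive plain-Java stand-in using the eight field names of the external exchange format. The comma separator, the absence of quoted values and the class name ExternalRecordParser are all assumptions; a real import would rely on the Smooks reader instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a CSV reader: splits one line of the external
// format into named fields. Assumes a comma separator and no quoted values.
class ExternalRecordParser {
    static final String[] FIELDS = {
        "description", "priority", "reporter", "assignee",
        "createdDate", "createdTime", "updatedDate", "updatedTime"
    };

    static Map<String, String> parse(String line) {
        // limit -1 keeps trailing empty fields instead of dropping them
        String[] values = line.split(",", -1);
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException(
                "Expected " + FIELDS.length + " fields but got " + values.length);
        }
        // LinkedHashMap preserves the field order of the exchange format
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            record.put(FIELDS[i], values[i].trim());
        }
        return record;
    }
}
```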

The external system’s exchange format defines the following fields: description, priority, reporter, assignee, createdDate, createdTime, updatedDate, updatedTime. They should be mapped to our Issue as follows:

  • description property: description field
  • project property: there is no project field; the project has to be assigned manually
  • priority property: priority field; P1 and P2 map to Priority.LOW, P3 to Priority.MEDIUM, P4 and P5 to Priority.HIGH
  • involvedPersons property: reporter field, plus the assignee field if it is not empty (appended using ‘;’ as the separator)
  • created property: createdDate and createdTime fields merged
  • updated property: updatedDate and updatedTime fields merged
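The non-trivial rules above (priority translation, joining reporter and assignee, and merging date and time fields) can be expressed as small plain-Java functions. This sketch is illustrative only: the class name IssueFieldMapping, the ISO date and time formats (yyyy-MM-dd, HH:mm) and the decision to reject unknown priorities are assumptions, and it repeats the Priority enum so that it compiles on its own.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

enum Priority { LOW, MEDIUM, HIGH }

// Hypothetical helper implementing the mapping rules as pure functions.
class IssueFieldMapping {

    // P1, P2 -> LOW; P3 -> MEDIUM; P4, P5 -> HIGH; anything else is rejected
    static Priority mapPriority(String externalPriority) {
        switch (externalPriority.trim()) {
            case "P1": case "P2": return Priority.LOW;
            case "P3": return Priority.MEDIUM;
            case "P4": case "P5": return Priority.HIGH;
            default: throw new IllegalArgumentException("Unknown priority: " + externalPriority);
        }
    }

    // Reporter alone, or "reporter;assignee" when an assignee is present
    static String mapInvolvedPersons(String reporter, String assignee) {
        if (assignee == null || assignee.trim().isEmpty()) {
            return reporter;
        }
        return reporter + ";" + assignee.trim();
    }

    // Merges separate date and time fields; assumes ISO formats such as
    // "2010-12-17" and "09:30" (seconds optional)
    static LocalDateTime mergeDateTime(String date, String time) {
        return LocalDateTime.of(LocalDate.parse(date.trim()), LocalTime.parse(time.trim()));
    }
}
```

Keeping each rule as a small pure function makes it easy to unit-test independently of whatever processing framework eventually calls it.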
