Today I am releasing an alpha version of a pipeline framework I’m excited about, react-pipeline. The framework lets a developer build task pipelines using React idioms and JSX. The current implementation is a bit rough around the edges, but it works. I have also built an example application, cleverly entitled react-pipeline-example.
Here is an excerpt from the README.md that explains why I decided to build it:
For the past four years I have worked primarily on big data projects utilizing various technologies like Hadoop, Pig, Hive, Spark, etc. In all of these projects I’ve needed to execute a number of tasks in order to reach my project goal. An example would be resolving a user profile’s city via their postal code, joining that data to a larger profile data set, wrangling the resolved data set into a structure I could use for analysis, and then running some algorithms on that data for the final result. Each step in the pipeline may utilize a different technology: resolving cities from postal codes may require an application written in Python or Node to fetch information from Google’s Geocoding API, joining and wrangling the data may utilize Pig on Hadoop, and the final analysis may utilize Spark.
In the past I have used Luigi, AWS Data Pipeline, and custom pipeline code to string these tasks together. One fateful day, I was working on a React project while some of my data analysis tasks were running, and I realized that describing a pipeline using JSX and executing its tasks under React+Redux would be intuitive and would allow me to easily write tasks involving server code in Node.js. Thus react-pipeline was born.