Stream Processing for Enterprise IT:

Should I Build or Should I Buy?

The easiest way to move from batch processing to stream processing is to call an outside IT vendor such as Microsoft (Azure), SAP (Event Stream Processor), or IBM, tell them to take care of everything, and write them a check. But you’d better make sure you’re sitting down when you write that check, because the amount of money these companies will need to build your stream processing infrastructure with their proprietary software and systems might make your knees buckle.

Save Money With Free and Open Source Software

Illustration courtesy of Grid Dynamics

The alternative to big-money proprietary stream processing software is, of course, Free and Open Source Software. Besides sponsoring the renowned Apache HTTP Server, long one of the most widely used Web servers in the world, the Apache Software Foundation sponsors dozens of other projects, including the building blocks needed for enterprise-level Big Data manipulation and In-Stream Processing, such as Apache Kafka, Apache Spark, Apache Flink, and Apache Storm.

The cost of these many fine, well-tested software packages is typically zero dollars (or yen, rubles, euros, etc.). Plus, you have the right to customize and improve any Apache-licensed program any way you like. The Apache License does not require you to share your changes, but you are encouraged to contribute your modifications, or at least your bug fixes, back to the project so that others can use them, too. This is, in effect, your licensing fee. You pay yours, others pay theirs, and many companies kick in a little financial support to keep the Apache Software Foundation itself going. That shared effort is what keeps the total cost of obtaining software this way tiny compared to the cost of buying commercial equivalents.

The Do-It-Yourself Aspect of Open Source Software

When you’re ready to set up your shiny new real-time stream processing system to extract business-usable information from your company’s kilometer-wide river of sales, purchasing, and operating expense data, who are you going to call? Not Ghostbusters, and not the Apache people. You’re basically on your own, unless you hire some of the people who develop the software you plan to use. There is nothing wrong with this. It is often a great way to get finely tuned, customized applications that give you better performance and reliability than you are likely to get from standard COTS (Commercial Off-the-Shelf) software.

But what if you need to integrate a number of different packages to achieve your desired results? Developers who specialize in one project aren’t going to do you much good. Suddenly, you have another choice to make: Should you do it yourself or hire a consulting company to help?

There is a good, free tool to help you make that decision. Before I mention it, though, I need to disclose two things: I am a contract editor for Grid Dynamics, the company that produces this tool, and the tool is not specifically intended to help you decide whether your IT staff can set up In-Stream Processing without help from an outside consultant, even though it is well suited to that purpose. Now, with that out of the way…

The In-Stream Processing Blueprint

Illustration courtesy of Grid Dynamics

The tool is an In-Stream Processing Service Blueprint that is pre-integrated, uses nothing but well-supported free and open source software, and is 100% production-ready. You can use it with any Big Data platform, and extend or modify it in any way you like.

By using this Blueprint, you can do a rapid, low-cost test of how In-Stream Processing can help you gain new, more timely insights into your company’s operations and, at the same time, help you evaluate your staff’s ability to set up, customize, and expand a real-time system without help from outsiders.

Don’t forget: however much you pay an outside company to help you set up In-Stream Processing using free, open source tools, it is likely to be less (often orders of magnitude less) than you’d pay for a proprietary stream processing solution. You also avoid vendor lock-in, which, as an IT manager friend of mine puts it, “can lead to early hair loss, poor sleep, and unexpectedly blown budgets.” But is hiring outside help necessarily the best route? It is also possible that your in-house people can move you into the real-time processing world on their own. Then again…

Decisions, decisions, decisions

Real-time processing (or In-Stream Processing) is an obvious choice for any enterprise more modern than a horse-and-carriage livery stable. But as I have just described, once you make that choice you have several others to make before you can implement In-Stream Processing, especially if you want to do it in the fastest and most cost-effective way possible. Sure, you can hire consultants to evaluate your unique business situation and recommend one solution above all others, and maybe that’s a good first step. Still, working with the Grid Dynamics In-Stream Processing Blueprint, or something similar to it, can give you real-life experience deploying and operating a real-time data processing environment at a much lower cost than building a fully custom open source pilot system from scratch or hiring a proprietary vendor to run a pilot project. That is why I so heartily recommend the Blueprint method of getting your (figurative) IT feet wet with stream processing before you and your company dive into the deep end of the real-time processing pool.

Robin “Roblimo” Miller is a freelance writer and former editor-in-chief at Open Source Technology Group, the company that owned SourceForge, freshmeat, Linux.com, NewsForge, ThinkGeek and Slashdot. Now he’s mostly retired, but still works part-time as an editorial consultant for Grid Dynamics, and writes regularly for FOSS Force.