Brief Comparison of Existing ETL Software

Outplay Entertainment has always gathered data to inform the game development process. We are taking this philosophy of data-informed decisions right down to the level of improving each player's experience of Outplay's games. To do this we are expanding our capability to gather data and to transform it so that it tells us how best to make our games as fun and engaging as they can be. To help us, we are reviewing existing tooling for ETL processing, both on the market and in the open source arena. Below is a brief rundown of the main features of a broad subset of the tools reviewed.

Table 1 below summarises various software available for Extract, Transform and Load processing, commonly known as ETL. It focuses on tools which, where possible, provide cloud-based end-to-end functionality, but some desktop-based processing tools are included to give a broad view of the tool sets available. Table 2 is an overview of the main functionality of the different products.
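To make the three stages concrete, here is a minimal sketch of an extract, transform and load pipeline using only the Python standard library. It is not taken from any of the tools reviewed; the sample data, the column names (player_id, level, seconds_played) and the play_time table are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names are made up for this sketch.
RAW_CSV = """player_id,level,seconds_played
p1,1,120
p1,2,95
p2,1,
p2,2,210
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and total the play time per player."""
    totals = {}
    for row in rows:
        if not row["seconds_played"]:
            continue  # skip rows with a missing measurement
        player = row["player_id"]
        totals[player] = totals.get(player, 0) + int(row["seconds_played"])
    return sorted(totals.items())


def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS play_time "
        "(player_id TEXT PRIMARY KEY, seconds INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO play_time VALUES (?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run the three stages in order and return what ended up in the table."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT player_id, seconds FROM play_time ORDER BY player_id"
    ).fetchall()
```

Calling run_pipeline(sqlite3.connect(":memory:")) runs the whole flow against an in-memory database. The tools in the tables below wrap this same pattern with scheduling, monitoring, connectors and scale.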

Table 1: ETL Software.

Software — Supplier — Model [1] — Link

ETL Tools — DB Software Laboratory Ltd — Commercial — https://www.etl-tools.com/

KNIME — KNIME Cloud Server — Open source — https://www.knime.com/

Databricks — Databricks — Commercial — https://databricks.com/

Alteryx — Alteryx — Commercial — https://www.alteryx.com/

Talend — Talend — Open source — https://www.talend.com/

Pentaho — Hitachi — Open source — http://www.pentaho.com/

CloverETL — CloverETL — Open source — http://www.cloveretl.com/

Apache Airflow — AirBnb — Open source — https://airflow.incubator.apache.org/

Table 2: ETL Software Summary

Name — Platform — Cost — Core functionality

KNIME

Cloud Server Azure

Cost of infrastructure

-Share workflows, data, and meta-nodes with colleagues.

-Offload computationally intensive tasks to dedicated hardware.

-Schedule tasks to run automatically.

-Server administration management.

-Distribute analytics generated with KNIME Analytics Platform

-Make KNIME Analytics Platform available to end-users (Analytics) with KNIME WebPortal.

ETL Tools

Windows Desktop — Advanced ETL Processor Ent

Site License available,

Customer Loyalty Programs available

- Processing of unstructured data where complex transformations and validation are required

- Import and Export data functionality for Desktop users

- Processing automation

- Process scheduling

- Report creation and automated distribution

- Process logging

Databricks

Cloud infrastructure

SaaS model

Notebooks

-Supported Languages, Scala, Python, Spark SQL, R, and Markdown

-Collaboration, in workbooks and GitHub integration

-Visualisations, embedded including d3, ggplot, matplotlib

-Documentation, Markdown, HTML, CSS & JavaScript

-Libraries, Python, Java, Scala, R and more.

-and more

Infrastructure

-Compute Environments

-Performance Optimisations

-Cluster Resource Management

-Job Management

-Security

Integration

-BI Integrations, Pentaho, Qlik, Tableau, Jaspersoft, and more

-Data source Integrations, SQL stores, NoSQL stores, File stores, Search engines

-REST-Based API for management

Alteryx

Alteryx Server for Scaled Analytics

Enterprise licensing available

-APIs and macros for integration with internal and external applications

-Supports many sources including spreadsheets, data stores, AWS and Salesforce

-Supports data blending and cleaning

-Supports small to massive scale data

-Drag-and-drop interface

-Sharing, including public gallery

-Integrations with Microsoft Power BI, Qlik and Tableau

-Packaged workflow publishing and sharing

Talend

Platform Edition

Enterprise licensing available, Open Core model**

-Data integration

-Enterprise scale data capable

-Job creation, management and monitoring

-Data profiling and cleaning

-Fully managed service available

Pentaho

Enterprise platform

Enterprise licensing available

-Ease of use and quick learning curve

-Data integration

-Supports data model creation

-Job creation, management and monitoring

-Simple web server for running and monitoring jobs

-Dedicated support for Enterprise customers, including training

CloverETL

CloverETL Server Cluster

Enterprise licensing available

-Java-based, cross-platform

-Parallel data transformation execution

-Visual data transformation designer

-Customisation and extension capable

-Multi user interaction

-Move, combine and transform data from any source including binary

Apache Airflow

Cloud infrastructure

Cost of infrastructure

-Workflow creation, including task chaining

-Coded workflows allowing versioning, testing and monitoring of workflows as well as job detail

-Management tool for transformations, including workflow, job management and detailed job monitoring

* Details taken from individual company websites, correct at date of writing (2017-10-21)

** Open Core model: core functionality is available as open source, with additional services and support available under commercial licensing
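The "coded workflows" and "task chaining" listed for Apache Airflow in Table 2 can be illustrated with a small sketch. Airflow itself defines workflows in Python as directed acyclic graphs of tasks; the code below does not use Airflow's API. It is a standard-library stand-in (graphlib, Python 3.9+) with hypothetical task names, intended only to show why a workflow expressed as code can be versioned and tested like any other code.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, upstream):
    """Run each task after all of its upstream tasks.

    tasks maps a task name to a callable; upstream maps a task name to the
    list of task names whose results it consumes. Returns the run order and
    every task's result.
    """
    order = list(TopologicalSorter(upstream).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in upstream.get(name, ())]
        results[name] = tasks[name](*inputs)
    return order, results


# Hypothetical three-step chain: extract feeds transform, transform feeds load.
tasks = {
    "extract": lambda: [3, 1, 2],
    "transform": lambda rows: sorted(rows),
    "load": lambda rows: len(rows),
}
upstream = {"transform": ["extract"], "load": ["transform"]}

order, results = run_workflow(tasks, upstream)
```

Because the chain is ordinary data and functions, it can be kept in source control and checked with unit tests before it ever runs against production data, which is the main appeal of the coded approach over a purely visual designer.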

Comment on the summary

The services provided by the ETL tools described here are wide-ranging and offer a variety of approaches, from desktop through to SaaS only and from open source to completely commercial. Deciding on a tool to suit your needs may depend on the technology already employed within the organisation as well as on functionality, flexibility for future growth and cost implications.

There are many other ETL tools not included in this comparison. For more detail and further products, see ETL Enterprise Data Integration [1].

Outplay (Entertainment) background

Outplay Entertainment is the largest mobile games studio in the UK. Our BAFTA nominated games are enjoyed by hundreds of millions of players around the world and have been featured multiple times as Apple’s Editor’s Choice. Most recently the studio has been nominated in various categories for the 2017 TIGA Awards.

We are always looking for people, so come join us on our journey! If you’re interested, please get in touch here.

References

[1] ETL Enterprise Data Integration