Under the hood of machine learning

The idea of “rational” machines has always been part of the human imagination. As the field of artificial intelligence (AI) advanced, computational tools became more sophisticated, and specific applications of AI, such as machine learning, evolved.

Machine learning revolves around the idea that we should be able to give machines access to data and let them learn for themselves. It arose from the confluence of three developments: the emergence of big data, cheap and powerful computational processing, and more efficient data storage. From banking to health care to retail, machine learning is revolutionizing the way we do business. Whether it is used for detecting fraud, identifying patterns in trading, or recommending a new product based on real-time information processing, the potential for this burgeoning field is vast.
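
To make the idea of letting a machine "learn for itself" concrete, here is a minimal sketch in Python. It is a toy illustration, not how any production fraud system works. The transaction history, the function names (learn_threshold, looks_fraudulent), and the single-cutoff rule are all assumptions made up for this example. Instead of a programmer hard-coding a fraud limit, the program picks the limit that best fits the labeled examples it is given.

```python
# Hypothetical labeled history: (transaction amount in dollars, was it fraud?)
history = [
    (12.0, False), (25.5, False), (40.0, False), (61.0, False),
    (75.0, False), (480.0, True), (510.0, True), (950.0, True),
]


def learn_threshold(examples):
    """Pick the amount cutoff that misclassifies the fewest past examples."""
    amounts = sorted(amount for amount, _ in examples)
    # Candidate cutoffs: midpoints between neighboring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(cutoff):
        # Count examples where "amount above cutoff" disagrees with the label.
        return sum((amount > cutoff) != is_fraud for amount, is_fraud in examples)

    return min(candidates, key=errors)


# The cutoff is derived from the data, not written by hand.
threshold = learn_threshold(history)


def looks_fraudulent(amount):
    """Flag a new transaction using the learned cutoff."""
    return amount > threshold
```

Feeding the same function a different history would produce a different cutoff, which is the point: the rule comes from the data. Real systems learn from many more signals than a single amount.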

Many industries recognize that real-time insights into big data make them more efficient and differentiate them from their competitors. And ignoring the data carries a hefty price tag: PayPal reported losing $10 million a month to hackers until it implemented machine learning to detect fraudulent patterns. It is no surprise that machine learning is at the top of every IT department’s priority list for long-term investment.

IT departments quickly realized, however, that while machine learning as a field has exploded, the landscape of tools, technology, and infrastructure to power these applications is confusing and fragmented. It is not easy to manage all the servers and connect services in a way that can be scaled when needed.

Companies that want to deliver new services with data insights often find it difficult to capture and process their big data, because the machine learning tools have to meet several requirements at once. They must integrate easily with the software platforms that support existing business processes, users, and diverse projects. They must also interface with many different data platforms and handle structured, semi-structured, and unstructured data. Lastly, they must fit the company’s preferred technology stack.

In the past few years alone, a plethora of tools has emerged to facilitate machine learning, including a broad set of container and big data technologies, such as distributed databases, message queues, and real-time analytics engines. Analysts might require access to Hadoop for batch processing analytics, Spark for processing data in real time, Kafka for near real-time messaging, and Cassandra as a fast, scalable data store for high-volume web applications.

Each of these systems and services is complicated in its own right. And within each category, there are many options: various solutions and features, each with its own merits and suited to a different purpose. Yet all of the technologies involved must be able to work together when needed.

Posted on 7wData.be.