Unlocking large hidden values

Jens Frid
Published in Scaleout
Jan 29, 2019

Ola Spjuth, AI Lead Scientist at Scaleout, explains in a recent chat the importance of structured data, the benefits of continuous analytics and what the next logical step is for better machine intelligence.

Ola Spjuth

Can you say something about Scaleout’s core values when it comes to AI and Machine Learning?

At Scaleout we believe that, in order to efficiently develop and deploy AI/ML models that aid business processes and decision making, you need to start by structuring your data and taking control of the data lifecycle.

This requires a well-developed information infrastructure and control over, for example, data capture, storage, and metadata. Having data well structured and accessible makes it possible to design a continuous analytics solution that starts with the data and covers preprocessing, modeling, versioning of models, and deployment behind accessible APIs that enable integration with all kinds of systems.

What machine learning methods do you use at Scaleout?

This largely depends on the data. For high-dimensional data such as images, deep learning methods such as convolutional neural networks have been shown to produce very good results.

However, the majority of data is neither huge nor multidimensional, and here we apply other types of methods, or ensembles of methods. A key focus for us is confidence in predictions, which in the majority of cases we realize using conformal prediction methodology.

What is conformal prediction?

Traditional machine learning gives a point prediction as output, such as a class label for an object. In short, conformal prediction instead gives a prediction interval with a valid confidence measure for the object you are trying to predict, given a user-defined confidence level.

Traditional performance measures of models commonly rely on cross-validation, which offers average estimates of prediction accuracy on the dataset used for training and testing but says nothing about a particular new object being predicted.

In conformal prediction, if a new object that is predicted is very different from the training data — we call it non-conforming — then the prediction interval will be larger than for objects that are more similar to training data. The methodology is based on solid mathematical theory, and the validity of results is theoretically proven.
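To make this concrete, here is a minimal sketch of an inductive conformal classifier in Python. The scikit-learn model, the example dataset, and the significance level are illustrative assumptions, not a description of Scaleout's implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Split the data into a proper training set and a calibration set.
X, y = load_iris(return_X_y=True)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Nonconformity score: 1 minus the predicted probability of the true class.
cal_probs = model.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

def prediction_set(x, epsilon=0.1):
    """Return every label whose p-value exceeds the significance level epsilon."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    labels = []
    for label, prob in enumerate(probs):
        score = 1.0 - prob
        # p-value: share of calibration objects at least as non-conforming.
        p_value = (np.sum(cal_scores >= score) + 1) / (len(cal_scores) + 1)
        if p_value > epsilon:
            labels.append(label)
    return labels

print(prediction_set(X_cal[0]))
```

With significance level 0.1, the returned set contains the true label at least 90% of the time in the long run, and objects that are very different from the training data yield larger, less specific sets.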

What hardware, middleware, and frameworks do you use at Scaleout?

We have deep competence from the infrastructure level and up, meaning we can work with on-prem hardware, virtual resources, and hybrid infrastructures. Cloud-native, microservice-based architectures with orchestration frameworks such as Kubernetes/OpenShift are important components for us, as they provide a flexible and scalable platform on which to deploy components.

We mainly use open source components, but can also work with other components as required by our clients. For data processing and modeling we use a variety of frameworks and tools; Kubeflow, TensorFlow, and Pachyderm are important, but we also use a lot of other toolkits. Some clients need large-scale distributed modeling, and then we use Apache Spark a lot, as in the sketch below.
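To illustrate the distributed case, here is a minimal PySpark sketch that fits a model on cluster-resident data; the file path, feature columns, and label column are hypothetical placeholders, not a client setup.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("distributed-modeling").getOrCreate()

# Hypothetical dataset: a CSV with numeric feature columns and a binary label.
df = spark.read.csv("hdfs:///data/training.csv", header=True, inferSchema=True)

# Spark ML expects all features assembled into a single vector column.
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
train = assembler.transform(df)

# Fitting is distributed across the cluster's executors automatically.
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = lr.fit(train)

print(model.coefficients)
spark.stop()
```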

We are flexible and use the tools needed for the job and do not try to impose a particular product or structure. A key objective in many cases is to work towards and enable continuous analytics.

What is continuous analytics?

If the concept of DevOps was the merging of a company's development and operations functions, then continuous analytics is the merging of DevOps with data science.

Continuous analytics means that the same data is accessible at all levels, from APIs to dashboards, and that the entire process from data to deployed models is covered: models are continuously updated as the data changes, and new models can be developed in an agile fashion, tested, and then moved into production.
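As a rough sketch of one iteration of such a loop, the following Python function retrains on fresh data, versions the result, and promotes it only if it improves; the data store, model registry, and deployment target are all hypothetical stand-ins for a real pipeline.

```python
import datetime

def continuous_analytics_step(load_latest_data, train_model, evaluate, registry, deploy):
    """One iteration of a continuous analytics loop: retrain on fresh data,
    version the model, and promote it if it beats the deployed one."""
    data = load_latest_data()     # pull from the shared, structured data store
    model = train_model(data)     # preprocessing + modeling, as a single step here
    score = evaluate(model, data)

    # Every trained model is versioned, so governance is decoupled from development.
    version = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S")
    registry.save(model, version=version, score=score)

    # Promote only if the new model outperforms the currently deployed one.
    if score > registry.best_score():
        deploy(model, version)
```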

What are the main benefits of continuous analytics?

Our experience is that the sustainability of deployed models is in many cases heavily underdeveloped, and particular models are often tied to specific people in an organization. This poses unnecessary risks. One important implication of taking control over and standardizing the data flow, including modeling, is that model governance improves and can be decoupled from model development, which is commonly performed by data scientists. It also adds fault tolerance and enables monitoring, analytics, and prioritization of model training and predictions.

Is there anything else you want to mention about AI and machine learning at Scaleout?

Today many companies are investing heavily in taking control over their data and making use of it to improve processes and predictions. This is the right thing to do, and there is huge potential for many companies to save a lot of money by being more efficient and making better decisions.

The next step is to combine data and models between organizations without disclosing sensitive information, which has the potential to unlock large hidden values. The idea of making better predictions by implicitly utilizing larger pools of data, while never sharing the data itself and thus preserving data privacy, is referred to as federated learning.
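As a sketch of the principle, here is a minimal federated averaging (FedAvg) round in plain NumPy; the linear model, the two simulated clients, and the fixed number of rounds are simplifying assumptions. The point is that only model weights, never the raw data, leave each organization.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train locally with gradient descent on a linear least-squares model.
    Only the resulting weights are shared with the server, never X or y."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two organizations, each holding private data that never leaves its site.
clients = [
    (rng.normal(size=(100, 3)), rng.normal(size=100)),
    (rng.normal(size=(150, 3)), rng.normal(size=150)),
]

global_w = np.zeros(3)
for _ in range(10):
    # Each client starts from the current global model and trains locally.
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    # Server: average the client models, weighted by local dataset size.
    sizes = [len(y) for _, y in clients]
    global_w = np.average(local_ws, axis=0, weights=sizes)

print(global_w)
```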

We at Scaleout are determined to lead this next wave in AI and machine learning.

Scaleout is pioneering federated learning to overcome the data sharing problem, building collective intelligence from distributed data at scale while preserving data privacy and security.
