OmniML launches Omnimizer, a platform that re-imagines MLOps for edge AI

Omnimizer unleashes the true potential of edge hardware by adapting computationally intensive AI applications while reducing the time and cost of development and deployment

Edge-based artificial intelligence (AI) is improving our lives in many ways, making them safer, more private, and more convenient. However, the true potential of AI has been hindered by high costs and long development times caused by the mismatch between machine learning (ML) models and resource-constrained edge hardware — until now.

Today, OmniML has released Omnimizer™, a platform that simplifies and accelerates edge AI development and deployment to help enterprises of all sizes save cost and get to production faster.

One common pain point many customers face today is the interdependence between machine learning development and deployment teams, which can result in many iterative cycles just to get a model running on an edge device. These hindrances are caused by inefficient, siloed toolkits that each handle only one specific function rather than the whole workflow, a lack of model deployment familiarity among ML engineers, and the absence of easy-to-access environments for profiling and prototyping ML models on different devices.

Figure 1: Omnimizer workflow

As illustrated in Figure 1 above, the ML engineer developing the model needs to work with and wait for another team to deploy the model on the target hardware before knowing on-device model feasibility and inference performance. ML engineers find it challenging to simultaneously focus on model training and inference performance. In many cases, the model requires multiple design iterations because it was not purposely designed for edge deployments. To add to the problem, many companies struggle to find talent proficient in both ML algorithms and hardware, which makes getting AI into production at the edge slow and inefficient.

Omnimizer solves these pain points by adapting the model with a ‘hardware-in-the-loop’ approach that unifies the training and inference workflows, as illustrated on the right side of Figure 1. A hosted device farm provides quick measurements of model inference performance in a simulated production environment, while model adaptation services help ML engineers automatically tune the model for deployment.

Figure 2: Omnimizer augments model compression by adapting the model

A key technology of Omnimizer is making the ML model “hardware-aware”. As illustrated in Figure 2, unlike existing approaches that only compress the model, Omnimizer auto-adapts the model and enables self-service white-box optimization to further modify it to better fit the target hardware. As a result, the deployed model can achieve significant speedup and memory reduction while maintaining or even improving accuracy. Omnimizer expands upon the core technology of the founding team, who pioneered “deep compression” and elevated neural architecture search (NAS) into the mainstream.

Omnimizer empowers ML engineers to focus on model design and training without worrying about the hardware, while ensuring that models are optimized for the target device. Its self-service, white-box approach keeps the developer informed during every step.

A maker of smart cameras, for example, uses Omnimizer to significantly reduce the complexity of designing a model, drastically reducing deployment time while achieving superb inference performance on its edge devices.

Figure 3: Omnimizer overview

Omnimizer supports most PyTorch models out of the box, including both open-source and custom ML models, independent of the application.

It contains two main components:

  1. Omnimizer Core Services (omnimizer.nas) to adapt, compress and optimize a model, including:
  • AutoNAS: A general and automated neural architecture search function that enables elastic model conversion and the search for the best neural architecture from the provided base model.
  • FastNAS: A lightweight neural architecture search function that does not require additional training. It operates in a reduced search space compared to AutoNAS with greater efficiency and robustness.

  2. OmniEngine Cloud Services (omnimizer.engine) to profile, diagnose, and then deploy a model on the target hardware platform, including:
  • Profile: Examine and quantify model performance across a variety of hardware platforms and inference devices.
  • Deploy: Compile and export model to be runnable on major hardware platforms.

Below we use an example to show how Omnimizer adapts a segmentation model for a mobile phone using the AutoNAS workflow.

Step 1: Setup & Profile
After installation, the ML engineer can easily profile the original PyTorch model’s performance on the target device via omnimizer.engine’s application programming interfaces (APIs) and get a baseline latency and accuracy.
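Omnimizer’s actual profiling API is not shown in this post, but the idea behind step 1 can be sketched in plain Python. The `profile_latency` helper and `toy_model` below are hypothetical stand-ins, not part of the Omnimizer API; a real device-farm profiler would run the same kind of measurement on the target hardware rather than on the host:

```python
import statistics
import time

def profile_latency(model_fn, example_input, warmup=5, runs=20):
    """Measure the median wall-clock latency of a model callable.

    A device farm performs the same measurement on real target hardware;
    here we simply time a Python callable on the host as an illustration.
    """
    for _ in range(warmup):  # warm up caches before timing
        model_fn(example_input)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        model_fn(example_input)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Toy "model": a stand-in for a PyTorch forward pass.
def toy_model(x):
    return [v * 2.0 for v in x]

baseline_ms = profile_latency(toy_model, [0.1] * 1000) * 1000
print(f"baseline latency: {baseline_ms:.3f} ms")
```

The median (rather than the mean) makes the baseline robust to occasional scheduler hiccups during timing runs.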

Step 2: Diagnose & Evaluate
Next, the baseline model can be diagnosed through omnimizer.nas APIs and adapted to achieve better utilization on the target hardware. In this example, Omnimizer identifies a bottleneck layer and automatically replaces it with a more hardware-friendly version. ML engineers can further modify the model via Omnimizer’s self-service features to better optimize it. The entire adaptation process is a white box to the developer, which means they are informed about every action and always have full control.
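Conceptually, diagnosing a model means finding the layers that dominate on-device latency. A minimal sketch, assuming hypothetical per-layer timings (the layer names and numbers below are invented for illustration and do not come from Omnimizer):

```python
def find_bottlenecks(layer_latencies_ms, threshold=0.25):
    """Return layers that account for more than `threshold` of total latency.

    `layer_latencies_ms` maps layer name -> measured latency in ms; in a
    real workflow these numbers would come from on-device profiling.
    """
    total = sum(layer_latencies_ms.values())
    return sorted(
        (name for name, ms in layer_latencies_ms.items() if ms / total > threshold),
        key=lambda name: -layer_latencies_ms[name],  # slowest first
    )

# Hypothetical per-layer profile for a small segmentation network.
profile = {"stem": 1.2, "encoder.block3": 9.8, "decoder.up2": 2.1, "head": 0.9}
print(find_bottlenecks(profile))  # ['encoder.block3']
```

Once a dominant layer is identified, it becomes the natural candidate for replacement with a more hardware-friendly alternative.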

Step 3: Train & Optimize
Then, the adapted model goes through the “omnimize” process, which makes it “elastic”, able to be shrunk or expanded. After the model is made elastic, the user can train the entire search space of all possible shrinking/expanding configurations at the same time using only 2–3x the original training epochs.
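The idea of an elastic search space can be illustrated with a toy enumeration. The layer names and width choices below are made up for illustration; the key point is that the full set of shrink/expand configurations is a Cartesian product, and an elastic model shares one set of weights across all of them so they can be trained jointly:

```python
import itertools

# Hypothetical elastic configuration: each layer may run at any of these
# widths (channel counts) once the model has been made "elastic".
elastic_widths = {
    "layer1": [16, 32],
    "layer2": [32, 64, 96],
    "layer3": [64, 128],
}

def enumerate_subnets(space):
    """Yield every shrink/expand configuration in the search space."""
    names = list(space)
    for combo in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, combo))

subnets = list(enumerate_subnets(elastic_widths))
print(len(subnets))  # 2 * 3 * 2 = 12 configurations trained jointly
```

Because the configurations share weights, training cost grows far more slowly than the size of the search space, which is what makes the 2–3x epoch budget plausible.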

Step 4: Search & Save
In step 4, the best subnet model sampled from the search space is selected based on the given constraints such as target latency. This best model can be saved and restored for downstream tasks like fine-tuning and inference.
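Step 4 amounts to a constrained selection over profiled subnets. A minimal sketch, with invented latency/accuracy numbers (not measured results) standing in for the profiling and evaluation a real NAS workflow would perform:

```python
def select_best_subnet(candidates, latency_budget_ms):
    """Pick the highest-accuracy subnet whose latency fits the budget.

    `candidates` is a list of (config, latency_ms, accuracy) triples; in a
    real workflow these come from profiling and evaluating sampled subnets.
    """
    feasible = [c for c in candidates if c[1] <= latency_budget_ms]
    if not feasible:
        raise ValueError("no subnet meets the latency budget")
    return max(feasible, key=lambda c: c[2])

# Illustrative candidates only; all numbers are made up.
candidates = [
    ({"layer1": 32, "layer2": 96}, 42.0, 0.781),
    ({"layer1": 16, "layer2": 64}, 18.5, 0.774),
    ({"layer1": 16, "layer2": 32}, 11.2, 0.752),
]
best = select_best_subnet(candidates, latency_budget_ms=20.0)
print(best[0])  # {'layer1': 16, 'layer2': 64}
```

Tightening or loosening the latency budget trades accuracy for speed, which is exactly the constraint-driven choice this step exposes to the developer.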

Step 5: Deploy & Verify
Finally, the searched model can be compiled and deployed to compare and verify the on-device metrics against the baseline. In this example, the latency can be improved by 9x without any drop in accuracy, demonstrating the power of adapting and optimizing the model using Omnimizer.
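The verification in step 5 boils down to comparing on-device metrics before and after optimization. A small sketch of that comparison, using illustrative numbers chosen to mirror the 9x example (the helper and its figures are hypothetical, not Omnimizer output):

```python
def verify_deployment(baseline, optimized, max_accuracy_drop=0.0):
    """Compare on-device metrics of the optimized model to the baseline.

    Each argument is a (latency_ms, accuracy) pair; values used below are
    illustrative, not measured results.
    """
    speedup = baseline[0] / optimized[0]
    accuracy_delta = optimized[1] - baseline[1]
    ok = accuracy_delta >= -max_accuracy_drop
    return speedup, accuracy_delta, ok

speedup, delta, ok = verify_deployment((90.0, 0.78), (10.0, 0.78))
print(f"{speedup:.1f}x speedup, accuracy change {delta:+.3f}, pass={ok}")
# → 9.0x speedup, accuracy change +0.000, pass=True
```

Running this check against metrics measured on the actual device, rather than a simulator, is what closes the loop between training and deployment.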

Omnimizer customers are leveraging the platform to optimize their ML models for applications in autonomous vehicles, robotics, IoT, and mobile devices. OmniML is also working on proof-of-concept opportunities in industrial automation, smart appliances, and pharmaceuticals, among other industries. By adapting ML models to different hardware and unifying workstreams, Omnimizer cuts down on the costly back-and-forth iterations between teams in today’s machine learning operations (MLOps) for edge deployments. Omnimizer is the main software platform at the heart of OmniML’s mission to bring the benefits of AI to everyone.

Stay tuned as we will be publishing more updates on Omnimizer! Follow us on LinkedIn, Medium, and Twitter, and contact us for early access to Omnimizer.



OmniML Inc.

OmniML is an enterprise artificial intelligence (AI) company that aims to effortlessly empower edge AI everywhere.