Accelerating Machine Learning and AI impact with MLOps on Analytics.gov

Introduction to Analytics.gov

Analytics.gov (AG), developed by the Data Science and Artificial Intelligence Division (DSAID), is a central, secured data exploitation platform that supports advanced analytics and machine learning use cases for Singapore government agencies within a compliant and secure architecture. AG has expanded from an on-premise setup to a cloud-based environment on Government Commercial Cloud (GCC) 2.0, providing greater scale, performance and elasticity of resources. To date, AG empowers over 2,000 users across 80 agencies by removing barriers to advanced analytics, promoting collaboration, and scaling AI/ML initiatives. The Ministry of Manpower, Ministry of Foreign Affairs, Housing & Development Board and SkillsFuture Singapore are among the ministries and statutory boards that have onboarded AG and actively worked on their AI/ML use cases since the recent release of Amazon SageMaker services on the platform. These services include automation of machine learning pipelines, access to an expanding selection of generative AI models, and a no-code machine learning tool.

Why MLOps

MLOps, short for Machine Learning Operations, encompasses the practices and tools used to streamline and automate the deployment, monitoring, and management of machine learning models in production. It focuses on the entire machine learning lifecycle, from model development to deployment and ongoing maintenance, with an emphasis on collaboration, reproducibility, and scalability.

In the public sector, AI/ML technologies have undoubtedly transformed how many government agencies approach policymaking, service delivery and internal operations: policymakers can make better sense of complex issues, AI-powered services such as chatbots and summarisers can be deployed, and repetitive tasks can be automated for better efficiency and productivity. Yet while demand for integrating AI/ML into core business processes is growing rapidly, a common gap persists across the Whole-of-Government (WOG): many AI/ML models in government agencies are stuck at the prototyping stage. Data scientists often lack the know-how for IT deployment, especially within the stricter regulatory context of government operations. Even once models are deployed, they need continuous monitoring and re-training to keep performing well. More often than not, establishing such mechanisms requires a team beyond just data scientists, comprising other technical experts in data engineering and development operations (DevOps). This is where MLOps comes into the picture: it aims to ensure efficient, scalable and reliable deployment of models in real-world applications, continuous monitoring and governance of model performance, proactive identification of issues, and cost optimization through workflow automation. Ultimately, MLOps is about operationalizing machine learning, enabling organizations to effectively manage and derive value from their machine learning initiatives.

AG now introduces end-to-end machine learning operations (MLOps) capabilities to enable agencies to efficiently develop, test and deploy AI/ML models at scale.

Accelerating MLOps adoption for Housing & Development Board

In the remaining sub-sections of this article, we illustrate AG's MLOps capabilities through HDB's use case, which leverages the platform to develop, deploy and manage machine learning models.

Previously, an R-based ML model for resale property valuation was manually refreshed by HDB officers and deployed to an on-premise model server for production. This setup limited the robustness, scalability and agility of machine learning development. With AG, HDB was able to develop the model in GCC 2.0, version each model, and deploy it in a scalable and robust manner through REST interfaces that are available on the platform in the Government Enterprise Network (GEN).

Fig 1: AG SageMaker Simple Architecture for MLOps

1. Model development on SageMaker Studio via AG

Fig 2: Domains Menu

By using AG, data scientists at HDB tap on SageMaker Studio to scale beyond the compute constraints of individual laptops, accelerating model development and training workflows. The AG environment on SageMaker Studio provides WOG data scientists with access to the latest Python and R libraries while ensuring compliance with government security protocols. For each team that onboards onto SageMaker Studio, AG provides domain-level segregation of resources and access, allowing resource consumption to be tracked at team level and resource access to be separated between teams. A data recovery solution implemented with AWS Backup and AWS DataSync ensures quick access to previous SageMaker profiles and file recovery in case of accidental deletions. This integration strengthens the backup strategy for the Amazon Elastic File System (EFS) that stores AG user files, bolstering resilience and availability for crucial machine learning workloads.

Specifically, HDB teams leverage SageMaker's R kernel support coupled with AG's seamless integration with the GEN and WOG Active Directory. This allows HDB's R developers to conveniently access Studio resources with their existing credentials and build on top of a security-tested and centrally maintained platform. Data scientists can focus on rapidly building models with R while benefiting from scaled cloud compute, with the undifferentiated heavy lifting of authentication and platform security handled for them. To enable interactive development in R, AG provides a custom-built R kernel image for data science and machine learning development in an interactive Studio environment.

Fig 3: SageMaker Studio Notebook Environment

Within the Studio environment, data scientists can easily upload datasets to a managed, team-segregated Amazon EFS directory, run interactive R sessions, integrate with other AWS services such as Amazon S3 and S3 access points for potential system-to-system integration, and scale their data science work into full-fledged MLOps workflows. In addition to the custom R environment, AG also offers other pre-built kernels and images for model development in an interactive JupyterLab environment, including Python-based development with frameworks such as PyTorch and HuggingFace, on top of accelerated compute capabilities such as NVIDIA GPUs.
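As a concrete illustration of the S3 integration, pushing a local dataset from a team's Studio directory to S3 takes only a few lines of boto3. The sketch below is illustrative: the bucket name, prefix and file path are hypothetical placeholders, and actual access is governed by the project's S3 permissions.

```python
from pathlib import Path


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI of an uploaded object."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_dataset(local_path: str, bucket: str, prefix: str = "datasets") -> str:
    """Upload a file from the Studio home directory to S3.

    Requires AWS credentials and S3 permissions; boto3 is imported
    lazily so s3_uri stays usable without it.
    """
    import boto3  # pre-installed in SageMaker Studio environments

    key = f"{prefix}/{Path(local_path).name}"
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)
```

The returned URI can then be passed to downstream training jobs or pipeline steps that read from S3.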

2. Deploying machine learning models on AG

To deliver its models for production use, HDB leverages SageMaker endpoints integrated with Amazon API Gateway for secured access. The SageMaker endpoints provide a scalable hosting environment to efficiently deploy models developed through SageMaker Studio. HDB data scientists successfully packaged their R-based models to be compatible with SageMaker inference using the R plumber framework, with guidance from the AG team. This packaging demonstrates the wide applicability of SageMaker inference via AG, catering even for custom R models first developed in other environments. SageMaker endpoints can also serve models from frameworks and libraries other than R, including widely used Python-based frameworks like PyTorch, HuggingFace Transformers and scikit-learn.
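Once a model is hosted on a SageMaker endpoint, any authorised client with AWS credentials can invoke it through the SageMaker runtime API. A minimal Python sketch, assuming a hypothetical endpoint name and a made-up JSON input shape; the real request schema depends on how the plumber-based inference container parses requests.

```python
import json


def build_payload(features: dict) -> bytes:
    """Serialise input features as a JSON request body.

    The {"instances": [...]} shape is an illustrative convention; the
    actual schema is defined by the model's inference container.
    """
    return json.dumps({"instances": [features]}).encode("utf-8")


def predict(endpoint_name: str, features: dict, region: str = "ap-southeast-1") -> dict:
    """Invoke a deployed SageMaker endpoint (requires AWS credentials)."""
    import boto3  # imported lazily so build_payload stays usable without it

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(features),
    )
    return json.loads(response["Body"].read())
```

For example, a valuation request might call `predict("hdb-valuation-endpoint", {"floor_area_sqm": 92, "storey": 10})`, where the endpoint name and feature names are placeholders.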

Fig 4: API Key Management for API Gateway
Fig 5: API Gateway Model Endpoints Association

The endpoints are then accessed via Amazon API Gateway, centrally provisioned to enforce authenticated access. Instead of managing their own API infrastructure, HDB tapped on AG's inference service to readily provide an authentication layer before model invocation. By handling authentication, throttling and monitoring of APIs, the Amazon API Gateway integration minimizes the overhead for HDB in delivering ML models as services suitable for production use cases. Furthermore, with networking integration via the GCC 2.0 Transit Gateway to Amazon API Gateway, SageMaker endpoints deployed on AG can be seamlessly invoked by systems in the GEN. AG models accessed through API Gateway are also secured by AWS Web Application Firewall (WAF), ensuring robust protection against potential web application threats and vulnerabilities. The combination of SageMaker endpoints, Amazon API Gateway, and networking and firewall integrations streamlined the delivery of HDB's machine learning models from development to consumption, demonstrating the full end-to-end capabilities of AG.
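From a consuming system's point of view, invoking a model through the gateway is an ordinary HTTPS POST with an API key header. A stdlib-only sketch; the URL, key and payload shape below are placeholders, not AG's actual endpoints.

```python
import json
import urllib.request


def build_request(url: str, api_key: str, features: dict) -> urllib.request.Request:
    """Assemble the POST request, supplying the API key in the
    standard x-api-key header that API Gateway checks."""
    body = json.dumps({"instances": [features]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def invoke_model(url: str, api_key: str, features: dict) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(url, api_key, features)) as resp:
        return json.loads(resp.read())
```

Separating request construction from transport keeps the payload logic testable without network access.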

3. End-to-end MLOps workflow within AG

Fig 6: SageMaker Pipelines

To standardize and scale machine learning development lifecycles, HDB data scientists build their model training and deployment pipelines using SageMaker's MLOps templates. Previously, HDB data scientists had to manually track model artifacts and performance, and run ad-hoc R code to train and deploy models. AG's integration with SageMaker MLOps now provides HDB users with modular, visually-assembled workflows to take models from training to deployment with built-in automation.

HDB users can trigger templated model training runs on demand to rebuild models on the latest data or to test new algorithms. Furthermore, the templates facilitate systematic tracking of model lineage across pipeline stages. Within SageMaker Studio, HDB data scientists can visually inspect the end-to-end flow, review logs to debug failures during model training and deployment, and easily customize parameters such as training hyperparameters.
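Under the hood, triggering such a run boils down to a single SageMaker API call. A hedged sketch, with a hypothetical pipeline name and parameter names; the real parameters are whichever ones the team declared in their pipeline definition.

```python
def as_pipeline_parameters(overrides: dict) -> list:
    """Convert a plain dict of overrides into the Name/Value pairs
    expected by start_pipeline_execution."""
    return [{"Name": k, "Value": str(v)} for k, v in sorted(overrides.items())]


def start_training_run(pipeline_name: str, overrides: dict,
                       region: str = "ap-southeast-1") -> dict:
    """Kick off a pipeline execution on demand (requires AWS credentials)."""
    import boto3  # lazy import keeps the helper above dependency-free

    sm = boto3.client("sagemaker", region_name=region)
    return sm.start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineParameters=as_pipeline_parameters(overrides),
    )
```

For instance, `start_training_run("hdb-valuation-pipeline", {"TrainingInstanceType": "ml.m5.xlarge"})` would rebuild the model with an overridden training instance type, assuming the pipeline declares such a parameter.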

Fig 7: SageMaker Model Registry

The integrated SageMaker Model Registry then enables centralized tracking of model performance metrics over multiple iterations. HDB leverages these MLOps capabilities to systematically evaluate model candidates before selecting the best performing version to promote to production endpoints.
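The promotion decision itself can be expressed as a simple comparison over candidate metrics. A toy sketch of that selection logic; the metric name and candidate structure are illustrative, whereas in practice the metrics would be read from the Model Registry.

```python
def best_candidate(candidates: list, metric: str = "rmse",
                   lower_is_better: bool = True) -> dict:
    """Pick the model version with the best value for the given metric."""
    chooser = min if lower_is_better else max
    return chooser(candidates, key=lambda c: c["metrics"][metric])


# Hypothetical evaluation results for three registered model versions
candidates = [
    {"version": 1, "metrics": {"rmse": 14.2}},
    {"version": 2, "metrics": {"rmse": 12.8}},
    {"version": 3, "metrics": {"rmse": 13.5}},
]
print(best_candidate(candidates)["version"])  # → 2
```

The winning version would then be approved in the registry and promoted to the production endpoint.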

Fig 8: Project Pipeline with Manual Approve Action

Such MLOps standardization, coupled with on-demand scheduling, has enabled HDB to reliably scale up development of machine learning and data science models. Systematic model retraining and managed deployments help ensure the reliability of continually improving AI services. By using MLOps template workflows on AG SageMaker, HDB can accelerate the impact of machine learning and data science for the public good.

4. Role-based Project Governance

Fig 9: User Management with Role-based Controls
Fig 10: Data Integrations

AG project administrators are equipped with robust controls at the project level, allowing them to tailor instance type limitations, manage cross-account S3 bucket access, and assign distinct persona roles such as MLOps Engineer and Data Scientist. This comprehensive administrative capability ensures precise customization and governance over resource provisioning, data access, and user roles, enhancing the overall flexibility and security of the AG platform for our users.

Fig 11: Request for Larger Instance Types

Instance types used for different functions, such as SageMaker training jobs, inference endpoints, and notebooks, can all be configured at the project level. Additionally, user personas such as MLOps Engineer and Data Scientist ensure that each role has only the essential permissions for its specific functions. This approach provides precise control over costs and individual actions within the project structure, allowing each project to be tailored to its budget and use-case needs.
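Conceptually, the persona and instance-type controls act as an allow-list check before a resource is provisioned. A simplified illustration of the idea; the personas come from the article, but the specific instance-type mappings below are invented for the example.

```python
# Hypothetical per-persona allow-lists; actual limits are set per project by AG administrators
ALLOWED_INSTANCE_TYPES = {
    "DataScientist": {"ml.t3.medium", "ml.m5.xlarge"},
    "MLOpsEngineer": {"ml.t3.medium", "ml.m5.xlarge", "ml.g4dn.xlarge"},
}


def can_provision(persona: str, instance_type: str) -> bool:
    """Return True if the persona may use the requested instance type."""
    return instance_type in ALLOWED_INSTANCE_TYPES.get(persona, set())


print(can_provision("DataScientist", "ml.g4dn.xlarge"))  # → False
```

Requests outside the allow-list would be rejected, or routed through a request-for-larger-instance workflow as shown in Fig 11.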

Value Created for HDB

With a full MLOps workflow on GCC 2.0 via AG's Amazon SageMaker services, HDB saves the time and effort of manual model training and deployment, which translates to potential savings of approximately 26 man-days per year.

More importantly, like other user agencies that have onboarded the platform, HDB does not need to expend additional resources to develop and maintain an equivalent MLOps system, estimated at another 52 man-days for preventive maintenance and annual infrastructure upgrade cycles. Leveraging AG thus means not only cost avoidance for HDB; its data science team also enjoys a much shorter lead time to kick-start projects, with the platform's end-to-end functionalities already in place.

Start your MLOps journey today with AG

With the advancement of machine learning and AI in the past year, the need for standardization, performance tracking and automation are even more important to accelerate the time-to-impact for machine learning and AI use cases, while ensuring the highest standards of security, reliability and scalability. The partnership between DSAID and AWS unlocks the powerful capabilities of Amazon SageMaker for users across the Singapore Government. By onboarding onto AG, data scientists and AI enthusiasts across the government are enabled with ready-made templates for MLOps, with the ability to build on a centrally managed and secure platform, and equipped with the latest tools and models, including generative AI for text and images. If you are interested in finding out more about AG, please contact the team via ag_helpdesk@tech.gov.sg or visit our product information page on the Singapore Government Developer Portal.

P.S. Shoutout to our co-contributors for this article: Tay Jun Jie (HDB), Bill Cai (AWS), Hansel Ng (AWS)

Fig 12: 🫶 AG Team Photo at AWS 🫶
