A Public Sector AI Sandbox: Notes

Published in STREAM-ZERO, Apr 29, 2024

Recently I received a call for projects for the City of Zurich’s AI Sandbox. The AI Sandbox aims to foster public-private partnerships. Do visit their site; I think it is an exemplary approach to how public administration can work with start-ups, business and academia: https://www.zh.ch/en/wirtschaft-arbeit/wirtschaftsstandort/innovation-sandbox.html

I took the idea of the Sandbox a bit literally (the projects up to now are very focused on specific aspects), ran through some ideas and submitted this one. It is of course based on StreamZero’s AiFLOW, since we have many of the core elements either as part of the StreamZero deployments or from the work we have been doing for Swiss Fintech in the past years.

Nonetheless, I think the AI Sandbox can be built with other tools too, and I have started to sketch out a white paper/blueprint for ANY government body to implement. While the overall technical details and blueprint are important, I think the value of such a project would be to explore the regulatory and governance issues, and to define an implementation blueprint for key processes such as data delivery and permissions management for data as well as model access, effectively materialising the processes and guardrails through software.

The challenges public administration faces, in particular those related to data delivery and usage restrictions, are outlined in this article by Stefanie Volz of the Zurich AI Sandbox team: https://www.zh.ch/content/dam/zhweb/bilder-dokumente/themen/wirtschaft-arbeit/wirtschaftsstandort/dokumente/ki-sandbox-wissenschaftlicher-beitrag.pdf

Definition

A public sector AI Sandbox is a virtual environment designed to facilitate the development, testing, and deployment of AI applications within government and public organisations.

It includes a comprehensive datastore for managing and accessing relevant datasets, a library of pre-built models that can be customised or used as a foundation for new solutions, and an integrated application development and execution environment that supports a wide range of programming languages and tools.

This sandbox provides a secure and controlled setting where public sector entities can innovate with AI technologies, ensuring that new applications are robust, compliant, and well-suited to serve public needs before they are fully implemented.

Goals

The following are the key goals.

Foster Collaboration Between Public and Private Sectors in AI: Encourage joint ventures and cooperative initiatives between public and private entities to enhance AI innovation and application.
Provide a User-Friendly Environment for Public Sector Employees: Establish an accessible platform where public sector employees can engage with AI solutions, offering their insights and feedback to refine these technologies.
Educate and Nurture Young Talent: Commit to the development of emerging talent through educational programs and opportunities that foster expertise in AI and related technologies.
Reduce Friction in Solution Development: Streamline the process of building AI solutions by reducing bureaucratic and technical obstacles, thus accelerating development timelines.
Ensure a Secure Testing Environment: Offer a robust and secure environment for rigorous testing of AI technologies, ensuring they meet stringent security and compliance standards before deployment.
Support Standardisation of APIs and Integration Patterns: Aid in the design and implementation of standardised APIs, integration patterns, access protocols, governance frameworks, and delivery mechanisms to ensure compatibility and interoperability across systems.
Lay the Groundwork for Production-Ready Solutions: Provide the necessary foundation and resources to transition AI solutions from the experimental phase to full production, ensuring scalability and stability.
Enhance Academic Collaborations: Create opportunities for academic institutions to contribute to and benefit from the AI sandbox, fostering a cycle of innovation and practical application.

Benefits

These benefits make the AI sandbox an invaluable tool for developing, testing, and refining AI-driven solutions in a supportive and efficient environment.

Innovation Acceleration: Provides a space where new ideas can be tested quickly and safely, speeding up the innovation cycle.
Risk Reduction: Allows for thorough testing and refinement of AI applications in a controlled environment before they are deployed in real-world settings, minimising potential disruptions.
Collaboration Enhancement: Facilitates easier collaboration across different sectors, including public, private, and academic institutions, fostering a multidisciplinary approach to problem-solving.
Resource Efficiency: Enables more efficient use of resources by providing shared tools, data, and infrastructure, reducing the need for individual entities to invest in expensive technology setups.
Skill Development: Acts as a training ground for public sector employees and students, helping them to develop and enhance their AI skills in a practical, hands-on setting.
Standardisation Support: Helps in creating and enforcing standard practices for API usage, data integration, and security measures, leading to more consistent and interoperable technology deployments.
Enhanced Security: Offers a secure environment to experiment with sensitive data and AI models, ensuring compliance with data protection regulations and minimising exposure to security risks.
Scalability Testing: Provides tools and environments to test the scalability of AI solutions, ensuring they can handle increased loads and adapt to growing demands.
Public Trust and Transparency: Helps build public trust in AI technologies by demonstrating their effectiveness and safety in a transparent manner.
Access to State-of-the-Art Technology: Gives public sector entities access to the latest AI technologies and methodologies, which may otherwise be inaccessible due to cost or complexity.

Stakeholders

The following are the key stakeholders.

Public Sector: Data Management and Delivery Team: Responsible for ensuring timely and secure delivery of data required for AI applications.

Public Sector: End Users: The primary beneficiaries of the AI applications, providing feedback and insights to improve usability and functionality.

Public Sector: Project Support Collaborators: Support teams that facilitate project logistics, coordination, and resource management to ensure project success.

Private Sector: Technology Providers: Supply advanced AI models and develop applications, bringing technical expertise and innovations to the partnership.

Academia: Researchers and Students: Engage with the data to conduct research, test theories, and contribute to the development of new AI models and applications.

Regulatory Bodies: Oversee compliance with laws and regulations, ensuring that AI applications adhere to ethical standards and privacy laws.

IT Infrastructure Teams: Manage and maintain the computing environments, ensuring robust, scalable, and secure infrastructure for hosting AI applications.

External Consultants and Experts: Provide specialised knowledge and advisory services to enhance project outcomes and strategic direction.

The Sandbox Environment

The sandbox is ideally implemented in a private cloud or on-premises, hosted within the public sector entity’s own location. The environment is a Kubernetes installation on which the individual components are layered. It is air-gapped to ensure data cannot be sent out of the cluster.
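Inside the cluster, one layer that can complement the physical air-gap is a default-deny egress NetworkPolicy per project namespace. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The namespace name `sandbox-projects` is a placeholder of mine, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def deny_all_egress_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that blocks all outbound traffic
    from every pod in the given namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Declaring the Egress policy type without any egress rules
            # means no outbound connection is allowed.
            "policyTypes": ["Egress"],
        },
    }


if __name__ == "__main__":
    # "sandbox-projects" is a hypothetical namespace name.
    print(json.dumps(deny_all_egress_policy("sandbox-projects"), indent=2))
```

Note that this policy also blocks DNS and in-cluster traffic, so a real deployment would add explicit allow rules for the services participants are meant to reach, such as the object store.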

A Minimal Sandbox

A basic AI sandbox configuration is outlined below. The minimal Sandbox is limited to hosting data as files along with an extensible library of models. Application development and execution are limited to notebook environments, and data access governance is implemented using a combination of S3 Policies and Keycloak.

Model Library or Registry: A model library or registry sits alongside an execution environment. StreamZero AiFlow enhances this setup with a Model Store hosting over 1,200 open-source models that are readily available for use and can be extended with custom models. It also supports launching models on demand.

S3 Object Storage: We employ MinIO for our storage needs, although other S3-compatible implementations work as well. All data is stored in S3, with access and permissions managed via S3 Policies. Note that no traditional database systems are provided.
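To make the idea of managing permissions through S3 Policies more concrete, here is a minimal sketch of a read-only policy scoped to one dataset prefix. It uses the standard IAM-style policy JSON that S3-compatible stores generally accept. The bucket name `sandbox-data` and the prefix `mobility/2024` are invented for illustration and are not part of the submitted proposal.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an S3 policy that allows listing and reading objects
    under one prefix of one bucket, and nothing else."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, restricted here
                # to keys under the dataset prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is an object-level action on the same prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and dataset prefix.
    print(json.dumps(read_only_policy("sandbox-data", "mobility/2024"), indent=2))
```

Because the policy lists no write or delete actions, a participant holding only this policy could read the dataset but not modify it. The resulting JSON would then be registered as a named policy in the object store and attached to a user or group.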

JupyterHub: We provide JupyterHub as a dynamic workspace for participants, enabling them to develop and demonstrate solutions through notebooks.

Keycloak: For identity and access management (IAM), we use Keycloak. It is chosen for its comprehensive security features and ease of integration, ensuring robust access control across our platform.
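As a rough illustration of how Keycloak identities could be tied to data permissions, the sketch below reads the realm roles from an access token's claims and maps them to storage policy names. Keycloak access tokens carry realm roles under `realm_access.roles`. The role names, the policy names and the mapping itself are assumptions for this example. The sketch decodes the token without verifying its signature, which is acceptable only for demonstration; a real service must validate the token against the realm's public keys first.

```python
import base64
import json

# Hypothetical mapping from Keycloak realm roles to storage policy names.
ROLE_TO_POLICY = {
    "sandbox-researcher": "datasets-read-only",
    "sandbox-data-steward": "datasets-read-write",
}


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def policies_for(claims: dict) -> list[str]:
    """Return the sorted policy names granted by the token's realm roles."""
    roles = claims.get("realm_access", {}).get("roles", [])
    return sorted({ROLE_TO_POLICY[r] for r in roles if r in ROLE_TO_POLICY})


def make_unsigned_token(claims: dict) -> str:
    """Build a fake, unsigned token for local experiments only."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return f"header.{body.rstrip('=')}.signature"


if __name__ == "__main__":
    token = make_unsigned_token(
        {"realm_access": {"roles": ["sandbox-researcher", "offline_access"]}}
    )
    print(policies_for(decode_claims(token)))
```

Roles without an entry in the mapping grant nothing, so access stays closed by default and every data permission has to be granted explicitly.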

Advanced Sandbox

In the advanced configuration of our AI sandbox, we incorporate additional components to enhance data handling and analysis capabilities. We introduce Apache Kafka for sourcing streaming data, an Analytics DB capable of working with files on S3, and an automation environment (StreamZero AiFlow) for handling diverse aspects such as data ingestion and ETL, 24x7 event-driven processing and container launches.

Apache Kafka: Integrated for managing streaming data, facilitating real-time data processing and responsiveness in our applications.

StreamZero Event-driven Automation Platform: Alongside Kafka, we deploy the StreamZero platform which leverages the capabilities of real-time streaming to automate workflows and improve operational efficiency.
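The event-driven pattern can be sketched in a few lines: each message carries an event type, and a registry routes it to a handler. This is a generic, in-memory illustration and not the StreamZero API. The event types `dataset.uploaded` and `model.requested` and the handler names are invented, and in a real setup the raw messages would come from a Kafka consumer rather than a list.

```python
import json
from typing import Callable

Handler = Callable[[dict], str]

# Registry that maps an event type to the function handling it.
HANDLERS: dict[str, Handler] = {}


def on(event_type: str):
    """Decorator that registers a handler for one event type."""
    def register(func: Handler) -> Handler:
        HANDLERS[event_type] = func
        return func
    return register


@on("dataset.uploaded")  # hypothetical event type
def start_ingestion(event: dict) -> str:
    return f"ingest {event['path']}"


@on("model.requested")  # hypothetical event type
def launch_model(event: dict) -> str:
    return f"launch {event['model']}"


def dispatch(raw_message: bytes) -> str:
    """Parse one JSON message and route it to its handler, if any."""
    event = json.loads(raw_message)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return "ignored"
    return handler(event)


if __name__ == "__main__":
    # Stand-in for messages read from a Kafka topic.
    messages = [
        b'{"type": "dataset.uploaded", "path": "s3://raw/traffic.csv"}',
        b'{"type": "model.requested", "model": "example-model"}',
        b'{"type": "unknown"}',
    ]
    for message in messages:
        print(dispatch(message))
```

Unknown event types are ignored rather than raising an error, so new producers can be added to a topic without breaking existing consumers.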

Analytics DB: For analytical purposes, we include a high-performance analytics database. Options such as Trino, ClickHouse, and Apache Doris are supported, with a preference for Apache Doris due to its optimised query engine that delivers accelerated data insights.

Data Governance and Discovery Tools: The AI Sandbox includes several advanced tools for data governance and metadata management, specifically Apache Ranger, Hive Metastore, and Apache Atlas. These tools are integral components of the Hadoop ecosystem but are equally relevant to modern solutions, boasting broad community support. They are chosen primarily for their ability to operate independently of external cloud services, ensuring enhanced control and security over data governance within the sandbox environment. Further detailing and integration strategies for these tools will be outlined to ensure optimal functionality and compliance with data governance standards.

Conclusion

What you see above is a first sketch of the AI Sandbox, done over a weekend. I hope to expand this into a full-scale white paper, time permitting. If you wish to contribute or collaborate, please drop a comment.