Deploying impactful algorithms on Daisi

JM Laigle
Published in Daisi Technology
Aug 17, 2022 · 10 min read

Daisi (app.daisi.io) is the go-to community platform for Python cloud functions and apps, deployed in a serverless fashion. At Daisi, we are on a mission to advance the deployment of incredibly impactful and game-changing algorithms and ML models. We want them to be readily actionable so that everyone can start using any of them with just one line of code.

See our introductory Medium post Daisi — Python cloud functions for scientists and engineers.

Daisi is free to use, so anyone can make an impact by deploying code or reusing existing services. Daisi is also much more than a tool for developers: it is a place where communities gather and collaborate, where anyone can come to inspire and get inspired.

This blog post shares some thoughts and examples on how to create truly impactful Daisies. I hope it will be useful to the many Daisi creators who are joining our platform each day.

The need for deployed code with actionable APIs

Whereas online code repositories like GitHub have transformed how the world builds software by sharing code, the code there is mostly static and its deployment is often not trivial. The consequence is that there is a trove of incredibly useful code which doesn't deliver all the value that it could.

Daisi solves this problem by making deployment a breeze. You don't need to write any extra code for deployment, since environment building, endpoint creation, etc. are entirely abstracted and automated by the Daisi platform (see Create a Daisi — The fastest way to deploy any Python code (including a full app!)).

When deployment is a no-brainer, the community can start making code available to the world in a “ready-to-consume” fashion.

Every Daisi service is one call away, and they are all callable with the same clean interface, from any environment that can make HTTP requests. From Python, it is even easier with the pydaisi package.
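
Here is a minimal sketch of what a call looks like with pydaisi (the Daisi name and the endpoint below are placeholders for illustration, not an actual published Daisi):

import pydaisi as pyd

# Connect to a published Daisi by its name (placeholder name, for illustration)
my_daisi = pyd.Daisi("my_username/My Daisi")

# Call one of its endpoints like a local Python function and retrieve the result
result = my_daisi.my_function(x=42).value
print(result)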

When you go to the Daisi platform, you immediately have access to a large catalog of Daisies created by the community. Some of them might be immediately useful for your needs and have already been validated by other users, so you can start calling their endpoints confidently, as easily as if you were calling a local Python function.

Homepage of the Daisi platform app.daisi.io

Similarly, when you create a Daisi, keep in mind how others might want to use it and what the best design is for the functions that will be turned into endpoints.

Below are a couple of use cases illustrating how Daisies are already delivering value:

Stitch the Web together by creating incredibly useful web services on the fly

Nothing is easier than turning your Python code into a web service on the Daisi platform. Write the logic of the service as a Python function, and simply link the GitHub repo to the platform. Your function is now a web service. And as long as your Daisi is called from Python code, you don't need to worry about data serialization, since you can send and receive arbitrary Python objects.
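
For instance, a file as simple as the sketch below would be enough to become a service once its repository is linked to the platform (the file and function names are purely illustrative):

# temperature.py, in the GitHub repo linked to the Daisi platform
def fahrenheit_to_celsius(temperature_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temperature_f - 32) * 5 / 9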

So if you need to feed a dashboard, make different services communicate with each other, pull data from a database, add extra logic to your system, or automate a series of tasks, Daisi is the go-to place.

Many Python packages are available today to interface with online services, and it is straightforward to build a function that you can run locally and that pulls online data. Deploying this function is one click away with Daisi.

See this Daisi as an example of how easy it is to create a live, continuously running service interfacing with the Ethereum blockchain to get the gas price, the latest block, or an account balance, in real time.
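
Calling such a service from Python could look like the sketch below (the Daisi name and endpoint names are hypothetical placeholders, not the actual published interface):

import pydaisi as pyd

# Hypothetical Daisi name and endpoints, for illustration only
ethereum_daisi = pyd.Daisi("community/Ethereum Live Data")

gas_price = ethereum_daisi.get_gas_price().value
latest_block = ethereum_daisi.get_latest_block().value
print(gas_price, latest_block)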

Making ML and science code readily actionable

Every day, the research community publishes impactful innovations and the accompanying code. Going one step further and deploying this code as Daisies makes it readily actionable by everyone, for testing, understanding, validation, and immediate applications. Code deployed as a Daisi with a clear and stable API lets the community keep building on it and experimenting with it, and gives assurance that the code performs at its best and as it should.

Example: Deploying a Deep Learning model trained to classify pixels on Copernicus Sentinel satellite images

WatNet is a deep ConvNet trained for highly accurate surface water mapping on Sentinel-2 images. It combines a state-of-the-art image classification model and a semantic segmentation model into an improved deep learning model, and it remarkably outperformed existing methods (as of December 2021).

This model has been productionized as a Daisi to evaluate surface water presence at a given location through time and to monitor the extent of lakes and reservoirs. Given the drought situation in most of the western US states, it immediately delivers a practical way to monitor the situation for any reservoir.

Streamlit app wrapping the Drought Monitoring Daisi

Given a Sentinel-2 image, online inference is as easy as:
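
(All names in the sketch below, including the Daisi name, the endpoint, and the expected input format, are illustrative assumptions rather than the exact published interface.)

import numpy as np
import pydaisi as pyd

# Hypothetical Daisi and endpoint names, for illustration only
watnet_daisi = pyd.Daisi("community/WatNet Surface Water Mapping")

# A NumPy array holding the Sentinel-2 bands used by the model
bands = np.load("sentinel2_scene.npy")

# Run the model remotely and retrieve a pixel-wise water / no-water mask
water_mask = watnet_daisi.predict(image=bands).value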

Interfacing with data sources

While more data are available today than ever, it always requires a learning phase to understand what they are in detail and how to use them. And some transformations often need to be applied to the data to make them truly useful.

For instance, the example above makes use of a lower-level Daisi tasked with retrieving the references of the satellite images with the least cloud cover for a given time period and location. This Daisi can typically be reused in a variety of use cases, whenever a clean Sentinel-2 satellite image is needed for a certain location. It avoids rebuilding the logic each time and guarantees consistent behavior whenever it is called.
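
A call to such a helper Daisi could look like the sketch below (the Daisi name, endpoint, and parameters are illustrative assumptions):

import pydaisi as pyd

# Hypothetical Daisi name, endpoint and parameters, for illustration only
scene_finder = pyd.Daisi("community/Sentinel-2 Scene Finder")

# Ask for the least cloudy scene covering a given point over a time window
scene_ref = scene_finder.least_cloudy_scene(
    lat=39.0, lon=-120.0,
    start_date="2022-06-01", end_date="2022-07-31"
).value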

Building APIs for large AI models

While it is straightforward to deploy personal, custom, or publicly available ML models as Daisies, it is often necessary to deploy additional logic to post-process their results and make them useful for a business case. This post-processing layer is a natural fit for Daisies.

Daisies are a great complement to large AI models which are either difficult to deploy because of the resources needed, or only available through an API. Let's consider for instance Large Language Models like GPT-3 from OpenAI. GPT-3 performs impressively across a broad range of tasks. However, extracting value consistently in a business context might be challenging.

For context, GPT-3 is a generative model trained to predict, given a text, what the next token is. It then needs to be triggered by a user-defined prompt. This prompt can be completely open, or it can be designed to coerce the model into producing a very specific result: for instance, extracting information from a text, or classifying or characterizing a paragraph. To be deployed at scale, the prompt needs to be written in such a way that the model produces consistent results in a variety of situations.

Engineering a prompt is then critical, and this is where Daisies can be extremely helpful, by deploying curated, ready-to-use prompts for a variety of tasks in front of an LLM.

Example 1: Text mining with GPT-3 and Daisi to systematically characterize engineering good practices mentioned in documents

Text Mining and Information Extraction are key components of many digitization projects. Whereas Named Entity Recognition and text classification require specific training, an LLM like GPT-3 performs remarkably well out of the box, even for more complex tasks.

For instance, let's consider a series of engineering texts discussing good practices and risks for various technical areas. It is straightforward to instruct GPT-3 to summarize them in a structured way with the following prompt:

<Input text goes here>

Write a list of the best practices mentioned in the text above in a JSON format like this one:
{"name": <None if not mentioned in the text above>,
"technical_field": <None if not mentioned in the text above>,
"20_words_description": <None if not mentioned in the text above>,
"benefit": <None if not mentioned in the text above>,
"risk_if_not_applied": <None if not mentioned in the text above>}
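
Sending this prompt to GPT-3 with the openai Python package could look like the sketch below (the model name and parameters are assumptions; plug in your own API key and input text):

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

input_text = "..."  # the paragraph to characterize
prompt = input_text + "\n\nWrite a list of the best practices mentioned in the text above in a JSON format like this one: ..."

# Ask GPT-3 for a deterministic, structured completion (model name is an assumption)
response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=512,
    temperature=0,
)
print(response["choices"][0]["text"])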

This can be productionized easily in a document processing pipeline involving three Daisies (a sketch of how they could be chained follows the list):

  1. A Daisi digitizing the content of a document and returning paragraphs
  2. A Daisi evaluating if a paragraph discusses good practices
  3. A Daisi interfacing with GPT-3 through the OpenAI API and returning a JSON following the template above.
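
A possible way to chain these three services with pydaisi is sketched below (all Daisi names and endpoints are placeholders, not the actual published interfaces):

import pydaisi as pyd

# Hypothetical Daisi names and endpoints, for illustration only
digitizer = pyd.Daisi("community/Document Digitizer")
classifier = pyd.Daisi("community/Good Practice Detector")
extractor = pyd.Daisi("community/GPT-3 Best Practices Extractor")

# 1. Digitize the document into paragraphs
paragraphs = digitizer.digitize(document="engineering_report.pdf").value

# 2. Keep paragraphs discussing good practices, 3. extract a structured JSON from each
results = [
    extractor.extract(text=paragraph).value
    for paragraph in paragraphs
    if classifier.is_good_practice(text=paragraph).value
]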

Given the following input text:

Exploration phase has the highest uncertainty thus highest risk in a geothermal development project. Drilling cost is one of the critical components that significantly affect geothermal project development cost. In general, there are two major risks associated with drilling, consisting of resource risks and other risks. Resource risks are mainly associated with temperature and permeability. A robust conceptual model built from reliable data is necessary to assess both of resource risks and assist the well targeting process. Other risk are ones that related to regulation, drilling infrastructure, drilling operation issues, environmental aspect, and local community issue. The variation of drilling objectives in each stage of the project (exploration, appraisal, development) requires different strategies in order to minimize the associated risk and project cost. This preliminary study aims to summarize the thinking process or main considerations when developing the exploration drilling strategy, which accommodate subsurface, environmental, drilling, construction perspectives based on literature reviews, and authors experience. This study also presents a generic guideline developed by the authors to assist the decision-making process in developing strategy in a geothermal exploration project in Indonesia.

this pipeline produces the following outcome:

Example 2: Using GPT-3 and Daisi to extract title, authors and their affiliations from the front page of a document

While this problem seems easy to solve, it is actually deceptively complex: in real life, the metadata of a PDF document are rarely populated, making them basically useless. Some heuristic-based or ML-based algorithms attempt to extract this information from the front page, but the layout of a document's first page can vary widely, making them very brittle. However, GPT-3 extracts it well from the front page content, provided that it is instructed to do so with the right prompt. A good prompt for this task is, for instance:

<Front page content goes here>

Fill in the following template, based on the text above:
{"title": <>,
"authors": [<list of authors names and their company>
{"author_name": <>,
"author_company": <>}],
"acceptanceDate": <> }

Combining two Daisies, it is now straightforward to implement a scalable service processing a collection of documents:

  1. The first Daisi takes a document as input and returns its digitized content (this is the same Daisi as in the first example above, showing how a service can easily be reused or remixed)
  2. The second Daisi extracts the first page text, combines it with the prompt above, sends it to the OpenAI API, and returns a JSON with the requested metadata.

A practical example is illustrated below:

First page of a scientific article

The digitized version of the first page, produced with muPDF (deployed in this Daisi), is displayed below. Digitizing PDF documents usually produces a very messy output.

Draft version February 26, 2021 Typeset using LATEX twocolumn style in AASTeX63Exploring the origin of thick disks using the NewHorizon and Galactica simulationsMinjung J. Park,1, ∗ Sukyoung K. Yi,1, † Sebastien Peirani,2, 3 Christophe Pichon,3, 4 Yohan DuboisHoseung Choi,1 Julien Devriendt,5 Sugata Kaviraj,6 Taysun Kimm,1 Katarina Kraljic,7, 8 andMarta Volonteri1Department of Astronomy and Yonsei University Observatory, Yonsei University, Seoul 03722, Republic of Korea 2Observatoire de la Cˆote d’Azur, CNRS, Laboratoire Lagrange, Bd de l’Observatoire,Universit´e Cˆote d’Azur, CS 34229, 06304 Nice Cedex 4, France 3Institut d’Astrophysique de Paris, Sorbonne Universit´e, CNRS, UMR 7095, 98 bis bd Arago, 75014 Paris, France 4Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, 02455 Seoul, Republic of Korea 5Dept of Physics, University of Oxford, Keble Road, Oxford OX1 3RH, UK 6Centre for Astrophysics Research, School of Physics, Astronomy and Mathematics, University of Hertfordshire, College Lane, Hatfield AL10 9AB, UK 7Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK 8Aix Marseille Universit´e, CNRS, CNES, UMR 7326, Laboratoire d’Astrophysique de Marseille, Marseille, FranceSubmitted to ApJEver since a thick disk was proposed to explain the vertical distribution of the Milky Way disk stars, its origin has been a recurrent question. We aim to answer this question by inspecting 19 disk galaxies with stellar mass greater than 1010 M⊙ in recent cosmological high-resolution zoom-in simulations: Galactica and NewHorizon.(...)

Clean output produced by GPT-3 and the PDF Metadata extraction Daisi:

Online execution is as straightforward as:
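
(The sketch below is illustrative: the Daisi name, the endpoint, and its parameters are assumptions rather than the exact published interface.)

import pydaisi as pyd

# Hypothetical Daisi and endpoint names, for illustration only
metadata_daisi = pyd.Daisi("community/PDF Metadata Extraction")

# Send a PDF and get back the JSON metadata produced by GPT-3
metadata = metadata_daisi.extract_metadata(pdf_file="article.pdf").value
print(metadata["title"])
print(metadata["authors"])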

And it performs consistently regardless of the layout of the first page. Deployed as a service, this Daisi makes it easy for a team to process a large number of documents.

These are just a few examples, scratching the surface of what it is possible to create with Daisies. Soon, there will be a large repository of cloud Python functions readily available to the community, or shared more privately inside teams, accelerating how we innovate and how we share and deploy innovation.

Join the Daisi Community!

So join the Daisi Community and contribute to building the largest repository of deployed and actionable code, algorithms, ML models, and more! Connect with like-minded contributors, discover what is happening in your field, inspire and get inspired!

Follow us on Twitter and engage with the community on Discourse, Slack, Medium!
