Enhanced Data Management: Google Sheets Empowered by `bal persist`

Sahan Hemachandra

Published in

Ballerina Swan Lake Tech Blog

9 min readSep 6, 2023

This article was written using Ballerina Swan Lake Update 7

Ballerina & Data Persistence

Ballerina is a programming language designed for integrations in the cloud. It provides various features and constructs to simplify the development of distributed systems. One such powerful feature is `bal persist`, which allows developers to persist data across different executions of a Ballerina program.

The `bal persist` feature in Ballerina enables developers to store and retrieve data from various data stores, such as databases, spreadsheets, or system cache, with ease. It provides an abstraction layer that simplifies the handling of persistent data, allowing developers to focus on business logic rather than low-level storage operations. With `bal persist` feature, users can define and manage data models, interact with data stores, and perform CRUD (Create, Read, Update, Delete) operations seamlessly. Ballerina provides built-in support for various database systems, including SQL databases like MySQL and MSSQL, in-memory tables as well as Google Sheets.

Overall, the `bal persist` feature empowers developers to build robust and scalable applications by simplifying the interaction with persistent data. It abstracts away the complexities of working with databases and file systems, allowing developers to focus on building business logic and delivering value.

Google Sheets as a Persistent Data Store

Google Sheets is a widely used, free online spreadsheet application that has emerged as a prominent competitor to the popular Microsoft Excel. While databases and spreadsheets operate on different principles, Google Sheets has gained popularity as a versatile data storage solution, particularly within businesses.

The `bal persist` feature takes advantage of the Ballerina Google Sheets connector and the Google Sheets APIv4 to offer seamless integration with Google Sheets. It provides developers with an abstraction layer that simplifies the interaction with Google Sheets as a data store.

By utilizing the Ballerina Google Sheets connector and Google Sheets APIv4, `bal persist` feature enables developers to seamlessly read from and write to Google Sheets within their Ballerina programs. This integration abstracts away the complexities of working directly with the Google Sheets API, allowing developers to focus on their core business logic.

When google sheets are used as a data store similar to databases, the following limitations can be observed.

As spreadsheet software does not support the concept of unique primary keys, it is hard to prevent the duplication of records in normal use.
Even Though Google Sheets supports Google visualization query language for simple queries, it lacks adequate support to run join queries.
Due to the lack of primary keys, running update and delete queries can be challenging.

To overcome these challenges, the persist module employs the concept called developer metadata, where metadata values can be assigned to each row, column, or sheet itself. In the persist module, a metadata value generated using primary keys is assigned to each row, and metadata filtering-based APIs are employed to execute CRUD operations. As it’s challenging to support join queries at the API level, first, the content of each table is saved to the local cache, and join queries are executed in the Ballerina program.

As a known limitation, persist in Google Sheets will not support transactions.

Supported Data Types for Google Sheets

The table below illustrates the ballerina data types supported by `bal persist` for the Google Sheets data store, along with their corresponding mapped types in the spreadsheet.

Practical Use Case

A medical center serves as a repository for countless records pertaining to patients, encompassing essential data associations among drug details, physicians, drug manufacturers, prescriptions, and more. Given that medical professionals and administration officers typically possess greater familiarity with spreadsheet software rather than programming languages, Google Sheets emerges as an ideal solution for handling this vast amount of data. However, despite the ease of visualization that Google Sheets provides, there remains a necessity for a programmatic approach to cater to the diverse requirements arising from various applications. This is precisely where the `bal persist` bridges the gap.

In the upcoming sections, we will explore a practical example of how `bal persist` can be employed as a persistent layer for a medical center. By combining the convenience of Google Sheets’ interface with the robustness of `bal persist`, we can achieve an efficient and reliable system to manage and utilize medical data seamlessly.

Data Model

In this section, we will establish a straightforward and cohesive data model that establishes associations among patients, physicians, drugs, blood pressure readings, and drug manufacturers. The data model’s structure will resemble the following illustrative diagram.

To define the data model in `bal persist`, users can execute the following steps.

After creating a new ballerina project, users can initialize the `bal persist` by running the following command.

bal persist init –datastore=googlesheets –module=store

This command will initialize the `bal persist` by adding the persist directory with `model.bal` file and will make the necessary changes in the Ballerina.toml file.

To define the above data model with `bal persist`, users can enter the following code to the autogenerated `model.bal` file inside persist directory, defining Patient, Physician, PatientAssignment, DrugAssignment, Manufacturer, and Drug record.

In the above data model, the following syntaxes were used to assign the associations.

For entities like patients, which can be associated with multiple drug assignments and blood pressure readings, we established one-many relationships. To achieve this, we defined arrays representing the many entities on one side of the association. This approach was consistently applied for all similar associations within the data model.

Another significant association in the data model exists between patients and physicians, allowing multiple patients to be associated with multiple physicians, creating a many-to-many relationship. However, since `bal persist` does not directly support many-to-many associations, an intermediate entity known as “patient assignment” was introduced. By defining two one-to-many associations, we effectively represent the many-many relationship between patients and physicians through the patient assignment entity.

These syntaxes and associations are pivotal in constructing a coherent and flexible data model that efficiently represents the relationships between various entities within the medical center scenario. The careful implementation of these associations enables seamless data management and retrieval, supporting the medical center’s operational requirements effectively.

many-many associations depicted as two one-many associations

The resulting ER diagram would look like the following.

Client Generation and Spreadsheet Initialisation

The next step would be to generate the Client APIs. The user can run `bal persist generate` to generate the relevant clients. The resulting folder structure would look like the following.

patient-management
|--- generated
   |--- store
        |--- persist_client.bal
        |--- persist_types.bal
        |--- script.gs
        |--- persist_configuration.bal
|--- Config.toml
|--- persist
     |--- model.bal
|--- Ballerina.toml
|--- main.bal

`persist_client.bal` contains the generated client objects while `persist_types.bal` contains the generated types. Script.gs contains the Google App Script code used for the initialization of the spreadsheet. `persist_configuration.bal` file contains all the required spreadsheet configurations and tokens.

The subsequent phase involves initializing the spreadsheet. To accomplish this, users can create a new spreadsheet in Google Sheets and simply copy the content of the script.gs file into the app script console of the designated spreadsheet. Once completed, the script can be deployed as a web application, allowing the user to execute the function responsible for initializing the spreadsheet. As a result of this initialization process, the spreadsheet will be transformed into the desired format, exhibiting a structured layout as depicted below. This layout serves as the foundation for the seamless integration of data, enabling efficient data management and facilitating various functionalities for medical professionals and administrators alike.

Users can follow the guide to obtain the auth token and can enter these values in the Config.toml file.

Managing user data

As the first step, the user should initialize the client objects.

Then, the user can add hospital and drug details to the data store. The code snippet would be similar to the following.

Next step would be adding patients to the system.

As the spreadsheet contains Physicians, Drugs, Manufacturers, and Patients, now the user can add patient assignments to the spreadsheet.

After the admissions, generally each patient goes through hourly blood pressure measurements before the exact cause of their symptoms is determined.

Considering the blood pressure measures and other symptoms, physicians can assign drugs to the patients.

In addition to the expected updates, many unexpected scenarios can happen in the hospital. Let’s assume a scenario where the physician assigned to Jacob Doe is changed to John Smith.

We can update the record using the following code

In a case where a false blood pressure reading is added to the spreadsheet, it can be removed using the following code segment.

In cases where a complete profile of a patient, along with drug prescriptions and physician assignments is required for review, it can be retrieved using the following code segment.

This will result in the following output.

{
    "id": "pa01", 
    "name": "Jane Doe", 
    "age": 37, 
    "physiciansAssigned": [{
        "physicianId": "ph03" } ], 
    "drugassignment": [{
        "drugId": "d01", 
        "quantity": 100 
    }], 
    "bloodpressure": [
       {
        "timestamp": "2023:07:02:19:30:00", 
        "systolic": 155, 
        "diastolic": 95 
       }, 
       {
       "timestamp": "2023:07:02:21:30:00", 
       "systolic": 155, 
       "diastolic": 95 
       }, 
       {
       "timestamp": "2023:07:02:23:30:00", 
       "systolic": 155, 
       "diastolic": 95 
       }
   ]
}

Bal persist for Integration

The previous section discussed how to use the generated client APIs to do the CRUD operations. This section will discuss the creation of a ballerina data service using `bal persist`.

As the first step, the user has to initialize the data service

Let’s consider the scenarios where a physician needs all the data related to the patient to make an accurate diagnosis. The following resource function can be used in order to carry out the task.

This method can be invoked using

`GET http://localhost:9090/patient-management/patients/pa01`

and it would results in the following response.

{
  "id": "pa01",
  "name": "Jane Doe",
  "age": 37,
  "physiciansAssigned": [
    {
      "patientId": "pa01",
      "physicianId": "ph03"
    }
  ],
  "drugassignment": [
    {
      "physicianId": "ph01",
      "drugId": "d01",
      "patientId": "pa01",
      "quantity": 100
    }
  ],
  "bloodpressure": [
    {
      "timestamp": "2023:07:02:19:30:00",
      "patientId": "pa01",
      "systolic": 155,
      "diastolic": 95
    },
    {
      "timestamp": "2023:07:02:21:30:00",
      "patientId": "pa01",
      "systolic": 155,
      "diastolic": 95
    },
    {
      "timestamp": "2023:07:02:23:30:00",
      "patientId": "pa01",
      "systolic": 155,
      "diastolic": 95
    }
  ]
}

For another scenario, the hospital director needs to check the data related to each physician as a part of the performance evaluation. The following resource function can be implemented to integrate the persist layer with the user application.

This method can be invoked using

`GET http://localhost:9090/patient-management/physicians`

and it would result in the following response.`

{
  "id": "ph01", 
  "name": "John Smith", 
  "specialization": "Peadiatrics", 
  "patientsAssigned": [], 
  "drugsAssigned": [
    {
      "drugId": "d01", 
      "patientId": "pa01", 
      "quantity": 100 
    }
  ]
}, 
{
  "id": "ph02", 
  "name": "Jane Carter", 
  "specialization": "Internal Medicine", 
  "patientsAssigned": [
    {
      "patientId": "pa02"
    }
  ], 
  "drugsAssigned": [
    { 
      "drugId": "d02", 
      "patientId": "pa02", 
      "quantity": 100 
    } 
  ] 
}, 
{
  "id": "ph03", 
  "name": "Kyle Schmidt", 
  "specialization": "Family Medicine", 
  "patientsAssigned": [
    {
      "patientId": "pa01"
    }, 
    {
      "patientId": "pa03"
    } 
  ], 
  "drugsAssigned": [
    {
      "drugId": "d03", 
      "patientId": "pa03", 
      "quantity": 100
    }
  ] 
}

The full ballerina code for the implementation can be found here and readers can test it on their own spreadsheets.

Conclusion

In conclusion, the `bal persist` feature proves to be an invaluable tool in seamlessly integrating spreadsheets with user applications. By providing a reliable and efficient mechanism for data storage and retrieval, Ballerina enables developers to effortlessly connect their applications with spreadsheets, bridging the gap between these two vital components of modern data processing. Through its intuitive syntax and powerful capabilities, Ballerina significantly streamlines the integration process, empowering developers to focus on building robust applications without the complexities of traditional data handling methods. With `bal persist` feature, businesses and developers alike can leverage the full potential of spreadsheets, harnessing their data-rich capabilities while delivering user-friendly and efficient applications. As a result, the possibilities for innovation and collaboration between users and their data are endless, making Ballerina a game-changing solution for organizations seeking to enhance their data-driven workflows and application development processes.