Filling Out Forms Isn’t Fun
Online forms are the worst. Often long and sometimes spanning multiple pages, they can be time-consuming and laborious to fill out. Almost any other task is more enjoyable, even with the occasional prize drawing or other incentive. While large forms can and often do provide valuable data, they have certainly never won any awards for user experience.
In October 2017, our data science team at the NASA Jet Propulsion Laboratory (JPL) was approached by the Office of Safety and Mission Success (5X) to improve processes for reporting in the Problem Reporting System (PRS). PRS is an internal tool that allows engineers to submit Problem Failure Reports (PFRs) and Incident Surprise, Anomaly reports (ISAs), which document pre-launch test failures and post-launch operational anomalies experienced by spacecraft. These reports serve as a record not only of past problems but also of the solutions to those problems.
Despite their value, the reports contained within the PRS are costly to fill out and submit. With dozens of textual, categorical, and other inputs, PFRs and ISAs pull mission staff away from spacecraft operations and mission work and into the annotation of internal forms. A solution was needed that would reduce the time required to file reports in PRS while remaining easy to use for people already familiar with the current PRS system, such as a recommendation system for form fields. From a data science and IT operations perspective, we needed a straightforward process for deploying a simple recommendation system in enterprise applications containing categorical form inputs (like dropdown menus).
The Solution
What we came up with is Henosis, a cloud-native, lightweight, Python-based recommender framework that brings together model training and testing, storage and deployment, and querying under a single framework (Henosis is a classical Greek word for “unity”). Developed using user-driven design (UDD), Henosis serves as a bridge between two roles in the organization. It provides data scientists with a straightforward and generalizable environment in which to train, test, store, and deploy categorical machine learning models for making form field recommendations using scikit-learn, while providing developers with a REST API that can be easily queried for recommendations and integrated across different enterprise applications. This user-driven design and simplified integration process make it easier to deploy recommender system capability across different enterprise applications.
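To give a sense of the developer side, here is a rough sketch of what querying a running Henosis instance for recommendations might look like. The base URL, endpoint path, parameters, and response shape below are assumptions for illustration only, not the actual Henosis API; see the GitHub repository for the real routes.
import requests

# NOTE: the URL, route, and payload below are hypothetical, for illustration only;
# consult the Henosis documentation for the actual API.
HENOSIS_URL = 'https://henosis.example.jpl.nasa.gov'

# form fields the user has already filled in
form_so_far = {'x1_var': 'some user-entered text'}

# ask the running Henosis instance for recommended values for the empty field(s)
response = requests.get(
    HENOSIS_URL + '/api/recommend',  # hypothetical endpoint
    params=form_so_far
)
print(response.json())  # e.g. recommended values for 'y_var'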
Henosis is available on GitHub as an open-source framework, with alpha versions released regularly as we work toward future releases. As the framework is applied to different internal applications within JPL, its capabilities will evolve in tandem. The same open-source software available on GitHub is the software we are developing for our own use internally.
How Can I Use It?
Due to UDD, Henosis is also intended to work easily in a modeling workflow. There’s a bit more to it than what’s presented below, but once you have an Elasticsearch index and an AWS S3 bucket configured, it’s pretty easy to use. You can use Henosis to deploy models that use incoming form data to provide recommendations.
Here’s a simple modeling workflow example. First, we import Henosis from a local directory (a pip install is planned) along with libraries like pandas and scikit-learn.
import sys
sys.path.insert(0, '../../src') # directory where the Henosis source (app.py) lives
import app as henosis
import itertools
import json
import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from tqdm import tqdm
Data prep. Here, we create the Henosis Data object d and load data from a local CSV file into it. This allows Henosis to create training and testing splits of your data and to upsample or downsample it (using the imbalanced-learn library) if needed.
# load the henosis data object and the data
d = henosis.Data()
d.load(csv_path='data.csv')
print(d.all.columns.values)

# gather your X and y variables
y_var = 'y_var'
X_vars = ['x1_var']
all_vars = X_vars + [y_var]
# d.all is a DataFrame populated after loading data into d
to_model = d.all[all_vars]
# remove any empty observations for modeling
df = to_model[pd.notnull(to_model[y_var])]
y = df[y_var]

# split the data into training and test splits using d
d.test_train_split(
    df[X_vars], # independent variables
    df[y_var], # dependent variable
    share_train=0.8, # share of the dataframe used for training
    balance='upsample' # upsample or downsample; not required
)
Define the model. Here, we can pass any scikit-learn classification model into Henosis to use for providing recommendations.
m = henosis.Models().SKModel(MultinomialNB(alpha=0.15))
Once your model m is defined, use it to train and test against your data.
m.train(d)

# show the training results
print(json.dumps(m.train_results, indent=4))

m.test(d)

# show the test results
print(json.dumps(m.test_results, indent=4))
Happy with the results? You can then store the model in S3 and update its entry in Elasticsearch with a simple call from the model, m. First, you’ll need to load a Henosis server object s using a local config.yaml file. Then, you can call the model and store it in AWS S3 and Elasticsearch using the settings specified in object s.
s = henosis.Server().config(yaml_path='<path_to_config.yaml>')
m.store(
    server_config=s,
    model_path='model_variableOne_1.pickle', # path in AWS S3
    override=True # overwrite old AWS S3 files
)
While I’ll spare readers the details of configuring Henosis in this article, you can find the information you need in the GitHub repository (and in the documentation to come).
Deployment is easy too, and just one call away. Once you’re satisfied with model results, you can deploy that model for use in the recommender system by calling m.deploy().
m.deploy(
    server_config=s,
    deploy=True # False to take the model offline
)
Running a Henosis instance to serve your deployed recommendations is easy, too. Using an identically configured config.yaml, you can start a Henosis instance in a Python script, which spins up a simple Flask server.
import app as henosis # may need to specify a path using sys first

s = henosis.Server().config(yaml_path='<path_to_config.yaml>')
s.run()
See? Not so bad. We recommend using Docker and a container orchestrator such as Kubernetes for scalable deployment, redundancy, and minimal downtime. But you can just as easily host a Henosis instance using any service that deploys and serves Python applications.
You can find more documentation on configuring Henosis in the GitHub repository. Keep in mind the framework is still in development and in its early stages, so the documentation and function calls may change slightly as we work through open issues and other development action items.
What’s Next?
Modeling and REST API documentation is in the works and will be released soon for the community to use in their own applications. We’re taking the time to make sure the documentation is complete and accurate prior to release. In the meantime, development of Henosis continues, and you’ll see pushes to GitHub every now and then that add features and capabilities (a pip install is coming soon, too)! We’re excited to share this simple but useful capability with the general public, beyond the confines of the laboratory.