Lab AI : Instrument Integration
A Change to Web Serial.
Introduction
- This article investigates a relatively new way to capture serial device data directly into a browser web page, along with a proof-of-concept (POC) simulator app for testing the idea, available here. The setup is described in the Demo section below.
- A natural follow-up: converting the data to the standard Allotrope ASM JSON format for easy Digital Lab exchange of data.
So, let’s see what we need to get there.
Instrument Integration Before Web Serial
There are basically two flavours of lab instrument integration. The first is file based: an instrument produces a file, which is detected by a “file watcher service” and then processed. We are not looking at this flavour here.
We will look at the other flavour: integration with simpler instruments such as a weighscale, temperature probe, pH meter or DO probe, which typically appear to the OS as a COM or serial port. One of the first questions that comes to any developer’s mind is: why now? We learnt how to do this decades back, around 1960 when RS232 was first introduced, so what’s the big deal today?
Well, decades back we developers wrote software for serial devices as native apps (apps that are downloaded and installed), typically for one particular operating system (e.g. Windows, iOS or Android). This worked perfectly well until the world shifted to WebApps running in a browser. The shift happened to avoid the pain of installing a new native app, or updating say 100 user boxes with a new “download and install” version, plus the associated installation validation and the need to maintain different app versions across OSes. In the WebApp scenario a version update simply means updating the web server with new content: the new version loads on every box and everybody’s good, with no individual installs.
Unfortunately this introduced a new problem: browsers are sandboxed for security reasons. They had no access to local resources like COM ports, which broke the idea of getting data from a serial device directly into the browser. A feature that worked perfectly well for a native app was no longer available to a WebApp. The only way to reach a COM port was to create a new native app “in the middle” that acted as an agent between the browser and the COM port. The agent-to-browser communication could use a regular web protocol such as HTTPS or WebSockets, so that part was not an issue.
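Communication between the WebApp and the COM port then goes something like this: the WebApp in the browser talks to the native agent over HTTPS or WebSockets, the agent talks to the COM port, and the response travels back the same way.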
This kind of solution reminds one of A handing it to B and B in turn handing it to C, like WhatsApp message forwarding or the Chinese whispers game. Every handover, i.e. every network call from A to B, also adds latency, i.e. the time between a device command being sent and its response being received. This can really irritate lab users who have tons of pending samples to process!
Thus, with the introduction of WebApps and the browser, we developers, while solving one issue, actually introduced another, and this sad state of affairs continued… until March 2021.
Instrument Integration with Web Serial… it’s simpler now!
The makers of Chrome (and everyone else, obviously) knew this issue existed, and around March 2021 Chrome, Edge and Opera were released with full support for Web Serial (see link1 and link2: Browser Compatibility). Web Serial does not switch the sandbox off; it lets a web page talk to a serial port that the user has explicitly picked and approved. This happens if and only if the user says “Yes, go ahead”: permission must be granted! This essentially means that we are back in business, with a direct line of communication between browser and COM port. We no longer need that “evil native app in the middle”, or data connector as it’s sometimes called. Imagine this: we simply add some script to an existing web page and the page is now COM port enabled! Given how it’s currently done, how cool is this new way?
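To make “add some script” a little more concrete, here is a minimal sketch of what such a script might look like. The calls `requestPort()`, `open({ baudRate })` and `port.readable.getReader()` belong to the Web Serial API; everything else is an assumption made for illustration: the baud rate of 9600, the idea that the instrument ends each reading with a newline, and the names `LineBuffer` and `readInstrumentLines`. It is not the code of the demo app.

```typescript
// Collects decoded text chunks and hands back complete lines.
// Serial data arrives in arbitrary chunks, so one reading can be split across reads.
class LineBuffer {
  private pending = "";

  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split(/\r?\n/);
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.pending = parts.pop() ?? "";
    return parts.filter((line) => line.length > 0);
  }
}

// Browser-only part: asks the user for a port, opens it and reports each line.
async function readInstrumentLines(onLine: (line: string) => void): Promise<void> {
  // In a browser page this is simply `navigator.serial`; the cast only keeps the
  // sketch compiling where Web Serial typings are not installed.
  const serial = (globalThis as any).navigator?.serial;
  if (!serial) throw new Error("Web Serial is not available in this environment");

  // Must be triggered by a user gesture (e.g. a button click); the browser shows a port chooser.
  const port = await serial.requestPort();
  await port.open({ baudRate: 9600 }); // assumed value; use the instrument's own setting

  const decoder = new TextDecoder();
  const buffer = new LineBuffer();
  const reader = port.readable.getReader();
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      for (const line of buffer.push(decoder.decode(value, { stream: true }))) {
        onLine(line);
      }
    }
  } finally {
    reader.releaseLock();
  }
}
```

A page would typically call `readInstrumentLines` from a button’s click handler and append each line to a log area. The line buffering is kept separate from the port handling so that it can be tried out without any hardware attached.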
One of the comments posted here (link1) by the author, Pete LePage, is:
“I’m really excited about WebHID, WebNFC, and Web Serial. They open up new scenarios for users that were never possible before, interacting with real world hardware.”
Web Bluetooth (Web BLE) is very similar to Web Serial and covers Bluetooth Low Energy devices like smart watches. To remain focused we will address just Web Serial.
Very few have actually adopted this technology change.
Digital Lab : Data Exchange & The Tower of Babel
Lab data is mainly sourced from instruments. Unfortunately each instrument’s output data stream is vendor specific. Each instrument uses a different set of “instrument understood words”, or instrument commands, arranged in an arbitrary vendor specific grammar or language. A Mettler balance’s commands and responses would differ from, say, those of a Sartorius balance. Then again, take an attribute like temperature. One temperature probe vendor calls it Temperature; another calls it T, or temp, or Temp (names are case sensitive), or freezer_temp, or oven_temp. The temperature attribute, its value and its units are likewise arranged in a vendor specific way in the data stream, requiring a vendor specific parser. Given this, how would any module understand and correlate readings, and thereby think intelligently? So we need a standard data format, at least for now.
To resolve this, the Allotrope Foundation created ASM JSON. It is one attempt to take that data stream and format it into a common data representation (an Allotrope Framework for analytical data, consisting of a standard data format… see here). Remember that the Allotrope Foundation started around 2012, about a decade before ChatGPT, which was released on Nov 30th, 2022. The reader could perhaps look up the Tower of Babel story here: it is a story about the confusion caused when people who speak different languages try to build something together. Here again: standard data format.
Another amazing development in the world of instrument integration took place even before 2012, sometime before 2004. It was a brilliant idea that captured data from instruments using a print driver, designed by a company named NuGenesis. Some details about it are available here. Its success was once again attributed to generating a standard data format from a non-standard (read: vendor specific) one and saving that, along with metadata, to a database. Once again: standard data format!
Very recently Benchling has done the same. It has gone the Allotrope way with a Python based offering for specific instruments (see here), but it starts from an instrument data file rather than direct device access as in this article.
Data in Flight : The Vendor Specific Parser Issue
Suppose
Vendor 1’s instrument produces an output : Temperature=30 deg C.
Vendor 2’s instrument produces an output: temp <SPC> 30<SPC>C
where <SPC> means a single space.
Let’s say we decide that our Standard Data Format (SDF) is
Key=Temp, Value=30, Units=Centigrade.
It’s simple to see that each vendor requires a different parser to convert that vendor’s data to our SDF. To avoid developing a new vendor specific parser every time, the only other option is to ask instrument vendors to change their machines to output our SDF (“our” meaning a generally accepted one like Allotrope). Yes, some vendors are switching that way.
The above “parser issue” only applies to “Data in Flight”, i.e. when data moves, or flies, between the producer instrument and the receiving server, just before it is stored in a database.
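The two vendor outputs and the SDF from the example in this section can be sketched in code to show why each vendor ends up with its own parser. This is only an illustration under stated assumptions: the names `Reading`, `parseVendor1` and `parseVendor2` and the small alias tables are invented here, and matching keys and units case-insensitively is a simplification that a real instrument protocol may not allow.

```typescript
// The illustrative SDF from the text: Key, Value, Units.
interface Reading {
  key: string;
  value: number;
  units: string;
}

// Vendor specific spellings mapped onto the single key and unit our SDF uses.
// These tables are assumptions for the two example vendors only.
const KEY_ALIASES = new Map<string, string>([
  ["temperature", "Temp"],
  ["temp", "Temp"],
  ["t", "Temp"],
]);
const UNIT_ALIASES = new Map<string, string>([
  ["deg c", "Centigrade"],
  ["c", "Centigrade"],
]);

// Shared last step: turn raw key, value and unit text into an SDF reading.
function toReading(rawKey: string, rawValue: string, rawUnits: string): Reading | null {
  const key = KEY_ALIASES.get(rawKey.trim().toLowerCase());
  const units = UNIT_ALIASES.get(rawUnits.trim().toLowerCase());
  const value = Number(rawValue);
  if (!key || !units || rawValue.trim() === "" || Number.isNaN(value)) return null;
  return { key, value, units };
}

// Vendor 1 sends "Temperature=30 deg C": key, equals sign, value, then units.
function parseVendor1(line: string): Reading | null {
  const m = /^\s*(\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s+(.+?)\.?\s*$/.exec(line);
  return m ? toReading(m[1], m[2], m[3]) : null;
}

// Vendor 2 sends "temp 30 C": three fields separated by single spaces.
function parseVendor2(line: string): Reading | null {
  const m = /^(\w+) (-?\d+(?:\.\d+)?) (\w+)$/.exec(line.trim());
  return m ? toReading(m[1], m[2], m[3]) : null;
}
```

Both parsers produce the same reading (Key=Temp, Value=30, Units=Centigrade) from their own vendor’s line, and each returns `null` for the other vendor’s format. A third vendor would mean a third parser, which is exactly the cost the standard format tries to remove.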
Lab AI : ASM JSON? Or Neo4j? Or an LLM? Or …?
Do we really need the standard described above?
The actual fun begins now, after the data is stored. How do we make sense of different data items from varied sources? To store this data in a standard SQL database we need a schema, different tables, joins for queries and so on. This is a bit restrictive because queries and joins have to be tailored to the schema, and requirements always change. Neo4j is one graph database that allows a looser relationship between entities, and it is getting more popular by the day because of AI. See how the node relationship diagram of a typical Neo4j database looks here.
Somehow, the Neo4j image of connected nodes with relationships brings ChatGPT to mind. Its amazing ability to “generate” new content stems from a pre-trained model that it consults. The pre-trained model consists of word relationships learned by training algorithms. How did it get the training data? Well, it was trained on internet web page content: pages made up of different words and word sequences laid out in accepted grammars, from which word vectors are derived. We don’t want to go in too deep.
A question: suppose we do not go the standard data format (SDF) route. Can we use an LLM to make sense of, or correlate, related data from different vendors, like the samples shown above?
Let’s check this out with ChatGPT.
I asked ChatGPT “How are you?” (in English, yes, we all know that) and it came up with a nice response. I then posed the question in conversational Hindi, “aap kaise ho?”, and it came back with a similar response, but now in Hindi (Hindi written in the English alphabet; OK, not a big deal). I then asked ChatGPT the same question in Hindi but specifically told it to give me the equivalent response in English, “Could you reply in English”, which it did, and beautifully at that. It would do the same the other way round as well. This was great; it essentially answered my question.
The key takeaway here is not ChatGPT’s ability to do such amazing things, but the fact that a standard data format (words, attributes, vocabulary, whatever) does not necessarily mean that lab software must use exactly the same collection of words, one to one, for each attribute or relationship. As long as the pre-trained model has the association built in, it will work. So for Lab AI, Digital Lab exchange of data and so on, this means (taking the previous temperature attribute example: T or temp or oven_temp) that all of them could work, provided that the association pre-training is done! It was easy for ChatGPT because it was pre-trained on the internet, where such associations already existed, including sentences with different languages mixed into one… our world has really grown small. This tells us that, yes, we can have an intelligent system even with vendor specific data, but if we go the vendor specific route the complexity bar is raised and the pre-trained model gets more complex.
Therefore the move to a standard data format and vocabulary, while great for a standard parser and “data in flight”, is not mandatory for ChatGPT-like Lab AI queries! We only need a pre-trained model.
One also has to remember that these past moves towards a standard data format happened long before ChatGPT came out in Nov 2022. At that time few knew about, or expected, the success of these LLMs. Who knows, the thinking may now change given ChatGPT’s success, and so I would like to add the expected question: will this work?
Self disclaimer: I am neither a data scientist nor a fortune teller and possess only a rudimentary knowledge of AI algorithms, so all that’s said above about AI is my humble understanding and common sense, given the tools we have today.
Every software vendor wants to attach those two magic letters, “AI”, to their software. However, nothing is really known or predictable in these heady days, but one can dream, be prepared for the unexpected and, of course, try! :)
In this article, or journey, Leg 1 looks at Web Serial. Leg 2 looks at converting the data to a standard Allotrope format, vocabulary or package, as an attempt towards a standard data format.
A peek into a Lab, with some Go Ahead Questions for Web Serial !
Demo time: building a Web Serial simulator that one can use to test this concept. Be warned, it cannot yet be used with a regular instrument because that requires a few trivial code changes, but one can always treat the simulator as the actual instrument.
No doubt there are many Web Serial demos available online, but I could find none that allowed a developer to send data to a serial device and also view the communications from the other side, the receiver, perhaps on the same web page. Having such a simulator test app builds confidence in this new technology. It’s also simple to set up, with just an RS232 cable required for testing. There is a WebApp side and a Simulator side, as shown below.
Screenshots taken from the demo are: Hardware Setup, Software Setup and a long running Command Response mode test.
The screenshots are self explanatory. The last figure, “Command Response mode” running in a loop, was done to check stability for a long running background operation, such as capturing weight vs time in a lab oven drying experiment.
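For readers who want a feel for what a command-response loop involves, here is a small sketch. It is not the demo’s code. The `CommandTransport` interface, the `READ` command, the reply format and the fake balance are all hypothetical stand-ins; a real page would implement the transport on top of the serial port’s writer and reader.

```typescript
// Anything that can send one command and resolve with the device's reply.
// In the browser this would wrap a Web Serial port's writer and reader.
interface CommandTransport {
  request(command: string): Promise<string>;
}

// One captured point of the "weight vs time" curve.
interface Sample {
  elapsedMs: number;
  reply: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Sends the same command `samples` times, waiting `intervalMs` between requests.
async function pollInstrument(
  transport: CommandTransport,
  command: string,
  samples: number,
  intervalMs: number
): Promise<Sample[]> {
  const start = Date.now();
  const log: Sample[] = [];
  for (let i = 0; i < samples; i++) {
    const reply = await transport.request(command);
    log.push({ elapsedMs: Date.now() - start, reply });
    if (i < samples - 1) await sleep(intervalMs);
  }
  return log;
}

// Hypothetical stand-in for the simulator side: each read returns a slightly
// lower weight, roughly as a sample would behave while drying in an oven.
function makeFakeBalance(startGrams: number, lossPerRead: number): CommandTransport {
  let grams = startGrams;
  return {
    async request(command: string): Promise<string> {
      if (command !== "READ") return "ERR";
      const reply = `${grams.toFixed(2)} g`;
      grams -= lossPerRead;
      return reply;
    },
  };
}
```

Calling `pollInstrument(makeFakeBalance(10, 0.5), "READ", 3, 1000)` resolves with three samples whose replies fall from `10.00 g` to `9.00 g`. Each request is awaited before the next one is sent, so a slow device delays the loop instead of piling up commands on the port.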
This ends Leg 1… Instrument Integration.
Next Leg Preview :
Allotrope ASM-JSON Package
The idea here is to use something readily available, configurable and easy to set up. A Google sheet is the first thing that came to mind. See the image below.
We investigate a simple lab weighing task to create a standard ASM JSON package containing all the items for that weighing, including context. It uses a Google sheet as a scratchpad to first parse vendor specific instrument data into a triple (Allotrope identifier, numeric value, units), like a key name-value pair except that the key must be a standard Allotrope identifier… getting ready for the standard vocabulary we mentioned earlier.
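As a rough preview of that idea, the sketch below turns vendor fields into (identifier, value, units) triples using a lookup table, which stands in for the Google sheet. Everything named here is an assumption: the vendor fields `Net` and `Tare`, the `example:` identifiers (placeholders, not real Allotrope terms) and the package layout, which is not the actual ASM JSON schema.

```typescript
// One row of the lookup "scratchpad": a vendor's field name and the identifier
// and unit it maps to. The identifiers below are placeholders, not Allotrope terms.
interface MappingRow {
  vendorField: string;
  identifier: string;
  unit: string;
}

// The (identifier, value, units) triple described in the text.
interface Triple {
  identifier: string;
  value: number;
  unit: string;
}

// Hypothetical mapping for a weighing task.
const MAPPING: MappingRow[] = [
  { vendorField: "Net", identifier: "example:sample-weight", unit: "g" },
  { vendorField: "Tare", identifier: "example:tare-weight", unit: "g" },
];

// Looks up the vendor field and returns a triple, or null if it is unmapped or not numeric.
function toTriple(vendorField: string, rawValue: string, rows: MappingRow[]): Triple | null {
  const row = rows.find((r) => r.vendorField === vendorField);
  const value = Number(rawValue);
  if (!row || rawValue.trim() === "" || Number.isNaN(value)) return null;
  return { identifier: row.identifier, value, unit: row.unit };
}

// Bundles context (who, which sample, and so on) with the measured triples as JSON text.
// The layout is illustrative only and does not follow the real ASM schema.
function buildPackage(context: Record<string, string>, triples: Triple[]): string {
  return JSON.stringify({ context, measurements: triples }, null, 2);
}
```

Keeping the mapping as data rather than code is the point of the spreadsheet: adding an instrument field means adding a row, not changing the parser.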
Concluding Remarks
Understanding the concepts above requires some knowledge of both the domain side and the developer side. A domain reader needs to know what it takes to work in a lab or, say, a chemical plant, and the issues involved. A developer needs the tech part. As a Chemical-BioChem engineer who has spent the last 35+ years working in both an R&D lab and a bioreactor pilot plant, and as a hands-on developer, I find that one of the biggest issues in building solutions like these is the gap between developers understanding the domain and vice versa. Articles such as this one aim to bridge the developer-domain divide, so that both sides get a grip on how these gears move and what makes this watch tick.
So, till the next journey, Leg 2 (the actual setup of the Google spreadsheet, the WebApp for communications etc.), it’s Adios, bye for now.
Attribution for images used :
The images were sourced from the links below
Thanks for the images !