What would you do in two days? Let me be more precise, what would you do on a weekend? Depending on the kind of person you are, the answers may differ. Some may wanna stay in, have a good sleep, take it slow. If you are like me, you would be on the road riding a bike to that one peaceful getaway. Maybe you want to go on a date with your dearest one.
But if you had asked me the same question a couple of weeks back, you would have laughed your head off at my answer. Hold on to your head, guys… here is what it was. Along with my friends, I wanted to speak to data. Yes, you heard me, we were planning a scheme to talk with data and databases. Don’t get me wrong, I am not insane.
Let me brief you on what was going on. My company was organizing its yearly hackathon, the Accubits Innovation Challenge, an event that sees technologists and innovators from around the world collaborating to develop the technologies of tomorrow. I was part of an internal team of researchers, and our idea was to develop a unified data aggregation and interpretation tool powered by natural language processing to perform data analytics. The premise of the idea is to let anyone perform data science jobs by just having a conversation with the tool.
We laid the groundwork by noting down solution approaches on sticky notes and doing a lot of caffeine-powered brainstorming. The end result was the realization that the more we thought about it, the more complex the solutions became. We agreed to build a minimum viable product (MVP) within the weekend. We adopted a modular approach where every section of the architecture remains isolated, because we wanted the flexibility to tweak, modify, or remove these modules as development continued. An abstract breakdown of the architecture goes like this:
- A chat UI for conversational inputs.
- An NLP engine to make DB queries based on free-flowing conversational inputs.
- A data parser to ingest raw data and create dumps to our central database.
- A Machine Learning (ML) backend that ingests data, creates subsets, and decides the best model to fit the data for prediction.
- A data visualizer.
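Since every module was kept isolated, the whole pipeline could be wired together through narrow interfaces. Here is a minimal Python sketch of that idea; the class names and the keyword-matching "NLP" stub are purely illustrative stand-ins, not the actual InsightsBud code:

```python
# Hypothetical sketch of the modular pipeline: each stage is isolated
# so it can be swapped out without touching the others.

class DataParser:
    """Ingests raw records and normalizes them for the central store."""
    def parse(self, raw_rows):
        return [dict(row) for row in raw_rows]

class NLPEngine:
    """Turns a free-flowing question into a structured query (stubbed)."""
    def to_query(self, utterance):
        # A real engine would do intent/entity extraction; this stub
        # just keeps the alphabetic words to show the module boundary.
        return {"filter": [w for w in utterance.lower().split() if w.isalpha()]}

class MLBackend:
    """Answers structured queries against ingested data (stubbed)."""
    def __init__(self, rows):
        self.rows = rows
    def answer(self, query):
        hits = [r for r in self.rows
                if any(word in str(r.values()).lower() for word in query["filter"])]
        return len(hits) / len(self.rows) if self.rows else 0.0

# Wiring the modules together:
rows = DataParser().parse([{"city": "Kochi", "disease": "arrhythmia"},
                           {"city": "Delhi", "disease": "diabetes"}])
backend = MLBackend(rows)
query = NLPEngine().to_query("patients with arrhythmia")
print(backend.answer(query))  # 0.5 — fraction of matching records
```

The point of the sketch is the boundaries: the chat layer only ever hands the backend a structured query, so any one module can be rewritten or removed independently.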
The first step in building a product is to find a good, catchy name for it. After some heated verbal exchanges and a couple of black eyes, we settled on the name InsightsBud. The more you think about the name, the more you realize how aptly put together it is. The core idea behind the platform is to free a data scientist from redundant duties like cleaning up data and generating visual insights. InsightsBud can now do this job for you: it equips a layman to do data science tasks and generate correlations on data to bring out business insights. The biggest beneficiaries of the platform are business executives, sales heads, and marketing leads, to name a few.
The potential use cases as well as the market impact of such a product fueled the team's enthusiasm. Everyone had their duties assigned, and it was crunch time. I was tasked with developing the ML backend. The workflow was pretty straightforward: the system receives some data along with some information on which domain the data is from, e.g., healthcare or transportation. Although this info may not make much difference to what happens in the backend, having something is better than nothing.
Once a dataset is received, a preprocessing step checks for data characteristics like strings, integers, and booleans. Headers for the data are compiled into word ontologies to make processing and mapping of data points easier. Then our rule engine fits subsets of the data to figure out the best ML model, its parameters, and the characteristics of the outcome. Although not the best method for scaling, this reduces computational errors and resource cost.
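To make the type-checking step above concrete, here is a small sketch of how a column's type can be inferred from its raw string values. This is an illustration of the idea, not the actual InsightsBud preprocessing code, and the recognized boolean spellings are an assumption:

```python
def infer_type(values):
    """Infer a column's type from its raw string values (illustrative)."""
    cleaned = [v.strip() for v in values if v.strip()]
    # Assumed boolean spellings; a real rule-book would be configurable.
    if all(v.lower() in ("true", "false", "yes", "no") for v in cleaned):
        return "boolean"
    try:
        for v in cleaned:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in cleaned:
            float(v)
        return "float"
    except ValueError:
        return "string"

def profile(headers, rows):
    """Map each header to the inferred type of its column."""
    columns = list(zip(*rows))
    return {h: infer_type(col) for h, col in zip(headers, columns)}

schema = profile(["age", "smoker", "city"],
                 [("42", "yes", "Kochi"), ("35", "no", "Delhi")])
print(schema)  # {'age': 'integer', 'smoker': 'boolean', 'city': 'string'}
```

A profile like this is what lets the later stages treat "age" as a numeric feature and "city" as a categorical one without any human in the loop.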
Predictive analytics, data insights, and so on are created in real time based on what the user asks. As an example, let me tell you how the system reacts when a user inputs some data and chats with the bot. Say you are someone from the healthcare department. You are asked to generate insights on data about patients from different parts of a state or province. The data contains info related to patient health, history of diagnosed diseases, geo-location, ethnic background, demography, etc. You can load this data into InsightsBud, answer certain questions related to the uploaded data, and start asking questions to Bud. A question would be something like, “Hey Bud, what is the possibility of someone from X location having chronic arrhythmia?”. Based on the data already uploaded to the system in the previous step and the information extracted by the bot, the model takes in the location and the type of disease and gives a probability score for the chances of a person having the disease. The model generated for the data is based on several attributes of the training data, and is decided by a rule engine and a model parameter estimation technique that relies on data subsets to evaluate the best-fit model for any given data.
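The query path above can be sketched in a few lines. The real backend fits a model to the uploaded records; this toy version just computes an empirical frequency, and the record layout is an assumption made for illustration:

```python
def disease_probability(records, location, disease):
    """Estimate P(disease | location) from patient records (toy version).

    A fitted model would generalize beyond the observed records; this
    sketch only counts empirical frequencies.
    """
    local = [r for r in records if r["location"] == location]
    if not local:
        return 0.0
    hits = sum(1 for r in local if disease in r["diagnoses"])
    return hits / len(local)

records = [
    {"location": "X", "diagnoses": {"chronic arrhythmia"}},
    {"location": "X", "diagnoses": {"diabetes"}},
    {"location": "Y", "diagnoses": {"chronic arrhythmia"}},
]
print(disease_probability(records, "X", "chronic arrhythmia"))  # 0.5
```

The bot's job is to extract `location` and `disease` from the free-text question; once those two arguments exist, answering is just a model (or, here, a frequency) lookup.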
The UI team started their work by designing the user flow and the UX. On an experimental basis we tried out Flutter for the application development, but found it very unstable for our particular use case. This forced us to switch to Angular for the frontend. Node.js was our choice for managing backend communications, whereas Python was chosen for building the data loader and the preprocessing backend.
During the development of InsightsBud, we incorporated our data ingestion tool, Gulpi, which acts as the data crawler and dumping mechanism that lets us integrate data sources like Slack, Twitter, and Facebook. A regular CSV input is also supported. Next comes the ML backend, which was my prime focus. Given the vastness of the data sources, it was only sensible to come up with a somewhat generic solution that can handle different data characteristics like data types, sources, and the discipline to which the data belongs.
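The Gulpi crawler itself is internal, so I can only illustrate the plain-CSV path here: a minimal loader, sketched with Python's standard `csv` module, that turns CSV text into the headers-plus-rows structures the rest of the pipeline consumes.

```python
import csv
import io

def load_csv(text):
    """Parse CSV text into (headers, rows) for downstream preprocessing."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    rows = [tuple(r) for r in reader if r]  # skip blank lines
    return headers, rows

headers, rows = load_csv("age,city\n42,Kochi\n35,Delhi\n")
print(headers)  # ['age', 'city']
print(rows)     # [('42', 'Kochi'), ('35', 'Delhi')]
```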
The first step was to design a preprocessing rule-book which takes into account certain obvious characteristics of the data like origin, trend, data type, and size. Based on these guidelines, the crawler performs certain validation checks to ensure that the input data clears some basic criteria to be considered a valid input to the system. Such an approach lets us filter out fluke data.
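A validation pass of this kind might look like the sketch below. The specific rules and the threshold are assumptions made for illustration, not the actual rule-book:

```python
MIN_ROWS = 10  # hypothetical threshold, not the real rule-book's value

def validate(headers, rows, required=()):
    """Return a list of rule violations; an empty list means valid input."""
    errors = []
    if any(not h.strip() for h in headers):
        errors.append("blank column header")
    if len(rows) < MIN_ROWS:
        errors.append(f"fewer than {MIN_ROWS} rows")
    for col in required:
        if col not in headers:
            errors.append(f"missing required column: {col}")
    if any(len(r) != len(headers) for r in rows):
        errors.append("row width does not match header count")
    return errors

errs = validate(["age", ""], [("42", "yes")] * 3, required=("location",))
print(errs)  # three violations: blank header, too few rows, missing column
```

Returning every violation at once, rather than failing on the first one, gives the user a complete picture of why their upload was rejected.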
Insights from the crawler are also used during preprocessing and data cleanup. Right after the cleanup, our backend algorithm generates models on random subsets to evaluate the best model parameters for any given data. Immediately after this, the model suggested by the backend is trained on the entire dataset. Similar steps are repeated with key attributes within the data to generate models for every feature in the dataset as a function of every other attribute.
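The subset-based selection step can be sketched like this: fit each candidate model on a few random subsets, score it on the held-out points, and keep the candidate with the lowest total error before retraining on the full dataset. The two candidate models here (a mean predictor and a simple linear fit) are simplified stand-ins for the real candidate pool:

```python
import random

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: least-squares line through the training points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    denom = sum((x - mx) ** 2 for x in xs) or 1e-9
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom
    return lambda x: my + slope * (x - mx)

def select_model(xs, ys, candidates, trials=5, seed=0):
    """Score each candidate on random train/test splits; return the best."""
    rng = random.Random(seed)
    scores = {name: 0.0 for name in candidates}
    idx = list(range(len(xs)))
    for _ in range(trials):
        rng.shuffle(idx)
        train, test = idx[: len(idx) // 2], idx[len(idx) // 2:]
        for name, fit in candidates.items():
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            scores[name] += sum((model(xs[i]) - ys[i]) ** 2 for i in test)
    return min(scores, key=scores.get)

xs = list(range(20))
ys = [2 * x + 1 for x in xs]  # perfectly linear toy data
best = select_model(xs, ys, {"mean": fit_mean, "linear": fit_linear})
print(best)  # 'linear' wins on linear data
```

Evaluating on subsets instead of the full dataset keeps the search cheap, which is exactly the trade-off mentioned above: it scales poorly to huge candidate pools, but it keeps the resource cost of model selection low.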
Once we felt that we had all done our part, it was time for the marriage: we had to merge everything together into a seamlessly functioning prototype. All hell broke loose when we integrated the modules. Although everyone worked together and knew what's what, cross-platform integrations often hit roadblocks during the initial integration.
The clock was ticking and we were just 15 hours from the product presentation. We had a sleepless night ahead. Foreseeing this, we had already stocked up on Red Bulls and some munchies, which proved not so helpful six hours into the night. It was four in the morning and nobody's eyes were drooping. Bug fixes were underway, and by 6 AM the platform seemed to be working. It was accepting data and generating insights based on conversations with the bot. We planned to take a nap for a couple of hours before the event's inaugural address.
Then suddenly someone reminded us that we had a product but no demo presentation or pitch deck. This was bad. Even though we had a viable product, an average pitch could mean disaster for the future of the platform. The main purpose of the presentation is to convince a group of investors of the relevance of the product, its market impact, and how it will make them money, and ultimately to make them invest in the product.
We quickly got into action, prepared the deck, and rushed to the presentation. The demo went well, the investors were happy, and we had to get some sleep. We slept so deeply that we missed the moment we were declared the winners of the event. But for us, a good, satisfying sleep was worth more. You know the feeling you have when you have accomplished something everyone else said you couldn't? It's priceless, and that sleep, my god, I have never had such a pleasant, satisfying sleep in my entire life. The past two days taught me a lot of lessons in mentorship, teamwork, and the importance of knowledge sharing and collaboration.
I felt the story of our journey needed to reach a lot of people, because it has the potential to inspire individuals or groups to become more proactive about how they manage their time, and to give them a lesson in teamwork. To get more info about our journey and the things we do, please visit us at www.accubits.com. For a more in-depth understanding of how InsightsBud works, and to try out the platform, visit www.insightsbud.com.