Technology Learnings at omni:us | Celebrating a year on Medium
A year ago, when we pushed out our first Medium blog post, we were still figuring out how to use Medium to showcase some of our work and contribute to the community. Fast forward to December 2018, and Medium has been a great platform for us. We are generating more views and reads every month, and it is also pushing us, as a technology team, to write good articles based on the scenarios we encounter in our day-to-day work at omni:us. A big shout out to Vithya Gopal for helping us curate all the articles and bring them into a human-readable state.
Technology Team Culture.
The reason we are taking the time to write a post on the culture of our tech team is that we have grown from a team of just 6 people this time last year to a team of 15+. We are a close-knit team of data scientists and engineers, so having the right culture and attitude towards problems, and towards each other, has proved to be an essential component in helping us grow.
Our data scientists build their own Docker containers. Yes, you heard that right. Without building and testing their models and services in a Docker container, a service does not get shipped to production. This frees the engineers to actually build and manage the data engineering pipelines and other engineering platforms, without the overhead of needing to understand every single dependency required to run a deep learning system in production.
The engineering and operations teams understand the meaning of precision, recall, accuracy, model performance, evaluation, and train, test, and validation datasets. They are building tools and processes to handle the end-to-end AI technology platform we are building at omni:us. This lets the data scientists know that their models and services are in safe hands, without having to explain the expected outcome every time they add or modify an AI module.
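As a rough sketch of that shared vocabulary, here is how those metrics relate on a toy example (the labels below are invented purely for illustration, using scikit-learn from our stack):

```python
# Toy illustration of the metrics vocabulary shared between teams.
# The labels here are made up for the example, not real model output.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 0, 0, 1]  # ground-truth labels from a validation set
y_pred = [1, 0, 0, 1, 1]  # hypothetical model predictions

print(f"accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # fraction of all predictions that are correct
print(f"precision: {precision_score(y_true, y_pred):.2f}")  # TP / (TP + FP)
print(f"recall:    {recall_score(y_true, y_pred):.2f}")     # TP / (TP + FN)
```

With 2 true positives, 1 false positive, and 1 false negative, precision and recall both come out to 2/3, while accuracy is 3/5 — a small reminder of why a single number rarely tells the whole story.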
This culture emerged from the sheer curiosity, respect, and mutual admiration of each other's expertise that everyone brought to the table. And the cherry on top: we currently have people from 22 different countries in our company, and that number is growing! Trust me, diversity helps.
Things we did wrong.
In the last 14 months we really started to run at an astonishing speed in terms of development. Of course, we screwed up quite a bit, and the repercussions stayed with us for a while. Here is a short list of our screwups:
- Not thinking about data engineering pipelines: In the early days, instead of thinking about a broader spectrum of problems, we focused on building a monolithic Java-based system ourselves. This really came back to bite us, as we thought about neither operations nor scalability.
- Naming conventions: As we started working rapidly, building different components for different customers, we completely forgot our 101 lessons on writing good, maintainable code. We introduced customer names into some of our modules, and this came back to haunt us for a really, really long time. We have finally been able to remove them completely, thanks to the determined engineers on our team.
- Single-channel mindset: From the beginning, we knew the problems we wanted to solve at omni:us could only be solved with deep learning systems. The hiccup was that we thought predominantly from a computer vision perspective, while ignoring everything else. This created quite a few problems, as in the beginning our solutions did not perform as well in production for our customers. We were really just trying to use one hammer on every kind of bolt.
But, this was a catalyst for us to start thinking about the problem from a multi-dimensional perspective. We have come a long way now!
- Underestimating the limitations of deep learning: We love the wonderful things that deep learning can do for us, and it is genuinely a great tool for solving complex problems. However, it has certain limitations, and they surface more and more as you descend deeper down the rabbit hole. For a startup founded on building reliable AI systems for customers, losing sight of the theoretical limits of what deep learning systems can do puts us in tight situations. This wonderful article by Gary Marcus elaborates on the topic. We have found better ways to handle this by making our systems and thought processes very transparent to our customers, thereby making them aware of deep learning's limitations, rather than showing them a magical black box that can solve everything.
Things we did right.
To actually be here today, writing about this, we must have done a lot of things right. We could go on and on about them, but instead we'll just touch on a few key points:
- Transparency: It’s easy being transparent when everything is going right, but it gets especially hard when you are stuck in bad situations. The attitude of being completely transparent was set by our CEO, who made it very clear from day 1 that we are all in this together, through the highs and the lows.
- Hiring: It took us a while, but we believe we hired the right people with the right attitude, in every department, based on more than skillsets alone. Very early in the life of the company, we set up a process to evaluate candidates in all aspects before welcoming them aboard.
- Setting our egos aside: When you are in a dynamic startup with highly skilled people, egos come into play very easily and very early on. We have been able to sidestep this by debating ideas and implementations on their merits rather than on who proposed them. This has helped us replace systems, technologies, algorithms, and sometimes even a few customers, in quick succession.
- AI for the right reasons: The coolest part for us is that we are not using AI only on our slide decks to pitch; we are actually implementing a lot of deep learning techniques because, at this point in time, that is probably the only way to solve the problems of information extraction and understanding in documents.
- Education about AI and technologies: We truly believe not just in building reliable systems, but also in educating stakeholders of every kind on the possibilities and limitations of our technologies.
This point takes me back to a discussion at our office between Vithya Gopal and me, where we had asked her to come up with a specification for our APIs. She did some reading and came back with a proposal and one question: what does an API really mean in our ecosystem? It was a wonderful question, because it reminded us that we often assume everything we build is naturally understood by everyone we talk to. Education, the process of bringing everyone onto the same plane, is what truly nurtures growth.
Technologies we use at omni:us.
When we write blogs or speak at conferences, we are most often asked about our tech stack, and we think it is very relevant to the culture we have at omni:us. We believe in open source (sometimes a little too much) and try to use open source projects as much as possible. We also contribute; you can find us on GitHub.
We also choose technologies that can be used in both cloud and on-premise installations. As we are in the insurance industry, it is very important for us to keep this in mind. We should be able to dockerize what we do. Yes, dockerize is a term.
We are predominantly a Python shop, with some of our core annotation systems written entirely in Java. For the UI, we use AngularJS. Our preferred way of storing metadata is XML.
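As a minimal illustration of what XML metadata handling can look like with Python's standard library (the document structure and field names here are hypothetical, invented for the example):

```python
# Minimal sketch of reading XML metadata with Python's standard library.
# The element names and attributes below are hypothetical examples,
# not our actual schema.
import xml.etree.ElementTree as ET

raw = """
<document id="doc-001">
  <source>claims-inbox</source>
  <pages>3</pages>
</document>
"""

root = ET.fromstring(raw)
metadata = {
    "id": root.get("id"),            # attribute lookup
    "source": root.findtext("source"),  # child element text
    "pages": int(root.findtext("pages")),
}
print(metadata)
```

Keeping metadata in a self-describing format like this makes it easy to pass documents between Python and Java components alike.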
Some of the open source frameworks/libraries we use:
* Airflow — For building our data pipelines.
* Keras, TensorFlow, PyTorch — For building our deep learning models; we have started loving PyTorch a bit more these last months.
* TensorBoard, Visdom — For visualizing our deep learning models, plotting more than just accuracy metrics.
* Jupyter notebooks — All our experiments are carried out in notebooks; they have become our go-to tool. Sometimes we even use them to train our deep learning models during experiment runs, though we have seen the kernel panic a bit too much during recent runs.
* scikit-learn, matplotlib, seaborn — Python packages we use for machine learning and visualization.
* ELK — We use Elasticsearch, Logstash, and Kibana to monitor the applications and logs of all our microservices.
* Flask — We use Flask, a micro web framework written in Python, to build our deep learning services.
* Prometheus — We use Prometheus and Grafana to monitor our infrastructure metrics.
* Keycloak — We use Keycloak to build a single sign-on service that we can deploy across all our pipelines.
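To make the Flask pattern above concrete, here is a minimal sketch of a prediction microservice. The model is a trivial stub, and the endpoint name and payload shape are invented for illustration, not our actual API:

```python
# Minimal sketch of a Flask microservice wrapping a model.
# predict_stub stands in for a real trained model; the /predict
# endpoint and its JSON payload are hypothetical examples.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_stub(text):
    # A real service would load a trained model at startup and
    # run inference here.
    label = "invoice" if "invoice" in text.lower() else "other"
    return {"label": label}

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    return jsonify(predict_stub(payload.get("text", "")))

# In production this app would be served by a WSGI server
# (e.g. gunicorn) inside a Docker container, not app.run().
```

A service shaped like this is exactly what our data scientists package into their own containers before anything ships.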
We are a Docker and Kubernetes company through and through. Our on-premise customers run Kubernetes as well, as it eases our deployments. We are a small team, so naturally we need technologies that help us scale and stay efficient at the same time.
We run our own internal PyPI server and use Artifactory to manage our Java artifacts.
We are extremely privileged and proud to be part of the AI journey where we can create a big impact in the real world. In order to do this consistently, we need a team of not just talented people, but also people with the right culture and attitude. When everything comes together, you end up having so much more fun than you signed up for!