OpenWISP Monitoring — GSoC 2020 Project Report

For the last few months I have been working on OpenWISP, an open source project selected for Google Summer of Code 2020, under the mentorship of Federico Capoano and Pablo Castellano.

The end product of my entire summer, and of the many contributors who worked on the module, is:

Device Admin OpenWISP Monitoring

We aimed to create a monitoring module so that multiple devices (e.g. routers) in a network can be easily monitored live with the help of multiple charts (you can easily add one for your own use case). If a problem arises with any of the devices (caused by, say, configuration issues, CPU overload, disk space outside safe limits and many such issues), the user is notified in real time (you can set a tolerance for the alerts so you don’t get mail bombed 😅).

There are many more features too, which are explained briefly in The Work section. The code lives in the openwisp-monitoring repository.

Seems interesting, exciting? Want to know the journey? Here you go.

What is OpenWISP?

  • dynamic auto-configuration of new nodes
  • creation of VPN tunnels
  • initialization of WiFi access points
  • configuration of mesh networks
  • configuration of any other network configuration supported by OpenWRT

openwisp.org

The beginning:

Coming to the project selection, OpenWISP had announced 4 projects. I wasn’t quite sure which of them would teach me the most (learning and exploring is what I was aiming for in GSoC). At that time Federico (one of the core maintainers of the project and also my mentor) suggested in the public channel that openwisp-monitoring was a very exciting project to work on, given the technologies involved. So I read through the measurable outcomes and decided to apply for it.

While working on the proposal I ran into a lot of questions, since I had never worked on most of these things before. I asked them on the community’s mailing list and on IM to get clarification on most of them. Do ask questions, a lot of them, until you are sure you understand what the problem is. You will end up saving your own time and your reviewer’s (this also happens to be my most important learning this summer 😄).

I submitted my draft proposal and got critical feedback at first 😬. This pushed me to try harder: I kept reading the documentation of the technologies involved, diligently went through existing examples and best practices, and kept iterating on my proposal until I was sure that this was it 😤. I believe a proposal is the best indicator of whether an individual has understood the problem.

The Journey

For all the selected students, a few general guidelines were posted, and the mentors organized calls to kick-start the projects and give us a smooth start.

The Work

Resources metrics

The resources metrics as viewed in one month resolution with help text displayed for all

We added resources metrics. This was a very worthwhile experience as I got to work with Lua, OpenWRT and Plotly (I had very little front-end knowledge before the summer). Noumbissi Valere helped me get OpenWRT set up with OpenWISP, and Pablo Castellano helped me a lot in getting used to it and to Lua. The end product is as shown above, but getting there wasn’t so simple: I had to familiarize myself with InfluxQL, update the existing Lua script so that the relevant resources data is passed to the monitoring module in NetJSON format, and understand exactly how Metrics, Charts and AlertSettings are related, all while ensuring that I didn’t break anything along the way :)

Swappable Models

Swappable models help developers who want to extend or modify some fields of an existing Model. They can easily do so by making use of this feature of OpenWISP Monitoring. (If you are interested, this is nicely documented in the module’s docs.)

Pushing up coverage and speeding up the tests

Abstraction Layer

InfluxDB setting (Fully supported)

Elasticsearch setting (Currently in development)

AlertSettings and Check Inline

AlertSettings Inline
Check Inline

We wanted two tabs: one to make it easier for users to manage the thresholds of the various AlertSettings related to a device, so that they don’t get mail bombed, and another where they can easily view all the checks that run periodically in the background. There were a few hiccups initially with the AlertSettings Inline, as there was no direct relation between the AlertSettings model and the Device model, so we used nested admin to make up for this shortfall.
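To make the idea of a threshold with a tolerance more concrete, here is a minimal sketch in plain Python. It is not the module's AlertSettings implementation: the class name `ToleranceAlert` and its fields are hypothetical, and it assumes that "tolerance" means "notify only once the value has stayed beyond the threshold for this long".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ToleranceAlert:
    """Hypothetical helper, not the module's AlertSettings model."""

    threshold: float      # value above which the metric counts as unhealthy
    tolerance: timedelta  # how long the breach must last before notifying
    breached_since: Optional[datetime] = None
    notified: bool = False

    def check(self, value: float, now: datetime) -> bool:
        """Return True only when a notification should be sent."""
        if value <= self.threshold:
            # Back within limits: reset so a later breach starts a new window.
            self.breached_since = None
            self.notified = False
            return False
        if self.breached_since is None:
            self.breached_since = now
        sustained = now - self.breached_since >= self.tolerance
        if sustained and not self.notified:
            # Notify once per sustained breach instead of on every reading.
            self.notified = True
            return True
        return False
```

With a tolerance of five minutes, a short CPU spike produces no mail at all, and a long overload produces exactly one notification until the value recovers.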

Global configuration

Without requirements or design, programming is the art of adding bugs to an empty “text” file.

— Louis Srygley

It was Federico’s idea that, since we had so many things in a single module (Charts, Metrics, AlertSettings, Notifications, etc.), it made sense to have a single place where default values for all of them can be configured. In production, a network operator will manage hundreds of devices, and changing values for each of them individually would be a nightmare. So we designed one global configuration that can be overridden easily: if only a single value needs to change, the user can change just that value through a setting. If you are interested in reading more about this feature, it is nicely documented in the module’s docs.

I learnt an important lesson here: good tools are good by their very design 😇.
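The "defaults plus overrides" idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the dictionary keys, `DEFAULT_CHARTS`, `USER_OVERRIDES` and `merge_config` are hypothetical names, not the module's actual settings or code.

```python
from copy import deepcopy

# Hypothetical defaults; the keys are illustrative, not the module's real ones.
DEFAULT_CHARTS = {
    "cpu": {"title": "CPU load", "unit": "%", "colors": ["#1f77b4"]},
    "disk": {"title": "Disk usage", "unit": "%", "colors": ["#ff7f0e"]},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # New key or non-dict value: the override wins.
            merged[key] = deepcopy(value)
    return merged


# An operator changes a single value and adds one new chart.
USER_OVERRIDES = {
    "cpu": {"title": "CPU load (all cores)"},
    "temperature": {"title": "Temperature", "unit": "°C", "colors": ["#d62728"]},
}

charts = merge_config(DEFAULT_CHARTS, USER_OVERRIDES)
```

The recursive merge is what lets someone override only the title of one chart without having to repeat the rest of its definition.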

The second Timeseries database

  • We wanted to select a timeseries database that is horizontally scalable and open source, with an active community and good documentation. We compared multiple options, including Prometheus, VictoriaMetrics and a few others.
  • Finally we went for Elasticsearch, after I developed a basic prototype that could be used along with the above Abstraction Layer. Now, Elasticsearch is more of a search index than a timeseries database, so it was a slightly risky decision, but owing to its very good documentation we were able to safely couple it with the existing code without any major rework. During this phase I learnt PromQL and elasticsearch-dsl, did exhaustive reading and in the end created a draft PR for the same.
  • This happens to be the most exciting part of the entire GSoC and one of the most challenging ones too, with a steep learning curve. Currently there is one issue that still needs to be fixed for this to be fully useful. I will try to fix it, because I think this is a very big add-on for the module, one that will be really helpful to users.
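For readers wondering what an abstraction layer over timeseries databases looks like in practice, here is a rough sketch. It is not the module's code: `TimeseriesBackend`, `InMemoryBackend`, `BACKENDS` and `get_backend` are hypothetical names, and an in-memory store stands in for a real InfluxDB or Elasticsearch client so that the example runs on its own.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


class TimeseriesBackend(ABC):
    """Hypothetical interface that every backend has to implement."""

    @abstractmethod
    def write(self, metric: str, timestamp: float, value: float) -> None:
        ...

    @abstractmethod
    def read(self, metric: str, since: float = 0.0) -> List[Point]:
        ...


class InMemoryBackend(TimeseriesBackend):
    """Stand-in for a real database client, used only for illustration."""

    def __init__(self) -> None:
        self._points: Dict[str, List[Point]] = defaultdict(list)

    def write(self, metric: str, timestamp: float, value: float) -> None:
        self._points[metric].append((timestamp, value))

    def read(self, metric: str, since: float = 0.0) -> List[Point]:
        # Return points in chronological order, filtered by start time.
        return sorted(p for p in self._points.get(metric, []) if p[0] >= since)


# Registry of available backends; a second database would be added here.
BACKENDS = {"memory": InMemoryBackend}


def get_backend(settings: dict) -> TimeseriesBackend:
    """Pick the backend named in a settings dict (assumed 'BACKEND' key)."""
    name = settings["BACKEND"]
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown timeseries backend: {name!r}") from None
```

The rest of the code only ever calls `write` and `read`, so switching databases comes down to changing one setting, which is roughly what made prototyping a second backend feasible.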

Making tasks resilient to failures to prevent metric data loss

Documentation and CI

Besides these, there were multiple minor pull requests that we worked on. If you are interested in knowing more about them, you may refer to the Done column for a list of all the completed tasks.

Tech worked upon

What next …

I would like to thank my mentors, without whom I am not sure I would have made it this far, as well as the community as a whole and the other GSoCers, from whom I learnt quite a few things. Most importantly, I would like to thank Google for this gem of a program, where students familiarize themselves with open source and become better programmers 😄

If you made it this far, thanks a lot. I hope this gives you some insight into my GSoC experience, and I wish you the best for your GSoC proposal 😇
