As part of Pulse Lab Jakarta’s data science team, hacking is a central part of my job and something the entire team takes pride in. But when I say hacking, let me be clear, I mean it in the sense of dedicated programming, analysis, problem solving and tech engineering.
Our Lab develops toolkits, applications and platforms to improve data-driven decision-making and to support evaluation of promising solutions that are oriented towards the public good. A question we often get asked is: How does Pulse Lab Jakarta develop these products? More explicitly, how does a UN-based data lab whose counterparts are mostly government agencies and other UN organisations actually “hack”?
I’ve been meaning to share some tips based on our tech development approach for a while now, but as usual I find these things easier to do than to write up. Here are my top five:
1. Be agile
In a research environment such as Pulse Lab Jakarta, changes during the development of software are inevitable. Still, the development process must remain manageable and measurable despite this flexibility. The Agile Movement, through its focus on finding alternatives to traditional project management, provides opportunities to evaluate the direction of a project throughout its development lifecycle. One of its methods is to divide product development work into small parts, which minimises the amount of up-front planning and design. After each iteration (which can range from one to four weeks), stakeholders typically get a preview of the application’s working results. This enables the product to adapt to changes quickly. Since we often collaborate with multiple players on a single project, we share the Agile view that constructive reviews, frank progress updates, and flexibility for change are essential ingredients for success in a software development process.
2. Don’t reinvent the wheel
At our Lab, we frequently use third-party libraries and modules. This way, our team can focus on the research itself. True, using outside software may sometimes incur a fee, but when you compare this to the time and resources that would be invested in building something in-house, using third-party software is usually still the better option. In addition, many open source projects already offer features that can be reused. An alternative would be to conduct joint research with other institutions or labs that already have the tools needed. If you are trying to extract insights for a specific case from local radio news, for example, you could locate libraries, algorithms or modules that provide speech-to-text features in a particular local language. In such a case, the focus would be on creating the main module to process and analyse the text results. Of course, the case would not be the same if the main purpose of the research is to create a local language recognition tool. With many apologies for resorting to an old cliché, do not reinvent the wheel unless your main research is about the wheel!
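To make the radio news example a little more concrete, here is a minimal Python sketch of what "focus on the main module" could look like. It is only an illustration under stated assumptions: `fake_transcriber` is a hypothetical stand-in for whichever speech-to-text library you end up using, the sample sentence is made up, and `keyword_counts` is an invented name for the analysis module, not something taken from our own codebase.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Assumption: any third-party speech-to-text tool can be wrapped as a
# function that takes an audio file path and returns the transcript.
Transcriber = Callable[[str], str]


def fake_transcriber(audio_path: str) -> str:
    """Hypothetical stand-in for a real speech-to-text library."""
    # A made-up sample transcript, used only so the sketch can run.
    return "Banjir di Jakarta hari ini, banjir meluas ke Bekasi"


def keyword_counts(
    audio_path: str, transcribe: Transcriber, keywords: List[str]
) -> Dict[str, int]:
    """The 'main module': analyse the text, not the audio."""
    text = transcribe(audio_path).lower()
    # Counter comes from the standard library, so the counting logic
    # itself is also not reinvented here.
    counts = Counter(re.findall(r"\w+", text))
    return {keyword: counts[keyword.lower()] for keyword in keywords}


if __name__ == "__main__":
    print(keyword_counts("news.wav", fake_transcriber, ["banjir", "gempa"]))
```

Because the transcriber is passed in as an argument, swapping the stand-in for a real library should not require touching the analysis code.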
3. “What will our users use?”
It is always exciting to try a new language or test out state-of-the-art technologies, especially in the world of big data analytics. However, there are various considerations and parameters to weigh when choosing which technology stack to use for a particular product or project. New technologies are not always the best fit for purpose. Throughout our work on different projects, we have concluded that some technologies and programming languages may be ideal for solving certain problems, yet may not be useful for others. The parameters that weigh most heavily in our choice of technologies include the costs and advantages of the whole stack, as well as long-term maintainability. While technology features are of course important, it is ease of use for our potential clients that we prioritise.
4. Choose a modular design approach
Nowadays, many tools and software created in the tech industry are hosted in the cloud, which in some cases may mean massive architectures. In our Lab, we tend to prefer a modular design approach since this enables loose coupling. When we are working with complex systems, our team can therefore separate the modules with confidence that each will keep working independently. In other words, the modules become interchangeable and the dependency between them is small. Besides its technical advantages, this approach makes for effective teamwork. It is also very helpful in situations where the data are owned by other organisations with strict licensing agreements. On some projects, we have had to work with external data sources that could only be accessed on specific local networks used by the data owners. By using a modular approach, we were still able to get the results we needed from external data sources and add these results to the larger design.
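For readers who prefer code to prose, the idea of interchangeable, loosely coupled modules can be sketched in a few lines of Python. This is a simplified illustration, not our actual architecture: the names `ResultModule`, `InMemoryModule`, `PrecomputedModule` and `combine` are all hypothetical, and the second class merely stands in for a module that would run on a data owner's network and hand back only aggregated results.

```python
from typing import Dict, List, Protocol


class ResultModule(Protocol):
    """The only contract the larger design relies on (hypothetical)."""

    name: str

    def run(self) -> Dict[str, float]:
        ...


class InMemoryModule:
    """A module that computes its results from data it holds itself."""

    def __init__(self, name: str, values: List[float]) -> None:
        self.name = name
        self.values = values

    def run(self) -> Dict[str, float]:
        count = len(self.values)
        mean = sum(self.values) / count if count else 0.0
        return {"count": float(count), "mean": mean}


class PrecomputedModule:
    """Stand-in for a module run where restricted data lives.

    Assumption: only aggregated results leave the data owner's network.
    """

    def __init__(self, name: str, results: Dict[str, float]) -> None:
        self.name = name
        self.results = results

    def run(self) -> Dict[str, float]:
        return dict(self.results)


def combine(modules: List[ResultModule]) -> Dict[str, Dict[str, float]]:
    """Merge module outputs without knowing how each one works."""
    return {module.name: module.run() for module in modules}
```

Since `combine` depends only on the `run` contract, a module can be replaced or moved to another network without changes to the rest of the system.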
5. Use and support open source solutions
At Pulse Lab Jakarta, we encourage the use of open source solutions because of their freedom, customisability, flexibility and interoperability. By using open source, the Lab is free from serious vendor lock-in. We typically find that open source software is better at sticking to open standards and principles than proprietary software. As an innovation lab that collaborates with data experts from both the public and private sectors, one of our key priorities is interoperability. We try to ensure that, in the event we need to collaborate with other organisations for a greater public good, our efforts are not constrained by the inflexibility and costs of closed source solutions. Correspondingly, we believe it is beneficial to publish some of our applications as open source, as a way to contribute to various areas of development and to encourage feedback from the tech and development communities.
To conclude, these are some of the things that have worked for us, but in no way is this meant to be a full recipe. What our team has learnt from various experiments (and slight mishaps) is that an approach that succeeds in one situation will not necessarily yield the same results if directly copied and pasted to another. So, we’ll continue trying out things, taking things apart, modifying them, and putting different elements together for better solutions.
Alright, back to hacking now.
Author: Muhammad Subair (Data Engineer)
Pulse Lab Jakarta is grateful for the generous support from the Government of Australia.