Building, Maintaining, and Sharing Knowledge: Software Library Strategies

Published in The Pixel · 5 min read · Feb 29, 2016

Every successful tech company reaches the important milestone of having multiple software projects. After proving the success of the first project, you often need to expand and start something new. You start to find yourself thinking, “Hey, that code from our other project solves some of my problems. Should I just copy it over?” Copying your code from one project to another may solve your short-term need, but it often leads to long-term issues, such as the two copies of the code diverging. Most importantly, it ignores a core principle of software engineering: abstraction!

Abstraction aims to make software follow the age-old adage of not “reinventing the wheel”. Why do something twice when you can do it once? Do it once and do it right. Sure, you might say, “The tech industry is constantly moving and developers barely have enough time as it is.” But it is important to remember that you are setting a precedent not only for yourself, but also for those who will have to use your code in the future. Share your knowledge in an orderly and easy-to-use format.

Libraries Contain a Wealth of Knowledge

And so should yours. In software engineering and in the real world alike, there are at least two approaches to organizing knowledge. General libraries cover a broad range of topics and tend to encompass many fields. Specialized libraries, such as a science or humanities library, provide you with a narrowed-down and focused set of information. Let’s look at some examples.

The general Python library that we use here internally is called PixKit. It was aptly named because it serves as a toolkit. It contains a variety of utility modules — for working with databases, datetimes, dictionaries, simple mathematical functions, text manipulation, and much more. PixKit is used in a majority of our microservices because they rest upon similar software needs. As a library, it aims to be a digital “Jack of all trades, but master of none.”

The opposite approach is taken by PixClient, one of our specialized Python libraries. The library only provides the user with a smart client interface that simplifies the popular HTTP requests library. It does this simplification by abstracting away the specifics behind the HTTP URLs and allows the user to request a much simpler “service” URL instead. PixClient is only good at one thing, but does it really well.
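PixClient's actual interface is not shown in this post, so the following is only a minimal sketch of the idea. It assumes a hypothetical `ServiceClient` class, a hand-written registry of service names, and a made-up hostname. It only resolves a short service name into a full URL; in a real client the result would then be passed on to the requests library.

```python
from urllib.parse import urlencode, urljoin


class ServiceClient:
    """Hypothetical client that maps short service names to full base URLs."""

    def __init__(self, registry):
        # registry: dict of service name -> base URL (an assumption for this
        # sketch, not PixClient's real configuration).
        self._registry = dict(registry)

    def build_url(self, service, path="", **params):
        """Resolve a service name and a relative path into a full HTTP URL."""
        try:
            base = self._registry[service]
        except KeyError:
            raise ValueError("unknown service: %r" % service)
        # Normalize slashes so the path is appended to the base, not swapped in.
        url = urljoin(base.rstrip("/") + "/", path.lstrip("/"))
        if params:
            url += "?" + urlencode(sorted(params.items()))
        return url


client = ServiceClient({"videos": "https://videos.internal.example.com/api/v1"})
print(client.build_url("videos", "channels/42", limit=10))
```

The caller only needs to know the name "videos"; the host, API prefix, and query-string encoding stay hidden inside the library, which is the kind of simplification described above.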

Distributing Your Library

Who are the intended recipients and users of your library? Is your library something that only your company could really use? Could it benefit the open source community? Would the open source community be able to help develop it into what your company needs it to be? These are just a few of the questions that you need to answer.

Building an internal library will probably require you to host and maintain your own server to act as a repository for your library. At Pixability, we host a PyPI server for our Python libraries. It’s almost like buying a plot of land and then building the library on top of it. It’s located exactly where it’s convenient for you. It gives you the most control over your library and allows you to do with it as you please.

Some people think that open source software means the source code is simply up for grabs and no longer belongs to you. The reality is that there is a large variety of licenses that allow you to define how your software may be used and modified. One benefit of going open source is that many programming languages have a public library repository, meaning that distributing your software is as simple as uploading the source or executables. Arguably more important, however, is that you expose your code to the world, allowing for external feedback and innovation.

while coding: engineer.drink(Coffee(mocha=True))

There are many different setups depending on your programming language of choice. Here is a popular setup for Python that we commonly use at Pixability:

Setuptools for managing and installing dependencies
Virtualenv for isolating the installation environment
Six for writing version-independent Python code
pytest for unit testing
Tox for running pytest against multiple Python versions

Write Thorough Unit Tests

A great unit test will not just cover code, like a blanket, but permeate it. Code coverage is a great resource to help with this, but the unfortunate reality is that it just isn’t quite enough. Many code coverage applications simply keep track of lines that ran. Some applications also provide additional information on how the code branched. In theory, a much better evaluation metric would also count how many unique permutations of input were tested against. Of course, this becomes an even more difficult problem to solve than it already is when dynamically typed programming languages are thrown into the mix.
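To make the limits of line coverage concrete, here is a small sketch with a hypothetical `average` helper (not taken from PixKit). The first test alone runs every line of the function, so a line-based coverage tool would report it as fully covered, yet only the second test reveals how it behaves on an empty list.

```python
def average(values):
    """Return the arithmetic mean of a sequence of numbers."""
    total = sum(values)
    return total / len(values)


def test_average_typical_input():
    # This single test executes every line of average(), so a line-coverage
    # tool would report 100% for the function.
    assert average([2, 4, 6]) == 4


def test_average_empty_input():
    # The empty list runs the very same lines, yet it exposes behavior the
    # first test never checked: a ZeroDivisionError.
    try:
        average([])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError for an empty list")


test_average_typical_input()
test_average_empty_input()
```

Both tests touch exactly the same lines, so the coverage number does not change between them. What changes is the input, which is the dimension a line-based metric cannot see.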

Writing a great unit test is often much harder than writing the code you’re testing. The solution is to take the time to think it through. What types of input might I expect to see? What types of input might I not expect to see? Both of these questions should provide you with some data to test against. Software engineers have a tendency to create static, “hard-coded” unit tests. Where applicable, randomizing the input can yield valuable results because it increases the size of your testing input set.
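One way to randomize input, sketched here with the standard library only, is to generate many inputs and assert properties that must hold for all of them. The `chunks` function and the test name are hypothetical examples chosen for illustration; the fixed seed is an assumption made so that any failure can be reproduced.

```python
import random


def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunks_with_random_input(rounds=200, seed=2016):
    # A fixed seed keeps failures reproducible from run to run.
    rng = random.Random(seed)
    for _ in range(rounds):
        items = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
        size = rng.randint(1, 10)
        pieces = chunks(items, size)
        # Properties that must hold for any input, not just hand-picked ones.
        assert all(1 <= len(piece) <= size for piece in pieces)
        assert [x for piece in pieces for x in piece] == items


test_chunks_with_random_input()
```

The random cases complement hand-written ones rather than replace them: edge cases you already know about, such as the empty list, still deserve explicit tests.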

Semantic Versioning

Although in no way required, Semantic Versioning is a great strategy for versioning your library because it emphasizes backwards compatibility. In a nutshell, versions are represented by the format “MAJOR.MINOR.PATCH”. The MAJOR version is incremented when you change your API in an incompatible way. The MINOR version is incremented when you add functionality in a backwards-compatible way. Lastly, the PATCH version is incremented when you fix bugs in a backwards-compatible way. The only time that backwards incompatibility is not completely frowned upon is before version 1.0.0, which should be your baseline for a stable API.
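The increment rules above can be sketched in a few lines. The `bump` function and its `change` argument are hypothetical names chosen for this example, and the sketch deliberately ignores pre-release and build-metadata suffixes.

```python
def bump(version, change):
    """Return the next version string for a given kind of change.

    `change` is one of "major", "minor", or "patch" (names chosen for
    this sketch).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":      # incompatible API change
        return "%d.0.0" % (major + 1)
    if change == "minor":      # backwards-compatible new functionality
        return "%d.%d.0" % (major, minor + 1)
    if change == "patch":      # backwards-compatible bug fix
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown change type: %r" % change)


print(bump("1.4.2", "patch"))  # 1.4.3
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "major"))  # 2.0.0
```

Note that bumping a higher component resets the lower ones to zero, so a consumer can tell from the version number alone how risky an upgrade is.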

Takeaways

Building a software library is a great way of distributing code (knowledge) across multiple projects. Keeping your library thoroughly tested and versioned helps prevent massive headaches in the future. Why not share it with the rest of your company or even with the open source community?

“Pass on what you have learned.” — Yoda