Creating Your First Python Package

This is my first blog post, with hopefully many more to come. It covers how to create your first Python package and release it on PyPI. I’ve been writing software for my startup and for various ad hoc development projects for the past couple of years. After putting out my first package a couple of weeks ago, though, I realized how much you learn from the process, to the point that not writing code in a “good practice” way now feels more painful than doing it properly.

As a data scientist first, I didn’t always have software best practices at the forefront of my mind when doing projects. After forcing myself to write tests and pick up more best practices, I can’t believe I lived without that mindset for so long. And these are lessons learned from a small package that I put out in under a week! For those of you who are interested, the package is called icd. I have also just released a second one called fastteradata; however, I got too excited and released it early, so I am currently writing its tests retroactively and doing a major cleanup of the code.

Some of the benefits of writing your first package, no matter how small, will be:

  1. Being forced to think much more modularly
  2. Writing code with tests in mind
  3. Thinking about making the API simple enough for others to intuitively grasp and reuse your work
  4. Learning new tools that come along with writing tests

Enough about the benefits, let’s get to it!

First, let’s go through a very basic package example starting with the folder structure. A great resource for folder structure templates is a package called cookiecutter. There are many different templates to choose from, and you just pick one, call the command on a downloaded example, and in no time you have some basics filled out to get you started faster. In my case, I used the pypackage-minimal template because I wanted a simple skeleton and wanted to use pytest.

After you use cookiecutter, make sure you have the following files. We will go through each one at a high level; you can explore them in much more depth later on.

Directory <package_name> — Where all of your actual package code will live
Directory tests — Where all of your tests you write will live
tox.ini — Used in conjunction with pytest for multi-environment testing
codecov.yml — Allows you to measure how much of your code is backed by tests
LICENSE.txt — License to distribute your code under
README.rst — Where your initial documentation will live (for larger packages, you might move some of the documentation out of your README into something like a Sphinx site.)
setup.py — Used to generate and specify details about your package

Great, now that we have the structure let’s look at actually writing some code. In your package_name directory, which from now on we will call animalsounds, let’s first start by creating an __init__.py file to represent the package. In that file, we can place something like:

from animalsounds.sounds_generator import generator
__version__ = '0.1.0'
__author__ = 'First Last <email@gmail.com>'
__all__ = ['generator']

This is where everything gets imported for your users to access. In this case, we import the generator function from a module, sounds_generator. Now, create a sounds_generator.py file and place something like the following inside.


# Maps each known animal (lowercase) to the sound it makes
sounds_dict = {"cat": "meow", "dog": "woof", "fish": "..."}

def generator(animal):
    """
        Summary:
            Returns the sound that a given animal makes
        Args:
            animal (str) - The animal you want to hear, valid options include 'cat', 'dog', and 'fish'
        Returns:
            A string of the appropriate sound.
        Raises:
            ValueError if the animal is not known.
    """
    try:
        # Lowercase the input so that "Cat" and "cat" are treated the same
        sound = sounds_dict[animal.lower()]
    except KeyError:
        # Only catch the missing-key case rather than every possible error
        raise ValueError(f"We don't know what a {animal} sounds like! You could be on to something!")
    return sound

You’ll notice in this simple function that the docstring is almost longer than the actual code logic. This is something your future self will thank you for a thousand times over. But now that we have our first Python package function, what’s next?

Tests!

You should write every piece of code with tests in mind, so that you can break larger functions apart into base components and hopefully get more reuse out of them. In this case, we wrote our package code first. However, you can also write your tests first, knowing your function can’t fulfill them yet, and only then fill in the code until the tests pass. There is a lot of great material out there on Test Driven Development (TDD) if you want to explore the philosophy behind it more. It is a major level up in best practice coding, and once you force yourself to do it, everything seems so much easier.
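To make the test-first idea concrete, here is a minimal sketch of one TDD cycle. It assumes we want to teach the package a new animal, "cow", which is a hypothetical addition and not part of the example package. The generator below is a simplified stand-in so that the snippet runs on its own.

```python
# Simplified stand-in for the package code, so this sketch is self-contained.
sounds_dict = {"cat": "meow", "dog": "woof", "fish": "..."}

def generator(animal):
    return sounds_dict[animal.lower()]

# Step 1: write the test first. "cow" is a hypothetical new animal.
def test_generator_cow():
    assert generator("cow") == "moo"

# Step 2: run the test and watch it fail, because the code can't fulfill it yet.
try:
    test_generator_cow()
    failed_first = False
except KeyError:
    failed_first = True

# Step 3: write just enough code to make the test pass.
sounds_dict["cow"] = "moo"

# Step 4: run the test again. This time it passes without raising.
test_generator_cow()
```

The failing run in step 2 is the important part: it shows that the test really checks something before you write the code that satisfies it.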

For this example, we will be using pytest. So first, just as we did for the package, let’s place an empty __init__.py file within our tests directory. Next let’s write some tests! By default, pytest recursively searches the directory you run it from for files named test_*.py or *_test.py, and inside them it collects functions whose names begin with “test”. So let’s make a file called test_generator.py and place the following code in it.

import pytest
import animalsounds

def test_generator_cat():
    sound = animalsounds.generator("cat")
    assert sound == "meow"

def test_generator_dog():
    sound = animalsounds.generator("dog")
    assert sound == "woof"

def test_generator_fish():
    sound = animalsounds.generator("fish")
    assert sound == "..."

def test_generator_notfound():
    # An unknown animal should raise an exception
    with pytest.raises(Exception):
        animalsounds.generator("emu")

With the code above, we have covered every case for our generator function. Most pytest tests are just an “assert” statement that compares the result of calling your function with known parameters against the expected result. In the last test, we check the case where we expect the function to raise an exception. The syntax for this in pytest is simply the with pytest.raises(Exception): line followed by a call to your function that should raise the exception. You can also be more specific here by naming the exact type of exception you expect.
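As a sketch of what being more specific could look like, the snippet below checks for one particular exception type and for part of its message. It assumes the function raises a ValueError for unknown animals, which is an assumption made for this illustration, and it uses a small stand-in for animalsounds.generator so it runs on its own. It also uses pytest.mark.parametrize to fold the three "known animal" tests into one.

```python
import pytest

SOUNDS = {"cat": "meow", "dog": "woof", "fish": "..."}

def generator(animal):
    """Stand-in for animalsounds.generator; raising ValueError is an assumption."""
    try:
        return SOUNDS[animal.lower()]
    except KeyError:
        raise ValueError(f"We don't know what a {animal} sounds like!") from None

# One test function, run once per (animal, expected) pair.
@pytest.mark.parametrize("animal, expected", [
    ("cat", "meow"),
    ("dog", "woof"),
    ("fish", "..."),
])
def test_generator_known(animal, expected):
    assert generator(animal) == expected

def test_generator_notfound():
    # Passes only if a ValueError is raised AND its message mentions "emu".
    with pytest.raises(ValueError, match="emu"):
        generator("emu")
```

Naming the exception type keeps the test from passing by accident, for example when a typo inside the function raises some unrelated error.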

To invoke the tests and see them pass, make sure pytest is installed (pip install pytest), cd to the root of the project, and type pytest in the terminal. There you go!
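If you are wondering why pytest picked up test_generator.py without being told about it, the file naming rule can be sketched like this. This is only an illustration of pytest's default file patterns (test_*.py and *_test.py), not pytest's actual implementation, and the helper name is made up for the example.

```python
from fnmatch import fnmatch

# pytest's default file name patterns for test modules.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")

def looks_like_test_file(filename):
    """Hypothetical helper: True if the name matches a default pattern."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)

candidates = [
    "test_generator.py",
    "generator_test.py",
    "sounds_generator.py",
    "__init__.py",
]
collected = [name for name in candidates if looks_like_test_file(name)]
```

Here only the first two names match, so a file like sounds_generator.py is never mistaken for a test module.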

Packaging up

Now that we have our code and our tests, let’s expand our testing setup a little by introducing tox. Tox allows us to run our tests in multiple environments, so you can be sure your code works across different Python versions. Tox creates these environments on the machine it runs on, so checking differences between Linux and macOS means running it on each system, for example through a CI service. In our tox.ini file, let’s place the following code:

[tox]
envlist=py34,py35,py36
[testenv]
passenv = *
commands=py.test --cov=animalsounds tests/
            codecov --token={env:CODECOVTOKEN}
deps=pytest
     pytest-cov
     codecov

The above code essentially says that we are going to test in Python 3.4, 3.5 and 3.6 environments, lays out the dependencies, which in this case are just pytest and some code coverage libraries, and specifies the commands to execute in each environment. This is a great time to talk about code coverage.

The idea behind code coverage is to see what percentage of your code is exercised by your tests. It gives you a better idea of how far you can trust the code you deploy than a build status alone, because a build simply passes or fails whether you have 1 test or 1,000 tests, while a coverage report comes back with a percentage. Codecov is a service that tracks and displays these reports. Pytest-cov is a package that generates the coverage report from pytest, which you can see us invoking in the first command. The second command then uploads the report to Codecov so we can pull down the appropriate badge. When you set up your Codecov account, you can sync your GitHub repo and are given a token for each repo. So when we run tox on our local machine, we have to set the CODECOVTOKEN environment variable before running tox.
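The arithmetic behind the percentage is simple. Here is a rough sketch with made-up numbers: two hypothetical builds of a 60-line package both pass, but they tell very different stories. The function name and the line counts are invented for illustration; real tools such as pytest-cov count the lines for you.

```python
def coverage_percent(lines_run, lines_total):
    """Percentage of executable lines that the tests actually ran."""
    if lines_total == 0:
        return 100.0  # nothing to cover
    return 100.0 * lines_run / lines_total

# Hypothetical build A: a single test that touches 12 of 60 lines.
build_a = coverage_percent(lines_run=12, lines_total=60)

# Hypothetical build B: a larger suite that touches 57 of 60 lines.
build_b = coverage_percent(lines_run=57, lines_total=60)

print(f"Build A: passing, {build_a:.0f}% covered")
print(f"Build B: passing, {build_b:.0f}% covered")
```

Both builds show a green "passing" status, but only the coverage number reveals that build A leaves most of the code untested.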

Uh oh! When tox runs the different Python environments, you will see a syntax error in the python3.4 and 3.5 environments. That is because our package uses a feature that is new in Python 3.6, the formatted string literal or f-string: f"some string {variable_to_render}". We can either go back and change our package code for compatibility or drop the older versions. In this case, I like the clarity and syntax of f-strings better, so let’s just remove the py34 and py35 environments from our tox.ini file.
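If you did want to keep the older environments instead, the compatible alternative is str.format, which also works on Python 3.4 and 3.5. A small sketch comparing the two spellings (the variable names are just for illustration):

```python
animal = "emu"

# Python 3.6+ only: formatted string literal (f-string).
message_fstring = f"We don't know what a {animal} sounds like!"

# Works on older Python 3 versions as well: str.format.
message_format = "We don't know what a {} sounds like!".format(animal)

# Both spellings produce exactly the same text.
assert message_fstring == message_format
```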

Ok, almost there!

So at this point, we have our package logic, our tests, multi-environment testing, and code coverage reports. Now what happens if others start contributing to our repo and someone edits code, but forgets to run tox or pytest locally before pushing? Bugs could be introduced, and things break for the people who depend on your code. This is where Continuous Integration (CI) comes in. For this, we will be using Travis CI. Every time someone pushes to our GitHub repo, Travis runs the tests and generates new code coverage reports via tox, and if something breaks the build, it can send notifications alerting us to the tragedy. This also automatically updates the code coverage and build passing/failing badges in our README.

To set up Travis, we just connect our repo, add our CODECOVTOKEN environment variable in the settings for our project, and we are good to go!

Now that we have all of the main code components for our new package, we have to set the license, write documentation and example usage in the README, and edit our setup.py file.

For the license, look up and pick out your favorite software license and just copy and paste it into your LICENSE.txt file.

For the README, use an online real-time editor and try to write it in reStructuredText (rst) format with examples, docs, dependencies, installation instructions, etc. If you used cookiecutter, you should already have a nice template to work from.

Finally, for the setup.py file, you can follow a form like this:

import setuptools
setuptools.setup(
    name="animalsounds",
    version="0.1.0",
    url="https://github.com/username/animalsounds",
    author="First Last",
    author_email="email@gmail.com",
    description="An exploration of different animal sounds.",
    long_description=open('README.rst').read(),
    packages=setuptools.find_packages(),
    install_requires=[],
    python_requires='>=3.6',  # the package uses f-strings, which need Python 3.6+
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Programming Language :: Python',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.6',
    ],
)

Can We Upload Yet?

Once we have all of these items, have pushed to GitHub, and have seen our continuous integration run our tests and generate the build passing and coverage reports, we can upload our code to PyPI with confidence! How do we get it up there though? Well, we are in luck because there’s a Python package for that, and it’s called twine.

Twine uploads your packaged code to the PyPI servers through a secure connection. You will first have to go on PyPI and create a user account. Then build your distribution files from the project root with python setup.py sdist bdist_wheel and upload them with the one-line command twine upload dist/*. After that, you have your first package deployed!

Congratulations! Try to pip install it and give yourself a pat on the back!