Our SDK really gets data scientists excited… no matter what operating system they use

How to build a user-friendly Python SDK

Harrison Linowes
Published in Arthur Engineering
8 min read · Dec 11, 2020

No matter how well designed and implemented an application may be, if the user experience sucks, the application won’t be effective. Dashboard and monitoring products in particular are often marketed and designed for different end users: analysts, business leaders, software engineers. Each user may have a different preferred interface to use the application. Analysts may often use a GUI while a data scientist may use an API or SDK. The challenge is often designing the best user experience across all of these different user interfaces.

At Arthur we’ve taken an API-first approach when designing and building our model monitoring platform. The API exposes functionality that lets customers manage their machine learning models, send inferences as they are computed, and retrieve information about the performance of their models. Most data scientists prefer to interact with our platform through our Python SDK, which in turn communicates with our REST API. Our SDK integrates with model pipelines, which often take the form of Jupyter Notebooks. As we’ve built out this functionality we’ve learned a lot about best practices and techniques for creating an SDK. Here’s what we’ve done to build a user-friendly SDK:

Create Clear Function and Class Import Paths

Pathing became a large focus as we were designing our SDK. We want to ensure that users can anticipate the location of utility functions, core SDK classes, and constants they may need throughout development. Logical pathing also helps when using autocomplete in IDEs. We settled on four main packages under which all modules live:

  1. Client, handles HTTP communication with our REST API
  2. Common, contains constants and exceptions that may be referenced or used in other pieces of the SDK
  3. Core, contains all core classes that are used to interact with our application (ArthurModel and ArthurAttribute)
  4. Explainability, contains model explainability specific code
SDK folder structure

While organizing code logically makes it easier for users to find import paths, we also create shortcuts for commonly used objects. The easiest way to do this is to import the object in the root package's __init__.py file. For example, since ArthurModel is located at arthurai.core.models.ArthurModel, we created a shortcut by importing it directly into the arthurai package. This simplifies the syntax to from arthurai import ArthurModel. Logically organizing our code not only makes it easier for users to interface with the SDK, but also helps organize the documentation intuitively.
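The shortcut technique can be tried out in isolation. The following is a minimal sketch, not the SDK's actual layout: it builds a throwaway package on disk whose names (demo_sdk, DemoModel) are hypothetical, re-exports a class from the root __init__.py, and checks that the short and the full import paths resolve to the same object.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a tiny throwaway package (hypothetical name "demo_sdk"):
#   demo_sdk/__init__.py
#   demo_sdk/core/__init__.py
#   demo_sdk/core/models.py
root = Path(tempfile.mkdtemp())
core = root / "demo_sdk" / "core"
core.mkdir(parents=True)
(core / "__init__.py").write_text("")
(core / "models.py").write_text(
    "class DemoModel:\n"
    "    '''Stand-in for a core SDK class.'''\n"
)

# The root __init__.py re-exports the class; this line is the shortcut.
(root / "demo_sdk" / "__init__.py").write_text(
    "from demo_sdk.core.models import DemoModel\n"
    "__all__ = ['DemoModel']\n"
)

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from demo_sdk import DemoModel                       # short path
from demo_sdk.core.models import DemoModel as Full   # full path

print(DemoModel is Full)
```

Both imports return the same class object, so users can rely on the short path while the code itself stays in its logical location.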

Provide Thorough Documentation

Code documentation is a vital part of ensuring we create an easy-to-use SDK. We use reStructuredText and Sphinx to auto-generate user documentation. reStructuredText is a plaintext markup syntax which can be used in Python function and class inline documentation. Sphinx is a framework which can auto-generate documentation from reStructuredText and export it to multiple formats, including HTML. Sphinx is often used to generate Python library documentation, and while reStructuredText is the recommended markup syntax for Python documentation (PEP 287), there are three markup syntax options that most developers will use:

reStructuredText

def foobar(arg1: int, arg2: str) -> bool:
    """Function summary

    :param int arg1: Description of arg1
    :param str arg2: Description of arg2
    :return: Boolean determining fooness of the bar

    :example:

    >>> result = foobar(1, "hello")
    """

    return True

Google-style

def foobar(arg1: int, arg2: str) -> bool:
    """Function summary

    Args:
        arg1 (int): Description of arg1
        arg2 (str): Description of arg2

    Returns:
        bool: determining fooness of the bar

    Examples:

        >>> result = foobar(1, "hello")
        >>> print(result)
        True
    """

    return True

NumPy-style

def foobar(arg1: int, arg2: str) -> bool:
    """Function summary

    Parameters
    ----------
    arg1 : int
        Description of arg1
    arg2 : str
        Description of arg2

    Returns
    -------
    bool
        Determining fooness of the bar

    Examples
    --------

    >>> result = foobar(1, "hello")
    >>> print(result)
    True
    """

    return True

There are pros and cons to each style, and the decision on which to use in your library often comes down to personal preference. We chose reStructuredText because of its easy integration with Sphinx and because some of our developers were already familiar with it. As mentioned, Sphinx supports reStructuredText out of the box, but an extension called Napoleon can be used to add support for Google and NumPy styles.
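For readers who want to try Napoleon, a minimal sketch of the relevant part of a Sphinx conf.py might look like the following. The project name is a placeholder; the extension names and the two napoleon_* settings are the ones shipped with Sphinx.

```python
# conf.py (Sphinx configuration file)
project = "example-sdk"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",   # pulls documentation out of docstrings
    "sphinx.ext.napoleon",  # parses Google- and NumPy-style docstrings
]

# Napoleon settings; both styles are enabled by default.
napoleon_google_docstring = True
napoleon_numpy_docstring = True
```

Projects that stick to plain reStructuredText docstrings only need the autodoc entry.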

Set Clear Function Design Conventions

While most of the basic functionality of our application is already integrated into our SDK, we outline eight guidelines which our engineers follow when creating new functionality, to keep future development nimble and clean.

  1. Determine what package/module new code/functions should live under, or in rare cases, whether a new package or module should be created.
  2. Do not use static methods (@staticmethod) for client-facing functions; instead, use module-level functions. If a function is a utility that is not associated with an instance of a class, it is clearer to associate the function with a module.
  3. Do not use simple getter and setter functions. Use Python's @property decorator instead. In addition to being more Pythonic, this keeps a syntax similar to directly accessing attributes while allowing more control over how the attributes are retrieved and manipulated.
  4. A function’s action should be represented by its name.
  5. Try to not hide function operations from clients. For example, if a function significantly transforms its input before storing or sending it to the API then consider creating and exposing a helper function for the client to use. Ensure documentation and error handling is properly updated to guide clients to use this helper function.
  6. Create functions which do specific operations rather than generic functions with optional parameters. If clients use an incorrect function, exceptions should direct them toward the correct function for their use case.
  7. Specify function parameters explicitly and generally avoid **kwargs.
  8. Functions which perform similar operations should have similar names. In addition, similar parameters should be in the same order, with unique parameters appended at the end.
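A few of these guidelines are easier to see in code. The sketch below is illustrative only: the Attribute class and the three functions are hypothetical names, not part of the SDK.

```python
from typing import List


class Attribute:
    """Hypothetical stand-in for an SDK attribute class."""

    def __init__(self, name: str, position: int) -> None:
        self.name = name
        self._position = position

    @property
    def position(self) -> int:
        """Guideline 3: read access looks like a plain attribute."""
        return self._position

    @position.setter
    def position(self, value: int) -> None:
        # The setter keeps attribute syntax but can still validate input.
        if value < 0:
            raise ValueError(f"position must be non-negative, got {value}")
        self._position = value


def sort_by_position(attributes: List[Attribute]) -> List[Attribute]:
    """Guidelines 2 and 4: a module-level utility whose name states its action."""
    return sorted(attributes, key=lambda a: a.position)


# Guidelines 6 to 8: two specific functions with explicit parameters in a
# consistent order, instead of one generic function taking **kwargs.
def rename_attribute(attribute: Attribute, new_name: str) -> Attribute:
    attribute.name = new_name
    return attribute


def move_attribute(attribute: Attribute, new_position: int) -> Attribute:
    attribute.position = new_position
    return attribute


attrs = [Attribute("age", 1), Attribute("income", 0)]
move_attribute(attrs[0], 2)
print([a.name for a in sort_by_position(attrs)])  # ['income', 'age']
```

Because position is a property, attrs[0].position = -1 raises a ValueError even though the caller never invoked a setter method explicitly.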

Give Clear Explanation of Exceptions and Error Handling

Many SDK functions interact with backend microservices or use code from third-party libraries, any of which could raise an exception. This makes error handling all the more important, so that users can understand issues and guide themselves to a solution.

Within our SDK, we use custom exceptions to handle errors that occur. Inspired by Python’s hierarchical organization of exceptions (Python Exception Hierarchy), we have taken a similar approach to ensure development is smooth and error messages are clear. At the top of the tree is a Base Exception class which contains helper functions and standardizes our error message format.
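As a rough illustration of this kind of hierarchy, the sketch below defines a base class that standardizes the message format and a few subclasses beneath it. All class names and the hint parameter are hypothetical and chosen for the example; they are not the SDK's actual exceptions.

```python
class SDKError(Exception):
    """Hypothetical base class; standardizes the error message format."""

    def __init__(self, message: str, hint: str = "") -> None:
        self.message = message
        self.hint = hint
        super().__init__(self.format_message())

    def format_message(self) -> str:
        # Prefix every message with the concrete exception type.
        text = f"[{type(self).__name__}] {self.message}"
        return f"{text} (hint: {self.hint})" if self.hint else text


class UserError(SDKError):
    """Raised when the caller passed invalid input."""


class MissingAttributeError(UserError):
    """Raised when a referenced attribute does not exist."""


class ServerError(SDKError):
    """Raised when the backend reported a failure."""


try:
    raise MissingAttributeError(
        "attribute 'age' not found",
        hint="check the model's attribute names",
    )
except UserError as err:  # a parent class also catches the specific child
    caught = str(err)
    print(caught)
```

Callers can catch the narrowest class they care about, or catch the base class to handle every SDK error in one place.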

Most function calls within our SDK contain an HTTP call to our REST API. API calls generally report errors through HTTP response codes. However, to add extra functionality and convenience in our SDK, we may add additional error handling to our REST calls. To do this we handle all HTTP responses within the SDK functions and not within our HTTP Client package. This allows us to customize API response handling within the context of each SDK function. For example, when calling ArthurModel.save() we hit a POST endpoint on our API by calling the SDK HTTP client package (client.post()). Instead of handling any model save errors within the client.post() method, we pass them through to the ArthurModel._save_model_post() function which initiates the call.

Exceptions handled in HTTP Client:

def _save_model_post(self, model):
    """Saves a model object by sending a POST request to the API"""
    # if REST call errors are handled in the http_client
    # function it will not have access to model
    # attributes and functions
    response = self.http_client.post(self.to_json())
    return True

Exceptions handled in SDK functions, which allows error handling tailored to each function:

# ATTRIBUTE_ERROR is an error-code constant; it is assumed here to be
# imported from the SDK's common package

def _save_model_post(self, model):
    """Saves a model object by sending a POST request to the API"""
    response = self.http_client.post(self.to_json())

    if response.status_code == 400 and ATTRIBUTE_ERROR in response.json():
        # retrieves the name of the attribute from the response
        # message (everything after the first colon)
        body = response.body()
        attribute_name = body[body.find(":") + 1:]
        raise Exception("Error saving model due to "
                        f"invalid attribute: {model.get_attribute(attribute_name)}")
    return True

Our SDK uses a handful of third-party libraries which may raise their own exceptions. Another principle we follow when using third-party packages is to catch their exceptions and wrap them in our own messaging. This helps avoid confusion and gives more context to unexpected errors.
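One common way to do this wrapping in Python is exception chaining with raise ... from. The sketch below uses the standard library's json module as a stand-in for a third-party package; ResponseParseError and parse_response_body are hypothetical names made up for the example.

```python
import json


class ResponseParseError(Exception):
    """Hypothetical SDK exception for unreadable API responses."""


def parse_response_body(body: str):
    """Parse a JSON body, translating library errors into SDK errors."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # "from exc" keeps the original error attached as __cause__,
        # so the full traceback is still available for debugging.
        raise ResponseParseError(
            f"Could not read the API response as JSON: {exc.msg} "
            f"(line {exc.lineno}, column {exc.colno})"
        ) from exc


print(parse_response_body('{"ok": true}'))
```

The user sees a message phrased in the SDK's own terms, while the underlying library error remains reachable for anyone who needs the details.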

Test Everything. Twice.

Testing a customer-facing tool is vital. Our test suite primarily consists of unit tests which verify the behavior of each function call within the SDK. SDKs in general are used by clients in a wide variety of environments. For Python SDKs this means different versions of Python and of installed libraries. Another aspect of an evolving application is ensuring backwards compatibility. Building and maintaining an expansive test suite helps ensure the library works as intended in all of these situations.

Separate the Test Package

To simplify the structure and minimize the size of our SDK, we’ve separated the testing package from our SDK bundle. Although the testing package is separate, it is organized in the same manner as our SDK package. For example, in our SDK the ArthurModel class is located at arthurai.core.models; therefore, the tests for ArthurModel are located in tests.core.models.

Mock HTTP Client Functions

Within the ArthurAI library we built our own custom wrapper around Python’s requests library. The main advantage of this structure is that we can abstract all REST-specific functionality into its own package within our library. While it is possible to mock the requests library (we’ve used requests-mock in the past), the structure we use also makes it easy to mock the functions which make HTTP calls instead of dealing with the requests library directly.
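A minimal sketch of this approach, using the standard library's unittest.mock, is shown below. HTTPClient, Model, the /models endpoint, and the response shape are hypothetical stand-ins for illustration, not the SDK's real classes or API.

```python
from unittest.mock import MagicMock


class HTTPClient:
    """Hypothetical stand-in for the SDK's wrapper around requests."""

    def post(self, endpoint: str, payload: dict):
        raise RuntimeError("real network calls are not allowed in unit tests")


class Model:
    """Hypothetical, simplified stand-in for a core SDK class."""

    def __init__(self, name: str, http_client: HTTPClient) -> None:
        self.name = name
        self.http_client = http_client

    def save(self) -> str:
        response = self.http_client.post("/models", {"name": self.name})
        return response.json()["id"]


def test_save_returns_model_id():
    # spec=HTTPClient makes the mock reject methods the real client lacks
    client = MagicMock(spec=HTTPClient)
    client.post.return_value.json.return_value = {"id": "abc123"}

    model = Model("credit-risk", client)

    assert model.save() == "abc123"
    client.post.assert_called_once_with("/models", {"name": "credit-risk"})


test_save_returns_model_id()
```

Because the test replaces the wrapper rather than the requests library, it does not need to know anything about URLs, sessions, or transport details.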

Testing CI/CD

Before every PR is merged into our master branch we run a set of tests to ensure that each commit does not introduce new bugs into our library. For our testing suite we use pytest. To confirm that our library continues to be compatible with all supported Python versions, we run our tests using a tool called tox. tox can run the SDK test suite within different environments, making it easy to test the code across different dependencies and versions of Python. In addition to running our test suite, we also run static checks to enforce code styling, conventions, and typing. We use a tool called mypy, a static type checker, for this.

Conclusion

While iterating on the design and implementation of our SDK we found that it’s easy to create a library that interfaces with an API, but difficult to do so in a scalable and intuitive way which prioritizes the needs of both the developer and the end user. The guidelines discussed above have helped us maintain a clean repository of code, made it easy for our developers to add functionality, and created a seamless environment to test and document the ArthurAI Python SDK.

Miscellaneous SDK Conventions & Links

Things that inspired us:
