Documenting Your Data-Science Project — A Guide To Publish Your Sphinx Code Documentation
In this article, we would like to give you a step-by-step guide on how to document your Python Data Science project effectively as part of your machine learning model development. You can find our example code on GitHub. The solution we propose ensures that your documentation is version controlled, shipped with the source code that executes your machine learning experiment, and made available to your users or co-workers using generally available tools, including Sphinx, GitHub, Azure DevOps, and Azure Web App.
Before we start with the technical implementation, we would like to share our thoughts on the necessity and peculiarities of Data Science project documentation. Data Science projects involve writing code and using machine learning libraries such as TensorFlow, scikit-learn, or PyTorch. In Data Science, we develop software for data preparation, for machine learning model training and testing, and for result evaluation. At least in part, we can treat Data Science projects as if they were software development projects with an additional dynamic component: the data. Therefore, adopting best practices from the software development community into our Data Science projects is a good start. Documentation paradigms in software development aim to support continuous code development, maintenance, and knowledge transfer between developers. For more thoughts about documenting software, we recommend reading the blog written by Andrew Goldis [2].
The Workflow: An Overview
In this tutorial, we will connect four different tools and services, namely Sphinx, GitHub, Azure DevOps, and Azure Web App.
In Figure 2, we sketch the automated workflow from the local Sphinx installation to the published documentation. One of the major advantages of the solution provided in this blog is the ability to control access to your documentation by leveraging Azure Active Directory sign-in. In a corporate environment in particular, this can be a crucial point. The code used in this example can be found on GitHub.
The Workflow
- ensures that your documentation is automatically updated whenever new code is committed to GitHub or Azure DevOps.
- publishes the documentation to the Microsoft Azure cloud with the option of full access control.
- allows for user management, granting access to the documentation either to anonymous users or only to authenticated ones.
Requirements
- Anaconda with Python 3 (we used Python 3.6)
- Sphinx
- TensorFlow
- Microsoft Azure free account
- Azure DevOps free account
- GitHub account or Azure DevOps integrated git repositories
Getting Sphinx up and running
In case you do not have a working Sphinx environment, we recommend the Sphinx documentation and tutorial. In a new Python project, we use the following Sphinx commands in combination:
- sphinx-quickstart
- sphinx-build -b html <Source_PATH> <Build_PATH>
- sphinx-apidoc -o <output_PATH> <module_PATH>
- Hands-on tuning and embellishing (for example, editing the generated conf.py; see the sketch below)
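Much of that tuning happens in the conf.py that sphinx-quickstart generates. As a minimal sketch (the project name, path, extensions, and theme below are illustrative placeholders, not values from our repository), it could look like this:

```
# conf.py (excerpt), generated by sphinx-quickstart and then edited by hand
import os
import sys

# make the package importable so sphinx.ext.autodoc can pull in docstrings
sys.path.insert(0, os.path.abspath('..'))

project = 'My Data Science Project'

extensions = [
    'sphinx.ext.autodoc',    # pull documentation from docstrings
    'sphinx.ext.napoleon',   # understand Google/NumPy style docstrings
]

html_theme = 'alabaster'     # swap in any installed theme you prefer
```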
Remember to run your Python setup script (python setup.py install) before you build the documentation, so that Sphinx can import your package.
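Put together, a typical local run could look like the following sketch (we assume you chose a separate source directory in sphinx-quickstart; docs/source, docs/build, and my_package are placeholders for your own layout):

```
# install Sphinx into the active conda environment
pip install sphinx

# scaffold the documentation skeleton (you will answer a few interactive prompts)
sphinx-quickstart docs

# install your package so that autodoc can import it
python setup.py install

# generate .rst stubs for all modules of the package
sphinx-apidoc -o docs/source my_package

# build the HTML documentation
sphinx-build -b html docs/source docs/build
```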
Connecting GitHub and Azure DevOps
Connecting GitHub with Azure DevOps is easy to set up:
- Sign in to Azure Boards
- Choose Project Settings
- Choose GitHub connections
- Click on “Connect your GitHub account”
- Sign in using your GitHub credentials
- Select the GitHub repository you would like to connect
- Choose Save
- Review the GitHub page and then choose “Approve, Install, & Authorise”
- Confirm with your GitHub password
A comprehensive and detailed description of how to connect Azure DevOps to GitHub is given here.
Setting up the Documentation Build Pipeline
Following best practices and the experiences of the software development community, we will keep the build and deploy/release pipelines separate. An overview of Azure DevOps pipelines can be found here. In this section, we focus on the build pipeline. Our pipeline will go through 5 steps (see Figure 3).
First, the Python packages are installed and upgraded. Next, the Python package we would like to document is set up. Then the Sphinx documentation is built, and finally the generated HTML files are copied into the artifact directory and published as a build artifact.
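As a rough sketch, the corresponding azure-pipelines.yml could look like this (the Python version, paths, and package name are placeholders; the full YAML file we actually used ships with the GitHub repository):

```
trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
# step 1: install and upgrade the Python packages
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.6'
- script: python -m pip install --upgrade pip setuptools sphinx
  displayName: 'Install and upgrade Python packages'

# step 2: set up the package we would like to document
- script: python setup.py install
  displayName: 'Install the package under documentation'

# step 3: build the Sphinx documentation
- script: sphinx-build -b html docs/source docs/build
  displayName: 'Build HTML documentation'

# step 4: copy the generated HTML files into the artifact staging directory
- task: CopyFiles@2
  inputs:
    SourceFolder: 'docs/build'
    TargetFolder: '$(Build.ArtifactStagingDirectory)'

# step 5: publish the artifact under the name "drop" (used later by the release pipeline)
- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'drop'
```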
Let’s get started:
- Sign in to Azure DevOps and navigate to your project
- Navigate to the Pipelines page (left column)
- Choose the action to create a new pipeline
- Walk through the steps of the wizard and select GitHub as the location of your source code.
- You might be redirected to GitHub to sign in. If so, enter your GitHub credentials
- Select the git repository from the list
- Configure your pipeline by selecting the YAML file from the repository (see Fig. 4)
- Review the pipeline defined in the YAML file and click run (see Fig. 5). The full YAML file can be found in the GitHub repository.
- To see your build, select Azure DevOps ➢ Pipelines ➢ Builds and choose the latest build (Fig. 6)
- The details of the pipeline are listed, and the generated HTML files can be found under artifacts ➢ drop (Fig. 7 top right)
Setting up Microsoft Azure Web App
Let’s use the Azure Cloud Shell so that we do not need to install the Azure CLI.
Create a resource group, then an Azure App Service plan, and finally the Azure Web App.
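With the Cloud Shell open, the three commands could look like this (a minimal sketch; the resource group, plan, and app names as well as the location are placeholders you should replace with your own):

```
# create a resource group to hold everything
az group create --name sphinx-docs-rg --location westeurope

# create an App Service plan (free tier; Windows-based by default)
az appservice plan create --name sphinx-docs-plan \
    --resource-group sphinx-docs-rg --sku FREE

# create the web app that will host the documentation
az webapp create --name my-sphinx-docs \
    --resource-group sphinx-docs-rg --plan sphinx-docs-plan
```

Note that the web app name must be globally unique, as it becomes part of the URL (https://<app-name>.azurewebsites.net).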
Now, you can visit the newly created web service with the browser of your choice.
If you would like to control access to the web app, and thereby to your documentation, you can use Microsoft Azure Active Directory sign-in. This often becomes necessary in a corporate environment. Good documentation can be found here. The possibility to use the Azure Active Directory is certainly a huge advantage of the solution presented in this blog.
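Once the Azure Active Directory app registration is in place (see the documentation linked above), enforcing sign-in can be sketched with a single CLI call (the resource group and app name are the placeholders from the previous section):

```
# turn on App Service authentication and require an AAD login
az webapp auth update --resource-group sphinx-docs-rg --name my-sphinx-docs \
    --enabled true --action LoginWithAzureActiveDirectory
```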
Connecting Azure DevOps and Azure Web App Service
To be able to set up the release pipeline, we first need to connect Azure DevOps with the Azure Web App Service. In short, you need to go through the following steps:
a) Project settings
b) Service connection
c) Choose “+ New Service Connection”
d) Select the connection type: Azure Resource Manager
A detailed description can be found here. If you are working with your company subscription, you may not have the necessary rights to set up a new service principal; in that case, ask your local administrator or IT department to do it for you.
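If you (or your administrator) prefer to create the service principal up front, a minimal sketch with the Azure CLI could look like this (the name is a placeholder, and <subscription-id> must be filled in with your own subscription):

```
# create a service principal with contributor rights on the resource group
az ad sp create-for-rbac --name sphinx-docs-sp \
    --role contributor \
    --scopes /subscriptions/<subscription-id>/resourceGroups/sphinx-docs-rg
```

The command prints the credentials (appId, password, tenant) that you can then enter manually when creating the Azure Resource Manager service connection.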
Setting up the Documentation Release Pipeline
In the build pipeline, Sphinx generated the HTML files containing the documentation. In the final step, we will deploy the web app via a release pipeline (doing this in the build pipeline would be bad practice; remember that build != deployment). Let’s get started with setting up the release pipeline:
- Sign in to Azure DevOps and navigate to your project
- Navigate to the Releases page (left column)
- Choose the action “+ New release pipeline”
- You can select Featured templates or an empty job
- Choose the pipeline “Azure App Service deployment” and click “apply”
- You can rename “stage 1” e.g. to “DeploySphinxDocu” and close the window (Fig. 9)
- We renamed the pipeline to “Release Latest Sphinx Documentation”
- Click on view stage tasks “1 job, 1 task”
- Setup the task (see Fig. 10):
a) Connection type: Azure Resource Manager
b) Azure subscription: Use the connection you set up in the previous section (it should show up in the drop-down menu)
c) App Service type: Web App on Windows
d) App Service name: use the name you defined in the section (Setting up Microsoft Azure Web App)
e) Adjust the field Package or folder to point to the artifact folder (remember in the build pipeline we defined it as drop)
- Select “Pre-deployment conditions” and choose “After release” as your selected trigger (Fig. 11)
- Set up Artifact filters. Choose Type: Include & Build branch: master
- Edit Artifacts
- Set continuous deployment trigger
- You can manually trigger a release by choosing “create release” (top right)
a) Click create release
b) You can see your release in the Release overview
c) Select the latest release to monitor your progress
Finally, you can visit the web app and see your Data Science documentation.
Every time you push new code to the master branch, new documentation will be built using Sphinx and then deployed to the Azure Web App. That way, your documentation will always be up to date.
Special Thanks
Special thanks to Timo Klimmer for his support; you can find him on LinkedIn and Twitter.
References
[1] Cinta García, inspirational blog post, 4 December 2017.
[2] Andrew Goldis, blog post on documenting software, 31 March 2018.