How To Install A Private Pypi Server On AWS

Jereth Chu · Published in The Startup · 12 min read · Jul 8, 2019

Motivation/Background:

What is a pypi server? Those who use python on a regular basis will know that pypi is basically the server that hosts all the public python packages we use and love, for example pandas and requests. These packages can then be installed using pip.

pip install pandas

For those who are not familiar with python, you can take a look at the public pypi server at https://pypi.org.

So, what’s the point of setting up your own private pypi server?

When you install packages from the public pypi server, you are installing packages that other talented people wrote. However, you might have written something at your company that’s proprietary. You want to share it with everyone at the company, but you don’t want anyone else to install it via the public server. In this case, setting up your own lightweight pypi server might be the way to go.

In my case, our company was using an internal package very often to abstract the uploading and downloading of data between pandas data-frames and our mysql instances. However, this package was simply kept in a folder and copied into each new project repo. We realized that if we had to change any of the source code in the folder, we would have to do it for all the different repos, each with its own copy of the folder. It became pretty clear that turning the source code into its own github project and uploading it as a package to our own pypi server would be a better option. However, the documentation for setting one up end to end was pretty sparse. We had to find different snippets here and there to piece everything together.

So, for the rest of this article, I will be walking you through the steps we took to set up our own pypi-server on an EC2 instance on AWS with password authentication, using existing Docker images and docker-compose. This guide expects you to have a decent grasp of Docker images and docker-compose. If not, please read up on the documentation here first:

Docker: https://docs.docker.com/

docker-compose: https://docs.docker.com/compose/

For the next portion of this guide, I’m going to assume you already have some knowledge of:

  1. Launching EC2 instances on AWS
  2. Using SSH to connect to your EC2 instance
  3. Basic bash commands

Setting Up The EC2 Instance:

I’m assuming that you already have an AWS account for this tutorial. If not, head over to https://aws.amazon.com to set one up.

Once you have an account, head over to the main AWS site and log in to the management console by clicking on the orange button at the top right corner of the screen that says “Sign in to the Console”.

Once you are logged in, you should see a search bar at the top of the page under “Find Services”. Type EC2 into the bar, then click on the auto-suggested option.

Next, you should be at the EC2 Dashboard. Click on the blue “Launch Instance” button under “Create Instance”. You’ll be taken through a series of steps to launch your EC2 instance. The first step is to pick an image for your new instance. We went with the second choice, Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type, because its repositories are supposed to already include Docker, and python should be installed by default.

(I won’t go into too much detail about each step. If it’s your first time setting up an EC2 instance, please go read the setup guide from AWS.)

After you finish setting up your instance, go back to the EC2 Dashboard and you should be able to see your new instance starting up. Wait a few minutes, then SSH into that new instance you created using the following as an example:

ssh -i ~/.ssh/YourPrivateKey.pem ec2-user@YourInstanceIPv4Address

(Notice the username here in the ssh command is ec2-user. If you chose an ubuntu AMI instead, it would be ubuntu@IPv4Address.)

Installing Server Packages And Dependencies:

Assuming you’ve successfully connected to your instance using SSH, you should see the Amazon Linux welcome banner in your terminal.

Do a package update as suggested by the server prompt. (If you used another linux distribution for the AMI, you will need to use its native package manager instead)

sudo yum update

See if python3 is already pre-installed:

sudo yum list | grep python3

If not, install the version you would like to use:

sudo yum install python36
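You can confirm which interpreter you ended up with:

# print the installed python3 version
python3 --version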

Next, install docker:

sudo yum install docker
# start the service
sudo service docker start
# add your user to the docker group so you can run docker without sudo
sudo usermod -a -G docker ec2-user
# use a different username if you are not using the amazon linux AMI
# exit the instance to make sure the changes take effect
exit

SSH back into the instance and test if the changes have taken effect:

# check if you can run docker without the sudo command
docker info
# if not, debug the previous steps. If so, run a test image
docker run hello-world
# you should see a hello message from docker after running the last command

Now, we need to install docker-compose:

# run each of the following commands 1 at a time
sudo curl -L https://github.com/docker/compose/releases/download/1.21.0/docker-compose-`uname -s`-`uname -m` | sudo tee /usr/local/bin/docker-compose > /dev/null
# the version above might be an older one, so check their releases page first if you want the most recent
# give the proper permission for docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# create a symbolic link so you can run docker-compose by just typing docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
# check if the install worked
docker-compose --version
# you should see docker-compose version x.xx.x, build xxxxxxx

Congrats! The server is pretty much set up. Now, you just need to set up the actual pypi-server on top of the EC2 instance.

Pypi-Server Installation:

For the actual pypi server itself, we will be using the officially supported pypi server docker image — pypiserver/pypiserver:latest

But before we download the image, we need to set up a few more things.

First, we need to set up a directory to store usernames and passwords that the pypi server will use to authenticate upload and download requests. We used the “htpasswd” utility from the “httpd-tools” package for this.

# install httpd-tools with yum
sudo yum install httpd-tools
# switch to the user's home directory
cd
# make a new directory called auth
mkdir auth
# cd into the auth directory
cd auth
# create a new .htpasswd file
htpasswd -sc .htpasswd <some_username>
# it will prompt you to enter a new password. Follow the prompts
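You can sanity check that the entry was written; the -s flag stores the password as a SHA hash, not plain text:

# the file should contain one line per user, like <some_username>:{SHA}...
cat .htpasswd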

Nice! You just added the first user that can use your private pypi server. You can add as many new users as you want. However, you need to alter the command slightly, because the -c flag creates a new file and would overwrite the old .htpasswd each time.

# -s instead of -sc from before
htpasswd -s .htpasswd <SomeNewUsername>
# follow the prompts
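If you ever need to remove a user, htpasswd can delete an entry with the -D flag:

# delete a user's entry from the file
htpasswd -D .htpasswd <SomeOldUsername>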

Now, we can finally pull the pypi-server image and spin up the container!

# go to the user's home directory
cd
# create a new docker-compose.yml file
touch docker-compose.yml
# edit the file using vim (vim has a bit of a learning curve)
# but for now you just need to know a few things:
# i means insert, you cannot type text until you press i first
# pressing esc will take you back to normal mode, you cannot edit text
# in normal mode, but you can move your cursor around and use
# vim commands
# vim commands start with :
# :w + [enter] will save the file
# :q + [enter] after saving will exit the file
# you can combine them like this
# :wq + [enter]
# let's write our docker-compose file
vim docker-compose.yml
# inside vim editor...
# press i to go into insert mode
version: '3.3'

services:
  pypi-server:
    image: pypiserver/pypiserver:latest
    ports:
      - "8081:8080"
    container_name: pypi-server
    volumes:
      - type: bind
        source: /home/ec2-user/auth
        target: /data/auth
      - type: volume
        source: pypi-server
        target: /data/packages
    command: -P /data/auth/.htpasswd -a update,download,list /data/packages
    restart: always

volumes:
  pypi-server:

# press [esc] then :wq + [enter] to save and exit the file
# now print out the file you just edited to make sure it was saved
cat docker-compose.yml
# you should see everything you wrote in the file

So what did we do here? I won’t go into detail about every line we wrote, but there are a few important points.

First, the port mapping is important, because we can’t make requests directly to the docker container. Instead, we will be making requests to the host EC2 instance, so we mapped the host’s port 8081 to the container’s port 8080.

Next, we gave the container an easy to remember name via “container_name”. This way, we can easily find and work with this container in the future by using pypi-server instead of the long id docker assigns the container.

We also mapped the host directory we created, which contains our .htpasswd file, to the container volume at /data/auth. This allows the pypi-server in the docker container to handle authentication, using the host file we created to validate incoming credentials.

Then, we created a named volume “pypi-server” to map to the /data/packages volume in the docker container. This allows packages we upload to the pypi server within the docker container to persist in the named docker volume we created on the host machine. You can check this volume by typing:

docker volume ls

Otherwise, if the container goes down for some reason, the packages uploaded already would be lost.
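If you want to see exactly where docker keeps this volume on the host, you can inspect it. Note that docker-compose prefixes the volume name with the project name, so copy the exact name from the docker volume ls output:

# shows the mountpoint and other metadata for the named volume
docker volume inspect <ProjectName>_pypi-server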

Finally, we specify the restart field as “always”. This ensures that if the container goes down by accident, it will always restart.

Now, you can boot up the container with:

# use docker-compose to run the container
docker-compose up -d
# the -d option runs the container in the background
# wait a few seconds then run
docker container ps
# you should see a container running called pypi-server
# exit the ec2 instance
exit
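If the container doesn’t show up, SSH back in and check its logs; you can also verify that the server answers on the mapped host port:

# print the container's logs (we set container_name to pypi-server)
docker logs pypi-server
# the server should respond on the host port we mapped in docker-compose.yml
curl http://localhost:8081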

Congrats! There have been a lot of steps, but your private pypi server is up and running on the cloud now! If you already know how to upload and install packages from a custom pypi server, you can stop here. If not, keep reading.

Uploading Packages To Your Pypi Server:

I won’t go into detail about how to upload packages to a pypi server, since there are many wonderful guides already, so I will link to one instead. You should probably learn how to do this even if you have no interest in setting up your own pypi server, since the upload process to the public server is pretty much the same. A good guide for preparing a package for upload is the official packaging tutorial: https://packaging.python.org/tutorials/packaging-projects/
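If you just need the build step, it looks roughly like this; a minimal sketch, assuming your project already has a setup.py at its root:

# install the build and upload tools
pip install --upgrade setuptools wheel twine
# build source and wheel distributions into the dist/ directory
python3 setup.py sdist bdist_wheel
# you should now see a .tar.gz and a .whl in dist/
ls dist/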

Now, assuming you’ve prepared your package following the guide above, you can use the following command to upload it to the pypi server created earlier in this guide.

twine upload --repository-url http://(ec2 IPv4 IP address):8081 dist/*

If you’ve set up your pypi server correctly and entered the correct IP address for your EC2 instance, you should see the upload succeed. Keep in mind that if you stop and reboot your EC2 server, your IPv4 address will change, unless you attach an Elastic IP address (a static IP from AWS) to the instance.
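If you upload often, you can optionally store the repository details in a ~/.pypirc file so twine can find them by name. Here is a sketch using a hypothetical entry called “private” (twine will still prompt for your password):

# create a ~/.pypirc entry pointing at the private server
cat > ~/.pypirc <<'EOF'
[distutils]
index-servers =
    private

[private]
repository = http://(EC2 IPv4 IP Address):8081
username = <some_username>
EOF
# upload by name instead of typing the full URL each time
twine upload -r private dist/*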

To verify that your package was uploaded, you can access the pypi server using your browser at http://(ec2 IPv4 IP address):8081, where you should see the pypi-server welcome page.

The page will provide a link for you to view all the available packages on the pypi server. If the package you just uploaded shows up, then the upload was definitely successful.

Downloading From Your Pypi Server:

You can install packages from your pypi server now using pip by running the following command:

pip install --extra-index-url http://(EC2 IPv4 IP Address):8081 YourPackageName --trusted-host (EC2 IPv4 IP Address)

If you don’t want to type the extra url and trusted host additions to the pip command, you may set up a .pip directory in your $HOME directory with a file called pip.conf like so:

[global]
extra-index-url = http://(EC2 IPv4 IP Address):8081/
trusted-host = (EC2 IPv4 IP Address)

Make sure your pip, especially if you are in a virtual environment, is a recent version. We ran into issues installing our own packages with the method above when our pip was below version 19.0.0.
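Upgrading is a one-liner:

# upgrade pip inside the active (virtual) environment
pip install --upgrade pip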

You should also look into setting up https with SSL certificates for your server. Otherwise, http will send your username and password as plain text over the wire. This is fine if you just want to test and learn for now, but if you really have something proprietary, make sure you set this up ASAP.

For most people, reading up to here will suffice. However, our company had one more use case that took some time to figure out. Read on if you want to dockerize images using a mix of private packages and public packages.

How To Build Docker Images Using Your Own Pypi Server:

The problem occurs when you copy the download commands for your pypi server into your Dockerfile. The download command triggers authentication from the server, which requires you to enter your username and password. This is fine when you are downloading packages using pip from your bash console. However, when building an image with docker-compose build, the process fails, since you cannot enter the credentials manually during the build. Unfortunately, docker-compose build does not have an interactive mode. There may be other ways around this, but here is how we did it.

You need to add 2 variables into the Dockerfile in the form of ARGs. Below is a Dockerfile we used, but with sensitive information taken out.

FROM python:3.7-slim-stretch
MAINTAINER pmdbt "jerry@lofty.ai"
# install dependencies
RUN apt-get update && \
apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
COPY . /app
WORKDIR /app
ARG USERNAME
ARG PASSWORD
RUN pip install --upgrade setuptools
# append the username and password ARGs in front of the server URL
RUN pip install --extra-index-url http://$USERNAME:$PASSWORD@(EC2 IPv4 IP Address):8081 LoftyDataFetcher==0.1.3 --trusted-host (EC2 IPv4 IP Address)
RUN pip install -r requirements.txt
ENV AWS_CONFIG_FILE=/root/.aws/config
CMD ["python3", "-u","main.py"]

Notice that all we changed from the original download command was adding $USERNAME:$PASSWORD@ in front of the server URL. This allows the values for USERNAME and PASSWORD to be passed in at build time from the bash console, instead of hard-coding your actual username and password where the rest of the company, or everyone else on the internet, could read them.

Then run the build from your project directory (the one containing the Dockerfile) with the following command:

# pass in the username and password for your pypi server below
docker build --build-arg USERNAME=<YourUserName> --build-arg PASSWORD=<YourPassword> .
# press enter and you should see your image starting to build; docker now knows
# how to handle the authentication portion of installing a package from your
# private pypi server
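If you prefer building through docker-compose, recent versions accept the same flag, assuming your docker-compose.yml has a build section pointing at this Dockerfile:

# same idea via docker-compose; requires a build: section in docker-compose.yml
docker-compose build --build-arg USERNAME=<YourUserName> --build-arg PASSWORD=<YourPassword>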

Conclusion:

Done! If you have been following this guide the entire time, you now know how to deploy your own private pypi server with authentication onto an AWS EC2 instance, build and upload your own python packages to it, install packages from it, and handle installing your own packages during the docker image build process.

If you use another cloud provider like GCP or Azure, you should be able to ignore the AWS setup portion, but still follow the rest of the guide.

This guide ended up being longer than I wanted, but hopefully it will save you a lot of the headaches we dealt with over multiple work days. If anyone knows of a more efficient process for anything I talked about, feel free to leave a comment below to help others out.

Feel free to reach out and contact me if you have any questions.

Linkedin and Github


I’m a self-taught programmer and a founder of Lofty AI (YC S19). I love reading about anything to do with programming, new cloud infrastructure, and ML/AI.