How to Manually Download Container Images
Understanding the OCI Standards
In my quest to better understand how containers work, I’ve been writing a series of blog posts that deep dive into the internals of container systems and implementing them by hand.
This time I wanted to tackle how images are downloaded and used. Most people know that a container image is composed of layers, but what does that really mean? When you pull an image you often see a progress bar for each of several layers. Here is an example of pulling redis, which has several layers.
But before we dig into the nitty gritty, we need to talk about standards.
Docker vs OCI
Docker wasn’t the first to make containers, but they made them mainstream. As such, they wrote their own standards as they went along. We call these the Docker standards.
After a while it became clear that this containerization thing was here to stay, and a lot of different companies wanted to play in the same sandbox. In 2015, under the auspices of the Linux Foundation, a coalition called the Open Container Initiative (OCI) was formed. It included all the big names in the space, among them Amazon, CoreOS, Docker, IBM, Goldman Sachs, Google, Microsoft and VMware. They developed a set of standards for how to work with containers. These include the image spec, the runtime spec, and the one we’re interested in today, the distribution spec.
I want to be clear that Docker isn’t fighting the OCI, as they were a founding member, but they do have their own protocols. Today we are going to focus on the OCI protocols.
OCI objects
Before moving on to the next section, it’s important to understand how OCI stores its data. It’s divided among several files, and this image from the official spec shows the hierarchy.
Starting from the top, the first thing we’re going to download is the image index (application/vnd.oci.image.index.v1+json). A repo can hold multiple images for the same tag, so when you download the image index you’ll get a JSON object that lists manifests, one for each platform, e.g. Linux amd64 or Darwin arm64.
Next you’ll download a manifest (application/vnd.oci.image.manifest.v1+json) for a single image. The manifest contains some metadata about the image, but most importantly it references the image config (application/vnd.oci.image.config.v1+json), a JSON file that describes how a container should be started from the image. This includes things like environment variables.
Last, the manifest contains a list of layers (application/vnd.oci.image.layer.v1.tar+gzip). Layers can be arbitrary binary data, but in this case each one is a compressed tarball of a directory that represents one layer of the container’s final image.
Once the client has downloaded and extracted the config and the layers, it can convert them into a bundle that the low-level container runtime can use to start the container.
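Each arrow in that hierarchy is a descriptor: a small JSON object holding a mediaType, a digest (the content hash used to fetch the target) and a size. As a rough sketch, the model below parses the descriptors out of an image manifest. The names `Descriptor` and `manifest_children` are hypothetical, and the sketch only accepts sha256 digests even though the spec allows other algorithms.

```python
import re
from dataclasses import dataclass

# Media types from the OCI image spec that appear in this walkthrough.
INDEX = "application/vnd.oci.image.index.v1+json"
MANIFEST = "application/vnd.oci.image.manifest.v1+json"
CONFIG = "application/vnd.oci.image.config.v1+json"
LAYER_TAR_GZIP = "application/vnd.oci.image.layer.v1.tar+gzip"

# Assumption for this sketch: only "sha256:" plus 64 lowercase hex chars is accepted.
SHA256_DIGEST = re.compile(r"sha256:[a-f0-9]{64}")


@dataclass(frozen=True)
class Descriptor:
    """One object's pointer to another: what it is, its content hash, and its size."""
    media_type: str
    digest: str
    size: int

    @classmethod
    def from_json(cls, obj: dict) -> "Descriptor":
        digest = obj["digest"]
        if not SHA256_DIGEST.fullmatch(digest):
            raise ValueError(f"unsupported or malformed digest: {digest!r}")
        return cls(obj["mediaType"], digest, int(obj["size"]))


def manifest_children(manifest: dict) -> list:
    """Return the config descriptor followed by the layer descriptors, in order."""
    # A manifest's own mediaType field is recommended but not required,
    # so only reject the document when the field is present and wrong.
    if manifest.get("mediaType", MANIFEST) != MANIFEST:
        raise ValueError("not an OCI image manifest")
    config = Descriptor.from_json(manifest["config"])
    layers = [Descriptor.from_json(layer) for layer in manifest["layers"]]
    return [config] + layers
```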
The process
Today we’re going to download a redis image from the Amazon ECR Public Gallery.
Downloading an image involves the following steps.
- Get an auth token.
- Download the image index.
- Download the image manifest.
- Download the image config.
- Download the layers.
Generate an auth token
Amazon’s public gallery allows anonymous access, but you still need an auth token. You can get one by running the following:
curl https://public.ecr.aws/token/
{"token":"eyJwYXl....."}
This token is only good for a few minutes.
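If you would rather script this step than copy tokens by hand, here is a minimal Python sketch. It assumes the endpoint keeps returning the `{"token": "..."}` shape shown above. `parse_token`, `auth_headers` and `fetch_token` are hypothetical helper names, and only `fetch_token` touches the network.

```python
import json
import urllib.request

# Anonymous token endpoint for Amazon ECR Public, as used with curl above.
ECR_PUBLIC_TOKEN_URL = "https://public.ecr.aws/token/"


def parse_token(body: bytes) -> str:
    """Extract the bearer token from a {"token": "..."} JSON response."""
    token = json.loads(body)["token"]
    if not token:
        raise ValueError("empty token in response")
    return token


def auth_headers(token: str) -> dict:
    """Headers that authorize the subsequent registry requests."""
    return {"Authorization": f"Bearer {token}"}


def fetch_token(url: str = ECR_PUBLIC_TOKEN_URL) -> str:
    """Request a fresh anonymous token (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_token(resp.read())
```

Because the token expires after a few minutes, it makes sense to call `fetch_token()` right before the requests that need it rather than caching it.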
Download an Image Index
We’re going to download the image `public.ecr.aws/docker/library/redis:bookworm`. As it turns out, there are many different images with that exact repo and tag; they just target different architectures. So first we need to download an image index that lists all the manifests available for that tag. We’ll place the previously obtained auth token in the environment variable $TOKEN, for example with TOKEN=$(curl -s https://public.ecr.aws/token/ | jq -r .token). According to the OCI distribution spec, a manifest is downloaded with a GET request to this endpoint:
/v2/<name>/manifests/<reference>
The name is going to be “docker/library/redis” and the reference in this case is the tag name “bookworm”.
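The two endpoints used in this post can be captured in a couple of string helpers. This is a small Python sketch with hypothetical function names. The `Accept` header is an optional addition that the curl commands here omit: the distribution spec lets a client list the manifest media types it understands. Whether a given registry changes its response based on that header is not something this post verifies.

```python
import urllib.request

REGISTRY = "https://public.ecr.aws"

# Manifest media types this client is willing to receive.
MANIFEST_ACCEPT = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def manifest_url(registry: str, name: str, reference: str) -> str:
    """/v2/<name>/manifests/<reference>; reference is a tag or a digest."""
    return f"{registry}/v2/{name}/manifests/{reference}"


def blob_url(registry: str, name: str, digest: str) -> str:
    """/v2/<name>/blobs/<digest>; used for configs and layers."""
    return f"{registry}/v2/{name}/blobs/{digest}"


def manifest_request(registry: str, name: str, reference: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated manifest request."""
    return urllib.request.Request(
        manifest_url(registry, name, reference),
        headers={"Authorization": f"Bearer {token}", "Accept": MANIFEST_ACCEPT},
    )
```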
curl -s -H "Authorization: Bearer $TOKEN" https://public.ecr.aws/v2/docker/library/redis/manifests/bookworm | jq
{
"manifests": [
{
...
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm64v8",
"org.opencontainers.image.base.digest": "sha256:936ea04e67a02e5e83056bfa8c7331e1c9ae89d4a324bbc1654d9497b815ae56",
"org.opencontainers.image.base.name": "debian:bookworm-slim",
"org.opencontainers.image.created": "2024-10-17T20:06:14Z",
"org.opencontainers.image.revision": "e5650da99bb377b2ed4f9f1ef993ff24729b1c16",
"org.opencontainers.image.source": "https://github.com/redis/docker-library-redis.git#e5650da99bb377b2ed4f9f1ef993ff24729b1c16:7.4/debian",
"org.opencontainers.image.url": "https://hub.docker.com/_/redis",
"org.opencontainers.image.version": "7.4.1"
},
"digest": "sha256:245b69f8dea697e2a20962acb4b391888dbf035a837132f1eb6657ba2048d0ec",
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"platform": {
"architecture": "arm64",
"os": "linux",
"variant": "v8"
},
"size": 2485
},
...
],
"mediaType": "application/vnd.oci.image.index.v1+json",
"schemaVersion": 2
}
The query returned roughly ten different manifest objects. Because I’m on a MacBook with Apple silicon, I’m going to look at the 64-bit ARM image.
We know it’s a manifest because the mediaType is “application/vnd.oci.image.manifest.v1+json” and we can see in the platform section it says it’s “arm64”. At this point it’s important to note the digest. That’s the identifier that will allow us to fetch the manifest.
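Picking the right entry by hand works for one image, but the rule is simple enough to code. Below is a sketch of that rule, using a hypothetical `select_manifest` function: match `platform.os` and `platform.architecture`, and compare `variant` only when the caller asks for one. It returns the first match, which is a simplification; real clients may apply extra preferences.

```python
from typing import Optional


def select_manifest(index: dict, os_name: str, architecture: str,
                    variant: Optional[str] = None) -> str:
    """Return the digest of the first manifest in an image index matching a platform."""
    for desc in index.get("manifests", []):
        platform = desc.get("platform", {})
        if platform.get("os") != os_name or platform.get("architecture") != architecture:
            continue
        # Only compare the variant when the caller asked for one.
        if variant is not None and platform.get("variant") != variant:
            continue
        return desc["digest"]
    wanted = "/".join(p for p in (os_name, architecture, variant) if p)
    raise LookupError(f"no manifest for platform {wanted}")
```

Applied to the index above, `select_manifest(index, "linux", "arm64", "v8")` should return the `sha256:245b69f8...` digest that the next step uses.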
Download the Image Manifest
The digest of the manifest we want is “sha256:245b69f8dea697e2a20962acb4b391888dbf035a837132f1eb6657ba2048d0ec”. The OCI endpoint for retrieving a manifest is the same as before; we’ll just use a digest instead of a tag as the reference.
curl -s -H "Authorization: Bearer $TOKEN" https://public.ecr.aws/v2/docker/library/redis/manifests/sha256:245b69f8dea697e2a20962acb4b391888dbf035a837132f1eb6657ba2048d0ec | jq
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:b56cae0d36e0e2ec912eafd69436ff880c0aa3e61eb3e7fdc1ef7aad9b00fe5a",
"size": 8734
},
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:83d624c4be2db5b81ae220b6b10cbc9a559d5800fd32556f4020727098f71ed0",
"size": 29156341
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:473c53d52ee889965a6c7b690184d4b4ca1f2e085e19aebe887b7dad1d26fb44",
"size": 1100
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:6f2cf6cf0e56af0525663ea5aa2eb324e7c53dec39950b53cb4cb76e4745338f",
"size": 875
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:e5799663249b138c8bd0e8114126a921afe45a4a5afe35e67c0c4de9b3d073b5",
"size": 1369398
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:8e3fc4377a6e4dc46f23bb3450210636347c059098ec8eea4eca9af107e4ad4e",
"size": 15322452
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:cedf876e65f74fb9855e41aa6c0b8b612c1729ff11b1ad9a6493cf0a6b1bc893",
"size": 97
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1",
"size": 32
},
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:3aa9d59b5200d674db402a6a1dfd41c4184709896126e5a9254d8683d33f30c2",
"size": 573
}
],
"annotations": {
"com.docker.official-images.bashbrew.arch": "arm64v8",
"org.opencontainers.image.base.digest": "sha256:936ea04e67a02e5e83056bfa8c7331e1c9ae89d4a324bbc1654d9497b815ae56",
"org.opencontainers.image.base.name": "debian:bookworm-slim",
"org.opencontainers.image.created": "2024-10-04T09:56:40Z",
"org.opencontainers.image.revision": "e5650da99bb377b2ed4f9f1ef993ff24729b1c16",
"org.opencontainers.image.source": "https://github.com/redis/docker-library-redis.git#e5650da99bb377b2ed4f9f1ef993ff24729b1c16:7.4/debian",
"org.opencontainers.image.url": "https://hub.docker.com/_/redis",
"org.opencontainers.image.version": "7.4.1"
}
}
In this manifest we can see a config object (application/vnd.oci.image.config.v1+json) plus eight layers of various sizes (application/vnd.oci.image.layer.v1.tar+gzip).
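The manifest also tells you up front how much you’re about to download. The image spec puts the base layer at index 0 of `layers`, so the list is already in the order the layers must be applied. A small sketch (the helper name `layer_summary` is hypothetical):

```python
def layer_summary(manifest: dict):
    """Return (layer digests base-first, total compressed bytes) for an image manifest."""
    layers = manifest["layers"]  # base layer first, per the image spec
    digests = [layer["digest"] for layer in layers]
    total = sum(int(layer["size"]) for layer in layers)
    return digests, total
```

For the arm64 manifest above, the eight sizes add up to 45,850,868 bytes, roughly 43.7 MiB of compressed data.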
Download the Image Config
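The config is a simple JSON file, but it’s served from the same endpoint used for arbitrary binary data (blobs). We reference it by its digest. Note the -L flag below: the registry may answer a blob request with a redirect to its backing storage, and curl only follows redirects when told to.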
/v2/<name>/blobs/<digest>
curl -L -s -H "Authorization: Bearer $TOKEN" https://public.ecr.aws/v2/docker/library/redis/blobs/sha256:b56cae0d36e0e2ec912eafd69436ff880c0aa3e61eb3e7fdc1ef7aad9b00fe5a | jq
{
"architecture": "arm64",
"config": {
"ExposedPorts": {
"6379/tcp": {}
},
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"GOSU_VERSION=1.17",
"REDIS_VERSION=7.4.1",
"REDIS_DOWNLOAD_URL=http://download.redis.io/releases/redis-7.4.1.tar.gz",
"REDIS_DOWNLOAD_SHA=bc34b878eb89421bbfca6fa78752343bf37af312a09eb0fae47c9575977dfaa2"
],
"Entrypoint": [
"docker-entrypoint.sh"
],
"Cmd": [
"redis-server"
],
"Volumes": {
"/data": {}
},
"WorkingDir": "/data",
"ArgsEscaped": true
},
"created": "2024-10-04T09:56:40Z",
"history": [
{
"created": "2024-10-04T09:56:40Z",
"created_by": "/bin/sh -c #(nop) ADD file:702193928cded0bcec5edbf4a5660961e7caef8c9d9cafea3337b7f6720c4464 in / "
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "/bin/sh -c #(nop) CMD [\"bash\"]",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "RUN /bin/sh -c set -eux; \tgroupadd -r -g 999 redis; \tuseradd -r -g redis -u 999 redis # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "RUN /bin/sh -c set -eux; \tapt-get update; \tapt-get install -y --no-install-recommends \t\ttzdata \t; \trm -rf /var/lib/apt/lists/* # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "ENV GOSU_VERSION=1.17",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "RUN /bin/sh -c set -eux; \tsavedAptMark=\"$(apt-mark showmanual)\"; \tapt-get update; \tapt-get install -y --no-install-recommends ca-certificates gnupg wget; \trm -rf /var/lib/apt/lists/*; \tarch=\"$(dpkg --print-architecture | awk -F- '{ print $NF }')\"; \tcase \"$arch\" in \t\t'amd64') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-amd64'; sha256='bbc4136d03ab138b1ad66fa4fc051bafc6cc7ffae632b069a53657279a450de3' ;; \t\t'arm64') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-arm64'; sha256='c3805a85d17f4454c23d7059bcb97e1ec1af272b90126e79ed002342de08389b' ;; \t\t'armel') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-armel'; sha256='f9969910fa141140438c998cfa02f603bf213b11afd466dcde8fa940e700945d' ;; \t\t'i386') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-i386'; sha256='087dbb8fe479537e64f9c86fa49ff3b41dee1cbd28739a19aaef83dc8186b1ca' ;; \t\t'mips64el') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-mips64el'; sha256='87140029d792595e660be0015341dfa1c02d1181459ae40df9f093e471d75b70' ;; \t\t'ppc64el') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-ppc64el'; sha256='1891acdcfa70046818ab6ed3c52b9d42fa10fbb7b340eb429c8c7849691dbd76' ;; \t\t'riscv64') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-riscv64'; sha256='38a6444b57adce135c42d5a3689f616fc7803ddc7a07ff6f946f2ebc67a26ba6' ;; \t\t's390x') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-s390x'; sha256='69873bab588192f760547ca1f75b27cfcf106e9f7403fee6fd0600bc914979d0' ;; \t\t'armhf') url='https://github.com/tianon/gosu/releases/download/1.17/gosu-armhf'; sha256='e5866286277ff2a2159fb9196fea13e0a59d3f1091ea46ddb985160b94b6841b' ;; \t\t*) echo >&2 \"error: unsupported gosu architecture: '$arch'\"; exit 1 ;; \tesac; \twget -O /usr/local/bin/gosu.asc \"$url.asc\"; \twget -O /usr/local/bin/gosu \"$url\"; \techo \"$sha256 */usr/local/bin/gosu\" | sha256sum -c -; 
\texport GNUPGHOME=\"$(mktemp -d)\"; \tgpg --batch --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4; \tgpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu; \tgpgconf --kill all; \trm -rf \"$GNUPGHOME\" /usr/local/bin/gosu.asc; \tapt-mark auto '.*' > /dev/null; \t[ -z \"$savedAptMark\" ] || apt-mark manual $savedAptMark > /dev/null; \tapt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \tchmod +x /usr/local/bin/gosu; \tgosu --version; \tgosu nobody true # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "ENV REDIS_VERSION=7.4.1",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "ENV REDIS_DOWNLOAD_URL=http://download.redis.io/releases/redis-7.4.1.tar.gz",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "ENV REDIS_DOWNLOAD_SHA=bc34b878eb89421bbfca6fa78752343bf37af312a09eb0fae47c9575977dfaa2",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "RUN /bin/sh -c set -eux; \t\tsavedAptMark=\"$(apt-mark showmanual)\"; \tapt-get update; \tapt-get install -y --no-install-recommends \t\tca-certificates \t\twget \t\t\t\tdpkg-dev \t\tgcc \t\tlibc6-dev \t\tlibssl-dev \t\tmake \t; \trm -rf /var/lib/apt/lists/*; \t\twget -O redis.tar.gz \"$REDIS_DOWNLOAD_URL\"; \techo \"$REDIS_DOWNLOAD_SHA *redis.tar.gz\" | sha256sum -c -; \tmkdir -p /usr/src/redis; \ttar -xzf redis.tar.gz -C /usr/src/redis --strip-components=1; \trm redis.tar.gz; \t\tgrep -E '^ *createBoolConfig[(]\"protected-mode\",.*, *1 *,.*[)],$' /usr/src/redis/src/config.c; \tsed -ri 's!^( *createBoolConfig[(]\"protected-mode\",.*, *)1( *,.*[)],)$!\\10\\2!' /usr/src/redis/src/config.c; \tgrep -E '^ *createBoolConfig[(]\"protected-mode\",.*, *0 *,.*[)],$' /usr/src/redis/src/config.c; \t\tgnuArch=\"$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)\"; \textraJemallocConfigureFlags=\"--build=$gnuArch\"; \tdpkgArch=\"$(dpkg --print-architecture)\"; \tcase \"${dpkgArch##*-}\" in \t\tamd64 | i386 | x32) extraJemallocConfigureFlags=\"$extraJemallocConfigureFlags --with-lg-page=12\" ;; \t\t*) extraJemallocConfigureFlags=\"$extraJemallocConfigureFlags --with-lg-page=16\" ;; \tesac; \textraJemallocConfigureFlags=\"$extraJemallocConfigureFlags --with-lg-hugepage=21\"; \tgrep -F 'cd jemalloc && ./configure ' /usr/src/redis/deps/Makefile; \tsed -ri 's!cd jemalloc && ./configure !&'\"$extraJemallocConfigureFlags\"' !' 
/usr/src/redis/deps/Makefile; \tgrep -F \"cd jemalloc && ./configure $extraJemallocConfigureFlags \" /usr/src/redis/deps/Makefile; \t\texport BUILD_TLS=yes; \tmake -C /usr/src/redis -j \"$(nproc)\" all; \tmake -C /usr/src/redis install; \t\tserverMd5=\"$(md5sum /usr/local/bin/redis-server | cut -d' ' -f1)\"; export serverMd5; \tfind /usr/local/bin/redis* -maxdepth 0 \t\t-type f -not -name redis-server \t\t-exec sh -eux -c ' \t\t\tmd5=\"$(md5sum \"$1\" | cut -d\" \" -f1)\"; \t\t\ttest \"$md5\" = \"$serverMd5\"; \t\t' -- '{}' ';' \t\t-exec ln -svfT 'redis-server' '{}' ';' \t; \t\trm -r /usr/src/redis; \t\tapt-mark auto '.*' > /dev/null; \t[ -z \"$savedAptMark\" ] || apt-mark manual $savedAptMark > /dev/null; \tfind /usr/local -type f -executable -exec ldd '{}' ';' \t\t| awk '/=>/ { so = $(NF-1); if (index(so, \"/usr/local/\") == 1) { next }; gsub(\"^/(usr/)?\", \"\", so); printf \"*%s\\n\", so }' \t\t| sort -u \t\t| xargs -r dpkg-query --search \t\t| cut -d: -f1 \t\t| sort -u \t\t| xargs -r apt-mark manual \t; \tapt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \t\tredis-cli --version; \tredis-server --version # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "RUN /bin/sh -c mkdir /data && chown redis:redis /data # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "VOLUME [/data]",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "WORKDIR /data",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "COPY docker-entrypoint.sh /usr/local/bin/ # buildkit",
"comment": "buildkit.dockerfile.v0"
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "ENTRYPOINT [\"docker-entrypoint.sh\"]",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "EXPOSE map[6379/tcp:{}]",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
},
{
"created": "2024-10-04T09:56:40Z",
"created_by": "CMD [\"redis-server\"]",
"comment": "buildkit.dockerfile.v0",
"empty_layer": true
}
],
"os": "linux",
"rootfs": {
"type": "layers",
"diff_ids": [
"sha256:f0f039847c0897e41273775d599cc761049c809342ff8362efb4caf561186ada",
"sha256:0c85d36ec05d4f2d0bbd6648769ddfff77dce03990e0f197631c970b7f0ee42a",
"sha256:8080d55e9ed62240b47d68d43e9ddcde249f0a69b6ef76ea0ff564fc86c1cc9c",
"sha256:dd4a1061343db9a79cfba34c9dfb321f18421432cdea88c60e2f9dc5bbd3785f",
"sha256:3dc436bb646e14a8998b17d1df720396826eab29469c553dfcb757a7ec9135df",
"sha256:b5e7ceff24e69f5c3e1d5a68be5c8c80f7cd2608dd12b5bd8570c3e42e78da26",
"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
"sha256:f42c3ccd8d30b253fedb5ab1575934ecaebbb7c192a793a6860d52b83647243f"
]
},
"variant": "v8"
}
If you’ve ever worked with runc, parts of this JSON file should look familiar. It isn’t the config.json that runc reads directly, but a higher-level runtime derives that file from it. It contains the environment variables, exposed ports, volumes, working directory, entrypoint, and command. It also holds the build history and, under rootfs.diff_ids, the digests of the uncompressed layers, in the same order as the layers in the manifest.
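To make that link to runc concrete, here is a sketch of how three of these fields map onto the `process` section of a runtime config.json. It follows the image spec’s conversion guidance for these fields: `args` is `Entrypoint` followed by `Cmd`, `Env` carries over, and `WorkingDir` becomes `cwd`. The function name and the `/` fallback for an empty working directory are assumptions of this sketch. The guidance also covers fields such as `User`, which this sketch ignores.

```python
def process_from_image_config(image_config: dict) -> dict:
    """Map image config fields onto the shape of a runtime spec "process" object."""
    cfg = image_config.get("config", {})
    # The command line is the entrypoint followed by the default arguments.
    args = (cfg.get("Entrypoint") or []) + (cfg.get("Cmd") or [])
    if not args:
        raise ValueError("image config defines neither Entrypoint nor Cmd")
    return {
        "args": args,
        "env": list(cfg.get("Env") or []),
        # Assumption: fall back to "/" when the image sets no working directory.
        "cwd": cfg.get("WorkingDir") or "/",
    }
```

For the redis config above, this yields `args == ["docker-entrypoint.sh", "redis-server"]` and `cwd == "/data"`.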
Download the Layers
At this point you should be familiar with how we’re going to fetch the layers. We’ll reference the digests and use the same blob endpoint as before.
/v2/<name>/blobs/<digest>
As described by the mediaType, these are gzipped tarballs, so we will store them as such.
curl -o layer1.tgz -L -s -H "Authorization: Bearer $TOKEN" https://public.ecr.aws/v2/docker/library/redis/blobs/sha256:83d624c4be2db5b81ae220b6b10cbc9a559d5800fd32556f4020727098f71ed0
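Repeating that curl for eight layers gets tedious, and blobs are content-addressed, so each download can be checked against its digest. The following is a sketch of both ideas. The helper names are hypothetical, each blob is held in memory (fine at these sizes), and `fetch` is passed in so the logic can run without a registry. The real fetcher attaches the token with `add_unredirected_header`, so urllib follows redirects like `curl -L` does without resending the token to the redirect target.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def verify_digest(data: bytes, digest: str) -> None:
    """Raise ValueError unless sha256(data) matches a "sha256:<hex>" digest."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(f"digest mismatch: expected {expected}, got {actual}")


def make_blob_fetcher(registry: str, name: str, token: str) -> Callable[[str], bytes]:
    """Return fetch(digest) -> bytes that GETs /v2/<name>/blobs/<digest>."""
    def fetch(digest: str) -> bytes:
        req = urllib.request.Request(f"{registry}/v2/{name}/blobs/{digest}")
        # Unredirected: sent on the first request only, not to redirect targets.
        req.add_unredirected_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.read()
    return fetch


def download_layers(manifest: dict, fetch: Callable[[str], bytes], outdir: Path) -> list:
    """Fetch each layer in manifest order, verify size and digest, save as layerN.tgz."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, layer in enumerate(manifest["layers"], start=1):
        data = fetch(layer["digest"])
        if len(data) != int(layer["size"]):
            raise ValueError(f"size mismatch for {layer['digest']}")
        verify_digest(data, layer["digest"])
        path = outdir / f"layer{n}.tgz"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

Against the registry, usage would look like `download_layers(manifest, make_blob_fetcher("https://public.ecr.aws", "docker/library/redis", token), Path("layers"))`, where `manifest` is the parsed JSON from the previous step.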
Now if we extract layer1.tgz (for example with mkdir rootfs && tar -xzf layer1.tgz -C rootfs) and take a look inside, we’ll see the base of the root filesystem.
~ ls
bin etc lib opt run sys var
boot home media proc sbin tmp
dev mnt root srv usr
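The image spec also lets you check the extracted content itself. Each entry in the config’s rootfs.diff_ids is the sha256 of a layer’s *uncompressed* tar, listed in manifest order. So decompressing layer1.tgz and hashing the result should reproduce the first diff_id, `sha256:f0f03984...`. A sketch, with hypothetical helper names:

```python
import gzip
import hashlib
from pathlib import Path


def diff_id(layer_path: Path) -> str:
    """sha256 of the uncompressed tar inside a gzipped layer, in digest form."""
    h = hashlib.sha256()
    with gzip.open(layer_path, "rb") as f:
        # Stream in 1 MiB chunks so large layers don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def check_diff_ids(layer_paths: list, image_config: dict) -> None:
    """Compare each downloaded layer, in order, against rootfs.diff_ids."""
    expected = image_config["rootfs"]["diff_ids"]
    if len(expected) != len(layer_paths):
        raise ValueError("layer count does not match rootfs.diff_ids")
    for path, want in zip(layer_paths, expected):
        got = diff_id(path)
        if got != want:
            raise ValueError(f"{path}: diff_id {got} != {want}")
```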
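Now you can continue downloading the layers until you have them all extracted. You’ll see that they contain the incremental changes made by the Dockerfile steps that touch the filesystem. These match the eight history entries in the config that aren’t marked empty_layer.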
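Stacking the layers is more than untarring them in order, because a later layer can delete files from an earlier one. The image spec does this with whiteout files. An entry named `.wh.<name>` removes `<name>` from the lower layers, and `.wh..wh..opq` marks a directory as opaque, hiding everything the lower layers put in it. The sketch below applies those rules to a plain directory. It assumes trusted input and does not handle every edge case (device nodes, ownership when not root). On Pythons that support tar extraction filters it uses the `"tar"` filter, which blocks paths escaping the target but also clears setuid/setgid/sticky bits.

```python
import shutil
import tarfile
from pathlib import Path

WHITEOUT_PREFIX = ".wh."
OPAQUE_WHITEOUT = ".wh..wh..opq"


def _safe_join(rootfs: Path, name: str) -> Path:
    """Join a tar member name onto rootfs, rejecting absolute or '..' paths."""
    rel = Path(name)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"refusing suspicious path in layer: {name!r}")
    return rootfs / rel


def _remove(path: Path) -> None:
    """Delete a file, symlink, or whole directory if it exists."""
    if path.is_symlink() or path.is_file():
        path.unlink()
    elif path.is_dir():
        shutil.rmtree(path)


def apply_layer(layer_path: Path, rootfs: Path) -> None:
    """Unpack one gzipped layer onto rootfs, honouring OCI whiteout files."""
    rootfs.mkdir(parents=True, exist_ok=True)
    with tarfile.open(layer_path, "r:gz") as tar:
        keep = []
        # Whiteouts only hide content from lower layers, so process them all
        # before extracting this layer's own files.
        for member in tar.getmembers():
            target = _safe_join(rootfs, member.name)
            if target.name == OPAQUE_WHITEOUT:
                parent = target.parent
                if parent.is_dir() and not parent.is_symlink():
                    for child in parent.iterdir():
                        _remove(child)
            elif target.name.startswith(WHITEOUT_PREFIX):
                _remove(target.with_name(target.name[len(WHITEOUT_PREFIX):]))
            else:
                keep.append(member)
        if hasattr(tarfile, "tar_filter"):
            tar.extractall(rootfs, members=keep, filter="tar")
        else:
            tar.extractall(rootfs, members=keep)  # older Pythons: no filtering


def apply_layers(layer_paths: list, rootfs: Path) -> None:
    """Apply layers base-first, the order they appear in the manifest."""
    for path in layer_paths:
        apply_layer(path, rootfs)
```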
Conclusion
I hope you now have a better idea of what makes up an OCI image and how it’s stored. The next step would be for a high-level runtime (containerd, CRI-O) to prepare the layers and generate a bundle, which gets handed off to the low-level runtime (runc). I recommend checking out my blog mini-series on building a container runtime by hand.
Appendix
I tried a number of different registries while working on this article. They all follow the OCI distribution standard, so they are very similar, but login is actually not part of the standard, and each registry has a slightly different way of generating an auth token. For the sake of completeness, I will list the curl commands here.
Docker Hub
You often have to specify the scope of the token when generating it, so pay attention to the scope parameter.
Generate an anonymous token for a repo.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:curlimages/curl:pull" | jq -r .token)
List available tags for the repo.
curl -v -H "Authorization: Bearer $TOKEN" "https://registry-1.docker.io/v2/curlimages/curl/tags/list"
If it’s an official image (one of the repos that don’t include a namespace, such as redis), the formula is slightly different: prefix the repo name with library/.
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/redis:pull" | jq -r .token)
List available tags for the repo.
curl -v -H "Authorization: Bearer $TOKEN" "https://registry-1.docker.io/v2/library/redis/tags/list"
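The only difference between the two Docker Hub cases is the `library/` prefix. The hypothetical helpers below encode that rule, building the same token and tag-list URLs as the curl commands above. The `safe=":/"` argument keeps `urlencode` from percent-encoding the colons and slashes in the scope, so the output matches the hand-written URLs character for character.

```python
from urllib.parse import urlencode

DOCKER_AUTH = "https://auth.docker.io/token"
DOCKER_REGISTRY = "https://registry-1.docker.io"


def docker_hub_repository(repo: str) -> str:
    """Official images have no namespace (e.g. "redis") and live under "library/"."""
    return repo if "/" in repo else f"library/{repo}"


def docker_hub_token_url(repo: str, actions: str = "pull") -> str:
    """Anonymous token URL scoped to one repository."""
    scope = f"repository:{docker_hub_repository(repo)}:{actions}"
    query = urlencode({"service": "registry.docker.io", "scope": scope}, safe=":/")
    return f"{DOCKER_AUTH}?{query}"


def docker_hub_tags_url(repo: str) -> str:
    """/v2/<name>/tags/list on the Docker Hub registry host."""
    return f"{DOCKER_REGISTRY}/v2/{docker_hub_repository(repo)}/tags/list"
```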
GitHub Container Registry
For an anonymous token you can just use:
curl "https://ghcr.io/token?service=ghcr.io&scope=repository:<username>/<repo>:pull"
Or, for some reason, you can just use the token QQ==.
If you want to use a GitHub personal access token (PAT) to log in, make sure it has the read:packages permission. Combine your username and PAT in the form <username>:<PAT>, base64-encode the result, and store it in $GITHUB_TOKEN_64.
curl -H "Authorization: Basic $GITHUB_TOKEN_64" "https://ghcr.io/token?service=ghcr.io&scope=repository:<username>/<repo>:pull"
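The base64 step is easy to get subtly wrong in a shell, for example by encoding a trailing newline from echo. The Python sketch below builds the same value. `ghcr_basic_credential` and `ghcr_token_url` are hypothetical names, and `owner`/`repo` stand in for the `<username>/<repo>` placeholders above. Incidentally, QQ== is simply the base64 encoding of the single character "A".

```python
import base64


def ghcr_basic_credential(username: str, pat: str) -> str:
    """base64("<username>:<PAT>"), the value that follows "Basic " in the header."""
    return base64.b64encode(f"{username}:{pat}".encode("utf-8")).decode("ascii")


def ghcr_token_url(owner: str, repo: str) -> str:
    """Token URL scoped to pulling one GHCR repository."""
    return f"https://ghcr.io/token?service=ghcr.io&scope=repository:{owner}/{repo}:pull"
```

The result of `ghcr_basic_credential(...)` is what the curl command above expects in `$GITHUB_TOKEN_64`.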