Downloading Docker images for (almost) air-gapped environments

Use docker pull, what's the problem?

Well, the problem is that in some cases you just can't -- whatever you're working on is simply not connected to the internet. But there might be a system for fetching files like offline installers via a browser (because, you know, even air-gapped environments need to install regular updates).

So that's essentially the challenge: how do you download Docker images for later use if all you have is a browser?

Background

To the uninitiated: Docker images can indeed be saved and loaded as simple tar files, since the docker cli provides save and load commands. So one would assume that Docker Hub and similar registries would let you just download these tar files, right? Well, no. The closest I've seen was quay.io, but even then you need special permissions on the repo. And there are good reasons for the reluctance to offer tar downloads -- Docker, after all, finally has content-addressable images, which go a long way toward ensuring images aren't tampered with by MitM attacks. That is, of course, only if you use docker pull; you lose those guarantees the moment you operate on plain tar files.

But suppose you know what you're doing and really want to download these tar files, because there is no other way. How can you go about this?

Options

Off-site docker pull && docker save

If you have access to an off-site machine with internet access, plus some place to host files, then all you need to do is pull and save the image using the docker cli and upload the resulting tar file to a hosting service you can fetch it from. This, however, can get quite tedious if every time you need a new Docker image you have to leave your working environment and spend time on this process.

Wrap docker pull && save in a CGI script

The next obvious iteration is a web service that calls docker pull/save and serves you the finished download (a rough sketch of this follows the list below). This comes with... issues. To name a few:

  • You now need to manage your disk: clean up old images, and make sure you don't clean up images that haven't been downloaded yet or are currently being fetched. And this needs to happen automatically and reliably, because you are no longer running the docker cli by hand, so any issue you run into can't just be fixed on the spot (remember, this CGI script is a black box when you use it, because you can only poke it via a browser).
  • If you are behind an aggressive proxy that kills connections after a few seconds of silence on the wire, you need to keep the connection alive somehow while your CGI script is busy running docker pull/docker save and not returning any output.
  • You need docker running on this server, and your CGI script needs access to docker (with all the fun security implications).
  • CGI scripts, really?!
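To make these issues concrete, here is a rough sketch of what such a service might look like. It's illustrated as a plain JDK HttpServer shelling out to the docker cli rather than a literal CGI script, and the endpoint path and query parameter are invented for the example; note how the connection sits silent for the entire docker pull, and how the pulled image is left behind in the daemon's storage for someone else to clean up.

    import com.sun.net.httpserver.HttpServer;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    // Naive "pull, then save, then stream the tar" endpoint.
    public class PullAndSaveServer {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/save", exchange -> {
                // e.g. GET /save?image=library/alpine:3.19
                String image = exchange.getRequestURI().getQuery().substring("image=".length());
                try {
                    // Silent for as long as the pull takes -- an aggressive proxy
                    // may well kill the client connection while we wait here.
                    new ProcessBuilder("docker", "pull", image).inheritIO().start().waitFor();

                    exchange.getResponseHeaders().add("Content-Type", "application/x-tar");
                    exchange.sendResponseHeaders(200, 0); // length unknown, chunked
                    Process save = new ProcessBuilder("docker", "save", image).start();
                    try (InputStream tar = save.getInputStream();
                         OutputStream out = exchange.getResponseBody()) {
                        tar.transferTo(out);
                    }
                    // The pulled image is still sitting in the daemon's storage;
                    // somebody now has to decide when it is safe to `docker rmi` it.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                exchange.close();
            });
            server.start();
        }
    }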

Docker save script to the rescue

The open source community has come up with a bash script that runs the download by talking to the registry API directly, without invoking Docker. Sadly, this still downloads everything onto the hard drive. Is there really no way to avoid that?
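For a public Docker Hub image, the registry traffic such a script performs boils down to two initial steps: obtain an anonymous pull token, then request the image manifest, which lists the config and layer blobs to download. Here is a sketch of those two calls (in Java rather than bash, since that's where this post ends up anyway); the image name is just an example, and the string-slicing "JSON parsing" of the token is purely for illustration.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Fetch a pull token and the schema v2 manifest for a public Docker Hub image.
    public class RegistryManifestFetch {
        public static void main(String[] args) throws Exception {
            String image = "library/alpine"; // example image
            String tag = "latest";
            HttpClient http = HttpClient.newHttpClient();

            // 1. Anonymous pull token for the repository.
            HttpResponse<String> tokenResp = http.send(HttpRequest.newBuilder(URI.create(
                    "https://auth.docker.io/token?service=registry.docker.io&scope=repository:"
                            + image + ":pull")).build(),
                    HttpResponse.BodyHandlers.ofString());
            String body = tokenResp.body();
            int start = body.indexOf("\"token\":\"") + 9;
            String token = body.substring(start, body.indexOf('"', start));

            // 2. The image manifest, listing the config blob and layer digests plus sizes.
            HttpResponse<String> manifest = http.send(HttpRequest.newBuilder(URI.create(
                    "https://registry-1.docker.io/v2/" + image + "/manifests/" + tag))
                    .header("Authorization", "Bearer " + token)
                    .header("Accept", "application/vnd.docker.distribution.manifest.v2+json")
                    .build(), HttpResponse.BodyHandlers.ofString());
            System.out.println(manifest.body());
        }
    }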

Convert Docker save script into a (Java) app

I feel queasy when people talk about CGI scripts, but I am quite comfortable writing Java web services, and this seemed to fit the bill just right. So I ended up writing a Spring Boot app that essentially translates the bash script, with a few key differences (a condensed sketch of the core idea follows the list):

  • Nothing sizeable is saved to disk or held in memory. The image manifest and config file are the only things the downloader fetches and keeps in memory, since it needs those to locate the Docker layer blobs. From that point on, the downloader constructs a streaming tar writer and forwards fetched data directly to the browser as a downloadable stream.
  • Because there is no pre-fetch phase, the connection is always active, so there's no need to worry about proxy timeouts.
  • This can run on the tiniest of cloud instances; all you need to worry about is your bandwidth costs.
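The full source is linked in the epilogue; what follows is only a condensed sketch of the streaming part, under a few assumptions: Apache Commons Compress for the tar writer and Jackson for reading the manifest (both easy to pull into a Spring Boot app), a Docker Hub style registry, and a pull token obtained as in the earlier snippet. The tar entry names and the manifest.json bookkeeping that docker load expects are simplified away here.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Only the manifest is held in memory; every layer blob is copied straight
    // from the registry response into a tar entry on the outgoing stream.
    public class StreamingImageDownloader {
        private static final String REGISTRY = "https://registry-1.docker.io";
        // Blob requests get redirected to a CDN, so follow redirects.
        private final HttpClient http = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL).build();
        private final ObjectMapper json = new ObjectMapper();

        public void streamImage(String image, String tag, String token, OutputStream out)
                throws Exception {
            // 1. The (small) schema v2 manifest lists the config and layer digests plus sizes.
            String manifest = http.send(registryGet("/v2/" + image + "/manifests/" + tag, token,
                            "application/vnd.docker.distribution.manifest.v2+json"),
                    HttpResponse.BodyHandlers.ofString()).body();
            JsonNode root = json.readTree(manifest);

            // 2. Stream the config and each layer blob directly into the tar.
            try (TarArchiveOutputStream tar = new TarArchiveOutputStream(out)) {
                tar.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX); // layers can be big
                addBlob(tar, image, token, root.get("config"), ".json");
                for (JsonNode layer : root.get("layers")) {
                    addBlob(tar, image, token, layer, ".tar.gz");
                }
                tar.finish();
            }
        }

        // The manifest conveniently reports each blob's size up front,
        // which is exactly what the tar entry header needs.
        private void addBlob(TarArchiveOutputStream tar, String image, String token,
                             JsonNode descriptor, String suffix) throws Exception {
            String digest = descriptor.get("digest").asText();
            TarArchiveEntry entry = new TarArchiveEntry(digest.replace("sha256:", "") + suffix);
            entry.setSize(descriptor.get("size").asLong());
            tar.putArchiveEntry(entry);
            try (InputStream blob = http.send(registryGet("/v2/" + image + "/blobs/" + digest,
                    token, "*/*"), HttpResponse.BodyHandlers.ofInputStream()).body()) {
                blob.transferTo(tar);
            }
            tar.closeArchiveEntry();
        }

        private HttpRequest registryGet(String path, String token, String accept) {
            return HttpRequest.newBuilder(URI.create(REGISTRY + path))
                    .header("Authorization", "Bearer " + token)
                    .header("Accept", accept)
                    .build();
        }
    }

In the actual app the OutputStream would be the HTTP response body (for example via Spring's StreamingResponseBody), so layer bytes flow from the registry straight to the browser without ever touching the disk.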

Bandwidth, bandwidth, bandwidth

While this works fine, there is the slight issue that I am now hosting a proxy. Would it not be great if JavaScript could somehow be coerced into downloading this directly in the browser? Well, there is a way to construct tar streams on the fly in JavaScript and force them to be downloaded, thanks to the StreamSaver.js and tar-stream libraries.

But most client-side developers will by now be screaming: CORS! And they are right: unless you turn off CORS checks in your browser, this cannot work, because by the very design of this fetcher you're attempting to fetch files hosted on a different domain. As a final nail in the coffin of this idea, it would seem WebAssembly or even Native Client are of no help either, because they still have to obey CORS rules for their HTTP connections.

So basically, if a standard browser is all you have, you're mostly stuck with proxying requests.

Epilogue

I've hosted a very crude UI for the downloader, with source code on GitHub. There's a 500 GB bandwidth limit on my hosted version, so if anyone knows of a cloud provider without bandwidth caps, feel free to drop me an email @gmail.com.

P.S.

I am aware of services like Nexus Repository Manager and Artifactory, which do come with Docker support, but installing those seemed a bit overkill for the puzzle at hand; I'm also not sure whether either actually provides a 'download as tar' link for their Docker repositories.
