August 17th, 2016 · Utilities
Ever since the Windows Subsystem for Linux was announced, one thing kept bugging me: it runs an older version of Ubuntu. It wasn't only me; the eighth issue opened on the project's issue tracker asks for the ability to switch to an alternate distribution. Shortly after, a tutorial popped up for installing Fedora on WSL. While this worked, I only tried it in a virtual machine, as at the time I could not afford to upgrade my main machine to an Insider Preview build. As a result, I did not end up experimenting as much as I would have liked.
With the Windows 10 Anniversary Update, the Linux subsystem finally shipped in the stable version of the operating system, and I could begin experimenting.
After some research on WSL, it seems that the rootfs directory contains the distribution files, while /home, /root and similar directories are stored separately. This setup is perfect for a scenario where I'd like to seamlessly switch between distributions while persisting my dotfiles and miscellaneous files in the home directories.
This blog post aims to present my thought process while solving this issue, and to explain a few design decisions along the way. If you're only interested in how to use the final product, I suggest reading the project's readme instead.
The aforementioned Fedora tutorial requires you to download a tarball from Fedora's build system. This works, but it's too Fedora-specific. Other distributions may also have similar tarballs available for download, but I wanted a more universal approach that didn't involve scouring the web for tarballs. At least, not manually.
Enter Docker. The Docker Hub already hosts such tarballs for quick consumption, so maybe they can be used for WSL as well. After some prodding at the service, I determined that the best way to get a usable tarball is to find the original Dockerfile for the Docker image, and then use the files it packages into its / directory.
It seems the official OS images are defined in a git repository, available at github.com/docker-library/official-images. There is one file per image under the library directory, listing the image's tags along with their source. Going through the files, two different formats seem to be in use. The older one looks like this:
# maintainer: Oracle Linux Product Team <ol-ovm-info_ww@oracle.com> (@Djelibeybi)
# Oracle Linux 7
7: git://github.com/oracle/docker-images.git@a44844fe085a561ded44865eafb63f742e4250c1 OracleLinux/7.2
# Oracle Linux 6
6: git://github.com/oracle/docker-images.git@a44844fe085a561ded44865eafb63f742e4250c1 OracleLinux/6.8
Recently updated files, on the other hand, use the following format:
Maintainers: The CentOS Project <cloud-ops@centos.org> (@CentOS)
GitRepo: https://github.com/CentOS/sig-cloud-instance-images.git
Directory: docker
Constraints: !aufs
Tags: centos7.2.1511, 7.2.1511
GitFetch: refs/heads/CentOS-7.2.1511
GitCommit: a3c59bd4e98a7f9c063d993955c8ec19c5b1ceff
Tags: centos6, 6
GitFetch: refs/heads/CentOS-6
GitCommit: 98bda021f98ad46991afcd9f8ca657bce762e631
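A minimal sketch of reading the newer format, assuming (as in the CentOS example) that fields appearing before the first Tags: line, such as GitRepo and Directory, apply to every tag entry that follows. parse_library is a hypothetical helper name, not part of the official-images tooling.

```python
def parse_library(text):
    """Map every tag in a newer-format library file to (GitRepo, GitCommit, Directory).

    Fields seen before the first "Tags:" line are treated as defaults that
    each following tag entry inherits and may override.
    """
    defaults = {}
    entries = []
    current = defaults
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not "Key: value".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Tags":
            # A new tag entry starts from a copy of the global defaults.
            current = dict(defaults)
            entries.append(current)
        current[key] = value

    tags = {}
    for entry in entries:
        source = (entry["GitRepo"], entry["GitCommit"], entry.get("Directory", ""))
        for tag in entry["Tags"].split(","):
            tags[tag.strip()] = source
    return tags
```

With the CentOS file above, parse_library(text)["7.2.1511"] yields the repository URL, commit a3c59bd4e98a7f9c063d993955c8ec19c5b1ceff and the docker directory.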
Studying the structure of the repositories and the available fields, it quickly became apparent that both formats carry enough information to build a direct link to the Dockerfile, using something like https://github.com/$GitRepo/blob/$GitCommit/$Directory/Dockerfile, where $GitRepo stands for the owner/repository part of the git URL (for example, CentOS/sig-cloud-instance-images). The two examples above would then translate to the following URLs:
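The URL construction can be sketched as follows, assuming GitHub-hosted repositories. It reduces both the newer GitRepo URL and the older git:// URL with an @commit suffix to the owner/repository path that $GitRepo stands for; owner_repo, dockerfile_url and parse_old_source are hypothetical helpers.

```python
from urllib.parse import urlparse


def owner_repo(git_url):
    """Reduce a GitHub git URL (https:// or git://) to its "owner/repo" path."""
    parsed = urlparse(git_url)
    if parsed.hostname != "github.com":
        raise ValueError(f"not a GitHub repository: {git_url}")
    path = parsed.path.strip("/")
    return path[:-len(".git")] if path.endswith(".git") else path


def dockerfile_url(git_url, commit, directory):
    """Build https://github.com/$GitRepo/blob/$GitCommit/$Directory/Dockerfile."""
    parts = [owner_repo(git_url), "blob", commit]
    if directory.strip("/"):
        parts.append(directory.strip("/"))
    parts.append("Dockerfile")
    return "https://github.com/" + "/".join(parts)


def parse_old_source(source):
    """Split an older-format source "git://github.com/owner/repo.git@<commit> <directory>"."""
    location, _, directory = source.partition(" ")
    git_url, _, commit = location.rpartition("@")
    return git_url, commit, directory.strip()
```

For the Oracle Linux 7 entry, dockerfile_url(*parse_old_source(...)) points at OracleLinux/7.2/Dockerfile in oracle/docker-images at the pinned commit.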
All the Dockerfiles available this way (or at least the ones I've tried) have an associated tarball in the repository, and the Dockerfile has a directive that adds the tarball as the root of the image:
ADD centos-7.2.1511-docker.tar.xz /
To download these tarballs, simply replace Dockerfile in the URL with the name of the archive:
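Finding the archive and building its download link might look like the sketch below. It assumes the tarball is named in a single ADD <archive> / line, and it uses raw.githubusercontent.com (the host visible in the script output below) so the request returns the file bytes rather than GitHub's HTML page. rootfs_archive and raw_file_url are hypothetical helpers.

```python
import re
from urllib.parse import urlparse

# Matches a line such as "ADD centos-7.2.1511-docker.tar.xz /".
ADD_TO_ROOT = re.compile(r"^\s*ADD\s+(\S+\.tar(?:\.\w+)?)\s+/\s*$", re.IGNORECASE | re.MULTILINE)


def rootfs_archive(dockerfile_text):
    """Return the tarball that the Dockerfile adds as the image root."""
    match = ADD_TO_ROOT.search(dockerfile_text)
    if match is None:
        raise ValueError("no 'ADD <archive> /' directive found")
    return match.group(1)


def raw_file_url(git_url, commit, directory, filename):
    """Point at the raw file contents instead of the HTML blob page."""
    path = urlparse(git_url).path.strip("/")
    if path.endswith(".git"):
        path = path[:-len(".git")]
    parts = [path, commit]
    if directory.strip("/"):
        parts.append(directory.strip("/"))
    parts.append(filename)
    return "https://raw.githubusercontent.com/" + "/".join(parts)
```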
Writing a script was trivial from here:
$ ./get-source.py
usage: ./get-source.py image[:tag]
$ ./get-source.py centos:7.2.1511
[*] Fetching official-images info for centos:7.2.1511...
[*] Fetching Dockerfile from repo CentOS/sig-cloud-instance-images/.../docker...
[*] Downloading archive https://raw.githubusercontent.com/.../centos-7.2.1511-docker.tar.xz...
[*] Rootfs archive for centos:7.2.1511 saved to rootfs_centos_7.2.1511.tar.xz.
The previous method works, but it is limited to official images, which is quite restrictive considering how much bigger the selection would be if the script were extended to every image published on the Docker Hub.
Updating the script to include these was not an option, since the official images are a one-off special case, with their own git repository and a well-defined structure. Instead, I needed to look into what happens when you run docker pull.
Thankfully, the Docker Registry has an open, documented API: the Docker Registry HTTP API V2. To get started, you must first request an auth token for the repository you're about to download:
$ curl "https://auth.docker.io/token?service=registry.docker.io&scope=repository:gentoo/stage3-amd64-hardened:pull"
{"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCIsIng1YyI6WyJNSUlDTHpDQ0FkU2dBd0lCQWdJQkFEQUtCZ2dxaGtqT1BRUURBakJHTVV..."}
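A sketch of the same token request in Python, assuming anonymous access to a public repository. token_url and fetch_token are hypothetical helper names; only the endpoint and the service/scope parameters come from the curl call above. Official images live under the library/ namespace on the Hub (for example, library/centos).

```python
import json
import urllib.parse
import urllib.request

AUTH_ENDPOINT = "https://auth.docker.io/token"


def token_url(repository, action="pull"):
    """Build the token request URL for a Docker Hub repository."""
    query = urllib.parse.urlencode({
        "service": "registry.docker.io",
        "scope": f"repository:{repository}:{action}",
    })
    return f"{AUTH_ENDPOINT}?{query}"


def fetch_token(repository):
    """Request an anonymous pull token (needs network access)."""
    with urllib.request.urlopen(token_url(repository)) as response:
        return json.load(response)["token"]
```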
With this auth token, you can now go ahead and request the image manifest information for the specified image and tag:
$ curl -H "Authorization: Bearer eyJhbG..." "https://registry.hub.docker.com/v2/gentoo/stage3-amd64-hardened/manifests/latest"
{
"schemaVersion": 1,
"name": "gentoo/stage3-amd64-hardened",
"tag": "latest",
"architecture": "amd64",
"fsLayers": [
{ "blobSum": "sha256:3c4ee6b925b971024bb3d2028207115f3756149081cef55f79fdbe6888983b41" },
{ "blobSum": "sha256:a591e6c7eed71f0be900a0cbb45e0492054f618420e78a46be1c096323fbfa9f" },
{ "blobSum": "sha256:aef964ab23f0f41632f177cdba1097287b98413a2ba3409a95918dbee7e6578c" },
{ "blobSum": "sha256:a6d5a73fd11ff7e470c9589754fd885901d6da4238a430058ae1e1a0ecc57b15" },
{ "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" },
{ "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" },
{ "blobSum": "sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f" }
],
"history": [
{ "v1Compatibility": {"architecture":"amd64","author":"Gentoo Docker Team","config":{"Hostname":"55cd1f8f6e5b","Domainname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["sh"],"ArgsEscaped":true,"Image":"sha256:6365cf68ed58c4908062151a22aa9d1234c6d4fa8969bf2b31dc52c2f5ab32f5","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":[],"Labels":{}},"container":"93a57f435dca201e139b13fa4998ba0d11dd52bf12eab563a0257abe6a67dc6d","container_config":{"Hostname":"55cd1f8f6e5b","Domainname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["/bin/sh","-c","echo 'UTC' \\u003e /etc/timezone"],"ArgsEscaped":true,"Image":"sha256:6365cf68ed58c4908062151a22aa9d1234c6d4fa8969bf2b31dc52c2f5ab32f5","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":[],"Labels":{}},"created":"2016-08-17T05:41:07.159433419Z","docker_version":"1.12.0","id":"c71a24e998875ff18808325311c62a20a9fde7f44c9e7d51f6803ec5f5f208d8","os":"linux","parent":"7a73f99472456162ff03fa44723ce4fbd9d6d02589066d671f3ba422cad6bbf2"} },
{ "v1Compatibility": {"id":"7a73f99472456162ff03fa44723ce4fbd9d6d02589066d671f3ba422cad6bbf2","parent":"3cc6502ad622b16ff624706b46783d4e34cfd6dd2a179d636c4ab76d41232b47","created":"2016-08-17T05:41:00.889979649Z","container_config":{"Cmd":["/bin/sh -c sed -e 's/#rc_sys=\\"\\"/rc_sys=\\"lxc\\"/g' -i /etc/rc.conf"]},"author":"Gentoo Docker Team"} },
{ "v1Compatibility": {"id":"3cc6502ad622b16ff624706b46783d4e34cfd6dd2a179d636c4ab76d41232b47","parent":"2a339221d58a14c270157ba148cf464ae72f2178bffcb5da8b41f4df5796030b","created":"2016-08-17T05:40:52.807555134Z","container_config":{"Cmd":["/bin/sh -c /build.sh amd64 x86_64 -hardened"]},"author":"Gentoo Docker Team"} },
{ "v1Compatibility": {"id":"2a339221d58a14c270157ba148cf464ae72f2178bffcb5da8b41f4df5796030b","parent":"05ef4eb35fe1924c7132360ad0a048342ae0f2dde71e06b46a6a3471dceb3925","created":"2016-08-17T05:37:35.212640622Z","container_config":{"Cmd":["/bin/sh -c #(nop) ADD file:d25e0087b910b58876ed72f4a9ffbf75d6deb2be05b3181f93c51d75a92ddef3 in / "]},"author":"Gentoo Docker Team"} },
{ "v1Compatibility": {"id":"05ef4eb35fe1924c7132360ad0a048342ae0f2dde71e06b46a6a3471dceb3925","parent":"b5447a4455e24042cc40c5294938280ad7399ff826a1f50be64f277b02cbf800","created":"2016-08-17T05:37:34.712161133Z","container_config":{"Cmd":["/bin/sh -c #(nop) MAINTAINER Gentoo Docker Team"]},"author":"Gentoo Docker Team","throwaway":true} },
{ "v1Compatibility": {"id":"b5447a4455e24042cc40c5294938280ad7399ff826a1f50be64f277b02cbf800","parent":"4185ddbe03f83877b631b5e271a02f6f232de744ae4bfc48ce44216c706cb7fd","created":"2016-06-23T23:23:37.198943461Z","container_config":{"Cmd":["/bin/sh -c #(nop) CMD [\\"sh\\"]"]},"throwaway":true} },
{ "v1Compatibility": {"id":"4185ddbe03f83877b631b5e271a02f6f232de744ae4bfc48ce44216c706cb7fd","created":"2016-06-23T23:23:36.73131105Z","container_config":{"Cmd":["/bin/sh -c #(nop) ADD file:9ca60502d646bdd815bb51e612c458e2d447b597b95cf435f9673f0966d41c1a in /"]}} }
]
}
This response has its own separate documentation, the Image Manifest Version 2, Schema 1 specification, but most of the returned values are not that interesting. The only interesting part is the fsLayers member, which lists the digests of the layers. Like the history entries, they are ordered from the most recent layer to the oldest, so the base layer comes last.
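The ordering is visible in the example: the first history entry is the most recent (2016-08-17T05:41:07) and the last one is the base image ADD from 2016-06-23, while the repeated digest starting with a3ed95caeb02 lines up with the two history entries marked throwaway (the metadata-only MAINTAINER and CMD steps). The hypothetical helper below, which assumes the manifest has already been parsed with json.loads, reverses the list so layers can be applied base-first and de-duplicates it for downloading.

```python
def layer_order(manifest):
    """Return (apply_order, download_list) for a schema 1 manifest.

    fsLayers lists the newest layer first, so it is reversed to get the order
    in which layers must be applied (base layer first). Identical digests,
    such as the repeated empty layer, only need to be downloaded once.
    """
    apply_order = [layer["blobSum"] for layer in reversed(manifest["fsLayers"])]
    download_list = list(dict.fromkeys(apply_order))  # de-duplicate, keep order
    return apply_order, download_list
```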
In order to download the layers, you just need to iterate through the list and request each digest:
$ curl -i -H "Authorization: Bearer eyJhbG..." "https://registry.hub.docker.com/v2/gentoo/stage3-amd64-hardened/blobs/sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f"
HTTP/1.1 307 Temporary Redirect
Content-Type: application/octet-stream
Docker-Distribution-Api-Version: registry/2.0
Location: https://dseasb33srnrn.cloudfront.net/registry-v2/docker/registry/v2/blobs/sha256/8d/8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f/data?Expires=1471525249&Signature=BtEmohVn12TqUA4Xg7wSF9G9kzAqFHgDvMltgYYQFS4iKNGUwk7Dh5cMSEd~u91oQT8muJl4vHVTnGQ9hcNhGW4ItCqmdExrkCLsspNBpGAx4EtopwATPKt0YQ7QxG3Uyst6U3nc2SACGWvqKzNk0uTS8cBvNSFtQ4sZxGvAhBo_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q
Date: Thu, 18 Aug 2016 12:40:49 GMT
Content-Length: 432
Strict-Transport-Security: max-age=31536000
It should be noted that the API itself does not serve the binary blobs; instead, it responds with an HTTP redirect to a CDN.
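One way to follow that redirect from Python is sketched below, assuming urllib from the standard library. Request.add_unredirected_header keeps the bearer token on the registry request only; the CDN URL already carries its own signed Expires/Signature/Key-Pair-Id parameters, so the token does not need to be forwarded to a third-party host. blob_request and download_blob are hypothetical names.

```python
import shutil
import urllib.request

REGISTRY = "https://registry.hub.docker.com"


def blob_request(repository, digest, token):
    """Build the blob request; the token is attached to this request only."""
    request = urllib.request.Request(f"{REGISTRY}/v2/{repository}/blobs/{digest}")
    # Unredirected headers are not copied when urllib follows the 307 to the CDN.
    request.add_unredirected_header("Authorization", f"Bearer {token}")
    return request


def download_blob(repository, digest, token, out_path):
    """Stream one layer to disk; urlopen follows the redirect automatically."""
    with urllib.request.urlopen(blob_request(repository, digest, token)) as response:
        with open(out_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

With curl, adding -L to the command above makes it follow the redirect as well.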
The next step was to determine what each layer actually is, and how I could merge them for installation. Thankfully, the documentation quickly revealed that they are application/vnd.docker.image.rootfs.diff.tar.gzip, or simply put, just .tar.gz files.
However, this still leaves me with a bunch of archives, while the installer script only accepts a single tarball. After some research and experimentation, I found that I can simply append each layer to the previous one, and tar will handle the result just fine, as long as --ignore-zeros is specified (it makes tar skip the zero blocks that mark the end of each individual archive instead of stopping at the first one):
$ touch a; touch b
$ tar czf a.tar.gz a; tar czf b.tar.gz b
$ cat a.tar.gz >> b.tar.gz
$ tar xvf b.tar.gz # extracts only b. sadface.
b
$ tar xvf b.tar.gz --ignore-zeros # extracts both. happyface.
b
a
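The same concatenation can be sketched with Python's standard library: the gzip members are appended byte for byte, and tarfile's ignore_zeros flag plays the role of --ignore-zeros. concat_layers and member_names are hypothetical helpers; layer_paths should be ordered base layer first, so that later layers overwrite earlier files on extraction. Like the shell approach, this does not process the whiteout (.wh.*) entries that layers use to mark deleted files.

```python
import shutil
import tarfile


def concat_layers(layer_paths, out_path):
    """Append gzip-compressed tar layers byte for byte, in the given order."""
    with open(out_path, "wb") as out:
        for path in layer_paths:
            with open(path, "rb") as layer:
                shutil.copyfileobj(layer, out)


def member_names(archive_path, ignore_zeros=True):
    """List members; without ignore_zeros, reading stops after the first archive."""
    with tarfile.open(archive_path, "r:gz", ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()
```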
From here, writing a script was trivial:
$ ./get-prebuilt.py
usage: ./get-prebuilt.py image[:tag]
$ ./get-prebuilt.py gentoo/stage3-amd64-hardened
[*] Requesting authorization token...
[*] Fetching manifest info for gentoo/stage3-amd64-hardened:latest...
[*] Downloading layer sha256:3c4ee6b925b971024bb3d2028207115f3756149081cef55f79fdbe6888983b41...
[*] Downloading layer sha256:a591e6c7eed71f0be900a0cbb45e0492054f618420e78a46be1c096323fbfa9f...
[*] Downloading layer sha256:aef964ab23f0f41632f177cdba1097287b98413a2ba3409a95918dbee7e6578c...
[*] Downloading layer sha256:a6d5a73fd11ff7e470c9589754fd885901d6da4238a430058ae1e1a0ecc57b15...
[*] Downloading layer sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4...
[*] Downloading layer sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f...
[*] Rootfs archive for gentoo/stage3-amd64-hardened:latest saved to rootfs_gentoo_stage3-amd64-hardened_latest.tar.gz.
As a test, I downloaded the tarballs mentioned above within WSL, extracted one of them with permissions preserved (tar xfp), closed WSL, and moved the extracted directory in place of the old, rusty Ubuntu /rootfs from Windows Explorer. Rushing back to the console, bash gave me an error:
Default user not found. Please correct this by running lxrun.exe /setdefaultuser
Running the command does work, but not all the time. Some of the distributions I tried don't play nice, and you're left with a broken installation that either tells you to run the command above or reports that the command failed:
Please create a default UNIX user account. The username does not need to match your Windows username.
For more information visit: https://aka.ms/wslusers
Enter new UNIX username: RoliSoft
/usr/sbin/adduser: unrecognized option '--quiet'
Creating UNIX user failed, this can be done later by running lxrun.exe /setdefaultuser
However, I found this to be easily fixable: just copy the user's entries from /etc/passwd and /etc/shadow into the same files in the new /rootfs, and that's it. The /home/$USER directory persists, as it is stored outside of /rootfs.
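The fix boils down to a text transformation, sketched here as a hypothetical helper. Both files use the user name as the first colon-separated field, so one function covers /etc/passwd and /etc/shadow. It only transforms text: since files written into the WSL directories from the Windows side can end up unreadable, the result should be written from within WSL. Like the fix described above, it leaves /etc/group alone.

```python
def copy_user_entry(old_text, new_text, username):
    """Copy username's line from an old passwd/shadow file into a new one.

    An existing entry for the same user in new_text is replaced, so running
    the helper twice gives the same result.
    """
    def owner(line):
        return line.split(":", 1)[0]

    matches = [line for line in old_text.splitlines() if owner(line) == username]
    if not matches:
        raise KeyError(f"{username} not found in the old file")
    kept = [line for line in new_text.splitlines() if owner(line) != username]
    return "\n".join(kept + [matches[0]]) + "\n"
```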
So with that, writing a script to automate all of this should be fairly trivial, right? Well, if WSL weren't experimental, then yes. But in its current state, quite a few workarounds are required for the switch to be seamless and for the new distribution to work correctly.
The WSL directory can be found under %LocalAppData%\lxss. The /rootfs directory is the one that has to be completely replaced with the contents of the new tarball. Since the Linux subsystem can't run while it's being replaced, I had to design the installer script to run under Windows and occasionally launch a few commands in WSL through bash -c ... calls. At least, that was the initial plan.
The first step, simply running bash.exe, proved to be difficult, as it was nowhere to be found in the %PATH%. After a bit of searching, %WinDir%\sysnative\bash.exe turned out to be the working path (sysnative is the alias through which a 32-bit process, which sees SysWOW64 in place of System32, can reach the real System32 directory). However, launching it only resulted in a cryptic error message: Error: 0x80070057. Searching for this only led to issues where the solution was to uncheck "Enable legacy console". Fine, but I'm not using a console. What now?
As it turns out, redirecting stdout and stderr is currently not supported. Thankfully, the exit status code is correctly returned, so at least I can check whether the commands finished successfully.
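A sketch of a launcher under those constraints, assuming Python's subprocess module on the Windows side. find_bash and run_in_wsl are hypothetical names; the System32 fallback is an addition for 64-bit interpreters, which do not see the sysnative alias. The exists parameter only exists to make the lookup testable.

```python
import ntpath
import os
import subprocess


def find_bash(windir=None, exists=os.path.exists):
    """Locate bash.exe, preferring the sysnative alias used by 32-bit processes."""
    windir = windir or os.environ.get("WINDIR", r"C:\Windows")
    for subdir in ("sysnative", "System32"):
        candidate = ntpath.join(windir, subdir, "bash.exe")
        if exists(candidate):
            return candidate
    raise FileNotFoundError("bash.exe not found; is WSL enabled?")


def run_in_wsl(command):
    """Run a command through bash -c and return only its exit status.

    stdout/stderr are deliberately left alone, since redirecting them is
    not supported.
    """
    return subprocess.call([find_bash(), "-c", command])
```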
So now that I can somewhat launch commands, the next step was to copy the archive into the WSL /home directory for extraction, since other directories might not be writable by the logged-in user. Python has a few ways to copy a file; little did I know this would also prove to be difficult.
Copying a file willy-nilly from the Windows filesystem into the WSL directories renders it unreadable. I wasn't able to track down exactly why, since it's not a permission error, just a "general I/O error". Suspecting that some metadata is attached to files created within WSL and is missing from a file copied in from outside, I tried creating a new file from within WSL (touch rootfs.tar.xz) and then opening it for writing from the Windows side. This didn't work either, as the file went back to the I/O error after writing.
Thankfully, it quickly clicked that the Windows drives are mounted under /mnt inside WSL (C:\ appears as /mnt/c). Translating the absolute path of the archive to this mounted UNIX equivalent, and then copying the file from within WSL, solved the issue.
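A minimal sketch of the path translation, assuming WSL's default mounts where drive C: appears as /mnt/c. to_wsl_path and copy_into_home are hypothetical helpers; the resulting command is meant to be passed to bash -c so the copy happens inside WSL rather than from Windows.

```python
import ntpath
import posixpath
import shlex


def to_wsl_path(windows_path):
    """Translate an absolute Windows path with a drive letter to /mnt/<drive>/..."""
    drive, rest = ntpath.splitdrive(windows_path)
    if len(drive) != 2 or not rest.startswith(("\\", "/")):
        raise ValueError(f"expected an absolute path with a drive letter: {windows_path!r}")
    parts = [part for part in rest.replace("\\", "/").split("/") if part]
    return posixpath.join("/mnt", drive[0].lower(), *parts)


def copy_into_home(windows_path):
    """Shell command that copies the archive into $HOME from inside WSL."""
    # The source path is quoted; "~/" stays unquoted so bash expands it.
    return f"cp {shlex.quote(to_wsl_path(windows_path))} ~/"
```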
After this, it was smooth sailing, and the install.py script was born:
As an alternative to Docker, I explored the downloadable ISO images Linux distributions offer, and found that most of them package their rootfs in a SquashFS archive.
These archives can be easily extracted using the unsquashfs tool, so adding SquashFS support to the installer script was relatively trivial. The script just needs to detect whether the specified file is a .tar* or a .sfs/.squashfs archive, and issue the appropriate extraction command.
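The detection can be sketched as a hypothetical helper that returns a command string for bash -c. It assumes GNU tar, which detects the compression on its own when extracting, and unsquashfs's -d (destination) and -f (overwrite) options; the --ignore-zeros flag covers archives built by concatenating layers.

```python
import posixpath
import shlex


def extract_command(archive, dest):
    """Pick the extraction command for a rootfs archive based on its name."""
    name = posixpath.basename(archive).lower()
    if name.endswith((".sfs", ".squashfs")):
        # -f overwrites existing files, -d sets the destination directory.
        return f"unsquashfs -f -d {shlex.quote(dest)} {shlex.quote(archive)}"
    if ".tar" in name:
        # -p keeps permissions; --ignore-zeros reads past each appended layer.
        return f"tar --ignore-zeros -xpf {shlex.quote(archive)} -C {shlex.quote(dest)}"
    raise ValueError(f"unsupported archive type: {archive}")
```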
Since the script can't receive stdout/stderr due to a WSL limitation, it checks beforehand whether the unsquashfs binary exists in root's $PATH, so it can warn the user to launch WSL and install the squashfs-tools package.
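Because only the exit status comes back, the check has to be phrased as a command whose status answers the question. The sketch below uses the POSIX command -v builtin and takes the runner as a parameter (for example, a function that calls bash -c and returns its exit code). missing_package is a hypothetical name, and unlike the script described above it checks the $PATH of whichever user the runner executes as, not specifically root's.

```python
from typing import Callable, Optional


def missing_package(archive: str, run: Callable[[str], int]) -> Optional[str]:
    """Return the package the user must install first, or None if ready.

    run executes a command inside WSL and returns only its exit status.
    """
    if archive.lower().endswith((".sfs", ".squashfs")):
        # command -v exits non-zero when the binary is not on $PATH.
        if run("command -v unsquashfs >/dev/null 2>&1") != 0:
            return "squashfs-tools"
    return None
```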
As an example, the downloadable archlinux-2016.08.01-dual.iso image contains a suitable SquashFS file at ARCH/X86_64/AIROOTFS.SFS. This file can be installed without any additional hassle by simply running install.py AIROOTFS.SFS.
The installer keeps each installed rootfs under a rootfs_<image>_<tag> name. This makes switching between Linux distributions very easy, as all one needs to do is rename folders to choose which distribution is active.
This functionality is provided by the switch.py script:
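A minimal sketch of the renaming approach, assuming WSL is not running and that the caller knows the name under which the currently active distribution should be stashed (the post does not say how switch.py tracks this). rootfs_dir_name and switch_rootfs are hypothetical helpers; the naming mirrors the rootfs_centos_7.2.1511 and rootfs_gentoo_stage3-amd64-hardened_latest file names in the script output above.

```python
import os


def rootfs_dir_name(image, tag):
    """Name a stashed rootfs like the installer's rootfs_<image>_<tag>."""
    return "rootfs_" + f"{image}_{tag}".replace("/", "_").replace(":", "_")


def switch_rootfs(lxss_dir, active_name, target_name):
    """Stash the active rootfs as active_name and move target_name into place.

    WSL must not be running while the directories are renamed.
    """
    rootfs = os.path.join(lxss_dir, "rootfs")
    stash = os.path.join(lxss_dir, active_name)
    target = os.path.join(lxss_dir, target_name)
    # Check both preconditions before touching anything.
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    if os.path.exists(stash):
        raise FileExistsError(stash)
    os.rename(rootfs, stash)
    os.rename(target, rootfs)
```

For example, with lxss_dir pointing at %LocalAppData%\lxss, switch_rootfs(lxss_dir, "rootfs_ubuntu_trusty", rootfs_dir_name("gentoo/stage3-amd64-hardened", "latest")) would make the Gentoo image active while keeping the Ubuntu files (under a hypothetical rootfs_ubuntu_trusty name) for a later switch back.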