APT for downloaders

Big railroad network with many different trains before a setting sun
Image by fancycrave1 on pixabay

Remember what I said last time? I started with "One of the main jobs of a package manager like apt is to download packages". So let us talk about downloading a bit more this time.

APT doesn't hardcode certain protocols like HTTP for downloading. Instead, it talks to external binaries, which it calls "transports", via a self-defined text protocol similar in style to HTTP.
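To give a feel for what "similar in style to HTTP" means, here is a minimal sketch of reading and writing one such message: a numeric status line followed by "Key: value" fields and a terminating blank line. The message name and the field names (`600 URI Acquire`, `URI`, `Filename`) are modelled on APT's method interface as I understand it, and the helper names `parse_message` and `format_message` are made up for this illustration, so treat it as a picture of the style rather than a reference for the protocol.

```python
def parse_message(text):
    """Parse one message: a status line followed by 'Key: value' fields."""
    lines = text.strip("\n").split("\n")
    # The first line carries a numeric code and a human-readable description.
    code, _, description = lines[0].partition(" ")
    fields = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the message
        # Split at the first colon only, so URIs in the value stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return int(code), description, fields


def format_message(code, description, fields):
    """Serialise a message back into text form, ending with a blank line."""
    header = f"{code} {description}\n"
    body = "".join(f"{key}: {value}\n" for key, value in fields.items())
    return header + body + "\n"


# Assumed example of a request as apt might send it to a transport.
request = (
    "600 URI Acquire\n"
    "URI: http://deb.debian.org/debian/dists/stable/InRelease\n"
    "Filename: /tmp/InRelease\n"
    "\n"
)
code, description, fields = parse_message(request)
print(code, description, fields["URI"])
```

A real transport reads such messages on stdin and answers on stdout in the same format, which is why a new protocol can be added without touching apt itself.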

APT comes with a large set of more or less common protocols by default, but can in this way also be extended to support other protocols… let's look at the available protocols for APT:

The strange kids on the block: gpgv, rred & store

Methods like gpgv, rred and store usually have only a short stay in the progress indication of an update call and are hardly useable by anyone, but they exist as standalone methods to have (potentially many of them) run in parallel and make use of features like switching to a non-root user for their execution.

There is no cloud: file and copy

Feeling limited by the bandwidth-cap of your internet connection? No problem, apt doesn't need the internet – at least if you have a local mirror or a repository in general which resides on a local hard-drive or is e.g. mounted via NFS as a network share.

file will try to use the data right where it currently is – which can lead to apt seeing changes in the repository without an update call – while copy will, as the name implies, store a copy of the data in the usual places where data downloaded from the internet would reside.

In the past sort-of temporary repositories were added to apt with these, but given the features mentioned in APT for package self-builders they slowly approach the state of being considered also one of the strange kids mentioned previously.

Blast from the past: ftp and ssh

ftp is heavily on the decline for file transfer, and for apt the usage of http is actually much better as we can make use of many advanced HTTP features. Debian dropped its ftp mirrors and we might end up dropping the method some day… it has surely not seen any active development for years at least.

ssh (also known as rsh) never was that popular to begin with, but can still be used to access repositories on remote systems via SSH assuming you manage to configure it correctly. It could also use some love from active users…

Teaching an old dog new tricks: cdrom

cdrom is something many users will need for the initial installation, but after that… well, you can, though. It is also nowadays a misnomer as it can of course handle all the other rotating discs you can place in a drive like DVDs and Blu-rays – as well as the (usually) less round ISO file storage device: USB sticks. The method is so special that it comes with its own binary apt-cdrom to help users work with it. It would also really need some love though, so if you were looking for some way to contribute to Debian and love playing with CDs & USB sticks…

The usual suspect: http

In all likelihood that is what you are using on all your systems. So, there isn't much to say about it except that in buster it finally got its own manpage… have a look at apt-transport-http(1) someday (even translations are available).

That it is used so often doesn't mean people know or use all the features though: See e.g. auth.conf (granted, not that many repositories are password protected) and automatic proxy configuration. Tip: Have a look at the auto-apt-proxy package.

HTTP/2 isn't supported yet, and it might still be a while given there isn't a whole lot of point in it for the apt use case: we know which files we want to acquire and that they are static. But at some point perhaps… [Yours truly still vividly remembers being told by some proxy/mirror people at a conference years ago that pipelining is way too much state keeping to support and apt shouldn't use it just like browsers! These guys probably love HTTP/2…]

What most people don't realize is that this transport actually does more than wget, curl or even your web browser, like supporting SRV records – something your web browser doesn't and probably never will support. SRV records are what powers deb.debian.org, in case you wonder what the point is. Oh, and apropos point…

The pointless: apt-transport-https

SCNR. What the title actually refers to is that apt contains in the 1.6 series the https transport directly, so apt-transport-https is now an empty transitional package (aka: pointless to install).

Implied is of course something different: That HTTPS would be pointless for APT. It might or might not be, depending on your specific angle. There was a lot written about it, so feel free to read that if you must – e.g. Why does APT not use HTTPS. My point here is mainly that APT can if you want and it's easier than ever. deb.debian.org can be accessed via https if you are looking for a mirror.

It has also recently gained a manpage, apt-transport-https(1), which mentions the most interesting feature of HTTPS: client certificates – as access control via username and password is boring.

A small "gotcha" of sorts is that we have opted to forbid redirects from https to http, which breaks a lot more https sources than you would hope, but we decided that if you go for https, you probably don't want to compromise it all for an unsafe redirect. Other less specialized downloaders like wget or curl are less picky…
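The rule itself is tiny. As a sketch (not apt's actual code, which is C++ and handles far more cases), the downgrade check boils down to comparing the schemes of the current and the redirected URI. The function name `redirect_allowed` is made up, and stripping prefixes like `tor+` before comparing is my assumption for this illustration.

```python
from urllib.parse import urlsplit


def base_scheme(uri):
    """Return the last part of the scheme, so 'tor+https' counts as 'https'."""
    return urlsplit(uri).scheme.lower().rsplit("+", 1)[-1]


def redirect_allowed(source, target):
    """Refuse a redirect only if it downgrades from https to http."""
    return not (base_scheme(source) == "https" and base_scheme(target) == "http")


print(redirect_allowed("https://example.org/debian/", "http://mirror.example.org/debian/"))
print(redirect_allowed("http://example.org/debian/", "https://mirror.example.org/debian/"))
```

Upgrades from http to https and redirects that stay on the same scheme pass; only the downgrade is refused.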

Sidenote: https is nowadays implemented as a tiny layer over http. We used to use the curl library to implement a semi-independent https but over time that became really ugly. The redirect-downgrade mentioned in the previous paragraph was a colossal pain, redirections in general needed to be handled carefully, SRV support was not on the horizon and so on. That isn't to say that curl is bad – it is just not really compatible with the architecture we already have.

The tearjerker: apt-transport-tor

Pretty much every reason for using HTTPS is potentially better served by using Tor and thankfully it is super easy to use it: Just install the package and prepend tor+ to all URLs you have in your sources.list files. The README has details and also points to various onion-addresses you can use instead of boring normal domains (that hopefully explains also the tear-pun).
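If you have more than a handful of entries, the prepending can be scripted. The following is a minimal sketch which assumes classic one-line `deb`/`deb-src` entries (not the deb822-style `.sources` files) and only touches plain http(s) URIs; the function name `torify_line` is made up for this example.

```python
import re

# An http(s) URI that is not already prefixed by another "scheme+" such as tor+.
_PLAIN_HTTP = re.compile(r"(?<![\w+])(https?://)")


def torify_line(line):
    """Prepend tor+ to the URI of a one-line-style sources.list entry."""
    if not line.lstrip().startswith(("deb ", "deb-src ")):
        return line  # comments, blank lines and anything else stay untouched
    return _PLAIN_HTTP.sub(r"tor+\1", line, count=1)


for entry in (
    "deb http://deb.debian.org/debian stable main",
    "deb [arch=amd64] https://deb.debian.org/debian stable main",
    "# deb http://example.org/debian stable main",
):
    print(torify_line(entry))
```

Running it over a copy of your sources.list first and looking at the diff before replacing the real file seems like the sane way to use something like this.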

Sidenote: Implementation-wise Tor is just a SOCKS proxy, so all this method does is set some proxy configuration and then let http(s) do its job. We wouldn't really need an extra package for it, but it's easier for a user that way, and I would really like to make it even easier if we had some more contributions on the documentation and scripting front… (hint hint).

The magician: mirror

With my rewrite in 1.6 mirror became my personal favorite and it might be yours too at the end of this section. 🙂

This method doesn't implement a download protocol on its own. Instead it just acts as a manager instructing other methods to do stuff: it first downloads a file listing one or more mirrors and then distributes all requests it is asked to handle to a mirror from the list – potentially a different mirror for each request… so, it is a potentially local variant of the decommissioned httpredir service, but integrated into apt, resolving some (or all) problems it had.

Besides the obvious "I want apt to pull packages from 3 mirrors at the same time" use case, this also means apt can deal gracefully with partial mirrors as well as mirrors which are less frequently synced, without requiring a clever service keeping tabs on them (which was one of the reasons httpredir eventually died).

The manpage apt-transport-mirror uses this contrived advanced example:

file:/srv/local/debian/mirror/     priority:1 type:index
http://partial.example.org/mirror/ priority:2 arch:amd64 arch:all type:deb
http://ftp.us.debian.org/debian/   type:deb
http://ftp.de.debian.org/debian/   type:deb
https://deb.debian.org/debian/

That is just to show off, but should be enough reason for you to go read that manpage. Yours will likely be a lot simpler… mine just mentions some mirrors, is a local file and accessed via the slightly arcane tor+mirror+file transport which means: get the local mirror file and access all mirrors listed in there via Tor…
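The file format in the manpage example is simple enough to read with a few lines of code. Below is a minimal sketch that parses such a list and picks the mirrors eligible for a given file type. It is not how apt implements it: the names `parse_mirror_list` and `candidates` are made up, `priority:` and `arch:` are parsed but otherwise ignored, and treating an entry without `type:` as "serves everything" is my reading of the example rather than a quote from the manpage.

```python
MIRROR_LIST = """\
file:/srv/local/debian/mirror/     priority:1 type:index
http://partial.example.org/mirror/ priority:2 arch:amd64 arch:all type:deb
http://ftp.us.debian.org/debian/   type:deb
http://ftp.de.debian.org/debian/   type:deb
https://deb.debian.org/debian/
"""


def parse_mirror_list(text):
    """Turn each non-empty line into (uri, {key: [values, ...]})."""
    mirrors = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        uri, *tags = raw.split()
        meta = {}
        for tag in tags:
            key, _, value = tag.partition(":")
            # Keys like arch: may repeat, so collect values in a list.
            meta.setdefault(key, []).append(value)
        mirrors.append((uri, meta))
    return mirrors


def candidates(mirrors, file_type):
    """Mirrors that may serve file_type (assumption: no type: means all types)."""
    return [uri for uri, meta in mirrors
            if "type" not in meta or file_type in meta["type"]]


mirrors = parse_mirror_list(MIRROR_LIST)
print(candidates(mirrors, "index"))
print(candidates(mirrors, "deb"))
```

With the example list, index files can only come from the local file mirror or deb.debian.org, while packages can be spread over the other four entries.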

There is a lot you can do with that already, but there is certainly some more stuff missing or could be improved. Feel free to get in touch if you have ideas, with or without patch attached. 😉

Street vendors united: Third-party transports

All transports mentioned so far are either bundled with apt or maintained by the team, but with -s3 and -spacewalk there are at least two transports in Debian maintained by others and in the past there was also -debtorrent but that is no longer maintained. -tor started in this group here as well.

Sadly, some things which should be implemented as transports (if at all) aren't, like the dreaded never-ending stream of apt-fast implementations which usually ship with enormous security problems – but at least they are very fast at being insecure. So I can only encourage exploring the transport system if you think apt should learn to acquire files in a certain fashion or over a certain protocol.

Bonus: Using apt as wget/curl replacement

Okay, it might be a bit of an overstatement, but for a quick download you can call /usr/lib/apt/apt-helper download-file https://example.org /path/to/file and with an optional third parameter you can provide a hashsum for the file. The killer feature might be that you can use any transport here, so tor+http works and does the right thing: That tends to be harder to do with wget/curl. As a bonus this will drop privileges and might even use seccomp, but security might be the topic of another day…
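If you want to call the helper from a script, a thin wrapper is all it takes. This is a minimal sketch: the function names are made up, and the `SHA256:<hex digest>` form mentioned in the comment for the optional hash argument is an assumption on my part, so check what your apt version accepts before relying on it.

```python
import subprocess

APT_HELPER = "/usr/lib/apt/apt-helper"  # path as given in the text above


def download_command(uri, target, hashsum=None):
    """Build the argument list for 'apt-helper download-file'."""
    command = [APT_HELPER, "download-file", uri, target]
    if hashsum is not None:
        # Optional third parameter; the "SHA256:<hex>" format is an assumption.
        command.append(hashsum)
    return command


def download(uri, target, hashsum=None):
    """Run the download and raise CalledProcessError if apt-helper fails."""
    subprocess.run(download_command(uri, target, hashsum), check=True)


print(download_command("tor+http://example.org/file", "/tmp/file"))
```

Passing the arguments as a list avoids shell quoting problems with odd URLs, and any transport prefix such as tor+http is simply part of the URI.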

tl;dr

I will leave you now to reconfigure your sources; mirror in particular hopefully gave you some ideas. See you next time! 🙂

P.S.: As seems usual by now, this post was basically done months ago, but I didn't want to post it while people were arguing over whether curl or apt has more CVEs in its implementation, as that was rather silly to watch, and also not while some people were asking for my head for suspected intentional CVE code injection. But after that nonsense cooled off I had forgotten that this was never published… well, better late than never.