APT for downloaders
Remember what I said last time? I started with "One of the main jobs of a package manager like apt is to download packages". So let us talk about downloading a bit more this time.
APT doesn't hardcode certain protocols like HTTP for downloading. Instead it uses external binaries it calls "transports", talking to them via a self-defined text protocol similar in style to HTTP.
APT comes with a large set of more or less common protocols by default, but can in this way also be extended to support other protocols… let's look at the protocols available for APT:
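If you are curious which transports your system has, they are ordinary executables living in apt's method directory – a minimal sketch, assuming the usual Debian location:
# list the transport binaries apt can call
ls /usr/lib/apt/methods
On a current Debian system this prints names like cdrom, copy, file, ftp, gpgv, http, https, mirror, rred, rsh, ssh and store – matching the protocols discussed below.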
The strange kids on the block: gpgv, rred & store
Methods like gpgv, rred and store usually make only a short appearance in the progress indication of an update call and are hardly usable by anyone directly, but they exist as standalone methods so that (potentially many of them) can run in parallel and make use of features like switching to a non-root user for their execution.
There is no cloud: file and copy
Feeling limited by the bandwidth cap of your internet connection? No problem, apt doesn't need the internet – at least if you have a local mirror, or more generally a repository which resides on a local hard drive or is e.g. mounted via NFS as a network share.
file will try to reuse the data in the location it currently resides in – which can lead to apt seeing changes in the repository without an update call – while copy will, as the name implies, store a copy of the data in the usual places where data downloaded from the internet would reside.
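As a sketch, sources.list entries for such a local repository could look like this (path and suite are of course placeholders):
# reuse the repository data in place
deb file:/srv/mirror/debian buster main
# or: copy it into apt's usual cache locations
deb copy:/srv/mirror/debian buster main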
In the past, sort-of temporary repositories were added to apt with these, but given the features mentioned in APT for package self-builders they are slowly approaching the state of being considered strange kids as well.
Blast from the past: ftp and ssh
ftp is heavily on the decline for file transfer, and for apt the usage of http is actually much better as we can make use of many advanced HTTP features. Debian dropped its ftp mirrors and we might end up dropping the method some day… it has certainly not seen any active development for years.
ssh (also known as rsh) was never that popular to begin with, but can still be used to access repositories on remote systems via SSH, assuming you manage to configure it correctly. It could also use some love from active users…
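For the adventurous, a sources.list line for the ssh method might look like this sketch (user, host and path are placeholders):
deb ssh://mirroruser@repo.example.org/srv/debian buster main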
Teaching an old dog new tricks: cdrom
cdrom is something many users will need for the initial installation, but after that… you are usually through with it. It is also nowadays a misnomer, as it can of course handle all the other rotating discs you can place in a drive, like DVDs and Blu-rays – as well as the (usually) less round ISO file storage device: USB sticks. The method is so special that it comes with its own binary apt-cdrom to help users work with it. It could also really use some love though, so if you were looking for a way to contribute to Debian and love playing with CDs & USB sticks…
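For reference, the usual workflow is a single command – insert the disc (or stick) and run:
# scan the medium and add it to your sources
apt-cdrom add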
The usual suspect: http
In all likelihood that is what you are using on all your systems. So, there isn't much to say about it except that in buster it finally got its own manpage… have a look at apt-transport-http(1) someday (even translations are available).
That it is used so often doesn't mean people know or use all its features though: see e.g. auth.conf (granted, not that many repositories are password protected) and automatic proxy configuration. Tip: Have a look at the auto-apt-proxy package.
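As a sketch of the former, credentials go into /etc/apt/auth.conf in netrc style (host, login and password are placeholders):
machine secure.example.org login myuser password itsasecret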
HTTP2 isn't supported yet, and that might still be a while given that there isn't a whole lot of point in it for the apt usecase – we know which files we want to acquire and that they are static – but at some point, perhaps… [Yours truly still vividly remembers being told by some proxy/mirror people at a conference years ago that pipelining is way too much state keeping to support and that apt shouldn't use it, just like browsers! These guys probably love HTTP2…]
What most people don't realize is that this transport actually does more than wget, curl or even your webbrowser, like having support for SRV records – something your webbrowser doesn't and probably never will support. SRV records are what powers deb.debian.org, in case you wonder what's the point.
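You can peek at these records yourself; a quick sketch with the standard dig tool:
# the SRV records apt's http method consults for deb.debian.org
dig +short _http._tcp.deb.debian.org SRV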
Oh, and apropos point…
The pointless: apt-transport-https
SCNR.
What the title actually refers to is that since the 1.6 series apt contains the https transport directly, so apt-transport-https is now an empty transitional package (aka: pointless to install).
Implied is of course something different: that HTTPS would be pointless for APT. It might or might not be, depending on your specific angle. A lot has been written about it, so feel free to read that if you must – e.g. Why does APT not use HTTPS. My point here is mainly that APT can if you want to, and it's easier than ever. deb.debian.org can be accessed via https if you are looking for a mirror.
Since recently it also comes with a manpage, apt-transport-https(1), which also mentions the most interesting feature of HTTPS: client certificates – as access control via username and password is boring.
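A sketch of such a setup in apt configuration, using the options from that manpage (hostname and file paths are placeholders):
// e.g. in /etc/apt/apt.conf.d/42client-cert
Acquire::https::repo.example.org::SSLCert "/etc/apt/tls/client.crt";
Acquire::https::repo.example.org::SSLKey "/etc/apt/tls/client.key";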
A small "gotcha" of sorts is that we have opted to forbid redirects from https to http, which breaks a lot more https sources than you would hope, but we decided that if you go for https, you probably don't want to compromise it all for an unsafe redirect. Other less specialized downloaders like wget or curl are less picky…
Sidenote: https is nowadays implemented as a tiny layer over http. We used to use the curl library to implement a semi-independent https, but over time that became really ugly: the redirect downgrade mentioned in the previous paragraph was a colossal pain, redirections in general needed to be handled carefully, SRV support was not on the horizon, and so on. That isn't to say that curl is bad – it is just not really compatible with the architecture we already have.
The tearjerker: apt-transport-tor
Pretty much every reason for using HTTPS is potentially better served by using Tor, and thankfully it is super easy to use: just install the package and prepend tor+ to all URLs you have in your sources.list files. The README has details and also points to various onion addresses you can use instead of boring normal domains (which hopefully also explains the tear-pun).
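As a sketch, the change really is just the prefix (suite and components are placeholders):
# plain https…
deb https://deb.debian.org/debian buster main
# …and the same repository routed through Tor
deb tor+https://deb.debian.org/debian buster main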
Sidenote: Implementation-wise Tor is just a SOCKS proxy, so all this method does is set some proxy configuration and then let http(s) do its job, meaning we wouldn't really need an extra package for it – but it's easier for a user that way, and I would really like to make it even easier if we had some more contributions on the documentation and scripting front… (hint hint).
The magician: mirror
With my rewrite in 1.6, mirror became my personal favorite, and it might be yours too by the end of this section. 🙂
This method doesn't implement a download protocol on its own; instead it acts as a manager instructing other methods to do the work: it first downloads a file listing one or more mirrors and then distributes all requests it is asked to handle among the mirrors from that list – potentially a different mirror for each request. So it is a potentially local variant of the decommissioned httpredir service, but integrated into apt and resolving some (or all) of the problems httpredir had.
Beside the obvious "I want apt to pull packages from 3 mirrors at the same time" usecase, it can also deal gracefully with partial mirrors as well as mirrors which are synced less frequently, without requiring a clever service keeping tabs on them (which was one of the reasons httpredir eventually died).
The manpage apt-transport-mirror(1) uses this contrived advanced example:
file:/srv/local/debian/mirror/ priority:1 type:index
http://partial.example.org/mirror/ priority:2 arch:amd64 arch:all type:deb
http://ftp.us.debian.org/debian/ type:deb
http://ftp.de.debian.org/debian/ type:deb
https://deb.debian.org/debian/
That is just to show off, but it should be reason enough for you to go read that manpage. Yours will likely be a lot simpler… mine just mentions some mirrors, is a local file and is accessed via the slightly arcane tor+mirror+file transport, which means: get the local mirror file and access all mirrors listed in there via Tor…
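Spelled out as a sketch in sources.list terms (the mirror-file path is a placeholder):
# fetch the local mirror list, then use the mirrors in it via Tor
deb tor+mirror+file:/etc/apt/mirrorlist.txt buster main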
There is a lot you can do with that already, but there is certainly some stuff still missing or that could be improved. Feel free to get in touch if you have ideas, with or without a patch attached. 😉
Street vendors united: Third-party transports
All transports mentioned so far are either bundled with apt or maintained by the team, but with -s3 and -spacewalk there are at least two transports in Debian maintained by others, and in the past there was also -debtorrent, but that is no longer maintained. -tor started out in this group as well.
Sadly, some things which should be implemented as transports (if at all) aren't – like the dreaded never-ending stream of apt-fast implementations, which usually ship with enormous security problems (but at least they are very fast at being insecure) – so I can only encourage exploring the transport system if you think apt should learn to acquire files in a certain fashion or over a certain protocol.
Bonus: Using apt as wget/curl replacement
Okay, it might be a bit of an overstatement, but for a quick download you can call
/usr/lib/apt/apt-helper download-file https://example.org /path/to/file
and with an optional third parameter you can provide a hashsum for the file. The killer feature might be that you can use any transport here, so tor+http works and does the right thing – that tends to be harder to do with wget/curl. As a bonus this will drop privileges and might even use seccomp, but security might be the topic of another day…
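For example, a sketch with the optional hashsum check (URL and hash value are placeholders):
/usr/lib/apt/apt-helper download-file tor+https://example.org/file.txt /tmp/file.txt SHA256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
apt-helper will refuse the downloaded data if it does not match the given hash.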
tl;dr
I will leave you now to reconfiguring your sources – hopefully especially mirror gave you some ideas. See you next time! 🙂
P.S.: As seems usual by now, this post was basically done months ago, but I didn't want to post it while people were arguing whether curl or apt has more CVEs in its implementation, as that was rather silly to watch, and also not at the time some people were asking for my head for suspected intentional CVE code injection. But after that nonsense cooled off I had forgotten that this was never published… well, better late than never.