
Shadow on a sparse null diet

Tomato wrapped with a tape measure
Image by Myriams-Fotos on pixabay

Summer is slowly ending (at least in my hemisphere), so dieting tips to slim down in time for the beach are a bit out of season at the moment, but who doesn't like a guy who suggests that everyone's shadow is too fat and implements a \0 diet anyhow? Thankfully that isn't only surprisingly easy, but also surprisingly impactful:

This is the story of how I ended up potentially saving terabytes of disk space across the world by changing two lines of code.

So, what happened?

Some background first: If you want to create a new user on Debian, you tend to invoke adduser, which is, simply put, a wrapper around useradd. This tool is part of shadow-utils, commonly packaged as just shadow, as it deals with the /etc/passwd and /etc/shadow files. It also deals with two log files: /var/log/lastlog and /var/log/faillog.

Those logs store what you might have guessed already from their names: (among other things) the time a user last logged in and whether login attempts failed. They do so in a binary format, and not by appending to the file: each user (based on their UID) has a predefined place to record this information.
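To make the "predefined place" idea concrete, here is a minimal sketch of how a record could be located and read. It assumes a 292-byte record (a 32-bit timestamp, a 32-byte terminal name and a 256-byte host name), which is an assumption about the usual Linux layout and not something taken from the shadow source; the function names are made up for illustration.

```python
import struct

# Assumed record layout: 32-bit time, 32-byte line, 256-byte host (292 bytes).
LASTLOG_RECORD = struct.Struct("=i32s256s")


def lastlog_offset(uid):
    """Byte offset of the record belonging to `uid` (hypothetical helper)."""
    return uid * LASTLOG_RECORD.size


def read_lastlog_entry(f, uid):
    """Read the record for `uid` from a binary file object, or None if the file is too short."""
    f.seek(lastlog_offset(uid))
    raw = f.read(LASTLOG_RECORD.size)
    if len(raw) < LASTLOG_RECORD.size:
        return None  # nothing was ever recorded this far into the file
    when, line, host = LASTLOG_RECORD.unpack(raw)
    return when, line.rstrip(b"\0").decode(), host.rstrip(b"\0").decode()
```

With this layout a record for UID 100 ends at byte 29492, so a reader never has to scan the file: it seeks straight to `uid * 292`.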

So far so good (or bad, depending on how good your spider-sense is). Now, users can not only be added, but also deleted, and hence their UID can be recycled. So adding a user has to ensure that old log data of a since-deleted user isn't reused for the new user with that UID. The data hence needs to be reset – which in binary and in this format means overwritten with zeros.

My initial problem with this was on a personal obsession-level: If you bootstrap a Debian system, /var/log/lastlog (and /var/log/faillog) is empty before apt's postinst runs, and after that they are (sparse) files of size 29492 (and 3232) bytes containing nothing but zeros. Seemed like a totally pointless waste of resources to me (and something I can't reasonably explain and certainly don't want to emulate in apt's postinst script).

Okay, sparse means they are full of holes (yes, that is a technical term, see the lseek(2) manpage for details), meaning most of those zeros are implicitly there but don't take up any space in reality. Or at least in some realities, but we will come back to that. Let's just assume for now that I would prefer those all-zero files, sparse or not, to not be created (or, well, filled in, as they are already created, just empty).
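If you want to see a hole for yourself, a small sketch like the following creates a file by seeking past the end before writing and then compares its apparent size with what is actually allocated. The helper names are made up for illustration, and whether the allocated size really ends up smaller depends on the filesystem.

```python
import os


def make_sparse_file(path, size):
    """Create a file of `size` bytes by writing a single zero byte at the very end."""
    with open(path, "wb") as f:
        f.seek(size - 1)  # everything before this position is never written
        f.write(b"\0")


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for `path`."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512
```

Reading such a file back still yields plain zeros, which is exactly why unsuspecting tools cannot tell a hole from real data.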

If we called useradd directly, we could use the --no-log-init flag, which is documented to avoid resetting the databases. But Debian uses adduser, which doesn't have such a flag, and what would happen if the files weren't empty but in fact a user with that UID existed previously and was deleted so that we indeed need to reset… – wait: Didn't I say the files were empty before useradd was run in my use-case? Empty files surely have no data of previously deleted users, so why should a reset be performed? Turns out the reset is performed as long as the files exist, regardless of their size.

So, how about changing the access() call to a stat() call and not performing the reset if the file is smaller than the offset we want to reset? Merged upstream after a short discussion within a single week. Now patiently waiting for an upstream release and/or Debian to package it (an MR to fast-track the patch into Debian was already merged on salsa).
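The idea can be sketched in a few lines. This is not the actual C patch, just an illustration of the logic under the assumption of a 292-byte record; the function names and the exact handling of the boundary case (file size equal to the offset) are my own choices here.

```python
import os

RECORD_SIZE = 292  # assumed size of one lastlog record


def needs_reset(path, uid, record_size=RECORD_SIZE):
    """True only if the file exists and is large enough to hold old data for `uid`."""
    try:
        size = os.stat(path).st_size
    except FileNotFoundError:
        return False
    # A file ending at or before the record's offset cannot contain stale data.
    return size > uid * record_size


def reset_entry(path, uid, record_size=RECORD_SIZE):
    """Overwrite the record of `uid` with zeros if needed; return whether anything was written."""
    if not needs_reset(path, uid, record_size):
        return False
    with open(path, "r+b") as f:
        f.seek(uid * record_size)
        f.write(b"\0" * record_size)
    return True
```

An existence-only check would answer "yes" for an empty file and then grow it up to the record's offset; the size check leaves it alone.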

And there you have it, the files remain empty after apt's postinst runs, waiting to be filled by actual data at the time someone actually performs a login (probably not as the _apt user, but that doesn't matter). I am happy and the rest of the world doesn't care…

Or perhaps they do?

The UID apt usually gets assigned is 100. That is pretty tiny. The libvirt-daemon-system package, for example, creates a user with the fixed UID 64055. In other words: On install your lastlog file grows to 18 MB even if you probably never will log in to that user and likely never had a user with that UID before. Okay, pretty much all of those 18 MB are one big hole, but with a popcon of 12694 (6.64%) that means a lower bound of ~230 GB of holes on Debian machines alone. Still a tiny UID though; some upstreams want to avoid stepping on individual distributions' toes and pick UIDs well past 1000000000…
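The numbers above can be reproduced with a bit of arithmetic. The sketch below assumes a 292-byte lastlog record, which is consistent with the 29492 bytes observed for UID 100; the function name is made up for illustration.

```python
RECORD_SIZE = 292  # assumed size of one lastlog record in bytes


def lastlog_size_after_reset(uid, record_size=RECORD_SIZE):
    """Apparent file size once the record for `uid` has been zeroed."""
    return (uid + 1) * record_size


apt_size = lastlog_size_after_reset(100)
libvirt_size = lastlog_size_after_reset(64055)
popcon_installs = 12694
total_holes = popcon_installs * libvirt_size

print(apt_size)      # 29492
print(libvirt_size)  # 18704352
print(round(total_holes / 10**9), "GB")
```

Under the same assumption, a UID just past 1000000000 would give the file an apparent size of roughly 292 GB.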

It is just a hole and nobody cares about the size of a hole, right?!

Just ship it in a container

Some people seem to disagree on that. So much so that the documentation of Docker calls using --no-log-init a best practice, mentioning both that Debian's adduser doesn't have that flag and that an unresolved Go bug from 2015 is ultimately to blame that Docker images blow out of proportion if you deal with big holes in lastlog.

Sparse files and their holes depend on support from the operating system and the underlying storage (all the most interesting filesystems like btrfs, ext4 and tmpfs have supported them since at least Linux kernel 3.8), but as sparse files by design behave like normal files to unsuspecting applications, your choice of file copy/transfer, backup and tarball creation tools can result in a sudden decompression of the holes.
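To illustrate why the tool matters: a naive copy reads the zeros and writes them all out again, while a sparse-aware copy can skip them. The following is only a rough sketch of the latter (real tools use smarter techniques such as asking the kernel where the holes are); `copy_sparse` is a made-up name and the chunk size is arbitrary.

```python
import os


def copy_sparse(src, dst, chunk=4096):
    """Copy `src` to `dst`, seeking over all-zero chunks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if not any(buf):
                # Only zeros: move the write position forward and leave a hole.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # Make sure a trailing hole still counts towards the final size.
        fout.truncate(fin.tell())
```

The copy has the same content and apparent size as the source; whether the skipped regions really stay unallocated is up to the filesystem.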

Go's tar implementation is, for example, far from the only one missing support for sparse files. It is even ahead of the curve by supporting them while reading. So, if your docker/podman/… images, tarballs or your off-site backups turn out to be a bit (or a lot, depending on the UIDs you set up in them) smaller than before even if you didn't follow the best practice to opt out of the reset: You are welcome.

And the moral of the story? Don't document workarounds as best practice.

Thanks to Johannes Schauer Marin Rodrigues for apt!254 triggering my obsession and later pointing out Docker as a potential beneficiary of my shadow!558 change, and of course the shadow upstream maintainers for entertaining my tiny contribution.

Sidenote: This of course changes nothing about the problem inherent to lastlog of potentially growing absurdly and spontaneously if you happen to log in to a user with a high UID. It just delays this potential failure mode on the assumption that such a login attempt will not happen in practice – or that you at least do not place the resulting system back into a tarball. Workarounds like --no-log-init, LASTLOG_UID_MAX or systemd-sysusers bank on that as well: They all have the latent problem of a yo-yo diet…

Sidenote 2: useradd could be made more clever by e.g. not writing zeros into an existing hole, trimming the file on resets and so on. That doesn't resolve the underlying problem of a fixed, unchangeable binary format though. Alternatives exist – they are even on your system already: Hello utmp and co, see last (although, if you ask me, that should have been an alias for tail -n1 – just saying). systemd in the meantime has a TODO entry to fold them all into the journal and doesn't care about lastlog otherwise.

Winning the Google Open Source Lottery

lottery ticket with some boxes ticked and a few euros alongside it
Image by jackmac34 on pixabay

I don't know about you, but I frequently get mails announcing that I was picked as the lucky winner of a lottery, compensation program or simply as "business associate". Obvious spam of course; that never happens in reality. Just like my personal "favorite" at the moment: mails notifying me of an inheritance from a previously (more or less) unknown relative. It's just that this is basically what happened to me in reality a few weeks ago (over the phone though) – and I am still dealing with the bureaucracy required to teach everyone that I had absolutely no contact in the last two decades with the person for whom I am supposed to be one of the legal successors now, regardless of how close the family relation is on paper… but that might be the topic of another day.

On the 1st of March a mail titled "Google Open Source Peer Bonus Program" looked at first as if it would fall into this lottery spam class. It didn't exactly help that the mail was multipart HTML and text, but the text part really contained only the text, not mentioning the embedded links used in the HTML part. It even included a prominent and obvious red flag: "Please fill out the form". The 20% Bayes score didn't come from nothing. Still, for better or worse the words "Open Source" made it unlikely to be spam, similar to how the word PGP indicates authenticity. So it happened, another spam message became true for me. I wonder which one will be next…

You have probably figured out by now that I didn't know that program before. Kinda embarrassing for a previous Google Summer of Code student (GSoC is run by the same office), but the idea behind it is simple: Google employees can nominate contributors to open source stuff for a small monetary "thank you!" gift card. Earlier this week the winners for this round were announced – 52 contributors including yours truly. You might be surprised, but the rationale given behind my name is APT (I got a private mail with the full rationale from my "patron", just in case you wonder if at least I would know more).

It is funny how a guy who was taken aback by the prospect of needing a package manager like YaST to use Linux contributed his first patch to apt just months later and has, roughly 8 years later, amassed more than 2400 commits. It's birthday season in my family, with e.g. mine just a few days ago, so it seems natural that apt has its own birthday today, just as if it were part of my family: 19 years old this little bundle of bugs… er, joy is now! In more sober moments I sometimes wonder how apt and I would have turned out if we hadn't met. Would apt have met someone else? Would I? Given that I am still the newest team member and only recently joined Debian as DD at all…

APT has some strange ways of showing that it loves you: It e.g. helps users compose mails which end in a dilemma, to give a recent example. Perhaps you need to be a special kind of crazy [1] to consider this good, but as I see it apt has a big enough userbase that regardless of what your patch is doing, someone will like it. That drastically increases the chances that someone will also like it enough to say so in public – offsetting the omnipresent complaints from all those who don't like the (effects of the) patch. And twice in a blue moon some of those will even step forward and thank you explicitly. Not that it would be necessary, but it is nice anyhow. So, thanks for the love supercow, Google & apt users! 🙂

Or in other words: APT might very well be one of the most friendly (package manager related) projects to contribute to, as the language-specific managers have smaller userbases and hence a smaller chance of having someone liking your work (in public)… so contribute a patch or two and be loved, too! 💖

Disclaimer: I get no bonus for posting this nor are any other strings attached. Birthdays are just a good time to reflect. In terms of what I do with my newfound riches (in case I really receive them – I haven't yet, so that could still be an elaborate scam…): APT is a very humble program, but even it is thinking about moving away from a dev-box with less than 4 GB of RAM and no SSD, so it is happily accepting the gift and expects me to upgrade sooner now. What kind of precedent does this set for the two decades milestone next year? If APT isn't obsolete by then… We will see.


  1. which even ended up topping Hacker News around New Year's Eve… who would have thought that apt and reproducibility bugs are top news ;) 

the new apt-transport-tor

sliced red onions
Image by ulleo on pixabay

It happened: Now that I have been an uploading DD for a few months, I finally made my first upload of a package – mind you, not of apt, but of a package I declared my intent to "steal" from another person a few weeks ago on deity@ and later also in a bug report (#835128).

The result is that apt-transport-tor which used to be maintained by Tim Retout as a modified copy of apt code is now maintained by the APT team (with him and me as uploaders) using the apt code directly via a few symlinks.

That brings along a bunch of changes which I mentioned in the list/bug as well, but for completeness:

  • tor+https options consistently fall back to tor -> https -> http
  • tor+http options consistently fall back to tor -> http
  • socks5h isn't forced. It is just the default (and the only one which will work with (tor+)http at the moment; any with tor+https)
  • a tor-proxy having apt-transport-tor as username & no password (default) will automatically pick a password based on the target host to get you in a new circuit for each host.
  • the User-Agent isn't forced to an all-tor-users-have-the-same value. Especially with tor+http being our normal http I think it's better to "hide" among other http users than saying straight that you are a tor user (even if the IP gives it away that you are).
  • tor+https doesn't allow redirection to tor+http. We have had this for a while for https -> http already (-tor "broke" it). I think if a user went as far as configuring a https source it should stay an https source or fail.
  • http/https can be disabled to avoid accidentally adding such sources
  • http will not try to connect to .onion domains (RFC7687) and the error hints at using tor+http
  • the methods run as _apt instead of root (like the rest of the apt methods)
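The fallback order from the first two points can be pictured as a simple lookup chain. The sketch below is purely illustrative and not how apt implements it: the function names, the option name `Proxy` and the dictionary-based configuration are all made up for the example.

```python
def option_lookup_order(scheme):
    """Sections consulted for an option, most specific first (hypothetical helper)."""
    if scheme == "tor+https":
        return ["tor", "https", "http"]
    if scheme == "tor+http":
        return ["tor", "http"]
    raise ValueError(f"unsupported scheme: {scheme}")


def lookup_option(config, scheme, name):
    """Return the first value found for `name` along the fallback chain, or None."""
    for section in option_lookup_order(scheme):
        value = config.get(section, {}).get(name)
        if value is not None:
            return value
    return None


# Made-up configuration: only the http section sets a value.
config = {"http": {"Proxy": "socks5h://localhost:9050"}}
print(lookup_option(config, "tor+https", "Proxy"))  # socks5h://localhost:9050
```

A value set in a more specific section wins; anything left unset there falls through to the next section in the chain.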

I had tried a few times to get people to provide feedback, but there wasn't much. I guess this is good as it means nobody has any complaints about it. We will see if that will change now that it is on its way to the archive, buildds, mirrors and users: Brace for impact in any case!