GSoC 2016: Summary

Intro, Outro and tl;dr

I participated again as a student in this year's edition of the Google Summer of Code with Debian on the project APT↔dpkg communication rework. My initial proposal on the wiki introduces me and details the plan, while this post serves as a link hub and explanation of what I did. You can also find personal week-to-week posts starting with Day 0 right here on this blog.

The code for this project was already merged and uploaded to Debian unstable multiple times over the course of the summer, so everything described below can be experienced directly. The official GSoC 2016 final version is 1.3~rc2, but APT always moves forward and I have no intention of leaving it alone, so this tag just marks the end of the GSoC 2016 period and my return to "normal" contribution levels.

On the social front I finally applied for and shortly afterwards received "Debian Developer, uploading" status. This is also the moment I want to thank Michael Vogt (mentor, apt), Julian Andres Klode (apt), Manuel A. Fernandez Montecelo (aptitude), Guillem Jover (dpkg), Enrico Zini (AM), the Debian Outreach team and the countless people I should have mentioned here, too, who have all helped me in many ways over the course of this GSoC and my entire Debian journey up to this point.

It was an overall great experience to work again on something as important as APT in Debian on a full-time basis. After two (very different) GSoCs in 2010 and now 2016 I can wholeheartedly recommend that any student with a passion for open source apply next year. Perhaps in Debian and maybe in an APT project? We are waiting for YOU!

Statistics

My first commit as part of GSoC was made on 25 April, titled edsp: ask policy engine for the pin of the version directly (minor bugfix); the last commit I will be counting was made on 17 August, titled methods: read config in most to least specific order (regression fix). Not all of them are directly related to the GSoC project itself (the first is); some merely fall into the timeframe (like the last) and were handled as part of general emergencies or for similar reasons described later and/or in the weekly reports. This timeframe of 115 days saw a total of 222 commits authored by me plus 9 commits committed by me for others (translations, patches, …). The timeframe saw 336 commits as a whole, making me responsible for a bit shy of ⅔ of all APT commits in this timeframe, with an average of nearly 2 commits each day. A diffstat run over my commits says "322 files changed, 11171 insertions(+), 5847 deletions(-)" consisting of code, documentation and tests (this doesn't include automatic churn like regeneration of po and pot files, which would dilute the global statistics). As a special mention, our tests alone changed by "109 files changed, 2759 insertions(+), 1063 deletions(-)". In my weekly reports here on this blog I used ~10574 words (not including this post), another ~23555 words in the IRC channel #debian-apt and sometimes very long mails to deity@ and bugreports (~100 mails) [not counting private chit-chat with my mentor via IRC/mail].

APT External Installation Planner Protocol (EIPP)

The meat of the GSoC project was the ability to let libapt talk to (external) executables (called planners) which are tasked with creating a plan for the installation (and removal) of packages from the system in the order required by their various dependency relations, similar to how libapt can talk to external dependency solvers like aspcud via EDSP. The protocol (current, archive) details how apt and a planner communicate. APT already ships such an external planner in the form of 'apt', which "just" uses the always-existing internal planner implementation but reads and talks proper EIPP. The major benefit here is testing, as it is now possible to generate an EIPP request, feed it to different versions and compare the results to find regressions and similar. It also helps with bugreports, as such a request is now auto-generated and logged so that it can be easily attached to a bugreport and a triager can use that file to reproduce the problem. Previously, recreating the system state a user had before the failed upgrade was a very involved, error-prone and time-consuming task (actually fixing the problem still is, but at least the first step got a lot easier).
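
To give a rough idea of what travels between apt and a planner, here is a heavily trimmed sketch of such an exchange. Treat the field names and values as illustration only – the protocol document linked above is authoritative and a real request carries far more information:

    # request (apt → planner)
    Request: EIPP 0.1
    Architecture: amd64

    Package: foo
    Architecture: amd64
    Version: 2.0-1
    APT-ID: 42
    Install: yes

    # … one stanza like the above per package apt knows about …

    # answer (planner → apt)
    Unpack: 42
    Configure: 42

The planner reads the request on stdin and answers on stdout with actions referencing packages by their APT-ID, which apt then turns into actual dpkg calls.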

APT's good old planner implementation also saw the activation (and fixing) of many previously experimental options intended to optimize the process, which had been blocked by the items in the next section – it almost looks like a new planner now. Entirely new planners exist as prototypes, but they aren't suitable for real use yet due to not handling "edgecases" and being affected by bugs in dpkg. Summary: everyone can create and work on their own planner in the programming language of their choice and run it against real-world cases directly, opening a competition space for the invention of future improvements.
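
If you want to try that at home: apt logs the request it generated for the last run, and replaying it against the shipped planner boils down to something like the following (paths are the defaults as far as I remember, adjust to taste):

    # decompress the logged request and feed it to a planner on stdin
    xz -dc /var/log/apt/eipp.log.xz | /usr/lib/apt/planners/apt

The same request can be fed to another planner binary (or another version of the same one) and the answers compared.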

APT↔dpkg communication

The other major building block and donor of much of the project's name. Assuming a planner has figured out a plan, there is still much left to do which is of no concern for each planner but handled centrally in libapt: the actual calling of dpkg and interpreting its replies. That sounds easy enough, but if you imagine thousands of packages needing to be installed/configured at once you fear hitting something as barebones as the kernel's maximum allowed commandline length. That happened once in a while in the past, so finding better solutions to that problem within easy reach (as in: already existing in dpkg; new interfaces for possible future use are a different matter) was in order. Other problems included the overuse of --force options, not communicating purge/removal intentions to dpkg, insufficient crossgrade handling and losing user configuration on conffile moves involving packages to be purged, just to name a few. But it also meant listening to dpkg in terms of how it processes triggers and how all this should be reported in the form of progress reports to the user, especially if some steps aren't explicitly planned anymore by a planner but left to dpkg to do at some point.
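
As a heavily simplified sketch of the direction this moved in (real invocations carry many more options; this is just to illustrate the idea of leaning on dpkg's --pending support instead of spelling everything out):

    # before: every step enumerated package by package in (very) long calls
    dpkg --unpack a.deb b.deb … z.deb
    dpkg --configure a b … z

    # after: apt plans the tricky parts explicitly and lets dpkg mop up the rest
    dpkg --unpack --no-triggers a.deb b.deb …
    dpkg --configure --pending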

The result is mostly invisible to the user, except that it should all be slightly faster now as e.g. triggers are run less often, and most "strange" errors are a thing of the past.

Side projects, emergency calls and random bugfixes

Not actually on my plan for GSoC, and at best only marginally related if at all, I ended up working on these to deal with important bugs on an "as long as we have a full-time developer" basis.

This includes hunting for strange errors if rred is involved in updating indexes, further preparing for a binary-all future, fixing SRV support, being the master of time, improving security by allowing it to be sidestepped sometimes, improving security by potentially breaking backward compatibility a bit, stumbling into libstdc++6 bugs, and implementing SOCKS5 proxy support and a generic config fallback for acquire methods to be able to propose the mergeback of apt-transport-tor, among very many other things.

A complete list can be found via the previously shared git-branch browsing, starting at my first commit in the GSoC timeframe (see also the statistics above).

Future

I would love to keep working on APT full-time, but that seems rather unrealistic, and all good things need to come to an end I guess, so the next weeks will have me adjust to a more "normal" contribution level of working on it "just" in my (extended) free time again. I will also be casually "shopping" for a new money source in the form of a small job while returning to university, which hasn't seen a lot of me in the last few months, and picking up some community work I had delayed for after GSoC. That means I will surely not keep my daily commit average up, but my journey here is far from over:

After many years in APT and Debian there is still something new to explore each week as both are evolving continuously – but most of it is hidden in plain sight and unnoticed by the general public: around the start of GSoC I was talking on #gsoc with an admin of another org who couldn't imagine that Debian participated at all, as all projects Debian could offer would be bite-sized in nature: it is just a distribution, right, not a real org producing value (slightly exaggerated for drama). I would like to concur with this view of course for obvious reasons.

My life would be massively different if I hadn't started to contribute to Debian and APT in particular ~7 years ago – even though I thought I wouldn't be "good enough" back then. I think it's fair to say that I showed my past self that in fact I am. Now it is your turn!

Week 16: Constant Optimization

This week saw the release of 1.3~rc1, sporting my tor changes disguised as general acquire-method changes I mentioned last week, as well as the revamp of apt talking to dpkg (and back) I worked on the last couple of weeks as part of GSoC. It doesn't include any new planner, a stub is still lying in a wip branch, but our good old planner looks and behaves slightly different, so it feels like a new one – and surprising as it is: so far no bugreport related to this. Probably all user systems instantly caught fire! 🙂︎

So, the week was further used to get used to cmake, to build and run apt on various porterboxes to fix testcase failures, to fix other incoming issues and especially to pull some hair out while debugging the bug of the week, which lends this blogpost its title: a one-word fix for an issue manifesting itself only at optimization level -O3 on ppc64el. Optimizations are evil…

Besides wasting quite some time this week as well as in previous years, it also closes a loop: I introduced this problem myself while being a GSoC student… in 2010. Time really flies. And I have no idea what I was thinking either… I could be describing more of these "tiny" bugs, but the commit messages tend to do a reasonable job, and if you are really that damn interested: feel free to ask. 🙂︎

The next week will be the last official one in GSoC from a student's POV, as I am supposed to clean up all bases & submit my work for the final evaluation – this submission will be a blogpost describing & linking to everything, which means miles long and relatively soon, so I have purposefully kept this one very short so you will have enough energy to bear with me for the next one.

Week 15: Onion ordering

The week started badly: I have had a dead 'e' key on my keyboard for a long while now, but I didn't care that much… I just remapped CAPSLOCK, retrained my fingers and was done with it [I have some fucked up typing style anyhow]. All good, but this week additional keys started to give up. You have no idea how annoying it is to not be able to use the arrow keys. Many things I work with have at least the vim keybindings, but even in vim picking an autocompletion becomes a nightmare (or navigating shell history)… So, replacement keyboard please! That took a while, especially the replacing itself, as my laptop makes that extra hard it seems, but oh well. All working again now! The c-key is actually working a bit too well (you only have to touch it now, which had me worried as it started out producing an endless stream of 'c' out of the box before I removed the cap once), but so be it for now.

As you might guess that wasn't the ideal work environment and slowed me down (besides being annoying), so what I intended to do as a sideproject turned out to cover most of the week. Mergeback of apt-transport-tor into apt? Yes, no, maybe? The first few responses by mail & IRC regarding the plan are in, but it still needed a lot of code to be written and refactored. I have to say, implementing SOCKS5 proxy support in apt was kinda fun and not nearly as hard as I had imagined. Slightly harder was getting a setup working in which I could test it properly. Everyone knows netcat, but that really targets more text-based protocols, not binary ones like SOCKS5. Still, I managed to figure out how to do it with socat eventually, resulting in a testscript we can at least run manually (as it requires a specific port – not every tool is as nice as our webserver, which can be started on port 0 and reports the port it eventually picked for real). Playful as I am I even compared my implementation to others like curl, which our https method is using, where I ended up reporting minor bugs.

But why and why now, you might ask: apt-transport-tor can be (surprise surprise) used to let apt talk to the Tor network. Conceptually that isn't incredibly hard: the Tor software provides a SOCKS5 proxy an application can connect to and be done. Two years ago, when apt-transport-tor was introduced, only our curl-backed https method could do that & the intention was to be able to make backports of that transport available, too, so even though I wasn't all that happy about it, we ended up with a modified copy of our https method named tor in the archive – and as it is with most modified copies of code, they aren't really kept in sync with the original. I really want to get this resolved for stretch, so it is slowly getting high time to start: if it turns out that I need to take over maintenance for it without previous-maintainer consent there is quite a bit of waiting involved, and stuff like this should really not be changed last-minute before the freeze… you will find more details on the reasons for proposing this solution in the mentioned mail.

Besides, SOCKS support is actually such a frequently requested feature that the internet believes apt already supports it via Acquire::socks::proxy… which is and will remain wrong, as there is no socks method we would configure a proxy for – if anything, you configure a method like http to use a SOCKS proxy…
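
For the record, with these changes routing the http method through a local SOCKS5 proxy (like the one Tor provides on its default port) boils down to an apt.conf snippet roughly like this:

    // let the http method talk through a SOCKS5 proxy which also resolves hostnames
    Acquire::http::Proxy "socks5h://127.0.0.1:9050";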

Of course, bundled with this comes a bunch of other things like better redirection handling across methods and such, but that isn't really user-visible, so I skip it here and instead refer you to the git branches if you are really interested. A few things I will surely also mention when the relevant code is in the archive so that interested peers can test…

On my actual battle front, the ordering progress was insignificant. I got lots of manual testing and review done, but not much new stuff. The problem is mostly that ordering is easy as long as the problem is easy, but as soon as Pre-Depends enter the picture you suddenly have to account for all kinds of strange things like temporary removals, loops, conflicting or-groups, … stuff you don't want to lose hair over while already losing hair over your broken keyboard. 😉︎

This week for realz, although the target is now really more to merge the current stuff for apt 1.3. A new ordering algorithm is, as detailed in the initial proposal, buster material anyhow – and given all the changes in terms of trigger delaying and pending calls you are likely not to recognize our "old" ordering anymore, but more on this in the next two weeks as that will be the end of GSoC and hence I am going to look back at "the good old times" before GSoC compared to what we have now. 🙂︎

P.S.: This week's weekend marks the start of a big wine festival in our state capital which my family is one of the founding members of. I am "just" going to help build the booth though, so no giant timesink this time – just a couple of hours – just in case you hear me saying something about wine again on IRC.

Week 14: This is CROSS!

Picture me as a messenger kicked into an endless pit of complexity by what was supposed to be an easy victim. It wasn't /that/ bad, but massaging apt to treat crossgrades right took some time – and then some more to request that dpkg handle some of it, too, as it gets confused by multi-instance packages. Much like apt, although there it was just affecting apt's progress reporting, so not that bad… perhaps I am play-testing too much.

In unrelated news I dealt with the two acquire bugs which started last week, as such bugs are annoying & carry the risk of being security problems which would require immediate attention. Thankfully neither of them seems to be one, but #831762 had me seriously worried for a while. Trivial in retrospect, but getting to a point at which you consider the possibility of that happening at all…

But back to the topic: there was one thing still needed to get our current internal planner a bit "smarter" by enabling options which have existed for a while now but were never activated by default, and that thing was simulation. While it's kinda appealing to have the simulation only display what the planner explicitly told us to do, ignoring what would implicitly be done by the --pending calls, we can't really do that as this would be an interface break. There are surely scripts out there doing funny things with this output, so having it be incomplete is not an option, which in turn means that what I did internally for the progress reporting (and hook scripts) must also be done in the simulation. Easier said than done though, as the implementation followed a direct approach, running the simulation of each action as soon as the action was called, rather than collecting all the actions first to post-process them (as I want to do it) and executing them only then. Add to this that this is a public class, so ABI is a concern… the solution I arrived at is slightly wrong, but it is going to satisfy all existing callers (which is only aptitude in the archive, thanks to codesearch.d.n) and reuses the "dpkg specific" code, which is a layer violation, but reuse is better than copy&paste without breaking ABI, so I am happy all things considered.

So, with that out of the way glory awaits: changing the default of PackageManager::Configure from "all" to "smart"… and tada: it works! The simulation shows everything, the dpkg invocations are much shorter, trigger executions are delayed, we rely more on --pending calls and progress reporting is properly moving forward as well! Not 100% production-ready, but good enough for a public wip branch for now (= wip aka: going to be rebased at will).
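
To give an impression of the knobs involved – these options have been sitting in apt for years, and the values shown are roughly what the change amounts to, so take this as an illustration rather than a recommendation to set them yourself:

    PackageManager::Configure "smart";  // only configure explicitly what has to be
    DPkg::NoTriggers "true";            // don't run triggers after every dpkg call
    DPkg::ConfigurePending "true";      // let a final dpkg --configure --pending mop up
    DPkg::TriggersPending "true";       // similar for pending trigger processing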

This also includes the barebones 'dpkg' planner I mentioned last week; based on that I was playing with ideas to find a more viable implementation (= handling Pre-Depends), but nothing of particular note has been produced yet. Maybe I can get something working this week – it is at least part of the plan beside polishing my wip branch – after leaving that pit, that is… Hello, is anyone up there? Hello? Hello? …

Week 13: Progress reporting

On the public side of things I did a bunch of things this week which weren't exactly related to the GSoC project, triggered by incoming mails (and IRC highlights) with bugreports and interesting wishlists; that isn't completed yet as there are two new responses with debug logs waiting for me next week.

I say that as I brushed up a smallish commit for merge this week which is supposed to deal better with triggers in progress reporting. That sounds boring – writing progress reporting stuff – but many people care deeply about it. While this one is more of a cosmetic change, it is conceptually a big one: with the actions the planner proposed, apt builds for each package a list of states it will pass through while it is installed/upgraded/removed/purged. Triggers rain on this parade as we don't know beforehand that a package will be triggered. Instead, a status message from dpkg will tell us that a package was triggered, so if we don't want to end up in a state in which apt tells us via progress report that it is done but still has a gazillion triggers to run, we have to notice this and add some progress states to the list of the triggered package – easy, right? The "problem" starts with packages which are triggered but are destined to be upgraded later. A triggered package will lose its trigger state if it is unpacked, so our progress report has to skip the trigger states if the package is unpacked – we can't just exclude packages if they will be unpacked, as it can easily be that a package is triggered, the trigger is acted upon and it is upgraded "ages" later in this apt run.
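
As a rough illustration of what such a state list looks like – the state names are the ones dpkg reports, but how they are strung together here is simplified from memory, so don't take it as the literal implementation:

    # planned states for a package that gets upgraded in this run
    half-installed -> unpacked -> half-configured -> installed

    # extra states appended when dpkg announces a trigger for an installed package
    ... -> triggers-pending -> installed

    # unless that package will be unpacked later in this run anyway,
    # since unpacking clears the trigger state again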

My wip branch contains many more progress-related commits, as there is a big gotcha in the description above: I said "with the actions the planner proposed", so what about the actions the planner isn't proposing but which will happen as part of dpkg --configure/--remove/--purge --pending calls? And what about hook scripts like apt-listbugs/apt-listchanges which get told the actions so they can perform their own magic?

The solution is simple: we don't tell dpkg about this, but for our own usage we do trivial expansions of the --pending commands and use these for progress report planning as well as for telling the hookscripts about them. That sounds like a very simple and optional thing, but it was actually what blocked the activation of various config options I had implemented years ago which delay trigger execution, avoid explicit configuration of all packages at the end and all that, which I could now all enable – a bit more on that after this hits the master branch. 🙂︎
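
Conceptually the expansion is nothing more than this (package names made up, and the hookscripts of course get the list in their own protocol rather than as a dpkg call):

    # what dpkg is actually told at the end of the run
    dpkg --configure --pending

    # what apt pretends this means for progress reporting and hookscripts:
    # every package it knows will still be unpacked-but-unconfigured at that point
    dpkg --configure libfoo1 foo bar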

I also looked into supporting crossgrades better. In apt's conception a crossgrade is the removal of a package of arch A and the installation of a new package (with the same name) of arch B. dpkg on the other hand deals with it like a 'normal' upgrade, just that the architecture of the package changes. The issue with that usually isn't gigantic, but it becomes big with essential packages, like if you try to crossgrade dpkg itself with apt: APT refuses to do that by default, but with enough force it will tell dpkg to remove dpkg:A and then it tells dpkg to unpack dpkg:B – just that there is no dpkg anymore which could unpack itself. At least in that case we can skip the removal of dpkg:A, but we can't do it unconditionally as that might very well be part of an ordering requirement, so progress reporting should be prepared for either to happen (or not)… That isn't finished yet and will surely leak into next week.
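
Spelled out for some hypothetical package foo crossgraded from armel to armhf, the two views look roughly like this (simplified, names made up):

    # apt's view: two separate actions
    Remove foo:armel
    Install foo:armhf

    # dpkg's preferred view: one upgrade that happens to switch architecture
    dpkg --unpack foo_1.0_armhf.deb
    dpkg --configure foo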

Next week will also see my freshly built planner 'dpkg' get a proper tour: with all the --pending calls it seems like a good idea to try to be extra dumb and have a planner just unpack everything in one go, let the rest be covered by --pending calls and see what breaks: obviously the harder stuff, but I have two directions I would like to explore based on this minimal planner to make it viable. First I have to finish the crossgrading though; my usual self-review of commits and the bugreports I participated in this week want to trigger further actions, too… see you next week!

