Software and hardware annotations, q1 2005

This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.

March 2005

I have finally gotten around to adding the Ubuntu sources to my Debian /etc/apt/sources.list:
deb hoary main universe restricted multiverse
deb warty-security main universe restricted multiverse
deb warty-updates main universe restricted multiverse
deb warty main universe restricted multiverse
in order to have the option of installing Ubuntu-only packages or package versions. To ensure that only Debian packages are considered by default I have had to release-pin packages with an origin of Ubuntu to a low priority, by putting these lines in /etc/apt/preferences (below the default of 500, so Debian versions always win; below 100, so installed packages are never upgraded to Ubuntu versions unless explicitly requested):
Package: *
Pin: release o=Ubuntu
Pin-Priority: 90
I have just had a look at the initrd for Debian and I was quite amazed to see it is 2MB compressed and about 5MB uncompressed. That is a pretty largish root filesystem; some mini distributions are smaller.
I was looking at it to answer a question by someone as to how to prevent Debian from loading a specific SCSI driver at startup, even if the host adapter was present. It turns out that this is not easy, because there are many excessively helpful mechanisms that try to automate system driver loading and configuration, starting with those in the initrd.
Having a renewed interest in VoIP I have started looking at developments a bit more recent than H323, and this obviously means SIP, IAX2 and the Asterisk software exchange.
Unsurprisingly it looks like the usual: the whole area is underdocumented and the design of stuff is awkward and inconsistent.
Playing with the kernel and looking at the SUSE patches I have noticed they carry version 0.15 of the ZyDAS 1201 driver as a patch. That patch applies cleanly and the driver just works; even the firmware loading seems good (I have put the ZyDAS firmware files in /usr/local/lib/firmware, which is the right place for manually installed firmware files).
The newer releases of the driver are much better than the older ones (I had tried release 0.8 a while ago) and the 2.6 USB code seems to have improved a fair bit too. Even better, there is a note saying that ZyDAS is helping by giving documentation. This means ZyDAS joins the good Linux WiFi chipsets which is particularly welcome as ZyDAS chipset USB thingies are easy to find, cheap and small.
Just out of curiosity I have tried to measure the effective 802.11b speed one can get in rather suboptimal conditions. I have my AP in one room and a PC in the next room, with a fairly radio-opaque wall in between, and thus a signal strength of 35/128, with default parameters.
Under these conditions I could get around 600KiB/s, or around 5mb/s, out of the theoretical maximum of 11mb/s. This is not too bad, and very similar to the actual utilization, around 50%, seen with 802.11g; but I suspect that in better conditions and with a little tuning (frame size, MSS, ...) this can be improved.
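As a sanity check on those numbers, here is the arithmetic spelled out (my own calculation, using the usual decimal-megabit convention for line rates):

```python
# Convert a measured 600 KiB/s transfer rate into line-rate utilization
# for a nominal 11 Mb/s 802.11b link.

measured_kib_s = 600
bits_per_s = measured_kib_s * 1024 * 8    # KiB/s -> bits/s
mb_s = bits_per_s / 1_000_000             # bits/s -> decimal megabits/s

utilization = mb_s / 11                   # fraction of the 11 Mb/s nominal rate
print(f"{mb_s:.1f} Mb/s, {utilization:.0%} of nominal")
```

So 600KiB/s is about 4.9mb/s, roughly 45% of the nominal rate, which is in the same ballpark as the 802.11g figure.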
Well, I have been playing around with the Linux 2.6.11 kernel release, and it seems a lot more reliable than previous releases in the 2.6.x series. I have made a couple of interesting discoveries:
  • It is now possible to select the elevator on a per-block-device basis, by using /sys/block/<dev>/queue/scheduler.
  • I figured out why I was getting much lower hdparm -t results under 2.6 than under 2.4: it appears that to get the same results under 2.6 I must raise the filesystem readahead, set with hdparm -a, to some large value like 512 blocks.
    Evidently 2.6 kernels don't automagically do as much readahead as 2.4 kernels do.
    Note that lots of readahead makes streaming tests look better, but may be terrible for other uses...
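For reference, the two knobs above look like this in practice (device names are just examples, and both operations need root):

```
# Per-block-device elevator selection; the bracketed entry is the current one:
cat /sys/block/hda/queue/scheduler        # e.g. noop [anticipatory] deadline cfq
echo deadline > /sys/block/hda/queue/scheduler

# Raise the filesystem readahead to 512 sectors, then read the setting back:
hdparm -a 512 /dev/hda
hdparm -a /dev/hda
```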
Since the Linux 2.6.11 kernel release is so recent, and its two official updates too, I have had another look at a major distribution variant of the same kernel, the SUSE kotd package, to see how much of a variant it is; other kernels from major distributions are similar, for example there is a list of extra drivers in recent Mandrake kernels.
Well, it has almost 500 patches. Some of these add functionality (like UML and Xen), but there are very many fixes... These are the patch collections:
Collection   # of files   total # of lines
arch                 19              16130
drivers             150             365887
fixes               106              14786
rpmify               13                911
suse                157             236323
uml                  10               2817
xen                  14              36169
Of these the suse and drivers collections seem to be mostly extensions, but they also contain a lot of fixes.
Now the question is, if these things are good for SUSE, who are careful people who test things and have many happy users, why aren't these patches in the original kernel? My comments on this are:
  • Well, some of those patches are not really candidates for the original kernel, because they are SUSE-specific or not applicable to a general purpose kernel. But still, there are a lot of fixes.
  • In order to become part of the original kernel, they have to be submitted to the original kernel maintainers. I can easily see that the major distributions may think it is not in their best interests to proactively contribute their collections of fixes to the original kernel.
Another point is that very many of these changes to the kernel are extensions, and they are distributed as source patches. I know that Linus Torvalds prefers it like this, but I still think that the Linux kernel should have some mechanism to allow modularization of the source. Accepting that it is a monolithic kernel at runtime does not imply that its source should be monolithic to the extent it is now.
Things with P2P seem a bit better, or perhaps worse, than the impressions I got from looking at the people queued to download from my machine. Other statistics show that the percentage of users that cannot provide uploads is way less than the 40-50% I had summarily estimated. According to statistics by Razorback2, which is probably the biggest eDonkey directory site, only about 15% of users cannot provide uploads (LowID). So the mystery of why queues are so long and terrible and download speeds so low persists.
However, I tried a few other downloads. One, a fairly large one, of an ISO9660 image with some media test files, started almost immediately and proceeded at high speed (around 40KiB/s). The reason seems to be that the file was seeded by some high speed servers from Razorback2 itself, thus showing how effective seeding is.
I then decided to try and download something that was a bit large but also with very many available sources. Finding something suitable was not easy, in part because of the variously objectionable nature of most of the really popular stuff (unfortunately most was not freedom software), in part because not a lot of files are popular.
Well, even with many complete sources, queuing took a fairly long time and downloads were not speedy. Typically once started there were 3-4 active sources (out of a few hundred), each delivering around 3-10KiB/s.
Well, more observations on the dreadful P2P situation. I have left my eDonkey client running with a tasty selection of free software ISOs. The top uploads served are reported as KANOTIX-2005-01.iso with 1.2GB, ubcd32-full.iso with 0.9GB, knoppix-std-0.1.iso with 0.7GB, and I haven't had any download running for a bit.
I have occasionally tried to download something to test the download side, and while my upload side is constantly busy, when I try to download things often there is a single host offering them, and there are huge queues.
I am not at all surprised that my experience so far (and many others I have read about online) has been so negative, with extremely poor download rates, scarce availability of seeds, and long queueing times.

February 2005

In the past few days I have been trying out P2P systems like eDonkey, Gnutella, OpenFT, and it is pretty obvious there are fairly big problems with the P2P model of operation. The main problem however is simply lack of bandwidth, that is of seeds for downloads. This is going to become worse and worse as ISPs are now trying to switch from high fixed monthly fees to low monthly fees plus charges for bandwidth, in both directions; but the situation is ugly enough as it is.
The main symptom is that I have had both aMule and giFT running constantly for a week now and I have uploaded well over six times more than I have downloaded. When I have tried to download something, it just gets stuck for many hours or days waiting for a download slot to become free, and then it downloads at very, very slow speeds (so it took several days to get the ISO image of System Rescue CD, which downloaded in a few dozen minutes from a SourceForge mirror). This has been worse for eDonkey than for Gnutella/OpenFT.
This seems to be a rather common experience, and indeed there are some obvious and systematic causes:
  • Almost all P2P hosts are either on a modem or an ADSL line. This means that at best the theoretical up bandwidth is half the down bandwidth, and in several cases it is one eighth, as many services offer 2mb/s down with a 256kb/s up limit (there are also technical reasons for inefficient line utilization).
  • About half of P2P hosts seem to be behind firewalls that forbid incoming connections completely. This is less of a problem for Gnutella/OpenFT, but it is just like that with eDonkey.
Now the combined effect of these two inevitable issues is that in theory download bandwidth should equalize at about half the typical/most common up bandwidth; that is about 256kb/s, or in practice (taking into account some technicalities) an effective limit of 28KiB/s. So I would expect my up bandwidth to be close to 28KiB/s, but my effective down bandwidth to be around 14KiB/s, assuming that P2P is indeed peer to peer, that is, that sharing happens symmetrically.
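The back-of-the-envelope numbers, spelled out (the 12% framing overhead factor is my own rough assumption for TCP/IP over PPP/ATM; the rest follows from the figures above):

```python
# Effective P2P transfer rates for a typical 256 kb/s ADSL uplink.

uplink_kb_s = 256                             # kb/s, typical ADSL up limit
raw_kib_s = uplink_kb_s * 1000 / 8 / 1024     # -> KiB/s, i.e. 31.25

overhead = 0.12                               # assumed TCP/IP + PPP/ATM framing loss
effective_up = raw_kib_s * (1 - overhead)     # ~27.5 KiB/s, i.e. the "28KiB/s" figure

# If roughly half the hosts cannot accept connections, symmetric sharing
# should give each downloader about half the effective up rate:
expected_down = effective_up / 2              # ~14 KiB/s
```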
But I usually observe that my aggregate downloads amount to a lot less than that: when downloading happens at all it is at 3-6KB/s, and as a rule it doesn't happen for hours, as transfer requests sit queued before getting a short burst of 3-6KB/s downloading, for an average download speed well below even that, never mind equal to the upload speed.
My up link not only runs at top capacity all the time (which is not good, as I am on a theoretically 1:50 contended service), but I also see around 80-100 hosts in the queue of people waiting for a chance to download from my host, and some of them have been queued for days.
All this indicates not only that maximum upload speeds are on average much lower than download speeds, because of asymmetric rates on both v.90 modems and ADSL, and that around half of the participating hosts don't accept connections at all because of firewalls, but that very, very few sites are actually sharing.
In other words the typical usage pattern is that people get online, wait a long time to download something they are interested in, and then once the download is complete, they close the connection/sharing.
That is, files are shared just about only while they are being downloaded, and then only in half of the cases, at less than half speed, and, most crucially, while they are mostly incomplete.
The waiting happens because very few of the P2P hosts have complete file images to share, and everybody else has incomplete ones that are incomplete in the same way.
In other words:
  • P2P networks actually have very hierarchical, download-style usage patterns.
  • There are very few seed hosts to kickstart the temporary sharing that is what actually takes place, and these seeds are on relatively slow and overloaded connections.
In effect P2P networks are not fully peer to peer, they are shared download systems (much like BitTorrent), with not much in the way of sites to download from to start with.
The consequence is that P2P systems currently are just about useless for an important and interesting use, which is to replace or augment FTP/HTTP/RSYNC/Torrent sites as the primary distribution mechanisms for free software, and in particular for ISO images of free software operating system install CDs.
This is highly regrettable, because P2P could instead be a particularly efficient viral marketing channels for free software installers.
Two fixes are possible: one both weak and unfeasible, the other excellent in theory and unlikely in practice, but for one detail:
Change the behaviour of peers to continue sharing even after the download completes.
This is weak because peers, most of whom have consumer grade, contended modem or ADSL connections, have pitiful upload bandwidths to offer, and it is unfeasible because it goes against the grain of user behaviour and, as ISPs more and more commonly charge for bandwidth, against users' self interest.
Put the same repositories that currently offer their archives on FTP/HTTP/RSYNC/Torrent on P2P networks too
This can work really well and greatly improve the reliability of downloads from those sites, at the same time relieving them of a large part of the bandwidth cost, as after all while people download they also end up sharing. It is unlikely however that this will happen, as many repositories are publicly funded (e.g. hosted by universities) and P2P systems have been demonized as vehicles for dishonest and criminal behaviour. The technical problem is that P2P systems typically present a completely flat view of the namespace of available files, and most existing archives are arranged, for very good reasons, hierarchically. This can be fixed by having P2P servers flatten file paths into file names, which is not hard.
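Flattening an archive's hierarchical paths into P2P-style flat file names really is not hard; a minimal sketch (the separator and function name are my own choices for illustration):

```python
def flatten(path: str, sep: str = "__") -> str:
    """Encode a hierarchical archive path as a single flat file name."""
    # Escape pre-existing separator sequences first so the mapping
    # stays unambiguous, then replace the path separators themselves.
    return path.strip("/").replace(sep, sep + sep).replace("/", sep)

# e.g. "pub/linux/knoppix/KNOPPIX_V3.7.iso"
#   -> "pub__linux__knoppix__KNOPPIX_V3.7.iso"
```

A P2P server in front of an existing FTP/HTTP archive could publish names of this form and map them back to real paths on request.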
As a final note, I suspect that the current popularity of P2P systems despite their awful performance is due to historical causes; in the beginning all P2P systems probably were in effect seeded by University students, and in particular computer science ones, who enjoyed symmetrical and very high bandwidth connections thanks to their campus networks.
Then the enormous amount of bandwidth consumed and the illegal nature of much of the content offered for sharing led universities to forbid such seeding, and the P2P systems remaining out there are now seedless and sad ghosts of what they were, still popular thanks to fresh memories of a golden age that is no more.
I am looking into P2P programs, mostly based around the eDonkey or the OpenFT protocols.
The motivation for this research is that freedom software packages are becoming ever bigger and more sophisticated, in particular the albums/compilations known as distributions, especially the live CD ones.
The existing methods are all somewhat unsatisfactory:
  • Download is from a single server per file, putting huge loads on the server.
  • No built in verification of the integrity of the transferred file.
  • When an MD5 checksum file is also available, this only tells whether the download failed, not where.
  • Partial downloads can in practice only be resumed from where they stopped, not repaired in the middle.
  • Fortunately there are very many FTP and HTTP servers, even if they are prone to congestion; unfortunately there are few systematic catalogs of servers and indexes of their contents, with the result that the well known servers are even more prone to congestion.
  • RSYNC downloads in chunks and verifies the integrity of each chunk, and can redownload any arbitrary chunk, so that's pretty nice.
  • There is still a single download source at a time.
  • There are relatively few download servers.
  • There are even fewer ways to find catalogs of RSYNC servers and indexes of their contents than for FTP and HTTP.
  • Existing RSYNC clients are slightly more awkward than FTP or HTTP clients, which have nice shell-style or commander-style interfaces.
  • BitTorrent is basically RSYNC where chunks can come from many different servers, which all register with the original BitTorrent server, which may or may not be the one with the original content.
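The advantage of RSYNC/BitTorrent-style chunked verification over a single whole-file MD5 can be sketched in a few lines: checksum each chunk separately, and a mismatch pinpoints exactly which chunk to refetch (the chunk size and function names here are my own, purely illustrative):

```python
import hashlib

CHUNK = 256 * 1024  # 256 KiB chunks, an arbitrary illustrative size


def chunk_sums(data: bytes) -> list[str]:
    """Per-chunk MD5 digests, as a published archive might list them."""
    return [hashlib.md5(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]


def bad_chunks(data: bytes, expected: list[str]) -> list[int]:
    """Indices of chunks whose digest does not match the published list."""
    return [i for i, (got, want) in enumerate(zip(chunk_sums(data), expected))
            if got != want]
```

With a whole-file MD5 a single flipped bit forces redownloading everything; with per-chunk digests only the chunks listed by bad_chunks need to be fetched again, possibly each from a different source.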
The ridiculous font situation under Linux is getting ever worse. I have been looking at changing the font used for the GUI elements (toolbars, menus, not the page) in Mozilla and Firefox. The following disgusting issues arose:
  • Mozilla uses GTK 1, which uses the X11 native font system, and Firefox uses GTK 2, which completely ignores it in favour of that idiocy, Fontconfig/Xft2.
  • One can change the Mozilla GUI font by editing/overriding the theme description for its GTK 1 theme, that is by adding some poorly documented lines to $HOME/.gtkrc.
  • In theory, and as documented, one can change the GUI font for Firefox by similarly editing/overriding the GTK 2 theme, by adding some poorly documented lines to $HOME/.gtkrc-2.0.
  • The GTK 2 per-user theme file is called .gtkrc-2.0 even in the 2.2 and 2.4 releases of GTK 2.
  • The font specification in the .gtkrc-2.0 file uses the setting gtk-font-name whose syntax is similar to, but incompatible with that of Fontconfig/Xft2 font names, which in turn is hardly documented, and the differences seem gratuitous. For example, in Fontconfig font names the point size is separated from the font name by a dash, but not in GTK 2 settings.
  • In any case, a bright guy has made sure that several settings which are possible in .gtkrc-2.0 are actually overridden by equivalent settings in the GConf database, which apparently is only documented in an email announcing this patch to a mailing list; this requires Firefox to depend not just on the GTK libraries, but also on the GNOME libraries, or at least the GConf ones.
  • Even after all this idiocy has been worked out, if I choose a bitmap/PCF font it is bold by default, and I haven't been able to switch that off. Why? Why? Why?
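For the record, the lines I mean are of this form (the font names are just examples; note that the GTK 2 syntax wants a space before the size where Fontconfig wants a dash):

```
# $HOME/.gtkrc (GTK 1: an X11 core font, i.e. an XLFD pattern)
style "user-font" { font = "-*-helvetica-medium-r-*-*-12-*-*-*-*-*-*-*" }
widget_class "*" style "user-font"

# $HOME/.gtkrc-2.0 (GTK 2: "Family Size", not Fontconfig's "Family-Size")
gtk-font-name = "Sans 10"
```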
In this like in many other cases (ALSA springs to mind) the unwillingness and perhaps inability to think things through and go beyond the cool half-assed demo stage seem to me the driving forces.
The insanity of Linux kernel development is becoming ever more manifest in the 2.6.x series. For the sake of entertainment I have had a look at the 2.6.x kernel packages by RedHat and SUSE among many. Well, the RH ES 4.0 2.6.7 kernel has over 250 patches, and the SUSE 2.6.10 kernel source package has several archives of patches, including a 4-gigabyte one of fixes.
Sure, some of these will be cool little features that don't really need to be in the mainline kernel (like UML and Xen support), but the number of mere bug fixes, especially inside drivers, is amazing.
Understandably Linus says that his main worry is to make sure that the overall core structure of Linux is right, and this has meant paying a lot less attention to device issues, but it is getting a bit ridiculous.
Also, RedHat and SUSE are hardly untrustworthy as to the stuff they do with their kernels; one might be tempted to just include almost all their patches into the mainline kernel: if they are good for them, probably they are good for everybody.
Thanks to a letter by Michael Forbes to Linux Magazine I have discovered the recently introduced --link-dest option to rsync and the Perl wrapper script rsnapshot that uses it to automate creating backups of filesystems that are both incremental and full, using forests of hard links.
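The trick, in a minimal sketch (the directory names are made up): unchanged files in the new snapshot become hard links into the previous one, so every snapshot looks like a full backup but only changed files consume space.

```
# Rotate: backup.0 is the newest snapshot, backup.1 the previous one.
mv backup.0 backup.1

# Copy only changed files; unchanged ones are hard-linked to backup.1.
# (--link-dest is resolved relative to the destination directory.)
rsync -a --link-dest=../backup.1 /home/ backup.0/
```

rsnapshot automates exactly this rotation, plus pruning of old snapshots.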
As to ALSA, I haven't had the time yet to check whether there is a mixer plugin in 1.0.8 but it has a reworked alsamixer with a rather less misleading user interface, in particular for controls that do not correspond to sound channels.

January 2005

Quite entertaining interview with Linus Torvalds in the January 2005 issue of Linux Magazine. Among the interesting points: he is currently using a dual PPC G5 system, to practice code portability; he lists the ARM architecture along with x86 and PPC as one of the crucial Linux architectures; and he stresses the importance of embedded Linux, as well as SMP (on which he says his pessimism was wrong, which I disagree with).
Very interesting blog entry about the consequences of defining pseudo-OO in base C which then suggests the use of a preprocessor to autogenerate all the plumbing:
Let's face it, because of C's constraints, writing GTK code, and especially widgets, can be ridiculously slow due to all the long names and the object orientation internals that C can't hide.
C with Classes anyone? :-)
Also, found an interesting product that traces a lot of WIN32 API calls.
Good news for those concerned with the slightly primitive state of ALSA mixing: apparently version 1.0.8rc2 has a mixer abstraction plugin in the ALSA library, and a new graphical mixer application, Mix2005, has been announced.
Rather fascinating article on Tomcat and general web serving performance issues, So you want high performance by Peter Lin. It discusses issues like the very high cost of parsing XML, optimal JNDC architectures, how much time and money it takes to get physical high speed lines, and the cost of power and cooling for faster CPUs and disks in racks.