Computing notes 2011 December

This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.

[file this blog page at: digg del.icio.us Technorati]

111230 Fri Filesystems for SSDs

Having summarized in my own words how flash SSD storage units are structured it may be interesting to discuss which file system to use on them. The main problems with SSDs are:

Large minimum read/write page sizes (typically 8KiB) and erase block sizes (typically 1MiB).
Limited number of erases before an erase block becomes unusable, which implies minimizing write amplification (the page linked to provides further discussion of flash SSD structure and challenges).

The large transaction sizes seem addressable with file systems that allocates space in extents (ideally with an 8KiB granularity) and supports well parity RAID storage, as a multi unit RAID strip can well have a width of 1MiB, and block sizes of 8KiB are not uncommon.

It is also important to ensure that partitions (if any) are aligned, and most recent GNU/Linux and MS-Windows tools do that, and I usually align to 256MiB up to 1GiB anyhow.

As to reducing the number of erase operations, that means also reducing the number of write operations, and as to this unfortunately all journaled file-system have the problem that they stage all updates to both the journal (first) and the filetree (second).

Fortunately most file-systems with a journal write to it only metadata updates and journal the updates as operations instead of actual data, so the amount journaled in normal operation is rather small, especially if filetrees are mounted with the noatime option.

It is still a problem though and for this and other reasons recent versions of the ext4 file-system have the option to disable journaling (the other reason is that on rotating storage journaling can be very expensive my creating long travel distances between the journal area and the active area of the disk).

The most appropriate file-systems would be those designed for MTDs, which are flash SSDs without a simulation layer, but all those available target small capacity devices because they are aimed at embedded applications.

Next most appropriate would be file-systems designed for erasable devices, or for write-intensive profiles. The main candidates are UDF and NILFS2 snd unfortunately the UDF code in Linux is not wholly reliable for writing. NILFS2 instead is currently well maintained and increasingly popular. It seems particularly well suited to flash SSDs as log-based file-systems bunch up writes and their major weakness, that reads then are then more scattered, matters not given the low access times of flash SSDs. NILFS2 has a background toa cleaner that bunches up blocks that are likely to be read sequentially, but that must be disabled on an SSD as it results in write amplification and it is pointless.

There are various online comparisons of NILFS2 with other file-systems on SSD and it is sometimes a bit better and sometime a bit worse, which is unexpected, as it should be way better as it matches the underlying storage profile better. That other filetrees seem comparable probably is because:

The benchmarks used do not take into account a probably significant difference in write amplification.
The FTL firmware that simulates a rewritable, small-sector device works too well with other filetrees, and does not let NILFS2 work as well as it could.
As usual the benchmarks only test one shot performance on a freshly built filetree, and largely the problems with an SSD are about erase counts and performance over time.

Then there are conventional file-systems with good support for alignment and with write optimizations, for example ext4, XFS, BTRFS, OCFS2.

All of the above considered, probably the best choice currently is XFS, as it has good support for clustering, and supports TRIM, and is currently maintained. Many people use ext4 as it is kept very current with new features, including those relating to flash SSDs, but as usual I think that it is of obsolete designed, in particular as to the statically allocated inode list and the flat directory structure. JFS as usual would be much preferable, but it is no longer actively developed, and thuis means that it is lacking barriers and flash oriented features. NILFS2 seems very appropriate and robust, and actively developed, and it may be the best alternative to XFS, and I shall test if more.

BTRFS is the file-system of the future, especially because it supports copy-on-write and flash SSD friendly layouts, but I feel like many that it is not yet production ready because of the lack of a fsck. OCFS2 performs well even in non-shared mode and it is mostly well designed and is actively developed, but I feel that might not last long.

111229 Thu SSDs, unused sectors, and encryption

Since I am reading and thinking about SSDs in depth thanks to the holidays, I have realized that I add to my summary of how flash SSD storage units are structured that there is another important issue: initially there are plenty of empty erase blocks, and that matters because an empty one can be written to not only without first erasing it, but without first reading it either. The erasing does not matter that much, because at some point previously it must have been erase, but avoidance of a RMW cycle is more important.

Therefore ideally the firmware on each flash SSD storage unit should aim to keep all written blocks packed into as few erase blocks are possible, to ensure that as many erase blocks as possible are fully empty.

The difficulty is that eventually all writable physical pages get written, and then no erase block can be considered empty. But the file-system contained within will usually have unused blocks, the problem is that the firmware has no idea of which ones, because that is a file-system level notion. One possibility would be for the file-system to explictly fill with zeroes each unused block and thus mark in that way empty physical pages, but usually that is considered to be too slow (and unused blocks are zeroed only when read after being allocated). This would also not work with most encryption schemes, as a block of all zeroes would encrypt to non zeroes anyhow, the particular value depending on the encryption key and the address of the block (if the encryption layer is using some common scheme).

Therefore most SSD firmwares have some special device operations to declare a sequence of logical sectors as unused or empty, and these typically are the ATA TRIM and FITRIM commands and the SCSI UNMAP commands.

Ideally these would be the equivalent of commands to write, but for whatever reason in some specifications and implementations these commands are actually very slow, so they are best used periodically and not every time a block is released, even if it is possible to enable them with the discard mount option as supported currently the ext4, XFS, BTRFS, OCFS2, GFS2, file-systems (in very recent versions of the Linux kernel).

An alternative is to periodically reset the whole unit with a security erase command and reload its contents.

Another alternative would be the marking of unused logical sectors to the device would happen at the end of fsck as at that point there is an exhaustive and known-good list of unused blocks. Unfortunately I don't know which file-system fsck tools do that.

111228 Wed Large power saving with 3.0 kernel compared to 2.6 kernel

I have just tried using the backported 3.0 kernel now available for ULTS 10 and I was amazed that standalone power consumption for my Toshiba U300, laptop was considerably lower, dropping from around 1200mAh to around 860mAh:

#  acpitool -B
  Battery #1     : present
    Remaining capacity : 1661 mAh, 45.36%, 01:55:36
    Design capacity    : 4000 mAh
    Last full capacity : 3662 mAh, 91.55% of design capacity
    Capacity loss      : 8.450%
    Present rate       : 862 mA
    Charging state     : discharging
    Battery type       : rechargeable, 18087
    Model number       : 32 mAh
    Serial number      : NS2P3SZNJ4WR

That's impressively lower power consumption, and may be due to what is arguably a bug that has been added the the power management code in Linux to match an equivalent bug in the power management code in the firmware of many laptops.

111226 Mon How SSDs are structured

Having reported an explanation of some recent SSD tests I have been considering which file-system to use on SSDs and found a relatively recent (2008) presentation presentation on SSD techology, and Linux file-systems which seems a good introduction. Summarizing the important parts, flash based SSDs are made of a collection of flash chips fairly similar to those used to store ISA PC BIOSes, and they work as a concatenation or RAID0 across them.

The single biggest performance implication is that SSD chips have positioning times negligible compared to those of rotating devices, and those times do not depend on distance.

But that just like BIOS chips, the content of SSD flash chips cannot be modified, but can be erased and then written (and erasure and writing are both slower than reading), and that the number of erasures is limited, and that the minimum amount erased is fairly large. Therefore the physical transaction size, at least for updates, is much larger than the desired logical transaction size, which is usually between 512B and 4096B, which means that at least some updates will involve RMW. Also, the minimum physical read transaction size is often rather larger than the logical sector size.

While traditionally logical sectors have been 512B, the minimum physical read size (page) is the minimum erase transaction size are currently usually 8KiB and 1024KiB which is rather challenging, and the firmware of most SSD storage devices then aims, for marketing purposes, to simulate a traditional read-write small-sector device on a read-erase-write large-sector one, with the following goals:

Minimize the number of erasures, and in particular those triggered by updates involving an RMW cycle.
Stop using blocks that have reached their maximum erase count.

The general solution is to have a table of erase blocks showing how many erasures they have suffered, and which parts of them contain data, and to take advantage of the low and constat access times to allocate logical sectors to erase blocks in an improved layout by keeping a mapping table too from logical sectors to erase block sectors, and the improved layout is pretty obvious:

Write in time order, as in general writes happen in bursts and to consecutive addresses. This implies looking at the SSD as mostly a circular buffer, or a q class="toa">log-structured file-system.
Turn updates into moves, that is when a logical sector should be updated, write its new content to a different physical sector, and then mark the previous physical sector as free, which again is similar to how a log-structured file system would work.
Keep a count of erasures per physical erase block, reserve a certain percentage of available erase blocks to be put into service when an existing erase blocks has reached its maximum erase count. The assumption here is that only a relatively small percentage of erase blocks will reach their erase limit.
Optionally compute a hash code for every logical sector to be written, and look it up in the table of logical sectors, and reuse the relevant physical sector if found (with a usage count).
Optionally define physical sectors to be smaller than logical ones (for example as 256B instead of 512B), and routinely compress every incoming logical sector as that may fit into a single physical sector.
Buffer in volatile memory as many writes as possible before flushing them to flash memory, ideally in an ordered that minimized the number of erasures needed.

As hinted that is operating as a log-structured file-system would, as the latter significantly increase write and update performance and locality, where the latter is important not because of access times proportional to distance as in rotating devices, but because of write times proportional to the number of erase blocks involved more than to the amount to be written.

Most recent flash SSD firmware seems to work mostly as hinted above, with the possible exceptions of hash coding logical sector contents (deduplication) and compression (which is however popular).

Also it has turned out that it is quite important for sustained performance to move logical sectors so they are held in a smaller number of fuller erase blocks, and ideally so that logically near sectors be in the same erase block. The goal is not to defragment logical sectors, but to defragment erase blocks, so that when writes occur in bursts there are emtpy or mostly empty logical blocks.

This processing, which is very similar to the cleaner of a log-structured file-system, has been realized as both on request (with a command called TRIM) and as an automatic background operation.

111225b Sun Single full-screen windows and aspect ratios

As a side note to my displeasure with the increasingly skewed aspect ratios of displays, they make me even more perplexed as to the widespread practice of having a single full-screen window on the display, with a stack of hidden also full-screen windows underneath, instead of a set of overlapping windows. My usual practice on my home 24" 1920×1200 display is to use a number of somewhat overlapping windows typically in sizes like 600×800 (80×40 Emacs text), 800×1180 (96×66 Emacs text), 800× 660×960 (80×60 character terminal) 960×1024 or 1024×1152 (web browers), with some occasional 1210×920 (132×43 character terminal or 1400×1024 (web browser or image viewer) for very wide data listings or graphical content.

Those using a single full-screen visible window per display, especially if GUI elements are horizontal, end up viewing whatever content in an amazingly squat and wide way, and I have found that this is particularly unwholesome for programmers, as it encourages them to write programs with very long lines.

111225 Sun Using skewed aspect ratio monitors in the best way

There are many aspect of contemporary computing that depend on historical details (UNIX files are virtual paper tapes, virtual terminals are one punched card wide, ...) and one of them is related to display aspect ratios: that GUI toolbars are usually grouped to the top or bottom of displays or windows.

The early computers (for example Xerox Alto, Three Rivers Perq Apollo DN workstation) with a GUI tended to have (monochrome, not even grayscale) square (1024×1024) monitors or portrait ones (600×800, 768×1024) because they were mostly designed as document processors.

Color and landscape displays became popular later at first because people who wrote spreadsheets tended to write them with many columns and few rows (also because spreadsheets were originally used on glass tty devices with ratios like 24 rows and 80 columnns), and also wanted to color code cells, and more recently because most entertainment content like movies and photographs and computer games have landscape aspect ratios.

Amusingly some of the first computers with a GUI had a circular display (recyled WW2 radar screens or inspired by them).

As currently most displays have wide and squat landscape aspect ratios the traditional layout of most GUIs makes little sense, and as a rule I reconfigure GUI styles to have toolbars on the sides, to save scarce vertical pixel size. As to this GNOME based applications do badly, as they seem to have several elements that are not designed to work equally well horizontally or vertical, but I usually use KDE and virtually all of its GUI elements work well vertically as well as horizontally, including for example the virtual desktop panel, the task manager, and the system tray.

The sole exception I can find in the KDE SC 4.4 is that the Plasma panel's (somewhat bizarre) options sidebar is rather slippery when used vertically (and annoyingly so).

111224 Sat Default settings for some LCD monitors need tweaking

The reason why I was reminded of gamma settings issues is that for reasons that I don't know some of the LCD monitors that I have been using look rather washed out with their default settings, and this is because their default gamma seems to me usually too high.

The three examples that I have in mind are the displays of the Toshiba U300, Toshiba L630, laptops and the display of the BenQ BL2400PT monitor. The two laptop displays seem to look a lot better (at the cost of some loss of distinction among dark shades) after:

xrandr --output LVDS1 --gamma 1.4:1.4:1.6

Where the higher setting for blue is an attempt to compensate for an overall bluish tint that most laptop displays have. For the BL2400PT the monitors's own gamma settings should be 2.2 and after:

xrandr --output VGA1 --gamma 1.2:1.2:1.2

The BL2400PT and several other monitors, notably the Dell 2007WFP and several similar Dell ones tend to have default settings for sharpening that are too high by one or two notches. Most likely this is designed to counteract the blurring of character shapes by subpixel rendering which is the default in MS-Windows and very regrettably now in most GNU/Linux distributions. Anyhow it also makes graphical shapes including GUI elements seem to have a bizare whitish fringe, and regardless I disable subpixel rendering using instead bitmap or well hinted fonts or font renderers instead.

One amazing detail is that on most monitors that default to excessive sharpening it is enabled when the input signal is digital too. This is what makes me think the default for excessive sharpening is related to the prevalence of subpixel rendering, because the original motivation for sharpening was to improve the somewhat fuzzy analogue output signal of many video cards.

111223 Fri More on the demented ways of RANDR

Some of my least esteemed open source developers are GregKH and KeithP, the former for the appallingly opportunistic replacement of devfs with something equivalent but far bigger, more complex, and less maintanable and similarly the latter for his appallingly opportunistic update of the X window system display model with the unfathomable misdesign of RANDR which I have mentioned previously but two aspects of which in particular continue to vex me:

The DPI and gamma of a monitor are now properties of an output rather than of an X screen, or rather both.
The settings for gamma work in different directions for the screen gamma and the output gamma: higher values of the screen gamma make the display look brighter, and the of the output gamma make it look darker. This probably means that the screen gamma value is that of a gamma transformation, but that of the output gamma is meant to indicate the gamma of the display, except that the default gamma of most displays is by convention 2.2 instead of 1.0.

Also, the availability of setting DPI and gamma by output depends on how recent the version of RANDR is, because whichever buffoon had the idea of using outputs in addition to screens to indicate monitors did not realize until rather late that ideally most if not all the properties of screens needed to be available on outputs too.

But it still seems to me that the change from screens to outputs is the bigger and worse one, not just tha the details of the change have been messed up so tastelessly, as the X model of independent screens with specific properties was one of the more elegant aspects of its architecture.

Also, in large part the output model has been a replacement instead of an addition to the screen model, because some X drivers, notably the intel driver, no longer support screens and only support outputs, which sometimes forces the use of the grotesquely inane syntax where the positions of outputs is specified in the Monitor section (which used to be a monitor type definition, not a monitor instance one).

111222 Thu Benchmarks of a recent SSD with telling numbers

Having just mentioned the special performance profile of SSDs I have also just read a review of a typical contemporary SSD storage unit and the benchmark results on this page are particularly telling. The first two tests are about 4KiB random reads and writes, and the notable aspects are:

Transfer rates are 1/5 to 1/10 those of sequential with larger block sizes (at the bottom of the page). This is common and yet it is in theory surprising: SSDs based flash chips have negligible access times, so there should be relatively indifferent to whether accessed sequentially or randomly. What is presumably happening is that reads and writes are physically happening in erase block sizes rather than 4KiB at a time. It seems that the SSDs minimum read block size is the same as or not much smaller than the erase block size.
The write random 4KiB transfer rates are as usual for flash SSDs 2-3 times higher than for similar patterns of reads. This also seems surprising, as writing to an SSD, because of the need to erase a whole block before writing it, should be much slower than reading, also as erasing and writing are relatively slow operations. What is presumably happening is that for writing the SSD controller is using write-behind, and many 4KiB writes are batched in the controller's RAM until there is enough to fill an erase block, then an empty erase block is allocated and filled in one go, while reads cannot be batched because since they are random, there is no suitable read-ahead strategy, and in any case they would come from different erase blocks, while several 4KiB writes with very different addresses can be written together in one erase block, as long as the controller maintains a mapping of which logical 4KiB sector is stored in which erase block are which offset.
The transfer rate of rotating storage devices is far, far worse yet, by a factor of 40-100, than that of flash SSDs for both reading and writing, as the far from negligible access times impact each and every IO operation.

The conclusion is that the SSD must be simulating a 4KiB (or 512B) sector device on a device with a much larger (erase) block size, and that not only erases, but reads and writes have a fairly large minimum transaction size. Which seems confirmed by the next two tests, again random 4KiB random reads and writes but a rate high enough that there are onn average 32 operations, or 128KiB worth of data, queued on the device at any one time:

The read transfer rate is around 8 times higher than that for a smaller queue depth, and nearly half that of sequential transfers. It looks as if a large amount of queued operations allow the SSD controller to sort requests in physical address order, and to batch them thus minimizing the number of physical read transactions performed. Given the ratio of transfer rates it looks as if the controller can get 8×4KiB logical blocks read per physical transaction.
The write rate is also mugh higher, being 3 times higher than that for a queue depth of 1, and is now also nearly half the peak sequential write transfer rate. This means that with a larger queue depth it is possible for many more logical writes to be written behind and batched in erase blocks sized physical erases and writes.
The transfer rate of rotating storage devices is twice as high as for queue size 1, but it is still terrible, as access time dominates.

How SSD controllers work should be fairly clear by now, and the final confirmation comes from the next two graphs, for 512KiB random reads and writes:

For most SSDs read transfer rates are close to peak sequential ones, most likely because most of them have an erase block size not much bigger than 512KiB, and presumably the benchmark 512KiB reads are also 512KiB aligned, not just 512KiB long. This means that in many or most cases logical and physical reads coincide.
For most SSDs write transfer rates are also close to peak sequential ones, but now are somewhat lower than read transfer rates, probably because of the overhead of having to erase before writing, and both operations being a bit slower than reading. But here too the 512KiB transaction size is likely to be not much smaller than the erase block size.
For rotating storage devices 512KiB reads and writes deliver reasonable transfer rates, 2-3 times smaller than peak sequential ones, and this is obviously because there is only one access time to pay every 128×4KiB blocks. Write transfer rates tend to be 50% higher than read rates presumably because the controller for the rotating storage device also does write-behind using its RAM and can reduce the average distance between transactions.

The final two graphs on that page report bulk sequential transfer rates, and they are typical for all types of devices, with SSD read rates typically being faster than write rates (as physical reads are faster than physical erases and writes), and with rotating storage devices having roughly the same read and write rates.

It is notable that the 3TB drive has a transfer rate of 180MB/, rather higher than the typical 120MB/s of 1TB and 2TB units, which indicates a 50% higher recording density, typical of a recording technology generation switch.

It is also notable that several SSDs peak random and sequential transfer rates exceed even by far SATA2 rates, with for example the peak read rate for the unit under review being 529.2MB/s vs. 278.5MB/s, and 273.8MB/s vs. 228.6 for writing.

111219b Sun SSDs partially replacing storage drives, and list of form factors

Interesting article about the recent storage shortages perhaps being eased which also mentions that SSD units are being purchased sometimes to replace hard to find rotating storage, as they are typically manufactured in other areas from those affecting rotating storage production.

As to SSD there is a recent list of SSD device types and form factors including the increasingly popular PCIe card ones. The list is quite useful and it has a number of very clear photographs and several basic benchmarks.

While SSD form factors are important, some of the more obscure aspects of their structure are far more important, because their performance is very highly anisotropic with load, as they are read-any, erase-many, write-once devices with a very large minimum erase size.

111219 Sun Update on monitor and camera reviews

Discussing with someone I made a point that should be repeated here: good quality LCD monitors like my current Philips 240PW9 or even the cheaper ones I also reviewed are on a different level from most traditional monitors. I am often amazed by how good my monitor is (even if I sometimes wish it had a higher DPI or was greyscale).

I am also still very happy about my Samsung WB2000 camera and sometimes I read reviews of comparable cameras like the Nikon S9100, Canon 230HS or Fuji F550EXR (in 8MP mode, as the 16MP mode is too noisy), where they come out as being equally good in most ways, but with a longer telephoto and a less good display and user interface (the WB2000 has features that few equivalent cameras allow, like raw image capture and manual operation including focus).

As to the display of the WB2000 it is encouraging that its 3" AMOLED display is so good, as AMOLED looks like the natural evolution of monitor displays too, not just for portable devices like a camera, smartphone or tablet.

111218 Sat Had already mentioned the display diagonal issue

Accidentally reading an older set of blog entries I noticed that I had already mentioned the issue with display aspect ratios and using diagonals to indicate monitor sizes, as at the time there was a transition from 16:12 to 16:10 aspect ratios, while more recently there has been a transition from 16:10 to 16:9 aspect ratios.

111215 Thu Email not lost between fetchmail and VM

I have just had a very bizarre moment in which I thought I had seen an email message getting lost where it really should not. My home mail arrangements has fetchmail taking messages from remote servers, and injecting them into a local exim MTA for delivery to traditional local per-user mailboxes in /var/mail/.

I then access these mailboxes via Dovecot from VM version 8.0.13 under Emacs version 23.1.1.

I run fetchmail and before it finished I told VM to download mail. Then that froze up for a significant amount of time and I worried. I checked and at least one mail messages that was received did not make it to my mailbox.

The message however was not lost: it was still in the exim queue. Most likely since there was some interlocking of the mail store mailbox the MTA just stopped local deliveries. So no message (I think) was lost, and I just manually started the MTA to clear the small queue.

It is good to know that in general my e-mail chain is fairly reliable. In the past however I have lost email either because of running out of battery power for my laptop, or kernel crashes. In theory email tools use fsync carefully to ensure that new copies are committed to disk before deleting old copies of messages, but some have windows of vulnerability, and a bad crash can damage the filetree too.

More commonly I have list emails by deleting them inadvertently, between backups, so I could not go back and restore them.

In general I am used to the bad old days (when using for example UUCP mail forwarding)) when e-mail had for various reasons appreciable delivery delays and losses, and I never consider e-mail a reliable communication medium. Unfortunately a lot of people have become used to e-mail being both nearly instantaneous and overall fairly reliable, so they use it as a kind of instant messaging system, rather than memo writing.

111212 Mon Rebuilding dynamic table indices without free space irony

I was looking at various DBMS implementation issues, and I found at the Facebook MySQL discussion group this very amusing entry:

InnoDB uses a B-tree for clustered and secondary indexes. When all inserts are done in random order about 30% of the space in leaf nodes will be unused due to fragmentation. Some of the unused space can be reclaimed by defragmenting indexes. Back in the day it was only possible to defragment the primary key index for InnoDB. But with the arrival of fast index creation it is now possible to defragment all indexes of an InnoDB table. Percona has expanded the cases in which fast index creation is done.

The amusement arises because this related directly to an ancient paper by Michael Stonebraker about the tradeoffs between static indices and dynamic indices in Ingres:

@article{Held:1978:BR:359340.359348,
  author	={Held, Gerald and Stonebraker, Michael},
  title		={B-trees re-examined},
  journal	={Commun. ACM},
  volume	={21},
  issue		={2},
  month		={February},
  year		={1978},
  issn		={0001-0782},
  pages		={139--143},
  numpages	={5},
  url		={http://doi.acm.org/10.1145/359340.359348},
  doi		={http://doi.acm.org/10.1145/359340.359348},
  acmid		={359348},
  publisher	={ACM},
  address	={New York, NY, USA},
  keywords	={B-trees, directory, dynamic directory, index sequential access method, static directory},
}

The topic is a performance comparison between the two types of indices, and in particular as to number of disk operations. The first conclusion was that static indices are more compact than dynamic indices, as the latter must contain empty space (usually 30%) because of index page splits when they fill, while static indices can be built with no unused space, and this reduces disk operations if there are relatively few additions to the indices, and in most databases that's the case most of the time, as additions tend to happen in batches, and indices can be rebuilt periodically after those additions. On that basis they set static indices to be the default.

Unfortunately experience showed that overwhelmingly database administrators did not rebuild the static indices after adding many records to the underlying tables, and then complained about DBMS performance getting worse and worse. Which prompted the authors to state that they should have used instead dynamic indices like B-trees and that the experience made them believers in self-tuning systems.

The irony is that in the Facebook case the dynamic InnoDB indices are explicitly rebuilt as if they were static precisely to squeeze out the free space that makes adding new records efficient.

111211 Sun Recent developments in display specifications and hopeless ones

I have recently written some notes about displays in part because at work (and at home) I stare at a display for many hours a day (when I am not moving equipment around or suffering meetings). I have perhaps stricter requirements for displays than many other people that I have met, and for example I wrote that I prefer taller displays as I mostly work on text. But there are other considerations, some of which are practical, in the sense that there are products that satisfy them, and some are not.

I really like LCD IPS or PVA/MVA displays as they give much wider viewing angles without color or brigthness or contrast changing. The color and/or brightness or contrast changes in other displays can mean that on a largish desktop display without moving one's head widely spaced parts of the displays have different color temperatures or different contrast or brightness, and that moving one's head even a little does change them in the less bad cases.

IPS and PVA/MVA displays also have better image quality, with IPS displays often having better colors and PVA/MVA having better contrast (usually thanks to deeper black).

Even if usually IPS and PVA/MVA displays were used in high end largish monitors, usually in the upper price band for office monitors (currently around £400 for 24in) or for the top band (currently around £900 for 24in) for graphics work monitors.

In the past couple of years some low cost variants of IPS and PVA/MVA have appeared, in particular eIPS from LG.Display, cPVA from Samsung, A-MVA from AU Optronics (which correspond to the 3 monitors I have reviewed recently). and result in monitors that usually cost less than half than previous office oriented IPS and MVA/PVA panels. This is in part due to lower panel cost, and in part to lower build quality of the whole monitor, and to some limitations in the panel specification, as they tend to be run in 18 bit color mode plus timewise dithering rather than full 24 bit color mode.

These panel technologies are well described in this page from the very good TFTCentral display information and review site (please donate to support them, as their reviews and information are indeed very good).

I suspect that most of the impulse for the development of cheap higher quality displays based on IPS and PVA/MVA has been driven by two market segments that have grown dramatically in recent years:

Flat wide screen television displays, which require wide good quality viewing angles to allow for group viewing, or for occasional viewing while moving around the living room, and ideally improved contrast and color quality.
Apple cellphones and tablets have generated a large demand for small high viewing angle displays, and Apple has using IPS panels by LG.display for a long time, also in high end laptop displays.
More desktop monitors are being used by consumers to display movies and to edit and display photos, and better colors and also viewing angles help.

Of the above I think that the most important has been the demand for high quality displays for cellphones tablets because of the colossal numbers involved and the education of consumers willing to buy highly profitable top end gadgets. Cellphones and tables are also driving the adoption of AMOLED displays, as well as some cameras.

I am happy that IPS and PVA/MVA displays have finally become popular so that there is some more choice at a wider set of price points. A few things that I would like for at least desktop monitors and that are somewhat unlikely to happen are:

High resolution displays, higher than the currently standard 100DPI. There are some though, mostly for cellphones, ebook readers and tables, that is with 3in and 6-7in or 10in diagonals. These can reach 367DPI and have impressed buyers. One reason that higher DPI displays have not become popular for desktop monitors is that the MS-Windows and Mac OSX GUIs have some elements hardcoded for 96DPI, in particular icon, and higher DPIs means that the icons shrink too much.
True portrait displays, with a less skewed aspect ratio, that is ideally 12:16, or for a single page display 1024 pixels wide and 1280 pixels high, or at most 10:16 with 1024 wide and 1600 high. Some existing office monitors can be rotated in portrait mode, but they tend to be too tall and narrow, and more importantly they perform poorly in portrait mode, because of the 90° rotation making nearly impossible the use of graphics card acceleration.
True grayscale displays, as they could have much better resolution than color ones, much better luminosity and contrast, and much lower cost. Current LCD displays have pixels divided in three strips for the three primary colors, an arrangement that subpixel rendering tries to exploit (poorly) to improve apparent resolution, and most importantly the display is a complex sandwich of three layers of electronics and polarizers. The three layer sandwich in particular causes a lot of cost and problems, and there are for special applications grayscale monitors of amazingly better visual quality (and there were also amazingly good grayscale CRT displays). OLED displays have similar issues due to having subpixels too and can have multiple layers as well, even if without the polarizers needed by LCDs
Unfortunately many people not having seen the alternative cannot imagine just good a grayscale display can be.

Unfortunately the biggest markets for displays are not for desktop office monitors, but for entertainment purposes, either in very large televisions and and low resolution wide aspect ration landscape color monitors is almost the only option, except for rather small portable device displays.

There have been pretty good portrait page aspect ratio high resolution grayscale displays in recent times, but only for portable ebook readers, even if some of these also have web browsers.

111210 Sat Screen aspect ratio and diagonal size for displays

Reading a review of the Dell XPS 14z laptop I was very amused, or perhaps not, to read that:

The 14z is small for a 14in laptop; it's actually slightly smaller than a 13in MacBook Pro.

The reason is that the Dell XPS 14z has a 1366×768 display, and the MacBook Pro a 1280×800 one, and the diagonal of a display is sort of meaningless without an indication of the aspect ratio, as the more extreme the latter, the smaller a screen is with the diagonal being the same.

Indeed I now own a Toshiba U300 13in laptop and a Toshiba L630 14in laptop and they are essentially the same size as the U300 is deeper and the L630 is wider.

The 13in screen measures 285mm×178mm and the 14in screen 292mm×165mm. The 13in screen at 507cm² is actually bigger than the 14in screen at 482cm², because it has a less skewed aspect ratio.

By switching from 16:12 (that is 4:3) to 16:10 and then to 16:9 aspect ratios display manufacturers have been able to sell smaller displays while quoting the same diagonal sizes, and many people seem to have not realized that.

I don't like the currently popular 16:9 aspect ratio as it is too skewed, and I prefer taller screens because most of my computer activity is editing text files and documents and reading mostly text based web pages, and for all these vertical context is rather more important than width.

Indeed for text long lines seem significantly less readable than shorter lines, and I think that there is a consensus that text document width should be around 60-70 characters, or in most indoeuropean language around 10-15 words.

That is part of the reason why I have a 24in 16:10 screen with a height of 1200 pixels even if I can work with the of the often cheaper 23in 16:9 screens with a height of 1080 pixels that are popular today. The latter are really comparable to 21.5in 16:10 screens because of the skewed aspect ratio:

The 23in 16:9 screen has 2074k pixels over 510mm×287mm or 1464cm².
The 24in 16:10 screen has 2304k pixels over 519mm×324mm or 1681cm².
If we go by screen surface area the 23in 16:9 screen is equivalent to a 21in 16:10 screen, and by number of pixels it is equivalent to a 21.6in 16:10 screen.

I don't particularly need the 1920 pixel width, and I would be satisfied with a 1600×1200 pixel size, as widths allows pages to be seen side-by-side and 1600 pixels allows two text pages to be seen together. But that is the old 16:12 aspect ratio that is no longer popular.

I have tried at work 30in 2560×1440 640mmx400mm displays and they were almost too big to use, in particular from a typical desk viewing distance it was not easy to see the whole screen at once, and sometimes I had to push myself back; I have even tried for some time two of these displays side by side and I almost never looked at the second display as it was too far to the side.

111209 Fri Tanenbaum on BSD, Linux and L4 adoptions

Writing about some Mac OSX CUPS and Mac OSX X11 fonts issues reminded me that Mac OSX is really just a UNIX-like operating system with its own GUI and Objective-C based application libraries. This then reminded me of a recent interview with Professor Andrew Tanenbaum.

The interview has many interesting aspects, for example the mention of his long forgotten Amoeba kernel and more in general of capability system architectures that many think were invented by Jack Dennis in 1967.

But the relevant one is that he points out that since Mac OSX is largely based on Mach and FreeBSD in effect it is the second most popular kernel and operating system after MS-Windows NT with an installed based several times larger than GNU/Linux:

LinuxFr.org : Do you think the Linux success is a proof he was right or is it unrelated?

Andrew Tanenbaum : No, Linux "succeeded" because BSD was frozen out of the market by AT&T at a crucial time. That's just dumb luck. Also, success is relative. I run a political website that ordinary people read. On that site statistics show that about 5% is Linux, 30% is Macintosh (which is BSD inside) and the rest is Windows. These are ordinary people, not computer geeks. I don't think of 5% as that big a success story.

Mach actually is also a microkernel, but the version of Mach used as the foundation of the Mac OSX kernel was fairly monolithic. However He also adds that the (mostly German) L4 research microkernels are deployed on several hundred million mobile phones based on some Qualcomm chipsets.

But then if embedded systems matter as to league tables QNX kernel has been a rather popular embedded microkernel for decades.

Linux as an operating system kernel is nothing new or special, incorporating almost only technology from the 1960s and 1970s, and several aspects of it seem to have been designed without any reference to decades of previous operating systems research, reinventing issues and limitations that were solved long ago.

Its main claim to fame is that like MS-Windows NT it works adequately for most purposes and is still mostly maintainable; not that it does things well or in an advanced way. Indeed I use it mostly opportunistically, because it is a better implementation of most of the UNIX architecture than MS-Windows NT, and it is popular enough that it is more widely supported.

111208 Thu Some Mac OSX programs hanging

I was asked by my friend with Mac OSX™ some time ago to look into why some programs, including its text editors, would hang during startup. To my surprise this was because they were trying to contact the print dæon cupsd and had a fairly long timeout for a reply.

Even more surprisingly the print dæmon would not start because the relevant file /usr/sbin/cupsd was not there, even if the rest of the CUPS subsystem files were there.

The first thing was to restore the file. On a GNU system based on a nice package manager like RPM (or even a less nice one like DPKG) I could just first verify the presence and content of the relevant package files, then extract the file and put in place. Apparently Mac OSX does is somewhat MS-Windows like and has just installers and lacks the ability to verify installed files and a work with installation files.

But I found a shareware tool to do that: Pacifist that fills that gap. It then turns out that bizarrely Mac OSX is installed from a very small number (two IIRC) of very large packages, instead of the more fine grained approach used by most GNU distributions, and they are on the installation DVD.

Using Pacifist it was easy to recover /usr/sbin/cupsd and the owner of the system told me that fixed some other glitches and delays. It looks like that either the GUI libraries or most applications open a connection to the print server during startup, which is moderately strange.

It could be instead that recent versions Mac OSX uses launchd to instantiate server dæmons and this meant that the connection to the CUPS port succeeded, because it was held open by launchd and then there was no response from the missing cupsd and thus a long wait.

111207 Wed Some Mac OSX NX font issues

Some time ago I have been asked to look into why some programs would not start on a GNU/Linux server via NX when started from a Mac OSX client but would when started from MS-Windows. The reason was the lack of some fonts from the Mac OSX emulation of X-Windows. It turns out that the X11 emulator uses the standard OSX fonts (in /System/Library/Fonts/ and /Library/Fonts/), plus those in per-user directories ($HOME/.fonts/ and $HOME/Library/Fonts/), and is missing some of the more common MS-Windows and Linux fonts.

It is sufficient to add them in the usual ways to the relevant Mac OSX font directory (glob al or per user) and then to be sure to use xset fp+ to add them to the X font path, and $HOME/.fonts/fonts.conf to add them to the FontConfig font paths.

111205b Mon Online sources for UW Lisp for the UNIVAC 1100

I was discussing by email compilation techniques and in particular that both interpretation and compilation are reductions, and I made the example of the compiler of the UW Lisp compiler for the UNIVAC 1100 mainframe series, which has been for a long time a source of enlightenment and admitration for me.

I found some of the printouts of its sources and hoping against all hope I searched for some of the lines in those sources and I was delighted to find that the full manual and source for UW Lisp for UNIVAC 1100 have been put online by someone to whom I have grateful.

UW Lisp is notable for the considerable terseness and elegance of both its language and implementation. The assembler source of the interpreter and runtime system is 5,000 lines long, and among other modules the source of the very powerful structure editor is 120 lines long, and the full source of the compiler to machine code is 720 lines long (at least in the version I have, newer versions are a bit longer).

But then I am reminded that UNIX version 7 would support 3 interactive users in 128KiB of memory, and dozens in 1-2MiB, doing software development with VI and compiling with Make.

111205 Mon Remembering 'rh', a much better alternative to 'find'

Having just published a long delayed entry about an excellent page about find I was remind of a long forgotten utility which is a much better alternative to find called rh written over twenty years go by Ken Stauffer.

It is ridiculously better than find. Some examples of search expressions from its manual page:

EXAMPLES
       The following are examples of rh expressions.

               (mode & 022) && (uid == $joe );

       Matches  all  files  that  have  uid  equal to username ``joe'' and are
       writable by other people.

               !uid && (mode & ISUID ) &&
               (mode & 02);

       Matches all files that are owned by root (uid==0) and that have set-uid
       on execution bit set, and are writable.

               (size > 10*1024) && (mode & 0111) &&
               (atime <= NOW-24*3600);

       Finds  all executable files larger than 10K that have not been executed
       in the last 24 hours.

               size < ( ("*.c") ? 4096 : 32*1024 );

       Finds C source files smaller than 4K and other files smaller than  32K.
       No other files will match.

               !(size % 1024);

       Matches files that are a multiple of 1K.

               mtime >= [1982/3/1] && mtime <= [1982/3/31];

       Finds files that were modified during March, 1982.

               strlen >= 4 && strlen <= 10;

       This  expression  will print files whose filenames are between 4 and 10
       characters in length.

               depth > 3;

       Matches files that are at a RELATIVE depth of 3 or more.

               ( "tmp" || "bin" ) ? prune : "*.c";

       This expression does a search for all "*.c" files, however it will  not
       look  into  any directories called "bin" or "tmp". This is because when
       such a filename is encountered the prune variable is evaluated, causing
       further  searching  with  the current path to stop. The general form of
       this would be:

               ("baddir1" || "baddir2" || ... || "baddirn") ?
                       prune : <search expr>

Some of advanced examples:

ADVANCED EXAMPLES
       The following examples show the use of function definitions  and  other
       advanced features of Rh.
        Consider:

               dir()
               {
               return ( (mode & IFMT) == IFDIR );
               }

       This  declares  a  function  that returns true if the current file is a
       directory and false otherwise. The function

       dir now may be used in other expressions.

               dir() && !mine();

       This matches files that are directories and are not owned by the  user.
       This assumes the user has written a mine() function. Since dir and mine
       take no arguments they may be called like:

               dir && !mine;

       Also when declaring a function that takes no arguments the  parenthesis
       may be omitted. For example:

               mine
               {
               return uid == $joe;
               }

       This declares a function mine, that evaluates true when a file is owned
       by user name 'joe'. An alternate way to write mine would be:

               mine(who)
               {
               return uid == who;
               }

       This would allow mine to be called with an argument, for example:

               mine( $sue ) || mine( $joe );

       This expression is true of any file owned by user name 'sue' or  'joe'.
       Since  the  parenthesis  are  optional for functions that take no argu‐
       ments, it would be possible  to  define  functions  that  can  be  used
       exactly  like  constants, or handy macros. Suppose the above definition
       of dir was placed in a users $HOME/.rhrc Then the command:

               rh -e dir

       would execute the expression 'dir' which will print  out  all  directo‐
       ries.  Rh functions can be recursive.

Some examples of canned functions in my own ancient .rhrc:

KB              { return 1<<10; }
MB              { return 1<<20; }

avoid(p)        { return (p) ? prune : 1; }

months          { return 30*days; }
ago(d)          { return NOW-d; }

modin(t)        { return mtime > (NOW-t); }
accin(t)        { return atime > (NOW-t); }
chgin(t)        { return ctime > (NOW-t); }

modaft(t)       { return mtime > t; }
accaft(t)       { return atime > t; }
chgaft(t)       { return ctime > t; }

sg              { return mode & 02000; }
su              { return mode & 04000; }
si              { return mode & 06000; }

w               { return mode & 0222; }
x               { return mode & 0111; }

t(T)            { return (mode&IFMT) == T; }

f               { return (mode&IFMT) == IFREG; }
s               { return (mode&IFMT) == IFLNK; }
d               { return (mode&IFMT) == IFDIR; }

l               { return nlink > 1; }

mine            { return uid == $$; }

c               { return "*.c" || "*.h"; }
cxx             { return "*.C" || "*.H"; }

Z               { return "*.Z"; }
W               { return "*.W"; }
gz              { return "*.gz"; }

junk            { return "core"
                  || ("a.out" && mtime <= ago(2*days))
                  || "*.BAK" || "*.CKP" || "*[~#]"; }

111204 Sun Using the GPG2 agent instead of the OpenSSH agent

Belatedly I have discovered that the GPG2 agent can handle SSH keys too and be a drop in replacement (when the enable-ssh-support option is enabled) for the OpenSSH agent.

However I was a bit surprised by it because it is somewhat rough both in design and implementation and also because it works fairly differently from the OpenSSH agent:

The main difference is that it caches OpenSSH key permanently on disk, not just temporarily in memory. This means that if you set the OpenSSH agent to be the GPG2 agent you need to use ssh-add only once, and then a copy of the supplied private keys will be stored in $HOME/.gnupg/private-keys-v1.d/ encrypted with a password that you must supply. This copy will independent of the original in $HOME/.ssh/.
I think that this first difference is neither a problem nor an advantage, but something that is not immediately obvious, at least to me.
The second major difference is that the GPG2 agent has a default limited lifetime for keys held in memory, and whiel this is just a different default, it is welcome.
The third major difference is that since the GPG2 agent supports multiple applications, and it stores copies of key encrypted with its own password, and periodically deletes them, it and not the application needs to prompt the user for the key encryption password whenever an application wants to use the key and the key needs to be decrypted.
The problem is that this prompts may happens at any time as a consequence of the operation of potentially any application, so the prompt from the GPG2 agent is quite disconnected from the application.
For this reason by default the GPG2 agent comes with a set of prompt programs which usually pop up a request on the current (whatever that means) X display screen.

Starting with these prompt programs the PGP2 agent has some issues (at least on version 2.0.14 on ULTS10):

The GPG2 agent supports OpenPGP keys, OpenSSH keys, S/MIME keys and SC physical keys, and the latter is done via another dæmon started by the GPG2 agent called scdaemon. Unless you do have a SmartCard setup disable it.
The GPG agent can only be started in the background as a standalone server taking commands from a named pipe. If started in the foreground it behaves as a program, that is it takes commands from the standard input.
I could only get the GTk+ prompter program to work reliably.
It is difficult to just run the prompter inside a shell, and let it ask on that terminal the passwords. In theory the /usr/bin/pinentry-curses prompter does something to grab a terminal and prompt on it, but it seems quite buggy. There is also the problem of what happens to the application running on that terminal. Because the prompter is invoked by gpg-agent which is running as a background progress not attached to any terminal.
It is quite important to ensure that the GPG2 agent logs its actions, because setting it up is not totally trivial.

Overall the GPG2 agent is a good substitute for the OpenSSH agent, and I am using it by default.

111203 Sat Blog updates and actually using hypertext

Because of pressure from other engagements I have not been updating my blog or notes as frequently as I wanted, and I have accumulated quite a few draft updates, ranging from fairly complete to just pointer to interesting bits I wanted to write about.

This has made me question my blogging practices: because this means that while I have had time to jot down quick notes and drafts I haven't had the time to publish them into blog entries.

Part of the reason is that I have been striving to keep some standards in my publishing, for example in a previous entry I have remarked about the usefulness of fine tagging of hypertext and I try to enrich my published entries with numerous links to things I mention. I have realized that the latter is one of the things that takes most of my time in publishing my notes, because searching for the most opportune link takes time, even with a faster internet link and computer, in part because there is some kind of feedback that makes other publishers build more complicated pages to use up technology improvements, which is similar to what has happened in applications.

The other main reason is that sometimes I discuss non trivial issues, and these require something more involved than just typing in words, like research and thought and revisions.

While I don't want to descend to the level of the bloggers at some major sites who publish word dumps or just reblog other people's news with little comment, I have decided to try to simplify a bit my blog entries by making them less rich in hyperlinks and a bit closer to jottings than essays. This is sad because after all hypertext is made more valuable by hyperlinks, so in a compromise I'll actually write places where I would like to put an additional hyperlink as anchors but leaving the link unwritten to be occasionally updated later.

I think that I will continue to use fine grained tagging because I am fairly quick at it.

Another small change is in the organization, as I have realized that I must make a bigger distinction between time dependent content and content that is time independent or has an indefinite currency. The example I have in mind is reviews, which usually are fairly perishable as products they describe disappear from the market fairly quickly thanks to constant updates; similarly for shopping notes. This means that I will structure content on my site as two blogs, one being this technical blog and another (yet to be setup) for non technical matters (very few in this site), and I will keep existing pages with less timewise essays and information, for example the always popular Linux ALSA and Linux fonts notes.

Following the above I have been putting felshing out for publishing several items in my long list of blog entries to write, and these will appear in the site syndication feed usually with a date in the past that being when I made the note, as that's the date of currency of the information in those entries.