Software and hardware annotations 2007 March
This document contains only my personal opinions and calls of
judgement, and where any comment is made as to the quality of
anybody's work, the comment is an opinion, in my judgement.
- 070331d
Flash storage with widely different transfer rates
- An interesting group test in
PC Pro
reports transfer rates for flash storage devices. They range
from a few MiB/s to around 15MiB/s. I was particularly
disappointed to see that xD
storage was by far the slowest, as I recently bought a
digital camera that uses xD
flash storage. However, the bigger surprise was the wide overall
spread of the results, and how much more expensive the faster
flash storage is. It is a pity that the group test did not
include a test of write wear and of write-wear compensation
strategies.
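Just to make the spread concrete, here is a rough back-of-the-envelope
sketch (in Python) of what those rates mean when offloading a card; the
1 GiB card size is an assumption chosen for illustration, and the rates
are rounded from the range reported in the test.

    # Rough illustration only: the card size is assumed, the rates are
    # rounded from the group test's range (a few MiB/s up to about 15 MiB/s).
    CARD_MIB = 1024                          # assume a 1 GiB card

    for rate_mib_s in (3, 8, 15):            # approximate low, middle, top rates
        minutes = CARD_MIB / rate_mib_s / 60
        print(f"{rate_mib_s:2d} MiB/s: about {minutes:.1f} minutes for {CARD_MIB} MiB")

At the slow end offloading a full card takes several times longer, which
puts the price premium of the faster cards in perspective.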
- 070331c
Check of a 5TB filesystem takes 12 hours
- One of my recurring worries is that while RAID (in particular
RAID0 and variants, not the abysmal RAID[345]) has enabled large
filesystems with impressive aggregate transfer rates, since
transfers can take advantage of the parallelism in RAID,
filesystem checking and recovery has remained serialized, and
that is the true limit to filesystem size growth.
Filesystem checking and recovery can be very slow, and a
time of over two months has been reported
for a 1.5TB ext3
filesystem that was heavily
damaged. Another interesting datapoint is the recent
report of a 12-hour check time for a 5TB XFS filesystem (with 5 billion
inodes, that is, an average file size of just 1KB) that was only
lightly damaged. For a production system an interruption in service of
12 hours for filesystem check and repair following an outage can
be an unpleasant prospect.
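As a sanity check it is worth restating those reported figures as rates;
the little sketch below is just arithmetic on the quoted 5TB, 5 billion
inodes and 12 hours, not a measurement of mine.

    # Arithmetic on the figures quoted above (5TB, 5 billion inodes, 12 hours);
    # "TB" is taken as decimal terabytes.
    capacity_bytes = 5e12
    inodes = 5e9
    check_seconds = 12 * 3600

    avg_file_bytes = capacity_bytes / inodes          # ~1000 bytes, about 1KB
    inodes_per_second = inodes / check_seconds        # ~116,000 inodes/second
    microseconds_per_inode = 1e6 / inodes_per_second  # ~8.6 microseconds

    print(f"average file size: {avg_file_bytes:.0f} bytes")
    print(f"check rate:        {inodes_per_second:,.0f} inodes/second")
    print(f"budget per inode:  {microseconds_per_inode:.1f} microseconds")

Even at well over a hundred thousand inodes per second, a serialized pass
over billions of inodes takes hours: the check rate does not scale with
the RAID parallelism that made the filesystem that big in the first place.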
- 070331b
Simplicity, APIs, networking
- I have just mentioned the famous
Worse is Better
argument by Richard Gabriel, which revolves largely around
simplicity, and I have had quite a few discussions about
simplicity recently. Now simplicity is not a simple subject, and
it requires careful consideration. Whatever it is, however, it
has some great advantages, some of which are not often voiced,
for example:
- As a starting point simplicity is a lot more attractive
than complexity, because it is possible (and perhaps too
easy) to turn something simple into something complicated,
but the reverse is very difficult.
- Simple structures are easier to debug, because debugging
is much harder than designing, as this quote from
The elements of programming style
says:
Everyone knows that debugging is twice as hard
as writing a program in the first place. So if you are as
clever as you can be when you write it, how will you ever
debug it?
and it does not apply just to programming, as it is really
about the mind.
Starting simple, however, works well only if the simple
initial structure is growable, and that usually happens only if
it has conceptual and architectural integrity. The major examples
are the UNIX kernel and C library
APIs,
which were designed with remarkable wisdom and integrity.
But the bigger issue is what simplicity actually is, as there
can be different types of simplicity, and the Worse
is Better paper itself compares simplicity of
implementation with simplicity of
interface. The argument is that simplicity of implementation
matters more, and one reason is that implementations have a
direct and actual cost, while interfaces have a potential and
indirect one.
The cost of a complex implementation must be paid up-front, but
the cost of a complex interface has to be paid only if it gets
used much, and that is always uncertain. One relevant example
that I have been
discussing recently
quite a bit is whether bridging or routing is in some sense
simpler
. Well, here my impression is that
routing, under the one LAN, one subnet
principle, is simpler because under it the structure of the
network is the same at all levels (physical, link, network);
while bridging does somewhat complicated things behind the
scenes, like optimizing a flooding algorithm via learning and
spanning trees. I think that the key here is that to achieve
efficiency bridging has to be too clever, and because of this it
is rather harder to debug than routing, where following traffic
paths is much easier (thanks to the visibility of the structure
of the network and ICMP); a toy contrast of the two forwarding
decisions is sketched at the end of this entry.
Put another way, bridging runs counter to
a principle
that has proved itself over and over again in the evolution of
(inter)networking technology
over the past 25 years:
that it is better to have a dumb, oversized network where the
intelligence is in the edge nodes than a clever, optimized
network where the intelligence is in the infrastructure.
Part of the reason is that conceptual as well as
implementation simplicity makes problem analysis and repair
faster and easier. This matters because in many practical
situations what matters is not that there never be service
interruptions, but that service be restored quickly; warm
switchover or even cold repair can then do instead of hot
redundancy.
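The toy sketch below (in Python) is only an illustration of the contrast
made above, not any real switch or router implementation; every MAC
address, prefix and port number in it is invented. It contrasts the
hidden, traffic-derived state of a learning bridge with the explicit
table lookup of a router.

    # Toy contrast of the two forwarding decisions discussed above; all MAC
    # addresses, prefixes and next hops are invented for the example.
    import ipaddress

    class LearningBridge:
        """Flood-and-learn: forwarding state is implicit, learned from traffic."""
        def __init__(self, ports):
            self.ports = ports
            self.mac_table = {}                    # MAC address -> port, learned

        def forward(self, src_mac, dst_mac, in_port):
            self.mac_table[src_mac] = in_port      # learn where the source lives
            if dst_mac in self.mac_table:
                return [self.mac_table[dst_mac]]   # known destination: one port
            # unknown destination: flood out of every other port
            return [p for p in self.ports if p != in_port]

    class Router:
        """Explicit next-hop lookup: the network structure is visible in the table."""
        def __init__(self, routes):
            self.routes = [(ipaddress.ip_network(prefix), nh) for prefix, nh in routes]

        def forward(self, dst_ip):
            addr = ipaddress.ip_address(dst_ip)
            matches = [(net, nh) for net, nh in self.routes if addr in net]
            if not matches:
                return None                        # no route: the failure is visible
            return max(matches, key=lambda m: m[0].prefixlen)[1]   # longest prefix wins

    bridge = LearningBridge(ports=[1, 2, 3])
    print(bridge.forward("aa:aa", "bb:bb", in_port=1))   # unknown: flood -> [2, 3]
    print(bridge.forward("bb:bb", "aa:aa", in_port=2))   # learned earlier -> [1]

    router = Router([("10.1.0.0/16", "gateway-A"), ("10.2.0.0/16", "gateway-B")])
    print(router.forward("10.2.3.4"))                    # -> gateway-B

The point is not the amount of code but where the knowledge lives: the
bridge's table is a side effect of whatever traffic happened to flow,
while the router's table is the network design written down, which is
why following traffic paths is so much easier in the routed case.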
- 070331
Google and lack of non USA search engines
- BusinessWeek discusses whether
Google is too powerful
and my impression is that they do so from a wholly USA
perspective. From an international perspective it is still the
case that virtually all major search engines are USA based or
owned (and thus also a large resource for the USA government).
The French and German governments have realized this and have
started the
Quaero and Theseus
search engine projects. Unfortunately they haven't produced much
so far, probably because they haven't adopted a
worse is better
approach
and have gone, like so many big projects, for high goals to be
delivered someday rather than for modest goals to be delivered
quickly, which is what Google did.
- 070304
Bridged internetworking and Ethernet's past
- Given my recent
reflections about bridged internetworking
(an oxymoron
:->
) I have gotten into some amusing
discussions, in which I have made some secondary, but perhaps
not so secondary, points about it.
The first is that in many places bridged internetworking
is a habit from the past, because before
IP became dominant
there were quite a few protocols, like Novell IPX, NetBIOS, or
AppleTalk, that were not routable (or not easily routable), as
they had been designed for small offices or homes with a single LAN
and no network administrators. Being designed for a single LAN
they also tended to use broadcasts for some sort of
autoconfiguration.
Typical scenario: a law practice using Novell Netware
expands from one floor to two floors, or opens a new office in a
nearby town, or merges with another practice across town. It
becomes very tempting to just bridge the old and the new LAN as
this is the quickest and easiest way to avoid confronting
reconfiguration of clients or servers to account for routing (a
temptation not dissimilar to the one that leads to having a
distinct physical server for every network service, even trivial
ones). Repeat this a few times and an international bridged LAN
can happen in a few years. Nowadays most networking software is
based on IP, which can be routed, but autoconfiguration and some
important services like DHCP or
mDNS
or
NetBIOS browse lists
rely on broadcasts (or multicasts) that are not routable (or not
easily routable).
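As a minimal illustration of why such broadcast-based discovery ties
services to a single (possibly bridged) LAN, here is a sketch of a
generic discovery request over UDP broadcast; the port number and
payload are invented, and this is not a real DHCP, mDNS or NetBIOS
implementation.

    # Minimal sketch of broadcast-based discovery: the request goes to the
    # limited broadcast address, which bridges flood to every segment but
    # routers do not forward. The port and payload are made up.
    import socket

    DISCOVERY_PORT = 47900          # arbitrary example port, not a real service

    def send_discovery(payload=b"who-serves-me?"):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        # 255.255.255.255 reaches every host on the local bridged LAN,
        # however large, but stops dead at the first router.
        s.sendto(payload, ("255.255.255.255", DISCOVERY_PORT))
        s.close()

    def answer_discovery():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", DISCOVERY_PORT))           # listen for broadcasts
        data, client = s.recvfrom(1500)
        print(f"discovery request {data!r} from {client}")
        s.close()

    if __name__ == "__main__":
        send_discovery()

On a routed network the same kind of service needs a relay on each
subnet (as with DHCP relay agents) or explicitly configured server
addresses, which is exactly the reconfiguration effort that bridging
lets people postpone.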
Another argument I make against bridged LANs is that they
make investigating network problems a lot harder, as the various
provisions within the IP family that help problem investigation,
status, and monitoring (principally but not just the ICMP
protocol) are simply missing from Ethernet (and with VLAN
tagging things are even worse). Thus even tools as simple
and useful as ping
or traceroute
are
missing (or useless) in a bridged LAN, and
Yersinia
and others are not really troubleshooting tools :-)
.
Then the big irony here is that bridged LANs in this way
reproduce (at the
link instead of the physical layer)
one of the least amusing features of old-style coaxial cable
plants, from Ethernet's distant past, in which a single
segmented cable would run through a site: that the easiest way
to locate a problem area is to start unplugging bits of the LAN
until the problem goes away...
There is one case where I use bridged LANs without much
trouble, and it is when I use inexpensive throw-away
mini-switches to provide multiple sockets, a bit like USB hubs, and
also to spare the underlying socket from repeated insertions
(if a socket breaks on a cheap mini-switch then the switch can be
thrown away and replaced without much trouble).