Linux package management
A less schematic discussion of many of the same concepts is
- A package contains a collection of names and files.
- So does a filesystem; indeed a package contains a mini
- Traditional method: just copy the package into the
- Cannot uninstall easily.
- Overlaps among packages undetected.
- Overwrites because of overlaps cannot be undone.
- Why is this bad?
- Lack of reproducibility for mass installs.
- Lack of reinstallability, especially configs.
Bear in mind the concept of unavoidable functionality: it
must be implemented; the only choice is
whether in the computer or the user's head (e.g. spooling,
- Cascading package merging, undoable. Reasons:
- Configuration: default config, site config, host
config, each in a separate package.
- Very simple requirements, implementation hard:
- Add package to filesystem.
- Given a package, list all files in it (by
filesystem, which includes listing files not installed
because overridden).
- Given a filesystem (which can be just a simple
file), list all packages in it (this includes
listing all files that are not in any package).
- Remove package from filesystem, restoring state
before adding (undoing overrides).
- Additional requirements:
- List set of package installation prerequisite
- List set of package provided capabilities.
- Handle very large numbers of packages and files.
- Overlaps must be detected.
- Overlapped files must be saved on install.
- Saved overlaps must be restored on uninstall.
- This can happen to several levels.
- Sophisticated state tracking.
- What about partially overlapping files?
Not a problem to be solved at this level; thus the trend
towards splitting files into directories.
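The requirements above (add, list by package, remove with undo, overlaps saved to several levels) can be sketched as a toy in-memory model. This is only an illustration under assumptions: the class name `LayeredFilesystem`, its methods, and the example package names are hypothetical, and file contents are plain strings rather than real files.

```python
from collections import defaultdict


class LayeredFilesystem:
    """Toy model: every path holds a stack of (package, content) layers."""

    def __init__(self):
        self.layers = defaultdict(list)  # path -> [(package, content), ...]

    def add(self, package, files):
        """Install a package; return the paths where it overrode something."""
        overlaps = []
        for path, content in files.items():
            if self.layers[path]:
                overlaps.append(path)  # overlap detected, old layer is kept
            self.layers[path].append((package, content))
        return overlaps

    def remove(self, package):
        """Uninstall; whatever the package overrode becomes visible again."""
        for path in list(self.layers):
            kept = [layer for layer in self.layers[path] if layer[0] != package]
            if kept:
                self.layers[path] = kept
            else:
                del self.layers[path]

    def visible(self):
        """The merged view: the top layer of every path."""
        return {path: stack[-1] for path, stack in self.layers.items()}

    def files_of(self, package):
        """All files of a package; True marks files currently overridden."""
        return {
            path: stack[-1][0] != package
            for path, stack in self.layers.items()
            if any(owner == package for owner, _ in stack)
        }
```

With default, site and host configuration installed as three packages over the same path, removing the host package makes the site version visible again, which is the cascading behaviour asked for. The sophisticated state tracking is exactly the per-path stack.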
- A filesystem is a classification system.
- It is implemented as a set of names that map (many-to-one) to
a set of files.
- Directories need not actually be implemented, can be
entirely virtual (but beware search permissions). Names don't
necessarily have any given structure.
- Each name can consist of keywords, listed in any
usr/lib/emacs/site-init.el same as
- Any set of keywords defines a directory.
- Any unique subset of a file's keywords identifies it.
cd changes the set of default keywords.
- This solves the package problem.
- This does not exist, we need kludges.
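Since no such filesystem exists, the idea can only be shown as a sketch. The class `KeywordNamespace` below is hypothetical, as are the example file names; it treats each name as an unordered set of keywords, so any ordering names the same file, any keyword set acts as a directory, and `cd` just sets default keywords.

```python
class KeywordNamespace:
    """Toy namespace where a name is an unordered set of keywords."""

    def __init__(self):
        self.files = {}             # frozenset of keywords -> content
        self.default = frozenset()  # keywords implied by the last cd()

    @staticmethod
    def _keywords(name):
        return frozenset(k for k in name.split("/") if k)

    def create(self, name, content):
        self.files[self._keywords(name)] = content

    def cd(self, name):
        """Change the set of default keywords."""
        self.default = self._keywords(name)

    def _matches(self, name):
        wanted = self.default | self._keywords(name)
        return wanted, [keys for keys in self.files if wanted <= keys]

    def ls(self, name=""):
        """Any set of keywords defines a directory: list what remains."""
        wanted, matches = self._matches(name)
        return sorted("/".join(sorted(keys - wanted)) for keys in matches)

    def lookup(self, name):
        """Any unique subset of a file's keywords identifies it."""
        _, matches = self._matches(name)
        if len(matches) != 1:
            raise LookupError("%r matches %d files" % (name, len(matches)))
        return self.files[matches[0]]
```

Listing by the package keyword (for example `ls("emacs")`) gives the package view, and listing by `usr/lib` gives the merged view, both from the same set of names.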
- The set of keywords must be listed in an order given at
- Each different ordering defines a different name.
- Which ordering for package installation? In practice two:
- Package name leading solves the package problem.
- However, because of paths, UNIX/Linux uses the merged
tree structure (except for
/opt), with several
trees and subtrees.
- How to preserve package ownerships in merged trees?
- In-band multiple views, via link (hard or
symbolic) farms: one canonical (because of overlap
restore) package leading view (the depot), one
- Out-of-band databases: one canonical merged view,
database tracks package ownership.
- In-band solves to a large extent the package problem, but
has other problems:
- Hard links don't span partitions, don't apply to
cpio not suitable.
- Symbolic links are ugly, inefficient, fragile.
- Out-of-band requires a lot of extra work. Does not easily
solve the overlap restore problem.
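A minimal sketch of the out-of-band approach, assuming an SQLite table as the ownership database (no real package manager is being reproduced here; the function names and the table layout are hypothetical). It shows that ownership queries and overlap detection are easy, while nothing in the database helps to restore an overridden file.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE owns (path TEXT PRIMARY KEY, package TEXT NOT NULL)")
    return db


def register(db, package, paths):
    """Record ownership; refuse the whole package if any path is already owned."""
    try:
        with db:  # one transaction: either all paths are recorded or none
            db.executemany(
                "INSERT INTO owns VALUES (?, ?)", [(p, package) for p in paths]
            )
    except sqlite3.IntegrityError:
        return False  # overlap with an installed package
    return True


def owner(db, path):
    row = db.execute("SELECT package FROM owns WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None


def unowned(db, tree_paths):
    """Paths of the merged tree that belong to no package."""
    return sorted(p for p in tree_paths if owner(db, p) is None)
```

Allowing the overlap instead of refusing it would need a second store for the overridden file contents, which is the extra work mentioned above.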
The single most important part is the list of files that are
part of the package. In theory this is all that should be
necessary and desirable; any other information might render
the package installation stateful.
- The other issue is whether the paths are absolute or
relative, or both. Ideally relative, but rare.
- Often packages contain pre/post install/remove scripts. These
are bad news, because the package state is carried
inside more or less invisible or incomprehensible code.
- They are usually used to edit configuration files or to
start/stop daemons, automagically, something that the user
should do themselves.
- Metadata is not bad as long as it does not affect the
semantics of the package.
- It usually includes both package and packager
- Particularly important is version information: for the
original sources and for the particular package instance.
Not so much package specific, but package system specific.
Some packages contain only dependencies, usually
called virtual packages.
- Build requisites
If the package manager provides a particular build logic, a
package might be tagged with the list of packages that must
be installed in order to build it.
- Absolutely essential for distribution builders.
- It creates a number of very tricky situations.
- Runtime requisites
- List of packages or capabilities.
- Sometimes list of shared libraries (bad
- Runtime provides
- These are most useful if generic: most packages require
functionality from another package, not a specific package.
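Why generic provides are useful can be shown with a small resolver sketch. The data layout (a dict with `requires` and `provides` lists), the function name `install_order`, and the rule of picking the first provider are all assumptions made for the illustration; real package managers use more elaborate policies.

```python
def install_order(packages, wanted):
    """Return an install order for `wanted`, resolving capabilities.

    packages: name -> {"requires": [capability, ...], "provides": [capability, ...]}
    Every package implicitly provides its own name.
    """
    providers = {}
    for name, meta in packages.items():
        for cap in [name, *meta.get("provides", [])]:
            providers.setdefault(cap, []).append(name)

    order, visiting = [], set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for cap in packages[name].get("requires", []):
            candidates = providers.get(cap)
            if not candidates:
                raise LookupError("nothing provides " + cap)
            # Satisfied if any provider is already selected, else take the first.
            if not any(c in order for c in candidates):
                visit(candidates[0])
        visiting.discard(name)
        order.append(name)

    visit(wanted)
    return order
```

A package that requires a capability rather than a named package keeps working when the site prefers a different provider; only the `provides` lists change.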
- Lots of package formats, there is a converter,
and a related
of the package formats it can convert.
- The major root of all link farm systems, developed at CMU.
- Each volume has a
depot subdirectory in which
packages are installed with package name leading.
- Installation merges into the filesystem by creating
- Optimisations: if a filesystem directory contains files
from a single package, that is done as a symbolic link to
the package directory. This can change if files from other
packages have to be installed in that directory.
- Easy to list bits of filesystem that are not in any
package, and which package any bit of filesystem comes from
(both encoded in the symbolic link).
- Restoring overlaps requires extra state.
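A symbolic link farm of this kind can be sketched in a few lines. This is not Depot's implementation: the function names and the example layout are hypothetical, links are made per file only (the single-package directory optimisation is left out), and an overlap simply raises an error instead of being saved.

```python
import os
import tempfile


def install(depot, root, package):
    """Link every file under depot/package into the merged tree at root."""
    pkg_dir = os.path.join(depot, package)
    for dirpath, _dirs, files in os.walk(pkg_dir):
        rel = os.path.relpath(dirpath, pkg_dir)
        target_dir = os.path.normpath(os.path.join(root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            link = os.path.join(target_dir, name)
            if os.path.lexists(link):
                raise FileExistsError(link)  # overlap: needs extra state
            os.symlink(os.path.join(dirpath, name), link)


def owner(depot, path):
    """Package a merged-tree path comes from, or None if it is in no package."""
    if not os.path.islink(path):
        return None
    rel = os.path.relpath(os.readlink(path), depot)
    first = rel.split(os.sep)[0]
    return None if first == os.pardir else first


def demo():
    """Build a throwaway depot with one hypothetical package and query it."""
    with tempfile.TemporaryDirectory() as top:
        depot = os.path.join(top, "depot")
        root = os.path.join(top, "root")
        os.makedirs(os.path.join(depot, "emacs", "bin"))
        open(os.path.join(depot, "emacs", "bin", "etags"), "w").close()
        install(depot, root, "emacs")
        return owner(depot, os.path.join(root, "bin", "etags")), owner(depot, root)
```

The ownership query needs no database because the link target itself names the package, which is the in-band property.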
- Symbolic link farm.
- Another symbolic link farm.
- Symbolic link farm, but quite mad (a single per-package
wrapper is used to invoke every command in the package).
Slackware package tools
tar archives with scripts and manifest.
- Used by the
- used by the
- All popular package managers are out-of-band;
perhaps this is not that good.
- RPM is the LSB package manager, so the others matter very
little; perhaps this is not that good.
- Very very badly documented.
- Out of band; state is kept in binary format in a Berkeley
DB database. Various versions of DB have been used.
- Package files are in-band, with a binary header followed
- Overrides are sort of handled: checksums, and some files
may be renamed to
.rpmsave (overridden) or
.rpmnew (not overriding). Probably its best
- Only popular package manager that uses
not a bad choice overall, as
tar is a bigger
mess, especially with long filenames. Historical reasons too.
- All package metadata contained in a
file. Poorly designed format, in particular for relocatable
- Each distribution defines different
file extensions. Because of this and different distribution
filesystem layouts, RPM packages are not portable.
- LSB supposedly standardises RPM and filesystem
- Conectiva Linux had modified Debian's APT to work with
RPM instead of DPKG.
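The checksum-based handling of overrides mentioned above (`.rpmsave` versus `.rpmnew`) can be modelled as a three-way comparison. This is a simplified sketch, not RPM's code: the function name, the returned labels, and the `noreplace` parameter as a plain boolean are assumptions made for the illustration.

```python
def config_action(old_pkg_sum, on_disk_sum, new_pkg_sum, noreplace=False):
    """Decide what to do with a configuration file on upgrade.

    old_pkg_sum: checksum of the file as shipped by the installed package
    on_disk_sum: checksum of the file as it is now on disk
    new_pkg_sum: checksum of the file shipped by the new package
    """
    if on_disk_sum == old_pkg_sum:
        return "replace"  # never edited locally, safe to overwrite
    if new_pkg_sum in (old_pkg_sum, on_disk_sum):
        return "keep"  # packager changed nothing, or the edit already matches
    if noreplace:
        return "install-new-as-rpmnew"  # local file wins, not overriding
    return "save-old-as-rpmsave"  # new file wins, local file overridden
```

Only the last two cases leave extra files behind, and in both the merge of local and packaged changes is left to the administrator.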
- Out of band installation; state is kept as a set of text
files, about five files for each package.
- Package files are in-band, all belong to an
ar archive that contains a couple of
tar archives and a tag file.
- Very poor implementation choices, in particular the state
directory can contain tens of thousands of files.
- More complete dependency management than others.
- Clever, but grossly inefficient, frontends.
- Not in the LSB, fortunately.
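Because the package file is a plain `ar` archive, its members can be listed with very little code. The sketch below reads the common `ar` header layout (60-byte headers, data padded to even length); `ar_pack` is a hypothetical helper that only exists to build a tiny test archive, and the member contents used with it are placeholders, not real tar data.

```python
import io

AR_MAGIC = b"!<arch>\n"


def ar_members(stream):
    """Yield (name, size) for each member of a common-format ar archive."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(60)
        if len(header) < 60:
            return  # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[0:16].decode("ascii").rstrip()
        if name.endswith("/"):
            name = name[:-1]  # some ar variants terminate names with a slash
        size = int(header[48:58].decode("ascii"))
        yield name, size
        stream.seek(size + size % 2, io.SEEK_CUR)  # skip data and padding


def ar_pack(members):
    """Build an ar archive in memory from (name, bytes) pairs, for testing."""
    out = AR_MAGIC
    for name, data in members:
        # Fields: name, mtime, uid, gid, mode, size (58 bytes) plus 2-byte magic.
        header = "%-16s%-12d%-6d%-6d%-8s%-10d" % (name, 0, 0, 0, "100644", len(data))
        out += header.encode("ascii") + b"`\n" + data
        if len(data) % 2:
            out += b"\n"
    return out
```

For example, `list(ar_members(io.BytesIO(ar_pack([("debian-binary", b"2.0\n")]))))` lists the tag file of a synthetic archive; on a real package file the two tar members would follow it.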
- Used by most proprietary UNIXes, also used at one time by
Caldera, which has now switched to RPM.
- Package is a
tar archive with in-band
- Package is first unpacked in a temporary directory, and
then copied to its definitive resting place.
General principle: left to right increasing specificity.
- The original RedHat convention was right:
- Package names, same as the original archive.
- Subpackage name, e.g.
- Original archive version number.
- Package version number.
- Bad practices like putting the edition number in the
package name and a
lib prefix have become
popular (Debian, Mandrake).
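The left-to-right convention is easy to compose and to split again, as this sketch shows. It assumes the fields are joined with hyphens and that neither version field contains a hyphen; the function names and the example package are hypothetical.

```python
def compose_name(name, version, release, subpackage=None):
    """Join the fields left to right, from least to most specific."""
    parts = [name]
    if subpackage:
        parts.append(subpackage)
    parts += [version, release]
    return "-".join(parts)


def split_name(full):
    """Split into (name with optional subpackage, archive version, package version).

    Splitting from the right works because only the name may contain hyphens.
    """
    name, version, release = full.rsplit("-", 2)
    return name, version, release
```

Putting an edition number into the package name breaks this: the name then changes with every edition, and tools can no longer tell that two files are versions of the same package.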
It should have major hierarchies (e.g.
usr) and frameworks/subhierarchies
grass) in which multiple packages related (by use of the
same libraries or data formats) get merged.
- A very sad issue. Driven by stupidity and opportunism.
- It should be based on a careful balancing between path
length and number of files in a directory.
- SuSE and Debian least bad.
- Package building and installing should be different
- RPM does it a bit better: the
.spec file is
self contained and can be used independently. But there can
be many patch files, with improper names.
- RPM still does it wrong: the original archive should not
be part of the RPM. But one can do, and should do,
- DPKG has the original archive, renamed, plus a
metadata file and a single patch file that contains the