Updated: 2012-04-07
Created: 2005-10-31
Older references are not entirely accurate, because things in kernel 2.6 are considerably better than in kernel 2.4 and filesystem maintainers have reacted to older unfavourable benchmarks by tuning their designs. So the references below are ordered most recent first.
ext3 FAQ 2004-10-14.

| Feature | ext3 | JFS | XFS |
|---|---|---|---|
| Block sizes | 1024-4096 | 4096 | 512-4096 |
| Max fs size | 8TiB (2^43 B) | 32PiB (2^55 B) | 8EiB (2^63 B); 16TiB (2^44 B) on 32b system |
| Max file size | 1TiB (2^40 B) | 4PiB (2^52 B) | 8EiB (2^63 B); 16TiB (2^44 B) on 32b system |
| Max files/fs | 2^32 | 2^32 | 2^32 |
| Max files/dir | 2^32 | 2^31 | 2^32 |
| Max subdirs/dir | 2^15 | 2^16 | 2^32 |
| Number of inodes | fixed | dynamic | dynamic |
| Indexed dirs | option | auto | auto |
| Small data in inodes | no | auto (xattrs, dirs) | auto (xattrs, extent maps) |
| fsck speed | slow | fast | fast |
| fsck space | ? | 32B per inode | 2GiB RAM per 1TiB + 200B per inode (half on 32b CPU) |
| Redundant metadata | yes | yes | no |
| Bad block handling | yes | mkfs only | no |
| Tunable commit interval | yes | no | metadata |
| Supports VFS lock | yes | yes | yes |
| Has own lock/snapshot | no | no | yes |
| Names | 8 bit | UTF-16 or 8 bit | 8 bit |
| noatime | yes | yes | yes |
| O_DIRECT | yes | yes | yes |
| barrier | yes | no | yes (and checks) |
| commit interval | yes | no | no |
| EA/ACLs | both | both | both |
| Quotas | both | both | both |
| DMAPI | no | patch | option |
| Case insensitive | no | mkfs only | mkfs only (since 2.6.28) |
| Supported by GRUB | yes | yes | mostly |
| Can grow | online | online only | online only |
| Can shrink | offline | no | no |
| Journals data | option | no | no |
| Journals what | blocks | operations | operations |
| Journal disabling | yes | yes | no |
| Journal size | fixed | fixed | grow/shrink |
| Resize journal | offline | maybe | offline |
| Journal on another partition | yes | yes | yes |
| Special features or misfeatures | In place convert from ext2. MS Windows drivers. | Case insensitive option. Low CPU usage. DCE DFS compatible. OS2 compatible. | Real time (streaming) section. IRIX compatible. Very large write behind. Project (subtree) quotas. Superblock on sector 0. |
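
The size limits in the table pair a binary-prefixed figure with a power of two in bytes. As a quick cross-check of those pairs, here is a minimal sketch; the helper name `pow2_to_units` is made up for this illustration and is not part of any filesystem tool.

```python
# Binary unit prefixes, smallest to largest.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def pow2_to_units(exponent):
    """Render 2**exponent bytes with the largest binary prefix that fits."""
    index = min(exponent // 10, len(UNITS) - 1)
    value = 2 ** (exponent - 10 * index)
    return f"{value}{UNITS[index]}"


# Exponents as they appear in the size rows of the table.
for exponent in (40, 43, 44, 52, 55, 63):
    print(f"2^{exponent} B = {pow2_to_units(exponent)}")
```

Each printed line should match one of the "Max fs size" or "Max file size" cells, for example 2^43 B for the 8TiB ext3 limit.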
This section is about known hints and issues with various aspects of common filesystems. They range from mere inconveniences or limitations to severe performance problems.
inode size of 128 bytes is used or kept. If so, multiple updates per second are not recorded. This can affect make processing and fsync.
Kernel version independent hints:
inode64 rotors directories across AGs (allocation groups), and then attempts to allocate space for files in the AG containing their directory. This is quite different from the alternative: if you create a bunch of files in the same directory without inode64, XFS will scatter the extents all over the disk rather than trying to allocate them next to each other.
allocation group, if all allocation groups are in use to grow extents, writing can stop for all other files, or similarly if the files are in the same allocation group. Having more allocation groups typically improves multithreaded performance.
will disable XFS' write barrier support.
Kernel version dependent hints:
The default i/o scheduler, CFQ, will defeat much of the parallelization in XFS.
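
To see which I/O scheduler a block device is currently using, one can read its `queue/scheduler` file under `/sys/block`, where the active scheduler is shown in square brackets. The following is only a sketch: the function names are invented for this example, and the device name passed to `scheduler_for` (such as `sda`) is a placeholder.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) name from a scheduler file's contents."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def scheduler_for(device):
    """Read the active scheduler for a block device, e.g. 'sda' (placeholder)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Sample contents in the format the kernel uses for this file.
    print(active_scheduler("noop deadline [cfq]"))
```

If the result is `cfq` on a filesystem where parallel writes matter, that is the situation the hint above warns about.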
These are pointers to some of the entries in my technical blog where filesystems are discussed:
fsck times
ext2 for all my MS Windows filesystems except the boot one.
ext3 with and without extended attributes and ext3's new hash directory indices.
fsck.
davtools package to visualize ext3 fragmentation.
fsck takes more than one month, and some filesystems being VLDBs.
ext2 for MS Windows.
noatime.
ext3 into something else.
"works" means for filesystems.
"works" for file systems.
root filesystem.
This is a summary in my own words of this more detailed description of JFS data structures. But there is a much better PDF version of the same document, with inline illustrations, also available inside this RPM from SUSE.
ABNR which describes an extent containing zero bytes only.
btree and the leaf extents are called xtrees (and contain an array of entries called xads) if they are for an allocation map, and dtrees if they are for a directory map.
jfs_fsck.
bmap, is a file (not a B+-tree, despite being called map) divided into 4KiB pages. The first block is the bmap control page, and then there are up to three levels of dmap control pages that point to many dmap pages. Each dmap page contains:
jfs_fsck if any.
dinomap, and after that a number of extents called inode allocation groups.
dinomap contains:
"tied" to it, until all such extents are freed.
dtree entries.
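
The bmap layout described above (up to three levels of dmap control pages pointing at dmap pages) implies an upper bound on how many blocks can be tracked. The sketch below works that bound out under two assumed fan-out values that are not given in the text: 8192 blocks per dmap page and 1024 children per control page, as I recall them from the Linux `jfs_dmap.h` header. Treat both numbers, and the function names, as assumptions of this example.

```python
# Assumed constants, recalled from the Linux JFS header jfs_dmap.h;
# they are not stated in the text above.
BLOCKS_PER_DMAP = 1 << 13   # blocks tracked by one dmap page (assumption)
LEAVES_PER_CTL = 1 << 10    # children per dmap control page (assumption)


def max_blocks(levels):
    """Blocks addressable with the given number of dmap control levels."""
    return BLOCKS_PER_DMAP * LEAVES_PER_CTL ** levels


def max_bytes(levels, block_size=4096):
    """Bytes addressable, for a given filesystem block size."""
    return max_blocks(levels) * block_size


for levels in (1, 2, 3):
    print(levels, max_blocks(levels), max_bytes(levels))
```

Under these assumptions three levels cover 2^43 blocks, which at 4096-byte blocks is 2^55 bytes, the same as the 32PiB maximum filesystem size listed for JFS in the comparison table.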