[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]
Summary
du - estimate file space usage
Lines of code: 1140
Principal syscall: stat()
Support syscalls: None
Options: 44 (19 short, 25 long)
Descended from du introduced in Version 1 UNIX (1971)
Added to Shellutils in October 1992 [First version]
Number of revisions: 327
Two general disk-usage strategies are to ask the filesystem for accounting metadata or check every file and estimate. The du utility uses the latter strategy via FTS
, which calls stat()
for every entry. Check out the df utility for the other strategy
du_files()
- Traverses input files via FTSduinfo_add()
- Combines size, node, and time dataduinfo_init()
- Sets a duinfo struct to start conditionsduinfo_set()
- Sets the duinfo data to the input sizes/timesfill_mount_table()
- Adds the dev/inode data to the di_mnt tablehash_ins()
- Adds a new dev/inode pair if possible. Indicates if pair was already theremount_point_in_fts_cycle()
- Checks if a mount point is involved in a cycleprint_only_size()
- Prints a size in the requested formatprint_size()
- Prints a size and timeprocess_file()
- Collects size information for all file entries based on level.show_date()
- Displays a date with the requested format
argv_iter()
- Increments an iterator (next argv)argv_iter_init_stream()
- Returns an argv iteratordi_set_alloc()
- Allocates a di_setdi_set_lookup()
- Searches a di_set for an entry and returns true if it existsdie()
- Exit with mandatory non-zero error and message to stderrerror()
- Outputs error message to standard error with possible process termination
Setup
The du utility defines and includes a few global helpers and variables before code execution begins. One noteworthy include from gnulib is a hash table for tracking these device/inode pairs (di_set).
du declares two global structs:
struct duinfo
- Holds directory informationstruct dulevel
- One layer in the directory hierarchy
du uses a unique style in that global variables set by parsing options use a prefix (opt_
). The globals include:
apparent_size
-*di_files
- The dev/inode values for input files already processed*di_mnt
- The list of dev/inode values for mount points*exclude
- The list of files to excludehash_all
- Flag to hash all files for hard link usagehuman_output_opts
- Flag if the sizes are human readable sizes (-h)localtz
- The local system timezonemax_depth
- The number of levels from root to searchopt_all
- Flag if the user wants all files (-a)opt_count_all
- Flag if the user wants link counts (-l)opt_inodes
- Flag if inodes should be counted, not blocks (--inodes)opt_nul_terminate_output
- Flag if output is NUL separated (-0)opt_separate_dirs
- Flag to not summarize subdirectories (-S)opt_threshold
- Flag to set a size threshold (-t)opt_time
- Flag if the most recent modified date is used (--time)output_block_size
- Sets the unit size of blocksprev_level
- The last level processed in a file hierarchyprint_grand_total
- Flag set if the user requests a grand total (-c)*time_style
- The style provided directly by the user*time_format
- The format for time (--time-style)time_type
- The user provided argument for time styletot_dui
- The duinfo struct holding the grand total
main() adds a few more locals before starting parsing:
ai
- The argv iterator for walking the command line file listbit_flags
- Sets FTS search rules*cwd_only[]
- Array for the current working directly symbol*files_from
- The user provided reference filemax_depth_specified
- Flag is the user provided a max depth (-d)opt_summarize_only
- Flag to only use summaries (-s)symlink_deref_bits
- Selects the FTS link following rule
Parsing kicks off with the short options passed as a string literal:
"0abd:chHklmst:xB:DLPSX:"
Parsing
Parsing du asks many questions of the user:
- Should we count all files and directories?
- Are we interested in the grand total?
- Should files we processed from the command line or an input file
- Should the output delimiter be '\0' or '\n'?
- What is the block size scalar? Should it be human-readable?
- Should we consider entry size?
- What is the maximum depth from root we should count?
- What is the timestamp output format?
Parsing failures
Parsing may fail in several ways:
- Providing a nonsensical depth (too large)
- User requests a 0 threshold size
- User provides an invalid exclude file
- User requests both a summary and all entry details
- Providing files on both the command line and in a reference file
User specified parsing failures result in a short error message followed by the usage instructions. Access related parsing errors die with an error message.
Execution
Not all of the work is done with du. The most important tasks pushed off to gnulib involve directory traversal with FTS, which collects file accounting via stat()
. The important tasks within du include 'rolling up' the totals following the post-order hierarchy walk.
Invoking stat()
for each target file is the key reason why du is much slower than df
stat()
include:
st_size
- The total sizest_mtime
- The modified time of the filest_atime
- The last access time of the filest_ctime
- The creation time of the filest_ino
- The inode number of the filest_dev
- The device ID of the sourcest_mode
- The access permissions of the filest_nlink
- The number of hard links to the file
The process to find and account for this information goes like this:
- Organize the input files by creating an iterator over argv or file list
- Create a hash map of device/inode data for input files (useful for cycle detection)
- For each input file:
- Test the file for a valid name
- Open the target file with FTS (invokes
stat()
) - Set the baseline file information (size, time, level/depth)
- If this file is a new level of the hierarchy, either start fresh (lower level) or summarize previously accumulated data (higher level)
- Print sizes as requested
- Print a size grand total
Failure cases:
- Unable to open or close file targets (via FTS)
- Failure to read input
- Invalid input files (zero length names)
- Argument iterator fails
- Bad STDIN file names
- FTS throws an error while walking the tree
- Unable to access directory
- Bad timestamp
All failures at this stage output an error message to STDERR and return without displaying usage help