[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]
Summary
ls - list directory contents
Lines of code: 5309
Principal syscalls: opendir(), readdir()
Support syscall: closedir(), fstat()
Options: 83 (40 short, 43 long)
Spiritually linked with LISTF from CTSS (1963) -- designed in 1962
Added to Shellutils in October 1992 [First version]
Number of revisions: 653
The ls command is the most complex (read: over-engineered) utility in coreutils. Rich features, such as format control, color support, pattern filtering, and sorting intertwine with signal handlers, hash tables, obstacks, interface modes, caching, and access control to deliver a versatile tool.
Setup
The first 1000 lines of ls defines the globals, structs, and enums used throughout the code. Some key players are:
Structs:
struct bin_str
- Holds data for a color indicator.struct color_ext_type
- An extended color style definitionstruct column_info
- Metadata for columnsstruct fileinfo
- Holds data for a single file entry.struct ignore_pattern
- The first of a list of file name matches to ignorestruct pending
- The first of a list of pending directories to process
Enums:
enum acl_type
- The 3 ACL rulesenum Dereference_symlink
- The four strategies to handle symlinksenum filetype
- The 10 types of filesenum format
- Five types of format rulesenum ignore_mode
- The three ways to ignore special filesenum indicator_style
- The four color indicator stylesenum sort_type
- The seven sorting categoriesenum time_style
- Four timestamp stylesenum time_types
- Three time types (modified, create, access)
Entry
ls has three entry points depending if the user invokes via ls
, dir
, or vdir
. Each of these files sets ls_mode
to value used later to manage default options
In all cases, ls:
- Assumes we always print the directory name
- Clears all pending directories
- Gets the current time for upcoming consistency checks
- Initializes flags to a default value
- Sets a default quoting style
- Pulls column and tab size from the environment
Parsing
The largest helper function is decode_switches()
at 579 lines. The function goes through each of the command line arguments to set flags used later during execution. Parsing switches answers these general questions:
- What is the overall output format?
- What are the specific formats? (time, inode, blocks, etc)
- Are colors used?
- How is the output sorted and ordered?
- Is the search recursive?
- Does the search follow links?
Switch decoding finishes by checking for consistency among options. Note that many options simply override others rather than failing (i.e. -l overrides -1).
Parsing failures
These failure cases are explicitly checked:
- An invalid line or tab size
- Invalid time style format
Extra Comments
Colors
Color sequences generally follow ANSI in-band escape formats, which include a LEFT, CODING, and RIGHT sequence to trigger output colors changes in a supported terminal. The LS_COLORS environment variable defines a set of color codes for each file type. Just in case LS_COLORS isn't defined, this ls implementation also defines a set. Colors are changed during the table printing functions, usually as a call to set_normal_color()
, restore_default_color()
, or directly with put_indicator()
.
During initialization, ls parses the LS_COLORS variable. A typical entry looks like di=1:
, which says that directories will be bolded. This label has four components: A label, an '=', an encoding, and a separator. parse_ls_color()
retrieves the string of color encodings and uses a state machine to read and handle an entry a character at a time:
State | Action | Next State |
---|---|---|
PS_START | Check for the first label letter, separator, or * for extentions | PS_2, PS_4, PS_FAIL |
PS_2 | Get the second character of a label | PS_3 |
PS_3 | Check '=' and process subsequent encoding | PS_START, PS_FAIL |
PS_4 | Check '=' after extension and process encoding | PS_START, PS_FAIL |
PS_DONE | Parse succeeded | Terminal State |
PS_FAIL | Something went wrong | Terminal state |
Execution
After parsing, the next step sets up the colors via the environment variables LS_COLORS
or COLORTERM
in that order.
We then set search behaviors. This means finalizing symlink handling and preparing for a recusrive search (if used). Some smaller details include current time zone, sort format, dired mode stacks, and hyperlinks.
The final step before processing is to set up the table data structure that we populate during file system traversal. The involves allocating the table and ensuring the sort order vector is clear. The initial directory (either user specified or cwd) is added to the processing queue and we're ready to go.
Processing directories
If only a single file is specified, that file is added to the file table and printed with print_current_files()
. However, the typical case is that an entire directory is queued. This target directory (and eventually children) live in the pending_dirs
list. These list members are checked for loops and then processed one at a time via print_dir()
. Preparing directory output looks like this:
- Attempt to open directory with the
opendir()
syscall - Check if the directory has been visited to avoid cycles
- Clear the file table to prep for the new directory
- Print the directory name, if appropriate
- Repeatedly make
readdir()
syscall until all files processed - Check that file type isn't ignored
- Add the file to the table (more below)
- Handle any signals that may have arrived
- Close the current directory with
closedir()
syscall - Sort the file table
- Revisit the table entries and add directories to the pending list
- Print the output of the file table (the cwd)
Printing output
Getting the files listed on the screen is the final step. This is more complicated than a call to printf()
due to the variety possible output formats including a list of columns, a single list, many wrapped horizonal entries, and more. The most significant printing function is print_long_format()
. In general, we walk file table and print the entries, which are already processed during add (gobble_file()
- See below for details). The most in-depth printing call sequence worth exploring is:
print_current_files()
print_long_format()
print_name_with_quoting()
quote_name()
- ends withfwrite()
to STDOUTprint_color_indicator()
put_indicator()
- Thefwrite()
here comes first
There are several ways that file processing could fail:
Failure cases:
- Unable to open directory (doesn't exist, no permissions, etc)
- Unable to stat file (doesn't exist, no permissions, bad path, etc)
- Improper hyperlink formatting
- Error reading directory
- Unable to close directory
- Unable to read a followed symlink
Extra comments
Adding a file to the file table
A file is added with a call to gobble_file()
, the 2nd largest helper function in this utility at 332 lines. While printing all the files simply walks the table, outputting all of the lines, most of the work in ls happens as each file is collected and stored. To add a file we perform a subset of the following operations, depending on format required:
- Ensure that the file table is large enough
- Set the file reference in the file table
- Verify quoting requirements
- Construct the full name of the file
- stat or lstat (if symlink) the file
- Check capabilities
- Check security context and access list
- Construct the link name (if link)
- Get widths: owner, group, author, security context, major/minor device, size, and inode
- Return the total size in blocks
Helpers
The complexity of ls naturally leads to a very long list of helper functions
abformat_init()
- Initialize the month abbreviation formatabmon_init()
- Initialize the month abbreviation structureadd_ignore_pattern()
- Adds a file pattern to the ignore listalign_nstrftime()
- Returns the size of the time formatattach()
- Constructs an absolute file name from path and namebasename_is_dot_or_dotdot()
- Checks if the input name is . or ..calculate_columns()
- Returns a column count based on screen sizeclear_files()
- Remove all files from the sorted arraycmp_atime()
- Compare function for access times for two filescmp_ctime()
- Compare function for creation times for two filescmp_extension()
- Compare function for two file extensionscmp_mtime()
- Compare function for modified times for two filescmp_name()
- Compare function for names for two filescmp_size()
- Compare function for sizes for two filescmp_version()
- Compare function for version for two filesdecode_switches()
- Checks all provided switches and sets flags/globalsdev_ino_compare()
- Checks if table entries matchdev_ino_free()
- Frees a table entrydev_ino_hash()
- Computes the table position based on inode numberdev_ino_pop()
- Pop dev_ino from dedicated stackdev_ino_push()
- Push dev_ino on to dedicated stackdired_dump_obstack()
- Outputs a prefix followed by obstack valueserrno_unsupported()
- Tests if an error value implies unsupported function usageextract_dirs_from_files()
- Removed directories from a file listingfile_escape()
- Enforces RFC3986 contraints on stringsfile_escape_init()
- Init the RFC3986 character map (letters/numbers/some symbols)file_failure()
- Outputs a file access failure messagefile_has_acl_cache()
- Caches results of recent file access control attemptsfile_ignored()
- Determines if a file should be ignoredfirst_percent_b()
- Finds the first %b time format (abbreviated month)format_group()
- Wrapper for printing the group formatformat_group_width()
- Wrapper to get the width of the group formatformat_inode()
- Formats the inode number for displayformat_user()
- Wrapper for printing the user formatformat_user_or_group()
- Handles actual printing of user/group formatsformat_user_or_group_width()
- Returns the width of user or group formatsformat_user_width()
- Wrapper to get the user format widthfree_ent()
- Frees file info for a directory entryfree_pending_ent()
- Frees a pending directory entryget_color_indicator()
- Returns a color string based on file typeget_funky_string()
- Parses an LS_COLORS string and returns validityget_link_name()
- Puts the name of a linked file in to a bufferget_type_indicator()
- Gets the file type indicator (for -F)getenv_quoting_style()
- Checks environment for quoting style then sets for lsgetfilecon_cache()
- Checks getfilecon failure cache and returns attribute context sizegobble_file()
- Adds a file to the file tablehas_capability()
- Checks if the file has capabilitieshas_capability_cache()
- Caches the capability check failure resultindent()
- Adds indents, using tabs when possibleinit_column_info()
- Allocates memory for files and columns datainitialize_ordering_vector()
- Initialize sorted_file[] to counting orderis_colored()
- Checks if the current active color matches the inputis_directory()
- Checks if a file entry is a directoryknown_term_type()
- Checks if the TERM env is knownlength_of_file_name_and_frills()
- Returns the total length of file and and extraslong_time_expected_width()
- Returns the number of columns in a timestampmake_link_name()
- Returns the path prepended to an input nameneeds_quoting()
- Cehcks if a file name needs quotingparse_ls_color()
- Initializes the configured color schemepatterns_match()
- Checks a file against all patterns for a matchprep_non_filename_text()
- Outputs color indicates as neededprint_color_indicator()
- Attempts to output color indicates and returns successprint_current_files()
- Display all files in the current file tableprint_dir()
- Output a list of files in a directoryprint_file_name_and_frills()
- Prints file name and extras (inode, block size, etc)print_horizontal()
- Output data transposed (for -x)print_long_format()
- Outputs extra information per entry (for -l)print_many_per_line()
- Outputs multiple entries per row (for -C)print_name_with_quoting()
- Outputs a name with quotesprint_type_indicator()
- Outputs the file type indicator (for -F)print_with_separator()
- Outputs filename followed by a separator and spaceprocess_signals()
- Dispatches waiting signals, if anyput_indicator()
- Outputs a color indicatorqueue_directory()
- Adds a directory to the pending queuequote_name()
- Adds a quote to a name and returns size used (may include a pad)quote_name_buf()
- Stores a quoted name in an input bufferquote_name_width()
- Returns the size of a quoted namerestore_default_color()
- Removes coloring schemeset_exit_status()
- Updates exit status to reflect other than complete successset_line_length()
- Sets the width of a line (for -w)set_normal_color()
- Puts indicator sequence for normal (nonfile) text colorssighandler()
- Handler for all caught signals except SIGTSTPsignal_init()
- Initialize signal handlers - wrapper for signal_setup()signal_restore()
- Restores default signal handlers - wrapper for signal_setup()signal_setup()
- Initialize/Restore signal handlerssort_files()
- Sorts the files in the file table using the sort_typestophandler()
- Handler for SIGTSTPunsigned_file_size()
- Converts a negative size to positive (assume signed wrap)visit_dir()
- Active the directory and warn if already activexstrcoll()
- Extended strcoll(), which checks errno