[GNU Manual] [No POSIX requirement] [Linux man] [No FreeBSD requirement]
Summary
md5sum - print or check md5 digests
Lines of code: 1111
Principal syscall: None
Support syscalls: open(), close(), fadvise()
Options: 18 (5 short, 13 long)
This utility originated with GNU as secure hash usage grew in the 1990s
Added to Textutils in June 1995 [First version]
Number of revisions: 219
The md5sum utility handles user requirements and organizes input files before calling into gnulib to perform the actual md5 computation as specified in RFC 1321. Note that the same source wraps several other algorithms (BLAKE2, SHA-1, SHA-256, SHA-384 and SHA-512), although we're only concerned with md5 here.
Helpers:
bsd_split_3()
- Splits an input into 2 parts (digest and escaped file name)
digest_check()
- Opens a list of files/digests and verifies matches
digest_file()
- Prepares an input file for digest computation via gnulib
filename_unescape()
- Converts escaped file names to actual characters (sed: s/\\n/\n/g;s/\\\\/\\/g)
hex_digits()
- Tests that an input string is a valid hex sequence
print_filename()
- Prints a file name either escaped or unescaped
split_3()
- Splits an input string into the three md5 components (hash, format, file name)
die()
- Exit with mandatory non-zero error and message to stderr
DIGEST_STREAM()
- Macro function for the selected algorithm (in this case md5_stream() in gnulib)
error()
- Outputs error message to standard error with possible process termination
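To make two of the helpers above more concrete, here is a minimal Python sketch of what hex_digits() and filename_unescape() do, based only on the descriptions in this list. The real helpers are C functions; the Python signatures, and the choice to return None for an unrecognized escape, are assumptions made for this example.

```python
def hex_digits(s: str) -> bool:
    """Return True if s is a non-empty string of hexadecimal digits."""
    return len(s) > 0 and all(c in "0123456789abcdefABCDEF" for c in s)


def filename_unescape(name: str):
    """Undo the escaping described above: backslash-n becomes a newline,
    and a doubled backslash becomes a single backslash.

    Returns None for any other escape sequence (an assumption for this
    sketch: malformed input is rejected rather than passed through).
    """
    out = []
    i = 0
    while i < len(name):
        c = name[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        # A backslash must be followed by 'n' or by another backslash.
        if i + 1 >= len(name):
            return None
        nxt = name[i + 1]
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            return None
        i += 2
    return "".join(out)
```

The sketch decodes in a single left-to-right pass, so an escaped backslash followed by the letter n comes out as a backslash and an n rather than as a newline.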
Setup
md5sum uses several flags and variables as globals, including:
bsd_reversed
- Flag set if we're using BSD reverse checksums
digest_hex_bytes
- The size of the output digest (an md5 digest is 16 bytes, printed as 32 hexadecimal characters)
have_read_stdin
- Flag set if the utility has read input from STDIN
ignore_missing
- Flag to continue processing on missing files (--ignore-missing)
min_digest_line_length
- The minimum length of a valid checksum line
quiet
- Flag to prevent display of normal feedback
status_only
- Flag to prevent display of normal output (--status)
strict
- Flag to force abort on any abnormalities (--strict)
warn
- Flag to display warnings
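The digest size is easy to mix up because the same value has two representations. As a quick check, using Python's standard hashlib module rather than md5sum's own code, an md5 digest is 16 raw bytes and prints as 32 hexadecimal characters:

```python
import hashlib

h = hashlib.md5(b"abc")
raw_bytes = h.digest_size        # size of the binary digest
hex_chars = len(h.hexdigest())   # size of the printed digest

print(raw_bytes, hex_chars)
```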
main() introduces a few local variables:
*bin_buffer
- A memory-aligned pointer to the data buffer
bin_buffer_unaligned
- The data buffer holding the digest value of the current file
binary
- Flag set if the file should be read in binary mode (-b)
do_check
- Flag set if we're verifying a hash list (-c)
opt
- The character for the next option to process
ok
- The final return status
prefix_tag
- Flag set for using BSD style checksums (--tag)
Parsing
Parsing answers the following questions to define the execution parameters:
- Are we dealing with text or binary files?
- Are we checking md5 checksums or producing them?
- Should display output be limited in any way?
- Should checksum mismatches reflect in the exit status?
- Should output be newline or NUL terminated?
Parsing failures
These failure cases are explicitly checked:
- Combining --tag and --text modes
- Verifying checksums while using the --zero, --tag, --binary, or --text options
- Using --status, --warn, or --strict while producing checksums
- Unknown option used
User-specified parsing failures result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.
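The combination checks listed above can be sketched with Python's argparse. This is only an illustration of the rules as stated in this section, not the C option loop that md5sum uses: the program name md5sum-sketch, the function names, and the wording of the messages are all made up for the example, and only a subset of the options is declared.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="md5sum-sketch")
    p.add_argument("-b", "--binary", action="store_true")
    p.add_argument("-c", "--check", action="store_true")
    p.add_argument("-t", "--text", action="store_true")
    p.add_argument("-w", "--warn", action="store_true")
    p.add_argument("-z", "--zero", action="store_true")
    p.add_argument("--tag", action="store_true")
    p.add_argument("--status", action="store_true")
    p.add_argument("--strict", action="store_true")
    p.add_argument("files", nargs="*")
    return p


def find_conflict(opts: argparse.Namespace):
    """Return a message for the first invalid combination, or None.

    The message text is illustrative, not md5sum's actual wording.
    """
    if opts.tag and opts.text:
        return "--tag cannot be combined with --text"
    if opts.check:
        # Output-format options make no sense while verifying.
        for name in ("zero", "tag", "binary", "text"):
            if getattr(opts, name):
                return f"--{name} cannot be used when verifying checksums"
    else:
        # Verification-only options make no sense while producing digests.
        for name in ("status", "warn", "strict"):
            if getattr(opts, name):
                return f"--{name} only applies when verifying checksums"
    return None


def parse(argv):
    parser = build_parser()
    opts = parser.parse_args(argv)   # an unknown option exits with usage here
    problem = find_conflict(opts)
    if problem:
        parser.error(problem)        # prints usage plus message, exits with status 2
    return opts
```

Keeping the combination rules in find_conflict(), separate from the parser, makes them easy to exercise without triggering a process exit.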
Execution
md5sum has two execution strategies: one generates md5 digests, and the other verifies a list of files/hashes. Both cases rely on gnulib to generate the digest within the digest_file() function. The processes look like this:
Generate md5 digests:
- Retrieve the next file name
- Open file (or STDIN)
- Call into gnulib via DIGEST_STREAM(); the resulting md5 digest is stored in bin_result
- Verify no error occurred
- Close the file
- Repeat above while there are still files to process
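The generation loop above can be sketched in Python, with hashlib.md5 standing in for gnulib's md5_stream(). The function names echo the helpers described earlier, but the signatures, the block size, and the md5sum-sketch prefix in the error message are assumptions made for this example.

```python
import hashlib
import sys

BLOCK_SIZE = 32768  # read size chosen for this sketch


def digest_stream(stream) -> bytes:
    """Read a binary stream to the end and return its raw 16-byte md5 digest."""
    md5 = hashlib.md5()
    for block in iter(lambda: stream.read(BLOCK_SIZE), b""):
        md5.update(block)
    return md5.digest()


def digest_file(name: str):
    """Return the raw digest for one file name, or None on failure.

    A name of "-" means standard input.
    """
    try:
        if name == "-":
            return digest_stream(sys.stdin.buffer)
        with open(name, "rb") as f:   # closed automatically on exit
            return digest_stream(f)
    except OSError as exc:
        print(f"md5sum-sketch: {name}: {exc.strerror}", file=sys.stderr)
        return None


def generate(names) -> bool:
    """Print one 'digest  name' line per file; return False if any file failed."""
    ok = True
    for name in names:
        bin_result = digest_file(name)
        if bin_result is None:
            ok = False
            continue
        print(f"{bin_result.hex()}  {name}")
    return ok
```

A failure on one file is reported and remembered in the return value, but the loop carries on with the remaining files.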
Check md5 digests:
- Retrieve the list of files/digests to check
- Open the list file
- For each file entry:
- Split file entry into the 3 expected components (file name, digest, access mode)
- Pass the file to the digest generator (above process) to generate a comparison
- Check for failures
- Compare the generated digest with the provided digest
- Repeat process for all files
- Report results
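The check process above can be sketched the same way. This Python version handles only the default line format (32 hex digits, a space, a mode character, then the file name); it does not cover the BSD --tag format or escaped file names, and the function names and the shape of the returned tuple are assumptions made for this example.

```python
import hashlib
import re

# 32 hex digits, one space, a mode character (' ' text or '*' binary), file name.
LINE_RE = re.compile(r"^([0-9a-fA-F]{32}) ([ *])(.+)$")


def split_line(line: str):
    """Split one check-list line into (digest, binary_flag, file_name).

    Returns None if the line is not in the expected format.
    """
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    digest, mode, name = m.groups()
    return digest.lower(), mode == "*", name


def check_list(lines):
    """Verify every entry; return (matched, mismatched, unreadable, malformed)."""
    matched = mismatched = unreadable = malformed = 0
    for line in lines:
        entry = split_line(line)
        if entry is None:
            malformed += 1
            continue
        expected, _binary, name = entry
        try:
            with open(name, "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
        except OSError:
            unreadable += 1
            continue
        if actual == expected:
            matched += 1
        else:
            mismatched += 1
    return matched, mismatched, unreadable, malformed
```

Returning the four counts separately leaves it to the caller to decide which of them should affect the exit status, which is where options such as --strict and --ignore-missing come in.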
Failure cases:
- Unable to open or close input files
- User-provided checksum format mismatches
- Unable to read from input source
- md5 digest generator in gnulib fails for any reason
- Check list could not be parsed
- Check list checksums do not match
All failures at this stage output an error message to STDERR and return without displaying usage help.