Summary
comm - compare two sorted files line by line
Lines of code: 500
Principal syscalls: write() via fwrite()
Support syscalls: fadvise()
Options: 11 (4 short, 7 long)
Descended from comm in Version 2 UNIX (1972)
Added to Textutils in November 1992 [First version]
Number of revisions: 134
check_order()
- Verifies the order of lines from a file
compare_files()
- Performs the entire comparison and output procedure
writeline()
- Writes the requested line with fwrite()
die()
- Exits with a mandatory non-zero status and a message to stderr
error()
- Outputs an error message to standard error with possible process termination
readlinebuffer_delim()
- Reads an entire line of text, including the delimiter, into a linebuffer
Setup
comm declares several global flags that control execution flow and are defined during parsing:
both
- Flag to print lines found in both files (default behavior)
hard_LC_COLLATE
- Flag set if the LC_COLLATE locale is hard (not C or POSIX), in which case xmemcoll() is used to compare lines
issued_disorder_warning[]
- Flag array holding the disorder-warning status of each input file
only_file_1
- Flag to print lines found only in file 1
only_file_2
- Flag to print lines found only in file 2
seen_unpairable
- Flag set if we've observed mismatched lines between files
total_option
- Flag to print a summary (--total option)
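To illustrate how a flag like hard_LC_COLLATE can steer the comparison, here is a minimal sketch. The function name compare_lines() and its parameters are hypothetical and not taken from comm; the sketch uses the standard strcoll() where comm uses xmemcoll(), and it assumes NUL-terminated lines whose lengths exclude the delimiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns <0, 0 or >0, like strcmp().
   hard_collate plays the role of the hard_LC_COLLATE flag. */
int compare_lines(const char *a, size_t alen,
                  const char *b, size_t blen, bool hard_collate)
{
    assert(a != NULL && b != NULL);

    /* Locale-aware path: collation rules of the current locale decide. */
    if (hard_collate)
        return strcoll(a, b);

    /* Byte-wise path: compare the common prefix, then let the
       shorter line sort first if the prefix is identical. */
    size_t n = alen < blen ? alen : blen;
    int order = memcmp(a, b, n);
    if (order == 0)
        order = (alen > blen) - (alen < blen);
    return order;
}
```

In the default C locale both paths give the same ordering; they can differ once a program selects another locale with setlocale().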
Three global variables affect output display:
*col_sep
- The string that separates columns (\t by default)
col_sep_len
- The length of the separator
delim
- The character that separates lines
comm includes one local variable in main(), c, used as the first letter of the next option to process.
Parsing
Parsing in comm sets up the execution flags by answering these questions:
- Should we verify the input file orders?
- Should there be an alternate column separator?
- Should lines be newline or NUL delimited?
- Which input files should be displayed?
Parsing failures
These failure cases are explicitly checked:
- Specifying multiple output delimiters
- Not providing two input files
- Unknown options used
Failures result in a short error message followed by the usage instructions.
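The parsing questions and failure checks above could be sketched with getopt_long() roughly as follows. This is an illustration under assumptions, not comm's actual code: the struct comm_opts container, the parse_comm_opts() function, and the OPT_* constants are hypothetical (comm keeps its flags in globals), and the sketch returns -1 instead of printing the usage message.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for this sketch; comm itself uses globals. */
struct comm_opts {
    bool only_file_1;   /* print column 1 (lines only in file 1) */
    bool only_file_2;   /* print column 2 (lines only in file 2) */
    bool both;          /* print column 3 (lines in both files) */
    bool total_option;  /* print a summary line */
    int check_order;    /* 0 default, 1 --check-order, 2 --nocheck-order */
    const char *col_sep;
    size_t col_sep_len;
    char delim;
};

/* Values above the char range cannot clash with short options. */
enum { OPT_CHECK = 256, OPT_NOCHECK, OPT_SEP, OPT_TOTAL };

/* Returns 0 on success, -1 on any of the failure cases listed above. */
int parse_comm_opts(int argc, char **argv, struct comm_opts *o)
{
    static const struct option longopts[] = {
        {"check-order",      no_argument,       NULL, OPT_CHECK},
        {"nocheck-order",    no_argument,       NULL, OPT_NOCHECK},
        {"output-delimiter", required_argument, NULL, OPT_SEP},
        {"total",            no_argument,       NULL, OPT_TOTAL},
        {"zero-terminated",  no_argument,       NULL, 'z'},
        {NULL, 0, NULL, 0}
    };
    bool sep_seen = false;
    int c;

    assert(o != NULL);
    *o = (struct comm_opts){ true, true, true, false, 0, "\t", 1, '\n' };
    optind = 1;  /* restart scanning so the function can be called again */

    while ((c = getopt_long(argc, argv, "123z", longopts, NULL)) != -1) {
        switch (c) {
        case '1': o->only_file_1 = false; break;
        case '2': o->only_file_2 = false; break;
        case '3': o->both = false; break;
        case 'z': o->delim = '\0'; break;
        case OPT_CHECK:   o->check_order = 1; break;
        case OPT_NOCHECK: o->check_order = 2; break;
        case OPT_TOTAL:   o->total_option = true; break;
        case OPT_SEP:
            /* Failure: two different output delimiters. */
            if (sep_seen && strcmp(o->col_sep, optarg) != 0)
                return -1;
            o->col_sep = optarg;
            o->col_sep_len = strlen(optarg);
            sep_seen = true;
            break;
        default:
            return -1;  /* Failure: unknown option. */
        }
    }
    /* Failure: anything other than exactly two input files. */
    return (argc - optind == 2) ? 0 : -1;
}
```

Options -1, -2, and -3 clear a flag rather than set one, so the default (all three flags true) prints all three columns.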
Execution
The comm utility uses linebuffer data structures to read, hold, and compare lines pulled from the file streams. Because each stream is only read sequentially, the output is meaningful only if both input files are sorted in the same order.
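The sortedness assumption can be checked by comparing each line with the one read just before it from the same file. The sketch below shows the idea on an in-memory array; first_disorder() is a hypothetical name, strcmp() stands in for the locale-aware comparison, and the handling of --check-order, --nocheck-order, and warnings is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first line that sorts
   before its predecessor, or n if the whole array is in order. */
size_t first_disorder(const char **lines, size_t n)
{
    assert(lines != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        /* A previous line greater than the current one breaks the order. */
        if (strcmp(lines[i - 1], lines[i]) > 0)
            return i;
    }
    return n;
}
```

Equal neighbouring lines are accepted here, since duplicates do not violate a sorted order.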
The execution process looks like this:
- Initialize the linebuffers, lba, and the associated pointers, *thisline and *all_line
- Open both file streams
- Load lines from both files into their associated linebuffers
- Check if there are any lines in either file to process. If so:
  - If one stream is exhausted, write the next line of the other stream
  - If both streams have lines, compare them and write the lesser (or the common line if they match)
  - Pull the next line from each stream that was written from
  - Rotate the linebuffers of the streams that were used
  - Repeat the check for more lines to process
- Close the files
- Print the total results, if requested
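The steps above amount to a two-way merge. A minimal sketch follows, assuming the lines are already in memory as sorted arrays of strings instead of being read from streams into linebuffers. The names comm_merge(), write_col(), and struct totals are hypothetical, and the column separator is fixed to a tab.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters, in the spirit of the --total summary. */
struct totals { unsigned long only1, only2, both; };

/* Write line in column col (1, 2 or 3), preceded by col-1 tabs. */
static void write_col(FILE *out, const char *line, int col)
{
    for (int k = 1; k < col; k++)
        fputc('\t', out);
    fwrite(line, 1, strlen(line), out);
    fputc('\n', out);
}

/* Merge two sorted arrays of lines into three-column output. */
struct totals comm_merge(const char **a, size_t na,
                         const char **b, size_t nb, FILE *out)
{
    struct totals t = {0, 0, 0};
    size_t i = 0, j = 0;

    assert(out != NULL);
    while (i < na || j < nb) {
        int order;
        if (i == na)
            order = 1;               /* file 1 exhausted: take file 2 */
        else if (j == nb)
            order = -1;              /* file 2 exhausted: take file 1 */
        else
            order = strcmp(a[i], b[j]);

        if (order == 0) {            /* line in both files: column 3 */
            write_col(out, a[i], 3);
            t.both++; i++; j++;
        } else if (order < 0) {      /* line only in file 1: column 1 */
            write_col(out, a[i], 1);
            t.only1++; i++;
        } else {                     /* line only in file 2: column 2 */
            write_col(out, b[j], 2);
            t.only2++; j++;
        }
    }
    return t;
}
```

Only the file or files that supplied the written line advance, which is why both inputs must be sorted the same way for the columns to be meaningful.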
Failure cases:
- Unable to open or close file streams
- Unable to read from a file stream
Failures at this stage output an error message to STDERR.