[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]
Summary
cut - cut out selected fields of each line of a file
Lines of code: 610
Principal syscall: write() (indirectly through fwrite() and putchar())
Support syscalls: fadvise()
Options: 17 (7 short, 10 long)
Descended from cut as originated in System III (1982) (appeared internally as early as 1980)
Added to Fileutils in November 1992 [First version]
Number of revisions: 192
- next_item(): Updates current index to the next byte/field
- print_kth(): True if the current byte/field is printable
- is_range_start_index(): True if the current position starts a range
- cut_bytes(): Processes byte mode for an input stream
- cut_fields(): Processes field mode for an input stream
- cut_stream(): Directs stream processing to the desired mode
- cut_file(): Starts processing for an input stream
- die(): Exit with mandatory non-zero error and message to stderr
- error(): Outputs error message to standard error with possible process termination
- set_fields(): Initializes the field_range_pair array
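To make the range helpers a little more concrete, here is a minimal sketch of how a list such as "1,3-5,8-" could be turned into ranges and then queried. Everything prefixed with sketch_ is hypothetical and only loosely modeled on set_fields() and print_kth(); this version does not sort or merge ranges, so it should not be read as the actual implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for field_range_pair: one inclusive range. */
struct sketch_range {
    uintmax_t lo;
    uintmax_t hi;
};

/* Parse a list such as "1,3-5,8-" into at most max ranges.
   Returns the number of ranges stored, or 0 on a malformed list. */
size_t sketch_parse_list(const char *list, struct sketch_range *out, size_t max)
{
    size_t n = 0;
    const char *p = list;

    assert(list != NULL && out != NULL);
    while (*p != '\0') {
        uintmax_t lo = 1;              /* "-M" means "from 1" */
        uintmax_t hi;
        bool has_lo = false, has_hi = false;
        char *end;

        if (n == max)
            return 0;
        if (isdigit((unsigned char) *p)) {
            lo = strtoumax(p, &end, 10);
            p = end;
            has_lo = true;
        } else if (*p != '-') {
            return 0;
        }
        hi = lo;                       /* a bare "N" selects only N */
        if (*p == '-') {
            p++;
            hi = UINTMAX_MAX;          /* "N-" means "to end of line" */
            if (isdigit((unsigned char) *p)) {
                hi = strtoumax(p, &end, 10);
                p = end;
                has_hi = true;
            }
            if (!has_lo && !has_hi)
                return 0;              /* a lone "-" selects nothing */
        }
        if (lo == 0 || hi < lo)
            return 0;                  /* positions are 1-based */
        out[n].lo = lo;
        out[n].hi = hi;
        n++;
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return 0;
    }
    return n;
}

/* True if position k falls inside any stored range. */
bool sketch_print_kth(const struct sketch_range *r, size_t n, uintmax_t k)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].lo <= k && k <= r[i].hi)
            return true;
    return false;
}
```

A linear scan keeps the sketch short; walking sorted ranges with a moving index, as next_item() suggests, avoids rescanning for every byte or field.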
Setup
At global scope, cut.c does the following:
- Declares field_range_pair to hold the -f option range (from set-fields.h)
- Declares field_1_buffer as a character buffer for the first field
- Declares field_1_bufsize to hold the size of that buffer
- Defines the operating_mode enum for three operation modes (unknown, byte, and field)
- Declares operating_mode to hold the current operation mode
- Declares suppress_non_delimited to prevent field mode output for lines without delimiters
- Declares complement to output the 'inverse' of the selected bytes/fields
- Declares delim to hold the delimiting character for fields
- Defines line_delim as the standard newline character
- Declares output_delimiter_specified as a flag set if an output delimiter was defined
- Declares output_delimiter_length as the length of the output delimiter
- Declares output_delimiter_string as the delimiter used on output
- Declares have_read_stdin as a flag set if STDIN is used for processing
main() initializes the following:
- delim_specified: Flag if the user specifies a delimiter
- ok: Holds the return status of the utility
- optc: Holds the option character being parsed
Parsing kicks off with the short options passed as a string literal:
"b:c:d:f:nsz"
Parsing
During parsing, we're collecting options and arguments to answer the following questions:
- Are we using byte mode or field mode?
- What are the delimiters?
- Do we collapse consecutive spaces?
- If field mode, do we use a custom input delimiter?
- If field mode, do we suppress lines without delimiters?
Parsing failures
These failure cases are explicitly checked:
- Specifying more than one mode
- Using a multicharacter delimiter
- Using an input delimiter in byte mode
- Suppressing non-delimited output in byte mode
User-specified parsing failures result in a short error message followed by the usage instructions. Access-related parsing errors die with an error message.
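The four checks above could be expressed as one validation pass over the collected options. The sketch below is an assumption-laden illustration: the struct, the function name, and the message strings are all made up for this example and are not the messages cut prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sketch_mode { SKETCH_UNDEFINED, SKETCH_BYTE, SKETCH_FIELD };

/* Hypothetical summary of what option parsing collected. */
struct sketch_config {
    enum sketch_mode mode;
    int mode_count;       /* number of -b/-c/-f options seen */
    const char *delim;    /* -d argument, or NULL if not given */
    bool suppress;        /* -s */
};

/* Returns NULL if the configuration is acceptable, otherwise an
   illustrative message naming the first failed check. */
const char *sketch_validate(const struct sketch_config *c)
{
    assert(c != NULL);
    if (c->mode_count > 1)
        return "more than one mode specified";
    if (c->delim != NULL && strlen(c->delim) > 1)
        return "delimiter must be a single character";
    if (c->delim != NULL && c->mode != SKETCH_FIELD)
        return "input delimiter only applies to field mode";
    if (c->suppress && c->mode != SKETCH_FIELD)
        return "suppression only applies to field mode";
    return NULL;
}
```

A caller would print the returned message followed by the usage text, which matches the behavior described for user-specified parsing failures.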
Execution
cut goes through these steps during execution:
- Initialize the field range array using parsed configuration
- Open and verify the input (files or STDIN)
- Perform byte or field mode cut operation
- Write all results to STDOUT
Failure cases:
- Unable to fstat() input or output file
- The input and output files are the same
- Failure to write to the output file
- Failure to close standard input (if used)
All failures at this stage output an error message to STDERR and return without displaying usage help.
Extra comments
Let's touch on the two operating modes.
Byte Mode
Just as you'd expect, we're reading in a byte at a time. The procedure is simple:
- If the byte is within the print range: print it
- If not, do nothing
- If it's the end of a line, print it
- If it's the end of the file, print the line delimiter
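The byte-mode procedure can be sketched as a single loop. To keep the example easy to test, this hypothetical sketch_cut_bytes() works on an in-memory string and a single first..last range rather than on a stream and the full range array. It also assumes the trailing line delimiter is only added when the last line was not already terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy bytes first..last (1-based, inclusive) of every line of in
   to out. out must have room for strlen(in) + 2 bytes.
   Returns the number of bytes written, not counting the final NUL. */
size_t sketch_cut_bytes(const char *in, size_t first, size_t last, char *out)
{
    size_t byte_idx = 0;   /* position within the current line */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            out[n++] = '\n';       /* end of line: always printed */
            byte_idx = 0;
        } else {
            byte_idx++;
            if (first <= byte_idx && byte_idx <= last)
                out[n++] = *in;    /* inside the print range */
        }
    }
    if (byte_idx > 0)
        out[n++] = '\n';           /* terminate an unfinished last line */
    out[n] = '\0';
    return n;
}
```

For the input "hello\nworld" with the range 2 to 4, this writes "ell\norl\n"; the second newline comes from the end-of-input step.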
Field Mode
Field mode is more complicated since we may be suppressing lines and using custom delimiters. The first field has to be buffered so that its characters are retained until we know whether the line contains a delimiter and, with suppression on, whether it should be printed at all. A line without a delimiter has only one field.
- Test for EOF
- If the first field is the whole line, output it (including the end-of-line character) unless suppressing
- If the first field is the first of many, then print if selected
- Print any subsequent fields based on selection rule
- Escape the default line delimiter if overridden with -d option
- Repeat above for all lines until the EOF test fires
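The field-mode steps can be sketched in the same spirit. This hypothetical sketch_cut_fields() again works on an in-memory string with a single first..last field range, and it finds each line up front instead of buffering only the first field, so it shows the decisions rather than the streaming mechanics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy fields first..last (1-based, inclusive) of every line of in
   to out, joined by delim. Lines without delim are copied whole
   unless suppress is set. out needs strlen(in) + 2 bytes.
   Returns the number of bytes written, not counting the final NUL. */
size_t sketch_cut_fields(const char *in, char delim, size_t first,
                         size_t last, bool suppress, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t) (eol - in) : strlen(in);
        const char *end = in + len;

        if (memchr(in, delim, len) == NULL) {
            /* The first field is the whole line. */
            if (!suppress) {
                memcpy(out + n, in, len);
                n += len;
                out[n++] = '\n';
            }
        } else {
            const char *p = in;
            size_t field = 1;
            bool printed = false;

            for (;;) {
                const char *d = memchr(p, delim, (size_t) (end - p));
                const char *fend = d ? d : end;

                if (first <= field && field <= last) {
                    if (printed)
                        out[n++] = delim;   /* separator between fields */
                    memcpy(out + n, p, (size_t) (fend - p));
                    n += (size_t) (fend - p);
                    printed = true;
                }
                if (d == NULL)
                    break;                  /* that was the last field */
                p = d + 1;
                field++;
            }
            out[n++] = '\n';
        }
        in = end;
        if (*in == '\n')
            in++;                           /* step over the line delimiter */
    }
    out[n] = '\0';
    return n;
}
```

With the input "a:b:c\nnodelim\nx:y\n", delimiter ':' and fields 2 to 3, the result is "b:c\nnodelim\ny\n"; turning suppression on drops the "nodelim" line.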