[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]
Summary
cat - concatenate and print files
Lines of code: 768
Principal syscall: write() -- wrapped by full_write()
Support syscalls: fstat()
Options: 19 (10 short, 9 long)
Descended from cat introduced in Version 1 UNIX (1971)
Added to Textutils in November 1992 [First version]
Number of revisions: 162
- cat() - Implements all the features for I/O copying
- next_line_num() - Update the line number buffer
- simple_cat() - Basic copy from input to output
- write_pending() - Full-write any pending data

- die() - Exit with mandatory non-zero error and message to stderr
- error() - Outputs error message to standard error with possible process termination
- full_write() - Wrapper for write() that retries on interrupt
- getpagesize() - Gets the memory page size for the system
- io_blksize() - Gets the optimal block size
- ptr_align() - Ensures that returned pointer is memory aligned
- safe_read() - Reads with retry on interrupt
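To make "retries on interrupt" concrete, here is a minimal sketch of that pattern. It is not the gnulib full_write() source; write_all() is a hypothetical name, and the only assumption is the POSIX behavior that write() may return a short count or fail with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not gnulib's full_write): keep calling write()
   until all `count` bytes are out, retrying when a signal interrupts
   the call. Returns the number of bytes written; a result smaller than
   `count` means an error occurred and errno describes it. */
static size_t write_all(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    const char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = write(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything: retry */
            break;          /* real error: report the short count */
        }
        if (n == 0)
            break;          /* no progress: stop rather than spin */
        total += (size_t)n; /* partial write: continue with the remainder */
    }
    return total;
}
```

A caller compares the return value with the requested count, which is how a short count signals failure in this sketch.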
Setup
At global scope, cat.c does the following:
- Defines infile to point to the input file name
- Defines input_desc to hold the file descriptor
- Defines line_buf[] and several associated pointers to manage line number counts. The 18-digit limit is unlikely to matter under normal operation.
- Defines newlines to track the number of newlines across many inputs
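The line number is kept as text in a buffer and incremented in place, so no integer-to-string conversion is needed per line. The following is a minimal sketch of that idea, not the cat.c source: bump_line_num() and the 6-character NUM_WIDTH are hypothetical, chosen small so the carry and overflow cases are easy to see.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical width for illustration; the real buffer holds 18 digits. */
#define NUM_WIDTH 6

/* Increment a right-aligned, space-padded decimal counter stored as text,
   e.g. "     9" becomes "    10". On overflow the first character is
   set to '>' to flag that the counter no longer fits. */
static void bump_line_num(char buf[NUM_WIDTH + 1])
{
    assert(buf != NULL);
    for (int i = NUM_WIDTH - 1; i >= 0; i--) {
        if (buf[i] == ' ') {    /* counter grows by one digit */
            buf[i] = '1';
            return;
        }
        if (buf[i] < '9') {     /* no carry needed */
            buf[i]++;
            return;
        }
        buf[i] = '0';           /* '9' rolls over, carry to the left */
    }
    buf[0] = '>';               /* every position carried: overflow */
}
```

Most increments touch only the last character, which is why this representation is cheap when numbering every line.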
main() initializes the following:
- argind - argv index for the argument to cat
- c - Holds the next option character for parsing
- file_open_mode - Bitmap holding the file mode
- have_read_stdin - Flag set if STDIN was used
- inbuf - Pointer to the input buffer
- insize - The optimal number of bytes to read in
- number - Flag for numbering the output lines
- number_nonblank - Flag for numbering the non-blank output lines
- ok - Flag for execution success
- out_dev - The output device number
- out_ino - The output inode number
- out_isreg - Flag if the output is a plain file
- outbuf - Pointer to the output buffer
- outsize - The optimal number of bytes to write out
- page_size - Stores the size of a memory page for the system (4k is common)
- show_ends - Flag for showing the end of line character ($)
- show_nonprinting - Flag for showing nonprintable characters
- show_tabs - Flag for showing tab characters (^I)
- squeeze_blank - Flag for skipping repeated blank lines
- stat_buf - Buffer for the result of fstat()
Parsing kicks off with the short options passed as a string literal:
"benstuvAET"
Parsing
During parsing, we're collecting options and arguments to answer the following questions:
- Do we display line numbers? On all lines or only non-empty?
- Do we display non-printables or end of lines?
- Do we collapse repeated blank lines?
Parsing failures
The only parsing failure is an unknown option. In that case, usage help is displayed.
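The parsing step can be sketched with plain POSIX getopt() and the same short-option string. This is an illustration, not the cat.c code: the real program also accepts long options, and struct cat_flags and parse_flags() are hypothetical names. The flag names mirror the variables listed under Setup, and the way each option sets them follows cat's documented behavior (for example, -A is equivalent to -vET).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical container for the flags that main() keeps as locals. */
struct cat_flags {
    bool number, number_nonblank, show_ends;
    bool show_nonprinting, show_tabs, squeeze_blank;
};

/* Parse short options only. Returns the index of the first file
   argument, or -1 on an unknown option (where cat would print usage). */
static int parse_flags(int argc, char *argv[], struct cat_flags *f)
{
    assert(argv != NULL && f != NULL);
    memset(f, 0, sizeof *f);
    optind = 1;   /* restart scanning so the sketch can be called again */
    opterr = 0;   /* keep getopt quiet; the caller reports the failure */

    int c;
    while ((c = getopt(argc, argv, "benstuvAET")) != -1) {
        switch (c) {
        case 'b': f->number = f->number_nonblank = true; break;
        case 'e': f->show_ends = f->show_nonprinting = true; break;
        case 'n': f->number = true; break;
        case 's': f->squeeze_blank = true; break;
        case 't': f->show_tabs = f->show_nonprinting = true; break;
        case 'u': break;                      /* accepted and ignored */
        case 'v': f->show_nonprinting = true; break;
        case 'A': f->show_nonprinting = f->show_ends = f->show_tabs = true; break;
        case 'E': f->show_ends = true; break;
        case 'T': f->show_tabs = true; break;
        default:  return -1;                  /* unknown option */
        }
    }
    return optind;
}
```

Every option only sets flags; nothing is read or written until parsing has finished, which is why an unknown option is the only way this stage can fail.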
Execution
cat goes through these steps during execution:
- Open and verify access to output
- Open and verify the input
- Choose an output cat method (simple or normal)
- Copy the data from input to output in buffer-sized increments
- Close the input and move to the next
- End with the 'best' possible status
Failure cases:
- Unable to fstat() input or output file
- The input and output files are the same
- Failure to write to the output file
- Failure to close the input file or standard input (if used)
All failures at this stage output an error message to STDERR and return without displaying usage help.
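The "input and output files are the same" check rests on comparing device and inode numbers from fstat(), which is what out_dev, out_ino and out_isreg are stored for. Below is a simplified sketch of that comparison. struct out_id, record_output() and input_is_output() are hypothetical names, and the real cat.c applies further conditions before treating the match as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical bundle of the three output facts main() remembers. */
struct out_id {
    dev_t dev;
    ino_t ino;
    bool isreg;
};

/* fstat() the output once, up front. Returns false if fstat() fails. */
static bool record_output(int out_fd, struct out_id *out)
{
    assert(out != NULL);
    struct stat st;
    if (fstat(out_fd, &st) != 0) {
        perror("standard output");
        return false;
    }
    out->dev = st.st_dev;
    out->ino = st.st_ino;
    out->isreg = S_ISREG(st.st_mode);
    return true;
}

/* Returns 1 if the input is the same regular file as the output,
   0 if it is not, and -1 if fstat() on the input fails. */
static int input_is_output(int in_fd, const struct out_id *out)
{
    assert(out != NULL);
    struct stat st;
    if (fstat(in_fd, &st) != 0)
        return -1;
    return out->isreg && st.st_dev == out->dev && st.st_ino == out->ino;
}
```

Comparing (device, inode) pairs rather than file names catches hard links and different paths to the same file.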
Extra comments
Two points to consider: the choice of cat method and the transfer buffer size.
cat() vs simple_cat()
If the input is copied directly to the output without changes, then simple_cat() is the method. But if additional formatting is requested (e.g. line numbers, non-printables), then the full cat() function is called. The latter is necessarily more complicated at ~300 lines vs 40.
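The split can be sketched as a predicate over the option flags plus a plain read/write loop. This is an illustration under assumptions, not the cat.c source: can_use_simple_copy() and copy_fd() are hypothetical names, and the loop calls read() and write() directly where the real code goes through safe_read() and full_write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* The simple path applies only when no option needs to inspect
   or rewrite the data. */
static bool can_use_simple_copy(bool number, bool show_ends,
                                bool show_nonprinting, bool show_tabs,
                                bool squeeze_blank)
{
    return !(number || show_ends || show_nonprinting
             || show_tabs || squeeze_blank);
}

/* Copy in_fd to out_fd until end of file. Returns true on success. */
static bool copy_fd(int in_fd, int out_fd, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted read: retry */
            return false;
        }
        if (n == 0)
            return true;            /* end of file */

        size_t off = 0;
        while (off < (size_t)n) {   /* write everything just read */
            ssize_t w = write(out_fd, buf + off, (size_t)n - off);
            if (w < 0 && errno == EINTR)
                continue;           /* interrupted write: retry */
            if (w <= 0)
                return false;
            off += (size_t)w;
        }
    }
}
```

Because the simple path never looks at the bytes, a single buffer is enough; the formatting path needs a separate, larger output buffer.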
Buffer sizes
Buffers hold data between read and write calls. Common sense says that the buffer should be at least as large as the largest single I/O move. However, reality is more complicated. Consider the stated output buffer size:
OUTSIZE - 1 + INSIZE * 4 + LINE_COUNTER_BUF_LEN + PAGE_SIZE - 1
Source comments provide some discussion, but I'll derive it differently. There are four factors at work:
- The buffer writes OUTSIZE sized chunks
  The buffer may not be written while it holds only OUTSIZE - 1 bytes. In the same pass, it must be able to accept the next read of INSIZE bytes. Therefore the output buffer must hold at least OUTSIZE - 1 + INSIZE bytes.
- Each character might be modified (non-printables)
  Each character read may be unprintable and need a leading 'M-^' indicator, turning one byte into four. Thus INSIZE is multiplied by 4 to hold the expansion.
- Each line might be modified (added line numbers)
  The line number buffer is LINE_COUNTER_BUF_LEN (20) bytes long: up to 18 digits plus a trailing tab and terminator. These line numbers are prepended to the line and thus must be part of the output buffer.
- Buffer access should be page-aligned
  Performance on some architectures depends on alignment. In the worst case, the buffer is allocated starting on the 2nd byte of a page, so aligning it means moving forward PAGE_SIZE - 1 bytes to the beginning of the next page.
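As a worked example of the formula, assume 128 KiB for both INSIZE and OUTSIZE and a 4 KiB page (illustrative values, not taken from the source): 131071 + 524288 + 20 + 4095 = 659474 bytes. The sketch below computes the same sum and shows one way to round a pointer up to the next page boundary. outbuf_size() and align_up() are hypothetical names, not the functions used in cat.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_COUNTER_BUF_LEN 20

/* Output buffer size, term by term as in the formula above. */
static size_t outbuf_size(size_t insize, size_t outsize, size_t page_size)
{
    return outsize - 1              /* leftover that is not yet written */
         + insize * 4               /* each input byte may become 4 bytes */
         + LINE_COUNTER_BUF_LEN     /* room for a prepended line number */
         + page_size - 1;           /* slack for aligning the start */
}

/* Round ptr up to the next multiple of alignment (at most
   alignment - 1 bytes forward); already aligned pointers are unchanged. */
static void *align_up(void *ptr, size_t alignment)
{
    assert(alignment != 0);
    uintptr_t rem = (uintptr_t)ptr % alignment;
    return rem == 0 ? ptr : (char *)ptr + (alignment - rem);
}
```

The PAGE_SIZE - 1 slack exists because the allocator gives no alignment guarantee at page granularity: the caller over-allocates and then uses the aligned pointer inside the allocation.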