Summary
pr - convert text files for printing
Lines of code: 2848
Principal syscalls: write() (via putchar())
Support syscall: None
Options: 62 (36 short, 26 long)
Descended from pr introduced in Version 1 UNIX (1971)
Added to Textutils in November 1992 [First version]
Number of revisions: 241
The pr utility includes excellent documentation at the top of the source file.
Setup
pr defines many globals and a struct used throughout the code. Some key ideas are:
Structs:
struct COLUMN
- Manages data for a single column on the page. This struct is very similar to an 'object' in OOP: it holds state variables plus generic function pointers to which we can assign custom behaviors (a la methods). All output runs through the COLUMN structs.
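To make the "object with methods" idea concrete, here is a minimal sketch of a struct that carries a function pointer chosen at setup time. Everything in it (struct demo_column, demo_print_from_file(), demo_print_from_buffer(), demo_column_init(), demo_emit()) is a hypothetical stand-in, not the real struct COLUMN, which has many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for pr's struct COLUMN. */
struct demo_column {
    const char *source;      /* label only: where the data comes from */
    int lines_to_print;      /* lines this column still owes the page */
    /* "Method" slot: chosen once, then called through the pointer. */
    int (*print_func)(struct demo_column *self);
};

/* Behavior for a column fed straight from a file (returns the tag 1). */
static int demo_print_from_file(struct demo_column *self)
{
    self->lines_to_print--;
    return 1;
}

/* Behavior for a column fed from a stored buffer (returns the tag 2). */
static int demo_print_from_buffer(struct demo_column *self)
{
    self->lines_to_print--;
    return 2;
}

/* "Constructor": bind the state and pick the behavior. */
static void demo_column_init(struct demo_column *c, const char *source,
                             int lines, bool stored)
{
    assert(c != NULL);
    c->source = source;
    c->lines_to_print = lines;
    c->print_func = stored ? demo_print_from_buffer : demo_print_from_file;
}

/* Dispatch through the pointer; 0 means the column had nothing left. */
static int demo_emit(struct demo_column *c)
{
    assert(c != NULL && c->print_func != NULL);
    if (c->lines_to_print <= 0)
        return 0;
    return c->print_func(c);
}
```

The caller never needs to know which behavior was bound: demo_emit() works the same for a file-backed and a buffer-backed column, which is the property the COLUMN design relies on.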
Globals:
align_empty_cols
- Flag indicating empty columns in line
*buff
- The buffer to store column data
buff_current
- The index to buff
buff_allocated
- The size of buff
*column_vector
- All of the columns we need to output
*end_vector
- Holds the horizontal position of the end of each line
explicit_columns
- Flag set if the number of columns is given
FF_only
- Flag to indicate a form feed detected in column
have_read_stdin
- Flag indicating that we used STDIN
join_lines
- Flag set for line merger (blends options -w and -s)
*line_vector
- Start index of each line in buff
parallel_files
- Flag for printing multiple files in parallel (default no)
print_a_header
- Flag indicating the time to print a page header
storing_columns
- Flag set if we're printing a single file in columns (must buffer)
truncate_lines
- Flag to clip lines longer than page width
use_form_feed
- Flag to use form feed in place of newlines (\f vs \n)
Other important globals hold the computed page geometry. Their names are self-explanatory: lines_per_page, lines_per_header, lines_per_body, lines_per_footer, chars_per_line, chars_per_column, and chars_per_output_tab.
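As a rough illustration of how these geometry values relate, the sketch below derives the body height and column width from the page settings. The struct demo_geometry and demo_compute_geometry() are hypothetical, the formula ignores line numbers and offsets, and the "drop header and footer when the page is too short" rule is an assumption of this sketch rather than a statement about the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the geometry globals named above. */
struct demo_geometry {
    int lines_per_page;
    int lines_per_header;
    int lines_per_footer;
    int lines_per_body;     /* computed */
    int chars_per_line;
    int chars_per_column;   /* computed */
};

/* Fill in the computed fields; returns false if nothing fits on the page. */
static bool demo_compute_geometry(struct demo_geometry *g,
                                  int columns, int sep_width)
{
    assert(g != NULL);
    if (columns < 1 || sep_width < 0)
        return false;

    g->lines_per_body =
        g->lines_per_page - g->lines_per_header - g->lines_per_footer;
    if (g->lines_per_body <= 0) {
        /* Assumption: a page too short for header and footer drops both. */
        g->lines_per_header = 0;
        g->lines_per_footer = 0;
        g->lines_per_body = g->lines_per_page;
    }

    /* Width left after the (columns - 1) separators, shared evenly. */
    g->chars_per_column =
        (g->chars_per_line - (columns - 1) * sep_width) / columns;

    return g->lines_per_body > 0 && g->chars_per_column > 0;
}
```

With a 66-line page, 5-line header and footer, and a 72-character line (the POSIX defaults for pr), three columns with a one-character separator give lines_per_body = 56 and chars_per_column = 23.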
main() begins by initializing a few more variables:
n_files
- The number of files we're processing (index for file_names)
old_options
- Flag indicating that we're using old options (-w or -s)
old_w
- Flag set if we're using the old page width option
old_s
- Flag set if we're using the old separator option
file_names
- Array holding the input file names
column_count_string
- Holds the column count as a string
n_digits
- Length of the column count
n_alloc
- Allocated length of the column count
Parsing
Parsing the command-line input answers these questions:
- What is the page format? (length, columns, spacing, header, etc.)
- What is the input source? (files, stdin)
After parsing options, some legacy choices, including -s and -w, are translated to their newer equivalents.
Finally, we copy the file names passed on the command line into the file_names[] array. If there are no file names, assume that the remainder of the standard input (usually redirected in) is to be processed as a file.
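The file-name step can be sketched as follows. This is a simplified illustration, not the utility's code: demo_collect_files() and demo_is_stdin() are hypothetical, first_operand plays the role that optind has after getopt-style parsing, and using "-" as the stand-in name for standard input is a convention of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a collected name means "read standard input" (sketch rule). */
static bool demo_is_stdin(const char *name)
{
    return strcmp(name, "-") == 0;
}

/* Copy the operands left after option parsing into file_names[].
   Returns the number of entries stored; with no operands, a single
   "-" entry stands in for standard input. */
static size_t demo_collect_files(int argc, char **argv, int first_operand,
                                 const char **file_names, size_t capacity)
{
    size_t n_files = 0;

    assert(argv != NULL && file_names != NULL && capacity > 0);
    for (int i = first_operand; i < argc && n_files < capacity; i++)
        file_names[n_files++] = argv[i];

    if (n_files == 0)
        file_names[n_files++] = "-";
    return n_files;
}
```

For the argument vector {"pr", "-2", "a.txt", "b.txt"} with first_operand set to 2, this stores two names; for {"pr", "-2"} it stores the single "-" entry.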
Parsing failures
These failure cases are explicitly checked:
- Nonsensical page ranges, line numbers, or offsets
- Missing option operands
- Unusual page widths
- Combining column count and parallel printing
- Combining printing across and parallel printing
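Checks of this kind can be sketched as a single validation pass over the parsed options. The struct demo_options, demo_validate(), the "0 means no upper bound" convention, and the message strings are all hypothetical and only mirror the categories listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the parsed options relevant to validation. */
struct demo_options {
    int first_page;         /* 1-based */
    int last_page;          /* 0 = no upper bound (sketch convention) */
    int chars_per_line;
    bool explicit_columns;  /* a column count was given */
    bool parallel_files;    /* print files side by side */
    bool print_across;      /* fill columns across, not down */
};

/* Returns NULL when the options are consistent, else a short reason. */
static const char *demo_validate(const struct demo_options *o)
{
    assert(o != NULL);
    if (o->first_page < 1
        || (o->last_page != 0 && o->last_page < o->first_page))
        return "invalid page range";
    if (o->chars_per_line < 1)
        return "invalid page width";
    if (o->explicit_columns && o->parallel_files)
        return "column count conflicts with parallel printing";
    if (o->print_across && o->parallel_files)
        return "printing across conflicts with parallel printing";
    return NULL;
}
```

Returning a reason string instead of exiting keeps the sketch testable; a real utility would more likely report the error and terminate at the first failed check.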
Execution
Now we're ready for input processing, which runs through several layers: File, Page, Line, and Column. Each layer performs its tasks and checks before calling the next lower layer. Actual output printing happens at the Column layer using the COLUMN struct's print function.
Files
First we process all the input files within the function print_files(). Since this is the highest level, we start by preparing the execution environment: global parameters are computed from the parsed options. These include line sizes, separators, tabs, join behavior, and truncation.
The remainder of the file processing includes:
- Initializing column buffers
- Skipping pages if requested by the user
- Determining the final output functions for the COLUMNs
- Passing to the page layer (print_page())
Pages
Like files, print_page() begins with page initialization, which sets the source for each column to be printed (either a stored buffer or directly from the file).
Other paging flags are the header and the vertical padding/spacing.
Then we begin the loop through each line on the page.
Lines
The line loop is contained within the print_page() function.
At the beginning of a line we must reset counters including output position, spaces skipped, separators skipped, padding status, and a few other flags. Now we perform the actual output by looping through each column (see next section). Afterwards, we complete a line by vertically padding and double spacing (if necessary).
Columns
The heart of this procedure is the call to COLUMN->print_func(). Naturally, this only happens if the column still has data left to print. Subsequently, we may need to print column separators or add alignment padding if the line was empty.
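The line and column loops can be sketched together as below. This is a simplified model under stated assumptions: struct demo_col, demo_print_cell(), demo_cols_ready(), and demo_print_page() are hypothetical, output goes to a caller-supplied buffer instead of stdout, and padding with spaces to a fixed col_width stands in for the real separator and alignment logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical column: a list of lines plus a print "method". */
struct demo_col {
    const char **lines;
    int next;          /* index of the next line to print */
    int remaining;     /* lines left in this column */
    void (*print_func)(struct demo_col *self, char *cell, size_t cap);
};

/* Copy the column's next line into cell and advance the column. */
static void demo_print_cell(struct demo_col *self, char *cell, size_t cap)
{
    snprintf(cell, cap, "%s", self->lines[self->next]);
    self->next++;
    self->remaining--;
}

/* Count the columns that still have data. */
static int demo_cols_ready(const struct demo_col *cols, int ncols)
{
    int ready = 0;
    for (int j = 0; j < ncols; j++)
        if (cols[j].remaining > 0)
            ready++;
    return ready;
}

/* Render up to lines_per_body lines into out; returns bytes written. */
static size_t demo_print_page(struct demo_col *cols, int ncols,
                              int lines_per_body, int col_width,
                              char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (int line = 0; line < lines_per_body; line++) {
        if (demo_cols_ready(cols, ncols) == 0)
            break;                       /* nothing left for this page */
        for (int j = 0; j < ncols; j++) {
            char cell[64] = "";
            int n;

            /* Only columns with data get their print function called. */
            if (cols[j].remaining > 0)
                cols[j].print_func(&cols[j], cell, sizeof cell);
            /* Pad every column except the last to col_width. */
            n = snprintf(out + used, cap - used, "%-*s",
                         j < ncols - 1 ? col_width : 0, cell);
            if (n < 0 || (size_t)n >= cap - used) {
                out[used] = '\0';        /* out of room: stop cleanly */
                return used;
            }
            used += (size_t)n;
        }
        if (used + 2 > cap)
            return used;
        out[used++] = '\n';
        out[used] = '\0';
    }
    return used;
}
```

An exhausted column still occupies its slot on the line (it is padded with blanks), which loosely mirrors the alignment padding mentioned above.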
File processing can fail in the following cases:
- Too many pages (overflow)
- Unable to close a file
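The page-overflow case can be illustrated with a guarded increment. This is only a sketch: demo_next_page() is hypothetical, and it assumes the page counter is an unsigned uintmax_t, so the check compares against UINTMAX_MAX before incrementing instead of letting the value wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Advance the page counter; returns false instead of wrapping around. */
static bool demo_next_page(uintmax_t *page_number)
{
    assert(page_number != NULL);
    if (*page_number == UINTMAX_MAX)
        return false;   /* caller would report the overflow and stop */
    ++*page_number;
    return true;
}
```

Checking before the increment matters: after a wrap the counter would silently restart at 0 and the page headers would be wrong.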
Helpers
add_line_number()
- Prints the current line number
align_column()
- Pads column if necessary and prints the separator
balance()
- Computes balanced lines per page (as in SysV)
char_to_clump()
- Converts a char into the clump buffer and returns the true size
cleanup()
- Frees global buffers
close_file()
- Closes an input file and updates COLUMN data
cols_ready_to_print()
- Returns number of columns with input ready
first_last_page()
- Sets the first and last page number
getoptarg()
- Parses option groups
getoptnum()
- Parses number arguments
hold_file()
- Suppresses file updates for the rest of the page
init_fps()
- Initializes input files for processing
init_funcs()
- Sets up the printing/reading functions for COLUMNs
init_header()
- Constructs the page header
init_page()
- Gathers column status for the page and sets globals
init_parameters()
- Sets up the page geometry data from input
init_store_cols()
- Allocates column data
integer_overflow()
- Fail procedure for integer overflow
open_file()
- Accesses an input file and sets COLUMN data as needed
pad_across_to()
- Pads line to a position
pad_down()
- Pads the rest of the page (\f or \n as needed)
print_char()
- Prints a character, escaping input as needed
print_clump()
- Prints a character group
parse_column_count()
- Parses a column count string
print_files()
- Processes all input files
print_header()
- Outputs the page header
print_page()
- Outputs a page
print_sep_string()
- Counts separator usage and prints appropriate character
print_stored()
- Prints a line from the buffer
print_white_space()
- Prints blanks as needed for space or tab
read_line()
- Reads an entire line, clumping characters among \n, \f, and EOF
read_rest_of_line()
- Reads the remainder of a line after a break
reset_status()
- Resumes files on hold
separator_string()
- Updates separator string length
skip_read()
- Reads and counts lines from columns. Discards characters
skip_to_page()
- Skips lines until a specific page
store_char()
- Handles a character by storing in a buffer
store_columns()
- Buffers leading columns in a line
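As one example of what a helper like balance() has to work out, the sketch below spreads a number of stored lines as evenly as possible over the columns, giving any remainder to the leftmost columns. demo_balance() is a hypothetical illustration of that idea, not a copy of the real function, which also updates the COLUMN structs.

```c
#include <assert.h>
#include <stddef.h>

/* Split total_stored lines over the columns; leftmost columns get the
   extra line when the split is uneven. Results go in lines_in_col[]. */
static void demo_balance(int total_stored, int columns, int *lines_in_col)
{
    assert(total_stored >= 0 && columns > 0 && lines_in_col != NULL);
    for (int i = 0; i < columns; i++) {
        lines_in_col[i] = total_stored / columns;
        if (i < total_stored % columns)
            lines_in_col[i]++;
    }
}
```

For 7 stored lines and 3 columns this yields 3, 2, and 2 lines, so the last page of a file fills columns evenly instead of leaving the rightmost ones empty.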