Summary
dd - convert and copy a file
Lines of code: 2524
Principal syscalls: read(), write()
Support syscalls: fstat(), fsync(), ftruncate(), freopen(), fadvise()
Options: 0. dd uses OS/360-style operands instead; there are 13 of them.
Descended from dd introduced in Version 5 UNIX (1974)
Added to Fileutils in October 1992 [First version]
Number of revisions: 332
The complexity naturally leads to a very long list of helper functions:
Helpers:
advance_input_after_read_error() - Seeks forward through the input
advance_input_offset() - Adds a value to an offset
alloc_ibuf() - Allocates the input buffer
alloc_obuf() - Allocates the output buffer
apply_translations() - Builds the translation table for this invocation
cache_round() - Rounds down the cache length to a buffer size multiple
cleanup() - Closes I/O as needed
copy_simple() - Copies without conversion
copy_with_block() - Copies input lines to fixed length output blocks
copy_with_unblock() - Copies input lines to variable length output (spaces removed)
dd_copy() - Main copy and convert function
finish_up() - Prepares to end the utility (restore signals, etc)
ifd_reopen() - Interruptible file reopen
iftruncate() - Interruptible file truncate
install_signal_handlers() - Replaces default signal handlers
interrupt_handler() - Handles a standard signal
invalidate_cache() - Clears cache for data already processed
iread() - Interruptible read of data from input (signal checks)
iread_fullblock() - Wrapper for iread for an entire block
iwrite() - Interruptible write of data from buffer (signal checks)
maybe_close_stdout() - Decides how to close standard output at 'exit time'
multiple_bits_set() - True if input int has multiple asserted bits
operand_is() - Verifies that operand is known
operand_matches() - Confirms that inputs match operand format
nl_error() - Error with newline
parse_integer() - Converts a string to a number, applying multipliers
parse_symbols() - Verifies that input symbol is known
print_stats() - Prints output statistics
print_xfer_stats() - Outputs time and data statistics
process_signals() - Handles pending signals
quit() - Cleanly exits the utility
scanargs() - Parses the operands and sets flags
set_fd_flags() - Alters the target file flags
siginfo_handler() - Custom INFO signal handler (print stats)
skip() - Skips data blocks for the target buffer (read and discard)
skip_via_lseek() - Wrapper for lseek to support tape drives
swab_buffer() - Byte swaps the indicated buffer
translate_buffer() - Applies the active conversion to the target buffer
translate_charset() - Builds the active conversion table
write_output() - Pushes the output buffer to target output
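To make the two record-oriented helpers more concrete, here is a minimal Python sketch of what conv=block and conv=unblock do to the data. It is an illustration of the documented behavior, not a translation of copy_with_block() or copy_with_unblock(); the function names block_records and unblock_records are hypothetical, and the real helpers work incrementally across buffer boundaries rather than on a whole byte string.

```python
def block_records(data: bytes, cbs: int) -> bytes:
    """Turn newline-terminated lines into fixed-length records of cbs bytes.

    Short lines are padded with spaces, long lines are truncated.
    """
    out = bytearray()
    lines = data.split(b"\n")
    # A trailing newline leaves one empty final piece; it is not a record.
    if lines and lines[-1] == b"":
        lines.pop()
    for line in lines:
        out += line[:cbs].ljust(cbs, b" ")
    return bytes(out)


def unblock_records(data: bytes, cbs: int) -> bytes:
    """Turn fixed-length records back into lines.

    Trailing spaces are removed from each record and a newline is appended.
    """
    out = bytearray()
    for start in range(0, len(data), cbs):
        record = data[start:start + cbs]
        out += record.rstrip(b" ") + b"\n"
    return bytes(out)
```

With cbs=4, the input "ab\ncdefg\n" becomes the two records "ab  " and "cdef"; unblocking those records gives back "ab\ncdef\n", so the truncated byte is lost for good.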
Setup
The dd utility begins by defining a few global arrays and enums. Some key players are:
Global Arrays:
conversions[] - Conversion types for arguments
flags[] - Flag values for I/O buffers
statuses[] - Status argument values
ascii_to_ebcdic[] - An indexed conversion table (in octal)
ascii_to_ibm[] - An indexed conversion table (in octal)
ebcdic_to_ascii[] - An indexed conversion table (in octal)
Enums follow these arrays, including conversion flag values, status values, I/O flag values, and human values. These are implemented as plain values or bitfields depending on usage.
After main() is called, the first action is to install custom signal handlers. dd can be a long-running process and benefits from an extra feature: printing I/O statistics on a repurposed signal (SIGINFO where the system provides it, otherwise SIGUSR1, as on Linux).
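The pattern behind this feature is that the handler only records that a signal arrived, and the copy loop reports statistics at a safe point. The Python sketch below shows that shape under stated assumptions: it reuses the names siginfo_handler, install_signal_handlers and process_signals for readability, but the bodies, the stats dictionary, and the report layout are illustrative and not taken from dd.c. It assumes a POSIX system.

```python
import signal

# Set by the handler, consumed by the copy loop (illustrative global).
info_pending = False


def siginfo_handler(signum, frame):
    """Do as little as possible in the handler: just note the request."""
    global info_pending
    info_pending = True


def install_signal_handlers():
    """Prefer SIGINFO where the platform defines it, else fall back to SIGUSR1."""
    signum = getattr(signal, "SIGINFO", None) or signal.SIGUSR1
    signal.signal(signum, siginfo_handler)
    return signum


def format_stats(stats):
    """Render full+partial record counts (layout is an approximation)."""
    return (
        f"{stats['full_in']}+{stats['partial_in']} records in\n"
        f"{stats['full_out']}+{stats['partial_out']} records out\n"
        f"{stats['bytes']} bytes copied"
    )


def process_signals(stats):
    """Return a report if a signal arrived since the last call, else None."""
    global info_pending
    if not info_pending:
        return None
    info_pending = False
    return format_stats(stats)
```

A copy loop would call process_signals() between blocks and print any report it returns to standard error, so the statistics never interrupt a read or write halfway.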
Parsing
Parsing in dd differs from most coreutils in that there are technically no options. The utility still uses getopt trivially, but the real work is done by scanargs(). The arguments are parsed in the <symbol>=<value> format, which again borrows from the OS/360 syntax.
Some questions answered by parsing the operands:
- What are the input and output files?
- What is the conversion type?
- Where do we begin in the I/O files?
- How much do we need to convert?
- Are there special considerations, such as casing, padding, or truncation?
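The questions above map onto a small name=value parser. The following Python sketch is a simplified stand-in for scanargs() and parse_integer(), not their actual code. The multiplier table is a subset of the suffixes documented in the GNU manual, the "x" multiplication follows the documented syntax, and the returned dictionary is a hypothetical structure chosen for illustration (the real utility sets global variables and flag bits).

```python
# Subset of the size suffixes documented for GNU dd.
MULTIPLIERS = {
    "c": 1, "w": 2, "b": 512,
    "kB": 1000, "K": 1024,
    "MB": 1000**2, "M": 1024**2,
    "GB": 1000**3, "G": 1024**3,
}

FILE_OPERANDS = ("if", "of")
NUMERIC_OPERANDS = ("bs", "ibs", "obs", "cbs", "count", "skip", "seek")
LIST_OPERANDS = ("conv", "iflag", "oflag")


def parse_integer(text):
    """Parse values such as '512', '4K' or '2x512' into an integer."""
    total = 1
    for factor in text.split("x"):
        digits = 0
        while digits < len(factor) and factor[digits].isdigit():
            digits += 1
        suffix = factor[digits:]
        if digits == 0 or (suffix and suffix not in MULTIPLIERS):
            raise ValueError(f"invalid number: {text!r}")
        total *= int(factor[:digits]) * MULTIPLIERS.get(suffix, 1)
    return total


def scanargs(argv):
    """Split each operand at the first '=' and convert its value."""
    operands = {}
    for arg in argv:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"unrecognized operand {arg!r}")
        if name in FILE_OPERANDS or name == "status":
            operands[name] = value
        elif name in NUMERIC_OPERANDS:
            operands[name] = parse_integer(value)
        elif name in LIST_OPERANDS:
            operands[name] = value.split(",")
        else:
            raise ValueError(f"unrecognized operand {arg!r}")
    return operands
```

For example, scanargs(["if=in.img", "bs=4K", "count=2x3"]) yields a block size of 4096 and a count of 6, while an unknown name such as "bogus=1" raises ValueError.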
Parsing failures
These failure cases are explicitly checked:
- Unknown operands or values
- Excessive values
- Nonsensical flags (e.g. seek input, skip/count output, etc)
- Mutually exclusive flags (block/unblock, lower and upper case, etc)
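The mutual-exclusion check is a natural fit for the bitfield enums and the multiple_bits_set() helper listed earlier. The Python sketch below shows the idea: mask out one group of conflicting bits and test whether more than one remains. The bit values, the group table, and the messages are illustrative assumptions rather than a copy of dd.c.

```python
# Illustrative bit assignments (octal, in the spirit of the C enums).
C_ASCII, C_EBCDIC, C_IBM = 0o1, 0o2, 0o4
C_BLOCK, C_UNBLOCK = 0o10, 0o20
C_LCASE, C_UCASE = 0o40, 0o100

# Each entry: (mask of mutually exclusive bits, message if two are set).
EXCLUSIVE_GROUPS = [
    (C_ASCII | C_EBCDIC | C_IBM, "cannot combine any two of {ascii,ebcdic,ibm}"),
    (C_BLOCK | C_UNBLOCK, "cannot combine block and unblock"),
    (C_LCASE | C_UCASE, "cannot combine lcase and ucase"),
]


def multiple_bits_set(value):
    """Clearing the lowest set bit leaves something only if 2+ bits were set."""
    return (value & (value - 1)) != 0


def check_conversions(conv_mask):
    """Return the list of conflicts found in a conversion bitmask."""
    return [
        message
        for mask, message in EXCLUSIVE_GROUPS
        if multiple_bits_set(conv_mask & mask)
    ]
```

check_conversions(C_LCASE | C_UCASE) reports one conflict, whereas C_BLOCK | C_LCASE passes because the two bits belong to different groups.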
Extra Comments
Here is an example of the OS/360 DD syntax as described on an N74167 punch card in the JCL user guide from 1971:
Note that this is just a syntax example: the OS/360 DD statement is completely unrelated to POSIX dd.
Execution
The parser grabbed the conversion details, so now we can set up the translation. This means copying the global conversion tables to the active trans_table[]. In the 21st century this doesn't get much use (especially the IBM and EBCDIC formats). The trans_table was already initialized to the ASCII identity mapping and it likely stays that way.
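A 256-entry lookup table is all the translation step needs. The Python sketch below illustrates the idea of starting from an identity table and layering a conversion on top of it; translate_charset here is a simplified stand-in for the helper of the same name, and upper_table is a hypothetical example table standing in for conv=ucase.

```python
# Identity mapping: every byte translates to itself.
IDENTITY = bytes(range(256))


def upper_table():
    """Example conversion table mapping ASCII a-z to A-Z."""
    table = bytearray(range(256))
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32
    return bytes(table)


def translate_charset(active, new_table):
    """Layer new_table on top of the active table.

    After this call, a byte b maps to new_table[active[b]].
    """
    return bytes(new_table[mapped] for mapped in active)


def translate_buffer(buf, table):
    """Run every byte of the buffer through the table."""
    return buf.translate(table)
```

Starting from IDENTITY and applying upper_table() gives a table that turns "dd Copy" into "DD COPY"; with no conversion requested the table stays an identity and the buffer passes through unchanged.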
Next we open both input and output files, skipping and seeking as necessary. These targets may be standard input and standard output.
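Skipping input has two paths, which matches the descriptions of skip() and skip_via_lseek() in the helper list: seek when the descriptor allows it, otherwise read and discard. The Python sketch below shows that fallback under simplifying assumptions. It is not the dd.c logic: it does not check whether the seek lands beyond the end of a file, and it counts a short read as one skipped record.

```python
import os


def skip(fd, records, blocksize):
    """Skip records * blocksize bytes on fd.

    Returns the number of records that could NOT be skipped
    (0 on complete success).
    """
    try:
        os.lseek(fd, records * blocksize, os.SEEK_CUR)
        return 0
    except OSError:
        # Not seekable (pipe, terminal, ...): read and discard instead.
        pass
    while records > 0:
        chunk = os.read(fd, blocksize)
        if not chunk:
            break  # hit end of input early
        records -= 1
    return records
```

On a pipe the lseek call fails, so the loop consumes the data block by block; a nonzero return value tells the caller that the input ended before the requested offset was reached.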
We're now ready for the core work in the dd_copy() function:
- Verify that the input skip is usable
- Verify that the output seek is usable
- Allocate the input and output buffers
- Start the main read/convert loop
  - Advance the progress timer
  - Clear the input buffer to avoid stale data
  - Read from input to the input buffer
  - Verify read success and handle errors
  - Handle partial block reads
  - Translate the input buffer (run all bytes through table)
  - Swap byte order, if requested (swap every other byte)
  - Push block to the output buffer and write to output target
  - Repeat this until all data is read
- Clean up remaining data
  - Handle dangling odd byte, if any
  - Add padding to fill out any remaining block size
  - End with new line
  - Write the last block
- Truncate if needed
- Restore signal handlers
- Print result status
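The steps above can be condensed into a short Python sketch of the read/convert/write loop. It is a deliberately reduced model, not dd_copy() itself: it uses a single block size for input and output, leaves a trailing odd byte in place within each block instead of carrying it into the next read, pads short blocks with NUL bytes only when the hypothetical sync parameter is set, and omits error handling, timers and truncation.

```python
def swab_buffer(buf):
    """Swap each pair of adjacent bytes; a trailing odd byte stays put."""
    out = bytearray(buf)
    even = len(out) - (len(out) % 2)
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)


def dd_copy(src, dst, blocksize, table=None, swab=False, sync=False):
    """Copy src to dst block by block; return (full, partial) record counts.

    src and dst are binary file-like objects with read() and write().
    """
    full = partial = 0
    while True:
        block = src.read(blocksize)
        if not block:
            break  # end of input
        if len(block) < blocksize:
            partial += 1
            if sync:
                # Pad a short read up to the full block size.
                block = block.ljust(blocksize, b"\0")
        else:
            full += 1
        if table is not None:
            block = block.translate(table)  # run all bytes through the table
        if swab:
            block = swab_buffer(block)
        dst.write(block)
    return full, partial
```

Copying ten bytes with a block size of 4 produces two full records and one partial record, which is the "2+1" shape that the final statistics report.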
Execution could fail in several ways:
Failure cases:
- Unable to open input or output targets
- Unable to truncate output
- Unable to fstat output target
- Unable to drop the cache for input or output
- Unable to sync file caches
- Unable to set requested file flags
- Seeking beyond the end of a file
- Unable to modify target file flags