[GNU Manual] [POSIX requirement] [Linux man] [FreeBSD man]
Summary
tsort - topological sort
Lines of code: 574
Principal syscalls: write() via puts()
Support syscalls: open(), close(), fadvise()
Options: 2 (--help
and --version
)
Descended from tsort introduced in Version 7 UNIX (1979)
Added to Textutils in January 1999 [First version]
Number of revisions: 93 [Code Evolution]
tsort is an ancient utility with hand-crafted procedures using custom data structures. It hasn't seen a functional change in a decade(s?) and thus uses virtually no outside assistance from gnulib.
Helpers:count_items()
- Tree function to Count the number of stringsdetect_loops()
- Checks for cycles in successor relationshipsnew_item()
- Creates a new item structurerecord_relation()
- Creates a new relationship between itemsrecurse_tree()
- Performs a given recursive operation on item sub-treescan_zeros()
- Tree function to add an item to global zeros listsearch_item()
- Binary tree search for a string in a tree of itemstsort()
- Performs a topological sort on a single input filewalk_tree()
- Moves to a tree node (right node)
die()
- Exit with mandatory non-zero error and message to stderrerror()
- Outputs error message to standard error with possible process termination
Setup
tsort uses two data structures to organize input and build the output:
struct item
- A node in a binary search tree with parameters including value, left and right child nodes, balance parameter, and a count of successorsstruct successor
- A list ofitem
node relationships
The utility also uses several global pointers to retain state:
*head
- The beginning of the output list*loop
- The loop test pointn_strings
- The number of strings to sort*zeros
- The end of a list of nodes with no ancestors (start node candidates)
The important local variables to know are actually declared in tsort() rather than main()
:
is_stdin
- Flag if we're reading input from STDIN*j
- one node in a relation*k
- The other node in a relationok
- The return status of the utility*root
- The top of our node treetokenbuffer
- A buffer to hold input relations
Parsing
Parsing tsort is different from most utilities in that there are only the default options to consider (help and version).
The utility does parse relationships input from either a file or STDIN, but I consider those part of the execution phase
Using any other option or providing no input data results in a short error message followed by the usage instructions.
Execution
The general strategy constructs a tree from the input data and repeatdly walks the tree collecting data and making decisions. This is much like the early stages of a compiler working on an abstract syntax tree.
The process looks like this:
- Initialize a tokenbuffer to for tokenizing input data
- Read each line of input and construct an item tree node
- Walk the tree to count the number of strings (the amount of work to do)
- Walk the tree again to find unconstrained nodes (no ancestors). The first found will be the head of the tree and candidate to output
- Print the head node, remove it, update tree structure, and repeat previous step tree walk
- Detect cycles -- any remaining nodes in the tree are part of a cycle, find it and print a solution to break the cycle
- Close the input file
Failure cases:
- Unable to open or close input stream
- Dangling constraints / mismatched relations
- Loop detected in node relationships (no topological ordering is possible)
Failures at this stage output an error message to STDERR unless quiet mode was enabled