Guide to Unix/Commands/Text Processing
Unix supports multiple text processing commands.
awk
awk is a powerful text-processing language built around regular expressions, with capabilities beyond those of #cut and #sed. You can learn more in the AWK and An Awk Primer Wikibooks.
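As a minimal sketch of the pattern-action style, the following sums the second column of the lines whose first field matches a regular expression. The sample data and the variable name awk_total are made up for illustration.

```shell
# Sum field 2 for lines whose field 1 starts with "a" (illustrative data).
awk_total=$(printf 'apple 3\nbanana 5\navocado 4\n' |
  awk '$1 ~ /^a/ { sum += $2 } END { print sum }')
echo "$awk_total"
```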
Links:
- awk, opengroup.org
- awk man page, man.cat-v.org
- The GNU Awk User’s Guide, gnu.org
comm
Identifies lines common to two files or unique to them. Options control the manner of identification, e.g. outputting only common lines.
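A small sketch of outputting only the common lines, assuming two sorted input files. The file names and contents are hypothetical and live in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/a.txt"
printf 'banana\ncherry\ndate\n' > "$tmpdir/b.txt"
# -1 and -2 suppress the lines unique to the first and second file,
# leaving only the lines common to both.
comm_common=$(comm -12 "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$comm_common"
rm -r "$tmpdir"
```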
Links:
- comm, opengroup.org
- comm man page, man.cat-v.org
- 7.4 comm in GNU Coreutils manual, gnu.org
csplit
Splits input into output files. The split points can be given as line numbers or as regex matches.
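A sketch of a regex-driven split, assuming an input whose sections start with lines beginning with "# ". The input text and the "part" file prefix are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'intro\n# one\na\n# two\nb\n' > "$tmpdir/in.txt"
# -s silences the byte counts, -f sets the output file prefix.
# '/^# /' splits before the next line starting with "# ";
# '{1}' repeats that pattern once more.
csplit -s -f "$tmpdir/part" "$tmpdir/in.txt" '/^# /' '{1}'
csplit_count=$(ls "$tmpdir" | grep -c '^part')
csplit_second=$(cat "$tmpdir/part01")
echo "$csplit_count pieces; second piece:"
echo "$csplit_second"
rm -r "$tmpdir"
```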
Links:
- csplit, opengroup.org
- 5.4 csplit in GNU Coreutils manual, gnu.org
cut
cut selects columns ("fields") from lines in text files, with a specifiable column separator.
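For example, a minimal sketch that picks the first and third colon-separated fields from some made-up, passwd-like input:

```shell
# -d sets the field separator, -f lists the fields to keep.
cut_fields=$(printf 'alice:x:1000\nbob:x:1001\n' | cut -d: -f1,3)
echo "$cut_fields"
```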
Links:
- cut, opengroup.org
- cut man page, man.cat-v.org
- 8.1 cut in GNU Coreutils manual, gnu.org
expand
Converts tabs to spaces, defaulting to 8 spaces per tab. See also #unexpand.
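A short sketch that overrides the default with -t, so that tab stops fall every 4 columns. The input is illustrative.

```shell
# "a" occupies column 1, so the tab is replaced by 3 spaces to reach the next stop.
expand_out=$(printf 'a\tb\n' | expand -t 4)
echo "$expand_out"
```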
Links:
- expand, opengroup.org
- 9.2 expand in GNU Coreutils manual, gnu.org
fmt
Formats text, including reflowing paragraphs to a specific maximum number of characters per line. Does not seem to be covered by POSIX.
Links:
- 4.1 fmt in GNU Coreutils manual, gnu.org
- fmt man page, freebsd.org
fold
Limits the maximum length of a line by breaking long lines at a given width; unlike #fmt, it does not reflow paragraphs.
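As a minimal sketch, the following breaks a made-up 10-character line into pieces of at most 4 characters:

```shell
# -w sets the maximum line width.
fold_out=$(printf 'abcdefghij\n' | fold -w 4)
echo "$fold_out"
```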
iconv
Converts between character encodings.
Links:
- iconv, opengroup.org
join
Combines lines from two files based on a shared field, assuming the files are sorted on the fields used for joining.
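A sketch joining two hypothetical files on their first field, which is the default join field. Lines without a partner in the other file are dropped.

```shell
tmpdir=$(mktemp -d)
printf '1 alice\n2 bob\n3 carol\n' > "$tmpdir/names.txt"
printf '1 admin\n3 guest\n' > "$tmpdir/roles.txt"
# Both files are already sorted on field 1.
join_out=$(join "$tmpdir/names.txt" "$tmpdir/roles.txt")
echo "$join_out"
rm -r "$tmpdir"
```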
Links:
- join, opengroup.org
- join man page, man.cat-v.org
- 8.3 join in GNU Coreutils manual, gnu.org
nl
Adds line numbers.
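A minimal sketch with explicit formatting options, chosen here only to keep the output compact: -ba numbers all lines, -w sets the width of the number, and -s sets the separator between number and text.

```shell
nl_out=$(printf 'foo\nbar\n' | nl -ba -w 1 -s ' ')
echo "$nl_out"
```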
paste
For multiple files, joins lines corresponding by line number as if each file were a column of a table and each file line a row of the table.
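A sketch with two hypothetical single-column files, pasted side by side with a comma as the delimiter (the default delimiter is a tab):

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/letters.txt"
printf '1\n2\n' > "$tmpdir/numbers.txt"
# -d sets the delimiter placed between the columns.
paste_out=$(paste -d, "$tmpdir/letters.txt" "$tmpdir/numbers.txt")
echo "$paste_out"
rm -r "$tmpdir"
```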
Links:
- paste, opengroup.org
- paste man page, man.cat-v.org
- 8.2 paste in GNU Coreutils manual, gnu.org
pr
Formats input for printing, including pagination with header and footer.
Links:
- pr, opengroup.org
- pr man page, man.cat-v.org
- 4.2 pr in GNU Coreutils manual, gnu.org
sed
sed, a stream editor, is noted for its text replacement capability with regular expression support, but can do more. You can learn more in the Sed Wikibook.
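A minimal sketch of regex-based replacement, using made-up input: every run of digits is replaced by the letter N.

```shell
# s/regex/replacement/g substitutes all matches on each line.
sed_out=$(printf 'room 12, floor 3\n' | sed 's/[0-9][0-9]*/N/g')
echo "$sed_out"
```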
Links:
- sed, opengroup.org
- sed man page, man.cat-v.org
- sed, a stream editor - GNU manual, gnu.org
sort
Sorts lines in files.
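As a short sketch, the -n option sorts by numeric value rather than character by character. The input is illustrative.

```shell
sort_out=$(printf '10\n9\n100\n' | sort -n)
echo "$sort_out"
```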
Links:
- sort, opengroup.org
- sort man page, man.cat-v.org
- 7.1 sort in GNU Coreutils manual, gnu.org
spell
Performs spell checking. Seems absent from POSIX.
Links:
- spell man page, man.cat-v.org
- GNU spell project, savannah.gnu.org
tr
Performs a character-by-character mapping or "translation", and more.
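A minimal sketch of such a mapping, converting lowercase letters to uppercase via character classes. The input is illustrative.

```shell
tr_out=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')
echo "$tr_out"
```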
Links:
- tr, opengroup.org
- tr man page, man.cat-v.org
- 9.1 tr in GNU Coreutils manual, gnu.org
unexpand
Converts spaces to tabs, defaulting to 8 spaces per tab. See also #expand.
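A small sketch using the default tab stops: eight leading spaces on a made-up input line become a single tab.

```shell
unexpand_out=$(printf '        x\n' | unexpand)
# Show the result with the tab made visible as "<TAB>".
printf '%s\n' "$unexpand_out" | sed "s/$(printf '\t')/<TAB>/g"
```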
Links:
- unexpand, opengroup.org
- 9.3 unexpand in GNU Coreutils manual, gnu.org
uniq
Outputs a single line for each block of adjacent identical lines, and more. Because only adjacent duplicates are detected, it is best used on sorted input.
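The following sketch, using made-up input, contrasts unsorted and sorted input: without sorting, the repeated "a" that is not adjacent to the earlier ones survives.

```shell
# Only adjacent duplicates collapse, so the last "a" remains.
uniq_unsorted=$(printf 'a\na\nb\na\n' | uniq)
# Sorting first brings all duplicates together.
uniq_sorted=$(printf 'a\na\nb\na\n' | sort | uniq)
echo "$uniq_unsorted"
echo "---"
echo "$uniq_sorted"
```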
Links:
- uniq, opengroup.org
- uniq man page, man.cat-v.org
- 7.3 uniq in GNU Coreutils manual, gnu.org