Sorting the content of a file - sort
The sort command is a simple database tool. It allows you to specify the field to be sorted on and the type of sort to be carried out on it.
sort filename
Lines are sorted into the following order:
Lines are sorted character by character, starting at the first character in the line. You can also sort the contents of a file on a specific part of each line. Each line of text is a series of fields (words and other characters) separated from each other by a delimiter character, by default the spaces between the words.
Unless you specify otherwise sort treats the one or more space characters between words as field delimiters. You may want to sort files where words are separated by another character such as a colon (:). Use the -t option to specify another character to act as the field delimiter. For example
-t:
specifies the colon (:) character as the delimiter.
One delimiter immediately followed by another indicates an empty field. For example, with a colon (:) as the delimiter, in the line
field0:field1::field3:field4
field2 is taken to be empty.
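As a minimal sketch of how an empty field behaves, the following creates a small colon-delimited file and sorts it on its third field. The file name and its contents are invented for illustration, LC_ALL=C is set only to make the ordering predictable, and the key is written in the -k 3,3 notation (sort on the third field only) understood by current versions of sort.

```shell
# Hypothetical sample data; the second line has an empty third field.
export LC_ALL=C                      # assumption: byte-order collation for a predictable result
tmp=$(mktemp -d)
printf '%s\n' 'a:b:zeta:d' 'a:b::d' 'a:b:alpha:d' > "$tmp/fields"

# -t: makes the colon the delimiter; -k 3,3 sorts on the third field only.
out=$(sort -t: -k 3,3 "$tmp/fields")
printf '%s\n' "$out"
rm -r "$tmp"
```

The line with the empty field comes first, because an empty key compares lower than any non-empty one.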
Field positions are numbered from 0 (zero): position 0 is the start of the first field, position 1 the start of the second and so on. To define which field to sort on, give the position at which to start the sort followed by the position at which to end it. The start position is given as the number of fields to skip to reach it. For example
+2
tells sort to skip the first two fields.
The position at which to stop the sort is given as the number of the field at the end of which the sort stops. For example
-3
tells sort to stop the sort at the end of field three.
To sort on the third field of a line use the definition:
+2 -3
To sort on the fields 5 and 6:
+4 -6
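Many current versions of sort no longer accept the +pos -pos notation and expect keys to be given with -k, which counts fields from 1. Under that assumption, +2 -3 corresponds to -k 3,3 and +4 -6 to -k 5,6. The sketch below uses an invented three-line file to show a sort on the third field only; LC_ALL=C is set purely to make the result predictable.

```shell
# Hypothetical sample data: the first field is in the opposite order to the third.
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'x1 q cherry' 'x2 q apple' 'x3 q banana' > "$tmp/names"

# -k 3,3 starts the key at field 3 and ends it at the end of field 3
# (the same key that "+2 -3" describes in the older notation).
out=$(sort -k 3,3 "$tmp/names")
printf '%s\n' "$out"
rm -r "$tmp"
```

A plain sort of the same file would order the lines x1, x2, x3, so the changed order shows that only the third field is being compared.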
You can sort a file on several fields with the same sort command. The file is sorted on the first field; if some lines have the same value for this field they are sorted on the next field you specify and so on.
This is the file scores, each line of which gives the initial, surname, gender, age and test score of a person. The fields are separated by a space.
J. Arnott F 43 78
T. Rice M 42 67
M. Macdonald F 39 85
A. Board M 36 43
M. Mouzin F 27 57
R. Bramley M 42 75
To sort the file on two fields, first by age (field 4) and then by score (field 5), placing the output in the file results:
sort -o results +3 -4 +4 scores
This gives the following result in the file results.
M. Mouzin F 27 57
A. Board M 36 43
M. Macdonald F 39 85
T. Rice M 42 67
R. Bramley M 42 75
J. Arnott F 43 78
From the first sort on field 4 (age) sort finds that Rice and Bramley are the same age so it now sorts these two lines on field 5 (score), placing Rice (67) before Bramley (75).
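The two-field sort described above can be tried out as follows. This is a sketch rather than the original command: it uses the -k notation, and it adds the n modifier to each key so that ages and scores are compared as numbers, which is an assumption on top of the text. The data is taken from the result listing above, and a temporary directory stands in for your working directory.

```shell
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
cat > "$tmp/scores" <<'EOF'
J. Arnott F 43 78
T. Rice M 42 67
M. Macdonald F 39 85
A. Board M 36 43
M. Mouzin F 27 57
R. Bramley M 42 75
EOF

# First key: field 4 (age); second key: field 5 (score), used only to break ties.
sort -o "$tmp/results" -k 4,4n -k 5,5n "$tmp/scores"
out=$(cat "$tmp/results")
printf '%s\n' "$out"
rm -r "$tmp"
```

Rice and Bramley share the age 42, so the second key decides their order.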
To define which part of a field to sort on, give the position in the field at which to start the sort followed by the position in the field at which to end the sort.
The position at which to start the sort is given as the number of fields to skip followed by the number of characters to skip. For example:
+2.3
tells sort to skip the first two fields and the first three characters of field 3.
The position at which to stop the sort is given in the same way: the number of fields to skip followed by the number of characters to skip. The sort key ends just before this position. For example:
-2.5
tells sort to stop the sort after the 5th character of field three.
To sort on the fourth and fifth characters of field three:
+2.3 -2.5
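In the -k notation, which counts both fields and characters from 1, the fourth and fifth characters of the third field are written 3.4,3.5. The sketch below assumes that notation and uses an invented colon-delimited file; an explicit delimiter is chosen because, with the default blank separator, the blanks before a field count as part of it and would shift the character positions.

```shell
# Hypothetical sample data: only characters 4 and 5 of field 3 differ in a useful way.
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'id1:x:abcZZf' 'id2:x:abcAAf' 'id3:x:abcMMf' > "$tmp/parts"

# Key starts at field 3, character 4 and ends at field 3, character 5.
out=$(sort -t: -k 3.4,3.5 "$tmp/parts")
printf '%s\n' "$out"
rm -r "$tmp"
```

The lines come out in the order of the two-character keys AA, MM and ZZ.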
sort sends its results directly to your screen. To save them to a file use the -o option. For example:
sort -o mail addresses
This saves the results of sorting the file addresses in the file mail.
You can check if a file has already been sorted with the -c option. For example:
sort -c +2 -3 accounts
This checks to see if the file accounts has already been sorted on the third field of each line. You will only get a message if the file is not sorted according to this sequence.
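Because sort -c prints nothing for a correctly sorted file, a script usually relies on its exit status instead: zero when the file is in order, non-zero when it is not. The sketch below assumes the -k 3,3 key notation and two invented files; the check function is a hypothetical helper written for this example.

```shell
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'inv1 north 100' 'inv2 south 200' 'inv3 east 300' > "$tmp/in_order"
printf '%s\n' 'inv1 north 300' 'inv2 south 100' 'inv3 east 200' > "$tmp/out_of_order"

# Hypothetical helper: report whether a file is sorted on its third field.
check() {
    if sort -c -k 3,3 "$1" 2>/dev/null; then
        echo sorted
    else
        echo unsorted
    fi
}

first=$(check "$tmp/in_order")
second=$(check "$tmp/out_of_order")
echo "in_order: $first"
echo "out_of_order: $second"
rm -r "$tmp"
```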
You can merge two or more files that have each been sorted on the same field. For example:
sort -m -o accounts +1 -2 month[1-3]
merges the files month1 through month3 according to the information in the second field of each line. Each file must already be sorted on that field, as the -m option merges without re-sorting. The results are sent to the file accounts.
Files that have already been sorted can be merged using only the -m option. For example:
sort -m first_qtr second_qtr > halfyear
This merges the contents of files first_qtr and second_qtr and places the result in the file halfyear.
The > character is used to direct the output of the sort command into the file halfyear. (Note that directing output to a file that already exists overwrites its contents.)
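A short sketch of the merge, using two invented files that are each already in sorted order. The file names follow the example above, but their contents are made up, and a temporary directory stands in for your working directory.

```shell
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'apple' 'cherry' 'fig' > "$tmp/first_qtr"
printf '%s\n' 'banana' 'date' > "$tmp/second_qtr"

# -m interleaves the two sorted inputs without re-sorting them.
sort -m "$tmp/first_qtr" "$tmp/second_qtr" > "$tmp/halfyear"
out=$(cat "$tmp/halfyear")
printf '%s\n' "$out"
rm -r "$tmp"
```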
sort has a number of options which allow you to change the order in which files are sorted. This enables you to match the action of the sort command to the type of data you have. Always use these options immediately after the sort command. For example:
sort -n +1 -2 costs
sorts the data in field 2 of the file costs by its arithmetic value.
-d (Dictionary order) sort only uses numbers, letters and blank spaces when sorting the file.
-f (Fold in lower case) sort treats all lower-case letters as if they were upper-case letters.
-i (Ignore non-printing characters) sort ignores any characters that do not print.
-n (Numeric order) sort recognises the value of numbers and sorts them in terms of their value.
-r (Reverse order) sort reverses its default sort order.
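The effect of -n and -r is easiest to see on numbers of different lengths. The sketch below uses an invented costs file and the -k 2,2 key notation (an assumption in place of +1 -2) to compare the default character-by-character order with numeric and reversed numeric order.

```shell
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'pens 10' 'ink 9' 'paper 100' > "$tmp/costs"

as_text=$(sort -k 2,2 "$tmp/costs")           # compares field 2 as characters
as_number=$(sort -n -k 2,2 "$tmp/costs")      # compares field 2 by value
reversed=$(sort -n -r -k 2,2 "$tmp/costs")    # largest value first

printf 'text order:\n%s\n' "$as_text"
printf 'numeric order:\n%s\n' "$as_number"
printf 'reverse numeric order:\n%s\n' "$reversed"
rm -r "$tmp"
```

Compared as characters, 100 sorts before 9 because the character 1 comes before 9; with -n the values 9, 10 and 100 are ordered as numbers.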
To sort a file in dictionary order:
sort -d +1 -2 names > orders
Lines in the file names are sorted on field 2 into dictionary order and the result placed in the file orders.
To sort a file in month order:
sort -M +2 -3 -o breakdown orders
Lines in the file orders are sorted on field 3 into month order and the result placed in the file breakdown.
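A sketch of a month-order sort on an invented orders file whose third field holds three-letter month abbreviations. It assumes a sort that provides the -M option (it is not available in every implementation), the -k 3,3 key notation, and English month names, which LC_ALL=C is set to ensure.

```shell
export LC_ALL=C                      # assumption: English month abbreviations, byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'o1 widgets MAR' 'o2 bolts JAN' 'o3 nuts DEC' 'o4 gears FEB' > "$tmp/orders"

# -M compares the key as a month name, so JAN < FEB < MAR < ... < DEC.
sort -M -k 3,3 -o "$tmp/breakdown" "$tmp/orders"
out=$(cat "$tmp/breakdown")
printf '%s\n' "$out"
rm -r "$tmp"
```

An ordinary alphabetical sort on the same field would place DEC first and MAR last.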
Used with its -u option the sort command removes duplicate lines from a file. For example:
sort -u -o mail_labels addresses
removes all duplicate lines from the file addresses and places the output in the file mail_labels. Duplicate lines do not have to be consecutive for the sort command to identify and remove them.
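A sketch of duplicate removal on an invented addresses file in which the repeated lines are not next to each other. The file contents are made up, and a temporary directory stands in for your working directory.

```shell
export LC_ALL=C                      # assumption: byte-order collation
tmp=$(mktemp -d)
printf '%s\n' 'bob' 'alice' 'bob' 'carol' 'alice' > "$tmp/addresses"

# -u keeps one copy of each line; -o names the output file.
sort -u -o "$tmp/mail_labels" "$tmp/addresses"
out=$(cat "$tmp/mail_labels")
printf '%s\n' "$out"
rm -r "$tmp"
```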
To sort a file on field2 (the third word):
sort +2 -3 names
To sort several files using a different sort order and send the results to another file:
sort -n -o report +1 -2 month[1-3]
This sorts the files month1 through month3 on field1 (the second word) according to their numeric value and places the result in the file report.
To sort a file on a specific part of one field, using a different field delimiter and sending the output to another file:
sort -t: -o birthdays +3.2 -3.4 friends
This sorts the file friends on the third and fourth characters of field3 (the fourth word). Each field in this file is delimited by a : (colon). The results are placed in the file birthdays.