Sorting the content of a file: sort

The sort command is a simple database tool. It allows you to specify the field to be sorted on and the type of sort to be carried out on it.

   sort filename

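This writes the lines of filename to the screen in sorted order. By default, lines are sorted in the collating order of the current locale. In the C (ASCII) locale, for example, lines beginning with digits come before lines beginning with upper-case letters, which in turn come before lines beginning with lower-case letters.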
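
A minimal sketch of the default ordering, assuming a POSIX shell. It sets LC_ALL=C (an assumption, so that sort uses plain ASCII order rather than the rules of your locale) and works on a hypothetical file fruit in a temporary directory.

```shell
# Assumption: the C locale, so the result does not depend on local settings.
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical sample file, one word per line.
printf '%s\n' banana Apple 42 cherry Zebra > fruit
# No options: whole lines are compared character by character.
sort fruit > sorted
cat sorted
```

In ASCII, digits come before upper-case letters, which come before lower-case letters, so 42 is listed first and banana follows Zebra.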


Sorting on a specific field

Lines are sorted character by character, starting at the first character in the line. You can also sort the contents of a file on a specific part of each line. Each line of text is a series of fields (words or other groups of characters) separated from each other by a delimiter, which by default is the blank space between words.

Changing the field delimiter

Unless you specify otherwise, sort treats a run of one or more blanks (spaces or tabs) between words as a field delimiter. You may want to sort files in which fields are separated by another character, such as a colon (:). Use the -t option to specify a different character as the field delimiter. For example:

   -t:

specifies the colon (:) character as the delimiter.

Two delimiters in a row indicate an empty field. For example, with a colon (:) as the delimiter, in

   field0:field1::field3:field4

field2 is taken to be empty.
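
The sketch below, assuming a POSIX shell and the C locale, sorts a hypothetical colon-delimited file on its third field. It uses the -k option of current POSIX sort, where fields are counted from 1, so -k 3,3 selects the same field as +2 -3 in the notation described in the next section. An empty field compares as an empty string, so the line with the empty field comes first.

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical data: the second line has an empty third field (two colons in a row).
printf '%s\n' 'a:b:zz:d' 'e:f::h' 'i:j:mm:l' > data
# -t: makes the colon the delimiter; -k 3,3 restricts the key to the third field.
sort -t: -k 3,3 data > out
cat out
```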

Defining the sort field

Fields are numbered from 0 (zero): the first field of each line is field 0, the second is field 1, and so on. To define which field to sort on, give the position at which the sort starts followed by the position at which it ends. The position at which to start the sort is given as the number of fields to skip to reach it. For example:

   +2

tells sort to skip the first two fields.

The position at which to stop the sort is given as a number of fields counted from the start of the line; the sort stops at the end of the last of these fields. For example:

   -3

tells sort to stop the sort at the end of the third field.

To sort on the third field of a line use the definition:

   +2 -3

To sort on the fifth and sixth fields:

   +4 -6
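
Current versions of sort specify the same keys with the POSIX -k option, which counts fields from 1: +m -n is written -k m+1,n, so +2 -3 becomes -k 3,3 and +4 -6 becomes -k 5,6. Some versions of sort no longer accept the older +/- form. A minimal sketch, assuming a POSIX shell, the C locale and a hypothetical file people:

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical file: first name, surname, city.
printf '%s\n' 'Ann Smith Paris' 'Bob Jones Oslo' 'Cy Brown Lima' > people
# -k 3,3 starts and ends the key at the third field (counted from 1),
# the same key as +2 -3 in the older notation.
sort -k 3,3 people > by_city
cat by_city
```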

Sorting on several fields

You can sort a file on several fields with the same sort command. The file is sorted on the first field; if some lines have the same value for this field they are sorted on the next field you specify and so on.

This is the file scores, each line of which gives the initial, surname, gender, age and test score of a person. Each field is separated by a space.

   J. Arnott F 43 78
   T. Rice M 42 67
   M. Macdonald F 39 85
   A. Board M 36 43
   M. Mouzin F 27 57
   R. Bramley M 42 75

To sort the file on two fields, first by age (the fourth field) and then by score (the fifth field), and place the output in the file results, use:

   sort -o results +3 -4 +4 -5 scores

This gives the following result in the file results.

   M. Mouzin F 27 57
   A. Board M 36 43
   M. Macdonald F 39 85
   T. Rice M 42 67
   R. Bramley M 42 75
   J. Arnott F 43 78

Sorting on the fourth field (age), sort finds that Rice and Bramley are the same age, so it sorts these two lines on the fifth field (score), placing Rice (67) before Bramley (75).
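
The same two-key sort can be sketched with -k, one option per key (-k 4,4 for age, then -k 5,5 for score). The sketch assumes a POSIX shell and the C locale, and recreates the scores file in a temporary directory.

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
cat > scores <<'EOF'
J. Arnott F 43 78
T. Rice M 42 67
M. Macdonald F 39 85
A. Board M 36 43
M. Mouzin F 27 57
R. Bramley M 42 75
EOF
# Primary key: age (field 4); lines with equal ages are then compared on score (field 5).
sort -o results -k 4,4 -k 5,5 scores
cat results
```

Both keys are compared as text here, which works because every age and score has two digits; appending n to a key (for example -k 4,4n) would compare it numerically instead.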

Sorting on a specific part of a field

To define which part of a field to sort on, give the position in the field at which to start the sort followed by the position in the field at which to end the sort.

The position at which to start the sort is given as the number of fields to skip followed by the number of characters to skip. For example:

   +2.3

tells sort to skip the first two fields and the first three characters of the third field.

The position at which to stop the sort is given in the same way, as the number of fields to skip followed by the number of characters to skip; the sort stops just before the character reached. For example:

   -3.6

tells sort to skip three fields and then six characters, so the sort stops after the 6th character of the fourth field.

To sort on the fourth and fifth characters of the third field:

   +2.3 -2.5
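
With -k, a character position is written field.character, both counted from 1, so -k 3.4,3.5 covers the fourth and fifth characters of the third field. The sketch below assumes a POSIX shell, the C locale and a hypothetical colon-delimited file codes. It uses -t: because, with the default delimiter, the blanks in front of a field count as part of it when characters are counted (the -b option makes sort skip them).

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical file: name:department:code.
printf '%s\n' 'ann:sales:AB-31x' 'bob:admin:CD-12y' 'cy:sales:EF-20z' > codes
# Key: characters 4 to 5 of field 3, i.e. the digits 31, 12 and 20.
sort -t: -k 3.4,3.5 codes > out
cat out
```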


Saving the results of a sort to a file

By default, sort writes its results to the standard output, which is normally your screen. To save them in a file, use the -o option. For example:

   sort -o mail addresses

This saves the results of sorting the file addresses in the file mail.
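
A short sketch, assuming a POSIX shell, the C locale and a hypothetical addresses file. It also shows that the file named with -o may be one of the input files, which lets you sort a file in place.

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical addresses, one per line.
printf '%s\n' 'Smith, 4 High St' 'Brown, 9 Low Rd' > addresses
# Sorted copy in mail; addresses itself is not changed.
sort -o mail addresses
# -o may also name an input file, so this sorts addresses in place.
sort -o addresses addresses
```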


Checking if a file has been sorted

You can check if a file has already been sorted with the -c option. For example:

   sort -c +2 -3 accounts

This checks to see if the file accounts has already been sorted on the third field of each line. You will only get a message if the file is not sorted according to this sequence.
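
In scripts the exit status of sort -c is often more useful than its message: it is 0 when the input is already in order and non-zero when it is not. A minimal sketch, assuming a POSIX shell, the C locale and hypothetical contents for accounts, with -k 3,3 standing for +2 -3:

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical accounts file; the third fields are out of order (pear before fig).
printf '%s\n' 'x 1 apple' 'y 2 pear' 'z 3 fig' > accounts
# The disorder message goes to standard error; discard it and test the exit status.
if sort -c -k 3,3 accounts 2>/dev/null; then before=sorted; else before=unsorted; fi
# Sort the file in place on the same key, then check again.
sort -o accounts -k 3,3 accounts
if sort -c -k 3,3 accounts 2>/dev/null; then after=sorted; else after=unsorted; fi
echo "$before $after"
```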


Merging sorted files

You can merge two or more files at the same time as they are sorted by naming them all on the same sort command. For example:

   sort -o accounts +1 -2 month[1-3]

sorts the lines of the files month1 through month3 together on the second field of each line and sends the combined result to the file accounts. (The shell expands month[1-3] to month1 month2 month3.)

Files that have already been sorted can be merged, without being sorted again, by using the -m option. For example:

   sort -m first_qtr second_qtr > halfyear

This merges the contents of files first_qtr and second_qtr and places the result in the file halfyear.

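The > character directs the output of the sort command into the file halfyear. (Note that directing output to a file that already exists overwrites its contents.) For the same reason, do not use > to write to one of the input files, because the shell empties that file before sort reads it; use the -o option instead.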
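
The difference between sorting several files together and merging files with -m can be sketched as follows, assuming a POSIX shell, the C locale and hypothetical file contents. Without -m, sort sorts the combined lines of all its input files; with -m it assumes that each file is already sorted and only interleaves them, so it does not fix files that are out of order. -k 2,2 stands for +1 -2.

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical monthly files; their lines are not in order.
printf '%s\n' 'z 1' 'b 2' > month1
printf '%s\n' 'm 3' > month2
printf '%s\n' 'a 4' > month3
# Without -m: all lines from the three files are sorted together on field 2.
sort -o accounts -k 2,2 month[1-3]
# Hypothetical quarterly files, each already sorted.
printf '%s\n' 'a 10' 'c 30' > first_qtr
printf '%s\n' 'b 20' 'd 40' > second_qtr
# With -m: the sorted files are merged without being sorted again.
sort -m first_qtr second_qtr > halfyear
```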


Changing the sort order

sort has a number of options that change the order in which lines are sorted, so that you can match the action of the sort command to the type of data you have. Give these options immediately after the command name, before any field positions. For example:

   sort -n +1 -2 costs

sorts the data in the second field of the file costs by its arithmetic value.
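
A sketch comparing character order with numeric order on the second field, assuming a POSIX shell, the C locale and hypothetical contents for costs (-k 2,2 stands for +1 -2):

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical costs file: item and amount.
printf '%s\n' 'pens 120' 'ink 9' 'paper 45' > costs
# Character by character: "120" < "45" < "9" because '1' < '4' < '9'.
sort -k 2,2 costs > by_text
# Numeric: 9 < 45 < 120.
sort -n -k 2,2 costs > by_value
```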

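-d  (Dictionary order) sort uses only letters, digits and
     blanks when comparing lines; other characters are ignored.

-f  (Fold lower case) sort treats all lower-case letters
     as if they were upper-case letters.

-i  (Ignore non-printing characters) sort ignores any
     characters that do not print.

-M  (Month order) sort compares the field as a three-letter
     month name, so that JAN comes before FEB, and so on to DEC.
     This option is not part of POSIX; GNU and BSD versions
     of sort provide it.

-n  (Numeric order) sort compares numbers by their arithmetic
     value (allowing a leading minus sign and a decimal point)
     rather than character by character.

-r  (Reverse order) sort reverses the order it would
     otherwise produce.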
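
A sketch of -f and -r on a hypothetical file words, assuming a POSIX shell and the C locale (in which upper-case letters sort before lower-case ones):

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
printf '%s\n' apple Banana cherry > words   # hypothetical sample file
# Plain order: Banana, apple, cherry ('B' comes before 'a' in ASCII).
sort words > plain
# Folded: apple, Banana, cherry (compared as APPLE, BANANA, CHERRY).
sort -f words > folded
# Reversed plain order: cherry, apple, Banana.
sort -r words > reversed
```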

Examples

To sort a file in dictionary order:

   sort -d +1 -2 names > orders

Lines in the file names are sorted on the second field into dictionary order and the result placed in the file orders.

To sort a file in month order:

   sort -M -o breakdown +2 -3 orders

Lines in the file orders are sorted on the third field into month order and the result placed in the file breakdown.
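
A sketch of month order, assuming GNU or BSD sort (-M is not in POSIX), a POSIX shell, the C locale, and a hypothetical orders file whose third field holds a month abbreviation; -k 3,3 stands for +2 -3:

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical orders: order id, quantity, month.
printf '%s\n' 'ord1 10 MAR' 'ord2 20 JAN' 'ord3 30 FEB' > orders
# Compare the third field as a month name: JAN < FEB < MAR.
sort -M -o breakdown -k 3,3 orders
cat breakdown
```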


Removing multiple occurrences of a line

With the -u option, the sort command removes duplicate lines from a file. For example:

   sort -u -o mail_labels addresses

removes all duplicate lines from the file addresses and places the output in the file mail_labels. Duplicate lines do not have to be consecutive for the sort command to identify and remove them.
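
A sketch with a hypothetical addresses file in which the duplicate lines are not next to each other, assuming a POSIX shell and the C locale. For comparison it also runs uniq, which removes only adjacent duplicate lines:

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical addresses; the two "Bob, York" lines are not adjacent.
printf '%s\n' 'Bob, York' 'Ann, Leeds' 'Bob, York' > addresses
# sort -u brings equal lines together and keeps one of each.
sort -u -o mail_labels addresses
# uniq on the unsorted file finds no adjacent duplicates, so all three lines remain.
uniq addresses > adjacent_only
```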


Examples

To sort a file on field2 (the third word):

   sort +2 -3 names

To sort several files using a different sort order and send the results to another file:

   sort -n -o report +1 -2 month[1-3]

This sorts the files month1 through month3 on field1 (the second word) according to their numeric value and places the result in the file report.

To sort a file on a specific part of one field, using a different field delimiter, removing duplicate lines and sending the output to another file:

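   sort -t: -u -o birthdays +3.2 -3.4 friends

This sorts the file friends on the third and fourth characters of field3 (the fourth word). Each field in this file is delimited by a : (colon). The results are placed in the file birthdays. Note that when a sort field is given, -u treats lines as duplicates if their sort fields are equal, even if the rest of the lines differ.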
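
A sketch of sorting on part of one field, assuming a POSIX shell, the C locale and a hypothetical friends file whose fourth field is a date written DDMMYYYY, so that its third and fourth characters are the month; -k 4.3,4.4 selects those characters. It also shows the effect of -u with a sort field: lines whose sort fields are equal count as duplicates, even though the rest of the lines differ.

```shell
export LC_ALL=C
cd "$(mktemp -d)" || exit 1
# Hypothetical friends file: name:phone:town:DDMMYYYY.
printf '%s\n' 'ann:111:Leeds:14031990' 'bob:222:York:02111985' \
              'cy:333:Bath:25071992' 'dee:444:Ely:30031988' > friends
# Key: characters 3 to 4 of field 4 (the month); the same key as +3.2 -3.4.
# Lines with equal keys (ann and dee) are then ordered by the whole line.
sort -t: -o by_month -k 4.3,4.4 friends
# With -u, ann and dee (both born in March) have equal keys, so only one is kept.
sort -t: -u -o birthdays -k 4.3,4.4 friends
```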