colltbl -- create collation database


colltbl [file | -]


The colltbl command takes as input a specification file, file, that describes the collating sequence for a particular language and creates a database that can be read by strxfrm(3C) and strcoll(3C). strxfrm(3C) transforms its first argument and places the result in its second argument. The transformed string is such that it can be correctly ordered with other transformed strings by using strncmp [see string(3C)]. strcoll(3C) transforms its arguments and does a comparison.
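
The relationship between the two routines can be sketched in Python, whose locale.strxfrm and locale.strcoll wrap the C library functions of the same names. This is only an illustration and not part of colltbl: it selects the portable "C" locale (an assumption, so that the sketch runs on any system), and the word list is made up. A locale whose LC_COLLATE database was built by colltbl would be selected in the same way.

```python
import locale
from functools import cmp_to_key

# Assumption: the portable "C" locale is used so the sketch runs anywhere.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["delta", "Alpha", "charlie", "Bravo"]

# strxfrm: transform each string once, then compare the transformed
# strings with an ordinary string comparison.
by_xfrm = sorted(words, key=locale.strxfrm)

# strcoll: transform and compare in a single call for each pair.
by_coll = sorted(words, key=cmp_to_key(locale.strcoll))

print(by_xfrm)
print(by_coll)
```

Both approaches should yield the same order. Transforming once with strxfrm tends to pay off when each string takes part in many comparisons.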

If no input file is supplied, stdin is read.

The output file produced contains the database with collating sequence information in a form usable by system commands and routines. The name of this output file is the value you assign to the keyword codeset read in from file. Before this file can be used, it must be installed in the /usr/lib/locale/locale directory with the name LC_COLLATE by someone who is super-user or a member of group bin. locale corresponds to the language area whose collation sequence is described in file. This file must be readable by user, group, and other; no other permissions should be set. To use the collating sequence information in this file, set the LC_COLLATE environment variable appropriately [see environ(5) or setlocale(3C)].
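
The installation step described above might be scripted roughly as follows. This is a sketch under stated assumptions: the function name install_collation and its root parameter are hypothetical, creating the locale directory when it is missing is an assumption, and on a real system the caller needs the privileges mentioned above.

```python
import os
import shutil
import stat

def install_collation(db_file, locale_name, root="/usr/lib/locale"):
    """Copy a colltbl output file to <root>/<locale_name>/LC_COLLATE.

    Hypothetical helper; 'root' is a parameter only so that the sketch
    can be tried against a scratch directory.
    """
    target_dir = os.path.join(root, locale_name)
    # Assumption: create the locale directory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "LC_COLLATE")
    shutil.copyfile(db_file, target)
    # Readable by user, group, and other; no other permission bits set.
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

For a specification whose codeset keyword names the output file telephone, a call such as install_collation("telephone", "xx") would install it for a locale named xx (a placeholder name).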

The colltbl command can support languages whose collating sequence can be completely described by the following cases:

The specification file consists of three types of statements:

codeset filename
filename is the name of the output file to be created by colltbl.

order is order_list
order_list is a list of symbols, separated by semicolons, that defines the collating sequence. The special symbol ... specifies symbols that are lexically sequential in a short-hand form. For example, order is a;b;c;d;...;x;y;z would specify the list of lowercase letters. Of course, this could be further compressed to just a;...;z.
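
The effect of the ... shorthand can be illustrated with a short Python sketch. It assumes that "lexically sequential" means consecutive character codes and that the symbols on either side of the ellipsis are single characters; the function name expand_order_list is hypothetical, and the sketch is not colltbl's own parser.

```python
def expand_order_list(order_list):
    """Expand 'a;...;z'-style shorthand into an explicit list of symbols."""
    symbols = order_list.split(";")
    expanded = []
    for i, sym in enumerate(symbols):
        if sym != "...":
            expanded.append(sym)
            continue
        if i == 0 or i == len(symbols) - 1:
            raise ValueError("'...' needs a symbol on both sides")
        # Assumption: the ellipsis stands for every character code that
        # lies strictly between its two single-character neighbours.
        first, last = symbols[i - 1], symbols[i + 1]
        expanded.extend(chr(c) for c in range(ord(first) + 1, ord(last)))
    return expanded

print(expand_order_list("a;...;z"))
print(expand_order_list("a;b;c;d;...;x;y;z"))
```

Under these assumptions both calls produce the same list of 26 lowercase letters, which is why the shorter form is sufficient.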

A symbol can be up to two bytes in length and can be represented in any one of the following ways:

Any combination of these may be used as well.

The backslash character, ``\'' , is used for continuation. No characters are permitted after the backslash character.

Symbols enclosed in parentheses are assigned the same primary ordering but different secondary ordering. Symbols enclosed in curly brackets are assigned only the same primary ordering. For example,

order is a;b;c;ch;d;(e;é);f;...;z;\

In the above example, ``e'' and ``é'' are assigned the same primary ordering and different secondary orderings; digits 1 through 9 are assigned the same primary ordering and no secondary ordering. Only primary ordering is assigned to the remaining symbols. Notice how double letters can be specified in the collating sequence (the letter ``ch'' comes between ``c'' and ``d'').

If a character is not included in the order is statement, it is excluded from the ordering and will be ignored during sorting.
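
One way to picture primary and secondary ordering, double letters, and ignored characters is as a two-level sort key. The Python sketch below is a model of the behavior described above, not the format of the colltbl database or the algorithm used by strxfrm(3C). The weight table is written by hand for the fragment a;b;c;ch;d;(e;é);f, and the numeric weights and the name sort_key are hypothetical.

```python
# Hypothetical weights for the fragment  a;b;c;ch;d;(e;é);f
# symbol -> (primary weight, secondary weight)
WEIGHTS = {
    "a": (1, 0), "b": (2, 0), "c": (3, 0), "ch": (4, 0), "d": (5, 0),
    "e": (6, 1), "é": (6, 2),   # same primary, different secondary
    "f": (7, 0),
}
MAX_LEN = max(len(sym) for sym in WEIGHTS)

def sort_key(text):
    """Return (primary weights, secondary weights) for text."""
    primary, secondary = [], []
    i = 0
    while i < len(text):
        # Try the longest symbol first so "ch" is not read as "c", "h".
        for n in range(MAX_LEN, 0, -1):
            sym = text[i:i + n]
            if sym in WEIGHTS:
                p, s = WEIGHTS[sym]
                primary.append(p)
                secondary.append(s)
                i += len(sym)
                break
        else:
            i += 1  # not in the order list: ignored during sorting
    return (primary, secondary)

print(sorted(["d", "ch", "c"], key=sort_key))
```

Because the primary weights are compared first, the secondary weights only decide between strings that are otherwise equal, such as ``e'' and ``é''.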

substitute string with repl
The substitute statement substitutes the string string with the string repl. This can be used, for example, to provide rules to sort the abbreviated month names numerically:

substitute "Jan" with "01"
substitute "Feb" with "02"
substitute "Dec" with "12"

A simpler use of the substitute statement would be to substitute a single character with two characters, as with the substitution of ``ß'' with ``ss'' in German.

The substitute statement is optional. The order is and codeset statements must appear in the specification file.
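
The effect of substitute rules on sorting can be modeled by rewriting each string before it is compared. The Python sketch below assumes that the rules are applied one after another to the whole string; the order in which colltbl applies them is not stated here, and the name apply_substitutions and the sample data are hypothetical.

```python
# Rules taken from the month-name example above.
SUBSTITUTIONS = {"Jan": "01", "Feb": "02", "Dec": "12"}

def apply_substitutions(text, table=SUBSTITUTIONS):
    """Replace each string in table with its replacement."""
    # Assumption: rules are applied sequentially, in table order.
    for old, new in table.items():
        text = text.replace(old, new)
    return text

dates = ["Dec 1", "Jan 15", "Feb 3"]
print(sorted(dates, key=apply_substitutions))

# The German example: one character replaced by two.
print(apply_substitutions("Straße", {"ß": "ss"}))
```

Only the comparison key is rewritten; the strings being sorted are left unchanged.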

Any lines in the specification file with a ``#'' in the first column are treated as comments and are ignored. Empty lines are also ignored.
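
The layout rules for the specification file (comments, empty lines, backslash continuation, and the three statement types) can be summarized in a small reader. This Python sketch is not colltbl's parser: the names read_statements and parse_spec are hypothetical, and it assumes that leading whitespace on continuation lines is insignificant and that the arguments of substitute are always written in double quotes, as in the examples.

```python
import re

def read_statements(text):
    """Join continued lines and drop comments and empty lines."""
    statements, pending = [], ""
    for raw in text.splitlines():
        if not pending and (not raw.strip() or raw.startswith("#")):
            continue                      # empty line or comment
        if raw.endswith("\\"):
            pending += raw[:-1].strip()   # statement continues
            continue
        statements.append(pending + raw.strip())
        pending = ""
    if pending:
        statements.append(pending)
    return statements

def parse_spec(text):
    """Return the codeset name, order list, and substitute rules."""
    spec = {"codeset": None, "order": None, "substitute": []}
    for stmt in read_statements(text):
        if stmt.startswith("codeset"):
            spec["codeset"] = stmt.split(None, 1)[1]
        elif stmt.startswith("order is"):
            spec["order"] = stmt[len("order is"):].strip()
        else:
            # Assumption: both arguments are double-quoted strings.
            m = re.fullmatch(
                r'substitute\s+"([^"]*)"\s+with\s+"([^"]*)"', stmt)
            if m is None:
                raise ValueError("unrecognized statement: " + stmt)
            spec["substitute"].append((m.group(1), m.group(2)))
    if spec["codeset"] is None or spec["order"] is None:
        raise ValueError("codeset and order is statements are required")
    return spec
```

Treating the substitute list as optional and the other two statements as required mirrors the rules given above.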


The following example shows the collation specification required to support a hypothetical telephone book sorting sequence.

The sorting sequence is defined by the following rules:

The input specification file to colltbl will contain:
   codeset	telephone

   order is A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
   G;g;H;h;I;i;J;j;K;k;L;l;M;m;N;n;O;o;P;p;\
   Q;q;R;r;S;s;T;t;U;u;{V;W};{v;w};X;x;Y;y;Z;z

   substitute "0" with "zero"
   substitute "1" with "one"
   substitute "2" with "two"
   substitute "3" with "three"
   substitute "4" with "four"
   substitute "5" with "five"
   substitute "6" with "six"
   substitute "7" with "seven"
   substitute "8" with "eight"
   substitute "9" with "nine"


LC_COLLATE database for locale

input file used to construct LC_COLLATE in the default locale.


environ(5), memory(3C), setlocale(3C), strcoll(3C), string(3C), strxfrm(3C)
© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 25 April 2004