About files and directories

Managing files

The directory handling operations (creating, deleting, administering and navigating) allow you to impose a structure on your area of the filesystem. It is the files, however, that store information like program code, text, database records and graphics. The UnixWare system supplies a wide range of commands for managing files. The following sections discuss some of these. Some of the more complex commands are discussed in other topics; see, for example, ``A quick tour of vi'' for an explanation of how to use the vi(1) editor.

Finding out what type of data a file contains

As we saw in ``File and directory attributes'', the system supports numerous different file types. The contents of text files, for example, can be displayed on the screen using such commands as cat, more and pg. Doing this with a binary (or compiled program) file may cause the screen to lock, as such files usually contain many control characters. (See ``Looking at the contents of a file'' for more on the display commands and on garbling your screen.)

To avoid using an unsuitable command to display the contents of a file, first find out what kind of information a file contains. To do this, use the file(1) command, as follows:

   $ file mbox
   mbox:   ascii text
   $ file tools
   tools:  directory
   $ file /bin/ls
   /bin/ls:        ELF  32-bit LSB executable 80386 Version 1
The file command accepts either a simple filename or a pathname as an argument.

Looking at the contents of a file

The simplest way to look at the contents of a short file is to use the cat(1) command, as follows:

cat filename

If the file is more than one screen long, it scrolls off the screen, making it difficult to read its contents. If this happens, press <Ctrl>S to temporarily stop the scrolling, and <Ctrl>Q to restart the scrolling. If you want to stop the scrolling completely, press <Del>.

If you do not know what is in a file you want to look at, use the cat -v option, as follows:

cat -v filename

This option causes any unprintable characters in filename to be displayed in a manner which does not garble your screen. If you do use cat without using the -v option, and your screen becomes garbled and the machine beeps a lot, press <Ctrl><Del>, <Break>, or <Del> (depending on your terminal). If you cannot clear it, you may need to ask your system administrator for help.
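As a small illustration of what cat -v does, the following sketch writes a line containing a bell character (octal 007) to a scratch file and displays it with cat -v. The filename and the text are invented for the example. The bell should appear as the two printable characters ^G rather than making the terminal beep.

```shell
# Hypothetical scratch file; the name and contents are for illustration only.
tmp=${TMPDIR:-/tmp}/catv.$$
printf 'beep\007here\n' > "$tmp"

# cat -v shows the bell (octal 007) as ^G instead of sending it to the terminal.
shown=$(cat -v "$tmp")
echo "$shown"

rm -f "$tmp"
```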

To look at the contents of a file that is too big to fit on a single screen, use the more command, as follows:

more filename

You can use the pg command in the same way.

You can look at more than one file at a time by using the display commands with several filenames as arguments, as follows:

   $ more file4 file5 file6
In the case of the more command, press <Space> to display a screenful of text. When you reach the end of the first file, more displays a message at the bottom of the screen (Next file: filename2). Press <Space> again to go to the next file.

If you want to go directly to the next file before finishing the first, enter :n; more skips to the next file. See ``Listing the contents of a directory'' for more information on the more and pg commands.

Finding out how much text is in a file

The wc(1) command counts the number of lines, words, and characters in a file, using the options -l, -w, or -c respectively. For example, to print the number of characters and lines in a file called myfile, execute the following command:

   $ wc -cl myfile
       32675  684  myfile
The order in which you specify the options determines the order of the output.

You can also give wc a list of files to count:

   $ wc chap1 chap3
       105    676   3844 chap1
       675   3869  24269 chap3
       780   4545  28113 total
The total line gives sums for the lines, words and characters in the two files, chap1 and chap3.
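When you want to use a count inside a script, it can help to feed the file to wc on its standard input, because wc then prints the number without a filename. The following is a minimal sketch of that idea; the scratch file and its contents are invented, and the tr call only strips the padding that some versions of wc put in front of the number.

```shell
# Hypothetical scratch file for illustration.
tmp=${TMPDIR:-/tmp}/wc.$$
printf 'one two\nthree\nfour five six\n' > "$tmp"

# Reading from standard input makes wc print the count without a filename.
lines=$(wc -l < "$tmp" | tr -d ' ')
words=$(wc -w < "$tmp" | tr -d ' ')
echo "$lines lines, $words words"

rm -f "$tmp"
```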

Looking at the beginning and end of a file

To look at the first ten lines of a file, use the head(1) command:

head filename

To look at the last ten lines of a file, use the tail(1) command:

tail filename

If you use a numerical option, for example -20, head or tail will print that number of lines (20) instead of ten, the default, as follows:

   $ head -20 file6.txt
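head and tail can be combined to pick out a range of lines from the middle of a file. The sketch below builds a ten-line scratch file (an invented example) and extracts lines 3 to 5. It uses the -n number spelling of the line count, on the assumption that your head and tail accept it as well as the -20 style shown above.

```shell
# Hypothetical ten-line scratch file.
tmp=${TMPDIR:-/tmp}/range.$$
i=1
while [ "$i" -le 10 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$tmp"

# Keep the first 5 lines, then the last 3 of those: lines 3 to 5.
middle=$(head -n 5 "$tmp" | tail -n 3)
echo "$middle"

rm -f "$tmp"
```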

Copying a file

To copy one or more files, use the cp(1) command, which takes one of the following formats:

cp filename copyname
cp filename ... pathname

In the first format, filename (with optional path) is the name of the existing file that you want to be copied; copyname (with optional path) is the name you want the copy to be created with.

If you are using the second format to copy a group of files, you can only specify a directory, pathname, as the destination of the specified files. filename and copyname cannot be the same if they are both in the same directory.

When you copy a file you are creating a duplicate of it, which occupies additional space in the filesystem. Although the contents of the new file are the same as those of the original file, the new copy has its own inode number; any operations carried out on it have no impact on the original.
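You can check this independence for yourself. The following sketch, which uses invented names in a scratch directory, copies a file, compares the inode numbers reported by ls -i, and then changes the copy to show that the original is unaffected.

```shell
# Work in a scratch directory; all names here are hypothetical.
dir=${TMPDIR:-/tmp}/cpdemo.$$
mkdir "$dir"
echo 'first draft' > "$dir/project1"

cp "$dir/project1" "$dir/project1.copy"

# ls -i prints the inode number before the name; keep only the number.
orig_inode=$(ls -i "$dir/project1" | awk '{ print $1 }')
copy_inode=$(ls -i "$dir/project1.copy" | awk '{ print $1 }')

# Changing the copy leaves the original untouched.
echo 'second draft' > "$dir/project1.copy"
orig_text=$(cat "$dir/project1")
echo "inodes: $orig_inode $copy_inode; original still reads: $orig_text"

rm -r "$dir"
```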

For example, to copy the file project1 from your current directory to the directory /u/workgrp, type the following command:

   $ cp project1 /u/workgrp
The copy will retain the name project1, but will have a different pathname.

You can copy a file to your current directory by typing a command line like the following:

   $ cp ../../a.out .
In this case, the file called a.out is located two levels above the current working directory, and is to be copied to the current location (as indicated by the ``.'' notation).

NOTE: When copying a file, be careful not to overwrite an existing file. To avoid this, do not create a copy with the same name as an existing file, as this will overwrite (clobber) the contents of the existing file. This can be avoided by setting the noclobber variable.
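One way to guard against accidental overwrites in your own scripts is to test whether the destination already exists before copying. The sketch below wraps cp in a small shell function for the file-to-file form of the command. The name safe_cp and the scratch files are invented for the example; safe_cp is not a standard command.

```shell
# safe_cp SOURCE DEST: copy SOURCE to DEST only if DEST does not exist yet.
# safe_cp is a hypothetical helper name, not a standard command.
safe_cp() {
    if [ -e "$2" ]; then
        echo "safe_cp: $2 already exists, not copied" >&2
        return 1
    fi
    cp "$1" "$2"
}

# Try it in a scratch directory.
dir=${TMPDIR:-/tmp}/safecp.$$
mkdir "$dir"
echo 'new text' > "$dir/draft"
echo 'old text' > "$dir/report"

safe_cp "$dir/draft" "$dir/report"; clobber_rc=$?   # refused: report exists
safe_cp "$dir/draft" "$dir/draft2"; fresh_rc=$?     # copied: draft2 is new
kept=$(cat "$dir/report")
echo "report still contains: $kept"

rm -r "$dir"
```

Because the test uses -e, the function also refuses when the destination is an existing directory, so it is only suitable where you mean to name the copy explicitly.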

When you copy a file, you automatically become the owner of the copy. You must have read permission on a file before you can copy it. You can place files in any directory for which you have write permission. If you want to create a copy of a file without changing its ownership, use the command copy -o instead of cp; this preserves the owner and group of the file. For example, to copy /tmp/johnsfile to your home directory without changing the ownership of the file, type the following:

   $ copy -o /tmp/johnsfile johnsfile
For information on how you can assign the ownership of a file to someone else, see ``Giving a file to someone else'' and ``Access control for files and directories''.

Moving or renaming a file

To move one or more files to another directory, use the mv(1) command, as follows:

mv filename ... pathname

The one or more filename arguments (with optional path) specify the file or files you want to move; pathname is the path to the directory where you want to put the file.

For example, to move the file project1 from your current directory to the directory /u/workgrp, type the following:

   $ mv project1 /u/workgrp
The procedure for moving files is the same as for renaming files. You rename a file by moving it to a new filename. To move (rename) a file, type the following:

mv old_filename new_filename

old_filename is the file's current name and new_filename is the name you want to change it to.

You can move a file to a different directory and rename it in a single step. For example, the following command line moves chapter.1 to /u/workgrp and gives it the name new_filename:

   $ mv chapter.1 /u/workgrp/new_filename
You can place files in any directory to which you have write permission. To move a file, you need read permission unless you own it.

NOTE: If you give a file the same name as an existing file, the contents of the existing file are overwritten or ``clobbered''; the existing file is lost. (You can make the system refuse to overwrite existing files by setting the noclobber variable: see ``Specifying command input and output'' for details.)
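Because renaming is just a move to a new name, a shell loop can rename a group of files in one go. The following sketch changes the suffix of every .txt file in a scratch directory to .bak. The directory and filenames are invented, and the ${f%.txt} expansion (which strips a trailing .txt) assumes a POSIX-style shell such as ksh or sh.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/mvdemo.$$
mkdir "$dir"
touch "$dir/chap1.txt" "$dir/chap2.txt"

# Rename every .txt file to .bak; ${f%.txt} strips the old suffix.
for f in "$dir"/*.txt; do
    mv "$f" "${f%.txt}.bak"
done

renamed=$(ls "$dir")
echo "$renamed"

rm -r "$dir"
```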

Removing a file

To remove (or destroy) a file, use the rm(1) command, as follows:

rm filename

Once a file is removed from the system, there is no way of getting it back unless a backup exists on tape or floppy disk, or the filename is a link. Links are explained in ``Creating links to files and directories''.

You can list several files to be removed, or use wildcards to select files. You cannot remove directories with this form of the rm command.

NOTE: It is potentially dangerous to use wildcards to remove files. Before doing so, you should confirm that the correct files have been selected: do so by running the ls command in place of rm. Because the expansion of any filename notation is handled by the shell and not by the individual command, the files selected by ls are the same as those that will be selected by rm.
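The preview step described in the note can be sketched as follows. The example uses echo rather than ls for the preview, since the shell expands the pattern in exactly the same way for any command; the scratch directory and filenames are invented.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/rmdemo.$$
mkdir "$dir"
touch "$dir/file1" "$dir/file2" "$dir/format.doc" "$dir/notes"

# The shell expands f* the same way for echo as for rm, so this is a safe preview.
preview=$(cd "$dir" && echo f*)
echo "would remove: $preview"

# Only after checking the list, run the real command with the same pattern.
(cd "$dir" && rm f*)
left=$(ls "$dir")
echo "left behind: $left"

rm -r "$dir"
```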

To remove files interactively, use the -i option, as follows:

rm -i filename1 filename2 . . .

rm with the -i option asks for confirmation before removing a file. A question mark is displayed and you can either type ``y'' to remove the file, or ``n'' to not remove it. It is a good idea to use rm -i to reduce the risk of accidentally removing files. For example, to remove several files from the current directory:

   $ rm -i f*
   file1: ?y
   file2: ?y
   file3: ?y
   format.doc: ?n
As a further safeguard, it may be useful to create an alias whereby executing rm -r * actually executes rm -ir *, so that you must confirm the deletion of each file before it is carried out. See ``Using aliases'' for details of how to create an alias.

Note that using wildcards does not remove hidden files (those whose name begins with a dot); that is, typing rm * does not necessarily remove all the files in a directory. To list the hidden files, type ls -a. For example, if you have a file called .project, you can remove it by typing the following:

   $ rm .project
Remember that there are always two entries that cannot be removed from a directory: ``.'' (the current directory), and ``..'' (the parent directory).

You can remove a file from a directory other than your current one if you have write permissions on that directory.

Removing files with difficult names

Occasionally, files are created by accident with awkward names. For example, they might contain a slash (/) or an asterisk (*) character. These files cannot be removed by normal means without the risk of destroying other files, because if you try to type their names, the shell will interpret the special characters as wildcards.

For example, suppose you have a directory that contains a corrupted file called all * file and a number of files called file1 and file2 that you want to keep. If you type rm all * file, rm will interpret the ``*'' in the filename as a wildcard, and attempt to execute the following:

rm all file1 file2 file

This command thereby inadvertently deletes file1 and file2.

To correctly remove files with corrupted names, the easiest solution is to use the rm -i option. In this case, rm will prompt you for confirmation before removing each of the specified files in the current directory; type ``n'' for each file other than the corrupt one you want to remove, as follows:

   $ rm -i a*
   all * file: ? y
Alternatively, specify the name of the file, surrounding it with single quotes:
   $ rm 'all * file'
The single quotes prevent the shell from expanding the special character ``*'' in the file's name.

If you have a file whose name begins with a hyphen (-), rm will mistake its name for an option of some kind. For example, if your file is called -myfile, rm -myfile will be mistaken for an invalid rm command. You can overcome this by invoking rm with the special option, --, which tells rm that the following argument is not an option:

   $ rm -- -myfile
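The following sketch puts both techniques together in a scratch directory: it creates a file called all * file and a file called -myfile alongside two files that should survive, then removes only the awkward ones. All names are taken from the examples above or invented for the demonstration.

```shell
# Hypothetical scratch directory.
dir=${TMPDIR:-/tmp}/names.$$
mkdir "$dir"
(
    cd "$dir" || exit 1
    touch file1 file2 'all * file' ./-myfile

    rm 'all * file'     # quotes stop the shell from expanding the *
    rm -- -myfile       # -- ends the options, so -myfile is taken as a filename
)
survivors=$(ls "$dir")
echo "$survivors"

rm -r "$dir"
```

Another way to handle the leading hyphen is to give a path that does not start with one, for example rm ./-myfile.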
If you have a file whose name begins with or contains illegal characters, you may not be able to delete it using rm. An alternative is to find out the inode number of the file and then use the find command to delete it. For example, in the following output listing from an ls -il command, you can see that the file named BadFileName is corrupt in some way.
   8012 -rw-r--r--    1 docmon   other       15 Dec 30 13:2BadFileName
   8015 -rw-r--r--    1 docmon   other     3245 Dec 30 13:26 file1
   7732 -rw-r--r--    1 docmon   other      332 Dec 30 13:26 file2
   7940 -rw-r--r--    1 docmon   other     9721 Dec 30 13:26 file3
For the purpose of this example, the file was created with two backspace characters in front of its name; this is why it is displayed two characters to the left of file1, file2 and file3. The leftmost column of the display is the inode number, so for BadFileName the inode number is 8012. To delete this file you can now use the find command.

find . -inum 8012 -exec rm {} \;
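The whole procedure can be tried safely in a scratch directory. The sketch below creates a file whose name starts with two backspace characters, reads its inode number from the first column of ls -i, and removes it with find -inum. The directory name is invented, and the awk pattern assumes that the readable part of the name (BadFileName) is enough to identify the right line.

```shell
# Hypothetical scratch directory.
dir=${TMPDIR:-/tmp}/inum.$$
mkdir "$dir"
touch "$dir/file1"

# Create a name that starts with two backspace characters (octal 010).
bad=$(printf '\010\010BadFileName')
touch "$dir/$bad"

# Column 1 of ls -i is the inode number; match the readable part of the name.
inode=$(ls -i "$dir" | awk '/BadFileName/ { print $1 }')
echo "inode of the bad file: $inode"

# Remove the file by inode number instead of by name.
find "$dir" -inum "$inode" -exec rm {} \;
remaining=$(ls "$dir")
echo "remaining: $remaining"

rm -r "$dir"
```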

See ``Regular expressions'' for an explanation of shell wildcards. See ``Filenaming conventions'' for an explanation of what constitutes an illegal filename.

Comparing files

It is often necessary to compare the contents of two files and list any differences. This may be because you have made some changes to a file and cannot remember them; if you have a previous version of the file, you can compare the two. You may have two files with the same name in different directories; you can compare them to see if they are different files or two versions of the same file.

To see if two files differ, use the cmp(1) (compare) command which reads file1 and file2 and reports whether or not they are different:

cmp file1 file2

If they differ, cmp reports the point at which the two files diverge. This is reported in terms of the number of characters into file1 at which the difference was detected, and the number of the line containing that character, as follows:

   $ cmp chapter4 chapter4.bak
   chapter4 chapter4.bak differ: char 28895, line 849
In this case, the two files are the same up to a point on line 849 of chapter4.
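In a script you often need only a yes-or-no answer. The sketch below relies on the -s (silent) option of cmp, which suppresses the message and leaves just the exit status: 0 if the files are identical, non-zero otherwise. The scratch files are invented for the example.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/cmpdemo.$$
mkdir "$dir"
printf 'same text\n' > "$dir/a"
cp "$dir/a" "$dir/b"
printf 'other text\n' > "$dir/c"

# cmp -s prints nothing; its exit status is 0 only if the files are identical.
if cmp -s "$dir/a" "$dir/b"; then ab=identical; else ab=different; fi
if cmp -s "$dir/a" "$dir/c"; then ac=identical; else ac=different; fi
echo "a and b are $ab; a and c are $ac"

rm -r "$dir"
```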

To see the precise differences on a line-by-line basis, use the diff(1) command, as follows:

diff filename1 filename2

For example, consider the following two files, note.1 and note.2:

   $ cat note.1
   Please send me a report.
   I need it for tomorrow's meeting.
   Thanks
   $ cat note.2
   Please send me a report today.
   I need it for tomorrow's meeting.
   Thank you
To compare these files, line by line, use the diff command as follows:
   $ diff note.1 note.2
   2c2
   < Please send me a report.
   ---
   > Please send me a report today.
   4c4
   < Thanks
   ---
   > Thank you
The ``2c2'' means that there is a change (``c'') between line 2 in the first file and line 2 in the second. Likewise, the ``4c4'' means that there is a change between line 4 in the first file and line 4 in the second. The ``<'' refers to a line in the first file, and ``>'' refers to a line in the second file. The ``---'' separates the output from each file.
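As a further illustration of the notation, the following sketch compares two invented files, old and new, in a scratch directory. The second line is changed and a fourth line is added, so diff should report a ``2c2'' change followed by a ``3a4'' addition (a line added after line 3 of the first file, appearing as line 4 of the second). The sketch also captures the exit status, which is 0 when the files match and 1 when they differ.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/diffdemo.$$
mkdir "$dir"
printf 'alpha\nbeta\ngamma\n' > "$dir/old"
printf 'alpha\nBETA\ngamma\ndelta\n' > "$dir/new"

# diff exits with status 1 when the files differ, 0 when they match.
report=$(diff "$dir/old" "$dir/new")
rc=$?
echo "$report"
echo "exit status: $rc"

rm -r "$dir"
```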

If you want to compare three files, use the diff3(1) command. If you want to compare two sorted files, use comm(1).

Sorting the contents of a file

You can sort a file containing lines of text or numerical data in a variety of ways using the sort(1) command. For example, suppose you have a file called names containing the following:

To sort its contents alphabetically, enter the following command:
   $ sort names
To direct the sorted output to a file (names1) rather than the screen (standard output), you can use either of the following command lines:
   $ sort -o names1 names
   $ sort names > names1
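You can cause the original file to be sorted in place by using the -o form and giving the original filename for both arguments (sort -o names names). Do not attempt this with the redirection form: the shell empties the output file before sort has read it.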

You can make sort merge two files together, in order. To do this, type the following:

sort filename1 filename2 > filename3

This creates filename3, which contains the sorted, merged contents of filename1 and filename2. (The sort command sorts the files as it merges them.) You can use the -u option to tell sort to make sure that each line in filename3 is unique; that is, if both filename1 and filename2 contain an identical line, only one copy of the line will be written to filename3:

   $ cat file1
   $ cat file2
Running the sort command on these files merges the contents and places them in alphabetic order, as follows:
   $ sort -u file1 file2 >file3
   $ cat file3
There are several more options that can be used with sort. For example, -r sorts in reverse order; -n sorts on numerical order, not text order; -M causes sort to assume that the first three characters of the field being sorted are months (like ``JAN'', ``FEB'', ``MAR'', and so on) and sorts them into date order.
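The following sketch shows the effect of -u and -n on small invented files. It sets LC_ALL=C for the text sorts so that the ordering does not depend on the local language settings; that setting is an addition made for the example.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/sortdemo.$$
mkdir "$dir"
printf 'pear\napple\n' > "$dir/f1"
printf 'plum\napple\n' > "$dir/f2"
printf '10\n9\n100\n' > "$dir/nums"

# -u merges the two files and keeps one copy of the duplicated line.
merged=$(LC_ALL=C sort -u "$dir/f1" "$dir/f2")

# Without -n the numbers are compared as text; with -n, by value.
as_text=$(LC_ALL=C sort "$dir/nums")
as_numbers=$(sort -n "$dir/nums")

echo "$merged"
echo "$as_text"
echo "$as_numbers"

rm -r "$dir"
```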

You can make sort select any field in a line and have it base its comparisons on that field, as follows:

   $ cat birthdays
   charles FEB
   bridget DEC
   sarah   JAN
   $ sort -M +1 birthdays
   sarah   JAN
   charles FEB
   bridget DEC
The +1 flag tells sort to make comparisons between records on the basis of the second field of each line. So, the month abbreviation on each line of the file is used as the basis for the sort operation above, and not the alphabetic order of the first field.

If you have a file where data records are made up of fields separated by some special character (called a ``separator''), you can tell sort to use that separator by using the -t option, as follows:

   $ cat birthdays
   $ sort -M +1 -t: birthdays
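Many versions of sort also accept the -k option for selecting a field, where fields are numbered from 1, so that -k 2 names the same field as +1. The sketch below assumes your sort supports -k. It sorts an invented file of colon-separated login names and extension numbers by the numeric value of the second field.

```shell
# Hypothetical scratch file of login:extension records.
tmp=${TMPDIR:-/tmp}/fields.$$
printf 'mikes:571\nsuep:284\njoshf:954\n' > "$tmp"

# -t: sets the separator; -k2,2n sorts on field 2 only, as a number.
by_ext=$(sort -t: -k2,2n "$tmp")
echo "$by_ext"

rm -f "$tmp"
```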

Searching for text in a file

To search one or more files for some text, you can use the grep(1) command, as follows:

grep options text filenames

(``grep'' is an acronym for ``global regular expression print''; for a full explanation of regular expressions, see ``Regular expressions''.)

grep searches the contents of filenames for text, and prints any matches. You might want to do this if you cannot remember the name of a file in which you left some information, but can remember enough of it for grep to find it for you.

For example, you might want to locate a memo in the current directory (full of files called something.memo), when you know that the file you are looking for contains the string ``Subject''. The command to use is as follows:

   $ grep 'Subject' *.memo
   stan.memo:Subject:  That's another fine mess you've gotten us into!
grep prints the context of any matches, line by line, with the relevant filename (where more than one file was specified for the search) followed by the line of text that contains the specified string.

The single quote (' ') marks are necessary if you want to search for a string containing spaces, tab characters, or double quote marks. If the string itself contains a single quote mark, enclose it in double quote (" ") marks instead; inside double quotes a single quote needs no backslash, as follows:

   $ grep "I'm right" stan.memo
   Thanks for nothing. I'm right in the center of it (or
If you are not sure whether the string is uppercase, capitalized, or all lowercase letters, use the grep -i (ignore case) option; grep ignores the case of the text in the files being searched, and reports all matches, as follows:
   $ grep -i 'PhD' database.memo
   yesterday, when he was awarded his PhD in New
   not so easy to get a pHD nowadays, what with
   is it PHD, phd, phD, etc? He should have stopped
This search has found all lines containing the string ``PhD'', irrespective of how it is capitalized.

If you want to see all lines in a file that do not contain the string, use grep -v.
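The options can be combined, and when all you need is the name of the file that contains the text, the -l option prints just the filenames. The following sketch tries this on two invented memo files in a scratch directory.

```shell
# Hypothetical scratch directory and memo files.
dir=${TMPDIR:-/tmp}/grepdemo.$$
mkdir "$dir"
printf 'To: Ollie\nSubject: another fine mess\n' > "$dir/stan.memo"
printf 'To: Stan\nNo heading here\n' > "$dir/ollie.memo"

# -l lists only the names of the files that contain a match.
found=$(cd "$dir" && grep -l 'Subject' *.memo)
echo "$found"

# -i ignores case; -v selects the lines that do NOT match.
others=$(grep -iv 'subject' "$dir/stan.memo")
echo "$others"

rm -r "$dir"
```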

The use of regular expressions and pattern matching in search operations is explained in ``Regular expressions''. See also regexp(1tcl).

If you have a file containing columns of data in textual form, you can extract information from it using a variety of tools. For example, suppose you have a file called blackbook containing names, extension numbers, login names and dates, in a format like the following:

   Michael Stand:571:mikes:JAN-1-91
   Sue Penny:284:suep:FEB-6-89
   Joshua Ford:954:joshf:JUL-30-88
   Liz Addams:553:liza:AUG-15-91
To see Sue Penny's record, use the following command:
   $ grep Sue blackbook
   Sue Penny:284:suep:FEB-6-89
This is hard to read. To see only Sue's extension number (the second field), you can use the cut(1) command, as follows:
   $ grep Sue blackbook | cut -f2 -d:
   284
The cut command extracts individual fields from a file containing records. The -f2 option tells cut to extract only the second field of each record; the -d: option means that fields are delimited with a colon. In this way, input records may contain spaces and tabs without these characters signaling the start of a new field.

The pipe (|) tells the shell to send the output of grep to another program (in this case, to cut) instead of to the screen. See ``Running commands in a pipeline'' for more information on pipes.

To see a list of all the people in your file, followed by their login names, you do not need to use grep: instead, use the cut command, as follows:

   $ cut -f1,3 -d: blackbook
The -f1,3 option tells cut to extract the first and third fields in each record:
   Michael Stand:mikes
   Sue Penny:suep
   Joshua Ford:joshf
   Liz Addams:liza
If you want to put your list in alphabetic order, you can sort it as follows:
   $ cut -f1,3 -d: blackbook | sort -df
   Joshua Ford:joshf
   Liz Addams:liza
   Michael Stand:mikes
   Sue Penny:suep
A more powerful and versatile tool for this sort of operation is the awk(1) command.
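As a taste of what awk adds, the sketch below reads the same blackbook records from a scratch file and prints the fields in a different order with extra text between them, which cut cannot do. The output layout is invented for the example.

```shell
# Scratch copy of the blackbook records shown above.
tmp=${TMPDIR:-/tmp}/blackbook.$$
cat > "$tmp" <<'EOF'
Michael Stand:571:mikes:JAN-1-91
Sue Penny:284:suep:FEB-6-89
Joshua Ford:954:joshf:JUL-30-88
Liz Addams:553:liza:AUG-15-91
EOF

# -F: sets the field separator; $1, $2 and $3 are the first three fields.
listing=$(awk -F: '{ print $3 " (" $1 ") ext. " $2 }' "$tmp" | LC_ALL=C sort)
echo "$listing"

rm -f "$tmp"
```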

Complex command lines like these can be stored in shell script files and made executable, so that you can run them again in future without retyping them.

Finding files

To search the system for a particular file, use the find(1) command, as follows:

find start_point -follow -name filename -print

The start_point argument tells find where to start searching in the filesystem, for example, the root directory (/). find searches its starting directory and all of its subdirectories. (If you know your file is in one of your own subdirectories, you could tell find to start searching from $HOME.) The -follow option tells find that if it encounters a symbolic link, it should follow it to the file the link points to, as described in ``Navigating symbolic links''. The -name option is followed by the name of the file you are looking for. Every time find sees a file with this name, it carries out the actions specified by the subsequent options. For example, the -print option tells find that the action it must take when it finds filename is to print its pathname:

   $ find /tmp -name myfile.tmp -print
find gives lots of error messages when you do not have permission to search a directory, for example:
   $ find / -name chap3 -print

   find: cannot chdir to /etc/conf/pack.d/arp
   find: cannot chdir to /etc/conf/pack.d/arpproc
   . . .

To suppress these error messages, redirect the error output of the find command to /dev/null (the UNIX system's ``black hole'' device, which discards anything written to it), as follows:
   $ find / -follow -name chap3 -print 2> /dev/null
For more information on redirecting output, see ``Specifying command input and output''.

find can be used to apply a command to a collection of files that match some selection test, for example, files that are older than a specified age. You can remove all files in your home directory, and all its subdirectories, that have not been accessed for seven days by typing the following command line:

   $ find $HOME -follow -name '*' -atime +7 -exec rm {} +
find starts from the directory specified by its first parameter (in this case, the value of $HOME), follows symbolic links, and selects all the files matching the designated name (in this case, '*') that were last accessed (-atime) more than seven days ago. It then executes (-exec) the rm command on the found files (represented in the expression by {}). The ``+'' at the end of the line specifies that rm is run with the found files collected into a list of arguments, rather than once for each file.

NOTE: The single quotes around the ``*'' are required. Otherwise find searches for files with names matching those matched by ``*'' in the current directory.
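Before letting find remove anything, it is worth running the same selection with -print in place of -exec to see what would be affected. The sketch below does this in a scratch directory, using touch -a -t to give one invented file an access time in the year 2000. It also adds -type f, a standard find test that restricts the selection to ordinary files so that rm is never handed a directory; that test is an addition to the command line shown above.

```shell
# Hypothetical scratch directory and files.
dir=${TMPDIR:-/tmp}/finddemo.$$
mkdir "$dir"
touch "$dir/fresh" "$dir/stale"

# Set only the access time of stale to 1 January 2000 (format CCYYMMDDhhmm).
touch -a -t 200001010000 "$dir/stale"

# Dry run: -print lists what the -exec rm version would remove.
candidates=$(find "$dir" -type f -atime +7 -print)
echo "would remove: $candidates"

# The real thing, once the list has been checked.
find "$dir" -type f -atime +7 -exec rm {} +
left=$(ls "$dir")
echo "left behind: $left"

rm -r "$dir"
```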

The -exec option allows the execution of any legal shell command along with any permitted options and arguments.

find can carry out other tasks when it finds a file. For example, the following command causes find to execute cp on any file called datafile in the directory /bin; this file is then copied to your home directory.

   $ find /bin -follow -name datafile -exec cp {} $HOME \;

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 22 April 2004