VxFS System Administrator's Guide
The VERITAS File System
The VERITAS File System (VxFS or vxfs) is an extent based, journaling file system intended for use with SCO UnixWare(TM). VxFS provides enhancements that increase SCO UnixWare usability and make the UNIX system more viable for use in the commercial marketplace. VxFS is particularly useful in environments that require high performance and availability and deal with large volumes of data.
The VERITAS File System is available in two forms:
- base VxFS
- the separately licensed VxFS Advanced feature set, which provides additional functionality and better performance
Note: This guide is intended for use with both the VxFS and VxFS Advanced feature set. Some of the material covered applies only to the VxFS Advanced feature set.
This chapter provides an overview of most of the VERITAS File System features. Some features are described in more detail in later chapters. The basic features include:
- extent based allocation
- extent attributes
- fast file system recovery
- access control lists (ACLs)
In addition, the VxFS Advanced feature set offers the following features:
- online administration
- online backup
- enhanced application interface
- enhanced mount options
- improved synchronous write performance
- support for large file systems (up to 1 terabyte)
- support for large files (up to 2 terabytes)
- enhanced I/O performance
- support for BSD style quotas
The VxFS file system supports a maximum file system size of one terabyte, a file size of two terabytes, and file names up to 255 characters long.
VxFS does not have inherent limitations on the maximum number of concurrently mounted file systems or concurrently accessed files. Logical block sizes of 1024 bytes (default), 2048 bytes, 4096 bytes, and 8192 bytes are supported.
The VxFS file system supports all ufs file system features and facilities except for the following:
- support for removing or renaming "." and ".." directory entries (these particular operations are disallowed to preserve file system sanity)
VxFS also provides support for the Discretionary Access Control features of the sfs file system.
Disk Layout Options
Three disk layout formats are available with VxFS:
- The Version 1 disk layout is the original layout used with earlier releases of VxFS.
The Version 2 disk layout supports features such as:
- dynamic inode allocation
- enhanced security
Version 4 is the new and default VxFS disk layout. It adds support for:
- files up to 2 terabytes
- file systems up to 1 terabyte
See Chapter 2, "Disk Layout," for a description of the disk layouts.
Note: The Version 3 disk layout is not supported on SCO UnixWare.
File System Performance Enhancements
The s5 file system and the ufs file systems supplied with SCO UnixWare use block based allocation schemes, which provide good random access to files and acceptable latency on small files. For larger files, however, this block based architecture limits throughput. This makes the ufs file systems less than optimal for commercial environments.
VxFS addresses this file system performance issue by using a different allocation scheme and by providing increased user control over allocation, I/O, and caching policies. The following performance enhancing features are provided by VxFS:
- extent based allocation
- enhanced mount options
- data synchronous I/O
- direct I/O and discovered direct I/O
- caching advisories
- enhanced directory features
- explicit file alignment, extent size, and preallocation controls
- tuneable I/O parameters
- tuneable indirect data extent size
- integration with VERITAS Volume Manager(TM) (VxVM®)
Details on the use of some of the preceding features can be found in the following sections and also in Chapter 5, "Performance and Tuning," and Chapter 6, "Application Interface."
To provide a conceptual understanding of how the VxFS allocation scheme differs from block based allocation, an overview of this architecture is covered in the section entitled "Extent based Allocation."
Extent Based Allocation
Disk space is allocated by the system in 512 byte sectors, which are grouped to form a logical block. VxFS supports logical block sizes of 1024, 2048, 4096, and 8192 bytes. The default block size is 1K for file systems up to 8 GB, 2K for file systems up to 16 GB, 4K for file systems up to 32 GB, and 8K for larger file systems.
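The default block size rule above can be written down as a small lookup. The following Python sketch is illustrative only: it assumes that "GB" means 2^30 bytes and that each boundary is inclusive ("up to 8 GB" includes exactly 8 GB), neither of which the text states. The function names are hypothetical and are not part of any VxFS interface.

```python
GB = 1024 ** 3  # assumption: the guide's "GB" is 2**30 bytes


def default_block_size(fs_size_bytes: int) -> int:
    """Return the default logical block size (bytes) for a file system size."""
    if fs_size_bytes <= 8 * GB:     # assumption: boundaries are inclusive
        return 1024
    if fs_size_bytes <= 16 * GB:
        return 2048
    if fs_size_bytes <= 32 * GB:
        return 4096
    return 8192


def sectors_per_block(block_size: int, sector_size: int = 512) -> int:
    """Number of 512 byte sectors grouped into one logical block."""
    return block_size // sector_size
```

For example, under these assumptions a 40 GB file system would default to 8K blocks, each made up of 16 sectors.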
An extent is defined as one or more adjacent blocks of data within the file system. An extent is presented as an address-length pair, which identifies the starting block address and the length of the extent (in file system or logical blocks). When storage is added to a file on a VxFS file system, it is grouped in extents, as opposed to being allocated a block at a time (as is done with the ufs file systems).
By allocating disk space to files in extents, disk I/O to and from a file can be done in units of multiple blocks. This type of I/O can occur if storage is allocated in units of consecutive blocks. For sequential I/O, multiple block operations are considerably faster than block-at-a-time operations. Almost all disk drives accept I/O operations of multiple blocks.
Extent allocation makes the interpretation of addressed blocks from the inode structure only slightly different from that of block based inodes. The ufs file system inode structure contains the addresses of 12 direct blocks, one indirect block, and one double indirect block. An indirect block contains the addresses of other blocks. The ufs indirect block size is 8K and each address is 4 bytes long. A ufs inode therefore can address 12 blocks directly and up to 2048 more blocks through one indirect address.
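The 2048 figure follows directly from the indirect block size and the address size. The short Python calculation below reproduces it. The byte totals additionally assume that ufs data blocks are also 8K, which the text does not state.

```python
UFS_INDIRECT_BLOCK_SIZE = 8 * 1024  # bytes, as stated above
ADDRESS_SIZE = 4                    # bytes per block address, as stated above
DIRECT_BLOCKS = 12                  # direct block addresses in the inode

# Assumption for the byte totals only: data blocks are 8K as well.
DATA_BLOCK_SIZE = 8 * 1024

addresses_per_indirect = UFS_INDIRECT_BLOCK_SIZE // ADDRESS_SIZE
direct_bytes = DIRECT_BLOCKS * DATA_BLOCK_SIZE
single_indirect_bytes = addresses_per_indirect * DATA_BLOCK_SIZE

print(addresses_per_indirect)               # blocks reachable via one indirect block
print(direct_bytes, single_indirect_bytes)  # bytes reachable directly / indirectly
```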
A VxFS inode is similar to the ufs inode. It references 10 direct extents, each of which is a pair consisting of a starting block address and a length in blocks. The VxFS inode also points to two indirect address extents, which contain the addresses of other extents:
- The first indirect address extent is used for single indirection, where each entry in the extent indicates the starting block number of an indirect data extent.
- The second indirect address extent is used for double indirection, where each entry in the extent indicates the starting block number of a single indirect address extent.
Each indirect address extent is 8K long and contains 2048 entries. All indirect data extents for a file must be the same size: this value is set when the first indirect data extent is allocated and it is stored in the inode. Directory inodes always use an 8K indirect data extent size. By default, regular file inodes also use an 8K indirect data extent size (this can be changed with vxtunefs), but they allocate and use the indirect data extents in clusters to simulate larger extents.
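As a rough illustration of what single and double indirection can reach, the following Python sketch multiplies out the figures given above (2048 entries per indirect address extent, 8K default indirect data extent size). It is a back-of-the-envelope estimate, not a statement of actual VxFS limits: it ignores the clustering of indirect data extents mentioned above, and the function names are hypothetical.

```python
ENTRIES_PER_INDIRECT_ADDRESS_EXTENT = 2048   # as stated above
DEFAULT_INDIRECT_DATA_EXTENT_SIZE = 8 * 1024  # bytes, default stated above


def single_indirect_capacity(data_extent_size: int = DEFAULT_INDIRECT_DATA_EXTENT_SIZE) -> int:
    """Bytes reachable through the first (single) indirect address extent."""
    return ENTRIES_PER_INDIRECT_ADDRESS_EXTENT * data_extent_size


def double_indirect_capacity(data_extent_size: int = DEFAULT_INDIRECT_DATA_EXTENT_SIZE) -> int:
    """Bytes reachable through the second (double) indirect address extent."""
    entries = ENTRIES_PER_INDIRECT_ADDRESS_EXTENT
    return entries * entries * data_extent_size
```

With the 8K default, this gives 16 MB through single indirection and 32 GB through double indirection. A larger indirect data extent size scales both figures proportionally.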
In Version 4, VxFS introduced a new type of inode block map organization for indirect extents known as the typed extents. Each entry in the block map consists of a typed descriptor record containing a type, offset, starting block, and number of blocks.
Note: The information in this section applies to the Version 4 disk layout.
Indirect as well as data extents use this format to identify logical file offsets and physical disk locations of any given extent. The extent descriptor fields are defined as follows:
- type
Uniquely identifies and defines an extent descriptor record, and defines the record's length and format.
- offset
Represents the logical file offset in blocks for a given descriptor. Used to optimize lookups and to eliminate hole descriptor entries.
- starting block
The starting file system block of the extent.
- number of blocks
The number of contiguous blocks in the extent.
Some notes about typed extents:
- Indirect address blocks are fully typed and may have variable lengths up to a maximum of 8K (this is the optimum size). On a fragmented file system, indirect extents may be smaller than 8K depending on space availability. VxFS always tries to obtain 8K indirect extents, but will use smaller indirects if needed.
- Indirect data extents are variable in size. This allows files which must go to indirects to continue to allocate large, contiguous extents and take full advantage of VxFS's optimized I/O.
- Holes in sparse files require no storage. Since a typed record contains the offset and length of a descriptor, holes are eliminated entirely. A hole is determined by adding the offset and length of a descriptor and comparing the result with the offset of the next record.
- There are no limits on the levels of indirection. It is expected, however, that fewer levels will be seen with this format given that data extents are of variable lengths.
- New types can be added in the future. Since this format uses a type indicator to determine its record format and content, new types can be added to accommodate future requirements and new functionality.
Currently, the typed format is used on regular files only when indirection is needed. Typed records are longer than the previous format and therefore fewer direct entries can be used in the inode. Newly created files start out using the old format, which allows for 10 direct extents in the inode. The inode's block map is converted to the typed format when indirection is needed. This allows us to take full advantage of both formats.
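The hole rule described for typed extents (add a descriptor's offset and length, then compare with the next record's offset) can be sketched in Python as follows. The record layout and the names TypedExtent, rec_type, and find_holes are hypothetical and chosen only for illustration. The real on-disk format is not described in this chapter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TypedExtent:
    """Illustrative model of a typed extent descriptor (not the on-disk layout)."""
    rec_type: int     # identifies the record format
    offset: int       # logical file offset, in blocks
    start_block: int  # starting file system block of the extent
    num_blocks: int   # number of contiguous blocks in the extent


def find_holes(extents: List[TypedExtent]) -> List[Tuple[int, int]]:
    """Return (offset, length) in blocks for each gap between consecutive records."""
    ordered = sorted(extents, key=lambda e: e.offset)
    holes = []
    for current, following in zip(ordered, ordered[1:]):
        end = current.offset + current.num_blocks
        if following.offset > end:  # next record starts past the end: a hole
            holes.append((end, following.offset - end))
    return holes


block_map = [
    TypedExtent(rec_type=1, offset=0, start_block=100, num_blocks=8),
    TypedExtent(rec_type=1, offset=8, start_block=300, num_blocks=4),
    TypedExtent(rec_type=1, offset=20, start_block=500, num_blocks=2),
]
print(find_holes(block_map))
```

In this example the first two records are contiguous in the file, while logical blocks 12 through 19 have no descriptor at all, so the hole costs no storage and no map entry.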
The VxFS file system allocates disk space to files in groups of one or more extents. VxFS also allows applications to control some aspects of the extent allocation for a given file. Extent attributes are the extent allocation policies associated with a file.
The setext and getext commands allow the administrator to set or view extent attributes associated with a file, as well as to preallocate space for a file. Refer to Chapter 3, "Extent Attributes," and Chapter 6, "Application Interface," for discussions on how to use extent attributes.
The vxtunefs command allows the administrator to set or view the default indirect data extent size. Refer to Chapter 5, "Performance and Tuning," and the vxtunefs manual page for discussions on how to use the indirect data extent size feature.
Fast File System Recovery
The ufs file systems rely on the full structural verification by the fsck utility as the only means to recover from a system failure. This involves checking the entire structure, verifying that the file system is intact, and correcting any inconsistencies that are found. For large disk configurations, this process can be very time consuming (see the fsck_vxfs(1M) manual page for more information).
The VxFS file system provides recovery only seconds after a system failure by utilizing a tracking feature called intent logging. Intent logging is a scheme that records pending changes to the file system structure. These changes are recorded in a circular intent log.
During system failure recovery, the VxFS fsck utility performs an intent log replay, which scans the intent log, nullifying or completing file system operations that were active when the system failed. The file system can then be mounted without completing a full structural check of the entire file system. Except for the fact that a VxFS file system can be recovered in a few seconds, the intent log recovery feature is not readily apparent to either the user or the system administrator.
When the disk has a hardware failure, the intent log replay may not be able to completely recover the damaged file system structure. In such cases, the full structural mode of fsck provided with VxFS must be run.
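The idea of replaying a circular intent log can be illustrated with a toy model. The Python sketch below is not the VxFS log format or the fsck algorithm. It assumes, purely for illustration, that each log record carries a flag saying whether its full intent reached the log, that replay completes flagged records and nullifies the rest, and that the oldest records are simply overwritten when the log wraps. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogRecord:
    txid: int
    changes: Dict[str, int]   # structural change: metadata key -> new value
    committed: bool = False   # assumption: True once the whole intent was logged


class IntentLog:
    """Toy circular log: once full, the oldest record is overwritten."""

    def __init__(self, capacity: int):
        self.records = deque(maxlen=capacity)

    def append(self, record: LogRecord) -> None:
        self.records.append(record)


def replay(log: IntentLog, metadata: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Complete fully logged operations and nullify the incomplete ones."""
    completed, nullified = [], []
    for record in log.records:
        if record.committed:
            metadata.update(record.changes)  # complete the operation
            completed.append(record.txid)
        else:
            nullified.append(record.txid)    # discard the partial operation
    return completed, nullified


log = IntentLog(capacity=2)
log.append(LogRecord(1, {"inode5.links": 1}, committed=True))
log.append(LogRecord(2, {"inode7.size": 4096}, committed=True))
log.append(LogRecord(3, {"inode9.size": 512}))  # system failed mid-operation
metadata: Dict[str, int] = {}
result = replay(log, metadata)
```

The point of the model is that replay cost depends on the size of the log, not on the size of the file system, which is why recovery takes seconds rather than a full structural check.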
The VxFS file system supports the Discretionary Access Control (DAC) mechanism to provide control over user access to files. Permission to access a file with DAC is provided in two forms:
- Permission Bits
Permission bits control user access to files. These can be set to specify whether the owner, group, and others (i.e., everyone else) have permission to read, write, or execute a file. Specific users other than the owner cannot be allocated file permissions using this approach. Refer to the relevant manual pages for information on viewing and setting file permissions.
- Access Control Lists
An Access Control List (ACL) is composed of a series of entries that identify specific users or groups and their access privileges for a particular file. A file may have its own ACL or may share an ACL with other files. ACLs have the advantage of being able to specify detailed access permissions for multiple users and groups. Refer to the relevant manual pages for information on viewing and setting ACLs.
Online System Administration
A VxFS file system can be defragmented and resized while it remains online and accessible to users. The following sections contain detailed information about these features.
Free resources are originally aligned in the most efficient order possible and are allocated to files in a way that is considered by the system to provide optimal performance. When a file system is active for extended periods of time, files grow, shrink, are created, and are removed. Over time, the original ordering of free resources is lost. As this process continues, the file system tends to spread further and further along the disk, leaving unused gaps or fragments between areas that are in use. This process is known as fragmentation. Fragmentation leads to degraded performance because the file system has fewer choices when selecting an extent (a group of contiguous data blocks) to assign to a file.
The s5 file system uses the dcopy utility to reorganize a file system and remove fragmentation, but it has two drawbacks:
- The file system must be unmounted.
- dcopy is time consuming.
The ufs file system uses the concept of cylinder groups to limit fragmentation. Cylinder groups are self contained sections of a file system that are composed of inodes, data blocks, and bitmaps that indicate free inodes and data blocks. Allocation strategies in ufs attempt to place inodes and data blocks in close proximity. This reduces fragmentation but does not eliminate it.
The VxFS file system provides the online administration utility fsadm to resolve the problem of fragmentation. One of the functions of the fsadm utility is to defragment a mounted file system. To defragment, the fsadm utility:
- removes unused space from directories
- makes all small files contiguous
- consolidates free blocks for file system use
The fsadm utility can be run on demand; it should be scheduled regularly as a cron job (see the fsadm_vxfs(1M) manual page for more information).
When a file system is created, it is assigned a specific size. Changes in system usage may result in file systems that are too small or too large for the new usage.
For s5 file systems, there are traditionally three solutions to the problem of a file system that is too small:
- Move some users to a different file system.
- Move a subdirectory of the file system to a new file system.
- Copy the entire file system to a larger file system.
When a file system is too large, most file systems make reclaiming the unused space a matter of off-loading the contents of the file system and rebuilding it to a new size. The solutions provided by the s5 file systems are undesirable as they require that the file system be unmounted, and users are unable to access the file system while it is being modified.
The VxFS file system utility fsadm provides a mechanism to solve these problems without unmounting the file system or interrupting users' productivity. fsadm enables the VxFS file system to be resized while it is mounted. A file system can be expanded or shrunk via fsadm. However, since the VxFS file system may only be mounted on one device, expanding the file system means that the underlying device must also be expandable while the file system is mounted.
VxVM allows expandability by providing virtual disks that can be expanded while being accessed. The VxFS and VxVM packages work together to provide online expansion capability. For additional information about the online expansion capabilities of VxVM, refer to the VERITAS Volume Manager System Administrator's Guide.
The VxFS file system provides a method for performing online backup of data using the "snapshot" feature of VxFS. A snapshot image of a mounted file system is created by "snapshot" mounting another file system, which then becomes an exact read-only copy of the first file system. The original file system is said to be snapped, and the copy is called the snapshot. The snapshot is a consistent view of the snapped file system at the point in time when the snapshot was made.
When changes are made to the snapped file system, the old data is first copied to the snapshot so that it is retained. When the snapshot is read, the old data is returned if the data was changed; otherwise, the current data from the snapped file system is returned. Backups are made by one of the following methods:
- copying selected files from the snapshot file system (using find and cpio)
- backing up the entire file system (using volcopy or fscat)
- doing a full or incremental backup (using vxdump)
For information about performing online backups, see Chapter 4, "Online Backup," and the related manual pages, including vxdump(1M).
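The copy-on-write behavior of a snapshot described above can be modeled in a few lines of Python. This is a conceptual sketch over an in-memory list of blocks, not the VxFS implementation. The class names and the block-level granularity are assumptions made for illustration.

```python
class SnappedFileSystem:
    """Toy model of the original (snapped) file system as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None

    def write(self, index, data):
        snap = self.snapshot
        # Copy the old data to the snapshot before the first overwrite only.
        if snap is not None and index not in snap.saved:
            snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    """Read-only, point-in-time view of a SnappedFileSystem."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> data as of snapshot time
        source.snapshot = self

    def read(self, index):
        if index in self.saved:          # block changed since the snapshot
            return self.saved[index]
        return self.source.blocks[index]  # unchanged: read through


fs = SnappedFileSystem(["a", "b", "c"])
snap = Snapshot(fs)
fs.write(1, "B")
fs.write(1, "BB")
```

After the two writes, the snapshot still returns the original contents of block 1, while the snapped file system returns the latest data. Only changed blocks consume space in the snapshot.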
The VxFS file system conforms to the System V Interface Definition (SVID) requirements and supports access using the Network File System (NFS). In addition to supporting these standard interfaces, VxFS provides enhancements that can be taken advantage of by applications that require performance features not provided by other file systems. These enhancements are introduced in this section and covered in detail in Chapter 6, "Application Interface."
In most cases, any application designed to run on the ufs file systems should run transparently on the VxFS file system. The only exceptions are applications that depend on the pathname truncation that occurs when using the s5 file system. These applications are not portable to VxFS; s5 truncates pathname components to 14 characters.
Applications that run on a ufs file system should function identically on a VxFS file system.
Expanded Application Facilities
The VxFS file system provides some facilities frequently associated with commercial applications. These facilities make it possible to:
- preallocate space for a file
- specify a fixed extent size for a file
- bypass the system buffer cache for file I/O
- specify the expected access pattern for a file
Since these facilities are provided using VxFS-specific ioctl system calls, most existing UNIX system applications do not use these facilities. The cp, cpio, and mv utilities use these facilities to preserve extent attributes and allocate space more efficiently. The current attributes of a file can be listed using the getext command.
Custom applications can use these facilities to receive the benefits of the resulting performance improvement. For portability reasons, these applications should check what file system type they are using before using these interfaces.
The VxFS file system supports extended mount options to specify:
- enhanced data integrity modes
- enhanced performance mode
- temporary file system modes
- improved synchronous writes
Details pertaining to the VxFS mount options can be found in Chapter 5, "Performance and Tuning," and in the mount_vxfs(1M) manual page.
Enhanced Data Integrity Modes
Note: Performance tradeoffs are associated with these mount options.
The ufs file systems are "buffered" in the sense that resources are allocated to files and data is written asynchronously to files. File systems are buffered in this way to provide better performance. In general, the buffering schemes work well without compromising data integrity.
If a system failure occurs while a process is allocating space to a file, uninitialized data or data from another file may be present in the extended file after reboot. Also, data written shortly before the system failure may be lost.
blkclear for Data Integrity
In environments where performance is more important than absolute data integrity, the preceding situation is not of great concern. However, for environments where data integrity is critical, the VxFS file system provides a mount -o blkclear option that guarantees that uninitialized data does not appear in a file.
closesync for Data Integrity
The VxFS file system provides a mincache=closesync option, which is useful in desktop environments where users are likely to shut off the power on the machine without halting it first. With the closesync mode, only files that are currently being written when the system crashes or is turned off can lose data. In this mode, any changes to the file are flushed to disk when the file is closed.
Enhanced Performance Mode
The ufs file systems are asynchronous in the sense that structural changes to the file system are not immediately written to disk. File systems are designed this way to provide better performance. However, if a system failure occurs, recent changes to the file system may be lost: attribute changes to files may disappear, and recently created files may be removed.
The default logging mode provided by VxFS (log) guarantees that all structural changes to the file system have been logged to disk before the system call returns to the application. If a system failure occurs, fsck replays any recent changes so that no metadata is lost. Recently written file data is lost unless a request was made to sync it to disk.
delaylog for Enhanced Performance
The VxFS file system provides a delaylog option, which can be used to increase performance. With the delaylog option, the logging of some structural changes is delayed. If a system failure occurs, recent changes may be lost. This option provides at least the same correctness guarantees that traditional UNIX file systems provide for system failure, along with fast file system recovery.
Temporary File System Modes
On most UNIX systems, temporary file system directories (such as /usr/tmp) are commonly used to hold files that do not need to be retained when the system reboots. Since such file systems are temporary, there is no need for the underlying file system to maintain a high degree of structural integrity for these directories.
tmplog for Temporary File Systems
The VxFS file system provides a tmplog option that allows the user to get higher performance on temporary file systems. With this option enabled, the logging of practically all operations is delayed for improved performance.
nolog for Temporary File Systems
The VxFS file system provides a nolog option to mount that disables intent logging. With this option enabled, system performance is considerably improved. However, in the event of a system failure, it is likely that any recently changed files on a nolog file system will contain random data.
Since the intent log is disabled, fast file system recovery does not work with this option: a full structural check must be run instead.
The nolog option is recommended only for file systems that will be remade with mkfs after every system failure.
Improved Synchronous Writes
VxFS provides superior performance for synchronous write applications.
The datainlog option to mount greatly improves the performance of small synchronous writes (typically used by Network File System servers). datainlog is a default option to mount.
The use of the convosync=dsync option to mount improves the performance of applications that require synchronous data writes but not synchronous inode time updates.
Note: Use of the convosync=dsync option violates POSIX semantics.
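To show how the mount options discussed in this chapter combine, the following Python sketch assembles a mount command line without running it. It is only an illustration. It assumes that the four logging modes (log, delaylog, tmplog, nolog) are mutually exclusive and that the file system type is selected with the SVR4-style -F vxfs flag. The device and mount point names are placeholders, and the helper functions are hypothetical.

```python
from typing import List

# Assumption: exactly one of these logging modes applies to a given mount.
LOGGING_MODES = ("log", "delaylog", "tmplog", "nolog")


def vxfs_mount_options(logging: str = "log", blkclear: bool = False,
                       closesync: bool = False,
                       convosync_dsync: bool = False) -> str:
    """Build the comma separated -o argument from the options in this chapter."""
    if logging not in LOGGING_MODES:
        raise ValueError(f"unknown logging mode: {logging!r}")
    options = [logging]
    if blkclear:
        options.append("blkclear")
    if closesync:
        options.append("mincache=closesync")
    if convosync_dsync:
        options.append("convosync=dsync")  # note: violates POSIX semantics
    return ",".join(options)


def vxfs_mount_command(special: str, mount_point: str, **kwargs) -> List[str]:
    """Return the argument list only; nothing is executed here."""
    # Assumption: SVR4-style syntax, mount -F vxfs -o options special mount_point.
    return ["mount", "-F", "vxfs", "-o", vxfs_mount_options(**kwargs),
            special, mount_point]


# Placeholder device and mount point, for illustration only.
command = vxfs_mount_command("/dev/dsk/example", "/mnt/data",
                             logging="delaylog", closesync=True)
print(" ".join(command))
```

Consult the mount manual page for the authoritative option syntax before using any such command on a real system.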
Enhanced I/O Performance
VxFS provides enhanced I/O performance by applying an aggressive I/O clustering policy, integrating with VxVM, and allowing the system administrator to set application specific parameters on a per-file system basis.
Enhanced I/O Clustering
I/O clustering is a technique of grouping multiple I/O operations together for improved performance. The VxFS I/O clustering policies provide more aggressive I/O clustering than other file systems and offer higher I/O throughput when using large files. When accessing large files, performance is comparable to that provided by raw disk.
VxFS interfaces with VxVM to determine the I/O characteristics of the underlying volume and perform I/O accordingly. It also uses this information at mkfs time to perform proper allocation unit alignments to prepare for efficient I/O operations from the kernel.
As part of the VxFS/VxVM integration, VxVM exports a set of I/O parameters to achieve better I/O performance. This interface can be used to achieve enhanced performance for different volume configurations (such as RAID-5, striped, and mirrored volumes). For a RAID-5 volume, full stripe writes are important for good I/O performance. VxFS uses these parameters to issue appropriate I/O requests to VxVM to get better performance from the system.
System administrators can also set application specific parameters on a per-file system basis to improve I/O performance.
- Default Indirect Extent Size
On disk layout Versions 1 and 2, this value determines the indirect data extent size. All indirect extents are allocated in this size, provided a fixed extent size is not set and the file does not already have indirect extents. The Version 4 disk layout uses typed extents, which have variable sized indirects.
- Discovered Direct I/O Size
This value defines the I/O size threshold above which all requests are performed as direct I/O.
- Maximum Direct I/O Size
This value defines the maximum size of a single direct I/O.
For a discussion on the usage of VxVM integration and performance benefits, refer to Chapter 5, "Performance and Tuning," and Chapter 6, "Application Interface."
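The two direct I/O parameters above can be read as a simple decision rule. The Python sketch below is one plausible interpretation, not the VxFS implementation: it assumes that a request larger than the threshold is performed as direct I/O, and that a direct request larger than the maximum is split into several I/Os. The function and parameter names are hypothetical.

```python
from typing import List, Tuple


def plan_io(request_size: int, discovered_direct_threshold: int,
            max_direct_io: int) -> Tuple[str, List[int]]:
    """Return the I/O path ("buffered" or "direct") and the sizes issued."""
    if max_direct_io <= 0:
        raise ValueError("max_direct_io must be positive")
    if request_size <= discovered_direct_threshold:
        return "buffered", [request_size]
    # Assumption: oversized direct requests are split at the maximum size.
    chunks = []
    remaining = request_size
    while remaining > 0:
        chunk = min(remaining, max_direct_io)
        chunks.append(chunk)
        remaining -= chunk
    return "direct", chunks


KB = 1024
small = plan_io(64 * KB, discovered_direct_threshold=256 * KB, max_direct_io=1024 * KB)
large = plan_io(2560 * KB, discovered_direct_threshold=256 * KB, max_direct_io=1024 * KB)
```

With these example values, the 64K request stays buffered, while the 2.5 MB request is issued as two 1 MB direct I/Os followed by one 512K direct I/O.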
The VxFS file system supports the Berkeley Software Distribution (BSD) style user quotas, which can be used to allocate per-user quotas on VxFS file systems. The quota system limits the use of two principal resources of a file system: files and data blocks. The system administrator can assign users quotas for each of these resources. A quota consists of two limits for each resource:
- The hard limit represents an absolute limit on data blocks or files. The user may never exceed the hard limit under any circumstances.
- The soft limit is lower than the hard limit and may be exceeded for a limited amount of time. This allows users to temporarily exceed limits if needed, as long as they are back under those limits before the allowed time limit expires.
The system administrator is responsible for assigning hard and soft limits to users.
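The interaction between hard limits, soft limits, and the time limit can be sketched as follows. This Python model is a simplification made for illustration: it tracks one resource for one user, starts the timer at the first allocation that exceeds the soft limit, and clears it once usage is back under the soft limit. The names and the exact timer behavior are assumptions, not a description of the VxFS quota code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    soft_limit: int
    hard_limit: int
    grace_expires_at: Optional[float] = None  # set while over the soft limit


def may_allocate(quota: Quota, usage: int, request: int,
                 now: float, grace_period: float) -> bool:
    """Decide whether a user at `usage` may allocate `request` more units."""
    new_usage = usage + request
    if new_usage > quota.hard_limit:
        return False                     # the hard limit is never exceeded
    if new_usage <= quota.soft_limit:
        quota.grace_expires_at = None    # back under the soft limit
        return True
    if quota.grace_expires_at is None:
        quota.grace_expires_at = now + grace_period  # timer starts now
        return True
    return now < quota.grace_expires_at  # over soft limit: only within the timer


q = Quota(soft_limit=100, hard_limit=150)
decisions = [
    may_allocate(q, usage=90, request=5, now=0, grace_period=10),    # under soft
    may_allocate(q, usage=95, request=20, now=0, grace_period=10),   # starts timer
    may_allocate(q, usage=115, request=10, now=5, grace_period=10),  # within timer
    may_allocate(q, usage=125, request=5, now=11, grace_period=10),  # timer expired
    may_allocate(q, usage=125, request=30, now=1, grace_period=10),  # over hard
]
```

The same check would apply independently to each of the two resources, files and data blocks.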
For additional information on quotas, refer to Chapter 7, "Quotas."
Support for Large Files
The changes implemented with the Version 4 disk layout have greatly expanded file system scalability. Because file system structures are no longer in fixed locations, VxFS can now support files up to two terabytes in size (see Chapter 2, "Disk Layout").
File systems can be created or mounted with or without large files by specifying the largefiles or nolargefiles option of the mkfs or mount command. If nolargefiles is specified, a file system will not contain any files 2 gigabytes or larger, and large files cannot be created. If largefiles is specified, the file system allows files 2 gigabytes or larger.
Note: Be careful when enabling large file system capability. System administration utilities may experience problems if they are not large file aware.