VxFS System Administrator's Guide
This chapter describes how application writers can optimize applications for use with VxFS. To optimize VxFS for use with applications, see Chapter 5, "Performance and Tuning."
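This chapter covers cache advisories (direct, unbuffered, discovered direct, and data synchronous I/O), extent information (space reservation and fixed extent sizes), freezing and thawing a file system, and the VX_GET_IOPARAMETERS ioctl.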
All advisories are set using the VX_SETCACHE ioctl command. The current set of advisories can be obtained with the VX_GETCACHE ioctl command. For details on the use of these ioctl commands, see the vxfsio(7) manual page.
If the VX_DIRECT advisory is set, the user is requesting direct data transfer between the disk and the user-supplied buffer for reads and writes. This bypasses the kernel buffering of data and reduces the CPU overhead associated with I/O by eliminating the data copy between the kernel buffer and the user's buffer. It also avoids taking up space in the buffer cache that might be better used for something else. The direct I/O feature can provide significant performance gains for some applications.
For an I/O operation to be performed as direct I/O, it must meet certain alignment criteria. The alignment constraints are usually determined by the disk driver, the disk controller, and the system memory management hardware and software. The file offset must be aligned on a sector boundary. The transfer size must be a multiple of the sector size.
If a request fails to meet the alignment constraints for direct I/O, the request is performed as data synchronous I/O. If the file is currently being accessed by using memory mapped I/O, any direct I/O accesses are done as data synchronous I/O.
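The alignment rules and the fallback to data synchronous I/O can be sketched as a small decision function. This is a minimal model, not VxFS code: the names `meets_direct_io_alignment` and `effective_io_mode` are hypothetical, the sector size is passed in by the caller, and only the two constraints stated above (offset and transfer size) are checked. A real device may impose further constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* How a request ends up being serviced, per the rules in the text. */
typedef enum { IO_DIRECT, IO_DATA_SYNC } io_mode;

/* True when the file offset lies on a sector boundary and the transfer
 * size is a multiple of the sector size. sector_size is an assumption
 * supplied by the caller; the real value depends on the device. */
static bool meets_direct_io_alignment(uint64_t offset, uint64_t length,
                                      uint32_t sector_size)
{
    assert(sector_size > 0);
    return offset % sector_size == 0 && length % sector_size == 0;
}

/* A misaligned request, or any request to a file that is currently
 * memory mapped, is modeled as falling back to data synchronous I/O. */
static io_mode effective_io_mode(uint64_t offset, uint64_t length,
                                 uint32_t sector_size, bool file_is_mmapped)
{
    if (file_is_mmapped ||
        !meets_direct_io_alignment(offset, length, sector_size))
        return IO_DATA_SYNC;
    return IO_DIRECT;
}
```

Assuming a 512-byte sector, a 64 KB transfer at offset 8192 is classified as direct I/O, while a 1000-byte transfer at the same offset is classified as data synchronous I/O.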
Since direct I/O maintains the same data integrity as synchronous I/O, it can be used in many applications that currently use synchronous I/O. If a direct I/O request does not allocate storage or extend the file, the inode is not immediately written.
The CPU cost of direct I/O is about the same as a raw disk transfer. For sequential I/O to very large files, using direct I/O with large transfer sizes can provide the same speed as buffered I/O with much less CPU overhead.
If the file is being extended or storage is being allocated, direct I/O must write the inode change before returning to the application. This eliminates some of the performance advantages of direct I/O.
The VX_DIRECT advisory is maintained on a per-file-descriptor basis.
If the VX_UNBUFFERED advisory is set, I/O behavior is the same as direct I/O with the VX_DIRECT advisory set, so the alignment constraints that apply to direct I/O also apply to unbuffered I/O. With unbuffered I/O, however, if the file is being extended, or storage is being allocated to the file, inode changes are not updated synchronously before the write returns to the user. The VX_UNBUFFERED advisory is maintained on a per-file-descriptor basis.
Discovered Direct I/O is not a cache advisory that an application can set with the VX_SETCACHE ioctl. When the file system gets an I/O request larger than discovered_direct_iosz, it attempts to use direct I/O on the request. For large I/O sizes, Discovered Direct I/O can perform much better than buffered I/O.
Discovered Direct I/O behavior is similar to direct I/O and has the same alignment constraints, except writes that allocate storage or extend the file size do not require writing the inode changes before returning to the application.
For information on how to set discovered_direct_iosz, see "I/O Tuning" in Chapter 5.
If the VX_DSYNC advisory is set, the user is requesting data synchronous I/O. In synchronous I/O, the data is written, and the inode is written with updated times and (if necessary) an increased file size. In data synchronous I/O, the data is transferred to disk synchronously before the write returns to the user. If the file is not extended by the write, the times are updated in memory, and the call returns to the user. If the file is extended by the operation, the inode is written before the write returns.
Like direct I/O, the data synchronous I/O feature can provide significant application performance gains. Since data synchronous I/O maintains the same data integrity as synchronous I/O, it can be used in many applications that currently use synchronous I/O. If a data synchronous I/O request does not allocate storage or extend the file, the inode is not immediately written. Data synchronous I/O has no alignment constraints, so applications that find it difficult to meet the alignment constraints of direct I/O should use data synchronous I/O.
If the file is being extended or storage is allocated, data synchronous I/O must write the inode change before returning to the application. This case eliminates the performance advantage of data synchronous I/O.
The direct I/O and VX_DSYNC advisories are maintained on a per-file-descriptor basis.
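The inode-update rules stated above for synchronous, data synchronous, direct, unbuffered, and discovered direct I/O can be collected into one lookup. This is only a summary sketch of the prose: the `write_mode` enum and the function name are hypothetical and do not correspond to any VxFS interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the write modes discussed in the text. */
typedef enum {
    MODE_SYNC,
    MODE_DSYNC,
    MODE_DIRECT,
    MODE_UNBUFFERED,
    MODE_DISCOVERED_DIRECT
} write_mode;

/* Returns true when, according to the text, the inode must be written
 * before the write returns to the application. extends_or_allocates is
 * true when the write extends the file or allocates storage. */
static bool inode_write_required_before_return(write_mode mode,
                                               bool extends_or_allocates)
{
    switch (mode) {
    case MODE_SYNC:
        return true;                  /* inode is always written */
    case MODE_DSYNC:
    case MODE_DIRECT:
        return extends_or_allocates;  /* only when the file grows */
    case MODE_UNBUFFERED:
    case MODE_DISCOVERED_DIRECT:
        return false;                 /* inode update is deferred */
    }
    assert(!"unknown write mode");
    return true;
}
```

The lookup makes the trade-off visible: direct and data synchronous I/O lose part of their advantage on writes that grow the file, while unbuffered and discovered direct I/O do not wait for the inode in that case.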
The VX_SEQ advisory indicates that the file is being accessed sequentially. When the file is being read, the maximum read-ahead is always performed. When the file is written, instead of trying to determine whether the I/O is sequential or random by examining the write offset, sequential I/O is assumed. The pages for the write are not immediately flushed. Instead, pages are flushed some distance behind the current write point.
The VX_RANDOM advisory indicates that the file is being accessed randomly. For reads, this disables read-ahead. For writes, this disables the flush-behind. The data is flushed by the pager, at a rate based on memory contention.
The VX_NOREUSE advisory is used as a modifier. If both VX_RANDOM and VX_NOREUSE are set, pages are immediately freed and put on the quick reuse free list as soon as the data has been used. If VX_NOREUSE is set when doing sequential I/O, pages are also put on the quick reuse free list when they are flushed. VX_NOREUSE may slow down access to the file, but it can reduce the cached data held by the system. This can allow more data to be cached for other files and may speed up those accesses.
The VX_SETEXT ioctl command allows an application to reserve space for a file, and set fixed extent sizes and file allocation flags. The current state of much of this information can be obtained by applications using the VX_GETEXT ioctl (the getext command provides access to this functionality). For details, see the getext(1), setext(1), and vxfsio(7) manual pages.
Each invocation of the VX_SETEXT ioctl affects all the elements in the vx_ext structure. When using VX_SETEXT, always use the following procedure:
1. Use VX_GETEXT to read the current settings.
2. Modify the values to be changed.
3. Use VX_SETEXT to set the values.
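The read-modify-write procedure can be illustrated with an in-memory model. Everything here is a stand-in: `struct ext_settings`, its field names, `model_getext`, and `model_setext` are hypothetical and only imitate the property stated above, that a set operation replaces every element. Real code would issue the VX_GETEXT and VX_SETEXT ioctls as described in vxfsio(7).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vx_ext structure; field names are
 * assumptions made for this sketch. */
struct ext_settings {
    int64_t  ext_size;  /* fixed extent size */
    int64_t  reserve;   /* space reservation */
    uint32_t flags;     /* allocation flags */
};

/* Stand-in for the per-file state that the file system would hold. */
struct model_file {
    struct ext_settings ext;
};

/* Models reading the current settings. */
static void model_getext(const struct model_file *f, struct ext_settings *out)
{
    assert(f && out);
    *out = f->ext;
}

/* Models a set operation that overwrites every element at once. */
static void model_setext(struct model_file *f, const struct ext_settings *in)
{
    assert(f && in);
    f->ext = *in;
}

/* Changes only the reservation: read, modify one field, write back. */
static void set_reservation(struct model_file *f, int64_t new_reserve)
{
    struct ext_settings cur;
    model_getext(f, &cur);     /* step 1: read current settings */
    cur.reserve = new_reserve; /* step 2: modify the value to change */
    model_setext(f, &cur);     /* step 3: write everything back */
}
```

Because the set operation replaces all elements, passing a freshly zeroed structure with only the reservation filled in would also reset the extent size and flags. Reading first is what preserves them.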
When a VX_SETEXT ioctl is issued, the reservation value is set in the inode on disk. If the file size is less than the reservation amount, the kernel allocates space to the file from the current file size up to the reservation amount. When the file is truncated, space below the reserved amount is not freed.
The VX_TRIM, VX_NOEXTEND, VX_CONTIGUOUS, VX_NORESERVE, and VX_CHGSIZE flags can be used to modify reservation requests. Note that VX_NOEXTEND is the only one of these flags that is persistent; the other flags may have persistent effects, but they are not returned by the VX_GETEXT ioctl.
If the VX_TRIM flag is set, when the last close occurs on the inode, the reservation is trimmed to match the file size and the VX_TRIM flag is cleared. Any unused space is freed. This can be useful if an application needs enough space for a file, but it is not known how large the file will become. Enough space can be reserved to hold the largest expected file, and when the file has been written and closed, any extra space is released.
If the VX_NOEXTEND flag is set, an attempt to write beyond the current reservation, which requires the allocation of new space for the file, fails instead. To allocate new space to the file, the space reservation must be increased. This can be used like ulimit to prevent a file from using too much space.
If the VX_CONTIGUOUS flag is set, any space allocated to satisfy the current reservation request is allocated in one extent. If there is not one extent large enough to satisfy the request, the request fails. For example, if a file is created and a 1 MB contiguous reservation is requested, the file size is set to zero and the reservation to 1 MB. The file will have one extent that is 1 MB long. If another reservation request is made for a 3 MB contiguous reservation, the new request will find that the first 1 MB is already allocated and allocate a 2 MB extent to satisfy the request. If there are no 2 MB extents available, the request fails. (Extents are, by definition, contiguous.)
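The arithmetic in the 1 MB and 3 MB example can be written out as a short helper. This is an illustrative sketch only: `contiguous_extent_needed` is a hypothetical name, and the function models just the size calculation, not the search for a free extent of that size.

```c
#include <assert.h>
#include <stdint.h>

#define MB ((uint64_t)1024 * 1024)

/* Size of the single extent that a contiguous reservation request must
 * find, given how much of the file is already allocated. Returns 0 when
 * the existing allocation already covers the request. */
static uint64_t contiguous_extent_needed(uint64_t already_allocated,
                                         uint64_t requested_reservation)
{
    if (requested_reservation <= already_allocated)
        return 0;
    return requested_reservation - already_allocated;
}
```

For a new file, a 1 MB request needs one 1 MB extent. With that 1 MB already allocated, a 3 MB request needs one additional 2 MB extent, and the request fails if no free extent of that size exists.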
If the VX_NORESERVE flag is set, the reservation value in the inode is not changed. This flag is used by applications to make temporary reservations. Any space past the end of the file is given up when the file is closed. For example, if the cp command is copying a file that is 1 MB long, it can request a 1 MB reservation with the VX_NORESERVE flag set. The space is allocated, but the reservation in the file is left at 0. If the program aborts for any reason or the system crashes, the unused space past the end of the file is released. When the program finishes, there is no cleanup because the reservation was never recorded on disk.
If the VX_CHGSIZE flag is set, the file size is increased to match the reservation amount. This flag can be used to create files with uninitialized data. Because this allows uninitialized data in files, it is restricted to users with appropriate privileges.
It is possible to use these flags in combination. For example, using VX_CHGSIZE and VX_NORESERVE changes the file size but does not set any reservation. When the file is truncated, the space is freed. If the VX_NORESERVE flag had not been used, the reservation would have been set on disk along with the file size.
Space reservation is used to make sure applications do not fail because the file system is out of space. An application can preallocate space for all the files it needs before starting to do any work. By allocating space in advance, the file is optimally allocated for performance, and file accesses are not slowed down by the need to allocate storage. This allocation of resources can be important in applications that require a guaranteed response time.
With very large files, use of space reservation can avoid the need to use indirect extents. It can also improve performance and reduce fragmentation by guaranteeing that the file consists of large contiguous extents. Sometimes when critical file systems run out of space, cron jobs, mail, or printer requests fail. These failures are harder to track if the logs kept by the application cannot be written due to a lack of space on the file system.
By reserving space for key log files, the logs will not fail when the system runs out of space. Process accounting files can also have space reserved so accounting records will not be lost if the file system runs out of space. In addition, by using the VX_NOEXTEND flag for log files, the maximum size of these files can be limited. This can prevent a runaway failure in one component of the system from filling the file system with error messages and causing other failures. If the VX_NOEXTEND flag is used for log files, the logs should be cleaned up before they reach the size limit in order to avoid losing information.
By using a fixed extent size, an application can reduce allocations and guarantee good extent sizes for a file. An application can reserve most of the space a file needs, and then set a relatively large fixed extent size. If the file grows beyond the reservation, any new extents are allocated in the fixed extent size.
Another use of a fixed extent size occurs with sparse files. The file system usually does I/O in page size multiples. When allocating to a sparse file, the file system allocates pages as the smallest default unit. If the application always does subpage I/O, it can request a fixed extent size to match its I/O size and avoid wasting extra space.
When setting a fixed extent size, an application should not select too large a size. When all extents of the required size have been used, attempts to allocate new extents fail: this failure can happen even though there are blocks free in smaller extents.
Fixed extent sizes can be modified by the VX_ALIGN flag. If the VX_ALIGN flag is set, then any future extents allocated to the file are aligned on a fixed extent size boundary relative to the start of the allocation unit. This can be used to align extents to disk striping boundaries or physical disk boundaries. The VX_ALIGN flag is persistent and is returned by the VX_GETEXT ioctl.
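Alignment on a fixed extent size boundary relative to the start of the allocation unit amounts to a round-up calculation. The sketch below shows that calculation only. The function name and the choice of blocks as the unit are assumptions, and the actual allocator policy is not described in this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the first block at or after `candidate` whose distance from
 * the start of the allocation unit (au_start) is a multiple of the
 * fixed extent size. All values are in blocks (an assumed unit). */
static uint64_t next_aligned_block(uint64_t au_start, uint64_t candidate,
                                   uint64_t ext_size)
{
    assert(ext_size > 0);
    assert(candidate >= au_start);
    uint64_t rel = candidate - au_start;
    uint64_t rounded = ((rel + ext_size - 1) / ext_size) * ext_size;
    return au_start + rounded;
}
```

With an allocation unit starting at block 1000 and a fixed extent size of 64 blocks, a candidate start at block 1001 is moved up to block 1064. Choosing an extent size that matches the stripe width is what lines extents up with striping boundaries.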
The VX_FREEZE ioctl command is used to freeze a file system. Freezing a file system temporarily blocks all I/O operations to the file system and then performs a sync on the file system. When the VX_FREEZE ioctl is issued, all access to the file system is blocked at the system call level. Current operations are completed and the file system is synchronized to disk. Freezing provides a stable, consistent file system.
When the file system is frozen, any attempt to use the frozen file system, except for a VX_THAW ioctl command, is blocked until a process executes the VX_THAW ioctl command or the time-out on the freeze expires.
VxFS provides the VX_GET_IOPARAMETERS ioctl to get the recommended