volrec - structure defining a volume record


#include <sys/types.h>
#include <sys/vol.h>

#define Name_LEN       14
#define COMMENT_LEN    40
#define UTIL_NUM       3
#define UTIL_LEN       14
#define Name_SZ        (Name_LEN + 1)
#define COMMENT_SZ     (COMMENT_LEN + 1)
#define UTIL_SZ        (UTIL_LEN + 1)

struct volseqno { ulong_t seqno_lo, seqno_hi; };
typedef struct volseqno volseqno_t;
typedef struct volseqno volrid_t;

struct volrec {
        struct v_tmp  v_tmp;           /* non-persistent fields */
        struct v_perm v_perm;          /* persistent fields */
};

Fields for the v_perm structure:

	char      v_name[Name_SZ];            /* record name */
	char      v_use_type[Name_SZ];        /* volume usage type name */
	char      v_fstype[FSTYPE_SZ];        /* guess of volume's fstype */
	char      v_comment[COMMENT_SZ];      /* comment field */
	char      v_putil[UTIL_NUM][UTIL_SZ]; /* persistent util fields */
	char      v_state[STATE_SZ];          /* utility state of volume */
	char      v_pref_name[Name_SZ];       /* plex name if V_PREFER */
	char      v_start_opts[V_STOPTS_SZ];  /* volume start options */
	enum vol_r_pol  v_read_pol;           /* method of plex selection */
	minor_t   v_minor;                    /* minor number in disk group */
	uid_t     v_uid;                      /* owner of /dev/vol/name */
	gid_t     v_gid;                      /* group of /dev/vol/name */
	mode_t    v_mode;                     /* mode of /dev/vol/name */
	ulong_t   v_pflag;                    /* persistent volume flags */
	long      v_pl_num;                   /* associated plex count */
	volseqno_t  v_update_tid;             /* trans id of last update */
	voff_t    v_len;                      /* byte length of volume */
	voff_t    v_log_len;                  /* length of log area */
	volrid_t  v_rid;                      /* unique identifier */
	volrid_t  v_pref_plex_rid;            /* preferred plex record ID */
	volseqno_t  v_detach_tid;             /* trans id of kernel detach */
Fields for the v_tmp structure:

	char      v_tutil[UTIL_NUM][UTIL_SZ]; /* non-persistent util fields */
	long      v_rec_lock;                 /* 1 if record is locked */
	long      v_data_lock;                /* 1 if volume is data locked */
	enum vol_kstate v_kstate;             /* relation to file space */
	enum vol_except v_r_some;             /* if some plex reads fail */
	enum vol_except v_w_all;              /* if all plex writes fail */
	enum vol_except v_w_some;             /* if some plex writes fail */
	long      v_lasterr;                  /* last volume error or 0 */
	ulong_t   v_tflag;                    /* non-persistent volume flags */
	long      v_log_serial_lo;            /* log serial number/low part */
	long      v_log_serial_hi;            /* log serial number/hi part */
	dev_t     v_bdev;                     /* block dev for volume */
	size_t    v_iosize;                   /* minimum size for raw I/Os */
	voff_t    v_rwback_offset;            /* read/write-back offset */


A volrec structure is the structure used to communicate volume record information between the volume configuration daemon, vxconfigd, and programs using the Volume Manager library to query for configurations and to make configuration changes.

The two structures contained in the volrec structure separate the persistent elements of the volume record from the non-persistent ones. The division of fields between the v_tmp and v_perm structures is somewhat historical; however, the v_perm structure contains information that is stored persistently (that is, fields that are recovered unchanged after a system reboot), or that is directly derivable from persistent volume record information. The v_tmp structure, on the other hand, contains fields that can be modified without the changes being stored persistently.

The uses of the various volume fields are defined as follows:

v_name
The volume name. This field cannot be changed directly, although it can be changed by calling vxvm_rename.

v_rid
A 64-bit record ID assigned to the volume record. The ID is unique within the disk group for the lifetime of the disk group. It does not change as a result of a vxvm_rename, even though the record name changes.
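Because the record ID survives a rename, it is the safer key for deciding whether two records describe the same volume. The following is a minimal sketch of that comparison. It assumes each half of the volseqno structure carries 32 significant bits; the helper names rid_to_u64 and rid_equal are hypothetical and are not part of the Volume Manager library.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the volseqno structure shown in the synopsis.
 * unsigned long replaces ulong_t so the sketch builds without <sys/types.h>. */
struct volseqno { unsigned long seqno_lo, seqno_hi; };
typedef struct volseqno volrid_t;

/* Hypothetical helper: fold the two halves into one 64-bit value.
 * Assumes each half carries 32 significant bits. */
static uint64_t rid_to_u64(volrid_t rid)
{
    return ((uint64_t)(rid.seqno_hi & 0xffffffffUL) << 32) |
           (uint64_t)(rid.seqno_lo & 0xffffffffUL);
}

/* Hypothetical helper: two records refer to the same object if their
 * record IDs match, whatever their current names are. */
static int rid_equal(volrid_t a, volrid_t b)
{
    return rid_to_u64(a) == rid_to_u64(b);
}
```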

v_use_type
The usage type associated with the volume. This is used to select a utility set that maintains state and plex consistency in a manner appropriate to the usage of the volume.

v_fstype
The file system type of any file system residing on the volume. A usage type may choose to use or ignore this field.

v_comment
A null-terminated comment string associated with the record. The contents are arbitrary except that they cannot contain a newline.

v_putil
An array of three null-terminated strings that can be used as scratch pads by utilities. These fields are preserved across reboots. By convention, the first field is reserved for usage types; the second field for higher-level applications, such as the VERITAS Visual Administrator; and the third field for local site administrators.
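The slot convention and the fixed field width can be illustrated with a short sketch. The enum names and the util_set helper below are hypothetical; only UTIL_NUM, UTIL_LEN, and UTIL_SZ come from the synopsis. The same layout applies to the non-persistent v_tutil array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UTIL_NUM 3
#define UTIL_LEN 14
#define UTIL_SZ  (UTIL_LEN + 1)

/* Hypothetical names for the three slots, following the convention above. */
enum util_slot { UTIL_USETYPE = 0, UTIL_APP = 1, UTIL_SITE = 2 };

/* Copy value into one slot. Returns 0 on success, or -1 if the slot is
 * out of range or the string does not fit in UTIL_LEN characters.
 * The slot is left untouched on failure. */
static int util_set(char util[UTIL_NUM][UTIL_SZ], int slot, const char *value)
{
    size_t len;

    if (slot < 0 || slot >= UTIL_NUM || value == NULL)
        return -1;
    len = strlen(value);
    if (len > UTIL_LEN)
        return -1;
    memcpy(util[slot], value, len + 1);   /* include the terminating null */
    return 0;
}
```

Rejecting an oversized string, rather than truncating it, keeps a utility from silently storing a value it cannot later recognize.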

v_state
A null-terminated state field that is reserved specifically for use by usage types.

v_pref_name
The name of the preferred plex for use when the v_read_pol field is set to V_PREFER. This field is derived from the v_pref_plex_rid field.

v_start_opts
An arbitrary string that is reserved for usage-type utilities. The intention is that this field be used to store options that apply to the volume, such as options for the volume start operation. This is normally a comma-separated list of flag names and option=value pairs. See the gen and fsgen versions of vxvol(1M) for information on how this field is used by the gen and fsgen utilities.
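A utility that reads this field has to separate bare flag names from option=value pairs. The following is a minimal sketch of such a lookup. The function opt_lookup is hypothetical, and the option names used with it below ("verbose", "delay") are illustrative only; the names actually recognized by the gen and fsgen utilities are described in vxvol(1M).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up name in a comma-separated list of flags and option=value pairs.
 * Returns 1 if name is present, 0 otherwise. For an option=value pair the
 * value is copied into val (truncated to valsz - 1 characters); for a bare
 * flag, val is set to the empty string. */
static int opt_lookup(const char *opts, const char *name,
                      char *val, size_t valsz)
{
    size_t nlen = strlen(name);
    const char *p = opts;

    assert(valsz > 0);
    val[0] = '\0';
    while (*p != '\0') {
        size_t ilen = strcspn(p, ",");          /* length of this item */

        if (ilen >= nlen && strncmp(p, name, nlen) == 0 &&
            (ilen == nlen || p[nlen] == '=')) {
            if (ilen > nlen) {                  /* option=value form */
                size_t vlen = ilen - nlen - 1;

                if (vlen >= valsz)
                    vlen = valsz - 1;
                memcpy(val, p + nlen + 1, vlen);
                val[vlen] = '\0';
            }
            return 1;
        }
        p += ilen;
        if (*p == ',')
            p++;                                /* skip the separator */
    }
    return 0;
}
```

With the string "verbose,delay=30", a lookup of "delay" yields the value "30", a lookup of "verbose" succeeds with an empty value, and a lookup of "del" fails because only whole names match.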

v_read_pol
The policy for selecting plexes to satisfy volume read operations. This can have one of the following values:

Candidate plexes are selected in sequence for each sequential volume read operation. This is known as a round-robin approach.

V_PREFER
The plex named by the v_pref_name field is used if it can satisfy the read request. If the preferred plex cannot satisfy the read request, then this policy becomes equivalent to the round-robin policy.

A default policy is selected based on the current configuration of the volume. If the volume has two or more active plexes, and exactly one of those plexes is striped, then the striped plex is preferred; otherwise, the round-robin read policy is used.
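The default selection rule and the round-robin rotation can be sketched as follows. This is an illustration of the rules as stated above, not the kernel's implementation. The types plex_info and r_pol and both function names are hypothetical stand-ins; the real enum vol_r_pol values are defined in <sys/vol.h>.

```c
#include <assert.h>

/* Local stand-ins for the read policies; not the <sys/vol.h> names. */
enum r_pol { POL_ROUND, POL_PREFER };

struct plex_info {
    int active;    /* nonzero if the plex can service reads */
    int striped;   /* nonzero if the plex is striped */
};

/* Default selection: with two or more active plexes, exactly one of which
 * is striped, prefer the striped plex (its index is stored in *pref).
 * Otherwise fall back to round-robin and store -1 in *pref. */
static enum r_pol select_default(const struct plex_info *pl, int n, int *pref)
{
    int i, active = 0, striped = 0, idx = -1;

    assert(pref != 0);
    for (i = 0; i < n; i++) {
        if (!pl[i].active)
            continue;
        active++;
        if (pl[i].striped) {
            striped++;
            idx = i;
        }
    }
    if (active >= 2 && striped == 1) {
        *pref = idx;
        return POL_PREFER;
    }
    *pref = -1;
    return POL_ROUND;
}

/* Round-robin: return the next active plex after index last (use -1 for
 * the first read), or -1 if no plex is active. */
static int next_round_robin(const struct plex_info *pl, int n, int last)
{
    int step;

    for (step = 1; step <= n; step++) {
        int i = (last + step) % n;

        if (pl[i].active)
            return i;
    }
    return -1;
}
```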

v_minor
The minor number of the block and character volume devices associated with the volume record. The volume minor number is assigned when the volume is created. This is a read-only field. Conditions may force the actual volume device minor number to differ from the v_minor field. This can happen in disk groups other than rootdg, if a conflict occurs. This can also happen in the rootdg disk group if the V_PFLAG_FORCEMINOR flag is used to force a particular value for v_minor, even if the indicated number is unavailable.

v_uid, v_gid, and v_mode
The user ID, group ID, and permission modes for the volume's block and character device nodes, and for the device nodes for the associated plexes.

v_pflag
Flags associated with the volume that are preserved across reboots. The set of persistent flags that can be set is:

The write-back-on-read-failure flag. If set, then an attempt is made to fix a read error from a participating plex (i.e., one without the noerror flag). The error is fixed by reading from another plex associated with the volume and writing the data back to the plex with the read error. The read operation is then retried to verify that the error has been fixed. This requires at least two associated, enabled, participating, read-mode plexes.

This is an effective way of handling device drivers that can revector blocks on write failures, and can be used to handle the majority of media failures on many disk drives. For this operation to be effective, the underlying device driver must not revector blocks on read errors.

If set (vxmake and vxassist set this by default), then some writes to mirrored volumes that use dirty region logging will be copied into an allocated kernel buffer before being written to disk. The reason for doing such a copy is that write requests given to the volume device driver can point to pages of memory that are still undergoing change. Without doing a copy, the blocks written to each plex might be different. If you are sure that your application does not modify pages while they are written, or if you are certain that mirrors with differing contents do not represent a problem, then you can turn off this flag.

This flag is set on a reboot if the volume was open at the time of a system crash, and the volume had been written at least once. This implies that the volume, if it is mirrored, requires recovery to ensure consistency between plexes.

V_PFLAG_FORCEMINOR
If this flag is set, then the setting of v_minor specified on creation of the volume record is forced. If this flag is not set, v_minor might be remapped to an unused value. This flag is required to set minor numbers less than 5. This does not guarantee that the actual volume device node will have the indicated minor number. However, if the volume is in rootdg, then the volume will be given that minor number after a reboot (if no other volume in the disk group has that minor number).

This is a bit-mask that specifies bits in the v_pflag field that indicate the logging type for the volume. The bits masked out by this macro can have one of the following values:

V_PFLAG_LOGUNDEF
The logging type is undefined. Volumes that were created in Release 1.0 of the Volume Manager have this type. This value is effectively identical to V_PFLAG_NONE except that utilities are able to use the V_PFLAG_LOGUNDEF flag as a license to default the logging type to something else.

No logging is performed for the volume. Even if a logging subdisk is defined for a plex, the logging subdisk is not used.

A dirty region log is written periodically to each log subdisk associated with an associated, enabled plex. This log keeps track of the regions that have changed due to I/O writes to a mirrored volume. For any write operation to the volume, before writing the data, the regions being written are marked dirty in the log. If a write causes a region to become dirty when it was previously clean, the log is written to disk before the write operation can occur.
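The ordering rule for the dirty region log (mark regions dirty, and flush the log before the data write whenever a clean region becomes dirty) can be sketched as follows. This is a simplified in-memory model, not the kernel's implementation. The structure drl, the constant NREGIONS, and the function drl_before_write are hypothetical, and the log flush is represented by a counter.

```c
#include <assert.h>

#define NREGIONS 8   /* hypothetical number of regions in the volume */

struct drl {
    unsigned char dirty[NREGIONS];  /* in-memory dirty map, one per region */
    int log_writes;                 /* number of log flushes performed */
};

/* Call before issuing a data write that touches regions first..last.
 * Marks the regions dirty and, if any region was previously clean,
 * flushes the log first. Returns 1 if a log write was needed. */
static int drl_before_write(struct drl *d, int first, int last)
{
    int r, newly_dirty = 0;

    assert(first >= 0 && last < NREGIONS && first <= last);
    for (r = first; r <= last; r++) {
        if (!d->dirty[r]) {
            d->dirty[r] = 1;
            newly_dirty = 1;
        }
    }
    if (newly_dirty)
        d->log_writes++;   /* stands in for the synchronous log write */
    return newly_dirty;
}
```

Repeated writes to regions that are already dirty cost no extra log I/O, which is why the log is cheap for workloads with good locality.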

v_pl_num
The number of plexes associated with the volume.

v_update_tid
The transaction ID of the last update to this record. This field is assigned when changes to a disk group are committed.

v_len
The length of the volume. This can be set arbitrarily, even if it is longer or shorter than some or all of the associated plexes. This value is in sectors.

v_log_len
The length for a volume log. For the block-change-logging log type, this value must always be 1. However, future logging types may support larger log lengths. The length for all subdisk logs associated with the volume must be at least this long. This value is in sectors.

v_pref_plex_rid
The record ID of the preferred plex for the volume. This field is used only if v_read_pol is set to V_PREFER.

v_tutil
An array of three null-terminated strings that can be used as scratch pads by utilities. These fields are cleared on reboot. By convention, the first field is reserved for usage types; the second field for higher-level applications, such as OA&M scripts and the VERITAS Visual Administrator; and the third field for local site administrators.

v_rec_lock
A boolean value that is 1 if the volume record is locked in the caller's current transaction, and 0 otherwise. This is a read-only field.

v_data_lock
A boolean value that is 1 if the volume is data-locked in the caller's current transaction, and 0 otherwise. This is a read-only field.

v_kstate
The accessibility of the volume. This field can have one of the following values:

The volume block device can be used, and reads and writes to the block or character volume device are accepted.

V_DETACHED
The volume block device cannot be used, and reads or writes to the character device are rejected. Volume ioctls are still usable, and the plex devices for associated plexes can be used, within the bounds of the plex pl_kstate fields.

V_DISABLED
The volume cannot be used for any operations, and neither can the plex devices for any of the associated plexes.

This field is set to V_DISABLED after a reboot.

v_r_all, v_r_some, v_w_all, and v_w_some
Exception policies for the volume. The exception conditions are classified by the following types:

v_r_all
Read failure on all plexes

v_r_some
Read failure on some plexes

v_w_all
Write failure on all plexes

v_w_some
Write failure on some plexes

If one of these exception conditions is encountered, then the corresponding action is taken. The possible actions are:

Takes no action. However, if the operation fails for all candidate plexes, then the operation still fails.

Fails the operation, but takes no further action.

Detaches the plex with the failure. The operation fails only if the operation fails for all candidate plexes.

Detaches the plex with the failure and returns a failure for the operation, even if the operation can be satisfied by another plex.

Detaches the volume but does not fail the operation.

Detaches the volume and fails the operation.

V_GEN_DET
A higher-level error policy which detaches failing plexes. However, if detaching a complete plex would result in no complete plexes remaining, then V_GEN_DET detaches the volume rather than detaching the failing plexes. A complete plex is one that has the PL_TFLAG_COMPLETE flag set in the plex pl_tflag field.

V_GEN_DET_SPARSE
A higher-level error policy which detaches failing plexes. However, if detaching a plex results in no complete plexes remaining, then V_GEN_DET_SPARSE leaves exactly one complete plex enabled, and detaches all incomplete plexes that have volume blocks mapped to subdisks in the region of the failure. This policy allows the volume to continue operating on a failing plex, and does not disable mirrored regions that are unaffected by the failing operation.

In the case of a logging volume, the volume is detached if a write failure occurs to all enabled log subdisks associated with the volume.

Detaches the failing plexes, and the volume, and returns a failure for the operation. This policy can be used by applications that wish to make decisions about changing the Volume Manager configuration based on failures. The detached state of a plex can be used as an indication of which plexes failed, and making the volume detached prevents future I/Os from succeeding until the problem is resolved.

This operates exactly like the V_GEN_DET error policy, except that it detaches the volume if the number of complete plexes would drop below two. This ensures that a volume is either mirrored to at least two plexes, or is non-operational until the situation is repaired.
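The six basic actions listed first (from "takes no action" through "detaches the volume and fails the operation") differ only in what is detached and whether the operation is failed. The following sketch tabulates those outcomes. The enum constants are hypothetical local names, not the identifiers from <sys/vol.h>, and the higher-level V_GEN_DET family is not modeled because it depends on plex completeness.

```c
#include <assert.h>

/* Hypothetical local names for the six basic actions, in the order
 * in which they are described above. */
enum action {
    ACT_NONE, ACT_FAIL, ACT_DET_PLEX,
    ACT_FAIL_DET_PLEX, ACT_DET_VOL, ACT_FAIL_DET_VOL
};

struct outcome {
    int op_fails;      /* the I/O operation returns a failure */
    int detach_plex;   /* the failing plex is detached */
    int detach_vol;    /* the volume is detached */
};

/* all_failed is nonzero if the operation failed on every candidate plex. */
static struct outcome apply_action(enum action a, int all_failed)
{
    struct outcome o = { 0, 0, 0 };

    assert(a >= ACT_NONE && a <= ACT_FAIL_DET_VOL);
    switch (a) {
    case ACT_NONE:          o.op_fails = all_failed;                     break;
    case ACT_FAIL:          o.op_fails = 1;                              break;
    case ACT_DET_PLEX:      o.detach_plex = 1; o.op_fails = all_failed;  break;
    case ACT_FAIL_DET_PLEX: o.detach_plex = 1; o.op_fails = 1;           break;
    case ACT_DET_VOL:       o.detach_vol = 1;                            break;
    case ACT_FAIL_DET_VOL:  o.detach_vol = 1;  o.op_fails = 1;           break;
    }
    return o;
}
```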

Not all plexes are taken into account in the exception policy selection or actions. A plex is ignored under any of the following conditions:

  • The plex is not enabled.

  • The plex does not have a read or write mode appropriate for the operation.

  • The plex has the PL_PFLAG_NOERROR flag set.

  • The plex does not have mapped subdisk blocks that are appropriate for the range of the requested operation.
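The four conditions above amount to a single predicate that decides whether a plex takes part in exception handling for a given operation. The sketch below is a simplified model. The structure plex_view and the function plex_participates are hypothetical, X_NOERROR is a stand-in for PL_PFLAG_NOERROR (whose real value is not given here), and the subdisk mapping is reduced to one contiguous range that must cover the whole request.

```c
#include <assert.h>

#define X_NOERROR 0x1UL   /* stand-in for PL_PFLAG_NOERROR */

enum io_dir { IO_READ, IO_WRITE };

struct plex_view {
    int enabled;               /* plex is enabled */
    int can_read, can_write;   /* plex I/O mode */
    unsigned long pflag;       /* persistent plex flags */
    long map_start, map_len;   /* assumed single mapped range, in sectors */
};

/* Returns 1 if the plex is considered for the operation, 0 if ignored. */
static int plex_participates(const struct plex_view *p, enum io_dir dir,
                             long off, long len)
{
    assert(len > 0);
    if (!p->enabled)
        return 0;                                   /* not enabled */
    if (dir == IO_READ ? !p->can_read : !p->can_write)
        return 0;                                   /* wrong I/O mode */
    if (p->pflag & X_NOERROR)
        return 0;                                   /* noerror plex */
    if (off < p->map_start || off + len > p->map_start + p->map_len)
        return 0;                                   /* range not mapped */
    return 1;
}
```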

The exception policies are normally set implicitly by the operational utilities. The utilities provided by VERITAS set all the exception policies to V_GEN_DET_SPARSE and do not provide a means for changing the policies to something else.

v_lasterr
A sequence number for the last I/O error to be encountered on the volume. This is a read-only field.

v_tflag
A bitmask of flags that is cleared after a reboot. Flags defined in this field are:

V_TFLAG_RWBACK
A flag that can be turned on to request read/writeback mode. In read/writeback mode, a read request for a mirrored volume writes the data obtained by the read back to all other plexes. The operation is affected by the v_rwback_offset field. This mode is intended for volume recovery operations.

This is a status flag which indicates that the read/writeback mode operation is still in effect. This flag is set when V_TFLAG_RWBACK is set. If the read/writeback offset (see v_rwback_offset) reaches the end of the volume, then the kernel will turn off this flag.

A status flag that indicates that the volume device that corresponds to the volume record is open or mounted as a file system.

A status flag which indicates the volume has a logging type of VOL_PFLAG_LOGBLKNO, is enabled, and has at least one enabled, associated plex with an enabled log subdisk. This flag is not cleared when exception policies are invoked that detach a volume or its plexes.

An error has rendered the volume unusable. The volume cannot be started.

v_log_serial_lo and v_log_serial_hi
These values, taken together, yield a unique monotonically increasing value that is changed for every log write that occurs to a volume with logging enabled. These two numbers are cleared by a reboot, but are normally set explicitly by a vxvol start operation. The value in v_log_serial_lo is incremented by one for every log write. When the value would surpass LONG_MAX (normally 2147483647 for 32-bit machines), v_log_serial_lo is set to zero and v_log_serial_hi is incremented by one. Thus, on 32-bit machines, v_log_serial_lo and v_log_serial_hi represent the low 31 bits and the high 31 bits, respectively, of a 62-bit number.
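The carry rule described above can be written out directly. The structure log_serial and both functions below are hypothetical names used for illustration; only the LONG_MAX wrap behavior is taken from the description.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-in for the v_log_serial_lo / v_log_serial_hi pair. */
struct log_serial {
    long lo;
    long hi;
};

/* Advance the serial number by one log write. The low part wraps to
 * zero instead of exceeding LONG_MAX, carrying into the high part. */
static void log_serial_next(struct log_serial *s)
{
    assert(s->lo >= 0 && s->hi >= 0);
    if (s->lo == LONG_MAX) {
        s->lo = 0;
        s->hi++;
    } else {
        s->lo++;
    }
}

/* Order two serial numbers: returns -1, 0, or 1 as a is older than,
 * equal to, or newer than b. */
static int log_serial_cmp(const struct log_serial *a,
                          const struct log_serial *b)
{
    if (a->hi != b->hi)
        return a->hi < b->hi ? -1 : 1;
    if (a->lo != b->lo)
        return a->lo < b->lo ? -1 : 1;
    return 0;
}
```

Comparing the high part first is what makes the pair behave as one monotonically increasing number across a wrap of the low part.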

Unlike all other fields, the values of the log serial number fields cannot always be trusted within a transaction. The reason for this is that data-locks are not obtained by vxconfigd until after a utility has completely described a transaction for vxconfigd to transmit to the kernel. Other fields that can be changed by the kernel are checked at the time of a vol_commit to ensure that the fields haven't changed, and if any kernel-modifiable fields have changed since the corresponding vol_trans call, then the utility is asked to retry the transaction.

However, a volume with significant I/O activity is likely to change the value of the serial number fields often enough that transactions involving such volumes may have to be retried an unacceptable number of times, so these fields are not checked.
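The retry behavior described above follows a familiar optimistic pattern: start a transaction, describe the changes, attempt the commit, and start over if the commit reports that kernel-modifiable fields changed in the meantime. The sketch below shows only that control flow. The structure txn_ops, the return codes, and run_with_retry are hypothetical; the callbacks stand in for vol_trans, the configuration requests, and vol_commit, whose real signatures are not reproduced here.

```c
#include <assert.h>

/* Hypothetical return codes for the sketch. */
enum { TXN_OK = 0, TXN_RETRY = 1, TXN_ERR = -1 };

struct txn_ops {
    int (*begin)(void *ctx);     /* stands in for starting a transaction */
    int (*describe)(void *ctx);  /* queue the configuration changes */
    int (*commit)(void *ctx);    /* TXN_RETRY if checked fields changed */
};

/* Run the transaction, repeating it while the commit asks for a retry.
 * Gives up with TXN_RETRY after max_tries attempts. */
static int run_with_retry(const struct txn_ops *ops, void *ctx, int max_tries)
{
    int i, rc;

    assert(ops != 0);
    for (i = 0; i < max_tries; i++) {
        if ((rc = ops->begin(ctx)) != TXN_OK)
            return rc;
        if ((rc = ops->describe(ctx)) != TXN_OK)
            return rc;
        rc = ops->commit(ctx);
        if (rc != TXN_RETRY)
            return rc;           /* success or hard error */
    }
    return TXN_RETRY;
}

/* Demo callbacks: the commit asks for a retry until the counter
 * pointed to by ctx reaches zero. */
static int demo_ok(void *ctx) { (void)ctx; return TXN_OK; }
static int demo_commit(void *ctx)
{
    int *conflicts_left = ctx;

    if (*conflicts_left > 0) {
        (*conflicts_left)--;
        return TXN_RETRY;
    }
    return TXN_OK;
}
```

A field that changes on every log write would make the commit step ask for a retry almost every time, which is the reason given above for leaving the serial number fields out of the check.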

Utilities must be prepared to ensure that volume logs are in a quiescent state (normally by setting the volume to V_DETACHED or by disabling logging) before using the value of a log within a transaction. The existing utility set uses the log serial number fields only to set the serial number for a volume.

v_bdev, v_cdev
The device numbers for the volume block and character device nodes. Normally, these are computed from the v_minor number. However, in cases of collision, they may have different minor numbers.

v_iosize
The largest sector size of any disk associated (through a subdisk) with the volume. At the present time, only one sector size (normally 512 bytes) is supported, so this field will always match the single system sector size.

v_rwback_offset
When read/writeback mode is turned on, this field is loaded into the kernel as the current read/writeback offset pointer. Reads that occur before this offset into the volume do not invoke read/writeback recovery. If a read occurs on the boundary, then the kernel increases the pointer to the end of that read, after a successful result from the operation. This automatically increasing pointer causes the performance degradation from read/writeback mode to decrease as volume recovery progresses.
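The moving offset can be modeled in a few lines. This is a sketch of the behavior as described, not the kernel's implementation. The structure rwback and the function rwback_on_read are hypothetical, and "on the boundary" is taken here to mean a read that begins at or before the offset and ends beyond it.

```c
#include <assert.h>

struct rwback {
    long offset;    /* current read/writeback offset, in sectors */
    long vol_len;   /* volume length, in sectors */
    int  active;    /* read/writeback mode still in effect */
};

/* Called after a successful read of len sectors at start.
 * Returns 1 if the data must be written back to the other plexes. */
static int rwback_on_read(struct rwback *st, long start, long len)
{
    long end = start + len;

    assert(start >= 0 && len > 0 && end <= st->vol_len);
    if (!st->active || end <= st->offset)
        return 0;                    /* already recovered region */
    if (start <= st->offset) {       /* read spans the boundary */
        st->offset = end;
        if (st->offset >= st->vol_len)
            st->active = 0;          /* recovery has covered the volume */
    }
    return 1;
}
```

A recovery utility that reads the volume sequentially from the starting offset therefore advances the pointer with every read, and ordinary reads behind the pointer pay no write-back cost.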


vxconfigd(1M), vxintro(1M), vxiod(1M), vxmake(1M), vxvol(1M), plexrec(4), sdrec(4)

© 1997 The Santa Cruz Operation, Inc. All rights reserved.