HDK Technical Reference

SDI bus timeout/reset recovery

The SDI bus timeout/reset mechanism is used to recover from hardware problems in mass storage devices. Such problems can cause jobs to time out, or can cause a target device to reset the bus.

First, a few definitions to help explain how this mechanism works:


Bus device reset
Immediately forces a target device (a disk or tape drive, for example) to drop all current and pending jobs and perform a hard reset.

Hard reset
A peripheral that implements hard resets fails all current and pending jobs back to the HBA controller when it receives a SCSI bus reset or bus device reset. HBA controllers differ in how they respond to this; for example, some HBAs may keep track of failed jobs following a bus reset and resubmit them without the driver's intervention.

Recovery gauntlet
A target driver procedure used to recover from a SCSI bus reset or timeout. It resubmits jobs that may have been dropped or for which errors have occurred.

SCSI bus reset
Clears all SCSI devices from the bus. The reset may be initiated by any physical device on the bus, including the HBA controller itself. A SCSI bus reset is not a SCSI command; it is caused by asserting an electrical signal on the SCSI bus. What happens to jobs issued to a device before the reset depends on whether the device supports hard resets, soft resets, or both. Use only the reset types that a given device supports.

Soft reset
After a soft reset, a SCSI peripheral attempts to process all jobs it has identified, that is, all jobs that have gone through the necessary SCSI protocol to be owned by the target device. If a job has not been completely identified by the target device, the device does not process the job, and it is up to the HBA controller to resubmit or fail the job.
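
The difference between hard and soft reset can be pictured with a small model of what happens to each outstanding job. This is only an illustrative sketch: the `job_state` and `reset_kind` enumerations and the `apply_reset` function are hypothetical names invented for this example and are not part of SDI.

```c
#include <stddef.h>

/* Hypothetical job states used only for this illustration. */
enum job_state {
    JOB_QUEUED_AT_HBA,    /* sent to the HBA, not yet identified by the target */
    JOB_IDENTIFIED,       /* target device has taken ownership of the job */
    JOB_FAILED,           /* failed back to the HBA controller */
    JOB_NEEDS_HBA_ACTION  /* HBA controller must resubmit or fail the job */
};

/* Hypothetical reset types matching the two definitions in the text. */
enum reset_kind { RESET_HARD, RESET_SOFT };

/*
 * Update each job according to the reset semantics described above and
 * return how many jobs the target will still process after the reset.
 */
static size_t apply_reset(enum reset_kind kind, enum job_state *jobs, size_t njobs)
{
    size_t still_processed = 0;

    for (size_t i = 0; i < njobs; i++) {
        if (kind == RESET_HARD) {
            /* Hard reset: every current and pending job is failed. */
            jobs[i] = JOB_FAILED;
        } else if (jobs[i] == JOB_IDENTIFIED) {
            /* Soft reset: jobs owned by the target are still processed. */
            still_processed++;
        } else if (jobs[i] == JOB_QUEUED_AT_HBA) {
            /* Soft reset: unidentified jobs are left to the HBA controller. */
            jobs[i] = JOB_NEEDS_HBA_ACTION;
        }
    }
    return still_processed;
}
```

In this model a hard reset always leaves zero jobs for the target to process, while a soft reset preserves exactly the jobs the target had already identified.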

The following completion values, sfb commands, and ioctls are used for this feature.


Command completion values

SDI_TIME_NOABORT
The command has timed out but may still be on the HBA controller or target device.

SDI_TIME
A timed-out job has been either aborted or completed.

sfb commands

SFB_RESET_DEVICE
Reset a SCSI device through a bus device reset. Returns SDI_RET_OK if device reset is supported, SDI_RET_ERR otherwise.

SFB_RESET_BUS
Reset the SCSI bus. Returns SDI_RET_OK if bus reset is supported, SDI_RET_ERR otherwise.

SFB_TIMEOUT_ON
Inform the HBA driver that timeouts are enabled.

SFB_TIMEOUT_OFF
Inform the HBA driver that timeouts are disabled.
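
Because the two reset commands report whether the reset is supported, a caller could plausibly try the narrower device reset first and fall back to a bus reset. The sketch below illustrates that idea only; the escalation order is an assumption, not a documented policy. The `EX_`-prefixed constants are stand-ins with arbitrary values (a real driver takes the actual names and values from the SDI headers), and `send_sfb_fn`, `try_reset`, and `fake_hba_send` are hypothetical helpers invented for this example.

```c
/* Stand-in constants with arbitrary values, for illustration only. */
enum { EX_SDI_RET_OK = 0, EX_SDI_RET_ERR = -1 };
enum { EX_SFB_RESET_DEVICE = 1, EX_SFB_RESET_BUS = 2 };

/* Hypothetical callback standing in for whatever routine delivers an
 * sfb command to the HBA driver and returns its result. */
typedef int (*send_sfb_fn)(int sfb_command, void *ctx);

enum reset_outcome { DEVICE_RESET_ISSUED, BUS_RESET_ISSUED, RESET_UNSUPPORTED };

/* Try a bus device reset first; fall back to a SCSI bus reset only if
 * the HBA driver reports that device reset is not supported. */
static enum reset_outcome try_reset(send_sfb_fn send, void *ctx)
{
    if (send(EX_SFB_RESET_DEVICE, ctx) == EX_SDI_RET_OK)
        return DEVICE_RESET_ISSUED;
    if (send(EX_SFB_RESET_BUS, ctx) == EX_SDI_RET_OK)
        return BUS_RESET_ISSUED;
    return RESET_UNSUPPORTED;
}

/* Test double: ctx points to a bitmask of the reset types a pretend
 * HBA driver supports. */
enum { CAP_DEVICE_RESET = 1 << 0, CAP_BUS_RESET = 1 << 1 };

static int fake_hba_send(int sfb_command, void *ctx)
{
    unsigned caps = *(const unsigned *)ctx;

    if (sfb_command == EX_SFB_RESET_DEVICE && (caps & CAP_DEVICE_RESET))
        return EX_SDI_RET_OK;
    if (sfb_command == EX_SFB_RESET_BUS && (caps & CAP_BUS_RESET))
        return EX_SDI_RET_OK;
    return EX_SDI_RET_ERR;
}
```

Passing the sender as a callback keeps the sketch independent of how sfb commands are actually delivered, which this section does not describe.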

SDI ioctls

B_NEW_TIMEOUT_VALUES
Provide new timeout values for a specific SCSI address.

B_TIMEOUT_SUPPORT
Turns timeout support on or off.


NOTE: Any HBA driver supporting timeout/reset should set HBA_TIMEOUT_RESET in its device flag.

Two triggers can cause the recovery mechanism to be invoked: a job timeout detected by the HBA watchdog timer, or a specific SCSI command completion status code. The completion status codes that trigger recovery include:


SDI_CRESET
The associated command failed due to some device resetting the SCSI bus.

SDI_RESET
The associated command failed due to the HBA resetting the SCSI bus.

SDI_TIME
The job has been timed out and aborted.

SDI_TIME_NOABORT
The job has been timed out, but not aborted.
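
The four completion codes above can be summarized as a small classification step that a target driver might run on each completed job. This is a minimal sketch under stated assumptions: the `EX_`-prefixed codes are stand-ins with arbitrary values rather than the real SDI definitions, and `ex_trigger` and `classify_completion` are hypothetical names invented for this example.

```c
#include <stdbool.h>

/* Stand-in completion codes with arbitrary values, for illustration only. */
enum ex_comp_code {
    EX_COMP_OTHER,          /* any completion code not listed above */
    EX_SDI_CRESET,          /* some device reset the SCSI bus */
    EX_SDI_RESET,           /* the HBA reset the SCSI bus */
    EX_SDI_TIME,            /* job timed out and was aborted */
    EX_SDI_TIME_NOABORT     /* job timed out but was not aborted */
};

/* Hypothetical summary of what a completion code implies. */
struct ex_trigger {
    bool invoke_recovery;        /* code is one of the recovery triggers */
    bool caused_by_bus_reset;    /* failure came from a SCSI bus reset */
    bool job_may_be_outstanding; /* job may still be on the HBA or target */
};

static struct ex_trigger classify_completion(enum ex_comp_code code)
{
    struct ex_trigger t = { false, false, false };

    switch (code) {
    case EX_SDI_CRESET:
    case EX_SDI_RESET:
        t.invoke_recovery = true;
        t.caused_by_bus_reset = true;
        break;
    case EX_SDI_TIME:
        t.invoke_recovery = true;
        break;
    case EX_SDI_TIME_NOABORT:
        /* Timed out without an abort: the job may still be on the HBA
         * controller or target device. */
        t.invoke_recovery = true;
        t.job_may_be_outstanding = true;
        break;
    default:
        break;
    }
    return t;
}
```

Keeping the "may still be outstanding" case separate reflects the distinction the text draws between SDI_TIME and SDI_TIME_NOABORT; how a driver acts on it is left open here.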

For more details, see sdi_timeout.


© 2005 The SCO Group, Inc. All rights reserved.
OpenServer 6 and UnixWare (SVR5) HDK - June 2005