JPH06230992A

JPH06230992A - Computer system and method for recovery of computer system from fault

Info

Publication number: JPH06230992A
Application number: JP5041918A
Authority: JP
Inventors: Yoshifumi Takamoto; 良史高本; Hitoshi Tsunoda; 仁角田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-02-06
Filing date: 1993-02-06
Publication date: 1994-08-19

Abstract

PURPOSE: To provide a low-cost, highly reliable computer system and a failure recovery method for it. CONSTITUTION: A computer system has many components. When one of them fails (for example, the compression mechanism 113 of disk device 106), the system searches for a substitute that can act in its place (for example, emulation by CPU 114) and has that substitute do the processing. The system can also ask another device (disk device 107) or the host processor 101 to do the processing in place of the failed component. As a result, reliability is not lowered even when many devices are connected, and a low-cost, highly reliable system can be built.

Description

Detailed Description of the Invention

[0001]

Field of the Invention

The present invention relates to a computer system and a failure recovery method for it. More particularly, it relates to a failure recovery method and computer system suited to computer systems that include a file system requiring high reliability and high expandability.

[0002]

Description of the Related Art

Hardware reliability is commonly improved by duplicating hardware and by correcting errors with error-correcting codes. Disk systems are a good example. Many of them guarantee high reliability by multiplexing the storage disk drives, the control processors, or the data paths.

[0003] To multiplex the storage disk drives, the same data is stored on several disk drives. If one disk drive fails, or the controller that drives it fails, the system accesses another drive that holds the same data. Processing is therefore not interrupted.

[0004] There are broadly two ways to store the same data on several disk drives. In the first, software (usually the operating system) issues a separate I/O instruction for each disk drive. In the second, the operating system issues a single I/O instruction. The disk controller then uses hardware control to store the data on several disk drives automatically, in response to that one instruction.

[0005] Today the second approach, hardware-controlled disk drive multiplexing, has become common because hardware costs have fallen and it reduces the software load. In such an environment, when one disk drive becomes inaccessible, the hardware detects the failure and automatically substitutes another disk drive. This kind of control is described, for example, in Japanese Patent Application Laid-Open No. 3-226825.
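The hardware-controlled mirroring described above can be sketched as follows. This is a minimal illustration and not the controller of the cited publication. The `Drive` and `MirroredVolume` classes are hypothetical, and a flag simulates a drive failure. One write command from the operating system is fanned out to every healthy drive. A read falls back to the next drive that holds the same data.

```python
class DriveFailure(Exception):
    """Raised when a drive cannot be accessed."""


class Drive:
    """Hypothetical disk drive; `failed` simulates an inaccessible drive."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False
        self.blocks: dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        if self.failed:
            raise DriveFailure(self.name)
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        if self.failed:
            raise DriveFailure(self.name)
        return self.blocks[block]


class MirroredVolume:
    """Hypothetical disk controller: one I/O command in, one operation per drive out."""

    def __init__(self, drives: list[Drive]) -> None:
        self.drives = drives

    def write(self, block: int, data: bytes) -> int:
        """Store the data on every healthy drive; return how many copies were written."""
        written = 0
        for drive in self.drives:
            try:
                drive.write(block, data)
                written += 1
            except DriveFailure:
                continue  # the controller detects the failure and skips this drive
        if written == 0:
            raise DriveFailure("no healthy drive")
        return written

    def read(self, block: int) -> bytes:
        """Read from the first accessible drive, substituting the next one on failure."""
        for drive in self.drives:
            try:
                return drive.read(block)
            except DriveFailure:
                continue
        raise DriveFailure("no healthy drive")
```

With two drives, `write` returns 2. After drive `a` is marked as failed, `read` still returns the copy held by drive `b`.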

[0006] Multiplexed control processors and multiplexed data paths work the same way. When a failure occurs, a control processor or data path that has not failed is used instead, which ensures high reliability. In general, multiplexing is applied to the components most likely to fail, or to components at the core of the system such as the system bus.

[0007]

Problems to Be Solved by the Invention

The multiplexing techniques described above have one advantage: they improve reliability relatively easily. In a sense, however, they also waste resources. Hardware costs are lower than before, but in a large-scale system that links many such multiplexed subsystems, the cost of multiplexing becomes large.

[0008] Furthermore, in the prior art described above, the predetermined degree of multiplexing cannot easily be changed. For example, a system originally designed for duplexing is hard to convert to triplexing or quadruplexing, for two reasons. First, such a system cannot easily accommodate the additional hardware that triplexing or quadruplexing requires. Second, changing the specifications of the multiplexing-control hardware raises costs. In such cases, multiplexing normally has to be done by higher-level software instead.

[0009] In view of the above problems in the prior art, an object of the present invention is to provide a low-cost, highly reliable computer system and a failure recovery method for it.

[0010]

Means for Solving the Problems

To achieve the above object, the present invention provides a computer system that has a plurality of components and comprises the following:

- detection means for detecting a failure of a component;
- search means for searching, when a component fails, for an alternative component that can function in its place; and
- management means for causing the alternative component found by the search to perform the processing of the failed component.

[0011] The invention also provides a peripheral device that has a plurality of components and is connected to a network or to a computer system having a host processor. The peripheral device comprises the following:

- detection means for detecting a failure of a component;
- search means for searching, when a component fails, among the device's other internal components for an alternative component that can function in place of the failed one; and
- management means for causing the alternative component found by the search to perform the processing of the failed component.

[0012] The alternative component may include, for example, a processor that can execute programs. That processor may reproduce the operation of the failed component by emulation or simulation. When the failed component is a data compression mechanism, the processor may emulate data compression and so take over the work of the compression mechanism.
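A minimal sketch of the compression fallback in [0012] follows. For illustration only, it assumes that the hardware compression mechanism produces a zlib-compatible stream. Under that assumption, the control processor can emulate the hardware with Python's `zlib` module, and stored data stays readable whichever path compressed it. `HardwareCompressor` and `compress_with_fallback` are hypothetical names.

```python
import zlib


class CompressorFault(Exception):
    """Raised when the (simulated) compression hardware fails."""


class HardwareCompressor:
    """Stand-in for a compression mechanism such as option mechanism 113."""

    def __init__(self) -> None:
        self.failed = False

    def compress(self, data: bytes) -> bytes:
        if self.failed:
            raise CompressorFault("compression mechanism failed")
        # Assumption: the hardware emits a zlib-compatible stream.
        return zlib.compress(data)


def cpu_emulated_compress(data: bytes) -> bytes:
    """Software emulation of the compressor, run on a control processor."""
    return zlib.compress(data)


def compress_with_fallback(hw: HardwareCompressor, data: bytes) -> tuple[bytes, str]:
    """Use the hardware if it works; otherwise substitute the CPU emulation."""
    try:
        return hw.compress(data), "hardware"
    except CompressorFault:
        return cpu_emulated_compress(data), "cpu-emulation"
```

The key design point is that the emulation must be output-compatible with the hardware. If it were not, data compressed on one path could not be read back on the other.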

[0013] The search means may generate a data string that identifies the failed component and the alternative component that replaces it, and send that string to the management means. The management means then has the alternative processing performed based on the data string.
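One way to picture the data string passed from the search means to the management means is as a fixed binary record. The layout below is a hypothetical choice for illustration: two big-endian 16-bit component identifiers, packed with Python's `struct` module. The patent text does not specify a format.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: failed-component ID, then alternative-component ID,
# each an unsigned 16-bit big-endian integer.
_LAYOUT = ">HH"


@dataclass(frozen=True)
class SubstitutionRecord:
    failed_component: int       # e.g. 113, the compression mechanism
    alternative_component: int  # e.g. 114, the control processor that emulates it


def build_data_string(rec: SubstitutionRecord) -> bytes:
    """Search-means side: encode which component failed and what replaces it."""
    return struct.pack(_LAYOUT, rec.failed_component, rec.alternative_component)


def parse_data_string(raw: bytes) -> SubstitutionRecord:
    """Management-means side: decode the record before dispatching the work."""
    failed, alternative = struct.unpack(_LAYOUT, raw)
    return SubstitutionRecord(failed, alternative)
```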

[0014] A configuration management table may also be provided. For each component, the table stores whether the component has failed and what processing replaces it when it fails. The detection means refers to this table to detect component failures. The search means refers to it to find processing that can substitute for the failed component.
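A configuration management table of this kind can be sketched as a dictionary of per-component entries. In this minimal sketch, the component names (`op113`, `cpu114`, `host`) and the rule that a substitute is usable unless it is itself marked as failed are illustrative assumptions. The text does not specify the table's layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentEntry:
    failed: bool = False
    # Substitutes to try, in order, if this component fails (hypothetical names).
    alternatives: list[str] = field(default_factory=list)


class ConfigurationTable:
    """Sketch of a configuration management table such as table 111."""

    def __init__(self, entries: dict[str, ComponentEntry]) -> None:
        self.entries = entries

    def detect_failures(self, required: list[str]) -> list[str]:
        """Detection means: which of the required components are marked failed."""
        return [name for name in required if self.entries[name].failed]

    def find_alternative(self, name: str) -> Optional[str]:
        """Search means: first listed substitute that is not itself marked failed.

        Substitutes that are not components in the table (for example, a request
        to the host) are treated as always available in this sketch.
        """
        for alt in self.entries[name].alternatives:
            entry = self.entries.get(alt)
            if entry is None or not entry.failed:
                return alt
        return None
```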

[0015] When a failure of a component is detected, the component may be made to repeat the same operation. If it then operates normally within a predetermined number of repetitions, operation continues as if no failure had occurred. The system also counts how many times such retries have restored normal operation, and reports this externally when the count exceeds a predetermined value.
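The retry rule in [0015] separates transient errors from genuine failures. The following is a minimal sketch with hypothetical names (`RetryMonitor`, `max_retries`, `report_threshold`). An operation is modelled as a callable that returns `True` on success, and the `report` callback stands in for notifying the outside.

```python
from typing import Callable


class RetryMonitor:
    """Retry failing operations and count how often retries restored normal operation."""

    def __init__(self, max_retries: int, report_threshold: int,
                 report: Callable[[str, int], None]) -> None:
        self.max_retries = max_retries            # predetermined number of repetitions
        self.report_threshold = report_threshold  # predetermined value for the count
        self.report = report                      # how the condition is told to the outside
        self.recoveries: dict[str, int] = {}

    def run(self, component: str, op: Callable[[], bool]) -> bool:
        """Return True if op succeeds, possibly after retries; False if it keeps failing."""
        if op():
            return True
        for _ in range(self.max_retries):
            if op():
                # Treated as if no failure occurred, but the recovery is counted.
                count = self.recoveries.get(component, 0) + 1
                self.recoveries[component] = count
                if count > self.report_threshold:
                    self.report(component, count)
                return True
        return False  # a genuine failure: substitute processing should take over
```

A `False` return is the point at which the substitute-processing path described elsewhere in this document would take over.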

[0016] The components may be multiplexed. The management means may itself be a component that can perform the processing that replaces the failed component. The components are, for example, those that make up a file system of a computer system.

[0017] The present invention further provides a computer system that comprises a host processor and a plurality of peripheral devices, each having a plurality of components. The computer system comprises the following:

- detection means for detecting a failure of a component of a peripheral device; and
- management means for causing, when a component fails, the host processor or a peripheral device other than the one containing that component to perform the processing that replaces the failed component.

[0018] The management means may have the peripheral device or the host processor perform the alternative processing by generating and sending a data string. The data string contains three identifiers: one marking the string as a failure recovery request, one naming the failed component, and one naming the requested processing. The management means may also send the data string to several peripheral devices. In that case, a peripheral device that can handle the request in the data string performs the alternative processing.
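The three-identifier request and its broadcast can be sketched as follows. The request code `RECOVERY_REQUEST`, the string identifiers, and the rule that the first capable peripheral serves the request are assumptions made for illustration. The text says only that a peripheral able to process the request does so, and leaves arbitration unspecified.

```python
from dataclasses import dataclass
from typing import Callable, Optional

RECOVERY_REQUEST = 0x01  # hypothetical code meaning "this is a failure recovery request"


@dataclass(frozen=True)
class RecoveryRequest:
    kind: int               # identifier: failure recovery request
    failed_component: str   # identifier of the failed component
    requested_process: str  # identifier of the processing being requested
    payload: bytes          # data the requested processing operates on


class Peripheral:
    def __init__(self, name: str, handlers: dict[str, Callable[[bytes], bytes]]) -> None:
        self.name = name
        self.handlers = handlers  # processes this device can perform for others

    def can_handle(self, req: RecoveryRequest) -> bool:
        return req.kind == RECOVERY_REQUEST and req.requested_process in self.handlers

    def handle(self, req: RecoveryRequest) -> bytes:
        return self.handlers[req.requested_process](req.payload)


def broadcast(req: RecoveryRequest,
              peripherals: list[Peripheral]) -> Optional[tuple[str, bytes]]:
    """Offer the request to every peripheral; the first able to process it does so."""
    for p in peripherals:
        if p.can_handle(req):
            return p.name, p.handle(req)
    return None
```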

[0019] A configuration management table may also be provided. For each component, the table stores whether the component has failed and what processing or alternative means replaces it when it fails. The detection means refers to this table to detect component failures. The management means refers to it to decide where the alternative processing is to be performed.

[0020] The peripheral devices may be multiplexed. A peripheral device may also be a file system of the computer system.

[0021] The present invention further provides a computer system that comprises a host processor and a plurality of file systems that can communicate with one another. Each file system has a plurality of components, including disk devices. Through cross-calls between file systems, each file system can access the disk devices of the others. The computer system comprises the following:

- means for detecting a failure of a component of a file system; and
- means for requesting, when a component other than a disk device fails, a file system other than the one containing that component to perform the processing that replaces the failed component.

The file system that receives the request performs the alternative processing. It then uses a cross-call to access the disk devices of the file system that contains the failed component directly.
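The cross-call arrangement in [0021] can be sketched with a deliberately simplified, hypothetical model. Each file system has a single "controller" flag standing in for its non-disk components. `process` is a placeholder transformation (byte reversal) standing in for real controller work. The cross-call is modelled as a direct method call that writes into the owner's disk dictionary.

```python
class FileSystem:
    """Hypothetical file system with one controller and its own disk devices."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.controller_failed = False
        self.disk: dict[int, bytes] = {}
        self.peers: list["FileSystem"] = []

    def write(self, block: int, data: bytes) -> str:
        """Return the name of the file system whose controller did the work."""
        if not self.controller_failed:
            self.disk[block] = self.process(data)
            return self.name
        # The failed component is not a disk device, so ask a peer to stand in.
        for peer in self.peers:
            if not peer.controller_failed:
                peer.cross_call_write(self, block, data)
                return peer.name
        raise RuntimeError(f"{self.name}: unrecoverable failure")

    def process(self, data: bytes) -> bytes:
        # Placeholder for controller-side processing (e.g. compression).
        return data[::-1]

    def cross_call_write(self, owner: "FileSystem", block: int, data: bytes) -> None:
        """Do the owner's processing here, then write straight to the owner's disk."""
        owner.disk[block] = self.process(data)
```

The data stays on the owner's disk devices. Only the processing moves to the peer.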

[0022] A peripheral device of a computer system according to the present invention has a plurality of components and is connected to a network or to a computer system having a host processor. It comprises the following:

- means for detecting a failure of a component inside the device itself;
- means for sending, when a component fails, a data string to another peripheral device requesting processing that replaces the failed component;
- means for bypassing the failed component and continuing processing on receiving the result of the requested processing from the other peripheral device; and
- means for performing the requested processing on receiving such a data string from another peripheral device, and returning the result to the sender of the data string.

[0023] The invention also provides a peripheral device that has a plurality of components and is connected to a network or to a computer system having a host processor. It comprises the following:

- means for detecting a failure of a component inside the device itself;
- means for sending, when a component fails, a data string to the host processor requesting processing that replaces the failed component; and
- means for bypassing the failed component and continuing processing on receiving the result of the requested processing from the host processor.
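Paragraphs [0022] and [0023] describe a symmetric arrangement. A device can request substitute processing, use the returned result to bypass its own failed component, and also serve such requests for others. In this minimal sketch, `Device`, `serve`, and the `"compress"` process name are hypothetical. Direct method calls stand in for the data strings exchanged over the bus or channel, and the host is simply the last helper tried.

```python
from typing import Callable, Optional


class Device:
    """Hypothetical peripheral (or host) that can request and serve substitute work."""

    def __init__(self, name: str, functions: dict[str, Callable[[bytes], bytes]]) -> None:
        self.name = name
        self.functions = functions           # internal components, e.g. {"compress": ...}
        self.failed: set[str] = set()        # names of failed internal components
        self.helpers: list["Device"] = []    # other peripherals first, then the host
        self.stored: list[bytes] = []        # data that reached the disk drives

    def serve(self, process: str, payload: bytes) -> Optional[bytes]:
        """Handle a request data string from another device and return the result."""
        if process in self.functions and process not in self.failed:
            return self.functions[process](payload)
        return None  # cannot help; the requester tries the next helper

    def write(self, data: bytes) -> None:
        """Compress, then store; bypass a failed compressor using a helper's result."""
        if "compress" not in self.failed:
            data = self.functions["compress"](data)
        else:
            for helper in self.helpers:
                result = helper.serve("compress", data)
                if result is not None:
                    data = result
                    break
            else:
                raise RuntimeError(f"{self.name}: unrecoverable failure")
        self.stored.append(data)
```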

[0024] The present invention further provides a computer system that comprises a host processor and a peripheral device connected to it. The peripheral device has means for asking another peripheral device, or the host processor, to perform processing that the peripheral device would normally perform internally.

[0025] The invention also provides a computer system that has a plurality of components, including a processor that can execute programs. The system has means for emulating or simulating, on that processor, the operation of a component other than the processor. It also has means for comparing the result produced by that component with the result of the emulation or simulation.
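The comparison in [0025] amounts to running the same input through a component and through its emulation, then checking that the two results agree. Below is a minimal sketch in which both are modelled as hypothetical byte-to-byte functions. What to do after a mismatch (for example, marking the component as failed in the configuration management table) is left to the caller, because the text does not specify it.

```python
from typing import Callable


def cross_check(component: Callable[[bytes], bytes],
                emulation: Callable[[bytes], bytes],
                data: bytes) -> tuple[bytes, bool]:
    """Run a component and its processor emulation on the same input.

    Returns the component's result and whether it agreed with the emulation;
    a disagreement suggests the component may be faulty.
    """
    hw_result = component(data)
    sw_result = emulation(data)
    return hw_result, hw_result == sw_result
```

For example, a component whose output has a corrupted last byte is flagged, because its result no longer matches the emulation.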

[0026] The present invention also provides a failure recovery method for a computer system that has a plurality of components. The method comprises the following steps:

- detecting a failure of a component;
- searching, when a component fails, for an alternative component that can function in its place; and
- causing the alternative component found by the search to perform the processing of the failed component.

[0027] The invention also provides a failure recovery method for a computer system that comprises a host processor and a plurality of peripheral devices, each having a plurality of components. The method comprises the following steps:

- detecting a failure of a component of a peripheral device; and
- causing, when a component fails, the host processor or a peripheral device other than the one containing that component to perform the processing that replaces the failed component.

[0028]

Operation

In a computer system that has a plurality of components, when a component fails, the system searches for an alternative component that can function in its place. That alternative component then performs the processing of the failed component. Processing equivalent to that of the failed hardware can therefore be carried out, for example by emulation. This makes it possible to build a low-cost, highly reliable system.

[0029] In addition, processing that replaces a failed component can be requested from other peripheral devices or from the host processor. A failure recovery request can therefore be sent to any peripheral device or host processor that can be reached by communication, and the result can be transferred back to the device in which the failure occurred. This makes it possible to build a highly reliable system in which processing is not interrupted.

[0030]

Embodiments

Embodiments of the present invention are described in detail below with reference to the drawings.

[0031] [Embodiment 1] FIG. 1 shows a computer system according to one embodiment of the present invention. The system uses a disk system (peripheral device) to which the failure recovery method of the present invention is applied.

[0032] Broadly, this system consists of host processors 101 and 102 and disk devices 106 and 107, which are connected by channel cables 181 and 182. Channel cables 181 and 182 are controlled by input/output processors (IOPs) 103 and 104 on the side of host processors 101 and 102, and by a channel adapter 105 on the disk device side. The channel adapter 105 and the disk devices 106 and 107 are connected to a system bus 183.

[0033] Host processors 101 and 102 issue input/output requests to disk devices 106 and 107. A disk device that receives a request reads or writes data (READ/WRITE) as requested and notifies the host processor of the result. In this embodiment, disk devices 106 and 107 are each depicted as a physical enclosure. As disk device 106 shows, one enclosure may contain several disk systems (file systems).

[0034] Disk device 106 contains the following components:

- configuration management tables 111 and 131;
- fault management units 112, 116, 132, and 136;
- option mechanisms (OP) 113 and 133;
- control processors (CPU) 114, 117, 134, and 137;
- cache memories 115 and 135; and
- disk drives 118, 119, 138, and 139.

Similarly, disk device 107 contains the following components:

- configuration management table 151;
- fault management units 152 and 156;
- option mechanism 153;
- control processors 154 and 157;
- cache memory 155; and
- disk drives 158 and 159.

As the figure shows, disk device 106 contains two disk systems, one on the left and one on the right.

[0035] Configuration management tables 111, 131, and 151 hold information on failures of the components in disk devices 106 and 107. Fault management units 112, 116, 132, 136, 152, and 156 are one of the characteristic features of the present invention. When a component in disk device 106 or 107 fails, they manage processing so that it stops as little as possible. Each of these fault management units is connected to system bus 183.

[0036] Option mechanisms 113, 133, and 153 add functions to the disk device. One or more functions convenient for the user can be configured. This embodiment uses a compression mechanism as the example, but the option mechanism is not limited to compression.

[0037] Control processors 114, 117, 134, 137, 154, and 157 control disk devices 106 and 107 as a whole. For example, they interpret commands from host processors 101 and 102 and manage the data in cache memories 115, 135, and 155. Disk drives 118, 119, 138, 139, 158, and 159 store user data and the like.

[0038] Reference numerals 121 to 126, 141 to 146, and 161 to 166 denote updates to, and references of, configuration management tables 111, 131, and 151, respectively. Reference numerals 127, 147, 167, 128, 148, and 168 denote paths that transfer data directly to the next processing component, bypassing the option mechanism and control processor. These are described in detail later.

[0039] FIG. 2 is a flowchart that outlines an example of the failure recovery flow in this embodiment.

[0040] When a component fails, the system first checks whether that component is multiplexed (step 201). If it is multiplexed and a substitute is available (step 205), processing continues with the substitute in place of the failed component.

[0041] Multiplexing may be unusable, either because the component is not multiplexed or because no multiplexed substitute can be used. In that case, the system checks whether an equivalent function can be emulated (step 202). Here, emulation means using software to perform processing that hardware would normally execute. If emulation is possible, it is executed and the system checks whether it completed normally (step 206). If it did, processing continues.

[0042] If emulation is impossible or ends abnormally, the system checks whether the same processing can be requested from another disk device within communication range (step 203). If so, the processing is requested from that other device, and the system checks whether it completed normally (step 207). If it did, processing continues.

[0043] If no other device can take the request, or the processing requested from the other device ends abnormally, the system checks whether the same processing can be requested from the host (step 204). If so, the host is asked to execute the processing, and the system checks whether it completed normally (step 208). If it did, processing continues. If the host cannot take the request either, or the processing requested from the host ends abnormally, the system notifies the host that issued the I/O request that an unrecoverable failure has occurred. It then terminates abnormally.
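The multistage cascade of FIG. 2 can be sketched as an ordered list of stages. Each stage pairs an availability check (steps 201 to 204) with an execution whose normal or abnormal end is then tested (steps 205 to 208). In this minimal sketch, `RecoveryStage` and `recover` are hypothetical names. An execution that returns `None` stands for abnormal termination, and `notify_host` stands for the unrecoverable-failure report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RecoveryStage:
    name: str
    available: Callable[[], bool]                # steps 201-204: can this stage be used?
    execute: Callable[[bytes], Optional[bytes]]  # None means abnormal termination


def recover(data: bytes, stages: list[RecoveryStage],
            notify_host: Callable[[str], None]) -> tuple[str, bytes]:
    """Try each recovery stage in order (FIG. 2); return the stage used and its result."""
    for stage in stages:
        if not stage.available():
            continue
        result = stage.execute(data)
        if result is not None:  # normal completion: processing continues
            return stage.name, result
    notify_host("unrecoverable failure")
    raise RuntimeError("unrecoverable failure")
```

In the embodiment, the stage list would be: multiplexed substitute, emulation, another disk device, and the host, in that order. Because the cascade is just a list, adding or reordering stages does not change the control logic.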

[0044] In this way, this embodiment provides several stages of failure recovery. Conventionally, failure recovery could at most substitute a multiplexed component. Providing several recovery stages, as in FIG. 2, further raises the reliability of the device.

[0045] Fault management units 112, 132, and 152 are described next. The three have the same configuration and operation, so only fault management unit 112 is described below, and the description of units 132 and 152 is omitted.

[0046] FIG. 3 is a block diagram of fault management unit 112. Its reference numerals are as follows:

- 301: command analysis processor
- 302: command buffer
- 303: data buffer
- 304: fault detection processor
- 305: fault management processor
- 306: memory that stores the microprograms executed by the fault management processor
- 311: selector that switches the transfer destination of data in the data buffer

Memory 306 stores three microprograms: microprogram 307 requests emulation, microprogram 308 requests alternative processing by an external device, and microprogram 309 requests alternative processing by the host.

[0047] Processing in fault management unit 112 proceeds as follows. First, command analysis processor 301 interprets a command transferred over system bus 183 and determines which components in the disk system are needed to execute it. The result is transferred to fault detection processor 304. Fault detection processor 304 then checks whether the components required for the processing are in an executable state, by referring to configuration management table 111.
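The first two steps of this pipeline can be sketched as two small functions. The command names, the component lists in `REQUIRED_COMPONENTS`, and the use of a plain set in place of the fault flags of configuration management table 111 are hypothetical simplifications. The text does not say which components each command needs.

```python
# Hypothetical command-to-component map used by the command analysis step.
REQUIRED_COMPONENTS: dict[str, list[str]] = {
    "READ": ["cpu114", "cache115", "drive118"],
    "WRITE_COMPRESSED": ["cpu114", "op113", "cache115", "drive118"],
}


def analyze_command(command: str) -> list[str]:
    """Role of command analysis processor 301: list the components a command needs."""
    return REQUIRED_COMPONENTS[command]


def blocked_components(required: list[str], failed: set[str]) -> list[str]:
    """Role of fault detection processor 304: required components marked failed."""
    return [c for c in required if c in failed]
```

With `op113` marked as failed, a compressed write is blocked on `op113` while a plain read can proceed unaffected.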

[0048] If a component required for the processing has failed, fault detection processor 304 refers to configuration management table 111 again. It searches for an alternative processing method that can substitute for the failed component.

[0049] If an alternative processing method is found, the information is transferred to fault management processor 305 together with a request for the alternative processing. In response, fault management processor 305 executes the corresponding microprogram in memory 306. Fault management processor 305 also switches the data transfer destination of selector 311 according to where the alternative processing is executed.

【００５０】[0050]

In this embodiment, as shown in FIG. 2, several alternative processing methods are provided. Besides multiplexing identical components, they are emulation, requesting processing from an external device, and requesting processing from the host. The fault management processor 305 selects a program according to the method:

- the emulation request program 307 when the alternative processing is performed by emulation;
- the external device request program 308 when processing is requested from an external device;
- the host request program 309 when processing is requested from the host.

【００５１】[0051]

With the above processing, execution continues without interruption even when a failure occurs.

【００５２】[0052]

Next, the fault management units 116, 136, and 156 are described. The three units have the same configuration and operation, so only the fault management unit 116 is described below. The fault management unit 116 is in turn largely the same as the fault management unit 112 described above, so only the differences are covered.

【００５３】[0053]

FIG. 4 is a block diagram of the fault management unit 116 shown in FIG. 1. Its components 401 to 411 correspond to components 301 to 311 of the fault management unit 112 in FIG. 3. The fault management unit 116 needs fewer functions, however, because it does not manage an optional mechanism.

【００５４】[0054]

Because the fault management unit 116 does not manage an optional mechanism, it never needs to switch a data path to one. For the same reason it never requests emulation of an optional mechanism, so the memory 406 need not hold an emulation request program. FIG. 4 therefore has no counterpart to the data path 312 or the emulation request program 307 of FIG. 3.

【００５５】[0055]

In this way, each of the fault management units 112, 132, 152, 116, 136, and 156 needs only the functions required by the components it manages. This keeps the cost of providing the fault management units to a minimum.

【００５６】[0056]

FIG. 5 shows the structure of the configuration management table 111. The configuration management tables 131 and 151 have the same structure.

【００５７】[0057]

In FIG. 5, column 501 lists the components of the disk system, for example the compression mechanism 113, the control processors 114 and 117, the cache memory 115, and the disk drives 118 and 119. The entries in column 501 map to components as follows:

- "Compression mechanism (OP)": the compression mechanism 113
- "Control CPU 1": the control processor 114
- "Control CPU 2": the control processor 117
- "Cache": the cache memory 115
- "Drive 1": the disk drive 118
- "Drive 2": the disk drive 119

【００５８】[0058]

Field 502 stores the multiplexing information for each component: "1" if the component exists singly, "2" if it is duplicated, and so on. An n-fold multiplexed component contains n identical elements, called partial elements. The multiplexing information 502 shows whether alternative processing by multiplexing is possible. If a partial element of a multiplexed component fails and becomes unusable, the value of this field is reduced by the number of failed partial elements.

【００５９】[0059]

Field 503 indicates whether the component has failed. It is marked with "×" when recovery through multiplicity is no longer possible. In the figure, "○" means the component has no fault, and "×" means it has failed and cannot be used. When this field is marked (that is, when a failure occurs), the system must decide between two courses:

- substitute the component with an alternative means from field 505, described below, so that processing continues; or
- inform the requester that processing cannot continue.
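Fields 502 and 503 interact. The multiplicity count drops as partial elements fail, and the fault mark is set only when no usable partial element remains. The minimal Python sketch below illustrates that rule. The class and method names are hypothetical, and the duplexed control CPU in the usage lines is illustrative rather than read from FIG. 5.

```python
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    """Hypothetical model of fields 502 and 503 for one component."""
    name: str
    multiplicity: int     # field 502: usable partial elements remaining
    failed: bool = False  # field 503: True corresponds to the "x" mark

    def record_partial_failure(self, count: int = 1) -> None:
        """Subtract failed partial elements (field 502) and mark the component
        as failed (field 503) once recovery by multiplicity is impossible."""
        self.multiplicity = max(0, self.multiplicity - count)
        if self.multiplicity == 0:
            self.failed = True


cpu = ComponentStatus("Control CPU 1", multiplicity=2)  # illustrative: duplexed
cpu.record_partial_failure()
print(cpu.multiplicity, cpu.failed)  # 1 False
cpu.record_partial_failure()
print(cpu.multiplicity, cpu.failed)  # 0 True
```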

【００６０】[0060]

Field 504 stores the number of failures of the component. Even after a failure, a component may resume normal operation after several retries. This field counts how many times the component became usable again through retries. An administrator can consult it as an early warning. Processing may be continuing thanks to retries, but an unusually high failure count suggests an underlying cause, so the device can be improved early. "Failure" in field 504 has a different meaning from "failure" in field 503. Field 503 refers only to a failure that makes the component unusable.

【００６１】[0061]

Field 505 lists the alternative means for the component. An alternative means substitutes for the component and performs equivalent processing when the component has failed (for a multiplexed component, when all of its partial elements have failed). For the compression mechanism 113, for example, field 505 registers these alternative means:

- emulation by a processor;
- requesting compression from another device;
- requesting compression from the host.
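Taken together, fields 501 to 505 form one row per component. The sketch below models a row as a Python dataclass and looks up the alternative means in field 505. It is an assumption-laden illustration, not the patent's data layout. The labels "emulation", "other_device" and "host" are hypothetical names for the three means listed above, and the multiplicity values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """One row of a configuration management table (FIG. 5), modeled hypothetically."""
    component: str                 # field 501
    multiplicity: int              # field 502
    failed: bool = False           # field 503
    failure_count: int = 0         # field 504: recoveries achieved by retry
    alternatives: list[str] = field(default_factory=list)  # field 505


# Illustrative rows; the multiplicity values are not taken from FIG. 5.
TABLE = {
    row.component: row
    for row in [
        TableEntry("Compression mechanism (OP)", 1,
                   alternatives=["emulation", "other_device", "host"]),
        TableEntry("Cache", 1),
    ]
}


def alternatives_for(component: str) -> list[str]:
    """Return a copy of the alternative means in field 505 (empty if none)."""
    return list(TABLE[component].alternatives)


print(alternatives_for("Compression mechanism (OP)"))
# ['emulation', 'other_device', 'host']
```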

【００６２】[0062]

As shown in FIG. 1, when a component fails, the configuration management table 111 is updated through paths 121 to 126. The configuration management tables 131 and 151 are updated through paths 141 to 146 and 161 to 166, respectively. In other words, the failed component itself updates the configuration management table.

【００６３】[0063]

Next, the fault management unit 112 of FIG. 3 (and likewise the fault management unit 116 of FIG. 4) is described in more detail.

【００６４】[0064]

FIG. 6 shows the format of a command packet transferred from the host to the fault management unit. A command packet is a data string made up of a processing request, control information, and, where needed, the data to be processed. Field 601 is the command field, which states what the packet requests. Field 602 holds position information that states where the request in field 601 is directed. For example, if the command field 601 contains a WRITE request to a disk drive, the position information field 602 holds the device name, the drive name, the drive block number, and so on.

【００６５】[0065]

FIG. 7 shows examples of the information in the position information field and what each means. Column 701 gives concrete examples of position information, and column 702 gives their meaning. In the position information 701, "Device 1" is a device name, "VOL1" a disk drive name, and "Block100" a drive block number. As the figure shows, position information can be specified in four ways:

- everything from the device name down to the block number;
- the device name and disk drive name only;
- the device name only;
- "ANY", meaning that any device may perform the processing.
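The four forms of position information reduce to one test: a receiving device acts on a request when it is named, or when the device is "ANY". The Python sketch below assumes a structured encoding, because the text does not say how the fields are delimited on the wire. The `Position` class and `is_addressed_to` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "ANY"


@dataclass(frozen=True)
class Position:
    """Hypothetical encoding of position information (fields 602 and 1402).
    device=ANY means any capable device may process the request."""
    device: str
    drive: Optional[str] = None   # e.g. "VOL1"
    block: Optional[int] = None   # e.g. 100 for "Block100"


def is_addressed_to(pos: Position, my_device: str) -> bool:
    """True if the device named my_device should act on this request."""
    return pos.device == ANY or pos.device == my_device


full = Position("Device 1", "VOL1", 100)   # device, drive and block
drive_only = Position("Device 1", "VOL1")  # device and drive
anywhere = Position(ANY)                   # any device may process it

print(is_addressed_to(full, "Device 1"))      # True
print(is_addressed_to(full, "Device 2"))      # False
print(is_addressed_to(anywhere, "Device 2"))  # True
```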

【００６６】[0066]

Returning to FIG. 6, field 603 identifies the requester that issued the command packet. A packet is normally a disk input/output request from the host, so this field usually holds the host identifier. In the present invention, however, one device may issue a command packet to another device. In that case, field 603 holds the name of the requesting device; this is described in detail later. Field 604 holds the data to be processed by the command. Some commands do not need this field.

【００６７】[0067]

FIG. 8 is a flowchart of the processing performed by the command analysis processor 301 and the fault detection processor 304 of the fault management unit 112 shown in FIG. 3. Steps 801 to 804 show the operation of the command analysis processor 301, and steps 805 to 809 show that of the fault detection processor 304.

【００６８】[0068]

In step 801, the command analysis processor 301 reads a packet (for example, the packet of FIG. 6) transferred over the system bus 183. In step 802, it analyzes the packet and transfers the command part to the command buffer 302. The command part consists of command 601, position information 602, and requester 603 in FIG. 6. Next, in step 803, it transfers the data part (processing data 604 in FIG. 6) to the data buffer 303.
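Steps 802 and 803 split a FIG. 6 packet into a command part and a data part. The minimal Python sketch below shows that split. The dataclass is a hypothetical in-memory form of the packet, not a wire format given in the text, and the host identifier "host-0" is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandPacket:
    """Hypothetical in-memory form of the FIG. 6 command packet."""
    command: str                  # field 601, e.g. "WRITE"
    position: tuple               # field 602, e.g. (device, drive, block)
    requester: str                # field 603: host id or requesting device
    data: Optional[bytes] = None  # field 604: absent for some commands

    def split(self) -> tuple[tuple, Optional[bytes]]:
        """Steps 802-803: separate the command part (for the command
        buffer 302) from the data part (for the data buffer 303)."""
        return (self.command, self.position, self.requester), self.data


pkt = CommandPacket("WRITE", ("Device 1", "VOL1", 100), "host-0", b"payload")
command_part, data_part = pkt.split()
print(command_part)  # ('WRITE', ('Device 1', 'VOL1', 100), 'host-0')
```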

【００６９】[0069]

In step 804, the command analysis processor 301 analyzes the command part of the packet and, if necessary, sends the result to the fault detection processor 304. FIG. 8 shows only one decision in step 804: whether the requested command uses the compression mechanism 113. The processor also checks in the same way whether the command uses the cache memory 115 and the control processor 114. If it does, the processor notifies the fault detection processor 304.

【００７０】[0070]

The command analysis processor 401 of FIG. 4 works the same way for its own components. It checks whether the control processor 117 and the disk drives 118 and 119 are used, and if so, it notifies the fault detection processor 404.

【００７１】[0071]

Next, in step 805, the fault detection processor 304 searches the configuration management table 111 using the analysis result from the command analysis processor 301. The search determines whether the components selected in step 804 can be used, and is done by referring to field 503 of the configuration management table 111 in FIG. 5. In step 806, the fault detection processor 304 uses the search result to determine whether any selected component has failed.

【００７２】[0072]

For example, suppose the analysis in step 804 shows that the requested command uses the compression mechanism 113. In step 805, the fault detection processor 304 then refers to the fault indicator 503 for the compression mechanism 113 in the configuration management table 111 of FIG. 5. In step 806, it determines whether the compression mechanism 113 is available.
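Steps 804 to 806 amount to mapping a command to the components it needs, then checking each one against field 503. The Python sketch below shows that check. The command names and the command-to-component mapping are hypothetical, since the text does not enumerate them, and the fault flags are illustrative.

```python
# Hypothetical mapping from a command to the components it needs (step 804).
REQUIRED = {
    "WRITE_COMPRESSED": ["Compression mechanism (OP)", "Cache", "Control CPU 1"],
    "WRITE": ["Cache", "Control CPU 1"],
}

# Field 503 of the configuration management table: True means failed ("x").
FAILED = {
    "Compression mechanism (OP)": True,
    "Cache": False,
    "Control CPU 1": False,
}


def failed_components(command: str) -> list[str]:
    """Steps 805-806: return the required components that are marked failed."""
    return [c for c in REQUIRED[command] if FAILED.get(c, False)]


print(failed_components("WRITE"))             # []
print(failed_components("WRITE_COMPRESSED"))  # ['Compression mechanism (OP)']
```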

【００７３】[0073]

If step 806 finds that the component has not failed, the component's normal operation can execute the requested command. The fault detection processor 304 therefore notifies the fault management processor 305 and hands processing over to it. The fault management processor 305 continues normal operation and executes the command. Its processing is described later with reference to FIG. 10.

【００７４】[0074]

If step 806 finds that the component has failed, the component cannot be used, so the configuration management table 111 is searched again in step 807. In step 808, the fault detection processor 304 determines whether the component (for example, the compression mechanism 113) has an alternative means.

【００７５】[0075]

If an alternative means exists, the fault detection processor 304 notifies the fault management processor 305 and hands processing over to it. The fault management processor 305 first retries the failed component a predetermined number of times. If the retries do not succeed, it requests the alternative means to perform the processing in place of the failed component. This processing is described later with reference to FIG. 10.

【００７６】[0076]

If step 808 finds no alternative means, the requester is notified in step 809 that the failure is unrecoverable.
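Steps 806 to 809 branch three ways. A healthy component goes down the normal path. A failed component with registered alternatives is handed to the fault management processor. A failed component with no alternative produces an unrecoverable-failure report. The Python sketch below shows that branching. The function and exception names are hypothetical, and plain dictionaries stand in for fields 503 and 505.

```python
from typing import Optional


class UnrecoverableFailure(Exception):
    """Step 809: the requester is told the failure cannot be recovered."""


def route(component: str, failed: dict[str, bool],
          alternatives: dict[str, list[str]]) -> Optional[list[str]]:
    """Steps 806-809 for one required component.

    Returns None if the component is healthy (normal operation continues),
    otherwise the alternative means to hand to the fault management processor.
    """
    if not failed.get(component, False):          # step 806: no failure
        return None
    candidates = alternatives.get(component, [])  # step 807: search again
    if not candidates:                            # step 808: none registered
        raise UnrecoverableFailure(component)     # step 809
    return candidates


print(route("Cache", {"Cache": False}, {}))  # None
```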

【００７７】[0077]

FIG. 9 shows the packet transferred to the fault management processor 305 when the fault detection processor 304 detects a failed component for which an alternative means exists. This is the sequence in which processing passes from step 808 of FIG. 8 to the fault management processor.

【００７８】[0078]

Field 901 identifies the failed component. Field 902 holds that component's failure count, taken from field 504 of FIG. 5. Field 903 specifies the alternative means (alternative component) that substitutes for the failed component. In this embodiment, the alternative means for the compression mechanism 113 are emulation, requesting compression from another device, and requesting compression from the host. The figure shows an example packet in which emulation has been selected.

【００７９】[0079]

FIG. 10 is a flowchart of the processing performed by the fault management processor 305 of the fault management unit 112 shown in FIG. 3.

【００８０】[0080]

First, in step 1001, the fault management processor 305 checks whether a component needed for the requested processing has failed. If not (step 806 of FIG. 8 found no failure), processing continues normally. If a failure has occurred (steps 806 to 808 of FIG. 8 found a failure and an alternative means), processing proceeds to step 1002.

【００８１】[0081]

In step 1002, the fault management processor 305 analyzes the transferred packet (for example, the packet of FIG. 9) and retries the failed component. In step 1003, it checks whether the retry succeeded. On success, processing proceeds to step 1009; on failure, to step 1004. Step 1004 checks whether the number of retries has exceeded a preset value "M". If it has not, processing returns to step 1002 and the retry is repeated. If it has, processing proceeds to step 1005 so that the alternative means selected by the fault detection processor 304 can be used.

【００８２】[0082]

If the retry succeeds in step 1003, the failure count in the configuration management table (field 504 of FIG. 5) is incremented by 1 in step 1009. Step 1010 checks whether the new value exceeds a preset value "N". If it does, processing moves to step 1011, which notifies the host that the component is failing frequently, and then to step 1012. If the failure count does not exceed "N", processing moves directly to step 1012.

【００８３】[0083]

In step 1012, the selector 311 (FIG. 3) is switched to the normal path. For example, suppose a failure of the compression mechanism 113 in FIG. 3 was detected and a retry has made it work normally again, at least for now. The selector 311 is then switched to the normal path 312, so that the data to be compressed in the data buffer 303 is sent to the compression mechanism 113.
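Steps 1002 to 1012 combine two thresholds. "M" bounds how many retries are attempted before an alternative means takes over. "N" bounds how many retry recoveries may accumulate in field 504 before the host is warned. The Python sketch below shows both. It treats "exceeds" as strictly greater than, as the text says. The callbacks, the dictionary standing in for a table row, and the return labels are hypothetical, and the text gives no values for M or N.

```python
from typing import Callable


def retry_then_decide(try_component: Callable[[], bool], entry: dict,
                      max_retries: int, warn_threshold: int,
                      notify_host: Callable[[str], None]) -> str:
    """Sketch of steps 1002-1012 of FIG. 10.

    Returns "normal_path" when a retry succeeds (selector 311 goes back to the
    normal path 312), or "use_alternative" once the retry count exceeds
    max_retries ("M") and step 1005 should take over.
    """
    retries = 0
    while True:
        retries += 1
        if try_component():                              # steps 1002-1003
            entry["failure_count"] += 1                  # step 1009 (field 504)
            if entry["failure_count"] > warn_threshold:  # step 1010 ("N")
                notify_host(entry["component"])          # step 1011
            return "normal_path"                         # step 1012
        if retries > max_retries:                        # step 1004 ("M")
            return "use_alternative"
```

With `max_retries=3`, a component that never recovers is tried four times before control moves to the alternative means.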

【００８４】[0084]

If the number of retries exceeds "M" in step 1004, the fault management processor 305 checks the alternative means selected by the fault detection processor 304 in order:

1. Step 1005 checks whether it is emulation. If so, an emulation request is issued; otherwise processing proceeds to step 1006.
2. Step 1006 checks whether it is a request to another device. If so, that request is executed; otherwise processing proceeds to step 1007.
3. Step 1007 checks whether it is a request to the host. If so, that request is executed; otherwise processing proceeds to step 1008.

【００８５】[0085]

If the alternative means selected by the fault detection processor 304 matches none of these, step 1008 notifies the requester that an unrecoverable failure has occurred.
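The chain of checks in steps 1005 to 1008 is a dispatch on the selected alternative means, with a final fallthrough to an unrecoverable-failure report. In the Python sketch below, the labels are hypothetical. The handler functions are placeholders for the request programs 307, 308 and 309, not their actual behavior.

```python
from typing import Callable


class UnrecoverableFailure(Exception):
    """Step 1008: report an unrecoverable failure to the requester."""


# Order of the checks in steps 1005, 1006 and 1007; the labels are hypothetical.
CHECK_ORDER = ("emulation", "other_device", "host")


def dispatch_alternative(means: str,
                         handlers: dict[str, Callable[[bytes], bytes]],
                         data: bytes) -> bytes:
    """Run the request program matching the selected alternative means."""
    for label in CHECK_ORDER:
        if means == label:
            return handlers[label](data)
    raise UnrecoverableFailure(f"no handler for alternative means {means!r}")


handlers = {
    "emulation": lambda d: b"E:" + d,     # placeholder for program 307
    "other_device": lambda d: b"D:" + d,  # placeholder for program 308
    "host": lambda d: b"H:" + d,          # placeholder for program 309
}
print(dispatch_alternative("host", handlers, b"blk"))  # b'H:blk'
```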

【００８６】[0086]

Next, the detailed procedure for each alternative means selected in steps 1005 to 1007 of FIG. 10 is described.

【００８７】[0087]

FIG. 11 is a flowchart of the emulation executed from step 1005 of FIG. 10. The left-hand flowchart of FIG. 11 shows the emulation request program 307 executed by the fault management processor 305 of FIG. 3; here it requests a compression process. The right-hand flowchart shows how the control processor 114 of FIG. 3 performs the compression in response to that request. Emulation here means that the microprogram of the control processor 114 performs processing equivalent to what the optional mechanism 113 would normally do.

【００８８】[0088]

In FIG. 11, the fault management processor 305 first switches the selector 311 of FIG. 3 to the path 314 leading to the control processor 114 (step 1101). The data in the data buffer 303 is then sent to the control processor 114 instead of to the optional mechanism 113. In step 1102, the fault management processor 305 issues an emulation request to the control processor 114. Specifically, it interrupts the control processor 114 and passes an identifier indicating an emulation request.

【００８９】[0089]

The control processor 114 receives the interrupt and recognizes it as an emulation request. In step 1105, it fetches the data to be compressed from the data buffer 303. In step 1106, it emulates the operation of the optional mechanism 113, which means it compresses the data using the same algorithm as the optional mechanism 113.

【００９０】[0090]

In step 1107, the control processor 114 checks whether the compression in step 1106 completed normally. If it ended abnormally, processing proceeds to step 1108, which reports the abnormal end to the fault management processor 305 of the fault management unit 112. If the emulation completed normally, the compressed data is written to the cache 115 in step 1109. Normal completion is then reported to the fault management processor 305 in step 1110.
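The control-processor side of FIG. 11 (steps 1105 to 1110) can be sketched as follows. The text does not name the compression algorithm of the optional mechanism 113, so Python's standard `zlib` module stands in for it here purely as an assumption. A dictionary stands in for the cache 115, and the function name and cache key are hypothetical.

```python
import zlib


def emulate_compression(data_buffer: bytes, cache: dict, key: str) -> bool:
    """Sketch of steps 1105-1110 on the control processor.

    zlib is an assumed stand-in for the unnamed algorithm of the optional
    mechanism 113. Returns True for normal completion (step 1110) and
    False for an abnormal end (step 1108).
    """
    try:
        compressed = zlib.compress(data_buffer)  # steps 1105-1106
    except (zlib.error, TypeError):
        return False                             # step 1107 -> step 1108
    cache[key] = compressed                      # step 1109
    return True                                  # step 1110


cache = {}
ok = emulate_compression(b"abc" * 100, cache, "Device 1/VOL1/Block100")
print(ok)  # True
```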

【００９１】[0091]

On receiving the completion notice, the fault management processor 305 executes step 1103 of the emulation processing to check whether the processing ended abnormally. If it did, step 1104 reports the abnormal end to the fault detection processor 304. Otherwise, the emulation processing ends.

【００９２】[0092]

When notified of the abnormal end, the fault detection processor 304 searches field 505 of the configuration management table 111 for an alternative means other than emulation. If one exists, it attempts alternative processing with that means in the same way. This continues until processing by an alternative means completes normally or no candidate alternative means remains.
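Steps 1103, 1104 and the fallback above work as an ordered search over the candidates in field 505. The search stops at the first alternative means that completes normally and gives up when the list is exhausted. In the Python sketch below, the convention that a handler returns `None` for an abnormal end is an assumption, and the handler names are hypothetical.

```python
from typing import Callable, Optional


def try_alternatives(candidates: list[str],
                     handlers: dict[str, Callable[[bytes], Optional[bytes]]],
                     data: bytes) -> bytes:
    """Try each alternative means from field 505 in order.

    A handler returns its result on normal completion, or None on an abnormal
    end (the report of step 1104). Stops at the first normal completion.
    """
    for means in candidates:
        result = handlers[means](data)
        if result is not None:
            return result
    raise RuntimeError("no alternative means completed normally")


handlers = {
    "emulation": lambda d: None,    # emulation ends abnormally in this example
    "host": lambda d: d.upper(),    # placeholder for processing by the host
}
print(try_alternatives(["emulation", "host"], handlers, b"abc"))  # b'ABC'
```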

【００９３】[0093]

As described above, emulation takes over the processing of the failed component. Reliability close to that of multiplexing is therefore achieved without duplicating costly hardware, which makes a low-cost, highly reliable disk system possible. Emulation is slower than dedicated hardware. This causes little trouble, however, if the failed part is serviced and replaced within a relatively short time after the failure.

【００９４】[0094]

This embodiment has a further advantage. Consider a request to use the optional mechanism that is issued to a disk device not equipped with the optional mechanism 113. Conventionally, the user would be told that the mechanism is absent and the request would end abnormally. In this embodiment, processing can continue by emulation even in that case.

【００９５】[0095]

FIG. 12 shows the data flow during the above emulation. Thick arrows indicate the flow of data.

【００９６】[0096]

When the optional mechanism 113 fails, the control processor 114 performs processing equivalent to that of the optional mechanism 113 (emulation). The compressed data is stored in the cache memory 115 just as if the optional mechanism 113 had operated normally. This kind of alternative processing is not limited to failures of the optional mechanism 113. It applies to any function that can be implemented in software.

【００９７】[0097]

FIG. 13 is a flowchart of the processing executed from step 1006 of FIG. 10, which asks another device to perform fault recovery. This processing corresponds to the external device request program 308 executed by the fault management processor 305 of FIG. 3.

【００９８】[0098]

First, in step 1301, the fault management processor 305 creates a request packet for another device.

【００９９】[0099]

FIG. 14 shows an example format of this request packet. The packet consists of a command field 1401, a position information field 1402, a requester field 1403, a requested-processing field 1404, and a data field 1405 holding the data to be processed.

【０１００】[0100]

The command field 1401 marks the packet as a fault recovery request. The position information field 1402 specifies which device should execute the request. In this example it is "ANY", meaning that any device capable of compression may do it. If several devices perform the same processing, the requester can, for example, use the result from the device that responds first.

【０１０１】[0101]

The requester field 1403 holds the identifier of the failed device. The requested device uses this identifier when it returns the processing result. In this embodiment, the requested-processing field 1404 holds a request to act in place of the compression mechanism. The data field 1405 holds the data to be compressed.
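Step 1301 fills the five fields of FIG. 14. In the Python sketch below, the frozen dataclass is a hypothetical in-memory form of that packet; the text does not specify a byte layout. The strings "FAULT_RECOVERY" and "COMPRESS" are made-up placeholders for the contents of fields 1401 and 1404.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryRequest:
    """Hypothetical in-memory form of the FIG. 14 request packet."""
    command: str    # field 1401: marks a fault recovery request
    position: str   # field 1402: target device name, or "ANY"
    requester: str  # field 1403: failed device, used to return the result
    operation: str  # field 1404: processing to perform on its behalf
    data: bytes     # field 1405: data to be processed


def make_compression_request(failed_device: str, data: bytes) -> RecoveryRequest:
    """Step 1301: ask any capable device to compress data for failed_device."""
    return RecoveryRequest(
        command="FAULT_RECOVERY",
        position="ANY",
        requester=failed_device,
        operation="COMPRESS",
        data=data,
    )


req = make_compression_request("Device 1", b"raw block")
print(req.position, req.requester)  # ANY Device 1
```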

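【０１０２】[0102]

Returning to FIG. 13: after a request packet like that of FIG. 14 is created in step 1301, step 1302 sends it onto the system bus 183. The packet thus reaches every device connected to the system bus 183. After sending the packet, the fault management processor 305 waits for the requested processing to finish.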
After creating the request packet as shown in FIG. 14, the created packet is sent to the system bus 183 in step 1302. As a result, the packet is transferred to all the devices connected to the system bus 183. After the packet transfer, wait for the requested processing to end.

[0103] In each device that receives the packet, the fault management unit checks the location information field 1402 to determine whether the request is addressed to that device. If it is, the device interprets the command field 1401, executes the requested processing in the same manner as described with reference to FIG. 8 and elsewhere, and returns the result to the requesting device. In the packet example of FIG. 14, the location information is "ANY", so every device that receives the packet executes the processing and sends back its result. The requesting device continues its subsequent processing using the result that arrives first.
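The receive-side check and the first-reply rule can be sketched as follows. This is an illustration under stated assumptions: the bus broadcast is simulated by a loop, zlib stands in for the compression mechanism, and the hypothetical `latency_ms` attribute stands in for real arrival order so that the example is deterministic.

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class RecoveryRequest:
    command: str      # field 1401
    location: str     # field 1402: device name or "ANY"
    requester: str    # field 1403
    processing: str   # field 1404
    data: bytes       # field 1405

@dataclass
class Device:
    name: str
    capabilities: frozenset  # processing names this device can perform
    latency_ms: int          # hypothetical reply delay used to order replies

    def accepts(self, packet: RecoveryRequest) -> bool:
        # Check field 1402: addressed to this device, or "ANY" and capable.
        if packet.location == self.name:
            return True
        return packet.location == "ANY" and packet.processing in self.capabilities

    def execute(self, packet: RecoveryRequest) -> bytes:
        if packet.processing == "compress":
            return zlib.compress(packet.data)  # stand-in for the compressor
        raise ValueError(f"unsupported processing: {packet.processing}")

def broadcast(packet: RecoveryRequest, devices: list) -> tuple:
    """Deliver the packet to every device on the bus and collect replies."""
    replies = []
    for dev in devices:
        if dev.accepts(packet):
            replies.append((dev.latency_ms, dev.name, dev.execute(packet)))
    if not replies:
        raise RuntimeError("no device accepted the request")
    # The requester continues with the reply that arrives first.
    _, name, result = min(replies, key=lambda reply: reply[0])
    return name, result

devices = [
    Device("disk-107", frozenset({"compress"}), 5),
    Device("disk-108", frozenset({"compress"}), 2),
    Device("disk-109", frozenset(), 1),  # cannot compress, so ignores "ANY"
]
pkt = RecoveryRequest("FAILURE_RECOVERY", "ANY", "disk-106", "compress", b"abc" * 20)
winner, result = broadcast(pkt, devices)
```

Here `disk-109` replies fastest in principle but does not accept the request, so the result from `disk-108` is used.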

[0104] In step 1303, it is checked whether the requested processing has completed normally. If it terminated abnormally, the fault detection processor 304 is notified of this in step 1304. Otherwise, in step 1305 the selector 311 is switched to the direct connection path 127, which connects the cache memory 115 directly to the data buffer 303. In step 1306, the result produced by the other device and stored in the data buffer 303 is transferred directly to the cache memory 115 over this direct connection path 127.
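Steps 1303 to 1306 amount to a small state change in the fault management unit. The sketch below models the selector 311 as a string attribute and the cache memory as a list; the class, its attributes, and the log message are hypothetical simplifications of the hardware data path.

```python
from dataclasses import dataclass, field

@dataclass
class FaultManagementUnit:
    # Hypothetical model of the FIG. 3 data path: selector 311 chooses between
    # the (failed) compression path and the direct connection path 127.
    selector: str = "compression"
    data_buffer: bytes = b""
    cache: list = field(default_factory=list)
    fault_log: list = field(default_factory=list)

    def complete_request(self, ok: bool, result: bytes) -> bool:
        """Steps 1303-1306: check the reply, then move the result into cache."""
        if not ok:
            # Step 1304: report abnormal termination to the fault detection processor.
            self.fault_log.append("substitute processing terminated abnormally")
            return False
        self.data_buffer = result
        # Step 1305: switch selector 311 to the direct connection path 127.
        self.selector = "direct"
        # Step 1306: transfer the buffered result straight into cache memory.
        self.cache.append(self.data_buffer)
        return True
```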

[0105] In this embodiment, the failure recovery request is issued to other devices connected to a single system bus 183, but recovery can be requested from any device to which the packet of FIG. 14 can be transferred. For example, the processing may be requested from a device connected to a network, or from a device on a nested network. Therefore, according to the present invention, when many disk devices are connected, the system can keep operating without stopping as long as the disk devices collectively provide the components needed to operate one disk device.

[0106] In general, the more devices a system contains, the lower the reliability of the system as a whole. The present invention, however, minimizes the loss of reliability caused by the increased number of devices.
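One common way to quantify the general claim above uses the textbook series-reliability model, which is an outside assumption rather than something stated in the patent. If the system works only when all n devices work, and each device i works independently with probability r_i (all equal to r), then:

```latex
R_{\text{system}} = \prod_{i=1}^{n} r_i = r^{n},
\qquad \text{e.g. } r = 0.99,\; n = 10 \;\Rightarrow\; R_{\text{system}} = 0.99^{10} \approx 0.904
```

Letting a failed component borrow an equivalent component in another device means the system no longer needs every component of every device to work, which is how the scheme softens this decline.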

[0107] FIG. 15 illustrates the flow of processing when processing is requested from another device. The thick arrows can be read as showing both the flow of processing and, for example, the flow of data during a data write.

[0108] The fault management unit 112 of the disk device 106 detects a failure of the compression mechanism 113 and requests another device to take over the processing that the compression mechanism 113 should perform. In FIG. 15, the fault management unit 152 of the disk device 107 receives this request, and the compression mechanism 153 in the disk device 107 performs the same processing as the failed compression mechanism 113. The result is sent from the fault management unit 152 to the fault management unit 112, which switches the selector 311 to the direct connection path 127 and transfers the received data directly to the cache memory 115.

[0109] FIG. 16 is a flowchart of the failure recovery request to the host, executed from step 1007 of FIG. 10. This processing corresponds to the host request program 309 executed by the fault management processor 305 in FIG. 3.

[0110] First, in step 1601, the fault management processor 305 creates a request packet for the host processors 101 and 102.

[0111] FIG. 17 shows an example format of the request packet for the host processors. This packet consists of a command field 1701, a location information field 1702, a requester field 1703, a requested-processing field 1704, and a data field 1705 holding the data to be processed.

[0112] The command field 1701 indicates that the packet is a failure recovery request. The location information field 1702 specifies which device should execute the request; because this is a request to a host processor, it contains "host". The requester field 1703 stores the identifier of the failed device. This identifier is used when the requested host processor transfers (returns) the processing result. In this embodiment, the requested-processing field 1704 holds a request to substitute for the compression mechanism. The data field 1705 holds the data to be compressed.

[0113] Referring again to FIG. 16, after the request packet shown in FIG. 17 is created in step 1601, the packet is sent onto the system bus 183 in step 1602. As a result, the packet is delivered to the host processors 101 and 102, which are connected to the system bus 183 via the channel adapter 105. After sending the packet, the device waits for the requested processing to finish.

[0114] The processing in a host processor that receives the packet is described in detail later.

[0115] In step 1603, it is checked whether the requested processing has completed normally. If it terminated abnormally, the fault detection processor 304 is notified of this in step 1604. Otherwise, in step 1605 the selector 311 is switched to the direct connection path 127, which connects the cache memory 115 directly to the data buffer 303. In step 1606, the result produced by the host processor and stored in the data buffer 303 is transferred directly to the cache memory 115 over this direct connection path 127.

[0116] FIG. 18 shows the structure of the programs in the host processor 101 (or 102), focusing on those related to failure recovery.

[0117] Reference numeral 1802 denotes an application program, which issues input/output requests to the disk devices 106 and 107. Reference numeral 1803 denotes the operating system, the program that controls the host 101 (102); the figure shows only its input/output-related parts.

[0118] Reference numeral 1804 denotes an input/output termination program, which performs termination processing for input/output issued by the application program 1802. Termination processing includes, for example, releasing resources such as the memory area used for input/output control. Reference numeral 1807 denotes an input/output issue program. Conversely to the input/output termination program 1804, it secures the resources required for input/output and requests asynchronous input/output processing from the input/output processor 103 (104).

[0119] Reference numeral 1806 denotes an interrupt acceptance program. When input/output issued by the application program 1802 completes, the device side raises an interrupt to the host. The interrupt acceptance program 1806 accepts this interrupt; specifically, it fetches the packet issued by the device side (including packets such as that of FIG. 17).

[0120] Reference numeral 1805 denotes a termination interrupt analysis program, which determines whether an accepted interrupt is an input/output completion interrupt or a failure recovery request (FIG. 17). Reference numeral 1808 denotes a device request management program, which executes and manages the processing requested by the device side and transfers the result to the requesting device.

[0121] The programs that the device request management program 1808 executes are stored in the function program library 1809 and are run as needed. For example, if compression processing is requested, the compression program 1810 is executed. Conventional host processors could not accept processing requests from the device side, apart from interrupts such as input/output completion interrupts. In this embodiment, the host processor has the configuration described above and can therefore accept processing requests from the device side.
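A function program library that maps requested processing to host programs can be sketched as a lookup table. In this sketch, `compression_program`, `run_device_request`, and the `"compress"` key are hypothetical, and zlib is only a placeholder: a real substitute would have to produce exactly the format that the failed compression hardware produces.

```python
import zlib
from typing import Callable, Dict

# Hypothetical stand-in for function program library 1809: a table that maps
# the processing named in a device request to a host-side program.
FunctionLibrary = Dict[str, Callable[[bytes], bytes]]

def compression_program(data: bytes) -> bytes:
    # Stand-in for compression program 1810 (zlib is an assumption).
    return zlib.compress(data)

library: FunctionLibrary = {"compress": compression_program}

def run_device_request(processing: str, data: bytes,
                       lib: FunctionLibrary = library) -> bytes:
    """Role of program 1808: look up the requested program and run it."""
    try:
        program = lib[processing]
    except KeyError:
        raise ValueError(f"no program for requested processing {processing!r}") from None
    return program(data)
```

Adding a new substitutable function then only means registering another entry in the table.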

[0122] FIG. 19 is a flowchart showing the operation of a host processor configured as described above.

[0123] In step 1901, the host processor first performs input/output completion interrupt acceptance. This is the processing performed by the interrupt acceptance program 1806 described above, specifically fetching the packet (such as that of FIG. 17) issued by the device side.

[0124] Next, the termination interrupt analysis program 1805 performs processing 1921. In step 1902, it analyzes the received packet, and in step 1903 it determines whether the interrupt is an input/output completion interrupt or a failure recovery request. If it is a normal input/output completion interrupt, the input/output termination program 1804 is started, and the input/output processing is completed in step 1909.

[0125] If step 1903 determines that the interrupt is a processing request from the device side, control passes to the device request management program 1808, which performs processing 1922. In processing 1922, the packet (FIG. 17) is first analyzed in step 1904, and steps 1905 to 1907 then determine what should be done.

[0126] Based on the determinations in steps 1905 to 1907, processing 1923 executes the requested processing using a program from the function program library 1809. For example, if step 1905 finds that compression processing was requested, the compression program 1810 performs the compression in step 1910. Similarly, if step 1906 finds that process 1 was requested, process 1 is performed in step 1911, and if step 1907 finds that process 2 was requested, process 2 is performed in step 1912.

[0127] After step 1910, 1911, or 1912, the execution result is transferred in step 1908 to the device that issued the request.
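The flow of FIG. 19 can be condensed into a single dispatch function. This sketch assumes a hypothetical packet layout whose command is either `"IO_END"` or `"FAILURE_RECOVERY"`, uses zlib as a stand-in compression program, and passes the I/O-completion and result-return actions in as callbacks instead of modeling program 1804 and the bus transfer.

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Interrupt:
    # Hypothetical packet layout: "IO_END" for an ordinary completion
    # interrupt, "FAILURE_RECOVERY" for a device request as in FIG. 17.
    command: str
    requester: str = ""
    processing: str = ""
    data: bytes = b""

# Stand-in for function program library 1809 (processes 1 and 2 omitted).
LIBRARY = {"compress": zlib.compress}

def handle_interrupt(pkt: Interrupt, send_to_device, finish_io) -> None:
    # Steps 1902-1903 (program 1805): classify the accepted interrupt.
    if pkt.command == "IO_END":
        finish_io(pkt)  # Step 1909: program 1804 completes the I/O.
        return
    if pkt.command != "FAILURE_RECOVERY":
        raise ValueError(f"unknown interrupt command {pkt.command!r}")
    # Steps 1904-1907 (program 1808): decide which processing was requested.
    program = LIBRARY.get(pkt.processing)
    if program is None:
        raise ValueError(f"unsupported processing {pkt.processing!r}")
    result = program(pkt.data)  # Steps 1910-1912 (processing 1923).
    send_to_device(pkt.requester, result)  # Step 1908: return the result.
```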

[0128] FIG. 20 outlines the recovery processing performed in the host processor. The thick arrows can be read as showing both the flow of processing and the flow of data.

[0129] The fault management unit 112 of the disk device 106 detects a failure of the compression mechanism 113 and requests the host to take over the processing that the compression mechanism 113 should perform. In FIG. 20, the host processor 101 receives this request and uses its internal compression program 1810 to perform compression equivalent to that of the failed compression mechanism 113. The result is sent from the host processor 101 to the fault management unit 112, which switches the selector 311 to the direct connection path 127 and transfers the received data directly to the cache memory 115.

[0130] Together with the failure recovery by other devices described earlier, this allows failure recovery using any device or host processor with which the device can communicate, making it possible to build a more reliable system.

[0131] The present invention is not limited to disk devices and is applicable to computer systems in general. Moreover, the basis of the invention is the effective use of resources among many devices or computing systems. Accordingly, the invention is not limited to failure recovery and can generally be applied to make effective use of resources among many devices or computing systems.

[0132] [Embodiment 2] Next, a second embodiment of the present invention is described. This embodiment uses the same system as the first embodiment and is characterized in particular by performing a cross-call operation.

[0133] FIG. 21 shows the flow of operations when the failure recovery processing of this embodiment is performed using a cross call. The thick arrows can be read as showing both the flow of processing and the flow of data.

[0134] In response to a data write request from a host device, the fault management unit 112 of the disk device 106 compresses the data with the compression mechanism 113 and stores it in the cache 115. The fault management unit 116 then attempts to write the data stored in the cache 115 to the disk drive 118. Assume, however, that the control processor 117, which controls input/output for the disk drives 118 and 119, has failed.

[0135] The fault management unit 116 then detects the failure of the control processor 117 and requests failure recovery from another device. The fault management unit 132, which receives the request, stores the received data in the cache 135 via the direct connection path 147. The fault management unit 136 then passes the data stored in the cache 135 to the control processor 137, which writes it to the disk drive 118. In this way, the device that received the failure recovery request accesses the disk drive 118 directly by cross call, without returning the data to the original device.

[0136] The cross call is made possible by the paths 171 and 172. These paths allow disk drives to be accessed across independent control structures (that is, between the left and right disk systems in the disk device 106).

[0137] Next, failure recovery by cross call is described in more detail.

[0138] FIG. 22 is a flowchart showing the procedure for requesting failure recovery by cross call. This processing is executed by the fault management processor 405 of the fault management unit 116. First, in step 2201, a request packet is created. Next, in step 2202, the packet is sent onto the system bus 183 and transferred to a device capable of the cross call.

[0139] Two examples of request packets are given below.

[0140] FIG. 23 shows the first packet example, a request packet sent by the fault management unit 116 in FIG. 21. This packet consists of a command field 2301, a location information field 2302, a failure recovery requester field 2303, a processing requester field 2304, a requested-processing field 2305, and a data field 2306 holding the data to be processed.

[0141] The command field 2301 holds the command and tells the receiving side that this is a failure recovery request. The location information field 2302 stores the identifier of the device to which the request is directed. The failure recovery requester field 2303 stores the identifier of the device requesting the failure recovery.

[0142] Here, the request goes from the fault management unit 116 of the left disk system in the disk device 106 to the fault management units 132 and 136 of the right disk system in the same disk device 106. The location information field 2302 therefore contains "device 1-2", denoting the right disk system in the disk device 106, and the failure recovery requester field 2303 contains "device 1-1", denoting the left disk system in the disk device 106.

[0143] The processing requester field 2304 stores the identifier of the host that originally requested the processing. With this identifier, when the recovery processing finishes, completion can be reported directly to the host without first returning to the device that requested the recovery. The requested-processing field 2305 describes the requested processing; in this example, it specifies the processing of the control processor 117. The data field 2306 holds the data for the requested processing; in this example, the required data from the cache memory 115.
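The role of the processing requester field 2304 can be shown with a small model of the FIG. 23 packet. The class and helper names and the identifier strings are hypothetical; the point illustrated is only that the completion report is addressed to the host named in field 2304 rather than to the device named in field 2303.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrossCallRecoveryRequest:
    # Hypothetical model of the FIG. 23 packet; field names are illustrative.
    command: str                # 2301: "FAILURE_RECOVERY"
    location: str               # 2302: device asked to do the work
    recovery_requester: str     # 2303: device whose component failed
    processing_requester: str   # 2304: host that issued the original I/O
    processing: str             # 2305: processing to take over
    data: bytes                 # 2306: required data copied from cache 115

def completion_report(pkt: CrossCallRecoveryRequest) -> tuple:
    """Return (destination, message) for the end-of-processing report.

    Field 2304 names the host, so the report goes to the host directly
    instead of being routed back through the device in field 2303.
    """
    return pkt.processing_requester, f"{pkt.processing} completed by {pkt.location}"

req = CrossCallRecoveryRequest(
    command="FAILURE_RECOVERY",
    location="device 1-2",
    recovery_requester="device 1-1",
    processing_requester="host-101",
    processing="control-processor-117",
    data=b"cached write data",
)
```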

[0144] Because requests to other devices are made with packets like this, the receiving device can process them efficiently without unnecessarily duplicating processing.

[0145] FIG. 24 shows the second packet example, which is also a request packet sent by the fault management unit 116 in FIG. 21. This packet consists of a command field 2401, a location information field 2402, a processing requester field 2403, a requested-processing field 2404, and a data field 2405 holding the data to be processed.

[0146] The command field 2401 holds the command, which here indicates a cross call. The location information field 2402 holds the location of the disk drive to be accessed. Like the processing requester field 2304 of the first packet, the requester field 2403 stores the host identifier. The requested-processing field 2404 describes the requested processing; here it holds an identifier denoting the processing of the failed control processor 117. Like the data field 2306 of the first packet, the data field 2405 holds the required data from the cache memory 115.

[0147] The intended cross call can be achieved with either the packet of FIG. 23 or that of FIG. 24. The cross-call processing using the second packet (FIG. 24) is described in detail below.

[0148] FIG. 25 is a flowchart of the processing in a device that receives a cross-call request packet such as that of FIG. 24. Steps 2501 to 2503 are the same as steps 701 to 703 of FIG. 7, so their description is omitted. Next, in step 2504, it is determined whether the command field 2401 of the packet indicates a cross-call request. If it does, processing proceeds to step 2505, performed by the fault detection unit 304.

[0149] In step 2505, the fault detection unit 304 transfers the data in the data field 2405 of the packet to the cache memory 135. At this time, as described for step 1305 of FIG. 13, the selector of the fault management unit 132 (the selector 311 in FIG. 3) has selected the direct connection path 147.

[0150] Next, in step 2506, the data in the requested-processing field 2404 of the packet is transferred to the control processor 137. This transfer is performed by the fault management unit 136, whose selector (the selector 411 in FIG. 4) has selected the path 414. As a result, the data stored in the cache memory 135 (the data from the data field 2405 of the packet) is passed to the control processor 137 via the path 414, and the control processor 137 writes it to the disk drive 118 by cross call.
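The receiver side of the cross call in FIG. 25 (steps 2504 to 2506) can be sketched as below. Everything here is a software stand-in under stated assumptions: the drives are lists in a dictionary, the cross-call paths 171 and 172 are represented only by the left-hand drives appearing in that dictionary, and the field names of the FIG. 24 packet are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CrossCallPacket:
    # Hypothetical model of the FIG. 24 packet.
    command: str      # 2401: "CROSS_CALL"
    location: str     # 2402: disk drive to access, e.g. "drive-118"
    requester: str    # 2403: host identifier
    processing: str   # 2404: processing of the failed control processor 117
    data: bytes       # 2405: data copied from cache memory 115

@dataclass
class RightDiskSystem:
    # Drives reachable by control processor 137, including the left-hand
    # drives via the cross-call paths (modeled as plain dictionary entries).
    drives: dict = field(default_factory=lambda: {"drive-118": [], "drive-119": []})
    cache_135: list = field(default_factory=list)

    def receive(self, pkt: CrossCallPacket) -> str:
        if pkt.command != "CROSS_CALL":  # Step 2504
            raise ValueError("not a cross-call request")
        self.cache_135.append(pkt.data)  # Step 2505: via direct path 147
        # Step 2506: control processor 137 writes the cached data to the
        # drive named in field 2402 over the cross-call path.
        self.drives[pkt.location].append(self.cache_135[-1])
        return pkt.requester  # completion is reported to this host
```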

[0151] This processing allows steps such as the processing in the control processor 134 to be skipped, enabling a fast cross call. Compared with the method of the first embodiment, the cross call also reduces the round-trip communication overhead, so processing can be performed faster.

[0152] [Embodiment 3] The first and second embodiments described above apply the present invention to a file system in which the data a user wants to access exists on only one disk drive. However, to realize a file system with higher reliability, a method may be used in which the same data is stored on a plurality of disk devices. This is called mirroring.

[0153] The third embodiment applies the present invention to a mirrored disk file system and realizes an efficient failure recovery operation in such a system. The system configuration of this embodiment is the same as that of the first embodiment, so the following description focuses on the parts specific to the third embodiment.

[0154] FIG. 26 outlines the failure recovery operation in the mirrored disk file system of this embodiment. The thick arrows can be read as showing both the flow of processing and the flow of data. Here it is assumed that the disk drives 118 and 158 are mirrored, so that both drives hold data with the same contents.

[0155] Suppose the disk drive 118 fails. The fault management unit 112, which detects this failure, refers to the configuration management table 111 and learns that the failed disk drive 118 is mirrored by the disk drive 158. The fault management unit 112 then performs failure recovery by carrying out the processing on the disk drive 158.

[0156] FIG. 27 shows the contents of the configuration management table 111 in which the mirroring information is stored. Areas 2701 to 2705 of the table store the same kinds of content as areas 501 to 505 of the configuration management table 111 in FIG. 5. The difference from the table of FIG. 5 is that mirroring has been added as record information.

[0157] In FIG. 27, for the component 2701 "drive 1" (the disk drive 118), the alternative means 2705 is given as "mirror, device 2: drive 1". This shows that the disk drive 118 is mirrored with the disk drive 158 of the disk device 107.
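A lookup over the configuration management table could look like the following. Only the component column (2701) and the alternative-means column (2705) are modeled, and the entry string follows the FIG. 27 example; the dictionary representation and the `find_mirror` helper are assumptions.

```python
from typing import Optional

# Hypothetical extract of configuration management table 111 (FIG. 27):
# component (2701) -> list of alternative means (2705).
CONFIG_TABLE = {
    "drive 1": ["mirror, device 2: drive 1"],
}

def find_mirror(component: str, table: dict = CONFIG_TABLE) -> Optional[str]:
    """Return the mirror location for a component, or None if not mirrored."""
    for alternative in table.get(component, []):
        kind, _, target = alternative.partition(",")
        if kind.strip() == "mirror":
            return target.strip()
    return None
```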

[0158] FIG. 28 is a flowchart of the processing of the fault management processor 305 in the fault management unit 112 of this embodiment. In the figure, steps 2801 to 2807 correspond to steps 1001 to 1007 of FIG. 10 in the first embodiment, step 2809 to step 1008, and steps 2810 to 2813 to steps 1009 to 1012. The operations in these steps are the same as in the first embodiment, so their description is omitted.

[0159] The flowchart of FIG. 28 adds step 2808, which checks whether failure recovery by mirroring is possible. If the failed component is mirrored, processing can continue using its mirror (the non-failed one of the mirrored components).
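The text does not fix the order in which step 2808 is tried relative to the other recovery paths, so the sketch below simply tries a list of recovery strategies in order, with the mirror check first. That ordering, the `MIRRORS` extract, and the placeholder `ask_host` strategy are all hypothetical.

```python
from typing import Callable, Optional, Sequence

# Each strategy returns a recovery outcome, or None if it cannot help.
Strategy = Callable[[str], Optional[str]]

def recover(component: str, strategies: Sequence[Strategy]) -> str:
    """Try recovery strategies in order until one succeeds."""
    for strategy in strategies:
        outcome = strategy(component)
        if outcome is not None:
            return outcome
    raise RuntimeError(f"no recovery possible for {component}")

MIRRORS = {"drive 1": "device 2: drive 1"}  # hypothetical table extract

def mirror_check(component: str) -> Optional[str]:
    # Step 2808: if the failed component is mirrored, continue on the mirror.
    mirror = MIRRORS.get(component)
    return f"continue on {mirror}" if mirror else None

def ask_host(component: str) -> Optional[str]:
    # Placeholder for the other recovery paths (e.g. the host request of FIG. 16).
    return f"host substitutes for {component}"
```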

[0160] FIG. 29 is a flowchart of the failure recovery request to the mirror, executed from step 2808 of FIG. 28. The program for this processing is stored in the memory 306, like the emulation request program 307 and the external device request program 308 of FIG. 3. The fault management processor 305 first creates, in step 2901, a recovery request packet for the mirrored device. Next, in step 2902, it transfers the packet to the target device via the system bus 183.

[0161] FIG. 30 shows the structure of this failure recovery request packet. It is similar to the packet of FIG. 6 that is transferred from the host to the fault management unit in the first embodiment. However, because FIG. 30 is a request packet to the mirror, its location information field 3002 specifies the mirrored device. A characteristic of the packet used for mirroring is that a processing request can be made simply by replacing the location information of the original packet with the mirrored device.
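Because the mirror request differs from the original only in its location information, it can be built by copying the original packet with one field replaced. This sketch uses Python's `dataclasses.replace`; the `IORequest` class, its fields, and the location strings are hypothetical stand-ins for the FIG. 6 and FIG. 30 packets.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class IORequest:
    # Hypothetical stand-in for the FIG. 6 / FIG. 30 packet; only the
    # fields needed here are modeled.
    command: str
    location: str   # location information (field 3002 in FIG. 30)
    data: bytes

def redirect_to_mirror(original: IORequest, mirror_location: str) -> IORequest:
    """Build the mirror request by swapping only the location information."""
    return replace(original, location=mirror_location)

original = IORequest("WRITE", "device 1: drive 1", b"payload")
mirror_req = redirect_to_mirror(original, "device 2: drive 1")
```

Every other field is copied unchanged, so building the mirror request needs no re-encoding of the data.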

[0162] [Embodiment 4] Each of the embodiments above describes a system in which, when a component fails, another component substitutes for it so that processing continues. In contrast, the fourth embodiment does not reserve the emulation processing described above for failures. Instead, to improve data reliability, it runs the emulation in parallel (multiplexed) with the hardware and makes a majority decision between them.

[0163] FIG. 31 shows the configuration of the fault management unit 112 in the file system of this embodiment. The basic configuration is the same as in FIG. 3, so the following description focuses on the differences. In the fault management unit 112 of FIG. 31, the option mechanism is multiplexed (duplexed) into two option mechanisms 3101 and 3102. In addition, comparators 3103 and 3104 have been added for making the majority decision.

[0164]

Suppose that, in this embodiment, a data write request packet like the one shown in FIG. 6 of the first embodiment is received. The fault management processor 305 then passes the data to be written to the disk to the multiplexed option mechanisms 3101 and 3102 for compression. It also has the control processor 114 execute the emulation described in the first embodiment, even when no failure has occurred in option mechanisms 3101 and 3102.

[0165]

Comparator 3103 compares the compression result from option mechanism 3101 with the compression result from option mechanism 3102. When the two match, comparator 3104 then compares the result of option mechanisms 3101 and 3102 with the result of the emulation. If these comparisons show that any one of the three results differs from the other two, the two matching results are adopted by majority decision and written to the cache 115.
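The decision rule can be sketched as a plain two-out-of-three vote in Python. This models the rule only, not comparators 3103 and 3104 or their comparison order. `zlib` stands in for the compression algorithm of the option mechanisms, which the text does not name. Returning `None` when all three results disagree is an assumption, since the text does not cover that case.

```python
import zlib
from collections import Counter
from typing import Optional


def majority_of_three(hw_a: bytes, hw_b: bytes, emulated: bytes) -> Optional[bytes]:
    """Return the result that at least two of the three sources agree on, else None.

    hw_a and hw_b stand in for option mechanisms 3101 and 3102, and `emulated`
    for the control processor's software emulation (illustrative names).
    """
    value, votes = Counter([hw_a, hw_b, emulated]).most_common(1)[0]
    # Two matching votes outvote a single deviating result.
    return value if votes >= 2 else None


if __name__ == "__main__":
    data = b"write me to disk " * 8
    good = zlib.compress(data)
    # One hardware mechanism produces a corrupted result; the vote discards it.
    chosen = majority_of_three(good, b"corrupted", good)
    print(chosen == good)
```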

[0166]

In this way, the present invention can be applied not only when a failure occurs but also during normal operation, giving more reliable operation. The invention is not limited to majority voting. Variations are also possible, such as duplexing with one piece of hardware and the emulation.
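The duplexed variant (one piece of hardware plus the emulation) can detect a disagreement but cannot decide which side is right. A minimal Python sketch follows. The `MismatchError` exception is a hypothetical way of signalling that case.

```python
class MismatchError(Exception):
    """Raised when the hardware result and the emulated result differ (hypothetical)."""


def duplex_check(hw_result: bytes, emulated_result: bytes) -> bytes:
    """Accept a result only when the hardware and the emulation agree."""
    if hw_result != emulated_result:
        # With only two votes there is no majority, so the disagreement is
        # reported rather than resolved.
        raise MismatchError("hardware and emulation results differ")
    return hw_result
```

Compared with the three-way vote, this halves the extra work but turns every disagreement into an error that some other mechanism must handle.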

[0167]

[Embodiment 5] In each of the embodiments described above, as shown in FIGS. 3 and 4, the fault management processors 305 and 405 in the fault management units are provided separately from the control processors 114 and 117. These processors can also be shared.

[0168]

FIG. 32 shows a configuration in which the fault management processor 305 and the control processor 114 of FIG. 3 are replaced with a single control processor 3201. The control processor 3201 executes the operations previously performed by the fault management processor 305 and the control processor 114 of FIG. 3. This reduces the number of parts and so further reduces device cost. Because one processor now does the work of two, performance is somewhat lower than in the configuration described earlier. However, the smaller number of processors should yield a considerable cost reduction.
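The performance trade-off can be illustrated with a simulated single processor that drains one queue holding both kinds of work. The job kinds, the handlers and `run_shared_processor` are hypothetical and are not taken from FIG. 32. The point shown is only that the two kinds of work, once handled by separate processors, are now served strictly one after another.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Job = Tuple[str, Any]


def run_shared_processor(jobs: Deque[Job],
                         handlers: Dict[str, Callable[[Any], Any]]) -> List[Tuple[str, Any]]:
    """Drain one queue of mixed jobs on a single (simulated) processor.

    "control" jobs model work formerly done by control processor 114, and
    "fault" jobs work formerly done by fault management processor 305
    (hypothetical job kinds). Both share one queue, so they run one at a time.
    """
    results = []
    while jobs:
        kind, arg = jobs.popleft()
        results.append((kind, handlers[kind](arg)))
    return results


if __name__ == "__main__":
    handlers = {
        "control": lambda block: f"read block {block}",
        "fault": lambda component: f"recovery for {component}",
    }
    queue = deque([("control", 7), ("fault", "option mechanism 113"), ("control", 8)])
    for kind, result in run_shared_processor(queue, handlers):
        print(kind, "->", result)
```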

[0169]

[Effects of the Invention]

According to the present invention, a computer system is composed of a plurality of constituent elements. Even if one constituent element fails, processing in place of that element can be requested from another constituent element (such as emulation), another device, or the host processor. Reliability therefore does not decrease even when many devices are connected. Because the substitute processing is performed by emulation, another device, or the host processor, a highly reliable system can be built at lower cost than with the conventional approach of improving reliability by multiplexing.
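The overall idea can be sketched in Python as an ordered list of providers tried in turn, with a raised exception standing in for a failed component. The order used here (hardware, emulation, host) is an assumption. It loosely follows the first embodiment's sequence of alternatives in FIGS. 11, 13 and 16, although that sequence also includes requests to another device. The patent itself selects the substitute by consulting the configuration management table. `zlib` and all provider names are illustrative.

```python
import zlib
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[bytes], bytes]]


class AllSubstitutesFailed(Exception):
    """Raised when neither the component nor any substitute could do the work."""


def process_with_substitutes(data: bytes, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order and return (name, result) from the first that succeeds.

    A provider that raises is treated as a failed component, and the next
    substitute in the list is tried instead.
    """
    errors = []
    for name, fn in providers:
        try:
            return name, fn(data)
        except Exception as exc:
            errors.append((name, exc))
    raise AllSubstitutesFailed(errors)


def broken_compressor(data: bytes) -> bytes:
    # Stand-in for a failed hardware compression mechanism (hypothetical).
    raise IOError("compression mechanism failed")


if __name__ == "__main__":
    providers = [
        ("hardware", broken_compressor),
        ("emulation", zlib.compress),  # software emulation on the control processor
        ("host", zlib.compress),       # hypothetical request to the host processor
    ]
    name, out = process_with_substitutes(b"abc" * 10, providers)
    print(name, zlib.decompress(out) == b"abc" * 10)
```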

[Brief description of drawings]

FIG. 1 is an overall configuration diagram of a first embodiment of the computer system according to the present invention.

FIG. 2 is a flowchart showing the flow of failure recovery in the first embodiment.

FIG. 3 is a block diagram of the fault management unit 112 in the first embodiment.

FIG. 4 is a block diagram of the fault management unit 116 in the first embodiment.

FIG. 5 shows the configuration management table storing failure information in the first embodiment.

FIG. 6 shows the structure of a command packet transferred from the host in the first embodiment.

FIG. 7 shows details of the position information in the command packet in the first embodiment.

FIG. 8 is a processing flowchart of the fault detection unit in the first embodiment.

FIG. 9 shows the structure of the packet passed to the fault management processor in the first embodiment.

FIG. 10 is a processing flowchart of the fault management processor in the first embodiment.

FIG. 11 is a flowchart of the emulation processing in the first embodiment.

FIG. 12 shows an outline of the emulation operation in the first embodiment.

FIG. 13 is a flowchart of the processing that requests failure recovery from another device in the first embodiment.

FIG. 14 shows the structure of a failure recovery request packet sent to another device in the first embodiment.

FIG. 15 shows an outline of the processing that requests failure recovery from another device in the first embodiment.

FIG. 16 is a flowchart of the processing that requests failure recovery from the host in the first embodiment.

FIG. 17 shows the structure of a failure recovery request packet sent to the host in the first embodiment.

FIG. 18 shows the program structure in the host in the first embodiment.

FIG. 19 is a flowchart of the failure recovery processing performed by the host in the first embodiment.

FIG. 20 shows an outline of the processing that requests failure recovery from the host in the first embodiment.

FIG. 21 shows an outline of cross-call processing in the second embodiment.

FIG. 22 is a flowchart of cross-call request processing in the second embodiment.

FIG. 23 shows the structure of a cross-call request packet (part 1) in the second embodiment.

FIG. 24 shows the structure of a cross-call request packet (part 2) in the second embodiment.

FIG. 25 is a processing flowchart of a device that has received a cross-call request in the second embodiment.

FIG. 26 shows an outline of failure recovery during mirroring in the third embodiment.

FIG. 27 shows the configuration management table used during mirroring in the third embodiment.

FIG. 28 is a flowchart of the fault management processor's processing during mirroring in the third embodiment.

FIG. 29 is a flowchart of failure recovery requests during mirroring in the third embodiment.

FIG. 30 shows the structure of a failure recovery request packet used during mirroring in the third embodiment.

FIG. 31 is a block diagram (partial) of a disk system that makes majority decisions using emulation in the fourth embodiment.

FIG. 32 shows a fault management unit that uses the control processor in the fifth embodiment.

[Explanation of symbols]

101, 102 ... host processor; 106, 107 ... disk device; 111, 131, 151 ... configuration management table; 112, 116, 132, 136, 152, 156 ... fault management unit; 113, 133, 153 ... option mechanism; 114, 117, 134, 137, 154, 157 ... control processor; 115, 135, 155 ... cache memory; 118, 119, 138, 139, 158, 159 ... disk drive; 301, 401 ... command analysis processor; 304, 404 ... fault detection processor; 305, 405 ... fault management processor; 307 ... emulation request program; 308, 408 ... external device request program; 309, 409 ... host request program; 311, 411 ... selector.

[Claims]

1. A computer system having a plurality of constituent elements, comprising: detection means for detecting a failure of a constituent element; search means for searching, when a failure occurs in a constituent element, for an alternative constituent element that can function in place of that constituent element; and management means for causing the found alternative constituent element to perform processing in place of the failed constituent element.

2. The computer system according to claim 1, wherein the alternative constituent element comprises a processor capable of executing a program, and the operation of the failed constituent element is reproduced by emulation or simulation using the processor.

3. The computer system according to claim 2, wherein the component in which the failure has occurred is a data compression mechanism, and the processor emulates data compression.

4. The computer system according to any one of claims 1 to 3, wherein the search means generates a data string containing data identifying the failed constituent element and data identifying the alternative constituent element that replaces it, and sends the data string to the management means, and the management means causes the alternative processing to be performed on the basis of the data string.

5. The computer system according to any one of claims 1 to 4, further comprising a configuration management table that stores data indicating whether a failure has occurred in each constituent element and data indicating the processing that substitutes for the constituent element when the failure occurs, wherein the detection means refers to the configuration management table to detect a failure of a constituent element, and the search means refers to the configuration management table to search for processing that substitutes for the failed constituent element.

6. The computer system according to any one of claims 1 to 5, further comprising means for: causing a constituent element to repeat the same operation when a failure of that constituent element is detected; continuing operation on the assumption that no failure has occurred if the constituent element operates normally within a predetermined number of repetitions; counting the number of times operation has returned to normal through such repetition; and reporting this to the outside when the count exceeds a predetermined value.

7. The computer system according to claim 1, wherein the constituent elements are multiplexed.

8. The computer system according to claim 1, wherein the management means itself is also a constituent element capable of performing processing in place of the failed constituent element.

9. The computer system according to claim 1, wherein the constituent elements are constituent elements of a file system of the computer system.

10. A peripheral device of a computer system in which a host processor and peripheral devices are connected to a network, the peripheral device having a plurality of constituent elements and comprising: detection means for detecting a failure of a constituent element; search means for searching, when a failure occurs in a constituent element, the other internal constituent elements for an alternative constituent element that can function in place of the failed constituent element; and management means for causing the found alternative constituent element to perform processing in place of the failed constituent element.

11. A computer system comprising a host processor and a plurality of peripheral devices, each peripheral device comprising a plurality of constituent elements, the computer system further comprising: detection means for detecting a failure of a constituent element of a peripheral device; and management means for causing, when a failure occurs in a constituent element, either a peripheral device other than the one containing that constituent element or the host processor to perform processing in place of the failed constituent element.

12. The computer system according to claim 11, wherein the management means generates and sends a data string including an identifier indicating a failure recovery processing request, an identifier indicating the failed constituent element, and an identifier indicating the requested processing, thereby causing the other peripheral device or the host processor to perform the alternative processing.

13. The computer system according to claim 12, wherein the management means transmits the data string to a plurality of peripheral devices, and a peripheral device capable of processing the request stored in the data string performs the alternative processing based on the data string.

14. The computer system according to any one of claims 11 to 13, further comprising a configuration management table that stores data indicating whether a failure has occurred in each constituent element and data representing the processing or alternative means that replaces the constituent element when the failure occurs, wherein the detection means refers to the configuration management table to detect a failure of a constituent element, and the management means refers to the configuration management table to determine where the alternative processing is to be performed.

15. The computer system according to claim 11, wherein the peripheral devices are multiplexed.

16. The computer system according to claim 11, wherein the peripheral device is a file system of a computer system.

17. A computer system comprising a host processor and a plurality of mutually communicating file systems, each file system comprising a plurality of constituent elements including a disk device, the file systems being able to access one another's disk devices by cross-calls, the computer system further comprising: means for detecting a failure of a constituent element of a file system; and means for requesting, when a failure occurs in a constituent element other than the disk device, a file system other than the one containing that constituent element to perform processing in place of the failed constituent element, wherein the file system that receives the request performs the alternative processing based on the request and, by means of the cross-call, directly accesses the disk device of the file system containing the failed constituent element.

18. A peripheral device of a computer system in which a host processor and peripheral devices are connected to a network, the peripheral device having a plurality of constituent elements and comprising: means for detecting a failure of a constituent element inside the peripheral device itself; means for transmitting to another peripheral device, when a failure occurs in a constituent element, a data string requesting processing in place of the failed constituent element; means for continuing processing while bypassing the failed constituent element upon receiving the result of the requested processing from the other peripheral device; and means for performing, upon receiving from another peripheral device a data string requesting processing in place of a failed constituent element, that processing based on the request in the data string and returning the processing result to the sender of the data string.

19. A peripheral device of a computer system in which a host processor and peripheral devices are connected to a network, the peripheral device having a plurality of constituent elements and comprising: means for detecting a failure of a constituent element inside the peripheral device itself; means for transmitting to the host processor, when a failure occurs in a constituent element, a data string requesting processing in place of the failed constituent element; and means for continuing processing while bypassing the failed constituent element upon receiving the result of the requested processing from the host processor.

20. A computer system comprising a host processor and a peripheral device connected to the host processor, wherein the peripheral device has means for requesting another peripheral device or the host processor to perform processing that is normally performed inside the peripheral device itself.

21. A computer system comprising a plurality of constituent elements including a processor capable of executing a program, the computer system further comprising means for causing the processor to emulate or simulate the operation of a constituent element other than the processor, and for comparing the processing result produced by that constituent element with the result of the emulation or simulation by the processor.

22. A failure recovery method for a computer system having a plurality of constituent elements, comprising: a step of detecting a failure of a constituent element; a step of searching, when a failure occurs in a constituent element, for an alternative constituent element that can function in place of that constituent element; and a step of causing the found alternative constituent element to perform processing in place of the failed constituent element.

23. The failure recovery method for a computer system according to claim …, wherein the alternative constituent element includes a processor capable of executing a program, and the operation of the failed constituent element is reproduced by emulation or simulation using the processor.

24. The failure recovery method for a computer system according to claim 23, wherein the component in which the failure has occurred is a data compression mechanism, and the processor emulates data compression.

25. The failure recovery method for a computer system according to any one of claims 22 to 24, further using a configuration management table that stores data indicating whether a failure has occurred in each constituent element and data representing the processing that substitutes for the constituent element when the failure occurs, wherein the detecting step refers to the configuration management table to detect a failure of a constituent element, and the searching step refers to the configuration management table to search for processing that substitutes for the failed constituent element.

26. The failure recovery method for a computer system according to any one of claims 22 to 25, further comprising a step of: causing a constituent element to repeat the same operation when a failure of that constituent element is detected; continuing operation on the assumption that no failure has occurred if the constituent element operates normally within a predetermined number of repetitions; counting the number of times operation has returned to normal through such repetition; and reporting this to the outside when the count exceeds a predetermined value.

27. A failure recovery method for a computer system according to claim 22, wherein said constituent elements are multiplexed.

28. The failure recovery method for a computer system according to claim 22, wherein the component is a component that constitutes a file system of the computer system.

29. A failure recovery method for a computer system comprising a host processor and a plurality of peripheral devices, each peripheral device comprising a plurality of constituent elements, the method comprising: a step of detecting a failure of a constituent element of a peripheral device; and a step of causing, when a failure occurs in a constituent element, either a peripheral device other than the one containing that constituent element or the host processor to perform processing in place of the failed constituent element.

30. The failure recovery method for a computer system according to claim 29, wherein the step of causing the alternative processing generates and transmits a data string including an identifier indicating a failure recovery processing request, an identifier indicating the failed constituent element, and an identifier indicating the requested processing, thereby causing the other peripheral device or the host processor to perform the alternative processing.

31. The failure recovery method for a computer system according to claim 30, wherein the step of causing the alternative processing transmits the data string to a plurality of peripheral devices, and a peripheral device capable of processing the request stored in the data string performs the alternative processing based on the data string.

32. The failure recovery method for a computer system according to any one of claims 29 to 31, further using a configuration management table that stores data indicating whether a failure has occurred in each constituent element and data representing the processing or alternative means that replaces the constituent element when the failure occurs, wherein the detecting step refers to the configuration management table to detect a failure of a constituent element, and the step of causing the alternative processing refers to the configuration management table to determine whether to perform the alternative processing.

33. A failure recovery method for a computer system according to claim 29, wherein said peripheral devices are multiplexed.

34. The computer system failure recovery method according to claim 29, wherein the peripheral device is a file system of a computer system.
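Claims 6 and 26 describe a retry-and-count scheme with four steps. A failed operation is retried. Success within a bounded number of retries is treated as normal operation. Such recoveries are counted, and the condition is reported externally once the count exceeds a threshold. A minimal Python sketch of that logic follows. The class, its parameter names and the `report` callback are hypothetical, and a raised exception stands in for a detected failure.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryMonitor:
    """Retry-and-count sketch based on claims 6 and 26 (hypothetical API).

    max_retries and report_threshold stand in for the two unspecified
    "predetermined" values; report is a hypothetical hook for transmitting
    the condition to the outside.
    """

    def __init__(self, max_retries: int, report_threshold: int,
                 report: Callable[[int], None]) -> None:
        self.max_retries = max_retries
        self.report_threshold = report_threshold
        self.report = report
        self.recoveries = 0  # times a retry turned a failure back into normal operation

    def run(self, operation: Callable[[], T]) -> T:
        last_exc: Optional[BaseException] = None
        for attempt in range(1 + self.max_retries):
            try:
                result = operation()
            except Exception as exc:
                last_exc = exc
                continue  # repeat the same operation
            if attempt > 0:
                # Recovered within the retry budget: carry on as if no failure
                # occurred, but count it and report once past the threshold.
                self.recoveries += 1
                if self.recoveries > self.report_threshold:
                    self.report(self.recoveries)
            return result
        # Still failing after all retries: treat it as a real fault, for example
        # by handing over to the substitute search of claim 1.
        raise RuntimeError("component still failing after retries") from last_exc


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky() -> str:
        attempts["n"] += 1
        if attempts["n"] % 2 == 1:  # every odd call fails transiently
            raise IOError("transient error")
        return "ok"

    monitor = RetryMonitor(max_retries=2, report_threshold=1,
                           report=lambda n: print("report:", n))
    print(monitor.run(flaky))
    print(monitor.run(flaky))
```

This sketch reports on every recovery after the threshold is passed. A real controller might report only once or reset the count, and the claims do not specify which.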