JPH03500585A

JPH03500585A - Enhanced input/output architecture for toroidally connected distributed storage parallel computers

Info

Publication number: JPH03500585A
Application number: JP63508682A
Authority: JP
Inventors: Cok, Ronald Steven
Original assignee: Eastman Kodak Company
Priority date: 1987-10-08
Filing date: 1988-09-29
Publication date: 1991-02-07
Also published as: EP0334943B1; WO1989003564A1; EP0334943A1; DE3867837D1; US4942517A

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] This invention relates to improving the input/output capability and efficiency of toroidally connected distributed-memory parallel computers.

Background Art: Many computer designers have turned to parallel processing to get more processing power for computation. However, many different computer architectures employ parallel processing.

Parallel processing computers fall into two general types: those with shared-memory processors and those with distributed-memory processors. A shared-memory computer includes multiple processors that can all access the same memory. In contrast, a distributed-memory computer has processors that each have their own independent memory, and these processors communicate through a communication interconnect. This invention relates specifically to distributed-memory parallel processing computers, and more particularly to toroidally connected computers.

Several factors limit the performance of a toroidally connected distributed-memory parallel computer: the total memory capacity of the computer, the number of processors in the computer, the interprocessor communication that a given problem requires, and the rate at which data can be moved into or out of the computer. In most parallel computers the first two factors are relatively unimportant, because the number of processors and their memory capacity can easily be changed to fit whatever requirements exist. The third factor, interprocessor communication, has been the subject of extensive and intensive research (see Lin and Maltpan, "M2 Mesh: An Augmented Mesh Architecture," Proceedings of the 1st International Conference on Parallel Processing, p. 306; and Carlson, "The Mesh with a Global Mesh: A Flexible, High-Speed Organization for Parallel Computation," Proceedings of the 1st International Conference on Supercomputing, p. 618).

For applications that use large data sets (for example, image processing or database searching), input/output (I/O) requirements can also be a significant factor limiting the overall throughput of a parallel computer.

In this specification, the term "toroidal" is understood to include the term "mesh."

Two general modes of parallel processing are in common use: distributed and pipelined. In distributed processing, every processor performs the same operation on a different subset of the data. In pipelined processing, every processor performs a different operation on the same set of data. Toroidally connected computers are particularly well suited to operating in both modes, but they also have problems.
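The difference between the two modes can be sketched as a sequential Python simulation. The function names `distributed` and `pipelined` are illustrative only and do not come from the patent. In the distributed case, each simulated processor applies the same operation to its own subset of the data. In the pipelined case, each processor is a stage that applies a different operation to every item of the same data set.

```python
from typing import Callable, List

Op = Callable[[int], int]


def distributed(data: List[int], op: Op, n_procs: int) -> List[List[int]]:
    """Same operation, different data: processor p works on its own subset."""
    # Round-robin partition: processor p receives elements p, p + n, p + 2n, ...
    subsets = [data[p::n_procs] for p in range(n_procs)]
    return [[op(x) for x in subset] for subset in subsets]


def pipelined(data: List[int], stages: List[Op]) -> List[int]:
    """Different operations, same data: each stage stands for one processor."""
    results = []
    for item in data:
        # Each item passes through every stage in order; a real pipeline would
        # overlap the stages in time, which this sequential loop does not model.
        for stage in stages:
            item = stage(item)
        results.append(item)
    return results


if __name__ == "__main__":
    data = list(range(8))
    print(distributed(data, lambda x: x * x, n_procs=4))
    # [[0, 16], [1, 25], [4, 36], [9, 49]]
    print(pipelined(data, [lambda x: x + 1, lambda x: x * 2]))
    # [2, 4, 6, 8, 10, 12, 14, 16]
```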

Attempts to provide adequate I/O usually place an I/O channel at each processor node and write to the various processors through a control channel. This approach suits distributed processing, but it is relatively expensive in chip cost and circuit-board area (see U.S. Pat. No. 4,514,807 to Nagi, "Parallel Computer," and the commercial products "Connection Machine" from Thinking Machines Corporation, "Hypercube" from Intel Scientific Computers, and "NCUBE/IO" from NCUBE). In addition, the device that feeds the I/O channels becomes another bottleneck.

An alternative approach, occasionally taken, performs I/O with only one processor and uses the interconnection network of the parallel processors to distribute the data further. This approach is inexpensive, but bandwidth limits it.

Neither of these common approaches can perform I/O with multiple data sources or destinations, which limits their usefulness when working with other computers and with data storage or data generation devices. Neither approach can be equally effective for both pipelined and distributed parallel processing, and neither can provide flexible I/O rates. A main advantage of a parallel system is the flexibility of its performance: processors can be added to improve computational performance. It is therefore important that I/O performance and hardware scale in the same way.

An object of this invention is to provide a more effective architecture for toroidally connected distributed-memory parallel computers that eliminates the problems described above.

Disclosure of the Invention: This object is achieved by an improvement to a toroidally connected distributed-memory parallel computer that has a plurality of rows of processors, each processor having independent memory. The improvement comprises: a) at least one common I/O channel; b) a row of processors (an I/O row) adapted to be connected to the common I/O channel; and c) a plurality of buffer mechanisms, each associated with a particular processor of the I/O row. Each buffer mechanism is adapted to connect the I/O channel to its processor so that any given I/O processor in the I/O row can access its buffer mechanism independently of two other kinds of access: access to that buffer mechanism by the I/O channel, and access to the buffer mechanisms of the other processors in the I/O row. This arrangement effectively facilitates data distribution.
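The following minimal sketch models the access rules of the structure just described. It assumes a single I/O row of `cols` processors and represents each buffer mechanism as a Python list. The class and method names (`IORow`, `channel_write`, `processor_read`) are hypothetical. The sketch shows only that the common channel may address any buffer, whereas each I/O-row processor may access only its own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferMechanism:
    owner_col: int                       # column of the I/O-row processor it serves
    words: List[int] = field(default_factory=list)


class IORow:
    """Hypothetical model: one I/O row with one buffer mechanism per processor."""

    def __init__(self, cols: int) -> None:
        self.buffers = [BufferMechanism(c) for c in range(cols)]

    def channel_write(self, col: int, words: List[int]) -> None:
        # The common I/O channel may address any buffer in the row.
        self.buffers[col].words.extend(words)

    def processor_read(self, proc_col: int, buf_col: int) -> List[int]:
        # An I/O-row processor may access only its own buffer.
        if proc_col != buf_col:
            raise PermissionError(f"processor {proc_col} cannot access buffer {buf_col}")
        buf = self.buffers[buf_col]
        words, buf.words = buf.words, []  # hand over the contents and empty the buffer
        return words


row = IORow(cols=4)
row.channel_write(2, [7, 8])
print(row.processor_read(2, 2))  # [7, 8]
```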

A mesh or toroid can be resized or rotated without destroying its topological structure. A reference to a row of processors can therefore equally be read as a reference to a column of processors.
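The following quick check illustrates the rotation argument for the transpose case. Map node (r, c) of a rows x cols torus to node (c, r) of a cols x rows torus. Every neighbour then maps to a neighbour, so an I/O row in one orientation becomes an I/O column in the other. The helper names are hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]


def torus_neighbors(r: int, c: int, rows: int, cols: int) -> Set[Node]:
    """The four wraparound neighbours of node (r, c) in a rows x cols torus."""
    return {((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)}


def transpose_preserves_adjacency(rows: int, cols: int) -> bool:
    """Check that the map (r, c) -> (c, r) sends neighbours to neighbours."""
    for r in range(rows):
        for c in range(cols):
            mapped = {(nc, nr) for nr, nc in torus_neighbors(r, c, rows, cols)}
            if mapped != torus_neighbors(c, r, cols, rows):
                return False
    return True


print(transpose_preserves_adjacency(3, 5))  # True
```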

Brief Description of the Drawings: FIG. 1 shows a prior-art toroidally connected computer with a control processor inserted in one row.

FIG. 2 shows a toroidally connected distributed-memory parallel computer according to this invention. It uses an input/output (I/O) row with an I/O channel and associated I/O buffer mechanisms, with one I/O buffer mechanism associated with each processor in the I/O row.

FIG. 3 shows a toroidally connected computer according to this invention that includes two rows of I/O processors connected to separate I/O channels.

FIG. 4 shows an I/O buffer mechanism, used in accordance with this invention, that uses conventional memory, arbitration, and bus methods.

FIG. 5 shows another I/O buffer mechanism, used in accordance with this invention, that uses conventional dual-port memory.

FIG. 6 shows yet another I/O buffer mechanism, adapted to connect a high-speed parallel data I/O port to a processor by means of a bidirectional latch/first-in first-out (FIFO) buffer with handshake control logic.

FIG. 7 shows an embodiment of the invention, similar to FIG. 3, in which one I/O row is connected to an I/O channel implemented as a conventional bus.

FIG. 8 shows an embodiment of the invention with a parallel processor array in which one I/O row is connected to a high-speed parallel input and another I/O row is connected to a high-speed parallel output.

Best Mode for Carrying Out the Invention: This invention relates to an architecture for performing improved input/output operations in a toroidally connected parallel computer.

FIG. 1 shows a prior-art array 10 in which many processors 12 are interconnected by communication channels 14 and a node controller 16 is inserted in one row. Each processor 12 has independent memory. Node controller 16 serves as the entry point into the closed surface of toroidally connected processors. Any row of the array may be selected as an I/O row, and the number of such rows depends on the particular application intended.
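The following hypothetical counting model makes the single-entry limitation concrete. It compares two ways of delivering one word to every processor: through a single entry node, as with node controller 16, or through every processor of one row, as with the I/O row of this invention with one buffer per column. The model assumes shortest-path routing with wraparound and ignores link contention and timing. It counts only the words passing through each entry point and the total number of link hops.

```python
from typing import Tuple


def torus_hops(src: Tuple[int, int], dst: Tuple[int, int], rows: int, cols: int) -> int:
    """Shortest hop count between two nodes of a rows x cols torus (wraparound)."""
    dr = abs(dst[0] - src[0])
    dc = abs(dst[1] - src[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def single_entry_load(rows: int, cols: int) -> Tuple[int, int]:
    """One entry node at (0, 0) must inject a word for every processor."""
    hops = sum(torus_hops((0, 0), (r, c), rows, cols)
               for r in range(rows) for c in range(cols))
    return rows * cols, hops  # (words through the entry point, total hops)


def io_row_load(rows: int, cols: int) -> Tuple[int, int]:
    """Row 0 is an I/O row: each column's buffer feeds processor (0, c) directly."""
    hops = sum(torus_hops((0, c), (r, c), rows, cols)
               for r in range(rows) for c in range(cols))
    return rows, hops  # (words through each entry point, total hops)


print(single_entry_load(4, 4))  # (16, 32)
print(io_row_load(4, 4))        # (4, 16)
```

Under these assumptions, the load on each entry point falls from rows x cols words to rows words. Adding columns therefore adds entry points instead of concentrating more traffic on one node.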

FIG. 2 shows a particular I/O row connected in accordance with this invention. I/O channel 26 can access this row 20 of processors directly. An I/O channel is any mechanism that can transfer large amounts of information rapidly, typically a standard computer bus or a parallel I/O port as shown in FIGS. 4 and 5. In the I/O row, each processor is connected directly to a different buffer mechanism 24. Each buffer mechanism provides memory and control so that I/O channel 26 can communicate with any one of the buffers independently and at high speed.
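In distributed mode, the channel's main task is to scatter a block of data across the buffers of the I/O row. The sketch below assumes contiguous chunking, with any leftover words going to the first buffers. The patent does not specify a partitioning, and `scatter_to_io_row` is a hypothetical name.

```python
from typing import List


def scatter_to_io_row(words: List[int], cols: int) -> List[List[int]]:
    """Split one block into contiguous chunks, one per I/O-row buffer."""
    base, extra = divmod(len(words), cols)
    chunks, start = [], 0
    for c in range(cols):
        size = base + (1 if c < extra else 0)  # first `extra` buffers take one more word
        chunks.append(words[start:start + size])
        start += size
    return chunks


print(scatter_to_io_row(list(range(10)), 4))  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```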

Any given processor communicates with its associated buffer mechanism to complete an input or output cycle. Each processor must be able to access its associated buffer independently of any I/O-channel or processor activity at the other buffer mechanisms. Apart from the associated processor, only the I/O channel can access a buffer, and every other processor can communicate only with its own buffer. Because the I/O channel and a processor may try to communicate with the same buffer at the same time, a device must be provided to arbitrate between them. Either the I/O channel or the processor can be designated the master of the buffer, and an appropriate arbitration mechanism must then be included. It may also be useful to make the buffers large enough that the I/O channel can transfer one data set while the processor is processing another.
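The larger-buffer option at the end of the paragraph above is classic double (ping-pong) buffering. The following sketch assumes two equal banks per buffer mechanism. `PingPongBuffer` and its methods are hypothetical names, not part of the described hardware. The channel refuses a new block if the bank it would overwrite has not yet been consumed by the processor.

```python
from typing import List, Optional


class PingPongBuffer:
    """Two banks: the I/O channel fills one while the processor works on the other."""

    def __init__(self) -> None:
        self.banks: List[Optional[List[int]]] = [None, None]
        self.fill_idx = 0   # bank the channel will write next
        self.drain_idx = 0  # bank the processor will read next

    def channel_fill(self, block: List[int]) -> bool:
        if self.banks[self.fill_idx] is not None:
            return False  # processor has not consumed this bank yet
        self.banks[self.fill_idx] = block
        self.fill_idx ^= 1
        return True

    def processor_take(self) -> Optional[List[int]]:
        block = self.banks[self.drain_idx]
        if block is None:
            return None  # nothing ready yet
        self.banks[self.drain_idx] = None
        self.drain_idx ^= 1
        return block
```

For example, the channel can load two blocks back to back, but a third load is refused until the processor has taken the first.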

For any particular application, the combination of I/O channels and the processors on each channel is a design consideration. FIG. 3 shows a configuration in which two rows are used for I/O. Several rows can also be connected to the same I/O channel.

FIG. 4 shows a buffer mechanism 24 with an arbitration and control circuit 32 and a memory 34. This configuration allows memory 34 to be accessed from either processor 12 or channel 26. Such a configuration 24 has already been built and used successfully. It used standard random-access memory (RAM) for memory 34, and standard bus request/grant and data read/write signals to implement arbitration and control circuit 32. The processor is the master of circuit 24 and reads information from and writes information to the memory.

For example, when I/O channel 26 wants to communicate with memory 34, it first sends a request signal to the processor. The processor then stops communicating with memory 34 and sends an acknowledge signal back to I/O channel 26.
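The request/acknowledge exchange for the FIG. 4 arrangement can be modelled as a small state machine in which the processor is the bus master. The sketch rests on three assumptions. The class name `Fig4Arbiter` is hypothetical. The processor is assumed to poll for requests between its own memory accesses. The explicit `channel_release` step that returns mastership to the processor is not spelled out in the text.

```python
class Fig4Arbiter:
    """Buffer memory whose bus master is the processor (hypothetical model)."""

    def __init__(self, size: int) -> None:
        self.memory = [0] * size
        self.owner = "processor"
        self.request_pending = False

    def processor_write(self, addr: int, value: int) -> None:
        if self.owner != "processor":
            raise RuntimeError("memory is granted to the I/O channel")
        self.memory[addr] = value

    def channel_request(self) -> None:
        self.request_pending = True  # request signal to the processor

    def processor_poll(self) -> bool:
        """Between its own accesses, the processor checks for a request."""
        if not self.request_pending:
            return False
        self.request_pending = False
        self.owner = "channel"  # processor stops using the memory...
        return True             # ...and sends the acknowledge

    def channel_read(self, addr: int) -> int:
        if self.owner != "channel":
            raise RuntimeError("I/O channel has not been acknowledged")
        return self.memory[addr]

    def channel_release(self) -> None:
        self.owner = "processor"  # assumption: channel hands mastership back
```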

In FIG. 5, dual-port memory 36 provides double-buffered memory, that is, memory that two separate mechanisms (here the I/O channel and the processor) can access simultaneously and independently. This eliminates the need for special arbitration and control circuitry such as that shown in FIG. 6.

FIG. 6 shows another mechanism 24 that includes a latch/FIFO memory buffer 38. Buffer 38 temporarily holds data and operates as follows.

Depending on the direction of the data transfer, either the processor or the port presents data to buffer 38. The FIFO (first-in, first-out buffer) reads the data into the buffer when it receives a data-available signal.

When it receives a data-request signal, the FIFO writes the data out.

The port and the processor could communicate directly using the handshake signals described above. The FIFO, however, allows the processor or port to transfer data at different rates for short periods, until the buffer becomes full.
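The following sketch models the FIG. 6 handshake with a bounded queue. It rests on these assumptions: `HandshakeFifo` and `simulate` are hypothetical names, time advances in discrete ticks, and the writer retries on the next tick when the FIFO is full. The example pairs a fast writer (3 words per tick) with a slow reader (1 word per tick). A shallow FIFO absorbs the rate mismatch only until it fills, and after that the writer stalls.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class HandshakeFifo:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.queue: Deque[int] = deque()

    def data_available(self, word: int) -> bool:
        """Writer signals 'data available'; the FIFO accepts the word if not full."""
        if len(self.queue) >= self.depth:
            return False  # full: writer must hold the word and retry
        self.queue.append(word)
        return True

    def data_request(self) -> Optional[int]:
        """Reader signals 'data request'; the FIFO returns its oldest word."""
        return self.queue.popleft() if self.queue else None


def simulate(words: List[int], depth: int, write_per_tick: int,
             read_per_tick: int) -> Tuple[List[int], int, int]:
    """Return (words received, ticks taken, writer stalls). Rates must be >= 1."""
    fifo = HandshakeFifo(depth)
    pending = deque(words)
    received: List[int] = []
    ticks = stalls = 0
    while pending or fifo.queue:
        ticks += 1
        for _ in range(write_per_tick):
            if not pending:
                break
            if not fifo.data_available(pending[0]):
                stalls += 1
                break
            pending.popleft()
        for _ in range(read_per_tick):
            word = fifo.data_request()
            if word is None:
                break
            received.append(word)
    return received, ticks, stalls


print(simulate(list(range(8)), depth=4, write_per_tick=3, read_per_tick=1))
# ([0, 1, 2, 3, 4, 5, 6, 7], 8, 3)
```

With a depth of 8, the same transfer completes in the same number of ticks with no writer stalls.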

The elements of this invention can be arranged modularly and interconnected to provide distributed or pipelined operation. For example, a single printed wiring board can provide an I/O row: a row of processors together with the associated channel that cooperates with their buffer/control mechanisms 24. The processors can also be placed on separate printed wiring boards. FIG. 7 implements a distributed mode of operation. In FIG. 8, the elements are chosen to give a configuration that can be used in both distributed and pipelined modes, and two separate I/O-channel printed wiring boards are used. Any number of I/O boards can be placed on the bus and interconnected as one row of the processor array, and any row of the array can have a set of I/O boards and an associated channel.

FIG. 7 shows channel 26 in the form of a bus coupled to computer 50, and a display device 52 also connected to computer 50. This structure was configured for both distributed and pipelined processing modes and worked well. The processors were Inmos Transputers, manufactured by Inmos Ltd. of Bristol, England, and the control node was also a Transputer. Mechanism 24 was dynamic random-access memory.

I/O channel 26 was implemented with the industry-standard VME bus, and host computer 50 was a Sun workstation. Arbitration was achieved in the usual way, by a defined bus protocol using bus request/grant, read/write, data available/request, and acknowledge signals.

Three separate printed wiring boards were built, each providing a different memory capacity. One of these boards implemented an I/O row with four processors, as shown in FIG. 2. The boards were interconnected as described above to form computers of various sizes, from 4 to 96 processors, with various row lengths and column heights in the toroidal array. While processing large data sets, these computers delivered up to 70 times the performance of the workstation host, without encountering significant communication bottlenecks to the parallel processors.

With this invention, the I/O structure of the processor array can be changed easily. For pipelined applications, as shown in FIG. 8, one I/O channel on one row can serve as the data input and another I/O channel as the data output. Data then flows through the structure and is processed at each stage. If higher or lower data throughput is required, the array (and its associated I/O rows) can be made wider or narrower while the number of processors is kept constant. Alternatively, if the processing rate must change while the I/O throughput stays constant, the array can be made taller or shorter by adding or removing non-I/O rows of processors. Adding or removing these processors changes the processing rate, but no I/O hardware is added. The result is cost-effective, flexible performance for both computational power and I/O throughput.
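The scaling argument can be expressed as a simple, hypothetical linear model. I/O throughput grows with the number of buffers on the I/O rows, which equals the array width. Compute grows with the total number of processors. `ArrayShape` and the per-buffer and per-processor rates are assumptions for illustration. Real throughput would also be bounded by the I/O channel itself and by interprocessor communication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayShape:
    width: int        # processors per row = buffers per I/O row
    height: int       # total rows, I/O rows included
    io_rows: int = 1

    @property
    def processors(self) -> int:
        return self.width * self.height

    def io_rate(self, per_buffer: float) -> float:
        # Assumption: every I/O-row buffer sustains `per_buffer` words/s and the
        # I/O channel itself is not the bottleneck.
        return self.width * self.io_rows * per_buffer

    def compute_rate(self, per_processor: float) -> float:
        # Assumption: ideal linear speed-up, no communication overhead.
        return self.processors * per_processor


wide, narrow = ArrayShape(8, 4), ArrayShape(4, 8)     # same 32 processors
print(wide.io_rate(1.0), narrow.io_rate(1.0))         # 8.0 4.0
taller = ArrayShape(4, 12)                            # add non-I/O rows
print(taller.io_rate(1.0), taller.compute_rate(1.0))  # 4.0 48.0
```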

As noted above, the arrangement of FIG. 8 works equally well for distributed processing. Different I/O channels can serve as data sources, destinations, or both.

Multiple I/O channels provide an additional capability: the entire processor array can operate as a switching network. Data can be read in from a source I/O channel and directed to another I/O channel.
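One way the array might act as a switching network is sketched below using dimension-order routing on the torus. Dimension-order routing is a standard technique, but it is an assumption here, since the patent does not name a routing scheme. A word entering at column c of an input I/O row first moves along that row to the destination column. It then moves along the column to the output I/O row, taking the shorter way around each ring. `route` and `step_toward` are hypothetical names.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def step_toward(a: int, b: int, n: int) -> int:
    """One hop from a toward b on a ring of n nodes, taking the shorter way round."""
    if a == b:
        return a
    forward = (b - a) % n
    return (a + 1) % n if forward <= n - forward else (a - 1) % n


def route(src: Node, dst: Node, rows: int, cols: int) -> List[Node]:
    """Dimension-order path on a torus: along the row first, then along the column."""
    r, c = src
    path = [(r, c)]
    while c != dst[1]:
        c = step_toward(c, dst[1], cols)
        path.append((r, c))
    while r != dst[0]:
        r = step_toward(r, dst[0], rows)
        path.append((r, c))
    return path


# Input I/O row 0, output I/O row 3, on a 6 x 4 torus.
print(route((0, 0), (3, 3), rows=6, cols=4))
# [(0, 0), (0, 3), (1, 3), (2, 3), (3, 3)]
```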

Effects of the Invention: This invention improves I/O performance for toroidally connected distributed-memory parallel computers. It provides a practical and flexible implementation of a multiple-instruction, multiple-data (MIMD) parallel processor that balances the requirements of the processors against those of the I/O hardware cost-effectively. The invention uses the processor array itself: the data rate can be changed by making the array wider or narrower, so that more or fewer processors accept data on the I/O row (the buffering hardware must be able to handle this rate). A simple torus or mesh can have any number of data I/O paths if it uses multiple rows or columns, with at least one row or column for each I/O path. These properties give great flexibility in meeting the large I/O requirements of both pipelined and distributed processing. The buffers associated with a common row can also serve as a local cache for data storage, and this cache can be resized to suit different applications by extending I/O rows or adding more of them. Because the row has a high-speed interface, the individual processors and their interconnections can be relatively slow, simple, and inexpensive.

Furthermore, when multiple I/O channels are used, the processor array can function as a simple switching network.

[Drawings: FIG. 5, FIG. 7, FIG. 8; International Search Report]

Claims

[Claims]

1. A toroidally connected distributed-memory parallel computer having rows of processors, each processor having independent memory, the computer being characterized by comprising: a) at least one common I/O channel; b) a row of processors (an I/O row) adapted to be connected to the common I/O channel; and c) a plurality of buffer mechanisms, each associated with a particular processor of the I/O row, each buffer mechanism connecting its processor to the I/O channel so that any given I/O processor in the I/O row can access its buffer mechanism independently of access to that buffer mechanism by the I/O channel and independently of access to the buffer mechanisms of the other processors in the I/O row, thereby effectively facilitating data distribution.

2. The computer according to claim 1, comprising two I/O channels, each accessing a different I/O row, wherein each I/O channel and its particular I/O row have an associated plurality of buffer mechanisms.

3. The computer according to claim 1, wherein each buffer mechanism includes dual-port memory.

4. The computer according to claim 1, wherein each buffer mechanism includes a latch-type FIFO buffer.

5. The computer according to claim 2, wherein a first I/O channel receives input data in parallel and transfers the data to its I/O row, and a second I/O channel receives processed data from its I/O row and sends the data out as parallel output.