JPH07200542A

JPH07200542A - Vector processor

Info

Publication number: JPH07200542A
Application number: JP33811793A
Authority: JP
Inventors: Shoji Nakatani; 彰二中谷; Takashi Mochiyama; 貴司持山; Koji Kuroda; 浩二黒田; Katsuhiko Konno; 勝彦今野; Hiroaki Atsumi; 宏昭渥美
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-12-28
Filing date: 1993-12-28
Publication date: 1995-08-04

Abstract

PURPOSE: To improve the performance of a vector processor that sequentially feeds a series of data stored in a main storage device into arithmetic pipelines. This is done in two ways: by letting an arithmetic pipeline of low throughput run without degrading overall computing efficiency, and by separating the convergence period of vector macroinstructions such as summation and search from outrun (overtaking) prevention control. CONSTITUTION: When a pipeline whose throughput is low compared with the other pipelines is detected, the bank management part defines that pipeline's access timing to the vector registers. The low-throughput pipeline then uses the timing (SR) allocated to the memory access pipeline as its vector-register access timing (DR1/DR2/DW3). In addition, a changing part is placed on the transmission line of the outrun prevention signal, so that the signal can be modified by a notification signal indicating the convergence period of the relevant instruction.

Description

Detailed Description of the Invention

[0001] (Table of Contents) Industrial Field of Application; Prior Art (FIG. 19); Problems to Be Solved by the Invention (FIG. 19); Means for Solving the Problems (FIGS. 1 and 2); Operation; Embodiments: (a) Description of the First Embodiment (FIGS. 3 to 7), (b) Description of the Second Embodiment (FIGS. 8 to 18); Effects of the Invention

[0002]

[Industrial Field of Application] The present invention relates to a vector processing device that sequentially feeds a series of data stored in a main storage device into an arithmetic pipeline for computation.

[0003]

[Prior Art] In general, a vector processing device uses pipeline processing to operate on element data b₀, b₁, ... belonging to a vector B and/or element data c₀, c₁, ... belonging to a vector C, and takes the results a₀, a₁, ... as the element data of a vector A. If data were loaded from the main storage device directly into the arithmetic pipeline, and results stored from the arithmetic pipeline directly back into the main storage device, processing would be limited by the access speed of the main storage device and would slow down.

[0004] For this reason, a vector register composed of a plurality of bank units with an interleaved structure is usually placed between the main storage device and the arithmetic pipelines. The vector register is organized so that, among the element data belonging to one vector register, the i-th and (i+1)-th data, for example, are stored in different bank units. The read output of each bank unit is supplied to the arithmetic pipelines over a separate path, and the results obtained from the arithmetic pipelines are written into each bank unit over separate paths.
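A minimal Python sketch of the interleaving idea, assuming the simple modulo mapping that the first embodiment describes in [0032] (element i held in bank B(i mod 8)). The names `NUM_BANKS` and `bank_of` are illustrative and do not come from the patent.

```python
NUM_BANKS = 8  # eight-way interleaving, as in the first embodiment


def bank_of(element_index: int, num_banks: int = NUM_BANKS) -> int:
    """Bank unit that holds element `element_index` under modulo interleaving."""
    return element_index % num_banks


# Consecutive elements always land in different banks, so a pipeline that
# streams elements 0, 1, 2, ... moves to a new bank on every access.
print([f"B{bank_of(i)}" for i in range(10)])
# ['B0', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B0', 'B1']
```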

[0005] When there are several arithmetic pipelines and the vector registers are interleaved over, for example, eight bank units, up to eight simultaneous, parallel accesses to the vector registers can be made at the pipelines' request. It is therefore essential to manage the timing of each arithmetic pipeline's accesses to the vector registers so that no two accesses reach the same bank unit at the same time and so that every bank unit and every arithmetic pipeline operates efficiently.
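The constraint above can be stated as a per-cycle check. The sketch below is a hypothetical helper, not part of the described hardware: it takes one cycle's requests (pipeline name mapped to bank index; the names are made up) and reports every bank requested more than once.

```python
from collections import defaultdict


def find_bank_conflicts(requests: dict[str, int]) -> dict[int, list[str]]:
    """Return every bank requested by more than one pipeline in the same cycle.

    `requests` maps a pipeline name to the bank index it wants this cycle.
    With eight banks at most eight requests can be granted at once, and only
    if this function returns an empty dict.
    """
    by_bank: dict[int, list[str]] = defaultdict(list)
    for pipeline, bank in requests.items():
        by_bank[bank].append(pipeline)
    return {bank: pipes for bank, pipes in by_bank.items() if len(pipes) > 1}


print(find_bank_conflicts({"load": 0, "add": 3, "mul": 3}))  # {3: ['add', 'mul']}
```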

[0006] When element data are stored in the vector registers, element data such as b₀, b₁, ... and c₀, c₁, ... above are operated on in pairs with the same element number, so it is convenient to read each pair at the same time; they are therefore stored so as to lie in different bank units as far as possible. The hardware that accesses the vector registers consequently needs a means of storing the first storage address of each vector's element data (for example, the bank information for element 0) as well as a mechanism that prevents simultaneous access to the same bank unit, which makes the access-control hardware complex.

[0007] Conventionally, therefore, the access timing of each pipeline has been fixed as follows (see, for example, JP-A-57-31079). The vector processing device has vector registers that store multiple element data in a plurality of interleaved bank units, arithmetic pipelines and a memory access pipeline that access each element of the vector registers, and a bank management unit that manages bank slots indicating the timing at which these pipelines may access each bank unit. When each pipeline is started, the bank management unit sends out a bank slot signal that fixes that pipeline's access timing.

[0008] An example of bank slot timing in such a conventional vector processing device is described with reference to FIG. 19. In FIG. 19, B0 to B7 denote the eight bank units; LW denotes the timing at which the load pipeline of the memory access pipelines writes data from the main storage device (memory) into the vector register; and SR denotes the timing at which the store pipeline of the memory access pipelines reads data from the vector register for transfer to the main storage device.

[0009] E and F distinguish the bank slots in which two kinds of arithmetic pipelines (for example, multiplication and addition pipelines) operate. The R and W attached to E and F denote, respectively, a read (READ) of an operand from the vector register into the arithmetic pipeline and a write (WRITE) from the arithmetic pipeline into the vector register, and the digits 1, 2 and 3 attached to R and W correspond to operands OP1, OP2 and OP3 (OP1 * OP2 ⇒ OP3, where * is the operator).

[0010] (0) to (7) denote count values. As FIG. 19 shows, the bank slots allocated to pipelines of fixed pipeline length, such as the arithmetic pipelines, are conventionally assigned ER1, ER2 and EW3, while pipelines of indeterminate length, such as the memory access pipelines, are assigned LW or SR.

[0011] To speed up processing, a vector processing device starts executing subsequent instructions without waiting for the preceding instruction to complete. When the result operand of one pipeline and the input operand of another pipeline with a different processing throughput are dependent, however, overtaking-prevention control is required. Precisely tracking the state of the data links among multiple pipelines to perform exact overtaking-prevention control would cost a large amount of hardware and add considerable delay, so it is not practical.

[0012] Simple but less precise methods have therefore been used. One of them applies overtaking-prevention control not only to the targeted arithmetic pipeline but to all arithmetic pipelines whenever at least one arithmetic pipeline is in a linked state. That is, when the conventional overtaking-prevention control unit receives from the instruction issue/management unit the information that a load instruction is starting a link operation (also called chaining) with another pipeline, it activates overtaking-prevention control until that load instruction completes. When the control unit detects that the linked load pipeline may fall below a predetermined throughput because of a memory bus conflict or a similar cause, it raises the overtaking-prevention control signal and thereby suspends execution of the arithmetic pipelines and the store pipeline.
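A behavioural sketch of this blanket scheme, under stated assumptions: `LoadStatus` and the naming convention that arithmetic pipelines start with "arith" and store pipelines with "store" are hypothetical, chosen only to make the example self-contained. The point it shows is that the stop decision ignores which pipelines actually consume the load's data.

```python
from dataclasses import dataclass


@dataclass
class LoadStatus:
    chained: bool             # the load is linked (chained) to another pipeline
    throughput_at_risk: bool  # e.g. a memory-bus conflict has been detected


def conventional_stall(load: LoadStatus, pipelines: list[str]) -> set[str]:
    """Pipelines stopped this cycle under the conventional blanket scheme.

    The control signal is raised whenever a chained load might fall below its
    required throughput, and it halts every arithmetic pipeline and the store
    pipeline, whether or not they actually depend on the load's data.
    """
    if load.chained and load.throughput_at_risk:
        return {p for p in pipelines if p.startswith(("arith", "store"))}
    return set()


units = ["load0", "arith0", "arith1", "store0"]
print(sorted(conventional_stall(LoadStatus(True, True), units)))
# ['arith0', 'arith1', 'store0']
```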

[0013]

[Problems to Be Solved by the Invention] When bank slot timing is set as shown in FIG. 19, some fixed-length pipelines, such as the multiplication and addition pipelines, can produce a result every cycle, whereas others, such as the division pipeline, yield only a few bits of a result per cycle; their throughput is one third or less of that of multiplication or addition, for example 1/7.

[0014] An arithmetic pipeline of such low throughput, that is, one that needs the banks for reading and writing only a small fraction of the time, nevertheless occupies full bank slots (ER1, ER2, EW3 or FR1, FR2, FW3). This lowers overall computing efficiency and reduces the throughput of the vector processing device as a whole.
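The waste can be quantified with simple arithmetic. The sketch below assumes that each bank slot offers one vector-register access per cycle and that every result needs one access in each of its three operand slots; `idle_slot_capacity` is a hypothetical helper, and the 1/7 figure is the example throughput from [0013].

```python
from fractions import Fraction


def idle_slot_capacity(throughput: Fraction, operand_slots: int = 3) -> Fraction:
    """Accesses per cycle left unused in the slots a pipeline occupies.

    Assumes each bank slot offers one vector-register access per cycle and that
    each result needs one access per operand slot (OP1 read, OP2 read, OP3 write).
    """
    offered = Fraction(operand_slots)   # one access per cycle per slot
    used = operand_slots * throughput   # accesses actually needed per cycle
    return offered - used


print(idle_slot_capacity(Fraction(1)))      # multiply/add, one result per cycle: 0
print(idle_slot_capacity(Fraction(1, 7)))   # divide at 1/7 throughput: 18/7
```

Under these assumptions a 1/7-throughput divider leaves 18/7 of the 3 accesses per cycle idle, roughly 86% of the capacity of its slots.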

[0015] Furthermore, because the conventional overtaking-prevention control scheme takes no account of the operating phase of the other pipelines, it may stop pipelines unnecessarily. For example, operations that ultimately produce a single result, such as summation, search and extraction instructions, have a convergence period in which the intermediate results produced during the operation are combined. In principle such a convergence period can proceed independently of the link operation described above. Under conventional overtaking-prevention control, however, the stop triggered by the overtaking-prevention control signal halts even the convergence processing, which lowers processing speed.
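To make the two phases concrete, here is a sketch of a summation split into a read period and a convergence period. The patent does not say how the convergence is computed; the scheme below (one partial sum per adder pipeline stage, then pairwise folding) and the `latency` parameter are assumptions made for illustration. What matters is that the second loop reads nothing from the vector register, so it does not depend on a chained load.

```python
from collections.abc import Iterable


def vector_sum(elements: Iterable[float], latency: int = 4) -> float:
    """Sum a vector in a read period followed by a convergence period.

    Read period: each incoming element is added into one of `latency` partial
    sums, since a new element arrives every cycle while earlier additions are
    still in flight. Convergence period: the partial sums are folded pairwise
    into one result without reading any more data from the vector register.
    """
    partial = [0] * latency
    for i, x in enumerate(elements):          # read period
        partial[i % latency] += x
    while len(partial) > 1:                   # convergence period
        partial = [partial[k] + partial[k + 1] if k + 1 < len(partial) else partial[k]
                   for k in range(0, len(partial), 2)]
    return partial[0]


print(vector_sum(range(1, 11)))  # 55
```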

[0016] The present invention was devised in view of these problems. A first object of the invention is to provide a vector processing device with improved arithmetic throughput, in which an arithmetic pipeline of low throughput can perform its operations without lowering overall computing efficiency. A second object is to provide a vector processing device with improved processing speed, in which overtaking-prevention control is suppressed while an arithmetic pipeline is executing the convergence sequence of a vector macroinstruction such as summation or search.

[0017]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the first invention. Like a conventional vector processing device, the vector processing device of the first invention basically comprises a vector register that stores multiple element data in a plurality of interleaved bank units; a plurality of arithmetic pipelines and one or more memory access pipelines that access the element data of the vector register; and a bank management unit that manages bank slots indicating the timing at which these pipelines may access each bank unit. The arithmetic pipelines and memory access pipelines access the bank units of the vector register in turn to process each element.
Each bank of registers is sequentially accessed to process each element data.

[0018] In the first invention, when the plurality of arithmetic pipelines includes at least one whose arithmetic throughput is lower than that of the others, the bank management unit, as shown in FIG. 1, assigns the timings (DR1, DR2, DW3) at which that low-throughput pipeline accesses the vector register to a timing allocated to a memory access pipeline. In particular, they are assigned to the bank slot of the read timing (SR) at which the store pipeline of the memory access pipelines reads data from the vector register to store it into main storage (claims 1 and 2).
The symbols in FIG. 1 are the same as those described above for FIG. 19, except that D denotes the bank slot in which the low-throughput arithmetic pipeline operates.
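A quick capacity check shows why such sharing is plausible. This is only a necessary condition, written as a hypothetical helper: it assumes a slot offers one access per cycle and that the divider's three operand streams each run at its 1/7 throughput. It does not model bank alignment or how accesses are arbitrated against the store pipeline, which this section does not describe.

```python
from fractions import Fraction


def fits_in_one_slot(*access_rates: Fraction) -> bool:
    """Necessary (not sufficient) condition for streams to share one bank slot.

    A slot offers one vector-register access per cycle, so the combined rate of
    the streams placed in it must not exceed 1. Bank alignment and arbitration
    with the store pipeline are not modelled.
    """
    return sum(access_rates, Fraction(0)) <= 1


div_rate = Fraction(1, 7)  # per-operand access rate of a 1/7-throughput divider
print(fits_in_one_slot(div_rate, div_rate, div_rate))  # DR1, DR2, DW3: 3/7 -> True
print(fits_in_one_slot(Fraction(1), Fraction(1)))      # two full-rate streams -> False
```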

[0019] FIG. 2 is a block diagram illustrating the principle of the second invention. In FIG. 2, 21 is a vector register that stores multiple element data in a plurality of interleaved bank units; 22 denotes one or more arithmetic pipelines that take data in the vector register 21 as input operands or write operation results into the vector register 21; and 23 denotes one or more load pipelines that transfer data from the main storage unit 24 to the vector register 21.

[0020] 25 is an overtaking-prevention control unit. Suppose that, while an instruction transferring data from the load pipeline 23 to the vector register 21 is executing, an arithmetic pipeline 22 executes a subsequent arithmetic instruction whose input operand is data written by the load pipeline 23 into the vector register 21. To guarantee the execution order of the instructions, the overtaking-prevention control unit 25 temporarily suspends execution of all arithmetic pipelines 22 when it detects a condition under which processing by the subsequent arithmetic pipeline 22 would overtake execution of the load pipeline 23.

[0021] In the second invention, a changing unit 26 is newly provided. Some vector instructions require a read period, in which data are supplied from the vector register 21, followed by a convergence period, in which the results are combined. For such instructions, the changing unit 26 modifies the overtaking-prevention control signal output by the overtaking-prevention control unit 25 so that overtaking-prevention control is not applied to an arithmetic pipeline 22 that is executing the convergence processing of that instruction (claim 3).
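The effect of the changing unit can be modelled as a simple gate on the stop signal. This sketch is behavioural only; `gate_prevention_signal` and the pipeline names are hypothetical, and the patent's actual circuitry is described in the second embodiment, which is not part of this section.

```python
def gate_prevention_signal(prevention: bool, converging: dict[str, bool]) -> dict[str, bool]:
    """Stop signal each arithmetic pipeline actually sees after the changing unit.

    prevention: raw overtaking-prevention signal from control unit 25 (True = stop).
    converging: pipeline name -> True while that pipeline is in its convergence period.
    """
    return {pipe: prevention and not conv for pipe, conv in converging.items()}


print(gate_prevention_signal(True, {"pipe0": True, "pipe1": False}))
# {'pipe0': False, 'pipe1': True}
```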

[0022] The changing unit 26 may be provided in the arithmetic pipeline 22 (claim 4). The arithmetic pipeline 22 executing the convergence processing may also consist of a basic arithmetic unit and an additional arithmetic unit that handles convergence, with the convergence executed by the additional unit while the basic unit executes subsequent arithmetic instructions.
In that case, during the convergence processing, the changing unit 26 may modify the overtaking-prevention control signal so that overtaking-prevention control is withheld only from the additional arithmetic unit (claim 5).
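For the claim-5 variant, the same gate is applied per unit inside one pipeline rather than per pipeline. Again a hypothetical behavioural sketch, with "basic" and "additional" as illustrative unit names.

```python
def gate_per_unit(prevention: bool, converging: bool) -> dict[str, bool]:
    """Claim-5 style gating inside one arithmetic pipeline (True = stop).

    Only the additional unit, which carries out the convergence, is exempted;
    the basic unit may already be running a later instruction, so it still
    obeys the overtaking-prevention signal.
    """
    return {"basic": prevention, "additional": prevention and not converging}


print(gate_per_unit(True, True))  # {'basic': True, 'additional': False}
```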

[0023]

[Operation] In the vector processing device of the first invention described above (claims 1 and 2), even among fixed-length pipelines such as the arithmetic pipelines, a pipeline of particularly low throughput shares its bank slot with a pipeline that uses only one bank slot, such as a memory access pipeline, so that the arithmetic pipeline is executed overlapped with it.

[0024] In the vector processing device of the second invention described above (claim 3), suppose the overtaking-prevention control unit 25 outputs the overtaking-prevention control signal while the arithmetic pipeline 22 is executing a convergence sequence. For as long as the convergence-period condition holds, the changing unit 26 modifies the signal from the control unit 25, so that overtaking-prevention control of the converging arithmetic pipeline 22 is inhibited.

[0025] When the changing unit 26 is provided in the arithmetic pipeline 22, the arithmetic pipeline executing the convergence processing ignores the overtaking-prevention signal from the control unit 25, so overtaking-prevention control of that arithmetic pipeline 22 by the control unit 25 is inhibited (claim 4). When the converging arithmetic pipeline 22 has a basic arithmetic unit and an additional arithmetic unit for convergence, the changing unit 26 modifies the overtaking-prevention control signal from the control unit 25 so that overtaking-prevention control is inhibited only for the additional arithmetic unit (claim 5).

[0026]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

(a) Description of the First Embodiment

FIG. 3 is a block diagram of a vector processing device according to the first embodiment of the present invention. In FIG. 3, 1-0, 1-1, ..., 1-n are vector registers, each of which stores multiple element data in interleaved bank units B0, B1, ..., B7 (this embodiment uses eight bank units).

[0027] 2 is a memory access pipeline, organized as a pipeline so that element data can be loaded or stored at high speed between the main storage unit 20 and the vector registers 1-0, 1-1, ..., 1-n; it comprises a load pipeline 2A and a store pipeline 2B. The load pipeline 2A loads element data from the main storage unit 20 into the vector registers 1-0, 1-1, ..., 1-n, and the store pipeline 2B stores element data held in the vector registers 1-0, 1-1, ..., 1-n into the main storage unit 20.

[0028] 3A is a write register for writing element data from the load pipeline 2A of the memory access pipeline 2 into the bank units B0 to B7 of the vector registers 1-0, 1-1, ..., 1-n. 3B-0, 3B-1, ..., 3B-m are write registers for writing element data (operation results) from the arithmetic pipelines 5-0, 5-1, ..., 5-m, described later, into the bank units B0 to B7 of the vector registers 1-0, 1-1, ..., 1-n.

[0029] 4A is a read register for reading element data stored in the vector registers 1-0, 1-1, ..., 1-n out to the store pipeline 2B of the memory access pipeline 2. 4B-0, 4C-0; 4B-1, 4C-1; ...; 4B-m, 4C-m are pairs of read registers, one pair for each of the arithmetic pipelines 5-0, 5-1, ..., 5-m described later. Each pair reads the two element data to be operated on from the bank units B0 to B7 of the vector registers 1-0, 1-1, ..., 1-n and feeds them into the corresponding arithmetic pipeline 5-0, 5-1, ..., 5-m.

[0030] 5-0, 5-1, ..., 5-m are arithmetic pipelines. Each performs an operation, such as one of the four basic arithmetic operations, on the element data input through its pair of read registers 4B-0, 4C-0; 4B-1, 4C-1; ...; 4B-m, 4C-m and outputs the result (element data).

[0031] 6 is an instruction control unit that issues the various vector operation instructions, and 7 is a bank management unit that operates on instructions from the instruction control unit 6. The bank management unit 7 manages the bank slots indicating the timing at which the memory access pipeline 2 and the arithmetic pipelines 5-0, 5-1, ..., 5-m may access the bank units B0 to B7 of the vector registers 1-0, 1-1, ..., 1-n, and has a bank slot counter 7a that regulates the timing of access to each bank unit B0 to B7.

[0032] Next, the general operation of the vector processing device described above is explained. In the vector processing device of this embodiment shown in FIG. 3, each vector register 1-0, 1-1, ..., 1-n is mapped so as to be distributed over the bank units B0 to B7. The element data of each vector register are stored in the bank units in turn: the 0th data in bank unit B0, the 1st data in bank unit B1, ..., the 7th data in bank unit B7. They are thus stored in so-called interleaved form, so that data with the same element number lie in the same bank unit.

[0033] Suppose, for example, that element data b₀, b₁, ... belonging to vector B have been loaded from the main storage unit 20 and stored in vector register 1-1, and that element data c₀, c₁, ... belonging to vector C have likewise been stored in vector register 1-2. If, in this state, the instruction control unit 6 gives the bank management unit 7 the vector addition instruction "OP1[#1VR(i)] + OP2[#2VR(i)] ⇒ OP3[#0VR(i)]", the bank management unit 7 carries out the processing steps described below (see FIG. 4). In this embodiment, the arithmetic pipeline (addition pipeline) 5-0 has three stages (see FIG. 4).

[0034] Here, #0VR(i), #1VR(i) and #2VR(i) denote the data stored in bank unit Bi of vector registers 1-0, 1-1 and 1-2, respectively. The vector addition instruction above therefore adds each element stored in vector register 1-1 to the corresponding element stored in vector register 1-2 and stores the sums (element data) in vector register 1-0.

[0035] In timing cycles T0, T1, ... (which correspond to the count values of the bank slot counter 7a), read accesses are made in turn to bank units B0, B1, ..., B7, .... As a result, element data b₀, b₁, ... and c₀, c₁, ... are read in turn from vector registers 1-1 and 1-2 through the read registers 4B-0 and 4C-0.

[0036] In timing cycle T2, data b₀ and c₀ enter step I of the arithmetic pipeline 5-0. In timing cycle T3, b₀ and c₀ enter step II while b₁ and c₁ enter step I.

[0037] In timing cycle T4, b₀ and c₀ enter step III of the arithmetic pipeline 5-0, b₁ and c₁ enter step II, and b₂ and c₂ enter step I. In timing cycle T5, data a₀, the sum of b₀ and c₀, is set in the write register 3B-0.

[0038] In timing cycle T6, data a₀ is written into bank unit B0 of vector register 1-0. Thereafter, the successively obtained data a₁, a₂, ... are set in the write register 3B-0 and written in turn into bank units B1, B2, ..., B7, B0, ... of vector register 1-0.

[0039] The arithmetic pipeline 5-0 has a one-stage buffer register on the input side for the element data of vectors B and C, which aligns their timing so that element data with the same element number (subscript) enter step I together. With this arrangement, vectors B and C can be added and the resulting vector A accessed in the order of bank units B0, B1, and so on.
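The walk-through in [0035] to [0039] can be replayed as a cycle-level schedule. This Python sketch is an illustration, not the device's control logic. It assumes that c_i is read one cycle after b_i; that is one plausible reading of why the one-stage input buffer is needed, since b_i and c_i sit in the same bank. It reproduces the stated timings (step I at T2, a₀ in the write register at T5, a₀ written to B0 at T6) and checks that no cycle touches the same bank twice.

```python
from collections import defaultdict

NUM_BANKS = 8    # bank units B0..B7
PIPE_STAGES = 3  # steps I, II and III of addition pipeline 5-0


def vector_add_schedule(n: int) -> dict[int, list[tuple[str, int, int]]]:
    """Map each timing cycle T to the (access, bank, element) tuples issued in it."""
    timeline: dict[int, list[tuple[str, int, int]]] = defaultdict(list)
    for i in range(n):
        bank = i % NUM_BANKS  # element i of every vector register sits in bank B(i mod 8)
        timeline[i].append(("read b_i from VR 1-1", bank, i))
        # Assumption: c_i is read one cycle after b_i; the one-stage input buffer
        # delays b_i so that both enter step I together in cycle i + 2.
        timeline[i + 1].append(("read c_i from VR 1-2", bank, i))
        in_write_register = i + 2 + PIPE_STAGES  # a_i set in write register 3B-0
        timeline[in_write_register + 1].append(("write a_i to VR 1-0", bank, i))
    return dict(timeline)


def has_bank_conflict(timeline: dict[int, list[tuple[str, int, int]]]) -> bool:
    """True if any cycle touches the same bank unit more than once."""
    for accesses in timeline.values():
        banks = [bank for _, bank, _ in accesses]
        if len(banks) != len(set(banks)):
            return True
    return False


schedule = vector_add_schedule(16)
for t in range(8):
    print(f"T{t}:", schedule.get(t, []))
print("bank conflict:", has_bank_conflict(schedule))
```

Under this assumption the three access streams sit at fixed offsets 0, 1 and 6 cycles from each element's first read, which is why they never collide on a bank.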

[0040] Next, the bank management unit 7, which simplifies access control for the bank units B0 to B7, is described with reference to FIGS. 5 and 6. In FIG. 5, 11-1, 11-2 and 11-3 are management registers that store the timing cycle (hereinafter called the bank slot) in which the memory access pipeline 2 or an arithmetic pipeline 5-0, 5-1, ..., 5-m accesses the vector registers 1-0, 1-1, ..., 1-n. 12 is a bank slot allocation circuit, 13 is a notification register that stores a bank slot and notifies the memory access pipeline 2 of it, and 14 is a start signal control unit.

[0041] To ensure that the pipeline devices accessing bank units B0 to B7 (memory access pipeline 2 or operation pipelines 5-0, 5-1, …, 5-m) never access the same bank unit simultaneously, and further to achieve efficient access without wasted idle time, bank management unit 7 continuously runs a single bank slot counter 7a. The counter defines timing cycles T0 to T7, called bank slots, and its output is notified to each pipeline device.

[0042] Because bank B1 is accessed one cycle later than bank B0, when the count value of counter 7a is "1", the pipeline that accessed bank B0 one cycle earlier accesses bank B1. The count value of bank slot counter 7a therefore indicates the order of the pipelines that access bank B0.
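The counter arithmetic in [0042] reduces to modular subtraction. The following is a minimal Python sketch, assuming eight banks and one bank step per cycle; the function names are illustrative, not taken from the source.

```python
NUM_BANKS = 8  # B0..B7; bank slot counter 7a counts T0..T7 modulo 8


def bank_accessed(slot, counter):
    """Bank touched by the pipeline owning `slot` when the counter reads `counter`.

    The pipeline owning slot s reaches B0 when the counter equals s and each
    following bank one cycle later, so it is at bank (counter - s) mod 8.
    """
    return (counter - slot) % NUM_BANKS


def slot_at_bank(bank, counter):
    """Slot (pipeline) whose access falls on `bank` at this counter value."""
    return (counter - bank) % NUM_BANKS


# The pipeline that used B0 at count 0 is at B1 at count 1.
print(bank_accessed(slot=0, counter=1))  # 1
# The counter value itself names the slot currently at B0.
print([slot_at_bank(0, c) for c in range(NUM_BANKS)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Since the slot-to-bank mapping is a bijection at every counter value, eight pipelines holding distinct slots can never collide on a bank.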

[0043] Each management register 11-1, 11-2, 11-3 consists of, for example, a 3-bit storage element (in practice one more bit is needed to mark the stored content as valid or invalid). It stores the bank slot number (B1) that is assigned, at start-up, to a channel (access request) through which a pipeline device transfers data to each bank unit. While the pipeline device is accessing vector registers 1-0, 1-1, …, 1-n, the register holds the bank slot number assigned to that channel.
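A hypothetical 4-bit layout for one management register is sketched below in Python. The 3-bit slot number sits in the low bits and the valid flag above it; the bit positions are an assumption, since the source gives only the widths.

```python
VALID_BIT = 0b1000  # hypothetical position of the valid/invalid flag
SLOT_MASK = 0b0111  # 3-bit bank slot number (0..7)
INVALID = 0         # register contents after the reset sent at pipeline end


def encode(slot):
    """Register value recording a valid assignment of `slot`."""
    if not 0 <= slot <= 7:
        raise ValueError("bank slot must fit in 3 bits")
    return VALID_BIT | slot


def decode(reg):
    """Return the stored slot, or None when the register is invalid."""
    return reg & SLOT_MASK if reg & VALID_BIT else None


print(decode(encode(5)))  # 5
print(decode(encode(0)))  # 0 (valid slot 0, distinct from an empty register)
print(decode(INVALID))    # None
```

The extra bit is what lets slot 0 be distinguished from "no assignment", which is why the source notes that three bits alone are not enough.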

[0044] Bank slot allocation circuit 12 is a selection circuit. From the outputs of management registers 11-1, 11-2, 11-3 and of bank slot counter 7a, it learns which bank slot numbers are in use and what the current bank slot number is, and it assigns to a pipeline device being started the bank slot number that minimizes idle time.
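One way to read "minimizes idle time" is to pick, among the free slots, the one whose turn at bank B0 comes soonest after the starting pipeline is ready. The Python sketch below adopts that interpretation. The `lead` parameter (cycles the new pipeline needs before it can access a bank) is hypothetical and not named in the source.

```python
NUM_SLOTS = 8


def allocate_slot(in_use, counter, lead=0):
    """Pick a free bank slot with the fewest idle cycles (a sketch).

    in_use:  set of slot numbers held by valid management registers
    counter: current value of bank slot counter 7a (0..7)
    lead:    hypothetical start-up latency of the pipeline being started
    Returns the chosen slot, or None when all eight slots are taken.
    """
    free = [s for s in range(NUM_SLOTS) if s not in in_use]
    if not free:
        return None
    ready = (counter + lead) % NUM_SLOTS  # first counter value the pipeline can use
    # Idle cycles for slot s: how long until s comes round to bank B0.
    return min(free, key=lambda s: (s - ready) % NUM_SLOTS)


print(allocate_slot({3, 4}, counter=2, lead=1))  # 5
print(allocate_slot(set(range(8)), counter=0))   # None
```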

[0045] Suppose, for example, that instruction control unit 6 sends a start signal to bank management unit 7 and requests memory access pipeline 2 to access main storage unit 20 and transfer the read data to vector register 1-0. When memory access pipeline 2 is started, bank slot allocation circuit 12 passes the selected bank slot number as a set number to whichever of management registers 11-1, 11-2 and 11-3 corresponds to that channel (access request), and that register stores it.

[0046] FIG. 6 illustrates this operation in detail. As shown in FIG. 6, when memory access pipeline 2 is started by the start signal, memory access begins for element data a₀, a₁, …, aₙ (assumed to be stored in main storage unit 20). After access time t_A, the read contents are loaded into the buffer registers of memory access pipeline 2, which are not shown in FIG. 5.

[0047] Because each pipeline device is assigned a predetermined bank slot (see FIGS. 1, 7 and 19), each pipeline device monitors the output of bank slot counter 7a. When the desired timing arrives, it transfers the element data a₀, a₁, …, aₙ held temporarily in its buffer registers to vector registers 1-0, … in sequence. When the last data item has been transferred, the pipeline end signal sends a reset signal to bank management unit 7, which resets and invalidates the contents of the management register 11-1, 11-2 or 11-3 corresponding to that transfer channel.

[0048] The buffer registers are a set of registers that absorb the interval between the timing at which the read output of main storage unit 20 is strobed and the point at which it is written into vector registers 1-0, … according to the bank slot. In the above example, main storage unit 20 may be busy serving requests from other access devices, so the time from start-up until vector register 1-0 is used is not constant; in other words, memory access pipeline 2 is a device of variable pipeline length. By contrast, operation pipelines 5-0, 5-1, …, 5-m have a fixed number of steps (that is, a fixed pipeline length), so the time from start-up until vector register 1-0 is used is constant and unaffected by the state of main storage unit 20. In this case, as explained with FIG. 4, the timing with which operation pipelines 5-0, 5-1, …, 5-m access vector register 1-0 is fixed.

[0049] Therefore, for devices of fixed pipeline length, the offset in access timing between channels (access requests) is known in advance. As shown in FIG. 5, start signal control unit 14 can then determine the bank slot timing from the timing of the start signal, using the bank slot numbers in use, the contents of bank slot counter 7a, and the start signal.

[0050] In this embodiment, the vector processing device performing the general operation described above has, among its operation pipelines 5-0, 5-1, …, 5-m, a division pipeline 15 (a pipeline comprising six dividers 5a to 5f) whose operation throughput is lower than that of the other operation pipelines. In this case, bank management unit 7 manages the bank slots as shown in FIG. 7.

[0051] That is, bank management unit 7 assigns the timings (DR1, DR2, DW3) at which dividers 5a to 5f of division pipeline 15 access vector registers 1-0, 1-1, …, 1-n to the bank slots of read timing SR. SR is the timing at which store pipeline 2B, part of memory access pipeline 2, reads vector registers 1-0, 1-1, …, 1-n in order to store their contents into main storage unit 20.

[0052] In FIG. 7 the symbols have the same meaning as in FIG. 19 described earlier, except that DR1 and DR2 indicate the read-operand timings of division pipeline 15 and DW3 indicates its write-operand timing. While division pipeline 15 is operating, these timings are used in a time-division manner. Division pipeline 15 and store pipeline 2B are controlled so that only one of them operates at a time. E and F indicate the bank slot allocation timings at which the addition or multiplication pipelines among operation pipelines 5-0, 5-1, …, 5-m operate.

[0053] In this embodiment, as shown in FIG. 7, while two kinds of operation pipelines are running, dividers 5a to 5f of division pipeline 15 carry out their computation in bank units B0 to B7 by sharing the bank slot of read timing SR, which arrives every 16 timing cycles and at which store pipeline 2B reads vector registers 1-0, 1-1, …, 1-n to store data into main storage unit 20.

[0054] For example, divider 5a in division pipeline 15 reads data from vector registers 1-0, 1-1, …, 1-n (DR1, DR2) at the first store read timing SR of bank unit B0 (timing cycle T4) and the first store read timing SR of bank unit B1 (timing cycle T5). Sixteen timing cycles later, at the second-period store read timing SR of bank unit B1 (timing cycle T5), it writes the division result to vector registers 1-0, 1-1, …, 1-n (DW3). It then reads data (DR1, DR2) again at the second-period store read timings SR of bank units B2 (timing cycle T6) and B3 (timing cycle T7), and repeats the same processing thereafter.

[0055] Likewise, divider 5b in division pipeline 15 reads data from vector registers 1-0, 1-1, …, 1-n (DR1, DR2) at the first store read timing SR of bank unit B3 (timing cycle T7) and the first store read timing SR of bank unit B4 (timing cycle T0). Sixteen timing cycles later, at the second-period store read timing SR of bank unit B4 (timing cycle T0), it writes the division result (DW3), and it repeats the same processing thereafter. Timings are allocated in the same way for the other dividers 5c, 5d, 5e and 5f, which perform division accordingly.

[0056] Thus, in the vector processing device of the first embodiment, a pipeline of particularly low throughput, even one of fixed length such as an operation pipeline (for example, division pipeline 15), shares bank slots with a pipeline that uses only one bank slot, such as memory access pipeline 2. The operation pipelines can then run overlapped, arithmetic processing proceeds without lowering overall computational efficiency, and operation throughput improves substantially.

[0057] In particular, among the memory access pipelines 2, load pipeline 2A, which reads from main storage unit 20, supplies the source data for operations. Sharing the bank slot of store pipeline 2B, which stores result operands, therefore improves the throughput of the vector processing device as a whole.

[0058] Although the above embodiment describes the case in which the low-throughput pipeline is a division pipeline, the invention is not limited to this. It applies equally when the low-throughput pipeline is, for example, a square-root operation pipeline, with the same effects as the embodiment above.

[0059] (b) Description of the Second Embodiment

FIG. 8 is a block diagram of a vector processing device according to a second embodiment of the invention. In FIG. 8, 21 is a vector register (VR) that stores a plurality of element data items in a plurality of interleaved bank units, and 22-1, 22-2, …, 22-n are operation pipelines, each of which takes data in vector register 21 as input operands or writes its result to vector register 21.

[0060] 23-1, 23-2, …, 23-m are memory access pipelines, pipelined so as to load or store each element data item at high speed between vector register 21 and main storage unit 24. Each memory access pipeline 23-1, 23-2, …, 23-m functions as a load pipeline 23A and a store pipeline 23B. Load pipeline 23A transfers (loads) data from main storage unit 24 to vector register 21, and store pipeline 23B transfers (stores) data held in vector register 21 to main storage unit 24.

[0061] Although each memory access pipeline 23-1, 23-2, …, 23-m in FIG. 8 is provided with load pipeline 23A and store pipeline 23B as a pair, it may have only one of them. Alternatively, any of memory access pipelines 23-1, 23-2, …, 23-m may have only load pipeline 23A or only store pipeline 23B.

[0062] 25 is an overtaking prevention control unit. While an instruction that transfers data from load pipeline 23A of memory access pipelines 23-1, 23-2, …, 23-m to vector register 21 is executing, a subsequent operation instruction that takes the data written to vector register 21 by load pipeline 23A as an input operand may be executed by one of operation pipelines 22-1, 22-2, …, 22-n. In that case, to guarantee instruction execution order, when unit 25 detects a condition in which the subsequent operation pipeline would overtake load pipeline 23A, it suspends execution of all operation pipelines 22-1, 22-2, …, 22-n. It does so by outputting an overtaking prevention control signal that, for example, is "1" while suspension is requested and "0" when processing may continue.

[0063] 27 is an instruction issue/management unit that manages the issuing of instructions to, and the progress of, operation pipelines 22-1, 22-2, …, 22-n and memory access pipelines 23-1, 23-2, …, 23-m. When issuing an instruction, unit 27 checks whether it has an operand register link with any instruction that began execution before it was issued and has not yet completed; if there is a link, unit 27 notifies overtaking prevention control unit 25, as link information, that the link condition holds.

[0064] In the second embodiment, an overtaking prevention control signal changing unit 26A and a convergence period notification unit 26B, as shown in FIG. 9, are provided for each operation pipeline 22-1, 22-2, …, 22-n. Some vector instructions require both a read processing period, during which data is supplied from vector register 21, and a subsequent convergence period, during which the results are combined. For such instructions, changing unit 26A modifies the overtaking prevention control signal output by overtaking prevention control unit 25 so that overtaking prevention control is not applied to an operation pipeline 22-1, 22-2, …, 22-n that is executing the convergence processing of the instruction.

[0065] Convergence period notification unit 26B reports, as a convergence period notification signal to the corresponding changing unit 26A, whether each operation pipeline 22-1, 22-2, …, 22-n is in the convergence period described above. It derives this from the pipeline's convergence period start signal (start final sequence; for example, a pulse at the start of the convergence period) and end signal (terminal final sequence; for example, a pulse at the end of the convergence period). It is configured, for example, as shown in FIG. 9.

[0066] In FIG. 9, 26a is an OR gate, 26b an AND gate, 26c a flip-flop and 26e an inverter (NOT gate). OR gate 26a takes the logical OR of the start signal and the data output of flip-flop 26c, and AND gate 26b takes the logical AND of the output of OR gate 26a and the end signal inverted by inverter 26e.

[0067] Flip-flop 26c sets its data output according to the output of AND gate 26b. When the start signal is input to OR gate 26a, the data output of flip-flop 26c rises and is set to "1"; when the end signal is input to AND gate 26b, the data output is reset from "1" to "0". The data output of flip-flop 26c (the convergence period notifying means) is therefore "1" while the corresponding operation pipeline 22-1, 22-2, …, 22-n is executing convergence processing (during the convergence period).
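The FIG. 9 latch can be checked with a cycle-level Python sketch. The next-state equation next_q = (start OR q) AND NOT end follows directly from gates 26a, 26b and 26e; modelling the start and end pulses as 0/1 lists, one entry per clock, is an assumption made for illustration.

```python
def convergence_flag(start_pulses, end_pulses):
    """Cycle-by-cycle data output of flip-flop 26c (FIG. 9).

    OR gate 26a combines the start pulse with the current output q,
    and AND gate 26b masks that with the end pulse inverted by 26e:
        next_q = (start OR q) AND NOT end
    trace[k] is the flip-flop output after the clock that samples cycle k.
    """
    q = 0
    trace = []
    for start, end in zip(start_pulses, end_pulses):
        q = int((start or q) and not end)
        trace.append(q)
    return trace


start = [0, 1, 0, 0, 0, 0]
end = [0, 0, 0, 0, 1, 0]
print(convergence_flag(start, end))  # [0, 1, 1, 1, 0, 0]
```

The output stays at "1" from the start pulse until the end pulse, which is exactly the convergence-period window the notification signal must cover.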

[0068] AND gate 26d, which forms overtaking prevention control signal changing unit 26A, takes the logical AND of the overtaking prevention control signal generated by overtaking prevention control unit 25 and the data output of flip-flop 26c (the convergence period notifying means) inverted by inverter (NOT gate) 26f. While the output of flip-flop 26c is "1", the pipeline is in its convergence period, so AND gate 26d forces the overtaking prevention control signal to "0". When the output of flip-flop 26c is "0", the overtaking prevention control signal passes through unchanged.
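The masking performed by AND gate 26d and inverter 26f is a single Boolean expression. The following Python sketch states it as a function and prints its truth table; the function name is illustrative.

```python
def gated_overtake_signal(overtake, converging):
    """AND gate 26d: overtaking prevention signal AND NOT convergence flag (26f)."""
    return int(bool(overtake) and not converging)


for ov in (0, 1):
    for conv in (0, 1):
        print(ov, conv, gated_overtake_signal(ov, conv))
# Only ov=1, conv=0 yields 1: a converging pipeline is never told to suspend.
```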

[0069] Next, the operation of the vector processing device configured as described above is explained with reference to FIGS. 10 to 18.

[Description of the Operation of the Relevant Instructions]

First, the operation of the relevant instructions is described; depending on the actual implementation, the operation may differ slightly from what is described here. For a sum instruction, which computes the sum of the data in vector register 21, and for a search instruction, which finds the maximum or minimum of the data in vector register 21, the source data is supplied from vector register 21.

[0070] A sum instruction accumulates a running sum in sequence. To increase processing speed, however, it generates partial running sums, one per pipeline stage, for each of the multiplexed arithmetic units. All of these partial sums are then added together to obtain the result. For the sum operation, the period during which the partial sums are added together is the convergence period. A search instruction compares and selects in sequence; here too, partial selection results, one per stage, are generated for each of the multiplexed arithmetic units, and a single final result is then selected from them. For the search operation, the period during which the single final result is obtained from the partial selection results is the convergence period.
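The two phases described in [0070] can be modelled in Python with one routine that takes the combining operation as a parameter: addition for the sum instruction, max or min for the search instruction. The round-robin assignment of element k to partial result k mod (stages × units) is an assumption made for illustration; the source does not say how elements are distributed among the partials.

```python
import operator
from functools import reduce


def staged_reduce(values, stages, units, op, identity):
    """Sketch of a pipelined reduction with a separate convergence period.

    Read processing period: element k is folded into partial result
    k % (stages * units), i.e. one partial per stage per arithmetic unit.
    Convergence period: the partials are combined into one result without
    reading the vector register again.
    """
    partials = [identity] * (stages * units)
    for k, v in enumerate(values):            # read processing period
        slot = k % len(partials)
        partials[slot] = op(partials[slot], v)
    return reduce(op, partials, identity)     # convergence period


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(staged_reduce(data, stages=4, units=2, op=operator.add, identity=0))      # 39
print(staged_reduce(data, stages=4, units=2, op=max, identity=float("-inf")))  # 9
```

The final `reduce` touches only the partials, which is why, as the next paragraph notes, the convergence period needs no vector register link checks. With floating-point data, the regrouping can change the rounding of a sum relative to a strictly sequential accumulation.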

[0071] Because processing in these convergence periods does not receive data from vector register 21, overtaking prevention control based on vector register 21 links is unnecessary during them.

[Description of General Overtaking Prevention Control]

One example of the part of the vector processing device that governs overtaking prevention control is as follows. When instruction issue/management unit 27, which manages instruction issue and progress, issues an instruction, it checks whether there is an operand register link between that instruction and any instruction that began execution before it and has not yet completed. If there is a link, it notifies overtaking prevention control unit 25 that the link condition holds.

[0072] This link condition is released when the preceding instruction that writes data to the linked vector register 21 completes and the register link is resolved. When several register links exist, the link condition is released only when all of them have been resolved. While the link condition holds, overtaking prevention control unit 25 is activated when it detects that data writing by the preceding instruction has been interrupted. Unit 25 then notifies all operation pipelines 22-1, 22-2, …, 22-n and store pipeline 23B to suspend processing; the pipeline that is the origin (parent) of the link, however, is not told to suspend. When the preceding instruction resumes writing data, or completes, unit 25 notifies all operation pipelines 22-1, 22-2, …, 22-n and store pipeline 23B to resume processing.
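The bookkeeping in [0071] and [0072] can be sketched as a small Python state object. The class, method and register names are hypothetical. Each register link is tracked individually, the link condition holds until all of them are resolved, and suspension is requested only while the preceding write is interrupted, with the link's parent excluded.

```python
from dataclasses import dataclass, field


@dataclass
class OvertakeControl:
    """Hypothetical model of the link bookkeeping in [0071]-[0072]."""
    pending_links: set = field(default_factory=set)  # registers awaiting a preceding write
    writer_stalled: bool = False                      # preceding instruction's writing interrupted

    def add_link(self, vreg):
        """Unit 27 reports an operand register link at issue time."""
        self.pending_links.add(vreg)

    def resolve(self, vreg):
        """The preceding write to `vreg` has completed."""
        self.pending_links.discard(vreg)

    @property
    def link_condition(self):
        # Released only once every register link has been resolved.
        return bool(self.pending_links)

    def suspend_targets(self, targets, parent):
        """Pipelines told to suspend now; the link's parent is never suspended."""
        if self.link_condition and self.writer_stalled:
            return [p for p in targets if p != parent]
        return []


ctl = OvertakeControl()
ctl.add_link("VR-a")
ctl.add_link("VR-b")
ctl.writer_stalled = True
targets = ["22-1", "22-2", "23B"]                  # operation pipelines plus store pipeline
print(ctl.suspend_targets(targets, parent="23A"))  # ['22-1', '22-2', '23B']
ctl.resolve("VR-a")
print(ctl.link_condition)                          # True: one link still outstanding
ctl.resolve("VR-b")
print(ctl.suspend_targets(targets, parent="23A"))  # []
```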

[0073] In practice, the throughput of data reads from vector register 21 is made equal for every operation pipeline 22-1, 22-2, …, 22-n, so that dynamic overtaking prevention control arising from throughput differences is unnecessary. The only pipeline with varying throughput that can be the parent of a link is therefore load pipeline 23A. Overtaking among operation pipelines 22-1, 22-2, …, 22-n is avoided by instructing all of them to suspend and resume at the same time.

[0074] For implementation reasons, low-throughput operation pipelines do in fact exist in the vector processing device. In such cases, the low-throughput pipeline is normally prevented at the instruction issue stage from linking with other pipelines. Nevertheless, control that pursues higher performance may allow even a low-throughput pipeline to link, subject to restrictions.

[0075] For example, if the pipeline processing the preceding instruction has higher or equal throughput, a link operation that reads that pipeline's write results is possible. What is prohibited is a higher-throughput pipeline linking to a lower-throughput operation pipeline in order to use its write results.

[0076] [Description of the Read Processing Period]

The read processing period starts when operation pipeline 22-1, 22-2, …, 22-n receives from vector register 21 the first data item to be processed by an instruction from instruction issue/management unit 27, and ends when it receives from vector register 21 the last data item to be processed by that instruction.

[0077] [Case in Which Overtaking Prevention Control Signal Changing Unit 26A (AND Gate 26d) Is Provided on the Overtaking Prevention Control Unit 25 Side and Convergence Period Notification Unit 26B on the Instruction Issue/Management Unit 27 Side]

Here, instruction issue/management unit 27 is given two additional functions: notifying overtaking prevention control unit 25, together with operation pipeline information, that the processing of an operation pipeline 22-1, 22-2, …, 22-n has entered the convergence period; and notifying it, together with operation pipeline information, that the convergence period has ended. In other words, the function of convergence period notification unit 26B described with reference to FIG. 9 is provided in instruction issue/management unit 27.

[0078] Overtaking prevention control unit 25 has, for each operation pipeline 22-1, 22-2, …, 22-n, a function that notifies that pipeline of a suspension of processing (overtaking prevention control signal). An overtaking prevention control signal changing unit 26A is provided on each of the signal lines on which unit 25 outputs this signal, so that any of operation pipelines 22-1, 22-2, …, 22-n that is in its convergence period is not notified of a suspension.

[0079] Specifically, the instruction issuing/management unit 27 decodes each instruction. When it detects an instruction that has a convergence period, it sets a flag to that effect in the management unit (not shown) of the operation pipeline 22-1, 22-2, ..., 22-n to which the instruction is issued. The pipeline-side management unit calculates the time required for the read period from the vector length. It then tracks the read processing period exactly, using the temporary-suspension notifications it receives from the overtaking prevention control unit 25.

[0080] The convergence period starts when the read processing period ends. At that moment, therefore, the convergence period notification unit 26B notifies the overtaking prevention control unit 25 that the operation pipeline 22-1, 22-2, ..., 22-n has entered its convergence period. As described above, this convergence period notification signal drives the signal line for that operation pipeline to "1" while the pipeline is in its convergence period, and to "0" at all other times.
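The read-period bookkeeping of [0079] and [0080] can be pictured with a small cycle-level model. This is only a sketch, and it rests on assumptions the patent does not state:

- the class `ReadPeriodTracker` and the parameter `elements_per_cycle` are hypothetical;
- the read period is taken to last ceil(VL / elements_per_cycle) cycles;
- each cycle in which the overtaking prevention control signal suspends the pipeline stretches the read period by one cycle.

```python
import math


class ReadPeriodTracker:
    """Hypothetical model of the pipeline-side management unit in [0079]-[0080].

    Tracks the read processing period of an instruction that has a convergence
    period and drives the convergence period notification line (1 = converging).
    """

    def __init__(self, vector_length: int, elements_per_cycle: int) -> None:
        # Assumption: the read period is the number of cycles needed to stream VL elements.
        self.remaining_read_cycles = math.ceil(vector_length / elements_per_cycle)
        self.converging = False

    def tick(self, suspended: bool) -> int:
        """Advance one machine cycle and return the notification signal (0 or 1)."""
        if not self.converging and self.remaining_read_cycles > 0:
            if not suspended:  # a suspended cycle does not shorten the read period
                self.remaining_read_cycles -= 1
            if self.remaining_read_cycles == 0:
                self.converging = True  # read period over: the convergence period begins
        return int(self.converging)

    def end_convergence(self) -> None:
        """Called when the instruction completes; the notification line returns to 0."""
        self.converging = False


tracker = ReadPeriodTracker(vector_length=64, elements_per_cycle=16)
print([tracker.tick(s) for s in (False, True, False, False, False)])  # [0, 0, 0, 0, 1]
```

In the usage line, the second cycle is suspended. The four-cycle read period therefore ends on the fifth cycle, and only then does the notification line rise.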

[0081] As described above, the AND gate 26d that forms the overtaking prevention control signal changing unit 26A is placed on the signal line that outputs the overtaking prevention control signal. This gives the following behavior:

- When the convergence period notification signal for the corresponding operation pipeline 22-1, 22-2, ..., 22-n (the data output of the flip-flop 26c) is "0", the overtaking prevention control signal passes through the AND gate 26d unchanged and is delivered to that operation pipeline.
- When the convergence period notification signal is "1", the AND gate 26d changes the overtaking prevention control signal to "0" before it is delivered to the corresponding operation pipeline.
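The behavior of the AND gates 26d can be written as one Boolean rule per pipeline: delivered signal = overtaking prevention control signal AND NOT convergence period notification signal. The sketch below makes two assumptions. Signal levels are 0/1. The notification signal reaches the AND gate inverted; the reference signs list inverters 26e and 26f, but this passage does not describe the exact wiring of FIG. 9. `gate_overtaking_signals` is a hypothetical name.

```python
from typing import Sequence


def gate_overtaking_signals(stall: Sequence[int], converging: Sequence[int]) -> list[int]:
    """Model of one AND gate 26d per output line of unit 25.

    stall[k]      -- overtaking prevention control signal for pipeline 22-(k+1)
    converging[k] -- convergence period notification signal (flip-flop 26c output)
    The gate is assumed to see the inverted notification signal, so a
    converging pipeline always receives 0.
    """
    if len(stall) != len(converging):
        raise ValueError("one notification signal per pipeline is required")
    return [s & (1 - c) for s, c in zip(stall, converging)]


print(gate_overtaking_signals([1, 1, 0, 0], [0, 1, 0, 1]))  # [1, 0, 0, 0]
```

Only the first pipeline is stalled. The second receives a stall request but is converging, so the request is masked.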

[0082] Conventionally, processing was suspended completely whenever the overtaking prevention control signal was output, even during the convergence period, as shown in FIG. 10. In this embodiment, an operation pipeline in its convergence period receives the modified overtaking prevention control signal, so its processing continues without interruption, as shown in FIG. 11.
[Description of the operation when the overtaking prevention control signal changing unit 26A and the convergence period notification unit 26B are provided on each operation pipeline side] Here, each operation pipeline 22-1, 22-2, ..., 22-n has its own overtaking prevention control signal changing unit 26A and convergence period notification unit 26B. The convergence period notification unit 26B uses the pipeline's own sequencer (start signal and end signal) to detect that the pipeline is in its convergence period. During that period, the overtaking prevention control signal changing unit 26A changes the overtaking prevention control signal sent from the overtaking prevention control unit 25 to that pipeline from "1" to "0".

[0083] Each operation pipeline 22-1, 22-2, ..., 22-n contains an internal sequencer (not shown) that controls the data flow. The sequencer lets the pipeline receive an instruction, a vector length, and a start signal and then execute the processing. It carries out the chain of sequences associated with each instruction without stalling, so that the various instruction sequences controlling the data flow can be executed.

[0084] The convergence period notification unit 26B receives information from the internal sequencer on the end of the read processing period and the start of the convergence period. Outside the convergence period, the overtaking prevention control signal changing unit 26A (AND gate 26d) passes the overtaking prevention control signal through unchanged. During the convergence period of the corresponding operation pipeline, it changes the signal to "0".
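For the per-pipeline arrangement of [0082] to [0084], a minimal sketch is a clocked model. The flip-flop 26c is set by the internal sequencer's start-of-convergence signal and cleared by its end signal, and its output masks the incoming overtaking prevention control signal. The following are illustrative assumptions, not details taken from FIG. 9:

- the class `PipelineSideGate`;
- clear taking priority over set;
- the state change becoming visible one cycle later.

```python
class PipelineSideGate:
    """Hypothetical per-pipeline model of units 26B and 26A ([0082]-[0084]).

    `q` stands in for flip-flop 26c: set by the internal sequencer's
    "convergence period start" signal, cleared by its "end" signal.
    """

    def __init__(self) -> None:
        self.q = 0

    def clock(self, stall_in: int, conv_start: int = 0, conv_end: int = 0) -> int:
        out = stall_in & (1 - self.q)  # combinational path through AND gate 26d
        # State update at the clock edge (assumption: clear has priority over set).
        if conv_end:
            self.q = 0
        elif conv_start:
            self.q = 1
        return out


gate = PipelineSideGate()
trace = [
    gate.clock(1),
    gate.clock(1, conv_start=1),
    gate.clock(1),
    gate.clock(1, conv_end=1),
    gate.clock(1),
]
print(trace)  # [1, 1, 0, 0, 1]
```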

[0085] The same result holds when the overtaking prevention control signal changing unit 26A and the convergence period notification unit 26B are provided on each operation pipeline side, as described above. Conventionally, processing was suspended completely whenever the overtaking prevention control signal was output, even during the convergence period (FIG. 10). In this embodiment, an operation pipeline in its convergence period receives the modified signal and continues without interruption (FIG. 11).

[0086] [Example configuration of an operation pipeline and description of its operation] As an example of an operation pipeline, FIG. 12 shows a summation operation pipeline 22A with four pipelined adders 22a to 22d. Each adder consists of four stages, as shown in FIG. 13. In FIG. 13:

- 28a is a first-stage register, 28b a second-stage register, 28c a third-stage register, and 28d a fourth-stage register;
- 28e is a transfer relay register, and 29 is a selector;
- 30 is an exponent difference calculation means, 31 a digit alignment means, 32 an addition/subtraction means, and 33 a normalization means.

The number of stages in each adder and the number of adders are chosen to suit each processing device. Depending on the instruction being processed (for example, a search operation instruction), the adders can also be used as comparison/selection means.

[0087] Each of the adders 22a to 22d works in four stages:

1. exponent comparison, prior to digit alignment;
2. digit alignment;
3. addition/subtraction;
4. normalization.

Each adder handles the vector-register elements whose data element number leaves a particular remainder when divided by 4: remainder 0 for adder 22a, 1 for adder 22b, 2 for adder 22c, and 3 for adder 22d.

[0088] So that the results of the adders 22a to 22d can be combined into one, results can be transferred from the adders 22b to 22d to the adder 22a. In a summation operation, the vector register 21 supplies data during the read processing period. During that period, the data in the fourth stage is returned to the first stage, so the adders 22a to 22d operate as accumulating adders. The four adders work in parallel, and by the end of the read processing period each of them holds four partial sums.

[0089] The adder 22a generates four partial sums: $\sum_i E_{16i}$, $\sum_i E_{16i+4}$, $\sum_i E_{16i+8}$, and $\sum_i E_{16i+12}$. Here $E_i$ is the value of the data element with element number $i$. The partial sums can be combined during the convergence period in many ways; FIG. 14 shows one example.
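The accumulation of [0087] to [0089] can be checked with a value-level sketch that ignores timing. It adds two assumptions beyond the text:

- each adder accepts one element per machine cycle;
- the sum leaving stage 4 re-enters stage 1 exactly four cycles later.

Under these assumptions, element $i$ lands in adder $i \bmod 4$ and in accumulation slot $\lfloor i/4 \rfloor \bmod 4$. The function name `read_period_partial_sums` is hypothetical.

```python
def read_period_partial_sums(elements, n_adders=4, n_stages=4):
    """Value-level sketch of the read processing period of summation pipeline 22A.

    Returns partial[a][s]: the partial sum held in slot s of adder a
    (a = 0..3 stands for adders 22a..22d). Timing is not modelled.
    """
    partial = [[0.0] * n_stages for _ in range(n_adders)]
    for index, value in enumerate(elements):
        adder = index % n_adders   # 22a takes index % 4 == 0, 22b takes 1, ...
        cycle = index // n_adders  # assumed: each adder accepts one element per cycle
        slot = cycle % n_stages    # stage-4 output is fed back to stage 1
        partial[adder][slot] += value
    return partial


p = read_period_partial_sums(range(64))
# Slots of adder 22a: sums of E_16i, E_16i+4, E_16i+8, E_16i+12 with E_i = i.
print(p[0])  # [96.0, 112.0, 128.0, 144.0]
```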

[0090] First, each of the adders 22a to 22d adds its four partial sums together to produce one partial result. The four partial results are then added to compute the final result. Taking the adder 22a as an example again, its partial result is generated as follows. Which partial sum occupies which stage when the convergence period begins depends on the vector length (VL). Dummy additions are therefore performed until the partial sum $\sum_i E_{16i}$ is set in the first-stage register 28a, which guarantees the order of operations. The first-stage register 28a holds $\sum_i E_{16i}$ for 2τ.

[0091] Meanwhile, the partial sum $\sum_i E_{16i+4}$ is set in the other first-stage register 28a. The partial sums $\sum_i E_{16i}$ and $\sum_i E_{16i+4}$ then move from the first stage to the second stage, where their addition starts. At the same time, the partial sum $\sum_i E_{16i+8}$ is set in the first-stage register 28a and held for 2τ. While it is held, the partial sum $\sum_i E_{16i+12}$ is set in the other first-stage register 28a. The partial sums $\sum_i E_{16i+8}$ and $\sum_i E_{16i+12}$ are then added.

[0092] Next, the partial sum $\{\sum_i E_{16i} + \sum_i E_{16i+4}\}$ is set in the first-stage register 28a and held for 3τ. Meanwhile, the partial sum $\{\sum_i E_{16i+8} + \sum_i E_{16i+12}\}$ is set in the other first-stage register 28a. The two are added to obtain the intermediate result of the adder 22a. Each of the adders 22a to 22d obtains its intermediate sum in this way, in parallel, during the convergence period.

[0093] The adder 22a computes the final result in three steps, each time holding a value in the first-stage register 28a and waiting for another adder's intermediate sum to be set in the other first-stage register 28a:

1. The adder 22a holds its own intermediate sum and waits for the intermediate sum of the adder 22b. The two are added to obtain intermediate sum A.
2. Intermediate sum A is held while the adder waits for the intermediate sum of the adder 22c. The two are added to produce intermediate sum B.
3. Intermediate sum B is held while the adder waits for the intermediate sum of the adder 22d. The two are added to produce the final result.
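The combination order of [0090] to [0093] amounts to a fixed reduction tree. Within each adder it computes $(S_0 + S_1) + (S_2 + S_3)$, then across adders $((I_a + I_b) + I_c) + I_d$. Fixing the order is significant because floating-point addition is not associative. The dummy additions of [0090] ensure the same order is used no matter which stage each partial sum occupies when the convergence period begins. The sketch below reproduces only the order of additions, not the 2τ/3τ register timing, and `converge` is a hypothetical name. The additional units 35a to 35d follow the same pattern in [0105] to [0108].

```python
def converge(partial):
    """Combine per-adder partial sums in the fixed order of [0090]-[0093].

    partial[a] = [S0, S1, S2, S3] for adder a; for adder 22a these are the
    sums of E_16i, E_16i+4, E_16i+8 and E_16i+12.
    """
    # Step 1, all adders in parallel: (S0 + S1) + (S2 + S3).
    intermediates = [(s[0] + s[1]) + (s[2] + s[3]) for s in partial]
    # Step 2, adder 22a only: ((I_a + I_b) + I_c) + I_d,
    # i.e. intermediate sum A, then intermediate sum B, then the final result.
    result = intermediates[0]
    for value in intermediates[1:]:
        result += value
    return result


print(converge([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))  # 136
```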

[0094] [Description of an embodiment in which the present invention is applied to operation pipelines with an additional arithmetic unit] Next, consider operation pipelines 22-1, 22-2, ..., 22-n that each consist of a basic arithmetic unit and an additional arithmetic unit that handles convergence (see, for example, FIG. 17). The additional arithmetic unit performs the convergence processing. Meanwhile, the basic arithmetic unit can execute subsequent operation instructions.

[0095] In this case as well, the convergence period lasts from the moment the additional arithmetic unit is detached from the basic arithmetic unit until the instruction completes. During that period, the overtaking prevention control signal changing unit 26A stops the additional arithmetic unit from accepting the overtaking prevention control signal. This is the signal by which the overtaking prevention control unit 25 notifies each operation pipeline 22-1, 22-2, ..., 22-n of a temporary suspension of processing.

[0096] When an additional arithmetic unit is used, the internal sequencer is also divided into a basic arithmetic unit part and an additional arithmetic unit part. The sequencer of the basic arithmetic unit part starts the sequencer of the additional arithmetic unit part. It also tells the additional part that it will be detached from the basic part when the convergence period begins. Once detached, the sequencer of the additional arithmetic unit part starts the convergence sequence.

[0097] The convergence period notification unit 26B receives information from the sequencer of the additional arithmetic unit part on the end of the read period and the start of the convergence period. Outside the convergence period, the overtaking prevention control signal changing unit 26A passes the overtaking prevention control signal through unchanged. During the convergence period of the additional arithmetic unit of the corresponding operation pipeline 22-1, 22-2, ..., 22-n, it changes the signal to "0".

[0098] After the additional arithmetic unit part has been detached, the basic arithmetic unit part waits to be started by the instruction issuing/management unit 27. Until the additional part finishes its operation, the basic part can execute operations that do not require the additional part. The operation pipeline 22-1, 22-2, ..., 22-n can again execute an instruction that uses the additional arithmetic unit in either of two cases:

- the additional arithmetic unit part has completed its instruction, and the basic arithmetic unit part has completed the read processing period of an instruction that does not use the additional part;
- both the additional arithmetic unit part and the basic arithmetic unit part have completed their instructions.
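The availability rule at the end of [0098] reduces to a Boolean condition. The sketch below encodes it with status flags. `UnitStatus`, its fields, and `can_issue_additional_instruction` are illustrative names, not signals defined in the patent.

```python
from dataclasses import dataclass


@dataclass
class UnitStatus:
    """Hypothetical status flags for one operation pipeline with an additional unit."""
    additional_done: bool       # additional unit has completed its (convergence) instruction
    basic_on_plain_instr: bool  # basic unit runs an instruction that does not use the additional unit
    basic_read_done: bool       # ...and that instruction's read processing period has finished
    basic_done: bool            # basic unit has completed its instruction


def can_issue_additional_instruction(s: UnitStatus) -> bool:
    """Condition of [0098] for issuing an instruction that uses the additional unit."""
    case_1 = s.additional_done and s.basic_on_plain_instr and s.basic_read_done
    case_2 = s.additional_done and s.basic_done
    return case_1 or case_2


print(can_issue_additional_instruction(UnitStatus(True, True, True, False)))   # True
print(can_issue_additional_instruction(UnitStatus(False, True, True, False)))  # False
```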

[0099] The same benefit applies to operation pipelines with an additional arithmetic unit that performs convergence processing. Conventionally, processing was suspended completely whenever the overtaking prevention control signal was output, even during the convergence period (FIG. 15). In this embodiment, the additional arithmetic unit of an operation pipeline in its convergence period receives the modified signal and continues without interruption (FIG. 16).

[0100] [Example configuration of an operation pipeline with additional arithmetic units and description of its operation] As an example of an operation pipeline with additional arithmetic units, FIG. 17 shows a summation operation pipeline 22B with four composite arithmetic units 36a to 36d. Each composite unit combines two pipelined parts, both with four stages and both configured like the adder shown in FIG. 13:

- a basic arithmetic unit, 34a to 34d;
- an additional arithmetic unit, 35a to 35d.

[0101] The number of stages and the number of adders are chosen to suit each processing device. Depending on the instruction being processed (for example, a search operation instruction), the adders can also be used as comparison/selection means. The arithmetic units (adders) 34a to 34d and 35a to 35d use the same four stages as before: exponent comparison prior to digit alignment, digit alignment, addition/subtraction, and normalization.

[0102] Each composite arithmetic unit handles the elements of the vector register 21 whose data element number leaves particular remainders when divided by 8: 0 or 1 for unit 36a, 2 or 3 for unit 36b, 4 or 5 for unit 36c, and 6 or 7 for unit 36d.

[0103] So that the results of the additional adders 35a to 35d can be combined into one, results can be transferred from the additional adders 35b, 35c, and 35d to the additional adder 35a. During the read processing period, the basic arithmetic section 34 (units 34a to 34d) adds pairs of consecutive elements. It transfers each pair sum to the additional arithmetic section 35 (units 35a to 35d), which accumulates the intermediate sums it receives. When the read period ends, the additional arithmetic section holds four partial sums. For the composite arithmetic unit 36a, for example, these are:

- $\sum_i (E_{32i} + E_{32i+1})$;
- $\sum_i (E_{32i+8} + E_{32i+9})$;
- $\sum_i (E_{32i+16} + E_{32i+17})$;
- $\sum_i (E_{32i+24} + E_{32i+25})$.

Here $E_i$ is the value of the data element with element number $i$.
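For the pipeline 22B, the same kind of value-level sketch shows where the pair sums of [0103] end up. It assumes the following, and ignores timing:

- eight elements enter per machine cycle, two per composite unit;
- each basic unit 34a to 34d adds its pair;
- each additional unit 35a to 35d accumulates into slot (cycle mod 4).

The function name is hypothetical.

```python
def read_period_partial_sums_22b(elements, n_units=4, n_slots=4):
    """Value-level sketch of the read processing period of summation pipeline 22B.

    Returns partial[u][s]: the partial sum in slot s of additional unit u
    (u = 0..3 stands for 35a..35d). Timing is not modelled.
    """
    elements = list(elements)
    partial = [[0.0] * n_slots for _ in range(n_units)]
    per_cycle = 2 * n_units  # assumed: eight elements enter per cycle
    for cycle, base in enumerate(range(0, len(elements), per_cycle)):
        for unit in range(n_units):
            first = base + 2 * unit  # 36a: index % 8 in {0, 1}; 36b: {2, 3}; ...
            pair_sum = sum(elements[first:first + 2])  # basic unit 34a..34d
            partial[unit][cycle % n_slots] += pair_sum  # additional unit 35a..35d
    return partial


p = read_period_partial_sums_22b(range(64))
# Slots of 35a: sums of (E_32i + E_32i+1), (E_32i+8 + E_32i+9), ... with E_i = i.
print(p[0])  # [66.0, 98.0, 130.0, 162.0]
```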

[0104] When the convergence period begins, the additional arithmetic section 35 stops accepting data from the data bus of the basic arithmetic section 34. It then computes intermediate results from the four partial sums, and finally computes the final result from the four intermediate results. FIG. 18 shows how an intermediate result is obtained, taking the composite arithmetic unit 36a as an example.

[0105] First, to guarantee the order of operations, dummy additions are performed until the partial sum $\sum_i (E_{32i} + E_{32i+1})$ reaches the first-stage register 28a. This partial sum is held in the first-stage register 28a for 2τ. Meanwhile, the partial sum $\sum_i (E_{32i+8} + E_{32i+9})$ is set in the other first-stage register 28a, and the computation of partial sum A begins.

[0106] Next, the partial sum $\sum_i (E_{32i+16} + E_{32i+17})$ is set in the first-stage register 28a and held for 2τ. Meanwhile, the partial sum $\sum_i (E_{32i+24} + E_{32i+25})$ is set in the other first-stage register 28a, and a second partial sum B is computed. Partial sum A is then held in the first-stage register 28a for 2τ while partial sum B is set in the other first-stage register 28a. The two are added to produce the intermediate result.

[0107] Each of the additional arithmetic units 35a to 35d obtains its intermediate result in this way, in parallel, during the convergence period. The additional arithmetic unit 35a then computes the final result. It holds its own intermediate result in the first-stage register 28a and waits for the intermediate result of the additional arithmetic unit 35b to be set in the other first-stage register 28a. The two are added to obtain intermediate result A. Intermediate result A is held in the first-stage register 28a while the unit waits for the intermediate result of the additional arithmetic unit 35c to be set in the other first-stage register 28a.

[0108] Intermediate result A and the intermediate result of the additional arithmetic unit 35c are then added to produce intermediate result B. Intermediate result B is held in the first-stage register 28a while the unit waits for the intermediate result of the additional arithmetic unit 35d to be set in the other first-stage register 28a. Finally, intermediate result B and the intermediate result of the additional arithmetic unit 35d are added to produce the final result.

[0109] As described above, the vector processing device of the second embodiment of the present invention handles the overtaking prevention control signal from the overtaking prevention control unit 25 as follows. While an operation pipeline 22-1, 22-2, ..., 22-n, 22A, or 22B is executing a convergence processing sequence and the convergence period condition holds, the overtaking prevention control signal changing unit 26A modifies that signal. Overtaking prevention control is therefore inhibited for an operation pipeline 22 that is performing convergence processing. This requires only a very small amount of additional hardware: the overtaking prevention control signal changing unit 26A and the convergence period notification unit 26B. With it, the overhead of overtaking control by register linking during the convergence period is avoided and performance improves.

[0110]

[Effects of the Invention] As described in detail above, the vector processing device of the present invention (claims 1 and 2) lets an operation pipeline with low throughput share bank slots with a memory access pipeline (store pipeline) that uses only one bank slot. The operation pipelines can then execute in an overlapped manner, which significantly improves operation throughput.

[0111] Further, the vector processing device of the present invention (claims 3 to 5) handles the overtaking prevention control signal from the overtaking prevention control unit 25 as follows. While an operation pipeline is executing a convergence processing sequence and the convergence period condition holds, the changing unit modifies that signal. Overtaking prevention control is therefore inhibited for the operation pipeline that is performing convergence processing, which significantly improves processing speed.

[Brief Description of the Drawings]

FIG. 1 is a diagram illustrating the principle of the first invention.

FIG. 2 is a block diagram illustrating the principle of the second invention.

FIG. 3 is a block diagram showing a vector processing device according to a first embodiment of the present invention.

FIG. 4 is a timing chart for explaining the operation of the first embodiment.

FIG. 5 is a block diagram showing a configuration example of the bank management unit of the first embodiment.

FIG. 6 is a timing chart for explaining the operation of the first embodiment.

FIG. 7 is a timing chart for explaining the operation of the first embodiment.

FIG. 8 is a block diagram showing a vector processing device according to a second embodiment of the present invention.

FIG. 9 is a circuit diagram showing the configuration of the overtaking prevention control signal changing unit of the second embodiment.

FIG. 10 is a diagram for explaining the operation of the second embodiment.

FIG. 11 is a diagram for explaining the operation of the second embodiment.

FIG. 12 is a block diagram showing a configuration example of the operation pipeline of the second embodiment.

FIG. 13 is a block diagram showing a configuration example of the adder of the second embodiment.

FIG. 14 is a timing chart for explaining an operation example of the operation pipeline of the second embodiment.

FIG. 15 is a diagram for explaining the operation of the second embodiment.

FIG. 16 is a diagram for explaining the operation of the second embodiment.

FIG. 17 is a block diagram showing a configuration example of the operation pipeline with additional arithmetic units of the second embodiment.

FIG. 18 is a timing chart for explaining an operation example of the operation pipeline with additional arithmetic units of the second embodiment.

FIG. 19 is a timing chart showing an example of general bank slot timing settings.

[Description of Reference Signs]

1-0, 1-1, ..., 1-n Vector registers
2 Memory access pipeline
2A Load pipeline
2B Store pipeline
3A, 3B-0, 3B-1, ..., 3B-m Write registers
4A, 4B-0, 4C-0, 4B-1, 4C-1, ..., 4B-m, 4C-m Read registers
5-0, 5-1, ..., 5-m Operation pipelines
6 Instruction control unit
7 Bank management unit
7a Bank slot counter
11-1, 11-2, 11-3 Management registers
12 Bank slot allocation circuit
13 Notification register
14 Start signal control unit
15 Division pipeline
15a to 15f Dividers
20 Main storage unit
21 Vector register
22, 22-1, 22-2, ..., 22-n Operation pipelines
22A, 22B Summation operation pipelines
22a to 22d Adders
23-1, 23-2, ..., 23-m Memory access pipelines
23, 23A Load pipelines
23B Store pipeline
24 Main storage unit
25 Overtaking prevention control unit
26 Changing unit
26A Overtaking prevention control signal changing unit
26B Convergence period notification unit
26a OR gate
26b AND gate
26c Flip-flop
26d AND gate
26e, 26f Inverters (NOT gates)
27 Instruction issuing/management unit
28a First-stage register
28b Second-stage register
28c Third-stage register
28d Fourth-stage register
28e Transfer relay register
29 Selector
30 Exponent difference calculation means
31 Digit alignment means
32 Addition/subtraction means
33 Normalization means
34 Basic arithmetic section
34a to 34d Basic arithmetic units
35 Additional arithmetic section
35a to 35d Additional arithmetic units
36a to 36d Composite arithmetic units

Continuation of the front page
(72) Inventor: Katsuhiko Konno, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hiroaki Atsumi, c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa

Claims


1. A vector processing device comprising: vector registers (1-0, ..., 1-n) that store a plurality of element data in units of a plurality of interleaved banks; a plurality of operation pipelines (5-0, ..., 5-m) and one or more memory access pipelines (2) that access each element data in the vector registers (1-0, ..., 1-n); and a bank management unit (7) that manages bank slots, each indicating a timing at which the operation pipelines (5-0, ..., 5-m) and the memory access pipeline (2) can access each bank unit, the operation pipelines (5-0, ..., 5-m) and the memory access pipeline (2) sequentially accessing each bank unit of the vector registers (1-0, ..., 1-n) to process each element data, wherein the plurality of operation pipelines (5-0, ..., 5-m) include at least one operation pipeline (15) having a lower operation throughput than the other operation pipelines, and, when the bank management unit (7) prescribes the timing at which the operation pipelines (5-0, ..., 5-m) and the memory access pipeline (2) access the vector registers (1-0, ..., 1-n), the operation pipeline (15) having the lower operation throughput accesses the vector registers (1-0, ..., 1-n) using the timing assigned to the memory access pipeline (2).

2. The vector processing device according to claim 1, wherein, among the plurality of operation pipelines (5-0, ..., 5-m), the operation pipeline (15) having the lower operation throughput accesses the vector registers (1-0, ..., 1-n) in the bank slot assigned as the read timing at which the store pipeline (2B) of the memory access pipeline (2) reads the vector registers (1-0, ..., 1-n) to perform a store operation to the main storage unit (20).
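The slot-sharing idea of claims 1 and 2 can be illustrated with a small scheduling model. This is a minimal sketch under stated assumptions, not the circuit of the bank management unit (7): it assumes 8 interleaved banks, assumes a port holding slot offset k visits bank (t - k) mod 8 at cycle t, and uses hypothetical port names built from the DR1/DR2/DW3/SR timing labels in the abstract. The low-throughput divide pipeline receives no slot of its own and is mapped onto the store pipeline's read (SR) slot. How the divider multiplexes its DR1/DR2/DW3 accesses inside that one slot, and how the real hardware arbitrates a store and a divide running at once, are not modeled.

```python
NUM_BANKS = 8  # hypothetical interleave factor of the vector registers


def build_slot_table(ports, store_read, slow_pipes):
    """Assign each access port a bank-slot offset.

    Every port in `ports` plus the store pipeline's read port gets its own
    offset; each low-throughput pipeline in `slow_pipes` is mapped onto the
    store read slot instead of consuming a new one.
    """
    own_slots = ports + [store_read]
    if len(own_slots) > NUM_BANKS:
        raise ValueError("more access ports than bank slots")
    table = {port: k for k, port in enumerate(own_slots)}
    for pipe in slow_pipes:
        table[pipe] = table[store_read]  # borrow the SR timing
    return table


def bank_for(offset, t):
    """Bank visited at cycle t by the port holding slot `offset` (assumed rule)."""
    return (t - offset) % NUM_BANKS


def schedule(table, active, t):
    """Return {bank: port} for cycle t, raising on a bank conflict."""
    busy = {}
    for port, offset in table.items():
        if port not in active:
            continue  # idle ports do not occupy their slot
        bank = bank_for(offset, t)
        if bank in busy:
            raise RuntimeError(f"bank {bank}: {busy[bank]} vs {port}")
        busy[bank] = port
    return busy


table = build_slot_table(
    ["add.DR1", "add.DR2", "add.DW3", "load.SW"], "store.SR", ["div"]
)
for t in range(NUM_BANKS):
    # With no store in flight, the divider uses the SR slot without conflict.
    schedule(table, {"add.DR1", "add.DR2", "add.DW3", "load.SW", "div"}, t)
```

The design choice this shows: a slow pipeline touches the vector registers rarely, so giving it dedicated slots would waste bank bandwidth; reusing an existing memory-access slot keeps the slot count unchanged. In this sketch, activating both `store.SR` and `div` in the same cycle raises a conflict, since they share one slot.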

3. A vector processing device comprising: a vector register (21) that stores a plurality of element data in a plurality of interleaved banks; one or more operation pipelines (22) that take data in the vector register (21) as input operands and write operation results to the vector register (21); one or more load pipelines (23) that transfer data from the main storage unit (24) to the vector register (21); and an overtaking prevention control unit (25) that, to guarantee the execution order of instructions when an operation pipeline (22) executes a subsequent operation instruction whose input operand is data being written to the vector register (21) by the load pipeline (23) during execution of an instruction, temporarily suspends execution of all the operation pipelines (22) upon detecting a condition in which the processing of the subsequent operation pipeline (22) would overtake the execution of the load pipeline (23); wherein the vector processing device further comprises a changing unit (26) that, for a vector instruction requiring a read processing period, in which data is supplied from the vector register (21), followed by a convergence period, in which the results are collected, changes the overtaking prevention control signal output from the overtaking prevention control unit (25) so that overtaking prevention control is not performed on an operation pipeline (22) that is executing the convergence processing of that vector instruction.
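The reasoning behind claim 3 is that a reduction instruction such as a total sum stops reading the vector register once its read processing period ends; during the convergence period it only combines partial results, so it cannot consume data the load pipeline has not yet written, and suspending it merely delays the reduction. The sketch below illustrates this under stated assumptions: `vector_sum` is a hypothetical two-phase sum loosely modeled on a summation pipeline with four adders (22a to 22d), the `converging` flag stands in for the convergence period notification (26B), and the changing unit is reduced to the Boolean rule "hold AND NOT converging", a simplification of the gates 26a to 26f whose actual wiring is not given here.

```python
from dataclasses import dataclass


def vector_sum(elements, lanes=4):
    """Two-phase total sum.

    Read period: each element read from the vector register is added into
    one of `lanes` partial sums. Convergence period: the partial sums are
    combined pairwise; no vector-register data is read in this phase.
    Returns the sum and the phase label of every step.
    """
    assert lanes > 0 and lanes & (lanes - 1) == 0, "lanes must be a power of two"
    partial = [0.0] * lanes
    phases = []
    for i, x in enumerate(elements):
        partial[i % lanes] += x
        phases.append("read")
    while len(partial) > 1:
        partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
        phases.append("converge")
    return partial[0], phases


@dataclass
class PipelineState:
    name: str
    converging: bool  # hypothetical stand-in for the convergence period notification


def apply_changing_unit(hold, pipes):
    """Per-pipeline hold after the changing unit: hold AND NOT converging."""
    return {p.name: hold and not p.converging for p in pipes}


total, phases = vector_sum([1, 2, 3, 4, 5, 6, 7, 8])
holds = apply_changing_unit(
    True, [PipelineState("sum", True), PipelineState("mul", False)]
)
```

Here the eight elements produce eight read steps followed by two convergence steps, and when the overtaking prevention hold is asserted, only the pipeline that is still reading operands (`mul`) is suspended.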

4. The vector processing apparatus according to claim 3, wherein the changing unit (26) is provided in the arithmetic pipeline (25).

5. The vector processing device according to claim 3 or 4, wherein the operation pipeline (22) that executes the convergence processing of the vector instruction has a basic arithmetic unit and an additional arithmetic unit for the convergence processing, the convergence processing being performed by the additional arithmetic unit, and wherein, if the basic arithmetic unit can execute another subsequent operation instruction during the convergence processing, the changing unit (26) changes the overtaking prevention control signal so that only the additional arithmetic unit, while it performs the convergence processing, is exempted from overtaking prevention control.
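The refinement in claim 5 can be contrasted with claim 3 in a few lines. This is a minimal sketch, assuming each pipeline receives separate hold signals for its basic and additional arithmetic units; the function name and the `split_units` switch are hypothetical. Without the split, the whole pipeline is exempt while converging. With the split, only the additional unit is exempt, because the basic unit may already be reading operands of a subsequent instruction from the vector register and must still be held.

```python
def hold_signals(hold, converging, split_units):
    """Hold signals for one pipeline's basic and additional arithmetic units.

    split_units=False: the whole pipeline is exempt while converging (claim 3).
    split_units=True: only the additional unit, which performs the convergence,
    is exempt; the basic unit keeps obeying the hold (claim 5).
    """
    if not split_units:
        h = hold and not converging
        return {"basic": h, "additional": h}
    return {"basic": hold, "additional": hold and not converging}


# Truth table for the split-unit case.
for hold in (False, True):
    for converging in (False, True):
        print(hold, converging, hold_signals(hold, converging, split_units=True))
```

Splitting the hold lets the convergence continue on the additional unit while the basic unit still respects load ordering for the next instruction.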