JP5732953B2 - Vector processing apparatus, vector processing method, and program

Info

Publication number: JP5732953B2
Application number: JP2011066124A
Authority: JP
Inventors: Yohei Yamada (山田 洋平)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-03-24
Filing date: 2011-03-24
Publication date: 2015-06-10
Anticipated expiration: 2031-03-24
Also published as: JP2012203544A

Description

The present invention relates to a vector processing apparatus, a vector processing method, and a program suitable for improving processing performance by controlling instruction packets.

Two schemes are known for controlling processor writes to a cache: write-through control, in which data is written to the cache at the same time as it is written to memory, and write-back control, in which data is written only to the cache. Write-through control is simpler than write-back control, but store processing accesses memory more frequently and takes longer. Write-through control therefore has the drawback of constraining improvements in processor performance.

In a vector processing device 9 that includes a scalar processing unit 91 and a vector processing unit 92, as shown in FIG. 12, the scalar processing unit 91 employs write-through control. For example, when data is written from a register to the memory 95, the data is written to the memory 95 at the same time as it is written to the L1 cache 91a.

Suppose instead that the scalar processing unit 91 employed write-back control, writing data only to the L1 cache 91a. In that case, whenever the scalar processing unit 91 performs a load of data into the vector processing unit 92, the data held in the L1 cache 91a would have to be flushed and written back to memory, or transferred from the L1 cache 91a to the vector processing unit 92, for many address elements. In other words, write-back control would make the control very complicated and could stall the execution of load instructions for the vector processing unit 92. To avoid these problems, a vector processing apparatus in which a scalar processing unit and a vector processing unit are provided in parallel employs write-through control. Because such an apparatus must write data to the memory 95 on every store, it issues more instruction packets to the address conversion unit 93 than a processor that employs write-back control.

Because the number, format, and timing of the instruction packets issued by a processor greatly affect its processing performance, various techniques have been studied. For example, Patent Document 1 discloses a technique for adjusting the length of the instruction codes issued by a single processor, and the cycle at which instructions are issued, according to the data. In a vector processing apparatus having a plurality of processors, however, the data writing method is restricted as described above, so an instruction packet format and related controls specific to vector processing apparatuses need to be considered.

For example, when the scalar processing unit 91, which employs write-through control, executed a scalar store instruction that stores 1 to 8 bytes of data, it issued an instruction packet with an 8-byte data field to the address conversion unit 93 and other components regardless of the size of the data being stored. The scalar processing unit 91 issued one instruction packet when the address range of the store did not cross a predetermined address boundary, and two instruction packets when it did. Consequently, when scalar store instructions for 1 or 2 bytes of data are executed on consecutive addresses, multiple instruction packets are issued that do not use the full 8-byte data field, that is, packets with poor utilization. Moreover, every time an instruction packet was issued, data had to be transferred to the address conversion unit 93 and the memory network 94.
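The packet count described above can be illustrated with a minimal sketch. It assumes that the "predetermined address boundary" is an 8-byte boundary (matching the 8-byte data field); the source does not state the boundary size here, and the function names are hypothetical.

```python
BOUNDARY = 8  # assumption: the predetermined address boundary is 8 bytes


def packets_for_store(addr: int, size: int) -> int:
    """Packets issued for one scalar store of `size` bytes (1..8) at byte address `addr`."""
    if not 1 <= size <= 8:
        raise ValueError("a scalar store carries 1 to 8 bytes")
    first_block = addr // BOUNDARY
    last_block = (addr + size - 1) // BOUNDARY
    # One packet if the range stays inside one block, two if it crosses the boundary.
    return 1 if first_block == last_block else 2


def conventional_packet_count(stores) -> int:
    """Total packets for a sequence of (address, size) stores, with no combining."""
    return sum(packets_for_store(addr, size) for addr, size in stores)


# Eight 1-byte stores to consecutive addresses each get their own packet.
print(conventional_packet_count([(a, 1) for a in range(8)]))  # 8
# A 4-byte store starting at address 6 crosses the boundary at address 8.
print(packets_for_store(6, 4))  # 2
```

Under this assumption, the eight 1-byte stores use only 8 of the 64 data-field bytes they transmit, which is the poor utilization the passage refers to.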

Meanwhile, instruction packets from the scalar processing unit 91 are always issued to the address conversion unit 93 in the order in which the scalar processing unit 91 generated them. To give priority to arithmetic processing in the scalar processing unit 91 and the vector processing unit 92, it is in many cases desirable to execute load instructions ahead of store instructions, provided the memory addresses they access do not overlap. However, a load instruction for the vector processing unit 92 covers a wide address range, so checking for address overlap against each of the many instructions in the pipeline of the scalar processing unit 91 is difficult to implement. Consequently, in the vector processing device 9, in which the scalar processing unit 91 and the vector processing unit 92 are provided in parallel, overtaking control between store instructions for the scalar processing unit 91 and load instructions for the vector processing unit 92 has conventionally not been performed.

Patent Document 1: JP 2005-267215 A

With the instruction packet format described above, multiple instruction packets are issued even though little data is stored, so the data transfer load is large. This load also delays load and store processing in the vector processing unit and contributes to reduced performance of the vector processing apparatus as a whole.

In addition, when the address of a store instruction and the address of a load instruction do not overlap, it is desirable to let the instruction packet of the load instruction overtake the store and execute first, thereby further increasing the processing speed of the vector processing apparatus.

An object of the present invention is to solve the above problems by providing a vector processing apparatus, a vector processing method, and a program suitable for improving processing performance by controlling instruction packets.

A vector processing apparatus according to a first aspect of the present invention
comprises a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit.
The scalar processing unit issues a first instruction packet and a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit, and comprises:
an instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
a buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combination processing unit that combines the first instruction packet and the second instruction packet based on the output instruction.
When the instruction indicated by the first instruction packet is a scalar store instruction, the buffer control unit operates as follows according to the instruction indicated by the second instruction packet:
(a) Scalar store instruction: if the first address and the second address match, it outputs the combining instruction to the combination processing unit and causes the combined packet to be stored in the instruction packet buffer unit as the first instruction packet; if they do not match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(b) Vector load instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(c) Scalar load instruction: if the first address and the second address do not match, it outputs the second instruction packet; if they match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(d) Vector store instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
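Cases (a) to (d) reduce to a small decision table. The following is a minimal sketch of that table only, not of the hardware; the names `Kind`, `Action`, and `decide` are hypothetical, and the address comparison ("match" for scalar packets, "overlap" for vector packets) is passed in as a precomputed flag.

```python
from enum import Enum, auto


class Kind(Enum):
    SCALAR_STORE = auto()
    SCALAR_LOAD = auto()
    VECTOR_LOAD = auto()
    VECTOR_STORE = auto()


class Action(Enum):
    COMBINE = auto()        # merge second into first; the result stays buffered as "first"
    OUTPUT_SECOND = auto()  # second overtakes; the buffered first packet stays where it is
    OUTPUT_FIRST = auto()   # first is output; second is buffered as the new "first"


def decide(second_kind: Kind, addresses_conflict: bool) -> Action:
    """Action for a buffered scalar store (first) and a newly issued packet (second).

    `addresses_conflict` means the two addresses match (scalar second packet)
    or overlap (vector second packet).
    """
    if second_kind is Kind.SCALAR_STORE:
        # Case (a): combine on a match, otherwise output the buffered store.
        return Action.COMBINE if addresses_conflict else Action.OUTPUT_FIRST
    # Cases (b), (c), (d): the second packet overtakes unless the addresses conflict.
    return Action.OUTPUT_FIRST if addresses_conflict else Action.OUTPUT_SECOND


print(decide(Kind.VECTOR_LOAD, addresses_conflict=False).name)  # OUTPUT_SECOND
```

Case (a) is the only one that combines packets; in the other three cases a conflict forces the buffered store out first, which preserves ordering for the conflicting address.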

A vector processing method according to a second aspect of the present invention
is executed by a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, in which the scalar processing unit issues a first instruction packet and a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit, and comprises an instruction packet buffer unit, a buffer control unit, and a combination processing unit.
The vector processing method comprises:
an instruction packet buffer step in which the instruction packet buffer unit stores the first instruction packet issued by the scalar processing unit;
a buffer control step in which the buffer control unit determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combination processing step in which the combination processing unit combines the first instruction packet and the second instruction packet based on the output instruction.
In the buffer control step, when the instruction indicated by the first instruction packet is a scalar store instruction, the buffer control unit operates as follows according to the instruction indicated by the second instruction packet:
(a) Scalar store instruction: if the first address and the second address match, it outputs the combining instruction to the combination processing unit and causes the combined packet to be stored in the instruction packet buffer unit as the first instruction packet; if they do not match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(b) Vector load instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(c) Scalar load instruction: if the first address and the second address do not match, it outputs the second instruction packet; if they match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(d) Vector store instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.

A program according to a third aspect of the present invention
causes a computer to function as a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, in which the scalar processing unit issues a first instruction packet and a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit.
The program causes the computer to function, in the scalar processing unit, as:
an instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
a buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combination processing unit that combines the first instruction packet and the second instruction packet based on the output instruction.
When the instruction indicated by the first instruction packet is a scalar store instruction, the program causes the buffer control unit to operate as follows according to the instruction indicated by the second instruction packet:
(a) Scalar store instruction: if the first address and the second address match, it outputs the combining instruction to the combination processing unit and causes the combined packet to be stored in the instruction packet buffer unit as the first instruction packet; if they do not match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(b) Vector load instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(c) Scalar load instruction: if the first address and the second address do not match, it outputs the second instruction packet; if they match, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.
(d) Vector store instruction: if the first address and the second address do not overlap, it outputs the second instruction packet; if they overlap, it outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.

The above program can be distributed and sold via a computer communication network, independently of the computer on which the program is executed. A recording medium storing the program can likewise be distributed and sold independently of the computer.

According to the present invention, providing the scalar processing unit of a vector processing apparatus with a buffer that holds the instruction packet of a store instruction makes it possible to combine multiple store instruction packets and to perform overtaking control between instruction packets. This reduces the number of store instruction packets issued from the scalar processing unit, allows vector load/store instructions to overtake, and improves processing performance.
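As an illustration of how two store packets to the same address might be combined, here is a minimal sketch. It relies on the 8-bit byte enable and 64-bit store data fields of the scalar store packet format described in the embodiment. Two points are assumptions not stated in this section: bit i of the byte enable is taken to cover byte i of the store data, and on bytes written by both packets the later packet wins (program order). The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def merge_stores(first, second):
    """Combine two scalar stores given as (address, byte_enable, store_data) tuples.

    Assumes bit i of byte_enable covers bits 8*i .. 8*i+7 of store_data, and that
    `second` was issued after `first`, so its bytes take precedence.
    """
    addr1, enable1, data1 = first
    addr2, enable2, data2 = second
    if addr1 != addr2:
        raise ValueError("only packets with matching addresses are combined")
    # Expand the second packet's 8-bit byte enable into a 64-bit data mask.
    mask2 = 0
    for i in range(8):
        if (enable2 >> i) & 1:
            mask2 |= 0xFF << (8 * i)
    merged_data = (data1 & ~mask2 & MASK64) | (data2 & mask2)
    return addr1, enable1 | enable2, merged_data


# Two 1-byte stores to neighbouring bytes of the same 8-byte block become one packet.
addr, enable, data = merge_stores((0x100, 0b01, 0xAA), (0x100, 0b10, 0xBB00))
print(hex(addr), bin(enable), hex(data))  # 0x100 0b11 0xbbaa
```

With this kind of merge, a run of small stores to one 8-byte block leaves the scalar processing unit as a single packet instead of one packet per store.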

FIG. 1 is a diagram for explaining a vector processing apparatus in which the vector processing apparatus according to an embodiment of the present invention is realized.
FIG. 2 is a diagram for explaining the formats of instruction packets.
FIG. 3 is a diagram for explaining the scalar processing unit of Embodiment 1.
FIG. 4 is a diagram for explaining the instruction packet combining process.
FIG. 5 is a flowchart for explaining the instruction packet control process performed by each part of the instruction buffer unit of Embodiment 1.
FIG. 6 is a flowchart for explaining the scalar store instruction packet control process.
FIG. 7 is a flowchart for explaining the vector load/store instruction packet control process.
FIG. 8 is a flowchart for explaining the scalar load instruction packet control process.
FIG. 9 is a flowchart for explaining the control process for other instruction packets.
FIG. 10 is a flowchart for explaining the control process for invalid instruction packets.
FIG. 11 is a flowchart for explaining the instruction packet control process performed by each part of the instruction buffer unit of Embodiment 2.
FIG. 12 is a diagram for explaining a conventional vector processing apparatus.

(1. Outline Configuration of the Vector Processing Device of Embodiment 1)
The vector processing apparatus according to the embodiment of the present invention is typically realized by the vector processing device 1 shown in FIG. 1. The outline configuration of the vector processing device 1 is described below.

As shown in FIG. 1, the vector processing device 1 includes a scalar processing unit 11, a vector processing unit 12, an address conversion unit 13, a memory network 14, and a memory 15.

The vector processing device 1 handles four instructions: a "scalar store instruction", which stores data held in a scalar register in the scalar processing unit 11 to the memory 15; a "scalar load instruction", which reads data from the memory 15 into a scalar register; a "vector store instruction", which stores data held in a vector register in the vector processing unit 12 to the memory 15; and a "vector load instruction", which reads data from the memory 15 into a vector register. The function of each unit is described below, taking the case where the vector processing device 1 handles these instructions as an example.

The scalar processing unit 11 includes, for example, an L1 cache 11a, an instruction buffer unit 11b, scalar registers that store scalar data (not shown), and various arithmetic units (not shown). The scalar processing unit 11 fetches, decodes, schedules, and executes instructions according to a predetermined program. The scalar processing unit 11 issues "instruction packets", which indicate various control instructions such as load/store instructions and vector operations, to components in the vector processing device 1 such as the vector processing unit 12 and the address conversion unit 13.

Hereinafter, an instruction packet issued by the scalar processing unit 11 that indicates a scalar store instruction is called a "scalar store instruction packet", one that indicates a scalar load instruction a "scalar load instruction packet", one that indicates a vector store instruction a "vector store instruction packet", and one that indicates a vector load instruction a "vector load instruction packet". An instruction packet that indicates any other instruction issued by the scalar processing unit 11 (for example, a page switching instruction or an instruction for communication between processors) is called an "other instruction packet". The format of each instruction packet is shown in FIG. 2. The format of other instruction packets is arbitrary and is not described further.

As shown in FIG. 2(a), the scalar store instruction packet 200 consists of an instruction code 201, an address 202, a byte enable 203, and store data 204.

The instruction code 201 indicates the type of instruction or the packet format.

The address 202 indicates the address in the memory 15 at which the store data 204 is to be stored. The address 202 can take only values that are multiples of 8.

The byte enable 203 is 8-bit-wide information that divides the 64-bit-wide store data 204 into 1-byte units and indicates which bytes contain store data (that is, which bytes are valid).

The store data 204 is the 64-bit-wide data to be stored in the memory 15.

In short, the scalar store instruction packet 200 indicates an instruction to store the store data 204 at the specified address 202.
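The four fields can be modeled as follows. This is a minimal sketch rather than the actual packet encoding: the opcode value, the little-endian byte order, and the mapping of byte-enable bit i to byte i of the store data are assumptions, and `make_scalar_store` is a hypothetical helper.

```python
from dataclasses import dataclass

SCALAR_STORE_OPCODE = 0x01  # hypothetical value; the source gives no encoding


@dataclass(frozen=True)
class ScalarStorePacket:
    instruction_code: int  # instruction code 201
    address: int           # address 202, always a multiple of 8
    byte_enable: int       # byte enable 203, 8 bits, one per byte of store data
    store_data: int        # store data 204, 64 bits


def make_scalar_store(byte_addr: int, payload: bytes) -> ScalarStorePacket:
    """Build a packet for a store of `payload` at `byte_addr` within one 8-byte block."""
    offset = byte_addr % 8
    if not 1 <= len(payload) <= 8 - offset:
        raise ValueError("payload must be 1..8 bytes and stay inside one 8-byte block")
    # Mark the bytes that carry data (assumed: bit i of the enable covers byte i).
    byte_enable = ((1 << len(payload)) - 1) << offset
    # Place the payload at its byte offset (assumed little-endian layout).
    store_data = int.from_bytes(payload, "little") << (8 * offset)
    return ScalarStorePacket(SCALAR_STORE_OPCODE, byte_addr - offset, byte_enable, store_data)


packet = make_scalar_store(0x1003, b"\xAB\xCD")
print(hex(packet.address), bin(packet.byte_enable), hex(packet.store_data))
# 0x1000 0b11000 0xcdab000000
```

A 2-byte store thus occupies only two of the eight byte-enable bits, leaving room for other stores to the same 8-byte block.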

As shown in FIG. 2(b), the scalar load instruction packet 300 consists of an instruction code 301 and an address 302.

Like the instruction code 201, the instruction code 301 indicates the type of instruction or the packet format.

The address 302 indicates the address in the memory 15 of the data to be read into the scalar processing unit 11.

In short, the scalar load instruction packet 300 indicates an instruction to read the data in the memory 15 specified by the address 302.

As shown in FIG. 2(c), the vector load/store instruction packet 400 consists of an instruction code 401, a base address 402, a distance 403, and a number of elements 404.

Like the instruction code 201, the instruction code 401 indicates the type of instruction or the packet format.

The base address 402 indicates an address in the memory 15.

The distance 403 indicates a distance from the base address 402.

The number of elements 404 indicates the number of addresses.

If the base address 402 is "X", the distance 403 is "Y", and the number of elements 404 is "N", the addresses specified by the vector load/store instruction packet are X, X + Y, X + 2*Y, ..., X + (N-1)*Y.

In short, the vector load/store instruction packet 400 indicates an instruction to read or write data all at once at N addresses, where N is the number of elements: the base address 402 and the addresses separated from it by integer multiples of the distance 403.
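The address sequence X, X + Y, ..., X + (N-1)*Y can be expanded directly. The sketch below also includes a simple check of whether any of those addresses falls inside the 8-byte block of a scalar store; that check is an assumption for illustration only, since this section does not give the element width or the overlap test actually used. Both function names are hypothetical.

```python
def vector_addresses(base: int, distance: int, count: int) -> list[int]:
    """Addresses named by a vector load/store packet: base + i * distance for i < count."""
    return [base + i * distance for i in range(count)]


def touches_block(base: int, distance: int, count: int,
                  block_addr: int, block_size: int = 8) -> bool:
    """True if any element start address lies in [block_addr, block_addr + block_size).

    Only element start addresses are considered; the element width is not
    specified in this section, so this is a simplification.
    """
    return any(block_addr <= addr < block_addr + block_size
               for addr in vector_addresses(base, distance, count))


print([hex(a) for a in vector_addresses(0x1000, 16, 4)])
# ['0x1000', '0x1010', '0x1020', '0x1030']
print(touches_block(0x1000, 16, 4, 0x1020))  # True
print(touches_block(0x1000, 16, 4, 0x1008))  # False
```

Because a single packet can name many widely spaced addresses, comparing it against every in-flight instruction is expensive, which is the implementation difficulty the background section describes.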

When executing a scalar store instruction, the scalar processing unit 11 determines the address 202 and the store data 204, formats them into an instruction packet of the form shown in FIG. 2(a), and sends it to the address conversion unit 13. When executing a scalar load instruction, the scalar processing unit 11 determines the address 302, formats it into an instruction packet of the form shown in FIG. 2(b), and sends it to the address conversion unit 13.

When executing a vector load/store instruction, on the other hand, the scalar processing unit 11 determines the base address 402, the distance 403, and the number of elements 404, and formats them into an instruction packet of the form shown in FIG. 2(c). The scalar processing unit 11 then sends the instruction packet to the address conversion unit 13, which passes it on to the vector processing unit 12.

The vector processing unit 12 includes vector registers that store vector data and various arithmetic units (not shown). The vector processing unit 12 performs arithmetic processing based on instructions from the scalar processing unit 11, or exchanges load/store data with the address conversion unit 13. For example, when the vector processing unit 12 receives a vector store instruction packet from the scalar processing unit 11, it sends the data stored in the vector register to the address conversion unit 13.

When the scalar processing unit 11 or the vector processing unit 12 loads or stores data, the address conversion unit 13 converts logical addresses to physical addresses where necessary and schedules data transfers to the memory 15.

On receiving a scalar store instruction packet from the scalar processing unit 11, the address conversion unit 13 interprets the contents of the instruction packet and converts the address to a physical address. It then transfers the converted address and the store data 204 to the memory 15 via the memory network 14.

On receiving a scalar load instruction packet from the scalar processing unit 11, the address conversion unit 13 interprets the contents of the instruction packet and converts the address to a physical address. It then transfers the converted address to the memory 15 via the memory network 14.

On receiving a vector store instruction packet from the scalar processing unit 11 and data from the vector processing unit 12, the address conversion unit 13 expands the addresses specified by the base address, the distance, and the number of elements into real addresses and converts them to physical addresses. It then transfers the converted addresses or the store data to the memory 15 via the memory network 14.

On receiving the instruction packet of a vector load instruction from the scalar processing unit 11, the address conversion unit 13 expands the specified addresses into real addresses and converts them to physical addresses. It then transfers the converted addresses to the memory 15 via the memory network 14.

The memory network 14 transfers data between the address conversion unit 13 and the memory 15.

The memory 15 stores the data that the scalar processing unit 11 or the vector processing unit 12 reads or writes.

(2. Outline configuration of the scalar processing unit of Embodiment 1)
The outline configuration of the scalar processing unit of this embodiment, which is included in the vector processing device 1, is described in detail below.
The scalar processing unit of Embodiment 1 combines instruction packets and allows one instruction packet to overtake another.

The scalar processing unit 11 includes an L1 cache 11a and an instruction buffer unit 11b.

The L1 cache 11a stores data that is used frequently in the scalar processing unit 11. When executing a scalar store instruction, the scalar processing unit 11 updates the contents of the L1 cache 11a and, at the same time, writes the data to the memory 15.

The instruction buffer unit 11b combines instruction packets issued by the scalar processing unit 11 or lets one overtake another. The instruction buffer unit 11b receives each instruction packet before the scalar processing unit 11 sends it to the address conversion unit 13. As shown in FIG. 3, the instruction buffer unit 11b includes an address comparison/buffer control unit 11b1, an instruction packet buffer unit 11b2, and a combination processing unit 11b3, configured as follows.

The address comparison/buffer control unit 11b1 determines the types of the instruction packet received by the instruction buffer unit 11b (hereinafter, the "received instruction packet") and of the instruction packet held in the instruction packet buffer unit 11b2 (hereinafter, the "held instruction packet"). It then compares the address of the held instruction packet with the address of the received instruction packet. Here, the held instruction packet is an instruction packet that was received before the received instruction packet. In other words, the address comparison/buffer control unit 11b1 determines the types of the preceding instruction packet (the held instruction packet) and the subsequent instruction packet (the received instruction packet) and compares their addresses. Based on the types of the two packets and the comparison result, the address comparison/buffer control unit 11b1 either instructs that the preceding and subsequent instruction packets be combined or lets one instruction overtake the other.

The instruction packet buffer unit 11b2 is a buffer that holds one instruction packet received by the instruction buffer unit 11b.

The combination processing unit 11b3 combines instruction packets into one in accordance with instructions from the address comparison/buffer control unit 11b1.

The specific functions of the address comparison/buffer control unit 11b1 are described below.

When the scalar processing unit 11 generates an instruction packet, for example one of the forms shown in FIG. 2, the packet is first sent to the instruction buffer unit 11b. When the instruction buffer unit 11b receives the instruction packet, the address comparison/buffer control unit 11b1 determines which of the following the received instruction packet is: (A) a scalar store instruction packet, (B) a vector load/store instruction packet, (C) a scalar load instruction packet, (D) another instruction packet, or (E) an invalid instruction packet.

The scalar processing unit 11 always generates and outputs an "instruction packet" containing some signal. The signal in that packet may represent an instruction that should actually be executed, or it may be something that should be ignored. The address comparison/buffer control unit 11b1 therefore also determines whether the received instruction packet is "valid". Here, an instruction packet is "valid" when its valid bit is "1" and it contains information on an instruction that should actually be executed. That is, valid instruction packets comprise the instruction packets of (A) scalar store instructions, (B) vector load/store instructions, (C) scalar load instructions, and (D) other instructions. Conversely, an instruction packet is "not valid" when its valid bit is "0" and it contains no information on an instruction to be executed.

Next, the address comparison/buffer control unit 11b1 determines whether the held instruction packet is "valid".

When the received instruction packet is (A) a scalar store instruction packet and the held instruction packet is valid, the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet and, if it is, whether the addresses of the received and held instruction packets match. If the held instruction packet is a scalar store instruction packet and the addresses match, the address comparison/buffer control unit 11b1 combines the held instruction packet with the received instruction packet. It then keeps the resulting packet (hereinafter, the "combined instruction packet") in the instruction packet buffer unit 11b2.

An example of combining a held instruction packet with a received instruction packet is shown in FIG. 4.
FIG. 4(a) shows the held instruction packet, FIG. 4(b) shows the received instruction packet, and FIG. 4(c) shows the instruction packet obtained by combining the held instruction packet of FIG. 4(a) with the received instruction packet of FIG. 4(b) (the combined instruction packet).

The instruction code 701 of the combined instruction packet indicates a scalar store instruction and is the same as the instruction codes 501 and 601. The address 702 is the same as the addresses 502 and 602. The byte enable 703 of the combined instruction packet is the OR of the byte enable 503 of the held instruction packet and the byte enable 603 of the received instruction packet. The store data 704 of the combined instruction packet is formed by selecting, byte by byte, whichever of the store data 504 of the held instruction packet and the store data 604 of the received instruction packet is valid. Where a byte is valid in both packets, store instructions to the same address are regarded as occurring in succession, and the store data 604 of the later received instruction packet takes priority.
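The combining rule for two scalar store packets to the same address can be sketched in software as follows. This is only an illustrative model: the class `ScalarStorePacket`, the function `combine`, the 8-byte data width, and the convention that bit i of the byte enable marks byte i as valid are all assumptions introduced here, not details taken from the figures.

```python
from dataclasses import dataclass


@dataclass
class ScalarStorePacket:
    address: int
    byte_enable: int   # assumed: bit i set means byte i of store_data is valid
    store_data: bytes  # assumed: fixed width, 8 bytes in this sketch


def combine(held: ScalarStorePacket, received: ScalarStorePacket) -> ScalarStorePacket:
    """Merge a later (received) store into an earlier (held) store to the same address."""
    if held.address != received.address:
        raise ValueError("packets can be combined only when their addresses match")
    if len(held.store_data) != len(received.store_data):
        raise ValueError("store data widths differ")
    merged = bytearray(held.store_data)
    for i in range(len(merged)):
        # A byte valid in the received packet wins, even if it is also valid in the held one.
        if (received.byte_enable >> i) & 1:
            merged[i] = received.store_data[i]
    return ScalarStorePacket(
        address=held.address,
        byte_enable=held.byte_enable | received.byte_enable,  # OR of both byte enables
        store_data=bytes(merged),
    )


held = ScalarStorePacket(0x100, 0b0011, bytes([0xAA, 0xBB, 0, 0, 0, 0, 0, 0]))
received = ScalarStorePacket(0x100, 0b0110, bytes([0, 0x11, 0x22, 0, 0, 0, 0, 0]))
combined = combine(held, received)
print(bin(combined.byte_enable), combined.store_data.hex())  # 0b111 aa11220000000000
```

Byte 1 is valid in both packets, so the later value 0x11 replaces 0xBB, while byte 0 keeps the earlier value 0xAA.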

In contrast, when the received instruction packet is (A) a scalar store instruction packet and (i) the held instruction packet is determined to be invalid, (ii) the held instruction packet is a valid instruction packet other than a scalar store instruction packet, or (iii) the held instruction packet is a scalar store instruction packet but the two addresses do not match, the address comparison/buffer control unit 11b1 causes the held instruction packet to be output to the address conversion unit 13 and causes the received scalar store instruction packet to be held in the instruction packet buffer unit 11b2.

As a result, when scalar store instructions to the same address are executed, the number of instruction packets issued from the scalar processing unit 11 to the address conversion unit 13 can be reduced. This makes more efficient use of the data transfer bandwidth and lightens the data transfer load on the memory network 14 and related components. On the other hand, even when the preceding and subsequent instructions are both scalar store instructions, if their target addresses do not match, no overtaking takes place and the instruction packets are issued in the order in which the instructions were generated.
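The reduction in issued packets can be illustrated with a small simulation of a one-entry buffer that sees only scalar stores. The function name `count_issued_store_packets` is hypothetical, the model tracks addresses only (not data), and it assumes that whatever is left in the buffer at the end is eventually issued as one packet.

```python
def count_issued_store_packets(store_addresses):
    """Count packets leaving a one-entry buffer that merges same-address stores."""
    issued = 0
    held = None  # address of the held scalar store, or None if nothing valid is held
    for address in store_addresses:
        if held is not None and held != address:
            issued += 1  # addresses differ: the held store is issued first, in order
        # Same address: the new store is merged into the held one.
        # Different address or empty buffer: the new store becomes the held one.
        held = address
    if held is not None:
        issued += 1  # assumption: the last held store is issued eventually
    return issued


stores = [0x100, 0x100, 0x100, 0x200, 0x200, 0x300]
print(len(stores), "stores ->", count_issued_store_packets(stores), "packets")  # 6 stores -> 3 packets
```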

When the received instruction packet is (B) a vector load/store instruction packet and the held instruction packet is valid, the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet and, if it is, whether the addresses of the received and held instruction packets overlap. If the held instruction packet is a scalar store instruction packet and the addresses do not overlap, or if the held instruction packet is not valid, the address comparison/buffer control unit 11b1 outputs the received vector load/store instruction packet to the address conversion unit 13 ahead of the held instruction packet.

In contrast, when the received instruction packet is (B) a vector load/store instruction packet and either the held instruction packet is a scalar store instruction packet whose address overlaps, or the held instruction packet is valid but is not a scalar store instruction packet, the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received vector load/store instruction packet to be held in the instruction packet buffer unit 11b2.

As a result, provided the addresses do not overlap, overtaking control can be performed so that a vector load/store instruction is executed in preference to a scalar store instruction.
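One way to test whether a held scalar store address overlaps the addresses of a vector load/store packet, without enumerating every element, is sketched below. The function name `overlaps` is hypothetical, and the sketch assumes that "overlap" means exact equality with one of the addresses base + i × distance for i from 0 to (number of elements - 1); the comparison granularity actually used by the hardware is not stated in this passage.

```python
def overlaps(scalar_address: int, base_address: int, distance: int, num_elements: int) -> bool:
    """Return True if scalar_address equals base_address + i * distance for some 0 <= i < num_elements."""
    offset = scalar_address - base_address
    if distance == 0:
        # Every element sits at the base address.
        return num_elements > 0 and offset == 0
    if offset % distance != 0:
        return False  # not on the stride grid
    index = offset // distance
    return 0 <= index < num_elements


# The vector packet may overtake the held scalar store only when there is no overlap.
print(overlaps(0x1010, base_address=0x1000, distance=8, num_elements=4))  # True
print(overlaps(0x1004, base_address=0x1000, distance=8, num_elements=4))  # False
```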

When the received instruction packet is (C) a scalar load instruction packet and the held instruction packet is valid, the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet and, if it is, whether the addresses of the received and held instruction packets match. If the held instruction packet is a scalar store instruction packet and the addresses do not match, or if the held instruction packet is not valid, the address comparison/buffer control unit 11b1 outputs the received scalar load instruction packet to the address conversion unit 13 ahead of the held instruction packet.

In contrast, when the received instruction packet is (C) a scalar load instruction packet and either the held instruction packet is a scalar store instruction packet whose address matches, or the held instruction packet is valid but is not a scalar store instruction packet, the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received scalar load instruction packet to be held in the instruction packet buffer unit 11b2.

As a result, provided the addresses do not match, overtaking control can be performed so that a scalar load instruction is executed in preference to a scalar store instruction.

When the received instruction packet is (D) another instruction packet and the held instruction packet is valid, the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received instruction packet to be held in the instruction packet buffer unit 11b2.

In contrast, when the received instruction packet is (D) another instruction packet and the held instruction packet is not valid, the address comparison/buffer control unit 11b1 outputs the received instruction packet as is to the address conversion unit 13.

As a result, another instruction packet, being a valid packet, can be output in preference to a packet that is not valid.

When the received instruction packet is (E) an invalid instruction packet and the held instruction packet is valid, the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet. If it is, the address comparison/buffer control unit 11b1 leaves the held instruction packet in the instruction packet buffer unit 11b2 and outputs the received instruction packet as is to the address conversion unit 13.

In contrast, when the received instruction packet is (E) an invalid instruction packet and the held instruction packet is either a valid instruction packet other than a scalar store instruction packet or itself an invalid instruction packet, the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received (invalid) instruction packet to be held in the instruction packet buffer unit 11b2.

This prevents an instruction packet from remaining stuck in the instruction packet buffer unit 11b2 when no valid packet follows.
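The rules for cases (A) to (E) can be condensed into a single decision function, sketched below. The names `Kind`, `Action`, and `decide` are hypothetical labels introduced for this illustration. The flag `address_conflict` stands for "addresses match" (scalar packets) or "addresses overlap" (vector packets) and is only consulted when the held packet is a scalar store.

```python
from enum import Enum, auto


class Kind(Enum):
    SCALAR_STORE = auto()       # (A)
    VECTOR_LOAD_STORE = auto()  # (B)
    SCALAR_LOAD = auto()        # (C)
    OTHER = auto()              # (D)
    INVALID = auto()            # (E)


class Action(Enum):
    COMBINE = auto()   # merge received into held; keep the combined packet in the buffer
    OVERTAKE = auto()  # output the received packet; keep the held packet in the buffer
    IN_ORDER = auto()  # output the held packet; hold the received packet


def decide(received: Kind, held: Kind, address_conflict: bool) -> Action:
    held_is_store = held is Kind.SCALAR_STORE
    if received is Kind.SCALAR_STORE:
        # Two scalar stores to the same address are combined; otherwise order is kept.
        return Action.COMBINE if held_is_store and address_conflict else Action.IN_ORDER
    if received in (Kind.VECTOR_LOAD_STORE, Kind.SCALAR_LOAD):
        # Loads and vector accesses may pass an invalid packet or a non-conflicting scalar store.
        if held is Kind.INVALID or (held_is_store and not address_conflict):
            return Action.OVERTAKE
        return Action.IN_ORDER
    if received is Kind.OTHER:
        return Action.OVERTAKE if held is Kind.INVALID else Action.IN_ORDER
    # received is INVALID: a held scalar store stays in the buffer; anything else is pushed out.
    return Action.OVERTAKE if held_is_store else Action.IN_ORDER


print(decide(Kind.SCALAR_LOAD, Kind.SCALAR_STORE, address_conflict=False).name)  # OVERTAKE
print(decide(Kind.SCALAR_STORE, Kind.SCALAR_STORE, address_conflict=True).name)  # COMBINE
```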

(3. Operation of the instruction buffer unit of Embodiment 1)
The operation of the instruction buffer unit, which executes the combining process and the overtaking control process, is described below. When the vector processing device 1 is powered on, the instruction buffer unit 11b starts the instruction packet control process shown in the flowchart of FIG. 5.

The address comparison/buffer control unit 11b1 determines whether an instruction packet has been received (step S11). If an instruction packet has been received (step S11; Yes), the address comparison/buffer control unit 11b1 determines the type of the received instruction packet (step S12). If no instruction packet has been received (step S11; No), it continues to wait.

If the address comparison/buffer control unit 11b1 determines that the received instruction packet is a "scalar store instruction packet" (step S12; scalar store instruction packet), it starts the scalar store instruction packet control process (FIG. 6) (step S13).
If the address comparison/buffer control unit 11b1 determines that the received instruction packet is a "vector load/store instruction packet" (step S12; vector load/store instruction packet), it starts the vector load/store instruction packet control process (FIG. 7) (step S14).
If the address comparison/buffer control unit 11b1 determines that the received instruction packet is a "scalar load instruction packet" (step S12; scalar load instruction packet), it starts the scalar load instruction packet control process (FIG. 8) (step S15).
If the address comparison/buffer control unit 11b1 determines that the received instruction packet is an "other instruction packet" (step S12; other instruction packet), it starts the other instruction packet control process (FIG. 9) (step S16).
If the address comparison/buffer control unit 11b1 determines that the received instruction packet is an "invalid instruction packet" (step S12; invalid instruction packet), it starts the invalid instruction packet control process (FIG. 10) (step S17).

(3.1 Scalar store instruction packet control process)
The address comparison/buffer control unit 11b1 determines whether the held instruction packet is valid (step S131). If the held instruction packet is valid (step S131; Yes), the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet (step S132). If the held instruction packet is not valid (step S131; No), the address comparison/buffer control unit 11b1 outputs the currently held invalid instruction packet to the address conversion unit 13 and causes the received scalar store instruction packet to be held in the instruction packet buffer unit 11b2 (step S135).

If it is determined in step S132 that the held instruction packet is a scalar store instruction packet (step S132; Yes), the address comparison/buffer control unit 11b1 compares the address of the held instruction packet (the preceding scalar store instruction packet) with the address of the received instruction packet (the subsequent scalar store instruction packet) (step S133).

If it is determined in step S132 that the held instruction packet is a valid instruction packet other than a scalar store instruction packet (step S132; No), the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received scalar store instruction packet to be held in the instruction packet buffer unit 11b2 (step S135). That is, when the received instruction packet is a scalar store instruction packet and the held instruction packet is a valid instruction packet other than a scalar store instruction packet, the address comparison/buffer control unit 11b1 performs no overtaking control and outputs the instruction packets to the address conversion unit 13 in the order in which they were generated.

If it is determined in step S133 that the address of the held instruction packet matches the address of the received instruction packet (step S133; Yes), the address comparison/buffer control unit 11b1 combines the held instruction packet with the received instruction packet and causes the combined instruction packet to be held in the instruction packet buffer unit 11b2 (step S134). For example, if the held instruction packet is the one shown in FIG. 4(a) and the received instruction packet is the one shown in FIG. 4(b), the address comparison/buffer control unit 11b1 combines them to generate the instruction packet shown in FIG. 4(c). It then causes the generated instruction packet to be held in the instruction packet buffer unit 11b2.

If it is determined in step S133 that the addresses do not match (step S133; No), the address comparison/buffer control unit 11b1 outputs the held instruction packet (the preceding scalar store instruction packet) to the address conversion unit 13 and causes the received instruction packet (the subsequent scalar store instruction packet) to be held in the instruction packet buffer unit 11b2 (step S135).

After the processing of step S134 or step S135, the address comparison/buffer control unit 11b1 ends the scalar store instruction packet control process.

(3.2 Vector load/store instruction packet control process)
The address comparison/buffer control unit 11b1 determines whether the held instruction packet is valid (step S141). If the held instruction packet is valid (step S141; Yes), the address comparison/buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet (step S142). If the held instruction packet is not valid (step S141; No), the address comparison/buffer control unit 11b1 outputs the received vector load/store instruction packet as is to the address conversion unit 13 (step S145).

If it is determined in step S142 that the held instruction packet is a scalar store instruction packet (step S142; Yes), the address comparison/buffer control unit 11b1 compares the address of the held scalar store instruction packet with the addresses of the received vector load/store instruction packet (step S143).

If it is determined in step S142 that the held instruction packet is a valid instruction packet other than a scalar store instruction packet (step S142; No), the address comparison/buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the received vector load/store instruction packet to be held in the instruction packet buffer unit 11b2 (step S144). That is, when the received instruction packet is a vector load/store instruction packet and the held instruction packet is a valid instruction packet other than a scalar store instruction packet, the address comparison/buffer control unit 11b1 performs no overtaking control and outputs the instruction packets to the address conversion unit 13 in the order in which they were generated.

If it is determined in step S143 that the address of the held instruction packet overlaps the addresses of the received instruction packet, that is, the addresses of the specified number of elements starting from the base address (step S143; Yes), the address comparison/buffer control unit 11b1 outputs the held scalar store instruction packet to the address conversion unit 13 and causes the received vector load/store instruction packet to be held in the instruction packet buffer unit 11b2 (step S144).

On the other hand, if the addresses are determined in step S143 not to overlap (step S143; No), the address comparison / buffer control unit 11b1 leaves the held instruction packet (scalar store instruction packet) in the buffer and outputs the received instruction packet (vector load/store instruction packet) to the address conversion unit 13 ahead of it (step S145).

After the processing of step S144 or step S145, the address comparison / buffer control unit 11b1 ends the vector load/store instruction packet control processing.
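
The branch taken in steps S142 to S145 can be sketched as follows. This is a minimal Python sketch rather than the actual circuit: the `Packet` fields, the contiguous element layout with a fixed `ELEM_SIZE`, and the function names are assumptions made for illustration, and the sketch covers only the case in which a valid packet is already held.

```python
from dataclasses import dataclass

ELEM_SIZE = 8  # assumed element width in bytes (not given in the text)


@dataclass(frozen=True)
class Packet:
    kind: str          # "scalar_store", "vector_load", "vector_store", ...
    base: int          # base address
    elements: int = 1  # element count (1 for scalar packets)


def addresses(p: Packet) -> set:
    """Addresses touched by a packet, assuming contiguous elements."""
    return {p.base + i * ELEM_SIZE for i in range(p.elements)}


def control_vector_packet(held: Packet, received: Packet):
    """Return (packet sent to address conversion, packet left in the buffer).

    `held` is assumed to be a valid packet.
    """
    if held.kind != "scalar_store":            # S142; No
        return held, received                  # S144: keep issue order
    if addresses(held) & addresses(received):  # S143; Yes (overlap)
        return held, received                  # S144: store goes first
    return received, held                      # S145: vector packet overtakes


store = Packet("scalar_store", base=0x1000)
far = Packet("vector_load", base=0x2000, elements=4)
near = Packet("vector_load", base=0x0FF0, elements=4)

print(control_vector_packet(store, far)[0].kind)   # vector_load
print(control_vector_packet(store, near)[0].kind)  # scalar_store
```

In the second call the vector load covers 0x0FF0 to 0x1008 in 8-byte steps, which includes the store address 0x1000, so the scalar store is issued first.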

(3.3 Scalar load instruction packet control processing)
The address comparison / buffer control unit 11b1 determines whether the held instruction packet is valid (step S151). If the held instruction packet is determined to be valid (step S151; Yes), the address comparison / buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet (step S152). If, on the other hand, the held instruction packet is determined not to be valid (step S151; No), the address comparison / buffer control unit 11b1 outputs the received instruction packet (scalar load instruction packet) as-is to the address conversion unit 13 (step S155).

If the held instruction packet is determined in step S152 to be a scalar store instruction packet (step S152; Yes), the address comparison / buffer control unit 11b1 compares the address of the held instruction packet (scalar store instruction packet) with the address of the received instruction packet (scalar load instruction packet) (step S153).

On the other hand, if the held instruction packet is determined in step S152 to be a valid instruction packet other than a scalar store instruction packet (step S152; No), the address comparison / buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the instruction packet buffer unit 11b2 to hold the received instruction packet (scalar load instruction packet) (step S154). That is, when the received instruction packet is a scalar load instruction packet and the held instruction packet is a valid instruction packet other than a scalar store instruction packet, the address comparison / buffer control unit 11b1 performs no overtaking control and outputs the instruction packets to the address conversion unit 13 in the order in which they were generated.

If the address of the held instruction packet and the address of the received instruction packet are determined in step S153 to match (step S153; Yes), the address comparison / buffer control unit 11b1 outputs the held instruction packet (scalar store instruction packet) to the address conversion unit 13 and causes the instruction packet buffer unit 11b2 to hold the received instruction packet (scalar load instruction packet) (step S154).

On the other hand, if the addresses are determined in step S153 not to match (step S153; No), the address comparison / buffer control unit 11b1 leaves the held instruction packet (scalar store instruction packet) in the buffer and outputs the received instruction packet (scalar load instruction packet) to the address conversion unit 13 ahead of it (step S155).

After the processing of step S154 or step S155, the address comparison / buffer control unit 11b1 ends the scalar load instruction packet control processing.
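
Steps S151 to S155 reduce to three tests on the held packet. The following Python sketch shows one way to express them; the `Packet` type, its field names, and the convention of returning an (output, buffer) pair are assumptions made for illustration, not part of the described hardware.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    kind: str          # e.g. "scalar_store", "scalar_load", "vector_load"
    address: int = 0
    valid: bool = True


def control_scalar_load(held: Packet, load: Packet) -> Tuple[Packet, Packet]:
    """Return (packet output to address conversion, packet kept in buffer)."""
    if not held.valid:                 # S151; No
        return load, held              # S155: pass the load straight through
    if held.kind != "scalar_store":    # S152; No
        return held, load              # S154: keep issue order
    if held.address == load.address:   # S153; Yes (same address)
        return held, load              # S154: the store must go first
    return load, held                  # S155: the load overtakes the store


store = Packet("scalar_store", address=0x40)
same = Packet("scalar_load", address=0x40)
diff = Packet("scalar_load", address=0x80)

print(control_scalar_load(store, same)[0].kind)  # scalar_store
print(control_scalar_load(store, diff)[0].kind)  # scalar_load
```

Unlike the vector case, the comparison here is a plain equality test, because a scalar packet names a single address.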

(3.4 Other instruction packet control processing)
The address comparison / buffer control unit 11b1 determines whether the held instruction packet is valid (step S161). If the held instruction packet is determined to be valid (step S161; Yes), the address comparison / buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the instruction packet buffer unit 11b2 to hold the received instruction packet (an instruction packet of another type) (step S162). If, on the other hand, the held instruction packet is determined not to be valid (step S161; No), the address comparison / buffer control unit 11b1 outputs the received instruction packet as-is to the address conversion unit 13 (step S163).

After the processing of step S162 or step S163, the address comparison / buffer control unit 11b1 ends the other instruction packet control processing.
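
For packets of other types no address comparison takes place, so the control collapses to a single validity test. A minimal Python sketch, in which the `Packet` tuple and the (output, buffer) return convention are illustrative assumptions:

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    kind: str
    valid: bool = True


def control_other_packet(held: Packet, received: Packet) -> Tuple[Packet, Packet]:
    """Return (packet output to address conversion, packet kept in buffer)."""
    if held.valid:             # S161; Yes
        return held, received  # S162: held packet leaves, new one is held
    return received, held      # S163: new packet passes straight through


store = Packet("scalar_store")
other = Packet("other")
print(control_other_packet(store, other)[0].kind)  # scalar_store
```

Under this rule a held scalar store is never overtaken by a packet of another type.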

(3.5 Invalid instruction packet control processing)
The address comparison / buffer control unit 11b1 determines whether the held instruction packet is valid (step S171). If the held instruction packet is determined to be valid (step S171; Yes), the address comparison / buffer control unit 11b1 determines whether the held instruction packet is a scalar store instruction packet (step S172). If, on the other hand, the held instruction packet is determined not to be valid (step S171; No), the address comparison / buffer control unit 11b1 outputs the currently held invalid instruction packet to the address conversion unit 13 and causes the instruction packet buffer unit 11b2 to hold the received instruction packet (invalid instruction packet) (step S174).

If the held instruction packet is determined in step S172 to be a scalar store instruction packet (step S172; Yes), the address comparison / buffer control unit 11b1 keeps the held instruction packet (scalar store instruction packet) in the instruction packet buffer unit 11b2 and outputs the received instruction packet (invalid instruction packet) as-is to the address conversion unit 13 (step S173).

On the other hand, if the held instruction packet is determined in step S172 to be a valid instruction packet other than a scalar store instruction packet (step S172; No), the address comparison / buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 and causes the instruction packet buffer unit 11b2 to hold the received instruction packet (invalid instruction packet) (step S174).

After the processing of step S173 or step S174, the address comparison / buffer control unit 11b1 ends the invalid instruction packet control processing.
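
The handling of a received invalid packet differs from the other cases in that only a held scalar store stays in the buffer. A minimal Python sketch of steps S171 to S174, with the `Packet` tuple and the (output, buffer) return convention assumed for illustration:

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    kind: str
    valid: bool = True


def control_invalid_packet(held: Packet, received: Packet) -> Tuple[Packet, Packet]:
    """Return (packet output to address conversion, packet kept in buffer).

    `received` is expected to be an invalid packet.
    """
    if held.valid and held.kind == "scalar_store":  # S171; Yes and S172; Yes
        return received, held                       # S173: store stays held
    return held, received                           # S174: all other cases


bubble = Packet("none", valid=False)
store = Packet("scalar_store")
vload = Packet("vector_load")

print(control_invalid_packet(store, bubble)[1].kind)  # scalar_store
print(control_invalid_packet(vload, bubble)[0].kind)  # vector_load
```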

According to the present embodiment, the number of instruction packets issued to the address conversion unit can be reduced, which lightens the data transfer load. In addition, as long as the address of a store instruction and the address of a load instruction do not overlap, the load instruction can overtake the store instruction and be executed first, which raises the processing speed of the vector processing apparatus.
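
The reduction in packet count can be illustrated with a toy count. The sketch below is an assumption-laden model, not a measurement: it considers a stream made up only of scalar stores, a single-entry buffer, same-address merging of consecutive scalar stores, and a final flush of whatever is still held. The function name and the sample stream are hypothetical.

```python
from typing import Iterable, Optional


def packets_issued(store_addresses: Iterable[int], merge: bool = True) -> int:
    """Count packets reaching address conversion for a scalar-store stream."""
    issued = 0
    held: Optional[int] = None  # address of the held scalar store, if any
    for addr in store_addresses:
        if held is None:
            held = addr           # first store is simply held
        elif merge and addr == held:
            pass                  # merged into the held packet, nothing issued
        else:
            issued += 1           # held packet is output
            held = addr           # new store takes its place
    if held is not None:
        issued += 1               # assumed final flush of the held packet
    return issued


stream = [0x100, 0x100, 0x100, 0x108, 0x108, 0x200]
print(packets_issued(stream, merge=False))  # 6
print(packets_issued(stream))               # 3
```

The saving depends entirely on how often consecutive stores hit the same address; a stream with no repeats issues the same number of packets either way.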

(4. Outline configuration of the vector processing apparatus of Embodiment 2)
As shown in FIG. 1, the vector processing apparatus 1 of Embodiment 2 comprises a scalar processing unit 11, a vector processing unit 12, an address conversion unit 13, a memory network 14, and a memory 15. The vector processing unit 12, the address conversion unit 13, the memory network 14, and the memory 15 have the same functions as in Embodiment 1.

(5. Outline configuration of the scalar processing unit of Embodiment 2)
In addition to the functions of Embodiment 1, the scalar processing unit of Embodiment 2 has a function of outputting the held instruction packet once it has been held for a certain time or longer, even if there is no instruction packet to compare it with.
The following describes the address comparison / buffer control unit 11b1, which implements the functions that differ from those of the vector processing apparatus of Embodiment 1.

The address comparison / buffer control unit 11b1 includes a counter (not shown) and, during periods in which the instruction buffer unit 11b receives no instruction packet, controls the instruction packet buffer unit 11b2 as follows.

Using the counter, the address comparison / buffer control unit 11b1 measures the time elapsed since the instruction packet buffer unit 11b2 was made to hold an instruction packet. When the elapsed time exceeds a predetermined threshold, the address comparison / buffer control unit 11b1 causes the held instruction packet to be output to the address conversion unit 13. While the elapsed time does not exceed the threshold, the address comparison / buffer control unit 11b1 waits, leaving the instruction packet held in the instruction packet buffer unit 11b2.

This prevents an instruction packet from lingering in the instruction packet buffer unit 11b2 and stalling the processing of the vector processing apparatus 1 as a whole. It also makes it possible to guarantee that the time taken to process a store instruction stays at or below a certain value.

(6. Operation of the instruction buffer unit of Embodiment 2)
The following describes the operations performed by each part of the instruction buffer unit of the present embodiment. When the vector processing apparatus 1 is powered on, the instruction buffer unit 11b starts the instruction packet control processing shown in the flowchart of FIG. 11. Steps in the flowchart of FIG. 11 that carry the same step numbers as steps in the flowchart of FIG. 5 perform the same processing as in FIG. 5. Only the steps that perform different processing are described below.

If it is determined in step S11 that no instruction packet has been received (step S11; No), the address comparison / buffer control unit 11b1 determines whether a predetermined time has elapsed since the instruction packet buffer unit 11b2 was made to hold an instruction packet (step S27). If the predetermined time is determined to have elapsed (step S27; Yes), the address comparison / buffer control unit 11b1 outputs the held instruction packet to the address conversion unit 13 (step S28). The processing then returns to step S11.

On the other hand, if the predetermined time is determined not to have elapsed (step S27; No), the address comparison / buffer control unit 11b1 returns to step S11 and continues to determine whether an instruction packet has been received.

According to the present embodiment, an instruction packet can be prevented from remaining in the buffer for a long time.
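
The timeout path through steps S11, S27, and S28 can be sketched as a small state holder. This is a minimal Python sketch under stated assumptions: time is counted in idle cycles, the buffer is modelled as empty (`None`) after a flush, and the class and method names are hypothetical.

```python
from typing import Optional


class HoldTimer:
    """Single-entry buffer that flushes its packet after too many idle cycles."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold       # the predetermined time, in cycles
        self.held: Optional[str] = None  # currently held packet, if any
        self.age = 0                     # cycles since the packet was held

    def hold(self, packet: str) -> None:
        """Store a packet and restart the counter."""
        self.held = packet
        self.age = 0

    def idle_cycle(self) -> Optional[str]:
        """Handle one cycle with no received packet (S11; No).

        Returns the flushed packet, or None if nothing is output.
        """
        if self.held is None:
            return None
        self.age += 1
        if self.age > self.threshold:    # S27; Yes
            flushed, self.held = self.held, None
            self.age = 0
            return flushed               # S28: output to address conversion
        return None                      # S27; No: keep waiting


timer = HoldTimer(threshold=3)
timer.hold("scalar_store@0x100")
print([timer.idle_cycle() for _ in range(5)])
# [None, None, None, 'scalar_store@0x100', None]
```

With a threshold of 3, the packet leaves on the fourth idle cycle, so the time a store can wait in the buffer is bounded by the threshold plus one cycle in this model.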

In the above embodiments, the vector processing apparatus is not limited to the configuration described. For example, a plurality of scalar processing units, vector processing units, memories, and so on may be provided. Also, although in the present embodiments instruction packets are sent from the scalar processing unit to the vector processing unit via the address conversion unit, instruction packets may instead be sent directly from the scalar processing unit to the vector processing unit. Furthermore, the format of each instruction packet is not limited to that shown in the embodiments. For example, when the vector processing apparatus has a plurality of scalar processing units, an identifier identifying the scalar processing unit that issued an instruction packet may be included in that packet.

In the above embodiments, the address comparison / buffer control unit 11b1 need not output invalid instruction packets to the address conversion unit 13. Alternatively, the address comparison / buffer control unit 11b1 may output invalid instruction packets in any order. For example, in steps S135 and S174, the address comparison / buffer control unit 11b1 may simply overwrite the instruction packet buffer unit 11b2 with the received instruction packet, without outputting the held invalid instruction packet to the address conversion unit 13. Likewise, in step S173, the address comparison / buffer control unit 11b1 may refrain from outputting the received invalid instruction packet to the address conversion unit 13.

Some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.

(Appendix 1)
A vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, wherein
the scalar processing unit issues a first instruction packet or a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit, and comprises:
an instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
a buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combining processing unit that combines the first instruction packet and the second instruction packet based on the output instruction,
wherein, when the instruction indicated by the first instruction packet is a scalar store instruction and the instruction indicated by the second instruction packet is
(a) a scalar store instruction, and the first address and the second address match, the buffer control unit outputs the combining instruction to the combining processing unit and causes the combined packet to be stored in the instruction packet buffer unit;
(b) a vector load instruction or a vector store instruction, and the first address and the second address do not overlap, the buffer control unit outputs the second instruction packet; and
(c) a scalar load instruction, and the first address and the second address do not match, the buffer control unit outputs the second instruction packet.

(Appendix 2)
The vector processing apparatus according to Appendix 1, wherein, when the instruction indicated by the first instruction packet is a scalar store instruction, the instruction indicated by the second instruction packet is a scalar store instruction, and the first address and the second address do not match, the buffer control unit outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit.

(Appendix 3)
The vector processing apparatus according to Appendix 1 or 2, wherein, when the instruction indicated by the first instruction packet is a scalar store instruction, the instruction indicated by the second instruction packet is a vector load instruction or a vector store instruction, and the first address and the second address overlap, the buffer control unit outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit.

(Appendix 4)
The vector processing apparatus according to any one of Appendices 1 to 3, wherein, when the instruction indicated by the first instruction packet is a scalar store instruction, the instruction indicated by the second instruction packet is a scalar load instruction, and the first address and the second address match, the buffer control unit outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit.

(Appendix 5)
The vector processing apparatus according to any one of Appendices 1 to 4, wherein, when the instruction indicated by the first instruction packet is a vector load instruction, a vector store instruction, or a scalar load instruction, the buffer control unit outputs the first instruction packet.
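
Taken together, Appendices 1 to 5 amount to a small decision table keyed on the two packet types and the result of the address comparison. The Python sketch below is one possible encoding; the `Action` names, the string type tags, and the single `conflict` flag (meaning "match" for scalar packets and "overlap" for vector packets) are assumptions made for illustration. That the second packet is stored when the first is output under Appendix 5 follows the embodiment description rather than the appendix text itself.

```python
from enum import Enum


class Action(Enum):
    MERGE = "combine both packets and keep the result in the buffer"
    OVERTAKE = "output the second packet, keep the first in the buffer"
    IN_ORDER = "output the first packet, store the second in the buffer"


COMPARED = ("vector_load", "vector_store", "scalar_load")


def decide(first: str, second: str, conflict: bool) -> Action:
    """Pick the buffer action for a held (first) and a newly issued (second) packet."""
    kinds = COMPARED + ("scalar_store",)
    if first not in kinds or second not in kinds:
        raise ValueError("packet type not covered by Appendices 1 to 5")
    if first != "scalar_store":
        return Action.IN_ORDER                                # Appendix 5
    if second == "scalar_store":
        return Action.MERGE if conflict else Action.IN_ORDER  # Appendix 1(a), 2
    # second is a vector load/store or a scalar load
    return Action.IN_ORDER if conflict else Action.OVERTAKE   # Appendix 3, 4 / 1(b), 1(c)


print(decide("scalar_store", "scalar_store", True).name)   # MERGE
print(decide("scalar_store", "vector_load", False).name)   # OVERTAKE
print(decide("scalar_store", "scalar_load", True).name)    # IN_ORDER
```

Overtaking and merging only ever happen when the held packet is a scalar store; every other combination preserves issue order.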

(Appendix 6)
The vector processing apparatus according to any one of Appendices 1 to 5, wherein the buffer control unit measures the time elapsed since the instruction packet buffer unit stored the first instruction packet and, when the elapsed time exceeds a predetermined value, outputs the first instruction packet.

(Appendix 7)
A vector processing method executed by a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, the scalar processing unit issuing a first instruction packet or a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit, and comprising an instruction packet buffer unit, a buffer control unit, and a combining processing unit, the method comprising:
an instruction packet buffer step in which the instruction packet buffer unit stores the first instruction packet issued by the scalar processing unit;
a buffer control step in which the buffer control unit determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combining processing step in which the combining processing unit combines the first instruction packet and the second instruction packet based on the output instruction,
wherein, in the buffer control step, when the instruction indicated by the first instruction packet is a scalar store instruction and the instruction indicated by the second instruction packet is
(a) a scalar store instruction, and the first address and the second address match, the buffer control unit outputs the combining instruction to the combining processing unit and causes the combined packet to be stored in the instruction packet buffer unit;
(b) a vector load instruction or a vector store instruction, and the first address and the second address do not overlap, the buffer control unit outputs the second instruction packet; and
(c) a scalar load instruction, and the first address and the second address do not match, the buffer control unit outputs the second instruction packet.

(Appendix 8)
A program that causes a computer to function as a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, the scalar processing unit issuing a first instruction packet or a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit,
the program causing the computer to function, in the scalar processing unit, as:
an instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
a buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combining processing unit that combines the first instruction packet and the second instruction packet based on the output instruction,
wherein, when the instruction indicated by the first instruction packet is a scalar store instruction and the instruction indicated by the second instruction packet is
(a) a scalar store instruction, and the first address and the second address match, the buffer control unit outputs the combining instruction to the combining processing unit and causes the combined packet to be stored in the instruction packet buffer unit;
(b) a vector load instruction or a vector store instruction, and the first address and the second address do not overlap, the buffer control unit outputs the second instruction packet; and
(c) a scalar load instruction, and the first address and the second address do not match, the buffer control unit outputs the second instruction packet.

According to the present invention, it is possible to provide a vector processing apparatus, a vector processing method, and a program suitable for improving processing performance by controlling instruction packets.

1, 9 Vector processing apparatus
11, 91 Scalar processing unit
11a, 91a L1 cache
11b Instruction buffer unit
11b1 Address comparison / buffer control unit
11b2 Instruction packet buffer unit
11b3 Combining processing unit
12, 92 Vector processing unit
13, 93 Address conversion unit
14, 94 Memory network
15, 95 Memory

Claims

A vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, wherein
the scalar processing unit issues a first instruction packet and a second instruction packet, each indicating an instruction to load data from the storage unit or an instruction to store data in the storage unit, and comprises:
an instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
a buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet; and
a combining processing unit that combines the first instruction packet and the second instruction packet based on the output instruction,
wherein, when the instruction indicated by the first instruction packet is a scalar store instruction, the buffer control unit operates as follows according to the instruction indicated by the second instruction packet:
(a) in the case of a scalar store instruction, when the first address and the second address match, the combining instruction is output to the combining processing unit and the combined packet is stored in the instruction packet buffer unit as the first instruction packet, and when the first address and the second address do not match, the first instruction packet is output and the second instruction packet is stored in the instruction packet buffer unit as the first instruction packet;
(b) in the case of a vector load instruction, when the first address and the second address do not overlap, the second instruction packet is output, and when the first address and the second address overlap, the first instruction packet is output and the second instruction packet is stored in the instruction packet buffer unit as the first instruction packet;
(c) in the case of a scalar load instruction, when the first address and the second address do not match, the second instruction packet is output, and when the first address and the second address match, the first instruction packet is output and the second instruction packet is stored in the instruction packet buffer unit as the first instruction packet; and
(d) in the case of a vector store instruction, when the first address and the second address do not overlap, the second instruction packet is output, and when the first address and the second address overlap, the first instruction packet is output and the second instruction packet is stored in the instruction packet buffer unit as the first instruction packet.

The vector processing apparatus according to claim 1, wherein the buffer control unit outputs the first instruction packet when the instruction indicated by the first instruction packet is a vector load instruction, a vector store instruction, or a scalar load instruction.

The vector processing apparatus according to claim 1 or 2, wherein the buffer control unit measures the time elapsed since the instruction packet buffer unit stored the first instruction packet and outputs the first instruction packet when the elapsed time exceeds a predetermined value.

A vector processing method for a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, wherein the scalar processing unit issues a first instruction packet and a second instruction packet each indicating an instruction to load data from the storage unit or to store data in the storage unit, and the scalar processing unit includes an instruction packet buffer unit, a buffer control unit, and a combination processing unit.
The vector processing method comprises:
An instruction packet buffer step in which the instruction packet buffer unit stores the first instruction packet issued by the scalar processing unit;
A buffer control step in which the buffer control unit determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet;
A combination processing step in which the combination processing unit combines the first instruction packet and the second instruction packet based on the output instruction.
In the buffer control step, the buffer control unit,
when the instruction indicated by the first instruction packet is a scalar store instruction and the instruction indicated by the second instruction packet is
(a) a scalar store instruction: if the first address and the second address match, outputs the combining instruction to the combination processing unit and causes the combined packet to be stored in the instruction packet buffer unit as the first instruction packet; if the first address and the second address do not match, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(b) a vector load instruction: if the first address and the second address do not overlap, outputs the second instruction packet; if the first address and the second address overlap, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(c) a scalar load instruction: if the first address and the second address do not match, outputs the second instruction packet; if the first address and the second address match, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(d) a vector store instruction: if the first address and the second address do not overlap, outputs the second instruction packet; if the first address and the second address overlap, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.

A program for causing a computer to function as a vector processing apparatus comprising a scalar processing unit that performs scalar operations, a vector processing unit that performs vector operations, and a storage unit, wherein the scalar processing unit issues a first instruction packet and a second instruction packet each indicating an instruction to load data from the storage unit or to store data in the storage unit.
The program causes the computer to function, in the scalar processing unit, as:
An instruction packet buffer unit that stores the first instruction packet issued by the scalar processing unit;
A buffer control unit that determines the types of the first instruction packet and of the second instruction packet issued after the first instruction packet, compares a first address specified by the first instruction packet with a second address specified by the second instruction packet, and outputs an instruction to combine the first instruction packet and the second instruction packet;
A combination processing unit that combines the first instruction packet and the second instruction packet based on the output instruction.
The buffer control unit,
when the instruction indicated by the first instruction packet is a scalar store instruction and the instruction indicated by the second instruction packet is
(a) a scalar store instruction: if the first address and the second address match, outputs the combining instruction to the combination processing unit and causes the combined packet to be stored in the instruction packet buffer unit as the first instruction packet; if the first address and the second address do not match, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(b) a vector load instruction: if the first address and the second address do not overlap, outputs the second instruction packet; if the first address and the second address overlap, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(c) a scalar load instruction: if the first address and the second address do not match, outputs the second instruction packet; if the first address and the second address match, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet,
(d) a vector store instruction: if the first address and the second address do not overlap, outputs the second instruction packet; if the first address and the second address overlap, outputs the first instruction packet and causes the second instruction packet to be stored in the instruction packet buffer unit as the first instruction packet.