JP2927583B2 - Parallel processing programming simulator

Info

Publication number: JP2927583B2
Application number: JP24957191A
Authority: JP
Inventors: 喜弘林
Original assignee: NITSUTO SEIKO KK
Current assignee: NITSUTO SEIKO KK
Priority date: 1991-09-27
Filing date: 1991-09-27
Publication date: 1999-07-28
Anticipated expiration: 2014-07-28
Also published as: JPH0588940A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a parallel processing programming simulator that simulates a parallel processing program written in a high-level language capable of describing parallel execution.

[0002]

[Prior Art] Conventional simulators verify the correctness of a program and could not predict its efficiency.

[0003]

[Problem to Be Solved by the Invention] Because conventional simulators could not predict the efficiency of a program, it was difficult to develop an efficient program when compiling a parallel processing program written in a high-level language capable of describing parallel execution. The present invention has been made in view of these circumstances. Its object is to provide a parallel processing programming simulator that makes it possible to develop an efficient program when compiling such a parallel processing program.

[0004]

[Means for Solving the Problem] The present invention is characterized by comprising: lexical analysis means that reads a parallel processing program written in a high-level language including parallel execution statements, each of which groups processes to be executed simultaneously, and decodes its tokens to detect spelling errors; syntax analysis means that receives words from the lexical analysis means and analyzes the grammar to detect grammatical errors; execution means that performs various arithmetic operations according to the syntax received from the syntax analysis means; and data generation means that measures the processing time of the execution means and predicts the efficiency of the parallel processing program.

[0005]

[Operation] The lexical analysis means reads a parallel processing program written in a high-level language including parallel execution statements, each of which groups processes to be executed simultaneously, and decodes its tokens to detect spelling errors. The syntax analysis means receives words from the lexical analysis means and analyzes the grammar to detect grammatical errors. The execution means performs various arithmetic operations according to the syntax received from the syntax analysis means. The data generation means measures the processing time of the execution means and predicts the efficiency of the parallel processing program.

[0006]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a schematic configuration diagram of a parallel processing system design support apparatus equipped with a parallel processing programming simulator according to one embodiment of the present invention. The apparatus comprises an input means 1, a simulator 2, a parallel processing program compiling device 3, and a multiprocessor simulator 4.

[0007] The input means 1 has an input function and a display function, and also automatically converts a graphical language into a text language better suited to computer processing. Based on the designer's instructions entered through the input means 1, the simulator 2 executes the parallel processing program that the input means 1 has converted into the text language, performs grammar checks and the like, and measures the execution time of each process and similar quantities. In this way it generates the data that the parallel processing program compiling device 3 needs for function division and displays that data on the input means 1. In addition, again based on the designer's instructions entered through the input means 1, the simulator 2 executes the per-processor programs produced by the function division of the parallel processing program compiling device 3 and measures each processor's operating time, synchronization wait time, and the like. In this way it generates the data that the parallel processing program compiling device 3 needs to determine the optimal number of processors and the connection scheme, and displays that data on the input means 1.

[0008] Based on the designer's instructions entered through the input means 1, the parallel processing program compiling device 3 compiles a parallel processing program written in a text-type high-level language to generate instruction code for operating a plurality of processors, and determines the optimal number of processors and the connection scheme between them. Based on the designer's instructions entered through the input means 1, the multiprocessor simulator 4 executes the instruction code generated for each processor by the parallel processing program compiling device 3, measures the efficiency of each processor and of the inter-processor connection scheme, and displays the results on the input means 1.

[0009] This parallel processing system design support apparatus is implemented on, for example, a workstation or a personal computer. FIG. 2 is a configuration diagram of the parallel processing program compiling device 3. The device comprises a function dividing means 7, a processor count and connection scheme determining means 8, and an instruction code generating means 9. The function dividing means 7 in turn comprises a non-recursion processing means 10, a peeling processing means 11, and a single-loop processing means 12.

[0010] The function dividing means 7 divides the parallel processing program that the input means 1 has converted into text form into a program for each processor. The processor count and connection scheme determining means 8 determines the number of processors and the connection scheme between them based on the division result of the function dividing means 7, and generates a netlist.

[0011] The instruction code generating means 9 compiles the programs produced by the function dividing means 7 and generates instruction code for driving the processors. FIG. 3 is a configuration diagram of the simulator 2. The simulator 2 comprises a command analysis means 14, a lexical analysis means 15, a syntax analysis means 16, an execution means 17, and a data generation means 18. The execution means 17 includes a switching means 19.

[0012] The command analysis means 14 analyzes the designer's instructions supplied through the input means 1 and controls the simulation. The lexical analysis means 15 reads either the parallel processing program that the input means 1 has converted into text form or a program produced by the function dividing means 7 of the parallel processing program compiling device 3, decodes its tokens, and checks for spelling errors.

[0013] The syntax analysis means 16 receives words from the lexical analysis means 15, analyzes the grammar, and checks for grammatical errors. The execution means 17 receives the analysis result from the syntax analysis means 16 and performs various arithmetic operations according to the syntax. The data generation means 18 measures the program's processing time and similar quantities and predicts the program's efficiency. It generates either the data for function division by the function dividing means 7 of the parallel processing program compiling device 3 or the data with which the processor count and connection scheme determining means 8 of that device determines the number of processors and the connection scheme, and displays the result on the input means 1.

[0014] When the execution means 17, while executing the operations of one process, enters a synchronization wait and the condition for continuing is not satisfied, the switching means 19 saves the information of the running process and switches to the operations of another process. FIG. 4 is a configuration diagram of the multiprocessor simulator 4. The multiprocessor simulator 4 comprises a scheduling means 21, a processor emulation means 22, a memory management means 23, and a parallelism analysis means 24.
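The behavior of the switching means 19 can be illustrated with a small cooperative scheduler. The following is only a sketch under stated assumptions: Python generators stand in for processes, a yielded predicate stands in for the continuation condition, and the names `run_processes`, `consumer`, and `producer` are hypothetical and do not appear in the patent.

```python
from collections import deque


def run_processes(processes):
    """Run generator-based processes, switching whenever one must wait.

    Each process yields a zero-argument predicate (its continuation
    condition). The suspended generator itself holds the saved process
    state. Returns the order in which processes were given the executor.
    """
    pending = deque((name, proc, None) for name, proc in processes)
    order = []
    stalled = 0  # consecutive processes skipped because they were still waiting
    while pending:
        name, proc, condition = pending.popleft()
        if condition is not None and not condition():
            # Continuation condition not met: keep the process parked
            # and switch to the next one.
            pending.append((name, proc, condition))
            stalled += 1
            if stalled > len(pending):
                raise RuntimeError("all processes are waiting (deadlock)")
            continue
        stalled = 0
        order.append(name)
        try:
            condition = next(proc)  # run until the next synchronization wait
        except StopIteration:
            continue  # process finished
        pending.append((name, proc, condition))
    return order


def consumer(shared):
    yield lambda: shared["ready"]  # synchronization wait on the producer
    shared["result"] = shared["value"] * 2


def producer(shared):
    shared["value"] = 21
    shared["ready"] = True
    yield lambda: True  # a wait that is satisfied immediately
```

Running the consumer before the producer shows the switch: the consumer parks on its wait, the producer runs, and the consumer then resumes from its saved state.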

[0015] Based on the designer's instructions entered through the input means 1, the scheduling means 21 determines the number of processors, the type of connection scheme, the order in which processors are simulated, and so on. It then reads the instruction code generated by the instruction code generating means 9 of the parallel processing program compiling device 3 and the netlist generated by the processor count and connection scheme determining means 8, and controls the simulation.

[0016] The processor emulation means 22 is called by the scheduling means 21 and executes the operation of a processor one instruction cycle at a time. During this time it also measures the various times associated with the hardware-level execution state and with the access state of each communication path. The memory management means 23 is called by the processor emulation means 22. It maps a virtual memory 25 of, for example, 4 GB onto a main memory 26 of, for example, 4 MB, and controls inter-processor communication based on the netlist.

[0017] The parallelism analysis means 24 is called by the scheduling means 21, measures the software-level execution time of each processor, computes the efficiency of the system, and displays the result on the input means 1. The operation is described next. First, an outline of the design method using the above parallel processing system design support apparatus is given.

[0018] The designer designs the parallel algorithm procedure by procedure, a procedure being a logical unit of processing. At this point a procedure does not necessarily correspond one-to-one to a physical processor. Communication with the outside of a procedure is expressed through arguments. Moreover, at this stage arguments are handled exactly like any other variable, and no synchronization whatsoever is needed for the specification of arguments or for assignments to them. For parallel processing, the processes to be executed simultaneously are grouped and written as a single parallel execution statement. In other words, the algorithm is written as a parallel algorithm (hereinafter a "synchronized parallel algorithm") that guarantees that synchronization is maintained automatically until all processing within one parallel execution statement has finished, and that the values of all variables are updated when execution finishes. This description level is hereinafter called PDL (Procedure Description Level).
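The guarantee of a synchronized parallel algorithm (every process in one parallel execution statement runs to completion, and all variables are updated together at the end) can be sketched as follows. This is an illustration under assumptions, not the patent's language: `parallel_statement` is a hypothetical helper, and each assignment is modeled as a Python function of a read-only snapshot.

```python
def parallel_statement(env, assignments):
    """Model one parallel execution statement.

    Every right-hand side is evaluated against the same snapshot of the
    variables, and all targets are updated together once every
    evaluation has finished.
    """
    snapshot = dict(env)  # values visible to all simultaneous processes
    results = {target: expr(snapshot) for target, expr in assignments.items()}
    env.update(results)   # all variables change at the end of the statement
    return env


# Two "simultaneous" assignments that swap x and y without a temporary.
env = {"x": 1, "y": 2}
parallel_statement(env, {
    "x": lambda v: v["y"],
    "y": lambda v: v["x"],
})
```

Because both right-hand sides see the values from before the statement, the swap needs no explicit synchronization in the source program, which matches the PDL view that synchronization is implicit.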

[0019] Based on the PDL description simulated by the simulator 2, the parallel processing program compiling device 3 designs the processing for each processor. First, taking parallelism into account, it divides the synchronized parallel algorithm into a feasible number of processes, where a process is a procedure that is called within a parallel execution statement and does not contain a parallel execution statement directly or indirectly. Sequentially executed portions are copied unchanged into each process, and the contents of parallel execution statements are distributed so that processing times are as uniform as possible across processes. A set of Boolean variables is prepared in advance. At the points that lay between parallel execution statements before the division, communication statements are generated by combining a synchronization statement, which halts processing until a given condition holds, with assignment statements to these Boolean variables; these communication statements synchronize the processes. This series of operations is called peeling processing, because it resembles forcibly peeling apart many processes that were pasted together so that they become separate procedures. The program is then converted into a description in which only the procedure executed first contains a parallel execution statement (hereinafter a "parallel process"). That is, if the flow graph at the PDL description stage is as shown in FIG. 5, the flow graph after peeling processing is as shown in FIG. 6.

In this state, if the processing time and utilization of each process satisfy the design specification, each process is assigned to its own processor. If the design specification is not satisfied, and in particular if processing times differ greatly between processes, each process is divided so that it contains only one loop statement, and synchronization statements are generated. Here, loop statements are counted as those that are not in a structure in which one loop statement completely contains another (a nested structure); two loop statements that form a nested structure are therefore treated as one loop statement. The ratio between processes that would equalize the processing time is then computed, and the processes are divided using Bernstein's conditions so as to come as close to that ratio as possible. This per-processor description level is called SDL (Structure Description Level). If the design specification is clearly not satisfied at this stage, the design must be redone from the PDL design.
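Bernstein's conditions, which the division step relies on, state that two statements may run in parallel when neither writes a variable the other reads and they do not write the same variable. The sketch below shows the standard check; representing a statement as a pair of read and write sets, and the name `bernstein_independent`, are assumptions made for illustration.

```python
def bernstein_independent(first, second):
    """Return True if two statements satisfy Bernstein's conditions.

    Each statement is a (reads, writes) pair of sets of variable names.
    The statements are independent when
      writes1 & reads2, reads1 & writes2 and writes1 & writes2
    are all empty.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    return (
        not (writes1 & reads2)       # no flow dependence
        and not (reads1 & writes2)   # no anti-dependence
        and not (writes1 & writes2)  # no output dependence
    )


# a = b + c ; d = e * 2 ; f = a + 1
s1 = ({"b", "c"}, {"a"})
s2 = ({"e"}, {"d"})
s3 = ({"a"}, {"f"})
```

Here `s1` and `s2` can be placed in different processes, while `s1` and `s3` cannot be separated without a communication statement, because `s3` reads the `a` that `s1` writes.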

[0020] Further, based on the SDL description simulated by the simulator 2, the parallel processing program compiling device 3 determines the communication state between processors and connects all pairs of processors that communicate frequently. The connections are also made so that no processor that needs to communicate is left unable to communicate with any other processor (an isolated processor); the result is taken as the network. The SDL description is then converted into instruction code specific to the processor. The determined network is converted into a netlist consisting of sets of logical numbers of the processors to be connected. From the netlist, a list of the logical numbers of the processors with which each processor must communicate, directly or indirectly, is generated for each processor and appended to the instruction code. For processors that are not directly connected in the network to communicate with each other, the communication must pass through a third processor (indirect communication). Each processor therefore needs to store the candidate relay processors required to transfer data to every other processor. At this stage the design specification is examined again, and if it is not satisfied the design must be redone. With respect to the network, a minor shortfall can be handled by changing to a standard network, but a severe one requires changing the PDL design.
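How relay candidates might be derived from a netlist can be sketched with a breadth-first search. The patent does not say how the candidates are computed, so this is one possible approach under assumptions: the netlist is modeled as pairs of logical processor numbers, a single shortest-path relay is kept per destination, and `relay_table` is a hypothetical name.

```python
from collections import deque


def relay_table(netlist):
    """For each processor, map every reachable destination to the
    directly connected neighbor through which data should be sent.

    netlist: iterable of (a, b) pairs of logical processor numbers,
    each pair denoting a bidirectional link.
    """
    adjacency = {}
    for a, b in netlist:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    table = {}
    for source in adjacency:
        first_hop = {}
        queue = deque()
        for neighbor in sorted(adjacency[source]):
            first_hop[neighbor] = neighbor  # direct communication
            queue.append(neighbor)
        while queue:
            node = queue.popleft()
            for nxt in sorted(adjacency[node]):
                if nxt != source and nxt not in first_hop:
                    # Indirect communication: reuse the relay used for `node`.
                    first_hop[nxt] = first_hop[node]
                    queue.append(nxt)
        table[source] = first_hop
    return table


# Four processors connected in a chain 0 - 1 - 2 - 3.
chain = relay_table([(0, 1), (1, 2), (2, 3)])
```

In the chain example, processor 0 reaches processor 3 by relaying through processor 1, while processor 1 relays through processor 2.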

[0021] Next, the design method using the above parallel processing system design support apparatus is described in detail. The designer enters into the input means 1 a parallel program in PDL description written in a graphical high-level language. The input means 1 converts the graphical language into the text language and stores the result in a storage means (not shown).

[0022] Next, the designer enters into the input means 1 an instruction to simulate the parallel program. In response, as shown in FIG. 7, the lexical analysis means 15 of the simulator 2 reads the PDL-description parallel program a sequentially from the storage means, cuts words out of the text, looks up the attribute corresponding to each symbol in a symbol table b stored in advance in the storage means, and outputs the symbols and attributes as a symbol/attribute list c. The symbol table b is prepared according to the programming language in use.
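The lookup from symbols to attributes described above can be sketched as a small tokenizer. This is a minimal illustration under assumptions: the patent does not give the PDL syntax, so the keywords and operators in `SYMBOL_TABLE`, the token pattern, and the function name `lex` are all hypothetical.

```python
import re

# Hypothetical symbol table b: maps known symbols to their attributes.
SYMBOL_TABLE = {
    "par": "keyword",
    "end": "keyword",
    ":=": "assign",
    "+": "operator",
    ";": "delimiter",
}

# Assignment operator, identifiers, numbers, or any single other character.
TOKEN_PATTERN = re.compile(r"\s*(:=|[A-Za-z_]\w*|\d+|\S)")


def lex(source):
    """Cut words out of the text and pair each with its attribute,
    producing a symbol/attribute list. Unknown symbols are reported
    as spelling errors."""
    source = source.rstrip()
    result = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        symbol = match.group(1)
        position = match.end()
        if symbol in SYMBOL_TABLE:
            attribute = SYMBOL_TABLE[symbol]
        elif symbol.isdigit():
            attribute = "number"
        elif symbol[0].isalpha() or symbol[0] == "_":
            attribute = "identifier"
        else:
            raise ValueError(f"unknown symbol {symbol!r}")
        result.append((symbol, attribute))
    return result
```

Swapping in a different `SYMBOL_TABLE` is what adapting the table to another programming language would amount to in this sketch.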

[0023] Next, as shown in FIG. 8, the syntax analysis means 16 receives the symbol/attribute list c from the lexical analysis means 15 one statement at a time, identifies the type of statement corresponding to the list from a grammar rule table d stored in advance in the storage means, and creates a syntax analysis table e. The grammar rule table d is prepared according to the programming language in use. Next, as shown in FIG. 9, the execution means 17 receives the syntax analysis table e from the syntax analysis means 16, finds the processing that corresponds to the analysis result in an execution function table f stored in advance in the storage means, and executes it. The data needed for the processing is taken from a processing management table g stored in advance in the storage means, and the result is written to the corresponding location in the processing management table g.
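The table-driven execution step can be sketched as a dispatch on statement type. The layout of the tables is not given in the patent, so everything here is an assumption for illustration: a syntax analysis entry is modeled as a dictionary with a `kind` field, the execution function table as a dictionary of Python functions, and the processing management table as a dictionary of variable values.

```python
def operand_value(operand, management_table):
    """Fetch an operand: variable names are read from the processing
    management table, numeric literals are used directly."""
    if isinstance(operand, str):
        return management_table[operand]
    return operand


def execute_assign_add(entry, management_table):
    """Handler for a statement of the form target := left + right."""
    left = operand_value(entry["left"], management_table)
    right = operand_value(entry["right"], management_table)
    management_table[entry["target"]] = left + right


# Hypothetical execution function table f: statement kind -> handler.
EXECUTION_FUNCTIONS = {
    "assign_add": execute_assign_add,
}


def execute(entry, management_table):
    """Look up the handler for the analyzed statement and run it,
    writing the result back into the processing management table."""
    handler = EXECUTION_FUNCTIONS[entry["kind"]]
    handler(entry, management_table)
    return management_table
```

For example, executing the entry for `c := a + b` against a table holding `a = 2` and `b = 3` stores `5` under `c`.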

[0024] Meanwhile, as shown in FIG. 10, the data generation means 18 receives the syntax analysis table e from the syntax analysis means 16 and creates the corresponding assembly code in an equivalent code table h. It calculates the execution time of that equivalent code by referring to an instruction code table i and an addressing mode table j stored in advance in the storage means, and adds the execution time to the time table k of the process currently being executed.
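The time accounting described above can be sketched as a table lookup per instruction. The cycle counts, mnemonics, and addressing mode names below are invented for illustration and do not come from the patent or from any real processor; `add_execution_time` is likewise a hypothetical name.

```python
# Hypothetical instruction code table i: base cycles per instruction.
INSTRUCTION_CYCLES = {"MOVE": 4, "ADD": 4, "MUL": 70}

# Hypothetical addressing mode table j: extra cycles per addressing mode.
ADDRESSING_CYCLES = {"register": 0, "immediate": 4, "absolute": 8}


def add_execution_time(time_table, process, equivalent_code):
    """Estimate the cycles of the equivalent code for one statement and
    add them to the running total of the given process.

    equivalent_code: list of (instruction, addressing_mode) pairs.
    Returns the cycles estimated for this statement.
    """
    cycles = sum(
        INSTRUCTION_CYCLES[instruction] + ADDRESSING_CYCLES[mode]
        for instruction, mode in equivalent_code
    )
    time_table[process] = time_table.get(process, 0) + cycles
    return cycles


# Equivalent code for something like c := a + b on a load/store style machine.
time_table = {}
statement_code = [("MOVE", "absolute"), ("ADD", "register"), ("MOVE", "absolute")]
estimated = add_execution_time(time_table, "P1", statement_code)
```

Because the estimate is accumulated per process, comparing the totals across processes gives the kind of data the function division step needs to balance processing times.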

[0025] When the simulation finishes, the result is displayed on the input means 1. If the result satisfies the specification, the design proceeds to the next stage. If it does not, the design is redone from the PDL description stage. Next, the designer enters into the input means 1 an instruction to convert the parallel program from PDL description to SDL description. In response, the parallel processing program compiling device 3 reads the parallel program sequentially from the storage means according to the designer's instruction, and the function dividing means 7 performs function division. This function division converts the synchronized parallel algorithm expressed in PDL description into parallel processes expressed in SDL description. It consists of non-recursion processing of recursive processes, peeling processing, and single-loop processing, each of which is described in detail below. The terms used in the following description are defined in Tables 1 and 2 below.

[0026]

[Table 1]

[0027]

[Table 2]

[0028] The non-recursion processing means 10 converts each recursive process in the parallel program, that is, each recursive procedure containing a parallel execution statement, into an ordinary recursive procedure. A recursive process generates as many processes as there are recursive calls, and these are executed simultaneously. Because the number of processes generated is not known until the program is run, function division becomes difficult. Before function division is performed, recursive processes therefore need to be converted into ordinary recursive procedures. This non-recursion processing converts a process model such as that in FIG. 11(A) into a process model such as that in FIG. 11(B). Specifically, steps (1) and (2) below are performed.

[0029] (1) If a procedure containing a parallel execution statement is in turn called by a procedure that it calls from that parallel execution statement, then for every such procedure A a copy is made under the name A', and the parallel execution statement in the copy is replaced with sequential execution statements.
(2) In every procedure called from within the parallel execution statement, the name of the called procedure is changed from A to A'.

[0030] The peeling processing means 11 performs peeling processing, which consists of division processing, generation of synchronization mechanisms and communication protocols, proceduralization of the divided sets, and inline expansion. The division processing divides the processing, based on the min-max principle so that the amount of processing in each part is equalized, into as many sets of statements as there are executable statements in the parallel execution statement that has the most. Specifically, steps (1) to (5) below are performed.

[0031] (1) Through splitting and combining of variable types, conversion sections are generated before and after the dependent parallel processing.
(2) All parallel execution statements that make up the dependent parallel processing (Equation 1 below) are split into sets of single statements (Equation 2 below), and the maximum number of elements m_i is taken as the number of procedures into which the processing is divided.
(3) Based on the min-max principle applied to the execution time of each statement, the sets (Equation 2 below) are merged into m_i sets (Equation 3 below).

[0032] (4) The execution part that contained the dependent parallel processing is copied m_i times, and the elements of Equation 3 below are embedded in the copies.
(5) For each process, if a variable definition that was distributed to another process is used within the basic block where the embedding took place, assignment statements for sending and receiving are generated.
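Step (3) groups statements so that the largest per-process execution time is as small as possible. The patent names only the min-max principle and does not give a procedure, so the sketch below substitutes a common greedy heuristic (longest processing time first): take statements in decreasing order of execution time and give each to the currently least loaded group. The function name `balance` and the sample times are assumptions.

```python
import heapq


def balance(statement_times, group_count):
    """Distribute statements over `group_count` groups so that the
    maximum group load stays small (greedy heuristic, not guaranteed
    to be optimal).

    statement_times: dict mapping statement name -> execution time.
    Returns (groups, loads).
    """
    heap = [(0, index) for index in range(group_count)]  # (load, group index)
    groups = [[] for _ in range(group_count)]
    ordered = sorted(statement_times.items(), key=lambda item: (-item[1], item[0]))
    for name, time in ordered:
        load, index = heapq.heappop(heap)  # least loaded group so far
        groups[index].append(name)
        heapq.heappush(heap, (load + time, index))
    loads = [sum(statement_times[name] for name in group) for group in groups]
    return groups, loads


# Five statements with measured execution times, divided over two processes.
times = {"s1": 7, "s2": 5, "s3": 4, "s4": 3, "s5": 1}
groups, loads = balance(times, 2)
```

The execution times fed to such a grouping would be the per-statement estimates produced by the simulator, which is why the simulation precedes the division.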

[0033]

[Equation 1]

[0034]

[Equation 2]

[0035]

[Equation 3]

[0036] The generation of synchronization mechanisms and communication protocols produces a communication protocol for each set obtained by the division. Specifically, steps (1) to (3) below are performed.
(1) The variables are classified, a synchronization variable f^k is generated for every receive/broadcast variable a^k, and the result is taken as the communication variable v^k. The numbers of elements on the receiving side and the sending side of the synchronization variable are matched to the numbers of procedures that receive and send that variable.

[0037] (2) In accordance with the rules that follow, a synchronization mechanism or communication protocol is generated at each reception point, broadcast point, and relay point. Positive or negative logic is used consistently, and an initial value is inserted only once per variable. A reception point is a place in a parallel execution statement where a receive variable is used, a broadcast point is a place in a parallel execution statement where a broadcast variable is used, and a relay point is a place where a relay variable appears on both sides of an assignment statement.

[0038] If a reception (broadcast) point lies within an executable statement of a control structure, a reception (broadcast) protocol is generated at that point. If it does not belong to a control structure, a reception (broadcast) protocol is generated when one or more reception (broadcast) points exist, and a synchronization mechanism is generated otherwise.
A relay protocol is generated at each relay point.

[0039] The change in value of a synchronization variable must not differ depending on whether a control structure is present.
(3) A synchronization variable f_end is generated, and all other executable statements are enclosed by Equation 4 below.

[0040]

[Equation 4]

[0041] The proceduralization of the divided sets generates procedures whose execution parts are the divided sets, generates a single parallel execution statement that calls these procedures, and replaces the original execution part with it. Specifically, steps (1) to (4) below are performed.
(1) The generated communication variables are duplicated. The synchronization variable f_end of each divided set is also duplicated in such a way that variable names do not collide.

[0042] (2) In accordance with the rules that follow, a procedure declaration part is generated for each procedure whose execution part is a divided set. Each variable is associated with its variable type.
Receive variables, and Boolean variables that appear in synchronization statements, are made input variables.
Broadcast variables, and Boolean variables that are defined, are made output variables.

【００４３】[0043] Relay variables become input/output variables after their names at the receive point and at the broadcast point have been changed. Automatically generated variables become internal variables.
(3) A single parallel execution statement is generated that calls all of the generated procedures with the duplicated communication variables as arguments. Input/output arguments are generated to match the input/output variables of each procedure declaration part.

【００４４】[0044] (4) The original execution part is replaced with a sequential execution statement formed by adding, to the generated parallel execution statement, the termination test of Equation 5 below for the synchronization variable f_end_i from each procedure.

【００４５】[0045]

【数５】 (Equation 5)

【００４６】[0046] Inline expansion replaces, in each procedure that calls the procedure being expanded, the procedure call statement with the expanded execution part, provided the procedure being expanded is not the kernel. Specifically, steps (1) to (5) below are performed. The kernel is the procedure that is executed first.
(1) The procedure call statement is replaced with the execution part of the corresponding procedure.

【００４７】[0047] (2) Each internal variable contained in the execution part is defined under a new name if its name collides with an existing one, and under its original name otherwise.
(3) If a variable's type does not exist, the type is defined and the variable is then associated with it.
(4) Input/output variables in the execution part are replaced with the input/output arguments, and renamed internal variables are replaced with their new names.

【００４８】[0048] The kernel generated by the stripping process has a structure in which several parallel execution statements are executed one after another. The stripping process balances the processing load among the processes within each parallel execution statement, but it does not balance the load across all processes called from the kernel. The system becomes optimal only when the load of every process is balanced across the generated parallel processes as a whole. The number of processors assigned to each process must therefore be determined from the ratio of the processing loads of the processes in different parallel execution statements, which in turn requires dividing the processing of a process into n parts. Such division is possible when the procedure contains a loop statement. To make this loop-based division applicable, every procedure must be converted so that it contains a single loop. This one-loop conversion is carried out by steps (1) to (6) below.

【００４９】[0049] (1) The execution part of any procedure containing two or more loop statements is split at the end of each loop statement.
(2) If a data dependency exists between the resulting sets of statements, the variable involved is recorded.
(3) A procedure declaration part whose execution part is each set of statements is generated according to the rules below.

【００５０】[0050] A variable with a data dependency that is used in the execution part becomes an input variable. A variable with a data dependency that is defined in the execution part becomes an output variable. A variable with a data dependency that is both used and defined in the execution part is split into two names and becomes an input/output variable; the variable names at the points of use and definition in the execution part are then replaced with the input/output variable names.

【００５１】[0051] Variables with no data dependency become local variables.
(4) The points of use and definition in the execution part are treated as receive points and broadcast points, respectively, and synchronization mechanisms and communication protocols are generated.
(5) A synchronization variable f_end is generated for each procedure, and the execution part is enclosed by Equation 4.

【００５２】[0052] (6) Call statements for the split procedures are generated at the place where the original process was called.
Next, the designer enters into the input means 1 an instruction to simulate the program after function division. In response, as shown in FIG. 7, the lexical analysis means 15 of the simulator 2 reads the parallel program a sequentially from the storage means, extracts words from the text, looks up the attribute of each symbol in the symbol table b, and outputs the symbols and attributes as a symbol/attribute list c.

【００５３】[0053] Next, as shown in FIG. 8, the syntax analysis means 16 receives the symbol/attribute list c from the lexical analysis means 15 one statement at a time, identifies the statement type corresponding to the list from the grammar rule table d, and creates a syntax analysis table e. Then, as shown in FIG. 9, the execution means 17 receives the syntax analysis table e from the syntax analysis means 16, looks up the processing that matches the analysis result in the execution function table f, and executes it. The data needed for the processing is read from the processing management table g, and the result is written to the corresponding entry of that table. If the statement in the syntax analysis table e is a wait statement, the execution means 17 also activates the switching means 19.

【００５４】[0054] The switching means 19 then refers to the condition part of the syntax analysis table e, as shown in FIG. 12. If the condition is T (True), it does nothing and processing continues. If the condition is F (False), it records the current state in the process management table m, advances the time wheel n by one step to obtain the name of the process to be executed next, and switches to the program to be executed according to the process management table m.

【００５５】[0055] That is, as shown in FIG. 13, the i-th process is executed (step S1), and it is determined whether the process is waiting for synchronization (step S2). If it is, it is determined whether the condition for continuing is unsatisfied (step S3). If the condition is unsatisfied, the i-th process is saved (step S4), i is incremented by 1 (step S5), the new i-th process is called (step S6), and control returns to step S1. If the process is not waiting for synchronization in step S2, control returns to step S1. If the condition for continuing is satisfied in step S3, control likewise returns to step S1.
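
The switching behaviour of steps S1 to S6 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the names run, demo, consumer, and producer are hypothetical, processes are modelled as generators that yield the condition of each wait statement, and a deque stands in for the time wheel n together with the saved state of the process management table m.

```python
from collections import deque


def run(processes):
    """Round-robin over generator-based processes (hypothetical model).

    Each process yields a zero-argument callable: the condition of a wait
    statement. Returns the process names in the order they finished.
    """
    # Assumed stand-in for time wheel n; each entry also keeps the saved
    # wait condition, loosely playing the role of process management table m.
    wheel = deque((name, proc, None) for name, proc in processes.items())
    finished = []
    stalled = 0
    while wheel:
        name, proc, pending = wheel.popleft()
        if pending is not None and not pending():
            # Saved process whose condition still fails: try the next one.
            wheel.append((name, proc, pending))
            stalled += 1
            if stalled >= len(wheel):
                raise RuntimeError("every process is waiting: deadlock")
            continue
        stalled = 0
        try:
            while True:
                condition = next(proc)   # S1/S2: run up to the next wait
                if not condition():      # S3: condition to continue fails
                    break
        except StopIteration:
            finished.append(name)
            continue
        wheel.append((name, proc, condition))  # S4-S6: save, go to next


    return finished


def demo():
    shared = {"ready": False, "log": []}

    def consumer():
        yield lambda: shared["ready"]    # wait until the producer is done
        shared["log"].append("consumed")

    def producer():
        shared["log"].append("produced")
        shared["ready"] = True
        yield lambda: True               # a wait whose condition already holds

    order = run({"consumer": consumer(), "producer": producer()})
    return order, shared["log"]
```

In demo(), the consumer is saved at its wait statement because the condition is false, the producer then runs to completion, and the consumer resumes once its condition holds. The deadlock check is an addition of this sketch and is not described in the text.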

【００５６】[0056] Meanwhile, as shown in FIG. 10, the data generation means 18 receives the syntax analysis table e from the syntax analysis means 16 and creates the corresponding assembly code in the equivalent code table h. It calculates the execution time and the synchronization wait time of that equivalent code by referring to the instruction code table i and the addressing mode table j stored beforehand in the storage means, and adds them, as separate execution-time and synchronization-wait-time totals, to the time table k of the process currently being executed.
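
As a rough illustration of this bookkeeping, the Python sketch below charges the cycle count of a piece of equivalent code to a per-process time table. Everything specific in it is an assumption made for the example: the cycle counts are placeholders rather than values from the embodiment, and the names INSTRUCTION_CYCLES, ADDRESSING_CYCLES, code_time, and charge are hypothetical stand-ins for tables i, j, and k.

```python
# All cycle counts below are made-up placeholders, not values from the patent.
INSTRUCTION_CYCLES = {"mov": 4, "add": 4, "tas": 10}   # stand-in for table i
ADDRESSING_CYCLES = {"reg": 0, "imm": 4, "abs": 8}     # stand-in for table j


def code_time(code):
    """Sum instruction and operand-addressing cycles for a list of
    (mnemonic, (addressing modes...)) tuples."""
    return sum(
        INSTRUCTION_CYCLES[op] + sum(ADDRESSING_CYCLES[m] for m in modes)
        for op, modes in code
    )


def charge(time_table, process, code, waiting=False):
    """Add the time of `code` to the process's entry in the time table,
    under either the execution or the synchronization-wait column."""
    entry = time_table.setdefault(process, {"execution": 0, "sync_wait": 0})
    entry["sync_wait" if waiting else "execution"] += code_time(code)
    return entry


def demo():
    table = {}                                           # stand-in for table k
    charge(table, "P0", [("mov", ("abs", "reg")), ("add", ("imm", "reg"))])
    charge(table, "P0", [("tas", ("abs",))], waiting=True)
    return table
```

Keeping the two totals separate is what lets the later stages distinguish time spent computing from time spent blocked on synchronization.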

【００５７】[0057] When the simulation finishes, the result is displayed on the input means 1. If the result meets the specification, the design proceeds to the next stage; if not, the design is redone from the PDL description stage. Next, the designer enters into the input means 1 an instruction to determine the number of processors and the coupling method between processors and to generate the instruction code that operates the processors. In response, the number-of-processors/coupling-method determining means 8 of the parallel processing program compiling device 3 refers to the data generated by the data generation means 18 of the simulator 2, determines the number of processors and the coupling method between processors, and generates a relay processor list.

【００５８】[0058] The coupling method is determined by steps (1) to (3) below.
(1) The coupling force is calculated for every pair of processors (pa^i, pa^j) to obtain the coupling force matrix F = (F^ij). The coupling force is the amount of communication between processes and can be determined from the size of the variables and the number of times they were sent and received during execution; this embodiment uses Equation 6 below as the coupling force function. This function can be regarded as proportional to the amount of information the processes communicated during execution.

【００５９】[0059]

【数６】 (Equation 6)

【００６０】[0060] (2) A histogram is taken over all elements of the resulting coupling force matrix F.
(3) If the histogram separates into several distributions, the processor pairs belonging to each distribution are directly connected as a network, taking the distributions in descending order of coupling force, and a netlist is created. Distributions continue to be adopted until no processor that needs to communicate is left isolated, that is, connected neither directly nor indirectly to any other processor. If the histogram does not separate, all processors belonging to the single distribution are directly connected as a network and a netlist is created.
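
Step (3) can be sketched in Python as below. This is a simplified sketch, not the method of the embodiment: the text does not say how the histogram is split into distributions, so the sketch assumes that each distinct coupling force value is its own distribution, and it stops as soon as every pair that communicates is joined directly or indirectly. The function name build_netlist and the dictionary layout of the coupling forces are hypothetical.

```python
def build_netlist(force):
    """force maps a processor pair (a, b) to its coupling force.

    Returns (level, netlist): the lowest force level that had to be
    adopted and the sorted list of directly connected pairs.
    """
    pairs = [(pair, f) for pair, f in force.items() if f > 0]
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    netlist = []
    for level in sorted({f for _, f in pairs}, reverse=True):
        for (a, b), f in pairs:
            if f == level:
                netlist.append((a, b))
                parent[find(a)] = find(b)
        # Stop once every communicating pair is joined directly or indirectly.
        if all(find(a) == find(b) for (a, b), _ in pairs):
            return level, sorted(netlist)
    return None, []


EXAMPLE_FORCE = {                     # illustrative values only
    ("P0", "P1"): 12,
    ("P1", "P2"): 9,
    ("P0", "P2"): 2,
    ("P2", "P3"): 8,
}
```

For EXAMPLE_FORCE the sketch adopts the levels 12, 9, and 8 and then stops, so the weakly coupled pair (P0, P2) gets no direct link and would have to be relayed through P1.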

【００６１】[0061] Determining the coupling method in this way limits the growth of communication circuitry, because process pairs with little traffic are relayed rather than directly connected, and reduces the communication overhead of relaying, because pairs with heavy traffic are directly connected. Once the network has been determined by the above procedure, communication between processors is possible in principle, but the processors have merely been wired together. Unless every processor is connected to every other, some pairs of processors cannot communicate directly and must go through a third processor. Which processor to go through has not yet been decided, so this information must be determined. It can be determined once the netlist is given. This is the well-known problem of finding the shortest path between two given points, for which solutions such as Dijkstra's method are well known. This embodiment uses Dijkstra's method to determine the communication paths, so a detailed description of the algorithm is omitted. As the information needed to communicate with another processor, each processor only has to store the list of processors it should contact directly in order to reach that processor. There may be several shortest paths; storing all of them makes it possible to keep any one communication path from becoming congested.
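
The routing information described here can be illustrated with the following Python sketch. It is a sketch under assumptions rather than the embodiment's algorithm: every link in the netlist is given a cost of 1 (the text does not state link costs), and the names shortest_distances and next_hops are hypothetical. For each source and destination it records every neighbour that lies on some shortest path, which corresponds to storing all shortest paths.

```python
import heapq


def shortest_distances(adjacency, source):
    """Dijkstra's method with every link given an assumed cost of 1."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v in adjacency[u]:
            if d + 1 < dist.get(v, float("inf")):
                dist[v] = d + 1
                heapq.heappush(heap, (d + 1, v))
    return dist


def next_hops(netlist):
    """Map (source, destination) to the sorted list of neighbours of the
    source that lie on a shortest path to the destination."""
    adjacency = {}
    for a, b in netlist:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {u: shortest_distances(adjacency, u) for u in adjacency}
    table = {}
    for src in adjacency:
        for dst in dist[src]:
            if dst != src:
                table[(src, dst)] = sorted(
                    n for n in adjacency[src]
                    if dist[n][dst] + 1 == dist[src][dst]
                )
    return table


RING = [("P0", "P1"), ("P1", "P2"), ("P2", "P3"), ("P3", "P0")]
```

On the four-processor ring RING, P0 can reach P2 through either P1 or P3, so both are stored and traffic can be spread over the two paths.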

【００６２】[0062] The instruction code generation means 9 of the parallel processing program compiling device 3 compiles the program divided by the function division means 7 and generates the instruction code that operates the processors. Next, the designer enters into the input means 1 an instruction to run a simulation using the instruction code and the netlist generated by the parallel processing program compiling device 3. The multiprocessor simulator 4 then performs the simulation. As shown in FIG. 14, the multiprocessor simulator 4 can model representative networks such as (A) the shared memory type, (B) the tree type, (C) the array type, (D) the torus type, and (E) the hypercube type, so that network performance can be compared.

【００６３】[0063] Based on the designer's instructions, the scheduling means 21 determines the number of processors, the type of network, the order in which the processors are simulated, and so on. It then reads the code generated by the code generation means 9 of the parallel processing program compiling device 3 and the netlist determined by the number-of-processors/coupling-method determining means 8, and controls the simulation. That is, as shown in FIG. 15, at initialization it passes the communication network information and the initial value data to the memory management means 23 and sets the execution order of the processors on the time wheel p. During simulation it passes the processor number of the processor to be executed on the time wheel p to the processor emulation means 22 and advances the pointer of the time wheel p by one.

【００６４】[0064] The processor emulation means 22 is called by the scheduling means 21 and executes the operation of the target processor one instruction cycle at a time. While doing so, it measures the time spent in each hardware-level execution state (computing; instruction fetch; local memory read, write, read wait, and write wait; shared memory read, write, read wait, and write wait; I/O read, write, read wait, and write wait) and in each access state of every communication path (read, write, read wait, and write wait). That is, as shown in FIG. 16, it executes the program q written in instruction code, i.e. assembly language. For example, it reads mov a, do, outputs the address of a, and receives the data data(a) from the memory management means 23. Because the instruction is mov, it assigns data(a) to do in the register file of the processor data file r and then advances the pointer by one. At the same time, it looks up the time required for each cycle in the instruction cycle time table s and records it in the time information section of the processor data file r.

【００６５】[0065] The memory management means 23 is called by the processor emulation means 22 and maps the virtual space, i.e. the virtual memory 25, to the physical space, i.e. the main memory 26. It also controls communication between processors over the network through the I/O space. That is, as shown in FIG. 17, at initialization it receives the topology information of the communication network from the scheduling means 21 and sets up the communication network t by setting, for each port, an attribute indicating which processors may access it. A port can be accessed only by the one set of processors recorded in its attribute, and communication takes place through it. Once the simulation starts, the memory management means 23 returns to the processor emulation means 22 the data corresponding to the processor number and address specified by the processor. Here u denotes processor data and v denotes shared data.

【００６６】[0066] The parallelism analysis means 24 is called by the scheduling means 21 and measures the time associated with the software-level execution state of each processor. The multiprocessor simulator 4 provides ten system variables for each processor; setting one of these variables causes the time during which it is set to be measured independently. The designer can use these variables to measure the execution time of any processing. Of the ten system variables, prb0 to prb2 are reserved by the system and measure the time spent executing, communicating, and waiting, respectively. Executing is the time during which the processor performs its assigned work; communicating is the time during which it sends or receives data needed for processing or relays data to other processors; waiting is the time during which it simply waits and does nothing. The parallelism analysis means 24 performs the measurement for each system variable, aggregates the results together with the hardware-level results, and outputs them as a data file.
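
The per-processor measurement variables can be pictured with the small Python sketch below. It is an assumed model for illustration only: the class name Probes, its methods, and the use of indices 3 to 9 for the designer's own measurements are hypothetical, and time is counted in abstract simulator cycles.

```python
class Probes:
    """Ten measurement variables for one simulated processor.

    Indices 0 to 2 mirror prb0 to prb2 (executing, communicating, waiting);
    treating indices 3 to 9 as free for the designer is an assumption.
    """

    def __init__(self, count=10):
        self.active = [False] * count
        self.elapsed = [0] * count

    def set(self, index, on=True):
        """Start (on=True) or stop (on=False) measuring with one variable."""
        self.active[index] = on

    def tick(self, cycles):
        """Advance simulated time; every variable that is set accumulates it."""
        for i, on in enumerate(self.active):
            if on:
                self.elapsed[i] += cycles


def demo():
    p = Probes()
    p.set(0)            # executing
    p.tick(5)
    p.set(0, False)
    p.set(1)            # communicating
    p.set(3)            # a designer-defined measurement
    p.tick(2)
    p.set(3, False)
    p.tick(4)
    return p.elapsed[:4]
```

Because each variable accumulates independently, overlapping measurements such as index 1 and index 3 in demo() do not interfere with each other.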

【００６７】[0067] When the simulation finishes, the result is displayed on the input means 1. If the result meets the specification, the design is complete; if not, the design is redone from the PDL description stage. If the communication efficiency is unsatisfactory, the efficiency is compared with that of existing networks such as those of FIGS. 14(A) to 14(E), and an existing network is adopted if it meets the specification.

【００６８】[0068] The parallel processor that executes a parallel processing program developed by the above procedure consists, for example as shown in FIG. 18, of a plurality of processors 29_0 to 29_(n-1), each having a local memory 28_0 to 28_(n-1), joined by a communication network; each processor communicates with the other processors through communication paths. The communication network comprises a general-purpose switch 32, which switches the correspondence between the data bus 30 and the bidirectional memory blocks 31_0 to 31_(n-1) according to a netlist describing how the processors 29_0 to 29_(n-1) are connected, and a network controller 33, which controls the general-purpose switch 32. Communication is performed by writing to the bidirectional memory blocks 31_0 to 31_(n-1), each of which can be accessed only by the processors that may be connected through it. The communication network also has a shared memory 34 that all processors 29_0 to 29_(n-1) can access, and the processors can communicate with one another through the shared memory 34 as well. Exclusive control at the hardware level for communication between processors is assumed to be performed automatically, and at the software level each processor is assumed to have a tas (Test and Set) instruction with an indivisible read/write cycle.
FIG. 19 shows the overall organization of the memory space (virtual space), which is divided into three main parts: the local memory space, the shared memory space, and the I/O space. The local memory space is an area that only the corresponding processor can access, and one exists for each processor 29_0 to 29_(n-1). It is further made up of a processor-specific exception vector area (which may not exist on some processors), a PE connection information section, and a user area. The shared memory space is an area that all processors can access; it is under exclusive control by the network controller 33, so only one processor can access it at a time. The I/O space consists of many memory blocks, each of which can be accessed only by one specific set of processors. This exclusive control is performed by the network controller 33 according to the netlist describing how the processors are connected. Different blocks of this area can be accessed simultaneously.
The PE (processor) connection information section resides in the local memory space and, as shown in FIG. 20, stores the logical number of the processor, the type of network, the network parameters, and the numbers and I/O addresses of the processors with which the processor can communicate directly through I/O. The user area, as shown in FIG. 21, consists of an area that stores the program executed by the processor and an area that stores data accessed only by that processor. The structure of the exception vector area differs from processor to processor, so its description is omitted. The I/O space, as shown in FIG. 22, is organized in blocks of 16 bytes, and one set of processors communicates through each block. Each block consists of an attribute area, a send/receive buffer, and a spare area. The processors can access the send/receive buffer and the spare area, while the attribute area can be accessed only by the network controller 33.

【００６９】[0069] The main operations of the above parallel processing program compiling device are described below through specific examples.
(Specific Embodiment 1) Non-recursive conversion of a recursive process is described using the procedure IntGauss(), which computes the value of the indefinite integral, involving a Gaussian function, given by Equation 7 below.

【００７０】[0070]

【数７】 (Equation 7)

【００７１】[0071] The indefinite integral G_m falls into the two cases of Equation 8 below, depending on whether m is even or odd.

【００７２】[0072]

【数８】 (Equation 8)

【００７３】[0073] The values in Equation 8 above can be calculated with Equations 9 to 11 below.

【００７４】[0074]

【数９】 (Equation 9)

【００７５】[0075]

【数１０】 (Equation 10)

【００７６】[0076]

【数１１】 (Equation 11)

【００７７】[0077] Let IntGauss() be the procedure that computes G_m, IntEGauss() the procedure that computes E_n, IntOGauss() the procedure that computes O_n, and IntIOGauss() the procedure that computes the integral of O_n. The program is then as shown in FIGS. 23 to 26. The shaded portion of FIG. 24 is a parallel execution statement. In the procedure IntEGauss(), the procedures IntOGauss() and IntIOGauss() are called within a parallel execution statement and are therefore processes. Furthermore, because the procedure IntOGauss() calls the procedure IntEGauss(), IntEGauss() is a recursive procedure. The procedure IntEGauss() is therefore a recursive process.

【００７８】[0078] Non-recursive conversion of a recursive process is based on principles (1) and (2) below.
(1) A new procedure A' is generated in which the parallel execution statement of the recursive process A is turned into a sequential execution statement.
(2) In the procedures called from the recursive process A, every call to the recursive process A is renamed to A'.
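
Principles (1) and (2) can be sketched on a toy call graph in Python. The representation is an assumption made for illustration: each procedure is a dictionary with a "parallel" flag (whether its calls sit in a parallel execution statement) and a list of callees, only direct callees of A are rewritten, and the function name derecursify is hypothetical. The procedure names follow the IntGauss example above.

```python
import copy


def derecursify(procs, name, new_name):
    """Return a new call graph in which `name` is no longer re-entered.

    (1) Clone `name` as `new_name` with its parallel statement made
        sequential.
    (2) In every procedure called from `name`, rename calls to `name`
        so that they target `new_name` instead.
    """
    out = copy.deepcopy(procs)
    clone = copy.deepcopy(procs[name])
    clone["parallel"] = False                     # principle (1)
    out[new_name] = clone
    for callee in procs[name]["calls"]:
        if callee in (name, new_name):
            continue
        out[callee]["calls"] = [                  # principle (2)
            new_name if c == name else c for c in out[callee]["calls"]
        ]
    return out


EXAMPLE = {
    "IntEGauss": {"parallel": True, "calls": ["IntOGauss", "IntIOGauss"]},
    "IntOGauss": {"parallel": False, "calls": ["IntEGauss"]},
    "IntIOGauss": {"parallel": False, "calls": []},
}
```

Applied to EXAMPLE with the new name IntEGauss1, the sketch leaves IntEGauss as the only procedure containing a parallel statement, adds a sequential IntEGauss1, and makes IntOGauss call IntEGauss1, so the parallel process is never entered recursively.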

【００７９】[0079] After non-recursive conversion, the procedure IntEGauss() is as shown in FIG. 27. The newly generated procedure IntEGauss1() is shown in FIG. 28, and the modified procedure IntOGauss() is shown in FIG. 29; the shaded portion of FIG. 29 marks the change.
(Specific Embodiment 2) The processing from function division through network determination is described using the procedure calc, which performs the butterfly operation of a fast Fourier transform, as in the program of FIG. 30. This example contains no recursive process, and every procedure already consists of a single loop, so neither non-recursive conversion nor one-loop conversion is needed and their description is omitted.

【００８０】[0080] (1) Division
The flow graph of FIG. 31 is obtained from the program of FIG. 30. Each node of this flow graph represents an executable statement, and multiple branches, i.e. the portions enclosed by broken lines, represent parallel execution. Division yields the flow graph of FIG. 32.
(2) Generation of synchronization mechanisms and communication protocols
FIG. 33 shows the result of classifying the variables of the procedure calc by the variable classification process. The variables yre, yim, xre, and xim are independent variables within this procedure, but in the procedure cfft, which calls calc, both correspond to the variables zre and zim, so the two are treated as the same variable. The variables yre, yim, xre, and xim are therefore relay variables. Next, based on the classification result, synchronization variables are generated for every variable other than isolated variables, turning them into communication variables. The communication variables obtained in this way are shown in FIG. 34. The execution parts of the program after generation of the synchronization mechanisms and communication protocols are shown in FIGS. 35 to 38, where the shaded portions are the synchronization mechanisms and communication protocols generated by the separation processing means 11 of the function division means 7 of the parallel processing program compiling device 3.

[0081] (3) Conversion of non-divided sets into procedures and inline expansion: After the synchronization mechanism and communication protocol have been generated, a procedure is generated that has this set as its processing execution part. Inline expansion is then executed for the procedure calc.

(4) Determination of the network: To obtain the coupling force distribution required for determining the network, the simulator 2 performs the simulation again. FIG. 39 shows the coupling force matrix obtained from the simulation, and FIG. 40 shows the coupling force distribution created from it. In FIG. 39, P0 manages the entire system, P1 to P7 create the trigonometric function table, P8 to P11 perform the butterfly operations, and P12 to P14 perform the bit-reversal operations. Based on this coupling force distribution, each pair of processors (net) whose coupling force is 8 or more (marked * in FIG. 39) is directly connected, and these connections form the network. The resulting network is shown as a graph in FIG. 41. The nodes of this graph represent processors, and the solid edges represent communication paths. The broken edges indicate pairs that communicate without being directly connected, that is, pairs whose communication requires a third processor.
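The network determination in step (4) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the matrix values below are hypothetical (they are not the data of FIG. 39), and the function names `build_network` and `relay_candidates` are introduced here only for illustration. The sketch assumes a symmetric coupling force matrix in which a value of 0 means the two processors do not communicate.

```python
from itertools import combinations


def build_network(force, threshold):
    """Split communicating processor pairs into direct links and relayed pairs.

    Pairs whose coupling force reaches the threshold are connected directly
    (solid edges); pairs that communicate below the threshold are left to be
    relayed through another processor (broken edges).
    """
    direct, relayed = set(), set()
    for i, j in combinations(range(len(force)), 2):
        if force[i][j] >= threshold:
            direct.add((i, j))
        elif force[i][j] > 0:
            relayed.add((i, j))
    return direct, relayed


def relay_candidates(direct, pair, n):
    """Return processors that are directly linked to both ends of a relayed pair."""
    a, b = pair

    def linked(x, y):
        return (min(x, y), max(x, y)) in direct

    return [k for k in range(n) if k not in pair and linked(a, k) and linked(b, k)]


# Hypothetical 4-processor coupling force matrix (symmetric, zero diagonal).
FORCE = [
    [0, 12, 9, 0],
    [12, 0, 3, 8],
    [9, 3, 0, 0],
    [0, 8, 0, 0],
]

direct, relayed = build_network(FORCE, threshold=8)
print(sorted(direct))   # [(0, 1), (0, 2), (1, 3)]
print(sorted(relayed))  # [(1, 2)]
print(relay_candidates(direct, (1, 2), len(FORCE)))  # [0]
```

In this toy matrix, processors 1 and 2 communicate with a coupling force of only 3, so they are not wired together and processor 0 acts as the third processor between them.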

[0082] In the above embodiment, the range in which processors are directly connected to each other is determined using, as the reference, the maximum coupling force at which all processors that need to communicate are connected to one another either directly or indirectly. However, the apparatus may be configured so that the designer can set this reference coupling force to a smaller value using the input means 1. Also, in the above embodiment the one-loop conversion is always executed, but the apparatus may be configured to execute it only when the designer gives an instruction using the input means 1.
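The reference value described here, the largest coupling force at which every pair of communicating processors is still connected directly or indirectly, can be found by trying thresholds from the largest downward. The following Python sketch is one possible reading of that rule, not the patent's own procedure. The names `reference_force` and `choose_threshold`, the union-find helper, and the sample matrix are all assumptions made for illustration.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def connects_all(force, threshold):
    """True if every communicating pair is joined, directly or indirectly,
    using only links whose coupling force is at least the threshold."""
    n = len(force)
    parent = list(range(n))
    pairs = [(i, j) for i, j in combinations(range(n), 2) if force[i][j] > 0]
    for i, j in pairs:
        if force[i][j] >= threshold:
            parent[find(parent, i)] = find(parent, j)
    return all(find(parent, i) == find(parent, j) for i, j in pairs)


def reference_force(force):
    """Largest coupling force that still keeps all communicating pairs connected."""
    levels = sorted({f for row in force for f in row if f > 0}, reverse=True)
    for t in levels:
        if connects_all(force, t):
            return t
    return None  # no processors communicate at all


def choose_threshold(force, designer_value=None):
    """Use the designer's value only when it is smaller than the reference."""
    ref = reference_force(force)
    if ref is None or designer_value is None:
        return ref
    return min(ref, designer_value)


# Hypothetical 4-processor coupling force matrix (symmetric, zero diagonal).
FORCE = [
    [0, 12, 9, 0],
    [12, 0, 3, 8],
    [9, 3, 0, 0],
    [0, 8, 0, 0],
]

print(reference_force(FORCE))                       # 8
print(choose_threshold(FORCE, designer_value=3))    # 3
```

Lowering the threshold only adds direct links, so a designer-supplied smaller value never breaks connectivity; it trades more wiring for fewer relayed communications.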

【００８３】[0083]

[Effects of the Invention] As described above, the present invention provides: lexical analysis means that reads a parallel processing program written in a high-level language including parallel execution statements, each of which groups processes to be executed simultaneously, and decodes its tokens to detect spelling errors; syntax analysis means that receives words from the lexical analysis means and analyzes the grammar to detect grammatical errors; execution means that performs various arithmetic operations according to the syntax received from the syntax analysis means; and data generation means that measures the processing time taken by the execution means and predicts the efficiency of the parallel processing program. Because the efficiency can be predicted at the same time as the correctness of the parallel processing program is verified, the invention has the excellent effect of enabling the development of a program in which the efficiency of each processor is uniform and which is efficient for the system as a whole.

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of a parallel processing system design support apparatus including a parallel processing programming simulator according to an embodiment of the present invention.

FIG. 2 is a configuration diagram of the parallel processing program compiling device.

FIG. 3 is a configuration diagram of the simulator.

FIG. 4 is a configuration diagram of the multiprocessor simulator.

FIG. 5 is a flow graph of a process model described in PDL.

FIG. 6 is a flow graph of the process model after the separation (peeling) processing.

FIG. 7 is an explanatory diagram of the operation of the lexical analysis means.

FIG. 8 is an explanatory diagram of the operation of the syntax analysis means.

FIG. 9 is an explanatory diagram of the operation of the execution means.

FIG. 10 is an explanatory diagram of the operation of the data generation means.

FIG. 11 is a flow graph of the process model before and after the non-recursive conversion.

FIG. 12 is an explanatory diagram of the operation of the switching means.

FIG. 13 is a flowchart for explaining the operation of the switching means.

FIG. 14 is a configuration diagram of networks that can be realized by the multiprocessor simulator.

FIG. 15 is an explanatory diagram of the operation of the scheduling means.

FIG. 16 is an explanatory diagram of the operation of the processor emulation means.

FIG. 17 is an explanatory diagram of the operation of the scheduling means.

FIG. 18 is a configuration diagram of an actual model of the parallel processor.

FIG. 19 is a configuration diagram of the virtual space.

FIG. 20 is a configuration diagram of the PE connection information section.

FIG. 21 is a configuration diagram of the user area.

FIG. 22 is a configuration diagram of the I/O space.

FIG. 23 is an explanatory diagram of a program described in PDL.

FIG. 24 is an explanatory diagram of a program described in PDL.

FIG. 25 is an explanatory diagram of a program described in PDL.

FIG. 26 is an explanatory diagram of a program described in PDL.

FIG. 27 is an explanatory diagram of the program after the non-recursive conversion.

FIG. 28 is an explanatory diagram of the program after the non-recursive conversion.

FIG. 29 is an explanatory diagram of the program after the non-recursive conversion.

FIG. 30 is an explanatory diagram of a program described in PDL.

FIG. 31 is a flow graph of a process model described in PDL.

FIG. 32 is a flow graph of the process model after the division processing.

FIG. 33 is an explanatory diagram of the variable classification result.

FIG. 34 is an explanatory diagram of the communication variables.

FIG. 35 is an explanatory diagram of the program in which the synchronization mechanism and communication protocol are embedded.

FIG. 36 is an explanatory diagram of the program in which the synchronization mechanism and communication protocol are embedded.

FIG. 37 is an explanatory diagram of the program in which the synchronization mechanism and communication protocol are embedded.

FIG. 38 is an explanatory diagram of the program in which the synchronization mechanism and communication protocol are embedded.

FIG. 39 is an explanatory diagram of the coupling force matrix.

FIG. 40 is an explanatory diagram of the coupling force distribution.

FIG. 41 is an explanatory diagram of the network.

[Explanation of symbols]

15 lexical analysis means
16 syntax analysis means
17 execution means
18 data generation means

Claims

(57) [Claims]

1. A parallel processing programming simulator comprising: lexical analysis means for reading a parallel processing program written in a high-level language including parallel execution statements, each of which groups processes to be executed simultaneously, and decoding its tokens to detect spelling errors; syntax analysis means for receiving words from the lexical analysis means and analyzing the grammar to detect grammatical errors; execution means for performing various arithmetic operations according to the syntax received from the syntax analysis means; and data generation means for measuring the processing time taken by the execution means and predicting the efficiency of the parallel processing program.