CN105528243B

CN105528243B - Priority grouping scheduling method and system using data topology information

Info

Publication number: CN105528243B
Application number: CN201510382438.5A
Authority: CN
Inventors: 陈莉; 韩冬妮; 侯雄辉
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2015-07-02
Filing date: 2015-07-02
Publication date: 2019-01-11
Anticipated expiration: 2035-07-02
Also published as: CN105528243A

Abstract

The present invention discloses a priority grouping scheduling method and system using data topology information. The method comprises: obtaining the original grid space of the data topology information, setting the size and floating-point precision of the grid patches in the original grid space, and generating a new grid space; building a compact task graph (a four-dimensional space-time domain) according to the stencil format of the new grid space and of the parallel region, and computing the priority group number of each task in it; obtaining the data patches accessed by the current task, deciding whether the task involves a neighbor data dependence, expressed either as a stencil format abstraction or as a function pointer with actual arguments, and generating a corresponding mark; identifying from the marks the loops that involve neighbor data dependences, each such loop being an effective time step; and, according to the current effective time step, mapping each task in the loop to a task of the compact task graph and computing the priority number of the current task from the priority number of the latter, thereby supporting priority grouping scheduling of tasks.

Description

A priority grouping scheduling method and system using data topology information

Technical field

The present invention relates to the field of computers and user-level task scheduling, and in particular to a priority grouping scheduling method and system using data topology information.

Background art

Modern microprocessor architectures face a memory wall that is hard to overcome. Optimizing the data locality of a program, and thereby improving its cache performance on the processor, is an important topic in application optimization.

The background of the structured grid method is as follows. Fields such as thermal diffusion, electromagnetics, and fluid dynamics involve a large number of nonlinear equations to be solved, and only a very few of these problems admit analytic or perturbation solutions. The structured grid method is one of the most important numerical methods for this class of problems. Its typical steps are: discretize the solution domain, turning the continuous domain into a finite set of discrete points, for example a fixed uniform grid or an adaptive structured grid; replace derivatives with difference quotients; and compute the discrete solution at the grid nodes of the solution domain, so that after a certain number of iterations the solution of the difference equation converges to the solution of the differential equation.

The difference computation performed at each point of a structured (discrete) grid usually needs the values of some nearby points. Such a computation is called a stencil computation. Stencil computation is the core of structured grid applications and one of the seven widely recognized classes of high-performance computing patterns.

Related art 1: optimizing data reuse in stencil computations with time skewing methods, as shown in Fig. 1:

Structured grid applications have a very low ratio of computation to memory access, so optimizing cache utilization is one of the keys to improving their performance on multi-core systems. Academia has proposed various time skewing algorithms that optimize parallelism and data locality at the same time. The main idea is to apply skewed tiling to the multi-dimensional space formed by the iteration steps of the stencil computation and the spatial grid, so that data reuse that originally spanned a long distance is confined to the inside of a tile, and then to determine the dependences between tiles and realize a quasi-static task schedule. In essence, time skewing considers the outer time-iteration loop and the inner traversal of the spatial grid together and applies complex loop transformations.

Time skewing can only be applied if the outermost time loop can be transformed together with the inner loops, which is very difficult, or even infeasible, for complex applications. The adaptive grid method is one such application type. It is an effective way to solve complex physical problems: at each point in time or space it decides dynamically, according to the error, whether adaptive refinement is needed in order to improve solution accuracy. First, such methods are very complex to implement. They have to be developed on top of mature domain programming frameworks, the data structures become complex and are no longer simple arrays, and the computation steps may mix C and Fortran. This makes it hard for a compiler to perform static dependence analysis across the whole grid space and all computation steps, so time skewing cannot be applied. Second, because the grid partition is dynamic, the neighbors of a grid point cannot be determined statically, which also limits time skewing optimization.

Related art 2: task parallelism and scheduling optimization in task scheduling, as shown in Fig. 2 (the English labels in Fig. 2 name the four basic operations of Cholesky decomposition, which the code implements by calling library functions of a high-performance math library) and Fig. 3:

The task-parallel programming model has been widely studied and used on multi-core platforms in recent years. It aims to simplify parallel programming and improve multi-core utilization. Industry and academia have developed many such parallel programming interfaces, for example Cilk/Cilk++, OpenMP 3.0, X10, Habanero-Java, TBB, and TPL. In a program written with such an interface, the tasks form a spawn tree, that is, a directed acyclic graph, and the runtime system is responsible for task scheduling. Each core corresponds to one physical thread, and each physical thread can execute many logical tasks. Scheduling tasks in user space greatly reduces scheduling overhead and thus improves the execution efficiency of multi-threaded programs. The runtime system uses a work-stealing scheduling algorithm to achieve load balance and improve multi-core utilization.

In terms of improving data locality in task scheduling, Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe, "The Data Locality of Work Stealing", Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures, July 2000, Bar Harbor, Maine, United States, pp. 1-12, proposed an affinity mechanism between computation and tasks, which has been adopted by some practical task scheduling systems. For grid applications, however, this approach mainly optimizes storage across the nodes of a cluster, and its effect on cache optimization is not significant.

Priority scheduling is another common scheduling optimization, but it is usually applied to task graphs whose critical path is clearly longer than the other paths, in order to improve load balance. For example, Fig. 2 shows the task graph of a block-based Cholesky decomposition, in which the dark-blue dpotrf tasks are the key tasks and need to be given higher priority.

However, the task graphs of many applications contain no obvious key tasks. For example, the task graph of a structured grid application generally looks like Fig. 3: the tasks are tightly coupled, all directed paths are almost equally long, and there is no particularly long critical path. (Here, the color of a task indicates the thread on which it executes.) So far, nobody has found that priority grouping scheduling can be used to improve the task scheduling performance of structured grid applications.

Summary of the invention

The present invention targets structured grid applications. Its purpose is to realize, on top of a traditional task scheduling mechanism and through cooperation between the user interface and the scheduling system, a priority grouping scheduling method and system using data topology information.

The present invention proposes a priority grouping scheduling method using data topology information, comprising:

Step 1: obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches in the original grid space, and generate a new grid space;

Step 2: according to the stencil format of the new grid space and of the parallel region, build a compact task graph and compute the priority group number of each task in the compact task graph, wherein the priority computation rule is: assign an initial priority number to each data patch in row-major order; the priority number of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches it accesses; the priority number of any other task is the maximum of the priority numbers of the source tasks of its dependence edges;

Step 3: obtain the data patches accessed by the current task, decide whether it involves a neighbor data dependence, expressed either as a stencil format abstraction or as a function pointer with actual arguments, and generate a corresponding mark; according to the marks, identify the loops that involve neighbor data dependences, each such loop being an effective time step; according to the effective time step and the spatial coordinate of the task, map the current task to a task of the compact task graph, and compute the priority number of the current task from the priority number of the latter.
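The priority rule of step 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a 1-dimensional grid with a 3-point stencil, and the names `patch_priorities` and `compact_graph_priorities` are hypothetical.

```python
from typing import Dict, List, Tuple

Task = Tuple[int, int]  # (time layer, patch index); hypothetical 1-D layout


def patch_priorities(num_patches: int) -> List[int]:
    """Row-major initial numbering; in 1-D this is simply the patch index."""
    return list(range(num_patches))


def compact_graph_priorities(num_patches: int, layers: int) -> Dict[Task, int]:
    """Assign a priority number to every task of a small compact task graph.

    Assumption: task (t, x) accesses patches x-1, x, x+1 and depends on
    tasks (t-1, x-1), (t-1, x), (t-1, x+1), clipped at the grid boundary.
    """
    init = patch_priorities(num_patches)
    prio: Dict[Task, int] = {}
    for t in range(layers):
        for x in range(num_patches):
            nbrs = [n for n in (x - 1, x, x + 1) if 0 <= n < num_patches]
            if t == 0:
                # First topological layer: maximum over the accessed patches.
                prio[(t, x)] = max(init[n] for n in nbrs)
            else:
                # Other layers: maximum over the source tasks of the
                # dependence edges.
                prio[(t, x)] = max(prio[(t - 1, n)] for n in nbrs)
    return prio
```

With 4 patches, the first layer receives the numbers 1, 2, 3, 3, and every later layer can only keep or raise the number inherited from its predecessors.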

In the above priority grouping scheduling method using data topology information, each data patch in step 1 has a unique grid space coordinate.

In the above priority grouping scheduling method using data topology information, between step 2 and step 3 the method further comprises obtaining the coordinate of each task in the new grid space according to the spatial position of the data patches that the task accesses.

In the above priority grouping scheduling method using data topology information, step 2 further comprises grouping constraints: the data footprint of a group is less than half the size of the last-level cache, and the data reuse degree within each group is greater than a preset threshold.

The above priority grouping scheduling method using data topology information further comprises binding each task to a fixed thread (thread affinity), so that the tasks of each group are evenly distributed over the threads and task scheduling is constrained to execute within the current group.

The present invention also proposes a priority grouping scheduling system using data topology information, comprising:

A new grid space generation module, configured to obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches of the original grid space, and generate a new grid space;

A compact task graph priority grouping module, configured to build a compact task graph according to the stencil format of the new grid space and of the parallel region and to determine the group width of the compact task graph; the priority computation rule for tasks on the compact task graph is: assign an initial priority to each data patch in row-major order; the priority of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches accessed by that task; the priority of any other task on the compact task graph is the maximum of the priority numbers of the source tasks of its dependence edges;

A dynamic grouping module, configured to obtain the data patches accessed by the current task, decide whether it involves a neighbor data dependence, expressed either as a stencil format abstraction or as a function pointer with actual arguments, and generate a corresponding mark; according to the marks, identify the loops that involve neighbor data dependences, each such loop being an effective time step; according to the effective time step, map each task in the loop to a task of the compact task graph, and compute the priority number of the current task from the priority number of the latter.

In the above priority grouping scheduling system using data topology information, each grid data patch in the new grid space generation module has a unique grid space coordinate.

The above priority grouping scheduling system using data topology information further comprises an index acquisition module, configured to obtain the coordinate of the task in the new grid space according to the spatial position of the data patches accessed by the current task.

In the above priority grouping scheduling system using data topology information, the grouping constraints include: the data footprint of a group is less than half the size of the last-level cache, and the data reuse degree within each group is greater than a preset threshold.

The above priority grouping scheduling system using data topology information further comprises an affinity assignment module, configured to bind each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads and task scheduling is constrained to execute within the current group.

As can be seen from the above scheme, the present invention has the following advantages:

The technical effect has two parts. First, it supports automatic mapping of tasks to the four-dimensional space-time domain (the compact task graph). Second, it supports priority grouping scheduling and achieves reuse of the shared cache both within a group and between adjacent groups.

1. A set of user interfaces is provided that supports automatic mapping of the task graph to the four-dimensional space-time domain (the compact task graph) and is not limited by program complexity. This set of user interfaces is described in Table 1:

Table 1. A new set of user interfaces

Why this technical point is beneficial: 1) from the information provided by the user, the scheduling system can map tasks to the four-dimensional space-time domain (the compact task graph). The user is responsible for dividing the tasks and describing their data access information. The greatest difficulty in mapping tasks to the four-dimensional space-time domain is expressing neighbor dependences; because the most general description, a function pointer, is provided, this difficulty is overcome; 2) this work is an optimization of the scheduling system and is applicable to structured grid applications of arbitrary complexity.

2. Using the guidance information from the user, the scheduling system can automatically choose a good group shape, data affinity scheme, and priority grouping scheme, optimize cache reuse between tasks, and improve application performance. This can be described abstractly by Table 2:

Table 2. Two algorithms for priority grouping (natural subdivision algorithm, plane subdivision algorithm)

Why this technical point is beneficial: 1) the present invention can select a good group shape and size. The data footprint of a group can be estimated from the oblique projection of the dividing planes, combined with the size of a grid patch; among the group sizes that fit the shared cache capacity, the present invention prefers those with a higher degree of cache reuse; 2) the present invention can constrain scheduling to within one group. This technical point is responsible for determining the data affinity mapping: the tasks of the same group are spread over the threads as much as possible, so that data reuse in the shared cache can be obtained; 3) the height of a group along the time axis is related to the data reuse degree within the group. The present invention can select group shapes with a higher reuse degree, while cache reuse between adjacent groups is achieved through careful assignment of group numbers.

Brief description of the drawings

Fig. 1 shows several different time skewing methods for a one-dimensional problem;

Fig. 2 is the task dependence graph of Cholesky decomposition (5x5);

Fig. 3 is the task dependence graph of a grid application;

Fig. 4 is the priority grouping obtained in a one-dimensional grid space using natural subdivision;

Fig. 5 is the grouping obtained in a two-dimensional grid space using plane subdivision;

Fig. 6 shows the difference between plane subdivision and natural subdivision; in natural subdivision the triangular part belongs to the group of the left neighbor;

Fig. 7 is the overall flowchart of the present invention.

The reference numerals are:

Step 100/101/102/103/104/105.

Detailed description of the embodiments

To solve the above technical problems, the present invention proposes a priority grouping scheduling method and system using data topology information.

The overall steps of the present invention are as follows, as shown in Fig. 7:

Obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches in the original grid space, and generate a new grid space. The partitioning corresponds to the global task division, while the patch size and floating-point precision are used to guide the computation of the data footprint; each patch has a unique grid space coordinate;

According to the stencil format of the new grid space and of the parallel region, build the compact task graph, whose height along the time axis is that of a single group; compute the priority group number of each task in it by the following rule: assign an initial priority to each data patch in row-major order; the priority of a task in the first topological layer of the compact task graph is the maximum of the priorities of the data patches accessed by that task; the priority number of any other task is the maximum of the priority numbers of the source tasks it depends on;

Obtain the data patches accessed by the current task, decide whether it involves a neighbor data dependence, expressed either as a stencil format abstraction or as a function pointer with actual arguments, and generate a corresponding mark; according to the marks, identify the loops that involve neighbor data dependences, each such loop being an effective time step; according to the effective time step, map each task in the loop to the compact task graph, and compute the priority number of the current task from the priority number of the latter;

The user gives the grid space coordinate of each task in the new grid space (according to the spatial position of its data patches);

Obtain the neighbor data dependences. The user specifies the data patches accessed by each task and expresses the neighbor data dependences either as a stencil format abstraction or in the form (function pointer, actual arguments).

Two kinds of computation in grid applications involve neighbor data dependences: stencil computation and neighbor data exchange (either between adjacent data patches of the same grid level or between corresponding data patches of adjacent levels). The format of a stencil computation is fixed and can be described statically, but neighbor data exchange is not necessarily so. In some complex applications, such as adaptive grid applications, the number of neighbors a data patch has on the corresponding grid level is not known in advance and the data patches are stored separately, so the scheduling system cannot compute the neighbor patches by itself. The user can, however, provide a function pointer and the corresponding actual arguments to enumerate the neighbor data patches.

According to the data dependence marks, the scheduling system identifies the loops that involve neighbor dependences; these are called effective time steps;

On the space-time domain formed by the effective time steps and the grid space, grouping based on hyperplane partitioning is performed according to the skew of the data dependences. There are two grouping constraints: i) the data footprint is less than half the size of the last-level cache; ii) the data reuse degree within a group is as large as possible (greater than a preset threshold);

Priorities are assigned to the groups in row-major order, so as to exploit data reuse between groups;

The scheduling system binds each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads, and constrains task scheduling to execute within the current group.
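The two ways of expressing a neighbor data dependence in the steps above, and the marking of effective time steps, can be sketched as follows. This is only an illustration under assumptions: `TaskDesc`, its fields, and `effective_time_steps` are hypothetical names and do not reproduce the user interface of Table 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Coord = Tuple[int, int, int]  # (z, y, x) coordinate of a data patch


@dataclass
class TaskDesc:
    """Hypothetical task description supplied by the user."""
    patch: Coord
    # Option 1: a static stencil abstraction given as patch offsets.
    stencil_offsets: Optional[Sequence[Coord]] = None
    # Option 2: a function pointer plus actual arguments that enumerates
    # the neighbor patches at run time (e.g. for adaptive grids).
    neighbor_fn: Optional[Callable[..., List[Coord]]] = None
    neighbor_args: tuple = ()

    def neighbors(self) -> List[Coord]:
        if self.stencil_offsets is not None:
            z, y, x = self.patch
            return [(z + dz, y + dy, x + dx)
                    for dz, dy, dx in self.stencil_offsets
                    if (dz, dy, dx) != (0, 0, 0)]
        if self.neighbor_fn is not None:
            found = self.neighbor_fn(*self.neighbor_args)
            return [n for n in found if n != self.patch]
        return []

    def has_neighbor_dependence(self) -> bool:
        """The mark generated for this task."""
        return bool(self.neighbors())


def effective_time_steps(loops: Sequence[Sequence[TaskDesc]]) -> List[int]:
    """Indices of the loops that contain at least one marked task."""
    return [i for i, tasks in enumerate(loops)
            if any(t.has_neighbor_dependence() for t in tasks)]
```

A loop whose tasks only touch their own patch is not counted as an effective time step, so it does not advance the time coordinate used for the mapping onto the compact task graph.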

In the new grid space generation module, each grid data patch has a unique grid space coordinate.

An index acquisition module, configured to obtain the coordinate of the task in the new grid space according to the spatial position of the data patches accessed by the current task.

The grouping constraints include: the data footprint of a group is less than half the size of the last-level cache, and the data reuse degree within each group is greater than a preset threshold.

An affinity assignment module, configured to bind each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads and task scheduling is constrained to execute within the current group.

The specific steps of the present invention are as follows. Scenario: in this implementation case the data on the grid space are stored contiguously on a fixed grid. Software: the changes concern the user interface and the internal mechanism of the scheduling system. Improvements: a new user interface and automatic algorithms for priority grouping are proposed. Because a spatial grid has at most 3 dimensions, the following embodiments consider only the 3-dimensional case for ease of description, and assume that a smaller priority number means a higher priority:

S1: add an initialization phase before the task parallel region starts

S1.1: describe the global partition information of the grid space. For example, a 3-dimensional grid space may be divided along all 3 axes. The division may be uniform or non-uniform: a uniform division gives the division spacing, while a non-uniform division must give the shape of every patch. The scheduling system computes the partitioned grid space from this information.

S1.2: describe the size of each patch and the precision of the floating-point data. For a uniform partition the size is definite; for a non-uniform partition the user can provide an average patch size. The floating-point precision indicates whether the computation uses single precision, double precision, or 128-bit precision.

S1.3: describe in which axial directions the tasks of the parallel region have neighbor dependences, and the number of grid variables involved in the parallel region.

S1.4: the user selects the priority grouping mode: natural subdivision or plane subdivision.

S1.5: in this phase the scheduling system computes the size <wx, wy, wz, wt> of a priority group; the time-axis height of the "compact task graph" is then also known.

For example, suppose the patch size is 16^3, the computation uses double precision, Narray grid variables take part in the computation, and the L3 cache of the current machine is 20M. If the stencil format has neighbor dependences in all 6 directions of the three spatial axes, then the data footprint of a group <wx, wy, wz, wt> must satisfy sizeof(double) * Narray * (wx+wt-1) * (wy+wt-1) * (wz+wt-1) * 16^3 ≤ 20M * 0.5 * 1.1, where 1.1 is a relaxation coefficient that compensates for the aggressiveness of the projection-based estimate. From this, the feasible group shapes with the largest data footprint can be computed, and the one with the highest reuse degree is selected.

Fig. 4 shows the priority grouping obtained with natural subdivision for a 1-dimensional grid space. Because the tile size along the time axis is 8 and the tile size along the X axis is 4, the task graph here is split into two windows, and it can be seen that the division planes of the X axis tilt toward the t axis.

S1.6: build the mapping tables according to the subdivision method selected by the user

S1.6.1: for natural subdivision:

Partition the grid space into blocks of size <wx, wy, wz> and assign an initial group number to each data patch in linearized order, giving the mapping function initial_priority: <z, y, x> -> priority;

S1.6.2: for plane subdivision:

S1.6.2.1: first, the multi-dimensional space formed by wt effective time steps and the grid space (corresponding to the compact task graph) is partitioned affinely.

According to the neighbor dependence information provided in S1.3, the division planes spaced wx apart along the X axis are tilted toward the t axis, and then the division planes of the Y axis and the Z axis are tilted in the same way. Each convex region cut out in this way is one priority group, and each group has a three-dimensional coordinate.

S1.6.2.2: the groups are sorted in lexicographic order of <z, y, x>, with decreasing priority.

S1.6.2.3: obtain the priority mapping function relative_priority: <z, y, x, t> -> priority_offset

S1.6.2.4: at the same time, obtain the affinity mapping function affinity: <z, y, x, t> -> thread_id

S1.6.2.5: compute total, the maximum number of tasks in each time window;

S2: assign a priority group and a thread affinity to each task in the parallel region:

S2.1: the user describes the data access information of the task

S2.2: the user describes the coordinate of the task in the grid space

S2.3: in the task construction phase, the scheduling system maps the task to the corresponding priority group

S2.3.1: let from be the smallest group number of the current window; if natural subdivision is selected, total is the cumulative number of tasks of the current group;

S2.3.2: depending on whether this task has neighbor dependences, accumulate the effective time step ti within the current window; if the current window is full, then from = from + total. Let the space-time coordinate of this task within the window be <ti, zi, yi, xi>.

S2.3.3: for natural subdivision:

S2.3.3.1: if this task has no predecessor, or its predecessors belong to the previous window, its group number is initial_priority(zi, yi, xi) + total; otherwise, its group number is the maximum of the group numbers of all its predecessors.

S2.3.3.2: assign an affinity to this task.

S2.3.4: for plane subdivision:

The group number of this task is relative_priority(ti, zi, yi, xi) + from; its affinity is affinity(ti, zi, yi, xi);
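The plane subdivision mapping of S1.6.2 and S2.3.4 can be sketched for a 1-dimensional grid. The sketch rests on assumptions that are not stated in the text: a 3-point stencil, so that the division planes tilt toward the t axis with slope 1; a window offset counted in groups; and a simple modulo rule for the affinity. The function names follow the text where it gives them, while `groups_per_window` and `plane_group` are hypothetical helpers.

```python
def groups_per_window(width: int, wt: int, wx: int) -> int:
    """Number of skewed groups that one window of height wt can contain."""
    return (width - 1 + wt - 1) // wx + 1


def relative_priority(ti: int, xi: int, wx: int) -> int:
    """Group offset inside a window: planes of spacing wx on the X axis,
    tilted toward the t axis with slope 1 (assumed 3-point stencil)."""
    return (xi + ti) // wx


def affinity(ti: int, xi: int, num_threads: int) -> int:
    """Hypothetical affinity map: the patches of one group at one time
    step are consecutive in x, so they land on different threads."""
    return xi % num_threads


def plane_group(t: int, xi: int, width: int = 16, wt: int = 8, wx: int = 4) -> int:
    """Group number of task (t, xi): relative_priority(...) + from."""
    window, ti = divmod(t, wt)
    # Plays the role of "from": first group number of the current window.
    start = window * groups_per_window(width, wt, wx)
    return relative_priority(ti, xi, wx) + start
```

With a grid width of 16, a group width of 4, and a window height of 8, one window contains 6 groups under this rule, and every predecessor of a task lies in a group with an equal or smaller number, so executing the groups in increasing order respects the dependences.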

Embodiment 1 of the present invention is as follows:

This is a 3d7p (three-dimensional, 7-point) stencil computation using Jacobi iteration. The stencil computation is carried out in a three-dimensional grid space, the global tasks are divided along 2 spatial axes, the discrete space uses a fixed grid, and the data are stored in a single array.

Here the patch size is 12^3, the computation uses double precision, 2 grid variables take part in the computation, and the L3 cache of the current machine is 20M. Because this computation has neighbor dependences in all 6 directions of the three spatial axes, the data footprint of a group <wx, wy, wz, wt> must satisfy 8 * 2 * (wx+wt-1) * (wy+wt-1) * (wz+wt-1) * 12 * 12 * 12 ≤ 20M * 0.5 * 1.1, where 1.1 is a relaxation coefficient that compensates for the aggressiveness of the projection-based estimate. This yields two feasible group shapes with the largest data footprint, <2,2,2,6> and <4,4,4,4>, of which the latter has the higher in-group reuse degree. The scheduling system therefore selects the group shape <4,4,4,4>, that is, the window height along the time axis and the tile width along each spatial axis are all 4, and the maximum data reuse degree within a group is 4.
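The footprint inequality of this embodiment can be checked numerically with a short Python sketch. It assumes that "20M" means 20 MiB (the conclusions below are the same for 20 * 10^6 bytes); the function names are hypothetical, and the reuse-degree criterion used for the final choice is not modeled.

```python
MIB = 2 ** 20  # assumption: "20M" is read as 20 MiB


def footprint_bytes(shape, patch_edge=12, n_arrays=2, elem_bytes=8):
    """Projected data footprint of a group <wx, wy, wz, wt> in bytes."""
    wx, wy, wz, wt = shape
    patches = (wx + wt - 1) * (wy + wt - 1) * (wz + wt - 1)
    return elem_bytes * n_arrays * patches * patch_edge ** 3


def fits(shape, llc_bytes=20 * MIB, relax=1.1, **grid):
    """Half of the last-level cache, loosened by the relaxation coefficient."""
    return footprint_bytes(shape, **grid) <= llc_bytes * 0.5 * relax


def max_cubic_extent(llc_bytes=20 * MIB, relax=1.1, **grid):
    """Largest value of w + wt - 1 for which a cubic group still fits."""
    extent = 1
    # The shape (e, e, e, 1) has w + wt - 1 == e, so it probes extent e.
    while fits((extent + 1, extent + 1, extent + 1, 1), llc_bytes, relax, **grid):
        extent += 1
    return extent
```

Both <2,2,2,6> and <4,4,4,4> have w + wt - 1 = 7 on every axis, so they share the same footprint of 9,483,264 bytes, which is within the bound; an extent of 8 would exceed it.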

Because 3-dimensional problems are hard to show in a figure, the illustrations below only give grouping results for a 1-dimensional problem. Fig. 4 shows the result of natural subdivision over two time windows. Here the partitioned spatial grid has width 16, the spatial group width is 4, and the time window height is 8; according to the user's marks, each loop corresponds to one effective time step. Each ellipse is labeled with a triple <loop_id, task_id, prio_id>, indicating in turn which loop the task belongs to, which task it is, and its group number.

The natural subdivision grouping algorithm works as follows. The initial group mapping table is computed according to step S1.6.1 of the present invention, and the tasks of the first row are given the initial group numbers 0, 1, 2, 3. From the second row on, the group number of each task is the maximum of those of its predecessors. When the effective time step exceeds the window height 8, the tasks of that row are renumbered according to the "initial group mapping table"; because the number of groups in a window is 4, the group numbers of that row are 4, 5, 6, 7. Grouping then continues in the same way.
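The natural subdivision procedure just described can be reproduced for the 1-dimensional example with a small Python sketch. It assumes a 3-point stencil, so that the predecessors of a task are the tasks on patches x-1, x, x+1 in the previous row; `natural_subdivision` is a hypothetical name.

```python
def natural_subdivision(width=16, group_width=4, window_height=8, steps=16):
    """Return one row of group numbers per effective time step."""
    groups_per_window = width // group_width
    rows = []
    for t in range(steps):
        window, ti = divmod(t, window_height)
        if ti == 0:
            # First row of a window: initial mapping table plus the
            # number of groups used by all earlier windows.
            offset = window * groups_per_window
            row = [x // group_width + offset for x in range(width)]
        else:
            # Later rows: maximum group number over the predecessors
            # (patches x-1, x, x+1 of the previous row, clipped).
            prev = rows[-1]
            row = [max(prev[max(x - 1, 0):x + 2]) for x in range(width)]
        rows.append(row)
    return rows
```

Under these assumptions the group boundaries move one patch to the left per time step, so by the last row of the first window the rightmost group covers most of the grid.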

If plane subdivision is used, the rightmost group of the first window in Fig. 4 can be further split by slanted lines into 3 groups. In the example given in Fig. 6, this group is split into 4 groups. In other details plane subdivision is essentially similar to natural subdivision.

Embodiment 2 of the present invention is as follows:

This embodiment is one of the hot spots of a Poisson equation solver. It uses an adaptively refined grid, whose patches are dynamic and irregular, and the data are stored patch by patch rather than in a single array.

As far as priority grouping in the scheduling system is concerned, there is no notable difference from Embodiment 1.

This embodiment mainly shows how the user describes the partition of the grid space into patches, the mapping of the task graph to the grid space, and the irregular neighbor data dependences.

● Shown below is an internal subroutine of looseGSRB(), whose main function is to exchange the ghost-region data of adjacent data patches. In the adaptive grid method, however, neither the number nor the positions of the neighbors can be determined statically.

Claims

1. A priority grouping scheduling method using data topology information, characterized by comprising:

Step 1: obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches in the original grid space, and generate a new grid space;

Step 2: according to the stencil format of the new grid space and of the parallel region, build a compact task graph and compute the priority group number of each task in the compact task graph, wherein the priority computation rule is: assign an initial priority number to each data patch in row-major order; the priority number of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches it accesses; the priority number of any other task is the maximum of the priority numbers of the source tasks of its dependence edges;

Step 3: obtaining the data slices accessed by the current task, and deciding whether a neighbor data dependence is involved either through an abstract form of the difference scheme of the stencil computation in the enclosing loop, or through one or more function pointers that enumerate all neighbor dependences combined with arguments of specific values, and generating a corresponding mark; identifying, according to the mark, the loops that involve a neighbor data dependence, each such loop being an effective time step; mapping the current task, according to the effective time step and the spatial coordinates of the task, to a task on the compacted task graph, and calculating the priority number of the current task from the priority number of the latter.

2. The priority-grouped scheduling method using data topology information according to claim 1, characterized in that each data slice in step 1 has a unique mesh-space coordinate.

3. The priority-grouped scheduling method using data topology information according to claim 1, characterized in that, between step 2 and step 3, the method further comprises obtaining the coordinate of each task in the new mesh space according to the spatial position of the data slice accessed by that task.

4. The priority-grouped scheduling method using data topology information according to claim 1, characterized in that step 2 further comprises grouping constraints, including: the data footprint is less than half the size of the last-level cache; and the degree of data reuse within each group is greater than a preset threshold.

5. The priority-grouped scheduling method using data topology information according to claim 1, characterized by further comprising binding each task to a fixed thread by affinity, so that the tasks of each group are evenly distributed across the threads, and task scheduling is constrained to execute within the current group.

6. A priority-grouped scheduling system using data topology information, characterized by comprising:

a new-mesh-space generation module, configured to obtain the original mesh space of the data topology information, set the size and floating-point precision of the mesh patches in the original mesh space, and generate a new mesh space;

a compacted-task-graph priority grouping module, configured to construct a compacted task graph according to the stencil pattern and the parallel region of the new mesh space, and to determine the group width of the compacted task graph, wherein the priority computation rule for tasks on the compacted task graph is: an initial priority is assigned to each data slice in row-major order; the priority of a task in the first topological layer of the compacted task graph is the maximum of the priority numbers of the data slices accessed by that task; the priority of any other task is the maximum of the priority numbers of the source tasks of that task's dependence edges on the compacted task graph;

a dynamic grouping module, configured to obtain the data slices accessed by the current task, and to decide whether a neighbor data dependence is involved either through an abstract form of the difference scheme of the stencil computation in the enclosing loop, or through one or more function pointers that enumerate all neighbor dependences combined with arguments of specific values, and to generate a corresponding mark; to identify, according to the mark, the loops that involve a neighbor data dependence, each such loop being an effective time step; and to map the tasks in the loop, according to the effective time step, to a task on the compacted task graph, and calculate the priority number of the current task from the priority number of the latter.

7. The priority-grouped scheduling system using data topology information according to claim 6, characterized in that each mesh data slice in the new-mesh-space generation module has a unique mesh-space coordinate.

8. The priority-grouped scheduling system using data topology information according to claim 6, characterized by further comprising an index acquisition module, configured to obtain the coordinate of a task in the new mesh space according to the spatial position of the data slice accessed by the current task.

9. The priority-grouped scheduling system using data topology information according to claim 6, characterized in that the grouping constraints include: the data footprint is less than half the size of the last-level cache; and the degree of data reuse within each group is greater than a preset threshold.

10. The priority-grouped scheduling system using data topology information according to claim 6, characterized by further comprising an affinity assignment module, configured to bind each task to a fixed thread by affinity, so that the tasks of each group are evenly distributed across the threads, and task scheduling is constrained to execute within the current group.
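For illustration only, the priority computation rule recited in claims 1 and 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names slice_priorities and task_priorities, the dictionary-based graph representation, and the small 1-D three-point stencil example are all hypothetical. Initial numbers are assigned to data slices in row-major order, a task with no incoming dependence edge takes the maximum number of the slices it accesses, and every other task takes the maximum number of its source tasks.

```python
from collections import defaultdict, deque


def slice_priorities(rows, cols):
    """Assign initial priority numbers to data slices in row-major order."""
    return {(r, c): r * cols + c for r in range(rows) for c in range(cols)}


def task_priorities(accessed, deps, slice_prio):
    """Compute a priority number for every task of a task graph.

    accessed:   task -> list of data slice coordinates the task accesses
    deps:       task -> list of source tasks it depends on (dependence edges)
    slice_prio: data slice coordinate -> initial priority number
    """
    indegree = {t: len(deps.get(t, [])) for t in accessed}
    successors = defaultdict(list)
    for task, sources in deps.items():
        for src in sources:
            successors[src].append(task)

    # Visit tasks in topological order (Kahn's algorithm).
    ready = deque(t for t, d in indegree.items() if d == 0)
    prio = {}
    while ready:
        task = ready.popleft()
        sources = deps.get(task, [])
        if not sources:
            # First topological layer: maximum over the accessed data slices.
            prio[task] = max(slice_prio[s] for s in accessed[task])
        else:
            # Other tasks: maximum over the source tasks of the dependence edges.
            prio[task] = max(prio[s] for s in sources)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(prio) != len(accessed):
        raise ValueError("dependence graph contains a cycle")
    return prio


# Assumed example: one row of 3 data slices, 3-point stencil, 2 time steps.
WIDTH = 3
slice_prio = slice_priorities(1, WIDTH)


def near(x):
    """Column indices of slice x and its existing left/right neighbors."""
    return [n for n in (x - 1, x, x + 1) if 0 <= n < WIDTH]


accessed = {f"t{t}_x{x}": [(0, n) for n in near(x)]
            for t in range(2) for x in range(WIDTH)}
deps = {f"t1_x{x}": [f"t0_x{n}" for n in near(x)] for x in range(WIDTH)}
prio = task_priorities(accessed, deps, slice_prio)
```

In this example the first-layer tasks t0_x0, t0_x1 and t0_x2 receive the numbers 1, 2 and 2, and all second-layer tasks receive 2, since each of them depends on at least one source task numbered 2.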