CN105528243B - Priority grouping scheduling method and system using data topology information - Google Patents
- Publication number
- CN105528243B
- CN201510382438.5A
- Authority
- CN
- China
- Prior art keywords
- task
- data
- priority
- topology information
- mesh space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention discloses a priority grouping scheduling method and system using data topology information. The method comprises: obtaining the original grid space of the data topology information, setting the size and floating-point precision of the grid patches in the original grid space, and generating a new grid space; constructing a compact task graph (a four-dimensional space-time domain) according to the new grid space and the stencil pattern of the parallel region, and calculating the priority group number of each task in it; obtaining the data patches accessed by the current task, determining whether a neighbor data dependence is involved, expressed either as a stencil pattern abstraction or as a function pointer with arguments, and generating a corresponding mark; identifying, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; mapping the tasks in the loop to a task of the compact task graph according to the current effective time step; and calculating the priority number of the current task from the priority number of the latter, thereby supporting priority grouping scheduling of tasks.
Description
Technical field
The present invention relates to the field of computers and user-level task scheduling, and in particular to a priority grouping scheduling method and system using data topology information.
Background art
Modern microprocessor architectures face a memory wall that is difficult to overcome. Optimizing the data locality of a program so as to improve its cache performance on the processor is therefore an important topic in application optimization.
The background of the structured grid method is as follows. Fields such as thermal diffusion, electromagnetics and fluid dynamics involve a large number of nonlinear equations to be solved, and analytic or perturbation solutions are available for only a very few problems. The structured grid method is one of the most important numerical methods for this class of problems. Its typical steps are: discretize the solution domain, turning the continuous domain into a finite set of discrete points, such as a fixed uniform grid or an adaptive structured grid; replace derivatives with difference quotients; compute the discrete solution at the grid nodes within the solution domain; and let the solution of the difference equation converge to the solution of the differential equation through a certain number of iteration steps.
The difference computation performed at each point of a structured (discrete) grid usually needs the values of some neighboring points. Computations of this kind are called stencil computations. Stencil computation is the core of structured grid applications and one of the seven widely recognized classes of high-performance computing patterns.
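By way of illustration only, a stencil computation in its simplest form can be sketched as follows. The function name `jacobi_step_1d` and the 3-point averaging weights are assumptions chosen for the example; they do not come from the patent.

```python
def jacobi_step_1d(u):
    """One Jacobi sweep of a 3-point averaging stencil on a 1-D grid.

    Each interior point is replaced by the mean of itself and its two
    neighbors; the two boundary points are held fixed.
    """
    new = list(u)
    for i in range(1, len(u) - 1):
        # The update of point i reads its left and right neighbors:
        # this is the neighbor data dependence typical of stencil codes.
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


grid = [0.0, 0.0, 3.0, 0.0, 0.0]
print(jacobi_step_1d(grid))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because every point depends on its neighbors from the previous sweep, consecutive sweeps reuse the same data, which is the reuse that the scheduling optimizations discussed here try to keep in cache.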
Related art 1: optimizing data reuse in stencil computation with time skewing, as shown in Fig. 1:
The compute-to-memory-access ratio of structured grid applications is very low, so improving cache utilization is one of the keys to improving the performance of such programs on multi-core systems. Academia has proposed various time skewing algorithms that optimize parallelism and data locality at the same time. The main idea is to tile, with skewed tiles, the multidimensional space formed by the iteration steps of the stencil computation and the spatial grid, so that data reuse that originally spanned a long distance is confined to the inside of a tile, and then to determine the dependences between the tiles and implement a quasi-static task schedule. In essence, time skewing treats the outer time-iteration loop and the inner traversal of the spatial grid as a whole and applies a complex loop transformation.
Time skewing requires transforming the loops from the outermost time loop down to the inner loops, which is very difficult, or even infeasible, for complex applications. The adaptive grid method is one such application type. It is an effective way to solve complex physical problems: at each time point or spatial point it decides dynamically, based on the error, whether adaptive refinement is needed to improve the accuracy of the solution. First, such methods are extremely complex to implement: they must be developed on top of a mature domain-specific programming framework, the data structures become complex and are no longer simple arrays, and the computation steps may mix C and Fortran. This makes it hard for a compiler to perform static dependence analysis across the whole grid space and all computation steps, so time skewing cannot be applied. Second, because the grid partitioning is dynamic, the neighbors of a grid point cannot be determined statically, which also limits the application of time skewing.
Related art 2: task parallelism and scheduling optimization in task scheduling, as shown in Fig. 2 (the English labels in Fig. 2 correspond to the four basic operations of Cholesky decomposition, which the code implements by calling library functions of a high-performance math library) and Fig. 3:
The task parallel programming model has been widely studied and used on multi-core platforms in recent years. It aims to simplify parallel programming and improve multi-core utilization. Industry and academia have developed many such parallel programming interfaces, for example Cilk/Cilk++, OpenMP 3.0, X10, Habanero-Java, TBB and TPL. In a program written with such an interface, the tasks form a spawn tree, that is, a directed acyclic graph, and the runtime system is responsible for task scheduling. Each core corresponds to one physical thread, and each physical thread can execute many logical tasks. Because this scheduling is carried out in user space, scheduling overhead is greatly reduced and the execution efficiency of multithreaded programs improves. The runtime system uses a work-stealing scheduling algorithm to achieve load balance and improve multi-core utilization.
With regard to improving data locality in task scheduling, Umut A. Acar, Guy E. Blelloch and Robert D. Blumofe, "The Data Locality of Work Stealing", Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures, July 2000, Bar Harbor, Maine, United States, pp. 1-12, proposes a mechanism for affinity between computation and tasks, which has been adopted by some practical task scheduling systems. For grid applications, however, the optimization this method provides shows up mainly in storage on cluster nodes, and its effect on cache optimization is not significant.
Priority scheduling is another common scheduling optimization, but it is usually applied to task graphs in which the critical path is clearly longer than the other paths, in order to improve load balance. For example, Fig. 2 shows the task graph of a Cholesky decomposition based on a blocked structure, in which the dark blue dpotrf tasks are the key tasks and need to be given higher priority.
The task graphs of many applications, however, contain no obvious key tasks. The task graph of a structured grid application, for example, generally looks like Fig. 3: the tasks are tightly coupled, the directed paths are almost all the same length, and there is no particularly long critical path. (Here, the color of a task indicates the thread on which it executes.) So far, no one has found that priority grouping scheduling can be used to improve the task scheduling performance of structured grid applications.
Summary of the invention
The present invention is directed at structured grid applications. Its purpose is to realize, on top of a traditional task scheduling mechanism and through the cooperation of the user interface and the scheduling system, a priority grouping scheduling method and system using data topology information.
The present invention proposes a priority grouping scheduling method using data topology information, comprising:
Step 1: obtaining the original grid space of the data topology information, setting the size and floating-point precision of the grid patches of the original grid space, and generating a new grid space;
Step 2: constructing a compact task graph according to the new grid space and the stencil pattern of the parallel region, and calculating the priority group number of each task in the compact task graph, wherein the priority calculation rule is: an initial priority number is assigned to each data patch in row-major order; the priority number of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches it accesses; the priority number of any other task is the maximum of the priority numbers of the source tasks of its dependence edges;
Step 3: obtaining the data patches accessed by the current task, determining whether a neighbor data dependence is involved, expressed either as a stencil pattern abstraction or as a function pointer with arguments, and generating a corresponding mark; identifying, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; mapping the current task to a task on the compact task graph according to the effective time step and the spatial coordinates of the task; and calculating the priority number of the current task from the priority number of the latter.
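The priority calculation rule of step 2 can be sketched as follows. This is a minimal illustration under assumed data structures: the dictionaries `patch_priority`, `accesses` and `deps` and the function name `assign_priorities` are hypothetical and are not interfaces defined by the patent.

```python
from collections import deque


def assign_priorities(patch_priority, accesses, deps):
    """Apply the step-2 rule to a small task graph.

    patch_priority: {patch: initial priority number (row-major order)}
    accesses:       {task: [patches the task accesses]}
    deps:           {task: [source tasks of its dependence edges]}
    Returns {task: priority number}.
    """
    indegree = {t: len(deps.get(t, [])) for t in accesses}
    successors = {t: [] for t in accesses}
    for task, sources in deps.items():
        for src in sources:
            successors[src].append(task)

    priority = {}
    ready = deque(t for t in accesses if indegree[t] == 0)
    while ready:  # visit tasks in topological order
        task = ready.popleft()
        sources = deps.get(task, [])
        if not sources:
            # First topological layer: maximum over the accessed patches.
            priority[task] = max(patch_priority[p] for p in accesses[task])
        else:
            # Other tasks: maximum over the source tasks of the edges.
            priority[task] = max(priority[s] for s in sources)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return priority


patches = {"A": 0, "B": 1}
accesses = {"t0": ["A"], "t1": ["A", "B"], "t2": ["A"]}
deps = {"t2": ["t0", "t1"]}
print(assign_priorities(patches, accesses, deps))
```

In this small example t0 receives 0, t1 receives 1 (it touches patch B), and t2 inherits 1 from its source tasks.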
In the priority grouping scheduling method using data topology information, each data patch in step 1 has a unique grid space coordinate.
In the priority grouping scheduling method using data topology information, between step 2 and step 3 the method further comprises obtaining the coordinate of a task in the new grid space according to the spatial positions of the data patches accessed by the task.
In the priority grouping scheduling method using data topology information, step 2 further comprises grouping constraints, including: the data footprint is less than half the size of the last-level cache; and the degree of data reuse within each group is greater than a preset threshold.
The priority grouping scheduling method using data topology information further comprises binding each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads and task scheduling is effectively constrained to execute within the current group.
The present invention also proposes a priority grouping scheduling system using data topology information, comprising:
A new-grid-space generation module, configured to obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches of the original grid space, and generate a new grid space;
A compact-task-graph priority grouping module, configured to construct a compact task graph according to the new grid space and the stencil pattern of the parallel region and to determine the group width of the compact task graph, wherein the priority calculation rule for tasks on the compact task graph is: an initial priority is assigned to each data patch in row-major order; the priority of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches the task accesses; the priority of any other task on the compact task graph is the maximum of the priority numbers of the source tasks of its dependence edges;
A dynamic grouping module, configured to obtain the data patches accessed by the current task, determine whether a neighbor data dependence is involved, expressed either as a stencil pattern abstraction or as a function pointer with arguments, and generate a corresponding mark; identify, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; map the tasks in the loop to a task on the compact task graph according to the effective time step; and calculate the priority number of the current task from the priority number of the latter.
In the priority grouping scheduling system using data topology information, each grid data patch in the new-grid-space generation module has a unique grid space coordinate.
The priority grouping scheduling system using data topology information further comprises an index acquisition module, configured to obtain the coordinate of a task in the new grid space according to the spatial positions of the data patches accessed by the current task.
In the priority grouping scheduling system using data topology information, the grouping constraints include: the data footprint is less than half the size of the last-level cache; and the degree of data reuse within each group is greater than a preset threshold.
The priority grouping scheduling system using data topology information further comprises an affinity distribution module, configured to bind each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads and task scheduling is effectively constrained to execute within the current group.
As can be seen from the above scheme, the present invention has the following advantages:
The technical effect has two parts. First, tasks are mapped automatically to the four-dimensional space-time domain (the compact task graph). Second, priority grouping scheduling is supported, and the shared cache is reused both within a group and between adjacent groups.
1. A set of user interfaces is provided that supports automatic mapping of the task graph to the four-dimensional space-time domain (the compact task graph) without being limited by program complexity. This set of user interfaces is described in Table 1:
Table 1. A set of new user interfaces
Why this technical point is beneficial: 1) from the information provided by the user, the scheduling system can map tasks to the four-dimensional space-time domain (the compact task graph). The user is responsible for dividing the tasks and describing their data access information. The greatest difficulty in mapping tasks to the four-dimensional space-time domain is expressing neighbor dependences; because the most general form of description, a function pointer, is provided, this difficulty is overcome; 2) this work is an optimization of the scheduling system and is applicable to structured grid applications of arbitrary complexity.
2. Using guidance information from the user, the scheduling system can automatically select a good group shape, data affinity scheme and priority grouping scheme, optimize cache reuse between tasks, and improve application performance. This can be described abstractly by Table 2:
Table 2. Two priority grouping algorithms (natural subdivision and plane subdivision)
Why this technical point is beneficial: 1) the present invention can select a good group shape and size. The data footprint of a group can be estimated from the oblique projection of the dividing planes combined with the size of a grid patch; among the group sizes that fit the shared cache capacity, the present invention prefers those with a higher degree of cache reuse; 2) the present invention can constrain scheduling to a single group. This technical point determines the data affinity mapping and spreads the tasks of the same group over the threads as far as possible, so that data reuse in the shared cache is obtained; 3) the height of a group along the time axis is related to the degree of data reuse within the group. The present invention can select the grouping form with the higher degree of reuse, while cache reuse between adjacent groups is achieved through careful assignment of group sequence numbers.
Brief description of the drawings
Fig. 1 shows several different time skewing methods for a one-dimensional problem;
Fig. 2 is the task dependence graph of a Cholesky decomposition (5x5);
Fig. 3 is the task dependence graph of a grid application;
Fig. 4 is the priority grouping obtained with the natural subdivision scheme in a one-dimensional grid space;
Fig. 5 is the grouping obtained with the plane subdivision scheme in a two-dimensional grid space;
Fig. 6 shows the difference between plane subdivision and natural subdivision; in natural subdivision, the triangular part belongs to the group of its left neighbor;
Fig. 7 is the overall flow chart of the present invention.
The reference numerals are:
Steps 100/101/102/103/104/105.
Detailed description of the embodiments
To solve the above technical problems, the present invention proposes a priority grouping scheduling method and system using data topology information.
The overall steps of the present invention are as follows, as shown in Fig. 7:
The original grid space of the data topology information is obtained, the size and floating-point precision of the grid patches of the original grid space are set, and a new grid space is generated. The patching scheme corresponds to the global task partitioning, while the patch size and floating-point precision guide the calculation of the data footprint. Each patch has a unique grid space coordinate;
A compact task graph is constructed according to the new grid space and the stencil pattern of the parallel region; along the time axis it has the height of only one group. The priority group number of each task in it is calculated according to the following rule: an initial priority is assigned to each data patch in row-major order; the priority of a task in the first topological layer of the compact task graph is the maximum of the priorities of the data patches the task accesses; the priority number of any other task is the maximum of the priority numbers of the source tasks it depends on;
The data patches accessed by the current task are obtained, whether a neighbor data dependence is involved is determined, expressed either as a stencil pattern abstraction or as a function pointer with arguments, and a corresponding mark is generated; according to the mark, the loops that involve neighbor data dependences are identified, each such loop being an effective time step; the tasks in the loop are mapped to the compact task graph according to the effective time step, and the priority number of the current task is calculated from the priority number of the latter;
The user gives the coordinate of the task in the new grid space (according to the spatial positions of its data patches);
Neighbor data dependences are obtained. The user specifies the data patches accessed by the task and expresses the neighbor data dependence either as a stencil pattern abstraction or as a (function pointer, arguments) pair.
Two kinds of computation in grid applications involve neighbor data dependences: stencil computation, and neighbor data exchange (which may occur between adjacent data patches on the same grid level or between corresponding data patches on adjacent levels). The pattern of a stencil computation is fixed and can be described statically, but that of a neighbor data exchange is not necessarily so. In some complex applications, such as adaptive grid applications, the number of neighbors each data patch has on the corresponding grid level is not fixed, and the data patches are stored separately. In this case the scheduling system cannot compute the neighbor patches itself, but the user can provide a function pointer and the corresponding arguments to enumerate the neighbor data patches.
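The two ways of expressing a neighbor data dependence described above could be modeled roughly as follows. This is only a sketch: the class `NeighborDependence` and its field names are hypothetical, and a Python callable stands in for the function pointer of the original C/Fortran setting.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Coord = Tuple[int, ...]


@dataclass
class NeighborDependence:
    """Neighbor dependence of a task, given in one of two forms.

    stencil_offsets: static description, a list of relative patch offsets.
    neighbor_fn / neighbor_args: dynamic description, a user callback that
    enumerates the neighbor patches (for example on an adaptive grid).
    """
    stencil_offsets: Optional[Sequence[Coord]] = None
    neighbor_fn: Optional[Callable[..., Sequence[Coord]]] = None
    neighbor_args: tuple = ()

    def has_neighbor_dependence(self) -> bool:
        # This is the "mark" used to recognize effective time steps.
        return bool(self.stencil_offsets) or self.neighbor_fn is not None

    def neighbors(self, coord: Coord):
        if self.stencil_offsets:
            return [tuple(c + d for c, d in zip(coord, off))
                    for off in self.stencil_offsets]
        if self.neighbor_fn is not None:
            return list(self.neighbor_fn(coord, *self.neighbor_args))
        return []


# Static form: a 1-D 3-point stencil reads the left and right patches.
static_dep = NeighborDependence(stencil_offsets=[(-1,), (1,)])
print(static_dep.neighbors((5,)))

# Dynamic form: the user supplies a lookup over an irregular neighbor table.
table = {(0, 0): [(0, 1), (1, 0), (1, 1)]}
dynamic_dep = NeighborDependence(
    neighbor_fn=lambda coord, tbl: tbl.get(coord, []),
    neighbor_args=(table,),
)
print(dynamic_dep.neighbors((0, 0)))
```

The callback form is the more general one, since it covers cases where the number and position of neighbors are only known at run time.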
According to the data dependence marks, the scheduling system identifies the loops that involve neighbor dependences; these are called effective time steps;
In the space-time formed by the effective time steps and the grid space, grouping based on hyperplane division is carried out according to the slope of the data dependences. The grouping has two constraints: i) the data footprint is less than half the size of the last-level cache; ii) the degree of data reuse within a group is as large as possible (greater than a preset threshold);
Priorities are assigned to the groups in row-major order, so as to exploit data reuse between groups;
The scheduling system binds each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads, and task scheduling is effectively constrained to execute within the current group.
The present invention also proposes a priority grouping scheduling system using data topology information, comprising:
A new-grid-space generation module, configured to obtain the original grid space of the data topology information, set the size and floating-point precision of the grid patches of the original grid space, and generate a new grid space;
A compact-task-graph priority grouping module, configured to construct a compact task graph according to the new grid space and the stencil pattern of the parallel region and to determine the group width of the compact task graph, wherein the priority calculation rule for tasks on the compact task graph is: an initial priority is assigned to each data patch in row-major order; the priority of a task in the first topological layer of the compact task graph is the maximum of the priority numbers of the data patches the task accesses; the priority of any other task on the compact task graph is the maximum of the priority numbers of the source tasks of its dependence edges;
A dynamic grouping module, configured to obtain the data patches accessed by the current task, determine whether a neighbor data dependence is involved, expressed either as a stencil pattern abstraction or as a function pointer with arguments, and generate a corresponding mark; identify, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; map the tasks in the loop to a task on the compact task graph according to the effective time step; and calculate the priority number of the current task from the priority number of the latter.
In the new-grid-space generation module, each grid data patch has a unique grid space coordinate.
An index acquisition module, configured to obtain the coordinate of a task in the new grid space according to the spatial positions of the data patches accessed by the current task.
The grouping constraints include: the data footprint is less than half the size of the last-level cache; and the degree of data reuse within each group is greater than a preset threshold.
An affinity distribution module, configured to bind each task to a fixed thread, so that the tasks of each group are evenly distributed over the threads and task scheduling is effectively constrained to execute within the current group.
The specific steps of the present invention are as follows. Scenario: in this implementation case the data on the grid space are stored contiguously and the grid is fixed. Software: the user interface and the internal implementation mechanism of the scheduling system. Improvements: a new user interface is proposed, and automatic algorithms for priority grouping are proposed. Because the spatial grid has at most 3 dimensions, the following embodiments consider only the 3-dimensional case for ease of description. Also for ease of description, this embodiment assumes that a smaller priority number means a higher rank. The steps are as follows:
S1 An initialization phase is added before the parallel region of tasks starts
S1.1 Describe the global patching information of the grid space. For example, a 3-dimensional grid space is divided along all 3 axes. The division may be uniform or non-uniform: for a uniform division the spacing is given, while for a non-uniform division the shape of each patch must be given. The scheduling system computes the patched grid space from this information.
S1.2 Describe the size of each patch and the precision of the floating-point data. For uniform patches the size is definite; for non-uniform patches the user can give an average patch size. The floating-point precision indicates whether the computation uses single precision, double precision or 128-bit precision.
S1.3 Describe along which axes the tasks in the parallel region have neighbor dependences, and the number of grid variables involved in the parallel region.
S1.4 The user selects the priority grouping mode: natural subdivision or plane subdivision.
S1.5 In this phase the scheduling system calculates the size <wx, wy, wz, wt> of a priority group; the time-axis height of the "compact task graph" is then also known.
For example, suppose the patch size is 16^3, the computation uses double precision, Narray grid variables take part in the computation, and the L3 cache of the current machine is 20M. If the stencil pattern has neighbor dependences in all 6 directions of the three spatial axes, the data footprint of a group <wx, wy, wz, wt> must satisfy sizeof(double) * Narray * (wx+wt-1) * (wy+wt-1) * (wz+wt-1) * 16^3 ≤ 20M * 0.5 * 1.1, where 1.1 is a relaxation coefficient that compensates for the aggressiveness of the projection-based estimate. From this, the group shapes that satisfy the condition with the largest data footprint can be computed, and the one with the highest degree of reuse is selected.
Fig. 4 shows the priority grouping obtained with the natural subdivision scheme for a 1-dimensional grid space. The block size along the time axis is 8 and the block size along the X axis is 4, so the task graph here is split into two windows, and it can be seen that the dividing planes of the X axis tilt toward the t axis.
S1.6 Construct the mapping table according to the subdivision method selected by the user
S1.6.1 For natural subdivision:
The grid space is partitioned into blocks of size <wx, wy, wz>, and each data patch is assigned an initial group number in linearized order; the mapping function is initial_priority: <z, y, x> -> priority;
S1.6.2 For plane subdivision:
S1.6.2.1 An affine partition must first be applied to the multidimensional space formed by the wt effective time steps and the grid space (which corresponds to the compact task graph).
According to the neighbor dependence information provided in S1.3, the dividing planes spaced wx apart along the X axis are tilted toward the t axis, and then the dividing planes of the Y axis and the Z axis are tilted accordingly in turn. Each convex region cut out in this way is a priority group, and each group has a three-dimensional coordinate.
S1.6.2.2 The groups are sorted in lexicographic order of <z, y, x>, with decreasing priority.
S1.6.2.3 Obtain the priority mapping function relative_priority: <z, y, x, t> -> priority_offset
S1.6.2.4 At the same time obtain the affinity mapping function affinity: <z, y, x, t> -> thread_id
S1.6.2.5 Calculate the maximum number of tasks, total, in each time window;
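For the natural subdivision case of S1.6.1, the mapping initial_priority: <z, y, x> -> priority can be sketched as a row-major linearization of block coordinates. The factory function `make_initial_priority` and the assumption that the grid extent is given in patches are illustrative choices, not details taken from the patent.

```python
def make_initial_priority(grid_shape, group_shape):
    """Build initial_priority(z, y, x) for natural subdivision.

    grid_shape:  (nz, ny, nx), extent of the grid space in patches
    group_shape: (wz, wy, wx), spatial size of a group in patches
    Blocks are numbered in row-major (z, then y, then x) order.
    """
    nz, ny, nx = grid_shape
    wz, wy, wx = group_shape
    groups_y = -(-ny // wy)  # ceiling division: number of blocks along y
    groups_x = -(-nx // wx)  # number of blocks along x

    def initial_priority(z, y, x):
        bz, by, bx = z // wz, y // wy, x // wx  # block coordinate of patch
        return (bz * groups_y + by) * groups_x + bx

    return initial_priority


# 1-D case of Fig. 4: 16 patches along x, spatial group width 4.
f = make_initial_priority((1, 1, 16), (1, 1, 4))
print([f(0, 0, x) for x in range(16)])
```

For the 1-dimensional case this assigns group numbers 0, 1, 2 and 3 to successive blocks of four patches, matching the first row described for Fig. 4.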
S2 Assign a priority group and a thread affinity to each task in the parallel region:
S2.1 The user describes the data access information of the task
S2.2 The user describes the coordinate of the task in the grid space
S2.3 In the task construction phase, the scheduling system maps the task to the corresponding priority group
S2.3.1 Let from be the smallest group number of the current window; if natural subdivision is selected, total is the accumulated number of tasks of the current group;
S2.3.2 Depending on whether this task has a neighbor dependence, accumulate the effective time step ti within the current window; if the current window is full, then from=from+total. Let the space-time coordinate of this task within the window be <ti, zi, yi, xi>.
S2.3.3 For natural subdivision:
S2.3.3.1 If this task has no predecessor, or its predecessors belong to the previous window, its group number is initial_priority(zi,yi,xi)+total; otherwise, its group number is the maximum of the group numbers of all its predecessors.
S2.3.3.2 Assign an affinity to this task.
S2.3.4 For plane subdivision:
The group number of this task is relative_priority(ti,zi,yi,xi)+from; its affinity is affinity(ti,zi,yi,xi);
Embodiment 1 of the present invention is as follows:
This is a 3d7p stencil computation using Jacobi iteration. The stencil computation is carried out in a three-dimensional grid space; the global task partitioning is applied to 2 spatial axes, the discrete space uses a fixed grid, and the data are stored in a single array.
Here the patch size is 12^3, the computation uses double precision, 2 grid variables take part in the computation, and the L3 cache of the current machine is 20M. Because this computation has neighbor dependences in all 6 directions of the three spatial axes, the data footprint of a group <wx, wy, wz, wt> must satisfy 8 * 2 * (wx+wt-1) * (wy+wt-1) * (wz+wt-1) * 12 * 12 * 12 ≤ 20M * 0.5 * 1.1, where 1.1 is a relaxation coefficient that compensates for the aggressiveness of the projection-based estimate. The two group shapes that satisfy the condition with the largest data footprint are <2,2,2,6> and <4,4,4,4>, and the latter has the higher degree of reuse within a group. The group shape selected by the scheduling system is therefore <4,4,4,4>: the window height along the time axis and the block width along each spatial axis are all 4, and the maximum degree of data reuse within a group is 4.
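The footprint bound of this embodiment can be checked numerically with a short sketch. The function names and the reading of "20M" as 20 * 2^20 bytes are assumptions; the formula itself is the one stated above.

```python
def group_footprint_bytes(wx, wy, wz, wt, patch_edge, n_arrays, elem_bytes=8):
    """Estimated data footprint of a group <wx, wy, wz, wt>.

    The projected extent along each spatial axis is (w + wt - 1) patches;
    each patch holds patch_edge**3 elements for each of n_arrays variables.
    """
    patches = (wx + wt - 1) * (wy + wt - 1) * (wz + wt - 1)
    return elem_bytes * n_arrays * patches * patch_edge ** 3


def fits(shape, patch_edge, n_arrays, cache_bytes, relax=1.1):
    """True if the group fits in half the last-level cache, relaxed by 1.1."""
    footprint = group_footprint_bytes(*shape, patch_edge, n_arrays)
    return footprint <= cache_bytes * 0.5 * relax


L3 = 20 * 2 ** 20  # assumed interpretation of "20M"
for shape in [(2, 2, 2, 6), (4, 4, 4, 4), (5, 5, 5, 4)]:
    size = group_footprint_bytes(*shape, 12, 2)
    print(shape, size, fits(shape, 12, 2, L3))
```

Both <2,2,2,6> and <4,4,4,4> have a projected extent of 7 patches per axis and therefore the same footprint of 9,483,264 bytes, which fits the bound, while a shape with an extent of 8, such as <5,5,5,4>, does not.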
Because 3-dimensional problems are hard to illustrate, the figures below give only the grouping results for a 1-dimensional problem. Fig. 4 shows the result of natural subdivision over two time windows. Here the width of the patched spatial grid is 16, the spatial group width is 4, and the height of a time window is 8; according to the user's marks, each loop corresponds to one effective time step. Each ellipse is labeled with a triple <loop_id, task_id, prio_id>, indicating in turn which loop the task belongs to, which task it is, and what its group number is.
The specific algorithm for natural subdivision grouping is as follows. The initial grouping mapping table is computed according to step S1.6.1 of the present invention, and the tasks of the first row are assigned the initial group numbers 0, 1, 2 and 3 respectively. From the second row on, the group number of each task is the maximum of those of its predecessors. When the effective time step exceeds the window height of 8, the tasks of that row are adjusted according to the "initial grouping mapping table"; because the number of groups in a window is 4, the group numbers of that row are 4, 5, 6 and 7. Grouping then continues in the same way.
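The natural subdivision rule for the 1-dimensional case can be simulated with a short sketch. It assumes a 3-point stencil, so that the predecessors of a task are the tasks at positions x-1, x and x+1 in the previous effective time step; this predecessor set and the name `natural_subdivision_1d` are assumptions made for illustration.

```python
def natural_subdivision_1d(width, group_width, window_height, steps):
    """Group numbers per effective time step for a 1-D grid.

    The first row of each time window takes its numbers from the initial
    grouping table, offset by the groups of the earlier windows; every
    other row takes the maximum over its predecessors in the row above.
    """
    groups_per_window = -(-width // group_width)  # ceiling division
    rows = []
    for t in range(steps):
        window = t // window_height
        if t % window_height == 0:
            row = [x // group_width + window * groups_per_window
                   for x in range(width)]
        else:
            prev = rows[-1]
            # Slice covers x-1, x, x+1, clipped at the grid boundaries.
            row = [max(prev[max(x - 1, 0): x + 2]) for x in range(width)]
        rows.append(row)
    return rows


rows = natural_subdivision_1d(width=16, group_width=4, window_height=8, steps=16)
for t in (0, 1, 8):
    print(t, rows[t])
```

Row 0 carries the group numbers 0 to 3, row 1 shows the group boundaries shifting by one patch as the maxima propagate, and row 8, the first row of the second window, restarts with the numbers 4 to 7.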
If plane subdivision is used, the rightmost group of the first window in Fig. 4 can be further split by inclined lines into 3 groups. In the example given in Fig. 6, this group is split into 4 groups. In other details plane subdivision is essentially the same as natural subdivision.
Embodiment 2 of the present invention is as follows:
This embodiment is one application hotspot of a Poisson equation solver. It uses adaptive mesh refinement, so the mesh patches are dynamic and irregular, and the data are stored as separate patches rather than as a single array. The priority grouping performed by the scheduling system does not differ noticeably from embodiment 1. This embodiment mainly shows how the user describes the patch partitioning of the mesh space, the mapping from the task graph to the mesh space, and the expression of irregular neighbor data dependences.
● Shown below is an internal subroutine of looseGSRB(); its main function is to exchange the ghost-region data of adjacent data slices. In an adaptive mesh method, however, neither the number nor the positions of the neighbors can be determined statically.
Claims (10)
1. A priority grouping scheduling method using data topology information, characterized by comprising:
Step 1: obtaining the original mesh space of the data topology information, setting the size and floating-point precision of the mesh patches of the original mesh space, and generating a new mesh space;
Step 2: constructing a tightened task graph according to the new mesh space and the stencil format of the parallel region, and calculating the priority group number of each task in the tightened task graph, wherein the priority calculation rule is: each data slice is assigned an initial priority number in row-major order; the priority number of a task in the first topological layer of the tightened task graph is the maximum of the priority numbers of the data slices it accesses; the priority number of any other task is the maximum of the priority numbers of the source tasks of its dependence edges;
Step 3: obtaining the data slices accessed by the current task, and determining whether a neighbor data dependence is involved either through an abstract form of the difference scheme of the stencil computation in the enclosing loop, or through one or more function pointers that enumerate all neighbor dependences combined with arguments of specific values, and generating a corresponding mark; identifying, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; mapping the current task, according to the effective time step and the spatial coordinates of the task, to a task on the tightened task graph, and calculating the priority number of the current task from the priority number of the latter.
2. The priority grouping scheduling method using data topology information according to claim 1, characterized in that in step 1 each data slice has a unique mesh space coordinate.
3. The priority grouping scheduling method using data topology information according to claim 1, characterized in that, between step 2 and step 3, the method further comprises obtaining the coordinates of each task in the new mesh space according to the spatial positions of the data slices the task accesses.
4. The priority grouping scheduling method using data topology information according to claim 1, characterized in that in step 2 the constraints on grouping include: the data footprint is less than half the size of the last-level cache; and the data reuse degree within each group is greater than a preset threshold.
5. The priority grouping scheduling method using data topology information according to claim 1, characterized by further comprising binding each task to a fixed thread by affinity, so that the tasks of each group are evenly distributed across the threads, with task scheduling operations constrained to execute within the current group.
6. A priority grouping scheduling system using data topology information, characterized by comprising:
a new mesh space generation module, for obtaining the original mesh space of the data topology information, setting the size and floating-point precision of the mesh patches of the original mesh space, and generating a new mesh space;
a priority grouping module for the tightened task graph, for constructing the tightened task graph according to the new mesh space and the stencil format of the parallel region, and determining the grouping width of the tightened task graph, wherein the priority calculation rule for tasks on the tightened task graph is: each data slice is assigned an initial priority in row-major order; the priority of a task in the first topological layer of the tightened task graph is the maximum of the priority numbers of the data slices accessed by that task; the priority of any other task on the tightened task graph is the maximum of the priority numbers of the source tasks of its dependence edges;
a dynamic grouping module, for obtaining the data slices accessed by the current task, and determining whether a neighbor data dependence is involved either through an abstract form of the difference scheme of the stencil computation in the enclosing loop, or through one or more function pointers that enumerate all neighbor dependences combined with arguments of specific values, and generating a corresponding mark; identifying, according to the mark, the loops that involve neighbor data dependences, each such loop being an effective time step; mapping the tasks in the loop, according to the effective time step, to tasks on the tightened task graph, and calculating the priority number of the current task from the priority number of the latter.
7. The priority grouping scheduling system using data topology information according to claim 6, characterized in that in the new mesh space generation module each mesh data slice has a unique mesh space coordinate.
8. The priority grouping scheduling system using data topology information according to claim 6, characterized by further comprising an index acquisition module, for obtaining the coordinates of a task in the new mesh space according to the spatial positions of the data slices accessed by the current task.
9. The priority grouping scheduling system using data topology information according to claim 6, characterized in that the constraints on grouping include: the data footprint is less than half the size of the last-level cache; and the data reuse degree within each group is greater than a preset threshold.
10. The priority grouping scheduling system using data topology information according to claim 6, characterized by further comprising an affinity assignment module, for binding each task to a fixed thread by affinity, so that the tasks of each group are evenly distributed across the threads, with task scheduling operations constrained to execute within the current group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510382438.5A CN105528243B (en) | 2015-07-02 | 2015-07-02 | A kind of priority packet dispatching method and system using data topology information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105528243A (en) | 2016-04-27 |
CN105528243B (en) | 2019-01-11 |
Family
ID=55770489
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510382438.5A Active CN105528243B (en) | 2015-07-02 | 2015-07-02 | A kind of priority packet dispatching method and system using data topology information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105528243B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106648859A (en) * | 2016-12-01 | 2017-05-10 | 北京奇虎科技有限公司 | Task scheduling method and device |
CN107220111B (en) * | 2017-04-28 | 2019-08-09 | 华中科技大学 | A kind of method for scheduling task that task based access control is stolen and system |
CN107784674B (en) * | 2017-10-26 | 2021-05-14 | 浙江科澜信息技术有限公司 | Method and system for simplifying three-dimensional model |
CN108509220B (en) * | 2018-04-02 | 2021-01-22 | 厦门海迈科技股份有限公司 | Revit engineering calculation amount parallel processing method, device, terminal and medium |
CN108647900B (en) * | 2018-05-18 | 2022-03-11 | 北京科技大学 | Region division method applied to hydrological simulation field |
CN112631610B (en) * | 2020-11-30 | 2022-04-26 | 上海交通大学 | Method for eliminating memory access conflict for data reuse of coarse-grained reconfigurable structure |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050081200A1 (en) * | 2001-12-14 | 2005-04-14 | Rutten Martijn Johan | Data processing system having multiple processors, a task scheduler for a data processing system having multiple processors and a corresponding method for task scheduling |
CN101571814A (en) * | 2009-06-01 | 2009-11-04 | 中国科学院计算技术研究所 | Communication behavior information of device based on message passing interface extraction method and system thereof |
CN102193826A (en) * | 2011-05-24 | 2011-09-21 | 哈尔滨工程大学 | Method for high-efficiency task scheduling of heterogeneous multi-core processor |
CN102681901A (en) * | 2012-05-08 | 2012-09-19 | 西安交通大学 | Segmental reconfigurable hardware task arranging method |
CN102831011A (en) * | 2012-08-10 | 2012-12-19 | 上海交通大学 | Task scheduling method and device based on multi-core system |
Non-Patent Citations (1)
Title |
---|
"基于异构多核处理器的高效任务调度算法";李静梅,等;《高技术通讯》;20121231;第22卷(第3期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN105528243A (en) | 2016-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |