CN106844024B - GPU/CPU scheduling method and system of self-learning running time prediction model - Google Patents

Info

Publication number
CN106844024B
Authority
CN (China)
Prior art keywords
program, gpu, parameter, cpu, time
Legal status
Active
Application number
CN201611251972.3A
Other languages
Chinese (zh)
Other versions
CN106844024A
Inventors
郝昀超
霍志刚
Current Assignee
Chinese Academy Of Sciences State Owned Assets Management Co ltd
Institute of Computing Technology of CAS
Original Assignee
Chinese Academy Of Sciences State Owned Assets Management Co ltd
Institute of Computing Technology of CAS
Priority date
2016-12-30
Filing date
2016-12-30
Publication date
2020-06-05
Application filed by Chinese Academy Of Sciences State Owned Assets Management Co ltd, Institute of Computing Technology of CAS
Priority to CN201611251972.3A
Publication of CN106844024A
Publication of CN106844024B (application granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a GPU/CPU scheduling method of a self-learning running time prediction model, relating to the technical fields of large-scale heterogeneous computing and cloud computing. The method comprises: preprocessing the source code to generate running-state identifiers corresponding to the source code and the parameters required for program execution, and storing both in an XML file; setting a prediction function, calculating its regression parameter θ from the running time of a program stage returned by the running-state identifiers and the parameter set of that stage, and storing θ in the XML file; and, when the program is called again, looking up the XML file corresponding to the program, calculating new normalization parameters, and substituting them into the prediction function to obtain a predicted running time for the current run. The time consumed by reassigning the program to another node is also computed, and if that reassignment time is lower than the predicted running time, the program is assigned to a CPU node.

Description

GPU/CPU scheduling method and system of self-learning running time prediction model
Technical Field
The invention relates to the technical field of large-scale heterogeneous computing and cloud computing, in particular to a GPU/CPU scheduling method and system of a self-learning operation time prediction model.
Background
With the development and popularization of GPGPU (general-purpose computing on graphics processing units) technology, more and more computing clusters combine GPUs (graphics processing units) and CPUs (central processing units) for heterogeneous parallel computing in order to tackle large-scale computational problems.
In the prior art, one method and device for hybrid CPU/GPU parallel computing unifies the CPU and GPU into a single unit loaded into the computing cluster: within a compute node that has been assigned pending tasks, the CPU preprocesses the tasks one by one and maps each task into the GPU's video memory as soon as its preprocessing completes. Another approach distributes each task of a dataflow program to a suitable computing platform according to its computational characteristics and the volume of data communicated between tasks. However, some classes of tasks, such as image and video processing, consist of relatively uniform, highly repetitive jobs that differ only in the data and parameters involved in the computation. Meanwhile, the GPU is best suited to large-scale parallel work, whereas low-parallelism and serial tasks are better handled by the CPU. Because the GPU device is attached via the PCI-E interface, a GPU-equipped node can run GPU and CPU tasks independently, so if a GPU task contains a segment that uses only the CPU, the GPU sits idle during that segment.
For highly repetitive data-processing tasks that contain both CPU and GPU segments, the key to scheduling lies in correctly predicting the processing time and data volume of each segment and deciding whether to migrate the task to another node. Most prior art cannot predict processing time, so tasks wait blindly in a queue while GPU resources sit idle because serial CPU tasks occupy the system. Predicting each task's running time on the GPU and the CPU removes this defect: when a task will occupy the CPU for a long time, it is packed off to a CPU-only compute node and the GPU can proceed to the next computing task, reducing idle hardware across the whole system and improving computational efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a GPU/CPU scheduling method and system of a self-learning operation time prediction model. The invention discloses a GPU/CPU scheduling method of a self-learning operation time prediction model, which comprises the following steps:
step 1, preprocessing a source code, generating an operation state identifier corresponding to the source code and a parameter required by program operation, and storing the operation state identifier and the parameter in an XML file;
step 2, setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and step 3, when the program is called again, searching for the XML file corresponding to the program, calculating the normalization parameters, substituting them into the prediction function to obtain a predicted running time for the current run, obtaining the time consumed by reassigning the program to another node, and, if the reassignment time is lower than the predicted running time, assigning the program to the CPU node.
In the aforementioned GPU/CPU scheduling method of the self-learning runtime prediction model, the preprocessing of the source code in step 1 includes generating corresponding signals whenever data is exchanged between GPU and CPU memory, including a signal for copies into GPU memory and a signal for copies into CPU memory.
In the above GPU/CPU scheduling method for the self-learning runtime prediction model, in step 3, the normalization parameter is generated by the following formula:
$$X_i = \frac{X'_i - \mu}{S_i}$$

where $X_i$ is the normalized parameter, $X'_i$ is the raw operating parameter, $\mu$ is the mean, and $S_i$ is the standard deviation.
In the above GPU/CPU scheduling method for the self-learning runtime prediction model, in step 2 the prediction function $h_\theta(X)$ is set to

$$h_\theta(X) = \theta^T X,$$

where, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$, $X = [1\ X_1\ \cdots\ X_n]^T$, and h is the predicted running time. A mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

where y is the running time of the program at a given stage and the sum runs over the recorded runs. For each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)},$$

where α is the learning rate, and the finally obtained θ is recorded in the XML file.
In the above GPU/CPU scheduling method with the self-learning runtime prediction model, in step 3, the consumed time required for program reallocation to another node is obtained through the following formula:
$$t = \frac{m}{v},$$

where m is the size of the file that the program needs to migrate and v is the average network speed.
The invention also provides a GPU/CPU scheduling system of the self-learning operation time prediction model, which comprises:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, acquiring the running time predicted value of the program in the running process, acquiring the consumed time required by the program to be redistributed to another node, and distributing the program to the CPU node if the newly distributed consumed time is lower than the running time predicted value.
In the GPU/CPU scheduling system with the self-learning runtime prediction model, the preprocessing of the source code in the initialization module includes generating corresponding signals whenever data is exchanged between GPU and CPU memory, including a signal for copies into GPU memory and a signal for copies into CPU memory.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the allocation module generates the normalization parameter by the following formula:
$$X_i = \frac{X'_i - \mu}{S_i}$$

where $X_i$ is the normalized parameter, $X'_i$ is the raw operating parameter, $\mu$ is the parameter mean, and $S_i$ is the parameter standard deviation.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the calculate-θ module sets the prediction function $h_\theta(X)$ to

$$h_\theta(X) = \theta^T X,$$

where, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$, $X = [1\ X_1\ \cdots\ X_n]^T$, and h is the predicted running time. A mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

where y is the running time of the program at a given stage, and for each parameter j = 0, 1, …, n the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)};$$

θ is then recorded in the XML file.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the consumption time required for the program to be reallocated to another node is obtained in the allocation module by the following formula:
$$t = \frac{m}{v},$$

where m is the size of the file that the program needs to migrate and v is the average network speed.
According to the scheme, the invention has the advantages that:
in a cluster system with both CPU and GPU, the invention can utilize GPU resources to the maximum extent, and allocate programs occupying CPU for a long time and having GPU idle to nodes with only CPU, thereby reducing the running time of the programs as a whole.
Drawings
FIG. 1 is a flow chart of a GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention.
FIG. 2 is a topological structure diagram of the GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a flow chart of the GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention. The system includes a source code preprocessor, a scheduling server, a database, and a computation cluster with a number of GPU and CPU nodes; it takes the program to be run and its data parameters as input and the program's computation result as output. The predicted time is calculated from the historical program running times recorded in the database, the actual running time is recorded back into the database, and the prediction model is updated. The method specifically comprises the following steps:
step 1 preprocesses the source code of the program.
For each program source code that is to run in the system for the first time, the source code preprocessor is run, and signals are emitted at memory allocation, at CPU/GPU memory exchange, and at kernel execution. Specifically, in this embodiment, for CUDA source code each signal-emitting function is designed on a signal-and-slot framework, and the slot functions run on the scheduling server to receive the corresponding signals. The step comprises the following substeps:
(1-1) when an operation copying from CPU memory to GPU memory is detected in the program, the corresponding signal is generated: emit SIGNAL(cudaMemcpyToGpu(void* src, void* dst, size_t size)), where src is the source memory address, dst is the destination memory address, and size is the amount of memory to copy;
(1-2) when an operation copying from GPU memory to CPU memory is detected in the program, the corresponding signal is generated: emit SIGNAL(cudaMemcpyToCpu(void* src, void* dst, size_t size)), with src, dst, and size as above;
(1-3) when the GPU-side kernel function is detected to start executing, the corresponding signal emit SIGNAL(cudaKernelLaunch()) is generated to indicate that the GPU program has started running;
(1-4) when the GPU-side kernel function is detected to finish, a synchronization function must run, and the corresponding signal emit SIGNAL(cudaSync()) is generated to indicate that the GPU program has finished running;
(1-5) the slot functions run on the scheduling server, receive each signal, and record it as a time node in the database of step 2; if the program is being run for the second time, step 3 is executed.
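For illustration, the following C++ sketch shows the kind of instrumentation this preprocessing produces, with the networked signal/slot transport reduced to a local timestamp log; the wrapper and logger names (memcpyToGpu, emitSignal, and so on) are assumptions of this sketch, not the patent's own identifiers.

// A minimal sketch of the emitted signals, assuming a local log in place of
// the Qt-style signal/slot transport to the scheduling server.
#include <chrono>
#include <cstddef>
#include <cstdio>

// Stand-in for "emit SIGNAL(...)": logs the event name, payload size, and a
// monotonic timestamp that the scheduling server would store as a time node.
static void emitSignal(const char* name, std::size_t size = 0) {
    auto t = std::chrono::steady_clock::now().time_since_epoch();
    long long us = std::chrono::duration_cast<std::chrono::microseconds>(t).count();
    std::printf("signal=%s size=%zu t=%lldus\n", name, size, us);
}

// Wrappers the preprocessor would substitute for the raw CUDA calls.
void memcpyToGpu(void* dst, const void* src, std::size_t size) {
    (void)dst; (void)src;
    emitSignal("cudaMemcpyToGpu", size);  // (1-1) CPU -> GPU copy
    // cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);  // real call here
}
void memcpyToCpu(void* dst, const void* src, std::size_t size) {
    (void)dst; (void)src;
    emitSignal("cudaMemcpyToCpu", size);  // (1-2) GPU -> CPU copy
    // cudaMemcpy(dst, src, size, cudaMemcpyDeviceToHost);  // real call here
}
void kernelBegin() { emitSignal("cudaKernelLaunch"); }  // (1-3) kernel starts
void kernelEnd()   { emitSignal("cudaSync"); }          // (1-4) after sync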
Step 2 establishing the database system and generating the XML document
This step establishes or indexes the corresponding XML file and data fields according to whether the program is running for the first time, and records the running time and related data required by the later steps. The method specifically comprises the following substeps:
(2-1) if the program to be executed is running for the first time, a unique identifier is generated and paired with an XML file. The XML file records the integer and floating-point parameters the program requires and, for programs such as image processing, the characteristics of the processed data, including but not limited to the length and width of the images, their resolution, and their number. The following fields are generated in the database:
ProgramID int primary key not null identity(1,1),
XmlPos char(50)
(2-2) if the program is run for the second time, the corresponding XML file name is looked up in the database by program name, and the parameters required for this run are appended to the XML file, including but not limited to the length and width of the images, their resolution, and their number.
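As a toy illustration of the per-run record appended to the program's XML file, here is a minimal C++ sketch using plain streams; the element names (run, param) and the append-only layout are assumptions of this sketch, since the patent does not show the actual file schema.

// A sketch of one per-run record, assuming a <run>/<param> layout.
#include <fstream>
#include <string>
#include <vector>

struct RunRecord {
    std::vector<double> params;  // integer/floating-point run parameters
    double runtimeSec;           // measured stage runtime from the signals
};

void appendRun(const std::string& xmlPath, const RunRecord& r) {
    std::ofstream out(xmlPath, std::ios::app);  // append one record
    out << "<run runtime=\"" << r.runtimeSec << "\">\n";
    for (double p : r.params)
        out << "  <param>" << p << "</param>\n";
    out << "</run>\n";
}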
Step 3 parameter preprocessing
The parameters to be used are preprocessed, parameters of floating point and integer types are selected, and normalization processing is carried out. The method specifically comprises the following substeps:
(3-1) all parameters of integer and floating-point type are collected into the XML file, and the mean $\mu$ and standard deviation $S_i$ are calculated;
(3-2) the i-th parameter, denoted $X'_i$, is normalized according to the following formula and recorded into the XML file in preparation for step 4:

$$X_i = \frac{X'_i - \mu}{S_i}$$
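For concreteness, a C++ sketch of this normalization over one parameter's recorded history follows; the use of the population standard deviation and the guard for constant parameters are assumptions of this sketch.

// Sketch of the step-3 normalization X_i = (X'_i - mu) / S_i.
#include <cmath>
#include <vector>

double normalize(const std::vector<double>& history, double xRaw) {
    const double n = static_cast<double>(history.size());
    double mu = 0.0;
    for (double v : history) mu += v;
    mu /= n;                                 // mean of recorded values

    double var = 0.0;
    for (double v : history) var += (v - mu) * (v - mu);
    const double s = std::sqrt(var / n);     // standard deviation S_i

    return s > 0.0 ? (xRaw - mu) / s : 0.0;  // avoid division by zero
}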
Step 4 calculating the predicted time
Whether the task is deployed to a GPU node or a CPU node is determined according to the normalization parameters calculated in step 3. The method specifically comprises the following substeps:
(4-1) if the program needing to be operated is operated for the first time, skipping to the step 5;
(4-2) the regression parameter $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$ of the prediction function is obtained from the XML file, together with the normalized program parameters $X = [1\ X_1\ \cdots\ X_n]^T$ from step 3, and both are substituted into the following prediction function to calculate the predicted running time:

$$h_\theta(X) = \theta^T X,$$

where h represents the calculated time the program needs to run and $^T$ denotes matrix transposition;
(4-3) given the obtained time, and assuming the network speed is known and roughly constant, the time cost (elapsed time) of reassigning the program to another node can be approximated as

$$t = \frac{m}{v},$$

where m is the size of the file holding the program and v is the average network speed. If this reassignment time is lower than the predicted running time, the program can absorb the overhead of node reallocation and may be assigned to a CPU node;
(4-4) while the program executes, its signals are received, the running time of each program part instrumented in step 1 is recorded back into the XML file, and these records participate in the computation of step 5.
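The placement decision of (4-2) and (4-3) reduces to comparing $h_\theta(X)$ against m/v, as in the following C++ sketch; the function names and the byte/second units are assumptions of this sketch.

// Sketch of the step-4 decision: predicted runtime h = theta^T X versus
// migration cost t = m / v.
#include <cstddef>
#include <vector>

double predictRuntime(const std::vector<double>& theta,
                      const std::vector<double>& x) {  // x[0] == 1
    double h = 0.0;
    for (std::size_t j = 0; j < theta.size(); ++j)
        h += theta[j] * x[j];                          // h = theta^T X
    return h;
}

// True when migrating to a CPU-only node pays off: t = m / v < h.
bool assignToCpuNode(double hPredictedSec, double fileBytes, double netBytesPerSec) {
    const double transferSec = fileBytes / netBytesPerSec;
    return transferSec < hPredictedSec;
}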
Step 5 calculating a fitting function
In this step a linear prediction model is used to compute the time prediction function: the historical running times recorded in step 4 are retrieved and the running-time fitting parameter θ is recalculated, improving the accuracy of subsequent predictions for this program. The method specifically comprises the following substeps:
(5-1) the invention designs the fitted prediction function as a linear prediction function; the prediction function $h_\theta(X)$ is set to

$$h_\theta(X) = \theta^T X,$$

where, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$ and the normalization parameters obtained in step 3 form $X = [1\ X_1\ \cdots\ X_n]^T$; the program's n parameters correspond to the regression coefficients $\theta_1, \ldots, \theta_n$ of the prediction function;
(5-2) corresponding to (5-1), the mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

where y is the stage running time recorded in step 4, and for each parameter j = 0, 1, …, n the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)},$$

and θ is recorded in the XML file;
(5-3) when the program runs for the k-th time, the corresponding XML file is retrieved from the database and the normalization parameters are calculated as

$$X_i^{(k)} = \frac{X_i'^{(k)} - \mu}{S_i};$$

the normalized parameters $X^{(k)}$ are substituted into the prediction function $h_\theta(X)$ to obtain the predicted running time h the program needs at this stage;
(5-4) when the program has run to completion, each program segment is recorded and the prediction function is updated: the current normalization parameters $X^{(k)}$ are substituted into the mean square error function

$$J(\theta) = \frac{1}{2}\sum_{k}\left(h_\theta(X^{(k)}) - y^{(k)}\right)^2;$$

(5-5) for each of $\theta_0$ through $\theta_n$, the following update is computed repeatedly until the above mean square error function converges:

$$\theta_j := \theta_j - \alpha \sum_{k}\left(h_\theta(X^{(k)}) - y^{(k)}\right)X_j^{(k)};$$

the θ stored in the XML file is updated again, and program execution ends.
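A compact C++ sketch of this refit follows, implementing batch gradient descent on the linear model above; the learning rate α, the convergence threshold on J(θ), and the iteration cap are assumptions of this sketch.

// Sketch of the step-5 refit over the recorded (X, y) runs; assumes at
// least one recorded run.
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Sample = std::pair<std::vector<double>, double>;  // (X with X[0]=1, y)

std::vector<double> fitTheta(const std::vector<Sample>& runs,
                             double alpha = 0.01, double eps = 1e-9) {
    const std::size_t n = runs.front().first.size();
    std::vector<double> theta(n, 0.0);
    double prevJ = 1e300;
    for (int iter = 0; iter < 100000; ++iter) {
        std::vector<double> grad(n, 0.0);
        double J = 0.0;                      // J = 1/2 * sum_i (h - y)^2
        for (const Sample& s : runs) {
            double err = -s.second;          // err = h_theta(X) - y
            for (std::size_t j = 0; j < n; ++j) err += theta[j] * s.first[j];
            J += 0.5 * err * err;
            for (std::size_t j = 0; j < n; ++j) grad[j] += err * s.first[j];
        }
        for (std::size_t j = 0; j < n; ++j)
            theta[j] -= alpha * grad[j];     // theta_j := theta_j - alpha*dJ/dtheta_j
        if (std::fabs(prevJ - J) < eps) break;  // J has converged
        prevJ = J;
    }
    return theta;  // written back into the program's XML file
}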
FIG. 2 is a topology diagram of the GPU/CPU scheduling method and system of the self-learning runtime prediction model of the present invention. The system includes a scheduling server, a database server, and a number of GPU and CPU compute nodes; the parts may be physically connected through a gigabit switch.
The invention also provides a GPU/CPU scheduling system of the self-learning operation time prediction model, which comprises:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, acquiring the running time predicted value of the program in the running process, acquiring the consumed time required by the program to be redistributed to another node, and distributing the program to the CPU node if the newly distributed consumed time is lower than the running time predicted value.
Further, the initialization module preprocesses the source code and generates corresponding signals whenever data is exchanged between GPU and CPU memory, the signals including a signal for copies into GPU memory and a signal for copies into CPU memory.
Further the assignment module generates a normalized parameter by the following equation:
$$X_i = \frac{X'_i - \mu}{S_i}$$

where $X_i$ is the normalized parameter, $X'_i$ is the raw operating parameter, $\mu$ is the parameter mean, and $S_i$ is the parameter standard deviation.
Still further, the calculate-θ module sets the prediction function $h_\theta(X)$ to

$$h_\theta(X) = \theta^T X,$$

where, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$, $X = [1\ X_1\ \cdots\ X_n]^T$, and h is the predicted running time. A mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

where y is the running time of the program at a given stage, and for each parameter j = 0, 1, …, n the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)};$$

θ is then recorded in the XML file.
Further, the consumption time required for the program to be redistributed to another node is obtained in the distribution module by the following formula:
$$t = \frac{m}{v},$$

where m is the size of the file that the program needs to migrate and v is the average network speed.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A GPU/CPU scheduling method of a self-learning operation time prediction model is characterized by comprising the following steps:
step 1, preprocessing a source code, generating an operation state identifier corresponding to the source code and a parameter required by program operation, and storing the operation state identifier and the parameter in an XML file;
step 2, setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and 3, when the program is called again, searching the XML file corresponding to the program, calculating a normalization parameter, substituting the normalization parameter into a prediction function, acquiring an operation time predicted value of the program in the current operation, acquiring transmission time consumed by the program to be redistributed to a GPU node or a CPU node, and if the transmission time is lower than the operation time predicted value, distributing the program to the CPU node.
2. The method as claimed in claim 1, wherein the preprocessing of the source code in step 1 comprises generating corresponding signals when the GPU and the CPU memory are exchanged, the corresponding signals comprising signals copied to the GPU memory and signals copied to the CPU memory.
3. The method of GPU/CPU scheduling with self-learning runtime prediction model as claimed in claim 1, wherein said step 3 generates the normalization parameters by the following formula:
$$X_i = \frac{X'_i - \mu}{S_i}$$

wherein $X_i$ is the normalized parameter, $X'_i$ is the raw operating parameter, $\mu$ is the mean, and $S_i$ is the standard deviation.
4. The method of claim 1, wherein in step 2 the prediction function $h_\theta(X)$ is set to

$$h_\theta(X) = \theta^T X,$$

wherein, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$, $X = [1\ X_1\ \cdots\ X_n]^T$, and h is the predicted running time; a mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

wherein y is the running time of the program at a certain stage, and for each parameter j = 0, 1, …, n the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)};$$

and the finally obtained θ is recorded in the XML file.
5. The self-learning runtime prediction model GPU/CPU scheduling method of claim 1, wherein the transmission time is obtained in step 3 by the following formula:
$$t = \frac{m}{v},$$
where m is the size of the file that the program needs to migrate and v is the average of the network speeds.
6. A GPU/CPU scheduling system for self-learning runtime prediction models, comprising:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, obtaining the running time predicted value of the program in the current running, obtaining the transmission time required by the program to be redistributed to the GPU node or the CPU node, and distributing the program to the CPU node if the transmission time is lower than the running time predicted value.
7. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6, wherein the preprocessing of the source code in the initialization module includes generating corresponding signals during GPU and CPU memory exchange, the corresponding signals including signals copied to GPU memory and signals copied to CPU memory.
8. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6 wherein the assignment module generates the normalization parameters by the following formula:
$$X_i = \frac{X'_i - \mu}{S_i}$$

wherein $X_i$ is the normalized parameter, $X'_i$ is the raw operating parameter, $\mu$ is the parameter mean, and $S_i$ is the parameter standard deviation.
9. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6, wherein the compute-θ module sets the prediction function $h_\theta(X)$ to

$$h_\theta(X) = \theta^T X,$$

wherein, assuming the program takes n parameters, the regression parameter of the prediction function is $\theta = [\theta_0\ \theta_1\ \cdots\ \theta_n]^T$, $X = [1\ X_1\ \cdots\ X_n]^T$, and h is the predicted running time; a mean square error function is designed as

$$J(\theta) = \frac{1}{2}\sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)^2,$$

wherein y is the running time of the program at a certain stage, and for each parameter j = 0, 1, …, n the following update is computed repeatedly until the above function converges:

$$\theta_j := \theta_j - \alpha \sum_{i}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)};$$

and θ is recorded in the XML file.
10. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6 wherein the allocation module obtains the transmission time by:
$$t = \frac{m}{v},$$
where m is the size of the file that the program needs to migrate and v is the average of the network speeds.
CN201611251972.3A 2016-12-30 2016-12-30 GPU/CPU scheduling method and system of self-learning running time prediction model Active CN106844024B (en)

Priority Applications (1)

Application Number: CN201611251972.3A (CN106844024B)
Priority Date: 2016-12-30
Filing Date: 2016-12-30
Title: GPU/CPU scheduling method and system of self-learning running time prediction model

Applications Claiming Priority (1)

Application Number: CN201611251972.3A (CN106844024B)
Priority Date: 2016-12-30
Filing Date: 2016-12-30
Title: GPU/CPU scheduling method and system of self-learning running time prediction model

Publications (2)

Publication Number Publication Date
CN106844024A CN106844024A (en) 2017-06-13
CN106844024B (en) 2020-06-05

Family

ID=59114064

Family Applications (1)

Application Number: CN201611251972.3A (granted as CN106844024B, active)
Priority Date: 2016-12-30
Filing Date: 2016-12-30
Title: GPU/CPU scheduling method and system of self-learning running time prediction model

Country Status (1)

Country Link
CN (1) CN106844024B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796591B (en) * 2019-09-25 2023-11-03 广东浪潮大数据研究有限公司 GPU card using method and related equipment
CN111522837B (en) * 2020-04-23 2023-06-23 北京百度网讯科技有限公司 Method and apparatus for determining time consumption of deep neural network
CN116627433B (en) * 2023-07-18 2024-01-09 鹏城实验室 Real-time parameter prediction method, system, equipment and medium for AI processor


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605493A (en) * 2013-11-29 2014-02-26 哈尔滨工业大学深圳研究生院 Parallel sorting learning method and system based on graphics processing unit
CN105468439A (en) * 2015-11-19 2016-04-06 华东师范大学 Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Self-learning load balancing scheduling algorithm for GPU heterogeneous clusters; Liu Hui et al.; Journal of Xi'an Shiyou University; 2015-05-31; Vol. 30, No. 3; pp. 105-110 *

Also Published As

Publication number Publication date
CN106844024A (en) 2017-06-13

Similar Documents

Publication Publication Date Title
CN105956021B (en) A kind of automation task suitable for distributed machines study parallel method and its system
CN106776005B (en) Resource management system and method for containerized application
CN110515739B (en) Deep learning neural network model load calculation method, device, equipment and medium
US12106154B2 (en) Serverless computing architecture for artificial intelligence workloads on edge for dynamic reconfiguration of workloads and enhanced resource utilization
US20240111586A1 (en) Multi-policy intelligent scheduling method and apparatus oriented to heterogeneous computing power
CN111258744A (en) Task processing method based on heterogeneous computation and software and hardware framework system
JP2020537784A (en) Machine learning runtime library for neural network acceleration
CN104050042B (en) The resource allocation methods and device of ETL operations
US20120233486A1 (en) Load balancing on heterogeneous processing clusters implementing parallel execution
CN111079921A (en) Efficient neural network training and scheduling method based on heterogeneous distributed system
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
CN112328378A (en) Task scheduling method, computer device and storage medium
US11544113B2 (en) Task scheduling for machine-learning workloads
CN103401939A (en) Load balancing method adopting mixing scheduling strategy
Wang et al. An efficient and non-intrusive GPU scheduling framework for deep learning training systems
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
CN104243617A (en) Task scheduling method and system facing mixed load in heterogeneous cluster
CN113391918A (en) Method, apparatus and computer program product for processing a computing job
Song et al. Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small
US20210319298A1 (en) Compute-based subgraph partitioning of deep learning models for framework integration
CN113296905A (en) Scheduling method, scheduling device, electronic equipment, storage medium and software product
CN106844024B (en) GPU/CPU scheduling method and system of self-learning running time prediction model
US20110131554A1 (en) Application generation system, method, and program product
Wang et al. Lube: Mitigating bottlenecks in wide area data analytics
CN112905317A (en) Task scheduling method and system under rapid reconfigurable signal processing heterogeneous platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant