CN106844024B - GPU/CPU scheduling method and system of self-learning running time prediction model - Google Patents
- Publication number
- CN106844024B (application CN201611251972.3A)
- Authority
- CN
- China
- Prior art keywords
- program
- gpu
- parameter
- cpu
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a GPU/CPU scheduling method based on a self-learning running-time prediction model, relating to the technical fields of large-scale heterogeneous computing and cloud computing. The method comprises: preprocessing the source code, generating running-state identifiers corresponding to the source code together with the parameters required for program operation, and storing them in an XML file; setting a prediction function, calculating its regression parameter θ from the running time of the program at a given stage, as returned by the running-state identifiers, and the parameter set of that stage, and storing θ in the XML file; and, when the program is called again, looking up the XML file corresponding to the program, calculating new normalized parameters, substituting them into the prediction function to obtain a predicted running time for the current run, and obtaining the time that reallocating the program to another node would consume; if the reallocation time is lower than the predicted running time, the program is allocated to a CPU node.
Description
Technical Field
The invention relates to the technical field of large-scale heterogeneous computing and cloud computing, in particular to a GPU/CPU scheduling method and system of a self-learning operation time prediction model.
Background
With the development and popularization of GPGPU (general-purpose computing on graphics processing units) technology, more and more computing clusters use GPUs (graphics processing units) together with CPUs (central processing units) for heterogeneous parallel computing, in order to solve large-scale computation problems.
In the prior art, one approach unifies the CPU and GPU into a whole loaded into a computing cluster, forming a method and device for hybrid CPU/GPU parallel computing: in a computing node to which pending tasks have been scheduled, the CPU preprocesses the scheduled tasks one by one and maps each task into the GPU's video memory after preprocessing. Another approach distributes each task of a dataflow program to a suitable computing platform according to the task's computing characteristics and the volume of data communicated between tasks. However, some task types, such as image and video processing, are relatively uniform and highly repetitive, with each task differing only in the data and parameters involved in the computation. Meanwhile, the GPU is suited to large-scale parallel tasks, whereas tasks with little parallelism and serial tasks are better processed on the CPU. Since the GPU device is installed on the PCI-E interface, a node equipped with a GPU can run GPU and CPU tasks independently; if a GPU task contains a segment that uses only the CPU, the GPU sits idle during that segment.
For highly repetitive data-processing tasks containing both CPU and GPU segments, the key to scheduling lies in correctly predicting the processing time and data volume of each segment and deciding whether the task should migrate to another node. Most prior art cannot predict the processing time, so tasks wait blindly in queues while GPU resources sit idle because a serial CPU task occupies the system. This defect can be remedied by predicting a task's running time on the GPU and the CPU: when a task will occupy the CPU for a long time, it is packed off to a CPU-only computing node, and the GPU can proceed to the next computing task, thereby reducing idle hardware across the system and improving computing efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a GPU/CPU scheduling method and system of a self-learning operation time prediction model. The invention discloses a GPU/CPU scheduling method of a self-learning operation time prediction model, which comprises the following steps:
step 1, preprocessing a source code, generating an operation state identifier corresponding to the source code and a parameter required by program operation, and storing the operation state identifier and the parameter in an XML file;
step 2, setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and 3, when the program is called again, searching the XML file corresponding to the program, calculating the normalization parameter, substituting the normalization parameter into the prediction function, acquiring the running time predicted value of the program in the current running, acquiring the consumed time required by the program to be redistributed to another node, and if the newly distributed consumed time is lower than the running time predicted value, distributing the program to the CPU node.
In the aforementioned GPU/CPU scheduling method with a self-learning runtime prediction model, the preprocessing of the source code in step 1 includes generating corresponding signals when data is exchanged between GPU and CPU memory, including a copy-to-GPU-memory signal and a copy-to-CPU-memory signal.
In the above GPU/CPU scheduling method for the self-learning runtime prediction model, in step 3 the normalization parameter is generated by the following formula:

X_i = (X'_i - μ) / S_i,

where X_i is the normalized parameter, X'_i is the original operating parameter, μ is the mean, and S_i is the standard deviation.
In the above GPU/CPU scheduling method for the self-learning runtime prediction model, in step 2 the prediction function h_θ(X) is set to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T, X = [1 X_1 … X_n]^T, and h is the predicted running time; a mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the recorded running time of the program at a certain stage and N is the number of recorded runs. For each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

and the finally obtained θ is recorded in the XML file.
In the above GPU/CPU scheduling method with the self-learning runtime prediction model, in step 3 the time consumed by reallocating the program to another node is obtained through the following formula:

t = m / v,

where m is the size of the file that the program needs to migrate and v is the average network speed.
The invention also provides a GPU/CPU scheduling system of the self-learning operation time prediction model, which comprises:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, acquiring the running time predicted value of the program in the running process, acquiring the consumed time required by the program to be redistributed to another node, and distributing the program to the CPU node if the newly distributed consumed time is lower than the running time predicted value.
In the GPU/CPU scheduling system with the self-learning runtime prediction model, the initialization module preprocesses the source code and generates corresponding signals when data is exchanged between GPU and CPU memory, the signals including a copy-to-GPU-memory signal and a copy-to-CPU-memory signal.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the allocation module generates the normalization parameter by the following formula:

X_i = (X'_i - μ) / S_i,

where X_i is the normalized parameter, X'_i is the original operating parameter, μ is the parameter mean, and S_i is the parameter standard deviation.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the calculate-θ module sets the prediction function h_θ(X) to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T, X = [1 X_1 … X_n]^T, and h is the predicted running time; a mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the recorded running time of the program at a certain stage and N is the number of recorded runs. For each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

and θ is recorded in the XML file.
The GPU/CPU scheduling system of the self-learning runtime prediction model described above, wherein the allocation module obtains the time consumed by reallocating the program to another node through the following formula:

t = m / v,

where m is the size of the file that the program needs to migrate and v is the average network speed.
According to the scheme, the invention has the advantages that:
in a cluster system with both CPU and GPU, the invention can utilize GPU resources to the maximum extent, and allocate programs occupying CPU for a long time and having GPU idle to nodes with only CPU, thereby reducing the running time of the programs as a whole.
Drawings
FIG. 1 is a flow chart of a GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention.
FIG. 2 is a topological structure diagram of the GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a flow chart of the GPU/CPU scheduling method of the self-learning runtime prediction model of the present invention. The system comprises a source-code preprocessor, a scheduling server, a database, and a computing cluster with multiple GPU and CPU nodes; the program to be run and its data parameters are the input, and the program's computed result is the output. The predicted time is calculated from the historical running times recorded in the database, the actual running time is recorded back into the database, and the model is updated. The method specifically comprises the following steps:
step 1 preprocesses the source code of the program.
For each program source code that is to run in the system for the first time, the source-code preprocessor is run, and signals are emitted respectively on memory allocation, on CPU/GPU memory exchange, and on kernel-function execution. Specifically, in this embodiment, for CUDA source code, each signal-emitting function is designed on a signal-and-slot framework; the slot function runs on the dispatch server and receives the corresponding signals. The step specifically comprises the following substeps:
(1-1) when an operation copying from CPU memory to GPU memory is detected in the program, the corresponding SIGNAL(cudaMemcpyToGPU(void *src, void *dst, size_t size)) is emitted, where src is the source memory address, dst is the destination memory address, and size is the size of the memory to be copied;
(1-2) when an operation copying from GPU memory to CPU memory is detected, the corresponding SIGNAL(cudaMemcpyToCPU(void *src, void *dst, size_t size)) is emitted, with src, dst, and size as above;
(1-3) when the GPU-side kernel function is detected to start executing, the corresponding SIGNAL(cudaKernelLaunch()) is emitted to mark the start of the GPU program;
(1-4) when the GPU-side kernel function is detected to finish, a synchronization function must be run, and the corresponding SIGNAL(cudaSync()) is emitted to mark the end of the GPU program;
(1-5) the slot function runs on the dispatch server, receives each signal, and records it as a time node in the database of step 2; if the program is being run for the second or a later time, step 3 is executed.
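The timing side of substeps (1-1) through (1-5) can be sketched in miniature as follows. The Python class and its methods are illustrative stand-ins (the patent's implementation targets CUDA/C++ with a signal-and-slot framework); only the signal names mirror the patent's signals:

```python
import time

class SchedulerSlot:
    """Toy stand-in for the dispatch server's slot function in (1-5):
    every emitted signal is recorded as a (signal_name, timestamp) time
    node, from which per-segment running times can be derived."""

    def __init__(self):
        self.events = []

    def receive(self, signal_name):
        # record the signal together with a monotonic timestamp
        self.events.append((signal_name, time.monotonic()))

    def segment_times(self):
        """Elapsed time between consecutive signals, labeled by the
        signal that closes each segment."""
        return [(b[0], b[1] - a[1])
                for a, b in zip(self.events, self.events[1:])]
```

For example, the span between a received `cudaKernelLaunch` and the following `cudaSync` gives the GPU kernel segment's running time.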
Step 2 establishing the database and generating the XML document
This step establishes or indexes the corresponding XML file and database fields according to whether the program is running for the first time, and records the running time and similar data required by step 3. It specifically comprises the following substeps:
(2-1) if the program to be executed is running for the first time, a unique identifier is generated and associated with an XML file; the XML file records the integer and floating-point parameters required by the program and, for some programs such as image processing, the characteristics of the processed data, including but not limited to the length and width of the images, their resolution, and their number. The following fields are generated in the database:
ProgramID int primary key not null identity(1,1),
XmlPos char(50)
(2-2) if the program has run before, the corresponding XML file name is looked up in the database by program name, and the parameters required by this run are appended to the XML file, including but not limited to the length and width of the images, their resolution, and their number.
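The per-run bookkeeping of substeps (2-1)/(2-2) can be sketched as follows. The patent fixes the database fields but not an XML schema, so the element names below are our own illustrative choice:

```python
import xml.etree.ElementTree as ET

def record_run(xml_path, params, stage_times):
    """Append one run's parameters (e.g. image width/height, resolution,
    count) and measured stage times to the program's XML file, creating
    the file on the first run as in (2-1)."""
    try:
        tree = ET.parse(xml_path)
        root = tree.getroot()
    except (FileNotFoundError, ET.ParseError):
        root = ET.Element("program")
        tree = ET.ElementTree(root)
    run = ET.SubElement(root, "run")
    for name, value in params.items():
        ET.SubElement(run, "param", name=name).text = str(value)
    for stage, seconds in stage_times.items():
        ET.SubElement(run, "stage", name=stage).text = str(seconds)
    tree.write(xml_path)
```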
Step 3 parameter preprocessing
The parameters to be used are preprocessed, parameters of floating point and integer types are selected, and normalization processing is carried out. The method specifically comprises the following substeps:
(3-1) record all parameters of integer and floating-point type in the XML file, and calculate the mean μ and standard deviation S_i;
(3-2) denote the i-th parameter by X'_i and normalize it according to the formula

X_i = (X'_i - μ) / S_i,

recording the result in the XML file in preparation for step 4.
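Substeps (3-1)/(3-2) amount to z-score normalization of each parameter over the program's recorded runs. A minimal sketch (the function name and the use of Python are our own choices, not from the patent):

```python
def normalize_history(history):
    """Per-parameter z-score normalization over recorded runs:
    X_i = (X'_i - mu_i) / S_i, using the mean and (population)
    standard deviation of parameter i across the historical
    parameter sets. history: one list of parameter values per run."""
    n_runs = len(history)
    n_params = len(history[0])
    columns = []
    for i in range(n_params):
        col = [run[i] for run in history]
        mu = sum(col) / n_runs
        s = (sum((x - mu) ** 2 for x in col) / n_runs) ** 0.5 or 1.0
        columns.append([(x - mu) / s for x in col])
    # transpose back to one normalized parameter vector per run
    return [list(v) for v in zip(*columns)]
```

The `or 1.0` guards against a parameter with zero spread, which would otherwise divide by zero.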
Step 4 calculating the predicted time
This step determines whether the task is deployed on a GPU or a CPU node according to the normalized parameters calculated in step 3. It specifically comprises the following substeps:
(4-1) if the program to be run is running for the first time, skip to step 5;
(4-2) obtain from the XML file the prediction-function regression parameter θ = [θ_0 θ_1 … θ_n]^T and the normalized program parameters of step 3, X = [1 X_1 … X_n]^T, and substitute them into the following prediction function to compute the predicted running time:

h_θ(X) = θ^T X,

where h represents the computed time the program needs to run and T denotes matrix transposition;
(4-3) given this time, and assuming the network speed is known and varies little, the time cost (elapsed time) of reallocating the program to another node can be approximated as

t = m / v,

where m is the size of the file in which the program resides and v is the average network speed; if the reallocation time is lower than the predicted running time, the program can absorb the time overhead of node reallocation and may be allocated to a CPU node.
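The decision of substeps (4-2)/(4-3) — predict h = θ^T X and compare it with the transfer cost t = m / v — can be sketched as follows (the function and parameter names are illustrative):

```python
def should_migrate(theta, x, file_size_bytes, net_speed_bps):
    """Return True if the program should be reallocated to a CPU-only
    node: the migration cost t = m / v is lower than the predicted
    running time h = theta^T . x. x is the normalized parameter vector
    with a leading 1 for the bias term theta_0."""
    h = sum(t * xi for t, xi in zip(theta, x))   # predicted run time (s)
    t_move = file_size_bytes / net_speed_bps     # transfer cost (s)
    return t_move < h
```

For instance, with θ = [2.0, 1.0] and X = [1.0, 3.0], the predicted time is 5 s; a 100 MB file over a 125 MB/s link costs 0.8 s to move, so migration pays off.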
(4-4) while the program executes, its signals are received and the running time of each program segment from step 1 is recorded back into the XML file, to participate in the computation of step 5.
Step 5 calculating a fitting function
This step computes the time prediction function using a linear model: the historical running times recorded in step 4 are retrieved and the running-time fitting parameter θ is recomputed, improving subsequent prediction accuracy for the program. The step specifically comprises the following substeps:
(5-1) the invention designs the fitted prediction function as a linear function; the prediction function h_θ(X) is set to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T and the normalized parameter vector obtained in step 3 is X = [1 X_1 … X_n]^T; the program's n parameters correspond to the n+1 prediction parameters θ_0, …, θ_n (θ_0 being the bias term);
(5-2) corresponding to (5-1), the mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the running time of the program at the given stage recorded in step 4 for the k-th recorded run and N is the number of recorded runs; for each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

where α is the learning rate; θ is then recorded in the XML file;
(5-3) when the program runs for the k-th time, the corresponding XML file is looked up in the database, the normalized parameter X^(k) is calculated and substituted into the prediction function h_θ(X), and the predicted running time h required by the program at that stage is obtained;
(5-4) when the program has run to completion, the time of each program segment is recorded, the prediction function is updated, and the current normalized parameters are substituted into the mean square error function;
(5-5) for each of θ_0 through θ_n, the above update is computed repeatedly until the mean square error function converges; the θ stored in the XML file is then updated again, and program execution ends.
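The fitting loop of substeps (5-1) through (5-5) is ordinary batch gradient descent for a linear model. A minimal sketch, assuming the XML history has already been loaded into plain lists (the learning rate and iteration count are our own choices):

```python
def fit_theta(xs, ys, alpha=0.1, iters=2000):
    """Batch gradient descent for the linear model h_theta(X) = theta^T X.
    xs: normalized parameter vectors, each with a leading 1 for the bias;
    ys: measured stage running times. Returns the fitted theta that would
    be stored back into the program's XML file."""
    n = len(xs[0])
    m = len(xs)
    theta = [0.0] * n
    for _ in range(iters):
        # gradient of J(theta) = (1/2m) * sum (h(x) - y)^2
        grad = [0.0] * n
        for x, y in zip(xs, ys):
            err = sum(t * xi for t, xi in zip(theta, x)) - y
            for j in range(n):
                grad[j] += err * x[j] / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```

With two runs, X = [1, 0] taking 1 s and X = [1, 1] taking 3 s, the loop converges to θ ≈ [1, 2], i.e. a 1 s base cost plus 2 s per unit of the normalized parameter.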
FIG. 2 is a topology structure diagram of the GPU/CPU scheduling method and system of the self-learning runtime prediction model of the present invention, the system includes a scheduling server, a database server, a plurality of GPU computation nodes and CPU nodes. The physical connection of the various parts may be through a gigabit switch.
The invention also provides a GPU/CPU scheduling system of the self-learning operation time prediction model, which comprises:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, acquiring the running time predicted value of the program in the running process, acquiring the consumed time required by the program to be redistributed to another node, and distributing the program to the CPU node if the newly distributed consumed time is lower than the running time predicted value.
Further, the initialization module preprocesses the source code and generates corresponding signals when data is exchanged between GPU and CPU memory, the signals including a copy-to-GPU-memory signal and a copy-to-CPU-memory signal.
Further, the allocation module generates the normalized parameter by the following formula:

X_i = (X'_i - μ) / S_i,

where X_i is the normalized parameter, X'_i is the original operating parameter, μ is the parameter mean, and S_i is the parameter standard deviation.
Still further, the calculate-θ module sets the prediction function h_θ(X) to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T, X = [1 X_1 … X_n]^T, and h is the predicted running time; a mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the recorded running time of the program at a certain stage and N is the number of recorded runs. For each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

and θ is recorded in the XML file.
Further, the allocation module obtains the time consumed by reallocating the program to another node through the following formula:

t = m / v,

where m is the size of the file that the program needs to migrate and v is the average network speed.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A GPU/CPU scheduling method of a self-learning operation time prediction model is characterized by comprising the following steps:
step 1, preprocessing a source code, generating an operation state identifier corresponding to the source code and a parameter required by program operation, and storing the operation state identifier and the parameter in an XML file;
step 2, setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and 3, when the program is called again, searching the XML file corresponding to the program, calculating a normalization parameter, substituting the normalization parameter into a prediction function, acquiring an operation time predicted value of the program in the current operation, acquiring transmission time consumed by the program to be redistributed to a GPU node or a CPU node, and if the transmission time is lower than the operation time predicted value, distributing the program to the CPU node.
2. The method as claimed in claim 1, wherein the preprocessing of the source code in step 1 comprises generating corresponding signals when the GPU and the CPU memory are exchanged, the corresponding signals comprising signals copied to the GPU memory and signals copied to the CPU memory.
3. The GPU/CPU scheduling method with the self-learning runtime prediction model as claimed in claim 1, wherein in said step 3 the normalization parameter is generated by the following formula:

X_i = (X'_i - μ) / S_i,

where X_i is the normalized parameter, X'_i is the original operating parameter, μ is the mean, and S_i is the standard deviation.
4. The method of claim 1, wherein in step 2 the prediction function h_θ(X) is set to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T, X = [1 X_1 … X_n]^T, and h is the predicted running time; a mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the recorded running time of the program at a certain stage and N is the number of recorded runs; for each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

and the finally obtained θ is recorded in an XML file.
6. A GPU/CPU scheduling system for self-learning runtime prediction models, comprising:
the initialization module is used for preprocessing the source code, generating an operation state identifier corresponding to the source code and parameters required by program operation, and storing the operation state identifier and the parameters in an XML file;
the calculation theta module is used for setting a prediction function, calculating a regression parameter theta of the prediction function according to the running time of the program at a certain stage returned by the running state identifier and the parameter set at the stage, and storing the regression parameter theta in an XML file;
and the distribution module is used for searching the XML file corresponding to the program when the program is called again, calculating the normalization parameter, substituting the normalization parameter into the prediction function, obtaining the running time predicted value of the program in the current running, obtaining the transmission time required by the program to be redistributed to the GPU node or the CPU node, and distributing the program to the CPU node if the transmission time is lower than the running time predicted value.
7. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6, wherein the preprocessing of the source code in the initialization module includes generating corresponding signals during GPU/CPU memory exchange, the corresponding signals including a signal copied to GPU memory and a signal copied to CPU memory.
8. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6, wherein the allocation module generates the normalization parameter by the following formula:

X_i = (X'_i - μ) / S_i,

where X_i is the normalized parameter, X'_i is the original operating parameter, μ is the parameter mean, and S_i is the parameter standard deviation.
9. The self-learning runtime prediction model GPU/CPU scheduling system of claim 6, wherein the calculate-θ module sets the prediction function h_θ(X) to

h_θ(X) = θ^T X,

where, assuming the program needs n parameters, the regression parameter of the prediction function is θ = [θ_0 θ_1 … θ_n]^T, X = [1 X_1 … X_n]^T, and h is the predicted running time; a mean square error function is designed as

J(θ) = (1/2N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k))²,

where y^(k) is the recorded running time of the program at a certain stage and N is the number of recorded runs; for each parameter j = 0, 1, …, n, the following update is computed repeatedly until the above function converges:

θ_j := θ_j - α (1/N) Σ_{k=1}^{N} (h_θ(X^(k)) - y^(k)) X_j^(k),

and θ is recorded in an XML file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611251972.3A CN106844024B (en) | 2016-12-30 | 2016-12-30 | GPU/CPU scheduling method and system of self-learning running time prediction model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611251972.3A CN106844024B (en) | 2016-12-30 | 2016-12-30 | GPU/CPU scheduling method and system of self-learning running time prediction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844024A CN106844024A (en) | 2017-06-13 |
CN106844024B true CN106844024B (en) | 2020-06-05 |
Family
ID=59114064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611251972.3A Active CN106844024B (en) | 2016-12-30 | 2016-12-30 | GPU/CPU scheduling method and system of self-learning running time prediction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844024B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796591B (en) * | 2019-09-25 | 2023-11-03 | 广东浪潮大数据研究有限公司 | GPU card using method and related equipment |
CN111522837B (en) * | 2020-04-23 | 2023-06-23 | 北京百度网讯科技有限公司 | Method and apparatus for determining time consumption of deep neural network |
CN116627433B (en) * | 2023-07-18 | 2024-01-09 | 鹏城实验室 | Real-time parameter prediction method, system, equipment and medium for AI processor |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605493A (en) * | 2013-11-29 | 2014-02-26 | 哈尔滨工业大学深圳研究生院 | Parallel sorting learning method and system based on graphics processing unit |
CN105468439A (en) * | 2015-11-19 | 2016-04-06 | 华东师范大学 | Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework |
-
2016
- 2016-12-30 CN CN201611251972.3A patent/CN106844024B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605493A (en) * | 2013-11-29 | 2014-02-26 | 哈尔滨工业大学深圳研究生院 | Parallel sorting learning method and system based on graphics processing unit |
CN105468439A (en) * | 2015-11-19 | 2016-04-06 | 华东师范大学 | Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework |
Non-Patent Citations (1)
Title |
---|
Self-learning load-balancing scheduling algorithm for GPU heterogeneous clusters; Liu Hui et al.; Journal of Xi'an Shiyou University; 20150531; Vol. 30 (No. 3); pp. 105-110 *
Also Published As
Publication number | Publication date |
---|---|
CN106844024A (en) | 2017-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105956021B (en) | A kind of automation task suitable for distributed machines study parallel method and its system | |
CN106776005B (en) | Resource management system and method for containerized application | |
CN110515739B (en) | Deep learning neural network model load calculation method, device, equipment and medium | |
US12106154B2 (en) | Serverless computing architecture for artificial intelligence workloads on edge for dynamic reconfiguration of workloads and enhanced resource utilization | |
US20240111586A1 (en) | Multi-policy intelligent scheduling method and apparatus oriented to heterogeneous computing power | |
CN111258744A (en) | Task processing method based on heterogeneous computation and software and hardware framework system | |
JP2020537784A (en) | Machine learning runtime library for neural network acceleration | |
CN104050042B (en) | The resource allocation methods and device of ETL operations | |
US20120233486A1 (en) | Load balancing on heterogeneous processing clusters implementing parallel execution | |
CN111079921A (en) | Efficient neural network training and scheduling method based on heterogeneous distributed system | |
CN114741207B (en) | GPU resource scheduling method and system based on multi-dimensional combination parallelism | |
CN112328378A (en) | Task scheduling method, computer device and storage medium | |
US11544113B2 (en) | Task scheduling for machine-learning workloads | |
CN103401939A (en) | Load balancing method adopting mixing scheduling strategy | |
Wang et al. | An efficient and non-intrusive GPU scheduling framework for deep learning training systems | |
US20210390405A1 (en) | Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof | |
CN104243617A (en) | Task scheduling method and system facing mixed load in heterogeneous cluster | |
CN113391918A (en) | Method, apparatus and computer program product for processing a computing job | |
Song et al. | Bridging the semantic gaps of GPU acceleration for scale-out CNN-based big data processing: Think big, see small | |
US20210319298A1 (en) | Compute-based subgraph partitioning of deep learning models for framework integration | |
CN113296905A (en) | Scheduling method, scheduling device, electronic equipment, storage medium and software product | |
CN106844024B (en) | GPU/CPU scheduling method and system of self-learning running time prediction model | |
US20110131554A1 (en) | Application generation system, method, and program product | |
Wang et al. | Lube: Mitigating bottlenecks in wide area data analytics | |
CN112905317A (en) | Task scheduling method and system under rapid reconfigurable signal processing heterogeneous platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |