CN113590229A - Industrial Internet of things graph task unloading method and system based on deep reinforcement learning - Google Patents
Industrial Internet of things graph task unloading method and system based on deep reinforcement learning
- Publication number
- CN113590229A (application number CN202110923267.8A)
- Authority
- CN
- China
- Prior art keywords
- task
- graph
- network
- unloading
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44594—Unloading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
Abstract
The invention discloses an industrial Internet of things graph task unloading method and system based on deep reinforcement learning, wherein the method comprises the following steps: constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things; setting an optimization objective for graph task offloading based on the mobile edge computing system, the objective being to minimize the weighted sum of task completion time and data exchange consumption; and, according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, performing reinforcement learning based on a pre-constructed deep Q network so as to offload computation-intensive graph tasks in a dynamic time-varying environment and obtain the optimal action. The system comprises: a system model building module, an optimization objective setting module, and a reinforcement learning module. With the method and system, an offloading strategy for graph tasks under time-varying conditions and limited resources can be formulated. The invention can be widely applied in the field of the industrial Internet of things.
Description
Technical Field
The invention relates to the field of industrial Internet of things, in particular to an industrial Internet of things graph task unloading method and system based on deep reinforcement learning.
Background
Against the background of intelligent manufacturing in China, every link of the factory is gradually becoming intelligent, and unmanned forklifts, as a main means of logistics in smart factories, are widely used in all kinds of plants. They can transport materials automatically and efficiently within the factory, solving the problem of excessively high labor intensity in manual handling. Unmanned Forklifts (UFs) are equipped with computing processors and various sensing devices (e.g., cameras and high-quality sensors), and can carry perception-related applications (e.g., personnel identification, obstacle identification, anomaly identification and early warning) with innovative and computation-intensive features.
Some perception-related applications may involve a large number of complex tasks, and processing some of these complex tasks locally at the UF is impractical due to the limits of a single UF's computing power and power consumption; such tasks therefore need to be offloaded to nearby devices or base stations for processing. Because complex tasks are typically composed of interdependent subtasks, task offloading becomes complicated and making reasonable offloading decisions becomes challenging. Graph tasks are used to represent the dependencies among various computation-intensive tasks, with the tasks and the data flows represented by graph vertices and edges, respectively, so offloading tasks according to the task graph is an effective method.
In an industrial Internet of things with complex communication conditions, existing graph task offloading methods perform poorly because the wireless channel conditions change frequently and the computing resources of each device fluctuate. In a highly dynamic environment, the optimization problem must be solved repeatedly, which may waste computing resources.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide an industrial Internet of things graph task unloading method and system based on deep reinforcement learning, which take the complexity of the communication environment into account and formulate an offloading strategy for graph tasks under time-varying conditions and limited resources by minimizing the weighted sum of task completion time and data exchange consumption (WETC).
The first technical scheme adopted by the invention is as follows: an industrial Internet of things graph task unloading method based on deep reinforcement learning comprises the following steps:
s1, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things;
s2, setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
s3, performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network which have the same structure.
Further, the step of constructing the mobile edge computing system based on the task offloading scenario of one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
s11, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things, the scenario comprising one task initiator and a plurality of task performers;
s12, representing the dependency relationships between tasks by an undirected acyclic graph G = {V, E}, comprising a task set V = {v_i | i ∈ W} and an edge set E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G serves as a binary indicator variable indicating whether there is data exchange between v_i and v_j;
Graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption, and data exchange consumption.
Further, the reinforcement learning setting of the pre-constructed deep Q network is as follows:
State space: the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_t,i | i ∈ m} denotes the channel gain of task performer i at time t, f_t = {f_t,i | i ∈ m} denotes the CPU frequency of task performer i at time t, u_t = {u_t,i | i ∈ m} denotes the number of idle slots of task performer i at time t, G_t denotes the topology of the task graph, and d_t = {d_t,i | i ∈ m} denotes the distance between task performer i and the task initiator at time t;
Action space: the offloading action of the current task v_i is represented as a_i = {a_i,1, a_i,2, ..., a_i,m}, where a_i,j is a binary indicator: a_i,j = 1 denotes that device n_j is selected to offload task v_i, and a_i,j = 0 denotes that device n_j is not selected to offload task v_i;
Reward function: the system reward is set to r = -(αT(a) + (1-α)E(b)), where T(a) denotes the time consumption, E(b) denotes the data exchange consumption, and α and (1-α) denote the weights of time consumption and data exchange consumption, respectively.
Further, the step of performing reinforcement learning based on a pre-constructed deep Q network according to an environment state in the mobile edge computing system and an optimization goal of graph task offloading, and learning to offload a computation-intensive graph task in a dynamic time-varying environment to obtain an optimal action specifically includes:
s31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading;
s32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and updating the parameters of the train Q-network through backward gradient propagation;
s33, repeating step S32 until the reward r is judged to have converged and approached its maximum, and taking the current action as the optimal action.
Further, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and updating the parameters of the train Q-network through backward gradient propagation specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network, and the system environment, and creating an empty queue Q;
the task initiator randomly selects a task v_i as the head node of the queue and enqueues it;
the task initiator sequentially dequeues the tasks v_i to be offloaded and, according to the edge set E in the graph task G, enqueues in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting {Q(s, a | θ)}_{a∈A}, selecting an action a_i according to an ε-greedy algorithm, and then offloading the task;
the task initiator observes the next state s_{i+1} and the reward r_i from the environment, and stores (s_i, a_i, r_i, s_{i+1}) as an experience tuple in the experience replay pool;
when the experience replay pool is judged to be full, randomly sampling K experience tuples from it;
calculating the target values and, combining the sampled experience tuples, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
The second technical scheme adopted by the invention is as follows: an industrial internet of things graph task unloading system based on deep reinforcement learning comprises:
the system model building module is used for constructing a mobile edge computing system based on the task offloading scenario of one task initiator and a plurality of task performers in the industrial Internet of things;
the optimization objective setting module is used for setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
and the reinforcement learning module is used for performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
Beneficial effects of the method and system: the invention takes the complexity of the communication environment into account, minimizes the WETC, and formulates an offloading strategy for graph tasks under time-varying conditions and limited resources; through the deep Q network, experience is continuously accumulated over many iterations and learning proceeds from feedback signals, so that a near-optimal strategy can be formulated to reasonably offload complex graph tasks in a dynamic environment.
Drawings
FIG. 1 is a flowchart of steps of an industrial IOT graph task unloading method based on deep reinforcement learning according to the invention;
FIG. 2 is a structural block diagram of an industrial Internet of things graph task unloading system based on deep reinforcement learning;
FIG. 3 is a schematic diagram of an industrial IOT scenario in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the task framework in accordance with an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to fig. 1, the invention provides an industrial internet of things graph task unloading method based on deep reinforcement learning, which comprises the following steps:
s1, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things;
specifically, referring to fig. 3, due to the limitations of single UF computing power and power consumption, the device needs to offload locally generated compute-intensive tasks to nearby devices and servers for processing through wireless communication, thereby reducing the capacity requirement and power consumption of a single device, and therefore, we construct a Mobile Edge Computing (MEC) system with a scenario in which a task initiator and multiple task performers offload tasks. Herein, each MEC system provides an information technology service environment and cloud computing capability at the edge of a mobile network as a unit for executing graphics tasks in parallel, making it possible to process tasks quickly, and assuming no interference between each MEC system.
Assume there is one task initiator and M task performers in the MEC system, where N = {n_i | i ∈ M} denotes the set of task performers. The computational resources of each task performer are divided into different numbers of idle slots, which can be expressed as z = {z_i | i ∈ M}. We consider that each idle slot can provide computation service for one of the graph tasks.
S2, setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
S3, performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network which have the same structure.
Specifically, the deep Q network (DQN) is composed of two neural networks with the same structure but different roles: a train Q-network and a target Q-network, with respective parameters θ_train and θ_target. θ_train is used to evaluate the Q value (expected reward) of the optimal action, while θ_target is used to select the action corresponding to the maximum Q value (through an ε-greedy algorithm). The two parameter sets separate action selection from policy evaluation, reducing the risk of overfitting when estimating the Q value. The experience pool stores the experiences generated by the agent; experiences randomly sampled from the pool serve as input to the train Q-network for updating its parameters, which greatly reduces the memory and computing resources required for training and reduces the coupling between data.
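To make the twin-network structure concrete, the following minimal PyTorch sketch shows the two identically structured Q-networks and the experience pool; it is an illustration only, and the layer sizes, state dimension, and action count are assumed values, not taken from the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an observed state vector to one Q value per offloading action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Two networks with identical structure but separate parameter sets
# (theta_train and theta_target in the text).
state_dim, num_actions = 32, 8                  # illustrative sizes
train_q = QNetwork(state_dim, num_actions)
target_q = QNetwork(state_dim, num_actions)
target_q.load_state_dict(train_q.state_dict())  # start synchronized

# Experience replay pool: stores (s, a, r, s') tuples, sampled at random
# to decorrelate the training data.
replay_pool = deque(maxlen=10_000)

def sample_batch(k: int):
    return random.sample(list(replay_pool), k)
```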
Further, as a preferred embodiment of the method, the step of constructing the mobile edge computing system based on the task offloading scenario of one task initiator and a plurality of task performers in the industrial Internet of things specifically includes:
S11, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things, the scenario comprising one task initiator and a plurality of task performers;
S12, representing the dependency relationships between tasks by an undirected acyclic graph G = {V, E}, comprising a task set V = {v_i | i ∈ W} and an edge set E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G serves as a binary indicator variable indicating whether there is data exchange between v_i and v_j;
Specifically, to conveniently reflect the different topological relations of graph tasks, we represent the dependencies between tasks as an undirected acyclic graph G = {V, E}, containing a task set V = {v_i | i ∈ W} and an edge set E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G is a binary indicator variable for whether data is exchanged between v_i and v_j. In addition, the parameter L = {l_i | i ∈ W} represents the data sizes of the different tasks, and the computational workload is determined by the parameter K = {k_i | i ∈ W}, where k_i indicates the number of CPU cycles required to execute task v_i. The graph task framework is shown in FIG. 4.
In addition, we assume that each task performer is allocated a dedicated spectrum resource block during transmission to support concurrent transmission for task offloading and downloading. Since the uplink transmission time is much longer than the downlink transmission time, only the consumption of uplink transmission is discussed here; and given that the task initiator suffers from a long-term shortage of resources, it is assumed that computation-intensive tasks tend to be offloaded to the task performers in their entirety.
Graph task offloading in the mobile edge computing system incurs transmission time consumption, execution time consumption, and data exchange consumption.
Transmission time consumption: t_ij denotes the time for the task initiator to offload task i to edge device j in parallel; its value depends on the channel conditions, the transmission power, the bandwidth, and so on. We use P_TI to denote the fixed uplink transmission power of the task initiator and h_ij to denote the channel gain for offloading task i to device j. Moreover, we assume additive white Gaussian noise with zero mean and equal variance σ² at the receiving end of all tasks. According to Shannon's theorem, the uplink transmission rate for offloading task i from the task initiator to device j is:
r_ij = B·log2(1 + P_TI·h_ij / σ²)
where B denotes the fixed bandwidth of the orthogonal channel allocated to each edge device. Therefore, the uplink transmission time of task i is:
t_ij = l_i / r_ij
We use the binary indicator variable a_ij to indicate whether task i is offloaded to device j. Since tasks are offloaded in parallel, the total time consumed for task transmission is:
T_trans(a) = max_{i∈W} Σ_{j∈M} a_ij·t_ij
Execution time consumption: after the tasks are transferred to the respective idle slots, the edge devices begin to execute the subtasks in parallel. We use f = {f_i | i ∈ M} to denote the CPU frequencies used for executing tasks on the edge devices, and the idle slots carrying the computation tasks adopt the same f_i. Thus, the total time consumed for task execution is:
T_exec(a) = max_{i∈W} Σ_{j∈M} a_ij·k_i / f_j
Therefore, the total time consumed for task transmission and execution is:
T(a) = T_trans(a) + T_exec(a)
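The timing model above can be evaluated as in the following sketch; the Shannon-rate formula follows the text, while taking the completion time of parallel transmissions/executions as the slowest task's time is our reading of the reconstructed formulas, and all constants are illustrative assumptions.

```python
import numpy as np

B = 1.0e6        # channel bandwidth per edge device (Hz), illustrative
P_TI = 0.5       # fixed uplink transmit power of the task initiator (W)
sigma2 = 1.0e-9  # noise variance at the receivers

def uplink_rate(h_ij: float) -> float:
    """Shannon rate for offloading a task over channel gain h_ij."""
    return B * np.log2(1.0 + P_TI * h_ij / sigma2)

def total_time(a, h, L, K, f):
    """T(a) = T_trans(a) + T_exec(a) for a binary assignment matrix a[i][j].

    Transmissions and executions run in parallel, so each term is the
    slowest task's time (our reading; the original formula images are
    unavailable).
    """
    a = np.asarray(a)
    n_tasks, n_devs = a.shape
    t_trans = np.array([[L[i] / uplink_rate(h[i][j]) for j in range(n_devs)]
                        for i in range(n_tasks)])
    t_exec = np.array([[K[i] / f[j] for j in range(n_devs)]
                       for i in range(n_tasks)])
    T_trans = (a * t_trans).sum(axis=1).max()
    T_exec = (a * t_exec).sum(axis=1).max()
    return T_trans + T_exec
```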
Data exchange cost: we assume that when subtasks with a connection relationship are offloaded to different task performers, a data exchange cost c_jj' (j ∈ m, j' ∈ m, j ≠ j') is incurred, representing the cost generated by the traffic exchange between different task performers in the MEC system; if they are offloaded to the same task performer, no data exchange cost arises. We use the binary indicator variable b_jj' to indicate whether there is a data exchange between different task performers. Thus, the total data exchange consumption is:
E(b) = Σ_{j∈m} Σ_{j'∈m, j'≠j} b_jj'·c_jj'
Modeling the optimization objective: to obtain the offloading strategy for computation-intensive graph tasks in the considered MEC system, we formulate the following optimization problem, whose main objective is to minimize the weighted sum of task completion time and data exchange consumption (WETC):
γ = αT(a) + (1-α)E(b)
Thus, the following optimization model is constructed:
min_a γ
subject to six constraints, including the following:
- constraint (2): every task in the task graph must be assigned to an idle slot of a relevant task performer for execution;
- constraint (3): the tasks assigned to the same task performer cannot exceed its maximum computational resources;
- constraint (5): the distance between the task initiator and each task performer is bounded;
- constraint (6): the computational resources of each task performer fluctuate randomly within an interval.
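Combining the two cost terms, the objective γ can be computed as sketched below; `total_time` is the function from the previous sketch, while the exchange-cost matrix `c` and the weight `alpha` are assumed placeholders rather than values from the patent.

```python
import numpy as np

def data_exchange_cost(a, E, c):
    """E(b): cost c[j][j'] accrues for each connected task pair whose
    subtasks are placed on different performers j != j'."""
    cost = 0.0
    for (i, k) in E:                 # edge e_ik: tasks i and k exchange data
        j = int(np.argmax(a[i]))     # performer chosen for task i
        jp = int(np.argmax(a[k]))    # performer chosen for task k
        if j != jp:
            cost += c[j][jp]
    return cost

def wetc(a, h, L, K, f, E, c, alpha=0.5):
    """gamma = alpha*T(a) + (1-alpha)*E(b); the DRL reward is -gamma."""
    return (alpha * total_time(a, h, L, K, f)
            + (1.0 - alpha) * data_exchange_cost(a, E, c))
```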
Further, as a preferred embodiment of the method, an algorithm combining a deep Q network (DQN) with breadth-first traversal (BFS) is adopted to learn to offload computation-intensive graph tasks in a dynamic time-varying environment. The reinforcement learning setting of the pre-constructed deep Q network is as follows:
State space: as mentioned, in the proposed DRL framework the agent monitors the environment and records the system state at fixed time intervals. At time t, we represent the state as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_t,i | i ∈ m} denotes the channel gain of task performer i at time t, f_t = {f_t,i | i ∈ m} denotes the CPU frequency of task performer i at time t, u_t = {u_t,i | i ∈ m} denotes the number of idle slots of task performer i at time t, G_t denotes the topology of the task graph, and d_t = {d_t,i | i ∈ m} denotes the distance between task performer i and the task initiator at time t;
Action space: upon receiving the environment feedback state, the agent selects the most suitable offloading strategy for the current task v_i according to the observed situation and the output of the Q network. Thus, the offloading action of the current task v_i can be represented as a_i = {a_i,1, a_i,2, ..., a_i,m}, where a_i,j is a binary indicator: a_i,j = 1 denotes that device n_j is selected to offload task v_i, and a_i,j = 0 denotes that device n_j is not selected to offload task v_i;
Reward function: the optimization objective considered above is to minimize the WETC of graph task offloading in large-scale scenarios facing random variations, so we set the system reward to r = -(αT(a) + (1-α)E(b)), where T(a) denotes the time consumption, E(b) denotes the data exchange consumption, and α and (1-α) denote the respective weights of time consumption and data exchange consumption. Minimizing the WETC is achieved by maximizing r, and the task initiator finally finds the optimal offloading scheme.
Further, as a preferred embodiment of the method, the step of performing reinforcement learning based on a pre-constructed deep Q network according to an environment state in the mobile edge computing system and an optimization goal of graph task offloading, and learning to offload a computation-intensive graph task in a dynamic time-varying environment to obtain an optimal action specifically includes:
s31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading;
s32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and updating the parameters of the train Q-network through backward gradient propagation;
s33, repeating step S32 until the reward r is judged to have converged and approached its maximum, and taking the current action as the optimal action.
Further, as a preferred embodiment of the method, the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and then updating the parameters of the train Q-network through backward gradient propagation specifically includes:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network, and the system environment, and creating an empty queue Q;
the task initiator randomly selects a task v_i as the head node of the queue and enqueues it;
the task initiator sequentially dequeues the tasks v_i to be offloaded and, according to the edge set E in the graph task G, enqueues in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting {Q(s, a | θ)}_{a∈A}, selecting an action a_i according to an ε-greedy algorithm, and then offloading the task;
Specifically, the ε-greedy algorithm selects a random action from the action space A with probability ε, and with probability 1-ε selects the action with the maximum Q value, a_i = argmax_{a∈A} Q(s, a | θ).
The task initiator observes the next state s_{i+1} and the reward r_i from the environment, and stores (s_i, a_i, r_i, s_{i+1}) as an experience tuple in the experience replay pool;
when the experience replay pool is judged to be full, randomly sampling K experience tuples from it;
calculating the target values and, combining the sampled experience tuples, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
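The whole procedure can be condensed into the following sketch, which reuses `QNetwork`, `train_q`, `target_q`, `replay_pool`, `sample_batch`, `V`, and `neighbors` from the earlier sketches; `encode_state` and `env_step` are assumed stand-ins for the state featurizer and the environment transition, and the hyperparameters are illustrative, not the patent's values. Note how the target network both drives the ε-greedy selection and supplies the TD target, matching the role the text assigns to θ_target.

```python
import torch.optim as optim

GAMMA, EPS, K_BATCH, F_SYNC = 0.9, 0.1, 64, 100
optimizer = optim.Adam(train_q.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

queue = deque([random.choice(V)])        # BFS starts from a random task
offloaded, step = set(), 0
while queue:
    v = queue.popleft()
    for u in neighbors(v):               # enqueue associated, un-offloaded tasks
        if u not in offloaded and u not in queue:
            queue.append(u)

    s = encode_state(v)                  # assumed featurizer producing s_t
    with torch.no_grad():
        q_vals = target_q(s)
    a = random.randrange(num_actions) if random.random() < EPS \
        else int(q_vals.argmax())        # epsilon-greedy selection

    r, s_next = env_step(v, a)           # assumed environment transition
    replay_pool.append((s, a, r, s_next))
    offloaded.add(v)

    if len(replay_pool) == replay_pool.maxlen:  # per the text: sample once full
        batch = sample_batch(K_BATCH)
        ss = torch.stack([b[0] for b in batch])
        aa = torch.tensor([b[1] for b in batch])
        rr = torch.tensor([b[2] for b in batch], dtype=torch.float32)
        sn = torch.stack([b[3] for b in batch])
        with torch.no_grad():            # TD target from the target network
            y = rr + GAMMA * target_q(sn).max(dim=1).values
        q_sa = train_q(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = loss_fn(q_sa, y)          # update theta_train by gradient descent
        optimizer.zero_grad(); loss.backward(); optimizer.step()

    step += 1
    if step % F_SYNC == 0:               # after F steps, sync theta_target
        target_q.load_state_dict(train_q.state_dict())
```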
As shown in fig. 2, an industrial internet of things graph task unloading system based on deep reinforcement learning includes:
the system model building module is used for constructing a mobile edge computing system based on the task offloading scenario of one task initiator and a plurality of task performers in the industrial Internet of things;
the optimization objective setting module is used for setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
and the reinforcement learning module is used for performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. An industrial Internet of things graph task unloading method based on deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things;
s2, setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
s3, performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action;
the pre-constructed deep Q network comprises a train Q-network and a target Q-network which have the same structure.
2. The deep reinforcement learning-based industrial Internet of things graph task unloading method according to claim 1, wherein the step of constructing the mobile edge computing system based on the task offloading scenario in the industrial Internet of things specifically includes:
s11, constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things, the scenario comprising one task initiator and a plurality of task performers;
s12, representing the dependency relationships between tasks by an undirected acyclic graph G = {V, E}, comprising a task set V = {v_i | i ∈ W} and an edge set E = {e_ij | (i, j) ∈ W, i ≠ j}, where W denotes the total number of tasks and each edge e_ij in G serves as a binary indicator variable indicating whether there is data exchange between v_i and v_j;
Graph task offloading is performed in the mobile edge computing system and incurs transmission time consumption, execution time consumption, and data exchange consumption.
3. The deep reinforcement learning-based industrial Internet of things graph task unloading method according to claim 2, wherein the reinforcement learning setting of the pre-constructed deep Q network specifically includes:
State space: the state at time t is represented as s_t = {h_t, f_t, u_t, G_t, d_t}, where h_t = {h_t,i | i ∈ m} denotes the channel gain of task performer i at time t, f_t = {f_t,i | i ∈ m} denotes the CPU frequency of task performer i at time t, u_t = {u_t,i | i ∈ m} denotes the number of idle slots of task performer i at time t, G_t denotes the topology of the task graph, and d_t = {d_t,i | i ∈ m} denotes the distance between task performer i and the task initiator at time t;
Action space: the offloading action of the current task v_i is represented as a_i = {a_i,1, a_i,2, ..., a_i,m}, where a_i,j is set to a binary indicator;
4. The deep reinforcement learning-based industrial Internet of things graph task unloading method according to claim 3, wherein the step of performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action specifically comprises:
s31, calculating the time consumption and the data exchange consumption in the dynamic environment according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading;
s32, determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and updating the parameters of the train Q-network through backward gradient propagation;
s33, repeating step S32 until the reward r is judged to have converged and approached its maximum, and taking the current action as the optimal action.
5. The deep reinforcement learning-based industrial Internet of things graph task unloading method according to claim 4, wherein the step of determining the reward r corresponding to action a, inputting the newly observed environment state s' into the pre-constructed deep Q network, calculating the loss function using the reward r, and then updating the parameters of the train Q-network through backward gradient propagation specifically comprises:
initializing the graph task G = {V, E}, the experience replay pool, the parameter θ_train of the train Q-network, the parameter θ_target of the target Q-network, and the system environment, and creating an empty queue Q;
the task initiator randomly selects a task v_i as the head node of the queue and enqueues it;
the task initiator sequentially dequeues the tasks v_i to be offloaded and, according to the edge set E in the graph task G, enqueues in turn all tasks associated with v_i that have not yet been offloaded;
inputting the environment state currently observed by the task initiator into the target Q-network, outputting {Q(s, a | θ)}_{a∈A}, selecting an action a_i according to an ε-greedy algorithm, and then offloading the task;
the task initiator observes the next state s_{i+1} and the reward r_i from the environment, and stores (s_i, a_i, r_i, s_{i+1}) as an experience tuple in the experience replay pool;
when the experience replay pool is judged to be full, randomly sampling K experience tuples from it;
calculating the target values and, combining the sampled experience tuples, updating the parameter θ_train of the train Q-network based on the gradient descent method;
after every F time steps, updating the parameter θ_target of the target Q-network with the current parameter θ_train of the train Q-network.
6. An industrial Internet of things graph task unloading system based on deep reinforcement learning, characterized by comprising:
a system model building module, used for constructing a mobile edge computing system based on the task offloading scenario in the industrial Internet of things;
an optimization objective setting module, used for setting an optimization objective for graph task offloading based on the mobile edge computing system, the optimization objective being to minimize the weighted sum of task completion time and data exchange consumption;
and a reinforcement learning module, used for performing reinforcement learning based on a pre-constructed deep Q network according to the environment state in the mobile edge computing system and the optimization objective of graph task offloading, and learning to offload computation-intensive graph tasks in a dynamic time-varying environment to obtain the optimal action.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110923267.8A CN113590229B (en) | 2021-08-12 | 2021-08-12 | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110923267.8A CN113590229B (en) | 2021-08-12 | 2021-08-12 | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113590229A true CN113590229A (en) | 2021-11-02 |
CN113590229B CN113590229B (en) | 2023-11-10 |
Family
ID=78257430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110923267.8A Active CN113590229B (en) | 2021-08-12 | 2021-08-12 | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113590229B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050210473A1 (en) * | 2004-03-08 | 2005-09-22 | Frank Inchingolo | Controlling task execution |
WO2020119648A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳先进技术研究院 | Computing task unloading algorithm based on cost optimization |
US20190394096A1 (en) * | 2019-04-30 | 2019-12-26 | Intel Corporation | Technologies for batching requests in an edge infrastructure |
WO2021139537A1 (en) * | 2020-01-08 | 2021-07-15 | 上海交通大学 | Power control and resource allocation based task offloading method in industrial internet of things |
CN111726826A (en) * | 2020-05-25 | 2020-09-29 | 上海大学 | Online task unloading method in base station intensive edge computing network |
CN111835827A (en) * | 2020-06-11 | 2020-10-27 | 北京邮电大学 | Internet of things edge computing task unloading method and system |
CN112616152A (en) * | 2020-12-08 | 2021-04-06 | 重庆邮电大学 | Independent learning-based mobile edge computing task unloading method |
CN113225377A (en) * | 2021-03-30 | 2021-08-06 | 北京中电飞华通信有限公司 | Internet of things edge task unloading method and device |
CN113157344A (en) * | 2021-04-30 | 2021-07-23 | 杭州电子科技大学 | DRL-based energy consumption perception task unloading method in mobile edge computing environment |
Non-Patent Citations (2)
Title |
---|
Yu Bowen, et al.: "Research on Joint Decision-Making of Task Offloading and Base Station Association in Mobile Edge Computing", Journal of Computer Research and Development, vol. 55, no. 03, pages 537-550 *
Lu Haifeng; Gu Chunhua; Luo Fei; Ding Weichao; Yang Ting; Zheng Shuai: "Research on Task Offloading in Mobile Edge Computing Based on Deep Reinforcement Learning", Journal of Computer Research and Development, no. 07, pages 195-210 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115658251A (en) * | 2022-09-19 | 2023-01-31 | 重庆大学 | Federal multi-agent Actor-Critic learning intelligent logistics task unloading and resource distribution system and medium |
Also Published As
Publication number | Publication date |
---|---|
CN113590229B (en) | 2023-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111756812B (en) | Energy consumption perception edge cloud cooperation dynamic unloading scheduling method | |
CN111226238B (en) | Prediction method, terminal and server | |
CN108958916B (en) | Workflow unloading optimization method under mobile edge environment | |
CN112181666A (en) | Method, system, equipment and readable storage medium for equipment evaluation and federal learning importance aggregation based on edge intelligence | |
CN112188442A (en) | Vehicle networking data-driven task unloading system and method based on mobile edge calculation | |
CN111093203A (en) | Service function chain low-cost intelligent deployment method based on environment perception | |
CN113568727A (en) | Mobile edge calculation task allocation method based on deep reinforcement learning | |
CN113660325B (en) | Industrial Internet task unloading strategy based on edge calculation | |
CN116541106B (en) | Computing task unloading method, computing device and storage medium | |
CN114340016A (en) | Power grid edge calculation unloading distribution method and system | |
Gupta et al. | Toward intelligent resource management in dynamic Fog Computing‐based Internet of Things environment with Deep Reinforcement Learning: A survey | |
CN113590229B (en) | Industrial Internet of things graph task unloading method and system based on deep reinforcement learning | |
Murti et al. | Learning-based orchestration for dynamic functional split and resource allocation in vRANs | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
CN111158893B (en) | Task unloading method, system, equipment and medium applied to fog computing network | |
Li et al. | Efficient data offloading using Markovian decision on state reward action in edge computing | |
CN114693141B (en) | Transformer substation inspection method based on end edge cooperation | |
CN116455903A (en) | Method for optimizing dependency task unloading in Internet of vehicles by deep reinforcement learning | |
CN116954866A (en) | Edge cloud task scheduling method and system based on deep reinforcement learning | |
CN114546660B (en) | Multi-unmanned aerial vehicle cooperation edge computing method | |
CN115220818A (en) | Real-time dependency task unloading method based on deep reinforcement learning | |
CN115665160A (en) | Multi-access edge computing system and method for electric power safety tool | |
CN116431326A (en) | Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning | |
CN115686821A (en) | Unloading method and device for edge computing task | |
CN109298933B (en) | Wireless communication network equipment and system based on edge computing network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |