CN108803313B - Path planning method based on ocean current prediction model - Google Patents
Path planning method based on ocean current prediction model
- Publication number
- CN108803313B (application CN201810589190.3A)
- Authority
- CN
- China
- Prior art keywords
- current
- ocean current
- point
- time
- action
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/048—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Feedback Control In General (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention belongs to the field of underwater robot control, and discloses a path planning method based on an ocean current prediction model, comprising the following steps: rasterizing a navigation area determined from the path key points; predicting the ocean currents of the navigation area with a regional ocean model and fitting the results to obtain real-time ocean current information; marking no-navigation areas using electronic chart information; storing, in plane grids at different depths, the no-navigation information and the start and end point positions, together with whether each grid point lies in a no-navigation area and whether it is the end point; calculating the direction from the current position to the end point and determining the selectable actions among all next travel directions; and seeking the optimal strategy of the Markov decision process with Q-learning and outputting the path. The invention fully considers the influence of real-time ocean currents on path planning, performs fitting with a BP neural network and a bagging algorithm, and seeks the optimal solution with reinforcement learning, thereby accelerating convergence and reducing computational complexity.
Description
Technical Field
The invention belongs to the field of underwater robot control, and particularly relates to a path planning method based on an ocean current prediction model.
Background
An underwater robot, also called an unmanned remotely operated vehicle, is a robot for extreme operations underwater. Because the underwater environment is harsh and dangerous and human diving depth is limited, underwater robots have become an important tool for ocean development.
Underwater robots can replace human divers for long-duration operations in high-risk environments, polluted environments and zero-visibility waters. They are generally equipped with a sonar system, cameras, lights, manipulators and other devices, can provide real-time video and sonar images, and can grasp and hoist objects with the manipulator. They are widely used in oil development, marine law enforcement and evidence collection, scientific research, the military and other fields.
Because the operating environment of an underwater robot is complex, underwater acoustic signals are noisy, and underwater acoustic sensors generally suffer from poor precision and frequent jumps, filtering technology is very important in the underwater robot motion control system. The position sensor commonly adopted in underwater robot motion control is a short-baseline or long-baseline underwater acoustic positioning system, and the velocity sensor is a Doppler velocimeter. Factors affecting the accuracy of the underwater acoustic positioning system mainly include sound speed error, measurement error in the response time of the transponder, and calibration error of the transponder position, i.e. the distance. Factors affecting the accuracy of the Doppler velocimeter mainly include the sound velocity c, the physical and chemical properties of the seawater medium, the pitching of the carrier, and so on.
Therefore, path planning is particularly important for underwater robots. Path planning is one of the basic links of intelligent underwater robot navigation. When an underwater robot navigates in a large-scale marine environment, the influence of the marine environment on navigation must be considered in addition to obstacle avoidance and energy consumption. Time-varying ocean currents pose great challenges to the safety and mission success of underwater robots, so path planning should use predicted ocean current elements to exploit the energy of the flow field as much as possible and produce a feasible, safe path with low energy consumption.
From the perspective of algorithm strategy, current path planning algorithms can be divided into path planning based on intelligent computation, path planning based on behavior and learning psychology, and random-sampling path planning. These algorithms mainly aim at improving solution-space search efficiency and accelerating convergence, or are proposed for unknown environments or dynamic spaces; at present, more and more scholars are researching path planning under the influence of ocean currents. The invention forecasts the ocean current field of a given area at future times with a regional ocean model and combines it with AUV positions and control instructions to predict ocean currents in real time, so that the ocean current field used for path planning is more accurate and up to date.
The patent with application number 201710538828.6 discloses a path planning device and method for an unmanned underwater vehicle (UUV) based on a detection threat domain. Its path planning algorithm solves the UUV path planning problem in a terrain-obstacle environment and can satisfy the UUV's kinematic, collision-avoidance and covert-detection constraints. Given an initial position, an end position, a maximum curvature constraint, a path discrete-point resolution, a covertness safety index and so on, a path from the motion start point to the motion end point is planned; the path is smooth, continuous and differentiable, and satisfies the UUV's turning curvature constraint and covertness safety index, so that the UUV can safely and covertly reach the end point in the shortest time. The method applies detection threat theory and the geometric theory of turning curvature constraints to UUV path planning for the first time; it realizes path planning rapidly, is simple, reliable and easy to implement, has a small computational load and good real-time performance, meets path planning requirements, improves the practicality of path planning, and has positive significance for the future development of underwater path planning. However, when applied to the path planning of an underwater vehicle, the method suffers from an overly complex calculation process and poor real-time performance.
Disclosure of Invention
The invention aims to provide a path planning method based on an ocean current prediction model that has low energy consumption and high safety.
The purpose of the invention is realized as follows:
a path planning method based on an ocean current prediction model comprises the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
step (2): carry out ocean current prediction with time step ΔT on the navigation area using the regional ocean model; according to the real-time AUV pose changes, the control instructions and the ocean current information at the corresponding moments, fit with a bagging algorithm and BP neural networks and calculate the real-time ocean current information:
The control instruction and the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information correspond to one another. A bagging algorithm is used to generate T training sets, and T BP neural networks are trained as T base learners, one per sampling set. Each BP neural network has three layers: the inputs are the speed and angle of the AUV at the previous moment, the voltages of the rudder, the wings and the propeller, and the speed and angle of the AUV at the current moment; the outputs are the meridional and zonal velocities of the ocean current. The input layer thus has 7 input neurons and the output layer 2 output neurons; the number of neurons in the hidden layer is one of 5, 8, 10, 12 and 15, determined by a 10-fold cross-validation method. The final real-time ocean current elements are obtained by combining the base learners in proportion to their error rates, and the ocean current elements obtained at the current moment are taken as the ocean current elements at the next moment, thereby obtaining the real-time ocean current information.
Step (3): using the electronic chart information, mark the areas endangering the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): store the no-navigation information and the start and end point positions for different depths in plane grids at the different depths, and store for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether it is the end point;
Step (5): calculate the direction from the current position to the end point and determine the selectable actions among all next travel directions:
According to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot. The current action has 16 possibilities a1, a2, …, a16, namely the moves from the current position of the underwater robot to the grid points within the two surrounding layers, and it is judged whether the position reached after executing the current action lies in a no-navigation area;
Let a_st be the action from the current point position to the target point position; the action selection formula is as follows:
In the above formula, i is an integer and i ∈ [1,16]; select A_i > 0. If the obstacle is at one of the 8 grid points nearest to the current point, the action corresponding to the grid point where the obstacle is located and the adjacent actions are abandoned; if the obstacle is at a grid point one layer further out from the current point, only the action corresponding to the grid point with the obstacle is abandoned.
Step (6): adopting a focused learning mode, seek the optimal strategy of the Markov decision process with Q-learning and output the path.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s, a) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focus exploration strategy to select action a, generate reward r_{t+1}, and transition to state S_{t+1}:
Focus exploration strategy μ(x) (the formula appears only as an image in the source):
In the formula, w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence; v_c is the ocean current velocity at the grid point of the current position at time t, and a_i is an optional action with probability p_i.
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1}.
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
In the above formula, β denotes the learning rate with value range [0,1], and γ denotes the discount factor.
Step (6.8): judge whether the underwater robot has reached the target position state; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the number of iterations has been reached or all state-action value functions have converged; if neither, go to step (6.2); if the number of iterations has been reached or all action value functions have converged, output the optimal strategy to obtain the optimal planned path.
The invention has the beneficial effects that:
the method fully considers the influence of real-time ocean current on path planning, predicts future ocean elements through regional ocean modes, and performs fitting by using a BP neural network and a bagging algorithm to obtain real-time ocean current information. Meanwhile, planning is carried out according to a Markov decision process, and an optimal solution is sought by using reinforcement learning, so that the convergence speed is increased, the complexity of operation is reduced, and a planned path is obtained better and faster.
Drawings
FIG. 1 is a flow chart of a path planning method based on an ocean current prediction model;
FIG. 2 is a diagram of a rectangular grid structure;
FIG. 3 is a schematic view of action selection;
FIG. 4 is a flow chart of the Markov decision process planning.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example 1:
As shown in FIG. 1, a path planning method based on an ocean current prediction model includes the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
determining a rectangular navigation area according to the starting point and the end point of the underwater robot path; an orthogonal curvilinear grid is adopted in the horizontal direction with grid spacing in the range of 2 km to 30 km, and the vertical direction is divided into 20 to 30 layers of equal depth.
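For illustration only (not part of the original disclosure), a minimal Python sketch of such a rasterization might look as follows; the function name, the fixed spacing, and the use of a regular latitude-longitude grid in place of a true orthogonal curvilinear grid are all assumptions:

```python
import numpy as np

def rasterize_area(lat0, lon0, lat1, lon1, spacing_km=10.0, n_layers=25):
    """Discretize a rectangular navigation area (illustrative sketch).

    Horizontal: regular grid with spacing_km spacing (the patent allows
    2-30 km; a real orthogonal curvilinear grid is more involved).
    Vertical: n_layers equal-depth layers (the patent uses 20-30).
    """
    km_per_deg = 111.0                                   # rough conversion
    dlat = spacing_km / km_per_deg
    lats = np.arange(min(lat0, lat1), max(lat0, lat1), dlat)
    dlon = spacing_km / (km_per_deg * np.cos(np.deg2rad(lats.mean())))
    lons = np.arange(min(lon0, lon1), max(lon0, lon1), dlon)
    layers = np.arange(n_layers)                         # depth layer indices
    return lats, lons, layers
```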
Step (2): carry out ocean current prediction with time step ΔT on the navigation area using the regional ocean model; according to the real-time AUV pose changes, the control instructions and the ocean current information at the corresponding moments, fit with a bagging algorithm and BP neural networks and calculate the real-time ocean current information:
A sigma coordinate is adopted in the vertical direction; the range of the vertical coordinate is controlled to [−1, 0] through a vertical transformation function and a stretching function, and the number of vertical layers is set;
vertical transformation function:
z(x, y, s, t) = η(x, y, t) + [η(x, y, t) + h(x, y)] × Z_0(x, y, s);
In the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water column, and h_c is a conversion parameter;
Stretching function (shown as an image in the source):
In the above formula, θ_s is a surface control parameter, 0 < θ_s ≤ 10.
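As an illustrative aid, the sketch below implements the vertical transformation in Python. Since the stretching function appears only as an image in the source, a standard ROMS-style (Song-Haidvogel) stretching C(s) and a common form of Z_0 using the conversion parameter h_c are substituted as assumptions:

```python
import numpy as np

def stretching(s, theta_s=5.0, theta_b=0.0):
    """Assumed ROMS-style stretching C(s) for s in [-1, 0];
    theta_s is the surface control parameter (0 < theta_s <= 10)."""
    surf = np.sinh(theta_s * s) / np.sinh(theta_s)
    bot = (np.tanh(theta_s * (s + 0.5)) - np.tanh(0.5 * theta_s)) / (
        2.0 * np.tanh(0.5 * theta_s))
    return (1.0 - theta_b) * surf + theta_b * bot

def sigma_to_z(eta, h, s, hc=10.0, theta_s=5.0):
    """Patent transform z = eta + (eta + h) * Z0; the concrete form
    Z0 = (hc*s + h*C(s)) / (hc + h) used below is an assumption."""
    z0 = (hc * s + h * stretching(s, theta_s)) / (hc + h)
    return eta + (eta + h) * z0
```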
The initial conditions of the regional ocean model are obtained by four-dimensional assimilation, and the boundary conditions are obtained by interpolating the forecast field of a global model; a central difference scheme is adopted in space and a leapfrog scheme in time, the time step is set to 5 min, and the forecast ocean current field of the navigation area is stored in a file.
The control instruction and the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information correspond to one another. A bagging algorithm is used to generate T training sets, and T BP neural networks are trained as T base learners, one per sampling set. Each BP neural network has three layers: the inputs are the speed and angle of the AUV at the previous moment, the voltages of the rudder, the wings and the propeller, and the speed and angle of the AUV at the current moment; the outputs are the meridional and zonal velocities of the ocean current. The input layer thus has 7 input neurons and the output layer 2 output neurons; the number of neurons in the hidden layer is one of 5, 8, 10, 12 and 15, determined by a 10-fold cross-validation method. The final real-time ocean current elements are obtained by combining the base learners in proportion to their error rates, and the ocean current elements obtained at the current moment are taken as the ocean current elements at the next moment, thereby obtaining the real-time ocean current information.
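The following Python sketch (using scikit-learn) shows one plausible reading of this bagging-plus-BP fitting; the MSE-based error weighting, the iteration limits and all names are assumptions beyond the patent text, and X, y are assumed to be NumPy arrays of logged AUV samples (7 inputs, 2 outputs):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def fit_current_model(X, y, T=10, hidden_sizes=(5, 8, 10, 12, 15), seed=0):
    rng = np.random.default_rng(seed)
    # 10-fold cross-validation picks the hidden-layer width, as in the patent
    best = max(hidden_sizes, key=lambda n: cross_val_score(
        MLPRegressor(hidden_layer_sizes=(n,), max_iter=2000),
        X, y, cv=10).mean())
    nets, weights = [], []
    for _ in range(T):                       # bagging: T bootstrap training sets
        idx = rng.integers(0, len(X), len(X))
        net = MLPRegressor(hidden_layer_sizes=(best,), max_iter=2000)
        net.fit(X[idx], y[idx])
        mse = np.mean((net.predict(X) - y) ** 2)   # error-rate proxy (assumed)
        nets.append(net)
        weights.append(1.0 / (mse + 1e-9))   # "in proportion to the error rate"
    w = np.asarray(weights) / np.sum(weights)
    # ensemble prediction = weighted combination of the T base learners
    return lambda x: sum(wi * n.predict(x) for wi, n in zip(w, nets))
```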
Step (3): using the electronic chart information, mark the areas endangering the safe navigation of the underwater robot as no-navigation areas in the grid;
Step (4): store the no-navigation information and the start and end point positions for different depths in plane grids at the different depths, and store for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether it is the end point;
Step (5): calculate the direction from the current position to the end point and determine the selectable actions among all next travel directions:
As shown in FIG. 2, assume the black point in the middle of the rectangular grid is the current position of the underwater robot. The current action has 16 possibilities a1, a2, …, a16, namely the moves from the current position of the underwater robot to the grid points within the two surrounding layers, and it is judged whether the position reached after executing the current action lies in a no-navigation area;
As shown in FIG. 3, let a_st be the action from the current point position to the target point position; the action selection formula is as follows:
In the above formula, i is an integer and i ∈ [1,16]; select A_i > 0. If the obstacle is at one of the 8 grid points nearest to the current point, the action corresponding to the grid point where the obstacle is located and the adjacent actions are abandoned; if the obstacle is at a grid point one layer further out from the current point, only the action corresponding to the grid point with the obstacle is abandoned.
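A Python sketch of this 16-action set and the pruning rule follows; since the action selection formula appears only as an image, the concrete encoding (outer ring of a 5x5 neighbourhood) and the adjacency test are assumptions:

```python
import itertools

# Assumed encoding: a1..a16 are the moves to the outer ring of a 5x5
# neighbourhood, i.e. two grid layers out from the current position.
ACTIONS = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
           if max(abs(dx), abs(dy)) == 2]          # exactly 16 moves

def selectable_actions(pos, blocked):
    """Return the actions that survive the patent's pruning rule.
    blocked is a set of (x, y) cells holding obstacles / no-navigation cells."""
    ok = []
    for dx, dy in ACTIONS:
        if (pos[0] + dx, pos[1] + dy) in blocked:  # obstacle on the outer ring:
            continue                               # drop only that action
        dropped = False
        for bx, by in itertools.product((-1, 0, 1), repeat=2):
            if (bx, by) == (0, 0) or (pos[0] + bx, pos[1] + by) not in blocked:
                continue
            # obstacle in the inner 8-cell ring: drop the action directly
            # behind it, (2*bx, 2*by), and the adjacent outer-ring actions
            if max(abs(dx - 2 * bx), abs(dy - 2 * by)) <= 1:
                dropped = True
        if not dropped:
            ok.append((dx, dy))
    return ok
```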
Step (6): adopting a focused learning mode, seek the optimal strategy of the Markov decision process with Q-learning and output the path:
The Markov decision process is described by the quintuple (S, A, P, R, γ), where:
S is a finite state set, A is a finite action set, P is the state transition probability, R is the reward function, and γ is the discount factor used to calculate the cumulative return.
The goal of reinforcement learning is to seek the optimal strategy for a given Markov decision process. A strategy is a mapping from states to actions, usually denoted by π. Exploring an unknown environment through the actions of a strategy, the underwater robot plans the optimal path by optimizing the strategy according to the returns: when an action produces a positive return, the tendency to take it is strengthened, so it is selected with higher probability the next time the same state occurs; otherwise it is weakened. The optimal strategy is sought by continuous interaction with the environment. Because of its inherent adaptability, reactivity and online learning capability, Q-learning is mostly used for path planning in unknown environments and is the most widely applied. The specific steps are shown in FIG. 4.
Step (6.1): initialize the value function Q(s, a) = 0 and the original strategy π(s, a) = argmax_a Q(s, a);
Step (6.2): initialize state S_0 as the initial position and determine the initial time t_0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
Step (6.4): use the focus exploration strategy to select action a, generate reward r_{t+1}, and transition to state S_{t+1}:
The focus exploration strategy is as follows (the formula appears only as an image in the source):
In the formula, w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence; v_c is the ocean current velocity at the grid point of the current position at time t, and a_i is an optional action with probability p_i;
The immediate reward function (shown as an image in the source; by the definitions below it combines the three terms as r_{t+1} = w_d·r_d + w_r·r_r + w_c·r_c):
In the formula, w_d is the weight coefficient of the distance reward-punishment function, w_r is the weight coefficient of the danger reward-punishment function, and w_c is the weight coefficient of the ocean current reward-punishment function;
Distance reward-punishment function r_d = d(t) − d(t+1), where d(t) denotes the distance from the robot position to the target point at time t and d(t+1) the distance from the robot position to the target point at time t+1.
Danger reward-punishment function (shown as an image in the source):
In the above formula, d_o is the grid distance between the current position of the underwater robot and the obstacle;
Ocean current reward-punishment function r_c = v_c·cos|α − θ|, where α is the heading angle and θ is the ocean current direction;
The reward r_{t+1} is generated according to the action a selected by the focus exploration strategy, and the state transitions to S_{t+1};
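For illustration, here is a Python sketch of this immediate reward under the definitions above; the linear combination and the exact form of the danger term are assumptions, since the corresponding formulas appear only as images in the source:

```python
import math

def immediate_reward(d_t, d_t1, d_obs, v_c, alpha, theta,
                     w_d=1.0, w_r=1.0, w_c=1.0):
    """Assumed combination r = w_d*r_d + w_r*r_r + w_c*r_c."""
    r_d = d_t - d_t1                           # progress toward the target
    # danger term: penalise small obstacle distances (assumed form)
    r_r = -1.0 / d_obs if d_obs > 0 else float("-inf")
    r_c = v_c * math.cos(abs(alpha - theta))   # reward currents along heading
    return w_d * r_d + w_r * r_r + w_c * r_c
```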
Step (6.5): according to the original strategy π, select and execute action a_{t+1} in state S_{t+1}.
Step (6.6): update the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
In the above formula, β denotes the learning rate with value range [0,1], and γ denotes the discount factor.
Step (6.8): judge whether the underwater robot has reached the target position state; if not, go to step (6.3); if so, go to step (6.9).
Step (6.9): judge whether the number of iterations has been reached or all state-action value functions have converged; if neither, go to step (6.2); if the number of iterations has been reached or all action value functions have converged, output the optimal strategy to obtain the optimal planned path.
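Steps (6.1) to (6.9) can be condensed into the following Python sketch; the environment wrapper env and all of its method names are hypothetical, and the update bootstraps on the action a_{t+1} chosen by the current strategy, exactly as the patent's formula is written:

```python
from collections import defaultdict

def plan_path(env, episodes=500, beta=0.5, gamma=0.9):
    """Q-learning loop for steps (6.1)-(6.9).  env is assumed to expose
    reset(), actions(s), step(s, a) -> (s_next, r, done) and the focus
    exploration strategy mu(s, Q); none of these names come from the patent."""
    Q = defaultdict(float)                    # step (6.1): Q(s, a) = 0
    for _ in range(episodes):                 # step (6.9): iteration budget
        s = env.reset()                       # step (6.2): initial state S_0
        done = False
        while not done:                       # step (6.8): until target reached
            a = env.mu(s, Q)                  # step (6.4): focus exploration
            s1, r, done = env.step(s, a)      # reward r_{t+1}, state S_{t+1}
            if done:
                target = r                    # terminal state: no bootstrap
            else:
                # step (6.5): choose a_{t+1} with the greedy original strategy
                a1 = max(env.actions(s1), key=lambda act: Q[(s1, act)])
                target = r + gamma * Q[(s1, a1)]
            # step (6.6): Q <- Q + beta [r_{t+1} + gamma Q(s', a') - Q]
            Q[(s, a)] += beta * (target - Q[(s, a)])
            s = s1
    return Q
```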
Compared with the prior art, the method fully considers the influence of real-time ocean currents on path planning: future ocean elements are predicted by a regional ocean model, and a BP neural network and a bagging algorithm are used for fitting to obtain real-time ocean current information. Meanwhile, planning is performed as a Markov decision process and the optimal solution is sought with reinforcement learning, which accelerates convergence, reduces computational complexity, and yields the planned path better and faster.
The above description is not intended to limit the present invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (1)
1. A path planning method based on an ocean current prediction model, characterized by comprising the following steps:
step (1): determining a navigation area according to the path key points, and rasterizing the navigation area;
determining a rectangular navigation area according to the starting point and the end point of the underwater robot path; adopting an orthogonal curvilinear grid in the horizontal direction with grid spacing in the range of 2 km to 30 km, and dividing the vertical direction into 20 to 30 layers of equal depth;
adopting a sigma coordinate in the vertical direction, controlling the range of the vertical coordinate to [−1, 0] through a vertical transformation function and a stretching function, and setting the number of vertical layers;
vertical transformation function:
z(x, y, s, t) = η(x, y, t) + [η(x, y, t) + h(x, y)] × Z_0(x, y, s);
in the above formula, z is the height in the Cartesian coordinate system, x is the longitudinal coordinate value, y is the latitudinal coordinate value, s is the vertical distance from the water surface, t is time, η(x, y, t) is the time-varying free sea surface, h(x, y) is the thickness of the undisturbed water column, and h_c is a conversion parameter;
stretching function (shown as an image in the source):
in the above formula, θ_s is a surface control parameter, 0 < θ_s ≤ 10;
step (2): carrying out ocean current prediction with time step ΔT on the navigation area using the regional ocean model, fitting with a bagging algorithm and BP neural networks according to the real-time AUV pose changes, the control instructions and the ocean current information at the corresponding moments, and calculating the real-time ocean current information;
the step (2) is specifically as follows:
the control instruction and the AUV pose at the previous moment, the AUV pose at the current moment and the ocean current information correspond to one another; a bagging algorithm is used to generate T training sets, and T BP neural networks are trained as T base learners, one per sampling set; each BP neural network has three layers: the inputs are the speed and angle of the AUV at the previous moment, the voltages of the rudder, the wings and the propeller, and the speed and angle of the AUV at the current moment, and the outputs are the meridional and zonal velocities of the ocean current; the input layer has 7 input neurons and the output layer 2 output neurons; the number of neurons in the hidden layer is one of 5, 8, 10, 12 and 15, determined by a 10-fold cross-validation method; the final real-time ocean current elements are obtained by combining the base learners in proportion to their error rates, and the ocean current elements obtained at the current moment are taken as the ocean current elements at the next moment, thereby obtaining the real-time ocean current information;
step (3): marking the areas endangering the safe navigation of the underwater robot as no-navigation areas in the grid by using the electronic chart information;
step (4): storing the no-navigation information and the start and end point position information for different depths in plane grids at the different depths, and storing for each grid point its longitude and latitude, whether it lies in a no-navigation area, and whether it is the end point;
step (5): calculating the direction from the current position to the end point and determining the selectable actions among all next travel directions;
the step (5) is specifically as follows:
according to the rectangular grid structure diagram, assume the black point in the middle of the rectangular grid is the current position of the underwater robot; the current action has 16 possibilities a1, a2, …, a16, namely the moves from the current position of the underwater robot to the grid points within the two surrounding layers, and it is judged whether the position reached after executing the current action lies in a no-navigation area;
let a_st be the action from the current point position to the target point position; the action selection formula is as follows:
in the above formula, i is an integer and i ∈ [1,16]; select A_i > 0; if the obstacle is at one of the 8 grid points closest to the current point, the action corresponding to the grid point where the obstacle is located and the adjacent actions are abandoned; if the obstacle is at a grid point one layer further out from the current point, only the action corresponding to the grid point with the obstacle is abandoned;
step (6): adopting a focused learning mode, seeking the optimal strategy of the Markov decision process with Q-learning and outputting the path;
the step (6) comprises the following steps:
step (6.1): initializing value function Q (s, a) ═ 0, initializing original strategy pi (s, a) ═ argmaxaQ(s,a);
Step (6.2): initialization state S0For the initial position, an initial time t is determined0;
Step (6.3): calculating the real-time ocean current speed of the current position through a neural network;
step (6.4): using the focus exploration strategy to select action a, generate reward r_{t+1}, and transition to state S_{t+1};
step (6.5): according to the original strategy π, selecting and executing action a_{t+1} in state S_{t+1};
step (6.6): updating the state-action value function:
Q(s_t, a_t) ← Q(s_t, a_t) + β[r_{t+1} + γ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)];
in the above formula, β denotes the learning rate with value range [0,1], and γ denotes the discount factor;
immediate reward function (shown as an image in the source):
wherein w_d, w_r and w_c are respectively the weight coefficients of the distance reward-punishment function, the danger reward-punishment function and the ocean current reward-punishment function;
distance reward-punishment function r_d = d(t) − d(t+1), d(t) representing the distance from the robot position to the target point at time t and d(t+1) the distance from the robot position to the target point at time t+1;
danger reward-punishment function (shown as an image in the source):
d_0 is the grid distance between the current position of the underwater robot and the obstacle;
ocean current reward-punishment function r_c = v_c·cos|α_0 − θ|, where α_0 is the heading angle and θ is the ocean current direction;
step (6.8): judging whether the underwater robot has reached the target position state; if not, going to step (6.3); if so, going to step (6.9);
step (6.9): judging whether the number of iterations has been reached or all state-action value functions have converged; if neither, going to step (6.2); if the number of iterations has been reached or all action value functions have converged, outputting the optimal strategy to obtain the optimal planned path;
the focus exploration strategy μ(x) is as follows (the formula appears only as an image in the source): in the formula, w_1 is the weight coefficient of the distance influence and w_2 is the weight coefficient of the ocean current influence, v_c is the ocean current velocity at the grid point of the current position at time t, and a_i is an optional action with probability p_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810589190.3A CN108803313B (en) | 2018-06-08 | 2018-06-08 | Path planning method based on ocean current prediction model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810589190.3A CN108803313B (en) | 2018-06-08 | 2018-06-08 | Path planning method based on ocean current prediction model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108803313A CN108803313A (en) | 2018-11-13 |
CN108803313B true CN108803313B (en) | 2022-07-12 |
Family
ID=64088958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810589190.3A Active CN108803313B (en) | 2018-06-08 | 2018-06-08 | Path planning method based on ocean current prediction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108803313B (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109445437A (en) * | 2018-11-30 | 2019-03-08 | 电子科技大学 | A kind of paths planning method of unmanned electric vehicle |
CN109657863B (en) * | 2018-12-20 | 2021-06-25 | 智慧航海(青岛)科技有限公司 | Firefly algorithm-based unmanned ship global path dynamic optimization method |
CN109726866A (en) * | 2018-12-27 | 2019-05-07 | 浙江农林大学 | Unmanned boat paths planning method based on Q learning neural network |
CN109948054A (en) * | 2019-03-11 | 2019-06-28 | 北京航空航天大学 | A kind of adaptive learning path planning system based on intensified learning |
CN110555584B (en) * | 2019-07-17 | 2021-04-06 | 浙江工业大学 | Automatic parking lot scheduling method based on deep reinforcement learning |
CN110543171B (en) * | 2019-08-27 | 2020-07-31 | 华中科技大学 | Storage multi-AGV path planning method based on improved BP neural network |
CN110763234B (en) * | 2019-10-15 | 2022-10-28 | 哈尔滨工程大学 | Submarine topography matching navigation path planning method for underwater robot |
CN111645079B (en) * | 2020-08-04 | 2020-11-10 | 天津滨电电力工程有限公司 | Device and method for planning and controlling mechanical arm path of live working robot |
CN111958601A (en) * | 2020-08-19 | 2020-11-20 | 西南交通大学 | Automatic path finding and material identification method based on deep learning |
CN112215395B (en) * | 2020-09-02 | 2023-04-18 | 中国船舶重工集团公司第七研究院 | Underwater equipment adaptability information guarantee system based on ocean big data |
CN112698646B (en) * | 2020-12-05 | 2022-09-13 | 西北工业大学 | Aircraft path planning method based on reinforcement learning |
CN112581026B (en) * | 2020-12-29 | 2022-08-12 | 杭州趣链科技有限公司 | Joint path planning method for logistics robot on alliance chain |
CN113064440B (en) * | 2021-03-15 | 2022-08-02 | 哈尔滨工程大学 | Self-adaptive observation method based on ocean mode |
CN113052370B (en) * | 2021-03-15 | 2024-06-14 | 哈尔滨工程大学 | Ocean environment element statistical prediction method based on space-time experience orthogonal function |
CN113325856B (en) * | 2021-05-31 | 2022-07-08 | 中国船舶工业集团公司第七0八研究所 | UUV optimal operation path planning method based on countercurrent approximation strategy |
CN114200929B (en) * | 2021-11-24 | 2023-10-20 | 中国科学院沈阳自动化研究所 | Rapid comb-type path planning method for maximum detection coverage rate of multi-underwater robot |
CN116700315B (en) * | 2023-07-03 | 2024-02-06 | 苏州优世达智能科技有限公司 | Unmanned ship track tracking control method and system |
CN116929372B (en) * | 2023-07-31 | 2024-08-09 | 哈尔滨工程大学 | Ocean robot path planning method and system based on multi-energy consumption capturing modeling |
CN118674125A (en) * | 2024-08-22 | 2024-09-20 | 广东海洋大学 | Ocean current energy prediction method and system for target ocean region |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006122030A2 (en) * | 2005-05-07 | 2006-11-16 | Thaler Stephen L | Device for the autonomous bootstrapping of useful information |
CN102175245A (en) * | 2011-01-28 | 2011-09-07 | 哈尔滨工程大学 | Underwater vehicle path planning method based on ocean current historical statistic information |
CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107748566A (en) * | 2017-09-20 | 2018-03-02 | 清华大学 | A kind of underwater autonomous robot constant depth control method based on intensified learning |
-
2018
- 2018-06-08 CN CN201810589190.3A patent/CN108803313B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006122030A2 (en) * | 2005-05-07 | 2006-11-16 | Thaler Stephen L | Device for the autonomous bootstrapping of useful information |
CN102175245A (en) * | 2011-01-28 | 2011-09-07 | 哈尔滨工程大学 | Underwater vehicle path planning method based on ocean current historical statistic information |
CN102799179A (en) * | 2012-07-06 | 2012-11-28 | 山东大学 | Mobile robot path planning algorithm based on single-chain sequential backtracking Q-learning |
CN102819264A (en) * | 2012-07-30 | 2012-12-12 | 山东大学 | Path planning Q-learning initial method of mobile robot |
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107748566A (en) * | 2017-09-20 | 2018-03-02 | 清华大学 | A kind of underwater autonomous robot constant depth control method based on intensified learning |
Non-Patent Citations (2)
Title |
---|
Research on ocean current velocity prediction based on an ARIMA-BP neural network model (基于ARIMA_BP神经网络模型海流流速预测研究); Dong Shichao; China Science and Technology Information; 28 Feb 2014 (No. 02); pp. 86-88, sections 1-4 *
Research on key technologies of the underwater robot path planning problem (水下机器人路径规划问题的关键技术研究); Cao Jiangli; China Doctoral Dissertations Full-text Database, Information Science and Technology; 15 Feb 2011 (No. 02); pp. I140-42, chapter 5 *
Also Published As
Publication number | Publication date |
---|---|
CN108803313A (en) | 2018-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108803313B (en) | Path planning method based on ocean current prediction model | |
CN109540151B (en) | AUV three-dimensional path planning method based on reinforcement learning | |
CN108803321B (en) | Autonomous underwater vehicle track tracking control method based on deep reinforcement learning | |
CN109976349B (en) | Design method of path tracking guidance and control structure of constraint-containing unmanned ship | |
Wang et al. | A COLREGs-based obstacle avoidance approach for unmanned surface vehicles | |
CN109765929B (en) | UUV real-time obstacle avoidance planning method based on improved RNN | |
CN109753068A (en) | A kind of more USV multi-agent synergy collision-avoidance planning methods considering signal intelligence | |
CN109241552A (en) | A kind of underwater robot motion planning method based on multiple constraint target | |
CN111026135B (en) | High-performance sailing feedforward control system and control method for unmanned ship | |
Lan et al. | Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning | |
CN114610046A (en) | Unmanned ship dynamic safety trajectory planning method considering dynamic water depth | |
CN109916419A (en) | A kind of hybrid genetic algorithm unmanned boat real-time route planing method of object-oriented | |
Yan et al. | Real-world learning control for autonomous exploration of a biomimetic robotic shark | |
Liang et al. | Economic MPC-based planning for marine vehicles: Tuning safety and energy efficiency | |
Hedjar et al. | An automatic collision avoidance algorithm for multiple marine surface vehicles | |
Zhang et al. | Dynamic path planning algorithm for unmanned surface vehicle under island-reef environment | |
Gao et al. | Artificial intelligence algorithms in unmanned surface vessel task assignment and path planning: A survey | |
CN113741477A (en) | Under-actuated ship berthing path planning method | |
Wang et al. | Dynamic position predicting of underactuated surface vessel with unscented Kalman filter | |
Guo et al. | Path planning for autonomous underwater vehicles based on an improved artificial jellyfish search algorithm in multi-obstacle ocean current environment | |
CN116540717A (en) | AUV local path planning method based on improved DWA | |
Yiming et al. | Variable-structure filtering method for an unmanned wave glider | |
CN115185262A (en) | Dynamic obstacle avoidance path rapid planning method based on minimum safe meeting distance | |
CN115793639A (en) | Unmanned ship complex path planning method and device based on reinforcement learning algorithm | |
Qu et al. | USV Path Planning Under Marine Environment Simulation Using DWA and Safe Reinforcement Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||