CN112966714A

CN112966714A - Edge time sequence data anomaly detection and network programmable control method

Info

Publication number: CN112966714A
Application number: CN202110142428.XA
Authority: CN
Inventors: 吴迪; 戴宁一; 邓晗晖; 江中凯; 谢小峰; 范喆; 聂祥
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2021-02-02
Filing date: 2021-02-02
Publication date: 2021-06-15
Anticipated expiration: 2041-02-02
Also published as: CN112966714B

Abstract

The invention relates to an edge time sequence data anomaly detection and network programmable control method, and belongs to the field that combines Internet of Things (IoT) time sequence data with deep learning and machine learning. Time sequence data are acquired on IoT edge devices; an attention-based prediction model built on Grid LSTM predicts the IoT time sequence data on the edge devices, and the error between the true value and the predicted value is computed; an SVM algorithm then performs anomaly detection on this error to determine whether the data are abnormal; finally, the transmission path of an abnormal data packet is traced back and shielded, and a new transmission path for the data is found. The method improves the capability to analyze and process IoT time series data, improves both data prediction and anomaly detection performance, and addresses data security during data transmission in wireless sensor networks.

Description

Edge time sequence data anomaly detection and network programmable control method

Technical Field

The invention relates to an edge time sequence data anomaly detection and network programmable control method, and belongs to the field that combines Internet of Things time sequence data, deep learning and machine learning.

Background

With the development of smart cities, Industry 4.0, supply chains and home automation, Internet of Things (IoT) applications have generated a large amount of data. According to Cisco's report, by 2021 the number of connected devices will reach 11.6 billion, and more than 49 exabytes of data traffic will be generated per month. Ubiquitous sensors generate a great deal of data and information, and such data are becoming the most common form of data in IoT computing, so data transmission and processing play an increasingly critical role in IoT applications. Processing IoT data yields useful and valuable information that supports intelligent automation and decision making in IoT applications. Data-driven IoT applications have strict requirements on latency, reliability, security and real-time performance, and because network bandwidth and computing resources are limited, IoT traffic needs to be handled before it reaches the core network. A platform is therefore needed that integrates connection, computation, storage and related functions at the edge of the network. Edge computing meets these requirements and has become a solution for data processing and intelligent analysis at the network edge, i.e. as close as possible to the source of IoT data.
Meanwhile, IoT data are generated by distributed intelligent devices and sensors, so they are large in scale, and the variety of acquisition devices makes the collected data types diverse. Because sensor data collected by devices at specific locations are time-stamped, IoT data exhibit mutual dependency, which is the most obvious difference between IoT data and traditional data. IoT data also have high real-time requirements: for example, when a sensor operates abnormally, the anomaly must be detected immediately so that it does not affect the normal operation of other devices. An effective and efficient system for analyzing and processing IoT time series data at the edge is therefore needed. Although the prediction of time series and spatio-temporal data has many real-world application scenarios, in wireless sensor networks it is mostly used for energy-saving analysis of nodes and rarely for the security of wireless sensor nodes. The invention therefore applies time series and spatio-temporal data prediction to a new scenario in wireless sensor networks: it addresses the risk that data are tampered with during transmission, realizes automatic network maintenance during data transmission, and ensures secure data transmission from wireless sensors.

Disclosure of Invention

The invention aims to provide an edge time sequence data anomaly detection and prediction analysis method, thereby overcoming the defects in the prior art.

The technical solution of the invention proposes an edge long short-term memory network (EdgeLSTM) system, which combines the attention mechanism and Grid LSTM from deep learning with the SVM from machine learning. The EdgeLSTM system extracts the multidimensional features of IoT time sequence data and processes them flexibly and stably at the network edge. The key idea of EdgeLSTM is to use an attention mechanism to weight the importance of the multidimensional features, extend LSTM cells into a grid structure, and process IoT time sequence data in depth with Grid LSTM. Grid LSTM can arrange cells along arbitrary dimensions; in the EdgeLSTM system the cells are arranged along the time (horizontal) direction and the depth (vertical) direction, so EdgeLSTM can handle data with more dimensions and more complex features than a standard LSTM network. Specifically, the EdgeLSTM system uses an attention mechanism based on a Grid LSTM neural network to predict trends in time series data, and then uses a multiclass support vector machine (Multiclass SVM) to classify anomalies. The EdgeLSTM system can fully exploit the potential of edge computing and improve network management through edge-side, data-driven IoT processing. EdgeLSTM is applied to three data-driven IoT applications: data prediction, anomaly detection and network maintenance.
The network programmable control method traces back and shields the transmission path of an abnormal data packet and searches for a new data transmission path, which addresses the security problem in wireless sensor network data transmission and maintains the sensor nodes.

The method improves the capability to analyze and process IoT time sequence data while also performing anomaly detection on it; compared with models that do not use EdgeLSTM, both data prediction performance and anomaly detection performance are improved. The network programmable control method improves the reliability and security of the data transmission network.

Drawings

Fig. 1 is a diagram of the computing architecture of the internet of things in an edge network according to the present invention.

Fig. 2 is a schematic diagram of a NeuroIoT framework oriented to processing and analyzing time-series data of the internet of things according to the present invention.

FIG. 3 is a diagram of the structure of Grid LSTM unit according to the present invention.

FIG. 4 is a diagram of the Grid LSTM network architecture of the present invention.

FIG. 5 is a diagram of an attention prediction model based on Enhanced Grid LSTM according to the present invention.

FIG. 6 is a graph of the predicted results on a power data set according to the present invention.

Fig. 7 is a diagram of the predicted results on the SML2010 data set according to the present invention.

FIG. 8 is a trace-back diagram of an abnormal packet.

Detailed Description

To illustrate the technical solution of the embodiments of the present invention more clearly, it is further described below with reference to Figs. 1-8; the described embodiments are only some, not all, of the embodiments of the present invention.

Fig. 1 is a computing architecture diagram of the internet of things in an edge network, which mainly includes a data sensing layer, a data transmission layer, a data storage and processing layer, a joint learning and analysis layer, and an application layer of the internet of things.

Fig. 2 is a NeuroIoT framework diagram for processing and analyzing time-series data of the internet of things according to the embodiment of the present invention, where the method includes:

Step one, exploratory analysis of the raw data

Specifically, in this example, the invention performs a preliminary exploratory analysis on the raw sensor data, mainly checking the correlations among the feature variables and between the feature variables and the target variable, and observing the missing values and outliers of each variable.
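A minimal sketch of this exploratory step in plain Python, assuming the raw data are held as named columns with None marking a missing reading. The column names in the usage are hypothetical, the patent does not say which correlation coefficient it uses (Pearson is assumed here), and outlier inspection is left out. The same pearson helper can be applied to any pair of feature columns.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)

def explore(columns, target):
    """Missing-value count per variable and correlation of each feature with the target."""
    report = {}
    for name, values in columns.items():
        report[name] = {
            "missing": sum(v is None for v in values),
            "corr_with_target": None if name == target else pearson(values, columns[target]),
        }
    return report

# Hypothetical columns: "temp" is a feature, "load" the target variable.
report = explore({"temp": [1.0, 2.0, 3.0, None], "load": [2.0, 4.0, 6.0, 8.0]}, "load")
print(report)
```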

Step two, preprocessing data

In particular, in this example, the raw data set is collected by multiple sensors and therefore needs to be preprocessed, mainly by data cleaning, data filling, data down-sampling, and normalization using

$$x_{std} = \frac{x - \min}{\max - \min}$$

where min and max are the minimum and maximum values of a given feature column, x denotes the values of that feature, and $x_{std}$ is the normalized value, which lies in [0, 1].
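A minimal sketch of the min-max formula for one feature column. The text does not say how a constant column (max = min) is handled; mapping it to 0 here is an assumption that avoids division by zero.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] with x_std = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: the formula would divide by zero, so map every value to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```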

Step three, dividing the data set

Specifically, in this example, the preprocessed data set is divided into a training set, a validation set and a test set in the ratio 6:2:2, where the training set contains only normal data and the validation and test sets contain both normal and abnormal data.
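A sketch of the 6:2:2 split under two assumptions the text leaves open: the split is chronological (usual for time series), and each sample carries a normal/abnormal label so that abnormal samples can be dropped from the training portion. Integer arithmetic avoids floating-point rounding in the split sizes.

```python
def split_6_2_2(rows, labels):
    """Chronological 6:2:2 split; abnormal rows are removed from the training part.

    rows:   time-ordered samples
    labels: 0 for normal, 1 for abnormal (assumed known for this sketch)
    """
    n = len(rows)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = [r for r, y in zip(rows[:n_train], labels[:n_train]) if y == 0]
    val = list(zip(rows[n_train:n_train + n_val], labels[n_train:n_train + n_val]))
    test = list(zip(rows[n_train + n_val:], labels[n_train + n_val:]))
    return train, val, test
```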

Step four, building an attention prediction model based on Enhanced Grid LSTM

Specifically, in this example, because abnormal values make up only a small part of IoT time series data, the prediction model is built on the training set, and its hyper-parameters are then selected on the validation set to improve performance. The structure of the prediction model of the present invention is shown in Fig. 5. The model is built using the following formulas:

$$
\begin{aligned}
g^u &= \sigma(W^u H) \\
g^f &= \sigma(W^f H) \\
g^o &= \sigma(W^o H) \\
g^c &= \tanh(W^c H)
\end{aligned}
\qquad (1)
$$

where σ is the logistic sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$.

$W^u$, $W^f$, $W^o$ and $W^c$ are the weight matrices of the respective gates. $H = [I x_i, h]^T$, where $x_i$ is the current input, I is the input mapping matrix, and h is the output vector of the previous time step. The input gate $g^u$ determines what information is updated; the forget gate $g^f$ decides what information is discarded; the output gate $g^o$ determines what information is output to the next cell; and $g^c$ is the candidate information to be written into the new cell. m' denotes the memory state output at the current time step, and h' the hidden state output at the current time step. This is the framework of the basic LSTM network. Grid LSTM applies it along N dimensions: it takes the hidden vectors $h_1, h_2, \dots, h_N$ and the memory vectors $m_1, m_2, \dots, m_N$ of the N dimensions as input parameters and outputs the hidden vectors $h'_1, h'_2, \dots, h'_N$ and the memory vectors $m'_1, m'_2, \dots, m'_N$, where dimension i (i = 1, 2, …, N) uses its own weight matrix $W_i$, formed by concatenating $W_i^u$, $W_i^f$, $W_i^o$ and $W_i^c$.
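The update that produces m' and h', and the N-dimensional mapping just described, are not written out in the text. Assuming they follow the Grid LSTM formulation of Kalchbrenner et al. (Grid Long Short-Term Memory), they can be reconstructed as below, where ⊙ denotes element-wise multiplication and H concatenates the incoming hidden vectors of all N dimensions. This is a hedged reconstruction, not the patent's own equation.

```latex
\begin{aligned}
m' &= g^f \odot m + g^u \odot g^c, \qquad h' = \tanh\!\left(g^o \odot m'\right) \\
H &= \left[\, h_1,\; h_2,\; \dots,\; h_N \,\right]^T \\
(h'_i,\; m'_i) &= \mathrm{LSTM}(H,\; m_i,\; W_i), \qquad i = 1, 2, \dots, N
\end{aligned}
```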

The structure of the Grid LSTM unit is shown in Fig. 3. Each cell of the grid has N edges that receive hidden state vectors and memory state vectors, and it outputs N hidden state vectors and memory state vectors; a data point is mapped into the Grid LSTM network along one pair of input hidden/memory state vectors on one edge. The EdgeLSTM system uses a 2-dimensional Grid LSTM unit (Fig. 3), in which $h_1$ and $h_2$ denote the hidden vectors in the time (horizontal) and depth (vertical) dimensions, respectively, and $m_1$ and $m_2$ denote the memory vectors in the time and depth directions, respectively. In the time dimension, $h_1$ and $m_1$ are used for the 2D Grid LSTM unit calculation, which outputs the hidden state vector $h'_1$ and the memory state vector $m'_1$; correspondingly, in the depth dimension, $h_2$ and $m_2$ are used to compute the hidden state vector $h'_2$ and the memory state vector $m'_2$. $h_1$ and $h_2$ produce the gating mechanisms of Eq. (1), and $m_1$ and $m_2$ are combined into the main memory state vector used to learn the complex features of IoT time sequence data. After the 2D Grid LSTM cells are constructed, they are connected into a 2D Grid LSTM network of four cyclically connected cells, as shown in Fig. 4, where the horizontal axis represents the time dimension and the vertical axis the depth dimension. Specifically, the input along the time dimension is the time sequence, and the cells of each hidden layer correspond to different time steps. The output at the current time step takes the data at different time steps into account and evaluates their influence on the next time step. Each Grid LSTM cell controls the input, storage and output of data through the gating mechanism described above.
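The 2D cell described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's implementation: bias terms are omitted as in Eq. (1); the update m' = g^f ⊙ m + g^u ⊙ g^c and h' = tanh(g^o ⊙ m') is assumed from the Grid LSTM formulation of Kalchbrenner et al., since the text does not spell it out; and both dimensions are assumed to share the concatenated input H = [h_1, h_2]. At the first layer, the depth-direction hidden vector would carry the projected input I x_t (also an assumption). The helper names lstm and grid_lstm_2d are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm(H, m, W):
    """LSTM(H, m, W): gates as in Eq. (1), then the assumed m'/h' update."""
    Wu, Wf, Wo, Wc = W
    gu = [sigmoid(z) for z in matvec(Wu, H)]    # input gate
    gf = [sigmoid(z) for z in matvec(Wf, H)]    # forget gate
    go = [sigmoid(z) for z in matvec(Wo, H)]    # output gate
    gc = [math.tanh(z) for z in matvec(Wc, H)]  # candidate cell content
    m_new = [f * mi + u * c for f, mi, u, c in zip(gf, m, gu, gc)]
    h_new = [math.tanh(o * mn) for o, mn in zip(go, m_new)]
    return h_new, m_new

def grid_lstm_2d(h1, m1, h2, m2, W1, W2):
    """One 2D Grid LSTM block: dimension 1 is time, dimension 2 is depth.

    Both dimensions read the shared H = [h1, h2]; each keeps its own memory and weights.
    """
    H = h1 + h2                       # list concatenation
    h1_new, m1_new = lstm(H, m1, W1)  # time direction
    h2_new, m2_new = lstm(H, m2, W2)  # depth direction
    return (h1_new, m1_new), (h2_new, m2_new)
```

With all weights set to zero every gate equals 0.5 and the candidate is 0, so the time-direction memory [1.0] is halved to [0.5]; this makes a convenient sanity check.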
The gating mechanism of the current cell receives the output generated by the previous hidden layer at the previous time step, takes the sample of the current time step as input, and then determines the output of the current hidden layer, which is used at the next time step to generate the output of the next hidden layer. Information processing along the depth dimension follows a similar workflow: the gating mechanism of the current cell processes the information from the previous hidden layer, generates the output of the current hidden layer, and from it the output of the next hidden layer is derived. In EdgeLSTM, the outputs of the Grid LSTM cells in the time dimension are taken as inputs to the depth dimension, and the outputs of the Grid LSTM cells in the depth dimension are in turn taken as inputs to the time dimension. Combined with the attention mechanism, this better extracts the most important features of the time sequence data. Through this alternating process, the sequence data are finally modeled and predicted from both the time and the depth dimensions. Compared with a single LSTM network, EdgeLSTM enhances the model's ability to learn more complex features in IoT data.
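The text states that the attention mechanism emphasizes the most important features but does not give its formula. The sketch below assumes a simple feature-wise softmax attention for one time step, with scores s_k = v_k · x_k; the score vector v is a hypothetical learned parameter, not something defined in the patent.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feature_attention(x, v):
    """Re-weight the features of one time step by attention.

    x: feature vector at one time step
    v: hypothetical learned per-feature score weights
    Returns (attention weights, re-weighted features).
    """
    alpha = softmax([vk * xk for vk, xk in zip(v, x)])
    return alpha, [a * xk for a, xk in zip(alpha, x)]
```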

Step five, predicting data

Specifically, in this example, the trained prediction model is tested and evaluated on the test set, yielding predicted values for both normal and abnormal data. Since this is a regression problem in machine learning, the evaluation metrics are the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE) and the R² score, calculated as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_t^i - \hat{y}_t^i}{y_t^i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_t^i - \hat{y}_t^i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_t^i - \hat{y}_t^i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_t^i - \hat{y}_t^i\right)^2}{\sum_{i=1}^{n}\left(y_t^i - \bar{y}_t\right)^2}$$

where $y_t^i$ is the true value of the i-th sample at time t, $\hat{y}_t^i$ is the predicted value of the i-th sample at time t, n is the total number of samples, and $\bar{y}_t$ is the mean of the samples at time t. A smaller MAPE is better, with a minimum of 0; a larger R² is better, with a maximum of 1, indicating the best fit of the model to unseen data. The prediction results on the two test data sets in Figs. 6 and 7, together with Table 1, show that the prediction model of the present invention performs best among the compared models.
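A sketch of the four regression metrics named above (MAPE, RMSE, MAE, R²) in Python, using their standard definitions. It assumes no true value is zero, because MAPE divides by the true value, and reports MAPE in percent.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAPE (%), RMSE, MAE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mape = 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true))  # needs t != 0
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean = sum(y_true) / n
    r2 = 1.0 - sum(e * e for e in errors) / sum((t - mean) ** 2 for t in y_true)
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae, "R2": r2}

print(regression_metrics([2.0, 4.0], [1.0, 5.0]))
# {'MAPE': 37.5, 'RMSE': 1.0, 'MAE': 1.0, 'R2': 0.0}
```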

TABLE 1 comparison of predicted Performance of different models on two datasets

Step six, data anomaly detection

Specifically, in this example, steps four and five give the prediction model's predicted values for the validation set and for the test set. A validation-set residual data set and a test-set residual data set are constructed from these predicted values; a multi-class SVM anomaly detection model is then trained on the validation-set residual data set and tested on the test-set residual data set. The classification metrics used are precision, recall and the F_β score, calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_\beta = \frac{(1 + \beta^2)\cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$

where TP, TN, FP and FN are the four outcomes of the classification model: TP is the number of positive samples predicted as positive, TN the number of negative samples predicted as negative, FP the number of negative samples predicted as positive, and FN the number of positive samples predicted as negative. The F score balances precision and recall and reflects the overall performance of the model; the higher the F score, the better the overall performance. The value of β determines which metric is favored: when β < 1, precision is weighted more heavily than recall; when β > 1, recall is weighted more heavily than precision; when β = 1, the two are weighted equally. In search tasks, recall is generally prioritized over precision, i.e. recall carries more weight; in disease detection and anomaly detection, precision is generally prioritized over recall, i.e. precision carries more weight. Because the experiment is an anomaly detection task, precision is considered more important than recall, and β is therefore set to 0.1. The results, shown in Table 2, indicate that the anomaly detection model of the present invention outperforms the other models.
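The residual-based detection and its evaluation can be sketched together in plain Python. The patent does not give the SVM kernel, solver, residual encoding or anomaly classes, so this sketch assumes a linear soft-margin SVM trained by sub-gradient descent on the hinge loss (standing in for a library solver such as LIBSVM), one-vs-rest handling of the multiple classes, residual features (r, |r|) with r = true − predicted, and hypothetical classes normal, spike_up and spike_down. F_β treats every non-normal class as the positive class, with β = 0.1 as in the text.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def residual_features(y_true, y_pred):
    """Residual r = true - predicted, encoded as the feature pair (r, |r|)."""
    return [(t - p, abs(t - p)) for t, p in zip(y_true, y_pred)]

def train_linear_svm(X, y, lam=0.001, lr=0.01, epochs=300):
    """Binary soft-margin linear SVM (labels +1/-1), sub-gradient descent on the hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Inside the margin: hinge-loss and L2 terms both contribute.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct side with enough margin: only the L2 term shrinks w.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def train_one_vs_rest(X, labels):
    """Multi-class SVM as one binary SVM per class (one-vs-rest)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the class whose binary SVM gives the largest decision value."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

def f_beta(y_true, y_pred, beta=0.1, normal="normal"):
    """Precision, recall and F_beta, counting every non-normal class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t != normal and p != normal for t, p in pairs)
    fp = sum(t == normal and p != normal for t, p in pairs)
    fn = sum(t != normal and p == normal for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    return precision, recall, (1 + b2) * precision * recall / (b2 * precision + recall)
```

As in the text, the models are fitted on validation-set residuals and evaluated on test-set residuals.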

TABLE 2 comparison of Classification Performance of different models on two datasets

Step seven, programmable control of the network

During data transmission, a Bloom filter is set up in each node to record the packets it forwards. When a data packet passes through a sensor node, the node concatenates the packet's identification information, namely the packet sequence number (ID), the ID of the source node that sent the packet, the ID of the local node and the ID of the next-hop routing node, into a new string, maps this string into the node's Bloom filter by hashing, and sets the corresponding bits to 1, as shown in formula (10):

$$\mathrm{Hash}(pId \,\|\, sId \,\|\, lId \,\|\, nId) \qquad (10)$$

where Hash denotes one of the hash functions, pId is the ID of the current packet, sId is the ID of the source node that sent the packet, lId is the ID of the current node, nId is the ID of the next-hop node, and || denotes concatenation.
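A minimal Bloom filter sketch of the recording step in formula (10). The bit-array size, the number of hash functions, deriving the k bit positions from SHA-256 with an index prefix, and the "|" separator between IDs are all assumptions; the separator keeps concatenations such as 1||23 and 12||3 from producing the same string.

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter with k positions derived from SHA-256 (an assumed choice)."""

    def __init__(self, m_bits=1024, k=3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False: definitely never inserted. True: inserted, or a false positive.
        return all(self.bits[pos] for pos in self._positions(key))

def record_key(p_id, s_id, l_id, n_id, sep="|"):
    """Build pId || sId || lId || nId from formula (10)."""
    return sep.join(str(v) for v in (p_id, s_id, l_id, n_id))

# Hypothetical example: node 6 forwards packet 42 from source node 1 to next hop 0.
bf = BloomFilter()
bf.add(record_key(42, 1, 6, 0))
print(bf.might_contain(record_key(42, 1, 6, 0)))  # True
```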

After the edge server performs anomaly detection on the received data, if the data are abnormal, the access point initiates a trace-back query for the data packet by broadcasting a query packet that contains the source node ID and the packet ID of the abnormal data packet. Because of the broadcast nature of the channel, all nodes around the sender receive the query; each extracts the source node ID and the abnormal packet ID, concatenates them in the manner of formula (10), and checks whether the result is stored in its own Bloom filter.

If the query hits in the node's Bloom filter, the abnormal data packet may have passed through this node; the node then continues the trace-back by broadcasting the query packet in turn, and shields itself so that it no longer transmits data. Otherwise, the node does nothing.

As shown in Fig. 8, when the edge server detects abnormal data from node 1, the access point initiates a trace-back request for the data packet. The access point broadcasts a query packet containing the required information; nodes 6, 8 and 9, which are within broadcast range, receive the query packet, extract its information, concatenate it and check whether the abnormal packet passed through them. If the Bloom filter of node 6 hits, node 6 is considered to lie on the transmission path of the abnormal packet. Node 6 shields itself and broadcasts the query packet to its neighbor nodes 3, 5 and 7 and to the access point; since the access point is already known to be on the path of the abnormal packet, it does not resend the query, while the remaining neighbor nodes repeat the above operation until node 1 is reached. Node 1 then finds a new transmission path to send its data to the access point.

This loop repeats until the final node's ID equals the source node ID of the abnormal data packet. Because every node on the path may have been attacked, the source node sends subsequent data packets over a new path that avoids the suspicious nodes.
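The trace-back, self-shielding and re-routing procedure can be sketched as a breadth-first flood from the access point. Several assumptions are made that the text does not fix: the topology in the usage is hypothetical, only loosely modeled on the Fig. 8 walk-through; a Python set of (pId, sId, lId, nId) tuples stands in for each node's Bloom filter, so there are no false positives; because the query carries only pId and sId, a node tries every neighbour as a candidate nId; and the new path is a shortest-hop path found by breadth-first search.

```python
from collections import deque

def trace_back(neighbors, records, access_point, p_id, s_id):
    """Flood the trace-back query from the access point; return the shielded nodes.

    neighbors: {node: set of neighbour nodes}
    records:   {node: set of (pId, sId, lId, nId) tuples}, standing in for Bloom filters
    """
    shielded, visited = set(), {access_point}
    queue = deque([access_point])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node in visited:
                continue
            visited.add(node)  # a node answers a given query only once
            hit = any((p_id, s_id, node, n) in records.get(node, set())
                      for n in neighbors[node])
            if not hit or node == s_id:
                continue  # miss: do nothing; source reached: stop, source stays active
            shielded.add(node)  # node blocks itself from forwarding data
            queue.append(node)  # and rebroadcasts the query toward the source
    return shielded

def find_new_path(neighbors, source, access_point, shielded):
    """Breadth-first search for a shortest-hop path that avoids shielded nodes."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == access_point:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in parent and nxt not in shielded:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no route avoids the shielded nodes

# Hypothetical topology: access point 0; abnormal packet 42 went 1 -> 3 -> 6 -> 0.
neighbors = {0: {6, 8, 9}, 6: {0, 3, 5, 7}, 8: {0, 2}, 9: {0}, 3: {6, 1},
             5: {6}, 7: {6}, 1: {3, 2}, 2: {1, 8}}
records = {1: {(42, 1, 1, 3)}, 3: {(42, 1, 3, 6)}, 6: {(42, 1, 6, 0)}}
shielded = trace_back(neighbors, records, 0, 42, 1)
path = find_new_path(neighbors, 1, 0, shielded)
print(shielded, path)
```

With this topology the sketch shields nodes 3 and 6 and reroutes node 1 through nodes 2 and 8 to the access point.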

Claims

1. An edge time sequence data anomaly detection and network programmable control method is characterized in that:

the concept of an edge long short-term memory network system, EdgeLSTM, is proposed, which combines the attention mechanism and Grid LSTM from deep learning with the SVM from machine learning;

acquiring time sequence data on the Internet of things edge equipment;

predicting the time sequence data of the Internet of things according to a Grid LSTM-based attention mechanism;

predicting the Internet of things time sequence data on the edge equipment by using an attention mechanism prediction model based on Grid LSTM to obtain an error between a true value and a predicted value;

performing anomaly detection on the error with an SVM algorithm to determine whether the data are abnormal; and, by means of the network programmable control method, tracing back and shielding the transmission path of the abnormal data packet and searching for a new transmission path for the data.

2. The edge time sequence data anomaly detection and network programmable control method according to claim 1, specifically comprising the following steps:

step one, exploratory analysis is carried out on the original data,

performing a preliminary exploratory analysis on the raw sensor data, checking the correlations between the feature variables and the target variable, and observing the missing values and outliers of each variable;

step two, the pre-processing of the data,

the raw data set is collected by multiple sensors and needs to be preprocessed, mainly by data cleaning, data filling, data down-sampling, and normalization using

$$x_{std} = \frac{x - \min}{\max - \min}$$

wherein min and max respectively denote the minimum and maximum values of a given feature column, x denotes the values of that feature, and $x_{std}$ denotes the normalized value, which lies in [0, 1];

Step three, dividing the data set,

dividing the preprocessed data set into a training set, a validation set and a test set in the ratio 6:2:2, wherein the training set contains only normal data, and the test set and the validation set contain both normal data and abnormal data;

step four, building an attention prediction model based on Enhanced Grid LSTM,

since abnormal values make up only a small part of IoT time series data, the prediction model is built using the training set and its hyper-parameters are selected using the validation set to improve performance, using the following formulas:

$$
\begin{aligned}
g^u &= \sigma(W^u H) \\
g^f &= \sigma(W^f H) \\
g^o &= \sigma(W^o H) \\
g^c &= \tanh(W^c H)
\end{aligned}
\qquad (1)
$$

wherein σ is the logistic sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$;

$W^u$, $W^f$, $W^o$ and $W^c$ respectively denote the weight matrices of the respective gates; $H = [I x_i, h]^T$, wherein $x_i$ denotes the current input, I the input mapping matrix, and h the output vector of the previous time step; $g^u$ denotes the input gate, which determines the information to be updated; $g^f$ the forget gate, which decides what information is discarded; $g^o$ the output gate, which determines the information output to the next cell; $g^c$ the candidate information to be written into the new cell; m' denotes the memory state output at the current time step, and h' the hidden state output at the current time step;

according to the framework of the basic LSTM neural network described above, the hidden vectors $h_1, h_2, \dots, h_N$ and the memory vectors $m_1, m_2, \dots, m_N$ of the N dimensions are taken as input parameters, and the hidden vectors $h'_1, h'_2, \dots, h'_N$ and the memory vectors $m'_1, m'_2, \dots, m'_N$ are output, wherein dimension i (i = 1, 2, …, N) uses a weight matrix $W_i$ formed by concatenating $W_i^u$, $W_i^f$, $W_i^o$ and $W_i^c$; for each cell, the grid has N edges that receive hidden state vectors and memory state vectors and outputs N hidden state vectors and memory state vectors, and a data point is mapped into the Grid LSTM network along one pair of input hidden/memory state vectors on one edge; in the EdgeLSTM system a 2-dimensional Grid LSTM unit is used, wherein $h_1$ and $h_2$ respectively denote the hidden vectors in the time dimension and the depth dimension, and $m_1$ and $m_2$ respectively denote the memory vectors in the time and depth directions; in the time dimension, $h_1$ and $m_1$ are used for the 2D Grid LSTM unit calculation, which outputs a hidden state vector $h'_1$ and a memory state vector $m'_1$; correspondingly, in the depth dimension, $h_2$ and $m_2$ are used to compute a hidden state vector $h'_2$ and a memory state vector $m'_2$; $h_1$ and $h_2$ produce the gating mechanisms used in formula (1), and $m_1$ and $m_2$ are combined into the main memory state vector for learning the complex features of IoT time sequence data; after the 2D Grid LSTM units are constructed, they are connected to form a 2D Grid LSTM network composed of four cyclically connected units, in which the horizontal axis represents the time dimension and the vertical axis represents the depth dimension;

step five, data prediction,

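testing and evaluating the trained prediction model on the test set to obtain predicted values for normal data and for abnormal data; since this is a regression problem in machine learning, the evaluation metrics used are the mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE) and the R² score, calculated as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_t^i - \hat{y}_t^i}{y_t^i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_t^i - \hat{y}_t^i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_t^i - \hat{y}_t^i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_t^i - \hat{y}_t^i\right)^2}{\sum_{i=1}^{n}\left(y_t^i - \bar{y}_t\right)^2}$$

wherein $y_t^i$ denotes the true value of the i-th sample at time t, $\hat{y}_t^i$ the predicted value of the i-th sample at time t, n the total number of samples, and $\bar{y}_t$ the mean of the samples at time t; the smaller the MAPE, the better, with a minimum of 0; the larger the R², the better, with a maximum of 1, indicating the best fit of the model to unseen data;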

step six, detecting the data abnormity,

through steps four and five, obtaining the prediction model's predicted values for the validation set and for the test set, constructing a validation-set residual data set and a test-set residual data set from these predicted values, then training a multi-class SVM anomaly detection model on the validation-set residual data set and testing it on the test-set residual data set; the classification metrics used are precision, recall and the F_β score, calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F_\beta = \frac{(1 + \beta^2)\cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$

wherein TP denotes the number of positive samples predicted as positive, TN the number of negative samples predicted as negative, FP the number of negative samples predicted as positive, and FN the number of positive samples predicted as negative;

step seven, programmable control of the network

after the edge server performs anomaly detection on the received data, if the data are abnormal, the access point initiates a trace-back query for the data packet by broadcasting a packet containing the source node ID and the packet ID of the abnormal data packet; all nodes around the sensor node extract the source node ID and the abnormal packet ID from this packet, concatenate them in the manner of formula (10), and query whether the result is stored in the Bloom filter of the current sensor node;

$$\mathrm{Hash}(pId \,\|\, sId \,\|\, lId \,\|\, nId) \qquad (10)$$

wherein Hash denotes one of the hash functions, pId denotes the ID of the current data packet, sId the ID of the source node that sent the data packet, lId the ID of the current node, nId the ID of the next-hop node, and || denotes concatenation; the loop iterates until the ID of the final node equals the source node ID of the abnormal data packet; to avoid the suspicious nodes, the source node then transmits its data packets over a new path.