CN118827151A - Threat flow data identification method, device, equipment and medium
- Publication number
- CN118827151A, CN202410783628.7A
- Authority
- CN
- China
- Prior art keywords
- threat
- data
- log
- flow data
- firewall
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to the technical field of network security and discloses a method, a device, equipment and a medium for identifying threat traffic data. The method comprises: obtaining a firewall log comprising a traffic log and a security log, the security log comprising threat events intercepted by the firewall and the threat type corresponding to each threat event, and the traffic log comprising traffic data monitored by the firewall; determining, based on the security log and the traffic log, target traffic data corresponding to each threat event in the security log and a threat tag for the target traffic data, so as to construct a training data set and train a preset machine learning model to obtain a threat identification model; and performing threat identification on the traffic data monitored by the firewall based on the threat identification model and intercepting the threat traffic data. By training a machine learning model on data from the firewall log and using the resulting threat identification model to identify threat traffic data, the invention intercepts threat traffic more comprehensively and thereby safeguards network security.
Description
Technical Field
The invention relates to the technical field of network security, in particular to a method, a device, equipment and a medium for identifying threat flow data.
Background
With the rapid development of the internet, people's lives depend more and more on networks, and the types of network threats keep increasing. Threat network traffic can compromise the security of network users' devices and property, so identifying threat traffic in network traffic data is becoming increasingly important.
In the prior art, a firewall is generally used to intercept threat traffic data. However, a firewall usually relies on preset rules that are determined from experience to judge whether the acquired traffic data is a threat and to intercept it. With this interception mode, part of the threat traffic is easily missed by the firewall, so threats in the traffic data cannot be identified comprehensively and efficiently.
Disclosure of Invention
In view of the above, the invention provides a method, a device, equipment and a medium for identifying threat flow data, so as to solve the problem that threat identification cannot be performed on the threat flow data comprehensively and efficiently.
In a first aspect, the present invention provides a method for identifying threat traffic data, the method comprising:
obtaining a firewall log, the firewall log comprising a traffic log and a security log, wherein the security log comprises: threat events intercepted by a firewall and the threat type corresponding to each threat event, and the traffic log comprises: traffic data monitored by the firewall;
performing data matching based on the security log and the flow log, determining target flow data corresponding to a threat event in the security log, and determining the threat type of the threat event as a threat tag of the target flow data;
Constructing a training data set according to target flow data and threat labels corresponding to all threat events in the security log, and training a preset machine learning model through the training data set to obtain a threat identification model;
And carrying out threat identification on the traffic data monitored by the firewall based on the threat identification model, and intercepting the threat traffic data.
The threat events recorded in the security log of the firewall are matched against the traffic data in the traffic log to determine the target traffic data corresponding to each threat event and the threat tag of that target traffic data, which yields a training data set. A preset machine learning model is trained on the training data set to obtain a threat identification model, which is used to identify and intercept threats in the traffic data monitored by the firewall. This improves the comprehensiveness of threat traffic identification and further ensures network security.
In an alternative embodiment, the firewall log includes: logs corresponding to a plurality of different firewalls;
Before the data matching based on the security log and the traffic log, the method further comprises:
determining unstructured text data of each threat event according to the security log corresponding to each firewall in the firewall log;
parsing the unstructured text data of each threat event to obtain structured data corresponding to each threat event in a unified structure format, wherein the parsing method is: a streaming log parsing method based on the longest common subsequence.
Parsing the security logs of a plurality of firewalls into structured data in a unified structure format provides more training data corresponding to threat events, which safeguards the threat identification effect of the trained threat identification model. Parsing the security log data also improves the subsequent matching between the security log and the traffic log and ensures the data quality of the training set.
In an optional implementation manner, the constructing a training data set according to the target traffic data and the threat label corresponding to each threat event in the security log includes:
determining the characteristic dimension of the target flow data according to attribute information corresponding to each threat event in the security log;
Extracting sample characteristics of each characteristic dimension in the target flow data, and determining sample data corresponding to each target flow data based on the sample characteristics and threat labels of threat events corresponding to the target flow data;
and constructing a training data set according to the sample data corresponding to each target flow data.
Sample characteristics corresponding to each characteristic dimension of the target flow data are extracted, the sample characteristics are matched with threat labels corresponding to the target flow data, so that sample data of the target flow data are obtained, the sample data are collected to obtain a training data set, the effectiveness of the training data can be further guaranteed, and the training effect of the threat identification model is improved.
In an alternative embodiment, the preset machine learning model is: a machine learning model constructed based on the lightweight gradient boosting machine (LightGBM) algorithm;
the training the preset machine learning model through the training data set comprises the following steps:
performing cross-validation on the machine learning model in the process of training the machine learning model constructed based on the lightweight gradient boosting machine algorithm through the training data set;
and adjusting model parameters of the machine learning model according to the cross-validation result.
Building the machine learning model on the lightweight gradient boosting machine algorithm and adjusting the model parameters through cross-validation during training further improves model stability, which ensures the threat identification effect of the trained threat identification model.
In an alternative embodiment, after performing threat identification on the traffic data monitored by the firewall based on the threat identification model and intercepting the threat traffic data, the method further includes:
Determining contribution values corresponding to all feature dimensions when the threat identification model identifies the threat flow data according to a model interpreter;
and outputting contribution values corresponding to the feature dimensions of the threat flow data.
By using a model interpreter to determine and output the contribution of each feature dimension when the threat identification model identifies threat traffic data, the user can learn which feature dimensions caused the traffic to be identified as a threat, and can apply corresponding security prevention and control measures to the network device in time.
In an optional implementation manner, the threat identification for the traffic data monitored by the firewall based on the threat identification model includes:
performing threat identification on the monitored traffic data through the firewall, intercepting first threat traffic data in the traffic data, and removing the first threat traffic data from the traffic data to obtain second traffic data;
performing threat identification on the second traffic data through the threat identification model, and intercepting second threat traffic data in the second traffic data.
The firewall first intercepts threats in the traffic data, and the traffic that the firewall did not intercept is then checked again by the threat identification model. This realizes dual screening of the traffic data, further ensuring comprehensive threat interception and network security.
In an alternative embodiment, the method further comprises:
And outputting the threat event and the threat type corresponding to the threat flow data.
By outputting and displaying the corresponding threat event and threat type of the threat flow data, the user can know the current network security state in time, so that avoidance is performed in the subsequent network behavior.
In a second aspect, the present invention provides an apparatus for identifying threat traffic data, the apparatus comprising:
the log data acquisition module is used for acquiring a firewall log, the firewall log comprising a traffic log and a security log, wherein the security log comprises: threat events intercepted by a firewall and the threat type corresponding to each threat event, and the traffic log comprises: traffic data monitored by the firewall;
the threat tag matching module is used for carrying out data matching based on the security log and the flow log, determining target flow data corresponding to a threat event in the security log, and determining the threat type of the threat event as a threat tag of the target flow data;
the threat model training module is used for constructing a training data set according to target flow data and threat labels corresponding to all threat events in the security log, and training a preset machine learning model through the training data set to obtain a threat identification model;
and the threat flow interception module is used for carrying out threat identification on the flow data monitored by the firewall based on the threat identification model and intercepting the threat flow data.
In a third aspect, the present invention provides a computer device comprising: the device comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions so as to execute the threat flow data identification method of the first aspect or any corresponding implementation mode.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of identifying threat traffic data of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a method of identifying threat traffic data in accordance with an embodiment of the invention;
FIG. 2 is a flow chart of another method of identifying threat traffic data in accordance with an embodiment of the invention;
FIG. 3 is a block diagram of an identification device for threat traffic data in accordance with an embodiment of the invention;
Fig. 4 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With the rapid development of the internet, people's lives depend more and more on networks, and the types of network threats keep increasing. Threat network traffic can compromise the security of network users' devices and property, so identifying threat traffic in network traffic data is becoming increasingly important.
In the prior art, a firewall is generally used to intercept threat traffic data. However, a firewall usually relies on preset rules that are determined from experience to judge whether the acquired traffic data is a threat and to intercept it. With this interception mode, part of the threat traffic is easily missed by the firewall, so threats in the traffic data cannot be identified comprehensively and efficiently.
Therefore, the embodiment of the invention provides a threat flow data identification method, which is used for obtaining target flow data corresponding to a threat event by matching the threat event intercepted in a security log of a firewall with flow data in the flow log, determining a corresponding threat label to obtain a training data set, and training a preset machine learning model through the training data set to obtain a threat identification model so as to identify and intercept the flow data monitored by the firewall, thereby improving the comprehensiveness of threat flow identification and further ensuring network security.
In accordance with an embodiment of the present invention, there is provided an embodiment of a method of identifying threat traffic data, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and, although a logical sequence is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than what is shown or described herein.
In this embodiment, a method for identifying threat traffic data is provided, which may be used for identifying the above-mentioned cyber threat, and fig. 1 is a flowchart of a method for identifying threat traffic data according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
Step S101, obtaining a firewall log, where the firewall log includes a traffic log and a security log; the security log includes threat events intercepted by the firewall and the threat type corresponding to each threat event, and the traffic log includes traffic data monitored by the firewall.
On a device that needs to connect to a network, a user typically installs existing firewall software to protect the device. The firewall software monitors the traffic data of the user device and generates a traffic log, which records the traffic data corresponding to the various traffic behaviors of the user device over a period of time. The firewall can intercept threats in the monitored traffic data according to set interception rules; for example, the five-tuple of the traffic data is checked against a preset rule to decide whether the traffic is a threat event that should be intercepted. The firewall therefore classifies the network events corresponding to some traffic data as threat events, and determines the threat type of each threat event according to the set rules, for example a virus, a dangerous website, or an abnormal software installation. After the firewall identifies and intercepts threat events, it generates a corresponding security log that records the intercepted threat events and their threat types.
Step S102, data matching is carried out based on the security log and the flow log, target flow data corresponding to the threat event in the security log is determined, and the threat type of the threat event is determined as a threat label of the target flow data.
Not all events corresponding to the traffic data are threats, so the threat events recorded in the security log correspond to only part of the traffic data in the traffic log. The data recorded in the security log and in the traffic log therefore need to be matched to determine the target traffic data corresponding to each threat event. At the same time, each piece of target traffic data is marked with a threat tag according to the threat type of the corresponding threat event recorded in the security log; that is, the target traffic data is labeled to obtain training data.
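The matching in step S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not specify the matching key, so the sketch assumes that both logs expose a five-tuple and a timestamp, and that an event matches a flow when the five-tuples are equal and the timestamps differ by at most a hypothetical `window` of seconds. The record types `FlowRecord` and `ThreatEvent` and all field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:          # one entry of the traffic log (assumed fields)
    ts: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    n_bytes: int


@dataclass(frozen=True)
class ThreatEvent:         # one entry of the security log (assumed fields)
    ts: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    threat_type: str


def five_tuple(rec):
    return (rec.src_ip, rec.src_port, rec.dst_ip, rec.dst_port, rec.protocol)


def label_flows(flows, events, window=5.0):
    """Return (flow, threat_tag) pairs for flows that match a threat event."""
    by_key = {}
    for ev in events:
        by_key.setdefault(five_tuple(ev), []).append(ev)
    labelled = []
    for flow in flows:
        candidates = [ev for ev in by_key.get(five_tuple(flow), [])
                      if abs(ev.ts - flow.ts) <= window]
        if candidates:
            # The threat type of the closest event becomes the threat tag.
            nearest = min(candidates, key=lambda ev: abs(ev.ts - flow.ts))
            labelled.append((flow, nearest.threat_type))
    return labelled


flows = [
    FlowRecord(100.0, "10.0.0.5", 51000, "203.0.113.9", 443, "tcp", 1200),
    FlowRecord(101.0, "10.0.0.5", 51001, "198.51.100.7", 80, "tcp", 300),
    FlowRecord(500.0, "10.0.0.5", 51000, "203.0.113.9", 443, "tcp", 900),
]
events = [ThreatEvent(101.5, "10.0.0.5", 51000, "203.0.113.9", 443, "tcp", "virus")]
labelled = label_flows(flows, events)
```

Only the first flow is labelled here: the second has a different five-tuple, and the third falls outside the time window.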
Step S103, a training data set is constructed according to target flow data and threat labels corresponding to all threat events in the safety log, and a preset machine learning model is trained through the training data set to obtain a threat identification model.
After determining target flow data corresponding to all threat events in the safety log and marking the target flow data, summarizing the marked target flow data to construct a training data set. Training a pre-established machine learning model through the training data set, and adjusting training parameters of the model in the training process so as to obtain a threat identification model. Specifically, the training data set can be divided into a training set and a testing set, the trained model is tested through the testing set, the threat identification model obtained through training is ensured to meet the requirements, and the threat identification capability of the preset requirements is achieved.
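The division into a training set and a testing set could look like the sketch below. The source does not say how the split is made; this example assumes a stratified hold-out split (each threat tag keeps roughly the same share in both parts) with a hypothetical `test_ratio`, and measures accuracy with any `predict` callable standing in for the trained model.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_ratio=0.2, seed=0):
    """Split (features, label) samples so that each label keeps a similar share."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Keep at least one test sample per label when the label has several samples.
        n_test = max(1, round(len(group) * test_ratio)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def accuracy(predict, test):
    """Fraction of test samples whose predicted tag equals the true tag."""
    return sum(predict(x) == y for x, y in test) / len(test)


samples = ([([i], "virus") for i in range(10)]
           + [([i], "dangerous_website") for i in range(10, 30)])
train, test = stratified_split(samples)
```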
And step S104, threat identification is carried out on the traffic data monitored by the firewall based on the threat identification model, and the threat traffic data is intercepted.
After training to obtain a threat identification model, threat identification is carried out on the flow data monitored by the firewall through the threat identification model, and threat flow data corresponding to a threat event in the flow data is intercepted.
According to the threat flow data identification method, the threat event intercepted in the security log of the firewall is matched with the flow data in the flow log, so that target flow data corresponding to the threat event is obtained, the corresponding threat label is determined to obtain the training data set, the preset machine learning model is trained through the training data set to obtain the threat identification model, the flow data monitored by the firewall is identified and intercepted, the comprehensiveness of threat flow identification is improved, and network security is further guaranteed.
According to an embodiment of the present invention, another embodiment of a method for identifying threat traffic data is provided, which may be used for threat identification as described above, and fig. 2 is a flowchart of another method for identifying threat traffic data according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
Step S201, obtaining a firewall log, where the firewall log includes a traffic log and a security log; the security log includes threat events intercepted by the firewall and the threat type corresponding to each threat event, and the traffic log includes traffic data monitored by the firewall.
Specifically, the firewall log includes: logs corresponding to a plurality of different firewalls.
It will be appreciated that in order to further secure the network of the device, the user may install different kinds of firewall software on the device. There may be a difference in the interception rules corresponding to these firewall software, so that more threat events may be intercepted, and different firewalls have corresponding security logs for recording the threat events and the corresponding threat types intercepted by them. Likewise, although there is substantially no difference in the monitored traffic data for different firewalls, each firewall may also generate its corresponding traffic log. During subsequent matching, threat events in security logs corresponding to different firewalls and flow data in flow logs can be respectively summarized, and then data matching is performed after the summarization. By matching threat events and flow data corresponding to the firewall logs, a richer training data set can be obtained, and the training effect of a subsequent model is further improved.
Step S202, unstructured text data of each threat event is determined according to security logs corresponding to each firewall in the firewall logs;
parsing the unstructured text data of each threat event to obtain structured data corresponding to each threat event in a unified structure format, wherein the parsing method is: a streaming log parsing method based on the longest common subsequence.
The records in the security log of a firewall are typically unstructured text, that is, the threat events are recorded in the security log as unstructured text data. The security log corresponding to each firewall is located in the firewall log, and the unstructured text data corresponding to each threat event in that security log is determined. Because the firewalls are of different types, their unstructured text data also differ in format, so the data need to be brought into a unified format. The data corresponding to the threat events can be parsed by a streaming log parsing method based on the longest common subsequence to obtain structured data for each threat event in a unified structure format, which ensures the accuracy of the subsequent matching between the security log and the traffic log.
Parsing the security logs of a plurality of firewalls into structured data in a unified structure format provides more training data corresponding to threat events, which safeguards the threat identification effect of the trained threat identification model. Parsing the security log data also improves the subsequent matching between the security log and the traffic log and ensures the data quality of the training set.
In one example, the security log may be parsed by the streaming log parsing method based on the longest common subsequence (LCS) as follows:
Step 1, initialize the data structures for the security log: define the log object (LCS Object), which holds a log template (LCS seq) and a line list (Ids), and define the list (LCS Map) that stores the log objects.
Step 2, read the log in a streaming manner.
Step 3, after a new log entry is read, traverse the LCS Map and find the longest common subsequence between the entry and each LCS Object; if the length of that subsequence is greater than half the length of the log entry's sequence, the entry is considered to match that log key. If a matching log object is found, go to step 5; if none is found, or the LCS Map is empty, go to step 4.
Step 4, initialize the log line as a new LCS Object and put it into the LCS Map.
Step 5, add the log line to the line list Ids of the matched LCS Object and update its LCS seq.
Step 6, return to step 2 until the whole log has been read.
This is only an exemplary log parsing method; in a specific implementation, a suitable log parsing method can be chosen according to the actual situation.
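The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patent's implementation: log entries are tokenized on whitespace, the template update in step 5 replaces tokens that are not part of the common subsequence with a wildcard `*`, and the names `LCSObject`, `parse_stream` and `merge_template` are chosen for the example.

```python
from dataclasses import dataclass, field

WILDCARD = "*"


@dataclass
class LCSObject:
    template: list                                  # the "LCS seq" of the log object
    line_ids: list = field(default_factory=list)    # the line list "Ids"


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    out, i, j = [], m, n
    while i > 0 and j > 0:                          # backtrack to recover the tokens
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def merge_template(common, tokens):
    """Keep the common tokens and collapse everything else into wildcards."""
    merged, k = [], 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            merged.append(tok)
            k += 1
        elif not merged or merged[-1] != WILDCARD:
            merged.append(WILDCARD)
    return merged


def parse_stream(lines):
    lcs_map = []                                    # step 1: the "LCS Map"
    for line_id, line in enumerate(lines):          # step 2: streaming read
        tokens = line.split()
        best, best_common = None, []
        for obj in lcs_map:                         # step 3: search all LCS objects
            common = lcs(obj.template, tokens)
            if len(common) > len(best_common):
                best, best_common = obj, common
        if best is not None and len(best_common) > len(tokens) / 2:
            best.line_ids.append(line_id)           # step 5: update Ids and LCS seq
            best.template = merge_template(best_common, tokens)
        else:                                       # step 4: new LCS object
            lcs_map.append(LCSObject(template=tokens, line_ids=[line_id]))
    return lcs_map                                  # step 6: loop ends with the input


log_lines = [
    "DENY tcp from 10.0.0.5 to 203.0.113.9 reason virus",
    "DENY tcp from 10.0.0.8 to 198.51.100.7 reason virus",
    "User admin logged in",
]
objects = parse_stream(log_lines)
```

In this example the two `DENY` lines share one template, `DENY tcp from * to * reason virus`, and the wildcard positions mark the variable fields that would become columns of the structured record. Because `parse_stream` accepts any iterable, an open file object can be passed in to read a large log line by line.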
Step S203, data matching is performed based on the security log and the traffic log, target traffic data corresponding to each threat event in the security log is determined, and the threat type of the threat event is determined as the threat tag of the target traffic data. For details, refer to step S102 of the embodiment shown in FIG. 1; they are not repeated here.
Step S204, a training data set is constructed according to target flow data and threat labels corresponding to all threat events in the safety log, and a preset machine learning model is trained through the training data set to obtain a threat identification model.
Specifically, in step S204, it includes:
step S2041, determining the characteristic dimension of the target flow data according to the attribute information corresponding to each threat event in the security log.
Attribute information corresponding to each threat event, such as the access IP address, access direction and risk level of the threat event, is recorded in the security log. The feature dimensions of the corresponding target traffic data are determined from the specific information of these attributes; the feature dimensions may be, for example, source IP, source port, destination IP, destination port, standard deviation of flow length, and the like.
Step S2042 extracts sample features of each feature dimension in the target flow data, and determines sample data corresponding to each target flow data based on the sample features and threat tags of threat events corresponding to the target flow data.
After the feature dimensions are determined, feature extraction is carried out on each piece of target flow data to obtain sample feature data corresponding to each feature dimension of the target flow data, sample features of different feature dimensions are converged, meanwhile, threat labels of threat events corresponding to the target flow data are combined, and label marking is carried out on the sample feature data to obtain sample data corresponding to each piece of target flow data.
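As an illustration of steps S2041 and S2042, the sketch below extracts one value per feature dimension from each piece of target traffic data and attaches the threat tag. The dictionary layout of a flow, the `packet_lengths` field and the name `pkt_len_std` are assumptions for the example; the feature dimensions follow the ones mentioned above (source IP, source port, destination IP, destination port, standard deviation of length).

```python
import statistics

# Feature dimensions (hypothetical names for the dimensions listed in the text).
FEATURE_DIMS = ("src_ip", "src_port", "dst_ip", "dst_port", "pkt_len_std")


def extract_features(flow):
    """Compute one sample feature per feature dimension for a single flow."""
    lengths = flow["packet_lengths"]
    return {
        "src_ip": flow["src_ip"],
        "src_port": flow["src_port"],
        "dst_ip": flow["dst_ip"],
        "dst_port": flow["dst_port"],
        # Population standard deviation of the packet lengths in the flow.
        "pkt_len_std": statistics.pstdev(lengths) if lengths else 0.0,
    }


def build_samples(labelled_flows):
    """Turn (flow, threat_tag) pairs into (feature_vector, threat_tag) samples."""
    samples = []
    for flow, threat_tag in labelled_flows:
        features = extract_features(flow)
        samples.append(([features[dim] for dim in FEATURE_DIMS], threat_tag))
    return samples


labelled_flows = [
    ({"src_ip": "10.0.0.5", "src_port": 51000, "dst_ip": "203.0.113.9",
      "dst_port": 443, "packet_lengths": [60, 1500]}, "virus"),
    ({"src_ip": "10.0.0.8", "src_port": 51002, "dst_ip": "198.51.100.7",
      "dst_port": 80, "packet_lengths": [100, 100, 100]}, "dangerous_website"),
]
samples = build_samples(labelled_flows)
```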
Step S2043, a training data set is constructed according to sample data corresponding to each target flow data, and a preset machine learning model is trained through the training data set, so that a threat identification model is obtained.
The sample data corresponding to each piece of target traffic data are gathered to obtain the training data set. Each sample in the training data set is obtained by feature extraction from the target traffic data, which further safeguards the training efficiency and recognition effect of the machine learning model. After the preset machine learning model goes through the conventional training process, such as training and testing, the trained threat identification model is obtained.
Sample characteristics corresponding to each characteristic dimension of the target flow data are extracted, the sample characteristics are matched with threat labels corresponding to the target flow data, so that sample data of the target flow data are obtained, the sample data are collected to obtain a training data set, the effectiveness of the training data can be further guaranteed, and the training effect of the threat identification model is improved.
In one example, the training data set obtained above may be preprocessed to ensure its quality. The preprocessing may specifically include: filling or otherwise handling missing values, performing correlation analysis on the features for feature screening, encoding categorical variables and IP addresses, and converting data types.
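The preprocessing operations listed above could be sketched as below. The concrete choices are assumptions, since the source names the operations but not how they are carried out: missing values are filled with the column median, IP addresses are encoded as integers with the standard-library `ipaddress` module, categorical values are mapped to integer codes, and feature screening drops a feature whose absolute Pearson correlation with an already kept feature reaches a hypothetical `threshold`.

```python
import ipaddress
import statistics


def ip_to_int(ip):
    """Encode an IPv4 or IPv6 address string as an integer."""
    return int(ipaddress.ip_address(ip))


def fill_missing(column):
    """Replace None with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.median(observed) if observed else 0
    return [fill if v is None else v for v in column]


def encode_categories(column):
    """Map each distinct category to an integer code."""
    mapping = {value: idx for idx, value in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def pearson(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:        # a constant column carries no correlation
        return 0.0
    return cov / (vx * vy) ** 0.5


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with a kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "bytes": [100, 200, 300, 400],
    "packets": [1, 2, 3, 4],          # perfectly correlated with "bytes"
    "dst_port": [443, 80, 443, 22],
}
kept = drop_correlated(columns)
codes, mapping = encode_categories(["tcp", "udp", "tcp"])
```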
Specifically, in step S204, the preset machine learning model is: a machine learning model constructed based on the light gradient boosting machine (LightGBM) algorithm;
Training a preset machine learning model through a training data set, comprising:
In the process of training the machine learning model constructed based on the LightGBM algorithm through the training data set, cross-validating the machine learning model;
and adjusting model parameters of the machine learning model according to the cross-validation result.
The light gradient boosting machine algorithm, i.e. the LightGBM algorithm, is used to solve classification and regression problems, and a machine learning model constructed based on this algorithm achieves high accuracy in threat identification. Adjusting the parameters by cross-validation during training further ensures the stability of the trained model and the effect of threat identification on flow data.
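The idea of adjusting parameters according to cross-validation results can be sketched with a plain k-fold loop. To stay self-contained, the sketch uses a trivial one-feature threshold classifier as a stand-in; it is not LightGBM, and the parameter grid and sample values are hypothetical. In practice the `fit` callable would wrap the real model.

```python
from typing import List


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous, nearly equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, k, fit, params) -> float:
    """Mean held-out accuracy of `fit` under `params` across k folds."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        test = [samples[i] for i in held_out]
        predict = fit(train, params)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_threshold(train, params):
    """Stand-in model (ignores `train`): threat when the feature exceeds a threshold."""
    threshold = params["threshold"]
    return lambda x: int(x > threshold)


def select_params(samples, k, fit, grid):
    """Pick the parameter set with the best cross-validated accuracy."""
    return max(grid, key=lambda p: cross_validate(samples, k, fit, p))


# Illustrative (feature, label) pairs and parameter grid.
samples = [(0.1, 0), (0.9, 1), (0.2, 0), (0.8, 1), (0.3, 0), (0.7, 1)]
grid = [{"threshold": 0.05}, {"threshold": 0.5}, {"threshold": 0.95}]
best = select_params(samples, 3, fit_threshold, grid)
```

The sketch assumes at least as many samples as folds; shuffling or stratifying the folds would usually be added for real traffic data.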
Step S205, threat identification is carried out on the traffic data monitored by the firewall based on the threat identification model, and the threat traffic data is intercepted.
Specifically, in step S205, threat identification is performed on traffic data monitored by the firewall based on the threat identification model, including:
threat identification is carried out on the monitored flow data through a firewall, first threat flow data in the flow data is intercepted, and the first threat flow data is removed from the flow data to obtain second flow data;
threat identification is carried out on the second traffic data through the threat identification model, and the second threat traffic data in the second traffic data is intercepted.
The firewall performs a first interception on the traffic data, and the threat traffic data corresponding to the threat events intercepted in this first pass is removed to obtain the second traffic data. The trained threat identification model then performs threat identification on the second traffic data, and the second threat traffic data in it is intercepted. This double screening of the traffic data makes threat interception more complete and helps ensure network security.
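The two-stage screening can be sketched as follows. The firewall rule and the model are represented by hypothetical callables, and the flow fields (`dst_port`, `bytes`) are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Flow = Dict[str, object]


def two_stage_intercept(
    flows: List[Flow],
    firewall_blocks: Callable[[Flow], bool],
    model_flags: Callable[[Flow], bool],
) -> Tuple[List[Flow], List[Flow], List[Flow]]:
    """Return (first threat flows, second threat flows, passed flows)."""
    first_threats, second_threats, passed = [], [], []
    for flow in flows:
        if firewall_blocks(flow):
            # Stage 1: intercepted by the firewall and removed from the stream.
            first_threats.append(flow)
        elif model_flags(flow):
            # Stage 2: the model only sees flows the firewall let through.
            second_threats.append(flow)
        else:
            passed.append(flow)
    return first_threats, second_threats, passed


# Illustrative flows and stand-in rules.
flows = [
    {"id": 1, "dst_port": 23, "bytes": 300},
    {"id": 2, "dst_port": 443, "bytes": 50_000},
    {"id": 3, "dst_port": 443, "bytes": 900},
]
blocked, flagged, passed = two_stage_intercept(
    flows,
    firewall_blocks=lambda f: f["dst_port"] == 23,
    model_flags=lambda f: f["bytes"] > 10_000,
)
```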
Step S206, determining contribution values corresponding to each characteristic dimension when the threat identification model identifies the threat flow data according to a model interpreter;
and outputting contribution values corresponding to the feature dimensions of the threat flow data.
It can be understood that, after the threat identification model identifies threat traffic data, the model interpreter can determine the contribution of each feature dimension to that identification, that is, how critical the factor in each dimension was to the traffic data being identified as threat traffic data. The contribution of each feature dimension is then output. This lets the user see directly which dimensions caused the traffic data to be flagged as a threat, so that corresponding prevention and control can be carried out in time.
Specifically, the model interpreter may be a SHAP interpreter. In one example, the SHAP interpreter determines the contribution of each feature dimension to the identification of each piece of threat traffic data. The SHAP interpreter computes contributions as follows: suppose three features A, B, and C are input into the model and the output prediction is 0 or 1. The model output is determined when the input is A, B, or C alone; when the input is AB, AC, or BC; and when the input is ABC. The marginal effect of each feature is then calculated by comparing the model outputs across these subsets, and the contribution of each of the features A, B, and C to the prediction, i.e. its SHAP value, is finally obtained from the Shapley value formula.
By using the model interpreter to determine and output the contribution of each feature dimension when the threat identification model identifies threat flow data, the user can learn which dimensions specifically gave rise to the threat flow, and can carry out corresponding security prevention and control on the network equipment in time.
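The subset-comparison procedure described above corresponds to the exact Shapley value computation, which can be sketched for three features as follows. The per-subset model outputs are hypothetical scores chosen for illustration, and the sketch enumerates subsets directly rather than calling the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a subset value function."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal effect of adding f to the subset s.
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


# Hypothetical model outputs for every subset of the features A, B, C.
outputs = {
    frozenset(): 0.0,
    frozenset("A"): 0.2, frozenset("B"): 0.1, frozenset("C"): 0.0,
    frozenset("AB"): 0.6, frozenset("AC"): 0.3, frozenset("BC"): 0.2,
    frozenset("ABC"): 1.0,
}
phi = shapley_values(["A", "B", "C"], outputs.__getitem__)
```

The three values sum to the difference between the output for ABC and the output for the empty input, which is the efficiency property of Shapley values. Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets.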
Step S207, outputting threat event and threat type corresponding to the threat flow data.
By outputting and displaying the threat event and threat type corresponding to the threat flow data, the user can learn the current network security state in time and avoid the threat in subsequent network behavior. Specifically, the threat level corresponding to the threat flow data may be determined and output according to a preset threat level determination rule. For example, a threat level table may be preset that lists the threat events corresponding to each threat level; the specific threat level is determined by looking up the threat event corresponding to the threat traffic data in the threat level table, so that the user can further understand the severity of the threat and carry out corresponding prevention and control in time.
According to the threat flow data identification method provided by the embodiment of the invention, the threat events intercepted in the security log of the firewall are matched with the flow data in the flow log to obtain the target flow data corresponding to each threat event and to determine the corresponding threat label. Features are extracted from the target flow data to obtain the training data set, and the preset machine learning model is trained on the training data set to obtain the threat identification model. The threat identification model identifies and intercepts threats in the flow data monitored by the firewall, which makes threat flow identification more comprehensive and further ensures network security. After the threat identification model identifies threat flow data, the contribution of each feature dimension to the identification is determined by the model interpreter, so that the user can carry out network prevention and control in a more targeted way, which improves the user experience.
The embodiment also provides a threat flow data identification device, which is used for implementing the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
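A preset threat level table of this kind can be sketched as a simple lookup. The level names and threat events below are hypothetical placeholders, not values from the embodiment.

```python
# Hypothetical threat level table: each level lists its threat events.
THREAT_LEVEL_TABLE = {
    "high": {"ransomware", "remote_code_execution"},
    "medium": {"sql_injection", "brute_force"},
    "low": {"port_scan"},
}


def threat_level(event, table=THREAT_LEVEL_TABLE, default="unknown"):
    """Return the level whose event set contains the given threat event."""
    for level, events in table.items():
        if event in events:
            return level
    return default


level = threat_level("sql_injection")
```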
The present embodiment provides an apparatus for identifying threat traffic data, as shown in fig. 3, including:
The log data obtaining module 401 is configured to obtain a firewall log, where the firewall log includes a traffic log and a security log, the security log includes the threat events intercepted by the firewall and the threat type corresponding to each threat event, and the traffic log includes the traffic data monitored by the firewall.
The threat tag matching module 402 is configured to perform data matching based on the security log and the traffic log, determine target traffic data corresponding to a threat event in the security log, and determine a threat type of the threat event as a threat tag of the target traffic data.
The threat model training module 403 is configured to construct a training data set according to target traffic data and threat labels corresponding to each threat event in the security log, and train a preset machine learning model through the training data set to obtain a threat identification model.
The threat flow interception module 404 is configured to perform threat identification on flow data monitored by the firewall based on the threat identification model, and intercept threat flow data.
In some alternative embodiments, the firewall log obtained by the log data obtaining module 401 includes logs corresponding to a plurality of different firewalls; the log data obtaining module 401 is further configured to determine unstructured text data of each threat event according to the security logs corresponding to each firewall in the firewall log;
and to parse the unstructured text data of each threat event to obtain structured data corresponding to each threat event in a unified structure format, wherein the parsing method is: a streaming log parsing method based on the longest common subsequence.
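Streaming log parsing based on the longest common subsequence (LCS) can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are split on whitespace, a line joins an existing template when the LCS covers at least half of its tokens, and the log lines are invented examples. The exact matching rule used by the embodiment is not given.

```python
def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack through the table to recover one longest subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


class StreamingLogParser:
    """Keeps one constant-token template per log type, updated line by line."""

    def __init__(self):
        self.templates = []

    def parse(self, line):
        """Return the template id for the line, creating or refining templates."""
        tokens = line.split()
        best_idx, best = -1, []
        for idx, template in enumerate(self.templates):
            common = lcs(template, tokens)
            if len(common) > len(best):
                best_idx, best = idx, common
        if best_idx >= 0 and len(best) * 2 >= len(tokens):
            self.templates[best_idx] = best  # keep only the shared tokens
            return best_idx
        self.templates.append(tokens)
        return len(self.templates) - 1


parser = StreamingLogParser()
ids = [parser.parse(line) for line in [
    "deny tcp from 10.0.0.1 to 10.0.0.2",
    "deny tcp from 10.0.0.9 to 10.0.0.3",
    "virus detected in file a.exe",
]]
```

Tokens of a line that are not part of its template (here the IP addresses) are the variable fields that would populate the unified structured record.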
In some optional embodiments, the threat model training module 403, when constructing a training data set according to the target traffic data and the threat label corresponding to each threat event in the security log, includes: determining the characteristic dimension of the target flow data according to the attribute information corresponding to each threat event in the security log;
Extracting sample characteristics of each characteristic dimension in the target flow data, and determining sample data corresponding to each target flow data based on the sample characteristics and threat labels of threat events corresponding to the target flow data;
and constructing a training data set according to the sample data corresponding to each target flow data.
In some alternative embodiments, the preset machine learning model trained by the threat model training module 403 is: a machine learning model constructed based on the light gradient boosting machine (LightGBM) algorithm.
The threat model training module 403, when training the preset machine learning model through the training data set, includes:
In the process of training the machine learning model constructed based on the LightGBM algorithm through the training data set, cross-validating the machine learning model;
and adjusting model parameters of the machine learning model according to the cross-validation result.
In some optional embodiments, the threat traffic interception module 404, after performing threat identification on traffic data monitored by the firewall based on the threat identification model, is further configured to:
Determining contribution values corresponding to all feature dimensions when the threat identification model identifies the threat flow data according to a model interpreter;
and outputting contribution values corresponding to the feature dimensions of the threat flow data.
In some alternative embodiments, threat traffic interception module 404, when threat identification is performed on traffic data monitored by a firewall based on a threat identification model, comprises:
threat identification is carried out on the monitored flow data through a firewall, first threat flow data in the flow data is intercepted, and the first threat flow data is removed from the flow data to obtain second flow data;
threat identification is carried out on the second traffic data through the threat identification model, and the second threat traffic data in the second traffic data is intercepted.
In some alternative embodiments, the threat traffic interception module 404 is further configured to output a threat event and a threat type corresponding to the threat traffic data.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The threat traffic data identification apparatus in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (Application Specific Integrated Circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above-described functionality.
The embodiment of the invention also provides computer equipment, which is provided with the threat flow data identification device shown in the figure 3.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 4, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 4.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, generic array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform a method for implementing the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device further comprises an input device 30 and an output device 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means; connection by a bus is taken as the example in fig. 4.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.
Claims (10)
1. A method of identifying threat traffic data, the method comprising:
obtaining a firewall log, the firewall log comprising a traffic log and a security log, wherein the security log comprises threat events intercepted by a firewall and the threat type corresponding to each threat event, and the traffic log comprises traffic data monitored by the firewall;
performing data matching based on the security log and the flow log, determining target flow data corresponding to a threat event in the security log, and determining the threat type of the threat event as a threat tag of the target flow data;
Constructing a training data set according to target flow data and threat labels corresponding to all threat events in the security log, and training a preset machine learning model through the training data set to obtain a threat identification model;
And carrying out threat identification on the traffic data monitored by the firewall based on the threat identification model, and intercepting the threat traffic data.
2. The method of claim 1, wherein the firewall log comprises: logs corresponding to a plurality of different firewalls;
Before the data matching based on the security log and the traffic log, the method further comprises:
according to the security logs corresponding to the firewalls in the firewall logs, unstructured text data of each threat event is determined;
parsing the unstructured text data of each threat event to obtain structured data corresponding to each threat event in a unified structure format, wherein the parsing method is: a streaming log parsing method based on the longest common subsequence.
3. The method of claim 1, wherein constructing a training data set from the target traffic data and threat tags corresponding to each threat event in the security log comprises:
determining the characteristic dimension of the target flow data according to attribute information corresponding to each threat event in the security log;
Extracting sample characteristics of each characteristic dimension in the target flow data, and determining sample data corresponding to each target flow data based on the sample characteristics and threat labels of threat events corresponding to the target flow data;
and constructing a training data set according to the sample data corresponding to each target flow data.
4. The method of claim 1, wherein the preset machine learning model is: a machine learning model constructed based on a light gradient boosting machine (LightGBM) algorithm;
the training the preset machine learning model through the training data set comprises the following steps:
in the process of training the machine learning model constructed based on the LightGBM algorithm through the training data set, performing cross-validation on the machine learning model;
and adjusting model parameters of the machine learning model according to the cross-validation result.
5. A method according to claim 3, wherein after threat identification is performed on traffic data monitored by a firewall based on the threat identification model, intercepting threat traffic data, the method further comprises:
Determining contribution values corresponding to all feature dimensions when the threat identification model identifies the threat flow data according to a model interpreter;
and outputting contribution values corresponding to the feature dimensions of the threat flow data.
6. The method of claim 1, wherein threat identification of traffic data monitored by a firewall based on the threat identification model comprises:
performing threat identification on the monitored flow data through a firewall, intercepting first threat flow data in the flow data, and removing the first threat flow data from the flow data to obtain second flow data;
And carrying out threat identification on the second traffic data through the threat identification model, and intercepting the second threat traffic data in the second traffic data.
7. The method according to claim 1, wherein the method further comprises:
And outputting the threat event and the threat type corresponding to the threat flow data.
8. An apparatus for identifying threat traffic data, the apparatus comprising:
a log data acquisition module, used for acquiring a firewall log, the firewall log comprising a traffic log and a security log, wherein the security log comprises threat events intercepted by a firewall and the threat type corresponding to each threat event, and the traffic log comprises traffic data monitored by the firewall;
the threat tag matching module is used for carrying out data matching based on the security log and the flow log, determining target flow data corresponding to a threat event in the security log, and determining the threat type of the threat event as a threat tag of the target flow data;
the threat model training module is used for constructing a training data set according to target flow data and threat labels corresponding to all threat events in the security log, and training a preset machine learning model through the training data set to obtain a threat identification model;
and the threat flow interception module is used for carrying out threat identification on the flow data monitored by the firewall based on the threat identification model and intercepting the threat flow data.
9. A computer device, comprising:
A memory and a processor communicatively coupled to each other, the memory having stored therein computer instructions that, upon execution, perform the method of identifying threat traffic data of any of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of identifying threat traffic data of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410783628.7A CN118827151A (en) | 2024-06-17 | 2024-06-17 | Threat flow data identification method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410783628.7A CN118827151A (en) | 2024-06-17 | 2024-06-17 | Threat flow data identification method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118827151A (en) | 2024-10-22 |
Family
ID=93067535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410783628.7A Pending CN118827151A (en) | 2024-06-17 | 2024-06-17 | Threat flow data identification method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118827151A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |