CN111431872B - Two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics - Google Patents

Info

Publication number
CN111431872B
CN111431872B CN202010163310.0A CN202010163310A CN111431872B CN 111431872 B CN111431872 B CN 111431872B CN 202010163310 A CN202010163310 A CN 202010163310A CN 111431872 B CN111431872 B CN 111431872B
Authority
CN
China
Prior art keywords
internet
tcp
stage
things
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010163310.0A
Other languages
Chinese (zh)
Other versions
CN111431872A (en
Inventor
范建存
王炳杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010163310.0A priority Critical patent/CN111431872B/en
Publication of CN111431872A publication Critical patent/CN111431872A/en
Application granted granted Critical
Publication of CN111431872B publication Critical patent/CN111431872B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Power Engineering (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a two-stage Internet of Things (IoT) device identification method based on TCP/IP protocol features. Guided by protocol knowledge, eight usable header fields of the TCP/IP protocol are extracted from device traffic as features, the MAC address is used as the label, and an initial sample set is generated after data preprocessing. To simplify the features, information gain is used to score and select them. To reduce redundant information, a two-stage IoT device identification scheme is designed, together with a corresponding OneR-NB algorithm model and a model optimization degree index. The overall accuracy reaches 99.9%, and the model optimization degree is 34.36%, that is, the two-stage sample model is 34.36% of the initial sample model. For IoT device identification, the method avoids complex feature generation by keeping only TCP handshake packets and extracting TCP/IP header fields, selects features according to the computed information gain, and uses the two-stage scheme to reduce the redundant information introduced by non-IoT devices, so that the features are sufficiently simplified while the algorithm model remains efficient.

Description

Two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics.
Background
The development of Internet of Things (IoT) technology is bringing humanity into a new era in which everything senses, everything is connected, and everything is intelligent. According to the GIV forecast, the number of personal terminals will reach 400 billion by 2025, and the economic output of the related industries after digital transformation may reach 23 trillion US dollars. However, new technologies also bring threats and challenges: according to the latest forecast from Juniper Research, IoT security spending will reach $60 billion by 2023. Unlike in traditional network security, attackers are gradually shifting from exploiting device vulnerabilities to using tools that imitate legitimate operations, and attack methods are becoming increasingly diverse. To reduce the security risk of an IoT environment, the IoT devices that connect to it must be supervised. Conventional device identification relies mainly on information such as the MAC address, IP address, and host name, but this information can be forged. To address this problem, features can be extracted from network traffic and used as IoT device fingerprints.
Although machine learning has been applied to IoT traffic, most studies focus on abnormal traffic detection, and only a few address IoT device identification. In IoT device identification, many schemes select features that are expensive to compute, or improve accuracy through specialized changes to the algorithm, and rarely consider whether the method is suitable for practical use. Practical fingerprint identification has three key points: first, the fingerprint features; second, the algorithm model; and third, the computational performance. Understanding how IoT traffic differs from traditional network traffic, and how to extract effective information from those differences, is the basis for constructing IoT device fingerprints; selecting a suitable machine learning model is the key; and the actual consumption of computing resources determines whether the method can be deployed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a two-stage IoT device identification method based on TCP/IP protocol features that sufficiently simplifies the fingerprint while keeping the algorithm model efficient.
The invention adopts the following technical scheme:
A two-stage IoT device identification method based on TCP/IP protocol features comprises the following steps:
S1, deploying a collector at the core router serving the IoT devices to collect traffic and send the data to a management end, wherein fields extracted from the TCP and IP headers serve as features and form a sample set D;
S2, calculating the information gain, ranking the features by importance, and selecting features;
S3, adopting a two-stage IoT device identification model, in which the first stage identifies whether a device is an IoT device and the second stage identifies the specific device type, using the OneR-NB method and determining a model optimization degree index, thereby completing the identification of the IoT device.
Specifically, in step S1, only packets from the TCP connection establishment phase of a device are kept during collection, as determined by the SYN flag field in the TCP header; fields are extracted from the IP and TCP headers, each sample is formed with the MAC address as its initial classification label, and all samples together form the sample set D.
Further, the fields extracted from the IP and TCP headers are: the destination IP, total length, and time to live (TTL) in the IP header, and the destination port, window size, TCP option order, maximum segment size (MSS), and window scale factor in the TCP header.
Specifically, in step S2, the information entropy of the sample set D is calculated first, then the conditional information entropy of D given feature A, and their difference is the information gain of feature A with respect to D. After the information gain has been calculated for all features, the features are sorted by information gain in descending order; the window size is selected as the feature for the first stage of IoT device identification, and the destination IP, window scale factor, and destination port are selected as the features for the second stage.
Further, the information gain G(D,A) of feature A with respect to the set D is:
G(D,A)=H(D)-H(D/A)
where D is the sample set generated in step S1, H(D) is the information entropy of the set D, and H(D/A) is the conditional entropy of the set D given feature A.
Specifically, in step S3, a traffic feature construction module starts listening and captures raw traffic packets once the device starts; a network analysis tool filters the SYN packets of TCP connections out of the traffic; the protocol's feature fields are extracted according to protocol knowledge, with the MAC address as the classification label; and finally the extracted features are converted into a vector of fixed format. A device type binary classification module uses very few features to distinguish whether the device to be classified is an IoT device or a non-IoT device, applying the OneR algorithm for this binary identification; if the device is an IoT device, an IoT device multi-classification module then performs multi-class identification with the naive Bayes algorithm.
Further, the binary identification performed by the OneR algorithm specifically comprises:
S301, selecting an attribute and establishing a rule for each value of that attribute;
S302, calculating the error rate of the rules;
S303, selecting the rules with the minimum error rate.
Further, in step S301, the frequency of occurrence of each category is calculated; the most frequent category is found; and a rule is established that assigns this category to the attribute value.
Further, assume there are x IoT device samples and y non-IoT device samples, a features are determined in advance, b features are used in the first stage, and c features are used in the second stage; the sample model optimization degree α is the ratio of the two-stage classification sample model to the initial sample model.
Furthermore, the ratio α between the two-stage classification sample model and the initial sample model is:
$$\alpha = \frac{(x+y)\,b + x\,c}{(x+y)\,a}$$
Compared with the prior art, the invention has at least the following beneficial effects:
In the two-stage IoT device identification method based on TCP/IP protocol features, the traffic collection stage does not need to capture all packets, only the SYN packets of TCP connections, which reduces storage space. The extracted features are all header fields, chosen according to protocol knowledge, and require no additional computation. A two-stage IoT device identification scheme is designed with feature selection at each stage, a OneR-NB algorithm model is provided to suit the sample data, and an optimization degree index is provided to evaluate the model.
Further, the IoT architecture comprises three layers: the application layer, the network layer, and the physical layer. This scheme starts from the traffic data of the network layer, so a collector is deployed at the core router serving the IoT devices to collect traffic and send the data to a management end. TCP/IP is currently the mainstream protocol suite for network transmission; the stable transmission mechanism of TCP and the specific nature of the network services of IoT devices make it possible to mine usable information from TCP/IP as features for device identification.
Furthermore, the invention extracts 8 fields from the IP and TCP headers: the destination IP, total length, and time to live (TTL) in the IP header, and the destination port, window size, TCP option order, maximum segment size (MSS), and window scale factor in the TCP header. The IP protocol sits at the network layer of the TCP/IP stack, and the destination IP, total length, and TTL in its header carry routing and transport information. The TCP protocol sits at the transport layer and aims to establish reliable end-to-end connections over the network; the control information in its header indicates the peer and the state of a connection. Among these fields, the destination port, the window size, the order of the options in the TCP header, and the MSS and window scale values within the options are the most characteristic. In short, the 8 selected features reflect, to the greatest extent, the differences in how different IoT devices implement the TCP/IP protocol.
Furthermore, when applying machine learning algorithms, good feature selection helps, on the one hand, to understand the characteristics and underlying structure of the data, which is important for further improving the model and algorithm. On the other hand, it reduces the number of features and thus the dimensionality, which improves model performance. The information gain reflects how much information a feature brings to the system: the larger the information gain, the more useful the feature is for classification.
Furthermore, in practice a real IoT environment contains both IoT devices and non-IoT devices, and the feature values of the non-IoT devices are of no use and introduce redundant information. The first-stage device identification designed by the invention distinguishes the two kinds of device effectively with a single feature, and the OneR algorithm applied there is the most efficient. The second stage identifies the specific IoT device; for this multi-classification problem, the naive Bayes algorithm gives the best combination of accuracy and performance. The invention also provides the model optimization degree of the two-stage identification scheme, which demonstrates the advantage of the scheme.
In conclusion, for IoT device identification the invention constructs features from the perspective of the TCP/IP protocol, thereby simplifying the fingerprint while keeping the algorithm model efficient.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a diagram of a test platform architecture;
FIG. 2 is a diagram of a two-stage Internet of things device identification scheme;
FIG. 3 is a flow chart of the IoT device identification algorithm.
Detailed Description
The invention provides a two-stage IoT device identification method based on TCP/IP protocol features. First, guided by protocol knowledge, eight usable header fields of the TCP/IP protocol are extracted from device traffic as features, the MAC address is used as the label, and an initial sample set is generated after data preprocessing. Then, to simplify the features, information gain is used to score and select them. Finally, to reduce redundant information, a two-stage IoT device identification scheme is designed, a suitable OneR-NB algorithm model is provided, and a model optimization degree is defined. Implementation and analysis show that the overall accuracy of the algorithm reaches 99.9% and that the model optimization degree is 34.36%, that is, the two-stage sample model is 34.36% of the initial sample model. For IoT device identification, the invention avoids complex feature generation by keeping only TCP handshake packets and extracting TCP/IP header fields, then selects features according to the computed information gain, and uses the two-stage scheme to reduce the redundant information introduced by non-IoT devices, so that the features are sufficiently simplified while the algorithm model remains efficient.
The invention discloses a two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics, which comprises the following steps:
S1, data set acquisition and feature extraction
Referring to FIG. 1, the data used in the invention come from traffic collected in a real network environment. The traffic originates from various IoT devices, including power supplies, applications, health monitoring devices, cameras, and controllers, and from non-IoT devices such as mobile phones, tablets, and notebook computers. Through port mirroring at the router, the traffic at the gateway is copied to another server, where it is stored and analyzed. The raw pcap traffic capture is processed with the Wireshark network analyzer in the following steps:
First, the display filter tcp.flags == 0x002 is entered to filter out the TCP SYN packets of all devices;
Second, the destination IP, total length, and time to live (TTL) in the IP header, and the destination port, window size, TCP option order, maximum segment size (MSS), and window scale factor in the TCP header, are each applied as a column, and the source MAC address is also applied as a column;
Third, the packet dissection result is exported as a CSV file to serve as the initial sample set; a Python script handles missing values, converts all features to text type, and reassigns the classification labels according to the MAC addresses.
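The filtering and field extraction above are done in Wireshark in the original workflow. As a rough illustration of what those steps amount to at the byte level, the following sketch parses a raw Ethernet/IPv4 frame with the Python standard library only. The function name `parse_syn_features`, the dictionary keys, and the hand-built demo frame with made-up addresses are all assumptions for illustration and are not part of the patent.

```python
import struct

def parse_syn_features(frame: bytes):
    """Return the eight header fields and the source MAC of a pure TCP SYN
    frame (Ethernet + IPv4), or None for anything else."""
    if len(frame) < 14 + 20 + 20:
        return None
    _dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:                      # IPv4 only in this sketch
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                     # IP header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 20:         # protocol 6 = TCP
        return None
    tcp = ip[ihl:]
    flags = ((tcp[12] & 0x0F) << 8) | tcp[13]    # 12-bit flags field
    if flags != 0x002:                           # SYN set, all other flags clear
        return None
    header_len = (tcp[12] >> 4) * 4
    kinds, mss, wscale = [], None, None
    opts, i = tcp[20:header_len], 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                            # End of Option List
            break
        if kind == 1:                            # No-Operation is a single byte
            kinds.append(kind)
            i += 1
            continue
        if i + 1 >= len(opts) or opts[i + 1] < 2 or i + opts[i + 1] > len(opts):
            break                                # malformed option, stop parsing
        length = opts[i + 1]
        kinds.append(kind)
        if kind == 2 and length == 4:            # Maximum Segment Size
            mss = struct.unpack("!H", opts[i + 2:i + 4])[0]
        elif kind == 3 and length == 3:          # Window Scale (shift count)
            wscale = opts[i + 2]
        i += length
    return {
        "src_mac": ":".join(f"{b:02x}" for b in src_mac),   # used as the label
        "dst_ip": ".".join(str(b) for b in ip[16:20]),
        "total_length": struct.unpack("!H", ip[2:4])[0],
        "ttl": ip[8],
        "dst_port": struct.unpack("!H", tcp[2:4])[0],
        "window_size": struct.unpack("!H", tcp[14:16])[0],
        "option_kinds": kinds,
        "mss": mss,
        "window_scale": wscale,
    }

# Hand-built demo frame (made-up addresses): options are MSS 1460, NOP, window scale 7.
_eth = bytes.fromhex("0200000000fe") + bytes.fromhex("020000000001") + b"\x08\x00"
_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 48, 0, 0, 64, 6, 0,
                  bytes([192, 0, 2, 10]), bytes([198, 51, 100, 7]))
_tcp = struct.pack("!HHIIBBHHH", 40000, 443, 1, 0, 0x70, 0x02, 5840, 0, 0)
_opts = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7])
DEMO_FRAME = _eth + _ip + _tcp + _opts
print(parse_syn_features(DEMO_FRAME))
```

The sketch reports the window scale as the raw shift count; a multiplier, if needed, would be 2 raised to that shift count. IPv6, VLAN tags, and checksum validation are deliberately left out.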
Table 1 shows the statistics of the device samples.
TABLE 1 statistical table of sample numbers of devices
Figure BDA0002406556470000071
Figure BDA0002406556470000081
Figure BDA0002406556470000091
S2, information gain calculation and feature selection
To simplify the fingerprint, the invention selects features using information gain as the index. Let G(D,A) denote the information gain of feature A with respect to the data set D, H(D) the information entropy of the set D, and H(D/A) the conditional entropy of the set D given feature A. The information gain is then:
G(D,A)=H(D)-H(D/A) (1)
The information entropy of the sample set D is calculated first, then the conditional information entropy of D given feature A; their difference is the information gain of feature A with respect to D.
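A minimal Python sketch of formula (1) follows, assuming every feature is treated as a discrete (text) value, as in the preprocessing step. The function names and the toy samples are illustrative assumptions, not data from the patent.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Information entropy H(D) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """G(D, A) = H(D) - H(D/A) for one discrete feature A."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)                      # split D by the value of A
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy samples (made up): window size separates the classes, TTL does not.
labels = ["iot", "iot", "non_iot", "non_iot"]
window_size = ["5840", "5840", "65535", "65535"]
ttl = ["64", "64", "64", "64"]
print(information_gain(window_size, labels), information_gain(ttl, labels))
```

Sorting the features by this value in descending order gives a ranking of the kind used for feature selection.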
In a specific implementation, a Python script can be written according to the formula, or the information gain module of the Weka data mining tool can be used for scoring. The scores for the first and second stages are shown in Tables 2 and 3.
TABLE 2 first stage feature scoring Table
Feature              Score
Window size value    0.9801
Destination IP       0.9092
Multiplier           0.7527
Destination Port     0.673
Kind                 0.6373
Length               0.6055
Time to live         0.09
MSS Value            0.0241
TABLE 3 second stage feature scoring Table
Figure BDA0002406556470000101
Figure BDA0002406556470000111
In Table 2, the window size not only has the highest information gain but also, compared with the destination IP, takes fewer discrete values, which makes modeling easier, so the window size is used as the first-stage feature. Table 3 shows that the first four features have the strongest classification ability; considering that the window size is already used in the first stage, the first three features, namely the destination IP, the window scale factor, and the destination port, are selected as the second-stage features.
S3, two-stage IoT device identification
In a real network environment, non-IoT devices such as mobile phones and tablets mainly run Android or Apple systems, so their protocol stack implementations differ little; at the same time, the destination IP and port depend on the services the user accesses, which generates a large amount of redundant information.
Referring to FIG. 2, the invention designs a two-stage IoT device identification scheme comprising the following three modules:
Traffic feature construction module
This module starts listening and captures raw traffic packets once the device starts, then uses a network analysis tool to filter the SYN packets of TCP connections out of the traffic, then extracts the protocol's feature fields according to protocol knowledge with the MAC address as the classification label, and finally converts the extracted features into a vector of fixed format.
Device type binary classification module
This module uses a binary classifier to distinguish, with very few features, whether the device to be classified is an IoT device or a non-IoT device, thereby reducing the redundant information introduced by non-IoT devices.
IoT device multi-classification module
This module uses a multi-class machine learning algorithm, which keeps the identification algorithm accurate and efficient on the reduced feature set.
The 8 determined features are treated as text, and the classification and identification of IoT devices is close to text classification, so the invention provides the OneR-NB algorithm model: the OneR algorithm first performs binary identification to determine whether the device is an IoT device, and if it is, the NB (naive Bayes) algorithm then performs multi-class identification. The flow chart of the algorithm is shown in FIG. 3.
The idea of the OneR algorithm is simple: it builds rules that test a single attribute and branch on its values. Each branch corresponds to one attribute value, and the class assigned to a branch is the class that occurs most often among the training data on that branch. The steps of the algorithm are as follows:
S301, selecting an attribute and establishing a rule for each value of that attribute as follows:
a. calculating the frequency of occurrence of each category;
b. finding the most frequent category;
c. establishing a rule that assigns this category to the attribute value;
S302, calculating the error rate of the rules;
S303, selecting the rules with the minimum error rate.
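Steps S301 to S303 can be sketched in a few lines of Python. This is a minimal illustration that assumes each sample is a dictionary of discrete feature values; the name `one_r` and the toy samples are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (best_attribute, rule, error_rate) following steps S301-S303."""
    best = None
    for attribute in rows[0]:
        # S301: count classes per attribute value, then map each value
        # to its most frequent class.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attribute]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # S302: error rate = share of samples not in the majority class
        # of their branch.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        error_rate = errors / len(labels)
        # S303: keep the attribute whose rules have the lowest error rate.
        if best is None or error_rate < best[2]:
            best = (attribute, rule, error_rate)
    return best

# Toy samples (made up for illustration).
rows = [
    {"window_size": "5840", "ttl": "64"},
    {"window_size": "5840", "ttl": "64"},
    {"window_size": "65535", "ttl": "64"},
    {"window_size": "65535", "ttl": "128"},
]
labels = ["iot", "iot", "non_iot", "non_iot"]
attribute, rule, error_rate = one_r(rows, labels)
print(attribute, rule, error_rate)
```

Because the resulting model is a single lookup table on one attribute, prediction costs one dictionary access, which is in line with the low computational complexity the text attributes to OneR.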
The first stage is an ordinary binary classification problem, so the accuracy and model computational complexity of the OneR algorithm are compared with those of other common algorithms. Table 4 shows that, although every algorithm reaches 100% accuracy in the first stage, the OneR algorithm adopted by the invention has lower computational complexity.
TABLE 4 first stage algorithmic model comparison
Figure BDA0002406556470000121
Figure BDA0002406556470000131
The naive Bayes algorithm is one of the most widely used classification algorithms. It simplifies the Bayes algorithm by assuming that, given the target value, the attributes are conditionally independent of one another. Let the sample data set be $D=\{d_1,d_2,\ldots,d_n\}$, the feature attribute set of the sample data be $X=\{x_1,x_2,\ldots,x_d\}$, and the class variable be $Y=\{y_1,y_2,\ldots,y_m\}$, so that D can be divided into m classes, where $x_1,x_2,\ldots,x_d$ are mutually independent and random. The prior probability of Y is $P_{prior}=P(Y)$ and the posterior probability of Y is $P_{post}=P(Y \mid X)$. The posterior probability can be calculated from the prior probability $P_{prior}=P(Y)$, the feature probability $P(X)$, and the class-conditional probability $P(X \mid Y)$ as follows:
$$P(Y \mid X) = \frac{P(Y)\,P(X \mid Y)}{P(X)}$$
Naive Bayes assumes that the features are mutually independent, so for a given class y the class-conditional probability can be further expressed as:
$$P(X \mid Y=y) = \prod_{i=1}^{d} P(x_i \mid Y=y)$$
From the two formulas above, the posterior probability is:
$$P_{post} = P(Y \mid X) = \frac{P(Y)\,\prod_{i=1}^{d} P(x_i \mid Y)}{P(X)}$$
Since $P(X)$ is fixed, only the numerators of the above formula need to be compared when comparing posterior probabilities. The naive Bayes calculation for a sample belonging to class $y_i$ is therefore:
$$P(y_i \mid x_1,x_2,\ldots,x_d) \propto P(y_i)\,\prod_{j=1}^{d} P(x_j \mid y_i)$$
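The comparison of numerators described above can be sketched as a small categorical naive Bayes classifier. This is an illustration only: the class name `CategoricalNB`, the add-one (Laplace) smoothing, the use of log probabilities, and the toy samples are assumptions that the patent does not specify.

```python
import math
from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes over discrete features; compares P(y) * prod_j P(x_j | y)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing               # Laplace smoothing (assumption)

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)     # (class, j) -> value counts
        self.vocab = [set() for _ in rows[0]]        # distinct values per feature
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[(y, j)][v] += 1
                self.vocab[j].add(v)
        return self

    def log_score(self, row, y):
        """log P(y) + sum_j log P(x_j | y); the shared P(X) term is dropped."""
        score = math.log(self.class_counts[y] / self.n)
        for j, v in enumerate(row):
            num = self.value_counts[(y, j)][v] + self.smoothing
            # One extra slot in the denominator is reserved for unseen values.
            den = self.class_counts[y] + self.smoothing * (len(self.vocab[j]) + 1)
            score += math.log(num / den)
        return score

    def predict(self, row):
        return max(self.class_counts, key=lambda y: self.log_score(row, y))

# Toy second-stage samples: (destination IP, window scale, destination port).
rows = [
    ("192.0.2.10", "2", "443"), ("192.0.2.10", "2", "443"),
    ("198.51.100.7", "0", "8883"), ("198.51.100.7", "0", "8883"),
]
labels = ["camera", "camera", "plug", "plug"]
model = CategoricalNB().fit(rows, labels)
print(model.predict(("192.0.2.10", "2", "443")))
print(model.predict(("198.51.100.7", "0", "1883")))
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps a single unseen value, such as a new destination port, from forcing a class score to zero.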
The second stage is a multi-classification problem, so naive Bayes is compared with two other widely used algorithms, the decision tree and the support vector machine. The model comparison is shown in Table 5.
TABLE 5 second stage algorithmic model comparison
Figure BDA0002406556470000143
Table 5 shows that all three algorithms are highly accurate and differ little. Naive Bayes and the decision tree have about the same accuracy and model complexity, but naive Bayes misclassifies fewer samples per class. The support vector machine has the best accuracy, but its model complexity is higher because it builds multiple binary classifiers to solve the multi-classification problem. It can be expected that, as the number of IoT device classes grows, the complexity of this model will increase exponentially and the support vector machine will consume more resources in practice. In conclusion, the naive Bayes algorithm selected by the invention is the best choice for this stage.
The invention combines the OneR algorithm with the naive Bayes algorithm. The OneR algorithm builds a model from simple rules and quickly determines whether a device is an IoT device, which removes the redundant feature information introduced by the large number of non-IoT device samples.
In the second stage, this reduces redundant computation when the naive Bayes model is built, so the model is faster.
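The control flow of the combined model can be sketched as follows, under simplifying assumptions: the first stage is a lookup table on the window size of the kind OneR produces, the second stage is any callable that maps the three second-stage features to a device type, and unseen window sizes are treated as non-IoT. All names and values here are hypothetical.

```python
def identify_device(sample, stage1_rule, stage2_predict):
    """Two-stage identification: binary rule first, multi-class model second."""
    # Stage 1: single-attribute rule on the window size.
    # Treating unseen window sizes as non-IoT is an assumption of this sketch.
    if stage1_rule.get(sample["window_size"], "non_iot") != "iot":
        return "non_iot"
    # Stage 2: only IoT samples reach the multi-class model.
    features = (sample["dst_ip"], sample["window_scale"], sample["dst_port"])
    return stage2_predict(features)

# Toy stand-ins for the trained models (made up for illustration).
stage1_rule = {"5840": "iot", "65535": "non_iot"}
stage2_table = {("192.0.2.10", "2", "443"): "camera"}

def stage2_predict(features):
    return stage2_table.get(features, "unknown_iot")

camera = {"window_size": "5840", "dst_ip": "192.0.2.10",
          "window_scale": "2", "dst_port": "443"}
phone = {"window_size": "65535", "dst_ip": "203.0.113.5",
         "window_scale": "8", "dst_port": "443"}
print(identify_device(camera, stage1_rule, stage2_predict))
print(identify_device(phone, stage1_rule, stage2_predict))
```

Non-IoT samples return after the first lookup, so their remaining feature values are never read by the second stage.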
Finally, the optimization degree of the two-stage model relative to the initial model is evaluated.
Assume there are x IoT device samples and y non-IoT device samples, a features are determined in advance, b features are used in the first stage, and c features are used in the second stage. The sample model optimization degree, denoted α, is the ratio of the two-stage classification sample model to the initial sample model, namely:
$$\alpha = \frac{(x+y)\,b + x\,c}{(x+y)\,a}$$
Of all 256376 samples, 149440 come from IoT devices and 106936 from non-IoT devices; 8 features are determined in advance, 1 feature is used in the first stage, and 3 features are used in the second stage. Substituting these values into the formula gives a model optimization degree of 34.36%, that is, the two-stage sample model is 34.36% of the initial sample model.
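The arithmetic can be checked with a short script. It reads the ratio as (first-stage entries over all samples plus second-stage entries over IoT samples) divided by (all features over all samples); this reading is an assumption, chosen because it reproduces the stated 34.36%. The function name is illustrative.

```python
def optimization_degree(x, y, a, b, c):
    """Ratio of the two-stage sample model size to the initial sample model size."""
    initial = (x + y) * a                 # every sample carries all a features
    two_stage = (x + y) * b + x * c       # b features for all, c more for IoT only
    return two_stage / initial

alpha = optimization_degree(x=149440, y=106936, a=8, b=1, c=3)
print(f"{alpha:.2%}")  # 34.36%
```

With these numbers the initial model holds 2051008 feature entries and the two-stage model holds 704696.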
In summary, the two-stage IoT device identification method based on TCP/IP protocol features first details the collection of the data set and the feature extraction process, determining 8 features according to protocol knowledge: the destination IP, total length, and time to live (TTL) in the IP header, and the destination port, window size, TCP option order, maximum segment size (MSS), and window scale factor in the TCP header. Information gain is then calculated for each feature and the features are ranked; selecting among them simplifies the fingerprint. Finally, to reduce the redundant feature information introduced by non-IoT devices, a two-stage device identification scheme is designed and a suitable OneR-NB algorithm is determined; verification on the data shows a model optimization degree of 34.36%, that is, the two-stage sample model is 34.36% of the initial sample model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing merely illustrates the technical idea of the present invention and does not limit its scope of protection; any modification made on the basis of this technical idea falls within the scope of protection of the claims of the present invention.

Claims (7)

1. A two-stage Internet of Things device identification method based on TCP/IP protocol features, characterized by comprising the following steps:
S1, deploying a collector at the core router serving the Internet of Things devices to capture traffic and send the data to a management end; during capture, retaining only the packets that the devices send in the TCP connection phase, as determined by the SYN flag field of the TCP header; extracting fields from the IP and TCP headers, forming one sample from these fields with the MAC address as the initial classification label, and collecting all samples into a sample set D;
S2, calculating the information gain of each feature, ranking the features by importance, and selecting features;
S3, applying a two-stage Internet of Things device identification model, in which the first stage identifies whether a device is an Internet of Things device and the second stage identifies the specific type of the device, and determining a model optimization degree index, thereby completing the identification of the Internet of Things devices;
wherein a traffic feature construction module begins monitoring and capturing raw traffic packets once a device starts up, uses a network analysis tool to filter the SYN packets of TCP connections from the captured traffic, extracts the feature fields of the protocol according to protocol knowledge, uses the MAC address as the classification label, and finally converts the extracted features into fixed-format vectors; a device type binary classification module uses the features to distinguish whether the device to be classified is an Internet of Things device or a non-Internet of Things device; and an Internet of Things device multi-classification module is used, wherein the OneR algorithm performs binary classification to determine whether the device is an Internet of Things device and, if it is, a naive Bayes algorithm performs multi-class identification; the binary classification performed by the OneR algorithm specifically comprises:
S301, selecting an attribute and establishing a rule for each value of that attribute;
S302, calculating the error rate of the rules;
S303, selecting the rule with the smallest error rate.
2. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 1, wherein the fields extracted from the IP header comprise the destination IP address, the total length, and the time to live (TTL), and the fields extracted from the TCP header comprise the destination port, the window size, the TCP option order, the maximum segment size (MSS), and the window scale factor.
3. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 1, wherein in step S2 the information entropy of the sample set D is calculated first, then the conditional entropy of D given a feature A is calculated, and their difference is the information gain of feature A with respect to D; after the information gain of every feature has been calculated, the features are sorted in descending order of information gain; the TCP window size is selected as the feature for the first stage of Internet of Things device identification, and the destination IP address of the IP packet, the TCP window scale factor, and the TCP destination port are selected as the features for the second stage.
4. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 3, wherein the information gain G(D,A) of feature A with respect to set D is:
G(D,A)=H(D)-H(D|A)
where D is the sample set generated in step S1, H(D) is the information entropy of the set D, and H(D|A) is the conditional entropy of the set D given the feature A.
5. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 1, wherein in step S301 the frequency of each category is counted, the most frequent category is found, and a rule is established that assigns this category to the attribute value.
6. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 1, wherein, assuming x Internet of Things device samples and y non-Internet of Things device samples, a features are predetermined, b features are selected in the first stage, and c features are selected in the second stage, and the sample model optimization degree α represents the ratio of the two-stage classification sample model to the initial sample model.
7. The two-stage Internet of Things device identification method based on TCP/IP protocol features according to claim 6, wherein the ratio α of the two-stage classification sample model to the initial sample model is:
α = ((x + y)·b + x·c) / ((x + y)·a)
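The two-stage flow recited in the claims (OneR for the binary decision, then naive Bayes for the device type) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the feature names, the toy samples, the Laplace smoothing, and the choice to treat unseen attribute values as non-Internet of Things are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def one_r(rows, labels, attributes):
    """OneR: per attribute, map each value to its most frequent class (S301),
    compute the error rate of those rules (S302), keep the lowest (S303)."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        error_rate = errors / len(rows)
        if best is None or error_rate < best[2]:
            best = (attr, rules, error_rate)
    return best


def train_nb(rows, labels, attributes):
    """Count-based categorical naive Bayes model."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value frequencies
    vocab = defaultdict(set)             # attribute -> distinct values seen
    for row, label in zip(rows, labels):
        for attr in attributes:
            value_counts[(label, attr)][row[attr]] += 1
            vocab[attr].add(row[attr])
    return class_counts, value_counts, vocab


def predict_nb(model, row, attributes):
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())

    def log_posterior(cls):
        score = math.log(class_counts[cls] / total)
        for attr in attributes:
            seen = value_counts[(cls, attr)][row[attr]]
            # Laplace smoothing (an added assumption) avoids zero probabilities.
            score += math.log((seen + 1) / (class_counts[cls] + len(vocab[attr])))
        return score

    return max(class_counts, key=log_posterior)


def identify(row, stage1, nb_model, nb_attrs):
    """Stage 1: OneR binary decision. Stage 2: naive Bayes device type."""
    attr, rules, _ = stage1
    if rules.get(row[attr]) != "iot":  # unseen values are treated as non-IoT here
        return "non-iot"
    return predict_nb(nb_model, row, nb_attrs)


# Hypothetical toy samples: (window size, destination port, window scale, device type).
data = [
    (5840, 443, 2, "camera"), (5840, 443, 2, "camera"),
    (5840, 1883, 0, "plug"), (5840, 1883, 0, "plug"),
    (65535, 443, 8, None), (64240, 80, 8, None),
]
rows = [{"window_size": w, "dst_port": p, "window_scale": s} for w, p, s, _ in data]
types = [t for *_, t in data]
is_iot = ["iot" if t else "non-iot" for t in types]

stage1 = one_r(rows, is_iot, ["window_size", "dst_port"])
nb_attrs = ["dst_port", "window_scale"]
nb_model = train_nb([r for r, t in zip(rows, types) if t], [t for t in types if t], nb_attrs)

probe = {"window_size": 5840, "dst_port": 1883, "window_scale": 0}
print(stage1[0], identify(probe, stage1, nb_model, nb_attrs))
```

Training the second stage only on Internet of Things samples mirrors the motivation given in the description: non-Internet of Things samples never reach the second-stage features.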

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010163310.0A CN111431872B (en) 2020-03-10 2020-03-10 Two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics

Publications (2)

Publication Number Publication Date
CN111431872A (en) 2020-07-17
CN111431872B (en) 2021-04-20

Family

ID=71547668

Country Status (1)

Country Link
CN (1) CN111431872B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant