CN111431872B

CN111431872B - Two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics

Info

Publication number: CN111431872B
Application number: CN202010163310.0A
Authority: CN
Inventors: 范建存; 王炳杰
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2021-04-20
Anticipated expiration: 2040-03-10
Also published as: CN111431872A

Abstract

The invention discloses a two-stage Internet of Things (IoT) device identification method based on TCP/IP protocol features. Guided by protocol knowledge, eight usable header fields of the TCP/IP protocol are extracted from device traffic as features, the MAC address is used as the label, and an initial sample set is generated after data preprocessing. To simplify the features, information gain is used to score and select them. To reduce redundant information, a two-stage IoT device identification scheme is designed, together with a matching OneR-NB algorithm model and a model optimization degree metric. The overall accuracy reaches 99.9%, and the model optimization degree, i.e. the ratio of the two-stage sample model to the initial sample model, is 34.36%. For IoT device identification, the method keeps only TCP handshake packets and extracts TCP/IP header fields directly, which avoids complex feature generation; features are then selected by their computed information gain, and the two-stage scheme reduces the redundant information introduced by non-IoT devices, so that the features are substantially simplified while the algorithm model remains efficient.

Description

Two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics.

Background

The development of Internet of Things (IoT) technology is leading humanity into a new era in which everything senses, everything is connected, and everything is intelligent. According to the GIV forecast, the number of personal terminals will reach 400 billion by 2025, and the economic output of the digital transformation of related industries may reach 23 trillion US dollars. New technologies also bring threats and challenges: according to the latest forecast from Juniper Research, IoT security spending will reach $60 billion by 2023. Unlike in traditional network security, attackers are gradually shifting from exploiting device vulnerabilities to using tools that imitate legitimate operations, and attack methods are increasingly diverse. To reduce the security risk of the IoT environment, connected IoT devices must be supervised. Conventional device identification relies mainly on information such as the MAC address, IP address, and host name, but this information can be forged. To address this, features can be extracted from network traffic to serve as IoT device fingerprints.

Although machine learning has been applied to IoT traffic, most studies focus on abnormal traffic detection, and only a few address IoT device identification. For device identification, many schemes select features that are expensive to compute, or improve accuracy through special modifications to the algorithm, and rarely consider whether the method is suitable for practical use. Practical fingerprint identification has three key points: first, the fingerprint features; second, the algorithm model; and third, the computing performance. Understanding how IoT traffic differs from traditional network traffic, and how to extract effective information from those differences, is the basis for constructing IoT device fingerprints; selecting a suitable machine learning model is the key; and keeping the actual consumption of computing resources within practical limits is what makes the method deployable.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a two-stage IoT device identification method based on TCP/IP protocol features that substantially simplifies the fingerprint while keeping the algorithm model efficient.

The invention adopts the following technical scheme:

A two-stage IoT device identification method based on TCP/IP protocol features comprises the following steps:

S1, deploying a collector at the core router of the IoT devices to collect traffic and send the data to a management end, wherein the extracted features are fields in the TCP or IP packet headers, which form a sample set D;

S2, calculating the information gain, ranking the features by importance, and selecting features;

S3, adopting a two-stage IoT device identification model, in which the first stage identifies whether a device is an IoT device and the second stage identifies the specific device type, determining a model optimization degree index using the OneR-NB method, and completing the identification of the IoT devices.

Specifically, in step S1, only packets from the TCP connection phase of each device are retained during collection, as determined by the SYN flag field in the TCP packet header; fields are extracted from the IP and TCP packet headers, each sample is formed with the MAC address as the initial classification label, and all samples together form the sample set D.

Further, the fields extracted from the IP and TCP packet headers are: the destination IP, total length, and time to live in the IP header, and the destination port, window size, TCP option order, maximum segment size, and window scale factor in the TCP header.

Specifically, in step S2, the information entropy of the sample set D is calculated first, then the conditional entropy of D given feature A, and their difference is the information gain of feature A with respect to D. After the information gain has been calculated for all features, the features are sorted in descending order of information gain; the window size is selected as the feature for the first stage of IoT device identification, and the destination IP, window scale factor, and destination port are selected as the features for the second stage.

Further, the information gain $G(D, A)$ of feature A with respect to set D is:

$$G(D, A) = H(D) - H(D \mid A)$$

where D is the sample set generated in step S1, $H(D)$ is the information entropy of D, and $H(D \mid A)$ is the conditional entropy of D given feature A.

Specifically, in step S3, the traffic feature construction module starts monitoring and capturing raw traffic packets once the device starts, uses a network analysis tool to filter the SYN packets of TCP connections out of the traffic, extracts the protocol feature fields according to protocol knowledge with the MAC address as the classification label, and finally converts the extracted features into a vector with a fixed format; the device type binary classification module uses few features to distinguish whether the device to be classified is an IoT device or a non-IoT device; the IoT device multi-classification module identifies the specific device type. The OneR algorithm performs the binary classification that determines whether a device is an IoT device, and if it is, the naive Bayes algorithm performs the multi-class identification.

Further, the binary classification performed by the OneR algorithm specifically comprises:

S301, selecting an attribute and establishing a rule for each value of that attribute;

S302, calculating the error rate of the rules;

S303, selecting the rule with the minimum error rate.

Further, in step S301, the rule for each attribute value is established by calculating the frequency of each category, finding the most frequent category, and assigning that category to the attribute value.

Further, assume there are x IoT device samples and y non-IoT device samples, a features are predetermined, b features are used in the first stage, and c features are used in the second stage; the sample model optimization degree α is the ratio of the two-stage classification sample model to the initial sample model.

Furthermore, the ratio α between the two-stage classification sample model and the initial sample model is:

$$\alpha = \frac{(x + y)\,b + x\,c}{(x + y)\,a}$$

Compared with the prior art, the invention has at least the following beneficial effects:

In the two-stage IoT device identification method based on TCP/IP protocol features, not all packets need to be collected in the traffic collection stage; only the SYN packets of TCP connections are collected, which reduces the storage space. The extracted features are all fields in the packet headers, chosen according to protocol knowledge, so they require no additional computation. A two-stage IoT device identification scheme is designed, feature selection is carried out for each stage, a OneR-NB algorithm model is provided to suit the sample data, and finally an index for evaluating the optimization degree of the model is provided.

Further, the IoT architecture comprises three layers: an application layer, a network layer, and a physical layer. The scheme starts from the traffic data of the network layer, so a collector is deployed at the core router of the IoT devices to collect traffic and send the data to a management end. TCP/IP is currently the mainstream protocol suite for network transmission; the stable transmission mechanism of TCP and the specific nature of the network services used by IoT devices make it possible to mine usable information from TCP/IP as features for device identification.

Furthermore, the invention extracts 8 fields from the IP and TCP packet headers: the destination IP, total length, and time to live in the IP header, and the destination port, window size, TCP option order, maximum segment size, and window scale factor in the TCP header. The IP protocol sits at the network layer of the TCP/IP stack, and the destination IP, total length, and time to live in its header clearly convey routing and transport information. The TCP protocol sits at the transport layer and aims to establish a reliable end-to-end connection over the network; the control information in its header indicates the peer and the state of the connection. These are best reflected by the destination port, the window size, the order of the options in the TCP header, and the maximum segment size and window scale factor values in the options. In summary, the 8 features selected by the invention best expose the differences in how different IoT devices implement the TCP/IP protocol.

Furthermore, when applying machine learning algorithms, good feature selection helps in understanding the characteristics and underlying structure of the data, which is important for further improving the model and algorithm. Good feature selection also reduces the number of features, achieving dimensionality reduction and thereby improving the performance of the model. The information gain reflects the amount of information a feature brings to a system: the larger the information gain, the more useful the feature is for classification.

Furthermore, in practical applications a real IoT environment contains both IoT devices and non-IoT devices, and the feature values of non-IoT devices are not useful for identification and introduce redundant information. The first-stage identification designed by the invention distinguishes the two kinds of device effectively using only one feature, and the OneR algorithm applied there is the most efficient. The second stage identifies the specific IoT device; for this multi-class problem, the naive Bayes algorithm offers the best combination of accuracy and performance. The invention also provides the model optimization degree of the two-stage identification scheme, which shows the advantage of the scheme.

In conclusion, for IoT device identification the invention constructs features from the perspective of the TCP/IP protocol, thereby simplifying the fingerprint while ensuring the efficiency of the algorithm model.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

FIG. 1 is a diagram of a test platform architecture;

FIG. 2 is a diagram of a two-stage Internet of things device identification scheme;

FIG. 3 is a flow chart of the IoT device identification algorithm.

Detailed Description

The invention provides a two-stage IoT device identification method based on TCP/IP protocol features. First, guided by protocol knowledge, eight usable header fields of the TCP/IP protocol are extracted from device traffic as features, the MAC address is used as the label, and an initial sample set is generated after data preprocessing. Then, to simplify the features, information gain is used to score and select them. Finally, to reduce redundant information, a two-stage IoT device identification scheme is designed, together with a suitable OneR-NB algorithm model and a model optimization degree metric. Implementation and analysis show that the overall accuracy of the algorithm reaches 99.9% and that the model optimization degree, i.e. the ratio of the two-stage sample model to the initial sample model, is 34.36%. For IoT device identification, the invention keeps only TCP handshake packets and extracts TCP/IP header fields directly, which avoids complex feature generation; features are then selected by their computed information gain, and the two-stage scheme reduces the redundant information introduced by non-IoT devices, so that the features are substantially simplified while the algorithm model remains efficient.

The invention discloses a two-stage Internet of things equipment identification method based on TCP/IP protocol characteristics, which comprises the following steps:

S1, data set acquisition and feature extraction

Referring to FIG. 1, the data used in the invention comes from traffic collected in a real network environment. The traffic originates from various IoT devices, including power supplies, applications, health monitoring devices, cameras, and controllers, and from non-IoT devices such as mobile phones, tablets, and notebook computers. Through port mirroring at the router, the traffic at the gateway can be copied to another server for storage and analysis. The raw pcap capture is processed with the Wireshark network analyzer in the following steps:

First, enter the display filter tcp.flags == 0x002 to keep only the TCP SYN packets of all devices;

Second, apply the destination IP, total length, and time to live fields of the IP header and the destination port, window size, TCP option order, maximum segment size, and window scale factor fields of the TCP header as columns, and also apply the source MAC address as a column;

Third, export the packet dissection results as a CSV file to serve as the initial sample, write a Python script to handle missing values, convert all features to text type, and recalibrate the classification labels according to the MAC addresses.
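The third step can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the patent: the column names in `FEATURES`, the `src_mac` column, the `MAC_TO_LABEL` mapping, and the `"NA"` placeholder are all hypothetical, and a real Wireshark export carries whatever column titles were configured.

```python
import csv
import io

# Hypothetical column names for the eight features; a real export uses
# the column titles configured in Wireshark.
FEATURES = ["dst_ip", "total_length", "ttl", "dst_port",
            "window_size", "tcp_option_order", "mss", "window_scale"]

# Hypothetical mapping from source MAC address to a device label.
MAC_TO_LABEL = {
    "aa:bb:cc:00:00:01": "camera",
    "aa:bb:cc:00:00:02": "phone",
}

def preprocess(csv_text, missing="NA"):
    """Return (feature_tuple, label) samples from an exported CSV string."""
    samples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = MAC_TO_LABEL.get(row["src_mac"].strip().lower())
        if label is None:
            continue  # unknown MAC address: no label can be assigned
        # Keep every feature as text and fill empty cells with a placeholder.
        features = tuple((row.get(name) or "").strip() or missing
                         for name in FEATURES)
        samples.append((features, label))
    return samples

demo = (
    "src_mac,dst_ip,total_length,ttl,dst_port,window_size,"
    "tcp_option_order,mss,window_scale\n"
    'AA:BB:CC:00:00:01,10.0.0.5,60,64,443,5840,"2,4,8,1,3",1460,4\n'
    'aa:bb:cc:00:00:02,10.0.0.9,52,64,80,65535,"2,1,3",1460,\n'
    'aa:bb:cc:00:00:99,10.0.0.7,52,64,80,65535,"2,1,3",1460,6\n'
)
samples = preprocess(demo)
```

Keeping the values as strings mirrors the text-type conversion described above, so a port such as 443 is treated as a category rather than a number.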

Table 1 shows the statistics of the device samples.

TABLE 1 statistical table of sample numbers of devices

S2, information gain calculation and feature selection

To simplify the fingerprint, the invention selects features using information gain as the index. Let $G(D, A)$ denote the information gain of feature A with respect to data set D, let $H(D)$ be the information entropy of D, and let $H(D \mid A)$ be the conditional entropy of D given feature A. The information gain is then:

$$G(D, A) = H(D) - H(D \mid A) \qquad (1)$$

The information entropy of the sample set D is calculated first, then the conditional entropy of D given feature A, and their difference is the information gain of feature A with respect to D.

In a specific implementation, a Python script can be written according to the formula, and the information gain module of the Weka data mining tool is used for scoring; the scores for the first and second stages are shown in Tables 2 and 3.
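Formula (1) translates directly into code. The sketch below is one possible script, assuming discrete (text) feature values and entropy measured in bits; the function names and the toy data are illustrative only.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy H(D), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """G(D, A) = H(D) - H(D|A) for one discrete feature A."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)       # split D by the value of A
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data: window size separates the classes, time to live does not.
labels = ["iot", "iot", "non-iot", "non-iot"]
window = ["5840", "5840", "65535", "65535"]
ttl = ["64", "64", "64", "64"]
gain_window = information_gain(window, labels)
gain_ttl = information_gain(ttl, labels)
```

As a cross-check, with the sample counts reported later in the description (149440 IoT and 106936 non-IoT samples) the two-class entropy comes to roughly 0.98 bits, which is the upper bound on any first-stage information gain.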

TABLE 2 first stage feature scoring Table

Feature	Score
Window size value	0.9801
Destination IP	0.9092
Multiplier	0.7527
Destination Port	0.673
Kind	0.6373
Length	0.6055
Time to live	0.09
MSS Value	0.0241

TABLE 3 second stage feature scoring Table

In Table 2 the window size not only has a high information gain but also has fewer discrete values than the destination IP, which makes modeling easier, so the window size is used as the feature in the first stage. Table 3 shows that the first four features have stronger classification ability; since the window size is already used in the first stage, the three features destination IP, window scale factor, and destination port are selected as the features in the second stage.

S3, two-stage IoT device identification

In a real network environment, non-IoT devices such as mobile phones and tablets mostly run Android or Apple systems, so their protocol stack implementations differ little from one another; at the same time, their destination IPs and ports depend on the services the user accesses, which generates a large amount of redundant information.

Referring to fig. 2, the present invention designs a two-stage internet of things device identification scheme, which includes the following three modules:

Traffic feature construction module

After the device starts, the module begins monitoring and capturing raw traffic packets, then uses a network analysis tool to filter the SYN packets of TCP connections out of the traffic, then extracts the protocol feature fields according to protocol knowledge with the MAC address as the classification label, and finally converts the extracted features into a vector with a fixed format.
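To make the header fields concrete, the sketch below parses a raw IPv4 packet with Python's `struct` module and returns the eight fields as a fixed-order tuple of strings. It is an illustration under stated assumptions, not the module described here: the description uses a network analysis tool for this step, the function names are hypothetical, the input is assumed to start at the IPv4 header, and the window scale is reported as the raw shift count.

```python
import socket
import struct

def extract_features(ip_packet):
    """Return the eight header fields of an IPv4 TCP SYN packet as a tuple
    of strings in a fixed order, or None if the packet is not a pure SYN."""
    ihl = (ip_packet[0] & 0x0F) * 4                  # IPv4 header length in bytes
    total_length = struct.unpack("!H", ip_packet[2:4])[0]
    ttl, protocol = ip_packet[8], ip_packet[9]
    if protocol != 6:                                # 6 = TCP
        return None
    dst_ip = socket.inet_ntoa(ip_packet[16:20])

    tcp = ip_packet[ihl:]
    dst_port = struct.unpack("!H", tcp[2:4])[0]
    offset_and_flags = struct.unpack("!H", tcp[12:14])[0]
    header_len = (offset_and_flags >> 12) * 4        # TCP header length in bytes
    if (offset_and_flags & 0x0FFF) != 0x002:         # only the SYN bit is set
        return None
    window = struct.unpack("!H", tcp[14:16])[0]

    kinds, mss, wscale = [], "", ""
    i = 20                                           # options follow the fixed header
    while i < min(header_len, len(tcp)):
        kind = tcp[i]
        kinds.append(str(kind))
        if kind == 0:                                # end of option list
            break
        if kind == 1:                                # NOP has no length byte
            i += 1
            continue
        if i + 1 >= len(tcp) or tcp[i + 1] < 2:      # malformed option
            break
        length = tcp[i + 1]
        if kind == 2 and length == 4:                # maximum segment size
            mss = str(struct.unpack("!H", tcp[i + 2:i + 4])[0])
        elif kind == 3 and length == 3:              # window scale shift count
            wscale = str(tcp[i + 2])
        i += length
    return (dst_ip, str(total_length), str(ttl), str(dst_port),
            str(window), ",".join(kinds), mss, wscale)

def build_demo_packet(flags=0x02):
    """Build a synthetic IPv4/TCP packet (checksums left at zero)."""
    options = (bytes([2, 4]) + struct.pack("!H", 1460)   # MSS
               + bytes([4, 2])                            # SACK permitted
               + bytes([8, 10]) + bytes(8)                # timestamps
               + bytes([1])                               # NOP
               + bytes([3, 3, 7]))                        # window scale
    tcp = struct.pack("!HHIIBBHHH", 50000, 443, 0, 0,
                      0xA0, flags, 5840, 0, 0) + options
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0x4000,
                     64, 6, 0, socket.inet_aton("192.168.1.10"),
                     socket.inet_aton("10.0.0.5"))
    return ip + tcp

features = extract_features(build_demo_packet())
```

The option order is kept as a comma-separated string of option kinds, so two devices that send the same options in a different order produce different feature values.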

Device type binary classification module

A binary classifier is used to distinguish, with few features, whether the device to be classified is an IoT device or a non-IoT device, which reduces the redundant information introduced by non-IoT devices.

IoT device multi-classification module

A multi-class machine learning algorithm is used, which keeps the identification algorithm accurate and efficient on the reduced feature set.

The 8 determined features are treated as text information, and the classification and identification of IoT devices is close to text classification. The invention therefore provides the OneR-NB algorithm model: the OneR algorithm first performs binary classification to determine whether a device is an IoT device, and if it is, the NB (naive Bayes) algorithm then performs multi-class identification. The flow chart of the algorithm is shown in FIG. 3.

The idea of the OneR algorithm is simple: it establishes a rule that tests only a single attribute and branches on it. Each branch corresponds to one attribute value, and the class of a branch is the class that occurs most often among the training data on that branch. The steps of the algorithm are as follows:

S301, selecting an attribute and establishing a rule for each value of that attribute as follows:

a. calculating the frequency of each category;

b. finding the most frequent category;

c. establishing a rule that assigns this category to the attribute value;

S302, calculating the error rate of the rules;

S303, selecting the rule with the minimum error rate.
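Steps S301 to S303 can be sketched as follows. This is a minimal version of the standard OneR procedure on toy data; the function name `one_r` and the sample values are illustrative, and ties between equally frequent categories are broken by first occurrence, which is an assumption of this sketch.

```python
from collections import Counter, defaultdict

def one_r(samples, labels):
    """Return (attribute_index, rule, errors) for the single attribute
    whose value-to-class rule misclassifies the fewest training samples."""
    best = None
    for a in range(len(samples[0])):
        # S301: count the categories seen for each value of attribute a.
        counts = defaultdict(Counter)
        for x, y in zip(samples, labels):
            counts[x[a]][y] += 1
        # Assign the most frequent category to each attribute value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # S302: training errors are the samples outside the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        # S303: keep the attribute with the fewest errors.
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Toy samples: (window size, time to live)
samples = [("5840", "64"), ("5840", "128"), ("65535", "64"), ("65535", "64")]
labels = ["iot", "iot", "non-iot", "non-iot"]
attribute, rule, errors = one_r(samples, labels)
```

The resulting model is a single lookup table, which is why the first stage is cheap to train and to evaluate.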

The first stage is an ordinary classification problem, so the accuracy and model computational complexity of the OneR algorithm are compared with those of other common algorithms. Table 4 shows that, although every algorithm reaches 100% accuracy in the first stage, the OneR algorithm adopted by the invention has lower computational complexity.

TABLE 4 first stage algorithmic model comparison

The naive Bayes algorithm is one of the most widely used classification algorithms. It simplifies the Bayes algorithm by assuming that the attributes are conditionally independent of one another given the target value. Let the sample data set be $D = \{d_1, d_2, \ldots, d_n\}$, let the feature attribute set of the sample data be $X = \{x_1, x_2, \ldots, x_d\}$, and let the class variable be $Y = \{y_1, y_2, \ldots, y_m\}$, so that D can be divided into m classes, where $x_1, x_2, \ldots, x_d$ are mutually independent and random. The prior probability of Y is $P_{prior} = P(Y)$ and the posterior probability of Y is $P_{post} = P(Y \mid X)$. The posterior probability is determined by the prior probability $P_{prior}$, the feature probability $P(X)$, and the class-conditional probability $P(X \mid Y)$:

$$P_{post} = P(Y \mid X) = \frac{P(Y)\,P(X \mid Y)}{P(X)}$$

Naive Bayes assumes that the features are mutually independent, so given the class y, the class-conditional probability in the above equation can be further expressed as:

$$P(X \mid Y = y) = \prod_{j=1}^{d} P(x_j \mid Y = y)$$

From the two formulas above, the posterior probability is:

$$P_{post} = P(Y \mid X) = \frac{P(Y) \prod_{j=1}^{d} P(x_j \mid Y)}{P(X)}$$

Since $P(X)$ is fixed, only the numerator of the above formula needs to be compared when comparing posterior probabilities. The naive Bayes calculation for a sample belonging to category $y_i$ is therefore:

$$P(y_i \mid x_1, x_2, \ldots, x_d) \propto P(y_i) \prod_{j=1}^{d} P(x_j \mid y_i)$$
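The comparison of numerators can be sketched for text-valued features as below. This is a minimal categorical naive Bayes, not the model evaluated in the patent: the class name, the toy device labels, and the add-one (Laplace) smoothing with one extra slot for unseen values are assumptions of this sketch, and log probabilities are summed to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes over discrete (text) features with additive smoothing."""

    def fit(self, samples, labels, alpha=1.0):
        self.alpha = alpha
        self.total = len(labels)
        self.class_counts = Counter(labels)
        n_features = len(samples[0])
        # value_counts[j][y][v]: how often feature j takes value v in class y
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.vocab = [set() for _ in range(n_features)]
        for x, y in zip(samples, labels):
            for j, v in enumerate(x):
                self.value_counts[j][y][v] += 1
                self.vocab[j].add(v)
        return self

    def log_scores(self, x):
        """log P(y) + sum_j log P(x_j | y) for every class y."""
        scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / self.total)
            for j, v in enumerate(x):
                # One extra slot in the denominator is reserved for unseen values.
                num = self.value_counts[j][y][v] + self.alpha
                den = n_y + self.alpha * (len(self.vocab[j]) + 1)
                score += math.log(num / den)
            scores[y] = score
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)

# Toy samples: (destination IP, window scale, destination port)
samples = [("10.0.0.5", "4", "443"), ("10.0.0.5", "4", "443"),
           ("10.0.0.9", "2", "8883"), ("10.0.0.9", "2", "8883")]
labels = ["camera", "camera", "plug", "plug"]
model = CategoricalNB().fit(samples, labels)
seen = model.predict(("10.0.0.5", "4", "443"))
mixed = model.predict(("10.0.0.9", "4", "8883"))
```

Because the denominator $P(X)$ is the same for every class, the code never computes it and simply returns the class with the largest score.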

The second stage is a multi-class problem, so the widely used naive Bayes, decision tree, and support vector machine algorithms are compared; the model comparison is shown in Table 5.

TABLE 5 second stage algorithmic model comparison

Table 5 shows that all three algorithms are highly accurate and differ little. Naive Bayes and the decision tree have about the same accuracy and model complexity, but naive Bayes makes fewer classification errors across the individual classes. The support vector machine has the best accuracy, but its model complexity is higher because it constructs multiple binary classifiers to solve the multi-class problem. As the number of IoT device classes increases, the complexity of that model is expected to increase exponentially, and the support vector machine will consume more resources in practical applications. In conclusion, the naive Bayes algorithm selected by the invention is the best choice for this stage.

When the OneR algorithm is combined with the naive Bayes algorithm, the OneR algorithm builds a model from a simple rule and quickly distinguishes whether a device is an IoT device, which removes the redundant feature information introduced by the large number of non-IoT device samples.

In the second stage, this reduces redundant computation when the naive Bayes model is built, making the model faster.
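The control flow of the two stages can be sketched as a small dispatch function. The names are hypothetical, the second-stage classifier is passed in as a callable, and the lookup table used here is only a stand-in for a trained naive Bayes model; treating an unseen window size as non-IoT is also an assumption of this sketch.

```python
def identify(sample, stage1_rule, stage2_classifier):
    """Two-stage identification: a single-feature rule first, device type second."""
    # Stage 1: one feature (window size) decides IoT versus non-IoT.
    kind = stage1_rule.get(sample["window_size"], "non-iot")
    if kind != "iot":
        return "non-iot"          # non-IoT samples never reach stage 2
    # Stage 2: only the three selected features are passed on.
    features = (sample["dst_ip"], sample["window_scale"], sample["dst_port"])
    return stage2_classifier(features)

# Stand-ins for the trained models (illustrative values only).
stage1_rule = {"5840": "iot", "65535": "non-iot"}
stage2_lookup = {("10.0.0.5", "4", "443"): "camera"}

def stage2_classifier(features):
    return stage2_lookup.get(features, "unknown-iot")

camera_sample = {"window_size": "5840", "dst_ip": "10.0.0.5",
                 "window_scale": "4", "dst_port": "443"}
phone_sample = {"window_size": "65535", "dst_ip": "10.0.0.9",
                "window_scale": "8", "dst_port": "80"}
camera_result = identify(camera_sample, stage1_rule, stage2_classifier)
phone_result = identify(phone_sample, stage1_rule, stage2_classifier)
```

The early return is where the saving comes from: the second-stage features of non-IoT samples are never read.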

And finally, evaluating the optimization degree of the two-stage model compared with the initial model.

Assume there are x IoT device samples and y non-IoT device samples, a features are predetermined, b features are used in the first stage, and c features are used in the second stage. The sample model optimization degree α is the ratio of the two-stage classification sample model to the initial sample model, namely:

$$\alpha = \frac{(x + y)\,b + x\,c}{(x + y)\,a}$$

Of the 256376 samples in total, 149440 are from IoT devices and 106936 are from non-IoT devices. With 8 predetermined features, 1 feature in the first stage, and 3 features in the second stage, substituting into the formula gives a model optimization degree of 34.36%, i.e. the two-stage sample model is 34.36% of the initial sample model.
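The reported 34.36% can be checked numerically. The sketch below assumes that the ratio counts feature values: the initial model holds all a features for every sample, while the two-stage model holds b features for every sample plus c features for the IoT samples only. The function name is hypothetical.

```python
def optimization_degree(x, y, a, b, c):
    """Ratio of feature values in the two-stage model to the initial model."""
    two_stage = (x + y) * b + x * c   # stage 1 on all samples, stage 2 on IoT only
    initial = (x + y) * a             # every sample carries all a features
    return two_stage / initial

alpha = optimization_degree(x=149440, y=106936, a=8, b=1, c=3)
print(f"{alpha:.2%}")  # prints 34.36%
```

Under this reading the numerator is 704696 feature values and the denominator is 2051008, so the two-stage scheme handles roughly one third of the feature values of the initial model.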

In summary, the two-stage IoT device identification method based on TCP/IP protocol features first describes the collection of the data set and the feature extraction process in detail, and determines 8 features according to protocol knowledge: the destination IP, total length, and time to live in the IP header, and the destination port, window size, TCP option order, maximum segment size, and window scale factor in the TCP header. The information gain of each feature is then calculated and ranked, and the fingerprint features are simplified by feature selection. Finally, to reduce the redundant feature information introduced by non-IoT devices, a two-stage device identification scheme is designed and a suitable OneR-NB algorithm is determined; data verification shows that the model optimization degree is 34.36%, i.e. the two-stage sample model is 34.36% of the initial sample model.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. A two-stage Internet of things equipment identification method based on TCP/IP protocol features is characterized by comprising the following steps:

s1, arranging a collector at a core router of the Internet of things equipment to collect flow and send data to a management end, only filtering packets of the equipment in a TCP connection stage in the collection process, and judging according to a SYN mark field of a TCP message header; extracting fields in the headers of the IP message and the TCP message, forming a piece of sample data by taking the MAC address as an initial classification label, and forming a sample set D by collecting all the sample data;

s3, adopting a two-stage Internet of things equipment identification model, identifying whether the equipment is Internet of things equipment or not in the first stage, identifying the specific type of the equipment in the second stage, determining a model optimization degree index, and completing the identification of the Internet of things equipment;

wherein a traffic feature construction module starts monitoring and capturing raw traffic packets after the equipment is started, uses a network analysis tool to select the SYN packets of TCP connections from the captured traffic, extracts the feature fields of the protocol according to protocol knowledge, uses the MAC address as the classification label, and finally converts the extracted features into vectors of a fixed format; a device type binary classification module uses the features to distinguish whether the device currently to be classified is an Internet of things device or a non-Internet of things device; and an Internet of things equipment multi-classification module is adopted, wherein binary classification is performed with the OneR algorithm to determine whether the equipment is Internet of things equipment and, if the equipment is Internet of things equipment, multi-class identification is performed with a naive Bayes algorithm, the binary classification performed by the OneR algorithm specifically comprising the following steps:

S302, calculating the error rate of each rule;

S303, selecting the rule with the minimum error rate.
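The two-stage flow recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the first-stage rule is taken to be a plain value-to-class mapping such as OneR would produce, the second stage is a small categorical naive Bayes classifier with add-one smoothing (the smoothing is an assumption, the source does not specify it), and the names `CategoricalNB`, `identify`, the labels "iot" and "non-iot", and the fallback to "non-iot" for unseen values are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNB:
    """Tiny categorical naive Bayes with add-one smoothing (assumed)."""

    def fit(self, rows, labels):
        self.classes = Counter(labels)          # class -> sample count
        self.n = len(labels)
        # counts[class][feature index][value] -> occurrences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)          # feature index -> seen values
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[y][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def score(c):
            # log prior plus the sum of smoothed log likelihoods
            s = log(self.classes[c] / self.n)
            for j, v in enumerate(row):
                k = len(self.values[j]) + 1     # one extra slot for unseen values
                s += log((self.counts[c][j][v] + 1) / (self.classes[c] + k))
            return s
        return max(self.classes, key=score)


def identify(sample, stage1_feature, stage1_rule, nb, stage2_features):
    """Stage 1: rule lookup (IoT or not). Stage 2: device type via naive Bayes."""
    verdict = stage1_rule.get(sample[stage1_feature], "non-iot")  # assumed default
    if verdict != "iot":
        return "non-iot"
    return nb.predict(tuple(sample[f] for f in stage2_features))
```

Only samples that pass the first stage reach the naive Bayes classifier, which is how the scheme keeps non-Internet of things traffic out of the multi-class model.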

2. The two-stage Internet of things equipment identification method based on TCP/IP protocol features of claim 1, wherein the fields extracted from the IP message header comprise: the destination IP address, the total length and the time to live (TTL); and the fields extracted from the TCP message header comprise: the destination port, the window size, the order of the TCP options, the maximum segment size (MSS) and the window scale factor.
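For illustration, the eight fields listed in claim 2 could be read from a raw packet roughly as below. This is a sketch under assumptions: the claimed method relies on a network analysis tool, whereas this example parses the bytes directly with the Python standard library; the packet is assumed to be IPv4 and to start at the IP header, so the Ethernet framing that carries the MAC address label is left out; and the function name `parse_syn_features` and the dictionary keys are hypothetical. Checking only the SYN bit also lets SYN-ACK segments through, which may or may not be desired.

```python
import socket
import struct


def parse_syn_features(packet: bytes):
    """Return the eight header fields of a TCP SYN packet, or None otherwise."""
    if len(packet) < 20 or packet[0] >> 4 != 4:      # IPv4 only
        return None
    ihl = (packet[0] & 0x0F) * 4                     # IP header length in bytes
    if packet[9] != 6:                               # protocol 6 = TCP
        return None
    total_length = struct.unpack("!H", packet[2:4])[0]
    ttl = packet[8]
    dst_ip = socket.inet_ntoa(packet[16:20])

    tcp = packet[ihl:]
    if len(tcp) < 20:
        return None
    data_offset = (tcp[12] >> 4) * 4                 # TCP header length in bytes
    if data_offset < 20 or data_offset > len(tcp):
        return None
    if not tcp[13] & 0x02:                           # SYN flag must be set
        return None
    dst_port = struct.unpack("!H", tcp[2:4])[0]
    window = struct.unpack("!H", tcp[14:16])[0]

    # Walk the TCP options, recording their order, the MSS and the window scale.
    options = tcp[20:data_offset]
    order, mss, wscale = [], None, None
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                                # end of option list
            break
        if kind == 1:                                # NOP, single byte
            order.append(kind)
            i += 1
            continue
        if i + 1 >= len(options):
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                                    # malformed option
        body = options[i + 2:i + length]
        if kind == 2 and length == 4:
            mss = struct.unpack("!H", body)[0]
        elif kind == 3 and length == 3:
            wscale = body[0]
        order.append(kind)
        i += length

    return {"dst_ip": dst_ip, "total_length": total_length, "ttl": ttl,
            "dst_port": dst_port, "window": window,
            "option_order": tuple(order), "mss": mss, "wscale": wscale}


# Hand-built SYN packet used as a usage example (all values are made up).
_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 48, 0, 0, 64, 6, 0,
                  socket.inet_aton("192.168.1.10"), socket.inet_aton("10.0.0.1"))
_tcp = struct.pack("!HHIIBBHHH", 12345, 443, 0, 0, 0x70, 0x02, 29200, 0, 0)
_opts = bytes([2, 4]) + struct.pack("!H", 1460) + bytes([1]) + bytes([3, 3, 7])
SAMPLE = _ip + _tcp + _opts
```

Calling `parse_syn_features(SAMPLE)` yields one feature record; the option order is kept as a tuple of option kind numbers so that it can serve as a single categorical feature.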

3. The two-stage Internet of things equipment identification method based on the TCP/IP protocol features as claimed in claim 1, wherein in step S2, the information entropy of the sample set D is first calculated, then the conditional information entropy of the set D given a feature A is calculated, and the difference between the two is the information gain of the feature A with respect to D; after the information gain has been calculated for all the features, the features are ranked in descending order of information gain, the window size of the TCP message is selected as the feature for the first stage of Internet of things equipment identification, and the destination IP address of the IP message, the window scale factor of the TCP message and the destination port of the TCP message are selected as the features for the second stage.

4. The TCP/IP protocol feature-based two-stage Internet of things device identification method according to claim 3, wherein the information gain G(D,A) of feature A with respect to set D is:

G(D,A) = H(D) - H(D|A)
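As a worked illustration of this formula, the information gain of a categorical feature can be computed as in the following sketch. The base-2 logarithm is an assumption (the source does not state the base), and the function names `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """H(D): Shannon entropy of the class labels, in bits (base 2 assumed)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """G(D,A) = H(D) - H(D|A) for one categorical feature A."""
    n = len(labels)
    groups = defaultdict(list)              # feature value -> labels in that subset
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    # H(D|A): entropy of each subset, weighted by the subset's share of D
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature whose values split the classes perfectly attains the full entropy H(D) as its gain, while a feature that is independent of the class has a gain of zero; ranking the features by this value gives the ordering used for feature selection.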

5. The two-stage Internet of things equipment identification method based on the TCP/IP protocol features as claimed in claim 1, wherein in step S301, for each value of an attribute, the frequency of occurrence of each category is calculated, the most frequently occurring category is found, and a rule is established that assigns this category to this attribute value.
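Steps S301 to S303 follow the usual OneR procedure, which can be sketched as below. This is a minimal illustration, assuming each sample is a dictionary of categorical feature values; the function name `one_r` and the returned tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(samples, labels):
    """Return (best_feature, rule, error_rate) following steps S301 to S303."""
    best = None
    for feature in samples[0]:
        # S301: per attribute value, count the categories and keep the most frequent
        counts = defaultdict(Counter)
        for sample, label in zip(samples, labels):
            counts[sample[feature]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # S302: error rate = share of samples not covered by the majority category
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        error_rate = errors / len(labels)
        # S303: keep the rule set with the smallest error rate
        if best is None or error_rate < best[2]:
            best = (feature, rule, error_rate)
    return best
```

The returned rule is a plain mapping from attribute value to category, so applying it to a new sample is a single dictionary lookup.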

6. The two-stage Internet of things equipment identification method based on the TCP/IP protocol features as claimed in claim 1, wherein, assuming that there are x Internet of things equipment samples and y non-Internet of things equipment samples, that a features are predetermined, that b features are determined in the first stage and that c features are determined in the second stage, the sample model optimization degree α represents the ratio of the two-stage classification sample model to the initial sample model.

7. The TCP/IP protocol feature-based two-stage Internet of things equipment identification method according to claim 6, wherein the ratio α between the two-stage classification sample model and the initial sample model is as follows: