WO2022088779A1 - Deep packet processing method, apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2022088779A1
WO2022088779A1 PCT/CN2021/107642 CN2021107642W WO2022088779A1 WO 2022088779 A1 WO2022088779 A1 WO 2022088779A1 CN 2021107642 W CN2021107642 W CN 2021107642W WO 2022088779 A1 WO2022088779 A1 WO 2022088779A1
Authority
WO
WIPO (PCT)
Prior art keywords
http
data stream
protocol
stream
tcp
Prior art date
Application number
PCT/CN2021/107642
Other languages
English (en)
French (fr)
Inventor
孙晓
谢永恒
万月亮
Original Assignee
北京锐安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京锐安科技有限公司
Publication of WO2022088779A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/163 In-band adaptation of TCP data exchange; In-band control procedures

Definitions

  • the embodiments of the present application relate to the field of communication technologies, for example, to a deep packet processing method, apparatus, electronic device, and storage medium.
  • Deep Packet Inspection (DPI) is called "deep" in comparison with the level of ordinary packet analysis.
  • Ordinary packet inspection analyzes only the content up to Layer 4 of Internet Protocol (IP) packets, including the source IP address, destination IP address, transport layer source port, transport layer destination port, and bearer protocol type. DPI technology analyzes network protocols from Layer 2 to Layer 7, which allows accurate perception of the data in the network and thus an accurate grasp of network status, for uses such as service identification, service statistics, traffic control, and network element analysis.
  • In essence, DPI is a data packet filtering technology: it must first parse the application layer payload so that the parsed information can be matched and filtered according to business requirements. Parsing out the application layer payload information is therefore very important.
  • the present application provides an in-depth message processing method, device, electronic device and storage medium to implement message processing for the HTTP/2 protocol, thereby providing a data basis for matching preset detection rules.
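  • an embodiment of the present application provides a deep packet processing method, including:
  • obtaining a TCP data stream, and determining whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream;
  • when the TCP data stream contains the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
  • parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extracting the data content corresponding to the header information in the HTTP/2 data stream;
  • storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.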
  • an embodiment of the present application further provides a deep packet processing device, the device comprising:
  • a protocol determination module configured to obtain a TCP data stream, and determine whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream;
  • a structure building module configured to determine at least one HTTP/2 data stream in the TCP data stream according to a stream identifier in the HTTP/2 protocol when the HTTP/2 protocol is included in the TCP data stream, and Build a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
  • a content extraction module configured to parse the header information of each of the HTTP/2 data streams based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information in the HTTP/2 data stream;
  • the data storage module is configured to store the parsed header information and data content of each HTTP/2 data stream into a corresponding HTTP/2 stream object structure.
  • an embodiment of the present application further provides an electronic device, the electronic device comprising:
  • at least one processor;
  • a storage apparatus configured to store at least one program;
  • when the at least one program is executed by the at least one processor, the at least one processor implements the deep packet processing method provided by the embodiments of the present application.
  • the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the deep packet processing method provided by the embodiments of the present application.
  • FIG. 1 is a schematic flowchart of a deep packet processing method according to Embodiment 1 of the present application
  • FIG. 2 is a schematic flowchart of a deep packet processing method according to Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of a deep message processing apparatus according to Embodiment 3 of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device according to Embodiment 4 of the present application.
  • FIG. 1 is a schematic flowchart of a deep packet processing method provided in Embodiment 1 of the present application.
  • This embodiment is applicable to situations in which HyperText Transfer Protocol 2.0 (HTTP/2) data streams need to be parsed and the data information of at least one HTTP/2 data stream needs to be stored, so that a large amount of HTTP/2 data stream information can be statistically analyzed. The method can be performed by a deep packet processing apparatus, and the apparatus can be implemented in hardware and/or software.
  • the method includes the following steps:
  • The Transmission Control Protocol (TCP) provides a connection-oriented byte stream service.
  • two applications using TCP need to establish a TCP connection before exchanging data packets.
  • the data transmitted over the connection is a TCP data stream.
  • each TCP data stream contains at least one TCP packet.
  • each TCP packet includes a header part and a payload (data body) part.
  • the TCP data stream can be obtained directly from the transport layer, or obtained by decapsulation from underlying data packets such as the link layer and the network layer, where decapsulation refers to the process of removing the header or tail layer by layer from the bottom up.
  • the HTTP/2 protocol here refers to HTTP/2 protocol message frames.
  • the protocol identifier refers to the protocol information contained in the data body part of each TCP packet in the TCP data stream, where the protocol is an application layer protocol such as HTTP, the File Transfer Protocol (FTP), or HTTP/2.
  • the protocol information refers to the combination of at least one message frame sent at the application layer. The message frames generated by the application layer protocol during the interaction are associated and assembled at the transport layer according to the flags of the frame headers and encapsulated into the data body part of the TCP packet; therefore, the application layer protocol information can be determined from the data body part of the TCP packet.
  • each application layer protocol has corresponding characteristics, such as a specific port, a specific string, or a specific bit sequence. By checking the protocol information in the data body part of the TCP packets in the TCP data stream against the characteristics of each protocol, the protocol contained in the TCP data stream can be determined.
  • determining whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream includes: extracting the content of the first payload-carrying packet from the TCP data stream and performing Magic string matching on that content. If the matching succeeds, the TCP data stream contains the HTTP/2 protocol.
  • the content of the first payload-carrying packet refers to the data body part of the first TCP packet of the TCP data stream.
  • The Magic frame is used to determine whether the protocol is HTTP/2, by performing Magic string matching on the first message frame.
  • the content of the Magic frame is fixed as PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n. Therefore, Magic string matching is performed on the content of the first payload-carrying packet of the TCP data stream.
  • if the match succeeds, the HTTP/2 protocol is recognized.
  • if the TCP data stream does not contain the HTTP/2 protocol, the TCP data stream is added to a blacklist, so that HTTP/2 data stream processing is not performed during the life cycle of the TCP data stream.
  • the life cycle of the TCP data stream refers to the preset time period for which the TCP connection is maintained. Adding a TCP data stream that does not contain the HTTP/2 protocol to the blacklist ensures that no HTTP/2 data stream processing is performed on it within that period.
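  • Exemplarily, the Magic string matching and blacklisting described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `is_http2_stream`, `classify`, and `blacklist`, and the use of an in-memory set keyed by a flow identifier, are hypothetical. Only the 24-byte Magic string itself is fixed by the HTTP/2 protocol.

```python
# Fixed 24-byte HTTP/2 client connection preface (the "Magic" string).
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Hypothetical blacklist of flows that should skip HTTP/2 processing.
blacklist = set()


def is_http2_stream(first_payload: bytes) -> bool:
    """Return True if the first TCP payload begins with the Magic string."""
    return first_payload.startswith(HTTP2_PREFACE)


def classify(flow_key, first_payload: bytes) -> str:
    """Classify a flow once, based on its first payload-carrying packet."""
    if flow_key in blacklist:
        return "skip"
    if is_http2_stream(first_payload):
        return "http2"
    # Not HTTP/2: remember the flow so later packets are not re-inspected.
    blacklist.add(flow_key)
    return "skip"
```

  In this sketch the blacklist entry would be dropped when the flow's life cycle ends; expiry handling is omitted for brevity.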
  • If the HTTP/2 protocol is included in the TCP data stream, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream.
  • the stream identifier in the HTTP/2 protocol uniquely identifies an HTTP/2 data stream. The stream identifier is 31 bits long, and at least one HTTP/2 data stream can be determined by the stream identifier.
  • a single HTTP/2 connection can contain multiple HTTP/2 data streams open at the same time. Each HTTP/2 data stream can be regarded as a request, and each transmits at least one HTTP/2 protocol message frame. The HTTP/2 protocol message frames of different HTTP/2 data streams are interleaved when sent. According to the flags of the frame headers, the HTTP/2 protocol message frames of multiple different HTTP/2 data streams are associated and assembled at the transport layer and encapsulated into TCP packets.
  • the process of determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol is: first, determine the data body of the TCP packet from the TCP data stream, wherein the TCP packet The data body is obtained by correlating and assembling the HTTP/2 protocol message frames of multiple HTTP/2 data streams according to the flags of the frame headers; then, obtaining multiple HTTP/2 protocol message frames from the TCP message data body ; Finally, at least one HTTP/2 data stream is determined according to the stream identifier of the HTTP/2 protocol message frame. Exemplarily, if the stream identifiers of the four HTTP/2 protocol message frames are 2, 4, 4, and 6, respectively, three HTTP/2 stream object structures are constructed.
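  • Exemplarily, grouping frames by stream identifier can be sketched in Python as follows. The sketch relies on the standard HTTP/2 frame layout (a 9-byte frame header with a 24-bit length, 8-bit type, 8-bit flags, and one reserved bit followed by the 31-bit stream identifier). The function names and the dictionary used as a stream object structure are hypothetical, and `make_frame` exists only to build test input.

```python
FRAME_HEADER_LEN = 9   # 3-byte length, 1-byte type, 1-byte flags, 4-byte stream id
TYPE_DATA = 0x0        # message body frame
TYPE_HEADERS = 0x1     # header frame


def iter_frames(buf: bytes):
    """Yield (stream_id, frame_type, flags, payload) for each complete frame."""
    offset = 0
    while offset + FRAME_HEADER_LEN <= len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type = buf[offset + 3]
        flags = buf[offset + 4]
        # Mask off the reserved bit to keep the 31-bit stream identifier.
        stream_id = int.from_bytes(buf[offset + 5:offset + 9], "big") & 0x7FFFFFFF
        end = offset + FRAME_HEADER_LEN + length
        if end > len(buf):
            break  # incomplete frame; more TCP payload is needed
        yield stream_id, frame_type, flags, buf[offset + FRAME_HEADER_LEN:end]
        offset = end


def build_stream_objects(buf: bytes) -> dict:
    """Create one stream object per distinct stream identifier."""
    streams = {}
    for stream_id, frame_type, _flags, payload in iter_frames(buf):
        obj = streams.setdefault(stream_id, {"headers": [], "data": []})
        if frame_type == TYPE_HEADERS:
            obj["headers"].append(payload)
        elif frame_type == TYPE_DATA:
            obj["data"].append(payload)
    return streams


def make_frame(stream_id: int, frame_type: int, payload: bytes = b"", flags: int = 0) -> bytes:
    """Serialize one frame (helper for building example input)."""
    return (len(payload).to_bytes(3, "big") + bytes([frame_type, flags])
            + stream_id.to_bytes(4, "big") + payload)
```

  Feeding this sketch four frames with stream identifiers 2, 4, 4, and 6 produces three stream objects, matching the example above.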
  • each HTTP/2 stream includes a header frame (headers), which transmits the header fields of the HTTP/2 stream, and may also include at least one message body frame (data), which transmits the HTTP/2 message body.
  • the HTTP/2 stream object structure includes the header storage area and the corresponding data storage area.
  • the header information of each HTTP/2 data stream is compressed with Huffman encoding to reduce the transmission size, and is carried by the headers frame in the HTTP/2 stream.
  • By Huffman-decoding the headers frame in the HTTP/2 stream and combining the result with the header mapping table, the header information of the HTTP/2 data stream is obtained. Both ends of the HTTP/2 data stream transmission (such as the client and the server) need to maintain the same header mapping table.
  • the HTTP/2 header mapping table includes a static header mapping table and a dynamic header mapping table. Parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table includes: identifying the unparsed header information in the HTTP/2 data stream and matching it in the static header mapping table to determine the corresponding parsed header information; when the unparsed header information is not included in the static header mapping table, calling the dynamic header mapping table to determine the corresponding parsed header information.
  • the static header mapping table contains common header names and common combinations of header names and values.
  • the header names and values form header key-value pairs, as shown in Table 1, which are usually preset on both sides of the connection (such as client and server).
  • the dynamic header mapping table can dynamically add content. For example, the client sends the request information to the server to add cookie:xxxxxxx to the dynamic header mapping table, so that the client and the server can represent the entire key-value pair with one character.
  • the unparsed header information in the HTTP/2 data stream is first matched in the static header mapping table, and if the matching fails, the dynamic header mapping table is used for matching.
  • the static header mapping table includes multiple header key-value pairs and multiple header names, as shown in Table 1.
  • After the header information of the HTTP/2 data stream is received, if the entire header key-value pair exists in the static header mapping table, the header key-value pair can be queried directly by the index value obtained from decoding the header information.
  • Exemplarily, if the index value is 2, the corresponding :method: GET is obtained by querying the static header mapping table, meaning that the resource identified by the Request-URI is requested.
  • if the index value is 3, the corresponding :method: POST is obtained by querying the static header mapping table.
  • If only the header name exists in the static header mapping table, the header name is queried by the index value, the header value is decoded and used as the value corresponding to the header name, and the decoded header key-value pair is added to the dynamic header mapping table, so that when the same header information is received later, the header key-value pair can be determined directly from the index value in the dynamic header mapping table. Exemplarily, if the index value is 32 (100000), cookie is obtained by querying the static header mapping table; the header value uses Huffman encoding and its length is 28 (0011100); the next 28 bytes are the value of the cookie. The value is obtained by Huffman decoding, and the header name and its value are added to the dynamic header mapping table.
  • Table 1
      Index | Header Name | Header Value
      1     | :authority  |
      2     | :method     | GET
      3     | :method     | POST
      4     | :path       | /
      5     | :path       | /index.html
      6     | :scheme     | http
      7     | :scheme     | https
      ...   | ...         | ...
      32    | cookie      |
      ...   | ...         | ...
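  • Exemplarily, the static-then-dynamic lookup described above can be sketched in Python as follows. This is a simplified sketch, not the embodiment's implementation: `STATIC_TABLE` contains only the excerpt shown in Table 1, the class and method names are hypothetical, Huffman decoding is omitted (the header value is passed in already decoded), and the dynamic table is assumed to be indexed directly after the 61 entries of the full HTTP/2 static table, newest entry first.

```python
# Excerpt of the static header mapping table (Table 1); None means "name only".
STATIC_TABLE = {
    1: (":authority", None),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    5: (":path", "/index.html"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    32: ("cookie", None),
}
STATIC_TABLE_SIZE = 61  # size of the full static table; dynamic entries start at 62


class HeaderTables:
    def __init__(self):
        self.dynamic = []  # newest entry first

    def lookup(self, index: int):
        """Match in the static table first, then fall back to the dynamic table."""
        if index in STATIC_TABLE:
            return STATIC_TABLE[index]
        pos = index - STATIC_TABLE_SIZE - 1
        if 0 <= pos < len(self.dynamic):
            return self.dynamic[pos]
        raise KeyError(f"no header table entry for index {index}")

    def resolve_literal(self, name_index: int, decoded_value: str):
        """Name comes from a table; the decoded value is added to the dynamic table."""
        name, _ = self.lookup(name_index)
        self.dynamic.insert(0, (name, decoded_value))
        return name, decoded_value
```

  With this sketch, index 2 resolves directly to (":method", "GET"), while a literal cookie value resolved through index 32 becomes available afterwards at the first dynamic index.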
  • each HTTP/2 data stream includes a header frame, and a priority frame, a ping frame, or at least one message body frame.
  • Exemplarily, if a certain HTTP/2 data stream in the transmission between the client and the server includes one header frame and two message body frames, the parsed header information and the two message body frames are stored in the stream object structure of that HTTP/2 data stream.
  • In this embodiment, the TCP data stream is obtained, and whether it contains the HTTP/2 protocol is determined based on the protocol identifier in the TCP data stream. When the TCP data stream contains the HTTP/2 protocol, at least one HTTP/2 data stream in the TCP data stream is determined according to the stream identifier in the HTTP/2 protocol, and a corresponding HTTP/2 stream object structure is constructed based on the type of the HTTP/2 data stream, yielding multiple HTTP/2 stream object structures. The header information of each HTTP/2 data stream is parsed based on the HTTP/2 header mapping table, the data content corresponding to the header information is extracted, and the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure. This realizes the parsing and storage of HTTP/2 data streams and provides a data basis for matching preset detection rules.
  • the method further includes: receiving a matching rule for data detection, and matching the matching rule against the information stored in each HTTP/2 stream object structure to determine the detection result.
  • the matching rules for data detection refer to rules determined according to actual business requirements, which can be used to filter the information stored in each HTTP/2 flow object structure in at least one HTTP/2 flow object structure, and obtain a matching rule that meets the business requirements. information, use the filtering result as the detection result, and then call the detection result back to the external application, so that the external application can display or count the detection result.
  • the matching rule includes at least one of the following: preset header information length, preset data content length, and preset field content.
  • the preset header information length and the preset data content length can be used to filter out the information stored in HTTP/2 stream object structures with a fixed header length and data body length.
  • the preset field content can be a preset string, and the preset string can be a characteristic string of sensitive behaviors. The information stored in a matched HTTP/2 stream object structure is then determined to be a sensitive behavior, which enables detection of harmful network content or network attacks by hackers.
  • the preset string can also be a characteristic string corresponding to an application type, used to identify application types such as QQ, WeChat, or Taobao. The characteristic strings corresponding to the application types are stored in a background feature database, so that after the information stored in at least one HTTP/2 stream object structure is matched against the characteristic strings, the detection results for each application type are obtained, enabling statistics on service traffic and service proportion.
  • the preset string can also be a characteristic string of an area, such as the World Wide Web or a wireless network, or a string representing a time.
  • the detection result is determined by matching at least one of the preset header information length, the preset data content length, and the preset field content against the information stored in at least one HTTP/2 stream object structure. This enables filtering of the information stored in HTTP/2 stream object structures with a fixed header length and data body length, detection of harmful network content or network attacks by hackers, statistics on service traffic and service proportion, and detection of hotspot areas and peak hours of the HTTP/2 protocol.
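  • Exemplarily, matching a rule against stored stream objects can be sketched in Python as follows. This is a sketch under assumptions: the `StreamObject` and `MatchRule` classes, their field names, and the definition of header length as the summed length of names and values are hypothetical choices, since the text does not specify the layout of the stream object structure or how lengths are measured.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamObject:
    """Hypothetical stream object structure: header area plus data area."""
    stream_id: int
    headers: dict = field(default_factory=dict)
    data: bytes = b""


@dataclass
class MatchRule:
    """Any subset of the three criteria may be set; None means 'do not check'."""
    header_length: Optional[int] = None
    data_length: Optional[int] = None
    field_content: Optional[bytes] = None


def header_length(obj: StreamObject) -> int:
    # Assumed definition: total characters of all header names and values.
    return sum(len(k) + len(v) for k, v in obj.headers.items())


def matches(rule: MatchRule, obj: StreamObject) -> bool:
    if rule.header_length is not None and header_length(obj) != rule.header_length:
        return False
    if rule.data_length is not None and len(obj.data) != rule.data_length:
        return False
    if rule.field_content is not None:
        in_headers = any(rule.field_content in v.encode() for v in obj.headers.values())
        if not in_headers and rule.field_content not in obj.data:
            return False
    return True


def detect(rule: MatchRule, objects: List[StreamObject]) -> List[StreamObject]:
    """Return the stream objects that satisfy the rule (the detection result)."""
    return [obj for obj in objects if matches(rule, obj)]
```

  The returned list would then be handed back to the external application for display or counting.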
  • FIG. 2 is a schematic flowchart of a deep packet processing method provided in Embodiment 2 of the present application.
  • this embodiment adds reassembly of IP fragmented packets, extraction of the bearer data of the reassembled IP packets to obtain the TCP data stream, and sorting and deduplication of the TCP data stream.
  • the deep packet processing method provided by this embodiment includes:
  • S210 Receive the IP fragmentation message in the underlying protocol information, and reassemble the IP fragmentation message; extract the bearer data of the recombined IP message to obtain a TCP data stream.
  • the underlying protocol refers to the protocols at and below the network layer, such as the MAC (Media Access Control) protocol, the VLAN (Virtual Local Area Network) protocol, the MPLS (Multi-Protocol Label Switching) protocol, or IP (Internet Protocol).
  • when the underlying protocol is the MAC protocol, the VLAN protocol, or the MPLS protocol, the packet needs to be decapsulated to obtain the IP fragmented packet.
  • the header of an IP fragmented packet includes fields related to fragmentation: Identification, which is used to confirm whether different fragments belong to the same IP packet; Flags, in which MF (More Fragments) set to 1 means that more fragments follow and this fragment is not the last one; and Fragment Offset, which gives the offset of this fragment within the entire packet. Therefore, a complete IP packet can be reassembled according to the header information of multiple IP fragmented packets.
  • the data body part of the complete IP packet contains the TCP packet. By extracting the data body parts of multiple complete IP packets, multiple TCP packets are obtained, that is, the TCP data stream.
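  • Exemplarily, reassembly from the Identification, MF, and Fragment Offset fields can be sketched in Python as follows. This is a simplified sketch: the function name and the tuple layout are hypothetical, offsets are assumed to be already converted to bytes (the IPv4 header field counts in 8-byte units), fragments are grouped by Identification alone (a real implementation would also key on source address, destination address, and protocol), and overlapping or duplicated fragments are not handled.

```python
from collections import defaultdict


def reassemble(fragments):
    """Reassemble IP payloads.

    fragments: iterable of (identification, more_fragments, offset_bytes, payload).
    Returns {identification: payload} for every packet whose fragments are all present.
    """
    groups = defaultdict(list)
    for ident, more_fragments, offset, payload in fragments:
        groups[ident].append((offset, more_fragments, payload))

    complete = {}
    for ident, parts in groups.items():
        parts.sort(key=lambda p: p[0])      # order fragments by offset
        expected_offset = 0
        buf = bytearray()
        contiguous = True
        last_seen = False
        for offset, more_fragments, payload in parts:
            if offset != expected_offset:   # a fragment is missing
                contiguous = False
                break
            buf += payload
            expected_offset += len(payload)
            last_seen = not more_fragments  # MF == 0 marks the final fragment
        if contiguous and last_seen:
            complete[ident] = bytes(buf)
    return complete
```

  Each reassembled payload would then be parsed as a TCP packet to build the TCP data stream.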
  • S220 sort and deduplicate the TCP data stream according to the sequence number of the TCP data stream; for the sorted and deduplicated TCP data stream, release the TCP data based on the control information flag bit, wherein the control information flag for releasing Bits include disconnection and connection reset.
  • the header of each TCP packet in the TCP data stream contains a sequence number, which is the sequence number of the first byte of the data carried by that TCP packet. This ensures the ordering of the multiple TCP packets in the TCP data stream.
  • Exemplarily, if the sequence number of a TCP packet is 300 and the data portion of the TCP packet has a total of 100 bytes, the sequence number of the next TCP packet is 400.
  • the TCP packets can be sorted and deduplicated according to the sequence number to discard redundant TCP packets.
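  • Exemplarily, sorting and deduplicating by sequence number can be sketched in Python as follows. The function name and the (sequence number, payload) tuple layout are hypothetical, and the sketch ignores 32-bit sequence number wraparound, which a real implementation would need to handle.

```python
def order_segments(segments, initial_seq: int):
    """Sort TCP segments by sequence number and drop retransmitted bytes.

    segments: iterable of (seq, payload).
    Returns (contiguous_bytes, next_expected_seq); stops at the first gap.
    """
    next_seq = initial_seq
    out = bytearray()
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:
            continue                      # pure duplicate: all bytes already seen
        if seq > next_seq:
            break                         # gap: a segment has not arrived yet
        out += payload[next_seq - seq:]   # keep only the bytes not yet seen
        next_seq = end
    return bytes(out), next_seq
```

  With the numbers from the example above, a 100-byte segment at sequence number 300 advances the expected sequence number to 400, and a retransmitted copy of that segment is discarded.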
  • the header of each TCP packet in the TCP data stream contains control information flag bits, which characterize the properties of each TCP packet. For example: RST, the connection reset flag, is used to reset a connection that is in error because of a host crash or other reasons, or to reject illegal segments and connection requests; FIN, the connection disconnection flag, when set to 1 means that the sender has finished sending data and closes its data stream; ACK, the acknowledgment flag, when set to 1 means that the acknowledgment number is valid, and when 0 means that the packet carries no acknowledgment information and the acknowledgment number field is ignored.
  • when the control information flag bit of a TCP packet indicates connection disconnection or connection reset, the TCP packet is released, and HTTP/2 data stream processing is not performed on the TCP packet.
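  • Exemplarily, releasing on the FIN and RST flag bits can be sketched in Python as follows. The bit masks are the standard positions of these flags in the TCP header; the `flow_table` dictionary, the function names, and the choice to drop the whole flow entry on release are hypothetical.

```python
# Standard TCP control flag bit masks.
FIN = 0x01  # connection disconnection
RST = 0x04  # connection reset
ACK = 0x10  # acknowledgment number is valid

# Hypothetical per-flow store of payloads awaiting HTTP/2 processing.
flow_table = {}


def should_release(flags: int) -> bool:
    """Return True when the packet signals disconnection or reset."""
    return bool(flags & (FIN | RST))


def on_packet(flow_key, flags: int, payload: bytes) -> str:
    if should_release(flags):
        flow_table.pop(flow_key, None)  # release; no HTTP/2 processing
        return "released"
    flow_table.setdefault(flow_key, []).append(payload)
    return "kept"
```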
  • a TCP data flow maintenance table is established, and the TCP packets of the sorted, deduplicated, and released TCP data stream are stored in it, so as to improve the speed of HTTP/2 data stream processing for the TCP data stream.
  • S230 Acquire the TCP data stream, and determine whether the TCP data stream includes the HTTP/2 protocol based on the protocol identifier in the TCP data stream.
  • If the HTTP/2 protocol is included in the TCP data stream, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream.
  • In this embodiment, the IP fragmented packets in the underlying protocol information are first received and reassembled; the bearer data of the reassembled IP packets is then extracted to obtain the TCP data stream, and the TCP data stream is sorted and deduplicated according to its sequence numbers. HTTP/2 data stream processing is performed on the preprocessed TCP data stream, which reduces HTTP/2 data stream processing of redundant TCP data and improves processing efficiency.
  • FIG. 3 is a schematic structural diagram of a deep message processing apparatus according to Embodiment 3 of the present application. This embodiment is applicable to the need to parse HTTP/2 data streams and store data information of at least one HTTP/2 data stream. , to statistically analyze the situation of a large amount of HTTP/2 data stream data information, the apparatus includes: a protocol determination module 310 , a structure construction module 320 , a content extraction module 330 and a data storage module 340 .
  • the protocol determination module 310 is configured to obtain the TCP data stream, and determines whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream;
  • the structure building module 320 is configured to determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol when the HTTP/2 protocol is included in the TCP data stream, and based on the HTTP/2 data stream Type to build the corresponding HTTP/2 stream object structure;
  • the content extraction module 330 is configured to parse the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information in the HTTP/2 data stream;
  • the data storage module 340 is configured to store the parsed header information and data content of each HTTP/2 data stream into a corresponding HTTP/2 stream object structure.
  • the TCP data stream is obtained by the protocol determination module, and based on the protocol identifier in the TCP data stream, it is determined whether the TCP data stream contains the HTTP/2 protocol; based on the structure construction module, when the TCP data stream contains the HTTP/2 protocol , determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct the corresponding HTTP/2 stream object structure based on the type of HTTP/2 data stream through the structure building module, so as to obtain multiple A HTTP/2 stream object structure; and the content extraction module is used to parse the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information in the HTTP/2 data stream , through the data storage module, the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure, thereby realizing the parsing and storage of HTTP/2 data streams.
  • Each HTTP/2 data stream is stored in the corresponding structure, which provides a data basis for matching preset detection rules.
  • the preprocessing module includes a message reorganization unit and a message sorting unit, wherein,
  • the packet reorganization unit is configured to receive the IP fragmentation packets in the underlying protocol information, and reorganize the IP fragmentation packets; extract the bearer data of the reorganized IP packets to obtain the TCP data stream;
  • the packet sorting unit is set to sort and deduplicate the TCP data stream according to the sequence number of the TCP data stream; for the sorted and deduplicated TCP data stream, the TCP data is released based on the control information flag bit, wherein, for The released control information flags include disconnection and connection reset.
  • the HTTP/2 header mapping table includes a static header mapping table and a dynamic header mapping table
  • the content extraction module 330 is further configured to identify unparsed header information in the HTTP/2 data stream and match the unparsed header information in the static header mapping table to determine the corresponding parsed header information; when the static header mapping table does not include the unparsed header information, the dynamic header mapping table is called to determine the corresponding parsed header information.
  • the above-mentioned apparatus further includes: an information matching module, configured to, after the data storage module 340 stores the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure, receive a matching rule for data detection and match the matching rule against the information stored in each HTTP/2 stream object structure to determine the detection result.
  • the matching rule may include at least one of the following: preset header information length, preset data content length, and preset field content.
  • the protocol determination module 310 is further configured to add the TCP data stream to the blacklist when the TCP data stream does not contain the HTTP/2 protocol, so that the HTTP/2 data stream processing is not performed during the life cycle of the TCP data stream. .
  • the deep packet processing apparatus provided by the embodiment of the present application can execute the deep packet processing method provided by any embodiment of the present application, and has functional modules corresponding to the execution method.
  • FIG. 4 is a schematic structural diagram of an electronic device according to Embodiment 4 of the present application.
  • FIG. 4 shows a block diagram of an exemplary electronic device 40 suitable for implementing the embodiments of the present application.
  • the electronic device 40 shown in FIG. 4 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • electronic device 40 takes the form of a general-purpose computing device.
  • Components of the electronic device 40 may include, but are not limited to, at least one processor or processing unit 401, a system memory 402, and a bus 403 connecting different system components (including the system memory 402 and the processing unit 401).
  • Bus 403 represents at least one of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
  • Electronic device 40 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 40, including both volatile and non-volatile media, removable and non-removable media.
  • System memory 402 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 404 and/or cache memory 405 .
  • Electronic device 40 may include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 406 may be configured to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive").
  • a magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (e.g. "floppy disks") and an optical disk drive for reading from and writing to removable non-volatile optical disks (e.g. CD-ROM, DVD-ROM, or other optical media) may be provided.
  • Memory 402 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present application.
  • Program modules 407 generally perform the functions and/or methods of the embodiments described herein.
  • the electronic device 40 may also communicate with at least one external device 409 (e.g. a keyboard, pointing device, display 410, etc.), with at least one device that enables a user to interact with the electronic device 40, and/or with any device (e.g. network card, modem, etc.) that enables the electronic device 40 to communicate with at least one other computing device. Such communication may take place through an input/output (I/O) interface 411. In addition, the electronic device 40 can communicate with at least one network (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 412. As shown, network adapter 412 communicates with the other modules of electronic device 40 via bus 403.
  • the processing unit 401 executes various functional applications and data processing by running the program stored in the system memory 402, for example, implements a deep packet processing method provided by the embodiment of the present invention, and the method includes:
  • Obtain the TCP data stream, and determine whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream;
  • When the TCP data stream is determined to contain the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
  • the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure.
  • processor can also implement the technical solution of the deep packet processing method provided by any embodiment of the present application.
  • Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are used to execute a deep packet processing method when executed by a computer processor.
  • This embodiment provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the steps of the deep packet processing method provided by any embodiment of the present application, and the method includes:
  • Obtain the TCP data stream, and determine whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream;
  • When the TCP data stream is determined to contain the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
  • the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure.
  • the computer storage medium of the embodiments of the present application may adopt any combination of at least one computer-readable medium.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the embodiments of the present application may be written in at least one programming language or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g. over the Internet through an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

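The present application discloses a deep packet processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a TCP data stream, and determining, based on a protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol; when the TCP data stream is determined to contain the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream, thereby obtaining multiple HTTP/2 stream object structures; parsing the header information of each HTTP/2 data stream based on an HTTP/2 header mapping table, extracting the data content corresponding to the header information from the HTTP/2 data stream, and storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.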

Description

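Deep packet processing method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202011173763.8, filed with the China Patent Office on October 28, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of communication technologies, for example to a deep packet processing method and apparatus, an electronic device, and a storage medium.
Background
To help operators manage and control network traffic, DPI (Deep Packet Inspection) is widely used in networks. "Deep" is relative to the level of ordinary packet analysis: ordinary packet inspection analyzes only the content at or below layer 4 of an Internet Protocol (IP) packet, namely the source IP address, destination IP address, transport-layer source port, transport-layer destination port, and bearer protocol type, whereas DPI analyzes network protocols from layer 2 through layer 7. This allows precise awareness of the data in the network and therefore an accurate picture of the network's state, supporting service identification, service statistics, traffic control, network element analysis, and so on.
However, DPI is in essence a packet filtering technique: the application-layer payload must first be parsed, and the parsed information is then matched and filtered according to service requirements. Parsing out the application-layer payload information is therefore essential.
Summary
The present application provides a deep packet processing method and apparatus, an electronic device, and a storage medium that implement packet processing for the HTTP/2 protocol, thereby providing a data basis for matching preset detection rules.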
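In a first aspect, an embodiment of the present application provides a deep packet processing method, including:
acquiring a TCP data stream, and determining, based on a protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol;
when the TCP data stream is determined to contain the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
parsing the header information of each HTTP/2 data stream based on an HTTP/2 header mapping table, and extracting the data content corresponding to the header information from the HTTP/2 data stream;
storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
In a second aspect, an embodiment of the present application further provides a deep packet processing apparatus, including:
a protocol determination module, configured to acquire a TCP data stream and determine, based on a protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol;
a structure construction module, configured to, when the TCP data stream contains the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
a content extraction module, configured to parse the header information of each HTTP/2 data stream based on an HTTP/2 header mapping table, and extract the data content corresponding to the header information from the HTTP/2 data stream;
a data storage module, configured to store the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor;
a storage device, configured to store at least one program,
where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the deep packet processing method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the deep packet processing method provided by the embodiments of the present application.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a deep packet processing method provided by Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of a deep packet processing method provided by Embodiment 2 of the present application;
FIG. 3 is a schematic structural diagram of a deep packet processing apparatus provided by Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present application.
Detailed Description
The present application is described in detail below with reference to the drawings and embodiments.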
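Embodiment 1
FIG. 1 is a schematic flowchart of the deep packet processing method provided by Embodiment 1 of the present application. This embodiment is applicable to situations in which Hypertext Transfer Protocol 2.0 (HTTP/2) data streams need to be parsed and the data of at least one HTTP/2 data stream stored, so that the data of a large number of HTTP/2 data streams can be statistically analyzed. The method may be executed by a deep packet processing apparatus, which may be implemented in hardware and/or software. The method includes the following steps:
S110. Acquire a TCP data stream, and determine, based on the protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol.
TCP (Transmission Control Protocol) provides a connection-oriented byte-stream service. Two applications that use TCP (such as a client and a server) must establish a TCP connection before exchanging data packets, and the data transmitted over that connection is the TCP data stream. Each TCP data stream contains at least one TCP segment, and each TCP segment consists of a header and a data body carrying the payload. The TCP data stream may be obtained directly from the transport layer, or by decapsulating lower-layer packets from the link layer, network layer, and so on, where decapsulation means stripping headers or trailers layer by layer from the bottom up.
In this embodiment, the HTTP/2 protocol refers to HTTP/2 protocol frames. The protocol identifier refers to the protocol information contained in the data body of each TCP segment in the TCP data stream. Here, a protocol is an application-layer protocol such as HTTP, the File Transfer Protocol (FTP), or HTTP/2, and protocol information is a combination of at least one frame sent at the application layer: the frames generated by the application-layer protocol during an exchange are associated and assembled by the transport layer according to the flags in the frame headers and encapsulated in the data body of TCP segments. The application-layer protocol information can therefore be determined from the data body of a TCP segment. Different application types usually rely on different application-layer protocols, and each application-layer protocol has its own characteristics, such as a specific port, a specific string, or a specific bit sequence. Based on these characteristics, inspecting the protocol information in the data bodies of the TCP segments in a TCP data stream identifies the protocol the stream carries.
For example, determining whether the TCP data stream contains the HTTP/2 protocol based on the protocol identifier in the TCP data stream includes: extracting the first payload from the TCP data stream and matching it against the Magic string; if the match succeeds, the TCP data stream contains the HTTP/2 protocol. The first payload is the data body of the first TCP segment of the TCP data stream. When two applications that rely on HTTP/2 (such as a client and a server) exchange HTTP/2 frames, the first frame is normally the Magic frame, so this property is used as the signature of the HTTP/2 protocol, and matching the first frame against the Magic string determines whether the protocol is HTTP/2. The content of the Magic frame is fixed as PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n, so matching the first payload of the TCP data stream against the Magic string identifies the HTTP/2 protocol.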
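The Magic-string check in S110 can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `contains_http2` is hypothetical, and the sketch assumes the first payload is available as plaintext bytes.

```python
# The 24-byte HTTP/2 client connection preface (the "Magic" string).
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def contains_http2(first_payload: bytes) -> bool:
    """Return True if the first payload of a TCP stream starts with the preface.

    `first_payload` is assumed to be the data body of the first TCP segment
    that carries application data.
    """
    return first_payload.startswith(HTTP2_PREFACE)
```

A prefix test is used rather than an equality test because the first payload may carry further HTTP/2 frames directly after the Magic string.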
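Optionally, if the TCP data stream does not contain the HTTP/2 protocol, the TCP data stream is added to a blacklist so that no HTTP/2 data stream processing is performed during the life cycle of the TCP data stream. The life cycle of a TCP data stream is the preset period during which the TCP connection is kept alive: once the stream is blacklisted, none of the TCP data on that connection within the preset period undergoes HTTP/2 data stream processing.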
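One possible way to realize the blacklist is sketched below. It assumes, as an illustration only, that a TCP data stream is keyed by its address and port 4-tuple; `FlowKey`, `Http2Classifier`, and their methods are hypothetical names that do not appear in the application.

```python
from typing import NamedTuple, Set

HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


class FlowKey(NamedTuple):
    """Hypothetical key identifying one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Http2Classifier:
    def __init__(self) -> None:
        self.blacklist: Set[FlowKey] = set()

    def should_process(self, key: FlowKey, first_payload: bytes) -> bool:
        """Decide whether a stream goes through HTTP/2 processing."""
        if key in self.blacklist:
            return False  # already known not to be HTTP/2
        if first_payload.startswith(HTTP2_PREFACE):
            return True
        self.blacklist.add(key)  # skip this stream for the rest of its life cycle
        return False

    def on_connection_closed(self, key: FlowKey) -> None:
        """End of the life cycle: forget the entry so the key can be reused."""
        self.blacklist.discard(key)
```

Removing the entry when the connection ends keeps the blacklist bounded and avoids misclassifying a later connection that happens to reuse the same 4-tuple.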
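S120. If the TCP data stream contains the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream.
The stream identifier in the HTTP/2 protocol uniquely identifies an HTTP/2 data stream. It is 31 bits long, and at least one HTTP/2 data stream can be determined from it. A single HTTP/2 connection can contain multiple concurrently open HTTP/2 data streams. Each HTTP/2 data stream can be regarded as one request and carries at least one HTTP/2 frame, and frames from different HTTP/2 data streams are sent to the peer interleaved. According to the flags in the frame headers, the transport layer associates and assembles the HTTP/2 frames of the different streams and encapsulates them into TCP segments. For example, at least one HTTP/2 data stream in the TCP data stream is determined from the stream identifier as follows: first, determine the data body of a TCP segment from the TCP data stream, where the data body is obtained by associating and assembling the HTTP/2 frames of multiple HTTP/2 data streams according to the frame header flags; then, obtain the HTTP/2 frames from the data body; finally, determine at least one HTTP/2 data stream according to the stream identifiers of the frames. For example, if the stream identifiers of four HTTP/2 frames are 2, 4, 4, and 6, three HTTP/2 stream object structures are constructed.
In this embodiment, each HTTP/2 stream includes one header frame (headers), used to carry the additional header fields of the HTTP/2 stream, and may also include at least one message body frame (data), used to carry the HTTP/2 message body. The HTTP/2 stream object structure includes a storage area for the headers and a storage area for the corresponding data.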
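The grouping of frames by stream identifier in S120 might look like the following Python sketch. It relies on the standard 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier). The names `Http2StreamObject`, `iter_frames`, and `demultiplex` are illustrative assumptions, and padding and priority fields inside frame payloads are deliberately ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

FRAME_HEADER_LEN = 9
TYPE_DATA = 0x0
TYPE_HEADERS = 0x1


@dataclass
class Http2StreamObject:
    """Hypothetical stream object: one area for headers, one for data."""
    stream_id: int
    header_blocks: List[bytes] = field(default_factory=list)
    data_chunks: List[bytes] = field(default_factory=list)


def iter_frames(buf: bytes) -> Iterator[Tuple[int, int, int, bytes]]:
    """Yield (type, flags, stream_id, payload) for each complete frame."""
    offset = 0
    while offset + FRAME_HEADER_LEN <= len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type = buf[offset + 3]
        flags = buf[offset + 4]
        # Mask off the reserved top bit to keep the 31-bit stream identifier.
        stream_id = int.from_bytes(buf[offset + 5:offset + 9], "big") & 0x7FFFFFFF
        end = offset + FRAME_HEADER_LEN + length
        if end > len(buf):
            break  # incomplete frame: wait for more TCP payload
        yield frame_type, flags, stream_id, buf[offset + FRAME_HEADER_LEN:end]
        offset = end


def demultiplex(buf: bytes) -> Dict[int, Http2StreamObject]:
    """Build one stream object per stream identifier seen in the buffer."""
    streams: Dict[int, Http2StreamObject] = {}
    for frame_type, _flags, stream_id, payload in iter_frames(buf):
        if stream_id == 0:
            continue  # stream 0 carries connection-level frames
        obj = streams.setdefault(stream_id, Http2StreamObject(stream_id))
        if frame_type == TYPE_HEADERS:
            obj.header_blocks.append(payload)
        elif frame_type == TYPE_DATA:
            obj.data_chunks.append(payload)
    return streams
```

With frames whose stream identifiers are 2, 4, 4, and 6, `demultiplex` returns three stream objects, matching the example in the text.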
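S130. Parse the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information from the HTTP/2 data stream.
The header information of each HTTP/2 data stream is compressed with Huffman coding to reduce the transmission size and is carried by the headers frame of the HTTP/2 stream. The header information is obtained by Huffman-decoding the headers frame and then consulting the header mapping table. Both ends of an HTTP/2 data stream (such as the client and the server) must maintain the same header mapping table.
For example, the HTTP/2 header mapping table includes a static header mapping table and a dynamic header mapping table. Parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table includes: identifying the unparsed header information in the HTTP/2 data stream and matching it in the static header mapping table to determine the corresponding parsed header information; when the static header mapping table does not contain the unparsed header information, invoking the dynamic header mapping table to determine the corresponding parsed header information.
The static header mapping table contains common header names and common combinations of header name and value; a header name and its value form a header key-value pair, as shown in Table 1. The table is usually preset at both ends of the connection (such as the client and the server). Content can be added to the dynamic header mapping table at run time: for example, the client sends the server a request to add cookie:xxxxxxx to the dynamic header mapping table, so that the client and the server can subsequently represent the whole key-value pair with a single character. For example, unparsed header information in an HTTP/2 data stream is first matched against the static header mapping table; if no match is found, it is then matched against the dynamic header mapping table.
In this embodiment, the static header mapping table includes multiple header key-value pairs and multiple header names, as shown in Table 1. After the header information of an HTTP/2 data stream is received, there are two cases. If the whole header key-value pair exists in the static header mapping table, the pair can be looked up directly by index. For example, if the decoded index is 2, the static header mapping table yields :method: GET, a request to retrieve the resource identified by the Request-URI; if the decoded index is 3, it yields :method: POST, which appends new data to the resource identified by the Request-URI. If only the header name exists in the static header mapping table, the header name is looked up by index, the header value is decoded and used as the value for that header name, and the decoded key-value pair is added to the dynamic header mapping table so that later header information can be resolved directly by its index in the dynamic header mapping table. For example, the index is 32 (100000), and looking it up in the static dictionary gives cookie; the header value is Huffman-coded with a length of 28 (0011100); the next 28 bytes are the cookie value, which is Huffman-decoded to obtain the value for cookie, and the header name and value are then added to the dynamic header mapping table.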
Table 1
Index   Header Name   Header Value
1       :authority
2       :method       GET
3       :method       POST
4       :path         /
5       :path         /index.html
6       :scheme       http
7       :scheme       https
...     ...           ...
32      cookie
...     ...           ...
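The static-then-dynamic lookup described above can be sketched as follows. This is a simplified illustration under several stated assumptions: only the rows shown in Table 1 are included in the static table, only indexes and lengths that fit in a single byte are handled, and Huffman-coded values are not decoded (a full decoder would also need the Huffman code table). The class `HeaderTables` and its methods are hypothetical names. Dynamic entries are addressed starting at index 62, directly after the 61 entries of the complete static table.

```python
from typing import Dict, List, Tuple

# Subset of the static table, limited to the rows shown in Table 1.
STATIC_TABLE: Dict[int, Tuple[str, str]] = {
    1: (":authority", ""),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    5: (":path", "/index.html"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    32: ("cookie", ""),
}
STATIC_TABLE_SIZE = 61  # size of the complete static table


class HeaderTables:
    def __init__(self) -> None:
        self.dynamic: List[Tuple[str, str]] = []  # newest entry first

    def lookup(self, index: int) -> Tuple[str, str]:
        """Try the static table first, then fall back to the dynamic table."""
        if index in STATIC_TABLE:
            return STATIC_TABLE[index]
        pos = index - STATIC_TABLE_SIZE - 1
        if 0 <= pos < len(self.dynamic):
            return self.dynamic[pos]
        raise KeyError(index)

    def decode_field(self, block: bytes, offset: int = 0) -> Tuple[Tuple[str, str], int]:
        """Decode one header field; return ((name, value), next_offset)."""
        first = block[offset]
        if first & 0x80:  # 1xxxxxxx: whole key-value pair is indexed
            return self.lookup(first & 0x7F), offset + 1
        if (first & 0xC0) == 0x40:  # 01xxxxxx: indexed name, literal value
            name_index = first & 0x3F
            if name_index == 0:
                raise NotImplementedError("literal header names are not handled")
            length_byte = block[offset + 1]
            if length_byte & 0x80:
                raise NotImplementedError("Huffman-coded values are not handled")
            start = offset + 2
            end = start + (length_byte & 0x7F)
            value = block[start:end].decode("ascii")
            name = self.lookup(name_index)[0]
            self.dynamic.insert(0, (name, value))  # add to the dynamic table
            return (name, value), end
        raise NotImplementedError("representation not covered by this sketch")
```

For instance, the single byte `0x82` resolves to `:method: GET`, while `0x60` followed by a length byte and a literal value resolves the name through index 32 (cookie) and registers the new pair in the dynamic table, so that a later `0xBE` (index 62) returns the same pair.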
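S140. Store the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
Each HTTP/2 data stream contains one header frame, as well as a priority frame, a ping frame, or at least one message body frame, and so on. For example, if an HTTP/2 data stream exchanged between a client and a server includes one header frame and two message body frames, the parsed header information and the two message body frames are stored in the HTTP/2 stream object structure for that stream.
In the technical solution of this embodiment, a TCP data stream is acquired, and whether it contains the HTTP/2 protocol is determined based on the protocol identifier in the TCP data stream. When the TCP data stream is determined to contain the HTTP/2 protocol, at least one HTTP/2 data stream in the TCP data stream is determined according to the stream identifier in the HTTP/2 protocol, and a corresponding HTTP/2 stream object structure is constructed based on the type of the HTTP/2 data stream, yielding multiple HTTP/2 stream object structures. The header information of each HTTP/2 data stream is parsed based on the HTTP/2 header mapping table, the data content corresponding to the header information is extracted, and the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure. This achieves parsing and storage of HTTP/2 data streams, and storing each HTTP/2 data stream in its corresponding structure provides a data basis for matching preset detection rules.
Optionally, after the parsed header information and data content of each HTTP/2 data stream are stored in the corresponding HTTP/2 stream object structure, the method further includes: receiving matching rules for data detection, and matching the rules against the information stored in each HTTP/2 stream object structure to determine a detection result.
The matching rules for data detection are rules determined by actual service requirements. They can be used to filter the information stored in each of the at least one HTTP/2 stream object structure to obtain the information that meets the service requirements. The filtering result is taken as the detection result and returned to an external application, which can display it or compute statistics on it. Matching the rules against the information stored in each HTTP/2 stream object structure thus filters the stored information and produces the detection result corresponding to the service requirements.
For example, the matching rules include at least one of the following: a preset header information length, a preset data content length, and preset field content. The preset header information length and preset data content length select the stored information of HTTP/2 stream object structures with a fixed header length and data body length. The preset field content may be a preset string. The preset string may be a signature string of sensitive behavior, in which case the matched stored information is classified as sensitive behavior, enabling detection of harmful network content or network attacks by hackers. The preset string may also be a signature string corresponding to an application type, used to identify applications such as QQ, WeChat, or Taobao; the signature strings for the application types are stored in a back-end signature database, so that matching the stored information of at least one HTTP/2 stream object structure against them yields a detection result per application type, enabling statistics on traffic direction and service share. The preset string may also be a signature string for a region, such as the World Wide Web or a wireless network, or a string representing time; computing statistics on the matched stored information gives the hotspot regions and peak times of the HTTP/2 protocol.
In this embodiment, matching at least one of the preset header information length, preset data content length, and preset field content against the information stored in at least one HTTP/2 stream object structure determines the detection result. This enables selection of stored information with a fixed header length and data body length, detection of harmful network content or network attacks by hackers, statistics on traffic direction and service share, and discovery of the hotspot regions and peak times of the HTTP/2 protocol.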
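A hedged sketch of the three rule types follows. The application does not define how header length is measured or where field content is searched, so both choices below are assumptions made for illustration: header length is the total length of names plus values, and field content is searched in the data and in header values. `StreamRecord`, `MatchRule`, and `detect` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StreamRecord:
    """Hypothetical view of what one HTTP/2 stream object structure stores."""
    stream_id: int
    headers: Dict[str, str] = field(default_factory=dict)
    data: bytes = b""


@dataclass
class MatchRule:
    """A rule with up to three optional criteria; unset criteria are ignored."""
    header_length: Optional[int] = None
    data_length: Optional[int] = None
    field_content: Optional[bytes] = None


def header_size(record: StreamRecord) -> int:
    # Assumption: header length = total length of all names and values.
    return sum(len(k) + len(v) for k, v in record.headers.items())


def contains_field(record: StreamRecord, needle: bytes) -> bool:
    if needle in record.data:
        return True
    return any(needle in v.encode("utf-8") for v in record.headers.values())


def matches(rule: MatchRule, record: StreamRecord) -> bool:
    if rule.header_length is not None and header_size(record) != rule.header_length:
        return False
    if rule.data_length is not None and len(record.data) != rule.data_length:
        return False
    if rule.field_content is not None and not contains_field(record, rule.field_content):
        return False
    return True


def detect(rule: MatchRule, records: List[StreamRecord]) -> List[StreamRecord]:
    """Return the records that satisfy every criterion set in the rule."""
    return [r for r in records if matches(rule, r)]
```

Treating unset criteria as wildcards lets a single rule type cover the "at least one of" wording: a rule may set any one, two, or all three criteria.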
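Embodiment 2
FIG. 2 is a schematic flowchart of the deep packet processing method provided by Embodiment 2 of the present application. On the basis of the above embodiments, this embodiment adds the steps of reassembling IP fragments, extracting the payload of the reassembled IP packets to obtain the TCP data stream, and sorting and deduplicating the TCP data stream. Terms that are the same as or correspond to those in the above embodiments are not explained again here. Referring to FIG. 2, the deep packet processing method provided by this embodiment includes:
S210. Receive IP fragments in the lower-layer protocol information and reassemble the IP fragments; extract the payload of the reassembled IP packets to obtain the TCP data stream.
The lower-layer protocols are the protocols at and below the network layer, such as the MAC (Media Access Control) protocol, the VLAN (Virtual Local Area Network) protocol, the MPLS (Multi-Protocol Label Switching) protocol, or IP (Internet Protocol). For example, if the lower-layer protocol is MAC, VLAN, or MPLS, that encapsulation must be removed to obtain the IP fragments. The header of an IP fragment contains fragmentation-related fields: Identification, used to determine whether different fragments belong to the same IP packet; Flags, where MF set to 1 indicates that more fragments follow, that is, this fragment is not the last one; and Fragment Offset, which gives the offset of this fragment within the whole packet. The complete IP packet can therefore be reassembled from the header information of its fragments. The data body of a complete IP packet contains a TCP segment, so extracting the data bodies of multiple complete IP packets yields multiple TCP segments, that is, the TCP data stream.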
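Reassembly from the Identification, MF, and Fragment Offset fields can be illustrated as below. This is a sketch under simplifying assumptions: fragments arrive already parsed into a hypothetical `Fragment` record, timeouts and overlapping fragments are not handled, and the Fragment Offset is taken in units of 8 bytes as in the IPv4 header.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Fragment:
    """Hypothetical parsed view of one IPv4 fragment."""
    src: str
    dst: str
    protocol: int
    identification: int
    more_fragments: bool  # MF flag
    offset: int           # Fragment Offset field, in 8-byte units
    payload: bytes


class Reassembler:
    def __init__(self) -> None:
        self._pending: Dict[Tuple[str, str, int, int], List[Fragment]] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Add a fragment; return the full payload once all pieces are present."""
        key = (frag.src, frag.dst, frag.protocol, frag.identification)
        frags = self._pending.setdefault(key, [])
        frags.append(frag)
        last = [f for f in frags if not f.more_fragments]
        if not last:
            return None  # the final fragment (MF = 0) has not arrived yet
        total = last[0].offset * 8 + len(last[0].payload)
        buf = bytearray(total)
        covered = 0
        for f in sorted(frags, key=lambda f: f.offset):
            start = f.offset * 8
            if start > covered:
                return None  # a hole remains: keep waiting
            buf[start:start + len(f.payload)] = f.payload
            covered = max(covered, start + len(f.payload))
        if covered < total:
            return None
        del self._pending[key]
        return bytes(buf)
```

Fragments may arrive in any order; `add` returns `None` until the fragment with MF = 0 has been seen and the offsets cover the packet without gaps.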
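S220. Sort and deduplicate the TCP data stream according to the sequence numbers of the TCP data stream; for the sorted and deduplicated TCP data stream, release TCP data based on the control flags, where the control flags used for release include connection teardown and connection reset.
The header of each TCP segment in the TCP data stream contains a sequence number, which is the number of the first byte of the data carried by that segment and ensures that the TCP segments in the stream are transmitted in order. For example, if a TCP segment has sequence number 300 and carries 100 bytes of data, the next TCP segment has sequence number 400. TCP segments can be sorted and deduplicated by sequence number so that redundant segments are discarded. The header of each TCP segment also contains control flags that indicate the nature of the segment, such as: RST, the connection reset flag, used to reset a connection that is in error because of a host crash or another cause, or to reject an invalid segment or refuse a connection request; FIN, the connection teardown flag, where FIN set to 1 means the sender has finished sending its data and closes its side of the data stream; and ACK, the acknowledgment flag, where 1 means the acknowledgment number is valid and 0 means the segment carries no acknowledgment and the acknowledgment number field is ignored. If the control flag of a TCP segment is connection teardown or connection reset, the segment is released and does not undergo HTTP/2 data stream processing. Optionally, a TCP data stream maintenance table is established, and the TCP segments remaining after sorting, deduplication, and release are stored in it to speed up HTTP/2 data stream processing of the TCP data stream.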
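The sorting, deduplication, and release steps of S220 can be sketched as follows. The sketch makes simplifying assumptions that a production reassembler could not: sequence numbers do not wrap around, a retransmitted segment fully duplicates bytes already seen, and the sequence-number space consumed by SYN and FIN is ignored. `Segment` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

FIN = 0x01  # connection teardown flag
RST = 0x04  # connection reset flag
ACK = 0x10  # acknowledgment flag


@dataclass
class Segment:
    """Hypothetical parsed view of one TCP segment."""
    seq: int
    flags: int
    payload: bytes


def order_and_dedupe(segments: Iterable[Segment]) -> List[Segment]:
    """Sort by sequence number and drop segments that repeat earlier bytes."""
    ordered: List[Segment] = []
    next_seq = None
    for seg in sorted(segments, key=lambda s: s.seq):
        if next_seq is not None and seg.seq < next_seq:
            continue  # retransmission of data already kept
        ordered.append(seg)
        next_seq = seg.seq + len(seg.payload)
    return ordered


def release_closed(segments: Iterable[Segment]) -> List[Segment]:
    """Release (drop) segments whose control flags include FIN or RST."""
    return [s for s in segments if not s.flags & (FIN | RST)]


def stream_payload(segments: Iterable[Segment]) -> bytes:
    """Concatenate the payload that remains for HTTP/2 processing."""
    kept = release_closed(order_and_dedupe(segments))
    return b"".join(s.payload for s in kept)
```

In line with the example in the text, a segment with sequence number 300 and 100 bytes of data is followed by a segment with sequence number 400; a second copy of the segment at 300 is discarded as redundant.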
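S230. Acquire the TCP data stream, and determine, based on the protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol.
S240. If the TCP data stream contains the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream.
S250. Parse the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information from the HTTP/2 data stream.
S260. Store the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
S270. Receive matching rules for data detection, and match the rules against the information stored in each HTTP/2 stream object structure to determine a detection result.
In the technical solution of this embodiment, before HTTP/2 data stream processing is performed on the TCP data stream, IP fragments in the lower-layer protocol information are first received and reassembled; the payload of the reassembled IP packets is then extracted to obtain the TCP data stream, which is sorted and deduplicated according to its sequence numbers; for the sorted and deduplicated TCP data stream, TCP data is released based on the control flags. This preprocesses the TCP data stream, and HTTP/2 data stream processing is performed on the preprocessed stream, which reduces HTTP/2 processing of redundant TCP data and improves processing efficiency.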
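Embodiment 3
FIG. 3 is a schematic structural diagram of a deep packet processing apparatus provided by Embodiment 3 of the present application. This embodiment is applicable to situations in which HTTP/2 data streams need to be parsed and the data of at least one HTTP/2 data stream stored, so that the data of a large number of HTTP/2 data streams can be statistically analyzed. The apparatus includes a protocol determination module 310, a structure construction module 320, a content extraction module 330, and a data storage module 340.
The protocol determination module 310 is configured to acquire a TCP data stream and determine, based on the protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol;
the structure construction module 320 is configured to, when the TCP data stream contains the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
the content extraction module 330 is configured to parse the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extract the data content corresponding to the header information from the HTTP/2 data stream;
the data storage module 340 is configured to store the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
In this embodiment, the protocol determination module acquires the TCP data stream and determines, based on the protocol identifier in the TCP data stream, whether it contains the HTTP/2 protocol. When the TCP data stream contains the HTTP/2 protocol, the structure construction module determines at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol and constructs a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream, yielding multiple HTTP/2 stream object structures. The content extraction module parses the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table and extracts the data content corresponding to the header information, and the data storage module stores the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure. This achieves parsing and storage of HTTP/2 data streams, and storing each HTTP/2 data stream in its corresponding structure provides a data basis for matching preset detection rules.
Optionally, the apparatus further includes a preprocessing module, which includes a packet reassembly unit and a packet sorting unit, where:
the packet reassembly unit is configured to receive IP fragments in the lower-layer protocol information, reassemble the IP fragments, and extract the payload of the reassembled IP packets to obtain the TCP data stream;
the packet sorting unit is configured to sort and deduplicate the TCP data stream according to the sequence numbers of the TCP data stream and, for the sorted and deduplicated TCP data stream, release TCP data based on the control flags, where the control flags used for release include connection teardown and connection reset.
Optionally, the HTTP/2 header mapping table includes a static header mapping table and a dynamic header mapping table, and the content extraction module 330 is further configured to identify the unparsed header information in the HTTP/2 data stream and match it in the static header mapping table to determine the corresponding parsed header information; when the static header mapping table does not contain the unparsed header information, the dynamic header mapping table is invoked to determine the corresponding parsed header information.
Optionally, the apparatus further includes an information matching module, configured to, after the data storage module 340 stores the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure, receive matching rules for data detection and match the rules against the information stored in each HTTP/2 stream object structure to determine a detection result. The matching rules may include at least one of the following: a preset header information length, a preset data content length, and preset field content.
Optionally, the protocol determination module 310 is further configured to add the TCP data stream to a blacklist when the TCP data stream does not contain the HTTP/2 protocol, so that no HTTP/2 data stream processing is performed during the life cycle of the TCP data stream.
The deep packet processing apparatus provided by the embodiments of the present application can execute the deep packet processing method provided by any embodiment of the present application and has the functional modules corresponding to the method.
It is worth noting that the units and modules of the above apparatus are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the embodiments of the present application.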
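Embodiment 4
FIG. 4 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present application. FIG. 4 shows a block diagram of an exemplary electronic device 40 suitable for implementing the embodiments of the present application. The electronic device 40 shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 4, the electronic device 40 takes the form of a general-purpose computing device. Components of the electronic device 40 may include, but are not limited to: at least one processor or processing unit 401, a system memory 402, and a bus 403 connecting the different system components (including the system memory 402 and the processing unit 401).
Bus 403 represents at least one of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
The electronic device 40 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by the electronic device 40, including volatile and non-volatile media, and removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 404 and/or cache memory 405. The electronic device 40 may include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 406 may be configured to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (e.g. "floppy disks") and an optical disk drive for reading from and writing to removable non-volatile optical disks (e.g. Compact Disc-Read Only Memory (CD-ROM), Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 through at least one data media interface. The memory 402 may include at least one program product having a set (e.g. at least one) of program modules configured to perform the functions of the embodiments of the present application.
A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in the memory 402. Such program modules 407 include, but are not limited to, an operating system, at least one application program, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 407 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device 40 may also communicate with at least one external device 409 (e.g. a keyboard, pointing device, display 410, etc.), with at least one device that enables a user to interact with the electronic device 40, and/or with any device (e.g. network card, modem, etc.) that enables the electronic device 40 to communicate with at least one other computing device. Such communication may take place through an input/output (I/O) interface 411. In addition, the electronic device 40 can communicate with at least one network (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 412. As shown, the network adapter 412 communicates with the other modules of the electronic device 40 via the bus 403. It should be understood that, although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with the electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processing unit 401 executes various functional applications and data processing by running the programs stored in the system memory 402, for example implementing the deep packet processing method provided by the embodiments of the present application. The method includes:
acquiring a TCP data stream, and determining, based on the protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol;
when the TCP data stream is determined to contain the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extracting the data content corresponding to the header information from the HTTP/2 data stream;
storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
Of course, those skilled in the art will understand that the processor can also implement the technical solution of the deep packet processing method provided by any embodiment of the present application.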
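Embodiment 5
Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a deep packet processing method.
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the deep packet processing method provided by any embodiment of the present application. The method includes:
acquiring a TCP data stream, and determining, based on the protocol identifier in the TCP data stream, whether the TCP data stream contains the HTTP/2 protocol;
when the TCP data stream is determined to contain the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table, and extracting the data content corresponding to the header information from the HTTP/2 data stream;
storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
The computer storage medium of the embodiments of the present application may be any combination of at least one computer-readable medium. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having at least one wire, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and so on, or any suitable combination of the above.
Computer program code for performing the operations of the embodiments of the present application may be written in at least one programming language or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g. over the Internet through an Internet service provider).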

Claims (10)

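  1. A deep packet processing method, including:
    acquiring a Transmission Control Protocol (TCP) data stream, and determining, based on a protocol identifier in the TCP data stream, whether the TCP data stream contains the Hypertext Transfer Protocol 2.0 (HTTP/2) protocol;
    in response to the TCP data stream containing the HTTP/2 protocol, determining at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and constructing a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
    parsing the header information of each HTTP/2 data stream based on an HTTP/2 header mapping table, and extracting the data content corresponding to the header information from the HTTP/2 data stream;
    storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
  2. The method according to claim 1, where acquiring the TCP data stream includes:
    receiving Internet Protocol (IP) fragments in lower-layer protocol information, and reassembling the IP fragments;
    extracting the payload of the reassembled IP packets to obtain the TCP data stream.
  3. The method according to claim 1, before acquiring the TCP data stream, further including:
    sorting and deduplicating the TCP data stream according to the sequence numbers of the TCP data stream;
    for the sorted and deduplicated TCP data stream, releasing TCP data based on control flags, where the control flags used for release include connection teardown and connection reset.
  4. The method according to claim 1, where the HTTP/2 header mapping table includes a static header mapping table and a dynamic header mapping table;
    where parsing the header information of each HTTP/2 data stream based on the HTTP/2 header mapping table includes:
    identifying unparsed header information in the HTTP/2 data stream, and matching the unparsed header information in the static header mapping table to determine the parsed header information corresponding to the unparsed header information;
    when the static header mapping table does not contain the unparsed header information, invoking the dynamic header mapping table to determine the parsed header information corresponding to the unparsed header information.
  5. The method according to claim 1, after storing the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure, further including:
    receiving matching rules for data detection, and matching the matching rules against the information stored in each HTTP/2 stream object structure to determine a detection result.
  6. The method according to claim 5, where the matching rules include at least one of the following: a preset header information length, a preset data content length, and preset field content.
  7. The method according to claim 1, further including: in response to the TCP data stream not containing the HTTP/2 protocol, adding the TCP data stream to a blacklist so that no HTTP/2 data stream processing is performed during the life cycle of the TCP data stream.
  8. A deep packet processing apparatus, including:
    a protocol determination module, configured to acquire a Transmission Control Protocol (TCP) data stream and determine, based on a protocol identifier in the TCP data stream, whether the TCP data stream contains the Hypertext Transfer Protocol 2.0 (HTTP/2) protocol;
    a structure construction module, configured to, when the TCP data stream contains the HTTP/2 protocol, determine at least one HTTP/2 data stream in the TCP data stream according to the stream identifier in the HTTP/2 protocol, and construct a corresponding HTTP/2 stream object structure based on the type of the HTTP/2 data stream;
    a content extraction module, configured to parse the header information of each HTTP/2 data stream based on an HTTP/2 header mapping table, and extract the data content corresponding to the header information from the HTTP/2 data stream;
    a data storage module, configured to store the parsed header information and data content of each HTTP/2 data stream in the corresponding HTTP/2 stream object structure.
  9. An electronic device, including:
    at least one processor;
    a storage device, configured to store at least one program,
    where the at least one program, when executed by the at least one processor, causes the at least one processor to implement the deep packet processing method according to claims 1-7.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the deep packet processing method according to claims 1-7.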
PCT/CN2021/107642 2020-10-28 2021-07-21 深度报文处理方法、装置、电子设备及存储介质 WO2022088779A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011173763.8 2020-10-28
CN202011173763.8A CN112311789B (zh) 2020-10-28 2020-10-28 深度报文处理方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022088779A1 true WO2022088779A1 (zh) 2022-05-05

Family

ID=74331575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107642 WO2022088779A1 (zh) 2020-10-28 2021-07-21 深度报文处理方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN112311789B (zh)
WO (1) WO2022088779A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113872976A (zh) * 2021-09-29 2021-12-31 绿盟科技集团股份有限公司 一种基于http2攻击的防护方法、装置及电子设备
CN115190056A (zh) * 2022-09-08 2022-10-14 杭州海康威视数字技术股份有限公司 一种可编排的流量协议识别与解析方法、装置及设备
CN115296878A (zh) * 2022-07-27 2022-11-04 天翼云科技有限公司 一种报文检测方法、装置、电子设备及存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112311789B (zh) * 2020-10-28 2023-02-28 北京锐安科技有限公司 深度报文处理方法、装置、电子设备及存储介质
CN115412538A (zh) * 2021-05-11 2022-11-29 北京字跳网络技术有限公司 网络请求信息的处理方法、装置、设备及存储介质
CN113691523B (zh) * 2021-08-20 2023-10-10 中科国昱(合肥)科技有限公司 面向实时网络流量密码应用评估方法和终端设备
CN115883682B (zh) * 2021-09-24 2024-11-08 北京中创信测科技股份有限公司 一种基于http2协议的数据传输方法和装置
CN114389863B (zh) * 2021-12-28 2024-02-13 绿盟科技集团股份有限公司 一种蜜罐交互的方法、装置、蜜罐网络、设备及存储介质
CN114553494B (zh) * 2022-01-26 2024-02-13 深圳市风云实业有限公司 一种基于数据报文的轻量级染色与检测方法及装置
CN115361334B (zh) * 2022-10-19 2023-01-31 深圳市光联世纪信息科技有限公司 基于深度包检测技术的sd-wan流量识别方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101399843A (zh) * 2007-09-27 2009-04-01 中兴通讯股份有限公司 报文深度过滤方法
US10291682B1 (en) * 2016-09-22 2019-05-14 Juniper Networks, Inc. Efficient transmission control protocol (TCP) reassembly for HTTP/2 streams
CN110636151A (zh) * 2019-10-25 2019-12-31 新华三信息安全技术有限公司 一种报文处理方法、装置、防火墙及存储介质
US20200162537A1 (en) * 2018-11-20 2020-05-21 International Business Machines Corporation Passive re-assembly of http2 fragmented segments
CN112311789A (zh) * 2020-10-28 2021-02-02 北京锐安科技有限公司 深度报文处理方法、装置、电子设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7865589B2 (en) * 2007-03-12 2011-01-04 Citrix Systems, Inc. Systems and methods for providing structured policy expressions to represent unstructured data in a network appliance
CN102882703B (zh) * 2012-08-31 2015-08-19 赛尔网络有限公司 一种基于http分析的url自动分类分级的系统及方法
US10044620B2 (en) * 2015-05-01 2018-08-07 Hughes Network Systems, Llc Multi-phase IP-flow-based classifier with domain name and HTTP header awareness
US11431677B2 (en) * 2018-01-11 2022-08-30 Nicira, Inc. Mechanisms for layer 7 context accumulation for enforcing layer 4, layer 7 and verb-based rules

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113872976A (zh) * 2021-09-29 2021-12-31 绿盟科技集团股份有限公司 一种基于http2攻击的防护方法、装置及电子设备
CN113872976B (zh) * 2021-09-29 2023-06-02 绿盟科技集团股份有限公司 一种基于http2攻击的防护方法、装置及电子设备
CN115296878A (zh) * 2022-07-27 2022-11-04 天翼云科技有限公司 一种报文检测方法、装置、电子设备及存储介质
CN115296878B (zh) * 2022-07-27 2023-11-03 天翼云科技有限公司 一种报文检测方法、装置、电子设备及存储介质
CN115190056A (zh) * 2022-09-08 2022-10-14 杭州海康威视数字技术股份有限公司 一种可编排的流量协议识别与解析方法、装置及设备

Also Published As

Publication number Publication date
CN112311789B (zh) 2023-02-28
CN112311789A (zh) 2021-02-02

Similar Documents

Publication Publication Date Title
WO2022088779A1 (zh) 深度报文处理方法、装置、电子设备及存储介质
US20200120075A1 (en) Hardware-accelerated payload filtering in secure communication
US20170300595A1 (en) Data packet extraction method and apparatus
US8009672B2 (en) Apparatus and method of splitting a data stream over multiple transport control protocol/internet protocol (TCP/IP) connections
US9294589B2 (en) Header compression with a code book
KR100997182B1 (ko) 플로우 정보 제한장치 및 방법
US20030039249A1 (en) Method and system for efficient layer 3-layer 7 routing of internet protocol ("IP") fragments
US8311039B2 (en) Traffic information aggregating apparatus
WO2016082371A1 (zh) 一种基于ssh协议的会话解析方法及系统
CN114157502B (zh) 一种终端识别方法、装置、电子设备及存储介质
TWI745034B (zh) 封包聚合及解聚合方法
WO2014135038A1 (zh) 基于pcie总线的报文传输方法与装置
CN110650018A (zh) 一种报文防篡改方法和装置
CN109525495B (zh) 一种数据处理装置、方法和fpga板卡
CN112671771B (zh) 数据传输方法、装置、电子设备及介质
US7991008B2 (en) Method for identifying the transmission control protocol stack of a connection
CN112436998A (zh) 一种数据传输方法及电子设备
CN108460044B (zh) 数据的处理方法和装置
WO2022100581A1 (zh) Ipfix消息的处理方法、存储介质、网络交换芯片及asic芯片
WO2002051077A1 (en) A method and system for distinguishing higher layer protocols of the internet traffic
CN105635182B (zh) 一种数据压缩传输方法及系统
CN108881124B (zh) 在模块间实现高性能通信的方法、系统、存储介质及设备
CN112491662A (zh) 一种icmp隐蔽隧道检测方法及装置
Lukashin et al. Distributed packet trace processing method for information security analysis
CN115801927A (zh) 报文解析方法及装置

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884515

Country of ref document: EP

Kind code of ref document: A1