CN116974983A

CN116974983A - Data processing method, device, computer readable medium and electronic equipment

Info

Publication number: CN116974983A
Application number: CN202210449462.6A
Authority: CN
Inventors: 刘国旭
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2023-10-31

Abstract

The application belongs to the field of data processing, and relates to a data processing method, a data processing device, a computer readable medium and electronic equipment. The method comprises the following steps: in response to a data checking request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node; and comparing the first primary key information and the first version number with the second primary key information and the second version number respectively to obtain a data checking result, and executing a target operation according to the data checking result. The application enables data checking without locking the system, reduces the occupation of system resources, reduces the amount of data used for data checking, and improves the efficiency of data checking.

Description

Data processing method, device, computer readable medium and electronic equipment

Technical Field

The application belongs to the technical field of data processing, and particularly relates to a data processing method, a data processing device, a computer readable medium and electronic equipment.

Background

For an online service system, a large amount of data is transmitted between the nodes of the system at all times, and the data may pass through middleware such as Kafka and Redis during transmission. If an abnormality occurs in any link, the data at the source end and the destination end become inconsistent, which inevitably affects the operation of the system. Data checking between nodes is therefore necessary. However, the data in the system is changing constantly, and how to perform data checking under such dynamic conditions is a difficulty.

At present, data checking is mainly performed by locking. First, a check request is sent to the source end whose data is to be checked. After receiving the check request, the source end pauses receiving new data from upstream, waits until the existing data has been transmitted to the destination end, and then returns. Data is then obtained from the source end and the destination end and checked, and finally the data that differs is sent to the destination end to complete the check. For a system with a large data volume, suspending data transmission in the system for data checking occupies a large amount of resources, affects normal data transmission, and further affects the smooth operation of user services.

Disclosure of Invention

The application provides a data processing method, a data processing device, a computer readable medium and electronic equipment, which can solve the problem that locking is needed to suspend data transmission when data check is performed under the condition of dynamic change of system data in the related technology.

Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.

In a first aspect, an embodiment of the present application provides a data processing method, including: responding to a data checking request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node; and comparing the first primary key information and the first version number with the second primary key information and the second version number respectively to obtain a data checking result, and executing target operation according to the data checking result.

In a second aspect, an embodiment of the present application provides a data processing apparatus, including: the copy acquisition module is used for responding to a data check request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node; and the data comparison module is used for comparing the first primary key information and the first version number with the second primary key information and the second version number respectively so as to obtain a data checking result and executing target operation according to the data checking result.

In a third aspect, embodiments of the present application provide a computer-readable medium, on which a computer program is stored, which computer program, when being executed by a processor, implements a data processing method as in the above technical solutions.

In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the data processing method as in the above technical solution via execution of the executable instructions.

In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the above-described data processing method.

According to the data processing method provided by the embodiment of the application, on one hand, after a data checking request is responded to, a first data copy corresponding to a source node and a second data copy corresponding to a destination node can be obtained, and by comparing the first primary key information and the first version number in the first data copy with the second primary key information and the second version number in the second data copy respectively, it is judged whether the data sent by the source node and the data received by the destination node are identical. Because data copies are used, data transmission does not need to be stopped and the line transmitting the data does not need to be locked, so that data checking is performed without locking the system and the occupation of system resources is reduced. On the other hand, the first data copy and the second data copy are formed only from the primary key information and the version number in each row of data, so that the amount of data used in data checking is reduced and the data checking efficiency is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

Fig. 1 schematically shows a block diagram of a system to which the technical solution of the application is applied in one embodiment.

Fig. 2 schematically shows a flow chart of data verification in a locking manner in the related art.

Fig. 3 schematically shows a flow diagram of the steps of a data processing method in one embodiment.

FIG. 4 schematically illustrates a flow diagram for acquiring a first copy of data and a second copy of data in one embodiment.

Fig. 5 schematically illustrates a flow diagram for obtaining a data verification result in one embodiment.

FIG. 6 schematically illustrates a flow diagram of performing a target operation based on data verification results in one embodiment.

FIG. 7 schematically illustrates an interface diagram of a Redis-based distributed lock in one embodiment.

FIG. 8 schematically illustrates a flow diagram for data verification and data repair in one embodiment.

Fig. 9 schematically shows a block diagram of the structure of the data processing apparatus in one embodiment.

FIG. 10 schematically illustrates a block diagram of a computer system suitable for use in implementing an embodiment of the application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

In order to facilitate understanding of the technical solution of the present application, technical terms related to the present application are explained herein.

1. Multi-version concurrency control (Multiversion concurrency control, MCC or MVCC) is a concurrency control method commonly used by database management systems, and is also used to implement transactional memory in programming languages. MVCC is intended to solve the problem, caused by read-write locks, of multiple long-running read operations starving write operations. The data item read by each transaction is a historical snapshot, which depends on the isolation level implemented. A write operation does not overwrite an existing data item, but creates a new version that does not become visible until the operation commits. Snapshot isolation allows a transaction to see the data state as of its start.

MVCC mainly solves the following problems:

1. Reads and writes do not block each other, that is, a read does not block a write and a write does not block a read, so the concurrent processing capacity for transactions can be improved. The evolution of approaches to improving concurrency is as follows: an ordinary lock allows only serial execution; a read-write lock allows read-read concurrency; and multi-version concurrency control allows read-write concurrency.

2. The probability of deadlock is reduced. MVCC adopts an optimistic locking approach: no lock is needed when reading data, and a write operation locks only the necessary data rows.

3. The problem of consistent reads is solved. A consistent read, also known as a snapshot read, queries a snapshot of the database at a point in time: only the results of transactions committed before that point in time can be seen, and the results of transactions committed after that point in time cannot be seen.

2. Data copy: data generated by performing a consistent read on each node of a data flow in the system to produce a snapshot, and then extracting the primary key information and the version number from the snapshot.

3. Primary key information: information that can distinguish data rows, which may be, for example, a node number or a node name.

4. Version number: indicates the update status of the data in a data row. The version number is a globally auto-incrementing number; whenever any one or more items of data in the data row change, the version number is incremented.

5. Source node, destination node: determined according to the direction of data flow. Of two adjacent nodes, the node that sends data is the source node and the node that receives data is the destination node.
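The terms above can be illustrated with a few simple types. The following is a minimal sketch and not the implementation of the application; the names `DataCopy` and `adjacent_pairs` and the node labels are assumptions introduced only for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# A data copy maps primary key information to a version number (terms 2 to 4).
# The alias name is hypothetical.
DataCopy = Dict[Hashable, int]


def adjacent_pairs(flow: List[str]) -> List[Tuple[str, str]]:
    """Return a (source, destination) pair for every two adjacent nodes of a data flow."""
    return list(zip(flow, flow[1:]))


# Hypothetical data flow: Node 1 -> Node 2 -> Node 3.
pairs = adjacent_pairs(["Node 1", "Node 2", "Node 3"])
print(pairs)
```

Under this sketch, Node 2 is the destination node of the first pair and the source node of the second pair, which matches the definition by data flow direction.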

Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.

As shown in fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. The terminal device 101 may include various electronic devices such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart television, and a smart car terminal. The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 102 is a communication medium of various connection types capable of providing a communication link between terminal device 101 and server 103, and may be, for example, a wired communication link or a wireless communication link.

The system architecture in embodiments of the present application may have any number of terminal devices, networks, and servers, as desired for implementation. For example, the server may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the server 103, or may be applied to the terminal device 101, or may be implemented by both the terminal device 101 and the server 103, which is not limited in particular.

In one embodiment of the present application, the user performs a service operation through the terminal device 101 and sends a service request to the server 103 through the network 102, so that the server 103 processes the service request and returns a service processing result, where the service may specifically be online shopping, participating in a flash sale, online appointment registration, and so on. During service processing, the service system comprises a plurality of nodes, and service data circulates among the nodes of the service system. To ensure that service requests submitted by users are processed correctly, abnormal conditions such as missing data, lost data and incorrectly transmitted data must not occur during data transmission. Accordingly, operation and maintenance personnel need to check the data in the system periodically or when a system abnormality is found, or a data checking interval can be set in the system so that the data is checked automatically. When data checking is performed, the data between any two adjacent nodes in each service line of the service system can be checked, wherein the node that sends the data is the source node and the node that receives the data is the destination node. A multi-version concurrency control mechanism is then used to perform a consistent read on the data of the source node and the destination node to obtain a first data copy corresponding to the source node and a second data copy corresponding to the destination node. The data in the first data copy and the second data copy are then compared to obtain a data checking result, and a target operation is performed according to the data checking result.

The data checking result is one of two types: abnormal data transmission or normal data transmission. If the data transmission is normal, no processing is needed and the next data check is awaited; if the data transmission is abnormal, the data in the destination node needs to be repaired according to the correct data in the source node.
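The repair step described above can be sketched as follows. This is a hedged illustration only: the function `repair_destination`, the in-memory `Table` type, and the policy of deleting destination rows that no longer exist in the source are assumptions, not details stated in the application.

```python
from typing import Dict, Hashable, Iterable

Row = Dict[str, object]
Table = Dict[Hashable, Row]  # hypothetical in-memory stand-in for a node's data table


def repair_destination(source: Table, destination: Table,
                       abnormal_keys: Iterable[Hashable]) -> int:
    """Make the destination rows match the source rows for the given keys.

    Returns the number of destination rows that were written or removed.
    """
    changed = 0
    for key in abnormal_keys:
        if key in source:
            # Copy the correct row from the source node.
            destination[key] = dict(source[key])
            changed += 1
        elif key in destination:
            # Assumption: a row absent from the source is removed from the destination.
            del destination[key]
            changed += 1
    return changed


source = {"A": {"value": "x", "version": 3}, "B": {"value": "y", "version": 1}}
destination = {"A": {"value": "old", "version": 2}}
changed = repair_destination(source, destination, ["A", "B"])
print(changed, destination == source)
```

If the checking result is normal, the list of abnormal keys is empty and the function leaves the destination untouched.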

In an embodiment of the present application, a data checking service for performing data checking may be set up in the server 103, or may be set up in the terminal device 101, and when a triggering condition of data checking is satisfied, the data checking service may be invoked to check the data of the source and destination nodes to be checked. In the embodiment of the application, the data checking service is an external service, and the influence on other services and data transmission in the system can be avoided by independently setting the data checking service, and the workload of modifying the system code is reduced.

Further, when the first data copy and the second data copy are formed, they can be generated from the primary key information and the version number in each row of data, wherein the primary key information is the data that distinguishes each row of data, and the version number indicates how many times the data in the data row has been changed. Under normal conditions, the version number corresponding to a given piece of primary key information in the source node is the same as the version number corresponding to that primary key information in the destination node; if the version numbers differ, or the primary key information is missing, the data transmission process is abnormal.
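The comparison rule in the preceding paragraph can be sketched as a comparison of two mappings from primary key to version number. This is a minimal sketch under assumptions: the names `CheckResult` and `compare_copies` and the three result categories are introduced here for illustration and are not taken from the application.

```python
from typing import Dict, Hashable, List, NamedTuple

DataCopy = Dict[Hashable, int]  # primary key information -> version number


class CheckResult(NamedTuple):
    missing_in_destination: List[Hashable]
    missing_in_source: List[Hashable]
    version_mismatch: List[Hashable]

    @property
    def normal(self) -> bool:
        """True when no key is missing on either side and all versions agree."""
        return not (self.missing_in_destination
                    or self.missing_in_source
                    or self.version_mismatch)


def compare_copies(first: DataCopy, second: DataCopy) -> CheckResult:
    """Compare the source copy (first) with the destination copy (second)."""
    return CheckResult(
        missing_in_destination=[k for k in first if k not in second],
        missing_in_source=[k for k in second if k not in first],
        version_mismatch=[k for k in first
                          if k in second and first[k] != second[k]],
    )


result = compare_copies({"A": 3, "B": 1, "C": 2}, {"A": 3, "B": 2})
print(result, result.normal)
```

Only primary keys and version numbers are compared, so the amount of data examined does not grow with the width of each row.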

In one embodiment of the present application, a cloud server that provides cloud computing services may be used to execute the data processing method in the present application, and accordingly, the technical solution of the present application relates to cloud computing and cloud storage in cloud technology.

Cloud computing (cloud computing) is a computing model that distributes computing tasks across a resource pool formed by a large number of computers, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's point of view, resources in the cloud appear infinitely expandable, and can be acquired at any time, used on demand, expanded at any time, and paid for according to use.

As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally referred to as an IaaS (Infrastructure as a Service) platform) is established, in which multiple types of virtual resources are deployed for external clients to select and use.

According to logical function division, a PaaS (Platform as a Service) layer can be deployed on an IaaS (Infrastructure as a Service) layer, and a SaaS (Software as a Service) layer can be deployed above the PaaS layer, or the SaaS layer can be deployed directly on the IaaS layer. PaaS is a platform on which software runs, such as a database or a web container. SaaS covers a wide variety of business software, such as web portals and bulk SMS senders. Generally, SaaS and PaaS are upper layers relative to IaaS.

Cloud storage (cloud storage) is a new concept that extends and develops in the concept of cloud computing, and a distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of various types in a network to work cooperatively through application software or application interfaces through functions such as cluster application, grid technology, and a distributed storage file system, so as to provide data storage and service access functions for the outside.

At present, the storage method of the storage system is as follows: when logical volumes are created, each logical volume is allocated a physical storage space, which may be composed of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, the data is stored on a file system. The file system divides the data into a plurality of parts, each part being an object that contains not only the data but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage position information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage position information of each object.

The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: the physical storage space is divided into stripes in advance according to capacity estimates for the objects to be stored on the logical volume (these estimates tend to leave a large margin relative to the capacity of the objects actually stored) and the group of the redundant array of independent disks (RAID, Redundant Array of Independent Disk); a logical volume can be understood as a stripe, and physical storage space is thereby allocated to the logical volume.

Fig. 2 is a schematic flow chart of data checking by locking in the related art. As shown in fig. 2, the flow direction of service data is Node 1 → Node 2 → Node 3. When operation and maintenance personnel want to check the data between Node 2 and Node 3, a data checking request is first sent to Node 2. After Node 2 receives the data checking request, it pauses receiving new data from Node 1, waits for the existing data to finish being transmitted to Node 3, and then returns. The operation and maintenance personnel then acquire data from Node 2 and Node 3 respectively, start the check, and finally send the data that differs to Node 3 to complete the check.

Although the scheme shown in fig. 2 has little influence on a system with a small data volume, for a system with a large data volume, locking to pause data transmission between two nodes for data checking occupies a large amount of resources, affects normal data transmission, and in turn seriously affects user services.

In view of the problems in the related art, the following describes in detail, with reference to specific embodiments, a data processing method, a data processing apparatus, a computer readable medium, an electronic device, and other technical solutions provided in the present application.

Fig. 3 schematically shows a flow chart of steps of a data processing method according to an embodiment of the present application, which may be performed by a data checking service provided in a server, which may specifically be the server 103 in fig. 1, but may also be performed by a data checking service provided in the terminal device 101. As shown in fig. 3, the data processing method in the embodiment of the present application may mainly include the following S310 to S320.

S310: responding to a data checking request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node;

S320: and comparing the first primary key information and the first version number with the second primary key information and the second version number respectively to obtain a data checking result, and executing target operation according to the data checking result.

According to the data processing method provided by the embodiment of the application, on one hand, after a data checking request is responded, a first data copy corresponding to a source node and a second data copy corresponding to a destination node can be obtained, and by comparing the first primary key information and the first version number in the first data copy with the second primary key information and the second version number in the second data copy respectively, whether the data sent by the source node and the data received by the destination node are identical or not is judged, and because the data copy is adopted, the data transmission is not required to be stopped, and the transmitted data is not required to be locked, so that the system non-locking data checking is realized, and the occupation of system resources is reduced; on the other hand, the first data copy and the second data copy are formed only according to the primary key information and the version number in each row of data, so that the data quantity adopted in data checking is reduced, and the data checking efficiency is improved.

Specific implementations of the individual method steps of the data processing method are described in detail below.

In S310, in response to the data check request, a first data copy corresponding to the source node and a second data copy corresponding to the destination node are obtained, where the first data copy includes first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy includes second primary key information and a second version number corresponding to each piece of data in the destination node.

In one embodiment of the present application, the conditions for triggering the data checking task fall mainly into two cases: in one, operation and maintenance personnel trigger the data checking task when an abnormal condition is found while the system is running; in the other, the system actively triggers the data checking task at a preset time, for example during an off-peak service period. Whether the task is triggered by operation and maintenance personnel or by the system, after responding to the data checking request, the data checking service can acquire the data corresponding to the source node and the destination node to be checked from the data table maintained at the source node and the data table maintained at the destination node, according to the time point of responding to the data checking request, so as to perform data checking. The source node and the destination node to be checked can be nodes designated by operation and maintenance personnel or by the system, and can also be some or all of the nodes in each service thread in the system.

In one embodiment of the present application, the system may be a distributed system or a non-distributed system. When the data is checked, the data can be checked on a plurality of node groups consisting of the source node and the destination node at the same time. When data verification is performed according to the acquired data corresponding to the source node and the destination node to be verified, the data verification can be performed through a first data copy corresponding to the source node and a second data copy corresponding to the destination node, wherein the first data copy and the second data copy comprise primary key information and version numbers corresponding to all pieces of data in a data table, so that locking of service threads between the source node and the destination node for verifying the data between the source node and the destination node can be avoided, and further influence on normal data transmission of the outside and influence on user service are avoided.
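Checking several node groups at the same time, as mentioned above, can be sketched with a thread pool. This is an illustrative assumption about one possible arrangement: the application does not state how concurrent checks are scheduled, and `copies_match` and the sample data copies are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Tuple

DataCopy = Dict[Hashable, int]  # primary key information -> version number


def copies_match(pair: Tuple[DataCopy, DataCopy]) -> bool:
    """Return True when the source and destination copies hold identical keys and versions."""
    first, second = pair
    return first == second


# Hypothetical (source copy, destination copy) pairs for two node groups.
node_groups: List[Tuple[DataCopy, DataCopy]] = [
    ({"A": 1, "B": 2}, {"A": 1, "B": 2}),
    ({"A": 4}, {"A": 3}),
]

# Each node group is checked independently, so no group waits on a lock held by another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(copies_match, node_groups))

print(results)
```

Because each check works only on its own pair of data copies, the checks share no mutable state and need no coordination.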

Fig. 4 shows a schematic flow chart of acquiring the first data copy and the second data copy, and as shown in fig. 4, the flow includes at least S401-S403:

In S401, snapshot reading is performed on data corresponding to the source node and data corresponding to the destination node based on a multi-version concurrency control mechanism, so as to obtain a first snapshot corresponding to the source node and a second snapshot corresponding to the destination node.

In one embodiment of the present application, the consistent read mechanism in MVCC, that is, snapshot read, is mainly adopted. By performing a snapshot read on the data corresponding to the source node and the data corresponding to the destination node, a first snapshot corresponding to the source node and a second snapshot corresponding to the destination node can be obtained. According to the mechanism of the consistent read (snapshot read), the first snapshot contains the data in the data table corresponding to the source node as updated by the transactions committed before the time point of responding to the data checking request. Similarly, the second snapshot contains the data in the data table corresponding to the destination node as updated by the transactions committed before the time point of responding to the data checking request.
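The snapshot read used in S401 can be illustrated with a toy multi-version table. This is a simplified sketch of the general MVCC idea, not the storage engine used by the application; the class `MVCCTable`, its logical commit timestamps, and the sample rows are assumptions.

```python
import itertools
from typing import Dict, Hashable, List, Tuple


class MVCCTable:
    """Toy multi-version table: every committed write appends a new row version."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)  # logical commit timestamps
        self._versions: Dict[Hashable, List[Tuple[int, dict]]] = {}
        self.last_commit = 0

    def commit(self, key: Hashable, row: dict) -> int:
        """Commit a new version of a row and return its commit timestamp."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, dict(row)))
        self.last_commit = ts
        return ts

    def snapshot(self, as_of: int) -> Dict[Hashable, dict]:
        """Return, for each key, the latest version committed at or before as_of."""
        result = {}
        for key, versions in self._versions.items():
            visible = [row for ts, row in versions if ts <= as_of]
            if visible:
                result[key] = visible[-1]
        return result


table = MVCCTable()
table.commit("A", {"value": "a1", "version": 1})
check_time = table.commit("B", {"value": "b1", "version": 1})  # check request arrives here
table.commit("A", {"value": "a2", "version": 2})  # a later write proceeds without blocking

snap = table.snapshot(as_of=check_time)
print(snap)
```

The write committed after `check_time` is accepted normally but is invisible to the snapshot, which is why the check does not need to pause data transmission.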

In S402, first primary key information and a first version number corresponding to each piece of data are extracted from the first snapshot, and second primary key information and a second version number corresponding to each piece of data are extracted from the second snapshot.

In one embodiment of the present application, the data table corresponding to each node contains multiple pieces of data, and each piece of data includes data in multiple dimensions. To reduce the amount of data used in data checking without affecting the checking result, after the first snapshot and the second snapshot are acquired, data of the key dimensions that can indicate data update information may be extracted from the first snapshot and the second snapshot. In the embodiment of the present application, the data of the key dimensions specifically includes the primary key information and the version number corresponding to each piece of data. The primary key information is information for distinguishing each piece of data, and may be, for example, a node number or a node name. The version number indicates the update status of the data in each data row and is a globally auto-incrementing version number: when the data in a data row changes, the version number corresponding to that data row may be incremented by 1 or by another fixed value, which is not specifically limited in the embodiment of the present application. Because the version number of a row is updated whenever the data of that row changes, the rows in which system data has changed can be identified from the version number values. For example, when a piece of data has primary key information A and version number 3, the data in the data row corresponding to A was changed 3 times before the time point of responding to the data checking request.
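The version number behaviour described above can be sketched as follows. This is a minimal illustration that assumes a counter kept per data row, consistent with the example in which version number 3 means three changes; the class `VersionedTable`, the field names, and the sample values are hypothetical.

```python
from typing import Dict, Hashable


class VersionedTable:
    """Toy table in which every effective change to a row increments its version number."""

    def __init__(self, step: int = 1) -> None:
        self.step = step  # fixed increment; 1 by default, other fixed values are allowed
        self.rows: Dict[Hashable, dict] = {}

    def upsert(self, key: Hashable, **fields: object) -> int:
        """Insert or update a row and return its version number afterwards."""
        row = self.rows.setdefault(key, {"version": 0})
        changed = any(row.get(name) != value for name, value in fields.items())
        if changed:
            row.update(fields)
            row["version"] += self.step
        return row["version"]


table = VersionedTable()
table.upsert("A", next_hop="10.0.0.1")    # first change
table.upsert("A", next_hop="10.0.0.2")    # second change
table.upsert("A", next_hop="10.0.0.2")    # identical data, no change
table.upsert("A", prefix="10.1.0.0/16")   # third change
print(table.rows["A"]["version"])
```

In this sketch a write that leaves the row unchanged does not increment the version number; whether such writes count as changes is an assumption made here.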

In one embodiment of the present application, after the first snapshot and the second snapshot are acquired, first primary key information and a first version number corresponding to each piece of data may be extracted from the first snapshot, and second primary key information and a second version number corresponding to each piece of data may be extracted from the second snapshot, so that a first data copy may be generated according to the first primary key information and the first version number, and a second data copy may be generated according to the second primary key information and the second version number.

Taking routing data among network controllers as an example, each network controller controls a plurality of virtual routing and forwarding instances (Virtual Routing Forwarding, abbreviated as VRF), and any two adjacent network controllers serve as a source node and a destination node respectively. According to the method in the above embodiment, the snapshots and the data copies corresponding to the source node and the destination node can be obtained. Table 1 and table 2 show the first snapshot corresponding to the source node and the second snapshot corresponding to the destination node:

TABLE 1 first snapshot corresponding to source node

TABLE 2 second snapshot corresponding to destination node

As can be seen from tables 1 and 2, each data row in the obtained first snapshot and second snapshot includes data in five dimensions, namely VRF, Cidr (Classless Inter-Domain Routing), A, B and the version number, where A and B may be VRF-related information such as the next hop IP address (next hop) and the prefix (Prefix). Since a data row can be uniquely determined by VRF and Cidr, the VRF, Cidr and version number can be extracted from the first snapshot and the second snapshot to form the first data copy and the second data copy. Table 3 and table 4 show the first data copy and the second data copy:

TABLE 3 first data copy corresponding to Source node

Table 4 second data copy corresponding to destination node
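The extraction of data copies from snapshots can be sketched as follows. This is only an illustration: the row layout as a dictionary, the function name `make_data_copy`, and all values of A, B and the version numbers are placeholders assumed for the example, since the table contents are not reproduced here. Only the (VRF, Cidr) keys follow the example discussed in this description.

```python
def make_data_copy(snapshot):
    """Keep only the primary key (VRF, Cidr) and the version number of each row."""
    return {(row["VRF"], row["Cidr"]): row["version"] for row in snapshot}


# Placeholder snapshots: A, B and version values are invented for illustration.
first_snapshot = [
    {"VRF": "VRF1", "Cidr": "10.206.0.1", "A": "a1", "B": "b1", "version": 1},
    {"VRF": "VRF1", "Cidr": "10.206.0.2", "A": "a2", "B": "b2", "version": 1},
    {"VRF": "VRF2", "Cidr": "172.21.0.1", "A": "a3", "B": "b3", "version": 1},
    {"VRF": "VRF2", "Cidr": "172.21.0.3", "A": "a4", "B": "b4", "version": 1},
]
# The destination snapshot lacks the (VRF2, 172.21.0.3) row.
second_snapshot = first_snapshot[:3]

first_copy = make_data_copy(first_snapshot)
second_copy = make_data_copy(second_snapshot)
print(len(first_copy), len(second_copy))  # 4 3
```

Each copy carries three fields per row instead of five, which is the reduction in compared data that the embodiment relies on.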

In S403, the first data copy is generated according to the first primary key information and the first version number, and the second data copy is generated according to the second primary key information and the second version number.

In one embodiment of the present application, after the first primary key information, the first version number, the second primary key information, and the second version number are acquired, a first data copy may be generated according to the first primary key information and the first version number, and a second data copy may be generated according to the second primary key information and the second version number. And comparing the data in the first data copy and the second data copy to determine whether the data has abnormality in the process of flowing from the source node to the destination node.

In one embodiment of the present application, the first version number and the second version number are globally self-incrementing. If no abnormality occurs during data transmission, the same primary key information should exist in the data of the source node and the destination node, and the version numbers corresponding to the same primary key information should be the same. If the primary key information in the destination node differs from the primary key information in the source node, or the version numbers corresponding to the same primary key information in the source node and the destination node differ, an abnormality has occurred during data transmission. Therefore, in the embodiment of the present application, whether an abnormality occurs when data flows from the source node to the destination node can be determined by comparing whether the primary key information in the source node and the destination node is the same, and whether the version numbers corresponding to the same primary key information in the source node and the destination node are the same.

In one embodiment of the present application, while the version number is updated according to the change of the data in a data row, a log recording the data before and after the update may be generated according to the updated data. If the data changes are inconsistent, the data row corresponding to the destination node is rolled back according to the data in the log corresponding to the source node, so that when the version numbers corresponding to the same primary key information in the source node and the destination node are the same, the data of all data rows with versions lower than that version number is consistent between the source node and the destination node.

In S320, the first primary key information and the first version number are compared with the second primary key information and the second version number, respectively, so as to obtain a data checking result, and a target operation is executed according to the data checking result.

In one embodiment of the present application, after the first data copy and the second data copy are obtained, the first primary key information and the first version number in the first data copy may be compared with the second primary key information and the second version number in the second data copy, so as to obtain a data checking result. In the embodiment of the application, there are two types of data checking result: normal data transmission and abnormal data transmission. When the data transmission is normal, the system data needs no processing; when the data transmission is abnormal, the erroneous data needs to be repaired according to the correct data. Abnormal data transmission is further divided into two cases. The first is data loss: in the first data copy and the second data copy shown in table 3 and table 4, the second data copy lacks the data row corresponding to (VRF2, 172.21.0.3) that is present in the first data copy. The second is data error: for example, the data row corresponding to (VRF2, 172.21.0.3) exists in the second data copy shown in table 4, but its version number differs from the version number corresponding to (VRF2, 172.21.0.3) in the first data copy, that is, an error occurred in data transmission. Therefore, abnormal data transmission needs to be judged in two ways.

Fig. 5 shows a flow chart of obtaining a data checking result. As shown in fig. 5, in S501, the first primary key information is compared with the second primary key information; in S502, when the first primary key information is the same as the second primary key information, a first target version number corresponding to the first primary key information and a second target version number corresponding to the second primary key information are obtained; in S503, the first target version number is compared with the second target version number; in S504, when the first target version number is different from the second target version number, the data checking result is determined to be abnormal data transmission.

When second target primary key information that is the same as the first target primary key information exists, a first target version number corresponding to the first target primary key information and a second target version number corresponding to the second target primary key information are acquired, the first target version number is compared with the second target version number, and the data checking result is determined according to the comparison result. Likewise, according to the above method, the data checking result may be obtained when there is only one piece of first primary key information and/or second primary key information.

The method shown in fig. 5 can determine abnormal data transmission caused by a data transmission error. In addition, when the first primary key information is compared with the second primary key information and no second primary key information identical to the first primary key information exists, data was lost when flowing from the source node to the destination node, and the data checking result can be determined to be abnormal data transmission. Taking the first data copy and the second data copy shown in table 3 and table 4 as an example, the primary key information (VRF1, 10.206.0.1), (VRF1, 10.206.0.2), (VRF2, 172.21.0.1) and (VRF2, 172.21.0.3) in the first data copy is compared with the primary key information (VRF1, 10.206.0.1), (VRF1, 10.206.0.2) and (VRF2, 172.21.0.1) in the second data copy. The comparison shows that (VRF1, 10.206.0.1), (VRF1, 10.206.0.2) and (VRF2, 172.21.0.1) in the first data copy are the same as the corresponding primary key information in the second data copy, so the version numbers corresponding to this primary key information can be obtained and compared. The version numbers corresponding to this primary key information are all the same, which indicates that these data rows were transmitted without abnormality. However, the primary key information (VRF2, 172.21.0.3) is not present in the second data copy, which indicates that data was lost during data transmission, and the corresponding data row needs to be added to the data table of the destination node.
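The two judgments described above, a missing primary key and a version mismatch for a shared primary key, can be sketched in one pass over the first data copy. This is an illustrative sketch only: the function name `check_copies` and the result layout are hypothetical, and the version numbers are placeholders rather than the values of table 3 and table 4.

```python
def check_copies(first_copy, second_copy):
    """Compare two {primary_key: version} copies and classify the differences."""
    missing = []      # keys present at the source but absent at the destination
    mismatched = []   # keys present on both sides with different version numbers
    for key, first_version in first_copy.items():
        if key not in second_copy:
            missing.append(key)
        elif second_copy[key] != first_version:
            mismatched.append(key)
    return {
        "normal": not missing and not mismatched,
        "missing": missing,
        "mismatched": mismatched,
    }


# Placeholder version numbers; only the keys follow the example in the text.
first_copy = {
    ("VRF1", "10.206.0.1"): 1,
    ("VRF1", "10.206.0.2"): 1,
    ("VRF2", "172.21.0.1"): 1,
    ("VRF2", "172.21.0.3"): 1,
}
second_copy = {k: v for k, v in first_copy.items() if k != ("VRF2", "172.21.0.3")}

result = check_copies(first_copy, second_copy)
print(result["missing"])  # [('VRF2', '172.21.0.3')]
```

Storing each copy as a dictionary keyed by primary key keeps the lookup per row constant-time, so the check is linear in the number of rows of the first data copy.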

In one embodiment of the application, no matter the data is lost or the data is wrong, the data corresponding to the destination node is required to be repaired, so that the data received by the destination node is ensured to be the same as the data sent by the source node, and further the influence on the processing result of the user service is avoided.

In one embodiment of the present application, when repairing data, a service thread corresponding to a data line to be repaired needs to be locked to suspend data transmission between a source node and a destination node, and then repair a data table of the destination node according to correct source node data. Fig. 6 shows a flow chart of performing a target operation according to a data check result, and as shown in fig. 6, in S601, when the data check result is abnormal for data transmission, target primary key information corresponding to the abnormal data is acquired; in S602, switching a lock state corresponding to the target primary key information to a locked state to stop data transmission between the source node and the destination node; in S603, data repair is performed on the destination node according to the source node data corresponding to the target primary key information.

When the data of the destination node is repaired according to the source node data corresponding to the target primary key information, the target primary key information is first compared with the primary key information in the data table corresponding to the source node to acquire the target data corresponding to the target primary key information; the data of the destination node is then repaired according to the target data.

Since data transmission anomalies include data loss and data errors, the manner of data repair differs accordingly. Specifically, when a version number corresponding to the target primary key information exists in the second data copy and differs from the version number corresponding to the target primary key information in the first data copy, that is, when the type of the data transmission abnormality is a data error, the data to be updated, which is the data corresponding to the target primary key information, can be obtained from the data table corresponding to the destination node according to the target primary key information; the data to be updated is then updated according to the acquired target data. When no second primary key information corresponding to the target primary key information exists in the second data copy, that is, when the type of the data transmission abnormality is data loss, the target data can be inserted into the data table corresponding to the destination node so as to complete the data table of the destination node.
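The two repair paths can be sketched as below, assuming each data table is modelled as a dictionary from primary key to a full row. The function name `repair_destination`, the table layout and the field values are hypothetical choices made for this illustration.

```python
def repair_destination(source_table, destination_table, target_key):
    """Repair one destination row from the source row with the same primary key.

    Returns "update" for a data error (row exists but is stale) and
    "insert" for data loss (row is absent at the destination).
    """
    target_data = source_table[target_key]           # look up the correct source row
    action = "update" if target_key in destination_table else "insert"
    destination_table[target_key] = dict(target_data)  # copy so the tables stay independent
    return action


# Placeholder rows: A, B and version values are invented for illustration.
source_table = {
    ("VRF2", "172.21.0.1"): {"A": "a3", "B": "b3", "version": 2},
    ("VRF2", "172.21.0.3"): {"A": "a4", "B": "b4", "version": 1},
}
destination_table = {
    ("VRF2", "172.21.0.1"): {"A": "a3-old", "B": "b3", "version": 1},
}

print(repair_destination(source_table, destination_table, ("VRF2", "172.21.0.1")))  # update
print(repair_destination(source_table, destination_table, ("VRF2", "172.21.0.3")))  # insert
```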

In one embodiment of the application, the lock used is a distributed lock. After the target primary key information is determined, the state of the lock can be switched to the locked state by assigning a value to the lock corresponding to the target primary key information, which facilitates data repair of the destination node. A distributed lock is a way of controlling synchronized access to shared resources across distributed systems. If one resource or a group of resources is shared between different systems, or between different hosts of the same system, access to these resources often needs to be mutually exclusive to prevent interference and ensure consistency, and this requires a distributed lock.

In one embodiment of the application, the distributed lock may be a Redis-based distributed lock or a ZooKeeper-based distributed lock.

When the distributed lock is based on Redis, the data checking service can record the key of the lock in Redis. Each source node acquires the key by periodically polling Redis and locks the corresponding service thread according to the key, so that the data in the service thread can be repaired. A Redis-based distributed lock is described next, based on the first data copy and the second data copy shown in tables 3 and 4.

FIG. 7 shows an interface diagram of a Redis-based distributed lock. As shown in FIG. 7, the data checking service 701 records a key by means of Redis 702. When it is determined that the second data copy lacks the information corresponding to the primary key information (VRF2, 172.21.0.3) and data repair is needed, the data checking service 701 sets a value for the key lock-VRF2 to suspend data transmission of the VRF2 service thread, and inserts the data corresponding to (VRF2, 172.21.0.3) in the source node into the data table of the destination node 705.
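The key-based locking in FIG. 7 can be sketched without a running Redis instance. The class `InMemoryStore` below is a hypothetical in-memory stand-in that only imitates "set a key if it is absent", "check whether a key exists" and "delete a key"; it is not the Redis API, and the key format `lock-<service>` follows the lock-VRF2 example in the text.

```python
class InMemoryStore:
    """Hypothetical stand-in for Redis: a lock is held while its key exists."""

    def __init__(self):
        self._data = {}

    def set_if_absent(self, key, value):
        # Succeeds only when nobody holds the key, so locking is mutually exclusive.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def exists(self, key):
        return key in self._data

    def delete(self, key):
        self._data.pop(key, None)


def lock_key(service):
    """Build the lock key for a service type, e.g. 'lock-VRF2'."""
    return f"lock-{service}"


store = InMemoryStore()
acquired = store.set_if_absent(lock_key("VRF2"), "data-check-service")
print(acquired, store.exists(lock_key("VRF2")), store.exists(lock_key("VRF1")))  # True True False

store.delete(lock_key("VRF2"))  # repair finished: release the lock
print(store.exists(lock_key("VRF2")))  # False
```

Because the key is scoped to one service type, locking VRF2 leaves transmission for VRF1 untouched. Against a real Redis server, an atomic set-if-not-exists with an expiry would normally be preferred so that a crashed lock holder cannot block transmission indefinitely; that detail is outside what the text specifies.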

In one embodiment of the present application, if data transmission of the VRF2 service thread exists between both source nodes 703 and 704 and the destination node 705, the source nodes 703 and 704 need to periodically poll whether the key exists in Redis. If it does not, the first source node to query, for example source node 703, acquires the lock and, by locking, suspends data transmission of the corresponding VRF2 service thread between itself and the destination node 705, while source node 704 continues to poll periodically until source node 703 releases the lock; source node 704 then acquires the lock and locks, so that the data of the destination node 705 can be repaired. If data transmission of the VRF2 service thread exists only between one of the source nodes 703/704 and the destination node 705, only that source node needs to be locked, and data transmission of the corresponding VRF2 service thread between that source node and the destination node 705 is suspended.

Further, after the data repair is completed, the distributed lock is deleted, the data transmission is resumed, and the data buffered by the source nodes 703 and 704 from the previous node is sent to the destination node 705.

In one embodiment of the present application, when a key exists in Redis, the lock corresponding to the key is locked and data transmission cannot be performed; only when the key does not exist in Redis is the lock released and normal data transmission possible. Therefore, before sending data to the destination node, the source node needs to send a query request, obtain the state of the lock corresponding to the primary key information in Redis according to the primary key information in the query request, and determine according to the state of the lock whether the data can be sent to the destination node.
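The query-before-send behaviour, together with buffering while the lock is held and resuming after it is released, can be sketched as follows. This is an assumption-laden illustration: the class `SourceNode`, the use of a plain set as a stand-in for the keys present in Redis, and the list standing in for the destination node are all hypothetical.

```python
class SourceNode:
    """Hypothetical source node that checks the lock state before every send."""

    def __init__(self, locked_keys, destination):
        self.locked_keys = locked_keys  # stand-in for the set of keys present in Redis
        self.destination = destination  # list that plays the role of the destination node
        self.buffer = []                # rows held back while the lock is held

    def send(self, service, row):
        # Key present -> locked -> do not transmit, keep the row for later.
        if f"lock-{service}" in self.locked_keys:
            self.buffer.append((service, row))
            return False
        self.destination.append((service, row))
        return True

    def flush(self):
        # After the lock is deleted, resend everything that was buffered.
        pending, self.buffer = self.buffer, []
        for service, row in pending:
            self.send(service, row)


locked_keys = {"lock-VRF2"}
destination = []
node = SourceNode(locked_keys, destination)

node.send("VRF1", "row-1")   # VRF1 is not locked: delivered immediately
node.send("VRF2", "row-2")   # VRF2 is locked: buffered
print(destination, node.buffer)  # [('VRF1', 'row-1')] [('VRF2', 'row-2')]

locked_keys.discard("lock-VRF2")  # repair done, lock deleted
node.flush()
print(destination)  # [('VRF1', 'row-1'), ('VRF2', 'row-2')]
```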

In one embodiment of the present application, a plurality of locks corresponding to different services may be set according to the service types. For example, when VRF1, VRF2 and VRF3 exist, three keys may be recorded in Redis: lock-VRF1, lock-VRF2 and lock-VRF3. When the nodes processing a certain type of service need data repair, the corresponding data transmission can be suspended by setting a value for the key corresponding to that service type, and data repair is then performed.

When the distributed lock is a ZooKeeper-based distributed lock and a plurality of source nodes want to suspend data transmission with the destination node for data repair, each source node may create a temporary sequential node under a designated node of ZooKeeper. The source node corresponding to the first temporary sequential node acquires the distributed lock first. The other source nodes, which have not acquired the distributed lock, listen for a distributed lock deletion event from ZooKeeper. When a lock deletion event occurs, a source node judges whether its own node is the first among the temporary sequential nodes; if so, it acquires the distributed lock, and if not, it continues to listen for the distributed lock deletion event.
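The ordering rule of the sequential-node scheme can be imitated in memory. The sketch below does not talk to ZooKeeper and does not model sessions or watches; the class `SequenceLock` and its methods are hypothetical and only show that the holder is always the entry with the lowest sequence number, and that deleting it hands the lock to the next entry.

```python
import itertools


class SequenceLock:
    """In-memory imitation of temporary sequential nodes under one parent node."""

    def __init__(self):
        self._counter = itertools.count()
        self._waiting = {}  # sequence number -> source node name

    def enqueue(self, node_name):
        # Each contender gets the next sequence number, like a sequential node.
        seq = next(self._counter)
        self._waiting[seq] = node_name
        return seq

    def holder(self):
        # The contender with the lowest sequence number holds the lock.
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        # Deleting an entry corresponds to the lock deletion event.
        self._waiting.pop(seq, None)


lock = SequenceLock()
seq_703 = lock.enqueue("source-703")
seq_704 = lock.enqueue("source-704")
print(lock.holder())  # source-703

lock.release(seq_703)
print(lock.holder())  # source-704
```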

By using the distributed lock to lock the service thread that needs data repair between the source node and the destination node, and unlocking it for new data transmission after the destination node finishes data repair, normal data transmission of the other service threads is ensured while the abnormal data transmission thread is repaired. This ensures the consistency of service data between the source node and the destination node and thereby avoids affecting user services.

To give a more intuitive and complete picture of data checking, and of safely repairing data when the check finds an abnormality, the process is described below based on the first snapshot and the second snapshot shown in table 1 and table 2. Fig. 8 shows a flow chart of data checking and data repairing. As shown in fig. 8, a first data copy 802 may be generated by extracting the first primary key information and the first version number from a first snapshot 801, and a second data copy 804 may be generated by extracting the second primary key information and the second version number from a second snapshot 803. By comparing the first data copy 802 and the second data copy 804, difference data 805, that is, the data row corresponding to the primary key information (VRF2, 172.21.0.3), may be determined. The corresponding data row may then be obtained from the first snapshot according to the difference data 805, while the service thread corresponding to VRF2 is locked by the distributed lock corresponding to VRF2 and data transmission between the source node and the destination node is suspended, so that the data table 806 of the destination node is repaired according to the data row corresponding to the difference data 805; specifically, the data row is inserted into the data table of the destination node.

In one embodiment of the application, after the service thread is locked with the distributed lock and before data repair is performed on the destination node, the data in the first data copy and the second data copy can be compared again, so that the previously obtained data checking result is confirmed against the result of this second comparison. Data repair is performed on the destination node only when the two data checking results are consistent and both indicate abnormal data transmission. In this way, inaccurate data checking results caused by a large data volume can be avoided, and unnecessary data repair, which would occupy a large amount of system resources, can also be avoided.
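The confirmation step can be sketched as a second comparison whose result is intersected with the earlier one, so that only anomalies reported both times are repaired. The function names, the callback-style `repair` argument and the version values are hypothetical; the sketch assumes the copies passed in are the ones compared after the lock has been taken.

```python
def find_anomalies(first_copy, second_copy):
    """Return the primary keys that are missing or carry a different version."""
    return {key for key, version in first_copy.items() if second_copy.get(key) != version}


def confirm_and_repair(earlier_anomalies, first_copy, second_copy, repair):
    """Repair only the keys that are reported as abnormal by both comparisons."""
    confirmed = earlier_anomalies & find_anomalies(first_copy, second_copy)
    for key in sorted(confirmed):
        repair(key)
    return confirmed


# Placeholder version numbers for illustration.
first_copy = {("VRF2", "172.21.0.1"): 2, ("VRF2", "172.21.0.3"): 1}
second_copy = {("VRF2", "172.21.0.1"): 2}

# The earlier check also flagged 172.21.0.1, but it has caught up since then.
earlier = {("VRF2", "172.21.0.1"), ("VRF2", "172.21.0.3")}
repaired = []
confirmed = confirm_and_repair(earlier, first_copy, second_copy, repaired.append)
print(repaired)  # [('VRF2', '172.21.0.3')]
```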

According to the data processing method, after a data checking request is responded, a first data copy corresponding to a source node and a second data copy corresponding to a destination node are obtained, and the first data copy and the second data copy contain primary key information and version numbers corresponding to all pieces of data; and comparing the first primary key information and the first version number in the first data copy with the second primary key information and the second version number in the second data copy respectively to obtain a data checking result, and executing target operation according to the data checking result. According to the data processing method, on one hand, the data copies corresponding to the source node and the destination node are compared to judge whether the data sent by the source node and the data received by the destination node are identical, so that external data transmission is not required to be stopped, a line for transmitting the data is not required to be locked, verification of system locking-free data is realized, and occupation of system resources is reduced; on the other hand, the first data copy and the second data copy are formed only according to the primary key information and the version number in each row of data, so that the data quantity adopted in data checking is reduced, and the data checking efficiency is improved.

It should be noted that although the steps of the methods of the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

The following describes embodiments of the apparatus of the present application that may be used to perform the data processing methods of the above-described embodiments of the present application. Fig. 9 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9, the data processing apparatus 900 includes: the copy acquisition module 910 and the data comparison module 920, specifically:

the copy obtaining module 910 is configured to obtain, in response to a data check request, a first data copy corresponding to a source node and a second data copy corresponding to a destination node, where the first data copy includes first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy includes second primary key information and a second version number corresponding to each piece of data in the destination node; the data comparison module 920 is configured to compare the first primary key information and the first version number with the second primary key information and the second version number, respectively, so as to obtain a data checking result, and execute a target operation according to the data checking result.

In some embodiments of the present application, based on the above technical solutions, the copy obtaining module 910 includes: the snapshot generating unit is used for carrying out snapshot reading on the data corresponding to the source node and the data corresponding to the destination node based on a multi-version concurrency control mechanism so as to acquire a first snapshot corresponding to the source node and a second snapshot corresponding to the destination node; an information extraction unit, configured to extract, from the first snapshot, the first primary key information and the first version number corresponding to each piece of data, and extract, from the second snapshot, the second primary key information and the second version number corresponding to each piece of data; and the copy generation unit is used for generating the first data copy according to the first primary key information and the first version number and generating the second data copy according to the second primary key information and the second version number.

In some embodiments of the present application, based on the above technical solutions, the data comparison module 920 is configured to: comparing the first primary key information with the second primary key information; when the first main key information is the same as the second main key information, a first target version number corresponding to the first main key information and a second target version number corresponding to the second main key information are obtained; comparing the first target version number with the second target version number; and when the first target version number is different from the second target version number, determining that the data checking result is abnormal for data transmission.

In some embodiments of the present application, based on the above technical solutions, the data comparison module 920 is configured to: comparing the first main key information with the second main key information, and determining that the data check result is abnormal in data transmission when the second main key information identical to the first main key information does not exist.

In some embodiments of the present application, based on the above technical solutions, the data processing apparatus 900 is configured to: before responding to a data checking request, responding to a query request sent by the source node, and acquiring the state of a lock corresponding to primary key information in the query request; and when the lock is in the unlocking state, transmitting data to the destination node through the source node.

In some embodiments of the present application, based on the above technical solutions, the data comparison module 920 includes: an acquisition unit, configured to acquire target primary key information corresponding to abnormal data when the data checking result is abnormal data transmission; a locking unit, configured to switch the state of the lock corresponding to the target primary key information to a locked state so as to stop data transmission between the source node and the destination node; and a first repairing unit, configured to repair the data of the destination node according to the source node data corresponding to the target primary key information.

In some embodiments of the present application, based on the above technical solutions, the first repairing unit includes: a comparison unit, configured to compare the target primary key information with the primary key information in the data table corresponding to the source node so as to acquire target data corresponding to the target primary key information; and a second repairing unit, configured to repair the data of the destination node according to the target data.

In some embodiments of the present application, based on the above technical solution, when the version number corresponding to the target primary key information in the second data copy is different from the version number corresponding to the target primary key information in the first data copy, the second repairing unit is configured to: acquire data to be updated from the data table corresponding to the destination node according to the target primary key information; and update the data to be updated according to the target data.

In some embodiments of the present application, based on the above technical solution, when there is no second primary key information identical to the target primary key information in the second data copy, the second repairing unit is configured to: insert the target data into the data table corresponding to the destination node.

In some embodiments of the present application, based on the above technical solutions, the data comparison module 920 is configured to: before data repair is performed on the destination node according to the source node data corresponding to the target primary key information, compare the data in the first data copy and the second data copy again so as to confirm the data checking result.

In some embodiments of the application, the source node and the destination node are any two adjacent nodes in a traffic line comprising a plurality of nodes; based on the above technical solution, the data processing apparatus 900 is further configured to: and after updating the data table corresponding to the destination node according to the target data, transmitting the cached data to the destination node through the source node, wherein the cached data is the data sent to the source node by the last node adjacent to the source node.

In some embodiments of the present application, based on the above technical solutions, the lock is a Redis-based distributed lock or a ZooKeeper-based distributed lock.

Specific details of the data processing apparatus provided in each embodiment of the present application have been described in the corresponding method embodiments, and are not described herein.

Fig. 10 schematically shows a block diagram of a computer system for implementing an electronic device, which may be the terminal device 101 and the server 103 as shown in fig. 1, according to an embodiment of the present application.

It should be noted that, the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.

As shown in fig. 10, the computer system 1000 includes a central processing unit 1001 (Central Processing Unit, CPU) which can execute various appropriate actions and processes according to a program stored in a read-only memory 1002 (Read-Only Memory, ROM) or a program loaded from a storage section 1008 into a random access memory 1003 (Random Access Memory, RAM). The random access memory 1003 also stores various programs and data necessary for the system operation. The central processing unit 1001, the read-only memory 1002, and the random access memory 1003 are connected to each other via a bus 1004. An input/output interface 1005 (Input/Output interface, i.e., an I/O interface) is also connected to the bus 1004.

In some embodiments, the following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a local area network card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the input/output interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.

In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The computer programs, when executed by the central processor 1001, perform the various functions defined in the system of the present application.

It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which comprises several instructions for causing an electronic device to perform the method according to the embodiments of the present application.
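As a non-limiting illustration of such a software implementation, the following minimal Python sketch shows how the comparison of primary key information and version numbers might look. It assumes that each data copy has already been reduced to a mapping from primary key to version number; the names `DataCopy`, `Anomaly`, and `check_copies` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: a "data copy" is modeled as primary key -> version number.
DataCopy = Dict[str, int]


@dataclass(frozen=True)
class Anomaly:
    """One inconsistency found between the source and destination copies."""
    primary_key: str
    reason: str  # "missing_at_destination" or "version_mismatch"


def check_copies(source_copy: DataCopy, dest_copy: DataCopy) -> List[Anomaly]:
    """Compare the two copies key by key and collect transmission anomalies."""
    anomalies: List[Anomaly] = []
    for key, source_version in source_copy.items():
        if key not in dest_copy:
            # No matching primary key exists at the destination.
            anomalies.append(Anomaly(key, "missing_at_destination"))
        elif dest_copy[key] != source_version:
            # Same primary key, but the version numbers differ.
            anomalies.append(Anomaly(key, "version_mismatch"))
    return anomalies


if __name__ == "__main__":
    source = {"order-1": 3, "order-2": 1, "order-3": 2}
    destination = {"order-1": 3, "order-2": 0}
    for anomaly in check_copies(source, destination):
        print(anomaly.primary_key, anomaly.reason)
```

Because only primary keys and version numbers are compared, the sketch never touches the full row contents, which is consistent with the reduced data volume the application describes for the check.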

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method of data processing, comprising:

responding to a data checking request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node;

and comparing the first primary key information and the first version number with the second primary key information and the second version number respectively to obtain a data checking result, and executing target operation according to the data checking result.

2. The method of claim 1, wherein acquiring the first data copy corresponding to the source node and the second data copy corresponding to the destination node comprises:

performing snapshot reading on the data corresponding to the source node and the data corresponding to the destination node based on a multi-version concurrency control mechanism to acquire a first snapshot corresponding to the source node and a second snapshot corresponding to the destination node;

extracting the first primary key information and the first version number corresponding to each piece of data from the first snapshot, and extracting the second primary key information and the second version number corresponding to each piece of data from the second snapshot;

and generating the first data copy according to the first primary key information and the first version number, and generating the second data copy according to the second primary key information and the second version number.

3. The method according to claim 1 or 2, wherein comparing the first primary key information and the first version number with the second primary key information and the second version number, respectively, to obtain a data check result includes:

comparing the first primary key information with the second primary key information;

when the first primary key information is the same as the second primary key information, acquiring a first target version number corresponding to the first primary key information and a second target version number corresponding to the second primary key information;

comparing the first target version number with the second target version number;

and when the first target version number is different from the second target version number, determining that the data check result indicates abnormal data transmission.

4. The method according to claim 1 or 2, wherein comparing the first primary key information and the first version number with the second primary key information and the second version number, respectively, to obtain a data check result includes:

when no second primary key information identical to the first primary key information exists, determining that the data check result indicates abnormal data transmission.

5. The method of claim 1, wherein prior to responding to the data verification request, the method further comprises:

responding to a query request sent by the source node, and acquiring the state of a lock corresponding to primary key information in the query request;

and when the lock is in an unlocked state, transmitting data to the destination node through the source node.

6. The method of claim 1, wherein executing the target operation according to the data check result comprises:

when the data check result indicates abnormal data transmission, acquiring target primary key information corresponding to abnormal data;

switching the state of the lock corresponding to the target primary key information to a locked state, so as to stop data transmission between the source node and the destination node;

and performing data repair on the destination node according to the source node data corresponding to the target primary key information.

7. The method of claim 6, wherein performing data repair on the destination node according to the source node data corresponding to the target primary key information comprises:

comparing the target primary key information with primary key information in a data table corresponding to the source node to obtain target data corresponding to the target primary key information;

and performing data repair on the destination node according to the target data.

8. The method of claim 7, wherein, when the version number corresponding to the target primary key information in the second data copy is different from the version number corresponding to the target primary key information in the first data copy,

performing data repair on the destination node according to the target data comprises:

acquiring data to be updated from a data table corresponding to the destination node according to the target primary key information;

and updating the data to be updated according to the target data.

9. The method of claim 7, wherein, when no second primary key information identical to the target primary key information exists in the second data copy, performing data repair on the destination node according to the target data comprises:

inserting the target data into a data table corresponding to the destination node.

10. The method of claim 7, wherein the source node and the destination node are any two adjacent nodes in a business thread comprising a plurality of nodes;

after data repair is performed on the destination node according to the target data, the method further comprises:

and transmitting the cached data to the destination node through the source node, wherein the cached data is the data transmitted to the source node by the previous node adjacent to the source node.

11. A data processing apparatus, comprising:

the copy acquisition module is used for responding to a data check request, acquiring a first data copy corresponding to a source node and a second data copy corresponding to a destination node, wherein the first data copy comprises first primary key information and a first version number corresponding to each piece of data in the source node, and the second data copy comprises second primary key information and a second version number corresponding to each piece of data in the destination node;

and the data comparison module is used for comparing the first primary key information and the first version number with the second primary key information and the second version number respectively so as to obtain a data checking result and executing target operation according to the data checking result.

12. A computer readable medium comprising instructions which, when run on a computer, cause the computer to perform the data processing method of any one of claims 1 to 10.

13. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to invoke the executable instructions to implement the data processing method of any of claims 1 to 10.

14. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the data processing method of any of claims 1 to 10.
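The lock-gated transmission and repair steps recited in claims 5 to 9 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the data tables are modeled as in-memory dictionaries, the names `Table`, `KeyLocks`, `transmit`, and `repair` are hypothetical, and releasing the lock once the repair finishes is an assumption, since the claims do not state when the lock is released.

```python
from typing import Dict, Set, Tuple

# Assumption: a data table maps primary key -> (version number, payload).
Table = Dict[str, Tuple[int, str]]


class KeyLocks:
    """Per-primary-key lock states; a key that was never locked is unlocked."""

    def __init__(self) -> None:
        self._locked: Set[str] = set()

    def is_locked(self, key: str) -> bool:
        return key in self._locked

    def lock(self, key: str) -> None:
        self._locked.add(key)

    def unlock(self, key: str) -> None:
        self._locked.discard(key)


def transmit(key: str, source: Table, dest: Table, locks: KeyLocks) -> bool:
    """Send one row to the destination only while its lock is unlocked."""
    if locks.is_locked(key):
        return False  # transmission for this key is currently stopped
    dest[key] = source[key]
    return True


def repair(key: str, source: Table, dest: Table, locks: KeyLocks) -> str:
    """Lock the key, repair the destination row from the source, then unlock."""
    locks.lock(key)  # stops normal transmission for this primary key
    try:
        target_data = source[key]  # look up target data in the source table
        # Update when the destination already has the key, insert otherwise.
        action = "update" if key in dest else "insert"
        dest[key] = target_data
        return action
    finally:
        # Assumption: the lock is released as soon as the repair finishes.
        locks.unlock(key)
```

A caller would typically pass each primary key flagged by the data check to `repair`, while regular traffic goes through `transmit` and is skipped for any key whose lock is in the locked state.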