CN108347455B - Metadata interaction method and system - Google Patents


Info

Publication number
CN108347455B
Authority
CN
China
Prior art keywords
metadata
partition
request
server
corresponding partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710053030.2A
Other languages
Chinese (zh)
Other versions
CN108347455A (en)
Inventor
程霖
朱云锋
付鑫
安凯歌
唐治洋
陶云峰
卢毅军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710053030.2A
Publication of CN108347455A
Application granted
Publication of CN108347455B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/561 Adding application-functional data or data for application control, e.g. adding metadata

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a metadata interaction method and system. A plurality of back-end machines form a back-end consistency total system, and the back-end machines are divided into a plurality of partitions. Each partition responds to its corresponding requests and shares the request pressure, which avoids the excessive pressure caused by a single back-end machine responding to all requests and allows the back-end machines to scale horizontally. In addition, by allocating corresponding partitions to different users, a quota (Quota) can be set on the metadata resource space written by each user, metadata of different users is partitioned and isolated, security is ensured, and different users can be charged differently according to their quota.

Description

Metadata interaction method and system
Technical Field
The invention relates to the field of computers, in particular to a metadata interaction method and a metadata interaction system.
Background
At present, more and more businesses need to break through the boundary of a single data center and provide service across regions. Asynchronous cross-region data replication appears to solve this problem, but in practice it makes data consistency difficult to achieve and cannot handle disaster tolerance at the region level. In industries with high requirements for data consistency and reliability, such as internet finance, asynchronous replication is simply insufficient.
Therefore, when designing a cross-regional consistent metadata storage system (Global Meta System), the back-end system invariably relies on a distributed consistency system (Quorum) to solve problems such as high availability and consistency of data. A diagram of a Quorum group is shown in FIG. 2; the roles in a Quorum group are Leader, Follower and Observer. In a distributed consistency system, the machines that make up the consistency system are called a Quorum, and the data on all machines in the Quorum is the same. The machines take different roles. The Leader is the leader of the Quorum, and every transactional request sent to the Quorum must be processed through the Leader. A Follower accepts non-transactional requests; if it receives a transactional request, it must forward the request to the Leader. The Leader and the Followers must reach a decision on the user's transactional request data, and request data that does not pass this decision is not accepted. The Observer is only a learner in the Quorum: it pulls data from the Leader and mainly serves to back up data. Here, a transactional request is a request, such as a write or update, that changes the log data in the back-end consistency system (NuwaLog); a non-transactional request is a request, such as a read, that does not change the log data in the back-end consistency system (NuwaLog).
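The routing rule described above for Leader and Follower can be sketched as follows. This is a minimal illustration only; the function name `route` and the string labels are assumptions introduced here and do not come from the patent or from any real consistency system.

```python
# Request kinds named in the text: write/update change the log, read does not.
TRANSACTIONAL = {"write", "update"}


def route(request_type: str, receiver_role: str) -> str:
    """Return what a Quorum member does with a request it receives (hypothetical helper)."""
    if receiver_role not in ("Leader", "Follower"):
        # In this sketch an Observer only pulls data from the Leader as a backup.
        raise ValueError(f"{receiver_role} does not serve client requests in this sketch")
    if request_type in TRANSACTIONAL and receiver_role == "Follower":
        # A Follower must hand transactional requests to the Leader.
        return "forward to Leader"
    return f"handled by {receiver_role}"


print(route("read", "Follower"))
print(route("update", "Follower"))
print(route("write", "Leader"))
```

Every write or update ends up at the Leader regardless of which member received it, which is why the Leader becomes the pressure point the later sections try to relieve.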
As the number of machines in a Quorum group increases, a transactional request must still be acknowledged (ack) by N/2+1 machines in the group, where N is the number of machines in the Quorum (an odd number of servers is generally deployed in a Quorum group), and the network delay caused by communication between machines grows. It follows that a Quorum cannot be expanded arbitrarily, because the communication overhead within the Quorum group reduces the performance of transactional requests.
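The N/2+1 acknowledgement rule can be made concrete with a short calculation. The sketch below assumes that N/2 means integer division, which is the usual reading of a majority quorum; the function names are illustrative.

```python
def majority(n: int) -> int:
    """Smallest number of acknowledgements that forms a majority of n machines."""
    if n < 1:
        raise ValueError("a Quorum needs at least one machine")
    return n // 2 + 1  # assumption: N/2 in the text is integer division


def tolerated_failures(n: int) -> int:
    """Machines that may be down while a majority can still acknowledge."""
    return n - majority(n)


for n in (3, 4, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```

A group of 4 needs 3 acknowledgements yet tolerates only 1 failure, the same as a group of 3, which is one reason odd-sized groups are generally deployed. The acknowledgement count also grows with N, matching the point that a larger Quorum makes each transactional request more expensive.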
Several mature consistency systems are currently available in the industry, such as ZooKeeper and Chubby. A back-end consistency system (Quorum) typically consists of an odd number of servers. In a cross-regional consistent metadata system, all transactional requests from the front end (Frontend) must reach the back end, and every request establishes a TCP connection and a session with the back end, which puts pressure on the back end. In theory the Quorum cannot be expanded arbitrarily, because the communication overhead within the Quorum group degrades the performance of transactional requests. In addition, within each Quorum group the data of all users is written into that one group, so the following problems often occur: the pressure on the back end is too high; the data written by users is uneven, with some users writing more and others less; no quota (Quota) can be set on the resource space written by each user; data of different users cannot be partitioned and isolated; and security cannot be guaranteed.
In a global scenario, a single consistency system is used for storing metadata such as log (log) and snapshot (snapshot), which causes the reduction of single machine storage performance and the limitation of storage capacity, the number of data storage sites is limited by the number of nodes of a consistency protocol, and the performance of reading data in each region (region) is low. Wherein, the Log is a transactional Log, that is, request data sent by a user, and is uniformly called as Log in a backend consistency system; snapshot is a Snapshot, that is, a Snapshot of the total amount of data in the memory at a certain time in the backend consistency system.
When designing a metadata storage system architecture with cross-regional consistency, the most important requirement is a distributed consistency system that guarantees the consistency of metadata globally. Common practice in the industry is to use ZooKeeper or Chubby, or a consistency system implemented on Paxos. As shown in FIG. 1, the common architecture is as follows:
Each region requires a local storage system (storage system), which may be MySQL, a distributed database, or the like. This architecture has few components, and the design of the interaction protocol is simple. The distributed consistency system sends snapshots to the storage system at intervals, and the clients/users of each region obtain the parsed data from the storage system. However, in this architecture every machine in the Quorum, both the master server (Leader) and the slave servers (Followers), must store all snapshots, and the memory of the Leader and of the Followers holds the full log data.
Clearly, this consistency system has obvious scalability problems, a single-machine performance bottleneck, and limited disk and memory capacity. Moreover, the local storage system still has to obtain its data from the consistency system, so the Leader in the Quorum must access the MySQL/storage system and actively send snapshot data to it, which performs poorly. Thus, although this architecture has a simple protocol and requires no modification of the consistency system (it only needs to push snapshot data to the MySQL/storage system at intervals), it only serves the read requests of the clients/users of each region and does not solve the difficulty of globally consistent cross-region metadata access.
As described above, the distributed consistency system is directly used as a cross-regional core backend system, the protocol design is simple, the system is not prone to errors, but the system has many performance and machine storage capacity problems and is not scalable, and such a distributed system cannot be used in a complex scenario like cloud computing.
Disclosure of Invention
An object of the present invention is to provide a metadata interaction method and system that solve the problems that the pressure on a back-end machine is too high, that no quota (Quota) can be set on the resource space written by each user, that data of different users cannot be partitioned and isolated, and that security cannot be guaranteed.
The invention provides a metadata interaction method on a backend machine, which comprises the following steps:
receiving a metadata request of a user for updating a corresponding partition from a front-end computer, and updating corresponding metadata into the corresponding partition according to the request;
sending metadata to the front-end computer, receiving the position of the currently pulled metadata in the partition from the front-end computer, and deleting the pulled metadata in the corresponding partition according to the position in the partition.
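The two back-end steps above (update a partition, then delete what the front end reports as pulled) can be sketched as an in-memory model. The class `Partition`, its method names, and the use of consecutive integers as positions are assumptions made for illustration; the patent does not specify the data structure.

```python
class Partition:
    """Hypothetical in-memory model of one back-end partition."""

    def __init__(self):
        self._entries = {}   # position -> metadata
        self._next_pos = 0   # position assigned to the next update

    def update(self, metadata) -> int:
        """Store metadata in the partition and return its position."""
        pos = self._next_pos
        self._entries[pos] = metadata
        self._next_pos += 1
        return pos

    def pull(self, after_pos: int = -1):
        """Return (position, metadata) pairs located after after_pos, in order."""
        return [(p, m) for p, m in sorted(self._entries.items()) if p > after_pos]

    def ack(self, pulled_pos: int) -> int:
        """Delete everything up to and including pulled_pos; return how many were deleted."""
        stale = [p for p in self._entries if p <= pulled_pos]
        for p in stale:
            del self._entries[p]
        return len(stale)

    def __len__(self):
        return len(self._entries)


part = Partition()
for item in ("a", "b", "c"):
    part.update(item)
pulled = part.pull()          # front end pulls positions 0..2
last_pos = pulled[-1][0]
part.update("d")              # a new update arrives before the reply
removed = part.ack(last_pos)  # front end replies with the pulled position
print(removed, len(part), part.pull())
```

Because deletion is driven by the position the front end replies with, an update that arrives between the pull and the reply (here "d") is kept until a later pull covers it.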
Further, in the above method, after sending the metadata to the front-end, the method further includes:
and acquiring a metadata request of a user for reading the corresponding partition from the front-end computer, and sending corresponding metadata to the front-end computer according to the metadata reading request.
Further, in the above method, the front-end computer adopts a main/standby server architecture.
Further, before receiving a metadata request of a user for updating a corresponding partition from a front-end computer or acquiring a metadata request of a user for reading a corresponding partition from the front-end computer, the method further includes:
receiving registration requests from the primary server and the standby server;
and after the main server and the standby server are registered according to the registration request, sending registration success information to the main server and the standby server and sending the position of the currently pulled metadata in the partition to the main server.
Further, in the above method, a distributed lock service system switches between a primary server and a standby server in the front-end computer.
Further, in the above method, when a distributed lock service system switches between a primary server and a standby server in the front-end computer, the method further includes:
receiving a logout request from a main server and/or a standby server which is offline, and sending logout result information to the main server and/or the standby server which is offline after logging out the main server and/or the standby server which is offline according to the logout request;
receiving a registration request from a newly online main server and/or standby server, registering the newly online main server and/or standby server according to the registration request, and then sending registration success information to the newly online main server and/or standby server and sending the position of currently pulled metadata in the partition to the newly online main server.
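The registration and logout exchange around a main/standby switch can be sketched as below. The `Registry` class, the role strings, and the dictionary-shaped replies are hypothetical; only the behaviour (success information to both servers, pulled position only to the main server) follows the text.

```python
class Registry:
    """Hypothetical back-end bookkeeping for front-end server registration."""

    def __init__(self, pulled_position: int = -1):
        self.pulled_position = pulled_position  # position of the currently pulled metadata
        self.servers = {}                       # server id -> role

    def register(self, server_id: str, role: str) -> dict:
        if role not in ("primary", "standby"):
            raise ValueError(f"unknown role: {role}")
        self.servers[server_id] = role
        reply = {"registered": True}
        if role == "primary":
            # Only the main server pulls, so only it needs the position.
            reply["pulled_position"] = self.pulled_position
        return reply

    def logout(self, server_id: str) -> dict:
        was_known = self.servers.pop(server_id, None) is not None
        return {"logged_out": was_known}


reg = Registry(pulled_position=41)
first = reg.register("fe-1", "primary")
second = reg.register("fe-2", "standby")
# Switch: fe-1 goes offline, fe-2 comes online as the new main server.
bye = reg.logout("fe-1")
promoted = reg.register("fe-2", "primary")
print(first, second, bye, promoted)
```

Handing the pulled position to the newly online main server lets it resume pulling where the previous main server stopped.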
Further, in the above method, while updating the corresponding metadata into the corresponding partition, the method further includes:
when the metadata is a log, generating a corresponding snapshot according to the currently stored metadata in the partition and storing the snapshot into the partition;
and, while deleting the pulled metadata in the partition according to the position in the partition, the method further includes:
deleting the corresponding snapshot from the partition.
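One possible reading of the snapshot handling is sketched below: each log update also stores a snapshot of the metadata currently held in the partition, and acknowledging a position removes both the logs and their snapshots. Keying snapshots by log position and representing a snapshot as a tuple are assumptions of this sketch, not details given in the patent.

```python
class SnapshottingPartition:
    """Hypothetical partition that keeps a snapshot alongside each log entry."""

    def __init__(self):
        self.logs = {}       # position -> log entry
        self.snapshots = {}  # position -> snapshot of the logs held at that moment
        self.next_pos = 0

    def update_log(self, entry) -> int:
        pos = self.next_pos
        self.logs[pos] = entry
        # Snapshot generated from the metadata currently stored in the partition.
        self.snapshots[pos] = tuple(self.logs.values())
        self.next_pos += 1
        return pos

    def ack(self, pulled_pos: int) -> None:
        """Delete pulled logs and, at the same time, their snapshots."""
        for pos in [p for p in self.logs if p <= pulled_pos]:
            del self.logs[pos]
            del self.snapshots[pos]


sp = SnapshottingPartition()
sp.update_log("set x=1")
sp.update_log("set y=2")
print(sp.snapshots[1])
sp.ack(0)
print(sp.logs, list(sp.snapshots))
```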
According to another aspect of the present invention, there is also provided a metadata interaction method on a front-end machine, the method including:
receiving a metadata request of a user for updating a corresponding partition, and updating the metadata into the corresponding partition by sending the request to the corresponding partition;
and pulling the metadata from the corresponding partition, storing the pulled metadata in a local storage system of the region where the front-end computer is located, and replying to the corresponding partition with the position, in the partition, of the currently pulled metadata, so that the partition deletes the pulled metadata stored in it according to that position.
Further, in the above method, after storing the pulled metadata in a local storage system of a region where the front-end computer is located, the method further includes:
receiving a metadata request of a user for reading a corresponding partition;
acquiring, according to the metadata request for reading the corresponding partition, the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located,
and, if the metadata is acquired from the local storage system, replying to the user with the acquired metadata of the corresponding partition.
Further, in the above method, after obtaining metadata of a corresponding partition from a local storage system of a region where the front-end computer is located, the method further includes:
and if the metadata is not acquired from the local storage system, sending the metadata reading request to the corresponding partition, and after acquiring the metadata from the corresponding partition, replying the acquired metadata of the corresponding partition to the user.
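The read path described above (local storage system first, corresponding partition as fallback) can be sketched as follows. Plain dictionaries stand in for the local storage system and the partition, and `read_metadata` is a hypothetical name.

```python
from typing import Callable, Optional, Tuple


def read_metadata(
    key: str,
    local_store: dict,
    fetch_from_partition: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (metadata, source), trying the region-local store before the partition."""
    value = local_store.get(key)
    if value is not None:
        return value, "local"
    # Not yet pulled into the local store: forward the read to the partition.
    return fetch_from_partition(key), "partition"


local = {"k1": "v1"}                       # stand-in for the local storage system
partition_data = {"k1": "v1", "k2": "v2"}  # stand-in for the corresponding partition

print(read_metadata("k1", local, partition_data.get))
print(read_metadata("k2", local, partition_data.get))
```

Reads that hit the local store never reach the back end, so only metadata that has not yet been pulled costs a cross-region round trip.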
Further, in the above method, the front-end computer adopts a main/standby server architecture.
Further, before receiving a metadata request of a user for updating a corresponding partition or receiving a metadata request of a user for reading a corresponding partition, the method includes:
the main server and the standby server send registration requests to a back-end consistency total system;
and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
Further, in the above method, a distributed lock service system switches between a primary server and a standby server in the front-end computer.
Further, in the above method, while the distributed lock service system switches the primary server and the standby server in the front-end computer, the method further includes:
the offline main server and the offline standby server send logout requests to the distributed lock service system and receive logout result information from the back-end consistency total system;
and after the newly online main server and the standby server send registration requests to the back-end consistency total system and receive registration success information from the back-end consistency total system, the newly online main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
Further, in the above method, receiving a metadata request for updating a corresponding partition from a user, and updating the metadata into the corresponding partition by sending the request to the corresponding partition, includes:
the main server or the standby server acquires a metadata request of a user for updating a corresponding partition from a load balancing server, and updates the metadata into the corresponding partition by sending the metadata request of the user for updating the corresponding partition to the backend machine;
pulling the metadata from the corresponding partition, storing the pulled metadata in a local storage system of the region where the front-end computer is located, and replying to the back-end computer with the position of the currently pulled metadata in its partition, includes:
and the main server pulls the metadata from the corresponding partition, stores the pulled metadata to a local storage system of a region where the front-end computer is located, and replies the position of the pulled metadata in the partition to the corresponding partition.
Further, in the above method, receiving a metadata request for reading a corresponding partition from a user, acquiring metadata of the corresponding partition from a local storage system in a region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, replying the acquired metadata to the user includes:
the main server or the standby server receives a metadata request of a user for reading a corresponding partition from the load balancing server;
and the main server or the standby server acquires the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, the acquired metadata is replied to the user.
Further, in the above method, if the metadata is not acquired from the local storage system, sending the metadata reading request to the corresponding partition, and after acquiring the metadata from the corresponding partition, replying the acquired metadata to the user includes:
and if the metadata is not acquired from the local storage system, the main server or the standby server sends the metadata reading request to the corresponding partition, and the acquired metadata is replied to the user after the metadata is acquired from the corresponding partition.
According to another aspect of the present invention, there is also provided a backend machine, including:
the updating device is used for receiving a metadata request of a user for updating the corresponding partition from the front-end computer and updating the corresponding metadata into the corresponding partition according to the request;
and the sending device is used for sending the metadata to the front-end machine, receiving the position of the currently pulled metadata in the partition from the front-end machine, and deleting the pulled metadata in the corresponding partition according to the position in the partition.
Further, in the backend machine, the sending device is further configured to obtain, from the front-end machine, a metadata request for reading a corresponding partition of a user, and send corresponding metadata to the front-end machine according to the metadata read request.
Further, in the backend machine, the front-end machine adopts a main/standby server architecture.
Further, the backend machine further includes a registration device, configured to: receiving registration requests from the primary server and the standby server; and after the main server and the standby server are registered according to the registration request, sending registration success information to the main server and the standby server and sending the position of the currently pulled metadata in the partition to the main server.
Further, in the backend machine, a distributed lock service system switches the main server and the standby server in the front-end machine.
Further, in the backend machine, the registration device is further configured to receive a logout request from the offline main server and/or the offline standby server, and send logout result information to the offline main server and/or the offline standby server after logging out the offline main server and/or the offline standby server according to the logout request;
the registration device is further configured to receive a registration request from a newly online main server and/or standby server, register the newly online main server and/or standby server according to the registration request, and then send registration success information to the newly online main server and/or standby server and send the position of currently pulled metadata in the partition where the metadata is located to the newly online main server.
Further, in the backend machine, the updating apparatus is further configured to update the corresponding metadata into the corresponding partition when the metadata is a log, and generate a corresponding snapshot according to the metadata currently stored in the partition and store the snapshot into the partition;
the sending device is further configured to delete the corresponding snapshot from the partition while deleting the metadata that has been pulled from the partition according to the position in the partition where the sending device is located.
According to another aspect of the present invention, there is also provided a front end machine, including:
the updating initiating device is used for receiving a metadata request of a user for updating the corresponding partition, and updating the metadata into the corresponding partition by sending the request to the corresponding partition;
and the pulling device is used for pulling the metadata from the corresponding partition, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the currently pulled metadata in the partition to the corresponding partition so that the partition deletes the pulled metadata stored in the partition according to the position of the partition.
Further, the front-end computer further includes a reading device, configured to: receive a metadata request of a user for reading a corresponding partition; acquire, according to the metadata request for reading the corresponding partition, the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located;
and, if the metadata is acquired from the local storage system, reply to the user with the acquired metadata of the corresponding partition.
Further, in the front-end computer, the reading device is further configured to send the metadata reading request to the corresponding partition if the metadata is not acquired from the local storage system, and reply the metadata acquired from the corresponding partition to the user after acquiring the metadata from the corresponding partition.
Further, in the front-end computer, the front-end computer adopts a main/standby server architecture.
Further, the front-end computer further includes an initiating registration device, configured to enable the main server and the standby server to send a registration request to a backend consistency total system; and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
Further, in the front-end computer, a distributed lock service system switches between a main server and a standby server in the front-end computer.
Further, the front-end computer further includes a logout initiating device, configured to cause the offline main server and standby server to send a logout request to the distributed lock service system and to receive logout result information from the back-end consistency total system;
the registration initiating device is further configured to send a registration request to the backend consistency total system by the newly online main server and the standby server, and after receiving registration success information from the backend consistency total system, the newly online main server obtains the position of the currently pulled metadata in the partition from the corresponding partition.
Further, in the front-end computer, the update initiating device is configured to enable the main server or the standby server to obtain, from a load balancing server, a metadata request for updating a partition corresponding to a user, and the main server or the standby server updates the metadata into the corresponding partition by sending the metadata request for updating the partition corresponding to the user to the backend computer;
and the pulling device is used for pulling the metadata from the corresponding partition by the main server, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the pulled metadata in the partition to the corresponding partition.
Further, in the front-end computer, the reading device is configured to enable the main server or the standby server to receive, from the load balancing server, a metadata request of a user for reading a corresponding partition; and the main server or the standby server acquires the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, the acquired metadata is replied to the user.
Further, in the front-end computer, the reading device is further configured to, if the metadata is not obtained from the local storage system, send the metadata reading request to the corresponding partition by the main server or the standby server, and after the metadata is obtained from the corresponding partition, reply the obtained metadata to the user.
According to another aspect of the present invention, there is also provided a computing-based device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receiving a metadata request of a user for updating a corresponding partition from a front-end computer, and updating corresponding metadata into the corresponding partition according to the request;
sending metadata to the front-end computer, receiving the position of the currently pulled metadata in the partition from the front-end computer, and deleting the pulled metadata in the corresponding partition according to the position in the partition.
According to another aspect of the present invention, there is also provided a computing-based device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receiving a metadata request of a user for updating a corresponding partition, and updating the metadata into the corresponding partition by sending the request to the corresponding partition;
and pulling the metadata from the corresponding partition, storing the pulled metadata in a local storage system of the region where the front-end computer is located, and replying to the corresponding partition with the position, in the partition, of the currently pulled metadata, so that the partition deletes the pulled metadata stored in it according to that position.
Compared with the prior art, the invention forms a back-end consistency total system from a plurality of back-end machines and divides the back-end machines into a plurality of partitions. Each partition responds to its corresponding requests and shares the request pressure, which avoids the excessive pressure caused by a single back-end machine responding to all requests and allows the back-end machines to scale horizontally. In addition, by allocating corresponding partitions to different users, a quota (Quota) can be set on the metadata resource space written by each user, metadata of different users is partitioned and isolated, security is ensured, and different users can be charged differently according to their quota. Furthermore, the front-end machine (Frontend) continuously pulls logs from the corresponding partition of a back-end machine (NuwaLog), continuously stores the pulled logs in a local storage system (OTS) of the region where the front-end machine is located, and replies to the corresponding partition with the position of the currently pulled logs; the corresponding partition then deletes the pulled logs it stores according to that position. The partition therefore does not need to store all logs, only the logs that have not yet been pulled to the local storage system (OTS). This avoids the problems of reduced single-machine storage performance of the back-end machine (NuwaLog), limited storage capacity, and the number of data storage sites being limited by the number of nodes of the consistency protocol, and it achieves data consistency, correctness and robustness for the whole system.
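The per-user partition and quota idea can be sketched as follows. Measuring the quota in bytes, the class `UserPartition`, the exception `QuotaExceeded`, and the user names are all assumptions for illustration; the patent does not state how the quota is measured or enforced.

```python
class QuotaExceeded(Exception):
    """Raised when a write would exceed the user's quota (hypothetical)."""


class UserPartition:
    """Hypothetical partition dedicated to one user, with a byte quota."""

    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.items = []

    def write(self, metadata: bytes) -> None:
        if self.used_bytes + len(metadata) > self.quota_bytes:
            raise QuotaExceeded(
                f"{self.used_bytes + len(metadata)} bytes exceeds quota of {self.quota_bytes}"
            )
        self.items.append(metadata)
        self.used_bytes += len(metadata)


# Each user is assigned an own partition, so one user's writes never touch another's.
partitions = {"alice": UserPartition(quota_bytes=16), "bob": UserPartition(quota_bytes=4)}


def write_for(user: str, metadata: bytes) -> None:
    partitions[user].write(metadata)


write_for("alice", b"0123456789")
try:
    write_for("bob", b"12345")
except QuotaExceeded as exc:
    print("rejected:", exc)
print(partitions["alice"].used_bytes, partitions["bob"].used_bytes)
```

Since usage is tracked per partition, the same counters that enforce the quota could also serve as the basis for charging users differently.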
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates an architecture diagram of an existing cross-regional strongly consistent metadata storage system;
FIG. 2 illustrates a diagram of a prior-art Quorum group;
FIG. 3 illustrates a simplified component diagram of a cross-regional strong consistency system in accordance with an embodiment of the present invention;
FIG. 4 illustrates a cross-regional strong consistency system protocol interaction communication diagram of an embodiment of the present invention;
FIG. 5 illustrates a simplified component diagram of an embodiment of the present invention;
FIG. 6 is a schematic diagram of a cross-region strong consistency system according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The present invention is described in detail below taking a log as the example of metadata. Those skilled in the art will understand that the metadata includes, but is not limited to, data such as logs, and may be any information describing data properties that supports functions such as indicating storage locations, historical data, resource lookup, and file records.
Firstly, as shown in fig. 3, a plurality of backend machines (NuwaLog) form a backend consistency total system (Global Meta System), and the backend machines are divided into a plurality of partitions (Domains). That is, the backend can be organized as a plurality of Quorums, and each Quorum can be organized as a plurality of Domains. Each partition responds to its own requests and shares the request pressure, which reduces the pressure that the front-end system (Frontend) places on a backend Quorum and avoids the excessive load that arises when a single backend machine responds to all requests. A user must specify the Global Meta System, the Quorum and the Domain in order to reach the required resource location layer by layer. This multi-layer structure of NeuronId (identifying the NuwaLog deployment) -> QuorumId (identifying the Quorum) -> DomainId (partition number) allows the backend machines (NuwaLog) to scale horizontally. In addition, by allocating corresponding partitions to different users, a quota can be set on the metadata resource space written by each user, metadata is partitioned and isolated between users, security is ensured, and users can be distinguished for quota-based charging. Here, a plurality of backend consistency total systems (Global Meta Systems) may be deployed worldwide; fig. 3 illustrates an embodiment of the present invention in which only one backend consistency total system (Global Meta System) is deployed globally.
As shown in fig. 3, a put/get/delete request sent by the user (client), that is, a write, read or delete operation, specifies the resource access location. In the multi-layer design of the present invention, the NeuronId, QuorumId and DomainId must be specified when sending a request, such as a RESTful request, to the Frontend. The Frontend uses a stored mapping relationship to find the machine and the resource PATH at the location specified by the user. The user's resources are stored at the backend under NeuronId -> QuorumId -> DomainId, and this multi-layer combination precisely locates the resource that the user accesses.
As can be seen from the system architecture diagram of fig. 3, a three-tier access mechanism is required to reach the backend NuwaLog system. Currently, one NuwaLog is configured as the global backend engine; if necessary, multiple sets can be configured. Each NuwaLog has a NeuronId as its identifier, and the NeuronId here is simply 1.
Multiple sets of Quorum systems are deployed in each NuwaLog to relieve the pressure that the front-end machines place on the backend. The sets of Quorum systems are identified as Quorum1, Quorum2, Quorum3, and so on.
Each Quorum can establish a plurality of Domains as the users' own domains, which prevents interference between users and provides resource isolation. Each Domain has an identifier, such as Domain1, Domain2, and so on. The DomainId may simply be a path (the prefix of the log path) at the NuwaLog backend that serves as the entry for isolated resource access.
Here, the client needs to pass the NeuronId, the QuorumId and the DomainId to the Frontend. These three elements together form the PATH field of the RESTful protocol and are sent to the Frontend through the standard RESTful protocol. The mapping relationship between NeuronId -> QuorumId -> DomainId and the actual server addresses can be saved in a configuration file.
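As a rough illustration of how the three identifiers might be combined into the PATH field, the following sketch assumes a hypothetical layout of the form /NeuronId/QuorumId/DomainId/resource. The exact PATH syntax is not specified in this description, and the names ResourceLocation, build_path and parse_path are illustrative only.

```python
from typing import NamedTuple


class ResourceLocation(NamedTuple):
    neuron_id: str
    quorum_id: str
    domain_id: str
    resource: str


def build_path(loc: ResourceLocation) -> str:
    """Join the three identifiers and the resource name (assumed layout)."""
    return "/" + "/".join([loc.neuron_id, loc.quorum_id, loc.domain_id, loc.resource])


def parse_path(path: str) -> ResourceLocation:
    """Split a PATH of the assumed form /NeuronId/QuorumId/DomainId/resource."""
    parts = path.strip("/").split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"malformed path: {path!r}")
    return ResourceLocation(*parts)


loc = ResourceLocation("Neuron1", "Quorum1", "Domain1", "logs/a")
path = build_path(loc)
```

Splitting at most three times keeps any slashes in the resource name intact, so the Frontend could recover the three identifiers without interpreting the rest of the path.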
One solution is to store the configuration file locally on the Frontend. The Frontend looks up the corresponding server address, that is, the ip:port of a NuwaLog backend server, in its local configuration file according to the NeuronId, QuorumId and DomainId in the user request. The local configuration file therefore specifies, in JSON format, the machine ip:port addresses of each QuorumId under each NeuronId. The DomainId represents the unique ID number of the user, and paths written to the backend NuwaLog carry the DomainId. For example, as shown in FIG. 4, Neuron1 denotes the Neuron with ID number 1 and contains the IP addresses and port numbers of the three servers of Quorum1 and of the three servers of Quorum2. If the user sends the ID numbers Neuron1 and Quorum1, the data is transmitted to 10.101.10.10, 10.101.10.11 and 10.101.10.12, and the DomainId sent by the user is used as the path prefix for the user's writes to the backend server, so the path prefix written to the backend is /DomainID/.
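The lookup in a local configuration file can be sketched as follows. This is a minimal illustration under stated assumptions: the JSON layout, the port number 10240, the Quorum2 addresses and the function name resolve are all hypothetical; only the three Quorum1 IP addresses come from the example above.

```python
import json

# Hypothetical Frontend configuration. The Quorum1 IP addresses follow the
# example in the text; the port and the Quorum2 addresses are placeholders.
CONFIG_JSON = """
{
  "Neuron1": {
    "Quorum1": ["10.101.10.10:10240", "10.101.10.11:10240", "10.101.10.12:10240"],
    "Quorum2": ["192.0.2.20:10240", "192.0.2.21:10240", "192.0.2.22:10240"]
  }
}
"""


def resolve(config: dict, neuron_id: str, quorum_id: str, domain_id: str):
    """Return (server addresses, backend path prefix) for a user request."""
    try:
        servers = config[neuron_id][quorum_id]
    except KeyError as exc:
        raise LookupError(f"unknown NeuronId/QuorumId: {neuron_id}/{quorum_id}") from exc
    # The DomainId is not looked up in the file; it becomes the path prefix.
    return servers, f"/{domain_id}/"


config = json.loads(CONFIG_JSON)
servers, prefix = resolve(config, "Neuron1", "Quorum1", "Domain1")
```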
Alternatively, the configuration file may be stored in the backend NuwaLog. Using the notification mechanism of the backend NuwaLog, the mapping relationship between QuorumId and DomainId can be stored in the backend NuwaLog, and the Frontend can subscribe to this information. The mapping relationship between QuorumId and DomainId is entered into the backend NuwaLog system in advance, and the Frontend obtains the subscribed information and keeps it in memory. The Frontend can then parse the path in the user request to find the machine IP and the resource location that need to be accessed.
The Frontend and the backend NuwaLog system communicate using the standard RESTful protocol. The standard RESTful protocol combines well with resource location, because the storage locations of user data are all treated as resource units. When the request sent by the client follows the standard RESTful protocol, the Frontend obtains the IP address and port number of the corresponding NuwaLog backend service from the NeuronId, QuorumId and DomainId that were passed in. This design locates the home address of the data so that the data can be accessed successfully. The NeuronId serves as the first-level address and identifies which region of the world the resource belongs to.
Secondly, fig. 5 gives a simplified component diagram of the cross-regional strong consistency system (Global Meta System) designed by the invention, in which log and snapshot are separated. As can be seen from fig. 5, there are three server (service) components:
a distributed consistency system (NuwaLog), a front-end machine (Frontend), and a local storage system (Bulk Storage System/Local Storage), wherein,
the distributed consistency system (NuwaLog) is the backend log consistency storage system; it is a Quorum and a transactional log submission system. The content received from a user (client/user) by a front-end machine (Frontend) is assembled into a log and written into the backend NuwaLog, which keeps the user's data consistent and highly reliable and does not lose data on failover. To NuwaLog every request is just a log, and NuwaLog does not need to understand its meaning. Here the log is a transactional log, that is, the request data sent by the user, which is uniformly called a log in the NuwaLog backend system;
a front-end machine (Frontend) is deployed in each region, forwards user requests at the front end, and pushes data to and pulls data from the NuwaLog;
the local storage system (Bulk Storage System/Local Storage) is a cloud open storage system that is deployed in each region where a front-end machine is located and serves as the large-scale local storage of that region. It may be a local large-scale table storage system, such as an OTS or HDFS system, capable of persistently storing a large amount of user data.
When a user (client/user) sends a request to a front-end machine (Frontend), the operation of the whole cross-regional strong consistency system (Global Meta System) depends on well-designed interaction protocols among all components to achieve the correctness and robustness of the system.
According to the invention, when a cross-regional strong consistency metadata system (Global Meta System) is constructed, log and snapshot need to be separated for the backend distributed consistency system (NuwaLog). The local storage system of each region stores the full snapshot data, while the backend machine (NuwaLog) stores only the log, which can be continuously deleted (trimmed) as the front-end machines (Frontend) read it. Splitting a distributed consistency system into separate components is a general industry practice. Such a system has many split service components, the communication protocol between the components is complex, and factors such as the correctness, consistency and reliability of system data must be ensured. Here, the backend machine may be a backend consistency system.
As shown in fig. 6, in an embodiment of the present invention, the interaction protocol APIs among the Frontend, the Bulk Storage System and the NuwaLog include:
(1) Register/Unregister: the Frontend registers its Frontend ID number with the NuwaLog or cancels that registration. If a plurality of NuwaLogs form a distributed consistency total system and each NuwaLog is divided into a plurality of partitions, the Frontend registers with the distributed consistency total system in order to interact with the corresponding partition;
(2) GetTrimPoint: the Frontend obtains from the NuwaLog the log NXID number of the data to be pulled. If a plurality of NuwaLogs form a distributed consistency total system and each NuwaLog is divided into a plurality of partitions, the Frontend obtains the log NXID number of the data to be pulled from the corresponding partition;
(3) Submit (also called Write): the Frontend submits the transactional log of a write request to the NuwaLog system. If a plurality of NuwaLogs form a distributed consistency total system and each NuwaLog is divided into a plurality of partitions, the Frontend submits the transactional log of the write request to the corresponding partition;
(4) Pull: the Frontend pulls data from the NuwaLog. If a plurality of NuwaLogs form a distributed consistency total system and each NuwaLog is divided into a plurality of partitions, the Frontend pulls data from the corresponding partition;
(5) Ack: the Frontend writes the pulled data into the Bulk Storage and confirms to the NuwaLog the NXID number of the last log written. If a plurality of NuwaLogs form a distributed consistency total system and each NuwaLog is divided into a plurality of partitions, the Frontend confirms the NXID number of the last log written to the corresponding partition.
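To make the five calls concrete, the following sketch models one partition as a small in-memory object. It is only an illustration under stated assumptions: the class name InMemoryPartition, the dictionary-based storage, the limit parameter and the choice to start a newly registered Frontend at the current trim point are all hypothetical and are not taken from the NuwaLog implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryPartition:
    """Toy stand-in for one partition (Domain); not the real NuwaLog."""
    next_id: int = 1                           # ID the next submitted log receives
    logs: dict = field(default_factory=dict)   # log ID -> payload
    acked: dict = field(default_factory=dict)  # Frontend ID -> acknowledged NXID

    def register(self, frontend_id: str) -> None:
        # Assumption: a newly registered Frontend starts at the trim point.
        self.acked.setdefault(frontend_id, self.get_trim_point())

    def unregister(self, frontend_id: str) -> None:
        self.acked.pop(frontend_id, None)

    def get_trim_point(self) -> int:
        # NXID: the ID of the oldest log still stored, or the next ID if empty.
        return min(self.logs, default=self.next_id)

    def submit(self, payload: bytes) -> int:
        log_id = self.next_id
        self.logs[log_id] = payload
        self.next_id += 1
        return log_id

    def pull(self, from_id: int, limit: int = 100) -> list:
        ids = [i for i in sorted(self.logs) if i >= from_id][:limit]
        return [(i, self.logs[i]) for i in ids]

    def ack(self, frontend_id: str, nxid: int) -> None:
        self.acked[frontend_id] = nxid
        trim_to = min(self.acked.values())     # only trim what everyone has pulled
        for i in [i for i in self.logs if i < trim_to]:
            del self.logs[i]


p = InMemoryPartition()
p.register("frontend-a")
for n in range(3):
    p.submit(f"request-{n}".encode())
batch = p.pull(p.get_trim_point())             # logs 1..3
p.ack("frontend-a", batch[-1][0] + 1)          # confirm NXID 4
```

With a single registered Frontend, the acknowledgement of NXID 4 lets the partition drop logs 1 to 3, after which GetTrimPoint returns 4.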
As shown in fig. 3 and 6, the present invention provides a metadata interaction method on a backend machine, including:
a plurality of backend machines form a backend consistency total system (Global Meta System), and the backend machines are divided into a plurality of partitions (Domains); that is, one backend consistency total system can be formed by a plurality of backend machines (Quorums), and each backend machine is formed by a plurality of Domains; here, a partition is a resource isolation domain between users;
allocating corresponding partitions to different users;
as shown in fig. 6, a user's request to update a log in the corresponding partition is received from a front-end machine (Frontend), and the corresponding log is updated into that user's partition in the NuwaLog of a backend machine of the corresponding backend consistency total system according to the request; here, the update includes a write and/or a modification after writing;
the partition continuously sends logs to the front-end machine (Frontend), receives from the front-end machine the position in the partition of the currently pulled log, and deletes the pulled logs in the partition according to that position.
Here, the corresponding partition receives requests from all front-end machines (Frontends) and stores the logs. A front-end machine (Frontend) can start two threads: one thread continuously pulls log data from the corresponding partition in a NuwaLog, and the other continuously writes the data into the OTS (Bulk Storage) and replies to the corresponding partition with the position up to which logs have been pulled. Among the positions replied by all front-end machines, the corresponding partition takes the smallest NXID number, that is, the ID number following the lowest "last pulled" log ID, and deletes (trims) all logs before that NXID number. For example, suppose there are currently three front-end machines A, B and C, the corresponding partition originally holds 10 logs with ID numbers 1 to 10, and each front-end machine pulls logs starting from the smallest ID number 1. If front-end machine A has pulled up to the log with ID number 3, it replies NXID position 4 to the corresponding partition; front-end machine B has pulled up to the log with ID number 4 and replies NXID position 5; front-end machine C has pulled up to the log with ID number 5 and replies NXID position 6. The corresponding partition then deletes from its storage space only the logs with ID numbers 1 to 3, which precede ID number 4. The logs with ID numbers 4 and 5 have not yet been pulled by every front-end machine, so they can be deleted only after all front-end machines have pulled them.
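The trimming rule in this example reduces to taking the minimum of the acknowledged NXID positions. The sketch below reproduces the example numbers; the function name trim and the dictionary representation of the partition are illustrative assumptions.

```python
def trim(logs: dict, acked_nxids: dict) -> int:
    """Delete every log that all front-end machines have pulled.

    logs maps log ID -> payload; acked_nxids maps front-end machine -> NXID.
    Returns the trim point (the smallest acknowledged NXID).
    """
    trim_point = min(acked_nxids.values())
    for log_id in [i for i in logs if i < trim_point]:
        del logs[log_id]
    return trim_point


logs = {i: f"log-{i}" for i in range(1, 11)}   # ten logs with IDs 1..10
acked = {"A": 4, "B": 5, "C": 6}               # positions replied by A, B and C
trim_point = trim(logs, acked)
remaining = sorted(logs)
```

Here trim_point is 4, so logs 1 to 3 are removed while logs 4 and 5 stay until the slowest front-end machine has acknowledged them.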
In this embodiment, a plurality of backend machines form a backend consistency total system, the backend machines are divided into a plurality of partitions, and each partition responds to its own requests and shares the request pressure. This avoids the excessive load that arises when a single backend machine responds to all requests and allows the backend machines to scale horizontally. In addition, by allocating corresponding partitions to different users, a quota can be set on the metadata resource space written by each user, metadata is partitioned and isolated between users, security is ensured, and users can be distinguished for quota-based charging. Furthermore, the front-end machine (Frontend) continuously pulls logs from the corresponding partition of a backend machine (NuwaLog), continuously stores the pulled logs in the local storage system (OTS) of the region where the front-end machine is located, and replies the position of the currently pulled log to the corresponding partition, which deletes the pulled logs it stores according to that position. The corresponding partition therefore does not need to store all logs, only those not yet pulled to the local storage system (OTS). This avoids the degradation of single-machine storage performance and the limited storage capacity of the backend machine (NuwaLog), as well as the limitation of the number of data storage sites by the number of nodes in the consistency protocol, and achieves data consistency, correctness and robustness for the whole system.
In an embodiment of the present invention, as shown in fig. 6, after the corresponding partition continuously sends the log to the front end, the method further includes:
the corresponding partition obtains a user's read log request from the front-end machine (Frontend) and sends the corresponding log to the front-end machine according to the request. This embodiment supplements the scheme in which the front-end machine obtains the log from the local storage system of its own region: when a log that the user needs to read has not yet been pulled into the local storage system of the corresponding region, the user's read request can still be answered promptly by reading the log directly from the corresponding partition.
In an embodiment of the present invention, the front-end machine (Frontend) adopts a main/standby server architecture. The main/standby architecture avoids a single point of failure at the Frontend, which would otherwise prevent the log data in the local storage system (Bulk Storage System) of the local region from being updated.
In an embodiment of the present invention, before the corresponding partition receives an update log request of a user from a front-end computer or obtains a read log request of the user from the front-end computer, the method further includes:
the backend consistency total system receives registration requests from the main server and the standby server;
after registering the main server and the standby server according to the registration requests, the backend consistency total system sends registration success information to the main server and the standby server and sends the position in the partition of the currently pulled log to the main server. Here, the Frontend of each region, including both its main server and its standby server, must register with the backend consistency total system so that a partition in the backend consistency total system can find the corresponding Frontend to which request results are sent. Specifically, as shown in fig. 6, each Frontend calls Register at startup to register with the NuwaLog. Because the main server is responsible for transactional operations, after successful registration the main server calls GetTrimPoint to obtain the ID number of the position following the log that the backend NuwaLog has most recently deleted. For example, if the most recently deleted log has ID number 3, the position obtained is 4, that is, the main server next starts pulling logs from position 4.
In an embodiment of the present invention, a distributed lock service (Nuwa Lock Service) is used to switch between the main server and the standby server in the front-end machine. To avoid a single point of failure at the Frontend, which would prevent the log data in the local storage system (Bulk Storage System) of the local region from being updated, a main/standby architecture is adopted so that only the main server (master) performs transactional operations at any time. When the main server (master) fails, a new main server is selected through the Nuwa Lock Service, that is, a master election is performed. The Nuwa Lock Service provides a distributed lock protocol and is suitable for Frontend master election. The distributed lock service system (Nuwa Lock Service) can be a global distributed lock service system (Nuwa Global Lock Service) that manages the main/standby architecture globally.
In an embodiment of the present invention, when a distributed lock service system switches a primary server and a standby server in the front-end computer, the method further includes:
as shown in fig. 6, the backend consistency total system receives a logout request from the main server and/or standby server going offline, logs out that main server and/or standby server according to the logout request, and then sends logout result information to it;
the backend consistency total system receives a registration request from a newly online main server and/or standby server, registers it according to the registration request, and then sends registration success information to the newly online main server and/or standby server and sends the position in the partition of the currently pulled log to the newly online main server. Here, the Frontend of each region, including any newly online main server and standby server, must register with the NuwaLog so that the backend consistency total system can find the corresponding Frontend to which request results are sent. Specifically, as shown in fig. 6, each newly online main server and standby server calls Register at startup to register with the backend consistency total system. Because only the main server (master) performs transactional operations, after successful registration the main server (master) calls GetTrimPoint to obtain the ID number of the position following the log that the backend NuwaLog has most recently deleted. For example, if the most recently deleted log has ID number 3, the position obtained is 4, that is, the main server (master) next starts pulling logs from position 4. In addition, when a Frontend, including its main server and standby server, goes offline, Unregister must be called to remove that Frontend from the backend consistency total system.
In an embodiment of the present invention, as shown in fig. 6, when updating the corresponding log into the corresponding partition, the method further includes:
generating a corresponding snapshot according to the currently stored log of the corresponding partition and storing the corresponding snapshot into the same partition;
when the corresponding partition deletes the pulled logs of the partition according to the position of the pulled log in the partition, the method further includes:
deleting the corresponding snapshot from the partition. Here, a snapshot is a snapshot of the full data held in memory by the corresponding partition at a certain moment. The corresponding partition receives requests from the Frontend and stores both log and snapshot. The snapshot addresses the case in which the partition fails over before log data has been transmitted to the local OTS: when the partition restarts, it can recover from the snapshot the data that was not yet transmitted to the OTS. The partition deletes the pulled logs according to the position and, at intervals, also deletes the corresponding snapshot files on disk, so that expired and useless snapshots are not retained.
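The role of the snapshot in recovery can be sketched as follows. This is a simplified illustration, assuming a hypothetical JSON file as the snapshot format and hypothetical function names; the actual on-disk format of NuwaLog snapshots is not described here.

```python
import json
import os
import tempfile


def write_snapshot(path: str, logs: dict) -> None:
    """Persist the partition's in-memory logs; write to a temp file, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({str(k): v for k, v in logs.items()}, fh)
    os.replace(tmp, path)  # atomic replacement on the same filesystem


def recover(path: str) -> dict:
    """Rebuild the in-memory logs after a restart; empty if no snapshot exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return {int(k): v for k, v in json.load(fh).items()}


with tempfile.TemporaryDirectory() as d:
    snap = os.path.join(d, "domain1.snapshot")
    write_snapshot(snap, {4: "log-4", 5: "log-5"})  # logs not yet pulled to the OTS
    restored = recover(snap)                        # simulated restart
    os.remove(snap)                                 # expired snapshot is deleted
    after_cleanup = recover(snap)
```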
In an embodiment of the present invention, Avro serialization may be used between the front end (Frontend) and the back end (NuwaLog), with Netty used for RPC communication. Avro is chosen because it is a relatively simple and efficient serialization protocol. Netty, which is commonly used for RPC in the industry, is a mature and efficient network communication framework in the Java ecosystem.
As shown in fig. 6, the present invention further provides a metadata interaction method on a front-end machine, where the method includes:
a front-end machine (Frontend) receives a user's request to update a log in the corresponding partition and, by sending that request to the corresponding partition, updates the corresponding log into the corresponding partition, wherein different users are allocated corresponding partitions, a backend machine is divided into a plurality of partitions, and a plurality of backend machines form a backend consistency total system; here, the corresponding partition receives the requests of all Frontends and stores the logs;
the front-end machine (Frontend) continuously pulls logs from the backend machine (NuwaLog), continuously stores (Apply) the pulled logs in the local storage system (OTS) of the region where the front-end machine is located, and replies (Send Ack) to the corresponding partition the position of the currently pulled log, so that the corresponding partition deletes the pulled logs it stores according to that position.
Here, the corresponding partition receives requests from all front-end machines (Frontends) and stores the logs. A front-end machine (Frontend) can start two threads: one thread continuously pulls log data from the corresponding partition, and the other continuously writes the data into the OTS and replies to the corresponding partition with the position up to which logs have been pulled. Among the positions replied by all front-end machines, the corresponding partition takes the smallest NXID number, that is, the ID number following the lowest "last pulled" log ID, and deletes (trims) all logs before that NXID number. For example, suppose there are currently three front-end machines A, B and C, the backend NuwaLog originally holds 10 logs with ID numbers 1 to 10, and each front-end machine pulls logs starting from the smallest ID number 1. If front-end machine A has pulled up to the log with ID number 3, it replies NXID position 4 to the NuwaLog; front-end machine B has pulled up to the log with ID number 4 and replies NXID position 5; front-end machine C has pulled up to the log with ID number 5 and replies NXID position 6. The NuwaLog then deletes from its storage space only the logs with ID numbers 1 to 3, which precede ID number 4. The logs with ID numbers 4 and 5 have not yet been pulled by every front-end machine, so they can be deleted only after all front-end machines have pulled them.
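The two-thread arrangement on the front-end machine can be sketched with a queue between a pulling thread and an applying thread. This is a toy model under stated assumptions: the partition and the OTS are plain dictionaries, the loop stops once the fake partition is exhausted (the real threads run continuously), and all names are illustrative.

```python
import queue
import threading

backend_logs = {i: f"log-{i}" for i in range(1, 6)}  # fake partition contents
local_store = {}                                     # stand-in for the OTS
acks = []                                            # NXID positions replied
pulled = queue.Queue()


def puller(start_id: int) -> None:
    """Thread 1: pull logs from the partition in ID order."""
    log_id = start_id
    while log_id in backend_logs:
        pulled.put((log_id, backend_logs[log_id]))
        log_id += 1
    pulled.put(None)  # sentinel: nothing more to pull in this demo


def applier() -> None:
    """Thread 2: store each pulled log locally, then acknowledge the next ID."""
    while (item := pulled.get()) is not None:
        log_id, payload = item
        local_store[log_id] = payload  # Apply
        acks.append(log_id + 1)        # Send Ack (NXID)


threads = [threading.Thread(target=puller, args=(1,)),
           threading.Thread(target=applier)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Acknowledging only after the local write means the partition never trims a log that has not reached the local storage system.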
In this embodiment, a plurality of backend machines form a backend consistency total system, the backend machines are divided into a plurality of partitions, and each partition responds to its own requests and shares the request pressure. This avoids the excessive load that arises when a single backend machine responds to all requests and allows the backend machines to scale horizontally. In addition, by allocating corresponding partitions to different users, a quota can be set on the metadata resource space written by each user, metadata is partitioned and isolated between users, security is ensured, and users can be distinguished for quota-based charging. Furthermore, the front-end machine (Frontend) continuously pulls logs from the corresponding partition, continuously stores the pulled logs in the local storage system (OTS) of the region where the front-end machine is located, and replies the position of the currently pulled log to the corresponding partition, which deletes the pulled logs it stores according to that position. The corresponding partition therefore does not need to store all logs, only those not yet pulled to the local storage system (OTS). This avoids the degradation of single-machine storage performance and the limited storage capacity of the corresponding partition, removes the limitation of the number of data storage sites by the number of nodes in the consistency protocol, and achieves data consistency, correctness and robustness for the whole system.
In an embodiment of the present invention, as shown in fig. 6, after the front end computer (Frontend) stores the pulled log in a local storage system of a region where the front end computer is located, the method further includes:
the front-end machine receives a user's request to read (get) a log of the corresponding partition;
the front-end machine (Frontend) obtains the log of the corresponding partition from the local storage system of the region where the front-end machine is located according to the read request,
if the log is obtained from the local storage system, the front-end machine (Frontend) replies the obtained log to the user. In this way, non-transactional operation requests need not all be sent to the corresponding partitions in the backend consistency total system, which reduces the workload of the backend consistency total system. In addition, obtaining the log from the local storage system of the region through the front-end machine of that region speeds up the response to the user's read request and improves read performance. For example, a user in New York can read logs from the local storage system in New York through the front-end machine (Frontend) in New York, and a user in Beijing can read logs from the local storage system in Beijing through the front-end machine (Frontend) in Beijing.
In an embodiment of the present invention, after acquiring the log of the corresponding partition from the local storage system of the region where the front-end computer is located, the method further includes:
if the log is not obtained from the local storage system, the read log request is sent to the corresponding partition, the log is obtained from the corresponding partition (strong read), and the obtained log is then replied to the user.
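The read path with its fallback to a strong read can be sketched as follows; the function name read_log, the dictionary stand-ins for the local storage system and the partition, and the returned source label are illustrative assumptions.

```python
def read_log(log_id: int, local_store: dict, partition: dict):
    """Serve a get request locally when possible, else fall back to a strong read."""
    if log_id in local_store:
        return local_store[log_id], "local"
    # Not yet pulled into the local storage system: read from the partition.
    return partition[log_id], "strong"


local_store = {1: "log-1", 2: "log-2"}   # already pulled and applied
partition = {3: "log-3"}                 # still waiting to be pulled
local_hit = read_log(1, local_store, partition)
strong_hit = read_log(3, local_store, partition)
```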
In an embodiment of the present invention, the front-end computer adopts a main/standby server (master/slave) architecture. Here, adopting the master/slave architecture avoids a Frontend single point of failure, which would otherwise prevent the log data in the local storage system (Bulk storage system) of the local region from being updated.
In an embodiment of the present invention, before receiving an update log request from a user or receiving a log request from a user to read a corresponding partition, the method includes:
the main server and the standby server send registration requests to the back-end consistency total system;
and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled log in the partition from the back-end machine. Here, the Frontends of each region, including both the primary server and the standby server, need to register with the back-end consistency total system, so that the corresponding partition in the back-end consistency total system can find the corresponding Frontend to which to send request results. Specifically, as shown in fig. 6, when each Frontend starts, it may first call Register to register with the back-end consistency total system. Since the primary server is responsible for transactional operations, after the registration succeeds the primary server may call GetTrimPoint to obtain the ID number of the position following the currently deleted log of the corresponding partition. For example, if the currently deleted log has ID number 3, the obtained position is 4, that is, the primary server next starts pulling logs from position 4.
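The start-up sequence above (both servers register, then the main server asks for the trim point) can be sketched as follows. The class `BackendPartition`, its methods, and the `bootstrap` helper are hypothetical names chosen for illustration; only the Register and GetTrimPoint operations and the "deleted log 3, next position 4" example come from the text.

```python
from typing import Set


class BackendPartition:
    """Hypothetical partition that tracks registrations and its trim position."""

    def __init__(self, last_deleted_id: int) -> None:
        self.last_deleted_id = last_deleted_id
        self.registered: Set[str] = set()

    def register(self, frontend_id: str) -> bool:
        self.registered.add(frontend_id)
        return True  # registration success information

    def get_trim_point(self) -> int:
        # The position following the most recently deleted log.
        return self.last_deleted_id + 1


def bootstrap(partition: BackendPartition, master_id: str, standby_id: str) -> int:
    # Both the main and the standby server register on start-up.
    for frontend_id in (master_id, standby_id):
        if not partition.register(frontend_id):
            raise RuntimeError(f"registration failed for {frontend_id}")
    # Only the main server pulls logs, so only it asks for the trim point.
    return partition.get_trim_point()


example_partition = BackendPartition(last_deleted_id=3)
next_pull_position = bootstrap(example_partition, "master", "standby")
```

Here `next_pull_position` is 4, matching the example in which the log with ID number 3 is the last one deleted.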
In an embodiment of the present invention, as shown in fig. 6, a distributed lock service system (Nuwa Lock Service) is used to switch between the primary server and the standby server in the front-end computer. To avoid a Frontend single point of failure that would prevent the log data in the local storage system (Bulk storage system) of the local region from being updated, a master/slave architecture is adopted here to ensure that only the master server (master) performs transactional operations at any time. After the master server (master) fails, a new master can be elected through the Nuwa Lock Service; the Nuwa Lock Service provides a distributed lock protocol and is suitable for Frontend master election.
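Lock-based master election of the kind described above can be illustrated with a small in-process model. The `LockService` class below is a hypothetical stand-in and is not the Nuwa Lock Service API; it only shows the idea that whichever server holds the lock acts as master, and that a standby takes over once the lock is released.

```python
from typing import List, Optional


class LockService:
    """Hypothetical in-process stand-in for a distributed lock service."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None

    def try_acquire(self, server_id: str) -> bool:
        # Granted only if the lock is free or already held by the caller.
        if self.holder in (None, server_id):
            self.holder = server_id
            return True
        return False

    def release(self, server_id: str) -> None:
        if self.holder == server_id:
            self.holder = None


def elect_master(lock: LockService, candidates: List[str]) -> Optional[str]:
    # The first candidate that obtains the lock acts as master.
    for server_id in candidates:
        if lock.try_acquire(server_id):
            return server_id
    return None


lock = LockService()
first_master = elect_master(lock, ["frontend-a", "frontend-b"])
lock.release(first_master)  # simulate the master failing and losing the lock
second_master = elect_master(lock, ["frontend-b"])
```

A real distributed lock would also need lease expiry so that a crashed master loses the lock without releasing it explicitly; that is omitted here.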
In an embodiment of the present invention, while the distributed lock service system switches the primary server and the standby server in the front-end computer, the method further includes:
the offline main server and the offline standby server send logout requests to the back-end consistency total system and receive logout result information from the back-end consistency total system;
and after the newly online main server and standby server send registration requests to the back-end consistency total system and receive registration success information from the back-end consistency total system, the newly online main server obtains the position of the currently pulled log in the partition from the back-end machine. Here, the Frontends of each region, including newly online primary and standby servers, need to register with the back-end consistency total system, so that the corresponding partition in the back-end consistency total system can find the corresponding Frontend to which to send request results. Specifically, as shown in fig. 6, when each newly online primary server or standby server starts, it may first call Register to register with the back-end consistency total system. Since only the primary server (master) performs transactional operations, after the registration succeeds the primary server (master) may call GetTrimPoint to obtain the ID number of the position following the log that the back-end NuwaLog has currently deleted. For example, if the currently deleted log has ID number 3, the obtained position is 4, that is, the primary server (master) next starts pulling logs from position 4. In addition, when a Frontend, whether a primary server or a standby server, goes offline, Unregister needs to be called to take the Frontend offline from the back-end consistency total system.
In an embodiment of the present invention, as shown in fig. 6, the work of obtaining a user's log update request from the load balancing server (VIP Server) and updating the corresponding log into the back-end machine by sending that request to the back-end machine can be completed by either the main server or the standby server;
whereas the transactional work of continuously pulling logs from the corresponding partitions (Pull logs), continuously storing the pulled logs to the local storage system of the region where the front-end computer is located (Apply logs), and replying to the corresponding partitions the positions of the pulled logs in the partitions (Send Ack) is completed only by the main server, so that the data consistency of the whole system is ensured.
In an embodiment of the present invention, receiving a metadata request for reading a corresponding partition from a user, acquiring metadata of the corresponding partition from a local storage system in a region where a front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, returning the acquired metadata to the user includes:
as shown in fig. 6, the master Server (master) or the standby Server (slave) receives a log request of a user to read a corresponding partition from the load balancing Server (VIP Server);
and the main server or the standby server acquires the log of the corresponding partition from a local storage system of the region where the front-end computer is located according to the log request for reading the corresponding partition, and if the log is acquired from the local storage system, the acquired log is replied to the user. The load balancing Server (VIP Server) plays a role in load balancing, a client/user only needs to access one IP or domain name, namely the IP address or domain name address of the VIP Server, the VIP Server can be used for mounting a plurality of frontends, and the IP and port of the Frontend on each machine can be configured. The request sent by the client/user is sent to one of the main Server or the standby Server by the VIP Server according to a certain load balancing strategy.
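The role of the VIP Server can be sketched as a dispatcher over the mounted Frontends. The text only says "a certain load balancing strategy", so the round-robin policy below is an assumption, as are the class name `VipServer` and the example addresses.

```python
from itertools import cycle
from typing import Iterator, List, Tuple

Endpoint = Tuple[str, int]  # (ip, port) configured for one Frontend


class VipServer:
    """Hypothetical load balancer that clients reach through a single address."""

    def __init__(self, frontends: List[Endpoint]) -> None:
        # Round-robin is an assumed strategy; the source does not name one.
        self._next: Iterator[Endpoint] = cycle(frontends)

    def route(self) -> Endpoint:
        # Pick the Frontend (main or standby) that receives the next request.
        return next(self._next)


vip = VipServer([("10.0.0.1", 8080), ("10.0.0.2", 8080)])
picked = [vip.route() for _ in range(3)]
```

Three consecutive requests are sent to the first, the second, and then the first Frontend again.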
In an embodiment of the present invention, after obtaining metadata from a corresponding partition by sending the metadata reading request to the corresponding partition without obtaining the metadata from the local storage system, replying the obtained metadata to the user includes:
and if the metadata is not acquired from the local storage system, the main server or the standby server sends the metadata reading request to the corresponding partition, and the acquired metadata is replied to the user after the metadata is acquired from the corresponding partition. Here, obtaining the corresponding logs from the corresponding partitions is a non-transactional operation, so the load balancing server (VIP Server) can select either the main server or the standby server to complete it, which achieves a better load balancing effect.
The working data flow of each component is explained in detail below:
in fig. 6, there are three types of working data flow, which are represented by different line types. The first type is Frontend bootstrap operations, including the start-up operations of the main server (master) and the standby server (slave); the second type is normal read and write operations (Normal read write operations), which can be performed by either the main server (master) or the standby server (slave) upon receiving a corresponding request from the load balancing server; the third type is operations performed only by the main server (Master operations), including Pull logs, Apply logs, and Send Ack.
From the Frontend perspective, the working data flow is as follows:
1. As shown in fig. 6, when the Frontend starts (Frontend bootstrap operations), it may first call Register to register with the back-end consistency total system. Since the main server is responsible for transactional operations, after the registration succeeds the main server may call GetTrimPoint to obtain the NXID number, that is, the position following the currently deleted log of the corresponding partition. For example, if the currently deleted log has ID number 3, the obtained position is 4, that is, the main server next starts pulling logs from position 4. When a Frontend goes offline, Unregister needs to be called to take the Frontend offline from the back-end consistency total system;
2. Pull logs and Apply logs are done by a thread in the Frontend: logs are pulled from the corresponding partition at intervals and then applied to the Bulk Storage System;
3. Send Ack means that the Frontend reports to the corresponding partition the NXID of the logs it has applied to the Bulk Storage System, where the NXID is the ID number following that of the last log the Frontend has pulled and successfully written into Bulk Storage;
4. The Frontend receives put/get/delete requests sent by the client/user, wherein the transactional put/delete requests need to be sent to the corresponding partition, while get is a non-transactional operation and is sent to the Bulk Storage.
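The routing rule in the last item of the list above can be written down compactly. The function name `route_request` and the string labels for the two destinations are hypothetical; only the split between transactional put/delete and non-transactional get comes from the text.

```python
# Operations that change data and therefore must go to the partition.
TRANSACTIONAL_OPERATIONS = {"put", "delete"}


def route_request(operation: str) -> str:
    """Return the destination for a client operation received by the Frontend."""
    if operation in TRANSACTIONAL_OPERATIONS:
        return "partition"
    if operation == "get":
        # Reads are non-transactional and are served from Bulk Storage first.
        return "bulk_storage"
    raise ValueError(f"unsupported operation: {operation}")


destinations = {op: route_request(op) for op in ("put", "get", "delete")}
```

A get that misses in Bulk Storage would then be retried against the partition as a strong read, which this sketch does not cover.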
From a partition perspective, the working data flow is as follows:
1. Storing put requests sent by the Frontend, writing the data into the partition in the form of logs;
2. Pull and Ack interaction with the Frontend: sending logs to the Frontend, and responding to the acks in which the Frontend reports the successfully applied NXID to the corresponding partition;
3. When the client/user sends a read request to the Frontend, if the Frontend does not obtain the requested data from the Bulk Storage, the Frontend needs to send a strong read request to the corresponding partition and obtain the data directly from the corresponding partition.
From the Bulk Storage System perspective, the working data flow is as follows:
The Bulk Storage System is a persistent storage system that stores the full amount of log data; the Frontend interacts with Bulk Storage to transfer logs into it and to retrieve logs from it.
The rules defined by the specific API protocol include: all Response return messages have the fields rc, message and traceinfo, where rc represents the return code, message represents the return message, and traceinfo is used to trace fast and slow requests. Specifically:
Request and Response of Register and Unregister
The Request and Response of Register and Unregister are avro-serialized. When a Frontend starts, it registers its Frontend ID with the back-end consistency total system; because each region has a plurality of Frontends, this lets the NuwaLog back end know which Frontends to send information to.
Request and Response of GetTrimPoint
When failover occurs among the Frontend servers, the newly online main server can also acquire the Trim Point. The Trim Point is used by the Pull thread of the main server, which uses it to obtain log information from the back end.
Request and Response of Submit
The Request and Response of Submit are avro-serialized. For a Submit request, the request is assembled into a transactional Log request at the front end, and the response returns a return code and return information.
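The common response fields named in the protocol rules (rc, message, traceinfo) can be modelled as a small record. The field types, the `Response` class name, and the convention that a return code of 0 means success are assumptions made for illustration; the source does not define the actual avro schema or the code values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Response:
    rc: int                                              # return code
    message: str                                         # return message
    traceinfo: List[str] = field(default_factory=list)   # request tracing data

    def ok(self) -> bool:
        # Assumption: 0 means success; the source does not list the codes.
        return self.rc == 0


submit_ok = Response(rc=0, message="OK")
submit_failed = Response(rc=1, message="partition unavailable", traceinfo=["slow"])
```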
Request and Response of Pull
The Request and Response of Pull are avro-serialized. The Frontend starts a pull thread that periodically pulls a set of log entries from NuwaLog. The request carries the NXID of the current log, together with the maximum number of logs and the maximum log bytes used for flow control. The response carries the return code, the return information, and the log array information (each log contains its NXID and its content).
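The flow control carried by a Pull request (a start position, a maximum number of logs, and a maximum number of bytes) can be sketched on the partition side as follows. The function `pull`, its parameter names, and the representation of a log as an `(id, content)` pair are assumptions; the sketch also assumes the stored logs are ordered by ID.

```python
from typing import List, Tuple

Log = Tuple[int, bytes]  # (ID of the log, content of the log)


def pull(logs: List[Log], nxid: int, max_count: int, max_bytes: int) -> List[Log]:
    """Return logs starting at position nxid, bounded by count and total size."""
    batch: List[Log] = []
    used_bytes = 0
    for log_id, content in logs:
        if log_id < nxid:
            continue  # already pulled by this Frontend
        if len(batch) >= max_count or used_bytes + len(content) > max_bytes:
            break  # flow-control limit reached
        batch.append((log_id, content))
        used_bytes += len(content)
    return batch


stored = [(1, b"aa"), (2, b"bbb"), (3, b"c"), (4, b"dddd")]
by_count = pull(stored, nxid=2, max_count=2, max_bytes=100)
by_bytes = pull(stored, nxid=2, max_count=10, max_bytes=4)
```

Both calls return the logs with IDs 2 and 3: the first stops at the count limit, the second because adding log 4 would exceed the byte limit.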
Request and Response of Ack
The Request and Response of Ack are avro-serialized. The Frontend also starts an ack thread that periodically takes log data from the queue filled by the Pull thread and commits it to the local OTS; after the commit succeeds, it sends a confirmation message to the corresponding partition. The request carries the largest NXID successfully committed to the local OTS, and the response carries the return code and the return message.
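The work of the ack thread can be sketched as one pass over the queue of pulled logs. To keep the example deterministic it runs in a single thread; the function name `drain_and_commit` and the dictionary standing in for the local OTS are assumptions.

```python
import queue
from typing import Dict, Optional


def drain_and_commit(pulled: "queue.Queue", local_store: Dict[int, bytes]) -> Optional[int]:
    """Commit every queued log and return the largest NXID committed, if any."""
    largest: Optional[int] = None
    while True:
        try:
            nxid, content = pulled.get_nowait()
        except queue.Empty:
            break
        local_store[nxid] = content  # stands in for a commit to the local OTS
        largest = nxid if largest is None else max(largest, nxid)
    return largest


pulled_logs: "queue.Queue" = queue.Queue()
for entry in [(4, b"a"), (5, b"b")]:
    pulled_logs.put(entry)  # normally done by the Pull thread
store: Dict[int, bytes] = {}
ack_nxid = drain_and_commit(pulled_logs, store)
```

The returned value (5 here) is what the Ack request would carry to the partition; a return of `None` means there was nothing to acknowledge.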
The five APIs can be transmitted over a RESTful protocol, so a standard HTTP protocol header and protocol body are required, as follows:
In the Request example, the meaning of each field used is:
URL: indicates the IP and port number of the server;
REQUEST: indicates that the request is an HTTP POST request and gives the path of the accessed resource;
HTTP Headers: here the headers need to indicate that the protocol is avro. NuwaNeuron-Action indicates the action that the HTTP request initiates; the values here are Submit, Pull, Ack, Register and Unregister. NuwaNeuron-Timestamp indicates the timestamp at which the HTTP request was initiated, and NuwaNeuron-API-Version indicates the version number of the HTTP request. NuwaNeuron-PackageId identifies the sequence number of the packet sent by this HTTP request. NuwaNeuron-RequestId identifies the ID number of this HTTP request.
In the Response example, the meaning of each field used is as follows. The response returned for the HTTP request includes a response header and an HTTP body. The header includes Content-type, which indicates the type of the returned content, here avro. Content-length specifies the length of the content returned for the request. NuwaNeuron-Timestamp indicates the timestamp of this HTTP response, and NuwaNeuron-RequestId indicates the request ID number of this HTTP request, as previously described.
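The request headers described above can be assembled as in the following sketch. The header names follow the NuwaNeuron-* names given in the text; everything else is an assumption, including the function name `build_headers`, the literal Content-type value "avro", the millisecond timestamp format, and the default version string.

```python
from typing import Dict

# Action values listed in the description of NuwaNeuron-Action.
ACTIONS = {"Submit", "Pull", "Ack", "Register", "Unregister"}


def build_headers(action: str, request_id: str, package_id: int,
                  timestamp_ms: int, api_version: str = "1") -> Dict[str, str]:
    """Build the HTTP headers for one request (illustrative values only)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return {
        "Content-type": "avro",  # assumed literal; the exact media type is not given
        "NuwaNeuron-Action": action,
        "NuwaNeuron-Timestamp": str(timestamp_ms),
        "NuwaNeuron-API-Version": api_version,
        "NuwaNeuron-PackageId": str(package_id),
        "NuwaNeuron-RequestId": request_id,
    }


headers = build_headers("Pull", request_id="req-1", package_id=7,
                        timestamp_ms=1700000000000)
```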
According to another aspect of the present invention, there is also provided a backend machine, as shown in fig. 3 and 6, including:
the partition device is used for forming a back-end consistency total system by the back-end machines, dividing the back-end machines into a plurality of partitions and distributing corresponding partitions for different users;
the updating device is used for receiving a log request of a user for updating the corresponding partition from the front-end computer and updating the corresponding log into the corresponding partition according to the request;
and the sending device is used for sending the log to the front-end computer, receiving the position of the currently pulled log in the partition from the front-end computer, and deleting the pulled log in the corresponding partition according to the position in the partition.
Here, the corresponding partition receives requests from all front-end machines (Frontends) and stores the logs. A front-end machine (Frontend) can start two threads: one thread continuously pulls log data from the corresponding partition in a certain NuwaLog, and the other thread continuously writes the data into OTS (Bulk Storage) and replies to the corresponding partition the position up to which logs have been pulled. According to the received positions, the corresponding partition deletes (trims) all logs before the smallest NXID number among the positions replied by all the front-end machines, where an NXID number is the ID number following that of the last log a front-end machine has pulled. For example, suppose there are currently three front-end machines A, B and C, the corresponding partition originally holds 10 logs with ID numbers 1 to 10, and each front-end machine pulls logs starting from the smallest ID number 1. If front-end machine A has pulled up to the log with ID number 3, it replies NXID position 4 to the corresponding partition; front-end machine B has pulled up to the log with ID number 4 and replies NXID position 5; and front-end machine C has pulled up to the log with ID number 5 and replies NXID position 6. The corresponding partition then deletes from its storage space only the logs with ID numbers 1 to 3, which precede ID number 4 and have been pulled by all front-end machines. The logs with ID numbers 4 and 5 have not yet been pulled by every front-end machine, so they can be deleted only after all front-end machines have pulled them.
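The trimming rule in the example above (delete everything before the smallest NXID replied by any front-end machine) can be checked with a few lines. The function name `trim` and the dictionary representations of the log store and of the acknowledged positions are assumptions made for illustration.

```python
from typing import Dict


def trim(logs: Dict[int, str], acked_nxid: Dict[str, int]) -> Dict[int, str]:
    """Keep only logs that at least one front-end machine has not pulled yet."""
    if not acked_nxid:
        return dict(logs)  # nothing acknowledged, nothing can be deleted
    trim_point = min(acked_nxid.values())
    return {log_id: content for log_id, content in logs.items() if log_id >= trim_point}


# Ten logs with ID numbers 1 to 10; A, B and C reply NXID 4, 5 and 6.
logs = {log_id: f"log-{log_id}" for log_id in range(1, 11)}
remaining = trim(logs, {"A": 4, "B": 5, "C": 6})
```

Only logs 1 to 3 are removed; logs 4 and 5 stay because front-end machine A has not pulled them yet.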
In this embodiment, a plurality of back-end machines form a back-end consistency total system, each back-end machine is divided into a plurality of partitions, and each partition responds to its own requests and shares the request pressure. This avoids the excessive pressure that arises when a single back-end machine responds to all requests, and achieves horizontal scaling of the back-end machines. In addition, by allocating corresponding partitions to different users, a quota (Quota) can be set on the log resource space written by each user, and the logs of different users are partitioned and isolated, which ensures security and makes it convenient to charge different users according to their quotas. Furthermore, the front-end machine (Frontend) continuously pulls logs from the corresponding partition of a certain back-end machine (NuwaLog), continuously stores the pulled logs to the local storage system (OTS) of the region where the front-end machine is located, and replies the position of the currently pulled log to the corresponding partition, and the corresponding partition deletes the pulled logs it stores according to that position. The corresponding partition therefore does not need to store all logs, only those that have not yet been pulled into the local storage system (OTS). This avoids the degradation of single-machine storage performance and the storage capacity limit of the back-end machine (NuwaLog), solves the problem that the number of data storage sites is limited by the number of nodes of the consistency protocol, and achieves data consistency, correctness and robustness for the whole system.
In the backend machine according to an embodiment of the present invention, the sending device is further configured to obtain, from the front-end machine, a log request of a user for reading a corresponding partition, and send a corresponding log to the front-end machine according to the log reading request. The embodiment is an optimized supplement to the scheme that the front-end computer obtains the corresponding log from the local storage system of the region where the front-end computer is located, and therefore, when the situation that the log which needs to be read by the user is not pulled into the local storage system of the corresponding region occurs, the log reading request of the user can be responded in time, and the log can be directly read from the corresponding partition.
In the backend machine according to an embodiment of the present invention, the front-end machine adopts a main/standby server architecture. Here, adopting the master/slave architecture avoids a Frontend single point of failure, which would otherwise prevent the log data in the local storage system (Bulk storage system) of the local region from being updated.
In the backend machine according to an embodiment of the present invention, the backend machine further includes a registration device configured to: receive registration requests from the primary server and the standby server; and, after registering the main server and the standby server according to the registration requests, send registration success information to the main server and the standby server and send the position of the currently pulled log in the partition to the main server. Here, the Frontends of each region, including both the primary server and the standby server, need to register with the back-end consistency total system, so that a partition in the back-end consistency total system can find the corresponding Frontend to which to send request results. Specifically, as shown in fig. 6, when each Frontend starts, it may first call Register to register with the NuwaLog. Since the primary server is responsible for transactional operations, after the registration succeeds the primary server may call GetTrimPoint to obtain the ID number of the position following the log that the back-end NuwaLog has currently deleted. For example, if the currently deleted log has ID number 3, the obtained position is 4, that is, the primary server next starts pulling logs from position 4.
In the backend machine according to an embodiment of the present invention, a distributed lock service system switches the main server and the standby server in the front-end machine. To avoid a Frontend single point of failure that would prevent the log data in the local storage system (Bulk storage system) of the local region from being updated, a master/slave architecture is adopted to ensure that only the master server (master) performs transactional operations at any time. When the master server (master) fails, a new master can be elected through the Nuwa Lock Service. The Nuwa Lock Service provides a distributed lock protocol and is suitable for Frontend master election; the distributed lock service system (Nuwa Lock Service) can be a globalized distributed lock service system (Nuwa Global Lock Service) that manages the global master/slave architecture.
In the backend machine according to an embodiment of the present invention, as shown in fig. 6, the registration apparatus is further configured to receive a logout request from the offline main server and/or the offline standby server, and send logout result information to the offline main server and/or the offline standby server after logging out the offline main server and/or the offline standby server according to the logout request;
the registration device is further configured to receive a registration request from a newly online main server and/or standby server, register the newly online main server and/or standby server according to the registration request, and then send registration success information to the newly online main server and/or standby server and send the position of the currently pulled log in its partition to the newly online main server. Here, the Frontends of each region, including newly online main and standby servers, need to register with NuwaLog, so that the back-end consistency total system can find the corresponding Frontend to which to send request results. Specifically, as shown in fig. 6, when each newly online main server or standby server starts, it may first call Register to register with the back-end consistency total system. Since only the main server (master) performs transactional operations, after the registration succeeds the main server (master) may call GetTrimPoint to obtain the ID number of the position following the log that the back-end NuwaLog has currently deleted. For example, if the currently deleted log has ID number 3, the obtained position is 4, that is, the main server (master) next starts pulling logs from position 4. In addition, when a Frontend, whether a primary server or a standby server, goes offline, Unregister needs to be called to take the Frontend offline from the back-end consistency total system.
In the backend machine according to an embodiment of the present invention, the update device is further configured to generate a corresponding snapshot according to a currently stored log in the partition and store the corresponding snapshot into the partition while updating the corresponding log into the corresponding partition;
the sending device is further configured to delete the corresponding snapshot from the partition while deleting the pulled log in the partition according to the position in the partition. Here, a Snapshot is a snapshot of the full amount of data in the memory of the corresponding partition at a certain moment. The corresponding partition receives requests from the Frontend and stores logs and snapshots. The snapshot addresses the case in which the partition fails over before log data has been transmitted to the local OTS: when the partition restarts, it can recover from the snapshot the data that has not been transmitted to the OTS. The partition deletes the pulled logs according to the position and, at intervals, deletes the corresponding snapshot files on disk, which avoids keeping expired, useless snapshots.
In an embodiment of the present invention, avro serialization may be adopted between the front end (Frontend) and the back end (NuwaLog), with Netty used for RPC communication. The avro protocol is chosen here because this serialization protocol is relatively simple and efficient. Netty, which is commonly used for RPC in the industry, is a mature and efficient network communication framework in the Java domain.
According to another aspect of the present invention, there is also provided a front end machine, including:
an update initiating device, configured to receive a user's request to update the log of a corresponding partition and to update the log into the corresponding partition by sending the request to the corresponding partition, wherein different users are allocated corresponding partitions, each back-end machine is divided into a plurality of partitions, and a plurality of back-end machines form a back-end consistency total system;
and the pulling device is used for pulling the logs from the corresponding partitions, storing the pulled logs to a local storage system of the region where the front-end computer is located, and replying the positions of the currently pulled logs in the partitions to the corresponding partitions so that the partitions can delete the pulled logs stored in the partitions according to the positions of the currently pulled logs in the partitions.
Here, the corresponding partition receives requests from all front-end machines (Frontends) and stores the logs. A front-end machine (Frontend) can start two threads: one thread continuously pulls log data from the corresponding partition, and the other thread continuously writes the data into OTS and replies to the corresponding partition the position up to which logs have been pulled. According to the received positions, the corresponding partition deletes (trims) all logs before the smallest NXID number among the positions replied by all the front-end machines, where an NXID number is the ID number following that of the last log a front-end machine has pulled. For example, suppose there are currently three front-end machines A, B and C, the back-end NuwaLog originally holds 10 logs with ID numbers 1 to 10, and each front-end machine pulls logs starting from the smallest ID number 1. If front-end machine A has pulled up to the log with ID number 3, it replies NXID position 4 to NuwaLog; front-end machine B has pulled up to the log with ID number 4 and replies NXID position 5; and front-end machine C has pulled up to the log with ID number 5 and replies NXID position 6. NuwaLog then deletes from its storage space only the logs with ID numbers 1 to 3, which precede ID number 4 and have been pulled by all front-end machines. The logs with ID numbers 4 and 5 have not yet been pulled by every front-end machine, so they can be deleted only after all front-end machines have pulled them.
In this embodiment, a plurality of back-end machines form a back-end consistency total system, each back-end machine is divided into a plurality of partitions, and each partition responds to its own requests and shares the request pressure. This avoids the excessive pressure that arises when a single back-end machine responds to all requests, and achieves horizontal scaling of the back-end machines. In addition, by allocating corresponding partitions to different users, a quota (Quota) can be set on the metadata resource space written by each user, and the metadata of different users is partitioned and isolated, which ensures security and makes it convenient to charge different users according to their quotas. Furthermore, the front-end machine (Frontend) continuously pulls logs from the corresponding partition, continuously stores the pulled logs into the local storage system (OTS) of the region where the front-end machine is located, and replies the position of the currently pulled log to the corresponding partition, and the corresponding partition deletes the pulled logs it stores according to that position. The corresponding partition therefore does not need to store all logs, only those that have not yet been pulled into the local storage system (OTS). This avoids the degradation of single-machine storage performance and the storage capacity limit of the corresponding partition, solves the problem that the number of data storage sites is limited by the number of nodes of the consistency protocol, and achieves data consistency, correctness and robustness for the whole system.
The front-end computer of an embodiment of the present invention further includes a reading device, configured to receive a log request of a user for reading a corresponding partition; acquiring the log of the corresponding partition from the local storage system of the region where the front-end computer is located according to the log request for reading the corresponding partition,
and if the log is acquired from the local storage system, replying the log acquired from the local storage system to the user. In this way, non-transactional operation requests need not all be sent to the corresponding partitions in the back-end consistency total system, which reduces the workload of the back-end consistency total system. In addition, because the front-end machine of each region acquires the corresponding logs from the local storage system of that region, the response speed for users' log read requests is improved, and with it the data reading performance. For example, a user in New York can read logs from the local storage system in New York through the front-end machine (Frontend) in New York, and a user in Beijing can read logs from the local storage system in Beijing through the front-end machine (Frontend) in Beijing.
In the front-end computer according to an embodiment of the present invention, the reading device is further configured to, if the log is not obtained from the local storage system, send the log reading request to the corresponding partition, and after obtaining the log from the corresponding partition, reply the log obtained from the corresponding partition to the user. The embodiment is an optimized supplement to the scheme that the front-end computer obtains the corresponding log from the local storage system of the region where the front-end computer is located in the previous embodiment, and therefore, when the situation that the log which needs to be read by the user is not pulled into the local storage system of the corresponding region occurs, the log reading request of the user can be timely responded, and the log can be directly read from the corresponding partition of the back-end consistency total system.
In the front-end computer according to an embodiment of the present invention, the front-end computer adopts a main/standby server architecture. Here, adopting the main/standby architecture avoids a Frontend single point of failure, which would otherwise prevent the log data in the local storage system (Bulk storage system) of the region from being updated.
The front-end computer of an embodiment of the present invention further includes a registration initiating device, configured to enable the main server and the standby server to send a registration request to the back-end consistency total system; and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled log in the partition from the corresponding partition. Here, the Frontend of each region, including both its main server and its standby server, must be registered with the back-end consistency total system, so that the corresponding partition in the back-end consistency total system can find the Frontend to which a request result should be sent. Specifically, as shown in fig. 6, each Frontend may first call register when it starts, in order to register with the back-end consistency total system. Because the main server is responsible for transactional operations, after the registration succeeds the main server may call GetTrimPoint to obtain the position ID that follows the log most recently deleted from the corresponding partition. For example, if the most recently deleted log has ID number 3, the position obtained is 4, that is, the main server next starts to pull logs from position 4.
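The start-up sequence described above (register both servers, then let the main server ask for the trim point) can be sketched as follows. The names register and GetTrimPoint come from the description; everything else, including the `BackendPartition` class, the method signatures, and the string server identifiers, is a hypothetical simplification.

```python
from typing import Set


class BackendPartition:
    """Hypothetical model of one partition in the back-end consistency total system."""

    def __init__(self, last_deleted_id: int) -> None:
        self.last_deleted_id = last_deleted_id
        self.registered: Set[str] = set()

    def register(self, server_id: str) -> bool:
        """Record the server and return registration success information."""
        self.registered.add(server_id)
        return True

    def get_trim_point(self, server_id: str) -> int:
        """Return the position that follows the most recently deleted log."""
        if server_id not in self.registered:
            raise PermissionError(f"{server_id} is not registered")
        return self.last_deleted_id + 1


def start_frontend(partition: BackendPartition, main_id: str, standby_id: str) -> int:
    """Register both servers, then return the position the main server pulls from."""
    for server_id in (main_id, standby_id):
        if not partition.register(server_id):
            raise RuntimeError(f"registration failed for {server_id}")
    # Only the main server performs transactional work, so only it asks for the position.
    return partition.get_trim_point(main_id)
```

With the example from the text, a partition whose most recently deleted log has ID 3 yields a starting position of 4.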
In the front-end computer according to an embodiment of the present invention, a distributed lock service system switches the main server and the standby server in the front-end computer. In order to avoid a Frontend single point of failure, which would prevent the log data in the local storage system (Bulk storage system) of the region from being updated, a main/standby architecture is adopted here to ensure that only the main server (master) performs transactional operations at any time. After the main server (master) fails, a new master can be elected through the Nuwa Lock Service, which provides a distributed lock protocol and is suitable for master election among the Frontend servers.
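Master election through a lock can be illustrated with the sketch below. It does not use or reproduce the Nuwa Lock Service interface, which is not described here; `LockService` is a hypothetical in-process stand-in in which whoever holds the single lock is the master.

```python
from typing import List, Optional


class LockService:
    """In-process stand-in for a distributed lock service (hypothetical API)."""

    def __init__(self) -> None:
        self._holder: Optional[str] = None

    def try_acquire(self, server_id: str) -> bool:
        """Grant the lock if it is free or already held by the caller."""
        if self._holder is None or self._holder == server_id:
            self._holder = server_id
            return True
        return False

    def release(self, server_id: str) -> None:
        """Release the lock, for example when the holder fails or goes offline."""
        if self._holder == server_id:
            self._holder = None

    def holder(self) -> Optional[str]:
        return self._holder


def elect_master(lock: LockService, candidates: List[str]) -> Optional[str]:
    """Let every candidate try the lock; the holder afterwards is the master."""
    for server_id in candidates:
        lock.try_acquire(server_id)
    return lock.holder()
```

A real lock service would also need to detect a failed holder (for instance through session expiry) and release the lock on its behalf; the sketch models that step as an explicit `release` call.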
The front-end computer of the embodiment of the invention also comprises a logout initiating device, configured to enable the main server and the standby server that are going offline to send a logout request to the distributed lock service system and to receive logout result information from the back-end consistency total system;
the registration initiating device is further configured to enable the newly online main server and standby server to send a registration request to the back-end consistency total system, and after registration success information is received from the back-end consistency total system, the newly online main server obtains the position of the currently pulled log in the partition from the corresponding partition. Here, the Frontend of each region, including a newly online main server and standby server, must be registered with the back-end consistency total system, so that the corresponding partition in the back-end consistency total system can find the Frontend to which a request result should be sent. Specifically, as shown in fig. 6, each newly online main server and standby server may first call register when it starts, in order to register with the back-end consistency total system. Because only the main server (master) performs transactional operations, after the registration succeeds the main server (master) may call GetTrimPoint to obtain the position ID that follows the log most recently deleted by the back-end NuwaLog. For example, if the most recently deleted log has ID number 3, the position obtained is 4, that is, the main server (master) next starts to pull logs from position 4. In addition, when a Frontend, including its main server and standby server, goes offline, unregister needs to be called to take the Frontend offline from the back-end consistency total system.
In the front-end computer of an embodiment of the present invention, the update initiating device is configured to enable the main server or the standby server to obtain a log request for updating a corresponding partition of a user from a load balancing server, and the main server or the standby server updates the log into the corresponding partition by sending the log request for updating the corresponding partition of the user to the backend computer;
and the pulling device is used for pulling the log from the corresponding partition by the main server, storing the pulled log to a local storage system of the region where the front-end computer is located, and replying the position of the pulled log in the partition to the corresponding partition. Here, obtaining the user's log update request from the load balancing server (VIP Server), sending that request to the back-end machine, and updating the corresponding log into the back-end machine can be completed by either the main server or the standby server;
whereas the transactional work of continuously pulling logs from the corresponding partition (Pull logs), continuously storing the pulled logs to the local storage system of the region where the front-end computer is located (Apply logs), and replying the position of the pulled logs in the partition to the corresponding partition (Send Ack) is completed only by the main server, which ensures the data consistency of the whole system.
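One iteration of the Pull logs, Apply logs, Send Ack cycle can be sketched as follows. The sketch assumes integer log IDs, an in-memory `Partition` class, and a dictionary as the local storage system; all of these names, and the `pull_once` function and its `limit` parameter, are hypothetical.

```python
from typing import Dict, List, Tuple


class Partition:
    """Hypothetical in-memory partition holding logs keyed by integer ID."""

    def __init__(self) -> None:
        self.logs: Dict[int, str] = {}
        self.acked = 0  # highest position acknowledged by the front-end computer

    def append(self, log_id: int, entry: str) -> None:
        self.logs[log_id] = entry

    def pull(self, from_id: int, limit: int) -> List[Tuple[int, str]]:
        """Return up to `limit` logs with ID >= from_id, in ID order."""
        ids = sorted(i for i in self.logs if i >= from_id)[:limit]
        return [(i, self.logs[i]) for i in ids]

    def ack(self, position: int) -> None:
        self.acked = max(self.acked, position)


def pull_once(partition: Partition, local_store: Dict[int, str],
              next_id: int, is_main: bool, limit: int = 100) -> int:
    """Run one pull/apply/ack step and return the next position to pull from."""
    if not is_main:
        return next_id  # the standby server never performs transactional work
    batch = partition.pull(next_id, limit)      # Pull logs
    for log_id, entry in batch:
        local_store[log_id] = entry             # Apply logs
    if batch:
        last_id = batch[-1][0]
        partition.ack(last_id)                  # Send Ack
        next_id = last_id + 1
    return next_id
```

Restricting the loop to the main server means there is a single writer to the local storage system per region, which is one simple way to keep the applied logs in order.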
In the front-end computer according to an embodiment of the present invention, the reading device is configured to enable the main server or the standby server to receive, from the load balancing server, a log request of a user for reading a corresponding partition; and the main server or the standby server acquires the log of the corresponding partition from a local storage system of the region where the front-end computer is located according to the log request for reading the corresponding partition, and if the log is acquired from the local storage system, the acquired log is replied to the user. The load balancing server (VIP Server) performs load balancing: a client/user only needs to access one IP address or domain name, namely that of the VIP Server. A plurality of Frontend servers can be mounted behind the VIP Server, and the IP and port of the Frontend on each machine can be configured. A request sent by the client/user is forwarded by the VIP Server to either the main server or the standby server according to a load balancing strategy.
In the front-end computer according to an embodiment of the present invention, the reading device is further configured to, if the log is not obtained from the local storage system, send the log reading request to the corresponding partition by the main server or the standby server, obtain the log from the corresponding partition, and then reply the obtained log to the user. Here, obtaining the log from the corresponding partition is a non-transactional operation, so the load balancing server (VIP Server) can select either the main server or the standby server to complete it, which achieves a better load balancing effect.
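The dispatch of non-transactional requests to either server can be sketched as below. The description only says that a load balancing strategy is used; round-robin is an assumption made for this example, and `VipServer` and `dispatch` are hypothetical names.

```python
import itertools
from typing import Callable, Dict


class VipServer:
    """Hypothetical round-robin dispatcher in front of several Frontend servers."""

    def __init__(self, servers: Dict[str, Callable[[str], str]]) -> None:
        self._servers = servers
        # Cycle through the server names in the order they were configured.
        self._order = itertools.cycle(list(servers))

    def dispatch(self, request: str) -> str:
        """Forward the request to the next server and return its reply."""
        name = next(self._order)
        return self._servers[name](request)
```

For example, with a main and a standby handler configured, successive read requests alternate between the two servers.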
According to another aspect of the present invention, there is also provided a computing-based device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
forming a back-end consistency total system by the back-end machines, dividing the back-end machines into a plurality of partitions, and distributing corresponding partitions for different users;
receiving a log request of a user for updating a corresponding partition from a front-end computer, and updating a corresponding log into the corresponding partition according to the request;
sending logs to the front-end computer, receiving the position of the currently pulled logs in the partition from the front-end computer, and deleting the pulled logs in the corresponding partition according to the position of the currently pulled logs in the partition.
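The deletion step can be sketched as follows, using the rule stated in claim 1: when several front-end computers reply positions, everything before the ID that follows the smallest replied ID is deleted. The sketch assumes integer log IDs starting at 1 and a position of 0 for a front-end computer that has not pulled anything yet; the `Partition` class and its methods are hypothetical.

```python
from typing import Dict, List


class Partition:
    """Hypothetical back-end partition that trims logs already pulled by all front ends."""

    def __init__(self, frontends: List[str]) -> None:
        self.logs: Dict[int, str] = {}
        # Last pulled position per front-end computer; 0 means nothing pulled yet.
        self.acks: Dict[str, int] = {name: 0 for name in frontends}

    def update(self, log_id: int, entry: str) -> None:
        self.logs[log_id] = entry

    def ack(self, frontend: str, position: int) -> int:
        """Record a replied position, delete what is safe, and return the trim point."""
        self.acks[frontend] = position
        trim_point = min(self.acks.values()) + 1
        # Delete every log whose ID lies before the trim point.
        for log_id in [i for i in self.logs if i < trim_point]:
            del self.logs[log_id]
        return trim_point
```

Taking the minimum over all replied positions ensures that a log is deleted only after every region has stored it locally, so the partition keeps only logs that some region has not yet pulled.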
According to another aspect of the present invention, there is also provided a computing-based device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receiving a log request of a user for updating a corresponding partition, and updating the log into the corresponding partition by sending the request to the corresponding partition, wherein different users are allocated with the corresponding partition, a back-end machine is divided into a plurality of partitions, and the plurality of back-end machines form a back-end consistency total system;
and pulling logs from the corresponding partitions, storing the pulled logs to a local storage system of a region where the front-end computer is located, and replying the positions of the currently pulled logs in the partitions to the corresponding partitions so that the partitions can delete the pulled logs stored in the partitions according to the positions of the partitions.
The invention can be applied to many current e-commerce business systems, for example to the user information of a network transaction platform. In that case the content of the log can be the user information, and users must be able to access their own information quickly from anywhere, which requires distributed, cross-region, centralized data consistency to serve data access in each region. The invention can also be applied to financial business, which has a similar requirement: the content of the log can be financial data, and the financial data of the user needs to be backed up in a plurality of regions to meet the high reliability required of financial data.
In summary, the present invention forms the back-end machines into a back-end consistency total system and divides them into a plurality of partitions, and each partition responds to its own requests to share the request pressure. This avoids the excessive pressure that arises when a single back-end machine responds to all requests, and allows the back-end machines to be scaled horizontally. In addition, by allocating corresponding partitions to different users, a quota can be set on the metadata resource space written by each user; because the metadata of different users is partitioned and isolated, security is also guaranteed, and it is convenient to charge different users according to their quotas. Furthermore, the front-end computer (Frontend) continuously pulls logs from the corresponding partition of a back-end machine (NuwaLog), stores the pulled logs in the local storage system (OTS) of the region where the front-end computer is located, and replies the position of the currently pulled logs to the corresponding partition, and the corresponding partition deletes the pulled logs it stores according to that position. The corresponding partition therefore does not need to store all logs, only the logs that have not yet been pulled to the local storage system (OTS). This avoids the problems of reduced single-machine storage performance of the back-end machine (NuwaLog), limited storage capacity, and the number of data storage sites being limited by the number of nodes of the consistency protocol, while achieving data consistency, correctness, and robustness for the whole system.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present invention can be applied as a computer program product, such as computer program instructions, which when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. Program instructions which invoke the methods of the present invention may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the invention herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the invention as described above.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (36)

1. A metadata interaction method on a backend machine comprises the following steps:
receiving a metadata request of a user for updating a corresponding partition from a front-end computer, and updating corresponding metadata into the corresponding partition according to the request;
sending metadata to the front-end computer, receiving the position of the metadata which is currently pulled in the partition where the metadata is located from the front-end computer, and deleting all metadata which is before the next ID number of the minimum ID number of the metadata which is pulled in a plurality of positions replied by the front-end computer according to the position in the partition where the metadata is located.
2. The method of claim 1, wherein after sending metadata to the front-end, further comprising:
and acquiring a metadata request of a user for reading the corresponding partition from the front-end computer, and sending corresponding metadata to the front-end computer according to the metadata reading request.
3. The method of claim 2, wherein the front-end machine employs a master/standby server architecture.
4. The method of claim 3, wherein prior to receiving a user's request to update the corresponding partition from a front-end or obtaining a user's request to read the corresponding partition from the front-end, further comprising:
receiving registration requests from the primary server and the standby server;
and after the main server and the standby server are registered according to the registration request, sending registration success information to the main server and the standby server and sending the position of the currently pulled metadata in the partition to the main server.
5. The method of claim 3, wherein the primary server and the standby server in the front end are switched by a distributed lock service system.
6. The method of claim 5, wherein switching between the primary server and the standby server in the front-end computer by a distributed lock service system further comprises:
receiving a logout request from a main server and/or a standby server which is offline, and sending logout result information to the main server and/or the standby server which is offline after logging out the main server and/or the standby server which is offline according to the logout request;
receiving a registration request from a newly online main server and/or standby server, registering the newly online main server and/or standby server according to the registration request, and then sending registration success information to the newly online main server and/or standby server and sending the position of currently pulled metadata in the partition to the newly online main server.
7. The method of claim 1, wherein updating the corresponding metadata into the corresponding partition further comprises:
when the metadata is a log, generating a corresponding snapshot according to the currently stored metadata in the partition and storing the snapshot into the partition;
deleting the metadata pulled in the partition according to the position in the partition, and simultaneously:
deleting the corresponding snapshot from the partition.
8. A method for metadata interaction on a front-end machine, wherein the method comprises:
receiving a metadata request of a user for updating a corresponding partition, and updating the metadata into the corresponding partition by sending the request to the corresponding partition;
and pulling the metadata from the corresponding partition, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the currently pulled metadata in the partition to the corresponding partition so that the partition deletes all metadata before the next ID number of the minimum ID number of the pulled metadata according to the position in the partition.
9. The method of claim 8, wherein storing the pulled metadata to a local storage system of a region in which the front-end computer is located further comprises:
receiving a metadata request of a user for reading a corresponding partition;
according to the metadata request for reading the corresponding partition, acquiring the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located,
and if the metadata is acquired from the local storage system, the metadata of the acquired corresponding partition is replied to the user.
10. The method of claim 9, wherein after obtaining the metadata of the corresponding partition from the local storage system of the region where the front-end computer is located, further comprising:
and if the metadata is not acquired from the local storage system, sending the metadata reading request to the corresponding partition, and after acquiring the metadata from the corresponding partition, replying the acquired metadata of the corresponding partition to the user.
11. The method of claim 10, wherein the front-end machine employs a master/standby server architecture.
12. The method of claim 11, wherein receiving a user's metadata request to update the corresponding partition or receiving a user's metadata request to read the corresponding partition is preceded by:
the main server and the standby server send registration requests to a back-end consistency total system;
and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
13. The method of claim 11, wherein the primary server and the standby server in the front end are switched by a distributed lock service system.
14. The method of claim 13, wherein the distributed lock service system, while switching between the primary server and the standby server in the front-end computer, further comprises:
the off-line main server and the off-line standby server send logout requests to the distributed lock service system and receive logout result information from the back-end consistency total system;
and after the newly online main server and the standby server send registration requests to the back-end consistency total system and receive registration success information from the back-end consistency total system, the newly online main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
15. The method of claim 11, wherein receiving a user's metadata request to update a corresponding partition, updating the metadata into the corresponding partition by sending the request to the corresponding partition, comprises:
the main server or the standby server acquires a metadata request of a user for updating a corresponding partition from a load balancing server, and updates the metadata into the corresponding partition by sending the metadata request of the user for updating the corresponding partition to the backend machine;
and pulling the metadata from the corresponding partition, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the currently pulled metadata in the partition where the metadata is located to the back-end machine comprises:
and the main server pulls the metadata from the corresponding partition, stores the pulled metadata to a local storage system of a region where the front-end computer is located, and replies the position of the pulled metadata in the partition to the corresponding partition.
16. The method of claim 15, wherein receiving a metadata request for reading a corresponding partition from a user, acquiring metadata of the corresponding partition from a local storage system of a region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, returning the acquired metadata to the user comprises:
the main server or the standby server receives a metadata request of a user for reading a corresponding partition from the load balancing server;
and the main server or the standby server acquires the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, the acquired metadata is replied to the user.
17. The method of claim 16, wherein, if the metadata is not obtained from the local storage system, sending the metadata reading request to the corresponding partition, and after obtaining the metadata from the corresponding partition, replying the obtained metadata to the user comprises:
and if the metadata is not acquired from the local storage system, the main server or the standby server sends the metadata reading request to the corresponding partition, and the acquired metadata is replied to the user after the metadata is acquired from the corresponding partition.
18. A backend machine, comprising:
the updating device is used for receiving a metadata request of a user for updating the corresponding partition from the front-end computer and updating the corresponding metadata into the corresponding partition according to the request;
and the sending device is used for sending the metadata to the front-end machine, receiving the position of the metadata which is currently pulled in the partition where the metadata is located from the front-end machine, and deleting all metadata which is before the next ID number of the minimum ID number of the metadata which is pulled in the plurality of positions replied by the front-end machine according to the position in the partition where the metadata is located.
19. The backend machine according to claim 18, wherein the sending device is further configured to obtain a metadata request of a user for reading a corresponding partition from the front end machine, and send corresponding metadata to the front end machine according to the metadata read request.
20. The backend machine of claim 19, wherein the front end machine employs a master/standby server architecture.
21. The backend machine according to claim 20, further comprising a registration means for: receiving registration requests from the primary server and the standby server; and after the main server and the standby server are registered according to the registration request, sending registration success information to the main server and the standby server and sending the position of the currently pulled metadata in the partition to the main server.
22. The backend machine of claim 20, wherein the primary server and the standby server in the frontend machine are switched by a distributed lock service system.
23. The backend machine according to claim 22, wherein the registration apparatus is further configured to receive a logout request from the offline main server and/or standby server, and send logout result information to the offline main server and/or standby server after logging out the offline main server and/or standby server according to the logout request;
the registration device is further configured to receive a registration request from a newly online main server and/or standby server, register the newly online main server and/or standby server according to the registration request, and then send registration success information to the newly online main server and/or standby server and send the position of currently pulled metadata in the partition where the metadata is located to the newly online main server.
24. The backend machine according to claim 18, wherein the updating device is further configured to update the corresponding metadata into the corresponding partition when the metadata is a log, and generate and store the corresponding snapshot into the partition according to the metadata currently stored in the partition;
the sending device is further configured to delete the corresponding snapshot from the partition while deleting the metadata that has been pulled from the partition according to the position in the partition where the sending device is located.
25. A front-end machine, wherein the front-end machine comprises:
the updating initiating device is used for receiving a metadata request of a user for updating the corresponding partition, and updating the metadata into the corresponding partition by sending the request to the corresponding partition;
and the pulling device is used for pulling the metadata from the corresponding partition, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the currently pulled metadata in the partition to the corresponding partition so that the partition deletes all metadata before the next ID number of the minimum ID number of the pulled metadata according to the position in the partition.
26. The front-end according to claim 25, further comprising a reading means for receiving a user's metadata request to read the corresponding partition; according to the metadata request for reading the corresponding partition, acquiring the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located,
and if the metadata is acquired from the local storage system, the metadata of the acquired corresponding partition is replied to the user.
27. The front-end computer of claim 26, wherein the reading device is further configured to, if the metadata is not obtained from the local storage system, send the metadata reading request to the corresponding partition, and after obtaining metadata from the corresponding partition, reply the metadata obtained from the corresponding partition to the user.
28. The front-end of claim 27, wherein the front-end employs a master/standby server architecture.
29. The front-end according to claim 28, further comprising a registration initiating means for the primary server and the standby server to send registration requests to a back-end consistency total system; and after the main server and the standby server receive the registration success information from the back-end consistency total system, the main server acquires the position of the currently pulled metadata in the partition from the corresponding partition.
30. The front end machine of claim 28, wherein the primary server and the standby server in the front end machine are switched by a distributed lock service system.
31. The front-end computer of claim 30, further comprising a logout initiating means for the main server and the standby server that are going offline to send a logout request to the distributed lock service system and to receive logout result information from the back-end consistency total system;
the registration initiating device is further configured to send a registration request to the backend consistency total system by the newly online main server and the standby server, and after receiving registration success information from the backend consistency total system, the newly online main server obtains the position of the currently pulled metadata in the partition from the corresponding partition.
32. The front-end computer of claim 28, wherein the update initiating device is configured to enable the primary server or the standby server to obtain a metadata request for updating a corresponding partition of a user from a load balancing server, and the primary server or the standby server updates the metadata into the corresponding partition by sending the metadata request for updating the corresponding partition of the user to the backend computer;
and the pulling device is used for pulling the metadata from the corresponding partition by the main server, storing the pulled metadata to a local storage system of a region where the front-end computer is located, and replying the position of the pulled metadata in the partition to the corresponding partition.
33. The front-end machine according to claim 32, wherein the reading means is configured to receive, by the primary server or the standby server, a metadata request of a user for reading the corresponding partition from the load balancing server; and the main server or the standby server acquires the metadata of the corresponding partition from a local storage system of the region where the front-end computer is located according to the metadata request for reading the corresponding partition, and if the metadata is acquired from the local storage system, the acquired metadata is replied to the user.
34. The front-end computer of claim 33, wherein the reading device is further configured such that, if the metadata is not obtained from the local storage system, the primary server or the standby server sends the metadata read request to the corresponding partition and, after obtaining the metadata from the corresponding partition, replies to the user with the obtained metadata.
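The read path of claims 33 and 34 can be sketched as a local-first lookup with a fallback to the partition. This is a minimal illustration only: the function name `read_metadata`, the dictionary standing in for the local storage system, and the `fetch_from_partition` callback are assumptions made for the example and do not appear in the patent.

```python
from typing import Callable, Dict, Optional


def read_metadata(
    key: str,
    local_store: Dict[str, str],
    fetch_from_partition: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Serve a read from the regional local store, else from the partition.

    All names here are hypothetical; the patent only describes the behaviour.
    """
    value = local_store.get(key)
    if value is not None:
        # Claim 33: the metadata was found locally, so reply with it directly.
        return value
    # Claim 34: local miss, so forward the read request to the partition.
    return fetch_from_partition(key)


# A dict stands in for the remote partition; a smaller dict for the local store.
partition = {"bucket-a": "owner=alice", "bucket-b": "owner=bob"}
local = {"bucket-a": "owner=alice"}

hit = read_metadata("bucket-a", local, partition.get)   # served locally
miss = read_metadata("bucket-b", local, partition.get)  # served by the partition
```

The sketch deliberately does not cache the fallback result, because the claims only say that locally stored metadata arrives through the pulling device.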
35. A computing-based device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receive, from a front-end computer, a user's metadata request for updating a corresponding partition, and update the corresponding metadata into the corresponding partition according to the request;
send metadata to the front-end computer, receive from the front-end computer the position, in the partition where the metadata is located, of the currently pulled metadata, and, according to those positions in the partition, delete all metadata preceding the ID number that follows the minimum ID number among the pulled-metadata positions replied by the front-end computer.
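One way to read the deletion rule of claim 35 is that the partition tracks the last pulled ID reported by each puller, takes the minimum, and discards every entry whose ID is at or below that minimum (that is, everything before minimum + 1). The sketch below illustrates that reading under stated assumptions: the `Partition` class, its method names, and the idea of pre-registering the front ends with a starting position of 0 are hypothetical and not taken from the patent.

```python
from typing import Dict, Iterable, List, Tuple


class Partition:
    """Hypothetical in-memory partition holding (id, metadata) entries in ID order."""

    def __init__(self, front_ends: Iterable[str]) -> None:
        self.entries: List[Tuple[int, str]] = []
        self.next_id = 1
        # Last pulled ID per registered front end; 0 means nothing pulled yet.
        self.pulled: Dict[str, int] = {name: 0 for name in front_ends}

    def update(self, metadata: str) -> int:
        """Append metadata and return the ID number assigned to it."""
        entry_id = self.next_id
        self.entries.append((entry_id, metadata))
        self.next_id += 1
        return entry_id

    def pull(self, after_id: int) -> List[Tuple[int, str]]:
        """Return the entries a front end has not pulled yet."""
        return [entry for entry in self.entries if entry[0] > after_id]

    def report_position(self, front_end: str, last_pulled_id: int) -> None:
        """Record a replied position, then drop entries every front end has pulled."""
        self.pulled[front_end] = last_pulled_id
        keep_from = min(self.pulled.values()) + 1  # "next ID after the minimum ID"
        self.entries = [entry for entry in self.entries if entry[0] >= keep_from]


partition = Partition(["region-east", "region-west"])
for item in ("m1", "m2", "m3"):
    partition.update(item)

partition.report_position("region-east", 3)
ids_after_first = [entry_id for entry_id, _ in partition.entries]   # [1, 2, 3]

partition.report_position("region-west", 2)
ids_after_second = [entry_id for entry_id, _ in partition.entries]  # [3]
```

Nothing is deleted after the first report because `region-west` has not pulled anything yet; using the minimum keeps every entry that some front end may still need.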
36. A computing-based device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receive a user's metadata request for updating a corresponding partition, and update the metadata into the corresponding partition by sending the request to the corresponding partition;
and pull the metadata from the corresponding partition, store the pulled metadata in a local storage system of the region where the front-end computer is located, and reply to the corresponding partition with the position in the partition of the currently pulled metadata, so that the partition deletes, according to that position, all metadata preceding the ID number that follows the minimum ID number of the pulled metadata.
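The front-end side of claim 36 (pull, store locally, reply with the position) might look like the following sketch. It assumes each log entry is an `(id, key, value)` tuple and that the partition is reached through two callbacks, `pull` and `ack`; the `FrontEnd` class and all of these names are illustrative assumptions rather than anything the patent specifies.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[int, str, str]  # (ID number, metadata key, metadata value)


class FrontEnd:
    """Hypothetical front-end puller with a regional local store."""

    def __init__(self) -> None:
        self.local_store: Dict[str, str] = {}
        self.position = 0  # ID number of the last entry pulled so far

    def pull_once(
        self,
        pull: Callable[[int], List[Entry]],
        ack: Callable[[int], None],
    ) -> int:
        """Pull new entries, store them locally, then reply with the position."""
        for entry_id, key, value in pull(self.position):
            self.local_store[key] = value
            self.position = max(self.position, entry_id)
        # Replying with the position lets the partition discard what was pulled.
        ack(self.position)
        return self.position


# A plain list stands in for the partition's entries; acks are just recorded.
log: List[Entry] = [
    (1, "bucket-a", "v1"),
    (2, "bucket-b", "v1"),
    (3, "bucket-a", "v2"),
]
acks: List[int] = []

front_end = FrontEnd()
position = front_end.pull_once(
    lambda after_id: [entry for entry in log if entry[0] > after_id],
    acks.append,
)
```

Replying only after the entries are written to the local store means a crash between pull and store cannot cause the partition to delete metadata the region never persisted.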
CN201710053030.2A 2017-01-24 2017-01-24 Metadata interaction method and system Active CN108347455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710053030.2A CN108347455B (en) 2017-01-24 2017-01-24 Metadata interaction method and system

Publications (2)

Publication Number Publication Date
CN108347455A CN108347455A (en) 2018-07-31
CN108347455B true CN108347455B (en) 2021-03-26

Family

ID=62961826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710053030.2A Active CN108347455B (en) 2017-01-24 2017-01-24 Metadata interaction method and system

Country Status (1)

Country Link
CN (1) CN108347455B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110896406A (en) 2018-09-13 2020-03-20 华为技术有限公司 Data storage method and device and server
CN110377416A (en) * 2018-12-04 2019-10-25 天津京东深拓机器人科技有限公司 Distributed subregion method for scheduling task and device
CN112578996B (en) * 2019-09-30 2024-06-04 华为云计算技术有限公司 Metadata sending method of storage system and storage system
CN111083192B (en) * 2019-11-05 2023-02-17 北京字节跳动网络技术有限公司 Data consensus method and device and electronic equipment
CN111935320B (en) * 2020-09-28 2021-01-05 腾讯科技(深圳)有限公司 Data synchronization method, related device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079902A (en) * 2007-06-29 2007-11-28 清华大学 A great magnitude of data hierarchical storage method
CN103167026A (en) * 2013-02-06 2013-06-19 数码辰星科技发展(北京)有限公司 Processing method, system and device for cloud storage environmental data
CN104166600A (en) * 2014-08-01 2014-11-26 腾讯科技(深圳)有限公司 Data backup and recovery methods and devices
CN105187556A (en) * 2015-09-29 2015-12-23 北京奇艺世纪科技有限公司 Method and device for data back-to-source, and edge server
CN105635278A (en) * 2015-12-30 2016-06-01 深圳市瑞驰信息技术有限公司 Method for managing metadata of storage system and metadata server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101694980B1 (en) * 2014-01-20 2017-01-23 한국전자통신연구원 Apparatus and method for distribution processing of data, storage server

Also Published As

Publication number Publication date
CN108347455A (en) 2018-07-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant