CN114860821A - Data importing method and device of graph database, storage medium and electronic equipment - Google Patents
- Publication number
- CN114860821A CN202110164536.7A
- Authority
- CN
- China
- Prior art keywords
- data
- vertex
- graph
- fragment
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a data importing method, apparatus, electronic device, and computer-readable storage medium for a graph database, and relates to the technical field of data processing. The method comprises the following steps: acquiring source graph data, and performing field filling processing on the source graph data to obtain graph data to be imported; determining fragmentation information of the graph data to be imported, and fragmenting the graph data to be imported according to the fragmentation information to obtain corresponding fragment graph data; performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data; determining historical compressed binary graph data corresponding to the compressed binary graph data; and merging the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database. The method and the device can greatly reduce the processing time consumed by the graph database processor while effectively improving the data import speed.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data importing method for a graph database, a data importing apparatus for a graph database, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of industries such as social networking, e-commerce, and the Internet of Things, huge and complex relationship networks have formed in different application scenarios, and traditional databases struggle to handle the relational operations these scenarios require. Moreover, in the big-data industry, the relationships among the data to be processed grow geometrically with the data volume, so a database supporting relational operations over massive, complex data is urgently needed; graph databases emerged to meet this need.
A graph database is a non-relational database that applies graph theory to store relationship information between entities. Graph databases can be applied in many scenarios, such as knowledge-graph construction, security risk control, social recommendation, and link tracing.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a data importing method for a graph database, a data importing apparatus for a graph database, an electronic device, and a computer-readable storage medium, so as to overcome, at least to some extent, the problems in the prior art of slow data import into a graph database and high time consumption by the graph database processor.
According to an aspect of the present disclosure, there is provided a data importing method of a graph database, including: acquiring source graph data, and performing field filling processing on the source graph data to obtain graph data to be imported; determining fragmentation information of the graph data to be imported, and fragmenting the graph data to be imported according to the fragmentation information to obtain corresponding fragment graph data; performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data; determining historical compressed binary graph data corresponding to the compressed binary graph data; and merging the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database.
According to an aspect of the present disclosure, there is provided a data importing apparatus of a graph database, including: a data acquisition module, configured to acquire source graph data and perform field filling processing on the source graph data to obtain graph data to be imported; a data fragmentation module, configured to determine fragmentation information of the graph data to be imported and fragment the graph data to be imported according to the fragmentation information to obtain corresponding fragment graph data; a compressed data generation module, configured to perform data writing processing on the fragment graph data to generate corresponding compressed binary graph data; a historical data determination module, configured to determine historical compressed binary graph data corresponding to the compressed binary graph data; and a data import module, configured to merge the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database.
In an exemplary embodiment of the present disclosure, the data acquisition module includes a source data acquisition unit configured to: determine initial source data, and determine vertex data from the initial source data; determine edge data from the initial source data; and use the vertex data and the edge data as the source graph data.
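The split of initial source data into vertex data and edge data described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the record layout (`id`, `src`, `dst`, `tag`, `props` keys) is assumed for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: str                                   # vertex identifier
    tag: str                                   # vertex type, e.g. "person"
    props: dict = field(default_factory=dict)  # property fields

@dataclass
class Edge:
    src: str                                   # start-vertex identifier
    dst: str                                   # end-vertex identifier
    etype: str                                 # edge type, e.g. "follows"
    props: dict = field(default_factory=dict)

def split_source_data(records):
    """Separate raw records into vertex data and edge data: a record
    carrying both a source and a destination identifier is treated as
    an edge, anything else as a vertex."""
    vertices, edges = [], []
    for r in records:
        if "src" in r and "dst" in r:
            edges.append(Edge(r["src"], r["dst"], r.get("type", ""), r.get("props", {})))
        else:
            vertices.append(Vertex(r["id"], r.get("tag", ""), r.get("props", {})))
    return vertices, edges
```

Together, the two lists form the source graph data passed to the subsequent verification and filling steps.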
In an exemplary embodiment of the present disclosure, the data acquisition module further includes: a metadata acquisition unit, configured to acquire metadata corresponding to the source graph data from the graph database; a verification processing unit, configured to perform field verification processing on the source graph data according to the metadata and use the source graph data that passes the field verification as initial graph data; and a filling processing unit, configured to perform field filling processing on the initial graph data to generate the graph data to be imported.
In an exemplary embodiment of the present disclosure, the verification processing unit is configured to: determine the reference field types and the reference field count according to the metadata; determine the field types and the field count of the source graph data; and determine whether the field types of the source graph data are consistent with the reference field types, and whether the field count of the source graph data is consistent with the reference field count.
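The field verification just described — comparing both the field count and the field types of a source record against the reference fields declared in the metadata — can be sketched as below. The schema representation (a list of `(name, type)` pairs) is an assumption made for illustration.

```python
def check_fields(record_fields, schema):
    """Field verification: a record passes only if its field count
    matches the reference field count and every field matches the
    reference field name and type from the metadata.

    record_fields: list of (field_name, value) pairs
    schema:        list of (reference_field_name, reference_type) pairs
    """
    if len(record_fields) != len(schema):
        return False                       # field-count mismatch
    for (name, value), (ref_name, ref_type) in zip(record_fields, schema):
        if name != ref_name or not isinstance(value, ref_type):
            return False                   # field-name or field-type mismatch
    return True
```

Records that fail this check would be rejected before the filling step; records that pass become the initial graph data.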
In an exemplary embodiment of the present disclosure, the filling processing unit is configured to: determine a supplemental vertex field corresponding to the initial vertex data; determine a supplemental edge field corresponding to the initial edge data; and perform field filling processing on the initial graph data according to the supplemental vertex field and the supplemental edge field to generate the graph data to be imported.
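A minimal sketch of the field filling step: supplemental fields absent from a vertex or edge record are filled in with defaults so that every record carries the full field set expected by the graph database. The per-field defaults here are hypothetical; in practice they would be derived from the metadata.

```python
def fill_fields(record, supplemental_defaults):
    """Field filling: add any supplemental field missing from a
    vertex/edge record, using a default value per field.

    record:                dict of existing field -> value
    supplemental_defaults: dict of supplemental field -> default value
    """
    filled = dict(record)                      # do not mutate the input
    for name, default in supplemental_defaults.items():
        filled.setdefault(name, default)       # only fill missing fields
    return filled
```

After filling, all records are uniform, which is what allows the later write step to emit fixed-layout binary data.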
In an exemplary embodiment of the present disclosure, the data fragmentation module includes a fragmentation information determination unit configured to: determine a target vertex identifier and a target start-vertex identifier corresponding to the graph data to be imported; acquire the number of fragments corresponding to the graph database; determine, according to the number of fragments and the target vertex identifier, the identifier of the fragment corresponding to the vertex data to be imported, as the vertex fragment identifier; and determine, according to the number of fragments and the target start-vertex identifier, the identifier of the fragment corresponding to the edge data to be imported, as the edge fragment identifier.
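One common way to realize this mapping — assumed here for illustration, since the patent only states that the fragment identifier is derived from the identifier and the fragment count — is hash-modulo partitioning: the vertex fragment identifier comes from the target vertex identifier, and the edge fragment identifier from the target start-vertex identifier, so an edge lands on the same fragment as its start vertex.

```python
import zlib

def hash_id(s: str) -> int:
    # CRC32 as a stable, deterministic stand-in hash; the actual
    # partition function depends on the target graph database
    return zlib.crc32(s.encode("utf-8"))

def vertex_fragment_id(target_vertex_id: str, num_fragments: int) -> int:
    """Vertex fragment identifier: hash of the target vertex
    identifier modulo the number of fragments."""
    return hash_id(target_vertex_id) % num_fragments

def edge_fragment_id(target_start_vertex_id: str, num_fragments: int) -> int:
    """Edge fragment identifier: derived from the target start-vertex
    identifier, colocating each edge with its start vertex."""
    return hash_id(target_start_vertex_id) % num_fragments
```

Colocating edges with their start vertices means outgoing-edge traversals can be served from a single fragment.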
In an exemplary embodiment of the present disclosure, the compressed data generation module includes: a fragment data determination unit, configured to determine fragment vertex data and fragment edge data in the fragment graph data; a data sorting unit, configured to sort the fragment vertex data and the fragment edge data respectively to obtain corresponding sorted fragment vertex data and sorted fragment edge data; and a compressed data generation unit, configured to sequentially write the sorted fragment vertex data and the sorted fragment edge data to generate the compressed binary graph data.
In an exemplary embodiment of the present disclosure, the data sorting unit is configured to: determine a vertex sort field corresponding to the fragment vertex data; sort the fragment vertex data according to the vertex sort field to generate sorted fragment vertex data; determine an edge sort field corresponding to the fragment edge data; and sort the fragment edge data according to the edge sort field to generate sorted fragment edge data.
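A sketch of the per-fragment sort: vertices by vertex identifier, edges by (start-vertex identifier, end-vertex identifier). These sort fields are assumptions for the example; in practice the sort key must match the target store's on-disk key order so the sorted output can be written sequentially.

```python
from operator import itemgetter

def sort_fragment(fragment_vertices, fragment_edges):
    """Sort one fragment's vertex data by the vertex sort field and
    its edge data by the edge sort field, yielding data that can be
    written sequentially into a compressed binary file.

    fragment_vertices: list of dicts with a "vid" key
    fragment_edges:    list of dicts with "src" and "dst" keys
    """
    sorted_vertices = sorted(fragment_vertices, key=itemgetter("vid"))
    sorted_edges = sorted(fragment_edges, key=itemgetter("src", "dst"))
    return sorted_vertices, sorted_edges
```

Writing already-sorted data is what lets the graph database ingest the file without re-sorting it, which is a large part of the import speedup.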
In an exemplary embodiment of the present disclosure, the data import module includes a target data generation unit configured to: perform logical merge processing on the compressed binary graph data and the historical compressed binary graph data to generate logically merged data; and perform physical merge and de-duplication processing on the logically merged data to generate the target graph data.
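The two-phase merge can be sketched as below: the logical merge simply brings new and historical records together, and the physical merge de-duplicates by key, keeping the version with the newest timestamp. The record layout (`key`, `ts` fields) is an assumption for illustration.

```python
def merge_graph_data(new_records, historical_records):
    """Merge newly imported records with historical records.

    Logical merge: concatenate both record sets.
    Physical merge / de-duplication: for each key, keep only the
    record with the newest timestamp.
    """
    logical = historical_records + new_records   # logical merge
    latest = {}
    for rec in logical:                          # physical merge + dedup
        key = rec["key"]
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return list(latest.values())
```

Separating the cheap logical merge from the heavier physical de-duplication mirrors how log-structured stores defer compaction work.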
In an exemplary embodiment of the present disclosure, the data import module further includes: an identifier determination unit, configured to determine, in response to a data query request, a request data identifier in the data query request; a timestamp acquisition unit, configured to determine initial request data corresponding to the request data identifier and acquire timestamp information of the initial request data; and a request data determination unit, configured to determine target request data from the initial request data according to the timestamp information and return the target request data.
In an exemplary embodiment of the present disclosure, the request data determining unit includes a first data determining subunit configured to: determine initial request vertex data according to the request vertex identifier, and acquire vertex timestamp information corresponding to the initial request vertex data; and determine the latest vertex data from the initial request vertex data based on the vertex timestamp information, the latest vertex data being the target request data.
In an exemplary embodiment of the present disclosure, the request data determining unit further includes a second data determining subunit configured to: determine initial request edge data according to the request edge identifier, and acquire edge timestamp information corresponding to the initial request edge data; and determine the latest edge data from the initial request edge data based on the edge timestamp information, the latest edge data being the target request data.
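Both subunits apply the same rule — among all versions matching the requested vertex or edge identifier, the one with the newest timestamp is the target request data — which can be sketched as a single helper (the `ts` field name is assumed):

```python
def latest_by_timestamp(initial_request_data):
    """From the initial request data (all versions matching a requested
    vertex or edge identifier), return the record with the newest
    timestamp as the target request data, or None if nothing matched."""
    if not initial_request_data:
        return None
    return max(initial_request_data, key=lambda rec: rec["ts"])
```

This read-time selection is what makes it safe to defer the physical merge: stale versions may coexist on disk, but queries only ever see the newest one.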
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
According to an aspect of the present disclosure, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the data importing method of the graph database provided in the above embodiments.
Exemplary embodiments of the present disclosure may have some or all of the following benefits:
in the data import method for a graph database provided in an exemplary embodiment of the present disclosure, field filling processing is performed on the acquired source graph data to obtain graph data to be imported; fragmentation processing and data writing processing are performed on the graph data to be imported to generate corresponding compressed binary graph data; the compressed binary graph data and the corresponding historical compressed binary graph data are merged to obtain target graph data; and the target graph data is imported into the graph database. On one hand, because the filling of the source graph data is separated from the graph database, the time consumed by the graph database processor can be greatly reduced, and the import is prevented from affecting the graph database's online service. On another hand, by fragmenting the graph data to be imported, the fragmented graph data conforms to the data storage format of the graph database, which improves the data import speed. On yet another hand, compressed binary graph data corresponding to the graph data to be imported is generated and imported into the graph database; because the compressed binary graph data is highly compressed and occupies little bandwidth, the data import speed can be greatly increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort.
FIG. 1 is a diagram illustrating an exemplary system architecture of a method and apparatus for importing data from a graph database to which embodiments of the present disclosure may be applied.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
FIG. 3 schematically illustrates a flow diagram of a method of data import for a graph database according to one embodiment of the present disclosure.
FIG. 4 schematically illustrates an overall architecture diagram of a data import method for a graph database according to one embodiment of the present disclosure.
FIG. 5 schematically shows a flowchart for generating graph data to be imported from source graph data according to an embodiment of the present disclosure.
FIG. 6 schematically shows a flow diagram of a field check process on source graph data according to one embodiment of the present disclosure.
FIG. 7 schematically shows a flow diagram of a field population process for source graph data according to one embodiment of the present disclosure.
Fig. 8 schematically shows a flowchart of determining fragmentation information for graph data to be imported according to an embodiment of the present disclosure.
FIG. 9 schematically illustrates a flow diagram for generating compressed binary map data from graph data to be imported according to one embodiment of the present disclosure.
FIG. 10 schematically shows a flow diagram of a process of sorting sliced data to generate sliced sorted data according to one embodiment of the present disclosure.
FIG. 11 schematically shows a flow diagram of a process of merging compressed binary data and historical compressed binary data according to one embodiment of the present disclosure.
FIG. 12 schematically shows a flow diagram for determining target request data in response to a data query request according to one embodiment of the present disclosure.
FIG. 13 is a block diagram schematically illustrating a data importing apparatus of a graph database according to one embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (tamper resistance) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity information management of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between users' real identities and blockchain addresses (authority management); where authorized, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control (risk-control auditing). The basic services module is deployed on all blockchain node devices and is used to verify the validity of service requests and, after reaching consensus on valid requests, record them to storage; for a new service request, the basic services module first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information via a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records it for storage. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic in a programming language, publish it to the blockchain (contract registration), and, according to the logic of the contract terms, invoke keys or trigger execution through other events to complete the contract logic; the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, as well as visual output of real-time status during product operation, such as alarms, monitoring network conditions, and monitoring the health status of node devices.
The platform product services layer provides basic capabilities and an implementation framework for typical applications; based on these basic capabilities and the characteristics of their business, developers can complete the blockchain implementation of business logic. The application services layer provides blockchain-based application services for business participants to use.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a data importing method and apparatus for a graph database according to an embodiment of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, a server 105, and a graph database 106. The graph database 106 may be located in a blockchain node device in a blockchain. For example, the target graph data obtained by the data importing method of the graph database provided by the present disclosure may be stored in the blockchain.
In particular, the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The data importing method of the graph database provided by the embodiments of the present disclosure is generally executed by the server 105, and accordingly, the data importing apparatus of the graph database is generally disposed in the server 105. For example, in an exemplary embodiment, the server 105 may acquire source graph data, perform field filling processing on the source graph data by using the data importing method of the graph database provided by the embodiments of the present disclosure to generate corresponding graph data to be imported, perform fragmentation processing, data writing processing, and merging processing on the graph data to be imported to generate target graph data, and import the target graph data into the graph database 106; the terminal devices 101, 102, and 103 may send data query requests to the graph database 106 through the network 104, so that the graph database 106 returns target request data to the terminal devices 101, 102, and 103.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU)201 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 210 as necessary, so that a computer program read out therefrom is mounted into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU)201, performs various functions defined in the methods and apparatus of the present application. In some embodiments, the computer system 200 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3 to 12, and the like.
At present, two approaches are generally used to import data into a graph database: batch writing and active loading.
In the batch-writing scheme, a bypass module parses the data, and the parsed data produced by this analysis is in a text format. After a certain amount of data has been accumulated through batch requests, it is written directly into the graph database by sending requests, with multiple batch requests delivering the data for batch writing. In the active-loading scheme, the graph database itself actively loads files and parses and writes the local data; the files loaded by the graph database are likewise graph data represented as text-format files.
However, when either scheme is used to import data into a graph database, the imported data is in a text format, and representing graph data as text-format files causes noticeable data expansion. The data import speed is therefore slow, and for data on the order of billions of records the import consumes an enormous amount of time.
In addition, in the prior art the graph database itself processes the imported data, so the Central Processing Unit (CPU) of the graph database is heavily occupied, which easily affects the graph database's ability to provide its document service.
In view of one or more of the above problems, the present exemplary embodiment provides a data import method for a graph database. The data importing method of the graph database may be applied to the server 105, and may also be applied to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. The technical solution of the embodiment of the present disclosure is explained in detail below:
referring to fig. 3, the data importing method of the graph database may include the following steps S310 to S350:
and S310, acquiring source map data, and performing field filling processing on the source map data to obtain the map data to be imported.
And S320, determining the fragmentation information of the graph data to be imported, and performing fragmentation processing on the graph data to be imported according to the fragmentation information to obtain corresponding fragmentation graph data.
And S330, performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data.
Step S340, determining historical compressed binary chart data corresponding to the compressed binary chart data.
And S350, merging the compressed binary diagram data and the historical compressed binary diagram data to obtain target diagram data so as to lead the target diagram data into a diagram database.
In the data import method for a graph database provided in this exemplary embodiment, field filling processing is performed on the acquired source graph data to obtain graph data to be imported; fragmentation processing and data writing processing are performed on the graph data to be imported to generate corresponding compressed binary graph data; and the compressed binary graph data is merged with the corresponding historical compressed binary graph data to obtain target graph data, which is imported into the graph database. On one hand, because the process of filling the source graph data is separated from the graph database, the time consumed by the graph database's processor can be greatly reduced, avoiding any impact on the service the graph database provides. On the other hand, by fragmenting the graph data to be imported, the fragmented graph data conforms to the data storage format of the graph database, which improves the data import speed. On yet another hand, compressed binary graph data corresponding to the graph data to be imported is generated and imported into the graph database; because compressed binary graph data is highly compressed and occupies little bandwidth, the data import speed can be greatly increased.
Next, the above-described steps of the present exemplary embodiment will be described in more detail.
In step S310, source graph data is acquired, and field filling processing is performed on the source graph data to obtain graph data to be imported.
In this example embodiment, a graph may be a data structure consisting of a set of vertices and a set of relationships between the vertices, with the graph data typically including vertex data and edge data. Vertices are sometimes referred to as nodes or intersections, and edges are sometimes referred to as connections. Each node may represent an entity, such as a person, place, thing, category, or other data, and each edge may represent the manner in which two nodes are associated. Various scenarios may be modeled using graph structures, such as device networks, road systems, social relationships of people, or any other thing defined by a relationship, and so forth. The graph database may be a database that stores and queries data in a data structure such as a graph, with the graph database having operations to handle creation, reading, updating, and deletion of graphical data models. The source graph data may be data that may represent a graph structure determined from initial source data. The field filling process may be a process of field filling the source graph data. The graph data to be imported may be graph data generated by performing field filling processing on the source graph data, and the graph data to be imported may include related fields of data stored in the graph database.
Because the fields included in the source graph data are the fields used by the relevant users when processing the data, and the source graph data does not include the data fields of the data stored in the graph database, field filling processing can be performed on the source graph data after it is acquired to obtain the graph data to be imported, so that the graph data to be imported can undergo the relevant processing and be imported into the graph database.
In one exemplary embodiment of the present disclosure, initial source data is determined, and vertex data is determined from the initial source data; determining edge data from the initial source data; and taking the vertex data and the edge data as source graph data.
The initial source data may be data directly obtained from a source data platform. The initial source data is typically stored in a big data platform, for example, the initial source data may be stored in a Hive data warehouse tool, a Hadoop Distributed File System (HDFS), and so on. Graph data generally includes vertex data, which may include vertex identification and vertex attributes, and edge data; wherein, the vertex identification (vertex id) can be a unique identifier of the node in the graph data. The vertex attributes may be attributes that vertices in the graph data have. The edge data may include a starting vertex identification, a terminating vertex identification, and edge attributes. In a directed graph composed of directed edges, a starting vertex identifier (source vertex id) may be a vertex identifier in which a certain edge serves as a starting point; the terminating vertex identification (destination vertex id) may be the vertex identification of the end point to which the edge points. The edge attribute may be an attribute that an edge in the graph data has. In an undirected graph, then the starting vertex and the terminating vertex are not distinguished.
Referring to FIG. 4, FIG. 4 schematically illustrates an overall architecture diagram of a data import method for a graph database according to one embodiment of the present disclosure. The initial source data may be stored in the source data platform 410, and the source data platform 410 may be any component that provides data, such as Hive, HDFS, HBase, and storage services provided by cloud services. After the initial source data is acquired from the source data platform, the source graph data can be determined from the initial source data.
Specifically, vertex data may be determined from the initial source data, and the vertex data may include a vertex identifier and a vertex attribute; the vertex attributes can be customized by a user and configured for a specific scenario. For example, in a shopping scenario, a vertex may represent an item, a user, or a shelf, and the vertex identifier may be an item's unique identifier, a user's unique identifier, or a shelf's unique identifier. The vertex attributes may represent the price of an item, the item's classification, and so on; they may also represent the size or material of a shelf, or the age and purchasing preferences of a user. In a social recommendation scenario, the vertices may represent different users, and the vertex identifiers may represent different user identities. The edge data may include a starting vertex identifier, a terminating vertex identifier, and an edge attribute; the edge attribute may represent an association relationship between the starting vertex and the terminating vertex. For example, in a shopping scenario, the edge attribute may represent a purchase relationship in which a user purchases a certain item, or a storage relationship in which a shelf stores a certain item; in a social recommendation scenario, the edge attribute may represent a friend relationship between user 1 and user 2, and so on. After the vertex data and the edge data are obtained, they may be used together as the source graph data.
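The vertex and edge records described above can be sketched as simple data structures. This is a minimal illustration only; the field and identifier names are assumptions, not the patent's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vertex_id: str                       # unique identifier of the node in the graph
    attributes: dict = field(default_factory=dict)  # user-defined, scenario-specific

@dataclass
class Edge:
    source_vertex_id: str                # starting vertex identifier
    destination_vertex_id: str           # terminating vertex identifier
    attributes: dict = field(default_factory=dict)  # association between the vertices

# Source graph data = vertex data + edge data (shopping-scenario example)
vertices = [
    Vertex("u001", {"age": 30, "preference": "electronics"}),
    Vertex("i042", {"price": 9.9, "category": "stationery"}),
]
edges = [Edge("u001", "i042", {"relation": "purchase"})]
```

In an undirected graph, the starting and terminating roles of the two vertex identifiers would simply not be distinguished.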
In an exemplary embodiment of the present disclosure, metadata corresponding to source graph data is obtained from a graph database; performing field verification processing on the source graph data according to the metadata, and taking the source graph data passing the field verification processing as initial graph data; and carrying out field filling processing on the initial graph data to generate graph data to be imported.
Where the metadata may describe information of data attributes of the source graph data. The field check process may be a check process that determines whether a field of the source graph data is consistent with a data attribute defining the field in the metadata. The initial graph data may be graph data processed by field checking, that is, the initial graph data may be graph data in which data fields are consistent with data types and data amounts of fields defined in the metadata.
Referring to fig. 5, fig. 5 schematically illustrates a flow diagram for generating graph data to be imported from source graph data according to an embodiment of the present disclosure. In step S510, after determining the source graph data based on the initial source data, metadata corresponding to the source graph data may be obtained from the graph database. In step S520, since the metadata defines the relevant information about the data attributes of the source graph data, the source graph data may be subjected to field check processing according to the metadata, and the source graph data that passes the field check processing is taken as the initial graph data. In step S530, field filling processing is performed on the initial graph data, adding the fields required by the graph database, so that the corresponding graph data to be imported can be generated.
For example, the initial graph data may be processed by the computation engine 420 in fig. 4 to generate graph data to be imported. The computation engine 420 may be any distributed computation component; for example, the computation engine 420 may be the Apache Flink data processing platform, a Message Passing Interface (MPI) parallel computation framework, the Hadoop MapReduce computation framework, or the like.
In an exemplary embodiment of the present disclosure, a reference field type and a reference field number are determined according to metadata; determining the field type of the source graph data and the field number of the source graph data; it is determined whether the field type of the source graph data is consistent with the reference field type, and it is determined whether the number of fields of the source graph data is consistent with the number of reference fields.
Wherein the reference field type may be a field type defined in the metadata. The reference field number may be the number of fields defined in the metadata. The field type of the source graph data may be a field type corresponding to the source graph data determined from the initial source data. The number of fields of the source graph data may be a number of fields corresponding to the source graph data determined from the initial source data.
Referring to FIG. 6, FIG. 6 schematically shows a flow diagram of a field check process on source graph data, according to one embodiment of the present disclosure. In step S610, the reference field type and the reference field number are determined from the metadata. In step S620, the field type of the source graph data and the number of fields of the source graph data may each be determined. In step S630, the field type of the source graph data is compared with the reference field type, and the number of fields of the source graph data is compared with the reference field number; when the field types are completely consistent and the field numbers are completely consistent, the source graph data is considered to have passed the field verification process. The source graph data that passes the field check may then be used as the initial graph data.
For example, when the data type of a certain reference field is defined as a Boolean type, if the corresponding field in the source graph data is also a Boolean type, the field type of the source graph data is considered consistent with the reference field type. When the data type of another reference field is defined as an integer type, if the corresponding field in the source graph data is a character type, the field type of the source graph data is considered inconsistent with the reference field type. When the number of reference fields is 2, if the number of fields of the source graph data is also 2, the number of fields of the source graph data is considered consistent with the number of reference fields. When the number of reference fields is 3, if the number of fields of the source graph data is 2, the number of fields of the source graph data is considered inconsistent with the number of reference fields.
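The field check of steps S610 to S630 can be sketched as follows. The helper name `passes_field_check` is hypothetical, and `reference_types` stands in for the field types and field count defined by the metadata:

```python
def passes_field_check(record, reference_types):
    """Check a source-graph-data record against the metadata definition:
    both the number of fields and the type of each field must match."""
    if len(record) != len(reference_types):       # reference-field-number check
        return False
    return all(isinstance(value, expected)        # reference-field-type check
               for value, expected in zip(record, reference_types))

# A Boolean field matching a Boolean reference type passes:
assert passes_field_check([True, "sku-1"], [bool, str])
# A character field where an integer is expected fails:
assert not passes_field_check(["a", "sku-1"], [int, str])
# 2 fields against a 3-field reference definition fails:
assert not passes_field_check([True, "sku-1"], [bool, str, int])
```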
In an exemplary embodiment of the present disclosure, a supplemental vertex field corresponding to the initial vertex data is determined; determining a supplemental edge field corresponding to the initial edge data; and performing field filling processing on the initial graph data according to the supplementary vertex field and the supplementary edge field to generate graph data to be imported.
The initial graph data may include initial vertex data and initial edge data. The supplemental vertex field may be a field that field fills in the initial vertex data; the supplemental vertex field may include a vertex type identification, a vertex timestamp, and a vertex attribute rule identification. Vertex type identification (vertex type id) may be used to identify the specific type to which the vertex corresponds. The vertex timestamp (vertex data timestamp) may be a timestamp corresponding to the time when the vertex was generated. The vertex attribute rule identifier (vertex schema version) may be an identifier of a different version rule corresponding to the vertex attribute. The supplemental edge field may be a field that field-fills the initial edge data; the supplemental edge field may include an edge type identification, an edge timestamp, and an edge attribute rule identification. An edge type identifier (edge type id) may be used to identify the specific type to which the edge corresponds. The edge timestamp (edge data timestamp) may be a timestamp corresponding to the edge generation. The edge attribute rule identification (edge schema version) may be an identification of different version rules corresponding to the edge attribute.
After the initial graph data is determined, field filling processing can be performed on it to fill in the complete information the graph data requires, so that the field-filled graph data has the data storage format used by the graph database; that is, the graph data to be imported is generated. Referring to FIG. 7, FIG. 7 schematically illustrates a flow diagram of a field population process for source graph data, according to one embodiment of the present disclosure. In step S710, supplemental vertex fields corresponding to the initial vertex data are determined. The supplemental vertex fields may include a vertex type identifier, a vertex timestamp, and a vertex attribute rule identifier. For example, in a shopping scenario, items, users, and shelves may correspond to different vertex types, and the three vertex types may be labeled vt0001, vt0002, and vt0003, respectively. The vertex timestamp for a vertex of a user type may be "2020-10-10 20:00:00". The vertex attribute rule records which fields the attribute list of the vertex data specifically contains and the field type of each field. When the number of fields of the vertex data changes, or the types of the fields change, a new vertex attribute rule is formed; therefore, vertex attribute rule identifiers are used to distinguish vertex attribute rules of different versions. For example, the vertex attribute rule identifier may be vertex schema version1, vertex schema version2, vertex schema version3, and so on.
In step S720, supplemental edge fields corresponding to the initial edge data are determined. The supplemental edge fields may include an edge type identifier, an edge timestamp, and an edge attribute rule identifier. Multiple association relationships may exist between two vertices. For example, in a shopping scenario, when a user purchases a certain item, the association relationship between the user vertex and the item vertex may be a purchase, and the purchase relationship may be identified as et0001; when a certain item is stored on a shelf, the association relationship between the shelf vertex and the item vertex may be storage, identified as et0002. The association relationships between the other vertices in the scenario can be marked similarly. Similar to the vertex timestamp, the edge timestamp corresponding to the purchase relationship in which the user purchases an item may be "2020-10-10 21:00:00". Similar to the vertex attribute rule, the edge attribute rule records the attribute list of the edge data; since the edge attribute rule may change, edge attribute rule identifiers are used to distinguish edge attribute rules of different versions. For example, the edge attribute rule identifier may be edge schema version1, edge schema version2, and so on.
In step S730, field filling processing may be performed on the initial graph data according to the supplemental vertex field and the supplemental edge field, respectively, to generate graph data to be imported.
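Steps S710 to S730 can be sketched as below. The field names mirror the supplemental fields named above, but the exact key spellings, timestamps, and version values are illustrative assumptions:

```python
def fill_vertex_fields(vertex, vertex_type_id, timestamp, schema_version):
    """Add the supplemental vertex fields required by the graph database."""
    filled = dict(vertex)                        # keep vertex id and attributes
    filled["vertex_type_id"] = vertex_type_id    # e.g. "vt0002" for a user vertex
    filled["vertex_data_timestamp"] = timestamp  # when the vertex was generated
    filled["vertex_schema_version"] = schema_version  # attribute-rule version
    return filled

def fill_edge_fields(edge, edge_type_id, timestamp, schema_version):
    """Add the supplemental edge fields required by the graph database."""
    filled = dict(edge)                          # keep endpoint ids and attributes
    filled["edge_type_id"] = edge_type_id        # e.g. "et0001" for a purchase
    filled["edge_data_timestamp"] = timestamp    # when the edge was generated
    filled["edge_schema_version"] = schema_version
    return filled

vertex = fill_vertex_fields({"vertex_id": "u001", "age": 30},
                            "vt0002", "2020-10-10 20:00:00", "version1")
edge = fill_edge_fields({"source_vertex_id": "u001",
                         "destination_vertex_id": "i042"},
                        "et0001", "2020-10-10 21:00:00", "version1")
```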
In step S320, the fragment information of the graph data to be imported is determined, and the graph data to be imported is subjected to fragment processing according to the fragment information to obtain corresponding fragment graph data.
In this example embodiment, the fragment information may be information about the data fragment into which the graph data to be imported will be placed; for example, the fragment information may include the number of the data fragment. The fragment graph data may be graph data that, after fragmentation processing, carries the fragment information of its respective data fragment.
Because the graph database in a big-data setting may be a distributed database, it may store data in fragments. Before data is imported into the graph database, the graph data to be imported can be fragmented according to the fragmentation rule of the graph database and divided into different storage directories; the fragmentation rule applied during this fragmentation processing must therefore be consistent with the rule the graph database uses for fragment storage.
Referring to fig. 4, after the graph data to be imported is generated, the computing engine 420 may determine fragment information corresponding to the graph data to be imported, so as to perform fragment processing on the graph data to be imported according to the fragment information, and generate fragment graph data.
In an exemplary embodiment of the present disclosure, a target vertex identifier and a target start vertex identifier corresponding to graph data to be imported are determined; acquiring the number of fragments corresponding to a graph database; determining the identifier of the fragment corresponding to the vertex data to be imported according to the number of the fragments and the target vertex identifier, and using the identifier as the vertex fragment identifier; and determining the identifier of the fragment corresponding to the to-be-imported edge data according to the number of the fragments and the target initial vertex identifier, and using the identifier as the edge fragment identifier.
The graph data to be imported comprises vertex data to be imported and edge data to be imported. The target vertex identifier may be the identifier corresponding to the vertex data to be imported contained in the graph data to be imported. The target starting vertex identifier may be the identifier corresponding to the starting vertex in the edge data to be imported contained in the graph data to be imported. The number of fragments is the number of data fragments, which is the same as the number of data fragments of the graph database. The vertex fragment identifier may be the identifier of the data fragment corresponding to the vertex data to be imported, determined from the number of fragments and the target vertex identifier; its value range is [0, number of fragments - 1]. The edge fragment identifier may be the identifier of the data fragment corresponding to the edge data to be imported, determined from the number of fragments and the target starting vertex identifier; its value range is likewise [0, number of fragments - 1].
Referring to fig. 8, fig. 8 schematically illustrates a flowchart of determining fragment information for graph data to be imported according to an embodiment of the present disclosure. In step S810, the target vertex identifier and the target starting vertex identifier corresponding to the graph data to be imported are determined. In step S820, the number of fragments corresponding to the graph database is acquired; for example, the number of fragments shown in fig. 4 is 4. In step S830, the vertex fragment identifier corresponding to the vertex data to be imported may be determined from the number of fragments and the target vertex identifier. Specifically, the vertex fragment identifier may be calculated as shown in Equation 1, where hash() denotes a hash operation and % denotes the remainder (modulo) operation.
vertex fragment identifier = hash(vertex id) % number of fragments    (Equation 1)
In step S840, the edge fragment identifier corresponding to the edge data to be imported may be determined from the number of fragments and the target starting vertex identifier; the edge fragment identifier may be calculated as shown in Equation 2.
edge fragment identifier = hash(source vertex id) % number of fragments    (Equation 2)
Through the above calculations, the data fragments corresponding to all of the vertex data and edge data in the graph data to be imported can be determined.
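Equations 1 and 2 can be sketched as follows. The patent does not specify the hash function, so a deterministic CRC32 hash stands in for it here as an assumption:

```python
import zlib

FRAGMENT_COUNT = 4  # must equal the number of data fragments of the graph database

def vertex_fragment_id(vertex_id: str) -> int:
    # Equation 1: vertex fragment id = hash(vertex id) % number of fragments
    return zlib.crc32(vertex_id.encode("utf-8")) % FRAGMENT_COUNT

def edge_fragment_id(source_vertex_id: str) -> int:
    # Equation 2: edge fragment id = hash(source vertex id) % number of fragments
    return zlib.crc32(source_vertex_id.encode("utf-8")) % FRAGMENT_COUNT

# Every fragment identifier falls in [0, FRAGMENT_COUNT - 1].
assert 0 <= vertex_fragment_id("u001") < FRAGMENT_COUNT
```

A practical consequence of sharding edges by the starting vertex identifier is that each edge lands on the same fragment as its starting vertex.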
In step S330, data writing processing is performed on the fragment graph data to generate the corresponding compressed binary graph data.
In this example embodiment, the data writing process may be a process of writing data using a database engine to generate compressed binary graph data. The compressed binary graph data may be highly compressed graph data represented in binary; for example, it may be a Sorted String Table (SST) file. An SST file refers specifically to the on-disk file of RocksDB, an embedded Key-Value (KV) storage engine written in C++ that is an efficient, high-performance, single-node database engine. The compressed binary graph data can also be the on-disk file of any other high-performance KV storage engine, such as the InnoDB storage engine, the LevelDB storage engine, or the HBase storage engine.
After the fragment graph data is determined, data writing processing can be performed on it by a high-performance KV storage engine to generate the corresponding compressed binary graph data. Existing graph-database import methods usually represent the graph data to be imported in a text format, which causes considerable data expansion for large graphs; for this reason, the graph data to be imported can be converted into compressed binary graph data before the data import.
In an exemplary embodiment of the present disclosure, the fragment vertex data and fragment edge data in the fragment graph data are determined; the fragment vertex data and the fragment edge data are each sorted to obtain the corresponding fragment sorting vertex data and fragment sorting edge data; and the fragment sorting vertex data and the fragment sorting edge data are written sequentially to generate the compressed binary graph data.
The fragment vertex data may be vertex data that has undergone fragmentation processing, and the fragment edge data may be edge data that has undergone fragmentation processing. The fragment sorting vertex data may be fragment vertex data that has been sorted, and the fragment sorting edge data may be fragment edge data that has been sorted.
Before the compressed binary graph data is generated from the fragment graph data, the fragment graph data in each data fragment may be sorted. Referring to fig. 9, fig. 9 schematically illustrates a flow diagram for generating compressed binary graph data from graph data to be imported according to an embodiment of the present disclosure. In step S910, the fragment vertex data and fragment edge data in the fragment graph data are determined. In step S920, when the fragment vertex data is sorted, the sorting may be performed according to the relevant fields of the fragment vertex data to generate the fragment sorting vertex data. Similarly, when the fragment edge data is sorted, it may be sorted according to the relevant fields of the fragment edge data to generate the fragment sorting edge data. In step S930, the generated fragment sorting vertex data and fragment sorting edge data may be written sequentially by a high-performance KV storage engine to generate the compressed binary graph data.
For example, when the RocksDB storage engine is used for the data writing processing, sequential writing may be performed using the RocksDB library (RocksDB lib), which acts as a format converter that turns the fragment graph data into a corresponding SST file. Sorting the fragment graph data ensures that the generated fragment sorting data meets the data format requirements of the graph database, which improves the speed of the subsequent data import. After the corresponding SST file is generated from the fragment graph data, the SST file is written into the Hadoop distributed file system 430 according to the storage path rule of 'fragment ID/sub-fragment ID'.
In an exemplary embodiment of the present disclosure, a vertex ordering field corresponding to the sliced vertex data is determined; sorting the fragment vertex data according to the vertex sorting field to generate fragment sorting vertex data; determining an edge sorting field corresponding to the fragment edge data; and sorting the fragment edge data according to the edge sorting field to generate fragment sorting edge data.
The vertex sorting field may be a sorting field used when sorting processing is performed on the sliced vertex data, and for example, the vertex sorting field may include a vertex identifier, a vertex type identifier, a vertex timestamp, a vertex attribute rule identifier, and the like. The edge ordering field may be an ordering field used for ordering the fragmented edge data, for example, the edge ordering field may include a start vertex identifier, an end vertex identifier, an edge type identifier, an edge timestamp, and an edge attribute rule identifier.
Referring to fig. 10, fig. 10 schematically illustrates a flow diagram of sorting fragment data to generate fragment sorting data according to one embodiment of the present disclosure. Before the fragment vertex data is sorted, in step S1010, the vertex sorting fields corresponding to the fragment vertex data may be determined, and the vertex sorting fields may carry a sorting priority; for example, the sorting priority of the vertex sorting fields, from high to low, is: vertex identifier, vertex type identifier, vertex timestamp, and vertex attribute rule identifier. In step S1020, the fragment vertex data is sorted on the vertex sorting fields by numeric order (e.g., 0-9) or character order (e.g., "a-z"). Because the same vertex in a graph database may produce different associated data under different application scenarios, when two vertices have the same vertex identifier, their vertex type identifiers are compared next, and so on, until all of the fragment vertex data has been sorted and the corresponding fragment sorting vertex data is generated.
Similarly, in step S1030, before the fragment edge data is sorted, the edge sorting fields corresponding to the fragment edge data may be determined; the edge sorting fields also carry a sorting priority, which from high to low is: starting vertex identifier, terminating vertex identifier, edge type identifier, edge timestamp, and edge attribute rule identifier. In step S1040, the fragment edge data is sorted on the edge sorting fields by numeric or character order, generating the corresponding fragment sorting edge data.
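The priority-ordered sorts of steps S1010 to S1040 can be sketched with tuple keys, which compare field by field exactly as described: equal vertex identifiers fall through to the vertex type identifier, and so on. The record key names are illustrative assumptions:

```python
# Vertex sort priority, high to low: vertex id, vertex type id,
# vertex timestamp, vertex attribute rule id.
def vertex_sort_key(v):
    return (v["vertex_id"], v["vertex_type_id"],
            v["vertex_data_timestamp"], v["vertex_schema_version"])

# Edge sort priority, high to low: source vertex id, destination vertex id,
# edge type id, edge timestamp, edge attribute rule id.
def edge_sort_key(e):
    return (e["source_vertex_id"], e["destination_vertex_id"],
            e["edge_type_id"], e["edge_data_timestamp"], e["edge_schema_version"])

fragment_vertices = [
    {"vertex_id": "v2", "vertex_type_id": "vt0001",
     "vertex_data_timestamp": "2020-10-10 20:00:00", "vertex_schema_version": "1"},
    {"vertex_id": "v1", "vertex_type_id": "vt0002",
     "vertex_data_timestamp": "2020-10-10 20:00:00", "vertex_schema_version": "1"},
    {"vertex_id": "v1", "vertex_type_id": "vt0001",
     "vertex_data_timestamp": "2020-10-10 20:00:00", "vertex_schema_version": "1"},
]
sorted_vertices = sorted(fragment_vertices, key=vertex_sort_key)
# The two "v1" records are ordered by their vertex type identifier.
```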
In step S340, the historical compressed binary graph data corresponding to the compressed binary graph data is determined.
In this example embodiment, the historical compressed binary graph data may be compressed binary graph data that resides in the same data fragment as the newly generated compressed binary graph data and is already providing services in the graph database.
Because the generated compressed binary graph data carries corresponding routing information, all compressed binary graph data located under the same data fragment path, namely the historical compressed binary graph data, can be looked up according to that routing information and then processed.
In step S350, the compressed binary graph data and the historical compressed binary graph data are merged to obtain target graph data, so that the target graph data can be imported into the graph database.
In this exemplary embodiment, the merging process combines the compressed binary graph data with the historical compressed binary graph data and may include a logical merge stage and a physical merge deduplication stage. The target graph data is the compressed binary graph data produced by the merging process and can be imported directly into the graph database.
After the historical compressed binary graph data is determined, it can be merged with the newly generated compressed binary graph data. Specifically, the merging may comprise two stages: a logical merge stage and a physical merge deduplication stage. The target graph data generated by the merging process is imported into the graph database 440.
In an exemplary embodiment of the present disclosure, the compressed binary graph data and the historical compressed binary graph data are first logically merged to generate logical merged data; the logical merged data is then physically merged and deduplicated to generate the target graph data.
The logical merge may be a merge operation that retains both the compressed binary graph data and the historical compressed binary graph data, so the logical merged data contains both. The physical merge deduplication may be a merge operation that physically deletes duplicate records from the logical merged data; after it completes, only the record with the latest timestamp information is retained for each key in the graph data.
Referring to fig. 11, fig. 11 schematically illustrates a flow diagram of a process of merging compressed binary graph data and historical compressed binary graph data according to one embodiment of the present disclosure. After the compressed binary graph data and the historical compressed binary graph data are determined, in step S1110 they may be logically merged to generate logical merged data. For example, the logical merge may be performed using the ingest function of RocksDB. Once the logical merge finishes, both the original historical compressed binary graph data in the database and the newly generated compressed binary graph data can be accessed through the RocksDB interface.
In step S1120, the generated logical merged data may be physically merged and deduplicated to generate the target graph data. For example, a RocksDB background thread may be started to perform the merge deduplication: for vertex data sharing the same vertex identifier, only the latest vertex data for that identifier is retained. Because an edge corresponds to a start vertex and an end vertex, and start and end vertex data are themselves vertex data, they are processed in the same way; when a vertex is deleted, the edge data corresponding to that vertex is deleted as well. After the target graph data is generated by the physical merge deduplication, it can be imported into the graph database.
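The keep-latest-timestamp rule applied during physical merge deduplication can be sketched as below. This is a minimal illustration, not RocksDB's actual compaction logic; the record layout is an assumption.

```python
# Physically deduplicate merged vertex records: for each vertex identifier,
# keep only the record with the latest timestamp (mirroring step S1120).

def merge_dedup(records):
    latest = {}
    for rec in records:  # rec: (vertex_id, timestamp, payload)
        vid, ts, _ = rec
        # A later timestamp for the same vertex id supersedes the earlier record.
        if vid not in latest or ts > latest[vid][1]:
            latest[vid] = rec
    return sorted(latest.values())

merged = [
    ("v1", 100, "old"),
    ("v2", 50, "only"),
    ("v1", 200, "new"),  # later write for v1 supersedes the earlier one
]
print(merge_dedup(merged))
# [('v1', 200, 'new'), ('v2', 50, 'only')]
```

In the real system this pass would also drop edge records whose vertices were deleted, as described above.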
In some scenarios, when processing data at the 10-billion-record scale, this data import method for a graph database can be roughly 10 times faster than importing into the HugeGraph database and 3 times faster than importing into the Neo4j graph database.
In one exemplary embodiment of the present disclosure, in response to a data query request, a request data identifier in the data query request is determined; initial request data corresponding to the request data identifier is determined, and timestamp information of the initial request data is acquired; and target request data is determined from the initial request data according to the timestamp information and returned.
The data query request may be a request from a requesting end to obtain target request data from the graph database. The request data identifier may be a data identifier carried in the data query request. The initial request data may be the request data determined directly from the request data identifier; several pieces of initial request data may be determined for the same request data identifier. The timestamp information may be information about the timestamp corresponding to the initial request data. The target request data may be the request data selected from the initial request data based on the timestamp information.
Referring to FIG. 12, FIG. 12 schematically illustrates a flow diagram for responding to a data query request to determine target request data, according to one embodiment of the present disclosure. In step S1210, when a data query request is received, the request data identifier may be determined from the data query request; for example, the data query request may contain a vertex identifier and an edge identifier. In step S1220, initial request data corresponding to the request data identifier may be determined from the graph database according to that identifier, and the timestamp information of the initial request data may be acquired. Because the database may contain multiple copies of the request data under the same data identifier before the physical merge deduplication has been performed, when the determined initial request data comprises several pieces, the timestamp information of each piece can be obtained. In step S1230, the piece with the latest time information is determined from the initial request data according to the timestamp information and, as the target request data, returned to the requesting end.
Because the physical merge deduplication may take considerable time, a data query request received before it completes is handled by a logical deduplication: the initial request data is determined from the graph database according to the request data identifier in the query, the piece with the latest timestamp information is taken from the initial request data as the target request data, and the target request data is returned to the requesting end.
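The query-time logical deduplication of steps S1210-S1230 can be sketched as follows; the store layout and field names are assumptions for illustration.

```python
# Answer a query before physical deduplication has run: gather every copy
# stored under the requested identifier and return only the one with the
# latest timestamp.

def query_latest(store, request_id):
    # store: list of (data_id, timestamp, payload) records, possibly with
    # multiple copies per data_id (the pre-compaction state).
    candidates = [r for r in store if r[0] == request_id]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r[1])  # latest timestamp wins

store = [
    ("v7", 10, "stale"),
    ("v7", 30, "fresh"),
    ("v8", 20, "other"),
]
print(query_latest(store, "v7"))
# ('v7', 30, 'fresh')
```

The same selection applies to both vertex and edge lookups, since both carry timestamp information.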
In an exemplary embodiment of the present disclosure, initial request vertex data is determined according to a request vertex identifier, and vertex timestamp information corresponding to the initial request vertex data is obtained; the latest vertex data is determined from the initial request vertex data based on the vertex timestamp information, the latest vertex data being the target request data.
The request vertex identifier may be the data identifier corresponding to vertex data carried in the data query request. The initial request vertex data may be the vertex data determined directly from the request vertex identifier. The vertex timestamp information may be information about the timestamp corresponding to the initial request vertex data. The latest vertex data may be the vertex data whose timestamp has the latest generation time, i.e., the most recently generated vertex data.
When a data query request is received, request vertex identifications contained in the data query request are determined. All vertex data corresponding to the requested vertex identification can be determined from the graph database according to the requested vertex identification and used as initial requested vertex data. After the initial request vertex data is determined, vertex timestamp information corresponding to the initial request vertex data can be obtained, the latest vertex data with the latest timestamp can be determined from the initial request vertex data according to the vertex timestamp information, and the latest vertex data is used as target request data.
In an exemplary embodiment of the present disclosure, initial request edge data is determined according to a request edge identifier, and edge timestamp information corresponding to the initial request edge data is obtained; the latest edge data is determined from the initial request edge data based on the edge timestamp information, the latest edge data being the target request data.
The request edge identifier may be the data identifier corresponding to edge data carried in the data query request. The initial request edge data may be the edge data determined directly from the request edge identifier. The edge timestamp information may be information about the timestamp corresponding to the initial request edge data. The latest edge data may be the edge data whose timestamp has the latest generation time, i.e., the most recently generated edge data.
When a data query request is received, the request edge identifier contained in the data query request is determined. According to the request edge identifier, all edge data corresponding to it can be determined from the graph database as the initial request edge data. After the initial request edge data is determined, the edge timestamp information corresponding to it can be obtained, the latest edge data with the latest timestamp can be determined from the initial request edge data according to that timestamp information, and the latest edge data is used as the target request data.
It should be noted that, in some exemplary embodiments, after the compressed binary graph data is generated, the graph database 440 may download the compressed binary graph data from HDFS to local storage and obtain the historical compressed binary graph data already present locally, so that the compressed binary graph data and the historical compressed binary graph data can be merged locally; the present disclosure places no limitation on this.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, in the present exemplary embodiment, a data importing apparatus of a graph database is also provided. The data importing device of the graph database can be applied to a server or terminal equipment. Referring to fig. 13, the data importing apparatus 1300 of the graph database may include a data obtaining module 1310, a data slicing module 1320, a compressed data generating module 1330, a history data determining module 1340, and a data importing module 1350. Wherein:
a data obtaining module 1310, configured to obtain source graph data and perform field filling processing on the source graph data to obtain graph data to be imported;
the data fragmentation module 1320 is configured to determine fragmentation information of the graph data to be imported, and perform fragmentation processing on the graph data to be imported according to the fragmentation information to obtain corresponding fragmentation graph data;
a compressed data generating module 1330, configured to perform data writing processing on the fragment graph data to generate corresponding compressed binary graph data;
a historical data determination module 1340, configured to determine historical compressed binary graph data corresponding to the compressed binary graph data;
a data importing module 1350, configured to merge the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database.
In an exemplary embodiment of the present disclosure, the data obtaining module 1310 includes a source data obtaining unit configured to: determining initial source data, and determining vertex data from the initial source data; determining edge data from the initial source data; and taking the vertex data and the edge data as source graph data.
In an exemplary embodiment of the present disclosure, the data obtaining module 1310 further includes: the metadata acquisition unit is used for acquiring metadata corresponding to the source graph data from the graph database; the verification processing unit is used for carrying out field verification processing on the source graph data according to the metadata and taking the source graph data passing the field verification processing as initial graph data; and the filling processing unit is used for carrying out field filling processing on the initial graph data so as to generate graph data to be imported.
In an exemplary embodiment of the present disclosure, the verification processing unit is configured to: determining the type and the number of the reference fields according to the metadata; determining the field type of the source graph data and the field number of the source graph data; it is determined whether the field type of the source graph data is consistent with the reference field type, and it is determined whether the number of fields of the source graph data is consistent with the number of reference fields.
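The field verification performed by this unit can be sketched as follows; the metadata layout and field names are assumptions for illustration.

```python
# Verify a source graph record against reference metadata: both the field
# count and the field types must match for the record to pass the check.

def verify_fields(record, ref_fields):
    # ref_fields: list of (field_name, expected_type) pairs from metadata.
    if len(record) != len(ref_fields):
        return False  # field count mismatch
    return all(isinstance(record.get(name), ftype)
               for name, ftype in ref_fields)

ref = [("vertex_id", str), ("age", int)]
print(verify_fields({"vertex_id": "v1", "age": 30}, ref))   # True
print(verify_fields({"vertex_id": "v1", "age": "30"}, ref)) # False: type mismatch
```

Records failing either check would be rejected before field filling, so only source graph data passing the verification becomes initial graph data.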
In an exemplary embodiment of the present disclosure, the filling processing unit is configured to: determining a supplemental vertex field corresponding to the initial vertex data; determining a supplemental edge field corresponding to the initial edge data; and performing field filling processing on the initial graph data according to the supplementary vertex field and the supplementary edge field to generate graph data to be imported.
In an exemplary embodiment of the present disclosure, the data fragmentation module 1320 includes a fragmentation information determination unit configured to: determining a target vertex identification and a target starting vertex identification corresponding to graph data to be imported; acquiring the number of fragments corresponding to a graph database; determining the identifier of the fragment corresponding to the vertex data to be imported according to the number of the fragments and the target vertex identifier, and using the identifier as the vertex fragment identifier; and determining the identifier of the fragment corresponding to the to-be-imported edge data according to the number of the fragments and the target initial vertex identifier, and using the identifier as the edge fragment identifier.
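The fragment-identifier computation described by this unit can be sketched as a hash-modulo assignment over the fragment count. Hash-modulo is a common choice and an assumption here; the patent does not fix the exact function.

```python
# Assign each vertex (and each edge, keyed by its start vertex identifier)
# to a fragment: hash the identifier and take it modulo the fragment count,
# so an edge lands in the same fragment as its start vertex.
import zlib

def fragment_id(identifier, num_fragments):
    # crc32 gives a hash that is stable across runs (unlike Python's hash()).
    return zlib.crc32(identifier.encode("utf-8")) % num_fragments

num_fragments = 8
vertex_frag = fragment_id("v42", num_fragments)  # vertex fragment identifier
edge_frag = fragment_id("v42", num_fragments)    # edge keyed by start vertex "v42"
print(vertex_frag == edge_frag)
# True
```

Co-locating an edge with its start vertex in this way means both can later be merged within a single data fragment.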
In an exemplary embodiment of the present disclosure, the compressed data generation module 1330 includes: a fragment data determining unit, configured to determine fragment vertex data and fragment edge data in the fragment graph data; a data sorting unit, configured to sort the fragment vertex data and the fragment edge data respectively to obtain the corresponding fragment sorting vertex data and fragment sorting edge data; and a compressed data generating unit, configured to sequentially write the fragment sorting vertex data and the fragment sorting edge data to generate the compressed binary graph data.
In an exemplary embodiment of the present disclosure, the data sorting unit is configured to: determining a vertex sorting field corresponding to the fragment vertex data; sorting the fragment vertex data according to the vertex sorting field to generate fragment sorting vertex data; determining an edge sorting field corresponding to the fragment edge data; and sorting the fragment edge data according to the edge sorting field to generate fragment sorting edge data.
In an exemplary embodiment of the present disclosure, the data import module 1350 includes a target data generation unit configured to: logically merge the compressed binary graph data and the historical compressed binary graph data to generate logical merged data; and physically merge and deduplicate the logical merged data to generate the target graph data.
In an exemplary embodiment of the present disclosure, the data import module 1350 further includes: the identification determining unit is used for responding to the data query request and determining a request data identification in the data query request; the time stamp obtaining unit is used for determining initial request data corresponding to the request data identification and obtaining time stamp information of the initial request data; and the request data determining unit is used for determining target request data from the initial request data according to the time stamp information and returning the target request data.
In an exemplary embodiment of the present disclosure, the request data determining unit includes a first data determining subunit configured to: determining initial request vertex data according to the request vertex identification, and acquiring vertex timestamp information corresponding to the initial request vertex data; the latest vertex data is determined from the initial request vertex data based on the vertex timestamp information, the latest vertex data being the target request data.
In an exemplary embodiment of the present disclosure, the request data determining unit further includes a second data determining subunit configured to: determine initial request edge data according to the request edge identifier, and acquire edge timestamp information corresponding to the initial request edge data; and determine the latest edge data from the initial request edge data based on the edge timestamp information, the latest edge data being the target request data.
The specific details of each module or unit in the data importing apparatus of a graph database have been described in detail in the data importing method of the corresponding graph database, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (15)
1. A method for importing data from a graph database, comprising:
acquiring source graph data, and performing field filling processing on the source graph data to obtain graph data to be imported;
determining the fragment information of the graph data to be imported, and carrying out fragment processing on the graph data to be imported according to the fragment information to obtain corresponding fragment graph data;
performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data;
determining historical compressed binary graph data corresponding to the compressed binary graph data;
and merging the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database.
2. The method of claim 1, wherein obtaining source map data comprises:
determining initial source data, and determining vertex data from the initial source data;
determining edge data from the initial source data;
and taking the vertex data and the edge data as the source graph data.
3. The method according to claim 1, wherein the field filling processing on the source graph data to obtain graph data to be imported includes:
obtaining metadata corresponding to the source graph data from a graph database;
performing field verification processing on the source graph data according to the metadata, and taking the source graph data passing the field verification processing as initial graph data;
and performing field filling processing on the initial graph data to generate the graph data to be imported.
4. The method according to claim 3, wherein the performing field check processing on the source graph data according to the metadata comprises:
determining the type and the number of reference fields according to the metadata;
determining the field type of the source graph data and the field number of the source graph data;
determining whether the field type of the source graph data is consistent with the reference field type, and determining whether the field number of the source graph data is consistent with the reference field number.
5. The method of claim 3, wherein the initial graph data comprises initial vertex data and initial edge data, and the performing field filling processing on the initial graph data to generate the graph data to be imported comprises:
determining a supplemental vertex field corresponding to the initial vertex data;
determining a supplemental edge field corresponding to the initial edge data;
and performing the field filling processing on the initial graph data according to the supplementary vertex field and the supplementary edge field to generate the graph data to be imported.
6. The method of claim 1, wherein the graph data to be imported comprises vertex data to be imported and edge data to be imported, and the slice information comprises a vertex slice identifier and an edge slice identifier;
the determining the fragment information of the graph data to be imported includes:
determining a target vertex identification and a target starting vertex identification corresponding to the graph data to be imported;
acquiring the number of fragments corresponding to the graph database;
determining the identifier of the fragment corresponding to the vertex data to be imported according to the number of the fragments and the target vertex identifier, and using the identifier as the vertex fragment identifier;
and determining the identifier of the fragment corresponding to the to-be-imported edge data according to the fragment number and the target starting vertex identifier, and using the identifier as the edge fragment identifier.
7. The method according to claim 1, wherein the performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data comprises:
determining fragment vertex data and fragment edge data in the fragment graph data;
sorting the fragment vertex data and the fragment edge data respectively to obtain corresponding fragment sorting vertex data and fragment sorting edge data;
and sequentially writing the fragment sorting vertex data and the fragment sorting edge data to generate the compressed binary graph data.
8. The method of claim 7, wherein the sorting the slice vertex data and the slice edge data to obtain corresponding slice sorting vertex data and slice sorting edge data comprises:
determining a vertex sorting field corresponding to the fragment vertex data;
sorting the fragment vertex data according to the vertex sorting field to generate the fragment sorting vertex data;
determining an edge sorting field corresponding to the fragment edge data;
and sequencing the fragment edge data according to the edge sequencing field to generate the fragment sequencing edge data.
9. The method according to claim 1, wherein the merging the compressed binary graph data and the historical compressed binary graph data to obtain target graph data comprises:
performing logical merge processing on the compressed binary graph data and the historical compressed binary graph data to generate logical merged data;
and performing physical merge deduplication processing on the logical merged data to generate the target graph data.
10. The method of claim 9, wherein prior to the performing physical merge deduplication processing on the logically merged data, the method further comprises:
responding to a data query request, and determining a request data identifier in the data query request;
determining initial request data corresponding to the request data identification, and acquiring timestamp information of the initial request data;
and determining target request data from the initial request data according to the timestamp information, and returning the target request data.
11. The method of claim 10, wherein the request data identification comprises a request vertex identification, and wherein the initial request data comprises initial request vertex data;
the determining target request data from the initial request data according to the timestamp information includes:
determining the initial request vertex data according to the request vertex identification, and acquiring vertex timestamp information corresponding to the initial request vertex data;
determining latest vertex data from the initial request vertex data based on the vertex timestamp information, the latest vertex data being the target request data.
12. The method of claim 10, wherein the request data identification comprises a request edge identification, and wherein the initial request data comprises initial request edge data;
the determining target request data from the initial request data according to the timestamp information includes:
determining the initial request edge data according to the request edge identification, and acquiring edge timestamp information corresponding to the initial request edge data;
determining latest edge data from the initial request edge data based on the edge timestamp information, the latest edge data being the target request data.
13. An apparatus for importing data from a graph database, comprising:
the data acquisition module is used for acquiring source graph data and carrying out field filling processing on the source graph data to obtain graph data to be imported;
the data fragmentation module is used for determining fragmentation information of the graph data to be imported and carrying out fragmentation processing on the graph data to be imported according to the fragmentation information to obtain corresponding fragmentation graph data;
the compressed data generation module is used for performing data writing processing on the fragment graph data to generate corresponding compressed binary graph data;
a historical data determination module for determining historical compressed binary graph data corresponding to the compressed binary graph data;
and the data import module is used for merging the compressed binary graph data and the historical compressed binary graph data to obtain target graph data, so as to import the target graph data into the graph database.
14. An electronic device, comprising:
a processor; and
memory having stored thereon computer readable instructions which, when executed by the processor, implement a method of data import for a graph database according to any of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a data import method for a graph database according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110164536.7A CN114860821A (en) | 2021-02-05 | 2021-02-05 | Data importing method and device of graph database, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114860821A true CN114860821A (en) | 2022-08-05 |
Family
ID=82627159
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110164536.7A Pending CN114860821A (en) | 2021-02-05 | 2021-02-05 | Data importing method and device of graph database, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114860821A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118394852A (en) * | 2024-06-26 | 2024-07-26 | 支付宝(杭州)信息技术有限公司 | Method, device and graph database system for importing graph data online |
CN118551062A (en) * | 2024-07-30 | 2024-08-27 | 支付宝(杭州)信息技术有限公司 | Graph data processing method, graph database-based data processing method and device |
CN118394852B (en) * | 2024-06-26 | 2024-11-12 | 支付宝(杭州)信息技术有限公司 | Method, device and graph database system for importing graph data online |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |