CN113407378A - Fragmentation information backup method and device for distributed database - Google Patents
Fragmentation information backup method and device for distributed database
- Publication number
- CN113407378A (CN202110700134.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- information
- data table
- index
- character string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fragment information backup method and device for a distributed database. The method comprises: acquiring data distribution information of a distributed database, where the data distribution information indicates how the data records of each data table of the distributed database are distributed across a plurality of database instances; converting the data distribution information into information of a preset structure; for any one of the data tables, locating the sub-information of that data table in the information of the preset structure, then parsing the sub-information to extract the fragment information of the data table; and backing up the fragment information of each data table. When applied to financial technology (Fintech), the method allows the fragment information of a distributed database spread across a plurality of database instances to be backed up.
Description
Technical Field
The invention relates to databases in the field of financial technology (Fintech), and in particular to a fragment information backup method and device for a distributed database.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). Because the financial industry has strict security and real-time requirements, it also places higher demands on these technologies. Financial data currently requires a high degree of fault tolerance, so a distributed database (e.g., MongoDB) is often used to store it. That is, the data of the distributed database is stored across a plurality of database instances (e.g., each host runs one database instance), and each database instance stores only part of the data in the distributed database. In some cases, the distributed database needs to be backed up.
However, backup of a distributed database currently targets only the data records it contains. Which database instance stores the data records of each data table is determined by the fragment information of the data table (one or more column names of the data table), and the distributed database provides no interface or command for acquiring the fragment information of a data table. As a result, the fragment information of the data tables in a distributed database cannot currently be backed up. This is a problem to be solved.
Disclosure of Invention
The invention provides a fragment information backup method and device of a distributed database, which solve the problem that the fragment information of a data table in the distributed database cannot be backed up in the prior art.
In a first aspect, the present invention provides a fragmentation information backup method for a distributed database, including:
acquiring data distribution information of a distributed database; the data distribution information indicates the distribution of data records of each data table of the distributed database on a plurality of database instances;
converting the data distribution information into information of a preset structure;
for any one of the data tables, locating the sub-information of the data table in the information of the preset structure; parsing the sub-information and extracting the fragment information of the data table;
and backing up the fragment information of each data table.
In this method, once the data distribution information has been converted into information of a preset structure, the sub-information of a data table can be located in that information and parsed to extract the fragment information of the data table. Because the fragment information is extracted directly from the data distribution information through data processing and conversion, the fragment information of the data table can be backed up.
Optionally, the converting the data distribution information into information of a preset structure includes:
converting the data distribution information into a binary data input stream, and converting the data input stream into at least one character string according to a preset format;
and taking the at least one character string as the information of the preset structure.
In this manner, the data distribution information is converted into a binary data input stream, and the data input stream is converted into at least one character string according to a preset format. The data distribution information is thereby standardized and structured, which makes the resulting information of the preset structure easier to process and makes it easier to extract the fragment information of each data table.
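The conversion described above can be illustrated with a minimal sketch. The source does not define the preset format, so the sketch assumes, purely for illustration, that the raw data distribution information is UTF-8 text with one entry per line; the function name `to_preset_structure` and the sample content are hypothetical.

```python
import io


def to_preset_structure(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Read raw bytes as a binary input stream and return non-empty text lines.

    Assumption: the "preset format" is one character string per line.
    """
    stream = io.BytesIO(raw)  # the binary data input stream
    strings = []
    for raw_line in stream:  # a BytesIO object iterates line by line
        text = raw_line.decode(encoding).strip()
        if text:  # skip blank lines
            strings.append(text)
    return strings


# Hypothetical sample of data distribution information.
sample = b'{"_id": "bankdb.account"}\n\n{"key": {"acctNo": 1}}\n'
lines = to_preset_structure(sample)
```

Each element of `lines` is then one character string of the information of the preset structure.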
Optionally, the locating the sub-information of the data table in the information of the preset structure includes: for a first character string, which is any one of the at least one character string, if the first character string is a character string with a database attribute, a database variable in the first character string indicates the distributed database, and a data table variable in the first character string indicates the data table, determining that the character strings within a preset position range corresponding to the first character string are the sub-information of the data table, where the character strings within the preset position range include the first character string.
In this manner, the sub-information of the data table is located by means of the database attribute, the database variable and the data table variable. Because the database attribute identifies the distributed database, and the database and the data table are unique when created, the sub-information of the data table to be parsed can be located in the information of the preset structure more efficiently and accurately.
Optionally, the parsing the sub-information and extracting the fragment information of the data table includes:
for any character string within the preset position range, if the character string is the first character string, parsing the library name of the distributed database and the table name of the data table from the character string; if the character string includes a fragment identifier, parsing the fragment column name from the column name corresponding to the fragment identifier in the character string; and taking the library name, the table name and the fragment column name as the fragment information of the data table.
In the above manner, the library name of the distributed database and the table name of the data table are parsed from the first character string, and the fragment column name is parsed from the character string that includes the fragment identifier. The library name, the table name and the fragment column name are thus all obtained, so the fragment information of the data table clearly shows which database and data table it belongs to and the basis on which the data is distributed across the corresponding database instances.
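The locating and parsing steps can be sketched as follows. The source does not specify the concrete string format, so everything about the format here is an assumption: each character string is taken to be a JSON object, the string with the database attribute carries an `_id` of the form `<library>.<table>`, the fragment identifier is the field name `key`, and the preset position range is the first character string plus the strings immediately after it. The names `extract_fragment_info`, `FRAGMENT_ID` and `window` are hypothetical.

```python
import json

FRAGMENT_ID = "key"  # hypothetical fragment identifier


def _load(text):
    """Parse one character string; return {} if it is not a JSON object."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return {}
    return doc if isinstance(doc, dict) else {}


def extract_fragment_info(strings, db_name, table_name, window=2):
    """Locate the sub-information of one table and extract its fragment info."""
    target = f"{db_name}.{table_name}"
    for i, text in enumerate(strings):
        if _load(text).get("_id") != target:
            continue  # not the first character string for this table
        # Parse library name and table name from the first character string.
        library, _, table = target.partition(".")
        columns = []
        # The preset position range: `window` strings starting at the match.
        for sub in strings[i:i + window]:
            key = _load(sub).get(FRAGMENT_ID)
            if isinstance(key, dict):
                columns.extend(key)  # the dict keys are the column names
        return {"db": library, "table": table, "shard_columns": columns}
    return None  # no sub-information found for this table


strings = [
    '{"_id": "bankdb.account"}',
    '{"key": {"acctNo": 1, "mntSysTime": 1}}',
    '{"_id": "bankdb.other"}',
]
info = extract_fragment_info(strings, "bankdb", "account")
```

The returned dictionary holds the three parts named in the text: the library name, the table name and the fragment column names.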
Optionally, the acquiring data distribution information of the distributed database includes: querying a system table of the distributed database to obtain data block storage information of the plurality of database instances; and determining the data distribution information according to the data block storage information of the plurality of database instances.
In the above manner, the data block storage information of the plurality of database instances is obtained by querying a system table of the distributed database, so the data distribution information is obtained directly from the physical distribution, which is faster and more direct.
Optionally, the method further includes:
obtaining a table index object of each data table in the distributed database;
for any one of the data tables, determining an index field of the data table and an index attribute of the data table according to the table index object of the data table;
generating an index creating command of the data table according to the index field of the data table and the index attribute of the data table; the index creating command is used for backing up index information of the data table.
In the above manner, after the index field of the data table and the index attribute of the data table are determined, the index creating command of the data table is generated according to the index field of the data table and the index attribute of the data table, so that the index of the data table is stored in the form of the index creating command, and the recovery of the distributed database is facilitated.
Optionally, the determining the index field of the data table and the index attribute of the data table according to the table index object of the data table includes:
taking a field with a field name matched with an index identifier in the index object as an index field of the data table;
determining whether the index of the data table is a unique index according to the value of the first field in the index object;
and determining the expiration time of the index of the data table according to the value of the second field in the index object.
In the above manner, the index field of the data table and the index attribute of the data table are more accurately determined through index identifier matching and values of the first field and the second field.
Optionally, the method further includes:
for any one of the data tables, reading the data records of the data table;
for any one of the data records of the data table, acquiring the default primary key of the data record, and if the default primary key is a primary key of a preset data type, converting the data record into a JSON character string;
generating an SQL command for the data record from the JSON character string, in the format of Structured Query Language (SQL); the SQL command is used for backing up the data record.
In the above manner, for any data record whose default primary key is a primary key of a preset data type, the data record can be converted into a JSON character string, from which an SQL command for the data record is then generated.
In a second aspect, the present invention provides a fragmentation information backup device for a distributed database, including: a backup module, configured to acquire data distribution information of the distributed database, where the data distribution information indicates the distribution of data records of each data table of the distributed database on a plurality of database instances, and to convert the data distribution information into information of a preset structure;
a parsing module, configured to, for any one of the data tables, locate the sub-information of the data table in the information of the preset structure, parse the sub-information and extract the fragment information of the data table;
the backup module being further configured to back up the fragment information of each data table.
Optionally, the parsing module is specifically configured to:
converting the data distribution information into a binary data input stream, and converting the data input stream into at least one character string according to a preset format;
and taking the at least one character string as the information of the preset structure.
Optionally, the parsing module is specifically configured to:
for a first character string, which is any one of the at least one character string, if the first character string is a character string with a database attribute, a database variable in the first character string indicates the distributed database, and a data table variable in the first character string indicates the data table, determining that the character strings within a preset position range corresponding to the first character string are the sub-information of the data table, where the character strings within the preset position range include the first character string.
Optionally, the parsing module is specifically configured to:
for any character string within the preset position range, if the character string is the first character string, parsing the library name of the distributed database and the table name of the data table from the character string;
if the character string includes a fragment identifier, parsing the fragment column name from the column name corresponding to the fragment identifier in the character string;
and taking the library name, the table name and the fragment column name as the fragment information of the data table.
Optionally, the parsing module is specifically configured to: querying a system table of the distributed database to obtain data block storage information of the plurality of database instances;
and determining the data distribution information according to the data block storage information of the plurality of database instances.
Optionally, the backup module is further configured to:
obtaining a table index object of each data table in the distributed database;
for any one of the data tables, determining an index field of the data table and an index attribute of the data table according to the table index object of the data table;
generating an index creating command of the data table according to the index field of the data table and the index attribute of the data table; the index creating command is used for backing up index information of the data table.
Optionally, the backup module is specifically configured to:
taking a field with a field name matched with an index identifier in the index object as an index field of the data table;
determining whether the index of the data table is a unique index according to the value of the first field in the index object;
and determining the expiration time of the index of the data table according to the value of the second field in the index object.
Optionally, the backup module is further configured to:
for any one of the data tables, reading the data records of the data table;
for any one of the data records of the data table, acquiring the default primary key of the data record, and if the default primary key is a primary key of a preset data type, converting the data record into a JSON character string;
generating an SQL command for the data record from the JSON character string, in the format of Structured Query Language (SQL); the SQL command is used for backing up the data record.
The advantageous effects of the second aspect and the various optional apparatuses of the second aspect may refer to the advantageous effects of the first aspect and the various optional methods of the first aspect, and are not described herein again.
In a third aspect, the present invention provides a computer device comprising a program or instructions for performing the method of the first aspect and the alternatives of the first aspect when the program or instructions are executed.
In a fourth aspect, the present invention provides a storage medium comprising a program or instructions which, when executed, is adapted to perform the method of the first aspect and the alternatives of the first aspect.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic flowchart corresponding to a fragmentation information backup method for a distributed database according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of information of a preset structure in a fragment information backup method for a distributed database according to an embodiment of the present invention;
Fig. 3 is a schematic timing diagram corresponding to a fragmentation information backup method for a distributed database according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a fragmentation information backup device of a distributed database according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
While a financial institution (e.g., a banking institution, an insurance institution or a securities institution) carries out its business (e.g., a bank's loan business or deposit business), the data of the distributed database is stored in a plurality of database instances (e.g., each host runs one database instance), and in some cases the distributed database needs to be backed up. Backup of a distributed database is currently directed at the data as a whole, and the fragment information of the distributed database on the plurality of database instances cannot be backed up. This does not meet the requirements of financial institutions such as banks, and the efficient operation of their various services cannot be ensured.
As shown in fig. 1, the present invention provides a fragmentation information backup method for a distributed database.
Step 101: acquiring data distribution information of the distributed database.
Step 102: converting the data distribution information into information of a preset structure.
Step 103: for any one of the data tables, locating the sub-information of the data table in the information of the preset structure; parsing the sub-information and extracting the fragment information of the data table.
Step 104: backing up the fragment information of each data table.
For example, the distributed database may be MongoDB, which is a database based on distributed file storage, a non-relational database, and a NoSQL database product. Sharding refers to the distributed database spreading its data across its instances. For example, MongoDB may scatter data across different MongoDB instances, each of which is one shard. If table A has 3 million records and the database has 3 shards (3 machines), each shard can store 1 million records. When a query is routed to a single shard, the expected data only has to be searched for in 1/3 of the original data range, which greatly improves query efficiency. The fragment information is the column names that must be specified in order to distribute data to different machines; these column names may also be referred to as shard keys.
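The effect of a shard key can be illustrated with a small sketch. It uses a generic hash-modulo routing rule as a stand-in, which is an assumption for illustration and not a description of how MongoDB assigns data to shards; the function `shard_for` and the `acct-<n>` key values are hypothetical, and the record count is scaled down from the 3 million in the example above.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value: str, shard_count: int) -> int:
    """Map a shard key value to a shard number with a stable hash."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Route 30,000 hypothetical account numbers across 3 shards.
counts = Counter(shard_for(f"acct-{i}", 3) for i in range(30000))
```

Each of the three shards ends up holding roughly a third of the records, so a query that is routed by the shard key only has to search one of them.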
It should be noted that, in current practical application scenarios, a complete backup of the distributed database must also back up the index information of each data table. The backed-up index information is used to restore the index of each data table in the distributed database, after which the backed-up data records of each data table can be restored against those indexes. The fragment information of each data table can be obtained by parsing the data distribution information, so that the backup covers the distributed database, each data table, and the fragment information of each data table.
It should be noted that, in an actual software implementation, the database backup configuration file may be modified before step 101 to specify the distributed database to connect to and the tables whose data records are to be exported. Part of the data in the distributed database may be selected for backup; that is, the data of the data tables may be all or part of the data in the distributed database. Each step may be encapsulated in a Shell script.
Specifically, the Shell script may encapsulate a backup function and a restore function. The backup function executes steps 101 to 104; it may wrap the command that runs the jar package, which simplifies user operation and reduces the probability of error, and the specific command can be backup. The restore function connects to the database and, in sequence, restores the index of each data table, the data records of each data table, and the fragment information of each data table; the specific command can be back up.
In a possible implementation manner, the step of backing up the index information to be backed up of each data table of the distributed database may specifically be:
obtaining a table index object of each data table in the distributed database;
for any one of the data tables, determining an index field of the data table and an index attribute of the data table according to the table index object of the data table;
generating an index creating command of the data table according to the index field of the data table and the index attribute of the data table; the index creating command is used for backing up index information of the data table.
It should be noted that the table index object of each data table records the index field and the index attribute of the data table. There are various ways to determine the index field and the index attribute from the table index object; for example, a variable may be defined in the table index object and the index field and index attribute recorded by assigning values to it, or a function may be defined and the index field and index attribute recorded as the result of running that function.
For example, the index creation command is specifically: collection.createIndex(keys, options), where collection.createIndex is the name of the index creation command, and keys and options are its parameters.
The keys parameter defines which fields make up the index and may also contain other information, such as the sort order of the index fields; a sample is { acctNo:1, mntSysTime:1 }. The options parameter defines the index attributes, such as whether the index is created in the background, whether it is unique, the index name, and the data expiration time.
In the above manner, after the index field of the data table and the index attribute of the data table are determined, the index creating command of the data table is generated according to the index field of the data table and the index attribute of the data table, so that the index of the data table is stored in the form of the index creating command, and the recovery of the distributed database is facilitated.
In the foregoing implementation manner, according to the table index object of the data table, a specific process of determining the index field of the data table and the index attribute of the data table may be as follows:
taking a field with a field name matched with an index identifier in the index object as an index field of the data table;
determining whether the index of the data table is a unique index according to the value of the first field in the index object;
and determining the expiration time of the index of the data table according to the value of the second field in the index object.
For example, the specific steps are as follows:
reading the name of each data table, so as to conveniently obtain the index and the backup data of the data table according to the table name; traversing the data table to obtain a table index object of the data table; and parsing the index field and the index attribute according to the table index object. Meanwhile, an index creation command is generated according to the analyzed index field and index attribute, so that subsequent index recovery is facilitated, and the specific steps are as follows:
assembly keys parameters: and acquiring a key field of the table index object, acquiring an index field object, and converting the object into a character string.
Assembly options parameters:
and acquiring a background field from the table index object to obtain whether the index needs background creation, whether the field can be empty, and the situation that the field needs to be compatible with the empty field can be obtained, and the background field can be specifically represented by the value of the background field.
And acquiring the name field from the table index object to obtain the index name.
And acquiring an unique field (first field) from the table index object to obtain whether the index is a unique index, wherein the field may be null and needs to be compatible with the situation of null.
And acquiring an expireAfterSeconds field (a second field) from the table index object to obtain the data expiration time, wherein the field may be null, and null indicates that the data never expires. Whether the expiration time is integer or floating point data needs to be judged, and then values of different data types are obtained according to the data types.
The character strings are spliced into an executable index command, and the command is written into a file. Sample: db.TABLE_NAME.createIndex({actNo: 1, mntSysTime: 1}, {name: idx_xxxx, background: xxx, expireAfterSeconds: xxx, unique: xxxx}), where TABLE_NAME is the name of the data table, createIndex is the index creation command, {actNo: 1, mntSysTime: 1} is the index field, and {name: idx_xxxx, background: xxx, expireAfterSeconds: xxx, unique: xxxx} is the set of index attributes; name, background, expireAfterSeconds and unique are all index attributes.
In the above manner, the index field of the data table and the index attribute of the data table are more accurately determined through index identifier matching and values of the first field and the second field.
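As a non-limiting illustration, the assembly of the keys and options parameters could be sketched as follows. This is a minimal Python sketch and not the implementation of the embodiment; the function name `build_create_index_command` and the dictionary layout of the table index object (`key`, `name`, `background`, `unique`, `expireAfterSeconds`) are assumptions modelled on the fields named above.

```python
import json


def build_create_index_command(table_name, index_obj):
    """Build a createIndex command string from a table index object.

    index_obj is assumed (hypothetically) to be a dict with a mandatory
    "key" entry and optional "name", "background", "unique" and
    "expireAfterSeconds" entries.
    """
    # keys parameter: the index field object converted to a character string
    keys = json.dumps(index_obj["key"])

    # options parameter: copy only the attributes that are present,
    # so that null (missing) fields are tolerated
    options = {}
    for attr in ("name", "background", "unique"):
        value = index_obj.get(attr)
        if value is not None:
            options[attr] = value

    expire = index_obj.get("expireAfterSeconds")
    if expire is not None:
        # the expiration time may be integer or floating-point data;
        # read the value according to its data type
        if isinstance(expire, int):
            options["expireAfterSeconds"] = int(expire)
        else:
            options["expireAfterSeconds"] = float(expire)

    return "db.{}.createIndex({}, {})".format(
        table_name, keys, json.dumps(options)
    )
```

An index object without a `background`, `unique` or `expireAfterSeconds` entry simply yields a command whose options omit those attributes, which corresponds to the handling of null fields described above.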
In a possible implementation manner, the specific step of backing up the data records to be backed up of each data table may be:
The data records of any one of the data tables are read; for any one of the data records of the data table, the default primary key of the data record is obtained, and if the default primary key is a primary key of a preset data type, the data record is converted into a json character string; an SQL command for the data record is generated from the json character string in the format of Structured Query Language (SQL); the SQL command is used for backing up the data record. The specific steps can be as follows:
The data of each data table to be backed up is configured through the configuration file, and an executable command is generated for the data read from each data table to facilitate subsequent data recovery. The specific steps are as follows:
Whether the current data table needs data backup is checked according to its table name; if not, data backup is skipped for that table. Otherwise, all data of the table is read and traversed, and the default primary key field, i.e. the "_id" field, is obtained. This field is the default primary key; if it is not filled in when data is inserted, it is generated automatically. This step needs to check whether the returned primary key is one with business meaning that the system of the distributed database generates itself (a primary key of the preset data type) or one without actual meaning that the database generates automatically. A primary key with business meaning must be kept, whereas one without business meaning must be ignored when the executable command is generated; the judgment is made by checking the data type of the returned "_id".
The data records to be backed up of each data table (i.e. the data of each data table) can be converted by a toJson method into character strings executable from the command line; the data insertion command is spliced and written into a file. Sample of the final spliced command: db.TABLE_NAME.insert({field A: value of field A, field B: value of field B}), where TABLE_NAME is the name of the data table; the command is spliced as an SQL command in the SQL format. It should be noted that the SQL command may or may not be an SQL statement; the SQL command and the SQL statement can be mapped to each other through preset rules, thereby reducing the storage space occupied by the SQL commands.
In the above manner, for any data record, if the default primary key is a primary key of a preset data type, the data record can be converted into a json character string, and then an SQL command of the data record is generated.
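As a non-limiting illustration, the handling of the default primary key when a data insertion command is spliced could be sketched as follows. This is a minimal Python sketch under stated assumptions: the class `GeneratedId` is a hypothetical stand-in for the data type of a primary key generated automatically by the database, and `build_insert_command` is an illustrative name, not part of the embodiment.

```python
import json


class GeneratedId:
    """Hypothetical stand-in for a database-generated primary key type."""

    def __init__(self, value):
        self.value = value


def build_insert_command(table_name, record):
    """Build an insert command string for one data record."""
    doc = dict(record)  # work on a copy so the caller's record is unchanged
    # A primary key generated automatically by the database carries no
    # business meaning, so it is ignored when the command is generated;
    # the judgment is made on the data type of "_id".
    if isinstance(doc.get("_id"), GeneratedId):
        del doc["_id"]
    return "db.{}.insert({})".format(table_name, json.dumps(doc))
```

A record whose `_id` is, for example, a plain string with business meaning keeps its primary key in the generated command, while a record whose `_id` is a `GeneratedId` is written without it.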
In a possible implementation manner, the specific step of acquiring the data distribution information of the distributed database on multiple database instances in step 101 may be:
and executing the state query command to obtain the data distribution information of the distributed database.
Step 102 may specifically be:
converting the data distribution information into a binary data input stream, and converting the data input stream into at least one character string according to a preset format; and taking the at least one character string as the information of the preset structure.
Specifically, this may be done as follows:
The sh.status command (the state query command) is executed. A distributed database (e.g., mongodb) cluster may have multiple database instances, and the command returns the data distribution information of all databases of the cluster, such as the block information of each fragment. However, the command is not a direct data query command, and the information it returns is unstructured, disordered text with poor readability. A data query command returns its result directly after execution, whereas the execution result of the sh.status command cannot be obtained directly: the result is only output to the console (in multiple rows), and the return value of the command is null. Therefore, an input stream can be obtained by calling a low-level Java runtime method.
In the method, the data distribution information is converted into a binary data input stream, and the data input stream is converted into at least one character string according to a preset format, so that the data distribution information is standardized and structured, and the obtained information of the preset structure can be more convenient for data processing and extraction of fragment information of each data table.
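As a non-limiting illustration, capturing console output as a binary stream and converting it into character strings could be sketched as follows. The embodiment refers to a Java runtime method; this Python sketch swaps in the standard `subprocess` module for the same purpose. The function name `run_and_read_lines` is hypothetical, and a child Python process that prints two rows stands in for the real state query command so that the sketch runs without a database.

```python
import subprocess
import sys


def run_and_read_lines(argv, encoding="utf-8"):
    """Run a command and return its console output as a list of strings."""
    # stdout is captured as a binary stream, because the command prints
    # its result to the console instead of returning it
    completed = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    text = completed.stdout.decode(encoding)
    # preset format: one character string per non-empty output row,
    # with leading and trailing whitespace removed
    return [line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for the real state query command (assumption for illustration)
demo_argv = [
    sys.executable,
    "-c",
    "print('databases:'); print('  shard key: { \"a\" : 1 }')",
]
lines = run_and_read_lines(demo_argv)
```

Treating each output row as one character string keeps the later parsing steps simple, since each row can be tested independently for its prefix.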
Specifically, the specific process of analyzing the data distribution information to obtain the fragmentation information of each data table may be as follows:
For a first character string, which is any one of the at least one character string: if the first character string is a character string with the database attribute, a database variable in the first character string indicates the distributed database, and a data table variable in the first character string indicates the data table, then the character strings within a preset position range corresponding to the first character string are determined to be the sub information of the data table; the character strings within the preset position range include the first character string.
The specific steps of analyzing the sub-information and extracting the fragment information of the data table are as follows:
for any character string in the character strings in the preset position range, if the character string is the first character string, analyzing the library name of the distributed database and the table name of the data table according to the character string; if the character string comprises the fragment identification, analyzing the fragment column name according to the column name corresponding to the fragment identification in the character string; and taking the library name, the table name and the fragment column name as fragment information of the data table.
In the above manner, the first character string analyzes the library name of the distributed database and the table name of the data table, and analyzes the character string including the fragment identifier to obtain the fragment column name, so that the library name, the table name and the fragment column name are all obtained, and thus the fragment information of the data table can clearly show the database data table information and the basis of distribution on the corresponding database instance.
The specific steps can be as follows:
Check whether the character string starts with "database" (i.e., whether it is a character string with the database attribute); if so, set the gotDatabaseNode variable to true. If gotDatabaseNode (the database variable) is true (the preset value) and the character string starts with "{ \"_id\" : \"" + the database name of the distributed database + "\"", set gotDbnameNode to true.
If gotDbnameNode is true and the character string starts with the database name of the distributed database, the library name of the distributed database and the table name of the data table can be parsed from the character string; the row following it (a character string within the preset position range) is the character string that contains the fragment identifier, from which the fragment column name can be parsed.
The fragment column name is parsed as follows:
The next character string after the first character string is obtained. If this row starts with 'shardkey', the part of the character string after 'shardkey' is intercepted and used as the fragment column name. Sample: shard key { "custaccntNo": 1, "entityCode": 1 }.
The results of the two interception steps are spliced into a fragmentation command executable by the database. Sample: runCommand({shardCollection: "database name.table name", key: {"field A": 1.0, "field B": 1.0, "field C": 1.0}}). The result is written into the file at the same time.
Each character string is traversed according to the above steps until all character strings have been traversed. It should be noted that each character string may, for example, be set as one row to facilitate parsing.
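As a non-limiting illustration, the traversal described above could be sketched as a small state machine. This is a minimal Python sketch, not the implementation of the embodiment. It assumes that each row of the state query output is one character string, that database rows begin with `{ "_id" : "name"` after whitespace is collapsed, and that the fragment identifier appears as `shard key:`; the function name and the `key_marker` parameter are hypothetical.

```python
import re


def parse_fragment_commands(lines, db_name, key_marker="shard key:"):
    """Extract fragment information and build one command per data table."""
    commands = []
    got_database_node = False  # set once the "databases" section starts
    got_dbname_node = False    # set once the row of db_name is reached
    current_table = None
    for raw in lines:
        line = " ".join(raw.split())  # collapse runs of whitespace
        if line.startswith("databases"):
            got_database_node = True
        elif got_database_node and line.startswith('{ "_id" :'):
            # a database row: check whether it names the target database
            match = re.match(r'\{ "_id" : "([^"]+)"', line)
            got_dbname_node = bool(match) and match.group(1) == db_name
            current_table = None
        elif got_dbname_node and line.startswith(db_name + "."):
            # "library name.table name" row: intercept the table name
            current_table = line[len(db_name) + 1:]
        elif current_table is not None and line.startswith(key_marker):
            # row with the fragment identifier: intercept the column names
            fragment_columns = line[len(key_marker):].strip()
            commands.append(
                'runCommand({{shardCollection: "{}.{}", key: {}}})'.format(
                    db_name, current_table, fragment_columns
                )
            )
            current_table = None
    return commands
```

Rows that belong to other databases are skipped because `got_dbname_node` is reset whenever a database row with a different name is encountered.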
The backup of the table structure, the data and the fragment information is completed by the above steps. If recovery is not required, the following steps need not be performed. The next step specifies the database to which the backed-up data is to be restored.
In another possible implementation manner, the specific step of acquiring the data distribution information of the distributed database on multiple database instances in step 101 may be:
inquiring a system table of the distributed database to obtain data block storage information in the multiple database instances;
and determining the data distribution information according to the data block storage information in the plurality of database instances.
In a relational database, all table information, including indexes, fields, data types, primary keys and partitions, can be found in the metadata database. By analogy with a relational database, a chunks table can be found in a distributed database (e.g., mongodb); it is a system table of mongodb and stores the data block storage information of the data blocks in the mongodb cluster. The fragmentation command can be generated by further parsing the fragment information obtained from the data block storage information.
In the above manner, the system table of the distributed database is queried to obtain the data block storage information in the multiple database instances, and the data distribution information can be directly obtained from physical distribution, so that the method is faster and more direct.
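As a non-limiting illustration, deriving data distribution information from data block storage information could be sketched as follows. This is a minimal Python sketch under stated assumptions: each record read from the system table is assumed to be a dictionary with an `ns` entry ("library name.table name") and a `shard` entry naming the database instance; the actual layout of the system table depends on the database version and is not specified here. The function name `distribution_from_chunks` is hypothetical.

```python
from collections import defaultdict


def distribution_from_chunks(chunk_docs):
    """Count data blocks per data table and per database instance."""
    distribution = defaultdict(lambda: defaultdict(int))
    for chunk in chunk_docs:
        # one more data block of table chunk["ns"] on instance chunk["shard"]
        distribution[chunk["ns"]][chunk["shard"]] += 1
    # convert to plain dicts for easy comparison and printing
    return {ns: dict(per_shard) for ns, per_shard in distribution.items()}
```

The result maps each data table to the number of its data blocks on each database instance, which is one possible form of the data distribution information.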
In the method of step 101 to step 104, after the data distribution information is converted into the information of the preset structure, the sub information of the data table can be located from the information of the preset structure, so as to analyze the sub information, and the fragment information of the data table is extracted, so that the fragment information of the data table is directly extracted through data processing and conversion based on the data distribution information, and thus the fragment information of the data table can be backed up.
The following describes the fragmentation information backup method of a distributed database in further detail with reference to fig. 3.
Step 1: Modify the database connection information and specify which tables need data backup.
Step 2: Execute the sh backup command.
Step 3: Obtain the data tables of the distributed database (all data tables are taken as the example in fig. 3).
Step 4: Traverse each data table of the distributed database to obtain the index object (index information object) of each data table.
Step 5: Parse the returned index object, filter out the index-related information (index field and index attributes), and generate an executable command (the index creation command).
Step 6: Read the data records (basic data) of the data tables according to the configuration file.
Step 7: Generate executable commands from the data records of each data table.
Step 8: Execute sh.status.
Step 9: Parse the returned text (the information of the preset structure) and generate the executable command for the fragment information.
Step 10: Specify the information of the database to be restored.
Step 11: Execute the backup.
Step 12: Execute the index creation command of step 5.
Step 13: Execute the basic data recovery command of step 7.
Step 14: Execute the fragment creation command of step 10.
It should be noted that, based on the method shown in fig. 3, the index, data records and fragment information of the distributed database are backed up by the backup command after each version goes online, so that the database can be restored to any version when restoration is required. This eliminates the need to manually collate and integrate the database scripts of each historical version. In addition, data tables that do not need backup are filtered out and only the specified data tables are backed up, which is efficient. For data recovery, only sh restore needs to be executed; the command is simple and easy to operate.
As shown in fig. 4, the present invention provides a fragmentation information backup device for a distributed database, including: the backup module 401 is configured to obtain data distribution information of a distributed database; the data distribution information indicates the distribution of data records of each data table of the distributed database on a plurality of database instances; converting the data distribution information into information of a preset structure;
an analyzing module 402, configured to locate, for any one of the data tables, sub information of the data table from the information of the preset structure; analyzing the sub information and extracting the fragment information of the data table;
the backup module 401 is further configured to backup the fragment information of each data table.
Optionally, the parsing module 402 is specifically configured to:
converting the data distribution information into a binary data input stream, and converting the data input stream into at least one character string according to a preset format;
and taking the at least one character string as the information of the preset structure.
Optionally, the parsing module 402 is specifically configured to:
aiming at a first character string, the first character string is any character string in at least one character string, if the first character string is a character string with database attributes, and a database variable in the first character string indicates the distributed database, and a data table variable in the first character string indicates the data table, the character string in a preset position range corresponding to the first character string is determined to be sub information of the data table, and the character string in the preset position range comprises the first character string.
Optionally, the parsing module 402 is specifically configured to:
for any character string in the character strings in the preset position range, if the character string is the first character string, analyzing the library name of the distributed database and the table name of the data table according to the character string;
if the character string comprises the fragment identification, analyzing the fragment column name according to the column name corresponding to the fragment identification in the character string;
and taking the library name, the table name and the fragment column name as fragment information of the data table.
Optionally, the parsing module 402 is specifically configured to: inquiring a system table of the distributed database to obtain data block storage information in the multiple database instances;
and determining the data distribution information according to the data block storage information in the plurality of database instances.
Optionally, the backup module 401 is further configured to:
obtaining a table index object of each data table in the distributed database;
aiming at any data table in the data tables, determining an index field of the data table and an index attribute of the data table according to a table index object of the data table;
generating an index creating command of the data table according to the index field of the data table and the index attribute of the data table; the index creating command is used for backing up index information of the data table.
Optionally, the backup module 401 is specifically configured to:
taking a field with a field name matched with an index identifier in the index object as an index field of the data table;
determining whether the index of the data table is a unique index according to the value of the first field in the index object;
and determining the expiration time of the index of the data table according to the value of the second field in the index object.
Optionally, the backup module 401 is further configured to:
reading data records of the data tables aiming at any data table in the data tables;
aiming at any data record of the data records of the data table, acquiring a default primary key in the data record, and if the default primary key is a primary key of a preset data type, converting the data record into a json character string;
generating an SQL command of the data record according to the json character string in the format of Structured Query Language (SQL); the SQL command is used for backing up the data record.
Based on the same inventive concept, the embodiment of the present invention further provides a computer device, which includes a program or an instruction, and when the program or the instruction is executed, the fragmentation information backup method and any optional method of the distributed database provided by the embodiment of the present invention are executed.
Based on the same inventive concept, the embodiment of the present invention further provides a computer-readable storage medium, which includes a program or an instruction, and when the program or the instruction is executed, the fragmentation information backup method and any optional method of the distributed database provided by the embodiment of the present invention are executed.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A fragment information backup method of a distributed database is characterized by comprising the following steps:
acquiring data distribution information of a distributed database; the data distribution information indicates the distribution of data records of each data table of the distributed database on a plurality of database instances;
converting the data distribution information into information of a preset structure;
aiming at any data table in the data tables, positioning sub information of the data table from the information of the preset structure; analyzing the sub information and extracting the fragment information of the data table;
and backing up the fragment information of each data table.
2. The method of claim 1, wherein the converting the data distribution information into the information of the preset structure comprises:
converting the data distribution information into a binary data input stream, and converting the data input stream into at least one character string according to a preset format;
and taking the at least one character string as the information of the preset structure.
3. The method of claim 2, wherein said locating the sub information of the data table from the information of the preset structure comprises:
aiming at a first character string, the first character string is any character string in at least one character string, if the first character string is a character string with database attributes, and a database variable in the first character string indicates the distributed database, and a data table variable in the first character string indicates the data table, the character string in a preset position range corresponding to the first character string is determined to be sub information of the data table, and the character string in the preset position range comprises the first character string.
4. The method of claim 3, wherein the parsing the sub-information to extract fragmentation information for the data table comprises:
for any character string in the character strings in the preset position range, if the character string is the first character string, analyzing the library name of the distributed database and the table name of the data table according to the character string;
if the character string comprises the fragment identification, analyzing the fragment column name according to the column name corresponding to the fragment identification in the character string;
and taking the library name, the table name and the fragment column name as fragment information of the data table.
5. The method of claim 1, wherein the obtaining data distribution information for a distributed database comprises:
inquiring a system table of the distributed database to obtain data block storage information in the multiple database instances;
and determining the data distribution information according to the data block storage information in the plurality of database instances.
6. The method of any one of claims 1-5, further comprising:
obtaining a table index object of each data table in the distributed database;
aiming at any data table in the data tables, determining an index field of the data table and an index attribute of the data table according to a table index object of the data table;
generating an index creating command of the data table according to the index field of the data table and the index attribute of the data table; the index creating command is used for backing up index information of the data table.
7. The method of claim 6, wherein determining the index field of the data table and the index attribute of the data table from the table index object of the data table comprises:
taking a field with a field name matched with an index identifier in the index object as an index field of the data table;
determining whether the index of the data table is a unique index according to the value of the first field in the index object;
and determining the expiration time of the index of the data table according to the value of the second field in the index object.
8. The method of any one of claims 1-5, further comprising:
reading data records of the data tables aiming at any data table in the data tables;
aiming at any data record of the data records of the data table, acquiring a default primary key in the data record, and if the default primary key is a primary key of a preset data type, converting the data record into a json character string;
generating an SQL command of the data record according to the json character string in the format of Structured Query Language (SQL); the SQL command is used for backing up the data record.
9. A fragmentation information backup device of a distributed database is characterized by comprising:
the backup module is used for acquiring data distribution information of the distributed database; the data distribution information indicates the distribution of data records of each data table of the distributed database on a plurality of database instances; converting the data distribution information into information of a preset structure;
the analysis module is used for positioning the sub information of the data table from the information of the preset structure aiming at any data table in the data tables; analyzing the sub information and extracting the fragment information of the data table;
the backup module is further configured to backup the fragmentation information of each data table.
10. A computer device comprising a program or instructions that, when executed, perform the method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110700134.4A CN113407378A (en) | 2021-06-23 | 2021-06-23 | Fragmentation information backup method and device for distributed database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113407378A true CN113407378A (en) | 2021-09-17 |
Family
ID=77682723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110700134.4A Pending CN113407378A (en) | 2021-06-23 | 2021-06-23 | Fragmentation information backup method and device for distributed database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113407378A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117609209A (en) * | 2023-11-29 | 2024-02-27 | 星环信息科技(上海)股份有限公司 | Data recovery method, data recovery device, data recovery equipment, and storage medium |
CN117609209B (en) * | 2023-11-29 | 2024-08-16 | 星环信息科技(上海)股份有限公司 | Data recovery method, data recovery device, data recovery equipment, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||