WO2016169237A1 - Data processing method and apparatus - Google Patents
- Publication number
- WO2016169237A1 (application PCT/CN2015/092759)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- split
- import
- processing
- module
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING
  - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
  - G06F16/22—Indexing; Data structures therefor; Storage structures (under G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data)
  - G06F16/258—Data format conversion from or to a database (under G06F16/25—Integrating or interfacing systems involving database management systems, within G06F16/20)
Definitions
Definitions
- The present invention relates to the field of communications, and in particular to a data processing method and apparatus.
- Database technology is the core of information systems such as management information systems, office automation systems, and decision support systems, and is an important technical means for scientific research and decision management.
- The present invention provides a data processing method and apparatus to solve at least the problem of low data import efficiency in the related art.
- A data processing method includes: receiving a data import instruction for instructing that data be imported into a database; splitting the data according to the data import instruction; and importing the split data in blocks into different storage spaces in the database.
- Splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to the identified data row fields.
- Alternatively, splitting the data according to the data import instruction includes: judging whether the data satisfies a splitting rule; if so, splitting the data; if not, correcting the data and then splitting the corrected data.
- Importing the split data into the different storage spaces in the database includes: downloading the split data; and importing the downloaded split data into the different storage spaces in the database.
- The method further includes: deleting the downloaded split data.
- The method further includes: summarizing the import results after the split data is imported; and feeding back the import results.
- A data processing apparatus includes: a receiving module configured to receive a data import instruction for instructing that data be imported into a database; a processing module configured to split the data according to the data import instruction; and an import module configured to import the split data into different storage spaces in the database.
- The processing module includes: a determining unit configured to determine, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; an identifying unit configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and a first processing unit configured to split the data according to the identified data row fields.
- Alternatively, the processing module includes: a judging unit configured to judge whether the data satisfies a splitting rule; a second processing unit configured to split the data when the judging unit's result is yes; a correcting unit configured to correct the data when the result is no; and a third processing unit configured to split the corrected data.
- The import module includes: a download unit configured to download the split data; and an import unit configured to import the downloaded split data in blocks into different storage spaces in the database.
- The apparatus further includes: a deletion module configured to delete the downloaded split data.
- The apparatus further includes: a summary module configured to summarize the import results after the split data is imported; and a feedback module configured to feed back the import results.
- A data import instruction for instructing that data be imported into a database is received; the data is split according to the data import instruction; and the split data blocks are imported into different storage spaces in the database.
- This solves the problem of low data import efficiency in the related art and improves data import efficiency.
- FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention.
- FIG. 3 is a first structural block diagram of a processing module 24 in a data processing apparatus according to an embodiment of the present invention.
- FIG. 4 is a block diagram showing a second structure of the processing module 24 in the data processing apparatus according to an embodiment of the present invention.
- FIG. 5 is a structural block diagram of an import module 26 in a data processing apparatus according to an embodiment of the present invention.
- FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention.
- FIG. 7 is a block diagram showing a second preferred structure of a data processing apparatus according to an embodiment of the present invention.
- FIG. 8 is a block diagram showing the structure of an import system according to an embodiment of the present invention.
- FIG. 9 is a flow chart of data import processing in accordance with an embodiment of the present invention.
- FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
- Step S102: receive a data import instruction for instructing that data be imported into the database;
- Step S104: split the data according to the data import instruction;
- Step S106: import the split data blocks into different storage spaces in the database.
- The above database may be a distributed database system, which builds a highly available and highly scalable cluster from ordinary, inexpensive equipment and thereby removes the dependence on large equipment.
- A good distributed database architecture can readily achieve high availability and can scale out.
- Import and export of large data volumes is a key technology in distributed databases.
- The data may be split according to the table used for splitting, where splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to the identified data row fields.
- The validity of the data import instruction may first be checked; the table structure information and distribution policy information of the destination database table are then obtained, and the data file is read and split, according to the table structure information and distribution policy, into multiple small files, one for each underlying database (that is, the storage space described above), which are transferred to a specified directory on each destination underlying database.
- The cluster management module then issues instructions to each underlying database to import its corresponding file.
- Alternatively, splitting the data according to the data import instruction includes: judging whether the data satisfies the splitting rule; if so, splitting the data; if not, correcting the data and then splitting the corrected data.
- The correction can be done in several ways: manually by an administrator; without manual intervention, by the module that performs the splitting invoking other modules that correct the data according to certain correction rules; or by an administrator and the corresponding modules working together, and so on.
- With this method it is known promptly whether the data to be imported into the database can be split, which further improves splitting efficiency.
- Erroneous rows can be extracted to ensure the correctness of the imported data.
- When the split data blocks are imported into different storage spaces in the database, the split data may first be downloaded, and the downloaded split data is then imported in blocks into different storage spaces in the database.
- The method further includes: deleting the downloaded split data. This clears out garbage data files and reduces the storage they occupy, allowing the database to store more data.
- The import results may also be fed back.
- The method further includes: summarizing the import results after the split data is imported, and feeding back the import results. This lets the user clearly determine the outcome of the import.
- The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
- The technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied as a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including several instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, or the like) to perform the methods of the embodiments of the present invention.
- A data processing apparatus is also provided, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated.
- The term “module” may refer to a combination of software and/or hardware that implements a predetermined function.
- Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated.
- FIG. 2 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a receiving module 22, a processing module 24, and an importing module 26. The apparatus will be described below.
- The receiving module 22 is configured to receive a data import instruction for instructing that data be imported into the database; the processing module 24, connected to the receiving module 22, is configured to split the data according to the data import instruction; and the import module 26, connected to the processing module 24, is configured to import the split data into different storage spaces in the database.
- FIG. 3 is a first structural block diagram of a processing module 24 in a data processing apparatus according to an embodiment of the present invention.
- As shown in FIG. 3, the processing module 24 includes a determining unit 32, an identifying unit 34, and a first processing unit 36, which are described below.
- The determining unit 32 is configured to determine, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; the identifying unit 34, connected to the determining unit 32, is configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and the first processing unit 36, connected to the identifying unit 34, is configured to split the data according to the identified data row fields.
- FIG. 4 is a second structural block diagram of a processing module 24 in a data processing apparatus according to an embodiment of the present invention.
- As shown in FIG. 4, the processing module 24 includes a judging unit 42, a second processing unit 44, a correcting unit 46, and a third processing unit 48, which are described below.
- The judging unit 42 is configured to judge whether the data satisfies the splitting rule; the second processing unit 44, connected to the judging unit 42, is configured to split the data when the judging unit 42's result is yes; the correcting unit 46, connected to the judging unit 42, is configured to correct the data when the result is no; and the third processing unit 48, connected to the correcting unit 46, is configured to split the corrected data.
- As shown in FIG. 5, the import module 26 includes a download unit 52 and an import unit 54, which are described below.
- The download unit 52 is configured to download the split data.
- The import unit 54, connected to the download unit 52, is configured to import the downloaded split data into different storage spaces in the database.
- FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown in FIG. 5, the apparatus includes a deletion module 62, which is described below.
- The deletion module 62, connected to the import module 26, is configured to delete the downloaded split data.
- FIG. 7 is a block diagram of a second preferred structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, in addition to all the modules shown in FIG. 2, the apparatus includes a summary module 72 and a feedback module 74, which are described below.
- The summary module 72, connected to the import module 26, is configured to summarize the import results after the split data is imported.
- The feedback module 74, connected to the summary module 72, is configured to feed back the import results.
- Existing solutions in the related art all target a traditional single database and need not consider the distribution structure of tables or the system architecture, so their efficiency is low.
- The solution in the embodiments of the present invention is based on a distributed database system, satisfies the atomicity/consistency/isolation/durability (ACID) properties of a database, and can be executed concurrently.
- ACID: atomicity/consistency/isolation/durability
- As shown in FIG. 8, the import system includes a data import client module 82 (which may be located between an external system and the data import server module, or within the external system; it is not shown in FIG. 8), a data import server module 84 (corresponding to the download server 84 in FIG. 8, and equivalent to the receiving module 22, processing module 24, and import module 26 above), a metadata center module 86 (corresponding to the metadata server 86 in FIG. 8), a cluster management center module 88 (corresponding to the cluster manager 88 in FIG. 8, and equivalent to the summary module 72 and feedback module 74 above), a database agent module 810 (equivalent to the deletion module 62 above), and a database module 812. Each module is described below.
- The data import client module 82 (LoadClient) mainly faces the user; the user initiates import and export commands through this module.
- The data import server module 84 (LoadServer) is configured to accept import and export commands sent by the client, split and merge data files according to the data distribution policy, and interact with the other modules to coordinate the entire import and export process.
- The metadata center module 86 is configured to store and manage all metadata of the entire distributed database system.
- The cluster management center module 88 is mainly responsible for monitoring, managing, and maintaining the database clusters (DBClusters).
- The database agent module 810 is a database node management and monitoring module. It monitors in real time whether the DB nodes under its management are running normally and periodically collects runtime statistics.
- The database module 812 is the underlying module that stores all data.
- The data import server module 84 queries the metadata center module 86 for the table's metadata according to the cluster ID, database name, and table name, to obtain the table structure definition and data distribution information;
- The data import server module 84 uses the obtained information (plus the data file descriptor information) to identify each data row field in the data file (datafilename) and splits the data file;
- The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroup it manages;
- After every database agent module 810 has downloaded successfully, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command;
- The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to delete the garbage data files (here, the downloaded data that has already been loaded);
- The data import server module 84 summarizes the results and notifies the data import client module 82.
- FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:
- Step S902: the data import client module 82 sends an import data request to the data import server module 84;
- Step S904: the data import server module 84 sends a database metadata query request to the metadata center module 86 according to the cluster ID, database name, and table name; the request queries the table's metadata;
- Step S906: the metadata center module 86 returns the table structure definition and data distribution information, including the type and length of each field of the table, the distribution key, and the DBGroups across which the table is distributed;
- Step S908: the data import server module 84 uses the information returned by the metadata center module 86 (plus the data file descriptor information) to identify each data row field in the data file (datafilename) and splits the data file. If erroneous data is found during splitting, for example a value whose type does not match the table definition, it is picked out and placed in an error file;
- Step S910: the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroup it manages;
- Step S912: the cluster management center module 88 notifies each database agent module 810 to download the split files for the DBGroup it manages;
- Step S914: each database agent module 810 connects through the FTP service to the server hosting the data import server module 84 and downloads its corresponding split files; after a successful download, it returns a success response to the cluster management center module 88;
- Step S916: the cluster management center module 88 summarizes the download results;
- Step S918: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84;
- Step S920: after the downloads succeed, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command;
- Step S922: the cluster management center module 88 notifies the database agent modules 810 to execute the actual LOAD DATA INFILE command;
- Step S924: each database agent module 810 connects to the database module it manages and executes the actual LOAD DATA INFILE command; after the command succeeds, it returns a success response to the cluster management center module 88;
- Step S926: after receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84. After the LOAD DATA INFILE commands succeed, the data import server module 84 again requests the cluster management center module 88 to notify each database agent module to delete the garbage data files; the data import server module 84 then summarizes the results and notifies the data import client module 82.
- The solution in the above embodiments is based on a distributed database system and can import all data types supported by the MySQL database; other data types can of course also be supported.
- Applying the solution of the embodiments of the present invention to a distributed database system can increase concurrency by a factor of 2 to 3, balance load, ensure the correctness of imported and exported data, and make the system more robust.
- Each of the above modules may be implemented by software or hardware.
- For the latter, this may be done in the following ways, without being limited thereto: the modules are all located in the same processor, or the modules are located in multiple processors respectively.
- Embodiments of the present invention also provide a storage medium.
- The storage medium may be configured to store program code for performing the following steps:
- The storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
- ROM Read-Only Memory
- RAM Random Access Memory
- The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module.
- The invention is not limited to any specific combination of hardware and software.
- The data processing method and apparatus provided by the embodiments of the present invention have the following beneficial effects: they solve the problem of low data import efficiency in the related art and improve data import efficiency.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
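The present invention provides a data processing method and apparatus. The method includes: receiving a data import instruction for instructing that data be imported into a database; splitting the data according to the data import instruction; and importing the split data block by block into different storage spaces in the database. The invention solves the problem of low data import efficiency in the related art and thereby improves data import efficiency.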
Description
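The present invention relates to the field of communications, and in particular to a data processing method and apparatus.
With the development of technology, databases play an increasingly important role in people's lives. In today's information society, managing and using all kinds of resources fully and effectively is a prerequisite for scientific research and decision management. Database technology is the core of information systems such as management information systems, office automation systems, and decision support systems, and is an important technical means for scientific research and decision management.
Traditional database systems generally guarantee database integrity through high-end equipment, such as minicomputers or high-end storage, or improve database processing capability by adding memory and central processing units (CPUs). However, this centralized database architecture is increasingly unsuitable for processing massive data: data import is inefficient and the cost is high.
No effective solution has yet been proposed for the problem of low data import efficiency in the related art.
Summary of the Invention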
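The present invention provides a data processing method and apparatus to solve at least the problem of low data import efficiency in the related art.
According to one aspect of the present invention, a data processing method is provided, including: receiving a data import instruction for instructing that data be imported into a database; splitting the data according to the data import instruction; and importing the split data block by block into different storage spaces in the database.
Optionally, splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of a table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to the identified data row fields.
Optionally, splitting the data according to the data import instruction includes: judging whether the data satisfies a splitting rule; if so, splitting the data; if not, correcting the data and then splitting the corrected data.
Optionally, importing the split data block by block into different storage spaces in the database includes: downloading the split data; and importing the downloaded split data block by block into different storage spaces in the database.
Optionally, after the split data is imported block by block into different storage spaces in the database, the method further includes: deleting the downloaded split data.
Optionally, after the split data is imported block by block into different storage spaces in the database, the method further includes: summarizing the import results of importing the split data; and feeding back the import results.
According to another aspect of the present invention, a data processing apparatus is provided, including: a receiving module, configured to receive a data import instruction for instructing that data be imported into a database; a processing module, configured to split the data according to the data import instruction; and an import module, configured to import the split data block by block into different storage spaces in the database.
Optionally, the processing module includes: a determining unit, configured to determine, according to the data import instruction, the table structure of a table and the data distribution information of the data on the table; an identifying unit, configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and a first processing unit, configured to split the data according to the identified data row fields.
Optionally, the processing module includes: a judging unit, configured to judge whether the data satisfies a splitting rule; a second processing unit, configured to split the data when the judging unit's result is yes; a correcting unit, configured to correct the data when the judging unit's result is no; and a third processing unit, configured to split the corrected data.
Optionally, the import module includes: a download unit, configured to download the split data; and an import unit, configured to import the downloaded split data block by block into different storage spaces in the database.
Optionally, the apparatus further includes: a deletion module, configured to delete the downloaded split data.
Optionally, the apparatus further includes: a summary module, configured to summarize the import results of importing the split data; and a feedback module, configured to feed back the import results.
With the present invention, a data import instruction for instructing that data be imported into a database is received, the data is split according to the data import instruction, and the split data is imported block by block into different storage spaces in the database. This solves the problem of low data import efficiency in the related art and thereby improves data import efficiency.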
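The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions explain the present invention and do not constitute an undue limitation on it. In the drawings:
FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 3 is a first structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention;
FIG. 4 is a second structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a structural block diagram of the import module 26 in a data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of a second preferred structure of a data processing apparatus according to an embodiment of the present invention;
FIG. 8 is a structural block diagram of an import system according to an embodiment of the present invention;
FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention.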
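The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with each other.
It should be noted that the terms “first”, “second”, and so on in the specification, claims, and drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
This embodiment provides a data processing method. FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: receive a data import instruction for instructing that data be imported into a database;
Step S104: split the data according to the data import instruction;
Step S106: import the split data block by block into different storage spaces in the database.
Through the above steps, when data is imported into a database, the data is first split and the split data is then imported block by block into different storage spaces of the database; because the blocks can be imported in parallel, import efficiency improves. This solves the problem of low data import efficiency in the related art and thereby improves data import efficiency. The above database may be a distributed database system, which builds a highly available and highly scalable cluster from ordinary, inexpensive equipment and thereby removes the dependence on large equipment. A good distributed database architecture can achieve high availability relatively easily and can scale out. Importing and exporting large volumes of data is one of the more critical technologies in a distributed database.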
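The split-then-import-in-parallel idea of steps S102 to S106 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `import_data`, `split`, and `load_chunk` are hypothetical names, and a thread pool stands in for the separate storage spaces that load their blocks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical callback types: split rows into per-storage-space chunks,
# and load one chunk into one storage space (returning the rows loaded).
Splitter = Callable[[List[str]], Dict[str, List[str]]]
Loader = Callable[[str, List[str]], int]


def import_data(rows: List[str], split: Splitter, load_chunk: Loader,
                max_workers: int = 4) -> Dict[str, int]:
    """Split the rows (step S104), then load every chunk in parallel (step S106)."""
    chunks = split(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per storage space; the chunks are independent, so loads overlap.
        futures = {space: pool.submit(load_chunk, space, chunk)
                   for space, chunk in chunks.items()}
        # result() re-raises any exception raised inside a load task.
        return {space: fut.result() for space, fut in futures.items()}


if __name__ == "__main__":
    by_parity = lambda rs: {"db0": [r for r in rs if int(r) % 2 == 0],
                            "db1": [r for r in rs if int(r) % 2 == 1]}
    print(import_data(["1", "2", "3", "4", "5"], by_parity, lambda s, c: len(c)))
```

In the system of FIG. 8 the parallelism comes from separate database agents on separate nodes rather than threads in one process; the thread pool only models that concurrency.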
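There are multiple ways to split the data. In an optional embodiment, the data may be split according to the table used for splitting, where splitting the data according to the data import instruction includes: determining, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and splitting the data according to the identified data row fields. After the data import instruction is received, its validity may first be checked; the table structure information and distribution policy information of the destination database table are then obtained, and the data file is read and split, according to the table structure information and distribution policy, into multiple small files, one for each underlying database (that is, the storage space described above). These files are transferred to a specified directory on each destination underlying database, and the cluster management module then issues instructions to each underlying database to import its corresponding file.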
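A minimal sketch of the split described above, assuming comma-delimited rows, a known column position for the distribution key, and hash distribution across DBGroups. `split_rows`, `write_chunks`, and the `<base_name>.<group>` file naming are hypothetical; the description does not name the distribution function or the file format.

```python
import zlib
from pathlib import Path
from typing import Dict, List, Sequence


def split_rows(lines: Sequence[str], key_index: int, db_groups: Sequence[str],
               delimiter: str = ",") -> Dict[str, List[str]]:
    """Route each row to a DBGroup by hashing its distribution-key field.

    Assumption: hash distribution (CRC32 modulo the group count). The source
    only says a distribution policy is used; it does not name the function.
    """
    chunks: Dict[str, List[str]] = {g: [] for g in db_groups}
    for line in lines:
        row = line.rstrip("\n")
        fields = row.split(delimiter)              # identify the row's fields
        key = fields[key_index].encode("utf-8")
        group = db_groups[zlib.crc32(key) % len(db_groups)]
        chunks[group].append(row)
    return chunks


def write_chunks(chunks: Dict[str, List[str]], out_dir: Path,
                 base_name: str) -> Dict[str, Path]:
    """Write one small file per DBGroup, named <base_name>.<group>."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: Dict[str, Path] = {}
    for group, rows in chunks.items():
        path = out_dir / f"{base_name}.{group}"
        path.write_text("".join(r + "\n" for r in rows), encoding="utf-8")
        paths[group] = path
    return paths
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, while the same key must always land in the same DBGroup.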
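Data can be split only if it satisfies a predetermined splitting rule; when data does not satisfy the rule, it must be corrected so that it does. In an optional embodiment, splitting the data according to the data import instruction includes: judging whether the data satisfies the splitting rule; if so, splitting the data; if not, correcting the data and then splitting the corrected data. The correction can be done in several ways: manually by an administrator; without manual intervention, by the module that performs the splitting invoking other modules that correct the data according to certain correction rules; or by an administrator and the corresponding modules working together, and so on. With this method it is known promptly whether the data to be imported into the database can be split, which further improves splitting efficiency. In addition, when the data to be imported contains errors, the erroneous rows can be extracted to ensure that the imported data is correct.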
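The judge, correct, and split logic, together with the extraction of erroneous rows, might look like the sketch below. The splitting rule here is an assumption (matching field count, integer parsing, and a maximum varchar length), as is the single automatic correction (trimming whitespace around fields); rows that still fail go to an error list instead of being split. `Column`, `field_ok`, and `check_rows` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    kind: str    # "int" or "varchar" in this sketch
    length: int  # maximum characters for varchar; ignored for int


def field_ok(value: str, col: Column) -> bool:
    """Check one field against its column definition (type and length)."""
    if col.kind == "int":
        try:
            int(value)
        except ValueError:
            return False
        return True
    return len(value) <= col.length


def check_rows(lines: Sequence[str], columns: Sequence[Column],
               delimiter: str = ",") -> Tuple[List[str], List[str]]:
    """Return (rows that satisfy the rule, error rows for the error file)."""
    good: List[str] = []
    errors: List[str] = []
    for line in lines:
        # Hypothetical automatic correction rule: trim whitespace around fields.
        fields = [f.strip() for f in line.split(delimiter)]
        if len(fields) == len(columns) and all(
                field_ok(f, c) for f, c in zip(fields, columns)):
            good.append(delimiter.join(fields))
        else:
            errors.append(line)  # kept verbatim so an administrator can fix it
    return good, errors
```

For example, with an `INT` column followed by a 3-character `VARCHAR` column, `" 2 , xy"` is corrected to `"2,xy"` and kept, while `"x,ab"` and `"3,abcd"` are routed to the error list.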
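In an optional embodiment, when importing the split data block by block into different storage spaces in the database, the split data may first be downloaded, and the downloaded split data is then imported block by block into different storage spaces in the database.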
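On the database side, importing a downloaded chunk maps onto MySQL's `LOAD DATA INFILE` statement, which the FIG. 9 embodiment names explicitly. The sketch below only builds the statement text; `sql_string`, `load_statement`, and the comma delimiter are assumptions, and in practice an agent would send the statement through a MySQL client connection.

```python
def sql_string(value: str) -> str:
    """Quote a value as a MySQL string literal (escapes backslashes and quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def load_statement(path: str, table: str, delimiter: str = ",") -> str:
    """Build a LOAD DATA INFILE statement for one downloaded split file.

    Assumes `table` is a trusted identifier taken from the metadata center,
    so it is only wrapped in backticks, not validated.
    """
    return (f"LOAD DATA INFILE {sql_string(path)} INTO TABLE `{table}` "
            f"FIELDS TERMINATED BY {sql_string(delimiter)} "
            "LINES TERMINATED BY '\\n'")
```

Because the agent downloads the file onto the database host before loading, the server-side form (without `LOCAL`) fits this flow; note that MySQL's `secure_file_priv` setting may restrict which directory the server can read from.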
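Once the data has been imported into the database, the downloaded data need not be retained. In an optional embodiment, after the split data is imported block by block into different storage spaces in the database, the method further includes: deleting the downloaded split data. This clears out garbage data files and reduces the storage they occupy, so that the database can store more data.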
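Deleting the downloaded split data only after it has been loaded can be expressed as below. `load_then_delete` and the `run_load` callback are hypothetical; keeping the file when the load fails is a design assumption, consistent with the FIG. 9 flow, where deletion is requested only after `LOAD DATA INFILE` succeeds.

```python
from pathlib import Path
from typing import Callable


def load_then_delete(local_file: Path, run_load: Callable[[Path], int]) -> int:
    """Load one downloaded split file, then delete it as a garbage data file.

    The file is removed only after run_load returns; if the load raises,
    the file stays on disk so the load can be retried or inspected.
    """
    rows_loaded = run_load(local_file)
    local_file.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
    return rows_loaded
```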
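After the split data is imported block by block into different storage spaces in the database, the import results can also be fed back. In an optional embodiment, the method further includes: summarizing the import results of importing the split data, and feeding back the import results, so that the user can clearly determine the outcome of the import.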
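The summary-and-feedback step could aggregate per-storage-space outcomes as sketched below. `NodeResult`, `summarize`, and the report fields (`success`, `rows_loaded`, `rows_rejected`, `failed_groups`, `errors`) are assumptions chosen for illustration; the description only states that the import results are summarized and fed back.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeResult:
    """Outcome reported for one storage space (DBGroup)."""
    group: str
    ok: bool
    rows: int = 0
    error: str = ""


def summarize(results: List[NodeResult], rejected_rows: int = 0) -> Dict[str, object]:
    """Aggregate per-node outcomes into one report to feed back to the user."""
    failed = [r.group for r in results if not r.ok]
    return {
        "success": not failed,
        "rows_loaded": sum(r.rows for r in results if r.ok),
        "rows_rejected": rejected_rows,   # e.g. rows written to the error file
        "failed_groups": failed,
        "errors": {r.group: r.error for r in results if not r.ok},
    }
```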
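From the description of the above implementations, a person skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, or the like) to perform the methods of the embodiments of the present invention.
This embodiment further provides a data processing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term “module” may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware or in a combination of software and hardware is also possible and contemplated.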
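FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes a receiving module 22, a processing module 24, and an import module 26. The apparatus is described below.
The receiving module 22 is configured to receive a data import instruction for instructing that data be imported into a database; the processing module 24, connected to the receiving module 22, is configured to split the data according to the data import instruction; and the import module 26, connected to the processing module 24, is configured to import the split data block by block into different storage spaces in the database.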
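FIG. 3 is a first structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the processing module 24 includes a determining unit 32, an identifying unit 34, and a first processing unit 36, which are described below.
The determining unit 32 is configured to determine, according to the data import instruction, the table structure of the table and the data distribution information of the data on the table; the identifying unit 34, connected to the determining unit 32, is configured to identify each data row field in the data according to the table structure, the data distribution information, and the descriptor information of the data carried in the data import instruction; and the first processing unit 36, connected to the identifying unit 34, is configured to split the data according to the identified data row fields.
FIG. 4 is a second structural block diagram of the processing module 24 in a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the processing module 24 includes a judging unit 42, a second processing unit 44, a correcting unit 46, and a third processing unit 48. The processing module 24 is described below.
The judging unit 42 is configured to judge whether the data satisfies the splitting rule; the second processing unit 44, connected to the judging unit 42, is configured to split the data when the judging unit 42's result is yes; the correcting unit 46, connected to the judging unit 42, is configured to correct the data when the judging unit 42's result is no; and the third processing unit 48, connected to the correcting unit 46, is configured to split the corrected data.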
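FIG. 5 is a structural block diagram of the import module 26 in a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the import module 26 includes a download unit 52 and an import unit 54, which are described below.
The download unit 52 is configured to download the split data; the import unit 54, connected to the download unit 52, is configured to import the downloaded split data block by block into different storage spaces in the database.
FIG. 6 is a block diagram of a first preferred structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the modules shown in FIG. 5, the apparatus further includes a deletion module 62, which is described below.
The deletion module 62, connected to the import module 26, is configured to delete the downloaded split data.
FIG. 7 is a block diagram of a second preferred structure of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, in addition to all the modules shown in FIG. 2, the apparatus further includes a summary module 72 and a feedback module 74, which are described below.
The summary module 72, connected to the import module 26, is configured to summarize the import results of importing the split data; the feedback module 74, connected to the summary module 72, is configured to feed back the import results.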
下面结合具体的实施例对本发明继续进行说明。
从前述可以看出,相关技术中已有的方案都是针对传统的单个数据库进行,无需考虑表的分布结构以及系统架构,效率较低。而发明实施例中的方案是基于分布式数据库系统,满足数据库的原子性/一致性/隔离性/耐久性(Atomicity/Consistency/Isolation/Durability,简称为ACID)特性,且可以并发执行。采用shell脚本进行导入导出,具有高度实时性,移植性和可行性,极高的用户体验,是对现有技术的一次重大革新。
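FIG. 8 is a structural block diagram of an import system according to an embodiment of the present invention. As shown in FIG. 8, the system includes a data import client module 82 (which may be located between an external system and the data import server module, or within the external system; this module is not drawn in FIG. 8), a data import server module 84 (corresponding to the download server 84 in FIG. 8 and to the receiving module 22, processing module 24, and import module 26 described above), a metadata center module 86 (corresponding to the metadata server 86 in FIG. 8), a cluster management center module 88 (corresponding to the cluster manager 88 in FIG. 8 and to the summarizing module 72 and feedback module 74 described above), a database agent module 810 (corresponding to the deleting module 62 described above), and a database module 812. Each module is described below.
The data import client module 82 (LoadClient) is mainly user-facing; users issue import and export commands through it.
The data import server module 84 (LoadServer) is configured to accept import and export commands sent by the client, split and merge data files according to the data distribution policy, interact with the other modules, and coordinate the entire import/export process.
The metadata center module 86 is configured to store and manage all metadata of the entire distributed database system.
The cluster management center module 88 is mainly responsible for monitoring, managing, and maintaining each database cluster (DBCluster).
The database agent module 810 manages and monitors database nodes: it monitors in real time whether the DB nodes under its jurisdiction are running normally and periodically collects runtime statistics.
The database module 812 is the bottom-layer module and stores all data.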
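As a rough model of the metadata center module 86, the sketch below keeps table metadata in memory, keyed by (cluster ID, database name, table name). The fields follow what step S906 below says is returned: each field's type and length, the distribution key, and the DBGroups holding the table. The class names are hypothetical, and a real metadata server would persist this information.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str    # e.g. "INT", "VARCHAR"
    length: int


@dataclass
class TableMetadata:
    columns: List[Column]
    distribution_key: str
    db_groups: List[str]


class MetadataCenter:
    """In-memory stand-in for metadata center module 86."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str, str], TableMetadata] = {}

    def register(self, cluster_id: str, db: str, table: str, meta: TableMetadata) -> None:
        if meta.distribution_key not in {c.name for c in meta.columns}:
            raise ValueError(f"distribution key {meta.distribution_key!r} is not a column")
        self._tables[(cluster_id, db, table)] = meta

    def query(self, cluster_id: str, db: str, table: str) -> TableMetadata:
        try:
            return self._tables[(cluster_id, db, table)]
        except KeyError:
            raise LookupError(f"no metadata for {cluster_id}/{db}/{table}") from None
```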
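The core algorithm that can be implemented with the import system shown in FIG. 8 is as follows.
Import flow:
The data import server module 84 queries the metadata center module 86 for the metadata of the table according to the cluster ID, database name, and table name, in order to obtain the table structure definition and the data distribution information;
The data import server module 84 uses the obtained information (together with the data file descriptor information) to identify each data row field in the data file (datafilename) and split the data file;
The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroups under its jurisdiction;
After every database agent module 810 has downloaded successfully, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command that loads the data files;
After the LOAD DATA INFILE command succeeds, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to delete the garbage data files (here, garbage data files may be downloaded data that has already been loaded);
The data import server module 84 summarizes the results and notifies the data import client module 82.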
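The ordering constraints of this flow (LOAD DATA INFILE runs only after every agent has downloaded its files, and cleanup runs only after every load has succeeded) can be sketched as phase barriers. Each agent is modelled as a hypothetical mapping from phase name to a callable returning success; metadata lookup and splitting are assumed to have already happened.

```python
from typing import Callable, Dict, List

# Hypothetical agent model: phase name -> action that returns True on success.
Agent = Dict[str, Callable[[], bool]]

PHASES = ("download", "load", "cleanup")


def run_phase(agents: Dict[str, Agent], phase: str) -> List[str]:
    """Ask every agent to perform one phase; return the names of agents that failed."""
    return [name for name, agent in agents.items() if not agent[phase]()]


def import_flow(agents: Dict[str, Agent]) -> str:
    # Each phase acts as a barrier: the next phase starts only if every agent succeeded.
    for phase in PHASES:
        failed = run_phase(agents, phase)
        if failed:
            return f"{phase} failed on {', '.join(failed)}"
    # Summary handed back to the client in the final step.
    return "import succeeded on " + ", ".join(agents)
```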
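FIG. 9 is a flowchart of data import processing according to an embodiment of the present invention. As shown in FIG. 9, the flow includes the following steps:
Step S902: The data import client module 82 sends a data import request to the data import server module 84;
Step S904: The data import server module 84 sends a database metadata query request to the metadata center module 86 according to the cluster ID, database name, and table name, to query the metadata of the table;
Step S906: The metadata center module 86 returns the table structure definition and the data distribution information, including the type and length of each field of the table, the distribution key, and the DBGroups across which the table is distributed;
Step S908: The data import server module 84 uses the information returned by the metadata center module 86 (together with the data file descriptor information) to identify each data row field in the data file (datafilename) and split the data file. If a data error is found during splitting, for example a type that does not match the table definition, the erroneous data is picked out and placed in an error file;
Step S910: The data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to download the split files for the DBGroups under its jurisdiction;
Step S912: The cluster management center module 88 notifies each database agent module 810 to download the split files for the DBGroups under its jurisdiction;
Step S914: Each database agent module 810 has the FTP service connect to the server where the data import server module 84 resides and download the corresponding split files; after downloading them successfully, each database agent module 810 returns a success response to the cluster management center module 88;
Step S916: The cluster management center module 88 summarizes the download results;
Step S918: After receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84;
Step S920: After the downloads succeed, the data import server module 84 requests the cluster management center module 88 to notify each database agent module 810 to execute the actual LOAD DATA INFILE command;
Step S922: The cluster management center module 88 notifies the database agent modules 810 to execute the actual LOAD DATA INFILE command;
Step S924: Each database agent module 810 connects to the database module it manages and executes the actual LOAD DATA INFILE command; after the command succeeds, each database agent module 810 returns a success response to the cluster management center module 88;
Step S926: After receiving success responses from all database agent modules 810, the cluster management center module 88 returns a success response to the data import server module 84. After the LOAD DATA INFILE commands succeed, the data import server module 84 requests the cluster management center module 88 to notify each database agent module to delete the garbage data files; the data import server module 84 then summarizes the results and notifies the data import client module 82.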
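Steps S912 to S918 (and likewise S922 to S926) follow a fan-out/fan-in pattern: the cluster management center notifies all agents, collects their responses, and reports success only when every agent succeeded. A minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor`; the function name and the choice to treat an agent that raises as a failed response are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def notify_and_collect(agents: Dict[str, Callable[[], bool]]) -> Dict[str, object]:
    """Notify all agents concurrently, wait for every response, then summarize.

    An agent that raises is recorded as a failed response instead of aborting the fan-out.
    """
    def call(action: Callable[[], bool]) -> bool:
        try:
            return bool(action())
        except Exception:
            return False

    # max_workers must be positive, so guard against an empty agent map.
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(call, action) for name, action in agents.items()}
        results = {name: f.result() for name, f in futures.items()}
    return {"success": all(results.values()), "results": results}
```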
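The solution in the above embodiments is proposed for a distributed database system. It can import all data types supported by the MySQL database, and can of course also support other types of data. Applying the solution of the embodiments of the present invention to a distributed database system can increase concurrency by a factor of 2 to 3, balance the load, and ensure the correctness of imported and exported data, giving good system robustness.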
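Since the embodiment targets MySQL, the command that each database agent module 810 runs in step S924 could be built as below. This sketch assumes a delimiter-separated split file already present on the database host (plain `LOAD DATA INFILE` reads files on the server side, subject to the server's `secure_file_priv` setting) and the default SQL mode, in which backslash escapes inside string literals are honoured. The function names are hypothetical.

```python
def mysql_quote(s: str) -> str:
    """Quote a value as a MySQL single-quoted string literal (backslash-escape mode)."""
    escaped = (s.replace("\\", "\\\\")   # escape backslashes first
                .replace("'", "\\'")
                .replace("\n", "\\n")
                .replace("\t", "\\t"))
    return "'" + escaped + "'"


def quote_ident(name: str) -> str:
    """Quote a database or table identifier with backticks."""
    return "`" + name.replace("`", "``") + "`"


def load_data_sql(path: str, db: str, table: str,
                  field_sep: str = ",", line_sep: str = "\n") -> str:
    """Build a LOAD DATA INFILE statement for one downloaded split file."""
    return (f"LOAD DATA INFILE {mysql_quote(path)} "
            f"INTO TABLE {quote_ident(db)}.{quote_ident(table)} "
            f"FIELDS TERMINATED BY {mysql_quote(field_sep)} "
            f"LINES TERMINATED BY {mysql_quote(line_sep)}")
```

For example, `load_data_sql("/data/split/orders_g1.txt", "sales", "orders", "|")` yields a statement ending in `FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'`.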
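It should be noted that each of the above modules may be implemented in software or hardware. The latter may be implemented in, but is not limited to, the following manners: the above modules are all located in the same processor; or the above modules are located in multiple processors respectively.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: receiving a data import instruction indicating that data is to be imported into a database;
S2: splitting the data according to the data import instruction;
S3: importing the split data, block by block, into different storage spaces in the database.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here; alternatively, they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
As described above, the data processing method and apparatus provided by the embodiments of the present invention have the following beneficial effect: they solve the problem of low data import efficiency in the related art, thereby improving data import efficiency.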
Claims (12)
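- A data processing method, comprising: receiving a data import instruction indicating that data is to be imported into a database; splitting the data according to the data import instruction; and importing the split data, block by block, into different storage spaces in the database.
- The method according to claim 1, wherein splitting the data according to the data import instruction comprises: determining, according to the data import instruction, a table structure of a table and data distribution information of the data on the table; identifying each data row field in the data according to the table structure, the data distribution information, and descriptor information of the data carried in the data import instruction; and splitting the data according to the identified data row fields.
- The method according to claim 1, wherein splitting the data according to the data import instruction comprises: judging whether the data satisfies a splitting rule; when the judging result is yes, splitting the data; when the judging result is no, correcting the data; and splitting the corrected data.
- The method according to claim 1, wherein importing the split data, block by block, into different storage spaces in the database comprises: downloading the split data; and importing the downloaded split data, block by block, into different storage spaces in the database.
- The method according to claim 4, further comprising, after importing the split data, block by block, into different storage spaces in the database: deleting the downloaded split data.
- The method according to claim 1, further comprising, after importing the split data, block by block, into different storage spaces in the database: summarizing import results obtained after the split data is imported; and feeding back the import results.
- A data processing apparatus, comprising: a receiving module, configured to receive a data import instruction indicating that data is to be imported into a database; a processing module, configured to split the data according to the data import instruction; and an import module, configured to import the split data, block by block, into different storage spaces in the database.
- The apparatus according to claim 7, wherein the processing module comprises: a determining unit, configured to determine, according to the data import instruction, a table structure of a table and data distribution information of the data on the table; an identifying unit, configured to identify each data row field in the data according to the table structure, the data distribution information, and descriptor information of the data carried in the data import instruction; and a first processing unit, configured to split the data according to the identified data row fields.
- The apparatus according to claim 7, wherein the processing module comprises: a judging unit, configured to judge whether the data satisfies a splitting rule; a second processing unit, configured to split the data when the judging result of the judging unit is yes; a correcting unit, configured to correct the data when the judging result of the judging unit is no; and a third processing unit, configured to split the corrected data.
- The apparatus according to claim 7, wherein the import module comprises: a downloading unit, configured to download the split data; and an importing unit, configured to import the downloaded split data, block by block, into different storage spaces in the database.
- The apparatus according to claim 10, further comprising: a deleting module, configured to delete the downloaded split data.
- The apparatus according to claim 7, further comprising: a summarizing module, configured to summarize import results obtained after the split data is imported; and a feedback module, configured to feed back the import results.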
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510198455.3A CN106156209A (zh) | 2015-04-23 | 2015-04-23 | Data processing method and apparatus |
CN201510198455.3 | 2015-04-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016169237A1 true WO2016169237A1 (zh) | 2016-10-27 |
Family
ID=57143721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/092759 WO2016169237A1 (zh) | Data processing method and apparatus | 2015-04-23 | 2015-10-23 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106156209A (zh) |
WO (1) | WO2016169237A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019205415A1 (zh) * | 2018-04-22 | 2019-10-31 | 平安科技(深圳)有限公司 | Data import management method and apparatus, mobile terminal, and storage medium |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
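CN108153852A (zh) * | 2017-12-22 | 2018-06-12 | 中国平安人寿保险股份有限公司 | Data processing method and apparatus, terminal device, and storage medium |
CN108256087B (zh) * | 2018-01-22 | 2020-12-04 | 北京腾云天下科技有限公司 | Bitmap-structure-based data import, query, and processing method |
CN110110024B (zh) * | 2019-04-29 | 2021-12-17 | 东南大学 | Method for importing large-capacity VCT files into a spatial database |
CN110795764A (zh) * | 2019-11-01 | 2020-02-14 | 中国银行股份有限公司 | Data desensitization method and system |
CN110990405B (zh) * | 2019-11-28 | 2024-04-12 | 中国银行股份有限公司 | Data loading method and apparatus, server, and storage medium |
CN113722277A (zh) * | 2020-05-25 | 2021-11-30 | 中兴通讯股份有限公司 | Data import method and apparatus, service platform, and storage medium |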
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050055351A1 (en) * | 2003-09-05 | 2005-03-10 | Oracle International Corporation | Apparatus and methods for transferring database objects into and out of database systems |
CN102750368A (zh) * | 2012-06-18 | 2012-10-24 | 天津神舟通用数据技术有限公司 | High-speed data import method for a database cluster |
CN102906751A (zh) * | 2012-07-25 | 2013-01-30 | 华为技术有限公司 | Data storage and data query method and apparatus |
CN103077183A (zh) * | 2012-12-14 | 2013-05-01 | 北京普泽天玑数据技术有限公司 | Data import method and system for a distributed sequential table |
CN103473334A (zh) * | 2013-09-18 | 2013-12-25 | 浙江中控技术股份有限公司 | Data storage and query method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8078825B2 (en) * | 2009-03-11 | 2011-12-13 | Oracle America, Inc. | Composite hash and list partitioning of database tables |
2015
- 2015-04-23 CN CN201510198455.3A patent/CN106156209A/zh not_active Withdrawn
- 2015-10-23 WO PCT/CN2015/092759 patent/WO2016169237A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN106156209A (zh) | 2016-11-23 |
Similar Documents
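Publication | Title |
---|---|
WO2016169237A1 (zh) | Data processing method and apparatus |
US10108632B2 (en) | Splitting and moving ranges in a distributed system |
US10853242B2 (en) | Deduplication and garbage collection across logical databases |
CN110147407B (zh) | Data processing method and apparatus, and database management server |
US9426219B1 (en) | Efficient multi-part upload for a data warehouse |
CN111258978B (zh) | Data storage method |
US10860604B1 (en) | Scalable tracking for database udpates according to a secondary index |
US10877810B2 (en) | Object storage system with metadata operation priority processing |
CN102779185A (zh) | Highly available distributed full-text indexing method |
WO2019109854A1 (zh) | Distributed database data processing method and apparatus, storage medium, and electronic device |
CN108563697B (zh) | Data processing method and apparatus, and storage medium |
US10515228B2 (en) | Commit and rollback of data streams provided by partially trusted entities |
CN114077602B (zh) | Data migration method and apparatus, electronic device, and storage medium |
CN116204575A (zh) | Method, apparatus, and device for importing data into a database, and computer storage medium |
US11216421B2 (en) | Extensible streams for operations on external systems |
CN109299225A (zh) | Log retrieval method and system, terminal, and computer-readable storage medium |
CN112685499A (zh) | Process data synchronization method, apparatus, and device for a workflow |
US10185735B2 (en) | Distributed database system and a non-transitory computer readable medium |
JP5684671B2 (ja) | Conditional-search data storage method, conditional-search database cluster system, dispatcher, and program |
CN116775712A (zh) | Join query method and apparatus, electronic device, distributed system, and storage medium |
CN113448775B (zh) | Multi-source heterogeneous data backup method and apparatus |
CN112100208B (zh) | Method and apparatus for forwarding operation requests |
CN111782634B (zh) | Distributed data storage method and apparatus, electronic device, and storage medium |
CN115587119A (zh) | Database query method and apparatus, electronic device, and storage medium |
CN114416438A (zh) | Data export method and apparatus, computer device, and scheduling service system |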
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15889720; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 15889720; Country of ref document: EP; Kind code of ref document: A1 |