CN111752944A - Data allocation method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111752944A
Authority
CN
China
Prior art keywords
data
preset
processing
segment value
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010464023.3A
Other languages
Chinese (zh)
Other versions
CN111752944B (en
Inventor
钟泽峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010464023.3A
Publication of CN111752944A
Application granted
Publication of CN111752944B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application belong to the technical field of big data and relate to a data apportionment method, which comprises: storing a received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and merging the base tables to obtain a wide table; taking the wide table as an elastic distributed data set, and processing the elastic distributed data set according to preset processing logic to realize data apportionment; and performing tail difference processing on the apportioned data to obtain result data, and writing the result data into a pre-established list summary table. The application also provides a data apportionment device, computer equipment and a storage medium. In addition, the application relates to blockchain technology, and the result data can be stored in blockchain nodes. The method and the device speed up the computer's data apportionment processing while achieving fine-grained apportionment of a large amount of data.

Description

Data allocation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data apportionment method, apparatus, computer device, and storage medium.
Background
With the development of science and technology, data apportionment work has gradually been handed over to computers, and it has always been an important task. The existing approach to data apportionment is to process and check the data using Oracle PL/SQL.
However, as the original data becomes more fine-grained and the apportionment rules more diverse, the apportionment result often grows by geometric multiples relative to the original data, and the result data already reaches hundreds of millions or even billions of records. The demand for fine-grained apportionment keeps growing and places higher requirements on apportionment speed, and the existing data apportionment scheme struggles to provide apportionment that is both fast and fine-grained.
Disclosure of Invention
The embodiment of the application aims to provide a data allocation method, a data allocation device, computer equipment and a storage medium, so that a computer can quickly allocate a large amount of data in a fine granularity mode.
In order to solve the above technical problem, an embodiment of the present application provides a data apportionment method, which adopts the following technical solutions:
a data apportionment method comprising the steps of:
storing the received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and merging the base tables to obtain a wide table;
taking the wide table as an elastic distributed data set, and processing the elastic distributed data set according to preset processing logic to realize data apportionment;
and performing tail difference processing on the apportioned data to obtain result data, and writing the result data into a pre-established list summary table.
Further, the preset processing logic comprises filtering logic, preprocessing logic, validation logic and apportionment logic;
the step of processing the elastic distributed data set according to a preset processing logic comprises:
respectively taking the filtering logic, the preprocessing logic, the validation logic and the apportionment logic as operators;
and processing the elastic distributed data set according to the operator to realize data allocation.
Further, the wide table comprises scatter table data, and a preset segment value information table comprises segment value data and a data association relation;
the step of processing the elastic distributed data set according to the operator comprises:
acquiring the scatter table data and the segment value data, and associating the scatter table data with the segment value data according to the data association relation, wherein the scatter table data and the segment value data are in a many-to-one relation;
selecting, from the scatter table data, the scatter table data successfully associated with the segment value data, and accumulating the scatter table data corresponding to each item of segment value data to obtain a first segment value scatter table sum;
adding up the first segment value scatter table sums to obtain a second segment value scatter table sum, and comparing whether the second segment value scatter table sum is consistent with the summarized data in the wide table;
if so, acquiring a value chain matched in advance with the segment value data, and acquiring, according to the value chain, a first apportionment factor matched in advance with the value chain;
apportioning the first segment value scatter table sum corresponding to each item of segment value data according to the first apportionment factors corresponding to the different value chains, to obtain first apportioned data;
and apportioning the first apportioned data in turn according to a preset second apportionment factor and a preset third apportionment factor to obtain the apportioned data.
Further, the base table includes summarized data,
the step of performing tail difference processing on the apportioned data comprises:
retaining the apportioned data to a first preset digit to obtain pre-result data;
adding up the pre-result data and retaining the sum to a second preset digit to obtain sum data;
calculating the difference between the sum data and the summarized data to obtain tail difference data;
if the tail difference data is larger than a preset identification value, apportioning the tail difference data according to the preset processing logic;
if the tail difference data is smaller than the preset identification value, assigning the tail difference data to the data record with the largest apportioned data, to complete data apportionment and obtain result data;
wherein the first preset digit and the second preset digit are both numbers of digits after the decimal point, and the first preset digit is larger than the second preset digit.
Further, the base table comprises a certificate header table, a certificate row table and a configuration segment value table; the step of merging the base tables to obtain the wide table comprises:
associating the configuration segment value table with a certificate head table, and associating the certificate head table with a certificate row table;
and adding the information in the configuration segment value table into the certificate row table to obtain a wide table.
Further, after the step of merging the base table into the wide table, the method includes: compressing and storing the wide table in a file form;
the step of taking the wide table as an elastic distributed data set comprises:
and acquiring and decompressing the stored wide table, and taking the decompressed wide table as an elastic distributed data set.
In order to solve the above technical problem, an embodiment of the present application further provides a data apportionment device, which adopts the following technical solutions:
a data apportionment device comprising:
the receiving module is used for storing the received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and combining the base tables to obtain a wide table;
the processing module is used for taking the wide table as an elastic distributed data set and processing the elastic distributed data set according to preset processing logic so as to realize data allocation; and
the writing module is used for performing tail difference processing on the apportioned data to obtain result data, and writing the result data into a pre-established list summary table.
Further, the writing module comprises a first processing submodule, a second processing submodule, a calculating submodule, an apportioning submodule and an assigning submodule;
the first processing submodule is used for retaining the apportioned data to a first preset digit to obtain pre-result data;
the second processing submodule is used for adding up the pre-result data and retaining the sum to a second preset digit to obtain sum data;
the calculating submodule is used for calculating the difference between the sum data and the summarized data to obtain tail difference data;
the apportioning submodule is used for apportioning the tail difference data according to the preset processing logic when the tail difference data is larger than a preset identification value;
the assigning submodule is used for assigning the tail difference data to the data record with the largest apportioned data when the tail difference data is smaller than the preset identification value, so as to complete data apportionment and obtain result data;
wherein the first preset digit and the second preset digit are both numbers of digits after the decimal point, and the first preset digit is larger than the second preset digit.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having a computer program stored therein and a processor implementing the steps of the data apportionment method described above when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the data apportionment method described above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
By taking the wide table as an elastic distributed data set, the application achieves fine-grained apportionment of a large amount of data while speeding up the computer's data apportionment processing and improving the apportionment performance of the system. Because the elastic distributed data set is processed according to preset processing logic, data lineage analysis and backtracking are made easier both during and after apportionment. Tail difference processing of the apportioned data further improves the accuracy of the apportionment.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a data apportionment method according to the present application;
FIG. 3 is a schematic block diagram of one embodiment of a data apportionment device according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. a computer device; 201. a memory; 202. a processor; 203. a network interface; 300. a data apportionment device; 301. a receiving module; 302. a processing module; 303. a writing module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the data apportionment method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the data apportionment device is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of a data apportionment method according to the present application is shown. The data apportionment method comprises the following steps:
S1: storing the received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and merging the base tables to obtain the wide table.
In this embodiment, the base table is also called a large table, meaning a table with a large volume of data, whereas the wide table is a table with a large number of fields. The base table originates from the business system and contains the data to be apportioned. In the present application, the structured data may come from an enterprise's financial system, the distributed database is Hadoop, and Sqoop may be used to import the base tables from Oracle into Hadoop. Because the business system has strict real-time requirements, and Hadoop and Oracle are two different kinds of database, the base tables cannot be imported into Hadoop directly from the business system and must pass through Oracle. Oracle is an OLTP-type database, used mainly to store data while avoiding redundancy; Hadoop belongs to the OLAP type and is used mainly to process and analyze data. Alternatively, the structured data file may be stored directly in Hadoop and mapped by Hadoop into Hive tables, namely the base tables. Hive is then called within Hadoop to merge the base tables into the wide table, which is stored in Hive. Hive is a component of Hadoop that serves as a database, and its data is stored in Hadoop's file system.
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the data apportionment method runs may receive the structured data file through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
Specifically, the base table includes a certificate header table, a certificate row table, and a configuration segment value table; in step S1, the step of merging the base tables to obtain the wide table includes:
associating the configuration segment value table with the certificate header table, and associating the certificate header table with the certificate row table;
and adding the information in the configuration segment value table into the certificate row table to obtain the wide table.
In this embodiment, the wide table refers to a database table in which the indicators, dimensions and attributes related to a business topic are associated together. In the present application, the wide table is generated by processing the certificate row table. Because the certificate row table is append-only (records are added but never deleted), several records may exist under the same logical primary key; the wide table is the table obtained by merging these records and adding the information from the configuration segment value table. The certificate row table has nearly 100 fields, and more once the configuration segment value table is added; the resulting wide table is generated for subsequent use.
Meanwhile, the metadata of the base tables is written into a metadata column of the wide table. Metadata, also called intermediary data or relay data, is data that describes data (data about data); it mainly describes data properties and supports functions such as indicating storage location, recording history, resource lookup and file records. Specifically, the metadata recorded comprises the table's field information, data size, latest update time, partition information, increment size and increment pull time. This makes it easier for the relevant personnel to estimate run time during development and to locate information in the table. The relevant personnel may add to and modify the metadata as actually needed.
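The merge described above can be sketched in plain Python as two key-based lookups. This is only an illustrative sketch under stated assumptions: the application performs the merge in Hive, and the field names (`segment_code`, `header_id`, `cost_center`, `amount`) and the function `build_wide_table` are hypothetical, not taken from the application.

```python
from typing import Dict, List


def build_wide_table(
    segment_rows: List[Dict],
    header_rows: List[Dict],
    line_rows: List[Dict],
) -> List[Dict]:
    """Widen each certificate row with its header and segment value fields."""
    # Index the configuration segment value table by a hypothetical key.
    segments = {s["segment_code"]: s for s in segment_rows}
    # Index the certificate header table by a hypothetical header id.
    headers = {h["header_id"]: h for h in header_rows}

    wide = []
    for line in line_rows:
        header = headers.get(line["header_id"])
        if header is None:
            continue  # assumption: rows without a header are skipped
        segment = segments.get(header["segment_code"], {})
        # Segment and header fields are added to the row; row fields win on clashes.
        wide.append({**segment, **header, **line})
    return wide
```

Each output record carries the row's own fields plus those of its header and configuration segment value, which is why the field count of the wide table exceeds that of the certificate row table.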
S2: taking the wide table as an elastic distributed data set, and processing the elastic distributed data set according to preset processing logic so as to realize data apportionment.
In this embodiment, Spark is called to read the wide table from Hive, and the wide table is processed in Spark as an RDD (Resilient Distributed Dataset), that is, the elastic distributed data set.
The preset processing logic comprises filtering logic, preprocessing logic, validation logic and allocation logic; the step of processing the elastic distributed data set according to a preset processing logic comprises:
respectively taking the filtering logic, the preprocessing logic, the validation logic and the apportionment logic as operators;
and processing the elastic distributed data set according to the operator to realize data allocation.
In this embodiment, Spark operates around the concept of the RDD, a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create an RDD: parallelizing an existing collection in the driver program, or referencing a data set in an external storage system.
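As a rough illustration of treating the four kinds of logic as operators applied one after another, the sketch below chains four functions over a list of records. A plain Python list stands in for the RDD, and all function names, record fields and weights are hypothetical; the application itself runs these steps as Spark operators.

```python
from fractions import Fraction
from functools import reduce


def filter_op(records):
    # Filtering logic: keep only records associated with a segment value.
    return [r for r in records if r["segment"] is not None]


def preprocess_op(records):
    # Preprocessing logic: total the amounts per segment value.
    totals = {}
    for r in records:
        totals[r["segment"]] = totals.get(r["segment"], 0) + r["amount"]
    return [{"segment": s, "amount": a} for s, a in totals.items()]


def make_validate_op(expected_total):
    # Validation logic: the segment totals must add up to the expected total.
    def validate_op(records):
        if sum(r["amount"] for r in records) != expected_total:
            raise ValueError("segment totals do not match the summarized data")
        return records
    return validate_op


def make_apportion_op(weights):
    # Apportionment logic: split each total across targets by weight.
    def apportion_op(records):
        return [
            {"segment": r["segment"], "target": t, "amount": r["amount"] * w}
            for r in records
            for t, w in weights.items()
        ]
    return apportion_op


def run_pipeline(records, operators):
    # Apply the operators in order, feeding each one the previous output.
    return reduce(lambda data, op: op(data), operators, records)
```

Keeping each kind of logic as a separate operator means the intermediate output of every stage can be inspected, which is the property the text relies on for checking and backtracking.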
Specifically, the wide table includes scatter table data, and a preset segment value information table includes segment value data and a data association relation; the step of processing the elastic distributed data set according to the operator comprises:
acquiring the scatter table data and the segment value data, and associating the scatter table data with the segment value data according to the data association relation, wherein the scatter table data and the segment value data are in a many-to-one relation;
selecting, from the scatter table data, the scatter table data successfully associated with the segment value data, and accumulating the scatter table data corresponding to each item of segment value data to obtain a first segment value scatter table sum;
adding up the first segment value scatter table sums to obtain a second segment value scatter table sum, and comparing whether the second segment value scatter table sum is consistent with the summarized data in the wide table;
if so, acquiring a value chain matched in advance with the segment value data, and acquiring, according to the value chain, a first apportionment factor matched in advance with the value chain;
apportioning the first segment value scatter table sum corresponding to each item of segment value data according to the first apportionment factors corresponding to the different value chains, to obtain first apportioned data;
and apportioning the first apportioned data in turn according to a preset second apportionment factor and a preset third apportionment factor to obtain the apportioned data.
In this embodiment, the second segment value scatter table sum is compared with the summarized data in the wide table, and if they are inconsistent, an exception notification is sent to a designated person. Processing the elastic distributed data set according to the preset processing logic makes it easier to check the apportionment results, improves the efficiency of querying the apportionment process, and facilitates backtracking and analysis.
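The association, accumulation and consistency check above can be sketched as follows. The names `sum_by_segment` and `totals_match`, the `key` and `amount` fields, and the use of a dictionary for the data association relation are assumptions made for illustration; the sketch also assumes the summarized data covers only successfully associated rows, which the text does not state.

```python
from collections import defaultdict
from decimal import Decimal


def sum_by_segment(scatter_rows, association):
    """Accumulate scatter amounts per segment value.

    `association` maps a scatter row key to its segment value
    (many scatter rows to one segment value).
    """
    sums = defaultdict(Decimal)  # Decimal() is zero
    for row in scatter_rows:
        segment = association.get(row["key"])
        if segment is None:
            continue  # not successfully associated, so left out
        sums[segment] += row["amount"]
    return dict(sums)  # first segment value scatter table sums


def totals_match(first_sums, summarized):
    # Second segment value scatter table sum: the total of the first sums.
    second_sum = sum(first_sums.values(), Decimal("0"))
    return second_sum == summarized
```

When `totals_match` returns False, the embodiment sends an exception notification rather than continuing with apportionment.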
The processing logic in the present application is set according to actual needs, and includes:
Filtering logic: the scatter table data is the itemized amount data in the wide table. An association (join) operation is performed between the amount data from the certificate row table in the wide table and the segment value data of the preset segment value information table. Using the filter operator in Spark, the data of the certificate row table is filtered according to the segment values in the segment value information table, such as the cost center segment, the company segment and the sub-segment, so that the data associated with these segment values is retained and other amount data is removed, which avoids redundancy and eases subsequent processing by the computer.
Preprocessing logic: the segment values in the segment value information table, such as the cost center segment, the company segment and the sub-segment, are identified, and the amount data in the certificate row table associated with each of these segment values is totalled to obtain data such as the cost center total data, the company total data and the sub-segment total data.
Validation logic: checking whether the sum of the cost center total data, the company total data, the sub-segment total data and the like equals the summarized data.
Apportionment logic: the cost center total data, the company total data, the sub-segment total data and similar data, corresponding to the cost center segment, the company segment, the sub-segment and other segment values, each correspond to a different value chain; for example, some total data may correspond to the value chain common support-common resource support. Different value chains correspond to different first apportionment factors, and the data corresponding to each segment value is apportioned according to its first apportionment factor. For example, the cost center total data is apportioned to the branch companies according to the cost center apportionment factor, the company total data according to the company apportionment factor, and the sub-segment total data according to the sub-segment apportionment factor. This completes the first-level apportionment and yields the first apportioned data; the first-level apportionment goes only to the branch companies, not to the head office. The first apportioned data of each branch company is then apportioned to the contract groups under that branch according to a second apportionment factor, which differs from branch to branch, yielding second apportioned data. Finally, the second apportioned data of each contract group is apportioned to the policies under that contract group according to a preset third apportionment factor, yielding third apportioned data, namely the apportioned data.
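The three-level cascade (branch company, then contract group, then policy) can be sketched as nested proportional splits. This is a minimal sketch assuming integer factor weights and exact rational arithmetic; the names `apportion` and `cascade`, and the way factors are keyed by branch and by contract group, are assumptions rather than details from the application.

```python
from fractions import Fraction


def apportion(amount, factors):
    """Split `amount` across targets in proportion to integer `factors`."""
    total = sum(factors.values())
    return {t: amount * Fraction(f, total) for t, f in factors.items()}


def cascade(amount, branch_factors, group_factors, policy_factors):
    """Apportion to branches, then contract groups, then policies.

    `group_factors[branch]` holds the second-level factors of one branch
    (they differ per branch); `policy_factors[group]` holds the
    third-level factors of one contract group (hypothetical layout).
    """
    result = {}
    for branch, b_amt in apportion(amount, branch_factors).items():
        for group, g_amt in apportion(b_amt, group_factors[branch]).items():
            for policy, p_amt in apportion(g_amt, policy_factors[group]).items():
                result[(branch, group, policy)] = p_amt
    return result
```

Using `Fraction` keeps every intermediate share exact, so the policy-level amounts always add back up to the input; rounding to a fixed number of decimal places is left to the tail difference step.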
After step S1, that is, after the step of merging the base table into the wide table, the method includes: compressing and storing the wide table in a file form;
in step S2, the step of regarding the wide table as the elastic distributed data set includes:
and acquiring and decompressing the stored wide table, and taking the decompressed wide table as an elastic distributed data set.
In this embodiment, the wide table is handled using a compression format (such as tar or zip) and a file storage format (storing computer information as files, including images, sounds and the like), which reduces the storage the file occupies and eases I/O transfer when Spark is later called to read the wide table from Hive.
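The compress-then-decompress round trip can be sketched with the Python standard library. The sketch assumes the wide table is serialized as CSV inside a zip archive held in memory; the member name `wide_table.csv` and both function names are hypothetical, and the application does not specify the serialization.

```python
import csv
import io
import zipfile

MEMBER = "wide_table.csv"  # hypothetical file name inside the archive


def compress_wide_table(rows, fieldnames):
    """Serialize rows as CSV and store them in a zip archive (bytes)."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    blob = io.BytesIO()
    with zipfile.ZipFile(blob, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MEMBER, text.getvalue())
    return blob.getvalue()


def decompress_wide_table(data):
    """Read the CSV back out of the zip archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        text = archive.read(MEMBER).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

All values come back as strings after the round trip, so a real pipeline would need to restore numeric types before apportionment.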
S3: performing tail difference processing on the apportioned data to obtain result data, and writing the result data into a list summary table created in advance.
In this embodiment, the result data is written back to the list summary table in the Oracle database and stored.
Specifically, the base table includes summarized data, and in step S3, the step of performing tail difference processing on the apportioned data includes:
retaining the apportioned data to a first preset digit to obtain pre-result data;
adding up the pre-result data and retaining the sum to a second preset digit to obtain sum data;
calculating the difference between the sum data and the summarized data to obtain tail difference data;
if the tail difference data is larger than a preset identification value, apportioning the tail difference data according to the preset processing logic;
if the tail difference data is smaller than the preset identification value, assigning the tail difference data to the data record with the largest apportioned data, to complete data apportionment and obtain result data;
wherein the first preset digit and the second preset digit are both numbers of digits after the decimal point, the first preset digit is 10 digits, and the second preset digit is less than 3 digits.
The "tail difference", i.e. the difference in the last digit, is mainly due to errors caused by rounding off in the calculation. For example, the unit price calculation is exactly 5 digits after the decimal point, the total price obtained by multiplying the added unit prices by the tax rate usually retains 2 significant decimal points, so the total price is not equal to the total sum of the sum of each invoice, and the error between the two is called tail difference. Therefore, the tail difference is not an artificial calculation error but an error generated in the process of different calculation methods. The summary data is the sum of the amount data for each row of the credential row table in the base table.
In this application, for business reasons, one policy record assigned to headquarters may be apportioned into 5 million records, so the data expands after apportionment and the result set becomes very large. After an amount is apportioned, it is kept to the first preset digit, which here is 15 decimal places. For example, if 1 yuan is apportioned for a policy, each record receives one five-millionth of it, so keeping only 2 decimal places would introduce a large data error. The pre-result data is then added up and the sum is kept to the second preset digit, which is 2 decimal places; because the sum has already been calculated at this point, keeping it to 2 decimal places does not introduce a large error. The preset identification value is 1; if the tail difference is less than 1, the tail difference is assigned to the largest transaction record (namely, the policy).
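The effect of the two precisions in this example can be checked with exact decimal arithmetic. The snippet below is only a numeric illustration of the paragraph above; the helper name `keep_places` and the half-up rounding mode are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def keep_places(value, places):
    """Round a Decimal to `places` digits after the decimal point (half-up, assumed)."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


RECORDS = 5_000_000
share = Decimal(1) / Decimal(RECORDS)      # one five-millionth of 1 yuan

coarse_share = keep_places(share, 2)       # each share kept to 2 decimal places
fine_share = keep_places(share, 15)        # each share kept to 15 decimal places

# Summing identical shares is the same as multiplying by the record count.
coarse_total = keep_places(coarse_share * RECORDS, 2)
fine_total = keep_places(fine_share * RECORDS, 2)
```

With 2 decimal places every share rounds to zero and the whole 1 yuan is lost, whereas with 15 decimal places the shares add back up to 1.00 once the sum is kept to 2 decimal places.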
By taking the wide table as an elastic distributed data set, the present application speeds up the computer's apportionment processing when a large amount of data is apportioned at fine granularity, and improves the apportionment performance of the system. Because the elastic distributed data set is processed according to preset processing logic, data lineage analysis and backtracking are facilitated both during and after apportionment. During processing, the second segment value scatter table sum is checked for consistency with the summarized data in the wide table, which reduces data processing errors. Tail difference processing of the apportioned data further improves the accuracy of the apportionment.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program. The computer program can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or may be a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a data apportioning apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 3, the data apportioning apparatus 300 according to this embodiment includes: a receiving module 301, a processing module 302, and a writing module 303. Wherein:
a receiving module 301, configured to store the received structured data file in a distributed database, generate a plurality of base tables through the distributed database, and merge the base tables to obtain a wide table;
a processing module 302, configured to use the wide table as an elastic distributed data set, and process the elastic distributed data set according to preset processing logic, so as to implement data apportionment; and
a writing module 303, configured to perform tail difference processing on the apportioned data to obtain result data, and write the result data into a pre-established list summary table.
In this embodiment, by taking the wide table as an elastic distributed data set, the present application speeds up the computer's apportionment processing when a large amount of data is apportioned at fine granularity, and improves the apportionment performance of the system. Because the elastic distributed data set is processed according to preset processing logic, data lineage analysis and backtracking are facilitated both during and after apportionment. Tail difference processing of the apportioned data further improves the accuracy of the apportionment.
The base table comprises a credential header table, a credential row table and a configuration segment value table, and the receiving module 301 comprises an association submodule and an adding submodule. The association submodule is used for associating the configuration segment value table with the credential header table, and associating the credential header table with the credential row table; the adding submodule is used for adding the information in the configuration segment value table into the credential row table to obtain the wide table.
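One way to picture the association and adding steps is as a pair of keyed joins. The sketch below is an assumption-laden illustration: the text does not name the join keys or columns, so `header_id`, `segment_value`, `value_chain` and the `cfg_` prefix are all hypothetical, and plain Python dicts stand in for the tables.

```python
def build_wide_table(headers, rows, segment_config):
    """Join config -> header -> row and copy the config fields onto each row."""
    header_by_id = {h["header_id"]: h for h in headers}
    config_by_segment = {c["segment_value"]: c for c in segment_config}

    wide = []
    for row in rows:
        header = header_by_id.get(row["header_id"])
        if header is None:
            continue  # row without a matching header: left out of the wide table
        config = config_by_segment.get(header["segment_value"], {})
        # Prefix config columns so they cannot clash with row or header columns.
        config_fields = {f"cfg_{key}": value for key, value in config.items()}
        wide.append({**header, **row, **config_fields})
    return wide


headers = [{"header_id": 1, "segment_value": "S1"}]
rows = [
    {"header_id": 1, "line_no": 1, "amount": "10.00"},
    {"header_id": 2, "line_no": 1, "amount": "5.00"},  # no such header
]
segment_config = [{"segment_value": "S1", "value_chain": "sales"}]
wide_table = build_wide_table(headers, rows, segment_config)
```

Each surviving row then carries its header fields and the configuration fields alongside its own amount, which is what makes later per-segment processing possible without further lookups.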
The preset processing logic comprises filtering logic, preprocessing logic, validation logic and apportionment logic. The processing module 302 comprises a first operator submodule and a second operator submodule, wherein the first operator submodule is used for taking the filtering logic, the preprocessing logic, the validation logic and the apportionment logic as respective operators, and the second operator submodule is used for processing the elastic distributed data set according to the operators so as to realize data apportionment.
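Treating each piece of logic as an operator amounts to building a pipeline of functions that are applied in order. The sketch below uses plain Python lists in place of the elastic distributed data set, and the concrete rule inside each operator (dropping zero amounts, normalizing the segment code, equal splitting) is an assumption made up for illustration; only the four-stage structure comes from the text.

```python
from decimal import Decimal
from functools import reduce


def filter_op(records):
    """Filtering logic (assumed rule): drop records with a zero amount."""
    return [r for r in records if r["amount"] != 0]


def preprocess_op(records):
    """Preprocessing logic (assumed rule): normalize the segment code."""
    return [{**r, "segment": r["segment"].strip().upper()} for r in records]


def validate_op(records):
    """Validation logic (assumed rule): each record needs a segment and targets."""
    for r in records:
        if not r["segment"] or not r["targets"]:
            raise ValueError(f"invalid record: {r!r}")
    return records


def apportion_op(records):
    """Apportionment logic (assumed rule): split each amount equally over its targets."""
    out = []
    for r in records:
        share = r["amount"] / len(r["targets"])
        out.extend({"segment": r["segment"], "target": t, "amount": share}
                   for t in r["targets"])
    return out


OPERATORS = [filter_op, preprocess_op, validate_op, apportion_op]


def run_pipeline(records, operators=OPERATORS):
    """Apply the operators in order, feeding each one the previous output."""
    return reduce(lambda data, op: op(data), operators, records)


sample = [
    {"segment": " a1 ", "amount": Decimal("10"), "targets": ["x", "y"]},
    {"segment": "b2", "amount": Decimal("0"), "targets": ["x"]},
]
result = run_pipeline(sample)
```

Because every stage is a separate named function, the output of any stage can be inspected on its own, which is one plausible reason an operator-based design helps with tracing data back through the apportionment.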
The second operator submodule comprises an association unit, an accumulation unit, a comparison unit, an obtaining unit, an apportionment unit and a result unit. The association unit is used for associating the scatter table data in the wide table with the segment value data in a preset segment value information table according to the association relation contained in the segment value information table, wherein the scatter table data and the segment value data are in a many-to-one relation; the accumulation unit is used for selecting, from the scatter table data, the scatter table data successfully associated with the segment value data, and accumulating in turn the scatter table data corresponding to each segment value data to obtain a first segment value scatter table sum; the comparison unit is used for adding up the first segment value scatter table sums to obtain a second segment value scatter table sum, and comparing whether the second segment value scatter table sum is consistent with the summarized data in the wide table; the obtaining unit is used for obtaining, when the second segment value scatter table sum is consistent with the summarized data in the wide table, the value chain matched in advance with the segment value data, and obtaining, according to the value chain, the first apportionment factor matched in advance with the value chain; the apportionment unit is used for apportioning the first segment value scatter table sum corresponding to each segment value data according to the first apportionment factors corresponding to the different value chains to obtain first apportioned data; and the result unit is used for apportioning the first apportioned data according to a preset second apportionment factor and a preset third apportionment factor, respectively, to obtain the apportioned data.
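The flow through these six units can be sketched as a single function. Several points are assumptions rather than statements from the text: the first factor is modeled as one share per value chain, the second and third factors are modeled as mappings from a target name to a share that are applied one after the other, and all field and parameter names are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def apportion_by_segment(scatter_rows, segment_info, summarized,
                         first_factors, second_factor, third_factor):
    # Association + accumulation: many scatter rows map to one segment value;
    # rows whose segment is unknown fail the association and are skipped.
    first_sums = defaultdict(Decimal)
    for row in scatter_rows:
        if row["segment"] in segment_info:
            first_sums[row["segment"]] += row["amount"]

    # Comparison: the total over all segments must match the summarized data.
    second_sum = sum(first_sums.values(), Decimal(0))
    if second_sum != summarized:
        raise ValueError(f"sum {second_sum} does not match summarized data {summarized}")

    apportioned = {}
    for segment, segment_sum in first_sums.items():
        # Obtaining + first apportionment: factor looked up via the value chain.
        chain = segment_info[segment]["value_chain"]
        first = segment_sum * first_factors[chain]
        # Result: split further by the second and third factors (assumed sequential).
        for key2, share2 in second_factor.items():
            for key3, share3 in third_factor.items():
                apportioned[(segment, key2, key3)] = first * share2 * share3
    return apportioned
```

Raising an error on a mismatch mirrors the role of the comparison unit: apportionment only proceeds when the accumulated sums reconcile with the summarized data, so rows that silently failed to associate are caught before any amounts are split.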
The writing module comprises a first processing submodule, a second processing submodule, a calculating submodule, an apportionment submodule and an identification submodule. The first processing submodule is used for keeping the apportioned data to a first preset digit to obtain pre-result data; the second processing submodule is used for adding up the pre-result data and keeping the sum to a second preset digit to obtain sum data; the calculating submodule is used for calculating the difference between the sum data and the summarized data to obtain tail difference data; the apportionment submodule is used for apportioning the tail difference data according to the preset processing logic when the tail difference data is larger than a preset identification value; the identification submodule is used for assigning the tail difference data to the data record with the largest apportioned amount when the tail difference data is smaller than the preset identification value, so as to complete the data apportionment and obtain the result data. The first preset digit and the second preset digit are both numbers of digits after the decimal point, and the first preset digit is larger than the second preset digit.
In some optional implementations of this embodiment, the apparatus 300 further includes a compression module, which is used for compressing the wide table and storing it in file form. The processing module is further used for obtaining and decompressing the stored wide table, and taking the decompressed wide table as the elastic distributed data set.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 comprises a memory 201, a processor 202 and a network interface 203 that are communicatively connected to each other via a system bus. It is noted that only the computer device 200 having components 201-203 is shown, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or internal memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), or the like, provided on the computer device 200. Of course, the memory 201 may also include both an internal storage unit and an external storage device of the computer device 200. In this embodiment, the memory 201 is generally used for storing the operating system installed in the computer device 200 and various types of application software, such as the program code of the data apportionment method. Further, the memory 201 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 202 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 202 is generally operative to control overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute the program code stored in the memory 201 or process data, for example, execute the program code of the data apportionment method.
The network interface 203 may comprise a wireless network interface or a wired network interface, and the network interface 203 is generally used for establishing communication connection between the computer device 200 and other electronic devices.
In this embodiment, when a large amount of data is apportioned at fine granularity, the data apportionment processing speed of the computer is increased, and data lineage analysis and backtracking are facilitated both during and after apportionment.
The present application provides yet another embodiment, namely a computer-readable storage medium storing a data apportionment program, the program being executable by at least one processor to cause the at least one processor to perform the steps of the data apportionment method described above.
In this embodiment, when a large amount of data is apportioned at fine granularity, the data apportionment processing speed of the computer is increased, and data lineage analysis and backtracking are facilitated both during and after apportionment.
As an embodiment, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of the blockchain node, etc., such as result data.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
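The way blocks are linked and checked can be illustrated with a toy hash chain. This sketch only shows the cryptographic linking described above; it assumes SHA-256 over a JSON encoding, uses hypothetical function and field names, and leaves out consensus, networking and everything else a real blockchain platform provides.

```python
import hashlib
import json


def block_hash(prev_hash, transactions):
    """Hash the previous block's hash together with this block's transactions."""
    payload = json.dumps({"prev": prev_hash, "tx": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(prev_hash, transactions):
    """Create a block that is cryptographically tied to its predecessor."""
    return {"prev": prev_hash, "tx": transactions,
            "hash": block_hash(prev_hash, transactions)}


def verify_chain(chain):
    """Check every block's own hash and its link to the previous block."""
    for index, block in enumerate(chain):
        if block_hash(block["prev"], block["tx"]) != block["hash"]:
            return False  # block contents were altered after creation
        if index > 0 and block["prev"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = make_block("0" * 64, ["result batch 1"])
second = make_block(genesis["hash"], ["result batch 2"])
```

Changing the transactions stored in `genesis` without recomputing its hash makes `verify_chain` return `False`, which is the anti-counterfeiting property the paragraph refers to.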
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the embodiments described above are only some, and not all, of the embodiments of the present application, and that the appended drawings show preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method of data amortization comprising the steps of:
storing the received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and merging the base tables to obtain a wide table;
taking the wide table as an elastic distributed data set, and processing the elastic distributed data set according to preset processing logic to realize data allocation;
performing tail difference processing on the apportioned data to obtain result data, and writing the result data into a pre-established list summary table.
2. The data amortization method of claim 1, wherein said pre-defined processing logic comprises filtering logic, preprocessing logic, validation logic and apportionment logic;
the step of processing the elastic distributed data set according to a preset processing logic comprises:
respectively taking the filtering logic, the preprocessing logic, the validation logic and the apportionment logic as operators;
and processing the elastic distributed data set according to the operator to realize data allocation.
3. The data amortization method according to claim 2, wherein the wide table comprises scatter table data, and a preset segment value information table comprises segment value data and a data association relation;
the step of processing the elastic distributed data set according to the operator comprises:
acquiring the scatter table data and the segment value data, and associating the scatter table data with the segment value data according to the data association relation, wherein the scatter table data and the segment value data are in a many-to-one relation;
selecting, from the scatter table data, the scatter table data successfully associated with the segment value data, and accumulating in turn the scatter table data corresponding to each segment value data to obtain a first segment value scatter table sum;
adding up the first segment value scatter table sums to obtain a second segment value scatter table sum, and comparing whether the second segment value scatter table sum is consistent with the summarized data in the wide table;
if so, acquiring a value chain matched with the segment value data in advance, and acquiring a first allocation factor matched with the value chain in advance according to the value chain;
according to first allocation factors corresponding to different value chains, allocating the sum of the first segment value scatter table corresponding to each segment value data to obtain first allocation data;
and respectively carrying out apportionment on the first apportioned data according to a preset second apportionment factor and a preset third apportionment factor to obtain apportioned data.
4. The method of claim 1, wherein the base table comprises summary data;
the step of processing the apportioned data comprises:
keeping the apportioned data to a first preset digit to obtain pre-result data;
adding up the pre-result data, and keeping the sum to a second preset digit to obtain sum data;
calculating the difference between the sum data and the summarized data to obtain tail difference data;
if the tail difference data is larger than a preset identification value, apportioning the tail difference data according to the preset processing logic;
if the tail difference data is smaller than the preset identification value, assigning the tail difference data to the data record with the largest apportioned amount, so as to complete the data apportionment and obtain the result data;
wherein the first preset digit and the second preset digit are both numbers of digits after the decimal point, and the first preset digit is larger than the second preset digit.
5. The data amortization method of claim 1, wherein the base table comprises a credential header table, a credential row table, and a configuration segment value table;
the step of merging the base tables to obtain the wide table comprises:
associating the configuration segment value table with the credential header table, and associating the credential header table with the credential row table;
and adding the information in the configuration segment value table into the credential row table to obtain the wide table.
6. The data amortization method according to any one of claims 1 to 5, wherein after the step of merging the base tables to obtain the wide table, the method further comprises: compressing the wide table and storing it in file form;
the step of taking the wide table as an elastic distributed data set comprises:
acquiring and decompressing the stored wide table, and taking the decompressed wide table as the elastic distributed data set.
7. A data amortization apparatus, comprising:
the receiving module is used for storing the received structured data file into a distributed database, generating a plurality of base tables through the distributed database, and combining the base tables to obtain a wide table;
the processing module is used for taking the wide table as an elastic distributed data set and processing the elastic distributed data set according to preset processing logic so as to realize data allocation; and
the writing module is used for carrying out tail difference processing on the apportioned data to obtain result data and writing the result data into a pre-established list summary table.
8. The data amortization device of claim 7, wherein the writing module comprises a first processing submodule, a second processing submodule, a calculating submodule, an apportionment submodule, and an identification submodule;
the first processing submodule is used for keeping the apportioned data to a first preset digit to obtain pre-result data;
the second processing submodule is used for adding up the pre-result data and keeping the sum to a second preset digit to obtain sum data;
the calculating submodule is used for calculating the difference between the sum data and the summarized data to obtain tail difference data;
the apportionment submodule is used for apportioning the tail difference data according to the preset processing logic when the tail difference data is larger than a preset identification value;
the identification submodule is used for assigning the tail difference data to the data record with the largest apportioned amount when the tail difference data is smaller than the preset identification value, so as to complete the data apportionment and obtain the result data;
the first preset digit and the second preset digit are both numbers of digits after the decimal point, and the first preset digit is larger than the second preset digit.
9. A computer device, characterized in that it comprises a memory in which a computer program is stored and a processor which, when executing said computer program, implements the steps of the data amortization method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data amortization method according to any one of claims 1 to 6.
CN202010464023.3A 2020-05-27 2020-05-27 Data allocation method, device, computer equipment and storage medium Active CN111752944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010464023.3A CN111752944B (en) 2020-05-27 2020-05-27 Data allocation method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010464023.3A CN111752944B (en) 2020-05-27 2020-05-27 Data allocation method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111752944A true CN111752944A (en) 2020-10-09
CN111752944B CN111752944B (en) 2024-07-09

Family

ID=72674031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010464023.3A Active CN111752944B (en) 2020-05-27 2020-05-27 Data allocation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111752944B (en)

Also Published As

Publication number Publication date
CN111752944B (en) 2024-07-09

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant