CN109299183A - Data processing method, apparatus, terminal device and storage medium - Google Patents
- Publication number: CN109299183A
- Application number: CN201811386474.9A
- Authority: CN (China)
- Prior art keywords: data, cleaning, raw data
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method, apparatus, terminal device and storage medium. The method comprises: obtaining two or more pieces of raw data, the raw data including field information and data content; converting the raw data into a field identifier, and extracting a primary key from the field information; if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, writing the raw data into the raw database; performing standardization preprocessing on the raw data written into the raw database to obtain cleaned data; and if the primary key of the cleaned data has no intersection with the primary keys in the cleaned database, writing the cleaned data into the cleaned database. During data processing, the invention does not need to compare the primary keys of all data in the raw database with the primary keys in the cleaned database, which improves data processing efficiency.
Description
Technical field
Embodiments of the present invention relate to data processing technology, and in particular to a data processing method, apparatus, terminal device and storage medium.
Background
ETL is the abbreviation of Extract-Transform-Load, that is, data extraction, transformation and loading. ETL is an important part of building a data warehouse: data extracted from heterogeneous data sources is cleaned and transformed, then loaded into the destination data warehouse (the cleaned database), where it serves as the basis for online analytical processing and data mining.
In conventional ETL processing, when a large batch of data arrives in the source data warehouse, the data is read directly from the source data warehouse, passed through the cleaning and transformation stage, and then updated or inserted into the destination data warehouse. However, when large volumes of data must be updated in application scenarios that demand timeliness, this approach easily becomes a bottleneck for data timeliness, so that data in the source data warehouse cannot be updated into the destination data warehouse in time.
Summary of the invention
In view of this, the present invention provides a data processing method, apparatus, terminal device and storage medium to improve data processing efficiency.
In a first aspect, an embodiment of the present invention provides a data processing method, comprising:
Obtaining two or more pieces of raw data, the raw data including field information and data content;
Converting the raw data into a field identifier, and extracting a primary key from the field information;
If the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, writing the raw data into the raw database;
Performing standardization preprocessing on the raw data written into the raw database to obtain cleaned data;
If the primary key of the cleaned data has no intersection with the primary keys in the cleaned database, writing the cleaned data into the cleaned database.
Further, before obtaining the two or more pieces of raw data, the method further comprises:
Obtaining a raw data file, and judging the format of the raw data file;
If the raw data file is a JSON file, parsing the raw data file to obtain raw data in JSON data format.
Further, before performing standardization preprocessing on the raw data written into the raw database, the method further comprises:
Obtaining the write time at which the raw data is written into the raw database, or the creation time of the raw data file;
Using the write time or the creation time as the batch identifier of the raw data.
Further, performing standardization preprocessing on the raw data written into the raw database comprises:
Querying the batch identifiers corresponding to the raw data in the raw database;
Performing standardization preprocessing on the raw data corresponding to the newest batch identifier.
Further, after writing the cleaned data into the cleaned database, the method further comprises:
Obtaining the last push time of the cleaned data;
Pushing the cleaned data whose time is later than the last push time and earlier than the current system time to the associated application platform.
Further, converting the raw data into a field identifier specifically comprises:
Converting the raw data into a corresponding hash value.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, comprising:
A first obtaining module, configured to obtain two or more pieces of raw data, the raw data including field information and data content;
A conversion and extraction module, configured to convert the raw data into a field identifier and extract a primary key from the field information;
A first writing module, configured to write the raw data into the raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database;
A preprocessing module, configured to perform standardization preprocessing on the raw data written into the raw database to obtain cleaned data;
A second writing module, configured to write the cleaned data into the cleaned database if the primary key of the cleaned data has no intersection with the primary keys in the cleaned database.
Further, the data processing apparatus further comprises:
A format judgment module, configured to obtain a raw data file before the two or more pieces of raw data are obtained, and to judge the format of the raw data file;
A parsing module, configured to parse the raw data file if it is a JSON file, to obtain raw data in JSON data format.
Further, the data processing apparatus further comprises:
A second obtaining module, configured to obtain, before standardization preprocessing is performed on the raw data written into the raw database, the write time at which the raw data is written into the raw database or the creation time of the raw data file;
A determining module, configured to use the write time or the creation time as the batch identifier of the raw data.
Further, the preprocessing module comprises:
A query unit, configured to query the batch identifiers corresponding to the raw data in the raw database;
A preprocessing unit, configured to perform standardization preprocessing on the raw data corresponding to the newest batch identifier.
Further, the data processing apparatus further comprises:
A third obtaining module, configured to obtain the last push time of the cleaned data after the cleaned data is written into the cleaned database;
A pushing module, configured to push the cleaned data whose time is later than the last push time and earlier than the current system time to the associated application platform.
Further, when converting the raw data into a field identifier, the conversion and extraction module is specifically configured to:
Convert the raw data into a corresponding hash value.
In a third aspect, an embodiment of the present invention further provides a terminal device, comprising a memory and one or more processors;
The memory is configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the data processing method described in the first aspect.
The present invention obtains two or more pieces of raw data that include field information and data content; converts the raw data into a field identifier and extracts a primary key from the field information; writes the raw data into the raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database; performs standardization preprocessing on the raw data written into the raw database to obtain cleaned data; and writes the cleaned data into the cleaned database if the primary key of the cleaned data has no intersection with the primary keys in the cleaned database. During data processing there is no need to compare the primary keys of all data in the raw database with the primary keys in the cleaned database, which improves data processing efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of a data processing method provided by Embodiment one of the present invention;
Fig. 2 is a schematic diagram of writing raw data into the raw database, provided by Embodiment one of the present invention;
Fig. 3 is a schematic diagram of writing cleaned data into the cleaned database, provided by Embodiment one of the present invention;
Fig. 4 is a flowchart of a data processing method provided by Embodiment two of the present invention;
Fig. 5 is a schematic diagram of a data processing procedure provided by Embodiment two of the present invention;
Fig. 6 is a schematic diagram of determining batch identifiers, provided by Embodiment two of the present invention;
Fig. 7 is a structural block diagram of a data processing system provided by Embodiment two of the present invention;
Fig. 8 is a flowchart of a data processing method provided by Embodiment three of the present invention;
Fig. 9 is a schematic diagram of the component connections for data processing, provided by Embodiment three of the present invention;
Fig. 10 is a structural block diagram of a data processing apparatus provided by Embodiment four of the present invention;
Fig. 11 is a structural schematic diagram of a terminal device provided by Embodiment five of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
It should be noted that in all embodiments of the data processing method of this solution, the ETL data processing procedure is implemented by configuring components in the Kettle tool. Kettle is an ETL tool that allows data from different databases to be managed and describes the data processing to be carried out through a graphical user environment.
Embodiment one
Fig. 1 is a flowchart of a data processing method provided by Embodiment one of the present invention. The data processing method provided in this embodiment may be executed by a terminal device, which may be implemented in software and/or hardware and may consist of a single physical entity or of two or more physical entities. In this embodiment the terminal device is a server that processes raw data and provides performance support for the application platform associated with the server.
With reference to Fig. 1, the data processing method specifically comprises the following steps:
S110: Obtain two or more pieces of raw data.
The raw data includes field information and data content.
In this embodiment, raw data can be understood as the different data obtained from heterogeneous data sources, where heterogeneous data sources are data held in different database management systems. It can be understood that the server may obtain raw data from different data sources at the same time. The raw data contains multiple data tables, and each data table of the raw data includes field information and data content: the field information can be understood as the field names in each data table, and the data content as the data stored under each field name in the table. In other words, each field name corresponds to its own data content. Raw data is data that has not undergone standardization preprocessing such as field filtering and format conversion.
S120: Convert the raw data into a field identifier, and extract a primary key from the field information.
The field identifier indicates whether the data content of the file containing the raw data has been modified: if it has been modified, the field identifier changes; if not, the field identifier stays the same. Illustratively, the field identifier may be a hash value or an MD5 (MD5 Message-Digest Algorithm) value. A hash value is the output of a hash function, which compresses a message or data into a digest of fixed format so that the amount of data to compare becomes smaller; MD5 is a widely used cryptographic hash function that produces a 128-bit hash value and is used to ensure that transmitted information is complete and consistent. In this embodiment, the raw data is converted into a field identifier in order to detect whether the data content of the file containing the raw data has been modified, without comparing the data content of the raw data item by item, which speeds up detection of the raw data.
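The conversion in S120 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `field_identifier`, the sample records, and the choice of hashing a key-sorted JSON serialization are assumptions made here for the example.

```python
import hashlib
import json


def field_identifier(record: dict) -> str:
    """Return an MD5 hex digest of a record (hypothetical helper)."""
    # Assumption: sorting keys gives a canonical form, so two records with
    # the same content always produce the same identifier.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


original = {"id": 7, "name": "Alice", "city": "Guangzhou"}
modified = {"id": 7, "name": "Alice", "city": "Shenzhen"}
reordered = {"city": "Guangzhou", "name": "Alice", "id": 7}

same_content = field_identifier(original) == field_identifier(reordered)
changed_content = field_identifier(original) != field_identifier(modified)
print(same_content, changed_content)
```

Comparing two 32-character digests replaces a field-by-field comparison of the records, which is the speed-up the paragraph above describes.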
It should be noted that the raw data contains multiple data tables and each data table contains multiple records. To uniquely identify a given record in a data table, one or more fields can be extracted from the table as its primary key, so that lookups in the data table can use the primary key to speed up database operations.
S130: If the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, write the raw data into the raw database.
The raw database can be understood as the database that stores raw data. In this embodiment, the file containing each piece of raw data has a unique field identifier stored in the raw database. If the field identifier corresponding to the raw data is identical to a field identifier in the raw database, the data content of that raw data has not been modified, that is, it duplicates data content already in the raw database, so the raw data does not need to be written into the raw database. Conversely, if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, that is, its data content differs from the data content in the raw database, the raw data is written into the raw database to provide data support for the application platform associated with the server. Fig. 2 is a schematic diagram of writing raw data into the raw database, provided by Embodiment one of the present invention. With reference to Fig. 2, suppose four pieces of raw data are obtained. Each is converted into its corresponding field identifier, namely field identifier 1, field identifier 2, field identifier 3 and field identifier 4. These four field identifiers are then compared with the field identifiers in the raw database. Field identifier 1 and field identifier 3 are found to exist already in the raw database, so the raw data corresponding to field identifier 1 and field identifier 3 is deleted, and only the raw data corresponding to field identifier 2 and field identifier 4 is written into the raw database.
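The Fig. 2 example can be sketched in a few lines. The in-memory dictionary standing in for the raw database, the identifier strings `fid1` to `fid4`, and the function name `write_new_raw_data` are all hypothetical choices for illustration.

```python
def write_new_raw_data(incoming: dict, raw_db: dict) -> list:
    """Write only raw data whose field identifier is not yet in raw_db.

    Both arguments map field identifier -> raw data. Returns the
    identifiers that were actually written.
    """
    written = []
    for fid, data in incoming.items():
        if fid in raw_db:
            continue  # identifier already stored: duplicate, discard it
        raw_db[fid] = data
        written.append(fid)
    return written


# Field identifiers 1 and 3 already exist in the raw database.
raw_db = {"fid1": "raw data 1", "fid3": "raw data 3"}
incoming = {
    "fid1": "raw data 1",
    "fid2": "raw data 2",
    "fid3": "raw data 3",
    "fid4": "raw data 4",
}
written = write_new_raw_data(incoming, raw_db)
print(written)
```

Only the raw data behind field identifiers 2 and 4 is written, matching the outcome described for Fig. 2.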
It should be noted that this solution compares the field identifier corresponding to the raw data with the field identifiers in the raw database, rather than comparing primary keys, to decide whether to write the raw data into the raw database; this makes the raw data traceable. Suppose raw data needs to be traced to find its historical version records. If the comparison were based on the primary key of the raw data and the primary keys in the raw database, then whenever the primary key of the raw data is unchanged but the data content of other fields has changed, the changed data content would not be written into the raw database; raw data would be omitted and the corresponding historical version record could not be found. In this solution the comparison is based on the field identifier: if any data content of the raw data is modified, the field identifier changes as well, so the raw data is written into the raw database. This guarantees the integrity of the raw data and allows the corresponding historical version records and raw data to be found.
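The difference between the two comparison strategies can be illustrated with a record whose primary key is unchanged but whose content has been updated. The record layout, the `fid` helper and the use of MD5 over sorted JSON are assumptions for this sketch only.

```python
import hashlib
import json


def fid(record: dict) -> str:
    """Hypothetical field identifier: MD5 of the key-sorted JSON form."""
    return hashlib.md5(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


raw_db_rows = [{"id": 1, "price": 10}]   # version already stored
update = {"id": 1, "price": 12}          # same primary key, new content

known_keys = {row["id"] for row in raw_db_rows}
known_fids = {fid(row) for row in raw_db_rows}

# Primary-key comparison would treat the update as a duplicate and lose it.
skipped_by_primary_key = update["id"] in known_keys
# Field-identifier comparison sees changed content and keeps the new version.
skipped_by_field_id = fid(update) in known_fids

print(skipped_by_primary_key, skipped_by_field_id)
```

With the field identifier, both versions of record 1 end up in the raw database, which is what makes the historical version traceable.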
S140: Perform standardization preprocessing on the raw data written into the raw database to obtain cleaned data.
Standardization preprocessing includes a series of steps such as field filtering, format conversion and data validation. In this embodiment, after the raw data is written into the raw database, the field information in the raw data is filtered, the data content corresponding to the remaining field information is converted into a preset format, and the converted data content is validated to obtain cleaned data. Field filtering can be understood as filtering out the field information in the raw data that does not match the preset field information of the raw database. For example, if the raw data includes field D but field D is not among the preset field information of the raw database, then when the other data content of the raw data is written into the raw database, the data content corresponding to field D is filtered out. The filtered raw data is then converted into the preset format. For example, suppose the preset data length threshold is 300 and the preset row count threshold is 100; if the raw data has a data length of 200 and 200 rows, the raw data must be split so that each part is no larger than the data length threshold and the row count threshold. Data validation is then performed on the converted raw data to guarantee the legality of the data. Specifically, the character strings in the converted raw data are checked and illegal strings are filtered out, finally yielding the cleaned data.
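The three preprocessing stages can be sketched as below. The patent does not define the preset fields or what counts as an illegal string, so `PRESET_FIELDS`, the rule in `is_legal` (empty or non-printable strings are rejected) and the function names are assumptions; only the row count threshold of 100 comes from the example above, and the data length threshold is not modelled.

```python
PRESET_FIELDS = {"id", "name"}  # hypothetical preset field information
MAX_ROWS = 100                  # row count threshold from the example


def filter_fields(rows: list) -> list:
    """Field filtering: drop fields that are not preset (e.g. field D)."""
    return [{k: v for k, v in row.items() if k in PRESET_FIELDS} for row in rows]


def split_rows(rows: list, max_rows: int = MAX_ROWS) -> list:
    """Format conversion: split into parts of at most max_rows rows."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def is_legal(row: dict) -> bool:
    """Data validation (assumed rule): strings must be non-blank and printable."""
    return all(
        not isinstance(v, str) or (bool(v.strip()) and v.isprintable())
        for v in row.values()
    )


def standardize(rows: list) -> list:
    parts = split_rows(filter_fields(rows))
    return [[row for row in part if is_legal(row)] for part in parts]


rows = [{"id": i, "name": f"user{i}", "D": "extra"} for i in range(200)]
rows[5]["name"] = "bad\x00name"  # contains a non-printable character
chunks = standardize(rows)
print([len(chunk) for chunk in chunks])
```

The 200 rows are split into two parts of at most 100 rows, field D is removed, and the one row with an illegal string is dropped from the first part.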
S150: If the primary key of the cleaned data has no intersection with the primary keys in the cleaned database, write the cleaned data into the cleaned database.
The cleaned database can be understood as the data warehouse that stores cleaned data, that is, the destination data warehouse. In this embodiment, to improve write efficiency, the primary key of the cleaned data can be compared directly with the primary keys in the cleaned database. If the primary key of the cleaned data does not duplicate any primary key in the cleaned database, the cleaned data obtained in step S140 is not yet stored in the cleaned database and must be written into it, thereby updating the data in the cleaned database. Conversely, if the primary key of the cleaned data duplicates a primary key in the cleaned database, the cleaned data obtained in step S140 is already stored in the cleaned database and can simply be filtered out.
Fig. 3 is a schematic diagram of writing cleaned data into the cleaned database, provided by Embodiment one of the present invention. Fig. 3 illustrates this process on the basis of Fig. 2. With reference to Fig. 3, after the raw data corresponding to field identifier 2 and field identifier 4 is written into the raw database, standardization preprocessing is performed on it to obtain cleaned data, and the corresponding primary keys, primary key 2 and primary key 4, are looked up. These two primary keys are compared with the primary keys in the cleaned database. Primary key 4 is found to exist already in the cleaned database, so the cleaned data corresponding to primary key 4 is deleted and only the cleaned data corresponding to primary key 2 is written into the cleaned database. Field identifier 2 and primary key 2 correspond to the same raw data, as do field identifier 4 and primary key 4.
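The Fig. 3 example can be sketched in the same style as the raw-database write. The dictionary keyed by primary key, the `id` field used as the primary key, and the function name `write_cleaned` are hypothetical.

```python
def write_cleaned(cleaned_rows: list, cleaned_db: dict, key: str = "id") -> list:
    """Write cleaned rows whose primary key is not yet in cleaned_db.

    cleaned_db maps primary key -> row. Returns the primary keys written.
    """
    written = []
    for row in cleaned_rows:
        primary_key = row[key]
        if primary_key in cleaned_db:
            continue  # primary key already present: filter the row out
        cleaned_db[primary_key] = row
        written.append(primary_key)
    return written


# Primary key 4 already exists in the cleaned database.
cleaned_db = {4: {"id": 4, "name": "existing"}}
cleaned_rows = [{"id": 2, "name": "b"}, {"id": 4, "name": "d"}]
written_keys = write_cleaned(cleaned_rows, cleaned_db)
print(written_keys)
```

Only the row with primary key 2 is written; the existing row for primary key 4 is left untouched.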
The technical solution of this embodiment obtains two or more pieces of raw data that include field information and data content; converts the raw data into a field identifier and extracts a primary key from the field information; writes the raw data into the raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database; performs standardization preprocessing on the raw data written into the raw database to obtain cleaned data; and writes the cleaned data into the cleaned database if the primary key of the cleaned data has no intersection with the primary keys in the cleaned database. During data processing there is no need to compare the primary keys of all data in the raw database with the primary keys in the cleaned database, which improves data processing efficiency.
Embodiment two
Fig. 4 is a flowchart of a data processing method provided by Embodiment two of the present invention. This embodiment further details the data processing method on the basis of the above embodiment. Fig. 5 is a schematic diagram of a data processing procedure provided by Embodiment two of the present invention. To make the procedure easier to explain, this embodiment obtains only five raw data files: raw data file 1, raw data file 2, raw data file 3, raw data file 4 and raw data file 5, as shown in Fig. 5.
Referring to Fig. 4, the data processing method specifically comprises the following steps:
S201: Obtain a raw data file and judge its format.
A raw data file can be understood as the file that stores raw data. In this embodiment, Kettle, the ETL tool, obtains the raw data that matches the preset data format. That is, after a raw data file is obtained, the data format of the raw data in it is screened to obtain the raw data that meets the preset data format. The data format of the raw data in a raw data file may be EXCEL, JSON, text or another format, and is not limited here. It can be understood that after a raw data file storing raw data is obtained from a heterogeneous data source, the format of the raw data in it is judged in order to identify the raw data and raw data files that meet the preset data format. It should be noted that the format of a raw data file and the data format of its raw data are the same. For example, a raw data file generally takes the form of an ordinary folder, but each document in it that stores raw data has the same format as the raw data. Each raw data file may contain multiple documents storing raw data. For example, if a raw data file contains three documents and the raw data in all three is in EXCEL format, then the format of these three documents is EXCEL.
S202: If the raw data file is a JSON file, parse the raw data file to obtain raw data in JSON data format.
In this embodiment, the preset data format of the raw data in a raw data file is the JSON data format. That is, after a raw data file is obtained, its format is judged; if the raw data file is in JSON format, it is parsed to obtain raw data in JSON data format. Of course, the data format of the raw data can also be set to another data format according to business requirements. As shown in Fig. 5, the raw data files are parsed to obtain raw data in JSON data format, namely raw data 1, raw data 2, raw data 3, raw data 4 and raw data 5.
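Steps S201 and S202 can be sketched outside Kettle as follows. The patent performs this inside Kettle and does not say how the format is judged; checking the file extension, skipping documents that fail to parse, and the name `load_json_raw_data` are assumptions for this illustration.

```python
import json
import tempfile
from pathlib import Path


def load_json_raw_data(folder: Path) -> list:
    """Parse every JSON document in a raw data folder (hypothetical helper)."""
    records = []
    for path in sorted(folder.iterdir()):
        # Format judgment (assumed rule): only documents ending in .json qualify.
        if path.suffix.lower() != ".json":
            continue
        try:
            records.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue  # not valid JSON despite the extension: skip it
    return records


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "raw1.json").write_text('{"id": 1, "name": "a"}', encoding="utf-8")
    (folder / "raw2.xlsx").write_text("not json", encoding="utf-8")
    parsed = load_json_raw_data(folder)

print(parsed)
```

Documents in other formats are simply ignored here; a real deployment could route them to a different parser when the preset data format is changed.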
S203: Obtain two or more pieces of raw data.
The raw data includes field information and data content. In this embodiment, since there are five raw data files, five pieces of raw data are obtained correspondingly.
S204: Convert the raw data into a field identifier, and extract a primary key from the field information.
In this embodiment, each piece of raw data is converted into a corresponding field identifier, which may be a hash value or an MD5 value. At the same time, the corresponding primary key is extracted from the field information of each piece of raw data.
S205: If the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, write the raw data into the raw database.
Specifically, it is judged whether the field identifier corresponding to the raw data intersects with the field identifiers in the raw database. As shown in Fig. 5, field identifier 2 intersects with the field identifiers in the raw database, so the raw data corresponding to field identifier 2 is deleted, which deduplicates the data in the raw database; the raw data corresponding to field identifier 1, field identifier 3, field identifier 4 and field identifier 5 is written into the raw database.
S206: Obtain the write time at which the raw data is written into the raw database, or the creation time of the raw data file.
The write time is the system time at which the raw data is written into the raw database; the creation time is the system time at which the raw data is assembled into a raw data file. In this embodiment, the creation time of a raw data file is earlier than the write time at which its raw data is written into the raw database. It can be understood that when a raw data file is obtained from a heterogeneous data source, the file has already been created; to make the creation time easier to record, the system time at which the raw data file is obtained from the heterogeneous data source can be used as the creation time of the raw data file. After the raw data file is obtained, its raw data is converted into field identifiers and the field identifiers are compared; the raw data that passes the deduplication filtering rule is then written into the raw database, and the system time of that write is used as the write time. The deduplication filtering rule can be understood as the rule that filters raw data by comparing the field identifier corresponding to the raw data with the field identifiers in the raw database.
S207: Use the write time or the creation time as the batch identifier of the raw data.
The batch identifier identifies the order in which raw data is updated in the raw database. In this embodiment, so that the newest raw data can be found in the raw database as quickly as possible, a corresponding batch identifier is set for each piece of raw data written into the raw database. The write time at which the raw data is written into the raw database can be used directly as its batch identifier in the raw database, or the creation time of the raw data file can be used instead. To make the update order of raw data in the raw database easier to determine from the batch identifier, the batch identifier can be set directly as the time in numeric form. For example, if the creation time of the raw data file is 16:28 on November 9, 2018, the corresponding batch identifier is 201811091628; likewise, if the write time at which the raw data is written into the raw database is 18:06 on November 9, 2018, the corresponding batch identifier is 201811091806.
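The numeric batch identifiers in the example above correspond to a year-month-day-hour-minute timestamp. A minimal sketch, with `batch_identifier` as a hypothetical helper name:

```python
from datetime import datetime


def batch_identifier(moment: datetime) -> str:
    """Format a time as a numeric batch identifier, e.g. 201811091628."""
    return moment.strftime("%Y%m%d%H%M")


creation_batch = batch_identifier(datetime(2018, 11, 9, 16, 28))
write_batch = batch_identifier(datetime(2018, 11, 9, 18, 6))
print(creation_batch, write_batch)
```

Because every identifier has the same fixed width, comparing identifiers as strings or as integers gives the same chronological order, which is what lets the newest batch be found with a simple maximum.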
Fig. 6 is a schematic diagram of determining batch identifiers, provided by Embodiment two of the present invention. Assume a 1-minute interval and a total data statistics period of 15 minutes. As shown in Fig. 6, the creation time of raw data file 5 is the 2nd minute; the creation time of raw data file 1 is the 6th second of the 7th minute; the creation time of raw data file 3 is the 30th second of the 9th minute; and the creation time of raw data file 4 is the 18th second of the 12th minute, as shown by the solid arrows in Fig. 6. The write time at which these four raw data files are written into the raw database is the 30th second of the 15th minute, as shown by the dashed arrow in Fig. 6.
Of course, to manage the batch identifiers of raw data uniformly, the kind of time used for the batch identifier must be fixed. That is, if the write time at which raw data is written into the raw database is used as the batch identifier, the batch identifiers of all raw data must be based on the write time; likewise, if the creation time of the raw data file is used as the batch identifier, the batch identifiers of all raw data must be based on the creation time. Write time and creation time must not be mixed.
S208, query the batch identifier corresponding to the raw data in the raw database.
In this embodiment, after the raw data is written into the raw database, the batch identifier corresponding to the raw data may also be written into the raw database, so that the corresponding raw data can later be found quickly in the raw database by its batch identifier. A data query statement may be used to look up the batch identifier in the raw database; for example, the query statement may be a statement of Structured Query Language (SQL) or of a database such as Oracle. This is of course not limiting, and the choice may be made according to business requirements. It should be noted that, to make the batch identifier easier to query, the batch identifier may be written into a pre-created temporary data table, and the temporary data table is stored in the raw database. The temporary data table may also hold the relationship between raw data and batch identifiers, so that the corresponding raw data in the raw database can be obtained directly from a batch identifier.
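The batch identifier lookup described above can be sketched with an in-memory SQLite database. The table name `raw_data`, its columns and the sample rows are assumptions made for this illustration; the source only states that a query statement is used, without giving a schema.

```python
import sqlite3

# Hypothetical raw database: one row per piece of raw data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_data (field_id TEXT PRIMARY KEY, content TEXT, batch_id TEXT)"
)
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [
        ("f2", '{"v": 2}', "20181120100000"),  # older batch
        ("f1", '{"v": 1}', "20181120101530"),  # newest batch
        ("f3", '{"v": 3}', "20181120101530"),  # newest batch
    ],
)


def latest_batch_id(connection):
    """Return the newest batch identifier (fixed-width timestamps sort as text)."""
    row = connection.execute("SELECT MAX(batch_id) FROM raw_data").fetchone()
    return row[0]


def raw_data_for_batch(connection, batch_id):
    """Return (field_id, content) pairs belonging to one batch."""
    cursor = connection.execute(
        "SELECT field_id, content FROM raw_data WHERE batch_id = ? ORDER BY field_id",
        (batch_id,),
    )
    return cursor.fetchall()


newest = latest_batch_id(conn)
rows = raw_data_for_batch(conn, newest)
```

Only the rows of the newest batch are returned, which is the set that the subsequent standardization step operates on.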
S209, standardize and preprocess the raw data corresponding to the newest batch identifier to obtain cleaned data.
In this embodiment, to speed up the standardization of the raw data in the raw database, only the raw data corresponding to the newest batch identifier in the raw database needs to be standardized. That is, the newest batch identifier in the current raw database is obtained with a data query statement, the raw data corresponding to that batch identifier is obtained, and a series of standardization preprocessing steps such as field filtering, format conversion and data checking are then applied to produce cleaned and filtered data. The detailed process of standardizing and preprocessing the raw data is described in the above embodiments and is not repeated here. As shown in Fig. 5, the batch identifiers corresponding to the raw data of field identifier 1, field identifier 3, field identifier 4 and field identifier 5 are the newest in the raw database, so the raw data corresponding to these four field identifiers is standardized and preprocessed to obtain the corresponding cleaned data 1, cleaned data 3, cleaned data 4 and cleaned data 5.
It should be noted that, in this embodiment, the write time at which the raw data is written into the raw database is used as the batch identifier of the raw data, so field identifier 1, field identifier 3, field identifier 4 and field identifier 5 all correspond to the same batch identifier.
S210, if the primary key of the cleaned data has no intersection with the primary keys in the cleaning database, write the cleaned data into the cleaning database.
As shown in Fig. 5, primary key 3, which corresponds to cleaned data 3, duplicates a primary key in the cleaning database, so cleaned data 3 is deleted and only cleaned data 1, cleaned data 4 and cleaned data 5 are written into the cleaning database.
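The primary key check of step S210 can be sketched as follows, using a dictionary keyed by primary key as a stand-in for the cleaning database. The function name `write_cleaned`, the record layout and the key names `pk1` to `pk5` are assumptions chosen to mirror the Fig. 5 example.

```python
def write_cleaned(cleaning_db, cleaned_records):
    """Write records whose primary key is absent; return the discarded ones."""
    discarded = []
    for record in cleaned_records:
        key = record["primary_key"]
        if key in cleaning_db:
            # Duplicate primary key: the cleaned record is dropped.
            discarded.append(record)
        else:
            cleaning_db[key] = record
    return discarded


# Stand-in cleaning database that already holds primary key 3.
cleaning_db = {"pk3": {"primary_key": "pk3", "value": "existing"}}
batch = [
    {"primary_key": "pk1", "value": "cleaned data 1"},
    {"primary_key": "pk3", "value": "cleaned data 3"},
    {"primary_key": "pk4", "value": "cleaned data 4"},
    {"primary_key": "pk5", "value": "cleaned data 5"},
]
discarded = write_cleaned(cleaning_db, batch)
```

Because only the newest batch is checked, the comparison touches the few primary keys in that batch rather than every primary key derived from the raw database.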
The technical solution of this embodiment, on the basis of the above embodiments, checks the format of the raw data file to obtain raw data in JSON format. It also standardizes and preprocesses the raw data corresponding to the newest batch identifier in the raw database to obtain cleaned data, and writes the cleaned data into the cleaning database when the primary key of the cleaned data has no intersection with the primary keys in the cleaning database. In this way, only raw data in the preset format is acquired and only the raw data corresponding to the newest batch identifier is standardized and preprocessed, which simplifies the data processing procedure and thus increases data processing speed.
On the basis of the above embodiments, to update the data in the application platform associated with the server in time, the method further includes, after step S210:
S211, obtain the last push time of the cleaned data.
The last push time can be understood as the most recent time at which cleaned data in the cleaning database was pushed to the application platform associated with the server. In this embodiment, the last push time may be obtained directly with a data query statement. Specifically, after cleaned data was most recently sent to the associated application platform, that push time is recorded and stored in a preset temporary time data table so that it can be retrieved later. When the last push time is needed, it can then be queried directly with a query statement of a database such as SQL or Oracle.
S212, push the cleaned data that is later than the last push time and earlier than the current system time to the associated application platform.
In this embodiment, when the application platform associated with the server is open and in use, the cleaned data updated in the cleaning database should be sent to that application platform in time. To do so, the last push time of the cleaned data is obtained and the current system time is collected, all cleaned data later than the last push time and earlier than the current system time is obtained, and all of that cleaned data is pushed to the associated application platform over a data communication channel. The data communication channel may be a wireless network, a wired network or the like, without limitation. The application platform may be an application program installed on a client, where the client may be a device such as a desktop computer, a laptop or a smartphone.
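The time window used in step S212 can be sketched as a simple filter. The function name `select_for_push`, the `updated_at` field and the sample timestamps are assumptions for illustration; the source does not say which timestamp of the cleaned data is compared.

```python
from datetime import datetime


def select_for_push(cleaned_rows, last_push_time, now):
    """Return rows stamped strictly after the last push and strictly before `now`."""
    return [
        row for row in cleaned_rows
        if last_push_time < row["updated_at"] < now
    ]


rows = [
    {"primary_key": "pk1", "updated_at": datetime(2018, 11, 20, 9, 0, 0)},
    {"primary_key": "pk4", "updated_at": datetime(2018, 11, 20, 10, 20, 0)},
    {"primary_key": "pk5", "updated_at": datetime(2018, 11, 20, 10, 25, 0)},
]
last_push = datetime(2018, 11, 20, 10, 0, 0)
now = datetime(2018, 11, 20, 10, 30, 0)
to_push = select_for_push(rows, last_push, now)
```

Both bounds are strict here because the text says "later than" and "earlier than"; after a successful push, `now` would be stored as the new last push time so that the next run picks up where this one stopped.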
Fig. 7 is a structural block diagram of a data processing system provided by Embodiment 2 of the present invention. As shown in Fig. 7, the data processing system includes a server 310 and an application platform 320. The server 310 is used to obtain raw data and process it to obtain cleaned data; the application platform 320 may be a desktop computer, a smartphone or a laptop. After the server 310 pushes the cleaned data to the application platform 320, the application platform 320 updates the data in its own database according to the cleaned data.
Of course, success and failure checks may be added at the key links of the data processing procedure. For example, the span from writing the raw data into the raw database through pushing the cleaned data to the associated application platform may be treated as the key links of the data processing procedure. When an error is detected in the data processing procedure, the error information is sent directly to the relevant developers through the mail component in Kettle, so that the developers can monitor the data processing procedure in real time.
Embodiment three
Fig. 8 is a flow chart of a data processing method provided by Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment describes the data processing procedure in terms of the components in Kettle. Referring to Fig. 8, the data processing method includes the following steps:
S410, set the start time.
The components of Kettle include task modules; each task module may include multiple processes, and the processes may run in parallel. Each process may in turn include multiple components that run serially. Fig. 9 is a schematic diagram of the component connections for data processing provided by Embodiment 3 of the present invention. In this embodiment, the data processing procedure may be regarded as one process that includes the multiple components shown in Fig. 9, and each component may execute a different data processing step. As shown in Fig. 9, component 510 is a start component used to set the timing policy for starting a task, for example a fixed time or a time interval. A fixed time means that a task is started at a set moment, and a time interval means that a task is started once every set period. Here, one task can be understood as one data processing procedure.
S420, count the number of raw data files and determine whether it is 0; if it is 0, execute step S470; if it is not 0, execute step S430.
In this embodiment, after a click on the start component is received, the raw data files are obtained and their format is checked. If a raw data file is a JSON file, it is counted. If the number of raw data files is not 0, step S430 is executed; if the number of raw data files is 0, step S470 is executed. The detailed process of step S420 may be implemented by component 520 shown in Fig. 9.
S430, write the raw data into the raw database.
In this embodiment, the raw data file is parsed to obtain raw data in JSON format. The field information and data content of the raw data are then obtained, the raw data is converted into a field identifier, and the primary key is extracted from the field information. It is then determined whether the field identifier corresponding to the raw data intersects the field identifiers in the raw database. If there is no intersection, the raw data does not yet exist in the raw database and is written into it. This step may be implemented by component 530 shown in Fig. 9, where component 530 is the raw database.
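The field identifier check of step S430 can be sketched as follows. The document states elsewhere that the field identifier may be a hash value of the raw data; the use of SHA-256 over a key-sorted JSON serialization, the function names and the dictionary standing in for the raw database are assumptions made for this illustration.

```python
import hashlib
import json


def field_identifier(raw):
    """Hash a raw record; sorting keys makes the hash independent of key order."""
    canonical = json.dumps(raw, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_new_raw(raw_db, records):
    """Write records whose field identifier is not yet in the raw database."""
    written = []
    for raw in records:
        identifier = field_identifier(raw)
        if identifier not in raw_db:
            raw_db[identifier] = raw
            written.append(identifier)
    return written


raw_db = {}  # stand-in raw database: field identifier -> raw record
first = write_new_raw(raw_db, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
# The first record repeats earlier content with a different key order.
second = write_new_raw(raw_db, [{"name": "a", "id": 1}, {"id": 3, "name": "c"}])
```

A repeated record produces the same identifier and is skipped, so only content that is new to the raw database is written.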
It should be noted that the process of writing the raw data in a raw data file into the raw database may follow the prior-art process of inserting a JSON file into a database through Kettle components. Specifically, it may include: JSON parsing, getting variables, field selection, replacing NULL values, adding a check column, and getting system information. JSON parsing parses the raw data file to obtain raw data in JSON format. The variable values set in the get-variables component are then obtained, and the required fields are renamed and filtered through field selection. Null values in the raw data are replaced with the string NULL so that a composite primary key can be set. The fields used for the hash value are then determined, and system information such as the system time is obtained.
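The sequence of parsing, field selection, NULL replacement, check column and system information can be sketched in plain Python. This is not Kettle configuration; the function `prepare_record`, the column names `check_hash` and `loaded_at`, the `|` separator and the choice of SHA-256 are all assumptions introduced for illustration.

```python
import hashlib
import json
from datetime import datetime


def prepare_record(json_text, field_map, key_fields, system_time):
    """Parse one JSON record and apply the preparation steps in order.

    field_map maps source field names to output names (selection + renaming);
    key_fields lists output names (all present in field_map) to hash.
    """
    parsed = json.loads(json_text)
    # Field selection and renaming: keep only the required fields.
    selected = {new: parsed.get(old) for old, new in field_map.items()}
    # Replace null values with the string "NULL" so they can join a composite key.
    record = {k: ("NULL" if v is None else v) for k, v in selected.items()}
    # Check column: hash of the fields that make up the composite key.
    joined = "|".join(str(record[name]) for name in key_fields)
    record["check_hash"] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    # System information: here only the system time.
    record["loaded_at"] = system_time.isoformat()
    return record


text = '{"id": 7, "city": "Guangzhou", "region": null, "extra": "x"}'
record = prepare_record(
    text,
    field_map={"id": "record_id", "city": "city", "region": "region"},
    key_fields=["record_id", "region"],
    system_time=datetime(2018, 11, 20, 10, 15, 30),
)
```

Replacing null with a fixed string before hashing matters because a missing value would otherwise make the composite key undefined; with the replacement, two records that both lack `region` still hash consistently.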
S440, write the cleaned data into the cleaning database.
In this embodiment, after the raw data is written into the raw database, the batch identifier corresponding to the raw data is queried, and a series of standardization preprocessing steps such as field filtering, format conversion and data checking are applied to the raw data corresponding to the newest batch identifier to obtain cleaned data. The primary key of the cleaned data is then compared with the primary keys in the cleaning database; if the primary key of the cleaned data does not exist among the primary keys in the cleaning database, the cleaned data is written into the cleaning database. Component 540 shown in Fig. 9 is the cleaning database. In other words, this step is carried out in the course of obtaining the raw data corresponding to the newest batch identifier from component 530 and writing the result into the cleaning database of component 540.
S450, count the amount of cleaned data newly added or updated in the cleaning database, and determine whether it is 0.
In this embodiment, the data in the cleaning database is queried to determine the amount of cleaned data newly added or updated in the cleaning database. If the amount of newly added or updated cleaned data is 0, step S470 is executed; if it is not 0, step S460 is executed. This step is implemented by component 550 shown in Fig. 9.
S460, push the newly added or updated cleaned data to the associated application platform.
In this embodiment, after cleaned data in the cleaning database has been updated or newly added, and while the application platform associated with the server is open, the newly added or updated cleaned data is pushed automatically to the associated application platform to update the data in the application platform. This step is implemented by component 560 shown in Fig. 9.
S470, exit the workflow.
In this embodiment, if the number of raw data files is 0, no new raw data has been obtained, and the workflow is exited directly. Likewise, when the amount of newly added or updated cleaned data in the cleaning database is 0, there is no newly added or updated cleaned data in the cleaning database, and the workflow is exited directly. This step is implemented by component 570 shown in Fig. 9.
S480, send error prompt information to the relevant developers.
In this embodiment, to ensure that developers learn of error locations in the data processing procedure in time, success and failure checks are added at the key links of the data processing procedure, for example between step S420 and step S460. When an error occurs during data processing, error prompt information is sent to the relevant developers.
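The control flow of steps S420 to S480 can be sketched as a small step runner. In the embodiment this logic is built from Kettle components and its mail component; the plain-Python `run_workflow` function, the step names and the `notify` callback below are stand-ins assumed for illustration only.

```python
def run_workflow(steps, notify):
    """Run steps in order; exit early on an empty result, notify on an error.

    Each step is a (name, callable) pair whose callable returns True to
    continue or False to exit the workflow (the S470 path).
    """
    for name, step in steps:
        try:
            should_continue = step()
        except Exception as error:
            # S480 path: report which step failed and why.
            notify(f"{name} failed: {error}")
            return "error"
        if not should_continue:
            return "exited"
    return "completed"


def failing_write():
    raise RuntimeError("raw database unavailable")


sent_messages = []  # stand-in for e-mails sent to developers
steps = [
    ("S420 count raw data files", lambda: 2 > 0),
    ("S430 write raw data", failing_write),
    ("S440 write cleaned data", lambda: True),
]
status = run_workflow(steps, sent_messages.append)
```

Including the step name in the message is what lets a developer locate the failing link without inspecting the whole procedure.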
With the technical solution of this embodiment, the primary keys corresponding to all data in the raw database need not be compared with the primary keys in the cleaning database during data processing, which improves data processing efficiency.
Embodiment four
Fig. 10 is a structural block diagram of a data processing apparatus provided by Embodiment 4 of the present invention. The data processing apparatus of this embodiment is configured in a server. Referring to Fig. 10, the data processing apparatus includes: a first acquisition module 610, a conversion and extraction module 620, a first writing module 630, a preprocessing module 640 and a second writing module 650.
The first acquisition module 610 is used to obtain two or more pieces of raw data, the raw data including field information and data content;
the conversion and extraction module 620 is used to convert the raw data into a field identifier and to extract a primary key from the field information;
the first writing module 630 is used to write the raw data into the raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database;
the preprocessing module 640 is used to standardize and preprocess the raw data written into the raw database to obtain cleaned data;
the second writing module 650 is used to write the cleaned data into the cleaning database if the primary key of the cleaned data has no intersection with the primary keys in the cleaning database.
The technical solution provided in this embodiment obtains two or more pieces of raw data that include field information and data content; converts the raw data into a field identifier and extracts a primary key from the field information; writes the raw data into the raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database; standardizes and preprocesses the raw data written into the raw database to obtain cleaned data; and writes the cleaned data into the cleaning database if the primary key of the cleaned data has no intersection with the primary keys in the cleaning database. During data processing, the primary keys corresponding to all data in the raw database need not be compared with the primary keys in the cleaning database, which improves data processing efficiency.
On the basis of the above embodiments, the data processing apparatus further includes:
a format judgment module, used to obtain a raw data file before the two or more pieces of raw data are obtained, and to check the format of the raw data file;
a parsing module, used to parse the raw data file if it is a JSON file, to obtain raw data in JSON format.
On the basis of the above embodiments, the data processing apparatus further includes:
a second acquisition module, used to obtain, before the raw data written into the raw database is standardized and preprocessed, the write time at which the raw data is written into the raw database or the creation time of the raw data file;
a determining module, used to take the write time or the creation time as the batch identifier of the raw data.
On the basis of the above embodiments, the preprocessing module 640 includes:
a query unit, used to query the batch identifier corresponding to the raw data in the raw database;
a preprocessing unit, used to standardize and preprocess the raw data corresponding to the newest batch identifier.
On the basis of the above embodiments, the data processing apparatus further includes:
a third acquisition module, used to obtain the last push time of the cleaned data after the cleaned data is written into the cleaning database;
a pushing module, used to push the cleaned data that is later than the last push time and earlier than the current system time to the associated application platform.
On the basis of the above embodiments, when converting the raw data into a field identifier, the apparatus is specifically used to:
convert the raw data into a corresponding hash value.
The above data processing apparatus can execute the data processing method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
Embodiment five
Fig. 11 is a structural schematic diagram of a terminal device provided by Embodiment 5 of the present invention. Referring to Fig. 11, the terminal device includes: a processor 710, a memory 720, an input device 730 and an output device 740. The terminal device may have one or more processors 710; Fig. 11 takes one processor 710 as an example. The terminal device may have one or more memories 720; Fig. 11 takes one memory 720 as an example. The processor 710, memory 720, input device 730 and output device 740 of the terminal device may be connected by a bus or in other ways; Fig. 11 takes a bus connection as an example. In this embodiment, the terminal device may be a server.
The memory 720, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the apparatus described in any embodiment of the present invention (for example, the first acquisition module 610, the conversion and extraction module 620, the first writing module 630, the preprocessing module 640 and the second writing module 650 in the data processing apparatus). The memory 720 may mainly include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function, and the data storage area can store data created through use of the device. In addition, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or another non-volatile solid-state storage device. In some examples, the memory 720 may further include memory located remotely from the processor 710, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The input device 730 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the device. The output device 740 may include an audio device such as a loudspeaker. It should be noted that the specific composition of the input device 730 and the output device 740 may be set according to actual conditions.
The processor 710 runs the software programs, instructions and modules stored in the memory 720 to execute the various functional applications and data processing of the device, that is, to implement the above data processing method.
The terminal device provided above can be used to execute the data processing method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment six
Embodiment 6 of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a data processing method, including:
obtaining two or more pieces of raw data, the raw data including field information and data content;
converting the raw data into a field identifier, and extracting a primary key from the field information;
if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database, writing the raw data into the raw database;
standardizing and preprocessing the raw data written into the raw database to obtain cleaned data;
if the primary key of the cleaned data has no intersection with the primary keys in the cleaning database, writing the cleaned data into the cleaning database.
Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the data processing method operations described above, and can also execute the relevant operations of the data processing method provided by any embodiment of the present invention, with the corresponding functions and beneficial effects.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions that cause a computer device (which may be a robot, a personal computer, a server, a network device or the like) to execute the data processing method described in any embodiment of the present invention.
It is worth noting that the units and modules included in the above data processing apparatus are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
It should be understood that each part of the present invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Note that the above is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to the above embodiments; it may also include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A data processing method, characterized by comprising:
obtaining two or more pieces of raw data, the raw data comprising field information and data content;
converting the raw data into a field identifier, and extracting a primary key from the field information;
if the field identifier corresponding to the raw data has no intersection with the field identifiers in a raw database, writing the raw data into the raw database;
standardizing and preprocessing the raw data written into the raw database to obtain cleaned data;
if the primary key of the cleaned data has no intersection with the primary keys in a cleaning database, writing the cleaned data into the cleaning database.
2. The data processing method according to claim 1, characterized in that, before obtaining the two or more pieces of raw data, the method further comprises:
obtaining a raw data file, and checking the format of the raw data file;
if the raw data file is a JSON file, parsing the raw data file to obtain raw data in JSON format.
3. The data processing method according to claim 2, characterized in that, before standardizing and preprocessing the raw data written into the raw database, the method further comprises:
obtaining the write time at which the raw data is written into the raw database, or the creation time of the raw data file;
using the write time or the creation time as the batch identifier of the raw data.
4. The data processing method according to claim 3, characterized in that standardizing and preprocessing the raw data written into the raw database comprises:
querying the batch identifier corresponding to the raw data in the raw database;
standardizing and preprocessing the raw data corresponding to the newest batch identifier.
5. The data processing method according to claim 1, characterized in that, after writing the cleaned data into the cleaning database, the method further comprises:
obtaining the last push time of the cleaned data;
pushing the cleaned data that is later than the last push time and earlier than the current system time to an associated application platform.
6. The data processing method according to claim 1, characterized in that converting the raw data into a field identifier specifically comprises:
converting the raw data into a corresponding hash value.
7. A data processing apparatus, characterized by comprising:
a first acquisition module, used to obtain two or more pieces of raw data, the raw data comprising field information and data content;
a conversion and extraction module, used to convert the raw data into a field identifier and to extract a primary key from the field information;
a first writing module, used to write the raw data into a raw database if the field identifier corresponding to the raw data has no intersection with the field identifiers in the raw database;
a preprocessing module, used to standardize and preprocess the raw data written into the raw database to obtain cleaned data;
a second writing module, used to write the cleaned data into a cleaning database if the primary key of the cleaned data has no intersection with the primary keys in the cleaning database.
8. The data processing apparatus according to claim 7, characterized in that the apparatus further comprises:
a format judgment module, used to obtain a raw data file and to check the format of the raw data file;
a parsing module, used to parse the raw data file if it is a JSON file, to obtain raw data in JSON format.
9. A terminal device, characterized by comprising: a memory and one or more processors;
the memory being used to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of claims 1 to 6.
10. A storage medium containing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a computer processor, are used to execute the data processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811386474.9A CN109299183A (en) | 2018-11-20 | 2018-11-20 | A kind of data processing method, device, terminal device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109299183A true CN109299183A (en) | 2019-02-01 |
Family
ID=65143446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811386474.9A Pending CN109299183A (en) | 2018-11-20 | 2018-11-20 | A kind of data processing method, device, terminal device and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101290622A (en) * | 2007-04-20 | 2008-10-22 | 鸿富锦精密工业(深圳)有限公司 | Database cleaning system and method |
CN101504670A (en) * | 2009-03-04 | 2009-08-12 | 成都市华为赛门铁克科技有限公司 | Data operation method, system, client terminal and data server |
CN103136249A (en) * | 2011-11-30 | 2013-06-05 | 北京航天长峰科技工业集团有限公司 | System and method for multi-mode heterogeneous data integration |
CN103473150A (en) * | 2013-08-28 | 2013-12-25 | 华中科技大学 | Fragment rewriting method for data deduplication system |
CN105069033A (en) * | 2015-07-22 | 2015-11-18 | 北京京东尚科信息技术有限公司 | Method and device for creating database table model |
CN105959253A (en) * | 2015-11-19 | 2016-09-21 | 中国银联股份有限公司 | Method and device for determining data flow to be cleaned |
CN107491515A (en) * | 2017-08-11 | 2017-12-19 | 国电南瑞科技股份有限公司 | Intelligent power distribution and utilization data transfer device based on big data platform |
CN108052665A (en) * | 2017-12-29 | 2018-05-18 | 深圳市中易科技有限责任公司 | Data cleaning method and device based on distributed platform |
2018-11-20: CN application CN201811386474.9A (publication CN109299183A), status Pending
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096653A (en) * | 2019-04-16 | 2019-08-06 | 湖北地信科技集团股份有限公司 | Construction method, device, equipment and storage medium for a spatio-temporal information service architecture |
CN110618983A (en) * | 2019-08-15 | 2019-12-27 | 复旦大学 | JSON document structure-based industrial big data multidimensional analysis and visualization method |
CN110618983B (en) * | 2019-08-15 | 2023-01-06 | 复旦大学 | JSON document structure-based industrial big data multidimensional analysis and visualization method |
CN110502563A (en) * | 2019-08-26 | 2019-11-26 | 腾讯科技(深圳)有限公司 | Data processing method and device of multiple data sources and storage medium |
CN110502563B (en) * | 2019-08-26 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Data processing method and device of multiple data sources and storage medium |
CN112835967A (en) * | 2019-11-25 | 2021-05-25 | 浙江宇视科技有限公司 | Data processing method, device, equipment and medium based on distributed storage system |
CN111177133A (en) * | 2019-12-24 | 2020-05-19 | 集奥聚合(北京)人工智能科技有限公司 | Processing insertion method for multivariate data |
CN113127460B (en) * | 2019-12-31 | 2023-11-17 | 北京懿医云科技有限公司 | Evaluation method of data cleaning frame, device, equipment and storage medium thereof |
CN113127460A (en) * | 2019-12-31 | 2021-07-16 | 北京懿医云科技有限公司 | Evaluation method of data cleaning frame, device, equipment and storage medium thereof |
CN111209736A (en) * | 2020-01-03 | 2020-05-29 | 恩亿科(北京)数据科技有限公司 | Text file analysis method and device, computer equipment and storage medium |
CN111258997B (en) * | 2020-01-16 | 2023-11-03 | 浪潮软件股份有限公司 | Data processing method and device based on NiFi |
CN111258997A (en) * | 2020-01-16 | 2020-06-09 | 浪潮软件股份有限公司 | Data processing method and device based on NiFi |
CN112306987A (en) * | 2020-01-19 | 2021-02-02 | 深圳新阳蓝光能源科技股份有限公司 | Data management method and device and electronic equipment |
CN111581182A (en) * | 2020-04-21 | 2020-08-25 | 北京龙云科技有限公司 | Data cleaning method and device |
US11170022B1 (en) | 2020-06-03 | 2021-11-09 | Shanghai Icekredit, Inc. | Method and device for processing multi-source heterogeneous data |
CN111400392B (en) * | 2020-06-03 | 2020-08-21 | 上海冰鉴信息科技有限公司 | Multi-source heterogeneous data processing method and device |
CN111400392A (en) * | 2020-06-03 | 2020-07-10 | 上海冰鉴信息科技有限公司 | Multi-source heterogeneous data processing method and device |
CN111835847A (en) * | 2020-07-10 | 2020-10-27 | 中国联合网络通信集团有限公司 | Data processing method, device, equipment and storage medium |
CN111835847B (en) * | 2020-07-10 | 2021-12-14 | 中国联合网络通信集团有限公司 | Data processing method, device, equipment and storage medium |
CN111968703A (en) * | 2020-09-02 | 2020-11-20 | 荣联科技集团股份有限公司 | Colorectal cancer gene variation and medication reading system, reading method and device |
CN112131291B (en) * | 2020-09-11 | 2023-12-15 | 重庆誉存大数据科技有限公司 | Structured analysis method, device and equipment based on JSON data and storage medium |
CN112131291A (en) * | 2020-09-11 | 2020-12-25 | 重庆誉存大数据科技有限公司 | JSON data-based structured analysis method, device, equipment and storage medium |
CN112420168A (en) * | 2020-11-12 | 2021-02-26 | 武汉联影医疗科技有限公司 | Method, device, equipment and storage medium for writing data into database |
CN112486971A (en) * | 2020-12-08 | 2021-03-12 | 企查查科技有限公司 | Data cleaning method, equipment and storage medium with correction function |
CN112486971B (en) * | 2020-12-08 | 2024-03-26 | 企查查科技股份有限公司 | Data cleaning method, apparatus and storage medium with correction function |
CN112491926A (en) * | 2020-12-11 | 2021-03-12 | 迈普通信技术股份有限公司 | SRv6 path quality measuring method, device, electronic equipment and storage medium |
CN113190608A (en) * | 2021-05-28 | 2021-07-30 | 北京红山信息科技研究院有限公司 | Data standardized acquisition method, device, equipment and storage medium |
CN114579610A (en) * | 2022-03-02 | 2022-06-03 | 南方电网数字电网研究院有限公司 | Heterogeneous data processing method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109299183A (en) | A kind of data processing method, device, terminal device and storage medium | |
US10489454B1 (en) | Indexing a dataset based on dataset tags and an ontology | |
CN109034993B (en) | Account checking method, account checking equipment, account checking system and computer readable storage medium | |
US20180276304A1 (en) | Advanced computer implementation for crawling and/or detecting related electronically catalogued data using improved metadata processing | |
GB2496120A (en) | Analysis of emails using a hidden Markov model to recognize sections of the email, e.g. header, body, signature block and disclaimer | |
CN110941629B (en) | Metadata processing method, apparatus, device and computer readable storage medium | |
US20170154123A1 (en) | System and method for processing metadata to determine an object sequence | |
KR20160124744A (en) | Systems and methods for hosting an in-memory database | |
WO2016200667A1 (en) | Identifying relationships using information extracted from documents | |
CN113760847A (en) | Log data processing method, device, equipment and storage medium | |
CN113221535B (en) | Information processing method, device, computer equipment and storage medium | |
CN106844553B (en) | Data detection and expansion method and device based on sample data | |
CN111125213A (en) | Data acquisition method, device and system | |
CN109086382A (en) | Data synchronization method, device, equipment and storage medium | |
US20190220441A1 (en) | Method, device and computer program product for data migration | |
CN113434506B (en) | Data management and retrieval method, device, computer equipment and readable storage medium | |
CN107168822B (en) | Oracle streams exception recovery system and method | |
CN109614442B (en) | Data table maintenance method and device for data synchronization, storage medium and electronic equipment | |
CN109426576B (en) | Fault-tolerant processing method and fault-tolerant assembly | |
Sharma et al. | Bug Report Triaging Using Textual, Categorical and Contextual Features Using Latent Dirichlet Allocation | |
JP2021140430A (en) | Database migration method, database migration system, and database migration program | |
CN108009204A (en) | Method and system based on extension name classification and de-redundancy | |
CN115794861A (en) | Offline data query multiplexing method based on feature abstract and application thereof | |
CN114691700A (en) | Kafka cluster-based intelligent park retrieval method | |
CN117389908B (en) | Dependency analysis method, system and medium for interface automation test case |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201 |