CN113656400B - Characteristic data encoding method and device - Google Patents
Characteristic data encoding method and device
- Publication number
- CN113656400B (application CN202110774090.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- level
- hierarchy
- table stored
- column
- Prior art date
- Legal status: Active (status as listed; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The invention provides a method for encoding feature data, comprising the following steps: a. converting a data table stored in row form into an encoding mode stored in column form with the hierarchy as the correspondence, according to the high-to-low order of the node levels, wherein the column form with the hierarchy as the correspondence comprises at least a first column of data serving as characterization data and N columns of data divided by level; b. assigning each item in the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level. The invention reworks the encoding structure and encoding mode of feature data, organizes the feature structure within the feature data in a structured way, reduces the difficulty of handling feature data, enables efficient storage of feature data, lowers storage cost, requires fewer I/O operations, and loads feature data on demand; the overall data structure is flexible, highly extensible and powerful, giving the invention very high commercial value.
Description
Technical Field
The invention belongs to the field of computer technology, and in particular relates to a method and a device for encoding feature data.
Background
Feature engineering plays an important role in machine learning: as a pre-process of model training, it helps clean and extract features from massive data. Online model estimation likewise serves as pre-processing for the model, where the real-time features retrieved for estimation are fed into the online model for real-time prediction. Industrial-scale feature data consumes enormous computing and storage resources, with data volumes at the PB level and feature counts in the hundreds of billions; at the same time, feature data must be processed very quickly and must not place excessive load on business services. To meet the requirements of efficient storage, query and analysis of feature data, the encoding of feature data needs to be deeply reworked.
Feature data is usually stored at the row level: the data is stored in HDFS by row, offline analysis uses the MapReduce model, the Map stage parses the data, each node computes a portion of the data, and the results are finally merged through Reduce. The current offline processing of feature data relies heavily on MapReduce, so all rows and columns of the data must be fully loaded into memory, which for thousands of features means hundreds of billions of rows and PB-scale storage. Because the data is extracted and merged through Map and Reduce, feature data analysis drags along a large amount of invalid data, and the I/O, network and computation consumption is enormous.
At present there is no technical solution to the above problems; in particular, there is no suitable method or apparatus for encoding feature data.
Disclosure of Invention
In view of the technical drawbacks of the prior art, an object of the present invention is to provide a method and an apparatus for encoding feature data. According to one aspect of the present invention, there is provided a method for encoding feature data, used for the efficient storage of feature data, comprising the following steps:
a. converting a data table stored in row form into an encoding mode stored in column form with the hierarchy as the correspondence, according to the high-to-low order of the node levels, wherein the column form with the hierarchy as the correspondence comprises at least a first column of data serving as characterization data and N columns of data divided by level, the number of columns with the hierarchy as the correspondence equals the number of node levels, and N > 1;
b. assigning each item in the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level;
wherein in step b the specific order comprises a numerical arrangement or an alphabetical arrangement, and in the numerical arrangement the level positions, within the data table stored in row form, of the nodes corresponding to the upper-level node of each item of the N columns of data are recorded in sequence, counting from 0;
and, for the node levels present in the data table stored in row form, the highest node level count is taken as the node level count of the current encoding mode.
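As an illustration of steps a and b, the sketch below flattens each leaf column of a row-stored table into triples of the form (val, col-index, val-index). It is a minimal sketch only: the function name encode_rows, the nested-dict schema representation, and the reading of col-index as the level of the value's parent node and val-index as the level of the value itself (both counted from 0, with null data assigned 0 along the path) are assumptions for illustration, not the patented implementation.

```python
# A minimal sketch, assuming the hierarchy is supplied as a nested dict whose
# leaves name the columns of the row-stored table; the (val, col_index,
# val_index) triple layout mirrors the tables in the description but is an
# illustrative reading, not the claimed implementation.

def encode_rows(schema, rows):
    """Return {leaf: [(val, col_index, val_index), ...]}, one triple per row.

    col_index: level of the leaf's parent node, counted from 0 at the root
               (the label tracing of the upper-level relation).
    val_index: level of the leaf value itself.
    Null values are assigned 0 for both indices.
    """
    leaf_meta = {}

    def walk(node, parent_level):
        for name, child in node.items():
            if isinstance(child, dict):               # inner node: descend one level
                walk(child, parent_level + 1)
            else:                                     # leaf column
                leaf_meta[name] = (parent_level, parent_level + 1)

    walk(schema, 0)

    encoded = {leaf: [] for leaf in leaf_meta}
    for row in rows:
        for leaf, (col_idx, val_idx) in leaf_meta.items():
            value = row.get(leaf)
            if value is None:                         # null data -> assign 0 along the path
                encoded[leaf].append((None, 0, 0))
            else:
                encoded[leaf].append((value, col_idx, val_idx))
    return encoded


# Illustrative schema following the example in the detailed description:
# feature (root, level 0) -> featureId, base, contact (level 1) -> name, age, ... (level 2).
schema = {"featureId": None,
          "base": {"name": None, "age": None, "sex": None},
          "contact": {"mail": None, "code": None}}
rows = [{"featureId": 1, "age": 29}, {"featureId": 2}]
# encode_rows(schema, rows)["age"] -> [(29, 1, 2), (None, 0, 0)]
```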
Preferably, before, while, or after the data table stored in row form is opened, and before the feature data is encoded, the data table stored in row form is screened to obtain the row data required by the user.
Preferably, for null data present in the data table stored in row form, each row of the column data corresponding to the null data is assigned 0 along the pointing path from the higher level to the lower level.
Preferably, before step a, the method further comprises the steps of:
i: arranging the English letters in the data table stored in row form in their order of appearance and then marking them with numbers;
ii: determining, word by word, the numerical coordinates of each word in the data table stored in row form after its English letters are replaced by the numerical marks;
iii: storing the numerical coordinates and the correspondence between the English letters and the numerical marks.
According to another aspect of the present invention, there is provided a feature data encoding apparatus, comprising:
a first processing device, which converts the data table stored in row form into the encoding mode stored in column form with the hierarchy as the correspondence according to the high-to-low order of the node levels, the column form with the hierarchy as the correspondence comprising at least a first column of data serving as characterization data and N columns of data divided by level;
and a second processing device, which assigns each item of the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level, wherein the specific order comprises a numerical arrangement or an alphabetical arrangement, and in the numerical arrangement the level positions, within the data table stored in row form, of the nodes corresponding to the upper-level node of each item of the N columns of data are recorded in sequence, counting from 0;
and, for the node levels present in the data table stored in row form, the highest node level count is taken as the node level count of the current encoding mode.
Preferably, the apparatus further comprises:
a third processing device, which arranges the English letters in the data table stored in row form in their order of appearance and then marks them with numbers;
a first determining device, which determines, word by word, the numerical coordinates of each word in the data table stored in row form after its English letters are replaced by the numerical marks;
and a fourth processing device, which stores the numerical coordinates and the correspondence between the English letters and the numerical marks.
According to the present invention, the data table stored in row form is converted, according to the high-to-low order of the node levels, into an encoding mode stored in column form with the hierarchy as the correspondence, the column form with the hierarchy as the correspondence comprising at least a first column of data serving as characterization data and N columns of data divided by level, where the number of such columns equals the number of node levels and N > 1; each item in the N columns of data is then assigned, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level. In machine learning the sources of feature data are diverse, for example behaviour log data generated at the product end, users' purchase records, user access depth, and training process data of the feature data set. With the aim of improving efficiency, the invention reworks the encoding structure and encoding mode of feature data, organizes the feature structure within the feature data in a structured way, reduces the difficulty of handling feature data, enables efficient storage of feature data, lowers storage cost, requires fewer I/O operations, and loads feature data on demand; the overall data structure is flexible and highly extensible, the process is simple and convenient to use, and the functions are powerful, giving the invention very high commercial value.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a method for encoding feature data according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of the processing performed, according to the first embodiment of the present invention, before the data table stored in row form is converted into the encoding mode stored in column form with the hierarchy as the correspondence in the high-to-low order of the node levels;
FIG. 3 is a schematic diagram showing the module connection of a device for encoding characteristic data according to another embodiment of the present invention;
FIG. 4 shows a hierarchical structure diagram of featureId according to a second embodiment of the present invention;
FIG. 5 shows a hierarchical structure diagram of the age field according to a third embodiment of the present invention;
FIG. 6 shows a hierarchical structure diagram of the code field according to a fourth embodiment of the present invention; and
fig. 7 shows a schematic diagram of an ordered set of codes corresponding to letters and numbers according to a fifth embodiment of the present invention.
Detailed Description
In order to better and clearly show the technical scheme of the invention, the invention is further described below with reference to the accompanying drawings.
Fig. 1 shows a flow diagram of a method for encoding feature data according to an embodiment of the present invention. The present invention provides a method for encoding feature data in which a dedicated encoding mode is used for feature data processing; at present the data is stored as row data, as shown in the following table:
the data table stored in the form of rows in the table is converted into the form of columns mainly through a special coding mode, and the conversion mode is reversible, namely, the data table can be restored into the original form of rows according to the converted form of columns.
First, step S101 is performed: the data table stored in row form is converted into an encoding mode stored in column form with the hierarchy as the correspondence, where the column form with the hierarchy as the correspondence comprises at least a first column of data serving as characterization data and N columns of data divided by level, the number of such columns equals the number of node levels, and N > 1. In this embodiment, the high-to-low order of the node levels runs downward from the most general category according to the notion of upper and lower levels, for example: country, province, city, district, street. Applied to the table above, the data can be classified by node level into feature at the uppermost layer, featureId, base and contact at the second layer, and name, age, sex, mail and code at the third layer, and can then be converted into the encoding mode stored in column form with the hierarchy as the correspondence. Those skilled in the art will understand that in such an embodiment the aim of the present invention is to retrieve only the several rows of data currently required, rather than all of the data, so the converted encoding mode stores a single row of data in column form. Further, the table above contains data at several levels; since only the data of those levels needs to be retrieved, it can be converted into the following table:
in such an embodiment, the column form using the hierarchy as the correspondence relationship at least includes first column data serving as the characterization data and N column data divided according to the level, where the first column data serving as the characterization data is 29, null, 40, null in val, and col-index and val-index in the following represent the hierarchy respectively, where the col-index represents the parent level of the feature data to be converted and val-index represents the level of the parent level.
The process then proceeds to step S102: each item in the N columns of data is assigned along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level. In this embodiment the assignment is applied to the entries left pending in step S101; how the assignment works is described in detail below with reference to several specific embodiments. Those skilled in the art will understand that in step S102 the specific order comprises a numerical arrangement or an alphabetical arrangement, and that in the numerical arrangement the level positions, within the row-stored data table, of the nodes corresponding to the upper-level node of each item of the N columns of data are recorded in sequence, counting from 0.
After the feature data structure is redefined, the feature data is globally organized around key nodes. As shown in Fig. 4, the featureId hierarchy has only two levels, and the featureId values 1, 2, 3 and 4 sit at the second level, so they can be assigned as in the following table:
val | col-index | val-index |
---|---|---|
1 | 0 | 1 |
2 | 0 | 1 |
3 | 0 | 1 |
4 | 0 | 1 |
Here col-index is 0 because the parent of these values is feature, which sits at the uppermost level, namely level 0, and val-index is 1 because the values 1, 2, 3 and 4 sit at level 1.
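To make the assignment concrete, the table above can be reproduced with a short sketch (illustrative only, using the same reading of col-index and val-index as in the sketch given in the disclosure):

```python
# featureId sits one level below the root "feature" node, so every value gets
# col_index 0 (parent is the uppermost level) and val_index 1.
feature_ids = [1, 2, 3, 4]
encoded = [(v, 0, 1) for v in feature_ids]
# [(1, 0, 1), (2, 0, 1), (3, 0, 1), (4, 0, 1)] -- matches the table above
```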
As those skilled in the art will understand, for null data present in a row-stored data table, each row of the column data corresponding to the null data is assigned 0 along the pointing path from the higher level to the lower level. Fig. 5 shows the hierarchical structure of the age field according to a third embodiment of the present invention; as shown in Fig. 5, age spans three levels and the age values sit at the third level, so the column can be assigned as in the following table:
val | col-index | val-index |
---|---|---|
29 | 1 | 2 |
Null | 0 | 0 |
40 | 1 | 2 |
Null | 0 | 0 |
Here the assignment at least reflects the label tracing of the upper-level relation corresponding to the current level: taking 29 as an example, col-index is 1 because its parent base sits at the first level, and val-index is 2 because the value sits at the second level.
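The age column of the table above can likewise be reproduced as follows (again only a sketch under the same reading, with null rows assigned 0 along the path):

```python
# age is a third-level leaf whose parent "base" sits at level 1, so non-null
# values are encoded as (value, 1, 2); null rows are assigned 0 throughout.
ages = [29, None, 40, None]
encoded = [(v, 1, 2) if v is not None else (None, 0, 0) for v in ages]
# [(29, 1, 2), (None, 0, 0), (40, 1, 2), (None, 0, 0)] -- matches the table above
```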
Further, when different node depths exist in the data table stored in row form, for example when both three-level and four-level structures are present, the highest node level count is used as the node level count of the current encoding mode.
As another example, Fig. 6 shows a hierarchical structure diagram of the code field according to a fourth embodiment of the present invention; as shown in Fig. 6, code spans three levels and the code values sit at the third level, so the column can be assigned as in the following table:
val | col-index | val-index |
---|---|---|
Null | 0 | 0 |
1000-1 | 0 | 2 |
1000-2 | 1 | 2 |
2000-1 | 0 | 2 |
2000-2 | 1 | 2 |
The rules and reasoning behind this assignment follow the preceding steps.
Further, before, during or after opening the row-stored data table, and before encoding the feature data, the row-stored data table is screened to obtain the row data required by the user. In a preferred embodiment, because the huge volume of the row-stored data table often causes blocking, crashes or slow operation when it is opened, the rows that need to be encoded are preferably filtered and extracted with a corresponding instruction before the table is opened. In another embodiment, if opening the row-stored data table causes some blocking, a corresponding instruction is used to skip full loading. In yet another embodiment, the row-stored data table is screened to obtain the required row data after it has been opened and before the feature data is encoded. None of these variants affects the specific implementation of the invention, and they are not described further here.
Fig. 2 is a flowchart of the processing performed before the data table stored in row form is converted, according to the high-to-low order of the node levels, into the encoding mode stored in column form with the hierarchy as the correspondence; before step S101, the method further comprises the following steps:
first, step S201 is entered: the english letters in the data table stored in the form of rows are arranged according to the order of appearance and then marked with numbers, while in other embodiments, the english letters may be arranged according to the natural order of appearance, as shown in fig. 7, in combination with the first table in fig. 1, where A1 is the first occurrence, and in combination with fig. 7, the english letters are arranged according to the order of appearance and then marked with numbers.
Then, in step S202, the numerical coordinates of each word in the data table stored in row form are determined, word by word, after its English letters are replaced by the numerical marks. As shown in Fig. 7, taking "A2" as an example, its corresponding numerical coordinates are "5,1,6,7", and for "A4" the corresponding coordinates are "11,9,4,6,1".
Finally, step S203 is entered: the numerical coordinates and the correspondence between the English letters and the numerical marks are stored. The feature data contains a large number of repeated code sets; an ordered code set is constructed for them and the real feature data is projected onto the key coordinate points, which relieves the feature data of a large amount of redundant data.
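Steps S201 to S203 can be sketched as follows. This is a minimal illustration that assumes the letters are numbered from 1 in order of first appearance and that a word's coordinates are simply the sequence of its letters' marks; the concrete mapping shown in Fig. 7 depends on the source table and is not reproduced here.

```python
# Sketch of steps S201-S203 under the stated assumptions; function names
# build_letter_marks and word_coordinates are illustrative, not from the patent.

def build_letter_marks(words):
    """Number each letter by its order of first appearance (step S201)."""
    marks = {}
    for word in words:
        for ch in word:
            if ch not in marks:
                marks[ch] = len(marks) + 1        # count marks from 1
    return marks

def word_coordinates(words, marks):
    """Replace each word's letters with their marks (step S202)."""
    return {word: [marks[ch] for ch in word] for word in words}

# Step S203: persist both mappings so the encoding stays reversible.
words = ["name", "age", "mail"]                   # illustrative cell contents
marks = build_letter_marks(words)
coords = word_coordinates(words, marks)
# marks  -> {'n': 1, 'a': 2, 'm': 3, 'e': 4, 'g': 5, 'i': 6, 'l': 7}
# coords -> {'name': [1, 2, 3, 4], 'age': [2, 5, 4], 'mail': [3, 2, 6, 7]}
```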
In machine-learning feature engineering, a single feature type may contain thousands of items of feature information. With the new feature encoding mode, features can be generated and adjusted, and a single feature can be managed independently without changing the data sets of thousands of other features. After the repeated encoding of feature data is improved, roughly 60% or more of the storage space can be saved, I/O is reduced by 90%, and the computation consumed on feature data is reduced by 70%.
Fig. 3 shows a schematic block diagram of a device for encoding feature data according to another embodiment of the present invention. The invention provides a feature data encoding device comprising a first processing device 1, which converts the row-stored data table into the encoding mode stored in column form with the hierarchy as the correspondence according to the high-to-low order of the node levels, the column form with the hierarchy as the correspondence comprising at least a first column of data serving as characterization data and N columns of data divided by level; its working principle follows step S101 above and is not repeated here.
The device further comprises a second processing device 2, which assigns each item in the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level; its working principle follows step S102 above and is not repeated here.
The device further comprises a third processing device 3, which arranges the English letters in the row-stored data table in their order of appearance and then marks them with numbers; its working principle follows step S201 above and is not repeated here.
The device further comprises a first determining device 4, which determines, word by word, the numerical coordinates of each word in the row-stored data table after its English letters are replaced by the numerical marks; its working principle follows step S202 above and is not repeated here.
The device further comprises a fourth processing device 5, which stores the numerical coordinates and the correspondence between the English letters and the numerical marks; its working principle follows step S203 above and is not repeated here.
It should be noted that the specific implementation of each device embodiment is the same as that of the corresponding method embodiment and is not described again here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the teachings herein. The required structure for the construction of such devices is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some embodiments, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in an apparatus according to embodiments of the invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.
Claims (6)
1. A method for encoding feature data, used for the efficient storage of feature data, characterized by comprising the following steps:
a. converting a data table stored in row form into an encoding mode stored in column form with the hierarchy as the correspondence, according to the high-to-low order of the node levels, wherein the column form with the hierarchy as the correspondence comprises at least a first column of data serving as characterization data and N columns of data divided by level, the number of columns with the hierarchy as the correspondence equals the number of node levels, and N > 1;
b. assigning each item in the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level;
wherein in step b the specific order comprises a numerical arrangement or an alphabetical arrangement, and in the numerical arrangement the level positions, within the data table stored in row form, of the nodes corresponding to the upper-level node of each item of the N columns of data are recorded in sequence, counting from 0;
and, for the node levels present in the data table stored in row form, the highest node level count is taken as the node level count of the current encoding mode.
2. The encoding method according to claim 1, wherein the data table stored in row form is screened to obtain the row data required by the user before, while, or after the data table stored in row form is opened, and before the feature data is encoded.
3. The encoding method according to claim 1, wherein, for null data present in the data table stored in row form, each row of the column data corresponding to the null data is assigned 0 along the pointing path from the higher level to the lower level.
4. The encoding method according to claim 1, characterized in that, before step a, it further comprises the steps of:
i: arranging the English letters in the data table stored in row form in their order of appearance and then marking them with numbers;
ii: determining, word by word, the numerical coordinates of each word in the data table stored in row form after its English letters are replaced by the numerical marks;
iii: storing the numerical coordinates and the correspondence between the English letters and the numerical marks.
5. A feature data encoding apparatus employing the encoding method according to any one of claims 1 to 4, comprising:
a first processing device (1), which converts the data table stored in row form into the encoding mode stored in column form with the hierarchy as the correspondence according to the high-to-low order of the node levels, the column form with the hierarchy as the correspondence comprising at least a first column of data serving as characterization data and N columns of data divided by level;
a second processing device (2), which assigns each item of the N columns of data, in a specific order, along the pointing path from the higher level to the lower level, the assignment at least reflecting the label tracing of the upper-level relation corresponding to the current level, wherein the specific order comprises a numerical arrangement or an alphabetical arrangement, and in the numerical arrangement the level positions, within the data table stored in row form, of the nodes corresponding to the upper-level node of each item of the N columns of data are recorded in sequence, counting from 0;
and, for the node levels present in the data table stored in row form, the highest node level count is taken as the node level count of the current encoding mode.
6. The encoding device according to claim 5, further comprising:
a third processing device (3), which arranges the English letters in the data table stored in row form in their order of appearance and then marks them with numbers;
a first determining device (4), which determines, word by word, the numerical coordinates of each word in the data table stored in row form after its English letters are replaced by the numerical marks;
a fourth processing device (5), which stores the numerical coordinates and the correspondence between the English letters and the numerical marks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110774090.XA CN113656400B (en) | 2021-07-08 | 2021-07-08 | Characteristic data encoding method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113656400A (en) | 2021-11-16
CN113656400B (en) | 2024-02-27
Family
ID=78477277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110774090.XA Active CN113656400B (en) | 2021-07-08 | 2021-07-08 | Characteristic data encoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113656400B (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6804677B2 (en) * | 2001-02-26 | 2004-10-12 | Ori Software Development Ltd. | Encoding semi-structured data for efficient search and browsing |
JP2004272590A (en) * | 2003-03-07 | 2004-09-30 | Sony Corp | Data encoder, data encoding method and computer program |
US11245416B2 (en) * | 2016-06-20 | 2022-02-08 | Anacode Labs, Inc. | Parallel, block-based data encoding and decoding using multiple computational units |
US11068456B2 (en) * | 2019-12-13 | 2021-07-20 | Sap Se | Level-based hierarchies |
- 2021-07-08: application CN202110774090.XA filed in CN, granted as patent CN113656400B, status active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1381142A (en) * | 2000-05-18 | 2002-11-20 | 皇家菲利浦电子有限公司 | Encoding method for compression of video sequence |
CN102112962A (en) * | 2008-07-31 | 2011-06-29 | 微软公司 | Efficient column based data encoding for large-scale data storage |
CN105930371A (en) * | 2016-04-14 | 2016-09-07 | 江苏马上游科技股份有限公司 | Big-data-oriented HDFS-based dimensional storage and query method |
CN109564541A (en) * | 2016-05-31 | 2019-04-02 | 东新软件开发株式会社 | Data exchange system, method for interchanging data and data exchange program |
CN109684336A (en) * | 2018-12-27 | 2019-04-26 | 普元信息技术股份有限公司 | The system and method for tree data table efficient retrieval and ranking function is realized based on big data application |
CN111177302A (en) * | 2019-12-16 | 2020-05-19 | 金蝶软件(中国)有限公司 | Business document processing method and device, computer equipment and storage medium |
CN112000667A (en) * | 2020-08-10 | 2020-11-27 | 多点(深圳)数字科技有限公司 | Method, apparatus, server and medium for retrieving tree data |
CN112347118A (en) * | 2021-01-08 | 2021-02-09 | 阿里云计算有限公司 | Data storage, query and generation method, database engine and storage medium |
CN112817538A (en) * | 2021-02-22 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Research on data compression strategies for hybrid row-column storage; Wei Ling et al.; Journal of Chinese Computer Systems (小型微型计算机系统); Vol. 38, No. 06; 1267-1272 *
Also Published As
Publication number | Publication date |
---|---|
CN113656400A (en) | 2021-11-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |