CN106709006A - Associated data compressing method friendly to query - Google Patents

Associated data compressing method friendly to query

Info

Publication number
CN106709006A
Authority
CN
China
Prior art keywords
vector
subject
data
predicate
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611209081.1A
Other languages
Chinese (zh)
Other versions
CN106709006B (en)
Inventor
顾进广
彭燊
黄智生
符海东
梅琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Chu Tianyun Polytron Technologies Inc
Wuhan University of Science and Engineering WUSE
Wuhan University of Science and Technology WHUST
Original Assignee
Wuhan Chu Tianyun Polytron Technologies Inc
Wuhan University of Science and Engineering WUSE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Chu Tianyun Polytron Technologies Inc and Wuhan University of Science and Engineering WUSE
Priority to CN201611209081.1A priority Critical patent/CN106709006B/en
Publication of CN106709006A publication Critical patent/CN106709006A/en
Application granted
Publication of CN106709006B publication Critical patent/CN106709006B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374: Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a query-friendly associated data compression method. The method comprises the following steps: defining relation-mining rules and mining the potential association relations among triples; defining a compressed query memory model that consists of a subject vector, a predicate vector and an object matrix; defining a serialization scheme for the compressed query memory model and implementing serialization and deserialization with three auxiliary symbols; defining a query mode for executing SPARQL on the compressed query memory model, querying subjects and predicates by binary search and querying objects by linear traversal; and defining a scheme for resolving the slow queries caused by an oversized object matrix by dividing a large data block into several small data blocks. Compared with most existing compression schemes, an associated data set processed by the method achieves a higher compression ratio and supports SPARQL query operations directly in the compressed state.

Description

A query-friendly associated data compression method
Technical field
The present invention relates to the field of big data, in particular to the storage, transmission and querying of massive RDF, LOD (linked open data) and knowledge graph data, and more particularly to a query-friendly associated data compression method.
Background technology
Many associated data compression schemes exist, but most of them are unfriendly to querying. The most widely accepted compression scheme is HDT; it achieves a high compression ratio, but the data must first be decompressed before querying, so it is query-unfriendly. Inspired by HDT, a number of HDT-based compression techniques have also been proposed, such as HDT-FoQ, WaterFowl and HDT++. These techniques share one common characteristic: a high compression ratio, but poor query friendliness.
There are also some query-friendly schemes, for example BitMat. This compression scheme represents triple relations with a three-dimensional matrix and therefore reserves storage space even for the many triples that do not exist. When the associated data set reaches a certain scale, this three-dimensional matrix becomes an extremely sparse matrix, and because a large amount of redundancy is stored, the compression ratio is far from ideal. To reduce the storage redundancy, the K2-triples scheme was proposed: it splits the three-dimensional matrix into multiple two-dimensional matrices by predicate and stores each two-dimensional matrix as a K2-tree. This method improves the compression ratio to some extent, but it also breaks the original intuitive matrix structure, so the matrix has to be restored before querying, an operation that reduces RDF query efficiency.
More and more associated data floods the entire data web; when these data must be managed and queried, query performance and data scalability have become focal issues. Although ever larger associated data sets can be stored simply by adding storage media, a huge data set not only reduces query efficiency but also aggravates the performance problems of other common processes, such as RDF publication and exchange. As remote SPARQL endpoints that transmit query results over the network become increasingly popular, RDF publication and exchange are used more and more frequently in associated data querying. Finding a query-friendly associated data compression method is therefore of great significance.
Summary of the invention
In view of the above problems, the purpose of the present invention is to find a query-friendly compression scheme with which SPARQL queries can be executed directly, without decompressing the compressed associated data, while the compression ratio is improved as far as possible.
The objective of the present invention is achieved by mining the potential relation matrices in an associated data set. The method includes:
A query-friendly associated data compression method, characterized in that it comprises:
a structural-model construction step, specifically including:
Step 1: parsing N-Triples-format associated data based on the triple memory model to obtain a set of triples, then building a dictionary and converting the triples to IDs, wherein the parsing process includes:
Step 1.1: filtering out lines that start with "#" or are empty;
Step 1.2: reading each data line and splitting the string on whitespace;
Step 1.3: mapping the split fields to the subject, predicate and object of a triple and constructing the triple;
Step 2: mining potential associations among the triples based on relation-mining constraints;
Step 3: defining the compressed query memory model, which consists of header information, a dictionary and a set of data blocks, each data block consisting of a subject vector, a predicate vector and an object matrix; the compressed query memory model represents triple relations by means of the subject vector, the predicate vector and the object matrix: the subject vector is defined as a column vector of length m, the predicate vector as a row vector of length n, and the object matrix as an m*n matrix; multiplying the subject vector by the predicate vector yields a subject-predicate matrix of the same size as the object matrix, whose entries are then mapped one-to-one to the entries of the object matrix, each mapped pair representing one triple relation;
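As an illustration only (the class and function names below are not from the patent), a minimal Python sketch of such a data block and of how the triples it encodes could be enumerated, assuming every cell of the object matrix holds a single object ID:

    from dataclasses import dataclass
    from typing import Iterator, List, Tuple

    @dataclass
    class DataBlock:
        subjects: List[int]        # subject vector, length m, internally sorted IDs
        predicates: List[int]      # predicate vector, length n, internally sorted IDs
        objects: List[List[int]]   # object matrix, m rows by n columns of object IDs

    def iter_triples(block: DataBlock) -> Iterator[Tuple[int, int, int]]:
        """Enumerate the (subject, predicate, object) ID triples encoded by one block.

        Conceptually, the m x 1 subject vector times the 1 x n predicate vector gives
        an m x n subject-predicate grid; pairing each grid cell with the matching
        object matrix entry yields one triple per cell.
        """
        for i, s in enumerate(block.subjects):
            for j, p in enumerate(block.predicates):
                yield (s, p, block.objects[i][j])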
a step of high-compression-ratio data storage based on the structural model, specifically including:
Step 4: defining the storage length of every ID to be identical, and defining the serialization scheme of the compressed query memory model: auxiliary symbols are used to perform the serialization and deserialization operations;
Step 4.1: serialization: for each data block, flattening the object matrix, concatenating the subject vector, the predicate vector and the flattened object matrix data into a linear data structure with leading auxiliary identifiers, and then linking the processed linear data structures together with data-block auxiliary identifiers;
Step 4.2: deserialization: splitting the serialized data into individual data blocks according to the data-block auxiliary identifier; for each data block, splitting it into the subject vector, the predicate vector and the flattened object matrix according to the leading auxiliary identifiers, and then restoring the flattened object matrix to the original matrix according to the length of the subject vector;
a step of data querying based on the structural model, specifically including:
Step 5: SPARQL querying based on the compressed query memory model: queries on subjects and predicates use binary search, and queries on objects use linear scanning, specifically including:
Step 5.1: subject querying: traversing the subject vectors of all data blocks; because each subject vector is internally sorted, binary search is used, so the subject query time complexity is O(log₂ n);
Step 5.2: predicate querying: traversing the predicate vectors of all data blocks; because each predicate vector is internally sorted, binary search is used, so the predicate query time complexity is O(log₂ n);
Step 5.3: object querying: traversing the object matrices of all data blocks; because the object matrices are not internally sorted, only sequential search is possible, with time complexity O(n).
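A minimal sketch of how steps 5.1 and 5.3 could be realized, assuming the illustrative DataBlock structure from the earlier sketch (the predicate case of step 5.2 is symmetric to the subject case):

    from bisect import bisect_left
    from typing import Iterator, List, Tuple

    def find_by_subject(blocks: List[DataBlock], s: int) -> Iterator[Tuple[int, int, int]]:
        """Step 5.1: binary-search each block's sorted subject vector for subject ID s
        (O(log2 n) per block); the blocks may be searched concurrently."""
        for block in blocks:
            i = bisect_left(block.subjects, s)
            if i < len(block.subjects) and block.subjects[i] == s:
                for j, p in enumerate(block.predicates):
                    yield (s, p, block.objects[i][j])

    def find_by_object(blocks: List[DataBlock], o: int) -> Iterator[Tuple[int, int, int]]:
        """Step 5.3: object matrices are unsorted, so fall back to a linear scan (O(n))."""
        for block in blocks:
            for i, s in enumerate(block.subjects):
                for j, p in enumerate(block.predicates):
                    if block.objects[i][j] == o:
                        yield (s, p, o)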
In the above query-friendly associated data compression method, parsing the N-Triples-format associated data to obtain the set of triples specifically includes:
Step 2.1: filtering out lines that start with "#" or are empty;
Step 2.2: reading each data line and splitting the string on whitespace;
Step 2.3: mapping the split fields to the subject, predicate and object of a triple and constructing the triple.
In the above query-friendly associated data compression method, building the dictionary and converting the triples to IDs specifically includes:
Step 3.1: flattening the triples obtained in the previous step and removing duplicate data;
Step 3.2: assigning a unique ID to each data item to obtain the Dictionary;
Step 3.3: extracting the header information common to every data item in the Dictionary to obtain the Header; replacing the original triple data with IDs to obtain the ID-converted set of triples.
In the above query-friendly associated data compression method, the relation-mining constraints include:
Constraint 1: merging triples that have the same subject and predicate;
Constraint 2: classifying all triples by subject, merging all predicates and objects of the same subject to form a predicate vector and an object vector, and extracting the predicate vector of each subject;
Constraint 3: merging triples that have the same predicate (predicate vector) and object;
Constraint 4: classifying all triples by predicate vector, and merging the subjects and objects of the same predicate vector to form a subject vector and an object matrix.
In the above query-friendly associated data compression method, the compressed query memory model represents triple relations by means of the subject vector, the predicate vector and the object matrix: assuming the subject vector is a column vector of length m, the predicate vector a row vector of length n, and the object matrix an m*n matrix, multiplying the subject vector by the predicate vector yields a subject-predicate matrix of the same size as the object matrix, whose entries are then mapped one-to-one to the entries of the object matrix, each mapped pair representing one triple relation.
In the above query-friendly associated data compression method, the SPARQL querying further includes:
Step 5.4: complex querying: any complex query can be decomposed into the three kinds of simple queries above, and the results of the simple queries are then merged; steps 5.1 to 5.4 can be executed concurrently.
The above query-friendly associated data compression method further includes an object-matrix splitting step, in which a data block whose object matrix is too large is split into multiple data blocks; specifically, as a remedy for the slow object queries caused by an oversized object matrix, the predicate vector is kept unchanged while the subject vector and the object matrix are split correspondingly, yielding multiple small data blocks; this splitting method preserves the structure of the compressed query memory model and ensures that the compressed query memory model after splitting can still perform concurrent query operations.
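A minimal sketch of this splitting step under the same illustrative DataBlock structure (max_rows is a hypothetical tuning parameter, not defined in the patent):

    from typing import List

    def split_block(block: DataBlock, max_rows: int) -> List[DataBlock]:
        """Split an oversized data block row-wise: the predicate vector is kept
        unchanged, while the subject vector and the object matrix are cut into
        chunks of at most max_rows rows, so every resulting small block keeps the
        same subject-vector / predicate-vector / object-matrix structure."""
        if len(block.subjects) <= max_rows:
            return [block]
        return [
            DataBlock(
                subjects=block.subjects[i:i + max_rows],
                predicates=list(block.predicates),      # shared, unchanged predicate vector
                objects=block.objects[i:i + max_rows],
            )
            for i in range(0, len(block.subjects), max_rows)
        ]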
The invention therefore has the following advantages: compared with most existing compression schemes, an associated data set processed by the present invention achieves a higher compression ratio, and SPARQL query operations can be performed directly in the compressed state.
Brief description of the drawings
Fig. 1 is the compression principle diagram of the embodiment of the present invention.
Fig. 2 is the compressed query memory model diagram of the embodiment of the present invention.
Fig. 3 is a diagram describing the four relation-mining rules of the embodiment of the present invention.
Fig. 4 is a diagram describing the long-data-block splitting rule of the embodiment of the present invention.
Fig. 5 is a schematic flowchart of the method of the present invention.
Specific embodiment
The technical solution of the present invention is described in detail below with reference to the drawings and embodiments.
The technical scheme provided by the present invention is an associated data set compression algorithm based on relation matrices, specifically comprising the following steps:
1. Define the triple memory model, comprising three data segments: subject S, predicate P and object O.
2. Input N-Triples-format associated data and parse it to obtain a set of triples.
The detailed process is as follows (a small parsing sketch follows these sub-steps):
2.1. Filter out lines that start with "#" and empty lines.
2.2. Read each data line and split the string on whitespace.
2.3. Map the split fields to the subject, predicate and object of a triple and construct the triple.
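As a rough illustration only, a small Python sketch of steps 2.1-2.3, assuming plain N-Triples lines of the form "<s> <p> <o> ." (the function name parse_ntriples is not from the patent):

    from typing import Iterator, Tuple

    def parse_ntriples(lines: Iterator[str]) -> Iterator[Tuple[str, str, str]]:
        """Parse N-Triples lines into (subject, predicate, object) string triples."""
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):   # step 2.1: drop comments and empty lines
                continue
            parts = line.split()                   # step 2.2: split on whitespace
            # step 2.3: map the fields to subject, predicate and object; literal
            # objects that contain spaces are rejoined and the trailing "." is dropped
            subject, predicate, obj = parts[0], parts[1], " ".join(parts[2:-1])
            yield (subject, predicate, obj)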
3. Build the dictionary and convert the triples to IDs.
The detailed process is as follows (a dictionary-building sketch follows these sub-steps):
3.1. Flatten the triples obtained in the previous step and remove duplicate data.
3.2. Assign a unique ID to each data item to obtain the Dictionary.
3.3. Extract the header information common to every data item in the Dictionary to obtain the Header.
3.4. Replace the original triple data with IDs to obtain the ID-converted set of triples.
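A simplified sketch of steps 3.1, 3.2 and 3.4 (the Header extraction of step 3.3 is omitted for brevity; build_dictionary is an illustrative name):

    from typing import Dict, Iterable, List, Tuple

    def build_dictionary(triples: Iterable[Tuple[str, str, str]]
                         ) -> Tuple[Dict[str, int], List[Tuple[int, int, int]]]:
        """Assign a unique integer ID to every distinct term and rewrite the
        de-duplicated triples as ID triples."""
        dictionary: Dict[str, int] = {}
        id_triples: List[Tuple[int, int, int]] = []
        seen = set()
        for s, p, o in triples:
            if (s, p, o) in seen:                  # step 3.1: remove duplicate triples
                continue
            seen.add((s, p, o))
            ids = []
            for term in (s, p, o):
                if term not in dictionary:         # step 3.2: one unique ID per term
                    dictionary[term] = len(dictionary)
                ids.append(dictionary[term])
            id_triples.append((ids[0], ids[1], ids[2]))  # step 3.4: ID-converted triple
        return dictionary, id_triples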
4. Relation mining, first step: merge triples that have the same subject and predicate; see Step 1 in Fig. 1 and the derivation formula Rule 1 in Fig. 3.
5. Relation mining, second step: classify all triples by subject, merge all predicates and objects of the same subject to form a predicate vector and an object vector, extract the predicate vector of each subject and sort each predicate vector internally; see Step 2 in Fig. 1 and the derivation formula Rule 2 in Fig. 3.
6. Relation mining, third step: merge triples that have the same predicate (predicate vector) and object; see Step 3 in Fig. 1 and the derivation formula Rule 3 in Fig. 3.
7. Relation mining, fourth step: classify all triples by predicate (predicate vector), and merge the subjects and objects of the same predicate (predicate vector) to form an internally sorted subject vector and an object matrix; such a structure composed of a subject vector, a predicate vector and an object matrix is called a data block; see Step 4 in Fig. 1 and the derivation formula Rule 4 in Fig. 3. A grouping sketch for steps 4-7 follows.
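A simplified grouping sketch for steps 4-7, assuming the illustrative DataBlock structure from the earlier sketch and at most one object per (subject, predicate) pair (multi-object cells and the object merge of Rule 3 are omitted; build_blocks is an illustrative name):

    from collections import defaultdict
    from typing import Dict, List, Tuple

    def build_blocks(id_triples: List[Tuple[int, int, int]]) -> List[DataBlock]:
        """Group ID triples into data blocks (simplified Rules 1-4).

        First gather each subject's predicate/object pairs, then group together the
        subjects that share exactly the same sorted predicate vector; every such
        group becomes one data block (subject vector, predicate vector, object matrix).
        """
        per_subject: Dict[int, Dict[int, int]] = defaultdict(dict)
        for s, p, o in id_triples:
            per_subject[s][p] = o                        # assumes one object per (s, p)

        by_pred_vector: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for s, po in per_subject.items():
            by_pred_vector[tuple(sorted(po))].append(s)  # Rule 2: sorted predicate vector

        blocks: List[DataBlock] = []
        for pred_vector, subjects in by_pred_vector.items():
            subjects.sort()                              # Rule 4: subject vector is sorted
            objects = [[per_subject[s][p] for p in pred_vector] for s in subjects]
            blocks.append(DataBlock(subjects, list(pred_vector), objects))
        return blocks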
8. Extract the subject vector, predicate vector and object matrix of each data block and build the compressed query memory model; see Fig. 2 for the compressed query memory model.
9. SPARQL querying in the compressed state: query operations can be performed concurrently over all data blocks.
The detailed process is as follows:
9.1. Subject query: traverse the subject vectors of all data blocks; because each subject vector is internally sorted, binary search is used, so the subject query time complexity is O(log₂ n).
9.2. Predicate query: traverse the predicate vectors of all data blocks; because each predicate vector is internally sorted, binary search is used, so the predicate query time complexity is O(log₂ n).
9.3. Object query: traverse the object matrices of all data blocks; because the object matrices are not internally sorted, only sequential search is possible, with time complexity O(n); for data blocks with extremely large object matrices, the block can be split to reduce the time overhead of linear traversal; see Fig. 4.
9.4. Complex query: any complex query can be decomposed into the three kinds of simple queries above, and the results of the simple queries are then merged.
10. Serialize and write to file: set the storage length of every ID to be identical, and use the auxiliary symbols "|" (identifier), "," (subject-predicate-object separator) and "/" (data block separator) to implement serialization and deserialization.
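One possible reading of step 10, sketched below: IDs are written with a fixed decimal width (ID_WIDTH is an assumed constant), "|" marks the start of each section, "," separates the subject-vector, predicate-vector and flattened-object sections of a block, and "/" separates data blocks; the exact byte layout is not specified here, so this is an assumption:

    from typing import List

    ID_WIDTH = 8  # assumption: every ID is stored with the same fixed width

    def _encode(ids: List[int]) -> str:
        return "".join(str(i).zfill(ID_WIDTH) for i in ids)

    def serialize(blocks: List[DataBlock]) -> str:
        """Serialize blocks: flatten each object matrix and join the three sections."""
        chunks = []
        for b in blocks:
            flat = [o for row in b.objects for o in row]          # flatten the object matrix
            chunks.append("|" + _encode(b.subjects) + ",|" +
                          _encode(b.predicates) + ",|" + _encode(flat))
        return "/".join(chunks)

    def deserialize(data: str) -> List[DataBlock]:
        """Invert serialize(): split on "/" and ",", strip the "|" markers, cut
        fixed-width IDs, and restore each object matrix from the subject-vector length."""
        def decode(text: str) -> List[int]:
            return [int(text[i:i + ID_WIDTH]) for i in range(0, len(text), ID_WIDTH)]
        blocks = []
        for chunk in data.split("/"):
            s_part, p_part, o_part = (sec.lstrip("|") for sec in chunk.split(","))
            subjects, predicates, flat = decode(s_part), decode(p_part), decode(o_part)
            m = len(subjects)
            n = len(flat) // m                                    # columns per row
            objects = [flat[i * n:(i + 1) * n] for i in range(m)]
            blocks.append(DataBlock(subjects, predicates, objects))
        return blocks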
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention belongs may make various modifications or supplements to the described specific embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Claims (7)

1. A query-friendly associated data compression method, characterized in that it comprises:
a structural-model construction step, specifically including:
Step 1: parsing N-Triples-format associated data based on the triple memory model to obtain a set of triples, then building a dictionary and converting the triples to IDs, wherein the parsing process includes:
Step 1.1: filtering out lines that start with "#" or are empty;
Step 1.2: reading each data line and splitting the string on whitespace;
Step 1.3: mapping the split fields to the subject, predicate and object of a triple and constructing the triple;
Step 2: mining potential associations among the triples based on relation-mining constraints;
Step 3: defining the compressed query memory model, which consists of header information, a dictionary and a set of data blocks, each data block consisting of a subject vector, a predicate vector and an object matrix; the compressed query memory model represents triple relations by means of the subject vector, the predicate vector and the object matrix: the subject vector is defined as a column vector of length m, the predicate vector as a row vector of length n, and the object matrix as an m*n matrix; multiplying the subject vector by the predicate vector yields a subject-predicate matrix of the same size as the object matrix, whose entries are then mapped one-to-one to the entries of the object matrix, each mapped pair representing one triple relation;
a step of high-compression-ratio data storage based on the structural model, specifically including:
Step 4: defining the storage length of every ID to be identical, and defining the serialization scheme of the compressed query memory model: auxiliary symbols are used to perform the serialization and deserialization operations;
Step 4.1: serialization: for each data block, flattening the object matrix, concatenating the subject vector, the predicate vector and the flattened object matrix data into a linear data structure with leading auxiliary identifiers, and then linking the processed linear data structures together with data-block auxiliary identifiers;
Step 4.2: deserialization: splitting the serialized data into individual data blocks according to the data-block auxiliary identifier; for each data block, splitting it into the subject vector, the predicate vector and the flattened object matrix according to the leading auxiliary identifiers, and then restoring the flattened object matrix to the original matrix according to the length of the subject vector;
a step of data querying based on the structural model, specifically including:
Step 5: SPARQL querying based on the compressed query memory model: queries on subjects and predicates use binary search, and queries on objects use linear scanning, specifically including:
Step 5.1: subject querying: traversing the subject vectors of all data blocks; because each subject vector is internally sorted, binary search is used, so the subject query time complexity is O(log₂ n);
Step 5.2: predicate querying: traversing the predicate vectors of all data blocks; because each predicate vector is internally sorted, binary search is used, so the predicate query time complexity is O(log₂ n);
Step 5.3: object querying: traversing the object matrices of all data blocks; because the object matrices are not internally sorted, only sequential search is possible, with time complexity O(n).
2. The query-friendly associated data compression method according to claim 1, characterized in that parsing the N-Triples-format associated data to obtain the set of triples specifically includes:
Step 2.1: filtering out lines that start with "#" or are empty;
Step 2.2: reading each data line and splitting the string on whitespace;
Step 2.3: mapping the split fields to the subject, predicate and object of a triple and constructing the triple.
3. The query-friendly associated data compression method according to claim 1, characterized in that building the dictionary and converting the triples to IDs specifically includes:
Step 3.1: flattening the triples obtained in the previous step and removing duplicate data;
Step 3.2: assigning a unique ID to each data item to obtain the Dictionary;
Step 3.3: extracting the header information common to every data item in the Dictionary to obtain the Header; replacing the original triple data with IDs to obtain the ID-converted set of triples.
4. The query-friendly associated data compression method according to claim 1, characterized in that the relation-mining constraints include:
Constraint 1: merging triples that have the same subject and predicate;
Constraint 2: classifying all triples by subject, merging all predicates and objects of the same subject to form a predicate vector and an object vector, and extracting the predicate vector of each subject;
Constraint 3: merging triples that have the same predicate vector and object;
Constraint 4: classifying all triples by predicate vector, and merging the subjects and objects of the same predicate vector to form a subject vector and an object matrix.
5. The query-friendly associated data compression method according to claim 1, characterized in that the compressed query memory model represents triple relations by means of the subject vector, the predicate vector and the object matrix: assuming the subject vector is a column vector of length m, the predicate vector a row vector of length n, and the object matrix an m*n matrix, multiplying the subject vector by the predicate vector yields a subject-predicate matrix of the same size as the object matrix, whose entries are then mapped one-to-one to the entries of the object matrix, each mapped pair representing one triple relation.
6. The query-friendly associated data compression method according to claim 1, characterized in that the SPARQL querying further includes:
Step 5.4: complex querying: any complex query can be decomposed into the three kinds of simple queries above, and the results of the simple queries are then merged; steps 5.1 to 5.4 can be executed concurrently.
7. The query-friendly associated data compression method according to claim 1, characterized in that it further includes an object-matrix splitting step, in which a data block whose object matrix is too large is split into multiple data blocks; specifically, as a remedy for the slow object queries caused by an oversized object matrix, the predicate vector is kept unchanged while the subject vector and the object matrix are split correspondingly, yielding multiple small data blocks; this splitting method preserves the structure of the compressed query memory model and ensures that the compressed query memory model after splitting can still perform concurrent query operations.
CN201611209081.1A 2016-12-23 2016-12-23 Query-friendly associated data compression method Active CN106709006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611209081.1A CN106709006B (en) 2016-12-23 2016-12-23 Query-friendly associated data compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611209081.1A CN106709006B (en) 2016-12-23 2016-12-23 Query-friendly associated data compression method

Publications (2)

Publication Number Publication Date
CN106709006A true CN106709006A (en) 2017-05-24
CN106709006B CN106709006B (en) 2020-10-30

Family

ID=58895698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611209081.1A Active CN106709006B (en) 2016-12-23 2016-12-23 Query-friendly associated data compression method

Country Status (1)

Country Link
CN (1) CN106709006B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4199361A1 (en) * 2021-12-17 2023-06-21 Dassault Systèmes Compressed graph notation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521299A (en) * 2011-11-30 2012-06-27 华中科技大学 Method for processing data of resource description framework
CN102968804A (en) * 2012-11-23 2013-03-13 西安工程大学 Method for carrying out compression storage on adjacent matrixes of sparse directed graph
CN103326730A (en) * 2013-06-06 2013-09-25 清华大学 Data parallelism compression method
CN104809168A (en) * 2015-04-06 2015-07-29 华中科技大学 Partitioning and parallel distribution processing method of super-large scale RDF graph data
CN105955999A (en) * 2016-04-20 2016-09-21 华中科技大学 Large scale RDF graph Thetajoin query processing method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JAVIER D. FERNÁNDEZ ET AL.: "Binary RDF Representation for Publication and Exchange (HDT)", Web Semantics: Science, Services and Agents on the World Wide Web *
PINGPENG YUAN ET AL.: "TripleBit: a Fast and Compact System for Large Scale RDF Data", Proceedings of the VLDB Endowment *
SANDRA ÁLVAREZ-GARCÍA ET AL.: "Compressed Vertical Partitioning for Efficient RDF Management", Knowledge and Information Systems *
"BitMat: A Main Memory RDF Triple Store", Tetherless World Constellation, Troy, NY *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704617A (en) * 2017-10-25 2018-02-16 武汉科技大学 A kind of compression method of the associated data based on classification tree index
CN110457697A (en) * 2019-08-01 2019-11-15 南京邮电大学 A kind of RDF data compression and decompression method based on anonymous predicate index
CN110457697B (en) * 2019-08-01 2023-01-31 南京邮电大学 RDF data compression and decompression method based on anonymous predicate index
CN111026747A (en) * 2019-10-25 2020-04-17 广东数果科技有限公司 Distributed graph data management system, method and storage medium
CN111291185A (en) * 2020-01-21 2020-06-16 京东方科技集团股份有限公司 Information extraction method and device, electronic equipment and storage medium
WO2021147726A1 (en) * 2020-01-21 2021-07-29 京东方科技集团股份有限公司 Information extraction method and apparatus, electronic device and storage medium
CN111291185B (en) * 2020-01-21 2023-09-22 京东方科技集团股份有限公司 Information extraction method, device, electronic equipment and storage medium
US11922121B2 (en) 2020-01-21 2024-03-05 Boe Technology Group Co., Ltd. Method and apparatus for information extraction, electronic device, and storage medium

Also Published As

Publication number Publication date
CN106709006B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN106709006A (en) Associated data compressing method friendly to query
US10216794B2 (en) Techniques for evaluating query predicates during in-memory table scans
CN110990638B (en) Large-scale data query acceleration device and method based on FPGA-CPU heterogeneous environment
KR102407510B1 (en) Method, apparatus, device and medium for storing and querying data
US9535939B2 (en) Intra-block partitioning for database management
CN107368527B (en) Multi-attribute index method based on data stream
CN104408192B (en) The compression processing method and device of character string type row
CN105677683A (en) Batch data query method and device
CN102402617A (en) Easily-compressed database index storage system utilizing fragments and sparse bitmap and corresponding construction, scheduling and query processing methods thereof
CN107357843B (en) Massive network data searching method based on data stream structure
CN106874425B (en) Storm-based real-time keyword approximate search algorithm
CN107704617A (en) A kind of compression method of the associated data based on classification tree index
CN107291964A (en) A kind of method that fuzzy query is realized based on HBase
US11657051B2 (en) Methods and apparatus for efficiently scaling result caching
CN106203171A (en) Big data platform Security Index system and method
CN106649286B (en) One kind carrying out the matched method of term based on even numbers group dictionary tree
CN108287985A (en) A kind of the DNA sequence dna compression method and system of GPU acceleration
Soransso et al. Data modeling for analytical queries on document-oriented DBMS
CN106503040A (en) It is suitable for KV data bases and its creation method of SQL query method
CN106484684B (en) Data in a kind of pair of database carry out the matched method of term
CN102214216B (en) Aggregation summarization method for keyword search result of hierarchical relation data
US20150012563A1 (en) Data mining using associative matrices
CN112052240A (en) HBase secondary memory index construction method based on coprocessor
Stockinger et al. Using bitmap index for joint queries on structured and text data
CN103049506A (en) Data caching method and system of mobile device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 430081 No. 947 Heping Avenue, Qingshan District, Hubei, Wuhan

Applicant after: WUHAN University OF SCIENCE AND TECHNOLOGY

Applicant after: Wuhan Chutianyun Technology Co.,Ltd.

Address before: 430081 No. 947 Heping Avenue, Qingshan District, Hubei, Wuhan

Applicant before: WUHAN University OF SCIENCE AND TECHNOLOGY

Applicant before: WUHAN CHUCLOUD TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20170524

Assignee: Wuhan Bilin Software Co.,Ltd.

Assignor: WUHAN University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2022420000026

Denomination of invention: A query-friendly associated data compression method

Granted publication date: 20201030

License type: Common License

Record date: 20220330