CN110990358A - Decompression method, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN110990358A (application CN201910944737.1A)
- Authority
- CN
- China
- Prior art keywords
- block
- data
- decompressed
- compressed
- decompression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the invention relates to the field of data processing, and discloses a decompression method, electronic equipment and a computer-readable storage medium. In some embodiments of the present application, a decompression method includes: pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the positions of compressed blocks in the data to be decompressed; dividing the data to be decompressed into N data blocks according to the block information, wherein each data block comprises at least one compressed block and N is a positive integer; and decompressing each data block concurrently. This embodiment increases decompression speed.
Description
Technical Field
Embodiments of the present invention relate to the field of data processing, and in particular, to a decompression method, an electronic device, and a computer-readable storage medium.
Background
GZIP-based data compression formats offer a high compression ratio and a high degree of industrial maturity, so they are widely used in the internet and big-data fields. For example, the mass access logs of a Content Delivery Network (CDN) are usually compressed into GZIP packets for storage, achieving a compression ratio of roughly 5 to 6. Transmitting data streams in this format over the network to a big data platform for real-time or offline analysis greatly improves network transmission efficiency and reduces network congestion.
However, the inventors found at least the following problem in the prior art: decompression methods based on the GZIP data compression format are too slow.
Disclosure of Invention
An object of embodiments of the present invention is to provide a decompression method, an electronic device, and a computer-readable storage medium, so as to improve a decompression speed.
To solve the above technical problem, an embodiment of the present invention provides a decompression method, including the following steps: pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the position of a compressed block in the data to be decompressed; dividing data to be decompressed into N data blocks according to the block information, wherein each data block at least comprises one compression block, and N is a positive integer; each data block is decompressed concurrently.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the decompression method mentioned in the above embodiments.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the decompression method mentioned in the above embodiments.
Compared with the prior art, the embodiments of the present invention pre-decode the data to be decompressed to obtain its block information, partition the data into blocks accordingly, and decompress the blocks in parallel. Because the divided data blocks are decompressed in parallel instead of the file being decompressed as a whole, decompression speed is improved and total decompression latency is reduced.
In addition, pre-decoding the data to be decompressed to obtain its block information specifically includes: pre-decoding the data to be decompressed and determining the position of the end of each compressed block; and determining the block information from those end-of-block positions.
In addition, pre-decoding the data to be decompressed and determining the position of the end of each compressed block specifically includes: performing code-table matching on the characters in the data to be decompressed according to its coding table; and if the code value matched for a character is 256, taking that character's position as the position of the end of the current compressed block. In this implementation, only the end symbol of each compressed block is located and no distance/position substitution is performed; the data is then divided into blocks and decompressed in parallel, which improves decompression speed.
In addition, dividing the data to be decompressed into N data blocks according to the block information specifically includes: merging the compressed blocks in the data to be decompressed into N data blocks according to the block information and a preset merging rule, wherein, among the merged data blocks, the first compressed block of the (i+1)-th data block is the same as the last compressed block of the i-th data block, and 1 ≤ i < N. In this implementation, each data block redundantly carries the last compressed block of the previous data block, ensuring that the first compressed block of a data block can find reference characters far enough back during decompression and complete normal character substitution.
In addition, the block information also indicates the order of the compressed blocks. The merging rule is as follows: merge the 1st through M-th compressed blocks, in compressed-block order, into one data block; judge whether 2M is smaller than N, where M is a positive integer; if so, merge the M-th through 2M-th compressed blocks into one data block, set M equal to 2M, and return to the step of judging whether 2M is smaller than N; if not, merge the M-th through N-th compressed blocks into one data block. In this implementation, merging the compressed blocks in order ensures continuity of content between compressed blocks during subsequent parallel decompression.
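The merging rule just described can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: `blocks` stands for the ordered list of cached compressed blocks, `n` here denotes the total number of compressed blocks, and the boundary block is deliberately repeated so that consecutive data blocks overlap by one compressed block.

```python
def merge_blocks(blocks, m):
    """Merge an ordered list of compressed blocks into data blocks.

    Follows the doubling rule: blocks 1..M form the first data block;
    while 2M < n, blocks M..2M form the next data block and M doubles;
    finally blocks M..n form the last data block. The boundary block M
    appears in two consecutive data blocks (the required redundancy).
    """
    n = len(blocks)
    partitions = [blocks[0:m]]                   # 1st..M-th compressed block
    while 2 * m < n:
        partitions.append(blocks[m - 1:2 * m])   # M-th..2M-th (M-th repeated)
        m *= 2
    partitions.append(blocks[m - 1:n])           # M-th..n-th compressed block
    return partitions
```

For example, with ten compressed blocks and M = 2, the rule yields data blocks of sizes 2, 3, 5 and 3, each sharing its first compressed block with the previous data block.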
In addition, the decompression process for the k-th data block includes: if k = 1, starting decompression from the first compressed block and decompressing to the last predetermined symbol of the last compressed block; if 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the last predetermined symbol of the last compressed block; and if k = N, starting decompression from the last predetermined symbol of the first compressed block until the last compressed block is fully decompressed. In this implementation, the integrity of the content of each data block after decompression is ensured, providing a basis for seamless integration of the Hadoop platform with a streaming computing cluster (Spark computing platform).
In addition, the distributed computing platform divides the data to be decompressed into N data blocks according to the block information, and decompresses each data block concurrently.
In addition, the distributed computing platform is in communication connection with the Spark computing platform, and the distributed computing platform transmits the decompressed data of each data block to the Spark computing platform.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the drawings are not to scale unless otherwise specified.
Fig. 1 is a flow chart of a decompression method according to a first embodiment of the present invention;
fig. 2 is a flow chart of a decompression method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a decompression apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in these embodiments to help the reader better understand the present application; nevertheless, the technical solutions claimed herein can be implemented without some of these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a decompression method applied to an electronic device, such as a server or a terminal. As shown in fig. 1, the decompression method includes the following steps:
step 101: and pre-decoding the data to be decompressed to obtain block information of the data to be decompressed.
Specifically, the block information indicates the position of a compressed block in the data to be decompressed.
The block information may be position information of a block header of the compressed block, position information of a block end of the compressed block, or position information of a block header and position information of a block end of the compressed block, and is not limited herein.
In one embodiment, the block information is the position information of the end of each compressed block. Specifically, the electronic device pre-decodes the data to be decompressed, determines the position of the end of each compressed block, and determines the block information from those end-of-block positions.
In one embodiment, the position information of the end of a block may be the address of the ending character of the compressed block. Specifically, the code value of the ending character of a compressed block is 256. The electronic device performs code-table matching on the characters in the data to be decompressed according to its coding table; if the code value matched for a character is 256, the position of that character is taken as the position of the end of the current compressed block.
Assuming the data to be decompressed is a file compressed with the lossless deflate data compression algorithm (hereinafter a deflate file), the electronic device determines the block information as follows. First, the electronic device unpacks the deflate file, that is, removes the file header and retains the compressed stream; the header may include description information indicating, for example, whether the deflate file is a dynamic or static compressed file. The coding tree of the deflate file is then read and the compressed stream is pre-decoded. Any value decoded during pre-decoding other than 256 is discarded and parsing continues; when the decoded value is 256, the current compressed block has ended and all information for that block is cached. The next block is then processed by the same rule until all blocks have been parsed. Pre-decoding is usually implemented by decoding the coding tree with the Huffman algorithm to obtain a coding table and then stepping through the characters of the deflate file against that table according to the LZ77 format. In this process, only whether the decoded value is 256 is checked, and the distance/position substitution of repeated characters is not performed, so the cached block contents are still not fully decompressed. This step yields the address of each compressed block's ending character and the not-fully-decompressed compressed blocks. Optionally, each compressed block is numbered during pre-decoding so that an ID number is obtained for each compressed block.
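The pre-decoding pass can be sketched as follows. This is a hypothetical sketch that abstracts away the Huffman decoding itself: the decoder is assumed to yield (symbol, bit_offset) pairs, and the pre-decoder merely records where symbol 256, deflate's end-of-block code, occurs, without performing any LZ77 substitution.

```python
END_OF_BLOCK = 256  # deflate's end-of-block symbol

def find_block_ends(decoded_symbols):
    """Return the bit offsets of each compressed block's ending symbol.

    decoded_symbols: iterable of (symbol, bit_offset) pairs produced by
    Huffman-decoding the compressed stream. All symbols other than 256
    are discarded; only the block boundaries are recorded.
    """
    block_ends = []
    for symbol, bit_offset in decoded_symbols:
        if symbol == END_OF_BLOCK:
            block_ends.append(bit_offset)
    return block_ends
```

The list of end-of-block offsets, together with the cached (still-compressed) block contents, is exactly the block information used for partitioning in the next step.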
It is worth mentioning that the separation of the compressed blocks and the addressing location of the ending character of the compressed blocks provide the basis for the subsequent implementation of parallel decompression.
It is worth mentioning that numbering each compressed block provides a basis for subsequent compressed block merging.
Step 102: and dividing the data to be decompressed into N data blocks according to the block information.
Specifically, each data block comprises at least one compressed block, and N is a positive integer.
In one embodiment, the electronic device may treat each compressed block as a data block.
In one embodiment, the data to be decompressed may be divided into N data blocks by the distributed computing platform according to the block information, and each data block may be decompressed concurrently.
Step 103: each data block is decompressed concurrently.
In particular, the electronic device may decompress each data block in parallel through the distributed computing platform. And determining a final decompressed file of the data to be decompressed according to the decompressed data of each data block.
In one embodiment, the distributed computing platform is communicatively coupled to a Spark computing platform, and the distributed computing platform transmits the decompressed data of each data block to the Spark computing platform.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the decompression method provided by the embodiment pre-decodes the data to be decompressed to obtain the block information of the data to be decompressed, so that the data to be decompressed is partitioned to realize parallel decompression. Because each divided data block is decompressed in parallel, compared with the whole file decompression, the decompression speed is improved, and the total decompression delay time is reduced.
A second embodiment of the present invention relates to a decompression method, and this embodiment exemplifies step 102 and step 103 of the first embodiment.
Specifically, as shown in fig. 2, the present embodiment includes steps 201 to 203, where step 201 is substantially the same as step 101 in the first embodiment, and is not repeated here. The following mainly introduces the differences:
step 201: and pre-decoding the data to be decompressed to obtain block information of the data to be decompressed.
Step 202: and according to the block information and a preset merging rule, merging the compressed blocks in the data to be decompressed into N data blocks.
Specifically, among the merged data blocks, the first compressed block of the (i+1)-th data block is the same as the last compressed block of the i-th data block, and 1 ≤ i < N.
It is worth mentioning that merging a plurality of compressed blocks into one data block prevents an excessive number of parallel decompression tasks from occupying the resources of the electronic device.
In one embodiment, the block information further indicates the order of the compressed blocks; for example, it includes the number of each compressed block, i.e., its ID number. The merging rule is: merge the 1st through M-th compressed blocks, in compressed-block order, into one data block; judge whether 2M is smaller than N, where M is a positive integer; if so, merge the M-th through 2M-th compressed blocks into one data block, set M equal to 2M, and return to the step of judging whether 2M is smaller than N; if not, merge the M-th through N-th compressed blocks into one data block.
It should be noted that, as will be understood by those skilled in the art, in practical applications, M may be determined according to the data size of each compressed block and the parallel processing capability of the distributed computing platform, and is not limited herein.
It is worth mentioning that merging the compressed blocks in order of their ID numbers ensures continuity of content between compressed blocks during subsequent parallel decompression.
It is worth mentioning that each data block has a redundancy for the last compressed block of the previous data block, so as to ensure that the first compressed block in the data block can find a reference character with a sufficiently long distance during decompression, thereby completing normal character replacement.
Suppose the data to be decompressed is a deflate file (GZIP file) and the distributed computing platform is a Hadoop platform. The Hadoop platform merges a predetermined number of compressed blocks into a suitably sized Hadoop partition (data block) according to the ID numbers of the compressed blocks, where the predetermined number may be set as desired. Since a compressed block of a GZIP file is on the order of tens of kilobytes while a Hadoop partition is usually 64 MB, starting a processing task for every few tens of kilobytes is not an optimal solution for a big data system. Therefore, after identifying and caching all block information, the Hadoop platform merges the compressed blocks accordingly. Considering that the data volume roughly doubles after decompression, in this embodiment 100 consecutive compressed blocks are combined into one large set as one Hadoop partition. Each Hadoop partition provides the number of compressed blocks it contains, their ID numbers, their content, and so on, as part of the partition information.
When the compressed blocks are merged into Hadoop partitions, they may be merged in order of their ID numbers; for example, the first Hadoop partition holds #1, #2, #3, #4, a later partition holds #8, #9, #10, #11, and so on. Merging in ID order ensures continuity between blocks during subsequent parallel decompression, so that each Hadoop block can be decompressed smoothly.
In addition, when the compression blocks are merged into hadoop partitions, the following conditions can be met: the first compression block of each hadoop partition is the last compression block of the last hadoop partition. At the beginning of each hadoop partition (except the first hadoop partition), the last block of the previous hadoop partition is added for redundancy. If the first hadoop partition has 4 blocks #1, #2, #3 and #4, the second hadoop partition should contain #4, #5, #6, #7 and #8, so as to ensure that the block #4 is contained; the third hadoop partition should contain #8, #9, #10, #11, #12 …, and ensure that block #8 is contained, and so on, in order to ensure that the first block in the partition can find the reference character far enough away when decompressing, and complete the normal character replacement.
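This overlapping merge can be sketched in Python as follows, reproducing the #1-#4 / #4-#8 / #8-#12 example above. The helper is a hypothetical illustration; block ID numbers stand in for the cached compressed blocks.

```python
def make_partitions(block_ids, blocks_per_partition):
    """Group compressed-block IDs into partitions.

    Every partition after the first begins with the previous
    partition's last block, providing the redundancy needed so the
    first block can resolve long-distance LZ77 back-references.
    """
    partitions = []
    start = 0
    while start < len(block_ids):
        part = block_ids[start:start + blocks_per_partition]
        if partitions:
            # prepend the previous partition's last block for redundancy
            part = [partitions[-1][-1]] + part
        partitions.append(part)
        start += blocks_per_partition
    return partitions
```

With twelve blocks and four blocks per partition this yields [1, 2, 3, 4], [4, 5, 6, 7, 8] and [8, 9, 10, 11, 12], matching the example in the text.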
Step 203: each data block is decompressed concurrently.
Specifically, the decompression process for the k-th data block includes: if k = 1, starting decompression from the first compressed block and decompressing to the last predetermined symbol of the last compressed block; if 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the last predetermined symbol of the last compressed block; and if k = N, starting decompression from the last predetermined symbol of the first compressed block until the last compressed block is fully decompressed.
The predetermined symbol may be a line break symbol, or may be another designated symbol, such as a colon, and the present embodiment is not limited thereto.
It should be noted that the electronic device may decompress the data in each data block through the LZ77 algorithm, or may select an appropriate decompression algorithm according to the compression algorithm of the data to be decompressed, so as to decompress the data in each data block, and this embodiment does not limit the algorithm used in the decompression process.
Assume that the predetermined symbol is a line break. After the hadoop partitions are formed in the last step, the distributed computing platform acquires the partition information list, and then, the content of each hadoop partition is decompressed in parallel, wherein the decompression process of each hadoop partition is as follows:
(1) Judge whether the currently decompressed Hadoop partition is the first partition. If it is, decode from the first compressed block of the partition up to the last visible line break of its last compressed block; the string after that last line break is considered incomplete, is discarded when this partition is decompressed, and is left for the next partition to process. If it is not the first partition, proceed to the next step.
(2) Judge whether the currently decompressed Hadoop partition is the last partition. If it is not, decode from the last visible line break of the partition's first compressed block up to the last visible line break of its last compressed block; that is, the data before the last line break of the first compressed block is considered to have been processed by the previous partition. If it is the last partition, proceed to the next step.
(3) Decode from the last visible line break of the partition's first compressed block until the last compressed block is fully decompressed.
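These three boundary rules can be sketched on already-decompressed text as follows. This is an illustrative Python sketch under the assumption that each partition's compressed blocks have been decompressed into strings and that the predetermined symbol is a line break; it only shows the trimming logic, not the decompression itself.

```python
def partition_text(blocks, is_first, is_last):
    """Return the character range a partition actually contributes.

    blocks: decompressed contents of this partition's compressed blocks,
    where (per the overlap rule) blocks[0] of a non-first partition is
    the redundant copy of the previous partition's last block.
    """
    full = "".join(blocks)
    start = 0
    if not is_first:
        # skip up to and including the last line break of the first
        # block: that prefix was already emitted by the previous partition
        start = blocks[0].rfind("\n") + 1
    end = len(full)
    if not is_last:
        # stop at the last line break of the last block; the trailing
        # incomplete record is left for the next partition
        end = len(full) - len(blocks[-1]) + blocks[-1].rfind("\n") + 1
    return full[start:end]
```

Because each partition trims the same boundaries its neighbours do, concatenating the partitions' contributions reproduces the original text exactly once, with no record split across partitions, despite the redundant shared block.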
The purpose of these steps is to solve the problem that the decompressed content of a Hadoop partition cannot otherwise be used directly for subsequent RDD construction and big-data concurrent computation. Owing to the GZIP compression characteristics, for the compressed blocks assigned to each partition, in particular the first and last blocks, there is no guarantee that the first character of the first block is exactly the head of a text record, nor that the last character of the last block is exactly the tail of one. If the partition content were handed directly to subsequent distributed computation, abnormal situations such as system errors could occur. By splitting at the last line break of the first block and the last line break of the last block of each Hadoop partition, the above steps guarantee that partition content is read completely and that the Spark computing platform can construct an in-memory Spark RDD (Resilient Distributed Dataset), i.e., its minimum computing unit. Parallel decompression of GZIP files is thereby finally achieved, seamlessly integrating the Hadoop platform with the streaming computing cluster (Spark computing platform).
The inventors found that the core cost of existing GZIP-based file collection and big-data analysis systems lies in the GZIP decompression process. Because of the GZIP file's special packing header, and because compressed blocks are stored as a continuous binary (bit) stream rather than on whole-byte boundaries, the stream carries no dedicated start/end or summary information for each compressed block, so the list of all blocks cannot be obtained from the stream in one pass. These characteristics mean that GZIP does not natively support Hadoop partitioning or parallel reading and decompression. Not supporting Hadoop's partitioning means that however large the big data platform and however many physical machines and CPUs are used, their advantages cannot be exploited: by default a GZIP file cannot be decompressed and computed by parallel multi-tasking in the decompression stage, only by a single core and a single process, and this bottleneck greatly limits the computing capacity and efficiency of a big data platform. The decompression method provided by this embodiment obtains the block information of the compressed blocks through pre-decoding and adapts the Hadoop platform's decoding process accordingly, so that GZIP files can be decoded in parallel on the Hadoop platform and decoding speed is improved.
It is worth mentioning that, because the data fed into the Hadoop platform is the still-compressed data to be decompressed, decompression efficiency during real-time computation on the Hadoop platform is improved compared with partitioning data that has already been decompressed before input; through rapid preprocessing and caching, parallel decompression is achieved and total decompression latency is reduced. The larger the compressed file, the greater the speedup. In addition, the space occupied by intermediate decompression files in Hadoop storage is reduced by a factor of more than 10. Through this implementation, the Hadoop platform and a streaming computing framework, such as the Spark computing platform, are seamlessly integrated, so that big-data application developers need only focus on large-scale distributed development of their services, free of purely technical problems, such as how to improve decompression efficiency, imposed by GZIP's limitations.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the decompression method provided by this embodiment pre-decodes the data to be decompressed to obtain its block information, partitions the data accordingly, and realizes parallel decompression. Because the divided data blocks are decompressed in parallel instead of the file being decompressed as a whole, decompression speed is improved and total decompression latency is reduced. In addition, the compressed blocks are merged in order of their ID numbers, ensuring continuity of content between compressed blocks during subsequent parallel decompression. Each data block redundantly carries the last compressed block of the previous data block, so that the first compressed block of a data block can find reference characters far enough back during decompression and complete normal character substitution.
The steps of the above methods are divided only for clarity of description; in implementation, steps may be combined into one, or a single step may be split into several, and as long as the same logical relationship is preserved, such variants fall within the protection scope of this patent. Likewise, adding insignificant modifications to the algorithm or flow, or introducing insignificant design changes without altering the core design of the algorithm and flow, falls within the protection scope of this patent.
A third embodiment of the present invention relates to a decompression device, as shown in fig. 3, including a pre-decoding module 301, a block-dividing module 302, and a decompression module 303. The pre-decoding module 301 is configured to pre-decode data to be decompressed to obtain block information of the data to be decompressed, where the block information indicates the positions of compressed blocks in the data. The block-dividing module 302 is configured to divide the data to be decompressed into N data blocks according to the block information, where each data block includes at least one compressed block and N is a positive integer. The decompression module 303 is configured to decompress the data blocks concurrently.
It should be understood that this embodiment is an apparatus embodiment corresponding to the first embodiment and may be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here to reduce duplication; correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
It should be noted that each module in this embodiment is a logical module; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, this embodiment does not introduce units that are not closely related to solving the technical problem proposed by the present invention, but this does not mean that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic apparatus, as shown in fig. 4, including: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401, so that the at least one processor 401 can execute the decompression method according to the above embodiments.
The electronic device includes one or more processors 401 and a memory 402; one processor 401 is taken as an example in fig. 4. The processor 401 and the memory 402 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 4. The memory 402, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 401 executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions, and modules stored in the memory 402, that is, it implements the decompression method described above.
The memory 402 may include a program storage area and a data storage area, where the program storage area may store the operating system and the application required by at least one function, and the data storage area may store a list of options and the like. Further, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401; such remote memory may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 402 and, when executed by the one or more processors 401, perform the decompression method of any of the method embodiments described above.
This product can execute the method provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment, reference may be made to the method provided in the embodiments of the present application.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method embodiments described above.
That is, as those skilled in the art can understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention, and that in practice various changes may be made to them in form and detail without departing from the spirit and scope of the invention.
Claims (10)
1. A method of decompression, comprising:
pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the position of a compressed block in the data to be decompressed;
dividing the data to be decompressed into N data blocks according to the block information, wherein each data block comprises at least one compressed block, and N is a positive integer;
decompressing each of the data blocks concurrently.
2. The decompression method according to claim 1, wherein the pre-decoding the data to be decompressed to obtain the block information of the data to be decompressed specifically comprises:
pre-decoding the data to be decompressed and determining position information of the tail of each compressed block;
and determining the block information according to the position information of the block tail of each compressed block.
3. The decompression method according to claim 2, wherein the pre-decoding the data to be decompressed and determining the position information of the tail of each compressed block specifically comprises:
according to the coding table of the data to be decompressed, carrying out code table matching on characters in the data to be decompressed;
and if the code value matched to the character is 256, taking the position information of the character as the position information of the tail of the current compressed block.
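Claim 3 keys the pre-decoding on DEFLATE's end-of-block code, whose value in the literal/length alphabet is 256. A toy sketch of this step, under the assumption that code-table matching has already turned the bit stream into a sequence of code values (the `code_values` list is hypothetical, not part of the patent):

```python
END_OF_BLOCK = 256  # DEFLATE literal/length alphabet: 0-255 are literals, 256 ends a block

def block_tails(code_values):
    """Return every position whose matched code value is 256; each such
    position marks the tail of one compressed block."""
    return [i for i, v in enumerate(code_values) if v == END_OF_BLOCK]
```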
4. The decompression method according to claim 1, wherein the dividing the data to be decompressed into N data blocks according to the block information specifically comprises:
according to the block information and a preset combination rule, combining the compressed blocks in the data to be decompressed into N data blocks;
and in the merged data blocks, the first compressed block of the (i+1)th data block is the same as the last compressed block of the ith data block, where 1 ≤ i < N.
5. The decompression method according to claim 4, wherein the block information further indicates an order of compressing the blocks; the merging rule is as follows:
merging the 1 st compression block to the Mth compression block into a data block according to the sequence of the compression blocks;
judging whether 2M is smaller than N, wherein M is a positive integer;
if yes, merging the Mth to the (2M)th compressed blocks into one data block, setting M equal to 2M, and returning to the step of judging whether 2M is smaller than N;
and if not, merging the Mth to the Nth compressed blocks into one data block.
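The merge rule of claim 5 can be written out as a small function. Note how consecutive data blocks share one compressed block (the Mth, the (2M)th, and so on), which provides the redundancy claim 4 relies on; the handling of the edge case M ≥ N is my assumption, since the claim does not spell it out.

```python
def merge_ranges(n, m):
    """Return the 1-based, inclusive (first, last) compressed-block indices
    of each merged data block, following claim 5: merge blocks 1..M, then
    repeatedly M..2M (doubling M) while 2M < N, and finally M..N."""
    ranges = [(1, m)]
    while 2 * m < n:
        ranges.append((m, 2 * m))
        m *= 2
    if m < n:  # assumption: skip the final range if block 1..M already covers everything
        ranges.append((m, n))
    return ranges
```

For N = 10 compressed blocks and M = 3, this yields [(1, 3), (3, 6), (6, 10)]: each data block begins with the previous data block's last compressed block, and the data blocks double in size until the remainder is reached.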
6. The decompression method according to claim 1, wherein the decompression process for the kth data block comprises:
if k is determined to be 1, starting decompression from the first compressed block and decompressing up to the last predetermined symbol of the last compressed block;
if it is determined that 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing up to the last predetermined symbol of the last compressed block;
and if k is determined to be N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the end of the last compressed block.
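Claim 6's three cases determine which part of each data block's decoded output a worker keeps, so that the redundant first compressed block (shared with the previous data block) is decoded for its back-references but not emitted twice. A sketch under the simplifying assumption that positions are plain offsets into the decoded output (all parameter names are hypothetical):

```python
def kept_span(k, n, first_tail, last_tail, total):
    """Half-open [start, end) range of data block k's decoded output to keep,
    for k in 1..n. first_tail / last_tail are the offsets just past the last
    predetermined (end-of-block) symbol of the first / last compressed block;
    total is the full decoded length of the data block."""
    start = 0 if k == 1 else first_tail   # skip the redundant first compressed block
    end = total if k == n else last_tail  # only the last worker decodes to the very end
    return start, end
```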
7. The decompression method according to any one of claims 1 to 6, wherein the data to be decompressed is divided into N data blocks by a distributed computing platform according to the block information, and each data block is decompressed concurrently.
8. The decompression method according to claim 7, wherein the distributed computing platform is communicatively connected to a Spark computing platform, and the distributed computing platform transmits the decompressed data of each of the data blocks to the Spark computing platform.
9. An electronic device, comprising: at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the decompression method of any one of claims 1 to 8.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the decompression method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910944737.1A CN110990358B (en) | 2019-09-30 | 2019-09-30 | Decompression method, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110990358A true CN110990358A (en) | 2020-04-10 |
CN110990358B CN110990358B (en) | 2023-06-30 |
Family
ID=70081984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910944737.1A Active CN110990358B (en) | 2019-09-30 | 2019-09-30 | Decompression method, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110990358B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708574A (en) * | 2020-05-28 | 2020-09-25 | 中国科学院信息工程研究所 | Instruction stream compression and decompression method and device |
CN111884658A (en) * | 2020-07-09 | 2020-11-03 | 上海兆芯集成电路有限公司 | Data decompression method, data compression method and convolution operation device |
CN113868206A (en) * | 2021-10-08 | 2021-12-31 | 八十一赞科技发展(重庆)有限公司 | Data compression method, decompression method, device and storage medium |
CN114124106A (en) * | 2022-01-28 | 2022-03-01 | 苏州浪潮智能科技有限公司 | LZ4 decompression method, system, storage medium and equipment |
CN114172521A (en) * | 2022-02-08 | 2022-03-11 | 苏州浪潮智能科技有限公司 | Decompression chip verification method, device and equipment and readable storage medium |
CN118277348A (en) * | 2024-05-31 | 2024-07-02 | 天津南大通用数据技术股份有限公司 | LZO compressed file loading method and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010038642A1 (en) * | 1999-01-29 | 2001-11-08 | Interactive Silicon, Inc. | System and method for performing scalable embedded parallel data decompression |
US20020031271A1 (en) * | 2000-09-08 | 2002-03-14 | Matsushita Electric Industrial Co., Ltd. | Image processor and image processing method for decompressing progressive-coded compressed image data |
US6822589B1 (en) * | 1999-01-29 | 2004-11-23 | Quickshift, Inc. | System and method for performing scalable embedded parallel data decompression |
CN101355364A (en) * | 2008-09-08 | 2009-01-28 | 北大方正集团有限公司 | Method and apparatus for compressing and decompressing file |
US20110018745A1 (en) * | 2009-07-23 | 2011-01-27 | Kabushiki Kaisha Toshiba | Compression/decompression apparatus and compression/decompression method |
CN103428494A (en) * | 2013-08-01 | 2013-12-04 | 浙江大学 | Image sequence coding and recovering method based on cloud computing platform |
CN103581673A (en) * | 2012-08-07 | 2014-02-12 | 上海算芯微电子有限公司 | Video data compression or decompression method and system |
CN103581675A (en) * | 2012-08-07 | 2014-02-12 | 上海算芯微电子有限公司 | Video data compression or decompression method and system |
CN103997648A (en) * | 2014-06-11 | 2014-08-20 | 中国科学院自动化研究所 | System and method for achieving decompression of JPEG2000 standard images rapidly based on DSPs |
CN104753540A (en) * | 2015-03-05 | 2015-07-01 | 华为技术有限公司 | Data compression method, data decompression method and device |
CN106503165A (en) * | 2016-10-31 | 2017-03-15 | 杭州华为数字技术有限公司 | Compression, decompressing method, device and equipment |
CN107404654A (en) * | 2017-08-23 | 2017-11-28 | 郑州云海信息技术有限公司 | A kind of jpeg image decompression method, device and platform |
CN107977442A (en) * | 2017-12-08 | 2018-05-01 | 北京希嘉创智教育科技有限公司 | Journal file compresses and decompression method, electronic equipment and readable storage medium storing program for executing |
CN107977233A (en) * | 2016-10-19 | 2018-05-01 | 华为技术有限公司 | The quick loading method of kernel mirror image file and device |
Non-Patent Citations (3)
Title |
---|
SONG Gang et al.: "Research on a parallel compression algorithm based on shared memory and Gzip", Computer Engineering and Design * |
FAN Xing et al.: "Research on multi-core parallel batch compression of large-scene point-cloud files", Journal of Taiyuan University of Technology * |
HU Rihui: "Research on GPU-based J2K decompression technology", China Master's Theses Full-text Database, Information Science and Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN110990358B (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110990358B (en) | Decompression method, electronic equipment and computer readable storage medium | |
US10862513B2 (en) | Data processing unit having hardware-based parallel variable-length codeword decoding | |
RU2630750C1 (en) | Device and method for encoding and decoding initial data | |
US11431351B2 (en) | Selection of data compression technique based on input characteristics | |
US8279096B2 (en) | Parallel compression for dictionary-based sequential coders | |
JP5123186B2 (en) | Remote protocol support for large object communication in any format | |
US8125364B2 (en) | Data compression/decompression method | |
CN112214462B (en) | Multi-layer decompression method for compressed file, electronic device and storage medium | |
CN108287877B (en) | FPGA (field programmable Gate array) compression/decompression system and hardware decompression method for RIB (run in Box) rendering compressed file | |
CN114337678A (en) | Data compression method, device, equipment and storage medium | |
US7889102B2 (en) | LZSS with multiple dictionaries and windows | |
KR102542239B1 (en) | Data output method, data acquisition method, device, and electronic equipment | |
CN112290953B (en) | Array encoding device and method, array decoding device and method for multi-channel data stream | |
CN105791819A (en) | Frame compression method for image and decompression method and device for image | |
CN115643310B (en) | Method, device and system for compressing data | |
CN116684595A (en) | Ultra-low-time-delay image coding system, method and device and storage medium | |
US9998745B2 (en) | Transforming video bit streams for parallel processing | |
US10931303B1 (en) | Data processing system | |
CN113890540A (en) | Parallel acceleration LZ77 decoding method and device | |
US8823557B1 (en) | Random extraction from compressed data | |
CN103929404B (en) | Method for analyzing HTTP chunked code data | |
US7999705B2 (en) | Unicode-compatible base-N range coding | |
CN118100955B (en) | Method for preprocessing compressed data by parallel decompression | |
CN118132522B (en) | Data compression device, method and chip | |
US11966597B1 (en) | Multi-domain configurable data compressor/de-compressor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||