CN110990358A - Decompression method, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN110990358A (application CN201910944737.1A)
- Authority
- CN
- China
- Prior art keywords
- block
- data
- decompressed
- compressed
- decompression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the invention relates to the field of data processing, and discloses a decompression method, electronic equipment and a computer-readable storage medium. In some embodiments of the present application, a decompression method includes: pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the positions of compressed blocks in the data to be decompressed; dividing the data to be decompressed into N data blocks according to the block information, wherein each data block comprises at least one compressed block and N is a positive integer; and decompressing each data block concurrently. This embodiment increases decompression speed.
Description
Technical Field
Embodiments of the present invention relate to the field of data processing, and in particular, to a decompression method, an electronic device, and a computer-readable storage medium.
Background
GZIP-based data compression formats offer a high compression ratio and a high degree of industrial maturity, so they are widely used in the internet and big-data fields. For example, the mass access logs of a Content Delivery Network (CDN) are usually compressed into GZIP packets for storage, achieving a compression ratio of roughly 5 to 6. Transmitting data streams in this format over the network to a big data platform for real-time or offline analysis greatly improves network transmission efficiency and reduces network congestion.
However, the inventors found at least the following problem in the prior art: decompression methods based on the GZIP data compression format are too slow.
Disclosure of Invention
An object of embodiments of the present invention is to provide a decompression method, an electronic device, and a computer-readable storage medium, so as to improve a decompression speed.
To solve the above technical problem, an embodiment of the present invention provides a decompression method, including the following steps: pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the position of a compressed block in the data to be decompressed; dividing data to be decompressed into N data blocks according to the block information, wherein each data block at least comprises one compression block, and N is a positive integer; each data block is decompressed concurrently.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the decompression method mentioned in the above embodiments.
Embodiments of the present invention also provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the decompression method mentioned in the above embodiments.
Compared with the prior art, the embodiments of the present invention pre-decode the data to be decompressed to obtain its block information, partition the data into blocks accordingly, and decompress the blocks in parallel. Because the divided data blocks are decompressed in parallel instead of the file being decompressed as a whole, decompression speed is improved and total decompression latency is reduced.
In addition, pre-decoding the data to be decompressed to obtain its block information specifically includes: pre-decoding the data to be decompressed and determining the position of the end of each compressed block; and determining the block information from those end-of-block positions.
In addition, pre-decoding the data to be decompressed and determining the position of the end of each compressed block specifically includes: performing code-table matching on the characters in the data to be decompressed according to its coding table; and if the code value matched for a character is 256, taking that character's position as the position of the end of the current compressed block. In this implementation, only the end symbol of each compressed block is located and no distance/position substitution is performed; the data is then divided into blocks and decompressed in parallel, which improves decompression speed.
In addition, dividing the data to be decompressed into N data blocks according to the block information specifically includes: merging the compressed blocks in the data to be decompressed into N data blocks according to the block information and a preset merging rule, wherein, among the merged data blocks, the first compressed block of the (i+1)-th data block is the same as the last compressed block of the i-th data block, and 1 ≤ i < N. In this implementation, each data block redundantly carries the last compressed block of the previous data block, ensuring that the first compressed block of a data block can find reference characters far enough back during decompression and complete normal character substitution.
In addition, the block information also indicates the order of the compressed blocks. The merging rule is as follows: merge the 1st through M-th compressed blocks, in compressed-block order, into one data block; judge whether 2M is smaller than N, where M is a positive integer; if so, merge the M-th through 2M-th compressed blocks into one data block, set M equal to 2M, and return to the step of judging whether 2M is smaller than N; if not, merge the M-th through N-th compressed blocks into one data block. In this implementation, merging the compressed blocks in order ensures continuity of content between compressed blocks during subsequent parallel decompression.
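The merging rule just described can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: `blocks` stands for the ordered list of cached compressed blocks, `n` here denotes the total number of compressed blocks, and the boundary block is deliberately repeated so that consecutive data blocks overlap by one compressed block.

```python
def merge_blocks(blocks, m):
    """Merge an ordered list of compressed blocks into data blocks.

    Follows the doubling rule: blocks 1..M form the first data block;
    while 2M < n, blocks M..2M form the next data block and M doubles;
    finally blocks M..n form the last data block. The boundary block M
    appears in two consecutive data blocks (the required redundancy).
    """
    n = len(blocks)
    partitions = [blocks[0:m]]                   # 1st..M-th compressed block
    while 2 * m < n:
        partitions.append(blocks[m - 1:2 * m])   # M-th..2M-th (M-th repeated)
        m *= 2
    partitions.append(blocks[m - 1:n])           # M-th..n-th compressed block
    return partitions
```

For example, with ten compressed blocks and M = 2, the rule yields data blocks of sizes 2, 3, 5 and 3, each sharing its first compressed block with the previous data block.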
In addition, the decompression process for the k-th data block includes: if k = 1, starting decompression from the first compressed block and decompressing to the last predetermined symbol of the last compressed block; if 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the last predetermined symbol of the last compressed block; and if k = N, starting decompression from the last predetermined symbol of the first compressed block until the last compressed block is fully decompressed. In this implementation, the integrity of the content of each data block after decompression is ensured, providing a basis for seamless integration of the Hadoop platform with a streaming computing cluster (Spark computing platform).
In addition, the distributed computing platform divides the data to be decompressed into N data blocks according to the block information, and decompresses each data block concurrently.
In addition, the distributed computing platform is in communication connection with the Spark computing platform, and the distributed computing platform transmits the decompressed data of each data block to the Spark computing platform.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the drawings are not to scale unless otherwise specified.
Fig. 1 is a flow chart of a decompression method according to a first embodiment of the present invention;
fig. 2 is a flow chart of a decompression method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a decompression apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in these embodiments to help the reader better understand the present application; nevertheless, the technical solutions claimed herein can be implemented without some of these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a decompression method applied to an electronic device, such as a server or a terminal. As shown in fig. 1, the decompression method includes the following steps:
step 101: and pre-decoding the data to be decompressed to obtain block information of the data to be decompressed.
Specifically, the block information indicates the position of a compressed block in the data to be decompressed.
The block information may be position information of a block header of the compressed block, position information of a block end of the compressed block, or position information of a block header and position information of a block end of the compressed block, and is not limited herein.
In one embodiment, the block information is the position information of the end of each compressed block. Specifically, the electronic device pre-decodes the data to be decompressed, determines the position of the end of each compressed block, and determines the block information from those end-of-block positions.
In one embodiment, the position information of the end of a block may be the address of the ending character of the compressed block. Specifically, the code value of the ending character of a compressed block is 256. The electronic device performs code-table matching on the characters in the data to be decompressed according to its coding table; if the code value matched for a character is 256, the position of that character is taken as the position of the end of the current compressed block.
Assuming the data to be decompressed is a file compressed with the lossless deflate data compression algorithm (hereinafter a deflate file), the electronic device determines the block information as follows. First, the electronic device unpacks the deflate file, that is, removes the file header and retains the compressed stream; the header may include description information indicating, for example, whether the deflate file is a dynamic or static compressed file. The coding tree of the deflate file is then read and the compressed stream is pre-decoded. Any value decoded during pre-decoding other than 256 is discarded and parsing continues; when the decoded value is 256, the current compressed block has ended and all information for that block is cached. The next block is then processed by the same rule until all blocks have been parsed. Pre-decoding is usually implemented by decoding the coding tree with the Huffman algorithm to obtain a coding table and then stepping through the characters of the deflate file against that table according to the LZ77 format. In this process, only whether the decoded value is 256 is checked, and the distance/position substitution of repeated characters is not performed, so the cached block contents are still not fully decompressed. This step yields the address of each compressed block's ending character and the not-fully-decompressed compressed blocks. Optionally, each compressed block is numbered during pre-decoding so that an ID number is obtained for each compressed block.
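The pre-decoding pass can be sketched as follows. This is a hypothetical sketch that abstracts away the Huffman decoding itself: the decoder is assumed to yield (symbol, bit_offset) pairs, and the pre-decoder merely records where symbol 256, deflate's end-of-block code, occurs, without performing any LZ77 substitution.

```python
END_OF_BLOCK = 256  # deflate's end-of-block symbol

def find_block_ends(decoded_symbols):
    """Return the bit offsets of each compressed block's ending symbol.

    decoded_symbols: iterable of (symbol, bit_offset) pairs produced by
    Huffman-decoding the compressed stream. All symbols other than 256
    are discarded; only the block boundaries are recorded.
    """
    block_ends = []
    for symbol, bit_offset in decoded_symbols:
        if symbol == END_OF_BLOCK:
            block_ends.append(bit_offset)
    return block_ends
```

The list of end-of-block offsets, together with the cached (still-compressed) block contents, is exactly the block information used for partitioning in the next step.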
It is worth mentioning that the separation of the compressed blocks and the addressing location of the ending character of the compressed blocks provide the basis for the subsequent implementation of parallel decompression.
It is worth mentioning that numbering each compressed block provides a basis for subsequent compressed block merging.
Step 102: and dividing the data to be decompressed into N data blocks according to the block information.
Specifically, each data block comprises at least one compressed block, and N is a positive integer.
In one embodiment, the electronic device may treat each compressed block as a data block.
In one embodiment, the data to be decompressed may be divided into N data blocks by the distributed computing platform according to the block information, and each data block may be decompressed concurrently.
Step 103: each data block is decompressed concurrently.
In particular, the electronic device may decompress each data block in parallel through the distributed computing platform. And determining a final decompressed file of the data to be decompressed according to the decompressed data of each data block.
In one embodiment, the distributed computing platform is communicatively coupled to a Spark computing platform, and the distributed computing platform transmits the decompressed data of each data block to the Spark computing platform.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the decompression method provided by the embodiment pre-decodes the data to be decompressed to obtain the block information of the data to be decompressed, so that the data to be decompressed is partitioned to realize parallel decompression. Because each divided data block is decompressed in parallel, compared with the whole file decompression, the decompression speed is improved, and the total decompression delay time is reduced.
A second embodiment of the present invention relates to a decompression method, and this embodiment exemplifies step 102 and step 103 of the first embodiment.
Specifically, as shown in fig. 2, the present embodiment includes steps 201 to 203, where step 201 is substantially the same as step 101 in the first embodiment, and is not repeated here. The following mainly introduces the differences:
step 201: and pre-decoding the data to be decompressed to obtain block information of the data to be decompressed.
Step 202: and according to the block information and a preset merging rule, merging the compressed blocks in the data to be decompressed into N data blocks.
Specifically, among the merged data blocks, the first compressed block of the (i+1)-th data block is the same as the last compressed block of the i-th data block, and 1 ≤ i < N.
It is worth mentioning that merging a plurality of compressed blocks into one data block prevents an excessive number of parallel decompression tasks from occupying the resources of the electronic device.
In one embodiment, the block information further indicates the order of the compressed blocks; for example, it includes the number of each compressed block, i.e., its ID number. The merging rule is: merge the 1st through M-th compressed blocks, in compressed-block order, into one data block; judge whether 2M is smaller than N, where M is a positive integer; if so, merge the M-th through 2M-th compressed blocks into one data block, set M equal to 2M, and return to the step of judging whether 2M is smaller than N; if not, merge the M-th through N-th compressed blocks into one data block.
It should be noted that, as will be understood by those skilled in the art, in practical applications, M may be determined according to the data size of each compressed block and the parallel processing capability of the distributed computing platform, and is not limited herein.
It is worth mentioning that merging the compressed blocks in order of their ID numbers ensures continuity of content between compressed blocks during subsequent parallel decompression.
It is worth mentioning that each data block has a redundancy for the last compressed block of the previous data block, so as to ensure that the first compressed block in the data block can find a reference character with a sufficiently long distance during decompression, thereby completing normal character replacement.
Suppose the data to be decompressed is a deflate file (GZIP file) and the distributed computing platform is a Hadoop platform. The Hadoop platform merges a predetermined number of compressed blocks into a suitably sized Hadoop partition (data block) according to the ID numbers of the compressed blocks, where the predetermined number may be set as desired. Since a compressed block of a GZIP file is on the order of tens of kilobytes while a Hadoop partition is usually 64 MB, starting a processing task for every few tens of kilobytes is not an optimal solution for a big data system. Therefore, after identifying and caching all block information, the Hadoop platform merges the compressed blocks accordingly. Considering that the data volume roughly doubles after decompression, in this embodiment 100 consecutive compressed blocks are combined into one large set as one Hadoop partition. Each Hadoop partition provides the number of compressed blocks it contains, their ID numbers, their content, and so on, as part of the partition information.
When the compressed blocks are merged into Hadoop partitions, they may be merged in order of their ID numbers; for example, the first Hadoop partition holds #1, #2, #3, #4, a later partition holds #8, #9, #10, #11, and so on. Merging in ID order ensures continuity between blocks during subsequent parallel decompression, so that each Hadoop block can be decompressed smoothly.
In addition, when the compression blocks are merged into hadoop partitions, the following conditions can be met: the first compression block of each hadoop partition is the last compression block of the last hadoop partition. At the beginning of each hadoop partition (except the first hadoop partition), the last block of the previous hadoop partition is added for redundancy. If the first hadoop partition has 4 blocks #1, #2, #3 and #4, the second hadoop partition should contain #4, #5, #6, #7 and #8, so as to ensure that the block #4 is contained; the third hadoop partition should contain #8, #9, #10, #11, #12 …, and ensure that block #8 is contained, and so on, in order to ensure that the first block in the partition can find the reference character far enough away when decompressing, and complete the normal character replacement.
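This overlapping merge can be sketched in Python as follows, reproducing the #1-#4 / #4-#8 / #8-#12 example above. The helper is a hypothetical illustration; block ID numbers stand in for the cached compressed blocks.

```python
def make_partitions(block_ids, blocks_per_partition):
    """Group compressed-block IDs into partitions.

    Every partition after the first begins with the previous
    partition's last block, providing the redundancy needed so the
    first block can resolve long-distance LZ77 back-references.
    """
    partitions = []
    start = 0
    while start < len(block_ids):
        part = block_ids[start:start + blocks_per_partition]
        if partitions:
            # prepend the previous partition's last block for redundancy
            part = [partitions[-1][-1]] + part
        partitions.append(part)
        start += blocks_per_partition
    return partitions
```

With twelve blocks and four blocks per partition this yields [1, 2, 3, 4], [4, 5, 6, 7, 8] and [8, 9, 10, 11, 12], matching the example in the text.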
Step 203: each data block is decompressed concurrently.
Specifically, the decompression process for the k-th data block includes: if k = 1, starting decompression from the first compressed block and decompressing to the last predetermined symbol of the last compressed block; if 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the last predetermined symbol of the last compressed block; and if k = N, starting decompression from the last predetermined symbol of the first compressed block until the last compressed block is fully decompressed.
The predetermined symbol may be a line break symbol, or may be another designated symbol, such as a colon, and the present embodiment is not limited thereto.
It should be noted that the electronic device may decompress the data in each data block through the LZ77 algorithm, or may select an appropriate decompression algorithm according to the compression algorithm of the data to be decompressed, so as to decompress the data in each data block, and this embodiment does not limit the algorithm used in the decompression process.
Assume that the predetermined symbol is a line break. After the hadoop partitions are formed in the last step, the distributed computing platform acquires the partition information list, and then, the content of each hadoop partition is decompressed in parallel, wherein the decompression process of each hadoop partition is as follows:
(1) Judge whether the currently decompressed Hadoop partition is the first partition. If it is, decode from the first compressed block of the partition up to the last visible line break of its last compressed block; the string after that last line break is considered incomplete, is discarded when this partition is decompressed, and is left for the next partition to process. If it is not the first partition, proceed to the next step.
(2) Judge whether the currently decompressed Hadoop partition is the last partition. If it is not, decode from the last visible line break of the partition's first compressed block up to the last visible line break of its last compressed block; that is, the data before the last line break of the first compressed block is considered to have been processed by the previous partition. If it is the last partition, proceed to the next step.
(3) Decode from the last visible line break of the partition's first compressed block until the last compressed block is fully decompressed.
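These three boundary rules can be sketched on already-decompressed text as follows. This is an illustrative Python sketch under the assumption that each partition's compressed blocks have been decompressed into strings and that the predetermined symbol is a line break; it only shows the trimming logic, not the decompression itself.

```python
def partition_text(blocks, is_first, is_last):
    """Return the character range a partition actually contributes.

    blocks: decompressed contents of this partition's compressed blocks,
    where (per the overlap rule) blocks[0] of a non-first partition is
    the redundant copy of the previous partition's last block.
    """
    full = "".join(blocks)
    start = 0
    if not is_first:
        # skip up to and including the last line break of the first
        # block: that prefix was already emitted by the previous partition
        start = blocks[0].rfind("\n") + 1
    end = len(full)
    if not is_last:
        # stop at the last line break of the last block; the trailing
        # incomplete record is left for the next partition
        end = len(full) - len(blocks[-1]) + blocks[-1].rfind("\n") + 1
    return full[start:end]
```

Because each partition trims the same boundaries its neighbours do, concatenating the partitions' contributions reproduces the original text exactly once, with no record split across partitions, despite the redundant shared block.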
The purpose of these steps is to solve the problem that the decompressed content of a Hadoop partition cannot otherwise be used directly for subsequent RDD construction and big-data concurrent computation. Owing to the GZIP compression characteristics, for the compressed blocks assigned to each partition, in particular the first and last blocks, there is no guarantee that the first character of the first block is exactly the head of a text record, nor that the last character of the last block is exactly the tail of one. If the partition content were handed directly to subsequent distributed computation, abnormal situations such as system errors could occur. By splitting at the last line break of the first block and the last line break of the last block of each Hadoop partition, the above steps guarantee that partition content is read completely and that the Spark computing platform can construct an in-memory Spark RDD (Resilient Distributed Dataset), i.e., its minimum computing unit. Parallel decompression of GZIP files is thereby finally achieved, seamlessly integrating the Hadoop platform with the streaming computing cluster (Spark computing platform).
The inventors found that the core cost of existing GZIP-based file collection and big-data analysis systems lies in the GZIP decompression process. Because of the GZIP file's special packing header, and because compressed blocks are stored as a continuous binary (bit) stream rather than on whole-byte boundaries, the stream carries no dedicated start/end or summary information for each compressed block, so the list of all blocks cannot be obtained from the stream in one pass. These characteristics mean that GZIP does not natively support Hadoop partitioning or parallel reading and decompression. Not supporting Hadoop's partitioning means that however large the big data platform and however many physical machines and CPUs are used, their advantages cannot be exploited: by default a GZIP file cannot be decompressed and computed by parallel multi-tasking in the decompression stage, only by a single core and a single process, and this bottleneck greatly limits the computing capacity and efficiency of a big data platform. The decompression method provided by this embodiment obtains the block information of the compressed blocks through pre-decoding and adapts the Hadoop platform's decoding process accordingly, so that GZIP files can be decoded in parallel on the Hadoop platform and decoding speed is improved.
It is worth mentioning that, because the data fed into the Hadoop platform is the still-compressed data to be decompressed, decompression efficiency during real-time computation on the Hadoop platform is improved compared with partitioning data that has already been decompressed before input; through rapid preprocessing and caching, parallel decompression is achieved and total decompression latency is reduced. The larger the compressed file, the greater the speedup. In addition, the space occupied by intermediate decompression files in Hadoop storage is reduced by a factor of more than 10. Through this implementation, the Hadoop platform and a streaming computing framework, such as the Spark computing platform, are seamlessly integrated, so that big-data application developers need only focus on large-scale distributed development of their services, free of purely technical problems, such as how to improve decompression efficiency, imposed by GZIP's limitations.
The above description is only for illustrative purposes and does not limit the technical aspects of the present invention.
Compared with the prior art, the decompression method provided by this embodiment pre-decodes the data to be decompressed to obtain its block information, partitions the data accordingly, and realizes parallel decompression. Because the divided data blocks are decompressed in parallel instead of the file being decompressed as a whole, decompression speed is improved and total decompression latency is reduced. In addition, the compressed blocks are merged in order of their ID numbers, ensuring continuity of content between compressed blocks during subsequent parallel decompression. Each data block redundantly carries the last compressed block of the previous data block, so that the first compressed block of a data block can find reference characters far enough back during decompression and complete normal character substitution.
The steps of the above methods are divided only for clarity of description; in implementation, steps may be combined into one, or a single step may be split into several, and as long as the same logical relationship is preserved, such variants fall within the protection scope of this patent. Likewise, adding insignificant modifications to the algorithm or flow, or introducing insignificant design changes without altering the core design of the algorithm and flow, falls within the protection scope of this patent.
A third embodiment of the present invention relates to a decompression device, as shown in fig. 3, including a pre-decoding module 301, a block-dividing module 302, and a decompression module 303. The pre-decoding module 301 is configured to pre-decode data to be decompressed to obtain block information of the data to be decompressed, where the block information indicates the positions of compressed blocks in the data. The block-dividing module 302 is configured to divide the data to be decompressed into N data blocks according to the block information, where each data block includes at least one compressed block and N is a positive integer. The decompression module 303 is configured to decompress the data blocks concurrently.
It should be understood that this embodiment is an apparatus embodiment corresponding to the first embodiment and may be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here to reduce duplication; correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
It should be noted that each module in this embodiment is a logical module; in practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, this embodiment does not introduce units that are not closely related to solving the technical problem proposed by the present invention, but this does not mean that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic apparatus, as shown in fig. 4, including: at least one processor 401; and a memory 402 communicatively coupled to the at least one processor 401; the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401, so that the at least one processor 401 can execute the decompression method according to the above embodiments.
The electronic device includes one or more processors 401 and a memory 402; one processor 401 is taken as an example in fig. 4. The processor 401 and the memory 402 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 4. The memory 402, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 401 executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions, and modules stored in the memory 402, that is, it implements the decompression method described above.
The memory 402 may include a program storage area and a data storage area, where the program storage area may store the operating system and the application required by at least one function, and the data storage area may store a list of options and the like. Further, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401; such remote memory may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 402 and, when executed by the one or more processors 401, perform the decompression method of any of the method embodiments described above.
This product can execute the method provided in the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment, reference may be made to the method provided in the embodiments of the present application.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method embodiments described above.
That is, as those skilled in the art can understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention, and that in practice various changes may be made to them in form and detail without departing from the spirit and scope of the invention.
Claims (10)
1. A method of decompression, comprising:
pre-decoding data to be decompressed to obtain block information of the data to be decompressed, wherein the block information indicates the position of a compressed block in the data to be decompressed;
dividing the data to be decompressed into N data blocks according to the block information, wherein each data block comprises at least one compressed block, and N is a positive integer;
decompressing each of the data blocks concurrently.
2. The decompression method according to claim 1, wherein the pre-decoding the data to be decompressed to obtain the block information of the data to be decompressed specifically comprises:
pre-decoding the data to be decompressed and determining position information of the tail of each compressed block;
and determining the block information according to the position information of the block tail of each compressed block.
3. The decompression method according to claim 2, wherein the pre-decoding the data to be decompressed and determining the position information of the tail of each compressed block specifically comprises:
according to the coding table of the data to be decompressed, carrying out code table matching on characters in the data to be decompressed;
and if the code value matched to the character is 256, taking the position information of the character as the position information of the tail of the current compressed block.
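Claim 3 keys the pre-decoding on DEFLATE's end-of-block code, whose value in the literal/length alphabet is 256. A toy sketch of this step, under the assumption that code-table matching has already turned the bit stream into a sequence of code values (the `code_values` list is hypothetical, not part of the patent):

```python
END_OF_BLOCK = 256  # DEFLATE literal/length alphabet: 0-255 are literals, 256 ends a block

def block_tails(code_values):
    """Return every position whose matched code value is 256; each such
    position marks the tail of one compressed block."""
    return [i for i, v in enumerate(code_values) if v == END_OF_BLOCK]
```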
4. The decompression method according to claim 1, wherein the dividing the data to be decompressed into N data blocks according to the block information specifically comprises:
according to the block information and a preset combination rule, combining the compressed blocks in the data to be decompressed into N data blocks;
and in the merged data blocks, the first compressed block of the (i+1)th data block is the same as the last compressed block of the ith data block, where 1 ≤ i < N.
5. The decompression method according to claim 4, wherein the block information further indicates an order of compressing the blocks; the merging rule is as follows:
merging the 1 st compression block to the Mth compression block into a data block according to the sequence of the compression blocks;
judging whether 2M is smaller than N, wherein M is a positive integer;
if yes, merging the Mth to the (2M)th compressed blocks into one data block, setting M equal to 2M, and returning to the step of judging whether 2M is smaller than N;
and if not, merging the Mth to the Nth compressed blocks into one data block.
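The merge rule of claim 5 can be written out as a small function. Note how consecutive data blocks share one compressed block (the Mth, the (2M)th, and so on), which provides the redundancy claim 4 relies on; the handling of the edge case M ≥ N is my assumption, since the claim does not spell it out.

```python
def merge_ranges(n, m):
    """Return the 1-based, inclusive (first, last) compressed-block indices
    of each merged data block, following claim 5: merge blocks 1..M, then
    repeatedly M..2M (doubling M) while 2M < N, and finally M..N."""
    ranges = [(1, m)]
    while 2 * m < n:
        ranges.append((m, 2 * m))
        m *= 2
    if m < n:  # assumption: skip the final range if block 1..M already covers everything
        ranges.append((m, n))
    return ranges
```

For N = 10 compressed blocks and M = 3, this yields [(1, 3), (3, 6), (6, 10)]: each data block begins with the previous data block's last compressed block, and the data blocks double in size until the remainder is reached.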
6. The decompression method according to claim 1, wherein the decompression process for the kth data block comprises:
if k is determined to be 1, starting decompression from the first compressed block and decompressing up to the last predetermined symbol of the last compressed block;
if it is determined that 1 < k < N, starting decompression from the last predetermined symbol of the first compressed block and decompressing up to the last predetermined symbol of the last compressed block;
and if k is determined to be N, starting decompression from the last predetermined symbol of the first compressed block and decompressing to the end of the last compressed block.
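Claim 6's three cases determine which part of each data block's decoded output a worker keeps, so that the redundant first compressed block (shared with the previous data block) is decoded for its back-references but not emitted twice. A sketch under the simplifying assumption that positions are plain offsets into the decoded output (all parameter names are hypothetical):

```python
def kept_span(k, n, first_tail, last_tail, total):
    """Half-open [start, end) range of data block k's decoded output to keep,
    for k in 1..n. first_tail / last_tail are the offsets just past the last
    predetermined (end-of-block) symbol of the first / last compressed block;
    total is the full decoded length of the data block."""
    start = 0 if k == 1 else first_tail   # skip the redundant first compressed block
    end = total if k == n else last_tail  # only the last worker decodes to the very end
    return start, end
```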
7. The decompression method according to any one of claims 1 to 6, wherein the data to be decompressed is divided into N data blocks by a distributed computing platform according to the block information, and each data block is decompressed concurrently.
8. The decompression method according to claim 7, wherein the distributed computing platform is communicatively connected to a Spark computing platform, and the distributed computing platform transmits the decompressed data of each of the data blocks to the Spark computing platform.
9. An electronic device, comprising: at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the decompression method of any one of claims 1 to 8.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the decompression method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910944737.1A CN110990358B (en) | 2019-09-30 | 2019-09-30 | Decompression method, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110990358A true CN110990358A (en) | 2020-04-10 |
CN110990358B CN110990358B (en) | 2023-06-30 |
Family
ID=70081984
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910944737.1A Active CN110990358B (en) | 2019-09-30 | 2019-09-30 | Decompression method, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110990358B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708574A (en) * | 2020-05-28 | 2020-09-25 | 中国科学院信息工程研究所 | Instruction stream compression and decompression method and device |
CN111884658A (en) * | 2020-07-09 | 2020-11-03 | 上海兆芯集成电路有限公司 | Data decompression method, data compression method and convolution operation device |
CN113868206A (en) * | 2021-10-08 | 2021-12-31 | 八十一赞科技发展(重庆)有限公司 | Data compression method, decompression method, device and storage medium |
CN114124106A (en) * | 2022-01-28 | 2022-03-01 | 苏州浪潮智能科技有限公司 | LZ4 decompression method, system, storage medium and equipment |
CN114172521A (en) * | 2022-02-08 | 2022-03-11 | 苏州浪潮智能科技有限公司 | Decompression chip verification method, device and equipment and readable storage medium |
CN118277348A (en) * | 2024-05-31 | 2024-07-02 | 天津南大通用数据技术股份有限公司 | LZO compressed file loading method and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010038642A1 (en) * | 1999-01-29 | 2001-11-08 | Interactive Silicon, Inc. | System and method for performing scalable embedded parallel data decompression |
US20020031271A1 (en) * | 2000-09-08 | 2002-03-14 | Matsushita Electric Industrial Co., Ltd. | Image processor and image processing method for decompressing progressive-coded compressed image data |
US6822589B1 (en) * | 1999-01-29 | 2004-11-23 | Quickshift, Inc. | System and method for performing scalable embedded parallel data decompression |
CN101355364A (en) * | 2008-09-08 | 2009-01-28 | 北大方正集团有限公司 | Method and apparatus for compressing and decompressing file |
US20110018745A1 (en) * | 2009-07-23 | 2011-01-27 | Kabushiki Kaisha Toshiba | Compression/decompression apparatus and compression/decompression method |
CN103428494A (en) * | 2013-08-01 | 2013-12-04 | 浙江大学 | Image sequence coding and recovering method based on cloud computing platform |
CN103581673A (en) * | 2012-08-07 | 2014-02-12 | 上海算芯微电子有限公司 | Video data compression or decompression method and system |
CN103581675A (en) * | 2012-08-07 | 2014-02-12 | 上海算芯微电子有限公司 | Video data compression or decompression method and system |
CN103997648A (en) * | 2014-06-11 | 2014-08-20 | 中国科学院自动化研究所 | System and method for achieving decompression of JPEG2000 standard images rapidly based on DSPs |
CN104753540A (en) * | 2015-03-05 | 2015-07-01 | 华为技术有限公司 | Data compression method, data decompression method and device |
CN106503165A (en) * | 2016-10-31 | 2017-03-15 | 杭州华为数字技术有限公司 | Compression, decompressing method, device and equipment |
CN107404654A (en) * | 2017-08-23 | 2017-11-28 | 郑州云海信息技术有限公司 | A kind of jpeg image decompression method, device and platform |
CN107977442A (en) * | 2017-12-08 | 2018-05-01 | 北京希嘉创智教育科技有限公司 | Journal file compresses and decompression method, electronic equipment and readable storage medium storing program for executing |
CN107977233A (en) * | 2016-10-19 | 2018-05-01 | 华为技术有限公司 | The quick loading method of kernel mirror image file and device |
Non-Patent Citations (3)
Title |
---|
SONG Gang et al.: "Research on a parallel compression algorithm based on shared memory and Gzip", Computer Engineering and Design * |
FAN Xing et al.: "Research on multi-core parallel batch compression of large-scene point-cloud files", Journal of Taiyuan University of Technology * |
HU Rihui: "Research on GPU-based J2K decompression technology", China Master's Theses Full-text Database, Information Science and Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN110990358B (en) | 2023-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110990358B (en) | Decompression method, electronic equipment and computer readable storage medium | |
US10862513B2 (en) | Data processing unit having hardware-based parallel variable-length codeword decoding | |
RU2630750C1 (en) | Device and method for encoding and decoding initial data | |
US11431351B2 (en) | Selection of data compression technique based on input characteristics | |
US8279096B2 (en) | Parallel compression for dictionary-based sequential coders | |
JP5123186B2 (en) | Remote protocol support for large object communication in any format | |
US8125364B2 (en) | Data compression/decompression method | |
CN112214462B (en) | Multi-layer decompression method for compressed file, electronic device and storage medium | |
CN108287877B (en) | FPGA (field programmable Gate array) compression/decompression system and hardware decompression method for RIB (run in Box) rendering compressed file | |
CN114337678A (en) | Data compression method, device, equipment and storage medium | |
US7889102B2 (en) | LZSS with multiple dictionaries and windows | |
KR102542239B1 (en) | Data output method, data acquisition method, device, and electronic equipment | |
CN112290953B (en) | Array encoding device and method, array decoding device and method for multi-channel data stream | |
CN105791819A (en) | Frame compression method for image and decompression method and device for image | |
CN115643310B (en) | Method, device and system for compressing data | |
CN116684595A (en) | Ultra-low-time-delay image coding system, method and device and storage medium | |
US9998745B2 (en) | Transforming video bit streams for parallel processing | |
US10931303B1 (en) | Data processing system | |
CN113890540A (en) | Parallel acceleration LZ77 decoding method and device | |
US8823557B1 (en) | Random extraction from compressed data | |
CN103929404B (en) | Method for analyzing HTTP chunked code data | |
US7999705B2 (en) | Unicode-compatible base-N range coding | |
CN118100955B (en) | Method for preprocessing compressed data by parallel decompression | |
CN118132522B (en) | Data compression device, method and chip | |
US11966597B1 (en) | Multi-domain configurable data compressor/de-compressor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||