Expedite read operations by using multiple threads #23

Closed
gwlucastrig opened this issue Jun 13, 2022 · 4 comments

Comments

@gwlucastrig
Owner

gwlucastrig commented Jun 13, 2022

This issue proposes to use a multi-threaded approach to improve the speed of reading data. It applies to files that are stored with data compression.

When GVRS reads data from a file that uses data compression, there are two cost factors:

  1. Access times for reading data from file.
  2. Processing times for decompressing the data.

It turns out that decompression is a significant contributor to access times. For example, reading the entire set of raw data from the uncompressed version of the ETOPO1 global elevation and depth data set (233 million points) requires 0.277 seconds just for file access. The compressed version requires 3.34 seconds for combined file access and decompression.

The Gridfour team is currently investigating an approach that uses multiple threads to perform the decompression operation when reading data from a file.

Recall that a GVRS file is organized in tiles. If an application accesses tiles in a random order, there’s not much that additional threads can do to expedite data access. But if the application accesses tiles in a predictable order, the GVRS library can predict the next tile that the application will require and read and decompress it ahead of time using a supporting thread.
In our initial experiments, access time for compressed ETOPO1 was reduced from 3.34 seconds to 1.88 seconds.
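
A minimal sketch of the read-ahead idea described above, assuming a purely sequential tile order. It is not the GVRS implementation: the class and method names (`ReadAheadSketch`, `sumAllTiles`, and so on) are hypothetical, and `java.util.zip` Deflate stands in for the GVRS codecs. While the calling thread consumes tile i, a single helper thread decompresses tile i+1.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Illustrative only; none of these names come from the GVRS API. */
class ReadAheadSketch {

    /** Compresses one tile with Deflate (a stand-in for a GVRS codec). */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses one tile whose uncompressed length is known. */
    static byte[] decompress(byte[] packed, int rawLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[rawLength];
        int offset = 0;
        while (offset < rawLength && !inflater.finished()) {
            int n = inflater.inflate(raw, offset, rawLength - offset);
            if (n == 0 && inflater.needsInput()) {
                break; // truncated input; return what was decoded
            }
            offset += n;
        }
        inflater.end();
        return raw;
    }

    /** Wraps the decompression of one tile as a task for the helper thread. */
    static Callable<byte[]> task(List<byte[]> packedTiles, int index, int rawLength) {
        return () -> decompress(packedTiles.get(index), rawLength);
    }

    /**
     * Visits tiles in order. Tile i+1 is decompressed on the helper thread
     * while the calling thread sums the bytes of tile i.
     */
    static long sumAllTiles(List<byte[]> packedTiles, int rawLength) throws Exception {
        ExecutorService assistant = Executors.newSingleThreadExecutor();
        try {
            long sum = 0;
            Future<byte[]> next = packedTiles.isEmpty()
                ? null : assistant.submit(task(packedTiles, 0, rawLength));
            for (int i = 0; i < packedTiles.size(); i++) {
                byte[] tile = next.get(); // blocks until tile i is ready
                if (i + 1 < packedTiles.size()) {
                    // predicted next tile: start decompressing it now
                    next = assistant.submit(task(packedTiles, i + 1, rawLength));
                }
                for (byte b : tile) {
                    sum += b; // "application" work on tile i
                }
            }
            return sum;
        } finally {
            assistant.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> packed = new ArrayList<>();
        for (int t = 0; t < 16; t++) {
            byte[] raw = new byte[64 * 1024];
            for (int i = 0; i < raw.length; i++) {
                raw[i] = (byte) ((i + t) % 11);
            }
            packed.add(compress(raw));
        }
        System.out.println("sum of all samples: " + sumAllTiles(packed, 64 * 1024));
    }
}
```

With random tile access the prediction in `sumAllTiles` would usually be wrong, and the helper thread's work would be wasted. That is why the benefit is limited to predictable access patterns.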

The GVRS API also includes an enhanced data compression technique known as LSOP that improves compression ratios but requires more processing time than the standard technique. In our experiments with the LSOP version of ETOPO1, reading time was reduced from 8.22 seconds to 4.36 seconds.

We also tested with the much larger GEBCO 2020 data set (3.7 billion points). Time to read the entire data set was reduced from 66.4 seconds to 37.2 seconds.
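
For comparison, the timings quoted above work out to speedups of roughly 1.8x to 1.9x. The short sketch below only repeats that arithmetic on the reported numbers; the class name `SpeedupSketch` is made up for illustration and is not part of Gridfour.

```java
/** Computes speedup ratios from the timings reported in this issue. */
class SpeedupSketch {

    /** Speedup is the single-threaded time divided by the multi-threaded time. */
    static double speedup(double secondsBefore, double secondsAfter) {
        return secondsBefore / secondsAfter;
    }

    public static void main(String[] args) {
        System.out.println(String.format("ETOPO1 standard: %.2fx", speedup(3.34, 1.88)));
        System.out.println(String.format("ETOPO1 LSOP:     %.2fx", speedup(8.22, 4.36)));
        System.out.println(String.format("GEBCO 2020:      %.2fx", speedup(66.4, 37.2)));
    }
}
```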

Remaining tasks for this issue include the creation of JUnit tests, code inspections, and documentation.

@gwlucastrig
Owner Author

The initial code for this issue has now been pushed to version 1.0.3-SNAPSHOT of the Gridfour code.

For an example of how to enable multi-threading, please see the GvrsReadPerformance.java demonstration code.

Again, we note that multi-threading is useful only when working with compressed data.

@gwlucastrig
Owner Author

I have pushed out JUnit tests to verify the following:

  1. The API properly closes its background thread (the "GVRS Reading Assistant") when reading compressed data. This JUnit test verifies normal, successful close operations. It also tests termination of the thread when GVRS encounters an IOException and the file is closed in a try-with-resources block.
  2. The API properly handles cases where the file features a mix of compressed and non-compressed data tiles. For example, random data is usually non-compressible. So this test verifies that the reading assistant handles non-compressed tiles successfully.
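
The first check can be illustrated with a small, self-contained sketch. It does not use the Gridfour API or reproduce MultiThreadReadTest.java: `SketchReader` and its methods are hypothetical stand-ins for a reader that owns one background thread. The point is that try-with-resources calls `close()` before any `catch` block runs, so the helper thread is stopped on both the normal path and the IOException path.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative only; these names are not part of the Gridfour API. */
class AssistantCloseSketch {

    /** Hypothetical reader that owns a single named background thread. */
    static final class SketchReader implements AutoCloseable {
        private final ExecutorService assistant =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "Sketch Reading Assistant");
                t.setDaemon(true);
                return t;
            });

        SketchReader() {
            assistant.execute(() -> { }); // forces the worker thread to start
        }

        /** Simulates a read that may fail with an IOException. */
        void read(boolean fail) throws IOException {
            if (fail) {
                throw new IOException("simulated read failure");
            }
        }

        boolean assistantTerminated() {
            return assistant.isTerminated();
        }

        /** Stops the background thread and waits briefly for it to exit. */
        @Override
        public void close() {
            assistant.shutdownNow();
            try {
                assistant.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns true if the helper thread has terminated once the try block exits. */
    static boolean terminatesAfterClose(boolean failDuringRead) {
        SketchReader reader = new SketchReader();
        try (SketchReader r = reader) {
            r.read(failDuringRead);
        } catch (IOException expected) {
            // close() has already run by the time this block executes
        }
        return reader.assistantTerminated();
    }

    public static void main(String[] args) {
        System.out.println("normal close: " + terminatesAfterClose(false));
        System.out.println("close after IOException: " + terminatesAfterClose(true));
    }
}
```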

Please see MultiThreadReadTest.java for more details.

@gwlucastrig
Owner Author

gwlucastrig commented Jun 15, 2022

The enhancements for multi-threaded reading are now complete. JUnit tests are implemented, and our wiki includes updated content describing this feature at GVRS Using Multiple Threads to Speed Processing.

The multi-threaded read implementation is part of the 1.0.3-SNAPSHOT version of the software, now available on GitHub. The full 1.0.3 release is planned for late summer 2022.

@gwlucastrig
Owner Author

This issue is now closed.
