Refactor GVRS to improve metadata and API #10
Tonight, I pushed a large number of changes to refactor elements of the GVRS API and file format as described in the text above. Unfortunately, the new API and file format break compatibility with earlier versions of GVRS (again, as described above). I believe that these items are now stable and should not have to change in the foreseeable future (barring any unanticipated setbacks). For the next couple of weeks, I plan on refining some of the internals and adding new features and JUnit tests. When these are complete, I will be treating the code as Release 1.0 of Gridfour and making the initial submission of the Gridfour API to the Maven Central Repository. However, before I make any further code changes, I will be updating the wiki to reflect the new API. Beyond that, there are a few areas that I would still like to address:
Once this work is complete, the next major undertaking will be to provide documentation on the GVRS file format. As always, I welcome any suggestions for ways to make the GVRS API more effective, efficient, or easier to understand. Thanks. Gary
One thing that may need revision is the creation of the UUID used to uniquely identify a GVRS product. The idea of a UUID is that it would provide a unique identifier for each and every GVRS "product". Currently, the UUID is established when a GvrsFileSpecification is constructed. A new GVRS file is established by calling the GvrsFile constructor and passing a file specification into it. The constructor opens a file on disk and writes header information to it. The problem with this approach is that the UUID is tied to the file-specification object, not the file object, and there is nothing preventing an application from creating multiple, different GVRS files using the same specification. Therefore, I am looking at the possibility of moving the logic that establishes a UUID into the GvrsFile constructor and taking it out of the GvrsFileSpecification class. In the current implementation, the UUID is used for coordinating a GVRS file with its associated index file. The index is a "side-car" file that can be written when a GVRS file is closed. It is used when a GVRS file is opened (for reading or writing) to load the file positions of internal content. The availability of an index can significantly reduce the time required to open a GVRS file. Since both files are part of the same group of things (i.e., the same "product"), they share a common UUID. This feature allows us to be sure that the correct index is used when an existing GVRS file is opened.
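To make the proposed change concrete, here is a minimal Java sketch in which the UUID is assigned when a file object is constructed rather than when the specification is built. The class names `FileSpec` and `GridFileSketch` and the method `indexMatches` are hypothetical stand-ins, not the GVRS API; the only real library call is `java.util.UUID.randomUUID()`.

```java
import java.util.UUID;

// Hypothetical stand-in for a file specification: it carries layout only, no identity.
final class FileSpec {
    final int nRows;
    final int nCols;

    FileSpec(int nRows, int nCols) {
        this.nRows = nRows;
        this.nCols = nCols;
    }
}

// Hypothetical stand-in for a file object: the UUID belongs to each file instance.
final class GridFileSketch {
    final FileSpec spec;
    final UUID uuid;

    GridFileSketch(FileSpec spec) {
        this.spec = spec;
        // A fresh identifier is drawn every time a new file is created,
        // so two files built from the same specification never share one.
        this.uuid = UUID.randomUUID();
    }

    /** Returns true only if a side-car index was written for this same file. */
    boolean indexMatches(UUID indexUuid) {
        return uuid.equals(indexUuid);
    }
}
```

Because each constructor call draws a new random UUID, an index written for one file is rejected if it is paired with a different file created from the same specification.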
I added the UUID to the main GVRS file header (GvrsFile.java) and took it out of the GvrsFileSpecification. I also made a modification to clean up some confusing code in the record allocation logic. Once again, these changes break compatibility with the earlier Issue 10 versions. I think I am done making changes that alter the file structure. I have a few more features to add, but I have already reserved space for them. I regret the inconvenience that these frequent changes have caused as I move toward completing the refactoring operation. One prominent change that I am thinking about is eliminating the use of a separate "index file" and moving the index into the main GVRS file body. Fortunately, I have already planned for this change and, when implemented, it will not break compatibility. I am still on track to complete this issue and submit Gridfour to the Maven Central Repository by year's end. Thank you for your patience in this matter.
I have integrated the content of the index file into the main GVRS file, thus eliminating the need for a separate "side-car" file. The original purpose of the index file was to expedite opening a large GVRS file. These functions are now integrated into the GVRS file itself, and the index file is no longer used. This change does not break backward compatibility. I am currently working on support for very large data files. At present, the maximum file size supported by the Java implementation is 32 gigabytes. This limitation is due to an incomplete implementation of the GVRS file format specification; GVRS itself supports a 64-bit address space. The changes are relatively minor and should be done in the next couple of days (the real challenge isn't the implementation, but the testing thereof). Once that change is in place, the remaining work consists of the following:
I pushed changes to support very large data files. GVRS files are no longer limited to 32 GB.
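As a worked illustration of how a 32 GB ceiling can arise, suppose (an assumption for illustration, not a description of the GVRS internals) that file positions are stored as unsigned 32-bit integers counting 8-byte-aligned blocks rather than bytes. The largest addressable position is then 2^32 × 8 = 2^35 bytes, or 32 GiB, and switching to a 64-bit position removes that bound. The class `OffsetLimitSketch` is hypothetical.

```java
// Hypothetical encoding: a file position stored as an unsigned 32-bit
// integer that counts 8-byte-aligned blocks instead of bytes.
final class OffsetLimitSketch {
    static final int ALIGNMENT = 8;

    /** Largest addressable file size, in bytes, for a 32-bit block index. */
    static long maxFileSize32() {
        return (1L << 32) * ALIGNMENT;  // 2^35 bytes = 32 GiB
    }

    /** Encodes an aligned byte offset as a 32-bit block index. */
    static int encode(long byteOffset) {
        if (byteOffset < 0 || byteOffset % ALIGNMENT != 0 || byteOffset >= maxFileSize32()) {
            throw new IllegalArgumentException("offset not representable: " + byteOffset);
        }
        // Indices at or above 2^31 appear negative as a signed int; decode() handles that.
        return (int) (byteOffset / ALIGNMENT);
    }

    /** Decodes a block index back to a byte offset, treating it as unsigned. */
    static long decode(int blockIndex) {
        return Integer.toUnsignedLong(blockIndex) * ALIGNMENT;
    }
}
```

Any offset of 32 GiB or more throws in `encode`, which is the kind of limit a 64-bit address field avoids.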
To exercise GVRS' metadata features, I am working on a demonstration application that reads elevation data from a Shuttle Radar Topography Mission (SRTM) file, creates a shaded-relief image, and stores the results as a GeoTIFF file. The TIFF tags from the source file are transcribed to GVRS metadata objects and stored with the GVRS file. They are used to compute the geographic parameters for rendering the image. Once the image is done, the demonstration application writes out a GeoTIFF file, using the metadata to format TIFF tags as appropriate. The application depends on the Apache Commons Imaging library for reading and writing TIFF files. The algorithms for the shaded-relief technique are described in "Elevation GeoTIFF Part 1 -- Shaded Relief". The image below shows a work in progress. It's a down-sampled JPEG; the actual TIFF files are quite a bit larger (3600-by-3600 pixels). I hope to have the demonstration ready for review in a couple of weeks. Incidentally, when storing the raw SRTM elevation information using GVRS' data compression, the output required 2.09 bits per sample value. SRTM data tends to compress rather well.
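For readers unfamiliar with the technique, the following is a generic Lambertian hillshade sketch: it estimates the surface normal from central differences and takes its dot product with a unit vector pointing toward the light. It is not necessarily the algorithm described in "Elevation GeoTIFF Part 1 -- Shaded Relief"; the class name `HillshadeSketch`, the north-up grid orientation, square cells, and zero-valued border cells are all assumptions.

```java
// Generic Lambertian hillshade; a sketch, not the Gridfour implementation.
final class HillshadeSketch {
    /**
     * Returns shading values in [0, 1] for interior cells of an elevation grid.
     * Rows are assumed to run north-to-south and columns west-to-east, with
     * square cells of size cellSize (same units as elevation). Borders stay 0.
     */
    static double[][] shade(double[][] z, double cellSize,
                            double azimuthDeg, double altitudeDeg) {
        int nRows = z.length;
        int nCols = z[0].length;
        double az = Math.toRadians(azimuthDeg);   // clockwise from north
        double alt = Math.toRadians(altitudeDeg); // above the horizon
        // Unit vector toward the light (x east, y north, z up).
        double lx = Math.sin(az) * Math.cos(alt);
        double ly = Math.cos(az) * Math.cos(alt);
        double lz = Math.sin(alt);

        double[][] out = new double[nRows][nCols];
        for (int r = 1; r < nRows - 1; r++) {
            for (int c = 1; c < nCols - 1; c++) {
                // Central differences; row r - 1 lies north of row r.
                double dzdx = (z[r][c + 1] - z[r][c - 1]) / (2 * cellSize);
                double dzdy = (z[r - 1][c] - z[r + 1][c]) / (2 * cellSize);
                // Dot product of the normalized normal (-dz/dx, -dz/dy, 1) with the light.
                double len = Math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0);
                double dot = (-dzdx * lx - dzdy * ly + lz) / len;
                out[r][c] = Math.max(0.0, dot); // slopes facing away from the light are dark
            }
        }
        return out;
    }
}
```

A light from the northwest (azimuth 315°, altitude 45°) is a common cartographic choice. For geographic grids such as SRTM, the east-west cell spacing in meters shrinks with latitude, so a real implementation would need separate x and y spacings.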
I didn't have access to my computer over the holidays, but as I was washing the dishes from Christmas dinner it occurred to me that there was a flaw in the file-space management logic. I have fixed the problem and am doing some testing before pushing out the change. I should have an update in the next few days. When data compression is enabled, a change to even a single data cell in a grid often changes the compressed size of the tile that contains it. So if an application is adjusting the content of an existing tile (record) in the file, it may need to allocate a bigger block of file space to store it. In such a case, the formerly occupied section of the file is added to a "free list" for future use and the data is written to a new file location. The RecordManager class takes care of all of that. The flaw occurred when the last block of free space happened to be at the end of the file but was not large enough to hold the new content. The RecordManager realized that it had to allow the file size to grow, but it didn't realize that it could re-use the space occupied by that last block. So GVRS would end up making the file larger than it actually needed to be. The whole file-allocation process is similar to the way C/C++ programs handle malloc, realloc, and free. At some point in a future release, I am going to research algorithms for malloc and see if they can be applied to GVRS. My current implementation is pretty sturdy, but I suspect it is also a bit naïve. There may be opportunities to improve performance or attain more efficient use of file space.
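The end-of-file case can be illustrated with a simple first-fit free list. This is a hypothetical sketch, not the RecordManager code: `FreeListSketch` and its methods are invented for illustration, and coalescing of adjacent free blocks is omitted. The key step is the second pass: when no free block is large enough but one ends exactly at the end of the file, the new record starts at that block's position and the file grows only by the shortfall.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical first-fit allocator illustrating the end-of-file fix.
final class FreeListSketch {
    /** A free region [pos, pos + size) of the file. */
    static final class Block {
        long pos;
        long size;

        Block(long pos, long size) {
            this.pos = pos;
            this.size = size;
        }
    }

    final List<Block> free = new ArrayList<>();
    long fileSize;

    FreeListSketch(long initialFileSize) {
        this.fileSize = initialFileSize;
    }

    /** Returns the file position at which a record of the given size should be written. */
    long allocate(long size) {
        // Pass 1: first fit, carving the record from the front of a large-enough block.
        for (Iterator<Block> it = free.iterator(); it.hasNext(); ) {
            Block b = it.next();
            if (b.size >= size) {
                long pos = b.pos;
                b.pos += size;
                b.size -= size;
                if (b.size == 0) {
                    it.remove();
                }
                return pos;
            }
        }
        // Pass 2: no fit, but a free block touching end-of-file can be extended in place.
        for (Iterator<Block> it = free.iterator(); it.hasNext(); ) {
            Block b = it.next();
            if (b.pos + b.size == fileSize) {
                it.remove();
                fileSize = b.pos + size;
                return b.pos;
            }
        }
        // Pass 3: append at the current end of the file.
        long pos = fileSize;
        fileSize += size;
        return pos;
    }

    /** Returns a previously used region to the free list. */
    void release(long pos, long size) {
        free.add(new Block(pos, size));
    }
}
```

With a 1,000-byte file whose last 100 bytes are free, allocating 150 bytes returns position 900 and grows the file to 1,050 bytes, rather than appending at 1,000 and growing it to 1,150.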
I pushed a new commit to GitHub with the changes described above. In addition to JUnit tests, I have a test procedure I run with the PackageData demo application in which I set the tile cache to a small size and enable data compression. This configuration results in the tiles being written and re-written multiple times as data is added to the file. As each row of the source data is scanned, the storage size for the tiles grows progressively larger (as empty data cells are replaced with valid data). As processing progresses, tiles are read from disk and re-written to a new location in the file. The file-space management logic reclaims their old storage locations for future use. Because the tile cache is so small, each tile is written and then re-written to disk 120 times. The test ran just fine, and I am satisfied with the behavior of the file-space management system. The code is now very close to being ready for the release of version 1.0. If you are interested, you can read more about how the tile cache operates in the PackageData demonstration application in "The Tile Cache".
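The write/re-write behavior of a small cache can be sketched with `java.util.LinkedHashMap` in access order, whose `removeEldestEntry` hook evicts the least recently used tile. This is an illustrative stand-in, not the Gridfour tile cache; `TileCacheSketch`, `writeCount`, and `writeTile` are hypothetical, and a real implementation would compress the tile and write it through the file-space manager.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical least-recently-used tile cache keyed by tile index.
final class TileCacheSketch extends LinkedHashMap<Integer, int[]> {
    private final int capacity;
    int writeCount; // number of tiles flushed to "disk" on eviction

    TileCacheSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives least-recently-used ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, int[]> eldest) {
        if (size() > capacity) {
            writeTile(eldest.getKey(), eldest.getValue());
            return true; // evict the least recently used tile
        }
        return false;
    }

    private void writeTile(int tileIndex, int[] cells) {
        // Placeholder: a real cache would compress the tile and store it in the file.
        writeCount++;
    }
}
```

Shrinking `capacity` increases the number of evictions: each eviction forces a write, and a later access to the same tile would have to read it back from the file, which is the stress the small-cache test is designed to produce.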
One of the goals of Issue 10 was to establish the final Version 1.0 of the GVRS file format before pushing out the first release. I am considering making one last change that will break compatibility with earlier files before submitting the Gridfour core library to the Maven Central Repository and making the official 1.0 release of Gridfour. Right now, the Gridfour user base is rather small (perhaps non-existent), so I think the impact of the change would be small. However, if anyone has built up a collection of GVRS files, please let me know so I can think of an alternate approach to solving the problem I wish to address. Thank you for your attention in this matter.
Such exciting changes!
I am glad that you like the idea. Thank you! The current test program still needs a lot of work before it's ready to distribute. I look forward to posting it to the Gridfour site sometime in the next couple of weeks. The current version of the test program stores a subset of the TIFF tags taken from the original TIFF file in the GVRS file. They are then transcribed into the output GeoTIFF file. By preserving the GeoTIFF tags from the original, the process ensures that the final output file is also a valid GeoTIFF file. The following lists the GeoTIFF tags that the demonstration application currently supports. I am researching ideas for including more TIFF tags.
Do you have suggestions about other TIFF tags that you think the program should be preserving?
Hi Gary,
Hi Erwan, I used the TIFF-to-GVRS-to-TIFF test program to process the DEM that you gave me a couple of months ago. I had to make some modifications to my test program because your DEM uses a projected coordinate system (Cartesian coordinates) rather than a geographic coordinate system. Testing with your file worked out well, because it uncovered some bugs in my coordinate transformation logic. Anyway, I used the process to create a TIFF file and then plotted it on Google Earth. I think the results would be compatible with any good-quality GIS system.
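For a north-up raster, the GeoTIFF ModelPixelScale and ModelTiepointTag define the mapping between raster and model coordinates. The sketch below uses the values printed for dem.tiff in this thread (25-meter cells, raster point (0, 0) tied to model point (273987.5, 2291012.5)); the class `TiepointSketch` is hypothetical and ignores rotation. Because the file uses RasterPixelIsArea, raster (0, 0) is the upper-left corner of the first pixel, so that pixel's center is at raster (0.5, 0.5).

```java
// Hypothetical north-up mapping built from ModelTiepointTag and ModelPixelScale.
final class TiepointSketch {
    final double tieI, tieJ;     // raster coordinates of the tiepoint (column, row)
    final double tieX, tieY;     // model coordinates of the tiepoint
    final double scaleX, scaleY; // ModelPixelScale values (both positive)

    TiepointSketch(double tieI, double tieJ, double tieX, double tieY,
                   double scaleX, double scaleY) {
        this.tieI = tieI;
        this.tieJ = tieJ;
        this.tieX = tieX;
        this.tieY = tieY;
        this.scaleX = scaleX;
        this.scaleY = scaleY;
    }

    /** Maps raster (column, row) to model (x, y). */
    double[] rasterToModel(double col, double row) {
        double x = tieX + (col - tieI) * scaleX;
        double y = tieY - (row - tieJ) * scaleY; // rows run downward, northing runs upward
        return new double[] {x, y};
    }

    /** Maps model (x, y) back to raster (column, row). */
    double[] modelToRaster(double x, double y) {
        double col = tieI + (x - tieX) / scaleX;
        double row = tieJ - (y - tieY) / scaleY;
        return new double[] {col, row};
    }
}
```

With the dem.tiff values, `rasterToModel(0.5, 0.5)` gives the model point (274000.0, 2291000.0), the center of the first pixel.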
I have encountered additional delays in the refactoring effort. This week, I started looking at what would be required to implement a Raster Pyramid feature. Although this feature will not be added until a future release, I wanted to be sure that I could do it without breaking compatibility with the current implementation. This effort revealed some significant limitations in the current file format. I just pushed a new version of the GVRS code to GitHub. I believe that Version 1.0 is very close to being ready for release. One other thing I will be adding to the release is support for an AffineTransform for mapping real-valued coordinates to the raster grid and vice versa. Although I have previously implemented basic scale-and-offset transforms, this feature will allow the GVRS file specification to support skewed and rotated coordinate systems. I still have to do testing, code review, and write some more JUnit tests, but I hope to release Version 1.0 next weekend (Jan 16th).
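The standard Java class `java.awt.geom.AffineTransform` can express scale, rotation, skew, and translation in one object and supplies the inverse mapping, so it is a convenient way to illustrate the idea. This is a sketch under assumptions, not the GVRS API: the helper names `gridToModel` and `modelToGrid` are hypothetical, and the grid is assumed to have square cells with rows running downward.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

// Sketch of grid/model mapping with java.awt.geom.AffineTransform.
final class AffineSketch {
    /**
     * Builds a grid-to-model transform: scale cells to model units (flipping
     * the row axis so rows run downward), rotate, then translate to the origin.
     */
    static AffineTransform gridToModel(double originX, double originY,
                                       double cellSize, double rotationDeg) {
        AffineTransform t = new AffineTransform();
        // Concatenated operations apply to points in reverse order:
        // scale first, then rotate, then translate.
        t.translate(originX, originY);
        t.rotate(Math.toRadians(rotationDeg));
        t.scale(cellSize, -cellSize);
        return t;
    }

    /** Maps a model point back to grid coordinates through the inverse transform. */
    static Point2D modelToGrid(AffineTransform forward, double x, double y) {
        try {
            return forward.inverseTransform(new Point2D.Double(x, y), null);
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("transform cannot be inverted", e);
        }
    }
}
```

With a rotation of 0° this reduces to the scale-and-offset case: `gridToModel(273987.5, 2291012.5, 25, 0)` maps grid point (0.5, 0.5) to (274000.0, 2291000.0). A nonzero angle rotates the grid axes about the grid origin, and `modelToGrid` recovers the grid coordinates.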
Hi @gwlucastrig, Erwan
The demo program for TIFF to GVRS is not ready yet, but I will make up a zip
file tonight and send it to you.
On Mon, Jan 10, 2022, 2:34 PM Bocher ***@***.*** wrote:
Hi @gwlucastrig,
I hope you are well.
I'm testing the new API starting from DemoCOG.java. In a previous message,
you talk about a TIFF-to-GVRS-to-TIFF test program but I'm not able to find
it.
Erwan
Erwan,
I've emailed you a zip file containing source code for the
TIFF-to-GVRS-to-TIFF demonstration application.
In the version that I sent you, I commented out the lines that activate
data compression. If you un-comment them, the GVRS file for your
dem.tiff shrinks from 13 megabytes down to 1 megabyte.
As a diagnostic tool, I made the demo program print out details about the
TIFF file it is processing. Here's the information from dem.tiff:
Summary of GeoTIFF Elements ----------------------------
GDAL No-Data value: -9999
GeoKey Table
key    ref    len  value/pos      name                            value
1      1      0    20             ~~~                             ~~~
1024   0      1    1              GTModelTypeGeoKey               Projected Coordinate System
1025   0      1    1              GTRasterTypeGeoKey              RasterPixelIsArea
1026   34737  22   0 (A)          GTCitationGeoKey                NTF_Lambert_II_étendu
2048   0      1    32767          GeographicTypeGeoKey            ~~~
2049   34737  35   22 (A)         GeogCitationGeoKey              GCS Name = NTF|Primem = Greenwich|
2050   0      1    6275           GeogGeodeticDatumGeoKey         See GeoTIFF specification
2054   0      1    9102           GeogAngularUnitsGeoKey          Degrees
2057   34736  1    7 (D)          GeogSemiMajorAxisGeoKey         6378249.2
2059   34736  1    6 (D)          GeogInvFlatteningGeoKey         293.46602
2061   34736  1    8 (D)          GeogPrimeMeridianLongGeoKey     See GeoTIFF specification
3072   0      1    32767          ProjectedCRSGeoKey              User-Defined Projection
3074   0      1    32767          ProjectionGeoKey                User-Defined
3075   0      1    8              ProjCoordTransGeoKey            LambertConfConic_2SP
3076   0      1    9001           ProjLinearUnitsGeoKey           Metre
3078   34736  1    2 (D)          ProjStdParallel1GeoKey          45.8989
3079   34736  1    3 (D)          ProjStdParallel2GeoKey          47.6960
3084   34736  1    1 (D)          ProjFalseOriginLongGeoKey       2.3372
3085   34736  1    0 (D)          ProjFalseOriginLatGeoKey        46.8000
3086   34736  1    4 (D)          ProjFalseOriginEastingGeoKey    600000.0000
3087   34736  1    5 (D)          ProjFalseOriginNorthingGeoKey   2200000.0000
ModelPixelScale
2.5000000000e+01 2.5000000000e+01 0.0000000000e+00
ModelTiepointTag
Pixel Model
0.0 0.0 0.0 273987.500 2291012.500 0.000
On Mon, Jan 10, 2022 at 4:23 PM Gary Lucas ***@***.*** wrote:
> The demo program for tiff to gvrs is not ready yet. But I make up a zip
> file tonight and send it to you.
Now that I am almost done with this issue, I am thinking about the next step after I release Version 1.0. I think the next thing I will do is write some more wiki pages describing "How To" use GVRS. I can only guess at what developers who are new to GVRS need to know, and any questions or suggestions that you may have will help me narrow it down. Topics I am considering:
Any other suggestions that you may have. I've put a lot of information into the Javadoc, but in places where the use of the software is not self-evident, I think some supplemental wiki pages would help.
I am pleased to announce that I have completed work on this issue. When I started it in November, I had no idea how much work it was going to be, but I am happy with pretty much every change I've made in both the code and the file format. I am treating the current state of Gridfour as Version 1.0. I believe that the file format is now stable and will not change in the near future. While the API could still be extended with additional methods and Javadoc, I believe it is now in a state where additional changes can wait until future releases. I have just pushed changes up to GitHub for Version 1.0, and I am getting ready to push Gridfour JARs out to Maven Central. Thank you for your patience in this matter.
I am working on a significant revision to the GVRS API and file-format. I plan to submit changes by the end of 2021. The changes include:
Unfortunately, these changes will be incompatible with earlier versions of GVRS. In particular, older GVRS files will be inaccessible. If you have built up a collection of GVRS files, please let me know so that we can figure out the easiest way to transition to the new format.
Ordinarily, I try to avoid breaking compatibility across revisions. But since GVRS is still in pre-alpha development, it seems like the most efficient way to move forward.