Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing xarray.DataArray to grdfilter gives incorrect results #859

Closed
seisman opened this issue Feb 11, 2021 · 29 comments · Fixed by #1224
Closed

Passing xarray.DataArray to grdfilter gives incorrect results #859

seisman opened this issue Feb 11, 2021 · 29 comments · Fixed by #1224
Labels
bug Something isn't working upstream Bug or missing feature of upstream core GMT
Milestone

Comments

@seisman
Copy link
Member

seisman commented Feb 11, 2021

Description of the problem

Here is the result from GMT CLI. From the output, we can know the min and max of the filtered grid, -6147.49072266 and 5164.06005859.

$ gmt grdfilter @earth_relief_01d_p -Fg600 -D4 -Goutput.nc
$ gmt grdinfo -C output.nc
output.nc	-180	180	-90	90	-6147.49072266	5164.06005859	1	1	360	180	1	1

This is how we do the same thing in Python:

from pygmt import grdfilter
from pygmt.datasets import load_earth_relief

# passing a grid file
result1 = grdfilter("@earth_relief_01d_p", filter="g600", distance="4")
print("result1:", result1.data.min(), result1.data.max())

# passing a xarray.DataArray
grid = load_earth_relief(resolution="01d", registration="pixel")
result2 = grdfilter(grid=grid, filter="g600", distance="4")
print("result2:", result2.data.min(), result2.data.max())

The output is:

result1: -6147.4907 5164.06
result2: -6147.4727 5164.1157

Obviously, passing xarray.DataArray to grdfilter gives incorrect results.

It's still unclear to me if this is a PyGMT bug or an upstream GMT bug.

Note: the tests in test_grdfilter.py is testing the "wrong" results. We need to update the tests after figuring out the reason.

@seisman seisman added the bug Something isn't working label Feb 11, 2021
@seisman
Copy link
Member Author

seisman commented Mar 17, 2021

@PaulWessel It looks like a GMT bug. Please see if you can fix it when you have time.

@seisman seisman added the upstream Bug or missing feature of upstream core GMT label Mar 17, 2021
@PaulWessel
Copy link
Member

To keep it as simple as possible, can you demonstrate the same mismatch with -Fg300 or even better, -Fb300 ?

@seisman
Copy link
Member Author

seisman commented Mar 17, 2021

CLI:

gmt grdfilter @earth_relief_01d_p -Fb300 -D4 -Goutput.nc
gmt grdinfo -C output.nc
output.nc	-180	180	-90	90	-6470.13720703	5266.73339844	1	1	360180	1	1

Python:

from pygmt import grdfilter
from pygmt.datasets import load_earth_relief

# passing a grid file
result1 = grdfilter("@earth_relief_01d_p", filter="b300", distance="4")
print("result1:", result1.data.min(), result1.data.max())

# passing a xarray.DataArray
grid = load_earth_relief(resolution="01d", registration="pixel")
result2 = grdfilter(grid=grid, filter="b300", distance="4")
print("result2:", result2.data.min(), result2.data.max())

the output is even worse:

result1: -6470.137 5266.7334
result2: nan nan

@PaulWessel
Copy link
Member

How would you do the equivalent of

gmt grdcut @earth_relief_01d_p -R0/10/0/10 -Gt.nc
gmt grdfilter t.grd -Fb300 -D4 -Goutput.nc

in python, and does it differ then too (trying to make as small test as possible)

@seisman
Copy link
Member Author

seisman commented Mar 17, 2021

import pygmt

grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
print(grid.data.max(), grid.data.min())

result = pygmt.grdfilter(grid, filter='b300', distance='4')
print(result.data.max(), result.data.min())

output:

821.5 -4852.0
nan nan

@seisman
Copy link
Member Author

seisman commented Mar 27, 2021

Ping @PaulWessel just in case you forget it.

@PaulWessel
Copy link
Member

Was starting to look using my new M1 laptop but first have to install miniconda and that got me this far:

bash ~/Downloads/Miniconda3-py39_4.9.2-MacOSX-x86_64.sh
WARNING:
Your operating system appears not to be 64-bit, but you are trying to
install a 64-bit version of Miniconda3.
Are sure you want to continue the installation? [yes|no]

Some googling talks about condo running in emulation mode but not sure about miniconda. I guess this is all work in progress. Not sure how to proceed. Perhaps just not do this on laptop but do it on the old Mac Pro Intel...

@PaulWessel
Copy link
Member

Hi @seisman, is this still failing after the last API fixes today/yesterday?

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

It's still failing.

@PaulWessel
Copy link
Member

Odd, I just stepped through the whole grdfilter example above (the 10x10 case) and I got the same result as CLI. Sure you rebuilt etc?

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

I think I'm using the latest GMT version (6.2.0_c673720_2021.04.05) and all three examples still fail.

What happens if you just run the examples, not in Xcode debugging.

@PaulWessel
Copy link
Member

(base) pwessel@macnut:~/GMTdev/pygmt-> python
Python 3.8.3 (default, May 19 2020, 13:54:14) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygmt
>>> 
>>> grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
>>> print(grid.data.max(), grid.data.min())
821.5 -4852.0
>>> 
>>> result = pygmt.grdfilter(grid, filter='b300', distance='4')
>>> print(result.data.max(), result.data.min())
599.0929 -4787.664

Same result with release build or xbuild.

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

This is my output:

$ python
Python 3.8.5 (default, Sep  4 2020, 02:22:02)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygmt
>>> pygmt.show_versions()
PyGMT information:
  version: v0.3.2.dev74+g9e290825
System information:
  python: 3.8.5 (default, Sep  4 2020, 02:22:02)  [Clang 10.0.0 ]
  executable: /Users/seisman/.miniconda/bin/python
  machine: macOS-10.16-x86_64-i386-64bit
Dependency information:
  numpy: 1.20.1
  pandas: 1.2.3
  xarray: 0.17.0
  netCDF4: 1.5.6
  packaging: 20.9
  ghostscript: 9.53.3
  gmt: 6.2.0_c673720_2021.04.05
GMT library information:
  binary dir: /Users/seisman/.miniconda/bin
  cores: 8
  grid layout: rows
  library path: /Users/seisman/opt/GMT-master/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/seisman/opt/GMT-master/lib/gmt/plugins
  share dir: /Users/seisman/opt/GMT-master/share
  version: 6.2.0
>>> grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
>>> print(grid.data.max(), grid.data.min())
821.5 -4852.0
>>>
>>> result = pygmt.grdfilter(grid, filter='b300', distance='4')
>>> print(result.data.max(), result.data.min())
nan nan
>>>

Can you show your output of pygmt.show_versions()?

@PaulWessel
Copy link
Member

(base) pwessel@macnut:~/GMTdev/pygmt-> python
Python 3.8.3 (default, May 19 2020, 13:54:14) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> pygmt.show_versions()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'pygmt' is not defined
>>> import pygmt
>>> pygmt.show_versions()
PyGMT information:
  version: v0.2.0+12.g56ba0832
System information:
  python: 3.8.3 (default, May 19 2020, 13:54:14)  [Clang 10.0.0 ]
  executable: /Users/pwessel/opt/miniconda3/bin/python
  machine: macOS-10.16-x86_64-i386-64bit
Dependency information:
  numpy: 1.19.2
  pandas: 1.1.2
  xarray: 0.16.1
  netCDF4: 1.5.4
  packaging: 20.4
  ghostscript: 9.53.3
  gmt: 6.2.0_2eab897-dirty_2021.04.05
GMT library information:
  binary dir: /Users/pwessel/opt/miniconda3/bin
  cores: 24
  grid layout: rows
  library path: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/gmt/plugins
  share dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/share
  version: 6.2.0
>>> grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
>>> print(grid.data.max(), grid.data.min())
821.5 -4852.0
>>> 

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

The PyGMT version doesn't make sense:

PyGMT information:
  version: v0.2.0+12.g56ba0832

Did you update your pygmt to the latest (using git pull)? And maybe reinstall PyGMT using make install?

@PaulWessel
Copy link
Member

OK, I had done git pull but not make install. Now done this morning and still works

(base) pwessel@macnut:~/GMTdev/pygmt-> python
Python 3.8.3 (default, May 19 2020, 13:54:14) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygmt
>>> 
>>> grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
>>> print(grid.data.max(), grid.data.min())
821.5 -4852.0
>>> 
>>> result = pygmt.grdfilter(grid, filter='b300', distance='4')
>>> print(result.data.max(), result.data.min())
599.0929 -4787.664
>>> pygmt.show_versions()
PyGMT information:
  version: v0.3.1.dev120+gb1f4133f
System information:
  python: 3.8.3 (default, May 19 2020, 13:54:14)  [Clang 10.0.0 ]
  executable: /Users/pwessel/opt/miniconda3/bin/python
  machine: macOS-10.16-x86_64-i386-64bit
Dependency information:
  numpy: 1.19.2
  pandas: 1.1.2
  xarray: 0.16.1
  netCDF4: 1.5.4
  packaging: 20.4
  ghostscript: 9.53.3
  gmt: 6.2.0_2eab897-dirty_2021.04.05
GMT library information:
  binary dir: /Users/pwessel/opt/miniconda3/bin
  cores: 24
  grid layout: rows
  library path: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/gmt/plugins
  share dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/share
  version: 6.2.0

I also followed this in xcode and the temporary files that pygmt makes (the grid, etc) match that of CLI.

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

Can @meghanrjones or @weiji14 check if it works for you (you need to build the latest GMT dev).

@maxrjones
Copy link
Member

Can @meghanrjones or @weiji14 check if it works for you (you need to build the latest GMT dev).

I can check it out.

@maxrjones
Copy link
Member

I'm getting the same results as @seisman. I do not understand why pygmt.show_versions shows v0.3.2 for me:

PyGMT information:
version: v0.3.2.dev78+ga4eec49d

While @PaulWessel's has 0.3.1:

PyGMT information:
version: v0.3.1.dev120+gb1f4133f

If that's not the issue, perhaps it would be worth trying to sync the dependencies since it seems that @seisman and I are using later xarray versions.

@PaulWessel
Copy link
Member

I do not understand either, but cannot see how this can be a bug in GMT if it steps through and does all the right things?

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

The version difference is due to a known limitation of setuptools_scm (#912).

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

I just created a new conda environment and install the package versions @PaulWessel is using. I still have the same problem.

@PaulWessel Your PyGMT version v0.3.1.dev120+gb1f4133f doesn't make sense.

Could you please follow the instructions below:

cd pygmt/
git checkout master
git pull
pip uninstall pygmt
make install

and then check pygmt.show_versions() again.

@PaulWessel
Copy link
Member

OK:

(base) pwessel@macnut:~/GMTdev/pygmt-> git checkout master
Switched to branch 'master'
Your branch is behind 'origin/master' by 277 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

git pull (lots of updates).

(base) pwessel@macnut:~/GMTdev/pygmt-> python
Python 3.8.3 (default, May 19 2020, 13:54:14) 
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pygmt
>>> pygmt.show_versions()
PyGMT information:
  version: v0.3.2.dev78+ga4eec49d
System information:
  python: 3.8.3 (default, May 19 2020, 13:54:14)  [Clang 10.0.0 ]
  executable: /Users/pwessel/opt/miniconda3/bin/python
  machine: macOS-10.16-x86_64-i386-64bit
Dependency information:
  numpy: 1.19.2
  pandas: 1.1.2
  xarray: 0.16.1
  netCDF4: 1.5.4
  packaging: 20.4
  ghostscript: 9.53.3
  gmt: 6.2.0_2eab897-dirty_2021.04.05
GMT library information:
  binary dir: /Users/pwessel/opt/miniconda3/bin
  cores: 24
  grid layout: rows
  library path: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/lib/gmt/plugins
  share dir: /Users/pwessel/GMTdev/gmt-dev/rbuild/gmt6/share
  version: 6.2.0
>>> grid = pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])
>>> print(grid.data.max(), grid.data.min())
821.5 -4852.0
>>> 
>>> result = pygmt.grdfilter(grid, filter='b300', distance='4')
>>> print(result.data.max(), result.data.min())
nan nan

I assume the only thing that changed here is pyGMT and that the shared GMT lib stayed the same? So perhaps this is an issue in how the gmt info call is done? Can you check the intermediate grid in /tmp?

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

Good to know that we're on the same page.

I think the old version you were using (v0.3.1.dev120+gb1f4133f) is from this branch (#517), in which I change "GMT_IN|GMT_IS_REFERENCE" to "GMT_IN".

I just switched to that PR, and it works for me. So, it fails for "GMT_IN|GMT_IS_REFERENCE", but works for "GMT_IN".

@PaulWessel
Copy link
Member

Which step is failing? the filtering or the hidden grdinfo on the temp file?

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

The result variable is:

<xarray.DataArray 'z' (lat: 10, lon: 10)>
array([[-4.78765088e+03, -4.66174365e+03, -4.44416650e+03,
        -4.20050000e+03, -3.92733325e+03, -3.78450000e+03,
        -3.39150000e+03, -1.18400000e+03, -2.15200000e+03,
        -1.02500000e+02],
       [-4.78037695e+03, -4.72881396e+03, -4.48929199e+03,
        -4.13781885e+03, -3.85460278e+03, -3.33823193e+03,
        -2.30468335e+03, -2.24452539e+03, -1.38591980e+03,
        -9.59372192e+02],
       [-4.70636475e+03, -4.46724658e+03, -4.18454590e+03,
        -3.98674048e+03, -3.52549683e+03, -2.73720386e+03,
        -2.43260181e+03, -1.82589233e+03, -1.83602002e+03,
        -9.43485229e+02],
       [-4.28431592e+03, -4.22724170e+03, -4.09034619e+03,
        -3.59750879e+03, -3.00039795e+03, -2.11359961e+03,
        -1.38362561e+03, -1.34146802e+03, -9.78972961e+02,
        -2.18933426e+02],
       [-4.11023340e+03, -3.97998047e+03, -3.68851514e+03,
        -3.52370239e+03, -2.20820093e+03, -7.08126648e+02,
        -3.55464844e+02, -2.41948364e+02,  1.33764954e+02,
         1.62218369e+02],
       [-1.89801746e+03, -1.89643042e+03, -1.91765100e+03,
        -1.19407947e+03, -1.86361618e+02, -1.18267006e+02,
         6.83262253e+01,  1.39698380e+02,  1.37670410e+02,
         3.06902313e+02],
       [ 7.62500000e+01,  5.18333321e+01,  9.16666698e+00,
         2.50000000e+00, -1.25197708e+02,  6.48353195e+01,
         1.16186203e+02,  1.30855728e+02,  2.55223892e+02,
         2.97338562e+02],
       [ 1.07000000e+02,  4.55000000e+01,  3.00000000e+00,
        -2.10000000e+01,  2.55000000e+01,  8.75000000e+01,
         1.39000000e+02,  2.14500000e+02,  6.70000000e+01,
         5.22000000e+02],
       [            nan,             nan,             nan,
                    nan,             nan,             nan,
                    nan,             nan,             nan,
                    nan],
       [            nan,             nan,             nan,
                    nan,             nan,             nan,
                    nan,             nan,             nan,
                    nan]], dtype=float32)
Coordinates:
  * lon      (lon) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
  * lat      (lat) float64 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
Attributes:
    long_name:     z
    actual_range:  [-4787.65087891   522.        ]

@PaulWessel
Copy link
Member

OK, and this is with GMT_IN, and with REFERENCE it works OK? Not sure why grdinfo would return NaN in the limits since the grid clearly has non-NaN values.

@seisman
Copy link
Member Author

seisman commented Apr 6, 2021

No, PyGMT is using GMT_IN|GMT_IS_REFERENCE, and it fails. If I change it to GMT_IN, it works.

The min and max values are not obtained from the grdinfo call. So any NaN values in data can cause:

>>> print(result.data.max(), result.data.min())
nan nan

@PaulWessel
Copy link
Member

Sorry, I wrote it backwards but meant what you wrote above (!).
I can look at grdfilter and see if there are places where we change any input grid value (thus requiring GMT_IN) or if it ought to work with reference (not changing any values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working upstream Bug or missing feature of upstream core GMT
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants