
Issues with OpenDAP #253

Closed
joa-quim opened this issue Mar 18, 2024 · 11 comments

Comments

@joa-quim

Hi Alexander,

I'm having errors when trying to reproduce the example in "Data from NASA EarthData". On Windows, it starts well:

julia> url = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> ds = NCDataset(url)
Dataset: https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1
Group: /

Dimensions
   lat = 17999
   lon = 36000
   time = 1

Variables
  analysed_sst   (36000 × 17999 × 1)
    Datatype:    Union{Missing, Float64} (Int16)
    Dimensions:  lon × lat × time

...

julia> lonr=(-15,-14); latr=(35,36);

julia> ds_subset = NCDatasets.@select(ds["analysed_sst"], $lonr[1] <= lon <= $lonr[2] && $latr[1] <= lat <= $latr[2])
analysed_sst (101 × 101 × 1)
  Datatype:    Union{Missing, Float64}
  Dimensions:  lon × lat × time
  Attributes:

It even loads lon and lat fine:

julia> lon = ds_subset["lon"][:];

julia> lat = ds_subset["lat"][:];

julia> time = ds_subset["time"][1]
2019-01-01T09:00:00

But it stalls for more than an hour at

SST = ncvar[:,:,1];

I tried the ncdump example in WSL and immediately got this error:

ncdump -h "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1"
syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: HTTP^ Basic: Access denied.
ncdump: https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1: NetCDF: Access failure

This is the same error that I got, after more than an hour, when trying to reproduce what is done by this Matlab function (I checked that it works in Matlab).

Joaquim

@Alexander-Barth
Owner

I cannot reproduce the error on Linux:

julia> ncvar_subset = NCDatasets.@select(ds["analysed_sst"], $lonr[1] <= lon <= $lonr[2] && $latr[1] <= lat <= $latr[2]);

julia> ds_subset = NCDatasets.@select(
           ds["analysed_sst"],
           $lonr[1] <= lon <= $lonr[2] && $latr[1] <= lat <= $latr[2])
analysed_sst (101 × 101 × 1)
  Datatype:    Union{Missing, Float64}
  Dimensions:  lon × lat × time
  Attributes:
   long_name            = analysed sea surface temperature
   standard_name        = sea_surface_foundation_temperature
   units                = kelvin
   _FillValue           = -32768
   add_offset           = 298.15
   scale_factor         = 0.001
   valid_min            = -32767
   valid_max            = 32767
   comment              = \"Final\" version using Multi-Resolution Variational Analysis (MRVA) method for interpolation
   coordinates          = lon lat
   source               = MODIS_T-JPL, MODIS_A-JPL, AMSR2-REMSS, AVHRR19_G-NAVO, AVHRRMTA_G-NAVO, iQUAM-NOAA/NESDIS, Ice_Conc-OSISAF
   origname             = analysed_sst
   fullnamepath         = /analysed_sst


julia> ncvar = ds_subset["analysed_sst"];

julia> SST = ncvar[:,:,1];

julia> 

julia> size(SST)
(101, 101)

julia> typeof(SST)
Matrix{Union{Missing, Float64}} (alias for Array{Union{Missing, Float64}, 2})

julia> SST[1,1]
291.138

As you also get the error independently of NCDatasets, with ncdump, I am wondering if the error could be related to the configuration of the credentials (which might be different on Windows than on Linux).

Can you confirm that your ncvar is the subset (not the whole array, which is huge: 36000 × 17999)?
What is the version of ncdump?

@joa-quim
Author

joa-quim commented Mar 18, 2024

Yes, the ncvar is a small array:

julia> ncvar = ds_subset["analysed_sst"]
analysed_sst (101 × 101 × 1)
  Datatype:    Union{Missing, Float64}
  Dimensions:  lon × lat × time

and the WSL ncdump is ... well, Ubuntu's. Lucky that it's from this decade.

netcdf library version 4.8.1 of Sep 29 2021 09:36:14 $

I thought about the credentials problem, and that's why I tried WSL, where setting them seems more straightforward. But it can't be that, because one of my attempts used https://www.ncei.noaa.gov/thredds-ocean/dodsC/ghrsst/L4/GLOB/JPL/MUR/2023/090/20230331090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
(from the Matlab function that I linked above), and this one does not require an account.

I also tried NetCDF.jl, and with it both sites are accessible. So my credentials settings on Windows are working too.

julia> using NetCDF

julia> url = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> SST = ncread(url,"analysed_sst", start=[16500,12500,1], count = [101,101,1])
101×101×1 Array{Int16, 3}:
[:, :, 1] =
 -7012  -7001  -6992  -6982  -6971  -6962  -6955  -6951  -6950  -6954  …  -8060  -8093  -8116  -8130  -8138  -8145  -8158  -8166  -8163

BTW, my last attempt with NCDatasets ended with this (after ~1 hour):

julia> SST = ncvar[:,:,1];
Error:curl error: Timeout was reached
curl error details:
Warning:oc_open: Could not read url
ERROR: NetCDF error: NetCDF: I/O failure (NetCDF error code: -68)

@Alexander-Barth
Owner

Which versions of NCDatasets, NetCDF_jll and DiskArrays are you using?

]status --manifest NCDatasets NetCDF_jll  DiskArrays

@joa-quim
Author

Here it is:

(@v1.10) pkg> status --manifest NCDatasets NetCDF_jll  DiskArrays
Status `C:\Users\j\.julia\environments\v1.10\Manifest.toml`
⌅ [3c3547ce] DiskArrays v0.3.23
⌃ [85f8d34a] NCDatasets v0.13.2
⌅ [7243133f] NetCDF_jll v400.902.5+1

@Alexander-Barth
Owner

Can you upgrade? I have this:

(@v1.10) pkg> status --manifest NCDatasets NetCDF_jll  DiskArrays
Status `~/.julia/environments/v1.10/Manifest.toml`
  [3c3547ce] DiskArrays v0.3.22
  [85f8d34a] NCDatasets v0.14.3 `~/.julia/dev/NCDatasets`
⌃ [7243133f] NetCDF_jll v400.902.5+1
Info Packages marked with ⌃ have new versions available and may be upgradable.


julia> using NCDatasets

julia> url = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> ds = NCDataset(url);

julia> SST = ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];

julia> SST = @time ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];
  3.473732 seconds (143 allocations: 115.523 KiB)

julia> ENV["JULIA_DEBUG"] = "NCDatasets"
"NCDatasets"


julia> SST = @time ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];
┌ Debug: nc_get_vars!: [0, 12499, 16499],[1, 101, 101],[1, 1, 1]    # <<------ should not be nc_get_var1 !!!
└ @ NCDatasets ~/.julia/dev/NCDatasets/src/netcdf_c.jl:1022
  2.675656 seconds (621.25 k allocations: 43.754 MiB, 2.90% gc time, 11.37% compilation time)

@joa-quim
Author

OK, I will do this by the end of the afternoon (I have classes now), but the problem cannot be in NetCDF_jll because the download worked with NetCDF.jl.

I tried an update but nothing changed. It's weird because it says:

Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading.

and NCDatasets is marked with ⌃, not ⌅

⌃ [85f8d34a] NCDatasets v0.13.2

@Alexander-Barth
Owner

You can upgrade with ]add NCDatasets@0.14.3.
Please paste the output of ]status to confirm.

@joa-quim
Author

Rasters.jl is holding back the update.

(@v1.10) pkg> add NCDatasets@0.14.3
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package NCDatasets [85f8d34a]:
 NCDatasets [85f8d34a] log:
 ├─possible versions are: 0.3.0-0.14.3 or uninstalled
 ├─restricted to versions 0.14.3 by an explicit requirement, leaving only versions: 0.14.3
 └─restricted by compatibility requirements with Rasters [a3a2b9e3] to versions: 0.10.0-0.13.2 or uninstalled — no versions left
   └─Rasters [a3a2b9e3] log:
     ├─possible versions are: 0.1.0-0.10.1 or uninstalled
     └─restricted to versions * by an explicit requirement, leaving only versions: 0.1.0-0.10.1

To tell you a secret, I can't stand the environments complication, but I made an effort this time and created one.
But no luck either:

(v) pkg> status --manifest NCDatasets NetCDF_jll  DiskArrays
Status `C:\v\Manifest.toml`
⌅ [3c3547ce] DiskArrays v0.3.23
  [85f8d34a] NCDatasets v0.14.3
  [7243133f] NetCDF_jll v400.902.209+0
url = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

ds = NCDataset(url);

SST = ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1]
oc_open: server error retrieving url: code=? message="Error {
    code = 500;
    message = "CurlUtils::process_get_redirect_http_status() - ERROR -  I tried 3 times to access:
    https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
I was expecting to receive an HTTP redirect code and location header in the response.
Unfortunately this did not happen.
Here are the details of the most recent transaction:

# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
HTTP Response Details
The remote service returned an HTTP status of: 502
Response Headers -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  Content-Type: application/json
  Content-Length: 36
  Connection: keep-alive
  Date: Tue, 19 Mar 2024 19:33:14 GMT
  x-amzn-RequestId: 0de8664e-5751-4a8d-83c9-df9f19fe9196
  x-amzn-Remapped-x-amzn-RequestId: 0b7321af-0bc0-4e27-bedb-952f04ab2e42
  X-XSS-Protection: 1; mode=block
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  x-amzn-Remapped-Content-Length: 36
  X-Frame-Options: SAMEORIGIN
  x-amzn-ErrorType: InternalServerErrorException
  x-amzn-Remapped-Connection: keep-alive
  x-amz-apigw-id: U5FCMFpQPHcEFcg=
  x-amzn-Remapped-Server: Server
  X-Content-Type-Options: nosniff
  X-Amzn-Trace-Id: Root=1-65f9e874-7a33e3406b97ba89704fb337;Parent=0a68c6a0f7700579;Sampled=0;lineage=96eb4450:0
  x-amzn-Remapped-X-Forwarded-For: 54.190.110.130, 15.158.54.83
  x-amzn-Remapped-Date: Tue, 19 Mar 2024 19:33:14 GMT
  X-Cache: Error from cloudfront
  Via: 1.1 182d3a3dbb6658c964ee75cd45a42242.cloudfront.net (CloudFront)
  X-Amz-Cf-Pop: HIO52-P2
  X-Amz-Cf-Id: QGsUuWHhluWpxKjHpUK2MXhnFfDn4aKyf9su8dHDOis-AOMtR4mifQ==
# BEGIN Response Body -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
{"message": "Internal server error"}
# END Response Body   -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --";
}"
ERROR: NetCDF error: NetCDF: file not found (NetCDF error code: -90)

@Alexander-Barth
Owner

Alexander-Barth commented Mar 20, 2024

"Internal server error" typically indicates an error on the server (not on the client/Julia side).
In any case, I can download the data without problem (also with the current NetCDF_jll v400.902.209+0) on Linux.
The server problem is probably resolved by now.

(v1.10) pkg> status --manifest NCDatasets NetCDF_jll  DiskArrays
Status `/mnt/data1/abarth/.julia/environments/v1.10/Manifest.toml`
⌅ [3c3547ce] DiskArrays v0.3.23
  [85f8d34a] NCDatasets v0.14.3 `~/.julia/dev/NCDatasets`
  [7243133f] NetCDF_jll v400.902.209+0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`

julia> using NCDatasets

julia> url = "https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/GHRSST%20Level%204%20MUR%20Global%20Foundation%20Sea%20Surface%20Temperature%20Analysis%20(v4.1)/granules/20190101090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> ds = NCDataset(url);

julia> SST = ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];

julia> SST = @time ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];
  5.381840 seconds (143 allocations: 115.523 KiB)

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Without the .netrc file, I get an "Access denied" error:

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: HTTP^ Basic: Access denied.

So the credentials in this file are actually being used.
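For anyone hitting the same "Access denied" error, here is a minimal ~/.netrc sketch for NASA Earthdata credentials. The hostname is the standard Earthdata login server; the login and password values are placeholders you must replace with your own:

```
machine urs.earthdata.nasa.gov
    login YOUR_EARTHDATA_USERNAME
    password YOUR_EARTHDATA_PASSWORD
```

On Linux, the file should be readable only by you (chmod 600 ~/.netrc). On Windows, libcurl-based tools often expect the file to be named _netrc in %USERPROFILE% instead, which may be one source of the Windows/Linux differences discussed in this thread.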

@joa-quim
Author

Yes, it's working now for me too.

julia> SST = @time ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];
  3.787950 seconds (143 allocations: 115.523 KiB)

But on a related note, that site/OpenDAP is a strange thing. If I try to access a more recent dataset, it errors complaining about a wrong DAP version:

julia> url = "https://opendap.earthdata.nasa.gov/collections/C1996881146-POCLOUD/granules/20231202090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> ds = NCDataset(url);
syntax error, unexpected $end, expecting ';'
...
If you are using a specific DAP client like pyDAP or Panoply you may be able to signal the tool to use DAP4 by changing the protocol of the dataset_url from https:// to dap4:https://

So I tried that, and it apparently half works.

julia> url = "dap4:https://opendap.earthdata.nasa.gov/collections/C1996881146-POCLOUD/granules/20231202090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1";

julia> ds = NCDataset(url)

Dimensions
   time = 1
   lat = 17999
   lon = 36000

Variables
  time   (1)
    Datatype:    Dates.DateTime (Int32)
    Dimensions:  time
    Attributes:
     long_name            = reference time of sst field
     standard_name        = time
     axis                 = T
     units                = seconds since 1981-01-01 00:00:00 UTC
     comment              = Nominal time of analyzed fields
...

But it stalls when trying to retrieve the data (same thing happens with NetCDF.jl)

julia> SST = @time ds["analysed_sst"][16500 .+ (0:100),12500 .+ (0:100),1];
Error:curl error: Timeout was reached

@joa-quim
Author

Guess this is over.
