Multi-model statistics on changed standard names through CMIP projects #1985

Open
bettina-gier opened this issue Mar 22, 2023 · 9 comments
Labels
bear at a dinner party (Something very unexpected), bug (Something isn't working)

Comments

@bettina-gier
Contributor

Describe the bug
I'm a walking bugfinder right now =D
I wanted to apply the multi_model_statistics preprocessor to gpp obs data for a "multi-obs mean", but ran into an error about the standard name:

ValueError: Multi-model statistics failed to merge input cubes into a single array:
0: gross_primary_productivity_of_carbon / (kg m-2 s-1) (time: 72; latitude: 90; longitude: 180)
1: gross_primary_productivity_of_biomass_expressed_as_carbon / (kg m-2 s-1) (time: 72; latitude: 90; longitude: 180)
  cube.standard_name differs: 'gross_primary_productivity_of_carbon' != 'gross_primary_productivity_of_biomass_expressed_as_carbon'
  cube.long_name differs: 'Carbon Mass Flux out of Atmosphere due to Gross Primary Production on Land' != 'Carbon Mass Flux out of Atmosphere Due to Gross Primary Production on Land [kgC m-2 s-1]'

Turns out the standard name of gpp is different in CMIP5 and CMIP6.
The CMIP5 standard name is considered an official alias according to the CF conventions, so it isn't wrong.

This occurs because two of the datasets are OBS while another is OBS6. As the OBS6 dataset isn't added yet, I've made a small example recipe just using CMIP5 and CMIP6 data to reproduce the error:

Example recipe
# ESMValTool
---
documentation:

  title: test
  description: test
  authors:
    - gier_bettina

preprocessors:

  test:
    regrid:
      target_grid: 2x2
      scheme: linear
    multi_model_statistics:
      span: overlap
      statistics: [mean]

diagnostics:

  diag_test:
    variables:
      gpp:
        preprocessor: test
        mip: Lmon
        project: CMIP6
        exp: historical
        ensemble: r1i1p1f1
        grid: gn
        start_year: 2000
        end_year: 2005
        additional_datasets:
          - {dataset: ACCESS-ESM1-5}
          - {dataset: BNU-ESM, project: CMIP5, ensemble: r1i1p1}
    scripts: null

I assume this is more of an iris issue and I guess we should forward it to them to ask if they will allow the use of different standard_names on cube merges if they are official aliases @ESMValGroup/technical-lead-development-team? @zklaus probably interesting for you as our standards person!

In my current case I was using the multi_model_statistics in the diagnostic so I did a quick and dirty workaround, just wanted to raise this issue to make it known!
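For readers hitting the same error, a workaround of this kind could look roughly like the sketch below. This is an assumption about the general approach, not the code actually used in the diagnostic: `harmonize_names` is a hypothetical helper that copies `standard_name` and `long_name` from one reference cube onto the others before they are merged. It is only reasonable when the names are known to describe the same quantity in the same units, as with official aliases. Stand-in objects are used so the sketch runs without iris.

```python
from types import SimpleNamespace

NEW_NAME = "gross_primary_productivity_of_biomass_expressed_as_carbon"
OLD_NAME = "gross_primary_productivity_of_carbon"


def harmonize_names(cubes, reference_index=0):
    """Copy standard_name and long_name from a reference cube to all cubes.

    Hypothetical helper, not part of ESMValTool or iris. It overwrites
    metadata blindly, so only use it for names known to be equivalent.
    """
    reference = cubes[reference_index]
    for cube in cubes:
        cube.standard_name = reference.standard_name
        cube.long_name = reference.long_name
    return cubes


# Stand-ins for iris cubes (assumption: real cubes expose the same two
# writable attributes). Names are taken from the error message above.
cubes = [
    SimpleNamespace(
        standard_name=NEW_NAME,
        long_name="Carbon Mass Flux out of Atmosphere Due to Gross "
                  "Primary Production on Land [kgC m-2 s-1]",
    ),
    SimpleNamespace(
        standard_name=OLD_NAME,
        long_name="Carbon Mass Flux out of Atmosphere due to Gross "
                  "Primary Production on Land",
    ),
]
harmonize_names(cubes)
```

After the call, both objects carry the metadata of the first one, which is what the merge step needs in order to treat them as the same variable.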

@bettina-gier added the labels bug (Something isn't working) and bear at a dinner party (Something very unexpected) on Mar 22, 2023
@zklaus

zklaus commented Mar 23, 2023

Thanks for the report, @bettina-gier! If we defer this to Iris, the only solution I can see is to go with the standard name that is not an alias, in case several standard names appear. But I think we should think about a solution within ESMValTool as well. After all, this is not only about the standard name, but also about the long name and, more generally, about changed variables from different sources.

Perhaps we can have a "target project" (the project of the first dataset, the project of the majority of datasets, a user specified option, ...) and standardize all data to that? I'll also feed this upstream into the CMIP7 Task Teams for the next generation data request.
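To make the three options concrete, here is a minimal sketch of how a target project could be picked. The function name `choose_target_project` and the strategy labels are assumptions made for illustration; nothing like this exists in ESMValTool as far as this thread shows.

```python
from collections import Counter


def choose_target_project(projects, strategy="first", user_choice=None):
    """Pick the project whose metadata all datasets are standardized to.

    Hypothetical helper. `projects` lists the project of each dataset in
    recipe order, e.g. ["CMIP6", "CMIP5"].
    """
    if not projects:
        raise ValueError("No projects given")
    if strategy == "first":
        return projects[0]
    if strategy == "majority":
        # On a tie, the project that appears first in the list wins.
        return Counter(projects).most_common(1)[0][0]
    if strategy == "user":
        if user_choice is None:
            raise ValueError("strategy='user' requires user_choice")
        return user_choice
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Projects of the two datasets in the example recipe above.
recipe_projects = ["CMIP6", "CMIP5"]
by_first = choose_target_project(recipe_projects, "first")
by_majority = choose_target_project(recipe_projects, "majority")
by_user = choose_target_project(recipe_projects, "user", user_choice="CMIP5")
```

With only two datasets the majority rule falls back to the first one, which suggests a user-specified option would be needed as an override in any case.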

@valeriupredoi
Contributor

thanks a lot @bettina-gier and @zklaus 🍺 Tina, I am a bit confused since you mention OBS and OBS6 data, but your example recipe and the issue itself describe a CMIP5 vs CMIP6 problem. If it's OBS-related, then I wouldn't expect that to be an issue, since we use the same standards to cmorize OBS data (I believe it is CMIP6 standards, but I might be wrong). If it's a CMIP-data-related issue, then we have fixes for that, and by that I mean the on-the-fly CMIP5-vs-CMIP6 dictionary lookup and fixing. Either way, I think it's something we can repair.

> I'll also feed this upstream into the CMIP7 Task Teams for the next generation data request.

@zklaus that is indeed the root of our issues, please shake them all into (one standard) shape, holler back if you need muscle for that 😁

@bettina-gier
Contributor Author

> thanks a lot @bettina-gier and @zklaus 🍺 Tina, I am a bit confused since you mention OBS and OBS6 data, but your example recipe and the issue itself describe a CMIP5 vs CMIP6 problem. If it's OBS-related, then I wouldn't expect that to be an issue, since we use the same standards to cmorize OBS data (I believe it is CMIP6 standards, but I might be wrong). If it's a CMIP-data-related issue, then we have fixes for that, and by that I mean the on-the-fly CMIP5-vs-CMIP6 dictionary lookup and fixing. Either way, I think it's something we can repair.

The currently available datasets for gpp are in the OBS project, which uses the CMIP5 standard name, while the new data that @schlunma and I added is in the OBS6 project, which uses the CMIP6 CMOR tables and thus the CMIP6 standard name. I just used CMIP5 + CMIP6 model data in my example recipe because our OBS6 dataset isn't in the tool yet and the issue is the same. Sure, we could convert all current OBS data into OBS6 data, but we'd still run into this problem whenever we want to merge gpp cubes from the different CMIP projects.

@valeriupredoi
Contributor

gotcha! @remi-kazeroni needs a summon here - my take would be to have the CMIP5-standards OBS data converted to CMIP6 standards. As for this failing in the CMIP5/CMIP6 case, I'm a bit surprised, since we have that mapping dict in place and the fix should be applied on the fly - maybe we need to add this variable there, so @sloosvel needs a summon here too 😁

@bettina-gier
Contributor Author

bettina-gier commented Mar 23, 2023

As far as I recall, our current mapping covers variables whose short names changed while the standard names stayed the same. What I found is the opposite: a changed standard name with the same short name. So we might need to extend our mapping dict? I haven't checked which other variables are affected by this.
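Checking which variables are affected could be done along these lines. This is only a sketch: the two tables are tiny hand-written excerpts (assumption: real code would read the short-name to standard-name pairs from the CMIP5 and CMIP6 CMOR tables), and `changed_standard_names` is a hypothetical helper.

```python
def changed_standard_names(table_a, table_b):
    """Return {short_name: (name_a, name_b)} where standard names differ.

    Only short names present in both tables are compared.
    """
    shared = sorted(table_a.keys() & table_b.keys())
    return {
        short_name: (table_a[short_name], table_b[short_name])
        for short_name in shared
        if table_a[short_name] != table_b[short_name]
    }


# Hand-written excerpts for illustration, not loaded from real CMOR tables.
cmip5_names = {
    "gpp": "gross_primary_productivity_of_carbon",
    "tas": "air_temperature",
}
cmip6_names = {
    "gpp": "gross_primary_productivity_of_biomass_expressed_as_carbon",
    "tas": "air_temperature",
}

affected = changed_standard_names(cmip5_names, cmip6_names)
```

Here `affected` contains only `gpp`; running the same comparison over the full tables would give the list of entries the mapping dict would need.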

@valeriupredoi
Contributor

maybe we need everyone 😁

@bouweandela
Member

bouweandela commented Mar 28, 2023

Technically, aligning the names of variables and coordinates to a single project could be implemented as a preprocessor function. In the recipe, you could then specify the name of the target dataset (or project?) as an argument to the function. We would need to take some care to do this safely though, i.e. not allow renaming just anything to anything.
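A rough sketch of what the "safe" part could mean: only rename when the old and new names are a known pair, and fail loudly otherwise. Everything here is an assumption for illustration, including the function name `align_standard_name` and the `KNOWN_ALIASES` table, which holds just the gpp pair reported in this issue. It is not an existing ESMValCore preprocessor.

```python
from types import SimpleNamespace

# Allowlist of old name -> current name pairs (assumption: in practice this
# would be derived from the CF standard name table's alias entries).
KNOWN_ALIASES = {
    "gross_primary_productivity_of_carbon":
        "gross_primary_productivity_of_biomass_expressed_as_carbon",
}


def align_standard_name(cube, target_standard_name):
    """Rename cube.standard_name, but only between known alias pairs."""
    current = cube.standard_name
    if current == target_standard_name:
        return cube
    is_known_pair = (
        KNOWN_ALIASES.get(current) == target_standard_name
        or KNOWN_ALIASES.get(target_standard_name) == current
    )
    if not is_known_pair:
        raise ValueError(
            f"Refusing to rename {current!r} to {target_standard_name!r}: "
            "not a known alias pair"
        )
    cube.standard_name = target_standard_name
    return cube


# Stand-in for an iris cube so the sketch runs without iris.
cube = SimpleNamespace(standard_name="gross_primary_productivity_of_carbon")
align_standard_name(
    cube, "gross_primary_productivity_of_biomass_expressed_as_carbon"
)
```

Asking for a target such as `air_temperature` on the same cube raises a `ValueError` instead of silently relabelling the data.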

@LisaBock
Contributor

LisaBock commented May 2, 2024

While trying to finalize the PR with the diagnostics from my last paper, I ran into the same problem.

ValueError: Multi-model statistics failed to merge input cubes into a single array:
cube.standard_name differs: 'atmosphere_cloud_ice_content' != 'atmosphere_mass_content_of_cloud_ice'

For the clivi variable, the standard_names again differ between CMIP5 and CMIP6, and I used models from both projects for my study.

Is someone working on a solution right now? Theoretically I promised to provide the code as soon as possible in the ESMValTool.

@schlunma
Contributor

schlunma commented May 2, 2024

There's an open PR in iris about handling standard name aliases: SciTools/iris#5313. This could be useful for us here.
