Entity names #19
Comments
Usually, variables are gases, in which case it would make sense to use the same capitalization as openscm (e.g. CO2). What other entities are there? Gas baskets like F-gases and population, right? |
There will be a lot of economic variables (different GDP variants, etc.) |
I agree on the variables available in openscm. But openscm doesn't have the baskets, right? |
Nope, no baskets in openscm. |
Is there some other standard (maybe from the IIASA universe) that we can follow? If not, we have to write one ourselves, but it would be less work if there is already something. (-: |
I don't know for sure, but I think the IIASA databases have some standard, though I doubt it's described anywhere. |
From pyam, I found this "standard": https://data.ene.iiasa.ac.at/database/ FAOstat has entity lists, but they use codes instead of shorter names, which I think is pretty user-hostile (hey there, here you got data for Maybe we should make our own list? If so, then what should our rules be? Use normal English capitalization rules, so that we end up with |
I think that if we use population, kyotoghg, F-gases, I would be for KyotoGHG (even though it is not great...). I would also vote against using codes as in FAOstat. |
Would it make sense to have a list with normal English capitalization rules, but then convert it to uppercase for internal use, so that errors due to wrong capitalization do not lead to a program breakdown? |
The World Bank also has an entity list, but I don't know if we want to use it, e.g. @AnnGuenther: Do you have a reason for KyotoGHG? Just because Kyoto is a name and GHG is an abbreviation and therefore English capitalization rules yield KyotoGHG, or for another reason? |
No other reason, just the ones you listed. |
I don't really like silently correcting e.g. capitalization. A |
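The two options raised in this thread (converting capitalization silently versus rejecting a wrong spelling) can be compared in a minimal Python sketch. Everything named here is an assumption for illustration: `CANONICAL_ENTITIES`, `normalize_entity`, and `strict_entity` are hypothetical and not part of primap2 or openscm, and the entity list is not an agreed standard.

```python
# Hypothetical canonical list for illustration; not an agreed standard.
CANONICAL_ENTITIES = ["CO2", "CH4", "KyotoGHG", "F-gases", "population"]

# Upper-case key -> canonical spelling, used for case-insensitive matching.
_BY_UPPER = {name.upper(): name for name in CANONICAL_ENTITIES}


def normalize_entity(name: str) -> str:
    """Option 1: silently map any capitalization onto the canonical spelling."""
    try:
        return _BY_UPPER[name.upper()]
    except KeyError:
        raise ValueError(f"Unknown entity: {name!r}") from None


def strict_entity(name: str) -> str:
    """Option 2: accept only the canonical spelling, but hint at near misses."""
    if name in CANONICAL_ENTITIES:
        return name
    suggestion = _BY_UPPER.get(name.upper())
    if suggestion is not None:
        raise ValueError(f"Unknown entity {name!r}; did you mean {suggestion!r}?")
    raise ValueError(f"Unknown entity: {name!r}")
```

With the strict variant, a typo such as `co2` still stops the program, but the error message names the canonical spelling, so the fix is obvious to the user.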
I've started adding entity names to a terminology over in climate_categories. So far, there are only emission rates of "gases" from openscm_units ("gases" is wrong here, because e.g. black carbon is not a gas, but what other word is more correct here?), but maybe you can have a look at whether the level of detail and general idea seem good. It's all bundled in a PR: pik-primap/climate_categories#1 The definition is here: https://github.com/pik-primap/climate_categories/pull/1/files#diff-c28a5ab1cbffcb57d64c46d658e69f373450cce100d9a8f70c72b89648a45f16 Maybe we can continue the discussion in the pull request. |
(climate) forcers or drivers instead of gases? |
Did you take a look at the https://github.com/openENTRANCE/nomenclature project? Not sure whether it's as wordy as the @danielhuppmann is pretty keen on interop so he might have ideas. |
Thanks for looping me in @rgieseke - had a look at the discussion so far and the referenced PR. Not sure whether I understand the objective here, but two (more concrete) references to related work.
|
Hi, thanks for chiming in! Information for context: There are two things happening in primap2 land at the moment:
I had a look at the openENTRANCE/nomenclature project before embarking on building a separate package, but as far as I could tell from the available documentation, the goal is different there. E.g., there is no hierarchy of IPCC1996 and IPCC2006 categories and I also would not be sure how it would fit in your format (would category That said, we can check whether we can re-use some of the definitions of openENTRANCE/nomenclature for primap2. Cheers, Mika |
Thanks for the context! Don't want to overload this conversation, so my response is as concise as possible - and let's have a follow-up (spoken) discussion somewhere else if there is interest... The goal of the openENTRANCE nomenclature:
Re your question about 1.A.3.b.iii, I would implement in our yaml lingo as
You should also take a look at the OpenEnergyOntology (h/t @Ludee & @christian-rli) - they use a formal ontology framework to write their definitions and interrelations... |
@danielhuppmann alerted me to this issue. To pick up on one point:
This is broader than the World Bank; it reflects the use of SDMX (https://sdmx.org/?page_id=5008, https://datahelpdesk.worldbank.org/knowledgebase/articles/1886701-sdmx-api-queries) which provides an information model that can cover most climate/energy/etc. use cases (at least, all that I've seen). A key like As briefly as possible:
So a different key/composite like Publishing and referring to such code lists is, IMO, much better than trying to cram all metadata into labels on every data set. Over at transportenergy/database#62 we're trying to take this approach, namely:
After having done so, it's certainly possible to:
But it's also possible to handle data in its original dimensions (one per distinct concept), or (as analysis requires) to restore those dimensions when receiving data that's labeled with a collapsed "variable name". Apologies for a long comment! |
@khaeru |
@mikapfl sorry, I should have included that URL: https://datahelpdesk.worldbank.org/knowledgebase/articles/201175-how-does-the-world-bank-code-its-indicators To be clear, the World Bank uses these internally, but does not publish separate SDMX code lists for the constituent parts, because they don't intend to publish data for/support general public usage of all combinations. Instead (last URL in my first comment) they provide a code list called "SERIES" that includes some of these composite codes but also others, based on other schemes. To expand a little on my point about "collapsing": for instance, if data (e.g. for a measure like <id=EMI, name=Emissions>) has conceptual dimensions like "Species" (coded as <id=CO2; name=Carbon Dioxide>, <id=N2O>, etc.) and "Sector" (coded as <id=T, name=Transport>, <id=A, name=Agriculture>, etc.):
…then, one defines a new code list "VARIABLE" using a simple & transparent algorithm, e.g.:
for measure, species, sector in product(…):
    # Mixing IDs and names is fine, according to need, as long as we're clear what is done
    id = f"{measure.name}|{species.id}|{sector.id}"
    name = f"{measure.name} of {species.name.lower()} from {sector.name.lower()}"
    # Store the mapping to full dimensions in the description; this could be done in several ways
    description = f'MEASURE="{measure.id}", SPECIES="{species.id}", SECTOR="{sector.id}"'
    # (create and store a code)
…giving:
Then publish the data with 3 distinct conceptual dimensions collapsed to 1:
…and the VARIABLE codelist, which includes all the information needed for users to restore the actual dimensions, if they want. This is what we see from the World Bank: Among other reasons, I think this approach can cover the common case in energy/climate where we include multiple measures in the same data set for which different concepts/dimensions are relevant. (For instance, the "Species" concept/dimension is relevant for the "Emissions" measure, but not for "Population".) Other solutions I've seen include (a) add many columns for every dimension relevant to any one measure (overkill) and (b) split to distinct data sets/data flows, one for each measure, with the appropriate dimensions for each (SDMX does support this, but I realize it's beyond capacity for most of us at this moment). |
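The collapsing step described in the comment above, and the reverse step of restoring the original dimensions, can be sketched as self-contained Python. This is an illustration under stated assumptions, not an SDMX implementation: the `Code` dataclass and the functions `build_variable_codelist` and `restore_dimensions` are hypothetical names, the code lists are cut down to the few codes mentioned in the comment, and the name "Nitrous Oxide" for N2O is filled in here because the comment gives only its ID.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Code:
    """A minimal stand-in for a code in a code list (hypothetical type)."""
    id: str
    name: str
    description: str = ""


# Small example code lists based on the codes mentioned in the comment.
MEASURES = [Code("EMI", "Emissions")]
SPECIES = [Code("CO2", "Carbon Dioxide"), Code("N2O", "Nitrous Oxide")]
SECTORS = [Code("T", "Transport"), Code("A", "Agriculture")]


def build_variable_codelist(measures, species_list, sectors):
    """Collapse three dimensions into one VARIABLE code list, keyed by ID."""
    codes = {}
    for measure, species, sector in product(measures, species_list, sectors):
        code = Code(
            # Mixing IDs and names, as in the comment's algorithm.
            id=f"{measure.name}|{species.id}|{sector.id}",
            name=f"{measure.name} of {species.name.lower()} from {sector.name.lower()}",
            # The description records the mapping back to full dimensions.
            description=(
                f'MEASURE="{measure.id}", SPECIES="{species.id}", '
                f'SECTOR="{sector.id}"'
            ),
        )
        codes[code.id] = code
    return codes


def restore_dimensions(code):
    """Parse a VARIABLE code's description back into dimension -> code ID."""
    pairs = (item.split("=", 1) for item in code.description.split(", "))
    return {key: value.strip('"') for key, value in pairs}


variables = build_variable_codelist(MEASURES, SPECIES, SECTORS)
dimensions = restore_dimensions(variables["Emissions|CO2|T"])
```

Storing the mapping as a parseable string in the description is only one of several possible choices; a structured annotation on each code would avoid the string parsing in `restore_dimensions`.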
In addition to a naming convention, it would be great to have lists of alternative names for the entities, since e.g. F-gases often have several names for the same gas, each with a different notation. |
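Such a list of alternative names could be kept as a simple alias table that resolves every known spelling to one canonical entity name. The following Python sketch is only an assumption about how this might look: `ALIASES`, `build_alias_index`, and `resolve` are hypothetical names, the choice of canonical spellings is illustrative, and matching aliases case-insensitively is a design choice that the thread has not settled.

```python
# Hypothetical alias table: canonical name -> other names for the same entity.
ALIASES = {
    "HFC134a": ["HFC-134a", "R-134a", "CH2FCF3"],
    "SF6": ["sulfur hexafluoride", "sulphur hexafluoride"],
}


def build_alias_index(aliases):
    """Map every known spelling (case-folded) to its canonical name."""
    index = {}
    for canonical, others in aliases.items():
        for name in (canonical, *others):
            key = name.casefold()
            # Refuse tables in which one spelling points at two entities.
            if key in index and index[key] != canonical:
                raise ValueError(f"Alias {name!r} is ambiguous")
            index[key] = canonical
    return index


ALIAS_INDEX = build_alias_index(ALIASES)


def resolve(name):
    """Return the canonical entity name; raises KeyError if unknown."""
    return ALIAS_INDEX[name.casefold()]
```

Building the index once up front also catches conflicting alias definitions early, before any data set is read.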
I'm wondering if we want to have a standard for variable names. In PRIMAP1 it's all upper case letters. For PRIMAP2 we have specified a way to add GWP information to variable names, but no convention for the variables themselves. I think all uppercase is sometimes hard to read. I think we should have a specification to simplify running code on different datasets.