Avoid duplicate codes because of recodes #24

Closed
GregorDeCillia opened this issue Jan 25, 2022 · 1 comment

GregorDeCillia commented Jan 25, 2022

The following JSON file is not handled correctly by sc_table():

{
  "database" : "str:database:deenenea",
  "measures" : [ "str:statfn:deenenea:F-DATA:F-EBIL:SUM" ],
  "recodes" : {
    "str:field:deenenea:F-DATA:C-VERWEND0-0" : {
      "map" : [ 
        [ 
          "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-1", 
          "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-2" 
        ], 
        [ "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:VERWEND0-1" ]
      ]
    }
  },
  "dimensions" : [ [ "str:field:deenenea:F-DATA:C-VERWEND0-0" ] ]
}

It results in duplicate codes for the field C-VERWEND0-0, which causes all kinds of issues with $tabulate() because of implicit assumptions that codes are unique.

sc_table('test.json')$field("C-VERWEND0-0")
#> # STATcubeR metadata: 3 x 7
#>   code       label                   parsed                 
#>   <chr>      <chr>                   <chr>                  
#> 1 VERWEND0-1 Space and water heating Space and water heating
#> 2 VERWEND0-1 Space and water heating Space and water heating
#> 3 SC_TOTAL   Total                   Total                  
#> # … with 4 more columns: 'label_de', 'label_en', 'visible', 'order'

The reason is that a single entry of the map field in the JSON can contain several URIs, and only the first URI is used to generate the code column in $field(). Unique codes should be generated in this case, possibly by concatenating the codes of the individual URIs. A fixed version might create a field definition like this:

sc_table('test.json')$field("C-VERWEND0-0")
#> # STATcubeR metadata: 3 x 7
#>   code                  label                   parsed                 
#>   <chr>                 <chr>                   <chr>                  
#> 1 VERWEND0-1;VERWEND0-2 Space and water heating Space and water heating
#> 2 VERWEND0-1            Space and water heating Space and water heating
#> 3 SC_TOTAL              Total                   Total                  
#> # … with 4 more columns: 'label_de', 'label_en', 'visible', 'order'
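
The concatenation idea described above can be sketched in a few lines of base R. This is only an illustration, not the STATcubeR implementation: the helper name `recode_codes()` and the `sep` argument are hypothetical, and the sketch assumes that the code is the last colon-separated component of each value URI, as in the JSON above.

```r
# Hypothetical helper (not part of STATcubeR): derive one code per recode
# entry by collapsing the codes of all URIs listed in that entry.
recode_codes <- function(map, sep = ";") {
  vapply(map, function(uris) {
    # Assumption: the code is everything after the last ":" of a value URI.
    codes <- sub("^.*:", "", unlist(uris))
    paste(codes, collapse = sep)
  }, character(1))
}

# The recode map from the JSON above, written as a nested R list.
prefix <- "str:value:deenenea:F-DATA:C-VERWEND0-0:C-VERWEND0-0:"
map <- list(
  list(paste0(prefix, "VERWEND0-1"), paste0(prefix, "VERWEND0-2")),
  list(paste0(prefix, "VERWEND0-1"))
)

codes <- recode_codes(map)
print(codes)
#> [1] "VERWEND0-1;VERWEND0-2" "VERWEND0-1"

# The generated codes no longer contain duplicates.
stopifnot(!anyDuplicated(codes))
```

Using only the first URI of each entry would yield VERWEND0-1 twice here, which is exactly the duplication shown in the first output.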

Time variables should be converted to type category in this case, with a warning. Labels could also be concatenated. However, this would lead to very long labels, which might not be ideal.
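
The fallback for time variables could look roughly like the following. This is a sketch under stated assumptions: `resolve_field_type()` is a hypothetical helper, and the type names "time" and "category" are placeholders that do not necessarily match the strings STATcubeR uses internally.

```r
# Hypothetical helper (not part of STATcubeR): decide the field type once
# recodes are taken into account.
#   type: original field type, here the placeholder strings "time"/"category"
#   map:  recode map from the JSON, one list of URIs per recoded value
resolve_field_type <- function(type, map) {
  # An entry with more than one URI merges several original values.
  has_merged_values <- any(lengths(map) > 1)
  if (identical(type, "time") && has_merged_values) {
    warning("Recode merges several time values; treating field as category")
    return("category")
  }
  type
}

# One entry combines two URIs, so a time field is downgraded (with a warning).
merged <- list(list("uri-1", "uri-2"), list("uri-1"))
resolve_field_type("time", merged)

# Without merged entries the original type is kept.
resolve_field_type("time", list(list("uri-1"), list("uri-2")))
```

The downgrade is only triggered when at least one entry merges several values, since a concatenated code such as A;B can no longer be parsed as a single point in time.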

@GregorDeCillia GregorDeCillia added this to the Version 1.0 milestone Jan 25, 2022
@GregorDeCillia GregorDeCillia self-assigned this Jan 25, 2022
@GregorDeCillia GregorDeCillia changed the title Avoid duplicate codes Avoid duplicate codes because of recodes Jan 25, 2022
@GregorDeCillia GregorDeCillia pinned this issue Jan 25, 2022
@GregorDeCillia
Contributor Author

Multiple codes and labels are now concatenated using a semicolon. In the above example, the first classification (useful energy category) is now parsed as follows:

  • label: Space and water heating;Process heat <200 °C
  • code: VERWEND0-1;VERWEND0-2

sc_table('test.json')$field("C-VERWEND0-0")
#> # STATcubeR metadata: 3 x 7
#>   code                  label                                        parsed                                      
#>   <chr>                 <chr>                                        <chr>                                       
#> 1 VERWEND0-1;VERWEND0-2 Space and water heating;Process heat <200 °C Space and water heating;Process heat <200 °C
#> 2 VERWEND0-1            Space and water heating                      Space and water heating                     
#> 3 SC_TOTAL              Total                                        Total                                       
#> # … with 4 more columns: 'label_de', 'label_en', 'visible', 'order'

@GregorDeCillia GregorDeCillia unpinned this issue Aug 27, 2022