Commit
notebook to store data in json files (#604)
* notebook to store data in json files
* actual notebook
1 parent 22dc990, commit 151a0a9
Showing 6 changed files with 270 additions and 12 deletions.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,201 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Writing Tables Into Re-Usable JSON Files\n", | ||
"\n", | ||
"This notebook demonstrates how to use PMP's Json class to write easily parsable and reusable JSON files. See [this notebook](ReadInJsonFiles.ipynb) for how to take advantage of this JSON format.\n", | ||
"\n", | ||
"## Key Concepts\n", | ||
"\n", | ||
"\n", | ||
"### Structure\n", | ||
"\n", | ||
"This essentially helps store possibly complex tables in a JSON format that can later be easily parsed back into cdms/numpy variables.\n", | ||
"\n", | ||
"The idea is that the user ran a set of metrics, looping over different parameters, and wants to store the results.\n", | ||
"\n", | ||
"For example, for a given set of ***models***, loop through a given set of ***variables*** and, for each variable, compute a set of ***statistics***.\n", | ||
"\n", | ||
"`model`, `variable` and `statistic` would represent what we call the JSON file's **structure**.\n", | ||
"\n", | ||
"Another example is to loop through models and realizations, test each against a set of references, and loop through modes and seasons to produce a statistic.\n", | ||
"\n", | ||
"Here the structure would be:\n", | ||
"\n", | ||
"`model`, `realization`, `reference`, `mode`, `season`, `statistic`\n", | ||
"\n", | ||
"Python code to generate this would probably look similar to this:\n", | ||
"\n", | ||
"```python\n", | ||
"for model in [\"A\", \"B\", \"C\"]:\n", | ||
"    for realization in [\"a\", \"b\", \"c\", \"d\"]:\n", | ||
"        for reference in [\"ref1\", \"ref2\"]:\n", | ||
"            for mode in [\"NAM\", \"NAO\", \"NPGO\", \"PDO\", \"PNA\"]:\n", | ||
"                for season in [\"DJF\", \"JJA\", \"MAM\"]:\n", | ||
"                    for stat in [\"rms\", \"average\"]:\n", | ||
"                        value = compute_some_stat(model, realization, reference, mode, season, stat)\n", | ||
"```\n", | ||
"\n", | ||
"### Dictionary\n", | ||
"\n", | ||
"If stored in an array, the final shape would be `(3, 4, 2, 5, 3, 2)`, which is 720 values.\n", | ||
"\n", | ||
"But in reality the user may run a different set of statistics for each mode, and these can also depend on the variable. Storing this in an array would produce a lot of missing values. Dictionaries avoid this.\n", | ||
"\n", | ||
"(If your data comes as a cdms2 variable, our package comes with a utility function to convert it to a dictionary.)\n", | ||
"\n", | ||
"\n", | ||
"As described above, the \"Structure\" defines what each layer of keys represents.\n", | ||
"\n", | ||
"In the example above, to access the first value one would do:\n", | ||
"\n", | ||
"```python\n", | ||
"\n", | ||
"value = results[\"A\"][\"a\"][\"ref1\"][\"NAM\"][\"DJF\"][\"rms\"]\n", | ||
"\n", | ||
"```\n", | ||
"\n", | ||
"Additionally, the \"results\" are expected to be in a field named \"RESULTS\".\n", | ||
"\n", | ||
"## Example\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"INFO::2019-05-23 14:00::pcmdi_metrics:: Results saved to a json file: /1TB/git/pcmdi_metrics/doc/jupyter/Jsons/myfile.json\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"results = {\"RESULTS\": {\"A\": {\"rms\": .2, \"mean\":.5}, \"B\": {\"mean\":.123, \"rms\": .67}}}\n", | ||
"\n", | ||
"import pcmdi_metrics\n", | ||
"\n", | ||
"out = pcmdi_metrics.io.base.Base(\".\", \"myfile.json\")\n", | ||
"out.write(results, json_structure=[\"model\", \"Statisitc\"])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{\"RESULTS\": {\"A\": {\"rms\": 0.2, \"mean\": 0.5}, \"B\": {\"mean\": 0.123, \"rms\": 0.67}},\n", | ||
" \"json_version\": 3.0, \"json_structure\": [\"model\", \"Statisitc\"], \"provenance\": {\"\n", | ||
"platform\": {\"OS\": \"Linux\", \"Version\": \"4.15.0-50-generic\", \"Name\": \"drdoom\"}, \"u\n", | ||
"serId\": \"doutriaux1\", \"osAccess\": false, \"commandLine\": \"/1Tb/miniconda3/envs/ju\n", | ||
"pyter-vcdat/lib/python3.6/site-packages/ipykernel_launcher.py -f /run/user/1000/\n", | ||
"jupyter/kernel-76cecce7-1761-432d-915f-fc0bfd45647d.json\", \"date\": \"2019-05-23 1\n", | ||
"4:00:21\", \"conda\": {}, \"packages\": {}, \"openGL\": {\"GLX\": {\"server\": {}, \"client\"\n", | ||
": {}}}}}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"!more myfile.json" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"J = pcmdi_metrics.io.base.JSONs(files=[\"myfile.json\",], oneVariablePerFile=False)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[ id: model\n", | ||
" Length: 2\n", | ||
" First: A\n", | ||
" Last: B\n", | ||
" Python id: 0x7f18d1163a90, id: Statisitc\n", | ||
" Length: 2\n", | ||
" First: mean\n", | ||
" Last: rms\n", | ||
" Python id: 0x7f18d1163160]" | ||
] | ||
}, | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"J.getAxisList()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"variable_5\n", | ||
"masked_array(\n", | ||
" data=[[0.5 , 0.2 ],\n", | ||
" [0.123, 0.67 ]],\n", | ||
" mask=False,\n", | ||
" fill_value=1e+20)" | ||
] | ||
}, | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"J()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"data_variable_file_paths": {}, | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
}, | ||
"selected_variables": [], | ||
"variable_source_names": {}, | ||
"vcdat_file_path": "", | ||
"vcdat_loaded_variables": [] | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
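Note on the notebook above: the nested-dictionary layout it describes can be illustrated without PMP at all. The sketch below is only an approximation using the standard `json` module; it does not call `pcmdi_metrics.io.base.Base`, which (per the `more myfile.json` output) also records `json_version` and `provenance`. The names `STATS_BY_MODE`, `build_results`, `count_values` and the dummy `compute_some_stat` are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile

# Hypothetical per-mode statistic sets: not every mode gets every statistic.
STATS_BY_MODE = {"NAM": ["rms", "average"], "PDO": ["rms"]}


def compute_some_stat(model, mode, stat):
    """Placeholder for a real metric; returns a deterministic dummy value."""
    return float(len(model) + len(mode) + len(stat))


def build_results(models):
    """Nest values as results[model][mode][stat], storing only computed entries."""
    results = {}
    for model in models:
        for mode, stats in STATS_BY_MODE.items():
            for stat in stats:
                value = compute_some_stat(model, mode, stat)
                results.setdefault(model, {}).setdefault(mode, {})[stat] = value
    return results


def count_values(node):
    """Count the leaf values stored in a nested dictionary."""
    if isinstance(node, dict):
        return sum(count_values(child) for child in node.values())
    return 1


# Same top-level keys as the notebook: the data under "RESULTS" plus the
# list naming what each layer of keys represents.
payload = {
    "RESULTS": build_results(["A", "B"]),
    "json_structure": ["model", "mode", "statistic"],
}

path = os.path.join(tempfile.mkdtemp(), "sketch.json")
with open(path, "w") as handle:
    json.dump(payload, handle)

with open(path) as handle:
    loaded = json.load(handle)

# One key per layer of the structure, outermost first.
first = loaded["RESULTS"]["A"]["NAM"]["rms"]
print(first, count_values(loaded["RESULTS"]))
```

A dense array over the same labels would need 2 x 2 x 2 = 8 slots, two of them missing; the dictionary stores only the six values that were actually computed.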
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
# init for pcmdi_metrics.io | ||
from . import base # noqa | ||
from .base import MV2Json # noqa |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
__version__ = 'v1.2' | ||
-__git_tag_describe__ = 'v1.2-45-g6fef135' | ||
-__git_sha1__ = '6fef1358acba0e4c5617143fbf2fe25ad4e0f406' | ||
+__git_tag_describe__ = 'v1.2-50-gef54524' | ||
+__git_sha1__ = 'ef54524c9a3845afadc9f1312393d0f68734a4be' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
import unittest | ||
from pcmdi_metrics.io import MV2Json | ||
import MV2 | ||
import cdms2 | ||
|
||
|
||
class TestMV2Json(unittest.TestCase): | ||
    def test2D(self): | ||
        a = MV2.array(range(6)) | ||
        a = MV2.resize(a, (2, 3)) | ||
        ax1 = cdms2.createAxis(["A", "B"], id="UPPER") | ||
        ax2 = cdms2.createAxis(["a", "b", "c"], id="lower") | ||
        a.setAxis(0, ax1) | ||
        a.setAxis(1, ax2) | ||
        jsn, struct = MV2Json(a) | ||
        self.assertEqual( | ||
            jsn, {'A': {'a': 0, 'b': 1, 'c': 2}, 'B': {'a': 3, 'b': 4, 'c': 5}}) | ||
        self.assertEqual(struct, ['UPPER', 'lower']) | ||
|
||
    def test3D(self): | ||
        self.maxDiff = None | ||
        a = MV2.array(range(24)) | ||
        a = MV2.resize(a, (2, 4, 3)) | ||
        ax1 = cdms2.createAxis(["A", "B"], id="UPPER") | ||
        ax2 = cdms2.createAxis(["1", "2", "3", "4"], id="numbers") | ||
        ax3 = cdms2.createAxis(["a", "b", "c"], id="lower") | ||
        a.setAxis(0, ax1) | ||
        a.setAxis(1, ax2) | ||
        a.setAxis(2, ax3) | ||
        jsn, struct = MV2Json(a) | ||
        self.assertEqual(jsn, {'A': {'1': {'a': 0, 'b': 1, 'c': 2}, | ||
                                     '2': {'a': 3, 'b': 4, 'c': 5}, | ||
                                     '3': {'a': 6, 'b': 7, 'c': 8}, | ||
                                     '4': {'a': 9, 'b': 10, 'c': 11}}, | ||
                               'B': {'1': {'a': 12, 'b': 13, 'c': 14}, | ||
                                     '2': {'a': 15, 'b': 16, 'c': 17}, | ||
                                     '3': {'a': 18, 'b': 19, 'c': 20}, | ||
                                     '4': {'a': 21, 'b': 22, 'c': 23}}}) | ||
|
||
        self.assertEqual(struct, ['UPPER', 'numbers', 'lower'])
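
Note on the tests above: they pin down the behaviour expected of `MV2Json`, namely that each axis becomes one layer of dictionary keys and the axis ids become the structure list. The sketch below reproduces that mapping for plain nested lists so the idea can be tried without `cdms2` or `MV2`. It is not the `MV2Json` implementation; the function `nested_to_dict` and its `(axis_id, labels)` argument format are assumptions made for this illustration.

```python
def nested_to_dict(values, axes):
    """Convert nested lists plus labelled axes into nested dictionaries.

    values: nested lists, one nesting level per axis.
    axes:   list of (axis_id, labels) pairs, outermost axis first.
    Returns (nested_dict, structure), where structure lists the axis ids.
    """
    structure = [axis_id for axis_id, _ in axes]

    def convert(sub, level):
        labels = axes[level][1]
        if len(sub) != len(labels):
            raise ValueError(
                "axis %r has %d labels but data has length %d"
                % (axes[level][0], len(labels), len(sub))
            )
        if level == len(axes) - 1:
            # Innermost axis: labels map directly to values.
            return dict(zip(labels, sub))
        # Outer axes: each label maps to the dictionary for the next axis.
        return {label: convert(item, level + 1) for label, item in zip(labels, sub)}

    return convert(values, 0), structure


# Same data and labels as test2D, written as plain lists.
data = [[0, 1, 2], [3, 4, 5]]
axes = [("UPPER", ["A", "B"]), ("lower", ["a", "b", "c"])]
jsn, struct = nested_to_dict(data, axes)
print(jsn)
print(struct)
```

The returned structure list is what would be passed as `json_structure` when writing the dictionary out, so the key layers stay self-describing.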