notebook to store data in json files (#604)
* notebook to store data in json files

* actual notebook
doutriaux1 authored May 30, 2019
1 parent 22dc990 commit 151a0a9
Showing 6 changed files with 270 additions and 12 deletions.
File renamed without changes.
201 changes: 201 additions & 0 deletions doc/jupyter/Jsons/WriteToJson.ipynb
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Writing Tables Into Reusable JSON Files\n",
"\n",
"This notebook demonstrates how to use PMP's JSON class to write easily parsable, reusable JSON files. See [this notebook](ReadInJsonFiles.ipynb) for how to take advantage of this JSON format.\n",
"\n",
"## Key Concepts\n",
"\n",
"\n",
"### Structure\n",
"\n",
"This essentially helps store possibly complex tables in a JSON format that can later be easily parsed back into cdms/numpy variables.\n",
"\n",
"The idea is that the user ran a set of metrics, looping over different parameters, and wants to store the results.\n",
"\n",
"For example, for a given set of ***models***, loop through a given set of ***variables*** and, for each variable, compute a set of ***statistics***.\n",
"\n",
"`model`, `variable` and `statistic` would represent what we call the JSON file's **structure**.\n",
"\n",
"Another example is to loop through models and realizations, test each against a set of references, then loop through modes and seasons to produce a statistic.\n",
"\n",
"Here the structure would be:\n",
"\n",
"`model`, `realization`, `reference`, `mode`, `season`, `statistic`\n",
"\n",
"Python code to generate this would probably look similar to this:\n",
"\n",
"```python\n",
"for model in [\"A\", \"B\", \"C\"]:\n",
"    for realization in [\"a\", \"b\", \"c\", \"d\"]:\n",
"        for reference in [\"ref1\", \"ref2\"]:\n",
"            for mode in [\"NAM\", \"NAO\", \"NPGO\", \"PDO\", \"PNA\"]:\n",
"                for season in [\"DJF\", \"JJA\", \"MAM\"]:\n",
"                    for stat in [\"rms\", \"average\"]:\n",
"                        value = compute_some_stat(model, realization, reference, mode, season, stat)\n",
"```\n",
"\n",
"### Dictionary\n",
"\n",
"If stored in an array, the final shape would be `(3, 4, 2, 5, 3, 2)`, i.e. 720 values.\n",
"\n",
"In practice, however, the user may run a different set of statistics for each mode, and these can also depend on the variable. Storing this in an array would leave a lot of missing values; dictionaries avoid this because only the combinations actually computed need to be stored.\n",
"\n",
"(If your data comes as a cdms2 variable, the package provides a utility function, `pcmdi_metrics.io.MV2Json`, to convert it back to a dictionary.)\n",
"\n",
"\n",
"As described above, the \"structure\" defines what each layer of keys represents.\n",
"\n",
"In the example above, one would access the first value with:\n",
"\n",
"```python\n",
"\n",
"value = results[\"A\"][\"a\"][\"ref1\"][\"NAM\"][\"DJF\"][\"rms\"]\n",
"\n",
"```\n",
"\n",
"Additionally, the results are expected to be stored under a top-level key named \"RESULTS\".\n",
"\n",
"## Example\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO::2019-05-23 14:00::pcmdi_metrics:: Results saved to a json file: /1TB/git/pcmdi_metrics/doc/jupyter/Jsons/myfile.json\n"
]
}
],
"source": [
"results = {\"RESULTS\": {\"A\": {\"rms\": .2, \"mean\":.5}, \"B\": {\"mean\":.123, \"rms\": .67}}}\n",
"\n",
"import pcmdi_metrics\n",
"\n",
"out = pcmdi_metrics.io.base.Base(\".\", \"myfile.json\")\n",
"out.write(results, json_structure=[\"model\", \"Statisitc\"])"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"RESULTS\": {\"A\": {\"rms\": 0.2, \"mean\": 0.5}, \"B\": {\"mean\": 0.123, \"rms\": 0.67}},\n",
" \"json_version\": 3.0, \"json_structure\": [\"model\", \"Statisitc\"], \"provenance\": {\"\n",
"platform\": {\"OS\": \"Linux\", \"Version\": \"4.15.0-50-generic\", \"Name\": \"drdoom\"}, \"u\n",
"serId\": \"doutriaux1\", \"osAccess\": false, \"commandLine\": \"/1Tb/miniconda3/envs/ju\n",
"pyter-vcdat/lib/python3.6/site-packages/ipykernel_launcher.py -f /run/user/1000/\n",
"jupyter/kernel-76cecce7-1761-432d-915f-fc0bfd45647d.json\", \"date\": \"2019-05-23 1\n",
"4:00:21\", \"conda\": {}, \"packages\": {}, \"openGL\": {\"GLX\": {\"server\": {}, \"client\"\n",
": {}}}}}\n"
]
}
],
"source": [
"!more myfile.json"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"J = pcmdi_metrics.io.base.JSONs(files=[\"myfile.json\",], oneVariablePerFile=False)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[ id: model\n",
" Length: 2\n",
" First: A\n",
" Last: B\n",
" Python id: 0x7f18d1163a90, id: Statisitc\n",
" Length: 2\n",
" First: mean\n",
" Last: rms\n",
" Python id: 0x7f18d1163160]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"J.getAxisList()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"variable_5\n",
"masked_array(\n",
" data=[[0.5 , 0.2 ],\n",
" [0.123, 0.67 ]],\n",
" mask=False,\n",
" fill_value=1e+20)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"J()"
]
}
],
"metadata": {
"data_variable_file_paths": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"selected_variables": [],
"variable_source_names": {},
"vcdat_file_path": "",
"vcdat_loaded_variables": []
},
"nbformat": 4,
"nbformat_minor": 2
}
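The file layout the notebook produces (a top-level `"RESULTS"` dictionary plus a `"json_structure"` list naming each level of keys) can be approximated with the standard library alone, which makes the write-then-read cycle explicit. This is a sketch under stated assumptions, not PMP's implementation: `write_results`, `collect_labels` and `to_table` are hypothetical helpers, the file omits the `json_version` and `provenance` entries that `Base.write` adds, and reading back is limited to two-level structures. Sorting the labels here happens to give the same `mean`, `rms` column order as the `J()` output above.

```python
import json
import os
import tempfile


def write_results(results, structure, path):
    """Write nested results plus the structure naming each key level.

    Hypothetical helper: mimics the "RESULTS" / "json_structure" layout,
    without json_version or provenance.
    """
    with open(path, "w") as f:
        json.dump({"RESULTS": results, "json_structure": structure}, f)


def collect_labels(node, depth, labels):
    """Record every key seen at each nesting level (labels[depth] is a set)."""
    if depth == len(labels):
        return  # reached the leaf values
    for key, child in node.items():
        labels[depth].add(key)
        collect_labels(child, depth + 1, labels)


def to_table(path):
    """Read a 2-level results file back as sorted axes plus a dense table.

    Combinations that were never computed come back as None, i.e. the
    missing values a fixed-shape array would have to carry.
    """
    with open(path) as f:
        data = json.load(f)
    results, structure = data["RESULTS"], data["json_structure"]
    if len(structure) != 2:
        raise ValueError("this sketch only handles 2-level structures")
    labels = [set() for _ in structure]
    collect_labels(results, 0, labels)
    rows, cols = sorted(labels[0]), sorted(labels[1])
    table = [[results[r].get(c) for c in cols] for r in rows]
    return structure, rows, cols, table


# Same values as the notebook's example cell.
results = {"A": {"rms": 0.2, "mean": 0.5}, "B": {"mean": 0.123, "rms": 0.67}}
path = os.path.join(tempfile.mkdtemp(), "myfile.json")
write_results(results, ["model", "statistic"], path)
structure, rows, cols, table = to_table(path)
print(rows, cols)  # ['A', 'B'] ['mean', 'rms']
print(table)       # [[0.5, 0.2], [0.123, 0.67]]
```

If a model only has some statistics (say `{"A": {"rms": 1.0}, "B": {"mean": 2.0}}`), the dictionary stores just those two values, while the dense table has to fill the other two cells with `None`.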
1 change: 1 addition & 0 deletions pcmdi_metrics/io/__init__.py
@@ -1,2 +1,3 @@
# init for pcmdi_metrics.io
from . import base # noqa
from .base import MV2Json # noqa
36 changes: 26 additions & 10 deletions pcmdi_metrics/io/base.py
@@ -8,7 +8,7 @@
import cdms2
import hashlib
import numpy
import collections
from collections import OrderedDict, Mapping
import pcmdi_metrics
import cdp.cdp_io
import subprocess
@@ -33,6 +33,23 @@
basestring = str


# Convert cdms MVs to json
def MV2Json(data, dic={}, struct=None):
    if struct is None:
        struct = []
    if not isinstance(data, cdms2.tvariable.TransientVariable) and dic != {}:
        raise RuntimeError("MV2Json needs a cdms2 transient variable as input")
    if not isinstance(data, cdms2.tvariable.TransientVariable):
        return data, struct # we reach the end
    else:
        axis = data.getAxis(0)
        if axis.id not in struct:
            struct.append(axis.id)
        for i, name in enumerate(axis):
            dic[name], _ = MV2Json(data[i], {}, struct)
        return dic, struct


# Group merged axes
def groupAxes(axes, ids=None, separator="_"):
if ids is None:
@@ -56,7 +73,7 @@ def groupAxes(axes, ids=None, separator="_"):
# cdutil region object need a serializer
def update_dict(d, u):
for k, v in u.items():
if isinstance(v, collections.Mapping):
if isinstance(v, Mapping):
r = update_dict(d.get(k, {}), v)
d[k] = r
else:
@@ -88,9 +105,9 @@ def populate_prov(prov, cmd, pairs, sep=None, index=1, fill_missing=False):


def generateProvenance():
prov = collections.OrderedDict()
prov = OrderedDict()
platform = os.uname()
platfrm = collections.OrderedDict()
platfrm = OrderedDict()
platfrm["OS"] = platform[0]
platfrm["Version"] = platform[2]
platfrm["Name"] = platform[1]
@@ -110,7 +127,7 @@ def generateProvenance():
prov["osAccess"] = bool(os.access('/', os.W_OK) * os.access('/', os.R_OK))
prov["commandLine"] = " ".join(sys.argv)
prov["date"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
prov["conda"] = collections.OrderedDict()
prov["conda"] = OrderedDict()
pairs = {
'Platform': 'platform ',
'Version': 'conda version ',
@@ -140,7 +157,7 @@ def generateProvenance():
'vcs': 'vcs ',
'vtk': 'vtk-cdat ',
}
prov["packages"] = collections.OrderedDict()
prov["packages"] = OrderedDict()
populate_prov(prov["packages"], "conda list", pairs, fill_missing=None)
pairs = {
'vcs': 'vcs-nox ',
@@ -159,11 +176,11 @@ def generateProvenance():
"version": "OpenGL version string",
"shading language version": "OpenGL shading language version string",
}
prov["openGL"] = collections.OrderedDict()
prov["openGL"] = OrderedDict()
populate_prov(prov["openGL"], "glxinfo", pairs, sep=":", index=-1)
prov["openGL"]["GLX"] = {
"server": collections.OrderedDict(),
"client": collections.OrderedDict()}
"server": OrderedDict(),
"client": OrderedDict()}
pairs = {
"version": "GLX version",
}
@@ -294,7 +311,6 @@ def write(self, data, type='json', *args, **kwargs):
data["json_structure"] = json_structure
f = open(file_name, 'w')
data["provenance"] = generateProvenance()
# data["user_notes"] = "BLAH"
json.dump(data, f, cls=CDMSDomainsEncoder, *args, **kwargs)
f.close()

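In the `update_dict` hunk, the commit swaps `collections.Mapping` for a bare `Mapping` imported from `collections`. Both spellings rely on aliases that were deprecated in Python 3.3 and removed in Python 3.10; from then on the abstract base classes are only importable from `collections.abc`. Below is a minimal sketch of the same recursive merge using that import. Only the `Mapping` branch of the original is visible in the diff, so the overwrite in the `else` branch and the guard for non-dict values are assumptions, and the provenance values in the usage lines are made up.

```python
from collections.abc import Mapping, MutableMapping


def update_dict(d, u):
    """Recursively merge mapping u into d in place and return d.

    Nested mappings are merged key by key; any other value in u
    replaces the value in d (assumed behaviour of the hidden branch).
    """
    for k, v in u.items():
        if isinstance(v, Mapping):
            current = d.get(k)
            if not isinstance(current, MutableMapping):
                current = {}  # nothing mergeable in d at this key yet
            d[k] = update_dict(current, v)
        else:
            d[k] = v
    return d


prov = {"platform": {"OS": "Linux"}, "conda": {}}
update_dict(prov, {"platform": {"Name": "myhost"}, "conda": {"Version": "4.6"}})
print(prov)  # {'platform': {'OS': 'Linux', 'Name': 'myhost'}, 'conda': {'Version': '4.6'}}
```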
4 changes: 2 additions & 2 deletions pcmdi_metrics/version.py
@@ -1,3 +1,3 @@
__version__ = 'v1.2'
__git_tag_describe__ = 'v1.2-45-g6fef135'
__git_sha1__ = '6fef1358acba0e4c5617143fbf2fe25ad4e0f406'
__git_tag_describe__ = 'v1.2-50-gef54524'
__git_sha1__ = 'ef54524c9a3845afadc9f1312393d0f68734a4be'
40 changes: 40 additions & 0 deletions tests/test_pmp_mv2json.py
@@ -0,0 +1,40 @@
import unittest
from pcmdi_metrics.io import MV2Json
import MV2
import cdms2


class TestMV2Json(unittest.TestCase):
    def test2D(self):
        a = MV2.array(range(6))
        a = MV2.resize(a, (2, 3))
        ax1 = cdms2.createAxis(["A", "B"], id="UPPER")
        ax2 = cdms2.createAxis(["a", "b", "c"], id="lower")
        a.setAxis(0, ax1)
        a.setAxis(1, ax2)
        jsn, struct = MV2Json(a)
        self.assertEqual(
            jsn, {'A': {'a': 0, 'b': 1, 'c': 2}, 'B': {'a': 3, 'b': 4, 'c': 5}})
        self.assertEqual(struct, ['UPPER', 'lower'])

    def test3D(self):
        self.maxDiff = None
        a = MV2.array(range(24))
        a = MV2.resize(a, (2, 4, 3))
        ax1 = cdms2.createAxis(["A", "B"], id="UPPER")
        ax2 = cdms2.createAxis(["1", "2", "3", "4"], id="numbers")
        ax3 = cdms2.createAxis(["a", "b", "c"], id="lower")
        a.setAxis(0, ax1)
        a.setAxis(1, ax2)
        a.setAxis(2, ax3)
        jsn, struct = MV2Json(a)
        self.assertEqual(jsn, {'A': {'1': {'a': 0, 'b': 1, 'c': 2},
                                     '2': {'a': 3, 'b': 4, 'c': 5},
                                     '3': {'a': 6, 'b': 7, 'c': 8},
                                     '4': {'a': 9, 'b': 10, 'c': 11}},
                               'B': {'1': {'a': 12, 'b': 13, 'c': 14},
                                     '2': {'a': 15, 'b': 16, 'c': 17},
                                     '3': {'a': 18, 'b': 19, 'c': 20},
                                     '4': {'a': 21, 'b': 22, 'c': 23}}})

        self.assertEqual(struct, ['UPPER', 'numbers', 'lower'])
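`MV2Json` declares `dic={}` as a default argument. Python evaluates default values once, when the function is defined, so successive top-level calls such as `MV2Json(a)` keep filling, and returning, the same dictionary, and keys from an earlier call leak into later results. The tests above would not catch this, since both arrays label their first axis `A` and `B`, so the second call overwrites exactly the keys left by the first. The sketch below is a hypothetical, cdms2-free analogue of the same recursion (`grid_to_nested` and its `(axis_id, labels)` input format are illustrative, not part of pcmdi_metrics) that builds a new dict on every call; with `MV2Json` itself, passing `dic={}` explicitly or using a `None` default, as is already done for `struct`, would have the same effect.

```python
def grid_to_nested(values, axes, struct=None):
    """Convert nested lists indexed by labelled axes into nested dicts.

    Hypothetical pure-Python analogue of MV2Json: `axes` is a list of
    (axis_id, labels) pairs, one per dimension of `values`. Returns
    (nested_dict, struct), with struct listing axis ids outermost first.
    """
    if struct is None:
        struct = []  # fresh list on every top-level call
    if not axes:
        return values, struct  # no axes left: values is a leaf
    axis_id, labels = axes[0]
    if len(labels) != len(values):
        raise ValueError("axis %r has %d labels for %d entries"
                         % (axis_id, len(labels), len(values)))
    if axis_id not in struct:
        struct.append(axis_id)
    nested = {}  # new dict per call, so results are never shared
    for label, sub in zip(labels, values):
        nested[label], _ = grid_to_nested(sub, axes[1:], struct)
    return nested, struct


# Mirrors test2D, then makes a second, unrelated call.
first, _ = grid_to_nested([[0, 1, 2], [3, 4, 5]],
                          [("UPPER", ["A", "B"]), ("lower", ["a", "b", "c"])])
second, _ = grid_to_nested([[9]], [("UPPER", ["C"]), ("lower", ["z"])])
print(first)   # {'A': {'a': 0, 'b': 1, 'c': 2}, 'B': {'a': 3, 'b': 4, 'c': 5}}
print(second)  # {'C': {'z': 9}}
```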
