-
Notifications
You must be signed in to change notification settings - Fork 3
/
kv.ijs
238 lines (213 loc) · 18.8 KB
/
kv.ijs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
coclass 'kv'
assertC =: 2 : '] [ v (13!:8^:((0 e. ])`(12"_))) u'
lr_z_ =: 3 : '5!:5 < ''y'''
NB. utilities to clean/modify/process data
mdef =: 2 : 'n&u : u' NB. monad default parameter n to dyad u
dtb =: ] #~ +./\.@:~: mdef ' ' NB. used to clean fills, if data gets dirty before turning into boxed. works with numeric fills fill atom is dyad param.
cut =: ([: -.&a: <;._2@,~) mdef ' '
numerify =: 0&".^:(2 = 3!:0)@]
linearize =: (, $~ 1 -.~ $)@]
maybenum =: 0&".^:(] -:&linearize ":@:numerify)@dltb NB. for mixed string/num boxed values will convert strings that can be into numbers.
standardnull =: (''"_)^:(0 e. $)
kvkeys =: G0 =: 0 {:: ]
kvvals =: G1 =: 1 {:: ]
G2 =: 2 {:: ]
ss =: {{ ' '&cut m }}
bb =: 1 : 'dltb each (''`''&cut) m'
Note 'kv/dictionary definition'
A datastructure such that when provided with a list of keys a get function will retrieve
the latest value associated with that key as a result of latest set/add/del operations that could have modified that associated value.
If a requested retrival key has no associated value, then nothing is returned,
where in a list of requested keys, nothing combined with other requested values is the list of the other values.
If a user only uses set/del modifiers after loading or optimizing to a unique dictionary, then the dictionary will remain in unique state.
a user may use special add1 function to place kv in non-unique state. get and filter will still operate as if unique.
in non unique mode, filtall and delall will affect multiple values. set/del/get operate on last key occurrence + related value.
keys are stored as a table of symbols. Values as a table of numeric, string, or boxed values. Table width for simplest keys and values is 1.
This allows multikey keys or metainfo after first key. Allows inverted table or associated array datastructures.
Boxed values permit embedded keyed dictionaries.
keyed data/value access is similar to J locale access to data/functions.
An embedded dictionary is an association of data variables that replaces use for a locale/datastructure.
A dictionary or dictionary hierarchy is a single J entity that can be serialized/deserialized with 3!:1 , 3!:2
)
Note 'basic operation'
(key0;key1...) kv val1;val2... creates dictionary. keys can also be a space separated string, or symbols. # of keys and values must match
a dictionary is 2 boxes. left is keys as symbols, right is values. Both are table shaped.
(key0;key1...) kvget dictionary... retrieves any values associated with key0 and/or key1 and/or other keys in list. Unboxes values if possible.
kvdel has same calling signature. will delete any keys+associated values matching x argument.
kvfilt will return a dictionary instead of just values.
kvfiltall will return duplicate keys instead of just last one. kvdelall deletes all duplicate keys instead of last.
dictionary kvset dictionary... uses x dictionary key/values to update/add y dictionary. Merging matching keys with new values from x.
kvadd has same signature.
)
Note 'kv features'
intended for coinsert 'kv' into any other locale. (should be) safe for coinsert_z_ 'kv' (base needs extra coinsert 'z' call)
unique key implied access even when non-unique keys permitted.
create(bulk), add, del, update/set all have versions to allow/avoid duplicates. 1 suffix permits duplicates
optimized for bulk operations, where arguments to functions are either a list of keys, or a kv dictionary.
kv dictionary always y argument to kvfunctions. modifications return copies.'
x right argument to set/add is another dictionary (key,value pairs)
x right argument to get/del is a boxed list of symbol keys or keys as strings.
A DSL is provided to permit one line string descriptions of required x argument dictionaries or deep set/get/del variations.
Non-unique key implementation can still provide unique key expected behaviour. add appending a duplicate key-value creates an undo operation when del deletes the last value.
use kvadd1 instead of kvadd
Multiple internal keys also permit using kv with meaningful order (kvinsert permits ordered manipulation) and /.(key) "applications" and classifiers.
tosym replacement for s: cut instead of leading delimiter. tosym on symbols returns the symbols instead of error. Use of ;: in dsl version allows J words as keys/symbols.
values kept native numeric or string (padded) if possible. Otherwise boxed.
values can hold other kv structures, and so may wish to hold 3 independent dictionaries for each data type: numeric, string, boxed.
kv (get) function returns values only for keys found, returns i.0 if no keys found.
adding or updating unboxed values will promote to boxed if internal values are boxed.
deep operations are supported, where typical kv right arguments (set add) will when provided with nested k1;(kv) :
will modify kv deeply with (k0 kv (keys;vals) set kvdata will set at k0 level.
get/del can access/del at infinte depth by appending (single key) path
)
Note 'limitations'
values lists are stored as tables. If you attempt to store a higher shape than a table,
it is presumed to be a mistake.
linearize is a crutch to combine with "tableize" (monad ,.) to "fix" situations that might create/provide greater than table shapes prior to inclusion as a table.
kv does not use ,.@linearize combo in order to prevent non table keys/values.
set,set1,update,update1 present needlessly many options for extremely limited behaviour operation.
Behaviour that has a good argument for systemically preventing.
Behaviour deviances that I know how to fix easily.
update1 will add a key instead of updating when trying to add a boxed or numeric value to a string value table (+vice versa)
update and update1 will operate as set with autoboxing if adding a string or numeric value to a boxed value table.
)
Note 'Architecture'
Non-unique mode (add1 instead of set) can provide 4x update throughput.
Using optimize every 1 to 5 seconds can serve a slightly out of date dataset to a wide audience with 100x access throughput on top of 4x update throughput.
keyed boxed arrays in a single dictionary provides keyed inverted table structure, or associated data as alternative to classes.
Using J native data offers a size and access performance benefit, and is encouraged.
Arrays/tables as key values makes J suitable for key-value oriented programs.
JSON implementation with kv should outperform current J implementations.
)
kvEMPTY =: 2$ <,.i.0 NB. for reference. use kvbulk instead of set on empty
tosym =: ( s: :: ])@:(cut :: ]) (@]) NB. string transformed to boxes cut on space. Boxes transformed to symbols. symbols unchanged.
NB. filters dict for unique keys keeping the last value associated to a key (last add/set wins)
kvuniquify =: (] {~L:_1 (] i: ~.)@:G0) ^: ( (#@~. < #)@:G0) NB. optimize for presumed unique.
NB. returns a kv from list keys and values. keys transformed into symbols and shaped as num,1. If values a list, also shaped num,1 so that future adds will create fills if they are longer than atoms.
kv1 =: (,.^:(1 = #@$)@] ,&<~ ] =&# assertC 'keys and values must have same item count' (] $~ 1 ,~ #)@:tosym@[ )
kv =: kv1 (kvuniquify@:) NB. choice of functions, permits unique or non-unique implmentations.
NB. ' '&cut(&.>)`(".(&.>))"0 ' as ` 3' bb NB. cuts first on ` ' '&cut for first, ". for 2nd box
NB. DSL to simplify passing keys ` values as a one line string.
NB. keys and values are separated by a `
NB. default key processing is cut on space. Use gerund param to override it. keyprocessfunction`u instead of u
NB. u will apply to values string to generate value items. Examples:
NB. ". will turn numeric. supports expressions such as 10 3 $ i.30 to make 10 items.
NB. ';'&cut will make string items. (keeps leading/trailing spaces)
NB. maybenum@:(';'&cut) will cut on ; and then turn any items that can be into numbers.
NB. Any boxed values created by DSL, will attempt to be unboxed into items if no error. So if all strings, will automatically be turned into padded items.
kvdsL =: (;:`) 1 : 0 ((kv&.>(<@)/(>@)(>@)(f.))@:)
'`k val' =. _2 {. m
NB.(k @:G0 ,&< (> :: ])@:val@:G1)@:(dltb each@:('`'&cut)) f.
(k @:G0 ,&< (> :: ])@:val@:G1) ((@(_2&{.)) ,~ tosym L:0(^:(0 <#))@:(_2 &}.) )@:(dltb each@:('`'&cut)) f.
)
NB. verb arguments to kvdsL to create inverted tables. keys=field names. keys ` tabledata still separated by `
NB. itdsl has values for each key separated by ; and within each string field by : or space (assumes more than 1 value ie. if only 1 string value with : at end will fail (and use space separator))
NB. use with kvdsL kvsetL kvaddL
itdsl =: cut &>^:(1 = #)@(':'&cut)^:(2 = 3!:0)(>@:)@(,.^:(2 ~: 3!:0))@maybenum each@(';'&cut)( {.^:(2< #@$)each@:)
NB. allows "upgrade" of value structure when new items are incompatible with existing ones.
append =: , :: (, <"_1) :: ((<"_1)@[ , ]) :: ([ , <"_1@]) :: (<"_1@[ , <"_1@]) (,.@:)
NB. append items from kv x into kv y. 1verb ommits uniquify step.
kvadd1 =: append~ L:_1
kvadd =: kvuniquify@:kvadd1 NB. adds and updates uniquify
kvaddL =: kvdsL@[ kvadd ] NB. adverb (AVV) that uses DSL to process x string. see kvdsL for u param.
kvadd1L =: kvdsL@[ kvadd1 ] NB. adverb (AVV) that uses DSL to process x string. see kvdsL for u param.
NB. manipulate the key/value order by inserting x: (ks kv vs),< atindexpoint (3 boxes). Does not uniquify. if keys exist later than insert point, those values are still "official" (get result)
kvinsert =: (}:@[ kvadd1 ]) ([ kvi~ (i.@(G1 + G2) ([ {~ (-@G1 }. [ }.~ G0),~ (G2 (+ i.) G1),~ G0 {. [ ) ])) G2~ ([ , ((<: G1) *. 0 < [) assertC 'indexat must be between 0 and count of kvitems') ,&#&G0
NB.ex: ' gg fr` 133 15' ". kvaddL ' as fr' kvbulk 33 5
NB. if kv keys are static, saving indexes may speed access. x is list of numeric indices, kv y is filtered by index list x
NB. kv keys and values are static for purposes of functions that access keys or values in order to
kvi =: { L:_1 NB. filter by index G0@kvi retrieves keys, if x is computed from values (G1), then this is a keysfromvalues function.
kvdi =: ] kvi~ [ -.~ i.@#@G0 NB. del by index
NB.x are keys boxed or ' '&cut formatted string. kv y has items removed that match keys x.
kvdel =: ] kvdi~ G0 i: ,.@tosym@[ NB. if kv is dirty (not unique), del will undo last duplicate add
NB. deep del matches kvd x format below: last key in list operates at deepest point in path. result rolled up into path described by previous keys.
kvdeld =: ] ([ kvsetd~ (linearize each)@}:@] kv&.>(<@)/(>@)(>@)@:(, <@<) >@{:@] kvdel [ kvgetd~ }:@]) (,.@tosym@dltb each @:('`'&cut)(linearize@) :: (,.@tosym each@](linearize@) )@[)
NB. gets values by keys in x from kv y. if any x not found, then omit from result. if no x found return i.0
kvget =: (G1 {~ G0 (#@[ -.~ i:) ,.@tosym@[)(linearize@)((> :: ])@)NB. if values are all atomic, then returns atom or list of atoms. BUGf: if items are all unpadded chars then will concatenate them.
NB. filters kv y by keys in x
kvfilt =: (] {~ L:_1 G0 (#@[ -.~ i:) ,.@tosym@[)
NB. =========================================================
padstrmatch =: -@#@:(';'&cut :: ])@[ (}. e. {.) (] , >@:(';'&cut :: ])@[) NB. when strings padded, this function allows matching with (unpadded x) with padded version (in y).
NB. padstrmatch =: -@#@[ (}. e. {.) (] , >@[) NB. when strings padded, this function allows matching with (unpadded x) with padded version (in y).
kvQ =: ((I.@) kvi"1 G1) kv~ linearize@:G0 NB. AVV for boolean function u (visible to full dict), filter dic where u is true.
NB. deep get (kvd) will tunnel into sub dicts. x is list of boxes or string cut first on ` then on space inside each box. left to right boxes specify top down keys
kvgetd =: (<@] ] F.. (kvget &.>) (,.@tosym@dltb each @:('`'&cut)(linearize@) :: (,.@tosym each@](linearize@)) )@[)((> :: ])@) NB. strange value error bug calling localed from base related to F.. ?
kvgetd =: (<@] kvget &.>/@:(,~ |.) (,.@tosym@dltb each @:('`'&cut)(linearize@) :: (,.@tosym each@](linearize@)) )@[)((> :: ])@)
itdisp1 =: <"_1@G0@:tosym ,: boxopen"_1@kvget NB. displays kv as inverted table (for fields x). for all fields use: (G0 itdisp ])
itdisp1 =: ((<"_1)@G0 ,: linearize@:(boxopen"_1)@:G1)@:kvfilt NB. displays kv as inverted table (for fields x). for all fields use: (G0 itdisp ])
itdisp =: G0 itdisp1 ]
NB. kvf uses lastkey value to filter. Allowing all keys filter allows expansion of use beyond dictionary.
NB. duplicate keys allow undo, and having firstkeyindict contain value for field1 as a compound value structure, 2ndkeyindict holds field2 value...
NB. THe /. (key) dyad function is a powerful J paradigm permitted by kvfall multiple key copy structure.
kvfiltall =: ] #~ L:_1 tosym~ ,@:(+./^:(1 < #))@:(="0 _) G0
NB. delete all instead of last keys x from kv y. When non unique internals are relied upon.
kvdelall=: ] #~ L:_1 tosym~ ,@:-.@:(+./^:(1 < #))@:(="0 _) G0
NB. optimized to keep unique(ish) status, set will update in place those keys that are found, and add1 keys not found.
NB. merging kv x into kv y. will not check for nulls in kv x
NB. IMPLEMENTED: extra err check to use boxed x values before kvadd resorted to. kvadd would trigger if attempting incompatible update on CURRENTLY unboxed kv y values.
NB. TOCONSIDER: kvadd or kvadd1 should never be backstop to update. Match full append backstops
kvupdate2 =: (G0 ,&< <"_1@:G1~`(i:~&G0)`(G1)}) NB. internal error handling. try again with boxed x items.
kvupdate =: (G0 ,&< G1~`(i:~&G0)`(G1)}) :: kvupdate2 :: kvadd NB. requires keys present. Error assumes rare incompatible type.
kvupdate1 =: (G0 ,&< G1~`(i:~&G0)`(G1)}) :: kvupdate2 :: kvadd1 NB. will keep "ununique structure" on error. kvadd will uniquify.
kvset =: (([ kvdel~ -.&G0) kvupdate^:(0 < #@G0~) ([ kvfilt~ -.&G0) kvadd1^:(0 < #@G0~) ] )
kvset1 =: ([ kvdel~ -.&G0) kvupdate1^:(0 < #@G0~) ([ kvfilt~ -.&G0) kvadd1^:(0 < #@G0~) ] NB. ensures ununique properties if they exist (due to error handling in update which includes uniquify if updating to incompatible type that requires box promotion)
kvsetL =: kvdsL@[ kvset ] NB. AVV adverb uses DSL to create x kv from string.
NB. set at depth1 key1 kvbulk keys;values will set keys and values at depth of key1.
NB. if x is hierarchy of key1 kvbulk key2 kvbulk keys;values then key2 will have its value replaced by keys;values... instead of set(update/add) operation at depth.
NB.TODO: Having set work at unlimited depth will require extra leading boxes in x similar to kvd
kvsetd =: (] kvset~ linearize@G0~ (kv <) ,@:>@G1~ kvset G0~ kvget ])
NB. ADV optimize kv function once kv m is static. resulting function is monadic kvget for keys to retrieve. Returned function embeds full kv inside it. Not sure if special code for (m i: ]) so not using.
kvO =: 1 : 'm =. kvuniquify m label_. (((G1 m) {~ (# G0 m) -.~ (G0 m)&i:)@:,.@tosym (linearize@)) f.'
coclass 'kvtest'
coinsert 'kv'
pD_z_ =: (1!:2&2) : (] [ (1!:2&2)@:(,&<))
pDh =: (1!:2&2) 1 : ' u : (] [&u '': vvv'' ,~ [)'
myattr =: tosym 'sorted unique foreignkey foreignval'
pD d=: ('nums' kv < 'field2 field3' kv > 1 ; 0 1 0 0) kvadd ' descF' kvfilt 'descF field2' kv 2 $ < myattr kv _1 0 0 0
'set on empty dict. using gerund dsL call' pDh 'dicts1 dicts2' kv ,. (< kvEMPTY) kvset~ each ';'&cut`cut kvdsL each ('asdf`v' ; '2nd dict key with embedded spaces ` v2')
'transform numeric array with a symbol list into a dictionary' pDh myattr kv 'nums ` field2' kvgetd d
'dictionary to values' pDh ,@kvvals myattr kv 'nums ` field2' kvgetd d
d =: ('strs' kv < cut kvdsL 'str1 str2 str3`g asdf xcvb') kvset d
NB. 'manual deep update of strs`str1`ggg
((< 'str1 `ggg ' cut kvsetL 'strs' kvget d) kv~ 'strs') kvset d
'multidict' pDh d kvset~ kv&.>(<@)/(>@)(>@) ( <'strs'),(tosym' str1 fds'),&< 'gg',:'fds'
'simple dsL add of deep misc`fields`values' pDh d =: 'misc ` str1 fds ` gg fds ' cut kvsetL d
'deep get strs ` str1 str2' pDh 'strs ` str1 str2' kvgetd d
'use `misc vals to deep set strs' pDh (kvsetd~ 'strs' (kv <) 'misc'&kvget ) d
'subkey access str1 str2 for all if all dicts with kvget"1' pDh ,/ 'str1 str2' kvget"1 (G0 kvget ]) d
'subkey access str2 str1 with <@kvget"1 and clean for empties' pDh a: -.~ standardnull each 'str2 str1' <@(kvget"1) (G0 kvget ]) d
'subkey filter str2 str1 with <@kvfilt"1 returns list of dictionaries)' pDh ('str2 str1' kvfilt"1 G0 kvget ]) d
'subkey filter str2 str1 with <@kvfilt"1 adding back keys' pDh (,@G0 kv 'str2 str1' kvfilt"1 G0 kvget ]) d
pD 'empty dictionary is "2$ <,.i.0", but a filtered empty dictionary retains its original value shape'
pD 'if numbers mixed with strings, values are upgraded to boxes'
'add 2 numeric keys to all (last filtered) dictionaries with dsl ". kvsetL"1' pDh 'a b ` 3 4' ". kvsetL"1 ( 'str2 str1' kvfilt"1 G0 kvget ]) d
'make a mistake adding duplicate key' pDh d =: ('misc' kv 123) kvadd1 d
'undo mistake... restore dict by deleting last key' pDh d=: 'misc' kvdel d
'deep delete dic`misc`fds from `dic (masterdict over) d' pDh 'dic` misc `fds fd 'kvdeld 'dic' kv < d
'create inverted table' pDh it =: 'Id Name Job Status' kv ,.&.:>"1 |: maybenum each > ','cut each cutLF 0 : 0 NB. borrowing from https://github.com/tikkanz/jdataframe
3,Jerry,Unemployed,Married
6,Jan,CEO,Married
5,Frieda,student,Single
1,Alex,Waiter,Separated
)
'dsL version matches' pD it (-:&]) itdsl kvdsL 'Id Name Job Status `3 6 5 1 ; Jerry Jan Frieda Alex ;Unemployed:CEO:student:Waiter; Married Married Single Separated' NB. note extra garbage spaces
'itdisplay(selected fields) query on Job -:(padded) ''CEO'' or ''student'' ' pDh 'Id Name Job' itdisp1 (('CEO';'student') padstrmatch 'Job' kvget ]) kvQ it
'dsL (; separated "keys") version of same query as kv' pDh ('CEO;student' padstrmatch 'Job' kvget ]) kvQ it
'add inverted table to "main" dic' pDh d =: ('it' kv < it) kvset d
NB. x keys in range of 0 to y. numeric symbols associated with same numeric value.
bench =: 4 : 0
'create ? x$y keys/vals ignoring duplicates' pD timespacex 'a =. (kv~ ":) ? x$y'
'uniquify on last step' pD timespacex 'kvuniquify a'
'30 keys' pD k =. ": ? 30 $ y
'uniquify and optimize' pD timespacex 'aO =. a kvO'
'random 30key optimized access' pD timespacex 'aO k'
'same 30key unoptimized access' pD timespacex 'k kvget a'
'matches' pD k (aO@[ -: kvget)a
'create 100000 key/vals (uniquified) in range of 2*y so half are new half are existing' pD timespacex 'b =. (kv~ ":) ? 100000 $ 2*y'
'set (update or add depending on existing status of key) 100000 keys/vals into first kv' pD timespacex 'b kvset a'
'using add1 on the 100000 keys/vals into first kv' pD timespacex 'b kvadd1 a'
'using uniquify after all of the add1s on the 100000 keys/vals into first kv' pD timespacex 'b kvuniquify@:kvadd1 a'
'kvset result matches uniquify@:kvadd1' pD b (kvset -: kvuniquify@:kvadd1) a
aO k
)
pD 'bench_kvtest_~ 100000 for benchmark timimgs. 1000000 bench_kvtest_ 50000 for greater contrast'