MILLER(1) MILLER(1)
NAME
Miller -- like awk, sed, cut, join, and sort for name-indexed data such
as CSV and tabular JSON.
SYNOPSIS
Usage: mlr [flags] {verb} [verb-dependent options ...] {zero or more
file names}
If zero file names are provided, standard input is read, e.g.
mlr --csv sort -f shape < example.csv
Output of one verb may be chained as input to another using "then",
e.g.
mlr --csv stats1 -a min,mean,max -f quantity then sort -f color
example.csv
Please see 'mlr help topics' for more information. Please also see
https://miller.readthedocs.io
DESCRIPTION
Miller operates on key-value-pair data while the familiar Unix tools
operate on integer-indexed fields: if the natural data structure for
the latter is the array, then Miller's natural data structure is the
insertion-ordered hash map. This encompasses a variety of data
formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
manpage documents mlr 6.8.0-dev.
EXAMPLES
mlr --icsv --opprint cat example.csv
mlr --icsv --opprint sort -f shape example.csv
mlr --icsv --opprint sort -f shape -nr index example.csv
mlr --icsv --opprint cut -f flag,shape example.csv
mlr --csv filter '$color == "red"' example.csv
mlr --icsv --ojson put '$ratio = $quantity / $rate' example.csv
mlr --icsv --opprint --from example.csv sort -nr index then cut -f shape,quantity
FILE FORMATS
CSV/CSV-lite: comma-separated values with separate header line
TSV: same but with tabs in place of commas
+---------------------+
| apple,bat,cog |
| 1,2,3 | Record 1: "apple":"1", "bat":"2", "cog":"3"
| 4,5,6 | Record 2: "apple":"4", "bat":"5", "cog":"6"
+---------------------+
JSON (array of objects):
+---------------------+
| [ |
| { |
| "apple": 1, | Record 1: "apple":"1", "bat":"2", "cog":"3"
| "bat": 2, |
| "cog": 3 |
| }, |
| { |
| "dish": { | Record 2: "dish.egg":"7",
| "egg": 7, | "dish.flint":"8", "garlic":""
| "flint": 8 |
| }, |
| "garlic": "" |
| } |
| ] |
+---------------------+
JSON Lines (sequence of one-line objects):
+------------------------------------------------+
| {"apple": 1, "bat": 2, "cog": 3} |
| {"dish": {"egg": 7, "flint": 8}, "garlic": ""} |
+------------------------------------------------+
Record 1: "apple":"1", "bat":"2", "cog":"3"
Record 2: "dish.egg":"7", "dish.flint":"8", "garlic":""
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog |
| 1 2 3 | Record 1: "apple":"1", "bat":"2", "cog":"3"
| 4 5 6 | Record 2: "apple":"4", "bat":"5", "cog":"6"
+---------------------+
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | --- | --- | --- | |
| | 1 | 2 | 3 | | Record 1: "apple":"1", "bat":"2", "cog":"3"
| | 4 | 5 | 6 | | Record 2: "apple":"4", "bat":"5", "cog":"6"
+-----------------------+
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1 | Record 1: "apple":"1", "bat":"2", "cog":"3"
| bat 2 |
| cog 3 |
| |
| dish 7 | Record 2: "dish":"7", "egg":"8"
| egg 8 |
+---------------------+
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple":"1", "bat":"2", "cog":"3"
| dish=7,egg=8,flint | Record 2: "dish":"7", "egg":"8", "3":"flint"
+---------------------+
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown | Record 1: "1":"the", "2":"quick", "3":"brown"
| fox jumped | Record 2: "1":"fox", "2":"jumped"
+---------------------+
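The DKVP and NIDX mappings shown above can be sketched in a few lines of Python. This is only an illustration of the record shapes in the boxes, not Miller's implementation; the function names `parse_dkvp` and `parse_nidx` and their parameters are hypothetical.

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Split one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            key, value = field.split(ips, 1)
        else:
            # A field with no pair separator is keyed by its 1-based position,
            # as in the "flint" example above.
            key, value = str(position), field
        record[key] = value
    return record


def parse_nidx(line):
    """Split one NIDX line on runs of whitespace; keys are 1-based positions."""
    return {str(i): word for i, word in enumerate(line.split(), start=1)}


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_nidx("fox jumped"))
```

Python dictionaries preserve insertion order, which matches the insertion-ordered hash map described under DESCRIPTION.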
HELP OPTIONS
Type 'mlr help {topic}' for any of the following:
Essentials:
mlr help topics
mlr help basic-examples
mlr help file-formats
Flags:
mlr help flags
mlr help flag
mlr help list-separator-aliases
mlr help list-separator-regex-aliases
mlr help comments-in-data-flags
mlr help compressed-data-flags
mlr help csv/tsv-only-flags
mlr help file-format-flags
mlr help flatten-unflatten-flags
mlr help format-conversion-keystroke-saver-flags
mlr help json-only-flags
mlr help legacy-flags
mlr help miscellaneous-flags
mlr help output-colorization-flags
mlr help pprint-only-flags
mlr help profiling-flags
mlr help separator-flags
Verbs:
mlr help list-verbs
mlr help usage-verbs
mlr help verb
Functions:
mlr help list-functions
mlr help list-function-classes
mlr help list-functions-in-class
mlr help usage-functions
mlr help usage-functions-by-class
mlr help function
Keywords:
mlr help list-keywords
mlr help usage-keywords
mlr help keyword
Other:
mlr help auxents
mlr help terminals
mlr help mlrrc
mlr help output-colorization
mlr help type-arithmetic-info
Shorthands:
mlr -g = mlr help flags
mlr -l = mlr help list-verbs
mlr -L = mlr help usage-verbs
mlr -f = mlr help list-functions
mlr -F = mlr help usage-functions
mlr -k = mlr help list-keywords
mlr -K = mlr help usage-keywords
Lastly, 'mlr help ...' will search for your exact text '...' using the sources of
'mlr help flag', 'mlr help verb', 'mlr help function', and 'mlr help keyword'.
Use 'mlr help find ...' for approximate (substring) matches, e.g. 'mlr help find map'
for all things with "map" in their names.
VERB LIST
altkv bar bootstrap case cat check clean-whitespace count-distinct count
count-similar cut decimate fill-down fill-empty filter flatten format-values
fraction gap grep group-by group-like gsub having-fields head histogram
json-parse json-stringify join label latin1-to-utf8 least-frequent
merge-fields most-frequent nest nothing put regularize remove-empty-columns
rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle
skip-trivial-records sort sort-within-records split ssub stats1 stats2 step
sub summary tac tail tee template top utf8-to-latin1 unflatten uniq unspace
unsparsify
FUNCTION LIST
abs acos acosh antimode any append apply arrayify asin asinh asserting_absent
asserting_array asserting_bool asserting_boolean asserting_empty
asserting_empty_map asserting_error asserting_float asserting_int
asserting_map asserting_nonempty_map asserting_not_array asserting_not_empty
asserting_not_map asserting_not_null asserting_null asserting_numeric
asserting_present asserting_string atan atan2 atanh bitcount boolean
capitalize cbrt ceil clean_whitespace collapse_whitespace concat cos cosh
count depth dhms2fsec dhms2sec distinct_count erf erfc every exec exp expm1
flatten float floor fmtifnum fmtnum fold format fsec2dhms fsec2hms get_keys
get_values gmt2localtime gmt2nsec gmt2sec gssub gsub haskey hexfmt hms2fsec
hms2sec hostname index int invqnorm is_absent is_array is_bool is_boolean
is_empty is_empty_map is_error is_float is_int is_map is_nan is_nonempty_map
is_not_array is_not_empty is_not_map is_not_null is_null is_numeric is_present
is_string joink joinkv joinv json_parse json_stringify kurtosis latin1_to_utf8
leafcount leftpad length localtime2gmt localtime2nsec localtime2sec log log10
log1p logifit lstrip madd mapdiff mapexcept mapselect mapsum max maxlen md5
mean meaneb median mexp min minlen mmul mode msub nsec2gmt nsec2gmtdate
nsec2localdate nsec2localtime null_count os percentile percentiles pow qnorm
reduce regextract regextract_or_else rightpad round roundm rstrip sec2dhms
sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime select sgn sha1 sha256
sha512 sin sinh skewness sort sort_collection splita splitax splitkv splitkvx
splitnv splitnvx sqrt ssub stddev strfntime strfntime_local strftime
strftime_local string strip strlen strpntime strpntime_local strptime
strptime_local sub substr substr0 substr1 sum sum2 sum3 sum4 sysntime system
systime systimeint tan tanh tolower toupper truncate typeof unflatten unformat
unformatx upntime uptime urand urand32 urandelement urandint urandrange
utf8_to_latin1 variance version ! != !=~ % & && * ** + - . .* .+ .- ./ / // <
<< <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~
COMMENTS-IN-DATA FLAGS
Miller lets you put comments in your data, such as
# This is a comment for a CSV file
a,b,c
1,2,3
4,5,6
Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the below four options, comments are data like
any other text. (The comments-in-data feature is opt-in.)
* When `--pass-comments` is used, comment lines are written to standard output
immediately upon being read; they are not part of the record stream. Results
may be counterintuitive. A suggestion is to place comments at the start of
data files.
--pass-comments Immediately print commented lines (prefixed by `#`)
within the input.
--pass-comments-with {string}
Immediately print commented lines within input, with
specified prefix.
--skip-comments Ignore commented lines (prefixed by `#`) within the
input.
--skip-comments-with {string}
Ignore commented lines within input, with specified
prefix.
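The difference between skipping and passing comments can be sketched as follows. This is a simplified Python model of the behavior described in the notes above, not Miller's reader; `read_data_lines` and its `mode` argument are hypothetical names.

```python
import sys


def read_data_lines(lines, mode=None, prefix="#", passthrough=sys.stdout):
    """Yield data lines.

    mode=None   : comments are data like any other text (the opt-in default)
    mode="skip" : lines starting with the prefix are dropped
    mode="pass" : lines starting with the prefix are written out immediately
                  and never enter the record stream
    """
    for line in lines:
        # Comments are only honored at the start of a line.
        if mode is not None and line.startswith(prefix):
            if mode == "pass":
                passthrough.write(line + "\n")
            continue
        yield line


data = ["# This is a comment for a CSV file", "a,b,c", "1,2,3", "4,5,6"]
print(list(read_data_lines(data, mode="skip")))
```

Because passed comments are written as soon as they are read, they can appear ahead of records that are still being processed, which is why placing comments at the start of data files is suggested.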
COMPRESSED-DATA FLAGS
Miller offers a few different ways to handle reading data files
which have been compressed.
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
Using `--prepipe` and `--prepipex` you can specify an action to be
taken on each input file. The prepipe command must be able to read from
standard input; it will be invoked with `{command} < {filename}`. The
prepipex command must take a filename as argument; it will be invoked with
`{command} {filename}`.
Examples:
mlr --prepipe gunzip
mlr --prepipe 'zcat -cf'
mlr --prepipe 'xz -cd'
mlr --prepipe cat
Note that this feature is quite general and is not limited to decompression
utilities. You can use it to apply per-file filters of your choice. For output
compression (or other) utilities, simply pipe the output:
`mlr ... | {your compression command} > outputfilenamegoeshere`
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
decisions that might have been made based on the file suffix. Likewise,
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
--bz2in Uncompress bzip2 within the Miller process. Done by
default if file ends in `.bz2`.
--gzin Uncompress gzip within the Miller process. Done by
default if file ends in `.gz`.
--prepipe {decompression command}
You can, of course, already do without this for
single input files, e.g. `gunzip < myfile.csv.gz |
mlr ...`. Allowed at the command line, but not in
`.mlrrc` to avoid unexpected code execution.
--prepipe-bz2 Same as `--prepipe bz2`, except this is allowed in
`.mlrrc`.
--prepipe-gunzip Same as `--prepipe gunzip`, except this is allowed in
`.mlrrc`.
--prepipe-zcat Same as `--prepipe zcat`, except this is allowed in
`.mlrrc`.
--prepipe-zstdcat Same as `--prepipe zstdcat`, except this is allowed
in `.mlrrc`.
--prepipex {decompression command}
Like `--prepipe` with one exception: doesn't insert
`<` between command and filename at runtime. Useful
for some commands like `unzip -qc` which don't read
standard input. Allowed at the command line, but not
in `.mlrrc` to avoid unexpected code execution.
--zin Uncompress zlib within the Miller process. Done by
default if file ends in `.z`.
--zstdin Uncompress zstd within the Miller process. Done by
default if file ends in `.zstd`.
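The two invocation shapes, `{command} < {filename}` for `--prepipe` and `{command} {filename}` for `--prepipex`, can be sketched as below. This is an illustration only: the helper name `prepipe_command` is hypothetical, and the shell-quoting of the file name is an assumption of this sketch rather than a statement about how Miller builds the command.

```python
import shlex


def prepipe_command(command, filename, use_prepipex=False):
    """Build the shell command line that would be run for one input file."""
    quoted = shlex.quote(filename)  # assumption: quote names containing spaces
    if use_prepipex:
        # --prepipex: the command takes the file name as an argument.
        return f"{command} {quoted}"
    # --prepipe: the command reads the file on standard input.
    return f"{command} < {quoted}"


print(prepipe_command("gunzip", "data.csv.gz"))
print(prepipe_command("unzip -qc", "data.zip", use_prepipex=True))
```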
CSV/TSV-ONLY FLAGS
These are flags which are applicable to CSV and TSV formats.
--allow-ragged-csv-input or --ragged or --allow-ragged-tsv-input
If a data line has fewer fields than the header line,
fill remaining keys with empty string. If a data line
has more fields than the header line, use integer
field labels as in the implicit-header case.
--csv-trim-leading-space Trims leading spaces in CSV data. Use this for data
like '"foo", "bar"' which is non-RFC-4180 compliant,
but common.
--headerless-csv-output or --ho or --headerless-tsv-output
Print only CSV/TSV data lines; do not print CSV/TSV
header lines.
--implicit-csv-header or --headerless-csv-input or --hi or --implicit-tsv-header
Use 1,2,3,... as field labels, rather than from line
1 of input files. Tip: combine with `label` to
recreate missing headers.
--lazy-quotes Accepts quotes appearing in unquoted fields, and
non-doubled quotes appearing in quoted fields.
--no-implicit-csv-header or --no-implicit-tsv-header
Opposite of `--implicit-csv-header`. This is the
default anyway -- the main use is for the flags to
`mlr join` if you have main file(s) which are
headerless but you want to join in on a file which
does have a CSV/TSV header. Then you could use `mlr
--csv --implicit-csv-header join
--no-implicit-csv-header -l
your-join-in-with-header.csv ...
your-headerless.csv`.
--quote-all Force double-quoting of CSV fields.
-N Keystroke-saver for `--implicit-csv-header
--headerless-csv-output`.
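The ragged-input rule described for `--allow-ragged-csv-input` can be sketched as follows. This is a minimal Python model, not Miller's CSV reader; `ragged_record` is a hypothetical name, and using the 1-based field position as the integer label for extra fields is an assumption based on the implicit-header description (`1,2,3,...`).

```python
def ragged_record(header, fields):
    """Pair a data line with its header, tolerating too few or too many fields."""
    record = {}
    for i, value in enumerate(fields):
        # Extra fields beyond the header get their 1-based position as the key.
        key = header[i] if i < len(header) else str(i + 1)
        record[key] = value
    # Header keys with no corresponding data field are filled with empty strings.
    for key in header[len(fields):]:
        record[key] = ""
    return record


header = ["a", "b", "c"]
print(ragged_record(header, ["1", "2"]))
print(ragged_record(header, ["1", "2", "3", "4"]))
```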
FILE-FORMAT FLAGS
See the File formats doc page, and/or `mlr help file-formats`, for more
about file formats Miller supports.
Examples: `--csv` for CSV-formatted input and output; `--icsv --opprint` for
CSV-formatted input and pretty-printed output.
Please use `--iformat1 --oformat2` rather than `--format1 --oformat2`.
The latter sets up input and output flags for `format1`, not all of which
are overridden in all cases by setting output format to `format2`.
--asv or --asvlite Use ASV format for input and output data.
--csv or -c Use CSV format for input and output data.
--csvlite Use CSV-lite format for input and output data.
--dkvp Use DKVP format for input and output data.
--gen-field-name Specify field name for --igen. Defaults to "i".
--gen-start Specify start value for --igen. Defaults to 1.
--gen-step Specify step value for --igen. Defaults to 1.
--gen-stop Specify stop value for --igen. Defaults to 100.
--iasv or --iasvlite Use ASV format for input data.
--icsv Use CSV format for input data.
--icsvlite Use CSV-lite format for input data.
--idkvp Use DKVP format for input data.
--igen Ignore input files and instead generate sequential
numeric input using --gen-field-name, --gen-start,
--gen-step, and --gen-stop values. See also the
seqgen verb, which is more useful/intuitive.
--ijson Use JSON format for input data.
--ijsonl Use JSON Lines format for input data.
--inidx Use NIDX format for input data.
--io {format name} Use format name for input and output data. For
example: `--io csv` is the same as `--csv`.
--ipprint Use PPRINT format for input data.
--itsv Use TSV format for input data.
--itsvlite Use TSV-lite format for input data.
--iusv or --iusvlite Use USV format for input data.
--ixtab Use XTAB format for input data.
--json or -j Use JSON format for input and output data.
--jsonl Use JSON Lines format for input and output data.
--nidx Use NIDX format for input and output data.
--oasv or --oasvlite Use ASV format for output data.
--ocsv Use CSV format for output data.
--ocsvlite Use CSV-lite format for output data.
--odkvp Use DKVP format for output data.
--ojson Use JSON format for output data.
--ojsonl Use JSON Lines format for output data.
--omd Use markdown-tabular format for output data.
--onidx Use NIDX format for output data.
--opprint Use PPRINT format for output data.
--otsv Use TSV format for output data.
--otsvlite Use TSV-lite format for output data.
--ousv or --ousvlite Use USV format for output data.
--oxtab Use XTAB format for output data.
--pprint Use PPRINT format for input and output data.
--tsv or -t Use TSV format for input and output data.
--tsvlite Use TSV-lite format for input and output data.
--usv or --usvlite Use USV format for input and output data.
--xtab Use XTAB format for input and output data.
--xvright Right-justify values for XTAB format.
-i {format name} Use format name for input data. For example: `-i csv`
is the same as `--icsv`.
-o {format name} Use format name for output data. For example: `-o
csv` is the same as `--ocsv`.
FLATTEN-UNFLATTEN FLAGS
These flags control how Miller converts record values which are maps or arrays, when input is JSON and output is non-JSON (flattening) or input is non-JSON and output is JSON (unflattening).
See the Flatten/unflatten doc page for more information.
--flatsep or --jflatsep {string}
Separator for flattening multi-level JSON keys, e.g.
`{"a":{"b":3}}` becomes `a.b => 3` for non-JSON
formats. Defaults to `.`.
--no-auto-flatten When output is non-JSON, suppress the default
auto-flatten behavior. Default: if `$y = [7,8,9]`
then this flattens to `y.1=7,y.2=8,y.3=9`, and
similarly for maps. With `--no-auto-flatten`, instead
we get `$y=[7, 8, 9]`.
--no-auto-unflatten When input is non-JSON and output is JSON, suppress
the default auto-unflatten behavior. Default: if the
input has `y.1=7,y.2=8,y.3=9` then this unflattens to
`$y=[7,8,9]`. With `--no-auto-unflatten`, instead we
get `${y.1}=7,${y.2}=8,${y.3}=9`.
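The flattening rule can be sketched in Python as below. This is a simplified model of the behavior described above (nested keys joined by the separator, array elements keyed from 1), not Miller's implementation; the function name `flatten` and its parameters are hypothetical, and edge cases such as empty maps and arrays are not modeled.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts and lists into a single-level dict."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Array elements are keyed by 1-based position, as in y.1, y.2, y.3.
        items = ((str(i), v) for i, v in enumerate(value, start=1))
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix, sep))
    return flat


print(flatten({"y": [7, 8, 9]}))
print(flatten({"dish": {"egg": 7, "flint": 8}, "garlic": ""}))
print(flatten({"a": {"b": 3}}, sep=":"))
```

The last call corresponds to choosing a different separator with `--flatsep`.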
FORMAT-CONVERSION KEYSTROKE-SAVER FLAGS
As keystroke-savers for format-conversion you may use the following.
The letters c, t, j, l, d, n, x, p, and m refer to formats CSV, TSV, JSON,
JSON Lines, DKVP, NIDX, XTAB, PPRINT, and markdown, respectively. Note that markdown
format is available for output only.
| In\out | CSV   | TSV   | JSON  | JSONL | DKVP  | NIDX  | XTAB  | PPRINT | Markdown |
+--------+-------+-------+-------+-------+-------+-------+-------+--------+----------+
| CSV    |       | --c2t | --c2j | --c2l | --c2d | --c2n | --c2x | --c2p  | --c2m    |
| TSV    | --t2c |       | --t2j | --t2l | --t2d | --t2n | --t2x | --t2p  | --t2m    |
| JSON   | --j2c | --j2t |       | --j2l | --j2d | --j2n | --j2x | --j2p  | --j2m    |
| JSONL  | --l2c | --l2t |       |       | --l2d | --l2n | --l2x | --l2p  | --l2m    |
| DKVP   | --d2c | --d2t | --d2j | --d2l |       | --d2n | --d2x | --d2p  | --d2m    |
| NIDX   | --n2c | --n2t | --n2j | --n2l | --n2d |       | --n2x | --n2p  | --n2m    |
| XTAB   | --x2c | --x2t | --x2j | --x2l | --x2d | --x2n |       | --x2p  | --x2m    |
| PPRINT | --p2c | --p2t | --p2j | --p2l | --p2d | --p2n | --p2x |        | --p2m    |
-p Keystroke-saver for `--nidx --fs space --repifs`.
-T Keystroke-saver for `--nidx --fs tab`.
JSON-ONLY FLAGS
These are flags which are applicable to JSON output format.
--jlistwrap or --jl Wrap JSON output in outermost `[ ]`. This is the
default for JSON output format.
--jvquoteall Force all JSON values -- recursively into lists and
objects -- to string.
--jvstack Put one key-value pair per line for JSON output
(multi-line output). This is the default for JSON
output format.
--no-jlistwrap Do not wrap JSON output in outermost `[ ]`. This is
the default for JSON Lines output format.
--no-jvstack Put objects/arrays all on one line for JSON output.
This is the default for JSON Lines output format.
LEGACY FLAGS
These are flags which don't do anything in the current Miller version.
They are accepted as no-op flags in order to keep old scripts from breaking.
--jknquoteint Type information from JSON input files is now
preserved throughout the processing stream.
--jquoteall Type information from JSON input files is now
preserved throughout the processing stream.
--json-fatal-arrays-on-input
Miller now supports arrays as of version 6.
--json-map-arrays-on-input
Miller now supports arrays as of version 6.
--json-skip-arrays-on-input
Miller now supports arrays as of version 6.
--jsonx The `--jvstack` flag is now default true in Miller 6.
--mmap Miller no longer uses memory-mapping to access data
files.
--no-mmap Miller no longer uses memory-mapping to access data
files.
--ojsonx The `--jvstack` flag is now default true in Miller 6.
--quote-minimal Ignored as of version 6. Types are inferred/retained
through the processing flow now.
--quote-none Ignored as of version 6. Types are inferred/retained
through the processing flow now.
--quote-numeric Ignored as of version 6. Types are inferred/retained
through the processing flow now.
--quote-original Ignored as of version 6. Types are inferred/retained
through the processing flow now.
--vflatsep Ignored as of version 6. This functionality is
subsumed into JSON formatting.
MISCELLANEOUS FLAGS
These are flags which don't fit into any other category.
--fflush Force buffered output to be written after every
output record. The default is to flush output after
every record if the output is to the terminal, or
less often if the output is to a file or a pipe. The
default is a significant performance optimization for
large files. Use this flag to force frequent updates
even when output is to a pipe or file, at a
performance cost.
--from {filename} Use this to specify an input file before the verb(s),
rather than after. May be used more than once.
Example: `mlr --from a.dat --from b.dat cat` is the
same as `mlr cat a.dat b.dat`.
--hash-records This is an internal parameter which normally does not
need to be modified. It controls the mechanism by
which Miller accesses fields within records. In
general --no-hash-records is faster, and is the
default. For specific use-cases involving data having
many fields, and many of them being processed during
a given processing run, --hash-records might offer a
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00-07 etc. scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
This is just like `put -f` and `filter -f` except
it's up-front on the command line, so you can do
something like `alias mlr='mlr --load ~/myscripts'`
if you like.
--mfrom {filenames} Use this to specify one or more input files before
the verb(s), rather than after. May be used more than
once. The list of filenames must end with `--`. This
is useful for example since `--from *.csv` doesn't do
what you might hope but `--mfrom *.csv --` does.
--mload {filenames} Like `--load` but works with more than one filename,
e.g. `--mload *.mlr --`.
--no-dedupe-field-names By default, if an input record has a field named `x`
and another also named `x`, the second will be
renamed `x_2`, and so on. With this flag provided,
the second `x`'s value will replace the first `x`'s
value when the record is read. This flag has no
effect on JSON input records, where duplicate keys
always result in the last one's value being retained.
--no-fflush Let buffered output not be written after every output
record. The default is to flush output after every
record if the output is to the terminal, or less
often if the output is to a file or a pipe. The
default is a significant performance optimization for
large files. Use this flag to allow less-frequent
updates when output is to the terminal. This is
unlikely to be a noticeable performance improvement,
since direct-to-screen output for large files has its
own overhead.
--no-hash-records See --hash-records.
--nr-progress-mod {m} With m a positive integer: print filename and record
count to stderr every m input records.
--ofmt {format} E.g. `%.18f`, `%.0f`, `%9.6e`. Please use
sprintf-style codes (https://pkg.go.dev/fmt) for
floating-point numbers. If not specified, default
formatting is used. See also the `fmtnum` function
and the `format-values` verb.
--ofmte {n} Use --ofmte 6 as shorthand for --ofmt %.6e, etc.
--ofmtf {n} Use --ofmtf 6 as shorthand for --ofmt %.6f, etc.
--ofmtg {n} Use --ofmtg 6 as shorthand for --ofmt %.6g, etc.
--records-per-batch {n} This is an internal parameter for the maximum number
of records in a batch. Normally this does not need
to be modified, except when input is from `tail -f`.
See also
https://miller.readthedocs.io/en/latest/reference-main-flag-list/.
--s-no-comment-strip {file name}
Take command-line flags from file name, like -s, but
with no comment-stripping. For more information
please see
https://miller.readthedocs.io/en/latest/scripting/.
--seed {n} with `n` of the form `12345678` or `0xcafefeed`. For
`put`/`filter` `urand`, `urandint`, and `urand32`.
--tz {timezone} Specify timezone, overriding `$TZ` environment
variable (if any).
-I Process files in-place. For each file name on the
command line, output is written to a temp file in the
same directory, which is then renamed over the
original. Each file is processed in isolation: if the
output format is CSV, CSV headers will be present in
each output file, statistics are only over each
file's own records; and so on.
-n Process no input files, nor standard input either.
Useful for `mlr put` with `begin`/`end` statements
only. (Same as `--from /dev/null`.) Also useful in
`mlr -n put -v '...'` for analyzing abstract syntax
trees (if that's your thing).
-s {file name} Take command-line flags from file name. For more
information please see
https://miller.readthedocs.io/en/latest/scripting/.
OUTPUT-COLORIZATION FLAGS
Miller uses colors to highlight outputs. You can specify color preferences.
Note: output colorization does not work on Windows.
Things having colors:
* Keys in CSV header lines, JSON keys, etc
* Values in CSV data lines, JSON scalar values, etc in regression-test output
* Some online-help strings
Rules for coloring:
* By default, colorize output only if writing to stdout and stdout is a TTY.
* Example: color: `mlr --csv cat foo.csv`
* Example: no color: `mlr --csv cat foo.csv > bar.csv`
* Example: no color: `mlr --csv cat foo.csv | less`
* The default colors were chosen since they look OK with white or black
terminal background, and are differentiable with common varieties of human
color vision.
Mechanisms for coloring:
* Miller uses ANSI escape sequences only. This does not work on Windows
except within Cygwin.
* Requires `TERM` environment variable to be set to non-empty string.
* Doesn't try to check to see whether the terminal is capable of 256-color
ANSI vs 16-color ANSI. Note that if colors are in the range 0..15
then 16-color ANSI escapes are used, so this is in the user's control.
How you can control colorization:
* Suppression/unsuppression:
* Environment variable `export MLR_NO_COLOR=true` means don't color
even if stdout+TTY.
* Environment variable `export MLR_ALWAYS_COLOR=true` means do color
even if not stdout+TTY.
For example, you might want to use this when piping mlr output to `less -r`.
* Command-line flags `--no-color` or `-M`, `--always-color` or `-C`.
* Color choices can be specified by using environment variables, or command-line
flags, with values 0..255:
* `export MLR_KEY_COLOR=208`, `MLR_VALUE_COLOR=33`, etc.:
`MLR_KEY_COLOR` `MLR_VALUE_COLOR` `MLR_PASS_COLOR` `MLR_FAIL_COLOR`
`MLR_REPL_PS1_COLOR` `MLR_REPL_PS2_COLOR` `MLR_HELP_COLOR`
* Command-line flags `--key-color 208`, `--value-color 33`, etc.:
`--key-color` `--value-color` `--pass-color` `--fail-color`
`--repl-ps1-color` `--repl-ps2-color` `--help-color`
* This is particularly useful if your terminal's background color clashes
with current settings.
If environment-variable settings and command-line flags are both provided, the
latter take precedence.
Colors can be specified using names such as "red" or "orchid": please see
`mlr --list-color-names` to see available names. They can also be specified using
numbers in the range 0..255, like 170: please see `mlr --list-color-codes`.
You can also use "bold", "underline", and/or "reverse". Additionally, combinations of
those can be joined with a "-", like "red-bold", "bold-170", "bold-underline", etc.
--always-color or -C Instructs Miller to colorize output even when it
normally would not. Useful for piping output to `less
-r`.
--fail-color Specify the color (see `--list-color-codes` and
`--list-color-names`) for failing cases in `mlr
regtest`.
--help-color Specify the color (see `--list-color-codes` and
`--list-color-names`) for highlights in `mlr help`
output.
--key-color Specify the color (see `--list-color-codes` and
`--list-color-names`) for record keys.
--list-color-codes Show the available color codes in the range 0..255,
such as 170 for example.
--list-color-names Show the names for the available color codes, such as
`orchid` for example.
--no-color or -M Instructs Miller to not colorize any output.
--pass-color Specify the color (see `--list-color-codes` and
`--list-color-names`) for passing cases in `mlr
regtest`.
--value-color Specify the color (see `--list-color-codes` and
`--list-color-names`) for record values.
PPRINT-ONLY FLAGS
These are flags which are applicable to PPRINT format.
--barred Prints a border around PPRINT output (not available
for input).
--right Right-justifies all fields for PPRINT output.
PROFILING FLAGS
These are flags for profiling Miller performance.
--cpuprofile {CPU-profile file name}
Create a CPU-profile file for performance analysis.
Instructions will be printed to stderr. This flag
must be the very first thing after 'mlr' on the
command line.
--time Print elapsed execution time in seconds to stderr at
the end of the execution of the program.
--traceprofile Create a trace-profile file for performance analysis.
Instructions will be printed to stderr. This flag
must be the very first thing after 'mlr' on the
command line.
SEPARATOR FLAGS
See the Separators doc page for more about record separators, field
separators, and pair separators. Also see the File formats doc page, or
`mlr help file-formats`, for more about the file formats Miller supports.
In brief:
* For DKVP records like `x=1,y=2,z=3`, the fields are separated by a comma,
the key and value within each pair are separated by an equals sign, and each
record is separated from the next by a newline.
* Each file format has its own default separators.
* Most formats, such as CSV, don't support pair-separators: keys are on the CSV
header line and values are on each CSV data line; keys and values are not
placed next to one another.
* Some separators are not programmable: for example JSON uses a colon as a
pair separator but this is non-modifiable in the JSON spec.
* You can set separators differently between Miller's input and output --
hence `--ifs` and `--ofs`, etc.
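How the three separators cooperate on a DKVP record can be sketched in Python. This is a simplified model rather than Miller's parser: `parse_dkvp` and `format_dkvp` are hypothetical names, and edge cases (such as a field with no pair separator) are not modeled.

```python
def parse_dkvp(text, fs=",", ps="=", rs="\n"):
    """Split text into records on RS, fields on FS, and key/value on PS."""
    records = []
    for line in text.split(rs):
        if line == "":
            continue  # skip the empty piece after the final record separator
        record = {}
        for field in line.split(fs):
            key, _, value = field.partition(ps)
            record[key] = value
        records.append(record)
    return records

def format_dkvp(records, fs=",", ps="=", rs="\n"):
    """Join records back together, possibly with different separators."""
    return "".join(
        fs.join(f"{k}{ps}{v}" for k, v in rec.items()) + rs for rec in records
    )

records = parse_dkvp("x=1,y=2,z=3\n")
print(format_dkvp(records, fs=";", ps=":"))
```

Reading with one set of separators and writing with another mirrors the split between `--ifs`/`--ips`/`--irs` and `--ofs`/`--ops`/`--ors`.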
Notes about line endings:
* Default line endings (`--irs` and `--ors`) are newline,
which is interpreted to accept carriage-return/newline files (e.g. on Windows)
for input, and to produce platform-appropriate line endings on output.
Notes about all other separators:
* IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
do key-value pairs appear juxtaposed.
* IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
a blank line. Everything above about `--irs/--ors/--rs auto` becomes
`--ifs/--ofs/--fs auto` for XTAB format. (XTAB's default IFS/OFS are "auto".)
* OFS must be single-character for PPRINT format. This is because it is used
with repetition for alignment; multi-character separators would make
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
* You can specify separators in any of the following ways, shown by example:
- Type them out, quoting as necessary for shell escapes, e.g.
`--fs '|' --ips :`
- C-style escape sequences, e.g. `--rs '\r\n' --fs '\t'`.
- To avoid backslashing, you can use any of the following names:
ascii_esc = "\x1b"
ascii_etx = "\x04"
ascii_fs = "\x1c"
ascii_gs = "\x1d"
ascii_null = "\x01"
ascii_rs = "\x1e"
ascii_soh = "\x02"
ascii_stx = "\x03"
ascii_us = "\x1f"
asv_fs = "\x1f"
asv_rs = "\x1e"
colon = ":"
comma = ","
cr = "\r"
crcr = "\r\r"
crlf = "\r\n"
crlfcrlf = "\r\n\r\n"
equals = "="
lf = "\n"
lflf = "\n\n"
newline = "\n"
pipe = "|"
semicolon = ";"
slash = "/"
space = " "
tab = "\t"
usv_fs = "\xe2\x90\x9f"
usv_rs = "\xe2\x90\x9e"
- Similarly, you can use the following for `--ifs-regex` and `--ips-regex`:
spaces = "( )+"
tabs = "(\t)+"
whitespace = "([ \t])+"
* Default separators by format:
Format    FS    PS    RS
csv       ","   N/A   "\n"
csvlite   ","   N/A   "\n"
dkvp      ","   "="   "\n"
json      N/A   N/A   N/A
markdown  " "   N/A   "\n"
nidx      " "   N/A   "\n"
pprint    " "   N/A   "\n"
tsv       "\t"  N/A   "\n"
xtab      "\n"  " "   "\n\n"
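The ways of writing a separator described above (literal text, C-style escapes, and names) can be sketched as a small lookup in Python. This is an illustration under stated assumptions, not Miller's code: only a few of the listed names are included, and `resolve_separator` and `split_on_regex` are hypothetical helpers.

```python
import re

# A few of the named separators listed above (not the full set).
NAMED_SEPARATORS = {
    "comma": ",", "tab": "\t", "pipe": "|", "semicolon": ";",
    "space": " ", "colon": ":", "equals": "=", "lf": "\n", "crlf": "\r\n",
}
# The named regexes for --ifs-regex and --ips-regex.
NAMED_REGEXES = {"spaces": "( )+", "tabs": "(\t)+", "whitespace": "([ \t])+"}

def resolve_separator(arg):
    """Map a name to its value, else decode C-style escapes like \\t."""
    if arg in NAMED_SEPARATORS:
        return NAMED_SEPARATORS[arg]
    # Plain text such as "|" passes through unchanged.
    return arg.encode("latin-1", "backslashreplace").decode("unicode_escape")

def split_on_regex(line, pattern):
    """Split a line wherever the separator regex matches."""
    pieces, start = [], 0
    for match in re.finditer(pattern, line):
        pieces.append(line[start:match.start()])
        start = match.end()
    pieces.append(line[start:])
    return pieces

print(split_on_regex("a   b  c", NAMED_REGEXES["spaces"]))
```

The regex form treats a run of separator characters as a single delimiter, which is why it suits data aligned with repeated spaces.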
--fs {string} Specify FS for input and output.
--ifs {string} Specify FS for input.
--ifs-regex {string} Specify FS for input as a regular expression.
--ips {string} Specify PS for input.
--ips-regex {string} Specify PS for input as a regular expression.
--irs {string} Specify RS for input.
--ofs {string} Specify FS for output.
--ops {string} Specify PS for output.
--ors {string} Specify RS for output.
--ps {string} Specify PS for input and output.
--repifs Let IFS be repeated: e.g. for splitting on multiple
spaces.
--rs {string} Specify RS for input and output.
AUXILIARY COMMANDS
Available entries:
mlr aux-list
mlr hex
mlr lecat
mlr termcvt
mlr unhex
For more information, please invoke mlr {subcommand} --help.
MLRRC
You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc.
For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file
and that will be the default input/output format unless otherwise specified on the command line.
The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional.
Hash-style comments and blank lines are ignored.
Sample .mlrrc:
# Input and output formats are CSV by default (unless otherwise specified
# on the mlr command line):
csv
# These are no-ops for CSV, but when I do use JSON output, I want these
# pretty-printing options to be used:
jvstack
jlistwrap
How to specify location of .mlrrc:
* If $MLRRC is set:
o If its value is "__none__" then no .mlrrc files are processed.
o Otherwise, its value (as a filename) is loaded and processed. If there are syntax
errors, they abort mlr with a usage message (as if you had mistyped something on the
command line). If the file can't be loaded at all, though, it is silently skipped.
o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is
set in the environment.
* Otherwise:
o If $HOME/.mlrrc exists, it's then processed as above.
o If ./.mlrrc exists, it's then also processed as above.
(I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
* The command-line flag "--norc" can be used to suppress loading the .mlrrc file even when other
conditions are met.
See also:
https://miller.readthedocs.io/en/latest/customization.html
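The .mlrrc rules above can be sketched in Python: one function turns file contents into command-line-style arguments, and another lists which files would be consulted. This is a simplified model, not Miller's loader. Both function names are hypothetical, comments are assumed to occupy whole lines, and file-existence checks are left out.

```python
def parse_mlrrc(text):
    """Turn .mlrrc contents into a flat list of command-line-style arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and hash-style comments are ignored
        if not line.startswith("--"):
            line = "--" + line  # the leading "--" is optional in the file
        args.extend(line.split(None, 1))  # "--option value" becomes two items
    return args

def mlrrc_candidates(env, home, norc=False):
    """List the .mlrrc paths to try, in processing order."""
    if norc:
        return []
    if "MLRRC" in env:
        # $MLRRC overrides the home-directory and current-directory files.
        return [] if env["MLRRC"] == "__none__" else [env["MLRRC"]]
    return [f"{home}/.mlrrc", "./.mlrrc"]

print(parse_mlrrc("# defaults\ncsv\n\njvstack\n--ifs ;\n"))
```

Because the current-directory file is processed after the home-directory one, its settings land later in the argument list and so stack over the earlier defaults.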
REPL
Usage: mlr repl [options] {zero or more data-file names}
-v Prints the expression's AST (abstract syntax tree), which gives
full transparency on the precedence and associativity rules of
Miller's grammar, to stdout.
-d Like -v but uses a parenthesized-expression format for the AST.
-D Like -d but with output all on one line.
-w Show warnings about uninitialized variables
-q Don't show startup banner
-s Don't show prompts
--load {DSL script file} Load script file before presenting the prompt.
If the name following --load is a directory, load all "*.mlr" files
in that directory.
--mload {DSL script files} -- Like --load but works with more than one filename,
e.g. '--mload *.mlr --'.
-h|--help Show this message.
Or any --icsv, --ojson, etc. reader/writer options as for the main Miller command line.
Any data-file names are opened just as if you had waited and typed :open {filenames}
at the Miller REPL prompt.
VERBS
altkv
Usage: mlr altkv [options]
Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
Options:
-h|--help Show this message.
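The pairing that altkv performs can be sketched in Python. This is a minimal illustration with a hypothetical function name; it models only an even number of values and says nothing about how Miller treats a leftover value.

```python
def altkv(values):
    """Pair alternating values: items 1,3,5,... are keys, 2,4,6,... are values."""
    return dict(zip(values[0::2], values[1::2]))

print(altkv(["a", "b", "c", "d", "e", "f"]))
```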
bar
Usage: mlr bar [options]
Replaces a numeric field with a number of asterisks, allowing for cheesy
bar plots. These align best with --opprint or --oxtab output format.
Options:
-f {a,b,c} Field names to convert to bars.
--lo {lo} Lower-limit value for min-width bar: default '0.000000'.
--hi {hi} Upper-limit value for max-width bar: default '100.000000'.
-w {n} Bar-field width: default '40'.
--auto Automatically computes limits, ignoring --lo and --hi.
Holds all records in memory before producing any output.
-c {character} Fill character: default '*'.
-x {character} Out-of-bounds character: default '#'.
-b {character} Blank character: default '.'.
Nominally the fill, out-of-bounds, and blank characters will be strings of length 1.
However you can make them all longer if you so desire.
-h|--help Show this message.
bootstrap
Usage: mlr bootstrap [options]
Emits an n-sample, with replacement, of the input records.
See also mlr sample and mlr shuffle.
Options:
-n Number of samples to output. Defaults to number of input records.
Must be non-negative.
-h|--help Show this message.
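Sampling with replacement, as bootstrap does, can be sketched in Python. The function name and the `rng` parameter are hypothetical, and Miller's random-number handling is not modeled.

```python
import random

def bootstrap(records, n=None, rng=None):
    """Draw n records with replacement; n defaults to the input size."""
    rng = rng or random.Random()
    if n is None:
        n = len(records)
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each draw is independent, so a record may appear more than once.
    return [rng.choice(records) for _ in range(n)]

sample = bootstrap([{"i": 1}, {"i": 2}, {"i": 3}], rng=random.Random(0))
print(len(sample))
```

Unlike a shuffle, the result may repeat some input records and omit others, even when n equals the number of input records.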
case
Usage: mlr case [options]
Changes the case of strings in record keys and/or values (upper, lower,
sentence, or title case).
Options:
-k Case only keys, not keys and values.
-v Case only values, not keys and values.
-f {a,b,c} Specify which field names to case (default: all)
-u Convert to uppercase
-l Convert to lowercase
-s Convert to sentence case (capitalize first letter)
-t Convert to title case (capitalize words)
-h|--help Show this message.
cat
Usage: mlr cat [options]
Passes input records directly to output. Most useful for format conversion.
Options:
-n Prepend field "n" to each record with record-counter starting at 1.
-N {name} Prepend field {name} to each record with record-counter starting at 1.
-g {a,b,c} Optional group-by-field names for counters, e.g. a,b,c
--filename Prepend current filename to each record.
--filenum Prepend current filenum (1-up) to each record.
-h|--help Show this message.
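The record counters added by -n, -N, and -g can be sketched in Python. This is an illustration with a hypothetical function name; it assumes every record contains the group-by fields and does not model what Miller does otherwise.

```python
def cat_with_counter(records, name="n", group_by=None):
    """Prepend a 1-up counter to each record, kept separately per group."""
    counters = {}
    out = []
    for rec in records:
        # Without group-by fields, every record shares the single key ().
        key = tuple(rec[f] for f in group_by) if group_by else ()
        counters[key] = counters.get(key, 0) + 1
        out.append({name: counters[key], **rec})  # counter field goes first
    return out

rows = [{"a": "pan"}, {"a": "eks"}, {"a": "pan"}]
print(cat_with_counter(rows, group_by=["a"]))
```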
check
Usage: mlr check [options]
Consumes records without printing any output, with the exception that
warnings are printed to stderr.
Useful for doing a well-formedness check on input data.
Current checks are:
* Whether the data are parseable
* Whether any key is the empty string
Options:
-h|--help Show this message.
clean-whitespace
Usage: mlr clean-whitespace [options]
For each record, for each field in the record, whitespace-cleans the keys and/or
values. Whitespace-cleaning entails stripping leading and trailing whitespace,
and replacing multiple whitespace with singles. For finer-grained control,
please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
and clean_whitespace.
Options:
-k|--keys-only Do not touch values.
-v|--values-only Do not touch keys.
It is an error to specify -k as well as -v -- to clean keys and values,
leave off -k as well as -v.
-h|--help Show this message.
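Whitespace-cleaning as described above (strip the ends, then collapse runs) can be sketched in Python. The function names are hypothetical, and the sketch assumes string values and treats any whitespace run matched by `\s+` as collapsible, which may differ in detail from Miller.

```python
import re

def clean(text):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"\s+", " ", text.strip())

def clean_whitespace(record, keys=True, values=True):
    """Clean keys and/or values; keys=False corresponds to -v, values=False to -k."""
    return {
        (clean(k) if keys else k): (clean(v) if values else v)
        for k, v in record.items()
    }

print(clean_whitespace({"  a  b ": " x   y "}))
```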
count-distinct
Usage: mlr count-distinct [options]
Prints number of records having distinct values for specified field names.
Same as uniq -c.
Options:
-f {a,b,c} Field names for distinct count.
-n Show only the number of distinct values. Not compatible with -u.
-o {name} Field name for output count. Default "count".
Ignored with -u.
-u Do unlashed counts for multiple field names. With -f a,b and
without -u, computes counts for distinct combinations of a
and b field values. With -f a,b and with -u, computes counts
for distinct a field values and counts for distinct b field
values separately.
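The difference between the default (lashed) counts and the -u (unlashed) counts can be sketched in Python. The function name is hypothetical, and the return shapes are chosen for illustration rather than to match Miller's output records.

```python
from collections import Counter

def count_distinct(records, fields, unlashed=False):
    """Count distinct field-value combinations, or each field separately."""
    if unlashed:
        # One independent tally per field name.
        return {f: Counter(r[f] for r in records if f in r) for f in fields}
    # One tally keyed by the combination of all field values.
    return Counter(
        tuple(r[f] for f in fields)
        for r in records
        if all(f in r for f in fields)
    )

rows = [{"a": "pan", "b": "x"}, {"a": "pan", "b": "y"}, {"a": "eks", "b": "x"}]
print(count_distinct(rows, ["a", "b"]))
print(count_distinct(rows, ["a", "b"], unlashed=True))
```

With these three rows, the lashed form sees three distinct (a, b) combinations with one record each, while the unlashed form reports that "pan" occurs twice in a and "x" occurs twice in b.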
count
Usage: mlr count [options]
Prints number of records, optionally grouped by distinct values for specified field names.
Options:
-g {a,b,c} Optional group-by-field names for counts, e.g. a,b,c
-n Show only the number of distinct values. Not interesting without -g.
-o {name} Field name for output-count. Default "count".
-h|--help Show this message.
count-similar
Usage: mlr count-similar [options]
Ingests all records, then emits each record augmented by a count of
the number of records (itself included) having the same group-by field values.
Options:
-g {a,b,c} Group-by-field names for counts, e.g. a,b,c
-o {name} Field name for output-counts. Defaults to "count".
-h|--help Show this message.
cut
Usage: mlr cut [options]
Passes through input records with specified fields included/excluded.
Options:
-f {a,b,c} Comma-separated field names for cut, e.g. a,b,c.
-o Retain fields in the order specified here in the argument list.
Default is to retain them in the order found in the input data.
-x|--complement Exclude, rather than include, field names specified by -f.
-r Treat field names as regular expressions. "ab", "a.*b" will
match any field name containing the substring "ab" or matching
"a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
be used. The -o flag is ignored when -r is present.
-h|--help Show this message.
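The interaction of -f, -o, -x, and -r can be sketched in Python. This is an illustration with a hypothetical function name, using Python's `re.search` for the regex case; Miller's own regex dialect and any quoting conventions are not modeled.

```python
import re

def cut(record, names, ordered=False, complement=False, regex=False):
    """Keep (or, with complement, drop) the fields selected by names."""
    def selected(key):
        if regex:
            # Unanchored patterns match anywhere in the field name.
            return any(re.search(pattern, key) for pattern in names)
        return key in names

    if complement:
        return {k: v for k, v in record.items() if not selected(k)}
    if ordered and not regex:
        # -o: follow the order of the argument list; ignored with -r.
        return {n: record[n] for n in names if n in record}
    # Default: follow the order found in the input record.
    return {k: v for k, v in record.items() if selected(k)}

rec = {"a": 1, "b": 2, "c": 3}
print(cut(rec, ["c", "a"]))
print(cut(rec, ["c", "a"], ordered=True))
```

The first call keeps the input order (a, then c); the second follows the argument order (c, then a).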