Commit

update mlr -O behavior for #756 (#788)
johnkerl committed Dec 22, 2021
1 parent fafff68 commit 93862f1
Showing 12 changed files with 75 additions and 45 deletions.
5 changes: 4 additions & 1 deletion Makefile
@@ -189,6 +189,9 @@ dev:
make -C docs
@echo DONE

docs:
make -C docs

# ----------------------------------------------------------------
# Keystroke-savers
it: build check
@@ -216,4 +219,4 @@ release_tarball: build check

# ================================================================
# Go does its own dependency management, outside of make.
.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev
.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev docs
7 changes: 4 additions & 3 deletions docs/src/manpage.md
@@ -488,10 +488,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00-07 etc. scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@@ -3006,5 +3007,5 @@ SEE ALSO



2021-12-15 MILLER(1)
2021-12-22 MILLER(1)
</pre>
7 changes: 4 additions & 3 deletions docs/src/manpage.txt
@@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00-07 etc. scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@@ -2985,4 +2986,4 @@ SEE ALSO



2021-12-15 MILLER(1)
2021-12-22 MILLER(1)
2 changes: 1 addition & 1 deletion docs/src/new-in-miller-6.md
@@ -251,7 +251,7 @@ The following differences are rather technical. If they don't sound familiar to
* See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags).
* Type-inference:
* The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers.
* Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal.
* Octal-looking numbers like `0123` and `07` are type-inferred as strings by default. Use `mlr -O` to infer them as octal integers. Note that `08` and `09`, which contain non-octal digits, will then infer as floats.
* See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags).
* Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the
[page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information.
2 changes: 1 addition & 1 deletion docs/src/new-in-miller-6.md.in
@@ -209,7 +209,7 @@ The following differences are rather technical. If they don't sound familiar to
* See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags).
* Type-inference:
* The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers.
* Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal.
* Octal-looking numbers like `0123` and `07` are type-inferred as strings by default. Use `mlr -O` to infer them as octal integers. Note that `08` and `09`, which contain non-octal digits, will then infer as floats.
* See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags).
* Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the
[page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information.
4 changes: 2 additions & 2 deletions docs/src/reference-main-flag-list.md
@@ -345,10 +345,10 @@ These are flags which don't fit into any other category.
`: This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit.
* `--infer-int-as-float or -A
`: Cast all integers in data files to floats.
* `--infer-no-octal or -O
`: Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc.
* `--infer-none or -S
`: Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings.
* `--infer-octal or -O
`: Treat numbers like 0123 in data files as numeric; default is string. Note that 00-07 etc. scan as int; 08-09 scan as float.
* `--load {filename}
`: Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. This is just like `put -f` and `filter -f` except it's up-front on the command line, so you can do something like `alias mlr='mlr --load ~/myscripts'` if you like.
* `--mfrom {filenames}
7 changes: 4 additions & 3 deletions internal/pkg/cli/option_parse.go
@@ -2619,11 +2619,12 @@ data having many fields, and many of them being processed during a given process
},

{
name: "--infer-no-octal",
name: "--infer-octal",
altNames: []string{"-O"},
help: `Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc.`,
help: `Treat numbers like 0123 in data files as numeric; default is string.
Note that 00-07 etc. scan as int; 08-09 scan as float.`,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
mlrval.SetInferrerNoOctal()
mlrval.SetInferrerOctalAsInt()
*pargi += 1
},
},
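The flag-table entry above pairs a long name with alternate names and a parser callback that advances the argument index. A minimal sketch of how such a table could be consulted follows. The types `flagSpec` and `options` and the function `parseFlags` are hypothetical stand-ins for illustration, not Miller's actual `TOptions` machinery.

```go
package main

import "fmt"

// options is a hypothetical, pared-down stand-in for the real options struct.
type options struct {
	octalAsInt bool
}

// flagSpec is a hypothetical, pared-down version of the table entry in the diff.
type flagSpec struct {
	name     string
	altNames []string
	parser   func(args []string, argc int, pargi *int, opts *options)
}

var flagTable = []flagSpec{
	{
		name:     "--infer-octal",
		altNames: []string{"-O"},
		parser: func(args []string, argc int, pargi *int, opts *options) {
			opts.octalAsInt = true
			*pargi += 1 // this flag takes no argument, so consume one token
		},
	},
}

// matches reports whether arg is the flag's long name or one of its alternates.
func (f flagSpec) matches(arg string) bool {
	if arg == f.name {
		return true
	}
	for _, alt := range f.altNames {
		if arg == alt {
			return true
		}
	}
	return false
}

// parseFlags consumes leading flags and returns the options plus the remaining arguments.
func parseFlags(args []string) (options, []string) {
	var opts options
	argc := len(args)
	argi := 0
	for argi < argc {
		matched := false
		for _, f := range flagTable {
			if f.matches(args[argi]) {
				f.parser(args, argc, &argi, &opts)
				matched = true
				break
			}
		}
		if !matched {
			break // first non-flag token ends flag parsing
		}
	}
	return opts, args[argi:]
}

func main() {
	opts, rest := parseFlags([]string{"-O", "cat", "data.csv"})
	fmt.Println(opts.octalAsInt, rest)
}
```

In this sketch both `-O` and `--infer-octal` set the same option, which mirrors how the renamed flag keeps its short form in the commit.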
56 changes: 35 additions & 21 deletions internal/pkg/mlrval/mlrval_infer.go
@@ -7,7 +7,7 @@ import (
"github.com/johnkerl/miller/internal/pkg/lib"
)

// TODO: no infer-bool from data files. Always false in this path.
// TODO: comment no infer-bool from data files. Always false in this path.

// It's essential that we use mv.Type() not mv.mvtype since types are
// JIT-computed on first access for most data-file values. See type.go for more
@@ -23,14 +23,24 @@ func (mv *Mlrval) Type() MVType {
// Support for mlr -S, mlr -A, mlr -O.
type tInferrer func(mv *Mlrval, input string, inferBool bool) *Mlrval

var packageLevelInferrer tInferrer = inferNormally
var packageLevelInferrer tInferrer = inferWithOctalAsString

func SetInferrerNoOctal() {
packageLevelInferrer = inferWithOctalSuppress
// SetInferrerOctalAsString is for default behavior.
func SetInferrerOctalAsString() {
packageLevelInferrer = inferWithOctalAsString
}

// SetInferrerOctalAsInt is for mlr -O.
func SetInferrerOctalAsInt() {
packageLevelInferrer = inferWithOctalAsInt
}

// SetInferrerIntAsFloat is for mlr -A.
func SetInferrerIntAsFloat() {
packageLevelInferrer = inferWithIntAsFloat
}

// SetInferrerStringOnly is for mlr -S.
func SetInferrerStringOnly() {
packageLevelInferrer = inferStringOnly
}
@@ -47,7 +57,24 @@ var downcasedFloatNamesToNotInfer = map[string]bool{
"nan": true,
}

func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval {
var octalDetector = regexp.MustCompile("^-?0[0-9]+")

// inferWithOctalAsString is for default behavior.
func inferWithOctalAsString(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferWithOctalAsInt(mv, input, inferBool)
if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT {
return mv
}

if octalDetector.MatchString(mv.printrep) {
return mv.SetFromString(input)
} else {
return mv
}
}

// inferWithOctalAsInt is for mlr -O.
func inferWithOctalAsInt(mv *Mlrval, input string, inferBool bool) *Mlrval {
if input == "" {
return mv.SetFromVoid()
}
@@ -73,30 +100,17 @@ func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval {
return mv.SetFromString(input)
}

var octalDetector = regexp.MustCompile("^-?0[0-9]+")

func inferWithOctalSuppress(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferNormally(mv, input, inferBool)
if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT {
return mv
}

if octalDetector.MatchString(mv.printrep) {
return mv.SetFromString(input)
} else {
return mv
}
}

// inferWithIntAsFloat is for mlr -A.
func inferWithIntAsFloat(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferNormally(mv, input, inferBool)
inferWithOctalAsString(mv, input, inferBool)
if mv.Type() == MT_INT {
mv.floatval = float64(mv.intval)
mv.mvtype = MT_FLOAT
}
return mv
}

// inferStringOnly is for mlr -S.
func inferStringOnly(mv *Mlrval, input string, inferBool bool) *Mlrval {
return mv.SetFromString(input)
}
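To make the 07-versus-08 note concrete, here is a minimal, self-contained sketch of the two inference modes. It is not Miller's code: `inferKind` is a hypothetical helper, and scanning with `strconv.ParseInt` in base 0 followed by `strconv.ParseFloat` is an assumption about how the int-then-float fallback behaves. Only the leading-zero pattern is taken from the diff.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Same pattern as octalDetector in the diff: optional minus sign,
// a leading zero, then at least one more digit.
var leadingZero = regexp.MustCompile("^-?0[0-9]+")

// inferKind is a hypothetical helper that names the inferred type of input.
// octalAsInt=false models the new default; octalAsInt=true models mlr -O.
func inferKind(input string, octalAsInt bool) string {
	if input == "" {
		return "empty"
	}
	kind := "string"
	// Assumption: base 0 lets a leading "0" select octal, so "08" fails here.
	if _, err := strconv.ParseInt(input, 0, 64); err == nil {
		kind = "int"
	} else if _, err := strconv.ParseFloat(input, 64); err == nil {
		// "08" is not valid octal but does parse as the float 8.
		kind = "float"
	}
	// Default mode: numeric-looking values with a leading zero stay strings.
	if kind != "string" && !octalAsInt && leadingZero.MatchString(input) {
		return "string"
	}
	return kind
}

func main() {
	for _, s := range []string{"0123", "07", "08", "123", "0.5", "abc"} {
		fmt.Printf("%-5s default=%-6s -O=%s\n", s, inferKind(s, false), inferKind(s, true))
	}
}
```

Under these assumptions, `0123` and `07` are strings by default and ints with `-O`, while `08` is a string by default and a float with `-O`, which matches the behavior the help text describes.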
13 changes: 10 additions & 3 deletions internal/pkg/mlrval/mlrval_output.go
@@ -19,10 +19,17 @@ func (mv *Mlrval) String() string {
if floatOutputFormatter != nil && mv.Type() == MT_FLOAT {
// Use the format string from global --ofmt, if supplied
return floatOutputFormatter.FormatFloat(mv.floatval)
} else {
mv.setPrintRep()
return mv.printrep
}

// TODO: track dirty-flag checking / somesuch.
// At present it's cumbersome to check if an array or map has been modified
// and it's safest to always recompute the string-rep.
if mv.IsArrayOrMap() {
mv.printrepValid = false
}

mv.setPrintRep()
return mv.printrep
}

// See mlrval.go for more about JIT-formatting of string backings
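The change to `String()` above always recomputes the string representation for arrays and maps, because there is no cheap way to tell whether a collection was modified in place. A rough sketch of that caching trade-off follows. The `value` type and its fields are hypothetical and far simpler than the real `Mlrval`.

```go
package main

import (
	"fmt"
	"strconv"
)

// value is a hypothetical stand-in: either a scalar int or, when items is
// non-nil, a collection whose elements callers may mutate in place.
type value struct {
	scalar        int
	items         []int
	printrep      string
	printrepValid bool
}

// String returns the cached representation for scalars, but always
// recomputes it for collections since no dirty flag is tracked for them.
func (v *value) String() string {
	if v.items != nil {
		v.printrepValid = false
	}
	if !v.printrepValid {
		if v.items != nil {
			v.printrep = fmt.Sprint(v.items)
		} else {
			v.printrep = strconv.Itoa(v.scalar)
		}
		v.printrepValid = true
	}
	return v.printrep
}

func main() {
	v := &value{items: []int{1, 2}}
	fmt.Println(v.String())
	v.items[0] = 9 // in-place mutation that a cache would otherwise miss
	fmt.Println(v.String())
}
```

The cost is a recomputation on every call for collections; a dirty flag set by every mutating operation would avoid that, which is what the TODO in the diff alludes to.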
7 changes: 4 additions & 3 deletions man/manpage.txt
@@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00-07 etc. scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@@ -2985,4 +2986,4 @@ SEE ALSO



2021-12-15 MILLER(1)
2021-12-22 MILLER(1)
9 changes: 5 additions & 4 deletions man/mlr.1
@@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2021-12-15
.\" Date: 2021-12-22
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2021-12-15" "\ \&" "\ \&"
.TH "MILLER" "1" "2021-12-22" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -586,10 +586,11 @@ These are flags which don't fit into any other category.
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00-07 etc. scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
1 change: 1 addition & 0 deletions todo.txt
@@ -5,6 +5,7 @@ PUNCHDOWN LIST
- sort-hof check
- more linux perf checks
- mlr -O / abor!
> doc 07 int 08 float
- --ifs-regex & --ips-regex -- guessing is not safe as evidenced by '.' and '|'
- big-picture item @ Rmd (csv memes; and beyond); also webdoc intro page
- function: randsel for arrays; use for example-csv-expander

0 comments on commit 93862f1
