Natural sort (johnkerl#932)
* Add natural sort order as an option for the sort verb

* Add natural sort order as an option for the sort DSL function

* doc-build artifacts for on-line help

* webdocs

* codespell fix

* unit-test files for sort verb

* unit-test files for sort DSL function
johnkerl committed Feb 8, 2022
1 parent b3127eb commit ca9505d
Showing 28 changed files with 341 additions and 63 deletions.
17 changes: 13 additions & 4 deletions docs/src/manpage.md
@@ -1731,6 +1731,8 @@ VERBS
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.

Example:
@@ -2496,10 +2498,17 @@ FUNCTIONS FOR FILTER/PUT
(class=math #args=1) Hyperbolic sine.

sort
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.

splita
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on.
@@ -3162,5 +3171,5 @@ SEE ALSO



2022-02-07 MILLER(1)
2022-02-08 MILLER(1)
</pre>
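To make the documented default concrete, here is a minimal Python sketch (not Miller's Go implementation) of an ordering that puts numbers first, sorted numerically, then strings, sorted lexically, alongside a case-folded variant. The helper names `default_key` and `casefold_key` are hypothetical and exist only for illustration.

```python
def default_key(x):
    """Hypothetical key mimicking the documented default:
    numbers first (numerically), then strings (lexically)."""
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return (0, x, "")      # rank 0: numbers, compared by value
    return (1, 0, str(x))      # rank 1: strings, compared by code point


def casefold_key(x):
    """Hypothetical key for case-folded lexical ordering (the "c" flag)."""
    return str(x).casefold()


mixed = [3, "A", 1, "B", 22]
letters = ["E", "a", "c", "B", "d"]

print(sorted(mixed, key=default_key))                   # [1, 3, 22, 'A', 'B']
print(sorted(letters, key=default_key))                 # ['B', 'E', 'a', 'c', 'd']
print(sorted(letters, key=casefold_key))                # ['a', 'B', 'c', 'd', 'E']
print(sorted(letters, key=casefold_key, reverse=True))  # ['E', 'd', 'c', 'B', 'a']
```

Uppercase sorts before lowercase in the second line because plain lexical comparison goes by code point; case-folding removes that distinction.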
17 changes: 13 additions & 4 deletions docs/src/manpage.txt
@@ -1710,6 +1710,8 @@ VERBS
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.

Example:
@@ -2475,10 +2477,17 @@ FUNCTIONS FOR FILTER/PUT
(class=math #args=1) Hyperbolic sine.

sort
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
(class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.

splita
(class=conversion #args=2) Splits string into array with type inference. First argument is string to split; second is the separator to split on.
@@ -3141,4 +3150,4 @@ SEE ALSO



2022-02-07 MILLER(1)
2022-02-08 MILLER(1)
2 changes: 2 additions & 0 deletions docs/src/online-help.md
@@ -212,6 +212,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.

Example:
13 changes: 10 additions & 3 deletions docs/src/reference-dsl-builtin-functions.md
@@ -671,10 +671,17 @@ Map example: select({"a":1, "b":3, "c":5}, func(k,v) {return v >= 3}) returns {"

### sort
<pre class="pre-non-highlight-non-pair">
sort (class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
sort (class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements with numbers first numerically and then strings lexically, and map elements likewise by map keys. If the second argument is a string, it can contain any of "f" for lexical ("n" is for the above default), "c" for case-folded lexical, or "t" for natural sort order. An additional "r" in that string is for reverse. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values.
Examples:
Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
Default sorting: sort([3,"A",1,"B",22]) returns [1, 3, 22, "A", "B"].
Note that this is numbers before strings.
Default sorting: sort(["E","a","c","B","d"]) returns ["B", "E", "a", "c", "d"].
Note that this is uppercase before lowercase.
Case-folded ascending: sort(["E","a","c","B","d"], "c") returns ["a", "B", "c", "d", "E"].
Case-folded descending: sort(["E","a","c","B","d"], "cr") returns ["E", "d", "c", "B", "a"].
Natural sorting: sort(["a1","a10","a100","a2","a20","a200"], "t") returns ["a1", "a2", "a10", "a20", "a100", "a200"].
Array with function: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1].
Map with function: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}.
</pre>
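The comparator contract described above (negative, zero, or positive for less, equal, or greater) is the classic three-way comparison. As a rough Python analogue, and not a description of how Miller implements it, `functools.cmp_to_key` adapts such a function for sorting. The names `spaceship`, `sort_array`, and `sort_map` are hypothetical stand-ins for the DSL's `<=>` operator and the two call shapes of `sort`.

```python
from functools import cmp_to_key


def spaceship(a, b):
    """Three-way comparison: -1, 0, or 1 as a < b, a == b, or a > b."""
    return (a > b) - (a < b)


def sort_array(values, cmp):
    """Return a sorted copy of a list using a two-argument comparator."""
    return sorted(values, key=cmp_to_key(cmp))


def sort_map(m, cmp):
    """Return a sorted copy of a dict using a four-argument comparator
    that receives (ak, av, bk, bv)."""
    pair_cmp = lambda x, y: cmp(x[0], x[1], y[0], y[1])
    return dict(sorted(m.items(), key=cmp_to_key(pair_cmp)))


# Descending array sort, like func(a,b) {return b <=> a}
print(sort_array([5, 2, 3, 1, 4], lambda a, b: spaceship(b, a)))
# [5, 4, 3, 2, 1]

# Map sorted by descending value, like func(ak,av,bk,bv) {return bv <=> av}
print(sort_map({"c": 2, "a": 3, "b": 1},
               lambda ak, av, bk, bv: spaceship(bv, av)))
# {'a': 3, 'c': 2, 'b': 1}
```

Swapping the operands of the comparison, as in both calls here, is what reverses the order.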

## Math functions
2 changes: 2 additions & 0 deletions docs/src/reference-verbs.md
@@ -2803,6 +2803,8 @@ Options:
-n {comma-separated field names} Numerical ascending; nulls sort last
-nf {comma-separated field names} Same as -n
-nr {comma-separated field names} Numerical descending; nulls sort first
-t {comma-separated field names} Natural ascending
-tr {comma-separated field names} Natural descending
-h|--help Show this message.

Example:
42 changes: 28 additions & 14 deletions docs/src/sorting.md
@@ -24,10 +24,12 @@ Miller gives you three ways to sort your data:

## Sorting records: the sort verb

The `sort` verb (see [its documentation](reference-verbs.md#sort) for more
information) reorders entire records within the data stream. You can sort
lexically (with or without case-folding) or numerically, ascending or
descending; and you can sort primary by one column, then secondarily by
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more information) reorders
entire records within the data stream. You can sort lexically (with or without case-folding),
numerically, or naturally (see
[https://en.wikipedia.org/wiki/Natural_sort_order](https://en.wikipedia.org/wiki/Natural_sort_order)
or [https://github.com/facette/natsort](https://github.com/facette/natsort) for more about natural
sorting); ascending or descending; and you can sort primarily by one column, then secondarily by
another, etc.
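
For readers new to the idea, natural ordering can be sketched in a few lines of Python: split each string into digit and non-digit runs and compare the digit runs as integers. This is only an illustration under that simplifying assumption. `natural_key` is a hypothetical helper, and Miller's actual behavior comes from the `natsort` package linked above, which may treat edge cases (leading zeros, signs, decimals) differently.

```python
import re

_DIGIT_RUN = re.compile(r"(\d+)")


def natural_key(s):
    """Hypothetical key: text chunks stay strings, digit chunks become ints.

    re.split with a capturing group puts text at even indices and
    digit runs at odd indices, so keys always compare like with like.
    """
    parts = _DIGIT_RUN.split(s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["a1", "a10", "a100", "a2", "a20", "a200"]

print(sorted(names))                   # ['a1', 'a10', 'a100', 'a2', 'a20', 'a200']
print(sorted(names, key=natural_key))  # ['a1', 'a2', 'a10', 'a20', 'a100', 'a200']
```

Plain lexical ordering places "a10" before "a2" because it compares character by character; the natural key compares 10 and 2 as numbers instead.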

Input data:
@@ -143,13 +145,13 @@ a b c
## The sort function by example

* It returns a sorted copy of an input array or map.
* Without second argument, uses the natural ordering.
* With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending.
* Without a second argument, it uses Miller's default ordering: numbers first, sorted numerically, then strings, sorted lexically.
* With a string as the second argument, it takes sorting flags from that string: `"f"` for lexical, `"c"` for case-folded lexical, or `"t"` for natural sort order. An additional `"r"` in the string requests reverse/descending order.

<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with natural ordering</b>
<b> # Sort array with default ordering</b>
<b> print sort([5,2,3,1,4]);</b>
<b> }</b>
<b>'</b>
@@ -161,7 +163,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with reverse-natural ordering</b>
<b> # Sort array with reverse-default ordering</b>
<b> print sort([5,2,3,1,4], "r");</b>
<b> }</b>
<b>'</b>
@@ -173,7 +175,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with custom function: natural ordering</b>
<b> # Sort array with custom function: another way to get default ordering</b>
<b> print sort([5,2,3,1,4], func(a,b) { return a <=> b});</b>
<b> }</b>
<b>'</b>
@@ -185,7 +187,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort array with custom function: reverse-natural ordering</b>
<b> # Sort array with custom function: another way to get reverse-default ordering</b>
<b> print sort([5,2,3,1,4], func(a,b) { return b <=> a});</b>
<b> }</b>
<b>'</b>
@@ -197,7 +199,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with natural ordering on keys</b>
<b> # Sort map with default ordering on keys</b>
<b> print sort({"c":2, "a": 3, "b": 1});</b>
<b> }</b>
<b>'</b>
@@ -213,7 +215,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with reverse-natural ordering on keys</b>
<b> # Sort map with reverse-default ordering on keys</b>
<b> print sort({"c":2, "a": 3, "b": 1}, "r");</b>
<b> }</b>
<b>'</b>
@@ -229,7 +231,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with custom function: natural ordering on values</b>
<b> # Sort map with custom function: default ordering on values</b>
<b> print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv});</b>
<b> }</b>
<b>'</b>
@@ -245,7 +247,7 @@ a b c
<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Sort map with custom function: reverse-natural ordering on values</b>
<b> # Sort map with custom function: reverse-default ordering on values</b>
<b> print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av});</b>
<b> }</b>
<b>'</b>
@@ -258,6 +260,18 @@ a b c
}
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr -n put '</b>
<b> end {</b>
<b> # Natural sort</b>
<b> print sort(["a1","a10","a100","a2","a20","a200"], "t");</b>
<b> }</b>
<b>'</b>
</pre>
<pre class="pre-non-highlight-in-pair">
["a1", "a2", "a10", "a20", "a100", "a200"]
</pre>

In the rest of this page we'll look more closely at these variants.

## Simple sorting of arrays
39 changes: 25 additions & 14 deletions docs/src/sorting.md.in
@@ -8,10 +8,12 @@ Miller gives you three ways to sort your data:

## Sorting records: the sort verb

The `sort` verb (see [its documentation](reference-verbs.md#sort) for more
information) reorders entire records within the data stream. You can sort
lexically (with or without case-folding) or numerically, ascending or
descending; and you can sort primary by one column, then secondarily by
The `sort` verb (see [its documentation](reference-verbs.md#sort) for more information) reorders
entire records within the data stream. You can sort lexically (with or without case-folding),
numerically, or naturally (see
[https://en.wikipedia.org/wiki/Natural_sort_order](https://en.wikipedia.org/wiki/Natural_sort_order)
or [https://github.com/facette/natsort](https://github.com/facette/natsort) for more about natural
sorting); ascending or descending; and you can sort primarily by one column, then secondarily by
another, etc.

Input data:
@@ -55,13 +57,13 @@ GENMD-EOF
## The sort function by example

* It returns a sorted copy of an input array or map.
* Without second argument, uses the natural ordering.
* With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending.
* Without a second argument, it uses Miller's default ordering: numbers first, sorted numerically, then strings, sorted lexically.
* With a string as the second argument, it takes sorting flags from that string: `"f"` for lexical, `"c"` for case-folded lexical, or `"t"` for natural sort order. An additional `"r"` in the string requests reverse/descending order.

GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with natural ordering
# Sort array with default ordering
print sort([5,2,3,1,4]);
}
'
@@ -70,7 +72,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with reverse-natural ordering
# Sort array with reverse-default ordering
print sort([5,2,3,1,4], "r");
}
'
@@ -79,7 +81,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with custom function: natural ordering
# Sort array with custom function: another way to get default ordering
print sort([5,2,3,1,4], func(a,b) { return a <=> b});
}
'
@@ -88,7 +90,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort array with custom function: reverse-natural ordering
# Sort array with custom function: another way to get reverse-default ordering
print sort([5,2,3,1,4], func(a,b) { return b <=> a});
}
'
@@ -97,7 +99,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with natural ordering on keys
# Sort map with default ordering on keys
print sort({"c":2, "a": 3, "b": 1});
}
'
@@ -106,7 +108,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with reverse-natural ordering on keys
# Sort map with reverse-default ordering on keys
print sort({"c":2, "a": 3, "b": 1}, "r");
}
'
@@ -115,7 +117,7 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with custom function: natural ordering on values
# Sort map with custom function: default ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv});
}
'
@@ -124,12 +126,21 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr -n put '
end {
# Sort map with custom function: reverse-natural ordering on values
# Sort map with custom function: reverse-default ordering on values
print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av});
}
'
GENMD-EOF

GENMD-RUN-COMMAND
mlr -n put '
end {
# Natural sort
print sort(["a1","a10","a100","a2","a20","a200"], "t");
}
'
GENMD-EOF

In the rest of this page we'll look more closely at these variants.

## Simple sorting of arrays
1 change: 1 addition & 0 deletions go.mod
@@ -17,6 +17,7 @@ module github.com/johnkerl/miller
go 1.15

require (
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb // indirect
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca // indirect
github.com/johnkerl/lumin v1.0.0 // indirect
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51
2 changes: 2 additions & 0 deletions go.sum
@@ -1,5 +1,7 @@
github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb h1:IT4JYU7k4ikYg1SCxNI1/Tieq/NFvh6dzLdgi7eu0tM=
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb/go.mod h1:bH6Xx7IW64qjjJq8M2u4dxNaBiDfKK+z/3eGDpXEQhc=
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca h1:NuA6w6b01Ojdig+4K1l9p4Pp3unlv4owphbOiENm8m4=
github.com/goccmack/gocc v0.0.0-20211213154817-7ea699349eca/go.mod h1:c4Mb67Mg9+pl6OlxvnFBUiiQOSlXfh0QukINLl54OD0=
github.com/johnkerl/lumin v1.0.0 h1:CV34cHZOJ92Y02RbQ0rd4gA0C06Qck9q8blOyaPoWpU=