Use native byte/char arrays in Go #2818

RustanLeino · 2022-09-30T03:51:35Z

This PR does three things to improve performance of Go code that uses arrays:

* For nonempty arrays where the target representation of elements uses `uint8` or `rune`, the underlying `Array` uses a `[]uint8` or `[]rune` array, respectively, instead of the default `[]any`.
* For 1-dimensional arrays, the index is passed directly to the `ArrayGet`/`ArraySet` method in the Dafny runtime, rather than using a var-args version of those methods.
* When the static type of the array operations reveals that the underlying representation is `uint8` or `rune`, special versions of `ArrayGet`/`ArraySet` are called whose argument/return is a `uint8` or `rune`, respectively, rather than the default versions, which use an `any` (that is, `interface{}`) as the argument/return.
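
The three changes can be pictured with a small Go sketch. All names here (`Array`, `arrayForAny`, `arrayForByte`, `Get1Byte`, `Set1Byte`) are hypothetical and only loosely modeled on the runtime; the real interface and method signatures may differ. It contrasts boxed storage with `[]uint8` storage, var-args access with direct 1-dimensional access, and generic `any`-typed accessors with specialized `uint8` ones.

```go
package main

import "fmt"

// Array is a hypothetical, simplified stand-in for the Dafny Go runtime's
// array interface; the real runtime's names and signatures may differ.
type Array interface {
	Get(indices ...int) any // var-args access, usable for any dimension count
	Get1(i int) any         // direct 1-dimensional access, no var-args slice
	Set1(v any, i int)      // direct 1-dimensional update (value is boxed)
}

// arrayForAny is the default representation: every element is boxed in an any.
type arrayForAny struct{ contents []any }

func (a *arrayForAny) Get(indices ...int) any { return a.contents[indices[0]] } // 1-D only in this sketch
func (a *arrayForAny) Get1(i int) any          { return a.contents[i] }
func (a *arrayForAny) Set1(v any, i int)       { a.contents[i] = v }

// arrayForByte keeps its elements unboxed in a []uint8.
type arrayForByte struct{ contents []uint8 }

func (a *arrayForByte) Get(indices ...int) any { return a.contents[indices[0]] } // 1-D only in this sketch
func (a *arrayForByte) Get1(i int) any          { return a.contents[i] }
func (a *arrayForByte) Set1(v any, i int)       { a.contents[i] = v.(uint8) }

// Specialized accessors: generated code that statically knows the element
// type is uint8 can call these and avoid boxing on every access.
func (a *arrayForByte) Get1Byte(i int) uint8    { return a.contents[i] }
func (a *arrayForByte) Set1Byte(v uint8, i int) { a.contents[i] = v }

func main() {
	var boxed Array = &arrayForAny{contents: make([]any, 4)}
	boxed.Set1(uint8(7), 2) // the uint8 is boxed into an any

	bytes := &arrayForByte{contents: make([]uint8, 4)}
	bytes.Set1Byte(7, 2)                   // no boxing
	bytes.Set1Byte(bytes.Get1Byte(2)+1, 3) // still no boxing

	var generic Array = bytes // still usable through the generic interface
	fmt.Println(boxed.Get(2), generic.Get1(3)) // 7 8
}
```

The specialized accessors only pay off when the compiler can see the element type statically; otherwise the code falls back to the boxed `Get1`/`Set1` path, which matches the middle row of the timing table.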

Unlike the current compilation to Java, which introduces type descriptors for all types in order to specialize arrays, this PR uses existing type descriptors or the initialization elements provided by the program. The one case where the type cannot be recovered for specialization is for 0-length arrays, which this PR represents with an underlying nil array. It seems the design in this PR can be adapted to Java or JavaScript as well.
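
Choosing the representation from an initialization element can be sketched as a type switch. The helper `newBacking` below is hypothetical (the PR's actual code threads an example element through `EmitNewArray` in the compiler); it returns the backing slice as `any` purely for illustration, and mirrors the special case for 0-length arrays by returning `nil`.

```go
package main

import "fmt"

// newBacking is a hypothetical helper: given an example element produced by
// the program (for instance, the initial value of the new array) and a
// length, it picks a specialized backing slice. Illustration only.
func newBacking(example any, n int) any {
	if n == 0 {
		// An empty array offers no element to learn the type from in general
		// (e.g. when elements come from an initializer), so use nil.
		return nil
	}
	switch e := example.(type) {
	case uint8:
		s := make([]uint8, n)
		for i := range s {
			s[i] = e
		}
		return s
	case rune:
		s := make([]rune, n)
		for i := range s {
			s[i] = e
		}
		return s
	default:
		s := make([]any, n) // fall back to boxed elements
		for i := range s {
			s[i] = e
		}
		return s
	}
}

func main() {
	fmt.Printf("%T\n", newBacking(uint8(0), 3)) // []uint8
	fmt.Printf("%T\n", newBacking('x', 3))      // []int32 (rune is an alias for int32)
	fmt.Printf("%T\n", newBacking("x", 3))      // []interface {}
	fmt.Printf("%T\n", newBacking(uint8(0), 0)) // <nil>
}
```

The appeal of this approach, as opposed to threading type descriptors everywhere, is that the element value already carries its dynamic type at run time.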

The performance improvements are as follows. The benchmark program is included below, and the Go measurements were repeated 6 times (each varying by less than 0.2s from the numbers shown here).

| Target and features | Time |
| ------------- | ------------- |
| C# | 31.4s |
| Go with specialized arrays and no boxing across Set/Get | 27.4s |
| Go with specialized arrays, but still using boxes across Set/Get | 31.8s |
| Go with arrays of `any` | 103.6s |
| Go with arrays of `any` and using var-args even for 1-dim arrays | 179.1s |

```dafny
newtype byte = x | 0 <= x < 256
newtype int32 = x | -0x8000_0000 <= x < 0x8000_0000

method MM<X>(x: X) returns (r: X) { r := x; }

method Main() {
  for i := 0 to 10_000 {
    var a := GenerateArray();
    a := CopyArray(a);
    var s := SumArray(a);
    expect s == 0;
  }
  print "done\n";
}

method GenerateArray() returns (b: array<byte>)
  ensures b.Length == 256_000
{
  b := new byte[256_000];
  for i: int32 := 0 to 256_000 {
    b[i] := (i % 256) as byte;
  }
}

method CopyArray(a: array<byte>) returns (b: array<byte>)
  requires a.Length == 256_000
  ensures b.Length == 256_000
{
  b := new byte[256_000];
  for i: int32 := 0 to 256_000 {
    b[i] := a[i];
  }
}

method SumArray(a: array<byte>) returns (sb: byte)
  requires a.Length == 256_000
{
  var s: int32 := 0;
  for i: int32 := 0 to 256_000
    invariant 0 <= s < 256
  {
    s := s + a[i] as int32;
    if 256 <= s {
      s := s - 256;
    }
  }
  sb := s as byte;
}
```
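
Why `expect s == 0` holds: 256,000 = 1,000 × 256, so the array holds 1,000 copies of 0..255. Each copy sums to 32,640 = 127 × 256 + 128, i.e. 128 modulo 256, and 1,000 × 128 = 128,000 = 500 × 256 ≡ 0 (mod 256). The hand-written Go below is a hypothetical sketch of the same three methods over a native `[]uint8`, roughly the shape the specialized representation enables; it is not the code the Dafny compiler actually emits, and it runs one iteration instead of 10,000.

```go
package main

import "fmt"

const n = 256_000

// generateArray fills a byte slice with the repeating pattern 0..255.
func generateArray() []uint8 {
	b := make([]uint8, n)
	for i := int32(0); i < n; i++ {
		b[i] = uint8(i % 256)
	}
	return b
}

// copyArray copies element by element, as the Dafny benchmark does.
func copyArray(a []uint8) []uint8 {
	b := make([]uint8, n)
	for i := int32(0); i < n; i++ {
		b[i] = a[i]
	}
	return b
}

// sumArray sums modulo 256, keeping the running total in [0, 256).
func sumArray(a []uint8) uint8 {
	var s int32
	for i := int32(0); i < n; i++ {
		s += int32(a[i])
		if 256 <= s {
			s -= 256
		}
	}
	return uint8(s)
}

func main() {
	fmt.Println(sumArray(copyArray(generateArray()))) // 0
}
```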

## Other fixes along the way

As part of writing tests for the new functionality, I detected and fixed various other infelicities as well:

* fix: In Java, cast to non-box type after let
* fix: In JavaScript, compare arrays by comparing their references, not their elements (issue "Array comparison causes JavaScript run-time crash" #3207)
* fix: In C#, add some necessary casts
* In Go, improve the implementation of array comparisons
* In Go, fix array comparisons to use reference equality, not equality of elements ("Go incorrectly compares array values by contents, instead of by reference" #2708)
* fix: In Java, fix enumerations of `long` Java values (which previously ended up truncating numbers to `int`s)
* fix: In Java, don't confuse a Dafny-defined type `Long` with Java's `java.lang.Long` (and similarly for other integer types) (reported as "Emitted Java converts long to int and loses precision" #3204)
* fix: In Python, don't share inner arrays, which was previously done when arrays have 3 or more dimensions
* Across the target languages, pass `BigInteger` as array sizes and use native integers for array indices
* Pretty printing: Elide the array size in `new` only when the input program does (this uses `AutoGeneratedToken`, as in other places)
* When there are several `IntBoundedPool`s, try a little harder to pick the best bounds. Alas, there are still simple cases where the target code ends up enumerating all 32-bit integers, for example. To improve the target code further, we need to make use of the partial evaluation of expressions, which exists elsewhere in the Dafny implementation, and/or make use of run-time `Min` and `Max` routines.
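
The Go array-comparison fix in the list above (#2708) comes down to the difference between reference equality and element equality. The sketch below uses a hypothetical `byteArray` type, not the runtime's, to show that two arrays with identical contents must compare unequal under Dafny's `==`, while two references to the same array compare equal.

```go
package main

import (
	"bytes"
	"fmt"
)

// byteArray is a hypothetical stand-in for a Dafny array object compiled to Go.
type byteArray struct{ contents []uint8 }

func newByteArray(vals ...uint8) *byteArray {
	// Copy vals so that every array owns distinct storage.
	return &byteArray{contents: append([]uint8(nil), vals...)}
}

// sameContents is element-wise comparison, which is NOT what Dafny's ==
// means for arrays.
func sameContents(a, b *byteArray) bool {
	return bytes.Equal(a.contents, b.contents)
}

func main() {
	a := newByteArray(1, 2, 3)
	b := newByteArray(1, 2, 3)
	c := a
	// Dafny semantics: arrays are references, so compare pointers.
	fmt.Println(a == b, a == c, sameContents(a, b)) // false true true
}
```

Keeping array objects behind pointers is also convenient in Go because `==` on two interface values whose dynamic type is a slice panics at run time, whereas pointers always compare by identity.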

Fixes #3204
Fixes #2708
Fixes #3207

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

This solution uses an AutoGeneratedToken when filling in an omitted size. Previously, the code looked at the LiteralExpr in ArrayDimensions[0], which had the effect of also eliding a programmer-supplied size. Also, the previous implementation would crash if a size was given but wasn’t a LiteralExpr.


robin-aws · 2023-01-17T19:34:45Z

Just to update: I've been working on fixing the remaining issues with the new Test/unicodechars/comp/Arrays.dfy test case - I have them mostly fixed on a branch but there's still some non-trivial work to coerce Java functions correctly in the same style Go already does.

robin-aws · 2023-01-19T19:08:10Z

The Java fixes are rather involved so I'm going to move them to a separate PR, and just merge this one with Java disabled in the new Test/unicodechars/comp/Arrays.dfy test case for now.

@fabiomadge would you mind giving my two commits (6835539 and bcd2f04) a sanity check before I merge this? Rustan asked me to add whatever was necessary to complete this while he was on vacation, but I don't want to use that to sneak code in without review. :)

fabiomadge · 2023-01-19T22:53:48Z

Source/DafnyCore/Compilers/Compiler-python.cs

```diff
@@ -1665,7 +1661,7 @@ private class ClassWriter : IClassWriter {
 SeqType or MultiSetType => ("[", "]"),
 _ => ("{", "}")
 };
- wr.Write(TypeHelperName(ct));
+ wr.Write(ct is SeqType ? DafnySeqMakerFunction : TypeHelperName(ct));
```


Did you try doing this without circumventing the TypeHelperName?

I did at first and found the code generation was a lot messier.

fabiomadge · 2023-01-19T23:03:50Z

Source/DafnyRuntime/DafnyRuntime.py

```diff
+        if self.isStr is None or other.isStr is None:
+            isStr = None
+        else:
+            isStr = self.isStr and other.isStr
+        return Seq(super().__add__(other), isStr=isStr)
```


Are you sure that this changes the semantics?

Yup. Before this change, `Seq("a", isStr=None) + Seq("b", isStr=None)` would become the same as `Seq("ab", isStr=False)`, which would still go through the inference and decide it was a string, which we don't want.

```python
>>> (None and None) == None
True
>>> (None and None) == False
False
```

Ah you're right, self.isStr = None is all that needed to change. I added this at the same time and never tested it without. :) Thanks!

fabiomadge · 2023-01-19T23:08:05Z

Source/DafnyRuntime/DafnyRuntime.py

```diff
@@ -399,6 +402,12 @@ def plus_char(a, b):
 def minus_char(a, b):
     return chr(ord(a) - ord(b))

+def plus_unicode_char(a, b):
+    return CodePoint(plus_char(a, b))
```


I was certain I had made this remark elsewhere, but can't find it, so: can you overload `+` and `-` instead?

Ah true, I can do that. For the record I don't care much about the usability of CodePoint though, since I only expect compiled Dafny code to reference it.

It did end up simplifying the code generation logic a little though, which I DO care about :) , so thanks!


fabiomadge · 2023-01-25T02:49:50Z

The Python changes look ok to me.

Co-authored-by: Fabio Madge <[email protected]>
Co-authored-by: Robin Salkeld <[email protected]>
Co-authored-by: Aaron Tomb <[email protected]>

Fixes #3413.

Addresses the issues uncovered in #2818 by adding a `--unicode-char` mode version of `Test/comp/Arrays.dfy`. These issues all stem from incomplete handling of manual boxing/unboxing of the `CodePoint` type at various Java code generation points. Note that this is still incomplete, as there must be other spots that do not handle coercion correctly in general, but these changes at least cover `Arrays.dfy`.

It turned out that handling arrow coercion as in the Go compiler was not necessary for Java, because the initial casting of a function reference as a lambda is where the necessary boxing/unboxing needs to happen anyway. That is, a reference to a Dafny `function f(x: char): char` has to end up as a Java `Function<CodePoint, CodePoint>` and therefore be eta-expanded from the start.

Also removed the fail-fast behavior from `%testDafnyForEachCompiler`, as I found it more useful for debugging that way. Also implemented `ConcreteSyntaxTree.Comma` as part of an initial implementation of Java arrow conversion; it ended up not being used directly but seems useful enough to keep (and I refactored one spot to use it as an example).

Edit: I've also added a few more similar cases exposed as part of enabling `/unicodeChar:1` by default for Dafny 4.0: #3635

Co-authored-by: Rustan Leino <[email protected]>
Co-authored-by: Fabio Madge <[email protected]>
Co-authored-by: Aaron Tomb <[email protected]>
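
The arrow coercion mentioned above can be sketched in Go as eta expansion: wrap a function over unboxed characters in a lambda that unboxes its argument and boxes its result. This assumes, for illustration only, that the generic context expects `func(any) any`; `succChar` and `etaExpand` are hypothetical names, not the Go compiler's actual output.

```go
package main

import "fmt"

// succChar plays the role of a compiled Dafny `function f(x: char): char`
// whose generated Go code works on unboxed runes.
func succChar(x rune) rune { return x + 1 }

// etaExpand is a hypothetical coercion: it wraps an unboxed function as a
// function over boxed values (assumed here to be `any`), unboxing the
// argument and boxing the result on every call.
func etaExpand(f func(rune) rune) func(any) any {
	return func(x any) any { return f(x.(rune)) }
}

func main() {
	g := etaExpand(succChar) // g has type func(any) any
	fmt.Println(string(g('a').(rune))) // b
}
```

In Java the equivalent wrapper is needed as soon as the function reference is turned into a `Function<CodePoint, CodePoint>`, which is why the coercion can happen at that single point.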

RustanLeino added 30 commits September 26, 2022 10:12

chore: Improve formatting

fd573f8

Change Array from a struct to an interface in Go

0593726

fix: Cast to non-box type after let in Java

d398d89

fix: Comparison of arrays in JavaScript

310fca6

fix: Add needed casts in C#

9a5da01

Add tests

ed1c1d6

Improve EqualsGeneric for ArrayStruct

06232a3

Further improve NewValue…

6dc48fc

Add more tests

37e7c2e

Improve NewArray… methods further

b7160e1

Change “interface{}” to its alias “any” in Go

7e1adf7

chore: Remove unused method

ca2205d

Remove a use of EmitArraySelectAsLvalue

e6cb5b5

Remove another use of EmitArraySelectAsLvalue

91b44b3

Make EmitArrayUpdate return RHS wr rather than take RHS as parameter

c3f810a

Replace EmitArraySelectAsLvalue by ArrayLvalueImpl

8fb3c25

Replace ArrayIndex pointers with Set/Get

a15454a

Remove unused runtime routines

4b841ae

Change dims() into dimensionCount()/dimensionLength()

254100f

Move Get1/Set1 into Array interface

3b9086c

Use ArraySet1 instead of ArrayUpdate

af8c60d

Remove contents() from Array interface

6bb4e63

Improve C# array-size run-time checking

bf6eeeb

* Remove duplicate check
* Add omitted check
* Change error message to talk about “memory limit” rather than some C# property

Add exampleElement parameter to EmitNewArray

a883bd8

Create Go arrays using example element

c12c102

Add start parameter to CreateForLoop

d8940f6

Change virtual EmitNewArray to take strings instead of Expressions

8246df6

Rework new-array with init-function to support example

1482bfd

Add tests for more array types

5bda65d

RustanLeino added 6 commits December 16, 2022 17:01

fix: Default value of unicode char for JavaScript

e842bfc

Big cop-out: Java unicode boxing/unboxing not yet working

e5ee2aa

Some unicode fixes, and giving up on Java and Python

b9b7831

Merge branch 'master' into go-array-refresh

5069cee

Export only Array, not ArrayStruct, ArrayForByte, …

027e17c

Merge branch 'master' into go-array-refresh

8cc0b87

RustanLeino mentioned this pull request Dec 23, 2022

Sequence display with generic type causes Java assertion failure #2071

Closed

robin-aws added 3 commits January 6, 2023 12:06

Merge branch 'master' into go-array-refresh

3a8c6fe

Ensure isStr inference is disabled in more cases for --unicode-char in Python

6835539

Merge branch 'go-array-refresh' of github.com:RustanLeino/dafny into go-array-refresh

93720cc

robin-aws self-assigned this Jan 10, 2023

Merge branch 'master' into go-array-refresh

3522080

robin-aws added 2 commits January 19, 2023 10:10

Enabling Python in --unicode-char version of Arrays.dfy

bcd2f04

Merge branch 'master' into go-array-refresh

0353ee3

fabiomadge reviewed Jan 19, 2023


Overload + and - on CodePoint instead

57b80d0

fabiomadge reviewed Jan 20, 2023

Source/DafnyCore/Compilers/Compiler-python.cs (outdated)

fabiomadge reviewed Jan 20, 2023

Source/DafnyRuntime/DafnyRuntime.py (outdated)

robin-aws and others added 2 commits January 19, 2023 17:35

None and None == None

ec525c9

Apply suggestions from code review

d332ca3

Co-authored-by: Fabio Madge <[email protected]>

Merge branch 'master' into go-array-refresh

41bad87

robin-aws approved these changes Jan 25, 2023


robin-aws merged commit 822dadc into dafny-lang:master Jan 25, 2023

robin-aws mentioned this pull request Jan 26, 2023

Missing coercions in edge cases for characters in Java with --unicode-char #3413

Closed

robin-aws mentioned this pull request Feb 24, 2023

fix: Java --unicode-char mode coercion improvements #3630

Merged
