Safe Access to Contiguous Storage #2307

Open
wants to merge 52 commits into
base: main
Changes from 1 commit
8093cad
[draft proposal] Safe Access to Contiguous Storage
glessard Feb 6, 2024
548b250
edit placeholder proposal url
glessard Feb 6, 2024
7cfd46d
link to pitch thread
glessard Feb 6, 2024
9993694
declare typealiases correctly
glessard Feb 9, 2024
fece228
add “first” and “last” properties
glessard Feb 9, 2024
a3784fe
fix inits from raw pointers
glessard Feb 9, 2024
ceb261d
Update proposals/nnnn-safe-shared-contiguous-storage.md
glessard Feb 14, 2024
2480ca2
add `view(as: T)`
glessard Feb 14, 2024
441a5c8
incorporate feedback from pitch discussion
glessard Feb 15, 2024
b360a50
enclose index and iterator types in the main type
glessard Feb 17, 2024
9aa96ea
update protocol declaration
glessard Feb 23, 2024
1adba6d
link to additional related pitches
glessard Feb 24, 2024
300591d
fix a stored property type
glessard Apr 16, 2024
ed5fea2
rename type, adopt new syntax
glessard Apr 17, 2024
7b2bb1f
various updates
glessard Apr 19, 2024
f876043
add more RawSpan API, doc-comment fixes
glessard Apr 21, 2024
844e661
Added more prose, added TODOs for further clarification
milseman Apr 22, 2024
15de4ab
Update proposals/nnnn-safe-shared-contiguous-storage.md
glessard Apr 22, 2024
be180f1
remove some trailing whitespace from code blocks
glessard Apr 23, 2024
26a7637
Update
milseman May 6, 2024
50a38df
Update
glessard May 24, 2024
6add19b
lots of updates
glessard Jun 20, 2024
5d19ead
Apply suggestions from code review
glessard Jun 20, 2024
d46c815
Move byte parsing helpers into a future direction
milseman Jun 20, 2024
d87f041
Fill out the index appendix
milseman Jun 20, 2024
a5239b4
tweaks and corrections
glessard Jun 21, 2024
99f305a
add missing keywords
glessard Jun 21, 2024
b3db4b4
Apply editing suggestions from review
glessard Jun 21, 2024
d728217
annotation adjustments, various edits
glessard Jun 22, 2024
a0d3b87
some more edits
glessard Jun 22, 2024
90890a5
move `ContiguousStorage` to future directions
glessard Jun 22, 2024
2d463ab
edits about unsafe initializer usage
glessard Jun 25, 2024
c8b2d5c
remove “generally” from index-sharing note
glessard Jun 25, 2024
859a071
improve index validation functions
glessard Jun 25, 2024
e924bab
omit some duplicated documentation
glessard Jun 25, 2024
1844c97
add html anchors to important sections
glessard Jun 25, 2024
385cccb
add link to second pitch thread
glessard Jun 26, 2024
5266e65
more cleanup surrounding `ContiguousStorage`
glessard Jun 28, 2024
913f6e1
whitespace fixes
glessard Jun 28, 2024
ba482d9
Change some uses of the word “view” to “span” instead
glessard Jun 30, 2024
9370b13
fix misspelling
glessard Jun 30, 2024
572a236
add missing doc-comment paragraph
glessard Jun 30, 2024
c9c312c
change `uncheckedBounds` to `unchecked`
glessard Jun 30, 2024
42170bf
fix doc-comments
glessard Jun 30, 2024
1319b1d
rework `load` and company
glessard Jun 30, 2024
66bcb19
add the `SurjectiveBitPattern` future direction
glessard Jul 1, 2024
f84aefc
more about `SurjectiveBitPattern`, plus an alternative
glessard Jul 1, 2024
4b13bcd
move reference to SE-0256 to the ContiguousStorage item
glessard Jul 1, 2024
7a88571
reword coroutine accessors
glessard Jul 3, 2024
a183439
remove undesirable annotations and default values
glessard Jul 16, 2024
0a12619
add containment utilities
glessard Jul 16, 2024
3c9ef51
Apply suggestions from code review
glessard Jul 17, 2024
Added more prose, added TODOs for further clarification
milseman authored and glessard committed Apr 23, 2024
commit 844e661bf5c4cad0be09d213b22d5dd2e3fa74a8
109 changes: 97 additions & 12 deletions proposals/nnnn-safe-shared-contiguous-storage.md
@@ -1,7 +1,7 @@
# Safe Access to Contiguous Storage

* Proposal: [SE-NNNN](nnnn-safe-shared-contiguous-storage.md)
* Authors: [Guillaume Lessard](https://github.com/glessard), [Andrew Trick](https://github.com/atrick)
* Authors: [Guillaume Lessard](https://github.com/glessard), [Andrew Trick](https://github.com/atrick), [Michael Ilseman](https://github.com/milseman)
* Review Manager: TBD
* Status: **Awaiting implementation**
* Roadmap: [BufferView Roadmap](https://forums.swift.org/t/66211)
@@ -20,23 +20,47 @@ This proposal is related to two other features being proposed along with it: [No

## Motivation

Consider for example a program using multiple libraries, including one for [base64](https://datatracker.ietf.org/doc/html/rfc4648) decoding. The program would obtain encoded data from one or more of its dependencies, which could supply the data in the form of `[UInt8]`, `Foundation.Data` or even `String`, among others. None of these types is necessarily more correct than another, but the base64 decoding library must pick an input format. It could declare its input parameter type to be `some Sequence<UInt8>`, but such a generic function significantly limits performance. This may force the library author to either declare its entry point as inlinable, or to implement an internal fast path using `withContiguousStorageIfAvailable()` and use an unsafe type. The ideal interface would have a combination of the properties of both `some Sequence<UInt8>` and `UnsafeBufferPointer<UInt8>`.
Swift needs safe and performant types for local processing over values in contiguous memory. Consider for example a program using multiple libraries, including one for [base64](https://datatracker.ietf.org/doc/html/rfc4648) decoding. The program would obtain encoded data from one or more of its dependencies, which could supply the data in the form of `[UInt8]`, `Foundation.Data` or even `String`, among others. None of these types is necessarily more correct than another, but the base64 decoding library must pick an input format. It could declare its input parameter type to be `some Sequence<UInt8>`, but such a generic function significantly limits performance. This may force the library author to either declare its entry point as inlinable, or to implement an internal fast path using `withContiguousStorageIfAvailable()` and use an unsafe type. The ideal interface would have a combination of the properties of both `some Sequence<UInt8>` and `UnsafeBufferPointer<UInt8>`.

The `UnsafeBufferPointer` passed to a `withUnsafeXXX` closure-style API, while performant, is unsafe in multiple ways:

1. The pointer itself is unsafe and unmanaged
2. `subscript` is only bounds-checked in debug builds of client code
3. It might escape the duration of the closure

Even if the body of the `withUnsafeXXX` call does not escape the pointer, other functions called inside the closure have to be written in terms of unsafe pointers. This requires programmer vigilance across a project and pollutes code that otherwise could be written in terms of safe constructs.
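The last point can be seen in a minimal sketch that uses only existing standard library API. The helper name `checksum(_:)` is hypothetical and chosen for illustration; it is not part of this proposal.

```swift
// Hypothetical helper: because it is called from inside a
// `withUnsafeBufferPointer` closure, its parameter has to be an unsafe type,
// even though the helper itself never needs to do anything unsafe.
func checksum(_ bytes: UnsafeBufferPointer<UInt8>) -> UInt8 {
    var sum: UInt8 = 0
    for byte in bytes {
        sum &+= byte // wrapping addition, so the sum cannot trap on overflow
    }
    return sum
}

let payload: [UInt8] = [1, 2, 3, 4]

// The unsafe pointer is only valid for the duration of the closure.
let payloadChecksum = payload.withUnsafeBufferPointer { checksum($0) }
```

Nothing in the signature of `checksum(_:)` prevents a caller from passing a pointer that has outlived its closure, which is the kind of vigilance the paragraph above describes.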


## Proposed solution

`Span` will allow sharing the contiguous internal representation of a type, by providing access to a borrowed view of an interval of contiguous memory. A view does not copy the underlying data: it instead relies on a guarantee that the original container cannot be modified or destroyed during the lifetime of the view. `Span`'s lifetime is statically enforced as a lifetime dependency to a binding of the type vending it, preventing its escape from the scope where it is valid for use. This guarantee preserves temporal safety. `Span` also performs bounds-checking on every access to preserve spatial safety. Additionally `Span` always represents initialized memory, preserving the definite initialization guarantee.
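As a rough illustration of the spatial-safety half of this description only, the sketch below wraps an `UnsafeBufferPointer` in a hypothetical `CheckedView` type that bounds-checks every access in all build modes. `CheckedView` and `total(of:)` are assumptions made for this example and are not the proposed `Span`: unlike `Span`, the wrapper can still escape the closure, so it provides no temporal safety.

```swift
// Hypothetical stand-in, NOT the proposed `Span`: it checks bounds on every
// access, but nothing prevents it from escaping the closure that created it.
struct CheckedView<Element> {
    private let base: UnsafeBufferPointer<Element>

    init(unsafeElements base: UnsafeBufferPointer<Element>) {
        self.base = base
    }

    var count: Int { base.count }

    subscript(position: Int) -> Element {
        // `precondition` is evaluated in debug and release builds alike.
        precondition(position >= 0 && position < base.count, "index out of bounds")
        return base[position]
    }
}

// A concrete function written against the checked wrapper.
func total(of view: CheckedView<Int>) -> Int {
    var result = 0
    for i in 0..<view.count {
        result += view[i]
    }
    return result
}

let numbers = [10, 20, 30]
let numbersTotal = numbers.withUnsafeBufferPointer {
    total(of: CheckedView(unsafeElements: $0))
}
```

The lifetime dependency described above is what removes the remaining hazard: the compiler, rather than the programmer, would guarantee that the view is not used after the closure ends.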

By relying on borrowing, `Span` can provide simultaneous access to a non-copyable container, and can help avoid unwanted copies of copyable containers. Note that `Span` is not a replacement for a copyable container with owned storage; see the future directions for more details ([Resizable, contiguously-stored, untyped collection in the standard library](#Bytes)).

A type can indicate that it can provide a `Span` by conforming to the `ContiguousStorage` protocol. For example, for the hypothetical base64 decoding library mentioned above, a possible API could be:
`Span` is the currency type for local processing over values in contiguous memory. It is the replacement for any API currently using `Array`, `UnsafeBufferPointer`, `Foundation.Data`, etc., that does not need to escape the value.

### `ContiguousStorage`

A type can indicate that it can provide a `Span` by conforming to the `ContiguousStorage` protocol. `ContiguousStorage` forms a bridge between multi-type or generically-typed interfaces and a performant concrete implementation.

For example, for the hypothetical base64 decoding library mentioned above, a possible API could be:

```swift
extension HypotheticalBase64Decoder {
public func decode(bytes: some ContiguousStorage<UInt8>) -> [UInt8]
}
```

We will also provide a `RawSpan` in order to provide operations over contiguous bytes, for use in decoders and the like. The advantage of `RawSpan` is to be a concrete type, without a generic parameter.
**TODO**: But, we don't want to encourage this use. We want to encourage one concrete function taking a `Span<UInt8>`. Advanced libraries might add an inlinable/alwaysEmitIntoClient generic-dispatch interface in addition to this.
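The layering described in this note (one concrete implementation plus a thin generic entry point) can be sketched today with existing API, using `UnsafeBufferPointer<UInt8>` where a future version would take `Span<UInt8>`. The function names are hypothetical. In a real library the generic shim would presumably be marked `@inlinable` or `@_alwaysEmitIntoClient`; those attributes are omitted here to keep the example self-contained.

```swift
// Concrete implementation: one non-generic function over contiguous bytes.
// Counts the trailing "=" padding bytes of base64-encoded input.
func concretePaddingCount(in bytes: UnsafeBufferPointer<UInt8>) -> Int {
    var count = 0
    for byte in bytes.reversed() {
        guard byte == UInt8(ascii: "=") else { break }
        count += 1
    }
    return count
}

// Generic shim: accepts any sequence of bytes and dispatches to the
// concrete implementation.
func paddingCount<S: Sequence>(in bytes: S) -> Int where S.Element == UInt8 {
    // Fast path: the sequence can expose contiguous storage directly.
    if let result = bytes.withContiguousStorageIfAvailable({ concretePaddingCount(in: $0) }) {
        return result
    }
    // Slow path: copy into an array first.
    return Array(bytes).withUnsafeBufferPointer { concretePaddingCount(in: $0) }
}

let fromArray = paddingCount(in: Array("QQ==".utf8))
let fromUTF8View = paddingCount(in: "QUI=".utf8)
```

Only the shim needs to be visible to clients; the concrete function can remain opaque and be compiled once.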

### `RawSpan`

`RawSpan` allows sharing the contiguous internal representation for values which may be heterogeneously typed, such as in decoders. Furthermore, it is a fully concrete type, without a generic parameter, which achieves better performance in debug builds of client code as well as a more straightforward understanding of performance for library code.

All `Span<T>`s have a backing `RawSpan`.

**TODO**: Do we have a (parent) protocol for just raw span? Do we have API to get the raw span from a span?


## Detailed design

@@ -83,6 +107,8 @@ for i in mySpan.indices {
}
```

### `ContiguousStorage`

A type can declare that it can provide access to contiguous storage by conforming to the `ContiguousStorage` protocol:

```swift
@@ -111,8 +137,6 @@ extension MyResilientType {

Here, the public function obtains the `Span` from the type that vends it in inlinable code, then calls a concrete, opaque function defined in terms of `Span`. Inlining the generic shim in the client is often a critical optimization. The need for such a pattern and related improvements are discussed in the future directions below (see [Syntactic Sugar for Automatic Conversions](#Conversions).)
Review comment (Member):

For the record, I remain very much unconvinced that it'll be wise to encourage folks to create APIs that can only operate on contiguous storage, as opposed to a series of contiguous storage chunks.


In addition to `Span<T>`, we propose the addition of `RawSpan`. `RawSpan` is similar to `Span<T>`, but represents initialized bytes. Its API supports slicing, along with the operations `load(as:)` and `loadUnaligned(as:)`. `RawSpan` is a specialized type supporting parsing and decoding applications in particular, where heavily-used code paths require concrete types as much as possible.

#### Extensions to Standard Library and Foundation types

```swift
@@ -177,6 +201,14 @@ extension UnsafeMutableRawBufferPointer: ContiguousStorage<UInt8> {
}
```

**TODO**: What is the `@_unsafeNonescapableResult` annotation? Would `Slice<UnsafeBufferPointer<UInt8>>` need it?

**TODO**: Do we do a `Sequence.withSpanIfAvailable` API?

**TODO**: What all can we deprecate with this proposal?

**TODO**: Do these need lifetime annotations on them?

#### Using `Span` with C functions or other unsafe code:

`Span` has an unsafe escape hatch for use with unsafe code.
@@ -529,6 +561,14 @@ extension Span where Element: BitwiseCopyable {
}
```

**TODO**: `public var rawSpan: RawSpan` API, as well as a conformance to a raw span protocol if there is one.

### RawSpan

In addition to `Span<T>`, we propose the addition of `RawSpan`, which can represent heterogeneously typed values in contiguous memory. `RawSpan` is similar to `Span<T>`, but represents initialized untyped bytes. Its API supports slicing, along with the operations `load(as:)` and `loadUnaligned(as:)`.

`RawSpan` is a specialized type supporting parsing and decoding applications in particular, as well as applications where heavily-used code paths require concrete types as much as possible.
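For comparison, the kind of heterogeneous decoding described here is written today with the unsafe counterparts of these operations on `UnsafeRawBufferPointer`. The sketch below assumes a made-up five-byte layout (a one-byte tag followed by a little-endian 32-bit length) and Swift 5.7 or later for `loadUnaligned(fromByteOffset:as:)`. It shows the existing unsafe API, not the proposed `RawSpan` API.

```swift
// Made-up layout for illustration: [tag: UInt8][length: UInt32, little-endian]
let packet: [UInt8] = [0x07, 0x2A, 0x00, 0x00, 0x00]

let (tag, length) = packet.withUnsafeBytes { raw -> (UInt8, UInt32) in
    // A single byte is always suitably aligned, so `load` is fine here.
    let tag = raw.load(fromByteOffset: 0, as: UInt8.self)
    // The length starts at offset 1, which is not 4-byte aligned,
    // so it must be read with `loadUnaligned`.
    let storedLength = raw.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
    // Interpret the loaded bytes as little-endian regardless of host order.
    return (tag, UInt32(littleEndian: storedLength))
}
```

A single concrete buffer type yields values of two different types, which is the use case `RawSpan` is meant to serve without the unsafe pointer.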

#### Complete `RawSpan` API:

```swift
@@ -620,9 +660,11 @@ extension RawSpan {
}
```

**TODO**: What does `typealias Index = Span<Element>.Index` mean?

##### Index validation utilities:

Every time `Span` uses an index or an integer offset, it checks for their validity, unless the parameter is marked with the word "unchecked". The validation is performed with these functions:
Every time `RawSpan` uses an index or an integer offset, it checks for their validity, unless the parameter is marked with the word "unchecked". The validation is performed with these functions:

```swift
extension RawSpan {
@@ -744,7 +786,13 @@ extension RawSpan {
public func loadUnaligned<T: BitwiseCopyable>(
from index: Index, as: T.Type
) -> T
```

**TODO**: What about unchecked variants? Those would/could be the bottom API called by data parsers which have already checked the bounds earlier (e.g. for error-throwing purposes).
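The usage pattern this note has in mind can be sketched with existing API: a parser validates bounds once so that it can throw a recoverable error, after which any further bounds check inside the load is redundant. `TruncatedInput` and `readLittleEndianUInt32(at:in:)` are hypothetical names, and `UnsafeRawBufferPointer` stands in for `RawSpan`.

```swift
// Hypothetical error type for this sketch.
struct TruncatedInput: Error {}

// Validates the range up front and throws instead of trapping.
func readLittleEndianUInt32(
    at offset: Int, in raw: UnsafeRawBufferPointer
) throws -> UInt32 {
    guard offset >= 0, offset <= raw.count - 4 else {
        throw TruncatedInput()
    }
    // The bounds are already known to be valid here; an unchecked load
    // would avoid repeating the check.
    let stored = raw.loadUnaligned(fromByteOffset: offset, as: UInt32.self)
    return UInt32(littleEndian: stored)
}

let input: [UInt8] = [0xFF, 0x2A, 0x00, 0x00, 0x00]
let parsed = input.withUnsafeBytes { try? readLittleEndianUInt32(at: 1, in: $0) }
let truncated = input.withUnsafeBytes { try? readLittleEndianUInt32(at: 2, in: $0) }
```

Here `parsed` holds a value and `truncated` is `nil`, because only one of the two reads fits within the five available bytes.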

A `RawSpan` can be viewed as a `Span<T>`, provided the memory is laid out homogeneously as instances of `T`.

```swift
/// View the memory span represented by this view as a different type
///
/// The memory must be laid out identically to the in-memory representation of `T`.
@@ -809,24 +857,61 @@ while i < view.endIndex {
doSomething(view[i])
view.formIndex(after: &i)
}

// ...or:
for o in 0..<view.count {
doSomething(view[offset: o])
for i in view.indices {
doSomething(view[i])
}

// ...or
var iter = view.makeIterator()
while let elt = iter.next() {
doSomething(elt)
}

```

**TODO**: Karoy mentioned that we might not even want to take the name `Iterator` until more of the borrowed iterator design is figured out.

##### Collection-like protocols for non-copyable and non-escapable types

Non-copyable and non-escapable containers would benefit from a `Collection`-like protocol family to represent a set of basic, common operations. This may be `Collection` if we find a way to make it work; it may be something else.

##### Sharing piecewise-contiguous memory

Some types store their internal representation in a piecewise-contiguous manner, such as trees and ropes. Some operations naturally return information in a piecewise-contiguous manner, such as network operations. These could supply results by iterating through a list of contiguous chunks of memory.
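As a hedged illustration of what iterating through a list of contiguous chunks looks like with today's API, the sketch below models piecewise-contiguous storage as an array of byte arrays and processes each chunk through its own contiguous buffer. The `[[UInt8]]` representation and the function name `byteSum(ofChunks:)` are assumptions for this example only; no API for this is proposed here.

```swift
// Piecewise-contiguous storage, modeled here as a list of separate chunks.
func byteSum(ofChunks chunks: [[UInt8]]) -> Int {
    var sum = 0
    for chunk in chunks {
        // Each chunk is contiguous on its own, even though the whole is not.
        chunk.withUnsafeBufferPointer { bytes in
            for byte in bytes {
                sum += Int(byte)
            }
        }
    }
    return sum
}

let chunkedSum = byteSum(ofChunks: [[1, 2], [3], [4, 5, 6]])
```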

##### Delegating mutations of memory with `MutableSpan<T>`
Some data structures can delegate mutations of their owned memory. In the standard library we have `withMutableBufferPointer()`, for example. A `MutableSpan<T>` should provide a better, safer alternative.
##### Safe mutations of memory with `MutableSpan<T>`

Some data structures can delegate mutations of their owned memory. In the standard library we have `withUnsafeMutableBufferPointer()`, for example.

The `UnsafeMutableBufferPointer` passed to a `withUnsafeMutableXXX` closure-style API is unsafe in multiple ways:

1. The pointer itself is unsafe and unmanaged
2. `subscript` is only bounds-checked in debug builds of client code
3. It might escape the duration of the closure
4. Exclusivity of writes is not enforced
5. Initialization of any particular memory address is not ensured

I.e., it is unsafe in all the ways `UnsafeBufferPointer`-passing closure APIs are unsafe in addition to being unsafe in exclusivity and in initialization.

Loading an uninitialized non-`BitwiseCopyable` value leads to undefined behavior. Loading an uninitialized `BitwiseCopyable` value does not immediately lead to undefined behavior, but it produces a garbage value which may lead to misbehavior of the program.

A `MutableSpan<T>` should provide a better, safer alternative to mutable memory in the same way that `Span<T>` provides a better, safer read-only type. `MutableSpan<T>` would also automatically enforce exclusivity of writes.

However, it alone does not track initialization state of each address, and that will continue to be the responsibility of the developer.
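For reference, this is the existing closure-based pattern that a `MutableSpan<T>` would be expected to improve on. The sketch uses only `Array.withUnsafeMutableBufferPointer`, which exists today; nothing here is proposed API.

```swift
var samples = [3, 1, 2]

// The array delegates in-place mutation of its storage to the closure.
samples.withUnsafeMutableBufferPointer { buffer in
    for i in buffer.indices {
        buffer[i] *= 2 // bounds-checked only in debug builds of client code
    }
    // Nothing stops `buffer` from being copied out of this closure,
    // which would leave a dangling pointer once the closure returns.
}
```

After the call, `samples` holds the doubled values. The hazards listed above all apply inside the closure.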


##### Delegating initialization of memory with `OutputSpan<T>`

Some data structures can delegate initialization of their initial memory representation, and in some cases the initialization of additional memory. In the standard library we have `Array.init(unsafeUninitializedCapacity:initializingWith:)` and `String.init(unsafeUninitializedCapacity:initializingUTF8With:)`. A safer abstraction for initialization would make such initializers less dangerous, and would allow for a greater variety of them.
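A short example of the existing `Array` initializer shows where the danger lies: the closure receives uninitialized memory and must report, truthfully, how many elements it initialized. This uses only API that exists today.

```swift
let squares = [Int](unsafeUninitializedCapacity: 4) { buffer, initializedCount in
    for i in 0..<4 {
        // Each slot starts out uninitialized and must be initialized, not assigned.
        (buffer.baseAddress! + i).initialize(to: i * i)
    }
    // The array trusts this number; reporting more elements than were
    // actually initialized is undefined behavior.
    initializedCount = 4
}
```

An `OutputSpan<T>` would aim to keep this capability while making it impossible to misreport the initialized count.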

`OutputSpan<T>` would need run-time bookkeeping (e.g. a bitvector with a bit per-address) to track initialization state to safely support random access and random-order initialization.
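A minimal sketch of such bookkeeping follows. `InitializationTracker` is a hypothetical type for illustration, and it uses a `[Bool]` rather than a packed bitvector to keep the example short; it tracks state only and does not manage any memory itself.

```swift
// Hypothetical bookkeeping: one flag per slot records whether that slot has
// been initialized, so that slots can be initialized in any order.
struct InitializationTracker {
    private var initialized: [Bool]

    init(capacity: Int) {
        initialized = Array(repeating: false, count: capacity)
    }

    mutating func markInitialized(at offset: Int) {
        precondition(initialized.indices.contains(offset), "offset out of bounds")
        precondition(!initialized[offset], "slot already initialized")
        initialized[offset] = true
    }

    var isFullyInitialized: Bool {
        initialized.allSatisfy { $0 }
    }
}

var tracker = InitializationTracker(capacity: 3)
tracker.markInitialized(at: 2) // random-order initialization
tracker.markInitialized(at: 0)
let completeAfterTwo = tracker.isFullyInitialized
tracker.markInitialized(at: 1)
let completeAfterThree = tracker.isFullyInitialized
```

The cost is visible even in this small form: every write pays for a flag update, and the flags occupy storage proportional to the capacity.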

Alternatively, a divide-and-conquer style initialization order might be solvable via an API layer without run-time bookkeeping, but with more complex ergonomics.



##### <a name="Bytes"></a>Resizable, contiguously-stored, untyped collection in the standard library

The example in the [motivation](#motivation) section mentions the `Foundation.Data` type. There has been some discussion of either replacing `Data` or moving it to the standard library. This document proposes neither of those. A major issue is that in the "traditional" form of `Foundation.Data`, namely `NSData` from Objective-C, it was easier to control accidental copies because the semantics of the language did not lead to implicit copying.
@@ -856,4 +941,4 @@ The [`std::span`](https://en.cppreference.com/w/cpp/container/span) class templa

## Acknowledgments

Joe Groff, John McCall, Tim Kientzle, Michael Ilseman, Karoy Lorentey contributed to this proposal with their clarifying questions and discussions.
Joe Groff, John McCall, Tim Kientzle, Karoy Lorentey contributed to this proposal with their clarifying questions and discussions.