[DRAFT][stdlib] String.Index: Add custom printing #58479
Draft
[This changes the public API of the stdlib, so it will likely need to go through the Swift Evolution process before landing.]
Forum pitch: https://forums.swift.org/t/improving-string-index-s-printed-descriptions/57027
This PR conforms `String.Index` to `CustomStringConvertible` and `CustomDebugStringConvertible`, making it easier to understand what these indices actually are, and considerably simplifying the debugging experience when working with string indices.

Having string indices print more nicely will be particularly helpful while working with the new string processing algorithms.
For reference, in Swift 5.6, `String.Index` uses an unhelpful, mirror-based description that is not at all human-readable, not even for the humans working on the implementation of `String` in the stdlib:

String indices are simply offsets from the start of the string's underlying storage representation, referencing a particular UTF-8 or UTF-16 code unit, depending on the string's encoding. Most Swift strings are UTF-8 encoded, but strings bridged over from Objective-C may remain in their original UTF-16 encoded form.
For `CustomStringConvertible`, the index description displays the storage offset value and its encoding:

Note how the start index does not care about its storage encoding -- offset zero is the same location in either case.
String index ranges print in a compact, easily understandable form:
Exposing the actual storage offsets in the description effectively demonstrates how indices work, helping people gain a better understanding of both the underlying Unicode concepts, and the details of their implementation in Swift.
For example, successive `String` indices tend to skip code units in irregular ways, reflecting the size of the underlying grapheme clusters.

Note how the initial emoji takes up 8 code units, followed by a 1-unit ASCII space, and ending with a series of Cyrillic characters that take two UTF-8 code units each.
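The same skipping pattern can be observed without the new descriptions by measuring how far each index lies from the start of the UTF-8 view. This is only a sketch: the sample string is an assumption chosen to match the description above (an emoji with a skin-tone modifier, a space, then Cyrillic letters), and `utf8Offset(of:in:)` is a hypothetical helper, not a stdlib API. The measured distance corresponds to the storage offset only for native UTF-8 strings.

```swift
// Assumed sample string: one emoji with a skin-tone modifier,
// an ASCII space, and six Cyrillic letters.
let greeting = "👋🏼 Привет"

// Hypothetical helper: the number of UTF-8 code units between the
// start of the string and the given index.
func utf8Offset(of index: String.Index, in string: String) -> Int {
    return string.utf8.distance(from: string.utf8.startIndex, to: index)
}

// One offset per Character (extended grapheme cluster).
let characterOffsets = greeting.indices.map { utf8Offset(of: $0, in: greeting) }
print(characterOffsets)  // [0, 8, 9, 11, 13, 15, 17, 19]
```

The jump from 0 to 8 is the emoji, the single step to 9 is the space, and the remaining steps of 2 are the Cyrillic letters.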
Looking at indices in the Unicode scalars view shows how the emoji breaks into two separate code points (U+1F44B and U+1F3FC, at offsets 13 and 17, respectively):
This is a native UTF-8 string, so indices in the UTF-8 view are rather boring -- they simply count offsets from 0 up to 21:
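As a rough cross-check using only existing public API, the offsets in the UTF-8 view can be computed directly. The sample string is again an assumption picked to fit the 21-code-unit description, not necessarily the string used for the printouts in this PR.

```swift
// Assumed sample string (8 + 1 + 6 * 2 = 21 UTF-8 code units).
let greeting = "👋🏼 Привет"

// Distance of every UTF-8 view index from the start of the view.
let utf8Offsets = greeting.utf8.indices.map {
    greeting.utf8.distance(from: greeting.utf8.startIndex, to: $0)
}

print(utf8Offsets == Array(0..<21))  // true
print(greeting.utf8.count)           // 21
```

Every code unit gets its own index here, so the offsets form an unbroken run ending just before the end index at offset 21.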
UTF-16 indices are rather more interesting, as the string starts with two Unicode scalars outside the BMP. In UTF-16, these are encoded as surrogate pairs, which aren't directly present in this string's storage. To manage this, the index values for the trailing surrogates include a transcoded offset value (the `+1` in the printout below), to help identify which code unit is addressed by the index:

The `CustomDebugStringConvertible` output is a bit more verbose. In addition to the offset and encoding, it also includes the information that is maintained in the bits of the index that are reserved for performance flags and other auxiliary data.

For example, index `i` below is addressing the UTF-8 code unit at offset 10 in some string, which happens to be the first code unit in a `Character` (i.e., an extended grapheme cluster) of length 8:

rdar:https://