[DRAFT][stdlib] String.Index: Add custom printing #58479

lorentey · 2022-04-28T01:12:14Z

[This changes the public API of the stdlib, so it will likely need to go through the Swift Evolution process before landing.]

Forum pitch: https://forums.swift.org/t/improving-string-index-s-printed-descriptions/57027

This PR conforms String.Index to CustomStringConvertible and CustomDebugStringConvertible, making it easier to understand what these indices actually are, and considerably simplifying the debugging experience when working with string indices.

Having String indices print nicer will be particularly helpful while working with the new string processing algorithms.

For reference, in Swift 5.6, String.Index uses an unhelpful, mirror-based description that is not at all human-readable, not even for the humans working on the implementation of String in the stdlib:

  let string = "👋🏼 Привіт"

  print(string.startIndex) // ⟹ Index(_rawBits: 1)
  print(string.endIndex) // ⟹ Index(_rawBits: 1376257)

String indices are simply offsets from the start of the string's underlying storage representation, referencing a particular UTF-8 or UTF-16 code unit, depending on the string's encoding. Most Swift strings are UTF-8 encoded, but strings bridged over from Objective-C may remain in their original UTF-16 encoded form.
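The offsets described above can be observed with public API even without the new descriptions. The following is a minimal sketch, assuming a native UTF-8 string such as one created from a literal; the helper name `utf8Offset(of:in:)` is hypothetical and not part of the stdlib or of this PR.

```swift
// Hypothetical helper (not stdlib API): counts the UTF-8 code units that
// precede `index`. For a native UTF-8 string this equals the storage offset.
func utf8Offset(of index: String.Index, in string: String) -> Int {
  string.utf8.distance(from: string.utf8.startIndex, to: index)
}

let string = "👋🏼 Привіт"
print(utf8Offset(of: string.startIndex, in: string)) // 0
print(utf8Offset(of: string.endIndex, in: string))   // 21
```

For a UTF-16 string bridged from Objective-C, this helper would still report UTF-8 distances rather than the underlying storage offsets, which is one reason the printed encoding is useful.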

For CustomStringConvertible, the index description displays the storage offset value and its encoding:

  let string = "👋🏼 Привіт"

  print(string.startIndex) // ⟹ "0(any)"
  print(string.endIndex) // ⟹ "21(utf8)"

Note how the start index does not care about its storage encoding -- offset zero is the same location in either case.

String index ranges print in a compact, easily understandable form:

  let i = string.firstIndex(of: "р")!
  let j = string.firstIndex(of: "і")!
  print(i ..< j) // ⟹ 11(utf8)..<17(utf8)

Exposing the actual storage offsets in the description effectively demonstrates how indices work, helping people gain a better understanding of both the underlying Unicode concepts, and the details of their implementation in Swift.

For example, successive String indices tend to skip code units in irregular ways, reflecting the size of the underlying grapheme clusters.

  print(string.indices.map { "\($0)" })
  // ⟹ ["0(any)", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]

Note how the initial emoji takes up 8 code units, followed by a 1-unit ASCII space, and ending with a series of Cyrillic characters that take two UTF-8 code units each.
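The relationship between grapheme cluster sizes and index offsets can be checked with a short sketch that uses only public API. It assumes the same sample string; the variable names are illustrative.

```swift
let string = "👋🏼 Привіт"

// Number of UTF-8 code units in each Character (extended grapheme cluster).
let widths = string.map { String($0).utf8.count }
print(widths) // [8, 1, 2, 2, 2, 2, 2, 2]

// Running sums of the widths reproduce the offsets of successive indices.
var offsets: [Int] = []
var next = 0
for width in widths {
  offsets.append(next)
  next += width
}
print(offsets) // [0, 8, 9, 11, 13, 15, 17, 19]
```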

Looking at indices in the Unicode scalars view shows how the emoji breaks into two separate code points (U+1F44B and U+1F3FC, at offsets 0 and 4, respectively):

  print(string.unicodeScalars.indices.map { "\($0)" })
  // ⟹ ["0(any)", "4(utf8)", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]
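As a quick cross-check of the scalar boundaries, this sketch lists the first two scalars of the sample string together with their UTF-8 widths. It uses only existing public API and is illustrative, not part of the PR.

```swift
let string = "👋🏼 Привіт"

// The emoji is made of two scalars, each encoded as four UTF-8 code units.
for scalar in string.unicodeScalars.prefix(2) {
  let hex = String(scalar.value, radix: 16, uppercase: true)
  print("U+\(hex): \(String(scalar).utf8.count) UTF-8 code units")
}
// U+1F44B: 4 UTF-8 code units
// U+1F3FC: 4 UTF-8 code units
```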

This is a native UTF-8 string, so indices in the UTF-8 view are rather boring -- they simply count offsets from 0 up to 20:

  print(string.utf8.indices.map { "\($0)" })
  // ⟹ ["0(any)", "1(utf8)", "2(utf8)", "3(utf8)", ..., "19(utf8)", "20(utf8)"]

UTF-16 indices are rather more interesting, as the string starts with two Unicode scalars outside the BMP. In UTF-16, these are encoded as surrogate pairs, which aren't directly present in this string's storage. To manage this, the index values for the trailing surrogates include a transcoded offset value (the +1 in the printout below), to help identify which code unit is addressed by the index:

  print(string.utf16.indices.map { "\($0)" })
  // ⟹ ["0(any)", "0(utf8)+1", "4(utf8)", "4(utf8)+1", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]
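The surrogate pairs behind the `+1` entries can be inspected directly. This is a small sketch using the existing `UTF16.isLeadSurrogate(_:)` and `UTF16.isTrailSurrogate(_:)` functions on the same sample string.

```swift
let string = "👋🏼 Привіт"

// The first four UTF-16 code units form two surrogate pairs.
for unit in string.utf16.prefix(4) {
  let kind: String
  if UTF16.isLeadSurrogate(unit) {
    kind = "lead"
  } else if UTF16.isTrailSurrogate(unit) {
    kind = "trail"
  } else {
    kind = "BMP"
  }
  print(String(unit, radix: 16, uppercase: true), kind)
}
// D83D lead
// DC4B trail
// D83C lead
// DFFC trail

print(string.utf8.count, string.utf16.count) // 21 11
```

Each trailing surrogate has no storage position of its own in a UTF-8 string, which is why its index reuses the offset of the scalar and adds a transcoded offset.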

The CustomDebugStringConvertible output is a bit more verbose. In addition to the offset + encoding, it also includes the information that is maintained in the bits of the index that are reserved for performance flags and other auxiliary data.

For example, index i below is addressing the UTF-8 code unit at offset 10 in some string, which happens to be the first code unit in a Character (i.e., an extended grapheme cluster) of length 8:

  print(String(reflecting: i)) // ⟹ String.Index(offset: 10(utf8), aligned: character, stride: 8)
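To show how the two protocols divide the work, here is a minimal sketch on a hypothetical stand-in type. `OffsetIndex` and its fields are assumptions for illustration only; they do not reflect the actual layout of `String.Index` or the exact format proposed in this PR.

```swift
// Hypothetical stand-in for an index; not the stdlib's String.Index.
struct OffsetIndex: CustomStringConvertible, CustomDebugStringConvertible {
  var offset: Int
  var encoding: String  // e.g. "utf8", "utf16", or "any"
  var stride: Int?      // grapheme cluster length, if known

  // Used by print(_:) and string interpolation.
  var description: String { "\(offset)(\(encoding))" }

  // Used by String(reflecting:) and debugPrint(_:).
  var debugDescription: String {
    var result = "OffsetIndex(offset: \(description)"
    if let stride = stride { result += ", stride: \(stride)" }
    return result + ")"
  }
}

let i = OffsetIndex(offset: 10, encoding: "utf8", stride: 8)
print(i)                     // 10(utf8)
print(String(reflecting: i)) // OffsetIndex(offset: 10(utf8), stride: 8)
```

Keeping the short form in `description` and the auxiliary data in `debugDescription` means ranges and collections of indices stay compact when printed, while the debugger still has access to the details.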


lorentey · 2022-04-28T01:15:34Z

@swift-ci test

Azoy

Code changes look good to me! Awaiting what evolution says about the format...


Implement a type summary for Swift's `String.Index`. The summary string follows these sources:

1. Original proposal: https://forums.swift.org/t/improving-string-index-s-printed-descriptions/57027
2. Proposed implementation: swiftlang/swift#58479
3. Temporary(ish) near-`CustomStringConvertible` implementation: swiftlang/swift#61548

The associated test cases are taken from the test cases in swiftlang/swift#58479. rdar://99211823

The cherry-picked copy of this change carries the same description, with the trailer "(cherry picked from commit c7146a3)".

[stdlib] String.Index: Add custom printing

dd7998b

lorentey added the labels "swift evolution pending discussion" (Flag → feature: A feature that has a Swift evolution proposal currently in review) and "standard library" (Area: Standard library umbrella) Apr 28, 2022

lorentey requested review from milseman, rxwei and Azoy April 28, 2022 01:12

lorentey marked this pull request as draft April 28, 2022 01:15

Azoy approved these changes Apr 28, 2022


lorentey added 3 commits April 29, 2022 16:37

[stdlib] String.Index: Update descriptions per feedback

c5cd73e

[stdlib] String.Index: Add CustomReflectable conformance

934bacc

[stdlib] String.Index: Make customMirror more consistent with `debugDescription`

7a2da47

kastiglione mentioned this pull request Oct 27, 2022

[lldb] Add type summary for String.Index swiftlang/llvm-project#5515

Merged

kastiglione mentioned this pull request Feb 9, 2023

[lldb] Add type summary for String.Index (#5515) swiftlang/llvm-project#6262

Merged
