[DRAFT][stdlib] String.Index: Add custom printing #58479

Draft
wants to merge 4 commits into
base: main

lorentey (Member) commented Apr 28, 2022

[This changes the public API of the stdlib, so it will likely need to go through the Swift Evolution process before landing.]

Forum pitch: https://forums.swift.org/t/improving-string-index-s-printed-descriptions/57027

This PR conforms String.Index to CustomStringConvertible and CustomDebugStringConvertible, making it easier to understand what these indices actually are, and considerably simplifying the debugging experience when working with string indices.

Having String indices print more readably will be particularly helpful when working with the new string processing algorithms.

For reference, in Swift 5.6, String.Index uses an unhelpful, mirror-based description that is not at all human-readable, not even for the humans working on the implementation of String in the stdlib:

  let string = "👋🏼 Привіт"

  print(string.startIndex) // ⟹ Index(_rawBits: 1)
  print(string.endIndex) // ⟹ Index(_rawBits: 1376257)

String indices are simply offsets from the start of the string's underlying storage representation, referencing a particular UTF-8 or UTF-16 code unit, depending on the string's encoding. Most Swift strings are UTF-8 encoded, but strings bridged over from Objective-C may remain in their original UTF-16 encoded form.
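As a rough illustration of the "offset into storage" idea, the sketch below measures the same position in both encodings using only public API. It assumes the sample string from this PR is a native UTF-8 string, and it does not rely on the new descriptions.

```swift
let string = "👋🏼 Привіт"

// Distance from the start to the end index, measured in UTF-8 code units.
let utf8EndOffset = string.utf8.distance(from: string.startIndex, to: string.endIndex)

// The same position measured in UTF-16 code units.
let utf16EndOffset = string.endIndex.utf16Offset(in: string)

print(utf8EndOffset)  // 21
print(utf16EndOffset) // 11
```

The two numbers differ because the same position sits at a different code unit offset in each encoding, which is why the printed description has to name the encoding alongside the offset.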

For CustomStringConvertible, the index description displays the storage offset value and its encoding:

  let string = "👋🏼 Привіт"

  print(string.startIndex) // ⟹ 0(any)
  print(string.endIndex) // ⟹ 21(utf8)

Note how the start index does not care about its storage encoding -- offset zero is the same location in either case.

String index ranges print in a compact, easily understandable form:

  let i = string.firstIndex(of: "р")!
  let j = string.firstIndex(of: "і")!
  print(i ..< j) // ⟹ 11(utf8)..<17(utf8)
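The bounds shown for this range can be cross-checked without the new description. This is a minimal sketch using `utf8.distance(from:to:)`; the names `lower` and `upper` are only illustrative.

```swift
let string = "👋🏼 Привіт"
let i = string.firstIndex(of: "р")!
let j = string.firstIndex(of: "і")!

// UTF-8 offsets of the two bounds, measured from the start of the string.
let lower = string.utf8.distance(from: string.startIndex, to: i)
let upper = string.utf8.distance(from: string.startIndex, to: j)

print("\(lower)..<\(upper)") // 11..<17
print(string[i ..< j])       // рив
```

The slice covers three Cyrillic characters of two UTF-8 code units each, which accounts for the six code units between offsets 11 and 17.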

Exposing the actual storage offsets in the description effectively demonstrates how indices work, helping people gain a better understanding of both the underlying Unicode concepts, and the details of their implementation in Swift.

For example, successive String indices tend to skip code units in irregular ways, reflecting the size of the underlying grapheme clusters:

  print(string.indices.map { "\($0)" })
  // ⟹ ["0(any)", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]

Note how the initial emoji takes up 8 code units and is followed by a 1-unit ASCII space, then by a series of Cyrillic characters that take two UTF-8 code units each.
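The same strides can be reproduced by summing the UTF-8 width of each Character. This is only a sketch for checking the offsets by hand; `offsets` and `next` are illustrative names.

```swift
let string = "👋🏼 Привіт"

var offsets: [Int] = []
var next = 0
for character in string {
    // Record where this Character starts, then advance by its UTF-8 width.
    offsets.append(next)
    next += String(character).utf8.count
}

print(offsets) // [0, 8, 9, 11, 13, 15, 17, 19]
print(next)    // 21
```

The running total ends at 21, the offset of the end index.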

Looking at indices in the Unicode scalars view shows how the emoji breaks into two separate code points (U+1F44B and U+1F3FC, at offsets 0 and 4, respectively):

  print(string.unicodeScalars.indices.map { "\($0)" })
  // ⟹ ["0(any)", "4(utf8)", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]
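To see which scalars sit at those offsets, a small sketch like the following can help. It uses only public API, and `scalarOffsets` and `firstTwo` are illustrative names.

```swift
let string = "👋🏼 Привіт"

// UTF-8 offset of every Unicode scalar in the string.
let scalarOffsets = string.unicodeScalars.indices.map {
    string.utf8.distance(from: string.startIndex, to: $0)
}
print(scalarOffsets) // [0, 4, 8, 9, 11, 13, 15, 17, 19]

// The two scalars that make up the emoji, with their UTF-8 widths.
let firstTwo = string.unicodeScalars.prefix(2).map {
    "U+" + String($0.value, radix: 16, uppercase: true)
}
print(firstTwo) // ["U+1F44B", "U+1F3FC"]
```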

This is a native UTF-8 string, so indices in the UTF-8 view are rather boring -- they simply count offsets from 0 up to (but not including) the end index at 21:

  print(string.utf8.indices.map { "\($0)" })
  // ⟹ ["0(any)", "1(utf8)", "2(utf8)", "3(utf8)", ..., "19(utf8)", "20(utf8)"]

UTF-16 indices are rather more interesting, as the string starts with two Unicode scalars outside the BMP. In UTF-16, these are encoded as surrogate pairs, which aren't directly present in this string's storage. To manage this, the index values for the trailing surrogates include a transcoded offset value (the +1 in the printout below), to help identify which code unit is addressed by the index:

  print(string.utf16.indices.map { "\($0)" })
  // ⟹ ["0(any)", "0(utf8)+1", "4(utf8)", "4(utf8)+1", "8(utf8)", "9(utf8)", "11(utf8)", "13(utf8)", "15(utf8)", "17(utf8)", "19(utf8)"]
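The positions that need a transcoded offset are the trailing surrogates. As a sketch, they can be located with the standard `UTF16.isTrailSurrogate` check; copying the view into an array (`units`) is just a convenience for working with integer positions.

```swift
let string = "👋🏼 Привіт"

// Copy the UTF-16 view into an array so positions are plain integers.
let units = Array(string.utf16)
print(units.count) // 11

// Positions of the trailing surrogates within the UTF-16 view.
let trailing = units.indices.filter { UTF16.isTrailSurrogate(units[$0]) }
print(trailing) // [1, 3]
```

Positions 1 and 3 line up with the two `+1` entries in the printout above: each one addresses the second half of a surrogate pair whose scalar starts at UTF-8 offset 0 or 4.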

The CustomDebugStringConvertible output is a bit more verbose. In addition to the offset + encoding, it also includes the information that is maintained in the bits of the index that are reserved for performance flags and other auxiliary data.

For example, index i below is addressing the UTF-8 code unit at offset 10 in some string, which happens to be the first code unit in a Character (i.e., an extended grapheme cluster) of length 8:

  print(String(reflecting: i)) // ⟹ String.Index(offset: 10(utf8), aligned: character, stride: 8)

rdar:https://

@lorentey lorentey added the "swift evolution pending discussion" (Flag → feature: A feature that has a Swift evolution proposal currently in review) and "standard library" (Area: Standard library umbrella) labels Apr 28, 2022
@lorentey lorentey marked this pull request as draft April 28, 2022 01:15
lorentey (Member, Author) commented:

@swift-ci test

Azoy (Contributor) left a comment:

Code changes look good to me! Awaiting what evolution says about the format...

kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Oct 28, 2022
Implement a type summary for Swift's `String.Index`.

The summary string follows the following:
1. Original proposal: https://forums.swift.org/t/improving-string-index-s-printed-descriptions/57027
2. Proposed implementation: swiftlang/swift#58479
3. Temporary(ish) near-`CustomStringConvertible` implementation: swiftlang/swift#61548

The associated test cases are taken from the test cases in swiftlang/swift#58479.

rdar://99211823
kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Nov 2, 2022
(cherry picked from commit c7146a3)
kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Feb 9, 2023
(cherry picked from commit c7146a3)
kastiglione added a commit to swiftlang/llvm-project that referenced this pull request Feb 9, 2023
(cherry picked from commit c7146a3)