- Abstract
- Problem
- Background
- Proposal
- Details
- Future work
- Rationale
- Alternatives considered
This proposes an interface that can be implemented by user types to support
range-based iteration with for
.
The current for
proposal does not define a way for types to support
range-based for
loops.
Examples of use cases for for
loops include iterating over:
- Mutable and immutable elements
- Random access containers
- Maps, hash maps, and complex containers
- R-value containers
- Synthetic data
- Large containers, and large elements
- Filtering, transforming, and reversing
- Interoperability with C++ types and containers
The goals for the solution includes:
- Easy to support for user types, with minimal requirements
- Simple and clear usage
- Minimal performance overhead
The original proposal of for
loops were discussed in
p0353, and the basic design documented in
control_flow/loops#for.
The resulting syntax was:
for (
var declarationin
expression) {
statements}
We now need a way to allow supporting this syntax for user and library container types.
This section highlights some examples of how iterators and range-based for user-defined types is addressed in different languages.
C++ requires either .begin()
/.end()
methods, or
begin(<container>)
/end(<container>)
to be supported for
range-based for
.
Python requires 2 dunder methods, __iter__
on the container and __next__
on
the iterator, to be usable with a for <element> in <container>:
construct.
Generator functions act like iterators.
They are executed until the next yield
statement, whose argument is returned
as the next value to the for
loop. Iteration ends when the function returns.
def generate_fibonacci():
yield 0;
n1, n2 = 0, 1;
while True:
n1, n2 = n2, n1+n2;
yield n1;
for n in generate():
print(n);
Rust supports iterating over a container with for using
- ranges
for n in 1..100
- iterators
for element in container.iter()
or.iter_mut()
The Iterator
type can be used with custom types using the
following syntax
(though specific):
struct Fibonacci {
curr: u32,
next: u32,
}
impl Iterator for Fibonacci {
type Item = u32;
fn next(&mut self) -> Option<Self::Item> {
// calculate and yield value
[...]
}
}
for i in fibonacci().take(4) {
...
}
Typescript uses the
.iterator
property,
along with
two forms
of range-based for loops: for..in
and for..of
.
let pets = new Set(['Cat', 'Dog', 'Hamster']);
pets['species'] = 'mammals';
for (let pet in pets) {
console.log(pet); // "species"
}
for (let pet of pets) {
console.log(pet); // "Cat", "Dog", "Hamster"
}
Using the .iterator
property, Typescript allow to lazily generate data easily
using functions, classes, or objects:
const reverse = (arr) => ({
[Symbol.iterator]() {
let i = arr.length;
return {
next: () => ({
value: arr[--i],
done: i < 0,
}),
};
},
});
Go does not officially support customizing for
for user types.
Provide interfaces that can be implemented by user types to enable support for ranged-for loops.
We propose:
- A new
Iterate
interface to support for loops - Using a cursor-based approach
- Making the
for
loop valuelet
by default, that is, non-reassignable
Below is a high-level example of this concept:
// Add support for `for`
class MyRange {
impl as Iterate where .ElementType = i32 and .CursorType = i32 {
// Implement `Iterate`
}
}
// Usage
let my_range: MyRange = {...};
for (item: auto in my_range) {
// Do something with `item`
}
This proposal exposes a new Iterate
. This interface:
- Relies on a cursor-based approach: minimize implementation effort for developers, could support R-values.
- Expects an
ElementType
andCursorType
. - Combines "advance", "get", and "bounds check" into a single
Next(cursor)
method that returns an optional value.
The Iterate
interface is:
interface Iterate {
let ElementType:! type;
let CursorType:! type;
fn NewCursor[self: Self]() -> CursorType;
fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType);
}
A naive Carbon implementation of the for loop could be:
var cursor: range.(Iterate.CursorType) = range.(Iterate.NewCursor)();
var iter: Optional(range.(Iterate.ElementType)) = range.(Iterate.Next)(&cursor);
// A. Possible implementation
while (iter.HasValue()) {
ExecuteForBlock(iter.Get());
iter = container.(Iterate.Next)(&cursor);
}
// B. Possible alternative depending on the API for Optional(T):
while (true) {
match (iter) {
case .None => { break; }
case .Some(let value: range.(Iterate.ElementType)) => {
ExecuteForBlock(value);
iter = container.(Iterate.Next)(&cursor);
}
}
}
The cursor tracks progression, and the Next
method advances the cursor and
returns an Optional
value. An empty Optional
indicates that we have reached
the end.
Below is a sample implementation of that interface for a user type.
class MyIntContainer {
var data: ...
fn Size[self: Self]() -> i32 {
return 4;
}
impl as Iterate where .ElementType = i32 and .CursorType = i32 {
fn NewCursor[self: Self]() -> i32 { return 0; }
fn Next[self: Self](cursor: i32*) -> Optional(i32) {
// Advance and return value, or return empty
if (*cursor < self.Size()) {
let index: i32 = *cursor;
*cursor = index + 1;
return Optional(i32).Create(self.data[index]);
} else {
return Optional(i32).CreateEmpty();
}
}
}
}
Which can be used as follow:
fn Main() -> i32 {
var container: MyIntContainer = {.data = (1, 2, 3, 4)};
// Implicit `let` value declaration
for (value: i32 in container) {
Print("{0}", value);
}
return 0;
}
To keep usage simple and non-surprising, we propose defaulting variable
declarations to let
declarations. This is consistent with function parameters
and scrutinees for
pattern matching,
which are immutable by default.
// Implicitly `let item: T`
for (item: T in my_range) {
// `item` cannot be reassigned
}
This default can be overridden by adding the var
keyword:
for (var item: T in my_range) {
// `item` can be modified, but this will not modify `my_range`
// because it is a copy of the element with its own storage
}
It will be important to ensure that temporary entities on the right-hand side of
:
are alive during the execution of the for
loop to prevent invalid memory
access (Note: this was only recently addressed for C++ by
P2644).
This applies to Carbon and C++ interoperability.
// Example using a temporary range, with `GetElements()` returning an object that implements `Iterate`
for (item: T in bucket.GetElements()) {
// We need to make sure the object is alive for the duration of the `for` loop
}
This section covers the use cases highlighted in the introduction, and usage with the proposed design.
This proposal covers strictly immutable elements, and proposes defaulting
element declaration to let
, to minimize surprises and clarify the intent.
Mutable elements are mentioned in the Future work section.
For random access containers, the cursor would represent a numerical index,
incremented within Next()
.
Because the cursor type is defined by the developer, and opaque to the user, it could be a hashmap index, a string, or pointer to a node, without changing the usage for users.
The ElementType
can be a tuple, such as a (key, value)
for maps, or a single
value. See [Future work][#future-work] for other examples.
R-values are likely to be limited in terms of addressing and mutation. The impact is undefined until the corresponding design is finalized.
The cursor approach used for Iterate
has the slight advantage of not requiring
pointers (and addressing) for some container types. This may be beneficial as a
first iteration, until the dependent design is updated, at which point we will
have to define how to handle this special case.
Next()
can be trivially implemented to return a single computed value
on-demand based on the cursor, without the need for storage. If the sequence is
infinite, Next()
will never return an empty Optional
, and the for
loop
will continue until interrupted by the user.
class SquaredSequence {
impl as Iterate where .ElementType = i32 and .CursorType = i32 {
fn NewCursor[self: Self]() -> i32 { return -1; }
fn Next[self: Self](cursor: i32*) -> Optional(i32) {
// Advance and return computed value
*cursor = *cursor + 1;
return Optional(i32).Create((*cursor) * (*cursor));
}
}
}
The proposal does not present any limitation regarding large collections. The cursor can be adapted to the collection size, and no copy of the container should occur both for immutable collections, and future mutable elements.
Provided the traversal order is the same, generic filters and transformers are
supported by implementing a proxy Iterate
interface, internally or externally.
This example illustrates a simple proxy interface.
class BasicFilter(T:! Iterate) {
var seq: T*;
var value: T.ElementType;
impl as Iterate where .ElementType = T.ElementType and .CursorType = T.CursorType {
fn NewCursor[self: Self]() -> T.ElementType { return self.seq->NewCursor(); }
fn Next[self: Self](cursor: T.CursorType*) -> Optional(T.ElementType) {
var res: auto = self.seq->Next(cursor);
while (res.has_value() && res.get() == value) {
res = self.seq->Next(cursor);
}
return res;
}
}
}
fn Main() -> i32 {
let container: MyIntContainer = MyIntContainer.Create(0,1,0,2,3,0);
let no_zeros: BasicFilter(MyIntContainer) =
{ .seq = &container, .value = 0 };
// Prints "123"
for (v: i32 in no_zeros) {
Print("{0}", v);
}
return 0;
}
On the other hand, changing how the collection is traversed (for example
reverse), requires internal knowledge about the data layout, and a custom
implementation. One approach for containers that support reverse iteration would
be to implement a different interface with a Prev()
method. A proxy could be
exposed to make it usable with for
directly.
Below is an example of what such an interface could look like:
interface IterateBidirectional;
class ReverseProxy(T:! IterateBidirectional);
interface IterateBidirectional {
extends Iterate;
fn NewEndCursor[self: Self]() -> CursorType;
fn Prev[self: Self](cursor: CursorType*) -> Optional(ElementType);
final fn Reversed[self: Self]() -> ReverseProxy(Self);
}
class ReverseProxy(T:! IterateBidirectional) {
var container: T;
impl as Iterate where .ElementType = T.ElementType and .CursorType = T.CursorType {
fn NewCursor[self: Self]() -> CursorType {
return self.container.NewEndCursor();
}
fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) {
return self.container.Prev(cursor);
}
}
}
fn IterateBidirectional.Reversed[self: Self]() -> ReverseProxy(Self) {
return {.container = self};
}
/// Usage
class MyContainer {
fn Create(...) { ... }
impl as Iterate ... { ... }
impl as IterateBidirectional ... { ... }
}
var container: MyContainer = MyContainer.Create(...);
for (v: container.(Iterate.ElementType) in container.Reversed()) {...}
The cursor approach deviates from the C++ iterator approach, requiring some adaptation.
Given the current state of the design, C++ interoperability is yet to be defined, and suggestions have been highlighted in the Future work section.
Supporting mutable values could be achieved using a second interface that
provides a proxy or view for the data, exposing an Iterate
interface with
ElementType*
// Object implementing `MutableIterate` returned by a function
for (p: T* in my_range.AddrRange()) {}
The example MutableIterate
interface below acts as a proxy for the collection,
and implements our Iterate
interface.
interface MutableIterate {
impl as Iterate;
alias ElementType = Iterate.ElementType;
let ProxyType:! Iterate where .ElementType = ElementType*
and .CursorType is ImplicitAs((Self as Iterate).CursorType);
fn AddrRange[addr self: Self*]() -> ProxyType;
}
class MyIntContainer {
//...
impl as Iterate ...
external impl as MutateIterate where .ProxyType = Self {
...
}
}
Mixins are currently in early design stages. This section highlights possible uses speculating on the final design. Some may include:
- Improved semantics for views compared to getter methods: no direct side effect from using the view
- Facilitate code reuse, compared to reimplementing an interface
- Direct access to
self
, limiting needs for pointers and address resolution
Named mixins, used as programmable members, could expose a view with mutable values:
// Using a named mixin to expose mutable values
for (p: my_range.(Iterate.ElementType)* in my_range.mutable) {}
Used in this way, this clarifies that this a view of the data, like an attribute, with no direct side effects, when compared to a method.
Similarly, mixins could allow supporting other views into the container.
For example, maps could expose a view with "key" and "value" only, in addition to the default (key, value) tuple.
// Using a named mixin to expose mutable values
for (k: my_dict.(Iterate.CursorType) in my_dict.keys) {}
And "reversed" view could also be used for reverse iteration:
// Reversing
for (v: my_dict.(Iterate.ElementType) in my_dict.reversed) {}
Large elements would have a negative impact if the Optional
cannot leverage
unused bit patterns within the type, nor present a view instead of a copy.
If this proves difficult, we can let implementers of the container choose
whether to expose values or pointers to values. When electing to "pointers to
values", an adapter can be provided to support transforming the range of
pointers into a range a values. This choice allows supporting either iteration
over container r-values (ElementType = T
), or efficient iteration
(ElementType = T*
), but not both. While acceptable as an intermediate
solution, a better alternative will be needed in the future.
adapter IterateByValue(T:! Iterate where .CursorType impls Deref) for T {
impl as Iterate
where .CursorType = T.CursorType
and .ElementType = T.CursorType.(Deref.Result) {
...
}
}
// Used like:
var frames: NetworkFramesContainerByPtr = ...
for (i: NetworkFrame in frames as IterateByValue(NetworkFramesContainerByPtr))
{
...
}
Alternatively, we can allow the binding in the for
loop to be an addr
pattern, implemented by calling into a separate interface on the container. This
puts the choice of using values or pointers to values in the hands of the for
loop author. While this deviates from the ergonomics of let
(where the
compiler can choose to copy or alias the value depending on what is more
efficient), this option would provide parity with C++ for
loops (where the
author chooses between value and reference).
For reference, range-based for
loops in C++ requires:
begin()
andend()
methods or free functions, and- the type returned supports pre-increment
++
, indirection*
, and inequality!=
operations
See range-based for statement for more details.
A limitation of the approach proposed in this document is the need for adaptation to interoperate with the C++ iterator model. The adapter would be a Carbon type that satisfies the Cursor interface, implemented in terms of a C++ iterator.
NewCursor
providing the "start iterator" returned bybegin()
Next
doing the "increment, bounds check, and returning value"
Two different approaches are identified:
- One is to have a templated
impl
ofIterate
, for anything with the right method signatures (begin
,end
, and so on) - Another is a templated adapter that implements
Iterate
when requested
// Template impl approach
template constraint CppIterator { ... }
template constraint CppContainer {
// Speculative
for_unique IteratorType: CppIterator;
fn begin() -> IteratorType;
fn end() -> IteratorType;
}
external impl forall [template T:! CppContainer] T as Iterate
where .CursorType = T.(CppContainer.IteratorType)
and .ElementType = .CursorType.ElementType {
fn NewCursor[self: Self]() -> CursorType {
return self.(CppContainer.begin)();
}
fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) {
if (*cursor == self.(CppContainer.end)()) {
return Optional(ElementType).MakeEmpty();
}
let value: ElementType = **cursor;
++*cursor;
return Optional(ElementType).MakeValue(value);
}
}
// Used like:
var v: Cpp.std.vector(i32) = ...
for (i: i32 in v) { ... }
// Adapter approach
adapter CppIterate(template T:! type) for T {
impl as Iterate
// Speculative syntax. Alternatively, a cursor type parameter
// would avoid the need for deduction.
where .CursorType = typeof(T.begin())
and .ElementType = .CursorType.(Deref.Result) {
fn NewCursor[self: Self]() -> CursorType {
return self.begin();
}
fn Next[self: Self](cursor: CursorType*) -> Optional(ElementType) {
if (*cursor == self.end()) {
return Optional(ElementType).MakeEmpty();
}
let value: ElementType = **cursor;
++*cursor;
return Optional(ElementType).MakeValue(value);
}
}
}
// Used like:
var v: Cpp.std.vector(i32) = ...
for (i: i32 in v as CppIterate(Cpp.std.vector(i32))) { ... }
These two options could allow interoperability with a compatible C++ type, either implicitly (template approach) or explicitly (adapter approach).
There need to be begin()
and end()
methods or free functions accessible to
C++ for any Carbon type that implements the Iterate
interface, in addition to
overloads of the *
, ++
, and !=
operators for the types returned.
The Iterate
interface does not expose a way to have a cursor nor an iterator
to the past-last element. An option would be to have end()
return a predefined
invalid value, to satisfy iterator comparison when Next()
returns an empty
Optional
.
Below is sample implementation to iterate on a Carbon Iterate
. Note that we
need to copy and cache the last value, to allow for *it
to be called. This
puts a requirement on ElementType
to be copyable for this interoperability to
be usable.
namespace Carbon {
// A C++ input iterator for any type that implements the Carbon `Iterate` interface.
template <typename Iterate>
class IterateIterator {
public:
// Returns an "Invalid" iterator representing `end()`
static auto Invalid(const Iterate& container) -> IterateIterator<Iterate> {
return IterateIterator<Iterate>(container, std::nullopt);
}
IterateIterator(const Iterate& container,
std::optional<typename Iterate::CursorType> cursor)
: container_(&container), cursor_(cursor) {
if (cursor_) {
next();
}
}
IterateIterator(const IterateIterator& other)
: container_(other.container_), cursor_(other.cursor_),
value_(other) {}
auto operator*() const -> const typename Iterate::ElementType& {
assert(cursor_);
return value_;
}
auto operator++() -> IterateIterator<Iterate>& {
next();
return *this;
}
auto operator==(const IterateIterator<Iterate>& rhs) -> bool {
return container_ == rhs.container_ && cursor_ == rhs.cursor_;
}
auto operator!=(const IterateIterator<Iterate>& rhs) -> bool {
return !(*this == rhs);
}
private:
void next() {
assert(cursor_);
if (const auto v = container_->Next(*cursor_); v) {
value_ = *std::move(v);
} else {
cursor_ = std::nullopt;
}
}
const Iterate* container_;
std::optional<typename Iterate::CursorType> cursor_;
typename Iterate::ElementType value_;
};
// The C++ side of a custom Carbon type implementing
// `Iterate` with CursorType `C`, and ElementType `T`
// could have `begin()` and `end()` methods defined as:
class MyIterateType {
// [...]
auto begin() -> IterateIterator<Iterate<C, T>> {
return IterateIterator<Iterate<C, T>>(*this, NewCursor());
}
auto end() -> IterateIterator<Iterate<C, T>> {
return IterateIterator<Iterate<C, T>>::Invalid(*this);
}
};
} // namespace Carbon
// Usage:
Carbon::MyIterateType container;
for (auto s : container) { /*...*/ }
Using an Optional
has downsides, discussed in the
"alternatives considered" section. A possible
approach would be to invert the control of the for
loop, by having the
container call in the for
block for each value, passing the value as argument.
This would work similarly to Python generator functions.
This has the following advantages:
- Removes the need for an
Optional
, and the associated overheads, copies, and unwrapping. - No boundary checks needed at the
for
level - Compatible with R-value containers
Limitations:
- The context needed by the
for
body will needs to be available or provided - No optimizer known to the team can reliably produce efficient code for this construct.
This proposal offers a first step to support a needed way to iterate using a
for
loop. More specifically:
- Requires a single interface to be implemented with a combined
Next
method, uses cursors instead of their more complex iterator counterpart, and defines loop values aslet
by default to be less error prone (Code that is easy to read, understand, and write). - Makes use of Library APIs only.
An option was to define methods such as Get
, Advance
, IsAtEnd
on
Iterate
, instead of a unique Next
methods. While being more atomic, the
concern is that these individual methods will all need to be called separately
while iterating, while a single call to Next
performs these three operations
together, providing more opportunities for optimization.
Using a single operation to return an optional is also consistent with Rust’s approach.
The combined Next
method forces us to return an Optional
, which has
downsides:
- potential performance overhead to wrapping and unwrapping,
- storage overhead for
Optional
itself, and - may require additional copying of the value wrapped.
Those may be mitigated by leveraging unused bit patterns, passing a view of the value, or inverting control.
The iterator approach was considered but proved to have key limitations that a cursor approach does not have:
- It requires implementing 2 interfaces instead of 1, and is more complex due to having the iteration logic separated from the container itself
- More difficult to harden or troubleshoot, compared to a cursor that allows bounds checking in debug & hardened build modes
- Can pose problems with ranges that consumes their input (See Barry Revzin’s
"take(5)" presentation from C++Now 2023), when compared to a combined
Next()
approach. - Higher overhead
- Proves to be difficult to support for R-values
On the other hand, the cursor approach deviates more from C++ than iterator, and may require some more effort for C++ interoperability.
Because some collections will be read-only, we want to separate mutable and immutable interfaces. A single interface would force developers to implement or stub the mutable portion of the interface, even if it is not applicable.