PHP RFC: Union Types

Introduction

PHP has type declarations which can be associated with function parameters or return values. These declarations perform two useful roles:

  • They allow the PHP engine to enforce the correct type of variable passed to or returned from a function.
  • They make it easier to reason about what types need to be passed to or can be returned from a function. Both humans and static code analysis tools can use this information to help determine the correctness of the code.

For many functions in PHP, each parameter accepts only one type. Similarly, most functions return a value of only one type.

However, for a significant number of functions, the acceptable parameters or the possible return values can be of more than one type. For example, consider the stripos function, whose return value depends on whether the needle is found:

  • if the needle exists, an integer is returned.
  • if the needle is not found, false is returned.

In the documentation on php.net, this return type is documented as mixed. However, that does not say which types can actually be returned, only that more than one type is possible.

Currently, in userland code, when a function parameter or return value can be one of multiple types, no type information can be supplied at all. The PHP engine cannot enforce any types for such parameters or return values, and it is correspondingly hard for people using these functions to reason about the types involved.

This RFC seeks to address these limitations.

Proposal

This RFC proposes the ability to define multiple possible types for parameter and return types. To define a 'union type' a single vertical bar (OR) is placed between types e.g. int|bool represents the union type of either integer or boolean. For these 'union types' a value passes the type check if the value would pass any one of the types in the union.

Additionally this RFC proposes that the values true, false (see the "True/False" section) and null (equal to type null; see the "Nullable types" section) will be usable as types in both parameter types and return type definitions.

There can be more than two types in the union.
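
The rule that a value passes when it satisfies any one member of the union can be sketched in userland. The helper value_matches_union() below is hypothetical and not part of this proposal; it ignores weak-mode coercion and only knows a handful of built-in type names, treating everything else as a class or interface name.

```php
<?php
// Hypothetical helper, not part of the RFC: returns true when $value
// satisfies at least one member of a union written as "int|bool".
// Weak-mode coercion is deliberately ignored in this sketch.
function value_matches_union($value, string $union): bool
{
    $checks = [
        'int'    => 'is_int',
        'float'  => 'is_float',
        'string' => 'is_string',
        'bool'   => 'is_bool',
        'array'  => 'is_array',
        'null'   => 'is_null',
    ];

    foreach (explode('|', $union) as $type) {
        $type = trim($type);
        if (isset($checks[$type])) {
            if ($checks[$type]($value)) {
                return true;
            }
        } elseif ($value instanceof $type) {
            // Anything else is treated as a class or interface name.
            return true;
        }
    }

    return false;
}

var_dump(value_matches_union(5, 'int|bool'));                            // bool(true)
var_dump(value_matches_union('5', 'int|bool'));                          // bool(false)
var_dump(value_matches_union(new ArrayObject(), 'array | Traversable')); // bool(true)
```

The check succeeds as soon as one member accepts the value, so the order in which the members are written does not affect the result.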

Parameter type examples

A function that requires either a string or an array to be passed as its parameter:

function print_each(array | string $in) {
    foreach ((array) $in as $value) {
        echo $value, PHP_EOL;
    }
}
 
print_each(['Bob', 'Joe', 'Levi']); // ok
print_each('Levi'); // ok
print_each(new stdClass()); // TypeError

For this example, it is clear to both static analysis tools and humans that passing anything other than an array or a string to this function would be an error (or, if strict_types is disabled, that the value will be weakly cast to a string where possible; see also the "Weak Scalar Types" section).

A class instance method that requires either a string or a ParameterGenerator object to be passed as its parameter:

// From zend-code
class MethodGenerator extends AbstractMemberGenerator {
     ...
    public function setParameter(ParameterGenerator|string $parameter) {
        ...
    }
}

For this example, it is clear to both static analysis tools and humans that passing anything other than a ParameterGenerator object or a string to this function would be an error.

Return type example

A userland definition of the stripos function:

function stripos(string $haystack, string $needle, int $offset = 0): int|false {
    $lowerHaystack = strtolower($haystack);
    $lowerNeedle = strtolower($needle);
    return strpos($lowerHaystack, $lowerNeedle, $offset);
}

For this example, it is clear to both static analysis tools and humans that this function can return either an integer or the value false, and so both cases need to be handled in the calling code.
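
What handling both cases looks like in calling code can be sketched with the built-in stripos function. The helper name describe_match() is hypothetical and only serves the illustration. The strict comparison matters because a match at offset 0 is falsy.

```php
<?php
// Hypothetical caller that handles both members of int|false.
function describe_match(string $haystack, string $needle): string
{
    $pos = stripos($haystack, $needle);

    // Compare with === so that a match at offset 0 is not mistaken for false.
    if ($pos === false) {
        return 'not found';
    }

    return 'found at offset ' . $pos;
}

echo describe_match('Union Types', 'union'), PHP_EOL; // found at offset 0
echo describe_match('Union Types', 'TYPES'), PHP_EOL; // found at offset 6
echo describe_match('Union Types', 'xyz'), PHP_EOL;   // not found
```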

Nullable types

To cover the common use-case of returning some type or null, the null type needs to be permitted in a type declaration. The name is already reserved and the documentation already documents that null is both a type and a value. Previously it was not a helpful type declaration - if something is always passed null then there doesn't need to be a parameter at all, and if a function always returns null then there is no need to assign it. With the introduction of union types it becomes helpful and so this RFC proposes allowing null in unions:

function lookup_user(string $id): User | null;

This is currently possible via the short-hand nullable type support ?Type. However some concerns have been raised:

  1. ?Foo | Bar is pretty weird, it reads like “(nullable Foo) or (Bar)” when the nullability is not tied to a particular type.
  2. Allowing Foo | null and ?Foo is redundant.
  3. Foo | null is more explicit than ?Foo. Users who are not familiar with ? in other languages may understand the | better.

To address some of these issues this RFC disallows ? being used in combination with union types. Thus Foo | Bar | null is allowed, but not ?Foo | Bar.
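
For comparison, the existing ?Type shorthand can be sketched as follows. The User class and the lookup data are hypothetical and exist only for illustration; under this proposal the same return type could also be written as User | null.

```php
<?php
// Hypothetical User class, used only for illustration.
class User
{
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// ?User is the existing shorthand; the RFC would also allow "User | null".
function lookup_user(string $id): ?User
{
    $users = ['42' => 'Levi']; // stand-in for a real data source

    return isset($users[$id]) ? new User($users[$id]) : null;
}

$found = lookup_user('42');
$missing = lookup_user('7');

echo $found === null ? 'no user' : $found->name, PHP_EOL;     // Levi
echo $missing === null ? 'no user' : $missing->name, PHP_EOL; // no user
```

Callers have to handle the null case explicitly whichever spelling is used.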

This RFC proposes a vote on whether ?Foo shall be replaced by Foo | null in general.

True/False

It may be helpful to be able to explicitly use | false for return types as this is a common idiom in PHP's standard functions. As an example, the function signature for strpos could change:

// from
strpos ( string $haystack , mixed $needle [, int $offset = 0 ] ): mixed
// to
strpos ( string $haystack , mixed $needle [, int $offset = 0 ] ): int | false

This makes it possible to forward any internal signature exactly and allows users to be more explicit.

Also, false and true are not types in userland, but they are internally.

This RFC proposes a vote to decide if true and false should be supported for unions.

Weak Scalar Types

Problem

PHP 7 allows weak scalar types. There is a question of how things will get converted in some situations when used in unions. As an example, if we have a union type of int and float and are passed the string “10” how is it converted?

function f(int | float $number) {
    return $number * 2;
}
f("10");

Would it be converted to int(10) or float(10), since either is acceptable? Does it matter given they are both acceptable?

Solution

Primarily, this issue is avoided if a parameter type exactly matches the input type or if PHP is in strict type mode.

The only exception is an int(10) passed to a parameter that accepts float but not int among its types (e.g. string | float): in accordance with the normal handling of integers passed to floats, it will be coerced to float(10).
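
The normal handling of integers passed to floats can be observed with a plain float parameter. The function name accept_float() is hypothetical, and the sketch only shows the existing widening under strict_types, not the proposed union behaviour.

```php
<?php
declare(strict_types=1);

// Hypothetical function with a plain float parameter.
function accept_float(float $number): float
{
    return $number;
}

// An int argument is accepted even in strict mode and arrives as a float.
var_dump(accept_float(10)); // float(10)

// A numeric string is rejected in strict mode.
try {
    accept_float('10');
} catch (TypeError $e) {
    echo 'TypeError', PHP_EOL;
}
```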

Otherwise PHP's casting rules are applied in an order to be as lossless as possible. PHP's weak-type casting rules are complex, which leads to a seemingly complex set of rules for casting types, however these rules are not an invention of this proposal. This RFC applies PHP casting rules in a sane way to convert a value to a type accepted by the union whenever possible.

Passed type | Union type #1         | #2      | #3
object      | string (__toString()) | -       | -
boolean     | int                   | float   | string
int         | float*                | string  | boolean
float       | string                | int     | boolean
string      | int/float†            | boolean | -

* While string is more lossless than float for big values, we have to match behavior with strict types enabled here.
† Only if is_numeric() would return true.

For the passed type, each candidate type is checked in the order shown, and the first one that is part of the union is used. If none of them is part of the union, a TypeError is thrown.
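
The lookup described by the table can be modelled in userland. The function coerce_for_union() below is a hypothetical sketch, not the patch's implementation. It handles scalar type names only and assumes that a numeric string prefers whichever of int or float PHP itself would pick for that string; the table does not spell this out.

```php
<?php
// Hypothetical userland model of the conversion order in the table.
// $union lists scalar type names, e.g. ['int', 'float']. Not part of the RFC.
function coerce_for_union($value, array $union)
{
    // Candidate target types per passed type, in table order.
    $order = [
        'bool'   => ['int', 'float', 'string'],
        'int'    => ['float', 'string', 'bool'],
        'float'  => ['string', 'int', 'bool'],
        'string' => ['bool'], // numeric targets are handled separately below
        'object' => ['string'],
    ];
    // gettype() uses legacy names; map them to type-declaration names.
    $legacy = ['boolean' => 'bool', 'integer' => 'int', 'double' => 'float'];
    $passed = gettype($value);
    $passed = $legacy[$passed] ?? $passed;

    // An exact match is passed through unchanged.
    if ($passed !== 'object' && in_array($passed, $union, true)) {
        return $value;
    }

    // Assumption: a numeric string prefers the type PHP would pick for it,
    // then falls back to the other numeric type.
    if ($passed === 'string' && is_numeric($value)) {
        $number = $value + 0;
        $preferred = is_int($number) ? ['int', 'float'] : ['float', 'int'];
        foreach ($preferred as $target) {
            if (in_array($target, $union, true)) {
                settype($number, $target);
                return $number;
            }
        }
    }

    foreach ($order[$passed] ?? [] as $target) {
        if (!in_array($target, $union, true)) {
            continue;
        }
        if ($passed === 'object') {
            if (!method_exists($value, '__toString')) {
                continue;
            }
            return (string) $value;
        }
        settype($value, $target);
        return $value;
    }

    throw new TypeError('Value of type ' . $passed . ' is not accepted by ' . implode('|', $union));
}

var_dump(coerce_for_union(10, ['string', 'float']));   // float(10)
var_dump(coerce_for_union('10', ['int', 'float']));    // int(10)
var_dump(coerce_for_union(1.5, ['int', 'string']));    // string(3) "1.5"
var_dump(coerce_for_union(true, ['float', 'string'])); // float(1)
```

Because the order is keyed on the passed type, the result does not depend on how the union itself is written.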

Problems with left-to-right

Left-to-right conversion has been proposed multiple times. But this is not a viable solution for the following reasons:

  • (string|float) would convert to a string if passed an integer, which would be inconsistent with strict types converting it to a float. This type of inconsistency must be avoided.
  • Also, in strict left-to-right, exact matches would still be cast to the first type (from the left) to which they can be cast. This would, again, be inconsistent with strict types enabled.
  • Ultimately, (float|int) would always convert a passed integer to float, even in strict types mode; this is very counterintuitive.

It might be possible to exempt exact matches, but that adds yet another rule and still leaves the first problem in the list above. At that point it is much simpler to have a well-defined conversion order that depends on the passed type.
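
The objections can be made concrete with a small simulation. The function left_to_right_int() is a hypothetical model of naive left-to-right conversion for integer arguments, not engine behaviour. Since an integer can be weakly cast to every scalar type, the first type listed always wins.

```php
<?php
// Hypothetical model of naive left-to-right conversion for int arguments:
// the value is cast to the first type listed in the union.
function left_to_right_int(int $value, array $union)
{
    settype($value, $union[0]);

    return $value;
}

// string|float: the integer becomes a string, although strict types
// would widen it to a float.
var_dump(left_to_right_int(10, ['string', 'float'])); // string(2) "10"

// float|int: the integer becomes a float, although int is an exact match.
var_dump(left_to_right_int(10, ['float', 'int']));    // float(10)
```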

Variance

Return types are covariant: it is possible to remove types from the union in child functions.

Parameter types are contravariant: it is possible to add types to the union in child functions.

interface Foo {
    function pos(string $baz): int | false;
}
interface Bar extends Foo {
    function pos(string | Stringable $baz): int;
}

Reflection

This RFC proposes the addition of a class ReflectionUnionType inheriting from ReflectionType, with a single method ReflectionUnionType::getTypes(): array<ReflectionType> that returns the reflection objects of the individual types in the union.

ReflectionUnionType::__toString() will now provide a full union type as string; e.g. “int | float | NumberObject”.
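
Code that consumes reflection data could treat the proposed class alongside single types roughly as follows. The helper reflected_type_names() and the function scale() are hypothetical, and the sketch assumes PHP 7.1 or later, where single types are reported as ReflectionNamedType. The instanceof check simply evaluates to false on runtimes where ReflectionUnionType does not exist.

```php
<?php
// Hypothetical helper: lists the type names behind a ReflectionType.
function reflected_type_names(ReflectionType $type): array
{
    // ReflectionUnionType is the class proposed by this RFC.
    if ($type instanceof ReflectionUnionType) {
        $names = [];
        foreach ($type->getTypes() as $member) {
            $names[] = $member->getName();
        }
        return $names;
    }

    // A single named type (ReflectionNamedType).
    return [$type->getName()];
}

// Hypothetical function to reflect on.
function scale(float $factor)
{
}

$parameter = (new ReflectionFunction('scale'))->getParameters()[0];
print_r(reflected_type_names($parameter->getType()));
```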

Proposed PHP Version(s)

This RFC targets PHP version 7.1.

Vote

This RFC requires that two-thirds of voters vote in favor of the RFC to pass.

Merge union types
Real name Yes No
ajf (ajf)  
bmajdak (bmajdak)  
bwoebi (bwoebi)  
cmb (cmb)  
colinodell (colinodell)  
derick (derick)  
dmitry (dmitry)  
galvao (galvao)  
guilhermeblanco (guilhermeblanco)  
jgmdev (jgmdev)  
kalle (kalle)  
lcobucci (lcobucci)  
levim (levim)  
lstrojny (lstrojny)  
marcio (marcio)  
mariano (mariano)  
mbeccati (mbeccati)  
mrook (mrook)  
ocramius (ocramius)  
pierrick (pierrick)  
rasmus (rasmus)  
salathe (salathe)  
santiagolizardo (santiagolizardo)  
sebastian (sebastian)  
seld (seld)  
sobak (sobak)  
stas (stas)  
svpernova09 (svpernova09)  
zeev (zeev)  
Final result: 11 18
This poll has been closed.

Additionally, there are two 50%+1 votes:

Replace ?QuestionMarkNullables by union | null
Real name Yes No
ajf (ajf)  
bmajdak (bmajdak)  
bwoebi (bwoebi)  
cmb (cmb)  
colinodell (colinodell)  
derick (derick)  
dmitry (dmitry)  
galvao (galvao)  
guilhermeblanco (guilhermeblanco)  
jgmdev (jgmdev)  
kalle (kalle)  
lcobucci (lcobucci)  
lstrojny (lstrojny)  
marcio (marcio)  
mariano (mariano)  
mbeccati (mbeccati)  
mrook (mrook)  
ocramius (ocramius)  
pierrick (pierrick)  
salathe (salathe)  
santiagolizardo (santiagolizardo)  
sebastian (sebastian)  
seld (seld)  
sobak (sobak)  
stas (stas)  
svpernova09 (svpernova09)  
Final result: 5 21
This poll has been closed.

Include true/false types
Real name Yes No
ajf (ajf)  
bmajdak (bmajdak)  
bwoebi (bwoebi)  
cmb (cmb)  
colinodell (colinodell)  
derick (derick)  
dmitry (dmitry)  
galvao (galvao)  
guilhermeblanco (guilhermeblanco)  
jgmdev (jgmdev)  
kalle (kalle)  
lcobucci (lcobucci)  
lstrojny (lstrojny)  
marcio (marcio)  
mariano (mariano)  
mrook (mrook)  
nikic (nikic)  
ocramius (ocramius)  
pierrick (pierrick)  
salathe (salathe)  
santiagolizardo (santiagolizardo)  
sebastian (sebastian)  
seld (seld)  
stas (stas)  
svpernova09 (svpernova09)  
tpunt (tpunt)  
zeev (zeev)  
zimt (zimt)  
Final result: 5 23
This poll has been closed.

The vote started 14th June 2016 and will end 23rd June 2016.

Patches and Tests

Bob Weinand and Joe Watkins have created a patch: https://github.com/php/php-src/pull/1887 which needs some small polishing but implements the proposed features.

Future Scope

This section details areas where the feature might be improved in future, but that are not currently proposed in this RFC.

Long Type Expressions

Since you can create a chain of types the names can get quite lengthy. Even the fairly short union type of Array | Traversable can be repetitive to type out. Should a mechanism to provide type aliases exist?

type Iterable = Array | Traversable;
 
function map(Callable $f, Iterable $input): Iterable {
    foreach ($input as $key => $value) {
        yield $key => $f($value);
    }
}
 
function filter(Callable $f, Iterable $input): Iterable {
    foreach ($input as $key => $value) {
        if ($f($value)) {
            yield $key => $value;
        }
    }
}

It may also be advantageous for implementation reasons to define a type name for an expression.

References

rfc/union_types.txt · Last modified: 2017/09/22 13:28 by 127.0.0.1