Code and name organization

Pull request

Table of contents

Problem

How do developers store code for the compiler, and access it for reuse?

Proposal

Adopt an approach with tiered files, libraries, packages and namespaces in the design.

Out-of-scope issues

Related issues that are out-of-scope for this proposal are:

  • Access control: while there is some implicit access control from interface vs implementation considerations for libraries, they are more about addressing circular dependencies in code.

  • Aliasing implementation: while the alias keyword is critical to how easy or difficult refactoring is, it should be designed on its own.

  • Compilation details: this proposal sets up a framework that should enable well-designed compilation, but does not set about to design how compilation will work.

  • File-private identifiers: Something similar to C++ static functions may exist. However, that will be addressed separately.

  • Incremental migration and unused imports: incrementally migrating a declaration from one library to another might require an intermediate state where callers import both libraries, with consequent issues. However, it may also not require such. Whether it does, or whether tooling needs to be added to support the specific intermediary state of transitional, unused imports, is out of scope.

  • Name lookup, including addressing conflicting names between imports and names in the current file: the name lookup design is likely to address this better, including offering syntax that could refer to it if needed.

    • After discussion, we believe we do not need to support package renaming. However, that final decision should be based on name lookup addressing the issue, as implications need to be considered more deeply.
  • Package management: while we want to choose syntax that won't impose barriers on building package management into Carbon, we should not make assumptions about how package management should work.

  • Prelude package, or fundamentals: while we've discussed how to handle name lookup for things like Int32, this proposal mainly lays out a framework where options for addressing that are possible.

This proposal should not be interpreted as addressing these issues. A separate discussion of these issues will remain necessary.

Open questions for the decision

Extended open question comparisons may be found in the examples doc in addition to the code_and_organization.md alternatives section.

Should we switch to a library-oriented structure that's package-agnostic?

Decision: No.

Right now, the package syntax is very package-oriented. We could instead eliminate package semantics from code and organization, relying only on libraries and removing the link to distribution. This is the collapse the package concept into libraries alternative.

Does the core team agree with the approach to packages and libraries? If not, does the alternative capture what the core team wants to be turned into the proposal, or is some other approach preferred?

Should there be a tight association between file paths and packages/libraries?

Decision: Make paths correspond to libraries for API files, not impl files. Keep package.

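Right now, the package syntax requires the package's own name be repeated through code. This touches on two alternatives: the association between the file system path and library/namespace, and referring to the package as package.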

The end result of taking both alternatives would be that:

  • The package and library would no longer need to be specified on the first line.
    • The import would still need a library.
  • The package keyword would always be used to refer to the current package.
    • Referring to the current package by name would be disallowed, to allow for easier renames of conflicting package names.

Justification

  • Software and language evolution:

    • The syntax and interactions between package and import should enable moving code between files and libraries with fewer modifications to callers, easing maintenance of large codebases.

      • In C++ terms, #include updates are avoidable when moving code around.
  • Code that is easy to read, understand, and write:

    • By setting up imports so that each name in a file is unique and refers to the source package, we make the meaning of symbols clear and easier to understand.

    • The proposed namespace syntax additionally makes it clear when the package's default namespace is not being used.

      • This is in contrast to C++ namespaces, where the entire body of code above the line of code in question may be used to start a namespace.
    • Clearly marking interfaces will make it easier for both client code and IDE tab completion to more easily determine which APIs can be used from a given library.

  • Fast and scalable development:

    • The structure of libraries and imports should help enable separate compilation, particularly improving performance for large codebases.
  • Interoperability with and migration from existing C++ code:

    • The syntax of import should enable extending imports for use in interoperability code.

Alternatives considered

Packages

Name paths for package names

Right now, we only allow a single identifier for the package name. We could allow a full name path without changing syntax.

Advantages:

  • Allow greater flexibility and hierarchy for related packages, such as Database.Client and Database.Server.
  • Would allow using GitHub repository names as package names. For example, carbon-language/carbon-toolchain could become carbon_language.carbon_toolchain.
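
As a rough illustration of the second advantage, the mapping from a repository name to a package name path could be sketched in Python as below. The helper name `repo_to_package_path` and the replacement rules (`-` becomes `_`, `/` becomes `.`) are assumptions inferred from the single example above, not something this proposal defines.

```python
def repo_to_package_path(repo: str) -> str:
    """Turn an 'owner/name' repository string into a dotted name path.

    Hypothetical helper: the rules are guessed from one example.
    """
    parts = repo.split("/")
    # Hyphens cannot appear in identifiers, so map them to underscores.
    return ".".join(part.replace("-", "_") for part in parts)


print(repo_to_package_path("carbon-language/carbon-toolchain"))
# prints "carbon_language.carbon_toolchain"
```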

Disadvantages:

  • Multiple identifiers are more complex than a single identifier.
  • Other languages with similar distribution packages don't have a hierarchy, and so it may be unnecessary for us.
  • We can build a custom system for reserving package names in Carbon.

At present, we are choosing to use single-identifier package names because of the lack of clear advantage towards a more complex name path.

Referring to the package as package

Right now, we plan to refer to the package containing the current file by name. What's important in the below example is the use of Math.Stats:

package Math library "Stats" api;
api struct Stats { ... }
struct Quantiles {
  fn Stats();
  fn Build() {
    ...
    var Math.Stats: b;
    ...
  }
}

We could instead use package as an identifier within the file to refer to the package, giving package.Stats.

It's important to consider how this behaves for impl files, which expect an implicit import of the API. In other words, for impl files, this can be compared to an implicit import Math; versus an implicit import Math as package;. However, there may also be explicit imports from the package, such as import Math library "Trigonometry";, which may or may not be referable to using package, depending on the precise option used.

Advantages:

  • Gives a stable name to refer to the current library's package.
    • This reduces the amount of work necessary if the current library's package is renamed, although imports and library consumers may still need to be updated. If the library can also refer to the package by the package name, even with imports from other libraries within the package, work may not be significantly reduced.
  • The same syntax can be used to refer to entities with the same name as the package.
    • For example, in a package named DateTime, package.DateTime is unambiguous, whereas DateTime.DateTime could be confusing.

Disadvantages:

  • We are likely to want a more fine-grained, file-level approach proposed by name lookup.
  • Allows package owners to name their packages things that they rarely type, but that importers end up typing frequently.
    • The existence of a short package keyword shifts the balance for long package names by placing less burden on the package owner.
  • Reuses the package keyword with a significantly different meaning, changing from a prefix for the required declaration at the top of the file, to an identifier within the file.
    • We don't need to have a special way to refer to the package to disambiguate duplicate names. In other words, there is likely to be other syntax for referring to an entity DateTime in the package DateTime.
    • Renaming to a library keyword has been suggested to address concerns with package. Given that library is an argument to package, it does not significantly change the con.
  • Creates inconsistencies as compared to imports from other packages, such as package Math; import Geometry;, and imports from the current package, such as package Math; import Math library "Stats";.
    • Option 1: Require package to be used to refer to all imports from Math, including the current file. This gives consistent treatment for the Math package, but not for other imports. In other words, developers will always write package.Stats from within Math, and Math.Stats will only be written in other packages.
    • Option 2: Require package be used for the current library's entities, but not other imports. This gives consistent treatment for imports, but not for the Math package as a whole. In other words, developers will only write package.Stats when referring to the current library, whether in api or impl files. Math.Stats will be used elsewhere, including from within the Math package.
    • Option 3: Allow either package or the full package name to refer to the current package. This allows code to say either package or Math, with no enforcement for consistency. In other words, both package.Stats and Math.Stats are valid within the Math package.

Because name lookup can be expected to address the underlying issue differently, we will not add a feature to support name lookup. We also don't want package owners to name their packages things that even they find difficult to type. As part of pushing library authors to consider how their package will be used, we require them to specify the package by name where desired.

Remove the library keyword from package and import

Right now, we have syntax such as:

package Math library "Median" api;
package Math library "Median" namespace Stats api;
import Math library "Median";

We could remove library, resulting in:

package Math.Median api;
package Math.Median namespace Math.Stats api;
import Math.Median;

Advantages:

  • Reduces redundant syntax in library declarations.
    • We expect libraries to be common, so this may add up.

Disadvantages:

  • Reduces explicitness of package vs library concepts.
  • Creates redundancy of the package name in the namespace declaration.
    • Instead of package Math.Median namespace Math.Stats, could instead use Stats, or this.Stats to elide the package name.
  • Potentially confuses the library names, such as Math.Median, with namespace names, such as Math.Stats.
  • Either obfuscates or makes it difficult to put multiple libraries in the top-level namespace.
    • This is important because we are interested in encouraging such behavior.
    • For example, if package Math.Median api; uses the Math namespace, the presence of Median with the same namespace syntax obfuscates the actual namespace.
    • For example, if package Math.Median namespace Math api is necessary to use the Math namespace, requiring the namespace keyword makes it difficult to put multiple libraries in the top-level namespace.

As part of avoiding confusion between libraries and namespaces, we are declining this alternative.

Rename package concept

In other languages, a "package" is equivalent to what we call the name path here, which includes the namespace. We may want to rename the package keyword to avoid conflicts in meaning.

Alternative names could be 'bundle', 'universe', or something similar to Rust's 'crates'; perhaps 'compound' or 'molecule'.

Advantages:

  • Avoids conflicts in meaning with other languages.
    • Java, similar to a namespace path.
    • Go, similar to a namespace path.

Disadvantages:

  • The meaning of package also overlaps a fair amount, and we would lose that context.

No association between the file system path and library/namespace

Several languages create a strict association between the method for pulling in an API and the path to the file that provides it. For example:

  • In C++, #include refers to specific files without any abstraction.
    • For example, #include "PATH/TO/FILE.h" means there's a file PATH/TO/FILE.h.
  • In Java, package and import both reflect file system structure.
    • For example, import PATH.TO.FILE; means there's a file PATH/TO/FILE.java.
  • In Python, import requires matching file system structure.
    • For example, import PATH.TO.FILE means there's a file PATH/TO/FILE.py.
  • In TypeScript, import refers to specific files.
    • For example, import {...} from 'PATH/TO/FILE'; means there's a file PATH/TO/FILE.ts.

For contrast:

  • In Go, package uses an arbitrary name.
    • For example, import "PATH/TO/NAME" means there is a directory PATH/TO that contains one or more files starting with package NAME.

In Carbon, we are using a strict association to say that import PACKAGE library "PATH/TO/LIBRARY" means there is a file PATH/TO/LIBRARY.carbon under some package root.
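
A minimal Python sketch of that association for API files follows. The function name `api_file_for_import` and the `packages/Math` root are hypothetical; only the `.carbon` extension and the library-string-to-path rule come from the sentence above.

```python
from pathlib import PurePosixPath


def api_file_for_import(package_root: str, library: str) -> PurePosixPath:
    """Return the API file implied by `import PACKAGE library "<library>"`.

    Assumes the library string is used verbatim as a relative path.
    """
    return PurePosixPath(package_root) / (library + ".carbon")


print(api_file_for_import("packages/Math", "Algorithms/Distance"))
# prints "packages/Math/Algorithms/Distance.carbon"
```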

Advantages:

  • The strict association makes it harder to move names between files without updating callers.
  • If there were a strict association of paths, it would also need to handle file system-dependent casing behaviors.
    • For example, on Windows, project.carbon and Project.carbon are conflicting filenames. This is exacerbated by paths, wherein a file config and a directory Config/ would conflict, even though this would be a valid structure on Unix-based filesystems.
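
The casing problem can be made concrete with a small Python check. This is only a sketch: `find_case_conflicts` is a hypothetical helper, and `str.casefold` is used as an approximation of how a case-insensitive file system compares names.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_conflicts(paths):
    """Return groups of path prefixes that differ only by letter case."""
    spellings = defaultdict(set)
    for raw in paths:
        path = PurePosixPath(raw)
        # Check the path itself and every parent directory.
        for prefix in [path, *path.parents]:
            text = str(prefix)
            if text != ".":
                spellings[text.casefold()].add(text)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


conflicts = find_case_conflicts(
    ["project.carbon", "Project.carbon", "config", "Config/settings.carbon"]
)
print(conflicts)
```

Both the file-versus-file case and the file-versus-directory case from the example are reported, because directory prefixes are compared as well as full paths.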

Disadvantages:

  • A strict association between file system path and import path makes it easier to find source files. This is used by some languages for compilation.
  • A strict association would allow getting rid of the package keyword by inferring related information from the file system path.

We are choosing to have some association between the file system path and library for API files to make it easier to find a library's files. We are not getting rid of the package keyword because we don't want to become dependent on file system structures, particularly as it would increase the complexity of distributed builds.

Libraries

Allow exporting namespaces

We propose to not allow exporting namespaces as part of library APIs. We could either allow or require exporting namespaces. For example:

package Checksums;

api namespace Sha256;

While this approach would mainly be syntactic, a more pragmatic use of this would be in refactoring. It implies that an aliased namespace could be marked as an api. For example, the below could be used to share an import's full contents:

package Translator library "Interface" api;

import Translator library "Functions" as TranslatorFunctions;

api alias Functions = TranslatorFunctions;

Advantages:

  • Avoids any inconsistency in how entities are handled.
  • Reinforces whether a namespace may contain api entities.
  • Enables new kinds of refactorings.

Disadvantages:

  • Creates extra syntax for users to remember, and possibly forget, when declaring api entities.
    • Makes it possible to have a namespace marked as api that doesn't contain any api entities.
  • Allowing aliasing of entire imports makes it ambiguous which entities are being passed on through the namespace.

This alternative is declined because it's not sufficiently clear that it would help, rather than impair, refactoring.

Allow importing implementation files from within the same library

The current proposal is that implementation files in a library implicitly import their API, and that they cannot import other implementation files in the same library.

We could instead allow importing implementation files from within the same library. There are two ways this could be done:

  • We could add a syntax for importing symbols from other files in the same library. This would make it easy to identify a directed acyclic graph between files in the library. For example:

    package Geometry;
    
    import file("point.6c");
    
  • We could automatically detect when symbols from elsewhere in the library are referenced, given an import of the same library. For example:

    package Geometry;
    
    import this;
    

Advantages:

  • Allows more separation of implementation between files within a library.

Disadvantages:

  • Neither approach is quite clean:
    • Using filenames creates a common case where filenames must be used, breaking away from name paths.
    • Detecting where symbols exist may cause separate parsing, compilation debugging, and compilation parallelism problems.
  • Libraries are supposed to be small, and we've chosen to only allow one API file per library to promote that concept. Encouraging implementation files to be inter-dependent appears to support a more complex library design again, and may be better addressed through inter-library ACLs.
  • Loses some of the ease-of-use that some other languages have around imports, such as Go.
  • Part of the argument towards api and impl, particularly with a single api, has been to mirror C++ .h and .cc. Just as a .cc file #include-ing other .cc files is undesirable, allowing an impl to import another impl could be considered similarly undesirable.

The problems with these approaches, and encouragement towards small libraries, are how we reach the current approach of only importing APIs, and automatically.

Alternative library separators and shorthand

Examples are using / to separate significant terms in library names, and // to separate the package name in shorthand. For example, package Time library "Timezones/Internal"; with shorthand Time//Timezones/Internal.

Note that, because the library is an arbitrary string and shorthand is not a language semantic, this won't affect much. However, users should be expected to treat examples as best practice.
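
Since the shorthand is a documentation convention rather than language syntax, the following Python helpers are purely illustrative. The names `to_shorthand` and `from_shorthand` are hypothetical; the only rule taken from the text is that `//` separates the package name from the library string.

```python
def to_shorthand(package: str, library: str) -> str:
    """Join a package name and a library string with the // separator."""
    return f"{package}//{library}"


def from_shorthand(shorthand: str) -> tuple[str, str]:
    """Split shorthand at the first //, leaving / inside the library intact."""
    package, separator, library = shorthand.partition("//")
    if not separator:
        raise ValueError(f"missing // separator in {shorthand!r}")
    return package, library


print(to_shorthand("Time", "Timezones/Internal"))
# prints "Time//Timezones/Internal"
```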

We could instead use . for library names and / for packages, such as Time/Timezones.Internal.

Advantages:

  • Clearer distinction between the package and library, increasing readability.
  • We have chosen not to enforce file system paths in order to ease refactoring, and encouraging a mental model where they may match could confuse users.

Disadvantages:

  • Uses multiple separators, so people need to type different characters.
  • There is a preference for thinking of libraries like file system paths, even if they don't actually correspond.

People like /, so we're going with /.

Single-word libraries

We could stick to single word libraries in examples, such as replacing library "Algorithms/Distance" with library "Distance".

Advantages:

  • Encourages short library names.

Disadvantages:

  • Users are likely to end up doing some hierarchy, and we should address it.
    • Consistency will improve code understandability.

We might list this as a best practice, and have Carbon only expose libraries following it. However, some hierarchy from users can be expected, and so it's worthwhile to include a couple examples to nudge users towards consistency.

Collapse API and implementation file concepts

We could remove the distinction between API and implementation files.

Advantages:

  • Removing the distinction between API and implementation would be a language simplification.
  • Developers will not need to consider build performance impacts of how they are distributing code between files.

Disadvantages:

  • Serializes compilation across dependencies.
    • May be exacerbated because developers won't be aware of when they are adding a dependency that affects imports.
    • In large codebases, it's been necessary to abstract out API from implementation in languages that similarly consolidate files, such as Java. However, the lack of language-level support constrains potential benefit and increases friction for a split.
  • Whereas an api/impl hierarchy gives a structure for compilation, if there are multiple files we will likely need to provide a different structure, perhaps explicit file imports, to indicate intra-library compilation dependencies.
    • We could also effectively concatenate and compile a library together, reducing build parallelism options differently.
  • Makes it harder for users to determine what the API is, as they must read all the files.

Requiring users to manage the api/impl split allows us to speed up compilation. This matters most for large codebases, and shouldn't directly affect small codebases that choose to only use api files.
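
The serialization cost can be illustrated with a toy build model in Python. Everything here is an assumption made for the sake of the sketch: a three-library chain, fixed compile costs, unlimited parallel workers, and impl files that only wait on api files.

```python
from functools import lru_cache

# Hypothetical chain: C imports B, and B imports A.
DEPS = {"A": [], "B": ["A"], "C": ["B"]}
API_COST = 1   # assumed time units to compile an api file
IMPL_COST = 4  # assumed time units to compile an impl file


def collapsed_makespan(deps):
    """Each library is one unit that waits for its imports to finish fully."""
    @lru_cache(maxsize=None)
    def finish(lib):
        start = max((finish(dep) for dep in deps[lib]), default=0)
        return start + API_COST + IMPL_COST

    return max(finish(lib) for lib in deps)


def split_makespan(deps):
    """Importers wait only on api files, so impl files compile in parallel."""
    @lru_cache(maxsize=None)
    def api_finish(lib):
        start = max((api_finish(dep) for dep in deps[lib]), default=0)
        return start + API_COST

    # An impl needs its own api, which has already waited on imported apis.
    return max(api_finish(lib) + IMPL_COST for lib in deps)


print(collapsed_makespan(DEPS), split_makespan(DEPS))
# prints "15 7"
```

In this model the collapsed build takes 15 time units because each library waits on the whole of its predecessor, while the split build takes 7 because only the short api compiles are chained.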

Automatically generating the API separation

We could try to address the problems with collapsing API and implementation files by automatically generating an API file from the input files for a library.

For example, a preprocessing step could split out an API, reducing the number of imports propagated for actual APIs:

  1. Extract api declarations within the api file.
  2. Remove all implementation bodies.
  3. Add only the imports that are referenced.
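
The three steps above can be sketched in Python over a toy model of a library. The `Decl` record, its fields, and the sample declarations are all hypothetical. In particular, the `uses` field is filled in by hand here, whereas a real tool would have to compute it.

```python
from dataclasses import dataclass


@dataclass
class Decl:
    signature: str                 # declaration without its body
    body: str                      # implementation text, dropped from the API
    is_api: bool                   # True when the declaration is marked api
    uses: frozenset = frozenset()  # imported packages the signature references


def generate_api(imports, decls):
    # Step 1: keep only declarations marked api.
    api_decls = [decl for decl in decls if decl.is_api]
    # Step 2: drop implementation bodies by emitting signatures only.
    signatures = [decl.signature + ";" for decl in api_decls]
    # Step 3: keep only the imports that an api signature references.
    referenced = set().union(*(decl.uses for decl in api_decls))
    kept_imports = [name for name in imports if name in referenced]
    return kept_imports, signatures


decls = [
    Decl("fn Area(Geometry.Shape)", "{ ... }", True, frozenset({"Geometry"})),
    Decl("fn LogArea()", "{ ... }", False, frozenset({"Logging"})),
]
print(generate_api(["Geometry", "Logging"], decls))
# prints "(['Geometry'], ['fn Area(Geometry.Shape);'])"
```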

Even under the proposed model, compilation will do some of this work as an optimization. However, determining which imports are referenced requires compilation of all imports that may be referenced. When multiple libraries are imported from a single package, it will be ambiguous which imports are used until all have been compiled. This will cause serialization of compilation that can be avoided by having a developer split out the impl, either manually or with developer tooling.

The impl files may make it easier to read code, but they will also allow for better parallelism than api files alone can. This does not mean the compiler will or will not add optimizations -- it only means that we cannot wholly rely on optimizations by the compiler.

Automatically generating the API separation would only partly mitigate the serialization of compilation caused by collapsing file and library concepts. Most of the build performance impact would still be felt by large codebases, and so the mitigation does not significantly improve the alternative.

Collapse file and library concepts

We could collapse the file and library concepts. What this implies is:

  • Collapse API and implementation file concepts.
    • As described there, this approach significantly reduces the ability to separate compilation.
  • Only support having one file per library.
    • The file would need to contain both API and implementation together.

This has similar advantages and disadvantages to collapse API and implementation file concepts. Differences follow.

Advantages:

  • Offers a uniformity of language usage.
    • Otherwise, some developers will use only api files, while others will always use impl files.
  • The structure of putting API and implementation in a single file mimics other modern languages, such as Java.
  • Simplifies IDEs and refactoring tools.
    • Otherwise, these systems will need to understand the potential for separation of interface from implementation between multiple files.
    • For example, see potential refactorings.

Disadvantages:

  • Avoids the need to establish a hierarchy between files in a library, at the cost of reducing build parallelism options further.
  • While both API and implementation are in the same file, it can be difficult to visually identify the API when it's mixed with a lengthy implementation.

As with collapse API and implementation file concepts, we consider the split to be important for large codebases. The additional advantages of a single-file restriction do not outweigh the disadvantages surrounding build performance.

Collapse the library concept into packages

We could only have packages, with no libraries. Some other languages do this; for example, in Node.js, a package is often similar in size to what we currently call a library.

If packages became larger, that would lead to compile-time bottlenecks. Thus, if Carbon allowed large packages without library separation, we would undermine our goals for fast compilation. Even if we combined the concepts, we should expect the result to be turning the "package with many small libraries" concept into "many small packages".

Advantages:

  • Simplification of organizational hierarchy.
    • Less complexity for users to think about on imports.

Disadvantages:

  • Coming up with short, unique package names may become an issue, leading to longer package names that overlap with the intent of libraries.
    • These longer package names would need to be used to refer to contained entities in code, affecting brevity of Carbon code. The alternative would be to expect users to always rename packages on import; some organizations anecdotally see the equivalent happen for C++ once names get longer than six characters.
    • For example, boost could use per-repository packages like BoostGeometry and child libraries like algorithms-distance under the proposed approach. Under the alternative approach, it would use either a monolithic package that could create compile-time bottlenecks, or packages like BoostGeometryAlgorithmsDistance for uniqueness.
  • While a package manager will need a way to specify cross-package version compatibility, encouraging a high number of packages puts more weight and maintenance cost on the configuration.
    • We expect libraries to be versioned at the package-level.

We prefer to keep the library separation to enable better hierarchy for large codebases, plus encouraging small units of compilation. It's still possible for people to create small Carbon packages, without breaking them into multiple libraries.

Collapse the package concept into libraries

Versus collapse the library concept into packages, we could have libraries without packages. Under this model, we still have libraries of similar granularity as what's proposed. However, there is no package grouping to them: there are only libraries which happen to share a namespace.

References to imports from other top-level namespaces would need to be prefixed with a '.' in order to make it clear which symbols were from imports.

For example, suppose Boost is a large system that cannot be distributed to users in a single package. As a result, Random functionality is in its own distribution package, with multiple libraries contained. The difference between approaches looks like:

  • package vs library:
    • Trivial:
      • Proposal: package BoostRandom;
      • Alternative: library "Boost/Random" namespace Boost;
    • Multi-layer library:
      • Proposal: package BoostRandom library "Uniform";
      • Alternative: library "Boost/Random.Uniform" namespace Boost;
    • Specifying namespaces:
      • Proposal: package BoostRandom namespace Distributions;
      • Alternative: library "Boost/Random" namespace Boost.Random.Distributions;
    • Combined:
      • Proposal: package BoostRandom library "Uniform" namespace Distributions;
      • Alternative: library "Boost/Random.Uniform" namespace Boost.Random.Distributions;
  • import changes:
    • Trivial:
      • Proposal: import BoostRandom;
      • Alternative: import "Boost/Random";
    • Multi-layer library:
      • Proposal: import BoostRandom library "Uniform";
      • Alternative: import "Boost/Random.Uniform";
    • Namespaces have no effect on import under both approaches.
  • Changes to use an imported entity:
    • Proposal: BoostRandom.UniformDistribution
    • Alternative:
      • If the code is in the Boost.Random namespace: Uniform
      • If the code is in the Boost package but a different namespace: Random.Uniform
      • If the code is outside the Boost package: .Boost.Random.Uniform

We assume that the compiler will enforce that the root namespace must either match or be a prefix of the library name, followed by a / separator. For example, Boost in the namespace Boost.Random.Uniform must either match a library "Boost" or prefix as library "Boost/..."; library "BoostRandom" does not match because it's missing the / separator.
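
The assumed rule can be written as a short Python predicate. The function name `namespace_root_matches_library` is hypothetical, and the check only models the prefix rule described above, not any actual compiler behavior.

```python
def namespace_root_matches_library(library: str, namespace: str) -> bool:
    """Check that the namespace root equals the library or prefixes it with /."""
    root = namespace.split(".", 1)[0]
    return library == root or library.startswith(root + "/")


for library in ["Boost", "Boost/Random.Uniform", "BoostRandom"]:
    print(library, namespace_root_matches_library(library, "Boost.Random.Uniform"))
# prints "Boost True", "Boost/Random.Uniform True", "BoostRandom False"
```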

There are several approaches which might remove this duplication, but each has been declined due to flaws:

  • We could have library "Boost/Random.Uniform"; imply namespace Boost. However, we want name paths to use things listed as identifiers in files. We specifically do not want to use strings to generate identifiers in order to support understandability of code.
  • We could alternately have namespace Boost; syntax imply library "Boost" namespace Boost;.
    • This approach only helps with single-library namespaces. While this would be common enough that a special syntax would help some developers, we are likely to encourage multiple libraries per namespace as part of best practices. We would then expect that the quantity of libraries in multi-library namespaces would dominate cost-benefit, leaving this to address only an edge-case of duplication issues.
    • This would create an ambiguity between the file-level namespace and other namespace keyword use. We could then rename the namespace argument for library to something like file-namespace.
    • It may be confusing as to what namespace Boost.Random; does. It may create library "Boost/Random" because library "Boost.Random" would not be legal, but the change in characters may in turn lead to developer confusion.
      • We could change the library specification to use . instead of / as a separator, but that may lead to broader confusion about the difference between libraries and namespaces.

Advantages:

  • Avoids introducing the "package" concept to code and name organization.
    • Retains the key property that library and namespace names have a prefix that is intended to be globally unique.
    • Avoids coupling package management to namespace structure. For example, it would permit a library collection like Boost to be split into multiple repositories and multiple distribution packages, while retaining a single top-level namespace.
  • The library and namespace are pushed to be more orthogonal concepts than packages and namespaces.
    • Although some commonality must still be compiler-enforced.
  • For the common case where packages have multiple libraries, removing the need to specify both a package and library collapses two keywords into one for both import and package.
  • It makes it easier to draw on C++ intuitions, because all the concepts have strong counterparts in C++.
  • The prefix . on imported name paths can help increase readability by making it clear they're from imports, so long as those imports aren't from the current top-level namespace.
  • Making the . optional for imports from the current top-level namespace eliminates the boilerplate character when calling within the same library.

Disadvantages:

  • The use of a leading . to mark absolute paths may conflict with other important uses, such as designated initializers and named parameters.
  • Declines an opportunity to align code and name organization with package distribution.
    • Alignment means that if a developer sees package BoostRandom library "Uniform";, they know installing a package BoostRandom will give them the library. Declining this means that users seeing library "Boost/Random.Uniform" will still need to research which package contains Boost/Random.Uniform to figure out how to install it, because that package may not be named Boost.
    • Package distribution is a project goal, and cannot be avoided indefinitely.
    • This also means multiple packages may contribute to the same top-level namespace, which would prevent things like tab-completion in IDEs from producing cache optimizations based on the knowledge that modified packages cannot add to a given top-level namespace. For example, the ability to load less may improve performance:
      • As proposed, a package BoostRandom only adds to a namespace of the same name. If a user is editing libraries in a package BoostCustom, then BoostRandom may be treated as unmodifiable. An IDE could optimize cache invalidation of BoostRandom at the package level. As a result, if a user types BoostRandom. and requests a tab completion, the system need only ensure that libraries from the BoostRandom package are loaded for an accurate result.
      • Under this alternative, a library Boost.Random similarly adds to the namespace Boost. However, if a user is editing libraries, the IDE needs to support them adding to both Boost and MyProject simultaneously. As a result, if a user types Boost. and requests a tab completion, the system must have all libraries from all packages loaded for an accurate result.
      • Although many features can be restricted to current imports, some features, such as auto-imports, examine possible imports. Large codebases may have so many possible imports that memory becomes a constraint.
  • The string prefix enforcement between library and namespace forces duplication between both, which would otherwise be handled by package.
  • For the common case of packages with a matching namespace name, increases verbosity by requiring the namespace keyword.
  • The prefix . on imported name paths will be repeated frequently through code, increasing overall verbosity, versus the package approach which only affects import verbosity.
  • Making the . optional for imports from the current top-level namespace hides whether an API comes from the current library or an import.

We are declining this approach because we desire package separation, and because of concerns that this will lead to an overall increase in verbosity due to the preference for few child namespaces, whereas this alternative benefits when namespace is specified more often.
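The tab-completion disadvantage above can be sketched with a toy model. This is only an illustration in Python, not a description of any real IDE: the package names under the alternative (`RandomPkg`, `CustomPkg`, `MyProjectPkg`) and the helper `packages_to_load` are hypothetical, and the model assumes an IDE cannot know which namespaces a package adds to until that package is loaded.

```python
# Hypothetical index of packages and the top-level namespaces they add to.
# As proposed, a package only adds to the namespace of the same name.
PROPOSED = {
    "BoostRandom": {"BoostRandom"},
    "BoostCustom": {"BoostCustom"},
    "MyProject": {"MyProject"},
}

# Under the declined alternative, several packages may add to one namespace.
ALTERNATIVE = {
    "RandomPkg": {"Boost"},
    "CustomPkg": {"Boost", "MyProject"},
    "MyProjectPkg": {"MyProject"},
}


def packages_to_load(namespace, index, package_owns_namespace):
    """Return the packages that must be loaded to complete `namespace.`."""
    if package_owns_namespace:
        # The namespace name identifies the only package that can add to it.
        return {namespace} if namespace in index else set()
    # Any package might add to the namespace, so every package is inspected.
    return set(index)


proposed_load = packages_to_load("BoostRandom", PROPOSED, True)
alternative_load = packages_to_load("Boost", ALTERNATIVE, False)
print(sorted(proposed_load))
print(sorted(alternative_load))
```

In this model the proposed design loads a single package, while the alternative loads every known package even though only some of them contribute to Boost.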

Different file type labels

We're using api and impl for file types, and have test as an open question.

We've considered using interface instead of api, but that introduces a terminology collision with interfaces in the type system.

We've considered dropping api from naming, but that creates a definition from absence of a keyword. It would also be unusual to require both impl and test while excluding api. We prefer the more explicit name.

We could spell out impl as implementation, but are choosing the abbreviation for ease of typing. We also don't think it's an unclear abbreviation.

We expect impl to be used for implementations of interface. This isn't quite as bad as if we used interface instead of api because of the api export syntax on entities, such as api fn DoSomething(), which could create ambiguities as interface fn DoSomething(). It may still confuse people to see an interface impl in an api file. However, we're touching on related concepts and don't see a great alternative.

Function-like syntax

We could consider more function-like syntax for import, and possibly also package.

For example, instead of:

import Math library "Stats";
import Algebra as A;

We could do:

import("Math", "Stats").Math;
alias A = import("Algebra").Algebra;

Or some related variation.

Advantages:

  • Allows straightforward reuse of alias for language consistency.
  • Easier to add more optional arguments, which we expect to need for interoperability and URLs.
  • Avoids defining keywords for optional fields, such as library.
    • Interoperability and package management may add more fields long-term.

Disadvantages:

  • It's unusual for a function-like syntax to produce identifiers for name lookup.
    • This could be addressed by requiring alias, but that becomes verbose.
    • There's a desire to explicitly note the identifier being imported in some way, as with .Math and .Algebra above. However, this complicates the resulting syntax.

The preference is for keywords.

Inlining from implementation files

An implicit reason for keeping code in an api file is that it makes it straightforward to inline code from there into callers.

We could explicitly encourage inlining from impl files as well, making the location of code unimportant during compilation. Alternately, we could add an inline file type which explicitly supports separation of inline code from the api file.

Advantages:

  • Allows moving code out of the main API file for easier reading.

Disadvantages:

  • Requires compilation of impl files to determine what can be inlined from the api file, leading to the transitive closure dependency problems which impl files are intended to avoid.

We expect to only support inlining from api files in order to avoid confusion about dependency problems.

Library-private access controls

We currently have no special syntax for library-private APIs. However, non-exported APIs are essentially library-private, and may be in the api file. It's been suggested that we could either provide a special syntax or a new file type, such as shared_impl, to support library-private APIs.

Advantages:

  • Allows for better separation of library-private APIs.

Disadvantages:

  • Increases language complexity.
  • Dependencies are still an issue for library-private APIs.
    • If used from the api file, the dependencies are still in the transitive closure of client libraries, and any separation may confuse users about the downsides of the extra dependencies.
    • If only used from impl files, then they could be in the impl file if there's only one, or shared from a separate library.
  • Generalized access controls may provide overlapping functionality.

At this point in time, we prefer not to provide specialized access controls for library-private APIs.

Managing API versus implementation in libraries

At present, we plan to have api versus impl as a file type, and also .carbon versus .impl.carbon as the file extension. We chose to use both together, rather than one or the other, because we expect some parties to strongly want file content to be sufficient for compilation, while others will want file extensions to be meaningful for the syntax split.

Instead of the file type split, we could drift further and instead have APIs in any file in a library, using the same kind of API markup.

Advantages:

  • May help users who have issues with cyclical code references.
  • Improves compiler inlining of implementations, because the compiler can decide how much to actually put in the generated API.

Disadvantages:

  • While allowing users to spread a library across multiple files can be considered an advantage, we see the single API file as a way to pressure users towards smaller libraries, which we prefer.
  • May be slower to compile because each file must be parsed once to determine APIs.
  • For users that want to see only APIs in a file, they would need to use tooling to generate the API file.
    • Auto-generated documentation may help solve this problem.

Multiple API files

The proposal also presently suggests a single API file. Under an explicit API file approach, we could still allow multiple API files.

Advantages:

  • More flexibility when writing APIs; could otherwise end up with one gigantic API file.

Disadvantages:

  • Encourages larger libraries by making it easier to provide large APIs.
  • Removes some of the advantages of having an API file as a "single place" to look, which pushes the design further towards the markup approach.
  • Not clear if API files should be allowed to depend on each other, as they were intended to help resolve cyclical dependency issues.

We particularly want to discourage large libraries, and so we're likely to retain the single API file limit.

Name paths as library names

We're proposing strings for library names. We've discussed also using name paths (My.Library) and also restricting to single identifiers (Library).

Advantages:

  • Shares the form between packages (identifiers) and namespaces (name paths).
  • Enforces a constrained set of names for libraries for cross-package consistency of naming.

Disadvantages:

  • Indicates that a library may be referred to in code, when only the package and namespace are used for name paths of entities.
  • The constrained set of names may also get in the way for some packages that can make use of more flexibility in naming.

We've decided to use strings primarily because we want to draw the distinction that a library is not something that's used when referring to an entity in code.

Imports

Block imports

Rather than requiring an import keyword per line, we could support block imports, as can be found in languages like Go.

In other words, instead of:

import Math;
import Geometry;

We could have:

imports {
  Math,
  Geometry,
}

Advantages:

  • Allows repeated imports with less typing.

Disadvantages:

  • Makes it harder to find files importing a package or library using tools like grep.

One concern has been that a mix of import and imports syntax would be confusing to users: we should only allow one.

This alternative has been declined because retyping import statements is low-cost, and grep is useful.
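The grep concern can be illustrated with a small sketch. It is written in Python rather than Carbon, the two source strings are made-up examples, and `lines_importing` is a hypothetical stand-in for a line-oriented search tool such as grep.

```python
import re

# Hypothetical file contents using one import keyword per line.
PER_LINE_SOURCE = """\
import Math;
import Geometry;
"""

# Hypothetical file contents using the block form considered above.
BLOCK_SOURCE = """\
imports {
  Math,
  Geometry,
}
"""


def lines_importing(source, package):
    """Return lines matching `import <package>`, as a line-based search would."""
    pattern = re.compile(rf"^\s*import\s+{re.escape(package)}\b")
    return [line for line in source.splitlines() if pattern.search(line)]


per_line_hits = lines_importing(PER_LINE_SOURCE, "Math")
block_hits = lines_importing(BLOCK_SOURCE, "Math")
print(per_line_hits)
print(block_hits)
```

The per-line form is found by a single-line pattern. Finding the same import in the block form would need a search that understands the multi-line block, because the line naming the package no longer contains the keyword.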

Block imports of libraries of a single package

We could allow block imports of libraries from the same package. For example:

import Containers libraries({
  "FlatHashMap",
  "FlatHashSet",
})

The result of this api alias allowing Containers.HashSet() to work regardless of whether HashSet is in "HashContainers" or "Internal" may be clearer if both import Containers statements were a combined import Containers libraries({"HashContainers", "Internal"});.

The advantages/disadvantages are similar to block imports. Additional advantages/disadvantages are:

Advantages:

  • If we limit to one import per library, then any alias of the package Containers is easier to understand as affecting all libraries.

Disadvantages:

  • If we allow both library and libraries syntax, it's two ways of doing the same thing.
    • Can be addressed by always requiring libraries, removing library, but that diverges from package's library syntax.

This alternative has been declined for similar reasons to block imports; the additional advantages/disadvantages don't substantially shift the cost-benefit argument.

Broader imports, either all names or arbitrary code

Carbon imports require specifying individual names to import. We could support broader imports, for example by pulling in all names from a library. In C++, the #include preprocessor directive even supports inclusion of arbitrary code. For example:

import Geometry library "Shapes" names *;

// Triangle was imported as part of "*".
fn Draw(var Triangle: x) { ... }

Advantages:

  • Reduces boilerplate code specifying individual names.

Disadvantages:

  • Loses out on parser benefits of knowing which identifiers are being imported.
  • Increases the risk of adding new features to APIs, as they may immediately get imported by a user and conflict with a preexisting name, breaking code.
  • As the number of imports increases, it can become difficult to tell which import a particular symbol comes from, or how imports are being used.
  • Arbitrary code inclusion can result in unexpected code execution, a way to create obfuscated code and a potential security risk.

We particularly value the parser benefits of knowing which identifiers are being imported, and so we require individual names for imports.
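The risk of new API names conflicting with preexisting names can be sketched with sets. This is a toy model in Python, not Carbon semantics: the names, the two library versions, and the helper `wildcard_conflicts` are all hypothetical.

```python
def wildcard_conflicts(local_names, library_api):
    """Model importing every name; return names that collide with local ones."""
    return local_names & library_api


# Hypothetical names declared by the importing file.
local_names = {"Draw", "Canvas"}

# Hypothetical API of a library before and after it adds a new name.
shapes_v1 = {"Triangle", "Circle"}
shapes_v2 = shapes_v1 | {"Canvas"}

conflicts_v1 = wildcard_conflicts(local_names, shapes_v1)
conflicts_v2 = wildcard_conflicts(local_names, shapes_v2)

# When the importer names what it wants, only those names can collide.
requested = {"Triangle"}
explicit_conflicts_v2 = local_names & (shapes_v2 & requested)

print(conflicts_v1, conflicts_v2, explicit_conflicts_v2)
```

In this model the importing file is unchanged between the two versions, yet the broad import starts to collide as soon as the library adds Canvas. The explicit request is unaffected.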

Direct name imports

We could allow direct imports of names from libraries. For example, under the current setup we might see:

import Math library "Stats";
alias Median = Math.Stats.Median;
alias Mean = Math.Stats.Mean;

We could simplify this syntax by augmenting import:

import Math library "Stats" name Median;
import Math library "Stats" name Mean;

Or more succinctly with block imports of names:

import Math library "Stats" names {
  Median,
  Mean,
}

Advantages:

  • Avoids an additional alias step.

Disadvantages:

  • With a single name, this isn't a significant improvement in syntax.
  • With multiple names, this runs into similar issues as block imports.

Optional package names

We could allow a short syntax for imports from the current package. For example, this code imports Geometry.Shapes:

package Geometry library "Operations" api;

import library "Shapes";

Advantages:

  • Reduces typing.

Disadvantages:

  • Makes it harder to find files importing a package or library using tools like grep.
  • Creates two syntaxes for importing libraries from the current package.
    • If we instead disallow import Geometry library "Shapes" from within Geometry, then we end up with a different inconsistency.

Overall, consistent with the decision to disallow block imports, we are choosing to require the package name.

Namespaces

File-level namespaces

We are providing entity-level namespaces. This is likely necessary to support migrating C++ code, at a minimum. It's been discussed whether we should also support file-level namespaces.

For example, this is the current syntax for defining Geometry.Shapes.Circle:

package Geometry library "Shapes" api;

namespace Shapes;
struct Shapes.Circle;

This is the proposed alternative syntax for defining Geometry.Shapes.Circle, and would put all entities in the file under the Shapes namespace:

package Geometry library "Shapes" namespace Shapes api;

struct Circle;

Advantages:

  • Reduces repetitive syntax in the file when every entity should be in the same, child namespace.
    • Large libraries and packages are more likely to be self-referential, and may pay a disproportionate ergonomics tax that others wouldn't see.
    • Although library authors could also avoid this repetitive syntax by omitting the namespace, that may in turn lead to more name collisions for large packages.
    • Note that syntax can already be reduced with a shorter namespace alias, but the redundancy cannot be eliminated.
  • Reduces the temptation of aliasing in order to reduce verbosity, wherein it's generally agreed that aliasing creates inconsistent names which hinder readability.
    • Users are known to alias long names, where "long" may be considered anything over six characters.
    • This is a risk for any package that uses namespaces, as importers may also need to address it.

Disadvantages:

  • Encourages longer namespace names, as they won't need to be retyped.
  • Increases complexity of the package keyword.
  • Creates two ways of defining namespaces, and reuses the namespace keyword in multiple different ways.
    • We generally prefer to provide one canonical way of doing things.
    • Does not add functionality which cannot be achieved with entity-level namespaces. However, the converse is not true: entity-level control allows a single file to put entities into multiple namespaces.
  • Creates a divergence between code as written by the library maintainer and code as called.
    • Calling code would need to specify the namespace, even if aliased to a shorter name. Library code gets to omit this, essentially getting a free alias.

We are choosing not to provide this for now because we want to provide the minimum necessary support, and then see if it works out. It may be added later, but it's easier to add features than to remove them.

Scoped namespaces

Instead of including additional namespace information per-name, we could have scoped namespaces, similar to C++. For example:

namespace absl {
  namespace numbers_internal {
    fn SafeStrto32Base(...) { ... }
  }

  fn SimpleAtoi(...) {
    ...
    return numbers_internal.SafeStrto32Base(...);
    ...
  }
}

Advantages:

  • Makes it easy to write many things in the same namespace.

Disadvantages:

  • It's not clear which namespace an identifier is in without scanning to the start of the file.
  • It can be hard to find the end of a namespace. For example, both the Google and Boost style guides call for end-of-namespace comments to address this.
    • Carbon may disallow the same-line-as-code comment style used for this. Even if not, if we acknowledge it's a problem, we should address it structurally for readability.
    • This is less of a problem for other scopes, such as functions, because they can often be broken apart until they fit on a single screen.

There are other ways to address this disadvantage, such as adding syntax to indicate the end of a namespace, similar to block comments. For example:

{ namespace absl
  { namespace numbers_internal
    fn SafeStrto32Base(...) { ... }
  } namespace numbers_internal

  fn SimpleAtoi(...) {
    ...
    return numbers_internal.SafeStrto32Base(...);
    ...
  }
} namespace absl

While we could consider such alternative approaches, we believe the proposed contextless namespace approach is better, as it reduces information that developers will need to remember when reading/writing code.
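The cost of scoped namespaces for readers and tools can be sketched as follows. This is a rough illustration in Python, assuming the brace-delimited syntax from the first example above; `enclosing_namespace` is a hypothetical helper, not part of any Carbon tooling, and it ignores braces inside strings or comments.

```python
import re

# The scoped example from above, one entry per line.
SCOPED = """\
namespace absl {
  namespace numbers_internal {
    fn SafeStrto32Base(...) { ... }
  }

  fn SimpleAtoi(...) {
    ...
    return numbers_internal.SafeStrto32Base(...);
    ...
  }
}
""".splitlines()


def enclosing_namespace(lines, target_index):
    """Find the namespace of a line by scanning every line before it."""
    stack = []  # namespace names, or None for braces that are not namespaces
    for line in lines[:target_index]:
        match = re.match(r"\s*namespace\s+(\w+)\s*\{", line)
        if match:
            stack.append(match.group(1))
            continue
        for ch in line:
            if ch == "{":
                stack.append(None)
            elif ch == "}":
                stack.pop()
    return ".".join(name for name in stack if name is not None)


print(enclosing_namespace(SCOPED, 2))  # line declaring SafeStrto32Base
print(enclosing_namespace(SCOPED, 5))  # line declaring SimpleAtoi
```

The answer for any line depends on all preceding lines. With the contextless approach, the qualified name appears on the declaration itself, so no such scan is needed.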

Rationale

This proposal provides an organizational structure that seems both workable and aligns well with Carbon's goals:

  • Distinct and required top-level namespaces -- the "package"s from the proposal -- both match software best practices for long-term evolution and avoid complex and user-confusing corner cases.
  • Providing a fine-grained import structure as provided by the "library" concept supports scalable build system implementations while ensuring explicit dependencies.
  • The structured namespace facilities provide a clear mechanism to migrate existing hierarchical naming structures in C++ code.

Open questions

Should we switch to a library-oriented structure that's package-agnostic?

  • Decision: No.
  • Rationale: While this would simplify the overall set of constructs needed, removing the concept of a global namespace remained desirable and would require re-introducing much of the complexity around top-level namespaces. Overall, the simplification trade-off didn't seem significantly better.

Should there be a tight association between file paths and packages/libraries?

  • Decision: Yes, for the API files in libraries. Specifically, the library name should still be written in the source, but it should be checked to match -- after some platform-specific translation -- against the path.
  • Note: Sufficient restrictions to result in a portable and simple translation on different filesystems should be imposed, but the Core team was happy for these restrictions to be developed as part of implementation work.
  • Rationale: This will improve usability and readability for users by making it obvious how to find the files that are being imported. Similarly, this will improve tooling by increasing the ease with which tools can find imported APIs.
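A path check of this kind might look like the sketch below. It is written in Python and rests on loud assumptions: the translation rule is left to implementation work, so the rule used here (the library name plus the .carbon extension must form the tail of the file path) is purely hypothetical, as are the helper names and the example paths.

```python
from pathlib import PurePosixPath

API_SUFFIX = ".carbon"  # extension the proposal plans for api files


def expected_api_path(library_name):
    """Hypothetical translation: use the library name as a relative path."""
    return PurePosixPath(library_name + API_SUFFIX)


def library_matches_path(library_name, file_path):
    """Check that file_path ends with the path expected for library_name."""
    expected = expected_api_path(library_name).parts
    actual = PurePosixPath(file_path).parts
    return len(actual) >= len(expected) and actual[-len(expected):] == expected


print(library_matches_path("Shapes", "geometry/Shapes.carbon"))
print(library_matches_path("Shapes", "geometry/Circles.carbon"))
print(library_matches_path("Shapes", "geometry/Shapes.impl.carbon"))
```

The sketch does not model platform-specific translation such as case folding or path separators, which is where the portability restrictions mentioned above would apply.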