compiler: Improve deterministic builds #5965

TD5 · 2022-05-05T10:08:33Z

Improves build determinism for non-native code by:

Making codegen and the preprocessor fully respect the +deterministic option (for all code, OTP or otherwise)
Adding an --enable-deterministic-build flag to configure that controls OTP's build determinism by setting +deterministic appropriately in its Makefiles

This addresses absolute paths being included in generated .erl files and compiled .beam files that resulted in builds from different source directories generating different artefacts (which is a component of the issue in #4482, but is also useful in its own right, for example around caching builds and for the upcoming Dialyzer incremental mode).

OTP itself can now be built in deterministic mode by giving the --enable-deterministic-build flag to configure. I think it would make sense to make this the default at some stage, but I've put the change behind an option for now in an attempt to decouple making deterministic OTP builds possible from making them the default.

Having +deterministic set also results in the compiler options being removed from the module_info for modules where this options was used. This may have other implications for users of OTP.

For tests themselves, +deterministic is disabled, since many test cases depend on accessing the test module's compilation options or other features not available in deterministic mode, in order to configure themselves. For tests of the determinism feature in particular, +deterministic is explicitly passed to the compiler within the relevant test cases.

To run a deterministic build, you can execute a script such as the following:

./otp_build configure --enable-deterministic-build
./otp_build boot -a
./otp_build release -a

This PR is a reformulated version of #5863, which splits the change up into more manageable commits that can be reviewed separately.

N.B. the bootstrap/preloaded .beam files need to be updated in order for this feature to work properly

github-actions · 2022-05-05T10:09:34Z

CT Test Results

      33 files     824 suites 6h 45m 26s ⏱️
10 604 tests   9 903 ✔️   688 💤 13 ❌
21 729 runs 19 987 ✔️ 1 730 💤 12 ❌

For more details on these failures, see this check.

Results for commit da001d9.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

jhogberg

Thanks for the PR! It looks good aside from a few nits. I'll add it to our nightly builds as soon as they're fixed, but unfortunately we've already released the last release candidate for 25 so the changes won't make it in until OTP 25.1 or OTP 26.

lib/asn1/src/Makefile

lib/asn1/src/asn1ct_gen.erl

erts/preloaded/src/add_abstract_code

lib/compiler/test/test_lib.erl

lib/parsetools/src/leex.erl

Makefile.in

TD5 · 2022-05-05T12:07:58Z

@jhogberg thanks for the feedback, I've resolved those formatting issues. Please let me know if you need anything else 👍

garazdawi

Thanks! I'd like to see some more additions to the documentation of the various tools that are updated by this.

Regarding making this the default, we would have to make sure that tools such as Erlang-LS or meck do not rely on any information stripped. If they work with deterministic files (I don't know if they are affected or not) then I am for making this the default.

lib/stdlib/src/epp.erl

lib/asn1/src/asn1ct_gen.erl

lib/diameter/examples/code/GNUmakefile

lib/parsetools/src/leex.erl

lib/parsetools/src/yecc.erl

TD5 · 2022-05-05T14:20:25Z

~~Regarding the checks run by GitHub on this PR, there were two issues last time it completed~~

A failure related to determinism in epp_SUITE.erl - this is correct insofar as determinism won't work properly until the bootstrap/preloaded beam files are updated. After that, this test suite should pass (it works as expected for me locally once I update these files).
~~Some other issue which I don't think I've seen locally. I am currently waiting to see whether this shows up after rerunning.~~

I think I've resolved all the failures related to this change

TD5 · 2022-05-05T14:22:21Z

Regarding making this the default, we would have to make sure that tools such as Erlang-LS or meck do not rely on any information stripped. If they work with deterministic files (I don't know if they are affected or not) then I am for making this the default.

Yes, I agree. My hope is that we can merge this, then try it out by explicitly opt-ing in with the new flag, which should let us flush out any issues with other tools.

TD5 · 2022-05-06T16:03:39Z

Just looking at one last unit test failure which I suspect is due to the test not being set up in a way that will execute correctly in the GitHub CI.

Makes the EPP module use basenames rather than absolute paths for all -file attributes / ?FILE macros when +deterministic is set. Previously, EPP did not respect +deterministic in all scenarios, so build output could still contain absolute paths.

Makes asn1ct_gen filter out potentially non-deterministic attributes from generated .erl files when +deterministic is set.

Makes generated leex scanners only use basenames in generated -file attributes, rather than absolute paths when +deterministic is set.

Makes generated yecc parsers use only basenames in generated -file attributes rather than absolute paths when +deterministic is set.

Makes compile pass along the +deterministic flag for epp to utilise.

Makes test_lib avoid a crash if +deterministic is enabled to tests. +deterministic strips the compilation options entirely, which test_lib wasn't able to handle. Now, an empty list is returned in that case.

bjorng

I have changed the target branch to maint so that we can include this in OTP 25.1.

I have two more nit picks. When you have fixed them, I will add this PR to our daily builds.

lib/asn1/src/asn1ct_gen.erl

lib/compiler/test/test_lib.erl

TD5 · 2022-06-20T16:38:25Z

Thanks for your feedback, @bjorng. I am not sure precisely the formatting convention to use, but hopefully it's better now.

TD5 · 2022-06-20T20:28:46Z

Regarding CI: The test failures seem unrelated and transient: each time I rerun, a different thing seems to fail and I don't think it's related to my change.

bjorng · 2022-06-21T04:50:19Z

Regarding CI: The test failures seem unrelated and transient: each time I rerun, a different thing seems to fail and I don't think it's related to my change.

Yes, there unfortunately are some unstable test cases.

bjorng · 2022-06-21T06:35:23Z

What I meant about the indentation being off is that {options,Opts} -> wasn't lined up with false ->.

I have pushed to your branch a suggestion for a different way to handle non-existing option lists.

I have now added this PR (along with updates to the primary bootstrap and the preloaded modules) to our daily builds.

bjorng · 2022-06-22T10:47:23Z

The test cases leex_SUITE:deterministic/1 and yecc_SUITE:deterministic/1 both fail if run using an installed Erlang system. We run our automatic tests using an installed Erlang system.

TD5 · 2022-06-23T22:19:57Z

The test cases leex_SUITE:deterministic/1 and yecc_SUITE:deterministic/1 both fail if run using an installed Erlang system. We run our automatic tests using an installed Erlang system.

If I understand correctly, then I think them failing is unavoidable, since the installed Erlang will not have the appropriate determinism features?

TD5 · 2022-06-24T20:53:14Z

The test cases leex_SUITE:deterministic/1 and yecc_SUITE:deterministic/1 both fail if run using an installed Erlang system. We run our automatic tests using an installed Erlang system.

If I understand correctly, then I think them failing is unavoidable, since the installed Erlang will not have the appropriate determinism features?

Ah, it has been explained to me that you build the current OTP version, but run make -j install, then execute the tests from there, which is not how I was running them. Now that I understand, I'll look into why the tests fail in that case.

Add a --enable-deterministic-build to the configure script, which sets ERL_DETERMINISTIC=yes throughout the relevant Makefiles, which then invoke the relevant build stages with the +deterministic option. This addresses absolute paths being included in generated .erl files and compiled .beam files that resulted in builds from different source directories generating different artefacts (which is a component of the issue in #4482). I think it would make sense to make this the default at some stage, but I've put the change behind a flag for now to decouple making deterministic OTP builds possible from making them the default. Having +deterministic set results in compiler options being removed from the module info for modules where this options was used. This may have other implications for users of OTP. For tests themselves, +determinism is not set, since many test cases depend on accessing the test module's compilation options, or other features not available in deterministic mode, in order to configure themselves. For tests of the determinism feature specifically, +deterministic must be explicitly passed to the compiler within the relevant test cases.

bjorng · 2022-06-27T10:34:40Z

Thanks, added to our daily builds.

bjorng · 2022-06-29T11:39:32Z

Thanks for your pull request.

Implements a new `dialyzer --incremental` mode. Incremental mode primarily differs from the previous, "classic", ways of running Dialyzer, in that its model is optimised around the common usecase of regularly analysing a single codebase, tweaking the code, analysing it again, and so on, without explicit reference to the building and checking of a PLT. In this mode the PLT file acts much more like a true cache, where users provide a codebase and a set of files they care about, and Dialyzer does the legwork in terms of deciding how to most efficiently report all of the relevant warnings given the cached results it may already have in the PLT (and if a PLT doesn't exist, incremental mode will make one). This change also includes additional supporting options to aid with debugging, for example, for dumping the dependency graph as incrementality sees it (which modules depend on which other modules via function calls, type definitions, etc.), and which modules Dialyzer thinks changed, and hence which modules it decided to (re-)analyse. Example usage scenarios: - Analyse two apps and report warnings on both: `dialyzer --incremental --apps dialyzer erts` - Analyse two apps, and report warnings on only one: `dialyzer --incremental --apps dialyzer erts --warning_apps dialyzer` - Analyse the apps given in config*: `dialyzer --incremental` - Debug an analysis by dumping the dependency graph Dialyzer computed: `dialyzer --incremental --apps dialyzer erts --dump_full_dependencies_graph ~/deps.dot` *Config defaults to standard location such as `~/.config/erlang/dialyzer.config`, but can be overridden via the `DIALYZER_CONFIG` environment variable. An example config: ``` {incremental, {default_apps, [stdlib, erts, kernel, compiler, dialyzer]}}. ``` Further design details ====================== Key considerations: - "Pay for what you use" - Correctness - Portability - Migratability "Pay for what you use" ---------------------- One key principle of the new incremental mode was that users should be able to invoke Dialyzer and have the tool itself work out precisely which parts of the analysis can be read straight back from the cache, and which need to be (re-)analysed, due to any changes in the code since the PLT was last updated. Relative to the typical cost of the Dialyzer analysis itself, the administrative overhead of incrementality is trivial in practice. There are some extra things kept in the PLT (such as a record of the warnings previously computed) which need to be read/written, and some changes to how modules are hashed when checking for changes, but these costs aren't big, and some other optimisations, such as indexing jobs more efficiently, mean that I've not seen an incremental Dialyzer analysis that is any slower than the corresponding building/checking of a PLT in the classical way. The gains from using incremental mode depend significantly on the changes made to the code between two incremental analyses, both in terms of the number of modules directly modified by the change, but also the size of the transitive closure of the modules which depend on the modified code. For usage scenarios with large changes, or changes to modules upon which many other modules transitively depend, the benefits of incremental mode will be more modest, because fewer of the cached results will still be unaffected by the change. For usage scenarios where the changes are at the "leaf" of the module dependency graph, the benefits of incremental mode will be more significant. Re-running an incremental analysis without making code changes should be very fast indeed, since the result of the analysis can be read entirely from the cached results in the PLT. Correctness ----------- The earlier PR to erlang/OTP erlang#5498 improved Dialyzer's tracking of dependencies in preparation for this change. The result should be that running Dialyzer in incremental mode for a set of modules should result in it "doing the right thing", giving the same results as an analysis from scratch, but leveraging an existing PLT to potentially avoid doing some unnecessary work. I've tested incremental mode in production for a large codebase for many months, comparing the issues it reports with those reported by building/checking a PLT in the classical way, and I have seen no discrepancy between the two in practice. I've noticed a practical improvement in correctness for some users who would attempt to build, merge and check PLTs for subsets of their code manually, in order to approximate something like incrementality. Unfortunately, doing this manually risked giving incorrect results, because it was easy to end up not analysing a module that was transitively affected by a change to the codebase. For simplicity and correctness reasons, incremental mode does not support merging PLTs as it stands. Portability ----------- Incremental mode can provide significant performance wins if most of the analysis can be cached. This benefit can be leveraged by sharing a PLT file between checkouts or even machines, allowing, for example, a previous run on a CI/CD server to generate a PLT file that a user can then base their local analyses on. In order to make this sort of PLT sharing portable, Dialyzer needed to stop using absolute paths for indexing modules in the PLT and internally, and switch to indexing on module name (which must already be unique within a single codebase). A corollary of this is that warning messages from incremental mode refer to only the module name, rather than the full path. To offer some mitigation for this, incremental mode can be configured to write out the mapping from module name to absolute path separately to the PLT file, for use cases which may need this information (for example, for reporting in IDE or CI/CD). When building code from a different checkout or on a different machine, build determinism becomes quite important. Previously, building code from a different absolute path would result in different build artefacts (see, for example, erlang/OTP issue erlang#4482), which can affect Dialyzer's perception of whether the code has changed. To mitigate this, Dialyzer's module hashing logic now ignores some non-semantic features, such as -file attribute paths, but running on top of a deterministic build will still give the best results. erlang/OTP PR erlang#5965 makes this much easier. Migratability ------------- Switching to the incremental mode from building/checking a PLT means skipping any `--build_plt` step, and passing `--incremental` rather than `--check_plt`, and providing those files, directories or apps to analyse, either as command line arguments or via config, rather than than relying on a list of files that were already stored in the PLT. This change introduces a distinction between classic PLT (CPLT) and the new incremental PLT (IPLT) files. The two formats contain slightly different information, and aren't compatible. This change adds new warnings which detect and report if an IPLT is used when a CPLT file was expected, or vice versa. The hope is that, by having a clear distinction between the two, mistakes during migration to incremental mode will be picked up quickly. The separation of the `dialyzer_plt` module into `dialyzer_cplt` and `dialyzer_iplt` for the serialisation-specific logic with some intentional duplication between the two should allow the two to naturally evolve independently (there's no technical reason why they must operate similarly), and give a route to deprecating the classic mechanism in time. Incremental mode offers the ability to analyse a set of modules, but only report the warnings on a subset of them. One usecase for this is where the codebase is made up of first-party code, for which you want warnings, and some third-party dependencies, which you don't want to see warnings for, but are needed as "context" for the analysis of the first-party code. This was implicitly possible in classic Dialyzer modes by virtue of the arguments used when building and checking a PLT file.

jhogberg self-assigned this May 5, 2022

jhogberg added the team:VM Assigned to OTP team VM label May 5, 2022

jhogberg added this to the OTP-26.0 milestone May 5, 2022

jhogberg reviewed May 5, 2022

View reviewed changes

garazdawi reviewed May 5, 2022

View reviewed changes

lib/stdlib/src/epp.erl Show resolved Hide resolved

lib/asn1/src/asn1ct_gen.erl Show resolved Hide resolved

lib/diameter/examples/code/GNUmakefile Outdated Show resolved Hide resolved

lib/parsetools/src/leex.erl Show resolved Hide resolved

lib/parsetools/src/yecc.erl Show resolved Hide resolved

TD5 added 6 commits May 6, 2022 09:28

compiler: Make asn1ct_gen respect +deterministic

ebda708

Makes asn1ct_gen filter out potentially non-deterministic attributes from generated .erl files when +deterministic is set.

compiler: Make leex respect +deterministic

d8014fb

Makes generated leex scanners only use basenames in generated -file attributes, rather than absolute paths when +deterministic is set.

compiler: Make yecc respect +deterministic

883d86c

Makes generated yecc parsers use only basenames in generated -file attributes rather than absolute paths when +deterministic is set.

compiler: Make compiler forward +determinsitic flag to epp

8d87cd6

Makes compile pass along the +deterministic flag for epp to utilise.

compiler: Make test_lib robust to +deterministic

a22b140

Makes test_lib avoid a crash if +deterministic is enabled to tests. +deterministic strips the compilation options entirely, which test_lib wasn't able to handle. Now, an empty list is returned in that case.

TD5 requested a review from jhogberg May 6, 2022 17:49

garazdawi approved these changes May 9, 2022

View reviewed changes

TD5 mentioned this pull request May 9, 2022

ERL-1446: Repeated Erlang/OTP compilations from different source directories don't produce identical build results #4482

Open

TD5 mentioned this pull request May 18, 2022

dialyzer: Add incremental analysis mode #5997

Merged

jhogberg assigned bjorng and unassigned jhogberg Jun 17, 2022

bjorng changed the base branch from master to maint June 20, 2022 07:39

bjorng reviewed Jun 20, 2022

View reviewed changes

lib/asn1/src/asn1ct_gen.erl Outdated Show resolved Hide resolved

lib/compiler/test/test_lib.erl Outdated Show resolved Hide resolved

bjorng added the testing currently being tested, tag is used by OTP internal CI label Jun 28, 2022

bjorng merged commit 7f51a71 into erlang:maint Jun 29, 2022

bjorng modified the milestones: OTP-26.0, OTP-25.1 Jun 29, 2022

joaohf mentioned this pull request May 8, 2023

Support deterministic builds apache/couchdb#4580

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: Improve deterministic builds #5965

compiler: Improve deterministic builds #5965

TD5 commented May 5, 2022 •

edited

Loading

github-actions bot commented May 5, 2022 •

edited

Loading

jhogberg left a comment

TD5 commented May 5, 2022

garazdawi left a comment

TD5 commented May 5, 2022 •

edited

Loading

TD5 commented May 5, 2022

TD5 commented May 6, 2022

bjorng left a comment

TD5 commented Jun 20, 2022

TD5 commented Jun 20, 2022

bjorng commented Jun 21, 2022

bjorng commented Jun 21, 2022

bjorng commented Jun 22, 2022

TD5 commented Jun 23, 2022

TD5 commented Jun 24, 2022

bjorng commented Jun 27, 2022

bjorng commented Jun 29, 2022

compiler: Improve deterministic builds #5965

compiler: Improve deterministic builds #5965

Conversation

TD5 commented May 5, 2022 • edited Loading

github-actions bot commented May 5, 2022 • edited Loading

CT Test Results

Artifacts

jhogberg left a comment

Choose a reason for hiding this comment

TD5 commented May 5, 2022

garazdawi left a comment

Choose a reason for hiding this comment

TD5 commented May 5, 2022 • edited Loading

TD5 commented May 5, 2022

TD5 commented May 6, 2022

bjorng left a comment

Choose a reason for hiding this comment

TD5 commented Jun 20, 2022

TD5 commented Jun 20, 2022

bjorng commented Jun 21, 2022

bjorng commented Jun 21, 2022

bjorng commented Jun 22, 2022

TD5 commented Jun 23, 2022

TD5 commented Jun 24, 2022

bjorng commented Jun 27, 2022

bjorng commented Jun 29, 2022

TD5 commented May 5, 2022 •

edited

Loading

github-actions bot commented May 5, 2022 •

edited

Loading

TD5 commented May 5, 2022 •

edited

Loading