Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow spawning additional UI isolates in flutter_tester #48706

Merged
merged 21 commits into from
Jan 5, 2024

Conversation

dnfield
Copy link
Contributor

@dnfield dnfield commented Dec 5, 2023

This enables work ongoing by @derekxu16 to improve performance in flutter_tester when running multiple files from large test suites.

Specifically, it:

  • Exposes a Spawn C symbol from flutter_tester that runs the current kernel in a new UI isolate with a different entrypoint and/or route name
  • Exposes two symbols from flutter_tester to allow a test harness to more efficiently load particular kernel files or to lookup an entrypoint from an imported source file.
  • Avoids re-loading the kernel file completely when spawning a new UI isolate

Googlers can look at go/flutter-tester-isolates for some more context. If anyone wants I'm happy to create a public version of that doc.

@dnfield dnfield requested a review from mraleph December 5, 2023 23:53
@mraleph mraleph requested review from mkustermann and removed request for mraleph December 6, 2023 05:37
@mraleph
Copy link
Member

mraleph commented Dec 6, 2023

I am OOO so asking @mkustermann to review in my stead.

Copy link
Member

@mkustermann mkustermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, first round of comments

// Mapping must be retained until isolate shutdown.
(*current_isolate)->kernel_buffers_.push_back(mapping);

auto lib =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider handling errors via Dart_IsError(lib).

runtime/dart_isolate.h Outdated Show resolved Hide resolved
auto current_isolate =
static_cast<std::shared_ptr<DartIsolate>*>(Dart_CurrentIsolateData());
// Mapping must be retained until isolate shutdown.
(*current_isolate)->kernel_buffers_.push_back(mapping);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernel file is used to populate information that belongs to the isolate group. Any isolate may use classes/functions/... in that kernel blob.

As a result, the kernel blob is a resource belonging to the entire isolate group, it has to be kept alive until the isolate group dies.

=> So I'd think this should use Dart_CurrentIsolateGroupData() and stick it in there, ensuring it is kept alive until the group dies.

(Alternatively you can make an external typed data from the bytes, attach a finalizer and use the external typed data to load the kernel via Dart_LoadLibrary(Dart_Handle kernel_buffer))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1130,6 +1130,26 @@ Dart_Handle DartIsolate::OnDartLoadLibrary(intptr_t loading_unit_id) {
return Dart_NewApiError(error_message.c_str());
}

Dart_Handle DartIsolate::LoadLibraryFromKernel(
const std::shared_ptr<const fml::Mapping>& mapping) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of this function seems to indicate that this kernel blob given here is an application (i.e. has one designated main function and possibly many libraries in it). That means a better name for this function would be LoadKernelProgram.

Though looking at how this function is used on call sites indicates that we don't use the main closure returned from this function at all.
=> So I'd suggest to rename it to LoadLibrariesFromKernel and make it return either an error handle or a null handle (not a closure)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test here is not actually calling main because it would get a bit more complicated to understand what the test is doing in that case. In the intended use of this API, returning main will be important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider renaming this method - it's confusing and makes it unclear what invariants are. The choices are

  • it's loading a program (kernel contains many libraries and a pointer to a main function)
  • it's loading libraries (kernel contains many libraries)

In either case, it's possibly loading multiple libraries, so the function shouldn't be singular (i.e. Library) - the naming in Dart API is due to legacy.

Furthermore the code can check this invariant: It can check that the VM's API returns either a closure or null - depending on what semantics we want (i.e. what kind of kernel files we use).

If you intend to make flutter_tester dynamically load more multiple kernel files using this function here then the question is whether those individual kernel files contain overlapping libraries (would make it slow to give VM several kernel files of MBs size that contain mostly duplicate information). Could you add some notes here about this intended use?

@@ -1130,6 +1130,26 @@ Dart_Handle DartIsolate::OnDartLoadLibrary(intptr_t loading_unit_id) {
return Dart_NewApiError(error_message.c_str());
}

Dart_Handle DartIsolate::LoadLibraryFromKernel(
const std::shared_ptr<const fml::Mapping>& mapping) {
if (DartVM::IsRunningPrecompiledCode()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this function be instead if/defed out in AOT mode? In AOT mode it doesn't make sense at all to call LoadLibraryFromKernel, so this code shouldn't be included in AOT runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't ifdef out the Dart code that would call it right?

In reality, this binary only ever runs in JIT for users, we only run it in AOT mode in engine based tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't ifdef out the Dart code that would call it right?

Why not? This code here in runtime/dart_isolate.cc that's included in flutter AOT runtimes. We don't want to rely on LTO being clever enough to tree shake it, so it's better to if/def it out.

The dart VM that's included in this binary is also if/def-ing JIT features out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Dart doesn't support conditional compilation :)

I could guard calls to the @Native functions but that seems to add a lot of complexity when the pattern in the engine is to return null or a String and handle that as an error on the Dart side.

if (!uri || !name) {
return Dart_Null();
}
return Dart_GetField(Dart_LookupLibrary(Dart_NewStringFromCString(uri)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Handle error / null of Dart_LookupLibrary()

Can we make this function either return closure or throw? (and not return null)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but still returning null. I'm not sure what closure to return if this fails, and I don't want to throw an exception from C++ code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Dart_GetField() cannot find the field with the name it will return an error handle. Returning the error handle here should cause an exception on the Dart side.

If you want to prevent that, you have to check for errors and signal error condition to Dart in other ways (e.g. by returning a string with the error) and make dart check for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this throwing an exception in Dart.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this then

Dart_Handle lib = Dart_LookupLibrary(...);
if (Dart_IsError(lib)) return lib;
return Dart_GetField(lib);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

void spawn(
{required SendPort port, String entrypoint = 'main', String route = '/'}) {
// Get off the UI isolate, otherwise there will be a current isolate that
// cannot safely be exited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why it's not safe? (As long as we don't use a FFI leaf call, we can safely exit current isolate and re-enter it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of hoping someone from the Dart team can explain that to me. If I Dart_ExitIsolate things go crashy. If I don't, I get complains that I can't create a new isolate without exiting the currentone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little bit vague if you say "things go crashy".

The invariants are as follows:

  • A dart isolate can only be operated on by one thread
  • Creating a new isolate in an existing isolate group requires proof that the group is still alive, the proof is given as an existing isolate (so the thread creating the child isolate needs to be the owner of the proof isolate - i.e. no other thread can operate on it)
  • We can only have one isolate being active on one thread at any given point in time, creating a new child isolate will make it the active one, so the API request the caller to have exited the isolate (but nobody else uses it) before creating the child isolate

We do have tests that exit current isolate, create child isolate, re-enter old isolate, e.g. https://github.com/dart-lang/sdk/blob/a155e2b521088d38/runtime/bin/ffi_test/ffi_test_functions_vmspecific.cc#L356

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh this is helpful. I was missing capturing the current isolate and re-entering it. I'll update the patch to remove this work around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crashes I was seeing were the VM asserting if I failed to exit the isolate before spawning the new one, and then a segfault if I failed to enter the isolate again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also, at the time, was missing a latch that you've made me realize I needed - so with the combination of a latch around the spawn task and a Dart_CurrentIsolate/Dart_ExitIsolate/Dart_EnterIsolate everything seems good now).

if (Dart_IsError(result)) {
return result;
}
return Dart_GetField(lib, Dart_NewStringFromCString("main"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If addressing above comment, make this simply return the null handle.

});
});
};
fml::TaskRunner::RunNowOrPostTask(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment explaining how we ensure that the isolate group is kept alive until the asynchronous spawn task we trigger here has run (and why it's safe that the closure above references Shell* as pointer and the shell is still alive when closure runs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. I've also added a latch because I'm realizing that the only thing keeping the shell alive is a receive port that could get closed between now and when the platform task gets scheduled.

auto configuration = RunConfiguration::InferFromSettings(
shell->GetSettings(), /*io_worker=*/nullptr,
/*launch_type=*/IsolateLaunchType::kExistingGroup);
configuration.SetEntrypoint(entrypoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems entrypoint is the name of the dart function that was marked as @pragma('vm:entry-point'). How does this spawn function which library to find that entrypoint in?

Should this not use configuration.SetEntrypointAndLibrary()?

(The isolate group will have one root library, which is the initial program is was launched from)

Also: If we already have Spawn taking the entrypoint by name, why do we need a separate LookupEntryPoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawn is only good for the current library.

Slava's refactoring of the test running harness required being able to lookup other arbitrary entrypoints in other kernels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawn is only good for the current library.

There's no such thing as a current library.

We do have a concept of a root library which is the library that contains the main function of the very first isolate in the group. But I thought the idea here is to load many kernels and launch many different test main functions - so it cannot rely on the root library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawn is only good for the root library. If you want to load some other library and run an entrypoint in it, you need to use the other API introduced that we're trying to agree on a name for.

Spawn is special because it knows how to deal with creating a UI isolate. The other API does not know about UI isolates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @mraleph gave me more context into his code: The route is basically a proxy of the actual library's entrypoint to run. So the Spawn will spawn an isolate running the same main as the initial isolate and distinguishes by the arguments to main what to do.

I'll let @mraleph have review it, as he's more context (that's not included in this CL)

@dnfield
Copy link
Contributor Author

dnfield commented Dec 7, 2023

Argh. The tests have been failing because of the FML_LOG(ERROR). I'm going to just remove that.

@@ -1130,6 +1130,26 @@ Dart_Handle DartIsolate::OnDartLoadLibrary(intptr_t loading_unit_id) {
return Dart_NewApiError(error_message.c_str());
}

Dart_Handle DartIsolate::LoadLibraryFromKernel(
const std::shared_ptr<const fml::Mapping>& mapping) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider renaming this method - it's confusing and makes it unclear what invariants are. The choices are

  • it's loading a program (kernel contains many libraries and a pointer to a main function)
  • it's loading libraries (kernel contains many libraries)

In either case, it's possibly loading multiple libraries, so the function shouldn't be singular (i.e. Library) - the naming in Dart API is due to legacy.

Furthermore the code can check this invariant: It can check that the VM's API returns either a closure or null - depending on what semantics we want (i.e. what kind of kernel files we use).

If you intend to make flutter_tester dynamically load more multiple kernel files using this function here then the question is whether those individual kernel files contain overlapping libraries (would make it slow to give VM several kernel files of MBs size that contain mostly duplicate information). Could you add some notes here about this intended use?

@@ -1130,6 +1130,26 @@ Dart_Handle DartIsolate::OnDartLoadLibrary(intptr_t loading_unit_id) {
return Dart_NewApiError(error_message.c_str());
}

Dart_Handle DartIsolate::LoadLibraryFromKernel(
const std::shared_ptr<const fml::Mapping>& mapping) {
if (DartVM::IsRunningPrecompiledCode()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't ifdef out the Dart code that would call it right?

Why not? This code here in runtime/dart_isolate.cc that's included in flutter AOT runtimes. We don't want to rely on LTO being clever enough to tree shake it, so it's better to if/def it out.

The dart VM that's included in this binary is also if/def-ing JIT features out.

auto lib =
Dart_LoadLibraryFromKernel(mapping->GetMapping(), mapping->GetSize());
if (tonic::CheckAndHandleError(lib)) {
return Dart_Null();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will swallow errors:

bool CheckAndHandleError(Dart_Handle handle) {
if (...) { 
  } else if (Dart_IsError(handle)) {
    tonic::Log("Dart Error: %s", Dart_GetError(handle));
    return true;
  } else {
    ...
  }
}

We should be propagating errors to the callers (see line 548 below, where you actually return the error if there's one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern in the engine is for the Dart code to handle such error conditions without relying on making API calls that would fail to run C++ destructors (such as Dart_ThrowException). I'm open to arguments about reworking that, but would like to stick to the style used elsewhere in the engine in this patch.

fml::FileMapping::CreateReadOnly(path);
if (!mapping) {
FML_LOG(ERROR) << "Failed to load file at \"" << path << "\".";
return Dart_Null();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is it throwing an exception?

Returning a Dart_Handle (as in the code below return DartIsolate::LoadLibraryFromKernel(mapping);) that's an error should throw an exception on the Dart side. Similar to manually calling

if (Dart_IsError(handle)) Dart_PropagateError(handle);

We don't like throwing dart exceptions from C++ code in this repo because it causes hard to understand problems for users.

Could you clarify what kind of problems this would cause? As long as the errors are thrown in the outermost C function (the one that was invoked from Dart) this should be fine.

Alternatively you can check for errors and return e.g. an error string which Dart code has to check and act upon.

if (!uri || !name) {
return Dart_Null();
}
return Dart_GetField(Dart_LookupLibrary(Dart_NewStringFromCString(uri)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Dart_GetField() cannot find the field with the name it will return an error handle. Returning the error handle here should cause an exception on the Dart side.

If you want to prevent that, you have to check for errors and signal error condition to Dart in other ways (e.g. by returning a string with the error) and make dart check for it.

auto configuration = RunConfiguration::InferFromSettings(
shell->GetSettings(), /*io_worker=*/nullptr,
/*launch_type=*/IsolateLaunchType::kExistingGroup);
configuration.SetEntrypoint(entrypoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawn is only good for the current library.

There's no such thing as a current library.

We do have a concept of a root library which is the library that contains the main function of the very first isolate in the group. But I thought the idea here is to load many kernels and launch many different test main functions - so it cannot rely on the root library.

void spawn(
{required SendPort port, String entrypoint = 'main', String route = '/'}) {
// Get off the UI isolate, otherwise there will be a current isolate that
// cannot safely be exited.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little bit vague if you say "things go crashy".

The invariants are as follows:

  • A dart isolate can only be operated on by one thread
  • Creating a new isolate in an existing isolate group requires proof that the group is still alive, the proof is given as an existing isolate (so the thread creating the child isolate needs to be the owner of the proof isolate - i.e. no other thread can operate on it)
  • We can only have one isolate being active on one thread at any given point in time, creating a new child isolate will make it the active one, so the API request the caller to have exited the isolate (but nobody else uses it) before creating the child isolate

We do have tests that exit current isolate, create child isolate, re-enter old isolate, e.g. https://github.com/dart-lang/sdk/blob/a155e2b521088d38/runtime/bin/ffi_test/ffi_test_functions_vmspecific.cc#L356

Copy link
Member

@mkustermann mkustermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let @mraleph to finish review, as he's more context on this.

if (!uri || !name) {
return Dart_Null();
}
return Dart_GetField(Dart_LookupLibrary(Dart_NewStringFromCString(uri)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this then

Dart_Handle lib = Dart_LookupLibrary(...);
if (Dart_IsError(lib)) return lib;
return Dart_GetField(lib);

auto configuration = RunConfiguration::InferFromSettings(
shell->GetSettings(), /*io_worker=*/nullptr,
/*launch_type=*/IsolateLaunchType::kExistingGroup);
configuration.SetEntrypoint(entrypoint);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @mraleph gave me more context into his code: The route is basically a proxy of the actual library's entrypoint to run. So the Spawn will spawn an isolate running the same main as the initial isolate and distinguishes by the arguments to main what to do.

I'll let @mraleph have review it, as he's more context (that's not included in this CL)

auto shell = g_shell->get();
fml::AutoResetWaitableEvent latch;
auto isolate = Dart_CurrentIsolate();
Dart_ExitIsolate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move this closer down to where it's needed (the closure allocation here can happen before exiting the isolate)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

fml::TaskRunner::RunNowOrPostTask(
shell->GetTaskRunners().GetPlatformTaskRunner(), spawn_task);
latch.Wait();
Dart_EnterIsolate(isolate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the reason you have to exit and re-enter the isolate is because the callback gets executed synchronously. Do you then really need to use RunNowOrPostTask() - can you just call the lambda here yourself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the process can be invoked in such a way that it uses multiple threads - if it does, this makes sure the call is executed on the correct thread.

@mraleph
Copy link
Member

mraleph commented Dec 15, 2023

Here is an idea: this change has two components to it:

  • Spawn which is for spawning isolates / engine instances for running tests.
  • Reflective kernel loading and lookup machinery for actually splitting test runner from the blob that contains tests.

I suggest we land spawn and think about the second half separately.

In reality Dart SDK already has all necessary functionality, it is just hidden. So maybe we should not be writing additional code which just replicates existing features:

import 'dart:mirrors';

// e.g. for loading Kernel blob into existing isolate 
final isolate = currentMirrorSystem().isolate;
final lib = await isolate.loadUri(Uri.file('mega_only_tests.dill'));
final main = lib.getField(#main).reflectee;

The only problem is that we set --enable-mirrors=false in flutter_tester (just like everywhere else), but we can allow test runner to use mirrors and be done with it. (We can do it in a way that only enables mirrors for the runner itself and not for tests).

This way we can drop all the C++ changes which are not Spawn related and simplify this PR to barebones. This also closes all the questions around how these APIs for Kernel loading and reflective access should be structured.

@flutter-dashboard

This comment was marked as resolved.

fml::FileMapping::CreateReadOnly(path);
if (!mapping) {
FML_LOG(ERROR) << "Failed to load file at \"" << path << "\".";
return Dart_Null();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider returning an error object here.

@dnfield dnfield added the autosubmit Merge PR when tree becomes green via auto submit App label Jan 5, 2024
@auto-submit auto-submit bot merged commit edbce21 into flutter:main Jan 5, 2024
28 checks passed
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jan 5, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jan 5, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Jan 5, 2024
…141037)

flutter/engine@f3496a7...3fd9c4e

2024-01-05 [email protected] Roll Skia from 548827f77466 to 9ec012033c2b (1 revision) (flutter/engine#49564)
2024-01-05 [email protected] Roll Skia from a006922df1c5 to 548827f77466 (1 revision) (flutter/engine#49561)
2024-01-05 [email protected] Roll Skia from 52be1b25c3a2 to a006922df1c5 (4 revisions) (flutter/engine#49559)
2024-01-05 [email protected] Allow spawning additional UI isolates in flutter_tester (flutter/engine#48706)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC [email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autosubmit Merge PR when tree becomes green via auto submit App
Projects
None yet
3 participants