Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8328866: Add raw monitor rank support to the debug agent #19044

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

plummercj
Copy link
Contributor

@plummercj plummercj commented May 1, 2024

This PR adds ranked monitor support to the debug agent. The debug agent has a large number of monitors, and it's really hard to know which order to grab them in, and for that matter which monitors might already be held at any given moment. By imposing a rank on each monitor, we can check to make sure they are always grabbed in the order of their rank. Having this in place when I was working on JDK-8324868 would have made it much easier to detect a deadlock that was occuring, and the reason for it. That's what motivated me to do this work

There were 2 or 3 minor rank issues discovered as a result of these changes. I also learned a lot about some of the more ugly details of the locking support in the process.

Tested with the following on all supported platforms and with virtual threads:

com/sun/jdi
vmTestbase/nsk/jdi
vmTestbase/nsk/jdb
vmTestbase/nsk/jdwp

Still need to run tier2 and tier5.

Details of the changes follow in the first comment.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8328866: Add raw monitor rank support to the debug agent (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19044/head:pull/19044
$ git checkout pull/19044

Update a local copy of the PR:
$ git checkout pull/19044
$ git pull https://git.openjdk.org/jdk.git pull/19044/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19044

View PR using the GUI difftool:
$ git pr show -t 19044

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19044.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 1, 2024

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 1, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8328866 8328866: Add raw monitor rank support to the debug agent. May 1, 2024
@openjdk
Copy link

openjdk bot commented May 1, 2024

@plummercj The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@plummercj
Copy link
Contributor Author

I was able to minimize changes for users of RawMonitor by keeping the same client wrapper APIs (eg. debugMonitorEnter), and for the most part passing in the same argument (the monitor to enter). The only change is that the client is passing a new type called DebugRawMonitor* instead of a jrawMonitorID. The DebugRawMonitor encapsulates the jrawMonitorID and contains other monitor info that needs to be tracked (like the rank and current owner).

Ranks were determined largely by trial and error by running tests and fixing the rank violations asserts as I hit them. In the enum that contains all the rankings, I explain the rational for each monitor's rank.

For the moment I have included a flag (rankedmonitors) to control whether or not ranked monitors should be used. This is just a safety net in case someone runs into an issue. The flag is not documented and will likely be removed at some point.

The ranked monitor supported ended up needing a RawMonitor itself (dbgRawMonitor_lock). This RawMonitor is therefore not part of the ranked monitor support. It could probably use a better name. Suggestions are welcomed.

There are a few issues I fixed while working on the ranked monitor support:

During debug agent initialization we need to call util_initialize() before commonRef_initialize() because the later creates a monitor, and that depends on util_initialize() having already been called.

getLocks() in threadControl.c wasn't grabbing locks in the right order. This could have lead to a deadlock, but seems the structure of how locks are used prevented one from happening.

In invoker_completeInvokeRequest() I had to add grabbing the stepLock in order to maintain proper lock order when getLocks() was eventually called (which also grabbed stepLock). Other than triggering a rank violation assert, this doesn't seem to have caused any real issue, but did leave the potential for a deadlock.

In handleInterrupt() I noticed that a local JNI ref was not being freed. I noticed because there were other places where I had added calls to threadControl_currentThread() that resulted in a noticeable JNI ref leak. When I fixed those, I fixed the one in handleInterrupt() too even tough it was not causing any problems.

The ranked monitor APIs needed some extra safeguards during VM exit that are not needed for the unranked monitor APIs. This is because the unranked APIs didn't have a problem with the current thread being NULL, but the ranked APIs require a valid current thread. That is why you see the following check:

    if (gdata->vmDead && current_thread == NULL) {
        return;
    }

@openjdk openjdk bot added the rfr Pull request is ready for review label May 1, 2024
@mlbridge
Copy link

mlbridge bot commented May 1, 2024

error = ignore_vm_death(error);
if (error != JVMTI_ERROR_NONE) {
EXIT_ERROR(error, "on destruction of raw monitor");
}

dbg_monitor->monitor = NULL;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to protect this with dbgRawMonitor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how that helps. Access to the field is not protected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there could be a race if one thread is destroying this monitor while another is trying to use it. Thus a thread could be doing something like a RawMonitorEnter in the middle of (or after) the monitor being destroyed. Fixing that would require holding dbgRawMonitor during RawMonitorEnter, and that would cause deadlocks all over the place. Also, this same potential issue exists already, but doesn't seem to ever arise. It seems we only call debugMonitorDestroy for cmdQueueLock. Not sure why that is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW Deleting monitors is a tricky business and needs to be done with great care. You have to ensure all threads using the monitor are completely done with it. Simply locking it first to check it is "unused" is not sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the reason that the debug agent only destroys one of the 20 or so monitors it creates. I'm not sure how it got to that point. It may have destroyed more at one point but there were issues, and only one survived destruction without issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If debugger attaches to the debuggee, detaches and re-attaches again, are the monitors recreated again?
(with rankedMonitor you added an assert if DebugRawMonitor::monitor is not null)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If debugger attaches to the debuggee, detaches and re-attaches again, are the monitors recreated again?
(with rankedMonitor you added an assert if DebugRawMonitor::monitor is not null)

Almost all of the debugMonitorCreate() calls seem to come (indirectly) from initialize(), which is only called once.

cmdQueueLock, which is the one monitor we call debugMonitorDestroy() on, is called from debugLoop_run(), which is called whenever a debugger connection is initiated. This is the only case of creating, destroying, and then recreating. I wonder if there is some need for that.

popFrameEventLock and popFrameProceedLock are created lazily when first needed. There is a check to make sure they are not recreated if already created. They are never destroyed.

That seems to be it. So we pretty much create all monitors upon initialization and never recreate those that are created during initialization.

With regard to the assert, this can only be an issue for cmdQueueLock, and since debugMonitorDestroy() is called on it (and that clears the needed fields), the assert shouldn't be triggered.

src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved

// Need to lock during initialization so verifyMonitorRank() can be guaranteed that
// if the monitor field is set, then the monitor if fully initialized. More specifically,
// that the rank field has been set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rank for all monitors can be set in util_initialize():

for (i = 0; i < NUM_DEBUG_RAW_MONITORS; i++) {
  dbg_monitors[i]->rank = i;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided not to make this change. It was a bit disruptive with the need for forward declarations. Plus I wasn't convinced I wouldn't still need the locking since there is also the assignment of the name field (although that is only used if something has gone wrong) and also I was a little worried that tearing could happen on some platforms (a partial ptr value is read in), although not on any platform that we support. I did update the comment to make it less specific about the rank field.

@sspitsyn
Copy link
Contributor

sspitsyn commented May 7, 2024

Chris, I'm looking at this fix.

@plummercj
Copy link
Contributor Author

I added a verifyMonitorRank() when doing a debugMonitorWait() since wait does an exit and enter, so we should do the same rank check that we do during an enter. I was a bit worried that we might be doing wait() while holding higher ranking lock, especially given how getLocks() pro-actively grabs locks before they are needed, but all tests (surprisingly) passed after this change.

…d enter, so we should do the same rank check that we do during an enter.
@openjdk openjdk bot removed the rfr Pull request is ready for review label May 7, 2024
JDI_ASSERT(monitor != NULL);

// Need to lock during initialization so verifyMonitorRank() can be guaranteed that
// if the monitor field is not NULL, then the monitor if fully initialized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It looks like a typo: "the monitor if fully" => "the monitor is fully"

@openjdk openjdk bot added the rfr Pull request is ready for review label May 7, 2024
@plummercj
Copy link
Contributor Author

plummercj commented May 7, 2024

What do you think about the terminology usage of "rank". In this PR the term "higher ranked" means that it must be entered first (before any monitor with a lower rank), but 0 is the highest rank, and bigger numbers mean a lower rank. Sometimes when we rank things (in general) we say #1 is the best, but sometimes we instead rank with a score, and the highest score is the best. I'm wondering if for this PR the rank ordering should be reversed, so a "higher rank" is actually a higher number.

Update: Hotspot seems to use the term "higher/lowest rank" in the same way, but applies the lowest rank number to the lowest ranked mutex, so this would suggest that I should reverse the rank orders.

@sspitsyn
Copy link
Contributor

sspitsyn commented May 8, 2024

What do you think about the terminology usage of "rank".

I slightly prefer to have the same rank order as Hotspot uses.
At the same time, I'm not sure if it won't cause any issues/difficulties. But I'd give it a try.

@@ -63,7 +63,7 @@ lastCommand(jdwpCmdPacket *cmd)
void
debugLoop_initialize(void)
{
vmDeathLock = debugMonitorCreate("JDWP VM_DEATH Lock");
vmDeathLock = debugMonitorCreate(vmDeathLockForDebugLoop_Rank, "JDWP VM_DEATH Lock");
Copy link
Contributor

@sspitsyn sspitsyn May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Just a suggestion to consider (there was a similar suggestion from by Alex)...
Can all the debugMonitor's created/initialized by one function debugMonitor_initialize() called by util_initialize()? Then each monitor consumer can cache needed debugMonitor's like this:

vmDeathLock = getDebugMonitor(vmDeathLockForDebugLoop_Rank);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are suggesting that all monitors be created up front during util_intialize(). The debug agent in the past has always created them lazily (and sometimes some of them end up never being created because they are not needed). I don't really see a big advantage in creating them all up front.

Alex's suggestion was a very different one, and was just a suggestion to initialize all the DebugRawMonitor ranks up front rather than when the jMonitorID is created, and the reason for the suggestion was to avoid having to grab the dbgRawMonitor when setting the rank, but I wasn't convinced that it actually allowed you to not grab the dbgRawMonitor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to say that Alex's suggestion is in a similar direction.
I do not see a big advantage to save just on creation of a few monitors. It is not a scalability problem as their number is constant. Creation of the monitors at initialization time is a simplification as we do not care about synchronization at this phase. Also, it is easier to understand because there is no interaction with other threads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init code would have to know the name of each monitor since it is no longer being passed in. Something like the following might work:

static DebugRawMonitor dbg_monitors[] = {
{ NULL, "JDWP VM_DEATH Lock", vmDeathLockForDebugLoop_Rank, NULL, 0 },
{ NULL, "JDWP Callback Block", callbackBlock_Rank, NULL, 0 },
{ NULL, "JDWP Callback Lock", callbackLock_Rank, NULL, 0 },
...
};

And then the init() function can iterate over the array and allocate the monitor for each entry. Note that this array needs to be kept in sync with the DebugRawMonitorRank enum (same entries and in the same order). I can assert during the initialization that the rank field for each entry is equal to its array index and that the array size is NUM_DEBUG_RAW_MONITORS.

Copy link
Contributor

@sspitsyn sspitsyn May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do as below? :

static DebugRawMonitor dbg_monitors[]; // initialized with all zeros by default
. . .
static void
createDebugRawMonitor(DebugRawMonitorRank rank, const char* name) {
    dbg_monitors[rank] = debugMonitorCreate(rank, name);
}

static void
monitors_initialize() {
  createDebugRawMonitor( vmDeathLockForDebugLoop_Rank, "JDWP VM_DEATH Lock");
  createDebugRawMonitor( callbackBlock_Rank, "JDWP Callback Block");
  createDebugRawMonitor( callbackLock_Rank, "JDWP Callback Lock");
  . . .
}

The pair of functions createDebugRawMonitor()+debugMonitorCreate() can be refactored into just one function. There monitors can be created in any order but they will be set according to the DebugRawMonitorRank enum.

src/jdk.jdwp.agent/share/native/libjdwp/util.c Outdated Show resolved Hide resolved
struct DebugRawMonitor {
jrawMonitorID monitor;
const char* name;
DebugRawMonitorRank rank;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please, consider to add same comment for lines #1116-1117 as for the line #1119 below.
It might be useful to move the field ownerThread before the other fields (not sure about the monitor field), so the same comment can be shared by multiple fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of don't want to distract from the point that ownerThread is protected by the dbgRawMonitor, and that is the main purpose of dbgRawMonitor. Once you have verified that ownerThread is the current thread, you can release the dbgRawMonitor and access the rank and name fields safely.

There is a race issue with all the fields during debugMonitorCreate() and verifyMonitorRank(), which is why the debugMonitorCreate() must hold the dbgRawMonitor(). However, there is not a race with the fields (other than ownerThread) for the DebugRawMonitor passed into the debugMonitorXXX() APIs because they are not called until the DebugRawMonitor is done being initialized. So this means that debugMonitorXXX() can access the monitor, name, and rank fields without holding dbgRawMonitor, because they are guaranteed to be fully initialized by that point, and they don't change. ownerThread does change, so you must access it while holding the dbgRawMonitor. Technically speaking entryCount could also be changed by other threads, but we don't access it until we know the current thread owns the DebugRawMonitor, so it is safe to access (no other thread would modify or access it).

To put this another way, the debugMonitorXXX() APIs can safely access most of the DebugRawMonitor fields for the DebugRawMonitor passed in because we know they are initialized by this point and don't change. ownerThread (and to some extent entryCount) are the exception. verifyMonitorRank() introduces additional synchronization issues because it iterates over all DebugRawMonitors, and some might be in the middle of creation, so some synchronization with the creation code is needed.

However, I now realize that if we initialized all the monitors up front, then there would be no need to hold dbgRawMonitor during debugMonitorCreate(), although this does not allow for any improvements in verifyMonitorRank() because it still needs to hold the dbgRawMonitor due to accessing the ownerThread field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No pressure, it is up to you. I just wonder if there is any room to make it a little bit more clear and simple.

@openjdk openjdk bot removed the rfr Pull request is ready for review label May 9, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label May 16, 2024
* verifyMonitorRank(), we should also hold it during monitor initialization.
*/

static jrawMonitorID dbgRawMonitor;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the monitor is used to synchronize access to dbg_monitors array, maybe rename it to something like dbg_monitors_lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexmenkov Actually I don't think dbg_monitors_lock is a good name. It doesn't synchronize access to the dbg_monitors array, but rather to the ownerThread field of any given entry in the array. Maybe something like ownerThread_field_access_lock would be better, although that's getting a bit wordy, and maybe more specific than it needs to be (it would prevent other uses of the lock).

}

static void
assertIsCurrentThread(JNIEnv *env, jthread thread, jthread current_thread)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function gets both threads as arguments, I think assertIsSameThread would be more correct name

@dholmes-ora
Copy link
Member

Note we can't check if the current thread owns a lock without grabbing dbgRawMonitor. In fact the main purpose of dbgRawMonitor is to allow the current thread to check if it owns the monitor.

That smells fishy to me. A thread can always safely check if it owns something because it can't race with itself and get a wrong answer.

@plummercj
Copy link
Contributor Author

That smells fishy to me. A thread can always safely check if it owns something because it can't race with itself and get a wrong answer.

Unfortunately checking ownership means comparing jobjects, which you can't safely do if you are comparing to a jobject that could be freed at any moment (I'm referring the JNI ref being freed, not the object being GC'd). I ran into asserts in JNI IsSameObject() due to this happening. Basically the local ref became invalid between the time it was fetched from the DebugRawMonitor and when is was used by IsSameObject().

@dholmes-ora
Copy link
Member

Unfortunately checking ownership means comparing jobjects, which you can't safely do if you are comparing to a jobject that could be freed at any moment

Okay but how does this dbgRawMonitor lock prevent the GC from clearing a jobject?

@plummercj
Copy link
Contributor Author

Okay but how does this dbgRawMonitor lock prevent the GC from clearing a jobject?

The JNI ref. It's actually a global ref (I said local ref earlier). As long as any setters of DebugRawMonitor::ownerThread (setting the thread or clearing it) grab dbgRawMonitor first, and the current thread accessing ownerThread does the same, then you know that whatever ownerThread points to is either NULL, the current thread, or some other thread that is still alive and is actually holding the monitor that the DebugRawMonitor is encapsulating. As long as the current thread continues to hold dbgRawMonitor, no other thread will be able to change ownerThread. This means that if some other thread currently owns the monitor that the DebugRawMonitor is encapsulating, then it will first block on dbgRawMonitor before trying to release it and clear ownerThread.

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 19, 2024

@plummercj This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@plummercj
Copy link
Contributor Author

keep alive

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 22, 2024

@plummercj This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@plummercj
Copy link
Contributor Author

Keep alive. I'll be continuing work on this soon.

@plummercj plummercj changed the title 8328866: Add raw monitor rank support to the debug agent. 8328866: Add raw monitor rank support to the debug agent Aug 15, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Sep 14, 2024

@plummercj This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfr Pull request is ready for review serviceability [email protected]
Development

Successfully merging this pull request may close these issues.

5 participants