Compare invocation namespaces when handling a cycle and recovering a queue #5432
Description
This fixes a bug in which ETCD data for non-existent containers persists forever.
The bug occurs when a shared action is invoked.
When the queue manager receives an activation message, it finds a queue based on the docId only, which is wrong. As a result, the activation message can be sent to the wrong queue as long as the docId is the same. But the docId is the same for all actions bound to the same shared action. For example:

/whisk.system/sharedPackage/hello
/style95/myPackage/hello (docId: /whisk.system/sharedPackage/hello)
/bdoyle/yourPackage/hello (docId: /whisk.system/sharedPackage/hello)

So an activation for /style95/myPackage/hello could be sent to the queue for /bdoyle/yourPackage/hello.
Then the memory queue sends the activation to a container for /style95/myPackage/hello.
The container is initialized with /style95/myPackage/hello and registers the ETCD data for the running container.
But after executing the activation for /bdoyle/yourPackage/hello, it tries to remove the ETCD data for the running container using the key prefix for /bdoyle/yourPackage/hello, because that key now resides in the container data (WarmData).
Accordingly, the original ETCD data for the running container is never deleted.
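The mismatch can be illustrated with a small, self-contained Python sketch. It is not OpenWhisk code: the `Activation` class, both lookup functions, the `container_key` helper, and the short container ID `c1` are hypothetical, and the key layout is simplified (the real key also carries revision and invoker segments). It follows the scenario reported in the logs, where a container created for `bdoyle` runs an activation that came from `style95`. The second lookup only illustrates the idea in the PR title of comparing invocation namespaces.

```python
from dataclasses import dataclass

SHARED_DOC_ID = "/whisk.system/sharedPackage/hello"


@dataclass(frozen=True)
class Activation:
    invocation_namespace: str  # namespace of the caller, e.g. "style95"
    doc_id: str                # resolved action; identical for every binding


def find_queue_by_doc_id(queues, msg):
    """Lookup that ignores the namespace: the first matching docId wins."""
    for (_namespace, doc_id), queue in queues.items():
        if doc_id == msg.doc_id:
            return queue
    return None


def find_queue_by_namespace_and_doc_id(queues, msg):
    """Lookup that also compares the invocation namespace (assumed fix)."""
    return queues.get((msg.invocation_namespace, msg.doc_id))


def container_key(namespace, doc_id, container_id):
    # Simplified stand-in for the ETCD key of a running container.
    return f"whisk/namespace/{namespace}{doc_id}/container/{container_id}"


# Two namespaces bind the same shared action, so both queues share one docId.
queues = {
    ("bdoyle", SHARED_DOC_ID): "queue-bdoyle",
    ("style95", SHARED_DOC_ID): "queue-style95",
}
msg = Activation("style95", SHARED_DOC_ID)

wrong_queue = find_queue_by_doc_id(queues, msg)
right_queue = find_queue_by_namespace_and_doc_id(queues, msg)

# The container was created for bdoyle and registered its key under bdoyle.
etcd = set()
registered_key = container_key("bdoyle", SHARED_DOC_ID, "c1")
etcd.add(registered_key)

# After running the style95 activation, cleanup builds the key from style95.
cleanup_key = container_key(msg.invocation_namespace, SHARED_DOC_ID, "c1")
etcd.discard(cleanup_key)  # no such key exists, so nothing is removed

leaked = registered_key in etcd
print(wrong_queue, right_queue, leaked)
```

In this sketch the docId-only lookup returns the `bdoyle` queue for a `style95` activation, and the later cleanup misses the key that was actually registered, which is the leak described above.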
Please refer to the following logs that I found.
(The shared action is whisk.system/sharedPackage/hello, and I replaced the names of the two namespaces with style95 and bdoyle in the original logs just to hide proprietary information.)

As the logs show, the activation (98481e836089419d881e836089d19d00) was originally for style95 but was ultimately sent to the queue for bdoyle.
The queue then sent this activation to the container with ID f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff.
This container was originally created for bdoyle.
But when the same container (f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) is paused, it tries to unwatch the endpoint based on the style95 key, because it executed an activation for the style95 namespace.
Consequently, the data for bdoyle (whisk/namespace/bdoyle/whisk.system/sharedPackage/hello/38-5e7fc51a452c685030fdfcec61e3bdf1/invoker0/container/f372eb3f960da72c4bff75110d24b0c5b72d6d306d3c31e59f4c10f90ccefdff) is never removed unless the invoker is restarted.

Related issue and scope
My changes affect the following components
Types of changes
Checklist: