Unable to start mirth channel after using ChannelUtil to halt #4701
-
I have a channel whose source detects when a channel's or connector's status is stuck at reading, receiving, or polling. The idea was to initiate corrective action by halting the channel and redeploying it using ChannelUtil, but I'm running into an error. Has anyone run into this before?

```
[2021-08-31 13:39:57,071] ERROR (com.mirth.connect.server.channel.ErrorTaskHandler:25): com.mirth.connect.donkey.server.StartException: Failed to start channel Outbound DB to DBXML (357b41ab-0947-49a0-abc1-d42effc01993).
Caused by: com.mirth.connect.donkey.server.ConnectorTaskException: org.quartz.ObjectAlreadyExistsException: Unable to store Job : '357b41ab-0947-49a0-abc1-d42effc01993.PollConnector357b41ab-0947-49a0-abc1-d42effc01993', because one already exists with this identification.
Caused by: org.quartz.ObjectAlreadyExistsException: Unable to store Job : '357b41ab-0947-49a0-abc1-d42effc01993.PollConnector357b41ab-0947-49a0-abc1-d42effc01993', because one already exists with this identification.
```
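For reference, the corrective action being attempted reduces to a halt-then-redeploy sequence. The sketch below is a minimal, hypothetical helper (`bounceChannel` is my name for it; the method names match Mirth's userutil `ChannelUtil`, but the exact sequencing is an assumption). The `channelUtil` object is injected so the same logic can run inside a Mirth script with the real `ChannelUtil`, or be exercised with a stub elsewhere:

```javascript
// Minimal sketch of the corrective action: halt a stuck channel, then
// tear it down and redeploy it. `channelUtil` is passed in so this works
// with Mirth's real ChannelUtil inside a channel script, or a stub in a
// test. Method names (haltChannel, undeployChannel, deployChannel) follow
// Mirth's userutil ChannelUtil; the order used here is an assumption.
function bounceChannel(channelUtil, channelId) {
    channelUtil.haltChannel(channelId);     // force-stop even if the connector is stuck
    channelUtil.undeployChannel(channelId); // tear down the deployed instance
    channelUtil.deployChannel(channelId);   // redeploy fresh
    return 'BOUNCED:' + channelId;
}
```

A stub `channelUtil` makes it easy to verify the call order without a running Mirth server.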
Replies: 5 comments 8 replies
-
Is that a common thing for you? In several years I can't recall having to worry about hung channels. I've certainly had to manually halt a channel due to code errors, but that's only been a handful of times during development. I'd expect the assorted timeouts on sources and destinations to stop most of those hangs. Care to share a few more details?
-
I've seen this before. I don't remember the exact circumstances, but basically the internal scheduler has a duplicate entry that it can neither clean up nor overwrite. An old forum post describes a tolerable workaround: https://forums.mirthproject.io/forum/mirth-connect/support/14415-channel-startup-intermittent-errors . Just clone the channel to give it a new channel ID, then run the clone. That should get past your current error.

@pacmano1 is generally correct: monitoring for hung or stuck channels is sensible, but figuring out why they are stuck or hung in the first place is the root-cause solution. Problems with pollers most often come up when there is slow, intensive, or inefficient logic in the source connector, particularly a JavaScript Reader. Failures or slowdowns there are harder to debug because message content, timestamps, etc. are less visible when the fault occurs before the source message exists and the source transformers execute. If that logic is delegated to a destination connector, issues are easier to see.

If you'd care to post a follow-up reply with what problems your util channel is fixing in other channels, I think that would be worthwhile.
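On the monitoring side, the detection the original poster describes can be reduced to a pure function that compares two snapshots of connector states. This is only a sketch: the state names and the two-snapshot heuristic are assumptions, and inside Mirth the snapshots would be built from the real connector-state API rather than plain objects:

```javascript
// Sketch of a "stuck" detector: a channel is flagged when its source
// connector reports the same busy state (e.g. POLLING, RECEIVING, READING)
// in two consecutive snapshots. `current` and `previous` map channel IDs
// to state names; inside Mirth these would be populated from ChannelUtil.
function findStuck(current, previous, busyStates) {
    var stuck = [];
    for (var id in current) {
        var state = current[id];
        if (busyStates.indexOf(state) !== -1 && previous[id] === state) {
            stuck.push(id);
        }
    }
    return stuck;
}
```

Two consecutive identical busy states is not proof of a hang, so in practice you would pair this with a timestamp or message-count check before bouncing anything.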
-
I've seen it happen when using the SFTP file reader and writer. Unfortunately, we can't just clone a channel while it is running in production. The point is to create a solution that initiates corrective action in these circumstances without manual intervention; we can always go in and halt/redeploy the channel manually to fix it.
-
I should note that the timeout was set to 10000 ms. After it encounters a few errors, it'll eventually get stuck, and I'm not sure why.
-
This should only be required for Mirth versions prior to 3.6.

I had two channels that hit a particularly bad SFTP server that constantly had errors. I created a new channel called "File Reader Bouncer" and placed the code below in a JavaScript Reader. It watches for the number of pool threads in the monitored File Reader channels to be used up to a threshold, then tries to stop the channel. If that takes too long, it tries to halt the channel. Then, assuming the channel stopped, it restarts it. It creates a blank message, setting the default metadata values (mirth_source and mirth_type) to the channel ID and status. It worked well enough for me. You could also probably set up an alert for when a channel fails to restart if you notice that still being a problem.

```javascript
// adjust these values for your needs
const channelsToMonitor = [
'e15d4b58-4fe2-4f5d-b9ec-ef47bfa7bba8',
'8f547f2f-c3b9-46b7-99e4-8ed5e594f201'
];
const maxPoolActive = 6;
const maxStopTimeoutInMillis = 10000;
const maxHaltTimeoutInMillis = 10000;
const maxStartTimeoutInMillis = 10000;
/*
Possible values for mirth_type:
RESTARTED, FAILED_START, FAILED_HALT
*/
var targetChannelId, channel, sourceConnector, f, fileConnector, pools, pool, future, isStopped, mirth_type, isInterrupted = false;
// Wait up to `timeout` ms for the future to complete, remembering whether
// the reader thread was interrupted along the way.
function wait(future, timeout) {
try{
future.get(timeout);
}
catch(e if e.javaException instanceof java.lang.InterruptedException) {
isInterrupted = true;
}
catch(e) {}
return future.isDone();
}
var messages = new java.util.ArrayList();
var donkey = com.mirth.connect.donkey.server.Donkey.getInstance();
for (var i = 0; i < channelsToMonitor.length; i++) {
if (isInterrupted || java.lang.Thread.currentThread().isInterrupted()) break;
targetChannelId = channelsToMonitor[i];
channel = donkey.getDeployedChannels().get(targetChannelId);
if (!channel) continue;
sourceConnector = channel.getSourceConnector();
f = sourceConnector.getClass().getDeclaredField("fileConnector");
f.setAccessible(true);
fileConnector = f.get(sourceConnector);
f = fileConnector.getClass().getDeclaredField("pools");
f.setAccessible(true);
pools = f.get(fileConnector);
try {
// can cause an issue if channel is deployed, but stopped
pool = pools.values().iterator().next();
}
catch(e) {
continue;
}
// pool is the channel's source connector's pool of FileSystemConnection objects.
// The default max objects for the pool is 8.
// We can set maxPoolActive to be lower than 8 to bump the channel before it freezes.
// Make sure to set maxStopTimeoutInMillis appropriately in this case to avoid halting a
// channel while it is actively working.
if (pool.getNumActive() > maxPoolActive || (pool.getNumActive() == 8 && pool.getNumWaiters() == 1)) {
future = ChannelUtil.stopChannel(targetChannelId);
isStopped = wait(future, maxStopTimeoutInMillis);
if (!isStopped) {
future = ChannelUtil.haltChannel(targetChannelId);
isStopped = wait(future, maxHaltTimeoutInMillis);
}
if (isStopped) {
future = ChannelUtil.startChannel(targetChannelId);
if (!wait(future, maxStartTimeoutInMillis)) {
mirth_type = "FAILED_START";
}
else {
mirth_type = "RESTARTED";
}
}
else {
mirth_type = "FAILED_HALT";
}
messages.add(new RawMessage('', null, {mirth_source: targetChannelId, mirth_type: mirth_type}));
}
}
return messages;
```
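The pool-threshold check is the only real decision point in the reader above. Pulled out as a pure function (with the pool's default max of 8 hard-coded, exactly as in the original condition), it is easy to test in isolation:

```javascript
// The bounce condition from the reader above: bounce when active pool
// connections exceed the configured threshold, or when the pool is full
// (default max of 8) and a caller is already waiting on it.
function shouldBounce(numActive, numWaiters, maxPoolActive) {
    return numActive > maxPoolActive || (numActive === 8 && numWaiters === 1);
}
```

If you change the pool's `maxTotal` from its default, the hard-coded 8 in both the original reader and this helper needs to change with it.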