Unable to start mirth channel after using ChannelUtil to halt #4701
-
I have a channel whose source detects when a channel's or connector's status is stuck at reading, receiving, or polling. The idea was to initiate corrective action by halting the channel and redeploying it using ChannelUtil, but I'm running into an error. Has anyone run into this before?

```
[2021-08-31 13:39:57,071] ERROR (com.mirth.connect.server.channel.ErrorTaskHandler:25): com.mirth.connect.donkey.server.StartException: Failed to start channel Outbound DB to DBXML (357b41ab-0947-49a0-abc1-d42effc01993).
Caused by: com.mirth.connect.donkey.server.ConnectorTaskException: org.quartz.ObjectAlreadyExistsException: Unable to store Job : '357b41ab-0947-49a0-abc1-d42effc01993.PollConnector357b41ab-0947-49a0-abc1-d42effc01993', because one already exists with this identification.
Caused by: org.quartz.ObjectAlreadyExistsException: Unable to store Job : '357b41ab-0947-49a0-abc1-d42effc01993.PollConnector357b41ab-0947-49a0-abc1-d42effc01993', because one already exists with this identification.
```
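For reference, the corrective action being attempted reduces to a halt-then-redeploy sequence. The sketch below is a minimal, hypothetical helper (`bounceChannel` is my name for it; the method names match Mirth's userutil `ChannelUtil`, but the exact sequencing is an assumption). The `channelUtil` object is injected so the same logic can run inside a Mirth script with the real `ChannelUtil`, or be exercised with a stub elsewhere:

```javascript
// Minimal sketch of the corrective action: halt a stuck channel, then
// tear it down and redeploy it. `channelUtil` is passed in so this works
// with Mirth's real ChannelUtil inside a channel script, or a stub in a
// test. Method names (haltChannel, undeployChannel, deployChannel) follow
// Mirth's userutil ChannelUtil; the order used here is an assumption.
function bounceChannel(channelUtil, channelId) {
    channelUtil.haltChannel(channelId);     // force-stop even if the connector is stuck
    channelUtil.undeployChannel(channelId); // tear down the deployed instance
    channelUtil.deployChannel(channelId);   // redeploy fresh
    return 'BOUNCED:' + channelId;
}
```

A stub `channelUtil` makes it easy to verify the call order without a running Mirth server.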
Replies: 5 comments 8 replies
-
Is that a common thing for you? In several years I can't recall having to worry about hung channels. I've certainly had to manually halt a channel due to code errors, but that's only been a handful of times during development. I'd expect the assorted timeouts on sources and destinations to stop most of those hangs. Care to share a few more details?
-
I've seen this before. I don't remember the exact circumstances, but basically the internal scheduler has a duplicate entry that it can neither clean up nor overwrite. An old forum post describes a tolerable workaround: https://forums.mirthproject.io/forum/mirth-connect/support/14415-channel-startup-intermittent-errors . Just clone the channel to give it a new channel ID, then run the clone. That should get past your current error.

@pacmano1 is generally correct: monitoring for hung or stuck channels is sensible, but figuring out why they are stuck or hung in the first place is the root-cause solution. Problems with pollers most often come up when there is slow, intensive, or inefficient logic in the source connector, particularly a JavaScript Reader. Failures or slowdowns there are harder to debug because message content, timestamps, etc. are less visible when the fault occurs before the source message exists and the source transformers execute. If that logic is delegated to a destination connector, issues are easier to see.

If you'd care to post a follow-up reply with what problems your util channel is fixing in other channels, I think that would be worthwhile.
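On the monitoring side, the detection the original poster describes can be reduced to a pure function that compares two snapshots of connector states. This is only a sketch: the state names and the two-snapshot heuristic are assumptions, and inside Mirth the snapshots would be built from the real connector-state API rather than plain objects:

```javascript
// Sketch of a "stuck" detector: a channel is flagged when its source
// connector reports the same busy state (e.g. POLLING, RECEIVING, READING)
// in two consecutive snapshots. `current` and `previous` map channel IDs
// to state names; inside Mirth these would be populated from ChannelUtil.
function findStuck(current, previous, busyStates) {
    var stuck = [];
    for (var id in current) {
        var state = current[id];
        if (busyStates.indexOf(state) !== -1 && previous[id] === state) {
            stuck.push(id);
        }
    }
    return stuck;
}
```

Two consecutive identical busy states is not proof of a hang, so in practice you would pair this with a timestamp or message-count check before bouncing anything.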
-
I've seen it happen when using the SFTP file reader and writer. Unfortunately, we can't just clone a channel while it is running in production. The point is to create a solution that initiates corrective action in these circumstances without manual intervention; we can always go in and halt/redeploy the channel manually to fix it.
-
I should note that the timeout was set to 10000 ms. After it encounters a few errors, it'll eventually get stuck, and I'm not sure why.
-
This should only be required for Mirth versions prior to 3.6.

I had two channels that hit a particularly bad SFTP server that constantly had errors. I created a new channel called "File Reader Bouncer" and placed the code below in a JavaScript Reader. It watches for the number of pool threads in the monitored File Reader channels to be used up to a threshold, then tries to stop the channel. If that takes too long, it tries to halt the channel. Then, assuming the channel stopped, it restarts it. It creates a blank message, setting the default metadata values (mirth_source and mirth_type) to the channel ID and status. It worked well enough for me. You could also probably set up an alert for when a channel fails to restart if you notice that still being a problem.

```javascript
// adjust these values for your needs
const channelsToMonitor = [
'e15d4b58-4fe2-4f5d-b9ec-ef47bfa7bba8',
'8f547f2f-c3b9-46b7-99e4-8ed5e594f201'
];
const maxPoolActive = 6;
const maxStopTimeoutInMillis = 10000;
const maxHaltTimeoutInMillis = 10000;
const maxStartTimeoutInMillis = 10000;
/*
Possible values for mirth_type:
RESTARTED, FAILED_START, FAILED_HALT
*/
var targetChannelId, channel, sourceConnector, f, fileConnector, pools, pool, future, isStopped, mirth_type, isInterrupted = false;
// Wait up to `timeout` ms for the future to complete, remembering whether
// the reader thread was interrupted along the way.
function wait(future, timeout) {
try{
future.get(timeout);
}
catch(e if e.javaException instanceof java.lang.InterruptedException) {
isInterrupted = true;
}
catch(e) {}
return future.isDone();
}
var messages = new java.util.ArrayList();
var donkey = com.mirth.connect.donkey.server.Donkey.getInstance();
for (var i = 0; i < channelsToMonitor.length; i++) {
if (isInterrupted || java.lang.Thread.currentThread().isInterrupted()) break;
targetChannelId = channelsToMonitor[i];
channel = donkey.getDeployedChannels().get(targetChannelId);
if (!channel) continue;
sourceConnector = channel.getSourceConnector();
f = sourceConnector.getClass().getDeclaredField("fileConnector");
f.setAccessible(true);
fileConnector = f.get(sourceConnector);
f = fileConnector.getClass().getDeclaredField("pools");
f.setAccessible(true);
pools = f.get(fileConnector);
try {
// can cause an issue if channel is deployed, but stopped
pool = pools.values().iterator().next();
}
catch(e) {
continue;
}
// pool is the channel's source connector's pool of FileSystemConnection objects.
// The default max objects for the pool is 8.
// We can set maxPoolActive to be lower than 8 to bump the channel before it freezes.
// Make sure to set maxStopTimeoutInMillis appropriately in this case to avoid halting a
// channel while it is actively working.
if (pool.getNumActive() > maxPoolActive || (pool.getNumActive() == 8 && pool.getNumWaiters() == 1)) {
future = ChannelUtil.stopChannel(targetChannelId);
isStopped = wait(future, maxStopTimeoutInMillis);
if (!isStopped) {
future = ChannelUtil.haltChannel(targetChannelId);
isStopped = wait(future, maxHaltTimeoutInMillis);
}
if (isStopped) {
future = ChannelUtil.startChannel(targetChannelId);
if (!wait(future, maxStartTimeoutInMillis)) {
mirth_type = "FAILED_START";
}
else {
mirth_type = "RESTARTED";
}
}
else {
mirth_type = "FAILED_HALT";
}
messages.add(new RawMessage('', null, {mirth_source: targetChannelId, mirth_type: mirth_type}));
}
}
return messages;
```
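The pool-threshold check is the only real decision point in the reader above. Pulled out as a pure function (with the pool's default max of 8 hard-coded, exactly as in the original condition), it is easy to test in isolation:

```javascript
// The bounce condition from the reader above: bounce when active pool
// connections exceed the configured threshold, or when the pool is full
// (default max of 8) and a caller is already waiting on it.
function shouldBounce(numActive, numWaiters, maxPoolActive) {
    return numActive > maxPoolActive || (numActive === 8 && numWaiters === 1);
}
```

If you change the pool's `maxTotal` from its default, the hard-coded 8 in both the original reader and this helper needs to change with it.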