Unable to Halt a database connector that hangs in 'Polling' state. #3304

Open
rbeckman-nextgen opened this issue May 11, 2020 · 18 comments
Labels
bug, channel, connector, Internal-Issue-Created, RS-876, triaged

Comments

@rbeckman-nextgen
Collaborator

During network and/or firewall problems, something went wrong with a database connection in our test environment, and a Mirth channel (a Database Reader with the keep-connection-open option set to true) got stuck in the 'Polling' state (the reason the DB query hangs isn't a Mirth problem).

But the Mirth problem was: stopping the channel didn't work (this I expected), and halting the channel didn't do anything either (it stayed in the halting state, probably unable to terminate the DB connection, and never stopped).

Also, due to this problem, stopping/restarting Mirth Connect wasn't possible anymore.

This problem may be related to: MIRTH-3366

Imported Issue. Original Details:
Jira Issue Key: MIRTH-3403
Reporter: amc_cru
Created: 2014-08-13T02:13:50.000-0700

@rbeckman-nextgen
Collaborator Author

Driver used by the database-reader channel: jdbc:jtds:sqlserver://.....

Imported Comment. Original Details:
Author: amc_cru
Created: 2014-08-13T02:30:13.000-0700

@rbeckman-nextgen
Collaborator Author

Others are running into this as well, also using jTDS. This is due to a couple of things. First, the database reader doesn't support halting at all right now. The delegate interface doesn't even have a halt method. Second, both the reader and writer query delegates just close the Connection object. However, that could block, depending on the driver. To do a proper halt, we should be calling the abort method, and possibly also the setNetworkTimeout method when the connection first gets created.
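
For illustration, a rough sketch of what a halt-capable delegate could look like, assuming direct access to the live Connection (the class and method names here are hypothetical, not Mirth's actual delegate API):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a halt that force-terminates a hung JDBC connection.
public class HaltableJdbcDelegate {

    private final ExecutorService abortExecutor = Executors.newSingleThreadExecutor();
    private volatile Connection connection;

    void onConnect(Connection connection) throws SQLException {
        this.connection = connection;
        // Fail network reads after 30 seconds instead of blocking forever (JDBC 4.1+, driver-dependent).
        connection.setNetworkTimeout(abortExecutor, 30_000);
    }

    void halt() {
        Connection c = connection;
        if (c != null) {
            try {
                // Unlike close(), abort() is designed to be callable from another thread
                // and to return without blocking on a dead socket.
                c.abort(abortExecutor);
            } catch (SQLException e) {
                // Log and move on; the channel should still be allowed to stop.
            }
        }
    }
}
```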

However, jTDS is dumb and doesn't support either of those methods. In the JtdsConnection class it literally just throws an AbstractMethodError, nothing else. So as far as I can tell, it's impossible to force-halt a hanging jTDS connection, unless we override the class and implement those methods ourselves. PostgreSQL's driver does implement them, so it should work fine there. Not sure about other drivers. Maybe Microsoft's JDBC driver does.

This is very easy to reproduce. I just use a VM with SQL Server on it, and a Database Writer channel that invokes WAITFOR. While a message is processing I suspend the VM. After that, the channel cannot be stopped or halted, and requires the entire server to be restarted.
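
A standalone approximation of that repro as plain JDBC, outside of Mirth (host, database, and credentials in the jTDS URL are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// If the SQL Server VM is suspended while execute() is running, the call never
// returns, and close() on the connection can block as well.
public class WaitForRepro {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:jtds:sqlserver://sqlserver-vm:1433/testdb"; // placeholder host/db
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute("WAITFOR DELAY '00:10:00'"); // blocks server-side for 10 minutes
        }
    }
}
```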

Imported Comment. Original Details:
Author: narupley
Created: 2015-04-30T12:07:41.000-0700

@rbeckman-nextgen
Collaborator Author

Doesn't look like Microsoft's JDBC driver supports those methods either. So as far as SQL Server goes there's nothing that can be done, unless as I said we alter jTDS to support it.

It sucks that in cases like this the channel basically can't be used at all, but the only alternative is to spawn the possibly-forever-blocking operations in a separate thread, and when something like this happens we just try our best and then forget about the thread. Then the channel will be able to stop, and can be used, redeployed, etc. It's just that in the JVM you'll still have lingering threads that could stick around forever. What's worse? A possible thread leak, or forcing the user to restart the entire server? In the thread leak case we would obviously send some error to the server log letting the user know that it's happening, and that they should restart the server when it's convenient. I think that's better than the channel being in a perpetually unusable state, wherein the user is forced to either restart the server immediately, or abandon/clone the channel.
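
A rough sketch of that best-effort-then-abandon idea (illustrative names only, not Mirth's actual code):

```java
import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Illustrative only: run the possibly-blocking close on a separate thread,
// wait briefly, then give up and let the channel stop anyway.
public class BestEffortCloser {
    private static final Logger LOG = Logger.getLogger(BestEffortCloser.class.getName());

    public static void closeOrAbandon(Connection connection) {
        ExecutorService closer = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "abandoned-jdbc-close");
            t.setDaemon(true); // don't keep the JVM alive just for this
            return t;
        });
        closer.submit(() -> {
            try {
                connection.close();
            } catch (Exception e) {
                LOG.warning("Error closing connection: " + e.getMessage());
            }
        });
        closer.shutdown();
        try {
            if (!closer.awaitTermination(10, TimeUnit.SECONDS)) {
                LOG.severe("JDBC close is hanging; abandoning the thread. "
                        + "A thread may leak until the server is restarted.");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```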

Imported Comment. Original Details:
Author: narupley
Created: 2015-04-30T12:29:09.000-0700

@rbeckman-nextgen
Collaborator Author

http://www.mirthcorp.com/community/forums/showthread.php?t=14253

Imported Comment. Original Details:
Author: narupley
Created: 2015-05-14T07:47:43.000-0700

@rbeckman-nextgen
Collaborator Author

Is there a workaround for this? We are encountering this in our production (supported) environment.

Imported Comment. Original Details:
Author: justinsk
Created: 2019-07-25T17:26:34.000-0700

@rbeckman-nextgen
Collaborator Author

We are currently having a similar problem with a Connector Type of File Server and a Method of FTP. Just last week the same problem occurred with a Connector Type of File Server, a Method of File, and a Directory pointing to a networked server. When the networked server has problems, Mirth Connect didn't handle it well and started using so much CPU that other channels couldn't get their work done, and we had to reboot the server. We are using Mirth version 3.5.2. Justin Kaltenbach, what version are you using? We, too, are using a supported version.

Imported Comment. Original Details:
Author: mulleg
Created: 2019-08-02T11:55:14.000-0700

@rbeckman-nextgen
Collaborator Author

Would like to know a workaround as well. This is locking up our production workflow about once every two weeks. It occurs with the following settings, connecting to SQL Azure using Mirth Connect Server 3.5.2, Java version 1.8.0_121:

Database Reader

Use Javascript: No
Keep Connection Open: No
Aggregate Results: No
Cache Results: Yes
Retries on Error: 3
Retry Interval: 10000

sqljdbc42.jar

Imported Comment. Original Details:
Author: sesq
Created: 2019-08-28T12:32:07.000-0700

@cturczynskyj cturczynskyj added the closed-due-to-inactivity label Mar 1, 2021
@pladesma pladesma closed this as completed Mar 1, 2021
@brendanhwell

Running into the same issue here. Any workaround? This happens weekly for us: the channel just sits there querying indefinitely and requires a complete kill and restart of Mirth Connect. I was going to try Microsoft's driver, but it sounds like you tried that, @rbeckman-nextgen?

@aemerytruven

This happens to us so often that I have a Windows Scheduled Task to kill Mirth and start it again a few times a day. I have not found another workaround. We house our Mirth instances in the same datacenter as our MS SQL Servers, so they are direct server-to-server connections, no cloud.

@RaulDeLaMantua

This happens to us all the time.

@michaelmarcuccio

michaelmarcuccio commented Jun 25, 2021

Can this be reopened? If I am not mistaken, this was never addressed and has been an issue for years with no workaround.
Was this resolved by newer Mirth versions or by the new integrated SQL driver they use?

@rivforthesesh

Please can this be reopened? I've been running into this issue a lot with one particular database reader (even though the query I've used runs just fine in SSMS), and having to force kill the service every time this happens is really slowing down work.

@michaelmarcuccio

@rivforthesesh what Mirth version are you using, and what sqljdbc.jar driver version are you using? Also, are you using timeout parameters in your connection string? I think the solution might be something around using the new timeout params, but you have to use an up-to-date version of the driver (6.2). I have not tested this and probably don't have time to at the moment. e.g. https://github.com/Microsoft/mssql-jdbc/wiki/QueryTimeout
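
For anyone trying this, a hedged example of the kind of connection string that wiki page describes. queryTimeout and socketTimeout are documented mssql-jdbc properties, but the values, server name, and credentials below are placeholders, and exact behavior depends on your driver version:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class TimeoutUrlExample {
    public static void main(String[] args) throws Exception {
        // Placeholders throughout; check your mssql-jdbc version for property support.
        String url = "jdbc:sqlserver://myserver.database.windows.net:1433;"
                + "databaseName=mydb;"
                + "user=myuser;password=secret;"
                + "queryTimeout=300;"      // cancel statements running longer than 5 minutes (seconds)
                + "socketTimeout=600000;"; // fail network reads that stall for 10 minutes (milliseconds)
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```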

@rivforthesesh

@michaelmarcuccio currently using Mirth 3.11.0, and I believe the driver is mssql-jdbc-8.4.1.jre8.jar - that's my best guess based on the config files we have. I think I've managed a workaround now but thank you for the query timeout suggestion, I'll make a note of it for later!

In case this helps anyone else already stuck in the loop: stopping the channel and then killing the process in SSMS (sp_who2 to find the SPID, KILL to kill it) will end the polling loop, but that requires having the right permissions in SSMS to kill processes.
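
If it helps, the same workaround can also be scripted from a second admin connection instead of doing it by hand in SSMS. This is only a rough sketch: the host-name filter is a stand-in for however you identify the hung session, the URL is a placeholder, and it needs the same KILL permissions as the manual approach:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: find the hung session(s) the way sp_who2 shows them in SSMS, then KILL them.
public class KillHungSession {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://myserver:1433;databaseName=master;user=admin;password=secret";
        List<Integer> spids = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url)) {
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("EXEC sp_who2")) {
                while (rs.next()) {
                    String host = rs.getString("HostName");
                    if (host != null && host.trim().equalsIgnoreCase("MIRTH-HOST")) { // hypothetical Mirth server host
                        spids.add(rs.getInt("SPID"));
                    }
                }
            }
            for (int spid : spids) {
                try (Statement kill = conn.createStatement()) {
                    kill.execute("KILL " + spid); // KILL requires a literal SPID, so build the statement text
                }
            }
        }
    }
}
```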

(For actually fixing it, changing the source query to add the option MAXDOP=2 got me past the polling stage and onto reading. I was digging through sys tables in SSMS to try to find anything odd about the query, and it had a DOP of 8 which, while it should've worked on this server, was way higher than any other running process. That's more likely to be a quirk of the server I'm working on than something with Mirth, but I'll drop it here anyway.)

@michaelmarcuccio

michaelmarcuccio commented Aug 4, 2021

@rivforthesesh I am using SQL Azure with sqljdbc42.jar, just adjusted from maxdop=0 to maxdop=1 as our instance has less than one vCore. Thanks for the info here. If the issue still occurs I will probably try upgrading drivers and setting timeouts in connection string.

Edit: Issue still occurs.

@pladesma pladesma reopened this Sep 3, 2021
@pladesma pladesma added the bug, Internal-Issue-Created, triaged, and RS-876 labels and removed the closed-due-to-inactivity label Sep 3, 2021
@twest-mirthconnect
Contributor

@rbeckman-nextgen - Will you please let me know if this is still a problem in your production? Let me know what version you are using when you respond. :) Thanks.

@michaelmarcuccio

Using v3.5.2, the issue hasn't reoccurred in a few months. Maybe the Azure SQL DB side got friendlier and now prevents this scenario, or maybe I have just been getting lucky. It is really painful when the issue does occur, as it causes about a 30-minute downtime to restart everything.

@twest-mirthconnect
Contributor

For those on this thread, if you can or want to develop a PR on your end, we will pull it into our sprint schedule and complete a code review and quality check. Once we complete our validation, the commit will be available for use ahead of the release.

We hope this new process will both improve the response to problems engineers are facing in the field and build a stronger, more connected community.
