
Implement a recovery mechanism for failed http requests #66

Open
gzhytar opened this issue Nov 12, 2017 · 12 comments · May be fixed by #100
Comments

gzhytar commented Nov 12, 2017

krakenex: 2.0.0rc2

What are you trying to achieve?

During peak times a large number of HTTP requests fail without a specific response from the Kraken server. These are intermittent CloudFlare/Kraken problems, and for some of the returned HTTP codes it should be safe to retry the operation. Since each request carries a unique nonce that is checked server-side, the nonce should prevent execution of several identical requests even if they reach the server despite the failed status_code.

The maximum number of retries should be configurable.
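For illustration only, a minimal sketch of how configurable retries could look when built on requests (this is a generic requests/urllib3 pattern, not necessarily what the linked pull requests implement):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry policy: the maximum number of retries, the backoff between attempts,
# and the HTTP codes assumed "safe" to retry are all configurable.
retries = Retry(
    total=3,
    backoff_factor=0.5,                          # 0.5 s, 1 s, 2 s, ... between attempts
    status_forcelist=[502, 504],
    allowed_methods=frozenset({'GET', 'POST'}),  # POST is excluded by default
)                                                # ('method_whitelist' on older urllib3)

session = requests.Session()
session.mount('https://api.kraken.com', HTTPAdapter(max_retries=retries))

# The adapter re-sends the identical request body, so a private query keeps
# its original nonce on retry.
```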


EDITed by @veox:

Pull requests with the suggested change: #65 #99 #100

veox (Owner) commented Nov 12, 2017

I find the idea reasonable; however, look at #65 (comment) for some remarks.

Most importantly, in the face of network congestion, a "retry" approach must be live-tested to determine that the particular HTTP errors don't result in duplicate orders. (Hence "need info".)


OT: Have you encountered any other major issues with 2.0.0rc2? I'm not sure how to interpret the relative silence concerning it.

Are there no other bugs? (Hard to believe.)

Does no one use it, since pip doesn't install it by default? (Quite likely.)

Did everyone give up on Kraken, since it lags so much? (Passionately empathising.)

gzhytar (Author) commented Nov 12, 2017

Thanks @veox, your comments in the pull request make a lot of sense. I'll upload a new revision once I've tested it for a while.

I started using krakenex a few days ago through https://github.com/Endogen/Telegram-Kraken-Bot and was annoyed by constant networking problems, since the original version of the bot references krakenex 1.0 (I made a pull request today to use 2.0.0rc).

I wish I had looked at your repo and noticed that you already have 2.0.0rc, but I didn't. I ended up rewriting krakenex to use Requests on my own, used it for a couple of days, and only today noticed that you had already done all the hard work ;)

My observation is that 2.0.0rc is more stable, especially in the multithreaded use case (the Telegram bot has several active threads sending requests). Adding retries also increased the success rate when placing orders and querying the balance. I haven't noticed any issues so far.

And yes, lots of folks who are still trying to use Kraken through the web UI are giving up now.

veox (Owner) commented Nov 12, 2017

For ref: a query that resulted in a 502 HTTP error can definitely "go through" to the trade execution engine. I've observed this personally on many occasions (see PSA on /r/krakenex).

The assumption that a nonce will prevent duplicate orders can therefore be tested by attempting a retry on a 502.


EDIT: To clarify: this is risky; in general, don't do it! I've already tested this (see #66 (comment)), so there's no need to step on the rake. :)

veox (Owner) commented Nov 12, 2017

I'll try to get to a "proper" v2.0.0 release next week, with a pip package and all. Hopefully, that'll get more people to use it. I'll also be live-testing PR #65.

gzhytar (Author) commented Nov 12, 2017 via email

philippose commented:

Hello,
A Good Evening to you both.

Firstly, @veox, thank you very much for this very functional, but at the same time extremely simple to use interface to the Kraken API. I have been using this library for about a month now... Not trading yet, but still working on putting together a platform for automated trading (maybe someday, I will get that far!)

Just a comment on this particular proposed change...:
I was considering a similar idea because of the increased number of HTTP errors in the last few days.
However, like @veox, I do not think it is a very good idea to implement the retry logic so deep in the library. Most of the time when I received this error while adding orders, the order had gone through. In such situations, it would be catastrophic to have an automated retry, which would result in multiple orders.

I personally think that the retry logic should be placed higher up, at the application layer. Something like try/except blocks could be used to process the error, and then a request could be sent to check the current orders, verifying the status of the order before attempting a retry.

Regards,
Philippose Rajan

veox (Owner) commented Nov 12, 2017

@philippose Thanks for your thoughts!

I do tend to agree, in general, that it's "better" to scream loudly about a fault and do nothing when money is involved; however, the try/except approach doesn't allow for simple nonce reuse (the API-level one), as proposed here.

Perhaps, if re-using the PreparedRequest, instead of constructing a new one in the except block...


Otherwise, one could add userref parameters to orders when placing them, and if that fails:

  • wait for an (arbitrarily) "long" time;
  • look up the order by its userref;
  • if it's not found, retry sending the order.

This is error-prone, though, since the time-to-wait seems to depend on the load on Kraken's trade execution engine.
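
For illustration, a rough sketch of that userref approach (hypothetical helper and wait time, untested; it relies on Kraken's OpenOrders/ClosedOrders lookups by userref):

```python
import time

import krakenex
import requests

k = krakenex.API()
k.load_key('kraken.key')

def add_order_checked(order, userref, wait=30):
    """Place an order tagged with a userref; on failure, look it up before retrying."""
    order = dict(order, userref=userref)
    try:
        return k.query_private('AddOrder', order)
    except requests.exceptions.RequestException:
        # The order may have gone through anyway: wait an (arbitrarily) "long"
        # time, then look it up by its userref.
        time.sleep(wait)
        found = (k.query_private('OpenOrders', {'userref': userref})
                  .get('result', {}).get('open', {}) or
                 k.query_private('ClosedOrders', {'userref': userref})
                  .get('result', {}).get('closed', {}))
        if found:
            return found                           # the original order did go through
        return k.query_private('AddOrder', order)  # not found: retry sending it
```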

veox (Owner) commented Nov 12, 2017

Perhaps, if re-using the PreparedRequest, instead of constructing a new one in the except block...

That is, if the except caught a 502, a 504, or something else that's known to be "safe" to repeat, one should be able to access the precise request that was sent - as krakenex_API_object.response.request - and re-send it with krakenex_API_object.session.send(...).

Haven't tried it yet. Hopefully - next week.

EDIT: This will likely only work if there haven't been any queries with higher nonces in the interim, unless the nonce window is increased in Kraken's settings... I should really go rest, though. :D
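
For reference, a minimal sketch of that re-send idea (hypothetical variable names; it assumes a non-2xx status surfaces as a requests HTTPError, and that no query with a higher nonce has gone out in the interim, per the EDIT above):

```python
import krakenex
import requests

k = krakenex.API()
k.load_key('kraken.key')

RETRY_SAFE = {502, 504}   # codes assumed "safe" to repeat

try:
    balance = k.query_private('Balance')
except requests.exceptions.HTTPError as err:
    if err.response is not None and err.response.status_code in RETRY_SAFE:
        # Re-send the exact PreparedRequest: same body, same nonce, same signature.
        resent = k.session.send(k.response.request)
        balance = resent.json()
    else:
        raise
```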

@veox veox added this to the v2.x milestone Nov 12, 2017

veox (Owner) commented Nov 14, 2017

TL;DR: try/except with session.send() in the end app works, but adds a lot of boilerplate (EDIT: and a nasty repetition of in-lib code to catch subsequent exceptions). The retries-within-lib approach, as in #65, could be OK if made opt-in.


Note: v2.0.0 has been released; I've used it for the experiment below. It has no changes compared to v2.0.0rc2.

I've tried the approach from my previous comment. It can be seen in this gist.

In short - it works. (Tried a few times.) A query that seems to fail with a 502, but does in fact go through, will be rejected by Kraken when repeated, with a response of:

{'error': ['EAPI:Invalid nonce']}

In other words, the nonce is indeed sufficient for Kraken to recognise it as a duplicate. (At least for now...)


This approach, however, adds a lot of boilerplate to the end application. The "wrapper" must be repeated for every query type, either manually or with a decorator.
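
For illustration, such a decorator might look roughly like this (hypothetical; this is not the code from the gist):

```python
import functools

import requests

def resend_on_safe_error(api, safe=(502, 504)):
    """Wrap a krakenex query method: on a 'safe' HTTP error, re-send the
    original PreparedRequest (same nonce) via the API object's session."""
    def decorate(query):
        @functools.wraps(query)
        def wrapper(*args, **kwargs):
            try:
                return query(*args, **kwargs)
            except requests.exceptions.HTTPError as err:
                if err.response is not None and err.response.status_code in safe:
                    return api.session.send(api.response.request).json()
                raise
        return wrapper
    return decorate

# Usage (assuming k = krakenex.API()):
# query_private = resend_on_safe_error(k)(k.query_private)
# balance = query_private('Balance')
```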

This is unwieldy, and, more importantly, would likely be too complicated for a lot of people using krakenex. (From numerous issues reported this year, I'm guessing trading automation is the first stint with programming for many.)

@gzhytar's approach seems warranted - it doesn't add too much bloat, and is a solution for a common issue, especially in light of Kraken's current (abysmal) performance.

This should be, however, a deliberate opt-in setting, as mentioned in #65 (comment). Otherwise, it's only likely to further increase the load on Kraken's TEX.

pawapps commented Nov 17, 2017

I appreciate all the work on this. I am using v1 and have implemented similar logic in the API class.

One thing I do not quite understand is Kraken's "nonce window" implementation. From experimentation I know that resubmitting a nonce with a nonce window > 0 can result in a duplicate submission.

Here's my logic/flow for private queries (see the sketch below):

  • Submit the query.
  • If an HTTP error: resubmit the query with the same nonce.
  • If an EService:Unavailable or EService:Busy error: resubmit the query with the same nonce.
  • If an EAPI:Invalid nonce error on the AddOrder method: spend time checking for the wayward order.
  • If an EAPI:Invalid nonce error and NOT the AddOrder method: resubmit the query with a new nonce.
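
For illustration, that flow might be sketched roughly as follows (hypothetical helper names; shown against the v2-style query_private, which re-signs on each call, so the "same nonce" resubmits would strictly need the session.send() trick discussed above):

```python
import time

import krakenex
import requests

k = krakenex.API()
k.load_key('kraken.key')

def robust_private_query(method, data=None, attempts=3, pause=2):
    """Rough sketch of the retry flow described above (illustrative only)."""
    for _ in range(attempts):
        try:
            result = k.query_private(method, data)
        except requests.exceptions.HTTPError:
            time.sleep(pause)          # HTTP error: wait and resubmit
            continue
        errors = result.get('error', [])
        if any(e.startswith(('EService:Unavailable', 'EService:Busy')) for e in errors):
            time.sleep(pause)          # service busy/unavailable: wait and resubmit
            continue
        if any(e.startswith('EAPI:Invalid nonce') for e in errors):
            if method == 'AddOrder':
                # Possible wayward order: stop here and check OpenOrders /
                # ClosedOrders before even considering a retry.
                raise RuntimeError('Invalid nonce on AddOrder - check for a wayward order')
            continue                   # non-order query: resubmit with a fresh nonce
        return result
    raise RuntimeError('query failed after %d attempts' % attempts)
```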

The Kraken has been very unhappy lately...

PS (likely off-topic, but...): how does v2 improve the reliability of the network connections? I don't see how it would help with the onslaught of 5XX server errors.

veox (Owner) commented Nov 19, 2017

@pawapps The "nonce window" part would make a good separate issue. ;)


As to network connection reliability:

v1 didn't gracefully handle the case where the remote "dropped" the connection between queries. You'd have to check the connection state in advance of sending the query, or reset the connection manually and repeat the failed query. The latter, when done inappropriately, could also result in two open connections, or at least a Connection object floating in memory (due to what looks like an OS-level broken socket left open)... All of this is too much networking for people who are usually concerned with business logic (often literally).

In v2, requests handles this behind the scenes. So you'll "only" get a requests exception as a direct result of performing a query, as compared to something that looks like a failed query but didn't even make it to CloudFlare.

The latter is a rare case these days, which this issue is about (kind of...).

veox (Owner) commented Dec 10, 2017

It's been about a month of testing, and I haven't yet experienced a duplicate order when using the nonce-protection approach (see PR #65).
