Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 1.1.1 is possibly not working for old keys? (need confirmation) #183

Closed
vantheshark opened this issue Jul 28, 2017 · 8 comments
Closed
Assignees
Milestone

Comments

@vantheshark
Copy link

vantheshark commented Jul 28, 2017

Hi,

We've updated from 0.8.0 to 1.1.1 and found many issues, unfortunately on our Production >.<
CacheManager.Core
CacheManager.StackExchange.Redis

"# Server
redis_version:3.0.7
redis_build_id:f7cfd70bacad9a98
redis_mode:cluster
os:Linux 3.10.0-327.13.1.el7.x86_64 x86_64
gcc_version:4.8.5

1/ There are suddenly a lot of errors: Update failed on ':' because of too many retries. There are many old key created by previous version but when running with the version 1.1.1, the problem seems to start to occur and the old key staying there undeleted/unexpired.

More info
The hashvalue of the key (probably created by the old version)
HasValue

The code that read the key probably check the expiration mode incorrectly, please confirm
Code version 1.1.1

When comparing the version, it's getting obvious that my theory is correct 😠

Breaking change

usesDefaultExpiration is true in this case however, the expirationMode was sliding as created by the old version. I think the cacheItem creation should use the existing expirationMode unless it's "None" and usesDefaultExpiration == true.

// Current code
var cacheItem =
	usesDefaultExpiration ?
	string.IsNullOrWhiteSpace(region) ?
		new CacheItem<TCacheValue>(key, value) :
		new CacheItem<TCacheValue>(key, region, value) :
	string.IsNullOrWhiteSpace(region) ?
		new CacheItem<TCacheValue>(key, value, expirationMode, expirationTimeout) :
		new CacheItem<TCacheValue>(key, region, value, expirationMode, expirationTimeout);

Finally, i saw the code EvictFromOtherHandles if the version is conflicted but for some reasons the key is not deleted from the other Handle.

2/ OutOfMemoryException thrown From BackplaneMessage.ReadBytes

OutOfMemoryException

And it came from here
DeserializeMessage

I tried to debug but I'm not really familiar with the code so it might be better for the author to have a look.

Rollback to 0.8.0 and the problem gone.
Thanks

@vantheshark vantheshark changed the title Version 1.1.1 is very buggy Version 1.1.1 is possibly not working for old keys? (need confirmation) Jul 28, 2017
@MichaCo
Copy link
Owner

MichaCo commented Jul 28, 2017

Hi @vanthoainguyen,
Thanks for reporting those issues and sorry for any inconveniences.

Regarding Update failed on ':' because of too many retries, does this really have to do with the expiration settings? Can you give me more information of what key/region in that case is and what the existing hash looks like in Redis?

Regarding the expiration issue:. Yes, since 1.0 there are some changes of how the expiration of cache items in Redis gets computed. That was actually a bug fix as the expiration didn't get applied correctly to cache layers above the Redis layer.
I usually try to keep versions compatible, in that case, this change was not, intentionally. The best would be to start with a clean Redis database.
Anyways, I'll have a look into that again. Assuming useDefault=true if the flag is not set might be not correct, at least in case the expiration type and timeout is set. Maybe that's a way to make it backward compatible again. I'll have to do some testing though...

Regarding OOM Exception, well, that's pretty hard to tell why this occurs. I did run some long running tests on that with millions of keys and didn't had any issues. But that is highly dependent on the environment and what else goes on. It is usually just coincidence what part of the app throws that OOM exception if the machine is low on memory already.
That being said, I know that the code can be optimized by reusing byte arrays as allocating many small byte arrays can fragment the system memory.
I'll have a look into that and might have to roll back to the previous version if I cannot make the allocations more efficient.

Can you give me some more information of what your production environment looks like, hardware wise. And, how many keys do you store and maybe how many keys do change over time? How large are the values on each key (roughly).

@MichaCo MichaCo self-assigned this Jul 28, 2017
@vantheshark
Copy link
Author

vantheshark commented Jul 28, 2017 via email

@vantheshark
Copy link
Author

FYI

Check these photos to see how the hash look like.
I think version 0.8.0 created 5 hash entries but the version 1.1.1 read 7 entries and decide the expirationMode incorectly.

The region was null btw.
Due to the usecase, our system can not wipe out all the existing cache items. However, we expect those keys will be expired in 30 mins but something happened and the entries are retured and never been deleted.

I think the culprit is likely related to why the hanle couldn't delete the conflicted version.

@MichaCo
Copy link
Owner

MichaCo commented Jul 28, 2017

I think the culprit is likely related to why the handle couldn't delete the conflicted version.

Most likely, yes. The update handling also changed to use a version field on the hash instead of passing the old value in every time for comparison.

Again, why do you have to run 0.8.0 (1.5 years old) and 1.1.1 together in one environment?
That might cause several issues and is not something I can support as that would require tons more testing/work on my side ;)

@vantheshark
Copy link
Author

vantheshark commented Jul 28, 2017 via email

@vantheshark
Copy link
Author

Found out 1 more clue.

  • I have multiple instances of the same services sharing a redis cluster. The OOE happens alot when the old code (0.8.0) add a key and at the same time the code with version 1.1.1 is running on another process. I reckon it's related to the changes made to MessageReader when it deserialises the message. It might be related to why the cache item is not removed from the handle.

MichaCo added a commit that referenced this issue Jul 31, 2017
…the available bytes prior to allocating a byte array while parsing messages to prevent OOM exceptions #183
@MichaCo
Copy link
Owner

MichaCo commented Jul 31, 2017

Hi @vanthoainguyen first of all, again, the backplane received some breaking changes and is not compatible with 0.8.0.
That being said, I also found one bug which causes the OOM exception. I'm actually validating the length of the array to prevent those things, but I allocated the array beforehand, which was wrong.
I added some tweaks to the backplane, any "old" messages might now log warnings and will be skipped due to errors parsing them. At least it should never throw OOM exceptions...

MichaCo added a commit that referenced this issue Jul 31, 2017
@MichaCo MichaCo added this to the 1.1.2 milestone Nov 18, 2017
@MichaCo
Copy link
Owner

MichaCo commented Nov 18, 2017

fixed in release 1.1.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants