A Complete Guide to Rails Caching (nateberkopec.com)
171 points by nateberkopec on July 15, 2015 | hide | past | favorite | 56 comments



> Developers, by our nature, are very different from end-users. We're used to the idea that when you interact with a computer, it takes a little while for the computer to come back with an answer.

I don't agree. End users are used to shitty systems that take some time to load, but we developers should know better. We should be able to recognise performance bottlenecks during architecture design and development, and we should be able to measure the layers where optimisation is necessary and useful. And when developing a web app, caching should always be kept in mind as one tool to improve system performance in one way or another.

Otherwise, a fantastic piece of information.


Taking this further, as a developer, I find I have even less patience for slow loading times than a 'regular end-user'.

I don't know exactly, but I assume this is because I know what goes on behind the scenes, and 99% of the time, the website is doing way more work than it really needs to in order to deliver the experience that I'm asking for.


Fantastic resource.

No mention of HTTP caching though, which for a whole class of applications is a great way to minimise rendering time.

Rails has great support[1] for ETag-based content expiry.

[1] http://api.rubyonrails.org/classes/ActionController/Conditio...
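For what it's worth, the conditional-GET cycle that Rails wraps in `fresh_when`/`stale?` boils down to an ETag comparison. Here is a toy sketch in plain Ruby; the method names are hypothetical, not Rails API:

```ruby
require 'digest/md5'

# Toy illustration of conditional GET: hash the body into an ETag, and
# return 304 Not Modified when the client already holds that version.
def etag_for(body)
  %("#{Digest::MD5.hexdigest(body)}")
end

def respond(body, if_none_match: nil)
  etag = etag_for(body)
  if if_none_match == etag
    { status: 304, body: nil,  etag: etag }  # client's copy is still fresh
  else
    { status: 200, body: body, etag: etag }  # render and send a new ETag
  end
end

first = respond('<li>Buy milk</li>')                               # full 200
again = respond('<li>Buy milk</li>', if_none_match: first[:etag])  # 304, no body
```

The render work still happens server-side on a 304 unless you also cache, but the response body never crosses the wire.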


Does anyone here use compression on all of their cached content? Our Redis cache store has ballooned up to about 8G (we store a lot of html/json fragments) and is unwieldy to ship around when we want to debug bad data bugs on our dev machines. We are experimenting with lz4 compression now and the speed-compression ratio tradeoff looks pretty good with it.

What has been your experience with Rails caching + compression?
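For a rough sense of the tradeoff: cached HTML fragments are highly repetitive, so even stdlib Zlib shrinks them substantially. This sketch uses Zlib only because it ships with Ruby; lz4 (e.g. via the lz4-ruby gem) trades some ratio for much higher speed:

```ruby
require 'zlib'

# Compress a fragment before it would be written to Redis, then verify
# it round-trips. Repetitive markup compresses very well.
fragment = '<div class="todos">' + ('<li>buy milk</li>' * 200) + '</div>'

compressed = Zlib::Deflate.deflate(fragment)
restored   = Zlib::Inflate.inflate(compressed)

compressed.bytesize < fragment.bytesize  # true: far smaller on the wire
```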


If you're using Readthis[0] for Redis caching you can use an alternate marshaller to avoid the size and performance overhead of marshalling all objects through Ruby. If you aren't using Readthis you really should, it's faster than RedisStore, more customizable, and actually maintained!

Mandatory disclaimer, I wrote the gem.

0: https://github.com/sorentwo/readthis
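For reference, swapping the marshaller is a constructor option. This is a sketch based on the Readthis README of the time; verify option names against the current docs:

```ruby
# config/environments/production.rb (illustrative, not copy-paste-ready)
config.cache_store = Readthis::Cache.new(
  expires_in: 2.weeks.to_i,
  marshal:    JSON,   # bypass Ruby's Marshal; Oj is another option
  redis:      { url: ENV.fetch('REDIS_URL'), driver: :hiredis }
)
```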


Does marshaling pose a problem (leading to cache flushes) when you upgrade to a new Ruby version?


No, that hasn't posed a problem in my experience. Note that all of the stores rely on Ruby marshalling underneath. To my knowledge Readthis is the only cache that lets you choose something else like pass through, JSON, Oj, etc.

You will definitely have to flush before switching from no compression to compression or changing marshallers though.


I wrote a small snippet to share cache (particularly session) between a php webapp and rails using Dalli long back. It might be completely broken by now, but worked brilliantly back then.

it was based roughly on this https://gist.github.com/watsonbox/3170415


That sounds like a job for phuby! https://github.com/tenderlove/phuby

No, not really.


Oh man, this is great! I'll try to add you to the Redis benchmarks in this post, or at least talk about using Readthis instead of redis-store. I was very frustrated with the state of redis-store when writing this post.


If you have memory space problems, you can use model caching instead of view caching. In many use cases the view renders very fast, unless you have too much logic inside it. You can save memory by not storing the HTML tags, considering that many cached views take the same models/records as input.


My experience has been the exact opposite (even with simple views); the view layer always incurs the most CPU time. With the proper indexes and eager loading, data retrieval is incredibly quick.


Just curious, what are you using: ERB, Haml, Slim? According to this gem Haml could be a lot faster, but I haven't had the chance yet to check it out: https://github.com/k0kubun/hamlit


Out of interest, what database do you use? And how many joins do you normally do per view?

We use postgres and once you get to around 1m records, even COUNT is slow as it does a full table scan, even with indexes.


Same experience here. View renders have been the dog, the db queries with the right indexes and done smartly, have been fast.


Memcached is probably more suited for storing html/json fragments than Redis. The Dalli[0] memcached driver does compression for you.

0 - https://github.com/mperham/dalli
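A sketch of that setup, assuming the `:dalli_store` ActiveSupport adapter the dalli gem shipped at the time (option names per its README):

```ruby
# config/environments/production.rb (illustrative)
config.cache_store = :dalli_store, 'localhost:11211', {
  compress: true,              # gzip values transparently on write/read
  value_max_bytes: 1_048_576   # memcached's default 1MB item limit
}
```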


Redis is perfectly well suited to storing html/json fragments. It is as fast or faster, has built-in clustering as of 3.0, has customizable expiration behavior, and is probably already used by your infrastructure. Most of the ActiveSupport-compliant caching libraries will support optional gzip compression; Readthis certainly does.

There are benchmarks on the project that illustrate the performance edge it has over Dalli as well.


The bigger problem is that because of how awesome Redis is, it tends to get used for lots of things and RAM is finite. There's lots of configurations for Redis for how it handles running out of memory and nobody looks at these until they've run out of RAM and important things start disappearing. I can't count how many Rails apps I've encountered that use Redis for all or a mix of caching, background queues, pubsub, and list operations.


RAM is finite whether you're using it with Redis or Memcached.

Redis provides multiple logical databases (0-15 by default) for just this situation. Use one db for caching, another for Sidekiq, and another for ad-hoc tasks. Redis can also be configured to evict entries with an expiration first (`maxmemory-policy volatile-lru`), which means cached data will be cycled out while the important bits linger safely.
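That split might look like the following sketch. The URLs are illustrative; the trailing `/cache` segment is a key namespace per redis-store's README, and the Sidekiq calls are its standard configuration API:

```ruby
# config/environments/production.rb: cache lives in database 0
config.cache_store = :redis_store, 'redis://localhost:6379/0/cache'

# config/initializers/sidekiq.rb: jobs live in database 1
Sidekiq.configure_server { |c| c.redis = { url: 'redis://localhost:6379/1' } }
Sidekiq.configure_client { |c| c.redis = { url: 'redis://localhost:6379/1' } }
```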


Not a good idea, I'm afraid. Your ad-hoc tasks will cause the caching to stutter and the job queues to stall. Redis is a single-threaded server, and shared databases share the same process. The common practice is to use dedicated Redis servers, one for each role. This also has the nice side benefit of letting you configure each as needed (e.g. LRU eviction for the cache, persistence for jobs, etc.).


I found the animated GIF next to the article to be incredibly distracting while reading. I had to remove it using the inspector to be able to get through the article.


This is a great guide but it's not complete. One of the biggest problems with all of these guides is that they focus solely on view caching.

As far as I can tell the Rails ecosystem completely lacks a good model / data caching system. There are a few gems that do model caching but they all have major flaws. I'd love to see a good guide on Rails data caching and gems that eliminate the mess of calls to Rails.cache.fetch.


This is because caching should be done as coarsely as possible. Caching views and entire pages is the common case.


Exactly - why did page caching get removed from rails core?


So what’s the best approach for caching API responses?


An API response isn't any different from a web page insofar as infrastructure is concerned. HTTP caching is usually my first choice when I have the wherewithal to do that.

[Edit: conditional responses are probably the best way to go – save the bandwidth.]


Great article. You should additionally measure page speed as experienced by your users, because other pesky things like network congestion, the user's browser & hardware and the speed of light all affect website performance. If you measure from every user's browser, you'll get very detailed performance info. A chart from a recent blog post of mine: https://s3.amazonaws.com/static.rasterize.com/cljs-module-bl...

Just because the page loads quickly on your laptop doesn't mean it loads quickly for everyone. I'm working on a tool to measure this stuff: https://rasterize.io/blog/speed-objections.html, contact me if you're interested in early access.



this is a great article

> The most often-used method of all Rails cache stores is fetch

It's true, but I think you should also run performance tests while the app is writing and reading, because the big problem with a db/cache is that writes influence reads (fetch) as well. Another big problem is expiration vs. garbage collection once memory is full.


A request with dozens or hundreds of `fetch` calls will always be slower than a few `fetch_multi` or `read_multi` calls. All of the current ActiveSupport compliant libraries support both calls.
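The difference is mostly network round trips, which a toy counter makes concrete. `FakeStore` below is illustrative, not an ActiveSupport class:

```ruby
# Toy store that counts round trips to show why one read_multi beats N reads.
class FakeStore
  attr_reader :trips

  def initialize(data)
    @data  = data
    @trips = 0
  end

  def read(key)
    @trips += 1          # one round trip per key
    @data[key]
  end

  def read_multi(*keys)
    @trips += 1          # one round trip for the whole batch
    keys.map { |k| [k, @data[k]] }.to_h
  end
end

store = FakeStore.new('a' => 1, 'b' => 2, 'c' => 3)

%w[a b c].each { |k| store.read(k) }     # three round trips
singles = store.trips                    # => 3

batch = store.read_multi('a', 'b', 'c')  # one more round trip
```

With a ~1ms hop to Redis or memcached, hundreds of single reads add hundreds of milliseconds that a handful of batched calls avoid.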


We've found https://github.com/n8/multi_fetch_fragments to be quite handy for this.


Sure! But I mean that many reads and writes load the cache db server, and the curve is exponential, not linear.


> First, figure about 50 milliseconds for network latency (this is on desktop, latency on mobile is a whole other discussion).

And outside the USA, add on another 200ms more. I, an Australian, visited the USA last year and was surprised, although I had expected it, at how much faster the Internet was.

I get the general feeling that people in the USA end up much more picky about performance than their Australian counterparts, because they’re used to a result which is quite a bit faster anyway.

It’s rather a pity that we as an industry don’t care more about performance, because on average we’re utterly abysmally appallingly atrocious at it. There’s simply no good reason for things to be as slow as they are.


> but usually virtualization is a mostly unnecessary step in achieving production-like behavior. Mostly, we just need to make sure we're running the Rails server in production mode.

Isn't this assuming your development machine has the same memory/CPU as your production one? I can't tell you how many times I get questions from clients asking why their 16GB dev box runs fine while their 512MB dyno is slow. The point of a Docker image is to limit these resources, which `RAILS_ENV=production rails s` does not do.


Sorry, I should have qualified that sentence: "a mostly unnecessary step in achieving production-like behavior *when trying to determine a page's average response time*". That's just been my experience with this. I haven't noticed significant differences in my Macbook vs Heroku when measuring response times.

It's a fast-and-loose measurement, to be sure. Virtualization/dockerization is the only way to get a super-accurate measurement.


Yeah I have been thinking about this as well. I guess it depends on how wide the difference is (when I said 512MB, I really meant it. People are still comparing that to their dev machines).

Somewhat related: how has your experience been setting up Docker with Rails? I have an idea for a project that seamlessly integrates Docker into one's Rails dev workflow, but so far I have found its setup (let alone working with it) to be cumbersome, to say the least.


Why is `caches_action` not considered a best practice? Using that for popular landing pages, I'm able to serve over half of my requests in < 10ms.


Oh, I hope I didn't give that impression that it's no longer a "best practice". But you can, of course, accomplish exactly the same thing as an action cache with fragment caching. The cases where you can use action caching are so limited (like you said - mostly landing pages) that I feel it's hardly even worth bothering with when you can just wrap the entire view in a cache method. That's all I was trying to get across.


Or `caches_page` and then it won't even hit rack / rails and can be served directly by nginx. But I agree that the usage scope is quite limited. We still use it for pages served to not-logged-in users where it makes sense, and both the performance boost and the reduction of load on the system is noticeable.
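The nginx side of that can be as small as a `try_files` fallback. Paths and the upstream name below are illustrative:

```nginx
# Serve caches_page output from public/ when it exists; otherwise
# fall through to the Rails upstream.
location / {
  try_files $uri $uri/index.html $uri.html @rails;
}

location @rails {
  proxy_pass http://rails_app;  # hypothetical upstream name
}
```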


Don't forget to divide the result from `ab` by `x` where `x` is the concurrency amount (`-c x`).

The 'time per request' value labelled 'mean, across all concurrent requests' is the mean of every single request. All the other ms values show you the response time per single request multiplied by the concurrency, so the percentile table is only meaningful once you take this into account.


In the example where you cache the todo list and use an array of the todo item ids plus the max updated time, why is that approach better than just updating the list updated_at time whenever a child record is changed? With your approach, each todo item needs to be pulled from the db which should be slower than just checking the parent's updated at time for a cache hit.
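The alternative described here leans on ActiveRecord's `touch` option, which bumps the parent's `updated_at` whenever a child is saved or destroyed. Model names follow the article's todo example and are assumptions:

```ruby
# Hypothetical models mirroring the article's todo-list example.
class TodoItem < ActiveRecord::Base
  belongs_to :todo_list, touch: true  # child writes bump the parent's updated_at
end

# In the view, one cache-key check against the parent replaces pulling
# every item's id and updated_at from the database:
#   <% cache @todo_list do %> ... <% end %>
```

The tradeoff is write amplification: every item save now issues an extra UPDATE on the parent row.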


> Why don't we cache as much as we should?

An alternative question:

Why do we have to cache so damn much in Ruby and complicate the hell out of our infrastructure and code?

Because Ruby is slow.


Every high traffic site leverages caching to some extent, no matter what language they run underneath. That may be varnish reverse-proxy caching, in-memory caching, fragment caching, browser caching, or most likely a combination of all of them. Caching is by no means limited to Ruby or slow languages.


What about caching in Ruby is complicated to you? It seems pretty simple to me.

What production website doesn't cache? You make it sound like it's a Ruby-only issue.


Caching involves adding generally untested code to your production environment. Caching can bring your production environment out of sync with your development environment. Caching introduces state between generally state-less requests - increasing the mental overhead of understanding why you are seeing certain behavior. Subtle bugs can result in disproportionately bad behavior (wrong cache key could accidentally show one user's private data to another user).

I don't think you should never do caching - just that it is non-trivial, especially once you move past a 15min-blog use case.


Caching makes things more complex, but unfortunately, that's the nature of the game and there's no escaping it. If you plan things out and break your caching strategy down into more manageable pieces, you'll be able to work out a way to make it work as a cohesive whole. It's really easy to just ignore caching until you're almost done with a project, only to be stuck in the painful position of trying to make it work with a shoehorn. Just thinking about the basics in the beginning will help make it a lot easier for you later on.


It's not that caching itself is complicated, but rather one more layer to keep track of/thing that could break.

On the JVM, .NET, etc. statically compiled languages typically provide blazing performance, which allows you to selectively cache (i.e. where you really need it) vs. caching virtually everything because it's-just-so-damn-slow otherwise.


Plenty of them. Apps that are write-heavy, and/or that have highly personalized, realtime, or secure data, frequently abstain from caching altogether. Cache invalidation is hard to get right, and more importantly, it's hard to keep right.


Websites that are still fast without caching and don't need to?


I'd be interested in seeings popular sites that don't use caching. Can you provide examples?


bleacherreport.com is on record for generating unique content on the fly per user.


I work at Bleacher Report. We use caching. Everywhere. As much as possible. We can probably do more.

Just because we end up using that cached content to create a unique mix for each user doesn't mean that we don't cache. I think you misunderstand the process.


I stand corrected. This conference talk had me believe otherwise: https://www.youtube.com/watch?v=xT8vDHIvurs


Michael's great, one of the smartest engineers I've worked with, but I don't think even he would say he meant to imply we don't use any caching at all.

He's also mostly talking about our Elixir stack which is not our primary stack (though... we're hiring!). So it's a little bit of an edge case.

There IS less caching involved in our Elixir stack though - due to how it functions - so maybe that's where the confusion arose.


From the bleacherreport.com source:

    <!-- "breaking_news_1" background cached 07/16/2015 at 12:44 AM -->
So much for that theory.


No, because web resources are slow. Your server may be done in 100ms, but your browser will puff for 10 more seconds till it gets all the crap by slurping it 6 pieces at a time.


Caching doesn't fix browser slowness, or any type of slowness after the data it produces leaves the server. I really have no clue where you're going with this.



