
James & Gerhard build Jerod's Pipe Dream™️. Adam helps. #518

Open · wants to merge 6 commits into base: master

Conversation

@gerhard (Member) commented Jun 21, 2024

How it all started

We started on this crazy journey many years ago: Kaizen 1 - The day half the internet went down

You've heard us talk about CDNs a lot in recent Kaizen episodes.

Having tried all sorts of things, we finally did it. @jerodsanto's pipe dream is real:

[Screenshot 2024-06-21 at 19:27:04]

This pull request is the first concrete step towards building our simple, wet wipe, CDN. If you want the context behind this, you must listen to Kaizen 15 when it ships as 🎧 https://changelog.com/friends/50. In the meantime, you should probably watch this to catch up on what happened between Kaizen 13-14 & Kaizen 15 - @jamesarosen is awesome 🙌

🎬 Let's Build a CDN - Part 1

[Video thumbnail: lets-build-a-cdn-v6]

Where are we today?

Back to this PR, there are a few important elements that are worth calling out:

  1. fly.io/cdn-2024-01-26/default.vcl is where all the Varnish magic happens. The comments (especially those at the bottom) are super helpful for context.
  2. fly.io/cdn-2024-01-26/fly.toml is the Fly.io app config.
  3. fly.io/cdn-2024-01-26/run is a super duper™️ amazing script that made both @adamstac & @jerodsanto chuckle. You should definitely try running it and see what happens. Attaching a few screenshots that show what to expect:
[Screenshot of the ./run output]


Sub-commands like deploy, world-scale & small-scale are not going to work for you. It's OK to skip them. The most interesting command is ./run demo-2024-06-21. It has 7 steps & this is what each step looks like:

[Screenshots of each of the 7 demo steps]

While the above screenshots give you a good idea of what is going on in the background, for the best experience, you will want to listen to 🎧 https://changelog.com/friends/50. We captured all that fun, joy & excitement in a way that only a conversation between good friends is able to 😁 🤣 😆
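If you want to try it yourself, the invocation is roughly this (a hedged sketch; the path and sub-command names come from the list above, and I'm assuming the script prints its sub-commands when run without arguments):

# from the repository root
cd fly.io/cdn-2024-01-26

# list the available sub-commands (skip deploy, world-scale & small-scale)
./run

# walk through the 7-step demo described above
./run demo-2024-06-21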

Note

There is now a video available for the actual demo: 🎬 Gerhard shows Adam & Jerod The Pipe Dream™️

[Screenshot 2024-08-02 at 08:03:51]

What happens next?

We still have a bunch of challenges ahead of us. See fly.io/cdn-2024-01-26/default.vcl for more details. I intend to make progress with them as time permits.

If there are any tips that you can share, we look forward to them! Drop them as a comment below, or join us in Changelog's Slack if you prefer it cosier: https://changelog.slack.com/archives/C024Q4CER

@jamesarosen is exploring a different direction which I am super excited about. We intend to get together in a few weeks to record together and have some more fun with this. By then, https://changelog.com/friends/50 will have shipped, so that is definitely the next milestone.

Things that we want to do as part of this PR

Follow-ups to this PR

  • Split into a separate repository - thechangelog/pipedream sounds amazing @jerodsanto 💪

Kaizen!

gerhard and others added 5 commits June 21, 2024 19:15
Signed-off-by: Gerhard Lazu <[email protected]>
We don't want it to connect to the Fly.io Proxy, go to the edge of the
network, pay the HTTPS termination penalty, and then come back to the
app instance. Stay within the Fly.io internal network & connect to the
app port directly instead 🤦

Signed-off-by: Gerhard Lazu <[email protected]>
Run instances across 4 regions
Use up to 2GB of RAM (single shared CPU is enough)

👌

Signed-off-by: Gerhard Lazu <[email protected]>
Based on the metrics from the last 12h.

Also update the TODO to reflect current perspective.

Signed-off-by: Gerhard Lazu <[email protected]>
# - https://abhishekjakhotiya.medium.com/magento-internals-cache-purging-and-cache-tags-bf7772e60797

backend default {
.host = "top1.nearest.of.changelog-2024-01-12.internal";


You might want to play with DNS anycast by trying top2.nearest, especially if you have more than 3-4 changelog.internal origin servers.

Alternatively, do you think it's worth experimenting with Varnish-level shielding? Cache nodes in low-traffic areas will have low cache-hit ratios.

I wonder if there would be a way to use Fly's DNS to specify that the cache servers in the same datacenters as the origin servers are shields. 🤔

@gerhard (Member, Author) replied:

We started with <appname>.internal which resulted in Varnish errors since .host is not able to handle multiple values. FTR, this is what that returns:

host changelog-2024-01-12.internal
changelog-2024-01-12.internal has IPv6 address fdaa:0:4556:a7b:2f7:4d32:8516:2
changelog-2024-01-12.internal has IPv6 address fdaa:0:4556:a7b:2f7:89b9:c9b8:2

top2.nearest will behave the same since we are running 2 instances in a single region:

host top2.nearest.of.changelog-2024-01-12.internal
top2.nearest.of.changelog-2024-01-12.internal has IPv6 address fdaa:0:4556:a7b:2f7:4d32:8516:2
top2.nearest.of.changelog-2024-01-12.internal has IPv6 address fdaa:0:4556:a7b:2f7:89b9:c9b8:2

Do you think that we should use something other than .host in Varnish?

I really like your idea of having a shield Varnish node in the same region as the origins, and then pointing all other Varnish nodes to that shield. Provided that we can figure out how to template the Varnish config at instance boot time, we could do the following (a rough sketch follows the list):

  • if $FLY_REGION == $PRIMARY_REGION (both are environment variables available on every Fly.io machine), set the backend .hosts to the values from changelog-2024-01-12.internal (we would need to figure out how to handle more than 1 value)
  • if $FLY_REGION != $PRIMARY_REGION set the backend .hosts to the Varnish shield region, i.e. $PRIMARY_REGION.$FLY_APP_NAME.internal (this assumes that we want to run more than one shield instance in the same region for redundancy purposes)
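A minimal sketch of what that boot-time templating could look like (hedged: the template file, the __BACKEND_HOST__ placeholder and the varnishd flags are illustrative and not part of this PR; only FLY_REGION, PRIMARY_REGION and FLY_APP_NAME are real Fly.io machine variables):

#!/usr/bin/env sh
# Hypothetical entrypoint that renders default.vcl from a template at boot
set -e

if [ "$FLY_REGION" = "$PRIMARY_REGION" ]; then
  # Shield instances talk to the origin app directly
  BACKEND_HOST="top1.nearest.of.changelog-2024-01-12.internal"
else
  # Edge instances talk to the shield instance(s) in the primary region
  BACKEND_HOST="$PRIMARY_REGION.$FLY_APP_NAME.internal"
fi

sed "s/__BACKEND_HOST__/$BACKEND_HOST/" /etc/varnish/default.vcl.tmpl \
  > /etc/varnish/default.vcl

exec varnishd -F -f /etc/varnish/default.vcl -a :9000 -s malloc,"${VARNISH_SIZE:-500M}"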

WDYT?


Fly.io .internal DNS

@jamesarosen commented Jun 24, 2024:

Do you think that we should use something other than .host in Varnish?

Based on my (possibly outdated) understanding, yes. .host will limit you to a single IP address. If you want to use a CNAME to support anycast, you'll want a dynamic backend: vmod-dynamic or vmod-reqwest

The alternative, I suppose, would be to resolve the anycast address at build, then create a .host for each IP it returns. But then you'd have to recompile the VCL regularly. If dynamic isn't an option, I suppose we could put a daemon on the containers that polls top2.nearest.of and compiles the VCL every minute, then tells Varnish to reload the config.

Varnish resolves backend IP only at launch time (and how to fix it) is also worth reading
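For illustration, a minimal vmod-dynamic sketch, assuming the module is available in the image (the director name origin is made up, and the port is a placeholder for the app's internal port):

import dynamic;

sub vcl_init {
  # Re-resolve the backend hostname at runtime instead of once at VCL load
  new origin = dynamic.director(port = "4000", ttl = 10s);
}

sub vcl_recv {
  set req.backend_hint = origin.backend("top1.nearest.of.changelog-2024-01-12.internal");
}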

Comment on lines +73 to +90
set resp.http.x-ttl = obj.ttl;
# What is the max object staleness permitted?
set resp.http.x-grace = obj.grace;

# Did the response come from Varnish or from the backend?
if (obj.hits > 0) {
  set resp.http.x-cache = "HIT";
} else {
  set resp.http.x-cache = "MISS";
}

# Is this object stale?
if (obj.ttl < std.duration(integer=0)) {
  set resp.http.x-cache = "STALE";
}

# How many times has this response been served from Varnish?
set resp.http.x-cache-hits = obj.hits;


It's perfectly fine to use the x- prefix, but this practice was deprecated in 2012 (RFC 6648).

@gerhard (Member, Author) replied Jun 22, 2024:

That is good to know!

Having looked at the IANA Hypertext Transfer Protocol (HTTP) Field Name Registry, the header which seems closest to what we want is Cache-Status. I have noticed that it does not mention anything about hits or grace.

Does the following Cache-Status header example look good to you @jamesarosen?

Cache-Status: Shield; hit; hits=101; ttl=53.00; grace=86400.000, Edge; stale; hits=2; ttl=-123.00; grace=3600.000

For reference, this is how we are currently capturing the above info - we don't yet have Shield, only Edge:

x-cache: HIT
x-cache-hits: 1
x-ttl: 55.312
x-grace: 86400.000
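A possible vcl_deliver sketch for emitting that header (hedged: "Edge" is just a label, and hits=/stale/grace= are extension parameters rather than ones defined by RFC 9211, which only specifies hit, fwd, fwd-status, ttl, stored, collapsed, key and detail):

sub vcl_deliver {
  if (obj.hits > 0) {
    set resp.http.Cache-Status = "Edge; hit; hits=" + obj.hits + "; ttl=" + obj.ttl + "; grace=" + obj.grace;
  } else {
    set resp.http.Cache-Status = "Edge; fwd=miss";
  }
  # Mark stale objects, mirroring the x-cache STALE case above
  if (obj.ttl < std.duration(integer=0)) {
    set resp.http.Cache-Status = resp.http.Cache-Status + "; stale";
  }
}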

Comment on lines +103 to +105
# - If the backend gets restarted (e.g. new deploy), backend remains sick in Varnish
# - https://info.varnish-software.com/blog/two-minute-tech-tuesdays-backend-health
# - EXPLORE: varnishlog -g raw -i backend_health
@gerhard (Member, Author) commented Jun 22, 2024:

@jamesarosen this is by far the most problematic part of this config. If the app gets re-deployed, the backend remains sick. This is what I am seeing in the varnish logs:

[Screenshot of the Varnish logs, 2024-06-20 at 07:09:18]

My interpretation of the above is that Varnish does not try to re-establish the TCP connection to the backend once this goes away. WDYT?

The only fix that I have so far is to restart all Varnish instances, which forces the TCP connection to get re-established. It's basically this:

flyctl machines list | awk '/cpu/ { system("flyctl machines restart " $1) }'

@jamesarosen replied Jun 22, 2024:

I have two hypotheses:

Hypothesis One: It Takes Longer Than You Think

This VCL config in your probe:

.interval = 5s;
.window = 10;
.threshold = 5;

Means that the backend will only be marked healthy when 5 (threshold) of the last 10 (window) probes have returned a 200 OK. Once a backend becomes unhealthy, it will take up to 30s ((5 + 1) * 5s) to recover.

Try waiting longer. If that works, the solution is probably to do rolling restarts and use Varnish to load-balance. Given the way Fly's 6PN addressing works, I don't think you can be sure that top1.nearest.of wouldn't resolve to the same thing as any other given host. Thus, I think you need not 2 but 3 backends:

  1. top1.nearest.of -- the primary backend
  2. iad (or any other fixed value) -- failover 1
  3. sjc (or any other fixed value) -- failover 2

You would set up probes (healthchecks) for all three. Then you can check the health status in vcl_recv:

set req.backend_hint = primary;
if (!std.healthy(req.backend_hint)) {
  set req.backend_hint = failover1;
}
if (!std.healthy(req.backend_hint)) {
  set req.backend_hint = failover2;
}

Or instead of implementing this yourself, you could use a fallback director.
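For reference, the fallback-director version of the same idea might look roughly like this (hedged: primary, failover1 and failover2 are the hypothetical backends from the list above):

import directors;

sub vcl_init {
  new fb = directors.fallback();
  # Tried in order; the first healthy backend wins
  fb.add_backend(primary);
  fb.add_backend(failover1);
  fb.add_backend(failover2);
}

sub vcl_recv {
  set req.backend_hint = fb.backend();
}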

Hypothesis Two: You're Not Getting 200s

Is it possible that the /health endpoint is returning a 302 or 307? Perhaps it's redirecting from http:// to https://? What do the origin logs say?


If you have the Varnish container running and can exec into it to run varnishlog, you can see what kind of responses Varnish is getting from your backends, which may help with troubleshooting any sick backends.

I cloned the repo and built the container on my local machine to have a running Varnish container for demonstrating the varnishlog command. I had to adjust the VCL and set the backend to target pipedream.changelog.com, since it was complaining about not being able to resolve top1.nearest.of.changelog-2024-01-12.internal from my machine. Even though pipedream.changelog.com was only giving me 503 responses, it still demonstrates the use of varnishlog just fine.
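Roughly, the local setup looked like this (a hedged sketch: the Dockerfile location and image tag are assumptions; the port comes from the curl example below):

# build the Varnish image from the PR's directory and run it locally
docker build -t pipedream-varnish fly.io/cdn-2024-01-26
docker run --rm -p 9000:9000 pipedream-varnish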

Once I had it running, I fired off a request through it from a different shell (curl -sv http://127.0.0.1:9000/) to get some log details.

varnish@2a59337f72cd:/etc/varnish$ varnishlog
*   << BeReq    >> 3         
-   Begin          bereq 2 fetch
-   VCL_use        boot
-   Timestamp      Start: 1719172226.180749 0.000000 0.000000
-   BereqMethod    GET
-   BereqURL       /
-   BereqProtocol  HTTP/1.1
-   BereqHeader    Host: 127.0.0.1:9000
-   BereqHeader    User-Agent: curl/8.8.0
-   BereqHeader    Accept: */*
-   BereqHeader    X-Forwarded-For: 192.168.65.1
-   BereqHeader    Via: 1.1 2a59337f72cd (Varnish/7.4)
-   BereqHeader    Accept-Encoding: gzip
-   BereqHeader    X-Varnish: 3
-   VCL_call       BACKEND_FETCH
-   VCL_return     fetch
-   Timestamp      Fetch: 1719172226.180792 0.000042 0.000042
-   FetchError     backend default: unhealthy
-   Timestamp      Beresp: 1719172226.180795 0.000046 0.000003
-   Timestamp      Error: 1719172226.180796 0.000046 0.000000
-   BerespProtocol HTTP/1.1
-   BerespStatus   503
-   BerespReason   Backend fetch failed
-   BerespHeader   Date: Sun, 23 Jun 2024 19:50:26 GMT
-   BerespHeader   Server: Varnish
-   VCL_call       BACKEND_ERROR
-   BerespHeader   Content-Type: text/html; charset=utf-8
-   BerespHeader   Retry-After: 5
-   VCL_return     deliver
-   Storage        malloc Transient
-   Length         278
-   BereqAcct      0 0 0 0 0 0
-   End            

*   << Request  >> 2         
-   Begin          req 1 rxreq
-   Timestamp      Start: 1719172226.180555 0.000000 0.000000
-   Timestamp      Req: 1719172226.180555 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       192.168.65.1 57563 http
-   ReqMethod      GET
-   ReqURL         /
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 127.0.0.1:9000
-   ReqHeader      User-Agent: curl/8.8.0
-   ReqHeader      Accept: */*
-   ReqHeader      X-Forwarded-For: 192.168.65.1
-   ReqHeader      Via: 1.1 2a59337f72cd (Varnish/7.4)
-   VCL_call       RECV
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 3 fetch
-   Timestamp      Fetch: 1719172226.181146 0.000590 0.000590
-   RespProtocol   HTTP/1.1
-   RespStatus     503
-   RespReason     Backend fetch failed
-   RespHeader     Date: Sun, 23 Jun 2024 19:50:26 GMT
-   RespHeader     Server: Varnish
-   RespHeader     Content-Type: text/html; charset=utf-8
-   RespHeader     Retry-After: 5
-   RespHeader     X-Varnish: 2
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 2a59337f72cd (Varnish/7.4)
-   VCL_call       DELIVER
-   RespHeader     x-ttl: -0.000
-   RespHeader     x-grace: 0.000
-   RespHeader     x-cache: MISS
-   RespUnset      x-cache: MISS
-   RespHeader     x-cache: STALE
-   RespHeader     x-cache-hits: 0
-   VCL_return     deliver
-   Timestamp      Process: 1719172226.181170 0.000614 0.000024
-   Filters        
-   RespHeader     Content-Length: 278
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1719172226.181219 0.000663 0.000048
-   ReqAcct        77 0 77 315 278 593
-   End            

*   << Session  >> 1         
-   Begin          sess 0 HTTP/1
-   SessOpen       192.168.65.1 57563 http 172.17.0.2 9000 1719172226.180510 24
-   Link           req 2 rxreq
-   SessClose      REM_CLOSE 0.001
-   End            

The way this log works is that you get multiple pieces: the Session, the BeReq, and the Request. They are listed in the order that things finish, so you have to hunt around a bit for where things started and piece together the chain of events.

  1. The Session has an identifier of 1 and is the starting point; from there you can see a Link reference creating req 2.
  2. This is where Request 2 begins, and Begin req 1 rxreq records that it was called by id 1. Working through a few lines of the Request you'll see another Link pointing to bereq 3 fetch, which is where it makes a backend request.
  3. This is where BeReq starts; looking up at BeReq, Begin bereq 2 fetch records that it was called from id 2. You can see the pieces of the request BeReq received, the call to the backend, timestamps for the various events and the details of the backend response, and then BeReq finishes, which is why it appears first in the log.
  4. Next comes the continuation of Request 2, which receives the response, lists all the details it got back from BeReq, and then finishes.
  5. Finally you are back at the Session, which finishes.

If you have a Varnish instance with a bunch of activity, it's useful to use some command-line options for varnishlog to filter the output down to just the log events you want. A nice reference for the command-line options is here: https://www.varnish-software.com/developers/tutorials/varnishlog/

I had to go reference an old gist I made back when I was doing this more regularly. The -q option is useful to filter the logs, for example to only show requests coming from your own IP:

varnishlog -q 'ReqHeader:X-Forwarded-For eq "192.168.65.1" or BereqHeader:X-Forwarded-For eq "192.168.65.1"'

...this would only end up showing the Request and BeReq events from the log, since the Session wouldn't have these headers.

varnishlog -q 'ReqURL eq "/friends/50" or BereqURL eq "/friends/50"'

...this would only show Request and BeReq events to a specific URL.


I have another hypothesis: Fly is assigning a new IP address during the deploy. If you give a DNS name for a .host, Varnish only resolves that once, at launch. If Fly assigns a new IP, you'll be hammering someone else's service with healthcheck requests 😓


I recommend upvoting this Fly.io feature request to support SRV records. libvmod works with getdns to support SRV records. That would mean you wouldn't have to specify the port here. It would pick the port up from DNS.

Comment on lines +3 to +5
# Closest to James
primary_region = "sjc"
# Secondary region will be "lhr", closest to Gerhard
@gerhard (Member, Author):

Note to self: change this to iad so that it's in the same region as the origin.

Stop cache insertion when a backend fetch returns a 5xx error.

Add new varnish_status synthetic response for the health-checks.

Make it:
- World Scale™
- Slow Scale™
- Adam Scale™
- Jerod Scale™
- Gerhard Scale™

Convert `demo` to `run` and teach it a few new tricks.

Signed-off-by: Gerhard Lazu <[email protected]>
@gerhard force-pushed the james-and-gerhard-build-jerods-pipedream-adam-helps branch from eff9bcb to a44e0cb on June 25, 2024 at 18:21
kill_timeout = 30

[env]
VARNISH_SIZE="500M"


I think this VARNISH_SIZE="500M" environment variable tells Varnish that it can use 500MB of RAM for its cache, but down below you have the VM set to only have 256MB of RAM, so this may result in Varnish trying to use more memory than the system has available.
https://www.varnish-software.com/developers/tutorials/running-varnish-docker/#setting-the-cache-size
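If that's the case, one way to keep the two settings in step might be something like this (a hedged sketch; section and key names follow standard fly.toml conventions, and the values are illustrative):

[env]
  # keep the cache allocation well below the VM's memory,
  # leaving headroom for the varnishd processes themselves
  VARNISH_SIZE = "150M"

[[vm]]
  memory = "256mb"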

@mttjohnson commented:

I forked the repo and picked up the branch (https://github.com/mttjohnson/changelog.com/commits/james-and-gerhard-build-jerods-pipedream-adam-helps/) to experiment with and test a few Varnish things in a local Docker container.

Aside from the commits where I experimented with VCL modifications I also tried to keep notes for commands I was running, some of the results, thoughts, and resources I found along the way.

https://github.com/mttjohnson/changelog.com/blob/james-and-gerhard-build-jerods-pipedream-adam-helps/fly.io/cdn-2024-01-26/VARNISH_TESTING.md

I'm hoping that what I've explored and documented so far will provide some insight that could resolve the sick-backends issue you've been seeing, and a starting point for purging/banning content from the Varnish cache.
