the best pastebin that ever was or will be #62

Open · buhman opened this issue Dec 13, 2014 · 4 comments

@buhman (Member) commented Dec 13, 2014

One of the most critical things about a pastebin is that it must always work, even through the zombie apocalypse.

People want to put their shit up on a pastebin yesterday, not fuck around with 6 alternatives trying to find the one that happens to be working now. We need to deliver on that, and be the pastebin that everyone uses because it is the only pastebin that works when all of the others fail.

There are a few considerations:

application monitoring

Even if nothing else fails, one inevitability is that we will run out of paste IDs, disk space, or both. We should make sure that never gets in the way of people being able to make new pastes--this should be actively monitored and acted on proactively, not only after someone reports actual breakage.
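
Something like this could run as a cron job or a Nagios check to catch it early. The storage path, ID-space size, and thresholds below are made up; a real check would have to match however pb actually allocates IDs and stores pastes:

```python
#!/usr/bin/env python
# Capacity check sketch with Nagios-style exit codes.
# PASTE_DIR, ID_SPACE, and the thresholds are assumptions, not pb's
# real layout -- adjust to match how pastes and IDs are actually stored.
import os
import sys

PASTE_DIR = "/var/lib/pb"   # hypothetical storage path
ID_SPACE = 36 ** 4          # hypothetical ID namespace (4-char base36)
DISK_WARN = 0.80            # warn at 80% disk usage
ID_WARN = 0.50              # warn when half the ID space is used


def main():
    st = os.statvfs(PASTE_DIR)
    disk_used = 1.0 - float(st.f_bavail) / st.f_blocks
    id_used = float(len(os.listdir(PASTE_DIR))) / ID_SPACE

    problems = []
    if disk_used > DISK_WARN:
        problems.append("disk {0:.0%} full".format(disk_used))
    if id_used > ID_WARN:
        problems.append("ID space {0:.0%} used".format(id_used))

    if problems:
        print("WARNING: " + ", ".join(problems))
        return 1
    print("OK: disk {0:.0%}, ID space {1:.0%}".format(disk_used, id_used))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```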

stack high availability

Even the brief time it takes to deploy code and restart gunicorn is too long, let alone any unplanned breakage; we should act like people die for every second ptpb is unavailable. For code deployments, we should have an automated system that drains incoming connections and shifts load elsewhere. We still need to decide on the exact mechanisms to accomplish this.
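
One possible mechanism, sketched here assuming the app is Flask behind whatever balancer we pick (the path and endpoint name are made up): expose a health endpoint that the balancer's probe polls, and flip it to 503 before a deploy so the node drains itself out of rotation.

```python
# Drain-hook sketch: the balancer's health probe polls /health; touching
# DRAIN_FILE before a deploy makes the probe fail so no new requests land
# here, then gunicorn can be restarted and the file removed to rejoin.
import os

from flask import Flask

app = Flask(__name__)

DRAIN_FILE = "/run/pb/drain"  # hypothetical sentinel path


@app.route("/health")
def health():
    if os.path.exists(DRAIN_FILE):
        return "draining", 503
    return "ok", 200
```

A deploy then becomes: touch the sentinel, wait for in-flight requests to finish, deploy and restart gunicorn, remove the sentinel.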

network uptime

Despite my mixed feelings about Datacate, it's a serious problem if our/their infrastructure goes down, which it has at least twice since September 2014, both times for at least an hour. We should have ptpb deployed in multiple geographically separate locations, and have automatic failover and monitoring mechanisms to prepare for the catastrophic failure of an entire datacenter.
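
For the monitoring half of this, even a dumb prober run from a third location would notice a whole-datacenter outage within seconds. The URLs and whatever alerting or DNS-failover hook it drives are placeholders:

```python
# External uptime prober sketch -- run from somewhere outside both
# datacenters. Site URLs are placeholders; wire the failure branch into
# alerting or a DNS failover API.
import sys

import requests

SITES = [
    "https://pb.site-a.example/health",
    "https://pb.site-b.example/health",
]


def is_up(url, timeout=5):
    try:
        return requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False


if __name__ == "__main__":
    down = [url for url in SITES if not is_up(url)]
    for url in down:
        print("DOWN: " + url)  # alert / trigger failover here
    sys.exit(1 if down else 0)
```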

@jdppettit (Member)

Application monitoring is kind of already addressed in the Nagios monitors issue we have open. Going to work on that on my next break probably.

stack high availability

We could set up a pair of HAProxy instances, each in a separate location and with a separate provider. Each instance can balance between two backends that share a DB. When we deploy new code, we can do it one pair at a time, test it, and then put it back in rotation. That gives us quite a bit of redundancy, and the flexibility to break things without people noticing.

network uptime

I agree, Datacate has gone down several times. Between the two of us we have access to two separate providers, so I'm sure we can set something up that provides both geographic and provider redundancy. Maybe have one balancer with its backends + DB on one provider, and the second on another?

@buhman (Member, Author) commented Dec 22, 2014

HAProxy 1.5 looks pretty cool--even does IPv6, despite some people telling me otherwise.

However, we can do this for free with Varnish:

https://www.varnish-cache.org/docs/4.0/reference/vcl.html#backend-definition
https://www.varnish-cache.org/docs/4.0/reference/vcl.html#probes
https://www.varnish-cache.org/docs/4.0/reference/vmod_directors.generated.html

Examples: https://www.varnish-cache.org/docs/4.0/users-guide/vcl-backends.html#directors

Might as well have fewer stages in the request pipeline rather than more--I wouldn't be surprised if Varnish is faster anyway.

@jdppettit (Member)

Sounds good, fire

@polyzen commented Mar 17, 2015

A fair amount of this conversation ties in with @lericah's issue #101.
