Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure websockets write all data; Always keep_alive every websocket #178

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

shawwn
Copy link

@shawwn shawwn commented Aug 13, 2020

Summary:

  • Ensure websockets write all data

  • Always keep_alive every websocket

Explanation:

I've been using webdis websockets to write large amounts of data to a webgl demo for plotting lots of data points (https://twitter.com/theshawwn/status/1292266967940960257). There were two hurdles:

  1. webdis doesn't handle the return value of write() properly, leading to truncated sends

  2. webdis seems to close websockets after each request.

This PR writes all data, and also marks every websocket as keep_alive = 1, ensuring that it stays open for multiple requests.

There is currently a demo of this here: https://test.tensorfork.com/ though I'm not sure how long that link will remain valid.

Copy link
Owner

@nicolasff nicolasff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shawwn, thanks a lot for this PR! I understand the issue and your fix, but I think a few changes would have to be made before this can be merged. In particular, terminating the service because a websocket client got disconnected is (slightly) overkill 😊.

I see in your diff that this new code uses 4 spaces for indentation. While I agree now that this is good style, the whole code base is indented with tabs and adding blocks of code using spaces is going to make a mess. Can you please convert this new code to follow the same style? Note also that while and if keywords are not followed by a space in this repo. I wish they were and do find it more readable that way, but mixing code styles is even worse than following this relatively obscure compact style.

I've gone over your change a few times and tried to make helpful suggestions; please let me know if anything is unclear or doesn't seem to make sense.

@@ -77,10 +77,26 @@ ws_compute_handshake(struct http_client *c, char *out, size_t *out_sz) {
return 0;
}


void
ws_write_all(int socket, void *buffer, size_t length) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the caller of this function uses buffer as a char *, you can use the same type here – or even better – const char *. You can also remove the cast to (char*) in the call to write(2).

(minor) file descriptors are named fd throughout the code base, can you please rename socket to fd for consistency? Not to mention that socket is also a function.

if (n <= 0) {
if (errno == EINTR || errno == EAGAIN)
continue;
err(EXIT_FAILURE, "Could not write() data");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the logging interface instead of terminating the service with err. You can pass a pointer to the server struct from the caller and use it like this:

#include "slog.h"

// ...

int
ws_write_all(struct server *s, int fd, void *buffer, size_t length) {

// ...

const char err_msg[] = "Could not write data to websocket";
slog(s, WEBDIS_ERROR, err_msg, sizeof(err_msg)-1);
return -1;

Instead of calling err, a return code should be used to tell the caller about the failure so that it can cleanly terminate the connection and free the WS structure.
Successful writes should return 0; after the while loop. See comment below too.

Unfortunately I see that even if this error is propagated from ws_write_all to ws_handshake_reply to the caller of ws_handshake_reply, the return code is unused in worker.c. There, it should instead be used to set c->broken = 1 just like a couple of lines below the call to ws_handshake_reply.

Something like this. Instead of:

if(nparsed < ret) {
	http_client_add_to_body(c, c->buffer + nparsed + 1, c->sz - nparsed - 1);
	ws_handshake_reply(c);
}

you'd want to use the return value:

if(nparsed < ret) {
	http_client_add_to_body(c, c->buffer + nparsed + 1, c->sz - nparsed - 1);
	if(ws_handshake_reply(c) < 0)
		c->broken = 1;
}

I'm also not sure about this repeated write outside of the event loop. Ideally the call to write(2) should happen when the file descriptor is writable, and the event re-registered if there's still more data to write. This is a more complex change though, and could be done in a second pass.

@@ -159,8 +175,7 @@ ws_handshake_reply(struct http_client *c) {
p += sizeof(template4)-1;

/* send data to client */
ret = write(c->fd, buffer, sz);
(void)ret;
ws_write_all(c->fd, buffer, sz);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of this function should be propagated to the caller in worker.c. Can you put back the int ret; and use it?

Also, to pass a server struct to be used in slog, change to:

ret = ws_write_all(c->s, c->fd, buffer, sz);

(also see comment below)

@@ -159,8 +175,7 @@ ws_handshake_reply(struct http_client *c) {
p += sizeof(template4)-1;

/* send data to client */
ret = write(c->fd, buffer, sz);
(void)ret;
ws_write_all(c->fd, buffer, sz);
free(buffer);

return 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we'd use ret:

return ret;

@shawwn
Copy link
Author

shawwn commented Aug 14, 2020

Okay, I'll look into this when I have some time. Your comments were thorough and helpful; thank you.

In particular, terminating the service because a websocket client got disconnected is (slightly) overkill 😊.

It doesn't. In fact, according to the write manpage (https://www.man7.org/linux/man-pages/man2/write.2.html), there don't seem to be any error codes that can cause the err to fire. Most error codes are related to file system errors, for example.

It's a lot more work to handle the error code gracefully. Are you sure there are situations where write can fail in a way that triggers the err?

I agree it's probably better to handle it, but FWIW I've been running this code for a week now with no problems at https://test.tensorfork.com/.

I'm also not sure about this repeated write outside of the event loop. Ideally the call to write(2) should happen when the file descriptor is writable, and the event re-registered if there's still more data to write. This is a more complex change though, and could be done in a second pass.

Yeah, the repeated write is concerning. It opens up the server to a "slow attack", where a client only downloads a few bytes per second. What are some solutions? We could spawn a thread for each connected websocket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants