stricter error checking for select() interface #40

Merged
merged 1 commit into from
Jun 12, 2014

@blaenk blaenk commented Jul 27, 2013

TL;DR: The select() polling interface mistakenly assumes that a file descriptor being present in the exceptions FD set indicates an error, and throws an uncaught exception which terminates the program. This patch checks to see if there is indeed an error on the socket associated with that file descriptor, and only throws the exception if there is one, along with more descriptive information as to what the error is.

In rakshasa/rtorrent#51, users of Solaris derivatives report that rtorrent simply exits after running for a while. The discussion there suggested the cause was signal related, so I provided pull request #127. User @lotheac confirmed a few months later that it did indeed fix one of his problems. After extended use, however, he ran into a similar problem with the error message "Listener port received an error event," which I tracked down to libtorrent.

On Solaris derivatives, libtorrent doesn't use an OS-specific I/O multiplexing API such as /dev/poll or event ports; it falls back to the plain select() API.

The source of the problem is what I believe to be a common misinterpretation of the select() function, whose prototype is:

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

However, some manual pages such as Solaris' put it this way:

int select(int nfds,
 fd_set *restrict readfds, fd_set *restrict writefds,
 fd_set *restrict errorfds,
 struct timeval *restrict timeout);

Notice that the fourth argument is named errorfds there, rather than exceptfds as on the Linux man pages. This naming discrepancy is common across API documentation, and it gives the misleading impression that a file descriptor present in that set has had an I/O error. That is not necessarily the case, as select_tut(2) outlines:

exceptfds
       This set is watched for "exceptional conditions".  In
       practice, only one such exceptional condition is common: the
       availability of out-of-band (OOB) data for reading from a TCP
       socket.  See recv(2), send(2), and tcp(7) for more details
       about OOB data.  (One other less common case where select(2)
       indicates an exceptional condition occurs with pseudoterminals
       in packet mode; see tty_ioctl(4).)  After select() has
       returned, exceptfds will be cleared of all file descriptors
       except for those for which an exceptional condition has
       occurred.
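
To make the distinction concrete, here is a minimal sketch in C (not part of the patch) that opens a loopback TCP connection, sends one out-of-band byte, and records what the receiving side sees. The names has_exceptional_condition, run_oob_demo and struct oob_demo_result are hypothetical and exist only for this illustration. It assumes a platform where loopback TCP is available and SO_OOBINLINE is left at its default (off).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is flagged in exceptfds within timeout_ms,
 * 0 if it is not, -1 if select() itself fails. */
int has_exceptional_condition(int fd, int timeout_ms) {
    fd_set exceptfds;
    struct timeval tv;

    FD_ZERO(&exceptfds);
    FD_SET(fd, &exceptfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(fd + 1, NULL, NULL, &exceptfds, &tv) < 0)
        return -1;
    return FD_ISSET(fd, &exceptfds) ? 1 : 0;
}

/* What the receiving socket observed (hypothetical type). */
struct oob_demo_result {
    int exceptional; /* result of has_exceptional_condition() */
    int so_error;    /* value read via SO_ERROR */
    char oob_byte;   /* urgent byte read with MSG_OOB */
};

/* Hypothetical demo: returns 0 on success, -1 if any step fails. */
int run_oob_demo(struct oob_demo_result *out) {
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    socklen_t errlen = sizeof(out->so_error);
    int listener, client = -1, server = -1, rc = -1;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &addrlen) < 0)
        goto done;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    server = accept(listener, NULL, NULL);
    if (server < 0)
        goto done;

    /* One urgent byte: an exceptional condition, but not an error. */
    if (send(client, "!", 1, MSG_OOB) != 1)
        goto done;

    out->exceptional = has_exceptional_condition(server, 2000);
    out->so_error = -1;
    if (getsockopt(server, SOL_SOCKET, SO_ERROR, &out->so_error, &errlen) < 0)
        goto done;
    out->oob_byte = 0;
    if (out->exceptional == 1 &&
        recv(server, &out->oob_byte, 1, MSG_OOB) != 1)
        goto done;
    rc = 0;

done:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    close(listener);
    return rc;
}
```

On a system that behaves as select_tut(2) describes, the receiving descriptor should show up in exceptfds while SO_ERROR still reads as zero, which is exactly the case an "exceptfds means error" interpretation gets wrong.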

So this says that an exceptional condition usually indicates out-of-band data being present, or a certain condition in pseudoterminals in packet mode. Skimming through the source, I didn't find any instance in which libtorrent sends out-of-band data, and it doesn't use pseudoterminals as far as I'm aware. Considering that this problem only shows itself on Solaris derivatives, I figure it's a Solaris-specific situation in which the platform is perhaps more relaxed about what it considers to be an "exceptional condition."

The Solaris man page for select() says:

If a socket has a pending error, it is considered to have an exceptional condition pending. Otherwise, what constitutes an exceptional condition is file type-specific. For a file descriptor for use with a socket, it is protocol-specific except as noted below. For other file types, if the operation is meaningless for a particular file type, select() or pselect() indicates that the descriptor is ready for read or write operations and indicates that the descriptor has no exceptional condition pending.
...
A socket is considered to have an exceptional condition pending if a receive operation with O_NONBLOCK clear for the open file description and with the MSG_OOB flag set would return out-of-band data without blocking. (It is protocol-specific whether the MSG_OOB flag would be used to read out-of-band data.) A socket will also be considered to have an exceptional condition pending if an out-of-band data mark is present in the receive queue.

So I added a check that retrieves the error code associated with the socket for that file descriptor. If there is indeed an error, the exception is still thrown, now with descriptive text saying what the error is. If not, processing continues normally.
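
The shape of such a check can be sketched in plain C as follows. This is an illustration, not the patch itself: it assumes the error code is read with getsockopt() and SO_ERROR, the usual way to query a socket's pending error, and it returns a status code where the real code throws an exception. The names pending_socket_error and handle_exceptional_fd are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the socket's pending error code, or 0 if
 * there is none. If getsockopt() itself fails (for example on an invalid
 * descriptor), its errno is returned instead. Reading SO_ERROR also clears
 * the pending error on the socket. */
int pending_socket_error(int fd) {
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}

/* Hypothetical handler for a descriptor that select() flagged in the
 * exceptions set. Returns 0 when there is no real error and the caller
 * should carry on; returns -1 and fills msg with a description when the
 * socket does have an error. */
int handle_exceptional_fd(int fd, char *msg, size_t msg_size) {
    int err = pending_socket_error(fd);

    if (err == 0)
        return 0;

    snprintf(msg, msg_size, "socket %d received an error event: %s",
             fd, strerror(err));
    return -1;
}
```

A caller would invoke handle_exceptional_fd() only for descriptors found in the exceptions set after select() returns, and raise its error (or throw, in C++) only when the result is -1.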

User @lotheac applied the patch along with rakshasa/rtorrent#51 and tested it for a few days before reporting back that everything appeared to be working fine.
