Skip to content

Conversation

@jchampio
Copy link

Occasionally, when under load from several wss:// connections, Apache will segfault. This appears to be due to mod_websocket's multithreaded use of the bucket brigade pair; note that OpenSSL may read during a write and vice-versa, which could cause multiple threads to access the same brigade simultaneously. The problem seems to be easier to reproduce on Windows (I haven't reproduced with a Linux server yet).

This pull request implements single-threaded access to the bucket brigade, with cross-thread communication being done through a queue. The entire backstory and discussion can be read on the httpd modules-dev mailing list.

Jacob Champion and others added 9 commits September 9, 2015 18:01
In preparation for an architecture where reads and writes to the brigade
happen on the same thread, perform nonblocking reads and poll on the
socket to determine when more data is available.
The public-facing API will eventually communicate cross-thread instead
of writing directly to the brigade. In preparation, pull the send logic
into an internal function, and call that from the read loop instead of
mod_websocket_plugin_send.
mod_websocket_handle_incoming now performs the core logic for
mod_websocket_data_framing. This will let us turn the processing loop
into a combination read/write loop more easily.

The state variables have all been moved into a separate
WebSocketReadState struct.
The prior implementation allowed clients to write to a parallel bucket
brigade from separate threads, while the main thread read incoming
messages from the original brigade. This appears to cause thread safety
issues, especially when combined with TLS (OpenSSL has the ability to
read from the socket during a write, and vice-versa).

This patch, inspired by both mod_spdy and mod_proxy_wstunnel, causes
cross-thread invocations of mod_websocket_plugin_send() to place their
messages on a queue and signal the main thread that a write is pending.
The event loop is based on apr_pollset.

See http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/201502.mbox/%3C54EE23EB.5040705@ni.com%3E
for history. Thanks to Alex Bligh and Eric Covener for their
suggestions.
Now that the input and output brigades are only operated on from a
single thread, the extra pool and bucket allocator added as part of
commit cfaef07 can be removed. Also, allocate a single long-lived input
brigade instead of allocating new ones for every socket read.
The merged patches were posted to modules-dev, in a conversation started
at

http://mail-archives.apache.org/mod_mbox/httpd-modules-dev/201502.mbox/%3C54ED1473.1060604%40ni.com%3E

The purpose of the set is to fix crashes seen when hitting multiple
WebSockets in parallel over TLS. The original patchset cover letter
follows the break.

---

The first patch changes to nonblocking reads and the use of
apr_pollset_poll(). The next two patches are meant to be mostly
refactoring and code motion, in order to make the fourth patch (which
contains the actual architecture changes) more straightforward. The
fifth patch is cleanup of now unnecessary code.

Comments and suggestions are welcome!

The code in this patchset is Copyright (c) National Instruments and
Apache Software Foundation, licensed to the public under the Apache
License 2.0.
Pop elements into a proper void* to fix GCC warnings.
Apache 2.2 doesn't support ap_get_conn_socket(), so wrap it in a
compatibility layer. It turns out the core_module hack for getting the
apr_socket_t* was in use elsewhere already.
@jchampio jchampio mentioned this pull request Oct 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant