Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid a spin loop when inactive is fired before active #470

Merged
merged 4 commits into from
Aug 19, 2024

Conversation

glbrntt
Copy link
Contributor

@glbrntt glbrntt commented Jul 19, 2024

Motivation:

The NIOSSLHandler can enter a spin loop in doUnbufferActions if writing data into BoringSSL fails with SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.

One way these errors can happen is if a write into BoringSSL happens prior to the handshake completing. The NIOSSLHandler is explicit about starting the handshake: it's done in channel active and in handler added if the channel is already active.

However, the handshaking step is currently done without any state checking, so if the state is 'closed' (as it would be after channel inactive) then the handshake will still start. This can happen if channel inactive happens before channel active.

To reach the write loop there must be a buffered write and flush prior to the handshake step starting and the state isn't 'idle' or 'handshaking'. This can happen if a write and flush hapens while 'NIOSSLHandler' is in 'channelActive' (it forwards the 'channelActive' event before starting the handshake) and 'channelInactive' came first.

Modifications:

  • Early exit from 'doHandshakeStep' if the state isn't applicable for starting a handshake.
  • Don't buffer writes when the state is 'closed' as they'll never succeed and can be failed immediately.
  • If the write isn't succesful in 'doUnbufferActions' then either write to the network or try reading.
  • Only allow a limited number of spins through 'doUnbufferActions'

Result:

Motivation:

The NIOSSLHandler can enter a spin loop in `doUnbufferActions` if
writing data into BoringSSL fails with SSL_ERROR_WANT_READ or
SSL_ERROR_WANT_WRITE.

One way these errors can happen is if a write into BoringSSL happens
prior to the handshake completing. The NIOSSLHandler is explicit about
starting the handshake: it's done in channel active and in handler added
if the channel is already active.

However, the handshaking step is currently done without any state
checking, so if the state is 'closed' (as it would be after channel
inactive) then the handshake will still start. This can happen if
channel inactive happens before channel active.

To reach the write loop there must be a buffered write and flush prior
to the handshake step starting and the state isn't 'idle' or 'handshaking'.
This can happen if a write and flush hapens while 'NIOSSLHandler' is in
'channelActive' (it forwards the 'channelActive' event _before_ starting
the handshake) and 'channelInactive' came first.

Modifications:

- Early exit from 'doHandshakeStep' if the state isn't applicable for
  starting a handshake.
- Don't buffer writes when the state is 'closed' as they'll never
  succeed and can be failed immediately.
- If the write isn't succesful in 'doUnbufferActions' then either write
  to the network or try reading.
- Only allow a limited number of spins through 'doUnbufferActions'

Result:

- Resolves apple#467
@glbrntt glbrntt added the semver/patch No public API change. label Jul 19, 2024
@glbrntt glbrntt requested a review from Lukasa July 19, 2024 14:32
Sources/NIOSSL/NIOSSLHandler.swift Show resolved Hide resolved
if let promise = bufferedWrite.promise { promises.append(promise) }
_ = self.bufferedActions.removeFirst()
} else if didWrite {
// The write into BoringSSL unsuccessful. This happens when BoringSSL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The write into BoringSSL unsuccessful. This happens when BoringSSL
// The write into BoringSSL was unsuccessful. This happens when BoringSSL

break writeLoop
} else {
// No write was successful in this loop, so assume the error
// was 'wants read'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to assume here? It seems likely that we might want to actually take an action on this, should we be returning these values instead?

Relatedly, I'm not sure calling doDecodeData in the loop here is a good idea. I don't think we should have any pending buffered data, we usually clear it on readComplete.

Is the expectation here that we're actually in the handshake? If that's true, I wonder if we should rethink the state management here and check for that state, rather than indirectly try to handle want read and want write.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, I'm not sure calling doDecodeData in the loop here is a good idea. I don't think we should have any pending buffered data, we usually clear it on readComplete.

Is the expectation here that we're actually in the handshake? If that's true, I wonder if we should rethink the state management here and check for that state, rather than indirectly try to handle want read and want write.

Being in the implicit handshake is one way we can get to this point. I'm not sure what other situations would get us into this state. With that said though, I think the additional state checking prevents us from getting into the implicit handshake.

I didn't feel entirely comfortable about these changes (especially the else branch) but leaving the potential spin loop also didn't sit right.

Given this code – aside from the known path into the spin loop – has worked fine for years, perhaps we should back out these changes and just keep the iteration limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the iteration limit can work, combined with checking our state and delaying any flushing until after we think the handshake is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the iteration limit can work

To clarify, do you mean just the iteration limit as-is? Or without the current else branch? Or something else?

combined with checking our state and delaying any flushing until after we think the handshake is done.

Okay I think we have our bases covered then, we only call doUnbufferActions(context:) from three different places:

  1. channelRead(context:data:) if the state is active or outputClosed
  2. completeHandshake(context:)
  3. flush(context:) if state is one of .active, .unwrapping, .closing, .unwrapped, .inputClosed, .outputClosed, .closed

So we should only be unbuffering actions and flushing if we're in an appropriate state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or without the current else branch?

Without the current else branch.

@glbrntt glbrntt requested a review from Lukasa July 24, 2024 09:49
@glbrntt glbrntt requested a review from Lukasa August 13, 2024 15:56
@Lukasa Lukasa merged commit 7b84abb into apple:main Aug 19, 2024
8 of 9 checks passed
@weissi
Copy link
Member

weissi commented Aug 19, 2024

Thank you!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

100% CPU spin loop forever when writeAndFlush gets run from channelActive
3 participants