Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process: fix handleless NODE_HANDLE handling #13235

Merged
merged 1 commit into from
Jul 1, 2017

Conversation

santigimeno
Copy link
Member

It can happen that recvmsg() may return an error on ancillary data
reception when receiving a NODE_HANDLE message (for example
MSG_CTRUNC). This would end up, if the handle type was net.Socket,
on a message event with a non null but invalid sendHandle. Avoid
this by checking the handle validity before performing the
handleConversion.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

child_process

@nodejs-github-bot nodejs-github-bot added the child_process Issues and PRs related to the child_process subsystem. label May 26, 2017
@santigimeno
Copy link
Member Author

Let's see if this can fix: #3072. /cc @cjihrig

@santigimeno
Copy link
Member Author

BTW thanks to @cjihrig as he came up with a test case and helped debugging the issue.

@santigimeno
Copy link
Member Author

Removed the test as it doesn't look reliable at all. If somebody has a better idea for a test, it would be great.

// It can happen the handle is not received because of some error on
// ancillary data reception such as MSG_CTRUNC. In this case don't try to
// perform any conversion.
if (!handle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... I understand the reasoning behind this but I'm wondering if this is the right approach. If the handle is invalid at this point, should we treat it as an error condition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily an error. For example, connections can be closed by the client while they wait in a queue to be sent from the master to the worker. I think the correct thing to do is advocate that people check their received handles in the worker prior to using them (#13196).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as far as I can tell there are two cases where the sending a handle ends up with no handle on reception:

  • As Colin said, if the handle is closed while waiting in the sending queue. This case is detected by the sending process and instead of sending a NODE_HANDLE message, it sends a normal ipc message. In this case, the sendHandle argument of the message listener callback is set to null.
  • When for some reason, the handle cannot be duped on the receiving process. This is the case this PR tries to fix and I think it makes sense that it behaves in the same way as the previous case (setting sendHandle to null).

I think there's another argument not to treat it as an error. We can send data (the message argument) along with the handle that can be useful on reception even if there's no handle associated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm convinced :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a better approach be to send a "please resend the handle" message to the master instead of NODE_HANDLE_ACK?

Copy link
Member Author

@santigimeno santigimeno Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. Do you have something in mind on how would that work on the sending process?
I think resending the message automatically can lead to problems if the receiving process constantly fails. Maybe using the process.send() callback could be a good idea. So if the message sent is a NODE_HANDLE wait for a NODE_HANDLE_ACK or NODE_HANDLE_NACK to call the callback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I said "master" when I should have said "parent process."

You are right that unchecked repeated resending is no good. The cluster module could deal with it by sending the handle to another worker after a few turns but that isn't applicable to the child_process module.

I don't have a great solution in mind but as a strawman: try three times and close + log a warning after that? If it happens frequently enough and bothers users enough they will file bug reports about it and we can try to come up with something better.

We could cheat when the ChildProcess instance is managed by the cluster module and do the send-to-another-worker thing. It would be tight coupling but that doesn't bother me personally. Your call.

@@ -516,6 +516,12 @@ function setupChannel(target, channel) {
// a message.
target._send({ cmd: 'NODE_HANDLE_ACK' }, null, true);

// It can happen the handle is not received because of some error on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace "It can happen the handle" with "It is possible that the handle"

@santigimeno santigimeno force-pushed the cluster_fix branch 2 times, most recently from a884a07 to ed6e9fb Compare June 4, 2017 09:15
@santigimeno
Copy link
Member Author

PR updated with @bnoordhuis suggestion. The new commit message would look:

child_process: fix handleless NODE_HANDLE handling

It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

target._send(target._pendingMessage.message,
target._pendingMessage.handle,
target._pendingMessage.options,
target._pendingMessage. callback);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erroneous whitespace

process.emitWarning('Handle didn\'t reach the receiving process ' +
'correctly', 'SentHandleNotReceivedWarning');
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified:

if (target._pendingMessage) {
  if (message.cmd === 'NODE_HANDLE_ACK') {
    closePendingHandle(target);
  } else if (target._pendingMessage.retransmissions++ ===
      MAX_HANDLE_RETRANSMISSIONS) {
    // If the handle didn't reach the receiving process correctly retry up
    // to 3 times. If it still fails, close it and print a warning.
    closePendingHandle(target);
    process.emitWarning('Handle didn\'t reach the receiving process ' +
                        'correctly', 'SentHandleNotReceivedWarning');
  }
}

handle: handle,
options: options,
retransmissions: 0
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could shorten this to:

target._pendingMessage =
    { callback, message, handle, options, retransmissions: 0 };

target._pendingHandle.close();
target._pendingHandle = null;
if ((message.cmd === 'NODE_HANDLE_ACK') ||
(message.cmd === 'NODE_HANDLE_NACK')) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous parens.

@santigimeno santigimeno force-pushed the cluster_fix branch 2 times, most recently from b4815fb to 1dda284 Compare June 7, 2017 21:38
@santigimeno
Copy link
Member Author

PR updated with comments from @BridgeAR and @bnoordhuis addressed. PTAL. Thanks!

@santigimeno
Copy link
Member Author

ping @bnoordhuis @cjihrig :)

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I remember looking at this last week but I must've forgotten to actually submit my review.

LGTM and sorry for the delay!

target._send(target._pendingMessage.message,
target._pendingMessage.handle,
target._pendingMessage.options,
target._pendingMessage.callback);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put braces around the body for legibility?

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests would be great, but LGTM.

} else if (target._pendingMessage.retransmissions++ ===
MAX_HANDLE_RETRANSMISSIONS) {
closePendingHandle(target);
process.emitWarning('Handle didn\'t reach the receiving process ' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: could you change didn't to did not.

@santigimeno
Copy link
Member Author

santigimeno commented Jun 23, 2017

All nits addressed.

Tests would be great

I agree, I tried something that worked for me locally but wasn't reliable at all. If anyone has suggestions would be great.

CI: https://ci.nodejs.org/job/node-test-pull-request/8813/

@santigimeno
Copy link
Member Author

@santigimeno
Copy link
Member Author

It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

PR-URL: nodejs#13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@santigimeno santigimeno merged commit 71ca122 into nodejs:master Jul 1, 2017
@santigimeno
Copy link
Member Author

Landed in 71ca122. Thanks!

addaleax pushed a commit to addaleax/node that referenced this pull request Jul 3, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

PR-URL: nodejs#13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@addaleax addaleax mentioned this pull request Jul 3, 2017
addaleax pushed a commit that referenced this pull request Jul 11, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

PR-URL: #13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 18, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

PR-URL: #13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins
Copy link
Contributor

Landed on v6.x-staging in 010857a18f, let me know if it should be backed out

MylesBorins pushed a commit that referenced this pull request Aug 15, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

Fixes: #9706
PR-URL: #13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Aug 15, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

Fixes: #9706
PR-URL: #13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this pull request Aug 16, 2017
It is possible that `recvmsg()` may return an error on ancillary data
reception when receiving a `NODE_HANDLE` message (for example
`MSG_CTRUNC`). This would end up, if the handle type was `net.Socket`,
on a `message` event with a non null but invalid `sendHandle`. To
improve the situation, send a `NODE_HANDLE_NACK` that'll cause the
sending process to retransmit the message again. In case the same
message is retransmitted 3 times without success, close the handle and
print a warning.

Fixes: #9706
PR-URL: #13235
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Aug 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants