Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate flaky sequential/test-benchmark-child-process on Windows #12560

Closed
Trott opened this issue Apr 21, 2017 · 9 comments
Closed

investigate flaky sequential/test-benchmark-child-process on Windows #12560

Trott opened this issue Apr 21, 2017 · 9 comments
Labels
benchmark Issues and PRs related to the benchmark subsystem. child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.

Comments

@Trott
Copy link
Member

Trott commented Apr 21, 2017

  • Version: v8.0.0-pre
  • Platform: win2008r2
  • Subsystem: test

sequential/test-benchmark-child-process is still failing sometimes flaky on Windows in CI. I'll open a PR to mark it as flaky. This issue is for trying to locate the problem and a solution.

When the test succeeds, it seems to take just a few seconds.

https://ci.nodejs.org/job/node-test-binary-windows/RUN_SUBSET=3,VS_VERSION=vs2015,label=win2008r2/7865/console

ok 356 sequential/test-benchmark-child-process
  ---
  duration_ms: 1.764

When it fails, it's a timeout.

https://ci.nodejs.org/job/node-test-binary-windows/7867/RUN_SUBSET=3,VS_VERSION=vs2015,label=win2008r2/console

not ok 356 sequential/test-benchmark-child-process
  ---
  duration_ms: 60.70
  severity: fail
  stack: |-
    timeout

This would suggest a race condition or something else causing a child process to hang or something. And that might be the cause. But...

Interestingly, a stress test where the five benchmarks that this test calls were all split out into individual tests, succeeded but each test took around 30 seconds to run. Wha??!! I know! (Only other change in those tests is the dur option for the benchmarks was increased from 0 to 0.1. Well, that, and that this test was run on win2016 so maybe the results are completely irrelevant? I don't know.)

https://ci.nodejs.org/job/node-stress-single-test/1161/nodes=win2016/console:

ok 1 sequential/test-benchmark-child-process-exec-stdout
  ---
  duration_ms: 3.158
  ...
ok 2 sequential/test-benchmark-child-process-params
  ---
  duration_ms: 36.735
  ...
ok 3 sequential/test-benchmark-child-process-read-ipc
  ---
  duration_ms: 30.116
  ...
ok 4 sequential/test-benchmark-child-process-read
  ---
  duration_ms: 31.844
  ...
ok 5 sequential/test-benchmark-child-process-spawn-echo
  ---
  duration_ms: 31.661

So I'm not sure what's going on here. Maybe it can be worked out by someone more comfortable testing and debugging on Windows or someone more deeply familiar with child_process and/or our benchmarking code. @nodejs/platform-windows @nodejs/benchmarking @mscdex @cjihrig @bnoordhuis @nodejs/testing

@Trott Trott added benchmark Issues and PRs related to the benchmark subsystem. child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform. labels Apr 21, 2017
Trott added a commit to Trott/io.js that referenced this issue Apr 21, 2017
sequential/test-benchmark-child-process is still failing sometimes flaky
on Windows in CI. Mark it as flaky in sequential.status until it gets
sorted.

Refs: nodejs#12560
@Trott
Copy link
Member Author

Trott commented Apr 21, 2017

Guess it may be useful to loop in @nodejs/build too in case there's something relevant to know about the win2008r2 hosts....

Trott added a commit to Trott/io.js that referenced this issue Apr 21, 2017
sequential/test-benchmark-child-process is still failing sometimes flaky
on Windows in CI. Mark it as flaky in sequential.status until it gets
sorted.

PR-URL: nodejs#12561
Ref: nodejs#12560
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
@joaocgreis
Copy link
Member

Looking at https://ci.nodejs.org/computer/test-azure_msft-win2016-x64-6/builds , the job that run right after it killed a left-over yes.exe process: https://ci.nodejs.org/job/node-test-binary-windows/RUN_SUBSET=2,VS_VERSION=vs2015,label=win2016/7860/consoleFull

c:\workspace\node-test-binary-windows\RUN_SUBSET\2\VS_VERSION\vs2015\label\win2016>TASKKILL /F /IM yes.exe /T   || TRUE
SUCCESS: The process with PID 6256 (child process of PID 6652) has been terminated.

Could that first test be leaving a yes.exe behind draining CPU?

@refack
Copy link
Contributor

refack commented Apr 21, 2017

I'm looking as well.

@Trott
Copy link
Member Author

Trott commented Apr 21, 2017

Could that first test be leaving a yes.exe behind draining CPU?

Sounds about right since the benchmarks use yes.exe. I'm not sure why yes.exe isn't terminating reliably, of course....

@Trott
Copy link
Member Author

Trott commented Apr 21, 2017

@joaocgreis @refack Thanks for looking at this, by the way! I'm very grateful for that.

@Trott
Copy link
Member Author

Trott commented Apr 24, 2017

So, we need a way to make sure yes.exe is reliably terminated after the benchmarks that use it are done. Has anyone else looked into this and gotten further than that? Just that info we already have is hugely helpful, but if there's more info to work with, I certainly want it. :-D

@bzoz
Copy link
Contributor

bzoz commented Apr 25, 2017

child_process/child-process-exec-stdout benchmark uses exec('yes', ..) which will spawn yes inside a shell. Killing the shell does not always kill yes, so we get a leftover yes from time to time.

We can try using

require('child_process').execSync(`taskkill /f /t /pid ${child.pid}`)

instead of child.kill(). The /t switch is for terminating entire process tree. From my tests it works, but I haven gone to testing if this solves test-benchmark-child-process flakiness.

evanlucas pushed a commit that referenced this issue Apr 25, 2017
sequential/test-benchmark-child-process is still failing sometimes flaky
on Windows in CI. Mark it as flaky in sequential.status until it gets
sorted.

PR-URL: #12561
Ref: #12560
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
@Trott
Copy link
Member Author

Trott commented Apr 25, 2017

@bzoz Thanks! Awesome. I'll test that out using the Jenkins stress test job and we'll see how it does....

Trott added a commit to Trott/io.js that referenced this issue Apr 25, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

Refs: nodejs#12560
Trott added a commit to Trott/io.js that referenced this issue Apr 28, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: nodejs#12658
Ref: nodejs#12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
evanlucas pushed a commit that referenced this issue May 1, 2017
sequential/test-benchmark-child-process is still failing sometimes flaky
on Windows in CI. Mark it as flaky in sequential.status until it gets
sorted.

PR-URL: #12561
Ref: #12560
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
evanlucas pushed a commit that referenced this issue May 1, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
evanlucas pushed a commit that referenced this issue May 2, 2017
sequential/test-benchmark-child-process is still failing sometimes flaky
on Windows in CI. Mark it as flaky in sequential.status until it gets
sorted.

PR-URL: #12561
Ref: #12560
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Daniel Bevenius <[email protected]>
evanlucas pushed a commit that referenced this issue May 2, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
evanlucas pushed a commit that referenced this issue May 2, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
evanlucas pushed a commit that referenced this issue May 3, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
gibfahn pushed a commit that referenced this issue May 16, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue May 18, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: #12658
Ref: #12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
@Trott
Copy link
Member Author

Trott commented May 22, 2017

Fixed in #12658, I believe.

@Trott Trott closed this as completed May 22, 2017
andrew749 pushed a commit to michielbaird/node that referenced this issue Jul 19, 2017
test-benchmark-child-process failures reveal that
child-process-exec-stdout benchmark sometimes leaves around a stray
yes.exe process. Add code to terminate the process.

PR-URL: nodejs/node#12658
Ref: nodejs/node#12560
Reviewed-By: Refael Ackermann <[email protected]>
Reviewed-By: Bartosz Sosnowski <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Issues and PRs related to the benchmark subsystem. child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

No branches or pull requests

4 participants