-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #509 - Flag for reviveOffers and the duration for which to reject offers #510
Conversation
One important thing to add is to |
@ConnorDoyle: thanks for reviewing this. We do that here: https://github.com/mesos/chronos/pull/510/files#diff-3b3cdf7fd755c23f1c70baa2d7c90368R49 |
Oops, I missed it. Great! |
4cab5cb
to
db8e1ae
Compare
Added some unit tests. |
c555b2b
to
52a53ae
Compare
@@ -161,6 +164,10 @@ class TaskManager @Inject()(val listeningExecutor: ListeningScheduledExecutorSer | |||
val job = jobOption.get | |||
jobsObserver.apply(JobQueued(job, taskId, attempt)) | |||
} | |||
|
|||
if (config.reviveOffersForNewJobs()) { | |||
mesosOfferReviver.reviveOffers() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a counter here? From the metrics registry.
Code looks good. Can you comment on how this has been tested? Aside from the unit tests (thanks for those, btw). |
…ct offers `--revive_offers_for_new_jobs` if true, revive offers is called when a new job is added to the `TaskManager`. Also note, that Mesos only filters offers that are a strict sub set of a rejected offer. `--decline_offer_duration` allows configuring the duration for which unused offers are declined. `--min_revive_offers_interval` if `--revive_offers_for_new_jobs` is specified, do not call reviveOffers more often than this interval. It defaults to 5 seconds.
52a53ae
to
71a4b2b
Compare
Thanks for having taken the time to review this. The new actor debounces reviveOffers calls, so I added two counters: I added a couple more unit tests and also did some manual testing: Reproducing the starvation
#!/bin/sh
i=0;
while [[ $i -le 9 ]]; do
bin/start-chronos.bash --master 127.0.0.1:5050 \
--graphite_host_port 192.168.99.100:2003 \
--graphite_reporting_interval 5 \
--graphite_group_prefix "chronos_${i}" \
--http_port "808${i}" \
--zk_path "/chronos${i}/state" &>"/tmp/chronos_logs/${i}.log"&
i=$((i+1));
done;
{
"name": "test",
"command": "sleep 500000",
"schedule": "R//PT10M",
"runAsUser": "gaston",
"disk": "35",
"mem": "1"
} {
"name": "starved",
"command": "sleep 20",
"schedule": "R//PT1M",
"runAsUser": "gaston",
"disk": "2",
"mem": "1"
}
Trying out the new flags
|
Looks great to me, nice work. 👍 Merge away. |
Fixes #509 - Flag for reviveOffers and the duration for which to reject offers
@ConnorDoyle @elingg @brndnmtthws
NOTE: I plan to add some tests tomorrow, but I would really appreciate if I could get any feedback in the meantime. This has already been done to Marathon, please see mesosphere/marathon#1931.
Add the following command line flags:
--revive_offers_for_new_jobs
if true, revive offers is called when a new jobis added to the
TaskManager
.Also note, that Mesos only filters offers that are a strict sub set of a
rejected offer.
--decline_offer_duration
allows configuring the duration for which unusedoffers are declined.
--min_revive_offers_interval
if--revive_offers_for_new_jobs
is specified,do not call reviveOffers more often than this interval. It defaults to 5
seconds.