GAM errors out printing devices if it takes > 1 hour to fetch them all #1534

Closed
jay0lee opened this issue Jun 30, 2022 · 3 comments

Comments

@jay0lee
Member

jay0lee commented Jun 30, 2022

Steps to reproduce

  • Run gam print devices on a domain with a large number (150,000 or more) of devices. After running for one hour, GAM fails with an error like ERROR: 400 - Request contains an invalid argument. - 400
  • For a domain with fewer enrolled devices, it's possible to simulate this issue by setting pageSize=1 here and then adding a sleep(600) at the bottom of the while True: loop here. This tells GAM to retrieve only one device per page (the default is 100) and to sleep 10 minutes between page retrievals. Assuming you have at least 6 devices, the process should fail in just over an hour, just like the above issue in a large domain.
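
The simulation can be sketched without touching a real domain. The following is a minimal model, not GAM's code: FakeDevicesApi, FakeClock, InvalidArgument and print_devices are hypothetical names, the one-hour token lifetime is taken from the behavior described in this issue, and the one-second request latency is an assumption added so that the run lands "just over" an hour.

```python
class InvalidArgument(Exception):
    """Stands in for 'ERROR: 400 - Request contains an invalid argument.'"""


class FakeClock:
    """Simulated clock so the example runs instantly instead of sleeping."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


class FakeDevicesApi:
    """Hypothetical stand-in for devices.list(); not the real API client."""

    TOKEN_LIFETIME = 3600.0  # seconds a page sequence stays valid (from the issue)
    REQUEST_LATENCY = 1.0    # assumed time each request takes

    def __init__(self, device_count, clock):
        self.devices = [f"device-{i}" for i in range(device_count)]
        self.clock = clock

    def list(self, page_size, page_token=None):
        if page_token is None:
            # First page: this call starts the one-hour window.
            offset, started = 0, self.clock.now
        else:
            offset, started = page_token
            if self.clock.now - started > self.TOKEN_LIFETIME:
                raise InvalidArgument("400 - Request contains an invalid argument.")
        self.clock.sleep(self.REQUEST_LATENCY)
        page = self.devices[offset:offset + page_size]
        more = offset + page_size < len(self.devices)
        return page, ((offset + page_size, started) if more else None)


def print_devices(api, clock, page_size=100, delay=0):
    """Page through every device, optionally sleeping between pages."""
    retrieved = []
    page_token = None
    while True:
        page, page_token = api.list(page_size, page_token)
        retrieved.extend(page)
        if page_token is None:
            return retrieved
        clock.sleep(delay)  # plays the role of sleep(600) at the bottom of the loop


clock = FakeClock()
api = FakeDevicesApi(device_count=10, clock=clock)
try:
    print_devices(api, clock, page_size=1, delay=600)
    outcome = "completed"
except InvalidArgument:
    outcome = f"failed after {clock.now:.0f} simulated seconds"
print(outcome)
```

In this model the run fails on the seventh page request, a little over one simulated hour after the first page. With the default page_size=100 and no delay, the same ten devices come back in a single call.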

Further detail

This is Google internal bug 237397223. It seems that a sequence of pages (a book, if you will), retrieved as a script loops through nextPageToken values, is only good for an hour. If you try to retrieve a page in that sequence more than one hour after the first page was retrieved (the call where pageToken was not set), the API call fails with the above error.

Most customers don't hit this because an hour is plenty of time to retrieve tens of thousands of devices, but large customers with more than 150,000 devices will run into it.

It's also worth noting that deviceUsers.list() faces the same issue.

Other Google API list() calls may face similar issues and should probably be tested.

Workaround

To work around this issue we can try:

  • Retrieve the first hour of pages from the devices.list() API with the orderBy=creation_time parameter set. That parameter ensures we retrieve the oldest devices first.
  • As we retrieve pages of devices in this first hour, look at the createTime field of each retrieved device and keep track of the newest device we see.
  • When we do get the 400 error, start over with another devices.list() API call and a new set of pages, but this time also set the filter parameter. The filter can be set to something like filter=register:<create_time_of_newest_device>, using the create time of the newest device we've already retrieved. This essentially lets us pick up where we left off when the Google servers threw an error. We do need to de-dupe the results, since Google will send us the last device again, but that can be handled relatively easily.
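
The three steps above could look roughly like the sketch below. It is an illustration under assumptions, not GAM's actual implementation: fetch_page is a hypothetical wrapper around devices.list() that already requests oldest-first ordering and returns (devices, next_page_token), PageTokenExpired is a hypothetical exception standing in for the 400 error, devices are assumed to carry name and createTime fields, and createTime values are assumed to be identically formatted timestamp strings so that plain string comparison orders them. The filter string follows the "something like" form given above, and its exact syntax is unverified.

```python
from typing import Callable, Dict, List, Optional, Tuple


class PageTokenExpired(Exception):
    """Hypothetical stand-in for the HTTP 400 raised for a stale pageToken."""


# Hypothetical signature: fetch_page(page_token, filter) -> (devices, next_page_token)
FetchPage = Callable[[Optional[str], Optional[str]],
                     Tuple[List[Dict], Optional[str]]]


def list_all_devices(fetch_page: FetchPage) -> List[Dict]:
    """Page through all devices, restarting with a filter when the token expires."""
    devices: List[Dict] = []
    seen = set()                     # device names already collected, for de-duping
    newest: Optional[str] = None     # createTime of the newest device seen so far
    page_token: Optional[str] = None
    filter_: Optional[str] = None

    while True:
        try:
            page, next_token = fetch_page(page_token, filter_)
        except PageTokenExpired:
            if page_token is None or newest is None:
                # A failure on a first page is not the expiry bug; give up.
                raise
            # Start a new page sequence from the newest device already retrieved.
            filter_ = f"register:{newest}"
            page_token = None
            continue

        for device in page:
            if device["name"] in seen:
                continue             # the restart re-sends the last device(s)
            seen.add(device["name"])
            devices.append(device)
            if newest is None or device["createTime"] > newest:
                newest = device["createTime"]

        if not next_token:
            return devices
        page_token = next_token
```

De-duping on the device name rather than on createTime keeps the result correct even when several devices share the same creation timestamp, since all of them are returned again after the restart.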
@unextro

unextro commented Aug 17, 2022

This "bug" drove me crazy last year. :) https://groups.google.com/g/google-apps-manager/c/hMywF_k1FxI/m/lYXSGlJXCwAJ

I just tried to export all browsers with gam print browsers fields browsers, and it still seems to behave the same for me (ending with an error after one hour). Was the fix supposed to cover browser export as well, or just ChromeOS devices?

@jay0lee
Member Author

jay0lee commented Apr 18, 2023

This issue should be fixed on Google's end now. The pageToken should continue to work even after 1 hour. Work remains to back out GAM's complicated workarounds that were needed to deal with the bug.

@taers232c fyi

@taers232c
Contributor

There was a similar bug when printing cros telemetry. Do you know if that is fixed as well?

Thanks
