-
-
Notifications
You must be signed in to change notification settings - Fork 859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve --single-directory sync performance #992
Conversation
This commit modifies code inside applyDifferences function, the part responsible for deleting changes/moving them out of the selected sync directories. This change makes an HTTP request responsible for checking whether a changed item still exists on OneDrive only be sent only if it may in fact influence whether an item will be deleted from the synced folder or not or just discarded. This makes an enormous performance boost, because it limits redundant HTTP requests that ask about changed items that will be discarded or not. Signed-off-by: Mateusz P-K <[email protected]>
@kapec94
The premise behind this was when files / folders were moved either into scope of sync_list or now out of scope. With this code change (will test shortly) to see if this is still working correctly for moving files into / out of scope and correctly deleting / adding locally, but always remaining on OneDrive. Those files should only be deleted on OneDrive if deleted locally, when in sync scope.
I think you are right here. Those returns should not be there, and if executed, if there were any other changes to process, those returns would prevent them from being processed.
This call should be performing a 'search match' to see if 'originalLocalPath' is anywhere in 'syncFolderChildPath'. Depending on account types (business / personal) and folder types (local in account OneDrive, remote in another users account). This only ever gets set if the item object from OneDrive contains a 'parentReference' - which at this stage only ever is set when a shared folder is added to the users account via the 'Add to my OneDrive' web function (caveat: In testing business accounts / Sharepoint - this is also set) .. so yes ... 99% of the time this should return false, unless the item is from a shared folder (or one of the business accounts that add the data in - MS Graph API is at times very inconsistent).
Yes ... I know. I have some basic testing via the Travis CI process (which is also why this PR is showing up as failed because you do not have access to the private key) - see #711 for further details. Basically, it would be ideal to have a script or process that can generate data / config & test each of the options to ensure that nothing is broken. This way anyone could run those tests either with their own account or they create a test account. The challenge here would be 'shared folder' testing, which right now I have to do all manually with a few 'test' accounts that I have created. |
* Remove erroneous 'return' statement which could prematurely end processing all changes returned from OneDrive
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Hello @abraunegg,
I've prepared a PR that immensely improves performance of --single-directory and sync_list synchronizations, especially those that require iterating through full OneDrive tree.
Disclaimer
Please note that this is simply a change proposal that may or may not be acceptable yet, but I'm willing to cooperate to make it better.
What does this change include?
This PR moves
oneDriveMovedNotDeleted
check part fromsync.d:1597
from the beginning of the code block just beforeidsToDelete
modification insync.d:1643
. What it does is thatonedrive.getPathDetailsById
invocation is only made if it can really make an impact whether a change item is to be discarded or deleted in synced directory.Why is the change made?
Such change is important, because
onedrive.getPathDetailsById
invocation there is responsible for almost 90% of the time spent inside that function if there are a lot of discarded change items (I've done some profiling with valgrind). Example: on my 200 GB OneDrive: syncing --single-directory of a folder with one file took almost 12 hours before I stopped it out of pity (and it didn't finish!) (all the time was spent discarding changes).How&why does this change works?
Basically said code block makes three mostly independent checks:
Item is only deleted from synced folder if all of those checks are true. In every other case the change is discarded. So no matter the result of check 1., change will be discarded if check 2. and check 3. are false. The solution made is to make check 1. (which is extremely expensive) execute only after checks 2. and 3. pass.
That way changes that would be discarded because of check 2. or 3. are processed much faster. 10x faster.
What is the benefit?
Is the change tested?
Well... Didn't find any automated tests around, so just made several smoke tests, like checking if the code works and it works the same for basic --single-directory operations like move, delete, out-of-folder delete. It does work and it does work the same. But I didn't manage to trigger
oneDriveMovedNotDeleted
check code path in any case.Notes
I've just made a PR, instead of first reporting a ticket, because I needed to have this made change now for my private project needs.
I would also like to discuss with you several findings around the code I've modified:
return
s inoneDriveMovedNotDeleted
finally blocks affect the code (for examplesync.d:1605
). If suchreturn
statement is executed, thenapplyDifferences
function is interrupted and changes are not further processed. Are those returns there for any particular purpose?canFind(originalLocalPath, syncFolderChildPath)
call insync.d:1639
.originalLocalPath
evaluates to the file path that is not prefixed by/drive/root
, assyncFolderChildPath
variable is, so thatcanFind
call always seem to returnfalse
. Unless I'm missing something?I would really appreciate getting in touch with you to try making that change. I feel like it has a potential to make your tool's experience even greater.