Add migration script for migrating old execution field data to the new format #5255
Conversation
… the old and very slow field type to the new field type. This is really an optional script, since all new objects will already utilize the new field type and most users only care about new / recent executions (old executions are not retrieved that often, so taking a bit longer to load them is not the end of the world).
Force-pushed from 339b2fb to b9fc5e0.
Use a MongoDB field $type query to determine if a particular object / field should be migrated.
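As a hedged sketch of the approach this commit describes, a `$type` filter can select only documents whose field still uses the old representation. The collection name, field name, and BSON types below are assumptions for illustration, not the script's actual code:

```python
# Hypothetical sketch: build a MongoDB $type filter that matches documents
# whose field is still stored as a plain BSON object (the assumed old
# representation), so only not-yet-migrated documents are selected.
def old_format_filter(field_name: str) -> dict:
    # $type filters on the BSON type of the stored value.
    return {field_name: {"$type": "object"}}


# The filter would then be passed to a query such as (collection name assumed):
#   db.action_execution_d_b.find(old_format_filter("result"), {"_id": 1})
query = old_format_filter("result")
```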
Cross-test pollution and related failures.
Previously, the setup-virtualenv step would fail when the Python version got upgraded, since we cached per partial Python version rather than the complete one. I believe this change fixes that issue.
I also pushed a change so we specify the complete Python version for the setup-python action and the virtualenv cache. Previously we didn't use the complete version for the virtualenv directory cache, which means builds started to fail once GitHub Actions upgraded Python 3.8 from 3.8.9 to 3.8.10 (which looks like it happened some time in the last couple of days). Example builds:
Technically, we could still specify only the major + minor version, but we would still need a cache per major + minor + patch version, which means we would need another GitHub Actions step that captures the complete version after the setup-python action, and that just adds additional complexity. /cc @cognifloyd
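A minimal sketch of the fix described above, assuming a typical GitHub Actions workflow (step names and the cache path are illustrative, not the repo's actual workflow):

```yaml
- name: Set up Python
  uses: actions/setup-python@v2
  with:
    # Pin the complete major.minor.patch version rather than just "3.8", so
    # the behavior is stable when GitHub Actions bumps the patch release.
    python-version: "3.8.10"

- name: Cache virtualenv
  uses: actions/cache@v2
  with:
    path: virtualenv
    # Include the full version in the cache key so the cached virtualenv is
    # invalidated whenever the Python patch version changes.
    key: ${{ runner.os }}-python-3.8.10-${{ hashFiles('requirements.txt') }}
```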
I also included the eventlet bump in this PR - that security issue doesn't affect us directly since we don't utilize the websocket functionality, but it's still a good idea to upgrade.
Force-pushed from 24a3d74 to 5338a9c.
@guzzijones Does that execution have …? Since that's what … EDIT: It could be that the migration script doesn't set …
Temporary workaround for AJ, which will be removed after he runs this script and confirms it works.
@guzzijones This should do the trick for you - 2a0969f. As you can see from the commit message and code, it includes a "one off" code change for your situation - once you confirm it works, I will remove it, since it won't be needed anymore for other users. I confirmed that change myself - ran a mongo query to unset the result size (…).
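The "unset result size" step mentioned above can be sketched as the update document passed to MongoDB. The `result_size` field name is inferred from this discussion, and the collection name in the comment is an assumption:

```python
# Hypothetical sketch: build the MongoDB update document that removes a
# field, so the migration script treats the execution as unmigrated again.
def unset_field(field_name: str) -> dict:
    # $unset removes the named field from matching documents.
    return {"$unset": {field_name: ""}}


# Would be applied with something like (collection name assumed):
#   db.action_execution_d_b.update_many({}, unset_field("result_size"))
update_doc = unset_field("result_size")
```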
This is a new execution I just ran with the latest build of master.
st2-3.5dev-65.x86_64.rpm
What happens if you manually run a cURL command to retrieve the execution details with the ?max_result_size query param? And as per my comment above - inspect the collection using the mongo engine and make sure that the execution contains … I can't reproduce it locally - with latest master it works fine for all the new (and migrated) executions.
I will need to look up how to query mongo directly. Let me see if I can get the latest RPM installed, run a new workflow, and check the st2web result.
Something like this should work in the mongo shell:

```
use st2;
db.action_execution_d_b.find({"_id": ObjectId("ID")}).pretty();
```
It says the result size is 232? This JSON is 3 MB.
FWIW, this is a workflow, not an action.
@guzzijones Ah, that does change everything - for workflow executions, I believe the actual execution result field will just contain what the workflow explicitly publishes as output (but I need to double check and confirm). How does the workflow work - does it publish the whole result / similar? A workflow which reproduces this issue would be ideal. But it could also be that it works as expected and it's more of an issue / feature request for st2web.
The problem is the input to the workflow is huge. And that is displayed via st2web. |
@guzzijones If it's indeed related to the input (workflow action parameters?) then it has nothing to do with this functionality and migration script - this one only covers execution results (same for the related st2web change). Having said that, please still post all the context needed to reproduce it (workflow + action definitions + data / parameters which reproduce it). But if it's indeed not related to the result attribute itself, then you will need to open a new issue (feature request) for it, likely against the st2web repo.
Yes, it is a feature request against st2web. Whatever you did for the result we also need to do for input parameters.
That will probably be quite an involved change, and it would likely also need "support" on the server side if we want to implement it efficiently (aka adding …). So sadly, likely nothing which will happen any time soon.
@guzzijones Also, have you re-run the migration script yet with my temporary change (migrating executions without the result_size field)? Please let me know when you finish running it so I can remove that temporary change.
Go ahead and revert that change.
@guzzijones Thanks for letting me know - will revert it. |
@winem I believe all your feedback has been addressed and this is now ready for a final look. |
```python
# We retrieve and process up to this amount of object IDs in a single MongoDB query. This is done
# to avoid potentially retrieving many tens of millions of object IDs in a single query.
# TODO: Implement batching --since-ts, --model=action,trigger,workflow flag
BATCH_SIZE = 1000
```
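A batch-size cap like this is typically used to process object IDs in fixed-size chunks instead of loading everything at once. A minimal sketch of that pattern, with names that are illustrative rather than taken from the actual migration script:

```python
# Hypothetical sketch: yield successive batches of at most batch_size IDs,
# so each MongoDB query only touches a bounded number of objects.
def chunked(ids, batch_size=1000):
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Usage: for batch in chunked(all_ids): migrate(batch)
```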
Looks like batching is not yet implemented, and I guess this variable definition is obsolete? I'll approve the PR to unblock you and all those looking forward to the fix for the apt cache issue, as this is just a minor issue.
Yep, this is not needed anymore - I won't implement it, since --start-dt and --end-dt were implemented, which can be used to achieve a similar end result.
Will remove it.
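The --start-dt / --end-dt flags mentioned above could bound which executions get migrated; a hedged sketch of how such flags might be parsed (the real script's flag handling may differ):

```python
# Hypothetical sketch of parsing --start-dt / --end-dt bounds for the
# migration; argument names are taken from the discussion above, the rest
# is illustrative.
import argparse
from datetime import datetime


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Migrate old execution field data")
    parser.add_argument("--start-dt", type=datetime.fromisoformat, default=None,
                        help="Only migrate executions at or after this ISO timestamp")
    parser.add_argument("--end-dt", type=datetime.fromisoformat, default=None,
                        help="Only migrate executions at or before this ISO timestamp")
    return parser.parse_args(argv)
```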
This pull request adds a simple migration script which migrates data for executions that utilize the old, slow and inefficient field type to the new type.
I tested it locally and it works fine (it's also a bit of a PITA to test manually; the easiest approach is to just create a MongoDB snapshot with an old version of st2 and re-use it with st2 master).
Implementation Details
#4846 introduced new field types which are up to 10-30x faster when saving and retrieving objects with larger values.
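As an illustrative model only: the speedup comes from storing a large value as one serialized JSON blob instead of a deeply nested document that must be escaped and traversed field by field. This is a simplified sketch of the idea, not StackStorm's actual field implementation:

```python
# Simplified model of the new-style field: serialize the whole value once
# into an opaque blob, which is cheap to save and load compared to storing
# (and escaping) every nested key as a separate document field.
import json


def to_storage(value: dict) -> bytes:
    return json.dumps(value).encode("utf-8")


def from_storage(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))
```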
In that PR I wasn't a massive fan of a migration script - yes, I could add one, but I had already spent many, many days on those performance optimization changes, and I think a migration script adds little value while the effort is not totally trivial. The code change itself mostly is trivial, but it's not just a code change - it includes test cases, documentation, upgrade notes, etc., and that takes quite some effort.
The migration script doesn't provide tons of value because it only works on old executions, and most users only care about new and recent executions, so if viewing data for some old execution takes longer to load, it's not the end of the world.
I would be all for migration if it would actually provide more value - e.g. reduce the maintenance burden / amount of code we need to maintain - but that is not the case, and it likely won't change any time soon. We still utilize those two field types in many other places, and the migration script step is optional and manual; we can't really force users to run it and assume that by the next release everyone has done so, which means we can't just remove that code.
For now, I only implemented it for execution related objects. Technically we could also do it for workflow, trigger instance and rule related objects, but that would require even more work and some more "workarounds" to determine whether a specific object in the database utilizes the old type or not - it's possible to detect that, but it's not totally trivial, and just re-writing all objects seems wasteful.
Complete contributions (with tests) for other models are also welcome - especially in this case, where the change itself will likely save many tens of thousands of dollars (and likely much more) per month in CPU utilization across all the large StackStorm installations, so contributing such a small change seems like a worthwhile trade-off for such a big improvement :).
TODO