Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vyokky/dev Agent and automator modularization + Learning for demonstration #50

Merged
merged 49 commits into from
Apr 8, 2024

Conversation

vyokky
Copy link
Contributor

@vyokky vyokky commented Apr 2, 2024

  1. Extract agent module for HostAgent and AppAgent, intergrate into main flow.
  2. Build automator module for AppAgent, for UI controls and API.
  3. Learning from human demonstration (Yun Hao)

@vyokky vyokky requested a review from kangyu April 2, 2024 13:25
"""

user_action_data = re.search(
r'<UserActionData>(.*?)</UserActionData>', content, re.DOTALL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many hard-code implementation like this. Need some regression test to secure, in case the PSR changes its format in future release.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the regression test. Will discuss offline about the implementation.

UFO can learn from user-provided demonstrations for specific requests and use them as references in the future when encountering similar tasks. Providing clear demonstrations along with precise requests can significantly enhance UFO's performance.

## How to Enable and Config this Function ❓
You can enable this function by setting the following configuration:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to mention which configuration file as we have two in UFO to set these configs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will update.

screenshot = self.get_screenshot(screenshot_file_name)
step_key = f"step_{int(action_number) - 1}"

step = DemonstrationStep(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the fields may not be extracted correctly in some cases (e.g., PSR changes it format). May need sanity check on the values.


return record

def find_boundary(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are private methods, could use private function definition.

Suggested change
def find_boundary(self) -> str:
def __find_boundary(self) -> str:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will update

summary = dict()
summary["example"] = {}
for key in ["Observation", "Thought", "ControlLabel", "ControlText", "Function", "Args", "Status", "Plan", "Comment"]:
summary["example"][key] = response_json.get(key, "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we use empty string if the key is not in the set? May add some comments on this.

self.logger = self.initialize_logger(self.log_path, "response.log")
self.request_logger = self.initialize_logger(self.log_path, "request.log")

self.app_selection_prompter = ApplicationAgentPrompter(configs["APP_AGENT"]["VISUAL_MODE"], configs["APP_SELECTION_PROMPT"], configs["APP_SELECTION_EXAMPLE_PROMPT"], configs["API_PROMPT"])
self.act_selection_prompter = ActionAgentPrompter(configs["ACTION_AGENT"]["VISUAL_MODE"], configs["ACTION_SELECTION_PROMPT"], configs["ACTION_SELECTION_EXAMPLE_PROMPT"], configs["API_PROMPT"])
self.HostAgent = HostAgent("HostAgent", configs["HOST_AGENT"]["VISUAL_MODE"], configs["HOSTAGENT_PROMPT"], configs["HOSTAGENT_EXAMPLE_PROMPT"], configs["API_PROMPT"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have some way to protect the value errors in config class. In case some config items are missing.

@@ -81,113 +74,118 @@ def process_application_selection(self):
"""

# Code for selecting an action
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is too long to read. May split it into several sub functions, to reveal the logic steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Will update the func and break it into different sub-steps.

ufo/module/flow.py Outdated Show resolved Hide resolved

if self.safe_guard(action, control_text):
# Execute the action
results = ui_controller.execution(function_call, args)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UI controller to execute function call is wired. These function calls are actually operations instead of API calls, right? What about use different name here?

@@ -1,31 +1,28 @@
# Copyright (c) Microsoft Corporation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think this file is too long. Some functions could be split and put into different files/classes. Otherwise, it is not easy to read and follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Perhaps will break it down for next update.

@vyokky vyokky merged commit 1f6312e into pre-release Apr 8, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants