🤖 A collection of AI agent resources, including research papers, blogs, and products focused on developing autonomous systems.

algomatic-inc/awesome-ai-agents-guide

Awesome AI Agents Guide

Algomatic PR Welcome License: MIT

The awesome-ai-agents-guide repository is an initial effort to put together a comprehensive list of AI/LLM agents, covering both research and products.

Please note, this repository is a voluntary project and does not list all existing AI agents. This repository is a work in progress, and items are being added gradually. Contributions are welcome, so we look forward to your proactive PRs!

Disclaimer
・ If you find any errors in interpretation or quotation, please let us know.
・ Please check the license and terms of use of each listed resource before using it.

🌟 If this was helpful, we’d love it if you followed us: @AlgomaticJP

Overview


AI Agent

There is no single agreed-upon definition of an AI agent, but one commonly cited definition is the following.

An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.
--- Franklin and Graesser (1997)
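
As a rough illustration of the definition above, the sense-act loop can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `Room` environment, the `ThermostatAgent`, and all of the numbers are hypothetical names and values invented for this example, and nothing here comes from Franklin and Graesser or from any listed paper. The agent's "agenda" is a target temperature, and each action changes what the agent senses on the next step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Room:
    """Hypothetical toy environment: a room that cools unless heated."""
    temperature: float = 15.0

    def step(self, heating: bool) -> None:
        # Heating raises the temperature by 1.0; otherwise it drops by 0.5.
        self.temperature += 1.0 if heating else -0.5


@dataclass
class ThermostatAgent:
    """Hypothetical toy agent whose agenda is to keep the room near a target."""
    target: float = 20.0

    def sense(self, room: Room) -> float:
        # Read the current state of the environment.
        return room.temperature

    def act(self, observed: float) -> bool:
        # Turn the heating on only while the room is below the target.
        return observed < self.target


def run(steps: int = 20) -> List[float]:
    """Run the sense-act loop and return the temperature after each step."""
    room = Room()
    agent = ThermostatAgent()
    history: List[float] = []
    for _ in range(steps):
        observed = agent.sense(room)   # sense the environment
        heating = agent.act(observed)  # choose an action in pursuit of the agenda
        room.step(heating)             # the action affects what is sensed next
        history.append(room.temperature)
    return history


if __name__ == "__main__":
    print(run())
```

LLM-based agents in the works listed below replace the hand-written `act` rule with a model call and the room with tools, web pages, or other agents, but the loop has the same shape.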

Survey

  • 2023.09 - Xi et al., The Rise and Potential of Large Language Model Based Agents: A Survey [arXiv]
  • 2023.08 - Wang et al., A Survey on Large Language Model Based Autonomous Agents [arXiv]

Workshop or Tutorial

  • 2024.06 - CVPR 2024 Tutorial on Generalist Agent AI [Home]
  • 2024.05 - ICLR 2024 Workshop on LLM Agents [Home]
  • 2023.08 - IJCAI 2023 Symposium on Large Language Models (LLM 2023) [Home]

Misc

  • 2025.XX - Micheal Lanham, GPT Agents in Action [manning]
  • 2024.XX - LangChain, Go autonomous with LangChain Agents [langchain]
  • 2024.06 - DeepLearning.AI, AI Agents in LangGraph [deeplearning.ai]
  • 2024.05 - DeepLearning.AI, AI Agentic Design Patterns with AutoGen [deeplearning.ai]
  • 2024.05 - DeepLearning.AI, Multi AI Agent Systems with crewAI [deeplearning.ai]
  • 2024.05 - DeepLearning.AI, Building Agentic RAG with LlamaIndex [deeplearning.ai]
  • 2024.05 - Yohei Nakajima, Future of Autonomous Agents [X's broadcast]
  • 2024.03 - Andrew Ng, Agentic Design Patterns Part 1 [deeplearning.ai]
  • 2024.02 - Vincent Koc, Generative AI Design Patterns: A Comprehensive Guide [Medium]
  • 2023.12 - OpenAI, Practices for Governing Agentic AI Systems [OpenAI]
  • 2023.12 - Victor Dibia, Multi-Agent LLM Applications | A Review of Current Research, Tools, and Challenges [newsletter]
  • 2023.11 - Tanay Varshney, Introduction to LLM Agents [NVIDIA Blog]
  • 2023.06 - Lilian Weng, LLM Powered Autonomous Agents [Lil'Log]
  • Prompt Engineering Guide, LLM Agents [promptingguide.ai]

Libraries

Planning and Reasoning

Survey

  • 2024.02 - Huang et al., Understanding the planning of LLM agents: A survey [arXiv]
  • 2023.12 - Sun et al., A Survey of Reasoning with Foundation Models [arXiv]
  • 2023.03 - Yang et al., Foundation Models for Decision Making: Problems, Methods, and Opportunities [arXiv]

Papers

  • 2024 - Chen et al., When is Tree Search Useful for LLM Planning? It Depends on the Discriminator (ACL) [arXiv]

  • 2024 - Kim et al., An LLM Compiler for Parallel Function Calling (ICML) [arXiv]

  • 2024 - Ning et al., Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation (ICLR) [openreview]

  • 2023 - Hao et al., Reasoning with Language Model is Planning with World Model (EMNLP) [aclanthology]

  • 2023 - Khot et al., Decomposed Prompting: A Modular Approach for Solving Complex Tasks [openreview]

  • 2023 - Wang et al., Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (ACL) [aclanthology]

  • 2024.03 - Zhu et al., KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents [arXiv]

  • 2024.02 - Kambhampati et al., LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks [arXiv]

  • 2023.08 - Dagan et al., Dynamic Planning with a LLM [arXiv]

  • 2023.05 - Xu et al., ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models [arXiv]

  • 2023.05 - Brahman et al., PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning [arXiv]

Action

WIP

Survey

  • 2024.03 - Wang et al., What Are Tools Anyway? A Survey from the Language Model Perspective [arXiv]
  • 2023.04 - Qin et al., Tool Learning with Foundation Models [arXiv]

Papers

  • 2024.03 - Wang et al., LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [arXiv]

  • 2024.02 - Das et al., MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [arXiv]

  • 2024.02 - Du et al., AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls [arXiv]

  • 2024.02 - Mekala et al., TOOLVERIFIER: Generalization to New Tools via Self-Verification [arXiv]

  • 2024.01 - Shen et al., Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [arXiv]

  • 2024.01 - Gao et al., Efficient Tool Use with Chain-of-Abstraction Reasoning [arXiv]

  • 2024.01 - Yuan et al., EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [arXiv]

  • 2023.12 - NexusRaven-V2: Surpassing GPT-4 for Zero-shot Function Calling [Nexusflow]

  • 2023.08 - Hsieh et al., Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models [arXiv]

  • 2023.07 - Qin et al., ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [arXiv]

  • 2023.06 - Song et al., RestGPT: Connecting Large Language Models with Real-World RESTful APIs [arXiv]

  • 2023.06 - Tang et al., ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [arXiv]

  • 2023.05 - Cai et al., Large Language Models as Tool Makers [arXiv]

  • 2023.05 - Patil et al., Gorilla: Large Language Model Connected with Massive APIs [arXiv]

  • 2023.03 - Shen et al., HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [arXiv]

  • 2024 - Basu et al., API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs (ACL) [arXiv]

  • 2024 - Qiao et al., Making Language Models Better Tool Learners with Execution Feedback (NAACL) [arXiv]

  • 2024 - Zheng et al., ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval (LREC-COLING) [arXiv]

  • 2024 - Xu et al., On the Tool Manipulation Capability of Open-sourced Large Language Models (ICLR) [openreview]

  • 2024 - Gou et al., ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (ICLR) [openreview]

  • 2024 - Li et al., Tool-Augmented Reward Modeling [openreview]

  • 2023 - Li et al., API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs (EMNLP) [aclanthology]

  • 2023 - Jacovi et al., A Comprehensive Evaluation of Tool-Assisted Generation Strategies (EMNLP) [aclanthology]

  • 2023 - Chen et al., ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (EMNLP) [aclanthology]

  • 2023 - Schick et al., Toolformer: Language Models Can Teach Themselves to Use Tools (NeurIPS) [openreview]

  • 2023 - Hao et al., ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings (NeurIPS) [openreview]

  • 2023 - Srinivasan et al., NexusRaven: a commercially-permissive Language Model for function calling (NeurIPS) [openreview]

  • 2023 - Yang et al., GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction (NeurIPS) [openreview]

  • 2022 - Parisi et al., TALM: Tool Augmented Language Models [arXiv]

Memory

Survey

Papers

  • 2024 - Na et al., Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning (ICLR) [openreview]

Reflection

Survey

Papers

Multi-modal

Survey

  • 2024.01 - Agent AI: Surveying the Horizons of Multimodal Interaction [arXiv]

Papers

  • 2023 - Hu et al., AVIS: Autonomous Visual Information Seeking with Large Language Model Agent (NeurIPS) [openreview]

Multi Agent

Survey

  • 2024.02 - Han et al., LLM Multi-Agent Systems: Challenges and Open Problems [arXiv]
  • 2024.01 - Cheng et al., Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects [arXiv]

Papers

  • 2024 - Zhang et al., ProAgent: Building Proactive Cooperative Agents with Large Language Models (AAAI) [arXiv]

  • 2024 - Zhang et al., Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View (ICLR) [openreview]

  • 2024 - Du et al., Improving Factuality and Reasoning in Language Models through Multiagent Debate (ICLR) [openreview]

  • 2024 - Chen et al., AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (ICLR) [openreview]

  • 2024 - Chen et al., AutoAgents: A Framework for Automatic Agent Generation (ICLR) [openreview]

  • 2024 - Wang et al., Adapting LLM Agents Through Communication (ICLR) [openreview]

  • 2023 - Xiong et al., Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (EMNLP) [aclanthology]

  • 2024.04 - Yue et al., MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education [arXiv]

  • 2024.02 - Wang et al., Multi-Agent Collaboration Framework for Recommender Systems [arXiv]

  • 2024.02 - Li et al., Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements [arXiv]

  • 2024.02 - Fang et al., A Multi-Agent Conversational Recommender System [arXiv]

  • 2023.07 - Nascimento et al., Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems [arXiv]

  • 2023.07 - Wang et al., Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [arXiv]

  • 2023.05 - Liang et al., Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [arXiv]

Application

Web Navigation

  • 2024 - Wang et al., Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (ICLR) [openreview]

  • 2024 - Gur et al., A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR) [openreview]

  • 2024 - Furuta et al., Multimodal Web Navigation with Instruction-Finetuned Foundation Models (ICLR) [openreview]

  • 2024 - Zhang et al., You Only Look at Screens: Multimodal Chain-of-Action Agents (ICLR) [openreview]

  • 2024 - Zhou et al., WebArena: A Realistic Web Environment for Building Autonomous Agents (ICLR) [openreview]

  • 2024 - AutoDroid: LLM-powered Task Automation in Android (MobiCom) [arXiv]

  • 2023 - Ma et al., LASER: LLM Agent with State-Space Exploration for Web Navigation (NeurIPS) [openreview]

  • 2023 - Deng et al., Mind2Web: Towards a Generalist Agent for the Web (NeurIPS) [arXiv]

  • 2022 - Yao et al., WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (NeurIPS) [neurips.cc]

  • 2020 - Li et al., Mapping Natural Language Instructions to Mobile UI Action Sequences (ACL) [aclanthology]

  • 2018 - Liu et al., Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration (ICLR) [openreview]

  • 2017 - Shi et al., World of Bits: An Open-Domain Platform for Web-Based Agents (ICML) [PMLR]

  • 2024.06 - Wang et al., Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration [arXiv]

  • 2024.05 - Tan et al., Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study [arXiv]

  • 2024.05 - Rawles et al., AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents [arXiv]

  • 2024.04 - Zhang et al., MMInA: Benchmarking Multihop Multimodal Internet Agents [arXiv]

  • 2024.04 - Lai et al., AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent [arXiv]

  • 2024.04 - Huang et al., AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation [arXiv]

  • 2024.03 - Drouin et al., WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [arXiv]

  • 2024.02 - Zhang et al., UFO: A UI-Focused Agent for Windows OS Interaction [arXiv]

  • 2024.02 - Lù et al., WebLINX: Real-World Website Navigation with Multi-Turn Dialogue [arXiv]

  • 2024.02 - Baechler et al., ScreenAI: A Vision-Language Model for UI and Infographics Understanding [arXiv]

  • 2024.01 - Zheng et al., GPT-4V(ision) is a Generalist Web Agent, if Grounded [arXiv]

  • 2023.12 - Zhang et al., AppAgent: Multimodal Agents as Smartphone Users [arXiv]

  • 2023.11 - Yan et al., GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation [arXiv]

  • 2023.11 - Furuta et al., Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web [arXiv]

  • 2023.07 - Rawles et al., Android in the Wild: A Large-Scale Dataset for Android Device Control [arXiv]

  • 2022.02 - Humphreys et al., A Data-Driven Approach for Learning to Control Computers [arXiv]

  • 2021.07 - Nakano et al., WebGPT: Browser-assisted question-answering with human feedback [arXiv]

  • 2021.05 - Toyama et al., AndroidEnv: A Reinforcement Learning Platform for Android [arXiv]

  • 2021.05 - Shvo et al., AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning [arXiv]

Code Generation and Software Engineering

  • 2024 - Qian et al., ChatDev: Communicative Agents for Software Development (ACL) [arXiv]

  • 2024 - Wang et al., Executable Code Actions Elicit Better LLM Agents (ICML) [arXiv]

  • 2024 - Olausson et al., Is Self-Repair a Silver Bullet for Code Generation? (ICLR) [openreview]

  • 2024 - Hong et al., MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR) [openreview]

  • 2024 - Jimenez et al., SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (ICLR) [openreview]

  • 2023 - Madaan et al., Self-Refine: Iterative Refinement with Self-Feedback (NeurIPS) [openreview]

  • 2023 - Li et al., CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society (NeurIPS) [arXiv]

  • 2023 - Dibia, LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models (ACL) [aclanthology]

  • 2023 - Zhang et al., Self-Edit: Fault-Aware Code Editor for Code Generation (ACL) [openreview]

  • 2024.05 - Yang et al., SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [arXiv]

  • 2024.04 - Zhang et al., AutoCodeRover: Autonomous Program Improvement [arXiv]

  • 2024.03 - Jain et al., LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code [arXiv]

  • 2024.03 - Tufano et al., AutoDev: Automated AI-Driven Development [arXiv]

  • 2024.03 - Tao et al., MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [arXiv]

  • 2024.02 - Hong et al., Data Interpreter: An LLM Agent For Data Science [arXiv]

  • 2024.02 - Zheng et al., OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement [arXiv]

  • 2023.12 - Huang et al., AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [arXiv]

  • 2023.11 - Qiao et al., TaskWeaver: A Code-First Agent Framework [arXiv]

  • 2023.10 - Huang et al., MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [arXiv]

  • 2023.06 - Jiang et al., SelfEvolve: A Code Evolution Framework via Large Language Models [arXiv]

  • 2023.04 - Ma et al., Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [arXiv]

  • 2022.11 - Lai et al., DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation [arXiv]

Recruitment Information

Algomatic creates generative AI-native businesses across various fields.
We are looking for colleagues with diverse skills.

Learn More
