Awesome Quadrupedal Robots
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
A paper list on data contamination in Large Language Model evaluation.
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
A Collection of LiDAR-Camera-Calibration Papers, Toolboxes and Notes
😎 Awesome LiDAR list. The list includes LiDAR manufacturers, datasets, point-cloud processing algorithms, point-cloud frameworks, and simulators.
[CVPR 2022] Joint hand motion and interaction hotspots prediction from egocentric videos
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
A curated list of research on "Embodied AI or robots with Large Language Models". Watch this repository for the latest updates! 🔥
[CVPR 2024] Code for the paper "Towards Learning a Generalist Model for Embodied Navigation"
[ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
[EMNLP 2023 Demo] CLEVA: Chinese Language Models EVAluation Platform
✨✨Latest Advances on Multimodal Large Language Models
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-Tuning) for easy use. We welcome open-source enthusiasts…
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch
[EMNLP 2022] ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization
Vision-Language Pre-training for Image Captioning and Question Answering
MAttNet: Modular Attention Network for Referring Expression Comprehension
Re-implementation of the Speaker-Listener-Reinforcer model
Generating Easy-to-Understand Referring Expressions for Target Identifications