- Carnegie Mellon University
- Pittsburgh
- https://Kashu7100.github.io
- https://scholar.google.com/citations?hl=en&user=TF2LRvMAAAAJ
- https://hub.docker.com/u/kashu98
- https://huggingface.co/Kashu7100
- @kashu_yamazaki
Stars
Educational Python library for manipulator motion planning
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official implementation of RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
[CoRL 2022] Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion
[ECCV 2024] Official code for the paper "Open-Vocabulary SAM".
A unified architecture for multimodal multi-task robotic policy learning.
Code for the ICLR 2024 spotlight paper: "Learning to Act without Actions" (introducing Latent Action Policies)
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Run Segment Anything Model 2 on a live video stream
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
The Most Faithful Implementation of Segment Anything (SAM) in 3D
Chrome extension for clipping arXiv articles to Notion.
[Official Implementation] Acoustic Autoregressive Modeling
Code repo for MultiGripperGrasp Dataset
A real-time implementation of Voice Activity Projection (VAP), aimed at controlling behaviors of spoken dialogue systems, such as turn-taking.
Model code and data for Situated Instruction Following (SIF)
Official repository of ICLR 2022 paper FILM: Following Instructions in Language with Modular Methods
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"