- Johns Hopkins University
- Baltimore, Maryland, US
- https://virobo-15.github.io/
- in/amandeep-kumar-24702a182
Stars
This repository gives the official implementation of Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models (WACV 2025)
[MM24] Official codes and datasets for ACM MM24 paper "Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models".
Official implementation of the paper "STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models"
[MICCAI 2024] Official code repository of paper titled "BAPLe: Backdoor Attacks on Medical Foundation Models using Prompt Learning" accepted in MICCAI 2024 conference.
Looking 3D: Anomaly Detection with 2D-3D Alignment (CVPR24)
[ECCV 2024] Official code repository of paper titled "Efficient 3D-Aware Facial Image Editing Via Attribute-Specific Prompt Learning"
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".
Official Implementations of "Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space"
MobiLlama : Small Language Model tailored for edge devices
VIROBO-15 / Palmira
Forked from ihdia/Palmira
📜 [ICDAR 2021] "A Deep Deformable Network for Instance Segmentation of Dense and Uneven Layouts in Handwritten Manuscripts", S P Sharan, Sowmya Aitha, Amandeep Kumar, Abhishek Trivedi, Aaron August…
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Is gradient information useful for pruning of LLMs?
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Official Repository for "Generalizing to Unseen Domains in Diabetic Retinopathy Classification". (WACV-24)
The open-source tool for building high-quality datasets and computer vision models
Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)
SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation (BMVC'23 -- Oral)
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)