Showing 8 changed files with 124 additions and 0 deletions.
@@ -0,0 +1,57 @@
# Contents

- [1. How is VideoLLaMB's recurrent memory bridge layer designed? What are its main advantages?](#1.VideoLLaMB的递归记忆桥接层是如何设计的?其主要优势是什么?)
- [2. How does VideoLLaMB's SceneTilling algorithm work? What advantages does it offer for video segmentation and streaming caption generation?](#2.VideoLLaMB的SceneTilling算法是如何工作的?它在视频分割和流式字幕生成中有何优势?)
- [3. How does VideoLLaMB perform on the NIAVH benchmark? What sets it apart?](#3.VideoLLaMB在NIAVH基准上的表现如何?其独特之处体现在哪些方面?)

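<h2 id="1.VideoLLaMB的递归记忆桥接层是如何设计的?其主要优势是什么?">1. How is VideoLLaMB's recurrent memory bridge layer designed? What are its main advantages?</h2>

**VideoLLaMB** is a novel framework for long-video understanding. It uses a memory bridge layer with recurrent memory tokens to encode 100% of the video content without discarding key visual cues.

The recurrent memory bridge layer strengthens the memory capacity of the linear layer by integrating recurrent memory tokens into the bridge layer. The design is as follows:

- **Memory tokens**: A fixed number of memory tokens is prepended to each video segment, written `[m_i; s_i]`, where `m_i` denotes the memory tokens and `s_i` denotes the video segment.

- **Self-attention**: Standard self-attention is applied to each video segment together with its memory tokens, producing updated memory tokens and visual representations, as given by:
![](imgs/VideoLLaMB的自注意力操作.png)

- **Recurrent processing**: The process repeats over the semantic video segments, updating the memory tokens at each step, and finally yields a compressed visual summary of the whole video sequence.
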
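The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: it assumes single-head self-attention with identity query/key/value projections, zero-initialized memory tokens, and random toy features. The names `self_attention` and `recurrent_memory_bridge` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (assumption)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        out.append([sum(w * value[j] for w, value in zip(weights, tokens))
                    for j in range(dim)])
    return out


def recurrent_memory_bridge(segments, memory):
    """Process segments in order, carrying memory tokens across segments."""
    n_mem = len(memory)
    visual_outputs = []
    for segment in segments:
        hidden = self_attention(memory + segment)  # attend over [m_i; s_i]
        memory = hidden[:n_mem]                    # updated memory tokens
        visual_outputs.append(hidden[n_mem:])      # updated visual tokens
    return memory, visual_outputs


random.seed(0)
dim, n_mem = 8, 2
# Three toy "semantic segments" with 3, 5 and 4 frame features each.
segments = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
            for n in (3, 5, 4)]
memory = [[0.0] * dim for _ in range(n_mem)]
final_memory, outputs = recurrent_memory_bridge(segments, memory)
print(len(final_memory), [len(o) for o in outputs])  # 2 [3, 5, 4]
```

The number of memory tokens stays fixed however many segments are processed, which is what makes the final memory a compressed summary of the sequence.
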
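**Its main advantages are:**

- **Stronger memory**: The recurrent memory tokens strengthen the bridge layer's ability to remember video content.
- **Information compression**: The memory tokens compress information from earlier parts of the video while preserving the current scene, which improves computational efficiency.
- **Mitigated vanishing gradients**: A memory cache with a retrieval mechanism mitigates the vanishing-gradient problem and preserves long-term dependencies.
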
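The text does not spell out how the memory cache and retrieval mechanism works, so the following is only one plausible reading: keep the memory tokens of every earlier step in a cache, and let the current memory tokens attend over that cache, so early information stays reachable directly rather than through a long chain of updates. The function names, the unprojected dot-product attention, and the toy vectors are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(current_memory, cache):
    """Refresh each current memory token by attending over all cached tokens."""
    dim = len(current_memory[0])
    refreshed = []
    for query in current_memory:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in cache]
        weights = softmax(scores)
        refreshed.append([sum(w * key[j] for w, key in zip(weights, cache))
                          for j in range(dim)])
    return refreshed


def run_with_cache(step_memories):
    """Store each step's memory tokens, then retrieve against the whole cache."""
    cache, history = [], []
    for memory in step_memories:
        cache.extend(memory)
        history.append(retrieve(memory, cache))
    return cache, history


# One 2-dimensional memory token per step (toy values).
steps = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
cache, history = run_with_cache(steps)
print(len(cache), history[0])  # 3 [[1.0, 0.0]]
```

At the first step the cache holds a single token, so retrieval returns it unchanged; later steps return a weighted mix of everything stored so far.
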
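<h2 id="2.VideoLLaMB的 SceneTilling 算法是如何工作的?它在视频分割和流式字幕生成中有何优势?">2. How does VideoLLaMB's SceneTilling algorithm work? What advantages does it offer for video segmentation and streaming caption generation?</h2>

The SceneTilling algorithm segments a video in the following steps:

- **Cosine similarity**: Compute the cosine similarity between each pair of adjacent frames, producing a sequence of similarity scores.
- **Depth score**: Compute a depth score at each point from the similarity scores, using the formula:
![](imgs/VideoLLaMB的深度分数计算公式.png)

- **Segmentation threshold**: Set a threshold from a quantile of the depth scores, take the points whose depth score exceeds the threshold as boundaries, and split the video into semantic segments.
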
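The three steps above can be sketched in plain Python. The depth-score formula lives in the referenced image and is not reproduced in the text, so this sketch assumes a TextTiling-style depth (how far a similarity value dips below the highest similarity on its left and on its right). The 0.8 quantile, the function names, and the toy frame features are likewise assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def depth_scores(sims):
    """Assumed TextTiling-style depth of each similarity valley."""
    depths = []
    for i, s in enumerate(sims):
        left_peak = max(sims[: i + 1])
        right_peak = max(sims[i:])
        depths.append(((left_peak - s) + (right_peak - s)) / 2)
    return depths


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def scene_tilling(frames, q=0.8):
    """Split frame indices into segments at deep similarity valleys."""
    if len(frames) < 2:
        return [list(range(len(frames)))]
    sims = [cosine(a, b) for a, b in zip(frames, frames[1:])]
    depths = depth_scores(sims)
    threshold = quantile(depths, q)
    # Depth i belongs to the gap between frame i and frame i + 1.
    cuts = [i + 1 for i, d in enumerate(depths) if d > threshold]
    segments, start = [], 0
    for cut in cuts + [len(frames)]:
        segments.append(list(range(start, cut)))
        start = cut
    return segments


# Toy features: three frames of one scene, three of another, two of a third.
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 1.0]] * 2
print(scene_tilling(frames))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Because the threshold is a quantile of the depth scores, it adapts to each video rather than relying on a fixed similarity cutoff.
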
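**For streaming caption generation, SceneTilling offers these advantages:**

- **Automatic caption end-point prediction**: It identifies where a caption should end in a streaming video without any special training tokens.
- **Scene-change detection**: It detects scene changes in the video and produces a caption for each corresponding event.
- **No extra training**: It reuses the semantic segmentation of the video, so streaming captioning needs no additional training data.
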
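<h2 id="3.VideoLLaMB在NIAVH基准上的表现如何?其独特之处体现在哪些方面?">3. How does VideoLLaMB perform on the NIAVH benchmark? What sets it apart?</h2>

**VideoLLaMB performs strongly on the NIAVH benchmark, owing mainly to its distinctive design and to multimodal query support:**

- **Multimodal queries**: NIAVH supports text, image, and video queries, so it evaluates comprehensively how well a model identifies specific content in a long video.
- **Efficient video understanding**: With the recurrent memory bridge layer and the SceneTilling algorithm, VideoLLaMB retrieves the correct image needle across a range of video lengths.
- **Comparison with other methods**: Compared with existing approaches such as adaptive pooling and position extrapolation combined with sampling, VideoLLaMB handles long videos more efficiently and at lower cost.

**What sets it apart:**

- **Memory cache and retrieval**: Through the memory cache and retrieval mechanism, VideoLLaMB retains earlier memory states and mitigates the vanishing-gradient problem.
- **Semantic segmentation**: The SceneTilling algorithm splits the video into independent semantic units, keeping each unit semantically complete and placing scene changes accurately.
- **Overall performance**: On long-video question answering, egocentric planning, and frame retrieval, VideoLLaMB significantly outperforms existing methods.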
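
To make the "needle" idea concrete, the toy sketch below builds a haystack of frame features, inserts one needle frame at a chosen relative depth, and checks whether a retriever finds it across several video lengths. It is not the NIAVH protocol and does not run VideoLLaMB: the retriever is plain cosine matching against an image-style query, and every name, size, and parameter is hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def insert_needle(haystack, needle, depth):
    """Insert the needle at a relative depth in [0, 1]; return frames and index."""
    index = round(depth * len(haystack))
    return haystack[:index] + [needle] + haystack[index:], index


def retrieve(frames, query):
    """Stand-in retriever: index of the frame most similar to the query."""
    return max(range(len(frames)), key=lambda i: cosine(frames[i], query))


def accuracy_grid(lengths, depths, dim=16, seed=0):
    """Record whether the needle is found for each (length, depth) pair."""
    rng = random.Random(seed)
    grid = {}
    for length in lengths:
        for depth in depths:
            haystack = [[rng.gauss(0, 1) for _ in range(dim)]
                        for _ in range(length)]
            needle = [rng.gauss(0, 1) for _ in range(dim)]
            frames, index = insert_needle(haystack, needle, depth)
            grid[(length, depth)] = retrieve(frames, needle) == index
    return grid


grid = accuracy_grid(lengths=[10, 100, 1000], depths=[0.0, 0.5, 1.0])
print(sum(grid.values()), "of", len(grid), "needles found")
```

A real evaluation would replace `retrieve` with the model under test and would also vary the query modality (text, image, or video).
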
@@ -0,0 +1 @@
# Contents
@@ -0,0 +1 @@
# Contents