TempFormer: Temporally Consistent Transformer for Video Denoising

  5,813 views

DisneyResearchHub

1 year ago

Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a recent machine learning architecture that, over the past year, has shown promising performance on both high-level and low-level image tasks such as object detection, classification, and image restoration. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which combines SpatioTemporal Transformer Blocks (STTB) with 3D convolutional layers. The proposed STTB learns the temporal information between neighboring frames implicitly by using the proposed Joint Spatio-Temporal Mixer module for attention calculation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical cases and distracting to viewers. We propose a sliding block strategy with a recurrent architecture, and use a new loss term, Overlap Loss, to alleviate the flickering between adjacent frames. Our method produces state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources to reach denoising quality comparable to that of competing methods.
Publication link: studios.disneyresearch.com/20...
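
The description mentions a sliding block strategy and an Overlap Loss but gives no formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that the video is processed in fixed-size blocks of frames, that consecutive blocks share a few frames, and that the loss penalizes the mean absolute difference between the two predictions of each shared frame. The names `sliding_blocks` and `overlap_loss`, the flattened-list frame representation, and the choice of an L1 penalty are all hypothetical.

```python
from typing import List, Sequence

# Illustrative assumption: a frame is a flat list of pixel values.
Frame = List[float]


def sliding_blocks(num_frames: int, block_size: int, stride: int) -> List[List[int]]:
    """Return the frame indices of each block.

    Consecutive blocks share (block_size - stride) frames.
    """
    if not 0 < stride <= block_size:
        raise ValueError("stride must satisfy 0 < stride <= block_size")
    if num_frames < block_size:
        raise ValueError("video is shorter than one block")
    starts = range(0, num_frames - block_size + 1, stride)
    return [list(range(s, s + block_size)) for s in starts]


def overlap_loss(prev_block: Sequence[Frame],
                 next_block: Sequence[Frame],
                 overlap: int) -> float:
    """Mean absolute difference between the frames both blocks predicted.

    Assumed form of the penalty (L1); the source does not specify it.
    """
    if overlap <= 0:
        return 0.0
    shared_prev = prev_block[-overlap:]   # last frames of the earlier block
    shared_next = next_block[:overlap]    # first frames of the later block
    total, count = 0.0, 0
    for frame_a, frame_b in zip(shared_prev, shared_next):
        for a, b in zip(frame_a, frame_b):
            total += abs(a - b)
            count += 1
    return total / count


if __name__ == "__main__":
    # Seven frames, blocks of three, stride two: one shared frame per pair.
    print(sliding_blocks(7, 3, 2))
    # Two tiny "denoised" blocks that disagree slightly on the shared frame.
    block_a = [[0.0, 0.0], [1.0, 1.0]]
    block_b = [[1.0, 2.0], [5.0, 5.0]]
    print(overlap_loss(block_a, block_b, overlap=1))
```

In a training setup along these lines, a term like this would be added to the usual reconstruction loss so that adjacent blocks are pushed to agree on the frames they share, which is one plausible way to discourage flicker at block boundaries.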
