Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and ...
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
Recent blind super-resolution (SR) methods typically consist of two branches, one for degradation prediction and ...
Dynamic Token Normalization improves Vision Transformers
Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieved great success in various computer vision tasks, owing to their ...
Uncertainty Modeling for Out-of-Distribution Generalization
Though remarkable progress has been achieved in various vision tasks, deep neural networks still suffer from obvious performance degradation ...
Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning
Could we automatically derive the score of a piano accompaniment based on the audio of a pop song? This is the ...
Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval
The task of hot-refresh model upgrades of image retrieval systems plays an essential role in the industry but ...
Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Finding relevant moments and highlights in videos according to natural language queries is a natural and ...
A Bi-lingual Benchmark for Text Segmentation in the Wild
As a prerequisite of many text-related tasks such as text erasing and text style transfer, text segmentation has attracted increasing attention ...
Object-aware Video-language Pre-training for Retrieval
Recently, by introducing large-scale datasets and strong transformer networks, video-language pre-training has shown great success, especially for ...
Bridging Video-text Retrieval with Multiple Choice Questions
Pre-training a model to learn transferable video-text representation for retrieval has attracted a lot of attention in recent years. ...
Active Learning for Open-set Annotation
Existing active learning studies typically work in the closed-set setting by assuming that all data examples to be labeled are drawn from known classes. ...
Temporally Efficient Vision Transformer for Video Instance Segmentation
Recently, vision transformers have achieved tremendous success on image-level visual recognition tasks. To effectively and ...
Universal Backward-Compatible Representation Learning
Conventional model upgrades for visual search systems require offline refresh of gallery features by feeding gallery images into new models ...
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion
Typically, singing voice conversion (SVC) depends on an embedding vector, extracted from either a speaker lookup ...
PC-Dance: Posture-controllable Music-driven Dance Synthesis
Music-driven dance synthesis is a task to generate high-quality dance according to the music given by the user, which has promising ...
RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization
This paper explores training efficient VGG-style super-resolution (SR) ...
MM-RealSR: Metric Learning based Interactive Modulation for Real-World Super-Resolution
Interactive image restoration aims to restore images by adjusting several controlling coefficients, which ...
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
Although generative facial priors and geometric priors have recently demonstrated high-quality results for blind face ...
Predicting Model Transferability in a Self-challenging Fisher Space
This paper addresses an important problem of ranking the pre-trained deep neural networks and screening the most transferable ones ...
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Image BERT pre-training with masked image modeling (MIM) has become a popular practice to cope with self-supervised representation ...