Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Though many attempts have been made in blind super-resolution to restore ...
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution Recent blind super-resolution (SR) methods typically consist of two ...
Dynamic Token Normalization improves Vision Transformers Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieved great success in various ...
Uncertainty Modeling for Out-of-Distribution Generalization Though remarkable progress has been achieved in various vision tasks, deep neural networks still ...
Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning Could we automatically derive the score of a piano accompaniment based on the ...
Hot-Refresh Model Upgrades with Regression-Free Compatible Training in Image Retrieval The task of hot-refresh model upgrades of image retrieval systems plays ...
Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection Finding relevant moments and highlights in videos according to ...
A Bi-lingual Benchmark for Text Segmentation in the Wild As a prerequisite of many text-related tasks such as text erasing and text style transfer, text ...
Object-aware Video-language Pre-training for Retrieval Recently, by introducing large-scale dataset and strong transformer network, video-language ...
Bridging Video-text Retrieval with Multiple Choice Questions Pre-training a model to learn transferable video-text representation for retrieval has attracted ...
Active Learning for Open-set Annotation Existing active learning studies typically work in the closed-set setting by assuming that all data examples to be ...
Temporally Efficient Vision Transformer for Video Instance Segmentation Recently vision transformer has achieved tremendous success on image-level visual ...
Universal Backward-Compatible Representation Learning Conventional model upgrades for visual search systems require offline refresh of gallery features by ...
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion Typically, singing voice conversion (SVC) depends on an embedding ...
PC-Dance: Posture-controllable Music-driven Dance Synthesis Music-driven dance synthesis is a task to generate high-quality dance according to the music given ...
RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization This paper explores training ...
MM-RealSR: Metric Learning based Interactive Modulation for Real-World Super-Resolution Interactive image restoration aims to restore images by adjusting ...
VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder Although generative facial prior and geometric prior have recently ...
Predicting Model Transferability in a Self-challenging Fisher Space This paper addresses an important problem of ranking the pre-trained deep neural networks ...
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training Image BERT pre-training with masked image modeling (MIM) becomes a popular practice to cope ...