LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation Recently, diffusion models have achieved great success in image synthesis. ...
Accelerating Vision-Language Pretraining with Free Language Modeling The state of the arts in vision-language pretraining (VLP) achieves exemplary performance ...
OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer Omnidirectional images (ODIs) have obtained lots of research interest for ...
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation As an important and challenging problem in computer vision, PAnoramic ...
Task-Aware Dual-Representation Network for Few-Shot Action Recognition Few-shot action recognition has attracted increasing attention in recent years, but it ...
DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models Image super-resolution (SR) with generative adversarial networks (GAN) ...
Pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation Foundation models have achieved great advances in multi-task ...
NeRF-Texture: Texture Synthesis With Neural Radiance Fields Texture synthesis is a fundamental problem in computer graphics that would benefit various ...
Binary Embedding-based Retrieval at Tencent Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a ...
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing The automatic video dubbing task is proposed to meet personal and industrial demands ...
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection We present an approach to efficiently and effectively adapt a masked ...
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation To replicate the success of text-to-image (T2I) generation, recent works ...
Transfer learning has emerged to be crucial in various computer vision tasks benefiting from the vast availability of pre-trained deep learning models. ...
Video Tagging intends to infer multiple tags spanning relevant content for a given video. Typically, video tags are freely defined and uploaded by a variety of ...
MasaCtrl: Tuning-free Mutual Self-Attention Control for Consistent Image Synthesis and Editing Despite the success in large-scale text-to-image generation and ...
Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video Synthesizing realistic videos according to a given speech is still an open ...
OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution Omnidirectional images (ODIs) have become increasingly popular, as their large ...
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video We introduce HOSNeRF, a novel 360° free-viewpoint rendering method that ...
VMesh: Hybrid Volume-Mesh Representation for Efficient View Synthesis With the emergence of neural radiance fields (NeRFs), view synthesis quality has reached ...
CL-NeRF: Continual Learning of Neural Radiance Fields for Evolving Scene Representation Existing methods for adapting Neural Radiance Fields (NeRFs) to scene ...