Cryptocurrency, Bitcoin, and Behind Dark Web Technology

MILES: Visual BERT Pre-training with Injected Language Semantics

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval Dominant pre-training works for video-text retrieval mainly adopt the ...

AnimeSR: Learning Real-World Super-Resolution

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos This paper studies the problem of real-world video super-resolution (VSR) for ...

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes

DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes Modeling dynamic scenes is important for many applications such as virtual reality and ...

Snowflake Point Deconvolution for Point Cloud Completion

Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer Most existing point cloud completion methods suffer from the ...

Mitigating Artifacts in Real-World Video Super-Resolution Models

Mitigating Artifacts in Real-World Video Super-Resolution Models with More Cheap Hidden States and Selective Cross Attention The recurrent structure is a ...

Accelerating the Training of Video Super-Resolution Models

Accelerating the Training of Video Super-Resolution Models Although convolutional neural networks (CNNs) have recently demonstrated high-quality ...

What Does Your Face Sound Like? 3D Face Shape Towards Voice

What Does Your Face Sound Like? 3D Face Shape Towards Voice Face-based speech synthesis provides a practical solution to generate voices from human faces. ...

Darwinian Model Upgrades: Model Evolving with Selective Compatibility

Darwinian Model Upgrades: Model Evolving with Selective Compatibility The traditional model upgrading paradigm for retrieval requires recomputing all gallery ...

Video-Text Pre-training with Learned Regions for Retrieval

Video-Text Pre-training with Learned Regions for Retrieval Video-Text pre-training aims at learning transferable representations from large-scale video-text ...

Integrating Multi-Modal Tags for Video-Text Retrieval

Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval Vision-language alignment learning for video-text retrieval arouses a lot of ...

Masked Image Modeling with Denoising Contrast

Masked Image Modeling with Denoising Contrast Since the development of self-supervised visual representation learning from contrastive learning to masked ...

ERBNet: An Effective Representation Based Network

ERBNet: An Effective Representation Based Network for Unbiased Scene Graph Generation The scene graph generation (SGG) task has attracted increasing attention ...

Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis

Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-training The single-speaker singing voice synthesis (SVS) ...

Learning Transferable Spatiotemporal Representations

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge Pre-training on large-scale video data has become a common recipe for ...

HRDFuse: Monocular 360° Depth Estimation

HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions Depth estimation from a monocular 360° image ...

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval

ViLEM: Visual-Language Error Modeling for Image-Text Retrieval Dominant pre-training works for image-text retrieval adopt "dual-encoder" architecture to ...

SurfelNeRF: Neural Surfel Radiance Field for Online 3D Reconstruction

SurfelNeRF: Neural Surfel Radiance Field for Online 3D Reconstruction and Photorealistic Rendering Online reconstructing and rendering of large-scale indoor ...

Unified Video-Language Pre-training

All in One: Exploring Unified Video-Language Pre-training Mainstream Video-Language Pre-training models consist of three parts, a video encoder, a text ...

Masked Visual Reconstruction in Language Semantic Space

Masked Visual Reconstruction in Language Semantic Space Both masked image modeling (MIM) and natural language supervision have facilitated the progress of ...

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models Recent CLIP-guided 3D optimization methods, e.g., DreamFields ...
