1. Scaled-YOLOv4: Scaling Cross Stage Partial Network
2. You Only Look One-level Feature
3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
4. End-to-End Object Detection with Fully Convolutional Network
5. Dynamic Head: Unifying Object Detection Heads with Attentions
6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
Affiliations: University of Wisconsin, Google
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf
Code: https://github.com/tensorflow/models/tree/master/research/object_detection
9. Tracking Pedestrian Heads in Dense Crowd
10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection
13. Multi-Scale Aligned Distillation for Low-Resolution Detection
14. Adaptive Class Suppression Loss for Long-Tail Object Detection
Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL
15. VarifocalNet: An IoU-aware Dense Object Detector
16. OTA: Optimal Transport Assignment for Object Detection
Affiliations: Waseda University, Megvii
Paper: https://arxiv.org/abs/2103.14259
Code: https://github.com/Megvii-BaseDetection/OTA
17. Distilling Object Detectors via Decoupled Features
18. Robust and Accurate Object Detection via Adversarial Learning
Affiliations: Google, UCLA, UCSC
Paper: https://arxiv.org/abs/2103.13886
Code: None
19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
20. Multiple Instance Active Learning for Object Detection
21. Towards Open World Object Detection
22. RankDetNet: Delving Into Ranking Constraints for Object Detection
23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Affiliations: Wuhan University
Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet
25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
Affiliations: Fudan University, Tongji University, Zhejiang University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html
27. Adaptive Image Transformer for One-Shot Object Detection
28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
Affiliations: Carnegie Mellon University (CMU)
Paper: https://arxiv.org/abs/2103.01903
30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
31. Hallucination Improves Few-Shot Object Detection
32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
33. Generalized Few-Shot Object Detection Without Forgetting
34. Transformation Invariant Few-Shot Object Detection
Affiliations: Huawei Noah's Ark Lab
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html
35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation
36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection
37. Points As Queries: Weakly Semi-Supervised Object Detection by Points
38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
39. Positive-Unlabeled Data Purification in the Wild for Object Detection
Affiliations: Huawei Noah's Ark Lab, University of Sydney, Peking University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html
40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection
43. Interpolation-Based Semi-Supervised Learning for Object Detection
44. Domain-Specific Suppression for Adaptive Object Detection
45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
46. Unbiased Mean Teacher for Cross-Domain Object Detection
47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge
49. Instance Localization for Self-supervised Detection Pretraining
50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
51. DAP: Detection-Aware Pre-training with Weak Supervision
52. Open-Vocabulary Object Detection Using Captions
Affiliations: Snap, Columbia University
Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html
Code: https://github.com/alirezazareian/ovr-cnn
53. Depth From Camera Motion and Object Detection
Affiliations: University of Michigan, SIAI
Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD
54. Unsupervised Object Detection With LIDAR Clues
55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
56. General Instance Distillation for Object Detection
57. AQD: Towards Accurate Quantized Object Detection
58. Scale-Aware Automatic Augmentation for Object Detection
59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection
60. Class-Aware Robust Adversarial Training for Object Detection
61. Improved Handling of Motion Blur in Online Object Detection
62. Neural Auto-Exposure for High-Dynamic Range Object Detection
63. Generalizable Pedestrian Detection: The Elephant in the Room
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Graph Attention Tracking
Rotation Equivariant Siamese Networks for Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
Transformer Tracking
Tracking Pedestrian Heads in Dense Crowd
Multiple Object Tracking with Correlation Learning
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Learning a Proposal Classifier for Multiple Object Tracking
1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation
Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
Homepage: https://nirkin.com/hyperseg/
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf
Code: https://github.com/YuvalNirkin/hyperseg
2. Rethinking BiSeNet For Real-time Semantic Segmentation
Affiliations: Meituan
Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg
3. Progressive Semantic Segmentation
4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
5. Capturing Omni-Range Context for Omnidirectional Segmentation
6. Learning Statistical Texture for Semantic Segmentation
7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation
10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
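Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney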
Paper: https://arxiv.org/abs/2103.14581
Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom
12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation
19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html
20. Source-Free Domain Adaptation for Semantic Segmentation
21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation
22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation
23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation
31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering
32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Affiliations: University of Padova
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html
Code: https://lttm.dei.unipd.it/paper_data/SDR/
34. Exploit Visual Dependency Relations for Semantic Segmentation
35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
36. PLOP: Learning without Forgetting for Continual Semantic Segmentation
37. 3D-to-2D Distillation for Indoor Scene Parsing
38. Bidirectional Projection Network for Cross Dimension Scene Understanding
39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, et al.
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html
Code: https://github.com/lxtGH/PFSegNets
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Incremental Few-Shot Instance Segmentation
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet
Zero-shot instance segmentation(Not Sure)
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
End-to-End Video Instance Segmentation with Transformers
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Part-aware Panoptic Segmentation
Exemplar-Based Open-Set Panoptic Segmentation Network
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Panoptic Segmentation Forecasting
Fully Convolutional Networks for Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling
2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation
3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
Affiliations: Stanford University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html
Learning Position and Target Consistency for Memory-based Video Object Segmentation
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Homepage: https://hkchengrex.github.io/MiVOS/
Paper: https://arxiv.org/abs/2103.07941
Code: https://github.com/hkchengrex/MiVOS
Demo: https://hkchengrex.github.io/MiVOS/video.html#partb
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Paper: https://arxiv.org/abs/2103.10391
Code: https://github.com/svip-lab/IVOS-W
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
Group Collaborative Learning for Co-Salient Object Detection
Semantic Image Matting
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Combined Depth Space based Architecture Search For Person Re-identification
Anchor-Free Person Search
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
FrameExit: Conditional Early Exiting for Efficient Video Recognition
No frame left behind: Full Video Action Recognition
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
ACTION-Net: Multipath Excitation for Action Recognition
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
TDN: Temporal Difference Networks for Efficient Action Recognition
A 3D GAN for Improved Large-pose Facial Recognition
MagFace: A Universal Representation for Face Recognition and Quality Assessment
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
Multi-attentional Deepfake Detection
Continuous Face Aging via Self-estimated Residual Age Embedding
PML: Progressive Margin Loss for Long-tailed Age Classification
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
End-to-End Human Pose and Mesh Reconstruction with Transformers
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
Paper(Oral): https://arxiv.org/abs/2105.02465
Code: https://github.com/jfzhang95/PoseAug
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
Homepage: http://www.liuyebin.com/posefusion/posefusion.html
Paper(Oral): https://arxiv.org/abs/2103.15331
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Checkerboard Context Model for Efficient Learned Image Compression
Slimmable Compressive Autoencoders for Practical Neural Image Compression
Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
Teachers Do More Than Teach: Compressing Image-to-Image Models
Dynamic Slimmable Network
Network Quantization with Element-wise Gradient Scaling
Zero-shot Adversarial Quantization
Learnable Companding Quantization for Accurate Low-bit Neural Networks
Distilling Knowledge via Knowledge Review
Distilling Object Detectors via Decoupled Features
Image Super-Resolution with Non-Local Sparse Attention
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
AdderSR: Towards Energy Efficient Image Super-Resolution
Contrastive Learning for Compact Single Image Dehazing
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Multi-Stage Progressive Image Restoration
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
High-Fidelity and Arbitrary Face Editing
Anycost GANs for Interactive Image Synthesis and Editing
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font
LoFTR: Detector-Free Local Feature Matching with Transformers
Convolutional Hough Matching Networks
Bridging the Visual Gap: Wide-Range Image Blending
Paper: https://arxiv.org/abs/2103.15149
Code: https://github.com/julia0607/Wide-Range-Image-Blending
Robust Reflection Removal with Reflection-free Flash-only Cues
Equivariant Point Network for 3D Point Cloud Analysis
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
3D-MAN: 3D Multi-frame Attention Network for Object Detection
Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
Paper: https://arxiv.org/abs/2104.00902
Code: https://github.com/cvlab-yonsei/HVPR
LiDAR R-CNN: An Efficient and Universal 3D Object Detector
M3DSSD: Monocular 3D Single Stage Object Detector
Paper: https://arxiv.org/abs/2103.13164
Code: https://github.com/mumianyuxin/M3DSSD
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
Center-based 3D Object Detection and Tracking
Categorical Depth Distribution Network for Monocular 3D Object Detection
Bidirectional Projection Network for Cross Dimension Scene Understanding
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
PREDATOR: Registration of 3D Point Clouds with Low Overlap
Unsupervised 3D Shape Completion through GAN Inversion
Variational Relational Point Completion Network
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
Homepage: https://alphapav.github.io/SpareNet/
Paper: https://arxiv.org/abs/2103.02535
Code: https://github.com/microsoft/SpareNet
Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
Homepage: https://zju3dv.github.io/neuralrecon/
Paper(Oral): https://arxiv.org/abs/2104.00681
Code: https://github.com/zju3dv/NeuralRecon
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
Beyond Image to Depth: Improving Depth Prediction using Echoes
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
Depth from Camera Motion and Object Detection
A Decomposition Model for Stereo Matching
Self-Supervised Multi-Frame Monocular Scene Flow
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Learning Optical Flow From Still Images
Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
Code: https://github.com/mattpoggi/depthstillation
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
LiBRe: A Practical Bayesian Approach to Adversarial Detection
Natural Adversarial Examples
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
On Semantic Similarity in Video Retrieval
Paper: https://arxiv.org/abs/2103.10095
Homepage: https://mwray.github.io/SSVR/
Code: https://github.com/mwray/Semantic-Video-Retrieval
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
Counterfactual Zero-Shot and Open-Set Visual Recognition
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
CDFI: Compression-Driven Network Design for Frame Interpolation
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Homepage: https://tarun005.github.io/FLAVR/
Paper: https://arxiv.org/abs/2012.08512
Code: https://github.com/tarun005/FLAVR
Transformation Driven Visual Reasoning
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
Paper(Oral): https://arxiv.org/abs/2011.12100
Code: https://github.com/autonomousvision/giraffe
Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1
Taming Transformers for High-Resolution Image Synthesis
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
Self-Supervised Visibility Learning for Novel View Synthesis
NeX: Real-time View Synthesis with Neural Basis Expansion
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Variational Transformer Networks for Layout Generation
Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Adaptive Methods for Real-World Domain Generalization
FSDR: Frequency Space Domain Randomization for Domain Generalization
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
Domain Consensus Clustering for Universal Domain Adaptation
Towards Open World Object Detection
Learning Placeholders for Open-Set Recognition
HOTR: End-to-End Human-Object Interaction Detection with Transformers
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
Reformulating HOI Detection as Adaptive Set Prediction
Detecting Human-Object Interaction via Fabricated Compositional Learning
End-to-End Human Object Interaction Detection with HOI Transformer
Auto-Exposure Fusion for Single-Image Shadow Removal
Parser-Free Virtual Try-on via Distilling Appearance Flows
A Second-Order Approach to Learning with Instance-Dependent Label Noise
Real-Time Selfie Video Stabilization
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf
Code: https://github.com/jiy173/selfievideostabilization
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
Homepage: https://www.yasamin.page/hdnet_tiktok
Paper(Oral): https://arxiv.org/abs/2103.03319
Code: https://github.com/yasaminjafarian/HDNet_TikTok
Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Paper download link:
Learning To Count Everything
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer
Fast and Accurate Model Scaling
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html
Code: https://github.com/facebookresearch/pycls
Omnimatte: Associating Objects and Their Effects in Video
Homepage: https://omnimatte.github.io/
Paper(Oral): https://arxiv.org/abs/2105.06993
Code: https://omnimatte.github.io/#code
Motion Representations for Articulated Animation
Deep Lucas-Kanade Homography for Multimodal Image Alignment
Skip-Convolutions for Efficient Video Processing
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
Homepage: http://tomasjakab.github.io/KeypointDeformer
Paper(Oral): https://arxiv.org/abs/2104.11224
Code: https://github.com/tomasjakab/keypoint_deformer/
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
LEAP: Learning Articulated Occupancy of People
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Towards High Fidelity Face Relighting with Realistic Shadows
BRepNet: A topological message passing system for solid models
Visually Informed Binaural Audio Generation without Binaural Audios
Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
Paper: None
GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc
Exploring intermediate representation for monocular vehicle pose estimation
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
Invertible Image Signal Processing
Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
Embedding Transfer with Label Relaxation for Improved Metric Learning
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
Meta-Mining Discriminative Samples for Kinship Verification
Cloud2Curve: Generation and Vectorization of Parametric Sketches
TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Homepage: http://wellyzhang.github.io/project/prae.html
Paper: https://arxiv.org/abs/2103.14230
ACRE: Abstract Causal REasoning Beyond Covariation
Homepage: http://wellyzhang.github.io/project/acre.html
Paper: https://arxiv.org/abs/2103.14232
Confluent Vessel Trees with Accurate Bifurcations
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
Knowledge Evolution in Neural Networks
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
SGP: Self-supervised Geometric Perception
Oral
Paper: https://arxiv.org/abs/2103.03114
Code: https://github.com/theNded/SGP
Diffusion Probabilistic Models for 3D Point Cloud Generation
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
Exploring Adversarial Fake Images on Face Manifold
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
Temporal Contrastive Graph for Self-supervised Video Representation Learning
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
Fast and Memory-Efficient Compact Bilinear Pooling
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation
https://github.com/ShaoQiangShen/CVPR2021
https://github.com/gillesflash/CVPR2021
https://github.com/anonymous-submission1991/BaLeNAS
https://github.com/cvpr2021dcb/cvpr2021dcb
https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
https://github.com/AldrichZeng/FreqPrune
https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM
https://github.com/ddfss/datadrive-fss
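The Conference on Computer Vision and Pattern Recognition (CVPR) is an annual IEEE academic conference focused on computer vision and pattern recognition. CVPR is one of the world's top computer vision conferences (one of the "big three", alongside ICCV and ECCV). In recent years it has drawn about 1,500 attendees per year and typically accepts around 300 papers.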
CVPR2021
1. Scaled-YOLOv4: Scaling Cross Stage Partial Network
2. You Only Look One-level Feature
3. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
4. End-to-End Object Detection with Fully Convolutional Network
5. Dynamic Head: Unifying Object Detection Heads with Attentions
6. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
7. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
8. MobileDets: Searching for Object Detection Architectures for Mobile Accelerators
Affiliations: University of Wisconsin, Google
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Xiong_MobileDets_Searching_for_Object_Detection_Architectures_for_Mobile_Accelerators_CVPR_2021_paper.pdf
Code: https://github.com/tensorflow/models/tree/master/research/object_detection
9. Tracking Pedestrian Heads in Dense Crowd
10. Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
11. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
12. IQDet: Instance-wise Quality Distribution Sampling for Object Detection
13. Multi-Scale Aligned Distillation for Low-Resolution Detection
14. Adaptive Class Suppression Loss for Long-Tail Object Detection
Affiliations: Chinese Academy of Sciences, University of Chinese Academy of Sciences, ObjectEye, Peking University, Peng Cheng Laboratory, Nexwise
Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL
15. VarifocalNet: An IoU-aware Dense Object Detector
16. OTA: Optimal Transport Assignment for Object Detection
Affiliations: Waseda University, Megvii
Paper: https://arxiv.org/abs/2103.14259
Code: https://github.com/Megvii-BaseDetection/OTA
17. Distilling Object Detectors via Decoupled Features
18. Robust and Accurate Object Detection via Adversarial Learning
Affiliations: Google, UCLA, UCSC
Paper: https://arxiv.org/abs/2103.13886
Code: None
19. OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
20. Multiple Instance Active Learning for Object Detection
21. Towards Open World Object Detection
22. RankDetNet: Delving Into Ranking Constraints for Object Detection
Rotated Object Detection
23. Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
24. ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Affiliations: Wuhan University
Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet
25. Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection
Few-Shot Object Detection
26. Accurate Few-Shot Object Detection With Support-Query Mutual Guidance and Hybrid Loss
Affiliations: Fudan University, Tongji University, Zhejiang University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Zhang_Accurate_Few-Shot_Object_Detection_With_Support-Query_Mutual_Guidance_and_Hybrid_CVPR_2021_paper.html
Code: None
27. Adaptive Image Transformer for One-Shot Object Detection
28. Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
29. Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
Affiliations: Carnegie Mellon University (CMU)
Paper: https://arxiv.org/abs/2103.01903
Code: None
30. FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
31. Hallucination Improves Few-Shot Object Detection
32. Few-Shot Object Detection via Classification Refinement and Distractor Retreatment
33. Generalized Few-Shot Object Detection Without Forgetting
34. Transformation Invariant Few-Shot Object Detection
Affiliations: Huawei Noah's Ark Lab
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_Transformation_Invariant_Few-Shot_Object_Detection_CVPR_2021_paper.html
Code: None
35. UniT: Unified Knowledge Transfer for Any-Shot Object Detection and Segmentation
36. Beyond Max-Margin: Class Margin Equilibrium for Few-Shot Object Detection
Semi-Supervised Object Detection
37. Points As Queries: Weakly Semi-Supervised Object Detection by Points
38. Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
39. Positive-Unlabeled Data Purification in the Wild for Object Detection
Affiliations: Huawei Noah's Ark Lab, University of Sydney, Peking University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Guo_Positive-Unlabeled_Data_Purification_in_the_Wild_for_Object_Detection_CVPR_2021_paper.html
Code: None
40. Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection
41. Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
42. Humble Teachers Teach Better Students for Semi-Supervised Object Detection
43. Interpolation-Based Semi-Supervised Learning for Object Detection
Domain-Adaptive Object Detection
44. Domain-Specific Suppression for Adaptive Object Detection
45. MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
46. Unbiased Mean Teacher for Cross-Domain Object Detection
47. I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Self-Supervised Object Detection
48. There Is More Than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking With Sound by Distilling Multimodal Knowledge
49. Instance Localization for Self-supervised Detection Pretraining
Weakly Supervised Object Detection
50. Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
51. DAP: Detection-Aware Pre-training with Weak Supervision
Others
52. Open-Vocabulary Object Detection Using Captions
Affiliations: Snap, Columbia University
Paper(Oral): https://openaccess.thecvf.com/content/CVPR2021/html/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.html
Code: https://github.com/alirezazareian/ovr-cnn
53. Depth From Camera Motion and Object Detection
Affiliations: University of Michigan, SIAI
Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD
54. Unsupervised Object Detection With LIDAR Clues
55. GAIA: A Transfer Learning System of Object Detection That Fits Your Needs
56. General Instance Distillation for Object Detection
57. AQD: Towards Accurate Quantized Object Detection
58. Scale-Aware Automatic Augmentation for Object Detection
59. Equalization Loss v2: A New Gradient Balance Approach for Long-Tailed Object Detection
60. Class-Aware Robust Adversarial Training for Object Detection
61. Improved Handling of Motion Blur in Online Object Detection
62. Multiple Instance Active Learning for Object Detection
63. Neural Auto-Exposure for High-Dynamic Range Object Detection
64. Generalizable Pedestrian Detection: The Elephant in the Room
Object Tracking (Single- and Multi-Object)
Single-Object Tracking
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Graph Attention Tracking
Rotation Equivariant Siamese Networks for Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
Transformer Tracking
Multi-Object Tracking
Tracking Pedestrian Heads in Dense Crowd
Multiple Object Tracking with Correlation Learning
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Learning a Proposal Classifier for Multiple Object Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
Semantic Segmentation
1. HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation
Affiliations: Facebook AI, Bar-Ilan University, Tel Aviv University
Homepage: https://nirkin.com/hyperseg/
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Nirkin_HyperSeg_Patch-Wise_Hypernetwork_for_Real-Time_Semantic_Segmentation_CVPR_2021_paper.pdf
Code: https://github.com/YuvalNirkin/hyperseg
2. Rethinking BiSeNet For Real-time Semantic Segmentation
Affiliations: Meituan
Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg
3. Progressive Semantic Segmentation
4. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
5. Capturing Omni-Range Context for Omnidirectional Segmentation
6. Learning Statistical Texture for Semantic Segmentation
7. InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
8. DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation
Weakly Supervised Semantic Segmentation
9. Railroad Is Not a Train: Saliency As Pseudo-Pixel Supervision for Weakly Supervised Semantic Segmentation
10. Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
11. Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
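Affiliations: Nanjing University of Science and Technology, MBZUAI, University of Electronic Science and Technology of China, University of Adelaide, University of Technology Sydney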
Paper: https://arxiv.org/abs/2103.14581
Code: https://github.com/NUST-Machine-Intelligence-Laboratory/nsrom
12. Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation
13. BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
Semi-Supervised Semantic Segmentation
14. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
15. Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
16. Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
17. Semantic Segmentation With Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
18. Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation
Domain-Adaptive Semantic Segmentation
19. Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
Affiliations: ETH Zurich, KU Leuven, University of Electronic Science and Technology of China
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Gong_Cluster_Split_Fuse_and_Update_Meta-Learning_for_Open_Compound_Domain_CVPR_2021_paper.html
Code: None
20. Source-Free Domain Adaptation for Semantic Segmentation
21. Uncertainty Reduction for Model Adaptation in Semantic Segmentation
22. Self-Supervised Augmentation Consistency for Adapting Semantic Segmentation
23. RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
24. Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
25. MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
26. Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
27. Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
28. DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
Few-Shot Semantic Segmentation
29. Scale-Aware Graph Neural Network for Few-Shot Semantic Segmentation
30. Anti-Aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation
Unsupervised Semantic Segmentation
31. PiCIE: Unsupervised Semantic Segmentation Using Invariance and Equivariance in Clustering
Video Semantic Segmentation
32. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Others
33. Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Affiliations: University of Padova
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Michieli_Continual_Semantic_Segmentation_via_Repulsion-Attraction_of_Sparse_and_Disentangled_Latent_CVPR_2021_paper.html
Code: https://lttm.dei.unipd.it/paper_data/SDR/
34. Exploit Visual Dependency Relations for Semantic Segmentation
35. Revisiting Superpixels for Active Learning in Semantic Segmentation With Realistic Annotation Costs
36. PLOP: Learning without Forgetting for Continual Semantic Segmentation
37. 3D-to-2D Distillation for Indoor Scene Parsing
38. Bidirectional Projection Network for Cross Dimension Scene Understanding
39. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
Affiliations: Peking University, Chinese Academy of Sciences, University of Chinese Academy of Sciences, ETH Zurich, SenseTime, and others
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Li_PointFlow_Flowing_Semantics_Through_Points_for_Aerial_Image_Segmentation_CVPR_2021_paper.html
Code: https://github.com/lxtGH/PFSegNets
Instance Segmentation
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Incremental Few-Shot Instance Segmentation
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet
Zero-shot instance segmentation(Not Sure)
Video Instance Segmentation
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
End-to-End Video Instance Segmentation with Transformers
Panoptic Segmentation
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Part-aware Panoptic Segmentation
Exemplar-Based Open-Set Panoptic Segmentation Network
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Panoptic Segmentation Forecasting
Fully Convolutional Networks for Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
Medical Image Segmentation
1. Learning Calibrated Medical Image Segmentation via Multi-Rater Agreement Modeling
2. Every Annotation Counts: Multi-Label Deep Supervision for Medical Image Segmentation
3. FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
4. DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
5. DARCNN: Domain Adaptive Region-Based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
Affiliations: Stanford University
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Hsu_DARCNN_Domain_Adaptive_Region-Based_Convolutional_Neural_Network_for_Unsupervised_Instance_CVPR_2021_paper.html
Code: None
Video Object Segmentation
Learning Position and Target Consistency for Memory-based Video Object Segmentation
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Interactive Video Object Segmentation
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Homepage: https://hkchengrex.github.io/MiVOS/
Paper: https://arxiv.org/abs/2103.07941
Code: https://github.com/hkchengrex/MiVOS
Demo: https://hkchengrex.github.io/MiVOS/video.html#partb
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Paper: https://arxiv.org/abs/2103.10391
Code: https://github.com/svip-lab/IVOS-W
Saliency Detection
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
Camouflaged Object Detection
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD
Co-Salient Object Detection
Group Collaborative Learning for Co-Salient Object Detection
Image Matting
Semantic Image Matting
Person Re-identification
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Combined Depth Space based Architecture Search For Person Re-identification
Person Search
Anchor-Free Person Search
Video Understanding / Action Recognition
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
FrameExit: Conditional Early Exiting for Efficient Video Recognition
No frame left behind: Full Video Action Recognition
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
ACTION-Net: Multipath Excitation for Action Recognition
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
TDN: Temporal Difference Networks for Efficient Action Recognition
Face Recognition
A 3D GAN for Improved Large-pose Facial Recognition
MagFace: A Universal Representation for Face Recognition and Quality Assessment
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Face Detection
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
Face Anti-Spoofing
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
Deepfake Detection
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
Multi-attentional Deepfake Detection
Face Age Estimation
Continuous Face Aging via Self-estimated Residual Age Embedding
PML: Progressive Margin Loss for Long-tailed Age Classification
Facial Expression Recognition
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
Deepfakes
MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
Human Parsing
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
2D/3D Human Pose Estimation
2D Human Pose Estimation
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
3D Human Pose Estimation
End-to-End Human Pose and Mesh Reconstruction with Transformers
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
Paper(Oral): https://arxiv.org/abs/2105.02465
Code: https://github.com/jfzhang95/PoseAug
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
Animal Pose Estimation
From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
Hand Pose Estimation
Semi-Supervised 3D Hand-Object Poses Estimation with Interactions in Time
Human Volumetric Capture
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
Homepage: http://www.liuyebin.com/posefusion/posefusion.html
Paper(Oral): https://arxiv.org/abs/2103.15331
Code: None
Scene Text Detection
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Image Compression
Checkerboard Context Model for Efficient Learned Image Compression
Slimmable Compressive Autoencoders for Practical Neural Image Compression
Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
Model Compression / Pruning / Quantization
Teachers Do More Than Teach: Compressing Image-to-Image Models
Model Pruning
Dynamic Slimmable Network
Model Quantization
Network Quantization with Element-wise Gradient Scaling
Zero-shot Adversarial Quantization
Learnable Companding Quantization for Accurate Low-bit Neural Networks
Knowledge Distillation
Distilling Knowledge via Knowledge Review
Distilling Object Detectors via Decoupled Features
Super-Resolution
Image Super-Resolution with Non-Local Sparse Attention
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
AdderSR: Towards Energy Efficient Image Super-Resolution
Dehazing
Contrastive Learning for Compact Single Image Dehazing
Video Super-Resolution
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Image Restoration
Multi-Stage Progressive Image Restoration
Image Inpainting
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
Image Editing
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
High-Fidelity and Arbitrary Face Editing
Anycost GANs for Interactive Image Synthesis and Editing
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Image Captioning
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
Font Generation
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font
Image Matching
LoFTR: Detector-Free Local Feature Matching with Transformers
Convolutional Hough Matching Networks
Image Blending
Bridging the Visual Gap: Wide-Range Image Blending
Paper: https://arxiv.org/abs/2103.15149
Code: https://github.com/julia0607/Wide-Range-Image-Blending
Reflection Removal
Robust Reflection Removal with Reflection-free Flash-only Cues
3D Point Cloud Classification
Equivariant Point Network for 3D Point Cloud Analysis
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
3D Object Detection
3D-MAN: 3D Multi-frame Attention Network for Object Detection
Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
Paper: https://arxiv.org/abs/2104.00902
Code: https://github.com/cvlab-yonsei/HVPR
LiDAR R-CNN: An Efficient and Universal 3D Object Detector
M3DSSD: Monocular 3D Single Stage Object Detector
Paper: https://arxiv.org/abs/2103.13164
Code: https://github.com/mumianyuxin/M3DSSD
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
Center-based 3D Object Detection and Tracking
Categorical Depth Distribution Network for Monocular 3D Object Detection
3D Semantic Segmentation
Bidirectional Projection Network for Cross Dimension Scene Understanding
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
3D Panoptic Segmentation
Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
3D Object Tracking
Center-based 3D Object Detection and Tracking
3D Point Cloud Registration
ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
PREDATOR: Registration of 3D Point Clouds with Low Overlap
3D Point Cloud Completion
Unsupervised 3D Shape Completion through GAN Inversion
Variational Relational Point Completion Network
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
Homepage: https://alphapav.github.io/SpareNet/
Paper: https://arxiv.org/abs/2103.02535
Code: https://github.com/microsoft/SpareNet
3D Reconstruction
Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
Homepage: https://zju3dv.github.io/neuralrecon/
Paper(Oral): https://arxiv.org/abs/2104.00681
Code: https://github.com/zju3dv/NeuralRecon
6D Pose Estimation
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Camera Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
Depth Estimation
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
Beyond Image to Depth: Improving Depth Prediction using Echoes
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
Depth from Camera Motion and Object Detection
Stereo Matching
A Decomposition Model for Stereo Matching
Flow Estimation
Self-Supervised Multi-Frame Monocular Scene Flow
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Learning Optical Flow From Still Images
Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
Code: https://github.com/mattpoggi/depthstillation
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
Lane Detection
Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection
Trajectory Prediction
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
Crowd Counting
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Adversarial Examples
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
LiBRe: A Practical Bayesian Approach to Adversarial Detection
Natural Adversarial Examples
Image Retrieval
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
Video Retrieval
On Semantic Similarity in Video Retrieval
Paper: https://arxiv.org/abs/2103.10095
Homepage: https://mwray.github.io/SSVR/
Code: https://github.com/mwray/Semantic-Video-Retrieval
Cross-Modal Retrieval
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
Zero-Shot Learning
Counterfactual Zero-Shot and Open-Set Visual Recognition
Federated Learning
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
Video Frame Interpolation
CDFI: Compression-Driven Network Design for Frame Interpolation
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Homepage: https://tarun005.github.io/FLAVR/
Paper: https://arxiv.org/abs/2012.08512
Code: https://github.com/tarun005/FLAVR
Visual Reasoning
Transformation Driven Visual Reasoning
Image Synthesis
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
Homepage: https://m-niemeyer.github.io/project-pages/giraffe/index.html
Paper(Oral): https://arxiv.org/abs/2011.12100
Code: https://github.com/autonomousvision/giraffe
Demo: http://www.youtube.com/watch?v=fIaDXC-qRSg&vq=hd1080&autoplay=1
Taming Transformers for High-Resolution Image Synthesis
View Synthesis
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
Self-Supervised Visibility Learning for Novel View Synthesis
NeX: Real-time View Synthesis with Neural Basis Expansion
Style Transfer
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
Layout Generation
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Variational Transformer Networks for Layout Generation
Domain Generalization
Generalization on Unseen Domains via Inference-time Label-Preserving Target Projections
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Adaptive Methods for Real-World Domain Generalization
FSDR: Frequency Space Domain Randomization for Domain Generalization
Domain Adaptation
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
Domain Consensus Clustering for Universal Domain Adaptation
Open-Set
Towards Open World Object Detection
Exemplar-Based Open-Set Panoptic Segmentation Network
Learning Placeholders for Open-Set Recognition
Adversarial Attack
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Human-Object Interaction (HOI) Detection
HOTR: End-to-End Human-Object Interaction Detection with Transformers
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
Reformulating HOI Detection as Adaptive Set Prediction
Detecting Human-Object Interaction via Fabricated Compositional Learning
End-to-End Human Object Interaction Detection with HOI Transformer
Shadow Removal
Auto-Exposure Fusion for Single-Image Shadow Removal
Virtual Try-On
Parser-Free Virtual Try-on via Distilling Appearance Flows
Label Noise
A Second-Order Approach to Learning with Instance-Dependent Label Noise
Video Stabilization
Real-Time Selfie Video Stabilization
Paper: https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Real-Time_Selfie_Video_Stabilization_CVPR_2021_paper.pdf
Code: https://github.com/jiy173/selfievideostabilization
Datasets
Tracking Pedestrian Heads in Dense Crowd
Part-aware Panoptic Segmentation
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
Homepage: https://www.yasamin.page/hdnet_tiktok
Paper(Oral): https://arxiv.org/abs/2103.03319
Code: https://github.com/yasaminjafarian/HDNet_TikTok
Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Paper download link:
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Learning To Count Everything
Semantic Image Matting
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Depth from Camera Motion and Object Detection
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer
Others
Fast and Accurate Model Scaling
Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Dollar_Fast_and_Accurate_Model_Scaling_CVPR_2021_paper.html
Code: https://github.com/facebookresearch/pycls
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
Homepage: https://www.yasamin.page/hdnet_tiktok
Paper(Oral): https://arxiv.org/abs/2103.03319
Code: https://github.com/yasaminjafarian/HDNet_TikTok
Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
Omnimatte: Associating Objects and Their Effects in Video
Homepage: https://omnimatte.github.io/
Paper(Oral): https://arxiv.org/abs/2105.06993
Code: https://omnimatte.github.io/#code
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Motion Representations for Articulated Animation
Deep Lucas-Kanade Homography for Multimodal Image Alignment
Skip-Convolutions for Efficient Video Processing
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
Homepage: http://tomasjakab.github.io/KeypointDeformer
Paper(Oral): https://arxiv.org/abs/2104.11224
Code: https://github.com/tomasjakab/keypoint_deformer/
Learning To Count Everything
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
LEAP: Learning Articulated Occupancy of People
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
Towards High Fidelity Face Relighting with Realistic Shadows
BRepNet: A topological message passing system for solid models
Visually Informed Binaural Audio Generation without Binaural Audios
Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
Paper: None
GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc
Exploring intermediate representation for monocular vehicle pose estimation
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
Invertible Image Signal Processing
Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
Embedding Transfer with Label Relaxation for Improved Metric Learning
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
Meta-Mining Discriminative Samples for Kinship Verification
Cloud2Curve: Generation and Vectorization of Parametric Sketches
TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Homepage: http://wellyzhang.github.io/project/prae.html
Paper: https://arxiv.org/abs/2103.14230
Code: None
ACRE: Abstract Causal REasoning Beyond Covariation
Homepage: http://wellyzhang.github.io/project/acre.html
Paper: https://arxiv.org/abs/2103.14232
Code: None
Confluent Vessel Trees with Accurate Bifurcations
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
Knowledge Evolution in Neural Networks
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
SGP: Self-supervised Geometric Perception
Paper(Oral): https://arxiv.org/abs/2103.03114
Code: https://github.com/theNded/SGP
Diffusion Probabilistic Models for 3D Point Cloud Generation
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
TODO
Not Sure (acceptance unconfirmed)
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
Exploring Adversarial Fake Images on Face Manifold
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
Temporal Contrastive Graph for Self-supervised Video Representation Learning
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
Fast and Memory-Efficient Compact Bilinear Pooling
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation
https://github.com/ShaoQiangShen/CVPR2021
https://github.com/gillesflash/CVPR2021
https://github.com/anonymous-submission1991/BaLeNAS
https://github.com/cvpr2021dcb/cvpr2021dcb
https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
https://github.com/AldrichZeng/FreqPrune
https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM
https://github.com/ddfss/datadrive-fss