Computer Vision - ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXV
- Binding: Paperback
- Publisher: Springer
- Publish date: 11/26/2022
Description:
- Efficient One-Stage Video Object Detection by Exploiting Temporal Consistency
- Leveraging Action Affinity and Continuity for Semi-Supervised Temporal Action Segmentation
- Spotting Temporally Precise, Fine-Grained Events in Video
- Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
- Efficient Video Transformers with Spatial-Temporal Token Selection
- Long Movie Clip Classification with State-Space Video Models
- Prompting Visual-Language Models for Efficient Video Understanding
- Asymmetric Relation Consistency Reasoning for Video Relation Grounding
- Self-Supervised Social Relation Representation for Human Group Detection
- K-Centered Patch Sampling for Efficient Video Recognition
- A Deep Moving-Camera Background Model
- GraphVid: It Only Takes a Few Nodes to Understand a Video
- Delta Distillation for Efficient Video Processing
- MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
- COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
- E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context
- TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
- Semi-Supervised Learning of Optical Flow by Flow Supervisor
- Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization
- Deep 360 Optical Flow Estimation Based on Multi-Projection Fusion
- MaCLR: Motion-Aware Contrastive Learning of Representations for Videos
- Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
- Frozen CLIP Models Are Efficient Video Learners
- PIP: Physical Interaction Prediction via Mental Simulation with Span Selection
- Panoramic Vision Transformer for Saliency Detection in 360 Videos
- Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration
- Motion Sensitive Contrastive Learning for Self-Supervised Video Representation
- Dynamic Temporal Filtering in Video Models
- Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification
- Temporal Lift Pooling for Continuous Sign Language Recognition
- MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
- SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
- Cross-Modal Prototype Driven Network for Radiology Report Generation
- TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts
- SeqTR: A Simple Yet Universal Network for Visual Grounding
- VTC: Improving Video-Text Retrieval with User Comments
- FashionViL: Fashion-Focused Vision-and-Language Representation Learning
- Weakly Supervised Grounding for VQA in Vision-Language Transformers
- Automatic Dense Annotation of Large-Vocabulary Sign Language Videos
- MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
- GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
- A Simple and Robust Correlation Filtering Method for Text-Based Person Search

Usually Processes in 1 business day