Marcus Rohrbach

61 publications

11 venues

H Index 29

Name Venue Year Citations
Predicting Implicit Arguments in Procedural Video Instructions. ACL 2025 0
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts. ICML 2025 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts. CVPR 2025 0
Efficient Pre-training for Localized Instruction Generation of Procedural Videos. ECCV 2024 1
Improving Selective Visual Question Answering by Learning from Your Peers. CVPR 2023 28
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition. ECCV 2022 48
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. ECCV 2022 81
Learning To Recognize Procedural Activities with Distant Supervision. CVPR 2022 100
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition. ECCV 2022 0
FLAVA: A Foundational Language And Vision Alignment Model. CVPR 2022 0
SMART Frame Selection for Action Recognition. AAAI 2021 0
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA. CVPR 2021 0
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting. ICLR 2021 0
In Defense of Grid Features for Visual Question Answering. CVPR 2020 360
TextCaps: A Dataset for Image Captioning with Reading Comprehension. ECCV 2020 530
Adversarial Continual Learning. ECCV 2020 227
12-in-1: Multi-Task Vision and Language Representation Learning. CVPR 2020 35
Learning to Generate Grounded Visual Captions Without Localization Supervision. ECCV 2020 0
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA. CVPR 2020 0
Decoupling Representation and Classifier for Long-Tailed Recognition. ICLR 2020 0
Uncertainty-guided Continual Learning with Bayesian Neural Networks. ICLR 2020 0
Towards VQA Models That Can Read. CVPR 2019 1856
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition. CVPR 2019 133
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution. ICCV 2019 650
Cycle-Consistency for Robust Visual Question Answering. CVPR 2019 206
Large-Scale Visual Relationship Understanding. AAAI 2019 0
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering. ICML 2019 0
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication. ACL 2019 0
Grounded Video Description. CVPR 2019 0
Graph-Based Global Reasoning Networks. CVPR 2019 0
Adversarial Inference for Multi-Sentence Video Description. CVPR 2019 0
A Dataset for Telling the Stories of Social Media Videos. EMNLP 2018 63
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. CVPR 2018 461
Exploring the Challenges Towards Lifelong Fact Learning. ACCV 2018 12
Memory Aware Synapses: Learning What (not) to Forget. ECCV 2018 11
Visual Coreference Resolution in Visual Dialog Using Neural Module Networks. ECCV 2018 172
Learning to Reason: End-to-End Module Networks for Visual Question Answering. ICCV 2017 599
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training. ICCV 2017 15
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. TPAMI 2017 312
Generating Descriptions with Grounded and Co-referenced People. CVPR 2017 72
Modeling Relationships in Referential Expressions with Compositional Modular Networks. CVPR 2017 0
Captioning Images with Diverse Objects. CVPR 2017 0
Generating Visual Explanations. ECCV 2016 649
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. CVPR 2016 30
Segmentation from Natural Language Expressions. ECCV 2016 520
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. EMNLP 2016 1556
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags. AAAI 2016 29
Grounding of Textual Phrases in Images by Reconstruction. ECCV 2016 0
Neural Module Networks. CVPR 2016 0
Natural Language Object Retrieval. CVPR 2016 0
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images. ICCV 2015 634
Spatial Semantic Regularisation for Large Scale Object Detection. ICCV 2015 25
A dataset for Movie Description. CVPR 2015 547
Long-term recurrent convolutional networks for visual recognition and description. CVPR 2015 0
Sequence to Sequence - Video to Text. ICCV 2015 0
Transfer Learning in a Transductive Setting. NIPS/NeurIPS 2013 253
Translating Video Content to Natural Language Descriptions. ICCV 2013 377
A database for fine grained activity detection of cooking activities. CVPR 2012 626
Script Data for Attribute-Based Recognition of Composite Activities. ECCV 2012 200
Evaluating knowledge transfer and zero-shot learning in a large-scale setting. CVPR 2011 374
What helps where - and why? Semantic relatedness for knowledge transfer. CVPR 2010 0