Anna Rohrbach

37 publications

11 venues

H Index 22

Affiliation

TU Darmstadt, Germany
University of California, Berkeley, CA, USA
Max Planck Institute for Informatics, Saarbrücken, Germany

Name Venue Year Citations
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts. ICML 2025 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts. CVPR 2025 0
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding. CVPR 2023 52
Using Language to Extend to Unseen Domains. ICLR 2023 0
On Guiding Visual Attention with Language Specification. CVPR 2022 41
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens. NIPS/NeurIPS 2022 17
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. ECCV 2022 66
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. ECCV 2022 81
K-LITE: Learning Transferable Visual Models with External Knowledge. NIPS/NeurIPS 2022 97
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension. ACL 2022 166
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency. ECCV 2022 0
DETReg: Unsupervised Pretraining with Region Priors for Object Detection. CVPR 2022 0
Object-Region Video Transformers. CVPR 2022 0
How Much Can CLIP Benefit Vision-and-Language Tasks? ICLR 2022 0
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media. EMNLP 2021 133
CLIP-It! Language-Guided Video Summarization. NIPS/NeurIPS 2021 161
Compositional Video Synthesis with Action Graphs. ICML 2021 0
Identity-Aware Multi-sentence Video Description. ECCV 2020 22
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules. CVPR 2020 0
Robust Change Captioning. ICCV 2019 207
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation. ACL 2019 106
Language-Conditioned Graph Networks for Relational Reasoning. ICCV 2019 182
Adversarial Inference for Multi-Sentence Video Description. CVPR 2019 0
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. CVPR 2018 461
Video Object Segmentation with Language Referring Expressions. ACCV 2018 247
Women Also Snowboard: Overcoming Bias in Captioning Models. ECCV 2018 509
Textual Explanations for Self-Driving Vehicles. ECCV 2018 418
Speaker-Follower Models for Vision-and-Language Navigation. NIPS/NeurIPS 2018 580
Object Hallucination in Image Captioning. EMNLP 2018 654
Fooling Vision and Language Models Despite Localization and Attention Mechanism. CVPR 2018 0
Gradient-free Policy Architecture Search and Adaptation. CoRL 2017 30
Generating Descriptions with Grounded and Co-referenced People. CVPR 2017 72
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering. CVPR 2017 0
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. EMNLP 2016 1556
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags. AAAI 2016 29
Grounding of Textual Phrases in Images by Reconstruction. ECCV 2016 0
A dataset for Movie Description. CVPR 2015 547