Anna Rohrbach

37 publications

11 venues

h-index 22

Affiliation

TU Darmstadt, Germany
University of California, Berkeley, CA, USA
Max Planck Institute for Informatics, Saarbrücken, Germany

Title	Venue	Year	Citations
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts.	ICML	2025	0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts.	CVPR	2025	0
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding.	CVPR	2023	52
Using Language to Extend to Unseen Domains.	ICLR	2023	0
On Guiding Visual Attention with Language Specification.	CVPR	2022	41
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens.	NIPS/NeurIPS	2022	17
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning.	ECCV	2022	66
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly.	ECCV	2022	81
K-LITE: Learning Transferable Visual Models with External Knowledge.	NIPS/NeurIPS	2022	97
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension.	ACL	2022	166
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency.	ECCV	2022	0
DETReg: Unsupervised Pretraining with Region Priors for Object Detection.	CVPR	2022	0
Object-Region Video Transformers.	CVPR	2022	0
How Much Can CLIP Benefit Vision-and-Language Tasks?	ICLR	2022	0
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media.	EMNLP	2021	133
CLIP-It! Language-Guided Video Summarization.	NIPS/NeurIPS	2021	161
Compositional Video Synthesis with Action Graphs.	ICML	2021	0
Identity-Aware Multi-sentence Video Description.	ECCV	2020	22
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules.	CVPR	2020	0
Robust Change Captioning.	ICCV	2019	207
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation.	ACL	2019	106
Language-Conditioned Graph Networks for Relational Reasoning.	ICCV	2019	182
Adversarial Inference for Multi-Sentence Video Description.	CVPR	2019	0
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.	CVPR	2018	461
Video Object Segmentation with Language Referring Expressions.	ACCV	2018	247
Women Also Snowboard: Overcoming Bias in Captioning Models.	ECCV	2018	509
Textual Explanations for Self-Driving Vehicles.	ECCV	2018	418
Speaker-Follower Models for Vision-and-Language Navigation.	NIPS/NeurIPS	2018	580
Object Hallucination in Image Captioning.	EMNLP	2018	654
Fooling Vision and Language Models Despite Localization and Attention Mechanism.	CVPR	2018	0
Gradient-free Policy Architecture Search and Adaptation.	CoRL	2017	30
Generating Descriptions with Grounded and Co-referenced People.	CVPR	2017	72
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering.	CVPR	2017	0
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding.	EMNLP	2016	1556
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags.	AAAI	2016	29
Grounding of Textual Phrases in Images by Reconstruction.	ECCV	2016	0
A Dataset for Movie Description.	CVPR	2015	547