DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts. | ICML | 2025 | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts. | CVPR | 2025 | 0
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding. | CVPR | 2023 | 52
Using Language to Extend to Unseen Domains. | ICLR | 2023 | 0
On Guiding Visual Attention with Language Specification. | CVPR | 2022 | 41
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens. | NIPS/NeurIPS | 2022 | 17
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. | ECCV | 2022 | 66
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. | ECCV | 2022 | 81
K-LITE: Learning Transferable Visual Models with External Knowledge. | NIPS/NeurIPS | 2022 | 97
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension. | ACL | 2022 | 166
TL;DW? Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency. | ECCV | 2022 | 0
DETReg: Unsupervised Pretraining with Region Priors for Object Detection. | CVPR | 2022 | 0
Object-Region Video Transformers. | CVPR | 2022 | 0
How Much Can CLIP Benefit Vision-and-Language Tasks? | ICLR | 2022 | 0
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media. | EMNLP | 2021 | 133
CLIP-It! Language-Guided Video Summarization. | NIPS/NeurIPS | 2021 | 161
Compositional Video Synthesis with Action Graphs. | ICML | 2021 | 0
Identity-Aware Multi-sentence Video Description. | ECCV | 2020 | 22
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules. | CVPR | 2020 | 0
Robust Change Captioning. | ICCV | 2019 | 207
Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation. | ACL | 2019 | 106
Language-Conditioned Graph Networks for Relational Reasoning. | ICCV | 2019 | 182
Adversarial Inference for Multi-Sentence Video Description. | CVPR | 2019 | 0
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. | CVPR | 2018 | 461
Video Object Segmentation with Language Referring Expressions. | ACCV | 2018 | 247
Women Also Snowboard: Overcoming Bias in Captioning Models. | ECCV | 2018 | 509
Textual Explanations for Self-Driving Vehicles. | ECCV | 2018 | 418
Speaker-Follower Models for Vision-and-Language Navigation. | NIPS/NeurIPS | 2018 | 580
Object Hallucination in Image Captioning. | EMNLP | 2018 | 654
Fooling Vision and Language Models Despite Localization and Attention Mechanism. | CVPR | 2018 | 0
Gradient-free Policy Architecture Search and Adaptation. | CoRL | 2017 | 30
Generating Descriptions with Grounded and Co-referenced People. | CVPR | 2017 | 72
A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering. | CVPR | 2017 | 0
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. | EMNLP | 2016 | 1556
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags. | AAAI | 2016 | 29
Grounding of Textual Phrases in Images by Reconstruction. | ECCV | 2016 | 0
A Dataset for Movie Description. | CVPR | 2015 | 547