Efficient Pre-training for Localized Instruction Generation of Procedural Videos. | ECCV | 2024 | 0 |
Improving Selective Visual Question Answering by Learning from Your Peers. | CVPR | 2023 | 0 |
Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. | ECCV | 2022 | 2 |
Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition. | ECCV | 2022 | 1 |
Learning To Recognize Procedural Activities with Distant Supervision. | CVPR | 2022 | 7 |
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition. | ECCV | 2022 | 0 |
FLAVA: A Foundational Language And Vision Alignment Model. | CVPR | 2022 | 0 |
SMART Frame Selection for Action Recognition. | AAAI | 2021 | 0 |
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA. | CVPR | 2021 | 0 |
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting. | ICLR | 2021 | 0 |
Adversarial Continual Learning. | ECCV | 2020 | 81 |
In Defense of Grid Features for Visual Question Answering. | CVPR | 2020 | 175 |
TextCaps: A Dataset for Image Captioning with Reading Comprehension. | ECCV | 2020 | 72 |
Learning to Generate Grounded Visual Captions Without Localization Supervision. | ECCV | 2020 | 0 |
Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA. | CVPR | 2020 | 0 |
12-in-1: Multi-Task Vision and Language Representation Learning. | CVPR | 2020 | 0 |
Decoupling Representation and Classifier for Long-Tailed Recognition. | ICLR | 2020 | 0 |
Uncertainty-guided Continual Learning with Bayesian Neural Networks. | ICLR | 2020 | 0 |
Cycle-Consistency for Robust Visual Question Answering. | CVPR | 2019 | 113 |
Towards VQA Models That Can Read. | CVPR | 2019 | 240 |
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution. | ICCV | 2019 | 330 |
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition. | CVPR | 2019 | 76 |
Large-Scale Visual Relationship Understanding. | AAAI | 2019 | 0 |
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering. | ICML | 2019 | 0 |
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication. | ACL | 2019 | 0 |
Grounded Video Description. | CVPR | 2019 | 0 |
Graph-Based Global Reasoning Networks. | CVPR | 2019 | 0 |
Adversarial Inference for Multi-Sentence Video Description. | CVPR | 2019 | 0 |
A Dataset for Telling the Stories of Social Media Videos. | EMNLP | 2018 | 44 |
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. | CVPR | 2018 | 287 |
Exploring the Challenges Towards Lifelong Fact Learning. | ACCV | 2018 | 12 |
Visual Coreference Resolution in Visual Dialog Using Neural Module Networks. | ECCV | 2018 | 128 |
Memory Aware Synapses: Learning What (not) to Forget. | ECCV | 2018 | 0 |
Generating Descriptions with Grounded and Co-referenced People. | CVPR | 2017 | 54 |
Learning to Reason: End-to-End Module Networks for Visual Question Answering. | ICCV | 2017 | 475 |
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training. | ICCV | 2017 | 218 |
Modeling Relationships in Referential Expressions with Compositional Modular Networks. | CVPR | 2017 | 0 |
Captioning Images with Diverse Objects. | CVPR | 2017 | 0 |
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. | TPAMI | 2017 | 0 |
Segmentation from Natural Language Expressions. | ECCV | 2016 | 212 |
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. | EMNLP | 2016 | 1218 |
Generating Visual Explanations. | ECCV | 2016 | 509 |
Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags. | AAAI | 2016 | 30 |
Grounding of Textual Phrases in Images by Reconstruction. | ECCV | 2016 | 0 |
Neural Module Networks. | CVPR | 2016 | 0 |
Natural Language Object Retrieval. | CVPR | 2016 | 0 |
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. | CVPR | 2016 | 0 |
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images. | ICCV | 2015 | 551 |
Spatial Semantic Regularisation for Large Scale Object Detection. | ICCV | 2015 | 22 |
A dataset for Movie Description. | CVPR | 2015 | 360 |
Long-term recurrent convolutional networks for visual recognition and description. | CVPR | 2015 | 0 |
Sequence to Sequence - Video to Text. | ICCV | 2015 | 0 |
Transfer Learning in a Transductive Setting. | NIPS/NeurIPS | 2013 | 229 |
Translating Video Content to Natural Language Descriptions. | ICCV | 2013 | 334 |
Script Data for Attribute-Based Recognition of Composite Activities. | ECCV | 2012 | 142 |
A database for fine grained activity detection of cooking activities. | CVPR | 2012 | 517 |
Evaluating knowledge transfer and zero-shot learning in a large-scale setting. | CVPR | 2011 | 333 |
What helps where - and why? Semantic relatedness for knowledge transfer. | CVPR | 2010 | 0 |