The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It. | EMNLP | 2025 | 7 |
Reason to Rote: Rethinking Memorization in Reasoning. | EMNLP | 2025 | 3 |
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns. | ACL | 2025 | 4 |
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges. | ACL | 2025 | 20 |
Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze. | EMNLP | 2025 | 1 |
Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages. | COLING | 2025 | 0 |
LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference. | EMNLP | 2025 | 2 |
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation. | EMNLP | 2025 | 3 |
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis. | EMNLP | 2025 | 7 |
Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set. | ACL | 2025 | 6 |
Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA. | AAAI | 2025 | 0 |
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models. | ACL | 2025 | 0 |
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks. | ACL | 2025 | 0 |
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study. | ACL | 2025 | 0 |
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation. | ICLR | 2025 | 0 |
Evaluating Pixel Language Models on Non-Standardized Languages. | COLING | 2025 | 0 |
RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs. | EMNLP | 2025 | 0 |
Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification. | ACL | 2024 | 7 |
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models. | EMNLP | 2024 | 11 |
VariErr NLI: Separating Annotation Error from Human Label Variation. | ACL | 2024 | 43 |
Position: Insights from Survey Methodology can Improve Training Data. | ICML | 2024 | 11 |
Interpreting Predictive Probabilities: Model Confidence or Human Label Variation? | EACL | 2024 | 17 |
NNOSE: Nearest Neighbor Occupational Skill Extraction. | EACL | 2024 | 11 |
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning. | ACL | 2024 | 16 |
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties. | EACL | 2024 | 6 |
ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation. | EMNLP | 2023 | 15 |
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives. | ACL | 2023 | 11 |
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability. | EMNLP | 2023 | 46 |
From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification. | EMNLP | 2023 | 15 |
ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain. | ACL | 2023 | 28 |
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation. | EMNLP | 2023 | 3 |
On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers. | CoNLL | 2022 | 5 |
Probing for Labeled Dependency Trees. | ACL | 2022 | 10 |
Spectral Probing. | EMNLP | 2022 | 3 |
Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning. | JAIR | 2022 | 61 |
Stop Measuring Calibration When Humans Disagree. | EMNLP | 2022 | 67 |
Evidence > Intuition: Transferability Estimation for Encoder Selection. | EMNLP | 2022 | 1 |
The "Problem" of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation. | EMNLP | 2022 | 0 |
Genre as Weak Supervision for Cross-lingual Dependency Parsing. | EMNLP | 2021 | 20 |
Learning from Disagreement: A Survey. | JAIR | 2021 | 254 |
Biomedical Event Extraction as Sequence Labeling. | EMNLP | 2020 | 74 |
DaN+: Danish Nested Named Entities and Lexical Normalization. | COLING | 2020 | 40 |
Neural Unsupervised Domain Adaptation in NLP - A Survey. | COLING | 2020 | 0 |
Psycholinguistics Meets Continual Learning: Measuring Catastrophic Forgetting in Visual Question Answering. | ACL | 2019 | 55 |
Bleaching Text: Abstract Features for Cross-lingual Gender Prediction. | ACL | 2018 | 63 |
Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging. | EMNLP | 2018 | 53 |
Strong Baselines for Neural Semi-Supervised Learning under Domain Shift. | ACL | 2018 | 177 |
Parsing Universal Dependencies without training. | EACL | 2017 | 19 |
Learning to select data for transfer learning with Bayesian Optimization. | EMNLP | 2017 | 201 |
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (Extended Abstract). | IJCAI | 2017 | 25 |
Cross-lingual tagger evaluation without test data. | EACL | 2017 | 6 |
When is multitask learning effective? Semantic sequence prediction under varying data conditions. | EACL | 2017 | 0 |
Semantic Tagging with Deep Residual Networks. | COLING | 2016 | 80 |
Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. | ACL | 2016 | 419 |
Multi-view and multi-task training of RST discourse parsers. | COLING | 2016 | 56 |
Keystroke dynamics as signal for shallow syntactic parsing. | COLING | 2016 | 44 |
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures. | JAIR | 2016 | 381 |
Do dependency parsing metrics correlate with human judgments? | CoNLL | 2015 | 18 |
Inverted indexing for cross-lingual NLP. | ACL | 2015 | 92 |
Semantic Representations for Domain Adaptation: A Case Study on the Tree Kernel-based Method for Relation Extraction. | ACL | 2015 | 42 |
Using Frame Semantics for Knowledge Extraction from Twitter. | AAAI | 2015 | 34 |
Linguistically debatable or just plain wrong? | ACL | 2014 | 133 |
Adapting taggers to Twitter with not-so-distant supervision. | COLING | 2014 | 39 |
Learning part-of-speech taggers with inter-annotator agreement loss. | EACL | 2014 | 126 |
Importance weighting and unsupervised domain adaptation of POS taggers: a negative result. | EMNLP | 2014 | 22 |
Experiments with crowdsourced re-annotation of a POS tagging data set. | ACL | 2014 | 48 |
Opinion Mining on YouTube. | ACL | 2014 | 48 |
What's in a p-value in NLP? | CoNLL | 2014 | 0 |
Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction. | ACL | 2013 | 126 |
Reversible Stochastic Attribute-Value Grammars. | ACL | 2011 | 51 |
Effective Measures of Domain Similarity for Parsing. | ACL | 2011 | 109 |