Language models can learn implicit multi-hop reasoning, but only if they have lots of training data. | EMNLP | 2025 | 4
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers. | ICML | 2025 | 23
A Formal Framework for Understanding Length Generalization in Transformers. | ICLR | 2025 | 0
Separations in the Representational Capabilities of Transformers and Recurrent Architectures. | NIPS/NeurIPS | 2024 | 26
More frequent verbs are associated with more diverse valency frames: Efficient principles at the lexicon-grammar interface. | ACL | 2024 | 1
InversionView: A General-Purpose Method for Reading Information from Neural Activations. | NIPS/NeurIPS | 2024 | 9
Why are Sensitive Functions Hard for Transformers? | ACL | 2024 | 54
Information Locality in the Processing of Classifier-Noun Dependencies in Mandarin Chinese. | Cognitive Science | 2024 | 0
The Expressive Capacity of State Space Models: A Formal Language Perspective. | NIPS/NeurIPS | 2024 | 32
How Do Syntactic Statistics and Semantic Plausibility Modulate Local Coherence Effects. | Cognitive Science | 2023 | 1
Explaining patterns of fusion in morphological paradigms using the memory-surprisal tradeoff. | Cognitive Science | 2022 | 4
Modeling Fixation Behavior in Reading with Character-level Neural Attention. | Cognitive Science | 2022 | 1
An Information-Theoretic Characterization of Morphological Fusion. | EMNLP | 2021 | 15
RNNs can generate bounded hierarchical languages with optimal memory. | EMNLP | 2020 | 59
Character-based Surprisal as a Model of Reading Difficulty in the Presence of Errors. | Cognitive Science | 2019 | 5
An Information-Theoretic Explanation of Adjective Ordering Preferences. | Cognitive Science | 2018 | 45
Modeling Human Reading with Neural Attention. | EMNLP | 2016 | 57