Suvrit Sra

97 publications

18 venues

H Index 39

Affiliation

Massachusetts Institute of Technology (MIT), Laboratory for Information and Decision Systems, Cambridge, MA, USA
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
University of Texas at Austin, Department of Computer Sciences, Austin, TX, USA

Name	Venue	Year	Citations
Graph Transformers Dream of Electric Flow.	ICLR	2025	0
First-Order Methods for Linearly Constrained Bilevel Optimization.	NIPS/NeurIPS	2024	13
How to Escape Sharp Minima with Random Perturbations.	ICML	2024	0
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context.	ICML	2024	0
Linear attention is (maybe) all you need (to understand Transformer optimization).	ICLR	2024	0
Transformers learn to implement preconditioned gradient descent for in-context learning.	NIPS/NeurIPS	2023	260
On the Training Instability of Shuffling SGD with Batch Normalization.	ICML	2023	6
The Crucial Role of Normalization in Sharpness-Aware Minimization.	NIPS/NeurIPS	2023	30
Global optimality for Euclidean CCCP under Riemannian convexity.	ICML	2023	8
Sign and Basis Invariant Networks for Spectral Graph Representation Learning.	ICLR	2023	0
Efficient Sampling on Riemannian Manifolds via Langevin MCMC.	NIPS/NeurIPS	2022	0
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond.	ICLR	2022	0
Understanding the unstable convergence of gradient descent.	ICML	2022	81
CCCP is Frank-Wolfe in disguise.	NIPS/NeurIPS	2022	19
Max-Margin Contrastive Learning.	AAAI	2022	0
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity.	ICML	2022	0
Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective.	ICML	2022	0
Understanding Riemannian Acceleration via a Proximal Extragradient Framework.	COLT	2022	0
Three Operator Splitting with a Nonconvex Loss Function.	ICML	2021	13
Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates.	NIPS/NeurIPS	2021	12
Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD?	COLT	2021	13
Provably Efficient Algorithms for Multi-Objective Competitive RL.	ICML	2021	24
Can contrastive learning avoid shortcut solutions?	NIPS/NeurIPS	2021	164
Online Learning in Unknown Markov Games.	ICML	2021	46
Coping with Label Shift via Distributionally Robust Optimisation.	ICLR	2021	0
Contrastive Learning with Hard Negative Samples.	ICLR	2021	0
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes.	NIPS/NeurIPS	2020	27
Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions.	ICML	2020	90
Strength from Weakness: Fast Learning Using Weak Supervision.	ICML	2020	35
From Nesterov's Estimate Sequence to Riemannian Acceleration.	COLT	2020	85
Geodesically-convex optimization for averaging partially observed covariance matrices.	ACML	2020	3
Why are Adaptive Methods Good for Attention Models?	NIPS/NeurIPS	2020	345
SGD with shuffling: optimal rates without component convexity and large epoch requirements.	NIPS/NeurIPS	2020	70
Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition.	ICML	2020	0
Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.	ICLR	2020	0
Are deep ResNets provably better than linear predictors?	NIPS/NeurIPS	2019	14
Escaping Saddle Points with Adaptive Gradient Methods.	ICML	2019	78
Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator.	ICML	2019	51
Flexible Modeling of Diversity with Strongly Log-Concave Distributions.	NIPS/NeurIPS	2019	12
Random Shuffling Beats SGD after Finite Epochs.	ICML	2019	0
Learning Determinantal Point Processes by Corrective Negative Sampling.	AISTATS	2019	0
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity.	NIPS/NeurIPS	2019	0
Non-Linear Temporal Subspace Representations for Activity Recognition.	CVPR	2018	45
An Estimate Sequence for Geodesically Convex Optimization.	COLT	2018	63
Direct Runge-Kutta Discretization Achieves Acceleration.	NIPS/NeurIPS	2018	112
Exponentiated Strongly Rayleigh Distributions.	NIPS/NeurIPS	2018	14
A Generic Approach for Escaping Saddle points.	AISTATS	2018	0
Modular Proximal Optimization for Multidimensional Total-Variation Regularization.	JMLR	2018	0
Elementary Symmetric Polynomials for Optimal Experimental Design.	NIPS/NeurIPS	2017	20
Polynomial time algorithms for dual volume sampling.	NIPS/NeurIPS	2017	31
Combinatorial Topic Models using Small-Variance Asymptotics.	AISTATS	2017	0
Fast DPP Sampling for Nyström with Application to Kernel Methods.	ICML	2016	76
Kronecker Determinantal Point Processes.	NIPS/NeurIPS	2016	32
Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization.	NIPS/NeurIPS	2016	202
First-order Methods for Geodesically Convex Optimization.	COLT	2016	320
AdaDelay: Delay Adaptive Distributed Stochastic Optimization.	AISTATS	2016	45
Geometric Mean Metric Learning.	ICML	2016	178
Stochastic Variance Reduction for Nonconvex Optimization.	ICML	2016	642
Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling.	NIPS/NeurIPS	2016	39
Gaussian quadrature for matrix inverse forms with applications.	ICML	2016	0
Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms.	ICML	2016	0
Efficient Sampling for k-Determinantal Point Processes.	AISTATS	2016	0
Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds.	NIPS/NeurIPS	2016	0
Fixed-point algorithms for learning determinantal point processes.	ICML	2015	55
On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants.	NIPS/NeurIPS	2015	199
Matrix Manifold Optimization for Gaussian Mixtures.	NIPS/NeurIPS	2015	97
Data modeling with the elliptical gamma distribution.	AISTATS	2015	6
Large-scale randomized-coordinate descent methods with non-separable linear constraints.	UAI	2015	0
Fast Newton methods for the group fused lasso.	UAI	2014	17
Efficient Structured Matrix Rank Minimization.	NIPS/NeurIPS	2014	20
Towards an optimal stochastic alternating direction method of multipliers.	ICML	2014	59
Riemannian Sparse Coding for Positive Definite Matrices.	ECCV	2014	55
Randomized Nonlinear Component Analysis.	ICML	2014	183
Geometric optimisation on positive definite matrices for elliptically contoured distributions.	NIPS/NeurIPS	2013	30
Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices.	TPAMI	2013	180
Reflection methods for user-friendly submodular optimization.	NIPS/NeurIPS	2013	80
Fast projections onto mixed-norm balls with applications.	DMKD	2012	29
A new metric on the manifold of kernel matrices with application to matrix geometric means.	NIPS/NeurIPS	2012	155
Scalable nonconvex inexact proximal splitting.	NIPS/NeurIPS	2012	69
Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval.	ECML/PKDD	2011	50
Fast Newton-type Methods for Total Variation Regularization.	ICML	2011	94
Efficient similarity search for covariance matrices via the Jensen-Bregman LogDet Divergence.	ICCV	2011	85
Fast Projections onto ℓ1,q-Norm Balls for Grouped Feature Selection.	ECML/PKDD	2011	39
Efficient filter flow for space-variant multiframe blind deconvolution.	CVPR	2010	265
A scalable trust-region algorithm with application to mixed-norm regression.	ICML	2010	40
Workshop summary: Numerical mathematics in machine learning.	ICML	2009	0
Convex Perturbations for Scalable Semidefinite Programming.	AISTATS	2009	9
Block-Iterative Algorithms for Non-negative Matrix Approximation.	ICDM	2008	5
Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation Problem.	SDM	2007	145
Information-theoretic metric learning.	ICML	2007	0
Efficient Large Scale Linear Programming Support Vector Machines.	ECML/PKDD	2006	20
Incremental Aspect Models for Mining Document Streams.	ECML/PKDD	2006	19
Generalized Nonnegative Matrix Approximations with Bregman Divergences.	NIPS/NeurIPS	2005	522
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions.	JMLR	2005	1034
Triangle Fixing Algorithms for the Metric Nearness Problem.	NIPS/NeurIPS	2004	23
Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data.	SDM	2004	329
Generative model-based clustering of directional data.	KDD	2003	123