Attention layers provably solve single-location regression
Université de Paris - Faculté des Sciences
Preprint / working paper, 2024

Abstract

Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.
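The task described in the abstract can be sketched concretely. The following is a minimal illustration under assumed specifics: the dimensions, the spike construction making the relevant position linearly detectable, and the softmax form of the predictor are illustrative choices, not the paper's exact model, and the directions `k` and `v` (which the paper would learn) are fixed here by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 10, 8  # sequence length and token dimension (illustrative choices)

# Hypothetical directions, not taken from the paper: k makes the relevant
# position linearly detectable, v reads the target value out of that token.
k = rng.normal(size=d); k /= np.linalg.norm(k)
v = rng.normal(size=d); v /= np.linalg.norm(v)

def sample(rng):
    """Draw one (X, y, j0): only the token at the latent position j0
    determines the output; all other tokens are pure noise."""
    X = rng.normal(size=(L, d))
    j0 = int(rng.integers(L))   # latent relevant position
    X[j0] += 6.0 * k            # position retrievable via a linear projection
    y = float(X[j0] @ v)        # output depends on that single token only
    return X, y, j0

def attention_predictor(X):
    """Simplified attention-style predictor: softmax scores from the
    k-projection softly select a token, then a linear readout along v."""
    scores = X @ k
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return float(w @ (X @ v))
```

Because the relevant token's `k`-projection dominates the others, the softmax weight concentrates on position `j0` and the prediction tracks `y`; in the paper the analogous directions are learned from data despite the non-convexity of the problem.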

Dates and versions

hal-04720799 , version 1 (04-10-2024)

Cite

Pierre Marion, Raphaël Berthier, Gérard Biau, Claire Boyer. Attention layers provably solve single-location regression. 2024. ⟨hal-04720799⟩