Concise one-layer transformers can do function evaluation (sometimes)
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Yale University, USA.
Yale University, USA.
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-244110
OAI: oai:DiVA.org:umu-244110
DiVA, id: diva2:1997317
Available from: 2025-09-11 Created: 2025-09-11 Last updated: 2025-09-12 Bibliographically approved
Part of thesis
1. Transformers as recognizers and transducers
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Transformers som igenkännare och transduktorer
Abstract [en]

This thesis studies the computational expressivity of transformers from the perspective of formal language theory, circuit complexity, and transductions. It develops a unified framework that locates finite-precision transformer architectures within established constant-depth circuit classes and descriptive logics, and links their recognition power to well-known hierarchies of string-to-string functions.

The analysis focuses on architectures under explicit and realistic constraints: constant depth, O(log n) precision, and simple positional features. Under these assumptions, encoder-only transformers with average-hard attention lie within DLOGTIME-uniform TC0. With a small set of positional encodings, masked average-hard encoders can simulate a calculus that contains the aperiodic polyregular transductions. Together, these results point to constant-depth counting (e.g., prefix sums and majority) as a shared mechanism behind both recognition and transduction.
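
As an illustration (not drawn from the thesis itself), the counting mechanism mentioned above can be sketched in a few lines of numpy: average-hard attention attends uniformly to all positions with the maximal score, so with tied scores and causal masking each position receives the prefix average of the values, from which a majority bit can be thresholded. The function and variable names here are invented for the sketch.

```python
import numpy as np

def average_hard_attention(scores, values, mask=None):
    """Average-hard attention: each query attends uniformly to every
    position that achieves the maximal (masked) score."""
    s = scores.astype(float).copy()
    if mask is not None:
        s[~mask] = -np.inf          # masked positions can never win
    out = np.empty((s.shape[0], values.shape[1]))
    for i in range(s.shape[0]):
        sel = s[i] == s[i].max()    # all tied maximizers
        out[i] = values[sel].mean(axis=0)
    return out

# With uniform (all-tied) scores and causal masking, position i averages
# values[0..i]; on a 0/1 sequence this is the prefix proportion of 1s,
# and thresholding at 1/2 yields a running majority.
x = np.array([1, 0, 1, 1, 0], dtype=float)[:, None]
n = len(x)
scores = np.zeros((n, n))                    # every score tied
causal = np.tril(np.ones((n, n), dtype=bool))
prefix_means = average_hard_attention(scores, x, causal)
majority = (prefix_means[:, 0] > 0.5).astype(int)
```

This is the sense in which constant-depth counting (prefix sums, majority) falls out of a single average-hard attention layer.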

The thesis further shows that temperature-scaled softmax attention can match the selection behavior of unique-hard attention, and that for average-hard heads with a unique maximizer per query (uniform-tieless), the same outputs can be obtained without changing the learned weights by choosing a length-dependent temperature. It also characterizes trade-offs in succinct models: some table-lookup layouts can be solved by a single concise layer under structure-aware encodings, whereas structure-oblivious layouts require either additional depth or super-polylogarithmic communication size.
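
The temperature-scaling idea can be illustrated with a small numpy sketch (the constants and the bound in the comment are illustrative assumptions, not the thesis's exact statement): when the maximizing score is unique with gap g to the runner-up, dividing scores by a temperature shrinking like g / log(n/eps) concentrates the softmax to within eps of the hard one-hot selection.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Unique-hard attention selects the single argmax position. With a
# unique maximizer and score gap g, softmax at temperature t puts at
# least 1 / (1 + (n-1) * exp(-g/t)) mass on it, so a length-dependent
# temperature t ~ g / log(n/eps) suffices for eps-accuracy.
scores = np.array([0.2, 1.0, 0.4, 0.7])
n, eps = len(scores), 1e-3
g = 0.3                     # gap between the top two scores
t = g / np.log(n / eps)     # length-dependent temperature
weights = softmax(scores / t)
```

Note that only the temperature changes with the input length n; the scores (i.e., the learned weights producing them) are untouched, which is the point of the uniform-tieless result.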

Throughout, constructive proofs are paired with reference implementations and learning experiments on synthetic tasks. This combination clarifies which theoretically possible behaviors emerge in practice and where learnability lags expressivity, especially for length extrapolation. The thesis concludes with open directions toward adaptive masking, resource-aware analyses, and conditions under which standard training enables the models to reach their theoretical potential, strengthening the bridge between formal methods and neural language modeling.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 34
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-244111
ISBN: 978-91-8070-790-9
ISBN: 978-91-8070-791-6
Public defence
2025-10-07, SAM.A.280, Samhällsvetarhuset, Umeå, 13:00 (English)
Opponent
Supervisors
Available from: 2025-09-16 Created: 2025-09-11 Last updated: 2025-09-17 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Person

Strobl, Lena
