Transformers as recognizers and transducers
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Transformers som igenkännare och transduktorer (Swedish)
Abstract [en]

This thesis studies the computational expressivity of transformers from the perspective of formal language theory, circuit complexity, and transductions. It develops a unified framework that locates finite-precision transformer architectures within established constant-depth circuit classes and descriptive logics, and links their recognition power to well-known hierarchies of string-to-string functions.

The analysis focuses on architectures under explicit and realistic constraints: constant depth, O(log n) precision, and simple positional features. Under these assumptions, encoder-only transformers with average-hard attention lie within DLOGTIME-uniform TC0. With a small set of positional encodings, masked average-hard encoders can simulate a calculus that contains the aperiodic polyregular transductions. Together, these results point to constant-depth counting (e.g., prefix sums and majority) as a shared mechanism behind both recognition and transduction.
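
As an illustration of the constant-depth counting mechanism mentioned above, the following minimal NumPy sketch (my own toy construction, not taken from the thesis; the helper names and the tied-score setup are assumptions) shows how a single masked average-hard attention head produces prefix averages, from which counts and majority decisions can be read off by later layers:

    import numpy as np

    def average_hard_attention(scores):
        # Average-hard attention: each query's weight is split uniformly
        # over all key positions that attain the maximum score.
        m = scores.max(axis=-1, keepdims=True)
        tied = (scores == m).astype(float)
        return tied / tied.sum(axis=-1, keepdims=True)

    def causal_prefix_average(values):
        # With causal masking and constant (tied) scores, position i receives
        # the mean of values[0..i]: a normalized prefix sum, from which
        # counting and strict majority can be recovered.
        n = len(values)
        scores = np.zeros((n, n))
        scores[np.triu_indices(n, k=1)] = -np.inf  # mask out future positions
        att = average_hard_attention(scores)
        return att @ np.asarray(values, dtype=float)

    bits = [1, 0, 1, 1, 0, 1]
    prefix_avg = causal_prefix_average(bits)
    print(prefix_avg)        # approx. [1.0, 0.5, 0.667, 0.75, 0.6, 0.667]
    print(prefix_avg > 0.5)  # strict majority-of-ones decision per prefix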

The thesis further shows that temperature-scaled softmax attention can match the selection behavior of unique-hard attention, and that for average-hard heads with a unique maximizer per query (uniform-tieless), the same outputs can be obtained without changing the learned weights by choosing a length-dependent temperature. It also characterizes trade-offs in succinct models: some table-lookup layouts can be solved by a single concise layer under structure-aware encodings, whereas structure-oblivious layouts require either additional depth or super-polylogarithmic communication size.
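
The temperature-scaling claim in the paragraph above can be illustrated numerically. The sketch below uses toy scores of my own choosing and a plain scalar t as an inverse temperature; the thesis's actual length-dependent schedule for tieless average-hard heads is not reproduced here:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def unique_hard(x):
        # Unique-hard attention: all weight on the single maximizing key.
        out = np.zeros_like(x)
        out[np.argmax(x)] = 1.0
        return out

    scores = np.array([2.0, 3.5, 1.0, 3.4])  # unique maximizer at index 1
    for t in (1.0, 10.0, 100.0):             # t acts as an inverse temperature
        print(t, np.round(softmax(t * scores), 4))
    print(unique_hard(scores))
    # As t grows, softmax(t * scores) approaches the one-hot unique-hard
    # distribution; only the scaling changes, not the learned weights.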

Throughout, constructive proofs are paired with reference implementations and learning experiments on synthetic tasks. This combination clarifies which theoretically possible behaviors emerge in practice and where learnability lags expressivity, especially for length extrapolation. The thesis concludes with open directions toward adaptive masking, resource-aware analyses, and conditions under which standard training enables the models to reach their theoretical potential, strengthening the bridge between formal methods and neural language modeling.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 34
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-244111
ISBN: 978-91-8070-790-9 (print)
ISBN: 978-91-8070-791-6 (digital)
OAI: oai:DiVA.org:umu-244111
DiVA, id: diva2:1997321
Public defence
2025-10-07, SAM.A.280, Samhällsvetarhuset, Umeå, 13:00 (English)
Opponent
Supervisor
Available from: 2025-09-16 Created: 2025-09-11 Last updated: 2025-09-17 Bibliographically approved
List of papers
1. Average-hard attention transformers are constant-depth uniform threshold circuits
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244107 (URN)
Available from: 2025-09-11 Created: 2025-09-11 Last updated: 2025-09-12 Bibliographically approved
2. What formal languages can transformers express? A survey
2024 (English) In: Transactions of the Association for Computational Linguistics, E-ISSN 2307-387X, Vol. 12, pp. 543-561. Article in journal (Refereed) Published
Abstract [en]

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the power of transformers relative to other models of computation, their fundamental capabilities and limits, and the impact of architectural choices. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.

Place, publisher, year, edition, pages
MIT Press, 2024
National subject category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:umu:diva-225336 (URN)
10.1162/tacl_a_00663 (DOI)
001259434600001 ()
2-s2.0-85193832932 (Scopus ID)
Available from: 2024-06-04 Created: 2024-06-04 Last updated: 2025-09-12 Bibliographically approved
3. Transformers as transducers
2025 (English) In: Transactions of the Association for Computational Linguistics, E-ISSN 2307-387X, Vol. 13, pp. 200-219. Article in journal (Refereed) Published
Abstract [en]

We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of (total functional) transductions. We do so using variants of RASP, a programming language designed to help people “think like transformers,” as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence transductions and show that it computes exactly the first-order rational transductions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular transductions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular transductions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.
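
As a pointer to what the prefix-sum primitive buys, here is a plain-Python sketch (not the paper's S-RASP program; the reading of "squaring" as mapping w to |w| concatenated copies of itself, and all names below, are assumptions of mine):

    def prefix_sum(xs):
        # Illustrative prefix-sum primitive: position i holds sum(xs[0..i]).
        out, total = [], 0
        for x in xs:
            total += x
            out.append(total)
        return out

    def square(w):
        # Quadratic-length output: |w| copies of w. Output position o maps
        # back to input position o mod |w|; prefix sums are what make this
        # kind of position arithmetic available in the S-RASP setting.
        n = len(w)
        return "".join(w[o % n] for o in range(n * n))

    print(prefix_sum([1, 0, 1, 1]))  # [1, 1, 2, 3]
    print(square("abc"))             # abcabcabc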

Place, publisher, year, edition, pages
MIT Press, 2025
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-242255 (URN)
10.1162/tacl_a_00736 (DOI)
001436318200002 ()
2-s2.0-105010222957 (Scopus ID)
Available from: 2025-07-17 Created: 2025-07-17 Last updated: 2025-10-09 Bibliographically approved
4. Simulating hard attention using soft attention
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244109 (URN)
Available from: 2025-09-11 Created: 2025-09-11 Last updated: 2025-09-12 Bibliographically approved
5. Concise one-layer transformers can do function evaluation (sometimes)
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244110 (URN)
Available from: 2025-09-11 Created: 2025-09-11 Last updated: 2025-09-12 Bibliographically approved

Open Access in DiVA

fulltext (1550 kB), 243 downloads
File information
File name: FULLTEXT04.pdf, File size: 1550 kB, Checksum (SHA-512):
c103b656f65bac4d1e66b739ea0790d292bb142f76d49de26069d758c0b6a4ed951f033fc6c36588691869a46c0d7a98239798372e7a9dc3660a82f42ce1b33b
Type: fulltext, Mimetype: application/pdf
spikblad (271 kB), 28 downloads
File information
File name: SPIKBLAD01.pdf, File size: 271 kB, Checksum (SHA-512):
96ad22d2a858d3609355db94b64bba74c764466170484331f7cf93eb209eeff9422019267222242c1e1c9e7ef539200137becb719cb1986b94e7cc3b448877a5
Type: spikblad, Mimetype: application/pdf

Person

Strobl, Lena

Total: 255 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.
