Strobl, Lena
Publications (6 of 6)
Strobl, L. (2025). Transformers as recognizers and transducers. (Doctoral dissertation). Umeå: Umeå University
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Transformers som igenkännare och transduktorer
Abstract [en]

This thesis studies the computational expressivity of transformers from the perspectives of formal language theory, circuit complexity, and the theory of transductions. It develops a unified framework that locates finite-precision transformer architectures within established constant-depth circuit classes and descriptive logics, and links their recognition power to well-known hierarchies of string-to-string functions.

The analysis focuses on architectures under explicit and realistic constraints: constant depth, O(log n) precision, and simple positional features. Under these assumptions, encoder-only transformers with average-hard attention lie within DLOGTIME-uniform TC0. With a small set of positional encodings, masked average-hard encoders can simulate S-RASP, a calculus that contains the first-order (aperiodic) polyregular transductions. Together, these results point to constant-depth counting (e.g., prefix sums and majority) as a shared mechanism behind both recognition and transduction.
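
To make that shared mechanism concrete, here is a minimal numpy sketch, written for this summary rather than taken from the thesis's reference implementations; the function name and the simplification to uniform (tied) attention scores are my own. A causally masked average-hard head with uniform scores outputs prefix means, from which a prefix sum or a majority bit follows by elementary post-processing.

import numpy as np

def masked_average_hard_head(values):
    # Average-hard attention with a causal mask and uniform scores:
    # position i attends equally to all j <= i, so its output is the
    # mean of values[0..i], i.e. a prefix sum scaled by 1/(i+1).
    return np.array([values[: i + 1].mean() for i in range(len(values))])

bits = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
frac = masked_average_hard_head(bits)        # running fraction of ones
counts = frac * np.arange(1, len(bits) + 1)  # prefix sums recovered from means
majority = (frac > 0.5).astype(int)          # majority via a single threshold
print(counts)    # [1. 1. 2. 3. 3. 4.]
print(majority)  # [1 0 1 1 1 1]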

The thesis further shows that temperature-scaled softmax attention can match the selection behavior of unique-hard attention, and that for average-hard heads with a unique maximizer per query (uniform-tieless), the same outputs can be obtained without changing the learned weights by choosing a length-dependent temperature. It also characterizes trade-offs in succinct models: some table-lookup layouts can be solved by a single concise layer under structure-aware encodings, whereas structure-oblivious layouts require either additional depth or super-polylogarithmic communication size.
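
The temperature-scaling claim admits a small numerical illustration; the sketch below is mine, not the thesis's construction, and it fixes a single input, whereas the thesis chooses the temperature as a function of the input length n. When the maximizing key for a query is unique, dividing the scores by a shrinking temperature drives softmax attention toward the unique-hard (argmax) output while the weights that produced the scores stay untouched.

import numpy as np

def softmax(z):
    z = z - z.max()              # numerically stabilized softmax
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
scores = rng.normal(size=8)      # one query's attention scores (unique max)
values = rng.normal(size=8)

hard = values[scores.argmax()]   # unique-hard attention output
for temp in (1.0, 0.1, 0.01):
    soft = softmax(scores / temp) @ values
    print(temp, abs(soft - hard))  # the gap shrinks as temp -> 0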

Throughout, constructive proofs are paired with reference implementations and learning experiments on synthetic tasks. This combination clarifies which theoretically possible behaviors emerge in practice and where learnability lags expressivity, especially for length extrapolation. The thesis concludes with open directions toward adaptive masking, resource-aware analyses, and conditions under which standard training enables the models to reach their theoretical potential, strengthening the bridge between formal methods and neural language modeling.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 34
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244111 (URN), 978-91-8070-790-9 (ISBN), 978-91-8070-791-6 (ISBN)
Public defence
2025-10-07, SAM.A.280, Samhällsvetarhuset, Umeå, 13:00 (English)
Available from: 2025-09-16. Created: 2025-09-11. Last updated: 2025-09-17. Bibliographically approved.
Strobl, L., Angluin, D., Chiang, D., Rawski, J. & Sabharwal, A. (2025). Transformers as transducers. Transactions of the Association for Computational Linguistics, 13, 200-219
2025 (English). In: Transactions of the Association for Computational Linguistics, E-ISSN 2307-387X, Vol. 13, p. 200-219. Article in journal (Refereed), Published
Abstract [en]

We study the sequence-to-sequence mapping capacity of transformers by relating them to finite transducers, and find that they can express surprisingly large classes of (total functional) transductions. We do so using variants of RASP, a programming language designed to help people “think like transformers,” as an intermediate representation. We extend the existing Boolean variant B-RASP to sequence-to-sequence transductions and show that it computes exactly the first-order rational transductions (such as string rotation). Then, we introduce two new extensions. B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular transductions. S-RASP adds prefix sum, which enables additional arithmetic operations (such as squaring a string) and contains all first-order polyregular transductions. Finally, we show that masked average-hard attention transformers can simulate S-RASP.
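
For readers who want the running examples pinned down, the following plain-Python renderings are mine and only mimic what the abstract's transductions do; they do not use the paper's RASP syntax or its reference implementation.

from itertools import accumulate

def rotate(w):
    # First-order rational transduction: rotate the string by one position.
    return w[1:] + w[:1]

def copy_first_half(w):
    # B-RASP[pos]-style positional test: keep position i iff 2*i < len(w).
    return "".join(c for i, c in enumerate(w) if 2 * i < len(w))

def square(w):
    # Polyregular "squaring": a string of length n is repeated n times,
    # routing output position k back to input position k mod n.
    n = len(w)
    return "".join(w[k % n] for k in range(n * n))

def prefix_count(w, symbol):
    # S-RASP's prefix-sum primitive on an indicator sequence: entry i is
    # the number of occurrences of `symbol` in w[0..i].
    return list(accumulate(1 if c == symbol else 0 for c in w))

print(rotate("abc"))                # bca
print(copy_first_half("abcdef"))    # abc
print(square("ab"))                 # abab
print(prefix_count("abacad", "a"))  # [1, 1, 2, 2, 3, 3]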

Place, publisher, year, edition, pages
MIT Press, 2025
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-242255 (URN), 10.1162/tacl_a_00736 (DOI), 001436318200002 (), 2-s2.0-105010222957 (Scopus ID)
Available from: 2025-07-17. Created: 2025-07-17. Last updated: 2025-10-09. Bibliographically approved.
Strobl, L., Merrill, W., Weiss, G., Chiang, D. & Angluin, D. (2024). What formal languages can transformers express? A survey. Transactions of the Association for Computational Linguistics, 12, 543-561
2024 (English). In: Transactions of the Association for Computational Linguistics, E-ISSN 2307-387X, Vol. 12, p. 543-561. Article in journal (Refereed), Published
Abstract [en]

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help clarify the power of transformers relative to other models of computation, their fundamental capabilities and limits, and the impact of architectural choices. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.

Place, publisher, year, edition, pages
MIT Press, 2024
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:umu:diva-225336 (URN), 10.1162/tacl_a_00663 (DOI), 001259434600001 (), 2-s2.0-85193832932 (Scopus ID)
Available from: 2024-06-04. Created: 2024-06-04. Last updated: 2025-09-12. Bibliographically approved.
Strobl, L. Average-hard attention transformers are constant-depth uniform threshold circuits.
(English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244107 (URN)
Available from: 2025-09-11. Created: 2025-09-11. Last updated: 2025-09-12. Bibliographically approved.
Strobl, L., Angluin, D. & Frank, R. Concise one-layer transformers can do function evaluation (sometimes).
(English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244110 (URN)
Available from: 2025-09-11. Created: 2025-09-11. Last updated: 2025-09-12. Bibliographically approved.
Yang, A., Strobl, L., Chiang, D. & Angluin, D. Simulating hard attention using soft attention.
(English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-244109 (URN)
Available from: 2025-09-11. Created: 2025-09-11. Last updated: 2025-09-12. Bibliographically approved.