Contextual language processing: from formal transducers to contextual advertising
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0009-0004-0580-6270
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Contextual language processing concerns how information from the surrounding media environment, such as accompanying texts or interaction signals, is represented and used in automated language processing systems. Incorporating contextual information enables models to derive interpretations that go beyond isolated textual analysis and capture aspects of meaning grounded in real-world situations. Despite substantial research in language modeling, processing context effectively remains an open challenge, particularly in balancing a model's expressive power, interpretability, adaptivity, and alignment with human understanding.

This thesis investigates contextual language processing from both methodological and applied perspectives, focusing on two related domains: natural language processing and contextual advertising. On the methodological side, the thesis formalizes the grammatical inference of finite-state transducers as a sequential decision-making problem. By integrating structured transducer models with reinforcement learning, it demonstrates how contextual languages can be learned dynamically while retaining interpretability. These learning-based approaches offer a systematic way to adapt language transformations to shifting contexts; such adaptivity can, for instance, lead to more effective decisions in automated advertising auctions. On the applied side, the thesis examines contextual communication in real-world, user-facing systems through empirical studies in contextual advertising. These studies reveal a gap between contextual relevance at the computational level, which is often optimized through automatic metrics, and relevance as perceived by end users. The findings show that increased algorithmic precision or model complexity does not necessarily translate into improved user experiences, and that factors such as transparency and trust play a central role in contextual effectiveness.

Through theoretical, empirical, and conceptual analyses, this thesis demonstrates that effective contextual language processing necessitates bridging multidisciplinary perspectives, including algorithmic and perceptual ones. It frames contextual language processing as an interpretable, adaptive, and user-centered interaction process. The insights discussed in this work offer implications not only for algorithmic optimization, but also for modern online advertising and other user-facing applications where contextual understanding, practical constraints, and privacy considerations are critical.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2026. p. 58
Series
Report / UMINF, ISSN 0348-0542; 26.01
Keywords [en]
contextual language processing, finite-state transducers, contextual advertising, natural language processing, reinforcement learning, keyword extraction, empirical studies
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-248544
ISBN: 978-91-8070-909-5 (electronic)
ISBN: 978-91-8070-908-8 (print)
OAI: oai:DiVA.org:umu-248544
DiVA, id: diva2:2028114
Public defence
2026-02-13, Naturvetarhuset, NAT.D. 480, Umeå, 09:00 (English)
Available from: 2026-01-23; Created: 2026-01-14; Last updated: 2026-01-15; Bibliographically approved
List of papers
1. Reinforcement learning of finite-state string transductions
2024 (English). In: Journal of Automata, Languages and Combinatorics, ISSN 1430-189X, Vol. 29, no. 2–4, p. 109–136. Article in journal (Refereed), Published
Abstract [en]

Finite-state transducers (FSTs) are a valuable tool in data processing systems, where they are used to realise string-to-string transductions. We consider the problem of inferring transductions representable by FSTs through reinforcement learning. In this machine-learning paradigm, a learning algorithm repeatedly interacts with an environment by performing one out of a fixed set of candidate actions. Each action taken yields a reward, the size of which depends on the environment's current state, and causes the environment to change into a new state. The algorithm's objective is to maximise the accumulated reward. In the setting explored here, the environment consists of the next symbol in the input string to be rewritten and a transducer state. An action consists of choosing the symbol to output next and the transducer state to shift into, thus causing a change in the environment. We propose a learning algorithm that starts out from a singleton set of states and, every time the learning rate stagnates, splits a state into two. For the split, it chooses a state that has been visited often but nevertheless provides little information about how to maximise the reward. We evaluate the algorithm through empirical experiments, and the results suggest that it is robust enough to handle situations where the target transduction changes during the learning process.
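The learning setting described above can be illustrated with a small tabular Q-learning sketch. It is not the authors' algorithm. It assumes a letter-to-letter (length-preserving) target transduction, a reward of 1 for each correctly emitted output symbol, epsilon-greedy exploration, "stagnation" meaning no improvement in average training reward over a few windows of episodes, and a hypothetical split heuristic that favours states with many visits and a small average gap between their best and second-best action values. The class `SplittingQTransducer` and all parameter names are illustrative.

```python
import random
from collections import defaultdict


class SplittingQTransducer:
    """Hypothetical sketch: tabular Q-learning of a letter-to-letter transduction.

    Environment state = (transducer state, next input symbol).
    Action = (output symbol, transducer state to shift into).
    """

    def __init__(self, in_alphabet, out_alphabet, alpha=0.1, gamma=0.5,
                 epsilon=0.1, max_states=4, seed=0):
        self.in_alphabet, self.out_alphabet = list(in_alphabet), list(out_alphabet)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.max_states = max_states
        self.rng = random.Random(seed)
        self.num_states = 1               # start from the singleton state set {0}
        self.q = defaultdict(float)       # Q[(state, input_symbol, action)]
        self.visits = defaultdict(int)    # visits per state since the last split

    def actions(self):
        return [(y, s) for y in self.out_alphabet for s in range(self.num_states)]

    def best(self, state, x):
        return max(self.actions(), key=lambda a: self.q[(state, x, a)])

    def value(self, state, x):
        return max(self.q[(state, x, a)] for a in self.actions())

    def episode(self, inp, target):
        """One epsilon-greedy pass over inp; returns the fraction of correct outputs."""
        state, total = 0, 0.0
        for i, x in enumerate(inp):
            self.visits[state] += 1
            if self.rng.random() < self.epsilon:
                a = self.rng.choice(self.actions())
            else:
                a = self.best(state, x)
            y, nxt = a
            r = 1.0 if y == target[i] else 0.0   # assumed reward: correct symbol
            total += r
            # Bootstrap from the next environment state unless the input is exhausted.
            future = self.gamma * self.value(nxt, inp[i + 1]) if i + 1 < len(inp) else 0.0
            key = (state, x, a)
            self.q[key] += self.alpha * (r + future - self.q[key])
            state = nxt
        return total / max(len(inp), 1)

    def margin(self, s):
        """Average gap between the best and second-best action values in state s."""
        gaps = []
        for x in self.in_alphabet:
            vals = sorted((self.q[(s, x, a)] for a in self.actions()), reverse=True)
            gaps.append(vals[0] - vals[1] if len(vals) > 1 else 0.0)
        return sum(gaps) / len(gaps)

    def split(self):
        # Often-visited states whose action values are hard to tell apart score highest.
        old = max(range(self.num_states),
                  key=lambda s: self.visits[s] / (1.0 + self.margin(s)))
        new = self.num_states
        self.num_states += 1
        # Clone `old`: copy its Q-values and mirror every transition into it.
        for (s, x, (y, nxt)), v in list(self.q.items()):
            if s == old:
                self.q[(new, x, (y, nxt))] = v
            if nxt == old:
                self.q[(s, x, (y, new))] = v
            if s == old and nxt == old:
                self.q[(new, x, (y, new))] = v
        self.visits.clear()

    def train(self, samples, episodes=3000, window=20, patience=3, tol=1e-3):
        best_avg, stale, recent = -1.0, 0, []
        for _ in range(episodes):
            inp, out = self.rng.choice(samples)
            recent.append(self.episode(inp, out))
            if len(recent) == window:
                avg, recent = sum(recent) / window, []
                if avg > best_avg + tol:
                    best_avg, stale = avg, 0
                else:
                    stale += 1
                if stale >= patience and self.num_states < self.max_states:
                    self.split()
                    best_avg, stale = -1.0, 0

    def transduce(self, inp):
        """Greedy run of the learned transducer from state 0."""
        state, out = 0, []
        for x in inp:
            y, state = self.best(state, x)
            out.append(y)
        return "".join(out)
```

Because a split in this sketch clones the chosen state and Python's `max` returns the first maximal action, the greedy policy is unchanged at the moment of the split; further training then decides whether the extra state is used.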

National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-248538 (URN)
10.25596/jalc-2024-109 (DOI)
Available from: 2026-01-14; Created: 2026-01-14; Last updated: 2026-01-14; Bibliographically approved
2. Reinforcement Learning of Probabilistic Finite-State Transducers
(English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-248541 (URN)
Available from: 2026-01-14; Created: 2026-01-14; Last updated: 2026-01-14; Bibliographically approved
3. From precision to perception: Human-in-the-loop evaluation of keyword extraction for internet-scale contextual advertising
2026 (English). In: Information Systems, ISSN 0306-4379, E-ISSN 1873-6076, Vol. 138, article id 102665. Article in journal (Refereed), Published
Abstract [en]

Keyword extraction is a foundational task in natural language processing, underpinning many real-world applications. One of these is contextual advertising, where keywords help predict the topical congruence between ads and their surrounding media contexts to enhance advertising effectiveness. Recent advances in artificial intelligence have improved keyword extraction capabilities but have also raised concerns about computational cost. Moreover, although the end-user experience is of vital importance, human evaluation of keyword extraction performance remains under-explored. This study provides a comparative evaluation of prevalent keyword extraction algorithms at different levels of complexity, represented by TF-IDF, KeyBERT, and Llama 2. To evaluate their effectiveness, a mixed-methods approach is employed, combining quantitative benchmarking with qualitative assessments from 855 participants across four survey-based experiments. The findings demonstrate that KeyBERT achieves a better balance between user preferences and computational efficiency than the other algorithms. We observe a clear overall preference for gold-standard keywords, but also a misalignment between algorithmic benchmark performance and user ratings. This reveals a long-overlooked gap between traditional precision-focused metrics and user-perceived algorithm efficiency. The study underscores the importance of human-in-the-loop evaluation methodologies and proposes analytical tools to facilitate their implementation.
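TF-IDF, the simplest of the three compared algorithms, scores a term highly when it is frequent in a given page but rare across a reference corpus. The function below is a generic sketch of that baseline, not the configuration used in the study: the regex tokenizer, the tiny stopword list, the smoothed IDF ln((1 + n)/(1 + df)) + 1 (the same form as the scikit-learn `TfidfVectorizer` default) and the name `tfidf_keywords` are all assumptions.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(doc, corpus, k=3):
    """Return the k highest-scoring TF-IDF terms of doc relative to corpus."""
    n = len(corpus)
    df = Counter()
    for text in corpus:
        df.update(set(tokenize(text)))   # document frequency per term
    tokens = tokenize(doc)
    scores = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        idf = math.log((1 + n) / (1 + df[term])) + 1.0   # smoothed IDF
        scores[term] = tf * idf
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = [
    "football football stadium tickets",
    "football match results",
    "electric cars and batteries",
]
print(tfidf_keywords(corpus[0], corpus, k=2))  # ['football', 'stadium']
```

Here "football" ranks first despite appearing in two documents because its term frequency is twice that of "stadium" or "tickets", which share the next score and are ordered alphabetically.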

Place, publisher, year, edition, pages
Elsevier, 2026
Keywords
Contextual advertising, Human evaluation, Human-in-the-loop, Keyword extraction, Language models, Statistical methods, Word embeddings
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-247758 (URN)
10.1016/j.is.2025.102665 (DOI)
2-s2.0-105024445488 (Scopus ID)
Funder
Marianne and Marcus Wallenberg Foundation; Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Research Council
Available from: 2025-12-19; Created: 2025-12-19; Last updated: 2026-01-14; Bibliographically approved
4. Beyond precision: understanding the impact of algorithmic accuracy and transparency on user perceptions in keyword-driven contextual advertising
(English). Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-248543 (URN)
Available from: 2026-01-14; Created: 2026-01-14; Last updated: 2026-01-14; Bibliographically approved
5. Programmatic advertising in the age of AI: a conceptual overview and strategic recommendations
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Artificial intelligence (AI) is transforming the field of programmatic digital marketing. This article provides a forward-looking perspective on AI in advertising and offers a conceptual framework for developing effective marketing strategies. We explore two key developments in the industry: the progression from personalized to contextual targeting, and the increased reliance on AI-based automation. Additionally, the article identifies three critical factors (results, resources, and rectitude) that influence the choice of strategies for online advertisers. Our findings suggest that while AI may multiply the effectiveness of programmatic advertising campaigns, it comes with important tradeoffs that must be taken into account. By synthesizing literature from digital advertising, computer science, and media studies, we offer an improved understanding of the evolving programmatic advertising ecosystem and distill this into practical advice for advertisers.

Keywords
programmatic advertising, contextual targeting, personalized targeting, artificial intelligence, advertising strategies, digital marketing, AI automation, marketing effectiveness
National Category
Business Administration; Artificial Intelligence
Identifiers
urn:nbn:se:umu:diva-238301 (URN)
Available from: 2025-04-30; Created: 2025-04-30; Last updated: 2026-01-14; Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 1050 kB)
Spikblad (defence announcement): SPIKBLAD01.pdf (application/pdf, 240 kB)

Authority records

Cai, Jingwen
