  • 1.
    Andersson, Eric
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jonsson, Anna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Generating semantic graph corpora with graph expansion grammar (2023). In: 13th International Workshop on Non-Classical Models of Automata and Applications (NCMA 2023) / [ed] Nagy B., Freund R., Open Publishing Association, 2023, Vol. 388, pp. 3-15. Conference paper (Refereed)
    Abstract [en]

    We introduce LOVELACE, a tool for creating corpora of semantic graphs. The system uses graph expansion grammar as a representational language, thus allowing users to craft a grammar that describes a corpus with desired properties. When given such a grammar as input, the system generates a set of output graphs that are well-formed according to the grammar, i.e., a graph bank. The generation process can be controlled via a number of configurable parameters that allow the user to, for example, specify a range of desired output graph sizes. Central use cases are the creation of synthetic data to augment existing corpora, and as a pedagogical tool for teaching formal language theory.

  • 2.
    Bensch, Suna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Hellström, Thomas
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Grammatical Inference of Graph Transformation Rules (2015). In: Proceedings of the 7th Workshop on Non-Classical Models of Automata and Applications (NCMA 2015), Austrian Computer Society, 2015, pp. 73-90. Conference paper (Refereed)
  • 3.
    Berglund, Martin
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    On the Parameterized Complexity of Linear Context-Free Rewriting Systems (2013). In: Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), Association for Computational Linguistics, 2013, pp. 21-29. Conference paper (Other academic)
    Abstract [en]

    We study the complexity of uniform membership for Linear Context-Free Rewriting Systems, i.e., the problem where we are given a string w and a grammar G and are asked whether w ∈ L(G). In particular, we use parameterized complexity theory to investigate how the complexity depends on various parameters. While we focus primarily on rank and fan-out, derivation length is also considered.

  • 4.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Devinney, Hannah
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Computer, enhence: POS-tagging improvements for nonbinary pronoun use in Swedish (2023). In: Proceedings of the third workshop on language technology for equality, diversity, inclusion, The Association for Computational Linguistics, 2023, pp. 54-61. Conference paper (Refereed)
    Abstract [en]

    Part of Speech (POS) taggers for Swedish routinely fail for the third person gender-neutral pronoun hen, despite the fact that it has been a well-established part of the Swedish language since at least 2014. In addition to simply being a form of gender bias, this failure can have negative effects on other tasks relying on POS information. We demonstrate the usefulness of semi-synthetic augmented datasets in a case study, retraining a POS tagger to correctly recognize hen as a personal pronoun. We evaluate our retrained models for both tag accuracy and on a downstream task (dependency parsing) in a classical NLP pipeline.

    Our results show that adding such data works to correct for the disparity in performance. The accuracy rate for identifying hen as a pronoun can be brought up to acceptable levels with only minor adjustments to the tagger’s vocabulary files. Performance parity to gendered pronouns can be reached after retraining with only a few hundred examples. This increase in POS tag accuracy also results in improvements for dependency parsing sentences containing hen.

  • 5.
    Björklund, Henrik
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Devinney, Hannah
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Improving Swedish part-of-speech tagging for hen (2022). Conference paper (Refereed)
    Abstract [en]

    Despite the fact that the gender-neutral pronoun hen was officially added to the Swedish language in 2014, state of the art part of speech taggers still routinely fail to identify it as a pronoun. We retrain both efselab and spaCy models with augmented (semi-synthetic) data, where instances of gendered pronouns are replaced by hen to correct for the lack of representation in the original training data. Our results show that adding such data works to correct for the disparity in performance.

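    The augmentation described above, replacing gendered pronouns with hen in tagged training data, can be illustrated with a small sketch. The mapping table, the function name, and the (token, tag) input format are illustrative assumptions, not the paper's released code.

    import random

    # Gendered Swedish third-person pronoun forms mapped to gender-neutral ones.
    # The mapping is chosen for illustration; the paper's replacement rules may differ.
    NEUTRAL = {
        "han": "hen", "hon": "hen",        # subject forms
        "honom": "hen", "henne": "hen",    # object forms
        "hans": "hens", "hennes": "hens",  # possessive forms
    }

    def augment_sentence(tokens, tags, rate=1.0):
        """Return a semi-synthetic copy of (tokens, tags) with gendered pronouns replaced by hen."""
        new_tokens = list(tokens)
        for i, (tok, tag) in enumerate(zip(tokens, tags)):
            if tag == "PRON" and tok.lower() in NEUTRAL and random.random() < rate:
                repl = NEUTRAL[tok.lower()]
                new_tokens[i] = repl.capitalize() if tok[0].isupper() else repl
        return new_tokens, list(tags)

    # One augmented copy of a toy tagged sentence.
    print(augment_sentence(["Hon", "ger", "honom", "boken"], ["PRON", "VERB", "PRON", "NOUN"]))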
  • 6.
    Björklund, Johanna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Tree-to-Graph Transductions with Scope (2018). In: Developments in Language Theory. DLT 2018, Springer, 2018, pp. 133-144. Conference paper (Refereed)
    Abstract [en]

    High-level natural language processing requires formal languages to represent semantic information. A recent addition of this kind is abstract meaning representations. These are graphs in which nodes encode concepts and edges relations. Node-sharing is common, and cycles occur. We show that the required structures can be generated through the combination of (i) a regular tree grammar, (ii) a sequence of linear top-down tree transducers, and (iii) a fold operator that merges selected nodes. Delimiting the application of the fold operator to connected subgraphs gains expressive power, while keeping the complexity of the associated membership problem in polynomial time.

  • 7.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Cleophas, Loek
    Stellenbosch University, Republic of South Africa.
    Karlsson, My
    Codemill.
    An evaluation of structured language modeling for automatic speech recognition (2017). In: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 23, no. 11, pp. 1019-1034. Article in journal (Refereed)
    Abstract [en]

    We evaluated probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant for automatic speech recognition. The baseline is a family of n-gram models tuned with Witten-Bell smoothing. The language models are trained on unannotated corpora, consisting of 10,000 to 50,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,200 variations were generated. Each variant sentence was obtained by replacing an arbitrary word by a similar word, chosen to be at most 2 character edits from the original. The evaluation task consisted of identifying the original sentence among the automatically constructed (and typically inferior) alternatives. In the experiments, the n-gram models outperformed the PLTIG model on the smaller data set, but as the size of data grew, the PLTIG model gave comparable results. While PLTIGs are more demanding to train, they have the advantage that they assign a parse structure to their input sentences. This is valuable for continued algorithmic processing, for example, for summarization or sentiment analysis.
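    The classification task used in the evaluation above, singling out the original sentence among automatically perturbed variants, amounts to scoring every candidate with a language model and taking the argmax. A minimal sketch follows; the toy unigram scorer is a stand-in for the n-gram and PLTIG models actually compared in the article.

    import math

    def sentence_score(words, logprob):
        """Sum per-word log-probabilities under some language model `logprob`."""
        return sum(logprob(w, words[:i]) for i, w in enumerate(words))

    def pick_original(candidates, logprob):
        """Return the candidate sentence the model considers most likely."""
        return max(candidates, key=lambda s: sentence_score(s.split(), logprob))

    # Toy add-one-smoothed unigram "model", for illustration only.
    counts = {"the": 5, "cat": 2, "sat": 2, "on": 3, "mat": 2, "oat": 1}
    total, vocab = sum(counts.values()), len(counts)
    unigram = lambda w, _ctx: math.log((counts.get(w, 0) + 1) / (total + vocab))

    variants = ["the cat sat on the mat", "the cat sat on the oat"]
    print(pick_original(variants, unigram))   # -> "the cat sat on the mat"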

  • 8.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Cohen, Shay B.
    University of Edinburgh.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Satta, Giorgio
    University of Padova.
    Bottom-up unranked tree-to-graph transducers for translation into semantic graphs (2019). In: Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing / [ed] Heiko Vogler; Andreas Maletti, Association for Computational Linguistics, 2019, pp. 7-17, article id W19-3104. Conference paper (Refereed)
    Abstract [en]

    We propose a formal model for translating unranked syntactic trees, such as dependency trees, into semantic graphs. These tree-to-graph transducers can serve as a formal basis of transition systems for semantic parsing which recently have been shown to perform very well, yet hitherto lack formalization. Our model features "extended" rules and an arc-factored normal form, comes with an efficient translation algorithm, and can be equipped with weights in a straightforward manner.

  • 9.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jonsson, Anna
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Generation and polynomial parsing of graph languages with non-structural reentrancies (2023). In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 49, no. 4, pp. 841-882. Article in journal (Refereed)
    Abstract [en]

    Graph-based semantic representations are popular in natural language processing (NLP), where it is often convenient to model linguistic concepts as nodes and relations as edges between them. Several attempts have been made to find a generative device that is sufficiently powerful to describe languages of semantic graphs, while at the same time allowing efficient parsing. We contribute to this line of work by introducing graph extension grammar, a variant of the contextual hyperedge replacement grammars proposed by Hoffmann et al. Contextual hyperedge replacement can generate graphs with non-structural reentrancies, a type of node-sharing that is very common in formalisms such as abstract meaning representation, but which context-free types of graph grammars cannot model. To provide our formalism with a way to place reentrancies in a linguistically meaningful way, we endow rules with logical formulas in counting monadic second-order logic. We then present a parsing algorithm and show as our main result that this algorithm runs in polynomial time on graph languages generated by a subclass of our grammars, the so-called local graph extension grammars.

  • 10.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Mollevik, Iris
    Towards Semantic Representations with a Temporal Dimension (2020). Conference paper (Refereed)
    Abstract [en]

    We outline the initial ideas for a representational framework for capturing temporal aspects in semantic parsing of multimodal data. As a starting point, we take the Abstract Meaning Representations of Banarescu et al. and propose a way of extending them to cover sequential progressions of events. The first modality to be considered is text, but the long-term goal is to also incorporate information from visual and audio modalities, as well as contextual information.

  • 11.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Fernau, Henning
    FB IV - Abteilung Informatikwissenschaften, Universität Trier, Trier, Germany.
    Learning tree languages (2016). In: Topics in grammatical inference / [ed] Jeffrey Heinz; José M. Sempere, Springer Berlin/Heidelberg, 2016, pp. 173-214. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    Tree languages have proved to be a versatile and rewarding extension of the classical notion of string languages. Many nice applications have been established over the years, in areas such as Natural Language Processing, Information Extraction, and Computational Biology. Although some properties of string languages transfer easily to the tree case, in particular for regular languages, several computational aspects turn out to be harder. It is therefore both of theoretical and of practical interest to investigate how far and in what ways Grammatical Inference algorithms developed for the string case are applicable to trees. This chapter surveys known results in this direction. We begin by recalling the basics of tree language theory. Then, the most popular learning scenarios and algorithms are presented. Several applications of Grammatical Inference of tree languages are reviewed in some detail. We conclude by suggesting a number of directions for future research.

  • 12.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Johansson Falck, Marlene
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    How Spatial Relations Structure Linguistic Meaning (2019). In: Proceedings of the 15th SweCog Conference / [ed] Holm, Linus & Erik Billing, Skövde: University of Skövde, 2019, pp. 29-31. Conference paper (Refereed)
  • 13.
    Björklund, Johanna
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Zechner, Niklas
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Syntactic methods for topic-independent authorship attribution (2017). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 23, no. 5, pp. 789-806. Article in journal (Refereed)
    Abstract [en]

    The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are 'deep' in the sense that they are derived by parsing the subject texts, in contrast to 'shallow' syntactic features for which a part-of-speech analysis is enough. The experiments are made on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-world authorship attribution, identification of separate texts among the works of one author, and cross-topic authorship attribution. In the first tasks, the feature sets were fairly evenly matched, but for the last task, the syntax-based feature set outperformed the baseline feature set. These results suggest that, compared to lexical features, syntactic features are more robust to changes in topic.

  • 14.
    Brand, Dirk
    et al.
    Computer Science Division, Stellenbosch University, South Africa.
    Kroon, Steve
    Computer Science Division, Stellenbosch University, South Africa.
    Van Der Merwe, Brink
    Computer Science Division, Stellenbosch University, South Africa.
    Cleophas, Loek
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Dept. of Information Science, Stellenbosch University, Stellenbosch, South Africa.
    N-Gram Representations for Comment Filtering (2015). In: SAICSIT '15: Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, ACM Digital Library, 2015, article id 6. Conference paper (Refereed)
    Abstract [en]

    Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
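    A minimal sketch of short-text classification with word n-gram features, in the spirit of the comparison above. The toy comments and the choice of logistic regression as classifier are illustrative assumptions; the paper's own data and models are not reproduced here.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    comments = ["great explanation, thanks!", "spam spam buy now",
                "could you clarify step two?", "cheap pills buy now!!!"]
    labels = ["keep", "filter", "keep", "filter"]

    # Word uni- and bigram counts; switch analyzer to "char_wb" for character n-grams.
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), lowercase=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(comments, labels)
    print(model.predict(["buy cheap now", "thanks for the clear answer"]))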

  • 15.
    Chen, Hung Chiao
    et al.
    Umeå universitet, Samhällsvetenskapliga fakulteten, Institutionen för psykologi.
    Weck, Saskia
    Umeå universitet, Samhällsvetenskapliga fakulteten, Institutionen för psykologi.
    Understanding Robots: The Effects of Conversational Strategies on the Understandability of Robot-Robot Interactions from a Human Standpoint (2020). Independent thesis, advanced level (one-year Master's degree), 10 credits / 15 HE credits. Student thesis (Degree project)
    Abstract [sv]

    As technology advances, robots are becoming increasingly integrated into different parts of our lives. Future human-robot interactions can take many different forms and configurations. In this study, we investigated how understandable different conversational strategies between robots are from the human perspective. Specifically, we examined the understandability of spoken explanations constructed according to Grice's principle of informativeness. One task for the test participants was to try to predict the robots' actions. In addition, the robots' interaction was evaluated by having the participants rank and rate them. The results suggest that the robots using Grice's principle and those using the other tested strategies are understood and perceived in a similar way.

  • 16.
    Chiang, David
    et al.
    University of Notre Dame.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Gildea, Daniel
    University of Rochester.
    Lopez, Adam
    University of Edinburgh.
    Satta, Giorgio
    University of Padua.
    Weighted DAG automata for semantic graphs (2018). In: Computational linguistics - Association for Computational Linguistics (Print), ISSN 0891-2017, E-ISSN 1530-9312, Vol. 44, no. 1, pp. 119-186. Article in journal (Refereed)
    Abstract [en]

    Graphs have a variety of uses in natural language processing, particularly as representations of linguistic meaning. A deficit in this area of research is a formal framework for creating, combining, and using models involving graphs that parallels the frameworks of finite automata for strings and finite tree automata for trees. A possible starting point for such a framework is the formalism of directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki and extended by Quernheim and Knight. In this article, we study the latter in depth, demonstrating several new results, including a practical recognition algorithm that can be used for inference and learning with models defined on DAG automata. We also propose an extension to graphs with unbounded node degree and show that our results carry over to the extended formalism.

  • 17. Coelho Mollo, Dimitri
    et al.
    Millière, Raphael
    Rathkopf, Charles
    Stinson, Catherine
    Conceptual Combinations - Benchmark Task for BIG-Bench (2021). Other (Refereed)
    Abstract [en]

    This is a task accepted in July 2021 as part of Google’s “Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models”. It is published at https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/conceptual_combinations. Links to the collection of queries are below, followed by the ReadMe file that explains the task, its justification, and its performance with existing AI language models.

  • 18.
    Dahlgren Lindström, Adam
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Learning, reasoning, and compositional generalisation in multimodal language models (2024). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    We humans learn language and how to interact with the world through our different senses, grounding our language in what we can see, touch, hear, and smell. We call these streams of information different modalities, and our efficient processing and synthesis of the interactions between different modalities is a cornerstone of our intelligence. Therefore, it is important to study how we can build multimodal language models, where machine learning models learn from more than just text. This is particularly important in the era of large language models (LLMs), where their general capabilities are unclear and unreliable. This thesis investigates learning and reasoning in multimodal language models, and their capabilities to compositionally generalise in visual question answering tasks. Compositional generalisation is the process in which we produce and understand novel sentences, by systematically combining words and sentences to uncover the meaning in language, and has proven a challenge for neural networks. Previously, the literature has focused on compositional generalisation in text-only language models. One of the main contributions of this work is the extensive investigation of text-image language models. The experiments in this thesis compare three neural network-based models, and one neuro-symbolic method, and operationalise language grounding as the ability to reason with relevant functions over object affordances.

    In order to better understand the capabilities of multimodal models, this thesis introduces CLEVR-Math as a synthetic benchmark of visual mathematical reasoning. The CLEVR-Math dataset involves tasks such as adding and removing objects from 3D scenes based on textual instructions, such as “Remove all blue cubes. How many objects are left?”, and is given as a curriculum of tasks of increasing complexity. The evaluation set of CLEVR-Math includes extensive testing of different functional and object attribute generalisations. We open up the internal representations of these models using a technique called probing, where linear classifiers are trained to recover concepts such as colours or named entities from the internal embeddings of input data. The results show that while models are fairly good at generalisation with attributes (i.e. solving tasks involving never before seen objects), it is a big challenge to generalise over functions and to learn abstractions such as categories. The results also show that complexity in the training data is a driver of generalisation, where an extended curriculum improves the general performance across tasks and generalisation tests. Furthermore, it is shown that training from scratch versus transfer learning has significant effects on compositional generalisation in models.

    The results identify several aspects of how current methods can be improved in the future, and highlight general challenges in multimodal language models. A thorough investigation of compositional generalisation suggests that the pre-training of models allow models access to inductive biases that can be useful to solve new tasks. Contrastingly, models trained from scratch show much lower overall performance on the synthetic tasks at hand, but show lower relative generalisation gaps. In the conclusions and outlook, we discuss the implications of these results as well as future research directions.

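    The probing technique mentioned in the abstract above, training linear classifiers to recover concepts from a model's internal embeddings, can be sketched as follows. The random embeddings and labels are stand-ins; in practice they would be hidden states from the multimodal model under study and annotated concept labels (e.g. colours).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(200, 64))   # stand-in for frozen model activations
    colours = rng.integers(0, 3, size=200)    # stand-in concept labels (0=red, 1=blue, ...)

    X_tr, X_te, y_tr, y_te = train_test_split(embeddings, colours, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # Accuracy clearly above chance would suggest the concept is linearly decodable
    # from the embeddings; with random data it should stay near 1/3.
    print("probe accuracy:", probe.score(X_te, y_te))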
  • 19.
    Deutschmann, Mats
    et al.
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Molka-Danielsen, Judith
    Molde University, Norway.
    Future Directions for Learning in Virtual Worlds (2009). In: Learning and Teaching in the Virtual World of Second Life / [ed] Molka-Danielsen, J & M. Deutschmann, Trondheim: Tapir Academic Press, 2009, 1, pp. 185-190. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    Some may claim that this book has been a showcase of case studies, without a common thread. However, the common goal that runs through each of these cases is the focus on learning and the roles of learners and educators in learning activities. Do virtual worlds assist learning and do they create new opportunities? The answer from these analyses is “Yes” and this book demonstrates “how” to make use of the affordances of the virtual world of Second Life as it exists today. Yet, many questions remain both for practitioners and researchers. To give some examples: On what principles should learners’ tasks be designed, who is doing research on education in virtual worlds, and what is the future of virtual worlds in a learning context? In this chapter we attempt to address some of these issues.

  • 20.
    Deutschmann, Mats
    et al.
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Panichi, Luisa
    Pisa University, Italy.
    Instructional Design: Teacher Practice and Learning Autonomy (2009). In: Learning and Teaching in the Virtual World of Second Life / [ed] Judith Molka-Danielsen & Mats Deutschmann, Trondheim: Tapir Academic Press, 2009, 1, pp. 24-44. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    This chapter is based on the experiences from language proficiency courses given on Kamimo education island and addresses concerns related to teacher practice in Second Life. We examine preparatory issues, task design and the teacher’s role in fostering learner autonomy in Second Life. Although the chapter draws mainly on experiences from and reflections in the domain of language education, it has general pedagogical implications for teaching in SL.

  • 21.
    Deutschmann, Mats
    et al.
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Panichi, Luisa
    Pisa University, Italy.
    Talking into empty space?: signalling involvement in a virtual language classroom in Second Life (2009). In: Language Awareness, ISSN 0965-8416, Vol. 18, no. 3-4, pp. 310-328. Article in journal (Refereed)
    Abstract [en]

    In this study, we compare the first and the last sessions from an online oral proficiency course aimed at doctoral students conducted in the virtual world Second Life. The study attempts to identify how supportive moves made by the teacher encourage learners to engage with language, and what type of linguistic behaviour in the learners leads to engagement in others. We compare overall differences in terms of floor space and turn-taking patterns, and also conduct a more in-depth discourse analysis of parts of the sessions focusing on supportive moves such as back-channelling and elicitors. There are indications that the supportive linguistic behaviour of teachers is important in increasing learner engagement. In our study we are also able to observe a change in student linguistic behaviour between the first and the last sessions with students becoming more active in signalling involvement as the course progresses. Finally, by illustrating some of the language awareness issues that arise in online environments, we hope to contribute to the understanding of the dynamics of online communication.

  • 22.
    Devinney, Hannah
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Gender and representation: investigations of bias in natural language processing (2024). Doctoral thesis, monograph (Other academic)
    Abstract [sv]

    Nowadays we encounter language technology in various forms every day. Sometimes it is obvious to us that this is happening, for example when we use machine translation. At other times it is harder to notice, such as when social media recommend posts to us. Language technology also underlies larger AI systems, which can for instance be used to grant or reject loan applications and can thus have major material effects on our lives. As ChatGPT and other large language models become more popular, we will also be confronted with more and more machine-generated text.

    Machine learning methods, which most of these tools rely on today, repeat patterns they 'see' in their training data. Usually this is language data that people have written or spoken, so in addition to things like sentence structure it also contains information about how we construct our society. This includes stereotypes and other prejudices. We call these patterns 'social bias', and they are repeated, or even amplified, by machine learning systems. When language technologies become part of our linguistic context, they also take part in passing on stereotypes, for example by assuming that nurses are women and doctors are men, or by systematically suggesting men over women for promotion. Technology thereby becomes a tool that society uses to build power, and power differentials, by spreading and normalising unjust ideas and by contributing to unfair distributions of resources.

    This thesis explores social biases concerning sex and gender, the inclusion of trans and nonbinary people, and queer representation in language technologies through a feminist and intersectional lens. Three questions are asked: How do researchers think about and measure 'gender' when they investigate 'gender bias' in language technology? Which gender stereotypes are present in the data used to train language technology models? How are queer (especially trans and nonbinary) people, bodies, and experiences represented in the production of these technologies? The thesis finds that nonbinary people are rendered invisible by biases in both models and data, but also by researchers who aim to address gender bias. Men and women are reduced to cisheteronormative roles and stereotypes, with little room to be an individual beyond gender. We can mitigate some of these problems, for example by adding more nonbinary language to the training data, but complete solutions are hard to achieve because of the complex interplay between society and technology. Moreover, we must remain flexible, since our understanding of society, stereotypes, and 'bias' itself shifts over time and with context.

  • 23.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Uppsala University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Crime and Relationship: Exploring Gender Bias in NLP Corpora (2020). Conference paper (Refereed)
    Abstract [en]

    Gender bias in natural language processing (NLP) tools, deriving from implicit human bias embedded in language data, is an important and complicated problem on the road to fair algorithms. We leverage topic modeling to retrieve documents associated with particular gendered categories, and discuss how exploring these documents can inform our understanding of the corpora we may use to train NLP tools. This is a starting point for challenging the systemic power structures and producing a justice-focused approach to NLP.

  • 24.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Centre for Gender Research, Uppsala University.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Semi-Supervised Topic Modeling for Gender Bias Discovery in English and Swedish (2020). In: Proceedings of the Second Workshop on Gender Bias in Natural Language Processing / [ed] Marta R. Costa-jussà, Christian Hardmeier, Will Radford, Kellie Webster, Association for Computational Linguistics, 2020, pp. 79-92. Conference paper (Refereed)
    Abstract [en]

    Gender bias has been identified in many models for Natural Language Processing, stemming from implicit biases in the text corpora used to train the models. Such corpora are too large to closely analyze for biased or stereotypical content. Thus, we argue for a combination of quantitative and qualitative methods, where the quantitative part produces a view of the data of a size suitable for qualitative analysis. We investigate the usefulness of semi-supervised topic modeling for the detection and analysis of gender bias in three corpora (mainstream news articles in English and Swedish, and LGBTQ+ web content in English). We compare differences in topic models for three gender categories (masculine, feminine, and nonbinary or neutral) in each corpus. We find that in all corpora, genders are treated differently and that these differences tend to correspond to hegemonic ideas of gender.

  • 25.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Björklund, Jenny
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Uppsala University, Sweden.
    Björklund, Henrik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Theories of Gender in Natural Language Processing (2022). In: Proceedings of the fifth annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT'22), 2022. Conference paper (Refereed)
    Abstract [en]

    The rise of concern around Natural Language Processing (NLP) technologies containing and perpetuating social biases has led to a rich and rapidly growing area of research. Gender bias is one of the central biases being analyzed, but to date there is no comprehensive analysis of how “gender” is theorized in the field. We survey nearly 200 articles concerning gender bias in NLP to discover how the field conceptualizes gender both explicitly (e.g. through definitions of terms) and implicitly (e.g. through how gender is operationalized in practice). In order to get a better idea of emerging trajectories of thought, we split these articles into two sections by time.

    We find that the majority of the articles do not make their theorization of gender explicit, even if they clearly define “bias.” Almost none use a model of gender that is intersectional or inclusive of nonbinary genders; and many conflate sex characteristics, social gender, and linguistic gender in ways that disregard the existence and experience of trans, nonbinary, and intersex people. There is an increase between the two time-sections in statements acknowledging that gender is a complicated reality, however, very few articles manage to put this acknowledgment into practice. In addition to analyzing these findings, we provide specific recommendations to facilitate interdisciplinary work, and to incorporate theory and methodology from Gender Studies. Our hope is that this will produce more inclusive gender bias research in NLP.

  • 26.
    Devinney, Hannah
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Samhällsvetenskapliga fakulteten, Umeå centrum för genusstudier (UCGS).
    Eklund, Anton
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ryazanov, Igor
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Cai, Jingwen
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Developing a multilingual corpus of wikipedia biographies (2023). In: International conference. Recent advances in natural language processing 2023, large language models for natural language processing: proceedings / [ed] Ruslan Mitkov; Maria Kunilovskaya; Galia Angelova, Shoumen, Bulgaria: Incoma ltd., 2023, article id 2023.ranlp-1.32. Conference paper (Refereed)
    Abstract [en]

    For many languages, Wikipedia is the most accessible source of biographical information. Studying how Wikipedia describes the lives of people can provide insights into societal biases, as well as cultural differences more generally. We present a method for extracting datasets of Wikipedia biographies. The accompanying codebase is adapted to English, Swedish, Russian, Chinese, and Farsi, and is extendable to other languages. We present an exploratory analysis of biographical topics and gendered patterns in four languages using topic modelling and embedding clustering. We find similarities across languages in the types of categories present, with the distribution of biographies concentrated in the language’s core regions. Masculine terms are over-represented and spread out over a wide variety of topics. Feminine terms are less frequent and linked to more constrained topics. Non-binary terms are nearly non-represented.

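    One way to collect raw material for such a biography corpus is to query the public MediaWiki API for the members of a biographical category and fetch the plain-text introduction of each page. The sketch below is a hedged illustration of that idea, not the paper's released codebase; the category name and the use of the TextExtracts endpoint are my assumptions.

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def pages_in_category(category, limit=50):
        """Return titles of pages in a Wikipedia category, e.g. '1960 births'."""
        params = {"action": "query", "list": "categorymembers", "format": "json",
                  "cmtitle": f"Category:{category}", "cmlimit": limit}
        data = requests.get(API, params=params).json()
        return [m["title"] for m in data["query"]["categorymembers"]]

    def page_intro(title):
        """Return the plain-text introduction of a page."""
        params = {"action": "query", "prop": "extracts", "explaintext": 1,
                  "exintro": 1, "format": "json", "titles": title}
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return next(iter(pages.values())).get("extract", "")

    for title in pages_in_category("1960 births", limit=5):
        print(title, "-", page_intro(title)[:80])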
  • 27.
    Drewes, Frank
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Gebhardt, Kilian
    Technische Universität Dresden.
    Vogler, Heiko
    Technische Universität Dresden.
    EM-training for probabilistic aligned hypergraph bimorphisms (2016). In: Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, Association for Computational Linguistics, 2016, pp. 60-69. Conference paper (Refereed)
    Abstract [en]

    We define the concept of probabilistic aligned hypergraph bimorphism. Each such bimorphism consists of a probabilistic regular tree grammar, two hypergraph algebras in which the generated trees are interpreted, and a family of alignments between the two interpretations. It generates a set of bihypergraphs each consisting of two hypergraphs and an alignment between them; for instance, discontinuous phrase structures and non-projective dependency structures are bihypergraphs. We show an EM-training algorithm which takes a corpus of bihypergraphs and an aligned hypergraph bimorphism as input and calculates a probability assignment to the rules of the regular tree grammar such that in the limit the maximum-likelihood of the corpus is approximated.

  • 28.
    Drewes, Frank
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Knight, Kevin
    University of Southern California.
    Kuhlmann, Marco
    Linköpings universitet.
    Formal Models of Graph Transformation in Natural Language Processing (2015). Report (Other academic)
    Abstract [en]

    In natural language processing (NLP) there is an increasing interest in formal models for processing graphs rather than more restricted structures such as strings or trees. Such models of graph transformation have previously been studied and applied in various other areas of computer science, including formal language theory, term rewriting, theory and implementation of programming languages, concurrent processes, and software engineering. However, few researchers from NLP are familiar with this work, and at the same time, few researchers from the theory of graph transformation are aware of the specific desiderata, possibilities and challenges that one faces when applying the theory of graph transformation to NLP problems. The Dagstuhl Seminar 15122 “Formal Models of Graph Transformation in Natural Language Processing” brought researchers from the two areas together. It initiated an interdisciplinary exchange about existing work, open problems, and interesting applications.

  • 29.
    Drewes, Frank
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Prorok, Kalle
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för tillämpad fysik och elektronik.
    AI för dokumentgenerering [AI for document generation] (2021). Report (Other academic)
    Abstract [sv]

    This report gives a brief introduction to methods and some practical results from an AI text-processing project carried out in collaboration between Trafikverket (the Swedish Transport Administration), Umeå University, and Sweco. Tests were performed to extract information (geographical locations, summaries, questions and answers) from documents. Document generation, the project's original focus, was also addressed. There the goal was to automatically produce texts for selected purposes, something that turned out to be difficult at present, since the existing methods are limited and at the same time very demanding in terms of computing power. The report is accompanied by some simplified code examples that readers can run themselves and hopefully learn from a few different cases. The report is divided into four parts: a non-technical overview for the generally interested, a more detailed description of the results, a part on concepts and methods for the especially interested, and a part on implementation for programmers.

  • 30.
    Eklund, Anton
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Forsman, Mona
    Adlede AB.
    Topic modeling by clustering language model embeddings: human validation on an industry dataset (2022). Conference paper (Refereed)
    Abstract [en]

    Topic models are powerful tools to get an overview of large collections of text data, a situation that is prevalent in industry applications. A rising trend within topic modeling is to directly cluster dimension-reduced embeddings created with pretrained language models. It is difficult to evaluate these models because there is no ground truth and automatic measurements may not mimic human judgment. To address this problem, we created a tool called STELLAR for interactive topic browsing which we used for human evaluation of topics created from a real-world dataset used in industry. Embeddings created with BERT were used together with UMAP and HDBSCAN to model the topics. The human evaluation found that our topic model creates coherent topics. The following discussion revolves around the requirements of industry and what research is needed for production-ready systems.

  • 31.
    Eklund, Anton
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Adlede AB, Umeå, Sweden.
    Forsman, Mona
    Adlede AB, Umeå, Sweden.
    Topic modeling by clustering language model embeddings: human validation on an industry dataset (2022). In: EMNLP 2022 Industry Track: Proceedings of the conference, Association for Computational Linguistics (ACL), 2022, pp. 645-653. Conference paper (Refereed)
    Abstract [en]

    Topic models are powerful tools to get an overview of large collections of text data, a situation that is prevalent in industry applications. A rising trend within topic modeling is to directly cluster dimension-reduced embeddings created with pretrained language models. It is difficult to evaluate these models because there is no ground truth and automatic measurements may not mimic human judgment. To address this problem, we created a tool called STELLAR for interactive topic browsing which we used for human evaluation of topics created from a real-world dataset used in industry. Embeddings created with BERT were used together with UMAP and HDBSCAN to model the topics. The human evaluation found that our topic model creates coherent topics. The following discussion revolves around the requirements of industry and what research is needed for production-ready systems.

  • 32.
    Eklund, Anton
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Adlede, Umeå, Sweden.
    Forsman, Mona
    Adlede, Umeå, Sweden.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    An empirical configuration study of a common document clustering pipeline (2023). In: Northern European Journal of Language Technology (NEJLT), ISSN 2000-1533, Vol. 9, no. 1. Article in journal (Refereed)
    Abstract [en]

    Document clustering is frequently used in applications of natural language processing, e.g. to classify news articles or create topic models. In this paper, we study document clustering with the common clustering pipeline that includes vectorization with BERT or Doc2Vec, dimension reduction with PCA or UMAP, and clustering with K-Means or HDBSCAN. We discuss the interactions of the different components in the pipeline, parameter settings, and how to determine an appropriate number of dimensions. The results suggest that BERT embeddings combined with UMAP dimension reduction to no less than 15 dimensions provides a good basis for clustering, regardless of the specific clustering algorithm used. Moreover, while UMAP performed better than PCA in our experiments, tuning the UMAP settings showed little impact on the overall performance. Hence, we recommend configuring UMAP so as to optimize its time efficiency. According to our topic model evaluation, the combination of BERT and UMAP, also used in BERTopic, performs best. A topic model based on this pipeline typically benefits from a large number of clusters.

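    The pipeline studied above (embed, reduce to at least 15 dimensions with UMAP, cluster with HDBSCAN) can be sketched in a few lines. The embedding model name and the parameter values below are illustrative assumptions; the article evaluates BERT and Doc2Vec vectorizations and several parameter settings rather than this exact configuration.

    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    def cluster_documents(docs, n_dims=15, min_cluster_size=10):
        """Return one cluster id per document; -1 marks HDBSCAN noise points."""
        embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)
        # The article finds that reducing below 15 dimensions hurts cluster quality,
        # so n_dims defaults to 15.
        reduced = umap.UMAP(n_components=n_dims, metric="cosine").fit_transform(embeddings)
        return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)

    # Usage on a realistically sized corpus of news articles:
    # labels = cluster_documents(list_of_article_texts)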
  • 33.
    Eklund, Anton
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Adlede AB, Umeå, Sweden.
    Forsman, Mona
    Adlede AB, Umeå, Sweden.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Dynamic topic modeling by clustering embeddings from pretrained language models: a research proposal (2022). In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop / [ed] Yan Hanqi; Yang Zonghan; Sebastian Ruder; Wan Xiaojun, Association for Computational Linguistics, 2022, pp. 84-91. Conference paper (Refereed)
    Abstract [en]

    A new trend in topic modeling research is to do Neural Topic Modeling by Clustering document Embeddings (NTM-CE) created with a pretrained language model. Studies have evaluated static NTM-CE models and found them performing comparably to, or even better than other topic models. An important extension of static topic modeling is making the models dynamic, allowing the study of topic evolution over time, as well as detecting emerging and disappearing topics. In this research proposal, we present two research questions to understand dynamic topic modeling with NTM-CE theoretically and practically. To answer these, we propose four phases with the aim of establishing evaluation methods for dynamic topic modeling, finding NTM-CE-specific properties, and creating a framework for dynamic NTM-CE. For evaluation, we propose to use both quantitative measurements of coherence and human evaluation supported by our recently developed tool.

  • 34.
    Eriksson, Erik J.
    et al.
    Umeå universitet, Humanistiska fakulteten, Filosofi och lingvistik.
    Rodman, Robert D.
    Dept. of Computer Science, NCSU, USA.
    Hubal, Robert C.
    Technology Assisted Learning Ctr., RTI International, USA.
    Emotions in speech: juristic implications (2007). In: Speaker Classification: Volume I, Berlin: Springer Verlag, 2007. Chapter in book, part of anthology (Other academic)
    Abstract [en]

    This chapter focuses on the detection of emotion in speech and the impact that using technology to automate emotion detection would have within the legal system. The current states of the art for studies of perception and acoustics are described, and a number of implications for legal contexts are provided. We discuss, inter alia, assessment of emotion in others, witness credibility, forensic investigation, and training of law enforcement officers.

  • 35.
    Farahani, Mehrdad
    et al.
    Department of Computer Engineering, Islamic Azad University North Tehran Branch, Tehran, Iran.
    Gharachorloo, Mohammad
    Queensland University of Technology, School of Electrical Engineering and Robotics, Brisbane, Australia.
    Farahani, Marzieh
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Manthouri, Mohammad
    Department of Electrical and Electronic Engineering, Shahed University, Tehran, Iran.
    ParsBERT: Transformer-based Model for Persian Language Understanding (2021). In: Neural Processing Letters, ISSN 1370-4621, E-ISSN 1573-773X, Vol. 53, no. 6, pp. 3831-3847. Article in journal (Refereed)
    Abstract [en]

    The surge of pre-trained language models has begun a new era in the field of Natural Language Processing (NLP) by allowing us to build powerful language models. Among these models, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance. However, these models are usually focused on English, leaving other languages to multilingual models with limited resources. This paper proposes a monolingual BERT for the Persian language (ParsBERT), which shows its state-of-the-art performance compared to other architectures and multilingual models. Also, since the amount of data available for NLP tasks in Persian is very restricted, a massive dataset for different NLP tasks as well as pre-training the model is composed. ParsBERT obtains higher scores in all datasets, including existing ones and gathered ones, and improves the state-of-the-art performance by outperforming both multilingual BERT and other prior works in Sentiment Analysis, Text Classification, and Named Entity Recognition tasks.
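    A minimal sketch of using a monolingual Persian BERT such as ParsBERT for feature extraction with the HuggingFace transformers library. The checkpoint name below is the identifier I believe the ParsBERT authors published on the HuggingFace hub; treat it as an assumption and substitute the checkpoint you actually want.

    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "HooshvareLab/bert-base-parsbert-uncased"  # assumed hub identifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    inputs = tokenizer("زبان فارسی", return_tensors="pt")   # "the Persian language"
    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool the last hidden layer into one sentence vector.
    sentence_embedding = outputs.last_hidden_state.mean(dim=1)
    print(sentence_embedding.shape)   # torch.Size([1, 768])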

  • 36.
    Granberg, Johan
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Minock, Michael
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    A natural language interface over the MusicBrainz database (2011). In: Proceedings of the 1st workshop on Question Answering over Linked Data (QALD-1) / [ed] Christina Unger, Philipp Cimiano, Vanessa Lopez, Enrico Motta, 2011, pp. 38-43. Conference paper (Refereed)
    Abstract [en]

    This paper demonstrates a way to build a natural language interface (NLI) over semantically rich data. Specifically, we show this over the MusicBrainz domain, inspired by the second shared task of the QALD-1 workshop. Our approach uses the tool C-Phrase [4] to build an NLI over a set of views defined over the original MusicBrainz relational database. C-Phrase uses a limited variant of X-Bar theory [3] for syntax and tuple calculus for semantics. The C-Phrase authoring tool works over any domain and only the end configuration has to be redone for each new database covered – a task that does not require deep knowledge about linguistics and system internals. Working over the MusicBrainz domain was a challenge due to the size of the database – quite a lot of effort went into optimizing computation times and memory usage to manageable levels. This paper reports on this work and anticipates a live demonstration for querying by the public.

  • 37.
    Hansson, Britt
    Umeå universitet, Samhällsvetenskaplig fakultet, Pedagogik.
    Större chans att klara det?: En specialpedagogisk studie av 10 ungdomars syn på hur datorstöd har påverkat deras språk, lärande och skolsituation [A better chance of making it? A special-needs-education study of 10 young people's views on how computer support has affected their language, learning and school situation] (2008). Independent thesis, basic level (professional degree), 10 credits / 15 HE credits. Student thesis
    Abstract [sv]

    In this study, 10 young people were interviewed about their experiences of using a computer with speech synthesis and recorded books. They were asked in which situations the tools had been useful or had felt limiting in their learning and school situation. Because of major difficulties at school, the young people had been lent a laptop by their school, which they used both at home and at school. Together with their parents and teachers, they received guidance at the municipality's Skoldatatek (school computer resource centre). The starting point of the study, taken from a sociocultural perspective, was that language develops when it is used. Schools are expected to offer a modern education, and pupils facing difficulties at school have the right to receive support. How this support should be designed can create a dilemma at the individual school: support directed at the individual pupil may be perceived as treating school difficulties as a problem carried by the pupil, which must not occur in "a school for all". Given this dilemma, it was important to investigate the young people's experiences of support, development, and obstacles, in order to understand whether these cause singling out and exclusion. The results showed that the young people felt more motivated with their computer tools, which compensated for their difficulties and suited their different learning styles. The young people said they had become more confident writers and readers thanks to increased language use. Their accounts also show the necessity of support from teachers and parents. The results indicate that alternative learning tools could contribute to greater goal attainment in a school for all, with pedagogical diversity.

  • 38.
    Hatefi, Arezoo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Document Clustering Using Attentive Hierarchical Document Representation (2020). Conference paper (Refereed)
    Abstract [en]

    We propose a text clustering algorithm that applies an attention mechanism on both word and sentence level. This ongoing work is motivated by an application in contextual programmatic advertising, where the goal is to group online articles into clusters corresponding to a given set of marketing objectives. The main contribution is the use of attention to identify words and sentences that are of specific importance for the formation of the clusters.

  • 39.
    Hatefi, Arezoo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Vu, Xuan-Son
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Bhuyan, Monowar
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    The efficiency of pre-training with objective masking in pseudo labeling for semi-supervised text classificationManuskript (preprint) (Övrigt vetenskapligt)
  • 40.
    Hatefi, Arezoo
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Vu, Xuan-Son
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Bhuyan, Monowar H.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Drewes, Frank
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling2021Ingår i: CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM Digital Library, 2021, s. 3078-3082Konferensbidrag (Refereegranskat)
    Abstract [en]

    We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples.

    To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the intended clusters. Our experimental results confirm that the proposed model improves on the state of the art when a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
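
    Cformer itself is transformer-based and follows Meta Pseudo Labels, where the teacher is additionally updated from the student's performance on labeled data; purely as a hedged sketch of the simpler underlying teacher-student pseudo-labeling idea (invented toy data and scikit-learn models instead of transformers), one could write:

    ```python
    import numpy as np
    from scipy.sparse import vstack
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Invented toy data: a few labeled documents per cluster plus unlabeled ones.
    labeled_texts = ["stock markets fell", "the striker scored twice",
                     "central bank raises rates", "the team won the final"]
    labels = np.array([0, 1, 0, 1])                 # 0 = finance, 1 = sports
    unlabeled_texts = ["shares rallied on earnings", "the coach praised the defence"]

    vec = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
    X_lab, X_unlab = vec.transform(labeled_texts), vec.transform(unlabeled_texts)

    # The teacher assigns pseudo labels to unlabeled data; only confident ones
    # are kept (the 0.55 threshold is illustrative).
    teacher = LogisticRegression().fit(X_lab, labels)
    probs = teacher.predict_proba(X_unlab)
    confident = probs.max(axis=1) > 0.55
    pseudo = probs.argmax(axis=1)[confident]

    # The student trains on labeled data plus the teacher's confident pseudo labels.
    X_student = vstack([X_lab, X_unlab[confident]])
    y_student = np.concatenate([labels, pseudo])
    student = LogisticRegression().fit(X_student, y_student)
    print(student.predict(vec.transform(["goal in the last minute"])))  # predicted cluster id
    ```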

  • 41.
    Hellsten, Simon
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Incremental Re-tokenization in BPE-trained SentencePiece Models2024Självständigt arbete på grundnivå (kandidatexamen), 10 poäng / 15 hpStudentuppsats (Examensarbete)
    Abstract [en]

    This bachelor's thesis in Computer Science explores the efficiency of an incremental re-tokenization algorithm in the context of BPE-trained SentencePiece models used in natural language processing. The thesis begins by underscoring the critical role of tokenization in NLP, particularly highlighting the complexities introduced by modifications in tokenized text. It then presents an incremental re-tokenization algorithm, detailing its development and evaluating its performance against full text re-tokenization. Experimental results demonstrate that this incremental approach is more time-efficient than full re-tokenization, especially in large text datasets. This efficiency is attributed to the algorithm's localized re-tokenization strategy, which limits processing to text areas around modifications. The research concludes by suggesting that incremental re-tokenization could significantly enhance the responsiveness and resource efficiency of text-based applications, such as chatbots and virtual assistants. Future work may focus on predictive models to anticipate the impact of text changes on token stability and on optimizing the algorithm for different text contexts.
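
    The thesis is summarized above only at a high level; the following hedged sketch (hypothetical helper names, with a lossless whitespace tokenizer standing in for a BPE-trained SentencePiece model) illustrates the localized strategy of re-tokenizing only a window of tokens around the edited span and splicing the result back:

    ```python
    import re
    from typing import Callable, List

    def toy_tokenize(s: str) -> List[str]:
        """Lossless stand-in for a SentencePiece model: alternating runs of
        whitespace and non-whitespace, so ''.join(tokens) == s."""
        return re.findall(r"\s+|\S+", s)

    def incremental_retokenize(old_tokens: List[str], start: int, end: int,
                               replacement: str,
                               tokenize: Callable[[str], List[str]] = toy_tokenize,
                               margin: int = 2) -> List[str]:
        """Replace characters [start, end) of the old text with `replacement`
        and re-tokenize only a window of tokens around the edit.

        Simplified sketch: with `margin` extra tokens on each side we assume
        the tokenization outside the window is unchanged; the thesis discusses
        when such an assumption holds for BPE-trained SentencePiece models."""
        # Map the character span onto token indices in the old tokenization.
        bounds, pos = [], 0
        for tok in old_tokens:
            bounds.append((pos, pos + len(tok)))
            pos += len(tok)
        first = next(i for i, (b, e) in enumerate(bounds) if e > start)
        last = next(i for i, (b, e) in reversed(list(enumerate(bounds))) if b < end)
        first, last = max(0, first - margin), min(len(old_tokens) - 1, last + margin)

        # Apply the edit inside the window only, then re-tokenize that window.
        window_text = "".join(old_tokens[first:last + 1])
        w_start = bounds[first][0]
        edited = (window_text[:start - w_start] + replacement
                  + window_text[end - w_start:])
        return old_tokens[:first] + tokenize(edited) + old_tokens[last + 1:]

    tokens = toy_tokenize("the quick brown fox jumps over the lazy dog")
    # Replace "brown" (characters 10..15) with "dark red".
    new_tokens = incremental_retokenize(tokens, 10, 15, "dark red")
    print("".join(new_tokens))   # the quick dark red fox jumps over the lazy dog
    ```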

    Ladda ner fulltext (pdf)
    fulltext
  • 42.
    Hendrick, Stephanie
    Umeå universitet, Humanistiska fakulteten, Humlab. Umeå universitet, Humanistiska fakulteten, Moderna språk. Engelska.
    Following Conversational Traces: Part 1: Creating a corpus with the ICWSM dataset.2007Konferensbidrag (Refereegranskat)
    Abstract [en]

    This poster will present the methodology behind the creation of a linguistic corpus based on a subset of the 2007 International Conference on Weblogs and Social Media dataset. Posts from a small group of political bloggers were tagged for parts of speech and indexed into a corpus using the program Xairia. From this corpus, the political blogger subset will be investigated for register and referential information. Referential information, especially with regard to new and given information, will be compared against network placement, both to identify network innovators and to examine network placement as a catalyst for innovation. The final section, Further Research, will outline the modifications necessary for the creation of a full-scale corpus based on the entire ICWSM 2006 dataset.
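
    The poster abstract contains no code; purely as an illustration of a part-of-speech tagging step of the kind described (using NLTK as a stand-in, not the Xairia indexing pipeline referred to above), a minimal sketch might be:

    ```python
    import nltk

    # One-time downloads of the default English tokenizer and tagger models
    # (assumption: these are adequate stand-ins for the tagger actually used).
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    post = "We argue that the proposed policy will not survive the next election."
    tagged = nltk.pos_tag(nltk.word_tokenize(post))
    print(tagged[:4])   # e.g. [('We', 'PRP'), ('argue', 'VBP'), ('that', 'IN'), ('the', 'DT')]
    ```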

  • 43.
    Jarlbrink, Johan
    et al.
    Umeå universitet, Humanistiska fakulteten, Institutionen för kultur- och medievetenskaper.
    Snickars, Pelle
    Umeå universitet, Humanistiska fakulteten, Institutionen för kultur- och medievetenskaper.
    Cultural heritage as digital noise: nineteenth century newspapers in the digital archive2017Ingår i: Journal of Documentation, ISSN 0022-0418, E-ISSN 1758-7379, Vol. 73, nr 6, s. 1228-1243Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    Purpose

    The purpose of this paper is to explore and analyze the digitized newspaper collection at the National Library of Sweden, focusing on cultural heritage as digital noise. In what specific ways are newspapers transformed in the digitization process? If the digitized document is not the same as the source document – is it still a historical record, or is it transformed into something else?

    Design/methodology/approach

    The authors have analyzed the XML files from Aftonbladet 1830 to 1862. The most frequent newspaper words not matching a high-quality references corpus were selected to zoom in on the noisiest part of the paper. The variety of the interpretations generated by optical character recognition (OCR) was examined, as well as texts generated by auto-segmentation. The authors have made a limited ethnographic study of the digitization process.

    Findings

    The research shows that the digital collection of Aftonbladet contains extreme amounts of noise: millions of misinterpreted words generated by OCR, and millions of texts re-edited by the auto-segmentation tool. How the tools work is mostly unknown to the staff involved in the digitization process. Sticking to any idea of a provenance chain is hence impossible, since many steps have been outsourced to unknown factors affecting the source document.

    Originality/value

    The detailed examination of digitally transformed newspapers is valuable to scholars who depend on newspaper databases in their research. The paper also highlights the fact that libraries outsourcing digitization processes run the risk of losing control over the quality of their collections.
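
    As a hedged illustration of the noise-detection step described under Design/methodology/approach (frequent words in the OCR output that do not match a reference lexicon), with invented toy data rather than the actual Aftonbladet XML, a sketch could look like this:

    ```python
    from collections import Counter

    # Invented toy data: OCR output tokens and a reference lexicon of valid words.
    ocr_tokens = "och det var en gång ocb det llar en gång och ocb llar".split()
    reference_lexicon = {"och", "det", "var", "en", "gång"}

    freq = Counter(ocr_tokens)
    # Frequent tokens missing from the reference lexicon are likely OCR noise
    # (e.g. 'ocb' as a misreading of 'och').
    suspects = [(w, n) for w, n in freq.most_common() if w not in reference_lexicon]
    print(suspects)   # [('ocb', 2), ('llar', 2)]
    ```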

  • 44.
    Khairova, Nina
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. National Technical University ”Kharkiv Polytechnic Institute”, Ukraine.
    Hamon, Thierry
    Institut Galilée, Univ. Sorbonne Paris Nord, France.
    Grabar, Natalia
    University of Lille, France.
    Burov, Yevhen
    Lviv Polytechnic National University, Ukraine.
    Preface: Computational Linguistics Workshop2023Ingår i: CoLInS 2023, Computational Linguistics and Intelligent Systems 2023: Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Systems. Volume II: Computational Linguistics Workshop, CEUR-WS , 2023Konferensbidrag (Refereegranskat)
    Ladda ner fulltext (pdf)
    fulltext
  • 45. Kleyko, Denis
    et al.
    Osipov, Evgeny
    De Silva, Daswin
    Wiklund, Urban
    Umeå universitet, Medicinska fakulteten, Institutionen för strålningsvetenskaper, Radiofysik.
    Vyatkin, Valeriy
    Alahakoon, Damminda
    Distributed representation of n-gram statistics for boosting self-organizing maps with hyperdimensional computing2019Ingår i: Perspectives of system informatics / [ed] Nikolaj Bjørner, Irina Virbitskaite, Andrei Voronkov, Cham: Springer, 2019, s. 64-79Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper presents an approach for substantially reducing the training and operating phases of Self-Organizing Maps in tasks of 2-D projection of multi-dimensional symbolic data for natural language processing, such as language classification, topic extraction, and ontology development. The conventional approach for this type of problem is to use n-gram statistics as a fixed-size representation for the input of Self-Organizing Maps. The performance bottleneck with n-gram statistics is that the size of the representation, and as a result the computation time of Self-Organizing Maps, grows exponentially with the size of the n-grams. The presented approach is based on distributed representations of structured data using principles of hyperdimensional computing. The experiments performed on the European languages recognition task demonstrate that Self-Organizing Maps trained with distributed representations require fewer computations than with conventional n-gram statistics while preserving the overall performance of Self-Organizing Maps well.
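
    The abstract stays at a high level; the sketch below (a generic random-indexing style encoding with invented parameters, not necessarily the authors' exact mapping) shows how n-gram statistics can be folded into a single fixed-size hyperdimensional vector by binding permuted random character vectors and bundling the resulting n-gram vectors:

    ```python
    import numpy as np

    D = 1000                       # hyperdimensional vector size (illustrative)
    rng = np.random.default_rng(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    # One random bipolar (+1/-1) vector per character.
    item = {c: rng.choice([-1, 1], size=D) for c in alphabet}

    def ngram_vector(ngram: str) -> np.ndarray:
        """Bind the characters of an n-gram by position: each character vector
        is cyclically shifted by its position and the results are multiplied."""
        v = np.ones(D, dtype=int)
        for i, c in enumerate(ngram):
            v = v * np.roll(item[c], i)
        return v

    def text_vector(text: str, n: int = 3) -> np.ndarray:
        """Bundle (sum) the vectors of all n-grams: a fixed-size stand-in for
        explicit n-gram count statistics."""
        v = np.zeros(D, dtype=int)
        for i in range(len(text) - n + 1):
            v += ngram_vector(text[i:i + n])
        return v

    a = text_vector("detta är inte engelska".replace("ä", "a"))
    b = text_vector("this is english text")
    c = text_vector("this text is in english too")
    cos = lambda x, y: float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    # The two English texts typically score noticeably higher than the mixed pair.
    print(round(cos(b, c), 2), round(cos(a, b), 2))
    ```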

  • 46.
    Kucherenko, Taras
    et al.
    SEED, Electronic Arts (EA), Stockholm, Sweden.
    Nagy, Rajmund
    KTH Royal Institute of Technology, Stockholm, Sweden.
    Yoon, Youngwoo
    ETRI, Daejeon, South Korea.
    Woo, Jieyeon
    ISIR, Sorbonne University, Paris, France.
    Nikolov, Teodor
    Umeå universitet.
    Tsakov, Mihail
    Umeå universitet.
    Henter, Gustav Eje
    KTH Royal Institute of Technology, Stockholm, Sweden.
    The GENEA challenge 2023: a large-scale evaluation of gesture generation models in monadic and dyadic settings2023Ingår i: ICMI '23: proceedings of the 25th international conference on multimodal interaction / [ed] Elisabeth André; Mohamed Chetouani; Dominique Vaufreydaz; Gale Lucas; Tanja Schultz; Louis-Philippe Morency; Alessandro Vinciarelli, Association for Computing Machinery (ACM), 2023, s. 792-801Konferensbidrag (Refereegranskat)
    Abstract [en]

    This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor.

  • 47.
    Li, Yunyao
    et al.
    IBM Research - Almaden.
    Grandison, Tyrone
    The Data-Driven Institute.
    Silveyra, Patricia
    University of North Carolina - Chapel Hill.
    Douraghy, Ali
    The National Academies of Sciences, Engineering and Medicine.
    Guan, Xinyu
    Yale University.
    Kieselbach, Thomas
    Umeå universitet, Umeå universitetsbibliotek (UB).
    Li, Chengka
    University of Texas - Arlington.
    Zhang, Haiqi
    University of Texas - Arlington.
    Jennifer for COVID-19: An NLP-Powered Chatbot Built for the People and by the People to Combat Misinformation2020Ingår i: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020Konferensbidrag (Refereegranskat)
    Abstract [en]

    Just as SARS-CoV-2 continues to infect a growing number of people around the world, harmful misinformation about the outbreak also continues to spread. We designed and built the Jennifer chatbot to provide easily accessible information from reliable resources to answer questions related to the current COVID-19 pandemic. It covers a wide variety of topics, from case statistics to best practices for disease prevention and management.

    Ladda ner fulltext (pdf)
    Submitted paper
    Ladda ner fulltext (pdf)
    fulltext
  • 48.
    Lindgren, Eva
    et al.
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Sullivan, Kirk
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Zhao, Huahui
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Deutschmann, Mats
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Steinvall, Anders
    Umeå universitet, Humanistiska fakulteten, Institutionen för språkstudier.
    Developing Peer-to-Peer Supported Reflection as a Life-Long Learning Skill: an Example from the Translation Classroom2011Ingår i: Human Development and Global Advancements through Information Communication Technologies: New Initiatives / [ed] Susheel Chhabra & Hakikur Rahman, Hershey USA: IGI publishing , 2011, 1, s. 188-210Kapitel i bok, del av antologi (Refereegranskat)
    Abstract [en]

    Life-long learning skills have moved from being a side effect of a formal education to skills that are explicitly trained during a university degree. In a case study, a university class undertook a translation from Swedish to English in a keystroke logging environment and then replayed their translations in pairs while discussing their thought processes when undertaking the translations, and why they made particular choices and changes to their translations. Computer keystroke logging coupled with peer-based intervention assisted the students in discussing how they worked with their translations, enabled them to see how their ideas relating to the translation developed as they worked with the text, and helped them develop reflection skills and learn from their peers. The process showed that computer keystroke logging coupled with peer-based intervention has the potential to (1) support student reflection and discussion around their translation tasks, (2) enhance student motivation and enthusiasm for translation and (3) develop peer-to-peer supported reflection as a life-long learning skill.

  • 49.
    Lindgren, Helena
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Heintz, Fredrik
    Linköping University, Linköping, Sweden.
    The wasp-ed AI curriculum: A holistic curriculum for artificial intelligence2023Ingår i: INTED2023 Proceedings: 17th International Technology, Education and Development Conference, 2023, s. 6496-6502Konferensbidrag (Refereegranskat)
    Abstract [en]

    Efforts in lifelong learning and competence development in Artificial Intelligence (AI) have been on the rise for several years. These initiatives have mostly been applied to Science, Technology, Engineering and Mathematics (STEM) disciplines. Even though there has been significant development in Digital Humanities to incorporate AI methods and tools in higher education, the potential for such competences in Arts, Humanities and Social Sciences is far from being realised. Furthermore, there is an increasing awareness that the STEM disciplines need to include competences relating to AI in humanity and society. This is especially important considering the widening and deepening of the impact of AI on society at large and individuals. 

    The aim of the presented work is to provide a broad and inclusive AI curriculum that covers the breadth of the topic as it is seen today, which is significantly different from only a decade ago. It is important to note that by the curriculum we mean an overview of the subject itself, rather than a particular education program. The curriculum is intended to be used as a foundation for educational activities in AI, for example to harmonize terminology, compare different programs, and identify educational gaps to be filled. An important aspect of the curriculum is the ethical, legal, and societal dimension of AI; the curriculum is not limited to the STEM subjects but extends to a holistic, human-centred AI perspective.

    The curriculum is developed as part of the national research program WASP-ED, the Wallenberg AI and transformative technologies education development program. 

    Ladda ner fulltext (pdf)
    fulltext
  • 50.
    Lindgren, Simon
    Umeå universitet, Samhällsvetenskapliga fakulteten, Sociologiska institutionen.
    Introducing Connected Concept Analysis: A network approach to big text datasets2016Ingår i: Text & Talk, ISSN 1860-7330, E-ISSN 1860-7349, Vol. 36, nr 3, s. 341-362Artikel i tidskrift (Refereegranskat)
    Abstract [en]

    This paper introduces Connected Concept Analysis (CCA) as a framework for text analysis which ties qualitative and quantitative considerations together in one unified model. Even though CCA can be used to map and analyze any full-text dataset of any size, the method was created specifically for taking the sensibilities of qualitative discourse analysis into the age of the Internet and big data. Using open data from a large online survey on habits and views relating to intellectual property rights, piracy and file sharing, I introduce CCA as a mixed-method approach aiming to bring out knowledge about corpora of text, the sizes of which make it unfeasible to make comprehensive close readings. CCA aims to do this without reducing the text to numbers, as often becomes the case in content analysis. Instead of simply counting words or phrases, I draw on constant comparative coding for building concepts and on network analysis for connecting them. The result - a network graph visualization of key connected concepts in the analyzed text dataset - meets the need for text visualization systems that can support discourse analysis.
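
    As a small, hedged illustration of the co-occurrence-network side of such a pipeline (invented toy data and a plain counting scheme, not Lindgren's constant comparative coding), a sketch using networkx could be:

    ```python
    from itertools import combinations
    import networkx as nx

    # Invented toy data: each document has already been coded with a set of concepts.
    coded_documents = [
        {"piracy", "sharing", "community"},
        {"piracy", "law", "enforcement"},
        {"sharing", "community", "culture"},
        {"piracy", "sharing", "law"},
    ]

    G = nx.Graph()
    for concepts in coded_documents:
        for a, b in combinations(sorted(concepts), 2):
            # Edge weight counts in how many documents the two concepts co-occur.
            w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=w + 1)

    strongest = sorted(G.edges(data="weight"), key=lambda e: -e[2])[:3]
    print(strongest)   # e.g. [('piracy', 'sharing', 2), ...]
    ```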
