Publications (9 of 9)
Lorig, F., Tucker, J., Dahlgren Lindström, A., Dignum, F., Murukannaiah, P., Theodorou, A. & Yolum, P. (Eds.). (2024). HHAI 2024: hybrid human AI systems for the social good: proceedings of the third international conference on hybrid human-artificial intelligence. Amsterdam: IOS Press
2024 (English) Proceedings (editorship) (Refereed)
Abstract [en]

The field of hybrid human-artificial intelligence (HHAI), although primarily driven by developments in AI, also requires fundamentally new approaches and solutions. Multidisciplinary in nature, it calls for collaboration across various research domains, such as AI, HCI, the cognitive and social sciences, philosophy and ethics, and complex systems, to name but a few.

This book presents the proceedings of HHAI 2024, the 3rd International Conference on Hybrid Human-Artificial Intelligence, held from 10-14 June 2024 in Malmö, Sweden. The focus of HHAI 2024 was on artificially intelligent systems that cooperate synergistically, proactively and purposefully with humans, amplifying rather than replacing human intelligence. A total of 62 submissions were received for the main track of the conference, of which 31 were accepted for presentation after a thorough double-blind review process. These comprised 9 full papers, 5 blue-sky papers, and 17 working papers, making the final acceptance rate for full papers 29%. The acceptance rate across all tracks of the main program was 50%. This book contains all submissions accepted for the main track, as well as the proposals for the Doctoral Consortium and extended abstracts from the Posters and Demos track. Topics covered include human-AI interaction and collaboration; learning, reasoning and planning with humans and machines in the loop; fair, ethical, responsible, and trustworthy AI; societal awareness of AI; and the role of design and compositionality of AI systems in interpretable/collaborative AI, among others.

Providing a current overview of research and development, the book will be of interest to all those working in the field and facilitate the ongoing exchange and development of ideas across a range of disciplines.

Place, publisher, year, edition, pages
Amsterdam: IOS Press, 2024. p. 492
Series
Frontiers in Artificial Intelligence and Applications, ISSN 0922-6389, E-ISSN 1879-8314 ; 386
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228017 (URN) 10.3233/FAIA386 (DOI) 9781643685229 (ISBN)
Available from: 2024-07-22 Created: 2024-07-22 Last updated: 2024-07-22 Bibliographically reviewed
Dahlgren Lindström, A. (2024). Learning, reasoning, and compositional generalisation in multimodal language models. (Doctoral dissertation). Umeå: Umeå University
2024 (English) Doctoral thesis, monograph (Other academic)
Alternative title [sv]
Inlärning, resonemang, och kompositionalitet i multimodala språkmodeller
Abstract [en]

We humans learn language and how to interact with the world through our different senses, grounding our language in what we can see, touch, hear, and smell. We call these streams of information different modalities, and our efficient processing and synthesis of the interactions between different modalities is a cornerstone of our intelligence. Therefore, it is important to study how we can build multimodal language models, where machine learning models learn from more than just text. This is particularly important in the era of large language models (LLMs), where their general capabilities are unclear and unreliable. This thesis investigates learning and reasoning in multimodal language models, and their capabilities to compositionally generalise in visual question answering tasks. Compositional generalisation is the process in which we produce and understand novel sentences, by systematically combining words and sentences to uncover the meaning in language, and has proven a challenge for neural networks. Previously, the literature has focused on compositional generalisation in text-only language models. One of the main contributions of this work is the extensive investigation of text-image language models. The experiments in this thesis compare three neural network-based models, and one neuro-symbolic method, and operationalise language grounding as the ability to reason with relevant functions over object affordances.

In order to better understand the capabilities of multimodal models, this thesis introduces CLEVR-Math as a synthetic benchmark of visual mathematical reasoning. The CLEVR-Math dataset involves tasks such as adding and removing objects from 3D scenes based on textual instructions, such as "Remove all blue cubes. How many objects are left?", and is given as a curriculum of tasks of increasing complexity. The evaluation set of CLEVR-Math includes extensive testing of different functional and object attribute generalisations. We open up the internal representations of these models using a technique called probing, where linear classifiers are trained to recover concepts such as colours or named entities from the internal embeddings of input data. The results show that while models are fairly good at generalisation with attributes (i.e., solving tasks involving never-before-seen objects), it is a big challenge to generalise over functions and to learn abstractions such as categories. The results also show that complexity in the training data is a driver of generalisation, where an extended curriculum improves the general performance across tasks and generalisation tests. Furthermore, it is shown that training from scratch versus transfer learning has significant effects on compositional generalisation in models.

The results identify several aspects of how current methods can be improved in the future, and highlight general challenges in multimodal language models. A thorough investigation of compositional generalisation suggests that pre-training allows models access to inductive biases that can be useful for solving new tasks. In contrast, models trained from scratch show much lower overall performance on the synthetic tasks at hand, but lower relative generalisation gaps. In the conclusions and outlook, we discuss the implications of these results as well as future research directions.
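The probing technique described in the abstract — training linear classifiers to recover concepts from internal embeddings — can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis's actual experimental setup: the embedding dimensionality, the binary "colour" concept, and all variable names are hypothetical stand-ins.

```python
# Minimal linear-probe sketch: can a linear classifier recover a concept
# (a binary "colour" label) from embedding vectors? Synthetic stand-in
# for real model embeddings; all names and dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, dim = 400, 16

# Fake "embeddings": random vectors where one direction weakly encodes
# the concept label, mimicking information stored in a representation.
labels = rng.integers(0, 2, size=n)        # 0 = "red", 1 = "blue"
emb = rng.normal(size=(n, dim))
emb[:, 3] += 2.0 * labels                  # concept leaks into dimension 3

# Least-squares linear probe: fit weights on a train split,
# classify held-out examples by the sign of the projection.
X_tr, y_tr = emb[:300], labels[:300] * 2 - 1   # labels mapped to {-1, +1}
X_te, y_te = emb[300:], labels[300:] * 2 - 1
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
pred = np.sign(X_te @ w)
acc = float((pred == y_te).mean())

# Accuracy well above the ~50% chance level indicates the concept is
# linearly decodable from the embeddings.
print(f"probe accuracy: {acc:.2f}")
```

A probe that stays near chance on real embeddings would suggest the model does not linearly encode the concept, which is the kind of conclusion the probing experiments in the thesis draw.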

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. p. 192
Series
Report / UMINF, ISSN 0348-0542 ; 24.07
Keywords
multimodal, language models, compositional, generalisation, generalization, reasoning, probing, grounding
National subject category
Natural Language Processing and Computational Linguistics
Research subject
computing science
Identifiers
urn:nbn:se:umu:diva-224571 (URN) 9789180704175 (ISBN) 9789180704182 (ISBN)
Public defence
2024-06-13, Aula Biologica, Biologihuset, Umeå, 13:00 (English)
Available from: 2024-05-23 Created: 2024-05-20 Last updated: 2025-02-07 Bibliographically reviewed
Methnani, L., Dahlgren Lindström, A. & Dignum, V. (2024). The impact of mixed-initiative on collaboration in hybrid AI. In: Fabian Lorig; Jason Tucker; Adam Dahlgren Lindström; Frank Dignum; Pradeep Murukannaiah; Andreas Theodorou; Pınar Yolum (Ed.), HHAI 2024: hybrid human AI systems for the social good: proceedings of the third international conference on hybrid human-artificial intelligence. Paper presented at 3rd International Conference on Hybrid Human-Artificial Intelligence, HHAI 2024, Hybrid, Malmö, Sweden, June 10-14, 2024 (pp. 469-471). Amsterdam: IOS Press
2024 (English) In: HHAI 2024: hybrid human AI systems for the social good: proceedings of the third international conference on hybrid human-artificial intelligence / [ed] Fabian Lorig; Jason Tucker; Adam Dahlgren Lindström; Frank Dignum; Pradeep Murukannaiah; Andreas Theodorou; Pınar Yolum, Amsterdam: IOS Press, 2024, pp. 469-471. Conference paper, Published paper (Refereed)
Abstract [en]

This paper explores the integration of mixed-initiative systems in human-AI teams to improve coordination and communication in Search and Rescue (SAR) scenarios, leveraging dynamic control sharing to enhance operational effectiveness.

Place, publisher, year, edition, pages
Amsterdam: IOS Press, 2024
Series
Frontiers in Artificial Intelligence and Applications, ISSN 0922-6389, E-ISSN 1879-8314 ; 386
Keywords
Human-AI interaction, mixed-initiative systems, search and rescue, team coordination
National subject category
Other Engineering and Technologies
Identifiers
urn:nbn:se:umu:diva-228000 (URN) 10.3233/FAIA240227 (DOI) 2-s2.0-85198757074 (Scopus ID) 9781643685229 (ISBN)
Conference
3rd International Conference on Hybrid Human-Artificial Intelligence, HHAI 2024, Hybrid, Malmö, Sweden, June 10-14, 2024
Available from: 2024-07-22 Created: 2024-07-22 Last updated: 2025-02-18 Bibliographically reviewed
Aler Tubella, A., Coelho Mollo, D., Dahlgren, A., Devinney, H., Dignum, V., Ericson, P., . . . Nieves, J. C. (2023). ACROCPoLis: a descriptive framework for making sense of fairness. In: FAccT '23: Proceedings of the 2023 ACM conference on fairness, accountability, and transparency. Paper presented at 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, Illinois, USA, June 12-15, 2023 (pp. 1014-1025). ACM Digital Library
2023 (English) In: FAccT '23: Proceedings of the 2023 ACM conference on fairness, accountability, and transparency, ACM Digital Library, 2023, pp. 1014-1025. Conference paper, Published paper (Refereed)
Abstract [en]

Fairness is central to the ethical and responsible development and use of AI systems, with a large number of frameworks and formal notions of algorithmic fairness being available. However, many of the fairness solutions proposed revolve around technical considerations and not the needs of and consequences for the most impacted communities. We therefore want to take the focus away from definitions and allow for the inclusion of societal and relational aspects to represent how the effects of AI systems impact and are experienced by individuals and social groups. In this paper, we do this by means of proposing the ACROCPoLis framework to represent allocation processes with a modeling emphasis on fairness aspects. The framework provides a shared vocabulary in which the factors relevant to fairness assessments for different situations and procedures are made explicit, as well as their interrelationships. This enables us to compare analogous situations, to highlight the differences in dissimilar situations, and to capture differing interpretations of the same situation by different stakeholders.

Place, publisher, year, edition, pages
ACM Digital Library, 2023
Keywords
Algorithmic fairness; socio-technical processes; social impact of AI; responsible AI
National subject category
Information Systems
Research subject
computing science
Identifiers
urn:nbn:se:umu:diva-209705 (URN) 10.1145/3593013.3594059 (DOI) 001062819300088 () 2-s2.0-85163594710 (Scopus ID) 978-1-4503-7252-7 (ISBN)
Conference
2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, Illinois, USA, June 12-15, 2023
Available from: 2023-06-13 Created: 2023-06-13 Last updated: 2025-04-24 Bibliographically reviewed
Dahlgren Lindström, A. & Abraham, S. S. (2022). CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning. In: d'Avila Garcez A.; Jimenez-Ruiz E. (Ed.), CEUR Workshop Proceedings. Paper presented at 16th International Workshop on Neural-Symbolic Learning and Reasoning, NeSy 2022, Windsor, UK, September 28-30, 2022. CEUR-WS, 3212
2022 (English) In: CEUR Workshop Proceedings / [ed] d'Avila Garcez A.; Jimenez-Ruiz E., CEUR-WS, 2022, Vol. 3212. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce CLEVR-Math, a multi-modal math word problem dataset consisting of simple math word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene that is depicted in the image. Since the question posed may not be about the scene in the image, but about the state of the scene before or after the actions are applied, the solver must envision or imagine the state changes due to these actions. Solving these word problems requires a combination of language, visual, and mathematical reasoning. We apply state-of-the-art neural and neuro-symbolic models for visual question answering on CLEVR-Math and empirically evaluate their performance. Our results show that neither method generalises to chains of operations. We discuss the limitations of the two in addressing the task of multi-modal word problem solving.
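As a toy illustration of the task format described above (not the dataset's actual implementation — the scene encoding, action tuples, and function names here are hypothetical), a CLEVR-Math-style problem pairs a scene with a textual action sequence, and the ground-truth answer can be computed symbolically:

```python
# Toy CLEVR-Math-style solver: a scene is a multiset of objects, the
# "text" is a sequence of add/remove actions, and the answer is the
# object count after applying them. Hypothetical encoding for illustration.
from collections import Counter

def solve(scene, actions):
    """Apply (op, attribute, qty) actions to a scene; count remaining objects."""
    state = Counter(scene)
    for op, attr, qty in actions:
        if op == "remove_all":
            state[attr] = 0
        elif op == "remove":
            state[attr] = max(0, state[attr] - qty)
        elif op == "add":
            state[attr] += qty
    return sum(state.values())

# Scene with 3 blue cubes and 2 red spheres.
# Question: "Remove all blue cubes. How many objects are left?"
scene = ["blue cube"] * 3 + ["red sphere"] * 2
print(solve(scene, [("remove_all", "blue cube", None)]))  # → 2
```

The models evaluated in the paper must infer this post-action state from the image and text alone, which is precisely what makes chains of operations hard for them.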

Place, publisher, year, edition, pages
CEUR-WS, 2022
Series
International Workshop on Neural-Symbolic Learning and Reasoning, ISSN 1613-0073
Keywords
Math Word Problem Solving, Multimodal Reasoning, Neuro-Symbolic, Visual Question Answering
National subject category
Computer Graphics and Computer Vision
Identifiers
urn:nbn:se:umu:diva-200100 (URN) 2-s2.0-85138703727 (Scopus ID)
Conference
16th International Workshop on Neural-Symbolic Learning and Reasoning, NeSy 2022, Windsor, UK, September 28-30, 2022
Research funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2022-10-13 Created: 2022-10-13 Last updated: 2025-02-07 Bibliographically reviewed
Björklund, J., Dahlgren Lindström, A. & Drewes, F. (2021). Bridging Perception, Memory, and Inference through Semantic Relations. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: . Paper presented at EMNLP 2021, 2021 Conference on Empirical Methods in Natural Language Processing, Online and in Punta Cana, Dominican Republic, November 7-11, 2021 (pp. 9136-9142). Association for Computational Linguistics (ACL)
2021 (English) In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (ACL), 2021, pp. 9136-9142. Conference paper, Published paper (Refereed)
Abstract [en]

There is a growing consensus that surface form alone does not enable models to learn meaning and gain language understanding. This warrants an interest in hybrid systems that combine the strengths of neural and symbolic methods. We favour triadic systems consisting of neural networks, knowledge bases, and inference engines. The network provides perception, that is, the interface between the system and its environment. The knowledge base provides explicit memory and thus immediate access to established facts. Finally, inference capabilities are provided by the inference engine which reflects on the perception, supported by memory, to reason and discover new facts. In this work, we probe six popular language models for semantic relations and outline a future line of research to study how the constituent subsystems can be jointly realised and integrated.

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2021
National subject category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-193832 (URN) 2-s2.0-85127452384 (Scopus ID) 9781955917094 (ISBN)
Conference
EMNLP 2021, 2021 Conference on Empirical Methods in Natural Language Processing, Online and in Punta Cana, Dominican Republic, November 7-11, 2021
Available from: 2022-04-21 Created: 2022-04-21 Last updated: 2023-03-24 Bibliographically reviewed
Woldemariam, Y. D. & Dahlgren, A. (2020). Adapting language specific components of cross-media analysis frameworks to less-resourced languages: the case of Amharic. In: Dorothee Beermann; Laurent Besacier; Sakriani Sakti; Claudia Soria (Ed.), Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020): . Paper presented at Language Resources and Evaluation Conference (LREC 2020), Marseille, France, May 11–16, 2020 (pp. 298-305).
2020 (English) In: Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020) / [ed] Dorothee Beermann; Laurent Besacier; Sakriani Sakti; Claudia Soria, 2020, pp. 298-305. Conference paper, Published paper (Refereed)
Abstract [en]

We present an ASR-based pipeline for Amharic that orchestrates NLP components within a cross-media analysis framework (CMAF). One of the major challenges inherently associated with CMAFs is effectively addressing multi-lingual issues. As a result, many languages remain under-resourced and fail to benefit from available media analysis solutions. Although Amharic is spoken natively by over 22 million people and there is an ever-increasing amount of Amharic multimedia content on the Web, querying that content with simple text search is difficult. Searching for audio/video content in particular is even harder, as it exists in raw form. In this study, we introduce a spoken and textual content processing workflow into a CMAF for Amharic. We design an ASR-named entity recognition (NER) pipeline that includes three main components: ASR, a transliterator, and NER. We explore various acoustic modeling techniques and develop an OpenNLP-based NER extractor along with a transliterator that interfaces between ASR and NER. The designed ASR-NER pipeline for Amharic promotes the multi-lingual support of CMAFs. The state-of-the-art design principles and techniques employed in this study can also inform work on other less-resourced languages, particularly Semitic ones.

Keywords
Speech recognition, named entity recognition, less-resourced languages, Amharic, cross-media analysis
National subject category
Computer Sciences
Research subject
computational linguistics
Identifiers
urn:nbn:se:umu:diva-170765 (URN) 979-10-95546-35-1 (ISBN)
Conference
Language Resources and Evaluation Conference (LREC 2020), Marseille, France, May 11–16, 2020
Available from: 2020-05-15 Created: 2020-05-15 Last updated: 2024-02-01 Bibliographically reviewed
Dahlgren Lindström, A., Björklund, J., Bensch, S. & Drewes, F. (2020). Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case. In: Proceedings of the 28th International Conference on Computational Linguistics (COLING). Paper presented at COLING 2020, the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), December 8-13, 2020 (pp. 730-744).
2020 (English) In: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, pp. 730-744. Conference paper, Published paper (Refereed)
Abstract [en]

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an increase of up to 12% in accuracy on visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggests that the text and image dimensions represented in the former do complement each other.

National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:umu:diva-179880 (URN) 10.18653/v1/2020.coling-main.64 (DOI)
Conference
COLING 2020, the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), December 8-13, 2020
Available from: 2021-02-11 Created: 2021-02-11 Last updated: 2021-04-20 Bibliographically reviewed
Björklund, H., Björklund, J., Dahlgren, A. & Demeke, Y. (2016). Implementing a speech-to-text pipeline on the MICO platform. Umeå University
2016 (English) Report (Other academic)
Abstract [en]

MICO is an open-source platform for cross-media analysis, querying, and recommendation. It is the major outcome of the European research project Media in Context, and has been contributed to by academic and industrial partners from Germany, Austria, Sweden, Italy, and the UK. A central idea is to group sets of related media objects into multimodal content items, and to process and store these as logical units. The platform is designed to be easy to extend and adapt, and this makes it a useful building block for a diverse set of multimedia applications. To promote the platform and demonstrate its potential, we describe our work on a Kaldi-based speech-recognition pipeline.

Place, publisher, year, edition, pages
Umeå University, 2016
Series
Report / UMINF, ISSN 0348-0542 ; 16.07
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-220303 (URN)
Available from: 2024-02-01 Created: 2024-02-01 Last updated: 2024-02-01 Bibliographically reviewed
Identifiers
ORCID iD: orcid.org/0000-0002-1112-2981
