Inputmix: a strategy to regularize and balance multi-modality and multi-view model learning
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics. RISE Research Institutes of Sweden, Sweden. ORCID iD: 0000-0002-0562-2082
2024 (English)
In: ICASSP 2024 - 2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2024, p. 5455-5459
Conference paper, Published paper (Refereed)
Abstract [en]

Real-world perception tasks often involve multiple modalities or views of input. While joint training of multiple modality classification models has been explored previously, it has not consistently outperformed the best single modality model. This paper aims to address one of the reasons for this: the difficulty in balancing the contributions of each input in the end-to-end training of multi-input models. Additionally, the increased capacity of multi-input networks can lead to overfitting. To solve these issues, we propose InputMix, a simple yet effective method for optimally mixing different inputs. Our method mixes a certain proportion p of input pairs to relieve the increased capacity problems and assigns a weighting factor λ for each input to generate a mixed target, allowing us to specify the contributions of each input. Experimental results on three multi-input classification tasks demonstrate that our method significantly improves the generalization performance of multi-input neural networks.
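
The mixing step described in the abstract can be illustrated with a minimal sketch for the two-input case. This is one plausible reading of the abstract, not the authors' implementation (see the repository linked in the Note for that). The function name `inputmix_batch`, the choice to swap only the second input, and the partner selection by random permutation are all assumptions made for illustration. Here `p` is the proportion of input pairs that get mixed and `lam` is the weighting factor λ assigned to the first input when forming the mixed target.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot target vector as a list of floats."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def inputmix_batch(x1, x2, labels, num_classes, p=0.5, lam=0.5, seed=None):
    """Hypothetical two-input mixing step (an assumption, not the paper's code).

    With probability p, sample i keeps its first input x1[i] but receives the
    second input of a randomly chosen partner j. Its target then becomes
    lam * onehot(y_i) + (1 - lam) * onehot(y_j), so lam sets how much the
    first input contributes to the mixed target.
    """
    rng = random.Random(seed)
    n = len(labels)
    partners = list(range(n))
    rng.shuffle(partners)  # partner index for every sample

    mixed_x2, targets = [], []
    for i in range(n):
        if rng.random() < p:
            j = partners[i]
            mixed_x2.append(x2[j])
            t_i = one_hot(labels[i], num_classes)
            t_j = one_hot(labels[j], num_classes)
            targets.append([lam * a + (1.0 - lam) * b for a, b in zip(t_i, t_j)])
        else:
            # Unmixed pair: original inputs and an ordinary one-hot target.
            mixed_x2.append(x2[i])
            targets.append(one_hot(labels[i], num_classes))
    return list(x1), mixed_x2, targets


if __name__ == "__main__":
    # Toy stand-ins for per-sample inputs of two modalities.
    images = ["img0", "img1", "img2", "img3"]
    audio = ["aud0", "aud1", "aud2", "aud3"]
    y = [0, 1, 2, 1]
    _, mixed_audio, soft_targets = inputmix_batch(
        images, audio, y, num_classes=3, p=0.5, lam=0.7, seed=0
    )
    for a, t in zip(mixed_audio, soft_targets):
        print(a, t)
```

In this sketch, setting `p=0` recovers ordinary joint training, while `lam` closer to 1 makes the target follow the first input more strongly. The soft targets would then be used with a loss that accepts probability vectors, such as cross-entropy against soft labels.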

Place, publisher, year, edition, pages
IEEE, 2024. p. 5455-5459
Series
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ISSN 1520-6149, E-ISSN 2379-190X
Keywords [en]
multi-modality learning, multi-view learning
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:umu:diva-226501
DOI: 10.1109/ICASSP48485.2024.10446664
ISI: 001285850005134
Scopus ID: 2-s2.0-85195364390
ISBN: 9798350344851 (electronic)
ISBN: 9798350344868 (print)
OAI: oai:DiVA.org:umu-226501
DiVA, id: diva2:1878293
Conference
49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024, Seoul, Republic of Korea, April 14-19, 2024
Note

Code is available at https://github.com/JesseWong333/inputmix/

Available from: 2024-06-26. Created: 2024-06-26. Last updated: 2026-04-13. Bibliographically approved
In thesis
1. Cooperative perception for next-generation autonomous vehicles
2026 (English)
Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Samverkande perception för nästa generations autonoma fordon
Abstract [en]

Cooperative perception has emerged as a key paradigm for enhancing environmental understanding in multi-agent systems by fusing sensory information from multiple agents to achieve more comprehensive and accurate perception than single-agent approaches. Despite its demonstrated benefits, existing cooperative perception methods face critical limitations in practical deployments, primarily due to model heterogeneity, latency, and limited communication bandwidth.

This Ph.D. thesis addresses the gap between the theoretical promise of cooperative perception and its practical deployment by systematically investigating how to design cooperative perception systems that are robust, efficient, and scalable under realistic constraints. The main objective of this research is to develop unified frameworks that enable effective multi-agent perception.

To this end, the thesis proposes a series of novel methods targeting these challenges. First, as a foundational study, InputMix is proposed to balance the contributions of heterogeneous sensors in joint training scenarios. Second, an intermediate model-agnostic cooperative perception framework is introduced to enable modular training and seamless collaboration among agents with heterogeneous models. Third, the Latency-Robust Cooperative Perception (LRCP) framework is developed to mitigate the adverse effects of temporal misalignment among agents. Fourth, a lightweight, codebook-free feature compression framework is designed to reduce communication overhead while preserving perceptual performance. Finally, these components are integrated into a unified framework.

Extensive experiments on public benchmark datasets demonstrate that the proposed methods achieve perception performance comparable to the ideal scenario under latency constraints, while enabling effective collaboration among heterogeneous agents and substantially reducing communication bandwidth.

The main contributions of this thesis lie in establishing practical cooperative perception frameworks that collectively address multiple fundamental challenges in multi-agent perception. The findings of this research have broader implications for large-scale autonomous systems, including connected autonomous vehicles and distributed robotic platforms, where reliable cooperative perception under communication and system heterogeneity constraints is essential.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2026. p. 65
Keywords
Cooperative Perception, Autonomous Driving
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:umu:diva-251909 (URN)
978-91-6850-019-5 (ISBN)
978-91-6850-020-1 (ISBN)
Public defence
2026-05-07, NAT.D.440, 09:00 (English)
Available from: 2026-04-16. Created: 2026-04-13. Last updated: 2026-04-14. Bibliographically approved

Author/editor
Wang, Junjie; Nordström, Tomas