HEAL-SWIN: a vision transformer on the sphere
Chalmers University of Technology, University of Gothenburg, Department of Mathematical Sciences, Gothenburg, Sweden.
Chalmers University of Technology, University of Gothenburg, Department of Mathematical Sciences, Gothenburg, Sweden.
Chalmers University of Technology, University of Gothenburg, Department of Mathematical Sciences, Gothenburg, Sweden.
Neural Information Processing, Science of Intelligence, Technical University Berlin, Berlin, Germany.
2024 (English). In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, 2024, p. 6067-6077.
Conference paper, Published paper (Refereed)
Abstract [en]

High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression and classification tasks.
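
The patching and windowing idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It relies only on a general property of the HEALPix nested ordering: a map at resolution nside has 12 * nside^2 pixels, and the four children of a pixel occupy four consecutive indices. Under that ordering, grouping pixels into patches or windows reduces to a reshape. The function names `patch_nested` and `window_partition` are hypothetical, and the shifted windows and attention layers are omitted.

```python
import numpy as np


def npix_for_nside(nside: int) -> int:
    """Number of pixels in a full-sphere HEALPix map at resolution nside."""
    return 12 * nside * nside


def _is_power_of_four(n: int) -> bool:
    # A power of four is a power of two whose exponent is even.
    return n >= 1 and (n & (n - 1)) == 0 and (n.bit_length() - 1) % 2 == 0


def patch_nested(features: np.ndarray, patch_size: int) -> np.ndarray:
    """Merge runs of consecutive nested-ordered pixels into patch tokens.

    features: array of shape (npix, channels) in NESTED ordering (assumed).
    Returns an array of shape (npix // patch_size, patch_size * channels).
    """
    npix, channels = features.shape
    if not _is_power_of_four(patch_size):
        raise ValueError("patch_size must be a power of 4")
    if npix % patch_size != 0:
        raise ValueError("npix must be divisible by patch_size")
    # Consecutive nested indices share a parent pixel, so a reshape suffices.
    return features.reshape(npix // patch_size, patch_size * channels)


def window_partition(tokens: np.ndarray, window_size: int) -> np.ndarray:
    """Split tokens into non-overlapping windows of consecutive tokens.

    tokens: array of shape (n_tokens, dim), still in nested ordering.
    Returns an array of shape (n_windows, window_size, dim).
    """
    n_tokens, dim = tokens.shape
    if not _is_power_of_four(window_size):
        raise ValueError("window_size must be a power of 4")
    if n_tokens % window_size != 0:
        raise ValueError("n_tokens must be divisible by window_size")
    return tokens.reshape(n_tokens // window_size, window_size, dim)


if __name__ == "__main__":
    nside, channels = 8, 3
    npix = npix_for_nside(nside)  # 12 * 8 * 8 = 768 pixels
    features = np.arange(npix * channels, dtype=np.float32).reshape(npix, channels)
    patches = patch_nested(features, patch_size=4)
    windows = window_partition(patches, window_size=16)
    print(patches.shape, windows.shape)
```

Because both steps are plain reshapes on a contiguous array, they copy no data, which is one way to read the abstract's "minimal computational overhead".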

Place, publisher, year, edition, pages
IEEE Computer Society, 2024. p. 6067-6077
Series
Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition), ISSN 1063-6919, E-ISSN 2575-7075
Keywords [en]
depth estimation, fisheye images, image classification, omni-directional images, semantic segmentation, spherical grid, transformer
National Category
Computer graphics and computer vision
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:umu:diva-239133
DOI: 10.1109/CVPR52733.2024.00580
Scopus ID: 2-s2.0-85200821799
ISBN: 9798350353006 (electronic)
OAI: oai:DiVA.org:umu-239133
DiVA, id: diva2:1961052
Conference
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, USA, June 16-22, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
German Research Foundation (DFG), 390523135
Available from: 2025-05-26 Created: 2025-05-26 Last updated: 2025-05-26 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
Publicly available code

Authority records

Ohlsson, Fredrik
