umu.se Publications
1 - 6 of 6
  • 1.
    Daldorff, Lars K. S.
    et al.
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Eliasson, Bengt
    Umeå University, Faculty of Science and Technology, Department of Physics.
    Parallelization of a Vlasov–Maxwell solver in four-dimensional phase space. 2009. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 35, no. 2, p. 109-115. Article in journal (Refereed).
    Abstract [en]

    We present a parallelized algorithm for solving the time-dependent Vlasov–Maxwell system of equations in four-dimensional phase space (two spatial and two velocity dimensions). One Vlasov equation is solved for each particle species, from which charge and current densities are calculated for the Maxwell equations. The parallelization is divided into two layers. In the first layer, each plasma species is given its own processor group. In the second layer, the distribution function of each species is domain decomposed across its dedicated resources. By separating the communication and calculation steps, we meet the design criteria of good speedup and a simple implementation.
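The two-layer decomposition described in the abstract can be sketched as a rank-mapping function: layer 1 assigns each plasma species its own processor group, and layer 2 domain-decomposes that species' phase-space grid within the group. This is an illustrative sketch, not the authors' code; the function and parameter names (`map_rank`, `n_species`, the block counts) are hypothetical.

```python
def map_rank(rank, n_ranks, n_species, nx_blocks, ny_blocks):
    """Map a global rank to (species group, 2D spatial subdomain)."""
    per_species = n_ranks // n_species            # ranks in each species group
    species = rank // per_species                 # layer 1: species group index
    local = rank % per_species                    # rank within the group
    assert per_species == nx_blocks * ny_blocks   # one rank per spatial block
    bx, by = local % nx_blocks, local // nx_blocks  # layer 2: spatial block
    return species, (bx, by)

# With 8 ranks, 2 species, and a 2x2 spatial block grid per species:
layout = [map_rank(r, 8, 2, 2, 2) for r in range(8)]
```

In an MPI setting, the `species` value would serve as the color for splitting the global communicator into per-species groups.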

  • 2.
    Jäger, Gerold
    et al.
    Computer Science Institute, University of Halle-Wittenberg, D-06120 Halle (Saale), Germany.
    Wagner, Clemens
    denkwerk, Vogelsanger Straße 66, D-50823 Köln, Germany.
    Efficient parallelizations of Hermite and Smith normal form algorithms. 2009. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 35, no. 6, p. 345-357. Article in journal (Refereed).
    Abstract [en]

    The Hermite and Smith normal forms are important matrix forms in linear algebra, with many applications in group theory and number theory. Because the entries of the matrix and of its corresponding transformation matrices can explode during the computation, computing the Hermite and Smith normal forms of large dense matrices is very difficult. The main obstacles are long execution times and memory requirements that may exceed the memory of a single processor. To address these problems, we develop parallelizations of Hermite and Smith normal form algorithms. These are the first parallelizations of algorithms that also compute the corresponding transformation matrices, both over the ring Z and over F[x]. We show that our parallel versions have good efficiency, i.e., doubling the number of processes nearly halves the execution time. Furthermore, they succeed in computing normal forms of large dense example matrices over the rings Q[x], F3[x], and F5[x].
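As a concrete illustration of the object being computed, here is a minimal, sequential Hermite normal form for a 2x2 nonsingular integer matrix with positive leading column, built from the extended Euclidean algorithm. The paper's parallel algorithms handle large dense matrices over Z and F[x] and return the transformation matrices as well; none of that is attempted in this sketch.

```python
def extgcd(a, b):
    """Extended Euclid: returns (g, x, y) with x*a + y*b == g == gcd(a, b)."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def hnf_2x2(a, b, c, d):
    """Row-style Hermite normal form of [[a, b], [c, d]] (nonsingular)."""
    g, x, y = extgcd(a, c)            # unimodular row combo puts gcd at (0,0)
    h11 = abs((a * d - b * c) // g)   # remaining pivot is |det| / g
    h01 = (x * b + y * d) % h11       # reduce off-diagonal entry mod the pivot
    return [[g, h01], [0, h11]]
```

For example, the rows (2, 4) and (6, 10) generate the lattice 2Z x 2Z, so the normal form is diagonal with both pivots equal to 2.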

  • 3.
    Karlsson, Lars
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kressner, Daniel
    Uschmajew, Andre
    Parallel algorithms for tensor completion in the CP format. 2016. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 57, p. 222-234. Article in journal (Refereed).
    Abstract [en]

    Low-rank tensor completion addresses the task of filling in missing entries in multidimensional data. It has proven its versatility in numerous applications, including context-aware recommender systems and multivariate function learning. To handle large-scale datasets and high-dimensional applications, the development of distributed algorithms is central. In this work, we propose novel, highly scalable algorithms based on a combination of the canonical polyadic (CP) tensor format with block coordinate descent methods. Although similar algorithms have been proposed for the matrix case, the case of higher dimensions gives rise to a number of new challenges and requires a different paradigm for data distribution. The convergence of our algorithms is analyzed, and numerical experiments illustrate their performance on distributed-memory architectures for tensors from a range of applications.
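A minimal sketch of the CP format these algorithms operate on: a rank-R third-order tensor is stored as three factor matrices A, B, C, with entries T[i, j, k] = sum over r of A[i, r] * B[j, r] * C[k, r]. Completion methods fit such factors to the observed entries only; this sketch just shows the representation, not the block coordinate descent solver.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Dense third-order tensor from CP factors of shapes (I,R), (J,R), (K,R)."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 2)) for n in (4, 5, 6))  # rank-2 factors
T = cp_reconstruct(A, B, C)
```

The appeal of the format for distributed computing is visible here: the 4x5x6 tensor is represented by only (4 + 5 + 6) * 2 factor entries.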

  • 4.
    Karlsson, Lars
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures. 2011. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 37, no. 12, p. 771-782. Article in journal (Refereed).
    Abstract [en]

    We consider the parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right, in either a blocked or an unblocked fashion. However, the standard blocked variant performs 20% of its computations as matrix–vector multiplications, which limits performance. We show that a two-stage approach, with an intermediate reduction to block Hessenberg form, speeds up the reduction by avoiding matrix–vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2.
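For context, the classical one-stage baseline that the two-stage approach improves on is the unblocked Householder reduction: each step zeroes one column below the subdiagonal via an orthogonal similarity. A minimal sequential sketch (not the paper's implementation, which is blocked, two-stage, and dynamically scheduled):

```python
import numpy as np

def hessenberg(A):
    """Return H = Q^T A Q with H[i, j] == 0 for i > j + 1 (one-stage, unblocked)."""
    H = A.astype(float).copy()
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k + 1:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])  # Householder direction
        nv = np.linalg.norm(v)
        if nv == 0:
            continue                                   # column already reduced
        v /= nv
        # Similarity update with P = I - 2 v v^T applied from both sides:
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H

A = np.arange(16.0).reshape(4, 4)
H = hessenberg(A)
```

Because the update is an orthogonal similarity, the trace and Frobenius norm of the matrix are preserved while everything below the first subdiagonal is annihilated.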

  • 5.
    Karlsson, Lars
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Kågström, Bo
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Wadbro, Eddie
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms. 2014. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, no. 7, p. 271-288. Article in journal (Refereed).
    Abstract [en]

    The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.
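To illustrate the flavor of the load-balancing problem mentioned above, not the paper's actual heuristics, here is a generic sketch: contiguous block columns with uneven costs are assigned to a few cores, and a simple local search shifts the partition boundaries while doing so lowers the maximum per-core load. All names and the cost model are hypothetical.

```python
def max_load(costs, cuts):
    """Maximum summed cost over the contiguous pieces defined by cut points."""
    bounds = [0] + cuts + [len(costs)]
    return max(sum(costs[a:b]) for a, b in zip(bounds, bounds[1:]))

def local_search(costs, p):
    """Start from an even split into p pieces; shift cut points while it helps."""
    n = len(costs)
    cuts = [n * i // p for i in range(1, p)]
    best = max_load(costs, cuts)
    improved = True
    while improved:
        improved = False
        for i in range(len(cuts)):
            for delta in (-1, 1):
                trial = cuts.copy()
                trial[i] += delta
                lo = 1 if i == 0 else trial[i - 1] + 1      # keep cuts ordered
                hi = n - 1 if i == len(cuts) - 1 else trial[i + 1] - 1
                if not (lo <= trial[i] <= hi):
                    continue
                m = max_load(costs, trial)
                if m < best:
                    best, cuts, improved = m, trial, True
    return cuts, best
```

For costs [1, 1, 1, 1, 1, 9] and two cores, the even split gives a maximum load of 11; the local search shifts the boundary to reach the optimum of 9.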

  • 6.
    Schwarz, Angelika Beatrix
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Karlsson, Lars
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Scalable eigenvector computation for the non-symmetric eigenvalue problem. 2019. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 85, p. 131-140. Article in journal (Refereed).
    Abstract [en]

    We present two task-centric algorithms for computing selected eigenvectors of a non-symmetric matrix reduced to real Schur form. Our approach eliminates the sequential phases present in the current LAPACK/ScaLAPACK implementation. We demonstrate the scalability of our implementation on multicore, manycore and distributed memory systems.
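The underlying kernel can be sketched as follows: for a real eigenvalue T[k, k] of an upper triangular Schur factor T, the eigenvector is obtained by solving a shifted triangular system in the leading k-by-k block (roughly what LAPACK's xTREVC does for a real eigenvalue). This is the textbook computation, not the authors' task-parallel algorithm.

```python
import numpy as np

def schur_eigvec(T, k):
    """Eigenvector of upper triangular T for the eigenvalue T[k, k]."""
    lam = T[k, k]
    v = np.zeros(T.shape[0])
    v[k] = 1.0
    if k > 0:
        # Solve the shifted triangular system (T[:k, :k] - lam*I) x = -T[:k, k].
        v[:k] = np.linalg.solve(T[:k, :k] - lam * np.eye(k), -T[:k, k])
    return v

T = np.array([[4.0, 1.0, 2.0],
              [0.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])
v = schur_eigvec(T, 2)
```

The sequential bottleneck addressed in the paper arises because, done naively, such solves for many eigenvalues serialize; the task-centric formulation exposes the parallelism between and within them.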
