umu.se Publications
1 - 8 of 8
  • 1.
    Daldorff, Lars K. S.
    Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden.
    Eliasson, Bengt
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för fysik.
    Parallelization of a Vlasov–Maxwell solver in four-dimensional phase space. 2009. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 35, no. 2, pp. 109-115. Journal article (Refereed)
    Abstract [en]

    We present a parallelized algorithm for solving the time-dependent Vlasov–Maxwell system of equations in the four-dimensional phase space (two spatial and two velocity dimensions). One Vlasov equation is solved for each particle species, from which charge and current densities are calculated for the Maxwell equations. The parallelization is divided into two different layers. On the first layer, each plasma species is given its own processor group. On the second layer, the distribution function is domain decomposed on its dedicated resources. By separating the communication and calculation steps, we have met the design criteria of good speedup and simplicity in the implementation.
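The two-layer layout described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: equally sized processor groups and a one-dimensional slab decomposition of the grid, neither of which the abstract specifies. The function name `two_layer_layout` and its parameters are hypothetical.

```python
def two_layer_layout(num_ranks, species, grid_size):
    """Map each rank to (species, start, stop).

    Layer 1: ranks are split into one equally sized group per species.
    Layer 2: inside a group, the grid [0, grid_size) is cut into
    contiguous slabs, one per rank (an assumed 1-D decomposition).
    """
    group_size = num_ranks // len(species)
    if group_size == 0:
        raise ValueError("need at least one rank per species")
    layout = {}
    for group, name in enumerate(species):
        for local in range(group_size):
            rank = group * group_size + local
            # Integer arithmetic keeps the slabs contiguous and covering.
            start = local * grid_size // group_size
            stop = (local + 1) * grid_size // group_size
            layout[rank] = (name, start, stop)
    return layout


layout = two_layer_layout(4, ["electrons", "ions"], 10)
for rank in sorted(layout):
    print(rank, layout[rank])
```

With four ranks and two species, each species gets two ranks, and each rank owns half of the grid for its species.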

  • 2.
    Jäger, Gerold
    Computer Science Institute, University of Halle-Wittenberg, D-06120 Halle (Saale), Germany.
    Wagner, Clemens
    denkwerk, Vogelsanger Straße 66, D-50823 Köln, Germany.
    Efficient parallelizations of Hermite and Smith normal form algorithms. 2009. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 35, no. 6, pp. 345-357. Journal article (Refereed)
    Abstract [en]

    Hermite and Smith normal form are important forms of matrices used in linear algebra. These terms have many applications in group theory and number theory. As the entries of the matrix and of its corresponding transformation matrices can explode during the computation, it is a very difficult problem to compute the Hermite and Smith normal form of large dense matrices. The main problems of the computation are the large execution times and the memory requirements which might exceed the memory of one processor. To avoid these problems, we develop parallelizations of Hermite and Smith normal form algorithms. These are the first parallelizations of algorithms for computing the normal forms with corresponding transformation matrices, both over the rings Z and F[x]. We show that our parallel versions have good efficiency, i.e., by doubling the processes, the execution time is nearly halved. Furthermore, they succeed in computing normal forms of dense large example matrices over the rings Q[x], F3[x], and F5[x].

  • 3.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kressner, Daniel
    Uschmajew, Andre
    Parallel algorithms for tensor completion in the CP format. 2016. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 57, pp. 222-234. Journal article (Refereed)
    Abstract [en]

    Low-rank tensor completion addresses the task of filling in missing entries in multidimensional data. It has proven its versatility in numerous applications, including context aware recommender systems and multivariate function learning. To handle large-scale datasets and applications that feature high dimensions, the development of distributed algorithms is central. In this work, we propose novel, highly scalable algorithms based on a combination of the canonical polyadic (CP) tensor format with block coordinate descent methods. Although similar algorithms have been proposed for the matrix case, the case of higher dimensions gives rise to a number of new challenges and requires a different paradigm for data distribution. The convergence of our algorithms is analyzed and numerical experiments illustrate their performance on distributed-memory architectures for tensors from a range of different applications.
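For readers unfamiliar with the CP format, the sketch below shows how one tensor entry is evaluated from factor matrices and how a least-squares misfit over the observed entries could be measured. It illustrates the format only and is not the authors' distributed algorithm; the names `cp_entry` and `completion_loss` and the unregularized squared-error loss are assumptions.

```python
def cp_entry(factors, index):
    """Entry of a rank-R CP tensor: sum over r of prod_d factors[d][index[d]][r]."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for d, i in enumerate(index):
            term *= factors[d][i][r]
        total += term
    return total


def completion_loss(factors, observed):
    """Half the squared error over the observed entries only."""
    return 0.5 * sum(
        (cp_entry(factors, idx) - value) ** 2 for idx, value in observed.items()
    )


# A 2 x 2 x 1 tensor of CP rank 2, given by three factor matrices.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[2.0, 1.0]]
observed = {(0, 0, 0): 2.0, (1, 1, 0): 5.0}
print(completion_loss([A, B, C], observed))
```

A block coordinate descent method would minimize such a loss by updating one factor matrix (or block of it) at a time while the others are held fixed.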

  • 4.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures. 2011. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 37, no. 12, pp. 771-782. Journal article (Refereed)
    Abstract [en]

    We consider parallel reduction of a real matrix to Hessenberg form using orthogonal transformations. Standard Hessenberg reduction algorithms reduce the columns of the matrix from left to right in either a blocked or unblocked fashion. However, the standard blocked variant performs 20% of the computations in terms of matrix-vector multiplications. We show that a two-stage approach consisting of an intermediate reduction to block Hessenberg form speeds up the reduction by avoiding matrix-vector multiplications. We describe and evaluate a new high-performance implementation of the two-stage approach that attains significant speedups over the one-stage approach. The key components are a dynamically scheduled implementation of Stage 1 and a blocked, adaptively load-balanced implementation of Stage 2. © 2011 Elsevier B.V. All rights reserved.
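As background for this abstract, here is a minimal sketch of the textbook unblocked one-stage reduction, which sweeps the columns from left to right with Householder reflections. It is not the two-stage algorithm of the paper; the function name `hessenberg` and the plain list-of-lists storage are assumptions chosen to keep the example self-contained.

```python
import math


def hessenberg(A):
    """Return H = Q^T A Q in upper Hessenberg form (Q orthogonal, not stored)."""
    n = len(A)
    H = [row[:] for row in A]
    for k in range(n - 2):
        # Householder vector that zeroes H[k+2:, k].
        x = [H[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already reduced
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Apply the reflection from the left: rows k+1..n-1.
        for j in range(n):
            dot = sum(v[i] * H[k + 1 + i][j] for i in range(m))
            for i in range(m):
                H[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Apply the reflection from the right: columns k+1..n-1.
        for i in range(n):
            dot = sum(H[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                H[i][k + 1 + j] -= 2.0 * dot * v[j]
    return H


A = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 1.0],
     [3.0, 1.0, 1.0, 1.0]]
for row in hessenberg(A):
    print(["%8.4f" % t for t in row])
```

Every update in this variant is a vector operation against the matrix, which is the kind of memory-bound work that the blocked and two-stage approaches try to reduce.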

  • 5.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Kågström, Bo
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Wadbro, Eddie
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Högpresterande beräkningscentrum norr (HPC2N).
    Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms. 2014. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, no. 7, pp. 271-288. Journal article (Refereed)
    Abstract [en]

    The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.

    Full text (pdf): PARCO-D-12-00193.pdf
  • 6.
    Khelghatdoust, Mansour
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. The University of Sydney.
    Gramoli, Vincent
    The University of Sydney.
    A scalable and low latency probe-based scheduler for data analytics frameworks. 2021. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 103, article ID 102752. Journal article (Refereed)
    Abstract [en]

    Today's data analytics frameworks divide jobs into many parallel tasks such that each task operates on a small partition of data in order to execute jobs with low latency. Such frameworks often rely on probe-based distributed schedulers to tackle the challenge of reducing the associated overhead. Unfortunately, the existing solutions do not perform efficiently under workload fluctuations and heterogeneous job durations. This is due to a problem called Head-of-Line blocking, i.e., short tasks are enqueued at workers behind longer tasks. To overcome this problem, we propose Peacock (Khelghatdoust and Gramoli, 0000) [25], a new fully distributed probe-based scheduling method. Unlike the existing methods, Peacock introduces a novel probe rotation technique. Workers form a ring overlay network and rotate probes using elastic queues of workers. It is augmented by a novel starvation-free probe reordering algorithm executed by workers. We evaluate Peacock against two existing state-of-the-art probe-based solutions through a trace-driven simulation of up to 20,000 workers and a distributed experiment of 100 workers in Apache Spark under Google, Cloudera, and Yahoo! traces. The performance results indicate that Peacock outperforms the state-of-the-art in all cluster sizes and loads. Our distributed experiments confirm our simulation results.
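Head-of-Line blocking, as defined in this abstract, is easy to reproduce on a single worker queue. The sketch below compares first-in-first-out order with a naive shortest-first reordering. It is not Peacock's probe rotation or its reordering algorithm: plain shortest-first sorting can starve long tasks, which is exactly what the abstract says Peacock's method avoids. The helper names and the task durations are made up for illustration.

```python
def completion_times(durations):
    """Completion time of each task when one worker runs them in list order."""
    clock = 0.0
    finished = []
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def mean(values):
    return sum(values) / len(values)


# One long task arrives first, three short tasks queue up behind it.
queue = [100.0, 1.0, 1.0, 1.0]

fifo = completion_times(queue)               # short tasks wait for the long one
reordered = completion_times(sorted(queue))  # naive shortest-first order

print("FIFO mean completion time:     ", mean(fifo))
print("Reordered mean completion time:", mean(reordered))
```

The long task finishes at the same time in both orders, while the short tasks finish roughly a hundred time units earlier once they are no longer stuck behind it.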

  • 7.
    Schwarz, Angelika Beatrix
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Scalable eigenvector computation for the non-symmetric eigenvalue problem. 2019. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 85, pp. 131-140. Journal article (Refereed)
    Abstract [en]

    We present two task-centric algorithms for computing selected eigenvectors of a non-symmetric matrix reduced to real Schur form. Our approach eliminates the sequential phases present in the current LAPACK/ScaLAPACK implementation. We demonstrate the scalability of our implementation on multicore, manycore and distributed memory systems.

  • 8.
    Schwarz, Angelika Beatrix
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjelgaard Mikkelsen, Carl Christian
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Karlsson, Lars
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Robust parallel eigenvector computation for the non-symmetric eigenvalue problem. 2020. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 100, article ID 102707. Journal article (Refereed)
    Abstract [en]

    A standard approach for computing eigenvectors of a non-symmetric matrix reduced to real Schur form relies on a variant of backward substitution. Backward substitution is prone to overflow. To avoid overflow, the LAPACK eigenvector routine DTREVC3 associates every eigenvector with a scaling factor and dynamically rescales an entire eigenvector during the backward substitution such that overflow cannot occur. When many eigenvectors are computed, DTREVC3 applies backward substitution successively for every eigenvector. This corresponds to level-2 BLAS operations and constitutes a bottleneck. This paper redesigns the backward substitution such that the entire computation is cast as tile operations (level-3 BLAS). By replacing LAPACK’s scaling factor with tile-local scaling factors, our solver decouples the tiles and sustains parallel scalability even when a lot of numerical scaling is necessary.
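The idea of pairing a solution vector with a scaling factor can be sketched for a plain upper-triangular system. The code below solves U x = s b with a single global factor s, rescaling the whole vector whenever a division or update could exceed a threshold. It is a simplified illustration and neither the LAPACK routine DTREVC3 nor the tile-local scheme of the paper; the names `OMEGA` and `scaled_back_substitution`, the threshold value, and the halving strategy are all assumptions. It also assumes a nonsingular U and right-hand side entries no larger than `OMEGA` in magnitude.

```python
import math
import sys

# Assumed overflow threshold with a safety margin (hypothetical choice).
OMEGA = sys.float_info.max / 4.0


def scaled_back_substitution(U, b):
    """Return (x, s) with U x = s * b and 0 < s <= 1, avoiding overflow."""
    n = len(b)
    x = [float(v) for v in b]
    s = 1.0
    for j in range(n - 1, -1, -1):
        d = abs(U[j][j])
        # Guard the division x[j] / U[j][j].
        while abs(x[j]) > d * OMEGA:
            x = [0.5 * v for v in x]
            s *= 0.5
        x[j] /= U[j][j]
        # Guard the update x[i] -= U[i][j] * x[j] for all i < j.
        cmax = max((abs(U[i][j]) for i in range(j)), default=0.0)
        while max((abs(x[i]) for i in range(j)), default=0.0) + cmax * abs(x[j]) > OMEGA:
            x = [0.5 * v for v in x]
            s *= 0.5
        for i in range(j):
            x[i] -= U[i][j] * x[j]
    return x, s


# Plain back substitution would overflow on this system.
U = [[1e-200, 1.0], [0.0, 1e-200]]
x, s = scaled_back_substitution(U, [1.0, 1.0])
print(x, s)
```

Because every rescaling touches the entire vector, a single global factor couples all parts of the computation; the abstract's tile-local factors are aimed at removing that coupling.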
