umu.se Publications
1 - 3 of 3
  • 1.
    Myllykoski, Mirko
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Department of Mathematical Information Technology, University of Jyväskylä.
    Rossi, Tuomo
    Department of Mathematical Information Technology, University of Jyväskylä.
    Toivanen, Jari
    Department of Mathematical Information Technology, University of Jyväskylä; Department of Aeronautics & Astronautics, Stanford University.
    On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
    2018. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 115, p. 56-66
    Article in journal (Refereed)
    Abstract [en]

    The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and Helmholtz equations discretized with bilinear finite elements. The separability of the linear system requires that the discretization domain be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups compared to an equivalent CPU implementation that uses a single CPU core. The attained floating-point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is further improved using off-line autotuning techniques.

    The full text will be freely available from 2020-06-01 15:36
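
The roofline analysis mentioned in the abstract above can be illustrated with a minimal sketch. It uses the basic roofline bound, in which attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The function names and the device figures (1000 GFLOP/s peak, 200 GB/s bandwidth) are hypothetical and are not taken from the paper.

```python
def roofline_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable FLOP/s under the basic roofline model.

    peak_flops: peak compute rate in FLOP/s
    mem_bandwidth: off-chip memory bandwidth in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops, mem_bandwidth, arithmetic_intensity):
    """True when the bandwidth term, not the compute peak, sets the bound."""
    return mem_bandwidth * arithmetic_intensity < peak_flops


if __name__ == "__main__":
    # Hypothetical device figures, chosen only for illustration.
    PEAK = 1000e9  # FLOP/s
    BANDWIDTH = 200e9  # bytes/s
    for intensity in (0.25, 1.0, 5.0, 20.0):
        bound = roofline_bound(PEAK, BANDWIDTH, intensity)
        label = "memory" if is_memory_bound(PEAK, BANDWIDTH, intensity) else "compute"
        print(f"intensity={intensity:5.2f} FLOP/byte  bound={bound / 1e9:7.1f} GFLOP/s  ({label}-bound)")
```

Kernels with low arithmetic intensity, such as many tridiagonal solves, typically fall on the bandwidth-limited slope of this bound, which is consistent with the abstract's observation about off-chip memory bandwidth.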
  • 2.
    Rodrigo, Gonzalo P.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Lab, USA.
    Gerber, Richard
    Lawrence Berkeley National Lab, USA.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    Towards understanding HPC users and systems: a NERSC case study
    2018. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, p. 206-221
    Article in journal (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. Now, HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

    In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both within a particular time period and as it evolves over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on one year of workload (2014) and its evolution through the systems' lifetime (2010–2014).

  • 3.
    Soni, V.
    Hadjadj, A.
    Roussel, Olivier
    Umeå University, Faculty of Science and Technology, Department of Physics.
    Moebs, G.
    Parallel multi-core and multi-processor methods on point-value multiresolution algorithms for hyperbolic conservation laws
    2019. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 123, p. 192-203
    Article in journal (Refereed)
    Abstract [en]

    The underlying sequential behavior of the multiresolution (MR) method has been exploited for parallel computing by introducing the concept of multiresolution forest structures (MFS) along with two new load-balancing algorithms. An easy-to-implement multithreading approach has also been introduced for multicore architectures. Tests were conducted using an Euler solver based on a fifth-order shock-capturing WENO scheme and a third-order Runge-Kutta algorithm. The methods have been rigorously analyzed in terms of speedup ratio and parallel efficiency to bring out their benefits as well as their limitations. The performance obtained with these methods indicates that the MFS is a step forward for the MR method in parallel computing, with the potential for better scalability.
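
The two metrics named in the abstract above, speedup ratio and parallel efficiency, have standard definitions that a short sketch can make concrete: speedup is the serial run time divided by the parallel run time, and efficiency is the speedup divided by the number of workers. The function names and the timings in the usage lines are hypothetical and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup ratio: serial wall time divided by parallel wall time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return speedup(t_serial, t_parallel) / n_workers


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    t1, t8 = 120.0, 20.0
    print(f"speedup on 8 workers:    {speedup(t1, t8):.2f}")
    print(f"efficiency on 8 workers: {parallel_efficiency(t1, t8, 8):.2f}")
```

An efficiency well below 1 usually points to load imbalance or communication overhead, which is the kind of limitation that load-balancing algorithms aim to reduce.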
