umu.se Publications
  • 1.
    Byrne, James
    et al.
    Dublin City University, Ireland.
    Svorobej, Sergej
    Dublin City University, Ireland.
    Giannoutakis, Konstantinos M.
    Centre for Research and Technology Hellas, Greece.
    Tzovaras, Dimitrios
    Centre for Research and Technology Hellas, Greece.
    Byrne, P. J.
    Dublin City University, Ireland.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
    Gourinovitch, Anna
    Dublin City University, Ireland.
    Lynn, Theo
    Dublin City University, Ireland.
    A review of cloud computing simulation platforms and related environments. 2017. In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER, 2017, Vol. 1, 679-691 p. Conference paper (Refereed)
    Abstract [en]

    Recent years have seen an increasing trend towards the development of Discrete Event Simulation (DES) platforms to support cloud computing related decision making and research. The complexity of cloud environments is increasing with scale and heterogeneity posing a challenge for the efficient management of cloud applications and data centre resources. The increasing ubiquity of social media, mobile and cloud computing combined with the Internet of Things and emerging paradigms such as Edge and Fog Computing is exacerbating this complexity. Given the scale, complexity and commercial sensitivity of hyperscale computing environments, the opportunity for experimentation is limited and requires substantial investment of resources both in terms of time and effort. DES provides a low risk technique for providing decision support for complex hyperscale computing scenarios. In recent years, there has been a significant increase in the development and extension of tools to support DES for cloud computing resulting in a wide range of tools which vary in terms of their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. This review identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed memory systems).

  • 2.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Gardfjäll, Peter
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Norberg, Arvid
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Tordsson, Johan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Designing general, composable, and middleware-independent Grid infrastructure tools for multi-tiered job management. 2007. In: Towards Next Generation Grids / [ed] T. Priol and M. Vaneschi, Springer-Verlag, 2007, 175-184 p. Conference paper (Refereed)
    Abstract [en]

    We propose a multi-tiered architecture for middleware-independent Grid job management. The architecture consists of a number of services for well-defined tasks in the job management process, offering complete user-level isolation of service capabilities, multiple layers of abstraction, control, and fault tolerance. The middleware abstraction layer comprises components for targeted job submission, job control and resource discovery. The brokered job submission layer offers a Grid view on resources, including functionality for resource brokering and submission of jobs to selected resources. The reliable job submission layer includes components for fault tolerant execution of individual jobs and groups of independent jobs, respectively. The architecture is proposed as a composable set of tools rather than a monolithic solution, allowing users to select the individual components of interest. The prototype presented is implemented using the Globus Toolkit 4, integrated with the Globus Toolkit 4 and NorduGrid/ARC middlewares and based on existing and emerging Grid standards. A performance evaluation reveals that the overhead for resource discovery, brokering, middleware-specific format conversions, job monitoring, fault tolerance, and management of individual and groups of jobs is sufficiently small to motivate the use of the framework.

  • 3.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Hernández, Francisco
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Tordsson, Johan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Designing service-based resource management tools for a healthy grid ecosystem. 2008. In: Parallel processing and applied mathematics: 7th International Conference on Parallel Processing and Applied Mathematics, Springer-Verlag, 2008, 259-270 p. Conference paper (Refereed)
    Abstract [en]

    We present an approach for development of Grid resource management tools, where we put into practice internationally established high-level views of future Grid architectures. The approach addresses fundamental Grid challenges and strives towards a future vision of the Grid where capabilities are made available as independent and dynamically assembled utilities, enabling run-time changes in the structure, behavior, and location of software. The presentation is made in terms of design heuristics, design patterns, and quality attributes, and is centered around the key concepts of co-existence, composability, adoptability, adaptability, changeability, and interoperability. The practical realization of the approach is illustrated by five case studies (recently developed Grid tools) highlighting the most distinct aspects of these key concepts for each tool. The approach contributes to a healthy Grid ecosystem that promotes a natural selection of “surviving” components through competition, innovation, evolution, and diversity. In conclusion, this environment facilitates the use and composition of components on a per-component basis.

  • 4.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Holmgren, S.
    Lindemann, J.
    Toor, S.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Empowering a flexible application portal with a SOA-based Grid job management framework. 2009. In: Applied Parallel Computing (PARA 08): State of art in scientific computing / [ed] A.C. Elster et al., Springer, 2009. Conference paper (Refereed)
  • 5.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A composable service-oriented architecture for middleware-independent and interoperable grid job management. 2010. Manuscript (preprint) (Other academic)
    Abstract [en]

    We propose a composable, loosely coupled Service-Oriented Architecture for middleware-independent Grid job management. The architecture is designed for use in federated Grid environments and aims to decouple Grid applications from Grid middlewares and other infrastructure components. The notion of an ecosystem of Grid infrastructure components is extended, and Grid job management software design is discussed in this context. Non-intrusive integration models and abstraction of Grid middleware functionality through hierarchical aggregation of autonomous Grid job management services are emphasized, and service composition techniques facilitating this process are explored. Earlier efforts in Service-Oriented Architecture design are extended upon, and implications of these are discussed throughout the paper. A proof-of-concept implementation of the proposed architecture is presented along with a technical evaluation of the performance of the prototype, and details of the architecture implementation are discussed along with trade-offs introduced by the service composition techniques used.

  • 6.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Dynamic and Transparent Service Composition Techniques for Service-Oriented Grid Architectures. 2008. In: Integrated Research in Grid Computing / [ed] S. Gorlatch and P. Fragopoulou and T. Priol, Greece: Crete University Press, 2008, 323-334 p. Conference paper (Refereed)
  • 7.
    Elmroth, Erik
    et al.
    Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Dynamic and transparent service compositions techniques for service-oriented grid architectures. 2008. In: Integrated research in Grid computing / [ed] S. Gorlatch, P Fragopoulou and T. Priol, Crete University Press, 2008, 323-334 p. Chapter in book (Refereed)
    Abstract [en]

    With the introduction of the Service-Oriented Architecture design paradigm, service composition has become a central methodology for developing Grid software. We present an approach to Grid software development consisting of architectural design patterns for service de-composition and service re-composition. The patterns presented can each be used individually, but provide synergistic effects when combined as described in a unified framework. Software design patterns are employed to provide structure in design for service-based software development. Service APIs and immutable data wrappers are used to simplify service client development and isolate service clients from details of underlying service engine architectures. The use of local call structures greatly reduces inter-service communication overhead for co-located services, and service API factories are used to make local calls transparent to service client developers. Light-weight and dynamically replaceable plug-ins provide structure for decision support and integration points. A dynamic configuration scheme provides coordination of service efforts and synchronization of service interactions in a user-centric manner. When using local calls and dynamic configuration for creating networks of cooperating services, the need for generic service monitoring solutions becomes apparent and is addressed by service monitoring interfaces. We present these techniques along with their intended use in the context of software development for service-oriented Grid architectures.

  • 8.
    Espling, Daniel
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Integration and Evaluation of Decentralized Fairshare Prioritization (Aequus). 2014. In: Proceedings of PDSEC 2014 - The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2014), 2014, 1198-1207 p. Conference paper (Refereed)
    Abstract [en]

    Fairshare is commonly one of the factors used by cluster resource management systems to prioritize jobs during scheduling. Despite the grid vision of a transparent and unified infrastructure, fairshare is normally calculated and enforced at the local cluster level rather than at a grid-wide scale. Aequus is a self-contained decentralized system for grid-wide fairshare job prioritization. Using Aequus, detailed global share policies can be combined with local cluster policies to offer a unified grid fairshare prioritization system where local administrations retain control over their clusters. This work shows how Aequus can be integrated with local resource management systems such as SLURM and Maui with minimal intrusion. Early results from production help assess the maturity of the system, and the system is further tested and evaluated for use at a nation-wide scale using workload modeling techniques. Statistical models are created based on historical national grid usage data, and synthetic traces based on these models are used to create a diverse input set used to exemplify system behavior. The system is shown to behave consistently despite great variations in job arrival patterns and partial participation of some of the collaborating installations.

  • 9.
    Espling, Daniel
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Integration and Evaluation of Decentralized Fairshare Prioritization (Aequus). Manuscript (preprint) (Other academic)
    Abstract [en]

    Fairshare is commonly one of the factors used by cluster resource management systems to prioritize jobs during scheduling. Despite the grid vision of a transparent and unified infrastructure, fairshare is normally calculated and enforced at the local cluster level rather than at a grid-wide scale. Aequus is a self-contained decentralized system for grid-wide fairshare job prioritization. Using Aequus, detailed global share policies can be combined with local cluster policies to offer a unified grid fairshare prioritization system where local administrations retain control over their clusters. This work shows how Aequus can be integrated with local resource management systems such as SLURM and Maui with minimal intrusion. Early results from production use are presented, and the system is further tested and evaluated for use at a nation-wide scale. Statistical models are created based on historical national grid usage data, and synthetic traces based on these models are used to create a diverse input set used to exemplify system behavior. The system is shown to behave consistently despite great variations in job arrival patterns and partial participation of some of the collaborating installations.

  • 10.
    Gonzalo P., Rodrigo
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, Berkeley, California, USA.
    ScSF: a scheduling simulation framework. 2017. In: Proceedings of the 21th Workshop on Job Scheduling Strategies for Parallel Processing, 2017. Conference paper (Refereed)
    Abstract [en]

    High-throughput and data-intensive applications are increasingly present, often composed as workflows, in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large tightly coupled parallel jobs over homogeneous systems. Therefore, there is an urgent need to investigate new scheduling algorithms that can manage the future workloads on HPC systems. However, there is a lack of appropriate models and frameworks to enable development, testing, and validation of new scheduling ideas. In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to be run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques to manage scientific workflows in a batch scheduler. In the use case, such a technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results were then analyzed in the framework to judge that the technique minimizes workflows’ turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.

  • 11.
    Gonzalo P., Rodrigo
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, P-O
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    Enabling workflow aware scheduling on HPC systems. Manuscript (preprint) (Other academic)
    Abstract [en]

    Workflows from diverse scientific domains are increasingly present in the workloads of current HPC systems. However, HPC scheduling systems do not incorporate workflow specific mechanisms beyond the capacity to declare dependencies between jobs. Thus, when users run workflows as sets of batch jobs with completion dependencies, the workflows experience long turnaround times. Alternatively, when they are submitted as single jobs, allocating the maximum requirement of resources for the whole runtime, they waste resources, reducing the HPC system utilization. In this paper, we present a workflow aware scheduling (WoAS) system that enables pre-existing scheduling algorithms to take advantage of the fine grained workflow resource requirements and structure, without any modification to the original algorithms. The current implementation of WoAS is integrated in Slurm, a widely used HPC batch scheduler. We evaluate the system in simulation using real and synthetic workflows and a synthetic baseline workload that captures the job patterns observed over three years of the real workload data of Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Finally, our results show that WoAS effectively reduces workflow turnaround time and improves system utilization without a significant impact on the slowdown of traditional jobs.

  • 12.
    Jayawardena, Mahen
    et al.
    Nettelblad, Carl
    Toor, Salman Zubair
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Holmgren, Sverker
    A Grid-enabled problem solving environment for QTL analysis in R. 2010. In: 2nd International Conference on Bioinformatics and Computational Biology (BICoB) / [ed] Al-Mubaid, H, 2010, 202-209 p. Conference paper (Other academic)
  • 13.
    Krzywda, Jakub
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Tärneberg, William
    Dept. of Electrical and Information Technology, Lund University.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Kihl, Maria
    Dept. of Electrical and Information Technology, Lund University.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Telco Clouds: Modelling and Simulation. 2015. Conference paper (Refereed)
  • 14.
    Krzywda, Jakub
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A Sensor-Actuator Model for Data Center Optimization. 2015. In: 2015 INTERNATIONAL CONFERENCE ON CLOUD AND AUTONOMIC COMPUTING (ICCAC), New York: IEEE Computer Society, 2015, 192-195 p. Conference paper (Refereed)
    Abstract [en]

    Cloud data centers commonly use virtualization technologies to provision compute capacity with a level of indirection between virtual machines and physical resources. In this paper we explore the use of that level of indirection as a means for autonomic data center configuration optimization and propose a sensor-actuator model to capture optimization-relevant relationships between data center events, monitored metrics (sensor data), and management actions (actuators). The model characterizes a wide spectrum of actions to help identify the suitability of different actions in specific situations, and outlines what (and how often) data needs to be monitored to capture, classify, and respond to events that affect the performance of data center operations.

  • 15.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Priority operators for fairshare scheduling. 2015. In: Job scheduling strategies for parallel processing (JSSPP 2014), 2015, 70-89 p. Conference paper (Refereed)
    Abstract [en]

    Collaborative resource sharing in distributed computing requires scalable mechanisms for allocation and control of user quotas. Decentralized fairshare prioritization is a technique for enforcement of user quotas that can be realized without centralized control. The technique is based on influencing the job scheduling order of local resource management systems using an algorithm that establishes a semantic for prioritization of jobs based on the individual distances between users' quota allocations and users' historical resource usage (i.e. intended and current system state). This work addresses the design and evaluation of priority operators, mathematical functions to quantify fairshare distances, and identifies a set of desirable characteristics for fairshare priority operators. In addition, this work also proposes a set of operators for fairshare prioritization, establishes a methodology for verification and evaluation of operator characteristics, and evaluates the proposed operator set based on this mathematical framework. Limitations in the numerical representation of scheduling factor values are identified as a key challenge in priority operator formulation, and it is demonstrated that the contributed priority operators (the Sigmoid operator family) behave robustly even in the presence of severe resolution limitations.

  • 16.
    Rodrigo, Gonzalo P
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Lab.
    Gerber, Richard
    Lawrence Berkeley National Lab.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab.
    HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. 2015. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), ACM Digital Library, 2015, 57-60 p. Conference paper (Refereed)
    Abstract [en]

    High performance computing centers have traditionally served monolithic MPI applications. However, in recent years, many of the large scientific computations have included high throughput and data-intensive jobs. HPC systems have mostly used batch queue schedulers to schedule these workloads on appropriate resources. There is a need to understand future scheduling scenarios that can support the diverse scientific workloads in HPC centers. In this paper, we analyze the workloads on two systems (Hopper and Carver) at the National Energy Research Scientific Computing (NERSC) Center. Specifically, we present a trend analysis towards understanding the evolution of the workload over the lifetime of the two systems.

  • 17.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Lab, USA.
    Gerber, Richard
    Lawrence Berkeley National Lab, USA.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    Towards understanding HPC users and systems: a NERSC case study. 2018. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, 206-221 p. Article in journal (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand the current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

    In this paper, we present a methodology to characterize workloads and assess their heterogeneity, at a particular time period and its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance that includes detailed information of a year of workload (2014) and evolution through the systems' lifetime (2010–2014).

  • 18.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Gerber, Richard
    Ramakrishnan, Lavanya
    Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. 2016. In: 2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016, 521-526 p. Conference paper (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape is changing. Increasingly, there are large scientific computations that include high-throughput, data-intensive, and stream-processing compute models. These jobs increase the workload heterogeneity, which presents challenges for classical tightly coupled MPI job oriented HPC schedulers. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload, and its possible effect on the performance of current systems. In this paper, we present a methodology to assess the job heterogeneity in workloads and scheduling queues. We apply the method on the workloads of three current National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of such analysis, with an observation that heterogeneity might reduce predictability in the jobs' wait time.

  • 19.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, P-O
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    A2L2: an application aware flexible HPC scheduling model for low-latency allocation. 2015. In: VTDC '15: proceedings of the 8th International workshop on virtualization technologies in distributed computing, ACM Digital Library, 2015, 11-19 p. Conference paper (Refereed)
    Abstract [en]

    High-performance computing (HPC) is focused on providing large-scale compute capacity to scientific applications. HPC schedulers tend to be optimized for large parallel batch jobs and, as such, often overlook the requirements of other scientific applications. In this work, we propose a cloud-inspired HPC scheduling model that aims to capture application performance and requirement models (Application Aware - A2) and dynamically resize malleable application resource allocations to be able to support applications with critical performance or deadline requirements (Low Latency allocation - L2). The proposed model incorporates measures to improve the performance of data-intensive applications on HPC systems and is derived from a set of cloud scheduling techniques that are identified as applicable in HPC environments. The model places special focus on dynamically malleable applications; data-intensive applications that support dynamic resource allocation without incurring severe performance penalties; which are proposed for fine-grained back-filling and dynamic resource allocation control without job preemption.

  • 20.
    Rodrigo, Gonzalo
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Laboratory, USA.
    Gerber, Richard
    Lawrence Berkeley National Laboratory, USA.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Laboratory, USA.
    Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. 2016. In: Proceedings of CCGrid 2016 - The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2016, 521-526 p. Conference paper (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape is changing. Increasingly, there are large scientific computations that include high-throughput, data-intensive, and stream-processing compute models. These jobs increase the workload heterogeneity, which presents challenges for classical tightly coupled MPI job oriented HPC schedulers. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload, and its possible effect on the performance of current systems. In this paper, we present a methodology to assess the job heterogeneity in workloads and scheduling queues. We apply the method to the workloads of three current National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of such analysis, with an observation that heterogeneity might reduce predictability in the jobs’ wait time.

  • 21. Tomas, L
    et al.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Caminero, B.
    Carrion, C.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    An adaptable in-advance and fairshare meta-scheduling architecture to improve grid QoS. 2011. In: Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on, 2011, 220-221 p. Conference paper (Refereed)
    Abstract [en]

    Grids are highly variable heterogeneous systems where resources may span multiple administrative domains and utilize heterogeneous schedulers, which complicates enforcement of end-user resource utilization quotas. This work focuses on enhancement of resource utilization quality of service through combination of two systems: a predictive meta-scheduling framework and a distributed fairshare job prioritization system. The first, SA-Layer, is a system designed to provide scheduling of jobs in advance by ensuring resource availability for future job executions. The second, FSGrid, provides an efficient mechanism for fairshare-based job prioritization. The integrated architecture presented in this work combines the strengths of both systems and improves perceived end-user quality of service by providing reliable resource allocations adhering to usage allocation policies.

  • 22. Tomás, L.
    et al.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Caminero, B.
    Carrión, C.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Addressing QoS in grids through a fairshare meta-scheduling in-advance architecture. 2012. Conference paper (Refereed)
    Abstract [en]

    Federated Grid resources typically span multiple administrative domains and utilize heterogeneous schedulers. This complexity complicates not only provisioning of quality of service but also management and enforcement of end-user resource utilization allocations. To overcome these problems, we propose to combine high-level meta-scheduling techniques with lower-level fairshare prioritization mechanisms to create a framework that improves end-user quality of service in heterogeneous distributed computing environments. To illustrate the approach we present a prototype architecture based on two existing systems, the meta-scheduling framework SA-Layer and the distributed fairshare prioritization system Aequus. The proposed architecture constitutes a predictive meta-scheduling architecture that performs fair user-level scheduling prioritization and enacts resource utilization quotas, whilst also providing synergetic effects that improve the performance of the individual system components. To characterize the contribution, the proposed system is evaluated on a test bed consisting of geographically dispersed, heterogeneous computing resources spanning multiple administration domains.

  • 23.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A model for simulation of application and resource behavior in heterogeneous distributed computing environments. 2012. In: Proceedings of the 2nd international conference on simulation and modeling methodologies, technologies and applications / [ed] Nuno Pina, Janusz Kacprzyk, Mohammad S. Obaidat, SciTePress, 2012, 144-151 p. Conference paper (Refereed)
  • 24.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Architectures, design methodologies, and service composition techniques for Grid job and resource management. 2009. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    The field of Grid computing has in recent years emerged and been established as an enabling technology for a range of computational eScience applications. The use of Grid technology allows researchers and industry experts to address problems too large to efficiently study using conventional computing technology, and enables new applications and collaboration models. Grid computing has today not only introduced new technologies, but also influenced new ways to utilize existing technologies.
    This work addresses technical aspects of the current methodology of Grid computing: to leverage highly functional, interconnected, and potentially under-utilized high-end systems to create virtual systems capable of processing problems too large to address using individual (supercomputing) systems. In particular, this thesis studies the job and resource management problem inherent to Grid environments, and aims to contribute to development of more mature job and resource management systems and software development processes. A number of aspects related to Grid job and resource management are here addressed, including software architectures for Grid job management, design methodologies for Grid software development, service composition (and refactorization) techniques for Service-Oriented Grid Architectures, Grid infrastructure and application integration issues, and middleware-independent and transparent techniques to leverage Grid resource capabilities.
    The software development model used in this work has been derived from the notion of an ecosystem of Grid components. In this model, a virtual ecosystem is defined by the set of available Grid infrastructure and application components, and ecosystem niches are defined by areas of component functionality. In the Grid ecosystem, applications are constructed through selection and composition of components, and individual components are subject to evolution through meritocratic natural selection.
    Central to the idea of the Grid ecosystem is that mechanisms that promote traits beneficial to survival in the ecosystem, e.g., scalability, integrability, robustness, also influence Grid application and infrastructure adaptability and longevity. As Grid computing has evolved into a highly interdisciplinary field, current Grid applications are very diverse and utilize computational methodologies from a number of fields. Due to this, and the scale of the problems studied, Grid applications typically place great performance requirements on Grid infrastructures, making Grid infrastructure design and integration challenging tasks. In this work, a model of building on, and abstracting, Grid middlewares has been developed and is outlined in the papers. In addition to the contributions of this thesis, a number of software artefacts, e.g., the Grid Job Management Framework (GJMF), have resulted from this work.

  • 25.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Virtual infrastructures for computational science: software and architectures for distributed job and resource management. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    In computational science, the scale of problems addressed and the resolution of solutions achieved are often limited by the available computational capacity. The current methodology of scaling computational capacity to large scale (i.e. larger than individual resource site capacity) includes aggregation and federation of distributed resource systems. Regardless of how this aggregation manifests, scaling of scientific computational problems typically involves (re)formulation of computational structures and problems to exploit problem and resource parallelism. Efficient parallelization and scaling of scientific computations to large scale is difficult and further complicated by a number of factors introduced by resource aggregation, e.g., resource heterogeneity and coupling of computational methodology. Scaling complexity severely impacts computation enactment and necessitates the use of mechanisms that provide higher abstractions for management of computations in distributed computing environments.
    This work addresses design and construction of virtual infrastructures for scientific computation that abstract computation enactment complexity, decouple computation specification from computation enactment, and facilitate large-scale use of computational resource systems. In particular, this thesis discusses job and resource management in distributed virtual scientific infrastructures intended for Grid and Cloud computing environments. The main area studied is Grid computing, which is approached using Service-Oriented Computing and Architecture methodology.
    Thesis contributions discuss both methodology and mechanisms for construction of virtual infrastructures, and address individual problems such as job management, application integration, scheduling job prioritization, and service-based software development.
    In addition to scientific publications, this work also makes contributions in the form of software artifacts that demonstrate the concepts discussed. The Grid Job Management Framework (GJMF) abstracts job enactment complexity and provides a range of middleware-agnostic job submission, control, and monitoring interfaces. The FSGrid framework provides a generic model for specification and delegation of resource allocations in virtual organizations, and enacts allocations based on distributed fairshare job prioritization. Mechanisms such as these decouple job and resource management from computational infrastructure systems and facilitate the construction of scalable virtual infrastructures for computational science.

  • 26.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    McCollum, Barry
    Queen's University Belfast, United Kingdom.
    Heuristics and Algorithms for Data Center Optimization. 2015. In: Proceedings of the 7th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2015), 2015, 921-927 p. Conference paper (Refereed)
  • 27.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A Performance Evaluation of the Grid Job Management Framework (GJMF). 2011. Conference paper (Refereed)
  • 28.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Decentralized Prioritization-Based Management Systems for Distributed Computing. 2013. In: 2013 IEEE 9th international conference on e-science (e-science), IEEE Computer Society, 2013, 228-237 p. Conference paper (Refereed)
    Abstract [en]

    Fairshare scheduling is an established technique to provide user-level differentiation in management of capacity consumption in high-performance and grid computing scheduler systems. In this paper we extend a state-of-the-art approach to decentralized grid fairshare and propose a generalized model for construction of decentralized prioritization-based management systems. The approach is based on (re)formulation of control problems as prioritization problems, and a proposed framework for computationally efficient decentralized priority calculation. The model is presented along with a discussion of application of decentralized management systems in distributed computing environments that outlines selected use cases and illustrates key trade-off behaviors of the proposed model.

  • 29.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    GJMF - a composable service-oriented Grid job management framework. 2013. In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 1, 144-157 p. Article in journal (Refereed)
    Abstract [en]

    We investigate best practices for Grid software design and development, and propose a composable, loosely coupled Service-Oriented Architecture for Grid job management. The architecture focuses on providing a transparent Grid access model for concurrent use of multiple Grid middlewares and aims to decouple Grid applications from Grid middlewares and infrastructure. The notion of an ecosystem of Grid infrastructure components is extended, and Grid job management software design is discussed in this context. Non-intrusive integration models and abstraction of Grid middleware functionality through hierarchical aggregation of autonomous Grid job management services are emphasized, and service composition techniques facilitating this process are explored. A proof-of-concept implementation of the architecture is presented along with a discussion of architecture implementation details and trade-offs introduced by the service composition techniques used.

  • 30.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Impact of service overhead on service-oriented Grid architectures. 2011. Conference paper (Refereed)
    Abstract [en]

    Grid computing applications and infrastructures build heavily on Service-Oriented Computing development methodology and are often realized as Service-Oriented Architectures. Current Service-Oriented Architecture methodology renders service components as Web Services, and suffers performance limitations from Web Service overhead. The Grid Job Management Framework (GJMF) is a flexible Grid infrastructure and application support component realized as a loosely coupled network of Web Services that offers a range of abstractive and platform independent interfaces for middleware-agnostic Grid job submission, monitoring, and control. In this paper we present a performance evaluation that aims to characterize the impact of service overhead on Grid Service-Oriented Architectures and evaluate the efficiency of the GJMF architecture and optimization mechanisms designed to mediate impact of Web Service overhead on architecture performance.

  • 31.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Increasing flexibility and abstracting complexity in service-based Grid and cloud software. 2011. In: Proceedings of CLOSER 2011 - International Conference on Cloud Computing and Services Science / [ed] F. Leymann, I. Ivanov, M. van Sinderen and B. Shishkov, SciTePress, 2011, 240-249 p. Conference paper (Refereed)
  • 32.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Mediation of service overhead in service-oriented grid architectures. 2011. In: Grid Computing (GRID), 2011 12th IEEE/ACM International Conference on, 2011, 9-18 p. Conference paper (Refereed)
    Abstract [en]

    Grid computing applications and infrastructures build heavily on Service-Oriented Computing development methodology and are often realized as Service-Oriented Architectures. The Grid Job Management Framework (GJMF) is a flexible Grid infrastructure and application support tool that offers a range of abstractive and platform independent interfaces for middleware-agnostic Grid job submission, monitoring, and control. In this paper we use the GJMF as a test bed for characterization of Grid Service-Oriented Architecture overhead, and evaluate the efficiency of a set of design patterns for overhead mediation mechanisms featured in the framework.

  • 33.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Service development abstraction: A design methodology and development toolset for abstractive and flexible service-based software. 2011. In: Cloud Computing and Service Science / [ed] Ivanov, van Sinderen, and Shishkov, Springer, 2011. Chapter in book (Refereed)
  • 34.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Espling, Daniel
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Decentralized scalable fairshare scheduling. 2013. In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 1, 130-143 p. Article in journal (Refereed)
    Abstract [en]

    This work addresses Grid fairshare allocation policy enforcement and presents Aequus, a decentralized system for Grid-wide fairshare job prioritization. The main idea of fairshare scheduling is to prioritize users with regard to predefined resource allocation quotas. The presented system builds on three contributions: a flexible tree-based policy model that allows delegation of policy definition, a job prioritization algorithm based on local enforcement of distributed fairshare policies, and a decentralized architecture for non-intrusive integration with existing scheduling systems. The system supports organization of users in virtual organizations and divides usage policies into local and global policy components that are defined by resource owners and virtual organizations. The architecture realization is presented in detail along with an evaluation of the system behavior in an emulated environment. In the evaluation, convergence noise types (mechanisms counteracting policy allocation convergence) are characterized and quantified, and the system is demonstrated to meet scheduling objectives and perform scalably under realistic operating conditions.

  • 35.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Groenda, Henning
    Wesner, Stefan
    Byrne, James
    Nikolopoulos, Dimitris
    Sheridan, Craig
    Tordsson, Johan
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ali-Eldin, Ahmed
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Krzywda, Jakub
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Domaschka, Jörg
    Byrne, PJ
    Svorobej, Sergej
    McCollum, Barry
    Papazachos, Zafeiros
    Whigham, Darren
    Rüth, Stefan
    Paurevic, Dragana
    Krogmann, Klaus
    The CACTOS Vision of Context-Aware Cloud Topology Optimization and Simulation. 2014. In: Proceedings of the 6th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2014), 2014, 26-31 p. Conference paper (Refereed)
    Abstract [en]

    Recent advances in hardware development coupled with the rapid adoption and broad applicability of cloud computing have introduced widespread heterogeneity in data centers, significantly complicating the management of cloud applications and data center resources. This paper presents the CACTOS approach to cloud infrastructure automation and optimization, which addresses heterogeneity through a combination of in-depth analysis of application behavior with insights from commercial cloud providers. The aim of the approach is threefold: to model applications and data center resources, to simulate applications and resources for planning and operation, and to optimize application deployment and resource use in an autonomic manner. The approach is based on case studies from the areas of business analytics, enterprise applications, and scientific computing.

  • 36.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Hellander, Andreas
    Drawert, Brian
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Holmgren, Sverker
    Petzold, Linda
    Abstractions for scaling escience applications to distributed computing environments: a StratUm Integration Case Study in Molecular Systems Biology. 2012. In: Bioinformatics: proceedings of the international conference on bioinformatics models, methods and algorithms / [ed] Correia, C; Fred, A; Gamboa, H; Schier, J, SETUBAL: SCITEPRESS, 2012, 290-294 p. Conference paper (Other academic)
    Abstract [en]

    Management of eScience computations and resulting data in distributed computing environments is complicated and often introduces considerable overhead. In this work we address a lack of integration tools that provide the abstraction levels, performance, and usability required to facilitate migration of eScience applications to distributed computing environments. In particular, we explore an approach to raising abstraction levels based on separation of computation design and computation management, and present StratUm, a computation enactment tool for distributed computing environments. Results are illustrated in a case study of integration of software from the systems biology community with a grid computation management system.

  • 37.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Hellander, Andreas
    University of California, Santa Barbara, USA.
    Drawert, Brian
    University of California, Santa Barbara, USA.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Holmgren, Sverker
    Uppsala University.
    Petzold, Linda
    University of California, Santa Barbara, USA.
    Reducing Complexity in Management of eScience Computation. 2012. In: Proceedings of CCGrid 2012 - The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012, 845-852 p. Conference paper (Refereed)
    Abstract [en]

    In this paper we address reduction of complexity in management of scientific computations in distributed computing environments. We explore an approach based on separation of computation design (application development) and distributed execution of computations, and investigate best practices for construction of virtual infrastructures for computational science - software systems that abstract and virtualize the processes of managing scientific computations on heterogeneous distributed resource systems. As a result we present StratUm, a toolkit for management of eScience computations. To illustrate use of the toolkit, we present it in the context of a case study where we extend the capabilities of an existing kinetic Monte Carlo software framework to utilize distributed computational resources. The case study illustrates a viable design pattern for construction of virtual infrastructures for distributed scientific computing. The resulting infrastructure is evaluated using a computational experiment from molecular systems biology.

  • 38.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Lockner, Niclas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Creo: reduced complexity service development. 2014. In: Proceedings of CLOSER 2014 - 4th International Conference on Cloud Computing and Services Science / [ed] M. Helfert, F. Desprez, D. Ferguson, F. Leymann, V. Mendez Munoz, 2014, 230-241 p. Conference paper (Refereed)
    Abstract [en]

    In this work we address service-oriented software development in distributed computing environments, and investigate an approach to software development and integration based on code generation. The approach is illustrated in a toolkit for multi-language software generation built on three building blocks: a service description language, a serialization and transport protocol, and a set of code generation techniques. The approach is intended for use in the eScience domain and aims to reduce the complexity of development and integration of distributed software systems through a low-knowledge-requirements model for construction of network-accessible services. The toolkit is presented along with a discussion of use cases and a performance evaluation quantifying the performance of the toolkit against selected alternative techniques for code generation and service communication. In tests of communication overhead and response time, toolkit performance is found to be comparable to or improve upon the evaluated techniques.

  • 39.
    Östberg, Per-Olov
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Lockner, Niclas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Reducing Complexity in Service Development and Integration. 2015. In: Cloud computing and services sciences, CLOSER 2014, Springer Berlin/Heidelberg, 2015, 63-80 p. Conference paper (Refereed)
    Abstract [en]

    The continuous growth and increasing complexity of distributed systems software have produced a need for software development tools and techniques that reduce the learning requirements and complexity of building distributed systems. In this work we address reduction of complexity in service-oriented software development and present an approach and a toolkit for multi-language service development based on three building blocks: a simplified service description language, an intuitive message serialization and transport protocol, and a set of code generation techniques that provide boilerplate environments for service implementations. The toolkit is intended for use in the eScience domain and is presented along with a performance evaluation that quantifies toolkit performance against that of selected alternative toolkits and technologies for service development. Toolkit performance is found to be comparable to or improve upon the performance of evaluated technologies.
