The problem of Grid-middleware interoperability is addressed by the design and analysis of a feature-rich, standards-based framework for all-to-all cross-middleware job submission. The architecture is designed with a focus on generality and flexibility and builds on extensive use, internally and externally, of (proposed) Web and Grid services standards such as WSRF, JSDL, GLUE, and WS-Agreement. The external use provides the foundation for easy integration into specific middlewares, which is performed by the design of a small set of plugins for each middleware. Currently, plugins are provided for integration into Globus Toolkit 4 and NorduGrid/ARC. The internal use of standard formats facilitates customization of the job submission service by replacement of custom components for performing specific well-defined tasks. Most importantly, this enables the easy replacement of resource selection algorithms by algorithms that address the specific needs of a particular Grid environment and job submission scenario. By default, the service implements a decentralized brokering policy, striving to optimize the performance for the individual user by minimizing the response time for each submitted job. The algorithms in our implementation perform resource selection based on performance predictions, and provide support for advance reservations as well as co-allocation of multiple resources for coordinated use. The performance of the system is analyzed with focus on overall service throughput (up to over 250 jobs per minute) and individual job submission response time (down to under one second).
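As a rough illustration of the default brokering policy described above, the Python sketch below ranks candidate resources by a predicted response time (predicted queue wait plus predicted runtime) and selects the minimum. The Resource fields and the prediction model are illustrative assumptions, not the service's actual interfaces.

# Minimal sketch (not the paper's implementation) of response-time-based brokering:
# rank candidate resources by a predicted completion time and pick the best one.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    queued_jobs: int          # current queue length
    avg_wait_per_job: float   # queue wait contributed per queued job (s), assumed known
    runtime_estimate: float   # predicted runtime of this job on the resource (s)

def predicted_response_time(r: Resource) -> float:
    """Predicted response time = predicted queue wait + predicted runtime."""
    return r.queued_jobs * r.avg_wait_per_job + r.runtime_estimate

def select_resource(candidates: list[Resource]) -> Resource:
    """Pick the resource that minimizes the predicted response time for this job."""
    return min(candidates, key=predicted_response_time)

if __name__ == "__main__":
    pool = [
        Resource("clusterA", queued_jobs=12, avg_wait_per_job=30.0, runtime_estimate=600.0),
        Resource("clusterB", queued_jobs=2, avg_wait_per_job=45.0, runtime_estimate=800.0),
    ]
    print(select_resource(pool).name)   # clusterB: 890 s predicted vs. 960 s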
The SweGrid Accounting System (SGAS) allocates capacity in collaborative Grid environments by coordinating enforcement of Grid-wide usage limits as a means to offer usage guarantees and prevent overuse. SGAS employs a credit-based allocation model where Grid capacity is granted to projects via Grid-wide quota allowances that can be spent across the Grid resources. The resources collectively enforce these allowances in a soft, real-time manner. SGAS is built on service-oriented principles with a strong focus on interoperability and Web services standards. This article covers the SGAS design and implementation, which, besides addressing inherent Grid challenges (scale, security, heterogeneity, decentralization), emphasizes generality and flexibility to produce a customizable system with lightweight integration into different middleware and scheduling system combinations. We focus the discussion on the system design, a flexible allocation model, middleware integration experiences, scalability improvements via a distributed virtual banking system, and, finally, an extensive set of testbed experiments. The experiments evaluate the performance of SGAS in terms of response times, request throughput, overall system scalability, and its performance impact on the Globus Toolkit 4 job submission software. We conclude that, for all practical purposes, the quota enforcement overhead incurred by SGAS on job submissions is not a limiting factor for the job-handling capacity of the job submission software.
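The following Python sketch conveys the flavor of the credit-based, soft quota enforcement described above; the Bank class, its method names, and the overdraft tolerance are illustrative assumptions rather than the actual SGAS bank interface.

# Minimal sketch of credit-based, soft quota enforcement in the spirit of SGAS
# (not the actual SGAS bank interface): a project spends from a Grid-wide
# allowance, and a resource checks and charges the allowance around each job.
class Bank:
    def __init__(self):
        self.allowance = {}   # project -> remaining credits (e.g., core-hours)

    def grant(self, project: str, credits: float) -> None:
        self.allowance[project] = self.allowance.get(project, 0.0) + credits

    def authorize(self, project: str, requested: float, overdraft: float = 0.0) -> bool:
        """Soft enforcement: accept if the remaining allowance (plus a tolerated
        overdraft) covers the requested usage."""
        return self.allowance.get(project, 0.0) + overdraft >= requested

    def charge(self, project: str, used: float) -> None:
        """Deduct the actual usage after the job has run (may dip below zero
        under soft enforcement)."""
        self.allowance[project] = self.allowance.get(project, 0.0) - used

bank = Bank()
bank.grant("proj-astro", 1000.0)
if bank.authorize("proj-astro", requested=200.0, overdraft=50.0):
    bank.charge("proj-astro", used=180.0)
print(bank.allowance["proj-astro"])   # 820.0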
A parallel algorithm for reordering the eigenvalues in the real Schur form of a matrix is presented and discussed. Our novel approach adopts computational windows and delays multiple outside-window updates until each window has been completely reordered locally. By using multiple concurrent windows the parallel algorithm has a high level of concurrency, and most work is level 3 BLAS operations. The presented algorithm is also extended to the generalized real Schur form. Experimental results for ScaLAPACK-style Fortran 77 implementations on a Linux cluster confirm the efficiency and scalability of our algorithms, with parallel speedups exceeding 16 on 64 processors for large-scale problems. Even on a single processor, our implementation performs significantly better than the state-of-the-art serial implementation.
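For readers unfamiliar with Schur reordering, the small serial example below shows the effect using SciPy's ordered real Schur decomposition (which relies on LAPACK's sorted Schur routines): the selected eigenvalues end up in the leading diagonal block. It is only an illustration of the problem being solved, not the parallel windowed algorithm presented in the paper.

# Small serial illustration (not the paper's parallel algorithm): compute an
# ordered real Schur form with SciPy so that the selected eigenvalues (here,
# those inside the unit circle) appear in the leading diagonal block.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))

# 'iuc' selects eigenvalues inside the unit circle; sdim is how many were selected.
T, Z, sdim = schur(A, sort="iuc")

print("selected eigenvalues:", sdim)
print("decomposition residual:", np.linalg.norm(A - Z @ T @ Z.T))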
We analyze the convergence of quasi-Newton methods in exact and finite precision arithmetic using three different techniques. We derive an upper bound for the stagnation level and we show that any sufficiently exact quasi-Newton method will converge quadratically until stagnation. In the absence of sufficient accuracy, we are likely to retain rapid linear convergence. We confirm our analysis by computing square roots and solving bond constraint equations in the context of molecular dynamics. In particular, we apply both a symmetric variant and Forsgren's variant of the simplified Newton method. This work has implications for the implementation of quasi-Newton methods regardless of the scale of the calculation or the machine.
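A toy illustration of the behaviour analysed here is the chord (simplified Newton) iteration for the square root, with the derivative frozen at the initial iterate: the inexact Jacobian yields rapid linear rather than quadratic convergence, and the error eventually stagnates near the level set by the floating-point precision. The sketch below is purely illustrative and not taken from the paper's experiments.

# Simplified Newton (chord) method for f(x) = x**2 - a = 0 with the derivative
# frozen at the starting point x0, so every step reuses the same "Jacobian".
import math

def chord_sqrt(a: float, x0: float, steps: int = 30):
    fprime0 = 2.0 * x0            # frozen derivative (the "simplified" part)
    x, history = x0, []
    for _ in range(steps):
        x = x - (x * x - a) / fprime0
        history.append(x)
    return history

# Rapid linear convergence, then stagnation around the unit roundoff level.
for k, xk in enumerate(chord_sqrt(2.0, x0=1.5, steps=14), start=1):
    print(f"iter {k:2d}: error = {abs(xk - math.sqrt(2.0)):.3e}")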
Triangular linear systems are central to the solution of general linear systems and the computation of eigenvectors. In the absence of floating-point exceptions, substitution runs to completion and solves a system which is a small perturbation of the original system. If the matrix is well-conditioned, then the normwise relative error is small. However, there are well-conditioned systems for which substitution fails due to overflow. The robust solvers xLATRS from LAPACK extend the set of linear systems which can be solved by dynamically scaling the solution and the right-hand side to avoid overflow. These solvers are sequential and apply to systems with a single right-hand side. This paper presents algorithms which are blocked and parallel. A new task-based parallel robust solver (Kiya) is presented and compared against both DLATRS and the non-robust solvers DTRSV and DTRSM. When there are many right-hand sides, Kiya performs significantly better than the robust solver DLATRS and is not significantly slower than the non-robust solver DTRSM.
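The following sketch conveys the basic idea behind such robust substitution: solve U x = s b for an upper triangular U, reducing the scaling factor s on the fly whenever a division would overflow. It illustrates only the scaling mechanism under simplified assumptions (for instance, the column update is not guarded) and is not the xLATRS or Kiya algorithm.

# Minimal sketch of xLATRS-style robust backward substitution with dynamic scaling.
import numpy as np

OMEGA = np.finfo(float).max / 4.0   # conservative overflow threshold

def robust_upper_solve(U: np.ndarray, b: np.ndarray):
    n = U.shape[0]
    x = b.astype(float)
    scale = 1.0
    for i in range(n - 1, -1, -1):
        # If the division would overflow, rescale the whole solution/right-hand side.
        if abs(x[i]) > OMEGA * abs(U[i, i]):
            alpha = (OMEGA * abs(U[i, i])) / abs(x[i])
            x *= alpha
            scale *= alpha
        x[i] = x[i] / U[i, i]
        # Update the remaining right-hand side entries (column-oriented substitution).
        x[:i] -= U[:i, i] * x[i]
    return x, scale   # on exit, U @ x == scale * b up to rounding

U = np.triu(np.random.default_rng(1).standard_normal((4, 4))) + 4 * np.eye(4)
b = np.ones(4)
x, s = robust_upper_solve(U, b)
print(np.allclose(U @ x, s * b))   # True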
Energy management has become increasingly necessary in data centers to address all energy-related costs, including capital costs, operating expenses, and environmental impacts. Heterogeneous systems with mixed hardware architectures provide both throughput and processing efficiency for different specialized application types and thus have a potential for significant energy savings. However, the presence of multiple and different processing elements increases the complexity of resource assignment. In this paper, we propose a system for efficient resource management in heterogeneous clouds. The proposed approach maps applications' requirements to different resources, reducing power usage with minimal impact on performance. The approach combines the scheduling of custom hardware accelerators, in our case Field-Programmable Gate Arrays (FPGAs), with an optimized resource allocation technique for commodity servers. We consider an energy-aware scheduling technique that uses both the applications' performance and their deadlines to control the assignment of FPGAs to the applications that would consume the most energy. Once the scheduler has performed the mapping between a VM and an FPGA, an optimizer handles the remaining VMs in the server, using vertical scaling and CPU frequency adaptation to reduce energy consumption while maintaining the required performance. Our evaluation using interactive and data-intensive applications compares the effectiveness of the proposed solution in saving energy as well as maintaining application performance, obtaining up to a 32% improvement in the performance-energy ratio on a mix of multimedia and e-commerce applications.
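A toy sketch of this two-stage idea, under purely illustrative assumptions about the energy and runtime models, is given below: the limited FPGAs go to the applications that would consume the most energy on commodity servers, and the remaining applications run at the lowest CPU frequency that still meets their deadlines.

# Toy sketch (illustrative assumptions, not the paper's scheduler).
from dataclasses import dataclass

@dataclass
class App:
    name: str
    cpu_energy: float      # estimated energy (J) if run on a commodity server
    cpu_runtime: float     # runtime (s) at the highest CPU frequency
    deadline: float        # latest acceptable completion time (s)

def schedule(apps: list[App], num_fpgas: int, freqs=(1.0, 0.8, 0.6)):
    # 1. FPGA assignment: most CPU-energy-hungry applications first.
    by_energy = sorted(apps, key=lambda a: a.cpu_energy, reverse=True)
    on_fpga, on_cpu = by_energy[:num_fpgas], by_energy[num_fpgas:]
    plan = {a.name: "FPGA" for a in on_fpga}
    # 2. Frequency adaptation: lowest frequency that still meets the deadline
    #    (runtime is assumed to scale inversely with frequency).
    for a in on_cpu:
        f = min((f for f in freqs if a.cpu_runtime / f <= a.deadline), default=max(freqs))
        plan[a.name] = f"CPU @ {f:.1f}x"
    return plan

apps = [App("video-transcode", 900.0, 120.0, 200.0),
        App("web-store", 150.0, 30.0, 40.0),
        App("analytics", 400.0, 300.0, 600.0)]
print(schedule(apps, num_fpgas=1))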
Reinforcement learning (RL) is an effective approach to developing control policies by maximizing the agent's reward. Deep reinforcement learning uses deep neural networks (DNNs) for function approximation in RL, and has achieved tremendous success in recent years. Large DNNs often incur significant memory and computational overheads, which may impede their deployment on resource-constrained embedded systems. To deploy a trained RL agent on an embedded system, it is therefore necessary to compress the agent's policy network to improve its memory and computation efficiency. In this article, we perform model compression of the policy network of an RL agent by leveraging the relevance scores computed by layer-wise relevance propagation (LRP), a technique for Explainable AI (XAI), to rank and prune the convolutional filters in the policy network, followed by fine-tuning with policy distillation. Performance evaluation based on several Atari games indicates that our proposed approach is effective in reducing the model size and inference time of RL agents. We also consider robust RL agents trained with RADIAL-RL versus standard RL agents, and show that a robust RL agent can achieve better performance (higher average reward) after pruning than a standard RL agent for different attack strengths and pruning rates.
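The sketch below shows only the pruning step, under the assumption that an aggregated LRP relevance score per convolutional filter is already available: filters are ranked by relevance and the least relevant fraction is marked for removal. The names and shapes are illustrative, and the subsequent fine-tuning with policy distillation is not shown.

# Rank one layer's convolutional filters by aggregated LRP relevance and build a
# keep-mask for a chosen pruning rate (illustrative sketch only).
import numpy as np

def prune_mask(filter_relevance: np.ndarray, prune_rate: float) -> np.ndarray:
    """filter_relevance: one aggregated LRP score per output filter.
    Returns a boolean mask; False marks the lowest-relevance filters for removal."""
    num_filters = filter_relevance.shape[0]
    num_pruned = int(prune_rate * num_filters)
    order = np.argsort(filter_relevance)          # ascending: least relevant first
    mask = np.ones(num_filters, dtype=bool)
    mask[order[:num_pruned]] = False
    return mask

relevance = np.array([0.02, 0.31, 0.07, 0.45, 0.01, 0.12, 0.28, 0.05])
print(prune_mask(relevance, prune_rate=0.5))   # the 4 lowest-scoring filters are dropped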
In this paper, we present the StarNEig library for solving dense nonsymmetric standard and generalized eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library have support for GPU acceleration. The library currently applies to real matrices with real and complex eigenvalues and all calculations are done using real arithmetic. Support for complex matrices is planned for a future release. This paper is aimed at potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as LAPACK and ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer which should assist new users in the transition to StarNEig. We demonstrate the performance of the library with a sample of computational experiments.