Andrea Attanasio is associated with the Center of Excellence on High-Performance Computing and Dipartimento di Elettronica, Informatica e Sistemistica, Universita degli Studi della Calabria, Rende, Italy.
Gianpaolo Ghiani is a professor at the Faculty of Engineering at the University of Salento (Lecce, Italy). Gianpaolo spent his PhD and post-doc years working on optimixation problems related to logistics and manufacturing planning and control (vehicle routing, plant location, facility layout, job sequencing and similar topics). In his recent years, his focus is in sequential (multi-stage) decision processes. And has also done work on vehicle routing problems, including the fast computation of point-to-point quickest paths on very large time-depending road networks. He has also published over 50 papers to international journals on these topics.
Lucio Grandinetti is a full professor on the Faulty of Engineering at the University of Calabria in Rende, Italy, where he stands as Vice Rector and Coordinator of the Ph.D. Degree Program in Management Science and Engineering. His degrees are in Electronic Engineering and Systems Science, having graduated from the University of Pisa, Italy and the University of California at Berkeley and performing post-doctoral research at the University of Southern California, Los Angeles and the University of Dundee, Scotland. He has contributed to over 110 publications on the subject of high-performance computing, been a member of the IEEE Committee on Parallel Processing, and served as the editor for an MIT Press series of books on Advanced Computational Methods and Engineering. Currently, Grandinetti is the Director of the University of Calabria’s Centre of Excellence on HPC and the co- managing director of a Supercomputing Centre jointly established by the University of Calabria and the NEC Corporation.
Emanuela Guerriero graduated in Computer Engineering from the Universita' degli Studi di Lecce in 1999. She got a PhD in Operation Research from the University of Calabria in 2003. In January 2004 she joined the Università of Lecce as Researcher of Operations Research (Dipartimento di Ingegneria dell'Innovazione).
Her main research interests concern combinatorial optimization models for planning and scheduling problems, with applications in manufacturing and logistics. She published on international journals, including Journal of Optimization Theory and Applications, International Transactions in Operational Research, European Journal of Operational Research, Parallel Computing, International Journal of Production Research, International Journal of Production Economics.
Francesca Guerriero is associated with the Center of Excellence on High-Performance Computing and Dipartimento di Elettronica, Informatica e Sistemistica, Universita degli Studi della Calabria, Rende, Italy.
|SYSTEM||GRID TYPE||ORGANIZATION||GRMS FEATURES|
|2K||On Demand Service Grid||Flat (Hierarchical in the Future)||Soft network QoS, one level hierarchical scheduler for network resources, decentralized scheduler for other resources|
|AppleS||Computational Grid||Resource model provided by the underlying Globus, Legion, or Netsolve middleware services. Centralized scheduler, predictive heuristic state estimation, online rescheduling|
|Bond||On Demand Service Grid||Flat||Hard QoS, decentralized scheduler, predictive pricing models, online rescheduling|
|Hierarchical||No QoS, hierarchical schedulers, extensible schUduling policy|
|Condor||Computational Grid||Flat||No QoS, centralized scheduler|
|Hierarchical||Hard QoS, hierarchical scheduler, non-predictive state estimation, online rescheduling|
|Globus||Grid Toolkit (for developing computational data, service Grids)||Soft QoS, lower level services for resource allocation or co-allocation including resource reservation. Higher-level tools (like Nimrod-G) perform scheduling|
|Multimedia Service Grid||Flat||Hard QoS, decentralized scheduler|
|MOL||Computational Grid||Hierarchical Cells||No QoS, decentralized scheduler|
|MSHN||Computational & Service Grids||Flat||Hard QoS, centralized scheduler, predictive heuristics for state estimation, event driven rescheduling|
Table 1 - Existing GRMS main features
A client-server architecture is utilized in which the server can: register any volunteer and send there a client program that can process any task, send a task to a client program that contacted the server, and receive the result of processing of a task from a client program. Any client computer processes at most one task at any time. The server stores a database of about 2 million volunteers, half a million of which actively participate in the processing, and keeps track of which volunteer processed which task. A task is sometimes processed more than once (by different volunteers) to verify the results using an independent volunteer.
An example of such a project is the National Science Foundation's Partnerships for Advanced Computational Infrastructure (PACI) Program which manages a number of computing resources at several US research institutions (including the Distributed Terascale Facility). There are several ways of obtaining access to PACI resources [Ref-18]. A scientist, engineer or educator who has a joint appointment with a US university or a US Federal agency can obtain access to PACI resources depending on his/her eligibility and requirements. Smaller allocations, including startup allocations, are handled by individual sites while larger applications are considered by the National Resource Allocations Committee (NRAC) twice annually. Requests should be clearly linked to the scientific goals and should be supported by providing performance and parallel-scaling information for the application(s) to be used. Currently, the PACI Program does not allocate resources other than CPU time. It is anticipated that the allocation of non-CPU resources (such as data storage or dedicated networking resources) may be required in the future. CPU resources are requested in the form of Service Units (SUs). In general, an SU is approximately equivalent to either one CPU hour or one wall clock hour on one CPU of the system of interest. The exact definition of an SU on each platform is defined in the technical documentation associated with each resource available through the listing of PACI resources [Ref-19].
Economy-based GRMSs provide a distributed mechanism for allocating resources to jobs at a superscheduler level while enforcing QoS constraints. Market-based GRMSs force users to compete so that the different levels of importance they give to their job become explicit and resources can be allocated accordingly. Furthermore, market-based schedulers provide a feedback signal that prevents the user from submitting unbounded amounts of work. Negotiation may be done on a monetary basis or with a dummy currency in order to share the benefit of the cooperation among organizations. According to the economic theory, auctions are among the most suitable means for achieving this result. (For material explaining auction design in a grid setting, please refer to the complete text, Grid Computing: The New Frontier of High Performance Computing, edited by Lucio Grandinetti.)
Grid users submit their jobs from any one of a number of entry points (EPs). Some privileged users have direct access to the resources of some administrative domain while others are managed by an ES. An ES acts as a broker, i.e. it is in charge of 63 dividing the job into a number of tasks and allocating each task to a site (in such a way QoS constraints are satisfied). For each site (an administrative domain or a part of it), a LS is responsible for determining job sequencing, local resource allocation and data transfer scheduling. In general, on receipt of a job request, the ES interrogates a number of controlled LSs to ascertain whether the task can be executed on the available resources and meet the user-specified deadline (feasibility check). If this is not the case the ES attempts to locate a LS, managed by another ES, that can meet the task requirements.
If a LS cannot be located within a preset number of search steps, the task request is either rejected or passed to a scheduler that can minimize the deadline failure. When a suitable site is located, the task request is passed from the ES to the selected LS. The choice of specific algorithms for each component defines a particular GRMS.
In practice, the above described architecture can be expanded in order to improve its flexibility and/or to implement grid economy features. In particular, the following variants are expected to be considered in the near future:
Figure 4 - Grid scheduling architecture.
As pointed out in previous sections, the literature on grid resource management is still in its infancy. As a result there are several research lines that deserve to be followed. In the remainder of this paper we report and discuss a list of the most remarkable issues.
The research activity should be focused on devising enhanced semi on-line heuristics for the multiprocessor task scheduling problem [Ref-20, 21, 22]. Unlike traditional sequential and parallel computing environments (where only "soft" QoS is guaranteed, and scheduling must be performed very quickly), in a grid environment it is worth spending a greater effort in order to obtain better schedules (i.e. schedule with a better trade-off between economic and QoS objectives). In order to improve the algorithms proposed in Caramia et al. and Festa et al., our basic idea is to allow a modification of the current schedule any time a new user request arrives. In order to implement this strategy, new tailored metaheuristics [Ref-23] should be devised in order to provide much better solutions in a short amount of time. The so called metaheuristic algorithms can play a very important role in solving hard combinatorial problems. They include multistart methods, genetic algorithms, simulated annealing, tabu search, variable neighborhood descent, variable neighborhood search, and grasp. It is worth noting that, in order to provide good results, metaheuristics need to be tailored to handle the problem at hand.
Secondly, the research activity should be based on a number of meaningful generalizations of the on- line problems tackled in Caramia et al. and Festa et al. The main features of these problems are listed below:
Thirdly, the high degree of dynamism of computational grids should be taken into account explicitly in the resource management system. This requires that variations of resource availability (e.g., processor speed and network bandwidth) and service requests (e.g., job arrival times) are suitably characterized. Then such (probabilistic) characterizations need to be taken into account in computationally efficient rescheduling algorithms.
As explained in the previous sections, economy-based GRMSs provide a distributed mechanism for allocating resources to jobs at a superscheduler level while enforcing QoS constraints. Because of the peculiar nature of grid resources and services, combinatorial auctions are the most suitable mechanism. With this regard, three main research issues deserve attention. Firstly, a grid bidding language should be devised in such a way applications' agents are able to express alternative resource needs. Secondly, efficient grid auction mechanisms need to be developed. A mechanism is defined as the specification of all possible bidding strategies available to participants, and of an outcome function that maps these strategies to an allocation of resources (who gets what?) and corresponding payments that participants need to make or receive. Finally, efficient heuristics should be devised for the winner determination problems associated to the selected auction mechanisms.
Although grid resource management is still a relatively new field, there are already numerous methods for improving the economy and proficiency of computational grids. Local and external scheduling are the two primary research channels for developing grid management blueprints. Grid signals and data received on a zero-settlement or award basis are becoming a tremendous resource for developing better computational grids. Machine and scheduler organization research are also in the works for making grid resource management more efficient. As the forecasting, scheduling, and coordination of computational resources continues to improve, systems and applications will have access to faster and more-streamlined computational grids. With the continual advancement of these computational grids, not only are the services that they perform improved, but so are the applications and systems that rely on their services.
This article is based on material found in "Operations Research Methods for Resource Management and Scheduling in a Computational Grid: a Survey," written by Andrea Attanasio, Gianpaolo Ghiani, Lucio Grandinetti, Emanuela Guerriero, and Francesca Guerriero. The chapter is included in the book Grid Computing: The New Frontier of High Performance Computing, edited by Lucio Grandinetti.
Visit the publisher's page to learn about purchasing a copy of this book: http://www.elsevier.com/books/grid-computing-the-new-frontier-of-high-performance-computing/grandinetti/978-0-444-51999-3
[Ref-1] Foster and C. Kesselman (editors), The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann Publishers, San Francisco, USA, 1999.
[Ref-2] M. Chetty and R. Buyya. Weaving computational Grids: How analogous are they with electrical Grids? Journal of Computing in Science and Engineering, 4, 61-71, 2001.
[Ref-3] K. Krauter, R. Buyya and M. Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing, Software - Practice & Experience, 32, 135-164,2002.
[Ref-4] Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 11, 115-128, 1997.
[Ref-5] S. Chapin, J. Karpovich and A. Grimshaw. The Legion Resource Management System, Proceedings of the 5th Workshop on Job Scheduling Strategies for Parallel Processing, Birmingham, UK, 1999.
[Ref-6] F. Berman and R. Wolski. The AppLeS Project: A Status Report, Proceedings of the 8th NEC Research Symposium, Berlin, Germany, 1997.
[Ref-7] D. Thain, T. Tannenbaum and M. Livny. Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience, 17, 323-356, 2005.
[Ref-8] R. Buyya, D. Abramson and J. Giddy. Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid, International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2000), Beijing, China. IEEE Computer Society Press, USA, 2000.
[Ref-9] B. Bode, D. Halstead, R. Kendall, and D. Jackson. PBS: The portable batch scheduler and the maui scheduler on linux clusters. Proceedings of the 4th Linux Showcase and Conference, Atlanta, GA, USENIX Press, Berkley, CA, 2000.
[Ref-10] GRACE (Grid Architecture for Computational Economy) project, http://www.gridbus.org/.
[Ref-11] R. Buyya. Economic-based Distributed Resource Management and Scheduling for Grid Computing, Ph.D. Thesis, Monash University, Melbourne, Australia, 2002.
[Ref-12] T. Sandholm. Distributed Rational Decision Making. In: G. Weiss (Editor), Multiagent Systems, The MIT Press, Cambridge, MS, 201-258, 1999.
[Ref-13] Korpela, E., Werthimer, D., Anderson, D., Cobb and J., Lebofsky. Seti@home – massively distributed computing for SETI. Computing in Science & Engineering, 3, 78-83, 2001 (http://setiathome.ssl.berkeley.edu).
[Ref-14] Weth, C, Kraus, U., Freuer, J., Ruder, M., Dannecker, R., Schneider, R., Konold, M. Ruder. XPulsar@home - Schools help Scientists. Proceedings of CCGrid, Brisbane, Australia, 588-594, 2001.
[Ref-15] The Intel(r) Philanthropic Peer-to-Peer Program, (http://www.intel.com/cure).
[Ref-16] The Olson Laboratory FightAIDS@Home project, (http://www.fightaidsathome.org/discovery.asp).
[Ref-17] The RSA Factoring By Web project (http://www.npac.syr.edu/factoring).
[Ref-18] PACI and DTF Resource Allocations Policies, Vl.O.a, Modified 09 March 2004, available at http://www.PACI.org/.
[Ref-20] M. Drozdowski. Scheduling multiprocessor tasks. An overview, European Journal of Operations Research 94, 215- 230, 1996.
[Ref-21] M. Caramia, S. Giordani, A. Iovanella. Grid scheduling by on-line rectangle packing, Networks 44, 106-119, 2004.
[Ref-22] P. Festa, G. Ghiani, L. Grandinetti and E. Guerriero. Semi on-line multiprocessor task scheduling, Technical Report, Center of Excellence for High Performance Computing, Universita della Calabria, 2005.
[Ref-23] F. W. Glover, G. A. Kochenberger (Eds). Handbook of Metaheuristics (International Series in Operations Research & Management Science), Springer, Berlin, Germany, 2003.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to:
Copyright Clearance Center
222 Rosewood Drive, Danvers, MA 01923
978-750-8400, fax 978-750-4744
Portions of this work are from Grid Computing: The New Frontier of High Performance Computing, edited by Lucio Grandinetti and published by Elsevier, Copyright 2005 Elsevier B.V. All rights reserved.