CCGrid 2008 Program
8th IEEE International Symposium on Cluster Computing and the Grid

May 19-22, 2008, Ecole Normale Superieure de Lyon, Lyon, France

Monday 19 May 2008 : Tutorials

Registration in UNESCO Rooms
UNESCO Room (Ground floor)
UNESCO Room (1st floor)
08:00-09:30 Title: Pegasus Workflow Management System (Part 1)
Contact: vahi@ISI.EDU
Title: Trusted Virtualization and Grid Security (Part 1)
09:45-11:15 Title: Pegasus Workflow Management System (Part 2)
Contact: vahi@ISI.EDU
Title: Trusted Virtualization and Grid Security (Part 2)
11:30-13:00 Title: Market-Oriented Grid Computing and the Gridbus Middleware (Part 1)
Title: Ibis Tutorial (Part 1)
14:00-15:30 Title: Market-Oriented Grid Computing and the Gridbus Middleware (Part 2)
Title: Ibis Tutorial (Part 2)
15:45-17:15 Title: Simulation for Large-Scale Distributed Computing Research (Part 1)
Title: GRelC DAIS: A P2P Framework for Data Access and Integration in Grid (Part 1)
17:30-19:00 Title: Simulation for Large-Scale Distributed Computing Research (Part 2)
Title: GRelC DAIS: A P2P Framework for Data Access and Integration in Grid (Part 2)

Tuesday 20 May 2008



CCGrid 2008 opening session
Mérieux Theater : Keynote 1 : "Four Important Concepts to Consider when Using Computing Clusters and Grids", Jack Dongarra, University of Tennessee, Oak Ridge National Laboratory, USA, IEEE Medal for Excellence in scalable computing winner
Keynote chair : L. Lefèvre
In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder. We will look at four areas of research that will have an important impact on the development of software.

We will focus on the following themes: redesign of software to fit multicore architectures, automatically tuned application software, exploiting mixed precision for performance, and the importance of fault tolerance.
Jack Dongarra is University Distinguished Professor of Computer Science in the Computer Science Department at the University of Tennessee and holds the title of Distinguished Research Staff in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). He is also a Turing Fellow at Manchester University and an Adjunct Professor in the Computer Science Department at Rice University. He is the director of the Innovative Computing Laboratory at the University of Tennessee. He is also the director of the Center for Information Technology Research at the University of Tennessee, which coordinates and facilitates IT research efforts at the University.
Mérieux Theater
UNESCO Room (Ground floor)
Thesis Room (ENS)
UNESCO Room (1st floor)
Session 1
Scheduling Algorithms

Session chair : S. Matsuoka
Session 2
Peer-to-peer

Session chair : O. Rana
Lunch in Atrium
Session 3
Applications

Session chair : C. Cérin
Session 4
Security

Session chair : R. Badia
Session 5
Cluster Computing

Session chair : T. Priol
Session 6
Grid Middleware & Programming

Session chair : F. Cappello
Welcome reception and cocktail in Lyon City Hall

Wednesday 21 May 2008



Mérieux Theater : Keynote 2 : "Beyond Grid middleware: XtreemOS Vision", Christine Morin, Senior Researcher, INRIA, France
Keynote chair : H. Bal
Despite the availability of various middleware, Grid environments are still complex to manage, use and program. In this talk, we present a novel Grid operating system approach, promoted by the XtreemOS European project funded under the FP6 program. XtreemOS targets the management of large and very dynamic Grid systems: users logged in to an XtreemOS box will transparently exploit VO-managed resources through the standard POSIX interface. While much work has been done to build Grid middleware on top of existing operating systems, little has been done to extend the underlying operating systems to enable and facilitate Grid computing, for example by embedding some important basic services or functionalities directly into the operating system. In this light, XtreemOS aims to be a first European step towards the creation of a true open source operating system for Grid platforms. The XtreemOS operating system is composed of a consistent set of integrated Grid OS services. It is based on the traditional general-purpose Linux OS, extended as needed to support VOs and to provide appropriate interfaces to the Grid operating system services. In contrast to middleware approaches, XtreemOS is an operating system able to execute any kind of application, including unmodified existing applications. Both traditional scientific applications and commercial services are within the scope of XtreemOS.
Christine Morin is a senior researcher at INRIA in the INRIA PARIS project-team, contributing to the programming of large scale parallel and distributed systems. She has led research activities on single system image OS for high performance computing in clusters, resulting in the Kerrighed cluster OS, now developed in open source. She is the scientific coordinator of the XtreemOS project, a 4-year European integrated project started in June 2006. She is a co-founder of the Kerlabs start-up, created in 2006 to exploit Kerrighed technology. Her research interests are in operating systems, distributed systems, fault tolerance, cluster and grid computing.
Break, Posters and Exhibits
Mérieux Theater
UNESCO Room (Ground floor)
Thesis Room (ENS)
UNESCO Room (1st floor)
Session 7
Workflow

Session chair : R. Buyya
Session 8
Service computing

Session chair : B. Schulze
Lunch in Atrium combined with Posters Session
Session 9
Grid economy

Session chair : S. Matsuoka
Session 10
Resource Management 1

Session chair : O. Tatebe
Break combined with Posters Session
Session 11
Networking

Session chair : C. Pérez
Session 12
Resource Management 2

Session chair : W. Ziegler
Mérieux Theater : Best CCGRID2008 Paper award announcement
Mérieux Theater : Plenary Panel Session : "How useful and relevant are existing abstractions for current and emerging large scale, distributed computational infrastructures?" led by Shantenu Jha, Louisiana University, USA
Panelists :
  • Satoshi Matsuoka, Tokyo Institute of Technology, Japan
  • Franck Cappello, Université Paris Sud, France
  • Henri Bal, Vrije Universiteit, Netherlands

Thursday 22 May 2008



Mérieux Theater : Keynote 3 : "Impact of Cloud Computing on Emerging Software Systems and Solutions", Hamid Pirahesh, IBM Fellow, USA
Keynote chair : T. Priol
Information technology is going through a fundamental change, influenced primarily by (1) Flexible provisioning and scalability of Cloud Computing, (2) Rise of analytics around semi-structured and unstructured data in the context of semantically rich data objects in the main stream data processing, (3) Much increased human interaction with the web due to the use of mobile devices, particularly for more critical financial transactions and purchasing services and goods, (4) Web Scale programming community with Web 2.0, search and open software, (5) Rise of SaaS (Software As A Service).

Continuous arrival of huge amounts of data from numerous sources requires continuous discovery of information. Unstructured and semi-structured data dominates this space. Web scale solutions require new approaches to integration and information composition, such as Web 2.0 mash-ups. Variability of incoming information requires semi-structured repositories with flexible schema and the associated query language. Cloud Computing is mainly driven by commercial applications. However, high-performance scientific computing can significantly benefit from cloud computing.

There is a particular emphasis on breaking the complexity barrier of today's solutions through simplification. The lifetime cost of ownership of solutions is dominated by the human time spent in building, operating and evolving these solutions. Much increased compute power in cloud computing enables us to reduce this complexity by reducing the use of fragile and complex machine-optimized programs in favor of simpler, more stable and more scalable ones. Flexibility and much quicker provisioning of cloud computing, combined with much reduced cost per terabyte/flop, are key factors in much faster deployment of solutions.
Hamid Pirahesh, Ph.D., is an IBM Fellow and the manager of the DataBase Technology Institute (DBTI) at the IBM Almaden Research Center in San Jose, California. Pirahesh is an IBM master inventor and a member of the IBM Academy. He also has direct responsibilities in various aspects of the IBM DB2 UDB product, including architecture, design and development. He is a senior manager responsible for the exploratory database research department at the IBM Almaden Research Center.
Break and Exhibits
Mérieux Theater
UNESCO Room (Ground floor)
Thesis Room (ENS)
UNESCO Room (1st floor)
Session 13
Communication

Session chair : P. Primet
Session 14
Data Management

Session chair : P. Roe
Session 15
Fault-Tolerance

Session chair : C. Morin
Conference closing
Conference banquet : Abbaye de Collonges (one of the Paul Bocuse restaurants! It will be a great evening! Don't miss it!)
(We will walk 8 minutes from the CCGrid conference to take the boat to the restaurant, Abbaye de Collonges.
The meeting point is at 18:45 under the CCGrid banner.)

CCGrid2008 Conference Program

Session 1 : Scheduling Algorithms
Tuesday 20th May - 10:30-12:30
Mérieux Theater
Session chair : S. Matsuoka
Session 2 : Peer-to-peer
Tuesday 20th May - 10:30-12:30
UNESCO Room (Ground floor)
Session chair : O. Rana
  • Rajiv Ranjan, Mustafizur Rahman and Rajkumar Buyya. A Decentralized and Cooperative Workflow Scheduling Algorithm
  • Marek Wieczorek, Stefan Podlipnig, Radu Prodan and Thomas Fahringer. Bi-criteria Scheduling of Scientific Workflows for the Grid
  • Konstantinos Christodoulopoulos, Nikolaos Doulamis and Emmanouel Varvarigos. Joint Communication and Computation Task Scheduling in Grids
  • Christian Grimme, Joachim Lepping and Alexander Papaspyrou. Benefits of Job Exchange between Autonomous Sites in Decentralized Computational Grids
  • Sharon Shitrit, Eyal Felstaine, Niv Gilboa and Ofer Hermoni. Anonymity Scheme for Interactive P2P Services
  • Yi Wan, Takuya Asaka and Tatsuro Takahashi. A Hybrid P2P Overlay Network for Non-Strictly Hierarchically Categorized Contents
  • Nuno Cruces, Rodrigo Rodrigues and Paulo Ferreira. Pastel: Bridging the Gap Between Structured and Large-State Overlays
  • Luis Carlos Erpen De Bona, Elias P. Duarte Jr. and Keiko Veronica Ono Fonseca. HyperBone: A Scalable Overlay Network Based on a Virtual Hypercube

Session 3 : Applications
Tuesday 20th May - 14:00-16:00
Mérieux Theater
Session chair : C. Cérin
Session 4 : Security
Tuesday 20th May - 14:00-16:00
UNESCO Room (Ground floor)
Session chair : R. Badia
  • Manuel Rubio, Miguel Ángel Vega, Juan Manuel Sánchez, Antonio Gómez-Iglesias and Miguel Cárdenas-Montes. A FPGA Optimization Tool based on a Multi-Island Genetic Algorithm distributed over Grid Environments
  • Tianyi Zang, Radu Calinescu, Steve Harris, Andrew Tsui, Marta Kwiatkowska, Jeremy Gibbons, Jim Davies, Peter Maccallum and Carlos Caldas. WSRF-Based Modeling of Clinical Trial Information for Collaborative Cancer Research
  • Xiaolin Li and Manish Parashar. GridMate: A Portable Simulation Environment for Large-Scale Adaptive Scientific Applications
  • Ganeshamoorthy Kandasamy and Nalin Ranasinghe. On the Performance of Parallel Neural Network Implementations on Distributed Memory Architectures
  • Takeshi Nishikawa and Satoshi Matsuoka. Time Stamping Authority Grid
  • Richard Sinnott, David Chadwick, Thomas Doherty, David Martin, Gordon Stewart, John Watt, Anthony Stell and Linying Su. Advanced Security for Virtual Organizations: The Pros and Cons of Centralized vs Decentralized Security Models
  • Stefan Piger, Christian Grimm, Ralf Groeper and Christopher Kunz. A Comprehensive Approach to Self-Restricted Delegation of Rights in Grids
  • Tim Dörnemann, Matthew Smith and Bernd Freisleben. Composition and Execution of Secure Workflows in WSRF-Grids

Session 5 : Cluster Computing
Tuesday 20th May - 16:30-18:30
Mérieux Theater
Session chair : T. Priol
Session 6 : Grid Middleware & Programming
Tuesday 20th May - 16:30-18:30
UNESCO Room (Ground floor)
Session chair : F. Cappello
  • Amith R Mamidala, Rahul Kumar, Debraj De and Dhabaleswar K Panda. MPI Collectives on modern Multicore clusters: Performance Optimizations and Communication Characteristics
  • Karthikeyan Vaidyanathan, Ping Lai, Sundeep Narravula and Dhabaleswar Panda. Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications
  • Filip Blagojevic, Matthew Curtis-Maury, Jae-Seung Yeom, Scott Schneider and Dimitrios Nikolopoulos. Scheduling Asymmetric Parallelism on a PlayStation3 Cluster
  • Weikuan Yu and Jeffrey S Vetter. Xen-Based HPC: A Parallel IO Perspective
  • Thomas Fieseler and Wolfgang Gürich. Operation of the Core D-Grid Infrastructure
  • Nabil Abdennadher, Peter Engel, Derek Feichtinger, Dean Flanders, Placi Flury, Pascal Jermini, Sergio Maffioletti, Cesare Pautasso, Heinz Stockinger, Wibke Sudholt, Michela Thiemard, Nadya Williams and Christoph Witzig. Initializing a National Grid Infrastructure : Lessons Learned from the Swiss National Grid Association Seed Project
  • Areski Flissi, Jeremy Dubus, Nicolas Dolet and Philippe Merle. Deploying on the Grid with DeployWare
  • Enric Tejedor and Rosa M. Badia. COMP Superscalar: Bringing GRID Superscalar and GCM together

Session 7 : Workflow
Wednesday 21st May - 10:30-12:30
Mérieux Theater
Session chair : R. Buyya
Session 8 : Service computing
Wednesday 21st May - 10:30-12:30
UNESCO Room (Ground floor)
Session chair : B. Schulze
  • Tamas Kiss, Peter Kacsuk, Gabor Terstyanszky and Stephen Winter. Workflow Level Interoperation of Grid Data Resources
  • Markus Held and Wolfgang Blochinger. Collaborative BPEL Design with a Rich Internet Application
  • Adam Barker, Jon Weissman and Jano van Hemert. Orchestrating Data-centric Workflows
  • David Stirling, Ian Welsh and Peter Komisarczuk. Designing Workflows for Grid Enabled Internet Instruments
  • Wolfram Wiesemann, Ronald Hochreiter and Daniel Kuhn. A Stochastic Programming Approach for QoS-Aware Service Composition
  • Simon Caton, Matthan Caan, Silvia Olabarriaga and Omer Rana. Using Dynamic Condor-based Services for Classifying Schizophrenia in Diffusion Tensor Images
  • Leonid Glimcher and Gagan Agrawal. A Middleware for Developing and Deploying Scalable Remote Mining
  • Christoph Reich, Kris Bubendorfer and Rajkumar Buyya. An Autonomic Peer-to-Peer Architecture for Hosting Stateful Web Services

Session 9 : Grid economy
Wednesday 21st May - 14:00-16:00
Mérieux Theater
Session chair : S. Matsuoka
Session 10 : Resource Management 1
Wednesday 21st May - 14:00-16:00
UNESCO Room (Ground floor)
Session chair : O. Tatebe
  • Kurt Vanmechelen, Wim Depoorter and Jan Broeckhove. Economic Grid Resource Management for CPU Bound Applications with Hard Deadlines
  • Anthony Sulistio, Kyong Hoon Kim and Rajkumar Buyya. Managing Cancellations and No-shows of Reservations with Overbooking to Increase Resource Revenue
  • Thomas Sandholm, Kevin Lai and Scott Clearwater. Admission Control in a Computational Market
  • Cecile Germain, Julien Perez, Balazs Kégl and Charles Loomis. Grid Differentiated Services: a Reinforcement Learning Approach
  • Huan Liu. GridBatch: Cloud Computing for Large-Scale Data-intensive Batch Applications
  • Norman Bobroff, Liana Fong, Selim Kalayci, Yanbin Liu, Juan Carlos Martinez, Ivan Rodero, Seyed Masoud Sadjadi and David Villegas. Enabling Interoperability among Meta-Schedulers
  • Michael Heidt, Tim Dörnemann, Kay Dörnemann and Bernd Freisleben. Omnivore: Integration of Grid Meta-Scheduling and Peer-to-Peer Technologies
  • Yang-Suk Kee and Carl Kesselman. Grid Resource Abstraction, Virtualization, and Provisioning for Time-targeted Applications

Session 11 : Networking
Wednesday 21st May - 16:30-18:30
Mérieux Theater
Session chair : C. Pérez
Session 12 : Resource Management 2
Wednesday 21st May - 16:30-18:30
UNESCO Room (Ground floor)
Session chair : W. Ziegler
  • Hajime Fujita, Hiroya Matsuba and Yutaka Ishikawa. TCP Connection Scheduler in Single IP Address Cluster
  • Kees Verstoep, Jason Maassen, Henri Bal and John Romein. Experiences with Fine-grained Distributed Supercomputing on a 10G Testbed
  • Ping Lai, Sundeep Narravula, Karthikeyan Vaidyanathan and Dhabaleswar Panda. Advanced RDMA-based Admission Control for Modern Data-Centers
  • Kei Takahashi, Hideo Saito, Takeshi Shibata and Kenjiro Taura. A Stable Broadcast Algorithm
  • Mathias Dalheimer, Franz-Josef Pfreundt and Peter Merz. Formal Verification of a Grid Resource Allocation
  • Yulai Yuan, Yongwei Wu and Guangwen Yang. Adaptive Hybrid Model for Long Term Load Prediction in Computational Grid
  • Farrukh Nadeem, Radu Prodan and Thomas Fahringer. Characterizing, Modeling and Predicting Dynamic Resource Availability in the Grid
  • Ran Yang, Rob van der Mei, Dennis Roubos, Frank Seinstra and Ger Koole. On the Optimization of Resource Utilization in Distributed Multimedia Applications

Session 13 : Communication
Thursday 22nd May - 10:30-12:30
Mérieux Theater
Session chair : P. Primet
Session 14 : Data Management
Thursday 22nd May - 10:30-12:30
UNESCO Room (Ground floor)
Session chair : P. Roe
  • Ryousei Takano, Motohiko Matsuda, Tomohiro Kudoh, Yuetsu Kodama, Fumihiro Okazaki, Yutaka Ishikawa and Yasufumi Yoshizawa. High Performance Relay Mechanism for MPI Communication Libraries Run on Multiple Private IP Address Clusters
  • Francisco Javier García Blas, Florin Isaila, David E. Singh and Jesus Carretero. View-based collective I/O for MPI-IO
  • Camille Coti, Thomas Herault, Sylvain Peyronnet, Ala Rezmerita and Franck Cappello. Grid Services for MPI
  • Yoshikazu Kamoshida and Kenjiro Taura. Scalable Data Gathering for Real-time Monitoring Systems on Distributed Computing
  • Ali Elghirani, Riky Subrata and Albert Zomaya. A Proactive Non-Cooperative Game-theoretic Framework for Data Replication in Data Grids
  • Michal Vossberg, Andreas Hoheisel, Thomas Tolxdorff and Dagmar Krefting. A Reliable DICOM Transfer Grid Service Based on Petri Net Workflows
  • Daniel L. Wang, Charles S. Zender and Stephen Jenks. Clustered Workflow Execution of Retargeted Data Analysis Scripts
  • Lin Lin, Xuemin Li, Hong Jiang and Yifeng Zhu. AMP: An Affinity-based Metadata Prefetching Scheme in Large-Scale Distributed Storage Systems

Session 15 : Fault-Tolerance
Thursday 22nd May - 14:00-16:00
Mérieux Theater
Session chair : M. Vanneschi
Session 16 : Models and Provenance & Ontology
Thursday 22nd May - 16:30-18:30
Mérieux Theater
Session chair : M. Bubak
  • Juergen Hofer and Thomas Fahringer. Synthesizing Byzantine Fault-Tolerant Grid Application Wrapper Services
  • Fatiha Bouabache, Thomas Herault, Gilles Fedak and Franck Cappello. Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environments
  • Aurelien Bouteiller and Frederic Desprez. Fault Tolerance Management for a Hierarchical GridRPC Middleware
  • Chun-Chen Hsu, Pangfeng Liu and Chien-Min Wang. Heuristic Algorithms for Replication Transition Problem in the Grid Systems
  • Thanasis Papaioannou and George D. Stamoulis. Reputation-based Estimation of Individual Performance in Grids
  • Tristan Glatard, Johan Montagnat and Xavier Pennec. A probabilistic model to analyse workflow performance on production grids
  • Ran Yang, Rob van der Mei, Dennis Roubos, Frank Seinstra, Ger Koole and Henri Bal. Modeling "Just-in-Time" Communication in Distributed Real-Time Multimedia Applications
  • Sergio Serra, Marta Mattoso, Patricia Barros, Paulo M. Bisch and Maria Luiza Machado Campos. Provenance services for distributed workflows
  • Li Dong. A Framework for Ontology-based Data Integration

CCGrid2008 European Projects Showcase

Thursday 22nd May - UNESCO Room (Ground floor)
  • 14:00-16:00 : Session Chair : L. Lefèvre
    • 14:00-14:15 CoreGrid -- Network of Excellence
    • 14:15-14:30 EC-Gin
    • 14:30-14:45 Phosphorus
    • 14:45-15:00 BEinGRID
    • 15:00-15:15 SORMA and GridEcon
    • 15:15-15:30 RESERVOIR
    • 15:30-15:45 Grid4All
    • 15:45-16:00 SmartLM
  • 16:30 - 18:30 : Session Chair : O. Rana
    • 16:30-16:45 OGF Europe
    • 16:45-17:00 SAGA
    • 17:00-17:15 VirtualLabfMRI/HealthGrid
    • 17:15-17:30 KnowARC
    • 17:30-17:45 A-Ware
    • 17:45-18:00 Ibis
    • 18:00-18:15 gEclipse
    • 18:15-18:30 DEISA/DESHL

CCGrid2008 Workshops Program

Tuesday 20th May - UNESCO Room (1st floor)
  • 10:30-12:30 : Data and resource management in workflow systems
    • Metadata Management in the Taverna Workflow System, Khalid Belhajjame, Katy Wolstencroft, Oscar Corcho, Tom Oinn, Franck Tanoh, Alan William and Carole Goble
    • Data Management Challenges of Large-Scale, Data-Intensive Scientific Workflow, Ewa Deelman and Ann Chervenak
    • Provenance Tracking and Querying in the ViroLab Virtual Laboratory, Bartosz Balis, Marian Bubak, Michal Pelczar and Jakub Wach
    • Resource Discovery based on a Novel Distributed DNS Framework, Lican Huang
    • A Task Pipelining Framework for e-Science Workflow Management Systems, Hyeong S. Kim, In Soon Cho and Heon Y. Yeom
  • 14:00-16:00 : Execution and interactivity in workflow systems
    • A New Approach to Development and Execution of Interactive Applications on the Grid, Piotr Nowakowski, Daniel Harezlak and Marian Bubak
    • Implementation of Turing machines with the Scufl data-flow language, Tristan Glatard and Johan Montagnat.
    • A Framework for Interactive Parameter Sweep Applications, Adianto Wibisono, Zhiming Zhao, Adam Belloum and Marian Bubak
    • A Lightweight Middleware Monitor for Distributed Scientific Workflows, Fabricio Nogueira, Sergio Serra, Luiz Gadelha, Maria Claudia Cavalcanti, Maria Luiza Machado Campos and Marta Mattoso
    • Scheduling Dynamic Workflows onto Clusters of Clusters using Postponing Strategies, Sascha Hunold, Thomas Rauber and Frederic Suter
  • 16:30-18:30 : Workflow systems
    • Comparative Studies Made Simple in GPFlow, Lawrence Buckingham, James Hogan, Paul Roe, Jiro Sumitomo and Michael Towsey
    • Securing Grid Workflows with Trusted Computing, Po-Wah Yau, Allan Tomlinson, Shane Balfe and Eimear Gallery
    • Architecture of the DaltOn Data Integration System for Scientific Applications, Stefan Jablonski, Olivier Cure, M. Abdul Rehman and Bernhard Volz
    • Discussion

Tuesday 20th May - Thesis Room (ENS)
  • 14:00-16:00 :
    • DaCAP- A Distributed Anti-Cheating Peer to Peer Architecture for Massive Multiplayer On-line Role Playing Game. Huey-Ing Liu, Yun-Ting Lo
    • On the Construction of a Super-Peer Topology underneath Middleware for Distributed Computing. Peter Merz, Jan Ubben, Matthias Priebe
    • A Simple Cache Based Mechanism For Peer To Peer Resource Discovery in Grid Environment. Filali Imen, Huet Fabrice, Vergoni Christophe
    • Peer Delay Trade-offs for Video Streaming in P2P Systems. Anis Ouali, Brigitte Jaumard, Gérard Hébuterne
  • 16:30-18:30 :
    • Invited Speaker Talk : On Correlated Availability in Internet Distributed Systems. Derrick Kondo
    • RESERV: A Distributed, Load-Balanced Information System for Grid Applications. Vincze Gábor, Novák Zoltán, Pap Zoltán, Vida Rolland
    • Peer-to-Peer Desktop Grids in the Real World: the ShareGrid Project. Cosimo Anglano, Massimo Canonico, Marco Guazzone, Sergio Rabellino, Simone Arena, Guglielmo Girardi, Marco Botta
    • Invited Speaker Talk : Large Scale Execution of a Bioinformatic Application on a Volunteer Grid. Viktors Bertis, Raphael Bolze, Frédéric Desprez and Kevin Reed

Wednesday 21st May - Thesis Room (ENS)
  • 10:30-12:30
    • Fault-tolerant Policy for Optical Network Based Distributed Computing System. Zhenyu Sun, Guo Wei, Yaohui Jin, Weiqiang Sun, Weisheng Hu
    • Temporal Routing Metrics for Networks with Advance Reservations. Christoph Barz, Markus Pilz, André Wichmann
    • Deployment and Interoperability of the Phosphorus Grid Enabled GMPLS (G2MPLS) Control Plane. Eduard Escalona, Georgios Zervas, Reza Nejabati, Dimitra Simeonidou, George Markidis, Anna Tzanakaki, Gino Carrozzo, Nicola Ciulli, Bartosz Belter, Artur Binczewski
    • Data Consolidation: A Task Scheduling and Data Migration Technique for Grid Networks. Panagiotis Kokkinos, Kostas Christodoulopoulos, Aristotelis Kretsis, Emmanouel Varvarigos
  • 14:00-16:00
    • Invited Speaker: The Carriocas Project. Pascale Vicat-Blanc Primet

Thursday 22nd May - UNESCO Room (1st floor)
  • 10:30-12:30
    • Stephen L. Scott, Chokchai (Box) Leangsuksun Co-Chairs : Welcome / Introduction
    • Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin He. Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations
    • Jiaying Zhang and Peter Honeyman. Performance and Availability Tradeoffs in Replicated File Systems
    • Bradley W. Settlemyer and Walter B. Ligon III. A Technique for Lock-less Mirroring in Parallel File Systems
  • 14:00-16:00
    • J.T. Daly, L.A. Pritchett-Sheats, and S.E. Michalak. Application MTTFE vs. Platform MTTF: A Fresh Perspective on System Reliability and Application Throughput for Computations at Scale
    • William M. Jones, John T. Daly, and Nathan A. DeBardeleben. Application Resilience: Making Progress in Spite of Failure
    • Gopi Kandaswamy, Anirban Mandal, and Daniel A. Reed. Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
    • Thomas Ropars and Christine Morin. Fault Tolerance in Cluster Federations with O2P-CF
  • 16:30-18:30
    • Nichamon Naksinehaboon, Yudan Liu, Chokchai (Box) Leangsuksun, Raja Nassar, Mihaela Paun, and Stephen L. Scott. Reliability-aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments
    • Ann Gentile, Jim Brandt, Philippe Pebay, David Thompson, Matthew Wong, Bert Debusschere, and Jackson Mayo. Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems
    • Jon Stearley. Bad Words: Finding Faults in Spirit's Syslog

Thursday 22nd May - Thesis Room (ENS)
  • 10:30-12:30
    • Keynote: Rodrigo Lopez (European Bioinformatics Institute) : "Building experiments and on-line pipe-lines using Web Services. A providers perspective."
    • Marco Pagni, Joerg Hau and Heinz Stockinger. A Multi-Protocol Bioinformatics Web Service: Use SOAP, Take a REST or Go With HTML
    • Jose Luis Vazquez-Poletti, Eduardo Huedo, Ruben Santiago Montero and Ignacio Martin Llorente. CD-HIT Workflow Execution on Grids using Replication Heuristics
    • Antonella Galizia, Federica Viti, Alessandro Orro, Daniele D'Agostino, Ivan Merelli, Luciano Milanesi and Andrea Clematis. TMAinspect, an EGEE Framework for Tissue MicroArray image handling
  • 14:00-15:00
    • Keynote: JR Valverde (EMBnet/CNB) : "The quest for the Holy Grid in Bioinformatics"
    • Paolo D'Onorio De Meo, Danilo Carrabino, Mattia D'Antonio, Nico Sanna, Tiziana Castrignanò, Rosalia Maglietta, Annarita D'Addabbo, Sabino Liuni, Flavio Mignone, Graziano Pesole and Nicola Ancona. HT-RLS: High-Throughput web tool for analysis of DNA microarray data using RLS classifiers
    • Diane Lingrand, Johan Montagnat and Tristan Glatard. Modeling the latency on production grids with respect to execution context

Thursday 22nd May - Thesis Room (ENS)
  • 15:00-16:00 :
    • Invited Speaker Talk : "Modular Authorisation for Grids", David W. Chadwick, Information Systems Security, Computing Laboratory, University of Kent, Canterbury, UK
  • 16:30-18:30 :
    • Wolfgang Hommel. Using Policy-based Management for Privacy-Enhancing Data Access and Usage Control in Grid Environments
    • Massimiliano Pala, Scott Rea, Shreyas Cholia and Sean Smith. Extending PKI Interoperability in Computational Grids
    • Shreyas Cholia and R. Jefferson Porter. Publication and Protection of Sensitive Site Information in a Grid Infrastructure
    • Guido van 't Noordende, Matthijs Koot and Silvia Olabarriaga. A Trusted Storage Infrastructure for Grid-based Medical Applications
    • Hong Wang, Hiroyuki Takizawa and Hiroaki Kobayashi. A Performance Study of Secure Data Mining on the Cell Processor

Wednesday 21st May - Thesis Room (ENS)
  • A Transactional Scalable Distributed Data Store: Wikipedia on a DHT (T. Schutt et al) - Berlin, Germany
  • Market-based Provisioning of a Cloud Computing Cluster (T. Sandholm et al) - Sweden
  • Gridbus Middleware for Life Science Applications on Global Grids (R. Buyya et al) - Melbourne, Australia
  • Scalable Wall-Socket Multimedia Grid Computing (F.J. Seinstra et al) - Amsterdam, Netherlands
  • Large-Scale Bioinformatic Computing on Data Desktop Grid (H. He et al) - Paris, France
  • LiveWN: Scavenging In The Grid Era (G. Kouretis et al) - Athens, Greece

Wednesday 21st May - Poster Room - During breaks and lunch
    • Akiko Nakaniwa, Srikumar Venugopal and Raj Buyya.
      Co-scheduling of Data Replication and Job Execution in Data Grids
    • Anne-Cécile Orgerie, Laurent Lefèvre and Jean-Patrick Gelas.
      How an experimental Grid is used: The Grid5000 case and its impact on energy usage
    • Cesar Aguiar, Roberta Ulson, Daniel Cruz and Marcos Cavenaghi.
      The Application of Virtualized Multiuse Clusters for LAN Grid Network Computer Management
    • Gael Le Mahec.
      Large bio-informatics request distribution strategies over the grid.
    • Heithem Abbes, Christophe Cérin, Jean-Christophe Dubacq and Mohamed Jemni.
      Analysis of Peer-to-Peer Protocols Performance for establishing a decentralized Desktop Grid Middleware
    • John Mehnert-Spahn, Michael Schoettner, David Margery and Christine Morin.
      XtreemOS Grid Checkpointing Architecture
    • Ken Hironaka, Hideo Saito, Kei Takahashi and Kenjiro Taura.
      A Framework for Flexible Programming in Complex Grid Environments
    • Marc-Eduard Frincu, Martin Quinson and Frederic Suter.
      A Formalism for the Description of Large Scale Computing Platforms
    • Rodrigo Righi, Larcio Lima Pilla, Alexandre Carissimi and Philippe Navaux.
      Load Rebalancing in BSP Applications Using Three Metrics: Computation, Communication and Memory

Tuesday 20th May - Thesis Room (ENS)
  • "Myrinet/Ethernet 10G developments", Loic Prylli, Myricom, USA
  • "High Performance Computing", Alexandre Chauvin, IBM, France
  • "ClusterVision builds DAS-3 Grid for Dutch Universities", Christopher Huggins, ClusterVision, France
  • TBA

Wednesday 21st May - UNESCO Room (1st floor)
  • 10:30 - 12:30
    • Introduction (Chairs)
    • Opening Talk : "Scalable Systems Research: Challenges and Opportunities" by Prof. Manish Parashar, Rutgers: State University of New Jersey, USA
    • A Distributed Economic Meta-scheduler for the Grid (Kyle Chard)
    • Towards Autonomic Workflow Management in Grids (Mustafizur Rahman)
  • 14:00 - 16:00
    • Scheduling of Scientific Workflows on Data Grids (Suraj Pandey)
    • Overlapping Communication and Computation with High Level Communication Routines (Torsten Hoefler)
    • Methodologies and Tools for Exploring Transport Protocols in the Context of Highspeed Networks (Romaric Guillier)
    • Application-Level Fault-Tolerance Solutions for Grid Computing (Xoan Pardo)
  • 16:30 - 18:30
    • Performance Optimization for Multi-Agent Based Simulation in Grid Environments (Dawit Mengistu)
    • Panelists' Report on the Symposium and Open Discussion

CCGrid2008 Tutorials Program

Tutorial : Pegasus Workflow Management System
Speakers : Karan Vahi and Kent Wenger
Scientific workflows are becoming an important part of the scientific discovery process and are increasingly used to analyze large amounts of data. They capture the individual data transformation and analysis steps, as well as the mechanisms to carry them out in a distributed environment. Each step in the workflow specifies a process or computation (e.g. a web service to be invoked or a program to be executed). The steps in the workflow are linked according to the data flow and dependencies amongst them. Representing the computational analysis as a workflow allows the users to scale up computationally. It allows the users to execute the workflows on a wide variety of execution environments (from a user desktop, to a single cluster, to a large computational grid such as the Open Science Grid or the TeraGrid) depending on the size and characteristics of the analysis. At the same time, users need to be insulated as much as possible from the underlying infrastructure. Thus, users should be able to express their workflows independent of the underlying computational infrastructure. Workflow systems then can take this high level abstraction of the workflow (referred to as an abstract workflow) and generate an executable workflow that can be executed on a variety of execution environments. The executable workflow contains many details required to carry out the computational steps in the abstract workflow, including the use of specific execution and storage resources, managing the transfer of input data to the computational nodes and the transfer of output data back to the user. Workflow systems can also provide the provenance information necessary for scientific reproducibility, result publication and sharing among collaborators.

In this tutorial, we will examine the opportunities and challenges of designing and executing scientific workflows in distributed environments. We will provide an introduction to scientific workflows, their usefulness in data analysis and the challenges of running scientific workflows in a variety of execution environments. We will outline issues that need to be addressed by any workflow system in order to be able to run scientific workflows on the grid. The tutorial will also focus on issues of workflow composition - how to design workflow components that are portable across many platforms and how to define workflows at an appropriate level of abstraction.

In addition to the high-level overview of the workflow management issues, we will provide hands-on experience with the Pegasus Workflow Management System (Pegasus-WMS). The system is composed of the Pegasus Workflow Mapper and the Condor DAGMan workflow execution engine. The Pegasus Workflow Mapper takes the high-level workflow descriptions (abstract workflows) and automatically maps them to the distributed resources. Pegasus performs execution site selection, selects input data, and provides directives for data transfers and registrations. DAGMan, the Pegasus-WMS workflow executor, is developed as part of the Condor project and executes the workflows by performing dependency analysis and releasing workflow jobs to the execution environment as and when they are ready for execution. DAGMan also provides error tolerance (by optionally retrying failed nodes), recovery capability (by creating a "rescue DAG" if a DAG fails), throttling (allowing limits on the number of running jobs), and other practical measures aiding in workflow execution. As part of the tutorial, the attendees will run a workflow on a variety of execution environments: their own workstations, a cluster at ISI, and a computational grid such as the TeraGrid.
Pegasus has been in development for more than 6 years and is in production use by several scientific applications in projects such as the Southern California Earthquake Center (SCEC), Montage (an astronomy application), the Laser Interferometer Gravitational Wave Observatory (LIGO) and others. These workflows are executed on a variety of execution platforms from campus clusters to large-scale grids.
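
The execution behaviour described above (dependency analysis, release of ready jobs, retries and throttling) can be illustrated with a minimal sketch. This is not DAGMan code or its API: the function name run_dag, its parameters and the job names are hypothetical, and jobs run sequentially here rather than on a real execution environment.

```python
def run_dag(jobs, deps, run, max_running=2, max_retries=1):
    """Toy workflow executor (hypothetical, not DAGMan itself).

    jobs: iterable of job names.
    deps: dict mapping a job to the set of parent jobs it waits for.
    run:  callable(job) -> bool, True when the job succeeds.
    Returns (completed jobs in order, failed jobs, jobs left blocked).
    """
    remaining = {job: set(deps.get(job, ())) for job in jobs}
    attempts = {job: 0 for job in remaining}
    done, failed = [], set()
    while True:
        # Dependency analysis: a job is ready once all its parents completed.
        ready = sorted(job for job, parents in remaining.items() if not parents)
        if not ready:
            break
        # Throttling: release at most max_running ready jobs per round.
        for job in ready[:max_running]:
            attempts[job] += 1
            if run(job):
                done.append(job)
                del remaining[job]
                for parents in remaining.values():
                    parents.discard(job)
            elif attempts[job] > max_retries:
                # Retries exhausted: descendants stay blocked, which is the
                # information a rescue file would need to record.
                failed.add(job)
                del remaining[job]
            # Otherwise the job stays ready and is retried in the next round.
    return done, failed, sorted(remaining)
```

The third return value lists the jobs that never ran because an ancestor failed, loosely mirroring what a "rescue DAG" would resume from.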

Karan Vahi is a Research Programmer at USC Information Sciences Institute. He is a member of the Center for Grid Technologies at the Information Sciences Institute and works on the Pegasus Project. He received an M.S. in Computer Science from the University of Southern California and a B.E. in Computer Engineering from Thapar University, India. Karan has been associated with the Pegasus project since its inception, first as a Graduate Research Assistant and then as a full-time programmer after graduating from USC. He is currently the lead developer on it and works closely with the user community to drive its development. He is a co-author of most Pegasus research publications. Before joining USC, Karan worked briefly at Quark Media House as a programmer. There he worked on a business workflow management product.
Kent Wenger received his Bachelor of Science degree in computer sciences from the University of Wisconsin-Madison in 1990 after also studying at the University of Wisconsin-Green Bay and Northwestern University (with internships at the Mayo Clinic and IBM and work for Nicolet Instrument Corporation along the way).

After graduating, he did programming for analytical chemistry instrumentation at Extrel FTMS in Madison. In 1996, Kent moved to the UW-Madison computer sciences department as an associate researcher on the DEVise (Data Exploration and Visualization) project. Kent continues to develop the DEVise software for the BioMagResBank (a depository for quantitative data derived from NMR spectroscopic investigations of biological macromolecules). Since 2002, Kent has also worked on the Condor project, and he is now one of the main developers of the DAGMan (Directed Acyclic Graph Manager) workflow execution engine. He is currently working with the Pegasus team at the Information Sciences Institute to more closely integrate DAGMan with the Pegasus (Planning for Execution in Grids) workflow mapping engine.

Kent and his wife live in Madison near the UW campus. When he is not working, Kent can often be found cycling through the Wisconsin countryside.

Tutorial : Trusted Virtualization and Grid Security
Speakers : Volkan Erol, Bora Gungoren

As Trusted Computing comes into practical use, either through application software targeting personal use or through enterprise management software enforcing policies on users, the debate gains a technically sound base. At the same time, operating system virtualization has become a reality for all data centers, the reason being either green computing or perceived additional security. Combining these two technologies in a trusted virtualization concept, however, complements many current approaches.
In this tutorial the audience will be introduced to trusted virtualization and its potential applications in grid computing. A trusted computing platform is often well defined by the term "one who breaks policies", where breaking a policy refers to the platform's ability to correctly report its state. Extending this mechanism enables us to create trusted communication channels, trusted group formations, and so on. Methods of extending the basic properties of TPMs towards such architectures will be demonstrated. Furthermore, using virtualization for platform attestation will be demonstrated through the use of EU-funded OpenTC project deliverables. Finally, an extension of the authentication and privacy mechanisms found in grid security approaches will be discussed.

After receiving his BS in Computer Engineering from Galatasaray University Engineering and Technology Faculty, Volkan Erol is continuing his MS studies in Computer Engineering at Bogazici University. Prior to joining TUBITAK UEKAE, he worked as a software engineer in the Turkcell Shubuo-Turtle project. Currently, he works as a full-time researcher in the Open Trusted Computing project. His research interests are Trusted Computing, Applied Cryptography, Software Development and Design, and Image Processing.
Bora Gungoren - Portakal Teknoloji
Bora Gungoren received his BS in Electrical and Electronics Engineering from Middle East Technical University (METU). His current research interests span various fields in software engineering and economics. He has worked in both public research centers and the private sector. He has published five computer science and engineering textbooks in addition to his papers and presentations. He manages Portakal Teknoloji's contribution to many projects, including OpenTC, and mentors several software companies. In addition to his full-time work at Portakal Teknoloji and Bilkent University (where he is an instructor), he is an active member of the IEEE, a twice-elected member of the Board of Directors of the Turkish Chamber of Electrical Engineers (EMO) Ankara Branch, and a former Treasurer of the Turkish Linux Users Association (LKD).

Tutorial : Simulation for Large-Scale Distributed Computing Research
Speaker : Martin Quinson
This tutorial will provide attendees with clear perspectives on the challenges for experimental research in the area of parallel and large-scale distributed computing, and on current technology for conducting experiments with real-world testbeds, emulated testbeds, or simulated testbeds. The first part of the tutorial will present and contrast current experimental methodologies, giving attendees in-depth understanding of the scientific and technological issues at hand. The second part of the tutorial will focus on simulation, giving a state of the art of current simulation technology and discussing challenges for the development of sound simulation models. The tutorial will use the SimGrid simulation framework as an exemplar since it implements sophisticated and validated simulation models. The last part of the tutorial will focus on an in-depth presentation of the different simulation approaches enabled by SimGrid, each with its specific range of applications and goals. SimGrid has been used to obtain results published in over 50 research articles and has thus emerged as one of the key tools for simulation in the area of parallel and large-scale distributed computing. Tutorial attendees will have the opportunity to gain some hands-on experience with SimGrid, by witnessing step-by-step development of small simulation projects. By the end of this tutorial attendees should have a clear understanding of current technology and best practice for experimental parallel large-scale distributed computing research, and in particular on the use of simulation.

Martin Quinson is an Associate Professor in the School of Computer Science and Applications of Lorraine at University of Nancy. His research interests are distributed, grid and internet computing. In particular his research emphasizes the development of distributed services over large-scale distributed platforms, assessing the quality of distributed applications and the experimental evaluation of distributed algorithms. He has published over 15 research articles in peer-reviewed journals and conferences. He obtained his B.S. from Universite Jean Monnet of Saint Etienne, France in 1999, his M.S. from the Ecole Normale Superieure de Lyon, France in 2000, and his Ph.D. from the Ecole Normale Superieure de Lyon, France in 2003. He is a program committee member of the 2008 SIMUTools conference and the program committee chair of the 2008 ASSESS workshop on Assessing Models of Networks and Distributed Computing Platforms (to be held in conjunction with CCGrid08). Martin Quinson has been one of the main developers of the SimGrid project since 2002.

Tutorial : Ibis Tutorial
Speakers : J. Maassen, Niels Drost, Rob van Nieuwpoort
The goal of the Ibis project is to create an efficient Java-based platform for grid computing. The Ibis project currently consists of a variety of programming models, the Java Grid Application Toolkit, the Zorilla peer-to-peer grid middleware, and the Ibis and SmartSockets communication libraries.

In this tutorial we will provide an overview of the project, explain which grid computing related problems we are trying to solve, and describe the relationship between the different sub projects. We will also give more detailed descriptions of the different components, explain the scope in which they can be used, and (if possible) show live demos. The tutorial will cover the following subjects:
  • 1) The Java Grid Application Toolkit: The JavaGAT offers a set of coordinated, generic and flexible APIs for accessing grid services from application codes, portals, and data management systems. The JavaGAT provides a uniform interface to numerous types of grid middleware, such as Globus, Unicore, SSH or Zorilla. As a result, application programmers need only learn a single API to obtain access to the entire grid. Due to its modular design, the JavaGAT can easily be extended with support for other grid middleware layers.
  • 2) Zorilla: Zorilla is a prototype Peer-to-Peer (P2P) grid middleware system. It strives to implement all functionality needed to run applications on a grid in a fully distributed manner, such as scheduling, file transfer and security. Zorilla is designed to be used in situations where a full-blown grid environment is not needed, or simply not possible. Deployment of Zorilla is easy; only a single application needs to be installed on the participating machines. Zorilla requires little configuration, since machines automatically organize themselves into a grid. Due to its Peer-to-Peer design, Zorilla scales to large numbers of machines.
  • 3) Programming Models: Ibis offers several programming models, such as MPJ, an MPI-like message passing for Java applications, and Satin, a divide and conquer style language which maps cleanly to grid systems, contains an efficient and simple load-balancing algorithm and provides efficient and transparent fault-tolerance, malleability and migration to the application.
  • 4) Communication Libraries: Ibis offers two communication libraries, SmartSockets, and the IPL. SmartSockets is designed to automatically discover and solve connectivity problems such as firewalls, network address translation (NAT), non-routed networks or multi homing. The IPL offers a high level communication model which is specifically designed for usage in a grid environment. It supports several communication patterns, such as unicast, multicast or many-to-one and offers extensive support for malleability. The Ibis Communication Library is capable of keeping track of which machines participate in a computation. If desired, the application can be notified when resources join or leave.
  • 5) Real-life applications: At the end of the tutorial, we will give a detailed description of some real-life examples of how the software developed in the Ibis project is used. These examples include medical data analysis, image recognition, a SAT solver, and an N-body simulation demo.

Jason Maassen obtained his PhD at the VU University Amsterdam in 2003 on "Method Invocation Based Programming Models for Parallel Programming in Java". He is now working as a postdoc at the same University. His current research is part of the Starplane Project, and examines how grid applications can benefit from using a reconfigurable photonic network.
Rob van Nieuwpoort obtained his PhD at the VU University Amsterdam in 2003 on "Efficient Java-Centric Grid-Computing". He is now working as a postdoc at both the VU University and ASTRON (Netherlands Foundation for Research in Astronomy). The focus of his current research is on the development of parallel astronomy codes for emerging platforms, such as the Cell processor and GPUs (graphics processors).
Niels Drost is currently a PhD student at the VU University Amsterdam working on Peer-to-Peer supercomputing. The goal of his research is to build a system which enables people to use the processing power of large numbers of machines without having to deal with complex configuration and maintenance issues.

Tutorial : Market-Oriented Grid Computing and the Gridbus Middleware
Speaker : R. Buyya - Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia
Grid computing, one of the latest buzzwords in the ICT industry, is emerging as a new paradigm for Internet-based parallel and distributed computing. Grids aim at exploiting synergies that result from cooperation of autonomous distributed entities. The synergies that result from Grid cooperation include the sharing, exchange, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. At the same time, the Grid community has embraced the integration of commodity Web services and Grid technologies along with the adoption of a utility-oriented computing model. The recent widespread interest in Grid computing from commercial organisations is pushing it towards mainstream computing and turning Grid services into valuable economic commodities.

Despite a number of advances, resource management in Grid computing continues to be a challenging and complex undertaking as resources are autonomous and participants (service providers and consumers) have different goals, objectives, strategies, and requirements. Market-oriented Grid computing emerged as an effective solution to address these challenges as it (1) enables the regulation of supply and demand for resources, (2) provides economic incentive for Grid service providers, and (3) motivates Grid service consumers to trade off between deadline, budget, and the required level of quality-of-service.
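
The consumer-side trade-off between deadline and budget can be made concrete with a small sketch. It is only an illustration under simple assumptions, not the Gridbus broker's algorithm: the function choose_resource, the offer fields (name, speed, price) and the numbers in the usage note are all hypothetical.

```python
def choose_resource(offers, work_units, deadline_hours, budget):
    """Pick the cheapest offer that meets both the deadline and the budget.

    offers: list of dicts with hypothetical keys
            'name', 'speed' (work units per hour) and 'price' (cost per hour).
    Returns the chosen offer name, or None when no offer is feasible.
    """
    feasible = []
    for offer in offers:
        hours = work_units / offer["speed"]   # estimated completion time
        cost = hours * offer["price"]         # estimated total cost
        if hours <= deadline_hours and cost <= budget:
            feasible.append((cost, hours, offer["name"]))
    # Cheapest feasible offer wins; ties are broken by shorter completion time.
    return min(feasible)[2] if feasible else None
```

With a slow, cheap offer (10 units/hour at price 1) and a fast, expensive one (40 units/hour at price 6), a relaxed deadline selects the cheap resource, while a tight deadline forces the consumer to pay for the fast one or, if the budget is too small, to find no feasible offer.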

This tutorial introduces fundamental principles of Grid computing and market-based economic models and their impact on next-generation Grid systems. It identifies resource management challenges and introduces new challenges and requirements introduced by the Grid economy on Grid service providers and consumers. The tutorial presents a Service-Oriented Grid Architecture (SOGA) inspired by SOA and market models and demonstrates how it can be realized by leveraging the existing Grid technologies and building new economic-oriented capabilities and components. The tutorial presents solutions to these challenges based on our experience in designing and developing market-oriented computational and data Grid technologies such as the Gridbus middleware (Grid Market Directory, Grid Bank, Grid Service Broker, Workflow Engine) and their utilization in driving e-Science and e-Business applications such as molecular docking, high-energy physics, and portfolio analysis.

Dr. Rajkumar Buyya is an Associate Professor of Computer Science and Software Engineering; and Director of the Grid Computing and Distributed Systems Laboratory at the University of Melbourne, Australia. He has authored over 210 publications and three books. The books on emerging topics that Dr. Buyya edited include, High Performance Cluster Computing (Prentice Hall, USA, 1999) and Market-Oriented Grid and Utility Computing (Wiley, 2008). Dr. Buyya has contributed to the creation of high-performance computing and communication system software for Indian PARAM supercomputers. He has pioneered Economic Paradigm for Service-Oriented Grid computing and developed key Grid technologies such as Gridbus that power the emerging e-Science and e-Business applications. He received "Research Excellence Award" from the University of Melbourne for productive and quality research in computer science and software engineering in 2005. The Journal of Information and Software Technology in Jan 2007 issue, based on an analysis of ISI citations, ranked Dr. Buyya's work (published in Software: Practice and Experience Journal in 2002) as one among the "Top 20 cited Software Engineering Articles in 1986-2005". He received the Chris Wallace Award for Outstanding Research Contribution 2008 from the Computing Research and Education Association of Australasia, CORE, which is an association of university departments of computer science in Australia and New Zealand.

Tutorial : GRelC DAIS: A P2P Framework for Data Access and Integration in Grid
Speakers : S. Fiore and A. Negro - Euro-Mediterranean Centre for Climate Change (CMCC) and SPACI Consortium, Italy
Grids encourage and promote the publication, sharing and integration of scientific data, distributed across Virtual Organizations. Scientists and researchers (bioinformatics, astrophysics, etc.) work on huge, complex and growing datasets. The complexity of data management within a grid environment comes from the distribution, heterogeneity and number of data sources. Along with coarse-grained services (basically grid storage, replica services, storage resource managers, etc.), there is a strong interest in fine-grained ones concerning, for instance, grid-database access and management. Moreover, as grid computing, technologies and standards evolve, more mature environments (production grids such as EGEE) become available for production-based activities, and tools/services able to access relational databases in the grid are also strongly required.

Within the proposed tutorial we will talk in detail about the Grid Relational Catalog (GRelC) Project, an integrated environment for grid database management, highlighting the vision/approach, architecture, components, services and technological issues.

The key topic of the tutorial as well as of the demo will be the GRelC Data Access and Integration Service. The GRelC DAIS is a GSI/VOMS enabled web service addressing extreme performance, interoperability and security. It efficiently, securely and transparently manages databases on the grid across VOs, with regard to emerging and consolidated grid standards and specifications as well as production grid middleware (gLite & Globus). It provides a uniform access interface, in grid, both to access and integrate relational (MySQL, Oracle, PostgreSQL, IBM/DB2, SQLite) and non-relational data sources (XML DB engines such as eXist, Xindice and libxml2 based documents). The GRelC DAIS provides:
  • basic functionalities (query submission, grid-db management, user/VO/ACL management, etc.) to access and manage grid-databases;
  • efficient delivery mechanisms leveraging streaming, chunking, prefetching, etc. to retrieve data from databases in grid providing high level of performance (in terms of query response time, number of concurrent accesses, etc.);
  • advanced functionalities to transparently and securely integrate heterogeneous, distributed and geographically spread grid data sources (through P2P connected GRelC DAIS nodes);
  • additional functionalities such as asynchronous queries;
  • a data Grid Portal (GRelC Portal) to ease the access, management and integration of grid-databases, as well as user/VO/ACL management, etc.
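
The chunked delivery mentioned in the list above can be illustrated with a short sketch. It does not use the GRelC DAIS interface: it relies only on Python's standard sqlite3 module against an in-memory database, and the function names, table name and chunk size are hypothetical.

```python
import sqlite3


def fetch_in_chunks(conn, query, chunk_size=2):
    """Yield query results as lists of at most chunk_size rows.

    The client can start processing the first chunk while later rows
    are still unread, instead of waiting for the full result set.
    """
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def demo():
    """Build a tiny in-memory table and report the size of each chunk."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        [(i, i * 0.5) for i in range(5)],
    )
    query = "SELECT id, value FROM samples ORDER BY id"
    sizes = [len(chunk) for chunk in fetch_in_chunks(conn, query)]
    conn.close()
    return sizes
```

Calling demo() splits the five rows into chunks of two, two and one. A real service would additionally stream each chunk over the network and could prefetch the next one while the client works on the current one.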

The GRelC DAIS is very versatile, so it can be used both at VO and site level. It can be, and has been, used in both ways depending on VO/user/database constraints and requirements. There is no single point of failure and no centralized management for this service due to the scalable P2P architecture. After the talk, during the demo, we will show how the GRelC DAIS successfully provides a data access and integration service for gLite. We will demonstrate how distributed DB integration, distributed accounting and monitoring can easily be performed by using the GRelC Portal (web interface). Today the GRelC DAIS is part of the GILDA release (EGEE t-Infrastructure) and is a candidate for the EGEE Respect Program since it is tightly coupled with the gLite middleware, the EGEE architecture and the EGEE Training Infrastructure. Currently the GRelC DAIS is used as the Euro-Mediterranean Centre for Climate Change (CMCC) Data Grid framework.

Sandro Luigi Fiore was born in Galatina (Italy) in 1976. He received a summa cum laude Laurea degree in Computer Engineering from the University of Lecce (Italy) in 2001. He received a PhD degree in Informatic Engineering - Innovative Materials and Technologies from the ISUFI-University of Lecce, Italy, in 2004. His research activity focuses on Parallel and Distributed Computing, specifically with regard to Advanced Grid Data Management. Since 2004, he has been a member of the Center for Advanced Computational Technologies (CACT) of the University of Salento and a technical staff member of the SPACI Consortium. Since 2001 he has been the Project Principal Investigator of the Grid Relational Catalog project. He was directly involved in the EGEE project (Enabling Grids for E-science) and is currently involved in the EGEE-II project and in other national projects (LIBI). Since June 2006, he has led the Data Grid group within the Euro-Mediterranean Centre for Climate Change (CMCC) in Lecce (Italy). He is author and co-author of more than 40 papers in refereed journals/proceedings on parallel & grid computing as well as of 1 patent concerning advanced data management.