ACCEPTED PAPERS

 Title An Artificial Immune System for Efficient Comparison-Based Diagnosis of Multiprocessor Systems Authors Mourad Elhadef Location School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada Abstract The problem of identifying faulty processors (or units) in diagnosable systems is considered. For the purpose of diagnosis, a system composed of interconnected independent heterogeneous processors is modeled using a comparison graph, where tasks are assigned to pairs of processors and the results are compared. The agreements and disagreements among the units are the basis for identifying faulty processors. It is assumed that at most t processors can fail at the same time and that faults are permanent. In this paper, we introduce a new artificial-immune-based diagnosis approach using the comparison approach. The new approach has been implemented and evaluated using randomly generated diagnosable systems. Simulations results indicate that the immune approach is a viable addition to present diagnosis problems.
 Title A Comparative Simulation Analysis of P2P System Architectures Authors Anjan Mahanta Thanaruk Theeramunkong Location Sirindhorn International Institute of Technology, Thammasat University, Bangkadi, Muang, Phathumthani, Thailand Abstract This paper gives a comparative study on multi-layer file sharing mechanisms in Peer-to-Peer (P2P) systems. Based on a well-known P2P system named Gnutella, two system architectures, 1-layer and 2-layer Ngnu architectures are proposed in order to reduce network traffic and to gain better scalability. These architectures can be applied to large-scale systems, such as e-government, e-office and e-library, which need an efficient file-sharing mechanism. To examine the efficiency and effectiveness of the proposed methods, a number of simulations are made with respect to three factors: time-to-live’s (TTLs), the number of nodes and the number of queries. To compare these systems, the number of messages and the mean query hit are measured and compared with the original flat Gnutella system.
 Title An Efficient Load Balancing Scheme for Grid-based High Performance Scientific Computing Authors Arun Kejariwal Alexandru Nicolau Location University of California at Irvine Abstract With the emergence of computational grids, there has been a dramatic increase in the number of available processing and storing resources available for parallel execution of large-scale compute and data intensive scientific applications. However, large computing power in itself is not sufficient for high performance computing (HPC). In this context, (application) partitioning and load balancing strategies play a critical role in meeting the high performance requirements and in achieving high processor utilization. In HPC applications such as molecular simulations, protein synthesis, drug design et cetera parallel loops constitute the greatest percentage of program parallelism. The degree to which parallelism can be exploited during parallel execution of a nested loop directly depends on partitioning and load balance, i.e., the number of iterations mapped onto each processor, between the different processors. Thus, partitioning of parallel loops is of key importance for grid-based high performance scientific computing. Although a significant amount of work has been done in partitioning of iteration spaces of nested loops, both rectangular and non-rectangular iteration spaces, for homogeneous multiprocessor systems, the problem of partitioning of iteration spaces for heterogeneous systems has not been given enough attention so far. In this paper, we present a geometric approach for partitioning N-dimensional non-rectangular iteration spaces for optimizing performance on heterogeneous parallel processor systems. Speedup measurements for kernels (loop nests) of linear algebra packages are presented.
 Title Collaborative Data Distribution with BitTorrent for Computational Desktop Grids Authors Baohua Wei Gilles Fedak Franck Cappello Location Laboratoire de Recherche en Informatique/INRIA Futurs Abstract Data-centric applications are still a challenging issue for Large Scale Distributed Computing Systems. The emergence of new protocols and softwares for collaborative content distribution over Internet offers a new opportunity for efficient and fast delivery of high volume of data. This paper presents an evaluation of the BitTorrent protocol for Computational Desktop Grids. We first present a prototype of a generic subsystem dedicated to data management and designed to serve as a building block for any Desktop Grid System. Based on this prototype we conduct experimentations to evaluate the potential of BitTorrent compared to a classical approach based on FTP data server. The preliminary results obtained with a 65-nodes cluster measure the basic characteristics of BitTorrent in terms of latency and bandwidth and evaluate the scalability of BitTorrent for the delivery of large input files. Moreover, we show that BitTorrent has a considerable latency overhead compared to FTP but clearly outperforms FTP when distributing large files or files to a high number of nodes. Tests on a synthetic application show that BitTorrent significantly increases the communication/computation ratio of the applications eligible to run on a Desktop Grid System.
 Title Cellular Nonlinear Network Grids – a new Grid Computing Approach Authors Roland Kunz Ronald Tetzlaff Location IBM Deutschland GmbH, Institute of applied physics, University of Frankfurt, Germany Abstract Throughout this paper a catenation between the universal paradigm of Cellular Nonlinear Networks (CNN) and the innovative approach of grid computing in the on-demand era is given. CNN are a massive parallel solution for solving non-linear problems, modelling complex phenomena in medicine, physics and data analysis as well as for powerful image processing and recognition systems. They usually are simulated on local computer systems or build as dedicated VLSI-implementations. However, the research of complex CNN structures and settings require massive computing power and thus can benefit from multi-system open architectures which can be provided by the grid approach. Propositions of two different realizations with grid architecture in mind are given by introducing an algorithm of implementing such methods in a CNN software simulator. First a brief introduction to CNN is given. Afterwards, problems for the current determination of such networks are discussed and routes are shown, how this field of research can benefit from the opportunities, grid computing can deliver.
 Title Load Balancing Algorithm in Cluster-based RNA secondary structure Prediction Authors Guangming Tan Shengzhong Feng Ninghui Sun Location The Institute of Computing Technology, Chinese Academy of Sciences Abstract RNA secondary structure prediction remains one of the most compelling, yet elusive areas of computational biology. Many computational methods have been proposed in an attempt to predict RNA secondary structures. A popular dynamic programming (DP) algorithm uses a stochastic context-free grammar to model RNA secondary structures, its time complexity is O(N4) and spatial complexity is O(N3). In this paper, a parallel algorithm, which is time-wise and space-wise optimal with respect to the usual sequential DP algorithm, can be implemented using O(N4/P) time and O(N3/P) space in cluster. High efficient utilization of processors and good load balancing are important to the performance of parallel algorithms in cluster systems. Two parallel DP algorithms, which have different mappings of the DP matrix to processors, are evaluated concerning running time. As experiments show, dynamic mapping of DP matrix can achieve better load balancing than the static and improve the efficiency of processors. Thus, the dynamic mapping algorithm is faster and gets better speedups.
 Title Parallel Jess Authors Dana Petcu Location Western University of Timisoara, Romania Abstract Parallel production or rule-based systems, like a parallel version of Jess, are needed for real applications. The proposed architecture for a such system is based on a wrapper allowing the cooperation between several instances of Jess running on different computers. The system has been designed having in mind the final goal to speedup current P system simulators. Preliminary tests show its efficiency in this particular case and on classical benchmarks.
 Title A Memory-Efective Fault-Tolerant Routing Strategy for Direct Interconnection Networks Authors Maria Engracia Gomez Pedro Lopez Jose Duato Location Universidad Politécnica de Valencia Abstract In massively parallel computing system, high-performance interconnection networks are crucial. Routing is one of the most important design issues of interconnection networks. Moreover, the huge amount of hardware of large machines makes fault-tolerance another important design issue. In this paper, we propose a mechanism that combines scalable routing without forwarding tables and fault-tolerance for commercial switches to build direct regular topologies, which are the topologies used in large machines. The mechanism is very flexible and the hardware required is not complex. Furthermore, it allows a high degree of fault-tolerance inflicting a minimal decrease of performance.
 Title Performance Effective Task Scheduling Algorithm for Heterogeneous Computing System Authors E. Ilavarasan P. Thambidurai R. Mahilmannan Location Department of Computer Science & Engineering and Information Technology, Pondicherry Engineering College, Pondicherry, India Abstract Finding an optimal solution to the problem of scheduling an application modeled by a Directed Acyclic Graph (DAG) onto a distributed system is known to be NP-complete. The complexity of the problem increases when task scheduling is to be done in a heterogeneous computing system, where the processors in the network may not be identical and take different amounts of time to execute the same task. This paper introduces a Performance Effective Task Scheduling (PETS) Algorithm for Network of Heterogeneous system, with complexity O (v+e) (p+ log v), which provides optimal results for applications represented by DAGs. The performance of the algorithm is illustrated by comparing the schedule length, speedup, efficiency and the scheduling time with existing algorithms such as, Heterogeneous Earliest Finish Time (HEFT) and Critical-Path On a processor (CPOP) and Levelized Matching and Scheduling reported in this paper. The comparison study based on both randomly generated graphs and graphs of some real applications such as Gaussian Elimination, Fast Fourier Transformation and Molecular Dynamics code shows that PETS algorithm substantially outperforms existing algorithms.
 Title Black Holes Location in Arbitrary Graphs: Algorithmic Aspect Authors Rodrigue-Bertrand Ossamy Location LaBRI, ENSEIRB, University of Bordeaux 1 Abstract Security, especially the attacks performed by hosts to the visiting mobile agents (the malicious hosts problem), is a major problem in mobile agents systems. Being the running environment for mobile agents, the host has full control over them and could easily perform many kinds of attacks against them. In this work we are specially interested in the presence of a textit{black hole} in the network. That is, a stationary process that destroys any incoming agent without leaving any trace. The textit{black hole} search problem is the one of assembling a team of asynchronous mobile agents to successfully identify the location of the textit{black hole}. We are concerned with topology-independent solutions where the network is not necessary anonymous. We also establish that the number of agents also depends on the initial placement of the agents and on the number of textit{black holes} in the network. Finally, we prove that in an anonymous network where each agent has its own homebase, it is not possible to solve the textit{black hole} search problem, if the network size is unknown.
 Title A Strategyproof Mechanism for Scheduling Divisible Loads in Distributed Systems Authors Daniel Grosu Thomas E. Carroll Location Dept. of Computer Science, Wayne State University, Detroit, Michigan, USA Abstract An important scheduling problem is the one in which there are no dependencies between tasks and the tasks can be of arbitrary size. This is known as the divisible load scheduling problem and was studied extensively in recent years resulting in a cohesive theory called \emph{Divisible Load Theory} (DLT). In this paper we augment the existing divisible load theory with incentives. We develop a strategyproof mechanism for scheduling divisible loads in distributed systems assuming a bus type interconnection and a linear cost model for the processors. The mechanism provides incentives to processors such that it is beneficial for them to report their true processing power and process the assigned load using their full processing capacity. We define the strategyproof mechanism and prove its properties. We simulate and study the implementation of the mechanism on systems characterized by different parameters.
 Title Measuring and improving quality of parallel application monitoring based on global states Authors Janusz Borkowski Location Polish-Japanese Institute of Information Technology Abstract Effectiveness of parallel/distributed application control based on predicates defined on application global states depends on the performance of an underlying Consistent Global State monitoring mechanism. Focusing on Strongly Consistent Global States (SCGS) usage, we introduce three measures of SCGS monitoring quality. They reflect delay, state information ageing and state inspection frequency experienced by a monitor. Using the measures, four SCGS monitoring algorithms are compared, including two novel algorithm variants. The comparison is carried out with the use of simulations. The newly introduced algorithms prove to perform much better than a standard SCGS algorithm.
 Title A New Join Algorithm for Cluster-Based Database Systems Authors Kenji Imasaki Sivarama Dandamudi Location School of Computer Science, Carleton University, Canada Abstract This paper focuses on cluster-based parallel database systems in which only one of the nodes has the database and the other nodes, which have no initial data, are used for parallel query processing. In such a system, the load of each node changes dynamically depending on the activities of the local users. In addition, in database query processing, data skew exists. Thus, it is very important to develop efficient load balancing/sharing algorithms. This paper proposes a new join algorithm called Symmetric Chunking Hash Join (SCHJ) that divides the hash buckets into chunks and uses them for load balancing. The SCHJ algorithm is compared with a dynamic round-robin algorithm and a sampling algorithm. The results show that the SCHJ algorithm is the best among these algorithms when there is data skew and background load variations
 Title A Compiler-Directed Energy Saving Strategy for Parallelizing Applications in On-Chip Multiprocessors Authors Juan Chen Xue-jun Yang Yong Dong Dan Wu Location School of Computer Science, National University of Defense Technology, Changsha China Abstract As energy consumption becoming one of the key optimization objects in on-chip multiprocessor, parallel compilation techniques that optimize performance and energy simultaneously are placed an important role. Thus, compiling a parallelizing application combined with energy saving strategy is more significant. In this paper, we focus on an on-chip multiprocessors architecture and each processor in on-chip multiprocessor can independently scale its frequency and voltage (here we provide several pre-selected discrete frequency values and voltage levels), which provides us an efficient energy saving approach. Given an array-intensive application, we simulate parallelizing application and analyze probable load imbalance; then our energy saving strategy determines each processor¡¯s clock frequency and voltage fit for each parallel fragment in terms of load imbalance. Here, parallel fragments mainly denote parallel loop nests. Further, we consider the serial code fragments as a severe load-unbalanced parallel partitioning and the redundant processors can be shut down during the serial codes execution process. Such adaptive frequency/voltage scaling for the parallel code fragments and shutting down processors for the serial code fragments form our main energy saving strategy. Initial experiment proves our energy saving strategy is successful in reducing the energy consumption of the parallel programs.
 Title Enforcing consistency during the adaptation of a parallel component Authors Jérémy Buisson Françoise André Jean-Louis Pazat Location IRISA/INSA de Rennes IRISA/Université de Rennes 1 IRISA/INSA de Rennes Abstract As Grid architectures provide execution environments that are distributed, parallel and dynamic, applications require to be not only parallel and distributed, but also able to adapt themselves to their execution environment. This article presents a model for designing self-adaptable parallel components that can be assembled to build applications for Grid. This model includes the definition of a consistency criterion for the dynamic adaptation of SPMD components. We propose a solution to implement this criterion. It has been evalued on both synthetic and real codes to exhibit the behavior of the several proposed strategies.
 Title Evolutionary Tuning for Distributed Database Performance Authors Anca Gog Horea-Adrian Grebla Location Babes-Bolyai University, Department of Computer Science Abstract Modern database management systems require modern design and administration solutions. In such a system the execution process of the queries implies accurate estimations and predictions for performance characteristics. The performance problems of data allocation and query optimization in distributed database systems done by means of mobile agents and evolutionary algorithms are considered. These problems still present a challenge because of the dynamic changes in number of components and architectural complexity of nowadays system topologies. The distributed system is modeled as a graph structure on which is defined a dynamic cost vector. The cost vector remains consistent, relevant, by use of mobile agents performing cost statistics and vector updates. An evolutionary algorithm is proposed to solve this NP-Complete problem. Experimental results prove the efficiency of the proposed technique.
 Title Towards Scalable Grid Replica Optimization Framework Authors Marek Ciglan Ladislav Hluchý Location Institute of informatics, Slovak academy of sciences Abstract Data Grids are becoming a popular infrastructure for distributed, data intensive scientific application. Data access optimization is one of the key features of such systems, as the transfer and access to the huge amounts of data are often the bottleneck of the data intensive tasks. Concept of replication, creation of multiple copies of a data source across multiple grid nodes, was adopted by grid community to increase data availability. In this paper, we present the ongoing work on the grid replica optimization framework that addresses both short-term and long-term replica optimization problems. We describe high level architecture of the optimization framework and discuss the promising results achieved by prototype implementation of several system components.
 Title Embedded cost model in mobile agents for large scale query optimization Authors Mohammad Hussein Franck Morvan Abdelkader Hameurlain Location IRIT Loboratory, Paul Sabatier University, Toulouse, France Abstract In the context of heterogeneous and distributed data sources in large scale, the execution plans generated by traditional optimizers can be sub-optimal because: the estimations are inaccurate, the unavailability of data and the execution environment is unstable. To deal with the sub-optimality of an execution plan, an approach is to execute each relational operator of an execution plan by a mobile agent. This mobile agent can migrate from a site to another to execute. The stored information in the site, where the agent is situated, can be insufficient to calculate the site of migration of the agent. In this paper, we will investigate to integrate a cost model into the mobile agent executing an operator. The performance evaluation shows that the cost model embedded in the agent allows him to choose its most appropriate site of migration.
 Title GMRES Method on Lightweight GRID System Authors Haiwu He Guy Bergère Serge Petiton Location Laboratoire d’Informatique Fondamentale de Lille, Université de Lille I, France Abstract Abstract-Grid computing accomplishes high throughput computing by using a very large number of unexploited computing resources. We present a parallel method GMRES to solve large sparse linear systems by the use of a lightweight GRID XtremWeb. This global computing platform, just as many popular GRID systems, is devoted to multi-parameters generic applications. We have implemented an important algorithm GMRES which is one the key method to resolve large, non-symmetric, linear problems. We discuss the performances of this algorithm deployed on two XtremWeb networks: a local network with 128 non-dedicated PCs in Polytech-Lille of University of Lille I, a remote network with 4 clusters of SCGN Grid that includes 91 CPUs totally in the High Performance Computing Center of University of Tsukuba. We compare these performances with those of a MPI implementation of GMRES on the same platform.We present the advantages and shortcomings of our implementation on this lightweight GRID XtremWeb.
 Title Progressive Clustering for Database Distribution on a Grid Authors Valerie Fiolet Bernard Toursel Location Service Informatique, University of Mons-Hainault, Mons, Belgium Laboratoire d'Informatique Fondamentale de Lille (UMR CNRS 8022), University of Lille 1, France Abstract The increasing availability of clusters and grids of workstations allows to bring cheap and powerful ressources for distributed datamining. But it appears necessary to invent new algorithms adapted to this kind of environement, especially according to data distribution and the use of this distribution in treatment. Then the need of an intelligent fragmentation of data appears.This distribution could derived from clustering. This paper presents a new clustering algorithm called Progressive Clustering that allows to execute a clustering treatment in an efficient and incremental distributed way. Fragmentation of data resulting from this method could be used to distribute data mining treatments.
 Title Managing MPICH-G2 Jobs with WebCom-G Authors Padraig J. ODowd Adarsh Patil John P. Morrison Location Computer Science Dept, UCC, Cork Ireland. Abstract This paper discusses the use of WebCom-G to handle the management & scheduling of MPICH-G2 (MPI) jobs. Users can submit their MPI applications to a WebCom-G portal via a web interface. WebCom-G will then select the machines to execute the application on, depending on the machines available to it and the number of machines requested by the user. WebCom-G automatically & dynamically constructs a RSL script with the selected machines and schedules the job for execution on these machines. Once the MPI application has finished executing, results are stored on the portal server, where the user can collect them. A main advantage of this system is fault survival, if any of the machines fail during the execution of a job, WebCom-G can automatically handle such failures. Following a machine failure, WebCom-G can create a new RSL script with the failed machines removed, incorporate new machines (if they are available) to replace the failed ones and re-launch the job without any intervention from the user. The probability of failures in a Grid environment is high, so fault survival becomes an important issue.
 Title A study of meta-scheduling architectures for high throughput computing: Pull vs. Push Authors Vincent Garonne Andrei Tsaregorodtsev Eddy Caron Location Centre de Physique des Particules de Marseille CNRS-IN2P3, France Centre de Physique des Particules de Marseille CNRS-IN2P3, France LIP/ENS Lyon, France Abstract In this paper we present a model and a simulator for large-scale system. Such platforms are composed of heterogeneous clusters of PCs belonging to a local network. These clusters are then connected to each other through a global network. Moreover each cluster is managed via a local scheduler and is shared by many users. We validate our simulator by comparing the experimental results and the analytical results of a M/M/4 queuing system. These studies indicate that the simulator is consistent. After that we do the comparison with a real batch system and we obtain a mean error of 10.5 % for the response time and 12 % for the makespan. We conclude that our simulator is realistic and describes well the behavior of a large-scale system. Thus we can study the scheduling of our system called dirac in a high throughput context. We justify our decentralized, adaptive and opportunistic approach in comparison to a centralized approach in such a context.
 Title Service Driven Mobile Ad Hoc Networks Formation and Management Authors Dan Grigoras Mark Riordan Location Computer Science Department, UCC, Cork, Ireland Abstract Self-organizing mobile ad hoc networks (MANET) allow devices to share their services and resources. MANET applications as diverse as multi-player games, disaster recovery, cooperative computing and mobile patient monitoring explain the high interest in these dynamic systems. However, the formation and management of MANET are challenging tasks due to network’s features like mobility, heterogeneity, limited resources and feeble communication. This paper presents a simple, cost-effective and resilient algorithm for the basic tasks of MANET creation and management.
 Title Towards introducing code mobility on J2ME Authors Laurentiu Lucian Petrea Dan Grigoras Location Department of Computer Science University College Cork, Cork, Ireland Abstract This paper makes the case why mobile code is an appropriate solution for platforms running on mobile devices aggregated in mobile ad-hoc networks. It also presents a solution that considers the introduction of mobile code using remote class loading in the J2ME/CLDC profile. The principal problem of remote class loading revolves around trusting the remote code in accessing system resources. After a review of the algorithms which are used to accomplish access control, a mechanism for the CLDC is proposed that is based on the history of the access control.
 Title Ghost Process: a Sound Basis to Implement Process Duplication, Migration and Checkpoint/Restart in Linux Clusters Authors Geoffroy Vallee Renaud Lottiaux David Margery Christine Morin Jean-Yves Berthou Location ORNL/INRIA/EDF IRISA/INRIA IRISA/INRIA IRISA/INRIA EDF R&D Abstract Process management mechanisms (process duplication, migration and checkpoint/restart) are very useful for high performance and high availability in clustering systems. The single system image approach aims at providing a global process management with mechanisms of process checkpoint, process migration and process duplication. In this context, a common mechanism of process virtualization is very useful in this respect but traditional operating systems do not provide such a system. This paper presents a kernel service for process virtualization called ghost process, extending the Linux kernel. The ghost process mechanism has been implemented in the Kerrighed single system image based on Linux.
 Title A Component-based Software Infrastructure for Ubiquitous Computing Authors Areski Flissi Christophe Gransart Philippe Merle Location CNRS/LIFL INRETS - LEOST INRIA Abstract Multiplication of mobile devices and generalized use of wireless networks imply changes on the design and execution support of distributed software applications targeting ubiquitous computing. Many strong requirements have to be addressed: heterogeneity and limited resources of wireless networks and mobile devices, networked communications between distributed applications, dynamic discovery and automatic deployment on mobile devices, etc. In this paper, we present a component-based software infrastructure to design, discover, deploy, and execute ubiquitous contextual services, i.e. distributed applications providing services to mobile end-users but only available from a particular place. These ubiquitous contextual services are designed as assemblies of distributed software components. These assemblies are dynamically discovered according to end-users' physical location and device capabilities. Then, appropriate assemblies are automatically deployed on users' devices. This approach (the software infrastructure and a ubiquitous application example) are implemented on top of the OMG CORBA Component Model and using the OpenCCM open source platform.
 Title Data distribution for failure correlation management in a Peer to Peer Storage Authors Cyril Randriamaro Olivier Soyez Gil Utard Francis Wlazinski Location Laria UPJV Abstract This article presents a dynamic algorithm for data storage in a P2P storage system, named Us. In Us, peers are arranged in groups called meta-peers to take into account failure correlation. When a peer fails, a reconstruction process rebuilds lost data with help from others peers. To minimize end user traffic because of the reconstruction process, distribution must take into account a new measure: the maximum disturbance cost of a peer during the reconstruction process. The disturbance cost is indicated by the number of data communications which are requested from a single peer for rebuilding lost data. The main goal of this article is to define algorithm, that takes into account failure correlation, able to dilute dynamically the reconstruction process in the system.
 Title Extending OBDD Graphs for Composite Event Matching in Content-based Pub/Sub Systems Authors Gang Xu Location Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing China Abstract Content-based publish/subscribe offers a convenient abstraction for the information producers and consum-ers, supporting a large-scale system design and evolu-tion by integrating several distributed independent application systems. Unlike in the traditional address-based unicast or multicast, its core problem is how to matching events by predicates on the content of events. In existing matching approaches, matching predicates are composed by the conjunction and disjunction of non-semantic constraints. But, in context of enterprise application integration, although they can match events by their contents, this traditional matching predicates are not enough expressive in manipulating the complex event matching, such as the ¡°one-to-many¡± and ¡°many-to-one¡± matching. Therefore, tradi-tional matching approaches should be extended to solve the complex matching problems. After analyzing information matching patterns in enterprise applica-tion integration, we propose three matching models, extend the simple matching to the multi-semantic matching and introduce the temporal constraint vari-able. The multi-semantic matching allows using differ-ent operations in accordance with different semantics; the temporal constraint variable supports processing the discrete events in the temporal sequence. Then, we extend OBDD graphs into hierarchy coloured OBDD graphs and prove the equivalence of the transforma-tion. Based on the extended OBDD graphs, the com-posite matching algorithm is presented and analysed. By experiments, we show the proposed algorithm is efficient.
 Title On the Implementation and Evaluation of Berkeley Sockets on Maestro2 cluster computing environment Authors Ricardo Miguel Costa Guapo Shinichi Yamagiwa Leonel Augusto Pires Seabra de Sousa Location INESC-ID, Lisboa Portugal Abstract The support on cluster environments of "legacy protocols" is important to avoid rewriting the code of applications, which must be implemented by maximizing the communication performance. This paper addresses this issue by implementing the Berkeley sockets interface over the MMP message passing protocol, which is the lower layer of the Maestro2 cluster communication network. Experimental results show that MMP-Sockets offers a minimum latency of 25$\mu$s and a maximum throughput of 1250Mbps. These values correspond to a relative increase of 80% for the latency and a decrease of about 30% for the throughput, regarding to a communication based only on MMP. However, MMP-Sockets increases the compatibility and portability of developed applications.
 Title Active zero-copy: A performance study of non-deterministic messaging Authors Shinichi Yamagiwa Keiichi Aoki Koichi Wada Location INESC-ID, Lisboa Portugal Department of Computer Science, University of Tsukuba,  Tsukuba, Ibaraki,  Japan. Abstract Zero-copy communication exchanges the message among the buffers allocated and locked ahead. This communica-tion style fits into applications that the communication timings and the message sizes are known in its initialization. However, another application with non-deterministic messaging can not fit into the style because the sizes and timings of its messages that change at every transaction. This paper proposes a new zero-copy communication style for this kind of application, called ac-tive-zero copy. The active zero-copy receives messages without the pre-allocation of the buffers. The performance evaluation with the active zero-copy comparing with the conventional zero-copy when it is applied to the applications with non-deterministic messaging will be shown.
 Title Java Byte Code Scheduling Based on the Most-Often-Used-Paths in Programs with Branches Authors Eryk Laskowski Marek Tudruj Richard Olejnik Bernard Toursel Location Institute of Computer Science, Polish Academy of Sciences Institute of Computer Science, Polish Academy of Sciences Laboratoire d’Informatique Fondamentale de Lille, USTL Laboratoire d’Informatique Fondamentale de Lille, USTL Abstract The paper presents an introductory optimization algorithm, which can be performed before a Java program is executed in a parallel system. Taking a sequential multithreaded version of a Java program as input information, the aim of the parallel program optimization is to determine an initial distribution of objects on virtual machines so as to decrease direct inter-object communication and balance loads of the virtual machines. Object placement optimization algorithm uses a graphical representation of control dependencies and data dependencies among methods in the Java program. These dependencies are discovered by an analysis of program byte code and stored in the form of relevant macro/dataflow graphs. Placement optimization algorithm tries to optimally assign the macro nodes to processors (JVMs) so as to reduce inter-processor communication overheads. The optimization methods first do clustering of macro nodes on unlimited number of processors (logical JVMs) to reduce the execution time of the clustered nodes. Next, merging of the assigned clusters is performed to reduce the number of logical JVMs to the number of real processors. The presented approach is supported by dynamic, on-line load balancing mechanism, which applies object migration as proposed in the ADAJ (Adaptive Distributed Applications in Java) project.
 Title Task Scheduling for Dynamic SMP Clusters with Communication on the Fly for Bounded Number of Resources Authors Lukasz Masko Location Institute of Computer Science, Polish Academy of Sciences, Poland Abstract The paper presents an algorithm for scheduling parallel programs in a parallel architecture based on multiple dynamic SMP clusters. In this architecture, processors are organized in clusters around shared memory modules. A set of processors connected to the same memory module constitute such cluster. Processors can be dynamically switched between modules in the runtime. Due to technical problems, memory modules and processors are organized in computational units of a fixed size, connected with a local communication network implemented in a Network-on-Chip technology (NoC module). Processors located in the same module can share their data and communicate using a technique of data transfers on the fly. A number of such modules can be connected using a global interconnection network to form a larger infrastructure. The presented algorithm schedules initial macro data-flow program graph for such an architecture with a given number of NoC modules, considering a fixed size of a NoC module. First, it distributes program graph nodes among processors, assuming no division between NoC modules. Then it structures and schedules all computations and communications to utilize processor switching and read on the fly facilities. Finally, using genetic algorithm, it divides the whole set of processors to subsets of a given size, which then are mapped to separate NoC modules.
 Title A Distributed Prime Sieving Algorithm based on Scheduling by Multiple Edge Reversal Authors Gabriel Paillard Christian Lavault Felipe França Location LIPN - Université Paris Nord - France LIPN - Université Paris Nord - France PESC - Universidade Federal do Rio de Janeiro - Brazil Abstract In this article, we propose a fully distributed algorithm for finding all primes in an given interval [2..n] (or (L,R), more generally), based on the SMER - Scheduling by Multiple Edge Reversal - multigraph dynamics. Given a multigraph M of arbitrary topology, having N nodes, the SMER-driven system is defined by the number of directed edges (arcs) between any two nodes of M, and by the global period length of all "arc reversals" in M. In the domain of prime numbers generation, such a graph method shows quite elegant, and it also yields a totally new kind of distributed prime sieving algorithms of an entirely original design. The maximum number of steps required by the algorithm is at most n + sqrt(n). Although far beyond the O(n/(loglog n)) steps required by the improved sequential "wheel sieve" algorithms, our SMER-based algorithm is fully distributed and of linear (step) complexity.
 Title Profiling Macro Data Flow Graphs for Parallel Implementation of FDTD Computations Authors Adam Smyk Marek Tudruj Location Polish-Japanese Institute of Information Technology, Warsaw, Poland Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland Abstract In this paper, we present methodology, which enables designing and profiling macro data flow graphs that represent computation and communication patterns for the FDTD (Finite Difference Time Domain) problem in irregular computational areas. Optimized macro data flow graphs (MDFG) for FDTD computations are generated in three main phases: generation of initial MDFG based on wave propagation area partitioning, MDFG nodes merging with load balancing to obtain given number of macro nodes and communication optimization to minimize and/or balance internode data transmissions. The efficiency of computation for several communication systems (PVM, RDMA, RDMA RB) is discussed. Relations between communication optimization algorithms and the overall FDTD computation efficiency will be shown. Experimental results obtained by simulation will be presented.
 Title Load balancing with migration based on synchronizers in PS-GRADE graphical tool Authors M. Tudruj J. Borkowski D. Kopañski Location Polish-Japanese Institute of Information Technology, Warsaw, Poland Abstract Parallel application control based on application global states is a new concept realized in PS-GRADE. PS-GRADE is a graphical environment for parallel programming, which unifies message passing programming style with control based on global application states. Special processes called synchronizers are responsible for gathering process states, constructing application global states and issuing control signals when necessary to application processes. In this paper we show, how this mechanism can be used as a framework for implementing load balancing with process migration using two methods. With the first method, asynchronously served control signals terminate current computations and cause the process state to be sent to another location. Another signal activates a target process and delivers the captured state to it. With the second method we use a special PVM library – Dynamite PVM instead of a standard version. It extends PVM by checkpointing and process migration with full restitution of the process state.
 Title A Distributed Algorithm For Maximum Flow Problem Authors Thuy Lien Pham Ivan Lavallee Marc Bui Si Hoang Do Location Université Paris 8, France Abstract This paper presents an asynchronous distributed algorithm for solving the maximum flow problem which is based on the preflow-push approach of Golberg-Tarjan. Each node initially knows the capacities of outgoing and incoming adjacent edges, the source nodes knows additionally the number of nodes in graph. The nodes in graph execute the same algorithm, and exchange messages with neighbors until the maximum flow is established. The algorithm is applicable in cases of multiple sources and/or targets. The total number of messages required for a graph of V nodes and E edges at most $4V^2E+2VE+10V^2+2E-3V$. It is expected that the algorithm will be ameliorated to adapt to the real-time maximum flow problem.
 Title Towards Massively Parallel Numerical Computations Based on Dynamic SMP Clusters with Communication on the Fly Authors Marek Tudruj Lukasz Masko Location Polish Academy of Science Abstract The paper presents an analysis of the suitability of the architecture of dynamic SMP clusters with communication on the fly for massively parallel computations based on fine grain decomposition of parallel programs. It is assumed that the proposed architecture will be implemented using a highly modular "System on Chip" and "Network on Chip" technology. This technology is considered a very prospective method for embedding very large number of co-operating parallel processors in a single system, thus enabling massively parallel computations. Experimental simulation results are presented in the paper, concerning parallel fine grain implementation of an exemplary typical numerical problem which is matrix multiplication based on recursive matrix decomposition. Selection of the size of parallel computation grain for large problem size is discussed in the paper. Estimations of computation efficiency of the proposed solutions including thousands of parallel processors are presented.
 Title SSOR Preconditioned Conjugent Gradient Implementation in Dynamic SMP Clusters with Communication on the Fly Authors Boguslaw Butrylo Marek Tudruj Lulasz Masko Laurent Nicolas Christian Vollaire Location Bialystok Technical University Institute of Computer Science, Polish Academy of Sciences Institute of Computer Science, Polish Academy of Sciences Centre de Genie Electrique de Lyon Centre de Genie Electrique de Lyon Abstract The paper concerns efficient computation methods for solving sets of linear equations using the preconditioned conjugent gradient method. Efficient parallelization method of the preconditioner program module has been proposed for execution in a new special architecture of dynamically reconfigurable shared memory clusters. The assumed system is provided with novel efficient inter-cluster communication mechanizms called communication on the fly which are very suitable for fine grain parallel computations. The proposed preconditioner implementation includes two step domain decomposition which leads to the decrease of parallel computation grain and overlapping of communication with computations. Comparative experimental results are presented of program execution efficiency in a standard multiprocessor cluster based on message passing and in the architecture of dynamic SMP clusters with communication on the fly.
 Title Scheduling Scheme based on Dedication Rate in Volunteer Computing Environment Authors Eun-Joung Byun Location Dept. of Computer Science and Engineering Korea University, Seoul, Korea Abstract The volunteer node can leave from and join to volunteer computing system freely. Since existing volunteer computing systems do not consider dynamic property (i.e. volatility) such as leave, join and suspension in scheduling, they suffer from interruption of job execution, delay of execution time and increase of total execution time. Therefore, ynamic execution properties of volunteer nodes should be considered in scheduling scheme in order to design a stable and reliable volunteer computing system. In this paper, we propose new scheduling scheme based on Dedication Rate which reflects dynamic properties of volunteer. Our scheduling scheme improves completeness and reliability of execution, and decreases delay time and total execution time. In addition, we describe the implementation of the proposed scheduling scheme in the KOREA@Home and performance evaluation.
 Title Designing Agent Based Travel Support System Authors Minor Gordon Marcin Paprzycki Location Department of Computer Science, Technical University of Berlin, Berlin, Germany Abstract As indicated in the literature, agent-based technology should naturally fit the architecture of a travel support system. Over the last few years we have been considering various aspects of designing such a system, struggling with technologies incapable of supporting our requirements as well as constantly changing computational landscape and, as a result, evolving and improving the design of the system. The aim of this note is to present the current design of a comprehensive framework for delivering personalized travel services using agent infrastructure.
 Title Coordination of Components in a Distributed Discrete­Event System Authors Ahmed Khoumsi Location Department of Electrical and Computer Engineering Universite de Sherbrooke, Canada Abstract We propose a method for coordinating local com­ponents that observe a distributed discrete ­event system R and execute actions depending on the current state of R. Coordination is achieved by allowing the components to communicate with one another. This communication helps the components to distinguish states of R when necessary. This study can for example be applied in distributed supervisory control or conformance testing.

Website maintained by Richard Olejnik and Guilhem Paroux.