Title |
An Artificial Immune
System for Efficient Comparison-Based Diagnosis of Multiprocessor
Systems |
Authors |
Mourad
Elhadef |
Location |
School
of Information Technology and Engineering, University of Ottawa,
Ottawa, Canada |
Abstract |
The
problem of identifying faulty processors (or units) in diagnosable
systems is considered. For the purpose of diagnosis, a system composed
of interconnected independent heterogeneous processors is modeled using
a comparison graph, where tasks are assigned to pairs of processors and
the results are compared. The agreements and disagreements among the
units are the basis for identifying faulty processors. It is assumed
that at most t processors can fail at the same time and that faults are
permanent. In this paper, we introduce a new artificial-immune-based
diagnosis approach using the comparison approach. The new approach has
been implemented and evaluated using randomly generated diagnosable
systems. Simulation results indicate that the immune-based approach is
a viable addition to existing diagnosis approaches. |
|
Title |
A Comparative Simulation
Analysis of P2P System Architectures
|
Authors |
Anjan
Mahanta
Thanaruk Theeramunkong |
Location |
Sirindhorn
International Institute of Technology, Thammasat University, Bangkadi,
Muang, Phathumthani, Thailand
|
Abstract |
This paper
gives a
comparative study on multi-layer file sharing mechanisms in
Peer-to-Peer (P2P) systems. Based on a well-known P2P system named
Gnutella, two system architectures, the 1-layer and 2-layer Ngnu
architectures, are proposed in order to reduce network traffic and to
gain better scalability. These architectures can be applied to
large-scale systems, such as e-government, e-office and e-library,
which need an efficient file-sharing mechanism. To examine the
efficiency and effectiveness of the proposed methods, a number of
simulations are made with respect to three factors:
time-to-live (TTL) values, the number of nodes and the number of
queries. To compare these systems, the number of messages and the mean
query hit are measured and compared with the original flat Gnutella
system. |
|
Title |
An Efficient Load
Balancing Scheme for Grid-based High Performance Scientific Computing
|
Authors |
Arun
Kejariwal
Alexandru Nicolau |
Location |
University
of California at Irvine |
Abstract |
With the
emergence of
computational grids, there has been a dramatic increase in the number
of available processing and storing resources available for parallel
execution of large-scale compute and data intensive scientific
applications. However, large computing power in itself is not
sufficient for high performance computing (HPC). In this context,
(application) partitioning and load balancing strategies play a
critical role in meeting the high performance requirements and in
achieving high processor utilization. In HPC applications such as
molecular simulations, protein synthesis and drug design, parallel
loops constitute the greatest percentage of program parallelism. The
degree to which parallelism can be exploited during parallel execution
of a nested loop directly depends on partitioning and load balance,
i.e., on the number of iterations mapped onto each of the different
processors. Thus, partitioning of
parallel loops is of key importance for grid-based high performance
scientific computing. Although a significant amount of work has been
done in partitioning of iteration spaces of nested loops, both
rectangular and non-rectangular iteration spaces, for homogeneous
multiprocessor systems, the problem of partitioning of iteration spaces
for heterogeneous systems has not been given enough attention so far.
In this paper, we present a geometric approach for partitioning
N-dimensional non-rectangular iteration spaces for optimizing
performance on heterogeneous parallel processor systems. Speedup
measurements for kernels (loop nests) of linear algebra packages are
presented. |
|
Title |
Collaborative Data
Distribution with BitTorrent for Computational Desktop Grids
|
Authors |
Baohua
Wei
Gilles Fedak
Franck Cappello |
Location |
Laboratoire
de Recherche en Informatique/INRIA Futurs |
Abstract |
Data-centric
applications are still a challenging issue for Large Scale Distributed
Computing Systems. The emergence of new protocols and software for
collaborative content distribution over the Internet offers a new
opportunity for efficient and fast delivery of high volumes of data.
This paper presents an evaluation of the BitTorrent protocol for
Computational Desktop Grids. We first present a prototype of a generic
subsystem dedicated to data management and designed to serve as a
building block for any Desktop Grid System. Based on this prototype we
conduct experimentations to evaluate the potential of BitTorrent
compared to a classical approach based on an FTP data server. The
preliminary results obtained with a 65-node cluster measure the basic
characteristics of BitTorrent in terms of latency and bandwidth and
evaluate the scalability of BitTorrent for the delivery of large input
files. Moreover, we show that BitTorrent has a considerable latency
overhead compared to FTP but clearly outperforms FTP when distributing
large files or files to a high number of nodes. Tests on a synthetic
application show that BitTorrent significantly increases the
communication/computation ratio of the applications eligible to run on
a Desktop Grid System. |
|
Title |
Cellular Nonlinear Network
Grids – a new Grid Computing Approach
|
Authors |
Roland
Kunz
Ronald Tetzlaff |
Location |
IBM
Deutschland GmbH, Institute of Applied Physics, University of
Frankfurt, Germany
|
Abstract |
Throughout
this paper a
connection between the universal paradigm of Cellular Nonlinear
Networks (CNN) and the innovative approach of grid computing in the
on-demand era is given. CNN are a massively parallel solution for
solving non-linear problems, for modelling complex phenomena in
medicine, physics and data analysis, as well as for powerful image
processing and recognition systems. They are usually simulated on
local computer systems or built as dedicated VLSI implementations.
However, the study of complex CNN structures and settings requires
massive computing power and can thus benefit from the multi-system
open architectures that the grid approach provides. Propositions of
two different realizations with grid architecture in mind are given by
introducing an algorithm for implementing such methods in a CNN
software simulator. First, a brief introduction to CNN is given.
Afterwards, problems in the current determination of such networks are
discussed, and routes are shown for how this field of research can
benefit from the opportunities grid computing can deliver. |
|
Title |
Load Balancing Algorithm
in Cluster-based RNA secondary structure Prediction
|
Authors |
Guangming
Tan
Shengzhong Feng
Ninghui Sun |
Location |
The
Institute of Computing Technology, Chinese Academy of Sciences |
Abstract |
RNA secondary
structure
prediction remains one of the most compelling, yet elusive areas of
computational biology. Many computational methods have been proposed in
an attempt to predict RNA secondary structures. A popular dynamic
programming (DP) algorithm uses a stochastic context-free grammar to
model RNA secondary structures; its time complexity is O(N^4) and its
space complexity is O(N^3). In this paper, a parallel algorithm is
presented that is time-wise and space-wise optimal with respect to the
usual sequential DP algorithm and can be implemented using O(N^4/P)
time and O(N^3/P) space on a cluster. Highly efficient utilization of
processors and
good load balancing are important to the performance of parallel
algorithms in cluster systems. Two parallel DP algorithms, which have
different mappings of the DP matrix to processors, are evaluated
concerning running time. As experiments show, dynamic mapping of the
DP matrix can achieve better load balancing than static mapping and
improve the efficiency of the processors. Thus, the dynamic mapping
algorithm is
faster and gets better speedups. |
|
Title |
Parallel Jess
|
Authors |
Dana
Petcu |
Location |
Western
University of Timisoara, Romania |
Abstract |
Parallel
production or
rule-based systems, like a parallel version of Jess, are needed for
real applications. The proposed architecture for such a system is based
on a wrapper allowing cooperation between several instances of Jess
running on different computers. The system has been designed with the
final goal of speeding up current P system simulators. Preliminary
tests show its efficiency in this particular case and on classical
benchmarks. |
|
Title |
A Memory-Effective
Fault-Tolerant Routing Strategy for Direct Interconnection Networks
|
Authors |
Maria
Engracia Gomez
Pedro Lopez
Jose Duato |
Location |
Universidad
Politécnica de Valencia |
Abstract |
In massively
parallel
computing systems, high-performance interconnection networks are
crucial. Routing is one of the most important design issues of
interconnection networks. Moreover, the huge amount of hardware of
large machines makes fault-tolerance another important design issue. In
this paper, we propose a mechanism that combines scalable routing
without forwarding tables and fault-tolerance for commercial switches
to build direct regular topologies, which are the topologies used in
large machines. The mechanism is very flexible and the hardware
required is not complex. Furthermore, it allows a high degree of
fault-tolerance while inflicting only a minimal decrease in performance. |
|
Title |
Performance Effective Task
Scheduling Algorithm for Heterogeneous Computing System
|
Authors |
E.
Ilavarasan
P. Thambidurai
R. Mahilmannan |
Location |
Department
of Computer Science & Engineering and Information Technology,
Pondicherry Engineering College, Pondicherry, India
|
Abstract |
Finding an
optimal
solution to the problem of scheduling an application modeled by a
Directed Acyclic Graph (DAG) onto a distributed system is known to be
NP-complete. The complexity of the problem increases when task
scheduling is to be done in a heterogeneous computing system, where the
processors in the network may not be identical and take different
amounts of time to execute the same task. This paper introduces a
Performance Effective Task Scheduling (PETS) Algorithm for Network of
Heterogeneous systems, with complexity O((v + e)(p + log v)), which
provides optimal results for applications represented by DAGs. The
performance of the algorithm is illustrated by comparing the schedule
length, speedup, efficiency and the scheduling time with existing
algorithms such as Heterogeneous Earliest Finish Time (HEFT),
Critical-Path On a Processor (CPOP) and Levelized Matching and
Scheduling. The comparison study based on both
randomly generated graphs and graphs of some real applications such as
Gaussian Elimination, Fast Fourier Transformation and Molecular
Dynamics code shows that the PETS algorithm substantially outperforms
existing algorithms. |
|
Title |
Black Holes Location in
Arbitrary Graphs: Algorithmic Aspect
|
Authors |
Rodrigue-Bertrand
Ossamy |
Location |
LaBRI,
ENSEIRB, University of Bordeaux 1 |
Abstract |
Security,
especially
the attacks performed by hosts to the visiting mobile agents (the
malicious hosts problem), is a major problem in mobile agents systems.
Being the running environment for mobile agents, the host has full
control over them and could easily perform many kinds of attacks
against them. In this work we are especially interested in the presence
of a black hole in the network, that is, a stationary process
that destroys any incoming agent without leaving any trace. The
black hole search problem is that of assembling a team of
asynchronous mobile agents to successfully identify the location of the
black hole. We are concerned with topology-independent
solutions where the network is not necessarily anonymous. We also
establish that the number of agents depends on the initial
placement of the agents and on the number of black holes in the
network. Finally, we prove that in an anonymous network where each
agent has its own homebase, it is not possible to solve the
black hole search problem if the network size is unknown. |
|
Title |
A
Strategyproof Mechanism for Scheduling Divisible Loads in Distributed
Systems |
Authors |
Daniel
Grosu
Thomas E. Carroll |
Location |
Dept.
of Computer Science, Wayne State University, Detroit, Michigan, USA
|
Abstract |
An important
scheduling
problem is the one in which there are no dependencies between tasks and
the tasks can be of arbitrary size. This is known as the divisible load
scheduling problem and was studied extensively in recent years
resulting in a cohesive theory called Divisible Load Theory
(DLT). In this paper we augment the existing divisible load theory with
incentives. We develop a strategyproof mechanism for scheduling
divisible loads in distributed systems assuming a bus type
interconnection and a linear cost model for the processors. The
mechanism provides incentives to processors such that it is beneficial
for them to report their true processing power and process the assigned
load using their full processing capacity. We define the strategyproof
mechanism and prove its properties. We simulate and study the
implementation of the mechanism on systems characterized by different
parameters. |
|
Title |
Measuring and improving
quality of parallel application monitoring based on global states
|
Authors |
Janusz
Borkowski |
Location |
Polish-Japanese
Institute of Information Technology |
Abstract |
Effectiveness
of
parallel/distributed application control based on predicates defined on
application global states depends on the performance of an underlying
Consistent Global State monitoring mechanism. Focusing on Strongly
Consistent Global States (SCGS) usage, we introduce three measures of
SCGS monitoring quality. They reflect delay, state information ageing
and state inspection frequency experienced by a monitor. Using the
measures, four SCGS monitoring algorithms are compared, including two
novel algorithm variants. The comparison is carried out with the use of
simulations. The newly introduced algorithms prove to perform much
better than a standard SCGS algorithm. |
|
Title |
A New Join Algorithm for
Cluster-Based Database Systems
|
Authors |
Kenji
Imasaki
Sivarama Dandamudi |
Location |
School
of Computer Science, Carleton University, Canada |
Abstract |
This paper
focuses on
cluster-based parallel database systems in which only one of the nodes
has the database and the other nodes, which have no initial data, are
used for parallel query processing. In such a system, the load of each
node changes dynamically depending on the activities of the local
users. In addition, in database query processing, data skew exists.
Thus, it is very important to develop efficient load balancing/sharing
algorithms. This paper proposes a new join algorithm called Symmetric
Chunking Hash Join (SCHJ) that divides the hash buckets into chunks and
uses them for load balancing. The SCHJ algorithm is compared with a
dynamic round-robin algorithm and a sampling algorithm. The results
show that the SCHJ algorithm is the best among these algorithms when
there is data skew and background load variation. |
|
Title |
A Compiler-Directed Energy
Saving Strategy for Parallelizing Applications in On-Chip
Multiprocessors |
Authors |
Juan
Chen
Xue-jun Yang
Yong Dong
Dan Wu |
Location |
School
of Computer Science, National University of Defense Technology,
Changsha China |
Abstract |
As energy
consumption
becomes one of the key optimization objectives in on-chip
multiprocessors, parallel compilation techniques that optimize
performance and energy simultaneously play an important role, and
compiling a parallelized application together with an energy saving
strategy becomes all the more significant. In this paper, we focus on
an on-chip multiprocessor architecture in which each processor can
independently scale its frequency and voltage (here we provide several
pre-selected discrete frequency values and voltage levels), which
gives us an efficient energy saving approach. Given an array-intensive
application, we simulate its parallelization and analyze the probable
load imbalance; our energy saving strategy then determines the clock
frequency and voltage of each processor for each parallel fragment
according to the load imbalance. Here, parallel fragments mainly
denote parallel loop nests. Further, we treat the serial code
fragments as a severely load-unbalanced parallel partitioning, so the
redundant processors can be shut down while the serial code executes.
Such adaptive frequency/voltage scaling for the parallel code
fragments and processor shutdown for the serial code fragments form
our main energy saving strategy. Initial experiments show that our
energy saving strategy succeeds in reducing the energy consumption of
parallel programs. |
|
Title |
Enforcing consistency
during the adaptation of a parallel component
|
Authors |
Jérémy
Buisson
Françoise André
Jean-Louis Pazat |
Location |
IRISA/INSA
de Rennes
IRISA/Université de Rennes 1
IRISA/INSA de Rennes |
Abstract |
As Grid
architectures
provide execution environments that are distributed, parallel and
dynamic, applications need to be not only parallel and distributed,
but also able to adapt themselves to their execution environment. This
article presents a model for designing self-adaptable parallel
components that can be assembled to build applications for the Grid. This
model includes the definition of a consistency criterion for the
dynamic adaptation of SPMD components. We propose a solution to
implement this criterion. It has been evaluated on both synthetic and
real codes to exhibit the behavior of the proposed strategies. |
|
Title |
Evolutionary Tuning for
Distributed Database Performance
|
Authors |
Anca
Gog
Horea-Adrian Grebla |
Location |
Babes-Bolyai
University, Department of Computer Science |
Abstract |
Modern
database
management systems require modern design and administration solutions.
In such a system the execution process of the queries implies accurate
estimations and predictions for performance characteristics. The
performance problems of data allocation and query optimization in
distributed database systems, addressed by means of mobile agents and
evolutionary algorithms, are considered. These problems still present a
challenge because of the dynamic changes in the number of components
and the architectural complexity of present-day system topologies. The
distributed system is modeled as a graph structure on which a dynamic
cost vector is defined. The cost vector is kept consistent and
relevant by mobile agents performing cost statistics and vector
updates. An
evolutionary algorithm is proposed to solve this NP-Complete problem.
Experimental results prove the efficiency of the proposed technique. |
|
Title |
Towards Scalable Grid
Replica Optimization Framework
|
Authors |
Marek
Ciglan
Ladislav Hluchý |
Location |
Institute
of Informatics, Slovak Academy of Sciences |
Abstract |
Data Grids
are becoming
a popular infrastructure for distributed, data intensive scientific
applications. Data access optimization is one of the key features of
such systems, as the transfer of and access to huge amounts of data
are often the bottleneck of data intensive tasks. The concept of
replication, the creation of multiple copies of a data source across
multiple grid nodes, was adopted by the grid community to increase data
availability. In this paper, we present the ongoing work on the grid
replica optimization framework that addresses both short-term and
long-term replica optimization problems. We describe the high-level
architecture of the optimization framework and discuss the promising
results achieved by a prototype implementation of several system
components. |
|
Title |
Embedded cost model in
mobile agents for large scale query optimization
|
Authors |
Mohammad
Hussein
Franck Morvan
Abdelkader Hameurlain |
Location |
IRIT
Laboratory, Paul Sabatier University, Toulouse, France
|
Abstract |
In the
context of
large-scale heterogeneous and distributed data sources, the
execution plans generated by traditional optimizers can be sub-optimal
because the estimations are inaccurate, data may be unavailable, and
the execution environment is unstable. To deal with the sub-optimality
of an execution plan, one approach is to execute each relational
operator of the plan with a mobile agent. This mobile agent can
migrate from one site to another in order to execute. The information
stored at the site where the agent is situated can be insufficient to
determine the agent's migration site. In this paper, we investigate
integrating a cost model into the mobile agent executing an operator.
The performance evaluation shows that the cost model embedded in the
agent allows it to choose its most appropriate site of migration. |
|
Title |
GMRES Method on
Lightweight GRID System |
Authors |
Haiwu
He
Guy Bergère
Serge Petiton |
Location |
Laboratoire
d’Informatique Fondamentale de Lille, Université
de Lille I, France
|
Abstract |
Grid
computing
accomplishes high throughput computing by using a very large number of
unexploited computing resources. We present a parallel GMRES method to
solve large sparse linear systems using the lightweight GRID system
XtremWeb. This global computing platform, like many popular GRID
systems, is devoted to multi-parameter generic applications. We have
implemented GMRES, an important algorithm which is one of the key
methods for solving large, non-symmetric linear problems. We discuss
the performance of this algorithm deployed on two XtremWeb networks: a
local network with 128 non-dedicated PCs in Polytech-Lille of the
University of Lille I, and a remote network with 4 clusters of the
SCGN Grid comprising 91 CPUs in total at the High Performance
Computing Center of the University of Tsukuba. We compare this
performance with that of an MPI implementation of GMRES on the same
platform. We present the advantages and shortcomings of our
implementation on this lightweight GRID XtremWeb. |
|
Title |
Progressive Clustering for
Database Distribution on a Grid
|
Authors |
Valerie
Fiolet
Bernard Toursel |
Location |
Service
Informatique, University of Mons-Hainault, Mons, Belgium
Laboratoire d'Informatique Fondamentale de Lille (UMR CNRS 8022),
University of Lille 1, France
|
Abstract |
The
increasing
availability of clusters and grids of workstations makes cheap and
powerful resources available for distributed data mining. But it
appears necessary to devise new algorithms adapted to this kind of
environment, especially with respect to data distribution and the use
of this distribution in processing. Hence the need for an intelligent
fragmentation of data. This distribution could be derived from
clustering. This paper presents a new clustering algorithm called
Progressive Clustering that allows a clustering treatment to be
executed in an efficient and incremental distributed way. The
fragmentation of data resulting from this method could be used to
distribute data mining treatments. |
|
Title |
Managing MPICH-G2 Jobs
with WebCom-G |
Authors |
Padraig
J. ODowd
Adarsh Patil
John P. Morrison |
Location |
Computer
Science Dept, UCC, Cork, Ireland. |
Abstract |
This paper
discusses
the use of WebCom-G to handle the management & scheduling of
MPICH-G2 (MPI) jobs. Users can submit their MPI applications to a
WebCom-G portal via a web interface. WebCom-G will then select the
machines to execute the application on, depending on the machines
available to it and the number of machines requested by the user.
WebCom-G automatically & dynamically constructs a RSL script
with the selected machines and schedules the job for execution on these
machines. Once the MPI application has finished executing, results are
stored on the portal server, where the user can collect them. A main
advantage of this system is fault survival: if any of the machines fail
during the execution of a job, WebCom-G can automatically handle such
failures. Following a machine failure, WebCom-G can create a new RSL
script with the failed machines removed, incorporate new machines (if
they are available) to replace the failed ones and re-launch the job
without any intervention from the user. The probability of failures in
a Grid environment is high, so fault survival becomes an important
issue. |
|
Title |
Adaptive window scheduling
for a hierarchical agent system
|
Authors |
Holly Dail
Frédéric Desprez |
Location |
ENS Lyon, INRIA
|
Abstract |
DIET
(Distributed
Interactive Engineering Toolbox) is a toolbox for the construction of
Network Enabled Server systems. DIET servers provide transparent access
to compute resources; resources can be either a single, interactive
machine where the DIET server runs each request directly on its host,
or batch-managed systems where the DIET server manages request
submission and completion notification. A distributed hierarchy of
scheduling agents connect the servers and are responsible for selecting
servers appropriate to each client request. DIET seeks scalability by
distributing the scheduling process and by keeping resource information
measurement and performance prediction at the server level. DIET has
traditionally offered an online scheduling model whereby all requests
are scheduled immediately or refused. This approach can overload
interactive servers in high-load conditions and does not allow
adaptation of the schedule to task or data dependencies. In this
article we consider an alternative model based on active management of
the flow of requests throughout the system. We have added support for
(1) limiting the number of concurrent requests on interactive servers,
(2) server and agent-level queues, and (3) window-based scheduling
algorithms whereby the request release rate to servers can be
controlled and some re-arrangement of request to host mappings is
possible. We present experiments demonstrating that these approaches
can improve performance and that the overheads introduced are not
significantly different from those of the standard DIET approach. |
|
Title |
A study of meta-scheduling
architectures for high throughput computing: Pull vs. Push
|
Authors |
Vincent
Garonne
Andrei Tsaregorodtsev
Eddy Caron
|
Location |
Centre
de Physique des Particules de Marseille CNRS-IN2P3, France
Centre de Physique des Particules de Marseille CNRS-IN2P3, France
LIP/ENS Lyon, France
|
Abstract |
In this paper
we
present a model and a simulator for large-scale systems. Such platforms
are composed of heterogeneous clusters of PCs belonging to a local
network. These clusters are then connected to each other through a
global network. Moreover each cluster is managed via a local scheduler
and is shared by many users. We validate our simulator by comparing the
experimental results and the analytical results of an M/M/4 queuing
system. These studies indicate that the simulator is consistent. We
then compare it with a real batch system and obtain a mean
error of 10.5% for the response time and 12% for the makespan. We
conclude that our simulator is realistic and describes well the
behavior of a large-scale system. Thus we can study the scheduling of
our system, called DIRAC, in a high throughput context. We justify our
decentralized, adaptive and opportunistic approach in comparison to a
centralized approach in such a context. |
|
Title |
Service Driven Mobile Ad
Hoc Networks Formation and Management
|
Authors |
Dan
Grigoras
Mark Riordan |
Location |
Computer
Science Department, UCC, Cork, Ireland |
Abstract |
Self-organizing
mobile
ad hoc networks (MANET) allow devices to share their services and
resources. MANET applications as diverse as multi-player games,
disaster recovery, cooperative computing and mobile patient monitoring
explain the high interest in these dynamic systems. However, the
formation and management of MANET are challenging tasks due to
network’s features like mobility, heterogeneity, limited
resources and feeble communication. This paper presents a simple,
cost-effective and resilient algorithm for the basic tasks of MANET
creation and management. |
|
Title |
Towards introducing code
mobility on J2ME |
Authors |
Laurentiu
Lucian Petrea
Dan Grigoras
|
Location |
Department
of Computer Science University College Cork, Cork, Ireland
|
Abstract |
This paper
makes the
case that mobile code is an appropriate solution for platforms running
on mobile devices aggregated in mobile ad-hoc networks. It also
presents a solution that considers the introduction of mobile code
using remote class loading in the J2ME/CLDC profile. The principal
problem of remote class loading revolves around trusting the remote
code in accessing system resources. After a review of the algorithms
which are used to accomplish access control, a mechanism for the CLDC
is proposed that is based on the history of access control. |
|
Title |
Ghost Process: a Sound
Basis to Implement Process Duplication, Migration and
Checkpoint/Restart in Linux Clusters
|
Authors |
Geoffroy
Vallee
Renaud Lottiaux
David Margery
Christine Morin
Jean-Yves Berthou
|
Location |
ORNL/INRIA/EDF
IRISA/INRIA
IRISA/INRIA
IRISA/INRIA
EDF R&D |
Abstract |
Process
management
mechanisms (process duplication, migration and checkpoint/restart) are
very useful for high performance and high availability in clustering
systems. The single system image approach aims at providing a global
process management with mechanisms of process checkpoint, process
migration and process duplication. A common mechanism of process
virtualization is very useful in this respect, but traditional
operating systems do not provide such a mechanism. This paper
presents a kernel service for process virtualization called ghost
process, extending the Linux kernel. The ghost process mechanism has
been implemented in the Kerrighed single system image based on Linux. |
|
Title |
A Component-based Software
Infrastructure for Ubiquitous Computing
|
Authors |
Areski
Flissi
Christophe Gransart
Philippe Merle
|
Location |
CNRS/LIFL
INRETS - LEOST
INRIA |
Abstract |
The
proliferation of
mobile devices and the generalized use of wireless networks imply
changes in the design and execution support of distributed software
applications targeting ubiquitous computing. Many strong requirements
have to be addressed: heterogeneity and limited resources of wireless
networks and mobile devices, networked communications between
distributed applications, dynamic discovery and automatic deployment on
mobile devices, etc. In this paper, we present a component-based
software infrastructure to design, discover, deploy, and execute
ubiquitous contextual services, i.e. distributed applications providing
services to mobile end-users but only available from a particular
place. These ubiquitous contextual services are designed as assemblies
of distributed software components. These assemblies are dynamically
discovered according to end-users' physical location and device
capabilities. Then, appropriate assemblies are automatically deployed
on users' devices. This approach (the software infrastructure and a
ubiquitous application example) is implemented on top of the OMG CORBA
Component Model and using the OpenCCM open source platform. |
|
Title |
Data distribution for
failure correlation management in a Peer to Peer Storage
|
Authors |
Cyril
Randriamaro
Olivier Soyez
Gil Utard
Francis Wlazinski |
Location |
Laria
UPJV |
Abstract |
This article
presents a
dynamic algorithm for data storage in a P2P storage system, named Us.
In Us, peers are arranged in groups called meta-peers to take into
account failure correlation. When a peer fails, a reconstruction
process rebuilds lost data with help from other peers. To minimize the
end-user traffic caused by the reconstruction process, the distribution must
take into account a new measure: the maximum disturbance cost of a peer
during the reconstruction process. The disturbance cost is indicated by
the number of data communications which are requested from a single
peer for rebuilding lost data. The main goal of this article is to
define algorithm, that takes into account failure correlation, able to
dilute dynamically the reconstruction process in the system. |
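The disturbance-cost measure described above can be illustrated with a minimal sketch. The fragment-to-replica map, the choice of the first surviving replica as the resend source, and the cost definition below are assumptions made for illustration, not the Us system's implementation:

```python
from collections import Counter

def max_disturbance_cost(fragments, failed_peer):
    """Maximum disturbance cost after one peer failure: for each lost
    fragment, one surviving replica holder must resend it, and a peer's
    cost is the number of such resend requests it receives (an
    illustrative reading of the measure, not the Us algorithm)."""
    requests = Counter()
    for holders in fragments.values():
        if failed_peer in holders:
            survivors = [p for p in holders if p != failed_peer]
            if survivors:
                requests[survivors[0]] += 1  # ask the first surviving replica
    return max(requests.values(), default=0)

# fragments maps a fragment id to the peers holding its replicas.
fragments = {"f1": ["a", "b"], "f2": ["a", "b"], "f3": ["a", "c"]}
```

A distribution that spreads replicas over more peers lowers this maximum, which is the quantity the article's placement algorithm seeks to keep small.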
|
Title |
Extending OBDD Graphs for
Composite Event Matching in Content-based Pub/Sub Systems
|
Authors |
Gang
Xu |
Location |
Technology
Center of Software Engineering, Institute of Software,
Chinese Academy of Sciences, Beijing China |
Abstract |
Content-based
publish/subscribe offers a convenient abstraction for information
producers and consumers, supporting large-scale system design and
evolution by integrating several distributed independent application
systems. Unlike in traditional address-based unicast or multicast,
its core problem is how to match events against predicates on the
content of events. In existing matching approaches, matching predicates
are composed by the conjunction and disjunction of non-semantic
constraints. Although such predicates can match events by their
contents, in the context of enterprise application integration they are
not expressive enough for complex event matching, such as
"one-to-many" and "many-to-one" matching.
Therefore, traditional matching approaches should be extended to solve
complex matching problems. After analyzing information matching
patterns in enterprise application integration, we propose three
matching models, extend simple matching to multi-semantic
matching, and introduce the temporal constraint variable. The
multi-semantic matching allows using different operations in
accordance with different semantics; the temporal constraint variable
supports processing discrete events in temporal sequence. Then,
we extend OBDD graphs into hierarchical coloured OBDD graphs and prove
the equivalence of the transformation. Based on the extended OBDD
graphs, the composite matching algorithm is presented and analysed.
Experiments show that the proposed algorithm is efficient. |
|
Title |
On the Implementation and
Evaluation of Berkeley Sockets on Maestro2 cluster computing environment
|
Authors |
Ricardo
Miguel Costa Guapo
Shinichi Yamagiwa
Leonel Augusto Pires Seabra de Sousa |
Location |
INESC-ID,
Lisboa Portugal |
Abstract |
The support
of "legacy protocols" on cluster environments is important to avoid
rewriting application code, but it must be implemented while maximizing
communication performance. This paper addresses this issue by
implementing the Berkeley sockets interface over the MMP message
passing protocol, which is the lower layer of the Maestro2 cluster
communication network. Experimental results show that MMP-Sockets
offers a minimum latency of 25 µs and a maximum throughput of
1250 Mbps. These values correspond to a relative increase of 80% in
latency and a decrease of about 30% in throughput compared to
communication based only on MMP. However, MMP-Sockets increases the
compatibility and portability of developed applications. |
|
Title |
Active zero-copy: A
performance study of non-deterministic messaging
|
Authors |
Shinichi
Yamagiwa
Keiichi Aoki
Koichi Wada |
Location |
INESC-ID,
Lisboa Portugal
Department of Computer Science, University of Tsukuba,
Tsukuba, Ibaraki, Japan.
|
Abstract |
Zero-copy
communication
exchanges messages among buffers that are allocated and locked in
advance. This communication style fits applications in which the
communication timings and message sizes are known at initialization.
However, applications with non-deterministic messaging cannot fit this
style, because the sizes and timings of their messages change at every
transaction. This paper proposes a new zero-copy communication style
for this kind of application, called active zero-copy, which receives
messages without pre-allocating buffers. A performance evaluation
comparing active zero-copy with conventional zero-copy when applied to
applications with non-deterministic messaging is presented. |
|
Title |
Java Byte Code Scheduling
Based on the Most-Often-Used-Paths in Programs with Branches
|
Authors |
Eryk
Laskowski
Marek Tudruj
Richard Olejnik
Bernard Toursel |
Location |
Institute
of Computer Science, Polish Academy of Sciences
Institute of Computer Science, Polish Academy of Sciences
Laboratoire d’Informatique Fondamentale de Lille, USTL
Laboratoire d’Informatique Fondamentale de Lille, USTL |
Abstract |
The paper
presents an
introductory optimization algorithm, which can be performed before a
Java program is executed in a parallel system. Taking a sequential
multithreaded version of a Java program as input information, the aim
of the parallel program optimization is to determine an initial
distribution of objects on virtual machines so as to decrease direct
inter-object communication and balance loads of the virtual machines.
The object placement optimization algorithm uses a graphical
representation of control dependencies and data dependencies among
methods in the Java program. These dependencies are discovered by an
analysis of program byte code and stored in the form of relevant
macro/dataflow graphs. The placement optimization algorithm tries to
optimally assign the macro nodes to processors (JVMs) so as to reduce
inter-processor communication overheads. The optimization methods first
do clustering of macro nodes on an unlimited number of processors
(logical JVMs) to reduce the execution time of the clustered nodes.
Next, merging of the assigned clusters is performed to reduce the
number of logical JVMs to the number of real processors. The presented
approach is supported by a dynamic, on-line load balancing mechanism,
which applies object migration as proposed in the ADAJ (Adaptive
Distributed Applications in Java) project. |
|
Title |
Task Scheduling for
Dynamic SMP Clusters with Communication on the Fly for Bounded Number
of Resources |
Authors |
Lukasz
Masko |
Location |
Institute
of Computer Science, Polish Academy of Sciences, Poland
|
Abstract |
The paper
presents an
algorithm for scheduling parallel programs in a parallel architecture
based on multiple dynamic SMP clusters. In this architecture,
processors are organized in clusters around shared memory modules. A
set of processors connected to the same memory module constitutes such
a cluster. Processors can be dynamically switched between modules at
runtime. Due to technical constraints, memory modules and processors
are organized in computational units of a fixed size, connected with a
local communication network implemented in Network-on-Chip technology
(NoC module). Processors located in the same module can share their
data and communicate using a technique of data transfers on the fly. A
number of such modules can be connected using a global interconnection
network to form a larger infrastructure. The presented algorithm
schedules an initial macro data-flow program graph for such an
architecture with a given number of NoC modules, considering a fixed
size of a NoC module. First, it distributes program graph nodes among
processors, assuming no division between NoC modules. Then it
structures and schedules all computations and communications to utilize
processor switching and read-on-the-fly facilities. Finally, using a
genetic algorithm, it divides the whole set of processors into subsets
of a given size, which are then mapped to separate NoC modules. |
|
Title |
A
Distributed Prime Sieving Algorithm based on Scheduling by Multiple
Edge Reversal |
Authors |
Gabriel
Paillard
Christian Lavault
Felipe França |
Location |
LIPN
- Université Paris Nord - France
LIPN - Université Paris Nord - France
PESC - Universidade Federal do Rio de Janeiro - Brazil
|
Abstract |
In this
article, we
propose a fully distributed algorithm for finding all primes in a
given interval [2..n] (or (L,R), more generally), based on the SMER
(Scheduling by Multiple Edge Reversal) multigraph dynamics. Given a
multigraph M of arbitrary topology, having N nodes, the SMER-driven
system is defined by the number of directed edges (arcs) between any
two nodes of M, and by the global period length of all "arc reversals"
in M. In the domain of prime number generation, such a graph method
proves quite elegant, and it yields a distributed prime sieving
algorithm of an entirely original design. The maximum number of steps
required by the algorithm is at most n + sqrt(n). Although far beyond
the O(n/(loglog n)) steps required by the improved sequential "wheel
sieve" algorithms, our SMER-based algorithm is fully distributed and of
linear (step) complexity. |
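For context on the complexity figures quoted above, the classical sequential sieve of Eratosthenes over [2..n] can be sketched as follows. This is only the baseline that wheel-sieve variants improve on, not the paper's distributed SMER algorithm:

```python
def primes_up_to(n):
    """Sequential sieve of Eratosthenes over [2..n]."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:        # every composite <= n has a factor <= sqrt(n)
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False  # cross out multiples of p
        p += 1
    return [i for i in range(2, n + 1) if is_prime[i]]
```

The SMER approach replaces this shared crossing-out array with arc reversals between nodes of a multigraph, so each "crossing out" becomes a local, distributed step.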
|
Title |
Profiling Macro Data Flow
Graphs for Parallel Implementation of FDTD Computations
|
Authors |
Adam
Smyk
Marek Tudruj |
Location |
Polish-Japanese
Institute of Information Technology, Warsaw, Poland
Institute of Computer Science, Polish Academy of Sciences, Warsaw,
Poland
|
Abstract |
In this
paper, we
present a methodology that enables designing and profiling macro data
flow graphs that represent computation and communication patterns for
the FDTD (Finite Difference Time Domain) problem in irregular
computational areas. Optimized macro data flow graphs (MDFG) for FDTD
computations are generated in three main phases: generation of an
initial MDFG based on wave propagation area partitioning, MDFG node
merging with load balancing to obtain a given number of macro nodes,
and communication optimization to minimize and/or balance internode data
transmissions. The efficiency of computation for several communication
systems (PVM, RDMA, RDMA RB) is discussed. Relations between
communication optimization algorithms and the overall FDTD computation
efficiency will be shown. Experimental results obtained by simulation
will be presented. |
|
Title |
Load balancing with
migration based on synchronizers in PS-GRADE graphical tool
|
Authors |
M.
Tudruj
J. Borkowski
D. Kopañski |
Location |
Polish-Japanese
Institute of Information Technology, Warsaw, Poland |
Abstract |
Parallel
application
control based on application global states is a new concept realized in
PS-GRADE. PS-GRADE is a graphical environment for parallel programming,
which unifies message passing programming style with control based on
global application states. Special processes called synchronizers are
responsible for gathering process states, constructing application
global states and issuing control signals when necessary to application
processes. In this paper we show, how this mechanism can be used as a
framework for implementing load balancing with process migration using
two methods. With the first method, asynchronously served control
signals terminate current computations and cause the process state to
be sent to another location. Another signal activates a target process
and delivers the captured state to it. With the second method we use a
special PVM library – Dynamite PVM instead of a standard
version. It extends PVM by checkpointing and process migration with
full restitution of the process state. |
|
Title |
A Distributed Algorithm
For Maximum Flow Problem |
Authors |
Thuy
Lien Pham
Ivan Lavallee
Marc Bui
Si Hoang Do
|
Location |
Université
Paris 8, France |
Abstract |
This paper
presents an
asynchronous distributed algorithm for solving the maximum flow
problem, based on the preflow-push approach of Goldberg-Tarjan. Each
node initially knows the capacities of its outgoing and incoming
adjacent edges; the source node additionally knows the number of nodes
in the graph. The nodes in the graph execute the same algorithm and
exchange messages with neighbors until the maximum flow is established.
The algorithm is applicable in cases of multiple sources and/or
targets. The total number of messages required for a graph of V nodes
and E edges is at most $4V^2E+2VE+10V^2+2E-3V$. It is expected that the
algorithm can be extended to adapt to the real-time maximum flow
problem. |
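The preflow-push (Goldberg-Tarjan) scheme that such distributed algorithms build on can be sketched as a generic sequential procedure. This single-process sketch illustrates only the push and relabel operations; the paper's asynchronous message-passing variant is not modeled here:

```python
def max_flow_push_relabel(cap, s, t):
    """Generic sequential Goldberg-Tarjan push-relabel maximum flow.
    cap is an n x n capacity matrix; returns the max flow value s -> t."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    height = [0] * n
    excess = [0] * n
    height[s] = n                      # source starts at height n
    for v in range(n):                 # preflow: saturate all source edges
        if cap[s][v]:
            flow[s][v] = cap[s][v]
            flow[v][s] = -cap[s][v]
            excess[v] = cap[s][v]
            excess[s] -= cap[s][v]

    def residual(u, v):
        return cap[u][v] - flow[u][v]

    def active():
        return [u for u in range(n) if u not in (s, t) and excess[u] > 0]

    nodes = active()
    while nodes:
        u = nodes[0]
        for v in range(n):             # push excess along an admissible edge
            if residual(u, v) > 0 and height[u] == height[v] + 1:
                d = min(excess[u], residual(u, v))
                flow[u][v] += d
                flow[v][u] -= d
                excess[u] -= d
                excess[v] += d
                break
        else:                          # no admissible edge: relabel u
            height[u] = 1 + min(height[v] for v in range(n)
                                if residual(u, v) > 0)
        nodes = active()
    return sum(flow[s][v] for v in range(n))
```

Because push and relabel each touch only one node and its neighbors, the operations localize naturally, which is what makes a distributed, message-passing formulation possible.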
|
Title |
Towards
Massively Parallel
Numerical Computations Based on Dynamic SMP Clusters with Communication
on the Fly |
Authors |
Marek
Tudruj
Lukasz Masko |
Location |
Polish
Academy of Science |
Abstract |
The paper
presents an
analysis of the suitability of the architecture of dynamic SMP clusters
with communication on the fly for massively parallel computations based
on fine grain decomposition of parallel programs. It is assumed that
the proposed architecture will be implemented using a highly modular
"System on Chip" and "Network on Chip" technology. This technology is
considered a very promising method for embedding a very large number of
co-operating parallel processors in a single system, thus enabling
massively parallel computations. Experimental simulation results are
presented in the paper, concerning a parallel fine-grain implementation
of an exemplary numerical problem: matrix
multiplication based on recursive matrix decomposition. Selection of
the size of parallel computation grain for large problem size is
discussed in the paper. Estimations of computation efficiency of the
proposed solutions including thousands of parallel processors are
presented. |
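The recursive matrix decomposition mentioned above can be illustrated by a plain sequential 2x2 block multiply. This sketches only the decomposition idea; the paper's fine-grain parallel mapping onto dynamic SMP clusters is not modeled:

```python
def block_multiply(A, B):
    """Recursive matrix multiplication by 2x2 block decomposition
    (illustrative; matrix sizes are assumed to be powers of two)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def quad(M):
        # Split M into four h x h blocks: M11, M12, M21, M22.
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    # Each block of the result is a sum of two recursive sub-products;
    # the eight sub-products are independent, hence parallelizable.
    C11 = add(block_multiply(A11, B11), block_multiply(A12, B21))
    C12 = add(block_multiply(A11, B12), block_multiply(A12, B22))
    C21 = add(block_multiply(A21, B11), block_multiply(A22, B21))
    C22 = add(block_multiply(A21, B12), block_multiply(A22, B22))
    return [r1 + r2 for r1, r2 in zip(C11, C12)] + \
           [r1 + r2 for r1, r2 in zip(C21, C22)]
```

The eight independent sub-products at each recursion level are what the fine-grain decomposition maps onto separate SMP clusters.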
|
Title |
SSOR Preconditioned Conjugate Gradient
Implementation in Dynamic SMP Clusters with Communication on the Fly
|
Authors |
Boguslaw Butrylo
Marek Tudruj
Lukasz Masko
Laurent Nicolas
Christian Vollaire
|
Location |
Bialystok Technical
University
Institute of Computer Science, Polish Academy of Sciences
Institute of Computer Science, Polish Academy of Sciences
Centre de Genie Electrique de Lyon
Centre de Genie Electrique de Lyon |
Abstract |
The paper
concerns efficient computation methods for solving sets of linear
equations using the preconditioned conjugate gradient method. An
efficient parallelization method for the preconditioner program module
is proposed for execution in a new special architecture of dynamically
reconfigurable shared memory clusters. The assumed system is provided
with novel efficient inter-cluster communication mechanisms, called
communication on the fly, which are very suitable for fine-grain
parallel computations. The proposed preconditioner implementation
includes a two-step domain decomposition, which decreases the parallel
computation grain and overlaps communication with computations.
Comparative experimental results are presented on program execution
efficiency in a standard message-passing multiprocessor cluster and in
the architecture of dynamic SMP clusters with communication on the
fly. |
|
Title |
Scheduling
Scheme based on Dedication Rate in Volunteer Computing Environment |
Authors |
Eun-Joung
Byun
|
Location |
Dept.
of Computer Science and Engineering Korea University, Seoul, Korea
|
Abstract |
A volunteer
node can
freely leave and join a volunteer computing system. Since existing
volunteer computing systems do not consider dynamic properties
(i.e. volatility) such as leaving, joining and suspension in
scheduling, they suffer from interruption of job execution, delay of
execution, and increased total execution time. Therefore, the dynamic
execution properties of volunteer nodes should be considered in the
scheduling scheme in order to design a stable and reliable volunteer
computing system. In this paper, we propose a new scheduling scheme
based on the Dedication Rate, which reflects the dynamic properties of
volunteers. Our scheduling scheme improves completeness and reliability
of execution, and decreases delay time and total execution time. In
addition, we describe the implementation of the proposed scheduling
scheme in KOREA@Home and its performance evaluation. |
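The idea of dispatching work by a volunteer's dedication rate can be illustrated with a minimal sketch. The rate definition (fraction of an observation window the node was available) and the greedy dispatch policy below are assumptions for illustration, not the paper's exact scheme:

```python
def dedication_rate(available_seconds, window_seconds):
    """Fraction of an observation window during which a volunteer node
    stayed available (an assumed, illustrative definition)."""
    return available_seconds / window_seconds

def pick_volunteer(volunteers):
    """Greedy dispatch: choose the volunteer with the highest dedication
    rate for the next task, so jobs land on the most stable nodes."""
    return max(volunteers, key=lambda v: v["rate"])

# Example: volunteer "b" was available 90% of a two-hour window.
volunteers = [
    {"id": "a", "rate": dedication_rate(3600, 7200)},   # 0.5
    {"id": "b", "rate": dedication_rate(6480, 7200)},   # 0.9
]
```

Favoring high-rate nodes reduces the chance a job is interrupted by a departure, which is the delay source the abstract identifies.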
|
Title |
Designing
Agent Based
Travel Support System |
Authors |
Minor Gordon
Marcin Paprzycki
|
Location |
Department of
Computer
Science, Technical University of Berlin, Berlin, Germany
|
Abstract |
As
indicated in the literature, agent-based technology should naturally
fit the architecture of a travel support system. Over the last few
years we have been considering various aspects of designing such a
system, struggling with technologies incapable of supporting our
requirements as well as a constantly changing computational landscape
and, as a result, evolving and improving the design of the system. The
aim of this note is to present the current design of a comprehensive
framework for delivering personalized travel services using agent
infrastructure. |
|
Title |
Coordination of Components
in a Distributed Discrete Event System
|
Authors |
Ahmed
Khoumsi
|
Location |
Department
of Electrical and Computer Engineering Universite de Sherbrooke, Canada
|
Abstract |
We propose a
method for
coordinating local components that observe a distributed
discrete event system R and execute actions depending on the
current state of R. Coordination is achieved by allowing the components
to communicate with one another. This communication helps the
components to distinguish states of R when necessary. This study can
be applied, for example, in distributed supervisory control or
conformance testing. |
|
|
|