Distributed eScience Applications in the DORII Project Test Bed – Architecture and Performance

by Davide Adami, Alexey Cheptsov, Franco Davoli, Bastian Koller, Matteo Lanati, Ioannis Liabotis, Norbert Meyer, Roberto Pugliese, Anastasios Zafeiropoulos, Stefano Vignola, Sandro Zappatore

Abstract—Much interest has arisen recently in the access to and management of remote instrumentation and laboratory equipment in general. The complex of activities related to these topics can be summarized under the name of Remote Instrumentation Services, where the term “instrumentation” includes any kind of experimental equipment, and the term “services” underlines the general framework whereby the instrumental resources should be accessed (i.e., the Service Oriented Architecture). Building on the foundations of previous European projects, the aim of DORII (Deployment of Remote Instrumentation Infrastructure) is to build and operate a test bed addressing different areas of eScience. These include oceanographic applications, earthquake engineering, and large-scale physics experiments on synchrotron light. The paper describes the characteristics and the design of the test bed stemming from the applications’ requirements, in terms of networking and middleware, and its current status of development. It also presents the performance monitoring infrastructure that has been built in DORII and the results concerning a selected application in seismic engineering.

I.          Introduction

Access, configuration, monitoring, control and management of remote laboratory instrumentation have attracted growing interest with the development of so-called e-Science [1-4]. Remotely controlling a device, sending commands and acquiring measurements is not new: it has been done, and is being done, in a whole range of different applications. However, a Remote Instrumentation Service is more than this:

  • It should provide a set of standard capabilities to perform whatever functionality may be required;
  • It should construct suitable abstractions of the remote instrumentation, in order to make it visible as a manageable resource;
  • It should present standard interfaces to the users, allowing them to browse the “distributed laboratory space”, choose different pieces of equipment, configure their interconnection, orchestrate experiment executions, collect, process and analyze the results, and make them available to the scientific community through experiment data repositories organized as Digital Libraries.

In order to accomplish such tasks to a full extent, instruments should become first-class members of a Service Oriented Architecture (SOA), much in the same way as computing and storage devices are. Test sites should be developed, providing: i) isolation from and relative independence of the underlying networking infrastructure; ii) tools for resource allocation and management; iii) standard user interfaces; iv) non-trivial Quality of Service (QoS) control and QoS-aware workflows; v) integration in Grid or in cloud computing architectures. This has been put in perspective within the framework of the Open Grid Services Architecture (OGSA), by presenting enhancements to existing service capabilities [5].

Building on the experience gained in previous European projects (notably, among others, GRIDCC [6], and RINGrid [5], on Remote Instrumentation, Int.EU.Grid [7], on interactivity, and g-Eclipse [8], on software frameworks for application developers), the DORII (Deployment of Remote Instrumentation Infrastructure) project [9] has designed and is currently completing the set up of an extended infrastructure with the direct involvement and cooperation of users in three main areas:

  • Earthquake community (with various sensor networks);
  • Environmental science community;
  • Experimental science community (with synchrotron and free electron lasers).

The goal of the present paper is to highlight the design choices and the current development status of the DORII infrastructure, particularly with respect to the specific application fields, in terms of advanced networking and middleware solutions. Moreover, we highlight the customization and deployment of some performance monitoring tools, and present preliminary results on a selected application in earthquake engineering. The paper is organized as follows. Section 2 describes the applications and points out their QoS requirements stemming from the analysis performed in the initial phase of the project. Section 3 deals with the networking aspects, whereas Section 4 is devoted to the middleware ones. Section 5 presents the deployment of the selected application, and reports some related experimental results. Section 6 contains the conclusions and directions for future development.

 

II.         DORII Applications

DORII applications span a significant range of e-Science domains, each presenting some challenges for the effective exploitation of Grid and networking services. In each of the three groups mentioned in the introduction, there are several differentiated scenarios, which we briefly outline in the following. The goal of DORII is to provide an integrated support environment to such applications, in terms of networking and middleware, by using similar concepts and abstractions, allowing interactivity among scientists and application developers, collaborative working, real-time collection, transfer, manipulation and visualization of data, virtualization and publication of real instrumentation as grid resources.

 

A.    Earthquake Engineering

  • Network-Centric Seismic Simulations (NCSS)

This application aims at performing pseudo-dynamic simulations using sub-structuring. This means that a part of the building being simulated is a “virtual” structure, while another part is a physical specimen placed in a laboratory and equipped with actuators (to apply forces or displacements) and sensors (to measure reactions). The simulation server collects the data provided by the sensors and the calculated response of the virtual building components, putting all together in order to represent this set as a unique structure.

  • Earthquake Early Warning System (EEWS)

An earthquake early warning system provided with the computational capabilities of a grid infrastructure can be used to speed up the calculation of shake maps. In particular, fast shake maps are very useful for damage assessment in a post-seismic scenario, when rescue team operations must be coordinated safely and quickly. A network of seismic sensors should be deployed and connected to a grid infrastructure by means of a wireless network. When an earthquake occurs, all the typical seismic parameters (epicenter, magnitude, ground acceleration, etc.) are estimated and then used to build fragility curves. In the simplest implementation, the application only has to interpolate over a database of pre-calculated cases (computed with a non-trivial effort, eased by the grid) in order to fit the current situation. Alternatively, the map is calculated immediately after the earthquake parameters are recorded.

 

B.    Environmental Science (Ocean and Coastal Monitoring)

  • Oceanographic and coastal observation and modeling

Specific instruments (FLOATS) drift with the current at specific depths and perform temperature and salinity profiles that are transmitted to satellites when emerging to the surface. The data are received by a ground station in Toulouse through the Argos system on-board polar orbiting satellites and are sent by email to the National Institute of Oceanography and Experimental Geophysics (OGS) in Trieste, Italy, every 8 hours. The OGS processing server starts to update the float files with the positions. Graphics with the positions of the floats are produced and status tables are updated and posted on the web. The entire processing up to this stage is automatically activated every 8 hours. The scientific community may access public and restricted data. OGS developers can also access raw data and update processing software. In the future, they will also be able to update floats attributes and parameters.

This system is complemented with a second kind of (steerable) instrument (GLIDER), programmed to follow a specific route and perform temperature, salinity, oxygen, chlorophyll, and turbidity profiles. The glider transmits its data through a satellite link (Iridium) and the data are received at the OGS dock server (as binary files). The processing starts by converting the binary files into ASCII files. Graphics of data are generated and posted on the web. In this application, there is the possibility to interact with the instruments and change the mission parameters.

  • Mediterranean Ecosystem Forecasting - weekly Production of Analyses and Forecasts (OPATM-BFM)

The application provides a complete Mediterranean Marine Ecosystem model, which is being considered in a twofold perspective: i) as an automatic procedure (operational chain), which starts weekly and provides 7-day daily analysis and 10-day daily forecasts for the Mediterranean Marine Ecosystem; ii) for long-term climatic simulations of the Mediterranean Marine Ecosystem. The model is a complex one, coupling physical forcing and transport equations with a biogeochemical flux model.

  • Coastal Observation and Modeling using Imaging (HORUS Bench)

Digital imaging and remote sensing are used for beach monitoring (user distribution, intertidal profile), as well as river monitoring. A remote station consisting of one or more cameras and a cabinet containing a computer collects data and sends them to the local network, by using GPRS/UMTS, Wi-Fi or ADSL.

  • Simulation and Monitoring System for Inland Waters and Reservoirs (SMIWR)

The application taken into account in the DORII project mainly concerns the surveillance of toxic algae bloom. The instrumentation is installed at a water reservoir in Spain to control water quality (with physical and biological measures). The instrumentation should provide information in near real time about the water quality status. In parallel, a simulation model provides a prediction of the evolution of this quality. Both pieces of information are contrasted in a monitoring system, used by the water management authorities to apply corrective and preventive actions.

 

C.    On-line Data Analysis in Experimental Science (ODAES)

Experimental stations in facilities like Synchrotrons and Free Electron Lasers produce huge quantities of data. These data need to be analyzed on-line, which requires considerable computing power and often teamwork. The problem is even more difficult considering the increased efficiency of the light sources and detectors.  Complex calculations are required to take diffraction images and convert them into a 3D protein structure. Similarly, complex calculations are required to produce tomograms and then perform an analysis of the results.

The results of these analyses often need to be visualized by a distributed team and used to modify interactively the data collection strategy. Data from instruments and sensors are saved in distributed repositories, computational models are executed, and an interactive data mining process is eventually used to extract useful knowledge.

This kind of application requires both the support of a standard Grid computing environment, i.e., a Virtual Organization (VO), a set of distributed storage and computing resources and some resource brokering mechanism, a workflow definition and execution environment, and the capability to integrate instruments (the detectors) and interactively collaborate in the data analysis process. A QoS handling mechanism is necessary to use the available network structure effectively.

This application can be actually considered as a group of applications, since each beam-line and experimental station represents a completely different data collection process, with specific processing, storage, analysis, sharing, and visualization requirements.

This area concerns three main lines, developed around synchrotron light experiments. Actors involved in this application are software developers, beam-line scientists, and users of the experimental stations. The developers can perform all the activities of the scientists and also deploy software and define the collective behavior (workflows and scripts). The beam-line scientist can perform all the activities of the users, besides controlling the beam-line instrumentation and the detectors. The user can access and visualize data even remotely, collaborate with other users, beam-line scientists and developers, define the data collection strategy, start/stop data acquisition, and monitor the acquisition and online processing. The user can also run scripts and workflows and monitor their execution, and perform offline processing of the available data.

This application is currently applied to the following beam-lines and experimental stations.

  • SAXS (Small Angle X-ray Scattering)

SAXS has become a well-known standard method to study the structure of various objects in the spatial range from 1 to 1000 nm, and therefore instruments capable of performing such experiments are installed at most synchrotron research centers. The high-flux SAXS beam-line at ELETTRA (Trieste, Italy) is mainly intended for time-resolved studies on fast structural transitions in the sub-millisecond time region in solutions and partly ordered systems, with a SAXS resolution of 1 to 140 nm in real space.

  • SYRMEP (SYnchrotron Radiation for MEdical Physics)

The SYRMEP beam-line has been designed by Sincrotrone Trieste, in cooperation with the University of Trieste and the Italian National Institute of Nuclear Physics (INFN), for research in medical diagnostic radiology. The use of monochromatic and laminar-shaped beams allows, in principle, an improvement of the clinical quality of images and a reduction of the absorbed dose (because of both monochromaticity and scatter reduction). The imaging techniques available at the SYRMEP beam-line are conventional absorption radiology and tomography, phase contrast imaging, and diffraction enhanced imaging.

  • X-Ray Diffraction

The X-Ray Diffraction 1 (XRD1) beam-line has been designed primarily for macromolecular crystallography.

 

The following diagram (Fig. 1) describes how the functional components interact during a typical workflow of the online processing in experimental science. The functional components appearing in the figure are the Virtual Control Room (VCR), the Instrument Element (IE), the Storage Element (SE), and the Computing Element (CE). Whereas SE and CE are “classical” Grid components, the VCR and the IE are a recent addition to Grid middleware, and have been further enhanced within DORII.

 

 

Fig. 1. Functional components and workflow in ODAES.

III.        Networking

A.    Network Application Requirements and Network Infrastructure

The project has conducted an in-depth analysis of applications’ networking requirements [10]. When looking at DORII applications from the network requirements perspective, they can be divided into two major groups: i) applications that process pre-collected data: NCSS, FLOAT, GLIDER, OPATM-BFM and HORUS belong to this category; ii) applications working on data acquired in real-time: these include EEWS, SMIWR, and the three ODAES cases. Moreover, it is necessary to take into account that in most instances data processed by the applications are acquired in real-time by using a sensor network.

Applications in both classes are characterized by a point-to-point communication paradigm and, in general, do not use data replication. Therefore, multicast or point-to-multipoint LSPs (Label Switched Paths) are not necessary. The main QoS requirements that have been identified are summarized in Table 1.

Moreover, the following characteristics can be outlined:

1)                   Applications that process pre-collected data

  • Bandwidth requirements cover a wide range – from 1 to 100 Mbps, but there is no need of very large data pipes (in the order of a few Gbps);
  • All applications require low packet loss, but (with the exception of one) they can tolerate some delay and jitter;
  • The nature of the majority of traffic sources is VBR (Variable Bit Rate), i.e., the sources are bursty.

2)                  Applications working on data acquired in real time

  • Multicast or point-to-multipoint LSPs are not necessary;
  • Again, bandwidth requirements cover a wide range, but no large data pipes are needed;
  • There are variable requirements in terms of packet loss (from very low to high) and jitter (from none to high) that may be tolerated;
  • The nature of the traffic is bursty.

In both cases, we observe that bandwidth reservation with book-ahead scheduling may be useful and should indeed be applied in some instances. High reliability is always required. Local access networks (LANs) will be upgraded to the Gigabit level and, wherever possible, traffic priority mechanisms will be introduced. Layer 2 VPNs and, in some cases, point-to-point guaranteed bandwidth connections should be implemented in the backbone. Bandwidth on Demand (BoD) and the adoption of IPv6 are not strictly necessary, but they are currently being tested in our test bed for some applications.

 

Application   Bandwidth (Mb/s)   Throughput (Mb/s)   Delay (ms)   Jitter   Packet Loss   Path Reliability
NCSS          1                  1                   5            Low      Very Low      No
EEWS          1                  1                   5            High     Very Low      No
FLOAT         10                 10                  > 30                  Very Low      No
GLIDER        10                 10                  > 30         N/A      Very Low      No
OPATM-BFM     10                 10                  > 30         High     Very Low      Yes
HORUS         10-100             10-100              > 30         High     Very Low      No
SMIWR         10                 100                 > 30         High     High          No
ODAES         100                100                 > 30         No       No

Table 1. QoS requirements for DORII applications.
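As an illustration of how the requirements in Table 1 might be consumed by a provisioning or monitoring script, the following minimal Python sketch encodes the rows as records and selects the applications with a tight delay bound. The record type, the field names and the helper `delay_critical` are hypothetical; empty cells of the table are kept as `None` rather than guessed.

```python
from typing import Dict, List, NamedTuple, Optional


class QoS(NamedTuple):
    """One row of Table 1; values are kept as printed in the table."""
    bandwidth_mbps: str
    throughput_mbps: str
    delay_ms: str
    jitter: Optional[str]
    packet_loss: Optional[str]
    path_reliability: Optional[str]


# Empty table cells are stored as None (assumption: no value was specified).
TABLE_1: Dict[str, QoS] = {
    "NCSS": QoS("1", "1", "5", "Low", "Very Low", "No"),
    "EEWS": QoS("1", "1", "5", "High", "Very Low", "No"),
    "FLOAT": QoS("10", "10", "> 30", None, "Very Low", "No"),
    "GLIDER": QoS("10", "10", "> 30", "N/A", "Very Low", "No"),
    "OPATM-BFM": QoS("10", "10", "> 30", "High", "Very Low", "Yes"),
    "HORUS": QoS("10-100", "10-100", "> 30", "High", "Very Low", "No"),
    "SMIWR": QoS("10", "100", "> 30", "High", "High", "No"),
    "ODAES": QoS("100", "100", "> 30", "No", "No", None),
}


def delay_critical(table: Dict[str, QoS]) -> List[str]:
    """Return the applications whose delay entry is a fixed bound, not '> 30'."""
    return sorted(name for name, q in table.items()
                  if not q.delay_ms.startswith(">"))


if __name__ == "__main__":
    print(delay_critical(TABLE_1))
```

With the values of Table 1, only NCSS and EEWS are selected, i.e., the two earthquake engineering applications.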

 

In general, before activating QoS functionalities, a careful monitoring of the network traffic generated by the applications and of the main factors affecting Quality of Experience (QoE) – which might in turn determine QoS requirements – should be performed under the best-effort service. Therefore, a set of monitoring tools (briefly described in the next sub-section) has been set up and configured to perform this task.

A sketch of DORII connectivity and monitoring organization is depicted in Figure 2.

 

Fig. 2. DORII network infrastructure and deployment of monitoring tools (USTUTT, LMU, PSNC, GRNET, ELETTRA, OGS, EUCENTRE, CNIT, CSIC, UC and ECO are DORII partners, most of which are connected to the respective NREN, National Research and Education Network; see http://www.dorii.eu).

B.    Network Monitoring and Management

The network monitoring infrastructure deployed for the DORII project consists of the following tools:

  • Smokeping, for network latency measurement;
  • Pathload, for the estimation of the available bandwidth along a network path;
  • SNMP-based Web applications, for monitoring network interface utilization;
  • Nagios, for monitoring the status of infrastructure resources and services.

B.1   Smokeping

Smokeping [11] is a software tool that can be used to measure the network latency. More specifically, a Smokeping probe sends test packets out to the network and measures the amount of time they need to travel to a target host node and back. The RRDtool [12] is used to maintain a long-term data-store with latency measurement, and the presentation of the data on the web is done by means of a CGI with some AJAX capabilities for interactive graph exploration.
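The kind of summary derived from one round of latency probes can be sketched as follows. This is only an illustration in Python, not Smokeping code: the function name `summarize_rtts` and the convention of marking a lost probe with `None` are assumptions made for the example.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_rtts(samples: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one round of round-trip-time probes (milliseconds).

    A value of None marks a probe for which no reply was received.
    """
    if not samples:
        raise ValueError("at least one probe is required")
    received = [s for s in samples if s is not None]
    lost = len(samples) - len(received)
    return {
        "sent": float(len(samples)),
        "loss_pct": 100.0 * lost / len(samples),
        # Median and extremes are undefined if every probe was lost.
        "median_ms": statistics.median(received) if received else None,
        "min_ms": min(received) if received else None,
        "max_ms": max(received) if received else None,
    }


if __name__ == "__main__":
    print(summarize_rtts([10.0, 12.0, None, 11.0]))
```

Storing the median together with the loss percentage and the spread of the samples gives a compact picture of both the typical latency and its variability over time.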

In the framework of the DORII project, Smokeping is used in master/slave mode: this way, Smokeping probes (slaves) are allowed to run remotely and to perform latency measurements from multiple locations to the target hosts.

As shown in Fig. 2, Smokeping has been deployed as follows:

  • The Smokeping master is located at CNIT [13]; it maintains a configuration file with a specific section for each slave, and it stores and presents all monitoring data collected by the slaves.
  • Remote probes (Smokeping slaves) have been installed at the DORII partners’ sites EUCENTRE, ELETTRA, GRNET, CSIC-IFCA, OGS and PSNC. Based on the settings contained in the configuration file retrieved from the master (e.g., measurement utility, target host address, measurement length and period), each slave performs latency measurements and sends the results back to the master server over HTTP. In the DORII network infrastructure, the following targets have been identified: Computing Elements (CEs), Storage Elements (SEs), Instrument Elements (IEs), and remote sites’ Access Gateways (AGs).

B.2   Pathload

Pathload [14] is a monitoring tool that estimates the available bandwidth of a network path. The basic idea behind Pathload is that the one-way delays of a periodic packet stream show an increasing trend when the stream rate is larger than the available bandwidth.
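The principle can be illustrated with a minimal Python sketch that detects an increasing trend in a series of one-way delays and uses it to bracket the available bandwidth by bisection on the probing rate. This is not the Pathload implementation: the two trend statistics are simplified, the thresholds are placeholder values chosen for the example, and `simulated_probe` is a hypothetical stand-in for a real packet stream.

```python
from typing import Callable, List, Sequence, Tuple


def pairwise_increase_ratio(delays: Sequence[float]) -> float:
    """Fraction of consecutive delay pairs in which the delay grows."""
    if len(delays) < 2:
        raise ValueError("at least two delay samples are required")
    rising = sum(1 for a, b in zip(delays, delays[1:]) if b > a)
    return rising / (len(delays) - 1)


def net_change_ratio(delays: Sequence[float]) -> float:
    """Net delay change divided by the sum of absolute step changes."""
    total = sum(abs(b - a) for a, b in zip(delays, delays[1:]))
    return 0.0 if total == 0 else (delays[-1] - delays[0]) / total


def shows_increasing_trend(delays: Sequence[float],
                           ratio_threshold: float = 0.55,
                           change_threshold: float = 0.4) -> bool:
    # Thresholds are illustrative assumptions, not Pathload's settings.
    return (pairwise_increase_ratio(delays) > ratio_threshold
            or net_change_ratio(delays) > change_threshold)


def bracket_available_bw(probe: Callable[[float], List[float]],
                         low: float, high: float,
                         resolution: float = 1.0) -> Tuple[float, float]:
    """Bisect on the stream rate until the bracket is narrower than resolution."""
    while high - low > resolution:
        rate = (low + high) / 2
        if shows_increasing_trend(probe(rate)):
            high = rate   # stream rate exceeds the available bandwidth
        else:
            low = rate    # the path absorbed the stream without queue build-up
    return low, high


def simulated_probe(available: float) -> Callable[[float], List[float]]:
    """Hypothetical path model: delays grow only when rate > available."""
    def probe(rate: float, packets: int = 20) -> List[float]:
        growth = max(0.0, rate - available) * 0.01
        return [1.0 + growth * k for k in range(packets)]
    return probe


if __name__ == "__main__":
    print(bracket_available_bw(simulated_probe(40.0), 0.0, 100.0))
```

On the simulated path with 40 Mb/s available, the bisection returns a bracket narrower than 1 Mb/s that contains the true value.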

Pathload is based on a client-server architecture and consists of two main components:

  • pathload_snd, which listens on TCP port 55002 and acts as a traffic generator;
  • pathload_rcv, which starts a Pathload session and acts as a traffic receiver.

Pathload has been customized for the DORII project. In addition to the previous components, some scripts have been introduced to monitor the status of the sender and receiver processes and to automatically export the measurement data collected by the receiver to the management station located at CNIT via HTTP. The tool has been installed as follows (see Fig. 2):

  • Pathload_sender: at each site where CEs and/or SEs of the DORII e-Infrastructure are located (GRNET, PSNC, CSIC-IFCA);
  • Pathload_receiver: at each site where IEs are deployed (EUCENTRE, OGS, ELETTRA, UC, etc.) and, therefore, DORII applications are running.

This way, the bandwidth available from the sites hosting CEs and SEs to the sites with applications (IEs and VCR) can be estimated.

B.3   SNMP-based network monitoring

Various applications exist to collect and consolidate network usage information. At a basic level, such applications (also called managers) use the Simple Network Management Protocol (SNMP) to read statistics from each monitored device (router or host) where an SNMP agent is configured and running. A standard Management Information Base (MIB) collects counters of the number of datagrams and bytes sent and received on each interface of a device, and it also gives the number of packets discarded because of congestion. An SNMP application can periodically poll each device and convert the returned information into a view of usage across the whole network. SNMP can also help identify network interface failures or outage conditions. In the framework of DORII, SNMP must be enabled on IEs, CEs, SEs and routers. Data are collected by an SNMP manager and interfaced with a Web server by using ad-hoc CGI programs.
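The conversion from polled byte counters to interface utilization can be sketched as follows. The Python code below is a minimal illustration, not the DORII CGI programs: it assumes that two successive readings of an octet counter (such as ifInOctets) and the interface speed (such as ifSpeed) have already been obtained from the agent, and the function names are hypothetical. The modulo operation handles a single wrap of a 32-bit counter between polls.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two counter readings, tolerating one wrap-around."""
    return (current - previous) % (1 << bits)


def utilization_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, if_speed_bps: float,
                    bits: int = 32) -> float:
    """Average utilization of one direction of an interface over a poll interval."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    transferred_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return 100.0 * transferred_bits / (interval_s * if_speed_bps)


if __name__ == "__main__":
    # 12.5 MB received in 10 s on a 100 Mb/s interface.
    print(utilization_pct(0, 12_500_000, 10.0, 100_000_000))
```

The polling period should be short enough that a counter cannot wrap more than once between two readings; on fast links this is a common reason to prefer 64-bit counters where the agent provides them.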

B.4   Nagios

Nagios [15] is an open source monitoring system providing comprehensive and scalable monitoring of all mission-critical infrastructure components, including applications, services, operating system and system metrics or network protocols and infrastructure. Nagios is integrated in the monitoring framework of the DORII e-Infrastructure providing information on problems and failures related to the computational, storage and instrument resources of this infrastructure, and monitoring services such as the CE, SE, BDII, WMS, and IE.

IV.        e-Infrastructure and Middleware Architecture

One of the main requirements posed by applications in many strategic areas of science and technology (such as those specified by ESFRI, the European Strategy Forum on Research Infrastructures [16]) is a service-oriented IT architecture that allows users to manage, maintain and exploit diverse instrumentation and acquisition devices, together with the heterogeneous computation and storage facilities provided by the traditional Grid. Such an architecture is similar to those set up by EGEE (Enabling Grids for E-sciencE) [17], DEISA (Distributed European Infrastructure for Supercomputing Applications) [18] and many other Grid projects. Unlike the traditional Grid, the e-Infrastructure should practically enable access to remote instrumentation in high-performance computing and storage environments, and allow users and their applications to get easy and secure access to various remote instrumentation resources, supported by high-performance Grid computation and storage facilities. The e-Infrastructure does that by providing standardized services to access integrated instrumentation resources (including expensive experimental equipment, but also smaller network-connected sensors and mobile devices), in a unified way with the traditional Grid services (as provided, e.g., by gLite [19]).

At the time of writing, the DORII e-Infrastructure consists of 9 sites offering computational and storage resources distributed among the partners of the project. Table 2 shows the sites that support the catch-all DORII VO, where most of the DORII applications have been deployed:

 

Country   Partner Name   Site Name        CPU Cores   Storage (TB)
Poland    PSNC           PSNC             1068        16
Spain     CSIC           IFCA-CSIC        372         107
                         IFCA-I2G*        372*        107*
Italy     ELETTRA        ELETTRA          80          0.1
Greece    GRNET          HG-01-GRNET      64          4.78
                         HG-02-IASA       118         3.14
                         HG-03-AUTH       120         3.13
                         HG-04-CTI-CEID   114         2.87
                         HG-05-FORTH      120         2.33
                         HG-06-EKT        228         7.76
Totals                                    2284        147.11

Table 2. vo.dorii.eu computational and storage resources.
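The totals of Table 2 can be cross-checked with a few lines of Python. The sketch below simply re-adds the per-site figures; the tuple layout and the function name `totals` are choices made for the example. The check suggests that the starred IFCA-I2G entry is not counted separately in the totals, which is an inference from the arithmetic rather than something stated in the text.

```python
from typing import List, Tuple

# (country, site, cpu_cores, storage_tb, starred), copied from Table 2.
SITES: List[Tuple[str, str, int, float, bool]] = [
    ("Poland", "PSNC", 1068, 16.0, False),
    ("Spain", "IFCA-CSIC", 372, 107.0, False),
    ("Spain", "IFCA-I2G", 372, 107.0, True),
    ("Italy", "ELETTRA", 80, 0.1, False),
    ("Greece", "HG-01-GRNET", 64, 4.78, False),
    ("Greece", "HG-02-IASA", 118, 3.14, False),
    ("Greece", "HG-03-AUTH", 120, 3.13, False),
    ("Greece", "HG-04-CTI-CEID", 114, 2.87, False),
    ("Greece", "HG-05-FORTH", 120, 2.33, False),
    ("Greece", "HG-06-EKT", 228, 7.76, False),
]


def totals(sites: List[Tuple[str, str, int, float, bool]],
           include_starred: bool = False) -> Tuple[int, int, float]:
    """Return (number of sites, total CPU cores, total storage in TB)."""
    rows = [s for s in sites if include_starred or not s[4]]
    cores = sum(s[2] for s in rows)
    storage = round(sum(s[3] for s in rows), 2)
    return len(rows), cores, storage


if __name__ == "__main__":
    print(totals(SITES))
```

Excluding the starred row, the sums match the table: 9 sites, 2284 cores and 147.11 TB.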

The gLite middleware was also chosen in DORII, and therefore composes the core of the middleware architecture, providing a set of Information (BDII), Job Management (CE, WMS), Data Management (SE, LFC) and Security (VOMS, MyProxy) Services. In DORII, a special service has been developed (the Instrument Service) that extends the basic gLite middleware towards interconnecting an instrument/sensor with the infrastructure and providing remote access to it from applications. The Instrument Element (IE) middleware, originally conceived in GRIDCC [20] and re-developed within DORII, represents the virtualization of data sources and enables interactive interfacing with, and control of, the instruments integrated into the e-Infrastructure.

From the user perspective, the middleware architecture for the e-Infrastructure can be summarized as presented in Fig. 3.

 

Fig. 3. The DORII middleware architecture.

The architecture is composed of two types of components: grid services for interconnecting instruments/sensors with the infrastructure (Instrument Services) that are DORII extensions of the basic EGEE gLite middleware, and user-level middleware services that are not a part of the infrastructure but aim at facilitating the fulfilment of the extra application requirements to the infrastructure.

The user-level middleware components make up a considerable part of the architecture. They aim at promoting the e-Infrastructure services to the users and ensuring the efficient development, deployment and use of the applications in grid environments. The Virtual Control Room (VCR) [21] is the central front-end for the e-Infrastructure. The VCR is a richly featured web portal that provides an intuitive and user-friendly interface, and it serves as a complete desktop environment for the e-Infrastructure end-users (Fig. 4).

Some of the e-Infrastructure’s applications perform complex experiments described through workflows. To simplify the design of workflows by the users, as well as their submission for execution and their monitoring at run time, a Workflow Management System (WfMS) is included in the architecture. The WfMS launcher is also provided by the VCR.

 

 

Fig. 4. The e-Infrastructure front-end: the VCR.

 

To support application development activities, the architecture also includes g-Eclipse, an integrated development framework for the Grid. g-Eclipse offers sustainable support for the full development cycle of Grid applications, including code deployment, remote compilation and debugging, etc. [22].

Access from the user-level middleware components to the grid services deployed on the infrastructure (gLite and Instrument Services) is facilitated by means of the Common Library (CL), which is a lightweight grid service provider that enables unified access to the grid services set up on the infrastructure. The CL is designed in a way that ensures components’ interoperability within the architecture and simplifies the installation of the user-level middleware.

To support specific application requirements on interactivity and visualization, as well as the improved execution of parallel MPI applications on the e-Infrastructure, the following solutions are additionally included in the middleware architecture: GLogin (transport layer service provider used for opening an interactive session between the infrastructure and users), GVid (video streaming service intended for grid-based visualizations in scientific applications), Open MPI (communication library for MPI applications), and MPI-Start (tool for improved running of parallel applications on the infrastructure).

 

V.         Deployment of a selected application and related experimental results

A.      The application

The EEWS (Earthquake Early Warning System) aims at recording seismic data from sensors, possibly in real time, and at processing them in order to extract the time histories of ground velocity, ground acceleration and displacement. This is the starting point to calculate some parameters widely adopted in the seismic community, such as the acceleration Fourier amplitude spectrum and the acceleration response spectrum (useful to evaluate a building's response to the force imposed by the earthquake). The rationale of this application is to provide scientists with a unified environment mixing access to instruments and computational tools, speeding up the analysis carried out after a seismic event.
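The relation between the three time histories and the Fourier amplitude spectrum can be illustrated with a small Python sketch. It is not the EEWS processing code: it assumes a uniformly sampled velocity record, uses plain finite differences and trapezoidal integration, scales the discrete Fourier transform by the sampling interval (one common convention), and omits the baseline correction and filtering that real seismic records normally require. All function names are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def differentiate(samples: Sequence[float], dt: float) -> List[float]:
    """Velocity -> acceleration: central differences, one-sided at the ends."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    result = []
    for i in range(n):
        if i == 0:
            result.append((samples[1] - samples[0]) / dt)
        elif i == n - 1:
            result.append((samples[-1] - samples[-2]) / dt)
        else:
            result.append((samples[i + 1] - samples[i - 1]) / (2 * dt))
    return result


def integrate(samples: Sequence[float], dt: float, initial: float = 0.0) -> List[float]:
    """Velocity -> displacement: cumulative trapezoidal rule."""
    result = [initial]
    for a, b in zip(samples, samples[1:]):
        result.append(result[-1] + 0.5 * (a + b) * dt)
    return result


def fourier_amplitude(samples: Sequence[float], dt: float) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, amplitude) pairs up to the Nyquist frequency.

    A direct O(n^2) transform, adequate only for short illustrative records.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum.append((k / (n * dt), abs(total) * dt))
    return spectrum


if __name__ == "__main__":
    velocity = [0.0, 1.0, 2.0, 3.0, 4.0]      # v(t) = 2 t, sampled every 0.5 s
    print(differentiate(velocity, 0.5))       # constant acceleration
    print(integrate(velocity, 0.5))           # displacement t^2
```

For the linear velocity ramp in the example, the differences give a constant acceleration and the trapezoidal rule reproduces the quadratic displacement exactly, which makes the sketch easy to check by hand.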

All the operations are performed remotely and on the grid, employing the DORII infrastructure. The VCR plays the role of the user interface: all the actions are carried out and all the resources are accessed through this web portal.

The IE is located at EUCENTRE (Pavia, Italy) and hosts the Instrument Manager (IM) that accesses the server collecting data from the sensors. The server is in Genoa, Italy, while the seismic sensors are spread over the Liguria Region. Measurements from each channel are time-stamped and saved locally on the IE in a separate file, then moved to an SE. Finally, a CE retrieves each file from the SE and performs the computation. Since the same computation is repeated for each input file, the job is parametric, where the parameters are the file names. The JDL (Job Description Language) file characterizing the job is created using a VCR application, which is a Jython script that customizes a given template. The user is only asked to select the input folder on the appropriate SE. The output is downloaded to the home folder on the VCR. An alternative approach is represented by the Workflow Manager, a user-friendly graphical interface to specify the parameters.
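The template customization step can be sketched in plain Python (the same code would also run under Jython). The example below is not the DORII script: the template text, the executable name `process_channel.sh` and the function `build_parametric_jdl` are hypothetical, and the JDL attribute names (`JobType`, `Parameters`, the `_PARAM_` placeholder) follow the gLite parametric job conventions as we understand them, so they should be checked against the WMS documentation before use.

```python
from string import Template
from typing import Sequence

# Hypothetical template; _PARAM_ is expected to be replaced by the WMS
# with each entry of the Parameters list.
JDL_TEMPLATE = Template("""[
  JobType = "Parametric";
  Executable = "$executable";
  Arguments = "_PARAM_";
  Parameters = {$parameters};
  StdOutput = "std_PARAM_.out";
  StdError = "std_PARAM_.err";
  OutputSandbox = {"std_PARAM_.out", "std_PARAM_.err"};
]""")


def build_parametric_jdl(file_names: Sequence[str],
                         executable: str = "process_channel.sh") -> str:
    """Fill the template with one parameter per input file name."""
    if not file_names:
        raise ValueError("at least one input file is required")
    return JDL_TEMPLATE.substitute(
        executable=executable,
        parameters=", ".join(file_names),
    )


if __name__ == "__main__":
    print(build_parametric_jdl(["ch1.dat", "ch2.dat", "ch3.dat"]))
```

Keeping the job description in a template means that the user only supplies the list of input files, which matches the interaction described above.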

B.      The instruments

A set of seismic sensors is connected to a central point over UDP, by means of wired or wireless links. Each device measures the ground velocity along the three Cartesian directions, so each station broadcasts at least three channels, plus state-of-health information. All the data are gathered by a central server, which handles duplicated and out-of-order packets. The reconstructed stream is stored and made available as the DAT service: the user can access the historical data series by specifying the starting point of the information flow and the time window of interest. Conversely, if a user or an application needs near-real-time access, the NAQ service is more suitable. Provided that some parameters are tuned properly, the service can be configured to forward the original packets from the instruments to the application, minimising the delay. This is, in fact, the meaning of near-real-time acquisition: seismic sensors send measurements to the central server as soon as possible, but the server has to store them, organizing them in data structures called “bundles” and “packets”. These operations take some time, although limited to a few seconds, depending on the distance between the server and the client.

Packets are uniquely identified by a sequence number and a time stamp; each includes an odd number, from 1 to 255, of 17-byte bundles. This solution allows the packet size to be adapted to the network. Moreover, the data contained in a packet are homogeneous: for example, the measurements of a particular channel or the status information of a station. The first bundle in a packet always acts as a header, specifying details such as the station’s unique ID, the time stamp, the sampling frequency and the sequence number. The following bundles carry only data. State-of-health messages are strings, while measurements are integer values, in compressed or uncompressed format.
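The size constraints stated above can be made concrete with a short sketch. It only encodes what the text says (17-byte bundles, an odd count between 1 and 255, the first bundle acting as header); the function names are hypothetical, and the internal layout of the header fields is not specified here, so it is left unparsed.

```python
BUNDLE_SIZE = 17    # bytes per bundle, as stated in the text
MAX_BUNDLES = 255   # largest bundle count allowed in one packet


def packet_size(n_bundles):
    """Return the packet length in bytes for a valid (odd, 1..255) bundle count."""
    if not 1 <= n_bundles <= MAX_BUNDLES or n_bundles % 2 == 0:
        raise ValueError("bundle count must be an odd number from 1 to 255")
    return n_bundles * BUNDLE_SIZE


def split_packet(packet):
    """Split a raw packet into (header_bundle, list_of_data_bundles)."""
    n_bundles, remainder = divmod(len(packet), BUNDLE_SIZE)
    if remainder != 0:
        raise ValueError("packet length is not a multiple of the bundle size")
    packet_size(n_bundles)  # raises if the bundle count is not allowed
    bundles = [packet[i:i + BUNDLE_SIZE] for i in range(0, len(packet), BUNDLE_SIZE)]
    return bundles[0], bundles[1:]


if __name__ == "__main__":
    print(packet_size(1), packet_size(255))  # smallest and largest packet sizes
```

Under these constraints a packet is between 17 and 4335 bytes long, which is the range over which the packet size can be adapted to the network.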

C.     The Instrument Manager

The information flow is provided to the IM over a TCP connection, either as a time series or as a transparent serial stream. A transparent serial stream handles only the uncompressed format and all its packets have the same length, with padding where necessary. Conversely, a time series stream can also deal with compressed data, and its packet size may vary. Both streams also exist in a buffered version, which makes it possible to retrieve additional packets that precede the beginning of the subscription, moving the starting point slightly into the past. Finally, one of the most important parameters is the Short Term Completion (STC) time: the time interval, from 0 to 300 seconds, that the server waits in order to fill gaps in the stream with retransmitted packets.
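The subscription options listed above can be summarized as a small configuration object. This is a minimal sketch under stated assumptions: the class and field names are hypothetical and do not come from the server manufacturer's interface; only the constraints (no compression on transparent serial streams, STC between 0 and 300 seconds) are taken from the text.

```python
from dataclasses import dataclass

STREAM_TYPES = ("time_series", "transparent_serial")  # hypothetical labels


@dataclass(frozen=True)
class StreamSubscription:
    """Hypothetical container for the subscription parameters described in the text."""
    stream_type: str
    compressed: bool
    buffered: bool
    stc_seconds: int

    def __post_init__(self):
        if self.stream_type not in STREAM_TYPES:
            raise ValueError("unknown stream type")
        if self.stream_type == "transparent_serial" and self.compressed:
            raise ValueError("transparent serial streams carry only uncompressed data")
        if not 0 <= self.stc_seconds <= 300:
            raise ValueError("STC must be between 0 and 300 seconds")


if __name__ == "__main__":
    # An unbuffered, compressed time series subscription with STC set to zero.
    print(StreamSubscription("time_series", compressed=True, buffered=False, stc_seconds=0))
```

Validating the combination at construction time keeps inconsistent requests, such as a compressed transparent serial stream, from ever reaching the server.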

Our goal is to read seismic data in near-real-time, so we chose to subscribe to a compressed time series stream (as it is more efficient) in its unbuffered version, with STC disabled. This means that measurements are not guaranteed to arrive in order, because of errors and losses, so gaps are filled by interpolation and delayed packets are discarded. This approach guarantees a continuous flow, a fundamental feature for the subsequent computation, while complying with the strict time constraint. All the operations are performed by a Java client library developed for the VCR architecture on the basis of the user’s manual provided by the server manufacturer.
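The gap-handling policy described above can be illustrated with a short sketch. It is not the Java client library mentioned in the text: the function name is hypothetical, each sample is assumed to carry an integer position in the stream, and linear interpolation is an assumption, since the text does not state which interpolation scheme is used.

```python
def fill_gaps(samples):
    """Build a continuous series from (index, value) pairs given in arrival order.

    Samples arriving with an index not greater than the last accepted one
    (delayed or duplicated) are discarded; missing indices are filled by
    linear interpolation between the two neighbouring accepted samples.
    """
    series = []
    last_index = None
    last_value = None
    for index, value in samples:
        if last_index is not None and index <= last_index:
            continue  # late or duplicated sample: drop it
        if last_index is not None:
            gap = index - last_index
            for k in range(1, gap):
                # interpolated value for the missing position last_index + k
                series.append(last_value + (value - last_value) * k / gap)
        series.append(value)
        last_index, last_value = index, value
    return series


if __name__ == "__main__":
    # Indices 2 and 3 are missing when index 4 arrives; index 2 then arrives too late.
    print(fill_gaps([(0, 10), (1, 20), (4, 50), (2, 99), (5, 60)]))
```

The trade-off is the one the text points out: the output is continuous and produced without waiting for retransmissions, at the cost of replacing late measurements with estimated values.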

 

D.     Experimental Results

The user task list previously described represents the starting point for the test bed set-up employed in this work. Moreover, network performance over the grid infrastructure is monitored during the entire life cycle of the application execution. In our experiments, we skipped the acquisition phase, since the server gathering the sensors’ data is not part of the infrastructure, unlike the IE sending the query. Moreover, the network resources required by this phase are very limited: a single station channel needs only a few kbps. The file containing the initial data set for the computation is about 115 MB in size and corresponds to measurement data coming from two channels and collected for a whole day. The archive is stored on a server at EUCENTRE that acts as IE and VCR; then, it is transferred to an SE at GRNET (se01.isabella.grnet.gr). The average throughput is about 7.5 Mb/s. The analysis is carried out by a parametric job, where the parameters correspond to different settings for the two seismic stations being monitored, so a single execution of the application produces two child nodes. The job is launched three times, and the target CEs are located at different sites: GRNET (ce01.athena.hellasgrid.gr), PSNC (ce.reef.man.poznan.pl) and IFCA-CSIC (egeece01.ifca.es).
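The figures quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 Mb = 10^6 bits, 1 kb = 10^3 bits) and treats the stored file size as the amount of data produced by the two channels, which is only an approximation since the file format may add or remove overhead; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_time_s(size_mb, throughput_mbps):
    """Time in seconds to move size_mb megabytes at throughput_mbps megabits per second."""
    return size_mb * 8 / throughput_mbps


def average_channel_rate_kbps(size_mb, channels, seconds):
    """Average per-channel data rate implied by a file of size_mb megabytes."""
    return size_mb * 8 * 1000 / (channels * seconds)


if __name__ == "__main__":
    print(round(transfer_time_s(115, 7.5), 1))                             # seconds
    print(round(average_channel_rate_kbps(115, 2, SECONDS_PER_DAY), 2))    # kbps
```

Under these assumptions the transfer to the SE takes roughly two minutes, and the file size corresponds to an average of about 5 kbps per channel, in line with the statement that a single channel needs only a few kbps.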

Table 3 reports the time necessary to upload the file from the SE to each CE, along with the latency measured by Smokeping. If the user chooses a local CE (ce01.athena.hellasgrid.gr), the latency is on the order of a few ms and is therefore not reported in the table; in the other two cases the latency is also low, remaining below 100 ms. The processing time for each CE is reported in Table 4.

 

| CE | Upload Start Time | Upload Time [s] | Throughput [Mb/s] | Average Latency [ms] |
|---|---|---|---|---|
| ce01.athena.hellasgrid.gr | 27/11/2009, 12.59 | 2.0 | 478.4 | - |
| ce01.athena.hellasgrid.gr | 27/11/2009, 12.59 | 2.1 | 460.2 | - |
| ce.reef.man.poznan.pl | 27/11/2009, 13.07 | 351 | 2.6 | 59.2 |
| ce.reef.man.poznan.pl | 27/11/2009, 13.04 | 241 | 3.9 | 59.2 |
| egeece01.ifca.es | 27/11/2009, 13.46 | 95 | 9.8 | 94.3 |
| egeece01.ifca.es | 27/11/2009, 13.46 | 82 | 11.7 | 94.3 |

Table 3. Upload time from se01.isabella.grnet.gr to each CE

 

| CE | Computation Start Time | Processing Time [s] |
|---|---|---|
| ce01.athena.hellasgrid.gr | 27/11/2009, 12.59 | 26 |
| ce01.athena.hellasgrid.gr | 27/11/2009, 13.18 | 65 |
| ce.reef.man.poznan.pl | 27/11/2009, 13.07 | 16 |
| ce.reef.man.poznan.pl | 27/11/2009, 13.04 | 47 |
| egeece01.ifca.es | 27/11/2009, 13.48 | 22 |
| egeece01.ifca.es | 27/11/2009, 13.48 | 45 |

Table 4. Processing time for each CE

 

The job output consists of files whose size is comparable with that of the input file. Finally, the output files are retrieved by using the VCR at EUCENTRE. Table 5 reports the time necessary to download to EUCENTRE the output files stored by each CE.

 

| CE | Getting Output Start Time | Download Time [s] | Throughput [Mb/s] | Available Bandwidth [Mb/s] | Average Latency [ms] |
|---|---|---|---|---|---|
| ce01.athena.hellasgrid.gr | 27/11/2009, 13.18 | 36 | 26.8 | 92 | 49.2 |
| ce01.athena.hellasgrid.gr | 27/11/2009, 13.19 | 32 | 30.2 | 92 | 49.2 |
| ce.reef.man.poznan.pl | 27/11/2009, 13.22 | 33 | 29.3 | 102 | 29.9 |
| ce.reef.man.poznan.pl | 27/11/2009, 13.23 | 89 | 10.9 | 102 | 29.9 |
| egeece01.ifca.es | 27/11/2009, 14.16 | 32 | 30.2 | 96 | 46.1 |
| egeece01.ifca.es | 27/11/2009, 14.17 | 32 | 30.2 | 96 | 46.1 |

Table 5. Download time from each CE to EUCENTRE VCR

 

| CE | Total Time [s] | Elaboration Time [s] | Communication Time [s] |
|---|---|---|---|
| ce01.athena.hellasgrid.gr | 64 | 26 | 38 |
| ce01.athena.hellasgrid.gr | 99 | 65 | 34 |
| ce.reef.man.poznan.pl | 400 | 16 | 384 |
| ce.reef.man.poznan.pl | 377 | 47 | 330 |
| egeece01.ifca.es | 149 | 22 | 127 |
| egeece01.ifca.es | 159 | 45 | 114 |

Table 6. Total time

The last two columns of Table 5 contain the available bandwidth (estimated by Pathload) and the average latency (measured by Smokeping) from each CE to the EUCENTRE VCR. The throughput is significantly lower than the available bandwidth in every case, which suggests that the communication protocols do not fully exploit the capacity of the communication channel.
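The gap between throughput and available bandwidth can be quantified directly from the values in Table 5. The sketch below simply divides the measured throughput by the Pathload estimate for each download; the variable and function names are hypothetical.

```python
# (CE, throughput [Mb/s], available bandwidth [Mb/s]) as reported in Table 5
TABLE_5 = [
    ("ce01.athena.hellasgrid.gr", 26.8, 92),
    ("ce01.athena.hellasgrid.gr", 30.2, 92),
    ("ce.reef.man.poznan.pl", 29.3, 102),
    ("ce.reef.man.poznan.pl", 10.9, 102),
    ("egeece01.ifca.es", 30.2, 96),
    ("egeece01.ifca.es", 30.2, 96),
]


def utilization(throughput, available):
    """Fraction of the estimated available bandwidth actually used by the transfer."""
    return throughput / available


if __name__ == "__main__":
    for ce, throughput, available in TABLE_5:
        print(f"{ce}: {utilization(throughput, available):.0%}")
```

Every download uses less than one third of the estimated available bandwidth, and one of the two transfers from ce.reef.man.poznan.pl uses only about one tenth.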

Finally, Table 6 reports the overall time for the execution of the application. As clearly shown in the table, the amount of time necessary for exchanging the data may significantly affect the performance of the application and represents (except in the second case) the major component of the overall execution time.
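The breakdown in Table 6 can be checked with a few lines of code: for each run, the communication share is the communication time divided by the total time. The data are copied from the table; the names used in the sketch are hypothetical.

```python
# (CE, total [s], elaboration [s], communication [s]) as reported in Table 6
TABLE_6 = [
    ("ce01.athena.hellasgrid.gr", 64, 26, 38),
    ("ce01.athena.hellasgrid.gr", 99, 65, 34),
    ("ce.reef.man.poznan.pl", 400, 16, 384),
    ("ce.reef.man.poznan.pl", 377, 47, 330),
    ("egeece01.ifca.es", 149, 22, 127),
    ("egeece01.ifca.es", 159, 45, 114),
]


def communication_share(total, communication):
    """Fraction of the total execution time spent exchanging data."""
    return communication / total


if __name__ == "__main__":
    for ce, total, elaboration, communication in TABLE_6:
        # Each total in the table is the sum of the two components.
        assert total == elaboration + communication
        print(f"{ce}: {communication_share(total, communication):.0%} communication")
```

Communication accounts for more than half of the total time in five of the six runs, the exception being the second run on ce01.athena.hellasgrid.gr, where elaboration takes 65 s out of 99 s.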

 

VI.        Conclusions

Remote Instrumentation Services are an important part of Grid-based applications, and may well become a significant component of Future Internet services and of the Pervasive Computing paradigm. The platform and test bed built and operated by the DORII project have a twofold goal: on one hand, they aim at providing further extensions and refinements of the functional components for the virtualization and effective management of real instrumentation; on the other hand, they directly involve user communities, which bring their operational experience to the experimental activities carried out on the test bed. The paper has presented the main building blocks of this process and the design choices stemming from the applications’ requirements. An example of performance monitoring for a selected application has been shown and discussed. Scientists in the different disciplines involved are actively participating in the deployment of DORII applications and in the experimental activity, with the goal of evaluating the new middleware functionalities and suggesting improvements. Future work will be aimed at developing additional functionalities and at providing input to standardization activities in Remote Instrumentation Services.

Acknowledgment

This work was supported by the European Commission under the DORII project (contract no. 213110).

References
  • V.J. Harward et al., “The iLab shared architecture: A Web Services infrastructure to build communities of Internet accessible laboratories,” Proc. IEEE, vol. 96, no. 6, pp. 931-950, June 2008.
  • D.F. McMullen, R. Bramley, K. Chiu, H. Davis, T. Devadithya, J.C. Huffman, K. Huffman, T. Reichherzer, “The Common Instrument Middleware Architecture,” in F. Davoli, N. Meyer, R. Pugliese, S. Zappatore, Eds., Grid Enabled Remote Instrumentation, Springer, New York, NY, 2008, pp. 393-407; ISBN: 978-0-387-09662-9.
  • F. Lelli, E. Frizziero, M. Gulmini, G. Maron, S. Orlando, A. Petrucci, S. Squizzato, “The many faces of the integration of instruments and the grid,” International Journal of Web and Grid Services, vol. 3, no. 3, 2007, pp. 239 – 266.
  • F. Davoli, S. Palazzo, S. Zappatore, Eds., Distributed Cooperative Laboratories: Networking, Instrumentation, and Measurements, Springer, New York, NY, 2006; ISBN 0-387-29811-8.
  • RINGrid project website and RINGrid whitepaper, http://www.ringrid.eu.
  • GRIDCC project website, http://www.gridcc.org.
  • Interactive European Grid project website, http://www.i2g.eu/.
  • g-Eclipse project website, http://www.geclipse.org/.
  • DORII project website, http://www.dorii.eu.
  • DORII Project Deliverable DSA1.1, “Analysis of Applications and Network Requirements for Remote Instrumentation Infrastructure”.
  • Smokeping Home Page, http://oss.oetiker.ch/smokeping/.
  • RRD Tool Home Page, http://oss.oetiker.ch/rrdtool/index.en.html.
  • DORII Monitoring Platform Home Page, http://monitor2.cnit.it.
  • Pathload Home Page, http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/bw-est/pathload.html.
  • NAGIOS Home Page, http://www.nagios.org.
  • ESFRI Home Page, http://cordis.europa.eu/esfri/.
  • EGEE Project Home Page, http://www.eu-egee.org/.
  • DEISA project Home Page, http://www.deisa.eu/.
  • gLite Home Page, http://glite.web.cern.ch/glite/.
  • E. Frizziero, M. Gulmini, F. Lelli, G. Maron, A. Oh, S. Orlando, A. Petrucci, S. Squizzato, S. Traldi, “Instrument Element: A new Grid component that enables the control of remote instrumentation,” Proc. 6th IEEE Internat. Symp. on Cluster Computing and the Grid Workshops (CCGRIDW'06); available online: http://ieeexplore.ieee.org/iel5/10857/34198/01630943.pdf.
  • R. Ranon, L. De Marco, A. Senerchia, S. Gabrielli, L. Chittaro, R. Pugliese , L. Del Cano, F. Asnicar, M. Prica, “A web-based tool for collaborative access to scientific instruments in cyberinfrastructures,” in F. Davoli, N. Meyer, R. Pugliese, S. Zappatore, Eds., Grid Enabled Remote Instrumentation, Springer, New York, NY, 2008, pp. 237-251; ISBN: 978-0-387-09662-9.
  • H. Kornmayer, M. Stümpert, M. Knauer, P. Wolniewicz, “g-Eclipse - An integrated workbench tool for Grid application users, Grid operators and Grid application developers,” Cracow Grid Workshop '06, Cracow, Poland, October 15-18, 2006.