September 28, 2016

GridPP Storage

Co-evolving data nodes

Bing! A mail comes in from our friends in the States saying: look! Here's someone in New Zealand who has set up an iRODS node to GridFTP data to/from their site. It is a very detailed document, yet it looks a lot like the DiRAC/GridPP data node document. They have independently solved many of the same problems we have solved.

The basic idea is to have a node outside your institute/organisation which can be used to transfer data to/from your datastore/cluster. With a GridFTP endpoint, you could move data with FTS (as we do with DiRAC), people can use Globus (used by STFC's facilities, for example), or data can be moved to/from other e-infrastructures (such as EUDAT's B2STAGE) or EGI. Regardless of the underlying storage, there will be common topics like security, monitoring, performance, how to (or not to) firewall it, how to make it discoverable, etc. It could be the data node in a Science DMZ.

The suggestion is that we (= GridPP, DiRAC, and in fact anyone else who is willing and able) contribute to a detailed writeup which can be published as an OGF document, either community practice or experiences (open access publishing for free! And GridFTP is an OGF protocol), and then have a less detailed paper which could be submitted to a conference or published in a journal.

by Jens Jensen at September 28, 2016 10:10

September 22, 2016

GridPP Storage

Upgrading and Expanding Lustre Storage (part 4)

With a 1.5 PB Lustre file system set up, we now need to transfer our data from the old Lustre system (conveniently also 1.5 PB in size) before we can put the new one into production.

Migration of Data:

It was found that it was not possible to mount both Lustre 1.8 and 2.8 on the same client, so data had to be migrated with rsync between two clients, each mounting one of the two file systems. Running an rsync daemon on the clients was an order of magnitude quicker than using rsync over ssh for transferring data between the two clients. Hard links, ACLs and extended attributes are preserved by using the “-HAX” option when transferring data. Up to a dozen clients were used over the course of about six weeks to transfer 1.5 PB of data between the old and new Lustre file systems. After the initial transfer, the old and new systems were kept in sync with repeated rsync runs, remembering to use the “--delete” option to remove files that no longer existed on the live Lustre system. MD5 checksums were compared for a small random selection of files. The final transfer from the old to the new Lustre took about a day, during which the file system was unavailable to external users. Then all clients were updated to the new Lustre version.
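The MD5 spot check can be sketched in Python. Because the two Lustre versions could not be mounted on the same client, this sketch assumes the sample list and its checksums are computed on the old-side client, the same list is checksummed on the new-side client, and the two results are then compared. All function names are hypothetical, and on a tree of this size you would probably sample from a precomputed file list rather than walking the whole file system.

```python
import hashlib
import os
import random


def md5sum(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_sample(root, sample_size, seed=None):
    """Pick a random sample of file paths, relative to root."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files.append(os.path.relpath(os.path.join(dirpath, name), root))
    files.sort()  # deterministic order so a fixed seed gives a repeatable sample
    return random.Random(seed).sample(files, min(sample_size, len(files)))


def checksums(root, rel_paths):
    """Map each relative path to its MD5, or None if the file is missing under root."""
    result = {}
    for rel in rel_paths:
        path = os.path.join(root, rel)
        result[rel] = md5sum(path) if os.path.isfile(path) else None
    return result


def mismatches(old_sums, new_sums):
    """Relative paths whose checksums differ or that are missing on the new side."""
    return sorted(rel for rel, digest in old_sums.items() if new_sums.get(rel) != digest)
```

Splitting sampling, checksumming and comparison into separate steps means each half can run on whichever client mounts that file system, with only the small checksum maps moved between them.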

Real World Experience:

With the new Lustre system in production, we then recommissioned the old system to create a 3 PB Lustre file system. The grid cluster has about 4000 job slots in over 200 Lustre client compute nodes. The actual cluster is shown below. Note that the compute nodes fill the bottom 12U of every rack, where the air is cooler, with storage above them in the next 24U.
Real-world performance over half a year, March to September 2016, is shown below. When all job slots were running grid “analysis” workloads, which require access to data stored in Lustre, no slowdown in job efficiency was observed. An average of 4.8 Gb/s is seen for reading data from Lustre and 1.6 Gb/s for writing to Lustre (which is always done through StoRM).
However, in one case a local user simultaneously ran more than 1500 jobs, each accessing a very large number of small files (in this case bioinformatics data) on Lustre, and a slowdown in performance was observed. Once the user was limited to no more than 500 jobs, no further issues were seen. Accessing small files on a Lustre file system is expected to be inefficient [1] and should be avoided or limited where possible. A future enhancement to Lustre is planned that will enable small files to be stored on the MDS, which should improve small-file performance [1].

The Queen Mary grid site's major workload is for the ATLAS experiment, which keeps detailed statistics of site usage. We are responsible for processing about 2.5% of all ATLAS data internationally and about 20% of the data processed in the UK. Remote data transfer statistics are shown below. Over the last 6 months ATLAS has transferred 2.39 PB of data into the cluster (top left plot); the weekly totals are also plotted, with a maximum for one week of 340 TB (an average of about 4 Gb/s). The bottom plot shows that 2.3 PB has been sent from Queen Mary to other grid sites around the world.
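A quick check of the peak-week figure in Python, assuming decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits) and a steady rate across the whole week. Under those assumptions 340 TB in seven days averages roughly 4.5 Gb/s, the same ballpark as the quoted average of about 4 Gb/s.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 seconds


def weekly_volume_to_gbps(tb_per_week, bytes_per_tb=10**12):
    """Average rate in Gb/s (10^9 bits per second) for a weekly transfer volume."""
    bits = tb_per_week * bytes_per_tb * 8
    return bits / SECONDS_PER_WEEK / 1e9


print(round(weekly_volume_to_gbps(340), 2))  # 4.5
```

Passing bytes_per_tb=2**40 instead gives the figure for binary terabytes, which comes out somewhat higher.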

Future Plans:
  • Double the storage of the cluster to 6 PB in 2018.
  • Consider an upgrade to Lustre 2.9, which will have bug LU-1482 fixed and also provide additional functionality such as user and group ID mapping, which would allow the storage to be used by different clusters. However, Lustre 2.9 is SL/CentOS 7 only.
  • Upgrade the OSS servers from SL6 to SL/CentOS 7.
  • Examine the use of ZFS in place of hardware RAID, which might help mitigate very long RAID rebuild times after replacement of a failed hard drive.

Over the past four blog posts we have described a successful major upgrade of Lustre, including the specification, installation, configuration, data migration, and operation of the hardware and software.

by Daniel Traynor at September 22, 2016 13:24

July 20, 2016

The SSI Blog

ProtoMS passes the test: developing a test suite for biomolecular simulation software

By Devasena Inupakutika and Steve Crouch, Software Sustainability Institute, and Richard Bradshaw, University of Southampton.

As part of their research, Jonathan Essex’s Research Group developed ProtoMS, biomolecular simulation software that makes it simple to develop methods for calculating relative protein/ligand binding free energies. The Software Sustainability Institute worked with them as part of an Open Call project to develop a test strategy and Python test suite, and to verify the operation of the ProtoMS software as an overall product. The great news is that the latest release now includes the test suite, which has already found some interesting issues that have since been resolved.


by s.aragon at July 20, 2016 12:04

July 11, 2016

Tier1 Blog

Analysis of Call-outs for 2015 and First Part of 2016

During the first week of July our work experience student, Ellen, carried out an analysis of the data on call-outs to the Tier 1 team in 2015 and in 2016 so far. She has provided this report.

Service Restored By Weekday 2015

The above plot shows the distribution of call-outs for each day of the week, with each line showing a category of call resolution. “n/a” is normally applied to alarms that occur during working hours, which are therefore not handled by the out-of-hours team. In 2015 there were 250 call-outs in total. Call-outs can arise from genuine failures or from cases where work being undertaken triggers the call-out system. From this graph we can see that call-outs triggered by work occur mostly at the beginning of the week, primarily on Tuesdays, and then dip through to the weekend. This largely reflects the scheduling of work in the early part of the week, particularly on Tuesdays.

Callout By Service 2015 / Callout By Service 2016 (Jan-Jul)

These pie charts show the distribution of calls from different systems. Most sectors of the 2016 pie chart are similar to those of 2015; however, calls for “CE” and “SRM” have decreased, whilst “Disk Server” and “Database” have increased significantly, though this may change as the year progresses.

by Gareth Smith at July 11, 2016 12:19

The SSI Blog

Assessing the Sustainability of the Research Data Spring Projects

By Steve Crouch, Research Software Group Lead.

This article was originally published on Jisc's Research Data Spring blog on 30 June 2016.

Sustainability is increasingly becoming recognised as a must-have goal in the development of research software. Earlier this year, I undertook a sustainability assessment of the projects that had reached the second phase of Jisc's Research Data Spring. It is particularly heartening that Jisc has sustainability high on the agenda across its portfolio of software projects, and that the projects themselves are embracing this ideal with such enthusiasm.


by s.crouch at July 11, 2016 09:58

May 10, 2016

Tier1 Blog

R89 Water Pump Outage

Yesterday an unexplained site-wide glitch in the building management system (BMS) caused the R89 BMS to unexpectedly stop the four pumps that circulate water from the CRACs to the chillers and back again. The chillers shut down due to the lack of water flow, but the CRACs continued to circulate air in the rooms.

Schematic of water flow in R89 plant room

Machines across R89 began to heat up at roughly one °C per minute; our servers will try to shut down cleanly at a threshold that depends on the hardware model (most commonly 60°C). Shortly after being notified of the pump shutdown, we paused all running batch jobs and prevented any new jobs from starting, which appeared to stabilise temperatures. During this time, other groups using the data centre were also shutting down their services. Just before 5pm the pumps and chillers were restarted and temperatures started to fall. After some discussion, the paused jobs were allowed to continue (but new jobs were still prevented).
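At roughly one °C per minute, the headroom before the shutdown threshold can be estimated by straight-line extrapolation. A minimal sketch, assuming the heating rate stays constant (real curves will not be perfectly linear) and using a hypothetical starting temperature in the example:

```python
def minutes_to_threshold(current_c, threshold_c=60.0, rate_c_per_min=1.0):
    """Minutes until a host reaches its shutdown threshold, assuming a constant heating rate."""
    if rate_c_per_min <= 0:
        raise ValueError("heating rate must be positive")
    # A host already at or above the threshold has no time left.
    return max(0.0, (threshold_c - current_c) / rate_c_per_min)


# Hypothetical host reading 35 °C when the pumps stop:
print(minutes_to_threshold(35.0))  # 25.0
```

This is why the reaction time matters: with a typical worker node running a few tens of degrees below its threshold, there is well under an hour to act.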

The graph below shows the mean internal (not CPU) temperature of all 168 hosts identified as “worker nodes” throughout yesterday’s event, with key events labelled (note: times are in UTC).

Graph of mean worker node temperature during cooling problems

For a wider view of the whole room, we can look at the period from ARTEMIS’s point of view.

Or with (incomplete) rack layouts overlaid over the data:

by James Adams at May 10, 2016 13:57

July 27, 2015


Simple CVMFS puppet Module

Oxford was one of the first sites to test CVMFS and also to use the CERN CVMFS Puppet module. Initially, installation of CVMFS was not well documented, so the CERN CVMFS Puppet module was very helpful for installing and configuring CVMFS.
Installation became easier and clearer with newer versions of CVMFS. One of my ops actions was to install the GridPP multi-VO CVMFS repository with the CERN CVMFS Puppet module. We realised that it was easy to write a trimmed-down version of the CERN CVMFS module rather than use it directly. The result is the cvmfs_simple module, which is available on GitHub.

'include cvmfs_simple' will set up the LHC repositories and the GridPP repository.

The only mandatory parameter is:

cvmfs_simple::config::cvmfs_http_proxy : 'squid-server'

It is also possible to add a local CVMFS repository. Extra repositories can be configured by passing values from Hiera:

cvmfs_simple::extra::repo: ['gridpp', 'oxford']

Oxford uses a local CVMFS repository to distribute software to local users; oxford.pp can be used as a template for setting up a new local CVMFS repository.

cvmfs_simple doesn't support all use cases, and it expects that everyone is using Hiera ;). Please feel free to adapt it to your use case.
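To illustrate what the two Hiera values ultimately control, here is a hypothetical Python sketch that renders CVMFS client settings in the style of /etc/cvmfs/default.local (CVMFS_REPOSITORIES and CVMFS_HTTP_PROXY are standard CVMFS client parameters). It is not how cvmfs_simple is implemented: the default repository list, the short-name-to-repository mapping and "oxford.example.org" are all assumptions for illustration.

```python
# Assumed default repositories (LHC plus GridPP); not taken from cvmfs_simple itself.
DEFAULT_REPOS = ["atlas.cern.ch", "cms.cern.ch", "lhcb.cern.ch", "alice.cern.ch", "gridpp.egi.eu"]

# Hypothetical mapping from short Hiera names to fully qualified repository names.
EXTRA_REPO_FQRNS = {"gridpp": "gridpp.egi.eu", "oxford": "oxford.example.org"}


def render_default_local(http_proxy, extra_repos=()):
    """Return default.local-style text listing the repositories and the proxy."""
    if not http_proxy:
        raise ValueError("http_proxy is mandatory")
    repos = list(DEFAULT_REPOS)
    for name in extra_repos:
        fqrn = EXTRA_REPO_FQRNS.get(name, name)  # fall back to the name as given
        if fqrn not in repos:  # avoid listing a repository twice
            repos.append(fqrn)
    return (
        f'CVMFS_REPOSITORIES="{",".join(repos)}"\n'
        f'CVMFS_HTTP_PROXY="{http_proxy}"\n'
    )


print(render_default_local("squid-server", ["gridpp", "oxford"]))
```

The de-duplication step reflects the post's setup, where the GridPP repository is already configured by default but may also be listed as an extra repository.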

by Kashif Mohammad at July 27, 2015 09:53

February 24, 2015


Replacing the Condor Defrag Daemon

I've replaced the standard DEFRAG daemon released with Condor with a simpler version that contains a proportional integral (PI) controller. I hoped this would give us better control over multicore slots. Preliminary results with the proportional part of the controller show that it fails to keep accurate control over the provision of slots. It is subject to hunting due to the long time lags between the onset of draining and the eventual change in the controlled variable (which is 'running mcore jobs'). The rate of provision was unexpectedly stable at first, considering the simplicity of the algorithm employed, but degraded over time as the controlled variable became more random.

The graph below shows the very preliminary picture, with a temporary period of stable control shown by the green line on the right of the plot. The setpoint is 250.

I have now also included an integral component in the controller, and I'm in the process of tuning its reset rate. I hope to show the results of this test soon.
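For readers unfamiliar with PI control, here is a minimal discrete PI controller sketch in Python. It is not the replacement DEFRAG daemon: the gains, the output limits and the interpretation of the output as "how many nodes to drain" are all hypothetical. The integral gain ki is the knob that corresponds to tuning the reset rate.

```python
class PIController:
    """Discrete proportional-integral controller (a sketch, not the post's code)."""

    def __init__(self, setpoint, kp, ki, out_min=0.0, out_max=None):
        self.setpoint = setpoint  # e.g. 250 running multicore jobs
        self.kp = kp              # proportional gain (hypothetical value)
        self.ki = ki              # integral gain (hypothetical value)
        self.out_min = out_min
        self.out_max = out_max
        self.integral = 0.0       # accumulated error

    def update(self, measured, dt=1.0):
        """Return the control output for one sample of the controlled variable."""
        error = self.setpoint - measured
        candidate = self.integral + error * dt
        output = self.kp * error + self.ki * candidate
        # Clamp to the actuator range and only keep the new integral when not
        # saturated: a simple guard against integral wind-up.
        if output < self.out_min:
            return self.out_min
        if self.out_max is not None and output > self.out_max:
            return self.out_max
        self.integral = candidate
        return output


# Hypothetical use: decide how many nodes to drain this cycle.
controller = PIController(setpoint=250, kp=0.05, ki=0.01, out_max=20)
nodes_to_drain = round(controller.update(measured=200))
```

The proportional term alone reacts only to the current error, so with long lags between draining and new multicore jobs starting it tends to overshoot and hunt; the integral term accumulates persistent error and removes the steady-state offset.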

by Steve Jones at February 24, 2015 12:09

November 17, 2014


Condor Workernode Health Script

This is a script that makes some checks on the worker node and "turns it off" if it fails any of them. To implement this, I made use of a Condor feature: startd_cron jobs. I put this in the /etc/condor_config.local file on my worker nodes:

PERSISTENT_CONFIG_DIR = /etc/condor/ral
STARTD_ATTRS = $(STARTD_ATTRS) StartJobs, RalNodeOnline
StartJobs = False
RalNodeOnline = False

I use the prefix "Ral" here because I inherited some of this material from Andrew Lahiffe at RAL! Basically, it's just to de-conflict names. I should have used "Liv" right from the start, but I'm not changing it now. Anyway, the first section says to keep a persistent record of configuration settings; it adds new configuration settings called "StartJobs" and "RalNodeOnline"; it sets them initially to False; and it makes the START configuration setting dependent upon them both being set. Note: the START setting is very important because the node won't start jobs unless it is True. I also need a startd_cron setting, which tells the system (startd) to run a cron script every three minutes.


# Make sure values get over
The script looks like this:


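#!/bin/bash
# Run the existing node test script (its filename is not shown here) and keep its exit code
/usr/libexec/condor/scripts/ > /dev/null 2>&1
STATUS=$?

if [ "$STATUS" -ne 0 ]; then
  # Look up the symbolic name of the exit code from a NAME=CODE line in the test script
  MESSAGE=$(grep "^[A-Z0-9_][A-Z0-9_]*=$STATUS\$" /usr/libexec/condor/scripts/ | head -n 1 | sed -e "s/=.*//")
  if [[ -z "$MESSAGE" ]]; then
    MESSAGE="ERROR_$STATUS"   # assumed fallback when the code has no name
  fi
else
  MESSAGE=OK
fi

# startd_cron publishes each "Attr = value" line as a ClassAd attribute
if [[ $MESSAGE =~ ^OK$ ]]; then
  echo "RalNodeOnline = True"
else
  echo "RalNodeOnline = False"
fi
echo "RalNodeOnlineMessage = \"$MESSAGE\""

echo "$(date), message $MESSAGE" >> /tmp/testnode.status
exit 0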

This just wraps an existing script which I reuse from our TORQUE/MAUI cluster. The existing script just returns a non-zero code if any error happens. To add a bit of extra info, I also look up the meaning of the code. The important thing to notice is that it echoes out a line to set the RalNodeOnline setting to False. This is then used in the setting of START. Note: on TORQUE/MAUI, the script ran as "root"; here it runs as "condor". I had to use sudo for some of the sections which (e.g.) check disks, because the condor user could not get the smartctl information. Right, so I think that's it. When a node fails the test, START goes to False and the node won't run more jobs. Oh, there's another thing to say. I use two settings to control START. As well as RalNodeOnline, I have the StartJobs setting. I can control this independently, so I can turn a node offline whether or not it has an error. This is useful for stopping the node to (say) rebuild it. It's done on the server, like this:

condor_config_val -verbose -name r21-n01 -startd -set "StartJobs = false"
condor_reconfig r21-n01
condor_reconfig -daemon startd r21-n01
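The decision logic of the wrapper and of START can be mirrored in a few lines of Python, which makes it easy to test the exit-code lookup away from a live node. This is a hypothetical sketch, not part of the Condor setup: the function names, the sample NAME=CODE lines and the ERROR_<code> fallback are assumptions.

```python
import re


def lookup_message(status, test_script_text):
    """Map an exit code to a symbolic name, like the grep/head/sed pipeline."""
    if status == 0:
        return "OK"
    # First line of the form NAME=<status>, e.g. DISK_FULL=3
    match = re.search(rf"^([A-Z0-9_]+)={status}$", test_script_text, re.MULTILINE)
    return match.group(1) if match else f"ERROR_{status}"  # hypothetical fallback


def classad_lines(message):
    """The attribute lines the startd_cron script prints for the startd."""
    online = "True" if message == "OK" else "False"
    return [f"RalNodeOnline = {online}", f'RalNodeOnlineMessage = "{message}"']


def node_can_start(start_jobs, node_online):
    """START only allows jobs when both the admin switch and the health check agree."""
    return start_jobs and node_online


example_script = "DISK_FULL=3\nNO_CVMFS=4\n"  # hypothetical NAME=CODE lines
print(classad_lines(lookup_message(3, example_script)))
```

Keeping StartJobs and RalNodeOnline as two separate inputs is what lets an admin take a healthy node offline without touching the health check.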

by Steve Jones at November 17, 2014 16:52

October 14, 2014


Nagios Monitoring for Non-LHC VOs

First, a brief description of the monitoring framework, before coming to the actual topic of monitoring non-LHC VOs.
Service Availability Monitoring (SAM) is a framework for monitoring grid sites remotely. It consists of many components that perform various functions, and can be broadly divided into:
  • ‘What to Monitor’, or Topology Aggregation: collection of service endpoints and metadata from different sources such as GOCDB, BDII and VOMS. A custom topological source (VO feeds) can also be used.
  • Profile Management: mapping of services to the tests to be performed. This service is provided by the POEM (Profile Management) database, which provides a web-based interface to group various metrics into profiles.
  • Monitoring: Nagios is used as the monitoring engine. It is configured automatically based on the information provided by the Topology Aggregator and POEM.
SAM software was developed under the EGEE project at CERN and is now maintained by EGI.
It is mandatory for grid sites to pass the ops VO functional tests to be part of WLCG. Every NGI maintains a Regional SAM Nagios, and results from the regional SAM Nagios also go to the central MyEGI, which is used for Reliability/Availability calculations.
The UK Regional Nagios is maintained at Oxford, with a backup instance at Lancaster.

There was no centralised monitoring of non-LHC VOs for a long time, and this contributed to a bad user experience, as it was difficult to tell whether a site was broken or the problem was at the user end. It was decided to host a multi-VO Nagios at Oxford, as we had experience with WLCG Nagios.
It is currently monitoring five VOs.

Sites can look at the tests associated with their own site only.
VO managers may be interested in seeing the tests associated with a particular VO only.

We are using the VO-feed mechanism to aggregate site metadata and endpoint information. Every VO has a VO feed available on a web server; currently we are maintaining this VO feed.

The VO feed provides the list of services to be monitored. I generate this VO feed with a script.
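A VO feed is an XML document listing sites and their service endpoints. The sketch below shows how such a feed could be generated with Python's standard xml.etree.ElementTree. It is not the Oxford script: the element and attribute names (atp_site, service with hostname/flavour, group with name/type) follow the ATP VO-feed layout as I understand it and should be checked against the ATP documentation, and the input format, group type and hostnames are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_vo_feed(sites, group_type="GridPP_Site"):
    """Build a VO-feed-style XML string.

    sites: mapping of site name -> list of (hostname, flavour) tuples
    (a hypothetical input format).
    """
    root = ET.Element("root")
    for site_name, services in sorted(sites.items()):
        site = ET.SubElement(root, "atp_site", name=site_name)
        for hostname, flavour in services:
            ET.SubElement(site, "service", hostname=hostname, flavour=flavour)
        # Group element so results can be filtered per site (type is an assumption)
        ET.SubElement(site, "group", name=site_name, type=group_type)
    return ET.tostring(root, encoding="unicode")


feed = build_vo_feed({
    "UKI-EXAMPLE-SITE": [("ce01.example.org", "CREAM-CE"), ("se01.example.org", "SRMv2")],
})
print(feed)
```

Restricting the flavours to compute and storage elements matches the post's choice to leave services such as BDII and WMS to the Regional Nagios.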

Jobs are submitted using a proxy generated from a robot certificate assigned to Kashif Mohammad. These jobs are like normal grid user jobs and test things like the GCC version and CA version. Jobs are submitted every eight hours; this is configurable. We are monitoring CREAM CE, ARC-CE and SE only. Services like BDII and WMS are already monitored by the Regional Nagios, so there was no need to duplicate them.


by Kashif Mohammad at October 14, 2014 11:08

October 08, 2014

London T2

XrootD and ARGUS authentication

A couple of months ago, I set up a test machine running XrootD version 4 at QMUL. This was to test three things:
  1. IPv6 (see blog post);
  2. Central authorisation via ARGUS (the subject of this blog post);
  3. XrootD 4 itself.
We run StoRM/Lustre on our grid storage, and have run an XrootD server for some time as part of the ATLAS federated storage system, FAX. This allows local (and non-local) ATLAS users interactive access, via the xrootd protocol, to files on our grid storage.

For the new machine, I started by following ATLAS's FAX for POSIX storage sites instructions. These instructions document how to use VOMS authentication, but not central banning via ARGUS. CMS do, however, have some instructions on using xrootd-lcmaps to do the authorisation, though with RPMs from different (and therefore potentially incompatible) repositories. It is, however, possible to get them to work.

The following packages are needed (or at least what I have installed):

  yum install xrootd4-server-atlas-n2n-plugin
  yum install argus-pep-api-c
  yum install lcmaps-plugins-c-pep
  yum install lcmaps-plugins-verify-proxy
  yum install lcmaps-plugins-tracking-groupid
  yum install xerces-c
  yum install lcmaps-plugins-basic

Now that the packages are installed, xrootd needs to be configured to use them. The appropriate lines in /etc/xrootd/xrootd-clustered.cfg are:

 xrootd.seclib /usr/lib64/
 xrootd.fslib /usr/lib64/
 sec.protocol /usr/lib64 gsi -certdir:/etc/grid-security/certificates -cert:/etc/grid-security/xrd/xrdcert.pem -key:/etc/grid-security/xrd/xrdkey.pem -crl:3 -authzfunparms:--osg,--lcmapscfg,/etc/xrootd/lcmaps.cfg,--loglevel,5|useglobals -gmapopt:10 -gmapto:0
 acc.authdb /etc/xrootd/auth_file
 acc.authrefresh 60
 ofs.authorize 1

And in /etc/xrootd/lcmaps.cfg it is necessary to change the path and the ARGUS server (my ARGUS server is obscured in the example below). My config file looks like:


# where to look for modules
#path = /usr/lib64/modules
path = /usr/lib64/lcmaps

good = "lcmaps_dummy_good.mod"
bad  = "lcmaps_dummy_bad.mod"
# Note: put your own ARGUS host in place of argushost.mydomain
pepc        = "lcmaps_c_pep.mod"
             "--pep-daemon-endpoint-url https://argushost.mydomain:8154/authz"
             " --resourceid"
             " --actionid"
             " --capath /etc/grid-security/certificates/"
             " --no-check-certificates"
             " --certificate /etc/grid-security/xrd/xrdcert.pem"
             " --key /etc/grid-security/xrd/xrdkey.pem"

pepc -> good | bad
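The last line is the policy: run the pepc module (the ARGUS PEP call); on success go to the "good" state, on failure to the "bad" state. As a simplified Python model of how I understand these LCMAPS "state -> on_success | on_failure" rules (this is not LCMAPS code, and the plugin functions and DNs below are hypothetical stand-ins):

```python
def evaluate_policy(start, rules, plugins, request):
    """Walk an LCMAPS-style policy of the form state -> on_success | on_failure.

    rules:   maps a state to (on_success, on_failure); states absent from
             rules are terminal.
    plugins: maps a state to a callable(request) -> bool standing in for the
             module configured for that state.
    Returns the result of the last plugin run (True means authorised).
    """
    state = start
    visited = set()
    while True:
        if state in visited:
            raise ValueError(f"policy loops back to state {state!r}")
        visited.add(state)
        succeeded = plugins[state](request)
        if state not in rules:
            return succeeded
        on_success, on_failure = rules[state]
        state = on_success if succeeded else on_failure


banned = {"/DC=org/CN=Banned User"}  # hypothetical ARGUS ban list
plugins = {
    "pepc": lambda dn: dn not in banned,  # stand-in for the ARGUS PEP call
    "good": lambda dn: True,              # like lcmaps_dummy_good.mod
    "bad": lambda dn: False,              # like lcmaps_dummy_bad.mod
}
rules = {"pepc": ("good", "bad")}
print(evaluate_policy("pepc", rules, plugins, "/DC=org/CN=Alice"))
```

In this model a ban in ARGUS makes pepc fail, which routes the request to the always-failing "bad" module and so denies access.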

Then after restarting xrootd, you just need to test that it works.

It seems to work: I was able to ban myself. Unbanning didn't work instantly, and I resorted to restarting xrootd, though perhaps if I'd had patience it would have worked eventually.

Overall, whilst it wasn't trivial to do, it's not actually that hard, and is one more step along the road to having central banning working on all our grid services.

by Christopher J. Walker at October 08, 2014 09:20

March 24, 2014


The Three Co-ordinators

It has been a while since we posted on the blog. Generally, this means that things have been busy and interesting. Things have been busy and interesting.

We are presently going through a redevelopment of the site, evaluating new techniques for service delivery such as Docker containers, and updating multiple services throughout the sites.

The development of the programme presented at CHEP on automation and different approaches to delivering HEP-related Grid services is underway. An evaluation of container-based solutions for service deployment will be presented at the next GridPP collaboration meeting later this month. Other evaluation work on using Software Defined Networking hasn't progressed as quickly as we would have liked, but is still underway.

Graeme (left), Mark (center) and Gareth.

In other news, Gareth Roy is taking over as the Scotgrid Technical Co-ordinator this month. Mark is off for adventures with the Urban Studies Big Data Group within Glasgow University. And, as Dr Who can do it, so can we: Co-ordinators Past, Present and Future all appear in the same place at the same time.

Will the fabric of Scotgrid be the same again?

Very much so.

by Mark Mitchell at March 24, 2014 23:13

October 14, 2013


Welcome to CHEP 2013

Greetings from CHEP 2013 in a rather wet Amsterdam.

The conference season is upon us and Sam, Andy, Wahid and I find ourselves in Amsterdam for CHEP 2013. CHEP started here in 1983, and it is hard to believe that it has been 18 months since New York.

As usual the agenda for the next 5 days is packed. Some of the highlights so far have included advanced facility monitoring, the future of C++ and Robert Lupton's excellent talk on software engineering for Science.

As with all of my visits to Amsterdam, the rain is worth mentioning. So much so that it made local news this morning. However, the venue is the rather splendid Beurs van Berlage in central Amsterdam.

CHEP 2013

There will be further updates during the week as the conference progresses.

by Mark Mitchell at October 14, 2013 12:53

June 04, 2013

London T2

Serial Consoles over ipmi

To get serial consoles over IPMI working properly with Scientific Linux 6.4 (aka RHEL 6.4 / CentOS 6.4), I had to modify several settings both in the BIOS and in the OS.

Hardware Configuration

For the Dell C6100 I set these settings in the BIOS:

Remote Access = Enabled
Serial Port Number = COM2
Serial Port Mode = 115200 8,n,1
Flow Control = None
Redirection After BIOS POST = Always
Terminal Type = VT100
VT-UTF8 Combo Key Support = Enabled

Note: "Redirection After Boot = Disabled" is required, otherwise I get a 5-minute timeout before booting the kernel. Unfortunately, with this setup you get a gap in output while the server attempts to PXE boot. However, you can interact with the BIOS, and once GRUB starts you will see and be able to interact with the GRUB and Linux boot processes.

For the Dell R510/710 I set these settings in the BIOS:

Serial Communication = On with Console Redirection via COM2
Serial Port Address = Serial Device1=COM1,Serial Device2=COM2
External Serial Connector = Serial Device1
Failsafe Baud Rate = 115200
Remote Terminal Type = VT100/VT220
Redirection After Boot = Disabled

Note: With these settings you will be unable to see the progress of the kickstart install on the non default console.

Grub configuration

In grub.conf you should have these two lines (they were there by default in my installs).

serial --unit=1 --speed=115200
terminal --timeout=5 serial console

This allows you to access GRUB via both consoles. The "serial" (IPMI) terminal will be the default unless you press a key when asked during the boot process. This applies only to GRUB, not to the rest of the Linux boot process.

SL6 Configuration

The last console specified in the Linux kernel boot options is taken to be the default console. However, specifying the same console twice can cause issues (e.g. when entering a password, the characters are shown on the screen!).
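Both rules can be checked mechanically. A small Python sketch, assuming you feed it a kernel command line such as the contents of /proc/cmdline: it reports the last console= entry as the default and flags any device named more than once. The function names are hypothetical.

```python
def console_args(cmdline):
    """All console= values on a kernel command line, in order."""
    return [tok.split("=", 1)[1] for tok in cmdline.split() if tok.startswith("console=")]


def default_console(cmdline):
    """The kernel treats the last console= entry as the default console."""
    consoles = console_args(cmdline)
    return consoles[-1] if consoles else None


def duplicated_consoles(cmdline):
    """Devices given more than once, comparing names only (ignoring ",115200" options)."""
    seen, dups = set(), []
    for value in console_args(cmdline):
        device = value.split(",", 1)[0]
        if device in seen and device not in dups:
            dups.append(device)
        seen.add(device)
    return dups


cmdline = "ro root=/dev/sda1 console=tty1 console=ttyS1,115200"
print(default_console(cmdline))      # ttyS1,115200
print(duplicated_consoles(cmdline))  # []
```

Putting console=tty1 before console=ttyS1,115200 keeps the IPMI serial console as the default while still echoing output to a directly connected screen.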

For the initial kickstart PXE boot, I append "console=tty1 console=ttyS1,115200" to the Linux kernel arguments. Here the serial console over IPMI will be the default during the install process, while the other console should echo the output of the IPMI console.

After installation, the kernel argument "console=ttyS1,115200" was already present in the kernel boot arguments. I have additionally added "console=tty1" before it; this may be required to enable interaction with the server via a directly connected terminal if needed.

With the IPMI port set as the default (the last console specified in the kernel arguments), SL6 will automatically start a getty on ttyS1. If it were not the default console, we would have to add an Upstart config file in /etc/init/. Note that SL6 uses Upstart: previous SL5 console configurations in /etc/inittab are ignored!

For example, /etc/init/ttyS1.conf:

start on stopping rc runlevel [345]
stop on starting runlevel [S016]

exec /sbin/agetty /dev/ttyS1 115200 vt100

by Daniel Traynor at June 04, 2013 14:49

October 02, 2012

National Grid Service

SHA2 certificates

We have started to issue certificates with the "new" more secure algorithms, SHA2 (or to be precise SHA256) - basically, it means that the hashing algorithm which is a part of the signature is more secure against attacks than the current SHA1 algorithm (which in turn is more secure than the older MD5).

But only to a lucky few, not to everybody.  And even they get to keep their "traditional" SHA1 certificates alongside the SHA2 one if they wish.

Because the catch is that not everything supports SHA2.  The large middleware providers have started worrying about supporting SHA2, but we only really know by testing it.

So what's the problem? A digital signature is basically a one-way hash of something, which is encrypted with your private key: S=E(H(message)). To verify the signature, you re-hash the message, H(message), and also decrypt the signature with the public key (found in the signer's certificate): D(S)=D(E(H(message)))=H(message). You also check the validity of the certificate.

If someone has tampered with the message, the H would fail (with extremely high probability) to yield the same result, hence invalidate the signature, as D(S) would no longer be the same as H(tamper_message).
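The S=E(H(message)), D(S)=H(message) scheme can be played with using textbook RSA in Python. This is a toy under loud assumptions: the key (p=61, q=53) is tiny and insecure, the SHA-256 digest is simply reduced modulo n, and real certificate signatures use proper padding schemes rather than "raw" RSA. It only illustrates the hash-then-sign and re-hash-then-compare steps.

```python
import hashlib

# Toy textbook RSA key: far too small to be secure, for illustration only.
P, Q = 61, 53
N = P * Q            # 3233, the public modulus
PUB_EXP = 17         # public exponent
PRIV_EXP = 2753      # private exponent: (PUB_EXP * PRIV_EXP) % ((P - 1) * (Q - 1)) == 1


def toy_hash(message: bytes) -> int:
    """H(message): SHA-256 digest reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """S = E(H(message)): apply the private-key operation to the hash."""
    return pow(toy_hash(message), PRIV_EXP, N)


def verify(message: bytes, signature: int) -> bool:
    """Accept if D(S), computed with the public key, equals H(message)."""
    return pow(signature, PUB_EXP, N) == toy_hash(message)


sig = sign(b"hello")
print(verify(b"hello", sig))     # True
print(verify(b"tampered", sig))  # False unless the reduced hashes happen to collide
```

Because the toy hash only has 3233 possible values, collisions are easy to find here; that is the same weakness as a broken hash function, just exaggerated, and it is why the strength of H matters as much as the strength of the key.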

However, if you could attack the hash function and find a tamper_message which has the property that H(tamper_message)=H(message), then the signature is useless - and this is precisely the kind of problem people are worrying about today, for H being SHA1 signatures (and history repeats itself, since we went through the same stuff for MD5 some years ago.)

So we're now checking if it works. So far, we have started with PKCS#10 requests of a few lucky individuals; I'll do some SPKACs tomorrow.  If you want one to play with, send us a mail via the usual channels (eg email or helpdesk.)

Eventually, we will start issuing renewals with SHA2, but only once we're sure that they work with all the middleware out there... we also take the opportunity to test a few modernisations of extensions in the certificates.

by Jens Jensen at October 02, 2012 16:52

June 14, 2012

National Grid Service

Kick off - it's time for the NGS summer seminar series

In the midst of this summer of sport another event is kicking off soon but this time it's the NGS Summer Seminar series.

The first seminar will take place next Wednesday (20th June) at 10.30am (BST) and will give an overview of how accounting is done on the grid, and what it is used for.  It will cover the NGS accounting system at a high level and then go into more detail about the implementation of APEL, the accounting system for EGI, including the particular challenges involved and the plans for development.

The speaker will be Will Rogers from STFC Rutherford Appleton Laboratory who I'm sure would appreciate a good audience ready to ask lots of questions!

Please help spread the word about this event to any colleagues or organisations you think might be interested.  A Facebook event page is available so please invite your colleagues and friends!

by Gillian at June 14, 2012 11:09

March 14, 2011

December 21, 2010

gLite/Grid Data Management

GFAL / LCG_Util 1.11.16 release

There has been no blog post for almost half a year. That does not mean that nothing has happened since then. We devoted enormous effort to background work (an automated test bed, nightly builds and test runs, the transition from EGEE to EMI, etc.). We will test the tools and the procedures in the first months of 2011, and analyse whether they have added value and how they could be improved. As for the visible part, we (finally) released GFAL/LCG_Util 1.11.16 in November; see the release notes. Better late than never!

by zsolt rossz molnár at December 21, 2010 14:11

October 29, 2010

Steve Lloyd's ATLAS Grid Tests

Upgrade to AtlasSetup

I have changed the setup for my ATLAS jobs so it uses AtlasSetup (rather than AtlasLogin). The magic lines are:

source $VO_ATLAS_SW_DIR/software/$RELEASE/cmtsite/ AtlasOffline $RELEASE

VO_ATLAS_SW_DIR is set up automatically and you have to set RELEASE yourself. Since AtlasSetup is only available from Release 16 onwards, jobs going to sites without Release 16 will fail.

by Steve Lloyd at October 29, 2010 15:11

July 23, 2010

Steve Lloyd's ATLAS Grid Tests

Steve's pages update

I have done some much-needed maintenance, and the gstat information is available again (from gstat2). There is also a new page giving the history of the ATLAS HammerCloud test status.

by Steve Lloyd at July 23, 2010 09:04

November 16, 2009

MonAMI at large

watching the ink dry

Yeah, it's been far too long since the last bit of news, so here's an entry just to announce that MonAMI now has a new plugin: inklevel.

This plugin is a simple wrapper around Markus Heinz's libinklevel library. This is a nice library that allows easy discovery of how much ink is left in those expensive ink cartridges.

The library allows one to check the ink levels of Canon, Epson and HP printers. It can check printers directly attached (via the parallel port or USB port) or, for Canon printers, over the network via BJNP (a proprietary protocol that has been reverse engineered).

libinklevel supports many different printers, but not all of them. There's a small collection of printers that the library doesn't work with. There are some printers that are neither listed as working or not working. If your printer isn't listed, please let Markus know whether libinklevel works or not.

Credit for the photo goes to Matthew (purplemattfish) for his picture CISS - Day 304 of Project 365.

by Paul Millar at November 16, 2009 20:14

Trouble at Mill

With some unfortunate timing, it looks like the "Axis of Openness" webpages (SourceForge, Slashdot, Freshmeat, ...) have gone for a burton. There seem to be some networking problems with these sites, with web traffic timing out. Assuming the traceroute output is valid, the problem appears soon after traffic leaves the Santa Clara location of the Savvis network [dead router(s)?]

This is a pain because we've just done the v0.10 release of MonAMI and both the website and the file download locations are hosted by SourceForge. Whilst SourceForge is down, no one can download MonAMI!

If you're keen to try MonAMI in the meantime, you can download the RPMs from the (rough and ready) dev. site:

The above site is generously hosted by the ScotGrid project [their blog].

Thanks guys!

by Paul Millar at November 16, 2009 20:13

October 13, 2009

Monitoring the Grid


It should be obvious to anyone following this blog that the 10 weeks of this studentship project have long since ended; however, until now there were a few outstanding issues. I can now finally say that the project is finished and ready for public use. It can be found at, although the link may change at some point in the future, and the "add to iGoogle" buttons won't work for now.

The gadget is currently configured to use all available data from 2009. Specifically, it holds data for all jobs submitted in 2009 up to around mid-September (the most up-to-date data available from the Grid Observatory).

In addition to this I have produced a small report giving an overview of the project which is available here.

by Laurence Hudson at October 13, 2009 16:45

September 17, 2009

Steve at CERN

Next Week EGEE 09

Next week is of course EGEE 09 in Barcelona. As a warm up the EGEE SA1 OAT sections a sneak preview.

by Steve Traylen at September 17, 2009 16:03

July 31, 2009

Monitoring the Grid

Another Update (Day 30)

As week six comes to a close, I thought it was about time for another progress update. As last time, I've stolen the bullet points from two posts back, with new additions in italics.

  • GridLoad style stacked charts.

  • A "League Table" (totaled over the time period).

  • Pie charts (of the "League Table").

  • Filters and/or sub filters (just successful jobs, just jobs by one VO, just jobs for this CE etc).

  • A tabbed interface.

  • Regular Expression based filtering

  • Variable Y-axis parameter (jobs submitted, jobs started, jobs finished etc).

  • Transition to university web servers.

  • Move to a more dynamic chart legend style. Not done for pie chart yet.

  • Ensure W3C standards compliance/cross-browser compatibility & testing.

  • Automate back end data-source (currently using a small sample data set). Need automatic renewal of grid certificates.

  • Variable X-axis time-step granularity.

  • Data/image export option.

  • A list of minor UI improvements. (About 10 little jobs, not worth including in this list as they would be meaningless without going into a lot more detail about how the gadget's code is implemented.)

  • Optimise database queries and general efficiency testing.

  • Make the interface more friendly. (Tool-tips etc.)

  • Possible inclusion of more "real time" data.

  • Gadget documentation.

  • A Simple project webpage.

  • A JSON data-source API reference page.

  • 2nd gadget, to show all known info for a given JobID.

  • 2nd gadget: Add view of all JobIDs for one user (DN string).

The items in this list are now approximately in the order I intend to approach them.
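The "League Table" and regular-expression filtering items above boil down to "filter the job records, then total them per key". Here is a minimal Python sketch of that idea, not the gadget's actual code (which isn't shown here): the job records, their field names (vo, ce, status) and the host names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical job records; the real gadget reads its data from a back-end source.
JOBS = [
    {"vo": "atlas", "ce": "ce01.example.ac.uk", "status": "Done"},
    {"vo": "atlas", "ce": "ce02.example.ac.uk", "status": "Aborted"},
    {"vo": "lhcb", "ce": "ce01.example.ac.uk", "status": "Done"},
    {"vo": "atlas", "ce": "ce01.example.ac.uk", "status": "Done"},
    {"vo": "cms", "ce": "ce02.example.ac.uk", "status": "Done"},
]


def filter_jobs(jobs, field, pattern):
    """Keep only the jobs whose `field` matches the regular expression `pattern`."""
    rx = re.compile(pattern)
    return [job for job in jobs if rx.search(job[field])]


def league_table(jobs, field):
    """Total the jobs per value of `field`, largest first (ties keep first-seen order)."""
    return Counter(job[field] for job in jobs).most_common()


# Example: a league table of successful jobs per VO.
done = filter_jobs(JOBS, "status", r"^Done$")
table = league_table(done, "vo")
```

Sub-filters fall out naturally by chaining calls, e.g. filtering on `ce` with `r"^ce01\."` before building the table; the pie chart is just the same table drawn differently.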

On another note, I have finally managed to get some decent syntax highlighting for Google Gadgets, thanks to this blog post, even if it means being stuck with Vim. To get this to work, add the Vim modeline to the very bottom of the XML gadget file; if it is added at the top, it tends to break things such as the gadget title. Whilst Vim is not my editor/IDE of choice, it's pretty usable and, with some configuration, can match most of the key features I use in Geany (such as showing whitespace). However, Geany's folding feature would save a lot of time and scrolling.

by Laurence Hudson at July 31, 2009 15:11

June 18, 2009

Grid Ireland

DPM 1.7.0 upgrade

I took advantage of a downtime to upgrade our DPM server. We need the upgrade as we want to move files around using dpm-drain and don't want to lose space token associations. As we don't use YAIM I had to run the upgrade script manually, but it wasn't too difficult. Something like this should work (after putting the password in a suitable file):

./dpm_db_310_to_320 --db-vendor MySQL --db $DPM_HOST --user dpmmgr --pwd-file /tmp/dpm-password --dpm-db dpm_db

I discovered a few things to watch out for along the way, though. Here's my checklist:

1. Make sure you have enough space on your system disk: I got bitten by this on a test server. The upgrade script needs a good chunk of space (comparable to that already used by the MySQL DB?) to perform the upgrade.
2. There's a MySQL setting you probably need to tweak first: add set-variable=innodb_buffer_pool_size=256M to the [mysqld] section in /etc/my.cnf and restart MySQL. Otherwise you get this cryptic error:

  Thu Jun 18 09:02:30 2009 : Starting to update the DPNS/DPM database.
  Please wait...
  failed to query and/or update the DPM database : DBD::mysql::db do failed: The total number of locks exceeds the lock table size at line 19.
  Issuing rollback() for database handle being DESTROY'd without explicit disconnect().

  Also worth noting: if this happens to you, when you try to re-run the script (or YAIM) you will get this error:

  failed to query and/or update the DPM database : DBD::mysql::db do failed: Duplicate column name 'r_uid' at line 18.
  Issuing rollback() for database handle being DESTROY'd without explicit disconnect().

  This is because the first run has already added the r_uid column. You need to edit /opt/lcg/share/DPM/dpm-db-310-to-320/ and comment out this line:

  $dbh_dpm->do ("ALTER TABLE dpm_get_filereq ADD r_uid INTEGER");

  You should then be able to run the script to completion.
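The "Duplicate column name" failure happens because the ALTER TABLE step is not idempotent: it succeeded on the first, aborted run and fails on every re-run. The general guard is "check before altering", sketched below in Python. MySQL isn't available for a self-contained example, so this uses the standard library's sqlite3 as a stand-in (against MySQL you would look the column up in information_schema.COLUMNS instead). It is an illustration of the idea, not how the DPM upgrade script works, and the one-column table is hypothetical.

```python
import sqlite3


def add_column_if_missing(conn, table, column, col_type):
    """Add `column` to `table` only if it isn't already there.

    Returns True if the column was added, False if it already existed.
    Identifiers are interpolated directly (SQL placeholders can't be used
    for table or column names), so only pass trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    return True


# Simulate a first run and a re-run against a hypothetical cut-down table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dpm_get_filereq (r_token TEXT)")
first_run = add_column_if_missing(conn, "dpm_get_filereq", "r_uid", "INTEGER")
second_run = add_column_if_missing(conn, "dpm_get_filereq", "r_uid", "INTEGER")
```

The second call is a no-op instead of an error, which is what commenting out the line achieves by hand.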

by Stephen Childs at June 18, 2009 09:19

June 08, 2009

Grid Ireland

STEP '09 discoveries

ATLAS have been giving our site a good thrashing over the past week, which has helped us shake out a number of issues with our setup. Here's some of what we've learned.

Intel 10G cards don't work well with SL4 kernels

We're currently upgrading our networking to 10G and had it mostly in place by the time STEP '09 started. However, we discovered that the stock SL4 kernel (2.6.9) doesn't support the ixgbe 10G driver very well. It was hard to detect because we could get reasonable transmit performance, but receive was limited to 30 Mbit/s! It's basically an issue with interrupts (MSI-X and multi-queue weren't enabled). I compiled a 2.6.18 SL5 kernel for SL4 and that works like a charm (once you've installed it using --nodeps).
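One quick way to see whether a NIC is actually using several MSI-X vectors is to look at /proc/interrupts. The Python sketch below counts the interrupt lines belonging to one interface. It rests on an assumption not made in the post: that multi-queue MSI-X shows up as several per-queue entries (names such as eth2-TxRx-0, which vary with driver and kernel version), while a single entry suggests one interrupt for the whole card. The interface name and sample text are hypothetical.

```python
import re

# Hypothetical excerpt of /proc/interrupts on a two-CPU node.
SAMPLE = """\
           CPU0       CPU1
 90:       1200          0   PCI-MSI-edge      eth2-TxRx-0
 91:          0       1300   PCI-MSI-edge      eth2-TxRx-1
 92:         15          0   PCI-MSI-edge      eth2
 93:        500        480   PCI-MSI-edge      eth20
"""


def count_irq_vectors(proc_interrupts, iface):
    """Count /proc/interrupts lines whose device name is `iface` or `iface-<suffix>`."""
    # Match "eth2" and "eth2-TxRx-0", but not a different interface such as "eth20".
    pattern = re.compile(rf"^{re.escape(iface)}(-.*)?$")
    count = 0
    for line in proc_interrupts.splitlines():
        fields = line.split()
        # The device name is the last whitespace-separated field on each line.
        if fields and pattern.match(fields[-1]):
            count += 1
    return count


vectors = count_irq_vectors(SAMPLE, "eth2")
```

On a live node you would pass `open("/proc/interrupts").read()` instead of the sample; a count of 1 for a 10G interface is a hint that multi-queue isn't active.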

It's worth tuning RFIO

We had loads of ATLAS analysis jobs pulling data from the SE, and they were managing to saturate the read performance of our disk array. See this NorthGrid post for solutions.

Fair-shares don't work too well if someone stuffs your queues

We'd set up shares for the various ATLAS sub-groups, but the generic analysis jobs submitted via Ganga were getting much more time than their share. On digging deeper with Maui's diagnose -p, I could see that the length of time they'd been queued was outweighing the priority due to fairshare. I was able to fix this by increasing the value of FSWEIGHT in Maui's config file.
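To see why raising FSWEIGHT helps, here is a deliberately simplified Python model of Maui's job priority that keeps only two terms: FSWEIGHT times a fairshare deviation, plus QUEUETIMEWEIGHT times minutes queued. Real Maui priorities have more components and sub-weights (all assumed to be 1 here), and the job names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queued_minutes: float
    # Hypothetical fairshare deviation: positive if the group is under its
    # target share, negative if it has been using more than its share.
    fs_delta: float


def priority(job, fsweight, queuetimeweight=1.0):
    """Simplified priority: fairshare term plus queue-time term only."""
    return fsweight * job.fs_delta + queuetimeweight * job.queued_minutes


def next_job(jobs, fsweight):
    """The job the scheduler would start first under this simplified model."""
    return max(jobs, key=lambda job: priority(job, fsweight))


jobs = [
    # Generic analysis job: over its share, but queued for a long time.
    Job("ganga-analysis", queued_minutes=600, fs_delta=-20),
    # Sub-group job: under its share, only recently queued.
    Job("atlas-subgroup", queued_minutes=30, fs_delta=15),
]
```

With these made-up numbers the long-queued Ganga job wins at FSWEIGHT=1 (580 vs 45), while any FSWEIGHT above about 16.3 lets the under-share sub-group job go first.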

You need to spread VOs over disk servers

We had a nice tidy setup where all the ATLAS filesystems were on one DPM disk server. Of course, this then got hammered, so we're now trying to spread the data out across multiple servers.
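One simple way to avoid a single hot server is to deal each VO's filesystems out across the disk servers round-robin, as in this Python sketch. It only illustrates the idea; the server names and filesystem counts are hypothetical, and a real DPM layout also has to take pools and server capacity into account.

```python
from itertools import cycle


def spread_filesystems(vo_fs_counts, servers):
    """Assign each VO's filesystems to disk servers round-robin.

    vo_fs_counts: dict mapping VO name -> number of filesystems it needs.
    servers: list of disk server names.
    Returns a dict mapping server -> list of (vo, filesystem index).
    """
    layout = {server: [] for server in servers}
    ring = cycle(servers)
    for vo, count in vo_fs_counts.items():
        for index in range(count):
            # Consecutive filesystems of one VO land on different servers.
            layout[next(ring)].append((vo, index))
    return layout


layout = spread_filesystems({"atlas": 3, "lhcb": 2}, ["disk01", "disk02", "disk03"])
```

Here the three ATLAS filesystems end up on three different servers, so heavy ATLAS reads are shared rather than hitting one box.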

by Stephen Childs at June 08, 2009 12:53

March 23, 2009

Steve at CERN

Installed Capacity at CHEP

This week is CHEP 09, preceded by the WLCG workshop. I presented some updates on the roll-out of the installed capacity document. It included examples of a few sites that would have zero capacity if considered under the new metrics.
Sites should consider taking the following actions.

• Check gridmap. In particular, look at the view obtained by clicking on the "more" label and selecting "size by SI00 and LogicalCPUs".
• Adjust the number of LogicalCPUs published in your SubCluster. It should correspond to the number of computing cores that you have.
• Adjust your SpecInt2000 (SI00) settings in the SubCluster. The aim is to make your gridmap box the correct size to represent your total site power.
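A site can show up with zero capacity because both published values feed into its size. The Python sketch below assumes that total power is LogicalCPUs multiplied by the per-core SpecInt2000 figure (GlueSubClusterLogicalCPUs and GlueHostBenchmarkSI00 in GLUE 1.3 terms), so a SubCluster publishing zero LogicalCPUs contributes nothing, however fast its cores are. The SubCluster names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubCluster:
    name: str
    logical_cpus: int   # GlueSubClusterLogicalCPUs: number of cores
    si00_per_core: int  # GlueHostBenchmarkSI00, assumed to be a per-core value


def capacity(sc):
    """Assumed total power of one SubCluster: cores times per-core SI00."""
    return sc.logical_cpus * sc.si00_per_core


def site_report(subclusters):
    """Return the total site power and the SubClusters that count as zero."""
    total = sum(capacity(sc) for sc in subclusters)
    zero = [sc.name for sc in subclusters if capacity(sc) == 0]
    return total, zero


subclusters = [SubCluster("sc-a", 512, 1500), SubCluster("sc-b", 0, 1700)]
total, zero = site_report(subclusters)
```

Here sc-b has perfectly good hardware but vanishes from the total because it publishes zero LogicalCPUs, which is exactly the kind of SubCluster counted in the answer to question 5 below.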
The follow-up questions were as follows; now there's a chance for a more considered response.
1. Will there be any opportunity to run a benchmark within the GCM framework?
  I answered that this was not possible, since unless it could be executed in under 2 seconds there was no room for it. Technically there would be no problem with running something for longer; it could be run rarely. We should check how the first deployment of GCM goes, though; longer tests are in no way planned.
2. What is GCM collecting and who can see its results?
  Currently no one can see it on the wire, since messages are encrypted. There should be a display at, however it is currently down; once it is back it will be accessible to IGTF CA members. For now there are some test details available.
3. When should sites start publishing the HEPSpecInt2006 benchmark?
  The management contributed "Now", which is of course correct; the procedure is well established. Sites should be in the process of measuring their clusters with the HEPSpec06 benchmark. With the next YAIM release they will be able to publish the value too.
4. If sites are measuring these benchmarks, can the values be made available to jobs on the worker nodes?
  Recently the new glite-wn-info made it as far as the PPS service. This allows a job to find out, on the WN, which GlueSubCluster it belongs to. In principle this should be enough, since the Spec benchmarks can be retrieved from the GlueSubClusters. The reality, of course, is that this is not possible until some future date when all the WNWG recommendations are deployed, along with CREAM. So for now I will extend glite-wn-info to also return a HepSpec2006 value as configured by the site administrators.
5. Do you know how many sites are currently publishing incorrect data?
  I did not know the answer, nor is an answer easy to get other than by collecting the ones of zero size. Checking now, of 498 (468 unique?) SubClusters, some 170 have zero LogicalCPUs.
On a more random note, a member of CMS approached me afterwards to thank me for the support I gave him 3 or so years ago while I was working at RAL. At the time we both had an interest in making the grid work. He got extra queues, reserved resources, process dumps and general job watching from me. They were the first grid jobs we had that approached something like the analysis we now face. To quote the gentleman: from his grid experience and results using RAL, he obtained his doctorate and CMS chose to use the grid.

by Steve Traylen at March 23, 2009 10:11

January 07, 2009

GridPP Operations

OpenSSL vulnerability

There is a new vulnerability in all versions of OpenSSL prior to 0.9.8j, discovered by Google's security team. You will be happy to learn that the Grid PKI is not affected by the vulnerability, since it uses RSA signatures throughout; only DSA and ECDSA (DSA with elliptic curves) signatures are affected. (Of course, you should still upgrade!)

by Jens Jensen at January 07, 2009 21:06

January 05, 2009

GridPP Operations

New MD5 vulnerability announced

In 2006, two different MD5-signed certificates with the same signature were created. A new, stronger attack, announced last Wednesday (yes, 30 Dec), allows the attacker to change more parts of the certificate, including the subject name. To use this "for fun and profit", one gets an MD5-signed end-entity certificate from a CA (ideally one in the browser's keystore) and hacks it to create an intermediate CA which can then issue

by Jens Jensen at January 05, 2009 11:12