September 02, 2010

National Grid Service

A flurry of activity

A batch of new speakers has recently been announced for the NGS Innovation Forum '10, and further information about their presentations is now on the website. The latest additions are NGS tool demos: walk-throughs of NGS tools using real research examples, so that delegates leave the event knowing about new tools they can use in their research.

NGS tool demos
1. Transcriptome Analysis using the NGS User Interface/Workload Management System (UI/WMS) – Jonathan Churchill, NGS, STFC RAL
The UI/WMS is a tool which allows users to easily submit jobs to the whole of the NGS, relying on the WMS to choose which NGS resources to use for their jobs. Use of the UI/WMS will be demonstrated with a user case study in which the analysis time for mRNA data was cut from a month to less than 12 hours.

2. Accessing the NGS using the Application Hosting Environment (AHE) – Stefan Zasada, UCL
An overview of how access to the NGS can be simplified using the Application Hosting Environment, a lightweight application portal system.

3. Using the HERMES data management tool – David Wallom, NGS, University of Oxford
Here we will show how easy it is to install HERMES and connect it to various NGS resources to move data between them, your home institution and your desktop.

4. The NGS from the CCP4 desktop – Matteo Turilli, NGS, University of Oxford
The NGS R&D theme has been working to build access to the NGS into the desktop tools that researchers use on a day-to-day basis. In this presentation we look at the example of CCP4: Software for Macromolecular X-Ray Crystallography.

We also have a presentation from the Director of the NGS -
The future of the NGS – Neil Geddes, NGS Director, STFC RAL
This presentation will look at the focus of activities for the NGS for the coming 2-3 years and possible longer term opportunities.

Remember that registration for the event is now open and that the call for poster abstracts closes on the 10th of September!

by Gillian (noreply@blogger.com) at September 02, 2010 10:45

September 01, 2010

SouthGrid

APEL on ngsce-test

APEL was failing on ngsce-test with the following error.

java.io.FileNotFoundException: /var/spool/pbs/server_priv/accounting/20090522 (Too many open files)

The solution was to type:
ulimit -n 10240

I've added this to the /opt/glite/bin/apel-pbs-log-parser script.

A fix is being tested, so a future version of APEL should resolve this.
See GGUS ticket:
https://gus.fzk.de/ws/ticket_info.php?ticket=60674
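`ulimit -n` raises the per-process limit on open file descriptors (RLIMIT_NOFILE) for the shell and everything it starts, which is why putting it in the wrapper script helps the Java parser (the error above is a `java.io.FileNotFoundException`). For illustration only, here is a minimal sketch of the same adjustment made from inside a Python process with the standard `resource` module; it is not part of APEL, and the 10240 target is simply the value used above.

```python
import resource


def raise_open_file_limit(target: int = 10240) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target`; return the resulting soft limit.

    Roughly what `ulimit -n <target>` does in a shell. An unprivileged process
    may only raise its soft limit up to the hard limit, so the target is capped.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("open-file soft limit:", raise_open_file_limit())
```

Capping at the hard limit keeps the call from failing for a non-root user; raising the hard limit itself needs root (or a change to the system limits configuration).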

by Pete Gronbech (noreply@blogger.com) at September 01, 2010 14:14

NorthGrid

Atlas jobs in Manchester

August saw a really notable increase in ATLAS user pilot jobs: over 34000 jobs, more than 12000 of them in the last 4 days alone. Plotting the number of jobs since the beginning of the year shows that the balance between production and user pilots has inverted.



The trend in August was probably helped by moving all the space to the DATADISK space token and attracting more interesting data. LOCALGROUP is also heavily used in Manchester.

In the past 4 days we have also applied the XFS file system tuning suggested by John, which solves the high load on the data servers that we had experienced since upgrading to SL5. The tweak has notably increased data throughput and reduced the load on the data servers practically to zero, allowing us to increase the number of concurrent jobs. This has given a higher job throughput and a clear improvement in job efficiency: the least efficient jobs are now the very short ones (<10 mins CPU time), and even for those the improvement is notable, as can be seen from the plots below.


Before applying the tweak




After applying the tweak




This also means we can keep using XFS for the data servers, which currently offers more flexibility as far as partition sizes are concerned.
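A note on the efficiency figures mentioned above: job efficiency is taken here to be CPU time divided by wall-clock time, a common grid convention but an assumption on our part, since the post does not define it. The sketch below uses a hypothetical `Job` record and splits jobs at the 10 minutes of CPU time cut from the text, to show how the very short jobs can be isolated.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Job:
    # Hypothetical record; times in seconds, e.g. from batch accounting logs.
    cpu_seconds: float
    wall_seconds: float


SHORT_JOB_CPU_SECONDS = 10 * 60  # the "<10 mins CPU time" cut from the text


def efficiency_by_class(jobs: Iterable[Job]) -> Dict[str, float]:
    """Aggregate CPU/wall efficiency for short (<10 min CPU) and other jobs."""
    totals = {"short": [0.0, 0.0], "long": [0.0, 0.0]}  # [cpu, wall]
    for job in jobs:
        key = "short" if job.cpu_seconds < SHORT_JOB_CPU_SECONDS else "long"
        totals[key][0] += job.cpu_seconds
        totals[key][1] += job.wall_seconds
    # Skip a class with no wall time rather than divide by zero.
    return {k: cpu / wall for k, (cpu, wall) in totals.items() if wall > 0}
```

Summing CPU and wall time before dividing gives a wall-time-weighted efficiency for each class; averaging per-job ratios instead would weight every job equally.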

by Alessandra Forti (noreply@blogger.com) at September 01, 2010 08:00

August 31, 2010

NorthGrid

Tuning Areca RAID controllers for XFS on SL5

Sites (including Liverpool) running DPM on pool nodes running SL5 with XFS file systems have been experiencing very high load (load averages up to several hundred and close to 100% CPU I/O wait) when a number of analysis jobs were accessing data simultaneously with rfcp.

The exact same hardware and file systems under SL4 had shown no excessive load, and the SL5 systems had shown no problems under system stress testing/burn-in. Also, the problem was occurring from a relatively small number of parallel transfers (about 5 or more on Liverpool's systems were enough to show an increased load compared to SL4).

Some admins have found that using ext4 at least alleviates the problem although apparently it still occurs under enough load. Migrating production servers with TBs of live data from one FS to another isn't hard but would be a drawn out process for many sites.

The fundamental problem for either FS appears to be IOPS overload on the arrays rather than sheer throughput, although why this is occurring so much under SL5 and not under SL4 is still a bit of a mystery. There may be changes in controller drivers, XFS, kernel block access, DPM access patterns or default parameters.

When faced with an IOPS overload (one that occurs well below the theoretical throughput of the array), one solution is to make each I/O operation read more data from the storage device, so that you need fewer but larger read requests.

This leads to the actual fix (we have been doing this by default on our 3ware systems but we just assumed the Areca defaults were already optimal).
blockdev --setra 16384 /dev/$RAIDDEVICE

This sets the block device read ahead to (16384/2)kB (8MB). We have previously (on 3ware controllers) had to do this to get the full throughput from the controller. The default on our Areca 1280MLs is 128 (64kB read ahead). So when lots of parallel transfers are occurring our arrays have been thrashing spindles pulling off small 64kB chunks from each different file. These files are usually many hundreds or thousands of MB where reading MBs at a time would be much more efficient.

The mystery for us is more why the SL4 systems *don't* overload rather than why SL5 does, as the SL4 systems use the exact same default values.

Here is a ganglia plot of our pool nodes under about as much load as we can put on them at the moment. Note that previously our SL5 nodes would have LAs in the 10s or 100s under this load or less.

http://hep.ph.liv.ac.uk/~jbland/xfs-fix.html

Any time the systems now go above a load average of 1, they are also having data written at a high rate. On that note, we also hadn't configured our Arecas to have their block max sector size aligned with the RAID chunk size with

echo "64" > /sys/block/$RAIDDEVICE/queue/max_sectors_kb

although we don't think this had any bearing on the overloading and might not be necessary.

We expect the tweak to also work for systems running ext4, as the underlying hardware access would still be a bottleneck, just at a different level of access.

Note that this 'fix' doesn't fix the even more fundamental problem, pointed out by others, that DPM doesn't rate-limit connections to pool nodes. All this fix does is (hopefully) push the point at which overload occurs above the rate at which our worker nodes (WNs) can pull data.

There is also a concern that a large read ahead may affect small random (RFIO) access, although sites can tune this parameter very quickly to get optimum access. 8MB is slightly arbitrary, but 64kB is certainly too small for any sensible access to LHC data that I can envisage. Most access is via full file copy (rfcp) reads at the moment.

by John Bland (noreply@blogger.com) at August 31, 2010 18:24

National Grid Service

Registration for NGS Innovation Forum open now!

The registration for the NGS Innovation Forum is now open - details of how to register can be found on the event page on the NGS website.

We are pleased to announce another speaker for the event. Neil Geddes, director of the NGS, will be speaking about the future of the NGS so if you are a long term NGS user who wants to know the future direction of the NGS or a new user who is planning to use the NGS for the long term, then make sure you attend!

A reminder that we are still looking for NGS users to submit poster abstracts demonstrating how they use our resources in their research. The deadline for abstracts is the 10th of September so it's approaching soon! There are many benefits of submitting an abstract and attending the event -

  • Walk through demos of new NGS tools
  • NGS staff on-hand to answer your questions
  • The opportunity to contribute and feedback to the future of the NGS
  • The poster abstracts will be peer reviewed by the NGS IF'10 programme committee
  • Publicity for your research both at the event and through accepted posters being placed on the NGS website
  • The chance to win a prize as "best poster" as voted for by IF'10 delegates
All that is required is a short 200 word abstract! Of course you are more than welcome to attend the event without submitting an abstract and you can attend for one or both days. We hope you can come along!

by Gillian (noreply@blogger.com) at August 31, 2010 13:37

GridPP Storage

Taming XFS on SL5

Sites (including Liverpool) running DPM on pool nodes running SL5 with XFS file systems have been experiencing very high load (load averages up to several hundred and close to 100% CPU I/O wait) when a number of analysis jobs were accessing data simultaneously with rfcp.

The exact same hardware and file systems under SL4 had shown no excessive load, and the SL5 systems had shown no problems under system stress testing/burn-in. Also, the problem was occurring from a relatively small number of parallel transfers (about 5 or more on Liverpool's systems were enough to show an increased load compared to SL4).

Some admins have found that using ext4 at least alleviates the problem although apparently it still occurs under enough load. Migrating production servers with TBs of live data from one FS to another isn't hard but would be a drawn out process for many sites.

The fundamental problem for either FS appears to be IOPS overload on the arrays rather than sheer throughput, although why this is occurring so much under SL5 and not under SL4 is still a bit of a mystery. There may be changes in controller drivers, XFS, kernel block access, DPM access patterns or default parameters.

When faced with an IOPS overload (one that occurs well below the theoretical throughput of the array), one solution is to make each I/O operation read more data from the storage device, so that you need fewer but larger read requests.

This leads to the actual fix (we have been doing this by default on our 3ware systems but we just assumed the Areca defaults were already optimal).
blockdev --setra 16384 /dev/$RAIDDEVICE

This sets the block device read ahead to (16384/2)kB (8MB). We have previously (on 3ware controllers) had to do this to get the full throughput from the controller. The default on our Areca 1280MLs is 128 (64kB read ahead). So when lots of parallel transfers are occurring our arrays have been thrashing spindles pulling off small 64kB chunks from each different file. These files are usually many hundreds or thousands of MB where reading MBs at a time would be much more efficient.
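The factor of two comes from `blockdev --setra` counting in 512-byte sectors. The arithmetic below is a sketch of why a larger read ahead helps an IOPS-bound array, under the idealised assumption that each read-ahead window becomes one request to the array; the 2GiB file size is only an example. Reading such a file sequentially takes 32768 requests at the 64kB default but only 256 at 8MB.

```python
SECTOR_BYTES = 512  # blockdev --setra/--getra count 512-byte sectors


def readahead_bytes(sectors: int) -> int:
    """Convert a blockdev read-ahead setting (in sectors) to bytes."""
    return sectors * SECTOR_BYTES


def reads_per_file(file_bytes: int, readahead_sectors: int) -> int:
    """Idealised number of read-ahead-sized requests to read a file sequentially."""
    chunk = readahead_bytes(readahead_sectors)
    return -(-file_bytes // chunk)  # ceiling division


if __name__ == "__main__":
    two_gib = 2 * 1024**3  # example file size only
    for sectors in (128, 16384):  # Areca default vs. the new setting
        kib = readahead_bytes(sectors) // 1024
        print(f"setra {sectors:>5}: {kib:>5} KiB read ahead, "
              f"{reads_per_file(two_gib, sectors):>6} reads per 2 GiB file")
```

With many files being read in parallel, each of those requests can mean a seek, so cutting the request count by a factor of 128 is what relieves the spindles.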

The mystery for us is more why the SL4 systems *don't* overload rather than why SL5 does, as the SL4 systems use the exact same default values.

Here is a ganglia plot of our pool nodes under about as much load as we can put on them at the moment. Note that previously our SL5 nodes would have LAs in the 10s or 100s under this load or less.

http://hep.ph.liv.ac.uk/~jbland/xfs-fix.html

Any time the systems now go above a load average of 1, they are also having data written at a high rate. On that note, we also hadn't configured our Arecas to have their block max sector size aligned with the RAID chunk size with

echo "64" > /sys/block/$RAIDDEVICE/queue/max_sectors_kb

although we don't think this had any bearing on the overloading and might not be necessary.
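To check the current settings across several arrays, both values can be read back from sysfs: `queue/read_ahead_kb` holds the read ahead in kB (the `--setra` value divided by two) and `queue/max_sectors_kb` the value set above. Below is a minimal sketch; the `sys_block` argument exists only so it can be pointed at a test directory, and the 8192kB threshold just mirrors the setting chosen in this post.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def queue_settings(sys_block: str = "/sys/block") -> Dict[str, Tuple[int, int]]:
    """Return {device: (read_ahead_kb, max_sectors_kb)} for each block device."""
    settings = {}
    for dev in sorted(Path(sys_block).iterdir()):
        queue = dev / "queue"
        try:
            ra = int((queue / "read_ahead_kb").read_text().strip())
            ms = int((queue / "max_sectors_kb").read_text().strip())
        except (OSError, ValueError):
            continue  # no queue directory or unreadable attribute
        settings[dev.name] = (ra, ms)
    return settings


def low_readahead(settings: Dict[str, Tuple[int, int]], min_kb: int = 8192) -> List[str]:
    """Devices whose read ahead is below `min_kb` (8192 kB == setra 16384)."""
    return [dev for dev, (ra, _) in settings.items() if ra < min_kb]


if __name__ == "__main__":
    if Path("/sys/block").is_dir():
        current = queue_settings()
        for dev, (ra, ms) in current.items():
            print(f"{dev}: read_ahead_kb={ra} max_sectors_kb={ms}")
        print("below 8MB read ahead:", low_readahead(current))
    else:
        print("no /sys/block on this system")
```

Note that these settings are not persistent across reboots, so a check like this is a cheap way to confirm that whatever boot-time mechanism a site uses has actually applied them.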
 
We expect the tweak to also work for systems running ext4, as the underlying hardware access would still be a bottleneck, just at a different level of access.

Note that this 'fix' doesn't fix the even more fundamental problem, pointed out by others, that DPM doesn't rate-limit connections to pool nodes. All this fix does is (hopefully) push the point at which overload occurs above the rate at which our worker nodes (WNs) can pull data.
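The missing rate limiting can be pictured as a cap on concurrent transfers per pool node. The sketch below is purely illustrative and is not DPM code: it gives each node a `threading.BoundedSemaphore`, so transfers beyond the cap wait their turn instead of adding to the load on the array. `PerNodeLimiter`, the node name `pool01`, the cap of 3 and `fake_transfer` are all hypothetical.

```python
import threading
import time
from collections import defaultdict


class PerNodeLimiter:
    """Allow at most `limit` concurrent transfers to each pool node."""

    def __init__(self, limit: int):
        self._limit = limit
        self._lock = threading.Lock()
        self._sems = {}

    def _sem(self, node: str) -> threading.BoundedSemaphore:
        # Create one semaphore per node on first use.
        with self._lock:
            if node not in self._sems:
                self._sems[node] = threading.BoundedSemaphore(self._limit)
            return self._sems[node]

    def run(self, node: str, transfer, *args):
        # Blocks until one of the node's slots is free, then runs the transfer.
        with self._sem(node):
            return transfer(*args)


if __name__ == "__main__":
    limiter = PerNodeLimiter(limit=3)
    active, peak = defaultdict(int), defaultdict(int)
    stats = threading.Lock()

    def fake_transfer(node: str) -> None:
        # Stand-in for an rfcp-style copy; only records concurrency.
        with stats:
            active[node] += 1
            peak[node] = max(peak[node], active[node])
        time.sleep(0.01)
        with stats:
            active[node] -= 1

    threads = [threading.Thread(target=limiter.run,
                                args=("pool01", fake_transfer, "pool01"))
               for _ in range(12)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("peak concurrent transfers on pool01:", peak["pool01"])
```

A cap like this trades latency for stability: queued clients wait longer to start, but the array never sees more parallel streams than it can serve efficiently.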

There is also a concern that a large read ahead may affect small random (RFIO) access, although sites can tune this parameter very quickly to get optimum access. 8MB is slightly arbitrary, but 64kB is certainly too small for any sensible access to LHC data that I can envisage. Most access is via full file copy (rfcp) reads at the moment.

by John Bland (noreply@blogger.com) at August 31, 2010 06:50

August 28, 2010

National Grid Service

Ravioli code

The ngs-vo-tool - a utility for organising your virtual organisations - was released to the NGS's area on NeSCForge earlier this week.

Previous postings have described the reasons why the tool was written and how we worked out what it needed to do. This one is about developing and testing the program code itself.

So if you thought the 'Rough guide to the User Account Service' was dull, look away now.

The ngs-vo-tool was meant to be an example of 'software as documentation'.

If you are running complex software, which needs configuration information to be scattered around multiple files in different formats, and want to describe how you set it up then you have two options...
  • Write a long and detailed description of every stage in the process and expect people to read and follow the documentation.
or
  • Write a program to do the dirty work and expect anyone who wants to know how it works to read the program.
We are doing the latter. This is why ngs-vo-tool is written in Python - a language designed to be readable by people who don't actually know the language. Take this example from the code:

class LcmapsMapper(BaseMapper):
    "LCMAPS gridmapfile and groupmapfile entries"

    # ... snip ...

    def add_mapping(self, vo, acinfo):
        self.assert_mapping_args(vo, acinfo)

        gridmap_e, groupmap_e = mapfile_entries(vo, acinfo)

        if gridmap_e not in self.gridmap_entries:
            logger.debug("Adding <%s> to gridmapfile" % gridmap_e)
            self.gridmap_entries.append(gridmap_e)

        if groupmap_e not in self.groupmap_entries:
            logger.debug("Adding <%s> to groupmapfile" % groupmap_e)
            self.groupmap_entries.append(groupmap_e)

    # ... snip ...


As long as you can wrap your brain around Python's use of indentation to group related code together, that code describes the process of adding VO information to the contents of an LCMAPS gridmapfile and groupmapfile.

The ngs-vo-tool was also an attempt to apply the kind of software engineering techniques that our colleagues at the Software Sustainability Institute want to encourage in academia. In particular, we tried to use something approximating to Test Driven Development (TDD) to keep the bugs under control.

In TDD, for each bit of program code there is a corresponding bit of test code to put it through its paces. A test harness is used to run all the tests whenever the code is changed.

If you have the source code, run:

python setup.py test

to watch all 57 tests fly past.

In proper TDD, the test is written first and run before any attempt is made to write the thing being tested. I'm afraid that I was more pragmatic and wrote the tests and code at the same time.

This approach encourages the developer to split the code into largely-self-contained modules with well defined interfaces simply because these are much easier to write tests against. Some people refer to this as ravioli code.
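To give a flavour of the approach, here is a small, self-contained sketch using Python's standard `unittest` module. `MapfileCollector` is a hypothetical stand-in loosely modelled on the duplicate-checking logic in the `add_mapping` excerpt above; it is not the ngs-vo-tool's actual code. In strict TDD the test class would be written first and seen to fail before `add` existed.

```python
import unittest


class MapfileCollector:
    """Hypothetical: collects mapfile lines in order, ignoring exact duplicates."""

    def __init__(self):
        self.entries = []

    def add(self, entry: str) -> bool:
        """Append `entry` unless already present; return True if it was added."""
        if entry in self.entries:
            return False
        self.entries.append(entry)
        return True


class MapfileCollectorTest(unittest.TestCase):
    ENTRY = '"/training.ngs.ac.uk/*" .ngstrain'

    def test_new_entry_is_added(self):
        c = MapfileCollector()
        self.assertTrue(c.add(self.ENTRY))
        self.assertEqual(c.entries, [self.ENTRY])

    def test_duplicate_is_ignored(self):
        c = MapfileCollector()
        c.add(self.ENTRY)
        self.assertFalse(c.add(self.ENTRY))
        self.assertEqual(len(c.entries), 1)

    def test_order_is_preserved(self):
        c = MapfileCollector()
        for e in ("b", "a", "b", "c"):
            c.add(e)
        self.assertEqual(c.entries, ["b", "a", "c"])
```

Saved in a file whose name starts with `test` (for example a hypothetical `test_mapfile.py`), `python -m unittest` will discover and run it. Because the class has one small, well-defined interface, each behaviour gets its own short test.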

I would be a fool to claim that the program is bug-free - although the testing did shake out bugs early in the development process and I do think that the code is cleaner and easier to read because of this approach.

Someone at the eScience centre in Southampton once came up with the phrase 'making the useful usable' as a way of describing their work. I hope that someone will find the ngs-vo-tool useful in making their services to the grid usable.

by Jason Lander (noreply@blogger.com) at August 28, 2010 01:06

August 27, 2010

SouthGrid

Argus Server at Oxford

We finally managed to install an Argus server at Oxford, with a messy workaround. Installation and configuration were reasonably OK, and once the policy structure was clear, writing and loading policies was also easy. Details are here: http://www.gridpp.ac.uk/wiki/Oxford.

The main issue was that host certificates issued by the UK CA contain an "emailAddress" field. This was supposedly deprecated years ago, and most developers assume that there is no "emailAddress" in a host certificate. Still, it is a bug in Argus and will hopefully be resolved in the next release.
So, the workaround:
By default the pap-admin command uses the host certificate in /etc/grid-security/ if started as root, but since there is a problem with the host certificate, I copied my personal certificate proxy from the UI and started pap-admin using that proxy. Then I added an ACE:
pap-admin ace "/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=t2argus02.physics.ox.ac.uk/OID.1.2.840.113549.1.9.1=lcg_manager@physics.ox.ac.uk" ALL
This workaround was suggested by Andrea Ceccanti.

The only issue is that if you want to restart the pap service, you first have to remove the ACE using the remove-ace command, restart pap and then add the ACE again.
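The troublesome attribute shows up in the DN used above as `OID.1.2.840.113549.1.9.1`, the object identifier for emailAddress; OpenSSL prints it as `emailAddress=` and some other tools as `E=`. As a hypothetical pre-flight check (not part of Argus or pap-admin), a script could flag such DNs before they are used. This sketch assumes the slash-separated DN format shown above and does not handle escaped slashes.

```python
EMAIL_ATTRIBUTE_NAMES = {
    "emailaddress",              # OpenSSL spelling
    "e",                         # short form used by some tools
    "oid.1.2.840.113549.1.9.1",  # as in the ACE above
    "1.2.840.113549.1.9.1",      # bare OID form
}


def dn_has_email(dn: str) -> bool:
    """True if a slash-separated DN contains an emailAddress component."""
    for rdn in dn.split("/"):
        name, sep, _value = rdn.partition("=")
        if sep and name.strip().lower() in EMAIL_ATTRIBUTE_NAMES:
            return True
    return False
```

Matching case-insensitively on the attribute name, rather than on a fixed string, catches the different spellings different tools produce for the same attribute.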

by Kashif Mohammad (noreply@blogger.com) at August 27, 2010 13:24

August 26, 2010

GridPP Storage

Climbing Everest

The slides should appear soon on the web page - the mountain themed programme labelled us Everest, the second highest mountain on the agenda.
Apart from the lovely fresh air, hills and outdoorsy activities, GridPP25 was also an opportunity to persuade the experiments (and not just the LHC ones) to give us feedback and to discuss future directions; we'll try to collate this and follow up. We are also working on developing "services" which seem to be useful, e.g. checking the integrity of files, or the consistency between catalogues and the storage elements. And of course it was a chance for us to meet face to face and catch up over a coffee.

by Jens Jensen (noreply@blogger.com) at August 26, 2010 06:11

August 24, 2010

National Grid Service

Application hosting environment and UI/WMS - preparations continue!

I've just added some details of more presentations at the forthcoming Innovation Forum to the website. We have two demos: one of the UI/WMS, which has proved to be a big hit with our users, and one of the Application Hosting Environment (AHE). Full details of the presentations are available on our event page on the website. Delegates will see walk-throughs of how to use these tools on the NGS and the benefits they can bring to your research.

We have also updated the UI/WMS tutorials if you want to give the UI/WMS a go beforehand!

by Gillian (noreply@blogger.com) at August 24, 2010 14:50

August 20, 2010

National Grid Service

User success stories at the NGS

The programme for the 3rd NGS Innovation Forum is really beginning to come together now. I've managed to secure presentations from 3 NGS users from completely different research areas to talk about how they have used the NGS in their work.

First up we have Zhongwei Guan from the University of Liverpool, who leads a research group where many researchers use the NGS. They research the impact of explosions on aircraft fuselages, as well as blasts and impacts on concrete, amongst many other things. I have seen Zhongwei speak at previous events and his presentations are colourful and interesting!

Next up in a complete change of direction, we have Luke Rendell from the Centre for Social Learning and Cognitive Evolution at the University of St Andrews. Luke came to the NGS to ask for resources to run an international computer tournament on the evolution of learning. The tournament and results were so productive that the resulting paper was published in Science and was featured in New Scientist.

Finally, and by no means least, we have Narcis Fernandes-Fuentes, who has used the NGS for several years to discover novel therapeutic agents. Narcis has had his research turned into an NGS user case study as an example of the type of research that can be performed on the NGS.

The deadline for poster abstracts for the event is approaching (10th Sept) so if you would like to submit an abstract and be in with a chance of winning the best poster prize, please submit soon!

by Gillian (noreply@blogger.com) at August 20, 2010 12:45

ScotGrid

Why, yes ... we were using that...

So .... remind me never to do a 'nothing much happening' post again. It looks like tempting fate results in Interesting Times.

Our cooling setup in one of the rooms is a bit quirky: it is based on a chilled water system (long story, but it was originally built for cooling a laser before we ended up with it). There have been a few blips with the water supply, so an engineer was duly dispatched to have a poke at it.
The 'poke' in this case involved switching it off until he could delve into the depths of the machine, resulting in the rather exciting peak in temperatures (measured using the on-board thermal sensors in the worker nodes).

We were supposed to get a warning from the building systems when the chiller went offline, and again when the water supply temperature rose too high. (The air temperature lags behind the water temperature, so it's a good early warning.) As neither of those happened, our first warning was the air temperature in the room, followed by the nodes' internal sensor alarms.

First course of action was to offline the nodes, and then find the cause of the problem. Once found, there was a short ... Explanation ... of why that was a Bad Time to switch off the chiller. We'll schedule some downtime to get it done later; at some point when we're not loaded with production jobs.

Still, little incidents like this are a good test for the procedures. Everything went pretty smoothly, from offlining nodes to stop them picking up new jobs, through to the defence in depth of multiple layers of monitoring systems.

Thankfully, we didn't need to do anything drastic (like hard powering off a rack); so we now know how long we have from a total failure of cooling until the effects kick in. Time to sit down and do some sums, to make sure we could handle a cooling failure at full load that occurs at 3am...

Update: 19/08/2010 by Mike


Never mind "sums", I took the physicist's approach a couple of years ago and got some real data:

Triangles (offset slightly along x-axis for clarity) are the temperatures of worker nodes as reckoned by IPMI; stars are input air temperatures to the three downflow units in room 141 and the squares are flow/return water temperatures. I simulated a total loss of cooling by switching the chilled water pump off; all worker nodes were operating at their maximum nominal load. It took ~20 minutes for the worker node temperatures to reach 40 degrees, at which point I bottled it and restored cooling. So, for good reason, we now run a script that monitors node temperatures, and has the ability to power them off once a temperature threshold is breached. Oh, and that has been tested in anger.
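The logic of such a script can be sketched as a pure decision function: given the latest per-node temperatures (for example read via IPMI) and a threshold, return the nodes that should be powered off. This is an illustration under assumptions, not the script described above: the node names, the 40 degree default (borrowed from the test described above) and the `power_off` callback are hypothetical, and how readings are gathered and nodes switched off is left to site tools.

```python
from typing import Callable, Dict, List


def nodes_over_threshold(readings: Dict[str, float], threshold_c: float = 40.0) -> List[str]:
    """Nodes whose latest temperature reading is at or above the threshold."""
    return sorted(node for node, temp in readings.items() if temp >= threshold_c)


def enforce(readings: Dict[str, float],
            power_off: Callable[[str], None],
            threshold_c: float = 40.0) -> List[str]:
    """Call the (hypothetical) power_off callback for every node over threshold."""
    hot = nodes_over_threshold(readings, threshold_c)
    for node in hot:
        power_off(node)
    return hot
```

Keeping the decision separate from the action makes a dry run easy: pass a callback that only logs the node name, and switch to the real power-off command once the thresholds look right.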

by Stuart Purdie (noreply@blogger.com) at August 20, 2010 00:27

August 19, 2010

National Grid Service

Source code archeology

I'm afraid this is going to be technical.

For the last month or so - in the gaps between holidays, meetings and dealing with a power-glitch that has knocked-out some rather important bits of ngs.leeds.ac.uk - I've been working on the ngs-vo-tool.

The ngs-vo-tool is a utility program that does the tweaking and fiddling needed when adding or removing support for all, or part, of a particular Virtual Organisation (VO).

One of the things the ngs-vo-tool needs to tweak and fiddle with is the LCMAPS version of the gridmapfile. This controls which bits of which VO get assigned to which local accounts, and consists of entries like...

"/training.ngs.ac.uk/*" .ngstrain
"/monitoring.ngs.ac.uk/lcas_lcmaps/*" .ngsmon

This particular example assigns anyone in the NGS's Training VO to an account in the 'ngstrain' pool but only members of the 'lcas_lcmaps' group within the NGS Monitoring VO to the ngsmon pool.

Groups can contain subgroups. You can also cherry-pick VO members with a particular role or a particular capability.

Among the last features due to be added to the ngs-vo-tool is one to allow any combination of group/role/capability to be specified and turned into something fit for an LCMAPS gridmapfile.

As always, it is not as simple as it first appears.

The bit in quotes is a pattern that matches a Fully Qualified Attribute Name (FQAN).

The FQAN is a representation of VO, group, subgroup, role and capability defined in http://edg-wp2.web.cern.ch/edg-wp2/security/voms/edg-voms-credential.pdf as

/VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]
with the additional complication that the FQAN for a VO member with no role can either omit the Role=role part or explicitly include 'Role=NULL'.
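A minimal sketch of assembling an FQAN from its parts, following the syntax above and producing both spellings for a member with no role. `build_fqan` and `no_role_variants` are hypothetical helpers, not the ngs-vo-tool's code, and they ignore any finer details of how VOMS itself emits the Capability part.

```python
from typing import List, Optional, Sequence


def build_fqan(vo: str, groups: Sequence[str] = (), role: Optional[str] = None,
               capability: Optional[str] = None,
               explicit_null_role: bool = False) -> str:
    """Build /VO[/group[/subgroup(s)]][/Role=role][/Capability=cap]."""
    parts = [vo, *groups]
    if role is not None:
        parts.append(f"Role={role}")
    elif explicit_null_role:
        parts.append("Role=NULL")  # alternative spelling of "no role"
    if capability is not None:
        parts.append(f"Capability={capability}")
    return "/" + "/".join(parts)


def no_role_variants(vo: str, groups: Sequence[str] = ()) -> List[str]:
    """Both equivalent spellings of an FQAN with no role."""
    return [build_fqan(vo, groups), build_fqan(vo, groups, explicit_null_role=True)]
```

For example, `no_role_variants("monitoring.ngs.ac.uk", ["lcas_lcmaps"])` gives `/monitoring.ngs.ac.uk/lcas_lcmaps` and `/monitoring.ngs.ac.uk/lcas_lcmaps/Role=NULL`, and any gridmapfile pattern intended for role-less members has to cope with both.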

So in order for the code to do the right thing, I need to work out:
  • What LCMAPS uses to match a pattern to a string; in particular, whether it is fussy about where the '*' can be placed and how many '*'s can be used.
  • How the FQAN is constructed.
and do this for the slightly elderly version of LCMAPS that some NGS sites have deployed.

So a spot of source code archeology is required and luckily, I don't need to dig too deep as CERN kindly provide access to their source code repository on the web.

The LCMAPS code, and the rest of the gLite code, can be found at http://jra1mw.cvs.cern.ch. Released versions are even conveniently 'tagged' with the version number at that release - allowing the incurably geeky to jump directly to the relevant files.

This is work in progress. So far, I've worked out that the venerable Unix fnmatch function is used to match the pattern to the string and fnmatch allows '*'s to be used anywhere.
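Python's `fnmatch.fnmatchcase` behaves roughly like the C `fnmatch` function called without flags (in both, `*` can match `/`), which makes it a convenient way to experiment with patterns. The sketch below parses entries in the gridmapfile format shown earlier and returns the pool for the first pattern that matches an FQAN. First-match-wins is an assumption about LCMAPS behaviour that is not established here, and the parser skips comment lines and ignores anything after the pool name.

```python
import shlex
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def parse_mapfile(text: str) -> List[Tuple[str, str]]:
    """Parse lines like '"/vo/group/*" .pool' into (pattern, account) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # handles the double-quoted pattern
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def match_fqan(fqan: str, entries: List[Tuple[str, str]]) -> Optional[str]:
    """Account for the first pattern matching `fqan` (first match wins: assumed)."""
    for pattern, account in entries:
        if fnmatchcase(fqan, pattern):
            return account
    return None
```

One detail this model exposes: a pattern ending in `/*`, such as `"/monitoring.ngs.ac.uk/lcas_lcmaps/*"`, matches `/monitoring.ngs.ac.uk/lcas_lcmaps/Role=NULL` but not the bare `/monitoring.ngs.ac.uk/lcas_lcmaps`, which is exactly the sort of variant the patterns generated by the tool need to allow for.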

The exact details of FQAN construction are still buried somewhere but suitable fnmatch patterns should cope with the many variants of Role.

The code is in our local version control system. It will be copied to the source code repository at NeSCForge as soon as the important bits of ngs.leeds.ac.uk are back in service.

by Jason Lander (noreply@blogger.com) at August 19, 2010 15:46

ScotGrid

Business as unusual

There have been a lot of little things happening up here; individually none of them is quite big enough to blog about.

And after a while, it's worth doing a catch up post about them. This is that post.

David started a couple of weeks ago, and Mark is starting on Monday, just in time for the GridPP meeting. It seems to be a tradition that every time we get new hardware, the staff rotate; Dug and I started around the time of the last hardware upgrade.

The hardware this time is mostly a petabyte of storage to be added, so David's been working on ways of testing the disks before we sign off on them.

GridPP is next week: the usual round of site reports and future planning. With data from the LHC now a routine matter, it's time to start thinking about future needs. I'll be talking about non-(particle)-physicists on the Grid, as a nod towards the longer-term EGI picture.

We noticed some load balancing issues on our SL5 disk pool nodes; Sam's been poking at that, and it looks like there's a mix of issues, from filesystem type (ext4 is better than xfs here), and clustering of files onto nodes.

And that's most of the interesting stuff from up here. Hopefully we'll have more to post about over the next few months.

by Stuart Purdie (noreply@blogger.com) at August 19, 2010 14:28

GridPP Storage

Where Dave Lives and who shares his home....

The title of this blog post is all about where I am situated within the RALLCG2 site, and where my children are as well. I apparently also want to discuss the profile of "files" across a "disk server", as my avatar likes to put it; I prefer to think of this "Storage Element" that he talks of as my home, and these "disk servers" as rooms inside my home.

I am made of 1779 files (~3TB, if you recall). I am spread across 8/795 tapes in the RAL DATATAPE store (although the pool of tapes for real data is actually only 229 tapes; in total there are currently tapes being used by ATLAS). So I take up 1/1572 datasets but ~1/130 of the volume (~3TB of the ~380TB) stored on DATATAPE at RAL, and correspond to ~1/130 of the files (1779 out of ~230000). In this tape world I am deliberately kept to as small a subset of tapes as possible, to allow for expedient recall.

When it comes to being on disk, on the other hand, I want to be spread out as much as possible so as not to cause "hot disking". However, spreading me across many rooms means that if a single room is down, the chance that I cannot be fully examined increases. In this disk world, my 3TB is part of the 700TB in ATLASDATADISK at RAL; I am 1 in 25k datasets, and my 1779 files are among ~1.5 million. Here my average file size of ~1.7GB is a lot larger than the 450MB average of all the other DATADISK files. (The file size distribution is not linear, but that is a discussion for another day.) I am spread across 38 of the 71 rooms which existed in my space token when I was created. (There are now an additional 10 rooms, and this number will continue to increase in the near term.)

Looking at a random DATADISK server, this is the share of datasets, files and space used in the room taken by each file type:

  • log: 1 in 20 datasets, 1 in 11 files, 1 in 10200 of the space
  • AOD: 1 in 2.7 datasets, 1 in 5.1 files, 1 in 8.25 of the space
  • ESD: 1 in 4.5 datasets, 1 in 3.9 files, 1 in 2.32 of the space
  • TAG: 1 in 8.3 datasets, 1 in 8.5 files, 1 in 3430 of the space
  • RAW: 1 in 47 datasets, 1 in 17 files, 1 in 10.8 of the space
  • DESD: 1 in 5.4 datasets, 1 in 5.1 files, 1 in 3.67 of the space
  • HIST: 1 in 200 datasets, 1 in 46 files, 1 in 735 of the space
  • NTUP: 1 in 50 datasets, 1 in 16 files, 1 in 130 of the space
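Figures of the form "1 in N" are just the reciprocal of a type's share of the total. Here is a sketch of how they might be computed from per-type totals for one server; the `TypeStats` structure is hypothetical, and the totals are taken to be the sum over the types supplied, which matches the lists only if every file on the server belongs to one of the listed types.

```python
from dataclasses import dataclass
from typing import Dict

FIELDS = ("datasets", "files", "size_bytes")


@dataclass
class TypeStats:
    datasets: int
    files: int
    size_bytes: int


def one_in_n(stats: Dict[str, TypeStats]) -> Dict[str, Dict[str, float]]:
    """For each file type, N such that it is '1 in N' of datasets, files and space."""
    totals = {f: sum(getattr(s, f) for s in stats.values()) for f in FIELDS}
    result = {}
    for name, s in stats.items():
        result[name] = {
            f: (totals[f] / getattr(s, f) if getattr(s, f) else float("inf"))
            for f in FIELDS
        }
    return result
```

A type that is "1 in 11" by file count but "1 in 10200" by space, like the log files above, is one made of many small files.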


A similar study has been done for an MCDISK server:
1 in 4.8 datasets represented in this room are log datasets, 1 in 2.5 files are log files, and these correspond to 1 in 18 GB of the space used in the room.
1 in 3.1 datasets represented in this room are AOD datasets, 1 in 5.7 files are AOD files, and these correspond to 1 in 2.1 GB of the space used in the room.
1 in 28 datasets represented in this room are ESD datasets, 1 in 13.6 files are ESD files, and these correspond to 1 in 3.2 GB of the space used in the room.
1 in 4.3 datasets represented in this room are TAG datasets, 1 in 14.6 files are TAG files, and these correspond to 1 in 2000 GB of the space used in the room.
1 in 560 datasets represented in this room are DAOD datasets, 1 in 49 files are DAOD files, and these correspond to 1 in 2200 GB of the space used in the room.
1 in 950 datasets represented in this room are DESD datasets, 1 in 11000 files are DESD files, and these correspond to 1 in 600 GB of the space used in the room.
1 in 18 datasets represented in this room are HITS datasets, 1 in 6.3 files are HITS files, and these correspond to 1 in 25 GB of the space used in the room.
1 in 114 datasets represented in this room are NTUP datasets, 1 in 71 files are NTUP files, and these correspond to 1 in 46 GB of the space used in the room.
1 in 114 datasets represented in this room are RDO datasets, 1 in 63 files are RDO files, and these correspond to 1 in 11 GB of the space used in the room.
1 in 8 datasets represented in this room are EVNT datasets, 1 in 13 files are EVNT files, and these correspond to 1 in 100 GB of the space used in the room.
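The "1 in N" figures above are reciprocals of each type's share of the room's total. The post doesn't say how they were produced. Here is a minimal sketch, assuming the room's full inventory of per-type counts is available. The `TypeStats` structure and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TypeStats:
    """Counts for one data type (AOD, ESD, log, ...) held in one room."""
    datasets: int
    files: int
    gigabytes: float


def one_in(part: float, whole: float) -> float:
    """Return N such that `part` is "1 in N" of `whole`, i.e. whole / part."""
    if part <= 0:
        raise ValueError("part must be positive")
    return whole / part


def one_in_table(stats: dict) -> dict:
    """Map each data type to its (datasets, files, space) "1 in N" ratios.

    Totals are taken over every type in `stats`, so the room's full
    inventory must be passed in for the ratios to be meaningful.
    """
    total_datasets = sum(s.datasets for s in stats.values())
    total_files = sum(s.files for s in stats.values())
    total_gb = sum(s.gigabytes for s in stats.values())
    return {
        name: (
            one_in(s.datasets, total_datasets),
            one_in(s.files, total_files),
            one_in(s.gigabytes, total_gb),
        )
        for name, s in stats.items()
    }
```

Because every type shares the same totals, the values 1/N in any one column should sum to 1 across all types. The MCDISK list above passes this check to within rounding, which gives a cheap sanity test for tables like it.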

As a sample, this MCDISK server represents 1/47 of the space used in MCDISK at RAL and ~1/60 of all files in MCDISK. This room was added recently, so any disparity might be due to this server being filled with newer rather than older files (which would be a good sign, as it would show that ATLAS is increasing file sizes). The average file size on this server is 211 MB; discounting log files (whose average size is 29 MB), this increases to 330 MB per file.
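The three averages in this paragraph are linked by a weighted mean. If a fraction f of the files are log files, then overall = f × log + (1 − f) × rest. Solving for f gives a quick consistency check against the file counts. This sketch uses only the quoted averages, and the function name is illustrative.

```python
def minority_fraction(overall_avg: float, minority_avg: float, rest_avg: float) -> float:
    """Fraction f of files in the minority class, from three average sizes.

    Solves overall = f * minority + (1 - f) * rest for f.
    """
    if rest_avg == minority_avg:
        raise ValueError("the two class averages must differ")
    return (rest_avg - overall_avg) / (rest_avg - minority_avg)


# Averages quoted for the MCDISK server, in MB.
f_log = minority_fraction(overall_avg=211, minority_avg=29, rest_avg=330)
print(f"log files: {f_log:.3f} of all files, i.e. 1 in {1 / f_log:.1f}")
# prints: log files: 0.395 of all files, i.e. 1 in 2.5
```

The result agrees with the "1 in 2.5 files are log files" figure in the MCDISK list. That suggests the quoted averages and counts are mutually consistent.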


One thing my avatar is interested in is this: if one of these rooms were lost, how many of the files stored in that room could be found in another house, and how many would be permanently lost?


For the "room" in the DATADISK Space Token, every file is also located in another house. (This will not always be the case, but it is a good sign that the ATLAS replication model is working.)

For the "room" in the MCDISK Space Token the following is the case:
886 out of the 2800 datasets present are not complete elsewhere. Of these 886, 583 are log datasets (consisting of 21632 files).
Including log datasets, there would potentially be 36283 files in 583 of the 886 datasets, with a capacity of 640 GB of lost data (average file size 18 MB).
Ignoring log datasets, this drops to 14651 files in 303 datasets, with a capacity of 2.86 TB of lost data.
The files on this disk server which are also held elsewhere come from 1914 datasets, consist of 18309 files, and fill a capacity of 8104 GB.
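The loss analysis above classifies each dataset by whether another site holds a complete copy. Here is a minimal sketch of that bookkeeping. It assumes the "complete elsewhere" flag has already been looked up from a replica catalogue, which the sketch does not do. The `RoomDataset` structure and `at_risk` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoomDataset:
    """One dataset as seen from a single disk server ("room")."""
    name: str
    is_log: bool
    file_sizes_mb: list       # sizes of this dataset's files held in the room
    complete_elsewhere: bool  # assumed to come from a replica catalogue lookup


def at_risk(datasets, include_logs: bool = True):
    """Return (datasets, files, MB) that would be lost if the room were lost.

    A dataset counts as at risk when no other site holds a complete copy;
    all of its files in this room are then counted, which is conservative.
    """
    lost = [
        d for d in datasets
        if not d.complete_elsewhere and (include_logs or not d.is_log)
    ]
    n_files = sum(len(d.file_sizes_mb) for d in lost)
    total_mb = sum(sum(d.file_sizes_mb) for d in lost)
    return len(lost), n_files, total_mb
```

Calling it once with `include_logs=True` and once with `include_logs=False` mirrors the two totals quoted above. Working at dataset level overcounts slightly: a dataset that is incomplete elsewhere may still have some of its files replicated.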

by bgedavies (noreply@blogger.com) at August 19, 2010 09:45

August 18, 2010

NorthGrid

ext4 vs ext3 round(1)

I started yesterday to look at ext4 vs ext3 performance with iozone. I installed two old Dell WNs with the same file system layout and the same RAID level (0), one with ext3 and one with ext4 on both / and /scratch (the directory used by the jobs). Both machines use the default mount options, and the kernel is 2.6.18-194.8.1.el5.

I performed the tests on the /scratch partition, writing the log to /. I ran them twice: once unmounting and remounting the file system at each test, to clear any trace of the file from the buffer cache, and once leaving the file system mounted between tests. Tests were automatically repeated for file sizes from 64 kB to 4 GB and record lengths from 4 kB to 16384 kB. Iozone doubles the previous size at each step (4 GB is the largest such size below the 5 GB file size limit I set).
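The post doesn't give the exact iozone command line. This sketch reproduces the size sweep it describes and builds one plausible invocation. It uses iozone's auto-mode options (`-a`, `-n`/`-g` for min/max file size, `-y`/`-q` for min/max record length, `-i` to pick tests, `-U` to unmount/remount between tests). Sizes are in kB, iozone's default unit for these options. The test file path is a hypothetical placeholder.

```python
KB = 1024  # bytes


def doubling_sweep(start: int, limit: int) -> list:
    """Sizes start, 2*start, 4*start, ... that do not exceed `limit`."""
    sizes = []
    size = start
    while size <= limit:
        sizes.append(size)
        size *= 2
    return sizes


# File sizes from 64 kB; 8 GB would exceed the 5 GB cap, so the sweep stops at 4 GB.
file_sizes = doubling_sweep(64 * KB, 5 * KB**3)
# Record lengths from 4 kB to 16384 kB.
record_sizes = doubling_sweep(4 * KB, 16384 * KB)


def iozone_args(testfile: str, mountpoint: str, remount: bool) -> list:
    """Build an iozone command line for an automatic write/read sweep."""
    args = [
        "iozone", "-a",          # automatic mode
        "-n", "64",              # minimum file size (kB)
        "-g", str(5 * KB * KB),  # maximum file size (kB): 5 GB
        "-y", "4",               # minimum record length (kB)
        "-q", "16384",           # maximum record length (kB)
        "-i", "0", "-i", "1",    # test 0 = write/rewrite, test 1 = read/reread
        "-f", testfile,
    ]
    if remount:
        # Unmount and remount before each test so the buffer cache is cold.
        args += ["-U", mountpoint]
    return args
```

The two runs in the post correspond to `remount=True` and `remount=False`. Either list could be handed to `subprocess.run(..., check=True)` on a test node.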

From the numbers, ext4 performs much better at writing, while reading is basically the same, if not slightly worse for smaller files. There is a big drop in performance for both file systems at the 4 GB size.

What I find confusing, however, is that when I repeated the tests with the maximum file size set to 100 MB and only write tests enabled, ext3 took less time (22 s vs 44 s in this case). Despite the numbers saying that ext4 writes almost 40% faster, something slows its tests down (deleting?). The speed of the tests becomes similar for sizes above 500 MB; both decrease steadily until they finally drop at 4 GB for any record length in both file systems.

Below are some results, mostly with the buffer cache, because disabling it mainly affects ext3 for small file sizes and record lengths, as shown in the first graph.

EXT3: write (NO buffer cache)
==================



EXT3: write (buffer cache)
==================




EXT4: write (buffer cache)
==================




EXT3: read (buffer cache)
==================




EXT4: read (buffer cache)
==================


by Alessandra Forti (noreply@blogger.com) at August 18, 2010 18:57

August 17, 2010

Tier1 Blog

One (of five) of the bdii host…

One (of five) of the bdii hosts has died. The DNS alias has been altered to exclude it. This should not be service affecting. We are w… #wlcg

by tiju at August 17, 2010 15:42

No operational issues. Batch f…

No operational issues. Batch farm full for the past 24 hours. There was a slight glitch with the preprod database this morning, though it w… #wlcg

by tiju at August 17, 2010 12:26

August 16, 2010

Tier1 Blog

Batch farm now running 5013 jo…

Batch farm now running 5013 jobs; it has been full all weekend. There are no known service-affecting issues at RAL. #wlcg

by tiju at August 16, 2010 13:47

August 14, 2010

GridPP Storage

This quarter’s deliverable.


Apologies for the late reporting, but on 13/6/10 Cyrus Irving Bhimji went live. He is a mini-Thumper, weighing in at just over 10 lb from day one. Here he is deeply contemplating the future of grid storage.

Since then he has been doing well, as this week's performance data clearly illustrates.

by WorldWideWah (noreply@blogger.com) at August 14, 2010 10:51

August 12, 2010

National Grid Service

Innovative NGS

The more observant amongst you may have noticed a recent addition to the top level tabs on the NGS website.

As well as making sure that the NGS runs all the services required by our users and sites on a day to day basis, we are also working on services for the future. To keep you up to date with our work "behind the scenes", we have added a new Innovation section to the website.

This section currently includes information on our prototype cloud service and on user interfaces such as GSI-PuTTY and R through the NGS (Windows). Some of these services are looking for people to help develop the applications, or to be their first users and help develop them for the wider community.

We hope that you find this section interesting and useful!

by Gillian (noreply@blogger.com) at August 12, 2010 15:31

August 10, 2010

National Grid Service

Want to know how our databases work?

Then come along to the NGS Innovation Forum in November!

We have recently announced the details of our first few presentations, with Simon Collins from the NGS team at the University of Manchester bravely giving two of them.

His first presentation, on day 1, will cover using databases on the NGS. It will give an overview of the databases supported on the NGS, the offerings and advantages these give to users, and real-life case studies of how NGS users are taking advantage of this service.

Simon will also give a second presentation on day 2, aimed more at our member sites, which will look at accounting services and solutions for NGS member sites. Current and potential sites will find out how accounting is supported on the NGS, the different choices available for sites to report accounting information, and the information and tools available to sites that choose to do so.

Keep an eye on the NGS website over the next few weeks to see more details of presentations at the Innovation Forum.

by Gillian (noreply@blogger.com) at August 10, 2010 16:40

August 09, 2010

GridPP Storage

DPM 1.7.4 and the case of the changing python module.

Since we've just had our first GridPP DPM Toolkit user hit this problem, I thought the time was right to blog about it.

Between DPM 1.7.3 and DPM 1.7.4, there is one mostly invisible change that only hits people using the API (for example, the GridPP DPM Toolkit). To clean up the code and make maintenance easier, the API modules have been renamed and split by the programming language that they support.
This means that older versions of the GridPP DPM Toolkit can't interface with DPM 1.7.4 and above, as they don't know the name of the module to import. The symptom is the error "Failed to import DPM API", in cases where following the instructions provided doesn't help at all.
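The GridPP fix was to ship separate RPMs (described next). A different way for a script to cope with a renamed module is to try the new name first and fall back to the old one. This is a sketch of that general technique, not what the Toolkit does. The post doesn't name the DPM modules, so the names in `DPM_API_CANDIDATES` are hypothetical placeholders.

```python
import importlib
from types import ModuleType

# Hypothetical placeholders: the post does not name the DPM API modules,
# so substitute the real post-1.7.4 and pre-1.7.4 names here.
DPM_API_CANDIDATES = ("dpm_api_after_174", "dpm_api_before_174")


def import_first(candidates) -> ModuleType:
    """Import and return the first module in `candidates` that is available."""
    failures = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"{name} ({exc})")
    raise ImportError("Failed to import DPM API; tried: " + ", ".join(failures))
```

Listing every name tried in the final error makes the failure easier to diagnose than a bare "Failed to import DPM API".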

Luckily, we wise men at GridPP Storage have already solved this problem.
http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.storage/
contains two versions of the GridPP DPM Toolkit RPM for the latest release - one suffixed "DPM173-1" and one suffixed "DPM174-1". Fairly obviously, the difference is the version of DPM they work against.
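Picking the right RPM comes down to a version comparison. Here is a minimal sketch, assuming the installed DPM version is already known as a string (for example from the RPM database). It also assumes that releases after 1.7.4 keep the 1.7.4 module names, which the note on future Toolkit releases suggests but does not state. Only the two suffixes come from the post; the function name is hypothetical.

```python
def toolkit_rpm_suffix(dpm_version: str) -> str:
    """Choose the GridPP DPM Toolkit RPM suffix for an installed DPM version.

    Accepts strings such as "1.7.4" or "1.7.4-7" (release part ignored).
    """
    base = dpm_version.split("-")[0]
    parts = tuple(int(p) for p in base.split("."))
    # Tuple comparison orders versions numerically, so (1, 7, 10) > (1, 7, 4).
    if parts >= (1, 7, 4):
        return "DPM174-1"
    if parts >= (1, 7, 3):
        return "DPM173-1"
    raise ValueError(f"no Toolkit build described for DPM {dpm_version}")
```

Comparing integer tuples rather than raw strings avoids the classic trap where "1.7.10" sorts before "1.7.4".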

Future releases of the Toolkit will only support the DPM 1.7.4 modules (and may come relicensed, thanks to the concerns of one of our frequent correspondents, who shall remain nameless).

by Sam Skipsey (noreply@blogger.com) at August 09, 2010 09:36