July 25, 2014

The SSI Blog

Desert Island Hard Disks: David De Roure

You find yourself stranded on a beautiful desert island. Fortunately, the island is equipped with the basics needed to sustain life: food, water, solar power, a computer and a network connection. Consummate professional that you are, you have brought the three software packages you need to continue your life and research. What software would you choose and - go on - what luxury item would you take to make life easier?

Today we'll be hearing from David De Roure, Director of the Oxford eResearch Centre.

I've always felt that, should one ever find oneself in a fairytale "three wishes" scenario, then surely the first wish would be to be able to do magic. Hence my first software package must be a programming language with which I could do anything, and for me that is Lisp. To be specific, it would be the Scheme dialect, and the MIT compiler, but that might all depend on how much I can grab in those precious seconds – "any Scheme will do" as someone once nearly sang. And given one Lisp I can make any other – or indeed any other language, of my own invention. Metalinguistic magic. Lisp was and is my first language and I believe my brain might be programmed in it.

Desert Island Hard Disks, author:David De Roure

by s.hettrick at July 25, 2014 12:00

I don't fix printers or do IT support - I'm a Research Software Engineer

By Gillian Law, Tech Literate.

Ashley Towers, Research Software Engineer at the University of Sheffield, explains why and how he won the job title he wanted. Job titles matter. Towers almost didn’t apply for his current job at the University of Sheffield – a job that he loves - because the title was all wrong.

"They advertised for a Computing Officer, and I probably wouldn't have paid any attention if I saw it on a job page, because it sounds like an IT Support, fixing-the-printers role. But I was fortunate in that my sister works for the University, and she sent me the advert."

The University was, in fact, looking for someone to buy or develop software for students in the School of Clinical Dentistry. Four years later, Towers has made the job his own, creating software that is vital to the dentistry students and brings great research possibilities – and has persuaded the School to change his job title to reflect it.

Towers is now a Research Software Engineer – a job title that the Software Sustainability Institute has been promoting since 2012.

Research Software Engineers, Careers, Dentistry, author:Gillian Law

by s.hettrick at July 25, 2014 09:00

July 24, 2014

The SSI Blog

First FAIRport-ELIXIR BYOD Workshop

By Alasdair J G Gray, Lecturer in Computer Science, Heriot-Watt University

At the end of June, a group of individuals from across Europe came together in Leiden for the first FAIRport-ELIXIR Bring Your Own Data (BYOD) workshop, which was also sponsored by the Dutch Techcentre for Life Sciences. None of us quite knew what would happen but we were all excited that such an event was taking place. The result was better than we expected.

This first BYOD workshop combined experts in Linked Data as well as in MycoBase and the Human Protein Atlas. The participants were evenly split between data providers with some, but not a lot of, RDF knowledge, and trainers, who were experts in semantic web technologies. The workshop’s aim was to give the data providers a mix of tutorial and hackathon that would make their data available in a more accessible and reusable manner, based on the Data FAIRport initiative, and using RDF. The goal was to develop showcases that would demonstrate the added value of interoperable data to facilitate questions across multiple resources.

author:Alasdair J G Gray, Bring Your Own Data, FAIRport, Elixir, RDF

by a.hay at July 24, 2014 13:00

July 23, 2014

GridPP Storage

IPv6 and XrootD 4

Xrootd version 4 has recently been released. As QMUL is involved in IPv6 testing, and as this new release now supports IPv6, I thought I ought to test it. So, what does this involve?

  1. Set up a dual stack virtual machine - our deployment system now makes this relatively easy.
  2. Install xrootd. QMUL is a StoRM/Lustre site, and has an existing xrootd server that is part of ATLAS's FAX (Federated ATLAS storage systems using XRootD), so it's just a matter of configuring a new machine to export our POSIX storage in much the same way. In fact, I've done it slightly differently as I'm also testing ARGUS authentication, but that's something for another blog post.
  3. Test it - the difficult bit...

I decided to test it using CERN's dual stack lxplus machine: lxplus-ipv6.cern.ch.

First, I tested that I'd got FAX set up correctly:

voms-proxy-init --voms atlas

All 3 tests were successful, so I've got FAX working. Next, configure it to use my test machine:

export STORAGEPREFIX=root://xrootd02.esc.qmul.ac.uk:1094/

Which also gave 3 successful tests out of 3. Finally, to prove that downloading files works, and that it isn't just redirection that works, I tested a file that should only be at QMUL:

xrdcp -d 1 root://xrootd02.esc.qmul.ac.uk:1094//atlas/rucio/user/ivukotic:user.ivukotic.xrootd.uki-lt2-qmul-1M -> /dev/null 

All of these reported that they were successful. Were they using IPv6 though? Well, looking at Xrootd's logs, it certainly thinks so - at least for some connections, though some still seem to be using IPv4:

140723 16:03:47 18291 XrootdXeq: cwalker.19073:26@lxplus0063.cern.ch pub IPv6 login as atlas027
140723 16:04:01 18271 XrootdXeq: cwalker.20147:27@lxplus0063.cern.ch pub IPv4 login as atlas027
140723 16:04:29 23892 XrootdXeq: cwalker.20189:26@lxplus0063.cern.ch pub IPv6 login as atlas027
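
To get a quick tally rather than eyeballing the log, something like the following sketch could be used. It assumes that login lines always look like the three samples above; the regular expression and the function name `count_login_families` are illustrative and not part of Xrootd.

```python
import re
from collections import Counter

# Pattern inferred only from the three sample lines above (an assumption):
# "... XrootdXeq: <user>@<host> pub IPv4|IPv6 login as <account>"
LOGIN_RE = re.compile(
    r"XrootdXeq: (?P<user>\S+)@(?P<host>\S+) pub (?P<family>IPv[46]) login"
)


def count_login_families(lines):
    """Count how many login lines report IPv4 and how many report IPv6."""
    counts = Counter()
    for line in lines:
        match = LOGIN_RE.search(line)
        if match:
            counts[match.group("family")] += 1
    return counts


sample = [
    "140723 16:03:47 18291 XrootdXeq: cwalker.19073:26@lxplus0063.cern.ch pub IPv6 login as atlas027",
    "140723 16:04:01 18271 XrootdXeq: cwalker.20147:27@lxplus0063.cern.ch pub IPv4 login as atlas027",
    "140723 16:04:29 23892 XrootdXeq: cwalker.20189:26@lxplus0063.cern.ch pub IPv6 login as atlas027",
]

counts = count_login_families(sample)
print(counts["IPv6"], "IPv6 logins,", counts["IPv4"], "IPv4 logins")
```

Lines that do not match the pattern are simply ignored, so the same function can be fed a whole log file opened for reading.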


by Christopher J. Walker (noreply@blogger.com) at July 23, 2014 17:57

The SSI Blog

3D archaeology - now low-cost, high-volume and crowd-sourced

By Andrew Bevan, Senior Lecturer, UCL Institute of Archaeology.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Archaeologists have long had a taste for computer-based methods, not least because of their need to organise large datasets of sites and finds, search for statistical patterns and map out the results geographically. Digital technologies have been important in fieldwork for at least two decades and increasingly important for sharing archaeology with a wider public online. However, the last decade of advances in computer vision now means that the future of archaeological recording – from whole landscapes of past human activity to archaeological sites to museum objects – is increasingly digital, 3D and citizen-led.

Structure-from-motion and multi-view stereo constitute a bundle of ‘computer vision’ methods (‘SfM’). They are a form of flexible photogrammetry (the latter being a science with a much older pedigree) in which software is able to automatically identify small features in a digital photograph and then match these across large sets of heavily-overlapping images in order to reconstruct the camera positions from which these photographs were taken.

author:Andrew Bevan, Archaeology, Crowd-Sourcing, Python, 3D Modelling, Terracota Army, China

by a.hay at July 23, 2014 09:00

July 22, 2014

The SSI Blog

Automatic performance tuning and reproducibility as a side effect

By Grigori Fursin, President and CTO of the international cTuning foundation.

Computer systems' users are always eager to have faster, smaller, cheaper, more reliable and power efficient computer systems either to improve their everyday tasks or to continue innovation in science and technology. However, designing and optimising such systems is becoming excessively time consuming, costly and error prone due to an enormous number of available design and optimisation choices and complex interactions between all software and hardware components. Furthermore, multiple characteristics have to be carefully balanced at the same time including execution time, code size, compilation time, power consumption and reliability using a growing number of incompatible tools and techniques with many ad-hoc, intuition based heuristics.

During the EU FP6 MILEPOST project in 2006-2009, we attempted to solve the above issues by combining empirical performance auto-tuning with machine learning. We wanted to be able to automatically and adaptively explore and model large design and optimisation spaces. This, in turn, could allow us to quickly predict better program optimisations and hardware designs to minimise execution time, power consumption, code size, compilation time and other important characteristics. However, during this project, we faced multiple problems.

author:Grigori Fursin, Reproducibility, Automatic Performance Tuning

by s.hettrick at July 22, 2014 16:00

A look at FORTRAN unit test frameworks

By Mike Jackson, Software Architect.

As part of our open call collaboration with TPLS I was to develop a suite of unit tests. TPLS is written in FORTRAN and while there are de-facto standard unit test frameworks for Java (JUnit) or Python (PyUnit), for FORTRAN there are none. In this blog post I look at the test frameworks that are available for FORTRAN, compare two, FRUIT and pFUnit, and explain why I opted to use FRUIT for TPLS.

software development, testing, unit test, fortran, open call

by m.jackson at July 22, 2014 12:51

July 18, 2014

The SSI Blog

Venerable beads – tracing the origins of ancient jewellery

By Beatrice Demarchi, Research Fellow at the Department of Archaeology, and Dr Julie Wilson, Lecturer at the Department of Chemistry, University of York.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Interdisciplinary research between fields as diverse as biochemistry, physics and archaeology can help us decipher the most amazing things, for example the choices that our ancestors made thousands of years ago when deciding how to adorn their dead.

People have been using personal ornaments (jewellery) for tens of thousands of years to demonstrate their status in a group or society. If we are lucky, we may find these ornaments in graves or caves and can speculate as to their meaning and provenance. Were these “jewels” made with local raw materials? Or were they traded in exchange for exotic goods? Each of these possibilities tells us something about the environment that people lived in and how they chose to exploit it. However, it is often very difficult to identify the raw material when the ornaments are both sculpted and found in a degraded state.

author:Beatrice Demarchi, author:Julie Wilson, Archaeology, Shells, Jewellery, Molluscs, Learning Vector Quantisation, dating, LVQ algorithm

by a.hay at July 18, 2014 09:00

July 16, 2014

The SSI Blog

Smart Glasses – a new vision for the visually impaired

By Dr Stuart Golodetz, Postdoctoral Research Associate at the Nuffield Department of Clinical Neurosciences, University of Oxford, and head of object detection and tracking for the Smart Glasses Project.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

People who are visually-impaired face numerous daily challenges, from how to find where to go and the best way to avoid obstacles on the way, to how best to locate, recognise and interact with other people and objects. This can have a significant impact on their independence, confidence and overall quality of life. However, although visual impairments can prevent people from making use of visual signals from the world around them, only a small percentage of visually-impaired people are completely blind in the sense that they receive no useful visual inputs at all.

It is far more common instead for people to retain some level of residual vision, whether that amounts to small regions of their visual field in which they can see, or the more limited ability to distinguish between light and dark. In some cases, the real issue is one of visual signals being drowned out by ‘noise’, and by boosting the signal-to-noise ratio in those regions it is sometimes possible to provide people with at least some ability to perceive what they are looking at.

author:Stuart Golodetz, Smart Glasses, Vision, Nuffield, Oxford, SmartSpecs

by a.hay at July 16, 2014 13:00

July 15, 2014

The SSI Blog

I'd never been to a programming conference before - I wish I hadn't

By April Wright, Graduate Student, University of Texas at Austin.

This post is reproduced from the original by kind permission of the author. Following the original post, April received a number of messages of support, which can be viewed on Storify.

I went to SciPy this week. I'd never been to a programming conference before, and they featured a lot of education talks.

I wish I hadn't.

Last night, at the Software Carpentry mixer, a grand total of five men shook my husband's hand and ignored mine. My total of new people met is a dismal ten. Compare it to the Evolution meetings, which is my meeting, where I met upwards of forty new people, had a blast, and was treated by all participants like a member of the community.

I was reminded of a question my friend Steve Young asked me a while back: "What makes some women stick it out and be awesome [in tech]?" I'm going to turn the question around a bit. It's easy to be awesome. Lots of women are doing awesome things. But I could have sat in my office and worked all week, rather than attending this meeting. I could have done far more awesome alone, and I wouldn't have had my face rubbed in the fact that I'm different. I'd feel a lot less alone had I spent the week hanging out alone.

Events, Comment, Women in software

by s.hettrick at July 15, 2014 07:24

July 11, 2014

The SSI Blog

Google Glass in the operating theatre

By Shafi Ahmed, Colorectal Cancer Lead at Barts Health NHS Trust and Associate Dean at Queen Mary University of London.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Over the last few centuries, surgery has traditionally been taught as an apprenticeship with students clamouring around the operating table to glimpse a view of both surgical technique and clinical anatomy.

Not much has changed over this time. Even now, medical students will be crowded in the operating theatre, sometimes stuck in the background and waiting for many hours to get a glimpse of theory being put into practice.

Thanks to the introduction of video imaging systems such as the laparoscope - as used in keyhole surgery - we have begun to visualise surgery in a much clearer and more accessible fashion for a larger number of students, and so this has become the benchmark for training in modern abdominal surgery.

author:Shafi Ahmed, Google Glass, Surgery, Remote, QMUL, Barts

by a.hay at July 11, 2014 09:58

July 10, 2014

The SSI Blog

First steps towards understanding the size of the research software community

By Simon Hettrick, Deputy Director.

In an earlier post, I discussed our plans for investigating the number of researchers who rely on software. We’ve spent the last month looking at the feasibility of some of our ideas. In this post, I’ll present our findings about one of these approaches and some of the problems that we’ve encountered. If you’ve ever wondered what happens when a clueless physicist starts to dabble in social science, then this is the post for you.

First of all, a quick recap. Anecdotally, it seems that the number of researchers who rely on software for their research is – pretty much – everyone. There are few researchers who don’t use software in some way, even when we discount things like word processing and other non-result-generating software. But without a study, we lack the evidence to make this case convincingly. And it’s not just about the size of the community, it’s also about demographics. Seemingly simple questions are unanswerable without knowing the make-up of the research software community. How much EPSRC funding is spent on researchers who rely on software? Is that greater, proportionally speaking, than the AHRC?

Research software community

by s.hettrick at July 10, 2014 09:20

July 08, 2014

The SSI Blog

The Reproducible Research Vibe from the Fellows 2014 summer meeting

Nine of this year’s Fellows met in sunny Southampton on June 23rd-24th 2014 to discuss various aspects of reproducible research and how it would shape future engagement with their research domains.

From the discussions that took place over the following two days, it became clear that the UK research community is focussing on Open Access, Open Science and Open Data, and that the time is ripe to build on these endeavours and promote the necessity and benefits of reproducible, computationally derived results. 

To do this, the Fellows agreed that targeting researchers who were unaware of reproducible research should be a priority for the Institute. This, of course, will require simple and convincing messages about the benefits of, and need for, reproducibility for often time-pressed but influential people such as Principal Investigators.

author:Shoaib Sufi, Fellows, Software, Reproducibility, Sustainability

by s.sufi at July 08, 2014 09:15

July 03, 2014


APEL EMI-3 upgrade

Here are some notes from Manchester's upgrade to EMI-3 APEL. The new APEL is much simpler, as it is a bunch of Python scripts with a couple of key=value configuration files, rather than Java code with XML files. It doesn't have YAIM to configure it, but since it is much easier to install and configure, that doesn't really matter anymore. As an added bonus, I found that it's also much faster when it publishes and doesn't require any tedious tuning of how many records to publish at a time.

So Manchester's starting point for the upgrade was:
  • EMI-2 APEL node
  • EMI-2 APEL parsers on EMI-3 cream CEs
    • We have 1 batch system per CE so I haven't tried a configuration in which there is only 1 batch system and multiple CEs
  • In a few months we may move to ARC-CE, so configuration was done mostly manually
I didn't preserve the old local APEL database since all the records are in the central APEL one anyway. So the steps to carry out were the following:
  1. Install a new EMI-3 APEL node
  2. Configure it 
  3. Upgrade the CE parsers to EMI-3 and point them to the new node
  4. Disable the old EMI-2 APEL node and back up its DB
  5. Run the parsers and fill the new APEL node DB
  6. Publish all records for the previous month from the new APEL machine

Install a new EMI-3 APEL node

Installed a vanilla VM with
  • EMI-3 repositories
  • Mysql DB
  • Host certificates
  • ca-policy-egi-core
I did this with Puppet: since all the bits and pieces were already there for other types of services, I just put together the profile for this machine. Then I manually installed the rpms for APEL:
  • yum install --nogpg emi-release
  • yum install apel-ssm apel-client apel-lib

Configure EMI-3 APEL node

I followed the instructions on the official EMI-3 APEL server guide.

There are no tips here: I've only changed the obvious fields, like site_name and password, plus a few others, like the top BDII, because we have a local one, and the location of the host certificate, because ours has a different name.

I didn't install the publisher cron job at this stage because the machine was not yet ready to publish.

Upgrade the CE parsers to EMI-3 and point them to the new node

The CEs, as I said, are already on EMI-3; only the APEL parsers were still EMI-2, so I disabled the EMI-2 cron job:
  • rm /etc/cron.d/glite-apel-pbs-parser
Installed the EMI-3 APEL parsers rpm:
  • yum install apel-parser
Configured the parsers following the instructions in the official EMI-3 APEL parser guide, setting the obvious parameters, and installed the cron job after a trial parsing test.

NOTE: for me the parser configuration file is a bit confusing regarding the batch system name. It states:

# Batch system hostname.  This does not need to be a definitive hostname,
# but it should uniquely identify the batch system.
# Example: pbs.gridpp.rl.ac.uk
lrms_server =

It seems you can use any name. You are of course better off using your batch system server name. We have one for each CE, so the configuration file on each contains that. In the database this will identify the records from each CE. I'm not sure what happens with 1 batch system and several CEs. Following the comment literally, one should put only the batch system name, but then there is no distinction between CEs.

Disable the old EMI-2 APEL node and back up its DB

I just removed the old cron job. The machine is still running, but it isn't doing anything while waiting to be decommissioned.

Run the parsers and fill the new APEL node DB

You will need to publish an entire month prior to when you are installing. For us, for example, that meant publishing all the June records. Since I didn't want to republish everything we had in the log files, I moved the batch system and blah log files from before mid-May to a backup subdirectory and parsed only the log files for the end of May and June. The May days were needed because some jobs that finished in the first days of June had started in May, and one wants the complete record. The first jobs to finish in June in Manchester started on the 25th of May, so you may want to go back a bit with the parsing.
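
How far back to parse can be worked out rather than guessed, if you can list start and end dates for your jobs. The sketch below is only an illustration of that reasoning: the `jobs` list is made-up data and `earliest_start_for_month` is a hypothetical helper, not part of APEL.

```python
from datetime import date


def earliest_start_for_month(jobs, year, month):
    """Return the earliest start date among jobs that ended in the given
    month, or None if no job ended in that month.

    `jobs` is an iterable of (start_date, end_date) pairs.
    """
    starts = [
        start for start, end in jobs
        if (end.year, end.month) == (year, month)
    ]
    return min(starts) if starts else None


# Made-up job records for illustration only: (start, end)
jobs = [
    (date(2014, 5, 25), date(2014, 6, 1)),   # long job spanning the month boundary
    (date(2014, 5, 30), date(2014, 6, 2)),
    (date(2014, 6, 3), date(2014, 6, 4)),
    (date(2014, 5, 10), date(2014, 5, 12)),  # ended in May, so not relevant for June
]

# Log files from this date onwards are needed for complete June records.
print(earliest_start_for_month(jobs, 2014, 6))
```

With this made-up data the answer is the 25th of May, which matches the situation described above: the log files to parse start with the earliest job that finished in the month being published.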

Publish all records for the previous month from the new APEL machine

Finally, on the new machine, now filled with the June records plus some from May, I did a bit of DB clean-up as suggested by the APEL team. If you don't do this step, the APEL team will do it centrally before stitching together the old EMI-2 records and the new ones:
  • Delete from JobRecords where EndTime<"2014-06-01";
  • Delete from SuperSummaries where Month="5";
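
To see what these two statements leave behind, here is a toy run against an in-memory SQLite database. This is a sketch under loud assumptions: the real APEL database is MySQL with a much larger schema, and only the table names and the EndTime and Month columns come from the statements above; the other columns and all the rows are invented for the example.

```python
import sqlite3

# Toy stand-in for the APEL database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobRecords (JobId TEXT, EndTime TEXT)")
conn.execute("CREATE TABLE SuperSummaries (Month INTEGER, Year INTEGER, Jobs INTEGER)")

conn.executemany(
    "INSERT INTO JobRecords VALUES (?, ?)",
    [
        ("a", "2014-05-28 23:10:00"),  # finished in May: should be deleted
        ("b", "2014-06-01 00:05:00"),  # finished in June: should be kept
        ("c", "2014-06-15 12:00:00"),  # finished in June: should be kept
    ],
)
conn.executemany(
    "INSERT INTO SuperSummaries VALUES (?, ?, ?)",
    [(5, 2014, 1), (6, 2014, 2)],
)

# The same two clean-up statements, written with bound parameters.
conn.execute("DELETE FROM JobRecords WHERE EndTime < ?", ("2014-06-01",))
conn.execute("DELETE FROM SuperSummaries WHERE Month = ?", (5,))

remaining_jobs = [row[0] for row in conn.execute("SELECT JobId FROM JobRecords ORDER BY JobId")]
remaining_months = [row[0] for row in conn.execute("SELECT Month FROM SuperSummaries")]
print(remaining_jobs, remaining_months)
```

Only the June job records and the June summary survive, so the partial May data parsed earlier is used to complete the June records but is not itself published.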
After all this I modified the configuration file (/etc/apel/client.cfg) to publish a gap from the 25th of May until the day before I published, i.e. the 1st of July. I then modified it again to put back "latest". Finally, I installed the cron job on the new APEL node as well, to publish regularly every day.

by Alessandra Forti (noreply@blogger.com) at July 03, 2014 18:37

June 30, 2014

GridPP Storage

Thank you for making a simple compliance test very happy

Rob and I had a look at the gstat tests for RAL's CASTOR. For a good while now we have had a number of errors/warnings raised. They did not affect production: so what are they?

Each error message has a bit of text associated with it, typically saying "something is incompatible with something else" - for example, that an "access control base rule" (ACBR) is incorrect, or that the tape published is not consistent with the type of Storage Element (SE). The ACBR error arises due to legacy attributes being published alongside the modern ones, and the latter complains about CASTOR presenting itself as a tape store (via a particular SE).

So what is going on? Well, the (only) way to find out is to locate the test script and find out what exactly it is querying. In this case, it is a Python script running LDAP queries, and luckily it can be found in CERN's source code repositories. (How did we find it in this repository? Why, by using a search engine, of course.)

Ah, splendid, so by checking the Documentation™ (also known as "source code" to some), we discover that it needs all ACBRs to be "correct" (not just one for each area) and the legacy ones need an extra slash on the VO value, and an SE with no tape pools should call itself "disk" even if it sits on a tape store.

So it's essentially test driven development: to make the final warnings go away, we need to read the code that is validating it, to engineer the LDIF to make the validation errors go away.

by Jens Jensen (noreply@blogger.com) at June 30, 2014 11:14