August 22, 2014

The SSI Blog

The 20-line script that saves you hours of mind-numbing tedium

By Simon Hettrick, Deputy Director.

Apart from a brief liaison during my undergraduate years, I am cursed with a complete lack of training in programming. When I face problems that are easily solved with some basic coding, I experience the beginner’s dilemma (not unlike the problem with automation): do I choose the frustration entailed in working out how to write a short program to do the work automatically, or the monotony of performing the same simple task a thousand times by hand?

Once you’ve taken a first step into coding, and you see how quickly and efficiently it can change your work, it’s difficult to stop. My epiphany occurred when someone renamed a hundred images for me using a single command line instruction. I was going to make those changes by hand, so this little trick saved me something like an hour of tedious work. Tricks like this are, I find, the most compelling reason for researchers to learn a bit about coding. The heavyweight software packages are, of course, very important to research, but the 20-line script that saves you hours of mind-numbing tedium is the real hero of research.
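That renaming trick is easy to reproduce. Here is a minimal sketch in Bash (the directory and file names are invented for illustration):

```shell
# Bulk-rename: give every .JPG file a lowercase .jpg extension.
# 'demo' and the IMG_* names are made up for this example.
mkdir -p demo
touch demo/IMG_001.JPG demo/IMG_002.JPG demo/IMG_003.JPG

for f in demo/*.JPG; do
    mv "$f" "${f%.JPG}.jpg"   # strip the old suffix, append the new one
done
```

The `${f%.JPG}` parameter expansion strips the suffix; the same loop handles a hundred files as easily as three.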

Opinion, author:Simon Hettrick, R, Bash, Coding, Programming

read more

by s.hettrick at August 22, 2014 13:00

GridPP Storage

Lambda station

So what did CMS say at GridPP33? Having looked ahead to the future, they came up with some more speculative suggestions. One, echoing FNAL's Lambda Station in the past, was to look again at scheduling networks for transfers, what we might nowadays call network-as-a-service (well, near enough): since we schedule transfers, it would indeed make sense to integrate networks more closely with the pre-allocation at the endpoints (where you'd bringOnline() at the source and schedule the transfer to avoid saturating the channel). Phoebus is a related approach from Internet2.

by Jens Jensen at August 22, 2014 10:50

August 21, 2014

GridPP Storage

Updated data models from experiments

At the GridPP meeting in Ambleside, ATLAS announced that their files now have lifetimes: not quite like the SRM implementation, where a file could be given a finite lifetime when created, but more like a timer that resets after each access. Unlike SRM, when a file has not been accessed for the set length of time, it will be automatically deleted. Also notable is that files can now belong to multiple datasets, and datasets are set with automatic replication policies (basically, how many replicas at T1s are required). Now with extra AOD visualisation goodness.

There were also interesting updates from LHCb: they are continuing to use SRM to stage files from tape, but could be looking into FTS3 for this. I also discussed DIRAC integrity checking with Sam over breakfast. In order to confuse the enemy, they are not using a single Git repository but code from various places: both LHCb and DIRAC have their own repositories, and some code is marked as "abandonware", so determining which code is used in practice requires asking. This correspondent would have naïvely assumed that whatever comes out of git is what is being used... perhaps that's just not how high energy physics works...

CMS to speak later.

by Jens Jensen at August 21, 2014 11:31

August 15, 2014

The SSI Blog

Desert Island Hard Disks: Greg Wilson

You find yourself stranded on a beautiful desert island. Fortunately, the island is equipped with the basics needed to sustain life: food, water, solar power, a computer and a network connection. Consummate professional that you are, you have brought the three software packages you need to continue your life and research. What software would you choose and, go on, what luxury item would you take to make life easier?

Today we hear from Greg Wilson, founder of Software Carpentry.
Like most people who have moved from programming to managing (and in my case, teaching), I feel nostalgic for the good old days when all I had to do was fix memory leaks in multi-threaded C++. So when I imagine being stranded on a desert island, my first thought is, I could use the time to learn how to program again! The reality is that programming has moved on in the decade since I last shipped a product, and I'd enjoy reacquainting myself with the craft I used to love.

The first software package I'd bring with me would therefore be the Glasgow Haskell Compiler (GHC). There are many newer functional programming languages, like Clojure, Scala and F#, but Haskell seems to have inherited the title of a tool for thinking about programming that for many years belonged to Scheme. Becoming a native speaker of Haskell would, I hope, force me to see all of programming with fresh eyes, and re-instill the sense of wonder I first felt when writing a small Pascal interpreter in Pascal more than thirty years ago.

Desert Island Hard Disks, author: Greg Wilson

read more

by s.hettrick at August 15, 2014 09:00

August 13, 2014

The SSI Blog

Women! Science is not for you!

By Pam Cameron, Managing Director of Novoscience, and Clare Taylor, Lecturer in medical microbiology, Edinburgh Napier University.

This article is part of our series Women in Software, in which we hear perspectives on a range of issues related to women who study and work with computers and software.

The title of this blog might seem preposterous given that the numbers of female undergraduates in many STEM (science, technology, engineering, and maths) subjects are on the rise. However, humour us for a few moments and read on.

We seem to be talking a lot about women these days. For example, women in technology, women in engineering, women in boardrooms, women in sport, women in computing, and so on. It’s good to talk about women, but in all honesty, we’ve been doing this for the last 50 years, and guess what? We’re still having the same conversations.

author:Pam Cameron, author:Clare Taylor, Women in software, Sexism, Stem, Edinburgh Fringe

read more

by a.hay at August 13, 2014 13:00

– getting the facts straight during humanitarian disasters

By Victor Naroditskiy, Post-Doctoral researcher on the ORCHID project, University of Southampton.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

We live in what has been described as the Information Age, but at times a better term would be the Disinformation Age. This is due to the sheer volume of information being propagated by the Internet, and in particular, social media, every day at the click of a mouse. Finding out the truth in this sea of contradictory material has, as a result, become increasingly difficult.

Can a coordinated collective effort be effective in quickly discerning true answers to key questions? Crowdsourcing has been used to solve several seemingly impossible search tasks, but there have been many more failures.

author:Victor Naroditskiy, Day in the software life, DISL, Disasters, Crowd-sourcing

read more

by a.hay at August 13, 2014 09:00

August 08, 2014

GridPP Storage

ARGUS user suspension with DPM

Many grid services that need to authenticate their users do so with LCAS/LCMAPS plugins, making integration with a site central authentication server such as ARGUS relatively straightforward. With the ARGUS client LCAS/LCMAPS plugins configured, all authentication decisions are referred to the central service at the time they're made. When the site ARGUS is configured to use the EGI/NGI emergency user suspension policies, any centrally suspended user DN will be automatically blocked from accessing the site's services.

However, DPM does its own authentication and maintains its own list of banned DNs, so rather than referring each decision to the site ARGUS, we need a specific tool to update DPM's view from the site ARGUS server. Just to complicate matters further, DPM's packages live in the Fedora EPEL repository, which means that they cannot depend on the ARGUS client libraries, which do not live there.

The solution is the very small 'dpm-argus' package which is available from the EMI3 repositories for both SL5 and SL6; a package dependency bug has prevented its installation in the past, but this has been fixed as of EMI3 Update 19. It should be installed on the DPM head node (if installing manually rather than with yum, you'll also need the argus-pep-api-c package from EMI) and contains two files: the 'dpns-arguspoll' binary and its manual page.

Running the tool is simple: it needs a 'resource string' to identify itself to the ARGUS server (for normal purposes it doesn't actually matter what it is) and the URL for the site ARGUS:
dpns-arguspoll my_resource_id https://argus.example.org:8154/authz
When run, it will iterate over the DNs known to the DPM, check each one against the ARGUS server, and update the DPM banning state accordingly (the ARGUS URL above is a placeholder; substitute your own site's). All that remains is to run it periodically. At Oxford we have an '/etc/cron.hourly/dpm-argus' script that simply looks like this:
#!/bin/sh
# Sync DPM's internal user banning states from ARGUS
# (the ARGUS URL is a placeholder for your site's own)
dpns-arguspoll dpm_argleflargle https://argus.example.org:8154/authz 2>/dev/null
And that's it. If you want to be able to see the current list of DNs that your DPM server considers to be banned, then you can query the head node database directly:
echo "SELECT username FROM Cns_userinfo WHERE banned = 1;" | mysql -u dpminfo -p cns_db
At the moment that should show you my test DN, and probably nothing else.

by Ewan at August 08, 2014 14:28

August 06, 2014

The SSI Blog

Optimising OpenMP implementation of MD modelling package Tinker

By Weronika Filinger, Application Developer at EPCC.

Do you use scientific codes in your research? In this article I will describe briefly the process I have undertaken to optimise the parallel performance of a computational chemistry package – TINKER, as part of the EPCC/SSI APES project.

TINKER can be used to perform molecular modelling and molecular dynamics simulations. Originally written in Fortran 77 and currently in the process of being ported to Fortran 90, it has already been parallelised for a shared-memory environment using OpenMP. The code does not scale well with an increasing number of cores, and the scaling is even poorer on AMD architectures. In my investigation I used a cluster hosted by EPCC consisting of 24 compute nodes, each with four 16-core AMD Opteron 6276 2.3 GHz Interlagos processors, giving a total of 64 cores per node that share memory. As the current parallelisation of TINKER is purely for shared memory, all my investigations were restricted to a single node.
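Scaling behaviour like this can be probed from the shell by timing the same workload at increasing thread counts. A rough sketch follows; the `bench` function here is a trivial stand-in for a real TINKER run (which this sketch does not assume), and only the driver loop is the point:

```shell
# Time one fixed workload at several OpenMP thread counts.
# 'bench' stands in for the real benchmark command.
bench() { sleep 0.1; }

for t in 1 2 4 16 64; do
    export OMP_NUM_THREADS=$t      # OpenMP runtimes read this variable
    start=$(date +%s%N)
    bench
    end=$(date +%s%N)
    echo "threads=$t elapsed=$(( (end - start) / 1000000 ))ms"
done
```

For a code that scales well, the elapsed times should fall roughly in proportion to the thread count; a flat line past some count is where the scaling breaks down.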

author:Weronika Filinger, OpenMP, MD Modelling, TINKER

read more

by a.hay at August 06, 2014 09:00

August 05, 2014

The SSI Blog

Making the dead (Trigonotarbid) walk

A Trigonotarbid, yesterday.

By Russell Garwood, 1851 Royal Commission Research Fellow at the School of Earth, Atmospheric and Environmental Science, University of Manchester.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Palaeontology is often thought of as an antiquated field full of elderly researchers, but the discipline as it is today rarely matches this stereotype. Modern studies are multidisciplinary and use a diverse array of techniques to investigate the history of life. In some cases, palaeontologists can even make the dead walk.

When we think of the first animals to live on land we tend to imagine ungainly, fish-like creatures lolloping onto a beach somewhere around 385 million years ago.

Day in the software life, DISL, author:Russell Garwood, Palaeontology, 3D imaging, Manchester

read more

by a.hay at August 05, 2014 13:00

Hackathons with a difference: writing collaborative papers

By Derek Groen, Fellow and Research Assistant, Centre for Computational Science, University College London

This September I will host the first Paper Hackathon event in Flore, Northamptonshire, with help from Joanna Lewis and support from both the Software Sustainability Institute and 2020 Science.

To my knowledge, this is the first time anyone has organised a Paper Hackathon, although amusingly there has been a hackathon focused on the iPhone Papers app. With that in mind, let me share how I thought of this, and why I think it is a good idea.

The idea

2013 was quite an eventful year for me. I became a Fellow of the Software Sustainability Institute, but I also became an Associate Fellow of the 2020 Science project. As such, I familiarised myself with two groups and their differing backgrounds, but with an unexpectedly good match in terms of aims and objectives.

author:Derek Groen, Hackathon, Papers, Publishing

read more

by a.hay at August 05, 2014 09:00

August 04, 2014

The SSI Blog

Bioconductor conference – R-based and open-sourced

By Laurent Gatto, Software Sustainability Institute Fellow.

This past week saw the yearly Bioconductor conference take place at the Dana-Farber Cancer Institute, Boston, MA. It started with a Developer Day on July 30th and continued with scientific talks and workshops until August 1st.

Bioconductor is an R-based open-source, open-development software project that provides tools for the analysis and comprehension of high-throughput genomics data. It was set up in 2001 by Robert Gentleman, co-founder (alongside Ross Ihaka) of R, and is overseen by a core team based primarily at the Fred Hutchinson Cancer Research Center in Seattle, WA, and by other members from a range of other US-based and international institutions.

author:Laurent Gatto, R, Bioconductor, Genome, DNA

read more

by a.hay at August 04, 2014 13:00

August 01, 2014

The SSI Blog

A sprint to new materials for Software Carpentry

By Aleksandra Pawlik, Training Lead.

Nine people, two days, two venues, two proper Polish lunches, many pull requests and one tour of the supercomputer centre. That summarises the Software and Data Carpentry sprint in Krakow, during which we created new materials for Software Carpentry and updated existing ones. The team in Poland was one of nineteen teams taking part in the Mozilla Science Lab global sprint on 22-23 July 2014.

The idea of the sprint was based on the Random Hacks of Kindness approach, in which teams all around the world work on hacks 24/7 (thanks to the different time zones in which the work is completed). The teams in the Software Carpentry sprint were located in Australia, New Zealand, Europe, the US and Canada. When the teams in Australia and New Zealand were finishing their day, they handed over to the European groups. Around lunchtime in Europe, the (early bird!) teams in North America were starting to join in. All sites had webcams streaming live pictures. We could see and talk to each other, and it was really motivating to see people around the world hacking away or writing new lesson materials.

Software Carpentry, author:Aleksandra Pawlik, Sprint

read more

by a.pawlik at August 01, 2014 09:00

July 31, 2014

The SSI Blog

Running an unconference - top tips

By Simon Hettrick, Deputy Director.

Have you ever needed to answer a big question that's fuzzily defined and can only be answered by combining the experiences and knowledge of a wide group of disparate experts?

Whenever this situation occurs at the Institute, we apply the perfect solution: an unconference (like our Collaborations Workshop). Rather than sitting through a dull series of presentations, the attendees at an unconference are in control of what they do and how the conference works. This makes the conference adaptive, so that the shifting boundaries of fuzzily defined questions can be honed and narrowed down until a solution is found.

At a good unconference you could have around 100 people all firing suggestions at you across a huge range of topics and, somehow, you have to accommodate these ideas on-the-fly into a constantly evolving agenda. Fortunately, the organisation of an unconference can be made much easier if you prepare in advance and follow our top tips.

unconference, Collaborations Workshop, top tips, author:Simon Hettrick

read more

by s.hettrick at July 31, 2014 09:00

July 30, 2014

The SSI Blog

Oh research software, how shalt I cite thee?

Citation needed placard.

By Mike Jackson, Software Architect.

The Institute are firm believers in software citation. Citing software, directly or via its associated publications, provides deserved credit for those who develop this vital research infrastructure. In this blog post I look at some ways in which research software developers are helping to promote the citation of software by making it easier for researchers to do. That's another thing we are firm believers in: automating the grunt work of using and developing software to free up time for research.

As part of recent open call collaborations with both BoneJ and QuBIc, I was taken aback by how involved citing software can get. For example, BoneJ requests that its journal paper be cited but, depending upon the plug-ins and additional features used, there are other papers that also need to be cited. Likewise, the FSL software library requests citation of one of its three overview papers. Again, depending upon the specific tools used, there are additional papers to be cited. For example, using QuBIc's FABBER tool, bundled in FSL, requires citation of one paper, though citing three is recommended.

consultancy, open call, automation, citation, software as a research object

read more

by m.jackson at July 30, 2014 13:00

July 28, 2014

The SSI Blog

Research software program at CANARIE leading the way for Canadian researchers

By Scott Henwood, Lead Software Architect, CANARIE.

As is the case in most countries, Canada does not have a clearly identifiable research software development community. The community certainly exists, but members tend to interact only within specific research disciplines, and the population is fairly transient due to the project-based approach to research software development used today. As a result, opportunities for collaboration and software re-use are missed, and we see the same functionality being developed over and over again by different software teams. This is particularly true of non-research-facing support software, including user authentication components, visualisation tools and digital infrastructure management.

Ultimately more time and money is spent on redundant software development, leaving less of both for research-facing software and for the actual research itself. To help alleviate this situation, CANARIE has established our Research Software Program.

author:Scott Henwood, RSE, Funders

read more

by s.hettrick at July 28, 2014 09:00