Where to share data and code… a recent experience.

Today I am writing a rant here since I am quite fed up.

Removing the tag and graffiti. Street art by an unknown artist found in Oslo. Photo by: Thomas H.A. Haverkamp

The reason for this post is the following: a paper on which I am a co-author was recently accepted for publication in the journal BMC Bioinformatics, after being rejected by BMC Genomics because it was not suitable for that journal. It’s a bioinformatics tool to analyze big BLAST files, so that is not bad at all. Such news is wonderful, and as a young scientist these are the moments you work for.

So what is making me fed up? Well, at the moment our paper got accepted, the principal investigator, or PI, got it into his head (yes, the PI is not female) that the present link for the software in the manuscript was no good. At that point I was thinking that he just wanted to change the University of Oslo website link in the manuscript to a nicer format, so that it would be easier to read in the paper. But I could not have been more wrong.

The PI was in favour of a dedicated website where we could publish our software, and in addition the website should become a repository for all the data produced in his group. I don’t disagree with that, but why does he come up with it when the paper is already accepted and ready to be published? We have worked on the paper for the last three years, and until the news of the acceptance he had not made any clear effort to signal that the link in the manuscript needed to be changed. So why now?

Well, I have no real clue, and I will not speculate here on that.

But we are now a month further on, and contrary to what the PI wrote to the BMC editorial office, he did not fix the link in a few days. What is more, I have not even seen that we have actually submitted the final accepted manuscript to the journal so they can work on it. The reason I am writing this up is that all my emails to the PI and my co-authors are not helping the process move forward, and I am getting frustrated about the time it takes for something so trivial. The PI claims he has trouble finding and setting up a suitable web address, and he is actually paying for the website out of his own budget.

When I found out that he was doing that, I pointed out that there are excellent websites such as GitHub.com that will help you share your code with others. Of course GitHub is not suitable for the large data files that are produced in modern biological experiments. But even for those there are excellent repositories such as the Short Read Archive (not easy to work with), or Dryad, which is set up to handle all kinds of scientific data. They are not always free, but a small fee is a small price to pay to keep your data safe and available for others.

Recently, I shared the geochemical data from my Oslo Fjord Pockmark paper at another excellent website called www.pangaea.de. I had never done that before, but one of the reviewers of my paper suggested it, and I happily did. There your data is really checked by qualified scientists who even help you get the data in exactly the right format, so others can use it with ease.

At this point I fail to understand why a PI would like to keep all the data from the experiments in his lab on a website that is maintained and hosted by the PI or the lab. Knowing a lot of PIs tells me that most of them are way too busy to have the time to actually do such things. Furthermore, asking PhD students or postdocs to do it runs the risk that once they are gone, you need to train somebody else to keep the website maintained. All of this suggests that setting up your own data repository is not something that will last for a long time, and that it will distract you from doing science. Unless you really like to do such things, but I would not consider it.

In the end I think it is good for science if data and code are available at a few good and well-maintained repositories, instead of on thousands of web pages built and maintained by the scientists themselves. It makes it a lot simpler to obtain and re-use data and to validate scientific claims made in the literature. Especially with software, I have encountered too often that the software was no longer present on a website, or that the website had ceased to exist after a few years. So how does that help the scientists who made those websites, and how does that help science in general?

Okay, that is enough ranting. Any comments or suggestions on this are appreciated. Now it’s time to work on my Thermotoga project, where the PI is female and a great supporter of sharing data.


About Thomas Haverkamp

A microbial ecologist, an amateur photographer and a proud father of a tiny little girl.
