Installing phiSpy on my macbook.

The reason for this post is that in my current project I need to identify prophages in Thermotogales genomes. Prophages are phages (viruses) that are integrated into the genome of their host, in our case Bacteria. To check whether our genome sequences contain prophages, I ran two phage-finding tools on our selection of genomes: Prophinder and PHAST. The two tools gave me different answers for my set of genomes.

With that in mind I decided to test one of the newer tools to identify prophages in bacterial genomes, phiSpy. Here I describe how I installed it on my macbook and how I tested that phiSpy works.

A phage

I am not installing this in my own area on the Abel computing cluster yet, because I first want to test it on my Mac and see whether the output of this software is useful.

The installation

I started with downloading the software from the phiSpy website:
http://sourceforge.net/projects/phispy/files/phiSpyNov11_v2.3.zip

In the Downloads folder I unzipped the file, which created the folder phiSpyNov11_v2.3. I then moved this folder to the Applications folder on my Mac.

In the phiSpy folder you find several files, including README.txt. According to this file, phiSpy should work on all Unix-like systems, so it might work on my MacBook with OS X version 10.6.8.

The next point is the software requirements for phiSpy.

These are:

  1. Python 2.7.2 or later.
    I currently have Enthought Canopy Python 2.7.6, 64-bit, installed, so that should not be a problem. Note that Canopy makes it easy to update your Python installation.
  2. Biopython – version 1.58 or later.
    Via the Canopy system I installed Biopython. I could check the version in Canopy, but to make it more interesting I will ask Python itself.
    In the terminal I start up python.
    Then I type:
    import Bio # This loads the Biopython module
    print Bio.__version__ # This prints the Biopython version number
    It returns:
    1.64
    So I have Biopython 1.64 installed. Good.
    Then I leave the Python interpreter with "quit()".
  3. gcc – GNU project C and C++ compiler – version 4.4.1 or later
    I can check which version of gcc I have in the terminal: gcc -v
    This returns: gcc version 4.2.1 (Apple Inc. build 5664)
    So I should update gcc. This is not trivial on a Mac, and I do not want to install Xcode (≈5 GB). If you have a Mac with a newer operating system you might not need to do this, since the gcc version there should be newer.
    In my case I decided to skip this step and see if I could run phiSpy without the update. This is of course not recommended, but I tried it anyway.
  4. R-statistics version v2.9.2.
    On my mac I already have R version 3.1.1 (2014-07-10) — “Sock it to Me”
    (Why the strange extra name?)
  5. Package randomForest in R – version 4.5-36 or later.
    I did not have this installed, so I started the R console and opened the package installer. I searched CRAN for the randomForest package; version 4.6-10 was available and I installed it. So easy.
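The manual checks above can also be scripted. What follows is only a rough sketch, not part of phiSpy: the function names are my own, it is written for Python 3.7 or later (phiSpy itself wants Python 2.7), and it treats every version in the list as a minimum, which is an assumption for R. The randomForest version is not queried, only compared.

```python
import re
import shutil
import subprocess
import sys

# Versions as listed in the requirements above, all treated as minimums
# (an assumption for R, where the list gives no "or later").
REQUIRED = {
    "python": "2.7.2",
    "biopython": "1.58",
    "gcc": "4.4.1",
    "R": "2.9.2",
    "randomForest": "4.5-36",
}


def version_tuple(text):
    """Turn '4.6-10' or '2.7.6' into a tuple of integers such as (4, 6, 10)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(found, required):
    """Return True when version string `found` is at least `required`."""
    return version_tuple(found) >= version_tuple(required)


def tool_version(command, pattern):
    """Run a command and return the first version-like match in its output.

    Returns None when the tool is not on the PATH or nothing matches.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    match = re.search(pattern, result.stdout + result.stderr)
    return match.group(1) if match else None


def report(name, found):
    """Print one line saying whether the found version is new enough."""
    if found is None:
        print(f"{name}: not found")
        return
    status = "OK" if meets_minimum(found, REQUIRED[name]) else "too old"
    print(f"{name}: {found} (need >= {REQUIRED[name]}) {status}")


if __name__ == "__main__":
    report("python", "{}.{}.{}".format(*sys.version_info[:3]))
    try:
        import Bio  # Biopython; may not be installed
        report("biopython", Bio.__version__)
    except ImportError:
        report("biopython", None)
    report("gcc", tool_version(["gcc", "-v"], r"version (\d+(?:\.\d+)+)"))
    report("R", tool_version(["R", "--version"], r"R version (\d+(?:\.\d+)+)"))
```

With the versions from this post, meets_minimum("4.2.1", "4.4.1") is False, which is exactly the gcc problem described above, while meets_minimum("4.6-10", "4.5-36") is True for randomForest.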

At this point I should be ready to install phiSpy. In the terminal I go to the folder for phiSpy and I follow the installation instructions provided in the README.txt file.

The only thing I need to do is to type: “make”
Only one line of text appears and no error messages.

So the big question is, does my installation of phiSpy work?

To test phiSpy I downloaded the test data set from the website:
http://sourceforge.net/projects/phispy/files/Test_Organism.zip

I moved the dataset to the temp folder on my machine and unzipped it. This created a folder called Test_Organism.

Then I try to run phiSpy on the dataset.
The command that I use is:

python /Applications/phiSpyNov11_v2.3/phiSpy.py -i Test_Organism/160490.1/ -o output_directory -t 25

The output:
Making Test Set… (need couple of minutes)
Start Classification Algorithm
Using training flag: 25
Done with classification Algorithm
Start evaluation…
Threshold for fn is 5
Done!!!

Hmmm, it seems to have worked without crashing. Let’s check the results in the output_directory. There I find three files:
initial_tbl.txt
prophage.tbl
prophage_tbl.txt

The files initial_tbl.txt and prophage_tbl.txt both contain all the genes in the analyzed genome and both have 16 columns. In initial_tbl.txt only the first 9 columns are filled in. In prophage_tbl.txt the 10th column is also filled in for every gene and indicates whether the gene is phage-like: phage-like genes are encoded with a “1”, non-phage-like genes with a “0”. Columns 11 to 16 are reserved for describing the attL and attR sites, the junction sequences that mark the integration sites of the prophage.

The file prophage.tbl contains the following:
fig|160490.1.pp.1 NC_002737_529631_573586
fig|160490.1.pp.2 NC_002737_778642_820599
fig|160490.1.pp.3 NC_002737_1185686_1221283
fig|160490.1.pp.4 NC_002737_1775862_1810396

These are the regions with phage-like genes.
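Each location in this file appears to be the contig name, the start and the stop joined by underscores. Here is a small sketch of how to split them apart in Python. The Region class and parse_region function are names I made up for the illustration, and the length assumes that both coordinates are inclusive. Because the contig name itself contains an underscore (NC_002737), the split is done from the right.

```python
from typing import NamedTuple


class Region(NamedTuple):
    prophage_id: str
    contig: str
    start: int
    stop: int

    @property
    def length(self):
        # Assumes start and stop are both inclusive coordinates.
        return abs(self.stop - self.start) + 1


def parse_region(line):
    """Split one line such as 'fig|160490.1.pp.1 NC_002737_529631_573586'."""
    prophage_id, location = line.split()
    # rsplit keeps the underscore inside the contig name intact.
    contig, start, stop = location.rsplit("_", 2)
    return Region(prophage_id, contig, int(start), int(stop))


# The four lines shown above.
SAMPLE = """\
fig|160490.1.pp.1 NC_002737_529631_573586
fig|160490.1.pp.2 NC_002737_778642_820599
fig|160490.1.pp.3 NC_002737_1185686_1221283
fig|160490.1.pp.4 NC_002737_1775862_1810396
"""

regions = [parse_region(line) for line in SAMPLE.splitlines() if line.strip()]
for region in regions:
    print(region.prophage_id, region.contig, region.start, region.stop,
          region.length, sep="\t")
```

To run this on the real file, replace SAMPLE.splitlines() with the lines read from prophage.tbl.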

We can check that with the grep command. I look up the line with genome position 778642 in prophage_tbl.txt and ask grep to also show the line above my line of interest (-B flag) and the line below it (-A flag). This tells me whether the gene just before the region is annotated as phage-like.

grep -F 778642 -A 1 -B1 prophage_tbl.txt

This gives me three lines:
fig|160490.1.peg.706 dTDP-glucose 4,6-dehydratase (EC 4.2.1.46) NC_002737 777508 778548 744 0.703489501488 1 0 0

fig|160490.1.peg.707 Phage integrase NC_002737 779781 778642 745 0.728489501488 1 1.5 1

fig|160490.1.peg.708 Pathogenicity island SaPIn2 NC_002737 780844 780035 746 0.753489501488 1 0 1

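This shows that the line we selected contains a gene annotated as a Phage integrase. Its start coordinate (779781) is larger than its stop coordinate (778642), so this gene is on the opposite DNA strand. The gene on the line above (peg.706) has a 0 in column 10, so it is not annotated as phage-like.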
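The same lookup can be sketched in Python. This is only an illustration: with_context and strand_of are my own helper names, and I am assuming the columns are tab-separated with start and stop in columns 4 and 5, as the output above suggests. Unlike grep, the sketch only reports the first matching line.

```python
# The three lines from the grep output above, with tabs between the columns
# (the tab separation is an assumption based on the awk command used later).
ROWS = [
    "fig|160490.1.peg.706\tdTDP-glucose 4,6-dehydratase (EC 4.2.1.46)\t"
    "NC_002737\t777508\t778548\t744\t0.703489501488\t1\t0\t0",
    "fig|160490.1.peg.707\tPhage integrase\t"
    "NC_002737\t779781\t778642\t745\t0.728489501488\t1\t1.5\t1",
    "fig|160490.1.peg.708\tPathogenicity island SaPIn2\t"
    "NC_002737\t780844\t780035\t746\t0.753489501488\t1\t0\t1",
]


def with_context(lines, needle, before=1, after=1):
    """Return the first line containing `needle` plus its neighbouring lines.

    Roughly what `grep -F needle -B before -A after` shows for one match.
    """
    for index, line in enumerate(lines):
        if needle in line:
            return lines[max(0, index - before): index + after + 1]
    return []


def strand_of(line):
    """Return '+' when start <= stop and '-' when the gene runs backwards."""
    fields = line.split("\t")
    start, stop = int(fields[3]), int(fields[4])
    return "+" if start <= stop else "-"


for row in with_context(ROWS, "778642"):
    fields = row.split("\t")
    print(fields[0], fields[1], strand_of(row), fields[9], sep="\t")
```

On a real file you would pass the list of lines read from prophage_tbl.txt instead of ROWS.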

Now we can extract all lines that have a 1 in column 10 of the prophage_tbl.txt file with an awk command:

awk -F"\t" '$10 == 1 { print $0 }' prophage_tbl.txt > phages_genes.txt

This gave me a file with 203 genes identified as phage-like. Checking this file gives me an interesting observation.
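For those who prefer Python over awk, here is a minimal sketch of the same filter. The function name phage_like is my own, the input is assumed to be tab-separated, and a field only counts as a hit when it is exactly the text 1, whereas awk compares numerically. The sample rows are the three lines from the grep output earlier in this post.

```python
# Three sample rows from prophage_tbl.txt, assumed to be tab-separated.
ROWS = [
    "fig|160490.1.peg.706\tdTDP-glucose 4,6-dehydratase (EC 4.2.1.46)\t"
    "NC_002737\t777508\t778548\t744\t0.703489501488\t1\t0\t0",
    "fig|160490.1.peg.707\tPhage integrase\t"
    "NC_002737\t779781\t778642\t745\t0.728489501488\t1\t1.5\t1",
    "fig|160490.1.peg.708\tPathogenicity island SaPIn2\t"
    "NC_002737\t780844\t780035\t746\t0.753489501488\t1\t0\t1",
]


def phage_like(lines, column=10):
    """Yield the lines whose `column` (1-based, as in awk) holds a 1."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip rows that are too short instead of raising an IndexError.
        if len(fields) >= column and fields[column - 1].strip() == "1":
            yield line


hits = list(phage_like(ROWS))
print(len(hits), "phage-like genes in the sample")
for hit in hits:
    print(hit.split("\t")[0])
```

Because phage_like accepts any iterable of lines, you can hand it an open file handle for prophage_tbl.txt and write the yielded lines to phages_genes.txt.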

The genes between 1185686 and 1221283 (phage region fig|160490.1.pp.3) have the attL and attR sequences annotated. This might indicate that this region is a proper prophage, while the other three regions have either lost these sites or are not really phages.

With this I can conclude that phiSpy works on my mac and now I am ready to analyse my bacterial genomes.

About Thomas Haverkamp

A microbial ecologist, an amateur photographer and a proud father of a tiny little girl.
