Methods





Obtaining the Varanus komodoensis genome

The Varanus komodoensis genome was provided by the bioinformatics teachers as a FASTA file. It can be found at the following path:



mnt/NFS_UPF/soft/genomes/2019/Varanus_komodoensis/genome.fa







Obtaining the human and Anolis carolinensis selenoproteomes

The lizard Anolis carolinensis is the closest phylogenetic relative of our species. However, its selenoprotein annotation is incomplete: some of its proteins are very short or truncated, so they did not allow an accurate prediction. We therefore decided to also compare the Varanus komodoensis genome with a more distant relative whose selenoproteome is very well annotated: Homo sapiens.



To obtain the selenoproteins of the chosen species we used the SelenoDB website. Human proteins were taken from version 1.0 of the database and Anolis carolinensis proteins from version 2.0, as this species was not included in the first version. We downloaded the protein amino acid sequences as two FASTA files, one per species, and then split each of them into a separate file for every query.



First, we created one directory for human and another for Anolis carolinensis, where we placed all the sequences. We then created two further directories in which to run the analysis and to store the results and information obtained from the comparisons.



To split each multi-sequence FASTA file into one file per query, we used the following commands:



awk '/^>/ {OUT=substr($0,2) ".fasta"}; OUT {print >OUT}' lizardx.fa

awk '/^>/ {OUT=substr($0,2) ".fasta"}; OUT {print >OUT}' humanx.fa
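As an illustration of what these commands produce, the sketch below runs the same awk program on a toy file. The file name toy.fa, the two record names and their sequences are invented for the example and are not part of our data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Toy multi-FASTA input: two invented records (names and residues are made up).
printf '>SelK\nMVYISNGQVL\nDSRSQSPWRL\n>GPx1\nMCAARLAAAA\n' > toy.fa

# Same awk program as above: every header line starts a new output file
# named after the header (without the leading ">"), and each line,
# header included, is written to the file that is currently open.
awk '/^>/ {OUT=substr($0,2) ".fasta"}; OUT {print >OUT}' toy.fa

# List the resulting per-query files.
ls
```

Because the output name is the whole header line, headers that contain spaces or a "/" character would give awkward or invalid file names, so it is worth checking the headers before running the command on a real file.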







Prediction process

This section explains the different steps of our prediction process. A bash script was created to automate the prediction and annotation. It generates a set of files containing the predicted alignments of the query proteins against the genome, covering all the possibilities. Once the results were obtained, they were analysed.







SECISearch3 and Seblastian

The last step was the SECIS prediction. For this we used the programs Seblastian and SECISearch3 (both available at http://seblastian.crg.es/), with the subseq results as input. This prediction is important because a SECIS element must be present in every selenoprotein gene. The positions of each predicted SECIS element must also be compared with the absolute positions of our predicted gene, in order to determine whether the SECIS lies in the 3’ UTR. If it does not, it is dismissed.
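The position check described above can be sketched as follows. This is only a minimal illustration, not the script we ran: the function names to_absolute and secis_in_3utr are hypothetical, and it assumes 1-based coordinates, start <= end for every feature, and a SECIS on the same scaffold and strand as the gene. It applies no maximum distance between the gene and the SECIS.

```shell
#!/usr/bin/env bash
set -euo pipefail

# to_absolute SUBSEQ_START RELATIVE_POS
# Converts a position reported inside an extracted subsequence into a
# scaffold position (assumption: both are 1-based).
to_absolute() {
    echo $(( $1 + $2 - 1 ))
}

# secis_in_3utr STRAND GENE_START GENE_END SECIS_START SECIS_END
# Prints "keep" when the SECIS lies downstream of the predicted gene,
# "dismiss" otherwise. All coordinates are absolute scaffold positions.
secis_in_3utr() {
    local strand=$1 gene_start=$2 gene_end=$3 secis_start=$4 secis_end=$5
    if [ "$strand" = "+" ]; then
        # Forward strand: downstream means beyond the gene end.
        if [ "$secis_start" -gt "$gene_end" ]; then echo keep; else echo dismiss; fi
    else
        # Reverse strand: downstream means before the gene start.
        if [ "$secis_end" -lt "$gene_start" ]; then echo keep; else echo dismiss; fi
    fi
}

# Invented coordinates, for illustration only.
to_absolute 2001 150                   # prints 2150
secis_in_3utr + 1000 5000 5400 5470    # prints keep
secis_in_3utr + 1000 5000 300 370      # prints dismiss
secis_in_3utr - 1000 5000 300 370      # prints keep
```

The strand matters because "downstream" means higher coordinates for a gene on the forward strand and lower coordinates for a gene on the reverse strand.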





Phylogenetic trees

Finally, a phylogenetic tree was built for each protein family in order to study the relationship between the queries and the predicted proteins. It also helped us check whether the chosen scaffold was consistent with the phylogeny. The input consisted of the human and/or lizard protein together with our predicted proteins. The tool used can be found at http://www.phylogeny.fr/simple_phylogeny.cgi.