THUNNUS ORIENTALIS

CONCLUSIONS



The aim of this study was to annotate all the selenoproteins and related machinery proteins in the Thunnus orientalis (tuna) genome using a program we wrote to automate the tblastn, exonerate, genewise and t-coffee steps. Annotating selenoproteins is possible thanks to the homology that these proteins conserve between distant species.

In addition to assessing the conservation of each selenoprotein and of its characteristic residue, selenocysteine (Sec, the 21st amino acid), we wanted to evaluate the conservation of the SECIS elements in the 3’-UTR of the gene, since they are essential for the protein’s translation, as described in the introduction of this study.

To find all the selenoproteins and their related machinery proteins in the Thunnus orientalis genome, we needed to compare them with homologues from other animal genomes. Considering phylogeny and evolutionary proximity, we chose Danio rerio (zebrafish) as the best candidate, not only because its queries would be the most similar to our studied organism but also because the importance of zebrafish as a research model makes it more likely that these proteins are properly annotated in its genome.

However, as described in the methodology and results, we found that some proteins were not correctly annotated or named in the SelenoDB database, so we searched for them in other databases such as Ensembl and/or UniProt. When these databases did not provide well-annotated queries either, we used the human protein to guarantee a good annotation and naming of the protein.

Following our methodology, we were able to predict:

Sec-containing selenoproteins
DI1, DI2, DI3a, DI3b, GPx1a, GPx1b, GPx2, GPx3, GPx3b, GPx4a, GPx4b, Fep15, Sel15, SelH, SelJ, SelM, SelN, SelO, SelP1a, SelP1b, SelR1a, SelR1b, SelT2, SelU1a, SelU1b, SelU1c, SelW2a, TR2 and SPS2

Selenoproteins in which we could not predict the Sec residue
SelI, SelK, SelL, SelS and TR3

Cys-containing homologues (or other homologues)
GPx7, GPx8, MsrA1, MsrA2, SelO2, SelT1a, SPS1, SelR2, SelR3, SelU2, SelU3, SelW2 and TR1

Selenoprotein-related machinery proteins
eEFSec, PSTK, SBP2, SBP2L, Secp43, Secp43-like, SecS and SPS1

Duplicated selenoproteins
SelH~, SelJ~, SelL~, SelO~, Secp43-like~, TR1~A, TR1~B, TR1~C and SPS2~

Selenoproteins that we could not predict
SelT1b, SelW1 and SelW2b.

In the majority of cases, the presence of Sec was confirmed by the presence of SECIS elements in the 3’-UTR of the gene sequence. In those proteins where the Sec residue had been replaced by Cys, we could not detect any SECIS elements. This supports the idea that these elements play an essential role in the presence of this amino acid in the protein.

One of the main contributions of our study is the phylogenetic insight it provides into some of the selenoproteins found in the Thunnus orientalis genome. The most relevant examples are found in the GPx family and in the selenoproteins Fep15 and SelM. In the GPx family, GPx7 and GPx8 evolved from a GPx4 isoform before mammals and fishes split. Fep15, in turn, appears to have arisen from a duplication of SelM. Both are examples of phylogenetically related proteins that diverged from a common ancestor.

For some selenoproteins, we found consistent duplications that apparently occurred recently in evolution. This is the case for SelH, SelJ, SelL, SelO and TR, whose copies we have named SelH~, SelJ~, SelL~, SelO~ and TR~. This observation might partly reflect the fact that not all selenoprotein families are fully described in fishes, which makes it possible to find several homologous proteins that belong to the same family even though that family has not yet been described as such.

Dynamic Sec/Cys evolutionary exchange has been reported to occur at different taxonomic levels, and we observed it as well. In the Thunnus orientalis genome, the selenoprotein SelR1a conserves the Sec residue, in contrast to Danio rerio, in which Sec has been replaced by Cys.
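A Sec/Cys exchange of this kind can be spotted directly in a pairwise alignment. The following is a minimal sketch, not the code used in this study: it assumes that both sequences are already aligned to the same length, that selenocysteine is written as `U` and gaps as `-`, and the function name `sec_cys_exchanges` is hypothetical.

```python
def sec_cys_exchanges(aligned_a: str, aligned_b: str) -> list[tuple[int, str, str]]:
    """Return the alignment columns where one sequence has Sec (U) and the other Cys (C).

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    Columns are reported as 0-based indices into the alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")

    exchanges = []
    for column, (res_a, res_b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        # A Sec/Cys exchange is a column holding exactly one U and one C.
        if {res_a, res_b} == {"U", "C"}:
            exchanges.append((column, res_a, res_b))
    return exchanges
```

Columns where both sequences carry `U` are not reported, since Sec is conserved there. Note that some alignment tools print the Sec codon as `*` or `X` rather than `U`, in which case the symbols would need to be normalized first.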

Finally, one of the main findings of our study is the identification of two possible pseudogenes, of SelR1a and SelU1c. We reached this conclusion by observing a stop codon within the in-frame protein sequence, which would truncate translation. Furthermore, we suggest that this event occurred recently in evolution, as the alignment with the original protein has not accumulated a high number of mutations.
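The in-frame stop codon check can be illustrated with a short sketch. This is an assumption-laden example rather than the procedure followed in this study: it assumes a predicted coding sequence that starts in frame, uses the standard stop codons, and relies on the caller to supply the codon positions known to encode Sec, because a TGA at those positions is not a premature stop. The function name `premature_stops` and the parameter `sec_codon_indices` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str, sec_codon_indices=frozenset()) -> list[tuple[int, str]]:
    """List in-frame stop codons that occur before the final codon of a CDS.

    cds               -- coding sequence, read in frame from its first base
    sec_codon_indices -- 0-based codon positions where TGA encodes Sec
                         (assumption: known beforehand, e.g. from the alignment)
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    n_codons = len(cds) // 3

    hits = []
    # The last codon is skipped because it is the expected terminator.
    for index in range(n_codons - 1):
        codon = cds[3 * index:3 * index + 3]
        if codon in STOP_CODONS and index not in sec_codon_indices:
            hits.append((index, codon))
    return hits
```

An empty result means no premature stop was found in that frame. A non-empty result is only a hint of a pseudogene, since a sequencing or assembly error could produce the same signal.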

The principal limitation we encountered was the high fragmentation of the Thunnus orientalis genome. As a result, some of our proteins were split across different contigs, which complicates the prediction of their complete sequence and structure. For this reason, we cannot be sure that all predicted introns and exons are properly organized. Fragmentation also makes it difficult to find every SECIS element expected to lie adjacent to each selenoprotein gene.

The second limitation is that our study relies on testing the homology between Thunnus orientalis selenoproteins and those annotated in other organisms, such as Danio rerio and Homo sapiens. This means that we have only taken into account selenoproteins similar to those already described in other species, and we have no means of predicting selenoproteins in tuna that have not yet been described. Future research should look for these undescribed selenoproteins.

Also as a consequence of our homology-based approach, we might be missing some proteins that have diverged strongly from the query taken as reference. In these cases, the e-value threshold would classify the alignment as not good enough and reject the hit, so the selenoprotein would be absent from the final prediction.
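The effect of the threshold can be made concrete with a small filter over tblastn hits. This is a minimal sketch, not the program written for this study: it assumes the hits are available in the standard 12-column BLAST tabular format (`-outfmt 6`), where the e-value is the eleventh field, and the threshold is left as a parameter because its value is not restated here. The names `Hit` and `split_hits_by_evalue` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def split_hits_by_evalue(tabular_text: str, max_evalue: float) -> tuple[list[Hit], list[Hit]]:
    """Split BLAST tabular hits into (kept, rejected) according to an e-value cutoff.

    Assumes the default outfmt 6 column order, in which field 1 is the query id,
    field 2 the subject id and field 11 the e-value.
    """
    kept, rejected = [], []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(fields[0], fields[1], float(fields[10]))
        (kept if hit.evalue <= max_evalue else rejected).append(hit)
    return kept, rejected
```

Returning the rejected hits as well, instead of discarding them, makes it possible to inspect borderline alignments by hand, which is one way to reduce the risk of losing a highly diverged selenoprotein.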

The last limitation of our study is the poor annotation of selenoproteins in the SelenoDB database: we could not find the different isoforms of a single protein, some proteins were wrongly named, and for many others the annotation was incorrect.

For future studies, we propose a broader project covering a wide variety of marine animals, such as fishes (like tuna), mammals (like dolphins) and reptiles (like turtles). Using a similar methodology to predict the selenoproteins of each species, these could be compared with those found in non-marine animals, allowing the description of functions shared by species that live in a similar habitat but belong to different lineages. Testing such hypotheses could provide evidence about the convergence process that might have taken place during evolution.