*  GENE ANALYSIS

In the GenBank database (http://www.ncbi.nlm.nih.gov/Genebank) we found the nucleotide sequences corresponding to our hemagglutinin-esterase protein sequences, except for three: Influenza C, Breda virus and Murine hepatitis virus (the shortest one). Our viruses are RNA viruses, but GenBank stores their sequences as cDNA, that is, in the DNA alphabet with T in place of U. This is probably because the whole database uses a single format; otherwise the sequences could not be compared. We aligned them with ClustalW.
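
Because GenBank records use the DNA alphabet, any RNA sequence compared against them has to be written the same way. A minimal Python sketch of that normalization, assuming plain sequence strings; the function name `rna_to_cdna_alphabet` and the example fragment are hypothetical and not part of GenBank or ClustalW:

```python
def rna_to_cdna_alphabet(seq: str) -> str:
    """Rewrite an RNA sequence in the DNA alphabet used by GenBank (U -> T).

    Whitespace is dropped and letters are upper-cased so that sequences from
    different sources can be compared or aligned directly.
    """
    cleaned = "".join(seq.split()).upper()
    return cleaned.replace("U", "T")


# Hypothetical example fragment, not taken from our data set.
print(rna_to_cdna_alphabet("aug uuc gca"))  # ATGTTCGCA
```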

In eukaryotic coding regions, C and G nucleotides are usually more frequent, but when we analyzed the nucleotide content of our sequences, the AT/GC ratio was higher than expected. Because these results were unexpected, we analyzed them in more detail: we studied the AT/GC ratio of other proteins from Coronaviridae and Influenza, and of proteins from many different, randomly chosen viruses.
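
The AT/GC ratio can be computed directly from base counts. The sketch below is a minimal Python illustration, assuming an ungapped nucleotide string; the function `base_composition` and the toy sequence are hypothetical, and ambiguity codes such as N are simply ignored:

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the A+T and G+C fractions and the AT/GC ratio of a nucleotide sequence.

    Only A, C, G and T are counted; ambiguity codes such as N are ignored.
    """
    counts = Counter(seq.upper())
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "AT_fraction": at / total,
        "GC_fraction": gc / total,
        # Infinite ratio when the sequence has no G or C at all.
        "AT_GC_ratio": at / gc if gc else float("inf"),
    }


# Hypothetical toy sequence, not one of our hemagglutinin-esterase genes.
print(base_composition("ATGAAATTTGCA"))
# {'AT_fraction': 0.75, 'GC_fraction': 0.25, 'AT_GC_ratio': 3.0}
```

A ratio above 1 means the sequence contains more A+T than G+C, which is the pattern we observed in the viral sequences.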

We always found the same result: the viral sequences are rich in A and T. The reasons for this are not known, but we propose two hypotheses:

*      It may be related to the self-complementarity of the strand. If the strand is strongly self-complementary, a low G and C content could be advantageous, because a G:C pair (three hydrogen bonds) interacts more strongly than an A:T pair (two hydrogen bonds). A sequence rich in A and T is therefore easier to unwind for translation.

*      G and C are the most mutable nucleotides (for example, cytosine deamination converts C to U, which appears as T in the cDNA). Viruses evolve very fast and accumulate many mutations, so their GC content may have been reduced in favor of A and T.
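
The first hypothesis relies on G:C pairs forming three hydrogen bonds and A:T pairs only two. As a rough illustration, assuming a perfectly paired duplex, this hypothetical helper counts the hydrogen bonds that two made-up sequences of equal length would form with their complements:

```python
# Hydrogen bonds per Watson-Crick base pair: A:T (A:U in RNA) forms 2, G:C forms 3.
H_BONDS = {"A": 2, "T": 2, "U": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(seq: str) -> int:
    """Count hydrogen bonds if every base of `seq` were paired with its complement.

    Simplified model: ignores base stacking, mismatches and loops.
    Unknown symbols contribute 0.
    """
    return sum(H_BONDS.get(base, 0) for base in seq.upper())


# Two hypothetical 10-nt stretches with the same length but different GC content.
at_rich = "ATTATAAGTA"
gc_rich = "GCCGCGAGCG"
print(duplex_hydrogen_bonds(at_rich), duplex_hydrogen_bonds(gc_rich))  # 21 29
```

This is a deliberate simplification: base stacking, which also makes G:C-rich regions more stable, is not modelled.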

Finally, we searched our cDNA sequences for conserved motifs with the GENIO/logo server and found no significant conservation apart from the initial ATG start codon. This is probably because we worked with RNA viral sequences, which evolve very quickly: RNA polymerases make more copying errors than DNA polymerases.
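
A sequence logo, such as those produced by the GENIO/logo server, shows how conserved each alignment column is. One common measure, assumed here because we do not know the exact settings the server used, is the information content R_i = log2(4) - H_i in bits, where H_i is the Shannon entropy of the base frequencies in column i. A minimal sketch with a hypothetical three-sequence alignment (the function `column_information` is illustrative only):

```python
import math
from collections import Counter
from typing import List


def column_information(alignment: List[str]) -> List[float]:
    """Information content (bits) of each column of a DNA alignment.

    R_i = log2(4) - H_i, with H_i = -sum(f_b * log2(f_b)) over A, C, G, T.
    Gaps ('-') and other symbols are skipped; no small-sample correction.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        # Keep only unambiguous bases in this column.
        column = [s[i].upper() for s in alignment if s[i].upper() in "ACGT"]
        if not column:
            scores.append(0.0)
            continue
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        scores.append(math.log2(4) - entropy)
    return scores


# Hypothetical aligned fragments, not our real sequences.
aln = ["ATGGCA", "ATGACT", "ATGTCG"]
print([round(x, 2) for x in column_information(aln)])
# [2.0, 2.0, 2.0, 0.42, 2.0, 0.42]
```

A fully conserved column, such as the initial ATG, scores 2 bits, while columns with no preferred base score close to 0. With very few sequences these values are biased, which is why logo tools usually apply a small-sample correction that this sketch omits.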