Methods and Materials

DESCRIPTION:

As explained in the Introduction section, our program retrieves the sequences that contain human SNPs, using the "rs" identification number to search NCBI dbSNP.
When we began writing the program, the first problem we had to solve was how to access the NCBI database from our code.
We first considered using ENSEMBL or BioPerl modules but finally decided to adapt the LWP module to access NCBI dbSNP. Given the restrictions of the module, we designed the program to accept up to 90 identification numbers (below 100, the maximum advised by the module).

STRUCTURE OF THE PROGRAM:

- Data control:
The first part of our program checks the information the user enters: the identification numbers (rs), separated by commas, and the number of nucleotides requested upstream and downstream.
If the "rs" are not in the required format, or if there are more than 90 "rs", the program returns an error message.
The sequence length is checked with another regular expression. If the value entered is not a natural number, the program returns an error message stating that a natural number is required.
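
The checks described above can be sketched as follows. This is only an illustration in Python, not the program's own code: the function names, the exact "rs" pattern (the letters "rs" followed by digits) and the reading of "natural number" as a positive integer are assumptions.

```python
import re

MAX_IDS = 90  # limit stated in the text

# Assumed format: "rs" followed by one or more digits, e.g. "rs1234"
RS_PATTERN = re.compile(r"rs\d+")
# Assumed meaning of "natural number": a positive integer without leading zeros
NATURAL_PATTERN = re.compile(r"[1-9]\d*")


def parse_rs_list(raw):
    """Split a comma-separated string of rs identifiers and validate it."""
    ids = [item.strip() for item in raw.split(",")]
    bad = [item for item in ids if not RS_PATTERN.fullmatch(item)]
    if bad:
        raise ValueError(f"identifiers not in the expected format: {bad}")
    if len(ids) > MAX_IDS:
        raise ValueError(f"at most {MAX_IDS} identifiers are accepted, got {len(ids)}")
    return ids


def parse_flank_length(raw):
    """Return the requested flank length as an int, or raise an error."""
    text = raw.strip()
    if not NATURAL_PATTERN.fullmatch(text):
        raise ValueError("the sequence length must be a natural number")
    return int(text)
```

With this sketch, parse_rs_list("rs123, rs456") yields the two identifiers, while an input such as "rs1,abc" or a list of 91 identifiers raises an error.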

- Selection of the requested information in dbSNP and visualisation on the web:
Thanks to our adaptations of the LWP module, we can make two requests to dbSNP.
The first request (function minicrida) is a small one, used to check whether the SNP is in the database. If it is not, the program shows a warning message; if it is, the program continues with the four other functions we have created.
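
The two-step logic (a light existence check, then the full request only for the SNPs that were found) can be sketched in Python as below. The two callables are hypothetical stand-ins for minicrida and crida; the actual HTTP requests to dbSNP are not reproduced here.

```python
def lookup_snps(rs_ids, exists_in_dbsnp, fetch_full_record):
    """Run the light existence check first, then the full request only for hits.

    exists_in_dbsnp and fetch_full_record are hypothetical callables standing
    in for the two requests (minicrida and crida in the text).
    """
    records = {}
    missing = []
    for rs_id in rs_ids:
        if not exists_in_dbsnp(rs_id):
            missing.append(rs_id)  # these would be reported to the user
            continue
        records[rs_id] = fetch_full_record(rs_id)
    return records, missing


# Usage with an in-memory stand-in for the database (made-up content)
fake_db = {"rs1": "record one"}
records, missing = lookup_snps(
    ["rs1", "rs2"],
    exists_in_dbsnp=lambda rs_id: rs_id in fake_db,
    fetch_full_record=lambda rs_id: fake_db[rs_id],
)
```

Passing the requests in as functions keeps the control flow testable without network access.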

The second request to NCBI (function crida) returns the different sequences that contain the requested SNP. There is more than one sequence because of the updates of the human genome. The variations we see are almost always differences in the number of nucleotides.
For every "rs", our program goes through every line and selects the information we need: the identification number, the species, and the longest forward and reverse sequences upstream and downstream of the SNP, saving it in a hash called "minihash". All the "minihash" are stored in another hash called "maxihash", which contains the selected information for all the "rs" requested.
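
The nested structure can be pictured with the Python sketch below, in which dictionaries play the role of the hashes. The input layout (already-parsed tuples of strand, upstream and downstream sequence) and the key names are assumptions; the parsing of the dbSNP response itself is not shown.

```python
def build_minihash(rs_id, species, candidates):
    """Keep, per strand, the longest upstream and downstream flank seen.

    candidates is an assumed, already-parsed list of
    (strand, upstream, downstream) tuples, with strand "forward" or "reverse".
    """
    minihash = {"rs": rs_id, "species": species}
    for strand, upstream, downstream in candidates:
        for side, seq in (("upstream", upstream), ("downstream", downstream)):
            key = f"{strand}_{side}"
            # Replace the stored flank only if the new one is strictly longer
            if len(seq) > len(minihash.get(key, "")):
                minihash[key] = seq
    return minihash


def build_maxihash(parsed):
    """Map each rs identifier to its minihash."""
    return {
        rs_id: build_minihash(rs_id, species, candidates)
        for rs_id, (species, candidates) in parsed.items()
    }
```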
The last function (function pagweb) selects the length the user requested (function mostratallat) for display and retrieves all the information saved in the "minihash".
If the user has requested a length greater than those available, the program shows the greatest available length together with a message stating that the length requested for that identification number is not available in dbSNP.
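
The length selection and its fallback can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the upstream flank is stored so that the bases nearest the SNP are at its end and the downstream flank so that they are at its start.

```python
def select_flanks(upstream, downstream, requested):
    """Trim both flanks to the requested length, keeping the bases nearest the SNP.

    Returns (upstream, downstream, available); available is False when either
    stored flank is shorter than requested, in which case the whole stored
    flank is returned and the caller can show the "not available" message.
    """
    available = len(upstream) >= requested and len(downstream) >= requested
    # Guard against requested == 0, since text[-0:] would return the whole string
    trimmed_up = upstream[-requested:] if requested > 0 else ""
    trimmed_down = downstream[:requested]
    return trimmed_up, trimmed_down, available
```

For example, select_flanks("AACCGG", "TTGGCC", 3) keeps "CGG" and "TTG", while a request for 5 bases against shorter stored flanks returns them unchanged with the flag set to False.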


Comments and suggestions to: Clara Ballesté and Núria Conde, UPF Barcelona