On Thursday, 8 February 2018, at 10 a.m., in the Conference Room of CACTI (Torre CACTI, Floor 0).
Speaker: Dr. Hugo López-Fernández, from the Sistemas Informáticos de Nueva Generación group (SI4).
The talk will be in Spanish (or in English if non-Spanish speakers attend).
One of the most important types of data used in biological research is DNA or protein sequence data, usually stored in FASTA files that can contain one or more sequences. Public databases such as GenBank (at NCBI) or Ensembl provide huge collections of genomes, genome annotations, and other data in FASTA format. Nevertheless, downloaded files usually must be preprocessed before further analysis, depending on each researcher's needs. Although these preprocessing operations are simple (e.g. removing sequences shorter than a minimum number of bases), processing large batches of FASTA files is a complex task that usually requires advanced bioinformatics skills and a combination of different tools (including the bash command line) to achieve the desired result. To allow researchers to perform these operations easily, we are developing SEDA (http://www.sing-group.org/seda/), the software application presented in this seminar.
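To give a concrete idea of the kind of length-based filtering mentioned above, here is a minimal Python sketch that drops sequences below a minimum length from every FASTA file in a directory. It is not SEDA's implementation or API: the function names (`read_fasta`, `filter_by_min_length`, `to_fasta`, `filter_directory`), the `*.fasta` file pattern and the 60-character line width are all hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined.

    Lines appearing before the first '>' header are ignored.
    """
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_min_length(records: Iterable[Record], min_length: int) -> List[Record]:
    """Keep only sequences with at least min_length residues (hypothetical helper)."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records as FASTA text, wrapping sequences at `width` characters."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "".join(line + "\n" for line in out)


def filter_directory(in_dir, out_dir, min_length: int, pattern: str = "*.fasta") -> int:
    """Filter every file matching `pattern` in in_dir; return the number of files written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    count = 0
    for fasta in sorted(Path(in_dir).glob(pattern)):
        with fasta.open() as handle:
            kept = filter_by_min_length(read_fasta(handle), min_length)
        (out_path / fasta.name).write_text(to_fasta(kept))
        count += 1
    return count
```

For example, `filter_directory("raw", "filtered", 100)` would write a copy of each `raw/*.fasta` file into `filtered/` keeping only sequences of at least 100 bases (the directory names and threshold are arbitrary). Even this simple case needs parsing, filtering and rewriting steps, which is the kind of work a dedicated tool can hide from the user.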