CIS Computing & Information Services

2012 CCV/EPSCoR Bioinformatics Workshop

Friday, October 19, 2012

Barus & Holley 190

Session 1: 9:00 - 9:45am

Getting Started With CCV
Aaron Shen, CCV

Data Management: Critical Success Factors
Jaime Combariza, CCV

The ready availability of powerful instruments like genome sequencers, telescopes, electron microscopes, and remote sensing devices, coupled with the increased capability of supercomputers to process vast amounts of information, has resulted in an explosion of data (the 4th paradigm of science). This explosion poses challenges not only for analyzing the data and moving science forward but also, and perhaps more importantly, for the proper management and curation of this data. These challenges include: planning for data management; documentation and metadata; data structure and organization; security and backup; citation; dissemination; and sharing. In this talk I will describe best practices for data management that are recommended to researchers and that, if properly applied, will strengthen collaborative endeavors (technical and scientific) between researchers and the unit/center that may be the steward of the data. These best practices are also critical components of a "data management plan," which is a requirement for some funding agencies.

SeqDB: high-throughput compression of raw sequence data
Mark Howison, CCV

SeqDB is a file format, compressor, and storage tool for the raw data produced by Next-Generation Sequencing platforms like the Illumina HiSeq 2000. In this talk, I will give a brief tutorial on how to use SeqDB on Oscar and compare it to other available storage solutions for raw sequence data.

15 min break

Session 2: 10:00 - 11:00am

Identification of the sex chromosomes in S. purpuratus
Adrian Reich, MCB

The sea urchin Strongylocentrotus purpuratus is an important model organism for developmental biology, and even though a sequenced and annotated genome is available, some fundamental knowledge is lacking. The goals of this dataset are to identify genomic scaffolds that originate from the sex chromosomes, determine which sex chromosomes are present (XY or ZW or other), and construct a more complete genome. In the current study we attempt to identify the sex chromosomes using two different sequencing technologies: short reads (Illumina) and single-molecule imaging of long DNA molecules (BioNano Genomics).

Examining aging in Drosophila using RNA-seq and ChIP-seq
Jason Wood, MCB

I will present data from our project seeking to understand the epigenetic changes that take place during the aging process in the fruit fly Drosophila melanogaster. I will also discuss my approach to data management and tool selection/pipeline development on Oscar.

Identification of dFOXO direct targets regulating lifespan in Drosophila melanogaster
Hua Bai and Marc Tatar, EEB

Reduced insulin/IGF-1 signaling increases the life span of nematodes, flies and rodents. However, the underlying mechanisms of insulin-mediated life span extension are still unclear. In both Caenorhabditis elegans and Drosophila melanogaster, the life span extension via mutations in insulin/IGF-1 signaling depends on its downstream forkhead transcription factor, daf-16 or dFOXO. In this study, chromatin immunoprecipitation-sequencing (ChIP-Seq) was used to identify dFOXO direct targets in adult Drosophila. We found that the binding of dFOXO is enriched in the promoters of 273 genes in two different insulin mutants. Many of these genes are involved in early development, growth and neuronal functions. Twenty-five genes were selected for further analysis. We found that dFOXO acts as both activator and repressor to regulate the transcription of target genes. In some cases, dFOXO activated via chico mutation did not affect the expression of target genes, suggesting that other signaling pathways may be involved. In the life span analysis, we found that inactivation of three dFOXO target genes leads to an increased life span. One of them belongs to the TGF-β pathway, indicating that insulin/IGF-1 signaling can modulate TGF-β to regulate longevity in Drosophila.

Agalma: an automated de novo transcriptome assembly pipeline
Mark Howison, CCV

The de novo assembly of transcripts from raw sequence reads requires many more steps than running an assembly program. As transcriptome sequencing projects become larger and more sophisticated, it is critical to be able to automate these multistep analyses, and to consistently generate and collect diagnostics that help the investigator understand the success of sample preparation, the results of the assembly, and the computational demands of the analysis. To address these needs, we present Agalma, an automated transcriptome assembly pipeline tailored to paired-end Illumina sequence data.

A Network Visualization Tool for Dynamic Data Exploration
Jack Lovell, RISD

15 min break

Session 3: 11:15am - noon

A practical overview of de novo genome assemblers
Mark Howison, CCV

Next Generation Sequencing at the RIGSC - A Progress Report
Paul Johnson and Janet Atoyan, RI Genomics & Sequencing Center, URI

BioLite: a lightweight framework for bioinformatics pipelines
Mark Howison, CCV

BioLite is a Python/C++ framework for implementing bioinformatics pipelines for Next-Generation Sequencing (NGS) data. It tracks provenance of analyses, automates the collection and reporting of diagnostics (such as summary statistics and plots at intermediate stages), and profiles computational requirements. These diagnostics can be accessed across multiple stages of a pipeline, from other pipelines, and in HTML reports.