Browsing by Subject "bioinformatics"
Item Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper
(ACM, 2012-03-30) Altintas, Ilkay; Wang, Jianwu; Crawl, Daniel; Li, Weizhong
Next-generation DNA sequencing machines are generating a very large amount of sequence data with applications in many scientific challenges and placing unprecedented demands on traditional single-processor bioinformatics algorithms. Middleware and technologies for scientific workflows and data-intensive computing promise new capabilities to enable rapid analysis of next-generation sequence data. Based on this motivation and our previous experiences in bioinformatics and distributed scientific workflows, we are creating a Kepler Scientific Workflow System module, called "bioKepler", that facilitates the development of Kepler workflows for integrated execution of bioinformatics applications in distributed environments. This vision paper discusses the challenges related to next-generation sequencing data, explains the approaches taken in bioKepler to help with analysis of such data, and presents preliminary results demonstrating these approaches.

Item Functionally Significant Features in the 5' Untranslated Region of the ABCA1 Gene and Their Comparison in Vertebrates
(MDPI, 2019-06-21) Dvorak, Pavel; Leupen, Sarah; Soucek, Pavel
Single nucleotide polymorphisms located in 5' untranslated regions (5' UTRs) can regulate gene expression and have clinical impact. Recognition of functionally significant sequences within 5' UTRs is crucial in next-generation sequencing applications. Furthermore, information about the behavior of 5' UTRs during gene evolution is scarce. Using the example of the ATP-binding cassette transporter A1 (ABCA1) gene (Tangier disease), we describe our algorithm for functionally significant sequence finding.
5' UTR features (upstream start and stop codons, open reading frames (ORFs), GC content, motifs, and secondary structures) were studied using freely available bioinformatics tools in 55 vertebrate orthologous genes obtained from Ensembl and UCSC. The most conserved sequences were suggested as hot spots. Exon and intron enhancers and silencers (sc35, ighg2 cgamma2, ctnt, gh-1, and fibronectin eda exon), transcription factors (TFIIA, TATA, NFAT1, NFAT4, and HOXA13), some of them cancer related, and microRNA (hsa-miR-4474-3p) were localized to these regions. An upstream ORF, overlapping with the main ORF in primates and possibly coding for a small bioactive peptide, was also detected. Moreover, we showed several features of 5' UTRs, such as GC content variation, hairpin structure conservation or 5' UTR segmentation, which are interesting from a phylogenetic point of view and can stimulate further evolutionary oriented research.

Item Genome-Scale Analyses of Escherichia coli and Salmonella enterica AraC Reveal Noncanonical Targets and an Expanded Core Regulon
(American Society for Microbiology, 2013-11-22) Stringer, Anne M.; Currenti, Salvatore; Bonocora, Richard P.; Baranowski, Catherine; Petrone, Brianna L.; Palumbo, Michael J.; Reilly, Andrew A.; Zhang, Zhen; Erill, Ivan; Wade, Joseph T.
Escherichia coli AraC is a well-described transcription activator of genes involved in arabinose metabolism. Using complementary genomic approaches, chromatin immunoprecipitation (ChIP)-chip, and transcription profiling, we identify direct regulatory targets of AraC, including five novel target genes: ytfQ, ydeN, ydeM, ygeA, and polB. Strikingly, only ytfQ has an established connection to arabinose metabolism, suggesting that AraC has a broader function than previously described. We demonstrate arabinose-dependent repression of ydeNM by AraC, in contrast to the well-described arabinose-dependent activation of other target genes.
We also demonstrate unexpected read-through of transcription at the Rho-independent terminators downstream of araD and araE, leading to significant increases in the expression of polB and ygeA, respectively. AraC is highly conserved in the related species Salmonella enterica. We use ChIP sequencing (ChIP-seq) and RNA sequencing (RNA-seq) to map the AraC regulon in S. enterica. A comparison of the E. coli and S. enterica AraC regulons, coupled with a bioinformatic analysis of other related species, reveals a conserved regulatory network across the family Enterobacteriaceae comprised of 10 genes associated with arabinose transport and metabolism.

Item Genomic characterization and comparison of seven Myoviridae bacteriophage infecting Bacillus thuringiensis
(Elsevier B.V, 2016-01-14) Sauder, Amber Brooke; Quinn, McKenzie Rea; Brouillette, Alexis; Caruso, Steven; Cresawn, Steven; Erill, Ivan; Lewis, Lynn; Loesser-Casey, Kathryn; Pate, Morgan; Scott, Crystal; Stockwell, Stephanie; Temple, Louise
Bacillus thuringiensis Kurstaki, a bacterium that is a source of biopesticides and a safe simulant for pathogenic Bacillus species, was used to isolate seven unique bacteriophages. The phage genomes were sequenced and ranged in size from 158,100 to 163,019 bp encoding 290–299 genes, and the GC content of ~38% was similar to that of the host bacterium. All phages had terminal repeats 2–3 kb long. Three of the phages encoded tRNAs and three contained a self-splicing intron in the DNA polymerase gene. They were categorized as a single cluster (>60% nucleotide conservation) containing three subclusters (>80% nucleotide conservation), supported by genomic synteny and phylogenetic analysis.
Considering the published genomes of phages that infect the genus Bacillus and noting the ability of many of the Bacillus cereus group phages to infect multiple species, a clustering system based on gene content is proposed.

Item Reading between the genes: interpreting non-coding DNA in high-throughput
(2019-01-03) Berghout, Joanne; Lussier, Yves A.; Vitali, Francesca; Bulyk, Martha L.; Kann, Maricel G.; Moore, Jason H.
Identifying functional elements and predicting mechanistic insight from non-coding DNA and non-coding variation remains a challenge. Advances in genome-scale, high-throughput technology, however, have brought these answers closer within reach than ever, though there is still a need for new computational approaches to analysis and integration. This workshop aims to explore these resources and new computational methods applied to regulatory elements, chromatin interactions, non-protein-coding genes, and other non-coding DNA.

Item Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms
(BioMed Central Ltd, 2014-11-19) Speiser, Daniel I.; Pankey, M. Sabrina; Zaharoff, Alexander K.; Battelle, Barbara A.; Bracken-Grissom, Heather D.; Breinholt, Jesse W.; Bybee, Seth M.; Cronin, Thomas W.; Garm, Anders; Lindgren, Annie R.; Patel, Nipam H.; Porter, Megan L.; Protas, Meredith E.; Rivera, Ajna S.; Serb, Jeanne M.; Zigler, Kirk S.; Crandall, Keith A.; Oakley, Todd H.
Background: Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms.
Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families.
Results: We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/).
Conclusions: Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms.
With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.