|
Di Genova, A., Ruz, G. A., Sagot, M. F., & Maass, A. (2018). Fast-SG: an alignment-free algorithm for hybrid assembly. GigaScience, 7(5), 15 pp.
Abstract: Background: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. Results: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). Conclusions: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
|
|
|
Loira, N., Mendoza, S., Cortes, M. P., Rojas, N., Travisany, D., Di Genova, A., et al. (2017). Reconstruction of the microalga Nannochloropsis salina genome-scale metabolic model with applications to lipid production. BMC Syst. Biol., 11, 17 pp.
Abstract: Background: Nannochloropsis salina (= Eustigmatophyceae) is a marine microalga which has become a biotechnological target because of its high capacity to produce polyunsaturated fatty acids and triacylglycerols. It has been used as a source of biofuel, pigments and food supplements, like Omega 3. Only some Nannochloropsis species have been sequenced, but none of them benefit from a genome-scale metabolic model (GSMM), able to predict its metabolic capabilities. Results: We present iNS934, the first GSMM for N. salina, including 2345 reactions, 934 genes and an exhaustive description of lipid and nitrogen metabolism. iNS934 has 90% accuracy when making simple growth/no-growth predictions and has a 15% error rate in predicting growth rates in different experimental conditions. Moreover, iNS934 allowed us to propose 82 different knockout strategies for strain optimization of triacylglycerols. Conclusions: iNS934 provides a powerful tool for metabolic improvement, allowing predictions and simulations of N. salina metabolism under different media and genetic conditions. It also provides a systemic view of N. salina metabolism, potentially guiding research and providing context to -omics data.
|
|
|
Narum, S. R., Di Genova, A., Micheletti, S. J., & Maass, A. (2018). Genomic variation underlying complex life-history traits revealed by genome sequencing in Chinook salmon. Proc. R. Soc. B-Biol. Sci., 285(1883), 9 pp.
Abstract: A broad portfolio of phenotypic diversity in natural organisms can buffer against exploitation and increase species persistence in disturbed ecosystems. The study of genomic variation that accounts for ecological and evolutionary adaptation can represent a powerful approach to extend understanding of phenotypic variation in nature. Here we present a chromosome-level reference genome assembly for Chinook salmon (Oncorhynchus tshawytscha; 2.36 Gb) that enabled association mapping of life-history variation and phenotypic traits for this species. Whole-genome re-sequencing of populations with distinct life-history traits provided evidence that divergent selection was extensive throughout the genome within and among phylogenetic lineages, indicating that a broad portfolio of phenotypic diversity exists in this species that is related to local adaptation and life-history variation. Association mapping with millions of genome-wide SNPs revealed that a genomic region of major effect on chromosome 28 was associated with phenotypes for premature and mature arrival to spawning grounds and was consistent across three distinct phylogenetic lineages. Our results demonstrate how genomic resources can enlighten the genetic basis of known phenotypes in exploited species and assist in clarifying phenotypic variation that may be difficult to observe in naturally occurring organisms.
|
|