


The VGP workflow is implemented in a modular fashion: it consists of five independent subworkflows. Firstly, this allows the evaluation of intermediate steps, which makes it possible to modify parameters if necessary without restarting from the initial stage. Secondly, it allows the workflow to be adapted to the available data. In addition, the pipeline includes some additional workflows (not shown in the figure), required for exporting the results to GenomeArk.

The VGP pipeline integrates two workflows to generate scaffolds from the contig-level assemblies generated from the HiFi reads. When both Hi-C data and Bionano data are available, the default pipeline runs the Bionano workflow first, followed by the Hi-C workflow. However, Bionano data may not be available, in which case the Hi-C workflow can be used directly on the initial purged assembly.

This tutorial assumes the input datasets are high-quality. QC should be performed on raw read data before it is used, but it is outside the scope of this tutorial.

Get data

For this tutorial, the first step is to get the datasets from Zenodo. The VGP assembly pipeline uses data generated by a variety of technologies, including PacBio HiFi reads, Bionano optical maps, and Hi-C chromatin interaction maps.
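The scaffolding order described above (Bionano first, then Hi-C, or Hi-C alone when no Bionano data is available) can be sketched as a small decision function. This is only an illustration: the function name and the step labels are hypothetical and do not correspond to actual Galaxy workflow names, and the Bionano-only and no-data cases are assumptions that the text does not cover.

```python
from typing import List


def plan_scaffolding(has_bionano: bool, has_hic: bool) -> List[str]:
    """Return the scaffolding workflows to run, in order.

    Illustrative sketch only: the step labels are hypothetical placeholders,
    not the names of real Galaxy workflows.
    """
    steps: List[str] = []
    if has_bionano:
        # Default order when both data types exist: Bionano runs first.
        # Assumption: with Bionano data only, this is the sole scaffolding step.
        steps.append("bionano_scaffolding")
    if has_hic:
        # Hi-C runs after Bionano, or directly on the initial purged
        # assembly when no Bionano data is available.
        steps.append("hic_scaffolding")
    # Assumption: with neither data type, no scaffolding step is planned.
    return steps


if __name__ == "__main__":
    print(plan_scaffolding(has_bionano=True, has_hic=True))
    print(plan_scaffolding(has_bionano=False, has_hic=True))
```

Keeping the decision in one function mirrors the modular design: each scaffolding subworkflow is an independent step that is included or skipped depending on the available data.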
The VGP assembly pipeline has a modular organization, consisting of five main subworkflows (fig. 1), each one composed of a series of data manipulation steps.
The objective of this training is to explain how to run the VGP workflow, focusing on which inputs are required and which outputs are generated, and delegating how the steps are executed to the Galaxy Workflow System (GWS).
The Vertebrate Genome Project (VGP), a project of the G10K Consortium, aims to generate high-quality, near error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assemblies for every vertebrate species (Rhie et al.). The VGP has developed a fully automated de novo genome assembly pipeline, which uses a combination of three different technologies: PacBio HiFi, Bionano optical maps and Hi-C chromatin interaction data.

As a result of a collaboration with the VGP team, a training including a detailed step-by-step description of the parameter choices for each assembly step was developed for the Galaxy Training Network (Lariviere et al.). This tutorial instead provides a quick walkthrough of how the workflows can be used to rapidly assemble a genome using the VGP pipeline with the Galaxy Workflow System (GWS). GWS facilitates analysis repeatability, while minimizing the number of manual steps required to execute an analysis workflow and automating parameter input and software tool version tracking.
