BROAD Best Practices Somatic CNV Panel is used for creating a panel of normals (PON) from a set of normal samples.

### Common Use Cases

For CNV discovery, the PON is created by running the initial coverage collection tools individually on a set of normal samples and combining the resulting copy ratio data using a dedicated PON creation tool [1]. This produces a binary file that can be used as a PON. It is very important to use normal samples that are as technically similar as possible to the tumor samples (same exome or genome preparation methods, sequencing technology, etc.) [2].

Coverage counts form the basis of copy number variant detection, while the genomic intervals list defines the resolution of the analysis. For whole genome data, the reference genome is divided into equally sized intervals or bins, while for exome data, the target regions of the capture kit should be padded. In either case, the **PreprocessIntervals** tool prepares the intervals list, which is then used for collecting raw integer counts. This step uses **CollectReadCounts**, which counts the reads that overlap each interval. Finally, a CNV panel of normals is generated using the **CreateReadCountPanelOfNormals** tool.

In creating a PON, **CreateReadCountPanelOfNormals** abstracts the counts data for the samples and the intervals using Singular Value Decomposition (SVD), a technique closely related to Principal Component Analysis (PCA). The normal samples in the PON should match the sequencing approach of the case sample under scrutiny. This applies especially to targeted exome data because the capture step introduces target-specific noise [3].

Some of the common input parameters are listed below:

* **Input reads** (`--input`) - BAM/SAM/CRAM file containing reads. For BAM and CRAM files, secondary BAI and CRAI index files are required.
* **Intervals** (`--intervals`) - required for both WGS and WES cases.
Formats must be compatible with the GATK `-L` argument. For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be included, but care should be taken to avoid creating panels of mixed sex, and to denoise case samples only with panels containing only individuals of the same sex as the case samples) [4].
* **Bin length** (`--bin-length`) - passed to the **PreprocessIntervals** tool. Read counts are collected per bin, and the final PON file contains information on read counts per bin. Thus, when calling CNVs in tumor samples, the **Bin length** parameter must be set to the same value that was used when creating the PON file.
* **Padding** (`--padding`) - also used by the **PreprocessIntervals** tool; defines the number of base pairs to pad each bin on each side.
* **Reference** (`--reference`) - reference sequence file along with FAI and DICT files.
* **Blacklisted Intervals** (`--exclude_intervals`) - intervals that will be excluded from coverage collection and all downstream steps.
* **Do Explicit GC Correction** - annotate intervals with GC content using the **AnnotateIntervals** tool.

### Changes Introduced by Seven Bridges

*The workflow in its entirety follows the [best practice](https://github.com/gatk-workflows/gatk4-somatic-cnvs/blob/master/cnv_somatic_panel_workflow.wdl) specification.*

### Performance Benchmarking

| Input Size | Experimental Strategy | Coverage | Duration | Cost (on demand) | AWS Instance Type |
| --- | --- | --- | --- | --- | --- |
| 2 x 45GB | WGS | 8x | 33min | $0.59 | c4.4xlarge 2TB EBS |
| 2 x 120GB | WGS | 25x | 1h 22min | $1.47 | c4.4xlarge 2TB EBS |
| 2 x 210GB | WGS | 40x | 2h 19min | $2.48 | c4.4xlarge 2TB EBS |
| 2 x 420GB | WGS | 80x | 4h 15min | $4.54 | c4.4xlarge 2TB EBS |

### API Python Implementation

The app's draft task can also be submitted via the **API**.
In order to learn how to get your **Authentication token** and **API endpoint** for the corresponding platform, visit our [documentation](https://github.com/sbg/sevenbridges-python#authentication-and-configuration).

```python
# Initialize the SBG Python API
from sevenbridges import Api
api = Api(token="enter_your_token", url="enter_api_endpoint")

# Get project_id/app_id from your address bar. Example: https://igor.sbgenomics.com/u/your_username/project/app
project_id = "your_username/project"
app_id = "your_username/project/app"

# Replace inputs with appropriate values
inputs = {
    "sequence_dictionary": api.files.query(project=project_id, names=["enter_filename"])[0],
    "intervals": api.files.query(project=project_id, names=["enter_filename"])[0],
    "in_alignments": list(api.files.query(project=project_id, names=["enter_filename", "enter_filename"])),
    "in_reference": api.files.query(project=project_id, names=["enter_filename"])[0],
    "output_prefix": "sevenbridges",
}

# Creates draft task
task = api.tasks.create(
    name="GATK CNV Somatic Panel Workflow - API Run",
    project=project_id,
    app=app_id,
    inputs=inputs,
    run=False,
)
```

Instructions for installing and configuring the API Python client are provided on [GitHub](https://github.com/sbg/sevenbridges-python#installation). For more information about using the API Python client, consult [the client documentation](http://sevenbridges-python.readthedocs.io/en/latest/). **More examples** are available [here](https://github.com/sbg/okAPI).

Additionally, [API R](https://github.com/sbg/sevenbridges-r) and [API Java](https://github.com/sbg/sevenbridges-java) clients are available. To learn more about using these API clients, please refer to the [API R client documentation](https://sbg.github.io/sevenbridges-r/) and the [API Java client documentation](https://docs.sevenbridges.com/docs/java-library-quickstart).
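
The SVD step described under Common Use Cases can be illustrated with a small NumPy sketch. This is not the GATK implementation: the helper names (`to_log2_ratio`, `build_panel`, `standardize`, `denoise`), the median-based normalization, the synthetic read counts, and the choice of a single retained component are all assumptions made for illustration. The sketch only shows the general idea of learning the dominant pattern of systematic noise from normal samples and projecting it out of a case sample.

```python
import numpy as np


def to_log2_ratio(counts):
    """Convert raw per-bin counts (samples x bins) to log2 ratios.

    Dividing each sample by its own median removes differences in depth.
    """
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / np.median(counts, axis=1, keepdims=True))


def build_panel(normal_counts, n_components):
    """Hypothetical helper: learn noise components from normal samples."""
    log_ratios = to_log2_ratio(normal_counts)
    bin_medians = np.median(log_ratios, axis=0)  # per-bin baseline
    # Rows of vt are orthonormal directions in bin space, ordered by how
    # much variation across the normals they capture.
    _, _, vt = np.linalg.svd(log_ratios - bin_medians, full_matrices=False)
    return bin_medians, vt[:n_components]


def standardize(case_counts, panel):
    """Log2 ratios of one case sample relative to the per-bin baseline."""
    bin_medians, _ = panel
    return to_log2_ratio(np.atleast_2d(case_counts))[0] - bin_medians


def denoise(case_counts, panel):
    """Hypothetical helper: subtract the projection onto the noise components."""
    _, components = panel
    standardized = standardize(case_counts, panel)
    return standardized - components.T @ (components @ standardized)


# Synthetic data (assumed, not real coverage): one shared per-bin bias
# whose strength differs from sample to sample, plus Poisson noise.
rng = np.random.default_rng(0)
n_bins = 200
bias = rng.normal(0.0, 0.3, n_bins)
gained = np.zeros(n_bins, dtype=bool)
gained[50:80] = True  # bins carrying a simulated 1.5x copy gain


def simulate(strength, copy_factor=1.0):
    """Draw per-bin read counts for one synthetic sample."""
    expected = 3000.0 * 2.0 ** (strength * bias) * copy_factor
    return rng.poisson(expected)


normals = np.array([simulate(s) for s in rng.uniform(0.5, 1.5, size=10)])
case = simulate(2.0, np.where(gained, 1.5, 1.0))

panel = build_panel(normals, n_components=1)
before = standardize(case, panel)
after = denoise(case, panel)

noise_before = before[~gained].std()
noise_after = after[~gained].std()
gain_estimate = after[gained].mean() - after[~gained].mean()
print(f"noise before: {noise_before:.3f}, after: {noise_after:.3f}")
print(f"estimated log2 gain: {gain_estimate:.3f} (simulated: {np.log2(1.5):.3f})")
```

In this toy setting the case sample carries a stronger version of the same bias seen in the normals, which is why a panel built from technically similar normals can remove it. A panel built from samples with a different bias pattern would not capture that direction, which is consistent with the recommendation above to match preparation and sequencing methods.
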
### References

* [1] [https://github.com/gatk-workflows/gatk4-somatic-cnvs](https://github.com/gatk-workflows/gatk4-somatic-cnvs)
* [2] [https://gatkforums.broadinstitute.org/gatk/discussion/11053/panel-of-normals-pon](https://gatkforums.broadinstitute.org/gatk/discussion/11053/panel-of-normals-pon)
* [3] [https://gatkforums.broadinstitute.org/dsde/discussion/11682](https://gatkforums.broadinstitute.org/dsde/discussion/11682)
* [4] [https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists](https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists)