SureDesign allows you to create a custom design for Avida DNA and Avida Methyl target enrichment captures using a design wizard that takes you through the steps of the creation process. The wizard will ask you to define the target capture regions (using gene names, transcript IDs, or genomic coordinates) and set a few of the design parameters. You then submit the design to SureDesign and the program's algorithms select the probe sequences for the design. When your probe selection job is complete, you receive an e-mail notifying you that your design is ready to be finalized. Once you finalize the design, it is available for ordering.
To open the Avida design wizard:
· At the top of the screen, click Create Designs > Avida.
The wizard window opens to Step 1.
In this step, complete the fields described below to define the design.
Category
Select between Avida-DNA (for DNA analysis) and Avida-Methyl (for methylation analysis).
Design Name
Type a name for your design into the field. Alphanumeric characters, hyphens, underscores, and spaces are permitted. The name must be unique within your workgroup.
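The character rule above can be checked before you type a name into the wizard. The following is a minimal sketch, assuming that "alphanumeric" means ASCII letters and digits; the function name is hypothetical and is not part of SureDesign. Uniqueness within the workgroup can only be checked by SureDesign itself.

```python
import re

# Assumption: permitted characters are ASCII letters, digits, hyphens,
# underscores, and spaces, as listed for the Design Name field.
_DESIGN_NAME_RE = re.compile(r"[A-Za-z0-9_\- ]+")


def is_valid_design_name(name: str) -> bool:
    """Return True if the name is non-empty and uses only permitted characters."""
    return bool(_DESIGN_NAME_RE.fullmatch(name))
```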
Species
For Avida designs, the only available selection for the species is H. sapiens.
Build
Select the desired human genome build in the provided drop-down list.
Specify the folder in which you want to save this design. The default selection is the top-level folder for your workgroup.
To change the selection, click Select to open the Select Folder dialog box, and mark the folder in which you want to save the new design. This dialog box lists the available folders within your workgroup and, if you are a member of any collaborations, lists the collaboration folders to which you have access. (If you later decide you want to change the folder location of the design, you can move it to another folder.)
Click Next to advance to Step 2.
In this step, define the target regions that you want to capture by providing the following information:
Targets
In the Targets text area, enter identifiers for the targets using either of the following approaches:
· Type or paste the target identifiers directly into the text area. List one identifier per line.
· Click Upload to browse to a text file (*.txt) that lists the target identifiers (one identifier per line).
The permitted identifiers are:
· For target genes:
Gene name - enter the gene name (not case-sensitive) as it appears in one or more of the selected databases; example: brca1; see SureDesign gene finder for information on how SureDesign maps a gene name to a specific genomic location
Transcript ID - enter the transcript ID (not case-sensitive) as it appears in one or more of the selected databases; examples: NM_007294, CCDS6359, or ENST00000357654; note that SureDesign ignores version numbers included in the transcript ID
Gene ID - enter the numerical NCBI gene ID; example: 672
SNP ID - enter the dbSNP ID; example: rs3824120
Promoter ID - enter the EPD ID; example: MYC_1 or HRAS_1
· For target genomic intervals:
Genomic coordinates - enter the chromosome number and range of nucleotides using the UCSC browser format or BED format.
You can add a string of text, no spaces, after the target genomic interval to be used as the target ID (e.g. chr1:1-100 geneX). If you enter multiple target genomic intervals with the same target ID (e.g. chr1:1-100 geneX and chr1:201-300 geneX), SureDesign will treat the intervals as different regions within the same gene.
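When preparing a target list outside of SureDesign, it can help to check the lines before pasting or uploading them. The sketch below separates genomic intervals from other identifiers and groups intervals that share a target ID, mirroring the geneX example above. It is an illustration only: the function name and regular expression are assumptions, it handles only the UCSC-style chr:start-end form (not BED), and it does not reproduce SureDesign's own parser.

```python
import re
from collections import defaultdict

# UCSC-style interval, optionally followed by a target ID that has no spaces.
_INTERVAL_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)(?:\s+(\S+))?$")


def parse_targets(lines):
    """Return (intervals grouped by target ID, list of other identifiers)."""
    intervals = defaultdict(list)
    identifiers = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = _INTERVAL_RE.match(line)
        if match:
            chrom, start, end, target_id = match.groups()
            start = int(start.replace(",", ""))
            end = int(end.replace(",", ""))
            if start > end:
                raise ValueError(f"start is after end: {line}")
            # Intervals without an explicit ID are keyed by their coordinates.
            key = target_id or f"{chrom}:{start}-{end}"
            intervals[key].append((chrom, start, end))
        else:
            # Gene names, transcript IDs, gene IDs, SNP IDs, promoter IDs.
            identifiers.append(line)
    return dict(intervals), identifiers
```

For example, passing the lines `chr1:1-100 geneX` and `chr1:201-300 geneX` yields a single `geneX` entry that holds both intervals.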
CpG Island Lookup (for Avida-Methyl designs only)
In the CpG Island Lookup text area, enter identifiers for the CpG islands using either of the following approaches:
· Type or paste the target identifiers directly into the text area. List one identifier per line.
· Click Upload to browse to a text file (*.txt) that lists the target identifiers (one identifier per line).
The permitted identifiers are:
· For target genes that contain a CpG island of interest:
Gene name - enter the gene name (not case-sensitive) as it appears in one or more of the selected databases; example: brca1; see SureDesign gene finder for information on how SureDesign maps a gene name to a specific genomic location
Transcript ID - enter the transcript ID (not case-sensitive) as it appears in one or more of the selected databases; examples: NM_002467 and ENST00000524013; note that SureDesign ignores version numbers included in the transcript ID
Gene ID - enter the numerical NCBI gene ID; example: 672
SNP ID - enter the dbSNP ID; example: rs3824120
Promoter ID - enter the EPD ID; example: MYC_1 or HRAS_1
· For target genomic intervals that contain a CpG island of interest:
Genomic coordinates - enter the chromosome number and range of nucleotides using the UCSC browser format or BED format.
You can add a string of text, no spaces, after the target genomic interval to be used as the target ID (e.g. chr1:1-100 geneX). If you enter multiple target genomic intervals with the same target ID (e.g. chr1:1-100 geneX and chr1:201-300 geneX), SureDesign will treat the intervals as different regions within the same gene.
Databases
Below the Databases heading, mark the genome annotation databases that you want SureDesign to use to obtain genomic coordinate information for your specified targets. The default selections differ for Avida-DNA and Avida-Methyl designs. For targets entered in the CpG Island Lookup text box, SureDesign searches the genomic coordinates for the targets to identify CpG islands that overlap those locations. The available database sources are:
RefSeq - US National Center for Biotechnology Information (NCBI)
Ensembl - European Bioinformatics Institute and the Wellcome Trust Sanger Institute
CCDS - Consensus Coding Sequence project (CCDS) of the US National Center for Biotechnology Information (NCBI)
Gencode - US National Human Genome Research Institute (NHGRI) and the Wellcome Trust Sanger Institute
VEGA - Vertebrate Genome Annotation project of the Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute
SNP - dbSNP database from the National Institutes of Health (NIH)
EPD - Eukaryotic Promoter Database
NOTE If you mark multiple databases, and you select Coding Exons or Coding Exons + UTRs as the regions of interest (see below), SureDesign may find exon information for a target gene in more than one database. In these cases, the program considers a sequence to be coding if any of the selected databases identifies it as coding, and it considers a sequence to be translated if any of the selected databases identifies it as translated.
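The rule in the note above amounts to taking the union of the coding intervals reported by each selected database. A minimal sketch of that idea follows; the function names and the coordinates in the usage example are hypothetical, and the code is not SureDesign's implementation.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coding_union(per_database):
    """Treat a base as coding if any database marks it as coding."""
    combined = [iv for ivs in per_database.values() for iv in ivs]
    return merge_intervals(combined)
```

With made-up exon coordinates, `coding_union({"RefSeq": [(100, 200), (400, 500)], "Ensembl": [(150, 250), (600, 650)]})` returns `[(100, 250), (400, 500), (600, 650)]`.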
Regions of Interest (Targets or CpG Island Lookup)
Specify the regions within the target genes that you want to capture. Use the options below the Regions of Interest heading:
· Coding Exons - Select this option to include probes for the translated regions of the target genes.
· Coding Exons + UTRs - Select this option to include probes for the translated regions and the 5' and/or 3' untranslated regions of the target genes. Mark adjacent check boxes to indicate which untranslated regions you want to include in the target regions:
· Mark 5' UTR to include 5' untranslated regions.
· Mark 3' UTR to include 3' untranslated regions.
· Entire Transcribed Region - Select this option to include probes for the entire genomic sequence (exons, introns, and UTRs) of your target genes.
NOTE For target genomic intervals (i.e., targets entered as genomic coordinates), SureDesign always includes the entire genomic sequence when selecting probes for the design, regardless of your selection for the Regions of Interest.
Include Flanking Bases (for Regions of Interest - Targets)
In the 3' and 5' drop-down lists, select how many base pairs of flanking sequence (on the 3' and 5' ends, respectively) you want SureDesign to include on each exon/UTR when designing the probes for a target gene.
NOTE SureDesign does not include flanking bases for targets entered as genomic coordinates.
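The effect of the flanking settings on a single exon or UTR can be sketched as follows. This assumes that the 5' and 3' ends are interpreted relative to the strand of the gene, which this document does not state explicitly; the function name is hypothetical.

```python
def add_flanks(start, end, strand, flank_5, flank_3):
    """Extend a 1-based inclusive interval by 5' and 3' flanks (strand-aware)."""
    if strand == "+":
        new_start, new_end = start - flank_5, end + flank_3
    elif strand == "-":
        # On the minus strand, the 5' end lies at the higher coordinate.
        new_start, new_end = start - flank_3, end + flank_5
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not extend past the first base of the chromosome.
    return max(1, new_start), new_end
```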
Extend Regions for CpG lookup (for Regions of Interest - CpG Island Lookup)
In the Extend Regions for CpG lookup field, select how many base pairs of flanking sequence you want SureDesign to include on each target when searching for CpG islands.
NOTE SureDesign does not include flanking bases for targets entered as genomic coordinates.
Allow Synonyms
When this check box is marked, SureDesign compares the gene names you entered into the Targets area (or CpG Island Lookup area) to a table of synonyms, and may use the synonym names to map the genes to a genomic location. For example, if you entered HER2 as a target, SureDesign would identify HER2 as a product of the gene ERBB2, and use ERBB2 to map the genomic location.
In cases in which the gene name for your target is also a synonym for another gene, SureDesign treats both genes as targets when Allow Synonyms is marked. For example, if you entered DSP as a target, SureDesign would identify your target as the official gene name for desmoplakin, but it would also identify it as a synonym for the gene encoding dentin sialophosphoprotein. Consequently, the program would map the genomic location to two completely different genes, and in the next step of the wizard (Step 3: Review Targets), you would see both genomic locations listed for the target.
When the Allow Synonyms check box is cleared, SureDesign maps your targets to genomic locations using only the entered gene names.
To fully control how SureDesign maps your targets to a genomic location, enter your targets using transcript IDs, gene IDs, or SNP IDs instead of gene names. Alternatively, after you advance to the Review Targets step of the wizard, click Download to download the Regions.bed file and then edit the genomic locations listed in the file so that they accurately match those of your targets. You can then go back to the Define Targets step of the wizard and paste the genomic locations into the Targets input area.
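The HER2 and DSP examples above can be illustrated with a small lookup sketch. The two tables below are hypothetical stand-ins built only from those examples (DSPP is the official symbol of the dentin sialophosphoprotein gene); SureDesign's actual synonym table is not shown in this document.

```python
# Hypothetical tables for illustration only.
OFFICIAL = {"DSP", "DSPP", "ERBB2"}
SYNONYMS = {"HER2": ["ERBB2"], "DSP": ["DSPP"]}


def resolve_gene(name, allow_synonyms=True):
    """Return the genes that an entered name maps to (name is not case-sensitive)."""
    key = name.upper()
    hits = [key] if key in OFFICIAL else []
    if allow_synonyms:
        for gene in SYNONYMS.get(key, []):
            if gene not in hits:
                hits.append(gene)
    return hits
```

Here `resolve_gene("DSP")` returns both `DSP` and `DSPP`, while `resolve_gene("DSP", allow_synonyms=False)` returns only `DSP`.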
Ignore Transcript Version
Marking this check box affects how SureDesign matches the transcript identifiers that you enter. When you enter a transcript ID that includes the version number (e.g., ENST00000258149.10), SureDesign first searches for that exact transcript and version. If the Ignore Transcript Version check box is not marked and SureDesign cannot find the exact transcript and version, SureDesign reports the target ID as not found. If the check box is marked, SureDesign repeats the search, this time ignoring the version number, so that any version of the transcript is included as a target.
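The two-pass search described for this check box can be sketched as an exact match followed by an optional version-insensitive match. The function name and the set of known transcripts are hypothetical, the version numbers in the usage note are made up, and the sketch covers only queries that include a version number.

```python
def find_transcript(query, known, ignore_version=False):
    """Look up a versioned transcript ID in a set of known versioned IDs."""
    # First pass: exact transcript and version.
    if query in known:
        return [query]
    if not ignore_version:
        return []  # reported as not found
    # Second pass: compare only the part before the version suffix.
    base = query.split(".", 1)[0]
    return sorted(t for t in known if t.split(".", 1)[0] == base)
```

If the database holds only `ENST00000258149.11`, a query for `ENST00000258149.10` returns nothing by default and returns the `.11` entry when `ignore_version=True`.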
Include Predicted Transcripts (RefSeq)
If you selected the RefSeq database, marking this check box allows SureDesign to use predicted transcripts in that database for obtaining genomic coordinate information for your specified targets.
Click Next to advance to Step 3.
NOTE If the total size of the targets is 1 Mb or larger, SureDesign opens an error message and does not let you proceed to the next step. To continue creating the design, define fewer or smaller targets, or contact your Agilent representative.
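If you define targets as genomic intervals, you can estimate the total target size before submitting. The sketch below assumes that 1 Mb means 1,000,000 bp and that overlapping bases are counted once; neither detail is stated in this document, and the names are hypothetical.

```python
from collections import defaultdict

LIMIT_BP = 1_000_000  # assumption: 1 Mb taken as 1,000,000 bp


def total_target_size(regions):
    """Sum the non-overlapping size of (chrom, start, end) regions (1-based, inclusive)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    total = 0
    for ivs in by_chrom.values():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:
                # Overlap: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total
```

A design would be expected to trip the limit when `total_target_size(regions) >= LIMIT_BP`.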
This step provides a chance for you to make sure that SureDesign successfully recognized all of the target identifiers that you entered in the Define Targets step. Review the Target Summary and Target Details before you proceed to the next step.
Target Summary
Near the top of the wizard window is a target summary with 2 to 4 bullet points (depending on target identifiers entered in the Define Targets step). The bullet points indicate:
· The number of Target IDs that SureDesign was able to resolve to a genomic location, and the total number of continuous genomic regions that comprise those targets. The Target IDs summarized in this bullet point are based on the identifiers entered in the Targets text area of the Define Targets step. (If any of the target identifiers mapped to more than one genomic location, you will notice that the number of targets is greater than the number of Target IDs. See SureDesign gene finder for more information on how SureDesign maps target IDs to targets.)
· The number of Target IDs that SureDesign was not able to find in any of the databases you selected in the Define Targets step. The Target IDs summarized in this bullet point are based on the identifiers entered in the Targets text area of the Define Targets step.
· The number of Target IDs for CpG islands that SureDesign was able to resolve to a genomic location, and the total number of continuous genomic regions that comprise those targets. The Target IDs (CpG Islands) summarized in this bullet point are based on the identifiers entered in the CpG Island Lookup text area of the Define Targets step.
· The number of Target IDs for CpG islands that SureDesign was not able to find in any of the databases you selected in the Define Targets step. The Target IDs (CpG Islands) summarized in this bullet point are based on the identifiers entered in the CpG Island Lookup text area of the Define Targets step.
If SureDesign did not accurately identify all of your target regions
The Target Details table lists the following information for each of the target identifiers that SureDesign was able to locate:
· Target ID - The target ID is the gene name, transcript ID, SNP ID, or genomic coordinates that you used to define the target.
· # Regions - The # Regions column lists the number of target regions within the target.
· Base Pairs - The Base Pairs column lists the total number of base pairs within the regions defined by the target identifier.
· Position - The Position column lists the genomic coordinates identified for the target.
NOTE To perform a careful review of the individual regions, click View targets in UCSC to open the UCSC Genome Browser and see the genomic locations of the regions identified by SureDesign.
To submit the design for probe selection:
When you are finished reviewing the target details, submit the design to the SureDesign job queue and the SureDesign algorithms will select the probes for your design.
Click Begin Probe Design.
A message box opens indicating the e-mail address where Agilent will contact you when the probe design job is complete. If desired, you can enter additional e-mail addresses into the provided field.
Click OK in the notification message to submit the design to SureDesign.
Your submission is placed in the SureDesign job queue to await probe selection.
The wizard automatically advances to Step 4.
At this point in the design creation process, SureDesign is processing your probe design job. The length of time required for SureDesign to complete the job depends on the number of jobs waiting in the queue and the size of your design.
Click Close Design Wizard. When you receive an e-mail from Agilent SureDesign notifying you that your probe design job was successfully completed, relaunch the wizard and continue creating the design:
Open the SureDesign Home screen.
Locate the design under Designs: In Progress, and click the Continue icon.
The wizard window opens to Step 5.
NOTE You can monitor the status of your probe design jobs from the SureDesign Home screen.
NOTE If the probe design process results in a design in which the covered size of the probes is 1 Mb or larger, the probe design job fails. To continue creating the design, define fewer or smaller targets, or contact your Agilent representative.
This step shows the results of the probe selection job. The selected probes are listed in a table that provides the probe target (Target ID), the genomic location (Interval), the length of the probe in bp, and the coverage (calculated as the percentage of the target regions covered by the probes). In this step, you select between the following options using the radio buttons at the top of the wizard screen.
· Select All Probes - Select this option to add all of the probes in the table to the Avida design. When Select All Probes is chosen, the design may include some probes that target repetitive regions in the panel, ensuring comprehensive coverage.
· Select High Quality Probes - Select this option to only add the high quality probes in the table to the Avida design. When Select High Quality Probes is chosen, the design excludes coverage of repetitive regions in the panel to enhance sequencing efficiency.
To help you make an informed choice between Select All Probes and Select High Quality Probes, SureDesign provides a summary of the price tier, design size, and coverage for each option, as shown in the example screen shot above. To view any of the probe intervals in the UCSC Genome Browser, click directly on the Interval column for that probe to launch the browser.
NOTE At this step, the information in the design summary panel is complete, and you can download and view the summary files to help you decide. The set of download files includes files based on the Select All Probes option (see files with suffix "All") as well as the Select High Quality Probes option (see files with suffix "High").
After reviewing the probes, select the desired option (Select All Probes or Select High Quality Probes) and click Next. Alternatively, if you want to make further edits to the design, click Modify Design to delete the probes from the probe selection job and return to the Define Targets step.
In this step, click Finalize Design to finish the design creation process.
After clicking Finalize Design, the wizard window updates to the Design Complete step, and provides the following information:
Name - The name of the design.
Design ID - The unique, Agilent-assigned design ID.
Species - The species of the targets. The genome build is indicated in parentheses.
Design Category - The category of the Avida design (Avida-DNA or Avida-Methyl).
# Regions - The number of target regions in the design.
Total Target Regions Size - The size of the genomic footprint of the target regions.
Probes Size - The total size of the genomic footprint of all the probes in the design, which for Avida designs is an approximation of the size of the sequenceable region. [[needs verification]]
Price Tier - The Agilent-assigned pricing category for a design. For Avida designs, Agilent calculates the price tier using the covered design size. [[needs verification]]
Coverage - The percentage of nucleotides in the target regions that are expected to be captured by one or more probes in the design. For Avida designs, a target nucleotide is considered to be covered if at least one probe comes within 100 bases of the nucleotide in either direction. [[needs verification]] The covered region is included as a track in the AllTracks BED file.
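The Coverage definition can be illustrated with a short calculation. This is a sketch under the 100-base rule as stated in the Coverage description, for regions on a single chromosome; the function name and coordinates are hypothetical, and the set-based approach is only practical for small panels.

```python
def coverage_percent(targets, probes, reach=100):
    """Percent of target bases within `reach` bases of any probe.

    targets, probes: lists of (start, end) on one chromosome, 1-based inclusive.
    """
    target_bases = set()
    for start, end in targets:
        target_bases.update(range(start, end + 1))
    covered = set()
    for start, end in probes:
        # A probe covers its own bases plus `reach` bases on either side.
        covered.update(range(max(1, start - reach), end + reach + 1))
    if not target_bases:
        return 0.0
    return 100.0 * len(target_bases & covered) / len(target_bases)
```

For a 1,000-base target at 1000-1999 and one 120-base probe at 1200-1319, the covered span is 1100-1419, so 320 of the 1,000 target bases count as covered.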
Use the action buttons at the bottom of the Finalize Design window to take further action on the design:
· Click Order to open the Order dialog box, where you can submit a request to receive a price quote.
· Click Mark as Favorite to add the design to your list of favorites. The design will appear in the Designs: Recent and Favorites dashboard on the Home screen.
· Click Download to download one or more design files, including a formatted PDF report that summarizes key information on the design, probes, targets, and the probe selection job.
· Click UCSC View to launch your internet browser to the UCSC Genome Browser page. The design's AllTracks.BED file is loaded in the browser. You may need to disable the pop-up blocker in your internet browser in order to use this feature.
These action buttons are also available from the design details window. Click Exit Design Wizard to close the wizard window.