Permitted formats for probe upload files

When you create a SureSelect design or probegroup using the advanced wizard, you have the option to upload a group of probes from a file.

Follow these requirements when preparing a probe upload file:

·        The file must be a tab-delimited text file (*.txt or *.tdt) that has been saved within a compressed folder (*.zip).

·        If your file includes comments, the comment lines must start with a "#" symbol.

·        For target enrichment probes, the probe information in the file must be in a 4-column, 6-column, 7-column (RNA designs only), or 8-column format, with each column completed as described in the tables below. The 4-column file is a minimal format for use when you do not wish to enter the genomic coordinates of the probes, while the 6- and 8-column formats require genomic coordinates to be provided.

·        For microarray probes, the probe information in the file must be in a 2-column or 6-column format, with each column completed as described in the tables below.
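The packaging requirements above can be sketched in Python as follows. This is a minimal illustration, not an official SureDesign tool: the function name build_probe_zip, the inner file name probes.txt, and the example row are all hypothetical. The sketch writes only comment and data lines, because this page does not state whether a header row is expected.

```python
import io
import zipfile


def build_probe_zip(rows, destination, inner_name="probes.txt", comment=None):
    """Write rows as tab-delimited text inside a .zip archive.

    destination may be a file path or a writable binary file object.
    Returns the text that was stored in the archive.
    """
    lines = []
    if comment is not None:
        # Comment lines must start with a "#" symbol.
        lines.append("#" + comment)
    for row in rows:
        lines.append("\t".join(str(field) for field in row))
    text = "\n".join(lines) + "\n"
    with zipfile.ZipFile(destination, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(inner_name, text)
    return text


# Hypothetical 4-column row: TargetID, ProbeID, Sequence (120 nt), Replication.
example_rows = [("BRCA1", "BRCA1_probe_001", "ACGT" * 30, 1)]
archive_bytes = io.BytesIO()
build_probe_zip(example_rows, archive_bytes, comment=" example probes")
```

To write the archive to disk instead of memory, pass a path such as "probes.zip" as the destination.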

2-column probe upload file (microarray probes)

The columns of a 2-column probe upload file for microarray designs are described in the table below.

 NOTE  SureDesign does not download annotation information for probes uploaded from a file. Probes uploaded from a 2-column file do not have any annotation information associated with them.

Column #

Column header

Description of column content

1

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

2

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 20-60 nucleotides long.
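Before uploading, it may help to check each row against the limits in the table above. The sketch below is one possible pre-upload check, not SureDesign's own validation. The function name check_microarray_row is hypothetical, and treating only uppercase letters as valid is an assumption, since this page does not say whether lowercase bases are accepted.

```python
import re

# Assumption: only uppercase A, T, G, C are accepted; 20-60 nt per the table.
MICROARRAY_SEQUENCE = re.compile(r"[ATGC]{20,60}")


def check_microarray_row(probe_id, sequence):
    """Return a list of problems found in one 2-column row (empty if none)."""
    problems = []
    if not probe_id or len(probe_id) > 100:
        problems.append("ProbeID must be 1-100 characters long")
    if not MICROARRAY_SEQUENCE.fullmatch(sequence):
        problems.append("Sequence must be 20-60 nucleotides of A, T, G, C")
    return problems


def find_duplicate_ids(probe_ids):
    """Return the ProbeIDs that appear more than once, in first-seen order."""
    seen, duplicates = set(), []
    for probe_id in probe_ids:
        if probe_id in seen and probe_id not in duplicates:
            duplicates.append(probe_id)
        seen.add(probe_id)
    return duplicates
```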

 

4-column probe upload file (target enrichment probes)

The columns of a 4-column probe upload file for SureSelect or HaloPlex designs are described in the table below.

Column #

Column header

Description of column content

1

TargetID

The TargetID is an identifier that describes the target of the probe sequence. For example, the TargetID may be a gene name (e.g. BRCA1). You can have more than one probe with the same TargetID.

The TargetID can be up to 100 characters long.

2

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

3

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 120 nucleotides long.

4

Replication

The number in this column indicates the number of times that the probe is replicated within the probegroup.

This column allows you to control the replication number for each probe in the probegroup. However, you can enter a 1 in this column for all probes, and then override that entry by selecting Balanced, Max Performance, or Max Performance - XTHS/XT Low Input only in the wizard's Boosting setting. With any of these boosting options, SureDesign assigns a replication number to each probe based on its GC content.
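The column rules in the table above can be checked line by line with a short parser. The sketch below is illustrative only. The function name parse_four_column_line is hypothetical, and two details are assumptions rather than documented behavior: that Replication must be a positive integer, and that only uppercase bases are valid.

```python
def parse_four_column_line(line):
    """Parse one tab-delimited 4-column row and check the documented limits.

    Returns (target_id, probe_id, sequence, replication) or raises ValueError.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, found {len(fields)}")
    target_id, probe_id, sequence, replication = fields
    if not target_id or len(target_id) > 100:
        raise ValueError("TargetID must be 1-100 characters long")
    if not probe_id or len(probe_id) > 100:
        raise ValueError("ProbeID must be 1-100 characters long")
    if len(sequence) != 120 or set(sequence) - set("ATGC"):
        raise ValueError("Sequence must be exactly 120 nt of A, T, G, C")
    count = int(replication)  # raises ValueError if not an integer
    if count < 1:
        # Assumption: a replication number below 1 is not meaningful.
        raise ValueError("Replication must be a positive integer")
    return target_id, probe_id, sequence, count
```

Lines that start with "#" are comments and would need to be skipped before calling this function.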

 

6-column probe upload file (microarray probes)

The columns of a 6-column probe upload file for microarray designs are described in the table below.

Column #

Column header

Description of column content

1

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

2

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 20-60 nucleotides long.

3

Coordinates

This column lists the genomic coordinates of the probe. SureDesign uses the probe coordinates to compute the total capture size of the design.

Coordinates must be provided in standard browser format, e.g. chr1:1-100.

4

Accessions

This column lists any accession numbers for sequences that overlap the probe coordinates.

Separate the database source and accession number with a "|" character (e.g. ref|NM_008837, for RefSeq accession numbers). The database source can be up to 5 characters long. The accession number can be up to 25 characters long.

Separate multiple accessions with a "|" character. You can enter up to 20 accession numbers.

5

Gene Symbol

This column lists any genes that overlap the probe coordinates.

Enter the gene as a name (e.g. "Ube1x") or as a database source followed by a gene symbol (e.g. "ref|Prkcabp").

Separate multiple genes with a "|" character.

6

Description

Use this column to add any description of your choosing.

The description can be up to 200 characters long.
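The Accessions column packs database sources and accession numbers into a single "|"-separated field. The sketch below shows one way to split such a field, assuming that the tokens alternate strictly between source and accession (source|accession|source|accession...). The function name parse_accessions and the second accession in the usage comment are hypothetical.

```python
def parse_accessions(field):
    """Split an Accessions field into (source, accession) pairs.

    Assumes tokens alternate source, accession, source, accession, ...
    Raises ValueError if a documented limit is exceeded.
    """
    tokens = field.split("|") if field else []
    if len(tokens) % 2:
        raise ValueError("every accession needs a database source")
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    if len(pairs) > 20:
        raise ValueError("at most 20 accession numbers are allowed")
    for source, accession in pairs:
        if not 1 <= len(source) <= 5:
            raise ValueError(f"database source {source!r} must be 1-5 characters")
        if not 1 <= len(accession) <= 25:
            raise ValueError(f"accession {accession!r} must be 1-25 characters")
    return pairs


# parse_accessions("ref|NM_008837|gb|AK000001")
# gives [("ref", "NM_008837"), ("gb", "AK000001")]
```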

 

6-column probe upload file (target enrichment probes)

The columns of a 6-column probe upload file for SureSelect or HaloPlex designs are described in the table below.

Column #

Column header

Description of column content

1

TargetID

The TargetID is an identifier that describes the target of the probe sequence. For example, the TargetID may be a gene name (e.g. BRCA1). You can have more than one probe with the same TargetID.

The TargetID can be up to 100 characters long.

2

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

3

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 120 nucleotides long.

4

Replication

The number in this column indicates the number of times that the probe is replicated within the probegroup.

This column allows you to control the replication number for each probe in the probegroup. However, you can enter a 1 in this column for all probes, and then override that entry by selecting Balanced, Max Performance, or Max Performance - XTHS/XT Low Input only in the wizard's Boosting setting. With any of these boosting options, SureDesign assigns a replication number to each probe based on its GC content.

5

Strand

An entry of "+" indicates that the probe is the sense strand and it captures the antisense strand of the target. An entry of "-" indicates that the probe is the antisense strand and it captures the sense strand of the target.

For each probe, the Strand column must contain a single "+" or "-" character.

6

Coordinates

This column lists the genomic coordinates of the probe. SureDesign uses the probe coordinates to compute the total capture size of the design.

Coordinates must be provided in standard browser format, e.g. Chr1:1-100.
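A browser-format coordinate string can be split into its parts and measured with a few lines of Python. The sketch below assumes one-based, closed coordinates, so that Chr1:1-100 spans 100 bases. The function names are hypothetical, and the code does not reproduce how SureDesign itself computes total capture size.

```python
import re

# chromosome name, a colon, then start-end as positive integers
BROWSER_COORDINATES = re.compile(r"(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)")


def parse_browser_coordinates(text):
    """Split 'Chr1:1-100' into ('Chr1', 1, 100); raise ValueError if malformed."""
    match = BROWSER_COORDINATES.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in browser format: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid range in {text!r}")
    return match["chrom"], start, end


def span_length(start, end):
    """Length of a one-based, closed interval."""
    return end - start + 1
```

Under these assumptions, a 120-nucleotide probe that starts at position 1 would be written as Chr1:1-120.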

 

7-column probe upload file (target enrichment probes for SureSelect RNA designs)

The columns of a 7-column probe upload file are described in the table below.

Column #

Column header

Description of column content

1

TargetID

The TargetID is an identifier that describes the target of the probe sequence. For example, the TargetID may be a gene name (e.g. BRCA1). You can have more than one probe with the same TargetID.

The TargetID can be up to 100 characters long.

2

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

3

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 120 nucleotides long.

4

Replication

The number in this column indicates the number of times that the probe is replicated within the probegroup.

This column allows you to control the replication number for each probe in the probegroup. However, you can enter a 1 in this column for all probes, and then override that entry by selecting Balanced, Max Performance, or Max Performance - XTHS/XT Low Input only in the wizard's Boosting setting. With any of these boosting options, SureDesign assigns a replication number to each probe based on its GC content.

5

TranscriptLocation

This column lists the location of the probe within the target transcript sequence. The required format is similar to that for a genomic location (i.e., standard browser format, which is one-based and closed), but the transcript ID replaces the chromosome number.

6

Strand

An entry of "+" indicates that the probe captures the sense strand of the target. An entry of "-" indicates that the probe captures the antisense strand of the target.

For each probe, the Strand column must contain a single "+" or "-" character.

7

Chromosome

This column lists the chromosome number of the probe. SureDesign uses the probe coordinates (in columns 6, 7, and 8) to compute the total capture size of the design.

The chromosome number must be provided in standard BED format, e.g. "Chr1".

8-column probe upload file (target enrichment probes)

The columns of an 8-column probe upload file are described in the table below.

Column #

Column header

Description of column content

1

TargetID

The TargetID is an identifier that describes the target of the probe sequence. For example, the TargetID may be a gene name (e.g. BRCA1). You can have more than one probe with the same TargetID.

The TargetID can be up to 100 characters long.

2

ProbeID

The ProbeID is a unique identifier for the probe sequence.

The ProbeID can be up to 100 characters long.

3

Sequence

This column contains the complete sequence of the probe in 5' to 3' orientation.

The sequence can contain only A, T, G, and C characters. All sequences must be 120 nucleotides long.

4

Replication

The number in this column indicates the number of times that the probe is replicated within the probegroup.

This column allows you to control the replication number for each probe in the probegroup. However, you can enter a 1 in this column for all probes, and then override that entry by selecting Balanced, Max Performance, or Max Performance - XTHS/XT Low Input only in the wizard's Boosting setting. With any of these boosting options, SureDesign assigns a replication number to each probe based on its GC content.

5

Strand

An entry of "+" indicates that the probe captures the sense strand of the target. An entry of "-" indicates that the probe captures the antisense strand of the target.

For each probe, the Strand column must contain a single "+" or "-" character.

6

Chromosome

This column lists the chromosome number of the probe. SureDesign uses the probe coordinates (in columns 6, 7, and 8) to compute the total capture size of the design.

The chromosome number must be provided in standard BED format, e.g. "Chr1".

7

Start

This column lists the nucleotide start position of the probe. SureDesign uses the probe coordinates (in columns 6, 7, and 8) to compute the total capture size of the design.

The nucleotide number must be provided in standard BED format (0-based, half-open).

8

Stop

This column lists the nucleotide stop position of the probe. SureDesign uses the probe coordinates (in columns 6, 7, and 8) to compute the total capture size of the design.

The nucleotide number must be provided in standard BED format (0-based, half-open).
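Because the 8-column format uses BED-style positions (0-based, half-open) while the 6-column format uses browser-style coordinates (one-based, closed), the same probe is written with different numbers in each. The sketch below shows the conversion in both directions. The function names are hypothetical.

```python
def bed_to_browser(chromosome, start, stop):
    """Convert 0-based, half-open BED positions to a one-based, closed string."""
    if start < 0 or stop <= start:
        raise ValueError("BED positions need 0 <= start < stop")
    return f"{chromosome}:{start + 1}-{stop}"


def browser_to_bed(chromosome, start, end):
    """Convert one-based, closed positions to (chromosome, Start, Stop) in BED form."""
    if start < 1 or end < start:
        raise ValueError("browser positions need 1 <= start <= end")
    return chromosome, start - 1, end


def bed_length(start, stop):
    """Number of bases covered by a 0-based, half-open interval."""
    return stop - start
```

For example, a 120-nucleotide probe at the very start of Chr1 has Start 0 and Stop 120 in BED form, which corresponds to Chr1:1-120 in browser format.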

 

 

See Also

Overview of the SureDesign advanced options

View and search for probegroups