affyprobeseqread

Read data file containing probe sequence information for Affymetrix GeneChip array

Syntax

Struct = affyprobeseqread(SeqFile, CDFFile)
Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqPath', SeqPathValue, ...)
Struct = affyprobeseqread(SeqFile, CDFFile, ...'CDFPath', CDFPathValue, ...)
Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqOnly', SeqOnlyValue, ...)

Input Arguments

SeqFile

Character vector or string specifying a file name of a sequence file (tab-separated or FASTA) that contains the following information for a specific type of Affymetrix® GeneChip® array:

  • Probe set IDs

  • Probe x-coordinates

  • Probe y-coordinates

  • Probe sequences in each probe set

  • Affymetrix GeneChip array type (FASTA file only)

The sequence file (tab-separated or FASTA) must be on the MATLAB® search path or in the Current Folder (unless you use the SeqPath property). In a tab-separated file, each row represents a probe; in a FASTA file, each header represents a probe.

CDFFile

Either of the following:

  • Character vector or string specifying a file name of an Affymetrix CDF library file, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array. The CDF library file must be on the MATLAB search path or in the MATLAB Current Folder (unless you use the CDFPath property).

  • CDF structure, such as returned by the affyread function, which contains information that specifies which probe set each probe belongs to on a specific type of Affymetrix GeneChip array.

Caution

Make sure that SeqFile and CDFFile contain information for the same type of Affymetrix GeneChip array.

SeqPathValue

Character vector or string specifying the folder, or the path and folder, where SeqFile is stored.

CDFPathValue

Character vector or string specifying the folder, or the path and folder, where CDFFile is stored.

SeqOnlyValue

Controls whether the returned structure, Struct, contains only the SequenceMatrix field. Choices are true or false (default).

Output Arguments

Struct

MATLAB structure containing the following fields:

  • ProbeSetIDs

  • ProbeIndices

  • SequenceMatrix

Description

Struct = affyprobeseqread(SeqFile, CDFFile) reads the data from files SeqFile and CDFFile, and stores the data in the MATLAB structure Struct, which contains the following fields.

Field: Description
ProbeSetIDs

Cell array containing the probe set IDs from the Affymetrix CDF library file.

ProbeIndices

Column vector containing probe indexing information. Probes within a probe set are numbered 0 through N - 1, where N is the number of probes in the probe set.

SequenceMatrix

An N-by-25 matrix of sequence information for the perfect match (PM) probes on the Affymetrix GeneChip array, where N is the number of probes on the array. Each row corresponds to a probe, and each column corresponds to one of the 25 sequence positions. Nucleotides in the sequences are represented by one of the following integers:

  • 0 — None

  • 1 — A

  • 2 — C

  • 3 — G

  • 4 — T

Note

Probes without sequence information are represented in SequenceMatrix as a row containing all 0s.

Tip

You can use the int2nt function to convert the nucleotide sequences in SequenceMatrix to letter representation.
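
The integer encoding listed above can be illustrated without any array files. The following is a minimal sketch on a small hand-made matrix (5 columns instead of 25, for brevity). The variable names and the '-' placeholder for 0 are assumptions made for this illustration, and the sketch uses plain indexing rather than int2nt.

```matlab
% Toy stand-in for Struct.SequenceMatrix (real matrices are N-by-25).
% Row 2 is all zeros, that is, a probe without sequence information.
M = [1 2 3 4 1; 0 0 0 0 0; 4 4 3 2 1];

% Lookup table: position k+1 holds the letter for integer k.
% The '-' used for 0 (None) is an arbitrary choice for this sketch.
alphabet = '-ACGT';
letters = alphabet(M + 1);   % char matrix with the same size as M

% Logical column vector: true for rows that carry sequence information.
hasSeq = any(M ~= 0, 2);
```

Here letters holds one row of characters per probe, and hasSeq can be used to skip the all-zero rows before further analysis.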

Struct = affyprobeseqread(SeqFile, CDFFile, ...'PropertyName', PropertyValue, ...) calls affyprobeseqread with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqPath', SeqPathValue, ...) lets you specify a path and folder where SeqFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'CDFPath', CDFPathValue, ...) lets you specify a path and folder where CDFFile is stored.

Struct = affyprobeseqread(SeqFile, CDFFile, ...'SeqOnly', SeqOnlyValue, ...) controls whether the returned structure, Struct, contains only the SequenceMatrix field. Choices are true or false (default).

Examples

  1. Read the data from a FASTA file and associated CDF library file, assuming both are located on the MATLAB search path or in the Current Folder.

    S1 = affyprobeseqread('HG-U95A_probe_fasta', 'HG_U95A.CDF');
    
  2. Read the data from a tab-separated file and associated CDF structure, assuming the tab-separated file is located in the specified folder and the CDF structure is in your MATLAB Workspace.

    S2 = affyprobeseqread('HG-U95A_probe_tab',hgu95aCDFStruct,...
         'seqpath','C:\Affymetrix\SequenceFiles\HGGenome');
    
  3. Access the nucleotide sequences of the first probe set (rows 1 through 20) in the SequenceMatrix field of the S2 structure.

    seq = int2nt(S2.SequenceMatrix(1:20,:))
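
Example 3 hard-codes rows 1 through 20. As a hedged sketch, the rows of a given probe set might instead be derived from the ProbeIndices field. This assumes that ProbeIndices has one entry per row of SequenceMatrix and that the probes of each probe set are stored consecutively, so that every 0 marks the start of a new probe set. That layout is an assumption and is not confirmed by the field descriptions. The vector below is made up for illustration, and all variable names are hypothetical.

```matlab
% Made-up ProbeIndices for three probe sets of sizes 3, 2, and 4.
% Within each probe set, probes are numbered 0 through N - 1.
probeIndices = [0; 1; 2; 0; 1; 0; 1; 2; 3];

% Assumption: a 0 marks the first probe of each probe set.
starts = find(probeIndices == 0);

% Each probe set ends just before the next one starts.
stops = [starts(2:end) - 1; numel(probeIndices)];

% Rows belonging to the k-th probe set.
k = 2;
rowsK = (starts(k):stops(k))';
```

With a real structure, rowsK could then be used in place of the fixed range, for example int2nt(S2.SequenceMatrix(rowsK,:)).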

Version History

Introduced in R2007a