FASTA - Maple Help
FASTA (.fasta) File Format

Description

• FASTA is a plaintext format for storing protein or nucleic acid (DNA or RNA) data as character sequences. It is a popular interchange format for molecular biology software.

• The commands Import and Export support this format.

Details on the FASTA format

• The FASTA format employs the following standard IUB/IUPAC conventions for encoding protein or nucleic acid sequences as alphabetic characters.

• In addition to codes specifying particular nucleotides or amino acids, the conventions support ambiguity codes, where a position may be occupied by any one of several nucleotides or amino acids. For example, the code R matches either adenine (A) or guanine (G).

 

Table 1: Nucleic Acid Codes

Code  Meaning    Description         Code  Meaning      Description
A     A          Adenine             B     {C,G,T,U}    Not A
C     C          Cytosine            D     {A,G,T,U}    Not C
G     G          Guanine             H     {A,C,T,U}    Not G
T     T          Thymine             V     {A,C,G}      Not T or U
U     U          Uracil              N     {A,C,G,T,U}  Any nucleic acid
R     {A,G}      Purine              Y     {C,T,U}      Pyrimidine
K     {G,T,U}    Ketone              M     {A,C}        Amino
S     {C,G}      Strong interaction  W     {A,T,U}      Weak interaction
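
The ambiguity codes in Table 1 can be treated as sets of bases. The following is a minimal sketch in Maple under that reading; IUPACBases and MatchesCode are illustrative names chosen here, not Maple library commands.

```maple
# Hypothetical lookup table (not part of Maple): each nucleic acid code
# mapped to the set of bases it can stand for, following Table 1.
IUPACBases := table([
    "A" = {"A"}, "C" = {"C"}, "G" = {"G"}, "T" = {"T"}, "U" = {"U"},
    "R" = {"A", "G"},           "Y" = {"C", "T", "U"},
    "K" = {"G", "T", "U"},      "M" = {"A", "C"},
    "S" = {"C", "G"},           "W" = {"A", "T", "U"},
    "B" = {"C", "G", "T", "U"}, "D" = {"A", "G", "T", "U"},
    "H" = {"A", "C", "T", "U"}, "V" = {"A", "C", "G"},
    "N" = {"A", "C", "G", "T", "U"}
]):

# Hypothetical helper: true when base is one of the bases that
# the given code allows.
MatchesCode := (code, base) -> member(base, IUPACBases[code]):

MatchesCode("R", "G");   # true:  R stands for {A,G}
MatchesCode("R", "C");   # false: C is not in {A,G}
```

Storing each code as a set keeps the unambiguous codes (A, C, G, T, U) and the ambiguous ones on the same footing, so a single membership test covers both.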

 

Table 2: Amino Acid Codes

Code  Description     Code  Description       Code  Description
A     Alanine         J     I or L            S     Serine
B     D or N          K     Lysine            T     Threonine
C     Cysteine        L     Leucine           U     Selenocysteine
D     Aspartic acid   M     Methionine        V     Valine
E     Glutamic acid   N     Asparagine        W     Tryptophan
F     Phenylalanine   O     Pyrrolysine
G     Glycine         P     Proline           Y     Tyrosine
H     Histidine       Q     Glutamine         Z     E or Q
I     Isoleucine      R     Arginine
X     Any amino acid  *     Translation stop  -     Gap of indeterminate length
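
Taken together, the codes in Table 2 are the 26 uppercase letters plus "*" and "-". As a rough sketch, a string can be checked against that alphabet as shown below; AminoAcidCodes and IsProteinSequence are illustrative names, and the check assumes uppercase input.

```maple
# Every code listed in Table 2: the 26 uppercase letters plus "*" and "-".
# AminoAcidCodes and IsProteinSequence are illustrative names only.
AminoAcidCodes := {op(StringTools:-Explode("ABCDEFGHIJKLMNOPQRSTUVWXYZ*-"))}:

# True when every character of s is one of the Table 2 codes.
IsProteinSequence := s -> evalb({op(StringTools:-Explode(s))} subset AminoAcidCodes):

IsProteinSequence("MKTAYIAKQR*");   # true
IsProteinSequence("MKT1AY");        # false: "1" is not a Table 2 code
```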

Notes

• Content-Type: chemical/seq-aa-fasta, chemical/seq-na-fasta

Examples

Import a DNA sequence from a FASTA file.

> DNASequence := Import("example/humanmtDNA.fasta", base=datadir):

Read the descriptor for the first sequence in the file.

> DNASequence[1,1]

Human mitochondrial genome,HVR2,CR,HVR1

(1)

Examine positions 100 through 150 in this sequence.

> DNASequence[1,2][100..150]

GGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATC

(2)

Count the frequency of each nucleotide base within the sequence.

> frequencies := [StringTools:-CharacterFrequencies(DNASequence[1,2], dna)]

frequencies := ["A" = 5118, "C" = 5185, "G" = 2175, "T" = 4092]

(3)

> Statistics:-ColumnGraph(frequencies)
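
The raw counts can also be turned into relative frequencies. This sketch assumes the counts are held as a list of equations and retypes the values from output (3) so that it runs on its own; totalBases and relativeFrequencies are illustrative names.

```maple
# Counts copied from output (3); in a session, reuse the existing list instead.
frequencies := ["A" = 5118, "C" = 5185, "G" = 2175, "T" = 4092]:

# Total number of counted bases: 5118 + 5185 + 2175 + 4092 = 16570.
totalBases := add(rhs(eq), eq in frequencies);

# Each count divided by the total, as a floating-point fraction.
relativeFrequencies := [seq(lhs(eq) = evalf(rhs(eq) / totalBases), eq in frequencies)];
```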

References

  

IUPAC code for incomplete nucleic acid specification, National Center for Biotechnology Information.

  

A One-Letter Notation for Amino Acid Sequences, International Union of Pure and Applied Chemistry.

See Also

Formats

Formats,FASTQ

Formats,GenBank