The aim of this Base Pair Directory is to compile structural information on nucleic acid base pairs.

This is work in progress. We start from the usual canonical and non-canonical base pairs with two or three hydrogen bonds and will eventually include more recently discovered unusual base pairs with only one standard hydrogen bond and additional C-H...O or C-H...N contacts, water-mediated pairs, and even base pairs with no standard hydrogen bond at all. Examples of these latter pairs include:

 Basic Information

Nucleic acids are polymers made up of repeated units, the nucleotides, each comprising three components: a base, a sugar (ribose in RNA, 2'-deoxyribose in DNA), and a phosphate group.

In a formal sense a nucleic acid strand is generated by forming C3'-O3' bonds between different nucleotides. This is, however, only a formal structural description. The chemical reaction is more complicated. The well-known double helix is obtained by connecting the two strands via hydrogen bonding between bases.

These images show a nucleic acid double helix structure in an ideal B conformation. Nucleic acids can, however, occur in different conformations. The bases are colored in the following manner: A - red, T - yellow, C - blue, G - green.

B-DNA side view 
B-DNA top view 
detailed view of a base pair

The bases correspond to the colored plates in the side view and are located inside in the top view.

Base pairing via hydrogen bonds as shown in the detailed view is of utmost importance for the structure of nucleic acids.

Note, however, that interactions within the sugar-phosphate backbone and base stacking are also relevant for nucleic acid structure.

The base pairs are formed from the two purine bases adenine (A) and guanine (G) and from the two pyrimidine bases cytosine (C) and uracil (U) or thymine (T).

- purine bases: adenine (A), guanine (G)
- pyrimidine bases: uracil (U), thymine (T), cytosine (C)

Uracil is used in RNA and thymine in DNA. The standard or canonical Watson-Crick base pairs are A-U(T) and G-C. More information on these base pairs can be found here.
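The classification described above can be written down as a minimal Python sketch. The names `PURINES`, `PYRIMIDINES`, `CANONICAL`, `base_class`, `pair_class` and `is_canonical` are hypothetical and chosen only for illustration; they are not part of this directory. The sketch encodes nothing beyond the purine/pyrimidine assignment and the canonical Watson-Crick pairs A-U(T) and G-C stated in the text.

```python
# Illustrative sketch only; all names here are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}

# Canonical Watson-Crick pairs, stored without regard to order.
CANONICAL = {frozenset("AU"), frozenset("AT"), frozenset("GC")}


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter base code."""
    if base in PURINES:
        return "purine"
    if base in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


def pair_class(b1: str, b2: str) -> str:
    """Return the pair class, e.g. 'purine-pyrimidine', independent of order."""
    return "-".join(sorted((base_class(b1), base_class(b2))))


def is_canonical(b1: str, b2: str) -> bool:
    """True for A-U, A-T and G-C; every other pair is non-canonical."""
    return frozenset((b1, b2)) in CANONICAL


print(pair_class("U", "A"), is_canonical("U", "A"))
print(pair_class("G", "U"), is_canonical("G", "U"))
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two bases are given.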

In addition, non-canonical base pairs have been found. These are also called mismatches. Many of them occur in RNA structures, so compilations often consider only uracil and not thymine.

 Canonical and non-canonical base pairs with at least two standard hydrogen bonds

There are various compilations of possible base pairs.

  1. I. Tinoco, Jr. In Appendix 1 of: The RNA World (R. F. Gesteland, J. F. Atkins, Eds.), Cold Spring Harbor Laboratory Press, 1993, pp. 603-607.
  2. G. Dirheimer, G. Keith, P. Dumas, E. Westhof. In: tRNAs Structure, Biosynthesis, and Function (D. Söll and U. RajBhandary, Eds.), American Society for Microbiology, Washington, 1995, pp. 93-126.
  3. G. A. Jeffrey, W. Saenger, Hydrogen Bonding in Biological Structures, Springer-Verlag, Berlin, 1991.
  4. The non-canonical base pair database (Fox lab)
  5. RNA Base Pair Isostericity (Leontis, Westhof)

Compilation 1 enumerates 28 base pairs with at least four H-bond heavy-atom donor/acceptor sites. Compilation 2 also includes examples with only three H-bond heavy-atom donor/acceptor sites and lists 38 base pair structures; on the other hand, it does not consider base pairs involving H-bonds with N3 of purines. The classification by Leontis and Westhof (5) provides newer and more comprehensive information.

A comprehensive compilation is presented below. The total number of possible base pairs with at least two standard H-bonds and four heavy-atom donor/acceptor sites is 32. Compared to the Tinoco compilation, four additional pairs are included (2 x GU, 1 x GG, 1 x GC). They were probably discarded there for steric reasons. However, a comprehensive search for all base pairs occurring in the currently known RNA structures has shown that this is not justified in all cases.

It is important to note that the compilations given above and below are based on simple structural rules. It cannot be excluded that a few base pairs listed do not correspond to an energy minimum. In addition, it should be kept in mind that in a nucleic acid structure stacking and backbone restraints may affect base pair geometries.

 All possible base pairs with at least two standard H-bonds

The numbers in parentheses give the number of possible base pair structures with four and three heavy-atom donor/acceptor sites, respectively (x marks counts that are not yet available).

purine-purine:            AA (3/0) | GG (5/2) | GA (4/2)             | (12/4) base pairs
pyrimidine-pyrimidine:    CC (2/2) | UU (3/0) | CU (2/0)             | ( 7/2) base pairs
purine-pyrimidine:        AC (2/2) | AU (4/0) | GC (3/4)  | GU (4/x) | (13/x) base pairs (not yet finalized)
                                                               total | (32/x) base pairs
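The totals in the table can be checked with a few lines of Python. This is only a bookkeeping sketch: the counts are copied from the first number in each pair of parentheses (four heavy-atom donor/acceptor sites), and the names `FOUR_SITE_COUNTS`, `class_totals`, `grand_total`, `expected` and `listed` are hypothetical. It also confirms that the ten pair types listed are exactly the unordered combinations of the four RNA bases.

```python
from itertools import combinations_with_replacement

# Counts of base pairs with four heavy-atom donor/acceptor sites,
# copied from the table (first number in each pair of parentheses).
FOUR_SITE_COUNTS = {
    "purine-purine": {"AA": 3, "GG": 5, "GA": 4},
    "pyrimidine-pyrimidine": {"CC": 2, "UU": 3, "CU": 2},
    "purine-pyrimidine": {"AC": 2, "AU": 4, "GC": 3, "GU": 4},
}

# Row totals and the grand total.
class_totals = {name: sum(c.values()) for name, c in FOUR_SITE_COUNTS.items()}
grand_total = sum(class_totals.values())

# Every unordered combination of the four RNA bases should be listed once.
expected = {frozenset(p) for p in combinations_with_replacement("AGCU", 2)}
listed = {frozenset(k) for c in FOUR_SITE_COUNTS.values() for k in c}

print(class_totals)
print(grand_total)
print(listed == expected)
```

The row totals come out as 12, 7 and 13, which sum to the 32 base pairs stated in the text.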

The backbone may impose steric restraints on base pairing. Therefore, the complete nucleotides are shown in the preceding tables. The backbone geometry corresponds to a standard A-RNA conformation. The base pair geometries were generated manually: the two bases lie approximately in a common plane, and the hydrogen bond H...O or H...N distances are approximately 2 Å. The structures shown are neither optimized nor experimental geometries.
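The distance criterion mentioned above can be illustrated with a small Python sketch. It assumes Cartesian atom coordinates in Å; the tolerance of 0.3 Å, the function name `is_near_target_distance` and the example coordinates are hypothetical and made up for illustration only. They are not taken from this directory or from any structure.

```python
import math

TARGET_DISTANCE = 2.0  # Å, the approximate H...O / H...N distance from the text
TOLERANCE = 0.3        # Å, hypothetical value chosen only for this sketch


def is_near_target_distance(h_atom, acceptor,
                            target=TARGET_DISTANCE, tol=TOLERANCE):
    """True if the H...acceptor distance lies within tol of target."""
    return abs(math.dist(h_atom, acceptor) - target) <= tol


# Made-up coordinates (Å), not taken from any real structure.
h_atom = (0.0, 0.0, 0.0)
acceptor = (1.2, 1.6, 0.0)

print(math.dist(h_atom, acceptor))
print(is_near_target_distance(h_atom, acceptor))
```

A check of this kind tests only the distance; it does not test the planarity of the two bases or the linearity of the hydrogen bond.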

Both the canonical and non-canonical base pairs mentioned above are formed from standard nucleotides/bases. Modified nucleotides/bases also occur. A few of them found in transfer RNA are shown here. A comprehensive compilation of modified nucleotides in RNA can be obtained from the RNA Modification Database.

Direct questions and criticism to Jürgen Sühnel.