QuickSearch Help

The primary resource for 3D structures of biological macromolecules is the Worldwide Protein Data Bank (wwPDB).
Comprehensive resources that may or may not be wwPDB members include JenaLib, MSD, OCA, PDBsum and RCSB/PDB, the five resources compared below.

All of these resources have a QuickSearch or a LiteSearch option. It is important to note that the results obtained may differ. This is mainly due to different search spaces or matching options and is illustrated by three searches performed on July 13, 2006.

Search for 'melanin':

JenaLib	6 hits	:	1F9B, 1IDP, 1OYO, 2B9L, 2STD, 3STD
MSD	26 hits	:	1A8R, 1A9C, 1AR0, 1DOH, 1DPT, 1F9B, 1G0N, 1G0O, 1GTP, 1IDP, 1JA9, 1OAA, 1OUN, 1OYO, 1STD, 1TVB, 1TVH, 1YBV, 2B9L, 2STD, 3STD, 4STD, 5STD, 6STD, 7STD
OCA	12 hits	:	1DOH, 1F9B, 1G0N, 1G0O, 1GTP, 1JA9, 1OYO, 1STD, 1YBV, 2B9L, 2STD, 3STD
PDBsum	6 hits	:	1F9B, 1IDP, 1OYO, 2B9L, 2STD, 3STD
RCSB/PDB	14 hits	:	1DPT, 1F9B, 1OYO, 1STD, 1TVB, 1TVH, 1YBV, 2B9L, 2STD, 3STD, 4STD, 5STD, 6STD, 7STD

Search for 'PYRR_BACSU':

JenaLib	2 hits	:	1A3C, 1A4X
MSD	2 hits	:	1A3C, 1A4X
OCA	2 hits	:	1A3C, 1A4X
PDBsum	0 hits	:
RCSB/PDB	0 hits	:

Search for 'genase':

JenaLib	1942 hits	:
MSD	2608 hits	:	(In this case the search term has to be '*genase'. Otherwise you will get no hits.)
OCA	2243 hits	:	(Again '*genase' has to be used for searching.)
PDBsum	1952 hits	:
RCSB/PDB	0 hits	:	('*genase' does not work in this case.)

The OCA and MSD searches are performed as text queries. In the first search, the PDB returns more hits than PDBsum and JenaLib because its search is based on mmCIF format files. These files also include information from other databases, for example UniProt keywords. On the other hand, the JenaLib/PDBsum searches are based on the original PDB format files. Note, however, that the JenaLib QuickSearch option also includes a mapping of PDB, UniProt and PROSITE codes in the search space. Further differences can occur if different PDB file records are taken into account. PDBsum and JenaLib report the records in which the search strings occur.
By far the largest number of hits is obtained from an MSD search. The reason is that MSD also searches the PubMed abstracts of the primary citation and of all secondary citations.

For the second search it is not clear why the PDB does not return any hits, because in other cases searching for UniProt IDs gives results. In MSD and PDBsum, searching for UniProt codes does not seem to be possible in the simple search versions.

In the third search there is a dramatic difference in the number of hits between PDBsum/JenaLib on the one hand and MSD/OCA/PDB on the other. The reason is that the latter require a complete word match, whereas in JenaLib/PDBsum a partial word match is sufficient. The difference disappears for both MSD and OCA if the wildcard character is used in the search string. This does not work, however, for RCSB/PDB. The larger number of MSD hits may again be due to the additional search in PubMed abstracts. Finally, note that PDBsum includes superseded entries.
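The difference between partial and complete word matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the titles are made up, the function names are hypothetical, and none of the resources is known to implement matching this way.

```python
import re

# Made-up titles standing in for entry text; not taken from any database.
titles = [
    "alcohol dehydrogenase",
    "catechol 1,2-dioxygenase",
    "lysozyme",
]

def partial_match(term, text):
    """Substring match, as described for JenaLib/PDBsum."""
    return term.lower() in text.lower()

def whole_word_match(term, text):
    """Complete-word match; '*' in the term acts as a wildcard (assumed syntax)."""
    pattern = re.escape(term.lower()).replace(r"\*", r"\w*")
    return re.search(r"\b" + pattern + r"\b", text.lower()) is not None

def count(match, term):
    """Number of titles that the given matcher accepts for the term."""
    return sum(1 for title in titles if match(term, title))

partial_hits = count(partial_match, "genase")        # substring search
word_hits = count(whole_word_match, "genase")        # complete word required
wildcard_hits = count(whole_word_match, "*genase")   # wildcard restores the hits
```

In this toy data set the substring search and the wildcard search find the same two titles, while the complete-word search for 'genase' finds none.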

So, the take-home message from these observations is that the best results can be obtained by using search options of different resources.
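Combining the hit lists of several resources amounts to simple set operations. The following sketch uses the PDB IDs listed above for the 'melanin' search; the variable names are illustrative only.

```python
# PDB IDs as listed above for the 'melanin' search (July 13, 2006).
# Only the IDs actually listed are used; the MSD row lists 25 IDs.
melanin_hits = {
    "JenaLib": "1F9B 1IDP 1OYO 2B9L 2STD 3STD",
    "MSD": ("1A8R 1A9C 1AR0 1DOH 1DPT 1F9B 1G0N 1G0O 1GTP 1IDP 1JA9 1OAA 1OUN "
            "1OYO 1STD 1TVB 1TVH 1YBV 2B9L 2STD 3STD 4STD 5STD 6STD 7STD"),
    "OCA": "1DOH 1F9B 1G0N 1G0O 1GTP 1JA9 1OYO 1STD 1YBV 2B9L 2STD 3STD",
    "PDBsum": "1F9B 1IDP 1OYO 2B9L 2STD 3STD",
    "RCSB/PDB": ("1DPT 1F9B 1OYO 1STD 1TVB 1TVH 1YBV 2B9L 2STD 3STD "
                 "4STD 5STD 6STD 7STD"),
}

# Turn each space-separated list into a set of IDs.
hit_sets = {name: set(ids.split()) for name, ids in melanin_hits.items()}

# Entries found by at least one resource, and by every resource.
found_by_any = set.union(*hit_sets.values())
found_by_all = set.intersection(*hit_sets.values())

print(len(found_by_any), "entries found by at least one resource")
print(sorted(found_by_all), "found by all resources")
```

Only a small core of entries is returned by every resource, which is why querying several of them gives the most complete picture.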

General Information

This QuickSearch option provides a simple search interface to the Jena Library of Biological Macromolecules (JenaLib).

PDB / NDB IDs and UniProt accession numbers are recognized automatically. In this case the search is performed only in the corresponding ID / accession number list and requires a complete match.
If a PDB / NDB ID was provided, the corresponding JenaLib atlas page will be shown directly. Otherwise a list of entries will be displayed.
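The automatic recognition step might look roughly like the following sketch. The regular expressions are assumptions (a four-character PDB ID starting with a digit, a six-character UniProt accession number), NDB IDs are not modeled, and `classify` is a hypothetical name; the actual JenaLib rules may differ.

```python
import re

# Assumed patterns, not taken from JenaLib itself.
PDB_ID = re.compile(r"[1-9][A-Z0-9]{3}")
UNIPROT_AC = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")

def classify(query):
    """Decide how a query string would be interpreted (illustrative only)."""
    q = query.strip().upper()  # assumption: recognition ignores case
    if PDB_ID.fullmatch(q):
        return "pdb_id"
    if UNIPROT_AC.fullmatch(q):
        return "uniprot_accession"
    return "search_terms"

print(classify("3CRO"))        # pdb_id
print(classify("P03036"))      # uniprot_accession
print(classify("RCRO_BP434"))  # search_terms
```

Under these assumed patterns a bare number such as 2005 is also recognized as a PDB ID, whereas the same string enclosed in double quotes is not.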

Any other string, including UniProt IDs (entry names) and PROSITE IDs and accession numbers, is interpreted as one or more 'search terms'. The separation of these terms is indicated by blanks. So, the string 'arabinose isomerase' will be separated into the two search terms 'arabinose' and 'isomerase'.

A phrase can be used as one search term by putting the complete string in double quotes. So, "arabinose isomerase" will be used, for example, as the single search term 'arabinose isomerase'. A search term must be at least three characters long. Within a phrase, individual words may be shorter than three characters, as in "factor h", for example. However, the total number of characters between the double quotes, including blanks, must be three or more. Double quotes can also be used to prevent the recognition of a string as a PDB / NDB code or accession number.

A hit is returned if all search terms are found in a particular entry. This corresponds to a search term combination by a logical AND, see below.
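The rules above (blank-separated terms, quoted phrases, a minimum length of three characters, and the AND combination) can be sketched as follows. The function names are hypothetical, matching is assumed to be a case-insensitive substring test, and operators such as '!=' are not handled.

```python
import re

# A token is either a double-quoted phrase or a run of non-blank characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_query(query):
    """Split a query string into lower-cased search terms."""
    terms = []
    for quoted, bare in TOKEN.findall(query):
        term = (quoted if quoted else bare).lower()
        if len(term) < 3:
            raise ValueError("search term too short: %r" % term)
        terms.append(term)
    return terms

def is_hit(entry_text, terms):
    """Logical AND: every term must occur somewhere in the entry text."""
    text = entry_text.lower()
    return all(term in text for term in terms)

print(parse_query('arabinose isomerase'))    # two terms
print(parse_query('"arabinose isomerase"'))  # one phrase
print(parse_query('"factor h"'))             # short word allowed inside a phrase
```

With this sketch, the unquoted query 'arabinose isomerase' also matches text in which the two words are not adjacent, whereas the quoted phrase requires them to appear together.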

Search Space

In the following description 'complete match' means that the complete database code must match a search term.
In contrast, 'partial match' means that only a part of a field like 'Structure Title' must match a search term.
Fields are the database elements that contain pieces of information from the PDB file or from other data sources, such as the TITLE or KEYWDS records of the PDB file.

The QuickSearch option queries:

Field	Match type	Example / PDB record
PDB IDs	complete match	example: 3CRO
NDB IDs	complete match	example: PDR001
UniProt codes: primary accession number	complete match	example: P03036
UniProt codes: secondary accession number	complete match	example: P25982
UniProt codes: ID / entry name	partial match	example: RCRO_BP434
PROSITE codes: accession number	complete match	example: PS01122
PROSITE codes: ID / entry name	partial match	example: CASPASE_CYS
Header	partial match	PDB record: HEADER
Structure Title	partial match	PDB record: TITLE
Keywords	partial match	PDB record: KEYWDS
Method	partial match	PDB record: EXPDTA
Hetero Component Name	partial match	PDB record: HETNAM, HET; only full name
Reference, including all sub-records such as auth, titl, ref, refn, ...	partial match	PDB record: JRNL; primary reference
Compound, including all sub-records such as molecule, synonym, ec, ...	partial match	PDB record: COMPND
Source, including all sub-records such as organism_scientific, organism_common, cellular_location, expression_system, cell_line, tissue, ...	partial match	PDB record: SOURCE

Only information contained in the original PDB format files and cross-references between PDB, UniProt and PROSITE codes are taken into account. Additional information from mmCIF format files is not used.
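The distinction between complete and partial match can be illustrated with a toy entry. Everything in this sketch is an assumption made for illustration: the field names, the grouping of the example codes into one entry, the placeholder title, and the case-insensitive comparison. It also ignores the automatic ID recognition described under General Information.

```python
# Fields for which the whole code must equal the search term.
COMPLETE_MATCH_FIELDS = {
    "pdb_id", "ndb_id", "uniprot_accession", "prosite_accession",
}

# Toy entry built from the example codes above; the title is a placeholder.
entry = {
    "pdb_id": "3CRO",
    "uniprot_accession": "P03036",
    "uniprot_id": "RCRO_BP434",
    "title": "placeholder title for a cro protein dna complex",
}

def field_matches(field, value, term):
    """Complete match for code fields, partial (substring) match otherwise."""
    value, term = value.lower(), term.lower()
    if field in COMPLETE_MATCH_FIELDS:
        return value == term
    return term in value

def entry_matches(entry, term):
    """An entry matches a term if any of its fields matches."""
    return any(field_matches(f, v, term) for f, v in entry.items())

print(entry_matches(entry, "3CRO"))    # complete PDB ID
print(entry_matches(entry, "_BP434"))  # part of the UniProt entry name
print(entry_matches(entry, "P0303"))   # truncated accession number
```

A truncated accession number does not match because accession numbers require a complete match, while a fragment of the UniProt entry name is sufficient.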

More information on the 'PDB Format' can be obtained from the Protein Data Bank Contents Guide.

How to Create a Search Query

Output

Example Queries

In the following examples double quotes ("....") but NOT single quotes ('....') are part of the search string.
The query strings are linked to the corresponding QuickSearch query.

Query	Description
_HUMAN	Search for all human proteins in the PDB that are cross referenced to a UniProt entry.
DPOLB_	Search for all DNA polymerase beta proteins in the PDB that are cross referenced to a UniProt entry.
CASPASE_	Search for all entries with the associated PROSITE IDs (entry names) CASPASE_CYS, CASPASE_HIS, CASPASE_P10, CASPASE_P20.
"refn: astm psfgey"	Search for all entries that have been published in the journal Proteins. refn is a group of fields that contains encoded references to the citation, such as the ASTM (American Society for Testing and Materials) code or the ISSN and ISBN numbers. 'psfgey' is the ASTM code for the journal Proteins. This search mode is especially useful if a journal has changed its name or if the journal name is rather unspecific (as in this case). To get the ASTM code, first search for the journal name and include the search string refn in the query. The search results show the ASTM code, which can then be used for a more specific search.
"solid state nmr"	Search for all entries that match exactly the phrase 'solid state NMR'.
caspase nmr	Search for all NMR structures of the protein caspase.
haloarcula marismortui	Search for all structures from the archaeum (archaebacterium) Haloarcula marismortui.
a.rich	Search for all structures with A. Rich as an author. Note that the search string 'rich' returns many other hits containing, for example, the phrase 'leucine rich proteins', as well as all Escherichia coli proteins.
"tb structural genomics consortium"	Search for all structures deposited by the TB Structural Genomics Consortium.
jcsg !=nmr	Search for all non-NMR structures deposited by the Joint Center for Structural Genomics (JCSG).
", rsgi"	Search for all structures deposited by the Riken Structural Genomics/Proteomics Initiative (RSGI). A search for 'rsgi' alone would, for example, also yield entries authored by R.W. Pickersgill. Note also that a search for other author affiliations makes no sense, because this information is not included in the corresponding PDB record. The names of Structural Genomics Centers/Initiatives are an exception because they are used as author names.
gene:	Search for all entries which contain a sub-record ending with 'gene' (gene, organism_gene, expression system_gene).
"gene: thsa"	Search for all structures for which the thsA gene is indicated in the sub-records ending with 'gene' (gene, organism_gene, expression system_gene).
"cellular_location: cytoplasm"	Search for all structures for which cytoplasm is indicated in the sub-records 'cellular_location' or 'expression_system_cellular_location'.
renin	Search for all entries that contain the string 'renin'. The hits include both entries with 'renin' and entries with other 'renin'-containing strings such as 'prorenine' or 'kynurenine'.
" renin "	Returns 'renin' hits only.
moglobin	Search for all hemoglobin entries. The PDB uses the two different spellings 'hemoglobin' and 'haemoglobin'. Searching for 'moglobin' identifies both of them.
"organism_common: fungi"	Search for all fungi proteins. Searching for 'fungi' alone would also return hits where fungi is part of a longer string such as in fungicide or sinefungin.
5.3.1.5	Search for the enzyme classification number 5.3.1.5 (xylose isomerase). Note that the PDB files contain slightly different strings related to enzyme classification, e.g.: 'E.C.5.3.1.5', 'E.C. 5.3.1.5' and 'EC: 5.3.1.5'. Also, be careful in interpreting the results. For example, a search for '1.1.1.1' returns not only the '1.1.1.1' hits but also entries with '1.1.1.158' or '1.1.1.1.47'.
"to be published"	Search for all entries with the reference information 'to be published'. On May 3, 2006 this query returned 7052 hits.
"2005"	Search for all occurrences of '2005'. These occurrences may include the year of publication, page numbers, a specific CCDC/PDB confirmation code in the refn sub-record and possibly further cases, for example the occurrence of '2005' in large page numbers. So, this search returns structures published in 2005 but also further entries. Getting only 2005 structures is not possible with the QuickSearch option. In any case you have to use double quotes. Otherwise, the search string is interpreted as a PDB ID.
" 1996"	Search for all occurrences of ' 1996' (note the leading blank). The leading blank prevents hits where '1996' is part of a larger string, but it does not exclude cases where 1996 is a page number. As 1996 is not used as a code in the refn sub-record, and 1996 apparently never occurs as a page number (as of June 2006), this search very likely returns all structures with 1996 primary citations. Note that the number of these hits is not identical to the number of entries released in 1996. The latter also includes entries with 'to be published' references and, possibly, entries that were already published before 1996.
" 1973"	Search for all occurrences of ' 1973'. In addition to 1973 structures you will also get other entries, for example, the one with the line 'fragment: nonstructural protein ns5a (p56)(residues 1973- 2003 of swiss-prot sequence p27958)'. So, for a more reliable and specific search in references you have to use the (upcoming) AdvancedSearch option.

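Several of the examples above rely on the fact that a partial match is a plain substring test. The following sketch reproduces two of the pitfalls with made-up entry texts; the `hits` helper is hypothetical, and matching is assumed to be case-insensitive.

```python
def hits(term, texts):
    """Return the texts that contain the term as a substring (case-insensitive)."""
    term = term.lower()
    return [text for text in texts if term in text.lower()]

# Made-up entry texts, not taken from the PDB.
texts = [
    "human renin complexed with inhibitor",
    "kynurenine aminotransferase",
    "alcohol dehydrogenase, ec: 1.1.1.1",
    "reductase, e.c.1.1.1.158",
]

renin_any = hits("renin", texts)     # also finds 'kynurenine'
renin_word = hits(" renin ", texts)  # surrounding blanks exclude 'kynurenine'
ec_hits = hits("1.1.1.1", texts)     # also finds '1.1.1.158'

print(len(renin_any), len(renin_word), len(ec_hits))
```

Note that in this simplified sketch a term with surrounding blanks would miss 'renin' at the very beginning or end of a text.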