The primary resource for 3D structures of biological macromolecules
is the Worldwide Protein Data Bank (wwPDB).
There are also comprehensive resources that may or may not be wwPDB members. All of these resources have a QuickSearch or LiteSearch option. It is important to note that the results obtained may differ between resources. This is mainly due to different search spaces or matching options, and is illustrated by three searches performed on July 13, 2006.
Search for 'melanin':
Search for 'PYRR_BACSU':
Search for 'genase':
The OCA and MSD searches are done as a text query. The larger number of hits found by the first PDB search, compared to the PDBsum and JenaLib results, is due to the fact that this search is based on mmCIF format files.
These files also include information from other databases, such as UniProt keywords.
On the other hand, the JenaLib/PDBsum searches are based on the original PDB format files.
Note, however, that in the JenaLib QuickSearch option a mapping of PDB, UniProt and PROSITE codes is included
in the search space.
Further differences can occur if different PDB file records are taken into account.
PDBsum and JenaLib report in which records the search strings occur.
For the second search it is not clear why the PDB does not return any hits, because in other cases searching for UniProt IDs gives results. In MSD and PDBsum, searching for UniProt codes does not seem to be possible in the simple search versions. In the third search there is a dramatic difference in the number of hits between PDBsum/JenaLib on the one side and MSD/OCA/PDB on the other. The reason is that the latter group requires a complete word match, whereas in JenaLib/PDBsum a partial word match is sufficient. The difference disappears for both MSD and OCA if the wildcard sign is used in the search string. This does not work, however, in the RCSB/PDB case. The larger number of MSD hits may again be due to an additional search in PubMed abstracts. Finally, note that PDBsum includes superseded entries. So, the take-home message from these observations is that the best results can be obtained by using the search options of different resources.
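The effect of complete-word versus partial-word matching on a query like 'genase' can be illustrated with a minimal Python sketch. The function names and the sample title are hypothetical and are not taken from any of the resources discussed; their actual matching rules and wildcard syntax may differ in detail.

```python
import re

def whole_word_match(term, text):
    """True if the term occurs as a complete word (case-insensitive)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

def partial_match(term, text):
    """True if the term occurs anywhere, even inside a longer word."""
    return term.lower() in text.lower()

def wildcard_match(pattern, text):
    """Complete-word match in which '*' stands for any run of word characters.

    The '*' wildcard sign is an assumption made for this sketch.
    """
    regex = r"\b" + re.escape(pattern).replace(r"\*", r"\w*") + r"\b"
    return re.search(regex, text, re.IGNORECASE) is not None

# Hypothetical structure title used only for illustration.
title = "Crystal structure of alcohol dehydrogenase"

print(whole_word_match("genase", title))
print(partial_match("genase", title))
print(wildcard_match("*genase", title))
```

In this sketch the complete-word search misses the title, while the partial match and the wildcard search both find it, which mirrors the difference in hit numbers described above.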
This QuickSearch option provides a simple search interface to the Jena Library of Biological Macromolecules (JenaLib).
PDB / NDB IDs and UniProt accession numbers
are recognized automatically. In this case the search is performed only in the corresponding ID / accession number list and requires a complete match.
Any other string, including UniProt IDs (entry names) and PROSITE IDs and accession numbers, is interpreted as one or more 'search terms'. Search terms are separated by blanks. So, the string 'arabinose isomerase' will be separated into the two search terms 'arabinose' and 'isomerase'. A phrase can be used as one search term by putting the complete string in double quotes. So, "arabinose isomerase" will be used, for example, as the single search term 'arabinose isomerase'. A search term must be at least three characters long. Within a phrase, individual character strings may be shorter than three characters, as in "factor h", for example. However, the total number of characters enclosed by the double quotes, including blanks, has to be three or larger. Double quotes can also be used to prevent the recognition of a string as a PDB / NDB code or accession number. A hit is returned if all search terms are found in a particular entry. This corresponds to combining the search terms by a logical AND, see below.
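The splitting rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual JenaLib implementation: the names `parse_query` and `matches_all` are hypothetical, case-insensitive matching is assumed, and the automatic recognition of PDB / NDB IDs and UniProt accession numbers is left out.

```python
import re

MIN_TERM_LENGTH = 3  # minimum length of a search term, as stated in the text

def parse_query(query):
    """Split a query into search terms; double-quoted phrases stay together."""
    terms = []
    # Each match is either a quoted phrase (first group) or a blank-free token (second group).
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query):
        term = phrase if phrase else word
        if len(term) < MIN_TERM_LENGTH:
            raise ValueError(f"search term too short: {term!r}")
        terms.append(term)
    return terms

def matches_all(terms, entry_text):
    """Logical AND: every term must occur somewhere in the entry text.

    Case-insensitive partial matching is an assumption of this sketch.
    """
    lowered = entry_text.lower()
    return all(term.lower() in lowered for term in terms)

print(parse_query('arabinose isomerase'))
print(parse_query('"arabinose isomerase"'))
print(parse_query('"factor h"'))
```

Note that the length check applies to the whole phrase, so "factor h" passes even though 'h' alone would be rejected as a search term.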
In the following description 'complete match' means that the complete database code must match a search term.
In contrast, 'partial match' means that only a part of a field like 'Structure Title' must match a search term. Fields are the database elements that contain parts of information from the PDB file or from other data sources, such as the TITLE or KEYWDS records of the PDB file. The QuickSearch option queries:
Only PDB information contained in the original PDB format files and cross-references between PDB, UniProt and PROSITE codes are taken into account. Additional information from mmCIF format files is not used. More information on the 'PDB Format' can be obtained from the Protein Data Bank Contents Guide.
The search returns either an atlas page or an entry list.
In the latter case all search fields with occurrences of at least one of the search terms are displayed
and the search terms are highlighted.
It is also possible to generate code lists with user-selected separators such as new line, comma, semicolon, blank, tab. Example output: 1 of 38719 entries match the query
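How an entry list with highlighted search terms and a code list with a chosen separator might be produced is sketched below. All names, the bracket markers used for highlighting, and the sample codes are hypothetical placeholders; the real QuickSearch output is an HTML page and is generated differently.

```python
import re

# Separator names follow the list in the text; the dictionary keys are illustrative.
SEPARATORS = {"new line": "\n", "comma": ",", "semicolon": ";", "blank": " ", "tab": "\t"}

def highlight(text, terms, left="[", right="]"):
    """Wrap every occurrence of a search term in marker strings."""
    if not terms:
        return text
    # Longer terms come first so that overlapping terms prefer the longest match.
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: left + m.group(0) + right, text, flags=re.IGNORECASE)

def fields_with_hits(entry, terms):
    """Return only the fields that contain at least one search term, highlighted."""
    hits = {}
    for name, text in entry.items():
        if any(t.lower() in text.lower() for t in terms):
            hits[name] = highlight(text, terms)
    return hits

def code_list(codes, separator="comma"):
    """Join entry codes with one of the user-selectable separators."""
    return SEPARATORS[separator].join(codes)

# Placeholder entry and codes, not real database content.
entry = {"TITLE": "L-arabinose isomerase", "KEYWDS": "sugar metabolism"}
print(fields_with_hits(entry, ["isomerase"]))
print(code_list(["1abc", "2xyz"], "semicolon"))
```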
In the following examples double quotes ("....") but NOT single quotes ('....') are part of the search string. The query strings are linked to the corresponding QuickSearch query.
Certainly, an AdvancedSearch option can query the database in a more versatile and specific manner
than a simple QuickSearch. Currently, we are working on a new AdvancedSearch option.
An old version
is still available, however. It is required, for example, if one wants to search for
SCOP, SMART, Pfam or Gene Ontology terms.
Our experience is, however, that a large fraction of database queries can be conducted in a satisfying manner by QuickSearch. Another advantage of the QuickSearch option is that it can identify entries in which terms from upcoming developments occur in reference titles and keywords, for example, but have not yet made it into the more formalized PDB records. One example is 'solid state NMR', which does not yet appear in the Methods record.