The PRINTS Fingerprint Database



CONTENTS

SMITE

1.0 Introduction

1.1 Smite Tutorial
    a) How to examine a PRINTS entry
    b) Simple queries
    c) Complex queries
    d) Multiple parameters
    e) Very complex queries
    f) Other display commands
    g) Other commands which can accept a query
    h) How LISTS make life easy
    i) NEGWORK
    j) How can I see what entries are in the PRINTS database?
    k) EXTRACT
    l) Shortcuts

1.2 Summary of SMITE commands, qualifiers, functions and operators
    a) Commands
    b) Functions
    c) Operators
    d) Qualifiers

2.0 References

2.1 Applications


SMITE

1.0 Introduction

SMITE is a query language for the PRINTS database. It uses the same
general syntax as DELPHOS, the query language for the OWL database. The
program allows you to examine the database and also to extract motif sets
in ADSP format. This brief guide first provides a tutorial on the use of
SMITE followed by command descriptions. It assumes you're using the
example PRINTS database.

1.1 Smite Tutorial

a) How to examine a PRINTS entry

The description of the PRINTS database format has shown that every entry
has a unique identifier code. This section shows how to examine
individual entries. The SMITE command `DISPLAY' is used to query the
PRINTS database. Type

    SMITE> display code "lyslact"

You'll see the following information displayed on your screen.

    WORKLIST ENTRIES (1):
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE

The SMITE command translates to `display the PRINTS database entry with
the identifier code lyslact'. DISPLAY is the command, CODE is a function.
The CODE function allows you to select database entries for examination.
The default action of the DISPLAY command is just to provide you with a
very brief description of what the fingerprint represents (in this case a
lysozyme/lactalbumin fingerprint entry).

What you've typed is a `query'. SMITE stores the entry codes which match
your query in a list called the WORKLIST. After the above query it shows
that there is one entry in the worklist, namely LYSLACT.
That is as expected, since you asked SMITE for a unique entry. You can
redisplay the contents of the worklist at any time by typing

    SMITE> display

Try it now. You should get the same results as for your original query.

There is obviously more information in a PRINTS database entry than just
its title. To get more information from the DISPLAY command you use
`qualifiers'. These qualifiers are listed in Appendix B. To see what
qualifiers are available type

    SMITE> help

As an example of their use type

    SMITE> display/brief code "lyslact"

This will give you the following output on your screen:

    WORKLIST ENTRIES (1):
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    Type of feature: COMPOSITE with 6 elements
    Prosite code: PS00128 LACTALBUMIN_LYSOZYME; PATTERN
    Created by D.N.PERKINS, 29-MAY-1991 (UPDATE M.E.BECK, 5-APR-1993)

    1. SHEWALE, J.G., SUDHIR, K.S. and BREW, K.
       Evolution of alpha-lactalbumins.
       J.BIOL.CHEM. 259 4947-4956 (1984).
    2. IRWIN, D.M. and WILSON, A.C.
       Multiple cDNA sequences and the evolution of bovine stomach
       lysozyme.
       J.BIOL.CHEM. 264 11387-11393 (1989).
    3. STUART, D.I., ACHYARA, K.R., WALKER, N.P.C., SMITH, S.G., LEWIS M.
       and PHILLIPS D.C.
       Alpha-lactalbumin possesses a novel calcium binding loop.
       NATURE 324 84-87 (1986).
    4. NITTA, K., HIDEAKI, H., SHINTARO, S. and SHIMAZAKI, K.
       The calcium binding property of equine lysozyme.
       FEBS LETTERS 223 405-408 (1987).

    Lysozyme C and alpha-lactalbumin and are similar both in terms of
    primary sequence and structure, and probably evolved from a common
    ancestral protein. There is, however, no similarity in function as
    lactalbumin promotes the conversion of galactosyltransferase to
    lactose synthase and is essential for milk production [1], while
    lysozyme catalyses the hydrolysis of bacterial cell wall
    polysaccharides; it has also been recruited for a digestive role in
    certain ruminants and colobine monkeys [2].

    Another significant difference between the 2 enzymes is that all
    lactalbumins have the ability to bind calcium [3], while this
    property is restricted to only a few lysozymes [4]. The binding site
    was deduced using high resolution X-ray structure analysis and was
    shown to consist of 3 aspartic acid residues. It was first suggested
    that the calcium bound to lactalbumin stabilised the structure, but
    recently it has been claimed that calcium controls the release of
    lactalbumin from the golgi membrane and that the pattern of ion
    binding may also affect the catalytic properties of the lactose
    synthetase complex.

    LYSLACT is a 6-element fingerprint that provides a signature for the
    lysozyme/alpha-lactalbumin superfamily. The fingerprint was derived
    from an initial alignment of 12 sequences: motif 5 encodes the
    calcium binding region, and together with motif 4 contains 3 of the
    8 cysteine residues that are conserved in both lysozymes and
    lactalbumins (cf. PROSITE pattern LACTALBUMIN_LYSOZYME (PS00128)).
    Two iterations on OWL10.1 were required to reach convergence, at
    which point a true set comprising 81 sequences was identified (cf.
    signatures LYSOZYME and LACTALBUMIN). An update on OWL19.1
    identified a true set containing 98 sequences, together with a
    number of partial matches, all of which are fragments.

    SUMMARY INFORMATION
    98 codes involving 6 elements
     0 codes involving 5 elements
     1 codes involving 4 elements
     2 codes involving 3 elements
     7 codes involving 2 elements

    COMPOSITE FINGERPRINT INDEX
     6|  98  98  98  98  98  98
     5|   0   0   0   0   0   0
     4|   0   0   1   1   1   1
     3|   0   0   2   2   2   0
     2|   6   6   1   0   0   1
    --+-------------------------------
      |   1   2   3   4   5   6

Qualifiers are only used with the DISPLAY, ISHOW and FSHOW commands. They
must always immediately follow the command, i.e. they must appear before
any `functions' such as CODE. Now that LYSLACT is in the worklist you can
redisplay these results by typing

    SMITE> display/brief

Try it.
The /BRIEF qualifier gives you the type of fingerprint (simple or
composite), the PROSITE code (if any), the author and creation date of
the fingerprint, bibliographic references, comments, summary information
and the composite fingerprint index. These could have been selected
individually by using the /TYPE, /AUTHOR, /REFERENCE, /COMMENT, /SUMMARY
and /CFI qualifiers. Assuming you still have LYSLACT in your worklist,
try typing

    SMITE> display/author
    SMITE> display/type

and so on. You should get selected extracts from the information you
obtained by typing /BRIEF. You can also combine qualifiers; try typing,
for example,

    SMITE> display/type/comment code "lyslact"

or just

    SMITE> display/type/comment

(if lyslact is already in the worklist). Try several combinations.

Some information is not displayed by /BRIEF, notably the scan history,
protein codes and titles for motif sets (true/false positives, true/false
negatives, subfamily positives and negatives), initial motif sets and
final motif sets. To get the full information about a fingerprint type

    SMITE> display/full code "lyslact"

(or just

    SMITE> display/full

if lyslact is in the worklist). As usual the history, pcode information,
initial and final motif sets can be displayed individually by typing

    SMITE> display/history code "lyslact"
    SMITE> display/title
    SMITE> display/imotif
    SMITE> display/fmotif

Try them.

Three qualifiers shown by `SMITE> help' have not yet been mentioned.
These are /INFO, /OUTPUT=fn and /PRINTER. The /INFO qualifier will be
described later. The /OUTPUT and /PRINTER qualifiers allow you to send
SMITE output to somewhere other than the screen. This is very useful for
getting hardcopy. Try typing

    SMITE> display/brief/output=LYS.BRIEF code "lyslact"

Leave SMITE by typing

    SMITE> quit

and examine the file LYS.BRIEF that has been produced. It should contain
all the information that would otherwise be sent to the screen.
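The rule that qualifiers sit directly after the command and before any
function can be pictured with a short Python sketch. It is only an
illustration of the syntax described above, not SMITE's own parser: the
name split_command is hypothetical, and the sketch assumes that the
command and its qualifiers contain no spaces.

```python
def split_command(line):
    """Split a SMITE-style line into (command, qualifiers, query).

    Hypothetical helper for illustration only. Qualifiers are returned
    as a dict mapping the upper-cased qualifier name to its value, or to
    None when the qualifier takes no value.
    """
    # Everything up to the first space is the command plus its qualifiers.
    head, _, query = line.strip().partition(" ")
    command, *raw_qualifiers = head.split("/")
    qualifiers = {}
    for raw in raw_qualifiers:
        name, _, value = raw.partition("=")
        qualifiers[name.upper()] = value or None
    return command.upper(), qualifiers, query.strip()


print(split_command('display/brief/output=LYS.BRIEF code "lyslact"'))
print(split_command("display/type/comment"))
```

The first call separates the DISPLAY command, the BRIEF and OUTPUT
qualifiers (the latter carrying the file name LYS.BRIEF) and the query
code "lyslact"; the second shows a redisplay with two qualifiers and an
empty query.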
The /PRINTER option is available on some systems and will send SMITE
output directly to the default system printer at your site.

Note that the DISPLAY command, with or without qualifiers, shows
everything in the worklist. If you have more than one entry in the
worklist (see later) and you use /BRIEF (for example) you'll get brief
information on all the entries in the worklist.

This subsection has covered how you can display entries in the PRINTS
database; subsequent subsections show how you can query the database
using SMITE.

b) Simple queries

In this part of the tutorial we'll again restrict the examples to the
DISPLAY command. SMITE contains many functions other than CODE. You can
query the database on the basis of general text, pcode text, pcodes,
sequence and the number of elements in a fingerprint by using these
functions.

Let's start off with a text query. You want to know which entries in the
PRINTS database mention `calcium'. To find out type

    SMITE> display text "calcium"

and you'll get the following representative output:

    WORKLIST ENTRIES (3):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    CANODO NODO CALCIUM BINDING SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE

The default output, as usual, just gives the title lines for the matching
fingerprints. Only one of them (CANODO) has the word `calcium' in the
title line; in the others the word could be in the PROSITE name, author,
bibliography or comment fields. The TEXT function looks at all these
fields. Try typing

    SMITE> display/brief text "calcium"

(or just `display/brief' if you've just performed the TEXT query) and
spot where `calcium' occurs in the descriptions for AAMYLASE and LYSLACT.

NB: Like DELPHOS, all text queries use the idea of FREE TEXT SEARCHING.
Only the numerals 0-9 and letters A-Z (case insensitive) are significant.
All query probes must be at least 3 letters long.
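The matching rule in the NB above can be sketched in a few lines of
Python. This is only an illustration of the rule as stated, not SMITE's
implementation; the names normalise and free_text_match are
hypothetical.

```python
import re

MIN_PROBE_LENGTH = 3  # the guide requires probes of at least 3 letters


def normalise(text):
    """Upper-case the text and keep only the letters A-Z and digits 0-9."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def free_text_match(probe, field):
    """Return True if the normalised probe occurs anywhere in the field."""
    key = normalise(probe)
    if len(key) < MIN_PROBE_LENGTH:
        raise ValueError("probe needs at least 3 letters or digits")
    return key in normalise(field)


title = "ALPHA-AMYLASE FAMILY SIGNATURE"
print(free_text_match("pha-amyl", title))       # True
print(free_text_match("alpha amylase", title))  # True
print(free_text_match("alciu", title))          # False
```

Because spaces and punctuation are removed from both the probe and the
searched text before comparison, a partial word or a differently
punctuated spelling still matches.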
Free text searching means that you don't have to use complete words in
your query, for example

    SMITE> display/brief text "alciu"

would be a valid query if looking for the occurrence of the word calcium.
Also, because all punctuation (INCLUDING SPACES) is ignored you can use,
for example, the two equivalent queries

    SMITE> display/brief text "alphaamylase"
    SMITE> display/brief text "phaamyl"

to detect both `ALPHA-AMYLASE' and `ALPHA AMYLASE' occurrences. The
queries

    SMITE> display/brief text "alpha-amylase"
    SMITE> display/brief text "pha-amyl"

have EXACTLY the same effect as either of the previous queries as all
punctuation is removed from your query before the search is started. This
approach has numerous advantages when you consider the lack of linguistic
standardisation in molecular biology nomenclature. Try them all.

As the scope of the TEXT function is ALL the general text it is a very
versatile function. For example, to find the database entry corresponding
to a PROSITE code you just have to type e.g.

    SMITE> display text "ps00128"

The PTEXT function has precisely the same use as the TEXT function but,
whereas TEXT looks at the general text, PTEXT looks at the text in the
title lines of the pcodes. To see the difference type

    SMITE> display text "prothrombin"

and

    SMITE> display ptext "prothrombin"

The TEXT query doesn't find anything but the PTEXT query gives the
following output:

    WORKLIST ENTRIES (1):
    KRINGLE KRINGLE DOMAIN SIGNATURE

This shows that at least one of the pcode titles in the KRINGLE
fingerprint contains the word `prothrombin'. Again, only the default
fingerprint title is given by this query. To get a list of the pcode
titles you could have typed

    SMITE> display/title ptext "prothrombin"

however, the word could be a little difficult to spot, especially if
there are a lot of pcodes containing the search string. Because of this
problem the /INFO qualifier is provided.
This qualifier causes SMITE to display each matching hit (in this case a
pcode plus title) as it finds it. The /INFO qualifier is only active
while the query is being performed; it cannot be used for redisplay of
the worklist. Type

    SMITE> display/info ptext "prothrombin"

and you'll get the following output:

    Matches for PTEXT probe PROTHROMBIN are:
    KRINGLE THRB_BOVIN PROTHROMBIN PRECURSOR (EC 3.4.21.5). - BOS TAURUS (BOVINE).
    KRINGLE BOVTHBNM BOVTHBNM preprothrombin - Bos taurus
    KRINGLE THRB_HUMAN PROTHROMBIN PRECURSOR (EC 3.4.21.5) (COAGULATION FACTOR II) - HO
    KRINGLE THRB_MOUSE PROTHROMBIN PRECURSOR (EC 3.4.21.5). - MUS MUSCULUS (MOUSE).
    KRINGLE THRB_RAT PROTHROMBIN PRECURSOR (EC 3.4.21.5). - RATTUS NORVEGICUS (RAT).
    WORKLIST ENTRIES (1):
    KRINGLE KRINGLE DOMAINS

This shows, for each occurrence, the database codename, the pcode and the
pcode title line.

The PCODE function allows you to find which database entries contain a
particular pcode. Type

    SMITE> display pcode "5pti"

and you'll get

    WORKLIST ENTRIES (1):
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE

If you use the /INFO qualifier you'll get the pcode title line as well.
Type

    SMITE> display/info pcode "5pti"

and you'll get

    5PTI Trypsin Inhibitor (Crystal Form II) - Bovine (Bos Taurus) Pan
    Matches for PCODE probe 5PTI are:
    No. of matches = 1
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
    WORKLIST ENTRIES (1):
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE

SMITE also allows you to search the final motif sets on the basis of
sequence information. This is done using the SEQ function. Type

    SMITE> display seq "iwg"

to get

    WORKLIST ENTRIES (2):
    DAGPE DIACYLGLYCEROL/PHORBOL ESTER BINDING SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE

This shows that two fingerprints contain at least one occurrence of the
peptide ile-trp-gly. Use the /INFO qualifier to see the sequence in
context by typing

    SMITE> display/info seq "iwg"

and you'll get
    Matches for SEQ probe IWG are:
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC1_RABIT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC1_RAT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC2_BOVIN 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC2_HUMAN 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC2_RABIT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPC2_RAT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG RATPKCB1 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG RATPKCII 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG A37237 55 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCA_HUMAN 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCA_MOUSE 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCA_RABIT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCA_RAT 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG MMUV25PKC 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCG_BOVIN 34 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCG_HUMAN 49 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCG_RABIT 49 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCG_RAT 49 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG B37237 47 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCTDF IWG KPCA_BOVIN 50 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCKDF IWG KPC1_DROME 59 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSLCRDF IWG APLPKCB 190 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCREF IWG KPC3_DROME 85 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCRDF IWG KPCE_MOUSE 183 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCRDF IWG KPCE_RABIT 183 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCRDF IWG KPCE_RAT 183 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCREF IWG KPCL_MOUSE 185 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCRDF IWG HSPKCE 183 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCREF IWG RNPKCETA 185 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CGHCKDF IWG KPC2_DROME 85 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CSHCREF IWG HUMPKCL 184 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CGQCSER IWG KPCZ_RAT 144 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CGQCSER IWG S25605 136 1
    DAGPE2 DAG/PE element II - 5 Length = 10 CGQCSER IWG MUSPROKINC 144 1
    GPCRRHOD4 GPCR transmembrane motif IV - 18 Length = 22 LVKFICLS IWG LSLLLALPVLL IL8B_HUMAN 155 12
    GPCRRHOD4 GPCR transmembrane motif IV - 18 Length = 22 WAKLYSLV IWG CTLLLSSPMLV BRB2_HUMAN 145 13
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ACFAKSAACYNPIVYGISHPKYG OPS1_CALVI 308 12
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG SVFAKANSCYNPIVYGISHPRYK CRBOPLE 309 13
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG SVFAKANSCYNPIVYGISHPRYK CRBOPM 309 13
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ACFAKSAACYNPIVYGISHPKYR OPS1_DROPS 311 12
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ACFAKSAACYNPIVYGISHPKYR OPS1_DROME 310 12
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ATFAKTSAVYNPIVYGISHPNDR OPS2_DROPS 317 12
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ATFAKTSAVYNPIVYGISHPNDR DPRH2OP 317 12
    GPCRRHOD7 GPCR transmembrane motif VII - 18 Length = 27 T IWG ATFAKTSAVYNPIVYGISHPKYR OPS2_DROME 317 12

The final function is the ELEMENT function. This allows you to select
database entries on the basis of how many elements make up the
fingerprint. Type

    SMITE> display element "4"

and you'll get

    WORKLIST ENTRIES (4):
    CANODO NODO CALCIUM BINDING SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE

These are all the entries whose fingerprints contain 4 elements. To see
this you could type

    SMITE> display/type element "4"

The ELEMENT function also allows you to specify greater-than or less-than
parameters.
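The three forms of ELEMENT parameter (an exact number, ">n" and "<n")
can be thought of as three different tests on the element count. The
Python sketch below illustrates that idea only; the function names are
hypothetical, and the small table holds just the element counts that
this guide states for four of the example entries.

```python
import operator


def element_test(probe):
    """Turn an ELEMENT-style probe such as "4", ">4" or "<5" into a test."""
    probe = probe.strip()
    if probe.startswith(">"):
        compare, number = operator.gt, probe[1:]
    elif probe.startswith("<"):
        compare, number = operator.lt, probe[1:]
    else:
        compare, number = operator.eq, probe
    limit = int(number)
    return lambda count: compare(count, limit)


# Element counts given in this guide for four of the example entries.
ELEMENTS = {"LYSLACT": 6, "AAMYLASE": 5, "KRINGLE": 4, "CANODO": 4}


def select(probe):
    """Return the codes whose element count satisfies the probe."""
    wanted = element_test(probe)
    return sorted(code for code, count in ELEMENTS.items() if wanted(count))


print(select("4"))   # ['CANODO', 'KRINGLE']
print(select(">4"))  # ['AAMYLASE', 'LYSLACT']
print(select("<5"))  # ['CANODO', 'KRINGLE']
```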
Type

    SMITE> display element ">4"

to get

    WORKLIST ENTRIES (4):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

These are all the database entries whose fingerprints are made up of more
than 4 elements. Similarly type

    SMITE> display element "<5"

to get

    WORKLIST ENTRIES (6):
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
    CANODO NODO CALCIUM BINDING SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE

These are all the database entries whose fingerprints are made up of
fewer than 5 elements.

The SMITE functions can all be preceded by the word NOT. This negates the
result of the function, so the worklist receives every entry that the
function would not have selected. For example, to find all the database
entries which are not composed of exactly 4 elements type

    SMITE> display not element "4"

Try it on the other functions as well.

c) Complex queries

You now know how to use simple queries and how to display results. We can
now add another level of complexity and show further flexibility of the
SMITE query language. SMITE allows you to use multiple functions in a
query and to combine the results of such queries. This introduces the
idea of `OPERATORS'. The operators available are AND, OR, XOR, ADD,
SUBTRACT and NOT. They are easy to use and have their intuitive meanings
so don't be put off! To see what the operators do, work through the
following examples. Type

    SMITE> display ptext "mouse"

This will give
    WORKLIST ENTRIES (7):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

Now type

    SMITE> display ptext "rat"

which will give

    WORKLIST ENTRIES (9):
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
    CANODO NODO CALCIUM BINDING SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

As you can see some ptext entries are common to both lists and others are
unique. Now type

    SMITE> display ptext "mouse" or ptext "rat"

to give

    WORKLIST ENTRIES (10):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
    CANODO NODO CALCIUM BINDING SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

This query has asked for `all entries which contain the ptext mouse OR
the ptext rat OR BOTH'. This is as you'd intuitively expect. Now try

    SMITE> display ptext "mouse" and ptext "rat"

to give
    WORKLIST ENTRIES (6):
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

This has shortened the worklist considerably. The query has asked for
`all entries which contain BOTH mouse AND rat in their ptext fields'. Now
try

    SMITE> display ptext "mouse" xor ptext "rat"

to give

    WORKLIST ENTRIES (4):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    HELIX1N TYPE I ALPHA-HELIX N-TERMINAL SIGNATURE
    CANODO NODO CALCIUM BINDING SIGNATURE
    SENSOR BACTERIAL SENSOR PROTEIN C-TERMINAL SIGNATURE

This list contains 4 entries, none of which appeared in the result of the
AND query. The operator XOR stands for `exclusive or'. The query has
asked for `all entries which contain EITHER mouse OR rat BUT *NOT* BOTH'.

The operator called ADD is exactly the same as the OR operator; it is
provided for ease of understanding.

The SUBTRACT operator again does what you'd expect. Type

    SMITE> display ptext "mouse" subtract ptext "rat"

to give

    WORKLIST ENTRIES (1):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE

This query has asked for a list of `all entries which contain mouse in
the ptext fields EXCEPT those which contain rat in the ptext fields'.
Again, SUBTRACT is provided for clarity; it is actually equivalent to
`AND NOT'. Try typing

    SMITE> display ptext "mouse" and not ptext "rat"

This will give the same answer as the previous query. This is because a
list is created of all those entries which contain `mouse', another list
is created of all those entries which *don't* contain `rat' and the two
lists are ANDed together.

The ability to use constructs like `AND NOT' or `XOR NOT' is a powerful
feature of SMITE but it is advised that you gain experience with SMITE
before using NOT in earnest. Above all DON'T PANIC! Use SUBTRACT instead
of AND NOT if it is easier for you to understand.
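If you are familiar with set operations, the operators map directly
onto them. The Python sketch below is an illustration only, not SMITE
code: it rebuilds the `mouse' and `rat' worklists shown above as sets
and applies the corresponding set operations. It assumes that the
example database holds exactly the ten entries returned by the OR query.

```python
# Worklists from the two PTEXT queries in this section.
MOUSE = {"AAMYLASE", "DAGPE", "FERREDOXIN", "GPCRRHOD", "KRINGLE",
         "LYSLACT", "SUGARTRAN"}
RAT = {"HELIX1N", "CANODO", "DAGPE", "FERREDOXIN", "GPCRRHOD", "KRINGLE",
       "LYSLACT", "SENSOR", "SUGARTRAN"}

# Assumption: the example database contains exactly these ten entries.
ALL_ENTRIES = MOUSE | RAT

either = MOUSE | RAT          # OR (and ADD)
both = MOUSE & RAT            # AND
one_only = MOUSE ^ RAT        # XOR
mouse_not_rat = MOUSE - RAT   # SUBTRACT

# NOT negates a result relative to the whole database, so AND NOT is
# the intersection of one list with the complement of the other.
not_rat = ALL_ENTRIES - RAT
and_not = MOUSE & not_rat

print(len(either), len(both), sorted(one_only))
print(sorted(mouse_not_rat), and_not == mouse_not_rat)
```

The counts and codes printed agree with the worklists shown for the OR,
AND, XOR and SUBTRACT queries, and the final comparison confirms that
SUBTRACT and AND NOT give the same list.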
Finally, using operators you can relate the results of ANY SMITE function
with ANY other one.

d) Multiple parameters

The use of operators makes a complex query easy to read but you can use
shortcuts. This is because SMITE functions can accept multiple
parameters. Type

    SMITE> display code "kringle" or code "lyslact"

then type

    SMITE> display code "kringle lyslact"

You'll see that the result is the same. If the functions CODE, PCODE or
ELEMENT are given multiple parameters there is an implied OR. This is
sensible as an implied AND would result in nothing being selected by the
CODE function! The other SMITE functions have an implied AND. Type

    SMITE> display text "amylase" and text "calcium"

then type

    SMITE> display text "amylase calcium"

Again the results are the same. Only those entries which contain both
words are selected. This implied AND is used by the TEXT, PTEXT and SEQ
functions.

e) Very complex queries

SMITE allows very complex queries. These are characterised by having more
than two operators in the query. As an example type

    SMITE> display ptext "mouse" or (seq "iwg" and ptext "rat")

to give

    WORKLIST ENTRIES (7):
    AAMYLASE ALPHA-AMYLASE FAMILY SIGNATURE
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

This query says `get me all entries which contain both the sequence
ile-trp-gly and the text rat in the ptext field PLUS those entries which
contain mouse in the ptext field'. Just like an arithmetic expression the
parentheses tell SMITE in which order to perform its operations. Now type

    SMITE> display (ptext "mouse" or seq "iwg") and ptext "rat"

to give
    WORKLIST ENTRIES (6):
    DAGPE DIACYLGLYCEROL/PHORBOL-ESTER BINDING SIGNATURE
    FERREDOXIN PLANT FERREDOXIN SIGNATURE
    GPCRRHOD RHODOPSIN-LIKE GPCR SUPERFAMILY SIGNATURE
    KRINGLE KRINGLE DOMAIN SIGNATURE
    LYSLACT LYSOZYME/ALPHA-LACTALBUMIN SUPERFAMILY SIGNATURE
    SUGARTRAN SUGAR TRANSPORTER PROTEIN FAMILY SIGNATURE

You can see how the position of the parentheses alters the meaning of the
query. This one says `get me all entries which contain rat in the ptext
field and also contain (either mouse in the ptext field or the peptide
ile-trp-gly in the final motif set or both)'.

It is strongly recommended that you use parentheses in very complex
queries but you don't have to; the following query is valid

    SMITE> display ptext "mouse" or seq "iwg" and ptext "rat"

Try it to see which of the previous two queries it resembles. The answer
shows that very complex expressions in SMITE are worked out from right to
left. Parentheses avoid confusion!

The last two sections have shown how you can build up any arbitrarily
complex query using SMITE. In normal use you would try to keep each query
simple and therefore easy to understand. The use of LISTS, explained
later, enables you to use several simple queries instead of one very
complex query.

f) Other display commands

The two commands ISHOW and FSHOW allow you to display initial and final
motif sequence blocks. They both have the same syntax. These commands do
NOT accept a query but focus instead on those database entries that are
already in the worklist. First of all get a single entry in the worklist
by typing

    SMITE> display code "aamyl"

This is a fingerprint with 5 elements. ISHOW and FSHOW typed on their own
will show all the sequence motif blocks for this code. Type

    SMITE> ishow

and

    SMITE> fshow

to show this. These commands are made flexible by allowing a `range' to
be specified. Type

    SMITE> ishow 4

This will show only motif 4 in the initial motif set. Now type

    SMITE> fshow 2-4

This will show motifs 2, 3 and 4 in the final motif set.
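The range forms accepted by ISHOW and FSHOW (a single number, a closed
range, and the two open-ended forms described next) can be summarised
with a short Python sketch. It illustrates the syntax only and is not
SMITE's implementation; the name parse_range is hypothetical, and
rejecting a range that falls outside the available motifs is an
assumption, since this guide does not say how SMITE treats that case.

```python
def parse_range(spec, n_motifs):
    """Expand a range such as "4", "2-4", "-3" or "4-" into motif numbers.

    An empty spec selects every motif, mirroring ISHOW or FSHOW typed on
    its own.
    """
    spec = spec.strip()
    if not spec:
        return list(range(1, n_motifs + 1))
    if "-" in spec:
        first, last = spec.split("-", 1)
        start = int(first) if first else 1       # "-3" starts at motif 1
        end = int(last) if last else n_motifs    # "4-" runs to the last motif
    else:
        start = end = int(spec)
    # Assumption: out-of-range requests are treated as errors.
    if not 1 <= start <= end <= n_motifs:
        raise ValueError("range lies outside motifs 1 to %d" % n_motifs)
    return list(range(start, end + 1))


# AAMYLASE has 5 elements, so there are 5 motif blocks to choose from.
for spec in ("", "4", "2-4", "-3", "4-"):
    print(repr(spec), parse_range(spec, 5))
```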
Now type

    SMITE> ishow -3

This will show initial motifs 1, 2 and 3. Finally type

    SMITE> fshow 4-

This will show final motifs from block 4 to the end, i.e. motifs 4 and 5.
The /OUTPUT qualifier can be used with these commands.

g) Other commands which can accept a query

These are the commands PLUSWORK and MINUSWORK. Just like DISPLAY they can
accept simple, complex and very complex queries. PLUSWORK adds the
results of a query to the worklist whereas MINUSWORK subtracts the
results of a query. Unlike DISPLAY they do not show the contents of the
worklist after completion. Try the following two sets of examples; the
commands within each set are equivalent.

    SMITE> display code "kringle" or code "aamyl"

is equivalent to

    SMITE> display code "kringle"
    SMITE> pluswork code "aamyl"
    SMITE> display

Similarly

    SMITE> display ptext "mouse" subtract ptext "rat"

is equivalent to

    SMITE> display ptext "mouse"
    SMITE> minuswork ptext "rat"
    SMITE> display

The final DISPLAY commands are just so you can confirm what is in the
worklist. One interesting and useful feature of MINUSWORK is that, if
typed without a query, it will clear the worklist.

h) How LISTS make life easy

Lists allow you to break up complex and very complex queries into simple
steps. SMITE, like DELPHOS, contains two lists. You've already met the
WORKLIST. The results of every query go into the worklist. The other
list, the STORELIST, is provided for your benefit; use it! Several
commands operate on these lists.

    STOREWORK:  makes another copy of the worklist in the storelist
    RECALLWORK: makes another copy of the storelist in the worklist
    SWAPWORK:   transposes the two lists
    ORLISTS:    ORs the storelist with the worklist leaving the result
                in the worklist
    ANDLISTS:   ANDs the storelist with the worklist leaving the result
                in the worklist
    XORLISTS:   XORs the storelist with the worklist leaving the result
                in the worklist
    WORKSAVE:   saves the entry codes in the worklist to a file of your
                choice
    WORKREAD:   reads entry codes from one of your files into the
                worklist
    STORESAVE:  same as WORKSAVE but uses the storelist
    STOREREAD:  same as WORKREAD but uses the storelist

As an example of how to use these lists consider the very complex query
we used earlier, i.e.

    SMITE> display (ptext "mouse" or seq "iwg") and ptext "rat"

Try this again and then try the following series of commands, which is
equivalent

    SMITE> display ptext "mouse"
    SMITE> storework
    SMITE> display seq "iwg"
    SMITE> orlists
    SMITE> storework
    SMITE> display ptext "rat"
    SMITE> andlists
    SMITE> display

To an experienced user this may seem like using a sledgehammer to crack a
nut but the flow of operations is clearer for the novice. Next try

    SMITE> display code "kringle"
    SMITE> storework
    SMITE> minuswork
    SMITE> display
    SMITE> recallwork
    SMITE> display
    SMITE> display code "aamyl"
    SMITE> swapwork
    SMITE> display
    SMITE> swapwork
    SMITE> display

to see the effects of moving lists around.

To see the save and read operations in action try the following.

    SMITE> display code "kringle"
    SMITE> worksave MYWORK.DAT
    SMITE> minuswork               (clear the worklist)
    SMITE> display                 (the worklist will be empty)
    SMITE> storeread MYWORK.DAT    (load the data into the storelist)
    SMITE> display                 (the worklist will still be empty)
    SMITE> recallwork
    SMITE> display                 (the worklist is restored)

The above sequence could have been shortened by reading the file directly
into the worklist by using WORKREAD.

i) NEGWORK

This command, as its name implies, negates the worklist. What that means
is that, after executing this command, all entries which were in the
worklist are removed and replaced by those entries which were NOT in the
worklist before. As an example, the query

    SMITE> display not ptext "rat"

could be replaced by the sequence of commands

    SMITE> display ptext "rat"
    SMITE> negwork
    SMITE> display

j) How can I see what entries are in the PRINTS database?

Try typing the following and then think how they work.
    SMITE> minuswork
    SMITE> negwork
    SMITE> display

or

    SMITE> display element "<20"

Answer: The first example clears the worklist using MINUSWORK and then
negates the worklist. This guarantees all entries in the database will be
in the worklist. The second example relies on the fact that there won't
be any fingerprints of 20 elements (or more!) in the database.

k) EXTRACT

The EXTRACT command allows you to extract the final motif sets from
entries in the worklist into MOT files suitable for use by ADSP. Try
typing

    SMITE> display code "lyslact"
    SMITE> extract

and look at the files produced.

l) Shortcuts

Commands and qualifiers in SMITE may be abbreviated down to the point of
no conflict with other instructions. For example

    SMITE> display/brief code "kringle"

could be abbreviated to

    SMITE> d/b code "kringle"

Also, if no command is given, the DISPLAY command is assumed; therefore
the following two queries are equivalent

    SMITE> display ptext "rat"
    SMITE> ptext "rat"

In order to redisplay the worklist using qualifiers, only the abbreviated
qualifier is necessary, so the following SMITE statements are equivalent

    SMITE> display/history
    SMITE> /h

1.2 Summary of SMITE commands, qualifiers, functions and operators

a) Commands

    DISPLAY [qual] [query]  default command to display the results of a
                            query
    FSHOW [range]           display final motif blocks
    ISHOW [range]           display initial motif blocks
    STOREWORK               copy worklist to storelist
    RECALLWORK              copy storelist to worklist
    SWAPWORK                transpose worklist and storelist
    WORKSAVE [file]         save worklist to a file
    WORKREAD [file]         recreate worklist from a file
    STORESAVE [file]        save storelist to a file
    STOREREAD [file]        recreate storelist from a file
    ANDLISTS                storelist AND worklist -> worklist
    ORLISTS                 storelist OR worklist -> worklist
    XORLISTS                storelist XOR worklist -> worklist
    NEGWORK                 negate worklist
    PLUSWORK query          add results of query to worklist
    MINUSWORK [query]       subtract results of query from worklist
    EXTRACT                 extract final motif sets to MOT files
    HELP                    brief help sheet
    BYE EXIT QUIT           leave SMITE

b) Functions

    CODE      select a database entry code
    PCODE     select a protein code (pcode)
    TEXT      search general text
    PTEXT     search pcode text
    SEQ       search final motif set polypeptides
    ELEMENT   select entries based on the number of elements in the
              fingerprint

c) Operators

    AND       perform a boolean AND
    OR        perform a boolean OR
    XOR       perform a boolean exclusive-OR
    ADD       same as OR
    SUBTRACT  perform a boolean AND NOT
    NOT       negate the results of a function

d) Qualifiers

    /AUTHOR       display the fingerprint creator + date
    /CFI          display the composite fingerprint index
    /COMMENT      display comment information
    /FMOTIF       display final motifs
    /HISTORY      display scan history
    /IMOTIF       display initial motifs
    /REFERENCE    display bibliography
    /SUMMARY      display summary information
    /TITLE        display pcodes
    /TYPE         display type of fingerprint
    /BRIEF        show brief information
    /FULL         show all information
    /INFO         show context in PTEXT, PCODE and SEQ queries
    /OUTPUT=file  redirect screen output to a file
    /PRINTER      redirect screen output to the default system printer

2.0 References

1. Attwood, T.K., Beck, M.E., Bleasby, A.J. and Parry-Smith, D.J. (1994)
   PRINTS - A database of protein motif fingerprints. Nucleic Acids
   Research, in press.
2. Attwood, T.K. and Beck, M.E. (1994) PRINTS - A protein motif
   fingerprint database. Protein Engineering, 7 (7), 841-848.
3. Parry-Smith, D.J. and Attwood, T.K. (1992) ADSP - A new package for
   computational sequence analysis. CABIOS, 8 (5), 451-459.
4. Bleasby, A.J., Akrigg, D. and Attwood, T.K. (1994) OWL - A
   non-redundant composite protein sequence database. Nucleic Acids
   Research, in press.
5. Bleasby, A.J. and Wootton, J.C. (1990) Construction of validated,
   non-redundant composite protein sequence databases. Protein
   Engineering, 3 (3), 153-159.
6. Parry-Smith, D.J. and Attwood, T.K. (1991) SOMAP - A novel interactive
   approach to multiple protein sequence alignment. CABIOS, 7 (2),
   233-235.
7. Akrigg, D., Attwood, T.K., Bleasby, A.J., Findlay, J.B.C., Maughan,
   N.A., North, A.C.T., Parry-Smith, D.J., Perkins, D.N. and Wootton,
   J.C. (1992) SERPENT: An information storage and analysis resource for
   protein sequences. CABIOS, 8 (3), 295-296.
8. Perkins, D.N. and Attwood, T.K. (1994) VISTAS - A package for
   VIsualising STructures And Sequences of proteins. J.Mol.Graph.,
   submitted.

2.1 Applications

1. Attwood, T.K. and Findlay, J.B.C. (1994) Fingerprinting
   G-Protein-Coupled Receptors. Protein Engineering, 7 (2), 195-203.
2. Attwood, T.K. and Findlay, J.B.C. (1993) Design of a discriminating
   fingerprint for G-protein-coupled receptors. Protein Engineering,
   6 (2), 167-176.
3. Flower, D.R., North, A.C.T. and Attwood, T.K. (1993) Structure and
   Sequence Relationships in the Lipocalins and Related Proteins. Protein
   Science, 2, 753-761.
4. Boguski, M.S., Bairoch, A., Attwood, T.K. and Michaels, G.S. (1992)
   Proto-vav and Gene Expression. Nature, 358, 113.
5. Flower, D.R., North, A.C.T. and Attwood, T.K. (1991) Mouse oncogene
   protein 24p3 is a member of the Lipocalin protein family. Biochemical
   and Biophysical Research Communications, 180 (1), 69-74.

------ * ------