Tokenization rules used in full text search (FTS)
Tokenization rules for non-literal fields
For non-literal fields, the following rules apply:
- Words are split at punctuation characters, and the punctuation is removed. However, a dot that is not followed by white space is treated as part of the token.
  - one:two becomes one two (two words).
  - Alpha#Omega becomes Alpha Omega (two words).
  - x.y.z becomes x.y.z (one word).
- Words are split at hyphens unless the token contains a number. In that case the whole token is treated as a product number and is not split.
  - x-y=z becomes x y z (three words).
  - KX-13AF9 becomes KX-13AF9 (one word).
- Email addresses and Internet host names are each recognized as a single token.
  - someone@bmc.com becomes someone@bmc.com (one word).
  - www.bmc.com becomes www.bmc.com (one word).
- The ampersand (&) is retained in words that contain no spaces.
  - Smith&Brown becomes Smith&Brown (one word).
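To make the rules above concrete, here is a minimal Python sketch that approximates them. It is a hypothetical illustration, not BMC's tokenizer. The function name tokenize, the email pattern, and the handling of cases the rules do not cover are all assumptions: a dot at the end of a token is dropped, a leading dot is kept, and underscores count as word characters.

```python
import re
import string

# Assumed email pattern: a local part, "@", and a domain containing at least one dot.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Separators: anything other than a word character, dot, ampersand, or hyphen.
SEPARATOR = re.compile(r"[^\w.&-]+")
# Ampersands and hyphens at either end, and dots at the end, are not kept.
EDGES = re.compile(r"^[&-]+|[&.-]+$")


def _trim(piece: str) -> str:
    """Remove stray edge characters that cannot belong to a token."""
    return EDGES.sub("", piece)


def tokenize(text: str) -> list[str]:
    """Approximate the non-literal-field rules (hypothetical sketch)."""
    tokens: list[str] = []
    for chunk in text.split():  # white space always ends a token
        # Email addresses stay whole; host names are already covered by the dot rule.
        candidate = chunk.strip(string.punctuation)
        if EMAIL.fullmatch(candidate):
            tokens.append(candidate)
            continue
        for piece in SEPARATOR.split(chunk):
            piece = _trim(piece)
            if not piece:
                continue
            if any(ch.isdigit() for ch in piece):
                # Contains a number: treat it as a product number and keep hyphens.
                tokens.append(piece)
            else:
                # No number: split at hyphens and drop empty parts.
                tokens.extend(p for p in map(_trim, piece.split("-")) if p)
    return tokens


if __name__ == "__main__":
    print(tokenize("one:two Alpha#Omega x.y.z x-y=z KX-13AF9"))
    # ['one', 'two', 'Alpha', 'Omega', 'x.y.z', 'x', 'y', 'z', 'KX-13AF9']
    print(tokenize("Mail someone@bmc.com or see www.bmc.com, Smith&Brown."))
    # ['Mail', 'someone@bmc.com', 'or', 'see', 'www.bmc.com', 'Smith&Brown']
```

Each example from the list produces the documented result in this sketch. A lone ampersand, as in "Smith & Brown", is dropped, leaving two words.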
Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*
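The tip does not describe how the search engine resolves a trailing wildcard. One common general technique is shown below purely as an illustration, not as BMC's implementation. If the indexed terms are kept sorted, every term that starts with the prefix lies in one contiguous run, so binary search can find the run directly. The function name prefix_matches and the sample terms are hypothetical, and the sketch is case-sensitive for simplicity.

```python
import bisect


def prefix_matches(sorted_terms: list[str], query: str) -> list[str]:
    """Resolve a query such as "cert*" against an already sorted term list."""
    if not query.endswith("*"):
        # No wildcard: look up the exact term only.
        i = bisect.bisect_left(sorted_terms, query)
        found = i < len(sorted_terms) and sorted_terms[i] == query
        return [query] if found else []
    prefix = query[:-1]
    # Terms sharing the prefix start at the first term >= prefix and are contiguous.
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


if __name__ == "__main__":
    terms = sorted(["change", "certify", "center", "cert", "certificate"])
    print(prefix_matches(terms, "cert*"))  # ['cert', 'certificate', 'certify']
    print(prefix_matches(terms, "cert"))   # ['cert']
```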