eDiscovery Keyword Search Expression Options

Gimmal Discover provides a keyword search engine that helps identify documents and emails during policy enforcement and eDiscovery searches. The engine supports simple pattern matching and Boolean searches as well as more advanced features such as Regular Expression (RegEx) matching and proximity searching. This document describes the keyword search options that Gimmal Discover supports and gives examples of how to format a search expression using them.

Keyword Criteria Options

  • Word List Any - Acts as a Boolean OR: an item is added to the search results if any of the listed keywords are found (e.g. Huck OR Tom OR Becky).
  • Word List All - Acts as a Boolean AND: all of the listed words or phrases must be found for an item to be included in the search results (e.g. Huck AND Tom AND Becky).
  • Search Expression - Allows the administrator to enter a logical expression that combines keywords with AND/OR logic. See Search Term Clarification for more detail.
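
The difference between the two word list types can be illustrated with a short Python sketch. It is not Gimmal Discover's implementation: the function name word_list_match is hypothetical, and plain case-insensitive substring matching is an assumption made only to keep the example small.

```python
def word_list_match(text, entries, mode="ANY"):
    """Return True if the text satisfies a word list.

    mode="ANY" behaves like a Boolean OR over the entries,
    mode="ALL" behaves like a Boolean AND.
    Matching is a simple case-insensitive substring test (an assumption).
    """
    lowered = text.lower()
    found = [entry.lower() in lowered for entry in entries]
    return any(found) if mode == "ANY" else all(found)


# One keyword or phrase per line, as in a word list.
entries = "Huck\nTom\nBecky".splitlines()
document = "Huck and Tom went rafting on the river."

print(word_list_match(document, entries, "ANY"))
print(word_list_match(document, entries, "ALL"))
```

In this sketch the ANY list matches because Huck and Tom are present, while the ALL list does not because Becky is missing.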


Word List Rules

When using a Word list (Any or All), keep in mind the following rules are used for evaluating the entries:

  • Keywords or Phrases should be entered one word or phrase per line.
  • Quotes are not needed for multi-word phrases. Each item is delineated by the line breaks.
  • Wildcards may be used in words or phrases. Gimmal Discover automatically treats an entry that contains wildcards as a Like Pattern. To search for a literal word that contains a wildcard character, put the word or expression in quotes.
  • Regular Expressions may be used as part of the keyword list. Make sure to use the proper syntax, i.e. RegEx("expression").
  • Proximity operators (e.g. raft NEAR(3) Mississippi) may be used in a Word List.


Search Expression Rules

A search expression uses Boolean logic to enter a more complex set of keyword criteria. When a search expression is used, follow these guidelines:

  • Valid Boolean operators are: AND, OR, AND NOT
  • Long search expressions will word-wrap within the text box. Do not press Enter to move to the next line while typing a long search expression, as this could cause an error in logic translation.
  • Multi-word phrases need to be surrounded by double quotes, e.g. "Statue of Liberty"
  • Double quotes (") should surround words or phrases that contain punctuation, including spaces, hyphens, parentheses or commas, e.g. "100-234".
  • Be sure to use parentheses to group search expressions, e.g. ((blue OR green) AND (red OR yellow)) OR (lions AND tigers AND bears)
  • Proximity operators can be used in search expressions, e.g. (bank* NEAR asset* AND loan NEAR payment)
  • Gimmal Discover will automatically consider items with wildcards to be valid Like Patterns, unless that word or phrase is in double quotes. Therefore, *day* will find today or days, while "*day*" will find only the exact phrase *day* with the asterisks included.

Boolean Operators

The logical operators AND, OR, AND NOT and NEAR are used to further clarify keyword criteria when simple lists are not precise enough.
Reserved Word: AND
Narrows your criteria by only returning hits which match all the keywords, phrases or conditions.
Reserved Word: OR
Expands your criteria by returning hits which match any of the keywords, phrases or conditions.
See the examples below for how these operators are used:
Scenario                                        | fence AND paint | fence OR paint
------------------------------------------------+-----------------+---------------
Fence and paint both appear in the document     | Yes             | Yes
Only fence appears in the document              | No              | Yes
Only paint appears in the document              | No              | Yes
Neither fence nor paint appears in the document | No              | No
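
The AND/OR scenarios can be reproduced with a few lines of Python. This only illustrates the Boolean logic and says nothing about how the engine evaluates documents; the function name truth_table is hypothetical.

```python
from itertools import product


def truth_table():
    """Return rows of (fence_present, paint_present, AND result, OR result)."""
    rows = []
    for fence, paint in product([True, False], repeat=2):
        rows.append((fence, paint, fence and paint, fence or paint))
    return rows


for fence, paint, both, either in truth_table():
    print(f"fence={fence!s:5} paint={paint!s:5} AND={both!s:5} OR={either}")
```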

Please note: Boolean operators may also be used with Negative Criteria (e.g. AND NOT).

Proximity Searching (NEAR)

Gimmal Discover provides a keyword option that requires two or more words to be present in a document no more than a set distance apart. This is referred to as Proximity Searching. For example, a search expression of Romeo NEAR(5) Juliet must find Juliet within 5 words of Romeo to be considered a hit.
The standard syntax for proximity searching is: KeywordA NEAR(#) KeywordB. The NEAR reserved word ties one word or phrase to the other by a maximum distance. To customize the distance, enter the number of words in parentheses after the reserved word, e.g. John NEAR(6) Smith will find John if it is within six words of Smith, regardless of direction. By default, the distance is three, so John NEAR(3) Smith can be written as John NEAR Smith.
The proximity feature is particularly helpful for returning hits where specific keywords can be found close to one another, regardless of order. For example, to locate a name such as "Jane Beth Doe", instead of listing "Jane Doe", "Jane B. Doe", "Jane Beth Doe", "Doe, Jane" etc., you can use proximity searching to reduce your keywords to one expression: Jane NEAR Doe.
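
As a rough illustration of the idea, the following Python sketch checks whether two words occur within a given number of words of each other, in either order. The helper names tokenize and near are hypothetical, and both the tokenization and the way distance is counted (difference in word position) are assumptions, not a description of the engine.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, word_a, word_b, distance=3):
    """True if word_a occurs within `distance` words of word_b, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)


# The default distance of three covers all the name variants.
for name in ["Jane Doe", "Jane B. Doe", "Jane Beth Doe", "Doe, Jane"]:
    print(name, near(name, "Jane", "Doe"))
```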

Grouped Proximity Expression

This type of proximity search allows the user to find multiple words near the base words. When using multiple words, they must be delineated using curly brackets, i.e. { and }.
Here are some examples of possible syntax options for multi-word proximity:

  • KeywordA NEAR(#) {KeywordB OR KeywordC OR KeywordD} - Any of the words B, C or D found within # words of KeywordA is a match.
  • KeywordA NEAR(#) {KeywordB AND KeywordC AND KeywordD} - All of the words B, C and D must be found within # words of KeywordA to be a match.
  • {KeywordA OR KeywordB OR KeywordC} NEAR(#) {KeywordX OR KeywordY OR KeywordZ} - Any of the words on the right side found near any of the words on the left side of the proximity expression is a match.
  • {KeywordA AND KeywordB AND KeywordC} NEAR(#) {KeywordX AND KeywordY AND KeywordZ} - All of the words on the right side must be found near all of the words on the left side of the proximity expression.
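
The first two grouped forms can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's logic: the names positions and near_group are hypothetical, distance is counted as the difference in word position, and for an AND group each word only has to be near some occurrence of the base word.

```python
import re


def positions(tokens, word):
    """Return the indexes at which `word` occurs in the token list."""
    return [i for i, t in enumerate(tokens) if t == word.lower()]


def near_group(text, base, group, distance=3, mode="OR"):
    """Evaluate base NEAR(distance) {w1 OR w2 ...} or {w1 AND w2 ...}."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    base_pos = positions(tokens, base)

    def word_is_near(word):
        return any(abs(b - p) <= distance
                   for b in base_pos
                   for p in positions(tokens, word))

    check = any if mode == "OR" else all
    return check(word_is_near(word) for word in group)


text = "The bank approved the loan and the payment schedule"
print(near_group(text, "bank", ["loan", "mortgage"], 3, "OR"))
print(near_group(text, "bank", ["loan", "payment"], 3, "AND"))
```

Here the OR group matches because loan is three words from bank, while the AND group fails at a distance of three because payment is six words away.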

Please note: When using Boolean operators as part of a grouped proximity expression in a keyword list, they must match the type of list. 'OR' can only be used in an 'ANY' list, and 'AND' can only be used in an 'ALL' list.

Multiple Proximity Expressions

NEAR operators can be combined to form a set of conditional proximity expressions. Gimmal Discover supports a variety of syntax for multiple NEAR operators, including grouped, directional and with additional AND/OR operators:

  • KeywordA NEAR(#) KeywordB NEAR(#) KeywordC
  • {KeywordA OR KeywordB} NEAR(#) {KeywordC} NEAR(#) {KeywordX OR KeywordY}
  • {KeywordA AND KeywordB} NEAR(#, BEFORE) {KeywordC OR KeywordD} NEAR(#) {KeywordX AND KeywordY}

Directional Proximity Searches (BEFORE, AFTER)

Gimmal Discover supports directional indicators in proximity expressions. The syntax is KeywordA NEAR(#, BEFORE) KeywordB, or KeywordA NEAR(#, AFTER) KeywordB. The syntax can also be shortened to just the first letter, e.g. KeywordA NEAR(#, B) KeywordB or KeywordA NEAR(#, A) KeywordB.
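
A minimal Python sketch of directional proximity is shown below. It follows the reserved word descriptions in this document (with BEFORE the left side must come first, with AFTER it must come second). The function name near_directional and the way distance is counted are assumptions for illustration only.

```python
import re


def near_directional(text, word_a, word_b, distance=3, direction=None):
    """Evaluate word_a NEAR(distance, direction) word_b.

    direction may be "BEFORE"/"B", "AFTER"/"A", or None for either order.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    for a in pos_a:
        for b in pos_b:
            gap = b - a  # positive when word_a comes first
            if direction in ("BEFORE", "B"):
                ok = 0 < gap <= distance
            elif direction in ("AFTER", "A"):
                ok = 0 < -gap <= distance
            else:
                ok = abs(gap) <= distance
            if ok:
                return True
    return False


print(near_directional("the cat in the hat", "cat", "hat", 3, "BEFORE"))
print(near_directional("the cat in the hat", "cat", "hat", 3, "AFTER"))
```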

Proximity Hints and Tips

  • Unlike the Boolean operators (AND, OR, NOT), the NEAR reserved word can be used in either a Keyword List or in a Search Expression.
  • Multi-Word phrases are valid in Proximity expressions, but quotes must be used: "A Connecticut Yankee in King Arthur's Court" NEAR "To Be or Not to Be"
  • Pattern matching (including standard wildcards) can be used in proximity expressions: Adventure* NEAR Sawyer
  • Be sure to use the "curly" brackets { and } for any type of grouping within a proximity expression. Standard parentheses ( and ) are used for grouping within a search expression, and a syntax error will occur if they are interchanged. You can, however, group proximity expressions within a search expression, for example: (market NEAR(3) {stock OR bond} AND {bank OR CD} NEAR(4) {loan OR borrow}) OR finance NEAR {job OR position} OR (discount* NEAR invest* NEAR loan* AND (Acme OR Widget))

Complex Search Expressions

The Boolean operators can be combined to form more complex expressions, with each expression separated by an OR. For example, you could enter the following: (fence AND paint) OR (river AND boat)
Use regular parentheses, i.e. "(" and ")", to delineate an expression. Expressions can be nested, and each nesting must have its own grouping with parentheses. In the example below, notice how the parentheses match the various groupings:
(fence AND paint) OR (river AND boat) OR (cave AND (hideout OR "hide out" OR "hide-out" OR "hideaway"))
The Proximity Operator can also be used as part of a complex expression:
(fence AND paint) OR (river AND boat) OR Yankee NEAR(5) court

Using Quotes with Keywords

The rules for using single and double quotes with keywords depend on the type of keyword search being performed.

Keyword List (Any or All)

The search engine treats each line of a word list as a single phrase to search for, regardless of spaces or internal punctuation. Quotes surrounding the expression are not required; if double quotes surround the entire phrase, they are ignored. Thus, in a word list, the following are equivalent:

  • "To be or not to be"
  • To be or not to be

Keyword Search Expression

Quotes should be used for all multi-word phrases or items with delineating characters, including spaces, parentheses or commas. When in doubt, surround the phrase with double quotes, or a syntax error could occur. For example, the search expression Profit OR (1,234,567 AND 7654321) will throw a syntax error. To get this expression to resolve correctly, use the following: Profit OR ("1,234,567" AND 7654321)

Items with Double Quotes

If you are searching for a keyword that contains double quotes, keep in mind the following (for both the keyword list and search expression). For one double quote (e.g. 4" to identify inches), no further formatting is required. However, if you are searching for something that is wrapped in double quotes (e.g. "Wow"), then you must wrap the phrase in single quotes, i.e. ' "Wow" '.

Items with Single Quotes

If you are searching for items with a single quote, no further formatting is required for a word list. For example, the following will all resolve correctly:

  • O'Connor
  • partners' AND accounts
  • "partners' accounts"

However, when using single quotes in a search expression, they must be contained in double quotes:

  • "O'Connor"
  • "partners'" AND accounts
  • "partners' accounts"

In the examples above, however, it is often better to use a Like Pattern, e.g. Like ("partner* account*"), to find all possible permutations of the criteria rather than relying on quotes.
Please note:

  • If you are using a search expression with one of the reserved words as a keyword, be sure to place it within quotes or use a Like Pattern.
  • Smart Quotes, also known as 'Curly' quotes, are directional quotes often inserted by Microsoft Word. To prevent confusion, Discover will automatically convert Smart Quotes to straight quotes when they are found in keyword criteria.
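
The kind of Smart Quote conversion described above can be sketched in Python. This is only an illustration of the idea, not Discover's own conversion routine; the names SMART_QUOTES and straighten_quotes are hypothetical.

```python
# Map the four common directional quote characters to straight quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(criteria):
    """Return the keyword criteria with Smart Quotes replaced by straight quotes."""
    return criteria.translate(SMART_QUOTES)


pasted = "\u201cStatue of Liberty\u201d AND \u201cO\u2019Connor\u201d"
print(straighten_quotes(pasted))
```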

Negative Criteria

Sometimes it is as important for keyword criteria to avoid certain words as it is to find others. Gimmal Discover has a number of options for this so-called negative criteria.

AND NOT

Narrows your criteria by only returning hits where one search term is in a document while the other is NOT in the document.
For example, a search expression of fence AND NOT paint must find fence but cannot find paint for the document to be considered a hit. This syntax can be tricky, especially when combined with parentheses. See the examples below:
a) fence AND NOT paint

Scenario                                        | Match | Logic
------------------------------------------------+-------+------------------------
Fence and paint both appear in the document     | No    | True AND False = False
Only fence appears in the document              | Yes   | True AND True = True
Only paint appears in the document              | No    | False AND False = False
Neither fence nor paint appears in the document | No    | False AND True = False


b) NOT (fence AND paint) i.e. NOT fence OR NOT paint

Scenario                                        | Match | Logic
------------------------------------------------+-------+-----------------------
Fence and paint both appear in the document     | No    | False OR False = False
Only fence appears in the document              | Yes   | False OR True = True
Only paint appears in the document              | Yes   | True OR False = True
Neither fence nor paint appears in the document | Yes   | True OR True = True


c) NOT (fence OR paint) i.e. NOT fence AND NOT paint

Scenario                                        | Match | Logic
------------------------------------------------+-------+------------------------
Fence and paint both appear in the document     | No    | False AND False = False
Only fence appears in the document              | No    | False AND True = False
Only paint appears in the document              | No    | True AND False = False
Neither fence nor paint appears in the document | Yes   | True AND True = True
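
The three scenarios (a, b and c) can be checked with a short Python sketch. It only exercises the Boolean logic, including the De Morgan rewrites given in the headings of b) and c); the function name matches is hypothetical.

```python
from itertools import product


def matches(fence, paint):
    """Evaluate the three negative expressions for one document."""
    return {
        "fence AND NOT paint": fence and not paint,
        "NOT (fence AND paint)": not (fence and paint),
        "NOT (fence OR paint)": not (fence or paint),
    }


for fence, paint in product([True, False], repeat=2):
    # De Morgan's laws: the grouped and expanded forms always agree.
    assert (not (fence and paint)) == ((not fence) or (not paint))
    assert (not (fence or paint)) == ((not fence) and (not paint))
    print(f"fence={fence}, paint={paint}: {matches(fence, paint)}")
```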


NOT NEAR

This criterion is the inverse of the Proximity Operator: it finds a word or phrase as long as it is not located in proximity to another word or phrase. Like NEAR, this expression can take a numeric value indicating the number of words within which the second word or phrase must not appear. The syntax is KeywordA NOT NEAR(#) KeywordB. For example, "income tax" NOT NEAR(2) "personal" will find all instances of "income tax" that are not within two words of "personal".
NOT NEAR can also contain directional indicators: KeywordA NOT NEAR(#, BEFORE/AFTER) KeywordB.
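
The undirected form of NOT NEAR can be sketched in Python as below. The helper names tokenize, find_phrase, gap and not_near are hypothetical, and measuring distance between the nearest edges of the two phrases is an assumption; the engine may count words differently.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_phrase(tokens, phrase):
    """Return (start, end) token index pairs for each occurrence of phrase."""
    words = tokenize(phrase)
    n = len(words)
    return [(i, i + n - 1)
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == words]


def gap(a, b):
    """Word distance between two (start, end) spans, measured edge to edge."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start - b_end if b_end < a_start else b_start - a_end


def not_near(text, phrase_a, phrase_b, distance=3):
    """Return spans of phrase_a that have no phrase_b within `distance` words."""
    tokens = tokenize(text)
    hits_b = find_phrase(tokens, phrase_b)
    return [a for a in find_phrase(tokens, phrase_a)
            if all(gap(a, b) > distance for b in hits_b)]


text = "Personal income tax is due in April. Corporate income tax is due in March."
print(not_near(text, "income tax", "personal", 2))
```

Only the second occurrence of income tax is returned, because the first one sits directly after personal.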

EXCLUDE

This operator refines searches by matching a keyword as long as it is not part of a specified excluded expression. It helps find the words that are needed without bringing back false positives caused by a specific usage of the word or phrase. EXCLUDE can be used when a 'BUT NOT' is needed. The syntax is KeywordA EXCLUDE KeywordA [Rest of Expression].
For example, the expression Confiden* EXCLUDE "Confidential Statement" would match all instances of Confiden* (including Confidence, Confidential etc.) as long as they are not part of the phrase Confidential Statement.
Please note:

  • For the EXCLUDE to be valid, the first keyword or phrase (KeywordA) must be part of the second portion of the expression.
  • Similar to the proximity statement, EXCLUDE can accept an integer that describes the range within which this comparison is made. The default is 5. For example, if you want to search for "United States" but do not want to find "United States of America Presidential Election of 1994", you need to explicitly enter a value of 6 for the range. Because the excluded phrase is longer than the default range, it is impossible to match the entire expression within 5 words of the original keyword phrase. To be valid, the following should be used:
    • "United States" EXCLUDE (6) "United States of America Presidential Election of 1994"
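
The 'BUT NOT' behaviour of EXCLUDE can be approximated with the Python sketch below. It is a simplification under stated assumptions: the function name exclude is hypothetical, the keyword is supplied as a Python regular expression rather than a Like Pattern, and the optional range integer is not modelled.

```python
import re


def exclude(text, keyword_regex, excluded_phrase):
    """Return keyword matches that are not part of `excluded_phrase`."""
    excluded_spans = [m.span() for m in
                      re.finditer(re.escape(excluded_phrase), text, re.IGNORECASE)]
    hits = []
    for m in re.finditer(keyword_regex, text, re.IGNORECASE):
        # Drop the hit when it lies entirely inside an excluded phrase.
        inside = any(start <= m.start() and m.end() <= end
                     for start, end in excluded_spans)
        if not inside:
            hits.append(m.group())
    return hits


text = "This Confidential Statement is issued in confidence. Keep it confidential."
# r"\bconfiden\w*" stands in for the Like Pattern Confiden*.
print(exclude(text, r"\bconfiden\w*", "Confidential Statement"))
```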

Comparing Negative Operators

Operator | Scope                   | Usage
---------+-------------------------+------------------------------------------------------
AND NOT  | Entire Document         | Should be avoided due to lack of precision and validation
NOT NEAR | Within NEAR(#) range    | Very flexible, can be used in a variety of instances
EXCLUDE  | Within EXCLUDE(#) range | Wherever a 'BUT NOT' is required, i.e. this word, BUT NOT if it is a specific usage of the word


List of Reserved Words

The following is a list of reserved words in Gimmal Discover. If any of these words are used as part of keyword criteria, they must be enclosed in double quotes. These words are not case sensitive.

Word    | Description | Example
--------+-------------+--------
LIKE    | Used to define a pattern in keyword criteria. Often hidden from the user, but always used by the engine when a wildcard is used. | LIKE ("comp*")
AND     | Boolean operator indicating that all keywords or expressions on either side of the operator must be found | blue AND green
OR      | Boolean operator indicating that any keyword used with this operator could be found | blue OR green
NOT     | Boolean operator used to negate an expression. Can be used with AND or NEAR | blue AND NOT green or blue NOT NEAR green
NEAR    | Used to define proximity of first word or phrase near a second word or phrase | bank NEAR loan
EXCLUDE | Used to find keywords that match the first partial expression so long as it does not match the full excluded expression | Confidential EXCLUDE "confidential statement"
BEFORE  | Used with the NEAR operator to define proximity only for items found in one direction (left side of expression must come first) | cat NEAR(3, BEFORE) hat
AFTER   | Used with the NEAR operator to define proximity only for items found in one direction (left side of expression must come second) | cat NEAR(3, AFTER) hat
REGEX   | Used to define a regular expression in keyword criteria | RegEx("\d\d\d[- ]\d\d[- ]\d\d\d\d")
PATTERN | Used to find specific information that is found in well-defined formats. It is used in conjunction with CC (for finding credit cards) and SSN (for finding social security numbers) | PATTERN(CC) or PATTERN(SSN)


Pattern Matching

Gimmal Discover provides options for users to enter expressions which are more complex than a standard exact search match. Patterns can be used to help evaluate keyword, address, file name and folder criteria.
Two types of Pattern Matching are supported by the search engine: Like Patterns and Regular Expressions (RegEx).

Like Patterns (Wildcards)

Whenever you use a wildcard to expand a keyword search term, you are actually using a Like Pattern. Gimmal Discover automatically evaluates any word or phrase containing wildcards as a Like Pattern for use in the engine. However, the user does not need to enter the exact syntax (i.e. the Like ("") portion) into the wizard. For example, *day* is valid, as is the equivalent syntax Like ("*day*"). The syntax for a Like Pattern is Like ("expression") where the expression is the word or phrase, containing wildcards, you wish to evaluate.

Supported Wildcards

  • * - Matches none, one or more characters.
  • ? - Matches any single character.
  • # - Matches any digit.
  • [,] - Matches a range or set of characters or numbers.
All the wildcards are reserved and will be translated as a pattern. If you wish to use one of the wildcards as a literal match, be sure to put it in double quotes, e.g. "# sign" will find a hit in the phrase Press the # sign for more options.
Examples

Expression  | Matches
------------+--------------------------------------------------
bicycl*     | bicycle, bicycles, bicycling
river?boat* | river boat, riverboat, river boats, river boating
Version 3.# | Version 3.0, Version 3.1, Version 3.101
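
One way to picture how a Like Pattern works is to translate it into a regular expression. The Python sketch below does this for the *, ? and # wildcards only; the bracket wildcard is left out because its exact syntax is not detailed here. The names WILDCARDS, like_to_regex and like_match are hypothetical, and treating * as "zero or more word characters" is an assumption, so results may differ from the engine's in edge cases.

```python
import re

# Assumed regular expression equivalents for three of the wildcards.
WILDCARDS = {
    "*": r"\w*",  # none, one or more characters (kept within a word here)
    "?": ".",     # any single character
    "#": r"\d",   # any digit
}


def like_to_regex(pattern):
    """Translate a Like Pattern into a Python regular expression string."""
    return "".join(WILDCARDS.get(ch, re.escape(ch)) for ch in pattern)


def like_match(pattern, text):
    """True if the translated pattern is found anywhere in the text."""
    return re.search(like_to_regex(pattern), text, re.IGNORECASE) is not None


print(like_match("bicycl*", "He repairs bicycles"))
print(like_match("Version 3.#", "Upgrade to Version 3.1 today"))
print(like_match("*day*", "See you today"))
```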


Regular Expressions (RegEx)

In addition to the standard wildcard support with Like Patterns, Gimmal Discover also supports the complex structured pattern language of Regular Expressions (also referred to as RegEx, or GREP). Regular Expressions are very helpful when trying to match patterns that cannot be expressed with Boolean operators, such as account numbers, credit card numbers, and social security or national insurance numbers.
This help document is not intended to teach you how to use Regular Expressions; a certain amount of knowledge is assumed before you can use them in a search. There are entire books on the subject and many sources on the Internet to help you create the right Regular Expression.
Gimmal Discover uses the .NET implementation of RegEx. The syntax is RegEx("expression"), where RegEx("") tells the search engine to analyze and return items that match the expression within the quotes.

Examples

  • RegEx("\b(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b")
    Finds: Social Security numbers (just the numbers using allocated limits)
  • RegEx("\d\d\d[- ]\d\d[- ]\d\d\d\d")
    Finds: Social Security numbers (with ###-##-#### or ### ## #### pattern)
  • RegEx("(cc|credit(\s{0,3}card)?)[\D]{0,60}(\d{4}([\D]?\d{4}){3}([\D]?\d{3})?|\d{4}[\D]?\d{6}[\D]?\d{5}([\D]?\d{4})?)")
    Finds: Credit Card patterns with several allocated numbers.
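
Before entering a Regular Expression as keyword criteria, it can help to try it against sample text. The sketch below does so with Python's re module, which is an assumption of convenience: Discover uses the .NET implementation, and although the simple Social Security expression used here should behave the same way on plain ASCII text, more advanced expressions can differ between the two. The names SIMPLE_SSN and find_ssn_like are hypothetical.

```python
import re

# The simple Social Security expression: three digits, a dash or space,
# two digits, a dash or space, four digits.
SIMPLE_SSN = re.compile(r"\d\d\d[- ]\d\d[- ]\d\d\d\d")


def find_ssn_like(text):
    """Return every substring that matches the simple SSN expression."""
    return SIMPLE_SSN.findall(text)


sample = "Formats: 123-45-6789 or 123 45 6789, but not 123456789."
print(find_ssn_like(sample))
```

Because the expression requires a dash or space between the groups, the unseparated nine digits in the sample are not returned.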


PATTERN Reserved Words

Gimmal Discover has a method for finding certain predefined formats such as credit card and social security numbers in text using the PATTERN reserved word. When used as a keyword, the PATTERN(CC) or PATTERN(SSN) uses a regular expression combined with programmatic testing (including the Luhn algorithm in the case of credit cards) to find matching hits while reducing (but not necessarily eliminating) false positives.

Expression   | Finds
-------------+------------------------------------------------------------------
PATTERN(SSN) | Social Security Numbers (just the numbers using allocated limits)
PATTERN(CC)  | Credit Card numbers
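
For readers unfamiliar with the Luhn algorithm mentioned above, the following Python sketch shows the standard checksum test. It illustrates the general algorithm only and is not taken from Gimmal Discover; the function name luhn_valid is hypothetical.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces and dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A checksum test of this kind is what lets a pattern search discard digit strings that look like card numbers but cannot be valid ones.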