Aid4Mail’s powerful filtering capabilities, specifically in the Investigator and Enterprise editions, are designed to meet the exacting needs of professionals in email forensics and eDiscovery. This article provides an overview of Aid4Mail’s filter syntax, enabling you to efficiently search, analyze, and extract critical information from vast email datasets.
Please refer to Aid4Mail’s User Guide for in-depth coverage of the filter syntax and additional examples.
In the fields of digital forensics and legal discovery, the ability to precisely locate and extract relevant emails is paramount. Aid4Mail supports a multitude of email formats and can even carve emails from unknown file formats, uncompressed disk images, and forensically extracted disk space. However, determining which emails are relevant to a case requires sophisticated filtering.
Aid4Mail’s filtering feature allows you to:
Aid4Mail’s filter syntax is similar to Gmail and Microsoft 365’s, making it easy to learn and remember. However, Aid4Mail’s syntax is richer, offering capabilities that are on par with, or even exceed, those of other eDiscovery and forensics tools.
Where Aid4Mail’s syntax differs from Gmail and Microsoft 365’s, it has been highlighted in the relevant parts of this article.
Before delving into the specifics of Aid4Mail’s filter syntax, it’s important to understand a few key concepts:
For example, in an investigation into communication patterns, you might use a filter like:
Date>=2023 AND Type:Personal AND ("project alpha" OR "confidential acquisition")
This filter would find all personal emails (excluding newsletters, marketing emails, automated notifications, etc.) from 2023 onwards that mention “project alpha” or “confidential acquisition”. This could help isolate direct, person-to-person communications about sensitive topics, filtering out bulk emails or automated notifications that might use similar keywords.
As we progress through this article, you’ll learn how to construct increasingly sophisticated filters to support your forensic analysis or eDiscovery efforts. Whether you’re searching for specific pieces of evidence, establishing communication patterns, or isolating relevant date ranges, Aid4Mail’s filter syntax provides the precision and flexibility you need for your professional investigations.
Search terms are the foundation of Aid4Mail’s filtering capabilities. They allow forensics experts and eDiscovery professionals to pinpoint specific content within large email datasets.
A search term is a word, phrase, or pattern that Aid4Mail looks for in emails. These can be simple keywords or more complex expressions using wildcards and operators. Search terms are used to identify emails containing specific information relevant to an investigation or legal case.
The order in which search terms appear in your query can significantly affect Aid4Mail’s processing speed. To improve efficiency, place search terms with the smallest scope (e.g. date, sender, recipients, subject), and those less likely to be found, before terms that are more common or that cover a larger part of the email. For further details, refer to Best Practices and Tips at the end of this article.
Here are some examples of search terms tailored for email forensics and eDiscovery scenarios:
1. Simple keyword:
confidential
This will find all emails containing the word “confidential”.
2. Exact phrase:
"trade secret"
This will match emails containing the exact phrase “trade secret”.
3. Multiple terms:
lawsuit settlement
This will find emails containing both “lawsuit” and “settlement” (in any order).
4. Using wildcards (which we’ll cover in more depth later):
litigat*
This will match “litigation”, “litigate”, “litigator”, etc.
5. Combining concepts:
"insider trading" OR "market manipulation"
This will find emails containing either phrase.
6. Complex example:
Date:2022 AND (embezzle* OR fraud*) AND (account* OR financ*)
This search term would be useful in a financial crimes investigation, looking for emails from 2022 that mention embezzlement or fraud in relation to accounts or finances.
Understanding how to construct effective search terms is crucial for efficiently sifting through large volumes of email data. As we progress through this article, you’ll learn how to combine these basic search terms with more advanced filtering techniques to create powerful, precise queries tailored to your specific investigative needs.
Wildcards are special characters that represent unknown or variable parts of a search term. They are particularly useful in forensics and eDiscovery when you need to account for variations in spelling, prefixes, suffixes, or unknown parts of an email address or domain.
Aid4Mail supports several types of wildcards, each with specific uses:
These wildcards match known variations of characters, including accented letters and ligatures.
Example: <e>vidence
Matches: evidence, évidence
These are crucial for proximity searching, allowing you to find words near each other within emails.
Example: fraud<5>report
Matches: “fraud” and “report” within 5 words of each other.
Example: bribe<.>official
Matches: “bribe” and “official” in the same sentence.
Example: insider<*>trading
Matches: “insider” and “trading” in the same paragraph.
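To make the word-distance and same-sentence semantics concrete, here is a minimal Python sketch of two checks, one like fraud<5>report and one like bribe<.>official. It is an illustration only, not Aid4Mail’s implementation: the helper names are hypothetical, distance is assumed to be order-insensitive and measured in word positions, and sentences are split naively on “.”, “!” and “?”. Paragraph-level proximity (<*>) would follow the same pattern with a paragraph split instead.

```python
import re

def words(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

def within_n_words(text, a, b, n):
    """True if words a and b occur within n word positions of each other.

    Assumption: the order of a and b does not matter.
    """
    tokens = words(text)
    pos_a = [i for i, w in enumerate(tokens) if w == a.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def same_sentence(text, a, b):
    """True if words a and b occur in the same sentence (naive split)."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = words(sentence)
        if a.lower() in tokens and b.lower() in tokens:
            return True
    return False

email = "The fraud appears in the report. A bribe went to an official."
print(within_n_words(email, "fraud", "report", 5))    # True (4 positions apart)
print(within_n_words(email, "fraud", "official", 5))  # False
print(same_sentence(email, "bribe", "official"))      # True
print(same_sentence(email, "fraud", "bribe"))         # False
```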
1. Investigating various forms of financial misconduct:
embezzle* OR fraud* OR money<5>launder*
This search would find mentions of embezzlement, fraud, and money laundering, including variations of these terms.
2. Searching for potentially altered documents:
(modif* OR chang* OR alter*)<.>(document* OR file* OR record*)
This would find sentences mentioning modifications to documents, files, or records.
3. Identifying communication about insider information:
(insider<*>trading) OR (material<5>nonpublic<5>information)
This search would find paragraphs mentioning insider trading or discussions of material nonpublic information within close proximity.
4. Investigating international communications:
@*.?? OR @*.??? OR @*.????
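This would match email addresses with two-letter (.us, .uk), three-letter (.com, .org), or four-letter (.info) top-level domains. Because two-letter top-level domains are country codes, @*.?? on its own narrows the results to country-specific domains.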
5. Searching for potential code words or deliberate misspellings:
th?ft OR fr??d
This could catch attempts to obfuscate discussions of theft or fraud.
Wildcards significantly enhance the power and flexibility of your searches, allowing you to cast a wider net in your investigations while still maintaining precision. They’re particularly valuable when dealing with large datasets where you may not know the exact phrasing used in relevant communications.
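The behavior of the * and ? wildcards used in the examples above can be approximated by translating a wildcard term into a regular expression. This Python sketch assumes that * matches zero or more characters, ? matches exactly one character, matching is case-insensitive, and a term must match a whole word; the function names are hypothetical, and Aid4Mail’s own matcher (including the <e>-style character-variant wildcards) is not modeled here.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard term into a case-insensitive regex.

    Assumed semantics (illustrative only): '*' matches zero or more
    characters and '?' matches exactly one character.
    """
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE)

def term_matches(pattern, text):
    """Return True if any word in text matches the wildcard pattern in full."""
    rx = wildcard_to_regex(pattern)
    # Split the text into simple word tokens (letters, digits, underscore).
    return any(rx.fullmatch(word) for word in re.findall(r"\w+", text))

print(term_matches("litigat*", "The litigation is ongoing."))  # True
print(term_matches("fr??d", "Possible FRAUD detected"))        # True
print(term_matches("th?ft", "No thft here"))                   # False
```

Note that ? requires exactly one character, which is why “thft” does not match th?ft.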
Search operators in Aid4Mail allow you to refine your searches by specifying which parts of an email to search or by setting specific criteria. These operators are crucial for targeted investigations and efficient eDiscovery processes.
These operators allow you to search within specific folders or types of folders.
Example in forensics:
FolderName:"Project X" AND Subject:confidential
This would find emails with “confidential” in the subject line, searching only within the “Project X” folder.
These operators allow you to filter emails based on the parties involved in the communication.
Example in eDiscovery:
Participants:ceo@company.com AND Subject:(merger OR acquisition)
This would find all emails involving the CEO that have “merger” or “acquisition” in the subject line.
These operators allow you to search within specific time frames.
Aid4Mail uses the international date format (YYYY-MM-DD, as in 2023-04-14).
Note that double quotes are required around dates that include a time, because of the space character. Date/time search operators can also use comparison operators:
To specify a date range in Aid4Mail, use two separate conditions combined with the AND operator. For example:
Sent>=2023-04-14 AND Sent<=2024-03-21
This would search for emails sent between April 14, 2023, and March 21, 2024, inclusive.
⚠ Unlike Gmail and Microsoft 365, you cannot use the “..” notation to define a date range.
You can also search for partial dates:
Sent:2023-06
This would match all emails sent in June 2023.
Received:2023
This would match all emails received in the year 2023.
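The inclusive date range and the partial-date matches described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Aid4Mail’s code: date-only bounds are assumed to be compared at day granularity (so an email sent late on the end date still matches), and a partial date such as 2023 or 2023-06 is treated as a prefix of the YYYY-MM-DD date. The function names are hypothetical.

```python
from datetime import date, datetime

def sent_in_range(sent, start, end):
    """Mimics Sent>=start AND Sent<=end, compared by calendar day (inclusive)."""
    day = sent.date()
    return date.fromisoformat(start) <= day <= date.fromisoformat(end)

def sent_matches_partial(sent, partial):
    """Mimics Sent:2023 or Sent:2023-06 by prefix-matching the ISO date."""
    return sent.date().isoformat().startswith(partial)

print(sent_in_range(datetime(2024, 3, 21, 17, 45), "2023-04-14", "2024-03-21"))  # True
print(sent_in_range(datetime(2023, 4, 13, 23, 59), "2023-04-14", "2024-03-21"))  # False
print(sent_matches_partial(datetime(2023, 6, 30, 8, 0), "2023-06"))              # True
print(sent_matches_partial(datetime(2023, 7, 1, 9, 0), "2023-06"))               # False
```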
Example in forensics:
Sent>=2023-01-01 AND Sent<=2023-03-31 AND From:suspect@company.com AND Subject:confidential
This would find emails with “confidential” in the subject line sent by a suspect during the first quarter of 2023.
Additional date-related operators include:
These date and time operators are crucial in forensics and eDiscovery for establishing timelines, identifying patterns of communication, and focusing investigations on specific periods of interest.
These operators allow you to search within specific parts of an email.
Example in eDiscovery:
Subject:(contract OR agreement) AND AttachmentNames:*.pdf
This would find emails with “contract” or “agreement” in the subject that have PDF attachments.
⚠ Note that the parentheses in the example serve two purposes:
These operators search for emails with specific attributes.
Example in forensics:
Type:Personal AND Size>5M AND AttachmentNames:*.zip
This would find personal emails (not bulk messages) larger than 5MB with zip attachments, which could be relevant in data exfiltration investigations.
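Conceptually, each attribute operator in the example is a separate test on the message, and all three must pass. The Python sketch below expresses Type:Personal AND Size>5M AND AttachmentNames:*.zip over a hypothetical message record; the field names are invented for illustration, “5M” is assumed to mean 5 mebibytes, and attachment-name matching is assumed to be case-insensitive.

```python
from fnmatch import fnmatchcase

FIVE_MB = 5 * 1024 * 1024  # assumption: "5M" is read as 5 mebibytes

def matches_exfiltration_filter(msg):
    """Mimics Type:Personal AND Size>5M AND AttachmentNames:*.zip."""
    return (
        msg["type"] == "Personal"
        and msg["size"] > FIVE_MB
        # Case-insensitive match on attachment names (assumed behavior).
        and any(fnmatchcase(name.lower(), "*.zip") for name in msg["attachments"])
    )

sample = {"type": "Personal", "size": 8_000_000, "attachments": ["Q3_Designs.ZIP"]}
print(matches_exfiltration_filter(sample))  # True
```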
These operators allow you to search for specific emails by their unique identifiers.
Example in eDiscovery:
MessageId:<abc123@company.mail.com> OR MessageId:<def456@company.mail.com>
This would retrieve specific emails identified by their Message-IDs, which could be useful when following up on particular pieces of evidence.
You can also write the above example like this:
MessageId:{<abc123@company.mail.com>|<def456@company.mail.com>}
The curly braces “{}” and vertical bar “|” notation can be used to shorten scripts when combining multiple “OR” criteria. See the Aid4Mail User Guide for more in-depth coverage of this syntax.
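One way to picture the curly-brace shorthand is as a text rewrite into the longer OR form. The Python sketch below assumes the braces directly follow a field prefix (as in MessageId:{…|…}) and do not nest; it wraps the expansion in parentheses so it keeps its grouping inside a larger query. It is a hypothetical helper for illustration, not how Aid4Mail parses this syntax.

```python
import re

def expand_alternatives(query):
    """Rewrite Field:{a|b|c} as (Field:a OR Field:b OR Field:c)."""
    def to_or_group(match):
        field = match.group(1)
        options = match.group(2).split("|")
        return "(" + " OR ".join(f"{field}:{opt}" for opt in options) + ")"
    # Field name, colon, then a brace group without nested braces.
    return re.sub(r"(\w+):\{([^{}]*)\}", to_or_group, query)

print(expand_alternatives(
    "MessageId:{<abc123@company.mail.com>|<def456@company.mail.com>}"
))
# (MessageId:<abc123@company.mail.com> OR MessageId:<def456@company.mail.com>)
```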
These search operators provide powerful tools for forensics experts and eDiscovery professionals to narrow down their searches and quickly locate relevant emails within large datasets. By combining these operators with search terms and wildcards, you can create highly specific and effective search queries.
Boolean operators are essential tools in Aid4Mail for combining or excluding search terms. They allow forensics experts and eDiscovery professionals to create complex, precise queries to pinpoint relevant emails within large datasets.
Aid4Mail also supports symbolic representations of Boolean operators:
These symbols can be used without spaces, making queries more concise:
Example: confidential+urgent-draft
This is equivalent to: confidential AND urgent AND NOT draft
The order in which Aid4Mail processes Boolean operators is important:
⚠ Note that this order is different from Google/Gmail’s, which gives OR precedence over AND.
Example #1:
Subject:contract OR agreement AND Sender:john.doe@aid4mail.com
The “AND” operator has higher precedence than “OR”. As a result, this search query will match emails that have the word “contract” in the subject line, as well as emails from john.doe@aid4mail.com that have the word “agreement” anywhere in the email. In other words, this search query is equivalent to:
Subject:contract OR (agreement AND Sender:john.doe@aid4mail.com)
Example #2:
Subject:(contract OR agreement) AND Sender:john.doe@aid4mail.com
This query will match emails sent by john.doe@aid4mail.com that have either “contract” or “agreement” in the subject line.
⚠ Understanding this order is crucial for creating accurate, complex search queries.
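To see why Example #1 groups the way it does, here is a minimal recursive-descent parser in Python that gives AND (explicit, or implied by adjacent terms such as lawsuit settlement) higher precedence than OR, with NOT binding tightest. It is a sketch of the precedence rule only, not Aid4Mail’s parser: it treats each whitespace-separated token such as Subject:contract as an opaque term and does not handle quoted phrases, XOR, symbolic operators, or field-scoped groups like Subject:(…).

```python
import re

def tokenize(query):
    """Split into parentheses and whitespace-separated terms/operators."""
    return re.findall(r"\(|\)|[^\s()]+", query)

class Parser:
    def __init__(self, query):
        self.tokens = tokenize(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def or_expr(self):
        # Lowest precedence: OR joins AND-groups.
        node = self.and_expr()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.and_expr())
        return node

    def and_expr(self):
        # Explicit AND, or implicit AND between adjacent terms.
        node = self.not_expr()
        while self.peek() not in (None, "OR", ")"):
            if self.peek() == "AND":
                self.take()
            node = ("AND", node, self.not_expr())
        return node

    def not_expr(self):
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.not_expr())
        return self.atom()

    def atom(self):
        tok = self.take()
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

print(Parser("Subject:contract OR agreement AND Sender:john.doe@aid4mail.com").parse())
# ('OR', 'Subject:contract', ('AND', 'agreement', 'Sender:john.doe@aid4mail.com'))
```

The printed tree is identical to the one produced for the explicitly parenthesized form, Subject:contract OR (agreement AND Sender:john.doe@aid4mail.com).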
1. Investigating potential insider trading:
(("insider information" OR "material nonpublic") AND (trade OR stock OR share)) AND NOT (newsletter OR "press release")
This query looks for discussions of insider information or material nonpublic information in relation to trades, stocks, or shares, while excluding newsletters and press releases.
2. Examining communication patterns in a fraud case:
Date>=2023-01-01 AND Date<=2023-06-30 AND (From:suspect@company.com OR To:suspect@company.com) AND (money OR payment OR transfer)
This query focuses on emails to or from a suspect, involving financial terms, within a specific six-month period.
3. Identifying potential data breaches:
("data leak" OR "information breach" OR "unauthorized access") AND (customer OR client OR patient) AND NOT (drill OR test OR exercise)
This search looks for mentions of data breaches involving customer, client, or patient information, while excluding emails about security drills or tests.
4. Investigating intellectual property theft:
("trade secret" OR patent OR copyright) AND (steal OR theft OR misappropriate OR "unauthorized use") AND NOT legal
This query searches for discussions about stealing or misusing intellectual property, while excluding any email that contains the word “legal” (for example, legal department emails discussing these terms in a different context).
5. Examining conflicts of interest:
("conflict of interest" OR "competing interest" OR "personal benefit") AND (disclosure OR report OR notify) XOR conceal
This complex query matches emails that either discuss conflicts of interest in relation to disclosure or reporting, or mention “conceal”, but not both.
By leveraging these Boolean operators and understanding their order of precedence, forensics and eDiscovery professionals can craft highly specific search queries. This allows for efficient filtering of large email datasets, helping to quickly identify the most relevant communications for an investigation or legal proceeding.
Proximity searching allows you to find words or phrases that appear near each other in an email. This is crucial for identifying relevant conversations and context in investigations.
Example in forensics:
(trade<5>secret) AND (steal<.>proprietary)
This query would find emails in which “trade” and “secret” appear within five words of each other, and “steal” and “proprietary” appear in the same sentence.
Searching by email type can be significantly more powerful than it may initially seem. Three common examples of this are:
Deduplication is the process of eliminating duplicate emails from your search results. This is crucial for efficiency in large-scale investigations.
To skip duplicates in Aid4Mail, use the following search term:
NOT Type:Duplicate
or its shorter form:
-Type:Duplicate
Example in eDiscovery:
(contract OR agreement) AND NOT Type:Duplicate
This would find unique emails about contracts or agreements, eliminating duplicates to streamline review.
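The exact criteria Aid4Mail uses to flag duplicates are not described in this article. As a generic illustration of the idea behind -Type:Duplicate, this Python sketch keeps the first occurrence of each message and skips later ones with the same fingerprint, using a SHA-256 hash over a hypothetical choice of fields.

```python
import hashlib

def dedup_key(msg):
    """SHA-256 over a hypothetical choice of identifying fields."""
    fields = ("message_id", "from", "date", "subject", "body")
    joined = "\x1f".join(str(msg.get(f, "")) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def skip_duplicates(messages):
    """Yield each message the first time its key is seen."""
    seen = set()
    for msg in messages:
        key = dedup_key(msg)
        if key not in seen:
            seen.add(key)
            yield msg

batch = [
    {"message_id": "<a1@example.com>", "subject": "Contract", "body": "Draft v2"},
    {"message_id": "<a1@example.com>", "subject": "Contract", "body": "Draft v2"},  # same email in another folder
    {"message_id": "<b2@example.com>", "subject": "Agreement", "body": "Signed"},
]
print(len(list(skip_duplicates(batch))))  # 2
```

Which fields go into the fingerprint is a design decision: hashing more fields makes the test stricter, while hashing fewer catches more near-copies.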
Unpurged mail refers to emails that have been deleted but not permanently removed from the system. These can be crucial in forensic investigations.
To include only unpurged mail:
Type:Unpurged
To exclude unpurged mail:
NOT Type:Unpurged
or
-Type:Unpurged
Example in forensics:
Type:Unpurged AND Date>=2023-01-01 AND Date<=2023-06-30 AND From:suspect@company.com
This would search for unpurged emails from a suspect within a specific date range, potentially uncovering deleted evidence.
Aid4Mail classifies emails as personal based on specific criteria, separating them from newsletters, marketing emails, and automated notifications.
To include only personal emails:
Type:Personal
To exclude personal emails:
NOT Type:Personal
or
-Type:Personal
Example in eDiscovery:
Date>=2023 AND Type:Personal AND (confidential OR proprietary)
This query focuses on personal communications (excluding bulk emails) from 2023 onwards that mention “confidential” or “proprietary”.
Email-type search terms can be combined for highly targeted results:
Date>=2023-01-01 AND Date<=2023-06-30 AND Type:Unpurged AND NOT Type:Duplicate AND Type:Personal AND (insider<5>trading OR material<.>nonpublic) AND (From:executive1@company.com OR From:executive2@company.com)
This complex query:
These advanced filtering techniques allow forensics and eDiscovery professionals to create highly targeted searches, improving the efficiency and effectiveness of their investigations. By combining filtering techniques, investigators can quickly isolate the most relevant communications from large email datasets, even when dealing with deleted items or attempting to separate personal communications from automated messages.
Tokenization and stemming are advanced linguistic processing techniques that Aid4Mail employs to enhance search capabilities. These features are particularly valuable for forensics and eDiscovery professionals dealing with large volumes of email data where variations in language use can be critical to investigations.
Tokenization in Aid4Mail is the process of recognizing and matching similar lexical units (both characters and whole words) within text. It can be turned on or off using the “Tokenize” option, located directly under the “Search Query” field in the “Item filtering” settings.
1. Matching variations in company names:
Acme Corp
This could match “Acme Corp”, “Acme Corp.”, “ACME CORP”, etc.
2. Handling different spellings:
naïve
This would match both “naive” and “naïve”.
3. Dealing with punctuation variations in sensitive terms:
trade-secret
This could match “trade secret”, “trade-secret”, “tradesecret”.
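A rough Python analogue of the tokenization examples above is to normalize both the query and the text before comparing them: decompose accented characters and ligatures, drop the combining accent marks, fold case, and treat punctuation as a word separator. This is a sketch of the general technique, not Aid4Mail’s tokenizer; in particular, it does not join “trade secret” into “tradesecret”.

```python
import re
import unicodedata

def normalize(text):
    """Case-fold, strip accents, and reduce punctuation to single spaces."""
    # NFKD splits "ï" into "i" plus a combining mark and expands ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = no_marks.casefold()
    return " ".join(re.findall(r"\w+", folded))

print(normalize("naïve") == normalize("naive"))          # True
print(normalize("ACME CORP.") == normalize("Acme Corp"))  # True
print(normalize("trade-secret"))                          # trade secret
```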
Stemming finds words with the same root as the specified word. A dictionary for stemming can be set in the “Filters” section of the “Project Settings”.
1. Investigating financial misconduct:
embezzle~
This would match “embezzle”, “embezzled”, “embezzling”, etc.
2. Searching for communication about theft:
steal~
This would find “steal”, “stole”, “stolen”, etc.
3. Identifying discussions about legal matters:
sue~
This could match “sue”, “sued”, and “suing”.
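Because Aid4Mail’s stemming uses a dictionary, it can relate irregular forms such as “stole” to “steal”, which simple suffix-stripping rules would not. The Python sketch below illustrates dictionary-based matching for term~ with a tiny, hypothetical stem dictionary; a real stemming dictionary would be far larger, and the helper names are invented for illustration.

```python
import re

# Hypothetical stemming dictionary: each word form maps to its root.
STEMS = {
    "embezzle": "embezzle", "embezzled": "embezzle", "embezzling": "embezzle",
    "steal": "steal", "stole": "steal", "stolen": "steal",
    "sue": "sue", "sued": "sue", "suing": "sue",
}

def root(word):
    """Look up a word's root; unknown words are their own root."""
    return STEMS.get(word.lower(), word.lower())

def stem_match(term, text):
    """Mimics term~: True if any word in text shares the term's root."""
    target = root(term.rstrip("~"))
    return any(root(w) == target for w in re.findall(r"\w+", text))

print(stem_match("steal~", "The files were stolen last week"))  # True
print(stem_match("sue~", "They are suing the vendor"))          # True
print(stem_match("sue~", "Ask Susan about it"))                 # False
```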
When both tokenization and stemming are enabled, Aid4Mail provides powerful linguistic processing capabilities. It handles variations in spelling, punctuation, and word forms, increasing the chances of finding relevant information even when the exact phrasing is unknown.
By effectively leveraging tokenization and stemming, forensics and eDiscovery professionals can create more comprehensive and nuanced search strategies, potentially uncovering relevant communications that might be missed by exact-match searches alone. However, it’s crucial to balance the broader reach these techniques provide with the precision required in legal and investigative contexts.
Search lists in Aid4Mail are a powerful feature that allows forensics and eDiscovery professionals to manage and reuse large sets of search terms efficiently. They are particularly useful when dealing with complex investigations involving numerous keywords, email addresses, EDRM MIH values, or other identifiers.
{SearchList=KeywordList.txt}
{SearchList=C:\Investigations\Case123\KeywordList.txt}
Search lists support various advanced search techniques:
These techniques allow for powerful and flexible searches directly from your search lists.
Modify search list behavior by adding properties to the first line, prefixed with an exclamation mark (!). If a property is absent, its default value will be used. Available properties include:
Example of a search list file with properties:
!CASE=SENSITIVE OPERATOR=OR STEMMING=NO
Trade Secret
Proprietary Information
Confidential Data
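To illustrate how such a file is structured, this Python sketch splits a search list into its first-line properties and its terms, then joins the terms with the list’s OPERATOR. It is a hypothetical reader, not Aid4Mail’s: the default operator used when OPERATOR is absent is an assumption made for this sketch only, and quoting multi-word entries as phrases is likewise assumed.

```python
def parse_search_list(text):
    """Split a search list into (properties, terms).

    A first line starting with "!" holds KEY=VALUE properties; every
    other non-blank line is one search term.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    props = {}
    if lines and lines[0].startswith("!"):
        for pair in lines[0][1:].split():
            key, _, value = pair.partition("=")
            props[key.upper()] = value.upper()
        lines = lines[1:]
    return props, lines

def to_query(props, terms):
    """Join terms with the list's operator, quoting multi-word phrases."""
    operator = props.get("OPERATOR", "OR")  # assumed default for this sketch
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return f" {operator} ".join(quoted)

example = """!CASE=SENSITIVE OPERATOR=OR STEMMING=NO
Trade Secret
Proprietary Information
Confidential Data
"""
props, terms = parse_search_list(example)
print(props)  # {'CASE': 'SENSITIVE', 'OPERATOR': 'OR', 'STEMMING': 'NO'}
print(to_query(props, terms))
# "Trade Secret" OR "Proprietary Information" OR "Confidential Data"
```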
1. List of Suspects:
Subject:(confidential OR classified) AND {SearchList=SuspectEmails.txt}
Searches for emails from suspects with specific subject terms.
2. Industry-Specific Terminology:
Type:Personal AND {SearchList=FinancialTerms.txt}
Finds personal emails containing financial terms from your list.
3. Code Names in Corporate Investigations:
From:executive@company.com AND {SearchList=ProjectCodeNames.txt}
Searches for executive emails mentioning project code names.
4. Combining Multiple Lists:
Date>=2023 AND ({SearchList=SuspectEmails.txt}) AND ({SearchList=SensitiveKeywords.txt})
Combines lists of suspects and sensitive keywords, focusing on communications from 2023 onwards.
5. Exclusion List:
("trade secret" OR confidential) AND NOT {SearchList=ExclusionTerms.txt}
Finds emails about trade secrets or confidential information, excluding any email that matches a term in the exclusion list.
By leveraging search lists effectively, forensics and eDiscovery professionals can create more flexible, maintainable, and powerful search strategies in Aid4Mail. This enhances the ability to efficiently process large volumes of email data and adapt to the specific needs of each investigation.
This is only a brief overview of what you can do with search lists. Please refer to the Aid4Mail User Guide to learn more.
1. Query Structure: For optimal results, structure your search queries in the following order.
a. Start with date restrictions (Date:, Received:, Newer_than:, etc.) and folder restrictions (In:, Label:, FolderName:).
b. Follow with metadata filters (Is:Replied, Type:Personal, -Type:Duplicate).
c. Then header field searches (From:, To:, Subject:).
d. Next, whole-header and message-body searches (Header:, SenderMessage:, Message:).
e. Finally, broad searches that include the contents of attachments.
Note that you should always place broad content searches, especially those using proximity operators, last.
Example:
Date>=2023-01-01 AND Date<=2023-12-31 AND Type:Personal AND NOT Type:Duplicate AND From:executive@company.com AND (confidential<10>project)
2. Use Folder Filters: If you’re only interested in specific folders, use folder filters early in your query to reduce the dataset.
3. Leverage Search Lists: For complex investigations with many search terms, use search lists to manage your queries more efficiently.
4. Incremental Searching: Start with broader searches and progressively narrow down results, especially in large datasets.
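The advice to put narrow, cheap criteria first follows the general principle of short-circuit evaluation: once one AND-ed condition fails, the remaining ones need not be checked. The Python sketch below demonstrates that principle with all(), counting how often a stand-in “expensive” body search runs; it assumes Aid4Mail benefits from ordering in a similar way, as the article states, but does not model its internals. The message records and predicate names are hypothetical.

```python
from datetime import date

calls = {"body": 0}

def sent_in_2023(msg):
    # Cheap check on a single metadata field.
    return date(2023, 1, 1) <= msg["sent"] <= date(2023, 12, 31)

def body_mentions_confidential(msg):
    # Stand-in for an expensive full-text scan; counts how often it runs.
    calls["body"] += 1
    return "confidential" in msg["body"].lower()

def run_filter(messages, predicates):
    # all() stops evaluating at the first predicate that returns False.
    return [m for m in messages if all(p(m) for p in predicates)]

messages = [
    {"sent": date(2021, 5, 1), "body": "Confidential draft"},
    {"sent": date(2023, 2, 9), "body": "This is confidential"},
    {"sent": date(2022, 8, 3), "body": "Lunch plans"},
]

cheap_first = run_filter(messages, [sent_in_2023, body_mentions_confidential])
cheap_calls = calls["body"]

calls["body"] = 0
costly_first = run_filter(messages, [body_mentions_confidential, sent_in_2023])
costly_calls = calls["body"]

print(len(cheap_first), cheap_calls)    # 1 1
print(len(costly_first), costly_calls)  # 1 3
```

Both orders return the same result; only the amount of work differs, and that difference grows with the size of the dataset.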
1. Segmented Approach: For extremely large datasets, consider breaking your search into segments (e.g., by date ranges or sender domains).
2. Use of Exclusions: Sometimes it’s more efficient to exclude irrelevant data first before searching for relevant content.
Example:
NOT Type:Newsletter AND NOT From:*@marketing.com AND (confidential OR proprietary)
3. Deduplication: Always consider using NOT Type:Duplicate to eliminate redundant data, unless you specifically need to examine duplicates.
1. Custodian Focus: Use the From: and To: operators effectively to focus on key custodians.
Example:
(From:custodian1@company.com OR To:custodian1@company.com) AND Subject:(contract OR agreement)
2. Privileged Content: Develop robust search lists for identifying potentially privileged content early in the process.
3. Date Ranges: Be precise with date ranges to align with the scope of discovery requests.
4. Iterative Process: Work closely with legal teams to refine searches based on initial results and emerging case strategy.
By following these best practices and tips, forensics and eDiscovery professionals can maximize the effectiveness of their Aid4Mail searches, ensuring more thorough, efficient, and defensible investigations. Remember that the key to successful email analysis often lies in the iterative refinement of search strategies based on initial findings and evolving case requirements.
This article has provided an overview of the most important elements of Aid4Mail’s filter syntax. But there’s much more, including powerful, time-saving features like native pre-acquisition filtering. For full details, please refer to the Aid4Mail User Guide.