EDRM DupeID and MIH Support in Aid4Mail

Aid4Mail is the leading email processing tool for eDiscovery and digital forensics and offers extensive support for the EDRM Message Identification Hash (EDRM MIH) and Duplicate Identification (DupeID) specification.

Trusted by legal professionals, Aid4Mail quickly and accurately collects, searches, and converts email from numerous sources. It excels at processing large datasets, recovering deleted messages, finding specific emails, and outputting to review-friendly formats.

Cross-Platform Email Deduplication

Incompatible Proprietary Methods

In a digital forensics or eDiscovery project involving multiple custodians and various email platforms, duplicate email messages can significantly increase the volume of data to be reviewed. Specialized products have offered deduplication across their own datasets for many years. However, these products used proprietary techniques, making it impossible to deduplicate across datasets processed with tools from different vendors.

Historically, the only solution to this was to reprocess all data with a single platform to ensure compatibility, a time-consuming and economically inefficient process. This was the problem that the Electronic Discovery Reference Model (EDRM) set out to solve.

The DupeID Project and EDRM MIH

In February 2023, the EDRM Duplicate Identification (DupeID) project proposed a standardized, cross-platform method for generating unique identifiers for email messages. The method produces a value called the EDRM Message Identification Hash (EDRM MIH).

The EDRM MIH is simple but effective: an MD5 hash calculated from the Message-ID field found in an email's SMTP header. (MD5 is a widely used cryptographic hash function that produces a 128-bit value. It is commonly used to verify the integrity of files and to generate identifiers for data.)
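As a rough illustration of the idea, the following Python sketch hashes a Message-ID value with MD5. The function name edrm_mih is hypothetical, and the normalization shown (trimming whitespace and the surrounding angle brackets, UTF-8 encoding, lowercase hex output) is an assumption rather than something stated here, so check the EDRM specification before comparing digests with another tool.

```python
import hashlib
from typing import Optional


def edrm_mih(message_id: str) -> Optional[str]:
    """Return an MD5 hex digest of a Message-ID, or None if it is empty.

    Assumption: surrounding whitespace and angle brackets are removed
    before hashing, and the value is encoded as UTF-8.
    """
    normalized = message_id.strip().strip("<>").strip()
    if not normalized:
        # No usable Message-ID, so no hash can be derived from it.
        return None
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(edrm_mih("<1234.abcd@mail.example.com>"))
```

Because the digest depends only on the Message-ID, two tools that apply the same normalization should produce the same value for copies of the same message, regardless of mailbox format.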

The EDRM MIH is intended to exist alongside proprietary deduplication techniques rather than to replace them. It allows messages to be consistently identified across systems for purposes like deduplication, selection, and cross-referencing in the context of eDiscovery and digital forensics.

EDRM MIH and Duplicate Identification in Aid4Mail

Since its initial release in 2005, Aid4Mail has offered email deduplication based on the algorithm that, almost twenty years later, would be adopted by the EDRM. Since the February 2024 release of Aid4Mail 5.1.6, it has offered extensive support for the EDRM MIH and DupeID specification, not only for deduplication but also for metadata extraction, search and filtering, file naming, and review document content.

Email Deduplication Based on EDRM MIH Values

Identifying duplicate emails within a single dataset greatly reduces the need for manual review. Deduplication across datasets can improve efficiency even further. This has been possible with Aid4Mail since version 1.0. Now that the EDRM Duplicate Identification standard exists and more vendors are supporting it, forensic examiners and eDiscovery professionals can also deduplicate across data processed with different tools.

For incoming emails with a Message-ID value, Aid4Mail’s deduplication method is identical to the EDRM Duplicate Identification specification. For emails lacking a Message-ID value, typically drafts and outgoing messages, the EDRM MIH will result in a null value. Deduplicating these emails requires a different approach, so Aid4Mail generates an MD5 hash from other metadata values in the email header. When possible, Aid4Mail uses a concatenation of the sender, date, and subject as the MD5 hash source: MD5(sender + date + subject). If any of those values is missing, Aid4Mail uses the entire header: MD5(email header).

To obtain the email sender, date, and subject, Aid4Mail can draw from multiple email header fields which it checks in a specific order:

  • The sender is taken from the From field if available, otherwise from the Sender field, otherwise from the Reply-To field.
  • The date is taken from the Date field if available, otherwise from the most recent (topmost) Received field.
  • The subject is taken from the Subject field.

The result is fast and reliable deduplication that, for incoming emails with a Message-ID, also works cross-platform across disparate vendors that support the EDRM MIH.
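The fallback logic described above can be sketched in Python as follows. This is only an approximation under stated assumptions: the function names are hypothetical, the values are concatenated without separators or further normalization, and the date in a Received field is taken to be the text after the last semicolon. Aid4Mail's exact normalization is not documented here, so digests from this sketch should not be expected to match Aid4Mail's output.

```python
import hashlib
from email import message_from_string
from email.message import Message
from typing import List, Optional


def _first_value(msg: Message, names: List[str]) -> Optional[str]:
    """Return the first non-empty header value among the given field names."""
    for name in names:
        value = msg.get(name)  # get() returns the first (topmost) occurrence
        if value and value.strip():
            return value.strip()
    return None


def _date_value(msg: Message) -> Optional[str]:
    """Date field if present, otherwise the date part of the topmost Received."""
    date = _first_value(msg, ["Date"])
    if date:
        return date
    received = _first_value(msg, ["Received"])
    if received and ";" in received:
        # Assumption: the timestamp follows the last semicolon.
        return received.rsplit(";", 1)[1].strip() or None
    return None


def dedup_key(raw_header: str) -> str:
    """Return an MD5 hex digest usable as a deduplication key."""
    msg = message_from_string(raw_header)
    message_id = (msg.get("Message-ID") or "").strip().strip("<>")
    if message_id:
        source = message_id  # same source as the EDRM MIH
    else:
        sender = _first_value(msg, ["From", "Sender", "Reply-To"])
        date = _date_value(msg)
        subject = _first_value(msg, ["Subject"])
        if sender and date and subject:
            source = sender + date + subject
        else:
            source = raw_header  # last resort: hash the entire header
    return hashlib.md5(source.encode("utf-8")).hexdigest()
```

Checking the Message-ID first keeps the key compatible with other EDRM MIH implementations whenever that field exists. The fallback keys are only meaningful within a single tool.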

Extracting EDRM MIH Values as Metadata

With Aid4Mail, you can extract almost any metadata field from emails. Save them to CSV or XML formats for easy comparison and deduplication across platforms. You can also extract a variety of email hashes, including the EDRM MIH. This is done by selecting the EDRM.MIH token in Aid4Mail’s Column Configuration Editor.
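Once hash values have been exported, comparing two datasets comes down to a set intersection. The Python sketch below assumes each CSV export has a header row with a column named EDRM.MIH. That column name, the function names, and the case-insensitive comparison of hex digests are assumptions for illustration, not a description of Aid4Mail's actual export layout.

```python
import csv
import io
from typing import Set, TextIO


def load_mih_values(csv_file: TextIO, column: str = "EDRM.MIH") -> Set[str]:
    """Collect the non-empty hash values from one CSV export."""
    reader = csv.DictReader(csv_file)
    values = set()
    for row in reader:
        value = (row.get(column) or "").strip().lower()
        if value:  # rows without a hash (null MIH) are skipped
            values.add(value)
    return values


def shared_mih_values(csv_a: TextIO, csv_b: TextIO, column: str = "EDRM.MIH") -> Set[str]:
    """Return hash values that occur in both exports (cross-dataset duplicates)."""
    return load_mih_values(csv_a, column) & load_mih_values(csv_b, column)


if __name__ == "__main__":
    a = io.StringIO("Subject,EDRM.MIH\nHello,AAA111\nDraft,\n")
    b = io.StringIO("EDRM.MIH,Custodian\naaa111,Smith\nbbb222,Jones\n")
    print(shared_mih_values(a, b))
```

With real exports, the io.StringIO objects would be replaced by files opened with open(path, newline="", encoding="utf-8").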

Search and Filtering on EDRM MIH Values

Aid4Mail Investigator and Aid4Mail Enterprise include very sophisticated search and filtering capabilities. They allow you to search not only email content and metadata but also attachments and embedded files. You can create complex search queries using:

  • Fielded searches
  • Boolean constructs
  • Comparison operators
  • Proximity searches (including some unique ones!)
  • Wildcards and Regular Expressions
  • Multilingual stemming and tokenization
  • And more…

The MIH operator, one of Aid4Mail’s fielded search operators, enables you to find specific EDRM MIH values. This is potentially very powerful. For example, use a list of EDRM MIH values from another vendor’s dataset as a search list in Aid4Mail. This would allow you to reject any matches (deduplication), accept them, or apply further conditions to processing.

In addition, because Aid4Mail Investigator and Enterprise support multiple, concurrent instances of the processing engine, you could deduplicate a whole project against datasets from several vendors all at once!

Note that access to the MIH search operator requires the Investigator or Enterprise edition of Aid4Mail. The Converter edition only supports basic filtering.

File Naming Using EDRM MIH Values

When converting emails to .eml and .msg formats, Aid4Mail can create file names based on EDRM MIH values. This ensures file names are consistent and limited in length. It also enables files to be identified, grouped, sorted, or even deduplicated, based on their MIH value, using operating system tools like File Explorer in Windows or Finder in macOS.

Aid4Mail uses the EDRM MIH specification for any email that contains a Message-ID value. However, for drafts, outgoing messages, and incoming emails lacking a Message-ID value, Aid4Mail instead generates an MD5 hash from other metadata in the email header. This avoids a null EDRM MIH value. The algorithm is the same one detailed in Email Deduplication Based on EDRM MIH Values, above.

To name .eml and .msg files in this manner, simply select the Use MD5 signature option in Aid4Mail’s File name field.

Inserting EDRM MIH Values into Review Documents

In documents submitted for evidence or review, it can be useful to include an email’s EDRM MIH value directly. This can help eliminate confusion and speed up the review process.

When Aid4Mail creates review documents in PDF, HTML, or plain text formats, it allows you to select exactly which parts of the email to include and where to place them in the document. This includes the EDRM MIH value, which can be inserted as an X-EDRM-MIH email header field. To do this, simply select that field in Aid4Mail’s Email Header Configuration Editor.
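To show what such a header field amounts to, here is a minimal Python sketch that adds an X-EDRM-MIH field to a message object using the standard library. It is not how Aid4Mail implements the feature. The function name is hypothetical, and the Message-ID normalization (angle brackets and whitespace removed, UTF-8, lowercase hex) is an assumption.

```python
import hashlib
from email.message import EmailMessage


def with_mih_header(msg: EmailMessage) -> EmailMessage:
    """Add an X-EDRM-MIH header derived from the Message-ID, if one exists."""
    message_id = (msg["Message-ID"] or "").strip().strip("<>")
    if message_id and "X-EDRM-MIH" not in msg:
        digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
        msg["X-EDRM-MIH"] = digest
    return msg


if __name__ == "__main__":
    m = EmailMessage()
    m["Message-ID"] = "<1234.abcd@mail.example.com>"
    m["Subject"] = "Quarterly report"
    print(with_mih_header(m)["X-EDRM-MIH"])
```

Messages without a Message-ID are left unchanged in this sketch, which mirrors the null EDRM MIH value described earlier.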

Experience the Difference with Aid4Mail and EDRM MIH

By using Aid4Mail’s EDRM MIH features, legal teams, digital forensic examiners, and eDiscovery professionals can streamline their projects, minimize the risk of overlooking duplicate emails, and ultimately save time and costs associated with reviewing redundant data.

See for yourself how Aid4Mail can transform your email forensics and eDiscovery workflows. Download the free Aid4Mail trial today and unlock a new level of efficiency and precision.

With Aid4Mail, you’re not just choosing a tool—you’re transforming the way you handle email evidence. Start your journey today!

About Fookes Software

Fookes Software Ltd
La Petite Fin 27
1637 Charmey (en Gruyère)
Switzerland

For over 25 years we have been developing award-winning tools and productivity software. We also have more than 20 years of expertise in the field of email processing and analysis.

Our clients include Fortune 500 companies, government agencies, law firms, universities, and professionals specializing in e-discovery and forensics from around the world.
