DMARC Report Parser Open Source

DMARC is a critical email authentication protocol designed to protect your domain from impersonation and phishing. It builds on SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail) by providing a framework for domain owners to instruct recipient mail servers on how to handle unauthenticated emails and, crucially, to receive reports on email authentication failures.

These reports, known as aggregate (RUA) reports, are XML files sent by recipient mail servers. They contain invaluable data: which IPs are sending mail purporting to be from your domain, whether SPF and DKIM passed, and, most importantly for troubleshooting, whether they aligned with your From domain. However, these reports are notoriously difficult to read and analyze manually. They are large and verbose, they arrive from many different receivers, and they are usually compressed (.zip or .gz). This is where DMARC report parsers come in.

Why Open Source DMARC Parsers?

For engineers, the appeal of open source is strong. It offers transparency, control, and often, cost savings. When dealing with something as sensitive as email security and data about your sending infrastructure, being able to inspect the code, understand how data is processed, and even contribute fixes or custom features is a significant advantage.

An open-source DMARC parser allows you to:

  • Own your data: Store your DMARC report data in your own infrastructure, giving you full control over privacy and retention.
  • Customize: Adapt the parser to your specific needs, integrate it with existing monitoring systems, or add custom reporting features.
  • Audit: Examine the parsing logic to ensure accuracy and understand exactly how results are being interpreted.
  • Avoid vendor lock-in: Should a commercial service no longer meet your needs, you have a self-hosted alternative.

While commercial DMARC services offer convenience and advanced features, understanding and potentially deploying an open-source solution provides a deeper insight into the underlying mechanics.

Understanding DMARC Alignment Failures

Before diving into tools, let's clarify the core problem DMARC helps you identify: alignment failures.

DMARC compares the domain used in the From header (the one users see) with the domains authenticated by SPF and DKIM. A message passes DMARC when at least one of the two mechanisms both passes and aligns with the From domain. This alignment can be either "strict" or "relaxed."

  • SPF Alignment: For SPF to align, the domain in the Return-Path header (also known as the MAIL FROM or envelope sender) must match the domain in the From header.

    • Strict alignment: The domains must be an exact match, e.g., example.com in Return-Path and example.com in From.
    • Relaxed alignment: The two domains only need to share the same organizational domain, so either one may be a subdomain. E.g., bounce.example.com in Return-Path aligns with example.com in From.
  • DKIM Alignment: For DKIM to align, the domain specified in the d= tag of the DKIM signature must match the domain in the From header.

    • Strict alignment: The domains must be an exact match. d=example.com aligns with From: user@example.com.
    • Relaxed alignment: The d= domain only needs to share the same organizational domain as the From domain. E.g., d=marketing.example.com aligns with From: user@example.com.
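
The strict and relaxed rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and organizational_domain is a deliberately naive stand-in (it keeps the last two labels), whereas real receivers consult the Public Suffix List and would treat suffixes such as co.uk differently. The "s" and "r" mode values mirror the aspf and adkim tags of a DMARC record.

```python
def organizational_domain(domain: str) -> str:
    """Naive approximation: keep only the last two labels.

    Assumption for illustration only. Real receivers use the Public
    Suffix List, so this gives wrong answers for suffixes like co.uk.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(authenticated_domain: str, from_domain: str, mode: str = "r") -> bool:
    """Check whether an SPF or DKIM domain aligns with the From domain.

    mode is "s" for strict (exact match) or "r" for relaxed
    (same organizational domain).
    """
    auth = authenticated_domain.lower().rstrip(".")
    header = from_domain.lower().rstrip(".")
    if mode == "s":
        return auth == header
    return organizational_domain(auth) == organizational_domain(header)


print(is_aligned("bounce.example.com", "example.com"))           # relaxed: True
print(is_aligned("bounce.example.com", "example.com", "s"))      # strict: False
print(is_aligned("marketing.thirdparty.com", "yourdomain.com"))  # False
```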

Why do these fail? The most common reason is using third-party email sending services (marketing platforms, transactional email APIs, CRMs).

Example 1: SPF Alignment Failure with a Marketing Platform

You use marketing.thirdparty.com to send emails from yourdomain.com.

  • Your From header is user@yourdomain.com.
  • The marketing platform, by default, might set the Return-Path to bounces@marketing.thirdparty.com for bounce tracking.
  • SPF is evaluated against the Return-Path domain, so it passes because the sending IP is authorized by the SPF record of marketing.thirdparty.com. Whether the platform is included in the yourdomain.com SPF record makes no difference here.
  • However, marketing.thirdparty.com does not align with yourdomain.com in the From header, so SPF alignment fails. Unless the message also carries an aligned DKIM signature, it fails DMARC.

Example 2: DKIM Alignment Failure with a Transactional Email Service

You use api.transactional-email.com to send order confirmations from yourdomain.com.

  • Your From header is orders@yourdomain.com.
  • The service signs the email with its own DKIM key, so the d= tag in the signature is d=transactional-email.com.
  • DKIM passes because the signature validates against the public key published by transactional-email.com.
  • But transactional-email.com does not align with yourdomain.com in the From header, so DKIM alignment fails. Unless SPF passes and aligns, the message fails DMARC.

DMARC reports highlight these exact scenarios, showing you which sources are failing and why.

Open Source DMARC Parsers: Tools and Approaches

Setting up your own open-source DMARC parser typically involves:

  1. Configuring your DMARC record: Add rua=mailto:your_email@yourdomain.com to receive aggregate reports. Point this email address to a mailbox that your parser can access.
  2. Report ingestion: A script or application that fetches the XML reports (usually gzip- or zip-compressed) from the mailbox.
  3. Parsing: Extracting relevant data from the XML: source IP, SPF/DKIM results, alignment status, message counts, policy applied, etc.
  4. Storage: Saving the parsed data into a database (e.g., PostgreSQL, MySQL, SQLite).
  5. Reporting/Visualization: Tools to query and visualize the stored data.
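
As a rough illustration of the ingestion step, the sketch below turns one report attachment into raw XML bytes using only the Python standard library. The function name extract_report_xml is hypothetical, and the code assumes a .zip attachment holds a single XML member, which is typical but not guaranteed.

```python
import gzip
import io
import zipfile


def extract_report_xml(filename: str, payload: bytes) -> bytes:
    """Return the raw XML bytes contained in one report attachment.

    Handles .gz and .zip attachments; anything else is passed through
    unchanged on the assumption that it is already plain XML.
    """
    name = filename.lower()
    if name.endswith(".gz"):
        return gzip.decompress(payload)
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # Assumption: the archive holds exactly one XML report.
            first_member = archive.namelist()[0]
            return archive.read(first_member)
    return payload


# Usage: round-trip a tiny placeholder document through gzip.
sample = b"<feedback></feedback>"
print(extract_report_xml("report.xml.gz", gzip.compress(sample)) == sample)
```

A production ingester would also want to cap the decompressed size, since the attachments come from external senders.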

Let's look at some concrete examples:

1. dmarcian-community/dmarc-report-parser (Python)

This project, available on GitHub, is a Python-based tool designed to parse DMARC XML reports and store them in a database. It's relatively straightforward and a good starting point for engineers comfortable with Python.

How it works: The core of this tool is a Python script that takes a DMARC XML file as input. It uses Python's xml.etree.ElementTree or similar libraries to navigate the XML structure and extract the relevant fields: source_ip, count, and the policy_evaluated element, whose dkim and spf children hold the alignment-aware results (stored as dkim_aligned and spf_aligned). It then inserts this data into a database.
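
To make the parsing step concrete, here is a minimal sketch with xml.etree.ElementTree. It is not the project's actual code: parse_records and SAMPLE_REPORT are hypothetical names, and the sample is a trimmed, made-up report that follows the element layout of the aggregate report format. It also assumes the report declares no XML namespace; reports that do would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Trimmed, invented report used only for illustration.
SAMPLE_REPORT = """
<feedback>
  <policy_published><domain>yourdomain.com</domain><p>none</p></policy_published>
  <record>
    <row>
      <source_ip>203.0.113.7</source_ip>
      <count>12</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>fail</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>yourdomain.com</header_from></identifiers>
  </record>
</feedback>
"""


def parse_records(xml_text):
    """Flatten each <record> into a dict ready for a database insert."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        rows.append({
            "source_ip": record.findtext("row/source_ip"),
            # One record can stand for many messages.
            "count": int(record.findtext("row/count", default="0")),
            "disposition": record.findtext("row/policy_evaluated/disposition"),
            # policy_evaluated results already take alignment into account.
            "dkim_aligned": record.findtext("row/policy_evaluated/dkim") == "pass",
            "spf_aligned": record.findtext("row/policy_evaluated/spf") == "pass",
            "header_from": record.findtext("identifiers/header_from"),
        })
    return rows


for row in parse_records(SAMPLE_REPORT):
    print(row)
```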

Basic Setup & Usage (Conceptual): You'd typically set up a cron job or a dedicated service to:

  1. Fetch new DMARC reports from an IMAP mailbox.
  2. Decompress them (reports usually arrive as .gz or .zip attachments).
  3. Pass each XML file to the parser script.
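
The fetch step could look roughly like the sketch below, which pulls attachments out of one raw email with the standard email package. The function name report_attachments is hypothetical. The raw bytes are assumed to come from the mailbox, for example via imaplib's fetch with "(RFC822)"; the demo builds a message locally instead so the sketch runs without a mail server.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def report_attachments(raw_message: bytes):
    """Yield (filename, payload_bytes) for every named part of one email."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            # decode=True undoes base64 or quoted-printable transfer encoding.
            yield filename, part.get_payload(decode=True)


# Demo: build a message that resembles an incoming report email.
demo = EmailMessage()
demo["From"] = "noreply@receiver.example"
demo["To"] = "your_email@yourdomain.com"
demo["Subject"] = "Report Domain: yourdomain.com"
demo.set_content("Aggregate report attached.")
demo.add_attachment(b"placeholder bytes", maintype="application",
                    subtype="gzip", filename="report.xml.gz")

for name, data in report_attachments(demo.as_bytes()):
    print(name, len(data))
```

Each payload would then be decompressed and written out as an XML file for the parser.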

# Example (simplified) of how you might run it after setup
# Assuming you have a report.xml file
python -m dmarc_report_parser --file report.xml --db-connection "sqlite:///dmarc_data.db"

The output is not a UI but rows in your database, which you then analyze with SQL queries. For instance, to find all sources that failed DKIM alignment for your domain (table and column names here are illustrative and depend on the parser's schema):

SELECT
    source_ip,
    COUNT(*) as total_failures
FROM
    dmarc_records
WHERE
    dkim_aligned = 0 AND header_from = 'yourdomain.com'
GROUP BY
    source_ip
ORDER BY
    total_failures DESC;
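
To try a query like this without a running parser, the sketch below builds an in-memory SQLite table and runs a variant of it. The schema is an assumption for illustration, in particular the message_count column, which holds each record's count value. One point worth noting: COUNT(*) counts report rows, while a single row can represent many messages, so summing the per-row count gives a better picture of volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dmarc_records (
        source_ip TEXT,
        header_from TEXT,
        message_count INTEGER,  -- hypothetical column: <count> from the report
        spf_aligned INTEGER,
        dkim_aligned INTEGER
    )
""")
conn.executemany(
    "INSERT INTO dmarc_records VALUES (?, ?, ?, ?, ?)",
    [
        ("203.0.113.7", "yourdomain.com", 12, 1, 0),
        ("203.0.113.7", "yourdomain.com", 3, 1, 0),
        ("198.51.100.4", "yourdomain.com", 40, 0, 0),
        ("192.0.2.10", "yourdomain.com", 500, 1, 1),
    ],
)

query = """
    SELECT source_ip,
           COUNT(*) AS failing_rows,
           SUM(message_count) AS failing_messages
    FROM dmarc_records
    WHERE dkim_aligned = 0 AND header_from = ?
    GROUP BY source_ip
    ORDER BY failing_messages DESC
"""
results = conn.execute(query, ("yourdomain.com",)).fetchall()
for row in results:
    print(row)
# ('198.51.100.4', 1, 40)
# ('203.0.113.7', 2, 15)
```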

Strengths: Python's ecosystem, ease of scripting, direct database interaction.

Weaknesses: Lacks a built-in web UI; requires manual integration with visualization tools (like Grafana, Metabase, or custom dashboards) and ongoing maintenance of the script and its dependencies. Scaling for very high volumes of reports might require more robust queuing and processing.

2. DMARC-Reports (PHP/Laravel)

The `DMARC-Reports/DMARC-