Skip to main content

Audit Log

Last Updated: May 1, 2026

This document records data-quality issues discovered during pre-launch and post-launch audits, what was done about each one, and what remains open. Entries are dated and listed newest-first.

The purpose is twofold: an internal record so issues don't get lost between work sessions, and a transparent account for external readers (environmental attorneys, watchdog groups, journalists) who want to understand how the underlying data is verified.

Permit Minder's defensibility standard: every number on the platform should be something an environmental attorney would be comfortable forwarding to opposing counsel. This log is part of how that standard is maintained.


2026-05-01 — Search results card: max_exceedance_pct and top_parameter computed independently

Discovered during: Scoping for direction-explicit display copy on the permit detail and search surfaces (PR #33).

Issue: The search_exceedances RPC aggregates per permit and returns two columns that the search results card visually associates with each other but that are computed by separate aggregations. max_exceedance_pct is MAX(exceedance_pct) across all rendered rows for the permit, while top_parameter is the most-frequent parameter for the permit, picked by a separate LATERAL subquery. The two values do not necessarily come from the same underlying row. A permit whose most-frequent parameter is Total Suspended Solids (a ceiling parameter) but whose single highest-percentage row is a Dissolved Oxygen floor reading would show, on the search card, the parameter from the first aggregation alongside the percentage from the second — visually implying a relationship that does not exist in the underlying data. The defect was invisible in ceiling-only display copy and surfaced when the direction-explicit rewrite (PR #33) needed the stat_base_code of the row that produced max_exceedance_pct in order to classify direction, and discovered that field is not exposed in the RPC return at all.

Affected scope: Search results card and the search-results CSV export. The permit detail page renders per-row data with full row context (parameter, stat_base_code, qualifier, permit_limit, reported_value all available per row) and is unaffected.

Disposition: Permit detail surfaces shipped direction-explicit copy in PR #33 using the per-row classifier. Search card rendering remains the previous ceiling-only “+X% over reported limit” wording until a post-launch follow-up migrates search_exceedances to expose a max_exceedance_direction column computed inside the aggregate. The methodology page (PR #32) discloses that direction-aware detection applies and notes the historical recovery work in flight, which covers the gap until the migration lands.

Status: Open. Pending post-launch RPC migration.


2026-05-01 — Same logical exceedance ingested multiple times (cross-source and intra-source)

Discovered during: PR #33 verification, while cross-checking the one row with stat_base_code = 'UNKNOWN' in production against the rendered preview.

Issue: The exceedances table has no uniqueness constraint on the natural key (permit_number, parameter, monitoring_period_end, permit_limit, reported_value). The same logical exceedance can appear multiple times when a row is ingested from PA DEP and again from EPA ECHO with the same numeric value but different stat_base_code (PA DEP sometimes records 'UNKNOWN' where ECHO has the proper short code), or when ECHO ingestion writes the same row twice with slightly different field state under different id values (qualifier '<=' vs null, monitoring_period_end populated vs null).

Platform-wide totals across rendered rows (exceedance_pct > 0): 155 distinct natural-key tuples appear in more than one row; 334 total rows are part of those duplicated tuples; about 1.1% of all rendered rows are duplicate-attributable. Cross-source duplication accounts for 1 of the 155 tuples; the remaining 154 are intra-source duplications, mostly within ECHO.

Worst-affected permits by extra-row count: PA0254967 (38 extra rows), PA0053091 (21), PA0001937 (20), PA0215520 (10), PA0008281 (7).

User-visible symptom on the permit detail page: the same parameter and period appear in the exceedance table multiple times, with the same numbers but different Limit Type values (a “—” row from PA DEP plus an “Instantaneous Maximum” row from ECHO, for example).

Affected scope: Permit detail page (visible duplicate rows for affected permits). Search-card aggregates that count rows (exceedance_count, and longest_streak indirectly) inherit small over-counts on these permits. Methodology and audit-log surfaces are unaffected.

Disposition: Dedup logic is post-launch. Two paths under consideration: a uniqueness constraint on the natural key with a tie-break preferring the source that has the more specific stat_base_code, or a view that collapses duplicates at read time. Either approach requires deciding the canonical row — likely PA DEP for permit-level metadata and ECHO for the stat code when PA DEP has 'UNKNOWN'. Until then, users on the affected permits will see duplicate rows on the permit detail page.

Status: Open. Pending post-launch dedup work.


2026-04-30 — ECHO ingestion: floor-parameter override dead at raw-column stage

Discovered during: Phase 1 fixture-verification work (regression tests for direction-aware exceedance detection).

Issue: The ingestion pipeline has a parameter-name override that classifies certain parameters (Dissolved Oxygen, % Removal, Stream Flow Minimum) as floor limits regardless of stat code, because those parameters are always-floor by EPA convention. The override was silently inactive at the ECHO ingestion stage: the function checked for a column named PARAMETER, but raw ECHO data uses PARAMETER_DESC until column-mapping runs later in the pipeline. So for ECHO rows where parameter ∈ {DO, % Removal, Stream Flow Min} AND the stat code field was empty AND the qualifier wasn't ">=", the override didn't fire and the row was misclassified as a ceiling limit. Real floor exceedances in that slice were silently dropped.

Affected scope: ECHO data only. PA DEP data was not affected (its scraper has always been direction-aware and uses post-map column names). Slice size: 0 rows visibly miscategorized in the current database (every override-pattern parameter already had a recognized stat code via the IB/DC/ME/MJ path or a >= qualifier). Upper bound is unknown without re-pulling raw ECHO bulk data, since dropped rows were never inserted. Phase 4's diff report will surface the actual count.

Disposition: Fixed. Patched _is_always_floor_parameter to accept PARAMETER_DESC as a fallback when PARAMETER isn't present. Pure widening — when PARAMETER exists, behavior is identical to pre-patch; when only PARAMETER_DESC exists (raw ECHO before map_columns), the override now fires. Verified against fixture: previously-dropped DO no-stat row now correctly flagged; compliant DO no-stat row still correctly dropped (no over-flagging). Phase 1 regression tests assert post-fix behavior. Historical recovery of dropped rows is part of Phase 4's ECHO re-pull.

Status: Fix shipped. Historical recovery pending Phase 4.


2026-04-30 — Homepage stats included null/null placeholder rows

Discovered during: Phase 0 audit of UNKNOWN-direction bucket.

Issue: The get_site_stats() database function did not filter on exceedance_pct > 0, so it counted 4,687 PA DEP rows with null permit_limit and null reported_value (placeholder rows from a separate data-quality issue). Every other surface on the platform — search, streaks, permit detail, alerts, exports — already filtered on exceedance_pct > 0 and excluded these rows. The homepage stats were the only inconsistent surface. Pre-fix counts were inflated: total exceedances by 45.9%, total permits by 24.6%, parameters tracked by 28.3%.

Affected scope: Homepage display only. No alerts, exports, or permit-detail pages were affected.

Disposition: Migration 060 updated get_site_stats() to filter on exceedance_pct > 0. Verified post-migration counts match expected: total exceedances 30,022, total permits 2,322, parameters tracked 129.

Status: Shipped. The 4,687 placeholder rows themselves are a separate data-quality issue tracked in the internal post-launch backlog.


Pre-2026-04-30 — Historical ECHO ingestion was direction-blind

Discovered during: Pre-launch defensibility audit.

Issue: Historical ECHO ingestion compared reported_value > permit_limit uniformly, without considering whether a permit limit was a ceiling (maximum) or a floor (minimum). For Minimum-type limits (Dissolved Oxygen, pH lower bound, % Removal, Stream Flow Min, and any limit with stat codes IB/DC/ME/MJ), this meant every compliant reading was incorrectly flagged as an exceedance, and every reading that should have been flagged as a floor-limit exceedance was incorrectly recorded as compliant. Roughly 660 ECHO-only permits were affected.

Affected scope: ECHO data only. PA DEP data was not affected. Current pipeline is direction-aware; only historical ingestion was affected.

Disposition: Cleanup migrations 026, 031, and 054 removed the false-positive exceedance flags. False negatives — floor-limit exceedances that were never inserted because the direction-blind comparison rejected them — could not be recovered from the existing database; those rows must be re-pulled from ECHO source data. Estimated recovery: ~17,936 historical floor-limit exceedances. Re-pull is Phase 4 of the multi-phase fix, scheduled post-launch with a methodology-disclosure fallback if the re-pull doesn't pass sample validation.

Status: False positives cleaned. False-negative recovery pending Phase 4.


How this document is maintained

Entries are added when a data-quality issue is discovered, regardless of whether it's user-visible. Each entry covers: what was found, what was affected, what was done, and current status. Entries are not deleted when issues close — closure is recorded in the Status field.

Future audit findings follow the same template: Discovered during / Issue / Affected scope / Disposition / Status.