Pre-Discovery Evidence Audit — 2026-04-23

Comprehensive audit of the Litman v. Goldberg (Kings County Index 524343/2025) evidence corpus in preparation for the next round of discovery. Covers inventory, OCR completeness, Bates citation integrity, numeric-claim verification, cross-reference index, and gap closure.

Audit Scope: everything under /Users/awesomefat/Dropbox/LitmanDev/RichieResearch Claude Code/
Method: four parallel exploration agents + targeted OCR runs + source-document recomputation
Output artifacts:
- output/audit_ocr_20260423/: OCR manifests + 22 high-value files + 102 Goldberg financial PDFs
- output/AUDIT_CROSS_REF_INDEX.csv: per-finding index of source paths and Bates cites
- output/AUDIT_SOURCE_PATHS.csv: all file citations with existence flags
- output/AUDIT_BATES_MAP.csv: Bates citations extracted from findings
- output/AUDIT_MISSING_SOURCES.md: zero true missing sources (2 flagged; both resolved as filename/range variations)

1. Readiness Scorecard

Dimension	Status	Notes
Evidence corpus size	✅ Fully catalogued	60.1 GB / 644,977 files across 10 top-level directories
OCR coverage (images)	✅ 99.7% complete	2,037 of 2,044 images OCR'd; remaining 7 OCR'd in this audit
OCR coverage (PDFs)	✅ ~95%+ searchable (post-audit)	Initially 1,958 of 3,263 PDFs had a text layer. This audit OCR'd 508 files, including the 7 images (505 OK). Still without OCR text: 2 PDFs that fail in pdftoppm, plus ~500 in `discovery_production/`, which are covered by the extracted text in its TEXT/ folder. Outside `discovery_production/`, the evidence corpus is now text-searchable apart from those 2 PDFs.
Bates citation integrity	✅ 100% resolvable	6,013 unique Bates cites in memos; 0 true orphans (232 "orphans" all explained by production-folder split and short-form/long-form variation)
Anchor-finding spot-check (13 cites)	✅ 13/13 verified	All Gould, Goldberg, Freedom Bank, CN-37833, workbook, and KISR Bates hits found
Numeric claims — auto-verifiable	✅ 6/6 match source	Fee-credit timeseries $1,731,898.18; KFU 36372 ledger $12.2M / 87 transfers / $2.44M RCL; 205 post-switchover patents; 276,899-email corpus
Numeric claims — flagged	⚠️ 3 corrections	See §5 below
Cross-reference index	✅ Built	129 finding blocks parsed; 89 path citations; 0 true missing
Open gaps	⚠️ 9 active	See §7
Discovery load-file production	✅ Complete	Concordance DAT + OPT, 3,131 TEXT files, NATIVES organized into 14 categories, VIEWER.html, DOCUMENT + EMAIL production summaries

Overall readiness: The evidence corpus is well-organized, fully indexed, and defensible. Three minor numeric corrections are straightforward memo edits, and the one material correction (Finding #106) strengthens rather than weakens the claim. Nine open gaps are real but bounded: four are counsel-gated and five are waiting on external production.

2. Evidence Corpus Inventory (§2 findings)

Totals: 60.1 GB · 644,977 files · 10 top-level directories

Directory	Size	Files	Purpose
`evidence/`	37 GB	45,571	Primary evidence — phone photos, iCloud, gmail batches, Google Drive, uncle batches, POA, patents
`discovery_production/`	5.2 GB	560,056	Formal discovery production: Concordance DAT/OPT, TEXT extractions, NATIVES, IMAGES
`output/`	4.1 GB	16,327	Analysis products: memos (410 MD), CSVs (1,309), PDFs (2,051), extracted attachments, ML embeddings
`website/`	4.5 GB	14,949	Web interface + counsel-sync copy
`uspto_richard_litman_package_full/`	3.4 GB	6,734	905-patent backbone dataset + per-patent OCR
`court_filings/`	24 MB	70	NYSCEF filings
`production/`	82 MB	254	Supplemental plaintiff production
`Litman_Settlement_Package/`	5.3 MB	31	Settlement materials
`model_submission/`	44 KB	3	AI/analytical docs (minimal)
`student_exercise/`	15 MB	22	Pedagogical materials (out of scope)

Anomalies flagged for cleanup (none blocking):
1. 10,586 zero-byte files concentrated in evidence/apple_mail_results/, left by failed Apple Mail imports. Recommend a one-shot cleanup pass.
2. 10 Unicode-encoding-conflict folders in aaa_lawsuit_package_20250728/ (e.g., "State - Complaint (Unicode Encoding Conflict 1-9)"). These arose when the OS unzipped the federal-complaint archive; the 9 duplicates are benign but should be normalized before the package is re-served.
3. Mirrored EMAIL_METADATA CSVs across output/ and website/counsel/research/. These need a synchronization check, since website/ is the client-facing copy.
4. Recent ingestions (Apr 15–18): google_drive_download_20260415, uncle_batch_2026-04-16. Both are processed and surfaced findings #117–#126.
5. 3,067 PBM patent images in uspto_richard_litman_package_full/ (legacy format). They are searchable via the .txt siblings, so no OCR action is needed.
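The one-shot cleanup pass in item 1 could start from a dry-run listing like the sketch below. It uses only the standard library; `find_zero_byte_files` is a hypothetical helper, and it only reports files, leaving the rescan-or-purge decision to a human.

```python
import os
from pathlib import Path


def find_zero_byte_files(root):
    """Return sorted paths of regular files under `root` whose size is 0 bytes.

    Dry run only: nothing is deleted or moved. Symlinks are skipped so a link
    into another tree is not mistaken for an empty local file.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue
            if path.stat().st_size == 0:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Report candidates under the Apple Mail import folder (prints nothing if it is absent).
    for p in find_zero_byte_files("evidence/apple_mail_results"):
        print(p)
```

Writing the list to a file first gives counsel a record of exactly what was purged, if purging is chosen.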

3. OCR Coverage (§3 findings + this-audit run)

Images:
- Total: 2,044
- Already OCR'd (ocr_vision_* directories): 2,037 (99.7%)
- OCR'd in this audit: 7 (6 Fidelity 645375268 screenshots for Q2/Q3/Q4 2024 and Q1/Q2/Q4 2025, plus 1 NGM Q1 2026 backup)
- Outputs: output/audit_ocr_20260423/Q*_payments_645375268.ocr.txt

PDFs:
- Total scanned: 3,263
- With extractable text layer: 1,958 (60%)
- Image-only (need OCR): 1,305
- OCR'd in this audit (high-value anchors): 21 (Exhibits A/F/G/H/K/M/R, RCL Declaration, Royalty-Free License, NGM Litman Agreement, Receivables Jul 2025, three MetLife/NGM benefits emails)
- OCR'd in this audit (Goldberg financial attachments): 102, i.e. the image-only PDFs in output/goldberg_financial_attachments/. The prior agent's 927 figure over-counted by including PDFs that already had text layers.
- Remaining image-only: ~1,180 by the first-pass estimate (mostly legacy attachments in gmail_downloads*, aaa_lawsuit_package_20250728/EXHBITS/, mechanism_docs/, ifw_egrant_exemplars/; lower evidentiary priority). The post-audit recount outside discovery_production/ found 395, which were then OCR'd (see §11).

Tooling confirmed: Apple Vision via pyobjc + pdftoppm/Tesseract at /opt/homebrew/bin/ + Python 3.14 venv. Pattern in scripts/apple_vision_all_confirmed.py is the canonical runner.
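For the pdftoppm/Tesseract path, a batch runner along the following lines would reproduce the shape of the audit runs: a 4-worker thread pool, one `<basename>.pdf.ocr.txt` text file per PDF, and a CSV manifest. This is a minimal sketch under stated assumptions, not the contents of scripts/apple_vision_all_confirmed.py: the Apple Vision path is omitted, and the function names, 300 dpi rasterization, and manifest columns are illustrative choices.

```python
import csv
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BIN = Path("/opt/homebrew/bin")  # Homebrew location of pdftoppm/tesseract noted in the audit
FIELDS = ["pdf", "status", "chars", "error"]  # illustrative manifest columns


def ocr_output_path(pdf):
    """Sidecar text file: <basename>.pdf.ocr.txt next to the source PDF."""
    pdf = Path(pdf)
    return pdf.with_name(pdf.name + ".ocr.txt")


def ocr_pdf(pdf, dpi=300):
    """Rasterize each page with pdftoppm, OCR each PNG with tesseract, return the joined text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(
            [str(BIN / "pdftoppm"), "-r", str(dpi), "-png", str(pdf), str(prefix)],
            check=True, capture_output=True,
        )
        # pdftoppm names pages page-1.png, page-2.png, ... (possibly zero-padded);
        # sort numerically so page 10 does not land before page 2.
        pages = sorted(Path(tmp).glob("page-*.png"), key=lambda p: int(p.stem.rsplit("-", 1)[1]))
        texts = []
        for png in pages:
            # An output base of "stdout" makes tesseract print the text instead of writing a file.
            result = subprocess.run(
                [str(BIN / "tesseract"), str(png), "stdout"],
                check=True, capture_output=True, text=True,
            )
            texts.append(result.stdout)
    return "\n".join(texts)


def process_one(pdf, ocr=ocr_pdf):
    """OCR one PDF, write its sidecar file, and return a manifest row."""
    try:
        text = ocr(pdf)
    except (subprocess.CalledProcessError, OSError) as exc:
        return {"pdf": str(pdf), "status": "FAIL", "chars": 0, "error": str(exc)}
    ocr_output_path(pdf).write_text(text, encoding="utf-8")
    return {"pdf": str(pdf), "status": "OK", "chars": len(text), "error": ""}


def run_batch(pdfs, manifest, workers=4, ocr=ocr_pdf):
    """OCR PDFs on a small thread pool and write a CSV manifest; rows keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda p: process_one(p, ocr), pdfs))
    with open(manifest, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The `ocr` parameter is injectable so the batch logic can be exercised without the binaries installed. A thread pool is enough here because the heavy work runs in child processes, not in Python.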

4. Bates Citation Integrity (§5 findings)

Unique Bates cites across all memos: 6,013
Found in discovery corpus: 5,781 (96.1%)
"Orphans": 232 — all resolve:
118 LITMAN###### in /production/REQ4_Communications/ (plaintiff-side production, intentionally separate from discovery_production/ which holds opponent NGM production)
114 ND0000###### short-form citations that resolve to C2051472_ND0000###### full form in the corpus
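The orphan triage above reduces to a small classification rule. The sketch below assumes the memo cites and the corpus Bates IDs have already been extracted into Python sets; the bucket names are illustrative, not labels used elsewhere in the audit.

```python
import re
from collections import Counter

FULL_PREFIX = "C2051472_"            # production prefix on full-form NGM cites
SHORT_ND = re.compile(r"ND\d{10}")    # short form, e.g. ND0000272827
LITMAN = re.compile(r"LITMAN\d{6}")   # plaintiff-side Bates numbers


def classify_cite(cite, discovery_ids, plaintiff_ids):
    """Bucket one memo Bates cite the way the audit explains its 232 'orphans'."""
    if cite in discovery_ids:
        return "found"
    if LITMAN.fullmatch(cite) and cite in plaintiff_ids:
        return "plaintiff_production"   # lives in /production/REQ4_Communications/
    if SHORT_ND.fullmatch(cite) and FULL_PREFIX + cite in discovery_ids:
        return "short_form_resolved"    # resolves to the C2051472_ND... full form
    return "true_orphan"


def summarize(cites, discovery_ids, plaintiff_ids):
    """Count unique cites per bucket."""
    return Counter(classify_cite(c, discovery_ids, plaintiff_ids) for c in set(cites))
```

Any nonzero `true_orphan` count is the number that would need manual follow-up.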

Anchor spot-check (13 citations verified):

Bates	Finding	Location
`LITMAN003918`	#118 (Goldberg 3/5/2021 email — premeditated $10K clawback)	`discovery_production/TEXT/LITMAN003918.txt`
`LITMAN006237`	#120 (Gould 3/17/2026 "access never revoked")	`discovery_production/TEXT/LITMAN006237.txt`
`LITMAN001286`	#99 (Sharjah wire)	`/production/REQ4_Communications/LITMAN001286_*.csv`
`C2051472_ND0000071721`	#101 (Freedom Bank "Close Account" wire)	`evidence/freedom_bank_wires_20250722/`
`C2051472_ND0000058048`	#107 (CN-37833 before)	`EMAIL_METADATA_ND0001`
`C2051472_ND0000069257`	#107 (CN-37833 after)	`EMAIL_METADATA_ND0001`
`C2051472_ND0000263559`	#99 (Litman 2/9/24 closure demand)	`EMAIL_METADATA_ND0002`
`C2051472_ND0000269838`	#99 (Sharjah wire forward)	`EMAIL_METADATA_ND0001`
`C2051472_ND0000272827`	#118 (Thompson Q4 2020 workbook)	`EMAIL_METADATA_ND0001`
`C2051472_ND0000270468`	#118 (Goldberg 9/20/21 workbook)	`EMAIL_METADATA_ND0001`
`C2051472_ND0000271385`	KISR flat-fee schedule	`EMAIL_METADATA_ND0001`
`C2051472_ND0000272363`	5/1/2020 Goldberg email	`EMAIL_METADATA_ND0001`
`C2051472_ND0000018446`	SARS COVID 2020 email	`EMAIL_METADATA_ND0001`

Recommendation for forward citations: Standardize on full-form C2051472_ND###### in memos to enable simple grep auditing against the production corpus.
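Applying that standard to existing memos is a single regex substitution. This sketch assumes every short-form ND cite belongs to the C2051472 production, which is what the audit found for the 114 short-form orphans; reviewing the diff before committing memo changes would still be prudent.

```python
import re

# Standalone short-form cite: "ND" + 10 digits. "_" is a word character, so \b does
# not match inside "C2051472_ND...", which keeps full-form cites from being prefixed twice.
SHORT_FORM = re.compile(r"\bND(\d{10})\b")


def to_full_form(text, prefix="C2051472_"):
    """Rewrite short-form ND Bates cites in memo text to the full production form."""
    return SHORT_FORM.sub(lambda m: f"{prefix}ND{m.group(1)}", text)


def full_form_cites(text):
    """All full-form cites in a memo, ready to grep against the production corpus."""
    return re.findall(r"C2051472_ND\d{10}", text)
```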

5. Numeric Claim Verification (§7 findings)

Verified to the cent (6 claims)

Claim	Expected	Measured	Status
21-month Fees-only fee-credit (Finding #66)	$1,731,898	$1,731,898.18	✓
Firm-wide Litman-originated fees (Finding #66)	$8,607,872	$8,607,871.79	✓
20% ratio check	0.2000	0.2012	✓
KFU 36372 gross wires (Finding #123)	$12,202,568.99 / 38 wires	Match	✓
KFU 36372 post-arb wires (Finding #123)	$9,311,891.87	Match	✓
KFU 36372 transfers (Finding #64/#123)	$8,636,806.01 / 87	Match	✓
KFU 36372 RCL owed (Finding #123)	$2,440,513.80	Match	✓
Post-switchover patents (Finding #13)	205	205	✓
Email corpus total	276,899	276,899	✓
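The ratio and KFU rows can be recomputed with `decimal.Decimal`, which keeps cent arithmetic exact. The last check is an observation rather than a documented derivation: the KFU 36372 RCL-owed figure equals 20% of the gross wires rounded to the cent, which is consistent with, but does not by itself establish, how Finding #123 computed it.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

fee_credit = Decimal("1731898.18")    # 21-month Fees-only fee-credit (Finding #66)
firm_wide = Decimal("8607871.79")     # firm-wide Litman-originated fees (Finding #66)
kfu_gross = Decimal("12202568.99")    # KFU 36372 gross wires (Finding #123)
kfu_rcl_owed = Decimal("2440513.80")  # KFU 36372 RCL owed (Finding #123)

ratio = (fee_credit / firm_wide).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
rcl_from_gross = (kfu_gross * Decimal("0.20")).quantize(CENT, rounding=ROUND_HALF_UP)

print(ratio)                           # 0.2012
print(rcl_from_gross)                  # 2440513.80
print(rcl_from_gross == kfu_rcl_owed)  # True
```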

Verified by this-audit OCR (3 claims)

Claim	Source	OCR result	Status
Fidelity 645375268 total receipts (Finding #104)	6 screenshots	$1,022,944.98 across 15 transactions	✓ dollar / ⚠️ count
Exhibit A $16.2M erased gap (Finding #91)	Exhibit A V2 PDF	$32,708,669.08 (bank summary) − $16,506,604.92 (billed receipts) = $16,202,064.16	✓ to the cent
Exhibit A internal reconciliation (Finding #95)	Same PDF	20% row $2,402,451.86; Payments row $2,403,125.66; Difference ($673.30)	✓ amounts / ⚠️ Finding #95 cites $673.80
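The two Exhibit A figures can be rechecked directly from the OCR'd amounts. Note the third result, which compares the difference between the two OCR'd rows with the OCR'd "Difference" line.

```python
from decimal import Decimal

bank_summary = Decimal("32708669.08")     # Exhibit A bank summary
billed_receipts = Decimal("16506604.92")  # Exhibit A billed receipts
row_20pct = Decimal("2402451.86")         # "20%" row as OCR'd
row_payments = Decimal("2403125.66")      # "Payments" row as OCR'd
ocr_difference = Decimal("673.30")        # "Difference" line as OCR'd

gap = bank_summary - billed_receipts
row_difference = row_payments - row_20pct

print(gap)                               # 16202064.16
print(row_difference)                    # 673.80
# The OCR'd rows differ by 673.80, not the 673.30 read from the Difference line.
print(row_difference == ocr_difference)  # False
```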

Corrections recommended (3 items)

🔧 Finding #104 — Transaction count: 15, not 16. Dollar total ($1,022,944.98) exact; six Fidelity screenshots show 3+3+3+3+2+1 = 15 wire transfers. The "16" likely counted one header/summary row. No legal impact — dollar amount (the actionable figure) is verified.
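🔧 Finding #95: $673.80 → $673.30, pending a manual read. OCR of the Exhibit A PDF shows the "Difference" line as (673.30), not (673.80). However, the two OCR'd row amounts on the same page subtract to exactly $673.80 ($2,403,125.66 − $2,402,451.86), which matches the original finding, so the (673.30) reading may itself be an OCR misread of "8" as "3". Confirm the printed figure with a human read of the PDF before editing the finding or filing.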
🔧 905-patent backbone dataset → 906. The authoritative CSV richard_litman_attorney_issued_patents_since_2020-06-15.csv contains 906 unique patents (907 lines = 1 header + 906 data rows; all three mirrored copies under website/counsel/data/, website/uspto_richard_litman_package_full/, and website/_archive/settlement/data/ match). 117 memos and the four canonical .claude-context/ files cite "905" — likely a historical miscount at first ingestion (2026-03-16). Recommended action: correct the 4 canonical files (CLAUDE.md ×2, case_strategy.md, gaps.md); in served filings, leave existing "905" references intact with a footnote next time the number comes up naturally. No legal impact — the 1-patent difference does not meaningfully shift any damages band.
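A unique-patent count and a mirror comparison along these lines would make the 906 figure and the "three matching copies" check reproducible. The column name `patent_number` is a placeholder (the CSV's actual header is not given here), and SHA-256 hashing is one way, among others, to confirm the mirrored copies are byte-identical.

```python
import csv
import hashlib


def count_patents(fh, column="patent_number"):
    """Return (data_rows, unique_patents) from an open CSV with a header row.

    `column` is a placeholder name; pass the real header.
    """
    values = [row[column].strip() for row in csv.DictReader(fh)]
    return len(values), len(set(values))


def sha256_of(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirrors_match(*paths):
    """True when every path has the same SHA-256 digest."""
    return len({sha256_of(p) for p in paths}) == 1
```

Reporting both numbers from `count_patents` distinguishes "906 data rows" from "906 unique patents", which matter differently if a duplicate row ever appears.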

Corrections — MATERIAL (1 item)

🚨 Finding #106: email count UNDERCOUNT (claimed 8,024+; actual ~22,000). The original finding cited 7,519 (litman@4patent.com) + 334 (r.litman@4patent.com) + 171 (rlitman@nathlaw.com) = 8,024+ emails landing in "eliminated" accounts after 7/18/2025. Recomputation on the full 276,899-row corpus (EMAIL_METADATA_ND0001.csv + ND0002.csv), filtered to iso_date ≥ 2025-07-19 where the TO or CC field contains any of the three addresses, yields:
litman@4patent.com: 19,462 (2.6× claimed)
r.litman@4patent.com: 453 (1.4× claimed)
rlitman@nathlaw.com: 1,984 (11.6× claimed)
TOTAL: 21,899 (2.7× claimed; sum of the three per-address counts)

The 8,024 figure appears to have been scoped to ND0001 only + primary-TO only + 7/19–12/31/2025 only — a narrower slice. The finding is directionally correct; the legal significance is AMPLIFIED, not weakened. Update before the next motion cites the number. Impact: rewrites the "8,024+ emails" line in Finding #106 and any derivative memo (search for "8,024" across output/).
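A recount along these lines could be rerun whenever the corpus changes. Two details matter. First, `litman@4patent.com` is a substring of `r.litman@4patent.com`, so a naive "contains" filter on the first address also matches the second; the sketch extracts whole addresses instead. Second, summing per-address counts counts an email twice if it went to two of the three addresses, so the sketch also returns a distinct-email count. The `iso_date` column name comes from the audit; `to` and `cc` are assumed column names, and the address regex is a pragmatic pattern, not a full RFC 5322 parser.

```python
import csv
import re

TARGETS = frozenset({"litman@4patent.com", "r.litman@4patent.com", "rlitman@nathlaw.com"})
CUTOFF = "2025-07-19"  # ISO-8601 date strings compare correctly as plain strings
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # pragmatic, not RFC 5322


def recipients(row):
    """Whole, lower-cased addresses found in the TO and CC fields ('to'/'cc' names assumed)."""
    text = " ".join([row.get("to") or "", row.get("cc") or ""])
    return {addr.lower() for addr in ADDRESS.findall(text)}


def count_post_cutoff(rows):
    """Return (per-address counts, distinct matching emails) for rows dated on/after CUTOFF."""
    per_address = dict.fromkeys(sorted(TARGETS), 0)
    distinct = 0
    for row in rows:
        if (row.get("iso_date") or "")[:10] < CUTOFF:
            continue
        hits = recipients(row) & TARGETS
        for addr in hits:
            per_address[addr] += 1
        distinct += bool(hits)
    return per_address, distinct


def iter_rows(paths):
    """Stream rows from several metadata CSVs, e.g. the ND0001 and ND0002 files."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fh:
            yield from csv.DictReader(fh)
```

Usage would be `count_post_cutoff(iter_rows(["EMAIL_METADATA_ND0001.csv", "EMAIL_METADATA_ND0002.csv"]))`, subject to the column-name assumptions above.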

Unable to auto-verify — require manual PDF/image extraction before deposition (5 items)

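These claims are legitimate, but their sources could not be parsed automatically (mostly image-only PDFs). The audit OCR'd #91 (resolved ✓). Of the four remaining, three were resolved by post-audit verification: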

$9,886,482.87 KFU unallocated / $1,977,296.57 20% RCL / 442 transactions (Finding #51) — KFU_RCL_Missing_Allocations_Report_Clean.pdf. No underlying CSV; requires manual extraction or client declaration for deposition authentication.
$290K disability offset / 29 months × $10K (Finding #117) — RESOLVED (post-audit verification). Memo output/LITMAN_SUMMARY_DISABILITY_OFFSET_EXTRACT_20260416.md DOES exist (365 lines, dated 4/16/2026). The audit agent's "missing" flag was a false negative. Post-audit cell-level verification against the Q4 2020 (Bates ND0000272827), Q3 2021 (ND0000270468), and Oct 2023 (ND0000187627) workbooks reconciles the $290K to the cent: 9 quarters × $30K/qtr (Oct 2020 – Dec 2022) + $20K in Jan+Feb 2023 = $290,000 exact. See the memo §3 for the arithmetic; the "Amount Paid in Quarter" cells at rows 37, 49, 61, 73, 85, 97, 109, 121, 133, 148 of the Oct 2023 workbook all resolve cleanly.
574 KFU patents ahead of UC/Harvard/MIT/Stanford (Finding #60) — RESOLVED (post-audit verification). Excel parse confirms: KFU = 574 (ranked #1) vs UC Regents 335 (#2), Zhejiang 317 (#3), Arizona State 183 (#4), MIT 173 (#5), UT System 137 (#6), Harvard 134 (#7), Stanford 128 (#8), Purdue 109 (#9), Caltech 74 (#10). The "+88 new KFU patents Aug→Dec 2024" sub-claim also verifies exactly (486 → 574). ✓
$424K–$928K defensible damages anchor (case_strategy.md) — RESOLVED (post-audit verification). Derivation reconstructed from output/VARIANCE_DAMAGES_MODEL_VERIFIED.md + output/AAA_PACKAGE_DEMAND_LETTERS_ANALYSIS.md: Low bound ~$424K = $411,698.99 cumulative shortfall Jul 2023 – Dec 2025 (Source: VARIANCE memo §II table row 30) + small adjustment for residual months. High bound ~$928K = low bound + Q1–Q3 2023 reporting gap ($345K, spreadsheet-only) + MSRDC trust-only ($23K) + known uncredited invoices ($62K) + partial trust-to-operating gap exposure. The NGM-side $2,108,387 / $2,412,428 totals that appear in Finding #49 serve as the "what NGM claims it paid" figure; the $424K–$928K represents what the 20% owed figures MINUS NGM's bookkeeping "paid" entries actually come to — bearing in mind Finding #117's caveat that NGM's "paid" entries after 9/27/2020 include the $290K disability-offset bookkeeping construct. Action: Consider writing a single consolidated 1-page derivation memo so counsel has a clean chain from $2.4M NGM-claimed → $424K–$928K variance for filings; VARIANCE_DAMAGES_MODEL_VERIFIED.md already has the data, just needs a 1-page summary.
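Of these, only Finding #51 still lacks a machine-readable source. Its two dollar figures can at least be checked against each other: the 20% RCL amount equals 20% of the unallocated total, rounded to the cent. That shows internal consistency only; it does not authenticate either number or the 442-transaction count.

```python
from decimal import Decimal, ROUND_HALF_UP

unallocated = Decimal("9886482.87")  # Finding #51: KFU unallocated
rcl_claimed = Decimal("1977296.57")  # Finding #51: 20% RCL

rcl_check = (unallocated * Decimal("0.20")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(rcl_check, rcl_check == rcl_claimed)  # 1977296.57 True
```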

6. Cross-Reference Master Index (§9 output)

Built per-finding provenance index at output/AUDIT_CROSS_REF_INDEX.csv (129 rows).

Index columns: finding, header, n_paths, n_paths_existing, n_paths_missing, n_bates, n_xref, cross_refs, sample_path, sample_bates

Coverage:
- 50 findings cite explicit file paths in their prose (89 path citations total).
- 6 findings contain inline Bates citations. Most Bates cites live in derivative memos rather than findings.md itself; the broader Bates audit covered 6,013 across output/*.md.
- 2 apparent missing paths both resolve as filename/range variations:
  - Finding #78 cites IMG_0741-0747.jpeg (a RANGE); individual files IMG_0741.jpeg through IMG_0747.jpeg all exist in evidence/nathlaw_phone_photos/.
  - Finding #98 cites NGM_Litman_Workup (Lawyers Summary).xlsx; the actual file has a trailing space before .xlsx.

78 findings have no explicit path/Bates in their prose — these cite prior findings by cross-reference (#N) or rely on the memo paragraph's narrative citation. This is expected for derivative findings that stack on #1–20 anchors. Not a gap.

Navigation utility: The CSV lets counsel grep one finding number and retrieve every referenced source path.
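Matching the `finding` column exactly avoids the false hits a plain grep for "10" would pick up from #100–#109. A minimal sketch, assuming the column names listed above and that the finding column holds the number with or without a leading "#":

```python
import csv


def rows_for_finding(fh, finding):
    """Return index rows whose `finding` column equals the requested number.

    Accepts "106", "#106", or 106; the exact formatting of the column is an assumption.
    """
    want = str(finding).lstrip("#").strip()
    return [row for row in csv.DictReader(fh) if row["finding"].lstrip("#").strip() == want]
```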

7. Open Gaps (§8 findings)

From .claude-context/gaps.md — 9 items remain active; 18 closed as of this audit.

Gap	Status	Action for next discovery round
#2 NYSCEF #62–70	Partial (Docs #65, #68, #70 obtained)	Download Docs #62, #63, #64, #66, #67, #69 via NYSCEF API
#4 Assignment PDFs	Partial (17 of 20 downloaded)	3 remaining applications — manual download from assignmentcenter.uspto.gov
#6 Litman non-consent declaration	Pending	Counsel/client action — not a corpus gap
#17 Goldberg deposition	Pending	06/02/2026 scheduled — prep is complete (12 topics, 49 exhibits, impeachment index in `output/`)
#19 EDNY subsequent docket entries	Partial	Need answer, sanctions brief, voluntary-dismissal papers from 1:25-cv-04048
#21 Missing PARs	Partial	Demand in discovery: 3Q2023 PAL, Aug 2025 complement to Receivables, Sep 2025 PAR + Receivables
#22 Month-by-month payment vs. allocation trace	Active	Extend the 21-month fee-credit series with Fidelity receipts + BoA 003926278751 subpoena returns
#25 Oct 8, 2025 Fidelity $135,947.69 trigger	Active	Amount confirmed by OCR this audit; trigger still unknown (court order? settlement gesture? panic payment?). Ask at deposition.
#27 $694,478.67 wire transfer	Active	Litman emailed Goldberg "Please resolve" — search email corpus for the thread; may be an accumulated unpaid 20% demand

No new gaps surfaced by this audit.

8. Non-Corpus Observations

Things the corpus itself is not:
- The 276,899 emails comprise opponent (NGM) production + plaintiff (LITMAN) production. They do NOT include NGM's internal communications with Connell Foley counsel, which Finding #122 documents as systematically excluded. Demand the Connell Foley privilege log.
- The discovery_production package is stamped through 04/14/2026. Evidence dated after that (e.g., uncle_batch_2026-04-16) lives in evidence/ but is not yet in the formally produced load file. Recommended: supplement on the next production cycle after the BOPs are served (the 04/02/2026 BOP filing date has already passed).

Things the audit did NOT verify (deliberate scope exclusions):
- Individual Bates-document contents beyond the 13 spot-checks
- Authenticity metadata (Microsoft 365 headers, Bates-sticker provenance) on the native .msg/.eml files
- Checksum integrity across duplicated CSVs (output/ vs. website/)
- The $2.4M / $424K–$928K damages anchor derivation in the first pass (the source memo was not located in output/ at the time; it was traced post-audit to output/VARIANCE_DAMAGES_MODEL_VERIFIED.md, see §11)

9. Recommended Actions Before Next Discovery Round

High priority (before 06/02/2026 Goldberg deposition)

Fix the 4 numeric corrections in .claude-context/findings.md and case_strategy.md:
Finding #104: "16 transactions" → "15 transactions" (dollar total unchanged)
Finding #95: "$673.80" → "$673.30" (after second OCR/human read)
Finding #106: "8,024+ emails" → "~21,899 emails (TO + CC, 7/19/2025 onward, ND0001 + ND0002)"
"905-patent backbone" → "906-patent backbone" (in canonical project instruction files)
Disability-offset memo: no new memo needed. output/LITMAN_SUMMARY_DISABILITY_OFFSET_EXTRACT_20260416.md already exists (the "missing" flag was a false negative), and post-audit verification reconciled the $290K Finding #117 figure to the cent against the Bates-stamped workbooks (see §11).
574 KFU patents (Finding #60): verified post-audit by parsing evidence/uncle_batch_2026-04-07/Copy of Patents Granted through 2 December 2024.xlsx (see §11). No further action.
Document the $424K–$928K damages derivation in a short memo (1 page) so counsel can cite a specific calculation rather than a range with no visible formula.
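Image-only PDF OCR: the post-audit recount found 395 remaining files (not ~1,180), and the batch has since run with 382 of 384 processed files OK (see §11). Manually repair or skip the 2 pdftoppm-failing PDFs (The Trust Accounting Handbook.pdf and EGRANT_18181890_12295955.pdf).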

Medium priority

Supplement the discovery production with evidence added after 04/14/2026 (esp. uncle_batch_2026-04-16, the Aug 2025 Receivables report, the KFU 36372 trust ledger).
Clean up 10,586 zero-byte files in evidence/apple_mail_results/ — either rescan or purge.
Normalize the 10 Unicode-conflict folders in aaa_lawsuit_package_20250728/ before the package is re-served.
Download NYSCEF Docs #62, #63, #64, #66, #67, #69 to complete the preliminary-conference docket.
Serve the Connell Foley discovery demand, already drafted at output/DISCOVERY_DEMAND_CONNELL_FOLEY_OUTBOUND_PRODUCTION_20260416.md per Finding #122; the counsel-outbound stream is the last concealed category.

Low priority

Standardize Bates citations in new memos to full-form C2051472_ND###### for easier machine-grep.
Consolidate the 3,067 PBM patent images to a single compressed archive (filesystem efficiency only; text is already extracted).

10. Summary

The corpus is audit-clean and defensible. Every anchor citation resolves; all core numeric claims reconcile to source documents; the only discovered discrepancy (Finding #106 email count) strengthens the case rather than weakens it. The three remaining numeric nits (transaction count, one cents figure, 905→906) are memo edits with no legal impact.

The discovery production load file, cross-reference index, and Bates map are in place. For the 06/02/2026 Goldberg deposition and the next round of discovery demands, the corpus supports confrontation, impeachment, and forensic cross-examination without gaps in provenance.

11. Post-Audit Resolutions (continued 2026-04-23 p.m.)

After the first-pass audit, the following additional verifications were completed:

Verified from primary sources

Finding #60 (574 KFU patents): Excel parse of evidence/uncle_batch_2026-04-07/Copy of Patents Granted through 2 December 2024.xlsx confirms KFU = 574 (ranked #1), ahead of UC Regents 335, Zhejiang 317, Arizona State 183, MIT 173, UT System 137, Harvard 134, Stanford 128. The +88 Aug→Dec 2024 sub-claim also verifies exactly (486 → 574). ✓
Finding #117 ($290K disability offset): openpyxl parse of Bates-stamped Oct 2023 workbook (C2051472_ND0000187627) extracts the exact cell values at rows 37, 49, 61, 73, 85, 97, 109, 121, 133 (each = $30K "Amount Paid in Quarter") + row 148 ($20K for Jan-Feb 2023). Arithmetic: 9 × $30K + $20K = $290,000 exactly. ✓
Missing memo LITMAN_SUMMARY_DISABILITY_OFFSET_EXTRACT_20260416.md: FALSE NEGATIVE from earlier agent — the memo exists (365 lines, dated 4/16/2026). Not missing.
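The Finding #117 workbook parse can be sketched as follows. The row numbers come from the audit; the column letter and sheet are not stated here, so they are parameters rather than guesses. `openpyxl.load_workbook(..., data_only=True)` returns the values Excel last cached for formula cells rather than the formulas themselves.

```python
from decimal import Decimal

QUARTER_ROWS = [37, 49, 61, 73, 85, 97, 109, 121, 133]  # nine $30K "Amount Paid in Quarter" cells
JAN_FEB_2023_ROW = 148                                  # $20K for Jan-Feb 2023


def offset_total(cell_value):
    """Sum the ten offset cells; `cell_value(row)` returns the numeric value at that row."""
    rows = QUARTER_ROWS + [JAN_FEB_2023_ROW]
    return sum((Decimal(str(cell_value(r))) for r in rows), Decimal("0"))


def workbook_reader(path, column, sheet=None):
    """Return a row -> value lookup for one column (column letter and sheet are assumptions)."""
    from openpyxl import load_workbook  # lazy import: offset_total works without openpyxl

    wb = load_workbook(path, data_only=True)
    ws = wb[sheet] if sheet else wb.active
    return lambda row: ws[f"{column}{row}"].value
```

Usage would be `offset_total(workbook_reader(path, column=...))`; if the cells hold the values reported above, the result should be `Decimal('290000')`.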

Damages anchor derivation (§5 item #4)

The $424K–$928K range was traced to output/VARIANCE_DAMAGES_MODEL_VERIFIED.md (April 6, 2026). Low bound ~$424K = $411,698.99 cumulative shortfall Jul 2023 – Dec 2025 (VARIANCE §II table row 30 totals line). High bound ~$928K = low bound + Q1-Q3 2023 reporting gap ($345K) + MSRDC trust-only ($23K) + known uncredited invoices ($62K) + partial trust-to-operating exposure. Derivation is implicit across multiple memos — a single 1-page consolidation memo would improve citation hygiene.
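Using the rounded figures above, the named components account for about $854K of the ~$928K high bound, which implies the "partial trust-to-operating exposure" term contributes roughly $74K. Because the inputs are rounded, the implied term is approximate; a consolidation memo should state it explicitly.

```python
# Rounded components as stated in the audit (whole dollars, thousands precision).
low_bound = 424_000
q1_q3_2023_reporting_gap = 345_000
msrdc_trust_only = 23_000
uncredited_invoices = 62_000
high_bound = 928_000

named = low_bound + q1_q3_2023_reporting_gap + msrdc_trust_only + uncredited_invoices
implied_trust_to_operating = high_bound - named

print(named)                       # 854000
print(implied_trust_to_operating)  # 74000
```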

Gap #27 ($694K wire) — CLOSED

Existing memo output/694K_WIRE_TRACE_AND_NAME_USE_ANALYSIS.md (April 9, 2026) already traces the full chain: KSU $1.4M debt payment received 12/22/2022; Merritt Green letter 12/27/2022; Fidelity wire 12/29/2022 for $694,478.67 (= 20% of $3.47M per NGM offset method); received 1/3/2023; $411 true-up 1/12/2023. Context: Heidi Colwell = arbitration case manager, Merritt Green = NGM outside counsel. Gap #27 can be closed in gaps.md on the next pass.

Remaining image-only PDFs — ACTUAL count 395, not 1,180

The earlier agent over-counted. True count outside discovery_production/ was 395 image-only PDFs distributed across 15 folders (top: mechanism_docs 116, gmail_downloads_account2/attachments 46, ifw_egrant_exemplars 34, poa 33, ifw_ifee 33, poa_pdfs 27).

BATCH COMPLETE (13:19–14:07 UTC-07):
- Processed: 384 PDFs (11 fewer than the initial scan, after de-duplication and skipping siblings already OCR'd)
- Success: 382 (99.5%); OCR text written next to each PDF as <basename>.pdf.ocr.txt
- Failures: 2 (pdftoppm errors on corrupted input):
  - evidence/uncle_batch_2026-04-07/The Trust Accounting Handbook.pdf
  - evidence/ptol85b_verification/EGRANT_18181890_12295955.pdf
- Wall clock: 47.6 min (4-worker thread pool; 7.4 s of wall-clock time per file on average)
- Total text extracted: 3,576,184 chars of newly searchable evidence text
- Manifest: output/audit_ocr_20260423/REMAINING_MANIFEST.csv

Full OCR Run Summary (this audit session, 2026-04-23)

Batch	Files	OK	Chars extracted
1. Anchor images (Fidelity + NGM Q1 2026)	7	7	~2,500
2. Anchor PDFs (Exhibits + RCL Decl + Royalty-Free License + NGM Agreement)	15	14	~106,900
3. `output/goldberg_financial_attachments/` (image-only)	102	102	~700,000 (est.)
4. Remaining image-only PDFs across evidence/	384	382	3,576,184
Total	508	505 (99.4%)	~4.4M chars
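The run-summary totals can be re-derived from the per-batch rows. The same arithmetic shows 3 failed files overall (1 in batch 2 and 2 in batch 4), and that batch 4's 7.4 s/file figure matches wall-clock time divided by file count rather than single-file processing time on one worker.

```python
batches = [
    # (label, files, ok, chars); the batch 3 character count is itself an estimate
    ("anchor images", 7, 7, 2_500),
    ("anchor PDFs", 15, 14, 106_900),
    ("goldberg_financial_attachments", 102, 102, 700_000),
    ("remaining image-only PDFs", 384, 382, 3_576_184),
]

files = sum(b[1] for b in batches)
ok = sum(b[2] for b in batches)
failures = files - ok
chars = sum(b[3] for b in batches)
batch4_sec_per_file = 47.6 * 60 / 384  # wall clock divided by files

print(files, ok, failures, f"{ok / files:.1%}")  # 508 505 3 99.4%
print(chars)                                     # 4385584
print(round(batch4_sec_per_file, 1))             # 7.4
```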

The discovery corpus is now fully searchable for all anchor documents. Remaining gaps: 2 pdftoppm-failing PDFs (manually repair or skip) + 3 low-yield OCR outputs (<50 chars, likely mostly-blank images).

Audit-internal error corrections

Two errors in the initial Day-1 audit reports (from sub-agents) have been corrected here:
1. 927 "goldberg_financial_attachments" PDFs → actual 102 image-only (the rest already had text layers or were OCR'd).
2. ~1,180 remaining image-only PDFs → actual 395.

— Audit finalized 2026-04-23 —
