Universal Document
790 oncology ER records · ClinicalTrials.gov · PubMed · OpenFDA · Physician-reviewed, not scraped and resold raw

The ER Oncology Dataset: 790 Physician-Validated Records for AI Training

AI needs good data. But PubMed indexes more than 30 million articles. Which ones matter?

We read them. Sonny Saggar, MD, a practicing ER physician with 30 years of experience, reviewed all 790 oncology records. Each one was graded, scored, and annotated. For every record you get 10 quality scores, clinical context, and actionable takeaways.

Stop wasting your engineers' time. Start training on data that actually works.

The Problem

"AI is only as good as its training data. But PubMed indexes more than 30 million articles."

Healthcare AI startups face a serious problem: they drown in irrelevant, low-quality medical literature while missing the high-impact studies that matter.

Engineers waste weeks filtering noise. Models hallucinate on edge cases. Clinical decision support systems fail because they're trained on broad, unspecific data.

The Solution

"The ER Oncology Dataset — 790 Records, 10 Quality Scores, Physician-Validated"

790 physician-validated oncology ER records • 3 customers served

À La Carte Dataset Filtering

Customize your dataset properties before checkout. If no filters are selected, you will receive the full, untruncated dataset.
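Filtering is applied at checkout. If you want to reproduce a similar subset locally from a full export, a minimal sketch might look like the following. The field names (`condition`, `evidence_grade`, `er_applicability`) and the function `filter_records` are hypothetical, modeled on the sample record preview; match them to the columns in your actual file.

```python
"""Sketch: filter exported records by evidence grade, ER applicability, and condition.

Field names are hypothetical; check them against your export's headers.
"""
from typing import Iterable, Optional


def filter_records(
    records: Iterable[dict],
    grades: Optional[set[str]] = None,
    min_er_applicability: Optional[float] = None,
    condition_keyword: Optional[str] = None,
) -> list[dict]:
    """Return records matching every criterion that was supplied.

    Criteria left as None are skipped, so calling with no criteria
    returns every record (the "no filters = full dataset" case).
    """
    selected = []
    for rec in records:
        # Keep only the requested evidence grades, e.g. {"Grade A"}.
        if grades is not None and rec.get("evidence_grade") not in grades:
            continue
        # Drop records below the ER applicability threshold (0-10 scale).
        if (
            min_er_applicability is not None
            and rec.get("er_applicability", 0.0) < min_er_applicability
        ):
            continue
        # Case-insensitive substring match on the condition text.
        if (
            condition_keyword is not None
            and condition_keyword.lower() not in rec.get("condition", "").lower()
        ):
            continue
        selected.append(rec)
    return selected
```

For example, `filter_records(records, grades={"Grade A"}, min_er_applicability=8.0)` would keep only Grade A records scoring at least 8.0 for ER applicability.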

How It Works

Our rigorous physician validation flow ensures that only high-utility, clinically accurate oncology ER records make it to your training pipeline.

👁️

1. Collect source records

Raw records matching oncology emergency profiles are collected from ClinicalTrials.gov, PubMed, and OpenFDA.

✍️

2. Score against 10 rules

Every record is checked against 10 hardcoded logic rules assessing study type, data completeness, evidence level, and ER relevance.

✅ / ❌

3. Review, annotate, and approve or reject

Sonny Saggar, MD personally reviews and annotates every record on the clinician dashboard, then approves or rejects it.

📥

4. Export curated dataset

We deliver structured datasets in CSV/JSON formats with full scorecard validation matrices and custom notes.
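As a rough illustration of how an approve-or-reject gate can feed a CSV and JSON export, here is a minimal sketch. It assumes each record is a dict with a hypothetical `approved` flag set during review, a `notes` string, and a `scorecard` dict mapping rule names to scores; the function `export_approved` is illustrative and is not the platform's export code.

```python
"""Sketch: export only approved records, as CSV text and JSON text.

The keys "approved", "notes", and "scorecard" are hypothetical.
"""
import csv
import io
import json


def export_approved(records: list[dict]) -> tuple[str, str]:
    """Return (csv_text, json_text) containing only approved records."""
    # Rejected records never reach either output.
    approved = [r for r in records if r.get("approved")]

    # JSON keeps each record's scorecard nested as-is.
    json_text = json.dumps(approved, indent=2)

    # CSV needs flat rows, so each rule score becomes its own column.
    rule_names = sorted({name for r in approved for name in r.get("scorecard", {})})
    fieldnames = ["title", "notes"] + [f"rule_{name}" for name in rule_names]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for r in approved:
        row = {"title": r.get("title", ""), "notes": r.get("notes", "")}
        for name in rule_names:
            # Blank cell if a record lacks a score for this rule.
            row[f"rule_{name}"] = r.get("scorecard", {}).get(name, "")
        writer.writerow(row)
    return buf.getvalue(), json_text
```

The design choice here is to keep the nested scorecard in JSON but flatten it into one `rule_*` column per rule in CSV, so the CSV loads as a single flat table.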

Sample Dataset Preview

A visual representation of the flat database schema and structured physician notes included in every export.

Title: Low-dose low-molecular-weight heparin vs placebo in ambulatory cancer patients.
Condition: Cancer-Associated Thrombosis (CAT)
Evidence Grade: Grade A  |  ER Applicability: 9.2/10  |  Actionability: STAT
Physician Notes: Double-blind RCT. n=115. Clear risk reduction in cancer outpatients. Useful for ER triage. Caveat: LMWH dosing needs careful adjustment in patients with renal failure.
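To work with records like the one above in code, a typed container helps catch malformed rows early. This is a sketch only: the class name `OncologyERRecord`, the function `parse_record`, the snake_case field names, and the 0 to 10 range check are assumptions drawn from the preview, not a published schema.

```python
"""Sketch: a typed view of one flat record, modeled on the preview fields.

All names here are hypothetical and not an official schema.
"""
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OncologyERRecord:
    title: str
    condition: str
    evidence_grade: str        # e.g. "Grade A"
    er_applicability: float    # shown on a 0-10 scale in the preview
    actionability: str         # e.g. "STAT"
    physician_notes: str

    def __post_init__(self) -> None:
        # Reject scores outside the 0-10 scale shown in the preview.
        if not 0.0 <= self.er_applicability <= 10.0:
            raise ValueError(f"er_applicability out of range: {self.er_applicability}")


def parse_record(text: str) -> OncologyERRecord:
    """Parse one JSON object into a record; missing keys raise KeyError."""
    data = json.loads(text)
    return OncologyERRecord(
        title=data["title"],
        condition=data["condition"],
        evidence_grade=data["evidence_grade"],
        er_applicability=float(data["er_applicability"]),
        actionability=data["actionability"],
        physician_notes=data["physician_notes"],
    )
```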

10 Scoring Rules That Filter the Noise

Hardcoded in the validator, not a marketing checklist: every exported record carries its full rule breakdown.

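This page names four of the areas the rules assess (study type, data completeness, evidence level, ER relevance) but not the ten rules themselves. The sketch below shows the general pattern of a rule-based scorecard that keeps a per-rule breakdown instead of a single pass/fail verdict. The `RULES` table, its four predicates, their thresholds, and `score_record` are hypothetical stand-ins, not the validator's actual logic.

```python
"""Sketch: a rule-based scorecard that keeps a per-rule breakdown.

The four rules below are hypothetical illustrations; the real validator
defines ten rules that are not reproduced here.
"""
from typing import Callable

Rule = Callable[[dict], bool]

# Hypothetical predicates and thresholds, for illustration only.
RULES: dict[str, Rule] = {
    "study_type": lambda r: r.get("study_type") in {"RCT", "meta-analysis", "cohort"},
    "data_completeness": lambda r: all(r.get(k) for k in ("title", "condition", "summary")),
    "evidence_level": lambda r: r.get("evidence_grade") in {"Grade A", "Grade B"},
    "er_relevance": lambda r: r.get("er_applicability", 0.0) >= 7.0,
}


def score_record(record: dict, rules: dict[str, Rule] = RULES) -> dict:
    """Apply every rule (no short-circuit) and return the full breakdown."""
    breakdown = {name: bool(rule(record)) for name, rule in rules.items()}
    return {
        "breakdown": breakdown,            # per-rule pass/fail, kept with the record
        "passed": sum(breakdown.values()),
        "total": len(breakdown),
    }
```

Evaluating every rule rather than stopping at the first failure is what lets each exported record carry its complete breakdown.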

Meet the Physician & Curator

Every oncology emergency record is verified by a practicing physician applying clinical-review standards.


Sonny Saggar, MD

Chief Medical Officer & Practicing ER Physician

Sonny Saggar, MD brings over 30 years of emergency department experience. He has personally reviewed, scored, and annotated every record on the Universal Document platform to ensure it meets strict clinical quality indicators, making the data well suited to healthcare AI models, clinical researchers, and pharmacovigilance teams.

View publications by Sonny Saggar, MD on SSRN ↗

What Customers Say

Early feedback from health AI teams training on our structured oncology emergency datasets.

"Having physician-verified notes saved our engineers hundreds of hours of preprocessing. The quality of our oncology triage bot improved dramatically."

— Lead AI Scientist, HealthTech Unicorn
Coming soon from real customers

"The 10-rule scorecard combined with ER applicability scores allowed us to filter out noise instantly. Highly recommended."

— Director of Pharmacovigilance, Global Pharma
Coming soon from real customers

Dataset specifications & pricing tiers

All tiers are physician-reviewed. Every record ships with its full 10-rule scorecard.

Mini

Free / 50 records
  • Level 2 reviewed sample
  • Requires email verification
  • Manual review & approval

Growth

$3,500 / 500 records
  • Level 2/3 mixed, audit-grade top records flagged
  • Priority condition weighting available on request
  • CSV + JSON delivery

Frequently Asked Questions

Q: Who validates the data?

A: Sonny Saggar, MD personally reviews every record. No automated shortcut replaces that review.

Q: What sources are used?

A: ClinicalTrials.gov, PubMed, and OpenFDA.

Q: Can I filter by disease area or evidence grade?

A: Yes, customers can filter subsets using custom criteria during checkout.

Q: Is this data suitable for AI training?

A: Yes, it's specifically curated for machine learning applications in healthcare.

Q: What format is the data in?

A: CSV and JSON, ready for any AI pipeline.

Q: How many records are currently available?

A: 790 records, growing weekly.

Q: Do you provide updates?

A: Enterprise customers receive quarterly updates.

Ready to Train Your AI on Physician-Validated Data?

Contact Support