790 oncology ER records · ClinicalTrials.gov · PubMed · OpenFDA · Physician-reviewed, not scraped-and-sold raw
The ER Oncology Dataset: 790 Physician-Validated Records for AI Training
AI needs good data. But PubMed has 30 million articles. Which ones matter?
We read them. Sonny Saggar, MD, a practicing ER physician with 30 years of experience, reviewed all 790 oncology records. Each one was graded, scored, and annotated. You get 10 quality scores, clinical context, and real takeaways.
Stop wasting your engineers' time. Start training on data that actually works.
The Problem
"AI is only as good as its training data. But PubMed has 30 million articles."
AI healthcare startups face a massive problem: drowning in irrelevant, low-quality medical literature while missing the high-impact studies that matter.
Engineers waste weeks filtering noise. Models hallucinate on edge cases. Clinical decision support systems fail because they're trained on broad, unspecific data.
The Solution
"The ER Oncology Dataset — 790 Records, 10 Quality Scores, Physician-Validated"
790 oncology records focused on ER complications: febrile neutropenia, spinal cord compression, cancer thrombosis, hypercalcemia, and cancer pain
10 quality scores applied to every single record for custom subsets
Each record is annotated with physician notes written by Sonny Saggar, MD, a practicing ER physician with 30 years of experience
Delivered as ready-to-ingest JSON and CSV for AI training, academic research, and regulatory submissions
790 records validated • 3 customers served
Physician-Validated Oncology ER Records Seeded Live
A La Carte Dataset Filtering
Customize your dataset properties before checkout. If no filters are selected, you will receive the full un-truncated dataset.
How It Works
Our rigorous physician validation flow ensures that only high-utility, clinically accurate oncology ER records make it to your training pipeline.
👁️
1. I review every record
Raw records are collected from ClinicalTrials.gov, PubMed, and OpenFDA matching oncology emergency profiles.
✍️
2. Add my physician notes
Every record passes 10 hardcoded logic rules assessing study type, data completeness, evidence levels, and ER relevance.
✅ / ❌
3. Approve or reject
Sonny Saggar, MD personally reviews, annotates, and approves every record on the clinician dashboard.
📥
4. Export curated dataset
We deliver structured datasets in CSV/JSON formats with full scorecard validation matrices and custom notes.
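The rule-based step above can be pictured with a minimal sketch. The page does not list the 10 rules, so the four checks below are hypothetical stand-ins for the categories it names (study type, data completeness, evidence level, ER relevance). The `RawRecord` fields and the keyword list are likewise assumptions for illustration, not the validator's actual logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class RawRecord:
    """Hypothetical record shape; the real export schema may differ."""
    title: str
    study_type: str
    sample_size: Optional[int]
    evidence_grade: Optional[str]
    condition: str


# Keywords for the five ER complications named on this page (assumed matching scheme).
ER_KEYWORDS = ("neutropenia", "cord compression", "thrombosis", "hypercalcemia", "pain")

# Hypothetical rules: each maps a record to pass (True) or fail (False).
RULES: Dict[str, Callable[[RawRecord], bool]] = {
    "study_type_present": lambda r: bool(r.study_type.strip()),
    "sample_size_present": lambda r: r.sample_size is not None and r.sample_size > 0,
    "evidence_grade_present": lambda r: bool(r.evidence_grade),
    "er_relevant": lambda r: any(k in r.condition.lower() for k in ER_KEYWORDS),
}


def scorecard(record: RawRecord) -> Dict[str, bool]:
    """Return the per-rule breakdown so it can travel with the record."""
    return {name: rule(record) for name, rule in RULES.items()}


def passes_all(record: RawRecord) -> bool:
    """A record moves on to physician review only if every rule passes."""
    return all(scorecard(record).values())


if __name__ == "__main__":
    rec = RawRecord(
        title="Low-dose low-molecular-weight heparin vs placebo in ambulatory cancer patients.",
        study_type="RCT",
        sample_size=115,
        evidence_grade="A",
        condition="Cancer-Associated Thrombosis (CAT)",
    )
    print(scorecard(rec))
```

Keeping each rule as a named function makes it straightforward to ship the full pass/fail breakdown alongside every record rather than a single opaque score.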
Sample Dataset Preview
A visual representation of the flat database schema and structured physician notes included in every export.
Title: Low-dose low-molecular-weight heparin vs placebo in ambulatory cancer patients.
Condition: Cancer-Associated Thrombosis (CAT)
Evidence Grade: Grade A | ER Applicability: 9.2/10 | Actionability: STAT
Physician Notes: Double-blind RCT. n=115. Clear risk reduction in cancer outpatients. Useful for ER triage. Note: LMWH dose needs careful titration in renal failure patients.
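As a rough illustration of how a record like the one above might be consumed from the JSON export, here is a short sketch that loads records and filters them by score, grade, or condition. The key names (`evidence_grade`, `er_applicability`, and so on) are assumptions, since the page does not publish the export schema, and the second record is a made-up placeholder included only so the filter has something to exclude.

```python
import json
from typing import Iterable, List, Optional, Set

# Hypothetical export: key names are assumed, not taken from the real schema.
SAMPLE_EXPORT = """
[
  {
    "title": "Low-dose low-molecular-weight heparin vs placebo in ambulatory cancer patients.",
    "condition": "Cancer-Associated Thrombosis (CAT)",
    "evidence_grade": "A",
    "er_applicability": 9.2,
    "actionability": "STAT",
    "physician_notes": "Double-blind RCT. n=115. Clear risk reduction in cancer outpatients. Useful for ER triage."
  },
  {
    "title": "PLACEHOLDER record (not from the dataset)",
    "condition": "Cancer Pain",
    "evidence_grade": "C",
    "er_applicability": 3.0,
    "actionability": "ROUTINE",
    "physician_notes": "Placeholder text for illustration only."
  }
]
"""


def filter_records(
    records: Iterable[dict],
    min_er_applicability: float = 0.0,
    evidence_grades: Optional[Set[str]] = None,
    condition_contains: Optional[str] = None,
) -> List[dict]:
    """Keep records that satisfy every filter that was supplied."""
    kept = []
    for rec in records:
        if rec["er_applicability"] < min_er_applicability:
            continue
        if evidence_grades is not None and rec["evidence_grade"] not in evidence_grades:
            continue
        if condition_contains and condition_contains.lower() not in rec["condition"].lower():
            continue
        kept.append(rec)
    return kept


if __name__ == "__main__":
    records = json.loads(SAMPLE_EXPORT)
    for rec in filter_records(records, min_er_applicability=9.0, evidence_grades={"A"}):
        print(rec["title"])
```

With no filters supplied, `filter_records` returns every record, which mirrors receiving the full dataset when no filters are selected.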
10 Scoring Rules That Filter the Noise
Hardcoded in the validator, not a marketing checklist — every exported record carries its full rule breakdown.
Meet the Physician & Curator
Every oncology emergency record is verified by a practicing physician against clinical-review standards.
Sonny Saggar, MD
Chief Medical Officer & Practicing ER Physician
Sonny Saggar, MD brings over 30 years of medical experience in the emergency department. He has personally reviewed, scored, and annotated every record on the Universal Document platform to ensure it meets strict clinical quality indicators, making it optimal for healthcare AI models, clinical researchers, and pharmacovigilance teams.
Early feedback from health AI teams training on our structured oncology emergency datasets.
"Having physician-verified notes saved our engineers hundreds of hours of preprocessing. The quality of our oncology triage bot improved dramatically."
— Lead AI Scientist, HealthTech Unicorn
Coming soon from real customers
"The 10-rule scorecard combined with ER applicability scores allowed us to filter out noise instantly. Highly recommended."
— Director of Pharmacovigilance, Global Pharma
Coming soon from real customers
Dataset specifications & pricing tiers
All tiers are physician-reviewed. Every record ships with its full 10-rule scorecard.
Mini
Free / 50 records
Level 2 reviewed sample
Requires email verification
Manual review & approval
Starter
$2,000 / 250 records
Level 2 Clinical Review
All 5 conditions represented
CSV + JSON delivery
Growth
$3,500 / 500 records
Level 2/3 mixed, audit-grade top records flagged
Priority condition weighting available on request
CSV + JSON delivery
Frequently Asked Questions
Q: Who validates the data?
A: Sonny Saggar, MD personally reviews every record. The automated 10-rule scorecard supplements his review and never replaces it.
Q: What sources are used?
A: ClinicalTrials.gov, PubMed, and OpenFDA.
Q: Can I filter by disease area or evidence grade?
A: Yes, customers can filter subsets using custom criteria during checkout.
Q: Is this data suitable for AI training?
A: Yes, it's specifically curated for machine learning applications in healthcare.