Python Examples¶
Real-world examples showing how to use PMCGrab effectively.
Interactive Jupyter Tutorial¶
For a complete hands-on walkthrough, see the interactive Jupyter notebook pmcgrab_tutorial.ipynb, which you can also download directly from GitHub.
This notebook covers:
- Installation and setup (including uv)
- Single paper processing
- Batch processing with multiple papers
- Data analysis and visualization
- RAG chunk preparation
- Export formats for different use cases
Code Examples¶
Batch Processing Multiple Papers¶
This example demonstrates how to process multiple PMC articles and save them as JSON files:
# ─── examples/run_three_pmcs.py ──────────────────────────────────────────────
import json
from pathlib import Path

from pmcgrab.application.processing import process_single_pmc

# The PMC IDs we want to process
PMC_IDS = ["7114487", "3084273", "7690653", "5707528", "7979870"]

OUT_DIR = Path("pmc_output")
OUT_DIR.mkdir(exist_ok=True)

for pmcid in PMC_IDS:
    print(f"Fetching PMC{pmcid}...")
    data = process_single_pmc(pmcid)
    if data is None:
        print(f" FAILED to parse PMC{pmcid}")
        continue

    # Pretty-print a few key fields
    title = data["article"]["title"]["main"]
    abstract_blocks = data["content"]["abstracts"][0]["blocks"]
    abstract_preview = abstract_blocks[0]["text"] if abstract_blocks else ""
    print(
        f" Title : {title[:80]}{'…' if len(title) > 80 else ''}\n"
        f" Abstract: {abstract_preview[:120]}{'…' if len(abstract_preview) > 120 else ''}\n"
        f" Authors : {len(data['article']['contributors']['authors'])}"
    )

    # Persist full JSON
    dest = OUT_DIR / f"PMC{pmcid}.json"
    with dest.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)
    print(f" JSON saved to {dest}\n")
Key Features Demonstrated¶
- Batch Processing: Loops over a list of PMC IDs and processes each article in turn
- Error Handling: Skips any article for which process_single_pmc returns None and continues with the next ID
- Pretty Output: Prints the title, an abstract preview, and the author count for each article
- JSON Export: Saves the complete article data as one structured JSON file per article
- NCBI Settings: Relies on PMCGrab's built-in email rotation and rate configuration
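The loop above can also be wrapped in a small helper that records which IDs failed, so they can be retried or reported later. The following is a minimal sketch, not part of PMCGrab: `process_batch` is a hypothetical name, and the processing function is passed in as the `fetch` argument on the assumption that it behaves like `process_single_pmc` in the example (a dict on success, `None` on failure). Catching exceptions is an extra precaution, since the example does not say whether failures can also surface as exceptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple


def process_batch(
    pmc_ids: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    out_dir: Path,
) -> Tuple[List[Path], List[str]]:
    """Run `fetch` on each ID, save successes as JSON, and collect failures.

    `fetch` is assumed to return a dict on success and None when parsing
    fails, as process_single_pmc does in the example above.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    failed: List[str] = []

    for pmcid in pmc_ids:
        try:
            data = fetch(pmcid)
        except Exception as exc:  # assumption: some failures may raise instead
            print(f" PMC{pmcid} raised {exc!r}")
            data = None

        if data is None:
            failed.append(pmcid)
            continue

        # Same file naming and JSON settings as the example script
        dest = out_dir / f"PMC{pmcid}.json"
        with dest.open("w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, ensure_ascii=False)
        saved.append(dest)

    return saved, failed
```

With the names from the example, a call would look like `saved, failed = process_batch(PMC_IDS, process_single_pmc, OUT_DIR)`; the `failed` list can then be logged or passed to a second run.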
Expected Output¶
Fetching PMC7114487...
 Title : COVID-19 pandemic response and lessons learned from the…
 Abstract: The COVID-19 pandemic has posed unprecedented challenges to global health…
 Authors : 5
 JSON saved to pmc_output/PMC7114487.json

Fetching PMC3084273...
 Title : Machine learning approaches in genomics and personalized medicine…
 Abstract: Recent advances in machine learning have revolutionized the field of…
 Authors : 3
 JSON saved to pmc_output/PMC3084273.json
This example is a good starting point for:
- Building literature review datasets
- Creating training data for AI models
- Systematic research paper analysis
- Academic research workflows
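For dataset-building uses like these, the saved JSON files usually need to be read back into a flat table. The sketch below is one way to do that under stated assumptions: `summarize_saved_articles` is a hypothetical helper, and the key paths (`article.title.main`, `article.contributors.authors`, `content.abstracts[0].blocks[0].text`) are taken from the batch example rather than from a documented schema, so missing keys are tolerated instead of raising.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_saved_articles(out_dir: Path) -> List[Dict[str, object]]:
    """Build one summary row per PMC*.json file found in `out_dir`."""
    rows: List[Dict[str, object]] = []

    for path in sorted(out_dir.glob("PMC*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)

        # Key names follow the batch example; defaults cover missing fields
        article = data.get("article", {})
        title = article.get("title", {}).get("main", "")
        authors = article.get("contributors", {}).get("authors", [])

        abstracts = data.get("content", {}).get("abstracts", [])
        blocks = abstracts[0].get("blocks", []) if abstracts else []
        preview = blocks[0].get("text", "") if blocks else ""

        rows.append(
            {
                "pmcid": path.stem,  # e.g. "PMC7114487"
                "title": title,
                "n_authors": len(authors),
                "abstract_preview": preview[:120],
            }
        )

    return rows
```

Calling `summarize_saved_articles(Path("pmc_output"))` after running the batch script returns a list of dicts that can be written to CSV or loaded into a DataFrame.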