Processing API
Functions for processing PMC articles efficiently.
Primary Processing Function
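The recommended way to process a PMC article is process_single_pmc(), which takes a PMC ID and returns the parsed article as a dictionary, or None if the article could not be parsed. See the Core API for the generated function reference.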
Recommended Usage Pattern
```python
# ─── Recommended Processing Pattern ──────────────────────────────────────────
import json
from pathlib import Path

from pmcgrab.application.processing import process_single_pmc

# The PMC IDs we want to process
PMC_IDS = ["7114487", "3084273", "7690653", "5707528", "7979870"]
OUT_DIR = Path("pmc_output")
OUT_DIR.mkdir(exist_ok=True)

for pmcid in PMC_IDS:
    print(f"Fetching PMC{pmcid}...")
    data = process_single_pmc(pmcid)
    if data is None:
        print(f"  FAILED to parse PMC{pmcid}")
        continue

    # Pretty-print a few key fields
    title = data["paper"]["title"]
    # Guard against articles that have no abstract section
    abstract_sections = data["paper"].get("abstract") or []
    abstract_blocks = abstract_sections[0]["content"] if abstract_sections else []
    abstract_preview = abstract_blocks[0]["text"] if abstract_blocks else ""
    print(
        f"  Title   : {title[:80]}{'…' if len(title) > 80 else ''}\n"
        f"  Abstract: {abstract_preview[:120]}{'…' if len(abstract_preview) > 120 else ''}\n"
        f"  PMCID   : {data['identifiers']['pmcid']}"
    )

    # Persist the full JSON
    dest = OUT_DIR / f"PMC{pmcid}.json"
    with dest.open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)
    print(f"  JSON saved to {dest}\n")
```
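When processing many IDs, it can help to wrap the loop above in a function that writes each successful result and reports which IDs failed. The sketch below is a hypothetical helper, not part of pmcgrab: `save_articles` and its `fetch` parameter are illustrative names, and `fetch` is assumed to behave like process_single_pmc() (a dict on success, None on failure). Passing the fetch function in also makes the helper easy to test without network access.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


def save_articles(
    pmcids: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], Optional[dict[str, Any]]],
) -> list[str]:
    """Fetch each PMC ID, save successes as JSON, and return the failed IDs.

    Hypothetical helper (not pmcgrab API). `fetch` is assumed to return a
    dict on success and None on failure, like process_single_pmc().
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed: list[str] = []
    for pmcid in pmcids:
        data = fetch(pmcid)
        if data is None:
            # Record the failure and move on to the next ID
            failed.append(pmcid)
            continue
        dest = out_dir / f"PMC{pmcid}.json"
        with dest.open("w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2, ensure_ascii=False)
    return failed
```

For example, `save_articles(PMC_IDS, OUT_DIR, process_single_pmc)` would write one JSON file per parsed article and return the list of IDs that could not be parsed, which you can retry or log.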
Email Management
next_email() automatically rotates through the available email addresses, supplying one for each NCBI API request so that every request carries the contact email NCBI asks clients to provide. See the Core API for the generated function reference.
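Round-robin rotation of this kind can be sketched with `itertools.cycle`. This is an illustration only, not pmcgrab's implementation: the `EmailRotator` class and the example addresses are hypothetical, and the real next_email() may differ in where its address pool comes from and how it handles concurrency.

```python
import itertools
import threading
from typing import Iterable, Iterator


class EmailRotator:
    """Hypothetical round-robin pool of contact emails (not pmcgrab's code)."""

    def __init__(self, emails: Iterable[str]) -> None:
        pool = list(emails)
        if not pool:
            raise ValueError("at least one email address is required")
        # cycle() repeats the pool endlessly in order: a, b, a, b, ...
        self._cycle: Iterator[str] = itertools.cycle(pool)
        # A lock keeps the rotation order consistent across threads
        self._lock = threading.Lock()

    def next_email(self) -> str:
        """Return the next address, wrapping around after the last one."""
        with self._lock:
            return next(self._cycle)


rotator = EmailRotator(["a@example.org", "b@example.org"])
print([rotator.next_email() for _ in range(3)])
# prints ['a@example.org', 'b@example.org', 'a@example.org']
```

The sketch only chooses addresses; it does not throttle anything, so keeping requests under NCBI's limit (3 requests per second without an API key) would need separate handling.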