Command Line Interface¶

PMCGrab's CLI processes PMC IDs, PubMed IDs, DOIs, ID files, local XML files, and local XML directories into clean paper JSON by default.

Quick Commands¶

# PMC IDs. Accepts PMC7181753, pmc7181753, or 7181753.
uv run python -m pmcgrab --pmcids 7181753 3539614 --output-dir ./results

# PubMed IDs, converted to PMC IDs before processing.
uv run python -m pmcgrab --pmids 33087749 --output-dir ./results

# DOIs, converted to PMC IDs before processing.
uv run python -m pmcgrab --dois 10.1038/s41586-020-2832-5 --output-dir ./results

# IDs from a text file.
uv run python -m pmcgrab --from-id-file ids.txt --output-dir ./results

# Local JATS XML directory, no network.
uv run python -m pmcgrab --from-dir ./pmc_bulk_xml --output-dir ./results

# Specific local XML files, no network.
uv run python -m pmcgrab --from-file article1.xml article2.xml --output-dir ./results

# Download figure image files alongside the default clean paper JSON.
uv run python -m pmcgrab --pmcids 7181753 --with-images --output-dir ./results

# Emit the metadata-rich full JSON instead of the clean paper view.
uv run python -m pmcgrab --pmcids 7181753 --full-json --output-dir ./results

Input Modes¶

Exactly one input mode is required.

Option	Description
`--pmcids`, `--ids`	PMC IDs to download and process. Bare numeric IDs are treated as PMC IDs.
`--pmids`	PubMed IDs to convert to PMC IDs before processing.
`--dois`	DOIs to convert to PMC IDs before processing.
`--from-id-file`	Text file with one identifier per line. Blank lines and `#` comments are ignored. Bare numeric IDs are treated as PMC IDs; use `--pmids` for PubMed IDs.
`--from-dir`	Directory of local JATS XML files.
`--from-file`	One or more local JATS XML files.

Output Options¶

# One JSON file per successful article.
uv run python -m pmcgrab --pmcids 7181753 --output-dir ./results --format json

# One JSONL file containing all successful articles.
uv run python -m pmcgrab --pmcids 7181753 3539614 --output-dir ./results --format jsonl

# Full V4 metadata/debug JSON.
uv run python -m pmcgrab --pmcids 7181753 --full-json --output-dir ./results

# Full V2/V3 compatibility output.
uv run python -m pmcgrab --pmcids 7181753 --full-json --schema-version 2

Output files:

results/
├── PMC7181753.json
├── PMC3539614.json
└── summary.json

For JSONL:

results/
├── output.jsonl
└── summary.json

summary.json maps each input name or PMC ID to true or false. Article files and JSONL rows are strict JSON. By default, each article uses schema: "pmcgrab.paper.v1" with paper.title, paper.abstract, paper.body, assets.images, and assets.tables. Pass --full-json for the metadata-rich V4 shape.

Exit Codes¶

Code	Meaning
`0`	At least one requested article was processed successfully.
`1`	Inputs were valid, but every requested article failed.
`2`	CLI usage or input validation failed.

Performance and Logging¶

# Use four worker threads.
uv run python -m pmcgrab --pmcids 7181753 3539614 --workers 4

# --batch-size is an alias for --workers.
uv run python -m pmcgrab --pmcids 7181753 3539614 --batch-size 4

# Enable debug logging.
uv run python -m pmcgrab --pmcids 7181753 --verbose

# Suppress progress bars.
uv run python -m pmcgrab --pmcids 7181753 --quiet

# Print package version.
uv run python -m pmcgrab --version

ID File Format¶

# ids.txt
PMC7181753
3539614
10.1038/s41586-020-2832-5

Run:

uv run python -m pmcgrab --from-id-file ids.txt --output-dir ./results

Bare numeric values in ID files are interpreted as PMC IDs. If your file contains PubMed IDs, convert them with --pmids or call normalize_pmid() in Python first.

Current Help Output¶

usage: __main__.py [-h] (--pmcids PMCIDS [PMCIDS ...] |
                   --pmids PMIDS [PMIDS ...] | --dois DOIS [DOIS ...] |
                   --from-id-file FROM_ID_FILE | --from-dir FROM_DIR |
                   --from-file FROM_FILES [FROM_FILES ...])
                   [--output-dir OUTPUT_DIR] [--batch-size BATCH_SIZE]
                   [--format {json,jsonl}] [--full-json] [--with-images]
                   [--schema-version {2,3,4}] [--verbose] [--quiet] [--version]