Command Line Interface¶
PMCGrab's command-line interface for batch processing and article retrieval.
CLI Module¶
pmcgrab.cli.pmcgrab_cli ¶
Functions¶
main ¶
Main CLI entry point for batch PMC article processing.
Orchestrates the complete batch-processing workflow:

1. Parse command-line arguments
2. Create the output directory structure
3. Process PMC IDs in manageable chunks with progress tracking
4. Collect and report processing statistics
5. Write summary results to a JSON file
The function processes articles in 100-article chunks to manage memory usage and provide regular progress updates. Each chunk is processed concurrently using the specified number of worker threads.
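The chunk-and-pool pattern described above can be sketched as follows. This is a minimal illustration, not PMCGrab's actual implementation; `process_one` is a hypothetical stand-in for the per-article worker function.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 100  # articles per chunk, matching the behavior described above


def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def process_in_chunks(pmc_ids, process_one, workers=4):
    """Process PMC IDs chunk by chunk, each chunk concurrently.

    Returns a mapping of PMC ID -> success flag and prints a progress
    line after every chunk.
    """
    results = {}
    for chunk_no, chunk in enumerate(chunked(pmc_ids, CHUNK_SIZE), start=1):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for pid, ok in zip(chunk, pool.map(process_one, chunk)):
                results[pid] = ok
        succeeded = sum(1 for p in chunk if results[p])
        print(f"Chunk {chunk_no}: {succeeded}/{len(chunk)} succeeded")
    return results
```

Bounding each chunk in its own `ThreadPoolExecutor` context keeps peak memory proportional to the chunk size rather than the full ID list.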
Output
Creates individual JSON files for each successfully processed article in the output directory, plus a summary.json file containing processing statistics for all articles.
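The summary step could look roughly like this. The function name and statistics keys here are assumptions based on the description above, not PMCGrab's actual code; only the `summary.json` filename comes from the source.

```python
import json
from pathlib import Path


def write_summary(stats, output_dir):
    """Write aggregate processing statistics (a hypothetical dict) to
    summary.json alongside the per-article JSON files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # mirror the directory-creation step
    path = out / "summary.json"
    path.write_text(json.dumps(stats, indent=2))
    return path
```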
Examples:
This function is typically invoked as a module:

```shell
python -m pmcgrab.cli.pmcgrab_cli --pmcids 7181753 3539614
```
Note
The function assumes that process_pmc_ids() handles the actual file writing for individual articles. It focuses on orchestration, progress tracking, and summary generation.
Source code in src/pmcgrab/cli/pmcgrab_cli.py
Usage Examples¶
Basic Commands¶
```shell
# Process a single paper
uv run python -m pmcgrab PMC7181753

# Process multiple papers
uv run python -m pmcgrab PMC7181753 PMC3539614 PMC5454911
```
Advanced Options¶
```shell
# Custom output directory
uv run python -m pmcgrab --output-dir ./results PMC7181753

# Parallel processing
uv run python -m pmcgrab --workers 8 PMC7181753 PMC3539614

# Read PMC IDs from a file
uv run python -m pmcgrab --input-file pmc_ids.txt --max-retries 3
```
All Options¶
`--output-dir`
: Output directory (default: `./pmc_output`)

`--workers`
: Number of parallel workers (default: 4)

`--email`
: Contact email for the NCBI API

`--input-file`
: Read PMC IDs from a file

`--max-retries`
: Maximum retry attempts for failed downloads

`--batch-size`
: Number of articles per batch

`--timeout`
: Request timeout in seconds

`--verbose`
: Enable verbose logging

`--help`
: Show help message
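A parser accepting the options above could be sketched with `argparse` as follows. This is an illustrative reconstruction consistent with the documented flags and defaults, not the package's actual parser; the positional `pmcids` argument is an assumption based on the usage examples.

```python
import argparse


def build_parser():
    """Build an argument parser mirroring the documented CLI options."""
    p = argparse.ArgumentParser(prog="pmcgrab")
    p.add_argument("pmcids", nargs="*", help="PMC IDs to process")
    p.add_argument("--output-dir", default="./pmc_output",
                   help="Output directory")
    p.add_argument("--workers", type=int, default=4,
                   help="Number of parallel workers")
    p.add_argument("--email", help="Contact email for the NCBI API")
    p.add_argument("--input-file", help="Read PMC IDs from a file")
    p.add_argument("--max-retries", type=int,
                   help="Maximum retry attempts for failed downloads")
    p.add_argument("--batch-size", type=int,
                   help="Number of articles per batch")
    p.add_argument("--timeout", type=float,
                   help="Request timeout in seconds")
    p.add_argument("--verbose", action="store_true",
                   help="Enable verbose logging")
    return p
```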