Command Line Interface
PMCGrab provides a command-line interface for retrieving PubMed Central (PMC) articles, either individually or in batches.
Basic Usage
Single Article
# Process a single PMC article
uv run python -m pmcgrab PMC7181753
# Specify email (required by NCBI)
uv run python -m pmcgrab --email your-email@example.com PMC7181753
Multiple Articles
# Process multiple articles
uv run python -m pmcgrab PMC7181753 PMC3539614 PMC5454911
# From a file (one PMC ID per line)
uv run python -m pmcgrab --input-file pmc_ids.txt
Command Options
Output Configuration
# Custom output directory
uv run python -m pmcgrab --output-dir ./results PMC7181753
# Create a date-stamped directory
uv run python -m pmcgrab --output-dir ./results_$(date +%Y%m%d) PMC7181753
Performance Options
# Parallel processing
uv run python -m pmcgrab --workers 8 PMC7181753 PMC3539614
# Batch size configuration
uv run python -m pmcgrab --batch-size 20 --workers 4 PMC7181753 PMC3539614
# Request timeout in seconds
uv run python -m pmcgrab --timeout 60 PMC7181753
Error Handling
# Retry configuration
uv run python -m pmcgrab --max-retries 5 PMC7181753
# Verbose output
uv run python -m pmcgrab --verbose PMC7181753
# Suppress warnings
uv run python -m pmcgrab --quiet PMC7181753
Complete Example
uv run python -m pmcgrab \
--email your-email@example.com \
--output-dir ./pmc_results \
--workers 8 \
--batch-size 25 \
--max-retries 3 \
--timeout 60 \
--verbose \
PMC7181753 PMC3539614 PMC5454911
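Input Files
Text File Format
Create pmc_ids.txt with one PMC ID per line, for example:
PMC7181753
PMC3539614
PMC5454911
Then run:
uv run python -m pmcgrab --input-file pmc_ids.txt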
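Hand-assembled ID lists often contain blank lines, comments, duplicates, or bare numeric IDs. Whether the CLI tolerates these is not stated on this page, so a cautious option is to clean the file first. The sketch below uses only standard tools; normalize_ids is a hypothetical helper (not part of PMCGrab), and it assumes IDs should end up in the PMC<digits> form used throughout this guide.

```shell
# normalize_ids: hypothetical helper, not part of PMCGrab.
# Assumes the target format is one "PMC<digits>" ID per line.
# Steps: drop Windows carriage returns, trim whitespace, skip blank
# lines and '#' comments, add the PMC prefix to bare numbers,
# upper-case a lowercase prefix, and remove duplicates (first wins).
normalize_ids() {
  local in_file=$1
  tr -d '\r' < "$in_file" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v -e '^$' -e '^#' \
    | sed -E -e 's/^([0-9]+)$/PMC\1/' -e 's/^[Pp][Mm][Cc]/PMC/' \
    | awk '!seen[$0]++'
}

# Example:
# normalize_ids raw_ids.txt > pmc_ids.txt
```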
CSV Input
For CSV files with PMC IDs in a specific column:
# If PMC IDs are in 'pmcid' column
uv run python -m pmcgrab --input-csv articles.csv --pmcid-column pmcid
# If PMC IDs are in 'id' column
uv run python -m pmcgrab --input-csv data.csv --pmcid-column id
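If you want to inspect or clean the IDs before a run, you can also pull a CSV column out into the plain one-ID-per-line format and use --input-file instead. This is a minimal sketch that assumes a simple CSV (comma delimiter, header on the first row, no quoted fields containing commas or newlines); csv_column is a hypothetical helper name.

```shell
# csv_column: hypothetical helper, not part of PMCGrab.
# Prints the non-empty values of the column whose header equals $2.
# Assumes a simple CSV: comma-delimited, header on line 1, and no
# quoted fields that contain commas or line breaks.
csv_column() {
  local csv_file=$1 column=$2
  tr -d '\r' < "$csv_file" | awk -F',' -v col="$column" '
    NR == 1 {
      # Find the index of the requested header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    $idx != "" { print $idx }
  '
}

# Example:
# csv_column articles.csv pmcid > pmc_ids.txt
```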
Output Files
The CLI creates several output files:
Individual Article Files
output_directory/
├── PMC7181753.json # Individual article data
├── PMC3539614.json
└── PMC5454911.json
Summary Files
output_directory/
├── processing_summary.json # Processing statistics
├── failed_pmcids.txt # Failed PMC IDs
└── processing.log # Detailed log (if --log-file used)
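To check a finished run at a glance, you can count the per-article JSON files and the entries in failed_pmcids.txt. A sketch assuming the layout shown above (one PMC*.json file per article and an optional failed_pmcids.txt with one ID per line); summarize_run is a hypothetical helper name.

```shell
# summarize_run: hypothetical helper, not part of PMCGrab.
# Assumes the layout above: PMC*.json per article, plus an optional
# failed_pmcids.txt listing one failed ID per line.
summarize_run() {
  local out_dir=$1 ok failed=0
  # Summary files such as processing_summary.json do not match PMC*.json
  ok=$(find "$out_dir" -maxdepth 1 -type f -name 'PMC*.json' | wc -l)
  if [ -f "$out_dir/failed_pmcids.txt" ]; then
    # Count non-blank lines; grep -c prints 0 (and exits 1) on no match
    failed=$(grep -c '[^[:space:]]' "$out_dir/failed_pmcids.txt" || true)
  fi
  # Arithmetic expansion strips the padding some wc versions add
  echo "succeeded=$((ok)) failed=$((failed))"
}

# Example:
# summarize_run ./pmc_results
```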
Environment Variables
Set default values using environment variables:
export PMCGRAB_EMAIL="your-email@example.com"
export PMCGRAB_OUTPUT_DIR="./default_output"
export PMCGRAB_WORKERS=8
export PMCGRAB_BATCH_SIZE=20
export PMCGRAB_TIMEOUT=60
export PMCGRAB_MAX_RETRIES=3
# Now you can run with defaults
uv run python -m pmcgrab PMC7181753
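To keep these defaults per project rather than in your shell profile, you can store them in a file and export them before running the CLI. This sketch relies only on standard shell behavior; the file name pmcgrab.env is an arbitrary choice, and the variable names are the ones listed above.

```shell
# Store project defaults (pmcgrab.env is an arbitrary file name)
cat > pmcgrab.env <<'EOF'
PMCGRAB_EMAIL="your-email@example.com"
PMCGRAB_OUTPUT_DIR="./default_output"
PMCGRAB_WORKERS=8
PMCGRAB_TIMEOUT=60
EOF

# 'set -a' exports every variable assigned while it is active,
# so child processes (such as the CLI) inherit the values
set -a
. ./pmcgrab.env
set +a

# Then run as usual:
# uv run python -m pmcgrab PMC7181753
```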
Advanced Usage
Filtering and Validation
# Validate XML structure
uv run python -m pmcgrab --validate PMC7181753
# Skip validation for speed
uv run python -m pmcgrab --no-validate PMC7181753
# Download and cache XML files
uv run python -m pmcgrab --download --cache-dir ./xml_cache PMC7181753
Resume Processing
# Resume from previous failed run
uv run python -m pmcgrab --resume --input-dir ./previous_output PMC7181753 PMC3539614
# Or resume from failed IDs file
uv run python -m pmcgrab --input-file ./previous_output/failed_pmcids.txt
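If a run was interrupted before failed_pmcids.txt was written, another way to resume is to list the IDs from the original file that have no JSON output yet and pass only those back with --input-file. A sketch assuming the PMC<id>.json naming shown under Output Files; remaining_ids is a hypothetical helper, and it does not check whether existing JSON files are complete.

```shell
# remaining_ids: hypothetical helper, not part of PMCGrab.
# Prints IDs from $1 (one per line) with no $2/<ID>.json file yet,
# preserving the original order. Assumes the PMC<id>.json naming.
remaining_ids() {
  local id_file=$1 out_dir=$2 id cr
  cr=$(printf '\r')
  # '|| [ -n "$id" ]' also handles a last line without a newline
  while IFS= read -r id || [ -n "$id" ]; do
    id=${id%"$cr"}                      # tolerate Windows line endings
    [ -z "$id" ] && continue            # skip blank lines
    [ -f "$out_dir/$id.json" ] || printf '%s\n' "$id"
  done < "$id_file"
}

# Example:
# remaining_ids pmc_ids.txt ./previous_output > remaining.txt
# uv run python -m pmcgrab --input-file remaining.txt --output-dir ./previous_output
```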
Logging Options
# Enable detailed logging
uv run python -m pmcgrab --verbose --log-file processing.log PMC7181753
# Different log levels
uv run python -m pmcgrab --log-level DEBUG PMC7181753
uv run python -m pmcgrab --log-level WARNING PMC7181753
Batch Processing Examples
Small Scale (< 100 articles)
uv run python -m pmcgrab \
--input-file small_list.txt \
--workers 4 \
--batch-size 10 \
--email your-email@example.com
Medium Scale (100-1000 articles)
uv run python -m pmcgrab \
--input-file medium_list.txt \
--workers 8 \
--batch-size 25 \
--max-retries 3 \
--timeout 45 \
--verbose \
--email your-email@example.com
Large Scale (1000+ articles)
uv run python -m pmcgrab \
--input-file large_list.txt \
--workers 12 \
--batch-size 50 \
--max-retries 5 \
--timeout 90 \
--cache-dir ./xml_cache \
--log-file large_processing.log \
--email your-email@example.com
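The three tiers above can be chosen automatically from the length of the ID list. This sketch only encodes the thresholds and numeric values from the examples on this page (fewer than 100 IDs, fewer than 1000, and 1000 or more, so a list of exactly 1000 lands in the large tier); scale_flags is a hypothetical helper, it leaves --email, --verbose, and path options such as --cache-dir and --log-file to the caller, and the numbers are starting points rather than tuned recommendations.

```shell
# scale_flags: hypothetical helper, not part of PMCGrab.
# Prints the tuning options from the matching example above, based on
# the number of non-blank lines in the ID file.
scale_flags() {
  local id_file=$1 count
  count=$(grep -c '[^[:space:]]' "$id_file" || true)
  if [ "$count" -lt 100 ]; then
    printf '%s\n' "--workers 4 --batch-size 10"
  elif [ "$count" -lt 1000 ]; then
    printf '%s\n' "--workers 8 --batch-size 25 --max-retries 3 --timeout 45"
  else
    printf '%s\n' "--workers 12 --batch-size 50 --max-retries 5 --timeout 90"
  fi
}

# Example (the unquoted $(...) is intentional so the flags split into words):
# uv run python -m pmcgrab --input-file my_list.txt \
#   --email your-email@example.com $(scale_flags my_list.txt)
```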
Error Handling
Common Exit Codes
0: Success
1: General error
2: Invalid arguments
3: Network error
4: File not found
5: Permission error
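In scripts it can help to translate these codes into readable messages in one place instead of repeating if/elif chains. A sketch based only on the exit-code list above; describe_exit_code is a hypothetical helper name.

```shell
# describe_exit_code: hypothetical helper; the mapping mirrors the
# exit-code list above.
describe_exit_code() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Invalid arguments" ;;
    3) echo "Network error" ;;
    4) echo "File not found" ;;
    5) echo "Permission error" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Example:
# uv run python -m pmcgrab PMC7181753
# status=$?
# echo "pmcgrab: $(describe_exit_code "$status")"
```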
Handling Failures
# Run with error handling
uv run python -m pmcgrab PMC7181753 PMC3539614
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "Processing completed successfully"
elif [ $EXIT_CODE -eq 3 ]; then
echo "Network error - check connection and retry"
else
echo "Processing failed with exit code $EXIT_CODE"
fi
Retry Failed Articles
# Initial processing
uv run python -m pmcgrab --input-file all_ids.txt --output-dir ./results
# Retry failed articles
if [ -f ./results/failed_pmcids.txt ]; then
echo "Retrying failed articles..."
uv run python -m pmcgrab \
--input-file ./results/failed_pmcids.txt \
--output-dir ./results \
--max-retries 5 \
--timeout 120
fi
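The single retry above can be extended into a bounded loop that keeps retrying while failures remain. This sketch assumes that each run rewrites failed_pmcids.txt in the output directory so that it lists only the IDs that still failed; that behavior is not confirmed on this page, so verify it before relying on the loop. retry_failed is a hypothetical helper that takes the command to run as arguments, which also makes it easy to test with a stand-in command.

```shell
# retry_failed: hypothetical helper, not part of PMCGrab.
# Usage: retry_failed OUT_DIR MAX_ROUNDS COMMAND [ARGS...]
# Runs: COMMAND ARGS --input-file <copy of failed list> --output-dir OUT_DIR
# until failed_pmcids.txt is empty or missing, or MAX_ROUNDS is reached.
# ASSUMPTION: each run rewrites OUT_DIR/failed_pmcids.txt with the IDs
# that are still failing.
retry_failed() {
  local out_dir=$1 max_rounds=$2 round=1
  shift 2
  while [ -s "$out_dir/failed_pmcids.txt" ] && [ "$round" -le "$max_rounds" ]; do
    echo "Retry round $round of $max_rounds" >&2
    # Work from a copy so the command never reads the file it rewrites
    cp "$out_dir/failed_pmcids.txt" "$out_dir/retry_input.txt"
    "$@" --input-file "$out_dir/retry_input.txt" --output-dir "$out_dir" || true
    round=$((round + 1))
  done
  # Return success only if no failures remain
  [ ! -s "$out_dir/failed_pmcids.txt" ]
}

# Example:
# retry_failed ./results 3 uv run python -m pmcgrab --max-retries 5 --timeout 120
```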
Performance Tuning
Network Optimization
# For slow networks
uv run python -m pmcgrab \
--workers 2 \
--batch-size 5 \
--timeout 120 \
--max-retries 10 \
PMC7181753
# For fast networks
uv run python -m pmcgrab \
--workers 16 \
--batch-size 50 \
--timeout 30 \
--max-retries 2 \
PMC7181753
Memory Optimization
# For memory-constrained systems
uv run python -m pmcgrab \
--workers 2 \
--batch-size 5 \
--no-cache \
PMC7181753
# For high-memory systems
uv run python -m pmcgrab \
--workers 16 \
--batch-size 100 \
--cache-dir ./large_cache \
PMC7181753
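On memory-constrained machines, or for very long lists, another option is to split the ID list into fixed-size chunks and run the CLI once per chunk, so each run stays small and a failed chunk can be redone on its own. A sketch using the standard split utility; chunk_ids and the chunk_ file prefix are hypothetical names, and split's default two-letter suffixes allow up to 676 chunks.

```shell
# chunk_ids: hypothetical helper, not part of PMCGrab.
# Splits ID file $1 into files of at most $2 lines each inside
# directory $3, named chunk_aa, chunk_ab, ... (split's default suffixes).
chunk_ids() {
  local id_file=$1 size=$2 chunk_dir=$3
  mkdir -p "$chunk_dir"
  split -l "$size" "$id_file" "$chunk_dir/chunk_"
}

# Example: one CLI run per 500-ID chunk, all writing to one output directory
# chunk_ids large_list.txt 500 ./chunks
# for f in ./chunks/chunk_*; do
#   uv run python -m pmcgrab --input-file "$f" --output-dir ./results \
#     --workers 2 --batch-size 5 --email your-email@example.com
# done
```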
Integration with Shell Scripts
Bash Script Example
#!/bin/bash
# PMCGrab batch processing script
EMAIL="your-email@example.com"
INPUT_FILE="pmc_ids.txt"
OUTPUT_DIR="./batch_$(date +%Y%m%d_%H%M%S)"
LOG_FILE="processing_$(date +%Y%m%d_%H%M%S).log"
echo "Starting PMCGrab batch processing..."
echo "Input file: $INPUT_FILE"
echo "Output directory: $OUTPUT_DIR"
echo "Log file: $LOG_FILE"
uv run python -m pmcgrab \
--input-file "$INPUT_FILE" \
--output-dir "$OUTPUT_DIR" \
--email "$EMAIL" \
--workers 8 \
--batch-size 20 \
--max-retries 3 \
--verbose \
--log-file "$LOG_FILE"
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "Processing completed successfully!"
echo "Results in: $OUTPUT_DIR"
echo "Log file: $LOG_FILE"
else
echo "Processing failed with exit code: $EXIT_CODE"
echo "Check log file: $LOG_FILE"
exit $EXIT_CODE
fi
PowerShell Script Example
# PMCGrab batch processing script for Windows
$EMAIL = "your-email@example.com"
$INPUT_FILE = "pmc_ids.txt"
$OUTPUT_DIR = "./batch_$(Get-Date -Format 'yyyyMMdd_HHmmss')"
$LOG_FILE = "processing_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
Write-Host "Starting PMCGrab batch processing..."
Write-Host "Input file: $INPUT_FILE"
Write-Host "Output directory: $OUTPUT_DIR"
Write-Host "Log file: $LOG_FILE"
uv run python -m pmcgrab `
--input-file $INPUT_FILE `
--output-dir $OUTPUT_DIR `
--email $EMAIL `
--workers 8 `
--batch-size 20 `
--max-retries 3 `
--verbose `
--log-file $LOG_FILE
if ($LASTEXITCODE -eq 0) {
Write-Host "Processing completed successfully!" -ForegroundColor Green
Write-Host "Results in: $OUTPUT_DIR"
Write-Host "Log file: $LOG_FILE"
} else {
Write-Host "Processing failed with exit code: $LASTEXITCODE" -ForegroundColor Red
Write-Host "Check log file: $LOG_FILE"
exit $LASTEXITCODE
}
Help and Documentation
Get Help
# Show help message
uv run python -m pmcgrab --help
# Show version
uv run python -m pmcgrab --version
# Show configuration
uv run python -m pmcgrab --show-config
All Available Options
Usage: uv run python -m pmcgrab [OPTIONS] [PMCIDS...]
Options:
--email TEXT Contact email for NCBI API (required)
--output-dir TEXT Output directory (default: ./pmc_output)
--input-file TEXT File containing PMC IDs (one per line)
--input-csv TEXT CSV file containing PMC IDs
--pmcid-column TEXT Column name for PMC IDs in CSV (default: pmcid)
--workers INTEGER Number of parallel workers (default: 4)
--batch-size INTEGER Batch size for processing (default: 10)
--max-retries INTEGER Maximum retry attempts (default: 3)
--timeout INTEGER Request timeout in seconds (default: 30)
--validate / --no-validate Validate XML structure (default: True)
--download / --no-download Download and cache XML files (default: False)
--cache-dir TEXT Directory for caching XML files
--verbose / --quiet Enable/disable verbose output (default: False)
--log-file TEXT Log file path
--log-level TEXT Log level (DEBUG, INFO, WARNING, ERROR)
--resume Resume from previous failed run
--input-dir TEXT Input directory for resume mode
--version Show version and exit
--help Show this message and exit
This comprehensive CLI guide should help you use PMCGrab effectively from the command line for any scale of processing.