Command Line Interface

PMCGrab's CLI lives in pmcgrab.cli.pmcgrab_cli.

CLI Module

pmcgrab.cli.pmcgrab_cli

Functions

main

main() -> int

Main CLI entry point for batch PMC article processing.

Source code in src/pmcgrab/cli/pmcgrab_cli.py
def main() -> int:
    """Main CLI entry point for batch PMC article processing."""
    args = _parse_args()

    # --- Configure logging ---
    if args.verbose:
        logging.basicConfig(
            level=logging.DEBUG, format="%(name)s %(levelname)s: %(message)s"
        )
    else:
        logging.basicConfig(level=logging.WARNING)

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    _warn_dangling_image_flags(args)

    jsonl_fh = None
    if args.output_format == "jsonl":
        jsonl_fh = (out_dir / "output.jsonl").open("w", encoding="utf-8")

    try:
        if args.from_dir:
            results = _process_local_directory(args, out_dir, jsonl_fh)
            if results is None:
                return 2
        elif args.from_files:
            results = _process_local_files(args, out_dir, jsonl_fh)
        else:
            pmc_ids = _resolve_network_ids(args)
            if not pmc_ids:
                print("No valid PMC IDs to process.", file=sys.stderr)
                return 2
            results = _process_network_ids(args, pmc_ids, out_dir, jsonl_fh)

    finally:
        if jsonl_fh is not None:
            jsonl_fh.close()

    summary_path = _write_summary(results, out_dir)
    total = len(results)
    ok = sum(1 for v in results.values() if v.get("parsed"))
    print(f"\nDone: {ok}/{total} succeeded.  Summary written to {summary_path}")
    return 0 if ok > 0 else 1
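
The return statements in main() imply three exit codes: 0 when at least one article parsed, 1 when none did, and 2 when there was no valid input to process (an unusable --from-dir, or no resolvable IDs). The sketch below restates that mapping in isolation. `exit_code_for` is a hypothetical helper written for illustration and is not part of pmcgrab; the dictionary keys are made up, and passing None stands in for the "no valid input" paths.

```python
from typing import Dict, Optional


def exit_code_for(results: Optional[Dict[str, dict]]) -> int:
    """Hypothetical helper mirroring the return logic shown in main()."""
    # Assumption: None stands in for the early "return 2" paths in main().
    if results is None:
        return 2
    # Count entries whose "parsed" flag is truthy, as main() does.
    ok = sum(1 for v in results.values() if v.get("parsed"))
    # Partial success still counts as success: one parsed article is enough.
    return 0 if ok > 0 else 1


# Illustrative inputs only; the real result keys may differ.
mixed = {"article-a": {"parsed": True}, "article-b": {"parsed": False}}
all_failed = {"article-a": {"parsed": False}}
print(exit_code_for(mixed), exit_code_for(all_failed), exit_code_for(None))
```

Because a single success yields 0, a script that needs every article to succeed would have to inspect the summary file rather than rely on the exit code alone.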

Usage Examples

# Process one PMC ID.
uv run python -m pmcgrab --pmcids 7181753

# Process multiple PMC IDs.
uv run python -m pmcgrab --pmcids 7181753 3539614 5454911

# Process PubMed IDs.
uv run python -m pmcgrab --pmids 33087749

# Process local XML without network access.
uv run python -m pmcgrab --from-dir ./pmc_bulk_xml --output-dir ./results
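
When the CLI is driven from another Python script, the same invocations can be assembled as argument lists. This is a minimal sketch that uses only flags listed on this page (--pmcids and --output-dir). `build_command` and `run_pmcgrab` are hypothetical helper names, and the `uv run` prefix is assumed to match your environment.

```python
import subprocess
from typing import List, Sequence


def build_command(pmcids: Sequence[object], output_dir: str) -> List[str]:
    """Hypothetical helper: assemble the argument list for one CLI run."""
    return [
        "uv", "run", "python", "-m", "pmcgrab",
        "--pmcids", *[str(p) for p in pmcids],  # IDs are passed as separate arguments
        "--output-dir", output_dir,
    ]


def run_pmcgrab(pmcids: Sequence[object], output_dir: str) -> int:
    """Hypothetical helper: run the CLI and hand back its exit code."""
    # check=False so a non-zero exit code is returned instead of raising.
    completed = subprocess.run(build_command(pmcids, output_dir), check=False)
    return completed.returncode
```

Passing a list rather than a single shell string avoids quoting problems when the output path contains spaces.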

Options

  • --pmcids, --ids: PMC IDs to process.
  • --pmids: PubMed IDs to convert to PMC IDs.
  • --dois: DOIs to convert to PMC IDs.
  • --from-id-file: Text file containing identifiers.
  • --from-dir: Directory of local XML files.
  • --from-file: One or more local XML files.
  • --output-dir, --out: Output directory; created, along with any missing parent directories, if it does not exist.
  • --workers, --batch-size: Number of worker threads.
  • --format: Output format, json or jsonl. With jsonl, records are written to output.jsonl in the output directory.
  • --verbose: Enable DEBUG-level logging; without it, the log level is WARNING.
  • --quiet: Suppress progress bars.
  • --version: Print the installed PMCGrab version.
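
With the jsonl format, main() opens a single output.jsonl file in the output directory. Assuming each line of that file holds one JSON object (the usual JSON Lines convention; the record fields are not documented on this page), it could be read back roughly as follows. `read_jsonl` is a hypothetical helper, not part of pmcgrab.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Hypothetical helper: yield one parsed object per non-empty line."""
    with Path(path).open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)
```

A typical call would be `for record in read_jsonl("results/output.jsonl"): ...`, where `results` is whatever directory was passed to --output-dir.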