pdfly


A command-line tool to extract (meta)data from PDFs and manipulate PDF files at scale.

Author: py-pdf
Open Sourced: 2022-04-09
Last Commit: Unknown

pdfly is a lightweight command-line tool for extracting metadata from PDFs and performing common PDF manipulations. Its commands are scriptable, so they slot into automation pipelines, CI jobs, and batch-processing workflows, and PDF operations can be integrated without writing custom parsing code.

Extraction Capabilities

Extracts document metadata, text content, and structured information from one or many PDFs
Batch-oriented CLI designed for scripting and unattended execution in CI/CD or data pipelines
Configurable output formats and processing options that adapt to archival, indexing, or analysis needs
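
As a rough illustration of the batch-oriented use described in the list above, the following Python sketch walks a directory and runs the metadata subcommand on each PDF. It assumes that pdfly is installed and on the PATH and that `pdfly meta <file>` prints the metadata to standard output. The helper names `build_meta_command` and `collect_metadata` are hypothetical and not part of pdfly.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List


def build_meta_command(pdf_path: Path, executable: str = "pdfly") -> List[str]:
    """Return the argument list for running `pdfly meta` on one file.

    Assumption: the CLI is invoked as `pdfly meta <file>`.
    """
    return [executable, "meta", str(pdf_path)]


def collect_metadata(
    directory: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, str]:
    """Run the metadata command on every *.pdf found under `directory`.

    Returns a mapping from file path to the raw text the command printed.
    Files for which the command exits with a non-zero status are skipped.
    The `run` parameter exists so the subprocess call can be swapped out
    in tests; by default it is `subprocess.run`.
    """
    results: Dict[str, str] = {}
    for pdf_path in sorted(directory.rglob("*.pdf")):
        completed = run(
            build_meta_command(pdf_path),
            capture_output=True,
            text=True,
            check=False,  # inspect the return code instead of raising
        )
        if completed.returncode == 0:
            results[str(pdf_path)] = completed.stdout
    return results
```

Calling `collect_metadata(Path("archive"))` (with `archive` as a placeholder directory) would return one entry per PDF that the command processed successfully. The output is kept as raw text because the exact output format is not described here.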

Integration & Automation

Fits into CI/CD pipelines for automated document processing and validation
Serves post-OCR cleanup workflows where PDFs need inspection before downstream analysis
Automates data extraction across thousands of PDFs in large document archives
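
One way to picture the CI validation use case above is a small gate script that checks every PDF in a directory and fails the job if any file cannot be processed. The sketch below is illustrative only. It assumes that `pdfly meta <file>` exits with a non-zero status when a PDF cannot be read, which should be verified against the installed version. The functions `pdf_is_readable`, `find_unreadable`, and `exit_code_for` are hypothetical helpers, not pdfly APIs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def pdf_is_readable(pdf_path: Path) -> bool:
    """Return True if `pdfly meta` exits successfully for this file.

    Assumption: a non-zero exit status signals an unreadable PDF.
    """
    completed = subprocess.run(
        ["pdfly", "meta", str(pdf_path)],
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.returncode == 0


def find_unreadable(
    directory: Path,
    check: Callable[[Path], bool] = pdf_is_readable,
    max_workers: int = 4,
) -> List[Path]:
    """Check all *.pdf files under `directory` and return those that fail.

    Each check mostly waits on a subprocess, so a thread pool is enough
    to run several checks concurrently on a large archive.
    """
    pdf_paths = sorted(directory.rglob("*.pdf"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        readable_flags = list(pool.map(check, pdf_paths))
    return [path for path, ok in zip(pdf_paths, readable_flags) if not ok]


def exit_code_for(failures: List[Path]) -> int:
    """Map the list of failing files to a process exit code for CI."""
    return 1 if failures else 0
```

A CI step could pass `exit_code_for(find_unreadable(Path("docs")))` to `sys.exit`, where `docs` is a placeholder directory name, so the job fails whenever at least one PDF is rejected.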

Technical Design

Built in Python on top of the pypdf library
Exposes both a CLI and programmatic APIs for flexibility
BSD-3-Clause licensed with documentation hosted on Read the Docs