Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org : pdf2dataset

Easily convert a subdirectory containing a large volume of PDF documents into a dataset; supports extracting text and images.

purl: pkg:pypi/pdf2dataset
Keywords: data-science, distributed-computing, distributed-systems, ocr, pandas-dataframe, parallel, parquet, pdf, pdf2image, pdftotext, pyarrow, pytesseract, pytesseract-ocr, python, python3, ray, tesseract, tesseract-ocr
License: Apache-2.0
Latest release: over 3 years ago
First release: almost 4 years ago
Dependent repositories: 1
Downloads: 158 last month
Stars: 17 on GitHub
Forks: 3 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 9 days ago
