An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org: spark-pdf-python

PDF DataSource for Apache Spark in Python

purl: pkg:pypi/spark-pdf-python
Keywords: big-data, data-engineering, data-extraction, data-science, ocr, ocr-recognition, pdf, pdf-document, pdf-document-processor, spark, spark-datasource, tesseract, tesseract-ocr
License: AGPL-3.0
Latest release: 2 months ago
First release: 2 months ago
Downloads: 93 last month
Stars: 45 on GitHub
Forks: 4 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 13 days ago

pyspark-pdf 0.1.0rc9
Spark-Pdf is a library for processing documents using Apache Spark
8 versions - Latest release: 6 months ago - 254 downloads last month - 45 stars on GitHub - 1 maintainer