An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io : extractous

Extractous provides a fast and efficient way to extract content from all kinds of file formats, including PDF, Word, Excel, CSV, Email, etc. Internally, it uses a natively compiled Apache Tika for formats that are not supported natively by the Rust core.

purl: pkg:cargo/extractous
Keywords: parser, pdf, text, tika, unstructured, data-pipelines, docx, etl, etl-pipelines, extraction, llm, machine-learning, natural-language-processing, nlp, ocr, pdf-parser, rag, rust, unstructured-data
License: Apache-2.0
Latest release: 7 months ago
First release: 11 months ago
Downloads: 21,141 total
Stars: 1,183 on GitHub
Forks: 52 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 8 days ago
