crates.io: extractous
Extractous provides a fast and efficient way to extract content from all kinds of file formats, including PDF, Word, Excel, CSV, and email. Internally, it uses a natively compiled Apache Tika for formats that are not supported natively by the Rust core.
purl: pkg:cargo/extractous
Keywords: parser, pdf, text, tika, unstructured, data-pipelines, docx, etl, etl-pipelines, extraction, llm, machine-learning, natural-language-processing, nlp, ocr, pdf-parser, rag, rust, unstructured-data
License: Apache-2.0
Latest release: 7 months ago
First release: 11 months ago
Downloads: 21,141 total
Stars: 1,183 on GitHub
Forks: 52 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 8 days ago