pypi.org "file-parsing" keyword
View the packages on the pypi.org package registry that are tagged with the "file-parsing" keyword.
caterpillar-py 2.6.1
Library to pack and unpack structured binary data.
7 versions - Latest release: about 2 months ago - 410 downloads last month - 26 stars on GitHub - 1 maintainer
pydatamax 0.2.0
Advanced Data Crawling and Processing Framework.
20 versions - Latest release: 28 days ago - 310 downloads last month - 140 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 8 months ago - 55 downloads last month - 4 stars on GitHub - 1 maintainer
Related Keywords
excel (2)
python (2)
pdf (2)
document-processing (2)
file-conversion (2)
data-processing (2)
data-extraction (2)
retrieval-augmented-generation (1)
metadata (1)
language-detection (1)
information-extraction (1)
image-extraction (1)
format-identification (1)
format-detection (1)
file-type (1)
file-reader (1)
file-processing (1)
file-identification (1)
file-format (1)
file-analysis (1)
docx (1)
document-understanding (1)
document-text (1)
document-reader (1)
document-parsing (1)
document-ocr (1)
document-metadata (1)
document-management (1)
document-intelligence (1)
pdf-to-text (1)
natural-language-processing (1)
ml (1)
metadata-extraction (1)
llm (1)
java (1)
image-to-text (1)
word-documents (1)
unstructured-data (1)
tika (1)
text-recognition (1)
text-processing (1)
text-parsing (1)
text-mining (1)
text-extraction (1)
text-analytics (1)
structured-data (1)
powerpoint (1)
pdf-parsing (1)
office-documents (1)
ocr (1)
mime-type (1)
data-cleaning (1)
data (1)
annotation (1)
data-collection (1)
automation (1)
research (1)
academic-papers (1)
framework (1)
cli (1)
async (1)
parsing (1)
web-scraping (1)
arxiv (1)
scraping (1)
crawler (1)
unpacking (1)
struct (1)
reverse-engineering (1)
python-struct (1)
pystruct (1)
binary-parsing (1)
binary-format (1)
document-indexing (1)
document-extraction (1)
document-converter (1)
document-classification (1)
document-automation (1)
document-analysis (1)
document-ai (1)
data-parsing (1)
content-type (1)
content-processing (1)
content-parsing (1)
content-management (1)
content-intelligence (1)
content-indexing (1)
content-extraction (1)
content-detection (1)
apache-tika (1)
word (1)
toolkit (1)
ppt (1)
llm-based (1)
images (1)