An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "structured-data" keyword

xlstruct 0.1.0
LLM-powered Excel parser — define a Pydantic schema, get structured data from any Excel file
1 version - Latest release: about 17 hours ago - 0 stars on GitHub - 1 maintainer
petey 0.1.2
Petey — The Easy PDF Extractor
3 versions - Latest release: 1 day ago - 153 downloads last month - 1 maintainer
architxt 0.6.0 💰
ArchiTXT is a tool for structuring textual data into a valid database model. It is guided by a me...
12 versions - Latest release: 28 days ago - 336 downloads last month - 5 stars on GitHub - 1 maintainer
jobextractor 0.1.0
Professional job description extraction using multiple LLM providers
1 version - Latest release: about 2 months ago - 27 downloads last month - 1 maintainer
indoxraghelper 0.0.3
Indox Retrieval Augmentation
3 versions - Latest release: about 1 year ago - 18 downloads last month - 20 stars on GitHub - 1 maintainer
langextract 1.1.1
LangExtract: A library for extracting structured data from language models
18 versions - Latest release: 3 months ago - 130 thousand downloads last month - 33,499 stars on GitHub - 1 maintainer
iflow-mcp_langextract 1.1.0
LangExtract: A library for extracting structured data from language models
1 version - Latest release: 3 months ago - 16 downloads last month - 33,499 stars on GitHub - 1 maintainer
langextract-azureopenai 0.1.7
LangExtract provider plugin for Azure OpenAI
3 versions - Latest release: 7 months ago - 1.69 thousand downloads last month - 33,499 stars on GitHub - 1 maintainer
hydration 4.0.0
A module used to define python objects that can be converted to (and from) bytes.
15 versions - Latest release: about 5 years ago - 1 dependent repository - 385 downloads last month - 16 stars on GitHub - 2 maintainers
Top 1.8% on pypi.org
autogluon.multimodal 1.5.0
Fast and Accurate ML in 3 Lines of Code
1,232 versions - Latest release: 3 months ago - 3 dependent packages - 15 dependent repositories - 186 thousand downloads last month - 6,566 stars on GitHub - 1 maintainer
nlstruct 0.2.0
Natural language structuring library
8 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 107 downloads last month - 21 stars on GitHub - 1 maintainer
metaminer 0.3.6
Extract structured information from documents using AI
7 versions - Latest release: 9 months ago - 20 downloads last month - 0 stars on GitHub - 1 maintainer
extract-monster 0.1.0
Python SDK for Extract Monster - Extract structured data from files and text using AI
1 version - Latest release: 5 months ago - 15 downloads last month - 1 maintainer
wagtail-herald 0.6.0 💰
SEO toolkit for Wagtail CMS - meta tags, Open Graph, Twitter Cards, and Schema.org structured data
8 versions - Latest release: 7 days ago - 104 downloads last month - 1 star on GitHub - 1 maintainer
open-receipt-extractor 0.1.0
Modular Python pipeline that converts raw receipt documents (images or PDFs) into structured, ana...
1 version - Latest release: 8 days ago - 97 downloads last month - 1 maintainer
llama-index-readers-llama-parse 0.5.1
llama-index readers llama-parse integration
10 versions - Latest release: 6 months ago - 6 dependent packages - 2.59 million downloads last month - 3,956 stars on GitHub - 1 maintainer
Top 7.3% on pypi.org
pdf-structify 0.1.18
Extract structured data from PDFs using LLMs with sklearn-like API
19 versions - Latest release: about 2 months ago - 433 downloads last month - 1 maintainer
open-xtract 0.2.0
Extract structured data from documents, images, audio, and video using LLMs
6 versions - Latest release: about 2 months ago - 63 downloads last month - 13 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.94
Tailored SDK clients for LlamaCloud services.
93 versions - Latest release: 25 days ago - 29.3 million downloads last month - 3,956 stars on GitHub - 1 maintainer
indoxrag 0.1.1
Indox Retrieval Augmentation
5 versions - Latest release: about 1 year ago - 23 downloads last month - 19 stars on GitHub - 1 maintainer
pandapy 2.2
Structured Numpy with Pandas a Click Away
22 versions - Latest release: about 6 years ago - 1 dependent repository - 40 downloads last month - 549 stars on GitHub - 1 maintainer
log-surgeon-ffi 0.1.0b10
Python FFI bindings for log-surgeon: high-performance parsing of unstructured logs into structure...
11 versions - Latest release: about 1 month ago - 1.19 thousand downloads last month - 0 stars on GitHub - 2 maintainers
contextgem 0.21.0
Effortless LLM extraction from documents
44 versions - Latest release: 16 days ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
amphi-scheduler 0.9.7
Amphi Scheduler (JupyterLab extension + Python backend)
10 versions - Latest release: 22 days ago - 242 downloads last month - 1,278 stars on GitHub - 1 maintainer
Top 1.2% on pypi.org
autogluon.tabular 1.5.0
Fast and Accurate ML in 3 Lines of Code
1,848 versions - Latest release: 3 months ago - 7 dependent packages - 44 dependent repositories - 298 thousand downloads last month - 7,185 stars on GitHub - 3 maintainers
Top 1.3% on pypi.org
autogluon.common 1.5.0
Fast and Accurate ML in 3 Lines of Code
1,456 versions - Latest release: 3 months ago - 11 dependent packages - 23 dependent repositories - 242 thousand downloads last month - 6,566 stars on GitHub - 1 maintainer
lightfeed 0.1.6
Lightfeed API Client for Python
6 versions - Latest release: 9 months ago - 32 downloads last month - 5 stars on GitHub - 1 maintainer
Top 4.0% on pypi.org
autogluon.timeseries 1.5.0
Fast and Accurate ML in 3 Lines of Code
1,274 versions - Latest release: 3 months ago - 4 dependent repositories - 328 thousand downloads last month - 9,716 stars on GitHub - 1 maintainer
aglite-test.features 0.7.0b20230314
AutoML for Image, Text, and Tabular Data
8 versions - Latest release: almost 3 years ago - 2 dependent packages - 140 downloads last month - 9,716 stars on GitHub - 1 maintainer
aglite-test.common 0.7.0b20230314
AutoML for Image, Text, and Tabular Data
8 versions - Latest release: almost 3 years ago - 2 dependent packages - 165 downloads last month - 9,493 stars on GitHub - 1 maintainer
tableshot 0.1.0
Extract tables from PDFs into clean, structured data -- instantly. An MCP server for AI assistants.
1 version - Latest release: 18 days ago - 91 downloads last month - 1 maintainer
openllmindex 0.1.0
LLM-ready index generator for websites — spec, validator, and CLI tools
1 version - Latest release: 17 days ago - 1 maintainer
documiner 0.8.2
Advanced tool designed for text analysis and data mining in documents
1 version - Latest release: 8 months ago - 1 maintainer
openapi-client-generator 1.0.13 💰
OpenAPI Client Generator
15 versions - Latest release: about 5 years ago - 1 dependent repository - 142 downloads last month - 9 stars on GitHub - 1 maintainer
Top 0.9% on pypi.org
autogluon.core 1.1.1
Fast and Accurate ML in 3 Lines of Code
1,695 versions - Latest release: over 1 year ago - 11 dependent packages - 111 dependent repositories - 288 thousand downloads last month - 6,566 stars on GitHub - 3 maintainers
Top 1.3% on pypi.org
autogluon.features 1.1.1
Fast and Accurate ML in 3 Lines of Code
1,582 versions - Latest release: over 1 year ago - 7 dependent packages - 39 dependent repositories - 283 thousand downloads last month - 6,566 stars on GitHub - 3 maintainers
Top 1.7% on pypi.org
autogluon 1.1.1
Fast and Accurate ML in 3 Lines of Code
1,886 versions - Latest release: over 1 year ago - 8 dependent packages - 59 dependent repositories - 270 thousand downloads last month - 6,566 stars on GitHub - 2 maintainers
structcast 1.1.4
Elegantly orchestrating structured data via a flexible and serializable workflow.
6 versions - Latest release: 17 days ago - 277 downloads last month - 0 stars on GitHub - 1 maintainer
indox 0.1.31
Indox Retrieval Augmentation
29 versions - Latest release: over 1 year ago - 154 downloads last month - 20 stars on GitHub - 2 maintainers
indoxgen 0.2.0
Indox Synthetic Data Generation
14 versions - Latest release: about 1 year ago - 101 downloads last month - 20 stars on GitHub - 1 maintainer
structer 0.3.0
Structer is a structurer written in Python based on C language structs.
4 versions - Latest release: over 1 year ago - 12 downloads last month - 1 star on GitHub - 1 maintainer
aglite-test 0.7.0b20230314
AutoML for Image, Text, and Tabular Data
8 versions - Latest release: almost 3 years ago - 159 downloads last month - 9,493 stars on GitHub - 1 maintainer
dynamic-baml 0.2.0
A standalone library for dynamic BoundaryML schema generation and LLM response parsing
4 versions - Latest release: 9 months ago - 43 downloads last month - 1 maintainer
Top 2.5% on pypi.org
autogluon.text 0.6.2
AutoML for Image, Text, and Tabular Data
837 versions - Latest release: about 3 years ago - 1 dependent package - 17 dependent repositories - 49.8 thousand downloads last month - 9,493 stars on GitHub - 1 maintainer
target_benchmark 0.1.3
Table Retrieval for Generative Tasks Benchmark
4 versions - Latest release: 10 months ago - 48 downloads last month - 22 stars on GitHub - 1 maintainer
mseep-kreuzberg 3.13.5
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
4 versions - Latest release: 6 months ago - 43 downloads last month - 2,454 stars on GitHub - 1 maintainer
autogluon-tonyhu-test 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
4 versions - Latest release: about 2 years ago - 748 downloads last month - 9,493 stars on GitHub - 1 maintainer
tabular-ml-toolkit 0.0.35
A helper library to jumpstart your machine learning project based on tabular or structured data.
35 versions - Latest release: about 4 years ago - 1 dependent repository - 141 downloads last month - 1 star on GitHub - 1 maintainer
gittxt 1.7.7
Gittxt: Get Text from Git — Optimized for AI.
18 versions - Latest release: 11 months ago - 93 downloads last month - 0 stars on GitHub - 1 maintainer
langstruct 0.2.0
LLM-powered structured information extraction using DSPy optimization
6 versions - Latest release: 5 months ago - 519 downloads last month - 55 stars on GitHub - 1 maintainer
outformer 0.1.3
Structure Outputs from Language Models
4 versions - Latest release: 9 months ago - 40 downloads last month - 10 stars on GitHub - 1 maintainer
aglite-test.tabular 0.7.0b20230314
AutoML for Image, Text, and Tabular Data
8 versions - Latest release: almost 3 years ago - 1 dependent package - 141 downloads last month - 9,493 stars on GitHub - 1 maintainer
lightfeed-sdk 0.1.7
Lightfeed SDK for Python
1 version - Latest release: 9 months ago - 7 downloads last month - 5 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.multimodal 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 1 dependent package - 788 downloads last month - 9,493 stars on GitHub - 1 maintainer
aglite-test.core 0.7.0b20230314
AutoML for Image, Text, and Tabular Data
8 versions - Latest release: almost 3 years ago - 2 dependent packages - 161 downloads last month - 9,493 stars on GitHub - 1 maintainer
Top 2.5% on pypi.org
autogluon.vision 0.6.2
AutoML for Image, Text, and Tabular Data
846 versions - Latest release: about 3 years ago - 1 dependent package - 18 dependent repositories - 53.7 thousand downloads last month - 9,493 stars on GitHub - 1 maintainer
autotabular 0.12.0
Automatic machine learning for tabular data.
1 version - Latest release: over 4 years ago - 1 dependent repository - 31 downloads last month - 70 stars on GitHub - 1 maintainer
Top 8.8% on pypi.org
autogluon.eda 0.8.3
AutoML for Image, Text, and Tabular Data
189 versions - Latest release: almost 2 years ago - 1.33 thousand downloads last month - 9,493 stars on GitHub - 1 maintainer
doc2json 0.1.0
Turn unstructured documents into clean JSON with auto-generated schemas
1 version - Latest release: 3 months ago - 61 downloads last month - 1 maintainer
jertl 0.1.3
A minimum viable package for processing structured data
4 versions - Latest release: over 3 years ago - 15 downloads last month - 0 stars on GitHub - 1 maintainer
open-parser 0.0.7
Open parser for all.
7 versions - Latest release: almost 2 years ago - 48 downloads last month - 130 stars on GitHub - 1 maintainer
indoxarcg 0.0.14
Indox Retrieval Augmentation
13 versions - Latest release: 12 months ago - 59 downloads last month - 20 stars on GitHub - 2 maintainers
superpipe-py 0.1.9
build unstructured to structured data transformation pipelines
8 versions - Latest release: over 1 year ago - 38 downloads last month - 108 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.core 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 1 dependent package - 781 downloads last month - 9,493 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.common 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 4 dependent packages - 741 downloads last month - 9,493 stars on GitHub - 1 maintainer
sibila 0.4.5
Structured queries from local or online LLM models
15 versions - Latest release: over 1 year ago - 17 downloads last month - 42 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
deeptables 0.2.6
Deep-learning Toolkit for Tabular datasets
14 versions - Latest release: about 2 years ago - 3 dependent repositories - 249 downloads last month - 694 stars on GitHub - 1 maintainer
sentinelsearcher 0.2.0
AI-powered web search and structured data extraction using Anthropic Claude or OpenAI GPT
1 version - Latest release: 3 months ago - 151 downloads last month - 1 maintainer
bitbuffet 1.0.2
Python SDK for the bitbuffet API - BitBuffet
7 versions - Latest release: 6 months ago - 53 downloads last month - 1 star on GitHub - 1 maintainer
scrape-schema 0.6.3
A library for converting any text (xml, html, plain text, stdout, etc) to python datatypes
37 versions - Latest release: over 2 years ago - 2 dependent packages - 211 downloads last month - 4 stars on GitHub - 1 maintainer
llm-parse 0.1.5
Parse data from documents optimised for downstream llm tasks.
6 versions - Latest release: 9 months ago - 41 downloads last month - 3,859 stars on GitHub - 1 maintainer
data-analysis-framework 2.0.0
AI-powered analysis framework for structured data files and databases - part of the unified analy...
3 versions - Latest release: 4 months ago - 53 downloads last month - 0 stars on GitHub - 1 maintainer
llmextract 0.2.0
A library to extract structured information from unstructured text using LLMs, powered by LangChain.
2 versions - Latest release: 5 months ago - 19 downloads last month - 1 maintainer
thepipe-api 1.7.1
Get clean data from tricky documents, powered by VLMs.
59 versions - Latest release: 5 months ago - 7.55 thousand downloads last month - 1,317 stars on GitHub - 1 maintainer
jsonschema-default 1.8.1
Create default objects from a JSON schema
15 versions - Latest release: 12 months ago - 1 dependent package - 1 dependent repository - 9.96 thousand downloads last month - 10 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.features 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 3 dependent packages - 779 downloads last month - 9,493 stars on GitHub - 1 maintainer
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: over 3 years ago - 185 downloads last month - 21 stars on GitHub - 3 maintainers
lmnr-baml 0.40.1
LMNR BAML for Python
2 versions - Latest release: over 1 year ago - 345 downloads last month - 3,272 stars on GitHub - 1 maintainer
Top 2.5% on pypi.org
autogluon.mxnet 0.3.1
AutoML for Text, Image, and Tabular Data
420 versions - Latest release: over 4 years ago - 3 dependent packages - 9 dependent repositories - 10.3 thousand downloads last month - 9,341 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.tabular 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 747 downloads last month - 9,493 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: about 1 year ago - 182 downloads last month - 4 stars on GitHub - 1 maintainer
autogluon-tonyhu-test.timeseries 1.0.5b20240302
AutoML for Image, Text, and Tabular Data
5 versions - Latest release: about 2 years ago - 743 downloads last month - 9,493 stars on GitHub - 1 maintainer
markdown-table-extractor 0.1.2
Robust structured data extraction from markdown text, built with literate programming using marim...
3 versions - Latest release: 3 months ago - 170 downloads last month - 1 maintainer
delta_stream 0.1.6
Efficient structured streaming for real-time LLM outputs
7 versions - Latest release: 10 months ago - 118 downloads last month - 3 stars on GitHub - 1 maintainer
structured-prompts 0.1.1
A modular package for managing structured prompts with any LLM API
2 versions - Latest release: 7 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
docstrange 1.1.8
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
19 versions - Latest release: 4 months ago - 3.9 thousand downloads last month - 935 stars on GitHub - 1 maintainer
jxon-schema 0.2.0
JSON with change tracking - A library for converting between JSON and schemas with change tracking
1 version - Latest release: about 1 year ago - 9 downloads last month - 1 star on GitHub - 1 maintainer
structured-output-cookbook 0.1.2
Extract structured data from text using LLMs with ready-to-use templates
1 version - Latest release: 8 months ago - 6 downloads last month - 1 star on GitHub - 1 maintainer
transfertab 0.1.0
A library to help transfer learn for structured data.
1 version - Latest release: over 4 years ago - 1 dependent repository - 10 downloads last month - 1 star on GitHub - 1 maintainer
Top 3.0% on pypi.org
autogluon.extra 0.3.1
AutoML for Text, Image, and Tabular Data
420 versions - Latest release: over 4 years ago - 1 dependent package - 9 dependent repositories - 20.7 thousand downloads last month - 9,860 stars on GitHub - 1 maintainer
leapocr 0.0.4
Official Python SDK for LeapOCR - Transform documents into structured data using AI-powered OCR
4 versions - Latest release: 4 months ago - 40 downloads last month - 1 maintainer
any-parser 0.0.26
Parser for all.
19 versions - Latest release: 6 months ago - 497 downloads last month - 129 stars on GitHub - 1 maintainer
ljson 0.5.4
A table dataformat based on json
16 versions - Latest release: almost 7 years ago - 1 dependent repository - 101 downloads last month - 2 stars on GitHub - 1 maintainer
openapi-type 0.2.0 💰
OpenAPI Type
23 versions - Latest release: over 3 years ago - 2 dependent repositories - 235 downloads last month - 11 stars on GitHub - 1 maintainer
exstruct 0.4.2
Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines
26 versions - Latest release: about 2 months ago - 1.91 thousand downloads last month - 2 stars on GitHub - 1 maintainer
indoxminer 0.1.5
Indox Data Extraction
19 versions - Latest release: about 1 year ago - 63 downloads last month - 20 stars on GitHub - 2 maintainers
jintra-aether 1.0.4
A lightweight, extensible framework for structured content authoring and validation with AI assis...
5 versions - Latest release: 5 months ago - 30 downloads last month - 0 stars on GitHub - 1 maintainer
cleanlab-studio 2.5.21
Client interface for all things Cleanlab Studio
128 versions - Latest release: about 1 year ago - 1 dependent repository - 2.42 thousand downloads last month - 21 stars on GitHub - 4 maintainers
typeit 0.27.2 💰
typeit brings typed data into your project
56 versions - Latest release: over 5 years ago - 4 dependent repositories - 502 downloads last month - 13 stars on GitHub - 1 maintainer
extrai-workflow 1.0.1
Structured data extraction with LLM majority vote
2 versions - Latest release: 3 months ago - 39 downloads last month - 2 stars on GitHub - 1 maintainer
Related Keywords
llm (39), python (36), machine-learning (33), data-science (31), natural-language-processing (29), deep-learning (28), tabular-data (27), automl (27), scikit-learn (26), computer-vision (26), transfer-learning (25), time-series (24), pytorch (24), object-detection (24), hyperparameter-optimization (24), autogluon (24), automated-machine-learning (24), ensemble-learning (24), forecasting (24), gluon (24), ai (20), document (16), pdf (15), data-extraction (15), unstructured-data (13), json (13), nlp (13), openai (12), rag (11), extraction (10), large-language-models (9), document-processing (9), pydantic (8), ml (8), document-parsing (7), schema (7), parsing (7), gemini (7), image-classification (7), machine learning (7), ocr (7), index (6), information-extraction (6), LLM (6), AI (6), NLP (6), serialization (5), yaml (5), text-extraction (5), document-extraction (5), text-processing (5), language models (5), deep learning (5), python3 (4), tables (4), pdf-to-text (4), pdf-to-markdown (4), pdf-to-json (4), document-parser (4), validation (4), document-analysis (4), document-intelligence (4), extract (4), document-understanding (4), RAG (4), llm-extraction (4), retrieval-augmented generation (4), natural language processing (4), anthropic (4), parser (4), excel (4), deserialization (4), data-labeling (4), retrieval-augmented-generation (3), multimodal (3), data-analysis (3), docx-to-markdown (3), structured-generation (3), pdf-document-processor (3), pdf-to-excel (3), web-scraping (3), ppt-to-json (3), ppt-to-markdown (3), pptx (3), automation (3), database (3), document-ai (3), artificial-intelligence (3), mcp (3), mypy (3), content-extraction (3), table-extraction (3), typing (3), etl (3), data (3), information-extration (3), gemini-pro (3), gemini-flash (3), gemini-api (3), gemini-ai (3)