pypi.org "data-extraction" keyword
View the packages on the pypi.org package registry that are tagged with the "data-extraction" keyword.
Top 1.0% on pypi.org
flashtext 2.7
Extract/Replaces keywords in sentences.
18 versions - Latest release: over 7 years ago - 19 dependent packages - 208 dependent repositories - 1.94 million downloads last month - 5,681 stars on GitHub - 1 maintainer
extract-monster 0.1.0
Python SDK for Extract Monster - Extract structured data from files and text using AI
1 version - Latest release: about 1 month ago - 33 downloads last month - 1 maintainer
osrs-wiki-cli 1.0.0
Modern CLI utility for extracting structured data from Old School RuneScape Wiki
1 version - Latest release: about 1 month ago - 60 downloads last month - 1 maintainer
spark-pdf-python 0.1.1
PDF DataSource for Apache Spark in Python
3 versions - Latest release: 9 months ago - 24 downloads last month - 49 stars on GitHub - 1 maintainer
taulu 2.0.3
Segment a table from an image
24 versions - Latest release: about 22 hours ago - 2.06 thousand downloads last month - 8 stars on GitHub - 2 maintainers
any-parser 0.0.26
Parser for all.
19 versions - Latest release: about 2 months ago - 415 downloads last month - 129 stars on GitHub - 1 maintainer
tabstack 1.0.1
Python SDK for TABStack AI - Extract, Generate, and Automate web content
1 version - Latest release: 1 day ago - 1 maintainer
web-maestro 1.0.0 removed
Production-ready web content extraction with multi-provider LLM support and intelligent browser a...
1 version - Latest release: 4 months ago - 0 stars on GitHub - 1 maintainer
trustgraph-cli 1.5.4
TrustGraph provides a means to run a pipeline of flexible AI processing components in a flexible ...
286 versions - Latest release: 1 day ago - 1.54 thousand downloads last month - 667 stars on GitHub - 1 maintainer
trustgraph-embeddings-hf 1.5.4
HuggingFace embeddings support for TrustGraph.
288 versions - Latest release: 1 day ago - 670 downloads last month - 667 stars on GitHub - 1 maintainer
trustgraph-vertexai 1.5.4
TrustGraph provides a means to run a pipeline of flexible AI processing components in a flexible ...
285 versions - Latest release: 1 day ago - 634 downloads last month - 667 stars on GitHub - 1 maintainer
trustgraph-mcp 1.5.4
TrustGraph provides a means to run a pipeline of flexible AI processing components in a flexible ...
80 versions - Latest release: 1 day ago - 393 downloads last month - 667 stars on GitHub - 1 maintainer
firecrawl 4.7.0
Python SDK for Firecrawl API
96 versions - Latest release: 1 day ago - 999 thousand downloads last month - 66,580 stars on GitHub - 1 maintainer
firecrawl-py 4.7.0
Python SDK for Firecrawl API
119 versions - Latest release: 1 day ago - 1 dependent package - 1.26 million downloads last month - 66,580 stars on GitHub - 1 maintainer
facebook-ads-reports 2.0.1
ETL module for Facebook Ads API v23 with lightweight native Python data processing
7 versions - Latest release: about 1 month ago - 61 downloads last month - 2 stars on GitHub - 1 maintainer
Top 5.4% on pypi.org
vnstock 3.3.0 💰
A beginner-friendly yet powerful Python toolkit for financial analysis and automation, built to ...
57 versions - Latest release: 3 days ago - 4 dependent packages - 3 dependent repositories - 25.2 thousand downloads last month - 258 stars on GitHub - 1 maintainer
echr-extractor 1.0.46
Python library for extracting case law data from the European Court of Human Rights (ECHR) HUDOC ...
46 versions - Latest release: about 1 month ago - 296 downloads last month - 5 stars on GitHub - 1 maintainer
ytfetcher 1.4.1
YTFetcher lets you fetch YouTube transcripts in bulk with metadata like titles, publish dates, an...
15 versions - Latest release: 4 days ago - 464 downloads last month - 20 stars on GitHub - 1 maintainer
extrai-workflow 1.0.0
Structured data extraction with LLM majority vote
1 version - Latest release: 11 days ago - 136 downloads last month - 2 stars on GitHub - 1 maintainer
google-sheets-helper 2.0.7
Lightweight helper module to extract data from Google Sheets and Excel files as list of dictionaries
7 versions - Latest release: about 1 month ago - 60 downloads last month - 0 stars on GitHub - 1 maintainer
scrape-it-now 3.0.4
Web scraper made for AI and simplicity in mind. It runs as a CLI that can be parallelized and out...
5 versions - Latest release: 11 months ago - 14 downloads last month - 528 stars on GitHub - 1 maintainer
google-ads-reports 2.0.2
Simple Google Ads API client with automatic retries, exception handling, and flexible report gene...
11 versions - Latest release: about 1 month ago - 56 downloads last month - 1 star on GitHub - 1 maintainer
molscraper-tool 1.0.0
Chemical data extraction tool for researchers and chemists
1 version - Latest release: 4 months ago - 10 downloads last month - 1 maintainer
scaledp 0.2.4
ScaleDP is a library for processing documents using Apache Spark and LLMs
121 versions - Latest release: 11 days ago - 1.44 thousand downloads last month - 16 stars on GitHub - 1 maintainer
bbva2pandas 1.1.3
Parse BBVA monthly reports directly to a Dataframe
6 versions - Latest release: over 1 year ago - 1 dependent repository - 107 downloads last month - 8 stars on GitHub - 1 maintainer
kubera 0.0.1
Kubera is a tool for anonymizing and extracting traces from ChatGPT, Claude, etc. usage data
1 version - Latest release: 2 months ago - 18 downloads last month
extralit 0.6.1 💰
Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-...
14 versions - Latest release: 3 months ago - 65 downloads last month - 29 stars on GitHub - 1 maintainer
structured-output-cookbook 0.1.2
Extract structured data from text using LLMs with ready-to-use templates
1 version - Latest release: 5 months ago - 9 downloads last month - 1 star on GitHub - 1 maintainer
phasor-point-cli 0.5.2
A comprehensive CLI for extracting, processing, and analyzing PMU data from PhasorPoint databases
3 versions - Latest release: 8 days ago - 198 downloads last month - 0 stars on GitHub - 1 maintainer
filmweb 0.10
Export movie ratings from filmweb.pl
9 versions - Latest release: 9 days ago - 1 dependent repository - 30 downloads last month - 16 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 10 months ago - 15 downloads last month - 4 stars on GitHub - 1 maintainer
scrapai 0.6.0
AI-powered web scraping SDK with intelligent configuration generation
5 versions - Latest release: 9 days ago - 271 downloads last month - 1 maintainer
u-transkript 1.1.0
A powerful Python library that automatically extracts YouTube videos and translates them with AI
2 versions - Latest release: 2 months ago - 18 downloads last month - 12 stars on GitHub - 1 maintainer
linkdapi 1.0.2
Python SDK for LinkdAPI - The Most Reliable Unofficial LinkedIn API
3 versions - Latest release: 10 days ago - 66 downloads last month - 2 stars on GitHub - 1 maintainer
trustpilot-scraper 0.10
A Python library for scraping Trustpilot reviews.
7 versions - Latest release: over 1 year ago - 222 downloads last month - 11 stars on GitHub - 1 maintainer
kryptone 6.0.0
Kryptone is a high-level web scraper dedicated to marketers and wrapped around the Selenium libr...
2 versions - Latest release: 9 months ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
interoperabilityenabler 0.1.6
Interoperability Enabler
7 versions - Latest release: 3 months ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
scienceai-llm 0.1.6
An AI powered scientific literature search engine
6 versions - Latest release: over 1 year ago - 20 downloads last month - 2 stars on GitHub - 1 maintainer
intelliscript-ai 2.1.0
World's first AI CLI tool with Google LangExtract integration for command generation and data ana...
1 version - Latest release: 3 months ago - 16 downloads last month - 1 star on GitHub - 1 maintainer
newshound 0.0.1 💰
A future news extractor package for Python 3
1 version - Latest release: about 4 years ago - 1 dependent repository - 25 downloads last month - 33 stars on GitHub - 1 maintainer
pydatamax 0.2.0
Advanced Data Crawling and Processing Framework
20 versions - Latest release: 2 months ago - 103 downloads last month - 140 stars on GitHub - 1 maintainer
mseep-outscraper-mcp 1.0.1
Streamlined MCP server for Outscraper's Google Maps data extraction services - 2 essential tools ...
2 versions - Latest release: 2 months ago - 99 downloads last month - 1 star on GitHub - 1 maintainer
wiktionary-de-parser 0.12.13 💰
Extracts data from German Wiktionary dump files.
47 versions - Latest release: 11 months ago - 2 dependent repositories - 1.03 thousand downloads last month - 26 stars on GitHub - 1 maintainer
contextgem 0.19.2
Effortless LLM extraction from documents
40 versions - Latest release: about 1 month ago - 2.93 thousand downloads last month - 1,511 stars on GitHub - 1 maintainer
lightfeed-sdk 0.1.7
Lightfeed SDK for Python
1 version - Latest release: 5 months ago - 12 downloads last month - 5 stars on GitHub - 1 maintainer
data-extractor 1.0.1
Combine XPath, CSS Selectors and JSONPath for Web data extracting.
43 versions - Latest release: about 1 year ago - 1 dependent repository - 200 downloads last month - 27 stars on GitHub - 1 maintainer
tcx-extract 0.1.2
A speed-optimized tcx data extractor.
4 versions - Latest release: over 1 year ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
netcdfkit 0.1.2
High-Performance NetCDF Data Extraction Toolkit for Climate and Environmental Sciences
3 versions - Latest release: 5 months ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
boe-etl 1.0.0
Pure ETL pipeline for financial document processing - extracts data without analytical assumptions
1 version - Latest release: 6 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
xtweet 1.0.2
A library that lets you interact efficiently with the Twitter API.
3 versions - Latest release: over 2 years ago - 11 downloads last month - 3 stars on GitHub - 1 maintainer
extralit-server 0.6.1 💰
Open-source tool for accurate & fast scientific literature data extraction with LLM and human-in-...
10 versions - Latest release: 3 months ago - 67 downloads last month - 26 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
amazoncaptcha 0.5.11
"Pure Python, lightweight, Pillow-based solver for the Amazon text captcha."
48 versions - Latest release: over 2 years ago - 3 dependent packages - 19 dependent repositories - 55.3 thousand downloads last month - 483 stars on GitHub - 1 maintainer
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: over 5 years ago - 8 dependent repositories - 2.41 thousand downloads last month - 1,517 stars on GitHub - 2 maintainers
scrappeycom 0.3.8
An API wrapper for Scrappey.com written in Python (cloudflare bypass & solver)
11 versions - Latest release: over 2 years ago - 261 downloads last month - 21 stars on GitHub - 1 maintainer
post-archiver-improved 0.3.0
A Python package for archiving YouTube community posts with zero dependencies
2 versions - Latest release: 3 months ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
gsparse 0.2.0
Google Sheets Parser
2 versions - Latest release: 17 days ago - 0 stars on GitHub - 1 maintainer
plotdigitizer 0.3.0
Extract raw data from plot images
12 versions - Latest release: over 1 year ago - 2 dependent repositories - 518 downloads last month - 148 stars on GitHub - 1 maintainer
kaggle-discussion-extractor 1.3.0
A professional-grade tool for extracting and analyzing discussions from Kaggle competitions
17 versions - Latest release: 19 days ago - 195 downloads last month - 1 star on GitHub - 1 maintainer
tap-planetscaleapi 0.4.4 💰
Singer tap for PlanetScaleAPI, built with the Meltano Singer SDK.
18 versions - Latest release: 17 days ago - 352 downloads last month - 0 stars on GitHub - 1 maintainer
schema-string 0.1.0
A simple, LLM-friendly schema definition library
1 version - Latest release: 5 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
scrapling 0.3.8 💰
Scrapling is an undetectable, powerful, flexible, high-performance Python library that makes Web ...
31 versions - Latest release: 18 days ago - 20 thousand downloads last month - 7,710 stars on GitHub - 1 maintainer
opencrawler 1.0.2
Production-ready, enterprise-grade web scraping and crawling framework with advanced AI integration
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
magicalapi 1.4.0
This is a Python client that provides easy access to the MagicalAPI.com services, fully type anno...
12 versions - Latest release: 3 months ago - 101 downloads last month - 6 stars on GitHub - 1 maintainer
boututils 0.2.1
Python utilities for BOUT++
12 versions - Latest release: about 2 years ago - 3 dependent packages - 4 dependent repositories - 14.7 thousand downloads last month - 1 star on GitHub - 4 maintainers
labelkit 0.1.0
build unstructured to structured data transformation pipelines
5 versions - Latest release: over 1 year ago - 15 downloads last month - 110 stars on GitHub - 1 maintainer
rapid-crawl 0.1.0 💰
A powerful Python SDK for web scraping, crawling, and data extraction - inspired by Firecrawl
1 version - Latest release: 4 months ago - 20 downloads last month - 0 stars on GitHub - 1 maintainer
crawlpnt 0.1.0
Precision Navigation Tool for dependency-free, AI-ready web crawling.
1 version - Latest release: 9 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
yirabot 1.0.9
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, o...
20 versions - Latest release: over 1 year ago - 131 downloads last month - 20 stars on GitHub - 1 maintainer
mashrur-facebook-scraper 2.0.0
Professional-grade Facebook data extraction tool with Nuitka compilation support
2 versions - Latest release: 21 days ago - 1 maintainer
siaextractlib 0.2.2
Provides an easy-to-use API for downloading oceanographic data.
1 version - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 9 downloads last month - 0 stars on GitHub - 1 maintainer
tpl-parser 1.0.1
A Python package to parse Photoshop TPL files and extract data into JSON format.
2 versions - Latest release: about 1 year ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
mashrur-facebook-scrapper 1.1.0
Professional-grade Facebook data extraction tool for posts, engagement metrics, and media content
1 version - Latest release: 21 days ago
gridgulp 0.3.4
Simplified intelligent spreadsheet ingestion framework with automatic table detection
2 versions - Latest release: 4 months ago - 28 downloads last month - 2 stars on GitHub - 1 maintainer
mgmscraper 0.1.0 removed
Asynchronous Python library for accessing Turkish Meteorology General Directorate (MGM) data
1 version - Latest release: 6 months ago - 0 stars on GitHub - 1 maintainer
inparse 0.1.1
Collaborative AI for Web Scraping, Data Extraction and Crawling, Knowledge Graph
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitHub - 1 maintainer
taupe 1.2.0
Taupe: a tool to extract URLs from your personal Twitter archive
4 versions - Latest release: almost 3 years ago - 11 downloads last month - 33 stars on GitHub - 1 maintainer
arachnio 0.0.0
Client library for interacting with Arachnio API
1 version - Latest release: over 2 years ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
whizoai 1.0.0
Official WhizoAI SDK for Python - Enterprise-grade web scraping API client
1 version - Latest release: 22 days ago - 1 maintainer
complex-parser 0.0.2
A versatile Python package for data extraction from JSON-like structures with user-defined format...
3 versions - Latest release: over 1 year ago - 24 downloads last month - 5 stars on GitHub - 1 maintainer
Top 7.4% on pypi.org
cyac 1.11
High performance Trie and Ahocorasick automata (AC automata) for python
10 versions - Latest release: about 1 year ago - 1 dependent package - 4 dependent repositories - 7.13 thousand downloads last month - 95 stars on GitHub - 1 maintainer
Top 7.4% on pypi.org
sayn 0.6.17
Data-modelling and processing framework for automating Python and SQL tasks
38 versions - Latest release: about 2 months ago - 6 dependent repositories - 320 downloads last month - 120 stars on GitHub - 1 maintainer
Top 9.7% on pypi.org
libextract 0.0.12
A HT/XML web scraping tool
4 versions - Latest release: over 10 years ago - 5 dependent repositories - 45 downloads last month - 505 stars on GitHub - 1 maintainer
hivehoney 1.0.4
Client-less data retrieval from Hive.
5 versions - Latest release: almost 7 years ago - 1 dependent repository - 12 downloads last month - 3 stars on GitHub - 1 maintainer
open-parser 0.0.7
Open parser for all.
7 versions - Latest release: over 1 year ago - 23 downloads last month - 130 stars on GitHub - 1 maintainer
sia-app 1.1.0
Application to facilitate the download, exploration and visual analysis of oceanographic data.
1 version - Latest release: over 2 years ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
pa-scraper 0.2.4
Python wrapper for Prompt API's Scraper API
9 versions - Latest release: about 5 years ago - 1 dependent repository - 12 downloads last month - 5 stars on GitHub - 1 maintainer
lightfeed 0.1.6
Lightfeed API Client for Python
6 versions - Latest release: 5 months ago - 22 downloads last month - 5 stars on GitHub - 1 maintainer
string-schema 0.1.6
A simple, LLM-friendly schema definition library for converting string syntax to structured schemas
7 versions - Latest release: 3 months ago - 126 downloads last month - 0 stars on GitHub - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 5 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
kobo2pandas 0.9.0
From the Kobo API to pandas.DataFrame
1 version - Latest release: 5 months ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
crawl4ai-mcp-sse-stdio 1.1.0
MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction
10 versions - Latest release: 2 months ago - 2.55 thousand downloads last month - 3 stars on GitHub - 1 maintainer
jsonpath-extractor 0.9.2
A selector expression for extracting data from JSON.
18 versions - Latest release: about 1 year ago - 1 dependent repository - 8.05 thousand downloads last month - 41 stars on GitHub - 1 maintainer
outscraper-mcp-server 0.1.2
MCP server exposing Outscraper tools
3 versions - Latest release: 5 months ago - 41 downloads last month - 3 stars on GitHub - 1 maintainer
ricloud 3.2.0
Python client for Reincubate's ricloud API.
43 versions - Latest release: over 5 years ago - 2 dependent repositories - 127 downloads last month - 96 stars on GitHub - 2 maintainers
deckard 0.1.0
Extract structured data from unstructured text: no AI, just regular expressions. 🔍
2 versions - Latest release: 3 months ago - 241 downloads last month - 0 stars on GitHub - 1 maintainer
serp-forge 1.0.0
A powerful web scraping toolkit for SERP data extraction and analysis
1 version - Latest release: 5 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
outscraper-mcp 1.0.0
Streamlined MCP server for Outscraper's Google Maps data extraction services - 2 essential tools ...
1 version - Latest release: 5 months ago - 36 downloads last month - 1 star on GitHub - 1 maintainer
superpipe-py 0.1.9
build unstructured to structured data transformation pipelines
8 versions - Latest release: over 1 year ago - 17 downloads last month - 108 stars on GitHub - 1 maintainer
pyspark-pdf 0.1.0rc9
Spark-Pdf is a library for processing documents using Apache Spark
8 versions - Latest release: about 1 year ago - 51 downloads last month - 75 stars on GitHub - 1 maintainer
vnstock3 3.2.1 💰
A comprehensive and transparent solution for Vietnamese stock market analysis.
12 versions - Latest release: 8 months ago - 4.4 thousand downloads last month - 1,004 stars on GitHub - 1 maintainer
Related Keywords
llm (25)
ai (22)
web-scraping (22)
automation (19)
python (18)
scraping (16)
data (13)
data-engineering (12)
structured-data (12)
crawler (11)
etl (11)
machine-learning (11)
webscraping (10)
nlp (10)
web-crawler (9)
pdf (9)
scraper (8)
cli (8)
mcp (7)
document-processing (7)
excel (7)
playwright (7)
data-science (7)
python3 (7)
json (6)
openai (6)
content-extraction (6)
business-intelligence (6)
data-analysis (6)
extract (6)
ocr (6)
etl-pipeline (6)
context (6)
agentic-graphrag (5)
data-sovereignty (5)
agentic-ai-development (5)
agentic-rag (5)
graphrag (5)
agentic-ai (5)
knowledge-core (5)
llm-deployment (5)
llm-orchestration (5)
context-management (5)
trustgraph (5)
context-engineering (5)
model-serving (5)
ai-native (5)
artificial-intelligence (5)
information-extraction (5)
data-mining (5)
text-processing (5)
spark (5)
llm-extraction (5)
api (5)
web-scraper (5)
web-data-extraction (5)
document (5)
unstructured-data (5)
markdown (5)
async (5)
claude (5)
search (4)
pandas (4)
crawling (4)
web-scraping-python (4)
reports (4)
ai-scraping (4)
ai-agents (4)
document-extraction (4)
document-analysis (4)
web-automation (4)
document-intelligence (4)
social-media (4)
document-understanding (4)
document-parsing (4)
document-pipeline (3)
validation (3)
classification (3)
data-visualization (3)
document-qa (3)
youtube (3)
parsing (3)
text-mining (3)
ocr-recognition (3)
selenium (3)
text-extraction (3)
automated-prompting (3)
sql (3)
bigdata (3)
literature-review (3)
parser (3)
stealth (3)
xpath (3)
reviews (3)
outscraper (3)
google-maps (3)
docx (3)
research (3)
data-cleaning (3)
csv (3)