pypi.org "html-to-markdown" keyword
View the packages on the pypi.org package registry that are tagged with the "html-to-markdown" keyword.
trafilatura 2.0.0
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction...
50 versions - Latest release: 9 months ago - 71 dependent packages - 63 dependent repositories - 1.37 million downloads last month - 4,627 stars on GitHub - 1 maintainer
confluence-markdown-exporter 3.0.5
A tool to export Confluence pages to Markdown
32 versions - Latest release: 1 day ago - 3.19 thousand downloads last month - 116 stars on GitHub - 1 maintainer
markdown-crawler 0.0.8
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file fo...
7 versions - Latest release: almost 2 years ago - 735 downloads last month - 393 stars on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 4 months ago - 43 downloads last month - 1 star on GitHub - 1 maintainer
pyhtml2md 1.7.0
Transform your HTML into clean, easy-to-read markdown with pyhtml2md.
20 versions - Latest release: 4 months ago - 7.59 thousand downloads last month - 63 stars on GitHub - 1 maintainer
firecrawl-py 4.3.4
Python SDK for Firecrawl API
112 versions - Latest release: 1 day ago - 1 dependent package - 1.09 million downloads last month - 54,038 stars on GitHub - 1 maintainer
firecrawl 4.3.4
Python SDK for Firecrawl API
89 versions - Latest release: 1 day ago - 434 thousand downloads last month - 54,038 stars on GitHub - 1 maintainer
webdown 0.7.0
Convert web pages and HTML files to markdown and Claude XML formats
8 versions - Latest release: 5 months ago - 17 downloads last month - 3 stars on GitHub - 1 maintainer
docstrange 1.1.5
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
16 versions - Latest release: 5 days ago - 1.93 thousand downloads last month - 493 stars on GitHub - 1 maintainer
spiderclient-py 0.0.1
Python SDK for Spider Cloud API
1 version - Latest release: over 1 year ago - 4 downloads last month - 19 stars on GitHub - 1 maintainer
spider-client 0.1.77
Python SDK for Spider Cloud API
103 versions - Latest release: 9 days ago - 1 dependent package - 164 thousand downloads last month - 19 stars on GitHub - 1 maintainer
spidercloud-py 0.0.1
Python SDK for Spider Cloud API
1 version - Latest release: over 1 year ago - 3 downloads last month - 0 stars on GitHub - 1 maintainer
llm-data-converter 2.2.0
Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPo...
23 versions - Latest release: about 1 month ago - 297 downloads last month - 3 stars on GitHub - 1 maintainer
document-data-extractor 1.0.4
Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPo...
5 versions - Latest release: about 1 month ago - 78 downloads last month - 3 stars on GitHub - 1 maintainer
bookmarks2markdown 1.0.2
Convert HTML bookmarks to Markdown
3 versions - Latest release: over 6 years ago - 12 dependent repositories - 12 downloads last month - 5 stars on GitHub - 1 maintainer
html2text-rs 0.2.5
Convert HTML to markdown or plain text
7 versions - Latest release: 7 days ago - 5.11 thousand downloads last month - 7 stars on GitHub - 1 maintainer
wizardhtml 1.0.0
WHATWG-compliant HTML5 toolkit: DFA tokenizer, spec-guided tree builder, DOM, configurable serial...
1 version - Latest release: 9 days ago - 1 star on GitHub - 1 maintainer
spiderforce4ai 2.6.9
Python wrapper for SpiderForce4AI HTML-to-Markdown conversion service with LLM post-processing
50 versions - Latest release: 6 months ago - 173 downloads last month - 1 maintainer
html2txt 0.6.0
Convert HTML to markdown
6 versions - Latest release: almost 5 years ago - 2 dependent repositories - 216 downloads last month - 1 star on GitHub - 1 maintainer
bookmarkdown 0.1.1
Parse your browser's exported HTML bookmark file to Markdown.
2 versions - Latest release: about 4 years ago - 1 dependent repository - 19 downloads last month - 17 stars on GitHub - 1 maintainer
webpage2md 1.0.0
Convert HTML files and web pages to Markdown format
1 version - Latest release: 7 months ago - 9 downloads last month - 1 maintainer
url2md4ai 0.1.2
Lean Python tool for extracting clean, LLM-optimized markdown from web pages. Handles dynamic c...
6 versions - Latest release: 2 months ago - 73 downloads last month - 2 stars on GitHub - 1 maintainer
spiderwebai-py 0.1.4
Python SDK for SpiderWebAI API
10 versions - Latest release: over 1 year ago - 25 downloads last month - 18 stars on GitHub - 1 maintainer
markdown-spider 0.1.1
A configurable web spider that converts online documentation into well-structured Markdown files ...
1 version - Latest release: 6 months ago - 10 downloads last month - 2 stars on GitHub - 1 maintainer
rapid-crawl 0.1.0
A powerful Python SDK for web scraping, crawling, and data extraction - inspired by Firecrawl
1 version - Latest release: about 2 months ago - 1 maintainer
toprint 0.1.32
2print/toprint: Python library for printing and converting between HTML, PDF, ZPL, and image form...
6 versions - Latest release: 3 months ago - 454 downloads last month - 0 stars on GitHub - 1 maintainer
file2txt 1.0.1
file2txt is a Python library that takes common file formats and turns them into plain text (a txt file...
3 versions - Latest release: 2 months ago - 12 stars on GitHub - 1 maintainer
conv-html-to-markdown 0.1.311
Curate scraped HTML for easy interpretation by large language models. Build more robust generativ...
7 versions - Latest release: over 1 year ago - 161 downloads last month - 3 stars on GitHub - 1 maintainer
Related Keywords
markdown (19)
llm (11)
web-scraping (10)
scraper (9)
ai (9)
rag (8)
crawler (7)
spider (6)
ai-scraping (6)
text-extraction (6)
pdf-to-markdown (6)
document-processing (5)
content-extraction (5)
document-conversion (5)
ocr (5)
html (5)
converter (5)
pdf (4)
ai-agents (4)
llm-webcrawler (4)
html2md (4)
scraping (4)
web-crawler (4)
image-processing (3)
document-ai (3)
batch-document-processing (3)
intelligent-document-processing (3)
document-understanding (3)
ai-training-data (3)
word-to-markdown (3)
powerpoint-to-markdown (3)
html2markdown (3)
excel-to-markdown (3)
llm-ready-data (3)
layout-detection (3)
table-extraction (3)
structured-data-extraction (3)
webscraping (3)
local-document-processing (3)
document-to-markdown (3)
tesseract-alternative (3)
paddleocr-alternative (3)
mineru-alternative (3)
markitdown-alternative (3)
supabase (3)
unstructured-alternative (3)
docling-alternative (3)
marker-alternative (3)
html2text (2)
web (2)
offline-document-converter (2)
data (2)
ppt-to-markdown (2)
bookmarks (2)
firecrawl (2)
API (2)
SDK (2)
html5 (2)
python (2)
web-automation (2)
offline-document-extractor (2)
python3 (2)
pdf-to-json (2)
html-to-md (2)
file-conversion (2)
image-to-text (2)
html-parser (2)
image-to-markdown (2)
doc2pdf (1)
doc-to-print (1)
doc2print (1)
image-to-json (1)
img2json (1)
image-to-xml (1)
img2xml (1)
img2markdown (1)
image-to-md (1)
img2md (1)
image-to-docx (1)
img2docx (1)
image-to-doc (1)
img2doc (1)
image-to-zpl (1)
img2zpl (1)
image-to-html (1)
img2html (1)
image-to-pdf (1)
img2pdf (1)
image-to-print (1)
zpl-to-print (1)
zpl2print (1)
pdf2json (1)
pdf-to-xml (1)
pdf2xml (1)
pdf2markdown (1)
pdf-to-md (1)
pdf2md (1)
pdf-to-docx (1)
pdf2docx (1)
pdf-to-doc (1)