pypi.org: betterhtmlchunking
A Python library for HTML segmentation and region-of-interest (ROI) extraction. It builds a DOM tree from raw HTML and extracts content-rich regions for efficient web scraping and analysis.
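The description above (build a DOM tree, then pick out content-rich regions) can be illustrated with a minimal sketch. This is not the betterhtmlchunking API: it uses only the standard-library html.parser module, and the names Node, TreeBuilder, build_tree, and extract_regions are hypothetical. Scoring a region by its text length is likewise an assumed heuristic, chosen only to keep the example small.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical DOM node: a tag, its parent, its children, and its own text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""  # text found directly inside this node

    def text_length(self):
        # Total text length of this node plus all of its descendants.
        return len(self.text) + sum(c.text_length() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from raw HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open element; ignore stray close tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract_regions(node, max_length):
    """Return the largest non-empty subtrees whose text fits within max_length.

    Simplification: text sitting directly inside a node that is too large
    to fit is dropped; only its child subtrees are considered.
    """
    total = node.text_length()
    if total <= max_length:
        return [node] if total > 0 else []
    regions = []
    for child in node.children:
        regions.extend(extract_regions(child, max_length))
    return regions
```

Calling extract_regions(build_tree(html), max_length) returns a list of nodes in document order. A smaller max_length splits the page into finer regions, while a large enough value returns the whole document as one region.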
Registry - Source - Documentation - JSON
purl: pkg:pypi/betterhtmlchunking
Keywords: html, chunking, scraping, dom, roi, content extraction, web-scraping, ai, llm, splitting
License: MIT
Latest release: 4 months ago
First release: 8 months ago
Downloads: 126 last month
Stars: 44 on GitHub
Forks: 5 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 9 days ago