An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org : betterhtmlchunking

A Python library for intelligent HTML segmentation and region-of-interest (ROI) extraction. It builds a DOM tree from raw HTML and extracts content-rich regions for efficient web scraping and analysis.
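To make the description above concrete, here is a minimal sketch of the general idea: build a tree from raw HTML, then walk it top-down and keep the largest subtrees whose text fits a size budget. It uses only the Python standard library and does not use or reproduce the betterhtmlchunking API. The names `Node`, `TreeBuilder`, `build_tree`, `extract_regions`, and the `max_length` parameter are hypothetical, and the library's actual algorithm may differ.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    """Hypothetical tree node: one HTML element and its direct text."""
    tag: str
    children: list = field(default_factory=list)
    text: str = ""  # text directly inside this element, excluding descendants

    def text_length(self) -> int:
        """Total number of text characters in this subtree."""
        return len(self.text) + sum(c.text_length() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a simple element tree while the parser streams through the HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def build_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract_regions(node: Node, max_length: int) -> list:
    """Return the largest subtrees whose text fits within max_length.

    Simplifications in this sketch: empty subtrees are skipped, an oversized
    leaf is returned as-is, and text sitting directly inside an oversized
    parent is ignored.
    """
    size = node.text_length()
    if size == 0:
        return []
    if size <= max_length or not node.children:
        return [node]
    regions = []
    for child in node.children:
        regions.extend(extract_regions(child, max_length))
    return regions


sample = ("<html><body><nav>Home</nav><article><p>Hello world</p>"
          "<p>Second paragraph here</p></article></body></html>")
for region in extract_regions(build_tree(sample), max_length=32):
    print(region.tag, region.text_length())
# prints:
# nav 4
# article 32
```

Lowering `max_length` splits the `article` region into its individual paragraphs, which is the trade-off a chunker of this kind exposes: larger budgets keep related content together, while smaller ones produce more, finer-grained regions.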

purl: pkg:pypi/betterhtmlchunking
Keywords: html, chunking, scraping, dom, roi, content extraction, web-scraping, ai, llm, splitting
License: MIT
Latest release: 4 months ago
First release: 8 months ago
Downloads: 126 last month
Stars: 44 on GitHub
Forks: 5 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 9 days ago
