Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "extraction" keyword

edspdf 0.9.1
Smart text extraction from PDF documents
10 versions - Latest release: about 2 months ago - 1 dependent package - 496 downloads last month - 34 stars on GitHub - 1 maintainer
edspdf-mupdf 0.2.0
MuPDF extension for EDS-PDF
1 version - Latest release: 11 months ago - 86 downloads last month - 30 stars on GitHub - 1 maintainer
ximage 0.3.1
xarray-based tools for image/video processing
9 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 67 downloads last month - 8 stars on GitHub - 1 maintainer
pdftext 0.3.7
Extract structured text from pdfs quickly
13 versions - Latest release: 6 days ago - 1.54 thousand downloads last month - 184 stars on GitHub - 1 maintainer
txt-from-pdf 1.1.1
Extract clean text from PDFs.
4 versions - Latest release: 4 days ago - 405 downloads last month - 1 star on GitHub - 1 maintainer
doxstractor 0.1.1
Doxstractor extracts structured data from text in an easily configurable way.
9 versions - Latest release: 19 days ago - 1.03 thousand downloads last month - 4 stars on GitHub - 1 maintainer
docspotter 0.3
DocSpotter is a Python library designed to extract specific information from document images by c...
3 versions - Latest release: about 1 month ago - 331 downloads last month - 1 star on GitHub - 1 maintainer
reporter-utils 0.1.1
Shared utilities for data extraction.
2 versions - Latest release: about 2 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
llama-index-packs-amazon-product-extraction 0.1.3
llama-index packs amazon_product_extraction integration
4 versions - Latest release: 3 months ago - 16 downloads last month - 3,395 stars on GitHub - 1 maintainer
newspaper4k 0.9.3
Simplified Python article discovery & extraction.
5 versions - Latest release: about 2 months ago - 9.33 thousand downloads last month - 303 stars on GitHub - 1 maintainer
dataxtractor 1.0.7
DataXtractor is a versatile Python library designed to simplify the extraction of valuable data f...
3 versions - Latest release: 7 months ago - 31 downloads last month - 1 maintainer
df-extract 0.0.2
DecisionFacts Extraction Library extracts content from PDF, PPTX, Docx, png, jpg, and convert as...
3 versions - Latest release: 7 months ago - 1 dependent package - 1 dependent repository - 71 downloads last month - 12 stars on GitHub - 1 maintainer
jsonflow 0.1.1
Pandoc (Python Library)
4 versions - Latest release: 7 months ago - 20 downloads last month - 113 stars on GitHub - 1 maintainer
cloudsdp 0.1.11
11 versions - Latest release: 9 months ago - 48 downloads last month - 0 stars on GitHub - 1 maintainer
sound-extraction 2.1.2
Slice and segment your audio files easily with an open source Python program. Our tool enables you t...
10 versions - Latest release: 10 months ago - 19 downloads last month - 3 stars on GitHub - 1 maintainer
sia-app 1.1.0
Application to facilitate the download, exploration and visual analysis of oceanographic data.
1 version - Latest release: 12 months ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
siaextractlib 0.2.2
Provides an easy-to-use API for downloading oceanographic data.
1 version - Latest release: 12 months ago - 1 dependent package - 1 dependent repository - 15 downloads last month - 0 stars on GitHub - 1 maintainer
extract-drugs 1.3.0
A CLI for extracting drugs from text records
6 versions - Latest release: 20 days ago - 41 downloads last month - 3 stars on GitHub - 1 maintainer
pdfix-sdk 8.0.1
PDFix SDK - Automated PDF Remediation, Data Extraction, HTML Conversion
11 versions - Latest release: 11 days ago - 710 downloads last month - 1 maintainer
pydoxtools 0.8.0
This library contains a set of tools in order to extract and synthesize structured information fr...
12 versions - Latest release: 4 months ago - 73 downloads last month - 55 stars on GitHub - 1 maintainer
teklia-line-image-extractor 0.2.9
A tool for extracting a text line image from the contour with different methods
12 versions - Latest release: 2 months ago - 1 dependent repository - 603 downloads last month - 0 stars on GitLab.com - 1 maintainer
article-extraction 0.3.0
Article text extraction library
1 version - Latest release: about 1 year ago - 1 dependent package - 20 downloads last month - 5 stars on GitHub - 1 maintainer
acronym-extractor 2.0.7 removed
Extracting acronym-definition pairs from PDF or text files
16 versions - Latest release: about 1 year ago - 1.16 thousand downloads last month - 2 stars on GitHub - 1 maintainer
docile-benchmark 0.3.1
Tools to work with the DocILE dataset and benchmark
3 versions - Latest release: about 1 year ago - 426 downloads last month - 106 stars on GitHub - 1 maintainer
colorum 0.5.2
Colorum is a Python package for changing the color of the console.
7 versions - Latest release: about 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
data-extract 1.9
Extract data from files in Python
19 versions - Latest release: about 1 year ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
ibex-1d 1.3.2 💰
Image 1D Barcode EXtractor - Detect and Extract 1D Barcode(s) in Photographs
3 versions - Latest release: over 1 year ago - 10 downloads last month - 3 stars on GitHub - 1 maintainer
words2numbers 5.12.22
A Python package for extracting numbers from text
1 version - Latest release: over 1 year ago - 8 downloads last month - 7 stars on GitHub - 1 maintainer
picturetextcrop 0.6.1
Interactive extraction of selected text from images and batch processing of stored image files.
3 versions - Latest release: 4 months ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
trtimeextractor 0.1.2
Time Extractor NLP project - locate dates and times in text documents
3 versions - Latest release: over 1 year ago - 29 downloads last month - 0 stars on GitHub - 1 maintainer
conextraction 0.0.3
Extract the conversation
3 versions - Latest release: over 1 year ago - 33 downloads last month - 1 maintainer
fuzzpyxl 0.0.4
Helper functions to easily search for Excel cells by value, color, formatting, or other attributes
2 versions - Latest release: almost 2 years ago - 1 download last month - 0 stars on GitHub - 1 maintainer
zhkeybert 0.1.2
Based on KeyBERT, performs Chinese document keyword extraction with state-of-the-art transformer ...
3 versions - Latest release: over 2 years ago - 1 dependent repository - 135 downloads last month - 63 stars on GitHub - 1 maintainer
z3c.recipe.i18n 1.0.0
Zope3 egg-based i18n locale extraction recipes
14 versions - Latest release: over 6 years ago - 2 dependent repositories - 169 downloads last month - 2 stars on GitHub - 21 maintainers
Top 9.5% on pypi.org
xpath 0.0.1
A Python library to extract objects from an object tree.
1 version - Latest release: 9 months ago - 3 dependent repositories - 1 maintainer
xiax 0.6.0
Extract or insert artwork/sourcecode from/to an `xml2rfc` XML document.
3 versions - Latest release: about 5 years ago - 1 dependent repository - 15 downloads last month - 0 stars on GitHub - 1 maintainer
xextract 0.1.8
Extract structured data from HTML and XML documents like a boss.
17 versions - Latest release: about 4 years ago - 1 dependent package - 4 dependent repositories - 655 downloads last month - 50 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
woob 3.3.1
Woob, Web Outside Of Browsers
9 versions - Latest release: about 1 year ago - 1 dependent package - 4 dependent repositories - 23.6 thousand downloads last month - 110 stars on GitLab.com - 2 maintainers
webdownloader 0.1.2
webdownloader is a tool for web data extraction
3 versions - Latest release: over 5 years ago - 1 dependent repository - 60 downloads last month - 0 stars on GitHub - 1 maintainer
webbrowserdownloader 0.1
webbrowserdownloader is a wrapper for the Selenium browser
1 version - Latest release: over 5 years ago - 1 dependent repository - 265 downloads last month - 1 maintainer
Top 6.7% on pypi.org
unrpa 2.3.0 💰
Extract files from the RPA archive format (from the Ren'Py Visual Novel Engine).
6 versions - Latest release: over 4 years ago - 3 dependent repositories - 2.39 thousand downloads last month - 561 stars on GitHub - 1 maintainer
unicontent 0.5.2
Python module to extract structured metadata from a URL, DOI, or ISBN
6 versions - Latest release: about 7 years ago - 1 dependent repository - 58 downloads last month - 11 stars on GitHub - 1 maintainer
unfluff 0.2
HTML content extraction - remove the fluff
2 versions - Latest release: over 13 years ago - 2 dependent repositories - 9 downloads last month - 18 stars on GitHub - 1 maintainer
turcy 0.0.42
A package for German Open Information Extraction
15 versions - Latest release: about 1 year ago - 1 dependent repository - 150 downloads last month - 2 stars on GitHub - 1 maintainer
tosixinch 0.9.0
Browser to e-reader in a few minutes
16 versions - Latest release: almost 2 years ago - 1 dependent repository - 57 downloads last month - 3 stars on GitHub - 1 maintainer
torex 0.1.1
Torrent extraction automation
2 versions - Latest release: over 8 years ago - 2 dependent repositories - 11 downloads last month - 0 stars on GitHub - 1 maintainer
timeside 0.9.6
Audio processing framework for the web
27 versions - Latest release: over 3 years ago - 2 dependent repositories - 237 downloads last month - 366 stars on GitHub - 1 maintainer
tf-idf 0.0.0
An implementation of TF-IDF for keyword extraction.
1 version - Latest release: 9 months ago - 1 dependent repository - 1 star on GitHub - 1 maintainer
textpipeliner 0.3.1
textpipeliner - library for extracting specific words from sentences of a document
5 versions - Latest release: over 7 years ago - 3 dependent repositories - 28 downloads last month - 68 stars on GitHub - 1 maintainer
take 0.2.0
A DSL for extracting data from a web page.
9 versions - Latest release: about 9 years ago - 10 dependent repositories - 57 downloads last month - 8 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
tableh 0.0.01
Tableh, taking the "Matt Damon - Oscar Winning actor" out of "Mahhttt Dahhmonnn."
1 version - Latest release: 9 months ago - 2 dependent repositories - 1 maintainer
Top 6.0% on pypi.org
stanford-openie 1.3.2 💰
Minimalist wrapper around Stanford OpenIE
8 versions - Latest release: 4 months ago - 1 dependent package - 14 dependent repositories - 751 downloads last month - 616 stars on GitHub - 1 maintainer
stackdistiller 0.12
A data extraction and transformation library for OpenStack notifications
3 versions - Latest release: almost 9 years ago - 1 dependent package - 5 dependent repositories - 72 downloads last month - 21 stars on GitHub - 3 maintainers
sopex 0.1
Library and CLI to extract the subject, predicate and object for a given English sentence
1 version - Latest release: about 11 years ago - 2 dependent repositories - 10 downloads last month - 9 stars on GitHub - 1 maintainer
snorkel_ie 0.4a0
A lightweight framework for developing structured information extraction applications
1 version - Latest release: over 7 years ago - 17 downloads last month - 5,699 stars on GitHub - 1 maintainer
scrapyz 0.3.3
Scrape Easy
6 versions - Latest release: almost 9 years ago - 2 dependent repositories - 7 downloads last month - 188 stars on GitHub - 1 maintainer
scrapedia 0.1.0
A scraper used for the extraction of Brazilian historical soccer data from the webpage futpedia.gl...
1 version - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 0 stars on GitHub - 1 maintainer
sap-business-document-processing 0.4.1
Python client library for convenient usage of SAP Business Document Processing services
9 versions - Latest release: 7 days ago - 1 dependent repository - 1.76 thousand downloads last month - 19 stars on GitHub - 3 maintainers
ruruki 1.4.2
Ruruki is a lightweight in-memory graph database which is ideal if you need a temporary graph dat...
13 versions - Latest release: almost 7 years ago - 3 dependent repositories - 64 downloads last month - 89 stars on GitHub - 1 maintainer
ret 0.1.4
A pure-python command-line regular expression tool for stream filtering, extracting, and parsing.
5 versions - Latest release: almost 2 years ago - 7 dependent repositories - 143 downloads last month - 2 stars on GitHub - 1 maintainer
relext 0.0.4
RelExt: A Tool for Relation Extraction from Text.
3 versions - Latest release: almost 2 years ago - 1 dependent repository - 17 downloads last month - 44 stars on GitHub - 1 maintainer
refinery 2.6.0 💰
A map extractor for games built with the Blam engine
49 versions - Latest release: about 3 years ago - 1 dependent repository - 442 downloads last month - 8 stars on GitHub - 1 maintainer
reading-image 1.0.1
Reading Image is a text analysis tool for image files (png, jpg, jpeg) and PDFs. The system will ...
3 versions - Latest release: almost 4 years ago - 1 dependent repository - 27 downloads last month - 1 maintainer
rake 1.0
Rapid Automatic Keywords Extraction: Just a Practice
1 version - Latest release: over 10 years ago - 7 dependent repositories - 279 downloads last month - 7 stars on GitHub - 1 maintainer
pytimeextractor 0.1.4
Time Extractor NLP project - locate dates and times in text documents
4 versions - Latest release: over 6 years ago - 1 dependent repository - 28 downloads last month - 22 stars on GitHub - 1 maintainer
pythonrlsa 1.0.0
Python Run Length Smoothing Algorithm for Document Processing
3 versions - Latest release: over 3 years ago - 1 dependent repository - 270 downloads last month - 27 stars on GitHub - 1 maintainer
pynutshell 1.0.2
An unsupervised text summarization and information retrieval library under the hood using natural...
3 versions - Latest release: over 3 years ago - 1 dependent repository - 30 downloads last month - 13 stars on GitHub - 1 maintainer
pyisotools 2.4.6
Simple python library for extracting and rebuilding ISOs
41 versions - Latest release: over 1 year ago - 1 dependent repository - 217 downloads last month - 9 stars on GitHub - 1 maintainer
pydomainextractor 0.13.9
A blazingly fast domain extraction library written in Rust
34 versions - Latest release: about 2 months ago - 1 dependent repository - 859 downloads last month - 64 stars on GitHub - 1 maintainer
pycrowlingo 0.6.4
Official Crowlingo SDK. Access to all NLP and NLU services that analyze texts regardless of the l...
19 versions - Latest release: almost 3 years ago - 1 dependent repository - 108 downloads last month - 4 stars on GitHub - 1 maintainer
pyang-module-catalog-plugin 0.2
A pyang plugin to extract OpenConfig module catalog data from YANG modules
2 versions - Latest release: about 7 years ago - 1 dependent repository - 11 downloads last month - 0 stars on GitHub - 1 maintainer
pyang-jsontree-plugin 0.1
A pyang plugin to produce a JSON representation of module trees for use in graph libraries
1 version - Latest release: over 6 years ago - 1 dependent repository - 197 downloads last month - 5 stars on GitHub - 1 maintainer
pupillib 1.2.1
A software package to perform trial extraction on Pupil Labs eye tracker data collected into XDF ...
4 versions - Latest release: over 4 years ago - 2 dependent repositories - 16 downloads last month - 10 stars on GitHub - 1 maintainer
pre-ml 1.0.1
pre-ml: an optimization tool for machine learning!
1 version - Latest release: over 6 years ago - 1 dependent repository - 13 downloads last month - 4 stars on GitHub - 1 maintainer
plotextractor 1.0.9
Small library for extracting plots used in scholarly communication.
20 versions - Latest release: over 1 year ago - 4 dependent repositories - 109 downloads last month - 3 stars on GitHub - 1 maintainer
petact 0.1.2
A package extraction tool
3 versions - Latest release: almost 6 years ago - 1 dependent package - 35 dependent repositories - 1.18 thousand downloads last month - 0 stars on GitHub - 1 maintainer
perfectextractor 0.3.3
Extracting Perfects (and related forms) from parallel corpora
7 versions - Latest release: almost 4 years ago - 1 dependent package - 2 dependent repositories - 34 downloads last month - 6 stars on GitHub - 1 maintainer
Top 4.7% on pypi.org
pdftabextract 0.3.0
A set of tools for data mining (OCR-processed) PDFs
5 versions - Latest release: over 6 years ago - 12 dependent repositories - 1.29 thousand downloads last month - 2,159 stars on GitHub - 2 maintainers
pcu-keyphrase 2.0
Keyphrase extraction for PCU project
2 versions - Latest release: over 5 years ago - 1 dependent repository - 20 downloads last month - 1 star on GitHub - 1 maintainer
ovos-skill-installer 0.0.5
Mycroft skill installer from .zip or .tar.gz URLs
4 versions - Latest release: about 3 years ago - 6 dependent packages - 1 dependent repository - 1.2 thousand downloads last month - 0 stars on GitHub - 2 maintainers
osf-eimtc 0.1.53
A Framework for Encrypted Internet and Malicious Traffic Classification.
52 versions - Latest release: 7 days ago - 1 dependent repository - 349 downloads last month - 8 stars on GitHub - 4 maintainers
onesheet 0.1.5
Easily access metadata for image, video, sound, and document files.
35 versions - Latest release: almost 9 years ago - 1 dependent repository - 77 downloads last month - 3 stars on GitHub - 1 maintainer
numbers_extractor 0.2.3
Extract numbers from a string and return a list of ints.
5 versions - Latest release: almost 10 years ago - 2 dependent repositories - 47 downloads last month - 0 stars on GitHub - 1 maintainer
nsi.metadataextractor 1.2
A template-based metadata extractor.
1 version - Latest release: about 11 years ago - 3 dependent repositories - 6 downloads last month - 1 maintainer
noun-hound 1.0.0
Finds nouns and noun phrases in any given text.
1 version - Latest release: over 8 years ago - 2 dependent repositories - 7 downloads last month - 1 maintainer
newsworker 1.0.1
Advanced news feeds extractor and finder library. Helps to automatically extract news from websi...
1 version - Latest release: almost 6 years ago - 1 dependent repository - 112 downloads last month - 76 stars on GitHub - 1 maintainer
newsman 1.1.0
A tool for web news scraping.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 18 downloads last month - 0 stars on GitHub - 1 maintainer
mrextractor 0.0.1a1
A library for feature extraction from binaries
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 38 downloads last month - 12 stars on GitHub - 1 maintainer
minutiaeclassificator 1.0.0
Minutiae extraction and classification tool
23 versions - Latest release: about 4 years ago - 1 dependent repository - 59 downloads last month - 1 star on GitHub - 1 maintainer
metapdf 0.3.2
A lightweight PDF library optimized for metadata extraction and insertion
5 versions - Latest release: about 12 years ago - 4 dependent repositories - 78 downloads last month - 16 stars on GitHub - 1 maintainer
metadoc 0.10.5
Post-truth era news article metadata service.
12 versions - Latest release: over 5 years ago - 1 dependent repository - 29 downloads last month - 36 stars on GitHub - 1 maintainer
logxstract 0.0.1
Library for extracting XML from logs to an output file.
1 version - Latest release: over 6 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
Top 9.7% on pypi.org
libextract 0.0.12
An HTML/XML web scraping tool
4 versions - Latest release: almost 9 years ago - 5 dependent repositories - 16 downloads last month - 499 stars on GitHub - 1 maintainer
Top 3.8% on pypi.org
krwordrank 1.0.3
KR-WordRank: Korean Unsupervised Word/Keyword Extractor
8 versions - Latest release: almost 4 years ago - 3 dependent packages - 28 dependent repositories - 697 downloads last month - 339 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
keybert 0.8.4
KeyBERT performs keyword extraction with state-of-the-art transformer models.
17 versions - Latest release: 3 months ago - 10 dependent packages - 105 dependent repositories - 70.2 thousand downloads last month - 3,211 stars on GitHub - 1 maintainer
iepy 0.9.6
Information Extraction framework in Python
7 versions - Latest release: over 7 years ago - 3 dependent repositories - 20 downloads last month - 903 stars on GitHub - 1 maintainer
ibooks 0.3 💰
iBooks Author cover and metadata extraction
3 versions - Latest release: almost 11 years ago - 2 dependent repositories - 18 downloads last month - 2 stars on GitHub - 1 maintainer
hundate 1.0.4
NLP module for Hungarian date-entity recognition and translation to specific date values
5 versions - Latest release: over 2 years ago - 1 dependent repository - 34 downloads last month - 3 stars on GitHub - 1 maintainer
huhuseg 0.6.1
Simple Chinese segmenter, keyword extractor and other examples
13 versions - Latest release: almost 6 years ago - 1 dependent repository - 67 downloads last month - 8 stars on GitHub - 1 maintainer
htmllist 2.2.2
Extract information from HTML pages that have some kind of a repetitive pattern
11 versions - Latest release: over 13 years ago - 1 dependent repository - 69 downloads last month - 1 maintainer