An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "corpus" keyword

View the packages on the pypi.org package registry that are tagged with the "corpus" keyword.

Top 6.6% on pypi.org
korpora 0.2.0
This package provides easy-download and easy-usage for various Korean corpora.
7 versions - Latest release: over 4 years ago - 3 dependent repositories - 3.69 thousand downloads last month - 684 stars on GitHub - 2 maintainers
Top 1.7% on pypi.org
chatterbot-corpus 1.2.2
A machine readable multilingual dialog corpus.
12 versions - Latest release: 2 months ago - 3 dependent packages - 516 dependent repositories - 19.9 thousand downloads last month - 1,362 stars on GitHub - 2 maintainers
discoursegraphs 0.4.14
graph-based processing of multi-level annotated corpora
18 versions - Latest release: about 4 years ago - 5 dependent repositories - 575 downloads last month - 49 stars on GitHub - 1 maintainer
norsecorpus 1.0.2
Corpus of Old Norse texts with code to read them
2 versions - Latest release: over 5 years ago - 2 dependent repositories - 97 downloads last month - 1 star on GitHub - 1 maintainer
Top 1.7% on pypi.org
trafilatura 2.0.0 💰
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction...
50 versions - Latest release: 5 months ago - 71 dependent packages - 63 dependent repositories - 944 thousand downloads last month - 4,118 stars on GitHub - 1 maintainer
giganticode-langmodels 0.0.4a0
A toolkit for applying machine learning to large source code corpora
5 versions - Latest release: about 5 years ago - 1 dependent repository - 184 downloads last month - 8 stars on GitHub - 1 maintainer
neseg 0.7.2
Named Entity Segmentation
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 79 downloads last month - 0 stars on GitHub - 1 maintainer
hironsancorpus 0.1.2
Japanese IOB2 tagged corpus for named entity recognition.
12 versions - Latest release: about 9 years ago - 2 dependent repositories - 257 downloads last month - 1 maintainer
ekorpkit 0.1.40
eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, tran...
94 versions - Latest release: over 2 years ago - 1 dependent repository - 841 downloads last month - 6 stars on GitHub - 1 maintainer
corpus-cleaner 0.1.0
Natural-language corpus cleaning scripts
1 version - Latest release: almost 9 years ago - 2 dependent repositories - 30 downloads last month - 9 stars on GitHub - 1 maintainer
google_news_crawler 0.3.9
Google News Crawler
10 versions - Latest release: over 8 years ago - 2 dependent repositories - 234 downloads last month - 1 maintainer
qurancorpus 0.2.1
Arabic Quranic Corpus python API
4 versions - Latest release: almost 9 years ago - 2 dependent repositories - 109 downloads last month - 1 maintainer
corpus-similarity 1.0.1
Measuring corpus similarity in Python
3 versions - Latest release: over 3 years ago - 1 dependent repository - 97 downloads last month - 13 stars on GitHub - 1 maintainer
hindikosh 0.0.1
Hindi corpus reader
1 version - Latest release: over 6 years ago - 1 dependent repository - 53 downloads last month - 1 star on GitHub - 1 maintainer
buzz 3.1.8
Sophisticated corpus linguistics
23 versions - Latest release: over 4 years ago - 1 dependent repository - 938 downloads last month - 41 stars on GitHub - 1 maintainer
internetargumentcorpus 1.0.1
The Internet Argument Corpus (IAC) version 2 is a collection of corpora for research in political...
1 version - Latest release: almost 9 years ago - 1 dependent repository - 27 downloads last month - 1 maintainer
visko 0.2.3b4
Web-based software suite for Computational Linguistic Analysis based on construction grammars and...
1 version - Latest release: almost 4 years ago - 1 dependent repository - 34 downloads last month - 4 stars on GitHub - 1 maintainer
elang 0.1.1 💰
Word Embedding(E) utilities for Language(Lang) Models
11 versions - Latest release: about 5 years ago - 1 dependent repository - 320 downloads last month - 39 stars on GitHub - 1 maintainer
coarij 0.2.8
Corpus of Annual Reports in Japan
9 versions - Latest release: over 4 years ago - 1 dependent repository - 225 downloads last month - 90 stars on GitHub - 1 maintainer
chafic 0.1.10
chakki Financial Report Corpus
11 versions - Latest release: over 5 years ago - 288 downloads last month - 90 stars on GitHub - 1 maintainer
Top 2.7% on pypi.org
hanlp-common 0.0.23
HanLP: Han Language Processing
21 versions - Latest release: 3 months ago - 1 dependent package - 103 dependent repositories - 5.78 thousand downloads last month - 34,829 stars on GitHub - 1 maintainer
Top 2.8% on pypi.org
hanlp-trie 0.0.5
HanLP: Han Language Processing
5 versions - Latest release: almost 3 years ago - 1 dependent package - 102 dependent repositories - 4.97 thousand downloads last month - 34,829 stars on GitHub - 1 maintainer
Top 6.5% on pypi.org
hanlp-restful 0.0.24
HanLP: Han Language Processing
22 versions - Latest release: 4 months ago - 2 dependent repositories - 1.03 thousand downloads last month - 34,829 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
hanlp 2.1.1
HanLP: Han Language Processing
197 versions - Latest release: 3 months ago - 7 dependent packages - 23 dependent repositories - 14.7 thousand downloads last month - 34,829 stars on GitHub - 1 maintainer
mingdongnlp 2.0.0a42
mingdongnlp: ZH Language Processing
1 version - Latest release: almost 5 years ago - 1 dependent repository - 57 downloads last month - 1 maintainer
utilss 0.1.7
Useful tools to work with text mining in Python
6 versions - Latest release: almost 5 years ago - 1 dependent repository - 231 downloads last month - 0 stars on GitHub - 1 maintainer
simisimi 0.0.3
NPL Package in development
3 versions - Latest release: over 5 years ago - 1 dependent repository - 119 downloads last month - 1 maintainer
canto-filter 1.1.4
粵文分類篩選器 Cantonese text filter
10 versions - Latest release: 19 days ago - 570 downloads last month - 33 stars on GitHub - 1 maintainer
cantonesedetect 1.1.2
A minimal package that detects Cantonese sentences in Traditional Chinese text.
4 versions - Latest release: 4 months ago - 171 downloads last month - 33 stars on GitHub - 1 maintainer
fundus 0.5.0
A very simple news crawler
14 versions - Latest release: 2 months ago - 1.81 thousand downloads last month - 366 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
corpora 1.0
Lightweight, fast and scalable text corpus library.
1 version - Latest release: over 13 years ago - 9 dependent repositories - 468 downloads last month - 1 maintainer
sadedegel 0.21.2
Extraction-based Turkish news summarizer.
46 versions - Latest release: about 3 years ago - 2 dependent repositories - 1.53 thousand downloads last month - 94 stars on GitHub - 1 maintainer
anchor-annotator 0.8.1
Anchor annotator is a program for inspecting corpora for the Montreal Forced Aligner and correcti...
23 versions - Latest release: 7 months ago - 1 dependent package - 1 dependent repository - 997 downloads last month - 3 stars on GitHub - 1 maintainer
demeuk 4.5.0
CLI tool to remove invalid chars from a corpus.
15 versions - Latest release: about 1 month ago - 1 dependent repository - 518 downloads last month - 16 stars on GitHub - 1 maintainer
runesanalyzer 1.0.5
Gathers different kinds of runic inscriptions
2 versions - Latest release: over 5 years ago - 1 dependent repository - 92 downloads last month - 4 stars on GitHub - 1 maintainer
similaritylab 0.0.5
NPL Package in development
3 versions - Latest release: over 5 years ago - 1 dependent repository - 94 downloads last month - 1 star on GitHub - 1 maintainer
belaweb 0.13.1a1
BELA Dashboard - Web-based user interface for visualising and analysing BELA transcripts
1 version - Latest release: about 3 years ago - 1 dependent repository - 36 downloads last month - 0 stars on GitHub - 1 maintainer
texttaglib 0.1.1
Python library for managing and annotating text corpuses in different formats (ELAN, TIG, TTL, et...
13 versions - Latest release: almost 4 years ago - 3 dependent repositories - 390 downloads last month - 0 stars on GitHub - 1 maintainer
pynutshell 1.0.2
An unsupervised text summarization and information retrieval library under the hood using natural...
3 versions - Latest release: over 4 years ago - 1 dependent repository - 145 downloads last month - 15 stars on GitHub - 1 maintainer
colibricore 2.5.9
Colibri Core is an NLP tool as well as a C++ and Python library (all included in this package) fo...
38 versions - Latest release: almost 2 years ago - 6 dependent repositories - 1.05 thousand downloads last month - 126 stars on GitHub - 1 maintainer
ruscorpora 0.10.0 💰
Links to https://github.com/kunansy/rnc
5 versions - Latest release: over 2 years ago - 1 dependent repository - 169 downloads last month - 7 stars on GitHub - 1 maintainer
rnc 0.10.0 💰
API for Russian National Corpus
20 versions - Latest release: over 2 years ago - 2 dependent repositories - 412 downloads last month - 7 stars on GitHub - 1 maintainer
efaqa-corpus-zh 1.2
Emotional First Aid Dataset, a psychological counseling Q&A corpus
7 versions - Latest release: 11 months ago - 198 downloads last month - 560 stars on GitHub - 1 maintainer
insuranceqa-data 2.5.3
Insuranceqa Corpus in Chinese for Machine Learning
12 versions - Latest release: 11 months ago - 242 downloads last month - 989 stars on GitHub - 1 maintainer
tg-model 3.6.3
Text collections made available by the CLiGS group.
36 versions - Latest release: 8 days ago - 1.17 thousand downloads last month - 23 stars on GitHub - 1 maintainer
pollux 1.0.4
Sophisticated corpus linguistics
1 version - Latest release: almost 8 years ago - 1 dependent repository - 57 downloads last month - 1 maintainer
audiomate 6.0.0
Audiomate is a library for working with audio datasets.
10 versions - Latest release: over 4 years ago - 3 dependent repositories - 345 downloads last month - 137 stars on GitHub - 1 maintainer
ms3 2.6.0
A parser for MuseScore files, serving as data factory for annotated music corpora.
62 versions - Latest release: about 1 month ago - 2 dependent packages - 1 dependent repository - 1.08 thousand downloads last month - 28 stars on GitHub - 1 maintainer
coquery 0.10.0
Coquery: A free corpus query tool
5 versions - Latest release: almost 8 years ago - 2 dependent repositories - 206 downloads last month - 1 maintainer
npvcc2016 3.0.0
npvcc2016: Python loader of npVCC2016 speech corpus
17 versions - Latest release: over 4 years ago - 1 dependent repository - 414 downloads last month - 0 stars on GitHub - 1 maintainer
dcs-wrapper 0.0.1
Wrapper around Digital Corpus of Sanskrit
1 version - Latest release: about 7 years ago - 1 dependent repository - 42 downloads last month - 2 stars on GitHub - 1 maintainer
python-by-contract-corpus 2021.7.10rc1
Provide a corpus of programs annotated with contracts with no obvious bugs.
1 version - Latest release: almost 4 years ago - 1 dependent repository - 31 downloads last month - 21 stars on GitHub - 1 maintainer
sugaroid-chatterbot-corpus 1.2.0
Chatterbot corpus for Sugaroid
1 version - Latest release: about 5 years ago - 3 dependent repositories - 51 downloads last month - 1,393 stars on GitHub - 1 maintainer
landa-chatterbot-corpus 1.2.0
A machine readable multilingual dialog corpus
1 version - Latest release: over 6 years ago - 1 dependent repository - 34 downloads last month - 1,393 stars on GitHub - 1 maintainer
pdpc-decisions 1.3.2
Tools to extract and compile enforcement decisions from the Singapore Personal Data Protection Co...
11 versions - Latest release: almost 5 years ago - 1 dependent repository - 282 downloads last month - 0 stars on GitHub - 1 maintainer
Top 2.9% on pypi.org
pyhanlp 0.1.89
Python wrapper for HanLP: Han Language Processing
97 versions - Latest release: 3 months ago - 2 dependent packages - 73 dependent repositories - 5.11 thousand downloads last month - 3,170 stars on GitHub - 1 maintainer
yyhanlp 0.1.85
Python wrapper for HanLP: Han Language Processing
1 version - Latest release: 5 months ago - 29 downloads last month - 3,170 stars on GitHub - 1 maintainer
codeprep 1.0.5
A toolkit for pre-processing large source code corpora
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 165 downloads last month - 45 stars on GitHub - 1 maintainer
giganticode-codeprep 1.0.0
A toolkit for pre-processing large source code corpora
1 version - Latest release: about 5 years ago - 1 dependent repository - 52 downloads last month - 45 stars on GitHub - 1 maintainer
copr.py 0.0.7
A library to access the Corpus of Place Representations (COPR)
7 versions - Latest release: about 1 month ago - 224 downloads last month - 1 maintainer
Top 4.5% on pypi.org
synonyms 3.23.5
Chinese synonyms: a toolkit for chatbots and intelligent question answering; Chinese Synonyms for Natural Language Processing and Understanding
16 versions - Latest release: over 1 year ago - 48 dependent repositories - 903 downloads last month - 5,066 stars on GitHub - 1 maintainer
Top 4.7% on pypi.org
montreal-forced-aligner 3.2.2
Montreal Forced Aligner is a package for aligning speech corpora using Kaldi functionality.
100 versions - Latest release: 25 days ago - 2 dependent packages - 5 dependent repositories - 5.29 thousand downloads last month - 1,124 stars on GitHub - 1 maintainer
Top 9.5% on pypi.org
speach 0.1a15.post1
a Python library for managing, annotating, and converting natural language corpuses using popular...
17 versions - Latest release: about 3 years ago - 2 dependent packages - 2 dependent repositories - 2.14 thousand downloads last month - 17 stars on GitHub - 1 maintainer
poetree 0.0.2
An easy way to get data from PoeTree dataset
2 versions - Latest release: about 1 year ago - 59 downloads last month - 5 stars on GitHub - 1 maintainer
pydracor 2.0.0
Python package which provides access to the DraCor API.
4 versions - Latest release: over 1 year ago - 1 dependent repository - 122 downloads last month - 5 stars on GitHub - 4 maintainers
corpona 1.0.1
A library for reading corpora.
2 versions - Latest release: about 4 years ago - 1 dependent repository - 85 downloads last month - 6 stars on GitHub - 1 maintainer
nlpbaselines 0.0.49
Quickly establish strong baselines for NLP tasks
27 versions - Latest release: over 1 year ago - 1 dependent repository - 546 downloads last month - 0 stars on GitHub - 1 maintainer
ua-gec 2.1.3
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian language
9 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 361 downloads last month - 260 stars on GitHub - 1 maintainer
perin-parser 0.0.19
(Unofficial) PERIN: Permutation-invariant Semantic Parsing
17 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 822 downloads last month - 45 stars on GitHub - 1 maintainer
hanlperceptron 0.2.0
Native Python HanLP Perceptron Model: HanLPerceptron
3 versions - Latest release: over 3 years ago - 1 dependent repository - 97 downloads last month - 7 stars on GitHub - 1 maintainer
polyglotdb 1.3.0
Language data store and linguistic query API
37 versions - Latest release: 9 months ago - 2 dependent repositories - 688 downloads last month - 39 stars on GitHub - 2 maintainers
efaqa-corpus-raw 1.0.3
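The raw psychological counseling Q&A corpus (hereafter also "this dataset" or "this corpus") is a high-quality corpus built for applying artificial intelligence to psychological counseling. The data was crawled from public counseling websites, then cleaned and anonymized. Total character count of message text...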
4 versions - Latest release: over 1 year ago - 86 downloads last month - 14 stars on GitHub - 1 maintainer
semiolog 0.0.0
Tools for the semiological analysis of corpora
1 version - Latest release: about 2 years ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.3% on pypi.org
corpkit 2.3.8
A toolkit for working with linguistic corpora
151 versions - Latest release: over 8 years ago - 3 dependent repositories - 3.65 thousand downloads last month - 203 stars on GitHub - 1 maintainer
bela 2.0.0a21.post6
BLIP ELAN Language Annotation package
4 versions - Latest release: almost 3 years ago - 1 dependent repository - 108 downloads last month - 3 stars on GitHub - 1 maintainer
kanon4txt 0.1.5
K Anonymity for Text first Try
11 versions - Latest release: almost 2 years ago - 300 downloads last month - 0 stars on GitHub - 1 maintainer
wordfish 1.1.6
Infrastructure for finding relationships between terms in corpus of interest.
17 versions - Latest release: over 8 years ago - 2 dependent repositories - 367 downloads last month - 20 stars on GitHub - 1 maintainer
corpusit 0.1.3
A multi-thread deterministic corpus iterator for natural language modeling tasks
3 versions - Latest release: over 2 years ago - 1 dependent repository - 593 downloads last month - 2 stars on GitHub - 1 maintainer
streusle 4.5
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, superse...
4 versions - Latest release: almost 3 years ago - 1 dependent repository - 105 downloads last month - 65 stars on GitHub - 1 maintainer
polyhymnia 1.0.0
Polyhymnia: Natural Chinese Data Augmentation
1 version - Latest release: about 3 years ago - 1 dependent repository - 42 downloads last month - 5 stars on GitHub - 1 maintainer
oscar-corpus-downloader 0.1.0
OSCAR Corpus Download Tool
1 version - Latest release: over 1 year ago - 58 downloads last month - 0 stars on GitHub - 1 maintainer
zoegas 1.3.1
Old Norse Zoëga's dictionary
9 versions - Latest release: almost 3 years ago - 1 dependent repository - 266 downloads last month - 4 stars on GitHub - 1 maintainer
pycorpus 1.9.2
Easy concurrent launch of series of file based experiments.
10 versions - Latest release: about 8 years ago - 1 dependent package - 1 dependent repository - 371 downloads last month - 0 stars on GitHub - 1 maintainer
german-nouns 1.2.5 💰
A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CS...
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 1.84 thousand downloads last month - 149 stars on GitHub - 1 maintainer
wpextract 1.1.1
Create datasets from WordPress sites
9 versions - Latest release: 3 months ago - 348 downloads last month - 3 stars on GitHub - 1 maintainer
lcpcli 0.2.0
Helper for converting CONLLU files and uploading the corpus to LiRI Corpus Platform (LCP)
10 versions - Latest release: 22 days ago - 385 downloads last month - 2 stars on GitHub - 1 maintainer
textdirectory 0.3.3
TextDirectory allows you to combine multiple text files into one. While doing this, filters and t...
12 versions - Latest release: over 2 years ago - 1 dependent repository - 453 downloads last month - 11 stars on GitHub - 1 maintainer
corpus-replicator 1.1.2
A corpus generation tool
4 versions - Latest release: over 1 year ago - 1.76 thousand downloads last month - 20 stars on GitHub - 2 maintainers
lyricscorpora 1.0.0
Lyrics API
10 versions - Latest release: almost 7 years ago - 1 dependent repository - 203 downloads last month - 18 stars on GitHub - 1 maintainer
folia-tools 2.5.8
FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format f...
97 versions - Latest release: 6 months ago - 1 dependent repository - 1.68 thousand downloads last month - 62 stars on GitHub - 1 maintainer
spacy2folia 0.3.4
Library that adds FoLiA (format for linguistic annotation) support to spaCy
6 versions - Latest release: about 1 year ago - 1 dependent repository - 127 downloads last month - 62 stars on GitHub - 1 maintainer
vikitext 0.0.4
Retrieve article links and text from Vikidia and their equivalents in Wikipedia
1 version - Latest release: over 3 years ago - 1 dependent repository - 47 downloads last month - 0 stars on GitHub - 1 maintainer
expanda 1.3.1
Integrated Corpus-Building Environment
14 versions - Latest release: almost 5 years ago - 1 dependent repository - 464 downloads last month - 29 stars on GitHub - 1 maintainer
webcorpus 0.2
Generate large textual corpora for almost any language by crawling the web
1 version - Latest release: about 4 years ago - 1 dependent repository - 57 downloads last month - 8 stars on GitHub - 1 maintainer
sejong-downloader 1.0.0
Downloader for Sejong corpus
3 versions - Latest release: over 5 years ago - 1 dependent repository - 97 downloads last month - 1 star on GitHub - 1 maintainer
error-correction 1.0.3
Chinese Text Error Correction for Natural Language Processing and Understanding
1 version - Latest release: about 7 years ago - 1 dependent repository - 41 downloads last month - 10 stars on GitHub - 1 maintainer
boco 0.3.2
A corpus manager for Tibetan Language
3 versions - Latest release: over 1 year ago - 123 downloads last month - 1 star on GitHub - 1 maintainer
cd4py 0.1.0
CD4Py: Code De-Duplication for Python
1 version - Latest release: over 4 years ago - 2 dependent repositories - 60 downloads last month - 22 stars on GitHub - 1 maintainer
pytorch-mcrf 0.0.3
Multiple CRF implementation for PyTorch
3 versions - Latest release: almost 4 years ago - 2 dependent repositories - 142 downloads last month - 892 stars on GitHub - 1 maintainer
concordancecrawler 1.0.2
A module for automatic concordance extraction from the Internet
4 versions - Latest release: almost 9 years ago - 109 downloads last month - 2 stars on GitHub - 1 maintainer