pypi.org "tokenization" keyword
View the packages on the pypi.org package registry that are tagged with the "tokenization" keyword.
subtokenizer 0.0.19
Subwords tokenizer for neural natural language processing
16 versions - Latest release: over 5 years ago - 1 dependent repository - 312 downloads last month - 5 stars on GitHub - 1 maintainer
Top 0.1% on pypi.org
spacy 3.8.5 💰
Industrial-strength Natural Language Processing (NLP) in Python
216 versions - Latest release: 17 days ago - 873 dependent packages - 15,793 dependent repositories - 17.2 million downloads last month - 29,548 stars on GitHub - 3 maintainers
rusyll 0.1.1
Splitting Russian words into phonetic syllables
1 version - Latest release: over 4 years ago - 2 dependent repositories - 64 downloads last month - 6 stars on GitHub - 1 maintainer
dango 0.0.1
An easy to use tokenizer for Japanese text, aimed at language learners and non-linguists
2 versions - Latest release: over 3 years ago - 3 dependent repositories - 382 downloads last month - 16 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
sentence-splitter 1.4
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder
4 versions - Latest release: over 6 years ago - 12 dependent packages - 42 dependent repositories - 88.9 thousand downloads last month - 241 stars on GitHub - 2 maintainers
spacyface 0.3.0
Aligner for spacy and huggingface tokenization
10 versions - Latest release: about 4 years ago - 1 dependent repository - 186 downloads last month - 44 stars on GitHub - 1 maintainer
bigrams 0.1.2
Simply create (N)grams
3 versions - Latest release: about 2 years ago - 152 downloads last month - 6 stars on GitHub - 1 maintainer
llm-obfuscator 0.1.0
A tool for obfuscating text by manipulating token IDs while preserving token count and structure
1 version - Latest release: about 1 month ago - 94 downloads last month - 1 maintainer
nlp-preprocessing 0.2.0
A package for text preprocessing
14 versions - Latest release: over 4 years ago - 1 dependent repository - 471 downloads last month - 16 stars on GitHub - 1 maintainer
Top 3.6% on pypi.org
pyonmttok 1.37.1
Fast and customizable text tokenization library with BPE and SentencePiece support
66 versions - Latest release: about 2 years ago - 3 dependent packages - 103 dependent repositories - 28.8 thousand downloads last month - 302 stars on GitHub - 4 maintainers
spacywb 0.1.1 💰
Industrial-strength Natural Language Processing (NLP) in Python
2 versions - Latest release: over 3 years ago - 1 dependent repository - 64 downloads last month - 31,363 stars on GitHub - 1 maintainer
semantic-split 0.1.0 💰
A better way to split (chunk/group) your text before inserting it into an LLM/Vector DB.
1 version - Latest release: almost 2 years ago - 581 downloads last month - 31,363 stars on GitHub - 1 maintainer
spacy-weibo 2.3.0 💰
Industrial-strength Natural Language Processing (NLP) in Python
1 version - Latest release: over 3 years ago - 1 dependent repository - 45 downloads last month - 31,363 stars on GitHub - 1 maintainer
Top 4.5% on pypi.org
spacy-nightly 3.0.0rc5 💰
Industrial-strength Natural Language Processing (NLP) in Python
74 versions - Latest release: about 4 years ago - 2 dependent packages - 9 dependent repositories - 6.62 thousand downloads last month - 31,363 stars on GitHub - 2 maintainers
spacy-ci-improve 2.0.5 💰
Industrial-strength Natural Language Processing (NLP) with Python and Cython
1 version - Latest release: about 7 years ago - 1 dependent repository - 57 downloads last month - 31,363 stars on GitHub - 1 maintainer
vaporetto 0.3.0
Python wrapper of Vaporetto tokenizer
5 versions - Latest release: about 2 years ago - 1 dependent repository - 1.9 thousand downloads last month - 20 stars on GitHub - 1 maintainer
mrs-spellings 1.0.3
A micro utility for generating plausible misspellings
7 versions - Latest release: almost 5 years ago - 1 dependent repository - 247 downloads last month - 2 stars on GitHub - 1 maintainer
textmate-grammar-python 0.6.1
A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in...
12 versions - Latest release: 9 months ago - 1 dependent package - 525 downloads last month - 8 stars on GitHub - 1 maintainer
witokit 1.1.0
A python module to generate a tokenized dump of Wikipedia for NLP
20 versions - Latest release: over 5 years ago - 1 dependent repository - 254 downloads last month - 9 stars on GitHub - 1 maintainer
Top 6.9% on pypi.org
text2text 1.9.5
Text2Text Language Modeling Toolkit
192 versions - Latest release: 3 months ago - 4 dependent repositories - 6.94 thousand downloads last month - 300 stars on GitHub - 1 maintainer
count-tokens 0.7.2
Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI.
8 versions - Latest release: 3 months ago - 6.63 thousand downloads last month - 6 stars on GitHub - 1 maintainer
tivars 0.9.2
A library for interacting with TI-(e)z80 (82/83/84 series) calculator files
10 versions - Latest release: 4 months ago - 454 downloads last month - 19 stars on GitHub - 1 maintainer
Top 2.9% on pypi.org
sudachipy 0.6.10 💰
Python version of Sudachi, the Japanese Morphological Analyzer
41 versions - Latest release: 3 months ago - 23 dependent packages - 74 dependent repositories - 1.18 million downloads last month - 273 stars on GitHub - 2 maintainers
aymara 0.4.1
Python bindings to the LIMA linguistic analyzer
23 versions - Latest release: over 2 years ago - 1 dependent repository - 777 downloads last month - 103 stars on GitHub - 1 maintainer
tokviz 0.1
Library for visualizing tokenization patterns across different language models
1 version - Latest release: about 1 year ago - 57 downloads last month - 10 stars on GitHub - 1 maintainer
Top 2.4% on pypi.org
zhon 2.1.1
Zhon provides constants used in Chinese text processing.
15 versions - Latest release: 5 months ago - 7 dependent packages - 159 dependent repositories - 118 thousand downloads last month - 370 stars on GitHub - 1 maintainer
llama-tokens 0.0.3
A Quick Library with Llama 3.1/3.2 Tokenization - source https://github.com/jeffxtang/llama-tokens
3 versions - Latest release: 5 months ago - 175 downloads last month - 1 star on GitHub - 1 maintainer
xontrib-output-search 0.6.5 💰
Get identifiers, names, paths, URLs and words from the previous command output and use them for t...
13 versions - Latest release: about 1 year ago - 1 dependent package - 5 dependent repositories - 480 downloads last month - 44 stars on GitHub - 1 maintainer
python-ucto 0.6.9
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost a...
24 versions - Latest release: 4 months ago - 1 dependent package - 4 dependent repositories - 3.59 thousand downloads last month - 29 stars on GitHub - 1 maintainer
Top 5.9% on pypi.org
attacut 1.0.6
Fast and Reasonably Accurate Word Tokenizer for Thai
17 versions - Latest release: over 5 years ago - 1 dependent package - 11 dependent repositories - 4.28 thousand downloads last month - 85 stars on GitHub - 1 maintainer
pymmseg 1.2.0
pyMMSeg-cpp, a high performance Chinese word segmentation utility.
1 version - Latest release: about 12 years ago - 4 dependent repositories - 23 downloads last month - 189 stars on GitHub - 1 maintainer
pithy 0.0.13
Pithy is a collection of utility libraries for Python 3.
11 versions - Latest release: almost 5 years ago - 8 dependent repositories - 247 downloads last month - 5 stars on GitHub - 1 maintainer
tolkien 0.0.1
Token class for lexers and parsers.
1 version - Latest release: over 5 years ago - 1 dependent repository - 33 downloads last month - 5 stars on GitHub - 1 maintainer
spag 1.0.0a0
A module containing scanner (regular expression) and parser (BNF) compilers as well as a base gen...
1 version - Latest release: over 6 years ago - 1 dependent repository - 49 downloads last month - 8 stars on GitHub - 1 maintainer
hangul-korean 1.0rc2
Word segmentation for the Korean Language
2 versions - Latest release: about 4 years ago - 104 downloads last month - 1 maintainer
biosaic 0.0.7
Tokenizer for encoding/decoding DNA & amino acid sequences
2 versions - Latest release: 6 days ago - 0 stars on GitHub - 1 maintainer
nlpannotator 1.0.6
Annotator combining different NLP pipelines
7 versions - Latest release: over 1 year ago - 209 downloads last month - 0 stars on GitHub - 1 maintainer
ipa-core 0.1.3
NLP Preprocessing Pipeline Wrappers
4 versions - Latest release: almost 2 years ago - 156 downloads last month - 11 stars on GitHub - 1 maintainer
bpeasy 0.1.5
Fast bare-bones BPE for modern tokenizer training
6 versions - Latest release: 17 days ago - 6.78 thousand downloads last month - 152 stars on GitHub - 1 maintainer
miditok-for-musiclang 0.0.1
A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies
1 version - Latest release: over 1 year ago - 1 dependent package - 23 downloads last month - 1 star on GitHub - 1 maintainer
Top 5.9% on pypi.org
miditok 3.0.5
MIDI / symbolic music tokenizers for Deep Learning models.
65 versions - Latest release: 2 months ago - 2 dependent packages - 2 dependent repositories - 3.93 thousand downloads last month - 758 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
simplemma 1.1.2
A lightweight toolkit for multilingual lemmatization and language detection.
18 versions - Latest release: 5 months ago - 6 dependent packages - 25 dependent repositories - 16.8 thousand downloads last month - 154 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
spacy-streamlit 1.0.6
Visualize spaCy with streamlit
17 versions - Latest release: almost 2 years ago - 68 dependent repositories - 8.41 thousand downloads last month - 831 stars on GitHub - 2 maintainers
nlpbrl 1.0.1
NLP algorithm integration package
5 versions - Latest release: about 2 years ago - 137 downloads last month - 0 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
trankit 1.1.2
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
12 versions - Latest release: 6 months ago - 2 dependent packages - 6 dependent repositories - 1.98 thousand downloads last month - 749 stars on GitHub - 1 maintainer
mamba-safe 1.0.1
A framework to generate molecules with the mamba architecture
2 versions - Latest release: 8 months ago - 93 downloads last month - 2 stars on GitHub - 1 maintainer
nepalikit 1.0.2
A Nepali language processing library
3 versions - Latest release: 9 months ago - 181 downloads last month - 7 stars on GitHub - 1 maintainer
pytokencounter 1.7.0
A Python library for tokenizing text and counting tokens using various encoding schemes.
16 versions - Latest release: about 1 month ago - 667 downloads last month - 2 stars on GitHub - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: about 1 month ago - 1.91 thousand downloads last month - 1 star on GitHub - 1 maintainer
datesearch 0.0.1
Search for date-related tokens using regular expressions
1 version - Latest release: over 4 years ago - 1 dependent repository - 47 downloads last month - 0 stars on GitHub - 1 maintainer
ponrawee-ssg 0.0.8
Thai syllable segmentation using Conditional Random Fields
1 version - Latest release: over 3 years ago - 1 dependent repository - 48 downloads last month - 27 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
match 0.3.2
Match tokenized words and phrases within the original, untokenized, often messy, text.
6 versions - Latest release: over 2 years ago - 10 dependent repositories - 2.56 thousand downloads last month - 19 stars on GitHub - 3 maintainers
rftokenizer 2.3.2
A character-wise tokenizer for morphologically rich languages
8 versions - Latest release: about 1 month ago - 1 dependent repository - 295 downloads last month - 27 stars on GitHub - 1 maintainer
maze-dataset 1.3.2
Generating and working with datasets of mazes
24 versions - Latest release: 9 days ago - 1.22 thousand downloads last month - 5 stars on GitHub - 1 maintainer
tokenization-scorer 1.1.8
Package for evaluating text tokenizations.
12 versions - Latest release: 3 months ago - 447 downloads last month - 39 stars on GitHub - 1 maintainer
quickbpe 1.8.6
A fast BPE implementation in C
14 versions - Latest release: 4 months ago - 335 downloads last month - 6 stars on GitHub - 1 maintainer
huspacy-nightly 0.11.0.dev261 💰
HuSpaCy: industrial strength Hungarian natural language processing
126 versions - Latest release: over 1 year ago - 1 dependent repository - 2.25 thousand downloads last month - 155 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
huspacy 0.12.1 💰
HuSpaCy: industrial strength Hungarian natural language processing
23 versions - Latest release: 6 months ago - 1 dependent package - 6 dependent repositories - 2.19 thousand downloads last month - 142 stars on GitHub - 1 maintainer
tokenization-layer 0.0.2
An NLP tokenization algorithm that is a trainable layer for neural networks.
2 versions - Latest release: over 3 years ago - 1 dependent repository - 94 downloads last month - 2 stars on GitHub - 1 maintainer
alphacodings 0.2.0
base26 ([A-Z]) and base52 ([A-Za-z]) encodings
2 versions - Latest release: 4 months ago - 106 downloads last month - 1,044 stars on GitHub - 1 maintainer
hebpipe 4.0.0.0
A pipeline for Hebrew NLP
15 versions - Latest release: about 1 month ago - 1 dependent repository - 381 downloads last month - 36 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
ssg 0.0.8
Thai syllable segmentation using Conditional Random Fields
6 versions - Latest release: over 3 years ago - 1 dependent package - 16 dependent repositories - 4.12 thousand downloads last month - 27 stars on GitHub - 1 maintainer
llmaestro 0.1.0
A system for orchestrating LLM tasks that exceed context limits
1 version - Latest release: 2 months ago - 66 downloads last month - 0 stars on GitHub - 1 maintainer
hanzinlp 0.1.0
An NLP package specifically for Chinese
1 version - Latest release: over 1 year ago - 63 downloads last month - 24 stars on GitHub - 1 maintainer
overtokenizer 0.2.0
Unicode-based language-agnostic (over-) tokenizer.
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 75 downloads last month - 1 maintainer
vtext 0.2.0
Natural Language Processing in Rust with Python bindings
4 versions - Latest release: almost 5 years ago - 4 dependent repositories - 300 downloads last month - 150 stars on GitHub - 1 maintainer
Top 4.5% on pypi.org
ekphrasis 0.5.4
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekph...
54 versions - Latest release: almost 3 years ago - 48 dependent repositories - 3.96 thousand downloads last month - 666 stars on GitHub - 1 maintainer
nupunkt 0.5.1
Next-generation Punkt sentence and paragraph boundary detection with zero dependencies
5 versions - Latest release: 13 days ago - 318 downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.8% on pypi.org
icetk 0.0.7
A unified tokenization tool for Images, Chinese and English.
7 versions - Latest release: about 2 years ago - 18 dependent packages - 552 dependent repositories - 14.4 thousand downloads last month - 151 stars on GitHub - 1 maintainer
handict 0.2.0 💰
Yet another word segmentation tool.
3 versions - Latest release: about 5 years ago - 1 dependent repository - 117 downloads last month - 1 star on GitHub - 1 maintainer
example990420 1.1.2
Taiwanese Hokkien Transliterator and Tokeniser
9 versions - Latest release: 11 months ago - 342 downloads last month - 10 stars on GitHub - 1 maintainer
Top 7.0% on pypi.org
rosette-api 1.31.0
Babel Street Analytics API Python client SDK
35 versions - Latest release: 5 months ago - 3 dependent repositories - 1.11 thousand downloads last month - 38 stars on GitHub - 3 maintainers
tokencost 0.1.20
To calculate token and translated USD cost of string and message calls to OpenAI, for example whe...
25 versions - Latest release: 20 days ago - 1 dependent package - 28.3 thousand downloads last month - 190 stars on GitHub - 3 maintainers
Top 3.2% on pypi.org
razdel 0.5.0
Splits Russian text into tokens, sentences, and sections. Rule-based.
5 versions - Latest release: about 5 years ago - 7 dependent packages - 105 dependent repositories - 27.2 thousand downloads last month - 261 stars on GitHub - 1 maintainer
lughaatnlp 1.3.1
A Python package for natural language processing tasks for the Urdu language, including normaliza...
8 versions - Latest release: 4 months ago - 219 downloads last month - 6 stars on GitHub - 1 maintainer
ts-tokenizer 0.1.19
TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for token...
20 versions - Latest release: 3 months ago - 450 downloads last month - 1 star on GitHub - 1 maintainer
Top 2.2% on pypi.org
youtokentome 1.0.6
Unsupervised text tokenizer focused on computational efficiency
8 versions - Latest release: about 5 years ago - 8 dependent packages - 228 dependent repositories - 59.7 thousand downloads last month - 966 stars on GitHub - 3 maintainers
sic 1.3.3
Utility for string normalization
15 versions - Latest release: over 3 years ago - 4 dependent repositories - 710 downloads last month - 2 stars on GitHub - 1 maintainer
quebra-frases 0.3.7
quebra_frases chunks strings into byte sized pieces
12 versions - Latest release: almost 4 years ago - 4 dependent packages - 2 dependent repositories - 15.8 thousand downloads last month - 1 star on GitHub - 2 maintainers
taibun 1.1.7
Taiwanese Hokkien Transliterator and Tokeniser
14 versions - Latest release: 8 months ago - 305 downloads last month - 10 stars on GitHub - 1 maintainer
kl3m-data-client 0.1.2
Client for interacting with KL3M data stored in S3
1 version - Latest release: 21 days ago - 1 maintainer
mytokenize 0.1.1
Comprehensive tokenization library for Myanmar language
3 versions - Latest release: 5 months ago - 163 downloads last month - 3 stars on GitHub - 1 maintainer
beanstream 1.0.1
Beanstream SDK library for processing credit card payments.
5 versions - Latest release: almost 9 years ago - 3 dependent repositories - 930 downloads last month - 8 stars on GitHub - 1 maintainer
nlpcube 0.3.1.2
Natural Language Processing Toolkit with support for tokenization, sentence splitting, lemmatizat...
22 versions - Latest release: almost 2 years ago - 1 dependent repository - 1.37 thousand downloads last month - 556 stars on GitHub - 4 maintainers
grigora 0.0.3
Optimised implementation of common deep learning preprocessing utilities.
3 versions - Latest release: almost 6 years ago - 1 dependent repository - 108 downloads last month - 2 stars on GitHub - 1 maintainer
Top 9.8% on pypi.org
tokenmonster 1.1.12
Tokenize and decode text with TokenMonster vocabularies.
15 versions - Latest release: over 1 year ago - 2 dependent packages - 1 dependent repository - 1.49 thousand downloads last month - 567 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
spacy-wheel 3.5.0
Reupload of SpaCy 3.4.4 with Global Wheel
1 version - Latest release: about 2 years ago - 25,021 stars on GitHub - 1 maintainer
reason 1.0.7
Natural language processing toolbox
19 versions - Latest release: over 1 year ago - 9 dependent repositories - 834 downloads last month - 3 stars on GitHub - 1 maintainer
b-labs-models 2017.8.22
Ready to use CRFSuite models for sentence segmentation, tokenization and so on
3 versions - Latest release: over 7 years ago - 10 dependent repositories - 84 downloads last month - 15 stars on GitHub - 1 maintainer
charformer-pytorch 0.0.4
Charformer - Pytorch
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 208 downloads last month - 117 stars on GitHub - 1 maintainer
pyfpe 0.10.3
Python FPE- Does Format preserving Encryption of values
3 versions - Latest release: over 3 years ago - 1 dependent repository - 2.99 thousand downloads last month - 2 stars on GitHub - 1 maintainer
naivenlp 0.0.9
NLP toolkit, including tokenization, sequence tagging, etc.
9 versions - Latest release: over 4 years ago - 1 dependent repository - 314 downloads last month - 2 stars on GitHub - 1 maintainer
ud-toolkit 0.0.2
NLP toolkit built around UDPipe.
2 versions - Latest release: over 6 years ago - 1 dependent repository - 71 downloads last month - 4 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
ff3 1.0.2
Format Preserving Encryption (FPE) with FF3
7 versions - Latest release: about 1 year ago - 3 dependent packages - 3 dependent repositories - 202 thousand downloads last month - 101 stars on GitHub - 1 maintainer
epub-conversion 1.0.15
Python package for converting xml and epubs to text files
14 versions - Latest release: about 5 years ago - 5 dependent repositories - 767 downloads last month - 34 stars on GitHub - 1 maintainer
wikipedia-ner 0.0.24
Python package for creating labeled examples from wiki dumps
22 versions - Latest release: about 10 years ago - 3 dependent repositories - 290 downloads last month - 67 stars on GitHub - 1 maintainer
ciseau 1.0.1
Word and sentence tokenization.
2 versions - Latest release: over 7 years ago - 8 dependent repositories - 514 downloads last month - 12 stars on GitHub - 1 maintainer
sept 0.4.2
The Simple Extensible Path Template (sept) is a simple to configure templating system designed at...
6 versions - Latest release: over 3 years ago - 1 dependent repository - 205 downloads last month - 8 stars on GitHub - 1 maintainer
anyks-lm 3.5.0 💰
Smart language model
43 versions - Latest release: over 2 years ago - 1 dependent repository - 1.34 thousand downloads last month - 46 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
mosestokenizer 1.2.1
Wrappers for several pre-processing scripts from the Moses toolkit.
5 versions - Latest release: over 3 years ago - 9 dependent packages - 74 dependent repositories - 19.6 thousand downloads last month - 20 stars on GitHub - 1 maintainer
Related Keywords
nlp (56)
tokenizer (33)
python (31)
natural-language-processing (29)
machine-learning (21)
spacy (14)
nlp-library (14)
ai (13)
NLP (12)
lemmatization (12)
deep-learning (12)
named-entity-recognition (12)
text-processing (10)
text-classification (10)
artificial-intelligence (10)
tokenize (9)
data-science (9)
text (9)
llm (8)
neural-networks (8)
neural-network (8)
morphological-analysis (7)
segmentation (7)
cython (7)
transformer (7)
tagging (7)
entity-linking (7)
token (6)
parsing (6)
universal-dependencies (5)
natural language processing (5)
multilingual (5)
dependency-parsing (5)
language (5)
transformers (5)
text-mining (4)
ner (4)
data (4)
learning (4)
embeddings (4)
deep learning (4)
openai (4)
information-extraction (4)
language-model (4)
console (4)
tokenisation (4)
parser (4)
preprocessing (4)
bpe (4)
unicode (4)
pytorch (4)
linguistics (3)
processing (3)
chinese (3)
sentence-segmentation (3)
hacktoberfest (3)
text processing (3)
language processing (3)
sentiment-analysis (3)
sentence boundary detection (3)
tf-idf (3)
tiktoken (3)
lexer (3)
stanza (3)
sentence (3)
language-detection (3)
pos-tagging (3)
part-of-speech-tagger (3)
chatbot (3)
artificial intelligence (3)
cpp (3)
zhuyin (3)
machine-translation (3)
XML (3)
rust (3)
augmentation (3)
tokens (3)
shell (3)
morphology (3)
terminal (3)
tokeniser (3)
large-language-models (3)
language-models (3)
gpt (3)
natural (3)
text-analysis (3)
information-retrieval (2)
romanisation (2)
poj (2)
transliterator (2)
taiwan (2)
transliteration (2)
taiwanese (2)
romanization (2)
hokkien (2)
taigi (2)
cli (2)
sentence-splitting (2)
mmseg (2)
word (2)