Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "deduplication" keyword
Top 8.7% on pypi.org
lieu 1.1.1
Dedupe addresses and venues around the world with libpostal
1 version - Latest release: over 2 years ago - 2 dependent repositories - 903 downloads last month - 79 stars on GitHub - 2 maintainers
dupandas 0.3.2
python package to deduplicate text data in pandas dataframe using flexible string matching and cl...
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 26 downloads last month - 25 stars on GitHub - 1 maintainer
haychecker 0.0.1
a small library to check for data quality, either with spark or pandas
1 version - Latest release: almost 6 years ago - 1 dependent repository - 6 downloads last month - 2 stars on GitHub - 1 maintainer
rdfhash 0.4.6
De-duplicate RDF triples w/ a SPARQL query. Subjects taken from SELECT are replaced by the hash o...
16 versions - Latest release: 12 months ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
borgbackup 1.2.8 💰
Deduplicated, encrypted, authenticated and compressed backups
102 versions - Latest release: 2 months ago - 3 dependent packages - 83 dependent repositories - 12 thousand downloads last month - 10,494 stars on GitHub - 1 maintainer
hashget-kernel-org 0.9
kernel.org (linux kernel sources) plugin for hashget
7 versions - Latest release: about 3 years ago - 1 dependent package - 1 dependent repository - 19 downloads last month - 7 stars on GitHub - 1 maintainer
hashget 0.91.1
deduplication tool for archiving data with extremely high ratio
78 versions - Latest release: over 5 years ago - 1 dependent repository - 106 downloads last month - 7 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
fastcdc 1.5.0 💰
FastCDC (content defined chunking) in pure Python.
9 versions - Latest release: about 1 year ago - 2 dependent packages - 2 dependent repositories - 1.96 thousand downloads last month - 41 stars on GitHub - 1 maintainer
olass 0.0.3
OneFlorida Linkage Submission System
3 versions - Latest release: almost 8 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
imgdup 1.4
Visual similarity image finder and cleaner (image deduplication tool)
6 versions - Latest release: about 9 years ago - 2 dependent repositories - 17 downloads last month - 18 stars on GitHub - 1 maintainer
imagedupes 1.2.9
Python 3 CLI application for finding visually similar images
13 versions - Latest release: over 6 years ago - 1 dependent repository - 45 downloads last month - 0 stars on GitHub - 1 maintainer
sdhash 0.0.4
Library for image hashing and deduplication.
4 versions - Latest release: over 8 years ago - 2 dependent packages - 2 dependent repositories - 33 downloads last month - 11 stars on GitHub - 1 maintainer
oc-graphenricher 0.2.5
A tool to enrich any OCDM compliant Knowledge Graph, finding new identifiers and deduplicating en...
7 versions - Latest release: 8 months ago - 1 dependent repository - 18 downloads last month - 10 stars on GitHub - 1 maintainer
news-extract 1.0.2
news_extract
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 9 downloads last month - 36 stars on GitHub - 1 maintainer
pylibpostal 1.0.0
Parse street addresses around the world
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 345 downloads last month - 3,942 stars on GitHub - 1 maintainer
deduper 0.0.7
OneFlorida De-duplication Software
9 versions - Latest release: about 6 years ago - 1 dependent repository - 40 downloads last month - 12 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
splink 3.9.14
Fast probabilistic data linkage at scale
131 versions - Latest release: 3 months ago - 3 dependent packages - 4 dependent repositories - 114 thousand downloads last month - 1,072 stars on GitHub - 4 maintainers
Top 5.1% on pypi.org
nomenklatura 3.10.6 💰
Make record linkages in followthemoney data.
136 versions - Latest release: 3 months ago - 5 dependent packages - 4 dependent repositories - 5.82 thousand downloads last month - 183 stars on GitHub - 2 maintainers
Top 5.4% on pypi.org
rltk 2.0.0a20
Record Linkage ToolKit
20 versions - Latest release: over 2 years ago - 4 dependent packages - 9 dependent repositories - 248 downloads last month - 103 stars on GitHub - 1 maintainer
vcardz-data 0.9.2
Python 3 vCard and deduplication
2 versions - Latest release: over 6 years ago - 3 dependent repositories - 20 downloads last month - 2 stars on GitHub - 1 maintainer
chunksum 0.6.0
Print FastCDC rolling hash chunks and checksums.
7 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 207 downloads last month - 0 stars on GitHub - 1 maintainer
borg-qt 2019.5.30
A graphical frontend for BorgBackup.
2 versions - Latest release: about 5 years ago - 36 downloads last month - 16 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
recordlinkage 0.13.2 💰
A record linkage toolkit for linking and deduplication
23 versions - Latest release: about 5 years ago - 6 dependent packages - 59 dependent repositories - 2.13 million downloads last month - 916 stars on GitHub - 1 maintainer
removedup 1.0.6
Remove duplicates from parallel corpora
7 versions - Latest release: 5 months ago - 1.52 thousand downloads last month - 5 stars on GitHub - 1 maintainer
er-evaluation 2.3.0 💰
An End-to-End Evaluation Framework for Entity Resolution Systems.
9 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 31 downloads last month - 9 stars on GitHub - 1 maintainer
npbackup 2.2.1
One fits all solution for deduplicated and compressed backups on servers and laptops
7 versions - Latest release: 9 months ago - 39 downloads last month - 117 stars on GitHub - 1 maintainer
chunkdup 0.5.0
Find (partial content) duplicate files.
10 versions - Latest release: over 1 year ago - 232 downloads last month - 1 star on GitHub - 1 maintainer
inoutlists
inoutlists is a python package to parse and normalize different sources of lists (OFAC, EU, UN, e...
3 versions - 400 downloads last month - 1 maintainer
cir-duplicate-detector 0.2.0
PDQ hash and URL duplicate detector. Developed by Sam Sweere from BigData Republic as part of thei...
10 versions - Latest release: 3 months ago - 91 downloads last month - 2 stars on GitHub - 1 maintainer
mail-deduplicate 7.3.0 💰
📧 CLI to deduplicate mails from mail boxes.
18 versions - Latest release: 7 months ago - 1 dependent repository - 75 downloads last month - 159 stars on GitHub - 1 maintainer
dedup-me 0.2.0
Deduplicate concurrent function calls.
2 versions - Latest release: over 1 year ago - 11 downloads last month - 1 star on GitHub - 1 maintainer
pyjedai 0.1.7
An open-source library that builds powerful end-to-end Entity Resolution workflows.
16 versions - Latest release: about 2 months ago - 211 downloads last month - 63 stars on GitHub - 2 maintainers
maildir-deduplicate 2.2.0 💰
Deduplicate mails from a set of maildir folders.
11 versions - Latest release: almost 4 years ago - 2 dependent repositories - 104 downloads last month - 159 stars on GitHub - 1 maintainer
replicat 1.4.1
Configurable and lightweight backup utility with deduplication and encryption.
13 versions - Latest release: 8 months ago - 1 dependent repository - 256 downloads last month - 5 stars on GitHub - 1 maintainer
redis-message-queue 0.8.0
Python message queuing with Redis and message deduplication
11 versions - Latest release: 5 months ago - 223 downloads last month - 2 stars on GitHub - 1 maintainer
Top 8.8% on pypi.org
zingg 0.4.0
Zingg Entity Resolution, Data Mastering and Deduplication
3 versions - Latest release: 5 months ago - 1 dependent repository - 1.27 thousand downloads last month - 900 stars on GitHub - 1 maintainer
py-image-dedup 2.0.0 💰
A library to find duplicate images and delete unwanted ones
2 versions - Latest release: over 3 years ago - 1 dependent repository - 57 downloads last month - 150 stars on GitHub - 1 maintainer
mismo 0.1.0
The SQL/Ibis powered sklearn of record linkage
1 version - Latest release: 12 months ago - 6 downloads last month - 11 stars on GitHub - 1 maintainer
Top 3.5% on pypi.org
fingerprints 1.2.3
A library to generate entity fingerprints.
30 versions - Latest release: 8 months ago - 12 dependent packages - 45 dependent repositories - 30.1 thousand downloads last month - 139 stars on GitHub - 5 maintainers
unisim 0.0.2
UniSim: Universal Similarity
3 versions - Latest release: 25 days ago - 1 dependent package - 2.18 thousand downloads last month - 81 stars on GitHub - 1 maintainer
marty 1
An efficient backup tool inspired by Git, saving your bandwidth and providing global deduplicatio...
1 version - Latest release: 10 months ago - 2 dependent repositories - 12 stars on GitHub - 1 maintainer
pydupes 0.6.1
A duplicate file finder that may be faster in environments with millions of files and terabytes o...
10 versions - Latest release: over 2 years ago - 1 dependent repository - 61 downloads last month - 3 stars on GitHub - 1 maintainer
deduplication 0.0.3
Remove duplicate documents via popular algorithms such as SimHash, SpotSig, Shingling, etc.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 31 downloads last month - 16 stars on GitHub - 1 maintainer
eche 0.2.1
Little helper for handling entity clusters
3 versions - Latest release: 3 months ago - 1 dependent package - 38 downloads last month - 1 star on GitHub - 1 maintainer
entity-embed 0.0.6
Transform entities like companies, products, etc. into vectors to support scalable Record Linkage...
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 58 downloads last month - 139 stars on GitHub - 1 maintainer
bedup 0.10.1
Deduplication for Btrfs filesystems
11 versions - Latest release: about 8 years ago - 2 dependent repositories - 55 downloads last month - 324 stars on GitHub - 1 maintainer
deduplicationdict 1.0.4
A dictionary that de-duplicates values.
7 versions - Latest release: 11 months ago - 53 downloads last month - 1 star on GitHub - 1 maintainer
dbretina 2.2.11
DBRetina Python Package
21 versions - Latest release: 11 months ago - 1 dependent repository - 131 downloads last month - 1 star on GitHub - 1 maintainer
benji 0.17.0
A block based deduplicating backup software for Ceph RBD, image files and devices
22 versions - Latest release: over 1 year ago - 3 dependent repositories - 265 downloads last month - 136 stars on GitHub - 1 maintainer
deduplipy 0.7.10
End-to-end deduplication solution
23 versions - Latest release: about 1 year ago - 1 dependent repository - 372 downloads last month - 71 stars on GitHub - 1 maintainer
pnu-dcmp 1.0.1
compare two directories
2 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 35 downloads last month - 0 stars on GitHub - 1 maintainer
atbu-pkg 0.0.38
ATBU package supports local/cloud backup/restore as well as local file integrity diff tool for he...
37 versions - Latest release: 7 months ago - 1 dependent repository - 176 downloads last month - 1 star on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 73 downloads last month - 42 stars on GitHub - 1 maintainer
qlink 0.1a1
Entity Resolution and Record Linkage library
1 version - Latest release: almost 7 years ago - 1 dependent repository - 20 downloads last month - 7 stars on GitHub - 1 maintainer
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: about 7 years ago - 1 dependent repository - 26 downloads last month - 42 stars on GitHub - 1 maintainer
narrow-down 1.1.0
Fast fuzzy text search
18 versions - Latest release: about 1 year ago - 1 dependent repository - 187 downloads last month - 9 stars on GitHub - 1 maintainer
deduplicator 1.0.0
Rapid file deduplication utility for Unix systems
1 version - Latest release: over 6 years ago - 1 dependent repository - 15 downloads last month - 3 stars on GitHub - 1 maintainer
ded 0.1.4
Helm dependency deduplication post-renderer
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 51 downloads last month - 0 stars on GitHub - 1 maintainer
contentcopy 1.1.0
Merge directory contents, deduplicating files based on their content.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 25 downloads last month - 2 stars on GitHub - 1 maintainer
photodedup 0.2.1
A simple photo deduplication tool written in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 20 downloads last month - 3 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
co-deduplicate 1.0.2 (removed)
conditor bibliographic record deduplication package
6 versions - Latest release: over 2 years ago - 50 downloads last month
Related Keywords
python: 21
record-linkage: 14
entity-resolution: 13
backup: 10
dedup: 7
dedupe: 7
record linkage: 6
entity resolution: 6
fuzzy-matching: 6
compression: 5
data-matching: 4
data-science: 4
duplicate-detection: 4
cli: 4
deduplicate: 3
image: 3
restic: 3
duplicate: 3
spark: 3
similarity: 3
machine-learning: 3
python3: 3
nlp: 3
mbox: 2
cleanup: 2
mailbox: 2
filesystem: 2
babyl: 2
duplicates: 2
mh: 2
mmdf: 2
photo: 2
maildir: 2
email: 2
mail: 2
shell: 2
CLI: 2
rolling-hash: 2
fastcdc: 2
content-defined-chunking: 2
duckdb: 2
record: 2
clustering: 2
mapping: 2
identity: 2
linkage: 2
pandas: 2
s3: 2
borgbackup: 2
c: 2
international: 2
postgresql: 2
encryption: 2
python-3: 2
database: 2
archive: 2
data-cleaning: 2
chunking: 2
entity-matching: 2
address: 2
duplication: 1
files: 1
git: 1
marty: 1
vector-search: 1
analytics-engineering: 1
tensorflow2: 1
analytics: 1
data-transformation: 1
identity resolution: 1
data mastering: 1
algorithms: 1
cv: 1
google: 1
imagehash: 1
Entity Resolution: 1
shingling: 1
simhash: 1
connected components: 1
fft: 1
transitive closure: 1
connected-components: 1
transitive-closure: 1
image-comparison: 1
image-analysis: 1
hacktoberfest: 1
find-duplicates: 1
duplicate-images: 1
ibis: 1
sql: 1
names: 1
people: 1
companies: 1
modern-data-stack: 1
ml: 1
masterdata: 1
normalisation: 1
identity-resolution: 1
iso20275: 1
fuzzymatch: 1