Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "deduplication" keyword

Top 8.7% on pypi.org
lieu 1.1.1
Dedupe addresses and venues around the world with libpostal
1 version - Latest release: over 2 years ago - 2 dependent repositories - 903 downloads last month - 79 stars on GitHub - 2 maintainers
dupandas 0.3.2
python package to deduplicate text data in pandas dataframe using flexible string matching and cl...
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 26 downloads last month - 25 stars on GitHub - 1 maintainer
haychecker 0.0.1
a small library to check for data quality, either with spark or pandas
1 version - Latest release: almost 6 years ago - 1 dependent repository - 6 downloads last month - 2 stars on GitHub - 1 maintainer
rdfhash 0.4.6
De-duplicate RDF triples w/ a SPARQL query. Subjects taken from SELECT are replaced by the hash o...
16 versions - Latest release: 12 months ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
borgbackup 1.2.8 💰
Deduplicated, encrypted, authenticated and compressed backups
102 versions - Latest release: 2 months ago - 3 dependent packages - 83 dependent repositories - 12 thousand downloads last month - 10,494 stars on GitHub - 1 maintainer
hashget-kernel-org 0.9
kernel.org (linux kernel sources) plugin for hashget
7 versions - Latest release: about 3 years ago - 1 dependent package - 1 dependent repository - 19 downloads last month - 7 stars on GitHub - 1 maintainer
hashget 0.91.1
deduplication tool for archiving data with extremely high ratio
78 versions - Latest release: over 5 years ago - 1 dependent repository - 106 downloads last month - 7 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
fastcdc 1.5.0 💰
FastCDC (content defined chunking) in pure Python.
9 versions - Latest release: about 1 year ago - 2 dependent packages - 2 dependent repositories - 1.96 thousand downloads last month - 41 stars on GitHub - 1 maintainer
olass 0.0.3
OneFlorida Linkage Submission System
3 versions - Latest release: almost 8 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
imgdup 1.4
Visual similarity image finder and cleaner (image deduplication tool)
6 versions - Latest release: about 9 years ago - 2 dependent repositories - 17 downloads last month - 18 stars on GitHub - 1 maintainer
imagedupes 1.2.9
Python 3 CLI application for finding visually similar images
13 versions - Latest release: over 6 years ago - 1 dependent repository - 45 downloads last month - 0 stars on GitHub - 1 maintainer
sdhash 0.0.4
Library for image hashing and deduplication.
4 versions - Latest release: over 8 years ago - 2 dependent packages - 2 dependent repositories - 33 downloads last month - 11 stars on GitHub - 1 maintainer
oc-graphenricher 0.2.5
A tool to enrich any OCDM compliant Knowledge Graph, finding new identifiers and deduplicating en...
7 versions - Latest release: 8 months ago - 1 dependent repository - 18 downloads last month - 10 stars on GitHub - 1 maintainer
news-extract 1.0.2
news_extract
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 9 downloads last month - 36 stars on GitHub - 1 maintainer
pylibpostal 1.0.0
Parse street addresses around the world
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 345 downloads last month - 3,942 stars on GitHub - 1 maintainer
deduper 0.0.7
OneFlorida De-duplication Software
9 versions - Latest release: about 6 years ago - 1 dependent repository - 40 downloads last month - 12 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
splink 3.9.14
Fast probabilistic data linkage at scale
131 versions - Latest release: 3 months ago - 3 dependent packages - 4 dependent repositories - 114 thousand downloads last month - 1,072 stars on GitHub - 4 maintainers
Top 5.1% on pypi.org
nomenklatura 3.10.6 💰
Make record linkages in followthemoney data.
136 versions - Latest release: 3 months ago - 5 dependent packages - 4 dependent repositories - 5.82 thousand downloads last month - 183 stars on GitHub - 2 maintainers
Top 5.4% on pypi.org
rltk 2.0.0a20
Record Linkage ToolKit
20 versions - Latest release: over 2 years ago - 4 dependent packages - 9 dependent repositories - 248 downloads last month - 103 stars on GitHub - 1 maintainer
vcardz-data 0.9.2
Python 3 vCard and deduplication
2 versions - Latest release: over 6 years ago - 3 dependent repositories - 20 downloads last month - 2 stars on GitHub - 1 maintainer
chunksum 0.6.0
Print FastCDC rolling hash chunks and checksums.
7 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 207 downloads last month - 0 stars on GitHub - 1 maintainer
borg-qt 2019.5.30
A graphical frontend for BorgBackup.
2 versions - Latest release: about 5 years ago - 36 downloads last month - 16 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
recordlinkage 0.13.2 💰
A record linkage toolkit for linking and deduplication
23 versions - Latest release: about 5 years ago - 6 dependent packages - 59 dependent repositories - 2.13 million downloads last month - 916 stars on GitHub - 1 maintainer
removedup 1.0.6
Remove duplicates from parallel corpora
7 versions - Latest release: 5 months ago - 1.52 thousand downloads last month - 5 stars on GitHub - 1 maintainer
er-evaluation 2.3.0 💰
An End-to-End Evaluation Framework for Entity Resolution Systems.
9 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 31 downloads last month - 9 stars on GitHub - 1 maintainer
npbackup 2.2.1
One fits all solution for deduplicated and compressed backups on servers and laptops
7 versions - Latest release: 9 months ago - 39 downloads last month - 117 stars on GitHub - 1 maintainer
chunkdup 0.5.0
Find (partial content) duplicate files.
10 versions - Latest release: over 1 year ago - 232 downloads last month - 1 star on GitHub - 1 maintainer
inoutlists
inoutlists is a python package to parse and normalize different sources of lists (OFAC, EU, UN, e...
3 versions - 400 downloads last month - 1 maintainer
cir-duplicate-detector 0.2.0
PDQ hash and URL duplicate detector. Developed by Sam Sweere from BigData Republic as part of thei...
10 versions - Latest release: 3 months ago - 91 downloads last month - 2 stars on GitHub - 1 maintainer
mail-deduplicate 7.3.0 💰
📧 CLI to deduplicate mails from mail boxes.
18 versions - Latest release: 7 months ago - 1 dependent repository - 75 downloads last month - 159 stars on GitHub - 1 maintainer
dedup-me 0.2.0
Deduplicate concurrent function calls.
2 versions - Latest release: over 1 year ago - 11 downloads last month - 1 star on GitHub - 1 maintainer
pyjedai 0.1.7
An open-source library that builds powerful end-to-end Entity Resolution workflows.
16 versions - Latest release: about 2 months ago - 211 downloads last month - 63 stars on GitHub - 2 maintainers
maildir-deduplicate 2.2.0 💰
Deduplicate mails from a set of maildir folders.
11 versions - Latest release: almost 4 years ago - 2 dependent repositories - 104 downloads last month - 159 stars on GitHub - 1 maintainer
replicat 1.4.1
Configurable and lightweight backup utility with deduplication and encryption.
13 versions - Latest release: 8 months ago - 1 dependent repository - 256 downloads last month - 5 stars on GitHub - 1 maintainer
redis-message-queue 0.8.0
Python message queuing with Redis and message deduplication
11 versions - Latest release: 5 months ago - 223 downloads last month - 2 stars on GitHub - 1 maintainer
Top 8.8% on pypi.org
zingg 0.4.0
Zingg Entity Resolution, Data Mastering and Deduplication
3 versions - Latest release: 5 months ago - 1 dependent repository - 1.27 thousand downloads last month - 900 stars on GitHub - 1 maintainer
py-image-dedup 2.0.0 💰
A library to find duplicate images and delete unwanted ones
2 versions - Latest release: over 3 years ago - 1 dependent repository - 57 downloads last month - 150 stars on GitHub - 1 maintainer
mismo 0.1.0
The SQL/Ibis powered sklearn of record linkage
1 version - Latest release: 12 months ago - 6 downloads last month - 11 stars on GitHub - 1 maintainer
Top 3.5% on pypi.org
fingerprints 1.2.3
A library to generate entity fingerprints.
30 versions - Latest release: 8 months ago - 12 dependent packages - 45 dependent repositories - 30.1 thousand downloads last month - 139 stars on GitHub - 5 maintainers
unisim 0.0.2
UniSim: Universal Similarity
3 versions - Latest release: 25 days ago - 1 dependent package - 2.18 thousand downloads last month - 81 stars on GitHub - 1 maintainer
marty 1
An efficient backup tool inspired by Git, saving your bandwidth and providing global deduplicatio...
1 version - Latest release: 10 months ago - 2 dependent repositories - 12 stars on GitHub - 1 maintainer
pydupes 0.6.1
A duplicate file finder that may be faster in environments with millions of files and terabytes o...
10 versions - Latest release: over 2 years ago - 1 dependent repository - 61 downloads last month - 3 stars on GitHub - 1 maintainer
deduplication 0.0.3
Remove duplicate documents via popular algorithms such as SimHash, SpotSig, Shingling, etc.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 31 downloads last month - 16 stars on GitHub - 1 maintainer
eche 0.2.1
Little helper for handling entity clusters
3 versions - Latest release: 3 months ago - 1 dependent package - 38 downloads last month - 1 star on GitHub - 1 maintainer
entity-embed 0.0.6
Transform entities like companies, products, etc. into vectors to support scalable Record Linkage...
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 58 downloads last month - 139 stars on GitHub - 1 maintainer
bedup 0.10.1
Deduplication for Btrfs filesystems
11 versions - Latest release: about 8 years ago - 2 dependent repositories - 55 downloads last month - 324 stars on GitHub - 1 maintainer
deduplicationdict 1.0.4
A dictionary that de-duplicates values.
7 versions - Latest release: 11 months ago - 53 downloads last month - 1 star on GitHub - 1 maintainer
dbretina 2.2.11
DBRetina Python Package
21 versions - Latest release: 11 months ago - 1 dependent repository - 131 downloads last month - 1 star on GitHub - 1 maintainer
benji 0.17.0
A block based deduplicating backup software for Ceph RBD, image files and devices
22 versions - Latest release: over 1 year ago - 3 dependent repositories - 265 downloads last month - 136 stars on GitHub - 1 maintainer
deduplipy 0.7.10
End-to-end deduplication solution
23 versions - Latest release: about 1 year ago - 1 dependent repository - 372 downloads last month - 71 stars on GitHub - 1 maintainer
pnu-dcmp 1.0.1
compare two directories
2 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 35 downloads last month - 0 stars on GitHub - 1 maintainer
atbu-pkg 0.0.38
ATBU package supports local/cloud backup/restore as well as local file integrity diff tool for he...
37 versions - Latest release: 7 months ago - 1 dependent repository - 176 downloads last month - 1 star on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 73 downloads last month - 42 stars on GitHub - 1 maintainer
qlink 0.1a1
Entity Resolution and Record Linkage library
1 version - Latest release: almost 7 years ago - 1 dependent repository - 20 downloads last month - 7 stars on GitHub - 1 maintainer
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: about 7 years ago - 1 dependent repository - 26 downloads last month - 42 stars on GitHub - 1 maintainer
narrow-down 1.1.0
Fast fuzzy text search
18 versions - Latest release: about 1 year ago - 1 dependent repository - 187 downloads last month - 9 stars on GitHub - 1 maintainer
deduplicator 1.0.0
Rapid file deduplication utility for Unix systems
1 version - Latest release: over 6 years ago - 1 dependent repository - 15 downloads last month - 3 stars on GitHub - 1 maintainer
ded 0.1.4
Helm dependency deduplication post-renderer
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 51 downloads last month - 0 stars on GitHub - 1 maintainer
contentcopy 1.1.0
Merge directory contents, deduplicating files based on their content.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 25 downloads last month - 2 stars on GitHub - 1 maintainer
photodedup 0.2.1
A simple photo deduplication tool written in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 20 downloads last month - 3 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
co-deduplicate 1.0.2 (removed)
Conditor bibliographic record deduplication package
6 versions - Latest release: over 2 years ago - 50 downloads last month
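
Each entry's statistics line above follows the same " - "-separated layout (counts, an optional "thousand" or "million" multiplier, and a "Latest release:" field). The following is a minimal sketch of turning one such line into structured fields. The function name `parse_stats` and the dictionary keys are hypothetical choices for illustration, not part of the Ecosyste.ms API, and the layout is assumed from the lines shown on this page only.

```python
import re

# Hypothetical helper, not part of any Ecosyste.ms API.
# Assumes the " - "-separated layout seen in the listing above.
_COUNT = re.compile(r"^([\d.,]+)\s*(thousand|million)?\s+(.+)$")
_SCALE = {None: 1, "thousand": 1_000, "million": 1_000_000}


def parse_stats(line: str) -> dict:
    """Split one statistics line into a dict of field label -> value."""
    fields = {}
    for part in line.split(" - "):
        part = part.strip()
        # The release age is free text, so keep it as a string.
        if part.startswith("Latest release:"):
            fields["latest_release"] = part.split(":", 1)[1].strip()
            continue
        match = _COUNT.match(part)
        if not match:
            continue  # Skip anything that is not "<number> <label>".
        number, scale, label = match.groups()
        value = float(number.replace(",", "")) * _SCALE[scale]
        # Labels are kept verbatim (e.g. "stars on GitHub"), so singular
        # and plural forms such as "version"/"versions" stay distinct.
        fields[label] = round(value)
    return fields


if __name__ == "__main__":
    example = (
        "102 versions - Latest release: 2 months ago - 3 dependent packages"
        " - 83 dependent repositories - 12 thousand downloads last month"
        " - 10,494 stars on GitHub - 1 maintainer"
    )
    print(parse_stats(example))
```

Rounded figures such as "12 thousand" are expanded to approximate integers, so the parsed download counts are only as precise as the page's own rounding.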