pip install dedupe==2.0.8

A Python library for accurate and scalable data deduplication and entity resolution

Among the top 1% of packages on PyPI.
Over 633.4K downloads in the last 90 days.

Commonly used with dedupe

Based on how often these packages appear together in public requirements.txt files on GitHub.

rlr

Case weighted L2 regularized logistic regression

categorical-distance

Compare two categorical variables

affinegap

A Cython implementation of the affine gap string distance

PyLBFGS

LBFGS and OWL-QN optimization algorithms

dedupe-hcluster

Hierarchical Clustering Algorithms (Information Theory)

simplecosine

Simple cosine distance

canonicalize

Canonicalize a cluster of records

dedupe-variable-address

Address variable type for dedupe

dedupe-variable-name

Name variable type for dedupe

pyhacrf

Hidden alignment conditional random field, a discriminative string edit distance

highered

Learnable Edit Distance Using PyHacrf

parseratorvariable

Structured variable type for dedupe

csvdedupe

Command line tools for deduplicating and merging csv files

app_version

A tiny utility to get the application version from pkg_resources

hcluster

A hierarchical clustering package for Scipy.

gsconfig-py3

GeoServer REST Configuration

grabber

grabber: periodically grabs a picture of your screen

vulk

Vulk: Advanced 3D engine

arf

Advanced Recording Format for acoustic, behavioral, and physiological data

Version usage of dedupe

Proportion of downloaded versions in the last 3 months (only versions over 1%).

2.0.8

50.28%

1.10.0

33.03%