text-similarity-cli

Compare any two text files and get a detailed similarity score using three algorithms. Zero dependencies.

Install
$ pip install text-similarity-cli

Three algorithms, one score

Levenshtein

Counts the minimum number of single-character edits (insertions, deletions, and substitutions) needed to turn one text into the other. Best for short texts and code.

Character-level
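A minimal sketch of the character-level comparison, using the standard dynamic-programming edit distance. This page does not say how text-similarity-cli turns a distance into a percentage, so the sketch assumes the common normalization 1 - distance / length of the longer text. The function names are illustrative, not the package's API.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # Distance is symmetric, so make b the shorter string to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the first i-1 chars of a and first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # transforming a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Percentage score; the normalization here is an assumption, not the tool's documented formula."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * (1 - levenshtein_distance(a, b) / longest)


print(levenshtein_distance("kitten", "sitting"))  # → 3
```

With this normalization, "kitten" and "sitting" (distance 3, longer length 7) score about 57.1%.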

Jaccard

Compares the sets of unique words in each text. Score = size of the intersection / size of the union. Best for keyword overlap and topic similarity.

Token-level
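A sketch of the token-level score. It assumes words are lowercased runs of word characters; the tool's actual tokenizer is not documented here, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, and underscores.
    return re.findall(r"\w+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Percentage of distinct words shared: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    if not union:
        return 100.0  # both texts empty: treated as identical (assumption)
    return 100.0 * len(set_a & set_b) / len(union)


print(jaccard_similarity("the cat sat", "the cat ran"))  # → 50.0
```

"the cat sat" and "the cat ran" share 2 of 4 distinct words, hence 50%. Word order and repetition are ignored entirely, which is why this score suits keyword overlap rather than phrasing.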

Cosine

Builds a term-frequency vector for each text and measures the cosine of the angle between them. Best for longer documents and essays.

Vector-level
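A sketch of the vector-level score using raw term counts. It assumes no TF-IDF weighting or stop-word removal and reuses the same hypothetical tokenizer as the Jaccard sketch. Because counts are never negative, the result always falls between 0% and 100%.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits, and underscores.
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors, as a percentage."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Empty-text handling is an assumption: both empty = identical, one empty = 0%.
        return 100.0 if norm_a == norm_b else 0.0
    return 100.0 * dot / (norm_a * norm_b)


print(round(cosine_similarity("the cat sat", "the cat ran"), 1))  # → 66.7
```

For the same pair as the Jaccard sketch, the dot product is 2 and each norm is sqrt(3), giving 2/3. Unlike Jaccard, a word that repeats gets more weight, which is what makes this measure steadier on long documents.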

Sample output

Levenshtein 72.4%
Jaccard 68.1%
Cosine 81.3%
Average 73.9%

Verdict: Highly similar
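The Average line in the sample matches the unweighted arithmetic mean of the three scores, (72.4 + 68.1 + 81.3) / 3 ≈ 73.9. The page does not state the formula, so treat this as an inference from the sample. The cutoffs behind the verdict are not documented and are not reproduced here.

```python
def average_score(scores: dict[str, float]) -> float:
    # Unweighted mean, inferred from the sample output rather than documented.
    return sum(scores.values()) / len(scores)


sample = {"Levenshtein": 72.4, "Jaccard": 68.1, "Cosine": 81.3}
print(f"Average {average_score(sample):.1f}%")  # → Average 73.9%
```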

All flags

Flag           Default  Description
--algo         all      levenshtein, jaccard, cosine, or all
--json         off      Output results as JSON
--no-color     off      Disable ANSI color output
--threshold N  none     Exit with status 1 if the average score is below N%
--version      -        Print version and exit
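A sketch of the exit-status rule described for --threshold. It assumes "below" means strictly less than, so a score equal to N passes, and that omitting the flag (default none) always succeeds. The helper name is hypothetical. A CLI would hand the returned value to sys.exit().

```python
from typing import Optional


def threshold_exit_code(average: float, threshold: Optional[float]) -> int:
    """Exit status implied by --threshold N (sketch, not the tool's code)."""
    if threshold is None:  # flag not given
        return 0
    return 1 if average < threshold else 0


# With the sample average of 73.9%:
print(threshold_exit_code(73.9, 70))  # → 0
print(threshold_exit_code(73.9, 80))  # → 1
```

Because the result is an exit status, a shell script can branch on it directly without parsing the text or JSON output.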