Echo Word Utility
October 29, 2014 by David T. Allen
An echo is a repeated word that breaks reader immersion. Echoes can be difficult to find in your own writing.
Echoes are a problem I often have, so I built a tool to help me discover where echoes occur. I wrote this blog post to explain how it works, and I hope it can help you, too.
How to Use the Echo Word Detector
Paste an excerpt from your story into the textarea on this page and press Detect.
Potential echoes are highlighted in gray.
Click on a highlighted word to focus on a specific echo. Click on the word again to show all potential echoes.
Red words are potential echoes at the beginning or ending of a sentence. These echoes tend to stand out more.
There is also a list of every word in the excerpt, sorted by how many times the word is used. You can click on one of those words to highlight every occurrence in your excerpt.
You shouldn’t trust the echo word utility completely, just like you shouldn’t trust every piece of advice from every critique you receive. It’s a tool. It highlights potential weaknesses; it does not tell you your writing is bad.
A few excerpts can be analyzed with the click of a button. This provides a quick, unbiased view of how the tool works, since it’s analyzing well-known works.
Feel free to paste excerpts from other stories for comparison.
Different types of words have different thresholds. You can even exclude certain word types.
Click the alter thresholds button to change thresholds. Uncheck a word type to exclude it from being highlighted. You will need to click the Detect button again for changes to appear.
How it Works
Consider the following block of text, where each word is numbered:
Meet Leslie and Dave.
1 2 3 4
Leslie and Dave like writing.
5 6 7 8 9
Leslie writes more.
10 11 12
The text is converted to a data structure where each word is a list of where it appears:
{
meet: 1
leslie: 2, 5, 10
and: 3, 6
dave: 4, 7
like: 8
writing: 9
writes: 11
more: 12
}
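As a rough illustration, the conversion above could be sketched in Python. This is not the tool's actual code: the function name `build_word_index` is hypothetical, and the rule for what counts as a word (a run of letters, digits, or apostrophes, compared case-insensitively) is an assumption.

```python
import re
from collections import defaultdict


def build_word_index(text):
    """Map each lowercased word to the 1-based positions where it appears."""
    index = defaultdict(list)
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    for position, word in enumerate(words, start=1):
        index[word.lower()].append(position)
    return dict(index)


text = "Meet Leslie and Dave. Leslie and Dave like writing. Leslie writes more."
index = build_word_index(text)
print(index["leslie"])  # [2, 5, 10]
print(index["dave"])    # [4, 7]
```

Lowercasing means "Leslie" and "leslie" count as the same word, which matches the lowercase keys in the structure above.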
Each word is then analyzed for echoes.
An echo is determined using two thresholds: distance and cluster size.
Calculating distance is done by simple subtraction. Consider leslie, which is the second, fifth, and tenth word:
distance(leslie) = { 5 - 2, 10 - 5 }
distance(leslie) = { 3, 5 }
If the distance threshold is four, then 3 is a hit because three is less than four. The 5 is a miss, since five is not less than four.
distance_threshold = 4
threshold(distance(leslie)) = { 3 < 4, 5 < 4 }
threshold(distance(leslie)) = { true, false }
Based on the distance threshold, the first two occurrences of leslie are close to each other, but the last two occurrences are not.
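The distance and threshold steps might look like this in Python. It is a minimal sketch, not the tool's implementation, and the helper names `distances` and `within_threshold` are made up for illustration.

```python
def distances(positions):
    """Gaps between each pair of neighboring occurrences."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def within_threshold(gaps, distance_threshold):
    """True where a gap is strictly less than the distance threshold."""
    return [gap < distance_threshold for gap in gaps]


leslie = [2, 5, 10]
gaps = distances(leslie)
hits = within_threshold(gaps, distance_threshold=4)
print(gaps)  # [3, 5]
print(hits)  # [True, False]
```

Note the strict comparison: a gap equal to the threshold counts as a miss, which follows the "less than" wording of the example.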
Next, the cluster threshold is applied to determine if there are too many occurrences that are close to each other.
If we continue the example, with a cluster size of two, we find one cluster:
L = threshold(distance(leslie)) = {
true, // (words 2, 5)
false // (words 5, 10)
}
cluster_threshold = 2
clusters(L) = { { 2, 5 } }
In other words, the occurrences of leslie at positions 2 and 5 are considered echoes.
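One way to express the clustering step in Python is shown here. It rests on an assumption the post does not state outright: a cluster is a run of occurrences whose neighboring gaps all fall under the distance threshold, and it only counts if it contains at least `cluster_threshold` occurrences. The function name `find_clusters` is hypothetical.

```python
def find_clusters(positions, distance_threshold, cluster_threshold):
    """Group nearby occurrences and keep the groups that are large enough."""
    clusters = []
    current = positions[:1]  # start a run with the first occurrence, if any
    for previous, position in zip(positions, positions[1:]):
        if position - previous < distance_threshold:
            current.append(position)  # close enough: extend the current run
        else:
            if len(current) >= cluster_threshold:
                clusters.append(current)
            current = [position]  # too far: start a new run
    if len(current) >= cluster_threshold:
        clusters.append(current)
    return clusters


print(find_clusters([2, 5, 10], distance_threshold=4, cluster_threshold=2))
# [[2, 5]]
```

The occurrence at position 10 ends up alone in its run, so it falls below the cluster size and is dropped.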
Let’s start over, this time with different input and thresholds:
//=======================================
// INPUT
//---------------------------------------
{
the: 1, 5, 9, 22, 41, 50
}
//=======================================
// CALCULATE DISTANCE
//---------------------------------------
distance_threshold = 10
D = distance(the) = {
4, // ( 5 - 1)
4, // ( 9 - 5)
13, // (22 - 9)
19, // (41 - 22)
9 // (50 - 41)
}
//=======================================
// ARE WORDS CLOSE TO EACH OTHER?
//---------------------------------------
T = threshold(D) = {
4 < 10, // (words 1, 5)
4 < 10, // (words 5, 9)
13 < 10, // (words 9, 22)
19 < 10, // (words 22, 41)
9 < 10 // (words 41, 50)
}
T = threshold(D) = {
true, // (words 1, 5)
true, // (words 5, 9)
false, // (words 9, 22)
false, // (words 22, 41)
true // (words 41, 50)
}
//=======================================
// ARE THERE CLUSTERS?
//---------------------------------------
cluster_threshold = 2
clusters(T) = {
{ 1, 5, 9 },
{ 41, 50 }
}
//=======================================
// SUMMARY
//---------------------------------------
There are two clusters.
The first is a cluster of three occurrences at positions 1, 5, and 9.
The second is a cluster of two occurrences at positions 41 and 50.
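For readers who want to reproduce this walkthrough, here is a small Python sketch that follows the same three steps (D, T, then clusters). The function names mirror the pseudocode, but the code is an illustration written for this post's example, not the utility's source, and it assumes a cluster needs at least `cluster_threshold` occurrences.

```python
def distance(positions):
    """Step 1: gaps between neighboring occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


def threshold(gaps, distance_threshold):
    """Step 2: flag each gap that is under the distance threshold."""
    return [gap < distance_threshold for gap in gaps]


def clusters(positions, flags, cluster_threshold):
    """Step 3: join flagged neighbors into runs and keep the large ones."""
    found = []
    current = positions[:1]
    # flags[i] says whether positions[i] and positions[i + 1] are close.
    for position, is_close in zip(positions[1:], flags):
        if is_close:
            current.append(position)
        else:
            if len(current) >= cluster_threshold:
                found.append(current)
            current = [position]
    if len(current) >= cluster_threshold:
        found.append(current)
    return found


the = [1, 5, 9, 22, 41, 50]
D = distance(the)
T = threshold(D, 10)
result = clusters(the, T, 2)
print(D)       # [4, 4, 13, 19, 9]
print(T)       # [True, True, False, False, True]
print(result)  # [[1, 5, 9], [41, 50]]
```

Position 22 is too far from both of its neighbors, so it forms a run of one and is left out of the results.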