Removing Duplicate Strings Using Javascript
Solution 1:
How about this?
va = ["this is a sentence", "sentence this is", "sentence this is a"]
vb = {} // dictionary of combined sorted words in each sentence
vc = [] // output list of sentences without duplicates for (i=0;i<va.length;i++){
// split sentence, sort words, and recombine (this is a sentence => a is sentence this)var combined = va[i].split(" ").sort().join(" ");
if (!vb[combined]){ // if set of combined sorted words doesn't exist already
vc.push(va[i]); // sentence isn't duplicated, push to output list
vb[combined] = true// add set to dictionary
}
}
alert(vc.join("\n"))
Solution 2:
Use an Object as a lookup to get a quick hashtable-backed check. That means using string as your key type, which means normalising the case/ordering/etc of the words first to get a unique key for each combination of words.
// Get key for sentence, removing punctuation and normalising case and word order// eg 'Hello, a horse!' -> 'x_a hello horse'// the 'x_' prefix is to avoid clashes with any object properties with undesirable// special behaviour (like prototype properties in IE) and get a plain lookup//functiongetSentenceKey(sentence) {
var trimmed= sentence.replace(/^\s+/, '').replace(/\s+$/, '').toLowerCase();
var words= trimmed.replace(/[^\w\s]+/g, '').replace(/\s+/, ' ').split(' ');
words.sort();
return'x_'+words.join(' ');
}
var lookup= {};
for (var i= sentences.length; i-->0;) {
var key= getSentenceKey(sentences[i]);
if (key in lookup)
sentences.splice(i, 1);
else
lookup[key]= true;
}
Would need some work if you need to support non-ASCII characters (\w
doesn't play well with Unicode in JS, and the question of what constitutes a word in some languages is a difficult one). Also, is "foo bar foo" the same sentence as "bar bar foo"?
Solution 3:
Here's something to try. I didn't test its performance on large arrays, but I think it should be ok. No jQuery needed.
functionremoveDuplicates(array)
{
var new_array = [];
for(var i=0; i<array.length; i++){
// convert current sentence to sorted, lowercase stringvar sen = array[i].split(" ").sort().join(" ");
if(new_array.indexOf(sen) == -1){
// no matches, let's add it!
new_array.push(sen);
}
}
return new_array;
}
Array.prototype.indexOf = function(item, optional_start_index)
{
for(var i=optional_start_index||0; i<this.length; i++){
if(this[i] == item) return i;
}
return -1;
}
Use it like this:
var a = ["this is a name", "name is this a", "this name is a", "hello there"];
var clean_array = removeDuplicates(a);
alert(clean_array); // outputs: a is name this,hello there
Solution 4:
- JavaScript similar_text Calculates the similarity between two strings
Solution 5:
Sort the array of sentences, and then loop through it and delete an item if it is the same as the previous one:
texts.sort();
for(var i = 1; i < texts.length; i++){
if(texts[i] === texts[i-1]){
texts.splice(i,1);
i--;
}
}
I tested this in an array with 800 strings, and it seemed reasonably fast.
EDIT: Sorry, didn't read your question very carefully
Post a Comment for "Removing Duplicate Strings Using Javascript"