Friday, July 24, 2009

Advanced Bad Word Filter in PHP

N.B. Guys, this is an old post. eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7.0, so use preg_replace() with the i (case-insensitive) modifier instead.


Filtering profanities is a necessity for most blogs, forums and shout boxes. There are plenty of scripts doing the rounds on the internet that can be used for the purpose, but I wasn't too happy with how well they worked, so I made a small addition.
In addition to the usual filtering by exact string matching, I also used PHP's metaphone() function, which implements the Metaphone phonetic algorithm. This gave me greater accuracy by also filtering out similar-sounding words even when they were misspelt. An illustration of the results I achieved with my script, followed by the script itself:

My banned word list: "hell, stupid, idiot, bullshit, shit"

My test sentence: "He was a very Stupid boy, really stooopid, as sTUpid as stupid can be. A total idiot. He was determined to drag everyone to hell. Enough bullshit...enough bulshiit...I had enough of shiit. Now you iidiots can go to hel."

Result after filtering with my script: "He was a very ***** boy, really ***** as ***** as ***** can be. A total ***** . He was determined to drag everyone to ***** . Enough ***** ...enough ***** had enough of ***** Now you i ***** s can go to *****"

Result after filtering with a normal script I found on the internet: "He was a very ****** boy, really stooopid, as ****** as ****** can be. A total *****. He was determined to drag everyone to ****. Enough ********...enough bulshiit...I had enough of shiit. Now you iidiots can go to hel."

Most of the scripts I found on the internet failed to filter the misspelt words that are still visible in the result above: stooopid, bulshiit, shiit, iidiots and hel.
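
To see why the phonetic pass catches misspellings that exact matching misses, it can help to compare metaphone() keys directly. The snippet below is only a rough sketch: the pairs are taken from the banned word list and the test sentence above, and the $pairs array is my own illustrative name, not part of the filter script.

```php
<?php
// Banned word => misspelt variant from the test sentence.
// ($pairs is a hypothetical name used only for this illustration.)
$pairs = array(
    'stupid'   => 'stooopid',
    'bullshit' => 'bulshiit',
    'shit'     => 'shiit',
    'hell'     => 'hel',
    'idiot'    => 'iidiots',
);

foreach ($pairs as $banned => $variant) {
    // Two words are treated as "sounding alike" when their keys are identical.
    $same = metaphone($banned) === metaphone($variant);
    printf(
        "%-8s %-5s | %-8s %-5s | %s\n",
        $banned, metaphone($banned),
        $variant, metaphone($variant),
        $same ? 'same key' : 'different key'
    );
}
```

The first four pairs should share a key, because Metaphone drops non-leading vowels and repeated letters. "iidiots" should get a different key because of its trailing "s", which fits the "i ***** s" in my result: that word was caught by the exact-match pass rather than the phonetic one.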

----------------------------------------------------------------------------------------------
(N.B. My function is simply a word filtering function and does not remove HTML tags. However, this can easily be done using functions like strip_tags().)
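
As a small sketch of that step, the markup can be stripped before the text is passed to the word filter. The $raw input below is made up for illustration and is not from my test sentence.

```php
<?php
// Hypothetical input containing markup (not from the original test sentence).
$raw = '<b>He was</b> a very <i>stupid</i> boy.';

// strip_tags() removes HTML and PHP tags and keeps the text between them.
$plain = strip_tags($raw);

echo $plain, "\n"; // He was a very stupid boy.
```

The cleaned $plain string is what you would then hand to the filter function as the text to be filtered.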

My bad words filter function:


// Create the string containing your list of banned words, comma separated

$banned_word_list = " hell, stupid, idiot, bullshit, shit" ;

// Test sentence to be filtered

$output = "He was a very Stupid boy, really stooopid, as sTUpid as stupid can be. A total idiot. He was determined to drag everyone to hell. Enough bullshit...enough bulshiit...I had enough of shiit. Now you iidiots can go to hel." ;

// The Function accepts two parameters, $checkstr is the string containing your banned word list and $output is the text to be filtered

function wordfilter($checkstr,$output){

/////////////////////////////////// Regular Cleaning

$iarray = explode(",",$checkstr) ;

foreach($iarray as $i){

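// Case-insensitive replace; preg_quote() escapes any regex metacharacters in the banned word
$output = preg_replace('/' . preg_quote(trim($i), '/') . '/i', ' ***** ', $output) ;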

}
/////////////////////////////////////////////////////

////////////////////////////////////// Phonetic Filter


foreach($iarray as $i){
//---------------------------------------------------------------- 1
$checkz = explode(" ",$output) ;

foreach($checkz as $c){
//---------------------------------------------------------------- 2

if(metaphone(trim($c))== metaphone(trim($i))){

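// $c may contain punctuation such as "." so it is escaped before being used as a pattern
$output = preg_replace('/' . preg_quote($c, '/') . '/i', ' ***** ', $output) ;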

}

//---------------------------------------------------------------- 2

}
//---------------------------------------------------------------- 1

}

//////////////////////////////////////////////////////
return $output ;

}

// Function Call

// The function returns the filtered text, so print (or store) the return value
echo wordfilter($banned_word_list,$output) ;

___________________________________________________________________________________