Today we ran into a problem: we want to filter player input in our games, but players are smart and know they can spell words creatively to get around filters. So we wrote a system that doesn’t let them do this. It’s managed through a Google spreadsheet that you can maintain yourself, and it also supports ignoring a match when the bad word turns out to be part of a good one (e.g. the word “bass” would otherwise trigger the filter for “*ss”).

Then we ran into a second problem: it is really hard to compare parts of strings against a list of strings. For example, finding “*ss” inside the word “bass” would normally require you to iterate over every single word in your list (10k+ words in our case) and run a .contains check for each one. I figured there had to be a better way, so I wrote a function whose runtime grows with the message size, not the list size, which should make filtering easier and more efficient for everyone. Let’s keep the kids from running into bad words in our games!

Here’s how it works:

1) Have a Google spreadsheet with all the words that I want to filter out

2) Download the Google spreadsheet directly into my code with the loadConfigs method (see below)

3) Replace all l33tsp33k characters with their respective alphabet letters

4) Remove every character that is not a letter from the sentence

5) Run an algorithm that efficiently checks every possible substring of the input against the list. This part is key: you don’t want to loop over your ENTIRE list every time to see if a word is in it. In my case, I generate every substring of the input and check each one against a hashmap (O(1) lookup on average). This way the runtime grows with the input string, not with the list. The search space is also capped by the length of the longest word in your filter, since a longer substring can never match.

6) Check that the bad word is not part of a good word (e.g. “bass” contains “*ss”). The good words are also loaded from the spreadsheet.

7) In our case we also post the filtered words to Slack, but you can obviously remove that line.
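Steps 3 to 6 can be sketched roughly as follows. This is a minimal Python illustration, not the code from the gist: the leetspeak map, the sample words ("heck", "darn", "checkers") and the names `normalize`, `inside_good_word` and `bad_words_found` are all assumptions made up for the example. In the real system the word data comes from the spreadsheet.

```python
import re

# Hypothetical leetspeak map; extend it to match what your players type.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})

# Hypothetical filter data: bad word -> good words that may contain it.
BAD_WORDS = {
    "heck": {"checkers"},
    "darn": set(),
}
LONGEST = max(len(word) for word in BAD_WORDS)


def normalize(message):
    """Steps 3 and 4: undo leetspeak, then keep letters only."""
    text = message.lower().translate(LEET_MAP)
    return re.sub(r"[^a-z]", "", text)


def inside_good_word(text, start, end, good_words):
    """Step 6: is text[start:end] fully covered by an allowed good word?"""
    for good in good_words:
        pos = text.find(good)
        while pos != -1:
            if pos <= start and end <= pos + len(good):
                return True
            pos = text.find(good, pos + 1)
    return False


def bad_words_found(message):
    """Step 5: look up every substring in the dict instead of scanning the list."""
    text = normalize(message)
    found = []
    for start in range(len(text)):
        # Never try substrings longer than the longest filtered word.
        for length in range(1, min(LONGEST, len(text) - start) + 1):
            candidate = text[start:start + length]
            if candidate not in BAD_WORDS:  # O(1) average lookup
                continue
            if inside_good_word(text, start, start + length, BAD_WORDS[candidate]):
                continue
            found.append(candidate)
    return found


print(bad_words_found("What the h3ck!"))   # ['heck']
print(bad_words_found("I love checkers"))  # []
```

Because step 4 also strips spaces, a match can span two words. Whether that is desirable depends on how strict you want the filter to be.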

[Screenshot: Screen Shot 2016-05-28 at 5.51.04 PM.png]

Use the structure shown in the screenshot in your Google Sheet.

Then use the functions in this gist to load the sheet, and use the badWordsFound function to return a list of all bad words inside a string input.
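For readers who cannot open the gist, here is a rough Python sketch of what loading the sheet could look like. It is not the gist's loadConfigs. It assumes the sheet is published to the web as CSV, that there is no header row, that column A holds the bad word, and that column B holds a comma-separated list of good words to ignore. That column layout, and the names `parse_sheet` and `load_configs`, are assumptions for illustration, so adjust them to your own sheet.

```python
import csv
import io
import urllib.request


def parse_sheet(csv_text):
    """Turn CSV rows into {bad_word: set_of_good_words_to_ignore}."""
    bad_words = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank rows
        word = row[0].strip().lower()
        # Assumed layout: column B is a comma-separated list of good words.
        ignore = row[1] if len(row) > 1 else ""
        bad_words[word] = {g.strip().lower() for g in ignore.split(",") if g.strip()}
    return bad_words


def load_configs(csv_url):
    """Download the published sheet as CSV and parse it."""
    with urllib.request.urlopen(csv_url) as response:
        return parse_sheet(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download makes it possible to test the filter data without a network connection, for example by passing `parse_sheet` a small CSV string.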

Good luck! Feel free to reply with questions.

Here’s a trace of the code running on the word “abcdef” (each “checking” line shows the start index and the substring length):

checking: 0,1

word: a

checking: 0,2

word: ab

checking: 0,3

word: abc

checking: 0,4

word: abcd

checking: 0,5

word: abcde

checking: 0,6

word: abcdef

checking: 1,1

word: b

checking: 1,2

word: bc

checking: 1,3

word: bcd

checking: 1,4

word: bcde

checking: 1,5

word: bcdef

checking: 2,1

word: c

checking: 2,2

word: cd

checking: 2,3

word: cde

checking: 2,4

word: cdef

checking: 3,1

word: d

checking: 3,2

word: de

checking: 3,3

word: def

checking: 4,1

word: e

checking: 4,2

word: ef

checking: 5,1

word: f
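The trace above can be reproduced with a few lines of Python. This is only a sketch of the enumeration order, not the original code: the function name `trace_substrings` and the `longest` parameter (the length of the longest word in the filter) are assumptions. The trace is consistent with a cap of at least 6.

```python
def trace_substrings(text, longest):
    """List every substring that would be looked up, in the order shown above."""
    lines = []
    for start in range(len(text)):
        # Cap the length at the longest filtered word and at the end of the text.
        for length in range(1, min(longest, len(text) - start) + 1):
            lines.append(f"checking: {start},{length}")
            lines.append(f"word: {text[start:start + length]}")
    return lines


for line in trace_substrings("abcdef", longest=6):
    print(line)
```

With a smaller cap, say `longest=2`, the same input produces only 11 lookups instead of 21, which is why the longest word in the filter bounds the work per message.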