Problem Statement for "SpamDetector"

Problem Statement

You are writing part of a spam detection system. Your job is to analyze the subject lines of e-mail messages and return a count of known spam-signalling keywords in the subject lines. Your task is made more difficult by spammers who try to hide the keywords in several ways. Here we will consider just one obfuscation technique: duplicating characters. Duplicating characters means taking an existing character in a word and inserting more copies of that character at the same place in the word. This process can then be repeated on a different character in the word. The spam-signalling keyword "credit" might be modified to "creddiT", "CredittT" or "ccrreeeddiitt", etc., but not to "credict", because the inserted "c" does not duplicate an adjacent character.

For the purposes of this problem we will consider subject lines which contain only letters and spaces. The "words" in the subject line are delimited by spaces. A word in the subject line is considered a "match" if the entire word is the same as at least one entire keyword, after possibly removing some duplicated characters from the subject word. A keyword that matches only part of a subject word or a subject word that matches only part of a keyword does not count. Note that if a keyword contains a double letter, the subject word must also contain (at least) a double letter in the same position to match ("double letter" means two consecutive letters in the word that are the same). For this application, all matches (and the use of the term "same") are case insensitive.
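One way to read the matching rule above is in terms of runs of equal letters: a subject word matches a keyword when both split into the same sequence of letters, run by run, and each run in the subject word is at least as long as the corresponding run in the keyword. The following is a minimal sketch of that reading, not a reference solution; the class name DuplicateMatcher and the method name matches are hypothetical and do not appear in the problem statement.

```java
class DuplicateMatcher {
    // Returns true if `word` can be turned into `keyword` by deleting extra
    // copies of duplicated characters, ignoring case.
    // Assumes both strings contain letters only, as the constraints state.
    static boolean matches(String word, String keyword) {
        String w = word.toLowerCase();
        String k = keyword.toLowerCase();
        int i = 0; // position in the subject word
        int j = 0; // position in the keyword
        while (i < w.length() && j < k.length()) {
            char c = w.charAt(i);
            if (c != k.charAt(j)) {
                return false; // the runs start with different letters
            }
            // Measure the run of `c` in the subject word.
            int wordRun = 0;
            while (i < w.length() && w.charAt(i) == c) { i++; wordRun++; }
            // Measure the run of `c` in the keyword.
            int keywordRun = 0;
            while (j < k.length() && k.charAt(j) == c) { j++; keywordRun++; }
            // Duplication can only lengthen a run, never shorten it.
            if (wordRun < keywordRun) {
                return false;
            }
        }
        // Both strings must be fully consumed: partial matches do not count.
        return i == w.length() && j == k.length();
    }
}
```

Under this sketch, matches("ccrreeeddiitt", "credit") is true, while matches("credict", "credit") and matches("helo", "hello") are both false: the first fails on a letter that is not in the keyword, the second on a double letter that the subject word lacks.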

Given a subject line and a list of keywords, return the count of words in the subject line which "match" words in the keyword list. If multiple words in the subject line match the same keyword, they are each counted, but a word in the subject line that matches multiple keywords is only counted once.
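The counting rule can be sketched as follows. This is one possible approach under stated assumptions, not the intended solution: each keyword is compiled into a case-insensitive regular expression in which a run of n equal letters becomes "that letter, n or more times", and each non-empty word of the subject line is counted once if any pattern matches it in full. The helper name toPattern is hypothetical; the class and method names follow the Definition section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SpamDetector {
    // Hypothetical helper: converts a keyword into a regex where each run of
    // n equal letters becomes "c{n,}". For example, "deed" -> "d{1,}e{2,}d{1,}".
    // Keywords contain letters only, so no regex escaping is needed.
    private static Pattern toPattern(String keyword) {
        String k = keyword.toLowerCase();
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < k.length()) {
            char c = k.charAt(i);
            int run = 0;
            while (i < k.length() && k.charAt(i) == c) { i++; run++; }
            regex.append(c).append('{').append(run).append(",}");
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    public int countKeywords(String subjectLine, String[] keywords) {
        List<Pattern> patterns = new ArrayList<>();
        for (String keyword : keywords) {
            patterns.add(toPattern(keyword));
        }

        int count = 0;
        for (String word : subjectLine.split(" ")) {
            if (word.isEmpty()) {
                continue; // leading or repeated spaces produce empty tokens
            }
            for (Pattern p : patterns) {
                // matches() requires the whole word to match, not a substring.
                if (p.matcher(word).matches()) {
                    count++;
                    break; // a word matching several keywords counts once
                }
            }
        }
        return count;
    }
}
```

The break after the first successful pattern is what keeps a word that matches multiple keywords from being counted more than once, while repeated subject words that match the same keyword are still counted individually.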

Definition

Class:
SpamDetector
Method:
countKeywords
Parameters:
String, String[]
Returns:
int
Method signature:
int countKeywords(String subjectLine, String[] keywords)
(be sure your method is public)

Constraints

  • subjectLine will contain between 0 and 50 characters, inclusive.
  • subjectLine will include only letter ('a' to 'z' and 'A' to 'Z') and space (' ') characters.
  • keywords will have between 0 and 50 elements, inclusive.
  • each element of keywords will contain between 1 and 50 characters, inclusive.
  • each element of keywords will consist of only letters ('a' to 'z' and 'A' to 'Z').
  • The same letter (ignoring case) never appears more than twice consecutively in any element of keywords (i.e., "aabbAAbb" is OK, but "aaAbb" is not allowed).

Examples

  1. "LoooW INTEREST RATES available dont BE slow"

    {"interest","rates","loan","available","LOW"}

    Returns: 4

    "INTEREST" , "RATES" , "available", and "LoooW" match. Note that "slow" does not match, even though it contains the substring "low" which is a keyword.

  2. "Dear Richard Get Rich Quick no risk"

    {"rich","risk","Quicken","wealth","SAVE"}

    Returns: 2

    "Rich" and "risk" match. "Richard" does not match, even though it begins with the keyword "rich".

  3. "a a a a a a a a a a a a a a a a a a a a a a a a a"

    {"aa","b","c","d","e","f","g","a"}

    Returns: 25

  4. "in debbtt againn and aAgain and AGAaiIN"

    {"AGAIN","again","Again","again"}

    Returns: 3

  5. "PlAyy ThEE Lottto get Loottoo feever"

    {"play","lotto","lottery","looser"}

    Returns: 3

  6. "aabb aaabb abb aab ab bbaaa aab"

    {"aab"}

    Returns: 4

  7. "abc aabc abbc abcc aabbc abbcc aabcc"

    {"abc"}

    Returns: 7

  8. "AAAAaaaaaaaaAAAAAAAbbbbbbccccc AAAAAbccc"

    {"abc"}

    Returns: 2

  9. " "

    {"empty","space","does","not","match"}

    Returns: 0

  10. "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

    {"ab","ab","ab","ab","ab","ab","ab","ab","ab","ab", "ab","ab","ab","ab","ab","ab","ab","ab","ab","ab", "ab","ab","ab","ab","ab","ab","ab","ab","ab","ab", "ab","ab","ab","ab","ab","ab","ab","ab","ab","ab", "ab","ab","ab","ab","ab","ab","ab","ab","ab","ab"}

    Returns: 0

  11. "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"

    {"a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","ab"}

    Returns: 1

  12. "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"

    {"a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","a", "a","a","a","a","a","a","a","a","a","b"}

    Returns: 0

  13. "aaaaab abbbbb aaaabbbb ab"

    {"aab","aBB"}

    Returns: 3

  14. "oooooooooooooooooooooooGooooooooooooooooooooooo"

    {"ogo"}

    Returns: 1

  15. "OgO"

    {"OOgOO"}

    Returns: 0

  16. "AAABBBCCCDDDEEEFFFGGGHHHIIIJJKKLLMMNNOOP"

    {"aabbccddeeffgghhiijjkkllmmnnoopp","aabbccddeeffgghhiijjkkllmmnnoop"}

    Returns: 1

  17. "AAABBBCCCDDDEEEFFFGGGHHHIIIJJKKLLMMNNOOP"

    {"aabbccddeeffgghhiijjkkllmmnnoopp"}

    Returns: 0

  18. "ffghiijkklmmnnopqqrsttt lmnno pppp"

    {"abcdefghijklmn","abcdefgh","lmmnno","pp"}

    Returns: 1

  19. "no loosers at losers slots"

    {"losers"}

    Returns: 2

  20. "ded deed dded ddeedd dedd deedd ddeed deeed deeded"

    {"Deed"}

    Returns: 5

  21. ""

    {"nothing","nada","zip","nil","squat","vaccuum"}

    Returns: 0

  22. "Z"

    {"a","b","c","d","e","f","g","i","j","k","l","m","n","o","p", "q","r","s","t","u","v","w","x","y","Z"}

    Returns: 1

  23. "z y x w v u t s r q p o n m l k j i h g f e d c b"

    {"a","b","c","d","e","f","g","i","j","k","l","m","n","o","p", "q","r","s","t","u","v","w","x","y","Z"}

    Returns: 24

  24. "zz bbb ddd lll kkkkk qq rrr pppp uu n mmm i"

    {"a","b","c","d","e","f","g","i","j","k","l","m","nn","o","p", "q","r","s","t","u","v","w","x","y","Z"}

    Returns: 11

  25. "to b or not to b"

    {"a","b","c","d","e","f","g","i","j","k","l","m","n","o","p", "q","r","s","t","u","v","w","x","y","Z"}

    Returns: 2

  26. "Ben said is am are was were be being been bee"

    {"ben","be"}

    Returns: 4

  27. "it is unlikely to match if there are no keywords"

    {}

    Returns: 0

  28. "Dear Richard Get Rich Quick no risk"

    { "rich", "risk", "Quicken", "wealth", "SAVE" }

    Returns: 2

  29. "a"

    { "a", "a" }

    Returns: 1

  30. "todo"

    { "ttodo" }

    Returns: 0

  31. "aabbcc"

    { "aABBCC" }

    Returns: 1

  32. "aabb"

    { "aab" }

    Returns: 1

  33. "abcddd"

    { "abc" }

    Returns: 0

  34. "qw qqw"

    { "qqw", "qqw" }

    Returns: 1

  35. "helo"

    { "hello" }

    Returns: 0

  36. "lo"

    { "loo" }

    Returns: 0

  37. "again againn aagain"

    { "AGAIN", "again", "aGain", "Again" }

    Returns: 3

  38. "aa"

    { "ba" }

    Returns: 0

  39. " cooooool abcdFGHI abb cc dd ee ff gg hh ii jj"

    { "cl", "abcdefgh", "ab", "c", "d", "ii", "j" }

    Returns: 5

  40. "ba"

    { "bab" }

    Returns: 0

  41. "plaaaaya"

    { "plaay" }

    Returns: 0

  42. "fo"

    { "foo" }

    Returns: 0

  43. "lowwwww"

    { "low" }

    Returns: 1

  44. "aaab"

    { "aab" }

    Returns: 1

  45. "boird"

    { "bird" }

    Returns: 0

  46. "misssissippi mississsippi mississipppi misisipi"

    { "mississippi" }

    Returns: 3

  47. "a"

    { "aa" }

    Returns: 0


This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2024, TopCoder, Inc. All rights reserved.