Problem Statement
Soundex is a simple system of encoding surnames, used by the U.S. Census Bureau in the late 1800s and early 1900s. The process consists of transforming arbitrary names into a sequence of a letter followed by exactly three digits. The purpose is to have alternate spellings of the same last name encode to the same sequence, so different spellings of the same family's name can be found in genealogical records.
Transforming a name into its Soundex code consists of the following steps.
- 1. Remove the first letter and all occurrences of the letters 'A', 'E', 'H', 'I', 'O', 'U', 'W', or 'Y'
- 2. Replace each remaining letter with the number given for it in the following
table
Number | Represents these letters -------+------------------------- 1 | B P F V 2 | C S G J K Q X Z 3 | D T 4 | L 5 | M N 6 | R
- 3. If two or more letters with the same code are adjacent in the original name (before step 1 was applied), remove the codes for all the adjacent letters but the first. Two letters are adjacent only if the letters 'H' or 'W' appear between them in the original name (or no letters at all). For instance, in the name "Schroeder", the 'S', 'c', and the first 'r' are all adjacent to each other, but none are adjacent to the 'd' or the last 'r'. In the string '"Cwhhwqhgjhwaj"', all but the initial 'C' and the final 'j' would be removed.
- 4. The resulting code is of the form <letter><digit><digit><digit>, where <letter> is the first letter of the original name and the three <digit>s are the first three digits obtained from step 3. If there are less than three digits, add zeros to the end until there are three digits.
For example, the surname "Williams" would be transformed as follows:
- "Williams" -> "llms"
- "llms" -> "4452"
- "4452" -> "452"
- "452" -> "W452"
Your method should return the Soundex code of the given name, formed by following the rules outlined above. The code should be formatted as an uppercase letter followed by three digits. Case does not matter in the transformation process, but the result must have a leading uppercase letter.
Definition
- Class:
- Soundex
- Method:
- encode
- Parameters:
- String
- Returns:
- String
- Method signature:
- String encode(String name)
- (be sure your method is public)
Notes
- keep in mind that the case of letters in name is insignificant, but the case of the letter in its Soundex code is significant.
Constraints
- name will have length between 1 and 20 inclusive
- name will consist only of letters ('A'-'Z', 'a'-'z')
Examples
"TopCoder"
Returns: "T123"
The result of each step is as follows: "TopCoder" -> "pCdr" "pCdr" -> "1236" "1236" -> "1236" "1236" -> "T123" The method should return "T123"
"Jackson"
Returns: "J250"
The result of each step is as follows: "Jackson" -> "cksn" "cksn" -> "2225" "2225" -> "25" (since the c, k, and s are adjacent and have the same code, only the c's code is kept) "25" -> "J250" (note the extra zero added for padding)
"Pfister"
Returns: "P236"
"Pfister" -> "fstr" "fstr" -> "1236" "1236" -> 236 (The P and f have the same code and are adjacent, so the f is discarded) "236" -> "P236"
"ashcroft"
Returns: "A261"
The s and c have the same code and are only separated by an h, so the c is discarded from the code. Also note that the leading a is capitalized in the output.
"Cwhhwqhgjhwaj"
Returns: "C200"
All letters with codes except for the final j are adjacent to each other, and thus all are discarded except for the leading C and the final j.
"Cwhwhhwhwqhwhgjhwa"
Returns: "C000"
"ashcroft"
Returns: "A261"
"ABCDEFGHIJKLMNOPQRST"
Returns: "A123"
"AAAAAAAAAAAAAAAAAAAA"
Returns: "A000"
"BBBBBBBBBBBBBBBBBBBB"
Returns: "B000"
"Logan"
Returns: "L250"
"Hanks"
Returns: "H520"
"X"
Returns: "X000"
"Y"
Returns: "Y000"
"U"
Returns: "U000"
"Chiswick"
Returns: "C220"
"Williams"
Returns: "W452"
"Schroeder"
Returns: "S636"
"pPAt"
Returns: "P300"
first and second letter join, but are different case (my solution failed here but wasn't caught by earlier tests - orb)