Problem Statement
Given a list of valid tokens and an input string, your lexer works as follows:
1) a) If the input doesn't begin with one of the valid tokens, remove the first character from the string.
b) If the input does begin with a valid token, determine the longest valid token the input starts with and remove it from the string. The removed portion is considered CONSUMED.
2) Repeat 1 until there are no characters left in the input.
The lexer is CASE-SENSITIVE, so a token must exactly match the beginning of the string.
Your method will return a list containing the CONSUMED valid tokens, in the order they were CONSUMED. For example:
tokens = {"ab","aba","A"}
input = "ababbbaAab"
"ab" and "aba" are found at the beginning of the input but "aba" is longest so it is consumed. Now:
consumed = {"aba"}
input = "bbbaAab"
There are no tokens that start with 'b', so the lexer removes the leading characters one at a time (three applications of step 1a).
consumed = {"aba"}
input = "aAab"
No token matches here: "ab" and "aba" fail because the second character is 'A', and the 'a' doesn't match the token "A" due to CASE-SENSITIVITY. The lexer removes the 'a' from the beginning of the string.
consumed = {"aba"}
input = "Aab"
The lexer consumes the "A" token.
consumed = {"aba","A"}
input = "ab"
Finally the lexer consumes the "ab" token and completes the process.
consumed = {"aba","A","ab"}
input = ""
The returned list is {"aba","A","ab"}.
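The process above is a greedy longest-match ("maximal munch") scan. A minimal sketch of one way to implement it, tracking a position index instead of repeatedly rebuilding the string:

```java
import java.util.ArrayList;
import java.util.List;

public class Lexer {
    // At each position, consume the longest token that matches
    // (case-sensitively); if none matches, skip one character.
    public String[] tokenize(String[] tokens, String input) {
        List<String> consumed = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            for (String t : tokens) {
                if (input.startsWith(t, pos)
                        && (best == null || t.length() > best.length())) {
                    best = t;
                }
            }
            if (best == null) {
                pos++;                      // step 1a: no match, drop one character
            } else {
                consumed.add(best);         // step 1b: consume the longest match
                pos += best.length();
            }
        }
        return consumed.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Lexer lexer = new Lexer();
        String[] result = lexer.tokenize(
                new String[]{"ab", "aba", "A"}, "ababbbaAab");
        System.out.println(String.join(",", result)); // prints aba,A,ab
    }
}
```

Given the constraints (at most 50 tokens of length at most 50, input of length at most 50), this straightforward quadratic scan is more than fast enough.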
Create a class Lexer that contains the method tokenize, which takes a String[] of valid tokens and a String input, and returns a String[] of the CONSUMED tokens in order.
Definition
- Class:
- Lexer
- Method:
- tokenize
- Parameters:
- String[], String
- Returns:
- String[]
- Method signature:
- String[] tokenize(String[] tokens, String input)
- (be sure your method is public)
Constraints
- tokens will contain between 0 and 50 elements inclusive
- Each element of tokens will have length between 1 and 50 inclusive
- Each element of tokens will only consist of letters (A-Z,a-z)
- input will have length between 0 and 50 inclusive
- input will only consist of letters (A-Z,a-z)
Examples
{"ab","aba","A"}
"ababbbaAab"
Returns: { "aba", "A", "ab" }
Same as above
{"a","a","aa","aaa","aaaa","aaaaa","aa"}
"aaaaaaaaaaaaaaaaaaaaaaaaa"
Returns: { "aaaaa", "aaaaa", "aaaaa", "aaaaa", "aaaaa" }
Make sure to use the longest valid starting token
{"wow","wo","w"}
"awofwwofowwowowowwwooo"
Returns: { "wo", "w", "wo", "w", "wow", "wow", "w", "wo" }
{"int","double","long","char","boolean","byte","float"}
"intlongdoublecharintintboolean"
Returns: { "int", "long", "double", "char", "int", "int", "boolean" }
{}
"Great"
Returns: { }
No valid tokens, so nothing is CONSUMED
{"AbCd","dEfG","GhIj"}
"abCdEfGhIjAbCdEfGhIj"
Returns: { "dEfG", "AbCd", "GhIj" }
Remember CASE-SENSITIVITY
{}
"a"
Returns: { }
{"a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa"}
"aaaaaaaaaaaaaaa"
Returns: { "aaaaaaaa", "aaaaaaa" }
{"a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa"}
"aaaaaaaabaaaaaaabaaaaaabaaaaabaaaabaaabaabab"
Returns: { "aaaaaaaa", "aaaaaaa", "aaaaaa", "aaaaa", "aaaa", "aaa", "aa", "a" }
{"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"}
"hsfhiuansvunekinvaABAKJsadfjkjbasBSABDasbhdsahasbA"
Returns: { "h", "s", "f", "h", "i", "u", "a", "n", "s", "v", "u", "n", "e", "k", "i", "n", "v", "a", "s", "a", "d", "f", "j", "k", "j", "b", "a", "s", "a", "s", "b", "h", "d", "s", "a", "h", "a", "s", "b" }
{"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"}
"ASFSIWHAFJWAVMAWOMVOWAMBOAWMBOADVSDVSDF"
Returns: { }
{"AAAAAAAAAAAAAAAAAAAA","AA"}
"AAAAAAAAAAAAAAAAAAAAA"
Returns: { "AAAAAAAAAAAAAAAAAAAA" }
{"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
Returns: { "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" }
{"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
Returns: { }
{"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa"}
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
Returns: { }
{"a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a"}
"adfjaslfjalsdfjaslfjdasdfjasdfjsadfjsaoifujobnwef"
Returns: { "a", "a", "a", "a", "a", "a", "a", "a" }
{"a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a"}
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Returns: { "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a" }
{ "fds", "fdsa", "aaa", "aaaaaaa", "aaa" }
"aaaaaaaaaaaaaaafdsfdsfdsfdsfdsaaaaaa"
Returns: { "aaaaaaa", "aaaaaaa", "fds", "fds", "fds", "fds", "fdsa", "aaa" }
{ }
""
Returns: { }
{ "abcd" }
"abcd"
Returns: { "abcd" }
{ "aa", "a" }
"aaaaaaaaaa"
Returns: { "aa", "aa", "aa", "aa", "aa" }
{ "a" }
"aaaaaaaaaaaaa"
Returns: { "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a" }
{ "a", "b", "c", "d", "e", "r", "t", "g", "f", "d", "s", "d", "e", "w", "a", "d", "f" }
"ab"
Returns: { "a", "b" }