Statistics

Problem Statement for "Lexer"

Problem Statement

A lexer (lexical analyzer) is used in compilers to break input text into pieces called tokens. In this problem you will be given a list of valid tokens. For example:
tokens = {"ab","aba","A"}

Given a list of valid tokens and an input string your lexer will work as follows:
1) a) If the input doesn't begin with one of the valid tokens, remove the first character from the string.
b) If the input does begin with a valid token, determine the longest valid token the input starts with and remove it from the string. The removed portion is considered CONSUMED.
2) Repeat 1 until there are no characters left in the input.

The lexer is CASE-SENSITIVE so a token must exactly match the beginning of the string.

Given a list of valid tokens and an input string your method will return a list containing the CONSUMED valid tokens in the order they were CONSUMED. For example:
tokens = {"ab","aba","A"}
input = "ababbbaAab"

"ab" and "aba" are found at the beginning of the input but "aba" is longest so it is consumed. Now:
consumed = {"aba"}
input = "bbbaAab"

There are no tokens that start with 'b' so the lexer will remove the first 3 characters from the string.
consumed = {"aba"}
input = "aAab"

The 'a' doesn't match the token "A" due to CASE-SENSITIVITY. The lexer removes the 'a' from the beginning of the string.
consumed = {"aba"}
input = "Aab"

The lexer consumes the "A" token.
consumed = {"aba","A"}
input = "ab"

Finally the lexer consumes the "ab" token and completes the process.
consumed = {"aba","A","ab"}
input = ""
The returned list is {"aba","A","ab"}.

Create a class Lexer that contains the method tokenize, which takes a String[] tokens, and a String input, and returns a String[] in the form specified above.

Definition

Class:
Lexer
Method:
tokenize
Parameters:
String[], String
Returns:
String[]
Method signature:
String[] tokenize(String[] tokens, String input)
(be sure your method is public)

Constraints

  • tokens will contain between 0 and 50 elements inclusive
  • Each element of tokens will have length between 1 and 50 inclusive
  • Each element of tokens will only consist of letters (A-Z,a-z)
  • input will have length between 0 and 50 inclusive
  • input will only consist of letters (A-Z,a-z)

Examples

  1. {"ab","aba","A"}

    "ababbbaAab"

    Returns: { "aba", "A", "ab" }

    Same as above

  2. {"a","a","aa","aaa","aaaa","aaaaa","aa"}

    "aaaaaaaaaaaaaaaaaaaaaaaaa"

    Returns: { "aaaaa", "aaaaa", "aaaaa", "aaaaa", "aaaaa" }

    Make sure to use the longest valid starting token

  3. {"wow","wo","w"}

    "awofwwofowwowowowwwooo"

    Returns: { "wo", "w", "wo", "w", "wow", "wow", "w", "wo" }

  4. {"int","double","long","char","boolean","byte","float"}

    "intlongdoublecharintintboolean"

    Returns: { "int", "long", "double", "char", "int", "int", "boolean" }

  5. {}

    "Great"

    Returns: { }

    No valid tokens, so nothing is CONSUMED

  6. {"AbCd","dEfG","GhIj"}

    "abCdEfGhIjAbCdEfGhIj"

    Returns: { "dEfG", "AbCd", "GhIj" }

    Remember CASE-SENSITIVITY

  7. {}

    "a"

    Returns: { }

  8. {"a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa"}

    "aaaaaaaaaaaaaaa"

    Returns: { "aaaaaaaa", "aaaaaaa" }

  9. {"a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa"}

    "aaaaaaaabaaaaaaabaaaaaabaaaaabaaaabaaabaabab"

    Returns: { "aaaaaaaa", "aaaaaaa", "aaaaaa", "aaaaa", "aaaa", "aaa", "aa", "a" }

  10. {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"}

    "hsfhiuansvunekinvaABAKJsadfjkjbasBSABDasbhdsahasbA"

    Returns: { "h", "s", "f", "h", "i", "u", "a", "n", "s", "v", "u", "n", "e", "k", "i", "n", "v", "a", "s", "a", "d", "f", "j", "k", "j", "b", "a", "s", "a", "s", "b", "h", "d", "s", "a", "h", "a", "s", "b" }

  11. {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"}

    "ASFSIWHAFJWAVMAWOMVOWAMBOAWMBOADVSDVSDF"

    Returns: { }

  12. {"AAAAAAAAAAAAAAAAAAAA","AA"}

    "AAAAAAAAAAAAAAAAAAAAA"

    Returns: { "AAAAAAAAAAAAAAAAAAAA" }

  13. {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}

    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

    Returns: { "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" }

  14. {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}

    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

    Returns: { }

  15. {"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa"}

    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"

    Returns: { }

  16. {"a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a"}

    "adfjaslfjalsdfjaslfjdasdfjasdfjsadfjsaoifujobnwef"

    Returns: { "a", "a", "a", "a", "a", "a", "a", "a" }

  17. {"a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a"}

    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

    Returns: { "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a" }

  18. { "fds", "fdsa", "aaa", "aaaaaaa", "aaa" }

    "aaaaaaaaaaaaaaafdsfdsfdsfdsfdsaaaaaa"

    Returns: { "aaaaaaa", "aaaaaaa", "fds", "fds", "fds", "fds", "fdsa", "aaa" }

  19. { "a", "a", "aa", "aaa", "aaaa", "aaaaa", "aa" }

    "aaaaaaaaaaaaaaaaaaaaaaaaa"

    Returns: { "aaaaa", "aaaaa", "aaaaa", "aaaaa", "aaaaa" }

  20. { }

    ""

    Returns: { }

  21. { "abcd" }

    "abcd"

    Returns: { "abcd" }

  22. { "aa", "a" }

    "aaaaaaaaaa"

    Returns: { "aa", "aa", "aa", "aa", "aa" }

  23. { "a" }

    "aaaaaaaaaaaaa"

    Returns: { "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a" }

  24. { "a", "b", "c", "d", "e", "r", "t", "g", "f", "d", "s", "d", "e", "w", "a", "d", "f" }

    "ab"

    Returns: { "a", "b" }

  25. { "fds", "fdsa", "aaa", "aaaaaaa", "aaa" }

    "aaaaaaaaaaaaaaafdsfdsfdsfdsfdsaaaaaa"

    Returns: { "aaaaaaa", "aaaaaaa", "fds", "fds", "fds", "fds", "fdsa", "aaa" }

  26. { "a", "a", "aa", "aaa", "aaaa", "aaaaa", "aa" }

    "aaaaaaaaaaaaaaaaaaaaaaaaa"

    Returns: { "aaaaa", "aaaaa", "aaaaa", "aaaaa", "aaaaa" }

  27. { }

    ""

    Returns: { }

  28. { "abcd" }

    "abcd"

    Returns: { "abcd" }

  29. { "aa", "a" }

    "aaaaaaaaaa"

    Returns: { "aa", "aa", "aa", "aa", "aa" }

  30. { "a" }

    "aaaaaaaaaaaaa"

    Returns: { "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a" }

  31. { "a", "b", "c", "d", "e", "r", "t", "g", "f", "d", "s", "d", "e", "w", "a", "d", "f" }

    "ab"

    Returns: { "a", "b" }


This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2024, TopCoder, Inc. All rights reserved.
This problem was used for: