Statistics

Problem Statement for "Enzymes"

Problem Statement

*** WARNING: Do not leave the competition room while you are competing. If you have a question, use "admins: question" in the chat area. ***

Biologists often use enzymes to break proteins they are studying into pieces. A protein is a chain of amino acids. A protein will be represented as a String of capital letters, each letter representing an amino acid. For example, "GGCCDDEFFA" and "CCDDEE" are two examples of proteins.

Each enzyme will be represented by a String of non-repeating capital letters. When used, the enzyme will break a protein after all amino acids that are identified in this String. For example, "CDE" is an enzyme that breaks proteins after occurrences of "C", "D", or "E" in the protein representation. If we apply "CDE" to "GGCCDDEFFA" we would get "GGC", "C", "D", "D", "E", "FFA" since the enzymes breaks the protein after all of the specified letters.

Create a class Enzymes that contains a method split which takes as arguments a String[] enzymes representing the enzymes at your disposal and String protein representing the protein to be broken. The method returns the total number of unique fragments that can be produced by using any combination of the given enzymes in any order.

Valid amino acids are: "ACDEFGHIKLMNPQRSTVWY".

Definition

Class:
Enzymes
Method:
split
Parameters:
String[], String
Returns:
int
Method signature:
int split(String[] enzymes, String protein)
(be sure your method is public)

Notes

  • Proteins are directional. Therefore a enzyme can only cleave after a specific amino acid, and not before. For example, the enzyme "C" can cleave the protein "ACE" into "AC" and "E", but not into "A" and "CE".
  • Once applied, an enzyme completely breaks a protein after all specified amino acids.

Constraints

  • enzymes will contain between 1 and 10 elements, inclusive.
  • each element of enzymes will have between 1 and 20 characters, inclusive.
  • each element of enzymes will only contain capital letters representing valid amino acids, with no repetition of letters. Valid amino acids are "ACDEFGHIKLMNPQRSTVWY".
  • protein will have between 1 and 50 characters, inclusive.
  • each character of protein will be a capital letter which represents a valid amino acid. Valid amino acids are "ACDEFGHIKLMNPQRSTVWY".
  • At least one character in each element of enzymes will be a character other than the last in protein

Examples

  1. {"WYFLM","KR"}

    "AACDDYNQQNMEREY"

    Returns: 6

    Using only the first enzyme, we would break the protein at 3 locations. However, one of these is after the last amino acid of the protein, so no new fragment would be created (remember also that we don't count the original protein fragment). This gives us 3 (new) fragments. Using the second enzyme we break at only one amino acid, giving us 2 fragments. Using both enzymes we break at all 3 of the previous points, giving us 4 fragments. However, 3 of these fragments (the initial "AACDDY", "NQQNM", and the final "EY") are repeated, so the total count is 6: "AACDDY", "NQQNM", "EREY", "AACDDYNQQNMER", "EY", AND "ER".

  2. {"GMNPAC","AC","NP","GM","A","C","N","P","G","M"}

    "ACCACCDDEDFGHPPQPRPCCFGGPTVWYYTSPQRMNLKIGHFEDAC"

    Returns: 49

  3. {"AFKRSTY","ADEFHILNQRT","ADGHIQRS","CDHIMNQRV","CDEFHQ", "AGHINRSTWY","ACDEFHILNPQR","ACDGILPRSV","DFKMNQRSTVY"}

    "LDVQMTLTRSIQQLYAKII"

    Returns: 38

  4. {"CEFIKLPQRVY","DFGKPRST","AHMPRST","CDIKLMPSVWY"}

    "FYHKQAYDHLWRHMM"

    Returns: 32

    The 32 fragments are: "AY", "HMM", "LWR", "HLWR", "YDH", "YHK", "KQA", "QAYD", "WR", "Y", "RHM", "W", "FYH", "FY", "R", "Q", "QA", "YH", "M", "L", "YD", "K", "QAY", "DHL", "H", "F", "HM", "HL", "D", "DH", "HK", and "A".

  5. {"EFHILNPQSTW","AEHIPVWY","AFHIKMNRTVW","CFHLPQRVWY","CIKMQSTW","ACDHIQRT","ACDFHILNT"}

    "TVAYQQA"

    Returns: 12

  6. {"EHKNPQSV","CEFHILMPRV","GIMNTVY","ADEIKLQRSTVWY","ACDEFIKLMNTV","KPSTVWY","DEFLMNPQVWY","DIKMRSVY"}

    "EEDSRDTHFEIDSSCTNDNNV"

    Returns: 48

  7. {"ADHKLMNST"}

    "GDPNVCTCWVEY"

    Returns: 4

  8. {"AEFHIKMNPQSTVW","DEHIQSV","ACDEGIR"}

    "MFDIHQWYNATFPNLMDEYIWCEQEVVLEFKDKRQDFDDIHMRLLALWY"

    Returns: 49

  9. {"ACDEFHLMY","DEFKLMNQR","CFGHILNPRT","EIKLNPQSY"}

    "QEDPRRATFSYNWPN"

    Returns: 26

  10. {"ACILMQVWY","DEFHIMTWY","ADGHKLMRVW","ACDFGIMQSTVW","CEFGKLPQTWY","ILNPRSTVY","ACDFHLMPRST"}

    "FITGYGLAHPLLPHWCIAQKTQWEQ"

    Returns: 52

  11. {"DEFHIMQTY","ACDEFGHILNSTY","EFLRSTY","FIKMQV","CEGINRVWY","ACDEFGHLPQSTVW"}

    "KDFWTDMGDTEGDHVIIKPLFDVQIPNFLYHFAA"

    Returns: 61

  12. {"ACNRSTVW","ACFGHIKNPRSWY"}

    "LTFPHHRN"

    Returns: 8

  13. {"CDEGHIKLMNPQW"}

    "SGFNSNCIEDRMVTRYMLFVHIMWPIKFHLGKR"

    Returns: 18

  14. {"CDEGHILSTVW","ADEGHIKLMPQRTW","CEFIKNW","ADEFGMNPQRSVWY","AEHILMNPRTY","ACDFHLMNPSTY","CDFGHINPRSY","AFIKLNPSWY","CEFGHRSTW","CHIQTWY"}

    "YTMTVGSSEWCTRM"

    Returns: 32

  15. { "DEFHIMQTY", "ACDEFGHILNSTY", "EFLRSTY", "FIKMQV", "CEGINRVWY", "ACDEFGHLPQSTVW" }

    "KDFWTDMGDTEGDHVIIKPLFDVQIPNFLYHFAA"

    Returns: 61

  16. { "DEFHIMQTY", "ACDEFGHILNSTY", "EFLRSTY", "FIKMQV", "CEGINRVWY", "ACDEFGHLPQSTVW" }

    "KDFWTDMGDTEGDHVIIKPLFDVQIPNFLYHFAA"

    Returns: 61

  17. { "DEFHIMQTY", "ACDEFGHILNSTY", "EFLRSTY", "FIKMQV", "CEGINRVWY", "ACDEFGHLPQSTVW" }

    "KDFWTDMGDTEGDHVIIKPLFDVQIPNFLYHFAA"

    Returns: 61


This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2024, TopCoder, Inc. All rights reserved.
This problem was used for: