Statistics

Problem Statement for "TandemRepeats"

Problem Statement

An important problem in DNA analysis is finding tandem repeats. A tandem repeat occurs when a base sequence of nucleotides appears twice in a row. (In this problem, a base sequence is simply a contiguous substring and a nucleotide is simply a letter.) For example, in the DNA sequence "GATCATCA", the base sequence "ATC" appears twice in a row. In contrast, "ATCGATC" is not a tandem repeat of the base sequence "ATC" because of the 'G' in between the two copies.

Real DNA often contains errors in which one nucleotide is substituted for another. A k-approximate tandem repeat occurs when a base sequence of nucleotides appears twice in a row, allowing the second copy of the base sequence to differ from the first copy in at most k nucleotides. For example, in the DNA sequence "GATCATGA", the base sequence "ATC" is immediately repeated with a single error as "ATG". Thus, "ATCATG" is a 1-approximate tandem repeat. Note that both copies of the base sequence must be the same length.

Given a DNA sequence dna (represented as a String) and a limit k on the number of errors, find the length of the longest sequence that appears in dna as the base sequence of a k-approximate tandem repeat. Note that a k-approximate tandem repeat contains at most k errors, but it is allowed to contain fewer errors.

Definition

Class:
TandemRepeats
Method:
maxLength
Parameters:
String, int
Returns:
int
Method signature:
int maxLength(String dna, int k)
(be sure your method is public)

Constraints

  • dna contains between 2 and 50 characters, inclusive.
  • Every character in dna is a 'G', 'A', 'T', or 'C'.
  • k is between 0 and 5, inclusive.

Examples

  1. "GATCATCA"

    0

    Returns: 3

    A tandem repeat with a base sequence of length 3 is "ATCATC", with base sequence "ATC".

  2. "GATCATGA"

    1

    Returns: 3

    A 1-approximate tandem repeat is "ATCATG". The base sequence "ATC" is repeated with a single error.

  3. "GATCATGA"

    0

    Returns: 0

    This example has no tandem repeats.

  4. "AGAGAAAGAA"

    3

    Returns: 5

  5. "ATTAGCATTGCACACCTTGAGGACTTAGACAAACCTAGTACACAGGTGTA"

    5

    Returns: 11

  6. "AATAGGACCCATTTTAGGGGGACCCCCC"

    5

    Returns: 6

  7. "TT"

    0

    Returns: 1

  8. "TG"

    0

    Returns: 0

  9. "GC"

    1

    Returns: 1

  10. "GC"

    5

    Returns: 1

  11. "CTGCTTCCAACTTCTAAAGCCCAATGCGCTTTTATTCACCCCCAAGAACT"

    5

    Returns: 8

  12. "GCTGGCAGCAAACGAGGGTTGGAATCAGATCGGCGACTTTCGTGACAACG"

    2

    Returns: 5

  13. "CAGCTCCTTGTAGCGTGCGGCTTTCCCGTCGACAGACAGCCATGACGATT"

    4

    Returns: 7

  14. "CCTCGTGATCGCATTTCGGTCCGTTGCATAGGCGCGGCTGGTTAGTAGGG"

    0

    Returns: 3

  15. "CATGTGCATGGATTAGTGGCTGGTGAGAAAAGACCGCGAGCACATCCAGT"

    5

    Returns: 11

  16. "ATCCACAAAACTCTCTGCTTCTCAGACCCCTGCCTTGTGGTTAGATGTTC"

    4

    Returns: 8

  17. "TCCCATATAATAATGGCCGGGCCTCGTCCCTAATCGAACGAGACAGTCAG"

    0

    Returns: 3

  18. "AGATGCGTAGTAACGTGCGTACTACCTGACTACACGCCCTGATGCGTTTA"

    1

    Returns: 4

  19. "TGCCTGATTGCCACCACATGGCTAGCACTTCGGTCACGGGGCTACCTCAG"

    0

    Returns: 3

  20. "TGCGGACAACATACTCTGCAGCGTGTATCATTGAAGGACTAGACTCCGAC"

    3

    Returns: 7

  21. "GTTAGCGTGTCATGGGCATCCCTTAGTTAGCGTGTCATGGGCATCCCTTA"

    0

    Returns: 25

  22. "GTTAGCGTGTCATGGGCATCCCTTAGTTTGCGTGTCATGGGCATCCCTTA"

    2

    Returns: 25

  23. "ATCAT"

    1

    Returns: 1

  24. "ATCGGTACCGTCGAGCGTCTTCATAGTTAGCGTGGT"

    0

    Returns: 1

  25. "GCCCTACCTCTCCGAAGGGCCAACTTCTCAGTCGTTG"

    2

    Returns: 6

  26. "TATCCATCGAATAATCGCGGACTAACTTCGACAACGAC"

    5

    Returns: 11

  27. "GACATTTCACACGGATTCGTCCAGCTCCCACCCCTGCAA"

    1

    Returns: 4

  28. "TGCAATACCGCTAACCATCGTTAGATGACTAACAGGACTA"

    4

    Returns: 10

  29. "CGAGAAAATGAGATTCGTTTACTGCTCAGGAACTCGTTTTA"

    1

    Returns: 3

  30. "CGGAGTTGATAAGTGTTGGGTAGGCGCCGGGCTCGTCAGTGG"

    0

    Returns: 2

  31. "TGGGCCGAGTTGCATTTGTGAAAAGTGGGAACTGAGCCGACTT"

    5

    Returns: 9

  32. "GTCCACACTTGAAACTAGGGACCCATGCTGGGGCTGGTAGGAAG"

    0

    Returns: 2

  33. "ATGACGTGCTGTTTTTACTGAATCGCAAGTAGCCTCGGCGTCCAC"

    1

    Returns: 3

  34. "CTGACAGTTCACGGCCGATTAGTATTGGCGCACGATATCGCATCCC"

    5

    Returns: 10

  35. "GGATGACATGGAAGAGGGATTTTCTACGTTCTACCAACTTTTATATT"

    3

    Returns: 7

  36. "CGGCGAGTACAACTTAAATGGCTCTACGTTCCATTAGTCGCGCCAGAT"

    0

    Returns: 2

  37. "GGTGCTAGGTCCCATATGAGAGGGCGAACAGGTCTGTACTGATGCCTAG"

    1

    Returns: 3

  38. "TGCTGTGTACTGACGTTGGAGAACACCCGACGAAATCCAGCGCTGGGCCT"

    2

    Returns: 5

  39. "ATTAGCATTGCACACCTTGAGGACTTAGACAAACCTAGTACACAGGTGTA"

    5

    Returns: 11

  40. "AA"

    0

    Returns: 1

  41. "AAAAAA"

    1

    Returns: 3

  42. "CC"

    1

    Returns: 1

  43. "GATCAGATCGACAGGGGACGATACAGATTAGACGAGATTAAACCGGATTA"

    4

    Returns: 10

  44. "AA"

    5

    Returns: 1

  45. "GCA"

    5

    Returns: 1

  46. "AG"

    1

    Returns: 1

  47. "AGAGAAAGAA"

    3

    Returns: 5

  48. "AT"

    1

    Returns: 1

  49. "AA"

    1

    Returns: 1

  50. "AAA"

    5

    Returns: 1

  51. "GATCATCA"

    0

    Returns: 3

  52. "GG"

    1

    Returns: 1

  53. "AGTCTAGCC"

    0

    Returns: 1

  54. "AA"

    2

    Returns: 1

  55. "AGTCCTGGTCTACTA"

    0

    Returns: 3

  56. "GATC"

    4

    Returns: 2

  57. "GATCC"

    0

    Returns: 1

  58. "GATC"

    1

    Returns: 1

  59. "GG"

    0

    Returns: 1


This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2024, TopCoder, Inc. All rights reserved.
This problem was used for: