TopCoder Statistics - Problem Statement

PROBLEM STATEMENT

A digraph is a sequence of two letters. A frequency count of digraphs and even
trigraphs, not only of single letters, often gives a good analysis of an
encrypted message. Since digraphs appear anywhere in the English language
(i.e., the beginning of sentences, the middle of words, etc.), they are not
case-sensitive. Punctuation, digits, and spaces are not counted as part of
digraphs.

A one letter word is not a digraph.
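
The definition above can be sketched as a counting pass over adjacent
characters. This is a minimal sketch, not the reference solution: the class
name DigraphCounter and the method count are hypothetical, and it assumes that
any non-letter character ends the current run of letters, which is what the
"a box" example below suggests.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class DigraphCounter {
    // Counts every pair of adjacent letters, ignoring case.
    // Assumption: a space, digit, or punctuation mark breaks the run,
    // so "a box" yields "bo" and "ox" but never "ab".
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        String lower = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < lower.length(); i++) {
            char first = lower.charAt(i);
            char second = lower.charAt(i + 1);
            if (isLetter(first) && isLetter(second)) {
                counts.merge("" + first + second, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The constraints allow only A-Z and a-z as letters.
    static boolean isLetter(char c) {
        return c >= 'a' && c <= 'z';
    }
}
```

Under that assumption, DigraphCounter.count("a box") holds "bo" and "ox" with
a count of 1 each, and a one-letter word such as "a" contributes nothing.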

DEFINITION
Class name: Digraph
Method name: mostcommon
Parameters: String, int
Return type: String[]

The method signature (make sure it is declared public) is:
String[] mostcommon (String text, int max);

max is an integer that tells how many of the most common digraphs to return.
Since digraphs are not case-sensitive, return everything in lowercase letters.
Given a String of text, determine the most common digraphs in the text and
return the max most common ones, sorted first by number of occurrences in
descending order and then lexicographically in ascending order. That is, if
two digraphs have the same number of occurrences, the one that is first
lexicographically should come before the other.
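
The ordering rule above can be sketched as a two-level comparison. The class
name DigraphOrder and the method rank are hypothetical, and the input is
assumed to be a map from each lowercase digraph to its number of occurrences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DigraphOrder {
    // Returns the digraphs sorted by count (descending), with ties
    // broken by lexicographic order (ascending).
    static List<String> rank(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ranked.add(entry.getKey());
        }
        return ranked;
    }
}
```

Comparing b against a for the count and a against b for the key is what gives
the mixed descending and ascending order.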

If there are fewer than max unique digraphs, return a String[] containing only
the existing digraphs. For instance, given:
mostcommon("Hello", 10)
the only digraphs are "he", "el", "ll", and "lo", each appearing once. So,
return them in lexicographical order:
{"el","he","ll","lo"}

TopCoder will enforce the following restrictions:
* text will be between 0 and 50 characters in length, inclusive.
* text will contain only letters (A-Z,a-z), digits (0-9, inclusive),
punctuation (only ',' '.' '?'), and spaces.
* max will be between 0 and 10, inclusive.

Examples:
mostcommon("a box", 3)
"a" is not a digraph
"bo" appears once
"ox" appears once
We have 2 digraphs, so we only return a String[] of length 2, not 3. Return
value is {"bo","ox"}.

mostcommon("abracadabra", 2)
"ab" appears twice
"br" appears twice
"ra" appears twice
"ac" appears once
"ca" appears once
"ad" appears once
"da" appears once
So, the most common ones are "ab", "br", and "ra". Since we only want the two
most common ones, and these three have the same number of occurrences, we
return {"ab", "br"}, the two lowest lexicographically.

mostcommon("aaaabbbb", 1)
returns {"aa"}

mostcommon("aaaabbbb", 2)
returns {"aa","bb"}

mostcommon("aaabbbb", 2)
returns {"bb","aa"}

mostcommon("aaaabbbb", 3)
returns {"aa","bb","ab"}

mostcommon("IM TopCoder1 with ?", 10)
returns {"co", "de", "er", "im", "it", "od", "op", "pc", "th", "to"}
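
One way the whole method could be put together is sketched below. It is an
illustration rather than the official solution, and it assumes that every
non-letter character separates letter runs, an interpretation that is
consistent with all of the examples above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Declare the class public when submitting; it is left package-private
// here so the snippet compiles under any file name.
class Digraph {
    public String[] mostcommon(String text, int max) {
        String lower = text.toLowerCase(Locale.ROOT);

        // Count each pair of adjacent letters (assumed: non-letters break a run).
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < lower.length(); i++) {
            char a = lower.charAt(i);
            char b = lower.charAt(i + 1);
            if (a >= 'a' && a <= 'z' && b >= 'a' && b <= 'z') {
                counts.merge(lower.substring(i, i + 2), 1, Integer::sum);
            }
        }

        // Sort by count descending, then lexicographically ascending.
        List<String> digraphs = new ArrayList<>(counts.keySet());
        digraphs.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });

        // Return at most max entries, or fewer if not enough digraphs exist.
        int size = Math.min(max, digraphs.size());
        return digraphs.subList(0, size).toArray(new String[0]);
    }
}
```

With this sketch, new Digraph().mostcommon("abracadabra", 2) gives
{"ab","br"}, and an empty text or a max of 0 gives an empty String[].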
