The writeups of these solutions have been provided by Immo Hüneke of Zühlke. Many thanks Immo!
Although there were three in our pair, we failed to complete the
challenge. Our approach was reasonably plausible, but it failed,
probably because we were trying to use too many unfamiliar
components.
Our intended approach was to use an
analysis tool to spit out a CSV file, which we would then pull into
Excel to plot the graph.
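
The CSV step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what we built on the day: the metric (non-blank lines per `.java` file), the column names and the function names are all assumptions chosen for the example.

```python
import csv
import io
from pathlib import Path


def count_code_lines(text):
    """Count non-blank lines in a source file's text (a crude size metric)."""
    return sum(1 for line in text.splitlines() if line.strip())


def collect_metrics(root):
    """Return (relative path, non-blank line count) pairs for .java files under root."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append((path.relative_to(root).as_posix(), count_code_lines(text)))
    return rows


def metrics_to_csv(rows):
    """Render the rows as CSV text with a header line (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "non_blank_lines"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    import sys

    # Usage: python metrics.py path/to/source > metrics.csv
    root_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print(metrics_to_csv(collect_metrics(root_dir)), end="")
```

Running this once per revision and adding a revision column would give Excel something to plot over time, but a real metrics tool would report far more than line counts.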
Checkstyle seemed like a good tool
for obtaining code metrics, but we found it difficult to become
productive with. Immo had used it before, but only as a pre-installed
plugin under IntelliJ (not available in this context).
JDepend was much
easier to install and use, but unfortunately it works only on
compiled classes, which wasn't obvious at the outset.
The tool at http://metrics.sourceforge.net/
is the most promising. It produces the numbers we want and can be run
outside Eclipse as a standalone Ant task.
As a result of
spending too long finding ways to get the basic numbers, we were
unable to integrate the three main components of the solution.
There
are actually commercial tools available that come very close to
achieving what we want. See http://www.cenqua.com/fisheye.
It links to a Subversion or CVS repository (many others are supported)
and provides graphical views of code size, volatility and so on.
Unfortunately, what is currently missing is quality metrics.
The Subversion repositories used by the different
teams turned out to be a major bottleneck.
We got a basic solution working but then tried to enhance it. The
approach involved HTML scraping to obtain the basic horoscope
text (from Yahoo). Unfortunately, after half a dozen tests of the
solution, Yahoo stopped answering our requests.
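
For anyone wanting to try the scraping step, here is a minimal sketch in Python using only the standard library. It assumes the horoscope text sits inside ordinary `<p>` elements, which may well not match the real page. The class and function names are made up for the example, and it works on an HTML string rather than fetching anything.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text content of <p> elements (assumed to hold the horoscope)."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # > 0 while we are inside a <p> element
        self._current = []     # text fragments of the paragraph being read
        self.paragraphs = []   # finished paragraphs, whitespace-normalised

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._current).split())
                if text:
                    self.paragraphs.append(text)
                self._current = []

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Saving the page to disk once and testing against the saved copy would probably have avoided hammering the site with repeated requests.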
From there, it
was easy to put the text into an on-line text summariser such as
http://textmining.i2r.a-star.edu.sg/people/kanagasa/ts/cgi-bin/sumnew.pl
or http://swesum.nada.kth.se/index-eng.html
to get a one-line summary.
However, Nat changed the
requirements in consultation with one of our team, so that he now
wanted an overall plus/minus verdict. So then we were into analysing
the frequencies of words in the horoscope page and using the WordNet
database to categorise the most frequently occurring non-stopwords.
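
The word-frequency idea can be sketched as follows. Note the swap: instead of WordNet, this example uses tiny hand-picked word lists, which are purely hypothetical and far too small for real use. The function names and the `top_n` cut-off are likewise assumptions.

```python
import re
from collections import Counter

# Hypothetical, hand-picked lists standing in for a proper stopword list
# and for the WordNet-based categorisation we had in mind.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "will",
             "you", "your", "but", "with", "for", "this"}
POSITIVE = {"good", "luck", "love", "success", "happy", "joy"}
NEGATIVE = {"bad", "trouble", "conflict", "loss", "sad", "delay"}


def word_frequencies(text):
    """Count occurrences of each non-stopword, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def verdict(text, top_n=10):
    """Return "+", "-" or "0" based on the most frequent non-stopwords."""
    score = 0
    for word, count in word_frequencies(text).most_common(top_n):
        if word in POSITIVE:
            score += count
        elif word in NEGATIVE:
            score -= count
    if score > 0:
        return "+"
    if score < 0:
        return "-"
    return "0"
```

A horoscope that mentions "luck" and "love" more often than "delay" comes out as a plus. Anything subtler (negation, sarcasm, vague astrology-speak) is beyond this approach.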
Needless
to say, we ran out of time. But here's a good reference to a
recipe-book of text manipulation using Unix tools:
http://www.dsl.org/cookbook/cookbook_16.html
We found that the site http://www.larnercorp.com/trumps/
allows you to enter the information to create your own top trump
cards. I registered and created a card as an experiment, but we
didn't fancy trying to screen-scrape the form to generate the
cards.
Second attempt: split the problem into two (data
capture and card generation). Rusty PHP scripting skills were exhumed
to create an HTML template that looked reasonably OK (attached). We
started with a sample CSV file with just two entries.
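
The card-generation half can be sketched like this. It is written in Python rather than the PHP we actually used, and it is not the attached template. The CSV column names (`name` plus arbitrary statistic columns) and the HTML layout are assumptions made for the example.

```python
import csv
import io
from html import escape
from string import Template

# Hypothetical card layout; the real template looked different.
CARD_TEMPLATE = Template(
    '<div class="card">\n'
    "  <h2>$name</h2>\n"
    "  <ul>\n$stats\n  </ul>\n"
    "</div>"
)


def render_cards(csv_text):
    """Turn CSV text with a 'name' column into one HTML fragment per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cards = []
    for row in reader:
        name = escape(row["name"])
        # Every column other than 'name' becomes one statistic on the card.
        stats = "\n".join(
            f"    <li>{escape(key)}: {escape(value)}</li>"
            for key, value in row.items()
            if key != "name"
        )
        cards.append(CARD_TEMPLATE.substitute(name=name, stats=stats))
    return cards
```

Keeping the data in CSV means the capture half can be done by hand, by script or by scraping without touching the card layout.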
However, time ran
out before we could scrape all the information off the Web automatically -
hats off to the wizard teams who managed to use search engines and
regex processing to pull significant information about all the
participants.
At least we didn't run into problems with search
engines blocking us!