Flip it 10 times more (and it lands heads 7 more times) and the interval is now 48% to 85%. If I flip it 10 times more and get the same result I'll have 21/30 flips landing heads, and the interval is 52% to 83%. That's giving me a good reason to believe this coin might be weighted... (A completely fair coin toss should have a 50% chance of landing on heads.)

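There's always luck of course, and that's what the confidence level is for. These coin toss intervals are computed at 95% confidence, which means that intervals built this way contain the coin's true chance of landing heads about 95% of the time. So there's still a small chance that the coin isn't weighted at all and it's all down to luck.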
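
For anyone who wants to check the coin toss numbers, here is a minimal sketch of the Wilson score interval in Python. The function name `wilson_interval` and the value `z=1.96` (the usual multiplier for 95% confidence) are our own choices for illustration, not necessarily what runs behind the site.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion."""
    if trials == 0:
        # No data at all: the proportion could be anything.
        return (0.0, 1.0)
    p = successes / trials          # observed proportion of heads
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return (centre - half, centre + half)

for heads, flips in [(14, 20), (21, 30)]:
    lo, hi = wilson_interval(heads, flips)
    print(f"{heads}/{flips}: {lo:.0%} to {hi:.0%}")
# 14/20: 48% to 85%
# 21/30: 52% to 83%
```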

Ref: Wilson, E. B. (1927). "Probable Inference, the Law of Succession, and Statistical Inference," Journal of the American Statistical Association 22, 209-212.

Or see:
http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
if you get off on complex mathematical formulae.

Now this is all very nice but what's it got to do with reviews?

For our purposes we're using this interval slightly differently - we compute our interval taking the number of good reviews out of the number of all good or bad reviews (we ignore neutrals). We then take the lower bound for our karma.

This number creeps up from zero towards the actual share of good reviews as the number of reviews increases, and it represents the lower bound of our confidence that a show is good. Taking the earlier weighted coin example, we'd score the coin at 52 (the lower bound) rather than 83 (the upper bound).
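
As a rough sketch of that calculation (assuming 95% confidence, `z=1.96`, and a score rounded onto a 0 to 100 scale; the function name `karma` and the rounding are ours for illustration), the lower bound can be computed like this:

```python
import math

def karma(good, bad, neutral=0, z=1.96):
    """Lower Wilson bound on the share of good reviews, scaled to 0-100."""
    n = good + bad  # neutral reviews are deliberately left out
    if n == 0:
        return 0    # no good or bad reviews yet, so no confidence at all
    p = good / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return round(100 * (centre - spread) / (1 + z2 / n))

print(karma(21, 9))  # 52, the weighted coin example
# The same 70% share of good reviews scores higher as the evidence piles up.
print(karma(7, 3), karma(70, 30), karma(700, 300))
```

The second line of output shows the creeping behaviour: the share of good reviews stays at 70% throughout, but the score only approaches 70 once there are plenty of reviews to back it up.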

How do you decide what's a good review or a bad one?

EdTwinge uses the Twitter API to pick up and monitor all tweets mentioning any of the acts at the 2009 Edinburgh Fringe Festival, plus the most commonly used Fringe-related hashtags, including our own #Edtwinge. We monitor both the full name of the act and the @profile names of those that have a Twitter presence.

We have established an extensive database of commonly used words and phrases that express positive or negative sentiment. Tweets containing these words or phrases are automatically scored as positive or negative and contribute to the karma rating accordingly.

Tweets that do not contain an instantly recognised sentiment are stored and manually checked, and we add new words and phrases to the database as we go.

Phrases are prioritised over single words. So "shit hot" will be correctly recognised as a positive review rather than a negative one.
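
One way to picture the phrase-first rule is the sketch below. The tiny `LEXICON`, the function name `score_tweet` and the matching details are made up for illustration; the real database is far larger, and the first matching entry simply decides the score here, which is a simplification.

```python
# Hypothetical mini-lexicon: 1 means positive, -1 means negative.
LEXICON = {
    "shit hot": 1,
    "brilliant": 1,
    "shit": -1,
    "boring": -1,
}

def score_tweet(text, lexicon=LEXICON):
    """Return 1 (positive), -1 (negative) or None (needs a manual check)."""
    # Lowercase and strip common punctuation so "hot!" matches "hot".
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    # Try longer entries first so phrases win over the single words inside them.
    entries = sorted(lexicon, key=lambda e: len(e.split()), reverse=True)
    for entry in entries:
        parts = entry.split()
        for i in range(len(words) - len(parts) + 1):
            if words[i:i + len(parts)] == parts:
                return lexicon[entry]
    return None  # no recognised sentiment: store for manual checking

print(score_tweet("That show was shit hot!"))  # 1
print(score_tweet("That show was shit."))      # -1
print(score_tweet("Saw a show tonight"))       # None
```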

Why should you trust our karma score?

It's a quantitative measure based on the views of the many rather than the few.

We've applied mathematical rigour to ensure that there is real statistical confidence behind the rating.

We only recognise one tweet (the most recent) from each Twitter profile about any given act. If you tweet once about each of lots of shows that's fine. But you can't artificially boost a show's karma score by repeatedly tweeting from a single account.
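
The one-tweet-per-profile rule can be sketched as follows. The field names (`profile`, `act`, `time`, `score`) and the function name `latest_per_profile` are assumptions for the example, not the actual EdTwinge data model.

```python
def latest_per_profile(tweets):
    """Keep only the newest tweet for each (profile, act) pair."""
    latest = {}
    for tweet in tweets:
        key = (tweet["profile"], tweet["act"])
        # Replace the stored tweet only if this one is more recent.
        if key not in latest or tweet["time"] > latest[key]["time"]:
            latest[key] = tweet
    return list(latest.values())

tweets = [
    {"profile": "@fan", "act": "Show A", "time": 1, "score": 1},
    {"profile": "@fan", "act": "Show A", "time": 2, "score": 1},   # repeat tweet
    {"profile": "@fan", "act": "Show B", "time": 3, "score": -1},
    {"profile": "@critic", "act": "Show A", "time": 4, "score": -1},
]
kept = latest_per_profile(tweets)
print(len(kept))  # 3
```

Here @fan's two tweets about Show A count once, while their tweet about Show B still counts separately.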
