Date: Fri, 14 Jan 94 19:35:47 PST
From: Ronny Kohavi
Message-Id: <9401150335.AA26467@Starry.Stanford.EDU>
To: George John, Karl Pfleger, Don
Subject: Be a judge for a bet
Cc: Scott Roy

Background: Scott Roy and I met today to discuss MLC++ and Scott's learning algorithm. He seemed too optimistic and complained of my criticisms. To settle such disparate views, we made a bet, and you three are asked to be the judges. We know your time is valuable, so you will each get $5 from the loser (or $2 from each party in case of a draw -- a $1 incentive to avoid a draw).

Ronny

This bet is between Ron Kohavi (referred to as Ronny) and Howard Scott Roy (referred to as Scott). The bet involves a machine learning experiment. Three judges, George John, Karl Pfleger, and Don Geddis, will determine who wins the bet, or whether it is a draw. If the judges find a winner, the loser will take the winner to a good Chinese dinner (good means that the cost of the meal for the winner is at least $30).

Details: Ronny will run C4.5, a commercially available learning algorithm. Scott will announce that he has a machine learning program BEFORE May 1, 1994. After Scott's announcement that he is ready, the judges will announce 3 datasets that they think are appropriate for comparing Scott's algorithm with Quinlan's C4.5. The datasets should consist of one artificial problem on a discrete domain, and 2 datasets for real-world problems taken from the Irvine (UCI) repository. The attribute values and names will be changed so as not to reveal the real database name. Both Ronny and Scott will receive the same TRAINING set to adjust their learning algorithms. One day will be given, after which Scott, Ronny, and a representative of the judges will meet. Scott's program and C4.5 will be executed and tested on the TEST SET (unknown to Scott and Ronny). The judges will then decide who won, or whether it is a draw.
The winner's algorithm must perform better on at least 2 out of the 3 datasets, where better means higher accuracy on the test set. If one of the programs aborts (crashes) or for some reason does not classify instances in the test set, that counts as a loss for that dataset. If Scott does not make his announcement before May 1st, he loses.

+-------------------------------------------------------------------+
| Ronny Kohavi - ronnyk@CS.Stanford.Edu                             |
|                                                                   |
| "The one real object of education is to leave a man in a          |
|  condition of continually asking questions" / Bishop Creighton    |
+-------------------------------------------------------------------+
_______________________________________________________________________________

Date: Fri, 29 Apr 94 09:55:20 -0700
From: H. Scott Roy
Message-Id: <9404291655.AA03661@Schmendrick.Stanford.EDU>
To: Ronny Kohavi
Subject: Re: Bet Scott vs Ronny
Cc: Don, George John, Karl Pfleger, Scott Roy

Hi folks,

I will indeed have a program ready. It will only be a shell of its full potential, but I'll hop on the woofing wagon early and confidently (?!) predict that C4.5 will go down to ignominious defeat. Garlic eggplant is my favorite, Ronny, just so you can start scouting the Chinese restaurants to find which one cooks up the best.

So I am hereby postdating this message to 4/30, 11:59 pm, and announcing that my program, MultiClass, is ready and waiting in its corner.

One question for Ronny: what, precisely, do you mean by accuracy? MultiClass generates probability distributions, so it can give the complete log likelihood of the test set based on its model. That measure has a distinct advantage in that a program gets penalized for making random guesses. I can, of course, also just determine the maximum-likelihood class and measure how many of those guesses are correct. Which measure shall we use? Judges?

Scott
_______________________________________________________________________________

Date: Tue, 3 May 94 14:33:13 -0700
From: H. Scott Roy
Message-Id: <9405032133.AA11132@Schmendrick.Stanford.EDU>
To: Ronny Kohavi
Subject: Game Time
Cc: Don, George John, Karl Pfleger, Scott Roy

Hi folks,

The hour is at hand. I've just unearthed the last bug I care to correct in my program and am ready to get under way. Shall we convene at a central site to run things? I'll hereby put on my most optimistic face and confidently predict a sound thrashing for C4.5.

Scott
_______________________________________________________________________________

Date: Tue, 3 May 94 14:50:59 -0700
From: H. Scott Roy
Message-Id: <9405032150.AA11161@Schmendrick.Stanford.EDU>
To: Ronny Kohavi
Subject: Re: The Last Dataset
Cc: Geddis@cs.stanford.edu, kpfleger@cs.stanford.edu, gjohn@cs.stanford.edu, hsr@cs.stanford.edu

| C4.5 runs fine on the datasets, so I guess Scott will have to pay up.
| Scott, don't forget that I'm a vegetarian when you scout for that
| restaurant. Italian is good, and Mondays and Fridays I'm going
| folk-dancing. This Thursday is fine.

Er, 'scuse me Ronny, but I haven't conceded just yet. My latest assessments leave me somewhat more optimistic than I was this morning (my training and testing routines were reading the data differently, with rather spectacular consequences for the accuracy results). Don't count your eggplants until we see which program wins. But win or lose, Thursday should be fine.

Scott
_______________________________________________________________________________

Date: Wed, 4 May 1994 14:46:22 +0800
From: Ronny Kohavi
Message-Id: <9405042146.AA28631@starry.Stanford.EDU>
To: kpfleger@hpp.stanford.edu
Cc: Scott Roy, George John, Don
Subject: Re: so what's the result?

Karl> So what happened?
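(An aside on Scott's earlier question about what "accuracy" should mean: his two candidate measures can be sketched as below. The function names, the toy dictionaries standing in for MultiClass's per-instance distributions, and the example numbers are all illustrative, not taken from the bet.)

```python
import math

def accuracy(probs, labels):
    # Fraction of instances whose most probable (maximum-likelihood)
    # class matches the true label -- the measure the bet ended up using.
    hits = sum(1 for p, y in zip(probs, labels) if max(p, key=p.get) == y)
    return hits / len(labels)

def mean_log_likelihood(probs, labels, eps=1e-12):
    # Average log-probability the model assigns to the true class.
    # Uniform guessing over k classes scores log(1/k), so hedged random
    # guesses are penalized relative to confident, correct predictions.
    return sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical predictions over two classes "a" and "b":
probs = [{"a": 0.9, "b": 0.1}, {"a": 0.5, "b": 0.5}]
labels = ["a", "b"]
print(accuracy(probs, labels))  # 0.5: the 50/50 guess scores a miss
print(mean_log_likelihood(probs, labels))
```

Note how the two measures diverge: the 50/50 prediction costs a full point of 0/1 accuracy either way, while under log likelihood it costs exactly log(1/2), no more and no less.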
Karl> My guess is that there weren't enough instances in either
Karl> George's or my training sets to constrain the large number of
Karl> degrees of freedom of Scott's models, preventing his stuff from
Karl> finding reasonable solutions, but I've been expecting to see an
Karl> announcement from the winner....

George's dataset: C4.5: 78.7, MultiClass: 65.6, baseline: 55.3
Karl's dataset:   C4.5: 74.2, MultiClass: 65.8, baseline: 63.3

Let me point out that while choosing C4.5's flags via 10-fold cross-validation (10-CV), I wrote down that I should use -m1 for George's dataset, which would have decreased the accuracy to 76, and -m40 for Karl's, which would have increased the accuracy to 75 (both insignificant variations in accuracy). With -m40, the tree has only 11 nodes, so the model is very comprehensible. Since every leaf has at least 40 instances, it seems the Pima dataset has enough instances, but not enough discriminatory power. I'll leave it to Scott to explain what happened.

BTW, I offered Scott another bet on June 1 (same conditions), but he declined.

+---------------------------------------------------------------------+
| Ronny Kohavi - ronnyk@CS.Stanford.Edu                               |
|                                                                     |
| Picture a robot on a psychiatric couch:                             |
| Doc, my intelligence may be artificial, but my problems are real.   |
+---------------------------------------------------------------------+
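(Editor's footnote: the "baseline" figures in Ronny's table are presumably the accuracy of the standard default rule that always predicts the majority class, the floor any learner must beat. A minimal sketch of that baseline, with made-up labels rather than the bet's actual data:)

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    # Find the most common class in the training labels, then score
    # the constant prediction of that class on the test labels.
    default = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == default) / len(test_labels)

# Hypothetical two-class label sets (not the bet's datasets):
train = ["neg"] * 55 + ["pos"] * 45
test = ["neg"] * 60 + ["pos"] * 40
print(majority_baseline(train, test))  # 0.6
```

On this reading, MultiClass's 65.8 on Karl's dataset beat its 63.3 baseline by only 2.5 points, which is consistent with Karl's too-many-degrees-of-freedom guess.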