CS405

HW #3

Due Friday, November 20, Midnight

48 points total

  

1)  (5 points) Gordo hypothesizes that the human mind operates based on predicate calculus.  For example, to determine that a sparrow is a bird, we would use a process like forward chaining in our brains:  sparrow(x) --> bird(x).  Based on the sentence verification experiment, what evidence supports Gordo's hypothesis?  What evidence contradicts Gordo's hypothesis?
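For reference, the forward-chaining process mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch, not a claim about how the mind or any particular inference engine works. The function name `forward_chain`, the individual `tweety`, and the second rule (bird --> animal) are assumptions added for illustration; only sparrow(x) --> bird(x) comes from the question.

```python
def forward_chain(facts, rules):
    """Apply single-premise rules repeatedly until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, arg in list(derived):
                if pred == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))
                    changed = True
    return derived

# The first rule is from the question; the second is a hypothetical addition.
rules = [("sparrow", "bird"), ("bird", "animal")]
facts = {("sparrow", "tweety")}  # "tweety" is a made-up individual
result = forward_chain(facts, rules)
```

Under this model, deriving animal(tweety) takes one more rule application than deriving bird(tweety), which is the kind of prediction a sentence verification experiment can check.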

2) (8 points)  Give a semantic network/conceptual graph representation for the sentences "John had lobster" (meaning John ate lobster) and "John had hair" (meaning John possesses hair).

      Consider a computer program that is trying to parse English sentences into a semantic network.  Assume the conceptual nodes of the network are already in place, but not the instances for a particular sentence.  How might the semantic network be used to disambiguate between the meanings of the two sentences?

3)  (20 points)  Decision Trees.

Consider the problem of learning the concept of whether or not to purchase a music CD. To keep things simple enough to work through this problem by hand, here is a very small number of examples from which we want to learn the concept.

Assume you are using the following attributes to describe the examples:

     TYPE         possible values:     Rock, Jazz, HipHop
     PRICE        possible values:     Cheap, Expensive

(Since each attribute value starts with a different letter, for shorthand we'll just use that initial letter, e.g., 'J' for Jazz.)  Our output decision is binary-valued, so we'll use '+' and '-' as our concept labels, indicating a "buy" recommendation or not, respectively.

Here is the training set: 
     TYPE = H    PRICE = E    CATEGORY = +
     TYPE = R    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = +
     TYPE = H    PRICE = C    CATEGORY = +
     TYPE = J    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = -
     TYPE = J    PRICE = E    CATEGORY = -
     TYPE = J    PRICE = C    CATEGORY = -
     TYPE = H    PRICE = E    CATEGORY = +
     TYPE = J    PRICE = E    CATEGORY = -
     TYPE = R    PRICE = E    CATEGORY = -
     TYPE = J    PRICE = C    CATEGORY = +
     TYPE = R    PRICE = E    CATEGORY = -

Here is the test set: 
     TYPE = R    PRICE = C    CATEGORY = +
     TYPE = J    PRICE = C    CATEGORY = -
     TYPE = J    PRICE = E    CATEGORY = -
     TYPE = R    PRICE = E    CATEGORY = +
     TYPE = H    PRICE = E    CATEGORY = +

a) Use the ID3 decision tree algorithm with information theoretic test selection (chapter 10 in the book) to construct a tree from the training set.  Show all of your work.  You don't need to write an actual program but can complete this exercise on paper.

If multiple features tie for best, choose the one that comes first in alphabetical order (e.g., cheap before expensive before hiphop before jazz before rock).  If there are no further features to select at a node and that node has an equal number of + and - examples, then choose - as the category.
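As a reminder of the quantities involved in information theoretic test selection, here is a minimal Python sketch of entropy and information gain. It is not a full ID3 implementation and it is not run on the training set above. The function names and the `toy` data set (attributes COLOR and SIZE) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in sorted({attrs[attribute] for attrs, _ in examples}):
        subset = [label for attrs, label in examples
                  if attrs[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(examples, attributes):
    """Pick the highest-gain attribute; max() keeps the first maximum,
    so sorting first breaks exact ties alphabetically."""
    return max(sorted(attributes),
               key=lambda a: information_gain(examples, a))

# Hypothetical toy data, unrelated to the CD examples above.
toy = [
    ({"COLOR": "red", "SIZE": "big"}, "+"),
    ({"COLOR": "red", "SIZE": "small"}, "+"),
    ({"COLOR": "blue", "SIZE": "big"}, "-"),
    ({"COLOR": "blue", "SIZE": "small"}, "-"),
]
```

On the toy data, COLOR separates the classes perfectly while SIZE tells us nothing, so `best_attribute(toy, ["SIZE", "COLOR"])` selects COLOR. Note that floating-point rounding can hide an exact tie, so when working by hand, compare the gains symbolically.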

b) Test your decision tree on the test set and report the percent correct on these examples.  Briefly discuss your results.

c) Suppose 4-fold cross validation is used to train and test the decision tree.  Describe the steps that would be taken to train and test the classifier under this scenario.  Why should this give a better picture of the effectiveness of the classifier compared to a single training set and a single test set?
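The mechanics of k-fold cross validation can be sketched as follows. This is one possible way to form the folds, assuming the examples have already been shuffled. The names `k_fold_splits`, `cross_validate`, `train_fn`, and `accuracy_fn` are hypothetical placeholders, not part of any library.

```python
def k_fold_splits(examples, k):
    """Yield (train, test) pairs; each example appears in exactly one test fold."""
    # Assumes `examples` is already shuffled; slicing with a stride
    # of k deals the examples into k folds of nearly equal size.
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

def cross_validate(examples, k, train_fn, accuracy_fn):
    """Train and test k times, then average the k accuracy scores."""
    scores = []
    for train, test in k_fold_splits(examples, k):
        model = train_fn(train)                  # build a classifier on k-1 folds
        scores.append(accuracy_fn(model, test))  # evaluate on the held-out fold
    return sum(scores) / k
```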

 

4)     (20 points) Write a program that implements either a 3-Nearest Neighbor classifier or a Bayesian classifier assuming conditional independence, and that works on the following data.  This data encodes various animal features (e.g., aquatic, breathes, venomous, etc.) to try to differentiate classes of animals in a zoo.  Train on the training data and test on the testing data (no need for cross validation, to make this a bit easier).  Output the percentage correct on the test data.

            zoo-train.txt

            zoo-test.txt

      Here is a description of the data:  zoo-info.txt

      You can ignore the first attribute (which is the animal name).  As described in the info file, the last number for each entry indicates the class for the animal.  Use any programming language you wish.
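      If you choose Python, reading an entry and scoring predictions might look like the sketch below. It assumes each entry is one comma-separated line; check zoo-info.txt and the data files to confirm the actual delimiter. The function names and the sample line are hypothetical, and the classifier itself is left to you.

```python
def parse_line(line):
    """Split one entry into (features, label), skipping the animal name."""
    fields = line.strip().split(",")        # assumption: comma-separated values
    features = [int(v) for v in fields[1:-1]]
    label = int(fields[-1])                 # last number is the class
    return features, label

def percent_correct(predicted, actual):
    """Percentage of predictions that match the true class labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

# Made-up entry for illustration only; real entries have more attributes.
features, label = parse_line("examplebeast,1,0,4,1\n")
```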