# Exercises 2.1

DS 352 Syllabus

last updated 24-Aug-2020

## Chapter 2.1 selected exercises

Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio).
Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity.

Example: Age in years. Answer: Discrete, quantitative, ratio

• (a) Time in terms of AM or PM.
• (b) Brightness as measured by a light meter.
• (c) Brightness as measured by people's judgments.
• (d) Angles as measured in degrees between 0 and 360.
• (e) Bronze, Silver, and Gold medals as awarded at the Olympics.
• (f) Height above sea level.
• (g) Number of patients in a hospital.
• (h) ISBN numbers for books. (Look up the format on the Web.)
• (i) Ability to pass light in terms of the following values: opaque, translucent, transparent.
• (j) Military rank.
• (k) Distance from the center of campus.
• (l) Density of a substance in grams per cubic centimeter.
• (m) Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)

3. You are approached by the marketing director of a local company, who believes that he has devised a foolproof way to measure customer satisfaction. He explains his scheme as follows: "It's so simple that I can't believe that no one has thought of it before. I just keep track of the number of customer complaints for each product. I read in a data mining book that counts are ratio attributes, and so, my measure of product satisfaction must be a ratio attribute. But when I rated the products based on my new customer satisfaction measure and showed them to my boss, he told me that I had overlooked the obvious, and that my measure was worthless. I think that he was just mad because our best-selling product had the worst satisfaction since it had the most complaints. Could you help me set him straight?"

• (a) Who is right, the marketing director or his boss? If you answered, his boss, what would you do to fix the measure of satisfaction?
• (b) What can you say about the attribute type of the original product satisfaction attribute?

4. A few months later you are again approached by the same marketing director as in Exercise 3. This time, he has devised a better approach to measure the extent to which a customer prefers one product over other similar products. He explains, "When we develop new products, we typically create several variations and evaluate which one customers prefer. Our standard procedure is to give our test subjects all of the product variations at one time and then ask them to rank the product variations in order of preference. However, our test subjects are very indecisive, especially when there are more than two products. As a result, testing takes forever. I suggested that we perform the comparisons in pairs and then use these comparisons to get the rankings. Thus, if we have three product variations, we have the customers compare variations 1 and 2, then 2 and 3, and finally 3 and 1. Our testing time with my new procedure is a third of what it was for the old procedure, but the employees conducting the tests complain that they cannot come up with a consistent ranking from the results. And my boss wants the latest product evaluations, yesterday. I should also mention that he was the person who came up with the old product evaluation approach. Can you help me?"

• (a) Is the marketing director in trouble? Will his approach work for generating an ordinal ranking of the product variations in terms of customer preference? Explain.
• (b) Is there a way to fix the marketing director's approach? More generally, what can you say about trying to create an ordinal measurement scale based on pairwise comparisons?
• (c) For the original product evaluation scheme, the overall rankings of each product variation are found by computing its average over all test subjects. Comment on whether you think that this is a reasonable approach. What other approaches might you take?
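
One way to explore parts (a) and (b) is to check by brute force whether a set of pairwise results admits any ordering at all. The sketch below is only an illustration, not a prescribed solution: the function name `consistent_rankings` and the two sets of comparison outcomes are hypothetical, chosen to mirror the three comparisons (1 vs 2, 2 vs 3, 3 vs 1) described in the exercise.

```python
from itertools import permutations


def consistent_rankings(items, prefers):
    """Return every ordering of `items` that agrees with all pairwise results.

    `prefers` is a set of (winner, loser) pairs (a hypothetical encoding
    of the outcome of each pairwise comparison).
    """
    result = []
    for order in permutations(items):
        # Position 0 is the most preferred item in this candidate ordering.
        position = {item: i for i, item in enumerate(order)}
        if all(position[w] < position[l] for w, l in prefers):
            result.append(order)
    return result


# Hypothetical outcomes for one test subject (not real data).
transitive = {(1, 2), (2, 3), (1, 3)}  # 1 beats 2, 2 beats 3, 1 beats 3
cyclic = {(1, 2), (2, 3), (3, 1)}      # 1 beats 2, 2 beats 3, 3 beats 1

print(consistent_rankings([1, 2, 3], transitive))  # [(1, 2, 3)]
print(consistent_rankings([1, 2, 3], cyclic))      # []
```

Trying other combinations of outcomes shows which ones leave the testers with no ranking to report. The brute-force search over permutations is only practical for a handful of product variations.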

5. Can you think of a situation in which identification numbers would be useful for prediction?

7. Which of the following quantities is likely to show more temporal autocorrelation: daily rainfall or daily temperature? Why?
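
To make "temporal autocorrelation" concrete, the following minimal sketch computes a lag-1 sample autocorrelation in plain Python. The function name `lag_autocorrelation` and both series are made up for illustration; they are not rainfall or temperature measurements and are not meant to answer the exercise, only to show how the quantity behaves for a smoothly changing series versus a rapidly alternating one.

```python
def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag.

    Assumes the series is not constant (otherwise the denominator is zero).
    """
    n = len(series)
    mean = sum(series) / n
    # Total squared deviation from the mean (normalizing term).
    denom = sum((x - mean) ** 2 for x in series)
    # Sum of products of deviations for values `lag` steps apart.
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean)
        for t in range(n - lag)
    )
    return num / denom


# Synthetic, hypothetical series (not real observations).
smooth = [10, 11, 12, 13, 14, 15, 16]   # neighbors are similar
alternating = [0, 9, 0, 9, 0, 9, 0]     # neighbors are dissimilar

print(lag_autocorrelation(smooth))       # positive
print(lag_autocorrelation(alternating))  # negative
```

Applying the same function to actual daily records of each quantity, if available, is one way to check an answer empirically.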

9. Many sciences rely on observation instead of (or in addition to) designed experiments. Compare the data quality issues involved in observational science with those of experimental science and data mining.