PART I: Distribution, Z-score, Median

1. Describe the Distribution of a Quantitative Random Variable

(A) Marginal Distribution vs. Conditional Distribution

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

To compute it, divide each row total by the grand total (for the row variable), or each column total by the grand total (for the column variable).
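The row-total and column-total calculation can be illustrated with a minimal Python sketch. The two-way table (group labels "Group A"/"Group B", responses "Yes"/"No", and all counts) is hypothetical, and the function names are made up for this example.

```python
# Hypothetical two-way table of counts: rows = one categorical variable,
# columns = another. Labels and counts are invented for illustration.
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 25, "No": 25},
}

def grand_total(table):
    """Total number of individuals described by the table."""
    return sum(sum(row.values()) for row in table.values())

def marginal_rows(table):
    """Marginal distribution of the row variable: row total / grand total."""
    n = grand_total(table)
    return {r: sum(row.values()) / n for r, row in table.items()}

def marginal_cols(table):
    """Marginal distribution of the column variable: column total / grand total."""
    n = grand_total(table)
    columns = next(iter(table.values())).keys()
    return {c: sum(row[c] for row in table.values()) / n for c in columns}

print(marginal_rows(table))  # {'Group A': 0.5, 'Group B': 0.5}
print(marginal_cols(table))  # {'Yes': 0.55, 'No': 0.45}
```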

A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.

For variables X and Y: P(X = x | Y = y) = P(X = x and Y = y) / P(Y = y). In a two-way table, this is a cell count divided by its row total (or column total).
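A minimal Python sketch of a conditional distribution, using a hypothetical table (labels and counts invented for illustration). Dividing each cell in a row by that row's total gives the same numbers as the probability formula, because the grand total cancels out.

```python
# Hypothetical two-way table of counts (labels and counts are made up).
table = {
    "Group A": {"Yes": 30, "No": 20},
    "Group B": {"Yes": 25, "No": 25},
}

def conditional_given_row(table, row_value):
    """Distribution of the column variable among individuals whose row
    variable equals row_value: each cell count divided by the row total."""
    row = table[row_value]
    row_total = sum(row.values())
    return {col: count / row_total for col, count in row.items()}

def conditional_via_probabilities(table, row_value):
    """Same result via P(col and row) / P(row)."""
    n = sum(sum(r.values()) for r in table.values())
    p_row = sum(table[row_value].values()) / n
    return {col: (count / n) / p_row for col, count in table[row_value].items()}

for group in table:
    print(group, conditional_given_row(table, group))
# Group A {'Yes': 0.6, 'No': 0.4}
# Group B {'Yes': 0.5, 'No': 0.5}
```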

(B) Association

There is an association between two variables if knowing the value of one variable helps predict the value of the other. If knowing the value of one variable does not help you predict the value of the other, then there is no association between the variables.
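One way to check for association in a two-way table is to compare the conditional distributions of one variable across the values of the other: if they are all the same, knowing the row value does not help predict the column value. The Python sketch below does this for two hypothetical tables. The `max_gap` summary (the largest between-group difference in any category's conditional proportion) is an ad hoc illustration, not a standard statistic. With real sample data the conditional distributions almost never match exactly, so deciding whether a difference is meaningful is a separate question.

```python
def conditional_distributions(table):
    """Conditional distribution of the column variable for each row value."""
    result = {}
    for row_value, row in table.items():
        row_total = sum(row.values())
        result[row_value] = {col: count / row_total for col, count in row.items()}
    return result

def max_gap(table):
    """Largest difference, over column categories, between the highest and
    lowest conditional proportion across rows. 0 means identical conditional
    distributions (no association); larger values mean they differ more."""
    cond = conditional_distributions(table)
    columns = next(iter(table.values())).keys()
    return max(
        max(d[col] for d in cond.values()) - min(d[col] for d in cond.values())
        for col in columns
    )

# Hypothetical tables: the same Yes/No split in both groups vs. very different splits.
no_association = {"Group A": {"Yes": 30, "No": 20}, "Group B": {"Yes": 60, "No": 40}}
association = {"Group A": {"Yes": 45, "No": 5}, "Group B": {"Yes": 10, "No": 40}}

print(round(max_gap(no_association), 3))  # 0.0
print(round(max_gap(association), 3))     # 0.7
```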

(C) Shape, Outlier, Center, Spread (1 Variable)

  1. Shape
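    1. Roughly symmetric, skewed to the right, or skewed to the left
    2. Peaks: unimodal (one peak) or bimodal (two peaks), with each peak located at about [value + unit]
    3. Gaps and clusters: note any gaps in the data or clusters of values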
  2. Outlier
    1. There seems to be an outlier located at about …
      1. The interquartile range is IQR = Q3 – Q1 = [IQR]. By the 1.5 × IQR rule, values below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR are identified as outliers.
    2. There are no apparent outliers.
  3. Center
    1. The midpoint (median) of the [n] values shown in the graph is [median + unit], so a typical [individual] in the sample had a [variable] of about [median + unit].
      1. Mean: The mean is sensitive to extreme values; it is pulled toward them (toward the longer tail of a skewed distribution).
      2. Median: The median is a resistant measure of center; extreme values have little effect on it.
  4. Spread
    1. Range: The data vary from [min + unit] to [max + unit]. The range is [max] – [min] = [range].
    2. IQR: The first quartile Q1 is [Q1 + unit] and the third quartile Q3 is [Q3 + unit]. The interquartile range is IQR = Q3 – Q1 = [IQR].
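The numerical parts of this description (center, quartiles, IQR, the 1.5 × IQR outlier rule, and range) can be sketched in Python with the standard `statistics` module. Assumptions: the data values are hypothetical, and quartiles are computed as the medians of the lower and upper halves, leaving out the overall median when n is odd. Other software (spreadsheets, `numpy.percentile`) may interpolate differently and give slightly different Q1 and Q3. Shape still has to be judged from a graph.

```python
from statistics import mean, median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.
    The overall median is excluded from both halves when n is odd.
    Requires at least 2 values."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def describe(data):
    """Five-number summary, range, IQR, 1.5 x IQR fences, outliers, and mean."""
    xs = sorted(data)
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": xs[0], "Q1": q1, "median": median(xs), "Q3": q3, "max": xs[-1],
        "range": xs[-1] - xs[0],
        "IQR": iqr,
        "fences": (low_fence, high_fence),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
        "mean": mean(xs),
    }

# Hypothetical sample with one large value.
scores = [2, 4, 5, 5, 6, 7, 8, 9, 30]
summary = describe(scores)
print(summary["Q1"], summary["median"], summary["Q3"])  # 4.5 6 8.5
print(summary["IQR"], summary["fences"])                # 4.0 (-1.5, 14.5)
print(summary["outliers"])                              # [30]
print(round(summary["mean"], 2))                        # 8.44

# Resistance: drop the outlier and compare how far each measure moves.
trimmed = [x for x in scores if x not in summary["outliers"]]
print(mean(trimmed), median(trimmed))                   # 5.75 5.5
```

Removing the single outlier moves the mean from about 8.44 to 5.75, while the median only moves from 6 to 5.5.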

(D) Shape, Outlier, Center, Spread (2 Variables)

  1. Shape
    1. Roughly symmetric/skewed to the right/skewed to the left
      1. Distribution 1 is [roughly symmetric/skewed to the right/skewed to the left], while distribution 2 is [roughly symmetric/skewed to the right/skewed to the left].
      2. Both distributions are [roughly symmetric/skewed to the right/skewed to the left].
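    2. Peaks: unimodal (one peak) or bimodal (two peaks), with each peak located at about [value + unit]
    3. Gaps and clusters: note any gaps in the data or clusters of values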
  2. Outlier
    1. There seems to be an outlier located at about …
      1. The interquartile range is IQR = Q3 – Q1 = [IQR]. By the 1.5 × IQR rule, values below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR are identified as outliers.
    2. There are no apparent outliers.
  3. Center
    1. The median of distribution 1 is [median1] and the median of distribution 2 is [median2], so the median of distribution 1 is [larger than/smaller than/equal to] the median of distribution 2.
    2. With three distributions: distribution 1 has the largest median, [median1], followed by distribution 2 ([median2]); distribution 3 has the smallest median, [median3].
      1. Mean: The mean is sensitive to extreme values; it is pulled toward them (toward the longer tail of a skewed distribution).
      2. Median: The median is a resistant measure of center; extreme values have little effect on it.
  4. Spread
    1. Range: The data vary from [min + unit] to [max + unit]. The range is [max] – [min] = [range].
    2. IQR: The first quartile Q1 is [Q1 + unit] and the third quartile Q3 is [Q3 + unit]. The interquartile range is IQR = Q3 – Q1 = [IQR].
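When comparing several distributions, the same summaries can be computed for each group and the groups ordered by median, as in the three-distribution sentence in the Center item. This Python sketch uses hypothetical data, and it computes quartiles as medians of the lower and upper halves (median excluded when n is odd), which is one common convention among several.

```python
from statistics import median

def quartiles(data):
    """Q1, Q3 as medians of the lower/upper halves (overall median excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def summarize_groups(groups):
    """Median, IQR, and range for each group, plus the group names
    ordered from largest to smallest median."""
    stats = {}
    for name, data in groups.items():
        q1, q3 = quartiles(data)
        stats[name] = {
            "median": median(data),
            "IQR": q3 - q1,
            "range": max(data) - min(data),
        }
    order = sorted(stats, key=lambda name: stats[name]["median"], reverse=True)
    return stats, order

# Hypothetical data for three distributions.
groups = {
    "distribution 1": [12, 15, 17, 18, 20, 22, 25],
    "distribution 2": [8, 10, 11, 13, 14, 16, 30],
    "distribution 3": [3, 5, 6, 7, 9, 9, 10],
}
stats, order = summarize_groups(groups)
for name in order:
    print(name, stats[name])
# distribution 1 {'median': 18, 'IQR': 7, 'range': 13}
# distribution 2 {'median': 13, 'IQR': 6, 'range': 22}
# distribution 3 {'median': 7, 'IQR': 4, 'range': 7}
```

Distribution 2 contains a value (30) above its upper fence (16 + 1.5 × 6 = 25), so its range is much larger than its IQR. This is why medians and IQRs are usually compared, rather than means and ranges, when outliers or strong skewness are present.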