Solution file for additional exercise 10.6
------------------------------------------

Data on ratings of machines operated by different persons.
The data are unbalanced because the machines are operated
different number of times by the different persons.

- notation:
y_ijk = rating for replication k of machine i by person j,
   i = 1,2,3 (machines),
   j = 1,...,6 (person),
   k = 1,...,n_ij (replication; n_ij's are either 1, 2 or 3),
- the effect of machines is fixed (the 3 machines are of specific interest),
- the effect of persons should probably be taken as random because we have no interest in
the particular persons but in the general variability between persons, persons 
correspond to blocks, (one may also take persons as fixed)
- the effect of machines*persons should be taken as random (our interest is in
general differences between machines, not differences for specific
persons), no matter whether persons are taken as fixed or random,
- model: 
y_ijk = mu + alpha_i + B_j + (AB)_ij + eps_ijk, 
      where B_j's are assumed i.i.d. N(0,sigma_A^2),
      where AB_ij's are assumed i.i.d. N(0,sigma_AB^2), and
      where eps_ijk's are assumed i.i.d. N(0,sigma^2).
- type of analysis: two-way ANOVA with mixed effects (both fixed and random),

This solution is based on analysis by the ANOVA-based estimation
approach, despite the data being unbalanced so that the results will not
match exactly those of likelihood-based estimation. From version 18 such
an analysis is also available in Minitab, but this solution has not been
updated to include it. Because the dataset is small, the inference 
following the mixed command in Stata is somewhat too liberal when used 
with its default error degrees of freedom (that can be overridden with 
the dfmethod option). It is also possible to analyze in SAS (proc mixed) 
or R (nlme library).

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_6.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_6.csv’
Worksheet was saved on 03/03/2011

MTB > Name C4 "SRES".
MTB > GLM;
SUBC>   Response 'rating';
SUBC>   Nodefault;
SUBC>   Categorical 'machine' 'person';
SUBC>   Random person;
SUBC>   Terms machine person machine*person;
SUBC>   Means machine;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK;
SUBC>  SResiduals 'SRES'.
General Linear Model: rating versus machine, person 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor   Type    Levels  Values
machine  Fixed        3  1, 2, 3
person   Random       6  1, 2, 3, 4, 5, 6

Analysis of Variance
Source            DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
  machine          2  1648.66        53.45%  1238.20  619.099    16.57    0.001  x
  person           5  1008.76        32.71%  1011.05  202.211     5.17    0.013  x
  machine*person  10   404.32        13.11%   404.32   40.432    46.34    0.000
Error             26    22.69         0.74%    22.69    0.873
Total             43  3084.43       100.00%
x Not an exact F-test.

Model Summary
       S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
0.934111  99.26%     98.78%      *           *

Coefficients
Term              Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant        59.657    0.153  (59.343, 59.971)   390.49    0.000
machine
  1             -7.296    0.230  (-7.769, -6.823)   -31.70    0.000  1.77
  2              0.681    0.218  ( 0.233,  1.130)     3.12    0.004  1.73
person
  1              1.409    0.417  ( 0.551,  2.267)     3.38    0.002     *
  2             -1.757    0.314  (-2.403, -1.111)    -5.59    0.000     *
  3              6.343    0.377  ( 5.568,  7.117)    16.84    0.000     *
  4              0.076    0.314  (-0.570,  0.722)     0.24    0.811     *
  5              3.009    0.314  ( 2.363,  3.655)     9.58    0.000     *
machine*person
  1 1           -1.770    0.631  (-3.067, -0.474)    -2.81    0.009     *
  1 2            1.696    0.475  ( 0.721,  2.672)     3.57    0.001     *
  1 3            1.296    0.604  ( 0.054,  2.539)     2.14    0.042     *
  1 4           -0.737    0.475  (-1.713,  0.239)    -1.55    0.133     *
  1 5           -4.004    0.439  (-4.907, -3.101)    -9.11    0.000     *
  2 1            2.252    0.626  ( 0.964,  3.539)     3.60    0.001     *
  2 2            0.985    0.433  ( 0.095,  1.876)     2.27    0.031     *
  2 3            0.519    0.513  (-0.536,  1.573)     1.01    0.321     *
  2 4            2.319    0.433  ( 1.428,  3.209)     5.35    0.000     *
  2 5            1.552    0.469  ( 0.588,  2.516)     3.31    0.003     *

Regression Equation
rating = 59.657 - 7.296 machine_1 + 0.681 machine_2 + 6.615 machine_3 + 1.409 person_1
         - 1.757 person_2 + 6.343 person_3 + 0.076 person_4 + 3.009 person_5 - 9.080 person_6
         - 1.770 machine*person_1 1 + 1.696 machine*person_1 2 + 1.296 machine*person_1 3
         - 0.737 machine*person_1 4 - 4.004 machine*person_1 5 + 3.519 machine*person_1 6
         + 2.252 machine*person_2 1 + 0.985 machine*person_2 2 + 0.519 machine*person_2 3
         + 2.319 machine*person_2 4 + 1.552 machine*person_2 5 - 7.626 machine*person_2 6
         - 0.481 machine*person_3 1 - 2.681 machine*person_3 2 - 1.815 machine*person_3 3
         - 1.581 machine*person_3 4 + 2.452 machine*person_3 5 + 4.107 machine*person_3 6
Equation treats random terms as though they are fixed.

Fits and Diagnostics for Unusual Observations
Obs  rating     Fit  SE Fit       95% CI        Resid  Std Resid  Del Resid       HI  Cook’s D
  1  52.000  52.000   0.934  (50.080, 53.920)   0.000          *          *  1.00000         *
  3  60.000  60.000   0.934  (58.080, 61.920)   0.000          *          *  1.00000         *
  7  64.000  64.000   0.934  (62.080, 65.920)   0.000          *          *  1.00000         *
  9  68.600  67.200   0.661  (65.842, 68.558)   1.400       2.12       2.29  0.50000      0.25
 22  44.800  46.800   0.539  (45.691, 47.909)  -2.000      -2.62      -3.00  0.33333      0.19
 24  65.800  67.200   0.661  (65.842, 68.558)  -1.400      -2.12      -2.29  0.50000      0.25
 35  49.200  46.800   0.539  (45.691, 47.909)   2.400       3.15       3.92  0.33333      0.28
Obs     DFITS
  1         *     X
  3         *     X
  7         *     X
  9   2.28518  R
 22  -2.12005  R
 24  -2.28518  R
 35   2.77284  R
R  Large residual
X  Unusual X

Expected Mean Squares, using Adjusted SS
   Source          Expected Mean Square for Each Term
1  machine         (4) + 2.1370 (3) + Q[1]
2  person          (4) + 2.2408 (3) + 6.7224 (2)
3  machine*person  (4) + 2.3162 (3)
4  Error           (4)

Means
Term     Fitted Mean
machine
  1          52.3611
  2          60.3389
  3          66.2722

Variance Components, using Adjusted SS
Source          Variance  % of Total    StDev  % of Total
person           24.2571      57.47%  4.92515      75.81%
machine*person   17.0791      40.46%  4.13269      63.61%
Error           0.872564       2.07%  0.93411      14.38%
Total            42.2088              6.49683
 
Residual Plots for rating 

MTB > NormTest 'SRES'.
Probability Plot of SRES 
The P-value for the Anderson-Darling test of normality is 0.054.

MTB > CDF -3.92 k1;
SUBC>   T 25.
MTB > let k2=k1*2*44
MTB > Print K1 K2.
Data Display 

K1    0.000304141
K2    0.0267644

MTB > FacPlot 'rating';
SUBC>   Factors machine person;
SUBC>   GInt;
SUBC>     Full.
Interaction Plot for rating 

MTB > Name c5 "ByVar1" c6 "ByVar2" c7 "Mean1"
MTB > Statistics 'rating';
SUBC>   By 'machine' 'person';
SUBC>   GValues 'ByVar1'-'ByVar2';
SUBC>   Mean 'Mean1'.

MTB > Name C8 "SRES_1".
MTB > GLM;
SUBC>   Response 'Mean1';
SUBC>   Nodefault;
SUBC>   Categorical 'ByVar1' 'ByVar2';
SUBC>   Terms ByVar1 ByVar2;
SUBC>   Means ByVar1 ByVar2;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TEquation;
SUBC>   TFactor;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK;
SUBC>  SResiduals 'SRES_1'.
General Linear Model: Mean1 versus ByVar1, ByVar2 

Factor Information
Factor  Type   Levels  Values
ByVar1  Fixed       3  1, 2, 3
ByVar2  Fixed       6  1, 2, 3, 4, 5, 6

Analysis of Variance
Source    DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
  ByVar1   2   584.7        51.29%   584.7  292.37    20.16    0.000
  ByVar2   5   410.4        36.00%   410.4   82.08     5.66    0.010
Error     10   145.0        12.72%   145.0   14.50
Total     17  1140.1       100.00%

      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
3.80779  87.28%     78.38%  469.777      58.80%

Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant  59.657    0.898  (57.658, 61.657)    66.47    0.000
ByVar1
  1        -7.30     1.27  (-10.12,  -4.47)    -5.75    0.000  1.33
  2         0.68     1.27  ( -2.15,   3.51)     0.54    0.603  1.33
ByVar2
  1         1.41     2.01  ( -3.06,   5.88)     0.70    0.499  1.67
  2        -1.76     2.01  ( -6.23,   2.71)    -0.88    0.402  1.67
  3         6.34     2.01  (  1.87,  10.81)     3.16    0.010  1.67
  4         0.08     2.01  ( -4.40,   4.55)     0.04    0.971  1.67
  5         3.01     2.01  ( -1.46,   7.48)     1.50    0.165  1.67

Regression Equation
Mean1 = 59.657 - 7.30 ByVar1_1 + 0.68 ByVar1_2 + 6.61 ByVar1_3 + 1.41 ByVar2_1 - 1.76 ByVar2_2
        + 6.34 ByVar2_3 + 0.08 ByVar2_4 + 3.01 ByVar2_5 - 9.08 ByVar2_6

Fits and Diagnostics for Unusual Observations
Obs  Mean1    Fit  SE Fit      95% CI      Resid  Std Resid  Del Resid        HI  Cook’s D     DFITS
 12  43.63  51.26    2.54  (45.60, 56.92)  -7.63      -2.69      -4.83  0.444444      0.72  -4.32380
Obs
 12  R
R  Large residual

Means
        Fitted
Term      Mean  SE Mean
ByVar1
  1      52.36     1.55
  2      60.34     1.55
  3      66.27     1.55
ByVar2
  1      61.07     2.20
  2      57.90     2.20
  3      66.00     2.20
  4      59.73     2.20
  5      62.67     2.20
  6      50.58     2.20
 
Residual Plots for Mean1 

MTB > NormTest 'SRES_1'.
Probability Plot of SRES_1 
The P-value for the Anderson-Darling test of normality is 0.353.

MTB > CDF -4.83 k1;
SUBC>   T 9.
MTB > let k2=k1*2*18
MTB > Print K1 K2.
Data Display 

K1    0.000466952
K2    0.0168103

MTB > Name c9 "ByVar3" c10 "Mean3"
MTB > Statistics 'rating';
SUBC>   By  'person';
SUBC>   GValues 'ByVar3';
SUBC>   Mean 'Mean3'.
MTB > NormTest 'Mean3'.
Probability Plot of Mean3 
The P-value for the Anderson-Darling normality test is 0.531.

Comments and answers to questions:
----------------------------------
The residuals look a bit strange because of 5 scattered residuals
with values that are somewhat more extreme than all the other residuals.
The two values for machine 2 and employee 3 don't agree well, and there
is also substantial disagreement between the three values of employee 6
at machine 1. However, the residuals are not very large, and there would
not seem to be any reason to exclude them, even if the (approximate)
outlier test based on the deletion residual gives weak significance
(P=0.03). One would be advised to look at the experimental protocols to 
see if some errors had occurred. Some observations do not have standardised 
residuals, because there was no replication at the corresponding 
person*machine level; it's not an error or a serious problem. 

The analysis of machine*person means additionally showed a very quite residual
(corresponding to random effect) for person=6 and machine=2 (this is also
seen when estimating random effects directly in the likelihood-based analysis
in Stata). Also this residual is significant by the outlier test. It could be 
suggested to inspect the circumstances of this particular set of values 
(which are pretty consistent). Also generally, person 6 is somewhat lower 
than the others, and one might consider omitting him from the sample due to 
the extreme variations across machines. The overall value for person 6 is 
not beyond what could be expected from a normal distribution sample.

The estimated variance components are given above. There is a strong
effect of machines (P=0.001 with the approximate test of the ANOVA
table), and the least squares means show machine 3 to perform best and
machine 1 to perform worst. The interaction plot shows roughly parallel
curves, except that machine 2 performs very poorly with employee 6 (as already
discussed above in the residuals analysis). No standard errors are given for 
the least squares means, and Minitab will not perform any pairwise comparisons. 
As judged from the means and the strong significance of machines, one would 
guess all machines to be statistically different. A likelihood-based analysis 
(mixed in Stata or proc mixed in SAS) gives the correct standard errors and provides
pairwise comparisons.

For comparison, below the results of the mixed procedure in Stata. It is
seen that the variance components are estimated at roughly similar
values as by the analysis in Minitab. 


. mixed rating i.machine || person: || machpers:, reml

Mixed-effects REML regression                   Number of obs     =         44
-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         person |          6          5        7.3          9
       machpers |         18          1        2.4          3
-------------------------------------------------------------
                                                Wald chi2(2)      =      39.93
Log restricted-likelihood = -90.935741          Prob > chi2       =     0.0000
------------------------------------------------------------------------------
      rating |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     machine |
          2  |   7.962446   2.214698     3.60   0.000     3.621717    12.30317
          3  |   13.91822   2.209355     6.30   0.000     9.587965    18.24848
             |
       _cons |     52.354   2.490616    21.02   0.000     47.47248    57.23552
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
person: Identity             |
                  var(_cons) |    22.4558   17.41414      4.911726    102.6651
-----------------------------+------------------------------------------------
machpers: Identity           |
                  var(_cons) |     14.234   6.515169       5.80385    34.90904
-----------------------------+------------------------------------------------
               var(Residual) |   .8708677   .2410661      .5062091    1.498216
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 88.29                 Prob > chi2 = 0.0000