Home Assignment 2 for Epidemiology Course VHM 811 at AVC - Fall Semester 2009

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 811 homepage for confounding and interaction (the rules are the same as in VHM 801).

The home assignment has six questions which should all be answered. Your solution should include text explaining the procedures used even if the calculations were done using computer software; it is recommended to include a Stata do-file as documentation for your analysis.

The data for the assignment stem from a large study in the United Kingdom on the association between smoking and cancer and other chronic diseases. The target population was all British doctors, but we focus here on the male doctors. In 1951, a questionnaire was sent to all British doctors with questions about their smoking habits, as well as general demographic information such as age. We consider data for mortality due to "arteriosclerotic heart disease" during a 10-year period following study onset. The classification of the cause of death was based on standard death certificates. The table below gives the number of deaths of this type and the person-years at risk, classified by smoking status and age category based on the questionnaire data. For simplicity, we consider here only two smoking categories: non-smokers (defined as having never smoked the equivalent of one cigarette a day for as long as one year), and current smokers; thus, past smokers are not included in the data.

Age      Current smokers         Non-smokers             Total   % Non-
group    Deaths  Person-years    Deaths  Person-years            smokers
25-34         0         32548         0         15100     9230     25.3
35-44        34         63032         2         18790     8888     16.1
45-54       123         51562        12         10673     7105     12.0
55-64       231         36815        28          5710     4070      9.5
65-74       263         19456        28          2585     2688      8.5
75-84       192          9171        31          1462     1386     11.7
85+          37          1539        11           340      177     15.8

As supplementary information, the table also includes the total number of male doctors that responded to the initial questionnaire, and the percentage of these who were classified as non-smokers (it is not clear whether this percentage includes ex-smokers). A datafile is available (Stata format) for the data in the table.
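
As a quick plausibility check on the table, the age-specific mortality rates can be computed directly from the deaths and person-years. The following is a minimal sketch in Python, offered only as an illustration and not as a replacement for the recommended Stata do-file. The variable names (TABLE, rate_per_1000, rates) are hypothetical, and the numbers are transcribed by hand from the table above, so they should be checked against the supplied datafile.

```python
# Rows: (age group, smoker deaths, smoker person-years,
#        non-smoker deaths, non-smoker person-years).
# ASSUMPTION: values are transcribed by hand from the table in the text;
# verify them against the Stata datafile before relying on them.
TABLE = [
    ("25-34", 0, 32548, 0, 15100),
    ("35-44", 34, 63032, 2, 18790),
    ("45-54", 123, 51562, 12, 10673),
    ("55-64", 231, 36815, 28, 5710),
    ("65-74", 263, 19456, 28, 2585),
    ("75-84", 192, 9171, 31, 1462),
    ("85+", 37, 1539, 11, 340),
]


def rate_per_1000(deaths, person_years):
    """Mortality rate expressed as deaths per 1000 person-years."""
    return 1000.0 * deaths / person_years


# Map each age group to (rate among smokers, rate among non-smokers).
rates = {
    age: (rate_per_1000(d1, py1), rate_per_1000(d0, py0))
    for age, d1, py1, d0, py0 in TABLE
}

for age, (smokers, non_smokers) in rates.items():
    print(f"{age:>6}  smokers {smokers:6.2f}  non-smokers {non_smokers:6.2f}")
```

Rates per 1000 person-years are used here only for readability; any other multiplier would serve equally well.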

Question 1.
Identify the study type from the description given above; if you think some further clarification is needed to ensure that the study is actually of the type under consideration, explain and make appropriate assumptions. Draw a diagram for the causal structure you would hypothesize for how the two factors, smoking status and age group, affect mortality from arteriosclerotic heart disease. (Note: For simplicity we will in the following refer to the variables as simply smoking, age and mortality.)

Question 2.
Use the data to determine a relevant measure of association between smoking and mortality. Our focus here is on a "crude" measure of association, so age should be ignored for this calculation. Supplement by a (crude) assessment of the statistical significance of the association. (Hint: You may compute the measure of association by hand, but it is recommended to use Stata for the further analysis, both here and in the following questions. Consult the provided do-files of earlier sessions of VHM 811 as well as the do-files supplied with Chapter 6 of VER (1st edition) for hints on which command(s) to use.)
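
For those who wish to check a hand calculation, the sketch below shows one possible way to compute a crude rate ratio with person-time data. It is written in Python purely as an illustration; the assignment itself recommends Stata. The function name crude_irr is hypothetical, the incidence rate ratio is only one candidate for a "relevant measure", and the interval is a large-sample (Wald) approximation on the log scale, which is an assumption and not necessarily what the course do-files use. The counts are transcribed by hand from the table and should be verified.

```python
import math

# ASSUMPTION: per-stratum counts transcribed by hand from the table.
SMOKER_DEATHS = [0, 34, 123, 231, 263, 192, 37]
SMOKER_PY = [32548, 63032, 51562, 36815, 19456, 9171, 1539]
NONSMOKER_DEATHS = [0, 2, 12, 28, 28, 31, 11]
NONSMOKER_PY = [15100, 18790, 10673, 5710, 2585, 1462, 340]


def crude_irr(d1, py1, d0, py0, z=1.96):
    """Crude incidence rate ratio with an approximate Wald interval.

    d1, py1: deaths and person-years among the exposed (smokers)
    d0, py0: deaths and person-years among the unexposed (non-smokers)
    The standard error of log(IRR) is taken as sqrt(1/d1 + 1/d0).
    """
    irr = (d1 / py1) / (d0 / py0)
    se_log = math.sqrt(1.0 / d1 + 1.0 / d0)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Ignoring age means collapsing the table over all age groups.
irr, lower, upper = crude_irr(
    sum(SMOKER_DEATHS), sum(SMOKER_PY),
    sum(NONSMOKER_DEATHS), sum(NONSMOKER_PY),
)
print(f"crude IRR = {irr:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 corresponds roughly to significance at the 5% level; a formal test should still be reported in your solution.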

Question 3.
Determine also crude measure(s) of association between age and mortality, and supplement again with (crude) assessments of the statistical significance of these associations. (Hint: For this and the following questions, you will have to deal in a sensible way with the fact that age is categorized into multiple categories. For some calculations it may be appropriate to reduce the number of categories, perhaps even to dichotomize age at a suitable cut-off, but in other cases that may lead to an unnecessary loss of information. Whenever you decide to change the age categories, make sure to describe and motivate your decision.)

Question 4.
Carry out an epidemiological analysis to determine whether age is a confounder for the relation between smoking and mortality. Conversely, determine also whether smoking is a confounder for the relation between age and mortality. Postpone any discussion of a possible interaction between age and smoking to the next question. (Hint: You should use Stata for this part. If you cannot apply the same calculations/commands to assess confounding as described in Chapter 13 of VER, use the same general principles when applying different types of calculations/commands.)
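
One common general principle for assessing confounding is to compare the crude measure of association with a measure adjusted for the potential confounder. The sketch below, in Python and for illustration only, computes a Mantel-Haenszel pooled rate ratio for smoking across the age strata and prints it next to the crude rate ratio. This is an assumed approach, not necessarily the one described in Chapter 13 of VER; the function names are hypothetical, and the counts are transcribed by hand from the table and should be verified against the datafile.

```python
# Each stratum: (smoker deaths, smoker person-years,
#                non-smoker deaths, non-smoker person-years).
# ASSUMPTION: counts transcribed by hand from the table.
STRATA = [
    (0, 32548, 0, 15100),    # 25-34
    (34, 63032, 2, 18790),   # 35-44
    (123, 51562, 12, 10673), # 45-54
    (231, 36815, 28, 5710),  # 55-64
    (263, 19456, 28, 2585),  # 65-74
    (192, 9171, 31, 1462),   # 75-84
    (37, 1539, 11, 340),     # 85+
]


def crude_rate_ratio(strata):
    """Rate ratio after collapsing all strata into one table."""
    d1 = sum(s[0] for s in strata)
    py1 = sum(s[1] for s in strata)
    d0 = sum(s[2] for s in strata)
    py0 = sum(s[3] for s in strata)
    return (d1 / py1) / (d0 / py0)


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel pooled rate ratio for person-time data.

    Per stratum with total person-years T = py1 + py0:
    numerator term d1 * py0 / T, denominator term d0 * py1 / T.
    """
    numerator = 0.0
    denominator = 0.0
    for d1, py1, d0, py0 in strata:
        total = py1 + py0
        numerator += d1 * py0 / total
        denominator += d0 * py1 / total
    return numerator / denominator


crude = crude_rate_ratio(STRATA)
mh_irr = mantel_haenszel_rate_ratio(STRATA)
print(f"crude rate ratio:        {crude:.2f}")
print(f"age-adjusted (M-H) ratio: {mh_irr:.2f}")
```

How large a difference between the crude and adjusted values should count as confounding is a judgment you need to state and justify in your answer. Note also that a single pooled value is only meaningful if the stratum-specific ratios are reasonably homogeneous, which is the topic of the next question.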

Question 5.
Continue your epidemiological analysis from Q4 to investigate whether there is an interaction effect between smoking and age on mortality. Give appropriate measures of association to describe the relations of the two factors with mortality, and assess their statistical significance. As a summary of your analyses in Q4 and Q5, try to describe the causal structure in terms of the classification system introduced in Section 13.11 of VER.

Question 6.
Irrespective of your findings in Q4 and Q5, assume that confounding exists for one of the two relations studied above. If possible, choose a relation for which you described confounding; otherwise, choose freely one of the two relations as the focus of this question. Describe two methods to control for confounding in the design of a future study (involving the same factors and of the same type). Give sufficient detail to allow someone with no specific knowledge about study design to implement the selection of subjects for the new study.


Henrik Stryhn (hstryhn@upei.ca) 2009-11-16