Interpreting Cox Proportional Hazards Model Using Colon Dataset in R

Published in The Startup, Aug 1, 2020

The Cox proportional hazards model is used to identify significant predictors of time-to-event outcomes. It is especially relevant in disciplines such as oncology, where outcomes are usually time-to-event (e.g. overall survival and disease-free survival). A time-to-event outcome is complex: it involves censoring and has both a continuous component (the time) and a categorical component (whether the event occurred). This can make the model difficult to interpret at first. Hence, as in my previous article on logistic regression, I will examine how to interpret the R output for the Cox proportional hazards model and how to test the model's proportional hazards assumption.

Glossary of statistical terms used in later sections:

Overall survival: Usually defined as the time from randomization to death, or to the last follow-up (a censored observation) if no death was observed during the study period.
Hazard ratio: The hazard plays the same role in the Cox proportional hazards model that the odds play in logistic regression. The hazard ratio compares the hazard in one group with the hazard in a reference group (e.g. the experimental regimen vs standard treatment). It is the exponentiated coefficient from the Cox proportional hazards model, i.e. exp(coef).
Log-rank test: A test of the overall difference in survival among the groups compared. The null hypothesis is that all groups have the same survival curve.
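For reference, the hazard ratio definition above can be written in the usual textbook notation for the Cox model. The symbols h_0(t) (baseline hazard), x_j (covariates) and β_j (coefficients) are standard notation and are not taken from the article. The worked number uses the article's Levamisole + 5-FU result, where a hazard 40.1% lower corresponds to HR = 0.599:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \cdots + \beta_p x_p)

\mathrm{HR}_j = \frac{h(t \mid x_j + 1,\ \text{other } x \text{ fixed})}{h(t \mid x_j,\ \text{other } x \text{ fixed})}
             = \frac{h_0(t)\exp(\beta_j (x_j + 1))}{h_0(t)\exp(\beta_j x_j)} = \exp(\beta_j)

\text{change in hazard} = (\mathrm{HR}_j - 1) \times 100\%, \qquad
\mathrm{HR} = 0.599 \;\Rightarrow\; (0.599 - 1) \times 100\% = -40.1\%
```

The baseline hazard h_0(t) cancels in the ratio, so HR_j does not depend on t. This constancy over time is the proportional hazards assumption that is tested later in the article.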

For the dataset, I will use the colon dataset from the survival package. The data were collected in a clinical trial that tested adjuvant chemotherapy regimens (Levamisole and Levamisole + 5-FU) in patients with colon cancer. The dataset contains several variables, but we will focus on the following ones to build the Cox proportional hazards model:

Treatment (rx): Whether the patient was under Observation, Levamisole alone or Levamisole + 5-FU
Differentiation of the tumor (differ): Whether the tumor was well, moderately or poorly differentiated
Nodal involvement (node4): Whether more than 4 lymph nodes were involved
Sex (sex): Whether the patient was male or female
Time to event (time): Days until the event (in this case death) or censoring
Censoring status (status): Whether the event was observed (1) or censored (0)

For the analysis, I will focus on death as the event. This corresponds to overall survival (time from randomization in the trial to death).
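The article does not show how `colon_1` was created. The colon dataset stores two rows per patient: one for recurrence (etype = 1) and one for death (etype = 2). A focus on death therefore suggests that `colon_1` is the death subset. The sketch below builds it under that assumption, which is not confirmed by the source:

```r
# Sketch: build the analysis dataset used as `colon_1` in this article.
# ASSUMPTION: colon_1 keeps only the death records (etype == 2) of survival::colon.
library(survival)

colon_1 <- subset(survival::colon, etype == 2)  # one row per patient, event = death

# status = 1: death observed; status = 0: censored at last follow-up
table(colon_1$rx, colon_1$status)
summary(colon_1$time)  # follow-up time in days
```

Restricting to etype == 2 gives one row per patient. Keeping both event types in a standard Cox model would count each patient twice.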

Simple Cox proportional hazards model (univariable)

Since the data come from a randomized clinical trial, I will assume that randomization was effective (i.e. the characteristics of patients in the three groups were similar). In theory, a clinical trial with effective randomization does not need adjustment for confounders. I will first plot the survival curves stratified by treatment regimen, and then fit a Cox proportional hazards model with treatment regimen as the predictor. The plot is produced with the survival and survminer packages using the following code:

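library(survival); library(survminer)
# Kaplan-Meier curves of overall survival by treatment arm, with log-rank p-value
graph <- survfit(Surv(time, status == 1) ~ rx, data = colon_1)
OS_plot <- ggsurvplot(graph, data = colon_1, xlab = "Time in days",
                      risk.table = TRUE, pval = TRUE)
OS_plot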

Interpretation: The log-rank test gives a p-value < 0.0001. This indicates an overall difference in overall survival among the three groups (Observation, Levamisole alone and Levamisole + 5-FU). The graph suggests little difference in overall survival between the Observation and Levamisole groups. Overall survival does differ between Levamisole + 5-FU and the other two groups. The curves also do not cross, so the graph shows no obvious sign that the proportional hazards assumption is violated (discussed in a later section).
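The overall log-rank test behind that p-value can also be computed directly with `survdiff`. As far as I know, this is the test that `ggsurvplot(..., pval = TRUE)` reports by default. The second test below is an unadjusted two-group log-rank test comparing Observation with Levamisole alone. It is an illustrative addition to check the visual impression, and it assumes `colon_1` is the death subset of `survival::colon`:

```r
library(survival)
colon_1 <- subset(survival::colon, etype == 2)  # ASSUMPTION: death records only

# Overall log-rank test across the three treatment arms
lr <- survdiff(Surv(time, status == 1) ~ rx, data = colon_1)
lr
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

# Two-group log-rank test: Observation vs Levamisole alone (unused level dropped)
obs_lev <- droplevels(subset(colon_1, rx %in% c("Obs", "Lev")))
lr_obs_lev <- survdiff(Surv(time, status == 1) ~ rx, data = obs_lev)
lr_obs_lev_p <- pchisq(lr_obs_lev$chisq, df = 1, lower.tail = FALSE)

c(overall = lr_p, obs_vs_lev = lr_obs_lev_p)
```

If several such pairwise tests are run, their p-values should be adjusted for multiple comparisons (e.g. with `p.adjust`).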

Next, I fit the Cox proportional hazards model and obtain the hazard ratios with the following code:

model <- coxph(Surv(time, status == 1) ~ rx, data = colon_1)
summary(model)

Interpretation: For patients given Levamisole alone, the hazard of death is 1.5% lower (hazard ratio 0.985) than for the observation group (95% confidence interval: 0.799–1.22). This difference is not significant (p-value = 0.888), consistent with the confidence interval including 1. In contrast, for patients given Levamisole + 5-FU, the hazard of death is 40.1% lower (hazard ratio 0.599) than for the observation group (95% confidence interval: 0.475–0.756). This difference is significant (p-value < 0.01).
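The numbers in this interpretation come from the `conf.int` part of `summary(model)`. The sketch below pulls them into one table and converts each hazard ratio into a percentage change in hazard. It assumes `colon_1` is the death subset of `survival::colon`, and the column name `pct_change_in_hazard` is hypothetical:

```r
library(survival)
colon_1 <- subset(survival::colon, etype == 2)  # ASSUMPTION: death records only

model <- coxph(Surv(time, status == 1) ~ rx, data = colon_1)
ci <- summary(model)$conf.int  # columns: exp(coef), exp(-coef), lower .95, upper .95

hr_table <- data.frame(
  HR      = ci[, "exp(coef)"],
  lower95 = ci[, "lower .95"],
  upper95 = ci[, "upper .95"],
  # (HR - 1) * 100: negative values mean a lower hazard than Observation
  pct_change_in_hazard = (ci[, "exp(coef)"] - 1) * 100,
  row.names = rownames(ci)
)
round(hr_table, 3)
```

A hazard ratio whose 95% interval contains 1 (as for `rxLev`) corresponds to a non-significant Wald test at the 5% level.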

The conclusion from the model matches the survival curves: Levamisole + 5-FU improves the overall survival of patients with colon cancer.

Multiple Cox proportional hazards model

To illustrate the multiple Cox proportional hazards model, I will add differentiation of the tumor (differ), nodal involvement (node4) and sex (sex) to the model for adjustment.

model_1 <- coxph(Surv(time, status == 1) ~ rx + differ + node4 + sex, data = colon_1)
summary(model_1)

Interpretation: After adjusting for these variables, the hazard of death for patients given Levamisole alone is 0.28% lower (hazard ratio ≈ 0.997) than for the observation group (95% confidence interval: 0.806–1.23). This difference is not significant (p-value = 0.98). In contrast, the hazard of death for patients given Levamisole + 5-FU is 39.9% lower (hazard ratio ≈ 0.601) than for the observation group (95% confidence interval: 0.476–0.760). This difference is significant (p-value < 0.01). Neither hazard ratio changes by more than 10% after adjustment, which is a common change-in-estimate rule of thumb for confounding. These variables are therefore unlikely to confound the association between treatment regimen and time to death.
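The 10% change-in-estimate check can be computed rather than read off by eye. The sketch below assumes `colon_1` is the death subset of `survival::colon`. It enters `differ` as in the article; in `survival::colon` that variable is a numeric score from 1 to 3, so wrap it in `factor()` if separate hazard ratios per grade are wanted:

```r
library(survival)
colon_1 <- subset(survival::colon, etype == 2)  # ASSUMPTION: death records only

crude    <- coxph(Surv(time, status == 1) ~ rx, data = colon_1)
adjusted <- coxph(Surv(time, status == 1) ~ rx + differ + node4 + sex,
                  data = colon_1)

rx_terms <- c("rxLev", "rxLev+5FU")
hr_crude <- exp(coef(crude)[rx_terms])
hr_adj   <- exp(coef(adjusted)[rx_terms])

# Relative change in each treatment hazard ratio after adjustment, in percent
pct_change <- (hr_adj - hr_crude) / hr_crude * 100
round(cbind(hr_crude, hr_adj, pct_change), 3)

abs(pct_change) < 10  # TRUE: the HR moved by less than 10% after adjustment
```

`coxph` drops rows with missing covariates, so the adjusted model may use slightly fewer patients than the crude one. For a stricter comparison, refit both models on the same complete cases.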

Assumption of Proportional Hazards (PH) in the Cox PH model

A key assumption of the Cox PH model is that the hazards of the groups remain proportional throughout the follow-up period, i.e. the hazard ratio between any two groups is constant over time. Proportional hazards is likely to be violated if the survival curves cross each other (this is not seen in the graph earlier). A more formal check is a test based on scaled Schoenfeld residuals, implemented with the following code:

cox.zph(model_1)

Interpretation: From the output, the test is statistically significant for differentiation of the tumor and for nodal involvement. The global test is also statistically significant. This shows that the proportional hazards assumption is violated in the model.
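The sketch below reads the significant terms off the `cox.zph` table and plots the smoothed scaled Schoenfeld residuals against time. The 0.05 threshold and the plot layout are illustrative choices, and the sketch assumes `colon_1` is the death subset of `survival::colon`:

```r
library(survival)
colon_1 <- subset(survival::colon, etype == 2)  # ASSUMPTION: death records only

model_1 <- coxph(Surv(time, status == 1) ~ rx + differ + node4 + sex,
                 data = colon_1)
zph <- cox.zph(model_1)  # tests based on scaled Schoenfeld residuals
zph$table                # one row per term plus a GLOBAL row: chisq, df, p

# Terms (and GLOBAL) whose test rejects proportional hazards at the 5% level
flagged <- rownames(zph$table)[zph$table[, "p"] < 0.05]
flagged

# Estimated coefficient over time, one panel per term;
# a roughly flat curve is consistent with proportional hazards
par(mfrow = c(2, 2))
plot(zph)
```

The plots help show how the assumption fails, for example whether an effect fades after the first years. This matters when choosing the remedies discussed next.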

So what can be done when the proportional hazards assumption is violated? One way is to regroup the variable. For instance, since the assumption is violated for differentiation of the tumor, perhaps the groups with fewer patients can be merged into one group. Another way is to report the hazard ratio and its significance at specific, clinically relevant time points.
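Both remedies can be sketched in R. For regrouping, the merge below (pooling well and moderately differentiated tumors) is hypothetical; check the group sizes first. For time-specific hazard ratios, I use the step-function approach from the survival package's time-dependent coefficients vignette. It splits follow-up with `survSplit` and fits one `node4` hazard ratio per interval. The cut points at 1 and 3 years and the names `differ2` and `tgroup` are hypothetical, and `colon_1` is assumed to be the death subset of `survival::colon`:

```r
library(survival)
colon_1 <- subset(survival::colon, etype == 2)  # ASSUMPTION: death records only

# Remedy 1: regroup a variable. Inspect group sizes before merging.
table(colon_1$differ, useNA = "ifany")  # 1 = well, 2 = moderate, 3 = poor

# HYPOTHETICAL merge: pool well and moderate, keep poor separate
colon_1$differ2 <- factor(ifelse(colon_1$differ == 3, "poor", "well/moderate"),
                          levels = c("well/moderate", "poor"))
model_regroup <- coxph(Surv(time, status == 1) ~ rx + differ2 + node4 + sex,
                       data = colon_1)
cox.zph(model_regroup)  # re-check the assumption after regrouping

# Remedy 2: time-specific hazard ratios via a step function in time.
# HYPOTHETICAL cut points at 365 and 1095 days (about 1 and 3 years).
colon_split <- survSplit(Surv(time, status) ~ ., data = colon_1,
                         cut = c(365, 1095), episode = "tgroup")

# node4 gets a separate coefficient within each follow-up interval
model_tv <- coxph(Surv(tstart, time, status) ~ rx + differ2 + sex +
                    node4:strata(tgroup), data = colon_split)
exp(coef(model_tv))  # hazard ratios; node4 terms are per interval
```

Splitting follow-up does not change the data. It lets the `node4` hazard ratio take a different value in (0, 365], (365, 1095] and after 1095 days, so each can be reported for its own period.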

In conclusion, I have looked at how to interpret hazard ratios in the Cox proportional hazards model and how to test the model's proportional hazards assumption.


Written by YS Koh