Estimating Treatment Effect of Adjuvant Chemotherapy in Elderly Patients With Stage III Colon Cancer Using Bayesian Networks

Advanced statistical analysis (with BNs) suggests that oxaliplatin should be avoided in elderly patients.


INTRODUCTION
With over 8,000 new cases annually, which constitutes approximately 7% of the total cancer incidence, colon cancer is the fourth most common type of cancer in the Netherlands. 1 Given that treatment is always a balance between cost (both financial and physical) and benefit, it is important to continuously assess and improve the effect of (adjuvant) treatment.
For stage III colon cancer, Dutch guidelines recommend treatment with surgery and adjuvant chemotherapy with either capecitabine and oxaliplatin (CAPOX) or infusional folinic acid-fluorouracil-oxaliplatin (FOLFOX).However, in elderly patients, the benefit of oxaliplatin is still a point of contention. 2,3ile 59% of the patients with colon cancer are older than 70 years, elderly patients are frequently excluded from clinical trials on the basis of age alone.Even if age does not rule out participation, elderly patients included may still not be representative for daily clinical practice as they must be fit enough to satisfy other inclusion criteria.effect, observational data are sensitive to bias, specifically selection biases like confounding by indication.For example, if predominantly relatively healthy patients receive treatment, this will skew results in favor of treatment.
5][6][7][8][9] In any case, correction requires a decision on which variables should be included as confounders.1][12] The pretreatment criterion would select all pretreatment variables as confounders.The common cause criterion would, as the name suggests, only correct for those variables thought to be common causes of exposure and outcome.The backdoor path criterion resembles the common cause criterion but takes chains of influence into account and can thus correct for indirect confounders.The disjunctive cause criterion would correct for those variables thought to be either a cause of exposure or outcome.
When there is uncertainty about the presence of unmeasured confounding, it can be reasoned that, generally, the disjunctive cause or backdoor path criteria yield the most unbiased results.For example, correcting for variable C 2 in Figure 1, which would be corrected for when applying the pretreatment criterion, would unnecessarily bias the estimated effect of T→ Y. Application of these criteria, however, requires a causal model.Interestingly, structure learning algorithms for Bayesian Networks (BNs) can be used-with input from domain experts-to discover causal models from data, as recently shown. 13iefly, BNs are a type of probabilistic graphical model whose structure is determined by a directed acyclic graph where nodes represent variables and directed edges signify (preferably causal) relationships. 11,14Each node is associated with a probability distribution conditional on its parents (ie, nodes with a directed edge toward that node).

CONTEXT Key Objective
Adjuvant chemotherapy in elderly is effective although previous analyses dispute the benefit of adding oxaliplatin to fluoropyrimidines.Using observational data to estimate treatment effect always comes with the risk of bias, specifically confounding by indication.Here, we use structure learning algorithms for Bayesian Networks, in conjunction with clinical knowledge, to identify confounders, mitigate this risk, and reliably estimate the effect of adjuvant treatment in colon cancer.

Knowledge Generated
Structure learning can aid in finding causal relationships in observational data by precluding noncausal relationships, thus facilitating identification of confounders.We found a strong effect of adjuvant therapy on survival in elderly patients (70 years and older); additional oxaliplatin provided no further benefit.

Relevance
Identification of and correction for confounders is required for using observational data for estimating treatment effects.The ability to do so greatly enhances the use of observational data.Our research suggests oxaliplatin should be avoided in elderly patients.When using structure learning in conjunction with these expert-defined counterfactuals (a blacklist consisting of prohibited edge directions), it is possible to obtain a causal graph.This model can then be used for confounder identification by applying one of the aforementioned criteria and subsequently mitigation.
Previously, we analyzed the effect of adjuvant chemotherapy with CAPOX or capecitabine monotherapy (CapMono) in elderly patients with stage III colon cancer using data from the Netherlands Cancer Registry (NCR) which covers the entire Dutch population. 2 In this analysis, we adjusted for confounding by indication using the pretreatment criterion.
In this article, we again investigate the treatment effect of adjuvant chemotherapy and the added benefit of oxaliplatin (CAPOX) over CapMono.To this end, we first build causal models using BNs in conjunction with different structure learning algorithms and a varying number of recurrence/ survival nodes.Finally, we explore the effect of correcting for different sets of confounders, as identified by these causal models, and compare this with propensity score correction.

Data
The original data set from the study by van Erning et al 2 S1 and S2).For analysis of CAPOX compared with CapMono, a subset consisting of patients who received adjuvant therapy was created (n 5 352).
For assessing association with treatment and for use in the BN, age was discretized into three categories: 70-74, 75-79, and 80 years and older (Data Supplement, Discretization of Age and Fig S2).The anatomic subsite was relabeled as proximal (caecum-splenic flexure of colon), distal (descending-sigmoid colon), or unknown/unspecified.
OS, counted from the date of surgery, was discretized into five Boolean variables representing 1-to 5-year survival, introducing not applicables (NAs; missing values) for patients who had a follow-up of <5 years and were alive at the time of follow-up.The absence of recurrence was discretized similarly.However, because of the limited follow-up, variables representing the absence of recurrence for ≥3 years were skewed toward recurrence or NA.Consequently, these three variables were dropped (Data Supplement, Variable Selection).

BN Development
Using different combinations of outcome variables, full and simplified BNs were developed.The full networks used two nodes/variables to represent recurrence at 1 and 2 years and five nodes for OS at 1, 2, 3, 4, and 5 years.The simplified networks used the last node of each.
The rationale for also evaluating a simplified network is the following.Having multiple outcome nodes at sequential time points might limit the BN ability to find other associations.For example, death at 1 year after diagnosis perfectly predicts death for all subsequent nodes/time points.Therefore, when testing for association between any other variable and a survival node at a later time point, one is essentially testing for association with survival in a specific interval.It can be reasoned that this reduces the (statistical) power to find associations with later time points.For example, if variable 1 is strongly associated with 1-year survival than any, association of variable 2 with 2-year survival will be harder to detect.In the simplified networks, only the final time point is included to obtain the maximal number of associations with the highest statistical power as the treatment effect on survival is expected to be the highest at that time point.
In addition, we evaluated the effect of using different algorithms than the constraint-based NPC algorithm, as available in Hugin. 18Specifically, we applied MMHC (hybrid) and SM (score-based) as implemented in the R package bnstruct. 16,17,19,20Either adjuvant treatment (Yes, No) or adjuvant treatment regimen (CAPOX, CapMono) was included as appropriate.Obtained BNs were visualized using the R package qgraph. 21 all cases, structure learning was performed with the level of significance set to 0.05. 15In Hugin, the process was supported using a manually defined blacklist of prohibited edge directions (Fig 3).bnstruct uses a different, less granular approach and accepts a list of layers, where edges from lower to higher layers are forbidden.
The NPC algorithm may yield ambiguous regions, consisting of a set of interdependent uncertain links where the absence of one link depends on the presence of another.When this happens, Hugin prompts for input.For resolution, we first prioritized associations with outcome nodes (recurrence and survival) and subsequently associations with treatment.
In some situations, Hugin ignores the supplied blacklist after manual resolution and misdirects the selected edge.For example, Hugin created an edge from the node representing 4-year survival (surv_04y) toward the ASA node.To overcome this issue, misdirected edges were added to a (minimal) whitelist of enforced edges and structure learning was repeated.
The resulting graphs were inspected to identify confounders.

Propensity Score Analysis
Propensity scores were calculated as follows.First, the chisquared test was used to assess the association between pretreatment variables and adjuvant treatment.Next, variables with a significant association with treatment were used as covariates in the propensity score calculation.For adjuvant treatment versus surgery only, these were sex, age, comorbidities, ASA, and pN.For CAPOX versus CapMono, these were age and comorbidities only.
Subsequently, the propensity score was discretized using quintiles.Finally, the Cochran-Mantel-Haenszel (CMH) chi-squared test, with the bins as strata, was used to evaluate the effect of propensity score correction.

Cox Proportional Hazards Models
Several Cox models were developed, with each investigating the effect of (regimen of) adjuvant treatment on survival while correcting for a different set of potential confounders, driven by the results from structure learning.Covariates included are listed in Figure 4. Hazard ratios (HRs) for treatment were extracted, together with their CIs.

Variable Distribution and Propensity Score Correction
Chi-squared analysis showed that variables age, sex, pN, ASA (physical performance status), and comorbidities were significantly associated with receiving any form of adjuvant treatment (Table 1; Data Supplement, Table S1).Stratification using the discretized propensity score as strata in the CMH chi-squared test removed all associations between these conceivable confounders and treatment.

BN Development and Confounder Identification
The full BNs are shown in the left column of Figure 5.In the BN found by the NPC algorithm, two confounders were identified through inspection of the graph: pN and ASA.The BNs found by MMHC and SM algorithms did not identify any confounders and generally had fewer edges.
In the simplified networks, which focused on a limited set of recurrence and survival nodes (Fig  did not find an association between pN and treatment (node adj_therapy) but did find an association between age and surv_05y, identifying age as a potential confounder.
In summary, depending on the algorithm selected and nodes included, treatment was either unconfounded or confounded by pN, pN and ASA, or age.

Cox Proportional Hazards Models
Adjuvant treatment was significantly associated with survival in all models (Fig 4, top row; Data Supplement, Table S3).Estimated HRs were between 0.39 and 0.45, depending on covariates included.The models had overlapping CIs and yielded comparable results.

CAPOX Versus CapMono
Variable Distribution and Propensity Score Correction Chi-squared analysis showed variables age and comorbidities to be significantly associated with the choice between CAPOX or CapMono (Table 1; Data Supplement, Table S2).
Here too, stratification using the binned propensity score as strata in the CMH chi-squared test removed all associations between these suspected confounders and treatment.

BN Development and Confounder Identification
From the full networks, none of the algorithms identified an association between the adjuvant treatment regimen and survival (Fig 6).From the pretreatment variables, only grade was associated with survival (using the NPC algorithm) and no confounders were found.
In the simplified networks, again, none of the algorithms identified an association between the adjuvant treatment regimen and survival.All networks suggested that age was a driving factor for the choice of regimen.The SM algorithm implied that pN was an independent predictor of survival.
To summarize, regardless of the algorithm selected and nodes included, choice of treatment regimen was not only unconfounded but also unassociated with survival.

Cox Proportional Hazards Models
None of the Cox models found significant association between the adjuvant treatment regimen and survival with HR point estimates around 0.93 and CIs equal to or wider than 0.68-1.28(Fig 4, bottom row; Data Supplement, Table S4).
CIs between the models were largely overlapping.

DISCUSSION
We When comparing adjuvant treatment with surgery only, the set of identified confounders differed, depending on both the specific algorithm used and the nodes included in the network.Potential confounders identified were pN, ASA, and age although 3 (out of 6) networks suggested that the treatment effect was unconfounded.All Cox models found significant benefit of adjuvant treatment over surgery only (HRs ranging between 0.36 and 0.46); correcting for different sets of potential confounders had no marked effect, and neither did correction using propensity score.
In the comparison between CAPOX and CapMono, no confounders were identified, regardless of the algorithm or nodes included: none of the BNs found an association between the choice of treatment regimen and survival.The unadjusted Cox model yielded the same conclusion and did not find any effect of adding oxaliplatin.Correcting for propensity score did not make a difference.
Looking at the BNs in general, several interesting associations between pretreatment variables are found in all networks, regardless of algorithm or node selection.Unsurprisingly, age appears to be associated with the choice of adjuvant treatment, not only when considering whether to start either regimen but also when making a choice for a specific regimen.Similarly, sex was associated with the tumor location and grade with pN, both associations that have been reported before. 23,24The association found between the number of comorbidities and ASA score also makes sense considering the use of the ASA score (physical performance status) to approximate comorbidity in registries. 25n the other hand, some expected associations were not, or only infrequently, found: only one network connected pT and survival.However, this might be explained by the fact that we selected stage III and the (vast) majority of patients had a pT3 tumor.Overall, we considered the obtained networks clinically plausible.
Previously, Wilkinson et al 26  The effect of adding oxaliplatin to a fluoropyrimidine (eg, capecitabine or fluorouracil) in stage II-III colon cancer has been extensively studied.NO16968 and MOSAIC found a small (4-6 percentage point) but significant benefit of adding oxaliplatin to the treatment regimen.In MOSAIC, treatment effect was stronger in patients with stage III compared with stage II; NO16968 only targeted stage III patients. 29,33Both trials included a relatively young population.NSABP, which also investigated a younger population compared with our analysis, did not find an overall benefit of adding oxaliplatin but did report a significant effect in an unplanned subset analysis of patients 70 years and younger. 32In a subgroup analysis, the MOSAIC study drew the complimentary conclusion that there is no additional benefit of oxaliplatin in elderly patients.This is in line with the results obtained here.
In conclusion, we have shown that structure learning elucidates underlying relationships in data, helping select which variables should be corrected for.Varying included variables and algorithms yielded (slightly) different, complementary results and identified different sets of potential confounders.HRs were similar regardless of the chosen set.
We found a strong association between adjuvant treatment with capecitabine and survival in stage III colon cancer in our

FIG 5 .
FIG 5. Bayesian Networks for adjuvant treatment versus surgery only.Rows 1-3 correspond to the algorithms NPC, MMHC, and SM, respectively.The left column shows full networks, and the right column shows simplified networks.Treatment indicates whether any form of adjuvant treatment was received.Top left: pN and ASA act as confounders.Top right: pN acts as a confounder.Middle left: no confounders identified; pN, ASA, and treatment are independent.Middle right: no confounders identified.Bottom left: no confounders identified; pN, ASA, and treatment are independent.Bottom right: age acts as a confounder.ASA, American Society of Anesthesiologists score; MMHC, Max-min Hill-climbing; NPC, Necessary Path Condition; pN, pathological lymph node classification; pT, pathological tumor classification; SM, Silander-Myllymaki.

FIG 6 .
FIG 6. Bayesian Networks for CAPOX versus CapMono.Rows 1-3 correspond to the algorithms NPC, MMHC, and SM, respectively.The left column shows full networks, and the right column shows simplified networks.Treatment indicates whether CAPOX or CapMono was received.Top left: no confounders found.Top right: no confounders found.Middle left: no confounders found.Middle right: no confounders found.Bottom left: no confounders found.Bottom right: no confounders found.ASA, American Society of Anesthesiologists score; CapMono, capecitabine monotherapy; CAPOX, capecitabine and oxaliplatin; MMHC, Max-min Hill-climbing; NPC, Necessary Path Condition; pN, pathological lymph node classification; pT, pathological tumor classification; SM, Silander-Myllymaki.
In addition, in 2013 and 2014, details regarding adjuvant therapy, the number of comorbidities, and development of (local) recurrence were acquired from medical records and added to the NCR.The year of diagnosis was available but dropped after a quick analysis of its association with recurrence (Data Supplement, Variable Selection and FigS1).As before, patients who died within 90 days after surgery (n 5 125) were excluded since these deaths were likely due to surgical complications and patients were unable to undergo adjuvant chemotherapy.Patients receiving chemotherapy other than CAPOX or CapMono (eg, FOLFOX) were excluded.This left a final data set consisting of n 5 982 records (Fig 2; Table 1, Data Supplement, Tables 0 ) colon cancer and 70 years and older, was selected from the NCR.The variables sex, age, ASA classification (a system indicating physical performance status, developed by the American Society of Anesthesiologists), pT, pathological lymph node classification (pN), tumor subsite (coded according to ICD-O-3), and differentiation grade were included.

TABLE 1 .
Variable Distribution and Association Between Variables and Treatment NOTE.For adjuvant therapy versus surgery only, propensity scores were calculated using sex, age, comorbidities, ASA, and pN.For CAPOX versus CapMono, age and comorbidities were used.P: calculated using the Pearson chi-squared test.P* calculated using the CMH Chi-squared test after stratification by propensity score quintile.
JCO Clinical Cancer Informatics ascopubs.org/journal/cci| 5 Estimating Treatment Effect in Elderly Patients With Stage III Colon Cancer

Model HRs-Adjuvant v Surgery Only (reference category)
Structure learning constraints for the effect of adjuvant chemotherapy on overall survival.Left: constraints for Hugin.The blacklist is made up from cells containing x: these edges are forbidden.Minimal whitelist (ie, enforced edges) is formed by cells marked with >.Right: constraints for bnstruct.Edges can only originate from a node in a layer at the same height or higher.For example, sex → age is allowed, but not the reverse.Edges between nodes at the same height can point either way.ASA, American Society of Anesthesiologists score; pN, pathological lymph node classification; pT, pathological tumor classification.
were able to successfully leverage BN structure learning algorithms in conjunction with (basic) clinical knowledge to create causal models and subsequently identify potential confounders.Different structure learning algorithms identified different potential confounders, but none were contradicting.In other words, confounders identified in one model were not mediators or colliders in others (Data Supplement, FigS3), which may introduce bias if corrected for.From a theoretical point of view, score-based algorithms run a higher risk of incorrectly classifying a variable as a confounder in the presence of unmeasured confounding (Data Supplement, FigS4).In this case, this mechanism does not seem to play a significant role since all HRs were very close together.
investigated the effect of adjuvant 5-fluorouracil with leucovorin compared with surgery only.The HR found here is smaller than the effect they reported, which was 0.62 in stage II-III and 0.65 in stage III only.It should be noted that the age distribution between our study (100% for patients older than 70 years) and the study by Wilkinson et al (approximately 36%-51% patients younger than 60 years depending on the treatment group) is vastly different.Our results seem to suggest that, in our elderly cohort, chemotherapy improves survival relatively more.