close
close

A drug mix and dose decision algorithm for individualized type 2 diabetes management

Study design and datasets

We used anonymized electronic medical records from the SingHealth Diabetes Registry (SDR)29. Our study was approved by the SingHealth Centralized Institutional Review Board (Protocol No. 2019/2414) and the A*STAR Institutional Review Board (Protocol No. 2019-079) with a waiver of informed consent, and fully complied with all relevant ethical regulations.

The SDR contains EMR data for 220,618 adult T2D patients with 3,513,887 ‘prescription visits’ (outpatient visits with glycemic control prescriptions) to any of 22 institutions (ten primary care clinics, nine acute hospitals and specialty centers, three community hospitals) within Singapore’s largest health cluster, SingHealth, between January 2013 to December 202029. Following previous work17, we excluded patients with a recorded history of type 1 or gestational diabetes as decisions on pharmacotherapy in these patients often require thorough reviews of self-monitored blood glucose readings, which were not available in the SDR. Other patients were included if they had outpatient prescription visits meeting the following criteria: (i) preceded by at least one prior prescription visit; (ii) preceded by at least one HbA1c measurement within 30 days before the visit; (iii) followed by at least one HbA1c measurement two to six months after the visit; and (iv) not followed by a visit with a prescription change (relative to the current visit prescription) before the next HbA1c measurement. These criteria, applied to enable meaningful modeling, yielded a total of 107,854 eligible T2D patients with 1,039,869 prescription visits.

We divided the set of eligible patients into four independent cohorts (Supplementary Fig. 1) to define a model development set (21 sites, visits during 2013–19), an internal validation set IV (21 development sites, 2013–19), an external validation set EV (external site designated as the primary care clinic with the largest number of prescription visits, 2013–19), and a temporal validation set TV (any of 22 sites, visits during 2020). The internal validation set reflects diverse care settings for assessing model reproducibility. The external site was designated as a large primary care clinic to assess generalizability in settings where the decision algorithm is likely to be adopted. The model development, internal validation and external validation sets were defined at the start of the study (when only 2013–19 data were available), while the temporal validation set was only defined prospectively when the 2020 data became available.

To ensure that patients with more visits are not over-represented in evaluations, we randomly sampled only one visit per patient in each test set. To ensure that the test sets reflect their respective originating data distributions, all sampling procedures were stratified36 based on age, gender, ethnicity, pre-visit HbA1c, visit year, care setting (primary care or hospital), and the incidence of comorbidities and complications relevant to T2D management. Specifically, we divided visits into subgroups (or strata) based on the above characteristics, and randomly sampled visits from each subgroup in proportion to the size of the subgroup.

Data preprocessing

For each dataset, we extracted a set of 38 EMR variables, including demographics, physical measurements, laboratory values, medical history, and prescription details (Supplementary Table 1). For every prescription visit, we defined a ‘visit medical profile’ comprising patient characteristics, pre-visit medical history, and the previous visit prescription; an in-visit ‘treatment recommendation’ represented as the prescribed set of generic name antidiabetic drugs (‘drug mix’), their daily dosages, and the associated regimen ‘severity level’ (defined based on the number of oral antidiabetic drugs (OADs) and/or type of injectables, as below); and a ‘post-visit HbA1c’ outcome used to define degree of glycemic control.

First, we processed and coded medical profile features at the prescription visit level, as follows. We represented physical measurements and laboratory values using the most recent measurements and the quartile summary statistics (25th, 50th, 75th percentiles) of measurements taken within 1 year before the visit. We denoted the last HbA1c reading before the visit as the ‘pre-visit HbA1c’. We mapped diagnosis codes from any previous visits to features indicating past incidence of relevant comorbidities (hypertension, hyperlipidemia, cardiovascular disease, cerebrovascular disease), complications (including macrovascular, microvascular, and other diabetes complications), and other conditions that could influence treatment decisions (including severe hypoglycemia, keto-/lactate acidosis, hepatic insufficiency, and pancreatitis). We used previous surgery codes to infer past incidence of lower limb amputation, bariatric surgery and vascular surgery. We also included features representing whether or not the patient was previously prescribed any lipid-lowering, antihypertensive, and antiplatelet drugs. We imputed missing data using mean values in the model development set. We represented the previous visit prescription as the prescribed set of generic name antidiabetic drugs (drug mix) and their daily dosages. Drug mix was denoted as a subset of 31 generic name antidiabetic drugs (OADs, injectables, combination drugs) prescribed within the model development set (Supplementary Table 2). Each prescription was associated with a regimen ‘severity level’ defined as: (1) 1 oral antidiabetic drug (OAD); (2) Multiple OADs; (3) First injectables (glucagon-like peptide-1 receptor agonists (GLP-1 RA) or long-acting insulin); (4) Premixed insulin; and (5) Rapid-acting insulin. Second, we encoded the in-visit treatment recommendation as the change in drug mix, daily dosage, and severity level, relative to the previous prescription. Third, we coded the post-visit HbA1c as the mean of all HbA1c readings taken two to six months after the visit.

Treatment recommendation task

Given input of a patient’s visit medical profile, the task is to recommend the drug mix and daily dose regimen that optimizes glycemic control (defined as achieving a post-visit HbA1c at or under the patient’s individualized HbA1c target), subject to clinical knowledge-based guidelines. We set individualized HbA1c targets based on age, diabetes duration, and medical history37. In general, patients had a glycemic target of 7.0%. However, for patients aged

Treatment options

We defined the drug mix options as all possible subsets of 24 generic name antidiabetic medications: comprising all OADs, injectables or combination drugs available in the SDR, except those with very low outpatient utilization rates (Supplementary Table 2). For non-insulin medications, we defined cumulative daily dosage options using local formulary configurations30. For insulins, since exact dose titration often relies on factors beyond the EMR, including self-monitored capillary blood glucose readings, we limited dosage options to “maintain” or “increase” relative to the previous prescription.

Clinical knowledge-based guidelines

To facilitate meaningful treatment recommendations, we distilled established clinical practice standards published by the American Diabetes Association (ADA)2, the European Association for the Study of Diabetes3, and the Agency for Care Effectiveness Singapore38; drug label requirements from MIMS Singapore30 as of August 2021; and the experience of clinical experts in managing T2D into a compilation of knowledge-based diabetes management (KDM) guidelines. We included important medication contraindications (based on age, estimated glomerular filtration rate or eGFR, medical history); restrictions on drug combinations (based on previous visit prescription of sulfonylurea, meglitinide, premixed or rapid-acting insulin, DPP-4 inhibitor, and/or GLP-1 RA); renal dosing requirements (based on eGFR and previous prescription information); restrictions on introducing medications with higher risk of hypoglycemia (based on patient age, HbA1c and glycemic target); as well as drug prioritizations. For drug prioritizations, we specified the general ranking order based on multifactorial practice requirements as (1) Metformin, (2) SGLT2-i, (3) GLP-1 RA, (4) DPP-4 inhibitor, (5) long-acting insulin, and (6) others, with stronger prioritization of metformin in cases with no active metformin prescription and of SGLT2-i and GLP-1 RA based on age and cardiorenal comorbidities. In constructing the knowledge representation, we also incorporated key institutional considerations implicit in the SDR data. The KDM guidelines are summarized in Supplementary Table 2.

Drug mix and dose decision algorithm

Our decision algorithm, AIDA, employs a predict-then-optimize approach. AIDA uses an HbA1c prediction model to evaluate the glycemic control objective; and a heuristic to optimize this objective subject to the KDM guidelines specified above (Fig. 1). AIDA’s treatment options comprise combinations of generic drug names and their daily dosages, resulting in recommendations of the form “metformin: 1000 mg/day, sitagliptin: 50 mg/day, insulin glargine: increase”.

Prediction model

Given the visit medical profile and a plausible in-visit treatment recommendation, the modeling task is to predict the post-visit HbA1c. For the visit medical profile, we performed feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO)39 regression method on the model development set. We applied LASSO with 4-fold cross-validation and grid search across hyperparameters controlling the regularization strength, and used the model with the highest R2 to select the non-zero features. For the in-visit treatment recommendation, we included the complete drug mix and dose regimen feature set. We trained the prediction model using eXtreme Gradient Boosting (XGBoost) for regression, a gradient-boosted regression tree-based ensemble machine learning algorithm40. We tuned the hyperparameters (learning rate and number of estimators) using 4-fold cross-validation and grid search across five learning rate settings {0.001, 0.005, 0.01, 0.05, 0.1} and three number of estimators settings {500, 1000, 1500}. We assigned the model with the highest R2 as AIDA’s prediction model. We further used the XGBoost regression model prediction to classify whether the post-visit HbA1c is within the individualized glycemic control target, so as to inform the decision algorithm.

Optimization details

AIDA’s search heuristic is designed to carefully consider dose effects and practical constraints on severity levels. First, to evaluate possible treatment intensifications (Fig. 1, steps 2–5), we leveraged the established dose response property that treatment effects should be increasing in dose (up to specified maximum ranges)41. Hence, for a given treatment regimen, if glycemic control is not predicted to be possible at a given dose, then it is also not likely to be possible at lower doses. Based on this property, AIDA uses maximum feasible doses to evaluate whether a given treatment regimen is likely to bring the patient into glycemic control. Then, once the optimal treatment regimen is identified, AIDA estimates the minimum dose required for glycemic control. Specifically, it searches the set of feasible doses in descending order, evaluates the corresponding post-visit HbA1c for each feasible dose, and returns the lowest dose seen such that glycemic control is possible. Further, to limit over-intensification for safety and practicability, we capped severity level changes to one if pre-visit HbA1c 1c target + 1%} and two otherwise. Formal algorithm statements are provided in Supplementary Notes1 and 2.

Quantitative performance evaluations

To evaluate performance of AIDA’s HbA1c regression model, we computed RMSE, R2, and MAPE using observed (ground-truth) post-visit HbA1c measurements under standard of care (SoC) prescriptions. In addition, we categorized whether the predicted values were within the individualized HbA1c target and computed the F1 score.

Further, we compared post-visit HbA1c outcomes under AIDA recommendations, the SoC prescriptions and a no-change recommendation. For these comparisons, we considered visits where AIDA recommends a treatment intensification (increase dose, substitute or add medications). To ensure a fair comparison between AIDA and the SoC, we only allowed AIDA to recommend medications that were available at the time of any given visit. This required disallowing AIDA from recommending drugs approved after January 1, 2013 (i.e., SGLT2 inhibitors) for any visits preceding the drug’s earliest prescribed date in our dataset. Since outcomes under AIDA and no-change recommendations are not observable, we estimated them using weighted importance sampling (WIS)42, an off-policy evaluation technique widely adopted in the treatment optimization literature21,43,44,45. Weighted importance sampling approximates the expectation of a function of a random variable X with respect to a target distribution p by computing the weighted average of samples from another distribution q. More specifically, given samples of X, \({x}_{i},{i}=1,\,\ldots ,{n},\) from distribution q, the expectation of a function f under distribution p is approximated by the WIS estimator:

$${\rm{WIS}}=\,\frac{\mathop{\sum }\nolimits_{i=1}^{n}w\left({x}_{i}\right)f\left({x}_{i}\right)}{{\sum }_{i=1}^{n}w\left({x}_{i}\right)},$$

(1)

where the weights \(w({x}_{i})\) are given by

$$w\left({x}_{i}\right)=\frac{p\left({x}_{i}\right)}{q\left({x}_{i}\right)}.$$

(2)

In order to estimate the expected HbA1c outcome and reduction under AIDA (or no-change) recommendations, we require the probability of a given recommendation under the SoC (distribution q) and under AIDA (or no-change) (distribution p). To estimate distribution q, we trained multinomial logistic regression models on each test set to classify SoC treatment recommendations given patient visit medical profiles. For this task, we represented treatments at the drug class level18,20 and set the one-hot-encoded selected treatment as the output, and used the same input features as AIDA’s XGBoost model while excluding all features describing the current visit treatment regimen. These models perform reliably (Supplementary Table 7). To obtain the probability distribution p, we softened AIDA (or no-change) recommendations43 so as to recommend the suggested treatment regimen with 0.99 probability and any other treatment regimen uniformly at random with a total of 0.01 probability. To estimate the distributions of HbA1c outcomes and reductions under SoC, AIDA, and no-change recommendations, we used bootstrapping with 500 resamplings of 80% of the data. For each bootstrap sample, we computed the mean post-visit HbA1c outcomes under the SoC, AIDA, and no-change recommendations, respectively. We computed the differences between the mean pre-visit and post-visit HbA1c values to obtain the mean post-visit HbA1c reductions observed under the SoC, and estimated under the AIDA and no-change recommendations. We also computed differences between the post-visit HbA1c outcomes under AIDA and SoC to estimate the post-visit HbA1c benefits under AIDA, relative to SoC. We then computed 95% confidence intervals for all estimates across the 500 bootstrap samples.

For statistical testing, we hypothesized that the estimated mean post-visit HbA1c under AIDA would be lower than the mean pre-visit HbA1c. Further, we hypothesized that the estimated mean post-visit HbA1c reduction under AIDA would be better than that observed under the SoC prescription and that estimated under the no-change recommendation. We assessed significance levels using one-sided Student’s t-tests46. We repeated evaluations under five random 80/20 splits of data from the development sites to assess sensitivity to train/test data splitting; and for 17 subgroups based on demographic factors, body mass index (BMI), eGFR, and pre-visit HbA1c.

Comparators that omit drug-dose specifics

We compared AIDA to two competing methods that consider less granular treatment options.

The first comparator is a previously developed treatment advisory17 which defines treatment options as combinations of drug classes (e.g., “metformin, an injectable, and one other drug”). This comparator, which we term as Drug Class-combination Advisor (DCA), uses k-nearest neighbors to predict post-visit HbA1c under different drug class combinations and recommends the combination with lowest predicted post-visit HbA1c. DCA differs from AIDA in granularity of its medication representation as well as its visit medical profile feature representation and optimization objective (unlike AIDA’s objective to achieve individualized glycemic targets, DCA minimizes post-visit HbA1c). Our implementation followed the original publication17, except that we excluded ‘no regimen’ from the menu of treatment options and expanded the definition of injectables to include GLP-1 RA. More specifically, we used the following treatment options: (1) metformin monotherapy, (2) injectable monotherapy, (3) one non-metformin oral agent, (4) metformin and one other oral agent, (5) injectable and metformin, (6) injectable and one non-metformin agent, (7) two non-metformin oral agents, (8) metformin and two other oral agents, (9) injectable and metformin and one other agent, (10) injectable and two non-metformin agents, (11) three non-metformin oral agents or at least four agents.

The second comparator, termed Drug Mix Advisor (DMA), is an ablated version of AIDA, which we constructed to pinpoint the impact of including granular dose information on the recommendations. DMA’s treatment options are defined as a combination of generic drug names (e.g., “metformin, sitagliptin”). DMA is identical to AIDA in all aspects including representation of patient characteristics and medical profile, optimization objective, guidelines, prioritization structure, and search approach, except in granularity of its medication representation (previous visit prescription and in-visit recommendation). Specifically, unlike AIDA, DMA does not use dose information (i.e., the daily dosage features/options) either for HbA1c prediction or for optimization. This difference gave rise to drug class level differences between AIDA’s and DMA’s recommendations for 38.9–44.3% of visits eligible for treatment intensification in our three test sets. As DMA has a more granular medication representation than prior works17,18,19,20,21 which predominantly represent medications at the drug class or drug type level, it serves as a compelling comparator.

For fair comparisons, we tuned HbA1c prediction models used by both the comparators to achieve R2 as close to AIDA’s HbA1c model as possible. Further, we ensured that each comparator incorporated all the specified KDM guidelines that are permitted by its architecture and medication representation. More details on treatment options and knowledge-based guidelines for the comparators (in relation to AIDA) are in Supplementary Table 2.

Qualitative evaluations

To evaluate the quality of treatment recommendations, we conducted case reviews with a panel of specialists. We purposively sampled cases from the test set IV to represent a diverse set of patient visits with sufficient coverage for important clinical concepts and scenarios, as follows. We defined case profiles based on combinations of five primary criteria: (a) age, (b) care setting, (c) pre-visit HbA1c, (d) eGFR, and (e) regimen severity levels. An example of a case profile is ‘young adult seen at a primary care clinic with pre-visit HbA1c 7–8%, eGFR ≥ 60 ml/min/1.73 m2, and regimen of severity level 1’. We prioritized 60 case profiles with high prevalence in test set IV, while allowing for diversity. For each case profile, we randomly sampled 1 patient visit from the subset of test set IV visits that match the profile, while ensuring diversity and representation across secondary criteria covering (a) drug classes in active prescription, (b) number of comorbidities and (c) types of comorbidities. We also ensured that the selected cases represented the ratio of cases with differences between AIDA and the comparators in the overall test set. The characteristics of the 60 cases are provided in Supplementary Table 8, and generally follow relevant prevalence trends in the test set.

The evaluation panel comprised three senior clinicians with 17, 10, and 10 years of experience as practicing endocrinologists (Y.M.B., D.C., and P.C.L.). The evaluators first independently reviewed the cases, as follows. For each case, each evaluator, masked to SoC and algorithm recommendations, reviewed the case and noted his/her treatment recommendation. Thereafter, he/she reviewed treatment recommendations from AIDA, and responded to an open-ended questionnaire on the extent to which AIDA’s recommendation is sensible (i.e., consistent with established clinical practice norms) and precise (i.e., provides sufficient specifics to inform practical drug mix and dose decisions), and on how AIDA’s recommendation compares with the competing methods.

Two members of the research team (I.H.M., P.K.) independently analyzed the evaluators’ assessments. Specifically, they coded free-text comments from each evaluator into ordered categories defined as clinically sound (expected to pass muster with a sizeable number of medical professionals), acceptable (suboptimal but not wrong or unsafe, can be encountered in real-world settings), and not sensible (unsafe or not acceptable by clinical standards). The analysts also identified emergent themes47 for comparisons between AIDA and the competing methods. For cases where both analysts found the three evaluators to be in general agreement, we report the apparent consensus opinion. For cases where there were differing opinions, we conducted an adjudication session to resolve differences and obtain consensus opinion of the evaluation panel. In total, 28 cases had to be adjudicated for the question of whether AIDA sensible, 13 cases for the AIDA to DMA comparison, and 5 cases for the AIDA to DCA comparison. In 10 cases, consensus could not be reached, and the panel chose to report opinions by majority vote. Further, the research team discussed observations on comparisons between AIDA and the competing methods with the evaluators to obtain consensus on AIDA’s main advantages and disadvantages. For advantages, we identified illustrative cases to exemplify the themes. In cases with disadvantages or limitations, we identified corrective actions needed for future refinements. All reported results are based on the consensus opinion of the evaluation panel. A summary of cases and assessments is available upon request.

Guideline concordance analysis

Finally, we compared AIDA and the competing methods by the degree of concordance with the medication contraindication, drug combination restriction, and/or renal dosing guidelines specified within the treatment recommendation task (Supplementary Table 2). We first considered test set visits for which one or more of the above specified guidelines apply. Then, for each of these ‘relevant visits’, we assessed treatment recommendations from AIDA, DMA, and DCA. We designated a recommendation as concordant only if it definitively aligns to all of the above applicable guidelines. For each test set and treatment advisor, we quantified guideline concordance rates as the fraction of relevant test set visits with concordant recommendations.