
Comparison between the EKFC-equation and machine learning models to predict Glomerular Filtration Rate

Design overview

The ML models were developed using the same datasets used for developing and validating the SCr-based EKFC equation2 and, where cystatin C was available, the combined SCr/CysC-based EKFC equation. We limited the analysis to white patients because we used the same cohorts as in2. More details about the participating centers, measurement methods and patient characteristics are available in the Supplementary Material, Tables S1–S4. Briefly, we had data from 19,629 patients from 13 cohorts for development, internal and external validation. These cohorts were divided into development/internal-validation or external-validation datasets according to their age, the exogenous marker used to measure GFR (mGFR) and mGFR levels, as described in Tables S1–S3 (Supplementary Material). For the models based on the single biomarker SCr, we used 13 cohorts: 7 for development and internal validation, which were further randomly split into a development (n = 8473; 75%) and an internal validation (n = 2778; 25%) dataset, while the remaining 6 cohorts (n = 8378) were used for external validation, as described in Tables 1 and S4. For the models based on both SCr and cystatin C, we selected the patients from the same cohorts for whom both biomarkers were available (Tables 2 and S4), leading to the following subsets: development (n = 4849; 41%), internal validation (n = 1603; 13.5%) and external validation (n = 5389; 45.5%).
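For illustration only, the sketch below shows how such a cohort-level assignment followed by a random 75/25 split could be implemented; the file name, cohort labels and random seed are hypothetical and are not taken from the study (the actual cohort assignment is given in Tables S1–S3).

```python
# Minimal sketch of the dataset split described above, assuming a pandas
# DataFrame with a hypothetical "cohort" column; all names are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("ekfc_cohorts.csv")  # hypothetical file name

# Hypothetical assignment of cohorts to the development/internal pool
# versus external validation.
dev_internal_cohorts = ["cohort_A", "cohort_B"]  # placeholder names
dev_internal = df[df["cohort"].isin(dev_internal_cohorts)]
external = df[~df["cohort"].isin(dev_internal_cohorts)]

# Random 75/25 split of the development/internal pool, as in the paper.
development, internal_validation = train_test_split(
    dev_internal, test_size=0.25, random_state=0
)
```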

Table 1 Basic participant characteristics with only serum creatinine available in the development, internal and external validation datasets.
Table 2 Basic participant characteristics with both serum creatinine and cystatin C available in the development, internal and external validation datasets.

All data were anonymized and the original study was approved by the Ethical Board at Lund University (Sweden), with an amendment approved by the Swedish Ethical Review Agency.

We used the single-biomarker SCr-based EKFC equation and the mean of the SCr-based and cystatin C-based EKFC equations as benchmarks for the current comparison. Because cystatin C is not always available in clinical practice, the first part of the analysis focuses on the single-biomarker SCr-based EKFC equation; because the combined equation has the highest accuracy and precision, a second analysis also evaluates the difference between the combined SCr/CysC-based EKFC equation and the ML models.
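As a minimal sketch of the combined benchmark, assuming the two single-biomarker EKFC estimates have already been computed (the equations producing them are not reproduced here):

```python
import numpy as np

def combined_ekfc(egfr_scr: np.ndarray, egfr_cysc: np.ndarray) -> np.ndarray:
    """Combined SCr/CysC benchmark: the arithmetic mean of the two
    single-biomarker EKFC estimates, as described above."""
    return (egfr_scr + egfr_cysc) / 2.0
```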

Covariates

ML models were allowed to use age, sex, SCr (and CysC), height, weight and BMI, as these data were also available for most of the participants. SCr was measured using assays traceable to the gold standard isotope dilution mass spectrometry method (results from the CRIC study were recalibrated) as described in2. Cystatin C assays were standardized to the international reference material (ERM-DA471/IFCC)3.
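A minimal sketch of assembling the covariate matrix is shown below, assuming a pandas DataFrame with hypothetical column names and units (height in cm, weight in kg); the study data already include BMI, so recomputing it here is purely illustrative.

```python
# Sketch of the covariate matrix used as ML input; column names are assumptions.
import pandas as pd

def build_features(df: pd.DataFrame, use_cystatin_c: bool = False) -> pd.DataFrame:
    features = df[["age", "sex", "scr", "height", "weight"]].copy()
    # BMI in kg/m2 from weight in kg and height in cm (assumed units).
    features["bmi"] = df["weight"] / (df["height"] / 100.0) ** 2
    if use_cystatin_c:
        features["cysc"] = df["cysc"]
    return features
```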

Outcomes

Measured GFR was obtained using two methods: plasma clearance and urinary clearance. As previously described2, GFR was measured with different markers, but all are recognized as reference methods15,16,17. For more details see Supplementary Material Tables S1–S3.

Machine learning models

Several different ML models were evaluated: multi-layer perceptron (neural network), support vector machines, k-nearest neighbors, linear regression, random forest regression, and XGBoost regression. We selected the ML models that performed best on the internal validation set: linear regression, random forest and XGBoost.

  • Linear regression: A linear regression model with L1 (lasso regression) and L2 (ridge regression) regularization. Lasso is an acronym for least absolute shrinkage and selection operator. Lasso regression adds the absolute value of each coefficient as a penalty term to the loss function, so the penalty grows linearly with coefficient magnitude. Ridge regression adds the squared value of each coefficient as the penalty term, so the penalty grows quadratically with coefficient magnitude. This method was included as a baseline comparison.

  • Random forest regression: Random forest is an ensemble supervised ML algorithm made up of decision trees and is used for both classification and regression problems. A random forest model with 100 trees was used. The model is constructed using bootstrapping, i.e., each tree is trained on a dataset of the same size as the original, created by resampling with replacement. Furthermore, the candidate variables at each split are randomly selected to maximize diversity among the decision trees18,19.

  • XGBoost for regression: XGBoost is another powerful approach for building supervised regression models. An XGBoost model with 100 trees and mean squared error as its loss function was used. Boosting is a popular ensemble method that builds models sequentially, in this case decision trees, where each model learns from the errors of the previous one20.

Random forest creates decision trees independently and combines their outputs, whereas XGBoost builds trees sequentially to correct errors. As their base models, the random forests used predictive clustering trees19, a variant of decision trees that employs variance reduction as its heuristic. Furthermore, these trees can naturally handle missing covariate values. The library scikit-learn (version 1.4.2) was used for the linear regression and the random forest, whereas XGBoost was implemented using the library of the same name (version 2.0).
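For illustration, the sketch below instantiates the three selected regressors with the libraries named above. Apart from the 100 trees, the hyperparameters shown are illustrative defaults rather than those of the study, and the standard scikit-learn random forest shown here does not reproduce the predictive clustering trees mentioned above.

```python
# Sketch of the three selected regressors; X_dev/y_dev (development covariates
# and measured GFR) are assumed to come from earlier steps.
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

models = {
    # Linear regression with combined L1 (lasso) and L2 (ridge) penalties.
    "linear": ElasticNet(alpha=1.0, l1_ratio=0.5),
    # Bagged ensemble of 100 decision trees grown on bootstrap samples.
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # Gradient-boosted ensemble of 100 trees with squared-error loss.
    "xgboost": XGBRegressor(
        n_estimators=100, objective="reg:squarederror", random_state=0
    ),
}

# Usage example (hypothetical variable names):
# for name, model in models.items():
#     model.fit(X_dev, y_dev)
#     egfr_pred = model.predict(X_int)  # internal validation covariates
```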

Statistical analysis

The following commonly used metrics were used to compare the performance of the ML methods with the EKFC equations2,3.

The median bias is the median of the differences between the estimated GFR and the measured GFR. Values close to 0 are desired for this measure, but an absolute bias of less than 5 mL/min/1.73m2 may be considered clinically acceptable.

The interquartile range (IQR) is the range between the 25th and 75th percentiles of the differences between the estimated GFR and the measured GFR and represents precision, expressed in mL/min/1.73m2. Smaller values indicate better precision.

Accuracy within 10% or 30% (P10 and P30) is the percentage of patients whose estimated GFR falls within 10% or 30% of the measured GFR. The goal for P30 is 100%, but P30 > 75% has been considered "sufficient for good clinical decision making" by the Kidney Disease Outcomes Quality Initiative (K/DOQI), although their goal was to reach P30 > 90%21. The mean squared error (MSE) is the average of the squared differences between estimated and measured GFR. Smaller values are associated with better precision.
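A minimal sketch of these metrics, assuming NumPy arrays of estimated and measured GFR in mL/min/1.73m2:

```python
import numpy as np

def gfr_metrics(egfr: np.ndarray, mgfr: np.ndarray) -> dict:
    error = egfr - mgfr
    rel_error = np.abs(error) / mgfr
    q25, q75 = np.percentile(error, [25, 75])
    return {
        "median_bias": float(np.median(error)),          # target: close to 0
        "iqr": float(q75 - q25),                         # precision, smaller is better
        "p10": float(np.mean(rel_error <= 0.10) * 100),  # % within 10% of mGFR
        "p30": float(np.mean(rel_error <= 0.30) * 100),  # % within 30% of mGFR
        "mse": float(np.mean(error ** 2)),
    }
```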

Median bias across the age spectrum was presented graphically using fractional polynomials (linear, square and logarithmic). Likewise, accuracy P30 (%) across the age spectrum was presented graphically using cubic splines with two free knots and 3rd-degree polynomials. Bland–Altman plots (difference versus average) were also used to examine the differences in performance between the ML models and EKFC.
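A minimal sketch of a Bland–Altman plot under the same assumptions as above (the ±1.96 SD limits of agreement are the conventional choice and are not specified in the text):

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman_plot(egfr: np.ndarray, mgfr: np.ndarray) -> None:
    mean = (egfr + mgfr) / 2.0
    diff = egfr - mgfr
    md, sd = np.mean(diff), np.std(diff)
    plt.scatter(mean, diff, s=8, alpha=0.5)
    plt.axhline(md, color="black")                      # mean difference
    plt.axhline(md + 1.96 * sd, color="grey", ls="--")  # upper limit of agreement
    plt.axhline(md - 1.96 * sd, color="grey", ls="--")  # lower limit of agreement
    plt.xlabel("Average of estimated and measured GFR (mL/min/1.73m2)")
    plt.ylabel("Estimated minus measured GFR (mL/min/1.73m2)")
    plt.show()
```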

Median bias, P10 and P30 are reported with 95% CIs. To compare equations in the same population, we did not use statistical tests, to avoid numerous p-value calculations; instead, the reader may consider two equations different when their 95% CIs do not overlap (which is a more conservative criterion). We made sub-analyses according to age (younger than 6, 6 to 12, 12 to 18, 18 to 40, 40 to 65 and > 65 years), body mass index (BMI) (< 20, 20 to 25, 25 to 30, 30 to 35 and > 35 kg/m2), measured GFR (mGFR) (< 30, 30 to 60, 60 to 90, 90 to 120, > 120 mL/min/1.73m2), and sex. SHAP (Shapley additive explanations) values were used to better understand variable importance in the ML models22. SHAP values were generated using the SHAP library (version 0.43.0).
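A minimal sketch of generating SHAP values for one of the fitted tree-based models with the SHAP library named above; `model` and `X_ext` (external-validation covariates) are assumed to come from earlier steps.

```python
import shap

# TreeExplainer is suited to tree ensembles such as random forest and XGBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_ext)  # per-patient, per-covariate contributions

# Global variable importance summarized across patients.
shap.summary_plot(shap_values, X_ext)
```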