
Deep learning algorithms for predicting renal replacement therapy initiation in CKD patients: a retrospective cohort study – BMC Nephrology

This study aimed to identify the most effective algorithm for predicting the risk of starting RRT among CKD patients within a given period. This was done by developing deep learning algorithms (DLAs) and comparing their performance with that of the KFRE. To the best of the authors’ knowledge, this is the first head-to-head comparison between the KFRE and different neural networks using datasets of patients who had not been recruited into other studies before.

The study used different deep learning techniques to develop five models predicting the risk of starting RRT within two years and within five years. All models were validated on a subset of patient data that had been held out from the training dataset since the start of model development.

Of the five DLAs developed, the convolutional neural network (CNN) and a neural network combining CNN, long short-term memory (LSTM), and artificial neural network (ANN) layers showed the most accurate performance, with the highest ROC-AUC scores. In fact, the CNN model slightly outperformed the combined neural network. This was unexpected, as a more complex neural network was assumed to be able to consider more features and temporal relationships. The most likely explanation is that the relatively small training and validation datasets caused overfitting in the more complex models. Further model development with larger, multicenter datasets should improve the models’ performance.
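The ROC-AUC used to compare the models above can be read as the probability that a randomly chosen patient who started RRT receives a higher predicted risk than a randomly chosen patient who did not (the Mann-Whitney interpretation). A minimal sketch of that computation, with invented labels and scores rather than the study's data:

```python
# Sketch: ROC-AUC via the rank (Mann-Whitney) interpretation.
# labels: 1 = started RRT within the horizon, 0 = did not.
# scores: a model's predicted risk for each patient (illustrative values).

def roc_auc(labels, scores):
    """Fraction of positive/negative pairs the model ranks correctly;
    ties count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.3, 0.4, 0.8, 0.2]
print(roc_auc(labels, scores))  # → 1.0: every positive outranks every negative
```

In practice a library routine such as scikit-learn's `roc_auc_score` would be used; the point here is only what the metric measures when two models are compared.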

The review of mislabeled patients, particularly false positives, showed that DLAs can pick up non-linear relationships among different features and provide predictions that may outperform the initial labelling system (Supplementary Table ST4). The combined model's fewer completely missed predictions, and its prediction of uncoded renal transplantation, also suggest that a combined model may outperform single-architecture neural networks given adequate training data.

The superior performance of DLAs in this study has three implications. Firstly, a patient’s medical history and recent prescriptions are essential features when estimating the risk of renal failure, and they should be included in any RRT risk estimation algorithm. Secondly, machine learning or deep learning can provide more accurate predictions than traditional linear, logistic, or Cox regression methods; unlike those methods, neural networks can incorporate temporal relationships during model development, which gives them an advantage. Thirdly, the ability to flag unrecorded renal transplants and patients who started dialysis five years later demonstrates that AI algorithms can recognize complex patterns that may not be apparent to humans.

This study offers a significant advantage by providing an enhanced, rapid, and automated prediction of RRT risk in CKD patients, eliminating the need for additional investigations and without disrupting existing protocols or workflows. The entire training and validation process can be conducted locally on a standard laptop computer. The implementation of AI-based DLAs can lead to better decision-making in clinical settings [14]. By accurately identifying patients who are more likely to require RRT, DLAs have the potential to reduce referrals that may not be necessary, particularly in cases where traditional methods might overestimate the risk of disease progression. The scalability of neural network training allows for efficient allocation of resources in local medical centers, optimizing patient care. In an era of growing CKD patient numbers and strained renal services, DLAs offer an objective and practical tool to assess RRT initiation risk, enabling patients to receive extended primary or community-based medical care before referral to specialist services.

To address the generalizability of these tools to primary care settings, where the scope of data collection might differ from that in specialized care, our study leverages the extensive data available through the Hospital Authority, the sole public primary healthcare provider, which maintains a centralized patient record database ensuring comprehensive data collection and accessibility. This centralized system is instrumental in collecting demographic, biochemical, pharmacological, and ICD-10 code data, which are pivotal for the accuracy and applicability of our models. The interoperability between the public and private healthcare sectors significantly enhances the utility of our models across diverse care settings. Private primary care providers have access to data recorded and maintained by public centers, thanks to established data-sharing protocols [15]. This interconnectivity ensures a broader data foundation, which is vital for the effective implementation of predictive models in primary care.

However, the study also has a few limitations. Firstly, the neural networks developed in this study are still “black box” in nature, making it difficult for clinicians to explain predictions and build rapport with patients when they need RRT. Explainable AI predictions would also earn clinicians' trust, making them more confident in applying these models in clinical practice [16]. Implementing SHapley Additive exPlanations (SHAP) may help address this issue by presenting important risk factors to patients graphically [17].
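The idea behind SHAP is to attribute a prediction to individual features via Shapley values. A minimal sketch of the exact computation for a toy additive risk model, enumerating feature orderings; the feature names and weights are invented for illustration, and the `shap` package approximates this efficiently for real neural networks:

```python
# Sketch: exact Shapley values for a tiny, invented "risk model".
from itertools import permutations
from math import factorial

def model(x):
    # Toy additive risk score; a stand-in for a trained network.
    return 0.5 * x["egfr_decline"] + 0.3 * x["acr"] + 0.2 * x["age"]

def shapley(f, x, baseline):
    """Average each feature's marginal contribution over all orderings
    in which features are switched from baseline to observed values."""
    feats = list(x)
    phi = {k: 0.0 for k in feats}
    for order in permutations(feats):
        current = dict(baseline)
        for k in order:
            before = f(current)
            current[k] = x[k]
            phi[k] += f(current) - before
    n = factorial(len(feats))
    return {k: v / n for k, v in phi.items()}

x = {"egfr_decline": 1.0, "acr": 1.0, "age": 0.0}
baseline = {"egfr_decline": 0.0, "acr": 0.0, "age": 0.0}
print(shapley(model, x, baseline))
```

For an additive model the attributions recover each term's contribution exactly; for a neural network the same averaging captures interaction effects, which is what makes a SHAP bar or waterfall chart a faithful per-patient explanation.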

Secondly, the performance of neural networks relies on the quality and quantity of training data. The data collected from the three medical centers predominantly represent the Chinese population, without recorded ethnicity. Models trained solely on these data may introduce bias when applied in foreign medical centers. The use of historical data from second-tier clinics may introduce concept drift, limiting the model's accuracy when applied in a realistic primary care setting [18]. To handle missing data, our study utilized the Last Observation Carried Forward (LOCF) approach, which suits our healthcare system’s centralized data mechanism. In more decentralized systems, techniques such as Multiple Imputation using Chained Equations (MICE) or Probabilistic Principal Component Analysis (PPCA) should be considered [17].
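The LOCF approach mentioned above can be sketched in a few lines of pandas on an invented longitudinal lab table (the column names and values are illustrative, not the study's schema); the key point is that values are carried forward only within a patient's own record:

```python
# Sketch: Last Observation Carried Forward (LOCF) per patient.
import pandas as pd

df = pd.DataFrame({
    "patient":    ["A", "A", "A", "B", "B"],
    "visit":      [1, 2, 3, 1, 2],
    "creatinine": [110.0, None, 130.0, 95.0, None],  # illustrative labs
})

# Carry each patient's last observed value forward; the groupby ensures
# a value is never borrowed from a different patient's record.
df["creatinine"] = df.groupby("patient")["creatinine"].ffill()
print(df["creatinine"].tolist())  # → [110.0, 110.0, 130.0, 95.0, 95.0]
```

A value missing at a patient's first visit stays missing under LOCF, which is one reason alternatives such as MICE or PPCA matter in sparser, decentralized datasets.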

Additionally, due to ethical constraints, the research team only had access to limited patient information, including ICD coding, prescriptions, and biochemical investigation reports, without consultation records. This may lead to imperfect patient labeling and model training, potentially excluding individuals who received prescriptions, renal transplantations, or hemodialysis in other countries. The low donation and transplantation rate in Hong Kong also limited the amount of training data involving renal transplantation, possibly causing bias [19]. Recruiting data from other localities for training would be the most effective solution.

Thirdly, the study was limited by the hardware available, and the neural networks could only be optimized by a randomized hyperparameter optimization algorithm with 30 iterations. Other optimization methods, such as Bayesian algorithms, may produce better models but also consume more computation resources [20].
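The randomized hyperparameter optimization described above can be sketched as follows; the search space, configuration names, and scoring stub are invented stand-ins for the study's actual training loop:

```python
# Sketch: randomized hyperparameter search with a fixed iteration budget.
import random

# Hypothetical search space; the study's real space is not published here.
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_units":  [32, 64, 128, 256],
    "dropout":       [0.0, 0.2, 0.4],
}

def validation_auc(cfg):
    # Stand-in for "train the network with cfg, return validation ROC-AUC".
    random.seed(str(sorted(cfg.items())))
    return random.uniform(0.7, 0.9)

def random_search(n_iter=30, seed=0):
    """Sample n_iter random configurations; keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        score = validation_auc(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

cfg, score = random_search(n_iter=30)
print(cfg, round(score, 3))
```

Random search only samples the space, which is why a model-based method such as Bayesian optimization can find better configurations within the same budget, at the cost of extra computation per iteration.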

Lastly, while the data foundation is robust in terms of accessibility and interoperability, challenges remain, particularly in accessing and integrating private primary care data into public health systems. The variation in medication availability between public and private sectors, with public clinics offering a more limited selection within certain medication families, exemplifies the complexities of data integration across different healthcare settings. Addressing this issue is critical for the seamless application of AI-based predictive models in primary care, ensuring that patients across the healthcare continuum benefit from advanced, data-driven care methodologies.

Regarding the integration of our model into existing healthcare systems and handling the concept drift, we emphasize the necessity of specialized knowledge in Machine Learning Operations (MLOps) for managing data and automating processes. We believe that the design of data collection, management, and handling strategies should be a collaborative effort involving clinicians, data scientists, and machine learning experts prior to implementation. The balance between sensitivity and specificity is critical, and determining an appropriate threshold for clinical action is not solely a data science issue; it involves considering the tolerability of the local healthcare system, including manpower and budget constraints. This underscores the necessity for a collaborative approach between clinicians and data scientists to determine thresholds that optimize clinical utility without compromising patient safety. Currently, we are in the process of planning a prospective observational study to further validate the algorithm’s performance in clinical settings. This step is crucial for ensuring that our model not only demonstrates theoretical efficacy but also practical applicability and integration into the existing healthcare infrastructure.
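The sensitivity/specificity trade-off discussed above amounts to sweeping candidate decision thresholds and reading off both rates at each, so clinicians and data scientists can jointly pick a cut-off matching local referral capacity. A minimal sketch with invented labels and scores:

```python
# Sketch: sensitivity and specificity at candidate decision thresholds.
# labels: 1 = started RRT, 0 = did not; scores: predicted risk (illustrative).

def sens_spec(labels, scores, threshold):
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
for t in (0.3, 0.5, 0.7):
    sens, spec = sens_spec(labels, scores, t)
    print(f"threshold {t}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the threshold catches more future RRT patients (higher sensitivity) but refers more patients unnecessarily (lower specificity); where on that curve to operate is the clinical and resourcing decision the text describes, not a purely statistical one.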

In our study, we trained our algorithm as a classification model. Our intention was to facilitate a clear and intuitive evaluation of our model’s ability to identify high-risk patients requiring RRT. However, we acknowledge that this simplification may not entirely align with the continuous risk assessment provided by the KFRE and may yield a suboptimal comparison. Presenting results as a median time to RRT initiation, or as a probability, is possible and may offer additional insights. Consequently, we suggest that future research could explore the development of a regression model using the same dataset, which might provide a different perspective on patient risk stratification.

Overall, the study’s findings suggest that DLAs can be a valuable tool in predicting the risk of RRT in CKD patients. The ability of DLAs to identify complex patterns and non-linear relationships among different features can outperform traditional methods, such as linear, logistic or Cox regression models. The study also highlights the importance of including a patient’s medical history and recent prescriptions as key features in risk estimation algorithms.

One potential application of these findings is the development of decision-support tools for clinicians. With the accurate predictions provided by DLAs, clinicians could use these tools to inform their clinical decision-making and improve patient care. For example, a tool that predicts the risk of RRT could help a clinician decide whether to refer a patient to a specialist, initiate specific treatments, or implement lifestyle changes. Nevertheless, details of data pipeline design, storage, missing data handling, and result interpretation should be openly discussed. Cooperation between data scientists, AI researchers, and clinical care colleagues is essential to implement AI in modern healthcare.