Analysis of m6A-regulated genes and subtype classification in lupus nephritis – BMC Nephrology

Profile of m6A regulators in lupus nephritis

We identified a total of 4739 Differentially Expressed Genes (DEGs) between control and lupus nephritis (LN) samples, and further analyzed the differential expression levels of m6A regulator. We found that ZC3H13 and IGFBP1 were lowly expressed in LN, and CBLL1, YTHDC1, YTHDC2, YTHDF2 and HNRNPA2B1 were highly expressed in LN (Fig. 1A and B). The chromosomal locations of m6A regulators were visualized using the “RCircos” package (Fig. 1C). We also explored the m6A regulators for correlation analysis, as shown in Fig. 1D.

Fig. 1
figure 1

Landscape of and correlations between m6A-regulated genes in LN. A Heatmap of the expression of m6A regulatory factors in control and LN; B Histogram of the expression of m6A regulatory factors in control and LN; C Chromosomal location of m6A regulatory factors. D The heatmap of the correlations between m6A regulatory factors. * p < 0.05, ** p < 0.01, *** p < 0.001

Construction of RF and SVM models

We constructed RF and SVM models and the candidate m6A modulators were selected to predict the occurrence of LN. In the RF model, the precision value was 1, the recall value was 1, and the F1 value was 1. In the SVM model, the precision value was 0.987(0.963,1.012), the recall value was 1.000(1.000,1.000), and the F1 value was 0.993. The residual box plots and the inverse cumulative distribution of residuals showed that the RF model had the smallest residuals (Fig. 2A and B), and the AUC values of the ROC curves also showed that the RF model had higher accuracy than the SVM model (Fig. 2C), so we chose the RF model as the best model for predicting LN occurrence. The LN signature genes were screened by RF, and YTHDC1, HNRNPA2B1, CBLL1, ZC3H13, IGFBP1, YTHDC2, and YTHDF2 were selected as candidate genes (Fig. 2D and E).

Fig. 2
figure 2

RF and SVM models construction: A The boxplot of residuals for the RF model and the support vector machine (SVM) model; B The inverse cumulative distribution of residuals for the RF model and the SVM model; C ROC curves for the RF and SVM models; D Random forest plot; E The importance of the 14 differentially expressed m6A regulators in the RF model

Construction of the nomogram model

We used the “rms” package in R to construct a nomogram model based on these 7 important m6A regulators to predict the prevalence of LN (Fig. 3A). The calibration curves showed that the predictions of the nomogram models were accurate (Fig. 3B). Decision curves showed that decisions based on the nomogram model favored LN (Fig. 3C). The clinical impact curve showed that the predictive power of the nomogram model was significant (Fig. 3D).

Fig. 3
figure 3

Construction of the nomogram model: A Construction of the nomogram model based on seven m6A regulator candidate genes; B Calibration curves showing the predictive ability of the nomogram model; C Decision curves showing the decision-making ability of the nomogram model; D Clinical impact curves assessing the clinical impact of the nomogram model

Identification of different m6A patterns

Using the “ConsensusClusterPlus” package in R, a consensus clustering method was used to identify different m6A patterns based on 7 important m6A regulators, and 2 m6A patterns (cluster A, cluster B) were identified (Fig. 4A). Cluster A contained 37 cases and cluster B contained 41. heatmaps and histograms suggested that YTHDC1, HNRNPA2B1, CBLL1, ZC3H13, IGFBP1, YTHDC2, and YTHDF2 were more highly expressed in cluster B than in cluster A (Fig. 4B and C). PCA showed that 7 important m6A regulators could distinguish between these two m6A patterns (Fig. 4D).

Fig. 4
figure 4

Clustering analysis of 7 significant m6A regulators associated with LN. A When k = 2, the consensus clustering analysis was the most reliable. B The heatmap of the differential expression of the 7 m6A-regulated genes between cluster A and cluster B. C The box plot of the differential expression of the 7 m6A regulators between cluster A and cluster. D Principal component analysis (PCA) of cluster A and cluster B. ***P < 0.001

We applied ssGSEA to calculate the abundance of immune cells in LN samples and assessed the correlation between the seven important m6A regulators and immune cells, and we found that most immune cells were significantly more infiltrated in cluster B than in cluster A (Fig. 5A). We performed a correlation analysis between m6A regulatory genes and immune cells and found that these seven important m6A regulators were positively correlated with most immune cells (Fig. 5B). As shown in Fig. 5C-I, the expression of m6A regulators also affected the degree of immune cell infiltration.

Fig. 5
figure 5

Single-sample gene set enrichment analysis (ssGSEA). A Differences in immune cell infiltration between cluster A and cluster B. B Immuno-correlation analysis between m6A-regulated genes and immune cells. C-I Differential immune cell infiltration between groups with lower and higher expression of these 7 m6A-regulated genes

Construction of the m6A gene signature

We screened the differentially expressed genes (DEGs) between different m6A clusters which were the m6A-associated DEGs, and identified 3058 m6A-associated DEGs between the two m6A subtypes (Fig. 6A). We analyzed these DEGs by GO and KEGG enrichment, and in biological processes (BP), DEGs were mainly involved in the organic acid catabolic process, carboxylic acid catabolic process, small molecule catabolic process, etc. In the cellular component (CC), DEG is mainly enriched in apical part of cell, apical plasma membrane, membrane raft, etc. In molecular function (MF), DEG is mainly associated with the apical part of cell, apical plasma membrane, and membrane raft. The molecular function (MF) was mainly related to active transmembrane transporter activity, electron transfer activity, primary active transmembrane transporter activity, etc. (Fig. 6B). KEGG analysis showed that DEG was mainly involved in Valine, leucine and isoleucine degradation, Cholesterol metabolism, Complement and coagulation cascades (Fig. 6C). We categorized patients according to DEG into gene cluster A and cluster B by consensus clustering method (Fig. 6D). YTHDC1, HNRNPA2B1, ZC3H13, IGFBP1, YTHDC2, and YTHDF2 were significantly higher expressed in gene cluster B than in gene cluster A (Fig. 6E). ssGSEA analysis revealed that the infiltration of the majority of the immune cells in gene cluster B was significantly higher than gene cluster A (Fig. 6F).

Fig. 6
figure 6

Consensus clustering analysis of the differentially expressed genes (DEGs). A A total of 3058 DEGs were identified between cluster A and cluster B. B Gene Ontology analysis of these DEGs. C Kyoto Encyclopedia of Genes and Genomes pathway analysis of these DEGs. D When k = 2, the consensus clustering analysis was the most reliable. E The box plot of the differential expression of the 7 m6A regulators between cluster A and cluster B. F Differences in immune cell infiltration between cluster A and cluster B

We scored LN patients based on the expression of m6A regulators using principal component analysis (PCA), defined as m6A scores. m6A scores between m6A clusters and gene clusters differed significantly (Fig. 7A and B). Sankey plots showed the distributions of the two m6A clusters, two gene clusters, and two m6A scores. m6A Cluster A and Gene Cluster A corresponded to low m6A scores, while m6A cluster B and gene cluster B corresponded to high m6A scores (Fig. 7C). As shown in Fig. 7D and E, there were also significant differences in inflammatory factor levels between m6A clusters and between gene clusters. Therefore, the two subgroups associated with m6A (m6A clusters and m6A gene clusters) can help us to predict inflammatory factor levels and disease risk scores in LN patients.

Fig. 7
figure 7

Role of m6A subtypes and m6A gene subtypes in distinguishing LN. A m6A score differences between m6A subtypes. B A m6A score differences between m6A gene subtypes. C Sankey diagram displaying the distribution of LN patients in the two m6A clusters, two gene clusters and two m6A score groups. D The expression levels of inflammatory factors in the two m6A clusters. E The expression levels of inflammatory factors in the two gene clusters. ***P < 0.001