Machine Learning Approach for the Prediction of Bladder Cancer Stages Based on Next-Generation Sequencing Data
DOI:
https://doi.org/10.26437/hsea1s73
Keywords:
Bioinformatics, bladder cancer, machine learning, next-generation sequencing, RNA-seq
Abstract
Purpose: This paper applies machine learning algorithms to classify the stages of bladder cancer (BCa) using RNA-seq transcripts per million (TPM) gene expression data and the corresponding pathological stages from the TCGA database. The objective is to assess classification performance across the different stages.
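For readers unfamiliar with the TPM unit, the sketch below shows how TPM values are conventionally derived from raw read counts and gene lengths. It is illustrative only: the study used TPM values from TCGA, and the function name `counts_to_tpm` and the toy numbers are assumptions made for this example, not part of the authors' workflow.

```python
import numpy as np


def counts_to_tpm(counts, lengths_bp):
    """Convert a genes-by-samples count matrix to TPM (hypothetical helper).

    counts     : array of shape (n_genes, n_samples), raw read counts
    lengths_bp : array of shape (n_genes,), gene lengths in base pairs
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    # Step 1: reads per kilobase, which corrects for gene length
    rate = counts / lengths_kb[:, None]
    # Step 2: rescale each sample (column) so its values sum to one million
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


# Toy data: 3 genes, 2 samples (made-up values)
toy_counts = [[10, 0],
              [20, 30],
              [5, 15]]
toy_lengths = [1000, 2000, 500]

tpm = counts_to_tpm(toy_counts, toy_lengths)
print(tpm.round(1))
print(tpm.sum(axis=0))  # each sample sums to 1e6
```

Because every sample is rescaled to the same total, TPM values describe relative rather than absolute expression within a sample.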
Design/Methodology/Approach: This study applied a computational research design to publicly available BCa gene expression data from The Cancer Genome Atlas (TCGA). Multiple supervised machine learning algorithms were trained and evaluated on binary classification tasks using 3-fold nested cross-validation (nCV), with a forward feature selection technique used to select the best features for the ML classifiers. Dataset preprocessing was carried out in two phases using the R and Python programming languages.
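The evaluation design described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins for a samples-by-genes matrix, and the classifier (logistic regression), the number of selected features, the hyperparameter grid, and the scoring metric are all choices made for this example. The key point it shows is that forward feature selection sits inside the pipeline, so it is refitted on each training fold and never sees the held-out samples.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for expression data with binary stage labels (assumption)
X, y = make_classification(
    n_samples=120, n_features=10, n_informative=3, random_state=0
)

# Scaling, forward feature selection, and the classifier in one pipeline,
# so every step is fitted on training folds only
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=3,   # illustrative value
        direction="forward",
        cv=3,
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner loop tunes hyperparameters; outer loop estimates performance
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0]},  # illustrative grid
    cv=inner_cv,
    scoring="balanced_accuracy",
)

outer_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="balanced_accuracy"
)
print("Balanced accuracy per outer fold:", outer_scores.round(3))
print("Mean balanced accuracy:", outer_scores.mean().round(3))
```

Reporting the outer-fold scores, rather than the best inner-loop score, avoids the optimistic bias that arises when the same data are used both to select features or hyperparameters and to estimate performance.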
Research Limitation: Because the study relies on downloaded public data, any bias introduced by the original data generators may carry over into the results.
Findings: This study suggests that TPM profiles of bulk RNA-seq samples are unreliable for separating adjacent stages of bladder cancer. Bulk transcriptomic data should therefore not be used alone to inform treatment decisions for bladder cancer. Rather, it will be more informative to integrate molecular subtyping with multi-omics data or to build models that directly predict clinical outcomes.
Practical Implication: In practical terms, bulk RNA-seq TPM data should not be relied on alone for staging bladder cancer in clinical or predictive settings. More informative approaches, such as combining molecular subtypes, integrating multi-omics data, or focusing on models that predict clinical outcomes, are likely to provide greater value for decision-making and future research.
Social Implication: The study highlights the risk of over-relying on AI diagnostics that do not capture the full biological characteristics of the disease; recognising this risk is essential for protecting patient safety.
Originality/Value: This research examined the application of machine learning algorithms to predict bladder cancer stages from RNA-seq TPM gene expression (NGS) data in the TCGA database, an approach that researchers have not previously considered.
License
Copyright (c) 2026 AFRICAN JOURNAL OF APPLIED RESEARCH

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
By submitting and publishing your articles in the African Journal of Applied Research, you agree to transfer the copyright of the article from the authors to the journal.