PubMed Text Data Mining Automation for Biological Validation on Lists of Genes and Pathways

Hui Wen Nies; Zalmiyah Zakaria; Weng Howe Chan; Izyan Izzati Kamsani; Nor Shahida Hasan

doi:10.11113/ijic.v12n1.313

Authors

Hui Wen Nies, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, https://orcid.org/0000-0003-4521-1648
Zalmiyah Zakaria, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia
Weng Howe Chan, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia
Izyan Izzati Kamsani, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia
Nor Shahida Hasan, Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Kuala Lumpur 54100, Malaysia

DOI:

https://doi.org/10.11113/ijic.v12n1.313

Keywords:

PubMed, text data mining, biological validation, cancer markers, diseases, genes, pathways

Abstract

A prognostic cancer marker is useful in oncology for identifying abnormal cancer cells in a collected sample. Such a marker can serve as an indicator of disease outcome and can inform cancer treatment and drug discovery. Identifying cancer markers also supports treatment decision-making, which can improve cancer patients' survival rates. Cancer markers can be determined either by manually testing every gene or pathway in the wet lab or by using automated text mining. Text mining techniques can uncover hidden information and gather new knowledge from many existing sources. However, querying relevant text to extract important information is a challenging task. PubMed text data mining is one application that helps explore potential cancer markers, as the number of scientific articles in PubMed is steadily increasing. It also helps biologists concentrate on a small, identified set of genes or pathways. PubMed identifiers (PMIDs) are then obtained as evidence of the relationship between diseases and genes (or pathways), and this evidence serves as biological validation. In this way, the technique can uncover biological relationships between a disease and genes or pathways. Existing approaches to the biological validation of genes and pathways are commonly manually curated. Manual curation is time-consuming and may lead to inconsistency. This study aims to automate the biological validation of genes and pathways through PubMed text data mining. To that end, PubMed text data mining automation was developed to link to the relevant websites automatically, saving time compared with doing so manually. A list of genes and pathways from breast cancer is used in this study. Using PubMed text data mining automation for biological context verification and validation, the p53 signaling pathway and the TP53 gene were identified as prognostic cancer markers for breast cancer. Hence, the p53 signaling pathway and TP53 are associated with the development of tumour cells and DNA damage after irradiation in breast cancer.
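
The abstract does not describe how the authors' tool is implemented. As a rough illustration of the general idea only (pairing a disease with a gene or pathway name, querying PubMed, and collecting PMIDs as evidence), the following minimal Python sketch assumes the public NCBI E-utilities ESearch endpoint is used. The function names, the choice of the `[Title/Abstract]` field tag, and the default `retmax` are all assumptions made for this example and are not taken from the paper.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint (assumed here; the paper's tool may differ).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(disease, term):
    """Build a PubMed query requiring both phrases in the title or abstract."""
    return f'"{disease}"[Title/Abstract] AND "{term}"[Title/Abstract]'


def build_esearch_url(disease, term, retmax=20):
    """Build the ESearch URL for one disease/gene (or pathway) pair."""
    params = {
        "db": "pubmed",
        "term": build_query(disease, term),
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text):
    """Extract the list of PMIDs from an ESearch JSON response body."""
    result = json.loads(response_text).get("esearchresult", {})
    return result.get("idlist", [])


def fetch_pmids(disease, term, retmax=20):
    """Query PubMed over the network and return matching PMIDs as strings."""
    with urlopen(build_esearch_url(disease, term, retmax), timeout=30) as resp:
        return parse_pmids(resp.read().decode("utf-8"))


def validate_terms(disease, terms, retmax=20):
    """Map each gene or pathway name to the PMIDs found alongside the disease."""
    return {term: fetch_pmids(disease, term, retmax) for term in terms}
```

For instance, `validate_terms("breast cancer", ["TP53", "p53 signaling pathway"])` would return a dictionary from each term to a list of PMIDs. A term with an empty list has no co-occurrence evidence under this particular query, which is a much cruder criterion than curated validation. When looping over long lists, NCBI's usage guidelines on request rates should be respected.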

IJIC