Data and Computational Science



Advancements in artificial intelligence (AI) and computational technology have revolutionised the cancer research landscape. To better understand how cancer initiates and progresses, we can interrogate any tumour from multiple perspectives, including descriptive health records, different imaging modalities, molecular sequencing, and beyond. In the treatment setting, these data, integrated with established knowledge of functional pathways, contain crucial information that clinicians use to direct therapy for each cancer patient. In the research setting, the collective data describing thousands of tumours is a powerful resource that can serve as a reference or background for discoveries that improve our understanding of cancer and change the direction of cancer treatment in the future.

The Data and Computational Science (DCS) platform helps to enable these discoveries. Today’s cancer studies frequently generate high volumes of data that require specialised resources and expertise. DCS features high-powered workstations capable of processing these ‘big data’ profiles and running advanced interpretable machine learning algorithms and robust statistical techniques. Our analysts have experience and expertise spanning diverse AI applications, and can guide all stages of data analysis, from experimental design and multi-omic analysis through to inference and translation. DCS offers an in-house, centralised solution for NCCS researchers who require computational analysis, without the need to buy specialised equipment or contract with third-party vendors. In addition to supporting NCCS researchers, DCS also has its own projects focusing on real-time analytics, machine learning and AI applications across three key domains: clinicogenomics, public health, and imaging analysis.

DCS enables cutting-edge research by providing a core platform to analyse big datasets. By providing an in-house computational biology solution, DCS allows researchers to interrogate large amounts of data on-site and in a secure environment. As a resource, we provide several levels of support, from basic data processing to collaborative analyses:
  i) Pre-processing of raw data to provide analysis-ready data such as mutation calls or gene expression matrices
  ii) Providing an environment where NCCS researchers' code can be run on high-powered computers by DCS analysts
  iii) Hypothesis-directed or hypothesis-generating data analyses and visualisations to provide NCCS researchers with biological insights such as clinical associations

DCS offers transparency and flexibility in the methods used to process and analyse data. In addition to deploying best-practice analysis pipelines, our analysts are also available to guide bespoke analyses according to the research aims of our scientific collaborators. All analyses are conducted in a secure offline computational environment. 

In addition to the computational infrastructure, DCS is building ANCHOR, a pan-cancer research database that will house all clinical information from cancer patients treated in SingHealth healthcare institutions. ANCHOR will describe de-identified and harmonised data from retrospective and prospective patients over a period of 30 years. ANCHOR will serve as a valuable resource to enable clinical analyses focused on population health in Singapore. 

DCS is planning advancements in the key areas of computational infrastructure, data generation and collection, and research output: 

Computational infrastructure 
  • Acquire a new GPU-powered workstation to double our capacity for deep learning and image analysis 
  • Build data pre-processing algorithms to streamline the analysis of novel data profiles, including those generated from patient-derived xenograft and single-cell analyses

Data infrastructure 
  • Implement image processing workflows to automatically detect features of interest in radiology and pathology datasets
  • Expand support of single-cell sequencing pre-processing and analysis pipelines 
  • Develop a research ecosystem that allows the extraction and integration of quality data and metadata through ANCHOR 
  • Implement a flexible high-performance multi-modal data backend that also complies with international standards such as OMOP 

Research 
  • Biomarker discovery in ~1,500 mutational and gene expression profiles across several tumour types, including nasopharyngeal, head & neck, lung, and prostate cancers
  • Integrative analysis of transcriptomic gene profiling and pathology imaging to identify RNA correlates of slide-based features such as tumour purity and staging 
  • Data mining of population health descriptions, care indicators and care utilisation
  • Comparative effectiveness studies and significance of variability using ANCHOR data 

DCS-associated projects have been presented at international conferences such as ASCO Breakthrough and the International Congress of Genetics. We have also initiated several local and international industry and academic collaborations, including: 

Local
  • GIS – To collaborate on 2 projects: (1) TISUMAP, a 5-year IAF-PP collaboration between 3 National Institutes and 1 Healthcare Cluster (GIS, BII, NCCS, and SingHealth) to leverage AI for digital pathology; (2) a long-read sequencing project in collaboration with Prof. JJ Liu, PI
  • Increasing centre-wide support for NCCS projects including drug discovery efforts within the Cancer Therapeutics Research Lab (PIs: Professor Gopal Iyer & Associate Professor Daniel Tan) and with clinical trials data analyses (PI: Professor Darren Lim) 

International
  • DecipherBio, Veracyte – To run combined analyses of >100,000 prostate cancer transcriptomes 
  • University of California Los Angeles – To organise faculty exchanges, and training courses for staff at NCCS 
  • Weill Cornell Graduate School of Medical Sciences – To exchange knowledge and continuously improve research data informatics approaches 
  • ArteraAI – To collaborate on digital pathology AI projects to build prognostic models