Ryan Tan, Tan KV, Wong FY

Improving Breast Cancer Risk Prediction in Singapore Women with a Hybrid Machine-Learning Model

Abstract

Introduction: The incidence of breast cancer is rising rapidly in Asia. Compared to other developed countries, Singapore still has a high breast cancer mortality-to-incidence ratio. Population-based mammographic screening has been shown to be acceptable and effective in many parts of the world. However, only 66% of the main target group of women aged 50 to 69 years in Singapore have ever had a mammogram, and half of them do not return for regular screening. Accurate risk stratification may persuade high-risk women to attend regular screening and improve efficiency in the allocation of health resources.

Hypothesis: Traditional breast cancer risk prediction models (based on clinical factors such as menarche, menopause, breastfeeding, obesity, and lifestyle) are not sufficiently discriminatory in identifying high-risk individuals who may require enhanced screening. When applied to a Singapore population, the widely used Gail model over-predicted the risk of breast cancer. Attempts to incorporate breast density, family history, and common genetic variants into risk models improved risk prediction only modestly. Novel approaches are therefore needed to build better risk-stratification models and improve the value and cost-effectiveness of mammography screening programmes.

Methodology: Retrospective cohort study involving the following procedures:
1. Digital raw data from mammogram screens performed between January 2005 and December 2016, together with clinical breast cancer risk factors and clinical outcomes obtained from the patients' clinical records between January 2004 and December 2020, will be retrospectively analyzed.
2. Clinical data extracted for analysis include menopausal status, age, height, weight, age at menarche, number of children, age at first live birth, breastfeeding, family history, date of breast cancer occurrence/recurrence, cancer stage, pathological subtype of cancer, and type of breast cancer treatment.
3. Digital mammogram data will be read by automated breast density software to quantify breast density.
4. The digital mammogram and clinical risk factor data will be analyzed using three approaches: a risk-factor-based logistic regression model (RF-LR) that uses traditional risk factors, a mammogram-alone machine-learning (ML) model, and a hybrid ML model that uses both traditional risk factors and mammogram data. Model performance will be compared using the area under the receiver operating characteristic curve (AUC).
5. The ML models will involve deep convolutional neural networks. Imaging features will be extracted from the raw mammogram data and concatenated with the clinical risk factor vectors, and the combined vectors can then be graded or classified using fully connected networks. Transfer learning and Generative Adversarial Networks might also be used to improve the feature representation.
6. Using the incidence of invasive breast cancers obtained from the clinical data, the models will be trained to predict whether breast cancer will develop within 5 years.

Joint Breast Cancer Registry Singapore
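As an illustration only, the feature concatenation in step 5 and the AUC comparison in step 4 of the Methodology can be sketched as below. This is a minimal sketch and not the study's implementation: the helper names `concat_features` and `roc_auc`, the toy labels, and all scores are hypothetical, and a real pipeline would take the scores from the trained RF-LR, mammogram-alone, and hybrid models.

```python
from typing import Dict, List, Sequence


def concat_features(image_features: Sequence[float],
                    clinical_features: Sequence[float]) -> List[float]:
    """Join an imaging feature vector with a clinical risk factor vector.

    Hypothetical helper: in the hybrid model this combined vector would be
    the input to the fully connected layers.
    """
    return list(image_features) + list(clinical_features)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (made up): 1 = invasive breast cancer within 5 years, 0 = none.
labels = [0, 0, 1, 1, 0, 1]

# Hypothetical predicted risks from two of the three approaches.
model_scores: Dict[str, List[float]] = {
    "RF-LR": [0.10, 0.40, 0.35, 0.80, 0.50, 0.45],
    "hybrid ML": [0.05, 0.30, 0.60, 0.90, 0.40, 0.70],
}

# One AUC per model, computed on the same women so the values are comparable.
aucs = {name: roc_auc(labels, scores) for name, scores in model_scores.items()}

for name, auc in aucs.items():
    print(f"{name}: AUC = {auc:.3f}")
```

The pairwise definition is used here because it needs no external library and makes the interpretation of the AUC explicit. On a cohort of realistic size, a rank-based routine from an established statistics package would be the more practical choice.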