Health Data Science

Health Data Science is a broad and growing research field that uses health data for medical research. It encompasses informatics, statistics, mathematics, Machine Learning (ML) and Artificial Intelligence (AI) applied to health data.

Within the academic arm of the Health Informatics Centre (HIC), we are trying to answer three key research questions:

  • How can scalable research access to routinely collected patient data be enabled whilst maintaining patient trust and confidentiality?
  • How can patient data be analysed and visualised to help answer key clinical and research questions?
  • How can advanced algorithms such as ML and AI be trained on patient data for use in clinical care?

We are part of Health Data Research UK (HDR UK), a virtual institute “uniting the UK’s health data to enable discoveries that improve people’s lives”. We are collaborating with seven other academic groups from across the UK to build the National Phenomics Resource, which is integrated into the HDR Gateway.

We are developing federated data architectures that support automated, scalable and anonymised access to data for research in a safe and secure way. This includes leading both the £4M UK-wide COVID-19 CO-CONNECT project, funded by NIHR/DHSC and MRC, and the HDR UK Scottish Federated Data Project.

Under the Health Data Science theme, we are researching how Trusted Research Environments (TREs), also known as Safe Havens, can be enhanced to support next-generation capabilities. These include processing multi-omic data (genomic, proteomic and imaging data), AI and ML, software development, High Performance Computing (HPC) batch processing and GPU access. This requires research in cybersecurity, HPC infrastructure, data management and security, and software architecture. The work has been funded through a range of initiatives, including the HDR Multiomics data project, the £2.1M NIHR Antimicrobial resistance project, the £3.8M MRC PICTURES Programme and the £7M NIHR INSPIRED Programme.

Multi-omic tissue datasets are ideal resources for understanding the environmental and genetic causes of disease. By applying computational methods to these high-dimensional datasets, we extract patterns that implicate genes and tissues in disease development. Such information is increasingly used in novel drug development and in repurposing existing drugs for new diseases, and it has the potential to improve clinical decision-making in the future.

We are interested in how new AI and ML algorithms can be used to support patient care, for example detecting dementia in MRI scans (MRC PICTURES Programme) and stratifying patients with different types of hypertension (EU Horizon 2020 ENSAT-HT project).

Principal Investigators

Teaching programme