Real-Time Identification and Tracking of Pathogens via 'Big Data' Machine Learning

Project summary

The UK Chief Medical Officer noted in 2013 that ‘infectious disease is as great a threat to national security as climate change’ and that ‘the challenge in identifying future threats is not the acquisition of data, but in using these huge databases with new computer science methods’. Existing methods for detecting and tracking pathogens (such as E. coli and MRSA in the developed world, and tuberculosis in the developing world), and for determining the susceptibility of pathogens to antibiotic drugs, rely on laboratory-based ‘phenotypical’ tests, which can take up to six weeks to perform. With whole-genome data now becoming available from patient blood samples at the point of care, there is an urgent need to develop novel methods that can run in real time, using genomic data to identify and track pathogens. Such a system could provide real-time support to clinicians, helping them to respond rapidly to novel threats and to better deploy antibiotics in the battle against killer pathogens.

This project linked clinical experts in the Oxford University Hospitals NHS Trust (OUH NHS Trust) with data-management specialists from Computer Science and machine learning specialists from Engineering Science. The clinical team at the OUH NHS Trust collaborating on this project has one of the world’s largest collections of real-time, disparate healthcare datasets of its kind. The data, provided by the clinical collaborators in anonymous form, was used to develop novel machine learning tools based on large-scale Bayesian ‘big data’ modelling. The project leveraged this unique data resource to better understand the acquisition and propagation of infectious disease throughout the healthcare system, and to show that such methods can be used for earlier identification of new threats (‘novelty detection’ of pathogens) and can lead to improved patient outcomes, with a reduction in morbidity and mortality from infection. It is expected that the methods used for tracking and combating tuberculosis will be of particular relevance to healthcare systems in developing nations, where mortality from this pathogen is high. This project was the first of its kind, in which novel machine learning methods were developed for real-time analysis of data feeds acquired from within the OUH NHS Trust.

Lead investigator

Dr David A. Clifton, Balliol College and Institute of Biomedical Engineering, University of Oxford

Research team

Professor Gari D. Clifford, Department of Medical Sciences, Emory University

Professor Derrick Crook, Nuffield Department of Medicine, University of Oxford

Professor Tim Peto, Oxford University Hospitals NHS Trust

Professor Jim Davies, Department of Computer Science, University of Oxford

Katherine Niehaus, Department of Engineering Science, University of Oxford

Contact details for enquiries

Please email the lead investigator, David Clifton, with any queries regarding this project.