Finding Data

The COVID-Curated and Open aNalysis aNd rEsearCh plaTform (otherwise known as CO-CONNECT) is a UK-wide research programme that helps researchers find and access COVID-19 data at pace whilst ensuring people’s information is kept private and secure.

This video explains how the innovative programme is helping researchers to find data whilst protecting patient confidentiality. 

CO-CONNECT is researching methods that let researchers determine how many people meet their research criteria within datasets held across the UK, using the Cohort Discovery Search Tool embedded within the Health Data Research Innovation Gateway.

A Data Partner is an organisation that holds data and is collaborating with the CO-CONNECT programme. 

Within each Data Partner, a secure computer is set up which is separate from where identifiable data is stored, but still within the Data Partner’s secure environment. Staff within each Data Partner create a copy of the relevant data, with anything they deem to be sensitive or identifiable removed. This is known as pseudonymous data. For example, information like names, addresses, specific dates of birth, and dates of testing or care is removed, and identifiers are converted into new pseudonymous codes. The Data Partner then transfers this pseudonymous data on to the secure computer.

In our example, all that the Data Partner transfers about Jane Doe is the pseudonymous code, the fact that this person had a positive PCR test in the 2nd week of August 2020, and their year of birth.
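The pseudonymisation step described above can be sketched roughly as follows. This is a minimal illustration, not CO-CONNECT's actual implementation: the field names, the `SECRET_KEY`, the exact test date, and the use of a keyed hash to derive the pseudonymous code are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret held only by the Data Partner (assumption for this sketch).
SECRET_KEY = b"example-key-held-by-data-partner"

def pseudonymise(record: dict) -> dict:
    """Return a reduced copy of a record with direct identifiers removed."""
    # Convert the original identifier into a new pseudonymous code.
    # A keyed hash is one possible approach; the real method may differ.
    code = hmac.new(
        SECRET_KEY, record["patient_id"].encode(), hashlib.sha256
    ).hexdigest()[:16]
    iso_year, iso_week, _ = record["test_date"].isocalendar()
    return {
        "pseudo_id": code,
        "test_result": record["test_result"],
        "test_week": f"{iso_year}-W{iso_week:02d}",   # week only, not the exact date
        "birth_year": record["date_of_birth"].year,   # year only, not the full date
    }

# Hypothetical source record held in the identifiable data store.
jane = {
    "patient_id": "1234567890",
    "name": "Jane Doe",
    "address": "1 Example Street",
    "date_of_birth": date(1985, 3, 14),
    "test_date": date(2020, 8, 12),
    "test_result": "positive",
}

pseudo_jane = pseudonymise(jane)
# pseudo_jane contains no name or address, only the code,
# the test result, the test week, and the year of birth.
```

Only the reduced record would be copied on to the secure computer; the name, address, and exact dates stay in the identifiable data store.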

Software within the secure environment of the Data Partner sends a message out to the Gateway, which returns any questions that need to be run on the data. An example question could be “How many people in the dataset have had a positive PCR test and were under the age of 40?”

The software will return a summary answer to each question which shows the number of people who meet the criteria, helping researchers discover the most useful data.   

In our example, there were 102 people who met the criteria.  
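As a rough illustration of how such a question might be answered from pseudonymous records, the sketch below counts distinct people who match the example criteria. The `PseudoRecord` type and its fields are hypothetical, and age is approximated from the year of birth because the full date of birth is not available in the pseudonymous data.

```python
from dataclasses import dataclass

@dataclass
class PseudoRecord:
    """Hypothetical shape of one pseudonymous test record."""
    pseudo_id: str
    test_result: str
    test_year: int
    birth_year: int

def count_positive_under_40(records):
    """Count distinct people with a positive PCR test who were under 40."""
    matching = {
        r.pseudo_id
        for r in records
        # Approximate age from years only; exact dates were removed.
        if r.test_result == "positive" and (r.test_year - r.birth_year) < 40
    }
    return len(matching)

records = [
    PseudoRecord("a1", "positive", 2020, 1985),  # about 35: matches
    PseudoRecord("b2", "positive", 2020, 1950),  # about 70: too old
    PseudoRecord("c3", "negative", 2020, 1995),  # negative test
    PseudoRecord("a1", "positive", 2020, 1985),  # same person, second test
    PseudoRecord("d4", "positive", 2020, 2001),  # about 19: matches
]

count = count_positive_under_40(records)  # 2
```

Only the summary count leaves the function; no individual record is returned.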

To protect against an individual being identified by a unique set of characteristics, each Data Partner must set a minimum threshold below which no results are returned. For most, this means only results covering more than 10 people are returned. In addition, to protect against someone asking multiple questions and subtracting the counts to identify a person, each Data Partner can enable rounding on results. Rounded results are never exact, but they are precise enough for the researcher to know the scale of the dataset.

In our example, it might therefore say there are 110 people in the dataset instead of the exact number of 102. 
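The two safeguards can be sketched as a single function. This is an illustrative sketch only: the name `protect_count` and its parameters are hypothetical, and rounding up to the next multiple of 10 is an assumption chosen because it reproduces the 102 to 110 example. Each Data Partner may configure the threshold and rounding differently.

```python
import math
from typing import Optional

def protect_count(count: int, threshold: int = 10, round_to: int = 10) -> Optional[int]:
    """Apply low-number suppression, then rounding, to a raw count.

    Returns None when the count is suppressed.
    """
    # Only results with more than `threshold` people are returned.
    if count <= threshold:
        return None
    # Assumed rule: round up to the next multiple of `round_to`.
    return math.ceil(count / round_to) * round_to

print(protect_count(102))  # 110
print(protect_count(7))    # None (suppressed)
```

Because similar exact counts collapse to the same rounded value, subtracting the answers to two closely related questions no longer reveals a difference of one or two people.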

The same question is run simultaneously across all Data Partners, providing information from across the UK. The Gateway and CO-CONNECT only ever see anonymous data and so have no way of finding out who a person is.
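To illustrate the idea of one question being answered by every Data Partner at once, the sketch below collects already-protected counts from several partners in parallel. Everything here is hypothetical: the partner names, the callables standing in for each partner's software, and the thread pool standing in for whatever mechanism the programme really uses to distribute questions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

def run_everywhere(
    partners: Dict[str, Callable[[], Optional[int]]]
) -> Dict[str, Optional[int]]:
    """Run the same question at every partner and gather the summary counts."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(query) for name, query in partners.items()}
        return {name: future.result() for name, future in futures.items()}

# Each callable stands in for a partner answering the question locally
# and returning only a protected count (None means suppressed).
partners = {
    "partner_a": lambda: 110,
    "partner_b": lambda: None,
    "partner_c": lambda: 40,
}

results = run_everywhere(partners)
total_visible = sum(c for c in results.values() if c is not None)  # 150
```

The caller only ever handles per-partner summary counts, never the underlying records.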

Acknowledgements

CO-CONNECT is delivered in partnership with Health Data Research UK via the Data and Connectivity National Core Study and over 24 partners from across the UK. CO-CONNECT is funded by the Medical Research Council, part of UK Research and Innovation, and the National Institute for Health Research, part of the Department of Health & Social Care.

For further technical information behind the CO-CONNECT project, please review our technical documentation available here.