Accessing and Analysing Data

The COVID-Curated and Open aNalysis aNd rEsearCh plaTform (otherwise known as CO-CONNECT) is a UK-wide research programme that helps researchers to find and access COVID data at pace whilst ensuring people’s information is kept private and secure.

This video explains how the innovative programme is helping researchers to access and analyse data whilst protecting patient confidentiality. 

When researchers want to analyse data, they must always apply to each data partner for permission. A Data Partner is an organisation that holds data and is collaborating with the CO-CONNECT programme.

CO-CONNECT does not intervene in or define the data access procedures at each data partner. Researchers can use the same approval process across all data partners that implement the standard.

CO-CONNECT is supporting two methods for researchers to analyse data:

Federated Analysis

In collaboration with Data Partners, CO-CONNECT is researching federated analysis. This is where the data remains within the Data Partners’ secure environment, but questions about the data are sent through the Health Data Research Innovation Gateway website and summary results are returned. Building on the Cohort Discovery Search Tool described in the CO-CONNECT: Finding Data video, this method would provide more complex trend analysis rather than simple counts. Data Partners must approve all federated analysis research projects before any analysis can take place.

For example, the question could be “How does the number of people with a positive PCR test change with age?”. The result returned would be the number of people with a positive PCR test, grouped by age range.

 

To protect against an individual being identified by a unique set of characteristics, each data partner must set a minimum limit below which no results are returned. For most, this means only results with more than 10 people are returned. Also, to protect against someone asking multiple questions and subtracting counts to identify someone, each data partner can enable rounding on results. Rounded results are never exact, but they are sufficiently precise for the researcher to know the scale of the dataset.
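The two safeguards above can be illustrated with a minimal sketch. It assumes a threshold of 10 and rounding to the nearest 5. The rounding base, the ten-year age bands and the function names are assumptions made for illustration and are not taken from the CO-CONNECT software.

```python
from collections import Counter

MIN_COUNT = 10  # assumed threshold: only groups with more than 10 people are released
ROUND_TO = 5    # assumed rounding base, chosen for illustration only


def age_band(age, width=10):
    """Return a label such as '30-39' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def protected_counts(ages, min_count=MIN_COUNT, round_to=ROUND_TO):
    """Count people per age band, suppress small groups and round the rest."""
    raw = Counter(age_band(a) for a in ages)
    released = {}
    for band, n in raw.items():
        if n <= min_count:
            continue  # suppressed: too few people to release safely
        # Round to the nearest multiple of `round_to` using integer arithmetic.
        released[band] = ((n + round_to // 2) // round_to) * round_to
    return released


# Ages of people with a positive PCR test (made-up data).
ages = [25] * 12 + [34] * 13 + [71] * 3
print(protected_counts(ages))  # {'20-29': 10, '30-39': 15}
```

In this sketch the group of three people aged 70-79 is dropped entirely, and the remaining counts are rounded, so subtracting one answer from another does not reveal an exact number.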

Analysis within a TRE

The other method is for detailed data to be accessed within a Trusted Research Environment (TRE). In collaboration with Data Partners, CO-CONNECT is researching a new semi-automated capability to speed up the process by which Data Partners generate a subset of data for research analysis. This method helps Data Partners to create the subset of their data based on the cohort definition used with the Cohort Discovery Search Tool. For example, “People who had a positive PCR test and were under the age of 40”.
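As a rough illustration of how a cohort definition could drive the extraction of a subset, the sketch below applies the example definition to a handful of made-up records. The record fields and function names are hypothetical, and this is not the CO-CONNECT implementation, which is still being researched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    person_id: str   # hypothetical pseudonymised identifier
    age: int
    pcr_result: str  # "positive" or "negative"


def matches_cohort(record, max_age=40, result="positive"):
    """True if the record meets the example definition: positive PCR, under 40."""
    return record.pcr_result == result and record.age < max_age


def extract_subset(records, **criteria):
    """Return only the records that match the cohort definition."""
    return [r for r in records if matches_cohort(r, **criteria)]


records = [
    Record("a", 35, "positive"),
    Record("b", 52, "positive"),
    Record("c", 22, "negative"),
    Record("d", 40, "positive"),
]
subset = extract_subset(records)
print([r.person_id for r in subset])  # ['a']
```

Reusing the same definition for discovery and extraction means the subset a Data Partner generates corresponds to the cohort the researcher originally searched for.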

The generated subsets of data from across the different Data Partners that have given permission would be placed into what is called a Trusted Research Environment. Trusted Research Environments, also known as TREs, are secure IT systems that researchers can remotely connect to and ask research questions of pseudonymised data. Pseudonymised data is data from which identifying details, such as names and addresses, have been removed. The data cannot be copied or removed from the Trusted Research Environment and researchers can only export answers such as a graph.

The copying of a pseudonymous subset of data relevant to a specific research project into a single TRE only takes place once researchers have been granted permission by each Data Partner and all Data Partners agree which single TRE to utilise.  

CO-CONNECT is enabling the data that is transferred into the Trusted Research Environment to be cleaned and transformed into a standardised format prior to extraction. This standardised format reduces the amount of effort on each single research project, supporting rapid research to help the government and policy makers to take decisions.

Linking Across Data Sets

CO-CONNECT is also researching a new capability for Data Partners to provide data in a form that enables researchers to understand which data is from the same individual across different data partners, without compromising patient confidentiality. In collaboration with Data Partners, CO-CONNECT is investigating a standardised process to create pseudo-anonymised IDs within each Data Partner. An individual’s data would be given the same pseudonymous code across the data partners, enabling multiple data sets from an individual to be joined together. This is called linking.
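One common way to give the same person the same code at every organisation is a keyed hash of a shared identifier. The sketch below uses that technique purely as an illustration. The source does not say which method CO-CONNECT is investigating, and the secret key, the identifier format and the function names are all assumptions.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, shared_key: bytes) -> str:
    """Derive a pseudonymous code from an identifier using HMAC-SHA256."""
    # Normalise first so that formatting differences do not change the code.
    normalised = identifier.replace(" ", "").upper()
    return hmac.new(shared_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link(records_a, records_b):
    """Join two lists of (pseudonym, value) pairs on the pseudonym."""
    lookup = dict(records_b)
    return {p: (v, lookup[p]) for p, v in records_a if p in lookup}


key = b"secret-held-only-by-data-partners"  # hypothetical shared secret

# Two Data Partners hold the same (made-up) identifier in different formats.
test_centre = [(pseudonymise("943 476 5919", key), "positive PCR")]
hospital = [(pseudonymise("9434765919", key), "hospital admission")]

linked = link(test_centre, hospital)
print(list(linked.values()))  # [('positive PCR', 'hospital admission')]
```

Because each Data Partner derives the code independently from the same input and key, records can be joined without the original identifier ever leaving the organisation that holds it.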

Before the data is extracted for a research project, the pseudonymous code used across Data Partners would be converted into a new unique identifier for that specific project. Researchers will be completely unaware of who the information is about.
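The conversion to a project-specific identifier could, for instance, be done by hashing the pseudonymous code again with a key that is unique to each project. This is a sketch of one possible approach under that assumption, not a description of the CO-CONNECT process. The key values and the function name are hypothetical.

```python
import hashlib
import hmac


def project_identifier(pseudonym: str, project_key: bytes) -> str:
    """Convert a cross-partner pseudonym into an identifier for one project."""
    return hmac.new(project_key, pseudonym.encode("utf-8"), hashlib.sha256).hexdigest()


pseudonym = "abc123"  # stands in for the code shared across Data Partners

id_in_project_a = project_identifier(pseudonym, b"project-A-key")
id_in_project_b = project_identifier(pseudonym, b"project-B-key")

# The same person keeps one identifier within a project, so linking still works,
# but the identifiers differ between projects and from the original pseudonym.
print(id_in_project_a == project_identifier(pseudonym, b"project-A-key"))  # True
print(id_in_project_a == id_in_project_b)  # False
```

With this design, data extracted for one project cannot be matched against data extracted for another simply by comparing identifiers.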

This research is important as linking data in this anonymous, safe way could enable rapid, powerful research. For example, COVID-19 results from a test centre could be linked to hospital records along with prescriptions from pharmacies. Researchers could understand whether people with different existing health conditions are more or less susceptible to COVID-19, which could impact healthcare treatments.

The semi-automated data generation process and the standardised process to create pseudo-anonymised IDs are research areas of CO-CONNECT and are not currently implemented. Work is continuing to establish the feasibility and acceptability of this approach.

Acknowledgements

CO-CONNECT is delivered in partnership with Health Data Research UK via the Data and Connectivity National Core Study and over 24 partners from across the UK. CO-CONNECT is funded by the Medical Research Council, part of UK Research and Innovation, and the National Institute for Health Research, part of the Department of Health & Social Care.

For further technical information behind the CO-CONNECT project, please review our technical documentation available here.