The process to allow Data Partners to automatically answer cohort availability questions

CO-CONNECT is researching methods to enable researchers to determine how many people meet their research criteria within the various datasets across the UK. Researchers do this using the Cohort Discovery Search Tool, which is embedded within a single website, the Health Data Research Innovation Gateway.

Each Data Partner sets up a secure computer that is separate from where identifiable data is stored but still within the Data Partner’s secure environment. The Data Partner owns and controls that secure computer; it is not accessible to either CO-CONNECT or HDR UK personnel. Staff within each Data Partner create a copy of the relevant data with anything they deem to be sensitive or identifiable removed. This is known as pseudonymous data. For example, information such as names, addresses, specific dates of birth, and specific dates of testing or care is removed, and identifiers are converted into new pseudonymous codes. The Data Partner then transfers this pseudonymous data onto the secure computer.

In our example, all that the Data Partner transfers about Jane Doe is the pseudonymous code, the fact that this person had a positive PCR test in the second week of August 2020, and their year of birth.
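The Jane Doe example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method any Data Partner actually uses: the field names, the keyed-hash approach to generating the pseudonymous code, the secret key, and the way "week of the month" is calculated are all hypothetical.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret that would be held only by the Data Partner (assumption).
SECRET_KEY = b"example-key-held-by-data-partner"


def pseudonymous_code(identifier: str) -> str:
    """Turn an identifier into a new code using a keyed hash (assumed approach)."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def week_of_month(d: date) -> int:
    """Days 1-7 are week 1, days 8-14 are week 2, and so on (assumed definition)."""
    return (d.day - 1) // 7 + 1


def pseudonymise(record: dict) -> dict:
    """Copy only the non-identifying fields, coarsening dates along the way."""
    test_date = record["pcr_test_date"]
    return {
        "code": pseudonymous_code(record["patient_id"]),
        "pcr_result": record["pcr_result"],
        # Keep only the week of the test, not the exact date.
        "pcr_test_week": f"{test_date.year}-{test_date.month:02d} week {week_of_month(test_date)}",
        # Keep only the year of birth, not the full date of birth.
        "year_of_birth": record["date_of_birth"].year,
    }


# Illustrative identifiable record; every value here is made up.
jane = {
    "patient_id": "ID-0001",
    "name": "Jane Doe",
    "address": "1 Example Street",
    "date_of_birth": date(1985, 3, 14),
    "pcr_test_date": date(2020, 8, 10),
    "pcr_result": "positive",
}

pseudonymous_jane = pseudonymise(jane)
print(pseudonymous_jane)
```

The name and address never appear in the output record, and the dates are reduced to a week and a year, which mirrors what the Data Partner would place on the secure computer.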

Software within the secure environment of the Data Partner will send a message out to the Gateway, which will return any questions that need to be run on the data. An example question could be “How many people in the dataset have had a positive PCR test and were under the age of 40?”

The software will return a summary answer to each question which shows the number of people who meet the criteria, helping researchers discover the most useful data.
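The question-and-answer exchange described above might look roughly like the following Python sketch. It is illustrative only: the real Gateway interface is not described here, so the InMemoryGateway class, the Question fields, and the record layout are hypothetical stand-ins. Because only the year of birth is held, age is approximated as a reference year minus the year of birth, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """A hypothetical cohort question: test result plus an upper age limit."""
    pcr_result: str
    max_age_exclusive: int
    reference_year: int


class InMemoryGateway:
    """Stand-in for the Gateway (assumption); holds questions and collects answers."""

    def __init__(self, questions):
        self._pending = list(questions)
        self.answers = []

    def fetch_questions(self):
        # The Data Partner's software asks for work; nothing is pushed to it.
        pending, self._pending = self._pending, []
        return pending

    def submit_answer(self, question, count):
        # Only the summary count leaves the secure environment.
        self.answers.append((question, count))


def count_matching(records, question):
    """Count pseudonymous records that meet the question's criteria."""
    count = 0
    for record in records:
        approx_age = question.reference_year - record["year_of_birth"]
        if (record["pcr_result"] == question.pcr_result
                and approx_age < question.max_age_exclusive):
            count += 1
    return count


def run_once(gateway, records):
    """Fetch any waiting questions and send back one count per question."""
    for question in gateway.fetch_questions():
        gateway.submit_answer(question, count_matching(records, question))


# Made-up pseudonymous records held on the secure computer.
records = [
    {"code": "a1", "pcr_result": "positive", "year_of_birth": 1985},
    {"code": "b2", "pcr_result": "positive", "year_of_birth": 1950},
    {"code": "c3", "pcr_result": "negative", "year_of_birth": 1990},
]

question = Question(pcr_result="positive", max_age_exclusive=40, reference_year=2020)
gateway = InMemoryGateway([question])
run_once(gateway, records)
print(gateway.answers[0][1])  # number of people who meet the criteria
```

In this sketch the software inside the secure environment starts every exchange, and the only thing sent back is a number, never the underlying records.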