Scalability and patient data security in the age of clinical trial recruitment using electronic medical records: an infrastructure solution

Finding suitable patients to participate in clinical trials is a long, complicated process that today takes months. The emergence of electronic health records (EHRs) in hospital information systems (HIS) allows this process to be automated and accelerated.

What system infrastructure best suits the needs of doing this across multiple locations? And, given the digital nature of electronic patient information, how can personal medical information be kept secure? We examine the needs and challenges relating to patient recruitment using EHRs and propose a solution involving a hybrid cloud-based infrastructure.

The case for using EHRs for patient recruitment
Pharmaceutical companies developing new medications must ensure that the drugs work as expected. Any prospective new drug must pass three successive clinical trial phases before it can be submitted for market approval. These trials represent a significant investment for a pharma company and put it under immense financial pressure. Every day that a drug’s launch is delayed comes at a high cost: missed sales can amount to an estimated $8 million (USD) for a blockbuster drug and up to $600,000 (USD) for a less popular product [1].

One of the root causes of trial failure is the inability to recruit the specified number of eligible patients within the planned time span. An analysis of more than 100 trials established that just under one-third of them met their original recruitment target, and about half needed to prolong the recruitment period [2]. A review of terminated trials registered in ClinicalTrials.gov in 2013 found that 39% of terminations arose because of failures of accrual (enrollment) [3].

Today, eligible patients are still mostly found by manually searching existing patient data archives in cooperation with hospitals. This is labor-intensive, time-consuming and therefore expensive. As a manual process it is also far from foolproof, and many suitable candidates may be missed.

The growing availability of electronic patient data offers a more efficient and scalable recruitment process that can also speed up the screening of eligible candidates. By creating and maintaining live electronic health records (EHRs), hospitals have (1) a digital database of patient information that (2) runs more or less in real time. An electronic solution that queries the HIS according to eligibility criteria coded into a query avoids the procedural problems described above, resulting in a recruitment process that takes less time, frees up resources and is rigorous in matching patients to protocols.
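
As a minimal sketch of what "criteria coded into a query" could look like, the Python below filters a few invented patient records against inclusion and exclusion rules. The record fields, the class and function names, and the sample data are assumptions made for illustration, not the schema of any particular HIS; J18.9 is the ICD-10 code for unspecified pneumonia.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical, simplified record; a real HIS schema is far richer.
    pseudonym: str
    age: int
    diagnoses: set = field(default_factory=set)    # diagnosis codes
    medications: set = field(default_factory=set)  # current prescriptions


@dataclass
class TrialCriteria:
    # Hypothetical encoding of a protocol's eligibility rules.
    min_age: int
    max_age: int
    required_diagnoses: set
    excluded_medications: set


def is_eligible(patient: PatientRecord, criteria: TrialCriteria) -> bool:
    """True if the patient meets every inclusion rule and no exclusion rule."""
    in_age_range = criteria.min_age <= patient.age <= criteria.max_age
    has_diagnoses = criteria.required_diagnoses <= patient.diagnoses
    takes_excluded = bool(criteria.excluded_medications & patient.medications)
    return in_age_range and has_diagnoses and not takes_excluded


def find_candidates(records, criteria):
    """Return the pseudonyms of all matching patients."""
    return [r.pseudonym for r in records if is_eligible(r, criteria)]


records = [
    PatientRecord("p-001", 54, {"J18.9"}, set()),
    PatientRecord("p-002", 17, {"J18.9"}, set()),            # too young
    PatientRecord("p-003", 61, {"J18.9"}, {"amoxicillin"}),  # already treated
    PatientRecord("p-004", 45, {"I10"}, set()),              # no pneumonia
]
criteria = TrialCriteria(18, 75, {"J18.9"}, {"amoxicillin"})
candidates = find_candidates(records, criteria)
print(candidates)
```

Because the rules are data rather than a manual checklist, the same criteria object can be applied identically to every record, which is where the rigor in matching patients to protocols comes from.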

A centralized data pool in the cloud?
The obvious approach is to build a centralized cloud archive to which data from different hospitals could be sent. Routines could trigger regular updates of patient data from the connected hospitals to the centralized environment. In the cloud, a set of virtual machines would be installed per connected site to store and handle the patient data.

Another obvious approach is to connect the different EHR systems in a peer-to-peer network, in which each peer may query data from the other peers. However, this requires each hospital to create network entry points to its node, and hospital IT administrators are usually unwilling to open ports to give access to their EHRs: the risk of the environment being hacked is too great. Nor would they allow third-party vendors to query an EHR system that is in use, because of the risk of overloading it. Imagine if a physician could not load patient data in an emergency room because the EHR database was overloaded.

Challenge 1: Scalability
The challenge for a comprehensive electronic solution, one that covers more than just one hospital, is scaling the system up as more hospitals are added. Each hospital will have a separate database (and use a different HIS), and it must be possible to add new ones to the query process in a consistent, modular way with minimal effort. The counterpart to this is the requirement that the system can query multiple hospitals at once, each in its own “language”: hospitals may use different terminologies and codes for key descriptors such as disease diagnoses, treatments, demographic information, laboratory results and medication prescribed.
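
One way to picture querying each hospital in its own "language" is a per-site mapping from a shared concept to local codes. The sketch below is an assumption-laden illustration: the site names, the mapping table and the sample data are invented, and a production system would rely on a maintained terminology service rather than a hand-written dictionary. The codes themselves are real (ICD-10 J18.9 and J15.9, ICD-9-CM 486).

```python
# Hypothetical vocabularies: shared concept -> codes used locally at each site.
LOCAL_CODES = {
    "hospital_a": {"pneumonia": {"J18.9", "J15.9"}},  # records coded in ICD-10
    "hospital_b": {"pneumonia": {"486"}},             # records coded in ICD-9-CM
}


def translate(concept: str, site: str) -> set:
    """Return the local codes that a site uses for a shared concept."""
    try:
        return LOCAL_CODES[site][concept]
    except KeyError:
        raise LookupError(f"no mapping for {concept!r} at {site!r}") from None


def count_patients(concept: str, site: str, diagnoses_per_patient: list) -> int:
    """Count patients at one site whose diagnoses include the concept."""
    local_codes = translate(concept, site)
    return sum(1 for codes in diagnoses_per_patient if codes & local_codes)


# Invented sample data: one set of diagnosis codes per patient.
site_data = {
    "hospital_a": [{"J18.9"}, {"I10"}, {"J15.9", "E11.9"}],
    "hospital_b": [{"486"}, {"401.9"}],
}
counts = {site: count_patients("pneumonia", site, rows)
          for site, rows in site_data.items()}
print(counts)
```

In this arrangement, bringing a new hospital into the query process means adding one mapping entry for its vocabulary; the query itself stays expressed in shared concepts.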

Challenge 2: Real-time data
The second challenge is to make full use of the fact that a hospital’s HIS is generally current, and to leverage this in the overall system so that the patient search can be done in real time. As an example, let’s assume there is a clinical trial that requires patients suffering from pneumonia. The goal is to identify those patients before physicians in a hospital treat them with known medications, and to inform the physicians that the patient could participate in a trial. This is not possible in the centralized model, where the update frequency is usually between once per week and once per day because the data load for sending patient data over the internet is too high. An update of 1000 patient records may easily exceed 500 MB.

Challenge 3: Data security – how to reidentify a patient
The use of electronic patient data also raises concerns about the security of that data. How safe is the data from attack where it resides? How secure is it from being accessed by unauthorized parties? What prevents unauthorized access via hacking?

We will not cover issues of patient privacy here, but it should be mentioned that one important requirement for the sharing of patient data is that the patient must not be identifiable. As a general first line of protection, therefore, patient data should be pseudonymized: a method of de-identification that still allows the hospital itself to re-identify patients for recruitment later. As a matter of principle, certain hospitals and national legislation may take the position that data should never be sent to centralized instances.
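
To illustrate pseudonymization that only the hospital can reverse, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library together with a lookup table kept on site. The class name, the 16-character pseudonym length and the boolean authorization flag are assumptions chosen for brevity; the article does not specify which technique is actually used.

```python
import hashlib
import hmac
import secrets


class Pseudonymizer:
    """Keyed pseudonymization that stays entirely on the hospital's premises."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key  # assumption: the key never leaves the hospital
        self._reverse = {}      # pseudonym -> real patient ID, stored locally

    def pseudonymize(self, patient_id: str) -> str:
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]  # shortened for readability only
        self._reverse[pseudonym] = patient_id
        return pseudonym

    def reidentify(self, pseudonym: str, authorized: bool) -> str:
        # Stand-in for a real access-control check on hospital staff.
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self._reverse[pseudonym]


site = Pseudonymizer(secrets.token_bytes(32))
alias = site.pseudonymize("MRN-0042")        # "MRN-0042" is an invented ID
same_alias = site.pseudonymize("MRN-0042")   # same input, same pseudonym
original = site.reidentify(alias, authorized=True)
```

The same patient always maps to the same pseudonym, so records can still be linked for querying, while anyone without the key and the local table cannot work back to the real ID.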

How a hybrid cloud solution meets these challenges
In summary, we require an infrastructure which can easily be modularly expanded across large numbers of hospitals, access data in real-time, keep the patient data secure, and block attempts at hacking.

We propose a hybrid cloud infrastructure with local, independent site installations federated to a private cloud (Figure 1). It consists of two types of installation: nodes in local intranets, which host patient data, and a private cloud, which orchestrates the overall system and holds job requests (e.g. search query requests). Data are sent only via the intranet to the local node instead of to a central data warehouse. The cloud delivers queries to the federated nodes at the hospitals, and the results (the counts of patients who fit the criteria) are returned for aggregation in the cloud.
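
The division of labor between nodes and cloud can be sketched as follows. This is a simplified illustration under stated assumptions: the class and function names and the sample patients are invented, and the cloud-side function calls the nodes directly only to keep the example short.

```python
from typing import Callable, Dict, Iterable, List


class HospitalNode:
    """A local installation; patient rows never leave this object."""

    def __init__(self, name: str, patients: List[dict]):
        self.name = name
        self._patients = patients  # stays inside the hospital intranet

    def count_matching(self, predicate: Callable[[dict], bool]) -> int:
        """Run the query locally and expose only the number of matches."""
        return sum(1 for patient in self._patients if predicate(patient))


def aggregate_counts(nodes: Iterable[HospitalNode],
                     predicate: Callable[[dict], bool]) -> Dict[str, object]:
    """Cloud-side step: combine per-site counts into one report."""
    per_site = {node.name: node.count_matching(predicate) for node in nodes}
    return {"per_site": per_site, "total": sum(per_site.values())}


def adult_with_pneumonia(patient: dict) -> bool:
    # Illustrative criterion; "J18.9" is ICD-10 for unspecified pneumonia.
    return patient["age"] >= 18 and "J18.9" in patient["diagnoses"]


nodes = [
    HospitalNode("hospital_a", [
        {"age": 54, "diagnoses": {"J18.9"}},
        {"age": 30, "diagnoses": {"I10"}},
    ]),
    HospitalNode("hospital_b", [
        {"age": 67, "diagnoses": {"J18.9"}},
        {"age": 16, "diagnoses": {"J18.9"}},
    ]),
]
report = aggregate_counts(nodes, adult_with_pneumonia)

# Covering a further hospital is one more node; nothing else changes.
nodes.append(HospitalNode("hospital_c", [{"age": 41, "diagnoses": {"J18.9"}}]))
report_after_growth = aggregate_counts(nodes, adult_with_pneumonia)
print(report["total"], report_after_growth["total"])
```

The only values that cross the node boundary here are integers, which mirrors the idea that the cloud aggregates counts and never sees patient rows.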

Assuming each node holds the data from one hospital, adding another hospital just means adding one more node, so the overall system is linearly scalable. This federated design makes expansion to cover new hospitals very easy.

In addition, using XEN Server cloud technology allows the system to be virtualized: virtual machines can be added whenever needed, so resource allocation can be optimized. This again makes it easy to scale both the volume of patient data per node and the number of requests that can be run in parallel. The overall system can easily be scaled to support and process queries over millions of data records.

Real-time data
Continuous data pushes (ETL) of patient data to the local patient nodes would allow this system to work in real time. Taking our previous example of a trial for pneumonia sufferers, this would allow patients to be identified on entry into the ER and diverted to a clinical trial before being treated in the normal way.

Data security
There is no direct inbound connection from the cloud to the local server nodes at the hospitals. This means the cloud cannot initiate a connection with the nodes; connection with the cloud is at the discretion of the nodes.
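
A common way to get this property is for each node to poll the cloud for pending jobs over an outbound connection and post back its results. The sketch below models that pattern in memory; the class names, method names and job format are hypothetical, and real network transport, authentication and error handling are left out.

```python
from collections import deque


class Cloud:
    """Holds job requests and collects results; it never calls a node."""

    def __init__(self):
        self._jobs = {}     # job_id -> query description
        self._pending = {}  # node name -> deque of job_ids not yet fetched
        self.results = {}   # job_id -> {node name: count}

    def register_node(self, node_name):
        self._pending[node_name] = deque()

    def submit_job(self, job_id, query):
        self._jobs[job_id] = query
        self.results[job_id] = {}
        for queue in self._pending.values():
            queue.append(job_id)

    # The two methods below stand for requests a node makes outbound.
    def fetch_jobs(self, node_name):
        queue = self._pending[node_name]
        jobs = [(job_id, self._jobs[job_id]) for job_id in queue]
        queue.clear()
        return jobs

    def post_result(self, node_name, job_id, count):
        self.results[job_id][node_name] = count


class Node:
    def __init__(self, name, diagnoses, cloud):
        self.name = name
        self._diagnoses = diagnoses  # one diagnosis code per local patient
        self._cloud = cloud
        cloud.register_node(name)    # registration is also node-initiated

    def poll_once(self):
        """Node-initiated cycle: fetch jobs, run them locally, report counts."""
        for job_id, query in self._cloud.fetch_jobs(self.name):
            count = sum(1 for dx in self._diagnoses if dx == query["diagnosis"])
            self._cloud.post_result(self.name, job_id, count)


cloud = Cloud()
node_a = Node("hospital_a", ["J18.9", "I10", "J18.9"], cloud)
node_b = Node("hospital_b", ["J18.9"], cloud)
cloud.submit_job("job-1", {"diagnosis": "J18.9"})
before_polling = dict(cloud.results["job-1"])  # nothing arrives until nodes poll
node_a.poll_once()
node_b.poll_once()
print(cloud.results["job-1"])
```

Note that the `Cloud` class holds no reference to any node, so in this model it has no way to reach into a hospital; a node that stops polling simply stops participating.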

No patient data leaves the secure environment. Patient data (pseudonymized in any case) stays on local intranets, and only report data (aggregated counts of query results) are sent from the nodes to the cloud. The cloud itself contains no patient data, just “data about data” (metadata), which means the cloud cannot be hacked for patient data.

Once installed, the local server node is controlled by the site’s own IT department. To ensure the privacy and security of patient data, the data is pseudonymized, and patient IDs can only be re-identified by authorized personnel on the hospital’s premises. Each site controls the pseudonymization to ensure that no unauthorized re-identification of patient IDs can be performed.

Side benefits of a hybrid cloud system
Besides the strategic advantages in patient privacy, data security, scalability and flexibility, the use of cloud technologies also contributes to cost savings: it is much cheaper to maintain a smaller server per hospital to host the patient data than to maintain a big centralized system running a data warehouse technology.

Also, because the system is built completely on cloud technologies, users of the system front-end only need a modern web browser to access it, with no installation of applications.

The outcome is a system which can interoperably query and aggregate patient information across multiple local installations within a network of hospitals in different geographies without compromising patient privacy. The network grows as hospitals join by installing their own local servers, and with it grows the pool of suitable clinical trial recruitment candidates. Meanwhile, patient data is secure and protected within the infrastructure of the originating hospitals.


1. ISR-The-Expanding-Web-of-Clinical-Trial-Patient-Recruitment%202014.pdf
2. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomised controlled trials? A review of trials funded by two UK funding agencies. Trials. 2006, 7: 9. doi:10.1186/1745-6215-7-9.
3. Williams RJ, Tse T, DiPiazza K, Zarin DA: Terminated Trials in the ClinicalTrials.gov Results Database: Evaluation of Availability of Primary Outcome Data and Reasons for Termination. PLoS ONE. 2015, 10(5): e0127242. doi:10.1371/journal.pone.0127242.

Author: Andreas Walter

Andreas studied computer science at the University of Karlsruhe and holds a Ph.D. in computer science, which he received for his work on high-quality image search based on semantic technologies at the Research Center for Computer Science in Karlsruhe (FZI).

Andreas lectures at the Karlsruhe Institute of Technology on the topics of information integration and the creation of web-based information systems.

Author: Le Vin Chin

Le Vin Chin is Head of Marketing and Communications at Clinerion and has worked in communications and marketing for 20 years in a wide variety of industries, including software and services.
