The Embase project


For many years, Cochrane has been feeding reports of trials from PubMed and Embase into the Cochrane Central Register of Controlled Trials (CENTRAL). This has made CENTRAL an incredibly rich and valuable resource for authors and others trying to identify the evidence. The way that Embase records are fed into CENTRAL changed in 2013, when a new model that includes crowdsourcing (the Embase Project) was introduced. Records of possible RCTs and quasi-RCTs from Embase are now identified in two ways:

1. Through an autofeed

2. Through human processing/screening (using a ‘crowd’)

The autofeed

Approximately two-thirds of all the reports of RCTs in Embase are indexed with the EMTREE term RCT or CCT. Every month (around the 20th) we feed these records directly into CENTRAL. This means that two-thirds of the records we want in CENTRAL are already identified.

The crowd approach

The remaining third is retrieved through a sensitive search strategy developed by Julie Glanville at YHEC. The search (complete strategy available at: http://www.cochranelibrary.com/help/central-creation-details.html) is run every month in Embase via Ovid SP. The retrieved records are then screened by a crowd. Anyone can join the crowd and start screening. When someone signs up, they complete a brief, interactive training module before they can screen ‘live’ records.

How do we ensure quality in this process?

To be included in CENTRAL, a record is assessed by at least two different screeners. We have evaluated this method and the results show very high levels of accuracy in terms of the crowd’s ability to identify the records we want in CENTRAL, and to reject the records we don’t want.
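
As an illustration only, the two routes into CENTRAL and the two-screener requirement could be sketched as follows. This is not the project's actual software: the record structure, the function names, the status labels and, in particular, the rule that all screeners must agree are assumptions made for the example. The text above states only that each record is assessed by at least two different screeners.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Index labels as abbreviated in the text; the real index terms may differ.
AUTOFEED_TERMS = {"RCT", "CCT"}
# From the text: at least two different screeners per record.
MIN_SCREENERS = 2


@dataclass
class Record:
    """Hypothetical minimal representation of an Embase record."""
    record_id: str
    index_terms: Set[str] = field(default_factory=set)
    # screener id -> True if that screener judged the record a possible RCT/quasi-RCT
    votes: Dict[str, bool] = field(default_factory=dict)


def route(record: Record) -> str:
    """Send records already indexed as RCT/CCT to the autofeed, the rest to the crowd."""
    return "autofeed" if record.index_terms & AUTOFEED_TERMS else "crowd"


def crowd_decision(record: Record) -> str:
    """Assumed rule: wait for MIN_SCREENERS distinct screeners, then require agreement."""
    if len(record.votes) < MIN_SCREENERS:
        return "pending"
    decisions = set(record.votes.values())
    if decisions == {True}:
        return "include"
    if decisions == {False}:
        return "reject"
    # How disagreements are actually handled is not described in the text.
    return "needs_resolution"
```

Keying the votes by screener id means that a second vote from the same person overwrites the first instead of counting as a second assessment, which is one simple way to enforce "two different screeners".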

Progress to date

Our vision is that in the future authors and information specialists will only need to search CENTRAL to find relevant reports of randomised and quasi-randomised trials. We are much closer to this goal now as far as Embase is concerned. We have established the new crowd model and evaluated its accuracy, and we have cleared several years’ worth of records. As of mid-December 2015, the crowd were screening records that were added to Embase in October 2015. The number of records needing human screening roughly doubled in the last year with the introduction of conference records into the crowd process, but despite this, we are closing the small time-lag between the date of publication in Embase and publication in CENTRAL.