High quality citations: The journey from publication to your website


In this blog:

  • What is unique about our data collection pipeline?
  • How do we ensure the high quality of our product citations and published images?
  • How are our accurate citations used in the new Citation Widget?

At CiteAb, we’ve developed industry-leading text mining technology and combined this with extensive human reviewing to enable us to collect product citations for any reagent type or instrument cited in publications.

These citations power our newly launched widget, as well as our reagent search engine for researchers and data services for reagent suppliers, financial companies and biotech/biopharma companies. 

In our blog today, we take a step back to give you an overview of how we collect product citations and how this pipeline ensures their quality.

So, let’s start at the beginning:

Why is citation accuracy important?

Displaying the correct citations for scientists to view when evaluating reagents is critical; if incorrect citations are shown, many issues can arise:

  • Users of our search engine may buy unsuitable products, wasting their time and funding
  • Our market data partners may choose the wrong products for their portfolio
  • Reagent suppliers who use our citation widget may lose customer trust

For these reasons, and many more, we have never compromised on citation quality; it has always been a core focus.

How do we collect high quality citations?

1. Text-mining technology is applied to scientific publications

Our development team is a key driving force at CiteAb and has been improving our proprietary text-mining technology for the past ten years. The aim of this technology is to identify products in the scientific literature and understand how they are used.

To do so, we identify potential citations before applying both machine learning classifiers and human reviewing to identify the correct ones and eliminate the false positives. We can also look for things such as clonality and conjugation to help us identify the product of interest.

To add value to this information for scientists who use our reagent search engine, we text mine for experimental information such as application and reactivity. We are also able to collect published images for products, which are cropped manually to ensure quality. 

We apply this process to the unstructured free text within millions of pre-prints, open-access publications and subscription-only publications (the latter thanks to a number of partnerships we’ve built with leading academic publishers such as Springer Nature and Wiley), and curate the results in our database. Ultimately, this transforms the unstructured information within the publications into structured data of value.
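To make the idea of turning free text into structured data concrete, here is a minimal Python sketch. It is an illustration only and not CiteAb's technology: the regular expression, the small application vocabulary, the `CitationRecord` fields and the supplier name and catalogue number in the sample sentence are all hypothetical, and a real system would rely on far richer models plus human review.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary of applications to look for (assumption, not from the post).
APPLICATIONS = {
    "western blot": "Western blot",
    "flow cytometry": "Flow cytometry",
    "immunohistochemistry": "Immunohistochemistry",
}

# Hypothetical pattern for a "(Supplier, catalogue-number)" mention.
MENTION = re.compile(
    r"\((?P<supplier>[A-Z][\w ]*?),\s*(?P<catalogue>[A-Za-z0-9-]+)\)"
)


@dataclass
class CitationRecord:
    """Structured form of one product mention (illustrative fields only)."""
    supplier: str
    catalogue_number: str
    application: Optional[str]


def extract(sentence: str) -> Optional[CitationRecord]:
    """Return a structured record if the sentence contains a product mention."""
    match = MENTION.search(sentence)
    if match is None:
        return None
    lowered = sentence.lower()
    # Take the first application keyword found in the sentence, if any.
    application = next(
        (label for key, label in APPLICATIONS.items() if key in lowered), None
    )
    return CitationRecord(match["supplier"], match["catalogue"], application)


# Made-up sentence with a made-up supplier and catalogue number.
record = extract(
    "Cells were stained with an anti-CD4 antibody (ExampleBio, EX-1234) "
    "and analysed by flow cytometry."
)
print(record)
```

A sentence with no recognisable mention simply yields `None`, so nothing is recorded for it.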

What we think is particularly exciting about our text-mining technology is that we apply it to anything cited in the literature, meaning we can collect citations for all cited products, or even things such as software and instruments.

2. Human reviewing ensures accuracy

Once we have identified the initial citation, the next step is human reviewing, a core part of our data collection pipeline which helps us to achieve our industry-leading accuracy rate of over 99%.

Our reviewing team of scientists helps to spot errors and to confirm any citations that our technology could not resolve with confidence. Importantly, if we can’t be sure the citation is correct, it won’t be added to our database. This is a fundamental step in our data collection which ensures the highest quality of citations. On top of this, it’s critical in training our AI and machine learning classifiers, as it provides a large, high-quality training set of over 1 million citations.

CiteAb data collection pipeline combining AI and human reviewing to collect citations of the highest quality from the literature
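
The routing described in the two steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not CiteAb's production code: the `Candidate` fields, the score thresholds and the function names are all hypothetical. The one rule taken directly from the text is the last one: a citation that cannot be confirmed is not added.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass
class Candidate:
    product_id: str          # hypothetical catalogue identifier
    sentence: str            # text in which the product was mentioned
    classifier_score: float  # hypothetical model confidence in [0, 1]


def triage(candidate: Candidate,
           accept_at: float = 0.95,
           reject_below: float = 0.20) -> Decision:
    """Route a candidate by classifier confidence (thresholds are assumptions)."""
    if candidate.classifier_score >= accept_at:
        return Decision.ACCEPT
    if candidate.classifier_score < reject_below:
        return Decision.REJECT
    # Anything the classifier is unsure about goes to a human reviewer.
    return Decision.HUMAN_REVIEW


def is_added_to_database(candidate: Candidate,
                         reviewer_confirms: Optional[bool] = None) -> bool:
    """Decide whether a candidate citation ends up in the database."""
    decision = triage(candidate)
    if decision is Decision.ACCEPT:
        return True
    if decision is Decision.REJECT:
        return False
    # Unclear cases are added only when a reviewer explicitly confirms them;
    # an unreviewed or unconfirmed citation is left out.
    return reviewer_confirms is True
```

In a setup like this, the reviewers' confirmed and rejected decisions could also be stored as labelled examples for retraining the classifier.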

Citations direct to webpages

These citations are then ready for our search engine, and our partners’ webpages!

We are able to deliver citations in many different formats to suit specific needs: via an API, as an Excel file, or through our recently redeveloped citation plug-in tool, the widget.

We’ve relaunched the widget with more features, so that our partners can better showcase how their products have been used in research and help to drive sales. The widget updates citations directly on webpages in real time.

We’ve also made sure that the widget is very easy to install, requiring no extensive tech integration. There are now three tiers available:

  • Product widget: Showcasing product citations, with the articles linked.
  • Product and images widget: This includes product citations as well as published images for the product.
  • Product, images, infographics widget: This tier includes all of the features above, as well as infographics to show how the product has been used in different applications, reactivities and research areas.
Example of a tier 3 citation widget, displaying infographics, published images and product citations

Interested in finding out more?

You can visit our website to learn more about our citation service, or get in touch with one of the team, who will be more than happy to chat about our data and widget!

Skye and the CiteAb team