How Can We Help?
Fingerprinting - Core PrinciplesFingerprinting - Core Principles
What does the Fingerprint Engine do?
Fingerprint Engine is a tool that automatically generates a ‘fingerprint’ of key concepts based on the available text data related to a piece of content in Pure (see the table below for details on what data is used for fingerprinting for each content type). These concepts provide a quick snapshot of the key themes / topics covered by the content and can be used during search and filtering. On the Portal, it can also be used for finding similar researcher profiles.
Fingerprint as shown on a profile page:
Quick navigation to content linked to a fingerprint concept:
Fingerprint concepts as filters on the search page:
Fingerprints on the Person record in the Pure back end:
What concepts are applied through fingerprinting?
The Elsevier Fingerprint Engine applies concepts from two major sources, which can be configured under Administrator > Fingerprint > Vocabularies for fingerprinting service (see below), and include:
- OmniScience - this is the key thesaurus that contains a wide range of subjects and is the basis of fingerprinting technology.
- Keyphrases - this is an addition to the Omniscience thesaurus which includes multidisciplinary key phrases that make the fingerprint more specific. Please note: we do NOT recommend using the keyphrases thesaurus on its own, it works best combined with the OmniScience thesaurus.
- INIS thesaurus - thesaurus focused on the nuclear physics domain. It is created and maintained by the International Atomic Energy Agency and we recommend enabling it only if your institution is specialised in nuclear physics.
Details of these sources is available in this document:
I do not see the Omniscience, Keyphrase and INIS thesaurus in my settings - why is that?
If your institution has not yet migrated from the old fingerprint engine, we recommend updating to the new fingerprint engine (OmniScience) thesaurus as soon as possible.
Once updated, you will be able to select the preferred thesauri to apply in your Pure:
Please note: the fingerprint update typically takes 24 hours to complete.
What content types are fingerprinted?
Please see the table below for details on how the fingerprint is generated for each of the content types. Some of the fingerprints that are seen in Pure are aggregations, which means that they are recalculated and may change after a new piece of content is added.
Please note: we currently only fingerprint content in English (where the English-language description text is not available, no fingerprint will be generated).
Content Type | What is the fingerprint based on |
Research Output |
Title, subtitle and abstract.
Please note: by default, we only fingerprint content where both title and abstract are available. This can be changed in the settings, but we strongly recommend keeping the default approach to improve the accuracy of the generated results. |
Person (Researcher) |
Aggregated: Research Interests and all research outputs and prizes, linked to the person.
When finding the related research outputs for a Person, it is possible to limit the start and end year of the publications taken into account in Pure settings. By default, no time limit is applied, but this can be changed through Administrator > System settings > Fingerprint service > Start and end year config options.
It is also possible to adjust whether research outputs, research interest and prizes are used as part of person fingerprints. To do this, go to Administrator > Fingerprint > Configuration. |
Organisation |
Aggregated: all research outputs, linked to the organisation.
When finding the related research outputs for an Organisation, it is possible to limit the start and end year of the publications taken into account in Pure settings. By default, no time limit is applied, but this can be changed through Administrator > System settings > Fingerprint service > Start and end year config options.
Note: fingerprints from the research outputs produced by the persons within the organisation while they were outside the organisation will NOT be included into the aggregation. |
Research Project |
Aggregated: Project descriptions and all research outputs and awards, linked to the project. Note: all the available descriptions are used for fingerprinting (i.e. if you add multiple types of descriptions, they will all be used for fingerprint generation). |
Equipment |
Aggregated: Equipment descriptions and all research outputs and projects, linked to the equipment (each item is given an equal weight).
Note: all the available descriptions are used for fingerprinting (i.e. if you add multiple types of descriptions, they will all be used for fingerprint generation). If there is no Equipment description text, the fingerprint will be based solely on related research outputs, if any are present. |
Prizes |
Prize title and description
|
Awards |
Award description
|
Research Institution (total, shown on the Portal Home Page) | Aggregated: all Research outputs from the past 5 years |
It is possible to choose whether fingerprinting results are shown in Pure and/or on the Pure Portal. You can also choose whether the fingerprint is shown on the individual page (i.e. on the Institution’s landing page).
While the fingerprint is generated automatically, it is possible to edit it, making the result more precise. Fingerprint concepts can be removed by going into Pure and reviewing the fingerprint on each content record. If you remove concepts from the list, they will no longer be shown on the given content piece. The content will also not be returned when a search is performed for a previously excluded fingerprint concept.
Please note: it is not possible to manually add concepts to the fingerprint. However, "Research Interests" can be expressed on the Person's profile page as a way of indicating further concepts that the Person is involved in.
Furthermore, content with all levels of visibility is fingerprinted and taken into account when creating aggregated fingerprints. This means that if a researcher has five research outputs, only three of which are public (i.e. shown on the Portal), ALL five will be taken into account when creating the researcher's fingerprint.
Where do the concepts shown in the fingerprint come from?
In a nutshell, the Elsevier Fingerprint Engine uses the following steps to create a fingerprint:
Here is a more detailed view of how the Fingerprint is generated:
The Elsevier Fingerprint Engine applies a variety of Natural Language Processing (NLP) techniques to mine the text provided (i.e. description, publication title and abstract) for the key concepts. The key concepts that define the text are then identified in thesauri spanning all the major scientific disciplines, helping us to identify:
- the key scientific area(s) to which the content relates
- the main concepts (those achieving highest ranking) that should be included into the fingerprint for this content
Within each fingerprint, concepts are ordered and given a weight. This weight indicates how relevant each concept is to the text that was fingerprinted. They are calculated based on the frequency with which each concept is mentioned and based on where they are found (title or description). Please note that the highest weight is always given to concepts found in titles, and to the most frequent concept found in the remainder of a text; weights for the remaining concepts are calculated based on their relative frequency.
If needed, several fingerprints are aggregated. To do this, we follow the process below:
- All the individual relevant fingerprints are added together into one fingerprint with all the concepts and thesauri used by each publication combined.
- The ranks for reoccurring fingerprint concepts are added together (this would normally result in a ranking well over 100% for the most popular fingerprint concepts).
- The fingerprint is normalised, so that the highest ranked concept is now considered to be 100% and all other concepts are then compared to this concept.
The resulting fingerprint is then displayed on the Pure back end and Pure Portal (where relevant).
You can find out more about the technology behind the Fingerprint Engine here.
Frequently Asked Questions
How do I optimise my content for fingerprinting?
In order to generate the fingerprint, the content needs to have a title and a description (abstract) providing an overview of the content piece.
At the moment, there is no restriction on the number of words that must be included, but our tests show that a title of 5-15 words and a description (abstract) of 150–300 words would produce a good fingerprint. Please note that there is no need to shorten your description if it is longer and the system will take all of the available content into account (not just the first 300 words).
How can a fingerprint change over time?
Fingerprints shown in Pure can change over time for two key reasons:
- This is an aggregated fingerprint: when a new component for an aggregated fingerprint is added, the aggregation is recalculated. For example, the researcher’s fingerprint is an aggregation of the fingerprinting results for their research interests and all research outputs and prizes linked to the person. When a new research output is added, the fingerprint for the person is recalculated. This may lead to new fingerprint concepts appearing in the person’s fingerprint, a new subject matter area being added in, or some of the fingerprint concepts gaining more or less importance (weight). This is the most frequent reason for a change in fingerprint.
- There is an update in the thesaurus version: major updates to the thesaurus are not frequent, and are mainly focused on adding new terms or optimising the existing vocabulary (e.g. removing the terms that are no longer used, replacing terms with a better alternative). An update to the thesaurus may lead to a change in the resulting fingerprint.
Does Elsevier use my fingerprint in any way? Where is this data stored? Is it linked to any personally identifiable information?
Pure sends text fragments that describe subject matter (i.e. Title and Abstract for research output, research project description, researcher’s research interest descriptions) to the Fingerprint Engine, which returns sets of weighted concepts that were mentioned in those fragments (known as fingerprints). This fingerprint is then stored in Pure and can be displayed on the Pure Portal.
Where needed, fingerprints are aggregated to Researcher or Institution level. This process happens entirely within Pure and does not involve the Fingerprint Engine.
Please note: no personally identifiable information is passed on to the Fingerprint engine as it operates on research text fragments only. Information such as researcher names and titles, organisation and project names are NOT, therefore, submitted to the engine and are stored only in Pure. When aggregations are created, no personally identifiable information is used either.
Does Elsevier Fingerprint Engine store or use my data?
When text is sent to the Fingerprint Engine, the resulting fingerprint is returned without anything being stored by the Fingerprint Engine in the process.
The concept extraction by the Fingerprint Engine itself is periodically tested and improved using test data. This process is completely independent from Pure. Incidental extraction errors reported by Pure customers are much appreciated and allow us to deliver high quality fingerprints and further improve accuracy and quality. An error report (as submitted through JIRA) typically results in the Elsevier Fingerprints team adding an example sentence that triggers the extraction error to the evaluation set, to test whether subsequent engine improvements have resolved the error. For this evaluation set, no customer reference is stored, only the offending text fragment.
I want to exclude certain terms, how can I do that?
There are two ways in which you can exclude certain concepts from your results: exclude specific terms from an individual record or exclude word types from the entire Pure.
To exclude fingerprint concepts from an individual record, login to pure and edit the record you want to modify. In the Fingerprints section you can remove unwanted fingerprint concepts and they will no longer be shown for this record.
It is also possible to exclude certain types of fingerprint concept - for example, names of places or people - from ALL of your fingerprinting results.
We exclude generic terms from all Fingerprinting results by default. You do not need to do anything if you are happy with the results; however, we encourage all customers to review the exclusion list settings in case they would prefer to block some additional lists. You can do this at Administrator > Fingerprint > Configuration.
Please note: if you change the settings, the content will be re-fingerprinted and then re-published to portal, so it may take some time for the content to appear on the Portal.
The lists of excluded terms are maintained by the Omniscience thesaurus team and are expanded as per customer feedback. If you have noticed a term that you believe should be included in one of the lists and does not appear in the results, please submit your suggestion via JIRA.
You can view the Sensitive Topics exclusion list (turned on by default) below. Please note: this list is as of March 2023; a newer version may be available.
Updated at July 27, 2024