Antivirus integration: self-hosted customers
Background
From version 5.25.0 onward, Pure can display information about the health status of files scanned by an antivirus product. See Security: antivirus scan, status badges and download blocker for an overview of the feature.
Integrating this feature with a self-hosted Pure installation requires some work on the customer side. This guide shows how to set up Pure to read data from an external antivirus product using .av files (for local file storage) and S3 object tags (for S3 storage), and provides an example integration.
- Any code shown on this page is provided as is. Elsevier is not responsible for any issues it might cause, and makes no commitment to support or maintain it.
- There are many antivirus products on the market. We cannot guarantee that the information Pure needs can be extracted programmatically from all of them.
Supported storage
Pure interfaces with the following four types of storage. The table below shows which of them support antivirus integration with Pure.
Storage | Supports antivirus integration |
---|---|
Local file storage | Yes |
Local file storage for temporary files | No |
S3 | Yes |
Any LTP (DSpace, EPrints, Fedora, etc.) | No |
Files reside in "Local file storage for temporary files" until they are moved to their permanent location, which happens continuously. While files are in temporary storage, their scan status is "Pending", and they cannot be downloaded from Pure if Pure is configured to block pending files.
Files in "Any LTP (DSpace, EPrints, Fedora, etc.)" are outside the scope of Pure and are the responsibility of the owner of the store.
Scanning files in a local store
Binary files in a local store are stored on disk in the folders that are configured in Administrator > Storage > File Storage > Local file storage.
This section describes how to generate .av files that inform Pure about antivirus scan results. An .av file is a JSON document containing enough information for Pure to block downloads of infected files.
Pure uses two on-disk files for each binary file stored in a local store:
- a 'bin' file containing the binary data uploaded to Pure
- a 'status' file used to manage the 'bin' file
Each file stored in a local store is assigned a unique ID, referred to here as 'xxx-yyyy-zzzz'. Pure therefore stores an 'xxx-yyyy-zzzz.bin' and an 'xxx-yyyy-zzzz.status' file for each file in a local store. Your antivirus integration must scan each 'bin' file and report the result to Pure by placing an 'xxx-yyyy-zzzz.av' file next to the scanned 'xxx-yyyy-zzzz.bin' file.
Pure can only use information from .av files if their file permissions allow Pure's Tomcat process to read them.
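A scanning job first has to find the 'bin' files that need scanning. The Python below is a minimal sketch of one way to do this. It assumes that file IDs contain no dots (so the .av name can be derived by swapping the extension) and that a 'bin' file needs scanning when it has no .av file yet or is newer than its .av file. The function names are hypothetical and not part of Pure.

```python
import os
from pathlib import Path
from typing import Iterator


def av_path_for(bin_path: Path) -> Path:
    """Return the sibling .av path: xxx-yyyy-zzzz.bin -> xxx-yyyy-zzzz.av."""
    return bin_path.with_suffix(".av")


def files_needing_scan(storage_root: str) -> Iterator[Path]:
    """Yield .bin files under storage_root that have no .av file,
    or whose .bin file was modified after its .av file was written."""
    for dirpath, _dirnames, filenames in os.walk(storage_root):
        for name in filenames:
            # Skip the .status files (and any existing .av files)
            if not name.endswith(".bin"):
                continue
            bin_path = Path(dirpath) / name
            av_path = av_path_for(bin_path)
            if not av_path.exists():
                yield bin_path
            elif bin_path.stat().st_mtime > av_path.stat().st_mtime:
                yield bin_path
```

Because new virus signatures can flag files that were previously reported as clean, you may also want to rescan all files periodically rather than relying only on modification times.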
Example of an .av file:
```json
{
  "ANTIVIRUS_PROVIDER": "ClamAV",
  "ANTIVIRUS_TIMESTAMP": "2022-08-26T08:31:31.105473+00:00",
  "ANTIVIRUS_ID": "31f67e27-b78f-45d7-ae8b-d5ed1d0740c7",
  "ANTIVIRUS_STATUS": "OK"
}
```
Key | Description | Format | Required |
---|---|---|---|
ANTIVIRUS_PROVIDER | Antivirus product name used for latest scanning | String | No |
ANTIVIRUS_TIMESTAMP | When the file was last scanned | ISO 8601 timestamp in the format YYYY-MM-DDThh:mm:ss.sTZD | Yes |
ANTIVIRUS_ID | ID used to identify the latest scanning of the file | String | No |
ANTIVIRUS_STATUS | Latest antivirus scanning result | Valid values are "OK" and "FOUND". Any other value is treated as "Pending" in Pure. | Yes |
Not all keys are required. The optional keys are not needed to block downloads of infected files, but they may be used to enhance the antivirus feature in Pure now or in the future.
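The following Python sketch writes an .av file next to a 'bin' file using the keys above. It is an illustration, not Elsevier's implementation: the function name, the default provider, the random UUID used as ANTIVIRUS_ID and the 0o644 permission mode are assumptions. Adjust the mode to match how your Tomcat user and group are set up.

```python
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_av_file(bin_path: Path, status: str, provider: Optional[str] = "ClamAV",
                  scan_id: Optional[str] = None) -> Path:
    """Write xxx-yyyy-zzzz.av next to xxx-yyyy-zzzz.bin and return its path."""
    record = {}
    if provider:
        record["ANTIVIRUS_PROVIDER"] = provider  # optional
    # Required: ISO 8601 timestamp with a time zone offset,
    # e.g. 2022-08-26T08:31:31.105473+00:00
    record["ANTIVIRUS_TIMESTAMP"] = datetime.now(timezone.utc).isoformat()
    record["ANTIVIRUS_ID"] = scan_id or str(uuid.uuid4())  # optional
    # Required: "OK" or "FOUND"; Pure treats any other value as "Pending"
    record["ANTIVIRUS_STATUS"] = status

    av_path = Path(bin_path).with_suffix(".av")
    # Write to a temporary file in the same folder, then rename it into place,
    # so a reader never sees a half-written JSON document.
    fd, tmp_name = tempfile.mkstemp(dir=str(av_path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh, indent=2)
        # mkstemp creates the file with mode 0o600; widen it so Tomcat can read it
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, av_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return av_path
```

Renaming with os.replace is atomic on POSIX file systems when source and target are in the same folder, so Pure should only ever see a complete .av file. This assumes Pure ignores the short-lived .tmp file in the storage folder.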
Example code
The following example code is provided as is. It can be used freely as a starting point for integrating a ClamAV scanner.
Requirements
- pyClamd (https://pypi.org/project/pyClamd/) installed in the Python environment.
- A ClamAV process running on the server, either as a standalone process or in a container such as https://hub.docker.com/r/clamav/clamav, with the port mappings needed to reach it.
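As a minimal sketch of the scanning step, assuming clamd listens on TCP, the Python below sends a 'bin' file to clamd using pyClamd's ClamdNetworkSocket and scan_stream, and maps the result to an ANTIVIRUS_STATUS value. The default host and port (127.0.0.1:3310) and the CLAMD_HOST and CLAMD_PORT environment variables are assumptions; match them to your ClamAV setup or container port mapping.

```python
import os
from typing import Optional

# Hypothetical configuration; 3310 is clamd's usual TCP port
CLAMD_HOST = os.environ.get("CLAMD_HOST", "127.0.0.1")
CLAMD_PORT = int(os.environ.get("CLAMD_PORT", "3310"))


def status_from_clamd_result(result: Optional[dict]) -> str:
    """Map a pyClamd scan result to an ANTIVIRUS_STATUS value.

    pyClamd returns None when nothing is found, otherwise a dict such as
    {'stream': ('FOUND', 'Eicar-Test-Signature')} or {'stream': ('ERROR', '...')}.
    """
    if result is None:
        return "OK"
    if any(value[0] == "FOUND" for value in result.values()):
        return "FOUND"
    # Neither "OK" nor "FOUND", so Pure will show the file as "Pending"
    return "ERROR"


def scan_bin_file(bin_path: str) -> str:
    """Send a .bin file to a running clamd and return "OK", "FOUND" or "ERROR"."""
    # Imported here so status_from_clamd_result works without pyClamd installed
    import pyclamd

    clamd = pyclamd.ClamdNetworkSocket(host=CLAMD_HOST, port=CLAMD_PORT)
    with open(bin_path, "rb") as fh:
        # scan_stream sends the bytes over the socket, so clamd does not need
        # access to Pure's storage folders (useful when clamd runs in a container)
        result = clamd.scan_stream(fh.read())
    return status_from_clamd_result(result)
```

This sketch reads the whole file into memory, and very large files may exceed clamd's StreamMaxLength setting and fail to scan. Make sure your job handles such failures, for example by not writing an "OK" status so that Pure keeps treating the file as "Pending". The resulting status can then be written to an .av file as described above.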
Scanning files in an S3 store
Pure can use the AWS tagging feature on S3 objects to read the antivirus status of a file. The tags are the same as the keys in .av files; see the description of the .av file keys above.
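Example of AWS tags (using the same values as the .av file example above):
Key | Value |
---|---|
ANTIVIRUS_PROVIDER | ClamAV |
ANTIVIRUS_TIMESTAMP | 2022-08-26T08:31:31.105473+00:00 |
ANTIVIRUS_ID | 31f67e27-b78f-45d7-ae8b-d5ed1d0740c7 |
ANTIVIRUS_STATUS | OK |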
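Tags can be set with the AWS SDK. The Python below is a sketch that calls get_object_tagging and put_object_tagging on an S3 client you pass in (for example one created with boto3.client("s3")). Because put_object_tagging replaces an object's entire tag set, the sketch merges the antivirus keys into the existing tags. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def antivirus_tags(status: str, provider: Optional[str] = "ClamAV",
                   scan_id: Optional[str] = None) -> Dict[str, str]:
    """Build the same key/value pairs an .av file would contain."""
    tags = {
        "ANTIVIRUS_TIMESTAMP": datetime.now(timezone.utc).isoformat(),  # required
        "ANTIVIRUS_STATUS": status,  # required: "OK" or "FOUND"
    }
    if provider:
        tags["ANTIVIRUS_PROVIDER"] = provider  # optional
    if scan_id:
        tags["ANTIVIRUS_ID"] = scan_id  # optional
    return tags


def tag_s3_object(s3_client, bucket: str, key: str, tags: Dict[str, str]) -> List[dict]:
    """Merge antivirus tags into an S3 object's existing tag set and return the new set."""
    existing = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
    merged = {t["Key"]: t["Value"] for t in existing}
    merged.update(tags)  # antivirus keys overwrite older values; other tags are kept
    tag_set = [{"Key": k, "Value": v} for k, v in merged.items()]
    s3_client.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": tag_set})
    return tag_set
```

For example, `tag_s3_object(boto3.client("s3"), "my-pure-bucket", "some/object/key", antivirus_tags("OK"))`, where the bucket and key are placeholders. S3 allows at most 10 tags per object, so the four antivirus keys leave room for six others.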
We recommend building components on AWS infrastructure to ensure that files are scanned immediately and continuously. Commercial products are also available for this purpose.
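One common AWS pattern, sketched below as an assumption rather than a prescribed design, is an AWS Lambda function triggered by S3 "ObjectCreated" event notifications that scans each new object and tags it. The scan_bytes callback is hypothetical and stands in for whatever scanner you use; s3_client is expected to be a boto3 S3 client.

```python
import urllib.parse
from datetime import datetime, timezone
from typing import Callable


def make_handler(s3_client, scan_bytes: Callable[[bytes], str]):
    """Build a Lambda handler for S3 ObjectCreated notifications.

    scan_bytes (hypothetical) must return "OK" or "FOUND" for the object's content.
    """

    def handler(event, context):
        results = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            status = scan_bytes(body)
            # put_object_tagging replaces all tags, so keep the existing ones
            existing = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
            tags = {t["Key"]: t["Value"] for t in existing}
            tags["ANTIVIRUS_STATUS"] = status
            tags["ANTIVIRUS_TIMESTAMP"] = datetime.now(timezone.utc).isoformat()
            s3_client.put_object_tagging(
                Bucket=bucket,
                Key=key,
                Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
            )
            results.append((bucket, key, status))
        return results

    return handler
```

Setting tags produces s3:ObjectTagging events rather than ObjectCreated events, so tagging an object does not re-invoke the function. Reading the whole object into memory is only suitable for objects that fit within the function's configured memory.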
Updated at July 27, 2024