File reputation
The File Reputation (Malware Presence) service provides information about the malware status of requested samples. The status can be malicious, suspicious, known, or unknown.
The service supports single and bulk queries. An optional extended response adds further information about each sample, such as its trust factor and threat level. Additionally, it is possible to retrieve MD5, SHA1, and SHA256 hashes for the requested sample(s) using the show_hashes parameter.
This API is rate limited to 500 requests per second.
General Info about Requests/Responses
- All requests support the format parameter which supports two options: xml or json.
- Default response format is xml, except for bulk queries, where the default response format is the same as the post_format.
- The number of hashes in a bulk request must not be greater than 100.
- POST requests must contain an HTTP header field Content-Type: application/octet-stream
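Because a bulk request must not contain more than 100 hashes, a client holding a longer list has to split it across several requests. The following is a minimal sketch of that batching step. The names chunk_hashes and MAX_BULK_HASHES are illustrative only and are not part of the API.

```python
from typing import Iterator, List, Sequence

# Limit stated above: a bulk request must not contain more than 100 hashes.
MAX_BULK_HASHES = 100


def chunk_hashes(hashes: Sequence[str], size: int = MAX_BULK_HASHES) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` hashes (hypothetical helper)."""
    if not 1 <= size <= MAX_BULK_HASHES:
        raise ValueError("batch size must be between 1 and 100")
    for start in range(0, len(hashes), size):
        yield list(hashes[start:start + size])


# 250 placeholder MD5-shaped strings, split into bulk-sized batches.
batches = list(chunk_hashes([f"{i:032x}" for i in range(250)]))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch can then be sent as the hashes array of one bulk query.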
Malware Presence Status
Malware Presence status is labeled as follows:
- UNKNOWN - The sample has not received a classification. It does not exist in the file reputation database, or there is no information on whether it is malicious or not.
- KNOWN - The sample is presumed to be benign by ReversingLabs. The sample does not have any AV detections from trustworthy sources and it does not match any of our internal threat signatures. We recommend checking the trust factor of the sample. Trust factor values of 4 or 5 (on a scale of 0 to 5, with 0 being the highest trust) indicate that the source is not trusted. On the other hand, samples with trust factor values of 0, 1, or 2 come from prominent software vendors.
- SUSPICIOUS - The sample is considered suspicious based on ReversingLabs classification algorithm's multi-level analysis. This file may be declared malicious or known at a later time when more reliable heuristics start detecting the file, when a threat-specific signature is written, or when any new information is received that changes its threat profile.
This classification is reserved for heuristically detected threats. It is an early warning detection mechanism that can result in false positives for oddly formed files, and those with behaviors similar to malware.
The ReversingLabs classification algorithm forms its decision based on historical detection accuracy for the technology that reports positive detection. Thresholds don't relate to the number of the detection technologies, but are based on the metrics behind confirmed true positive detections.
Reducing the number of unwanted detections within this category is an ongoing effort, but, generally, the suspicious classification is a good starting point for new threat discovery.
- MALICIOUS - The sample was classified as malicious by ReversingLabs proprietary algorithms. This classification is reserved for high-accuracy heuristics and named threats, such as Emotet, Dridex and WannaCry. Threat severity is expressed through the threat level value. The higher the value, the more severe the threat.
While the threat level depends on the malware family, the rule of thumb is that potentially unwanted applications have the lowest value, and ransomware the highest. Some users may choose to ignore certain threat types, such as PUA.
Those are easily identifiable through threat level, or the standardized ReversingLabs threat name. This decision is based on internal policies of the organization that deploys our classification technology.
Threat Level
Threat level is calculated by a proprietary ReversingLabs algorithm. It is a measure of how malicious a malware sample is.
A sample's Threat Level is expressed as a number from 0 to 5, with 5 indicating the most dangerous threats (highest severity). The higher the threat level, the more capable the malware sample, with Trojan-Keylogger at one end of the scale and Adware and PUA at the other.
Various factors are considered when calculating the threat level, including but not limited to: file origin, threat type, number of occurrences in the wild, and analysis performed by Spectra Core, our static analysis engine that includes YARA rules.
In real-world situations, threat level values are typically interpreted in the following way:
- Threat Level 4, 5 - immediate response required (e.g., different types of Trojans)
- Threat Level 2, 3 - should be examined within 24 hours (e.g., first stage exploits)
- Threat Level 1 - not urgent, but should be periodically reviewed in case unapproved remote administration tools have been detected by suspicious users (e.g. Adware / PUA)
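The interpretation above can be expressed as a small lookup, for example when sorting query results into work queues. This is only a sketch: response_priority is a hypothetical helper, and the wording returned for level 0 is an assumption, since the list above does not cover it.

```python
def response_priority(threat_level: int) -> str:
    """Map a threat level (0-5) to the response guidance listed above."""
    if not 0 <= threat_level <= 5:
        raise ValueError("threat level must be between 0 and 5")
    if threat_level >= 4:
        return "immediate response required"
    if threat_level >= 2:
        return "examine within 24 hours"
    if threat_level == 1:
        return "not urgent, review periodically"
    # Level 0 is not part of the list above; this label is an assumption.
    return "no action listed"


for level in (5, 3, 1, 0):
    print(level, "->", response_priority(level))
```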
Trust factor
In samples classified as "known", trust factor represents how confident we are that the sample is goodware.
It is computed by a proprietary ReversingLabs algorithm and expressed as a number from 0 to 5, where zero represents the highest confidence that a sample is goodware.
The algorithm takes into account how reputable the origin of the file is, and what the file can do (structural metadata). On top of that, modifiers are applied. For example, if the file comes from a company that would usually be assigned some trust factor N, but this company has recently had a security incident or otherwise meets other relevant criteria, the trust factor value is raised accordingly (a higher value means lower trust).
ReversingLabs malware naming standard
The ReversingLabs detection string consists of three main parts separated by dots. All parts of the string will always appear (all three parts are mandatory).
- The first part of the string indicates the platform targeted by the malware. If the platform is ByteCode, Document or Script, then there will be an additional subplatform string. Platform and subplatform strings are separated by a hyphen ( - ).
- The second part of the detection string describes the malware type.
- The third part represents the malware family name. This string is one of the most common names for that malware.
Example
If backdoor malware is a PHP script with the family name "Jones", the detection string will look like this:
Script-PHP.Backdoor.Jones
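A detection string built this way can be split back into its parts. The sketch below assumes that the first two dots are the part separators (any further dots are kept in the family name) and that the first hyphen in the first part separates platform from subplatform. ThreatName and parse_threat_name are illustrative names, not part of the service.

```python
from typing import NamedTuple, Optional


class ThreatName(NamedTuple):
    platform: str
    subplatform: Optional[str]
    malware_type: str
    family: str


def parse_threat_name(name: str) -> ThreatName:
    """Split 'Platform[-Subplatform].Type.Family' into its components."""
    # Assumption: only the first two dots separate the three mandatory parts.
    parts = name.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three dot-separated parts: {name!r}")
    platform_part, malware_type, family = parts
    # The subplatform, when present, follows the first hyphen.
    platform, _, subplatform = platform_part.partition("-")
    return ThreatName(platform, subplatform or None, malware_type, family)


print(parse_threat_name("Script-PHP.Backdoor.Jones"))
```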
Supported Detection String Elements
Platforms (non-exhaustive)
- ABAP
- Archive
- Binary
- ByteCode
- Document
- Image
- MacOS
- OS2
- Script-VBS
- SunOS
- Unix
- WebAssembly
- WinCE
Subplatforms (non-exhaustive)
- 7ZIP
- ActiveX
- AR
- AutoIt
- BMP
- CGI
- CorelDraw
- Excel
- GZIP
- INI
- JAR
- JS
- Lua
- Makefile
- MSG
- NuGet
- OTF
- PHP
- PowerShell
- Python
- RTF
- Shockwave
- SWF
- TIFF
- Visio
- WMF
- XML
Malware Types (non-exhaustive)
- Adware
- Certificate
- Downloader
- Format
- Infostealer
- Malware
- Phishing
- Rogue
- Spyware
- Worm
Malware Presence Single Query
This query returns information about the malware status of the requested sample.
GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}
hash_type
- Specifies which hash type will be used in the request. Supported values: md5, sha1, sha256
- Required
hash_value
- Hash of the file for which the user is requesting data from the service. The value must be a valid hash of the same type specified by the hash_type parameter.
- Required
Response Format
{
"rl": {
"malware_presence": {
"status": "KNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
}
}
}
}
status
- Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS)
query_hash
- The hash type and value used in the request. Can be md5, sha1, or sha256
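Since hash_value has to be a valid hash of the type named by hash_type, a client can check this before sending a single query. The sketch below builds only the request path. The helper name single_query_path is hypothetical, and the check relies on the standard hexadecimal digest lengths (32, 40, and 64 characters); the service may apply stricter or different validation.

```python
import re

# Standard hexadecimal digest lengths for the supported hash types.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def single_query_path(hash_type: str, hash_value: str) -> str:
    """Return the single query path after a basic client-side sanity check."""
    expected = HASH_LENGTHS.get(hash_type)
    if expected is None:
        raise ValueError(f"unsupported hash type: {hash_type!r}")
    if not re.fullmatch(r"[0-9a-fA-F]{%d}" % expected, hash_value):
        raise ValueError(f"{hash_value!r} is not a valid {hash_type} hash")
    return f"/api/databrowser/malware_presence/query/{hash_type}/{hash_value}"


print(single_query_path("md5", "ca083f61113e1fb8f539ecfa7c725fc8"))
```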
Malware Presence Bulk Query
A bulk query returns a response in the same format as a single query, but for multiple hashes in a single response. Additional response fields list ill-formatted hashes and hashes not found by the service. Up to 100 hashes can be submitted in one request.
POST /api/databrowser/malware_presence/bulk_query/{post_format}
post_format
- Required parameter that defines the POST payload format. Supported options are xml and json. By default, the response format matches the format defined by this parameter.
- Required
Request Format
The following rules apply to both formats (XML and JSON).
- hash_type value must be one of the following: md5, sha1, sha256
- hash_value must be a valid hash of the same type specified by hash_type
{
"rl": {
"query": {
"hash_type": "hash_type",
"hashes": [
"hash_value",
"hash_value",
"...",
"hash_value"
]
}
}
}
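As an illustration of the request format, the sketch below assembles a JSON bulk query with the Python standard library and sets the Content-Type header required for POST requests. It only builds the request object and does not send it. BASE_URL is a placeholder host and build_bulk_request is a hypothetical helper; authentication is not covered in this section and is left out.

```python
import json
import urllib.request
from typing import Sequence

# Placeholder host: an assumption for illustration, not the real service address.
BASE_URL = "https://example.invalid"


def build_bulk_request(hash_type: str, hashes: Sequence[str],
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON bulk query request."""
    if hash_type not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported hash type: {hash_type!r}")
    if not 1 <= len(hashes) <= 100:
        raise ValueError("a bulk request takes between 1 and 100 hashes")
    payload = {"rl": {"query": {"hash_type": hash_type, "hashes": list(hashes)}}}
    return urllib.request.Request(
        url=f"{base_url}/api/databrowser/malware_presence/bulk_query/json",
        data=json.dumps(payload).encode("utf-8"),
        # Header required for POST requests, as stated in the general info.
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


request = build_bulk_request("md5", ["ca083f61113e1fb8f539ecfa7c725fc8"])
print(request.get_method(), request.full_url)
```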
Response Format
{
"rl": {
"entries": [
{
"status": "UNKNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
}
}
],
"invalid_hashes": [
"hash_value"
]
}
}
Items in rl.entries in the bulk query response are equivalent to the rl.malware_presence element in the single query response.
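A client typically wants the bulk response keyed by hash. The sketch below walks a response shaped like the format shown above and returns a hash-to-status mapping together with the list of invalid hashes. index_bulk_response is a hypothetical helper, and the sample response uses made-up placeholder values.

```python
from typing import Dict, List, Tuple


def index_bulk_response(response: dict) -> Tuple[Dict[str, str], List[str]]:
    """Return ({hash_value: status}, invalid_hashes) for a bulk response."""
    rl = response.get("rl", {})
    statuses: Dict[str, str] = {}
    for entry in rl.get("entries", []):
        # query_hash holds a single {hash_type: hash_value} pair.
        for hash_value in entry.get("query_hash", {}).values():
            statuses[hash_value] = entry["status"]
    return statuses, list(rl.get("invalid_hashes", []))


# Placeholder data in the documented shape, not real service output.
sample = {
    "rl": {
        "entries": [
            {"status": "UNKNOWN", "query_hash": {"md5": "0" * 32}},
            {"status": "KNOWN", "query_hash": {"md5": "1" * 32}},
        ],
        "invalid_hashes": ["not-a-hash"],
    }
}
statuses, invalid = index_bulk_response(sample)
print(statuses, invalid)
```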
Extended Malware Presence Query Option
Single and bulk queries support an additional query parameter extended. This is an optional parameter that specifies whether extended classification metadata for the requested sample(s) should be returned in the response.
The parameter can be set to true (include extended metadata) or false (default; don't include extended metadata). If the parameter is not provided in the request, it is interpreted as false (extended metadata is not returned).
Extended metadata includes information such as trust factor and threat level values; malware type, family name, and platform; first and last seen times, and more.
It may also include information on the classification reason for each sample. This information is conveyed in the optional response field reason, which is only returned when the extended parameter is included in the request, and only if the field is calculated for the requested sample(s).
Classification reason information helps users understand the logic behind sample classification decisions, particularly in the case of goodware overrides. A goodware override is an advanced whitelisting technique designed to prevent and suppress possible false positives within highly trusted software packages. If a sample has a valid and trusted certificate signature, or if it came from a trusted source, the sample and all its extracted files should be whitelisted regardless of the AV results. A common example is a sample with high AV detection results that has the KNOWN classification status.
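To make the goodware override case concrete, the sketch below flags extended entries that are KNOWN despite a high share of AV detections and whose reason points to a certificate or a trusted source. Treating best_certificate, best_source, and TC_certificate as the override reasons is an inference from the reason values documented for the extended response, and the 50 percent threshold is an arbitrary assumption. looks_like_goodware_override is a hypothetical helper.

```python
# Reason values assumed here to signal a certificate- or source-based override.
OVERRIDE_REASONS = {"best_certificate", "best_source", "TC_certificate"}


def looks_like_goodware_override(entry: dict, min_scanner_percent: float = 50.0) -> bool:
    """True for KNOWN samples with many AV detections and a trust-based reason."""
    return (
        entry.get("status") == "KNOWN"
        and entry.get("reason") in OVERRIDE_REASONS
        # scanner_percent and reason are optional fields, so read them defensively.
        and entry.get("scanner_percent", 0) >= min_scanner_percent
    )


known_entry = {
    "status": "KNOWN",
    "reason": "TC_certificate",
    "scanner_count": 25,
    "scanner_percent": 76,
    "scanner_match": 19,
}
print(looks_like_goodware_override(known_entry))  # True
```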
Single Query with Extended Option
GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}?extended=true
Bulk Query with Extended Option
POST /api/databrowser/malware_presence/bulk_query/{post_format}?extended=true
Response Format
The response format is the same for sample nodes in both single and bulk queries.
{
"rl": {
"malware_presence": {
"status": "string",
"sha1": "string",
"threat_level": 0,
"scanner_percent": 0,
"scanner_match": 0,
"last_seen": "string",
"reason": "string",
"scanner_count": 0,
"query_hash": {
"sha1": "string"
},
"first_seen": "string",
"sha256": "string",
"trust_factor": 0,
"md5": "string"
}
}
}
status
- Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS). The status designation is calculated by a proprietary ReversingLabs algorithm that adapts and improves as new information about the file and the threat is discovered.
- The algorithm takes the following into account: Spectra Intelligence antivirus scan results, threat and trust factors, parent/child relationships, certificates, and other metadata-specific information.
reason
- Optional response field that clarifies the reason why a sample received a particular classification status. This field is presently not calculated for all samples in the Spectra Intelligence system, which means it can be omitted from the response.
- The value of this field can be one of the following: analyst_sample_override (the sample was classified manually after an analysis), antivirus (the sample was classified by the ReversingLabs multi-scan algorithm based on aggregated antivirus scan results), best_certificate (the sample or its container are signed with a valid and trusted certificate), best_source (the sample can be obtained from a trusted source, or it was unpacked from a file originating from a trusted source), TC_certificate (the sample is signed with a recognized whitelisted certificate), sandbox (classified by the ReversingLabs Cloud Sandbox).
scanner_count
- Number of scanners used in the last scan
classification
- Malware classification based on the latest analysis of the requested sample, calculated by the ReversingLabs algorithm. See rl.malware_presence.classification for details
scanner_percent
- Percent of scanners that detected malware in the last scan
scanner_match
- Number of scanners that detected malware in the last scan
threat_name
- Detected threat name for the requested sample
query_hash
- Hash value used in the request; can be md5, sha1, or sha256
first_seen
- Indicates the date and time when the sample was first uploaded to the system, or when it has received a scan result for the first time
last_seen
- Indicates the date and time when the sample was last uploaded to the system, or the date and time of the last scan result it has received
threat_level
- Threat level value for the requested sample (0 indicates no threat; 1 is the lowest threat value - lowest severity, such as Adware; 5 is the highest threat value, e.g., Trojan)
trust_factor
- Trust factor value of the sample's sources (0 is the most trusted; 5 is the least trusted)
rl.malware_presence.classification
family_name
- Malware family name of the detected malware
subplatform
- Subplatform targeted by the detected malware; can be one of the subplatforms defined in ReversingLabs Malware Naming Standard
platform
- Platform targeted by the detected malware; can be one of the platforms defined in ReversingLabs Malware Naming Standard
is_generic
- Returns true if select trusted third-party AV scanners assigned threat names that, based on their naming conventions, suggest the sample was "generically" and/or "heuristically" identified as having characteristics similar to known malicious software
cve
- If applicable, contains the Common Vulnerabilities and Exposures (CVE) identifier. Additional fields are is_candidate (returns true if the sample is a CVE candidate or false if the sample is an official CVE list entry), CVE number, and CVE year
type
- Malware type; can be one of the types defined in ReversingLabs Malware Naming Standard
Response Examples
Sample with Malware Presence Status MALICIOUS
{
"rl": {
"malware_presence": {
"status": "MALICIOUS",
"scanner_count": 40,
"classification": {
"platform": "Win32",
"type": "Trojan",
"is_generic": false,
"family_name": "Nsis"
},
"scanner_percent": 82.5,
"scanner_match": 33,
"threat_name": "Win32.Trojan.Nsis",
"query_hash": {
"sha1": "1d412db0ac58dd8d8bfae8b18c7b355bd14dab2f"
},
"first_seen": "2012-07-12T00:05:00",
"threat_level": 5,
"trust_factor": 5,
"last_seen": "2017-08-09T11:40:00"
}
}
}
Sample with Malware Presence Status SUSPICIOUS
{
"rl": {
"malware_presence": {
"status": "SUSPICIOUS",
"scanner_count": 29,
"classification": {
"subplatform": "JS",
"platform": "Script",
"is_generic": false,
"family_name": "Trojan-downloader"
},
"scanner_percent": 6.8965516090393066,
"scanner_match": 2,
"threat_name": "Script-JS.Malware.Trojan-downloader",
"query_hash": {
"sha1": "e0c58164cde263904abc07915d7c9d2722e3806c"
},
"first_seen": "2017-09-07T01:14:31",
"threat_level": 0,
"trust_factor": 5,
"last_seen": "2017-09-28T09:08:00"
}
}
}
Sample with Malware Presence Status KNOWN
{
"rl": {
"malware_presence": {
"status": "KNOWN",
"reason": "TC_certificate",
"scanner_count": 25,
"scanner_percent": 76,
"scanner_match": 19,
"query_hash": {
"sha1": "5ff74f670c8a68557bf36955d3e4e2353266d607"
},
"first_seen": "2016-06-30T01:17:42",
"threat_level": 0,
"trust_factor": 0,
"last_seen": "2019-10-17T10:03:23"
}
}
}
Sample with Malware Presence Status UNKNOWN
{
"rl": {
"malware_presence": {
"status": "UNKNOWN",
"query_hash": {
"sha1": "e4e8c856c1524ff22b87b29d605ab8fdb1007298"
}
}
}
}
Malware Presence Query with Available Hashes
Single and bulk queries support an additional request parameter show_hashes, which can be set to either true or false. This optional parameter can be used in combination with the extended parameter in the same request.
When set to true, the show_hashes parameter specifies that MD5, SHA1, and SHA256 hashes should be returned in the response for the requested sample, in addition to the rest of the Malware Presence information. If the parameter is not provided in the request, it is interpreted as false (additional hashes are not returned).
GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}?show_hashes=true
POST /api/databrowser/malware_presence/bulk_query/{post_format}?show_hashes=true
Response Format
Entries in rl.entries in the bulk query response are equivalent to the rl.malware_presence element in the single query response.
If both the extended and show_hashes parameters are set to true, the response will contain both extended malware presence information and MD5, SHA1, and SHA256 hashes.
{
"rl": {
"malware_presence": {
"status": "KNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
},
"md5": "md5 hash_value",
"sha1": "sha1 hash_value",
"sha256": "sha256 hash_value"
}
}
}
status
- Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS)
query_hash
- The hash type and value used in the request; can be MD5, SHA1, or SHA256
md5, sha1, sha256
- Respective hashes for the requested sample(s).
Examples
Single query - changing the response format
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=json
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=xml
Single query - changing the hash type
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b
/api/databrowser/malware_presence/query/sha256/10dbb2b27208c5566d326b47950657bf6b3c9a59e302598a128ad7125d5fb4fd
/api/databrowser/malware_presence/query/md5/ca083f61113e1fb8f539ecfa7c725fc8
Single query - changing the optional flags
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?extended=true
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?show_hashes=true
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?extended=true&show_hashes=true
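The optional flags can also be appended programmatically. The sketch below adds extended, show_hashes, and format to a request path using the standard library; with_options is a hypothetical helper, and flags left at their defaults are simply omitted from the query string.

```python
from typing import Optional
from urllib.parse import urlencode


def with_options(path: str, extended: bool = False, show_hashes: bool = False,
                 response_format: Optional[str] = None) -> str:
    """Append the optional query parameters described in this document."""
    params = {}
    if extended:
        params["extended"] = "true"
    if show_hashes:
        params["show_hashes"] = "true"
    if response_format is not None:
        if response_format not in ("xml", "json"):
            raise ValueError("format must be 'xml' or 'json'")
        params["format"] = response_format
    return f"{path}?{urlencode(params)}" if params else path


base = "/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b"
print(with_options(base, extended=True, show_hashes=True))
```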
Bulk query - changing the POST format
/api/databrowser/malware_presence/bulk_query/json
/api/databrowser/malware_presence/bulk_query/xml
With bulk queries, the response format will default to the request format. If you want a different response format, add the format query field:
/api/databrowser/malware_presence/bulk_query/xml?format=json
Bulk query - JSON POST format
/api/databrowser/malware_presence/bulk_query/json
{
"rl": {
"query": {
"hash_type": "md5",
"hashes": [
"4bb64c06b1a72539e6d3476891daf17b",
"6353de8f339b7dcc6b25356f5fbffa4e",
"59cb087c4c3d251474ded9e156964d5d",
"6c2eb9d1a094d362bcc7631f2551f5a4",
"a82c781ce0f43d06c28fe5fc8ebb1ca9",
"920f5ba4d08f251541c5419ea5fb3fb3"
]
}
}
}
{
"rl": {
"query": {
"hash_type": "sha1",
"hashes": [
"13e40f38427a55952359bfc5f52b5841ce1b46ba",
"831fc2b9075b0a490adf15d2c5452e01e6feaa17",
"42b05278a6f2ee006072af8830c103eab2ce045f"
]
}
}
}
{
"rl": {
"query": {
"hashes": [
"0001f757f6b9523707462066100aa543",
"000202ed4a0fb4c95e68824bc7777a78",
"00026f63fd5a2600b73a866d7ef08b6f"
],
"hash_type": "md5"
}
}
}