File reputation (TCA-0101)

The File Reputation (Malware Presence) service provides information about the malware status of requested samples. The status can be malicious, suspicious, known, or unknown.

The service supports single and bulk queries. It can also return extended response data with additional information about samples, such as their trust factor and threat level. Additionally, the MD5, SHA1, and SHA256 hashes for the requested sample(s) can be retrieved using the show_hashes parameter.

This API is rate limited to 500 requests per second.

General Info about Requests/Responses

  • All requests support the format parameter, which accepts one of two values: xml or json.
  • Default response format is xml, except for bulk queries, where the default response format is the same as the post_format.
  • The number of hashes in a bulk request must not be greater than 100.
  • POST requests must contain an HTTP header field Content-Type: application/octet-stream
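
The rules above can be sketched as a small helper. This is a minimal illustration, not part of the service: the function name response_format and the constant POST_HEADERS are hypothetical, and the assumption that an explicit format parameter takes precedence over post_format is inferred from the bulk query examples later in this document.

```python
from typing import Optional

ALLOWED_FORMATS = {"xml", "json"}

# Header required on POST requests, as stated in the list above.
POST_HEADERS = {"Content-Type": "application/octet-stream"}


def response_format(is_bulk: bool,
                    post_format: Optional[str] = None,
                    format_param: Optional[str] = None) -> str:
    """Return the response format a request is expected to produce."""
    if format_param is not None:
        if format_param not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {format_param!r}")
        # Assumption: an explicit format parameter overrides the default.
        return format_param
    if is_bulk:
        if post_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported post_format: {post_format!r}")
        # Bulk queries default to the format of the POST payload.
        return post_format
    # Single queries default to XML.
    return "xml"
```

For example, response_format(False) yields "xml", while response_format(True, post_format="json") yields "json".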

Malware Presence Status

Malware Presence status is labeled as follows:

  • UNKNOWN - The sample has not received a classification. It does not exist in the file reputation database, or there is no information on whether it is malicious or not.

  • KNOWN - The sample is presumed to be benign by ReversingLabs. The sample does not have any AV detections from trustworthy sources and it does not match any of our internal threat signatures. We recommend checking the trust factor of the sample. The trust factor is expressed on a scale of 0 to 5, with 0 being the highest trust. Values of 4 or 5 indicate low trust, meaning the source is not trusted. Values of 0, 1, or 2 indicate high trust; such samples come from prominent software vendors.

  • SUSPICIOUS - The sample is considered suspicious based on ReversingLabs classification algorithm's multi-level analysis. This file may be declared malicious or known at a later time when more reliable heuristics start detecting the file, when a threat specific signature is written, or any new information is received that changes its threat profile.

    This classification is reserved for heuristically detected threats. It is an early warning detection mechanism that can result in false positives for oddly formed files, and those with behaviors similar to malware.

    The ReversingLabs classification algorithm forms its decision based on historical detection accuracy for the technology that reports positive detection. Thresholds don't relate to the number of the detection technologies, but are based on the metrics behind confirmed true positive detections.

    Reducing the number of unwanted detections within this category is an ongoing effort, but, generally, the suspicious classification is a good starting point for new threat discovery.

  • MALICIOUS - The sample was classified as malicious by ReversingLabs proprietary algorithms. This classification is reserved for high-accuracy heuristics and named threats, such as Emotet, Dridex and WannaCry. Threat severity is expressed through the threat level value. The higher the value, the more severe the threat.

While the threat level depends on the malware family, the rule of thumb is that potentially unwanted applications have the lowest value, and ransomware the highest. Some users may choose to ignore certain threat types, such as PUA.

Those are easily identifiable through threat level, or the standardized ReversingLabs threat name. This decision is based on internal policies of the organization that deploys our classification technology.

Threat Level

Threat level is calculated by a proprietary ReversingLabs algorithm. It is a measure of how malicious a malware sample is.

A sample's threat level is expressed as a number from 0 to 5, with 5 indicating the most dangerous threats (highest severity). The higher the threat level, the more capable the malware sample is. The scale ranges from Adware and PUA at the low end to threats such as Trojan-Keylogger at the high end.

Various factors are considered when calculating the threat level, including but not limited to: file origin, threat type, number of occurrences in the wild, and analysis performed by Spectra Core, our static analysis engine that includes YARA rules.

In real-world situations, threat level values are typically interpreted in the following way:

  • Threat Level 4, 5 - immediate response required (e.g., different types of Trojans)
  • Threat Level 2, 3 - should be examined within 24 hours (e.g., first stage exploits)
  • Threat Level 1 - not urgent, but should be reviewed periodically, for example to check whether users have installed unapproved remote administration tools (e.g., Adware / PUA)
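
The interpretation above could be encoded as a simple triage helper. This is only a sketch: the function name triage_priority and the returned labels are hypothetical, and treating threat level 0 as "no action" is an assumption, since the list above does not cover it.

```python
def triage_priority(threat_level: int) -> str:
    """Map a threat level (0 to 5) to a suggested response priority."""
    if threat_level not in range(0, 6):
        raise ValueError(f"threat level out of range: {threat_level!r}")
    if threat_level >= 4:
        return "immediate"        # levels 4 and 5
    if threat_level >= 2:
        return "within_24_hours"  # levels 2 and 3
    if threat_level == 1:
        return "periodic_review"  # level 1
    return "no_action"            # level 0 (assumption, not listed above)
```

An organization that chooses to ignore low-severity threat types such as PUA could, for instance, act only on results other than "periodic_review" and "no_action".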

Trust factor

In samples classified as "known", trust factor represents how confident we are that the sample is goodware.

It is computed by a proprietary ReversingLabs algorithm and expressed as a number from 0 to 5, where zero represents the highest confidence that a sample is goodware.

The algorithm takes into account how reputable the origin of the file is, and what the file can do (structural metadata). On top of that, modifiers are applied. For example, if the file comes from a company that would usually be assigned some trust factor N, but this company has recently had a security incident or otherwise meets other relevant criteria, the trust factor value is raised (that is, the level of trust is lowered) accordingly.
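
Because a lower number means higher trust, it is easy to read the scale backwards. The following sketch flags KNOWN samples whose trust factor is 4 or 5, the values this document describes as indicating an untrusted source. The function name known_sample_needs_review is hypothetical, and whether such samples actually warrant manual review is a policy decision this example merely assumes.

```python
def known_sample_needs_review(status: str, trust_factor: int) -> bool:
    """Return True for KNOWN samples that come from a low-trust source."""
    if not 0 <= trust_factor <= 5:
        raise ValueError(f"trust factor out of range: {trust_factor!r}")
    # 0 is the most trusted value; 4 and 5 are the least trusted.
    return status == "KNOWN" and trust_factor >= 4
```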

ReversingLabs malware naming standard

The ReversingLabs detection string consists of three main parts separated by dots. All parts of the string will always appear (all three parts are mandatory).

  • The first part of the string indicates the platform targeted by the malware. If the platform is ByteCode, Document or Script, then there will be an additional subplatform string. Platform and subplatform strings are separated by a hyphen ( - ).
  • The second part of the detection string describes the malware type.
  • The third part represents the malware family name. This string is one of the most common names for that malware.

Example

If backdoor malware is a PHP script with the family name "Jones", the detection string will look like this:

Script-PHP.Backdoor.Jones
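
A detection string in this format can be split back into its parts. The sketch below assumes that only the first two dots act as separators, so any further dots are kept as part of the family name, and that a hyphen in the first part always separates platform and subplatform. The names ThreatName and parse_threat_name are hypothetical.

```python
from typing import NamedTuple, Optional


class ThreatName(NamedTuple):
    platform: str
    subplatform: Optional[str]
    malware_type: str
    family: str


def parse_threat_name(name: str) -> ThreatName:
    """Split a detection string into platform, subplatform, type, and family."""
    parts = name.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three dot-separated parts: {name!r}")
    platform_part, malware_type, family = parts
    # Only the platform part is split on the hyphen; family names such as
    # "Trojan-downloader" keep their hyphen.
    platform, _, subplatform = platform_part.partition("-")
    return ThreatName(platform, subplatform or None, malware_type, family)
```

Applied to the example above, parse_threat_name("Script-PHP.Backdoor.Jones") gives platform "Script", subplatform "PHP", type "Backdoor", and family "Jones".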

Supported Detection String Elements

Platforms (non-exhaustive)
  • ABAP
  • Android
  • Archive
  • Audio
  • Binary
  • ByteCode
  • Document
  • Email
  • Image
  • Linux
  • MacOS
  • OS2
  • PDF
  • Script
  • SunOS
  • Text
  • Unix
  • WebAssembly
  • Win32
  • Win64
  • WinCE
Subplatforms (non-exhaustive)
  • 7ZIP
  • ActiveX
  • AR
  • AutoIt
  • BMP
  • CGI
  • CorelDraw
  • Excel
  • GZIP
  • HTML
  • INI
  • ISO
  • JAR
  • JAVA
  • JPEG
  • JS
  • Lua
  • Macro
  • Makefile
  • MSG
  • NuGet
  • OTF
  • PHP
  • PNG
  • PowerShell
  • Python
  • RTF
  • Shell
  • Shockwave
  • SWF
  • TIFF
  • Visio
  • WMF
  • Word
  • XML
Malware Types (non-exhaustive)
  • Adware
  • Backdoor
  • Certificate
  • Downloader
  • Exploit
  • Format
  • Infostealer
  • Keylogger
  • Malware
  • Phishing
  • Ransomware
  • Rogue
  • Rootkit
  • Spyware
  • Trojan
  • Virus
  • Worm

Malware Presence Single Query

This query returns information about the malware status of the requested sample.

GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}
  • hash_type
    • Specifies which hash type will be used in the request. Supported values: md5, sha1, sha256
    • Required
  • hash_value
    • Hash of the file for which the user is requesting data from the service. The value must be a valid hash of the same type specified by the hash_type parameter.
    • Required
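
Before sending a request, a client may want to check that the hash matches the declared type. The sketch below builds the request path from the template above. The helper name single_query_path is hypothetical, and accepting both uppercase and lowercase hex digits is an assumption about what the service tolerates.

```python
import re

# Length of the hexadecimal representation for each supported hash type.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def single_query_path(hash_type: str, hash_value: str) -> str:
    """Build the single query path after validating the hash."""
    if hash_type not in HASH_LENGTHS:
        raise ValueError(f"unsupported hash type: {hash_type!r}")
    pattern = r"[0-9a-fA-F]{%d}" % HASH_LENGTHS[hash_type]
    if not re.fullmatch(pattern, hash_value):
        raise ValueError(f"not a valid {hash_type} hash: {hash_value!r}")
    return f"/api/databrowser/malware_presence/query/{hash_type}/{hash_value}"
```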

Response Format

{
"rl": {
"malware_presence": {
"status": "KNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
}
}
}
}
  • status
    • Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS)
  • query_hash
    • The hash type and value used in the request. Can be md5, sha1, or sha256

Malware Presence Bulk Query

A bulk query returns the same information as a single query, but for multiple hashes in a single response. The response also contains additional fields describing ill-formatted hashes and hashes not found by the service. Up to 100 hashes can be submitted in one request.

POST /api/databrowser/malware_presence/bulk_query/{post_format}
  • post_format
    • Defines the POST payload format. Supported options are xml and json. By default, the response format matches the format defined by this parameter.
    • Required

Request Format

The following rules apply to both formats (XML and JSON).

  • hash_type value must be one of the following: md5, sha1, sha256
  • hash_value must be a valid hash of the same type specified by hash_type
{
"rl": {
"query": {
"hash_type": "hash_type",
"hashes": [
"hash_value",
"hash_value",
"...",
"hash_value"
]
}
}
}
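
Since a bulk request accepts at most 100 hashes, longer lists have to be split across several requests. The following sketch produces one JSON payload per batch in the structure shown above. The function name bulk_payloads is hypothetical, and encoding the body as UTF-8 bytes is an assumption that fits the application/octet-stream content type.

```python
import json
from typing import Iterator, List

MAX_HASHES_PER_REQUEST = 100


def bulk_payloads(hash_type: str, hashes: List[str]) -> Iterator[bytes]:
    """Yield JSON request bodies, each holding at most 100 hashes."""
    for start in range(0, len(hashes), MAX_HASHES_PER_REQUEST):
        batch = hashes[start:start + MAX_HASHES_PER_REQUEST]
        body = {"rl": {"query": {"hash_type": hash_type, "hashes": batch}}}
        yield json.dumps(body).encode("utf-8")
```

All hashes in one payload share a single hash_type, so a mixed list of MD5 and SHA1 values would need to be grouped by type before calling this helper.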

Response Format

{
"rl": {
"entries": [
{
"status": "UNKNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
}
}
],
"invalid_hashes": [
"hash_value"
]
}
}

Items in rl.entries in the bulk query response are equivalent to the rl.malware_presence element in the single query response.
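
A client will usually want to look results up by hash. The sketch below turns a decoded JSON bulk response into a mapping from queried hash to status, and returns the invalid hashes separately. The function name index_bulk_response is hypothetical, and the code assumes the response has the shape shown above.

```python
from typing import Dict, List, Tuple


def index_bulk_response(response: dict) -> Tuple[Dict[str, str], List[str]]:
    """Return ({queried hash: status}, [invalid hashes]) for a bulk response."""
    rl = response.get("rl", {})
    statuses: Dict[str, str] = {}
    for entry in rl.get("entries", []):
        # query_hash holds a single key (md5, sha1, or sha256).
        for hash_value in entry.get("query_hash", {}).values():
            statuses[hash_value] = entry.get("status")
    return statuses, list(rl.get("invalid_hashes", []))
```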

Extended Malware Presence Query Option

Single and bulk queries support an additional query parameter extended. This is an optional parameter that specifies whether extended classification metadata for the requested sample(s) should be returned in the response.

The parameter can be set to true (include extended metadata) or false (default; don't include extended metadata). If the parameter is not provided in the request, it is interpreted as false (extended metadata is not returned).

Extended metadata includes information such as trust factor and threat level values; malware type, family name, and platform; first and last seen times, and more.

It may also include information on the classification reason for each sample. This information is conveyed in the optional response field reason, which is only returned when the extended parameter is included in the request, and only if the field is calculated for the requested sample(s).

Classification reason information helps users understand the logic behind sample classification decisions, particularly in the case of goodware overrides. This is an advanced goodware whitelisting technique designed to prevent and suppress possible false positives within highly trusted software packages. If a sample has a valid and trusted certificate signature, or if it came from a trusted source, the sample and all its extracted files should be whitelisted regardless of the AV results. A common example of this is a sample with a high number of AV detections that nevertheless has the KNOWN classification status.

Single Query with Extended Option

GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}?extended=true

Bulk Query with Extended Option

POST /api/databrowser/malware_presence/bulk_query/{post_format}?extended=true

Response Format

The response format is the same for sample nodes in both single and bulk queries.

{
"rl": {
"malware_presence": {
"status": "string",
"threat_level": 0,
"scanner_percent": 0,
"scanner_match": 0,
"last_seen": "string",
"reason": "string",
"scanner_count": 0,
"query_hash": {
"sha1": "string"
},
"first_seen": "string",
"trust_factor": 0
}
}
}
  • status
    • Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS). The status designation is calculated by a proprietary ReversingLabs algorithm that adapts and improves as new information about the file and the threat is discovered.
    • The algorithm takes the following into account: Spectra Intelligence antivirus scan results, threat and trust factors, parent/child relationships, certificates, and other metadata-specific information.
  • reason
    • Optional response field that clarifies the reason why a sample received a particular classification status. This field is presently not calculated for all samples in the Spectra Intelligence system, which means it can be omitted from the response.
    • The value of this field can be one of the following:
      • analyst_sample_override - the sample was classified manually after an analysis
      • antivirus - the sample was classified by the ReversingLabs multi-scan algorithm based on aggregated antivirus scan results
      • best_certificate - the sample or its container are signed with a valid and trusted certificate
      • best_source - the sample can be obtained from a trusted source, or it was unpacked from a file originating from a trusted source
      • TC_certificate - the sample is signed with a recognized whitelisted certificate
      • sandbox - the sample was classified by the ReversingLabs Cloud Sandbox
  • scanner_count
    • Number of scanners used in the last scan
  • classification
    • Malware classification based on the latest analysis of the requested sample, calculated by the ReversingLabs algorithm. See the rl.malware_presence.classification section for details
  • scanner_percent
    • Percent of scanners that detected malware in the last scan
  • scanner_match
    • Number of scanners that detected malware in the last scan
  • threat_name
    • Detected threat name for the requested sample
  • query_hash
    • Hash value used in the request; can be md5, sha1, or sha256
  • first_seen
    • Indicates the date and time when the sample was first uploaded to the system, or when it has received a scan result for the first time
  • last_seen
    • Indicates the date and time when the sample was last uploaded to the system, or the date and time of the last scan result it has received
  • threat_level
    • Threat level value for the requested sample (0 indicates no threat; 1 is the lowest threat value, indicating the lowest severity, such as Adware; 5 is the highest threat value, e.g., Trojan)
  • trust_factor
    • Trust factor value of the sample's sources (0 is the most trusted; 5 is the least trusted)

rl.malware_presence.classification

  • family_name
    • Malware family name of the detected malware
  • subplatform
  • platform
  • is_generic
    • Returns true if select trusted third-party AV scanners assigned threat names that, based on their naming conventions, suggest the sample was identified "generically" and/or "heuristically" as having characteristics similar to known malicious software
  • cve
    • If applicable, contains the Common Vulnerabilities and Exposures (CVE) identifier. Additional fields are is_candidate (returns true if the sample is a CVE candidate or false if the sample is an official CVE list entry), CVE number, and CVE year
  • type

Response Examples

Sample with Malware Presence Status MALICIOUS

{
"rl": {
"malware_presence": {
"status": "MALICIOUS",
"scanner_count": 40,
"classification": {
"platform": "Win32",
"type": "Trojan",
"is_generic": false,
"family_name": "Nsis"
},
"scanner_percent": 82.5,
"scanner_match": 33,
"threat_name": "Win32.Trojan.Nsis",
"query_hash": {
"sha1": "1d412db0ac58dd8d8bfae8b18c7b355bd14dab2f"
},
"first_seen": "2012-07-12T00:05:00",
"threat_level": 5,
"trust_factor": 5,
"last_seen": "2017-08-09T11:40:00"
}
}
}

Sample with Malware Presence Status SUSPICIOUS

{
"rl": {
"malware_presence": {
"status": "SUSPICIOUS",
"scanner_count": 29,
"classification": {
"subplatform": "JS",
"platform": "Script",
"is_generic": false,
"family_name": "Trojan-downloader"
},
"scanner_percent": 6.8965516090393066,
"scanner_match": 2,
"threat_name": "Script-JS.Malware.Trojan-downloader",
"query_hash": {
"sha1": "e0c58164cde263904abc07915d7c9d2722e3806c"
},
"first_seen": "2017-09-07T01:14:31",
"threat_level": 0,
"trust_factor": 5,
"last_seen": "2017-09-28T09:08:00"
}
}
}

Sample with Malware Presence Status KNOWN

{
"rl": {
"malware_presence": {
"status": "KNOWN",
"reason": "TC_certificate",
"scanner_count": 25,
"scanner_percent": 76,
"scanner_match": 19,
"query_hash": {
"sha1": "5ff74f670c8a68557bf36955d3e4e2353266d607"
},
"first_seen": "2016-06-30T01:17:42",
"threat_level": 0,
"trust_factor": 0,
"last_seen": "2019-10-17T10:03:23"
}
}
}

Sample with Malware Presence Status UNKNOWN

{
"rl": {
"malware_presence": {
"status": "UNKNOWN",
"query_hash": {
"sha1": "e4e8c856c1524ff22b87b29d605ab8fdb1007298"
}
}
}
}
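
As the examples above show, several extended fields are optional: an UNKNOWN sample carries only status and query_hash, and reason, threat_name, and classification appear only for some samples. The sketch below builds a one-line summary of the rl.malware_presence element while tolerating missing fields. The function name summarize and the output layout are hypothetical.

```python
def summarize(entry: dict) -> str:
    """Build a one-line summary of a malware_presence element."""
    parts = [entry["status"]]
    # threat_name and reason are optional, so only add them when present.
    if "threat_name" in entry:
        parts.append(entry["threat_name"])
    if "reason" in entry:
        parts.append(f"reason={entry['reason']}")
    count = entry.get("scanner_count")
    if count:
        parts.append(f"{entry.get('scanner_match', 0)}/{count} scanners")
    return " | ".join(parts)
```

For the KNOWN example above, this yields "KNOWN | reason=TC_certificate | 19/25 scanners", which makes the goodware override visible at a glance: most scanners detected the sample, yet the whitelisted certificate determined the status.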

Malware Presence Query with Available Hashes

Single and bulk queries support an additional request parameter show_hashes, which can be set to either true or false. This optional parameter can be used in combination with the extended parameter in the same request.

When set to true, the show_hashes parameter specifies that MD5, SHA1, and SHA256 hashes should be returned in the response for the requested sample, in addition to the rest of the Malware Presence information. If the parameter is not provided in the request, it is interpreted as false (additional hashes are not returned).

GET /api/databrowser/malware_presence/query/{hash_type}/{hash_value}?show_hashes=true
POST /api/databrowser/malware_presence/bulk_query/{post_format}?show_hashes=true

Response Format

Entries in rl.entries in the bulk query response are equivalent to the rl.malware_presence element in the single query response.

If both the extended and show_hashes parameters are set to true, the response contains both the extended malware presence information and the MD5, SHA1, and SHA256 hashes.

{
"rl": {
"malware_presence": {
"status": "KNOWN",
"query_hash": {
"sha1|md5|sha256": "hash_value"
},
"md5": "md5 hash_value",
"sha1": "sha1 hash_value",
"sha256": "sha256 hash_value"
}
}
}
  • status
    • Malware presence status designation (UNKNOWN, KNOWN, SUSPICIOUS, or MALICIOUS)
  • query_hash
    • The hash type and value used in the request; can be MD5, SHA1, or SHA256
  • md5, sha1, sha256
    • Respective hashes for the requested sample(s).

Examples

Single query - changing the response format

/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=json
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=xml

Single query - changing the hash type

/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b
/api/databrowser/malware_presence/query/sha256/10dbb2b27208c5566d326b47950657bf6b3c9a59e302598a128ad7125d5fb4fd
/api/databrowser/malware_presence/query/md5/ca083f61113e1fb8f539ecfa7c725fc8

Single query - changing the optional flags

/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?extended=true
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?show_hashes=true
/api/databrowser/malware_presence/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?extended=true&show_hashes=true

Bulk query - changing the POST format

/api/databrowser/malware_presence/bulk_query/json
/api/databrowser/malware_presence/bulk_query/xml

With bulk queries, the response format will default to the request format. If you want a different response format, add the format query field:

/api/databrowser/malware_presence/bulk_query/xml?format=json
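
The optional query fields shown in the examples above (extended, show_hashes, and format) can be appended programmatically. This is a minimal sketch using the Python standard library. The function name with_options is hypothetical, and omitting a flag instead of sending it as false relies on the documented behavior that a missing parameter is interpreted as false.

```python
from typing import Optional
from urllib.parse import urlencode


def with_options(path: str,
                 extended: bool = False,
                 show_hashes: bool = False,
                 response_format: Optional[str] = None) -> str:
    """Append the optional query fields to a request path."""
    params = {}
    if extended:
        params["extended"] = "true"
    if show_hashes:
        params["show_hashes"] = "true"
    if response_format is not None:
        params["format"] = response_format
    # Leave the path untouched when no optional field is requested.
    return f"{path}?{urlencode(params)}" if params else path
```

For instance, with_options("/api/databrowser/malware_presence/bulk_query/xml", response_format="json") reproduces the last URL shown above.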

Bulk query - JSON POST format

/api/databrowser/malware_presence/bulk_query/json
{
"rl": {
"query": {
"hash_type": "md5",
"hashes": [
"4bb64c06b1a72539e6d3476891daf17b",
"6353de8f339b7dcc6b25356f5fbffa4e",
"59cb087c4c3d251474ded9e156964d5d",
"6c2eb9d1a094d362bcc7631f2551f5a4",
"a82c781ce0f43d06c28fe5fc8ebb1ca9",
"920f5ba4d08f251541c5419ea5fb3fb3"
]
}
}
}
{
"rl": {
"query": {
"hash_type": "sha1",
"hashes": [
"13e40f38427a55952359bfc5f52b5841ce1b46ba",
"831fc2b9075b0a490adf15d2c5452e01e6feaa17",
"42b05278a6f2ee006072af8830c103eab2ce045f"
]
}
}
}
{
"rl": {
"query": {
"hashes": [
"0001f757f6b9523707462066100aa543",
"000202ed4a0fb4c95e68824bc7777a78",
"00026f63fd5a2600b73a866d7ef08b6f"
],
"hash_type": "md5"
}
}
}