Functionally similar files (analytics) (TCA-0321)

The ReversingLabs Hashing Algorithm (RHA) identifies code similarity between unknown samples and previously seen malware samples. Files have the same RHA1 hash when they are functionally similar.

This API provides real-time statistics (counters) for malicious, suspicious and known samples that are functionally similar to the provided SHA1 hash at the requested precision level.

Precision level represents the degree to which a file is functionally similar to another file. The following precision levels are supported: 25% and 50% for PE files, and 25% for MachO and ELF executable files. A higher precision level matches fewer files, but those files are more functionally similar.
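As a quick reference, the supported file formats and precision levels can be kept in a small lookup table. This is a minimal Python sketch. It assumes the rha1_type spellings used in this document's request examples (pe01, pe02, elf01, macho01). The RHA1_TYPES dictionary and the rha1_type_for helper are hypothetical names and are not part of any ReversingLabs library.

```python
# Hypothetical lookup table: rha1_type -> (file format, precision level in percent),
# using the rha1_type spellings from this document's request examples.
RHA1_TYPES = {
    "pe01": ("PE", 25),
    "pe02": ("PE", 50),
    "elf01": ("ELF", 25),
    "macho01": ("MachO", 25),
}


def rha1_type_for(file_format: str, precision: int) -> str:
    """Return the rha1_type for a file format and precision level.

    Raises ValueError for combinations this document does not list.
    """
    for rha1_type, (fmt, level) in RHA1_TYPES.items():
        if fmt.lower() == file_format.lower() and level == precision:
            return rha1_type
    raise ValueError(f"unsupported combination: {file_format} at {precision}%")


print(rha1_type_for("pe", 50))   # pe02
print(rha1_type_for("ELF", 25))  # elf01
```

Only PE has a 50% level, so a request such as rha1_type_for("elf", 50) raises ValueError instead of silently falling back to 25%.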

Through this API, users can submit the SHA1 hash of an executable file and get counters of functionally similar files, mapped to their classification. With the 'extended' option, the response also includes additional metadata for the submitted SHA1 hash. This covers file hashes (SHA1, MD5, SHA256), classification, and reputation information (threat level, threat name, malware family, malware type...).

General Info about Requests/Responses

  • All requests support the format parameter with two possible values: xml or json.
  • The default response format is xml, except for bulk queries, where the response format is the same as the post_format.
  • POST requests must include the HTTP header Content-Type: application/octet-stream.
  • All bulk queries accept a POST payload in XML or JSON (described below).
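A minimal sketch of a POST request that carries the required Content-Type header, using only Python's standard library. The host name is a placeholder. Authentication is deliberately left out because this section does not describe it. build_post_request is a hypothetical helper.

```python
import urllib.request


def build_post_request(url: str, payload: bytes) -> urllib.request.Request:
    """Build a POST request with the Content-Type header this API requires."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


# Placeholder host; credentials are not handled in this sketch
req = build_post_request(
    "https://data.example.com/api/rha1/analytics/v1/query/json",
    b'{"rl": {"query": {"rha1_type": "pe01", "hashes": []}}}',
)
print(req.get_method())                # POST
print(req.get_header("Content-type"))  # application/octet-stream
```

urllib normalizes header names with str.capitalize(), which is why the lookup uses "Content-type".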

RHA1 Analytics Single Query

This query returns statistics (counters) of files that are functionally similar to the submitted SHA1 hash at the given precision level, grouped by classification.

The response will contain a count of malicious, suspicious and known files, as well as the total number of files. If the extended option is selected, the response will contain additional metadata for the provided SHA1 hash.

Request Format

GET /api/rha1/analytics/v1/query/{rha1_type}/{sha1}[?format=xml|json][&extended=false|true]
  • rha1_type
    • Required
    • The RHA1 precision level, which represents the degree to which a file is functionally similar to another file. A higher precision level matches fewer files, but those files are more functionally similar. Possible values:
    • pe01, elf01, macho01 - 25% precision level
    • pe02 - 50% precision level
  • sha1
    • Must be a valid SHA1 hash
    • Required
  • format
    • Specifies the response format; possible values are xml (default) and json
    • Optional
  • extended
    • Selects the data set: true returns the extended data set, false (default) returns the non-extended data set
    • Optional
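The request parameters above can be assembled into a request path with client-side validation. This sketch assumes a SHA1 hash is valid when it is exactly 40 hexadecimal characters. single_query_path is a hypothetical helper, and the host and credentials are left to the caller.

```python
import re
from urllib.parse import urlencode

# Assumption: a valid SHA1 hash is exactly 40 hexadecimal characters
SHA1_RE = re.compile(r"[0-9a-fA-F]{40}")
RHA1_TYPES = ("pe01", "pe02", "elf01", "macho01")


def single_query_path(rha1_type: str, sha1: str, fmt: str = "json",
                      extended: bool = False) -> str:
    """Build the path and query string for an RHA1 analytics single query."""
    if rha1_type not in RHA1_TYPES:
        raise ValueError(f"unsupported rha1_type: {rha1_type}")
    if not SHA1_RE.fullmatch(sha1):
        raise ValueError(f"not a valid SHA1 hash: {sha1}")
    if fmt not in ("xml", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"format": fmt, "extended": "true" if extended else "false"})
    return f"/api/rha1/analytics/v1/query/{rha1_type}/{sha1}?{query}"


print(single_query_path("pe01", "25cefc6fc048fbac9eccf2d65af736dd12e2c62a", extended=True))
# /api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a?format=json&extended=true
```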

Response Format

If the requested hash does not exist in the database or does not match the requested RHA1 precision level, the server responds with status code 404 and the message "Requested data was not found".
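A missing hash is reported as a 404 rather than as an empty result, so a client may want to treat that status as "no data". This is a minimal sketch that assumes JSON responses (format=json). The opener is injectable so the logic can be exercised offline. fetch_counters is a hypothetical name, and authentication is not handled.

```python
import io
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_counters(url: str, open_fn: Callable = urllib.request.urlopen) -> Optional[dict]:
    """Return the parsed JSON response, or None when the server answers 404."""
    try:
        with open_fn(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # "Requested data was not found": unknown hash, or no match
            # at the requested RHA1 precision level
            return None
        raise  # any other HTTP error is passed on to the caller


# Offline demo with stand-in openers instead of a real HTTP call
def not_found(url):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


def found(url):
    return io.BytesIO(b'{"rl": {"rha1_counters": {}}}')


print(fetch_counters("https://data.example.com/placeholder", not_found))  # None
print(fetch_counters("https://data.example.com/placeholder", found))      # {'rl': {'rha1_counters': {}}}
```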

All possible response fields are described below. Example responses can be found later in the document.

rl (default root):
  • rha1_counters
    • Parent node for the classification counters

rl > rha1_counters:
  • sha1
    • The requested SHA1 hash
  • rha1_type
    • Type of RHA1 hash (the precision level of the RHA1)
  • rha1_first_seen
    • Time when the RHA1 bucket was first seen
  • rha1_last_seen
    • Time when the RHA1 bucket was last seen
  • sample_counters
    • Counters grouped by classification
  • sample_metadata
    • Sample metadata fields, returned if extended is set to true

rl > rha1_counters > sample_counters:
  • malicious
    • Number of malicious samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
  • suspicious
    • Number of suspicious samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
  • known
    • Number of known samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
  • total
    • Sum of all counters

Returned response fields depend on the selected data set. If the extended option is set to true, the following fields will be returned for the requested SHA1 sample hash. Empty fields are not included in the response.

rl > rha1_counters > sample_metadata:
  • md5
    • MD5 hash of the sample
  • sha256
    • SHA256 hash of the sample
  • sample_available
    • Sample's download availability status
  • classification
    • Current Malware Presence status designation. The status designation is calculated by a proprietary ReversingLabs algorithm that adapts and improves as new information about the file and the threat is discovered. The algorithm takes the following into account: Spectra Core static analysis results, ReversingLabs RHA1 similarity hashing algorithm, complex malformation rules, YARA rules, ReversingLabs signatures, Spectra Intelligence antivirus scan results, threat and trust factors, parent/child relationships, certificates, and other metadata-specific information
  • threat_level
    • Threat level of the sample
  • trust_factor
    • Trust factor of the sample
  • threat_name
    • Detected threat name
  • malware_family
    • Malware family for malicious and suspicious samples
  • malware_type
    • Malware type for malicious and suspicious samples
  • sample_type
    • Sample type
  • sample_size
    • Sample size (in bytes)
  • first_seen
    • Time when the sample was first seen in the ReversingLabs system (UTC)
  • last_seen
    • Time when the sample was last seen in the ReversingLabs system (UTC)

Response Examples

{
  "rl": {
    "rha1_counters": {
      "sha1": "4f21ad6781bfbee641ecd075fc079b5e6145f03f",
      "rha1_type": "pe01",
      "rha1_first_seen": "2011-05-21T09:36:00",
      "rha1_last_seen": "2012-12-17T00:00:00",
      "sample_counters": {
        "known": 4116,
        "malicious": 138,
        "suspicious": 182,
        "total": 4436
      },
      "sample_metadata": {
        "md5": "6ede26d354a3956573291361f754ea10",
        ...
      }
    }
  }
}
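The example response above can be processed as follows. This sketch reuses the example's values and keeps only md5 from the elided metadata. It reads optional metadata with .get() because empty fields are omitted from responses. It also checks that total equals the sum of the other counters, as the description of total states. The summarize function and its output keys are hypothetical.

```python
import json

# The single-query example response from above; the elided metadata
# fields ("...") are left out and only md5 is kept.
RAW = """{"rl": {"rha1_counters": {
  "sha1": "4f21ad6781bfbee641ecd075fc079b5e6145f03f",
  "rha1_type": "pe01",
  "rha1_first_seen": "2011-05-21T09:36:00",
  "rha1_last_seen": "2012-12-17T00:00:00",
  "sample_counters": {"known": 4116, "malicious": 138, "suspicious": 182, "total": 4436},
  "sample_metadata": {"md5": "6ede26d354a3956573291361f754ea10"}
}}}"""


def summarize(body: str) -> dict:
    """Extract counters and a few optional metadata fields from a JSON response."""
    node = json.loads(body)["rl"]["rha1_counters"]
    counters = node["sample_counters"]
    # Empty fields are omitted from responses, so optional data is read with .get()
    meta = node.get("sample_metadata", {})
    total = counters.get("total", 0)
    parts = sum(counters.get(k, 0) for k in ("known", "malicious", "suspicious"))
    return {
        "rha1_type": node["rha1_type"],
        "total_matches_sum": parts == total,
        "malicious_share": counters.get("malicious", 0) / total if total else 0.0,
        "md5": meta.get("md5"),
        "classification": meta.get("classification"),
    }


summary = summarize(RAW)
print(summary["total_matches_sum"])  # True
print(summary["classification"])     # None (not present in this trimmed example)
```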

RHA1 Analytics Bulk Query

This query retrieves the same data as the single query, but for multiple hashes in a single response. It is more network-efficient than sending multiple single queries. A bulk query accepts a maximum of 1000 hashes per request.

POST /api/rha1/analytics/v1/query/{post_format}
  • post_format
    • Defines the POST payload format. Supported options are xml and json
    • Required

POST Body Request Format

  • rha1_type
    • Required
    • The RHA1 precision level, which represents the degree to which a file is functionally similar to another file. A higher precision level matches fewer files, but those files are more functionally similar. Possible values:
    • pe01, elf01, macho01 - 25% precision level
    • pe02 - 50% precision level
  • hashes
    • List of SHA1 hashes to query (the hash_value entries in the template below); each must be a valid SHA1 hash
    • Required; at most 1000 hashes per request
  • response_format
    • Specifies the response format; possible values are xml (default) and json
    • Optional
  • extended
    • Selects the data set: true returns the extended data set, false (default) returns the non-extended data set
    • Optional
{
  "rl": {
    "query": {
      "rha1_type": "(pe01|pe02|elf01|macho01)",
      "response_format": "(xml|json)",
      "extended": "(true|false)",
      "hashes": [
        "hash_value",
        "hash_value",
        ...,
        "hash_value"
      ]
    }
  }
}
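A bulk query accepts at most 1000 hashes, so a longer list must be split across several requests. This is a minimal sketch that builds JSON bodies in the shape of the template above. Note that extended is sent as a string, as in the template. bulk_bodies is a hypothetical helper. Each body would be POSTed to /api/rha1/analytics/v1/query/json with the Content-Type header described in the general request information.

```python
import json
from typing import Iterator, List

MAX_HASHES_PER_REQUEST = 1000  # bulk query limit stated in this document


def bulk_bodies(hashes: List[str], rha1_type: str = "pe01",
                response_format: str = "json", extended: bool = False) -> Iterator[bytes]:
    """Yield JSON request bodies that each hold at most 1000 hashes."""
    for start in range(0, len(hashes), MAX_HASHES_PER_REQUEST):
        body = {"rl": {"query": {
            "rha1_type": rha1_type,
            "response_format": response_format,
            "extended": "true" if extended else "false",  # string, as in the template
            "hashes": hashes[start:start + MAX_HASHES_PER_REQUEST],
        }}}
        yield json.dumps(body).encode("utf-8")


# 2500 placeholder hashes (40 hex characters each) split into three requests
placeholders = [f"{i:040x}" for i in range(2500)]
sizes = [len(json.loads(b)["rl"]["query"]["hashes"]) for b in bulk_bodies(placeholders)]
print(sizes)  # [1000, 1000, 500]
```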

Response format

rl (default root):
  • entries
    • Contains a sequence of counter data items
  • invalid_hashes
    • List of malformed hashes from the request
  • unknown_hashes
    • List of hashes from the request that were not found in the database

rl > entries:
  • item
    • Each <item> is equivalent to the rha1_counters node from the RHA1 Analytics single query response
{
  "rl": {
    "entries": [
      { ... },
      { ... }
    ],
    "invalid_hashes": [
      "zzze764af2be3711ce1147fa762562188b57dae83z"
    ],
    "unknown_hashes": [
      "1147fa762562188b57dae8cf3e764af2be3711ce"
    ]
  }
}
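A client can split a bulk response into its three parts. This sketch assumes each entry carries a sha1 field, as the single-query rha1_counters node does. The sample entry reuses values from the single-query example, and the invalid and unknown hashes come from the response example above. split_bulk_response and prevalidate are hypothetical helpers. Pre-validating hashes locally can anticipate which ones the server reports in invalid_hashes, assuming the server treats a valid SHA1 as 40 hexadecimal characters.

```python
import json
import re
from typing import Dict, List, Tuple

SHA1_RE = re.compile(r"[0-9a-fA-F]{40}")

# Sample bulk response: the entry reuses values from the single-query example
RAW = """{"rl": {
  "entries": [
    {"sha1": "4f21ad6781bfbee641ecd075fc079b5e6145f03f", "rha1_type": "pe01",
     "sample_counters": {"known": 4116, "malicious": 138, "suspicious": 182, "total": 4436}}
  ],
  "invalid_hashes": ["zzze764af2be3711ce1147fa762562188b57dae83z"],
  "unknown_hashes": ["1147fa762562188b57dae8cf3e764af2be3711ce"]
}}"""


def prevalidate(hashes: List[str]) -> List[str]:
    """Return the hashes that are not exactly 40 hexadecimal characters."""
    return [h for h in hashes if not SHA1_RE.fullmatch(h)]


def split_bulk_response(body: str) -> Tuple[Dict[str, dict], List[str], List[str]]:
    """Split a JSON bulk response into (entries keyed by sha1, invalid, unknown)."""
    rl = json.loads(body)["rl"]
    # Assumption: every entry has a sha1 field, like the single-query node
    found = {entry["sha1"]: entry for entry in rl.get("entries", [])}
    return found, rl.get("invalid_hashes", []), rl.get("unknown_hashes", [])


found, invalid, unknown = split_bulk_response(RAW)
print(sorted(found))  # ['4f21ad6781bfbee641ecd075fc079b5e6145f03f']
print(invalid == prevalidate(["zzze764af2be3711ce1147fa762562188b57dae83z",
                              "1147fa762562188b57dae8cf3e764af2be3711ce"]))  # True
```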

Examples

Format query field

These examples request different response formats:

[GET]

/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a?format=json
/api/rha1/analytics/v1/query/pe01/eb7f7f9b7744d0f28ab82f8272fbe643e56a070c?format=xml

RHA1 type

These examples request different RHA1 types:

[GET]

/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a
/api/rha1/analytics/v1/query/pe02/00e710e430e5558f06b1b3c85d518fbcfa118eef
/api/rha1/analytics/v1/query/elf01/9c489fcaee9abedd736b474d7f9076d23ea2bb9b
/api/rha1/analytics/v1/query/macho01/b7a1143b04aa93e0b236968c6cded7807aa5e89c

Extended query field

These examples demonstrate the use of the extended field:

[GET]

/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a?extended=true
/api/rha1/analytics/v1/query/elf01/9c489fcaee9abedd736b474d7f9076d23ea2bb9b?extended=false

Bulk query

These examples use different POST formats:

[POST]

/api/rha1/analytics/v1/query/xml
/api/rha1/analytics/v1/query/json

Bulk JSON POST

These examples send the request body to the same endpoint (post_format json) while varying the options in the body, such as the requested RHA1 type:

/api/rha1/analytics/v1/query/json

Example 1

{
  "rl": {
    "query": {
      "rha1_type": "pe01",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}

Example 2

{
  "rl": {
    "query": {
      "rha1_type": "pe02",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}

Example 3

{
  "rl": {
    "query": {
      "rha1_type": "pe01",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}