Functionally similar files (analytics) (TCA-0321)
The ReversingLabs Hashing Algorithm (RHA) identifies code similarity between unknown samples and previously seen malware samples. Files have the same RHA1 hash when they are functionally similar.
This API provides real-time statistics (counters) for malicious, suspicious and known samples that are functionally similar to the provided SHA1 hash at the requested precision level.
Precision level represents the degree to which a file is functionally similar to another file. The supported precision levels are 25% and 50% for PE files, and 25% for MachO and ELF executable files. A higher precision level will match fewer files, but the matched files will be more functionally similar.
Through this API, users can submit the SHA1 hash of an executable file and get counters of functionally similar files mapped to their classification. With the 'extended' option, additional sample metadata is returned for the submitted SHA1 hash: file hashes (SHA1, MD5, SHA256), classification, and reputation information (threat level, threat name, malware family, malware type, and so on).
General Info about Requests/Responses
- All requests support the format parameter with two possible values: xml or json.
- Default response format is xml, except for bulk queries where the response format is the same as the post_format.
- POST requests must contain an HTTP header field Content-Type: application/octet-stream.
- All bulk queries accept a POST payload in XML or JSON (described below).
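The header requirement above can be illustrated with a minimal sketch that prepares a bulk POST request without sending it. The host in `BASE_URL` and the helper name `build_bulk_request` are hypothetical placeholders, and authentication is left out because it is not covered in this section.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the host of your own deployment.
BASE_URL = "https://api.example.invalid"


def build_bulk_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a bulk POST request with a JSON payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/rha1/analytics/v1/query/json",
        data=body,
        # Header required for POST requests according to the list above.
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


request = build_bulk_request(
    {
        "rl": {
            "query": {
                "rha1_type": "pe01",
                "hashes": ["70d1d32e783dac03a7000616e63207f17b996809"],
            }
        }
    }
)
print(request.get_method(), request.full_url)
# urllib stores header names capitalized as "Content-type".
print(request.get_header("Content-type"))
```

The request object can then be passed to `urllib.request.urlopen` once the real host and credentials are known.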
RHA1 Analytics Single Query
This query returns statistics (counters) of files mapped to their classifications that are functionally similar to the submitted file at the given precision level.
The response will contain a count of malicious, suspicious and known files, as well as the total number of files. If the extended option is selected, the response will contain additional metadata for the provided SHA1 hash.
Request Format
GET /api/rha1/analytics/v1/query/{rha1_type}/{sha1}[?format=xml|json][&extended=false|true]
rha1_type
- A measure of the RHA1 precision level. It represents the degree to which a file is functionally similar to another file. A higher precision level will match fewer files, but the matched files will be more functionally similar. Supported values:
- pe01, elf01, macho01 - 25% precision level
- pe02 - 50% precision level
- Required
sha1
- Must be a valid SHA1 hash
- Required
format
- Specifies the response format, with possible values being xml (default) and json
- Optional
extended
- Selects the returned data set: true returns the extended data set, false (default) returns the non-extended data set
- Optional
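As an illustration of the request format above, the following sketch assembles a single-query URL and validates the parameters locally before any request is made. `BASE_URL` is a hypothetical placeholder host and `build_single_query_url` is an illustrative helper, not part of the API; the Mach-O type is spelled `macho01` here, as in the example URLs later in this document.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical placeholder host; substitute the host of your own deployment.
BASE_URL = "https://api.example.invalid"
RHA1_TYPES = {"pe01", "pe02", "elf01", "macho01"}
SHA1_PATTERN = re.compile(r"[0-9a-fA-F]{40}")


def build_single_query_url(
    rha1_type: str,
    sha1: str,
    response_format: Optional[str] = None,
    extended: Optional[bool] = None,
) -> str:
    """Build the GET URL for a single RHA1 analytics query."""
    if rha1_type not in RHA1_TYPES:
        raise ValueError(f"unsupported rha1_type: {rha1_type!r}")
    if not SHA1_PATTERN.fullmatch(sha1):
        raise ValueError("sha1 must be 40 hexadecimal characters")
    params = {}
    if response_format is not None:
        if response_format not in ("xml", "json"):
            raise ValueError("format must be 'xml' or 'json'")
        params["format"] = response_format
    if extended is not None:
        params["extended"] = "true" if extended else "false"
    url = f"{BASE_URL}/api/rha1/analytics/v1/query/{rha1_type}/{sha1}"
    # Optional parameters are appended only when the caller sets them.
    return f"{url}?{urlencode(params)}" if params else url


url = build_single_query_url(
    "pe01", "25cefc6fc048fbac9eccf2d65af736dd12e2c62a", "json", True
)
print(url)
```

Omitting both optional arguments yields the bare path, which leaves the server defaults (xml, non-extended) in effect.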
Response Format
If the requested hash doesn't exist in the database records or doesn't match the requested RHA1 precision level, the server will respond with the status response code 404 and the message "Requested data was not found".
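A client will usually want to treat the 404 case as "no data" rather than as a failure. The sketch below shows one way to do that with the Python standard library, assuming the URL requests `format=json`. The function `fetch_counters` and the stand-in opener are illustrative only; the stand-in lets the example run without network access.

```python
import io
import json
import urllib.error
import urllib.request
from email.message import Message


def fetch_counters(url, opener=urllib.request.urlopen):
    """Return the parsed JSON response, or None if the server answers 404."""
    try:
        with opener(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        if error.code == 404:
            # Hash unknown, or no match at the requested precision level.
            return None
        raise  # any other HTTP error is left for the caller to handle


# Offline demonstration: a stand-in opener that simulates the 404 response.
def _not_found(url):
    raise urllib.error.HTTPError(
        url, 404, "Requested data was not found", Message(), io.BytesIO()
    )


missing = fetch_counters("https://api.example.invalid/placeholder", opener=_not_found)
print(missing)
```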
All possible response fields are described below, grouped by their location in the response. Example responses can be found further in the document.
Fields in rl (default rl root):
rha1_counters
- Parent node for classification counters
Fields in rl > rha1_counters:
sha1
- The requested SHA1 hash
rha1_type
- Type of RHA1 hash (the precision level of the RHA1)
rha1_first_seen
- RHA1 bucket first seen
rha1_last_seen
- RHA1 bucket last seen
sample_counters
- List of counters
sample_metadata
- Sample metadata fields, returned only if extended is set to true
Fields in rl > rha1_counters > sample_counters:
malicious
- Number of malicious samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
suspicious
- Number of suspicious samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
known
- Number of known samples that are functionally similar to the provided SHA1 hash at the requested precision level (RHA1 type)
total
- Sum of all counters
Returned response fields depend on the selected data set. If the extended option is set to true, the following fields will be returned for the requested SHA1 sample hash. Empty fields are not included in the response.
Fields in rl > rha1_counters > sample_metadata:
md5
- MD5 hash of the sample
sha256
- SHA256 hash of the sample
sample_available
- Sample's download availability status
classification
- Current Malware Presence status designation. The status designation is calculated by a proprietary ReversingLabs algorithm that adapts and improves as new information about the file and the threat is discovered. The algorithm takes the following into account: Spectra Core static analysis results, ReversingLabs RHA1 similarity hashing algorithm, complex malformation rules, YARA rules, ReversingLabs signatures, Spectra Intelligence antivirus scan results, threat and trust factors, parent/child relationships, certificates, and other metadata-specific information
threat_level
- Threat level of the sample
trust_factor
- Trust factor of the sample
threat_name
- Detected threat name
malware_family
- Malware family for malicious and suspicious samples
malware_type
- Malware type for malicious and suspicious samples
sample_type
- Sample type
sample_size
- Sample size (in bytes)
first_seen
- Time when the sample was first seen in the ReversingLabs system (UTC)
last_seen
- Time when the sample was last seen in the ReversingLabs system (UTC)
Response Examples
{
  "rl": {
    "rha1_counters": {
      "sha1": "4f21ad6781bfbee641ecd075fc079b5e6145f03f",
      "rha1_type": "pe01",
      "rha1_first_seen": "2011-05-21T09:36:00",
      "rha1_last_seen": "2012-12-17T00:00:00",
      "sample_counters": {
        "known": 4116,
        "malicious": 138,
        "suspicious": 182,
        "total": 4436
      },
      "sample_metadata": {
        "md5": "6ede26d354a3956573291361f754ea10",
        ...
      }
    }
  }
}
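To show how the counters in such a response might be consumed, the sketch below parses the example above and derives the share of malicious samples. The helper `summarize_counters` is illustrative only, and the metadata fields elided in the example are simply left out so that the text is valid JSON.

```python
import json

# The example response from above; elided metadata fields are omitted.
response_text = """
{
  "rl": {
    "rha1_counters": {
      "sha1": "4f21ad6781bfbee641ecd075fc079b5e6145f03f",
      "rha1_type": "pe01",
      "rha1_first_seen": "2011-05-21T09:36:00",
      "rha1_last_seen": "2012-12-17T00:00:00",
      "sample_counters": {
        "known": 4116,
        "malicious": 138,
        "suspicious": 182,
        "total": 4436
      },
      "sample_metadata": {
        "md5": "6ede26d354a3956573291361f754ea10"
      }
    }
  }
}
"""


def summarize_counters(response: dict) -> dict:
    """Check that total is the sum of the counters and compute the malicious share."""
    counters = response["rl"]["rha1_counters"]["sample_counters"]
    total = counters["total"]
    parts = {key: counters.get(key, 0) for key in ("malicious", "suspicious", "known")}
    return {
        "total": total,
        "consistent": sum(parts.values()) == total,
        "malicious_share": parts["malicious"] / total if total else 0.0,
    }


summary = summarize_counters(json.loads(response_text))
print(summary["total"], summary["consistent"])  # 4436 True
print(f"{summary['malicious_share']:.2%} of similar samples are malicious")
```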
RHA1 Analytics Bulk Query
This query retrieves the same data as the single query, but for multiple hashes within a single response. It is more network-efficient than multiple single queries. A bulk query accepts a maximum of 1000 hashes in a single request.
POST /api/rha1/analytics/v1/query/{post_format}
post_format
- Defines the POST payload format. Supported options are xml and json
- Required
POST Body Request Format
rha1_type
- A measure of the RHA1 precision level. It represents the degree to which a file is functionally similar to another file. A higher precision level will match fewer files, but the matched files will be more functionally similar. Supported values:
- pe01, elf01, macho01 - 25% precision level
- pe02 - 50% precision level
- Required
hash_value
- Must be a valid SHA1 hash
- Required
response_format
- Specifies the response format, with possible values being xml (default) and json
- Optional
extended
- Selects the returned data set: true returns the extended data set, false (default) returns the non-extended data set
- Optional
{
  "rl": {
    "query": {
      "rha1_type": "(pe01|pe02|elf01|macho01)",
      "response_format": "(xml|json)",
      "extended": "(true|false)",
      "hashes": [
        "hash_value",
        "hash_value",
        ...,
        "hash_value"
      ]
    }
  }
}
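Because a bulk query accepts at most 1000 hashes, longer hash lists need to be split across several requests. The sketch below builds one JSON request body per batch, following the structure above. The function `build_bulk_payloads` is an illustrative helper, and the generated hashes are placeholders rather than real samples.

```python
from typing import Iterator, Sequence

MAX_HASHES_PER_REQUEST = 1000  # limit stated for the bulk query


def build_bulk_payloads(
    hashes: Sequence[str],
    rha1_type: str,
    response_format: str = "xml",
    extended: bool = False,
) -> Iterator[dict]:
    """Yield one request body per batch of at most 1000 hashes."""
    for start in range(0, len(hashes), MAX_HASHES_PER_REQUEST):
        yield {
            "rl": {
                "query": {
                    "rha1_type": rha1_type,
                    "response_format": response_format,
                    # The documented examples pass this flag as a string.
                    "extended": "true" if extended else "false",
                    "hashes": list(hashes[start:start + MAX_HASHES_PER_REQUEST]),
                }
            }
        }


# 2500 placeholder hashes (not real samples) are split into three bodies.
placeholder_hashes = [f"{i:040x}" for i in range(2500)]
payloads = list(build_bulk_payloads(placeholder_hashes, "pe01", "json"))
print([len(p["rl"]["query"]["hashes"]) for p in payloads])  # [1000, 1000, 500]
```

Each yielded dictionary can be serialized with `json.dumps` and posted to the bulk endpoint with post_format set to json.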
Response Format
Fields in rl (default rl root):
entries
- Contains a sequence of counters data
invalid_hashes
- List of ill-formatted hashes from the request
unknown_hashes
- List of hashes from the request that were not found in the database
Fields in rl > entries:
item
- The <item> is equivalent to the table node from the RHA1 Analytics single query
{
  "rl": {
    "entries": [
      { ... },
      { ... }
    ],
    "invalid_hashes": [
      "zzze764af2be3711ce1147fa762562188b57dae83z"
    ],
    "unknown_hashes": [
      "1147fa762562188b57dae8cf3e764af2be3711ce"
    ]
  }
}
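A bulk response mixes results with rejected and unknown hashes, so it can help to separate the three groups before further processing. The sketch below is one possible approach. Since the item contents are elided above, it assumes that each entry carries the sha1 and sample_counters fields of the single-query node, either directly or wrapped in rha1_counters. The counter values in the sample are made up for illustration.

```python
def split_bulk_response(response: dict) -> dict:
    """Group a bulk JSON response into found, invalid, and unknown hashes."""
    rl = response.get("rl", {})
    counters_by_sha1 = {}
    for item in rl.get("entries", []):
        # Assumption: an item either is the counters node or wraps it.
        node = item.get("rha1_counters", item)
        counters_by_sha1[node["sha1"]] = node.get("sample_counters", {})
    return {
        "found": counters_by_sha1,
        "invalid": list(rl.get("invalid_hashes", [])),
        "unknown": list(rl.get("unknown_hashes", [])),
    }


# Sample input; the entry and its counter values are made up for illustration.
example_response = {
    "rl": {
        "entries": [
            {
                "sha1": "70d1d32e783dac03a7000616e63207f17b996809",
                "rha1_type": "pe01",
                "sample_counters": {
                    "known": 1, "malicious": 2, "suspicious": 0, "total": 3
                },
            }
        ],
        "invalid_hashes": ["zzze764af2be3711ce1147fa762562188b57dae83z"],
        "unknown_hashes": ["1147fa762562188b57dae8cf3e764af2be3711ce"],
    }
}

result = split_bulk_response(example_response)
print(sorted(result["found"]), result["invalid"], result["unknown"])
```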
Examples
Format query field
These examples request different response formats:
[GET]
/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a?format=json
/api/rha1/analytics/v1/query/pe01/eb7f7f9b7744d0f28ab82f8272fbe643e56a070c?format=xml
RHA1 type field
These examples request different RHA1 types:
[GET]
/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a
/api/rha1/analytics/v1/query/pe02/00e710e430e5558f06b1b3c85d518fbcfa118eef
/api/rha1/analytics/v1/query/elf01/9c489fcaee9abedd736b474d7f9076d23ea2bb9b
/api/rha1/analytics/v1/query/macho01/b7a1143b04aa93e0b236968c6cded7807aa5e89c
Extended query field
These examples demonstrate the use of the extended field:
[GET]
/api/rha1/analytics/v1/query/pe01/25cefc6fc048fbac9eccf2d65af736dd12e2c62a?extended=true
/api/rha1/analytics/v1/query/elf01/9c489fcaee9abedd736b474d7f9076d23ea2bb9b?extended=false
Bulk query
These examples use different POST formats:
[POST]
/api/rha1/analytics/v1/query/xml
/api/rha1/analytics/v1/query/json
Bulk JSON POST
These examples show bulk JSON request bodies. The rha1_type, response_format and extended options can vary between requests, while the post_format stays the same:
/api/rha1/analytics/v1/query/json
Example 1
{
  "rl": {
    "query": {
      "rha1_type": "pe01",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}
Example 2
{
  "rl": {
    "query": {
      "rha1_type": "pe02",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}
Example 3
{
  "rl": {
    "query": {
      "rha1_type": "pe01",
      "response_format": "xml",
      "extended": "false",
      "hashes": [
        "70d1d32e783dac03a7000616e63207f17b996809",
        "0dd1bc46e96d41591294e8c13c6eb7f6212be2ed",
        "57dafb1b1f5c0e0217fc90e25355386cb087886f",
        "eb7f7f9b7744d0f28ab82f8272fbe643e56a070c",
        "70d1d32e783dac03a7000616e63207f17b996807"
      ]
    }
  }
}