File analysis (TCA-0104)

This service provides analysis data on requested hashes. Depending on the hash, the response can include relevant portions of static analysis, dynamic analysis information, AV scan information, file sources and any related IP/domain information. The service supports single and bulk hash queries. For additional context on indicators of compromise and actively exploited vulnerabilities, refer to CISA's Known Exploited Vulnerabilities Catalog.

This API is rate limited to 100 requests per second.
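
A client can stay under this limit by spacing its calls. The following is a minimal sketch, not part of the service: the Throttle class and its parameters are hypothetical names, and it simply keeps consecutive requests at least 1/100 of a second apart.

```python
import time


class Throttle:
    """Keeps consecutive calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=100, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock          # injectable so the class can be tested
        self._sleep = sleep
        self._next_allowed = 0.0     # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the next slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling wait() before every request is enough for a single-threaded client; a multi-threaded client would additionally need a lock around wait().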

General Info about Requests/Responses

All requests support the format query field, which accepts two options: xml or json.
The default response format is xml, except for bulk queries, where the default format is the same as the post_format.
All bulk queries accept a POST payload in the same format (described below).
A bulk request must not contain more than one hundred (100) hashes.
POST requests must set the HTTP header field Content-Type to application/octet-stream.
The dynamic analysis report is only available with additional permissions.

Single File Analysis

This query returns a response containing analysis results for the requested hash. Depending on the hash, the response can contain full file format information, static analysis data, historic multi-AV scan records, extracted malware configuration network and mutex data, dynamic network analysis data, file sources, parent information, certificate chain data, and certificate signer information.

Request

GET /api/databrowser/rldata/query/{hash_type}/{hash_value}[?format=xml|json]

hash_type accepts these options: md5, sha1, sha256
hash_value must be a valid hash defined by the hash_type parameter
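
The request rules above can be sketched as a small URL builder. This is an illustration only: BASE_URL is a placeholder host and single_query_url is a hypothetical helper name. The length check relies on the standard hex digest lengths of MD5, SHA1 and SHA256.

```python
import re
from urllib.parse import urlencode

# Hex digest lengths for the hash types the endpoint accepts.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}

# Placeholder host (assumption); replace with the base URL of your deployment.
BASE_URL = "https://api.example.invalid"


def single_query_url(hash_type, hash_value, fmt=None, base_url=BASE_URL):
    """Build the GET URL for a single file analysis query."""
    if hash_type not in HASH_LENGTHS:
        raise ValueError(f"unsupported hash_type: {hash_type!r}")
    pattern = r"[0-9a-fA-F]{%d}" % HASH_LENGTHS[hash_type]
    if not re.fullmatch(pattern, hash_value):
        raise ValueError(f"{hash_value!r} is not a valid {hash_type} hash")
    if fmt not in (None, "xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    url = f"{base_url}/api/databrowser/rldata/query/{hash_type}/{hash_value}"
    if fmt:
        url += "?" + urlencode({"format": fmt})
    return url
```

Validating the hash locally avoids spending a rate-limited request on a value the service would reject anyway.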

Response

Response code 404 is returned with a message "Requested data was not found" when the requested hash isn't found in the database.
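
Because an unknown hash is reported through the status code rather than an empty body, a client will usually want to treat 404 as "no data" instead of as a failure. A minimal sketch, assuming the request was made with format=json and that a successful lookup returns status 200 (the section does not list other status codes); parse_single_response is a hypothetical helper name.

```python
import json


def parse_single_response(status, body):
    """Return the rl.sample object, or None when the hash is not in the database."""
    if status == 404:
        # "Requested data was not found": a normal outcome, not an error.
        return None
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return json.loads(body)["rl"]["sample"]
```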

{
  "rl": {
    "sample": {
      "sha1": "string",
      "md5": "string",
      "sha256": "string",
      "sha384": "string",
      "sha512": "string",
      "ripemd160": "string",
      "ssdeep": "string",
      "tlsh": "string",
      "sample_size": 0,
      "password_uploaded": "bool",
      "relationships": {
        "container_sample_sha1": [
          "string"
        ],
        "parent_sample_sha1": [
          "string"
        ],
        "child_sample_sha1": [
          "string"
        ],
        "more_child_samples_available": "bool"
      },
      "analysis": {
        "entries": [
          {
            "record_time": "string",
            "analysis_type": "string",
            "analysis_version": "string",
            "tc_report": {
              "info": {
                "file": {
                  "file_type": "string",
                  "file_subtype": "string"
                },
                "identification": {
                  "name": "string"
                }
              },
              "interesting_strings": [
                {
                  "category": "string",
                  "values": [
                    {}
                  ]
                }
              ],
              "story": "string"
            }
          }
        ]
      },
      "xref": {
        "entries": [
          {
            "record_time": "string",
            "scanners": [
              {
                "name": "string",
                "result": "string"
              }
            ],
            "info": {
              "scanners": [
                {
                  "name": "string",
                  "version": "string",
                  "timestamp": "string"
                }
              ]
            }
          }
        ],
        "first_seen": "string",
        "last_seen": "string",
        "sample_type": "string"
      },
      "sources": {
        "entries": [
          {
            "record_time": "string",
            "tag": "string",
            "properties": [
              {
                "name": "string",
                "value": "string"
              }
            ],
            "domain": {
              "name": "string"
            }
          }
        ]
      },
      "computer_vision_analysis": {
        "entries": [
          {
            "analysis_time": "string",
            "results": [
              {
                "format": "string",
                "category": "string",
                "value": "string"
              }
            ]
          }
        ]
      }
    }
  }
}

rl.sample

sha1
- SHA1 value of the requested sample. This field is mandatory and can be used as a primary key.
hashes
- List of hashes computed for the requested sample, e.g. MD5, SHA256, SHA384, SHA512, SSDEEP, Authenticode hashes (PE_SHA1, PE_SHA256), imphash…
tlsh
- A hash value which can be used for file similarity comparisons, helping to identify similar, nearly identical, or modified files. TLSH hash is not calculated if SSDEEP is enabled and the file size is either smaller than its minimum size (1024 bytes) or larger than its maximum size (734003200 bytes).
sample_size
- Logical file size of the requested sample (in bytes).
password_uploaded
- Indicates that a password was uploaded for unpacking this sample. If a password wasn't uploaded, this field is omitted.
relationships
- Lists of parent, container and child samples. Each list is limited in length.
analysis
- Different analysis results for the requested sample. Currently only Spectra Core is supported for this section.
xref
- Collection of AV scanning reports for the requested sample. The API returns up to the 20 most recent reports. Every item represents a single AV scanning report.
sources
- A sequence of source items indicating where the sample came from. These can be different domains, specific uploaders, etc. One sample can have multiple sources. The service returns a list of 10 oldest sources, sorted by timestamp in descending order.
dynamic_analysis
- If the sample has been detonated in the sandbox, the section displays network data and mutexes observed.
computer_vision_analysis
- Processes samples to detect URIs in images and extract information from QR codes. If a sample has already been analyzed, this section displays the results.
- For email samples, this section includes QR and OCR-decoded URIs found in the email itself and propagated from all child samples.
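
Since several of these sections are optional or omitted, code that reads rl.sample should not assume every key is present. The sketch below is one way to do that; summarize_sample and the keys of the returned dictionary are hypothetical names chosen for illustration.

```python
def summarize_sample(sample):
    """Collect a few top-level facts from an rl.sample object, tolerating omissions."""
    return {
        # sha1 is documented as mandatory, so a plain lookup is safe here.
        "sha1": sample["sha1"],
        "size_bytes": sample.get("sample_size"),
        # password_uploaded is omitted when no password was uploaded.
        "password_uploaded": "password_uploaded" in sample,
        # dynamic_analysis only appears for detonated samples.
        "has_dynamic_analysis": "dynamic_analysis" in sample,
        "xref_reports": len(sample.get("xref", {}).get("entries", [])),
        "source_count": len(sample.get("sources", {}).get("entries", [])),
    }
```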

rl.sample.relationships

container_sample_sha1
- List of container hashes. Container is the top-level archive/sample that was uploaded to the system and also contains the requested sample. The response will contain up to 5 container sample hashes, sorted by SHA1 hash.
parent_sample_sha1
- List of samples that directly contain the requested sample. The requested hash is a child to the hashes in this list. The list of children has been acquired by file extraction. The response will contain up to 5 parent sample hashes, sorted by SHA1 hash.
child_sample_sha1
- A list of samples contained within the requested sample.
more_child_samples_available
- Indicator that more than 10 child samples are available. If false, this field is omitted.
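
A short sketch of reading this section, with the omitted more_child_samples_available field treated as false. The function name and the keys of the returned dictionary are hypothetical.

```python
def relationship_summary(sample):
    """Return the relationship lists of an rl.sample object with safe defaults."""
    rel = sample.get("relationships", {})
    return {
        "containers": rel.get("container_sample_sha1", []),
        "parents": rel.get("parent_sample_sha1", []),
        "children": rel.get("child_sample_sha1", []),
        # The flag is omitted when false, so a missing key means "not truncated".
        "children_truncated": bool(rel.get("more_child_samples_available", False)),
    }
```

When children_truncated is true, the children list is only a partial view of the extracted files.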

rl.sample.analysis.entries[]

record_time
- Timestamp indicating when the analysis was executed.
analysis_type
- Label indicating the type of analysis (for example, TC_REPORT indicates Spectra Core static analysis).
analysis_version
- Version of the tool used for analysis.
tc_report
- Available metadata for the requested sample obtained as a result of Spectra Core static analysis.

rl.sample.analysis.entries[].tc_report

info
- Contains information about file_type, file_subtype, validation, identification, and package (if applicable).
metadata
- Relevant information about a sample extracted through static analysis. The fields returned in this section depend on the sample type.
interesting_strings
- When Spectra Core encounters files with strings that contain interesting information, it will tag those files with tags corresponding to the type of string. Strings are considered interesting if they contain information related to various network resources and addresses. Interesting strings are usually found in binary files, documents and text files. Every item inside this object belongs to a category and contains values.
- The category field classifies the extracted string based on its type. It includes common network resource identifiers and address formats such as URIs, IPs, and protocols. Supported values: domain, mailto, ipv4, ipv6, http, https, ftp, nfs, file, gopher, ldap, prospero, net.pipe, net.tcp, news, nntp, telnet, uuid, and wais.
- The values fields contain the extracted string values.
story
- The story section contains a summarized natural language description of the file's behavior and properties.
indicators
- List of indicators. They are the main static analysis technique Spectra Core uses to describe the analyzed content behavior. Since indicators are human-readable, their purpose is to simplify the code analysis process by converting complex code patterns into descriptions of their intent. Simply put, indicators make it possible to describe the file behavior through descriptions like “Downloads a file”, “Encrypts or encodes data in memory using Windows API”, “Enumerates currently available disk drives”, etc. While some indicators can only be found in certain formats, most are universal and therefore generally applicable.
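
To illustrate the interesting_strings layout described above, the following sketch merges all items into one mapping from category to values. The shape of each value object is not specified in this section, so the values are passed through untouched; strings_by_category is a hypothetical helper name.

```python
from collections import defaultdict


def strings_by_category(tc_report):
    """Map each interesting-string category (e.g. "ipv4") to its extracted values."""
    grouped = defaultdict(list)
    for item in tc_report.get("interesting_strings", []):
        # Several items may share a category, so extend rather than overwrite.
        grouped[item["category"]].extend(item.get("values", []))
    return dict(grouped)
```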

rl.sample.analysis.entries[].tc_report.info

file_type
- Type of the sample, as detected by Spectra Core (for example, Document, Image, PE…).
file_subtype
- Subtype of the sample, as detected by Spectra Core (for example, TIFF, Clojure, HTML…).
proposed_filename
- Suggested filename extracted from other metadata if the original filename is not available.
identification
- Identification name of the sample. Identification is not generated for all file types and subtypes; it is an optional field in the static analysis report.
validation
- Indicates whether a sample is considered valid by Spectra Core at the time it was processed. Optionally contains a list of validation descriptions, for example bad checksum, bad signature, invalid certificate, expired certificate, blacklisted certificate, whitelisted certificate, malformed certificate, self-signed certificate. Explanations of these values can be found in the table "Sample validation explanations" below.
package
- Sample metadata related to malware configurations, such as C&C servers.

rl.sample.analysis.entries[].tc_report.metadata

application
- Refers to all PE files. Metadata that is extracted statically from these formats can vary depending on the type of application. This section can contain information such as dos_header, file_header, optional_header, sections, imports, resources...
certificate
- If the requested sample contains certificate-related metadata, this section provides detailed information about certificates, such as subject, issuer, serial_number, thumbprint, extensions...
attack
- If the requested sample contains MITRE ATT&CK metadata, this section provides a list of attack tactics and, for each tactic, a list of its attack techniques and subtechniques. Attack tactics, techniques and subtechniques have information about their id, description and name, while techniques and subtechniques can additionally contain indicators with priority, category, relevance, etc.
software_packages
- List of packages. This file type is reserved for programming language-specific packages; softwarePackage is metadata specific to the package file type.

rl.sample.analysis.entries[].tc_report.metadata.software_packages

name
- Package name
description
- Package summary
authors
- List of package authors
release_dependencies
- List of release dependency packages with the field name, representing the name of the package
develop_dependencies
- List of development dependency packages with the field name, representing the name of the package

rl.sample.analysis.entries[].tc_report.indicators

priority
- Priority is a number used to sort the indicators from least to most interesting (0 to 10) within a category. It is determined by the severity of the action described by the indicator. More dangerous indicators are prioritized higher within their category.
category
- Category to which the indicator belongs.
description
- Short description of the capability referenced by the detected indicator.
id
- Unique ID of an indicator.
relevance
- Contribution to the final classification.
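
Because priority is only meaningful within a category, a natural way to present indicators is to group them by category first and then order each group from most to least interesting. A minimal sketch of that ordering follows; indicators_by_category is a hypothetical helper name.

```python
from collections import defaultdict


def indicators_by_category(indicators):
    """Group indicators by category, highest priority first within each group."""
    grouped = defaultdict(list)
    for indicator in indicators:
        grouped[indicator["category"]].append(indicator)
    return {
        category: sorted(items, key=lambda i: i["priority"], reverse=True)
        for category, items in grouped.items()
    }
```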

rl.sample.xref.entries[]

record_time
- Timestamp when the multi-AV report was generated.
scanners
- List of results per scanner for this report. Every item is a scanner-specific scanning report, containing the scanner name and scanner detection string.
info
- Information about the scanners used for this scanning report. Contains a sequence of scanners ordered by name, with name, version, and timestamp indicating when the scanner was updated.
first_seen
- The date and time of the oldest multi-AV report created for the requested sample.
last_seen
- The date and time when the sample last received impactful changes to its analysis or classification data.
sample_type
- Detected sample type for the requested sample.
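
As an illustration of how these fields fit together, the sketch below picks the most recent multi-AV report and lists the scanners that produced a detection string. It rests on two assumptions that this section does not confirm: record_time strings sort chronologically when compared as text (true for ISO 8601 timestamps), and an empty result means the scanner reported nothing. latest_scan_summary is a hypothetical helper name.

```python
def latest_scan_summary(xref):
    """Summarize the newest entry of an rl.sample.xref object, or None if empty."""
    entries = xref.get("entries", [])
    if not entries:
        return None
    # Assumption: timestamps are text-sortable (e.g. ISO 8601).
    latest = max(entries, key=lambda entry: entry["record_time"])
    scanners = latest.get("scanners", [])
    # Assumption: a non-empty result string is a detection.
    detections = {s["name"]: s["result"] for s in scanners if s.get("result")}
    return {
        "record_time": latest["record_time"],
        "scanner_count": len(scanners),
        "detections": detections,
    }
```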

rl.sample.sources.entries[]

record_time
- Timestamp indicating when the requested sample was uploaded.
tag
- Uploader designation indicating the origin of the sample; can be reversing_labs, external_feed, microsoft_whitelist or nsrl.
properties
- Various sample-related information listed as name/value properties in free format.
domain
- If there is a domain linked to the sample source, it will be described within this element.

rl.sample.dynamic_analysis.entries.dynamic_analysis_report

analysed_on
- Timestamp indicating when the dynamic analysis report for the requested sample was generated.
version
- Numerical label indicating the version of the tool used for dynamic analysis.
summary
- Contains mutexes detected during dynamic analysis (if any).
network
- Contains information about dns_requests, domains, tcp_destinations, udp_destinations, http_requests detected during dynamic analysis.

rl.sample.computer_vision_analysis.entries[]

analysis_time
- The timestamp indicating when the computer vision analysis was performed.
results
- A list of elements detected by the computer vision analysis. Each element includes the following fields: format, category, and value.
- For email samples, results include URIs extracted directly from the email and URIs propagated from all child samples for up to 1,000 children, providing information for QR and OCR-decoded URIs within an email and its attachments.
- The format field specifies the data format from which the string was extracted. Supported values: OCR for URIs extracted from images and PDFs, and QR_CODE for strings extracted from QR codes.
- The category field classifies the extracted string based on its type. It includes common network resource identifiers and address formats such as URIs, IPs, and protocols. Supported values: domain, mailto, ipv4, ipv6, http, https, ftp, nfs, file, gopher, ldap, prospero, net.pipe, net.tcp, news, nntp, telnet, uuid, and wais.
- The value field contains the extracted string value from the computer vision analysis.
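
The format and category fields make it easy to narrow the results down, for example to URLs decoded from QR codes. A minimal sketch, with cv_values as a hypothetical helper name:

```python
def cv_values(sample, fmt=None, category=None):
    """Return computer vision result values, optionally filtered by format/category."""
    values = []
    entries = sample.get("computer_vision_analysis", {}).get("entries", [])
    for entry in entries:
        for result in entry.get("results", []):
            if fmt is not None and result.get("format") != fmt:
                continue
            if category is not None and result.get("category") != category:
                continue
            values.append(result["value"])
    return values
```

For instance, cv_values(sample, fmt="QR_CODE", category="https") keeps only HTTPS URIs that were read from QR codes.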

Sample validation explanations

Name	Description
Valid certificate	Any certificate with an intact digital certificate chain that confirms the integrity of the signed file. The hash within Signer Info matches the hash of the file contents.
Invalid certificate	Any certificate with an intact digital certificate chain, but for which the certificate chain validation failed due to other reasons (e.g. because of attribute checks). Without a valid digital certificate chain, the integrity of the signed file cannot be validated.
Bad checksum	The integrity of the signed file could not be verified, because the hash within Signer Info does not match the hash of the file contents.
Bad signature	Any certificate with an intact digital certificate chain, but for which the signature validation failed. Without a valid signature, the integrity of the signed file cannot be validated.
Malformed certificate	Any certificate that does not have an intact digital certificate chain. The digital certificate is corrupted or incomplete, but that doesn't mean the file is also corrupted. Without a valid digital certificate chain, the integrity of the signed file cannot be validated.
Self-signed certificate	A self-signed certificate is a certificate that is signed by the same entity whose identity it certifies. In other words, this is a certificate that is used to sign a file, and doesn't have a CA that issued it. If CA information is present, but not found within the Spectra Core certificate store, the CA will be considered plausible and files signed with it will be declared valid (they will not be considered self-signed).
Impersonation attempt	Any self-signed certificate is a candidate for an impersonation check. Impersonation means that the signer is trying to misrepresent itself as a trusted party, where "trusted party" is defined by the certificate whitelist. Any self-signed certificate that matches the common name of another certificate on the Spectra Core whitelist is marked as an impersonation attempt.
Expired certificate	Any certificate with signing time information is checked for expiration. When the time on the local machine indicates that the certificate has passed its "valid to" date and time, the certificate is considered expired. The "Expired" certificate status is merely informative, and expired certificates cannot influence certificate classification.
Untrusted certificate	Any valid certificate for which the digital certificate chain cannot be validated against a trusted CA. Untrusted certificates are valid certificates, but they cannot be whitelisted because their chain does not terminate with a CA in the Spectra Core certificate store.
Other	security catalog, revoked certificate, revoked certificate unspecified, revoked certificate key compromise, revoked certificate ca compromise, revoked certificate affiliation changed, revoked certificate superseded, revoked certificate cessation of operation, revoked certificate hold, revoked certificate remove from crl, revoked certificate privilege withdrawn, revoked certificate aa compromise, signed after revocation, blacklisted certificate, whitelisted certificate, bad certificate timestamp

Bulk File Analysis

This query retrieves the same data as the single query, but for multiple hashes within a single response. It is more network-efficient compared to several consecutive single queries.

Request

POST /api/databrowser/rldata/bulk_query/{post_format}

post_format is a required parameter that defines the POST payload format
post_format accepts the options xml and json

The following definitions are valid for both formats:

hash_type value must be one of the following options: md5, sha1, sha256
hash_value must be a valid hash defined by hash_type

Request body

{
  "rl": {
    "query": {
      "hash_type": "hash_type",
      "hashes": [
        "hash_value",
        "hash_value",
        "hash_value"
      ]
    }
  }
}
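
Since a bulk request may carry at most 100 hashes, a longer list has to be split across several requests. The sketch below prepares (but does not send) one JSON POST per batch with the required Content-Type header. BASE_URL is a placeholder host and bulk_requests is a hypothetical helper name; credentials are left out because this section does not describe authentication.

```python
import json
from urllib.request import Request

MAX_HASHES_PER_REQUEST = 100

# Placeholder host (assumption); replace with the base URL of your deployment.
BASE_URL = "https://api.example.invalid"


def bulk_requests(hash_type, hashes, base_url=BASE_URL):
    """Yield one unsent POST request per batch of up to 100 hashes."""
    url = f"{base_url}/api/databrowser/rldata/bulk_query/json"
    for start in range(0, len(hashes), MAX_HASHES_PER_REQUEST):
        batch = hashes[start:start + MAX_HASHES_PER_REQUEST]
        payload = {"rl": {"query": {"hash_type": hash_type, "hashes": batch}}}
        yield Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
```

Each yielded object can be passed to urllib.request.urlopen once the host and any required credentials have been filled in.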

Response

{
  "rl": {
    "entries": [
      {}
    ],
    "invalid_hashes": [
      "string"
    ],
    "unknown_hashes": [
      "string"
    ]
  }
}

invalid_hashes
- A list of ill-formatted hashes provided in the request.
unknown_hashes
- A list of hashes from the request that were not found in the database or don't have multi-AV data.
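
One way to use these two lists is to label every submitted hash with its outcome. The sketch below does that by elimination and assumes the service echoes hashes exactly as they were submitted (same letter case); classify_bulk_hashes is a hypothetical helper name.

```python
def classify_bulk_hashes(requested, response):
    """Label each requested hash as "invalid", "unknown" or "found"."""
    rl = response.get("rl", {})
    invalid = set(rl.get("invalid_hashes", []))
    unknown = set(rl.get("unknown_hashes", []))
    status = {}
    for hash_value in requested:
        if hash_value in invalid:
            status[hash_value] = "invalid"
        elif hash_value in unknown:
            status[hash_value] = "unknown"
        else:
            # Neither rejected nor missing, so data should be in rl.entries.
            status[hash_value] = "found"
    return status
```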

Examples

Single query - changing the response format

/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=json
/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=xml

Single query - changing the hash type

/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b
/api/databrowser/rldata/query/sha256/10dbb2b27208c5566d326b47950657bf6b3c9a59e302598a128ad7125d5fb4fd

Bulk query - changing the POST format

/api/databrowser/rldata/bulk_query/xml
/api/databrowser/rldata/bulk_query/json

Bulk query - JSON POST format

/api/databrowser/rldata/bulk_query/json

{
  "rl": {
    "query": {
      "hash_type": "md5",
      "hashes": [
        "4bb64c06b1a72539e6d3476891daf17b",
        "6353de8f339b7dcc6b25356f5fbffa4e",
        "59cb087c4c3d251474ded9e156964d5d",
        "6c2eb9d1a094d362bcc7631f2551f5a4",
        "a82c781ce0f43d06c28fe5fc8ebb1ca9",
        "920f5ba4d08f251541c5419ea5fb3fb3"
      ]
    }
  }
}

{
  "rl": {
    "query": {
      "hash_type": "sha1",
      "hashes": [
        "13e40f38427a55952359bfc5f52b5841ce1b46ba",
        "831fc2b9075b0a490adf15d2c5452e01e6feaa17",
        "42b05278a6f2ee006072af8830c103eab2ce045f"
      ]
    }
  }
}
