File analysis

This service provides analysis data on requested hashes. Depending on the hash, the response can include relevant portions of static analysis, dynamic analysis information, AV scan information, file sources and any related IP/domain information. The service supports single and bulk hash queries.

This API is rate limited to 100 requests per second.

General Info about Requests/Responses

  • All requests support the format query parameter, which accepts two options: xml or json.
  • The default response format is xml, except for bulk queries, where the default format matches the post_format parameter.
  • All bulk query rules accept a POST payload of the same format (described below).
  • The number of hashes in a bulk request must not exceed one hundred (100).
  • POST requests must set the HTTP header field Content-Type: application/octet-stream.
  • The dynamic analysis report is available only with additional permissions.
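
As a minimal client-side sketch of these conventions, assuming Python with the requests library (the host URL and the authentication scheme below are placeholders, not defined by this service description):

import requests

# Placeholder values -- this document does not define the service host or the
# authentication scheme, so treat both as assumptions.
HOST = "https://data.example.com"
AUTH = ("username", "password")

def single_query(hash_type, hash_value, response_format="json"):
    # Every request accepts a `format` query parameter (xml or json);
    # xml is the default for single queries.
    url = f"{HOST}/api/databrowser/rldata/query/{hash_type}/{hash_value}"
    return requests.get(url, params={"format": response_format}, auth=AUTH)

def bulk_query(payload_bytes, post_format="json"):
    # Bulk queries must be POSTed with Content-Type: application/octet-stream
    # and must not contain more than 100 hashes.
    url = f"{HOST}/api/databrowser/rldata/bulk_query/{post_format}"
    headers = {"Content-Type": "application/octet-stream"}
    return requests.post(url, data=payload_bytes, headers=headers, auth=AUTH)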

Single File Analysis

This query returns a response containing analysis results for the requested hash. Depending on the hash, the response can contain data such as full file format information, static analysis results, historic multi-AV scan records, extracted malware configuration network and mutex data, dynamic network analysis data, file sources, parent information, certificate chain data, and certificate signer information.

GET /api/databrowser/rldata/query/{hash_type}/{hash_value}[?format=xml|json]
  • hash_type accepts these options: md5, sha1, sha256
  • hash_value must be a valid hash defined by the hash_type parameter
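
For illustration, a hedged sketch of a single query in Python with the requests library, using one of the example hashes from the Examples section; the host and credentials are placeholders, and the 404 case is described under Response format below:

import requests

url = (
    "https://data.example.com/api/databrowser/rldata/query/sha1/"
    "a25b6db2d363eaa31de348399aedc5651280b52b"
)
response = requests.get(url, params={"format": "json"}, auth=("username", "password"))

if response.status_code == 404:
    # The requested hash is not in the database ("Requested data was not found").
    print("Sample not found")
else:
    response.raise_for_status()
    sample = response.json()["rl"]["sample"]
    print(sample["sha1"], sample["sample_size"])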

Response format

Response code 404 is returned with a message "Requested data was not found" when the requested hash isn't found in the database.

{
  "rl": {
    "sample": {
      "sha1": "string",
      "md5": "string",
      "sha256": "string",
      "sha384": "string",
      "sha512": "string",
      "ripemd160": "string",
      "ssdeep": "string",
      "sample_size": 0,
      "relationships": {
        "container_sample_sha1": [
          {}
        ]
      },
      "analysis": {
        "entries": [
          {
            "record_time": "string",
            "analysis_type": "string",
            "analysis_version": "string",
            "tc_report": {
              "info": {
                "file": {
                  "file_type": "string",
                  "file_subtype": "string"
                },
                "identification": {
                  "name": "string"
                }
              },
              "interesting_strings": [
                {
                  "category": "string",
                  "values": [
                    {}
                  ]
                }
              ],
              "story": "string"
            }
          }
        ]
      },
      "xref": {
        "entries": [
          {
            "record_time": "string",
            "scanners": [
              {
                "name": "string",
                "result": "string"
              }
            ],
            "info": {
              "scanners": [
                {
                  "name": "string",
                  "version": "string",
                  "timestamp": "string"
                }
              ]
            }
          }
        ],
        "first_seen": "string",
        "last_seen": "string",
        "sample_type": "string"
      },
      "sources": {
        "entries": [
          {
            "record_time": "string",
            "tag": "string",
            "properties": [
              {
                "name": "string",
                "value": "string"
              }
            ],
            "domain": {
              "name": "string"
            }
          }
        ]
      }
    }
  }
}
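
To illustrate how this structure can be traversed, the sketch below pulls a few commonly used fields out of a parsed JSON response. Field availability varies by sample, so every lookup is guarded; treating the last analysis entry as the most recent one is an assumption, not a documented guarantee.

def summarize_sample(report):
    # `report` is the parsed JSON body of a single query response.
    sample = report.get("rl", {}).get("sample", {})
    summary = {
        "sha1": sample.get("sha1"),
        "sample_size": sample.get("sample_size"),
    }
    entries = sample.get("analysis", {}).get("entries", [])
    if entries:
        # Assumption: the last entry is the most recent analysis record.
        tc_report = entries[-1].get("tc_report", {})
        info = tc_report.get("info", {})
        summary["file_type"] = info.get("file", {}).get("file_type")
        summary["file_subtype"] = info.get("file", {}).get("file_subtype")
        summary["identification"] = info.get("identification", {}).get("name")
        summary["story"] = tc_report.get("story")
    return summary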

rl.sample

  • sha1
    • SHA1 value of the requested sample. This field is mandatory and can be used as a primary key.
  • hashes
    • List of hashes computed for the requested sample, e.g. MD5, SHA256, SHA384, SHA512, SSDEEP, Authenticode hashes (PE_SHA1, PE_SHA256), imphash…
  • sample_size
    • Logical file size of the requested sample (in bytes).
  • relationships
    • Parent and container sample lists.
  • analysis
    • Different analysis results for the requested sample. Currently only Spectra Core is supported for this section.
  • xref
    • Collection of AV scanning reports for the requested sample. The API returns up to the 20 most recent reports. Every item represents a single AV scanning report.
  • sources
    • A sequence of source items indicating where the sample came from. These can be different domains, specific uploaders, etc. One sample can have multiple sources. The service returns the 10 oldest sources, sorted by timestamp in descending order.
  • dynamic_analysis
    • If the sample has been detonated in the sandbox, the section displays network data and mutexes observed.

rl.sample.relationships

  • container_sample_sha1
    • List of container hashes. Container is the top-level archive/sample that was uploaded to the system and also contains the requested sample. The response will contain up to 5 container sample hashes, sorted by SHA1 hash.
  • parent_sample_sha1
    • List of samples that directly contain the requested sample. The requested hash is a child of the hashes in this list; the parent-child relationship is established through file extraction. The response will contain up to 5 parent sample hashes, sorted by SHA1 hash.

rl.sample.analysis.entries.item

  • record_time
    • Timestamp indicating when the analysis was executed.
  • analysis_type
    • Label indicating the type of analysis (for example, TC_REPORT indicates Spectra Core static analysis).
  • analysis_version
    • Version of the tool used for analysis.
  • tc_report
    • Available metadata for the requested sample obtained as a result of Spectra Core static analysis.
  • indicators
    • List of indicators. They are the main static analysis technique Spectra Core uses to describe the analyzed content behavior. Since indicators are human-readable, their purpose is to simplify the code analysis process by converting complex code patterns into descriptions of their intent. Simply put, indicators make it possible to describe the file behavior through descriptions like “Downloads a file”, “Encrypts or encodes data in memory using Windows API”, “Enumerates currently available disk drives”, etc. While some indicators can only be found in certain formats, most are universal and therefore generally applicable.

rl.sample.analysis.entries.item.tc_report

  • info
    • Contains information about file_type, file_subtype, validation, identification, and package (if applicable).
  • metadata
    • Relevant information about a sample extracted through static analysis. The fields returned in this section depend on the sample type.
  • interesting_strings
    • When Spectra Core encounters files with strings that contain interesting information, it will tag those files with tags corresponding to the type of string. Strings are considered interesting if they contain information related to various network resources and addresses (for example, HTTP, HTTPS, FTP or SSH). Interesting strings are usually found in binary files, documents and text files. Every item inside this object belongs to a category (type of interesting string, e.g. ftp, http(s), mailto, ipv4…) and contains values, which correspond to actual strings.
  • story
    • The story section contains a summarized natural language description of the file's behavior and properties.

rl.sample.analysis.entries.item.tc_report.info

  • file_type
    • Type of the sample, as detected by Spectra Core (for example, Document, Image, PE…).
  • file_subtype
    • Subtype of the sample, as detected by Spectra Core (for example, TIFF, Clojure, HTML…).
  • proposed_filename
    • Suggested filename extracted from other metadata if the original filename is not available.
  • identification
    • Identification name of the sample. Identification is not generated for all file types and subtypes; it is an optional field in the static analysis report.
  • validation
    • Indicates whether a sample is considered valid by Spectra Core at the time it was processed. Optionally contains a list of validation descriptions, for example bad checksum, bad signature, invalid certificate, expired certificate, blacklisted certificate, whitelisted certificate, malformed certificate, self-signed certificate. Explanations of these values can be found in the table "Sample validation explanations" below.
  • package
    • Sample metadata related to malware configurations, such as C&C servers.

rl.sample.analysis.entries[].tc_report.metadata

  • application
    • Refers to all PE files. Metadata that is extracted statically from these formats can vary depending on the type of application. This section can contain information such as dos_header, file_header, optional_header, sections, imports, resources...
  • certificate
    • If the requested sample contains certificate-related metadata, this section provides detailed information about certificates, such as subject, issuer, serial_number, thumbprint, extensions...
  • attack
    • If the requested sample contains MITRE ATT&CK metadata, this section provides a list of attack tactics and, for each tactic, a list of its attack techniques and subtechniques. Attack tactics, techniques, and subtechniques include their id, description, and name, while techniques and subtechniques can additionally contain indicators with priority, category, relevance, etc.
  • software_packages
    • List of packages. The package file type is reserved for all programming language-specific packages; softwarePackage is metadata specific to that file type.

rl.sample.analysis.entries[].tc_report.software_packages

  • name
    • Package name
  • description
    • Package summary
  • authors
    • List of package authors
  • release_dependencies
    • List of release dependency packages with the field name, representing the name of the package
  • develop_dependencies
    • List of development dependency packages with the field name, representing the name of the package

rl.sample.analysis.entries[].indicators

  • priority
    • Priority is a number used to sort the indicators from least to most interesting (0 to 10) within a category. It is determined by the severity of the action described by the indicator. More dangerous indicators are prioritized higher within their category.
  • category
    • Category to which the indicator belongs.
  • description
    • Short description of the capability referenced by the detected indicator.
  • id
    • Unique ID of an indicator.
  • relevance
    • Contribution to the final classification.

rl.sample.xref.entries[]

  • record_time
    • Timestamp when the multi-AV report was generated.
  • scanners
    • List of results per scanner for this report. Every item is a scanner-specific scanning report, containing the scanner name and scanner detection string.
  • info
    • Information about the scanners used for this scanning report. Contains a sequence of scanners ordered by name, with name, version, and timestamp indicating when the scanner was updated.
  • first_seen
    • The date and time of the oldest multi-AV report created for the requested sample.
  • last_seen
    • The date and time of the most recent multi-AV report created for the requested sample.
  • sample_type
    • Detected sample type for the requested sample.
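
As a hedged sketch, the xref entries can be reduced to a simple detection count per multi-AV report. Treating an empty result string as "no detection" is an assumption here, since the result format is scanner-specific.

def detection_counts(report):
    # Returns (record_time, detections, total_scanners) per multi-AV report,
    # in the order the entries appear in the response.
    sample = report.get("rl", {}).get("sample", {})
    counts = []
    for entry in sample.get("xref", {}).get("entries", []):
        scanners = entry.get("scanners", [])
        detections = sum(1 for s in scanners if s.get("result"))
        counts.append((entry.get("record_time"), detections, len(scanners)))
    return counts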

rl.sample.sources.entries[]

  • record_time
    • Timestamp indicating when the requested sample was uploaded.
  • tag
    • Uploader designation indicating the origin of the sample; can be reversing_labs, external_feed, microsoft_whitelist or nsrl.
  • properties
    • Various sample-related information listed as name/value properties in free format.
  • domain
    • If there is a domain linked to the sample source, it will be described within this element.

rl.sample.dynamic_analysis.entries.dynamic_analysis_report

  • analysed_on
    • Timestamp indicating when the dynamic analysis report for the requested sample was generated.
  • version
    • Numerical label indicating the version of the tool used for dynamic analysis.
  • summary
    • Contains mutexes detected during dynamic analysis (if any).
  • network
    • Contains information about dns_requests, domains, tcp_destinations, udp_destinations, http_requests detected during dynamic analysis.

Sample validation explanations

  • Valid certificate
    • Any certificate with an intact digital certificate chain that confirms the integrity of the signed file. The hash within Signer Info matches the hash of the file contents.
  • Invalid certificate
    • Any certificate with an intact digital certificate chain, but for which the certificate chain validation failed due to other reasons (e.g. because of attribute checks). Without a valid digital certificate chain, the integrity of the signed file cannot be validated.
  • Bad checksum
    • The integrity of the signed file could not be verified, because the hash within Signer Info does not match the hash of the file contents.
  • Bad signature
    • Any certificate with an intact digital certificate chain, but for which the signature validation failed. Without a valid signature, the integrity of the signed file cannot be validated.
  • Malformed certificate
    • Any certificate that does not have an intact digital certificate chain. The digital certificate is corrupted or incomplete, but that doesn't mean the file is also corrupted. Without a valid digital certificate chain, the integrity of the signed file cannot be validated.
  • Self-signed certificate
    • A self-signed certificate is a certificate that is signed by the same entity whose identity it certifies. In other words, this is a certificate that is used to sign a file, and doesn't have a CA that issued it. If CA information is present, but not found within the Spectra Core certificate store, the CA will be considered plausible and files signed with it will be declared valid (they will not be considered self-signed).
  • Impersonation attempt
    • Any self-signed certificate is a candidate for an impersonation check. Impersonation means that the signer is trying to misrepresent itself as a trusted party, where "trusted party" is defined by the certificate whitelist. Any self-signed certificate that matches the common name of another certificate on the Spectra Core whitelist is marked as an impersonation attempt.
  • Expired certificate
    • Any certificate with signing time information is checked for expiration. When the time on the local machine indicates that the certificate has passed its "valid to" date and time, the certificate is considered expired. The "Expired" certificate status is merely informative, and expired certificates cannot influence certificate classification.
  • Untrusted certificate
    • Any valid certificate for which the digital certificate chain cannot be validated against a trusted CA. Untrusted certificates are valid certificates, but they cannot be whitelisted because their chain does not terminate with a CA in the Spectra Core certificate store.
  • Other
    • security catalog, revoked certificate, revoked certificate unspecified, revoked certificate key compromise, revoked certificate ca compromise, revoked certificate affiliation changed, revoked certificate superseded, revoked certificate cessation of operation, revoked certificate hold, revoked certificate remove from crl, revoked certificate privilege withdrawn, revoked certificate aa compromise, signed after revocation, blacklisted certificate, whitelisted certificate, bad certificate timestamp

Bulk File Analysis

This query retrieves the same data as the single query, but for multiple hashes within a single response. It is more network-efficient than several consecutive single queries.

POST /api/databrowser/rldata/bulk_query/{post_format}
  • post_format is a required parameter that defines the POST payload format.
  • post_format accepts the options xml and json.

Request POST format

The following definitions are valid for both formats:

  • hash_type value must be one of the following options: md5, sha1, sha256
  • hash_value must be a valid hash defined by hash_type
{
  "rl": {
    "query": {
      "hash_type": "hash_type",
      "hashes": [
        "hash_value",
        "hash_value",
        "hash_value"
      ]
    }
  }
}
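
A hedged sketch of submitting such a payload with Python's requests library (placeholder host and credentials, as in the earlier sketches); the JSON document is serialized to bytes and sent with the required Content-Type header:

import json
import requests

payload = {
    "rl": {
        "query": {
            "hash_type": "sha1",
            "hashes": [
                "13e40f38427a55952359bfc5f52b5841ce1b46ba",
                "831fc2b9075b0a490adf15d2c5452e01e6feaa17",
            ],
        }
    }
}

response = requests.post(
    "https://data.example.com/api/databrowser/rldata/bulk_query/json",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/octet-stream"},
    auth=("username", "password"),
)
response.raise_for_status()
results = response.json()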

Response format

{
  "rl": {
    "entries": [
      {}
    ],
    "invalid_hashes": [
      "string"
    ],
    "unknown_hashes": [
      "string"
    ]
  }
}

  • invalid_hashes
    • A list of ill-formatted hashes provided in the request
  • unknown_hashes
    • A list of hashes from the request that were not found in the database or don't have multi-AV data
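
Continuing the sketch above, the three top-level lists can be separated after parsing; this only assumes the response shape shown here:

def split_bulk_response(results):
    # `results` is the parsed JSON body of a bulk query response.
    rl = results.get("rl", {})
    entries = rl.get("entries", [])                  # analysis records for known hashes
    invalid_hashes = rl.get("invalid_hashes", [])    # hashes that were not well-formed
    unknown_hashes = rl.get("unknown_hashes", [])    # not found or without multi-AV data
    return entries, invalid_hashes, unknown_hashes

# Example usage with the `results` dict from the previous sketch:
# entries, invalid_hashes, unknown_hashes = split_bulk_response(results)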

Examples

Single query - changing the response format

/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=json
/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b?format=xml

Single query - changing the hash type

/api/databrowser/rldata/query/sha1/a25b6db2d363eaa31de348399aedc5651280b52b
/api/databrowser/rldata/query/sha256/10dbb2b27208c5566d326b47950657bf6b3c9a59e302598a128ad7125d5fb4fd

Bulk query - changing the POST format

/api/databrowser/rldata/bulk_query/xml
/api/databrowser/rldata/bulk_query/json

Bulk query - JSON POST format

/api/databrowser/rldata/bulk_query/json
{
  "rl": {
    "query": {
      "hash_type": "md5",
      "hashes": [
        "4bb64c06b1a72539e6d3476891daf17b",
        "6353de8f339b7dcc6b25356f5fbffa4e",
        "59cb087c4c3d251474ded9e156964d5d",
        "6c2eb9d1a094d362bcc7631f2551f5a4",
        "a82c781ce0f43d06c28fe5fc8ebb1ca9",
        "920f5ba4d08f251541c5419ea5fb3fb3"
      ]
    }
  }
}

{
  "rl": {
    "query": {
      "hash_type": "sha1",
      "hashes": [
        "13e40f38427a55952359bfc5f52b5841ce1b46ba",
        "831fc2b9075b0a490adf15d2c5452e01e6feaa17",
        "42b05278a6f2ee006072af8830c103eab2ce045f"
      ]
    }
  }
}