Analysis
Spectra Detect Worker analyzes files submitted via the Worker API and produces a detailed analysis report for every file using the built-in Spectra Core static analysis engine.
Analysis reports can be retrieved in several ways, depending on the Worker configuration. It is also possible to control the contents of the report to an extent.
Retrieving Analysis Reports
There are two ways to get the file analysis report(s):
- The Get information about a processing task endpoint. Sending a GET request to the endpoint with the task ID returns the analysis report in the response.
- Saved to one of the configured integrations:
  - S3 - for hosted deployments, this is the only supported integration
  - Microsoft services (Azure Storage, SharePoint, OneDrive)
  - file shares (NFS, SMB)
  - Splunk
  - callback server
Adding Custom Data to the Report
Users can also save any custom data in the analysis report by submitting it in the file upload request.
The custom_data field accepts any user-defined data as a JSON-encoded payload. This data is included in all file analysis reports (Worker API, Callback, AWS S3, Azure Data Lake and Splunk, if enabled) in its original, unchanged form. The custom_data field will not be returned in the Get information about a processing task endpoint response if the file has not been processed yet.
Users should avoid using request_id as a key in their custom_data, as that value is used internally by the appliance.
Example - Submitting a file with the custom_data parameter to store user-defined information in the report
curl https://tiscale-worker-01/api/tiscale/v1/upload -H 'Authorization: Token 94a269285acbcc4b37a0ad335d221fab804a1d26' -F file=@Classification.pdf -F 'custom_data={"file_source":{"uploader":"malware_analyst", "origin":"example"}}'
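Because the payload is passed through unchanged, the report generated for this upload will contain the submitted object verbatim. A minimal sketch of that fragment as it might appear in the report (its exact position within the report structure may vary):
"custom_data": {
    "file_source": {
        "uploader": "malware_analyst",
        "origin": "example"
    }
}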
Customizing Analysis Reports
There are several different ways of customizing an analysis report:
- through report configuration
- through report types
- through report views
These methods are not mutually exclusive and are applied in the order above (configuration first, then report type, then report view). For example, strings found in a file must first be included in the report by the configuration before a report type or view can filter or transform them.
Report types are results of filtering the full report. In other words, fields can be included or excluded as required. On the other hand, report views are results of transforming parts of the report, such as field names or the structure of the report. Historically, views could also be used to just filter out certain fields without any transformations, and this functionality has been maintained for backward compatibility. However, filtering-only views should be replaced by their equivalent report types as they are much faster.
As previously mentioned, filtering and transforming actions are not mutually exclusive. You can filter out some fields (using a report type), and then perform a transformation on what remains (using a report view). However, not all report views are compatible with all report types. This is because some report views expect certain fields to be present.
Report Types
Report types are JSON configuration files with the following format:
{
    "name": "string",
    "exclude_fields": true,
    "fields": {
        "example_field": false,
        "another_example": {
            "example_subfield": false,
            "another_subfield": false
        }
    }
}
Some default options:
small
- Contains only the classification of the file, and some information about the file.
extended_small
- Contains information about file classification, information about the file, the story field, tags and interesting_strings.
medium
- This is the default report that’s served when there are no additional query parameters (in other words, it’s not necessary to specifically request this report, as it’s sent by default).
- It is equivalent to the previous "summary" report with some small differences:
  - each subreport contains an index and parent field
  - if metadata.application.capabilities is 0, then this field is not present in the report
- Changes in this report:
  - excludes the entire relationships section
  - excludes certain fields under the info section, such as warnings and errors
  - many metadata fields are not present, such as those related to certificates
  - there are no strings, no story and no tags
large
- Includes every single field present in the analysis report. It is equivalent to the previous "full" report (?full=true).
Report types that replace report views with the same name:
classification
- This report returns only the classification of the file, story and info. It has no metadata except the attack field.
classification_tags
- Same as the classification view, with the addition of Spectra Core tags.
extended
- Compared to the default (medium) type, contains all metadata, relationships, tags and the story field; under info, it contains statistics and unpacking information.
mobile_detections
- Contains mobile-related metadata, as well as classification and story.
mobile_detections_v2
- Contains more narrowly defined mobile metadata, with exclusive focus on Android. Also contains classification and story.
short_cert
- Contains certificate and signature-related metadata, as well as indicators and some classification info.
The name of the report type is the string you’ll refer to when calling the Get information about a processing task endpoint (or the one passed to the relevant configuration command). For example, if the name of your report type is my-custom-report-type, you would include it in the query parameters as follows: ?report_type=my-custom-report-type.
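For example, a request along the following lines would return the report filtered by that type. The task endpoint path and token shown here are illustrative assumptions; use the actual Get information about a processing task endpoint and credentials of your deployment:
curl 'https://tiscale-worker-01/api/tiscale/v1/task/<task_id>?report_type=my-custom-report-type' -H 'Authorization: Token 94a269285acbcc4b37a0ad335d221fab804a1d26'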
The exclude_fields field defines the behavior of report filtering. This is an optional field and is false by default. This means that, by default, the report fields under fields will be included (you explicitly say which fields you want). Conversely, if this value is set to true, then the report fields under fields will be excluded (you explicitly say which fields you don’t want).
The fields nested dictionary contains the fields that are either included or excluded (depending on the value of exclude_fields). If a subfield is set to a boolean value (true/false), then the inclusion/exclusion applies to that section and all sections under it.
For example, small.json:
{
    "name": "small",
    "fields": {
        "info": {
            "file": true,
            "identification": true
        },
        "classification": true
    }
}
In this configuration, we’re explicitly including fields (exclude_fields was not set, so it’s false by default). Setting individual fields to true will make them (and their subfields) appear in the final report. In other words, the only sections that will be in the final report are the entire classification section and the file and identification fields from the info section. Everything else will not be present.
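As a structural sketch of the effect (field values are illustrative and subfield contents are abbreviated), a report filtered with the small type would be reduced to something like:
"tc_report": [
    {
        "info": {
            "file": {
                "file_name": "archive.zip",
                "file_type": "Binary"
            },
            "identification": {}
        },
        "classification": {
            "classification": 0,
            "factor": 0
        }
    }
]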
Or, exclude-example.json:
{
    "name": "exclude-example",
    "exclude_fields": true,
    "fields": {
        "relationships": false,
        "info": {
            "statistics": false,
            "binary_layer": false
        }
    }
}
In this configuration, the entire relationships section is excluded, as well as statistics and binary_layer from the info section. Everything else will be present in the report.
Limitations
- info.file cannot be excluded and is always present.
- Items in arrays cannot be selectively included or excluded (entire arrays only).
- Items that are JSON primitives (string, number, boolean, null) cannot be excluded if they’re on the same level as an included field, or above an included field.
Take this example structure - only e (line 6) will be explicitly included:
{
    "a": {
        "b": 1,
        "c": {
            "d": "foo",
            "e": "bar",
            "f": {
                "g": "text",
                "h": 1
            }
        },
        "x": [
            1,
            2,
            3
        ],
        "y": {
            "z": "hello",
            "w": "world"
        }
    }
}
If you include e, you will also get d (because it’s on the same level as e, and is a primitive data type), but you will also get b (because it’s on the level above, and is a primitive data type as well). Filtering result:
{
    "a": {
        "b": 1,
        "c": {
            "d": "foo",
            "e": "bar"
        }
    }
}
However, you will not get f, x or y as they are non-primitives (objects and arrays).
Report Views
Views are transformations of the JSON analysis output produced by the Worker. For example, views can be used to change the names of some sections in the analysis report. There are also deprecated views that allow filtering fields in or out, but this functionality is covered by report types (see above). The following views are present by default (deprecated views are excluded):
classification_top_container_only
- Returns a report view equivalent to the classification report type (see above), but for the top-level container (parent file).
flat
- "Flattens" the JSON structure. Without flattening:
"tc_report": [
    {
        "info": {
            "file": {
                "file_type": "Binary",
                "file_subtype": "Archive",
                "file_name": "archive.zip",
                "file_path": "archive.zip",
                "size": 20324816,
                "entropy": 7.9999789976332245,
With flattening:
"tc_report": [
    {
        "info_file_entropy": 7.9999789976,
        "info_file_file_name": "archive.zip",
        "info_file_file_path": "archive.zip",
        "info_file_file_subtype": "Archive",
        "info_file_file_type": "Binary",
flat-one
- Returns the flat report, but only for the parent file.
no_goodware
- Returns a short version of the report for the top-level container, and any children files that are suspicious or malicious (goodware files are filtered out). This view is not compatible with split reports.
no_email_indicator_reasons
- Strips potential PII (personally identifiable information) from some fields in analysis reports for email messages, and replaces it with a placeholder string.
splunk-mod-v1
- Transforms the report so that it’s better suited for indexing by Splunk. The changes are as follows:
  - if classification is 0 or 1, factor becomes confidence
  - if classification is 2 or 3, factor becomes severity
  - a string_status field is added with the overall classification (UNKNOWN, GOODWARE, SUSPICIOUS, MALICIOUS)
  - scanner name becomes reason
  - scanner result becomes threat
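As a structural sketch of those renames (the angle-bracket values are placeholders, not real scanner output, and the severity value is illustrative), a fragment of a splunk-mod-v1 report for a malicious file could look like:
"classification": 3,
"string_status": "MALICIOUS",
"severity": 5,
"scan_results": [
    {
        "reason": "<scanner name>",
        "threat": "<scanner result>"
    }
]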
Views can generally be applied to both split (available in self-hosted deployments) and non-split reports. If none of these views satisfy your use case, contact ReversingLabs Support to get help with building a new custom view.
Interpreting the report
After sending files to be processed, you will receive a link to a JSON report. It contains a tc_report field, which looks something like this:
"tc_report": [
{
"info": {
"file": {
"file_type": "Text",
"file_subtype": "Shell",
"file_name": "test.sh",
"file_path": "test.sh",
"size": 35,
"entropy": 3.7287244452691413,
"hashes": []
}
},
"classification": {
"propagated": false,
"classification": 0,
"factor": 0,
"scan_results": [
{}
],
"rca_factor": 0
}
}
]
High-level overview
The key information here is the classification value (tc_report[].classification.classification), which will be a number from 0 to 3:
| classification | description |
| --- | --- |
| 0 | unknown (no threats found) |
| 1 | goodware |
| 2 | suspicious |
| 3 | malicious |
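For example, assuming the JSON report has been downloaded to a local file named report.json (an illustrative name), a tool such as jq can pull out this value for every analyzed file:
# Prints one classification value (0-3) per entry in tc_report
jq '.tc_report[].classification.classification' report.json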
More information
For more information, use the tc_report[].classification.rca_factor field. The higher its value, the more dangerous the threat, except for files that weren’t classified (their classification is 0). In that case, rca_factor will be 0 and will not signal trustworthiness.
For even more information on why a file was given a certain classification, look at the scan_results field. It contains the individual scanners which processed the file (name), as well as their reason for classifying a file a given way (result).
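The shape of that field, with placeholders instead of real scanner output, looks roughly like this:
"scan_results": [
    {
        "name": "<scanner that processed the file>",
        "result": "<the scanner's reason for the classification>"
    }
]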
The following table maps the classification value to the old "trust factor" and "threat level" values, and the new "RCA factor" value which replaces them. It also provides a mapping to a color-coded severity value, and provides general commentary with examples regarding the origin of any given classification.
| Classification | Trust factor | Threat level | Risk score | Severity | Comment |
| --- | --- | --- | --- | --- | --- |
| 0 (unknown) | N/A | N/A | N/A | ⬜ N/A | No threats found. Please submit the sample to Spectra Intelligence for classification. |
| | 0 | N/A | 0 | 🟩 Clean | File comes from a very trustworthy domain or has a very trustworthy certificate. Examples: HP, IBM, Microsoft, Oracle, Intel, Dell, Sony, Google... |
| | 1 | N/A | 1 | 🟩 Clean | File comes from a trustworthy domain or has a trustworthy certificate. Examples: php.net, mit.edu, postgresql.org, redhat.de, opera.com, nasa.gov... |
| | 2 | N/A | 2 | 🟩 Clean | File comes from a usually trusted domain. Examples: softpedia.com, sourceforge.net, cnet.com... |
| | 3 | N/A | 3 | 🟩 Likely clean | File comes from another known site. |
| | 4 | N/A | 4 | 🟩 Possibly clean | Some valid but not very trusted certificates. |
| 1 (known) | 5 | N/A | 5 |