Expression search
This service provides samples first seen on a particular date, filtered by search criteria. At least 2 criteria must be supplied for a successful query.
It can also return statistics of how many samples are in a specific group, and their estimated size.
The search is performed over a static data set that is created as a daily snapshot. Old or rescanned samples are not included in the search; only those with the first_seen status on a given date are included.
The search uses the sample classification found at that time. Subsequent classification changes are reflected in the data set, but updates are applied only once per day.
String search is performed using a regular expression (re2) match.
The following search criteria fields are available:
- status
- threat_name
- platform
- subplatform
- malware_type
- malware_family
- sample_type
- threat_level *
- trust_factor *
- sample_size *
- scanner_detections *
Fields marked with an asterisk (*) support relational operators: <=, >= and =.
Multiple values for the same field are allowed.
This API is rate limited to 1 request per second.
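The rate limit of 1 request per second can be respected on the client side by spacing out calls. The following is a minimal sketch of one way to do that; the Throttle class and its parameters are hypothetical helpers for illustration and are not part of the API.

```python
import time


class Throttle:
    """Ensure consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous call; None before the first call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling wait() immediately before every request keeps a single-threaded client at or below one request per second.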
Supported Sample Types
This API supports querying samples by all file types and subtypes that Spectra Core can identify and analyze.
The sample type, subtype, and identification are presented in the following format:
<file type>/<file subtype>[/<file identification>]
Note that the identification is optional, so the response for the queried sample type always has at least two fields. For example:
- PE/Exe - file type and subtype without identification
- Binary/Archive/TAR - file type, subtype, and identification
When a sample's (sub)type is not identified, "None" is returned for the unidentified field. For example:
- Media Container/None
- Document/None/MicrosoftWord
- Binary/None/PythonPYC
The full list of supported file types, subtypes, and file identifications can be found in the Spectra Core documentation (sections "Native File Types and Subtypes" and "Native Binary/PE/PE SFX/Multimedia File Types and Identifications", respectively).
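The sample type format described above can be split into its parts on the client side. The sketch below is an illustration only: the SampleType tuple and parse_sample_type function are hypothetical names, and the code assumes that the individual fields never contain a "/" character.

```python
from typing import NamedTuple, Optional


class SampleType(NamedTuple):
    file_type: Optional[str]
    subtype: Optional[str]
    identification: Optional[str]


def parse_sample_type(value: str) -> SampleType:
    """Split '<file type>/<file subtype>[/<file identification>]' into its fields."""
    parts = value.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 '/'-separated fields, got {value!r}")
    # The literal string "None" marks an unidentified field.
    fields = [None if part == "None" else part for part in parts]
    if len(fields) == 2:
        fields.append(None)  # the identification is optional
    return SampleType(*fields)


print(parse_sample_type("Binary/Archive/TAR"))
print(parse_sample_type("Document/None/MicrosoftWord"))
```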
When using the API to find samples by type, it is recommended to use regular expressions for more precise search queries.
By default, the queries are treated as wrapped in wildcards, so searching for sample_type=PE behaves the same as searching for sample_type=*PE*. The results will include all sample types that contain "pe" or "PE" in their name, including TypeScript and JPEG, for example.
Instead, the query should be written as: sample_type=^PE, or more generally, sample_type=^<file type>. The ^ anchor restricts the match to the beginning of the sample type string, so only the file type part is compared against the query.
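The difference between the unanchored and the anchored pattern can be reproduced locally. This is a sketch under assumptions: Python's re module stands in for re2 (the two behave the same for simple patterns like these), the matching function is a hypothetical helper, and the list contains only sample type strings that appear in this document.

```python
import re

SAMPLE_TYPES = [
    "PE/Exe",
    "Binary/Archive/TAR",
    "Binary/Archive/OpenDocumentSpreadsheet",
    "Document/None/MicrosoftWord",
]


def matching(pattern, names):
    """Return the names that the case-insensitive pattern matches anywhere."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if rx.search(name)]


# Unanchored: also matches the "pe" inside "OpenDocumentSpreadsheet".
print(matching("PE", SAMPLE_TYPES))
# Anchored: only names that start with "PE".
print(matching("^PE", SAMPLE_TYPES))
```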
Restrictions
A search of the text fields (threat_name, platform, subplatform, malware_type, malware_family, sample_type) is performed using case-insensitive regular expressions.
For example:
- sample_type=PE will also include sample_type=Binary/Archive/OpenDocumentSpreadsheet (because the string "OpenDocumentSpreadsheet" contains the letters "PE" from the search query).
Selecting Windows PE files should be done using sample_type=^PE.
It is not possible to search for benign samples using the threat_name, platform, subplatform, malware_type or malware_family fields. Only malicious and suspicious samples can be searched using those fields.
Expression Search Query
This query returns a maximum of 1000 records of new samples in Spectra Intelligence on the requested date and matching the requested search criteria. At least 2 criteria are required.
If more than 1000 records match the requested criteria, the response will have a next_page field which can be used in request arguments to fetch the next page with up to 1000 results.
Request
GET /api/sample/search/download/v1/query/{time_format}/{time_value}[?format=xml|json|tsv][&page=N][&search_criteria_field_name=search_criteria_value]
Users can also send a request to get the latest available results for a particular combination of search criteria.
GET /api/sample/search/download/v1/query/latest[?format=xml|json|tsv][&page=N][&search_criteria_field_name=search_criteria_value]
The latest endpoint always returns the results for yesterday's date.
Path parameters:
time_format
- Defines the format in which the first seen date and time should be submitted. Accepted values are: timestamp, utc, date
- Required
time_value
- The user-specified date and time on which the requested samples were first seen. The earliest possible date that can be queried is 2010-01-01. The date and time should be formatted according to the format selected in the time_format parameter:
- for timestamp - Unix epoch time as number of seconds from 1970-01-01 00:00:00
- for utc - YYYY-MM-DDThh:mm:ss
- for date - YYYY-MM-DD
- Values that are not provided as a full day will be rounded down to midnight (00:00:00) of the same day. For example:
- time_value 1398520485 for time_format timestamp in query is the same as querying 1398470400
- time_value 2014-04-14T10:32:12 for time_format utc in query is the same as 2014-04-14T00:00:00
- Required
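The rounding described for time_value can be checked with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and it assumes that days are counted in UTC, which is consistent with the timestamp example above.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def floor_timestamp_to_day(ts: int) -> int:
    """Round a Unix timestamp down to midnight UTC of the same day."""
    return ts - ts % SECONDS_PER_DAY


def floor_utc_to_day(value: str) -> str:
    """Round a YYYY-MM-DDThh:mm:ss string down to midnight of the same day."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    return dt.replace(hour=0, minute=0, second=0).strftime("%Y-%m-%dT%H:%M:%S")


def timestamp_to_date(ts: int) -> str:
    """Express a Unix timestamp as a YYYY-MM-DD date in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()


print(floor_timestamp_to_day(1398520485))       # 1398470400
print(floor_utc_to_day("2014-04-14T10:32:12"))  # 2014-04-14T00:00:00
```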
Query parameters:
format
- Allows choosing the response format (XML, JSON, TSV). XML is the default and will be returned if this parameter is not provided in the request. Accepted values are xml, json, and tsv.
- Optional
page
- Allows choosing which page of results should be returned when there are more than 1000 samples in the list of results. Defaults to 1 (first page) if not provided in the request.
- Optional
search_criteria_field_name
- Specifies which search criteria the samples will be queried against. At least 2 criteria must be supplied for a successful query. Every specified criterion requires at least one value that is provided in the search_criteria_value parameter. Accepted values are: status, threat_level, trust_factor, threat_name, platform, subplatform, malware_type, malware_family, sample_type, sample_size, scanner_detections
- Required
search_criteria_value
- Specifies the search values that will be used to query for samples, and returns only the results that exactly match the selected values. Accepted values depend on the search criteria provided in the search_criteria_field_name parameter.
- Required
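The path and query parameters above can be assembled programmatically. The following is a minimal sketch, not an official client: build_query_path is a hypothetical helper, it interprets "at least 2 criteria" as at least two distinct field names (an assumption), and it percent-encodes characters such as ^ and |, which is standard URL encoding. Host name and authentication are not covered here.

```python
from urllib.parse import urlencode

ALLOWED_FIELDS = {
    "status", "threat_level", "trust_factor", "threat_name", "platform",
    "subplatform", "malware_type", "malware_family", "sample_type",
    "sample_size", "scanner_detections",
}
TIME_FORMATS = ("timestamp", "utc", "date")


def build_query_path(time_format, time_value, criteria, fmt="json", page=None):
    """Build the request path for the query endpoint.

    criteria is a list of (field_name, value) pairs; repeat a field name
    to supply multiple values for it.
    """
    if time_format not in TIME_FORMATS:
        raise ValueError(f"unsupported time_format: {time_format!r}")
    fields = {name for name, _ in criteria}
    unknown = fields - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown search criteria: {sorted(unknown)}")
    if len(fields) < 2:
        # Assumption: the minimum applies to distinct field names.
        raise ValueError("at least 2 search criteria are required")
    params = list(criteria) + [("format", fmt)]
    if page is not None:
        params.append(("page", str(page)))
    return (
        f"/api/sample/search/download/v1/query/{time_format}/{time_value}"
        f"?{urlencode(params)}"
    )


print(build_query_path(
    "date", "2023-09-01",
    [("status", "MALICIOUS"), ("status", "SUSPICIOUS"), ("sample_type", "^PE")],
))
```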
Supported Search Criteria Field Names and Values
sha1
- Hexadecimal hash value of the sample
sha256
- Hexadecimal hash value of the sample
md5
- Hexadecimal hash value of the sample
status
- Malware Presence status designation: KNOWN, MALICIOUS, SUSPICIOUS
threat_level
- Malware severity indicator for suspicious and malicious samples, expressed as an integer between 0 and 5, where 5 indicates the most dangerous threats (highest severity). Applies to malicious and suspicious samples only
trust_factor
- Trustworthiness indicator for goodware samples, expressed as an integer between 0 and 5, where 0 indicates the most trusted samples (highest confidence). Applies to known samples only
threat_name
- Complete malware threat name. Conforms to ReversingLabs Malware naming standard: platform-subplatform.type.familyname. Applies to malicious samples only
platform
- The platform part of the full threat name detected for the sample (for example, Win32, Script-PHP, Linux...). Indicates the operating system targeted by the malware. Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
subplatform
- The subplatform part of the full threat name detected for the sample (for example, HTML, Macro, PDF...). Note that the subplatform part is present in the threat name only for specific platforms (ByteCode, Document, Script). Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
malware_type
- The type part of the full threat name detected for the sample (for example, Trojan, Adware, Rootkit...). Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
malware_family
- The familyname part of the full threat name detected for the sample (for example, Marsdaemon, Orcus, Androrat...), or the CVE identifier (full or partial). Applies only to suspicious and malicious samples
sample_type
- Sample type string as detected by Spectra Core
sample_size
- Sample size in bytes, specified as an integer
scanner_detections
- The number of antivirus scanners that have detected the sample as malicious, specified as an integer
Response
{
  "rl": {
    "web_sample_search_download": {
      "date": "string",
      "entries": [],
      "next_page": 0,
      "sample_count": 0,
      "sample_size_sum": 0
    }
  }
}
date
- The requested date (matching the time_value from the request) in format YYYY-MM-DD
sample_size_sum
- Combined size of all samples matching the search query, expressed in bytes
next_page
- Page number that can be used in the request to retrieve the next batch of results
sample_count
- Number of samples in the response
entries
- A list of records, each returned as a separate item, containing the fields for individual samples
rl.web_sample_search_download.entries[]
{
  "sha1": "string",
  "sha256": "string",
  "md5": "string",
  "sample_available": "string",
  "status": "string",
  "threat_level": 0,
  "trust_factor": 0,
  "threat_name": "string",
  "malware_family": "string",
  "malware_type": "string",
  "platform": "string",
  "subplatform": "string",
  "sample_type": "string",
  "sample_size": "string",
  "first_seen": "string",
  "last_seen": "string"
}
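Results that span several pages can be collected by following next_page until it is no longer returned. The sketch below assumes a JSON response and assumes that next_page is absent, null, or 0 on the last page; iter_entries and the fetch_page callable are hypothetical names, and how fetch_page performs the HTTP request (including authentication) is left to the caller.

```python
def iter_entries(fetch_page):
    """Yield every entry across all pages of a query.

    fetch_page(page) must return the parsed JSON response as a dict.
    It is called with None for the first request and with the value of
    next_page for each following request.
    """
    page = None
    while True:
        body = fetch_page(page)["rl"]["web_sample_search_download"]
        yield from body.get("entries", [])
        page = body.get("next_page")
        if not page:  # assumption: missing, null, or 0 means no more pages
            break
```

A caller would typically wrap the HTTP request in fetch_page and iterate over the generator, for example to collect the sha1 value of every returned sample.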
Examples
Benign Microsoft documents:
/api/sample/search/download/v1/query/date/2023-09-01?status=KNOWN&sample_type=MicrosoftWord|MicrosoftExcel|MicrosoftPowerPoint&format=json
Malicious PDFs:
/api/sample/search/download/v1/query/date/2023-09-01?status=MALICIOUS&status=SUSPICIOUS&sample_type=PDF&format=json
All Windows PE binaries:
/api/sample/search/download/v1/query/date/2023-09-01?status=KNOWN&status=SUSPICIOUS&status=MALICIOUS&sample_type=^PE&format=json
Expression Search Statistics Query
This query returns a maximum of 1000 records of aggregated statistics about new samples in the Spectra Intelligence cloud on the requested date that match the used search criteria.
The next_page field in the response can be used in request arguments to fetch the next page with up to 1000 results.
Request
GET /api/sample/search/download/v1/statistics/{time_format}/{time_value}[?format=xml|json|tsv][&page=N][&search_criteria_field_name=search_criteria_value]
Users can also send a request to get the latest available statistics for a particular combination of search criteria.
GET /api/sample/search/download/v1/statistics/latest[?format=xml|json|tsv][&page=N][&search_criteria_field_name=search_criteria_value]
The latest endpoint always returns the results for yesterday's date.
Path parameters:
time_format
- Defines the format in which the first seen date and time should be submitted. Accepted values are: timestamp, utc, date
- Required
time_value
- The user-specified date and time on which the requested samples were first seen. The earliest possible date that can be queried is 2010-01-01. The date and time should be formatted according to the format selected in the time_format parameter:
- for timestamp - Unix epoch time as number of seconds from 1970-01-01 00:00:00
- for utc - YYYY-MM-DDThh:mm:ss
- for date - YYYY-MM-DD
- Values that are not provided as a full day will be rounded down to midnight (00:00:00) of the same day. For example:
- time_value 1398520485 for time_format timestamp in query is the same as querying 1398470400
- time_value 2014-04-14T10:32:12 for time_format utc in query is the same as 2014-04-14T00:00:00
- Required
Query parameters:
format
- Allows choosing the response format (XML, JSON, TSV). XML is the default and will be returned if this parameter is not provided in the request. Accepted values are xml, json, and tsv.
- Optional
page
- Allows choosing which page of results should be returned when there are more than 1000 samples in the list of results. Defaults to 1 (first page) if not provided in the request.
- Optional
search_criteria_field_name
- Specifies which search criteria the samples will be queried against. At least 2 criteria must be supplied for a successful query. Every specified criterion requires at least one value that is provided in the search_criteria_value parameter. Accepted values are: status, threat_level, trust_factor, threat_name, platform, subplatform, malware_type, malware_family, sample_type, sample_size
- Required
search_criteria_value
- Specifies the search values that will be used to query for samples, and returns only the results that exactly match the selected values. Accepted values depend on the search criteria provided in the search_criteria_field_name parameter.
- Required
Supported Search Criteria Field Names and Values
sha1
- Hexadecimal hash value of the sample
sha256
- Hexadecimal hash value of the sample
md5
- Hexadecimal hash value of the sample
status
- Malware Presence status designation: KNOWN, MALICIOUS, SUSPICIOUS
threat_level
- Malware severity indicator for suspicious and malicious samples, expressed as an integer between 0 and 5, where 5 indicates the most dangerous threats (highest severity). Applies to malicious and suspicious samples only
trust_factor
- Trustworthiness indicator for goodware samples, expressed as an integer between 0 and 5, where 0 indicates the most trusted samples (highest confidence). Applies to known samples only
threat_name
- Complete malware threat name. Conforms to ReversingLabs Malware naming standard: platform-subplatform.type.familyname. Applies to malicious samples only
platform
- The platform part of the full threat name detected for the sample (for example, Win32, Script-PHP, Linux...). Indicates the operating system targeted by the malware. Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
subplatform
- The subplatform part of the full threat name detected for the sample (for example, HTML, Macro, PDF...). Note that the subplatform part is present in the threat name only for specific platforms (ByteCode, Document, Script). Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
malware_type
- The type part of the full threat name detected for the sample (for example, Trojan, Adware, Rootkit...). Conforms to ReversingLabs Malware naming standard. Applies to malicious samples only
malware_family
- The familyname part of the full threat name detected for the sample (for example, Marsdaemon, Orcus, Androrat...). Applies to malicious samples only
sample_type
- Sample type string as detected by Spectra Core
sample_size
- Sample size in bytes, specified as an integer
scanner_detections
- The number of antivirus scanners that have detected the sample as malicious, specified as an integer
Response
{
  "rl": {
    "web_sample_search_download": {
      "date": "string",
      "entries": [],
      "next_page": 0,
      "sample_count": 0,
      "sample_size_sum": 0
    }
  }
}
date
- The requested date (matching the time_value from the request) in format YYYY-MM-DD
sample_size_sum
- Combined size of all samples matching the search query, expressed in bytes
next_page
- Page number that can be used in the request to retrieve the next batch of results
sample_count
- Number of samples in the response
entries
- A list of records, each returned as a separate item, containing the fields for individual samples
rl.web_sample_search_download.entries[]
{
  "sample_group_size": "string",
  "sample_type": "string",
  "sample_group_count": "string",
  "status": "string",
  "threat_level": 0,
  "trust_factor": 0
}
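The statistics entries can be summarized further on the client side, for example per status. This is a sketch under assumptions: sample_group_count and sample_group_size are taken to be numeric strings holding the number of samples in a group and their combined size in bytes, and totals_by_status is a hypothetical helper.

```python
from collections import defaultdict


def totals_by_status(entries):
    """Sum group counts and sizes per status across statistics entries."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for entry in entries:
        bucket = totals[entry["status"]]
        # The schema lists both fields as strings, so convert before adding.
        bucket["count"] += int(entry["sample_group_count"])
        bucket["bytes"] += int(entry["sample_group_size"])
    return dict(totals)
```

The entries argument is the list found under rl.web_sample_search_download.entries in a JSON response from the statistics endpoint.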
Examples
Benign Microsoft documents:
/api/sample/search/download/v1/statistics/date/2023-09-01?status=KNOWN&sample_type=MicrosoftWord|MicrosoftExcel|MicrosoftPowerPoint&format=json
Malicious PDFs:
/api/sample/search/download/v1/statistics/date/2023-09-01?status=MALICIOUS&status=SUSPICIOUS&sample_type=PDF&format=json
All Windows PE binaries:
/api/sample/search/download/v1/statistics/date/2023-09-01?status=KNOWN&status=SUSPICIOUS&status=MALICIOUS&sample_type=^PE&format=json