# Spectra Intelligence Documentation

> Cloud threat intelligence API and feed documentation.

This file contains all documentation content in a single document following the llmstxt.org standard.

## Analysis Timeout Issues

File analysis timeouts can occur when processing complex or large files that require extensive analysis time. Understanding the causes and solutions helps ensure successful file processing.

## Common Causes

Analysis timeouts typically happen due to:

- **Large file sizes** - Files approaching or exceeding the size limits for your appliance tier
- **Deep nesting** - Archives containing multiple layers of compressed files
- **Extensive unpacking** - Files that trigger recursive decompression operations
- **Complex file structures** - Files with intricate internal structures requiring detailed parsing
- **Resource constraints** - Insufficient RAM or CPU allocation for the analysis workload

## Configuration Options

### Spectra Analyze

The analysis timeout can be adjusted in the appliance configuration:

1. Navigate to **Administration > Configuration**
2. Locate the analysis timeout setting
3. Increase the timeout value based on your file processing requirements
4. Save the configuration changes

### File Inspection Engine

Use the `--analysis-timeout` flag to control the per-file time limit:

```bash
rl-scan --analysis-timeout 300 /path/to/file
```

The timeout value is specified in seconds.

## Troubleshooting Steps

If analysis timeouts persist:

1. **Increase allocated resources** - Ensure the appliance or container has sufficient RAM (32 GB+ recommended) and CPU cores
2. **Check decompression ratio limits** - Verify that recursive unpacking isn't exceeding configured limits
3. **Review file characteristics** - Examine the file structure to identify potential issues
4. **Monitor system resources** - Check if the appliance is under heavy load from concurrent analyses
5. **Adjust timeout values** - Increase timeout settings for complex file processing workflows

## Related Topics

- [Platform Requirements](/General/DeploymentAndIntegration/PlatformRequirements) - Hardware specifications for different appliance tiers
- [How Spectra Core analysis works](/General/AnalysisAndClassification/SpectraCoreAnalysis) - Understanding the analysis process

---

## Antivirus Result Availability

When a sample is uploaded or rescanned in Spectra Intelligence, it will usually get new antivirus results **within 30 minutes**. When a sample has new antivirus results, these will be available in relevant APIs, for example [TCA-0104 File analysis](/SpectraIntelligence/API/FileThreatIntel/tca-0104/).

---

## Certificate Revocation

ReversingLabs maintains a certificate revocation database that is updated with each [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) release. Because the database is offline, some recently revoked certificates may not appear as revoked until the next update.

Certificate Authority (CA) revocation alone is not sufficient to classify a sample as malicious. Most CAs backdate revocations to the certificate's issuance date, regardless of when or whether the certificate was abused. When additional context is available, ReversingLabs adjusts the revocation date to reflect the most appropriate point in time. If a certificate is whitelisted, this correction is not applied.

## Searching for Revoked Certificates

You can find samples signed with revoked certificates using **Advanced Search** with the `tag:cert-revoked` keyword. Advanced Search is available both through the [Spectra Analyze user interface](/SpectraAnalyze/search-page/) and as the [TCA-0320 Advanced Search](/SpectraIntelligence/API/MalwareHunting/tca-0320/) API.


---

## File Classification and Risk Scoring - ReversingLabs

# Classification

File classification assigns a risk score (0-10) and threat verdict (malicious, suspicious, goodware, or unknown) to every analyzed file using ReversingLabs Spectra Core. The classification algorithm combines YARA rules, machine learning, heuristics, certificate validation, and file similarity matching to determine security status. YARA rules take precedence as the most authoritative signal, followed by other detection methods that contribute to the final verdict.

The classification of a sample is based on a comprehensive assessment of its assigned risk factor, threat level, and trust factor; however, it can be manually or automatically overridden when necessary. Based on this evaluation, files are placed into one of the following buckets:

- No threats found (unclassified)
- Goodware/known
- Suspicious
- Malicious

The classification process weighs signals from all available sources to arrive at the most accurate verdict. Some signals are considered more authoritative than others and take priority. For example, [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) YARA rules always take precedence because they are written and curated by ReversingLabs analysts. These rules provide the highest degree of accuracy, as they target specific, named threats.

This does not mean that other classification methods are less important. Similarity matching, heuristics, and machine learning still contribute valuable signals and may produce additional matches. In cases where multiple detections apply, YARA rules simply serve as the deciding factor for the final classification.

## Risk score

A risk score is a value representing the trustworthiness or malicious severity of a sample. Risk score is expressed as a number from 0 to 10, with 0 indicating whitelisted samples from a reputable origin, and 10 indicating the most dangerous threats.
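
As a rough illustration of the scale, the sketch below maps a risk score to the classification buckets this section associates with it (no score for unclassified files, 0 to 5 for goodware, 6 to 10 for suspicious or malicious) and orders samples for triage so that malicious always outranks suspicious. The function names and the tuple layout are hypothetical, and this is not the ReversingLabs scoring algorithm, which is proprietary.

```python
# Illustrative only: names and data layout are assumptions, not a ReversingLabs API.

def buckets_for_score(risk_score):
    """Return the classifications that a given risk score can belong to."""
    if risk_score is None:
        # Files with no threats found are not assigned a risk score.
        return {"unclassified"}
    if not 0 <= risk_score <= 10:
        raise ValueError("risk score must be between 0 and 10")
    if risk_score <= 5:
        return {"goodware"}
    return {"suspicious", "malicious"}


# Malicious always outranks suspicious. Ranking every other
# classification lowest is an assumption made for this sketch.
_RANK = {"malicious": 2, "suspicious": 1}


def triage_key(sample):
    """Sort key for (classification, risk_score) tuples; larger means more urgent."""
    classification, risk_score = sample
    return (_RANK.get(classification, 0), risk_score or 0)


samples = [("suspicious", 10), ("goodware", 1), ("malicious", 6), ("malicious", 10)]
for classification, score in sorted(samples, key=triage_key, reverse=True):
    print(classification, score)
```

Sorted this way, the malicious sample with a risk score of 6 is listed ahead of the suspicious sample with a risk score of 10, because classification is compared before the score.
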

At a glance:

Files with no threats found don't get assigned a risk score and are therefore **unclassified**.

Values from 0 to 5 are reserved for samples classified as **goodware/known**, and take into account the source and structural metadata of the file, among other things. Since goodware samples do not have threat names associated with them, they receive a description based on their risk score.

Risk scores from 6 to 10 are reserved for **suspicious** and **malicious** samples, and express their severity. They are calculated by a ReversingLabs proprietary algorithm, and based on many factors such as file origin, threat type, how frequently it occurs in the wild, YARA rules, and more. Lesser threats like adware get a risk score of 6, while ransomware and trojans always get a risk score of 10.

### Malware type and risk score

In cases where multiple threats are detected and there are no other factors (such as user overrides) involved, the final classification is always the one that presents the biggest threat. If they belong to the same risk score group, malware types are prioritized in this order:

| Risk score | Malware types |
|------------|---------------|
| 10 | EXPLOIT > BACKDOOR > RANSOMWARE > INFOSTEALER > KEYLOGGER > WORM > VIRUS > CERTIFICATE > PHISHING > FORMAT > TROJAN |
| 9 | ROOTKIT > COINMINER > ROGUE > BROWSER |
| 8 | DOWNLOADER > DROPPER > DIALER > NETWORK |
| 7 | SPYWARE > HYPERLINK > SPAM > MALWARE |
| 6 | ADWARE > HACKTOOL > PUA > PACKED |

## Threat level and trust factor

The [risk score table](#risk-score) describes the relationship between the risk score, and the threat level and trust factor used by the [File Reputation API](/SpectraIntelligence/API/FileThreatIntel/tca-0101). The main difference is that the risk score maps all classifications onto one numerical scale (0-10), while the File Reputation API uses two different scales for different classifications.

### Nomenclature

The following classifications are equivalent:

| File Reputation API | Spectra Analyze | Spectra Detect Worker |
| ------------------- | --------------- | ------------------------ |
| known | goodware | 1 (in the Worker report) |

In the Worker report, the [risk score](#risk-score) is called `rca_factor`.

## Deciding sample priority

The [risk score table](#risk-score) highlights that a sample's risk score and its classification don't have a perfect correlation. This means that a sample's risk score cannot be interpreted on its own, and that the primary criterion in deciding a sample's priority is its classification.

Samples classified as suspicious can be a result of heuristics, or a possible early detection. A suspicious file may be declared malicious or known at a later time if new information is received that changes its threat profile, or if the user manually modifies its status.

The system always considers a malicious sample with a risk score of 6 as a higher threat than a suspicious sample with a risk score of 10, meaning that samples classified as malicious always supersede suspicious samples, regardless of the calculated risk score. The reason for this is certainty - a malicious sample is decidedly malicious, while suspicious samples need more data to confirm the detected threat. ReversingLabs continually works to reduce the number of suspicious samples.

While a suspicious sample with a risk score of 10 does deserve user attention and shouldn't be ignored, a malicious sample with a risk score of 10 should be triaged as soon as possible.

## Malware naming standard

---

## Handling False Positives

# Handling False Positives

A false positive occurs when a legitimate file is incorrectly classified as malicious.
While ReversingLabs strives for high accuracy, false positives can occasionally happen due to the complexity of malware detection across hundreds of file formats and millions of samples.

## What You Can Do

If you encounter a false positive, you have several options:

### 1. Local Classification Override

On Spectra Analyze, you can immediately override the classification using the classification override feature:

- Navigate to the file's Sample Details page
- Use the classification override option to manually set the file as goodware
- The override takes effect immediately on your appliance
- All users on the same appliance will see the updated classification

### 2. Spectra Intelligence Reclassification Request

Submit a reclassification request through Spectra Intelligence:

- The override propagates across all appliances connected to the same Spectra Intelligence account
- Other appliances in your organization will automatically receive the updated classification
- This is the recommended approach for organization-wide corrections

### 3. Goodware Overrides

Use Goodware Overrides to propagate trusted parent classifications to extracted child files:

- This applies when a trusted parent file (e.g., from Microsoft or another reputable vendor) contains files that trigger false positives
- The parent's goodware classification can automatically override the child files
- This is particularly useful for legitimate installers that may contain components flagged by heuristics

## How ReversingLabs Handles False Positive Reports

If a customer reports a false positive (through Zendesk, or by contacting the Support team at support@reversinglabs.com), the first thing we do is re-scan the sample to make sure that the results are up-to-date. If the results are still malicious, our Threat Analysis team will:

1. Conduct our own research of the software and the vendor
2. Contact the AV scanners and notify them of the issue
3. Change the classification in our system (we do not wait for AVs to correct the issue)

---

If the file is confirmed to be a false positive, we begin by analyzing why the incorrect classification occurred. Then we try to correct the result by making adjustments related to file relationships, certificates, and AV product detection velocity (for example, whether detections are being added or removed). We also re-scan and reanalyze samples, adjust or add sources and, if necessary, manually investigate the file.

If these efforts do not yield a correct result, we have the ability to **manually override the classification**, but we only do so after thorough analysis confirms the file is benign.

---

## ReversingLabs malware naming standard

The ReversingLabs detection string consists of three main parts separated by dots. All three parts are mandatory and always appear in the string.

```
platform-subplatform.type.familyname
```

1. The first part of the string indicates the **platform** targeted by the malware. This string is always one of the strings listed in the [Platform string](#platform-string) table. If the platform is Archive, Audio, ByteCode, Document, Email, Image, Package or Script, then it has a subplatform string. Platform and subplatform strings are divided by a hyphen (`-`). The lists of available subplatform strings for these platforms can be found in their respective tables.
2. The second part of the detection string describes the **malware type**. Strings that appear as malware type descriptions are listed in the [Type string](#type-string) table.
3. The third and last part of the detection string represents the malware family name, i.e. the name given to a particular malware strain. Names "Agent", "Gen", "Heur", and other similar short generic names are not allowed. Names can't be shorter than three characters, and can't contain only numbers. Special characters (apart from `-`) must be avoided as well.
The `-` character is only allowed in exploit (CVE/CAN) names (for example CVE-2012-0158).

#### Examples

If a trojan is designed for the Windows 32-bit platform and has the family name "Adams", its detection string will look like this:

```
Win32.Trojan.Adams
```

If some backdoor malware is a PHP script with the family name "Jones", the detection string will look like this:

```
Script-PHP.Backdoor.Jones
```

Some potentially unwanted application designed for Android that has the family name "Smith" will have the following detection string:

```
Android.PUA.Smith
```

Some examples of detections with invalid family names are:

```
Win32.Dropper.Agent
ByteCode-MSIL.Keylogger.Heur
Script-JS.Hacktool.Gen
Android.Backdoor.12345
Document-PDF.Exploit.KO
Android.Spyware.1a
Android.Spyware.Not-a-CVE
Win32.Trojan.Blue_Banana
Win32.Ransomware.Hydra:Crypt
Win32.Ransomware.HDD#Cryptor
```

#### Platform string

The platform string indicates the operating system that the malware is designed for. The following table contains the available strings and the operating systems for which they are used.

| String | Short description |
| --- | --- |
| ABAP | SAP / R3 Advanced Business Application Programming environment |
| Android | Applications for Android OS |
| AOL | America Online environment |
| Archive | Archives. See [Archive subplatforms](#archive-subplatforms) for more information. |
| Audio | Audio. See [Audio subplatforms](#audio-subplatforms) for more information. |
| BeOS | Executable content for Be Inc. operating system |
| Boot | Boot, MBR |
| Binary | Binary native type |
| ByteCode | ByteCode, platform-independent. See [ByteCode subplatforms](#bytecode-subplatforms) for more information. |
| Blackberry | Applications for Blackberry OS |
| Console | Executables or applications for old consoles (e.g. Nintendo, Amiga, ...) |
| Document | Documents. See [Document subplatforms](#document-subplatforms) for more information. |
| DOS | DOS, Windows 16 bit based OS |
| EPOC | Applications for EPOC mobile OS |
| Email | Emails. See [Email subplatforms](#email-subplatforms) for more information. |
| Firmware | BIOS, Embedded devices (mp3 players, ...) |
| FreeBSD | Executable content for 32-bit and 64-bit FreeBSD platforms |
| Image | Images. See [Image subplatforms](#image-subplatforms) for more information. |
| iOS | Applications for Apple iOS (iPod, iPhone, iPad…) |
| Linux | Executable content for 32 and 64-bit Linux operating systems |
| MacOS | Executable content for Apple Mac OS, OS X |
| Menuet | Executable content for Menuet OS |
| Novell | Executable content for Novell OS |
| OS2 | Executable content for IBM OS/2 |
| Package | Software packages. See [Package subplatforms](#package-subplatforms) for more information. |
| Palm | Applications for Palm mobile OS |
| Script | Scripts. See [Script subplatforms](#script-subplatforms) for more information. |
| Shortcut | Shortcuts |
| Solaris | Executable content for Solaris OS |
| SunOS | Executable content for SunOS platform |
| Symbian | Applications for Symbian OS |
| Text | Text native type |
| Unix | Executable content for the UNIX platform |
| Video | Videos |
| WebAssembly | Binary format for executable code in Web pages |
| Win32 | Executable content for 32-bit Windows OS's |
| Win64 | Executable content for 64-bit Windows OS's |
| WinCE | Executable content for Windows Embedded Compact OS |
| WinPhone | Applications for Windows Phone |

##### Archive subplatforms

| String | Short description |
| --- | --- |
| ACE | WinAce archives |
| AR | AR archives |
| ARJ | ARJ (Archived by Robert Jung) archives |
| BZIP2 | Bzip2 archives |
| CAB | Microsoft Cabinet archives |
| GZIP | GNU Zip archives |
| ISO | ISO image files |
| JAR | JAR (Java ARchive) archives |
| LZH | LZH archives |
| RAR | RAR (Roshal Archive) archives |
| 7ZIP | 7-Zip archives |
| SZDD | Microsoft SZDD archives |
| TAR | Tar (tarball) archives |
| XAR | XAR (eXtensible ARchive) archives |
| ZIP | ZIP archives |
| ZOO | ZOO archives |
| *Other Archive identification* | All other valid [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) identifications of Archive type |

##### Audio subplatforms

| String | Short description |
| --- | --- |
| WAV | Wave Audio File Format |
| *Other Audio identification* | All other valid Spectra Core identifications of Audio type |

##### ByteCode subplatforms

| String | Short description |
| --- | --- |
| JAVA | Java bytecode |
| MSIL | MSIL bytecode |
| SWF | Adobe Flash |

##### Document subplatforms

| String | Short description |
| --- | --- |
| Access | Microsoft Office Access |
| CHM | Compiled HTML |
| Cookie | Cookie files |
| Excel | Microsoft Office Excel |
| HTML | HTML documents |
| Multimedia | Multimedia containers that aren't covered by other platforms (e.g. ASF) |
| Office | File that affects multiple Office components |
| OLE | Microsoft Object Linking and Embedding |
| PDF | PDF documents |
| PowerPoint | Microsoft Office PowerPoint |
| Project | Microsoft Office Project |
| Publisher | Microsoft Office Publisher |
| RTF | RTF documents |
| Visio | Microsoft Office Visio |
| XML | XML and XML metafiles (ASX) |
| Word | Microsoft Office Word |
| *Other Document identification* | All other valid Spectra Core identifications of Document type |

##### Email subplatforms

| String | Short description |
| --- | --- |
| MIME | Multipurpose Internet Mail Extensions |
| MSG | Outlook MSG file format |

##### Image subplatforms

| String | Short description |
| --- | --- |
| ANI | File format used for animated mouse cursors on Microsoft Windows |
| BMP | Bitmap images |
| EMF | Enhanced Metafile images |
| EPS | Adobe Encapsulated PostScript images |
| GIF | Graphics Interchange Format |
| JPEG | JPEG images |
| OTF | OpenType Font |
| PNG | Portable Network Graphics |
| TIFF | Tagged Image File Format |
| TTF | Apple TrueType Font |
| WMF | Windows Metafile images |
| *Other Image identification* | All other valid Spectra Core identifications of Image type |

##### Package subplatforms

| String | Short description |
| --- | --- |
| NuGet | NuGet packages |
| DEB | Debian Linux DEB packages |
| RPM | Linux RPM packages |
| WindowStorePackage | Packages for distributing and installing Windows apps |
| *Other Package identification* | All other valid Spectra Core identifications of Package type |

##### Script subplatforms

| String | Short description |
| --- | --- |
| ActiveX | ActiveX scripts |
| AppleScript | AppleScript scripts |
| ASP | ASP scripts |
| AutoIt | AutoIt scripts (Windows) |
| AutoLISP | AutoCAD LISP scripts |
| BAT | Batch scripts |
| CGI | CGI scripts |
| CorelDraw | CorelDraw scripts |
| Ferite | Ferite scripts |
| INF | INF Script, Windows installer scripts |
| INI | INI configuration file |
| IRC | IRC, mIRC, pIRC/Pirch Script |
| JS | Javascript, JScript |
| KiXtart | KiXtart scripts |
| Logo | Logo scripts |
| Lua | Lua scripts |
| Macro | Macro (e.g. VBA, AmiPro macros, Lotus123 macros) |
| Makefile | Makefile configuration |
| Matlab | Matlab scripts |
| Perl | Perl scripts |
| PHP | PHP scripts |
| PowerShell | PowerShell scripts, Monad (MSH) |
| Python | Python scripts |
| Registry | Windows Registry scripts |
| Ruby | Ruby scripts |
| Shell | Shell scripts |
| Shockwave | Shockwave scripts |
| SQL | SQL scripts |
| SubtitleWorkshop | SubtitleWorkshop scripts |
| WinHelp | WinHelp Script |
| WScript | Windows Scripting Host related scripts (can be VBScript, JScript, …) |
| *Other Script identification* | All other valid Spectra Core identifications of Script type |

#### Type string

This string is used to describe the general type of malware. The following table contains the available strings and describes what each malware type is capable of.

For a catalog of common software weaknesses that enable malware, see [CWE](https://cwe.mitre.org/) maintained by MITRE. CISA maintains advisories on actively exploited vulnerabilities at [cisa.gov/known-exploited-vulnerabilities](https://www.cisa.gov/known-exploited-vulnerabilities).

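
The structure of the detection string and the family name rules from this section can be checked mechanically. The following is a minimal sketch, not an official validator: `parse_detection` and `is_valid_family_name` are hypothetical helper names, the set of generic names covers only the three names the standard lists explicitly, and the CVE/CAN pattern is an assumption about what such names look like.

```python
import re

# Only the generic names named explicitly in the standard; the real list
# of disallowed "similar short generic names" is not specified here.
GENERIC_FAMILY_NAMES = {"agent", "gen", "heur"}

# Assumed shape of exploit names such as CVE-2012-0158.
CVE_PATTERN = re.compile(r"^(CVE|CAN)-\d{4}-\d+$")
ALNUM_PATTERN = re.compile(r"^[A-Za-z0-9]+$")


def parse_detection(detection):
    """Split 'platform[-subplatform].type.familyname' into its parts."""
    parts = detection.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected three non-empty parts separated by dots")
    platform_part, malware_type, family = parts
    # Platform and subplatform are divided by a hyphen.
    platform, _, subplatform = platform_part.partition("-")
    return {
        "platform": platform,
        "subplatform": subplatform or None,
        "type": malware_type,
        "family": family,
    }


def is_valid_family_name(family, malware_type):
    """Apply the family name rules: length, digits, generic names, characters."""
    if len(family) < 3 or family.isdigit():
        return False
    if family.lower() in GENERIC_FAMILY_NAMES:
        return False
    if "-" in family:
        # Hyphens are only allowed in exploit (CVE/CAN) names.
        return malware_type.lower() == "exploit" and bool(CVE_PATTERN.match(family))
    return bool(ALNUM_PATTERN.match(family))
```

With these helpers, `Script-PHP.Backdoor.Jones` parses into the platform `Script`, the subplatform `PHP`, the type `Backdoor`, and the family name `Jones`, while each of the invalid examples listed earlier fails the family name check.
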

| String | Description |
| --- | --- |
| Adware | Presents unwanted advertisements |
| Backdoor | Bypasses device security and allows remote access |
| Browser | Browser helper objects, toolbars, and malicious extensions |
| Certificate | Classification derived from certificate data |
| Coinminer | Uses system resources for cryptocurrency mining without the user's permission |
| Dialer | Applications used for war-dialing and calling premium numbers |
| Downloader | Downloads other malware or components |
| Dropper | Drops malicious artifacts including other malware |
| Exploit | Exploits for various vulnerabilities, CVE/CAN entries |
| Format | Malformations of the file format. Classification derived from graylisting, validators on unpackers |
| Hacktool | Software used in hacking attacks, that might also have a legitimate use |
| Hyperlink | Classifications derived from extracted URLs |
| Infostealer | Steals personal info, passwords, etc. |
| Keylogger | Records keystrokes |
| Malware | New and recently discovered malware not yet named by the research community |
| Network | Networking utilities, such as tools for DoS, DDoS, etc. |
| Packed | Packed applications (UPX, PECompact…) |
| Phishing | Email messages (or documents) created with the aim of misleading the victim by disguising itself as a trustworthy entity into opening malicious links, disclosing personal information or opening malicious files. |
| PUA | Potentially unwanted applications (hoax, joke, misleading...) |
| Ransomware | Malware which encrypts files and demands money for decryption |
| Rogue | Fraudulent AV installs and scareware |
| Rootkit | Provides undetectable administrator access to a computer or a mobile device |
| Spam | Other junk mail that does not unambiguously fall into the Phishing category, but contains unwanted or illegal content. |
| Spyware | Collects personal information and spies on users |
| Trojan | Allows remote access, hides in legit applications |
| Virus | Self-replicating file/disk/USB infectors |
| Worm | Self-propagating malware with exploit payloads |

---

## Risk score reference table

---

## How Spectra Core analysis works

# How Spectra Core Analysis Works

All ReversingLabs products are powered by [Spectra Core](https://www.reversinglabs.com/products/spectra-core) - the engine that analyzes every file and sample. The process of analyzing software involves several steps, and the final outputs are the analysis reports. To better understand the source and significance of the information contained in those reports, it's helpful to learn what Spectra Core does in the background of ReversingLabs products.

This page provides an overview of the Spectra Core analysis process and explains what happens with files in each of the analysis steps. The following main steps have dedicated sections where they are described in detail:

1. [Identification](#1-identification)
2. [Unpacking](#2-unpacking)
3. [Validation](#3-validation)
4. [Metadata processing](#4-metadata-processing)
5. [Classification](#5-classification)

## Automated static analysis

When you scan a file with Spectra Core, the engine automatically performs static analysis on the file and all files extracted from it. Automated static analysis is also referred to as **complex binary analysis**. This unique approach to software analysis decomposes files, collects their metadata, and classifies them in terms of the security risk they pose to end-users. Files are analyzed recursively, which means that every file extracted from the software package goes through the same analysis process as its container software package.

As implemented in Spectra Core, automated static analysis does not require access to the source code (like SAST tools typically do).
It can directly examine compiled software binaries to determine their structure, dependencies and behaviors. In addition to analyzing software binaries (which is the primary use-case), Spectra Core can analyze library code and source code for specific scripting languages.

Another benefit of automated static analysis is that **files are not executed during the analysis process**. All available data is extracted even if the files are compressed, executable, or damaged - regardless of their target OS or platform. Because the analysis process does not execute any files, it can be completed in milliseconds and performed on very large files without significant performance penalties.

All these features of automated static analysis give Spectra Core a unique advantage - it can analyze post-build artifacts and detect more novel, sophisticated software supply chain attacks than SCA tools are able to. SCA tools typically analyze package managers, manifest files, or source code repositories to find vulnerabilities. They are limited by the need for known signatures of open source dependencies that have to be cross-referenced against a vulnerability database. Being used in pre-build environments, SCA tools lack visibility into deep file structures and build process tampering evidence - insights that Spectra Core readily provides.

## The Spectra Core analysis process

The process starts with the input file. The analysis engine performs several distinct steps on every object it extracts from the input file. The following diagram illustrates the flow that every object goes through.

### 1. Identification

Format identification is the initial step of the Spectra Core analysis process. To successfully perform the subsequent analysis steps, we first need to know the file format of every object we are analyzing.
Specifically, this step analyzes the object structure to determine whether it's **binary** or **text**, and assigns the analyzed object a unique file format description. This description - file format identification - instructs the analysis engine on which rules and modules to use for further file processing.

Two main approaches are used for format identification:

- **Signatures** - created by ReversingLabs researchers to identify **binary** file formats based on their unique features. For example, Windows .exe files start with bytes "MZ", while PNG files will usually start with the byte 0x89 followed by "PNG" (often rendered as "‰PNG"). Signatures describe expectations of what a file format should contain. Using heuristics, the analysis process checks whether those expectations align with the actual file structure. In addition to signatures, the analysis process also evaluates any relevant YARA rules (built into the engine as well as user-provided). If there are multiple matches, those from signatures take priority over YARA rule matches.
- **Machine learning models** - created and trained by ReversingLabs researchers to identify **textual** file formats based on statistical text identification. The models are able to recognize basic text objects as scripting languages and distinguish software source code from other types of textual content.

**Note: ✅ Completing the identification step**

The results of the format identification step are:

- File hashes - calculated by the analysis engine
- File format descriptions - represented as File type/File subtype/Identification (for example, `Binary/Archive/ZIP`). If there are multiple versions of a file format, they can be identified through the additional `version` field.

After the format has been identified, the file is either directed to the proper unpacking module according to its signature, or to the validation step.

### 2. Unpacking

Unpacking, also referred to as **file decomposition**, is a step in the Spectra Core analysis process where the analyzed file is taken apart to extract all available components and metadata.

During the unpacking process, the analysis engine eliminates obfuscation, encryption, compression, and any other protections that may have been applied to the file and its contents. The engine has built-in mechanisms to prevent infinite recursion, and supports configuring the decompression ratio and unpacking depth (how many layers of a file to extract).

Different file formats require different unpacking approaches because of their structure and complexity. Because static analysis does not execute a file, it requires **unpackers** - specialized tools for parsing and unpacking individual file formats. ReversingLabs develops in-house static unpackers tailored to specific file formats, and Spectra Core relies on those unpackers during analysis.

Generally speaking, goodware file formats are easier to unpack because their structure is known and well-defined, and file behavior can be observed from the format definition. File formats commonly used for malware are good at hiding code, which makes their unpacking more challenging.

To create an unpacker for malware file formats, researchers have to identify each format and document its structure. The unpacker must be able to simulate file execution so that its code can be reconstructed and its behavior observed. Any obfuscation and protection artifacts must also be removed to allow extracting further objects. Information about the file behavior allows the unpacker - and consequently, the analysis process - to reveal the original software intent and to let users understand the true meaning of the code that was packed in that particular file format.
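
To illustrate the idea of bounded recursive unpacking, here is a minimal sketch that walks nested ZIP archives in memory and stops when a depth or decompression-ratio limit is exceeded. It uses Python's standard `zipfile` module; the function name, the limit parameters, and their default values are assumptions made for illustration and do not reflect how Spectra Core implements or configures these limits.

```python
import io
import zipfile


class UnpackLimitError(Exception):
    """Raised when a configured unpacking limit is exceeded."""


def unpack(name, data, max_depth=3, max_ratio=100.0, depth=0):
    """Yield (path, content) for every non-ZIP object found inside data.

    max_depth and max_ratio are hypothetical limits for this sketch.
    """
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        # Not an archive: this is a leaf object, report it as-is.
        yield name, data
        return
    if depth >= max_depth:
        raise UnpackLimitError(f"{name}: unpacking depth limit of {max_depth} reached")
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Compare the declared sizes before decompressing anything.
            ratio = info.file_size / max(info.compress_size, 1)
            if ratio > max_ratio:
                raise UnpackLimitError(
                    f"{name}/{info.filename}: decompression ratio {ratio:.0f} exceeds {max_ratio:.0f}"
                )
            child = archive.read(info)
            # Every extracted object goes through the same process as its container.
            yield from unpack(f"{name}/{info.filename}", child, max_depth, max_ratio, depth + 1)
```

Because `unpack` is a generator, the limits are enforced lazily while the results are consumed, for example with `list(unpack("sample.zip", data))`. The ratio check relies on the sizes declared in the archive headers, so a production implementation would also need to guard against headers that misreport them.
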

The ability to unpack a file format makes it possible for the Spectra Core analysis engine to extract a wealth of metadata and critical information often not available from other tools. The collected metadata includes but is not limited to: format header details, strings (including secrets and URIs), function names, library dependencies, and file segments. Unpacking greatly increases the surface that can be analyzed and helps file classification by providing more metadata to look at. This makes it easier to confirm classification verdicts and increases the chance to catch every threat.

**Note: ✅ COMPLETING THE UNPACKING STEP**

After the file has been successfully unpacked, all collected metadata and the unpacked file content are passed to the validator assigned to the file format. The validator then performs integrity checks on the available data.

### 3. Validation

Validation is a step in the Spectra Core analysis process where the **structure** and the **digital signatures** of the analyzed file are verified according to specific criteria for each file format.

In the validation step, the previously identified file format is checked against its specification (the formal definition of the file format by its designer). In other words, the validation process looks for differences between the file format specification and its implementation. By doing this, we can gather additional information about the file format and detect anomalies in it.

Any malformations that violate the file format specification are further examined to determine if they are capable of triggering potentially malicious behavior. Such malformations may be reported as known vulnerabilities. ReversingLabs uses these malformation patterns to create heuristics for potential future exploits and predictive vulnerability detection.

Multiple validators may be used to verify a file format.
They are called in order, first to last, until one of them acknowledges that it recognizes and can handle the specific file format. If validation fails for one of them, the entire file is marked as invalid. Detected issues are reported as validation warnings or errors, depending on their severity.

In addition to performing integrity checks of the file format structure, the validation step also verifies any digital certificates that have been used for code signing. Depending on its status, a certificate may influence the classification of files signed with it. The validation step assigns one of the following statuses to every detected certificate:

- Valid certificate
- Invalid certificate
- Bad checksum
- Bad signature
- Malformed certificate
- Self-signed certificate
- Impersonation attempt
- Expired certificate
- Untrusted certificate
- Revoked certificate

**Note: ✅ COMPLETING THE VALIDATION STEP** After the file has been validated, all collected metadata is processed, evaluated, and transformed into actionable information that can be used to deliver the final file classification.

### 4. Metadata processing

Metadata processing is a step in the Spectra Core analysis process where all previously collected metadata is translated into **human-readable**, **explainable information**. That information is used to produce or support the final file classification. Most of it is surfaced in Spectra Core analysis reports.

In this step, metadata is converted into **capabilities** and **indicators**. They build on the file format properties and platform-specific features of the analyzed file to describe software behavior and intent in more detail. The goal is to make it clearer what the analyzed code means and what each object is trying to do.

#### Indicators

Indicators can be described as behavior markers that are triggered when a specific pattern is found in the collected metadata or in the file content. An indicator may be triggered for multiple reasons.
While some indicators can only be found in specific file formats, most are universal and therefore generally applicable. Indicators contribute to the final file classification, but not in equal measure. Those deemed highly relevant are better at describing the detected malware type, while those with less relevant contributions help in solidifying the machine learning detection.

#### Capabilities

Based on the indicators triggered on a file, the analysis engine infers that the file exhibits a specific behavior, or that it is capable of performing specific actions. Similar software behaviors are grouped into broader categories - capabilities - according to the features they have in common.

For example, a file can have the filesystem capability, which is a broad description that says the file can access the filesystem or perform filesystem operations, but doesn't describe which operation will actually take place. More fine-grained software behavior descriptions are derived from the indicators (e.g. "Accesses the httpd.conf file").

#### Tags

The metadata processing step also assigns tags to files based on their properties such as certificate information, software behaviors, file contents, and many more. Some tags can only be applied to specific file types (for example, web browsers or mobile applications).

Tags are visible in [Spectra Analyze](/SpectraAnalyze/tags) and can be queried through the [Spectra Intelligence Advanced Search (TCA-0320)](/SpectraIntelligence/API/MalwareHunting/tca-0320) API. In SAFE reports generated by Spectra Assure, tags appear for all unpacked files and for URIs in the Networking section, where they can be used for filtering.

**Note: ✅ COMPLETING THE METADATA PROCESSING STEP** After the metadata has been fully processed, the file receives its classification status in the next step of the analysis.

### 5.
Classification

Classification is a step in the Spectra Core analysis process where the analysis engine produces a **verdict** on whether the analyzed file contains threats harmful to the end-user. Multiple technologies are used for file classification:

- format identification
- signatures (byte pattern matches)
- file structure validation
- extracted file hierarchy
- file similarity (RHA1)
- certificates
- machine learning
- heuristics (for scripts and fileless malware)
- YARA rules included in the analysis engine

They are shipped with the analysis engine and can be used offline, without connecting to any external sources. Their coverage varies based on threat and file format type. In other words, not all technologies can detect all threat types, and not all of them work on all file formats.

Those default classification abilities of the Spectra Core platform can be extended with **threat intelligence from the ReversingLabs Cloud** to retrieve file reputation information, and with **custom YARA rules for user-assisted classification**.

Some classification approaches are more specific than others, with signatures being the most specific. The final classification result relies on the information from all analysis steps, and it is a combination of all technologies applicable to the file format. It always matches the result of one of the technologies, even though individual technologies may produce differing results. Because of differences in how malicious files and malware families behave, some files might end up classified as malicious by one technology, and still be considered goodware by others. This doesn’t negate or diminish the final classification.

#### Explainable Machine Learning

Spectra Core is the first and only solution on the market that relies on [Explainable Machine Learning (xAI)](https://www.reversinglabs.com/blog/machine-learning-for-humans) for threat detection.
Explainable Machine Learning was launched by ReversingLabs in 2020 as a predictive threat detection method that can detect novel malware. It focuses on providing threat analysts with human-readable insights into machine learning-driven classifications.

The goal of ReversingLabs Explainable Machine Learning is to go beyond the basic verdict of "goodware vs malware", and to help analysts understand **what type of threat was found**, **why it was detected**, and **what to do with it next**. To achieve that, the classification system combines:

- **explainability** (by surfacing software behaviors in the form of indicators),
- **relevance** (by ranking behaviors based on their contribution to the final verdict),
- and **transparency** (by displaying why each software behavior was triggered).

Using natural language to provide clear explanations for classification decisions helps security analysts understand how analyzed software behaves and what malware is capable of doing to the system. This transparency fosters trust, facilitates informed decision-making, and makes the logic behind machine learning classification verdicts easier to follow.

Over the years, ReversingLabs threat analysts and researchers have carefully transformed raw code and metadata produced by static analysis into indicators - descriptions of software intent. Those indicators are used in training machine learning (ML) models to recognize if a file is malicious based on the described software functionality and behavior. Many of the threats in the training datasets are hand-picked by ReversingLabs experts and fully, correctly labeled so that ML models can learn what constitutes a specific threat type, and distinguish it from other threat types as well as from clean software. This allows ML models to proactively detect and describe threats - even brand new malware - without the need for additional training.
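
To make the idea of relevance more concrete, here is a minimal sketch of a model whose verdict can be traced back to individual indicators. It is not the ReversingLabs model: it assumes a simple logistic scorer as a stand-in, and the `WEIGHTS`, `BIAS`, and `explain` names, the numeric values, and most indicator names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-indicator weights, as a trained linear model might hold them.
# Positive values push the verdict toward "malicious", negative toward "goodware".
WEIGHTS = {
    "Encrypts files in bulk": 2.5,
    "Deletes volume shadow copies": 2.0,
    "Accesses the httpd.conf file": 0.4,
    "Has a valid code signature": -1.5,
}
BIAS = -2.0  # hypothetical baseline score for a file with no indicators


def explain(indicators: list[str]) -> tuple[float, list[tuple[str, float]]]:
    """Return (malicious probability, indicators ranked by contribution)."""
    # Unknown indicators contribute nothing in this sketch.
    contributions = [(name, WEIGHTS.get(name, 0.0)) for name in indicators]
    score = BIAS + sum(weight for _, weight in contributions)
    probability = 1.0 / (1.0 + math.exp(-score))
    # Rank by absolute contribution so the most relevant behaviors come first.
    ranked = sorted(contributions, key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked
```

In this sketch, the ranked list plays the role of the explanation: an analyst can see which triggered behaviors moved the score the most, rather than receiving only a verdict.
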
When Spectra Core scans a file and extracts some indicators from it, ML models can match them against the indicators they have learned to recognize as typical for malware or a specific threat type. Some indicators are more meaningful in the context of a malware or threat type, so they contribute more to the classification. When the model decides that something is malicious, the decision can be verified through indicators and reasons why they were triggered. This makes the decision more transparent, relevant, and explainable in terms that are familiar to human analysts.

ReversingLabs ML models are tailored to threat types to increase accuracy and [continuously improved](https://www.reversinglabs.com/blog/how-to-harden-ml-models-against-adversarial-attacks) to boost their resilience.

All classification models can detect if a file is malicious or not. The PE (Portable Executable) malware classifier is also able to provide the information on the detected threat type. The exact threat type indicates higher confidence in the classification result, while threats that get assigned a generic threat type ("Malware") may point to new, emerging malware.

The following ML models are used for malware classification:

- PE malware classifier - detects if a file is malicious (that covers all the threat types) and if it is a specific malware type (one of **Backdoor**, **Downloader**, **Infostealer**, **Keylogger**, **PUA**, **Ransomware**, **Worm**)
- Script classifiers - apply to `Text/