# File Inspection Engine Documentation > Containerized file analysis service documentation. This file contains all documentation content in a single document following the llmstxt.org standard. ## Example values.yaml file ```yaml # Default values for fie. # This is a YAML-formatted file. # Declare variables to be passed into your templates. image: repository: registry.reversinglabs.com/fie/file-inspection-engine pullPolicy: IfNotPresent # Overrides the image tag whose default is the chart appVersion. tag: "" imagePullSecrets: {} # value not used for RL registry licenseFileContent: "" # FIE license received from ReversingLabs nameOverride: "" fullnameOverride: "" podAnnotations: {} podSecurityContext: {} securityContext: {} storage: existingPvcName: "" # set name to use existing pvc size: 32Gi # min. 22Gi atm. className: gp2 rlTmpInRam: true tmpfsSize: 20Gi service: annotations: {} # To set an internal load balancer, refer to your cloud service provider documentation. type: ClusterIP port: 8000 ingress: enabled: false className: "alb" annotations: {} # There are different features supported by various Ingress controllers. Please refer to # documentation on your platform specific Ingress controller to configure it in your environment. hosts: - host: fie.local.lan paths: - path: / pathType: Prefix tls: [] # - secretName: fie-tls # hosts: # - fie.local.lan resources: requests: cpu: 8 memory: 32Gi ephemeral-storage: 100Gi nodeSelector: {} tolerations: [] affinity: {} settings: # This option is available only for alpha6 and above. Possible values: enabled / disabled / force addFileType: "disabled" # Cloud account password, used only for default registry auth cloudPassword: "" # Desired frequency of cloud threat data updates, between 1 minute and 24 hours. Use m for minutes # and h for hours, e.g. 45m or 6h. 
cloudUpdateInterval: "5m0s" # Automatic updates of cloud threat data cloudUpdates: true # Cloud account username, used only for default registry auth cloudUsername: "" # Maximum concurrent requests performing file analysis, across all HTTP endpoints. Allowed values are from 0 (unlimited) to 100 concurrencyLimit: 20 # The address and port on which the HTTP server will listen. httpAddress: ":8000" # Files larger than this will be analyzed by large Spectra Core instances (0 to 10240 MiB). When 0, there is no distinction between instances. largeFileThreshold: 10 # Set the max decompression factor to limit resource usage during decompression, with 0 meaning no limit. maxDecompressionFactor: 1.0 # The value needs to be between 1 and 10240 MiB. Uploads larger than this will be rejected. maxUploadFileSize: 2048 # The number of Spectra Core instances that will process large files (0 to 100) numberOfLargeCores: 2 # The number of Spectra Core instances that will process regular files (1 to 100) numberOfRegularCores: 4 # Whether suspicious samples should be classified as malicious paranoidMode: false # cgroup v2 memory use percentage that triggers rejection of new file uploads. Allowed values are from 0 (disabled) to 100. processingUnavailableAtMemoryPercent: 0 # The host and port of a proxy for outgoing HTTP connections. It can optionally include one of the # following three schemes: http, https, socks5. Example: socks5://host:port proxyAddress: "" # Maximum analysis time, for example 10s (seconds) or 1m (minute). The default is 0, which means unlimited. timeout: "0" # The maximum number of file layers to unpack when performing static analysis. Valid values # are from 0 (unlimited) to MaxInt32. Default 17 unpackingDepth: 17 # Includes detailed information about malicious samples in the HTTP response withThreatDetails: false # Do not look up samples in the malicious threat data. The files don't need to be present locally either. 
withoutMaliciousThreatData: false
```

---

## Air-Gapped Kubernetes Deployment

If the network topology of a Kubernetes cluster prevents access to the ReversingLabs registry and APIs, several objects must be manually transferred and uploaded to the cluster. It is crucial to have Kubernetes API access available, since `kubectl` will be used throughout this process.

**Steps:**

- **Download the threat data manually**: Use a FIE instance with internet access to download threat data.
- **Deploy FIE in production**: Deploy the production FIE application with cloud updates disabled.
- **Transfer threat data**: Copy the downloaded threat data to the air-gapped FIE instance.

To complete this process, you will need:

- The [FIE Helm Chart](./kubernetes.md#appendix-fie-helm-chart)
- The FIE container image [pulled from the ReversingLabs registry](./docker.md#pulling-the-docker-image), which must be made available to the Kubernetes cluster via a client-provided registry.
- A valid license (provided by ReversingLabs).

### Manually download the threat data

The detailed process for downloading threat data is available [here](./docker.md#manual-threat-data-synchronization).

### Deploy the FIE application

After the threat data downloads, deploy FIE using the [Helm chart](./kubernetes.md#appendix-fie-helm-chart).

#### Making the container image available to Kubernetes

To make the FIE container image available to the Kubernetes cluster, you need to [pull it from the ReversingLabs registry](./docker.md#pulling-the-docker-image) and push it to your own registry. Follow these steps:

1. **Load the image**
2. **Tag the image**
3. **Push the image to your registry**

**Note: Podman is used in this example, but the syntax should be similar if using Docker.**

#### Installing FIE Using Helm

Prepare a custom values file to configure the deployment using the [FIE Helm Chart](./kubernetes.md#install-fie-using-helm).
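The load/tag/push steps outlined above could look roughly as follows with Podman. This is a sketch: the archive name, image tag, and the `registry.example.com` destination are placeholders for your environment, not values from this document.

```bash
# Load the image from an archive saved on the internet-connected machine
# (archive name is a placeholder)
podman load -i file-inspection-engine.tar

# Tag the image for your own registry (destination is a placeholder)
podman tag registry.reversinglabs.com/fie/file-inspection-engine:latest \
  registry.example.com/fie/file-inspection-engine:latest

# Push the image so the air-gapped cluster can pull it
podman push registry.example.com/fie/file-inspection-engine:latest
```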
Consult with your Kubernetes administrator to decide how to expose the FIE service (e.g., LoadBalancer, Ingress). For the full list of available values, see the [example values.yaml](./Examples/values.md) file. In this example, we use a LoadBalancer service to expose FIE, and we override the default image repository and tag with the settings from the previous step.

**Example Configuration (configuration.yaml):**

Once you have prepared the values file, you can proceed to install the Helm chart. The Helm chart can be pushed to a chart repository, an OCI repository, or used directly as shown below:

**Example Helm installation command**

```bash
helm install fie ./fie-0.2.1.tgz --create-namespace --namespace fie-gapped \
  --set settings.cloudPassword="$RL_CLOUD_PASSWORD" \
  --values configuration.yaml --set-file licenseFileContent=RL-license.enc
```

**Expected output:**

```bash
NAME: fie
LAST DEPLOYED: Mon Aug 26 11:57:56 2024
NAMESPACE: fie-gapped
STATUS: deployed
REVISION: 1
TEST SUITE: None
```

**Note: CPU Requests: The `RL_CPU_REQUEST` environment variable exposes the number of CPU requests configured in Kubernetes (`resources.requests.cpu`) to the FIE container.**

FIE uses this value to report instance usage in `/status` as a percentage relative to the allotted CPU.

- **If deploying with our Helm chart**: the chart template maps `resources.requests.cpu` into this variable (see the [Kubernetes deployment guide](./kubernetes.md#cpu-requests)).
- **If deploying without Helm**: define `RL_CPU_REQUEST` in the container's environment section of your Pod or Deployment manifest.

### Copy the threat data

There are multiple ways to transfer the threat data to the air-gapped environment. Below is one example workflow:

1. **Download the tar package**

We will store the threat data in a .tar file. This requires the `tar` package to be installed in the FIE pod.
Since this is an air-gapped environment, the `tar` package must be downloaded externally and then transferred to the pod:

```bash
curl -O https://cdn-ubi.redhat.com/content/public/ubi/dist/ubi8/8/x86_64/baseos/os/Packages/t/tar-1.30-9.el8.x86_64.rpm
```

This is an example command; make sure you're downloading the latest available version.

2. **Upload the tar package to FIE and install it**

```bash
cat tar-1.30-9.el8.x86_64.rpm | kubectl -n fie-gapped exec -it deploy/fie -- cp /dev/stdin /tar-1.30-9.el8.x86_64.rpm
```

```bash
kubectl -n fie-gapped exec -it deploy/fie -- rpm -ihv /tar-1.30-9.el8.x86_64.rpm
```

3. **Store and transfer threat data**

Once `tar` is installed, threat data can be stored in a .tar archive and moved over to the pod:

```bash
cd /external/dir
tar cvf - * | kubectl -n fie-gapped exec -i deploy/fie -- tar xf - -C /rl/threat-data --no-same-owner
```

4. **Restart the pod**

After everything is installed and copied over, restart the pod:

```bash
kubectl -n fie-gapped rollout restart deploy/fie
```

Contact [ReversingLabs Support](mailto:support@reversinglabs.com) for more information and guidance.

---

### Get the application URL

To confirm that the File Inspection Engine is up and running, retrieve the application URL and perform a test file submission. You can follow the steps provided in the [Kubernetes Deployment guide](./kubernetes.md#get-the-application-url).

---

## Docker Deployment — File Inspection Engine

## Docker image

The File Inspection Engine Docker image can be obtained from the ReversingLabs container registry.

### Pulling the Docker image

To pull the Docker image from the ReversingLabs container registry:

1. **Log in to the Docker Registry**

Log in using your cloud username and password:

```bash
docker login registry.reversinglabs.com
```

2.
**Pull the Docker image**

Pull the `file-inspection-engine` Docker image with the specified tag:

## Running the application

The File Inspection Engine (FIE) reads its license from an environment variable called `RL_LICENSE`. This license, provided by ReversingLabs, must be passed to the application at startup. To start the application, use the following commands. In this example, the container runs on the host network, so no port mapping is needed.

**If you're not using the host network, you'll need to map the container's port to the host.** The HTTP server uses port 8000 by default, but you can change it:

- To map the port to a different host port:
- To change the HTTP port used by the container:

## Storage and mounting considerations

FIE uses two directories inside the container for storage:

1. `/rl/threat-data`, which it uses to assign file classifications.
2. `/rl/tmp`, which it uses to store file uploads, unpacked files, and file analysis reports.

The `/rl/threat-data` directory contains roughly 20 GiB for malicious data and 1 GiB for suspicious data, and additional space is needed during updates, as files are downloaded fully before replacement. Threat data synchronization starts shortly after the application is up and running and continues at regular intervals, configurable via the `--cloud-update-interval` parameter. Initial synchronization involves larger files, while subsequent updates use incremental changes (typically < 100 KB per segment). Data is divided into 256 segments per classification, and each segment may require multiple updates, which can increase the total download size to several megabytes, especially with less frequent updates. This means that a container started "bare" - without any threat data mounted at startup - will need to pull in around 20 GiB of data every time it is started.
This radically decreases the performance of FIE, so **mounting an external volume is essential**, for example:

This allows reusing threat data between containers, for example by transferring it to an [air-gapped instance](#air-gapped-manual-threat-data-synchronization). Mounting an external volume also means that you avoid the [performance costs](https://docs.docker.com/engine/storage/drivers/#copying-makes-containers-efficient) associated with writing to disk inside the container.

**Warning: Reusing threat data must be done in read-only mode.**

Individual FIE containers will, by default, continuously monitor and update their `/rl/threat-data` directory. Reusing threat data between several containers can lead to an issue with how containers interact with that directory. Therefore, make sure that containers which reuse the same source of threat data **do not write to it**. This can be accomplished by turning off [cloud updates](../../configuration/#--cloud-updates--rl_cloud_updates) for all containers which reuse the same data.

**Note**: Do not reuse threat data even when only one container is writing. Even in this case, the read-only containers could potentially use old data *while it is being updated*. Since containers are only aware of their own threat data updates, they cannot detect another container being in the middle of an update.

### Selecting the mount type

The two main factors to consider when choosing a mount type are **persistence** and **speed**. You want the `/rl/threat-data` directory to be persistent and have good read speed (as that's where the application will look when classifying files), and you want good write speed for `/rl/tmp`. If you're working directly with the threat data (as described in the [air-gapped instance](#air-gapped-manual-threat-data-synchronization) section), select a regular [bind mount](https://docs.docker.com/engine/storage/bind-mounts/). This allows you to freely interact with the downloaded data from the host system.
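Putting the mount guidance together, a read-only reuse setup could look roughly like this. This is a sketch, not an official invocation: the host path, image tag, and license file name are placeholders, and `RL_CLOUD_UPDATES=false` reflects the read-only reuse advice above.

```bash
# Sketch: run FIE reusing pre-downloaded threat data in read-only mode,
# with cloud updates disabled and a tmpfs mount for /rl/tmp.
# /external/dir, the image tag, and RL-license.enc are placeholders.
docker run --rm \
  -e RL_LICENSE="$(cat RL-license.enc)" \
  -e RL_CLOUD_UPDATES=false \
  -v /external/dir:/rl/threat-data:ro \
  --tmpfs /rl/tmp \
  -p 8000:8000 \
  registry.reversinglabs.com/fie/file-inspection-engine:latest
```

A container that maintains its own threat data would instead mount the volume read-write and leave cloud updates enabled.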
You could also select a [Docker volume](https://docs.docker.com/engine/storage/volumes/#when-to-use-volumes) if you need a persistent source of data, but do not intend to directly interact with it. For `/rl/tmp`, persistence is not important, but write speed is. A possible choice here is [tmpfs mounts](https://docs.docker.com/engine/storage/tmpfs/). This also allows the highest throughput, as the underlying static analysis engine performs a lot of disk writes, and `tmpfs` mounts are RAM-only - which means that the write speeds will be faster. Note, however, that this requires allocating more RAM than e.g. using a bind mount. ## Manual threat data synchronization The File Inspection engine retrieves updates automatically. If you want to pre-download threat data so your customers can start using it immediately, or if you prefer to manually sync the data, use the `threat-data` command included in the image. This command is also used to [download threat data in air-gapped environments](#air-gapped-manual-threat-data-synchronization). If manual threat data updates occur less than once per week, incremental updates may take longer than a full database download. Performance depends on system resources, network bandwidth, and the deployment environment. Incremental updates are recommended by default, but if they are slow, consider these factors and opt for a full download if necessary. ### Supported Options The `threat-data` command supports the following options in addition to username and password: - `RL_PARANOID_MODE` Download data collection for suspicious files. - `RL_PROXY_ADDRESS` Specify a proxy server address if you need to connect to the cloud via a proxy. - `RL_RETRY_COUNT` The number of retries if a segment fails to download during update. - `RL_LOG_JSON` Defines the log output format as either JSON or colored plain text. 
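To illustrate how these environment options combine, a manual sync could be run through Docker roughly as follows. This is a sketch: the image tag, host directory, credential values, and the `/rl/app/threat-data` binary path inside the image are assumptions drawn from this document, not an authoritative command.

```bash
# Sketch: one-off manual threat data sync into a host directory.
# All values are placeholders; RL_PARANOID_MODE and RL_RETRY_COUNT are optional.
docker run --rm \
  -e RL_CLOUD_USERNAME="u/example/fie" \
  -e RL_CLOUD_PASSWORD="$RL_CLOUD_PASSWORD" \
  -e RL_PARANOID_MODE=true \
  -e RL_RETRY_COUNT=5 \
  -v /external/dir:/rl/threat-data \
  registry.reversinglabs.com/fie/file-inspection-engine:latest \
  /rl/app/threat-data sync /rl/threat-data
```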
### Sync Command

To manually sync the threat data, use the `sync` sub-command, which requires specifying the threat data directory:

```bash
./threat-data sync /threat/data/dir
```

To execute this via Docker, run:

If you need to treat suspicious files as malicious, make sure to set the `RL_PARANOID_MODE` option to `true` in the command.

**Important**:

- The `threat-data` command only supports configuration via environment variables.
- We recommend pre-downloading the threat data once and including it in your distribution for multiple users, as a full threat data download is more resource-intensive than incremental updates.
- **Do not** run the `threat-data` command concurrently with the application if both are accessing the same directory.
- **Always use the `threat-data` binary from the same FIE version** as your running container. Older binaries are not compatible with newer threat databases. Extract the binary from the container if needed: `docker cp <container>:/rl/app/threat-data ./threat-data` (replace `<container>` with your container name or ID).

## Air-gapped manual threat data synchronization

For air-gapped environments, follow the process below to synchronize threat data. First, download the threat data on a machine with internet access, then transfer the data to the air-gapped instance.

1. Start a File Inspection Engine (FIE) instance on a machine with internet access. Once the data sync is complete, stop the FIE instance that was used for downloading, and then proceed to step 2. Alternatively, run the following command to manually sync the threat data:

   `/external/dir` represents the path on the host system where the threat data is stored. If the directory contains older threat data, it will be incrementally updated.

   **Note: If using paranoid mode, set the environment variable `RL_PARANOID_MODE=true`.**

   Upon successful synchronization, the log should show `Threat data fully updated`. In case of errors, rerun the command to retry. Proceed to step 2.

2.
Stop a production FIE instance (or create a new one) in the air-gapped environment. 3. Copy the threat data from `/external/dir` on the internet-connected machine to the corresponding threat data directory used by the air-gapped FIE instance. Ensure that the transferred data is placed in the directory where the application would normally download it if it were online. For further assistance, contact [ReversingLabs Support](mailto:support@reversinglabs.com). 4. Restart or deploy the air-gapped FIE instance with the updated threat intelligence data. --- ## Deploying File Inspection Engine: Requirements & Architecture ## Hardware Requirements To handle files of up to 2 GB, we recommend the following: - **Memory (RAM)**: - Provision **at least 32 GB of RAM**. File processing may require up to 8 times the file size in RAM, especially to accommodate large file handling and concurrent requests. - **Disk Size**: - Allocate **at least 100 GB of disk space** to support scanning of larger files and threat intelligence database storage requirements. ### Production Deployment A single deployment can handle files of all sizes using multiple Spectra Core instances. - The total number of instances is configured with `--number-of-regular-cores` and `--number-of-large-cores`. - All instances are identical. The "large" group is simply a reserved subset of instances that only process files above the configured threshold (`--large-file-threshold`). - If no large instances are configured (set to 0), all files are processed by the regular pool. Each Spectra Core instance consumes memory even when idle (approximately 1.4 GB per instance), so factor this into your capacity planning. This approach improves CPU utilization and throughput while keeping the deployment architecture simple. --- ## Kubernetes Deployment — File Inspection Engine ## Introduction A typical File Inspection Engine (FIE) installation is performed on Kubernetes using Helm. 
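Before sizing cluster nodes, the figures quoted in the hardware requirements above (roughly 1.4 GB of idle memory per Spectra Core instance, and up to 8 times the file size in RAM during processing) can be combined into a quick estimate. This is a back-of-the-envelope sketch using the documented defaults, not an official sizing formula:

```python
# Rough FIE memory estimate from the figures quoted in this document.
# NOT an official sizing formula -- just the documented numbers combined.
IDLE_GB_PER_INSTANCE = 1.4   # approximate idle footprint per Spectra Core instance
REGULAR_CORES = 4            # --number-of-regular-cores default
LARGE_CORES = 2              # --number-of-large-cores default
MAX_FILE_GB = 2              # largest file you expect to scan
PROCESSING_FACTOR = 8        # "up to 8 times the file size in RAM"

idle_gb = (REGULAR_CORES + LARGE_CORES) * IDLE_GB_PER_INSTANCE
peak_gb = idle_gb + MAX_FILE_GB * PROCESSING_FACTOR

print(f"idle footprint: {idle_gb:.1f} GB")   # 8.4 GB with the defaults
print(f"peak estimate:  {peak_gb:.1f} GB")   # 24.4 GB while one 2 GB file is processed
```

The 32 GB recommendation above leaves headroom beyond this estimate for concurrent requests and the operating system.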
Throughout this document, we'll be using Google Kubernetes Engine (GKE) as an example. For managed Kubernetes solutions, you may also need to use vendor-specific tools to interact with the cluster. In our example, this will be `gcloud`. To install `gcloud`, follow [these steps](https://cloud.google.com/sdk/docs/install). ------ ## Deploying FIE Helm Chart to GKE Here is an overview of deploying the FIE Helm chart to a GKE cluster: ### Prerequisites - A GKE cluster is available. - `kubectl` is configured to work with your cluster. - Helm is installed. ------ ### Example: Configuring `kubectl` for a Specific Cluster 1. **List Available GKE Clusters:** ```bash gcloud container clusters list ``` **Example Output:** ```bash NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS gke-autopilot-ado-dev us-east4 1.28.8-gke.1095000 35.199.55.139 e2-small 1.28.8-gke.1095000 2 RUNNING ``` 2. **Get Cluster Credentials:** Run the following command to fetch cluster endpoint and authentication data: ``` gcloud container clusters get-credentials gke-autopilot-ado-dev --region us-east4 ``` **Output:** ```bash Fetching cluster endpoint and auth data. kubeconfig entry generated for gke-autopilot-ado-dev. ``` ------ The FIE Helm chart requires valid Spectra Intelligence credentials, which will be provided by ReversingLabs. ## Install FIE Using Helm The examples provided use a placeholder account (`u/example/fie`). Be sure to replace this with your actual credentials wherever applicable. ### Customize the Installation with a Values File For the full list of available values, see the [example values.yaml](./Examples/values.md) file. You can modify values such as ingress or storage class according to your needs. This example exposes the application internally using a load balancer service. ### Set the password and install the Helm Chart 1. Store the password in a variable: ```bash read -rs SPECTRA_INTELLIGENCE_PASSWORD ``` 2. 
Log in to the ReversingLabs container registry: ```bash echo "${SPECTRA_INTELLIGENCE_PASSWORD}" | helm registry login -u "u/example/fie" --password-stdin registry.reversinglabs.com ``` 3. Install the Helm chart: ```bash $ helm install fie oci://registry.reversinglabs.com/fie/charts/fie \ --create-namespace --namespace fie \ --set settings.cloudPassword="${SPECTRA_INTELLIGENCE_PASSWORD}" \ --values values-deploy-example-gcp.yaml \ --set-file licenseFileContent=rl-license.enc ``` **Expected Output:** ```bash Pulled: registry.reversinglabs.com/fie/charts/fie:0.2.1 Digest: sha256:61ed7f0761912cc5052ceac1d71654f3c1f89f543df0ab6ae3d199070ab02084 NAME: fie LAST DEPLOYED: Tue May 28 11:20:34 2024 NAMESPACE: fie STATUS: deployed REVISION: 1 TEST SUITE: None ``` ### CPU Requests The `/status` endpoint shows both counts (`available_*`) and percentages (`percentage_*`) for Spectra Core instances. The percentages are based on the value of the `RL_CPU_REQUEST` environment variable. - **Helm deployments**: Helm automatically maps `resources.requests.cpu` into this variable. No extra configuration is needed. - **Non-Helm deployments**: You must set `RL_CPU_REQUEST` yourself. This does not affect how many Spectra Core instances are created or how Kubernetes schedules the pod. It only affects how percentages are reported in `/status`. You can provide this value in three ways: 1. **Command-line flag** ```yaml args: ["--cpu-request=8"] ``` 2. **Environment variable** ```yaml env: - name: RL_CPU_REQUEST value: "8" ``` 3. **Kubernetes Downward API** ```yaml resources: requests: cpu: "8" env: - name: RL_CPU_REQUEST valueFrom: resourceFieldRef: containerName: fie resource: requests.cpu ``` In this last example, the Pod requests 8 CPUs, and Kubernetes injects that value into the container as `RL_CPU_REQUEST`. FIE then uses it only for calculating the `percentage_*` fields in `/status`. 
------

### Get the application URL

After deployment, obtain the application URL and port by running one of the following commands:

1. **LoadBalancer IP:**

> **Note:** It may take a few minutes for the LoadBalancer IP to be available. You can watch the status by running `kubectl get --namespace fie svc -w fie`.

```bash
export SERVICE_IP=$(kubectl get svc --namespace fie fie \
  --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
echo http://$SERVICE_IP:8000
```

2. **Verify the deployment**

```bash
kubectl -n fie get svc/fie
```

**Expected output**

```bash
NAME   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
fie    LoadBalancer   34.118.234.76   10.128.0.24   8000:32132/TCP   32m
```

---

Once you have the service IP and port, you can send a test query to the liveness/readiness endpoints:

```bash
curl -v http://10.128.0.24:8000/livez
curl -v http://10.128.0.24:8000/readyz
```

**Note: Check the [Usage](/docs/FileInspectionEngine/usage.md#check-application-liveness) section for more information on these two endpoints.**

---

Alternatively, submit a file for analysis. This only works after the [threat data is fully downloaded](#monitoring-the-threat-data-download):

```bash
curl -sS -X POST --upload-file eicar.com http://10.128.0.24:8000/scan | jq
```

**Expected output**

```json
{
  "classification": "malicious",
  "message": ""
}
```

------

### Monitoring the Threat Data Download

After deployment, the FIE application will begin downloading threat data. This process can take between 30 and 90 minutes, depending on your network speed.
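While waiting, a simple loop can poll the liveness endpoint to confirm the pod is responding. This is a sketch with a placeholder address; note that liveness only means the process is up, not that the threat data download has finished (use the log message described in this section for that):

```bash
# Sketch: wait for FIE to respond on its liveness endpoint.
# SERVICE_IP is a placeholder for the address obtained earlier.
until curl -fsS "http://$SERVICE_IP:8000/livez" > /dev/null; do
  echo "FIE not responding yet, retrying in 30s..."
  sleep 30
done
echo "FIE is live"
```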
You can monitor the download process by running:

```bash
kubectl -n fie logs -f deploy/fie
```

Look for the following message, which indicates that the download process completed successfully:

```json
{"level":"info","component":"threatdata.UpdateManager","time":"2024-09-18T22:32:58.346353125Z","message":"Cloud update run finished"}
```

## Appendix: FIE Helm Chart

### Download the Helm Chart

To download the Helm Chart, run the following commands:

```bash
helm registry login -u "$RL_CLOUD_USERNAME" -p "$RL_CLOUD_PASSWORD" registry.reversinglabs.com
helm pull oci://registry.reversinglabs.com/fie/charts/fie
```

---

## File Inspection Engine Configuration Reference

The `--help` flag output lists all available command-line options along with their default values. **Note that default values may be overridden by environment variables.**

- **Boolean flags** must include an equals sign (=) when explicitly set to `true`/`1` or `false`/`0` (e.g., `--cloud-updates=true`, `--cloud-updates=false`). Alternatively, to enable a boolean flag, specify the flag name alone (e.g., `--cloud-updates` for `true`). Non-boolean flags don't need an equals sign; both `--timeout 10s` and `--timeout=10s` are fine.
- **Time Duration Options:** For configuration options containing time durations, the supported suffixes are `ms`, `s`, `m`, and `h` (e.g., `--timeout 10s` or `--cloud-update-interval 1m`).

**Info: Environment variables:** Command-line flags can also be passed as environment variables by using `RL_` as the prefix and replacing the dashes between words with underscores. For example, `--http-address` becomes the environment variable `RL_HTTP_ADDRESS`.

When a default value is not mentioned in the help output, it is empty (either an empty string or `false` for boolean options).

## Configuration options

### `RL_LICENSE`

- **Description**: Set the contents of your license file.
- **Default**: N/A - **Notes**: This option is **only** available as an environment variable. --- ### `RL_RETRY_COUNT` - **Description:** Configures the maximum number of retries for failed threat data segment downloads when using the `threat-data` command. - **Default:** 3 - **Possible Values:** 0 to 100 --- ### `--add-file-type` / `RL_ADD_FILE_TYPE` - **Description**: Controls whether `file_type` information is returned in the [`/scan` API response](./usage.md#file-submissions). - **Default**: `disabled` - **Possible Values**: `disabled`, `enabled`, `force` - **Notes**: - When `enabled`, the correct `file_type` will only be returned if static analysis was performed. - When `force` is set, static analysis is always performed. --- ### `--concurrency-limit` / `RL_CONCURRENCY_LIMIT` - **Description**: Maximum concurrent requests when performing file analysis, across all HTTP endpoints. - **Default**: 20 - **Possible Values**: From `0` (unlimited) to `100` - **Notes**: Even if the limit is set to 0 (unlimited), the system will still track the number of active concurrent requests. The `active_concurrency` field will always reflect the actual number of concurrent requests, regardless of the limit setting. The `active_concurrency` field is logged in the format: `active_concurrency={value}/{limit}`. --- ### `--cpu-request` / `RL_CPU_REQUEST` - **Description**: Informs the application how many CPUs were requested for the container. This value does not control how many Spectra Core instances are created. Those are configured explicitly with `--number-of-regular-cores` and `--number-of-large-cores`. Instead, it is used only for reporting in the `/status` endpoint. - **Default**: Not set (FIE will use the total number of CPUs detected on the node). - **Notes**: - When set, the `percentage_*` fields in the `/status` endpoint are calculated relative to this value. - The `available_*` fields show counts of available instances and are not affected by this value. 
- This option is most useful in Kubernetes, where you typically specify `resources.requests.cpu`. Docker does not have a concept of CPU requests. - You can provide the value in three ways: 1. Command-line flag: ```yaml args: ["--cpu-request=8"] ``` 2. Environment variable: ```yaml env: - name: RL_CPU_REQUEST value: "8" ``` 3. Kubernetes Downward API (avoids duplicating the number): ```yaml env: - name: RL_CPU_REQUEST valueFrom: resourceFieldRef: containerName: fie resource: requests.cpu ``` --- ### `--proxy-address` / `RL_PROXY_ADDRESS` - **Description**: Specifies the address of a proxy server for contacting the cloud API. - **Default**: N/A - **Possible Values**: - `https://host:port` - `http://host:port` - `socks5://host:port` - **Notes**: You can include credentials in the proxy URL, for example: - `http://user:password@localhost:8080` --- ### `--cloud-update-interval` / `RL_CLOUD_UPDATE_INTERVAL` - **Description**: Sets how frequently the application checks for cloud threat data updates. - **Default**: `5m` - **Possible Values**: From `1m` to `24h` (e.g., `45m`, `6h`) --- ### `--cloud-updates` / `RL_CLOUD_UPDATES` - **Description**: Enables or disables automatic updates for threat data. Cloud updates are automatically disabled when `--without-malicious-threat-data` is set to `true` and `--paranoid-mode` is set to `false`, as threat data is not used in that case. - **Default**: `true` - **Possible Values**: `true`, `false` --- ### `--http-address` / `RL_HTTP_ADDRESS` - **Description**: Defines the host and port for the HTTP server. - **Default**: :8000 - **Possible Values**: - Port only. Example: `:9000` - Host and port. Example: `127.0.0.1:8080` --- ### `--log-json` / `RL_LOG_JSON` - **Description**: Defines the log output format as either JSON or colored plain text. 
- **Default**: `true` - **Possible Values**: `true`, `false` --- ### `--max-decompression-factor` / `RL_MAX_DECOMPRESSION_FACTOR` - **Description:** Spectra Core has a set of mechanisms that protect the user from intentional or unintentional archive bombs, ranging from checks that prevent a file from making identical copies of itself during unpacking, to the maximum allowed decompression ratio for any given file. These protection measures enable the engine to terminate the archive decompression if the size of unpacked content exceeds a set quota. The maximum decompression ratio is calculated as ``` MaximumDecompressionFactor * (1000 / ln(1 + InputFileSize * pow(10, -5))) ``` where `InputFileSize` must be in bytes. To calculate the maximum decompressed file size, multiply this ratio by the `InputFileSize`. In practice, this means that the unpacking will stop once the size of all extracted content exceeds the theoretical maximum of the best performing compression algorithm. - **Default**: 1.0 - **Notes:** When a file exceeds the decompression ratio, the unpacking will stop and the partially unpacked content will be sent for analysis. If set to a negative value, a warning is printed, and the value defaults to 1.0. Setting this to 0 disables decompression management, but this is strongly discouraged as it leaves the system vulnerable to resource exhaustion attacks. --- ### `--max-upload-file-size` / `RL_MAX_UPLOAD_FILE_SIZE` - **Description**: Maximum file size (in MiB) the application will accept. - **Default**: 100 - **Minimum**: 1 - **Maximum**: 10240 --- ### `--number-of-regular-cores` / `RL_NUMBER_OF_REGULAR_CORES` - **Description**: Configures how many Spectra Core instances are allocated to handle files up to the size threshold (`--large-file-threshold`). 
- **Default**: 4 - **Possible Values**: 1-100 --- ### `--number-of-large-cores` / `RL_NUMBER_OF_LARGE_CORES` - **Description**: Configures how many Spectra Core instances are reserved for files larger than the size threshold (`--large-file-threshold`). - **Default**: 2 - **Possible Values**: 0-100 - **Notes**: If set to 0, no instances are reserved for large files, and all files are processed by the pool of "regular" instances. --- ### `--large-file-threshold` / `RL_LARGE_FILE_THRESHOLD` - **Description**: File size threshold (in MiB) that determines when a file is routed to the reserved large-file instances. - **Default**: 10 - **Possible Values**: 0-10240 - **Routing rules**: - Files **larger than** the threshold go to the large-file instances. - Files **equal to or smaller than** the threshold stay in the regular pool. - **Notes**: - When set to 0, size-based routing is disabled and all files are distributed across available instances. In this case, the system routes files to the instance with the fewest active analyses, rather than using file size. - File size is only an approximation of processing cost. Real resource usage depends on file complexity (number of unpacked children, nesting depth). Choosing an optimal threshold and timeout may require experimentation based on your workload. --- ### `--paranoid-mode` / `RL_PARANOID_MODE` - **Description**: Enables an additional classification for suspicious files, allowing them to be flagged as `suspicious` instead of `OK`. With this option, the possible response classifications are `OK`, `malicious` (if malicious threat data is not disabled), and `suspicious`. - **Default**: `false` - **Possible Values**: `true`, `false` - **Notes**: - Requires an additional 1 GiB of cloud threat data for suspicious classification. - When malicious or suspicious threat data is enabled, goodware classification is automatically enabled as well and requires 64 MiB of threat data. 
Goodware classification cannot be directly enabled or disabled.

---

### `--processing-unavailable-at-memory-percent` / `RL_PROCESSING_UNAVAILABLE_AT_MEMORY_PERCENT`

- **Description**: Defines the memory usage threshold (in percent) at which the application will reject new file uploads and return an error on the `/readyz` endpoint. This helps prevent overloading the system when memory usage is high. For example, to reject uploads once memory usage reaches 80%, use: `--processing-unavailable-at-memory-percent=80`.
- **Default**: 0 (disabled)
- **Possible Values**: 0–100
- **Notes**: The threshold is based on `cgroup v2` memory usage within the container. If your system doesn't support `cgroup v2`, you can disable this feature by setting the parameter to `0`.

---

### `--with-threat-details` / `RL_WITH_THREAT_DETAILS`

- **Description**: Determines whether detailed threat information is included in the JSON HTTP response for malware classification.
- **Default**: `false`
- **Possible Values**: `true`, `false`
- **Notes**: Slows down the response because it contacts the cloud API by submitting the file hash to Spectra Intelligence. If no additional threat information is available, the `threat_details` property won't be present.

---

### `--unpacking-depth` / `RL_UNPACKING_DEPTH`

- **Description**: The maximum number of file layers to unpack when performing static analysis.
- **Default**: `17`
- **Possible Values**: From `0` (unlimited) to MaxInt32.

---

### `--timeout` / `RL_TIMEOUT`

- **Description**: Configures the maximum analysis time as a duration value. The countdown starts when a Spectra Core instance begins processing a file.
- **Default**: 0 (unlimited)
- **Examples**: `--timeout=30s`, `--timeout=5m`, `--timeout=1h`
- **Notes**:
  - When the timeout is reached, the Spectra Core instance is terminated and restarted.
  - If the instance was processing multiple files, all analyses are aborted.
  - Logs contain information about which files were impacted.
  - Before restart, the instance cleans up its temporary files. Restart time depends on the number of files and disk performance, but typically takes a few seconds.
  - Because restart takes time, very short timeout values are not recommended.

---

### `--without-malicious-threat-data` / `RL_WITHOUT_MALICIOUS_THREAT_DATA`

- **Description**: Allows the application to run without downloading malicious threat data. When enabled, malicious threat data updates are disabled. If `--paranoid-mode` is also enabled, suspicious threat data will still be downloaded. When both malicious and suspicious threat data are disabled, files are classified based purely on static analysis.
- **Default**: `false`
- **Possible Values**: `true`, `false`
- **Notes**: When malicious or suspicious threat data is enabled, goodware classification is automatically enabled as well and requires 64 MiB of threat data. Goodware classification cannot be directly enabled or disabled.

---

## Getting started with File Inspection Engine

This guide walks you through running File Inspection Engine (FIE) locally with Docker and scanning your first file via the HTTP API.

**What you'll accomplish:**

- Pull and run the FIE container
- Verify the engine is ready
- Submit your first file for analysis
- Understand the classification response

## Prerequisites

Before you begin:

- Docker installed and running on your machine
- A ReversingLabs license file for FIE (`.lic`)
- Your ReversingLabs cloud credentials (username and password) for registry access
- `curl` installed

**Tip: Obtaining your license and credentials**

Your FIE license file and cloud credentials are provided by [ReversingLabs Support](mailto:support@reversinglabs.com) when your FIE subscription is activated. If you do not have them, contact support before proceeding.
## Step 1: Pull the container image Log in to the ReversingLabs container registry using your cloud username and password, then pull the FIE image: ```bash docker login registry.reversinglabs.com docker pull registry.reversinglabs.com/fie/file-inspection-engine:latest ``` ## Step 2: Start the container Run FIE with your license passed as an environment variable: ```bash docker run -d \ --name fie \ -p 8000:8000 \ -e RL_LICENSE="$(cat /path/to/rl-license.lic)" \ registry.reversinglabs.com/fie/file-inspection-engine:latest ``` On first start, FIE downloads its threat database. Wait for the container to become ready: ```bash curl http://localhost:8000/readyz ``` When ready, this returns `200 OK`. The download may take a few minutes depending on your network speed. ## Step 3: Scan your first file Submit a file for analysis using the `/scan` endpoint: ```bash curl -X POST --upload-file /path/to/your/sample.exe \ http://localhost:8000/scan ``` FIE analyzes the file synchronously and returns a JSON verdict in the same response. ## Step 4: Interpret the response A typical response looks like: ```json { "classification": "OK", "message": "", "errors": [] } ``` The `classification` field indicates the verdict: | Value | Meaning | | --- | --- | | `OK` | File is goodware or unknown — no threat detected | | `malicious` | File is classified as malicious | | `suspicious` | File shows suspicious indicators (only when paranoid mode is enabled) | ### Getting additional threat details To get threat details alongside the classification verdict, start FIE with the `--with-threat-details` and `--add-file-type` flags. 
The enriched response includes the threat name, platform, and file type: ```json { "classification": "malicious", "message": "", "errors": [], "threat_details": { "platform": "Script", "type": "Trojan", "threat_name": "Script-JS.Trojan.Redirector" }, "file_type": "Text" } ``` ## Next steps Now that you've scanned your first file, explore the full capabilities of FIE: - **[Usage Guide](./usage.md)** — Complete API reference including all endpoints, hash lookups, status monitoring, error handling, and classification overrides - **[Configuration Reference](./configuration.md)** — CLI flags for core instances, timeouts, file size limits, and advanced options - **[Kubernetes Deployment](./Deployment/kubernetes.md)** — Deploy FIE at scale on Kubernetes - **[Air-Gapped Deployment](./Deployment/air-gapped-kubernetes.md)** — Offline environment setup For troubleshooting common issues or understanding response codes, see the [Usage Guide](./usage.md#possible-response-status-codes). --- ## File Inspection Engine File Inspection Engine (FIE) is a containerized file analysis service that performs synchronous, real-time scanning of files via an HTTP API. It is designed for integration into network security pipelines where files must be inspected inline — each request submits a file, waits for analysis to complete, and receives a verdict in the same response. FIE uses [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) for static file analysis, enabling deep inspection of over 400 file formats without executing files. It is deployed as an OCI-compliant container on Docker or Kubernetes and maintains a local threat database, so file content never leaves your infrastructure during scanning. 
## Key capabilities - Synchronous HTTP API — submit a file, receive a classification verdict in one request - Containerized deployment on Docker or Kubernetes (no agent installation required) - Local threat database — all file analysis happens on-premises - Configurable Spectra Core instances for throughput scaling - Large file handling with dedicated core pools - Optional enrichment with cloud threat details via [Spectra Intelligence](/SpectraIntelligence/) hash lookups (hash only, no file upload) ## Privacy File Inspection Engine keeps all file data on-premises. Files submitted for scanning are processed locally using a bundled threat database and are not uploaded to external services. When the `--with-threat-details` option is enabled, FIE contacts Spectra Intelligence using the file hash only — the file itself is never transmitted. The threat database is updated on a regular schedule from ReversingLabs infrastructure. ## Deployment options Choose a deployment model based on your infrastructure: - [Deployment Overview and Hardware Requirements](./Deployment/) — capacity planning, Spectra Core instance sizing, and networking prerequisites - [Docker Deployment](./Deployment/docker.md) — standalone container setup for development and smaller deployments - [Kubernetes Deployment](./Deployment/kubernetes.md) — Helm chart installation for production Kubernetes clusters - [Air-Gapped Kubernetes](./Deployment/air-gapped-kubernetes.md) — deployment in offline or restricted network environments - [Helm Values Reference](./Deployment/Examples/values.md) — complete chart configuration reference ## Configuration FIE is configured via CLI flags and environment variables. Key settings include the number of Spectra Core instances, analysis timeouts, maximum file size, and network interface bindings. See the [Configuration Reference](./configuration.md) for all available options. 
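As an illustration of the convention just described, every `--some-flag` in the configuration reference has a matching `RL_SOME_FLAG` environment variable. The following sketch assembles a `docker run` command from a settings map; the chosen values and image tag are placeholders, not tuning recommendations:

```python
# Sketch: build a `docker run` command for FIE from a settings map.
# Env var names follow the RL_<FLAG> convention from the configuration
# reference; the values below are placeholders, not recommendations.
import shlex

def docker_run_command(settings: dict, image: str) -> str:
    parts = ["docker", "run", "-d", "-p", "8000:8000"]
    for key, value in settings.items():
        # e.g. "timeout" -> "RL_TIMEOUT"
        parts += ["-e", f"RL_{key.upper()}={value}"]
    parts.append(image)
    return shlex.join(parts)

cmd = docker_run_command(
    {"number_of_regular_cores": 4, "timeout": "5m", "max_upload_file_size": 2048},
    "registry.reversinglabs.com/fie/file-inspection-engine:latest",
)
print(cmd)
```

The printed command can be pasted into a shell, with the `RL_LICENSE` variable added as shown in the getting started guide.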
## Usage and API

FIE exposes a REST API for file submission, result retrieval, status monitoring, and classification overrides.

See the [Usage Guide](./usage.md) for API endpoint documentation, scanning workflows, response formats, and error handling.

## Related resources

- [Spectra Core Analysis](/General/AnalysisAndClassification/SpectraCoreAnalysis) — how the underlying static analysis engine works
- [Classification](/General/AnalysisAndClassification/Classification) — risk scores, threat levels, and classification methodology
- [Platform Requirements](/General/DeploymentAndIntegration/PlatformRequirements) — hardware sizing across all ReversingLabs products

---

## File Inspection Engine Troubleshooting Guide

# Troubleshooting

This guide covers common issues with [File Inspection Engine](./index.md) (FIE) and the steps to resolve them.

---

## Container exits immediately on startup

**Symptom**

The FIE container starts and exits within a few seconds. `docker ps` shows it in an `Exited` state. The container never becomes ready to accept requests.

**Cause**

- A fatal configuration error occurred during initialization, such as a missing or malformed `RL_LICENSE` environment variable.
- A required CLI flag has an invalid value (for example, an invalid duration format for `--timeout`).
- The container cannot bind to the configured HTTP port because it is already in use.
- The static analysis engine (Spectra Core) failed to initialize due to insufficient resources.

**Solution**

1. Inspect the container logs immediately after exit:
   ```bash
   docker logs <container_name>
   ```
   or for a Kubernetes pod:
   ```bash
   kubectl logs <pod_name> --previous -n <namespace>
   ```
2. Look for startup error messages. A missing license produces:
   ```
   FATAL: License validation failed
   ```
   Confirm that the `RL_LICENSE` environment variable is set and contains the full license file content:
   ```bash
   docker run -e RL_LICENSE="$(cat /path/to/license.lic)" \
     registry.reversinglabs.com/fie/file-inspection-engine:<version>
   ```
3.
Check for port conflicts if the log shows a bind error:
   ```
   Error: listen tcp :8000: bind: address already in use
   ```
   Change the host port mapping or stop the process occupying the port:
   ```bash
   docker run -p 9000:8000 ...  # Map to a different host port
   ```
4. Review the [Configuration Reference](./configuration.md) to verify that all provided flags use correct formats. Boolean flags require `=true` or `=false` when explicitly set (for example, `--cloud-updates=false`, not `--cloud-updates false`).
5. For Kubernetes deployments, check the Helm values for misconfigured environment variables or resource limits that are too low. See the [Helm Values Reference](./Deployment/Examples/values.md).

---

## `/readyz` returns 503 — not ready

**Symptom**

After starting the container, polling the readiness endpoint returns a non-200 status:

```bash
curl http://localhost:8000/readyz
# Returns 503 or another 5xx/4xx status
```

The container is running but not accepting file submissions.

**Cause**

- Threat data has not finished downloading. FIE requires the threat database to be available before it becomes ready, as described in [Starting the File Inspection Engine](./usage.md#starting-the-file-inspection-engine).
- All Spectra Core instances are currently busy (high load or concurrency limit reached).
- The license is invalid or has expired, preventing the engine from completing initialization.

**Solution**

1. Check the container logs for readiness-related messages:
   ```bash
   docker logs -f <container_name>
   ```
   On first startup, you will see threat data download progress. Wait for the download to complete. The container logs `Instance is ready` when at least one analysis instance is available:
   ```json
   {"level":"info","process":"fie","instance_id":"core-regular-0.abc12","message":"Instance is ready"}
   ```
2.
Check the `/status` endpoint for more detail on the current state:
   ```bash
   curl http://localhost:8000/status
   ```
   Review the `license.valid_until` field and the `spectra_core.available_regular_cores` field. If `available_regular_cores` shows `0/N`, all instances are busy or failed to initialize.
3. If the license has expired, update the `RL_LICENSE` environment variable and restart the container. See [License validation error on startup](#license-validation-error-on-startup).
4. For the `/readyz` endpoint behavior under load, see [Request Rejection](./usage.md#request-rejection). The endpoint returns a non-200 status when memory or concurrency limits are exceeded — this is expected behavior, not a fault.

---

## Threat database download fails or is slow

**Symptom**

The container starts but remains in a not-ready state for a long time. Logs show repeated download failures or slow progress:

```json
{"level":"warn","process":"fie","message":"Failed to download threat data segment, retrying"}
```

Or the threat data download does not start at all.

**Cause**

- Outbound HTTPS connectivity from the container to the ReversingLabs update infrastructure is blocked by a firewall or requires a proxy.
- The license does not include the appropriate threat data entitlement.
- The `--without-malicious-threat-data` flag is not set, but network access to the update server is unavailable.
- The container's DNS is not resolving the update server hostname.

**Solution**

1. Test outbound connectivity from inside the container:
   ```bash
   docker exec <container_name> curl -I https://updates.reversinglabs.com
   ```
   If this fails, the container cannot reach the update server. Check firewall egress rules and ensure that outbound HTTPS (port 443) is permitted.
2. If a proxy is required, configure it using the `--proxy-address` flag or the `RL_PROXY_ADDRESS` environment variable:
   ```bash
   docker run -e RL_PROXY_ADDRESS="http://proxy.company.internal:8080" \
     -e RL_LICENSE="..." \
     registry.reversinglabs.com/fie/file-inspection-engine:<version>
   ```
3. For air-gapped environments, use the offline threat data download process. See [Air-Gapped Kubernetes Deployment](./Deployment/air-gapped-kubernetes.md) for the procedure to pre-load threat data without internet connectivity.
4. The `RL_RETRY_COUNT` environment variable controls how many times FIE retries failed segment downloads (default: 3). For flaky connections, increase this value:
   ```bash
   docker run -e RL_RETRY_COUNT=10 ...
   ```
5. If you want to run FIE without downloading malicious threat data (relying on static analysis only), set `--without-malicious-threat-data=true`. See the [Configuration Reference](./configuration.md) for the implications of this option.

---

## Analysis returns 503 Service Unavailable

**Symptom**

`POST /scan` requests return:

```http
HTTP/1.1 503 Service Unavailable
```

or:

```http
HTTP/1.1 429 Too Many Requests

{"error":"The concurrency limit has been reached"}
```

or:

```http
HTTP/1.1 429 Too Many Requests

{"error":"Analysis not accepted due to high processing load"}
```

**Cause**

- All Spectra Core instances are busy processing other files (high load).
- The concurrency limit configured with `--concurrency-limit` has been reached.
- Memory usage has exceeded the `--processing-unavailable-at-memory-percent` threshold.

**Solution**

1. Review the [response status codes](./usage.md#possible-response-status-codes). A 429 with `"The concurrency limit has been reached"` means too many simultaneous requests are active; retry after a short delay.
2. Monitor the `/status` endpoint to see current instance availability:
   ```bash
   curl http://localhost:8000/status | python3 -m json.tool | grep -A4 "spectra_core"
   ```
   The `available_regular_cores` and `available_large_cores` fields show how many instances are currently free.
3. Implement retry logic with backoff in your client for 429 responses. Do not retry at a constant rate under load — this worsens congestion.
4.
Increase the number of Spectra Core instances (`--number-of-regular-cores`) to handle higher concurrency, subject to available CPU and memory. See the [Configuration Reference](./configuration.md).
5. Check logs for the high-load indicators described in [Logging](./usage.md#logging):
   ```json
   {"level":"warn","process":"core","message":"High processing load"}
   ```
   Wait for `"High processing load over"` before resuming normal submission rates.
6. For sustained high throughput, consider deploying multiple FIE instances behind a load balancer, with each instance's `/readyz` endpoint used as the health check.

---

## Port binding conflict

**Symptom**

The container fails to start with an error in the logs:

```
Error: listen tcp :8000: bind: address already in use
```

or Docker reports:

```
docker: Error response from daemon: driver failed programming external connectivity: Bind for 0.0.0.0:8000 failed: port is already allocated.
```

**Cause**

- Another process on the host is already using port 8000 (the default FIE HTTP port).
- A previous FIE container is still running and holding the port.
- The Docker daemon has reserved the port range that includes 8000.

**Solution**

1. Identify what is using the port:
   ```bash
   sudo lsof -i :8000
   sudo ss -tlnp | grep 8000
   ```
2. If an old FIE container is occupying the port, stop it:
   ```bash
   docker ps -a | grep fie
   docker stop <container_name>
   docker rm <container_name>
   ```
3. Map the container to a different host port without changing the internal port:
   ```bash
   docker run -p 9001:8000 \
     -e RL_LICENSE="..." \
     registry.reversinglabs.com/fie/file-inspection-engine:<version>
   ```
4. To change the port the FIE process listens on internally, use the `--http-address` flag:
   ```bash
   docker run -p 9001:9001 \
     -e RL_HTTP_ADDRESS=":9001" \
     -e RL_LICENSE="..." \
     registry.reversinglabs.com/fie/file-inspection-engine:<version>
   ```
   See the [Configuration Reference](./configuration.md) for the `--http-address` option.
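If you prefer a scriptable check before choosing a port mapping, the following sketch tests whether a candidate host port can be bound; the ports shown are examples:

```python
# Sketch: check whether a host port is free before mapping it with -p.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can be bound, i.e. nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

if __name__ == "__main__":
    for candidate in (8000, 9001):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```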
---

## Out of memory (OOM) — container killed

**Symptom**

The container is killed abruptly during analysis. Docker events or Kubernetes events show:

```
OOMKilled
```

or the host `dmesg` contains:

```
Out of memory: Kill process <pid> (fie) score <score> or sacrifice child
```

**Cause**

- The container memory limit is too low for the number of Spectra Core instances and the file types being analyzed.
- Files with very high decompression ratios (deeply nested archives) are consuming more memory than expected.
- The temporary directory is mounted as `tmpfs`, which counts toward container memory usage.

**Solution**

1. Increase the container memory limit. As a general guideline, allocate at least 1–2 GB of memory per Spectra Core instance, plus overhead for the FIE process itself. For Docker:
   ```bash
   docker run --memory="8g" ...
   ```
   For Kubernetes, update the resource limits in the Helm values. See the [Helm Values Reference](./Deployment/Examples/values.md).
2. If `tmpfs` is used as the temporary directory, its contents count toward container memory. Consider switching to a host-mounted volume for temporary files to avoid this.
3. Enable the memory threshold check using `--processing-unavailable-at-memory-percent`. This causes FIE to reject new submissions when memory usage is high, throttling intake before the container can be OOM-killed:
   ```bash
   docker run -e RL_PROCESSING_UNAVAILABLE_AT_MEMORY_PERCENT=85 ...
   ```
   When memory exceeds 85%, the engine logs:
   ```json
   {"level":"warn","message":"Memory use is above the threshold of 85%"}
   ```
   and starts returning HTTP 429 to new submissions. See [Memory Usage](./usage.md#memory-usage).
4. Reduce the number of concurrent Spectra Core instances (`--number-of-regular-cores`) to lower peak memory consumption.
5. Review the [platform requirements](/General/DeploymentAndIntegration/PlatformRequirements) for recommended memory allocations based on instance count and expected file types.
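The guideline in step 1 can be expressed as quick arithmetic. The per-instance and overhead figures below are assumptions drawn from that guideline, not measured values:

```python
# Sketch: rough lower bound for the container memory limit, assuming
# ~2 GiB per Spectra Core instance plus ~2 GiB overhead for FIE itself.
# These figures are assumptions from the guideline above, not measurements.
def recommended_memory_gib(regular_cores: int, large_cores: int,
                           per_core_gib: float = 2.0,
                           overhead_gib: float = 2.0) -> float:
    return (regular_cores + large_cores) * per_core_gib + overhead_gib

# Default configuration: 4 regular + 2 large Spectra Core instances
print(f"--memory={recommended_memory_gib(4, 2):.0f}g")  # prints --memory=14g
```

Validate any such estimate against observed peak usage for your actual file mix before settling on a limit.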
---

## Container restarts because of cgroup v2 `memory.oom.group`

**Symptom**

The Kubernetes node restarts the entire FIE pod when one Spectra Core engine hits its memory limit. Logs show a single engine OOM, but the container is removed rather than just the failing process.

**Cause**

- Kubernetes v1.28+ sets the node-level `memory.oom.group=1` by default, so any OOM in the pod kills every process in that cgroup. FIE runs concurrent Spectra Core instances inside the same pod, and cgroup v2 enforces the group-wide kill. This differs from cgroup v1 behavior, where only the offending process (engine) was restarted.
- The behavior is driven by the node's kubelet configuration and is not something the FIE image can change.

**Solution**

If you are deploying on Google Kubernetes Engine (GKE), you can restore the cgroup v1-style behavior where only the offending process is killed:

1. Enable the kubelet `singleProcessOOMKill` option on your node pools. This setting is available starting with GKE versions `1.32.4-gke.1132000` and `1.33.0-gke.1748000`.
2. Follow the Google Cloud documentation for [Customizing node system configuration](https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config) to apply the `singleProcessOOMKill: true` toggle.
3. After the nodes pick up the new kubelet config, pods experiencing isolated engine OOMs should only restart the affected Spectra Core process instead of the entire container. The pod will still log the original OOM event and should recover once the engine restarts.
4. Continue sizing memory and core counts according to the [platform requirements](/General/DeploymentAndIntegration/PlatformRequirements), since `singleProcessOOMKill` only affects how kubelet responds to the OOM—it does not prevent the underlying memory condition.
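For reference, a GKE node system configuration file applying the toggle might look like the following sketch. The field name is taken from the steps above; confirm the exact schema against the linked Google Cloud documentation before rolling it out:

```yaml
# Node system configuration file passed to GKE
# (see "Customizing node system configuration" in the Google Cloud docs).
# Field name as referenced above; verify against the current GKE schema.
kubeletConfig:
  singleProcessOOMKill: true
```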
--- ## Large files time out during analysis **Symptom** Analysis of files above a certain size returns: ```http HTTP/1.1 524 {"error": "The analysis could not be completed within the configured maximum analysis time"} ``` The container logs show: ```json {"level":"warn","message":"Analysis aborted due to a timeout"} {"level":"warn","message":"Analysis has timed out"} ``` **Cause** - The `--timeout` value is too short for the complexity of the file being analyzed. - A very large or deeply nested archive requires more time to unpack and analyze than the timeout allows. - After a timeout, the Spectra Core instance handling the file is restarted, temporarily reducing available capacity. **Solution** 1. Increase the analysis timeout using the `--timeout` flag. Duration values use `s`, `m`, or `h` suffixes: ```bash docker run -e RL_TIMEOUT="5m" ... ``` Note: very short timeout values are not recommended because instance restarts after a timeout can cause cascading delays. See [Timeouts](./usage.md#timeouts). 2. After a timeout, the affected instance restarts automatically. Monitor logs for `"Instance is ready"` to confirm recovery: ```json {"level":"info","message":"Instance is ready"} ``` 3. For predictably large files, configure a dedicated large-file instance pool using `--number-of-large-cores` and `--large-file-threshold`. These instances process one file at a time, and their separate timeout can be tuned independently: ```bash docker run \ -e RL_NUMBER_OF_LARGE_CORES=2 \ -e RL_LARGE_FILE_THRESHOLD=50 \ -e RL_TIMEOUT="10m" \ ... ``` See the [Configuration Reference](./configuration.md) for all large-file pool options. 4. Use the [Check for Hard Timeout](./usage.md#check-for-hard-timeout) procedure to distinguish regular timeouts from hard timeouts caused by Spectra Core process termination. --- ## License validation error on startup **Symptom** The container exits immediately or the `/readyz` endpoint returns a non-200 status. 
Container logs contain:

```
FATAL: License validation failed
```

or:

```
License expired
```

The `/status` endpoint returns a `valid_until` date in the past.

**Cause**

- The `RL_LICENSE` environment variable is not set.
- The license file content is truncated, incorrectly formatted, or was copied with extra whitespace or line breaks.
- The license has reached its expiration date.
- For network-validated licenses, the container cannot reach the ReversingLabs license server.

**Solution**

1. Confirm the `RL_LICENSE` environment variable is set. Pass the license as the entire file contents:
   ```bash
   # Using a license file on disk
   docker run -e RL_LICENSE="$(cat /path/to/rl-license.lic)" \
     registry.reversinglabs.com/fie/file-inspection-engine:<version>
   ```
   For Kubernetes, store the license as a Secret and reference it in the pod spec:
   ```bash
   kubectl create secret generic fie-license \
     --from-file=RL_LICENSE=/path/to/rl-license.lic
   ```
2. Verify the license has not expired using the `/status` endpoint:
   ```bash
   curl http://localhost:8000/status | python3 -m json.tool | grep valid_until
   ```
3. If the license is expired, contact your ReversingLabs account manager or [support@reversinglabs.com](mailto:support@reversinglabs.com) to obtain a renewed license.
4. Note that `RL_LICENSE` is only available as an environment variable, not as a CLI flag. See the [Configuration Reference](./configuration.md) for the `RL_LICENSE` parameter notes.

---

## Analysis results show UNKNOWN for all files

**Symptom**

All files submitted to `/scan` return `"classification": "OK"` regardless of file type, and no malicious verdicts are produced even for files known to be malicious.

**Cause**

- The `--without-malicious-threat-data=true` flag is set, which disables downloading of malicious threat data and prevents malicious classifications from threat data matching.
- Threat data has not yet downloaded successfully, so the engine is operating without a populated database.
- The threat database timestamp is very old (stale), indicating updates have not been applied for an extended period. **Solution** 1. Check the current threat data configuration and status using `/status`: ```bash curl http://localhost:8000/status | python3 -m json.tool ``` Review the `threat_data.enabled_classifications` field. If it shows an empty array (`[]`), malicious classification from threat data is disabled. The `version.threat_data` field shows when the database was last updated. 2. If `enabled_classifications` is empty, check whether `--without-malicious-threat-data=true` is set in your configuration. Remove this flag (or set it to `false`) if you want malicious threat data to be used: ```bash docker run -e RL_WITHOUT_MALICIOUS_THREAT_DATA=false ... ``` 3. If threat data is enabled but stale, verify that cloud updates are working. Check `--cloud-updates` is not set to `false` and that the container can reach the update server. See [Threat database download fails or is slow](#threat-database-download-fails-or-is-slow). 4. Note that with `--without-malicious-threat-data=false` (the default), FIE still classifies files using [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) static analysis, so some malicious files will be detected even without threat data. However, threat data significantly improves detection coverage. --- ## `/status` endpoint shows zero available instances **Symptom** The `/status` endpoint shows all Spectra Core instances as unavailable: ```json { "spectra_core": { "available_regular_cores": "0% (0/4)", "available_large_cores": "0% (0/2)" } } ``` All `/scan` requests are being rejected with 429 or 503. **Cause** - All instances are busy processing files submitted simultaneously. - One or more instances have timed out and are in the process of restarting. - All instances failed to initialize during startup (for example, due to resource exhaustion). **Solution** 1. Wait briefly and re-check `/status`. 
Instances that are restarting after a timeout typically recover within a few seconds. Look for `"Instance is ready"` log messages:
   ```bash
   docker logs -f <container_name> | grep "Instance is ready"
   ```
2. If instances are busy (not restarting), reduce the rate of incoming requests and allow in-flight analyses to complete. Check the concurrency limit (`concurrency_limit` in `/status`) and compare it to the number of active instances.
3. If instances failed during startup, check logs for initialization errors:
   ```bash
   docker logs <container_name> 2>&1 | grep -i "error\|fatal\|failed"
   ```
4. Check for [OOM conditions](#out-of-memory-oom--container-killed) — if instances are being killed by the kernel before they can finish initializing, the available count will remain at zero.
5. For a persistent `0/N` state where all instances are stuck, restart the container. If this state recurs, review the [platform requirements](/General/DeploymentAndIntegration/PlatformRequirements) to ensure the host has sufficient CPU and memory for the configured number of instances.
6. For Kubernetes deployments, check whether the pod itself is in a degraded state:
   ```bash
   kubectl describe pod <pod_name> -n <namespace>
   kubectl top pod <pod_name> -n <namespace>
   ```

---

## File Inspection Engine API Reference

## General

### Starting the File Inspection Engine

The File Inspection Engine starts the main application process separately from its analysis instances. The application will start successfully in most cases, except for fatal configuration errors.

- The license is checked during startup.
- Threat data must be downloaded before analysis can begin.
- Analysis instances initialize independently. The application is considered ready when at least one instance is ready, which can be seen in the logs or through the `/status` endpoint.

To verify overall readiness, use the readiness endpoint:

```bash
curl http://<host>:<port>/readyz
```

If the application is ready, this returns `200 OK`.
The `/readyz` endpoint is the recommended way to confirm readiness, because it checks not only instance availability but also license validity, threat data availability, and the concurrency limit.

**Note: When starting for the first time, the application needs to download threat data. This process may take some time, and the application will only become fully usable once the threat data download is complete, regardless of the messages displayed.**

If the static analysis engine fails to initialize or another fatal configuration error occurs, the application will exit. For all other errors, logs will be generated, and the application will continue to run.

### Possible Response Status Codes

If present, the `errors` and `message` fields may contain soft errors (e.g., failing to get detailed threat information from the cloud), but are often empty. Hard errors will be returned as HTTP status `500 Internal Server Error`.

| Code | Description | Message |
|------|-------------|---------|
| 200 | The request has succeeded. | N/A |
| 400 | File size error. | `{"error": "Maximum upload file size in bytes is {configured_value}"}` |
| 429 | Concurrency limit reached. | `{"error":"The concurrency limit has been reached"}` |
| 429 | High processing load. | `{"error":"Analysis not accepted due to high processing load"}` |
| 524 | A timeout occurred. | `{"error": "The analysis could not be completed within the configured maximum analysis time"}` |

## Classification and Threat Data Configuration

The File Inspection Engine's classification behavior is controlled by two key configuration options that work together: `--without-malicious-threat-data` and `--paranoid-mode`. When `--without-malicious-threat-data` is enabled, the engine skips downloading malicious threat data and relies on static analysis for classification. The `--paranoid-mode` option adds a third classification level (`suspicious`) and requires additional threat data.
When both `--without-malicious-threat-data=true` and `--paranoid-mode=false`, cloud threat data updates are automatically disabled since no threat data is needed. However, if `--paranoid-mode=true`, suspicious threat data will still be downloaded even when malicious data is disabled.

Related configuration options:

- [`--without-malicious-threat-data`](./configuration.md#--without-malicious-threat-data--rl_without_malicious_threat_data)
- [`--paranoid-mode`](./configuration.md#--paranoid-mode--rl_paranoid_mode)

### Classification Overrides

The File Inspection Engine supports the following types of classification overrides:

- **Goodware overrides**: Part of the goodware threat data classification. These are files marked as goodware by ReversingLabs analysts and are automatically downloaded when cloud updates are enabled. The goodware classification requires 64 MiB of threat data. Analyst overrides are always goodware classifications.
- **User overrides**: Custom classifications by other users within the same organization (indicated by the middle segment of the username - u/**company**/user). Users can change a file's classification to malicious, suspicious, or goodware. They are automatically downloaded when cloud updates are enabled, and their count is logged at application startup.

**Creating User Overrides**

User overrides can be created in the following ways:

1. **Spectra Analyze**: Override classifications **in [Spectra Intelligence](/SpectraIntelligence/)** for specific files through the [Spectra Analyze user interface](/SpectraAnalyze/#administering-classification-overrides).
2. **Spectra Intelligence API**: Use the [TCA-0102 File reputation override](/SpectraIntelligence/API/FileThreatIntel/tca-0102) service to programmatically create user overrides.

Both types of overrides apply only to container files, i.e. to the top-level file being scanned, and not to unpacked children within archives.
For example, a file with a user override that is in a ZIP archive will not have the override applied.

## File Submissions

```
POST /scan
```

To scan a file, make a POST request to `http://<host>:<port>/scan`, with the file contents as the raw request body.

- You don't need to set the `Content-Type` header. If set, it will be included alongside every log message related to that submission.
- It is recommended to set the `Content-Length` header to prevent files larger than the maximum upload size from partially uploading before getting rejected.
- Optionally, you can provide an `external_id` query string parameter that will also be included alongside log messages related to the submission. This parameter can contain any value meaningful to the client application, such as a file name, database ID, or other identifying information.

### Examples using `curl`

**Example request:**

```bash
curl -X POST --upload-file example.docx http://<host>:<port>/scan
```

**Example response:**

```json5
{
  "classification": "OK",
  "message": "",
  "errors": []
}
```

**Classification**: The `classification` string will be either `"OK"` or `"malicious"`. If [paranoid mode](/FileInspectionEngine/configuration/#--paranoid-mode--rl_paranoid_mode) is turned on, the classification could also be `"suspicious"`.
If `with-threat-details` and `add-file-type` options are enabled, the response may look like:

```json
{
  "classification": "malicious",
  "message": "",
  "errors": [],
  "threat_details": {
    "platform": "Script",
    "type": "Trojan",
    "threat_name": "Script-JS.Trojan.Redirector"
  },
  "file_type": "Text"
}
```

**Logging:** The following example provides `Content-Type` and a custom external ID in the request, both of which can be visible in application logs:

```bash
curl -X POST -H 'Content-Type: application/x-tar' --upload-file archive.tar 'http://localhost:8000/scan?external_id=my%20external%20id'
```

This request would create the following log entry, including the external ID:

```json
{
  "level": "info",
  "process": "fie",
  "request_id": "2fd7364b-50c5-4128-ae08-35a819dda62f",
  "external_id": "my external id",
  "content_type": "application/x-tar",
  "component": "http.api",
  "request_path": "/scan",
  "content_length": 244244480,
  "active_concurrency": "1/20",
  "time": "2025-09-24T15:25:06.350178812+02:00",
  "message": "Upload started"
}
```

### Error handling

If there are any errors, they will be returned in the `message` field (deprecated), as well as the `errors` field. The `message` field will contain the same errors as the `errors` array, only it will be a semicolon-concatenated string. For example:

```json5
{
  "message": "error one; error two; error three",
  "errors": ["error one", "error two", "error three"]
}
```

In some cases, certain errors are expected, and are converted to additional properties inside `analysis_information`. For example, if a file hits the [decompression factor limit](/FileInspectionEngine/configuration/#--max-decompression-factor--rl_max_decompression_factor), this error will be logged in `errors` and `message`, but also present in `analysis_information.partial_unpacking`.
```json5
{
  "errors": ["Exceeds decompression ratio."],
  "message": "Exceeds decompression ratio.",
  "analysis_information": {
    "partial_unpacking": true
  }
}
```

## Hash Lookups

Use the following endpoints to check a sample's classification by file content or hash without triggering static analysis.

### Compute hash and perform lookup

```
POST /check-sample/upload
```

To compute a file's hash and check its classification, make a POST request with the file contents as the raw request body.

**Note: Even though the sample is not sent for static analysis, the configured file size limit still applies.**

#### Examples using `curl`

**Example request:**

```bash
curl -X POST --upload-file sample 'http://localhost:8000/check-sample/upload'
```

**Example response:**

```json5
{
  "classification": "malicious"
}
```

**Classification**: The `classification` string will be either `"OK"` or `"malicious"`. If [paranoid mode](/FileInspectionEngine/configuration/#--paranoid-mode--rl_paranoid_mode) is turned on, the classification could also be `"suspicious"`.

If the `with-threat-details` option is enabled, the response may look like:

```json
{
  "classification": "malicious",
  "threat_details": {
    "platform": "Win32",
    "type": "Malware",
    "threat_name": "Win32.Malware.Heuristic"
  }
}
```

### Provide hash and perform lookup

```
GET /check-sample/hash/{name}/{value}
```

To check a file's classification by providing its hash, make a GET request using the following path parameters:

- `name`: Hash type; currently supports only `sha1`
- `value`: SHA1 hash string

#### Examples using `curl`

**Example request:**

```bash
curl -X GET 'http://localhost:8000/check-sample/hash/sha1/74577262dad60dc5bf35c692f23300c54c92cb53'
```

**Example response:**

```json5
{
  "classification": "malicious"
}
```

**Classification**: The `classification` string will be either `"OK"` or `"malicious"`.
If [paranoid mode](/FileInspectionEngine/configuration/#--paranoid-mode--rl_paranoid_mode) is turned on, the classification could also be `"suspicious"`.

If the `with-threat-details` option is enabled, the file hash will also be submitted to the cloud API to retrieve additional threat details. The response may look like:

```json
{
  "classification": "malicious",
  "threat_details": {
    "platform": "Win32",
    "type": "Malware",
    "threat_name": "Win32.Malware.Heuristic"
  }
}
```

## Request Rejection

When a `/scan` or `/check-sample` upload request reaches an FIE instance, the engine first performs a readiness check before accepting the file for processing.

Depending on the setup, readiness checks can be approached in two ways:

- Kubernetes Readiness: Point your container's readiness probe to `/readyz`. When a container fails its readiness check, Kubernetes marks the Pod as not ready, and Services automatically stop routing traffic to it.
- External Load Balancer: Configure your load balancer's health check on `/readyz` so only ready instances receive traffic.

These probes are helpful to keep traffic away from busy nodes, but they are optional. The engine also performs its own readiness check for each file upload.

**Note: Depending on the delay between the readiness check and the file submission, it is possible that the application returns a ready state, but the file can still be rejected if the conditions change. Such files will have to be resubmitted.**

The engine evaluates several conditions to determine whether a file can be accepted. Some conditions are influenced by system state, such as memory usage or processing load, while others, such as concurrency, are driven by the volume of incoming requests. Logging occurs when any of these conditions change, regardless of file submission activity.

### Memory Usage

The system can **optionally track memory usage** and compare it to a configured threshold (`processing-unavailable-at-memory-percent`).
When enabled, memory usage is calculated as a percentage of either:

- the [memory limit defined for the container](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) (if set), or
- the total available system memory (if no container limit is present).

If this threshold is exceeded at the time of file submission, the application will reject the file until memory usage drops below the limit.

**Note: If the temporary directory is configured as `tmpfs`, it will be counted toward memory usage.**

Because memory usage depends on the complexity and unpacking behavior of previously submitted files, it may remain elevated even after no new uploads occur. The system logs when it enters and exits high-memory conditions independently of file submissions.

If the threshold is not configured, memory usage is **not tracked or logged**, and file rejection based on memory use is disabled.

### Concurrency Limit

The system enforces a limit on concurrent requests, defined by the `concurrency-limit` setting. If the number of active concurrent requests exceeds this limit, new submissions are temporarily rejected.

The concurrency limit applies **globally** to all uploads, regardless of whether they are routed to the regular core pool or cores reserved for large files. Even if the concurrency limit is set to `0` (unlimited), the system still tracks the number of active concurrent requests.

Concurrency is controlled by the number of active file submissions at any given moment. This value is directly influenced by the client's submission behavior, making it a more predictable limit compared to memory or processing-based conditions.

### Multiple Cores

Requests are assigned to [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) instances. Each file is always processed by a single instance.

- All instances are identical in capability.
- You can configure a subset of instances to be reserved for files larger than the size threshold (`--large-file-threshold`).
- Large-file instances process only **one file at a time**, regardless of the global concurrency limit.
- If no large instances are configured, all files are handled by the same pool.

The `/readyz` endpoint returns `200 OK` if at least one instance (regular or large-file) is available. For detailed availability per group, use the `/status` endpoint.

## Logging

If a file upload is rejected due to memory or processing conditions, the application will return the HTTP status `429 Too Many Requests`.

The application logs when it enters and exits high-load or high-memory states. These log entries are independent of file submission attempts, since resource usage is influenced by the complexity of previously submitted files, not just their size or frequency.

### Log Examples

**Processing Load: High and Normal**

When a Spectra Core instance becomes busy, logs indicate that it cannot accept new files:

```json
{
  "level": "info",
  "process": "fie",
  "instance_id": "core-large-0.8qczl",
  "time": "2025-09-24T15:39:07.618375188+02:00",
  "message": "Instance is not ready"
}
{
  "level": "warn",
  "process": "core",
  "instance_id": "core-large-0.8qczl",
  "time": "2025-09-24T15:39:58.946518488+02:00",
  "message": "High processing load"
}
```

When the load subsides and the instance can accept new files again, logs show a return to normal:

```json
{
  "level": "info",
  "process": "core",
  "instance_id": "core-large-0.8qczl",
  "time": "2025-09-24T15:40:15.592637095+02:00",
  "message": "High processing load over"
}
{
  "level": "info",
  "process": "fie",
  "instance_id": "core-large-0.8qczl",
  "time": "2025-09-24T15:40:15.677907694+02:00",
  "message": "Instance is ready"
}
```

- `Instance is not ready` – The instance cannot accept new files right now. The `/status` endpoint will show it as unavailable, and new submissions will be rejected until it becomes free again.
- `High processing load` – The instance is busy. This is an additional signal but less important for deciding whether to submit new files.
- `High processing load over` – The instance has returned to a normal state.
- `Instance is ready` – The instance is available again, either because load subsided or after a restart.

**Note: If an instance recovers due to a timeout and restart, the return-to-normal sequence is slightly different: only the `Instance is ready` message appears after the restart.**

**Memory Usage: High and Normal**

```json
{
  "level": "warn",
  "process": "fie",
  "component": "readiness.Controller",
  "time": "2025-09-24T13:55:38.184606708Z",
  "message": "Memory use is above the threshold of 90%"
}
{
  "level": "info",
  "process": "fie",
  "component": "readiness.Controller",
  "time": "2025-09-24T13:56:33.183810287Z",
  "message": "Memory use is below the threshold of 90%"
}
```

## Timeouts

The engine enforces a configurable **timeout** for file analysis. If a file exceeds the configured timeout:

- The Spectra Core instance handling the file is terminated and restarted.
- Any other analyses running on that instance are aborted.
- Logs will show that the analysis was aborted and timed out.
- The instance will need to restart and become ready again before it can accept new files. This usually takes a few seconds, but the exact time depends on system load and disk speed. For this reason, very short timeout values are not recommended.
```json
{
  "level": "warn",
  "process": "fie",
  "request_id": "3dc57796-2964-4249-bff3-c98ddef747ca",
  "component": "scanner",
  "sample_size": 86271956,
  "sample_sha1": "0aa2f850f0e87ef84743f518a10d17e3b03395d7",
  "sample_type": "application/x-unix-archive",
  "scan_duration_ms": 60000.502133,
  "analyzed_files": 0,
  "timeout": "1m0s",
  "time": "2025-09-23T17:35:57.299468984+02:00",
  "message": "Analysis aborted due to a timeout"
}
{
  "level": "warn",
  "process": "fie",
  "request_id": "3dc57796-2964-4249-bff3-c98ddef747ca",
  "component": "http.api",
  "request_path": "/scan",
  "content_length": 86271956,
  "scan_duration_ms": 60349.621584,
  "sample_sha1": "0aa2f850f0e87ef84743f518a10d17e3b03395d7",
  "time": "2025-09-23T17:35:57.307513847+02:00",
  "message": "Analysis has timed out"
}
```

After a timeout, the affected instance restarts. Once recovery is complete, it logs that it is ready again:

```json
{
  "level": "info",
  "process": "fie",
  "instance_id": "core-regular-0.pwcpx",
  "time": "2025-09-23T17:36:07.943715309+02:00",
  "message": "Instance is ready"
}
```

**Note: The random suffix in the `instance_id` changes after a restart.**

### Check for Hard Timeout

You can look at log messages to determine if your instance has experienced a hard timeout. For example:

```
{"level":"warn","process":"fie","component":"core.process","instance_id":"core-regular-0.dmssx","time":"2025-12-01T13:53:14.665857373+01:00","message":"Hard timeout"}
```

## Check Application Liveness

```bash
curl http://<host>:<port>/livez
```

Returns `200 OK` if the application process is running. Use it for Kubernetes liveness probes.

## Check Application Readiness

```bash
curl http://<host>:<port>/readyz
```

Returns `200 OK` only if:

- The engine has fully initialized, including license validation.
- Current resource utilization is within configured limits (memory, concurrency), and the system is not too busy (Spectra Core, CPU/load) to process samples.

If the application is not ready, the endpoint returns a `4xx` or `5xx` status.
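Because `429` rejections (concurrency limit, high processing load) are transient, a client can retry with backoff instead of failing outright. A minimal sketch, assuming Python; `submit_with_retry` is an illustrative helper, not part of the FIE API, and the caller supplies the actual HTTP call.

```python
"""Retry transient 429 rejections with exponential backoff (illustrative sketch)."""
import time


def submit_with_retry(submit, max_attempts: int = 5, base_delay_s: float = 1.0):
    """Call `submit()` until it stops returning HTTP 429.

    `submit` is supplied by the caller and must return a (status_code, body)
    tuple. 429 covers both documented transient rejections: the concurrency
    limit and high processing load. Any other status is returned immediately.
    """
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        status, body = submit()
        if status != 429 or attempt == max_attempts:
            return status, body
        time.sleep(delay)  # back off before the next attempt
        delay *= 2
```

Note that `524` timeouts are deliberately not retried here: resubmitting a file that exceeded the analysis timeout is likely to time out again.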
## Check Application Version / License / Configuration

```
GET /status
```

To check the File Inspection Engine version, license expiration date, threat data timestamp, and configuration, use the `/status` endpoint.

The `config` section contains all the [configurable options](./configuration.md) and their current values. Values containing sensitive information are redacted.

The `spectra_core` object describes how analysis instances are configured and available:

- **percentage_of_regular_cores**: The number of regular instances expressed as a percentage of the total CPU requests.
  - If no large-file pool is configured (`--number-of-large-cores=0`), these instances handle all files.
  - Otherwise, they handle files up to the `--large-file-threshold`.
- **percentage_of_large_cores**: The number of large-file instances expressed as a percentage of the total CPU requests.
- **available_regular_cores / available_large_cores**: Current availability of instances in each pool (available / total).
- **total_cpus**: The total number of CPUs used for calculating percentages. If `RL_CPU_REQUEST` is set, this value comes from that variable. Otherwise, it reflects the total CPUs detected on the node.

**Note: The `percentage_*` fields compare instance counts to the CPU request value. They do not represent actual CPU allocation. Each instance may use more or fewer CPUs depending on workload and system limits.**

The `threat_data` object describes the threat data used by the engine:

- **enabled_classifications**: The classifications enabled in the threat data, based on the combination of the `--without-malicious-threat-data` and `--paranoid-mode` options. When malicious or suspicious threat data is enabled, goodware classification is also automatically enabled.
  - `--without-malicious-threat-data=false + --paranoid-mode=false` -> `["malicious", "goodware"]`
  - `--without-malicious-threat-data=false + --paranoid-mode=true` -> `["malicious", "suspicious", "goodware"]`
  - `--without-malicious-threat-data=true + --paranoid-mode=false` -> `[]` (no threat data is used, files are classified based on static analysis)
  - `--without-malicious-threat-data=true + --paranoid-mode=true` -> `["suspicious", "goodware"]`
- **fp_probability**: The false positive probability for each classification. Not shown for disabled classifications.

**Example Response**

```bash
curl http://<host>:<port>/status
```

```json5
{
  "config": {
    "add_file_type": "disabled",
    "cloud_update_interval": "5m0s",
    "cloud_updates": true,
    "concurrency_limit": 20,
    "cpu_request": 8,
    "http_address": ":8000",
    "large_file_threshold": 10,
    "log_json": true,
    "max_decompression_factor": 1,
    "max_upload_file_size": 100,
    "number_of_large_cores": 2,
    "number_of_regular_cores": 4,
    "paranoid_mode": true,
    "processing_unavailable_at_memory_percent": 0,
    "proxy_address": "http://user:xxxxx@proxy.company.lan",
    "timeout": "0s",
    "unpacking_depth": 17,
    "with_threat_details": false,
    "without_malicious_threat_data": false
  },
  "license": {
    "valid_until": "2026-03-01"
  },
  "spectra_core": {
    "available_large_cores": "100% (2/2)",
    "available_regular_cores": "100% (4/4)",
    "percentage_of_large_cores": "25% (2)",
    "percentage_of_regular_cores": "50% (4)",
    "total_cpus": 8
  },
  "threat_data": {
    "enabled_classifications": [
      "malicious",
      "suspicious",
      "goodware"
    ],
    "fp_probability": {
      "goodware": "1 in 4.221353e+87 samples",
      "malicious": "1 in 1.154405e+12 samples",
      "suspicious": "1 in 2.931565e+12 samples"
    }
  },
  "version": {
    "application": "3.2.0",
    "threat_data": "2025-10-28T14:04:18Z"
  }
}
```

---

## Analysis Timeout Issues

File analysis timeouts can occur when processing complex or large files that require extensive analysis time. Understanding the causes and solutions helps ensure successful file processing.
## Common Causes

Analysis timeouts typically happen due to:

- **Large file sizes** - Files approaching or exceeding the size limits for your appliance tier
- **Deep nesting** - Archives containing multiple layers of compressed files
- **Extensive unpacking** - Files that trigger recursive decompression operations
- **Complex file structures** - Files with intricate internal structures requiring detailed parsing
- **Resource constraints** - Insufficient RAM or CPU allocation for the analysis workload

## Configuration Options

### Spectra Analyze

The analysis timeout can be adjusted in the appliance configuration:

1. Navigate to **Administration > Configuration**
2. Locate the analysis timeout setting
3. Increase the timeout value based on your file processing requirements
4. Save the configuration changes

### File Inspection Engine

Use the `timeout` setting (the `--timeout` option) to control the per-file analysis time limit. The value is a duration string, for example `30s` or `5m`. The default is `0`, which means unlimited.

## Troubleshooting Steps

If analysis timeouts persist:

1. **Increase allocated resources** - Ensure the appliance or container has sufficient RAM (32 GB+ recommended) and CPU cores
2. **Check decompression ratio limits** - Verify that recursive unpacking isn't exceeding configured limits
3. **Review file characteristics** - Examine the file structure to identify potential issues
4. **Monitor system resources** - Check if the appliance is under heavy load from concurrent analyses
5.
**Adjust timeout values** - Increase timeout settings for complex file processing workflows

## Related Topics

- [Platform Requirements](/General/DeploymentAndIntegration/PlatformRequirements) - Hardware specifications for different appliance tiers
- [How Spectra Core analysis works](/General/AnalysisAndClassification/SpectraCoreAnalysis) - Understanding the analysis process

---

## Antivirus Result Availability

When a sample is uploaded or rescanned in Spectra Intelligence, it will usually get new antivirus results **within 30 minutes**. When a sample has new antivirus results, these will be available in relevant APIs, for example [TCA-0104 File analysis](/SpectraIntelligence/API/FileThreatIntel/tca-0104/).

---

## Certificate Revocation

ReversingLabs maintains a certificate revocation database that is updated with each [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) release. Because the database is offline, some recently revoked certificates may not appear as revoked until the next update.

Certificate Authority (CA) revocation alone is not sufficient to classify a sample as malicious. Most CAs backdate revocations to the certificate's issuance date, regardless of when or whether the certificate was abused. When additional context is available, ReversingLabs adjusts the revocation date to reflect the most appropriate point in time. If a certificate is whitelisted, this correction is not applied.

## Searching for Revoked Certificates

You can find samples signed with revoked certificates using **Advanced Search** with the `tag:cert-revoked` keyword. Advanced Search is available both through the [Spectra Analyze user interface](/SpectraAnalyze/search-page/) and as the [TCA-0320 Advanced Search](/SpectraIntelligence/API/MalwareHunting/tca-0320/) API.

---

## File Classification and Risk Scoring

File classification assigns a risk score (0-10) and threat verdict (malicious, suspicious, goodware, or unknown) to every analyzed file using ReversingLabs Spectra Core. The classification algorithm combines YARA rules, machine learning, heuristics, certificate validation, and file similarity matching to determine security status. YARA rules take precedence as the most authoritative signal, followed by other detection methods that contribute to the final verdict.

The classification of a sample is based on a comprehensive assessment of its assigned risk factor, threat level, and trust factor; however, it can be manually or automatically overridden when necessary. Based on this evaluation, files are placed into one of the following buckets:

- No threats found (unclassified)
- Goodware/known
- Suspicious
- Malicious

The classification process weighs signals from all available sources to arrive at the most accurate verdict. Some signals are considered more authoritative than others and take priority. For example, [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) YARA rules always take precedence because they are written and curated by ReversingLabs analysts. These rules provide the highest degree of accuracy, as they target specific, named threats.

This does not mean that other classification methods are less important. Similarity matching, heuristics, and machine learning still contribute valuable signals and may produce additional matches. In cases where multiple detections apply, YARA rules simply serve as the deciding factor for the final classification.

## Risk score

A risk score is a value representing the trustworthiness or malicious severity of a sample. Risk score is expressed as a number from 0 to 10, with 0 indicating whitelisted samples from a reputable origin, and 10 indicating the most dangerous threats.
At a glance:

- Files with no threats found don't get assigned a risk score and are therefore **unclassified**.
- Values from 0 to 5 are reserved for samples classified as **goodware/known**, and take into account the source and structural metadata of the file, among other things. Since goodware samples do not have threat names associated with them, they receive a description based on their risk score.
- Risk scores from 6 to 10 are reserved for **suspicious** and **malicious** samples, and express their severity. They are calculated by a ReversingLabs proprietary algorithm, and based on many factors such as file origin, threat type, how frequently it occurs in the wild, YARA rules, and more. Lesser threats like adware get a risk score of 6, while ransomware and trojans always get a risk score of 10.

### Malware type and risk score

In cases where multiple threats are detected and there are no other factors (such as user overrides) involved, the final classification is always the one that presents the biggest threat. If they belong to the same risk score group, malware types are prioritized in this order:

| Risk score | Malware types |
|------------|---------------|
| 10 | EXPLOIT > BACKDOOR > RANSOMWARE > INFOSTEALER > KEYLOGGER > WORM > VIRUS > CERTIFICATE > PHISHING > FORMAT > TROJAN |
| 9 | ROOTKIT > COINMINER > ROGUE > BROWSER |
| 8 | DOWNLOADER > DROPPER > DIALER > NETWORK |
| 7 | SPYWARE > HYPERLINK > SPAM > MALWARE |
| 6 | ADWARE > HACKTOOL > PUA > PACKED |

## Threat level and trust factor

The [risk score table](#risk-score) describes the relationship between the risk score, and the threat level and trust factor used by the [File Reputation API](/SpectraIntelligence/API/FileThreatIntel/tca-0101).
The main difference is that the risk score maps all classifications onto one numerical scale (0-10), while the File Reputation API uses two different scales for different classifications.

### Nomenclature

The following classifications are equivalent:

| File Reputation API | Spectra Analyze | Spectra Detect Worker |
| ------------------- | --------------- | ------------------------ |
| known | goodware | 1 (in the Worker report) |

In the Worker report, the [risk score](#risk-score) is called `rca_factor`.

## Deciding sample priority

The [risk score table](#risk-score) highlights that a sample's risk score and its classification don't have a perfect correlation. This means that a sample's risk score cannot be interpreted on its own, and that the primary criterion in deciding a sample's priority is its classification.

Samples classified as suspicious can be a result of heuristics, or a possible early detection. A suspicious file may be declared malicious or known at a later time if new information is received that changes its threat profile, or if the user manually modifies its status.

The system always considers a malicious sample with a risk score of 6 as a higher threat than a suspicious sample with a risk score of 10, meaning that samples classified as malicious always supersede suspicious samples, regardless of the calculated risk score. The reason for this is certainty - a malicious sample is decidedly malicious, while suspicious samples need more data to confirm the detected threat. It is a constant effort by ReversingLabs to reduce the number of suspicious samples. While a suspicious sample with a risk score of 10 does deserve user attention and shouldn't be ignored, a malicious sample with a risk score of 10 should be triaged as soon as possible.

---

## Handling False Positives

A false positive occurs when a legitimate file is incorrectly classified as malicious.
While ReversingLabs strives for high accuracy, false positives can occasionally happen due to the complexity of malware detection across hundreds of file formats and millions of samples.

## What You Can Do

If you encounter a false positive, you have several options:

### 1. Local Classification Override

On Spectra Analyze, you can immediately override the classification using the classification override feature:

- Navigate to the file's Sample Details page
- Use the classification override option to manually set the file as goodware
- The override takes effect immediately on your appliance
- All users on the same appliance will see the updated classification

### 2. Spectra Intelligence Reclassification Request

Submit a reclassification request through Spectra Intelligence:

- The override propagates across all appliances connected to the same Spectra Intelligence account
- Other appliances in your organization will automatically receive the updated classification
- This is the recommended approach for organization-wide corrections

### 3. Goodware Overrides

Use Goodware Overrides to propagate trusted parent classifications to extracted child files:

- If a trusted parent file (e.g., from Microsoft or another reputable vendor) contains files that trigger false positives
- The parent's goodware classification can automatically override the child files
- This is particularly useful for legitimate installers that may contain components flagged by heuristics

## How ReversingLabs Handles False Positive Reports

If a customer reports a false positive (through Zendesk, or by contacting the Support team at support@reversinglabs.com), the first thing we do is re-scan the sample to make sure that the results are up-to-date.

If the results are still malicious, our Threat Analysis team will:

1. Conduct our own research of the software and the vendor
2. Contact the AV scanners and notify them of the issue
3.
Change the classification in our system (we do not wait for AVs to correct the issue)

If the file is confirmed to be a false positive, we begin by analyzing why the incorrect classification occurred. Then we try to correct the result by adjusting file relationships, certificates, and sources, and by reviewing AV product detection velocity (e.g. whether detections are being added or removed). We re-scan and reanalyze samples and, if necessary, manually investigate the file.

If these efforts do not yield a correct result, we have the ability to **manually override the classification** — but we only do so after thorough analysis confirms the file is benign.

---

## ReversingLabs malware naming standard

The ReversingLabs detection string consists of three main parts separated by dots. All three parts are mandatory and always appear.

```
platform-subplatform.type.familyname
```

1. The first part of the string indicates the **platform** targeted by the malware. This string is always one of the strings listed in the [Platform string](#platform-string) table. If the platform is Archive, Audio, ByteCode, Document, Image or Script, then it has a subplatform string. Platform and subplatform strings are divided by a hyphen (`-`). The lists of available strings for Archive, Audio, ByteCode, Document, Image and Script subplatforms can be found in their respective tables.
2. The second part of the detection string describes the **malware type**. Strings that appear as malware type descriptions are listed in the [Type string](#type-string) table.
3. The third and last part of the detection string represents the malware family name, i.e. the name given to a particular malware strain. Names "Agent", "Gen", "Heur", and other similar short generic names are not allowed. Names can't be shorter than three characters, and can't contain only numbers. Special characters (apart from `-`) must be avoided as well.
The `-` character is only allowed in exploit (CVE/CAN) names (for example CVE-2012-0158).

#### Examples

If a trojan is designed for the Windows 32-bit platform and has the family name "Adams", its detection string will look like this:

```
Win32.Trojan.Adams
```

If some backdoor malware is a PHP script with the family name "Jones", the detection string will look like this:

```
Script-PHP.Backdoor.Jones
```

Some potentially unwanted application designed for Android that has the family name "Smith" will have the following detection string:

```
Android.PUA.Smith
```

Some examples of detections with invalid family names are:

```
Win32.Dropper.Agent
ByteCode-MSIL.Keylogger.Heur
Script-JS.Hacktool.Gen
Android.Backdoor.12345
Document-PDF.Exploit.KO
Android.Spyware.1a
Android.Spyware.Not-a-CVE
Win32.Trojan.Blue_Banana
Win32.Ransomware.Hydra:Crypt
Win32.Ransomware.HDD#Cryptor
```

#### Platform string

The platform string indicates the operating system that the malware is designed for. The following table contains the available strings and the operating systems for which they are used.

| String | Short description |
| ----------- | ------------------------------------------------------------ |
| ABAP | SAP / R3 Advanced Business Application Programming environment |
| Android | Applications for Android OS |
| AOL | America Online environment |
| Archive | Archives. See [Archive subplatforms](#archive-subplatforms) for more information. |
| Audio | Audio. See [Audio subplatforms](#audio-subplatforms) for more information. |
| BeOS | Executable content for Be Inc. operating system |
| Boot | Boot, MBR |
| Binary | Binary native type |
| ByteCode | ByteCode, platform-independent. See [ByteCode subplatforms](#bytecode-subplatforms) for more information. |
| Blackberry | Applications for Blackberry OS |
| Console | Executables or applications for old consoles (e.g. Nintendo, Amiga, ...) |
| Document | Documents. See [Document subplatforms](#document-subplatforms) for more information. |
| DOS | DOS, Windows 16 bit based OS |
| EPOC | Applications for EPOC mobile OS |
| Email | Emails. See [Email subplatforms](#email-subplatforms) for more information. |
| Firmware | BIOS, Embedded devices (mp3 players, ...) |
| FreeBSD | Executable content for 32-bit and 64-bit FreeBSD platforms |
| Image | Images. See [Image subplatforms](#image-subplatforms) for more information. |
| iOS | Applications for Apple iOS (iPod, iPhone, iPad…) |
| Linux | Executable content for 32 and 64-bit Linux operating systems |
| MacOS | Executable content for Apple Mac OS, OS X |
| Menuet | Executable content for Menuet OS |
| Novell | Executable content for Novell OS |
| OS2 | Executable content for IBM OS/2 |
| Package | Software packages. See [Package subplatforms](#package-subplatforms) for more information. |
| Palm | Applications for Palm mobile OS |
| Script | Scripts. See [Script subplatforms](#script-subplatforms) for more information. |
| Shortcut | Shortcuts |
| Solaris | Executable content for Solaris OS |
| SunOS | Executable content for SunOS platform |
| Symbian | Applications for Symbian OS |
| Text | Text native type |
| Unix | Executable content for the UNIX platform |
| Video | Videos |
| WebAssembly | Binary format for executable code in Web pages |
| Win32 | Executable content for 32-bit Windows OS's |
| Win64 | Executable content for 64-bit Windows OS's |
| WinCE | Executable content for Windows Embedded Compact OS |
| WinPhone | Applications for Windows Phone |

##### Archive subplatforms

| String | Short description |
| ---------------------------------- | ------------------------------------------------------------ |
| ACE | WinAce archives |
| AR | AR archives |
| ARJ | ARJ (Archived by Robert Jung) archives |
| BZIP2 | Bzip2 archives |
| CAB | Microsoft Cabinet archives |
| GZIP | GNU Zip archives |
| ISO | ISO image files |
| JAR | JAR (Java ARchive) archives |
| LZH | LZH archives |
| RAR | RAR (Roshal Archive) archives |
| 7ZIP | 7-Zip archives |
| SZDD | Microsoft SZDD archives |
| TAR | Tar (tarball) archives |
| XAR | XAR (eXtensible ARchive) archives |
| ZIP | ZIP archives |
| ZOO | ZOO archives |
| *Other Archive identification* | All other valid [Spectra Core](/General/AnalysisAndClassification/SpectraCoreAnalysis) identifications of Archive type |

##### Audio subplatforms

| String | Short description |
| -------------------------------- | ---------------------------------------------------------- |
| WAV | Wave Audio File Format |
| *Other Audio identification* | All other valid Spectra Core identifications of Audio type |

##### ByteCode subplatforms

| String | Short description |
| ------ | ----------------- |
| JAVA | Java bytecode |
| MSIL | MSIL bytecode |
| SWF | Adobe Flash |

##### Document subplatforms

| String | Short description |
| ----------------------------------- | ------------------------------------------------------------ |
| Access | Microsoft Office Access |
| CHM | Compiled HTML |
| Cookie | Cookie files |
| Excel | Microsoft Office Excel |
| HTML | HTML documents |
| Multimedia | Multimedia containers that aren't covered by other platforms (e.g. ASF) |
| Office | File that affects multiple Office components |
| OLE | Microsoft Object Linking and Embedding |
| PDF | PDF documents |
| PowerPoint | Microsoft Office PowerPoint |
| Project | Microsoft Office Project |
| Publisher | Microsoft Office Publisher |
| RTF | RTF documents |
| Visio | Microsoft Office Visio |
| XML | XML and XML metafiles (ASX) |
| Word | Microsoft Office Word |
| *Other Document identification* | All other valid Spectra Core identifications of Document type |

##### Email subplatforms

| String | Short description |
| ------ | ------------------------------------- |
| MIME | Multipurpose Internet Mail Extensions |
| MSG | Outlook MSG file format |

##### Image subplatforms

| String | Short description |
| -------------------------------- | ------------------------------------------------------------ |
| ANI | File format used for animated mouse cursors on Microsoft Windows |
| BMP | Bitmap images |
| EMF | Enhanced Metafile images |
| EPS | Adobe Encapsulated PostScript images |
| GIF | Graphics Interchange Format |
| JPEG | JPEG images |
| OTF | OpenType Font |
| PNG | Portable Network Graphics |
| TIFF | Tagged Image File Format |
| TTF | Apple TrueType Font |
| WMF | Windows Metafile images |
| *Other Image identification* | All other valid Spectra Core identifications of Image type |

##### Package subplatforms

| String | Short description |
| ---------------------------------- | ------------------------------------------------------------ |
| NuGet | NuGet packages |
| DEB | Debian Linux DEB packages |
| RPM | Linux RPM packages |
| WindowStorePackage | Packages for distributing and installing Windows apps |
| *Other Package identification* | All other valid Spectra Core identifications of Package type |

##### Script subplatforms

| String | Short description |
| --------------------------------- | ------------------------------------------------------------ |
| ActiveX | ActiveX scripts |
| AppleScript | AppleScript scripts |
| ASP | ASP scripts |
| AutoIt | AutoIt scripts (Windows) |
| AutoLISP | AutoCAD LISP scripts |
| BAT | Batch scripts |
| CGI | CGI scripts |
| CorelDraw | CorelDraw scripts |
| Ferite | Ferite scripts |
| INF | INF Script, Windows installer scripts |
| INI | INI configuration file |
| IRC | IRC, mIRC, pIRC/Pirch Script |
| JS | Javascript, JScript |
| KiXtart | KiXtart scripts |
| Logo | Logo scripts |
| Lua | Lua scripts |
| Macro | Macro (e.g. VBA, AmiPro macros, Lotus123 macros) |
| Makefile | Makefile configuration |
| Matlab | Matlab scripts |
| Perl | Perl scripts |
| PHP | PHP scripts |
| PowerShell | PowerShell scripts, Monad (MSH) |
| Python | Python scripts |
| Registry | Windows Registry scripts |
| Ruby | Ruby scripts |
| Shell | Shell scripts |
| Shockwave | Shockwave scripts |
| SQL | SQL scripts |
| SubtitleWorkshop | SubtitleWorkshop scripts |
| WinHelp | WinHelp Script |
| WScript | Windows Scripting Host related scripts (can be VBScript, JScript, …) |
| *Other Script identification* | All other valid Spectra Core identifications of Script type |

#### Type string

This string is used to describe the general type of malware. The following table contains the available strings and describes what each malware type is capable of. For a catalog of common software weaknesses that enable malware, see [CWE](https://cwe.mitre.org/) maintained by MITRE. CISA maintains advisories on actively exploited vulnerabilities at [cisa.gov/known-exploited-vulnerabilities](https://www.cisa.gov/known-exploited-vulnerabilities).
| String | Description |
| ----------- | ------------------------------------------------------------ |
| Adware | Presents unwanted advertisements |
| Backdoor | Bypasses device security and allows remote access |
| Browser | Browser helper objects, toolbars, and malicious extensions |
| Certificate | Classification derived from certificate data |
| Coinminer | Uses system resources for cryptocurrency mining without the user's permission |
| Dialer | Applications used for war-dialing and calling premium numbers |
| Downloader | Downloads other malware or components |
| Dropper | Drops malicious artifacts including other malware |
| Exploit | Exploits for various vulnerabilities, CVE/CAN entries |
| Format | Malformations of the file format. Classification derived from graylisting, validators on unpackers |
| Hacktool | Software used in hacking attacks, that might also have a legitimate use |
| Hyperlink | Classifications derived from extracted URLs |
| Infostealer | Steals personal info, passwords, etc. |
| Keylogger | Records keystrokes |
| Malware | New and recently discovered malware not yet named by the research community |
| Network | Networking utilities, such as tools for DoS, DDoS, etc. |
| Packed | Packed applications (UPX, PECompact…) |
| Phishing | Email messages (or documents) created with the aim of misleading the victim, by disguising themselves as a trustworthy entity, into opening malicious links, disclosing personal information or opening malicious files. |
| PUA | Potentially unwanted applications (hoax, joke, misleading...) |
| Ransomware | Malware which encrypts files and demands money for decryption |
| Rogue | Fraudulent AV installs and scareware |
| Rootkit | Provides undetectable administrator access to a computer or a mobile device |
| Spam | Other junk mail that does not unambiguously fall into the Phishing category, but contains unwanted or illegal content. |
| Spyware | Collects personal information and spies on users |
| Trojan | Allows remote access, hides in legit applications |
| Virus | Self-replicating file/disk/USB infectors |
| Worm | Self-propagating malware with exploit payloads |

---

## Risk score reference table

---

## How Spectra Core analysis works

All ReversingLabs products are powered by [Spectra Core](https://www.reversinglabs.com/products/spectra-core) - the engine that analyzes every file and sample. The process of analyzing software involves several steps, and the final output is a set of analysis reports. To better understand the source and significance of the information contained in those reports, it's helpful to learn what Spectra Core does in the background of ReversingLabs products.

This page provides an overview of the Spectra Core analysis process and explains what happens with files in each of the analysis steps. The following main steps have dedicated sections where they are described in detail:

1. [Identification](#1-identification)
2. [Unpacking](#2-unpacking)
3. [Validation](#3-validation)
4. [Metadata processing](#4-metadata-processing)
5. [Classification](#5-classification)

## Automated static analysis

When you scan a file with Spectra Core, the engine automatically performs static analysis on the file and all files extracted from it. Automated static analysis is also referred to as **complex binary analysis**. This unique approach to software analysis decomposes files, collects their metadata, and classifies them in terms of the security risk they pose to end-users. Files are analyzed recursively, which means that every file extracted from a software package goes through the same analysis process as its container package.

As implemented in Spectra Core, automated static analysis does not require access to the source code (like SAST tools typically do).
It can directly examine compiled software binaries to determine their structure, dependencies, and behaviors. In addition to analyzing software binaries (which is the primary use case), Spectra Core can analyze library code and source code for specific scripting languages.

Another benefit of automated static analysis is that **files are not executed during the analysis process**. All available data is extracted even if the files are compressed, executable, or damaged - regardless of their target OS or platform. Because the analysis process does not execute any files, it can be completed in milliseconds and performed on very large files without significant performance penalties.

All these features of automated static analysis give Spectra Core a unique advantage - it can analyze post-build artifacts and detect more novel, sophisticated software supply chain attacks than SCA tools are able to. SCA tools typically analyze package managers, manifest files, or source code repositories to find vulnerabilities. They are limited by the need for known signatures of open source dependencies that have to be cross-referenced against a vulnerability database. Because they are used in pre-build environments, SCA tools lack visibility into deep file structures and evidence of build process tampering - insights that Spectra Core readily provides.

## The Spectra Core analysis process

The process starts with the input file. The analysis engine performs several distinct steps on every object it extracts from the input file, and every object goes through the same flow.

### 1. Identification

Format identification is the initial step of the Spectra Core analysis process. To successfully perform the subsequent analysis steps, we first need to know the file format of every object we are analyzing. Specifically, this step analyzes the object structure to determine whether it's **binary** or **text**, and assigns the analyzed object a unique file format description. This description - the file format identification - instructs the analysis engine on which rules and modules to use for further file processing.

Two main approaches are used for format identification:

- **Signatures** - created by ReversingLabs researchers to identify **binary** file formats based on their unique features. For example, Windows .exe files start with the bytes "MZ", while PNG files will usually start with "‰PNG". Signatures describe expectations of what a file format should contain. Using heuristics, the analysis process checks whether those expectations align with the actual file structure. In addition to signatures, the analysis process also evaluates any relevant YARA rules (built into the engine as well as user-provided). If there are multiple matches, those from signatures take priority over YARA rule matches.
- **Machine learning models** - created and trained by ReversingLabs researchers to identify **textual** file formats based on statistical text identification. The models are able to recognize basic text objects such as scripting languages, and to distinguish software source code from other types of textual content.

**Note: ✅ Completing the identification step**

The results of the format identification step are:

- File hashes - calculated by the analysis engine
- File format descriptions - represented as File type.File subtype.Identification (for example, `Binary/Archive/ZIP`). If there are multiple versions of a file format, they can be identified through the additional `version` field.

After the format has been identified, the file is either directed to the proper unpacking module according to its signature, or to the validation step.

### 2. Unpacking

Unpacking, also referred to as **file decomposition**, is a step in the Spectra Core analysis process where the analyzed file is taken apart to extract all available components and metadata. During the unpacking process, the analysis engine eliminates obfuscation, encryption, compression, and any other protections that may have been applied to the file and its contents. The engine has built-in mechanisms to prevent infinite recursion, and supports configuring the decompression ratio and the unpacking depth (how many layers of a file to extract).

Different file formats require different unpacking approaches because of their structure and complexity. Because static analysis does not execute a file, it requires **unpackers** - specialized tools for parsing and unpacking individual file formats. ReversingLabs develops in-house static unpackers tailored to specific file formats, and Spectra Core relies on those unpackers during analysis.

Generally speaking, goodware file formats are easier to unpack because their structure is known and well-defined, and file behavior can be observed from the format definition. File formats commonly used for malware are good at hiding code, which makes their unpacking more challenging. To create an unpacker for malware file formats, researchers have to identify each format and document its structure. The unpacker must be able to simulate file execution so that its code can be reconstructed and its behavior observed. Any obfuscation and protection artifacts must also be removed to allow extracting further objects. Information about the file behavior allows the unpacker - and consequently, the analysis process - to reveal the original software intent and to let users understand the true meaning of the code that was packed in that particular file format.
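The recursion and resource limits described above correspond to the FIE settings `unpackingDepth` and `maxDecompressionFactor` shown in the example values.yaml. As an illustration only - this is not ReversingLabs code - the guard logic for recursive extraction can be sketched like this for ZIP archives:

```python
import zipfile
from pathlib import Path

# Illustrative sketch: bounds recursive extraction with a layer limit and
# a maximum decompression factor. The defaults mirror the FIE settings
# unpackingDepth (17) and maxDecompressionFactor (1.0, with 0 meaning no
# limit), but the function itself is hypothetical.
def unpack_recursive(path, depth=0, max_depth=17, max_factor=1.0, results=None):
    if results is None:
        results = []
    results.append((depth, str(path)))
    if depth >= max_depth or not zipfile.is_zipfile(path):
        return results  # reached the layer limit, or not an archive
    packed_size = Path(path).stat().st_size
    with zipfile.ZipFile(path) as zf:
        unpacked_size = sum(info.file_size for info in zf.infolist())
        # Reject archives that expand too much relative to their packed size.
        if max_factor and packed_size and unpacked_size / packed_size > max_factor:
            raise ValueError("decompression factor limit exceeded (zip bomb?)")
        out_dir = Path(str(path) + ".unpacked")
        zf.extractall(out_dir)
        for child in out_dir.rglob("*"):
            if child.is_file():
                unpack_recursive(child, depth + 1, max_depth, max_factor, results)
    return results
```

The depth check runs before any extraction, so a deeply nested archive stops consuming resources at the configured layer, and the ratio check rejects decompression bombs before anything is written to disk.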
The ability to unpack a file format makes it possible for the Spectra Core analysis engine to extract a wealth of metadata and critical information often not available from other tools. The collected metadata includes, but is not limited to: format header details, strings (including secrets and URIs), function names, library dependencies, and file segments.

Unpacking greatly increases the surface that can be analyzed and helps file classification by providing more metadata to look at. This makes it easier to confirm classification verdicts and increases the chance of catching every threat.

**Note: ✅ Completing the unpacking step**

After the file has been successfully unpacked, all collected metadata and the unpacked file content are passed to the validator assigned to the file format. The validator then performs integrity checks on the available data.

### 3. Validation

Validation is a step in the Spectra Core analysis process where the **structure** and the **digital signatures** of the analyzed file are verified according to specific criteria for each file format.

In the validation step, the previously identified file format is checked against its specification (the formal definition of the file format by its designer). In other words, the validation process looks for differences between the file format specification and its implementation. By doing this, we can gather additional information about the file format and detect anomalies in it. Any malformations that violate the file format specification are further examined to determine if they are capable of triggering potentially malicious behavior. Such malformations may be reported as known vulnerabilities. ReversingLabs uses these malformation patterns to create heuristics for potential future exploits and predictive vulnerability detection.

Multiple validators may be used to verify a file format. They are called successively, first to last, until one of them acknowledges that it recognizes and can handle the specific file format. If validation fails for one of them, the entire file is marked as invalid. Detected issues are reported as validation warnings or errors, depending on their severity.

In addition to performing integrity checks of the file format structure, the validation step also verifies any digital certificates that have been used for code signing. Depending on its status, a certificate may influence the classification of files signed with it. The validation step assigns one of the following statuses to every detected certificate:

- Valid certificate
- Invalid certificate
- Bad checksum
- Bad signature
- Malformed certificate
- Self-signed certificate
- Impersonation attempt
- Expired certificate
- Untrusted certificate
- Revoked certificate

**Note: ✅ Completing the validation step**

After the file has been validated, all collected metadata is processed, evaluated, and transformed into actionable information that can be used to deliver the final file classification.

### 4. Metadata processing

Metadata processing is a step in the Spectra Core analysis process where all previously collected metadata is translated into **human-readable**, **explainable information**. That information is used to produce or support the final file classification. Most of it is surfaced in Spectra Core analysis reports.

In this step, metadata is converted into **capabilities** and **indicators**. These build on the file format properties and platform-specific features of the analyzed file to describe software behavior and intent in more detail. The goal is to make it clearer what the analyzed code means and what each object is trying to do.

#### Indicators

Indicators can be described as behavior markers that are triggered when a specific pattern is found in the collected metadata or in the file content. An indicator may be triggered for multiple reasons. While some indicators can only be found in specific file formats, most are universal and therefore generally applicable.

Indicators contribute to the final file classification, but not in equal measure. Those deemed highly relevant are better at describing the detected malware type, while those with less relevant contributions help solidify the machine learning detection.

#### Capabilities

Based on the indicators triggered on a file, the analysis engine infers that the file exhibits a specific behavior, or that it is capable of performing specific actions. Similar software behaviors are grouped into broader categories - capabilities - according to the features they have in common. For example, a file can have the filesystem capability, which is a broad description that says the file can access the filesystem or perform filesystem operations, but doesn't describe which operation will actually take place. More fine-grained software behavior descriptions are derived from the indicators (e.g. "Accesses the httpd.conf file").

#### Tags

The metadata processing step also assigns tags to files based on their properties, such as certificate information, software behaviors, file contents, and many more. Some tags can only be applied to specific file types (for example, web browsers or mobile applications). Tags are visible in [Spectra Analyze](/SpectraAnalyze/tags) and can be queried through the [Spectra Intelligence Advanced Search (TCA-0320)](/SpectraIntelligence/API/MalwareHunting/tca-0320) API. In SAFE reports generated by Spectra Assure, tags appear for all unpacked files and for URIs in the Networking section, where they can be used for filtering.

**Note: ✅ Completing the metadata processing step**

After the metadata has been fully processed, the file receives its classification status in the next step of the analysis.

### 5. Classification

Classification is a step in the Spectra Core analysis process where the analysis engine produces a **verdict** on whether the analyzed file contains threats harmful to the end-user. Multiple technologies are used for file classification:

- format identification
- signatures (byte pattern matches)
- file structure validation
- extracted file hierarchy
- file similarity (RHA1)
- certificates
- machine learning
- heuristics (for scripts and fileless malware)
- YARA rules included in the analysis engine

These technologies are shipped with the analysis engine and can be used offline, without connecting to any external sources. Their coverage varies based on threat and file format type. In other words, not all technologies can detect all threat types, and not all of them work on all file formats. The default classification abilities of the Spectra Core platform can be extended with **threat intelligence from the ReversingLabs Cloud** to retrieve file reputation information, and with **custom YARA rules for user-assisted classification**.

Some classification approaches are more specific than others, with signatures being the most specific. The final classification result relies on the information from all analysis steps, and is a combination of all technologies applicable to the file format. It will always match one of the technologies, even though they may have differing results between them. Because of differences in how malicious files and malware families behave, some files might end up classified as malicious by one technology, and still be considered goodware by others. This doesn't negate or diminish the final classification.

#### Explainable Machine Learning

Spectra Core is the first and only solution on the market that relies on [Explainable Machine Learning (xAI)](https://www.reversinglabs.com/blog/machine-learning-for-humans) for threat detection.
Explainable Machine Learning was launched by ReversingLabs in 2020 as a predictive threat detection method that can detect novel malware. It focuses on providing threat analysts with human-readable insights into machine learning-driven classifications. The goal of ReversingLabs Explainable Machine Learning is to go beyond the basic verdict of "goodware vs malware", and to help analysts understand **what type of threat was found**, **why it was detected**, and **what to do with it next**. To achieve that, the classification system combines:

- **explainability** (by surfacing software behaviors in the form of indicators),
- **relevance** (by ranking behaviors based on their contribution to the final verdict),
- and **transparency** (by displaying why each software behavior was triggered).

Using natural language to provide clear explanations for classification decisions helps security analysts understand how analyzed software behaves and what malware is capable of doing to the system. This transparency fosters trust, facilitates informed decision-making, and makes the logic behind machine learning classification verdicts easier to follow.

Over the years, ReversingLabs threat analysts and researchers have carefully transformed raw code and metadata produced by static analysis into indicators - descriptions of software intent. Those indicators are used in training machine learning (ML) models to recognize if a file is malicious based on the described software functionality and behavior. Many of the threats in the training datasets are hand-picked by ReversingLabs experts and fully, correctly labeled so that ML models can learn what constitutes a specific threat type, and distinguish it from other threat types as well as from clean software. This allows ML models to proactively detect and describe threats - even brand new malware - without the need for additional training.
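The idea of building a verdict from weighted, explainable indicator contributions can be illustrated with a toy sketch. The indicator names, weights, and threshold below are invented for this example and are not actual Spectra Core indicators or model parameters:

```python
# Toy illustration of indicator-weighted classification. All names and
# numbers here are hypothetical, chosen only to show the principle of an
# explainable, weighted verdict.
INDICATOR_WEIGHTS = {
    "captures_keystrokes": 0.8,    # strongly associated with keyloggers
    "writes_to_autorun_key": 0.6,  # persistence-related, highly relevant
    "accesses_filesystem": 0.1,    # universal, weak signal on its own
}

def classify(triggered_indicators, threshold=1.0):
    """Sum the weighted contributions of triggered indicators and
    compare the total against a decision threshold."""
    reasons = {name: INDICATOR_WEIGHTS.get(name, 0.0)
               for name in triggered_indicators}
    score = sum(reasons.values())
    verdict = "malicious" if score >= threshold else "goodware"
    # Returning per-indicator contributions keeps the verdict explainable:
    # an analyst can see which behaviors drove the decision and by how much.
    return verdict, score, reasons
```

Note how the returned `reasons` mapping is what makes the toy verdict "explainable": the decision is not just a label, but a ranked list of the behaviors that produced it.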
When Spectra Core scans a file and extracts indicators from it, ML models can match them against the indicators they have learned to recognize as typical for malware or for a specific threat type. Some indicators are more meaningful in the context of a particular malware or threat type, so they contribute more to the classification. When the model decides that something is malicious, the decision can be verified through the indicators and the reasons why they were triggered. This makes the decision more transparent, relevant, and explainable in terms that are familiar to human analysts.

ReversingLabs ML models are tailored to threat types to increase accuracy, and are [continuously improved](https://www.reversinglabs.com/blog/how-to-harden-ml-models-against-adversarial-attacks) to boost their resilience. All classification models can detect if a file is malicious or not. The PE (Portable Executable) malware classifier is also able to provide information on the detected threat type. An exact threat type indicates higher confidence in the classification result, while threats that get assigned a generic threat type ("Malware") may point to new, emerging malware.

The following ML models are used for malware classification:

- PE malware classifier - detects if a file is malicious (covering all threat types) and if it is a specific malware type (one of **Backdoor**, **Downloader**, **Infostealer**, **Keylogger**, **PUA**, **Ransomware**, **Worm**)
- Script classifiers - apply to `Text/