Version: Spectra Detect 5.7.2

Troubleshooting

This guide covers common issues with Spectra Detect deployments and the steps to resolve them.


Worker node not appearing in SDM dashboard

Symptom

A newly deployed or restarted Worker node does not appear in the Spectra Detect Manager (SDM) appliance list, or its status shows as disconnected.

Cause

  • The Worker is not configured with the correct SDM address or port.
  • A firewall is blocking the communication channel between the Worker and SDM.
  • The Worker has not been authorized in SDM.
  • TLS certificate mismatch between the Worker and SDM.

Solution

  1. On the Worker appliance, verify that the SDM address is correctly configured under Appliance Configuration.
  2. Test network connectivity from the Worker to the SDM:
curl -k https://<sdm-address>:<port>/api/v1/health

If this fails, check firewall rules between the Worker and SDM host. The required ports are documented in the deployment prerequisites.

  3. In the SDM interface, navigate to Appliances and check whether the Worker appears as pending authorization. Newly connected Workers must be explicitly authorized before they appear as active. See Manager Settings.
  4. If TLS certificates are custom, verify that the Worker trusts the SDM certificate. See Certificate Management for certificate configuration.
  5. Check SDM logs for connection rejection messages to identify the specific failure.
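If curl is not available on the appliance shell, bash's built-in /dev/tcp pseudo-device can serve as a minimal reachability probe for the connectivity test in step 2. The host and port below are placeholders for your SDM address; this is a sketch, not a product-supplied utility:

```shell
# Minimal TCP reachability probe using bash's /dev/tcp (no curl required).
# SDM_HOST and SDM_PORT are placeholders - substitute your Manager's address.
SDM_HOST=127.0.0.1
SDM_PORT=8443
if timeout 5 bash -c "exec 3<>/dev/tcp/${SDM_HOST}/${SDM_PORT}" 2>/dev/null; then
  echo "port ${SDM_PORT} reachable"
else
  echo "port ${SDM_PORT} unreachable - check firewall rules"
fi
```

A successful TCP connect only proves the port is open; a TLS or authorization failure can still prevent the Worker from registering.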

File analysis queue is backing up

Symptom

The Dashboard shows the analysis queue growing continuously. Files are taking much longer than normal to receive a classification result. The queue count in the SDM overview is increasing or not decreasing.

Cause

  • The cluster does not have enough Worker nodes or CPU resources to keep up with the current ingestion rate.
  • A subset of Worker nodes is unhealthy or offline, reducing effective throughput.
  • Very large or deeply nested archives are occupying Worker instances for extended periods.
  • The input connector (S3, folder watch, or API) is ingesting files faster than Workers can process them.

Solution

  1. Check the status of all Worker nodes in the SDM dashboard. Confirm that all expected Workers are online and healthy.
  2. Review Worker CPU and memory usage. Sustained CPU at 100% indicates the Workers are capacity-constrained.
  3. For K8s deployments, scale the Worker deployment horizontally by increasing the replica count:
kubectl scale deployment spectra-detect-worker \
--replicas=<desired-count> -n <namespace>
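When deciding how far to scale, a rough capacity calculation helps pick the replica count. The rates below are illustrative assumptions, not product benchmarks; substitute throughput figures measured in your own environment:

```shell
# Back-of-envelope replica sizing (all numbers are illustrative assumptions).
INGEST_RATE=1200        # files per minute arriving from connectors
PER_WORKER_RATE=150     # files per minute one Worker replica sustains
# Ceiling division: the minimum replica count that keeps up with ingestion
REPLICAS=$(( (INGEST_RATE + PER_WORKER_RATE - 1) / PER_WORKER_RATE ))
echo "scale to at least ${REPLICAS} replicas"
```

Ceiling division is used so that any fractional remainder rounds up to a whole extra replica; leaving some headroom above this minimum absorbs ingestion bursts.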

YARA rules not syncing to Workers

Symptom

YARA rules uploaded to an appliance or to SDM do not appear on Worker nodes. Samples that should match a YARA rule do not produce expected results. The SDM YARA management page shows rules as uploaded but Workers do not reflect the changes.

Cause

  • The YARA sync service on a Worker node is stopped or misconfigured.
  • A network connectivity issue is preventing SDM from pushing rules to the Worker.
  • A syntax error in a newly uploaded YARA rule is causing the entire ruleset to be rejected by the Worker.

Solution

  1. Verify that YARA sync is enabled and configured correctly on the Worker. See YARA Sync for the required configuration parameters.
  2. Check YARA sync logs on the Worker node for error messages:
# For OVA/AMI deployments
journalctl -u spectradetect-yarasync -n 100

# For K8s deployments
kubectl logs <worker-pod-name> -c yara-sync -n <namespace>
  3. If the logs show rule validation errors, identify the specific rule causing the failure. Invalid rules are typically logged with the rule name and a parse error message:
ERROR: YARA rule parse failed: rule "my_rule" at line 14 - unexpected token

Remove or correct the invalid rule and re-upload the ruleset.

  4. Test SDM-to-Worker connectivity on the YARA sync port. Consult Certificate Management if TLS is involved.
  5. After resolving connectivity or rule issues, trigger a manual YARA sync from the SDM interface or via the Management API.
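When a parse error names a rule, listing every rule declared in a ruleset file makes the offender easy to locate before re-uploading. The sketch below is only a text scan over a throwaway example file (the path and rule names are illustrative); it does not replace compiling the ruleset with the YARA tools:

```shell
# Write a small example ruleset (stand-in for your real .yar file).
cat > /tmp/example_rules.yar <<'EOF'
rule suspicious_strings {
    strings: $a = "evil"
    condition: $a
}
private rule helper_rule {
    condition: true
}
EOF

# List the declared rule names (handles plain, private, and global rules).
grep -E '^[[:space:]]*(private |global )?rule ' /tmp/example_rules.yar \
  | sed -E 's/^[[:space:]]*(private |global )?rule +([A-Za-z0-9_]+).*/\2/'
```

Cross-reference the printed names against the rule cited in the sync log's parse error, then remove or correct that rule.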

S3 connector not picking up files

Symptom

Files placed in the configured S3 bucket are not being submitted for analysis. The analysis queue remains empty even though new files are present in the bucket.

Cause

  • The S3 connector credentials (AWS access key or IAM role) are incorrect or lack the required permissions.
  • The bucket name or prefix path in the connector configuration does not match the actual bucket.
  • The S3 connector service is stopped or has encountered an error during startup.
  • The bucket is in a different AWS region than expected.

Solution

  1. Review the S3 connector configuration under Analysis Input. Confirm the bucket name, region, and prefix are correct.
  2. Verify that the IAM role or access keys have the following minimum permissions on the bucket:
    • s3:GetObject
    • s3:ListBucket
    • s3:DeleteObject (if files should be removed after processing)
# Test access from the connector host
aws s3 ls s3://<bucket-name>/<prefix>/ --region <region>
  3. Check the connector service logs for authentication or connectivity errors:
# For OVA/AMI deployments
journalctl -u spectradetect-connector -n 100

# For K8s deployments
kubectl logs <hub-pod-name> -n <namespace>
  4. Confirm the connector service is running. Restart it if it has crashed:
# For OVA/AMI deployments
sudo systemctl restart spectradetect-connector
  5. For K8s deployments, note that S3 connector support is provided through the Hub component. Refer to your deployment documentation for Hub configuration details.
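The minimum permissions in step 2 correspond to an IAM policy along the following lines. This is a sketch with placeholder bucket and prefix names; adapt the Resource ARNs to your deployment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::example-bucket/example-prefix/*"
    },
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::example-bucket"
    }
  ]
}
```

Note that s3:ListBucket is granted on the bucket ARN while the object actions are granted on the object ARN; omit s3:DeleteObject if processed files should remain in the bucket.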

SDM shows appliance as unreachable

Symptom

In the SDM overview, one or more connected appliances (Workers or Spectra Analyze instances) display an "Unreachable" status. Alerts may be generated for the affected appliances.

Cause

  • The appliance is powered off or has lost network connectivity.
  • The SDM heartbeat check is failing due to a temporary network disruption.
  • The appliance's management interface has changed IP address.
  • A TLS certificate on the appliance has expired, causing the SDM connection to fail.

Solution

  1. Attempt to access the appliance web interface or SSH directly to confirm it is online:
ssh admin@<appliance-ip>
ping -c 4 <appliance-ip>
  2. In the SDM interface, navigate to Manager Settings and verify the registered IP address or hostname for the appliance. Update it if the IP has changed. See Manager Settings.
  3. Check whether the appliance TLS certificate has expired. Certificate issues are logged in the appliance system logs and also visible on the Certificate Management page. See Certificate Management for certificate renewal steps.
  4. If the appliance is accessible but SDM still shows it as unreachable, restart the SDM communication service on the appliance:
sudo systemctl restart spectradetect-sdm-agent
  5. Review SDM logs for specific error messages related to the unreachable appliance to identify the root cause.

Analysis results not appearing in dashboard

Symptom

Files are being submitted and Workers show activity (CPU usage, queue movement), but analysis results do not appear in the SDM Dashboard or in the analysis results view.

Cause

  • The results reporting service on the Worker is misconfigured and is not sending results back to the Hub or SDM.
  • A downstream results consumer (webhook, SIEM integration, or S3 output bucket) is misconfigured, causing result delivery to fail silently.
  • The SDM database is not receiving result records due to a connectivity issue between the Hub and SDM.

Solution

  1. Check Worker logs for result delivery errors:
kubectl logs <worker-pod-name> -n <namespace> | grep -i "result\|output\|error"
  2. Verify the output configuration under Analysis Input. Confirm that the output destination (Hub address, S3 bucket, or webhook URL) is correct and reachable.
  3. Test connectivity from the Worker to the Hub or SDM results endpoint.
  4. For notification-based integrations (email alerts, webhooks), check the notification configuration. See Notifications for configuration options.
  5. If results are being generated locally but not forwarded, check for disk space issues on the Worker node that might be causing result queuing to back up locally rather than forwarding to SDM.
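The disk-space failure mode in the last step can be checked quickly from the Worker shell. The mount point and the 90% threshold below are assumptions (the check uses GNU df); point it at whichever volume holds queued results in your deployment:

```shell
# Warn when the volume holding queued results is nearly full.
# MOUNT and the 90% threshold are placeholders - adjust for your deployment.
MOUNT=/
USED=$(df --output=pcent "$MOUNT" | tail -n 1 | tr -dc '0-9')
if [ "$USED" -ge 90 ]; then
  echo "volume ${USED}% full - results may be queuing locally"
else
  echo "volume ${USED}% full - disk space is not the bottleneck"
fi
```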

Update fails or gets stuck

Symptom

A software update initiated from the SDM Updating page (or via the update CLI) does not complete. The update status page shows the process as "In Progress" for an extended period, or the update fails with an error message.

Cause

  • Network connectivity between the appliance and the ReversingLabs update server is disrupted during the download.
  • The appliance does not have sufficient disk space to stage the update package.
  • An SDM-managed appliance was rebooted or lost connectivity to SDM mid-update.
  • An incompatible update sequence (for example, skipping a required intermediate version).

Solution

  1. Navigate to the Updating page in SDM and review the update log for specific error messages.
  2. Check available disk space on the appliance before retrying:
df -h

If disk space is insufficient, free space by purging old analysis results or logs before reattempting the update.

  3. Test network connectivity to the ReversingLabs update infrastructure from the appliance:
curl -I https://updates.reversinglabs.com
  4. Do not reboot or restart the appliance while an update is in progress unless instructed to do so in the error message.
  5. If the update process is hung (no log activity for more than 30 minutes), contact ReversingLabs Support before attempting to cancel or retry the update, as incomplete updates may leave the system in a partially upgraded state.
  6. For K8s deployments, follow the upgrade procedure in your deployment documentation rather than using the SDM update mechanism.
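The disk-space check in step 2 can be turned into a quick pre-flight test before retrying. The 10 GB figure below is a placeholder, not the actual package size; check the release notes for the real requirement (uses GNU df):

```shell
# Verify free space on the staging volume before retrying the update.
# REQUIRED_GB is an assumed placeholder - consult the release notes.
REQUIRED_GB=10
AVAIL_KB=$(df --output=avail -k / | tail -n 1 | tr -d ' ')
AVAIL_GB=$(( AVAIL_KB / 1024 / 1024 ))
if [ "$AVAIL_GB" -ge "$REQUIRED_GB" ]; then
  echo "${AVAIL_GB} GB free - enough to stage the update"
else
  echo "only ${AVAIL_GB} GB free - purge old results or logs first"
fi
```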

High memory usage on Worker nodes

Symptom

Kubernetes Worker pods are being OOM-killed, or node memory usage is consistently above 90%. The SDM dashboard or cluster monitoring (Prometheus, CloudWatch) shows frequent memory pressure on Worker nodes.

Cause

  • The number of concurrent analyses (concurrency-limit) is set too high relative to the available memory per Worker pod.
  • Large files or archives with high decompression ratios are consuming more memory than typical workloads.
  • Memory limits set in the Helm chart are too low for the configured number of Spectra Core instances.

Solution

  1. Check if OOM events are occurring:
kubectl describe pod <worker-pod-name> -n <namespace> | grep -i "OOMKilled\|Reason"
dmesg | grep -i "oom\|killed"
  2. Review the Worker's Helm chart values and reduce the concurrency limit or the number of Spectra Core instances (number-of-regular-cores) to reduce peak memory usage.
  3. Increase the memory limits and requests for Worker pods in the Helm values if available node memory allows it. Refer to your deployment's Helm values reference documentation.
  4. Consider configuring a dedicated large-file Worker group with higher memory limits and a separate concurrency limit, leaving the regular pool for smaller files. Refer to your deployment documentation for Worker group customization options.
  5. Review the platform requirements to confirm that the cluster nodes meet the recommended memory specifications for the configured number of Spectra Core instances.
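The relationship between the concurrency limit and the pod memory limit can be estimated with simple arithmetic. All three figures below are illustrative assumptions, not product guidance; measure peak per-analysis memory for your own workload:

```shell
# Back-of-envelope memory sizing for one Worker pod (assumed figures).
CONCURRENCY_LIMIT=4     # concurrent analyses per pod (concurrency-limit)
PEAK_PER_ANALYSIS_GB=3  # observed peak memory of a single analysis
BASE_OVERHEAD_GB=2      # baseline footprint of the pod itself
LIMIT_GB=$(( CONCURRENCY_LIMIT * PEAK_PER_ANALYSIS_GB + BASE_OVERHEAD_GB ))
echo "set the pod memory limit to at least ${LIMIT_GB} GB"
```

If the resulting limit exceeds what your nodes can offer, lower the concurrency limit rather than running pods close to the OOM threshold.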

Cluster certificate errors

Symptom

Connections between cluster components fail with TLS errors. SDM reports appliances as unreachable. Browser access to the SDM web interface shows a certificate warning. Logs contain messages such as:

TLS handshake failed: x509: certificate has expired or is not yet valid
TLS handshake failed: x509: certificate signed by unknown authority

Cause

  • Internal cluster certificates have expired and have not been renewed.
  • A custom CA certificate used to sign internal certificates is not trusted by all components.
  • The system clock on one or more nodes is incorrect, causing certificate validity window checks to fail.

Solution

  1. Identify which certificate is causing the failure by examining the TLS error in detail:
openssl s_client -connect <host>:<port> -showcerts </dev/null 2>/dev/null | openssl x509 -noout -dates
  2. Check certificate expiration across cluster components. See Certificate Management for the full list of certificates used and the renewal procedure.
  3. Renew expired certificates following the procedure documented in Certificate Management. After renewal, restart the affected services.
  4. Verify that the system clock is synchronized on all nodes:
timedatectl status
chronyc tracking

If clock drift is detected, synchronize with NTP and verify that ntpd or chronyd is running and configured.

  1. If an "unknown authority" error is present, ensure that the custom CA certificate is distributed to and trusted by all cluster components and the SDM host. See Certificate Management for CA distribution steps.