Architecture overview
Files are uploaded to Spectra Detect either directly via API or through connectors. Spectra Detect offers a variety of connectors for different storage types, such as S3, ADL, SMB, and others. In addition, emails can be sent for analysis using the SMTP connector.
File analysis is performed by Spectra Detect Workers, which are deployed in an autoscaling cluster. During idle times, the number of Worker instances is reduced to save resources. When the load is high, the number of instances increases to accommodate the demand.
There are two types of Workers in Spectra Detect:
- Regular Workers - used for standard file processing. The minimum number of instances is 1 to ensure normal functioning of Spectra Detect.
- Large File Workers - used for large and complex files such as large archives and disk images. Normally, the number of Large File Workers is 0, and it increases only when there are large files that need to be processed. The threshold at which Large File Workers are spun up can be configured in Spectra Detect Manager. To enable Large File Workers, follow the instructions here.
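The split between the two Worker types can be illustrated with a minimal sketch. It assumes the threshold is a file size in bytes; the names `LARGE_FILE_THRESHOLD_BYTES` and `select_worker_pool` and the 2 GiB value are hypothetical and are not part of Spectra Detect. The real threshold is whatever is configured in Spectra Detect Manager.

```python
# Hypothetical threshold for illustration only; the actual value is
# configured in Spectra Detect Manager.
LARGE_FILE_THRESHOLD_BYTES = 2 * 1024**3  # 2 GiB (assumed)


def select_worker_pool(size_bytes: int,
                       threshold: int = LARGE_FILE_THRESHOLD_BYTES) -> str:
    """Return the name of the Worker pool that should process a file.

    Files at or above the threshold go to the large file pool,
    everything else goes to the regular pool.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "large-file" if size_bytes >= threshold else "regular"
```

For example, `select_worker_pool(10 * 1024**3)` selects the large file pool, while a 5 MB document stays with the regular Workers.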
All Workers are deployed and autoscaled as a single set of applications; their constituent services are not autoscalable separately from one another.
Workers in the autoscaling cluster share the database and the RabbitMQ queue.
Worker reports are stored in /scratch/report-uuid, which is on EFS. A reference to each report (UUID) is held in the database. This means that any Worker can return the report. Requests for the report go to the Kubernetes Ingress, which sends them to one of the available Workers, and that Worker fetches the report from /scratch and returns it.
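The reason any Worker can serve any report is that both the report files and the report index are shared. The following is a minimal sketch of that idea, not the actual Worker implementation: the `Worker` class, its methods, and the `report-<uuid>` file naming are assumptions made for illustration, a local directory stands in for /scratch on EFS, and a Python set stands in for the database.

```python
import json
import uuid
from pathlib import Path


class Worker:
    """Toy Worker that shares a scratch directory and a report index with its peers."""

    def __init__(self, name: str, scratch: Path, report_index: set):
        self.name = name
        self.scratch = scratch            # stands in for /scratch on shared storage
        self.report_index = report_index  # stands in for the shared database

    def store_report(self, report: dict) -> str:
        """Write the report to shared storage and record its UUID in the index."""
        report_id = str(uuid.uuid4())
        (self.scratch / f"report-{report_id}").write_text(json.dumps(report))
        self.report_index.add(report_id)
        return report_id

    def fetch_report(self, report_id: str) -> dict:
        """Return a report by UUID, regardless of which Worker produced it."""
        if report_id not in self.report_index:
            raise KeyError(f"unknown report: {report_id}")
        return json.loads((self.scratch / f"report-{report_id}").read_text())
```

If two `Worker` objects are created with the same directory and the same index, a report stored by one can be fetched by the other, which mirrors how the Ingress can hand a report request to whichever Worker is available.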
The Hub is deployed, but it is used only to run connectors, not for load balancing. Kubernetes Ingress is used for that instead.
Spectra Detect Manager is used for Worker configuration. When Kubernetes HPA spins up a new Worker, the Worker is automatically detected and added to Spectra Detect Manager, which then pushes the configuration to it.
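To make the autoscaling behavior more concrete, the sketch below reproduces the core replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up and clamped to the configured bounds. The function name, the metric, and the bounds are illustrative assumptions; which metric Spectra Detect actually scales on is not stated here. The sketch also omits the HPA tolerance band and stabilization windows, and it does not model scaling up from zero replicas.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Approximate the HPA rule: ceil(current * metric / target), then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result within the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, raw))
```

With a minimum of 1, as for Regular Workers, a lightly loaded cluster settles at one instance, while sustained load above the target raises the count up to the configured maximum.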
Spectra Detect can be configured to output reports and/or processed files to various storage types (S3, SMB, ADL, etc.).