
Integrating AWS S3 with Lambda for Spectra Detect

Introduction

This guide describes how to deploy a single AWS CloudFormation template that automatically analyzes files uploaded to Amazon S3 using Spectra Detect.

The CloudFormation template (cf-pipeline-v1.yaml, shown below) provisions an AWS Lambda function that processes files uploaded to an S3 bucket, using the Spectra Detect service (TiScale) for analysis. The Lambda function retrieves an API key from AWS Secrets Manager, sends the file to the Spectra Detect endpoint, and uploads the resulting JSON report, with appropriate tags, to another S3 bucket.

The CloudFormation template provisions the following resources:

  • A Lambda function
  • A corresponding IAM role and policy with permissions to access the S3 buckets, fetch the secret, and tag the resulting JSON with the classification
  • Two S3 buckets (input and output)
  • A Secrets Manager secret storing an API key (with a randomized value)
info

Unlike the integration with the S3 connector, this procedure connects to Spectra Detect from the Lambda function using an API key. This setup has been tested with Spectra Detect 5.5.0.

Requirements

Spectra Detect EC2

Obtain the Spectra Detect API key by following the corresponding guide.

Refer to the AWS guide on how to create and configure an EC2 instance if that is the preferred deployment method.
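
Optionally, confirm that the Spectra Detect endpoint and API key work before deploying the stack. A minimal manual check (a sketch; the host, key, and sample.bin are placeholders, -k skips certificate verification, and the upload path matches the one used by the Lambda function below):

Example
curl -k \
  -H "Authorization: Token <your-api-key>" \
  -F "file=@sample.bin" \
  https://<spectra-detect-host>/api/tiscale/v1/upload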

Prerequisites

AWS CLI: Install and configure the AWS Command Line Interface, following the instructions here

IAM Permissions: The AWS account used to deploy the CloudFormation template must have permissions to:

  • Deploy CloudFormation stacks
  • Create IAM roles
  • Create and manage Secrets Manager secrets
  • Create S3 buckets and objects

Optionally, configure a dedicated AWS CLI profile for this deployment:
Example
aws configure --profile rl-profile
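
To confirm the profile works, check the caller identity:

Verify profile
aws sts get-caller-identity --profile rl-profile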

CloudFormation Stack Deployment

The CloudFormation template used in this example follows. The resource names are arbitrary and can be changed; keep in mind that S3 bucket names must be globally unique.

CloudFormation template
AWSTemplateFormatVersion: '2010-09-09'
Description: Spectra Detect pipeline v1

Resources:

  SpectraDetectRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: SpectraDetectRole-01
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: SpectraDetectPolicy01
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                  - s3:ListBucket
                  - s3:PutObjectTagging
                Resource:
                  - arn:aws:s3:::spectra-detect-input-01/*
                  - arn:aws:s3:::spectra-detect-results-01/*
                  - arn:aws:s3:::spectra-detect-input-01
              - Effect: Allow
                Action:
                  - secretsmanager:GetSecretValue
                Resource:
                  - !Sub "arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:/spectra-detect/apiKey*"

  SDApiKey:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: /spectra-detect/apiKey
      Description: Spectra Detect API key
      GenerateSecretString:
        SecretStringTemplate: '{"apiKey":"mock-key"}'
        GenerateStringKey: apiKey
        ExcludePunctuation: true
        PasswordLength: 32

  InputBucket:
    Type: AWS::S3::Bucket
    # Ensure the invoke permission exists before S3 validates the notification target
    DependsOn: SDInvokePermission
    Properties:
      BucketName: spectra-detect-input-01
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt SpectraDetectFunction01.Arn

  ResultsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: spectra-detect-results-01

  SpectraDetectFunction01:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: SpectraDetectFunction-01
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt SpectraDetectRole.Arn
      MemorySize: 1024
      Timeout: 900
      Code:
        ZipFile: |
          import os
          import json
          import time
          import boto3
          import urllib3
          from urllib import parse

          s3 = boto3.client("s3")
          secrets = boto3.client("secretsmanager")

          TISCALE_HOST = os.environ["TISCALE_HOST"]
          TISCALE_URL = f"{TISCALE_HOST}/api/tiscale/v1/upload"
          RESULTS_BUCKET = os.environ["RESULTS_BUCKET"]
          SECRET_ID = os.environ["SECRET_ID"]
          # The API key is fetched once per Lambda execution environment
          TISCALE_TOKEN = json.loads(secrets.get_secret_value(SecretId=SECRET_ID)["SecretString"])["apiKey"]
          VERIFY_CERTS = os.environ["VERIFY_CERTS"].lower() == "true"
          RETRIES_NUMBER = int(os.environ["RETRIES_NUMBER"])
          WAIT_TIME = int(os.environ["WAIT_TIME"])

          def submit_to_tiscale(local_path):
              # Upload the sample to the Spectra Detect (TiScale) endpoint as multipart/form-data
              http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED" if VERIFY_CERTS else "CERT_NONE")
              with open(local_path, "rb") as file_handle:
                  file_bytes = file_handle.read()
              response = http.request(
                  method="POST",
                  url=TISCALE_URL,
                  headers={"Authorization": f"Token {TISCALE_TOKEN}"},
                  fields={"file": (local_path, file_bytes, "application/octet-stream")},
                  encode_multipart=True
              )
              print("Submit to Spectra Detect successful.")
              return json.loads(response.data.decode())

          def get_report(task_url):
              # Fetch the analysis report; returns None until the task is processed
              http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED" if VERIFY_CERTS else "CERT_NONE")
              response = http.request(
                  method="GET",
                  url=task_url,
                  headers={"Authorization": f"Token {TISCALE_TOKEN}"}
              )
              response_dict = json.loads(response.data.decode())
              return response_dict if response_dict.get("processed") else None

          def download_from_s3(bucket, file_key, local_path):
              s3.download_file(bucket, file_key, local_path)
              print(f"Downloaded to {local_path} ({os.path.getsize(local_path)} bytes)")

          def map_from_numeric(numeric_classification):
              # Map the numeric classification from the report to a human-readable tag
              return {
                  0: "unknown",
                  1: "goodware",
                  2: "suspicious",
                  3: "malicious"
              }[numeric_classification]

          def lambda_handler(event, context):
              for record in event["Records"]:
                  bucket = record["s3"]["bucket"]["name"]
                  raw_key = record["s3"]["object"]["key"]
                  file_key = parse.unquote_plus(raw_key)
                  print(f"Processing s3://{bucket}/{file_key}")
                  local_path = f"/tmp/{file_key}"
                  download_from_s3(bucket, file_key, local_path)
                  submit_resp = submit_to_tiscale(local_path)
                  task_url = submit_resp.get("task_url")
                  report = None
                  for _ in range(RETRIES_NUMBER + 1):
                      report = get_report(task_url)
                      if report:
                          break
                      time.sleep(WAIT_TIME)
                  if not report:
                      raise Exception(f"Timeout fetching report for {file_key}")
                  output_key = f"{file_key}.json"
                  s3.put_object(
                      Bucket=RESULTS_BUCKET,
                      Key=output_key,
                      Body=json.dumps(report),
                      ContentType="application/json"
                  )
                  verbose = map_from_numeric(report["tc_report"][0]["classification"]["classification"])
                  s3.put_object_tagging(
                      Bucket=RESULTS_BUCKET,
                      Key=output_key,
                      Tagging={'TagSet': [{'Key': 'classification', 'Value': verbose}]}
                  )
                  print(f"Tagged successfully with {verbose}")
              return {"statusCode": 200}
      Environment:
        Variables:
          RESULTS_BUCKET: spectra-detect-results-01
          TISCALE_HOST: https://changethis-to-real-spectra-detect-url
          RETRIES_NUMBER: "5"
          SECRET_ID: /spectra-detect/apiKey
          VERIFY_CERTS: "false"
          WAIT_TIME: "3"

  SDInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref SpectraDetectFunction01
      Action: lambda:InvokeFunction
      Principal: s3.amazonaws.com
      SourceArn: arn:aws:s3:::spectra-detect-input-01

Outputs:
  InputBucketName:
    Description: Name of the input bucket
    Value: !Ref InputBucket

  ResultsBucketName:
    Description: Name of the results bucket
    Value: !Ref ResultsBucket

  FunctionName:
    Description: Name of the Lambda function
    Value: !Ref SpectraDetectFunction01

  SecretARN:
    Description: ARN of the SecretsManager secret
    Value: !Ref SDApiKey

Use the following command to deploy the cf-pipeline-v1.yaml template to the us-east-1 region:

CloudFormation deployment
aws cloudformation deploy \
  --template-file cf-pipeline-v1.yaml \
  --stack-name SpectraDetectV1-CF \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
  --region us-east-1

Verify Deployment

Check the status of your stack:

Verify
aws cloudformation describe-stacks \
  --stack-name SpectraDetectV1-CF

Verify the output names of the resources:

Verify output
aws cloudformation describe-stacks \
  --stack-name SpectraDetectV1-CF \
  --region us-east-1 \
  --query "Stacks[0].Outputs"

Architecture Overview

info

It is assumed that Spectra Detect is already running on an EC2 instance, accessible via a public or private endpoint (TISCALE_HOST).

AWS Topology

The CloudFormation template provisions:

1. Lambda function

  • Deployed with the Python 3.9 runtime and an inline script
  • Configured with an S3 bucket trigger - the Lambda function executes when a new file is created or uploaded

Triggers

  • Configured with a role to access the S3 buckets and to fetch the secret (API key) from Secrets Manager

  • Configured with environment variables:

    • TISCALE_HOST: Hostname of the Spectra Detect instance. This value must be changed after template deployment (enter the real IP or DNS name of the Spectra Detect instance; see the example after this list).
    • RESULTS_BUCKET: Output bucket name.
    • SECRET_ID: Name of the Secrets Manager secret containing the API key.
    • VERIFY_CERTS: Set to true to verify SSL certificates, or false to skip verification.
    • WAIT_TIME: Time in seconds between report fetching retries. Default is 3 seconds.
    • RETRIES_NUMBER: Number of report fetching retries. Default is 5.
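
A sketch of that post-deployment change with the AWS CLI (note that update-function-configuration replaces the entire environment, so all variables are listed; the host value is a placeholder):

Update environment variables
aws lambda update-function-configuration \
  --function-name SpectraDetectFunction-01 \
  --region us-east-1 \
  --environment "Variables={TISCALE_HOST=https://<spectra-detect-host>,RESULTS_BUCKET=spectra-detect-results-01,SECRET_ID=/spectra-detect/apiKey,VERIFY_CERTS=false,WAIT_TIME=3,RETRIES_NUMBER=5}"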

2. IAM Role and policies

  • A single role with AWSLambdaBasicExecutionRole (CloudWatch Logs permissions) and SpectraDetectPolicy, which allows read/write access to the S3 buckets and fetching the secret.
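
To confirm the role after deployment, for example:

Verify role
aws iam get-role --role-name SpectraDetectRole-01 --query "Role.Arn"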

3. Two S3 buckets

  • In this example: spectra-detect-input-01 and spectra-detect-results-01

4. Secrets Manager secret storing an API key (with a randomized value)

  • In this example: /spectra-detect/apiKey
  • The secret value must be changed after template deployment (enter the real API key from the Spectra Detect instance), as shown below.
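
A sketch of that update with the AWS CLI (replace the placeholder with the real key):

Update secret
aws secretsmanager put-secret-value \
  --secret-id /spectra-detect/apiKey \
  --secret-string '{"apiKey":"<real-api-key>"}' \
  --region us-east-1
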
info

Acquire the Spectra Detect license (API key) by contacting support@reversinglabs.com.

Data-Processing Workflow

Data-Processing Workflow Image

  1. A file is uploaded to the input bucket.

  2. The Lambda function is triggered on each upload event.

  3. The function:

  • Downloads the file (sample)
  • Retrieves the Spectra Detect API key from Secrets Manager
  • Sends the sample to the Spectra Detect endpoint

  4. Spectra Detect analyzes the sample and returns a JSON result.

  • Analysis report description available here

  5. The Lambda function tags and uploads the JSON to the output bucket, using the classification value to determine object tags.
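
To exercise the workflow end to end, upload any file to the input bucket (sample.bin is an arbitrary placeholder):

Upload sample
aws s3 cp sample.bin s3://spectra-detect-input-01/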

Data-Processing Workflow Tags

Classification-to-Tag Mapping

Classification Code    Tag
0                      unknown
1                      goodware
2                      suspicious
3                      malicious
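
The applied tag can be inspected on the resulting object; the output key mirrors the uploaded file name with a .json suffix:

Check tags
aws s3api get-object-tagging \
  --bucket spectra-detect-results-01 \
  --key sample.bin.json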

(Optional) VPC-Only Communication

To restrict traffic to the AWS backbone:

1. VPC Setup

  • Ensure a VPC and subnets exist.
  • Launch or move the Spectra Detect EC2 instance into the target subnet.

2. Lambda Networking

  • Configure your Lambda function to use the VPC and subnets.
  • Attach appropriate security-group rules for inbound/outbound traffic.

3. S3 Endpoint

  • Create an S3 VPC endpoint (Gateway) and an S3 access point in the VPC.
  • Update bucket policies to allow access via the VPC endpoint only.

This configuration ensures that all data transfer between Lambda, S3, and EC2 never traverses the public internet.
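
A sketch of the gateway endpoint creation, assuming placeholder VPC and route table IDs and the us-east-1 region:

Create S3 endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0123456789abcdef0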

Monitoring and Logging

To monitor and log the Lambda function's execution:

  • Use Amazon CloudWatch to view logs and metrics: Log groups -> /aws/lambda/SpectraDetectFunction-01 -> find the log stream and check the logs (or tail them from the CLI, as shown after this list).
  • Ensure the Lambda function has the AWSLambdaBasicExecutionRole policy attached, which allows it to write logs to CloudWatch.
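
The same logs can also be tailed from the terminal (AWS CLI v2):

Lambda logs
aws logs tail /aws/lambda/SpectraDetectFunction-01 --follow --region us-east-1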

Lambda function logs

Additional information

Useful links

ReversingLabs home page: https://www.reversinglabs.com/

ReversingLabs Spectra Detect: https://www.reversinglabs.com/products/spectra-detect

AWS: https://aws.amazon.com/