Integrating AWS S3 with Lambda for Spectra Detect
Introduction
This guide shows how to deploy a single AWS CloudFormation template that automatically analyzes files uploaded to Amazon S3 using Spectra Detect.
The CloudFormation template (cf-pipeline-v1.yaml) provisions an AWS Lambda function that processes files uploaded to an S3 bucket, using the Spectra Detect service (TiScale) for analysis. The Lambda function retrieves an API key from AWS Secrets Manager, sends the file to the Spectra Detect endpoint, and uploads the resulting JSON with appropriate tags to another S3 bucket.
The CloudFormation template provisions the following resources:
- A Lambda function
- A corresponding IAM role and policy with permissions to access the S3 buckets, fetch the secret, and tag the resulting JSON object with the classification
- Two S3 buckets (input and output)
- A Secrets Manager secret storing an API key (with a randomized value)
Unlike the integration with the S3 connector, this procedure connects to Spectra Detect from the Lambda function using an API key. This setup has been tested with Spectra Detect 5.5.0.
Requirements
- AWS account
- ReversingLabs Spectra Detect service package
- ReversingLabs Spectra Detect account
Spectra Detect EC2
Obtain the Spectra Detect API key by following this guide.
Refer to the AWS guide on how to create and configure an EC2 instance if that is the preferred way.
Prerequisites
AWS CLI: Install and configure the AWS Command Line Interface, following the instructions here
IAM Permissions: The AWS account used to deploy the CloudFormation template must have permissions to:
- Deploy CloudFormation stacks
- Create IAM roles
- Create and manage Secrets Manager secrets
- Create S3 buckets and objects
Optionally, create a dedicated AWS CLI profile for this deployment:
Example
aws configure --profile rl-profile
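To confirm the profile is configured correctly, check which identity it resolves to (rl-profile is the example profile name from above):
Verify profile
aws sts get-caller-identity --profile rl-profile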
CloudFormation Stack Deployment
The following CloudFormation template is used in this example. The resource names are arbitrary and can be changed; keep in mind that S3 bucket names must be globally unique.
CloudFormation template
AWSTemplateFormatVersion: '2010-09-09'
Description: Spectra Detect pipeline v1
Resources:
  SpectraDetectRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: SpectraDetectRole-01
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: SpectraDetectPolicy01
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                  - s3:ListBucket
                  - s3:PutObjectTagging
                Resource:
                  - arn:aws:s3:::spectra-detect-input-01/*
                  - arn:aws:s3:::spectra-detect-results-01/*
                  - arn:aws:s3:::spectra-detect-input-01
              - Effect: Allow
                Action:
                  - secretsmanager:GetSecretValue
                Resource:
                  - !Sub "arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:/spectra-detect/apiKey*"
  SDApiKey:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: /spectra-detect/apiKey
      Description: Spectra Detect API key
      GenerateSecretString:
        SecretStringTemplate: '{"apiKey":"mock-key"}'
        GenerateStringKey: apiKey
        ExcludePunctuation: true
        PasswordLength: 32
  InputBucket:
    Type: AWS::S3::Bucket
    # The invoke permission must exist before the bucket notification can be validated
    DependsOn: SDInvokePermission
    Properties:
      BucketName: spectra-detect-input-01
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: s3:ObjectCreated:*
            Function: !GetAtt SpectraDetectFunction01.Arn
  ResultsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: spectra-detect-results-01
  SpectraDetectFunction01:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: SpectraDetectFunction-01
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt SpectraDetectRole.Arn
      MemorySize: 1024
      Timeout: 900
      Code:
        ZipFile: |
          import os
          import json
          import time
          import boto3
          import urllib3
          from urllib import parse

          s3 = boto3.client("s3")
          secrets = boto3.client("secretsmanager")

          TISCALE_HOST = os.environ["TISCALE_HOST"]
          TISCALE_URL = f"{TISCALE_HOST}/api/tiscale/v1/upload"
          RESULTS_BUCKET = os.environ["RESULTS_BUCKET"]
          SECRET_ID = os.environ["SECRET_ID"]
          # Fetch the Spectra Detect API key once, at cold start
          TISCALE_TOKEN = json.loads(secrets.get_secret_value(SecretId=SECRET_ID)["SecretString"])["apiKey"]
          VERIFY_CERTS = os.environ["VERIFY_CERTS"].strip().lower() == "true"
          RETRIES_NUMBER = int(os.environ["RETRIES_NUMBER"])
          WAIT_TIME = int(os.environ["WAIT_TIME"])

          def submit_to_tiscale(local_path):
              # Upload the sample to the Spectra Detect (TiScale) endpoint
              http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED" if VERIFY_CERTS else "CERT_NONE")
              with open(local_path, "rb") as file_handle:
                  file_bytes = file_handle.read()
              response = http.request(
                  method="POST",
                  url=TISCALE_URL,
                  headers={"Authorization": f"Token {TISCALE_TOKEN}"},
                  fields={"file": (local_path, file_bytes, "application/octet-stream")},
                  encode_multipart=True
              )
              print("Submit to Spectra Detect successful.")
              return json.loads(response.data.decode())

          def get_report(task_url):
              # Returns the report once processing has finished, None otherwise
              http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED" if VERIFY_CERTS else "CERT_NONE")
              response = http.request(
                  method="GET",
                  url=task_url,
                  headers={"Authorization": f"Token {TISCALE_TOKEN}"}
              )
              response_dict = json.loads(response.data.decode())
              return response_dict if response_dict.get("processed") else None

          def download_from_s3(bucket, file_key, local_path):
              s3.download_file(bucket, file_key, local_path)
              print(f"Downloaded to {local_path} ({os.path.getsize(local_path)} bytes)")

          def map_from_numeric(numeric_classification):
              return {
                  0: "unknown",
                  1: "goodware",
                  2: "suspicious",
                  3: "malicious"
              }[numeric_classification]

          def lambda_handler(event, context):
              for record in event["Records"]:
                  bucket = record["s3"]["bucket"]["name"]
                  raw_key = record["s3"]["object"]["key"]
                  file_key = parse.unquote_plus(raw_key)
                  print(f"Processing s3://{bucket}/{file_key}")
                  # Use the base name so keys with prefixes still map to a valid /tmp path
                  local_path = f"/tmp/{os.path.basename(file_key)}"
                  download_from_s3(bucket, file_key, local_path)
                  submit_resp = submit_to_tiscale(local_path)
                  task_url = submit_resp.get("task_url")
                  # Poll for the report until processed or retries are exhausted
                  report = None
                  for _ in range(RETRIES_NUMBER + 1):
                      report = get_report(task_url)
                      if report:
                          break
                      time.sleep(WAIT_TIME)
                  if not report:
                      raise Exception(f"Timeout fetching report for {file_key}")
                  output_key = f"{file_key}.json"
                  s3.put_object(
                      Bucket=RESULTS_BUCKET,
                      Key=output_key,
                      Body=json.dumps(report),
                      ContentType="application/json"
                  )
                  # Tag the result with the human-readable classification
                  verbose = map_from_numeric(report["tc_report"][0]["classification"]["classification"])
                  s3.put_object_tagging(
                      Bucket=RESULTS_BUCKET,
                      Key=output_key,
                      Tagging={'TagSet': [{'Key': 'classification', 'Value': verbose}]}
                  )
                  print(f"Tagged successfully with {verbose}")
              return {"statusCode": 200}
      Environment:
        Variables:
          RESULTS_BUCKET: spectra-detect-results-01
          TISCALE_HOST: https://changethis-to-real-spectra-detect-url
          RETRIES_NUMBER: "5"
          SECRET_ID: /spectra-detect/apiKey
          VERIFY_CERTS: "false"
          WAIT_TIME: "3"
  SDInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref SpectraDetectFunction01
      Action: lambda:InvokeFunction
      Principal: s3.amazonaws.com
      SourceArn: arn:aws:s3:::spectra-detect-input-01
Outputs:
  InputBucketName:
    Description: Name of the input bucket
    Value: !Ref InputBucket
  ResultsBucketName:
    Description: Name of the results bucket
    Value: !Ref ResultsBucket
  FunctionName:
    Description: Name of the Lambda function
    Value: !Ref SpectraDetectFunction01
  SecretARN:
    Description: ARN of the SecretsManager secret
    Value: !Ref SDApiKey
Use the following command to deploy the cf-pipeline-v1.yaml template to the us-east-1 region:
CloudFormation deployment
aws cloudformation deploy \
--template-file cf-pipeline-v1.yaml \
--stack-name SpectraDetectV1-CF \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--region us-east-1
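If the deployment fails with a template error, the template syntax can be checked locally first using the standard validate-template command:
Validate template
aws cloudformation validate-template \
--template-body file://cf-pipeline-v1.yaml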
Verify Deployment
Check the status of your stack:
Verify
aws cloudformation describe-stacks \
--stack-name SpectraDetectV1-CF
Verify the output names of the resources:
Verify output
aws cloudformation describe-stacks \
--stack-name SpectraDetectV1-CF \
--region us-east-1 \
--query "Stacks[0].Outputs"
Architecture Overview
It is assumed that Spectra Detect is already running on an EC2 instance, accessible via a public or private endpoint (TISCALE_HOST).
The CloudFormation template provisions:
1. Lambda function
- Deployed with Python 3.9 and the inline script
- Configured with an S3 bucket trigger: the Lambda executes when a new file is created or uploaded
- Configured with a role to access the S3 buckets and to fetch the secret (API key) from Secrets Manager
- Configured with environment variables:
  - TISCALE_HOST: Host name of the Spectra Detect instance. This value needs to be changed after template deployment (enter the real IP or DNS name of the Spectra Detect instance); see the post-deployment example below.
  - RESULTS_BUCKET: Output bucket name.
  - SECRET_ID: Name of the Secrets Manager secret containing the API key.
  - VERIFY_CERTS: Set to true to verify SSL certificates, or false to skip verification.
  - WAIT_TIME: Time in seconds between report fetching retries. Default is 3 seconds.
  - RETRIES_NUMBER: Number of report fetching retries. Default is 5.
2. IAM Role and policies
- A single role with AWSLambdaBasicExecutionRole (CloudWatch permissions) and SpectraDetectPolicy01 (allows read/write access to the S3 buckets and fetching the secret).
3. Two S3 buckets
- i.e. spectra-detect-input-01 and spectra-detect-results-01
4. Secrets Manager secret storing an API key (with a randomized value)
- i.e. /spectra-detect/apiKey
- The secret value needs to be changed after template deployment (enter the real API key from the Spectra Detect instance); see the post-deployment example below.
Acquire the Spectra Detect license (API key) by contacting support@reversinglabs.com
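Both values flagged above can be set from the CLI after the stack is created. This is a sketch with placeholder values (YOUR-REAL-API-KEY and https://your-spectra-detect-host are assumptions to replace with your own); note that update-function-configuration overwrites the entire set of environment variables, so all of them must be passed again:
Post-deployment configuration
aws secretsmanager put-secret-value \
--secret-id /spectra-detect/apiKey \
--secret-string '{"apiKey":"YOUR-REAL-API-KEY"}' \
--region us-east-1

aws lambda update-function-configuration \
--function-name SpectraDetectFunction-01 \
--environment 'Variables={TISCALE_HOST=https://your-spectra-detect-host,RESULTS_BUCKET=spectra-detect-results-01,SECRET_ID=/spectra-detect/apiKey,VERIFY_CERTS=false,WAIT_TIME=3,RETRIES_NUMBER=5}' \
--region us-east-1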
Data-Processing Workflow
- A file is uploaded to the input bucket.
- The Lambda function is triggered on each upload event.
- The function:
  - Downloads the file (sample)
  - Retrieves the Spectra Detect API key from Secrets Manager
  - Sends the sample to the Spectra Detect endpoint
- Spectra Detect analyzes the sample and returns a JSON result.
  - Analysis report description available here
- The Lambda function tags and uploads the JSON to the output bucket, using the classification value to determine object tags.
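To sanity-check the workflow end to end, upload any local file to the input bucket and, once the Lambda has finished, read the classification tag from the resulting JSON object (sample.bin is an arbitrary example file name):
Workflow test
aws s3 cp ./sample.bin s3://spectra-detect-input-01/

aws s3api get-object-tagging \
--bucket spectra-detect-results-01 \
--key sample.bin.json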
Classification-to-Tag Mapping
Classification Code | Tag |
---|---|
0 | unknown |
1 | goodware |
2 | suspicious |
3 | malicious |
(Optional) VPC-Only Communication
To restrict traffic to the AWS backbone:
1. VPC Setup
- Ensure a VPC and subnets exist.
- Launch or move the Spectra Detect EC2 instance into the target subnet.
2. Lambda Networking
- Configure your Lambda function to use the VPC and subnets.
- Attach appropriate security-group rules for inbound/outbound traffic.
3. S3 Endpoint
- Create an S3 VPC endpoint (Gateway) and an S3 access point in the VPC.
- Update bucket policies to allow access via the VPC endpoint only.
This configuration ensures that all data transfer between Lambda, S3, and EC2 never traverses the public internet.
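As a minimal sketch, assuming placeholder IDs (vpc-0123456789abcdef0 and rtb-0123456789abcdef0) that must be replaced with your own, the Gateway endpoint for S3 can be created as follows:
S3 Gateway endpoint
aws ec2 create-vpc-endpoint \
--vpc-id vpc-0123456789abcdef0 \
--service-name com.amazonaws.us-east-1.s3 \
--vpc-endpoint-type Gateway \
--route-table-ids rtb-0123456789abcdef0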
Monitoring and Logging
To monitor and log the Lambda function's execution:
- Use Amazon CloudWatch to view logs and metrics: Log groups -> /aws/lambda/SpectraDetectFunction-01 -> find the log stream and check the logs.
- Ensure the Lambda function has the AWSLambdaBasicExecutionRole policy attached, which allows it to write logs to CloudWatch.
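The same log group can also be followed from the terminal (aws logs tail is available in AWS CLI v2):
Tail logs
aws logs tail /aws/lambda/SpectraDetectFunction-01 --follow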
Additional information
Useful links
ReversingLabs home page: https://www.reversinglabs.com/
ReversingLabs Spectra Detect: https://www.reversinglabs.com/products/spectra-detect