bucket-av-scanner

Table of contents

  • Abstract
  • Main folder structure
  • Tools and libraries
  • Architecture diagram
  • Assumptions, requirements and considerations
  • Deployment pipeline and testing with the Makefile
  • Configuration parameters
  • Paths
  • Networking
  • Technical decisions
  • Security and best practices
  • Known issues
  • Moving to production and possible upgrades
  • Useful links
  • Possible alternatives

Abstract

This document explains the application workflow, setup and configuration of bucket-av-scanner. The solution automates the scanning of newly uploaded files by combining bucket event notifications, a Ruby script and clamd.

  • Scalability is handled by decoupling the notifications and the scanning script through a RabbitMQ queue.
  • Storage is implemented in MinIO buckets.
  • ClamD scanning is exposed through a REST API.
  • The entire solution runs on standard public container images.
  • The workflow is managed by a Makefile.
  • Container deployment is defined in a docker-compose YAML manifest.
  • Setup is done by custom entrypoint scripts.
  • HA and resiliency are ensured by healthchecks and volumes.
  • The entire solution uses open-source tools.

Main folder structure

Files and scripts have been distributed as follows:

├── Makefile                _# Makefile defining the workflow: deploying, testing and wrapping up the application_
├── README.md               _# project documentation_
├── bucket-av-scanner.png   _# main architecture diagram used in README.md_
├── docker-compose          _# folder containing the docker-compose manifest and custom entrypoint scripts_
│   ├── docker-compose.yml                                _# main docker-compose manifest defining the services_
│   ├── docker-compose_avscan-script_entrypoint.sh        _# custom entrypoint script for the avscan-script service_
│   ├── docker-compose_minio-service-init_entrypoint.sh   _# custom entrypoint script for minio-service initialization_
│   ├── docker-compose_minio-service_entrypoint.sh        _# custom entrypoint script for the minio-service service_
│   └── docker-compose_mq-service-init_entrypoint.sh      _# custom entrypoint script for mq-service initialization_
└── testing                 _# folder containing testing scripts_
    └── bucket-av-scanner_tests.sh   _# testing script with clean and infected (EICAR) cases_

Tools and libraries

The following tools have been used:

  • av-scan script

    • ruby [3.0.2p107] # runs the main script
    • ruby gems
      • aws-sdk-s3 [1.114.0] # downloads newly uploaded files from MinIO
      • json [2.6.2] # post-scan notification formatting
      • uri [0.11.0] # request and notification fields management
      • yaml [0.2.0] # request and notification fields management
      • logger [1.5.1] # script message logging
      • securerandom [0.2.0] # randomizes the local filename on the avscan container before scanning
      • bunny [2.19.0] # MQ queue subscription and management
      • net [0.3.3] # request and notification network communication
      • rest-client [2.1.0] # clamd POST requests
  • Queue management

    • RabbitMQ [3.8.34] # MQ queue implementation
  • Bucket storage solution

    • MinIO [2022-05-26T05:48:41Z] # bucket implementation
  • Antivirus solution

    • ClamAV 0.104.3
  • Deploy

    • docker [20.10.12] # running standard docker images
    • docker-compose [1.29.2] # defining and creating docker services
  • Testing (see the test sketch below)

    • aws cli [2.7.30] # uploading and listing files
    • wget [1.21.3] # downloading the EICAR signature file
  • Docker container images

    | image name          | tag                          | size   | usage                          |
    |---------------------|------------------------------|--------|--------------------------------|
    | ajilaag/clamav-rest | latest                       | 263MB  | antivirus scanning             |
    | rabbitmq            | 3.8-management-alpine        | 148MB  | mq queue                       |
    | minio/minio         | RELEASE.2022-05-26T05-48-41Z | 376MB  | bucket storage                 |
    | minio/mc            | RELEASE.2022-05-09T04-08-26Z | 158MB  | bucket initialization          |
    | alpine/openssl      | latest                       | 8.04MB | TLS/SSL certificate generation |
    | ruby                | 2.7.0                        | 842MB  | av-scan script execution       |
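
These testing tools are combined in testing/bucket-av-scanner_tests.sh; a simplified sketch of its two cases is shown below (the EICAR download URL, the endpoint URL and the flags are assumptions, the real script may differ).

```sh
# Clean case: a harmless text file should stay in the bucket (and get tagged if TAG_FILES=true)
echo "clean content" > clean.txt
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 cp clean.txt s3://storagebucket/

# Infected case: the EICAR test signature should be detected (and deleted if DELETE_FILE=true)
wget -q https://secure.eicar.org/eicar.com.txt
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 cp eicar.com.txt s3://storagebucket/

# Check the result: list the bucket contents after both scans have been processed
aws --endpoint-url https://localhost:9000 --no-verify-ssl s3 ls s3://storagebucket/
```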

Architecture diagram

Execution workflow steps are as follows (see the architecture diagram in bucket-av-scanner.png):

  1. An external service uploads a file to the MinIO S3 bucket
  2. MinIO sends a PUT notification to the RabbitMQ queue
  3. The notification is read by the internal Ruby script subscribed to the queue
  4. The internal Ruby script temporarily downloads the file from the MinIO S3 bucket
  5. The internal Ruby script sends the file to clamd-service for scanning
  6. clamd-service scans the file using ClamAV and sends the response back to avscan-script
  7. The internal Ruby script notifies the content sharing service endpoint with the result (whether the file is infected or not)
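
The heart of this workflow is the worker loop inside avscan-script. The sketch below only illustrates steps 3 to 7 using the gems listed earlier (bunny, aws-sdk-s3, rest-client); the notification payload shape, the clamd request format and the helper values are assumptions, not a copy of the actual worker.rb.

```ruby
require "bunny"
require "aws-sdk-s3"
require "rest-client"
require "json"
require "securerandom"

# 3. subscribe to the queue receiving MinIO PUT notifications
conn = Bunny.new(host: ENV.fetch("RABBITMQ_ENDPOINT", "mq-service"),
                 user: ENV["RABBITMQ_REGULAR_USER_NAME"],
                 pass: ENV["RABBITMQ_REGULAR_USER_PASS"])
conn.start
queue = conn.create_channel.queue(ENV.fetch("RABBITMQ_QUEUE_NAME", "s3minioqueue"), durable: true)

s3 = Aws::S3::Client.new(endpoint:          ENV.fetch("MINIO_SERVER_URL", "https://minio-service:9000"),
                         region:            ENV.fetch("MINIO_REGION", "eu-west-1"),
                         access_key_id:     ENV["MINIO_USER_NAME"],
                         secret_access_key: ENV["MINIO_USER_PASSWORD"],
                         force_path_style:  true,
                         ssl_verify_peer:   false) # self-signed certificate, see "Known issues"

queue.subscribe(block: true) do |_delivery, _props, payload|
  record = JSON.parse(payload)["Records"].first     # MinIO publishes S3-style event records
  bucket = record["s3"]["bucket"]["name"]
  key    = record["s3"]["object"]["key"]
  local  = "/tmp/#{SecureRandom.hex(16)}"           # random local name, removed after the scan

  begin
    # 4. temporarily download the object from the MinIO bucket
    s3.get_object(bucket: bucket, key: key, response_target: local)

    # 5./6. send it to the clamd REST endpoint (ClamAV) and read back the verdict
    scan = RestClient::Request.execute(
      method:     :post,
      url:        "https://#{ENV.fetch('CLAMD_ENDPOINT', 'clamd-service')}:#{ENV.fetch('CLAMD_PORT', '9443')}/scan",
      payload:    { multipart: true, file: File.open(local, "rb") },
      verify_ssl: false)

    # 7. notify the content sharing service endpoint with the result
    RestClient.post(ENV["PUBLISH_URL"],
                    { bucket: bucket, key: key, result: scan.body }.to_json,
                    content_type: :json)
  ensure
    File.delete(local) if File.exist?(local)
  end
end
```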

The following three services are used only for initialization purposes and run only at deploy time (sketched below):

  • mq-service-init: initializes the mq-service queue, tags, users and bindings
  • minio-service-init: creates buckets and users on minio-service
  • openssl-init: generates a self-signed certificate for minio-service
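
For orientation only, the kind of commands these init entrypoints typically run is sketched below; the real commands live in the docker-compose_*_entrypoint.sh scripts, and the notification ARN, tool flags and certificate subject shown here are assumptions.

```sh
# minio-service-init (sketch): create the bucket and user, then route PUT events to AMQP
mc alias set local "$MINIO_SERVER_URL" "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" --insecure
mc mb --insecure --region "$MINIO_REGION" "local/$MINIO_BUCKET_NAME"
mc admin user add local "$MINIO_USER_NAME" "$MINIO_USER_PASSWORD" --insecure
mc event add --insecure "local/$MINIO_BUCKET_NAME" arn:minio:sqs::PRIMARY:amqp --event put

# mq-service-init (sketch): declare the exchange, queue and binding for the notifications
rabbitmqadmin -u "$RABBITMQ_DEFAULT_USER" -p "$RABBITMQ_DEFAULT_PASS" \
  declare exchange name="$RABBITMQ_TOPIC" type=topic durable=true
rabbitmqadmin -u "$RABBITMQ_DEFAULT_USER" -p "$RABBITMQ_DEFAULT_PASS" \
  declare queue name="$RABBITMQ_QUEUE_NAME" durable=true
rabbitmqadmin -u "$RABBITMQ_DEFAULT_USER" -p "$RABBITMQ_DEFAULT_PASS" \
  declare binding source="$RABBITMQ_TOPIC" destination="$RABBITMQ_QUEUE_NAME" \
  routing_key="$RABBITMQ_QUEUE_ROUTING_KEY"

# openssl-init (sketch): self-signed certificate for minio-service
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -subj "/CN=minio-service" -keyout private.key -out public.crt
```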

Assumptions, requirements and considerations

Running the Makefile assumes:

  • aws cli is installed and accessible in the PATH.
  • the docker daemon is running.
  • docker-compose is installed and accessible in the PATH.
  • internet access is available (DockerHub and gem sources) for downloading public images and gem libraries.
  • other tools like make, netcat and wget are installed and accessible in the PATH.

Deployment pipeline and testing with the Makefile

The following stages have been defined in the Makefile:

  • make all (< 15m) runs the whole workflow sequentially: "make deploy-docker", "make test" and "make clean-docker"
  • make deploy-docker (< 11m) creates all the needed resources on the local docker daemon
  • make test (< 10s) runs the tests sequentially by uploading a clean file and an infected file
  • make clean-docker (< 5m) deletes the docker-compose resources created during the deploy
  • make logs (< 5s) shows the log output of the resources created during the deploy
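
Roughly, these targets map to the following commands (a sketch; the exact recipes are in the Makefile):

```sh
make deploy-docker   # ~ docker-compose -f docker-compose/docker-compose.yml up -d
make test            # ~ ./testing/bucket-av-scanner_tests.sh
make clean-docker    # ~ docker-compose -f docker-compose/docker-compose.yml down --volumes
make logs            # ~ docker-compose -f docker-compose/docker-compose.yml logs --follow
make all             # ~ deploy-docker + test + clean-docker
```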

Configuration parameters

Most important configuration parameters are customizable as follows:

  • MINIO_SERVER_URL: service name and port for accessing "minio-service" service. Default: https://minio-service:9000
  • MINIO_ENDPOINT: service name for accessing "minio-service" service. Default: minio-service
  • MINIO_ROOT_USER: admin user name for "minio-service" service. Default: minioadmin
  • MINIO_ROOT_PASSWORD: admin user password for "minio-service" service.
  • MINIO_USER_NAME: regular user name for "minio-service" service. Default: miniouser
  • MINIO_USER_PASSWORD: regular user password for "minio-service" service.
  • MINIO_BUCKET_NAME: bucket name created on "minio-service" service. Default: storagebucket
  • MINIO_REGION: emulated bucket region on "minio-service" service. Default: eu-west-1
  • RABBITMQ_ENDPOINT: service name for accessing "rabbitmq-service" service. Default: mq-service
  • RABBITMQ_PORT: service port number for accessing "rabbitmq-service" service. Default: 15672
  • RABBITMQ_DEFAULT_USER: admin user name for "rabbitmq-service" service. Default: rabbitadmin
  • RABBITMQ_DEFAULT_PASS: admin user password for "rabbitmq-service" service.
  • RABBITMQ_REGULAR_USER_NAME: regular user name for "rabbitmq-service" service. Default: rabbituser
  • RABBITMQ_REGULAR_USER_PASS: regular user password for "rabbitmq-service" service.
  • RABBITMQ_QUEUE_ROUTING_KEY: MQ routing key for directing MinIO notifications to the MQ queue. Default: bucket_notifications
  • RABBITMQ_QUEUE_NAME: MQ queue name where MinIO notifications are delivered. Default: s3minioqueue
  • RABBITMQ_TOPIC: MQ topic (exchange) to which MinIO notifications are redirected. Default: s3minioscan
  • RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS: additional RabbitMQ server settings to set the console log level to "error".
  • CLAMD_ENDPOINT: service name for accessing "clamd-service" service. Default: clamd-service
  • CLAMD_PORT: service port number for accessing "clamd-service" service. Default: 9443
  • DELETE_FILE: option enabling deletion of the file from the bucket if it is infected. Default: true
  • TAG_FILES: option enabling tagging of scanned files on the bucket. Default: true
  • TAG_KEY: tag key added to scanned files if TAG_FILES is true. Default: scanned
  • VOLUME_SIZE: maximum scanned file size (in GB). Default: 2
  • REPORT_CLEAN: option enabling notification of clean (non-infected) scan results as well. Default: false
  • PUBLISH_URL: endpoint where the scanning results are sent. Default: https://some-service.example.com/notification
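
Options like DELETE_FILE, TAG_FILES, TAG_KEY and REPORT_CLEAN are consumed inside avscan-script. The sketch below illustrates, with assumed logic, how they could drive the post-scan actions through aws-sdk-s3; it is not the actual worker.rb implementation.

```ruby
require "aws-sdk-s3"

# Sketch only: apply the scan-behaviour options once clamd has returned a verdict.
# `s3` is an Aws::S3::Client; returns true if the result should be published to PUBLISH_URL.
def apply_scan_result(s3, bucket, key, infected)
  delete_file  = ENV.fetch("DELETE_FILE", "true") == "true"   # drop infected objects from the bucket
  tag_files    = ENV.fetch("TAG_FILES", "true") == "true"     # tag objects once scanned
  tag_key      = ENV.fetch("TAG_KEY", "scanned")
  report_clean = ENV.fetch("REPORT_CLEAN", "false") == "true" # also publish clean results

  if infected && delete_file
    s3.delete_object(bucket: bucket, key: key)
  elsif tag_files
    s3.put_object_tagging(bucket: bucket, key: key,
                          tagging: { tag_set: [{ key: tag_key, value: "true" }] })
  end

  infected || report_clean
end
```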

Paths

Paths and folders inside the different services have been distributed as follows:

| service name     | path                      | usage                     |
|------------------|---------------------------|---------------------------|
| minio-service    | /root/.minio/certs        | certificates              |
| minio-service    | /data/<bucket_name>       | bucket data               |
| minio-service    | /data/.minio.sys          | config files              |
| rabbitmq-service | /var/log/rabbitmq         | rabbitmq service log      |
| rabbitmq-service | /var/lib/rabbitmq         | rabbitmq service queue    |
| avscan-script    | /opt/av-scan/worker.rb    | avscan ruby script        |
| avscan-script    | /opt/av-scan/av-scan.conf | avscan configuration file |
| avscan-script    | /opt/av-scan/av-scan.log  | avscan log                |
| clamd-service    | /var/lib/clamav/          | clamd virus definitions   |
| clamd-service    | /usr/bin/entrypoint.sh    | entrypoint script         |
| clamd-service    | /etc/clamav/clamd.conf    | clamd configuration file  |
| clamd-service    | /var/log/clamav/clamd.log | clamd daemon log          |

Networking

The following ports and paths are accessible once deployed:

| service name  | port and path | protocol | usage                         |
|---------------|---------------|----------|-------------------------------|
| clamd-service | 3310/         | tcp      | ClamD listening port          |
| clamd-service | 9443/scan     | https    | ClamD HTTP scan               |
| clamd-service | 9443/metrics  | https    | ClamD Prometheus metrics      |
| minio-service | 9001/         | https    | MinIO Web Console             |
| minio-service | 9000/         | https    | MinIO bucket service          |
| mq-service    | 15672/        | tcp/http | RabbitMQ management plugin    |
| avscan-script | 8080/         | tcp      | avscan-script readiness check |

Technical decisions

The following decisions were made during the implementation:

  • all custom logging messages start with [SERVICE_NAME] for clear identification
  • using the official AWS S3 client instead of the MinIO client for uploading/downloading files to/from the bucket, since MinIO implements the same AWS S3 API and only basic functionality is used.
  • Docker images
    • pinned versions, avoiding misconfigurations caused by feature deprecation.
    • using the "rabbitmq management-alpine" image instead of the regular "rabbitmq" image: the required rabbitmq_management plugin is enabled by default on the management images, cannot be enabled through environment variables, and the size and software differences are minimal.
    • using the regular "ruby" image instead of "ruby-slim": additional libraries are needed, and installing them would extend the already long first boot even more.
  • Docker volumes for data and logging:
    • keeping state and container configuration across failure restarts
    • making data and logs accessible and visible to future additional components (e.g. a logging sidecar)
    • bind mounts for the entrypoints, avoiding the creation of new volumes
  • Custom container entrypoints
    • using dynamic "until ... do" loops instead of hardcoded sleep commands, avoiding race conditions on boot dependencies (see the sketch after this list)
    • using info/error logging levels where possible, avoiding unnecessary output and speeding up boot
    • using separate entrypoint script files, for easier debugging and understanding
  • restart policies
    • initialization services (openssl-init, minio-service-init, mq-service-init) need to run only once: they set up the other containers and then exit
    • long-running services (minio-service, mq-service, clamd-service) store their state and logs on volumes, and are restarted if anything wrong is detected by the healthcheck
  • avscan-script readiness check
    • launching a netcat listener on a port at the last moment, as a basic way to know that the gem installation has finished (see the sketch after this list)
  • Architecture
    • file-upload scalability is achieved because multiple avscan-script and clamd containers can be deployed subscribed to the same MQ queue
    • why not change the workflow so that the uploading service sends the file to clamd before storing it in the S3 bucket? I am assuming the following requirements:
      • the legacy application workflow must not be modified
      • the same working MinIO bucket/cluster may need to be kept
      • the solution does not scale unless the notification/script/scan chain is decoupled by a queue
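
As an illustration of the entrypoint and readiness decisions above, the avscan-script entrypoint pattern looks roughly like the sketch below; the service names and the readiness port come from this README, while the exact commands and dependency ports are assumptions.

```sh
# Wait for boot dependencies with dynamic loops instead of hardcoded sleeps (avoids race conditions)
until nc -z mq-service 5672; do
  echo "[AVSCAN-SCRIPT] waiting for mq-service..."; sleep 2
done
until nc -z clamd-service 3310; do
  echo "[AVSCAN-SCRIPT] waiting for clamd-service..."; sleep 2
done

# Install the gem dependencies, then open a dummy listener on port 8080 as a basic readiness
# signal: a healthcheck probing this port knows the gem installation has finished.
gem install bunny aws-sdk-s3 rest-client
nc -l -p 8080 &   # flag syntax varies between netcat implementations

exec ruby /opt/av-scan/worker.rb
```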

Security and best practices

  • in-transit traffic is encrypted using SSL certificates on minio-service (port 9000) and clamd-service (port 9443)
  • admin and default users are not used; regular non-privileged users (miniouser, rabbituser) are created and used instead
  • only the minimal necessary ports are exposed (no plain HTTP protocols), reducing the attack surface
  • only the necessary credentials and variables are exposed to each container (non-admin users)
  • random filenames are used inside avscan-script (securerandom gem), and every file is deleted after each scan
  • virus definition files are kept up to date by clamd-service

Known issues

  • dependencies between services/containers are not properly managed by docker-compose. As a workaround, each entrypoint script holds execution until the required service/endpoint is ready.
  • long initialization time for the avscan-script service (around 11 minutes): installing the gem dependencies for the Ruby script takes time during the initial boot. I would like to keep using public Docker images and dynamic initialization scripts (entrypoints) so the solution is easier to understand. As a workaround, a custom Docker image with the gem dependencies already installed could be built and used.
  • self-signed certificates: due to infrastructure limitations, the certificates generated with openssl at deploy time are self-signed. Although this still provides in-transit encryption, the certificates themselves are not trusted since they are not issued by a valid CA for a valid DNS domain. As a workaround, SSL verification has been disabled in the aws-sdk-s3 client and also in Chrome.

Moving to production and possible upgrades

Useful links

Possible alternatives
