TinyMCE AI on-premises: Production deployment guide

Architecture overview

Figure: production deployment topology, with a reverse proxy, AI service replicas, a database, and Redis behind TLS.

The AI service itself is stateless: it persists all state to MySQL/PostgreSQL and Redis, and scales horizontally behind a load balancer.

TLS / HTTPS

The AI service does not terminate Transport Layer Security (TLS). Place a reverse proxy in front.

Nginx example

server {
    listen 443 ssl;
    server_name ai.example.com;

    ssl_certificate     /etc/ssl/certs/ai.example.com.pem;
    ssl_certificate_key /etc/ssl/private/ai.example.com.key;

    location / {
        proxy_pass http://ai-service:8000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # SSE streaming support
        proxy_buffering off;
        proxy_cache off;
        proxy_read_timeout 300s;
    }
}

Server-Sent Events (SSE) streaming requires proxy_buffering off. Without it, AI responses appear to hang until the entire response is generated.

AWS ALB

  • Target group: HTTP on port 8000

  • Health check path: /health

  • Idle timeout: 300 seconds (for long AI responses)

  • Stickiness: not required (service is stateless)
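
These settings map to AWS CLI calls roughly as follows (a sketch; the VPC ID and load balancer ARN are placeholders):

aws elbv2 create-target-group \
  --name ai-service \
  --protocol HTTP \
  --port 8000 \
  --vpc-id vpc-0123456789abcdef0 \
  --target-type ip \
  --health-check-path /health

aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/ai-alb/0123456789abcdef \
  --attributes Key=idle_timeout.timeout_seconds,Value=300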

Horizontal scaling

The AI service is stateless. All persistent state lives in the SQL database, Redis, and the file-storage back end. Any number of replicas can run behind a load balancer. All replicas must share identical environment variable configuration.

Scaling considerations

Component            Scaling approach
AI service           Add more containers (stateless)
MySQL / PostgreSQL   Read replicas or a managed database (RDS, Cloud SQL, Azure Database)
Redis                Redis Cluster or Sentinel; managed Redis (ElastiCache, Memorystore, Azure Cache)
File storage         S3 / Azure Blob recommended for production; the database storage driver is intended for development only

When deploying for the first time or upgrading to a new version, start a single instance and wait for it to become healthy before scaling up. Subsequent scale events do not require this precaution.
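
On Kubernetes, for example, this warm-up looks like a staged scale (a sketch; the final replica count is illustrative):

kubectl -n tinymce-ai scale deployment ai-service --replicas=1
kubectl -n tinymce-ai rollout status deployment/ai-service --timeout=300s
kubectl -n tinymce-ai scale deployment ai-service --replicas=4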

Podman deployment

The AI service works with Podman as an alternative to Docker. In Podman, containers within a pod share a network namespace, so use 127.0.0.1 instead of container names for hostnames.

podman login -u 'TINY_REGISTRY_USERNAME' registry.containers.tiny.cloud

podman pull registry.containers.tiny.cloud/ai-service:latest

podman pod create --name ai-pod -p 8000:8000 -p 3306:3306 -p 6379:6379

podman run -d --pod ai-pod --name mysql \
  -e MYSQL_ROOT_PASSWORD=ROOT_PASSWORD \
  -e MYSQL_DATABASE=ai_service \
  mysql:8.0

podman run -d --pod ai-pod --name redis redis:7

podman run --init -d --pod ai-pod --name ai-service \
  -e LICENSE_KEY='T8LK:...' \
  -e ENVIRONMENTS_MANAGEMENT_SECRET_KEY='MANAGEMENT_SECRET' \
  -e DATABASE_DRIVER='mysql' \
  -e DATABASE_HOST='127.0.0.1' \
  -e DATABASE_USER='root' \
  -e DATABASE_PASSWORD='ROOT_PASSWORD' \
  -e DATABASE_DATABASE='ai_service' \
  -e REDIS_HOST='127.0.0.1' \
  -e PROVIDERS='{"openai":{"type":"openai","apiKeys":["sk-proj-..."]}}' \
  -e STORAGE_DRIVER='database' \
  registry.containers.tiny.cloud/ai-service:latest

Pin to mysql:8.0. The mysql:8 tag floats to MySQL 8.4, which removes the default-authentication-plugin option and causes a crash loop. See Database, Redis, and storage for details.
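
Once the containers are running, the published port can be smoke-tested from the host:

curl -f http://localhost:8000/health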

Kubernetes deployment

Namespace and image pull secret

kubectl create namespace tinymce-ai

kubectl create secret docker-registry tiny-registry \
  --namespace tinymce-ai \
  --docker-server=registry.containers.tiny.cloud \
  --docker-username=TINY_REGISTRY_USERNAME \
  --docker-password='TINY_REGISTRY_ACCESS_TOKEN'

Application secrets

apiVersion: v1
kind: Secret
metadata:
  name: ai-service-secrets
  namespace: tinymce-ai
type: Opaque
stringData:
  license-key: "EXAMPLE_LICENSE_KEY"
  management-secret: "EXAMPLE_MANAGEMENT_SECRET"
  db-password: "EXAMPLE_DB_PASSWORD"
  redis-password: "EXAMPLE_REDIS_PASSWORD"
  providers: |
    {
      "openai": {
        "type": "openai",
        "apiKeys": ["sk-proj-EXAMPLE_KEY"]
      }
    }

In production, use Sealed Secrets, External Secrets Operator, or HashiCorp Vault rather than committing raw secret manifests.
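
With Sealed Secrets, for example, the manifest above can be encrypted before it touches version control (a sketch; assumes the Sealed Secrets controller is installed and the manifest is saved as ai-service-secrets.yaml):

kubeseal --format yaml < ai-service-secrets.yaml > ai-service-secrets-sealed.yaml
kubectl apply -f ai-service-secrets-sealed.yaml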

Deployment

Full Kubernetes Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-service
  namespace: tinymce-ai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-service
  template:
    metadata:
      labels:
        app: ai-service
    spec:
      imagePullSecrets:
        - name: tiny-registry
      containers:
        - name: ai-service
          image: registry.containers.tiny.cloud/ai-service:latest
          ports:
            - containerPort: 8000
          env:
            - name: LICENSE_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-service-secrets
                  key: license-key
            - name: ENVIRONMENTS_MANAGEMENT_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-service-secrets
                  key: management-secret
            - name: DATABASE_DRIVER
              value: "mysql"
            - name: DATABASE_HOST
              value: "mysql.tinymce-ai.svc.cluster.local"
            - name: DATABASE_USER
              value: "ai_service"
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ai-service-secrets
                  key: db-password
            - name: DATABASE_DATABASE
              value: "ai_service"
            - name: REDIS_HOST
              value: "redis.tinymce-ai.svc.cluster.local"
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: ai-service-secrets
                  key: redis-password
            - name: PROVIDERS
              valueFrom:
                secretKeyRef:
                  name: ai-service-secrets
                  key: providers
            - name: STORAGE_DRIVER
              value: "s3"
            - name: STORAGE_REGION
              value: "us-east-1"
            - name: STORAGE_BUCKET
              value: "example-ai-storage-bucket"
            - name: ENABLE_METRIC_LOGS
              value: "true"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
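
A sketch of applying and verifying the manifest (assumes it is saved as deployment.yaml):

kubectl apply -f deployment.yaml
kubectl -n tinymce-ai rollout status deployment/ai-service
kubectl -n tinymce-ai get pods -l app=ai-service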

Service

apiVersion: v1
kind: Service
metadata:
  name: ai-service
  namespace: tinymce-ai
spec:
  selector:
    app: ai-service
  ports:
    - port: 8000
      targetPort: 8000

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-service
  namespace: tinymce-ai
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  tls:
    - hosts:
        - ai.example.com
      secretName: ai-tls-cert
  rules:
    - host: ai.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-service
                port:
                  number: 8000
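
The ai-tls-cert secret referenced above can be created from an existing certificate pair, for example (or issued automatically by cert-manager):

kubectl create secret tls ai-tls-cert \
  --namespace tinymce-ai \
  --cert=ai.example.com.pem \
  --key=ai.example.com.key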

Horizontal pod autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-service-hpa
  namespace: tinymce-ai
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

AWS ECS / Fargate

Task definition

Full ECS Fargate task definition
{
  "family": "ai-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "ai-service",
      "image": "registry.containers.tiny.cloud/ai-service:latest",
      "portMappings": [{ "containerPort": 8000 }],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      },
      "secrets": [
        { "name": "LICENSE_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:ai-license" },
        { "name": "ENVIRONMENTS_MANAGEMENT_SECRET_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:ai-mgmt-secret" },
        { "name": "DATABASE_PASSWORD", "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:ai-db" },
        { "name": "PROVIDERS", "valueFrom": "arn:aws:secretsmanager:us-east-1:111122223333:secret:ai-providers" }
      ],
      "environment": [
        { "name": "DATABASE_DRIVER", "value": "mysql" },
        { "name": "DATABASE_HOST", "value": "example-rds-endpoint.region.rds.amazonaws.com" },
        { "name": "DATABASE_USER", "value": "ai_service" },
        { "name": "DATABASE_DATABASE", "value": "ai_service" },
        { "name": "REDIS_HOST", "value": "example-elasticache-endpoint.region.cache.amazonaws.com" },
        { "name": "STORAGE_DRIVER", "value": "s3" },
        { "name": "STORAGE_BUCKET", "value": "example-ai-storage-bucket" },
        { "name": "STORAGE_REGION", "value": "us-east-1" }
      ]
    }
  ]
}
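
A sketch of registering the task definition and creating the service with the AWS CLI (cluster name, subnets, and security group are placeholders; assumes the JSON above is saved as ai-service-task.json):

aws ecs register-task-definition --cli-input-json file://ai-service-task.json

aws ecs create-service \
  --cluster ai-cluster \
  --service-name ai-service \
  --task-definition ai-service \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=DISABLED}'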

Infrastructure recommendations

Service                     AWS recommendation
Database                    RDS for MySQL 8.0 (Multi-AZ for high availability)
Redis                       ElastiCache for Redis 7 (cluster mode)
Storage                     Same-region S3 bucket
Load balancer               ALB with a /health target health check and a 300 s idle timeout
Secrets                     AWS Secrets Manager
Registry pull credentials   Secrets Manager + ECR pull-through cache, or a private repository mirroring registry.containers.tiny.cloud

Security hardening

  • Network isolation: Place the AI service in a private subnet and expose it only through a load balancer. Restrict database and Redis access to the AI service security group.

  • Block the panel from the public internet: Restrict /panel/ to an admin VPN or IP allowlist. The panel manages secrets and access keys.

  • TLS everywhere: Terminate TLS 1.3 at the reverse proxy. Use mutual TLS (mTLS) between the AI service and the data layer where supported.

  • Secrets management: Use Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager. Never store secrets directly in orchestration manifests or commit them to source control.

  • Database encryption at rest: Turn on encryption at rest in the cloud provider console. RDS, Cloud SQL, and Azure Database enable it by default.

  • Redis authentication: Always set REDIS_PASSWORD (or use a managed Redis instance with authentication enabled).

  • Container security: Run as non-root, use a read-only filesystem where possible, and drop unnecessary Linux capabilities (see the sketch after this list).

  • Image scanning: Scan registry.containers.tiny.cloud/ai-service with Trivy, Snyk, or the registry’s built-in scanner.

  • Least-privilege JSON Web Tokens (JWTs): Grant only the permissions each user role requires. Avoid full-access tokens in production.

  • API secret rotation: Periodically create a new access key, add the new key to the configuration, then revoke the old key. The token endpoint reads the secret at request time.

  • Audit logging: Set ENABLE_METRIC_LOGS=true and ship logs to a Security Information and Event Management (SIEM) system.

  • Large language model (LLM) API key rotation: Add the new key to the PROVIDERS array, restart the service, then revoke the old key after confirming the new one works.
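
As a sketch of the container-security item above, the equivalent Docker flags might look like this (assumes the image runs correctly with a read-only root filesystem; verify before enforcing):

docker run --init -d \
  --read-only \
  --tmpfs /tmp \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  ...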

Rate limiting

The AI service has no built-in rate limiting. Place rate-limit rules in front of the service to prevent a runaway client from consuming LLM provider quota or overloading the database.

Nginx

limit_req_zone $http_authorization zone=ai_jwt:10m rate=10r/s;
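# note: limit_req_zone is valid only in the http {} context;
# placing this snippet in a conf.d include keeps it there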

server {
    location /v1/ {
        limit_req zone=ai_jwt burst=20 nodelay;
        proxy_pass http://ai-service:8000;
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}

AWS ALB / WAF

ALB does not rate limit natively. Use AWS WAF with a rate-based rule keyed on the Authorization header.

Cloudflare

Use Cloudflare Rate Limiting with a custom rule keyed on the Authorization header for the AI service hostname.

For per-tenant rate limiting, key on the aud claim by parsing it in the reverse proxy, or gate token issuance per tenant per minute at the token endpoint.

Observability

Health monitoring

Poll /health on each instance to confirm it is running. A healthy instance responds with HTTP 200.

curl -f http://ai-service:8000/health

Structured metric logs

Set the ENABLE_METRIC_LOGS environment variable to true to enable request-level JSON logs on stdout:

-e ENABLE_METRIC_LOGS='true'

When enabled, the service writes a structured JSON entry for each request. Key fields include the request duration, HTTP status code, and outcome status. These entries are suitable for ingestion into any log aggregator that supports JSON parsing.

OpenTelemetry

-e LLM_TELEMETRY_ENABLED='true' \
-e OTEL_EXPORTER_OTLP_TRACES_ENDPOINT='http://otel-collector:4318/v1/traces' \
-e OTEL_TRACES_SAMPLER_ARG='1.0' \
-e OTEL_DEBUG='true'

Variable                             Required   Default   Description
LLM_TELEMETRY_ENABLED                Yes        false     Primary telemetry switch
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT   Yes        -         OpenTelemetry Protocol (OTLP) endpoint URL
OTEL_TRACES_SAMPLER_ARG              No         1.0       Sampling rate (0.0 to 1.0)
OTEL_DEBUG                           No         -         Verbose OTLP diagnostic logging

Compatible with Jaeger, Grafana Tempo, Datadog, New Relic, Honeycomb, and any OTLP-compatible back end.
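
To smoke-test the export path locally, one option is to run a collector container next to the service (a sketch; the stock image ships a default configuration that accepts OTLP over HTTP on port 4318, but verify against the collector version you deploy):

docker run -d --name otel-collector -p 4318:4318 otel/opentelemetry-collector:latest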

Langfuse

Langfuse provides AI-specific observability: token usage, latency per LLM call, prompt quality scores, and cost tracking.

-e LANGFUSE_PUBLIC_KEY='pk-lf-...' \
-e LANGFUSE_SECRET_KEY='sk-lf-...' \
-e LANGFUSE_BASE_URL='https://cloud.langfuse.com' \
-e LANGFUSE_DEBUG='true'

Variable              Required        Default                      Description
LANGFUSE_PUBLIC_KEY   Yes (if used)   -                            Langfuse public key
LANGFUSE_SECRET_KEY   Yes (if used)   -                            Langfuse secret key
LANGFUSE_BASE_URL     No              https://cloud.langfuse.com   Self-hosted Langfuse URL
LANGFUSE_DEBUG        No              -                            Verbose Langfuse logging

Langfuse also requires LLM_TELEMETRY_ENABLED=true and a valid OTEL_EXPORTER_OTLP_TRACES_ENDPOINT.

OpenTelemetry and Langfuse can run at the same time. The service emits to both without conflict.

Distributed logging

For production multi-instance deployments, ship container logs to a central aggregator.

Platform             Log driver / approach
AWS                  CloudWatch Logs through the awslogs driver, or a Fluent Bit DaemonSet on EKS
GCP                  Cloud Logging (automatic on GKE), or Fluent Bit
Azure                Azure Monitor (automatic on Azure Container Apps and AKS)
Self-hosted (ELK)    Fluent Bit or Filebeat to Elasticsearch + Kibana
Self-hosted (Loki)   Fluent Bit or Promtail to Grafana Loki
Fluentd              The Docker fluentd log driver (see the example below)

Fluentd log driver example
docker run ... \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=ai-service \
  ...

The metric logs produced by the ENABLE_METRIC_LOGS option are already structured JSON and parse cleanly in any aggregator.

The following checks help catch common issues early:

  • Health endpoint — poll /health on each instance; alert if any instance returns a non-200 response for more than 60 seconds.

  • Error rate — monitor the HTTP 5xx rate in the metric logs or traces; a sustained increase may indicate an LLM provider outage or a misconfigured environment.

  • Latency — track request duration; a sudden increase typically points to LLM provider throttling or network issues.

  • Container restarts — alert on repeated container restarts, which may indicate a missing environment variable or a database connectivity problem.

For troubleshooting specific error patterns, see Troubleshooting.

Backup and recovery

Database

The database contains environments, access keys, conversations, messages, and file metadata. Back up the database using standard production practices:

  • MySQL: mysqldump or managed snapshots (RDS automated backups).

  • PostgreSQL: pg_dump or managed snapshots.

Enable point-in-time recovery.
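
For MySQL, for example, a consistent logical backup might look like this (a sketch; the host and credentials are placeholders, and --single-transaction gives a consistent InnoDB snapshot without locking):

mysqldump --single-transaction --host mysql.internal.example.com --user ai_service -p ai_service > ai_service_backup.sql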

File storage

Back end     Backup approach
database     The SQL database stores file blobs; database backups include them
filesystem   Back up the mounted volume
s3           Enable versioning on the bucket for point-in-time recovery
azure        Enable Blob versioning

Redis

Redis holds ephemeral state. Losing Redis data does not affect persistent data. No backup is required.

Upgrade process

  1. Pull the new image:

    docker pull registry.containers.tiny.cloud/ai-service:NEW_VERSION

  2. For rolling deploys across version boundaries: start one instance at the new version and wait for it to become healthy before rolling the rest.

  3. For Kubernetes: update the image tag in the Deployment. The default RollingUpdate strategy handles zero-downtime upgrades, provided the first new pod becomes Ready before the rollout continues.

  4. Verify /health on every replica before declaring the upgrade complete.
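
On Kubernetes, for example, steps 2 to 4 reduce to an image update followed by a rollout watch (a sketch; NEW_VERSION as in step 1):

kubectl -n tinymce-ai set image deployment/ai-service \
  ai-service=registry.containers.tiny.cloud/ai-service:NEW_VERSION
kubectl -n tinymce-ai rollout status deployment/ai-service

rollout status blocks until every new pod is Ready, and the manifest's readiness probe polls /health, so a clean exit doubles as the per-replica verification in step 4.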

Review the release notes for the target version and take a database backup before upgrading.

License keys are per-deployment, not per-replica. One key covers any number of replicas of a single deployment.

Performance characteristics

Metric                      Typical value
Cold start                  Approximately 3 seconds
Health check response       Less than 10 ms
Token validation            Less than 5 ms
Time to first token (LLM)   200 ms to 2 s (depends on provider and model)
Memory per instance         256 to 512 MB
Concurrent connections      1,000+ per instance

These values are approximate and vary with hardware, provider latency, and prompt complexity. The LLM provider’s rate limits are typically the binding constraint before the AI service becomes one.

Sizing guide

Users          AI service replicas   Database                                      Redis                        Notes
1 to 50        1                     db.t3.small (or 2 vCPU / 4 GB self-managed)   cache.t3.micro               Development and small teams
50 to 500      2                     db.r6g.large                                  cache.r6g.large              Small production
500 to 5,000   3 to 5                db.r6g.xlarge (Multi-AZ)                      cache.r6g.xlarge (cluster)   Medium production
5,000+         5+ (with HPA)         db.r6g.2xlarge+                               cache.r6g.2xlarge+           Large production; contact Tiny for guidance

Starting point for self-managed deployments:

  • AI service instance: 2 vCPU / 4 GB RAM

  • Database instance: 2 vCPU / 8 GB RAM

  • Redis instance: 1 vCPU / 2 GB RAM

Scale based on user count, average prompt size, and concurrent streaming connections.