4Cs Cloud-Native & Linux

4Cs – Cloud, Cluster, Container, and Code & Linux Tips and Tricks

Tag: Linux

  • Multi Stage Docker Builds – Reduce Image Size and DevOps Risk

    Docker has become the method of modern software delivery, but naïve Dockerfiles often produce bloated images that contain build tools, source code, and temporary files. Large images increase attack surface, slow down CI pipelines, and waste bandwidth. Multi‑stage builds solve these problems by allowing you to compile or assemble artifacts in one stage and copy only the final binaries into a lean runtime stage.

    This article explains how multi‑stage Docker builds work, walks through a real‑world example for a Go microservice, shows how to apply the technique to Python, Node.js, and compiled C applications, and provides best‑practice tips for security and DevOps risk reduction.

    What Is a Multi‑Stage Build?

    A multi‑stage build is simply a Dockerfile that defines multiple FROM statements, each creating its own intermediate image. You can reference any earlier stage by name using the --from= flag in a COPY command. The final image is produced by the last stage; layers from earlier stages are discarded unless you explicitly copy artifacts forward.

    Example skeleton:

    # Stage 1 – Build environment
    FROM golang:1.22-alpine AS builder
    WORKDIR /src
    COPY . .
    RUN go build -o app .
    
    # Stage 2 – Runtime environment
    FROM alpine:3.19
    COPY --from=builder /src/app /usr/local/bin/app
    EXPOSE 8080
    CMD ["app"]
    

    Only the second stage’s layers are present in the final image, resulting in a tiny Alpine base plus your compiled binary.
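
    If you need to debug the compile step, you can build and stop at a named stage with the --target flag; the tag names below are placeholders:

    # Build only the builder stage (useful for inspecting compile errors)
    docker build --target builder -t myapp:build-debug .

    # Build the whole file; the last stage is the default target and becomes the final image
    docker build -t myapp:latest .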

    Why Multi‑Stage Improves DevOps

    Issue                                        Traditional Dockerfile                Multi-stage solution
    Large image size (hundreds of MB)            Build tools stay in the image         Only runtime deps remain
    Secret leakage (e.g., build-time API keys)   Keys may be left in layers            Secrets never copied to final stage
    Slow CI/CD pipelines                         Long docker push/pull times           Faster transfers, less storage
    Vulnerability surface                        Build-time packages stay installed    Minimal base reduces CVE count

    Step 1 – Write a Simple Service

    Create main.go:

    package main
    
    import (
        "fmt"
        "log"
        "net/http"
    )
    
    func handler(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "Hello from multi‑stage Go!")
    }
    
    func main() {
        http.HandleFunc("/", handler)
        log.Println("Listening on :8080")
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
    

    Step 2 – Dockerfile with Multi‑Stage

    # ---------- Builder ----------
    FROM golang:1.22-alpine AS builder
    WORKDIR /app
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o server .
    
    # ---------- Runtime ----------
    FROM alpine:3.19
    LABEL maintainer="devops@example.com"
    RUN addgroup -S app && adduser -S -G app app
    WORKDIR /home/app
    COPY --from=builder /app/server .
    USER app
    EXPOSE 8080
    ENTRYPOINT ["./server"]
    

    Why This Is Secure

    • CGO_ENABLED=0 builds a statically linked binary, so the runtime stage needs no system C library (neither glibc nor musl).
    • The runtime image runs as an unprivileged user (app).
    • No source files or Go toolchain are present after the copy; the quick check below confirms this.
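
    A minimal verification sketch, assuming the image has been built and tagged go-microservice:latest as in Step 3:

    # Confirm the container runs as the unprivileged user and ships no Go toolchain
    docker run --rm --entrypoint /bin/sh go-microservice:latest \
        -c 'id && (which go || echo "no go toolchain present")'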

    Step 3 – Build and Verify Size

    docker build -t go-microservice:latest .
    docker images | grep go-microservice
    

    Typical output:

    go-microservice   latest   a1b2c3d4e5f6   2 minutes ago   12MB
    

    Contrast with a monolithic Dockerfile that might be >150 MB.
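
    To see where the remaining megabytes come from, docker history lists the size contribution of each layer in the final image (tag assumed from Step 3):

    docker history go-microservice:latest
    # Only the Alpine base layers and the single COPY of the binary should carry any real size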

    Example 2 – Python Application with Dependencies

    Python projects often rely on pip to install many libraries, which can bloat images. Multi‑stage builds let you compile wheels in a builder and copy only the needed packages.

    Project Layout

    app/
    ├── requirements.txt
    └── main.py
    

    requirements.txt

    flask==3.0.2
    requests==2.31.0
    

    main.py

    from flask import Flask, jsonify
    import requests
    
    app = Flask(__name__)
    
    @app.route("/")
    def hello():
        return jsonify(message="Hello from multi‑stage Python!")
    
    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)
    

    Dockerfile

    # ---------- Builder ----------
    FROM python:3.12-slim AS builder
    WORKDIR /src
    COPY app/requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt --target=/install
    
    # ---------- Runtime ----------
    FROM python:3.12-alpine
    LABEL maintainer="devops@example.com"
    ENV PYTHONUNBUFFERED=1
    WORKDIR /app
    COPY --from=builder /install /usr/local/lib/python3.12/site-packages
    COPY app/ .
    EXPOSE 5000
    CMD ["python", "main.py"]
    

    Key Points

    • The builder stage (python:3.12-slim) installs all dependencies into /install; if a package needs compilation, install the compiler there (e.g. apt-get install -y gcc) so build tools never reach the runtime image.
    • The /install directory is then copied into the Alpine runtime's site-packages. This works here because Flask and requests are pure Python; wheels with compiled extensions built on the glibc-based slim image will not run on musl-based Alpine, so for native dependencies keep builder and runtime on the same base.
    • The result is a final image of roughly 70 MB versus well over 200 MB for a single-stage approach.

    Build and Test

    docker build -t py-multi:latest .
    docker run --rm -p 5000:5000 py-multi:latest
    

    Visit http://localhost:5000 to see the JSON response.
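
    A quick sanity check from another terminal, a sketch assuming the container from the previous step is still running and the image is tagged py-multi:latest:

    # The endpoint should return the JSON greeting
    curl -s http://localhost:5000/

    # The runtime image should contain no compiler toolchain
    docker run --rm py-multi:latest sh -c 'gcc --version || echo "no gcc in runtime image"'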

    Example 3 – Node.js with Native Addons

    Node projects that compile native addons (e.g., bcrypt) need a full build environment. Use multi‑stage to keep only compiled binaries.

    Dockerfile

    # ---------- Builder ----------
    FROM node:20-alpine AS builder
    WORKDIR /app
    COPY package*.json ./
    # RUN apk add --no-cache python3 make g++   # uncomment if node-gyp must compile native addons
    RUN npm ci                  # install all dependencies (dev deps are needed for the build step)
    COPY . .
    RUN npm run build           # if you have a build step (e.g., TypeScript)
    RUN npm prune --omit=dev    # drop dev dependencies before copying node_modules to the runtime
    
    # ---------- Runtime ----------
    FROM node:20-alpine
    LABEL maintainer="devops@example.com"
    WORKDIR /app
    ENV NODE_ENV=production
    COPY --from=builder /app/node_modules ./node_modules
    COPY --from=builder /app/dist ./dist   # if you built a dist folder
    EXPOSE 3000
    CMD ["node", "dist/index.js"]
    

    The builder stage installs everything needed to compile native addons and run the build, then prunes development dependencies; the runtime stage receives only the pruned node_modules and the built dist folder.
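
    A short usage sketch: build the image and confirm that development dependencies (typescript is used purely as an example package name) did not survive into the final stage:

    docker build -t node-multi:latest .
    docker run --rm node-multi:latest sh -c 'test ! -d node_modules/typescript && echo "no dev dependencies in final image"'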

    Example 4 – C Application with Static Linking

    When building a low‑level service in C, you may need glibc. The trick is to compile statically in the builder and copy only the binary.
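
    We have not shown main.c yet; a minimal sketch that matches the gcc command in the Dockerfile below (any program you can link statically will do) can be created like this:

    cat > main.c <<'EOF'
    #include <stdio.h>

    int main(void) {
        printf("Hello from a statically linked binary on scratch!\n");
        return 0;
    }
    EOF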

    Dockerfile

    # ---------- Builder ----------
    FROM gcc:13 AS builder
    WORKDIR /src
    COPY . .
    RUN gcc -static -O2 -o hello_world main.c
    
    # ---------- Runtime ----------
    FROM scratch
    COPY --from=builder /src/hello_world /hello_world
    EXPOSE 8080
    ENTRYPOINT ["/hello_world"]
    

    scratch is an empty image, so the final container contains only the binary and nothing else. This yields a ~1 MB image.

    Best Practices for Multi‑Stage Builds

    1. Name Stages Explicitly – Use AS builder, AS runtime, etc., to improve readability.
    2. Minimize Layers – Combine related commands with && to reduce intermediate layers, especially in the final stage.
    3. Leverage .dockerignore – Exclude source files, test data, and local caches from being sent to the daemon. Typical entries: *.log, node_modules, .git, __pycache__ (see the sketch after this list).
    4. Use Trusted Base Images – Prefer official minimal images (alpine, scratch) for runtime stages. Verify image digests if security is critical.
    5. Scan the Final Image – Run tools like trivy or docker scan on the final stage to ensure no known CVEs are present, e.g. trivy image my-app:latest.
    6. Avoid Secrets in Build Args – Never pass API keys via ARG. If you need them for a build step, inject them at runtime instead of copying into the final stage.
    7. Set a Non-Root User – Always create and switch to an unprivileged user in the runtime stage (USER app).
    Automating Multi‑Stage Builds in CI/CD

    Most CI systems (GitHub Actions, GitLab CI, Jenkins) already support docker build. To enforce multi‑stage builds:

    # .github/workflows/docker.yml
    name: Build and Push Docker Image
    on:
      push:
        branches: [ main ]
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Set up QEMU
            uses: docker/setup-qemu-action@v3
          - name: Set up Docker Buildx
            uses: docker/setup-buildx-action@v2
          - name: Login to Docker Hub
            uses: docker/login-action@v3
            with:
              username: ${{ secrets.DOCKER_USER }}
              password: ${{ secrets.DOCKER_PASS }}
          - name: Build and push multi‑stage image
            run: |
              docker buildx build \
                --platform linux/amd64,linux/arm64 \
                -t myrepo/myapp:${{ github.sha }} \
                --push .
    

    The docker buildx command automatically uses the Dockerfile’s stages; you do not need extra flags.

    Reducing DevOps Risk with Multi‑Stage

    • Predictable Deployments – Smaller images mean fewer unexpected runtime dependencies.
    • Faster Rollbacks – Pulling a 10 MB image is almost instantaneous compared to a 200 MB one, enabling quick recovery.
    • Lower Cost – Reduced storage on container registries and less bandwidth usage in CI pipelines translate to cost savings.

    Conclusion

    Multi‑stage Docker builds are a simple yet powerful technique that transforms bloated images into lean, secure artifacts. By separating build-time tooling from runtime dependencies, you shrink image size, eliminate secret leakage, improve pipeline speed, and reduce the attack surface. The examples above cover Go, Python, Node.js, and native C workloads, showing how universal this approach is across languages. Adopt the best‑practice checklist, integrate scanning tools, and enforce multi‑stage builds in your CI pipelines to dramatically lower DevOps risk while delivering faster, more reliable containers.

  • Send Images to a Vision Language Model (VLM) API

    Vision language models served through tools such as Ollama, LM Studio, or any OpenAI-style endpoint accept image data alongside text prompts. This tutorial shows you how to capture an image on a Linux box, encode it for HTTP transmission, and invoke the VLM API using both curl and Python. The examples are deliberately simple so they can be adapted to shell scripts, CI pipelines, or edge devices.

    Why Send Images to a VLM?

    • Multimodal assistants – combine visual context with natural language queries
    • Document analysis – extract text from scanned PDFs or screenshots
    • Rapid prototyping – test model responses without writing full client libraries

    All of these use cases require you to send binary image data (usually JPEG or PNG) as part of a multipart/form‑data request, or base64‑encoded JSON payloads depending on the server’s expectations.

    Prerequisites

    • A Linux system with ffmpeg or imagemagick installed for image capture
    • curl version 7.55+ (most modern distros have this)
    • Python 3.8+ and the requests library (pip install requests)

    You also need an active VLM server endpoint:

    Service      Example endpoint
    Ollama       http://localhost:11434/api/generate
    LM Studio    http://127.0.0.1:1234/v1/chat/completions

    Both accept JSON bodies that can carry image data, although the field layout differs: Ollama takes a base64-encoded "images" array, while LM Studio follows the OpenAI content-block schema (both are shown in Step 3).

    Step 1 – Capture an Image

    If you have a webcam attached, you can use ffmpeg to snap a picture:

    ffmpeg -f v4l2 -video_size 640x480 -i /dev/video0 -vframes 1 captured.jpg
    

    Alternatively, with imagemagick you can capture the screen:

    import -window root screenshot.png
    

    Make sure the file exists and is readable:

    ls -l captured.jpg
    # or
    file screenshot.png
    

    Step 2 – Encode the Image for HTTP

    Option A: Multipart Form Data (curl)

    Some VLM servers accept multipart uploads, in which case curl handles the encoding for you; the field name (image below) depends on the server, so check its documentation. Ollama's native API expects JSON with base64 data instead (see Option B and Step 3), so treat the endpoint here as a placeholder:

    curl -X POST http://localhost:8000/api/describe \
         -F "prompt=What do you see in this picture?" \
         -F "image=@captured.jpg;type=image/jpeg"
    

    Explanation of flags:

    • -X POST – explicit HTTP method
    • -F – creates a form field; the @ syntax tells curl to read file contents
    • type= – sets the MIME type, useful for servers that validate it

    If your endpoint expects JSON instead, you need to base64‑encode the image.

    Option B: Base64 JSON Payload (Python)

    import base64
    import json
    import requests
    
    # Load and encode image
    with open('captured.jpg', 'rb') as f:
        img_bytes = f.read()
    b64_image = base64.b64encode(img_bytes).decode('utf-8')
    
    payload = {
        "model": "llava",
        "prompt": "Describe the scene in detail.",
        "images": [b64_image]   # Ollama expects a list of base64-encoded images
    }
    
    headers = {'Content-Type': 'application/json'}
    
    response = requests.post(
        'http://localhost:11434/api/generate',
        headers=headers,
        data=json.dumps(payload)
    )
    
    print(response.status_code)
    print(response.json())
    

    Key points:

    • The image is sent as a base‑64 string inside the "images" list; Ollama accepts one or more images per request.
    • Some servers require an additional "model" key to select the VLM variant; adjust accordingly.

    Step 3 – Handling Different API Schemas

    Ollama’s /api/generate Endpoint

    Ollama expects a JSON body with base64-encoded strings under "images". Example using curl:

    curl -X POST http://localhost:11434/api/generate \
         -H "Content-Type: application/json" \
         -d '{
               "model":"llava",
               "prompt":"Explain the objects in this picture.",
               "images":["'$(base64 -w 0 captured.jpg)'"]
             }'
    

    The -w 0 flag tells base64 not to insert line breaks.

    LM Studio’s Chat Completion Endpoint

    LM Studio follows the OpenAI chat schema. Images are passed as separate "content" blocks with a "type": "image_url" entry:

    {
      "model": "gpt-4v",
      "messages": [
        {"role":"user","content":[
            {"type":"text","text":"What is happening here?"},
            {"type":"image_url","image_url":{"url":"data:image/jpeg;base64,{{BASE64}}"}}
        ]}
      ]
    }
    

    In Python:

    import base64, json, requests
    
    with open('screenshot.png', 'rb') as img:
        b64 = base64.b64encode(img.read()).decode()
    
    payload = {
        "model": "gpt-4v",
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": "What is happening in this picture?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
            ]}
        ]
    }
    
    resp = requests.post('http://127.0.0.1:1234/v1/chat/completions', json=payload)
    print(resp.json())
    

    Notice the data: URL scheme – the server extracts the base64 payload automatically. The "model" value should match the vision-capable model you actually have loaded in LM Studio.

    Step 4 – Automate with a Shell Script

    You can wrap everything into a single script that captures, encodes, and calls the API. Save as vlm_send.sh:

    #!/usr/bin/env bash
    
    # Capture image (adjust device path if needed)
    ffmpeg -y -f v4l2 -video_size 640x480 -i /dev/video0 -vframes 1 /tmp/vlm_input.jpg >/dev/null 2>&1
    
    # Base64 encode without newlines
    IMG_B64=$(base64 -w 0 /tmp/vlm_input.jpg)
    
    # Build JSON payload (adjust model name as needed)
    read -r -d '' PAYLOAD <<EOF
    {
      "model": "llava",
      "prompt": "Provide a concise description of the scene.",
      "images": ["$IMG_B64"]
    }
    EOF
    
    # Send request with curl
    curl -s -X POST http://localhost:11434/api/generate \
         -H "Content-Type: application/json" \
         -d "$PAYLOAD"
    

    Make it executable:

    chmod +x vlm_send.sh
    ./vlm_send.sh
    

    The script prints the JSON response from the VLM.

    Step 5 – Error Handling and Debugging

    • HTTP 400 – Likely a malformed JSON or missing required fields. Use curl -v to view request headers.
    • HTTP 415 Unsupported Media Type – The server did not recognize the MIME type; ensure you set Content-Type: application/json when sending JSON, and use correct image MIME in multipart (image/jpeg).
    • Timeouts – VLM inference can be slow for large images. Increase the curl timeout with --max-time 120; in Python, pass timeout=120 to requests.post (a combined sketch follows this list).
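
    A debugging sketch that combines these flags, assuming the Ollama endpoint and the captured.jpg from earlier steps:

    # Verbose output plus a generous timeout for slow inference
    curl -v --max-time 120 -X POST http://localhost:11434/api/generate \
         -H "Content-Type: application/json" \
         -d '{"model":"llava","prompt":"Describe this image.","images":["'"$(base64 -w 0 captured.jpg)"'"]}'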

    Performance Tips

    1. Resize before sending – Large images increase payload size and processing time. Use ImageMagick: convert captured.jpg -resize 800x600 resized.jpg (see the sketch after this list).
    2. Cache base64 strings – If you send the same image repeatedly, cache the encoded string to avoid re‑encoding overhead.
    3. Batch multiple images – Some APIs accept an array of images ("images": ["b64_1","b64_2"]). This reduces round trips when analyzing a set of photos together.
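
    A resize-and-encode sketch with ImageMagick (the quality setting is just an example):

    # Downscale and recompress before encoding to shrink the request payload
    convert captured.jpg -resize 800x600 -quality 85 resized.jpg
    IMG_B64=$(base64 -w 0 resized.jpg)
    echo "base64 payload length: ${#IMG_B64}"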

    Conclusion

    Sending images to vision language model APIs is straightforward once you understand the required payload format. Whether you prefer raw multipart form data with curl or structured JSON with base64 encoding in Python, the steps above cover both approaches for popular servers like Ollama and LM Studio. By automating image capture, resizing, and request handling, you can embed multimodal AI capabilities into scripts, IoT gateways, or web back‑ends with minimal effort.

  • Nginx IP Rate Limiting – A Practical Guide

    Rate limiting is a core defensive technique for any public‑facing web service. By throttling requests from a single client you can protect upstream resources, mitigate brute force attacks and keep latency predictable. Nginx ships with the limit_req module which makes it easy to enforce request caps based on IP address. This guide walks through a complete setup, explains each directive, shows how to test the limits, and offers tips for logging and fine tuning.

    Prerequisites

    • A Linux server with root or sudo access
    • Nginx version 1.9.0 or newer (the limit_req module is built‑in in most distributions)
    • Basic familiarity with editing /etc/nginx/*.conf files

    If you are using a distribution that splits configuration into /etc/nginx/conf.d/ and /etc/nginx/sites-enabled/, the examples below will work in either location as long as they are included by the main nginx.conf.

    Understanding the limit_req Module

    The module works with two concepts:

    1. Zone – a shared memory area that stores request counters keyed by a variable (usually $binary_remote_addr).
    2. Limit rule – applied inside a server or location block, referencing the zone and optionally configuring burst capacity and delay behavior.

    A minimal configuration looks like this:

    limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;
    
    • $binary_remote_addr is the binary representation of the client IP, which saves memory compared to the plain string form.
    • zone=perip:10m creates a 10‑megabyte shared memory segment named perip. Roughly 1 MB stores about 16,000 client states, so 10 MB is enough for most small to medium sites.
    • rate=5r/s limits each IP to five requests per second.

    The actual enforcement happens with the limit_req directive:

    location /api/ {
        limit_req zone=perip burst=10 nodelay;
        proxy_pass http://backend;
    }
    
    • burst=10 allows a short spike of up to ten extra requests that exceed the steady rate.
    • nodelay tells Nginx to serve requests within the burst immediately instead of spacing them out; anything beyond the burst is rejected.

    Step 1 – Install or Verify Nginx

    On Ubuntu/Debian:

    sudo apt update
    sudo apt install nginx -y
    

    On CentOS/RHEL:

    sudo yum install epel-release -y
    sudo yum install nginx -y
    

    Start and enable the service:

    sudo systemctl start nginx
    sudo systemctl enable nginx
    

    Check the version and confirm the module is available. It is compiled into Nginx by default and is only missing if the binary was built with --without-http_limit_req_module:

    nginx -V 2>&1 | grep -o 'without-http_limit_req_module' || echo "limit_req module is available"


    If the command prints "limit_req module is available", you are ready.

    Step 2 – Define a Shared Memory Zone

    Edit /etc/nginx/nginx.conf (or create a file under conf.d/). Add the zone definition inside the http block:

    http {
        # Existing directives ...
    
        limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;
    
        # Include other config files
        include /etc/nginx/conf.d/*.conf;
    }
    

    Save and test the syntax:

    sudo nginx -t
    

    If the test passes, reload Nginx:

    sudo systemctl reload nginx
    

    Step 3 – Apply Limits to a Location

    Open the site configuration you want to protect. For example /etc/nginx/conf.d/example.conf:

    server {
        listen 80;
        server_name example.com;
    
        location / {
            # Default content or proxy
            root /var/www/html;
            index index.html;
        }
    
        # Protect the API endpoint
        location /api/ {
            limit_req zone=perip burst=10 nodelay;
            proxy_pass http://127.0.0.1:8080;
        }
    }
    

    The limit_req line tells Nginx to consult the perip zone for each request that matches /api/. If a client exceeds 5 r/s plus the burst of 10, Nginx returns HTTP 503 by default.

    Step 4 – Customize Burst and Delay

    You may want to let occasional spikes pass without penalty. Removing nodelay causes excess requests (up to the burst size) to be queued and released at the configured rate instead of being rejected outright. Example:

    location /api/ {
        limit_req zone=perip burst=20;
        proxy_pass http://127.0.0.1:8080;
    }
    

    Now a client can send 5 r/s continuously and an additional 20 requests that will be processed gradually. Adjust burst based on typical traffic patterns.
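
    To watch the queued behaviour, here is a quick load-test sketch with ApacheBench, assuming it is installed (package apache2-utils or httpd-tools) and the endpoint exists:

    # 50 requests, 10 concurrent: within burst=20 nothing is rejected, but excess requests are smoothed to 5 r/s
    ab -n 50 -c 10 http://example.com/api/status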

    Step 5 – Test the Configuration

    A quick way to verify limits is using curl in a loop:

    for i in $(seq 1 30); do
        curl -s -o /dev/null -w "%{http_code} " http://example.com/api/status
    done; echo
    

    You should see a series of 200 responses followed by 503 once the limit is hit. To see how each request was classified, log the $limit_req_status variable (available since Nginx 1.17.6):

    log_format main '$remote_addr - $status [$limit_req_status] "$request"';
    access_log /var/log/nginx/access.log main;
    

    After reloading Nginx, the bracketed field shows - (or PASSED) for requests that were not throttled, and DELAYED or REJECTED when the limit kicks in.
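
    To watch throttling decisions in real time with the log format above:

    # Show only requests that were delayed or rejected by limit_req
    tail -f /var/log/nginx/access.log | grep -E 'REJECTED|DELAYED'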

    Optional – Centralized Logging with Syslog

    If you aggregate logs in a SIEM, add a syslog target:

    error_log syslog:server=127.0.0.1:514,facility=local7,severity=info;
    

    Now every rate‑limit event appears as a structured log entry that can trigger alerts.

    Common Pitfalls

    • Using $remote_addr instead of $binary_remote_addr – the plain string consumes more memory and reduces the number of entries you can store.
    • Setting the zone too small – a 10 MB zone is usually enough, but high‑traffic sites may need 20 MB or more to avoid eviction of counters.
    • Forgetting limit_req_status – without it you cannot tell from logs whether a 503 came from rate limiting or an upstream error.

    Conclusion

    Nginx’s built‑in request throttling gives you a lightweight, high‑performance way to protect services on a per‑IP basis. By defining a shared memory zone, applying the limit to specific locations, and tuning burst settings you can stop abusive traffic without adding external dependencies. The configuration snippets above are ready to copy into any modern Nginx deployment. Remember to test with realistic request patterns, monitor logs for unexpected rejections, and adjust the zone size as your traffic grows.