Hub Configuration

The Hub orchestrates the grid. It requires a connection to PostgreSQL.

hub.config.json

The `storage.type` value in this sample is "local"; the other options are "aws-s3" and "cloudflare-r2".

{
  "webdriverSessionPath": "/wd/hub",
  "storage": {
    "type": "local",
    "local": {
      "basePath": ".cache/device-farm/logs"
    }
  },
  "ssl": {
    "enabled": false,
    "cert": "./cert.pem",
    "key": "./key.pem"
  },
  "nodeMonitor": {
    "checkInterval": 10000,
    "timeoutMs": 40000
  },
  "websocket": {
    "heartbeatInterval": 30000,
    "nodeAvailabilityCheckTimeout": 60000,
    "sessionCreationTimeout": 120000,
    "sessionRequestTimeout": 30000,
    "nodeCleanupDelaySeconds": 30
  },
  "webrtc": {
    "browserDisconnectTimeout": 30000,
    "allowSessionReconnection": true
  }
}
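
Most durations in this file are milliseconds, but `nodeCleanupDelaySeconds` is in seconds. The sketch below is a hypothetical typed view of the timing fields (the `HubTimingConfig` type and `durationsInMs` helper are illustrative names, not part of the hub) that normalizes every duration to milliseconds so the values can be compared directly.

```typescript
// Hypothetical typed view of the timing fields in hub.config.json.
interface HubTimingConfig {
  nodeMonitor: { checkInterval: number; timeoutMs: number };
  websocket: {
    heartbeatInterval: number;
    nodeAvailabilityCheckTimeout: number;
    sessionCreationTimeout: number;
    sessionRequestTimeout: number;
    nodeCleanupDelaySeconds: number; // the only field expressed in seconds
  };
  webrtc: { browserDisconnectTimeout: number; allowSessionReconnection: boolean };
}

// Values copied from the sample configuration.
const sampleTimings: HubTimingConfig = {
  nodeMonitor: { checkInterval: 10000, timeoutMs: 40000 },
  websocket: {
    heartbeatInterval: 30000,
    nodeAvailabilityCheckTimeout: 60000,
    sessionCreationTimeout: 120000,
    sessionRequestTimeout: 30000,
    nodeCleanupDelaySeconds: 30,
  },
  webrtc: { browserDisconnectTimeout: 30000, allowSessionReconnection: true },
};

// Normalizes every duration to milliseconds.
function durationsInMs(c: HubTimingConfig): { [field: string]: number } {
  return {
    "nodeMonitor.checkInterval": c.nodeMonitor.checkInterval,
    "nodeMonitor.timeoutMs": c.nodeMonitor.timeoutMs,
    "websocket.heartbeatInterval": c.websocket.heartbeatInterval,
    "websocket.sessionCreationTimeout": c.websocket.sessionCreationTimeout,
    "websocket.sessionRequestTimeout": c.websocket.sessionRequestTimeout,
    "websocket.nodeCleanupDelay": c.websocket.nodeCleanupDelaySeconds * 1000,
    "webrtc.browserDisconnectTimeout": c.webrtc.browserDisconnectTimeout,
  };
}
```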

SSL Configuration

The Device Farm hub can serve HTTPS directly, without a reverse proxy such as nginx. The same hub server handles the API, WebSocket connections, and the frontend.

Configuration Options:

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable HTTPS on the hub server |
| `cert` | string | - | Path to SSL certificate file (absolute path recommended) |
| `key` | string | - | Path to SSL private key file (absolute path recommended) |

What Gets Served:

When you enable HTTPS on the hub server, everything runs on HTTPS:

✅ Frontend UI - Served directly by Express at https://localhost:3000/
✅ API Endpoints - All /api/* routes
✅ WebSocket - Socket.IO connections at https://localhost:3000/socket.io
✅ WebDriver Endpoints - Appium endpoints at /wd/hub/*

Generating SSL Certificates

For development, you can generate self-signed certificates:

# Create certs directory
mkdir -p certs
cd certs

# Generate self-signed certificate (for development)
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365 -nodes \
  -subj "/C=US/ST=State/L=City/O=Device Farm/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"

# Set secure permissions
chmod 600 key.pem
chmod 644 cert.pem

cd ..

Configuration File Method

Configure SSL in hub.config.json:

{
  "ssl": {
    "enabled": true,
    "cert": "/absolute/path/to/certs/cert.pem",
    "key": "/absolute/path/to/certs/key.pem"
  }
}

Environment Variables Method

Alternatively, you can use environment variables:

# Enable HTTPS
export SSL_ENABLED=true
export SSL_CERT="$(pwd)/certs/cert.pem"
export SSL_KEY="$(pwd)/certs/key.pem"

# Optional: Certificate Authority bundle
# export SSL_CA=$(pwd)/certs/ca.pem

# Optional: Private key passphrase (if your key is encrypted)
# export SSL_PASSPHRASE=your_passphrase
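
The interaction between the two methods can be sketched as follows. This is a minimal illustration, assuming that environment variables override `hub.config.json` when both are set; the source does not state the precedence, and `resolveSsl` and `SslSettings` are hypothetical names.

```typescript
interface SslSettings {
  enabled: boolean;
  cert: string | undefined;
  key: string | undefined;
}

// Assumption: an environment variable, when set, wins over the config file.
function resolveSsl(
  fileSsl: SslSettings,
  env: { [name: string]: string | undefined }
): SslSettings {
  const envEnabled = env["SSL_ENABLED"];
  const merged: SslSettings = {
    enabled: envEnabled !== undefined ? envEnabled === "true" : fileSsl.enabled,
    cert: env["SSL_CERT"] ?? fileSsl.cert,
    key: env["SSL_KEY"] ?? fileSsl.key,
  };
  // Fail early instead of starting an HTTPS server with no certificate.
  if (merged.enabled && (!merged.cert || !merged.key)) {
    throw new Error("SSL is enabled but the cert or key path is missing");
  }
  return merged;
}
```

Validating at startup keeps a missing path from surfacing later as a less obvious file-read error.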

Production Certificates

For production, use certificates from a trusted CA (e.g., Let's Encrypt):

# Install certbot
sudo apt-get install certbot  # or brew install certbot

# Get certificate for your domain (standalone mode must be able to bind port 80)
sudo certbot certonly --standalone -d your-domain.com

# Use the certificates
export SSL_ENABLED=true
export SSL_CERT=/etc/letsencrypt/live/your-domain.com/fullchain.pem
export SSL_KEY=/etc/letsencrypt/live/your-domain.com/privkey.pem

Or configure in hub.config.json:

{
  "ssl": {
    "enabled": true,
    "cert": "/etc/letsencrypt/live/your-domain.com/fullchain.pem",
    "key": "/etc/letsencrypt/live/your-domain.com/privkey.pem"
  }
}

Node Configuration for HTTPS

When the hub uses HTTPS, nodes must connect using HTTPS as well. Update your node.config.json:

{
  "hubUrl": "https://localhost:3000",
  "accessKey": "your-access-key",
  "token": "your-token"
}

Or via environment:

export HUB_URL=https://localhost:3000
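
A scheme mismatch between hub and node is easy to catch before the node tries to connect. The check below is a hypothetical helper (`checkHubUrl` is not part of the project) that returns a message when `hubUrl` does not match the hub's SSL setting.

```typescript
// Returns null when the URL scheme matches the hub's SSL setting,
// otherwise a human-readable description of the mismatch.
function checkHubUrl(hubUrl: string, hubSslEnabled: boolean): string | null {
  const usesHttps = hubUrl.toLowerCase().startsWith("https://");
  if (hubSslEnabled && !usesHttps) {
    return `hubUrl "${hubUrl}" must start with https:// because the hub has SSL enabled`;
  }
  if (!hubSslEnabled && usesHttps) {
    return `hubUrl "${hubUrl}" uses https:// but the hub has SSL disabled`;
  }
  return null;
}
```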

Troubleshooting

Browser Shows "Not Secure" Warning:

This is expected with self-signed certificates, because the browser cannot verify the issuer. In Chrome, click "Advanced" → "Proceed to localhost"; in other browsers, add a security exception.

Certificate File Not Found:

# Check files exist
ls -la certs/cert.pem
ls -la certs/key.pem

# Use absolute paths (here via environment variables; the same applies to config files)
export SSL_CERT="$(pwd)/certs/cert.pem"
export SSL_KEY="$(pwd)/certs/key.pem"

Node Can't Connect to Hub:

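Ensure `hubUrl` in the node config uses https://, not http://.
With self-signed certificates, the node's HTTP client rejects the hub's certificate unless the certificate is trusted on the node machine or SSL verification is disabled in the client (development only; not recommended for production).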

Node Monitor Configuration

Controls health monitoring of connected nodes to detect disconnections and failures. This is critical for maintaining system reliability in a distributed environment.

Why It's Needed:

In a distributed system with hub-node architecture, nodes can fail or become unreachable for many reasons:

Hardware Failures: Node machine crashes, power loss, or hardware malfunction
Network Issues: Network disconnections, firewall changes, or connectivity problems
Process Crashes: Node process terminates unexpectedly due to bugs or resource exhaustion
System Overload: Node becomes unresponsive due to high CPU/memory usage
Silent Failures: WebSocket connection appears active but node is actually frozen or hung

Without active monitoring, the hub would:

Continue sending requests to dead nodes (wasting time waiting for timeouts)
Show incorrect device availability in the dashboard (devices on dead nodes appear available)
Leave sessions in limbo (active sessions on failed nodes never get cleaned up)
Cause test failures and poor user experience

How It Works:

Periodic Health Checks: Every `checkInterval` (10 seconds by default), the hub sends a ping to all connected nodes
Response Monitoring: Each node must respond within `timeoutMs` (40 seconds by default, 4x the check interval for network tolerance)
Failure Detection: If a node doesn't respond within the timeout, it's marked offline
Automatic Cleanup: The hub triggers cleanup of all devices and sessions on the failed node
Recovery: When the node comes back online, it's automatically detected and re-registered
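
The detection and recovery steps above can be sketched as a small tracker. This is an illustration under stated assumptions, not the hub's actual implementation: `NodeHealthTracker` and its methods are hypothetical names, and timestamps are passed in explicitly so the logic stays easy to test.

```typescript
class NodeHealthTracker {
  private lastSeen = new Map<string, number>(); // nodeId -> last response time (ms)
  private offline = new Set<string>();
  private timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Call when a node registers or answers a ping.
  recordResponse(nodeId: string, now: number): void {
    this.lastSeen.set(nodeId, now);
    this.offline.delete(nodeId); // recovery: the node is online again
  }

  // Call once per checkInterval. Returns nodes that just went offline,
  // so the caller can clean up their devices and sessions.
  sweep(now: number): string[] {
    const newlyOffline: string[] = [];
    this.lastSeen.forEach((seenAt, nodeId) => {
      if (!this.offline.has(nodeId) && now - seenAt > this.timeoutMs) {
        this.offline.add(nodeId);
        newlyOffline.push(nodeId);
      }
    });
    return newlyOffline;
  }

  isOnline(nodeId: string): boolean {
    return this.lastSeen.has(nodeId) && !this.offline.has(nodeId);
  }
}
```

Each node is reported by `sweep` only once per outage, which keeps cleanup from being triggered repeatedly for the same failure.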

Configuration Options:

| Field | Type | Default | Description |
|---|---|---|---|
| `checkInterval` | number | `10000` | How often to check node health in milliseconds (default: 10 seconds). The hub periodically pings all connected nodes to ensure they're responsive. |
| `timeoutMs` | number | `40000` | Maximum time to wait for node response before marking as offline (default: 40 seconds). If a node doesn't respond within this timeout, it's marked offline and cleanup is started. |

Recommended Settings:

timeoutMs should be 3-4x checkInterval to tolerate temporary network delays
Default (10s/40s): Good balance for most environments - detects failures quickly without false positives
Unstable networks (15s/60s): Longer intervals to avoid marking nodes offline due to temporary network hiccups
High-reliability (5s/20s): Faster detection in stable networks with redundant nodes
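
The 3-4x guideline can be checked mechanically. The helper below is a hypothetical sketch (`checkMonitorRatio` is an illustrative name); treating ratios above 4x as "slow" is an interpretation of the guideline, not a rule enforced by the hub.

```typescript
type RatioVerdict = "ok" | "too-tight" | "slow";

// Compares timeoutMs against checkInterval using the 3-4x guideline.
function checkMonitorRatio(checkInterval: number, timeoutMs: number): RatioVerdict {
  const ratio = timeoutMs / checkInterval;
  if (ratio < 3) return "too-tight"; // likely false positives on brief network delays
  if (ratio > 4) return "slow"; // failures take longer to detect than necessary
  return "ok";
}
```

All three presets listed above (10s/40s, 15s/60s, 5s/20s) have a ratio of exactly 4 and pass this check.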

WebSocket Configuration

Controls WebSocket communication between hub and nodes, including heartbeats and request timeouts.

Why Heartbeat Is Needed:

WebSocket connections can appear "open" even when the connection is actually dead due to:

Network middleboxes (firewalls, NAT, proxies) closing idle connections without notification
TCP timeouts where the connection dies silently without proper close handshake
One-way communication loss where hub can send but node can't respond (or vice versa)

Without heartbeats:

Dead connections wouldn't be detected until a command is sent (wasting time)
Idle connections would be closed by firewalls/proxies (breaking communication)
Hub would only learn about node failures when trying to create a session (bad user experience)

Heartbeat vs Node Monitor:

These work together but serve different purposes:

| Feature | Heartbeat (`heartbeatInterval`) | Node Monitor (`nodeMonitor.checkInterval`) |
|---|---|---|
| Direction | Node → Hub (node sends, hub listens) | Hub → Node (hub sends ping, node responds) |
| Purpose | Keep WebSocket alive, passive connection check | Active health check with timeout enforcement |
| Frequency | Every 30 seconds (less frequent) | Every 10 seconds (more frequent for faster detection) |
| Failure | Hub marks node offline if no heartbeat received | Hub marks node offline if ping times out |
| Use Case | Prevents idle connection timeouts, detects drops | Actively verifies node is responsive and processing |

Configuration Options:

| Field | Type | Default | Description |
|---|---|---|---|
| `heartbeatInterval` | number | `30000` | How often nodes send heartbeat messages to the hub, in milliseconds (default: 30 seconds). Heartbeats keep the WebSocket connection alive and help detect network issues. Should be less than the firewall idle timeout (typically 60-120 seconds). |
| `nodeAvailabilityCheckTimeout` | number | `60000` | Maximum time to wait for node availability before sending a request, in milliseconds (default: 60 seconds). Used when querying if a node is ready to accept commands. |
| `sessionCreationTimeout` | number | `120000` | Timeout for session creation requests, in milliseconds (default: 120 seconds). Session creation can take time due to app installation, device preparation, and Appium startup. |
| `sessionRequestTimeout` | number | `30000` | Timeout for regular session requests like click, getText, etc., in milliseconds (default: 30 seconds). |
| `nodeCleanupDelaySeconds` | number | `30` | Delay in seconds (not milliseconds) before cleaning up sessions and devices for nodes that remain offline (default: 30 seconds). When a node disconnects, the hub waits this duration before terminating sessions. If the node reconnects within this time, cleanup is cancelled. Can also be set via the `NODE_CLEANUP_DELAY_SECONDS` environment variable. |
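
The delayed cleanup described for `nodeCleanupDelaySeconds` can be sketched as follows. This is an illustrative model only: `CleanupScheduler` and its methods are hypothetical names, and the hub may well implement the delay with timers rather than explicit timestamps.

```typescript
class CleanupScheduler {
  private pending = new Map<string, number>(); // nodeId -> time cleanup is due (ms)
  private delayMs: number;

  constructor(nodeCleanupDelaySeconds: number) {
    this.delayMs = nodeCleanupDelaySeconds * 1000; // config value is in seconds
  }

  // The node dropped: schedule cleanup instead of running it immediately.
  onDisconnect(nodeId: string, now: number): void {
    this.pending.set(nodeId, now + this.delayMs);
  }

  // The node came back: cancel cleanup. Returns true if one was pending.
  onReconnect(nodeId: string): boolean {
    return this.pending.delete(nodeId);
  }

  // Returns nodes whose delay has elapsed and removes them from the queue.
  takeDue(now: number): string[] {
    const due: string[] = [];
    this.pending.forEach((dueAt, nodeId) => {
      if (now >= dueAt) due.push(nodeId);
    });
    due.forEach((nodeId) => this.pending.delete(nodeId));
    return due;
  }
}
```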

Recommended Heartbeat Settings:

Default (30s): Works well with most firewalls/proxies (typically timeout after 60-120s of inactivity)
Aggressive (15s): Very strict networks with short idle timeouts
Relaxed (60s): Internal networks with no firewall restrictions (not recommended for production)
Never disable: Heartbeat is critical for connection health - always keep enabled
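
The relationship between the heartbeat interval and a firewall's idle timeout can be expressed as two small helpers. Both function names are hypothetical, and the "half the idle timeout" rule is an assumption chosen because it reproduces the 30-second default for a 60-second idle timeout.

```typescript
// True when heartbeats arrive often enough that an idle connection
// is never silent for a full idle-timeout window.
function heartbeatKeepsConnectionAlive(
  heartbeatInterval: number,
  idleTimeoutMs: number
): boolean {
  return heartbeatInterval < idleTimeoutMs;
}

// Rule of thumb (assumption): half of the shortest idle timeout on the path,
// leaving room for one delayed heartbeat.
function suggestHeartbeatInterval(idleTimeoutMs: number): number {
  return Math.floor(idleTimeoutMs / 2);
}
```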

WebRTC Configuration

Controls WebRTC connections for manual device sessions, including browser disconnect handling and reconnection support.

| Field | Type | Default | Description |
|---|---|---|---|
| `browserDisconnectTimeout` | number | `30000` | Grace period in milliseconds before terminating session after browser disconnect (default: 30 seconds). During this period, users can reconnect to their session without losing their work. |
| `allowSessionReconnection` | boolean | `true` | Whether to allow session reconnection within grace period (default: `true`). Set to `false` to immediately terminate sessions on browser disconnect instead of waiting for the grace period. |
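
How the two WebRTC fields combine can be sketched as a pair of pure functions. The names `terminationDeadline` and `canReconnect` are hypothetical, and the exact boundary behavior (reconnection refused at the deadline itself) is an assumption.

```typescript
interface WebrtcConfig {
  browserDisconnectTimeout: number; // grace period in milliseconds
  allowSessionReconnection: boolean;
}

// Time at which the session should be terminated after the browser
// disconnected at `disconnectedAt`.
function terminationDeadline(cfg: WebrtcConfig, disconnectedAt: number): number {
  return cfg.allowSessionReconnection
    ? disconnectedAt + cfg.browserDisconnectTimeout
    : disconnectedAt; // no grace period: terminate immediately
}

// Whether a browser returning at `now` may rejoin its session.
function canReconnect(cfg: WebrtcConfig, disconnectedAt: number, now: number): boolean {
  return cfg.allowSessionReconnection && now < terminationDeadline(cfg, disconnectedAt);
}
```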

Session Idle Timeout

Controls automatic termination of idle manual sessions to free up devices for other users.

Why It's Needed:

Manual sessions (live device control via the web UI) can be left open accidentally when users:

Close their browser without properly ending the session
Walk away from their desk and forget to release the device
Lose network connectivity without the browser being able to send a disconnect signal

Without idle timeout:

Devices would remain locked indefinitely
Other users wouldn't be able to access those devices
Resources would be wasted on abandoned sessions

How It Works:

Activity Tracking: The frontend tracks user activity (mouse, keyboard, touch, scroll)
Heartbeat: Every 30 seconds, the frontend sends a heartbeat to the hub
Warning: 30 seconds before the idle timeout expires, a warning popup appears with a countdown
Termination: If the user does not respond before the countdown ends, the session is automatically terminated
Backend Safety Net: The hub also monitors heartbeats and terminates orphaned sessions

Current Settings:

| Component | Setting | Value | Description |
|---|---|---|---|
| Frontend | `idleTimeoutMs` | `300000` (5 min) | Time of inactivity before session termination |
| Frontend | Warning popup | 4 min 30 sec | Warning appears 30 seconds before timeout |
| Backend | `orphanedThresholdMs` | `300000` (5 min) | Backend safety net for missed heartbeats |
| Backend | `checkIntervalMs` | `60000` (60 sec) | How often backend checks for orphaned sessions |
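
The backend safety net can be sketched as a periodic scan over the last heartbeat time of each manual session. `findOrphanedSessions` is a hypothetical name and the data structure is an assumption; only the threshold value comes from the table above.

```typescript
// Returns the sessions whose last heartbeat is at least
// orphanedThresholdMs old. Intended to run every checkIntervalMs.
function findOrphanedSessions(
  lastHeartbeat: Map<string, number>, // sessionId -> last heartbeat time (ms)
  now: number,
  orphanedThresholdMs: number = 300000
): string[] {
  const orphaned: string[] = [];
  lastHeartbeat.forEach((seenAt, sessionId) => {
    if (now - seenAt >= orphanedThresholdMs) orphaned.push(sessionId);
  });
  return orphaned;
}
```

Under this model, a session that stops sending heartbeats is terminated between 5 and 6 minutes after its last heartbeat: 5 minutes for the threshold, plus up to 60 seconds until the next scan.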

User Experience:

User is inactive for 4 minutes 30 seconds → Warning popup appears with 30-second countdown
User clicks "Continue Session" → Timer resets, session continues
User doesn't respond within 30 seconds → Session is terminated, user is redirected to home
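
The frontend timeline above reduces to a small state function. This is a sketch using the values from the Current Settings table; `idleState` and `warningLeadMs` are hypothetical names.

```typescript
type IdleState = "active" | "warning" | "terminated";

// Maps the time since the last user activity to the session state.
function idleState(
  idleMs: number,
  idleTimeoutMs: number = 300000, // 5 minutes
  warningLeadMs: number = 30000 // warning shows 30 seconds before timeout
): IdleState {
  if (idleMs >= idleTimeoutMs) return "terminated";
  if (idleMs >= idleTimeoutMs - warningLeadMs) return "warning";
  return "active";
}
```

Clicking "Continue Session" corresponds to resetting `idleMs` to zero, which returns the session to the "active" state.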

Note: This idle timeout only applies to manual sessions (live device control). Automation sessions use Appium's newCommandTimeout capability instead, which defaults to 60 seconds but can be configured per-session.
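
For automation sessions, the timeout is set in the capabilities sent with the new-session request. The object below is an illustrative example: the platform, the automation driver, and the 300-second value are assumptions; only `newCommandTimeout` itself comes from the note above.

```typescript
// Example W3C capabilities for an Appium session.
// newCommandTimeout is expressed in seconds.
const capabilities = {
  platformName: "Android", // assumption: example platform
  "appium:automationName": "UiAutomator2", // assumption: example driver
  "appium:newCommandTimeout": 300, // end the session after 5 minutes without commands
};
```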