# Health & Metrics

Whisp provides commands to monitor daemon health and track usage statistics.
## Health Check

Get a quick overview of daemon status:

```shell
whisp health
```

Output:

```
Daemon Health
─────────────
Version: 0.5.0
Uptime: 2d 4h 32m 15s
Provider: openai
Requests: 1,247
Errors: 12
Active: 1 connection
Memory: 24.5 MB
Last Request: 3s ago
```

### Health Fields
| Field | Description |
|---|---|
| Version | Daemon version |
| Uptime | Time since daemon started |
| Provider | Current AI provider |
| Requests | Total requests handled |
| Errors | Failed requests |
| Active | Current concurrent connections |
| Memory | Memory usage (Linux only) |
| Last Request | Time since last activity |
### Using Health for Troubleshooting

```shell
# Check if daemon is responsive
whisp health

# High error count? Check logs
whisp health | grep Errors

# Memory growing? Consider restart
whisp restart
```

## Detailed Metrics
Get comprehensive usage statistics:

```shell
whisp metrics
```

Output:
```
Whisp Metrics (last week)
─────────────────────────
Performance
  Average response: 245ms
  P50 response: 180ms
  P95 response: 520ms
  P99 response: 890ms
  Requests/hour: 12.4
Token Usage
  Input tokens: 45.2K
  Output tokens: 23.1K
  Total tokens: 68.3K
Requests by Type
  query: 842
  explain: 156
  error: 89
  chat: 72
  dryrun: 45
  pipe: 34
```

### Metrics Fields
Performance:
| Metric | Description |
|---|---|
| Average response | Mean response time |
| P50/P95/P99 | Percentile response times |
| Requests/hour | Request rate |
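These latency figures can feed a simple alert. A minimal sketch, with a sample JSON snippet standing in for `whisp metrics --json` (the field name matches the JSON Output section below):

```shell
# Alert when p95 latency crosses a threshold; in practice, replace the
# sample string with the output of `whisp metrics --json`
metrics_json='{"p95_response_ms": 520}'
p95=$(printf '%s' "$metrics_json" | sed -n 's/.*"p95_response_ms": *\([0-9]*\).*/\1/p')
if [ "$p95" -gt 500 ]; then
  echo "p95 latency high: ${p95}ms"
fi
```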
Token Usage:
| Metric | Description |
|---|---|
| Input tokens | Tokens sent to AI |
| Output tokens | Tokens received from AI |
| Total tokens | Combined usage |
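Total tokens is simply the sum of the other two, as a quick sanity check against the metrics output shown earlier:

```shell
# Input + output tokens = total (values from the sample metrics above)
input=45234
output=23156
echo "$((input + output)) total tokens"   # matches the ~68.3K shown
```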
Request Types:
| Type | Description |
|---|---|
| query | Natural language command generation |
| explain | Command explanations (,. cmd) |
| error | Error recovery suggestions |
| chat | Interactive chat messages |
| dryrun | Dry-run analysis (,d cmd) |
| pipe | Piped input processing |
### Time Period Filter

Filter metrics by time period:

```shell
# Today only
whisp metrics --period day

# Last 7 days (default)
whisp metrics --period week

# Last 30 days
whisp metrics --period month

# All time
whisp metrics --period all
```

## JSON Output
Get metrics as JSON for scripts or monitoring:

```shell
whisp metrics --json
```

```json
{
  "uptime_secs": 186735,
  "avg_response_ms": 245,
  "p50_response_ms": 180,
  "p95_response_ms": 520,
  "p99_response_ms": 890,
  "requests_per_hour": 12.4,
  "total_input_tokens": 45234,
  "total_output_tokens": 23156,
  "request_type_counts": {
    "query": 842,
    "explain": 156,
    "error": 89,
    "chat": 72,
    "dryrun": 45,
    "pipe": 34
  }
}
```

## Diagnostic Check
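The numeric fields are convenient for post-processing. For example, the uptime_secs value from the sample above can be rendered human-readable with plain shell arithmetic:

```shell
# Convert uptime_secs (from `whisp metrics --json`) into days/hours/minutes
uptime_secs=186735
days=$((uptime_secs / 86400))
hours=$(( (uptime_secs % 86400) / 3600 ))
mins=$(( (uptime_secs % 3600) / 60 ))
echo "${days}d ${hours}h ${mins}m"   # → 2d 3h 52m
```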
Run comprehensive diagnostics:

```shell
whisp doctor
```

Output:

```
Whisp Diagnostic Check
──────────────────────
✓ Daemon is running (PID 12345)
✓ Socket exists at /tmp/whisp.sock
✓ Socket is responsive
✓ Configuration is valid
✓ Config file permissions OK (600)
✓ API key configured for openai
✓ Python3 available
✓ netcat (nc) available

All checks passed!
```

### Checks Performed
| Check | Description |
|---|---|
| Daemon running | Process exists and PID file valid |
| Socket exists | Unix socket file present |
| Socket responsive | Can communicate with daemon |
| Config valid | TOML syntax correct |
| Config permissions | File not world-readable |
| API key | Key configured for active provider |
| Python3 | Required tool available |
| netcat | Required tool available |
### When Checks Fail

```
✗ Daemon is not running
  Hint: Run 'whisp start' to start the daemon

⚠ Config file permissions too open (644)
  Hint: Run 'chmod 600 ~/.config/whisp/config.toml'

✗ API key not configured for anthropic
  Hint: Run 'whisp init' to configure or set ANTHROPIC_API_KEY
```

## Monitoring Integration
### Prometheus/Grafana

Export metrics periodically:

```shell
# Cron job to export metrics every 5 minutes
*/5 * * * * whisp metrics --json >> /var/log/whisp-metrics.jsonl
```

### Health Check Script
```shell
#!/bin/bash
# healthcheck.sh
if ! whisp health > /dev/null 2>&1; then
    echo "Whisp daemon unhealthy, restarting..."
    whisp restart
fi
```

### Alerting on Errors
```shell
# Check the error rate; fetch the metrics once so both jq queries
# see the same snapshot
metrics=$(whisp metrics --json)
errors=$(echo "$metrics" | jq '.request_type_counts.error // 0')
total=$(echo "$metrics" | jq '[.request_type_counts[]] | add')
if [ "$errors" -gt 0 ] && [ "$total" -gt 0 ]; then
    error_rate=$(echo "scale=2; $errors / $total * 100" | bc)
    if (( $(echo "$error_rate > 10" | bc -l) )); then
        echo "High error rate: ${error_rate}%"
    fi
fi
```

## Understanding Token Usage
Tokens are the billing unit for AI providers. Each query uses:
- Input tokens: Your query + context (directory, history, etc.)
- Output tokens: The AI's response
Typical usage per request type:
| Type | Input | Output | Total |
|---|---|---|---|
| query | ~150 | ~80 | ~230 |
| explain | ~200 | ~150 | ~350 |
| chat | ~300+ | ~200 | ~500+ |
| dryrun | ~180 | ~120 | ~300 |
Chat uses more tokens because it includes conversation history.
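A back-of-envelope monthly estimate follows directly from these per-request figures. The usage level below is hypothetical, purely for illustration:

```shell
# Rough monthly token estimate: requests/day × tokens/request × 30 days
requests_per_day=50          # hypothetical usage level
tokens_per_request=230       # typical "query" total from the table above
echo "$((requests_per_day * tokens_per_request * 30)) tokens/month"
```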
## Cost Estimation
Whisp automatically fetches model pricing from LiteLLM's community-maintained pricing database. Pricing data is cached locally for 24 hours at ~/.whisp/model_prices.json.
This means cost estimates are automatically accurate for:
- OpenAI models (GPT-4o, GPT-5-nano, etc.)
- Anthropic models (Claude)
- Google Gemini models
- Cerebras models
- Most major providers
Local providers (Ollama) are automatically marked as free.
### Per-Token Pricing
Cost is calculated as: (input_tokens × input_price) + (output_tokens × output_price)
| Provider | Model | Input $/1M | Output $/1M |
|---|---|---|---|
| OpenAI | gpt-5-nano | ~$0.10 | ~$0.40 |
| OpenAI | gpt-4o-mini | ~$0.15 | ~$0.60 |
| OpenAI | gpt-4o | ~$2.50 | ~$10.00 |
| Anthropic | claude-haiku | ~$0.25 | ~$1.25 |
| Anthropic | claude-sonnet | ~$3.00 | ~$15.00 |
| Gemini | gemini-1.5-flash | ~$0.075 | ~$0.30 |
| Cerebras | gpt-oss-120b | Varies | Varies |
| Ollama | any | Free | Free |
Prices change frequently. Check provider pricing pages for current rates.
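As a worked example of the per-token formula above, using gpt-4o-mini's approximate rates and the token totals from the earlier metrics output:

```shell
# (45,234 input × $0.15/1M) + (23,156 output × $0.60/1M)
awk 'BEGIN {
  input = 45234; output = 23156
  cost = (input * 0.15 + output * 0.60) / 1000000
  printf "$%.4f\n", cost      # prints $0.0207
}'
```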
### Cost by Provider (Approximate)
Based on typical whisp usage (~500 tokens/request):
| Provider | Model | Cost per 1000 requests |
|---|---|---|
| OpenAI | gpt-4o-mini | ~$0.08 |
| OpenAI | gpt-4o | ~$2.50 |
| Anthropic | claude-haiku | ~$0.13 |
| Anthropic | claude-sonnet | ~$1.50 |
| Gemini | gemini-1.5-flash | ~$0.04 |
| Ollama | any | Free (local) |
Prices are estimates based on typical usage. Actual costs depend on query complexity and context size.
### Viewing Cost Data

```shell
# Get total token usage for the month
whisp metrics --period month --json | jq '{
  input_tokens: .total_input_tokens,
  output_tokens: .total_output_tokens,
  total_tokens: (.total_input_tokens + .total_output_tokens)
}'
```

### Cache Behavior
Pricing data is fetched from GitHub and cached at ~/.whisp/model_prices.json. The cache strategy:
- Fresh cache (< 24 hours old): Use cached data immediately
- Stale cache (> 24 hours): Attempt to fetch new data
- Network failure: Fall back to stale cache if available
- No cache: Fetch with 10-second timeout
Delete the cache file to force a refresh: rm ~/.whisp/model_prices.json
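The stale-cache rule can also be reproduced externally with find's -mmin test. A sketch using a temp file for safety; point it at ~/.whisp/model_prices.json for real use:

```shell
# Remove a pricing cache older than 24 h (1440 minutes)
CACHE=$(mktemp)
touch -t 202001010000 "$CACHE"   # simulate an old cache file
if [ -n "$(find "$CACHE" -mmin +1440 2>/dev/null)" ]; then
  rm "$CACHE"
  echo "stale pricing cache removed"
fi
```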
See Providers & Models for detailed pricing information.
## Performance Optimization

If response times are slow:

- Check provider: Some providers are faster than others
- Use a faster model: gpt-5-nano-2025-08-07 vs gpt-4o
- Use Ollama locally: No network latency
- Check network: ping api.openai.com

If memory usage is high:

```shell
# Restart daemon to clear memory
whisp restart
```