Monitoring

LangGuard's monitoring dashboard shows the health and performance of your AI operations: trace volume, success rate, latency, token usage, agent and integration health, and policy violations.

Dashboard Overview

The main dashboard displays key metrics and recent activity at a glance.

Key Metrics

The dashboard header shows aggregate metrics:

Metric	Description
Total Traces	Number of traces ingested
Success Rate	Percentage of successful traces
Avg Latency	Mean trace duration
Active Agents	Agents with activity in the selected time range
Policy Violations	Number of policy violations triggered

Time Range Selection

Use the time range selector to adjust the analysis period:

Last Hour
Last 24 Hours (default)
Last 7 Days
Last 30 Days
Custom Range

All metrics update automatically when you change the time range.

Metrics Deep Dive

Success Rate

The success rate shows the percentage of traces that completed without errors:

Success Rate: 94.2%
───────────────────────────────
███████████████████████░ 94.2%

Interpretation:

≥ 95% - Excellent (green)
85% to < 95% - Acceptable (yellow)
< 85% - Needs attention (red)
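
The bands above can be expressed as a small helper, for example when post-processing exported data. This is only a sketch: the function name `success_rate_band` is hypothetical, and it assumes the rate is given as a fraction between 0 and 1 (as in the `successRate` field of the API response), with exactly 95% counted as Excellent.

```python
def success_rate_band(rate: float) -> str:
    """Map a success rate (fraction in [0, 1]) to the dashboard's colour band."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be between 0 and 1, got {rate}")
    if rate >= 0.95:
        return "excellent"        # green
    if rate >= 0.85:
        return "acceptable"       # yellow
    return "needs attention"      # red


print(success_rate_band(0.942))
```

With the 94.2% example shown above, the helper returns the Acceptable band.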

Latency

Track response times across your agents:

P50 (Median) - Half of requests complete within this time
P95 - 95% of requests complete within this time
P99 - 99% of requests complete within this time
Average - Mean duration
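
To make the percentile definitions concrete, the sketch below computes them from a list of trace durations using the nearest-rank method. The method LangGuard itself uses is not stated here, so treat this as one common definition rather than the product's implementation; the function names are hypothetical.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_ms: list[float]) -> dict[str, float]:
    """Summarise trace durations with the four statistics listed above."""
    return {
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
        "avg": mean(durations_ms),
    }


print(latency_summary([100, 200, 300, 400, 500]))
```

A single slow trace moves P99 and the average far more than P50, which is why the dashboard reports the percentiles alongside the mean.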

Latency Distribution (P95)
────────────────────────────────
< 100ms    ████████████████ 40%
100-500ms  ██████████████ 35%
500ms-1s   ██████ 15%
> 1s       ████ 10%

Token Usage

Monitor token consumption for cost management:

Input Tokens - Tokens in requests
Output Tokens - Tokens in responses
Total Tokens - Combined usage
Cost Estimate - Based on model pricing
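
As an illustration of how a cost estimate follows from token counts, the sketch below multiplies input and output tokens by per-1,000-token prices. The `ModelPricing` class, the `estimate_cost` function, and the prices are all hypothetical placeholders, not LangGuard's pricing table or any real model's rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical prices per 1,000 tokens, in your billing currency."""
    input_per_1k: float
    output_per_1k: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: ModelPricing) -> dict:
    """Return the token usage figures listed above plus a cost estimate."""
    cost = (
        input_tokens / 1000 * pricing.input_per_1k
        + output_tokens / 1000 * pricing.output_per_1k
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "cost_estimate": round(cost, 6),
    }


# Placeholder prices for illustration only.
example_pricing = ModelPricing(input_per_1k=0.5, output_per_1k=1.5)
print(estimate_cost(2000, 1000, example_pricing))
```

Input and output tokens are priced separately here because many model providers charge different rates for each.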

Volume Trends

See request patterns over time:

Requests per Hour (Last 24h)
|                    ╭────╮
|              ╭─────╯    ╰───╮
|         ╭────╯              ╰───╮
| ────────╯                        ╰────
└───────────────────────────────────────
    12am   6am   12pm   6pm   12am
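
A requests-per-hour series like the one charted above can be rebuilt from exported trace timestamps. The sketch below assumes each trace carries an ISO 8601 timestamp string; that field format and the function name are assumptions, not a documented export schema.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps: list[str]) -> dict[datetime, int]:
    """Count traces per hour, keyed by the start of each hour."""
    counts = Counter(
        datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    return dict(sorted(counts.items()))


sample = [
    "2024-05-01T09:15:00",
    "2024-05-01T09:45:30",
    "2024-05-01T10:05:00",
]
for hour, count in requests_per_hour(sample).items():
    print(hour.isoformat(), count)
```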

Agent Health

Agent Status Cards

Each discovered agent has a health card showing:

┌──────────────────────────────────────┐
│  CustomerService Agent         [●●●] │
├──────────────────────────────────────┤
│  Status: ● Warning                   │
│  Success Rate: 94.2%   ▲ +2.3%      │
│  Avg Latency: 1.2s     ▼ -0.1s      │
│  Traces (24h): 1,234                 │
│  Last Active: 5 minutes ago          │
└──────────────────────────────────────┘

Health Status Indicators

Status	Meaning
🟢 Healthy	Success rate ≥ 95%, no recent errors
🟡 Warning	Success rate from 85% to below 95%, or elevated latency
🔴 Critical	Success rate < 85% or many errors
⚪ Inactive	No activity in selected time range

Integration Health

Connection Status

Monitor the health of your data source connections:

┌─────────────────────────────────────────┐
│  Integrations                            │
├─────────────────────────────────────────┤
│  Langfuse (Production)    ● Connected   │
│  Last Sync: 2 minutes ago               │
│  Traces Synced: 12,456                  │
├─────────────────────────────────────────┤
│  Databricks MLflow        ● Connected   │
│  Last Sync: 5 minutes ago               │
│  Traces Synced: 3,421                   │
└─────────────────────────────────────────┘

Sync History

View recent sync operations:

Time	Integration	Status	Items
2 min ago	Langfuse	✓ Success	45 traces
7 min ago	Langfuse	✓ Success	52 traces
12 min ago	Databricks	✓ Success	23 traces
17 min ago	Langfuse	⚠ Warning	0 traces (rate limited)

Policy Violations

Violation Summary

The dashboard shows recent policy violations:

Policy Violations (Last 24h)
────────────────────────────────
Critical:  3  ████
High:      8  ██████████
Medium:   15  ████████████████████
Low:       7  █████████

Recent Violations

Quick list of recent policy triggers:

Time	Policy	Agent	Severity
5 min ago	PII Detection	EmailBot	Critical
12 min ago	Token Limits	ChatBot	Medium
1 hour ago	SQL Injection	DataAgent	High

Click any violation to view details in the Trace Explorer.

Real-time Updates

Auto-Refresh

The dashboard automatically refreshes based on your sync interval:

Metrics update after each sync
Agent status refreshes in real time
New violations appear immediately

Manual Refresh

Click the refresh button (↻) to force an update.

Exporting Data

Export Formats

Export monitoring data for external analysis:

JSON - Complete data with metadata
CSV - Spreadsheet-compatible format
PDF - Formatted report (coming soon)

API Access

Access metrics programmatically:

# Get dashboard metrics
curl -X GET "https://api.langguard.ai/v1/metrics" \
  -H "Authorization: Bearer $API_KEY"

Response:

{
  "timeRange": "24h",
  "metrics": {
    "totalTraces": 12456,
    "successRate": 0.942,
    "avgLatency": 1.23,
    "activeAgents": 8,
    "policyViolations": 33
  }
}
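
The same request can be made from Python with the standard library. The sketch below uses only the endpoint, header, and response fields shown above; it assumes `successRate` is a fraction and `avgLatency` is in seconds, which matches the example values but is not stated explicitly. The function names are hypothetical.

```python
import json
import urllib.request

METRICS_URL = "https://api.langguard.ai/v1/metrics"  # endpoint from the curl example

# Sample payload copied from the response shown above.
SAMPLE_RESPONSE = {
    "timeRange": "24h",
    "metrics": {
        "totalTraces": 12456,
        "successRate": 0.942,
        "avgLatency": 1.23,
        "activeAgents": 8,
        "policyViolations": 33,
    },
}


def fetch_metrics(api_key: str) -> dict:
    """GET the dashboard metrics using a bearer token."""
    request = urllib.request.Request(
        METRICS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(payload: dict) -> str:
    """Format a metrics payload as a one-line summary."""
    m = payload["metrics"]
    return (
        f"{payload['timeRange']}: {m['totalTraces']:,} traces, "
        f"{m['successRate']:.1%} success, "
        f"{m['avgLatency']:.2f}s avg latency, "  # assumes seconds
        f"{m['activeAgents']} active agents, "
        f"{m['policyViolations']} policy violations"
    )


print(summarize(SAMPLE_RESPONSE))
```

To run it against the live API, pass the result of `fetch_metrics(...)` with your API key to `summarize` instead of the sample payload.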

Best Practices

1. Set Up Baselines

Establish baseline metrics for comparison:

Document normal success rates
Record typical latency ranges
Track average daily volumes

2. Monitor Trends

Look for patterns rather than absolute values:

Gradual latency increases
Declining success rates
Volume anomalies
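
One simple way to spot a gradual change is to compare the mean of the most recent window against the window before it. The sketch below does this for any metric series (daily P95 latency, success rate, or volume); the function name and the one-week window are illustrative choices, not a LangGuard feature.

```python
from statistics import mean


def relative_change(series: list[float], window: int) -> float:
    """Relative change of the last `window` points versus the preceding `window` points."""
    if window <= 0 or len(series) < 2 * window:
        raise ValueError("need at least 2 * window data points")
    previous = mean(series[-2 * window:-window])
    recent = mean(series[-window:])
    if previous == 0:
        raise ValueError("previous window averages to zero; relative change is undefined")
    return (recent - previous) / previous


# Hypothetical daily P95 latency in milliseconds: last week vs. this week.
daily_p95_ms = [100] * 7 + [110] * 7
print(f"{relative_change(daily_p95_ms, window=7):+.0%}")
```

A week-over-week comparison smooths out daily noise, so a steady drift shows up even when no single day looks unusual.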

3. Regular Reviews

Schedule periodic reviews:

Daily: Quick health check
Weekly: Trend analysis
Monthly: Deep dive and optimization

4. Configure Alerts (Coming Soon)

Once alerting is available, set up alerts for critical conditions:

Success rate drops below threshold
Latency exceeds limit
Policy violations spike
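
Because built-in alerts are marked as coming soon, a comparable check can be run client-side against the `metrics` object returned by the metrics API. The sketch below is an assumption-laden example: the `Thresholds` class, the `check_metrics` function, and the default limits are hypothetical, and `avgLatency` is assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical alert limits; tune these against your own baselines."""
    min_success_rate: float = 0.85
    max_avg_latency_s: float = 2.0
    max_policy_violations: int = 50


def check_metrics(metrics: dict, thresholds: Thresholds = Thresholds()) -> list[str]:
    """Return one message per breached threshold (empty list if all is well)."""
    alerts = []
    if metrics["successRate"] < thresholds.min_success_rate:
        alerts.append(
            f"success rate {metrics['successRate']:.1%} is below "
            f"{thresholds.min_success_rate:.0%}"
        )
    if metrics["avgLatency"] > thresholds.max_avg_latency_s:
        alerts.append(
            f"average latency {metrics['avgLatency']:.2f}s exceeds "
            f"{thresholds.max_avg_latency_s:.2f}s"
        )
    if metrics["policyViolations"] > thresholds.max_policy_violations:
        alerts.append(
            f"{metrics['policyViolations']} policy violations exceed "
            f"{thresholds.max_policy_violations}"
        )
    return alerts


degraded = {"successRate": 0.80, "avgLatency": 2.5, "policyViolations": 33}
for message in check_metrics(degraded):
    print(message)
```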

Next Steps

Trace Explorer - Investigate specific traces
Agent Activity - Detailed agent analysis
Policies - Set up governance rules
