Infrastructure Operations
Monitoring, alert routing, remediation history and service health across Hozzt production environments.
Live · All core services online
Service availability · Last 24 hours · 99.99% · Based on active service checks
MTTD · Last 24 hours · 7 min · Median detection time
MTTR · Last 24 hours · 21 min · Median recovery time
Open incidents · Last 24 hours · 62 waiting operator review
Production nodes · Last 24 hours · 476 · Grouped by service role
Domains scanned · Last 24 hours · 6,758 · HTTP, DNS, SSL and mail checks
Anomaly Detection
Scoring active
Mean risk score: 0.21
Hosts above baseline: 17
Recurring patterns: 4
Telemetry coverage: 98.4%
| Host | Metric | Baseline | Current | Score | Disposition |
|---|---|---|---|---|---|
| TR-CPANEL-042 | Disk I/O wait | 7.2% | 18.6% | 0.74 | Cleanup workflow queued |
| EU-WEB-118 | HTTP p95 latency | 210ms | 520ms | 0.69 | Cache pool check |
| US-MAIL-021 | SMTP queue depth | 1,200 | 4,850 | 0.88 | Queue drain completed |
| TR-DB-014 | DB connections | 62% | 78% | 0.46 | Watch only |
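One way to read the Baseline and Current columns together is as a relative deviation from baseline. The sketch below is only an illustration of that arithmetic using the four rows above; the `Reading` class and `relative_deviation` helper are hypothetical names, and nothing here is claimed to be how the Score column is actually computed.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    host: str
    metric: str
    baseline: float
    current: float


def relative_deviation(r: Reading) -> float:
    """Fractional change of the current value relative to its baseline."""
    if r.baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (r.current - r.baseline) / r.baseline


# Values copied from the anomaly table above.
readings = [
    Reading("TR-CPANEL-042", "Disk I/O wait (%)", 7.2, 18.6),
    Reading("EU-WEB-118", "HTTP p95 latency (ms)", 210, 520),
    Reading("US-MAIL-021", "SMTP queue depth", 1200, 4850),
    Reading("TR-DB-014", "DB connections (%)", 62, 78),
]

# Largest relative deviation first.
ranked = sorted(readings, key=relative_deviation, reverse=True)

for r in ranked:
    print(f"{r.host:<14} {r.metric:<22} {relative_deviation(r):+.0%}")
```

On these four rows the ranking by relative deviation puts the hosts in the same order as the Score column, but that coincidence should not be taken as evidence of the scoring method.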
Response Metrics
24-hour window
Detection SLA compliance: 98.7%
Recovery SLA compliance: 97.9%
Automated resolution rate: 84%
check: service-health --scope production
ok: dns, http, smtp, mysql, backup-agent
warn: 14 hosts outside normal baseline
ok: no global outage condition detected
Automation Workflow
Policy engine online
Collect: Metrics, logs, checks and host events
Correlate: Group events by service and impact
Classify: Risk level, blast radius and approval
Execute: Safe action, webhook or operator task
Verify: Health check, recovery note and closure
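The five stages above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the policy engine itself: every class and function name is hypothetical, the sample events are invented for the example, and the Classify stage models only the approval decision (using the noted policy that database actions require approval), leaving out risk level and blast radius.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    host: str
    service: str
    signal: str


@dataclass
class Incident:
    service: str
    events: list
    needs_approval: bool = False
    log: list = field(default_factory=list)


def collect():
    # Collect: hypothetical events standing in for metrics, logs and checks.
    return [
        Event("US-MAIL-021", "mail", "queue_depth_high"),
        Event("TR-DB-014", "db", "connections_high"),
    ]


def correlate(events):
    # Correlate: group events by service.
    grouped = {}
    for event in events:
        grouped.setdefault(event.service, []).append(event)
    return [Incident(service=s, events=evs) for s, evs in grouped.items()]


def classify(incident):
    # Classify: assumed policy, database actions require operator approval.
    incident.needs_approval = incident.service == "db"
    return incident


def execute(incident):
    # Execute: run a safe action, or hand over to an operator task.
    if incident.needs_approval:
        incident.log.append("operator task created")
        return False
    incident.log.append("safe action run")
    return True


def verify(incident, acted, health_check):
    # Verify: close only when an action ran and the health check passes.
    if acted and health_check(incident.service):
        incident.log.append("post-check passed, closed")
    else:
        incident.log.append("left open pending review")
    return incident


def run_pipeline(health_check):
    results = []
    for incident in correlate(collect()):
        incident = classify(incident)
        acted = execute(incident)
        results.append(verify(incident, acted, health_check))
    return results


for incident in run_pipeline(lambda service: True):
    print(incident.service, incident.log)
```

Passing the health check in as a callable keeps the Verify stage independent of any particular monitoring backend.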
Monitoring Alerts
Active queue
| Time | Alert | Severity | Source | Owner | Status |
|---|---|---|---|---|---|
| 03:42 | Mail queue growth above baseline | Critical | Zabbix | Automation | Resolved |
| 03:35 | HTTP latency drift on EU-WEB pool | Warning | Grafana | NOC | Investigating |
| 03:19 | Disk pressure forecast on shared hosting node | Warning | Prometheus | Automation | Queued |
| 02:58 | Database connection saturation normalized | Recovered | Zabbix | System | Closed |
Remediation History
Last actions
SMTP queue drained · US-MAIL-021 · completed in 6m 14s · post-check passed
Disk cleanup awaiting verification · TR-CPANEL-042 · safe cleanup threshold matched
PHP-FPM pool recycled · EU-WEB-118 · latency returned to normal range
Operator approval required · TR-DB-014 · database action blocked by policy
Grafana Rules · Zabbix Actions · Prometheus Alerts
Configuration view
Grafana: Latency Drift
Flags web pools when p95 response time remains outside normal operating range.
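As a rough illustration of the condition this rule expresses (pool-average p95 latency above baseline plus 2.5σ, held for 8 minutes), the sketch below evaluates it over per-minute samples. The function names, the one-sample-per-minute assumption and the σ value of 40 ms are all hypothetical; the 210 ms baseline is borrowed from the EU-WEB-118 row of the anomaly table.

```python
from statistics import mean


def pool_average(per_host_p95):
    """Average the p95 latency reported by each host in a pool for one minute."""
    return mean(per_host_p95)


def latency_drift(samples, baseline, sigma, k=2.5, hold_minutes=8):
    """Return True when the last `hold_minutes` samples all exceed baseline + k * sigma.

    `samples` holds one pool-average p95 value per minute, oldest first.
    """
    if len(samples) < hold_minutes:
        return False  # not enough history to satisfy the hold period
    threshold = baseline + k * sigma
    return all(value > threshold for value in samples[-hold_minutes:])


# Hypothetical pool of two hosts reporting for eight consecutive minutes.
minutes = [[500, 540]] * 8
samples = [pool_average(m) for m in minutes]

# baseline 210 ms, assumed sigma 40 ms -> threshold 310 ms
print(latency_drift(samples, baseline=210, sigma=40))
```

Requiring every sample in the hold window to breach the threshold means a single minute back under it resets the condition, which is one common reading of a `FOR` clause.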
WHEN avg(http_latency_p95)
IS ABOVE service_baseline + 2.5σ
FOR 8 minutes
ROUTE TO web-ops
Zabbix: Service Recovery
Handles confirmed service failures with escalation, notification and recovery confirmation.
IF trigger = service.unavailable
AND group = production
THEN run safe_restart
ELSE escalate to NOC
Prometheus: Capacity Pressure
Tracks disk, inode, memory and queue behavior before the node reaches hard limits.
alert: CapacityPressure
expr: node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 < 12
for: 10m
labels: severity="warning"
Service Impact Watchlist
Prioritized nodes
| Server / Service Group | Role | Checks | Risk | Next Check |
|---|---|---|---|---|
| TR-WEB-034 / shared-web | cPanel web pool | 842 | Medium | 4 min |
| US-MAIL-021 / mail-relay | SMTP queue node | 318 | Low | 2 min |
| EU-VPS-076 / managed-vps | Virtualization host | 146 | Medium | 6 min |
| TR-DNS-008 / dns-edge | Authoritative DNS | 529 | Low | 1 min |
| TR-BKP-012 / backup-agent | Backup verification | 204 | Info | 9 min |
Operations Notes
Internal
note: no customer-wide incident open
policy: db actions require approval
runbook: mail queue remediation v3.4
backup: daily verification completed
status: core infrastructure stable
