Background

Purpose

Root Cause Analysis

Improvement Plan

| Item | AS-IS | TO-BE |
| --- | --- | --- |
| Monitoring method | Datadog monitor on CloudWatch metrics collected (crawled) through the AWS Integration → Webhook (Teams, AlertNow, etc.) → alert sent | AWS EventBridge Rule → AWS SNS Topic → Amazon Q (Teams) and AlertNow subscriptions → alert sent |
| AWS metric or event pattern | `avg(last_5m):avg:aws.rds.engine_uptime{aws_account:514738259429} by {aws_account,availability-zone,dbinstanceidentifier} <= 60` | AWS EventBridge Rule |
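The AS-IS monitor fires when the engine uptime is 60 s or less, i.e. the DB engine has just restarted; the TO-BE rule matches the RDS event itself instead of polling a metric. A minimal sketch, assuming the rule should fire on RDS instance events in the `availability` category (which covers notifications such as instance restarts): the pattern contents and the `matches` helper are illustrative assumptions, and the exact categories or event IDs should be confirmed against the RDS event list. `matches` implements only a small subset of EventBridge matching (nested keys, allowed-value lists, array intersection) so the pattern can be sanity-checked offline.

```python
import json

# Hypothetical pattern standing in for the AS-IS check "engine_uptime <= 60 s".
# Assumption: RDS instance events in the "availability" category are the ones
# to alert on; confirm categories/EventIDs before deploying.
RDS_AVAILABILITY_PATTERN = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event"],
    "detail": {"EventCategories": ["availability"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Offline check using a small subset of EventBridge matching rules:
    every pattern key must be present, nested dicts are matched recursively,
    and a leaf matches if any event value appears in the allowed list."""
    for key, allowed in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(allowed, dict):
            if not isinstance(value, dict) or not matches(allowed, value):
                return False
        else:
            # Array-valued event fields match if any element is allowed.
            candidates = value if isinstance(value, list) else [value]
            if not any(c in allowed for c in candidates):
                return False
    return True


if __name__ == "__main__":
    # JSON string for `aws events put-rule --event-pattern ...`
    # or boto3 `events.put_rule(Name=..., EventPattern=...)`.
    print(json.dumps(RDS_AVAILABILITY_PATTERN))
```

The rule's target would then be the SNS topic that Amazon Q (Teams) and AlertNow subscribe to, for example via boto3 `events.put_targets(Rule=..., Targets=[{"Id": ..., "Arn": topic_arn}])`; the rule and topic names are left open here.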

Additional Work Targets

| Item | AS-IS | TO-BE |
| --- | --- | --- |
| Monitoring method | Datadog monitor on CloudWatch metrics collected (crawled) through the AWS Integration → Webhook (Teams, AlertNow, etc.) → alert sent | AWS CloudWatch Alarm → AWS SNS Topic → Amazon Q (Teams) and AlertNow subscriptions → alert sent |
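In the CloudWatch Alarm path, each monitored resource gets an alarm whose action publishes to the shared SNS topic. A minimal sketch, assuming boto3 is used: `build_alarm` is a hypothetical helper that only assembles keyword arguments for `cloudwatch.put_metric_alarm(**kwargs)`. The topic ARN, thresholds, and the default 60 s × 5 evaluation window (chosen to roughly mirror the AS-IS `last_5m` monitor) are placeholders, not agreed values.

```python
def build_alarm(
    name: str,
    namespace: str,
    metric_name: str,
    dimensions: dict,
    threshold: float,
    sns_topic_arn: str,
    comparison: str = "GreaterThanOrEqualToThreshold",
    statistic: str = "Average",
    period_s: int = 60,
    evaluation_periods: int = 5,
) -> dict:
    """Return kwargs for boto3 `cloudwatch.put_metric_alarm(**kwargs)`.
    Hypothetical helper; all numeric defaults are placeholders."""
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": statistic,
        "Period": period_s,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        # Publish to the shared SNS topic on ALARM and again on recovery (OK).
        "AlarmActions": [sns_topic_arn],
        "OKActions": [sns_topic_arn],
        # Missing data keeps the alarm in its current state-evaluation rules.
        "TreatMissingData": "missing",
    }


if __name__ == "__main__":
    # Hypothetical example: EC2 status check on one instance.
    kwargs = build_alarm(
        name="ec2-status-check-i-0123456789abcdef0",
        namespace="AWS/EC2",
        metric_name="StatusCheckFailed",
        dimensions={"InstanceId": "i-0123456789abcdef0"},
        threshold=1,
        sns_topic_arn="arn:aws:sns:REGION:ACCOUNT_ID:TOPIC_NAME",  # placeholder
        statistic="Maximum",
        evaluation_periods=1,
    )
    print(kwargs)
```

Putting the topic in both `AlarmActions` and `OKActions` means Teams and AlertNow also receive a recovery notice; dropping `OKActions` would limit notifications to failures only.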

| Item | Service-critical checks among the targets monitored via AWS Integration | Metric |
| --- | --- | --- |
| aws metric 1 | EC2_StatusCheckFailed | aws.ec2.status_check_failed |
| aws metric 2 | LB_UnhealthyHostCount | aws.applicationelb.un_healthy_host_count, aws.networkelb.un_healthy_host_count |
| aws metric 3 | ElastiCache CPUUtilization | aws.elasticache.cpuutilization |
| aws metric 4 | ElastiCache EngineCPUUtilization | aws.elasticache.engine_cpuutilization |
| aws metric 5 | ElastiCache DatabaseMemoryUsagePercentage | aws.elasticache.database_memory_usage_percentage |
| aws metric 6 | MemoryDB CPUUtilization | aws.memorydb.cpuutilization |
| aws metric 7 | MemoryDB EngineCPUUtilization | aws.memorydb.engine_cpuutilization |
| aws metric 8 | MemoryDB DatabaseMemoryUsagePercentage | aws.memorydb.database_memory_usage_percentage |
| aws metric 9 | MemoryDB KeyspaceMisses | aws.memorydb.KeyspaceMisses_Count |
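Moving these checks to CloudWatch Alarms means addressing each metric by its CloudWatch namespace and metric name rather than its Datadog name. A minimal sketch, assuming the (namespace, metric) pairs below are the CloudWatch sources Datadog's AWS Integration reads for each row; the mapping and the `to_cloudwatch` helper are illustrative, and each pair should be verified in the CloudWatch console before alarms are created.

```python
# Assumed mapping: Datadog metric name -> (CloudWatch namespace, metric name).
DATADOG_TO_CLOUDWATCH = {
    "aws.ec2.status_check_failed": ("AWS/EC2", "StatusCheckFailed"),
    "aws.applicationelb.un_healthy_host_count": ("AWS/ApplicationELB", "UnHealthyHostCount"),
    "aws.networkelb.un_healthy_host_count": ("AWS/NetworkELB", "UnHealthyHostCount"),
    "aws.elasticache.cpuutilization": ("AWS/ElastiCache", "CPUUtilization"),
    "aws.elasticache.engine_cpuutilization": ("AWS/ElastiCache", "EngineCPUUtilization"),
    "aws.elasticache.database_memory_usage_percentage": ("AWS/ElastiCache", "DatabaseMemoryUsagePercentage"),
    "aws.memorydb.cpuutilization": ("AWS/MemoryDB", "CPUUtilization"),
    "aws.memorydb.engine_cpuutilization": ("AWS/MemoryDB", "EngineCPUUtilization"),
    "aws.memorydb.database_memory_usage_percentage": ("AWS/MemoryDB", "DatabaseMemoryUsagePercentage"),
    "aws.memorydb.KeyspaceMisses_Count": ("AWS/MemoryDB", "KeyspaceMisses"),
}


def to_cloudwatch(datadog_metrics):
    """Split Datadog metric names into (name, namespace, metric) triples with
    a known CloudWatch equivalent, plus names that need manual review."""
    found, unknown = [], []
    for name in datadog_metrics:
        if name in DATADOG_TO_CLOUDWATCH:
            namespace, metric = DATADOG_TO_CLOUDWATCH[name]
            found.append((name, namespace, metric))
        else:
            unknown.append(name)
    return found, unknown


if __name__ == "__main__":
    found, unknown = to_cloudwatch(list(DATADOG_TO_CLOUDWATCH))
    for name, namespace, metric in found:
        print(f"{name:50} -> {namespace}/{metric}")
```

Returning unmapped names separately, instead of raising, lets the full Datadog monitor list be run through once and the leftovers reviewed by hand.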