Anti-Nuke Trigger¶

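This flow watches Discord Gateway events and, when a destructive pattern exceeds its threshold, creates an emergency snapshot, notifies the owner, and (on Enterprise) carries out an automatic response. A Temporal workflow runs long-term per guild and aggregates events in a rolling window.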

Scenario¶

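The bot process receives events such as GUILD_ROLE_DELETE and CHANNEL_DELETE from the Discord Gateway and forwards each one as a signal to the guild's AntiNuke Temporal workflow. The workflow accumulates events in a rolling window and checks the threshold. When the threshold is exceeded it runs respondToDetection: query the Discord audit log to estimate the attacker, create an emergency snapshot, DM the owner, post to the audit channel, and on Enterprise automatically strip the attacker's roles.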

Actors¶

Attacker: abuses guild permissions to destroy the guild
Discord Gateway: delivers events
umbra-bot: receives Gateway events and forwards them as signals
AntiNuke Temporal Workflow: long-running per-guild detector
AntiNuke Activities: audit log fetch, snapshot creation, notification, role revocation
Snapshot sub-context: creates the emergency snapshot
Notification (urgent): owner DM (ignores preferences)
Restore: can trigger automatic recovery (Enterprise opt-in only)

Preconditions¶

The guild's active License includes the ANTINUKE_DETECT feature (Pro+)
The guild's AntiNuke workflow is running (started at bot install)
guild_configs.antinuke_enabled = true
The bot holds Discord's VIEW_AUDIT_LOG permission (used to estimate the attacker)

Postconditions¶

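An incident row in recovery.antinuke_incidents
In events.outbox: AntiNukeTriggered (always) and AntiNukeActioned (when the Enterprise automatic response ran)
An urgent DM to the owner (ignores Notification preferences)
A notification in the audit channel (if configured)
(Optional) an emergency snapshot in recovery.snapshots (the flow continues even if it fails)
(Enterprise opt-in) the attacker's roles revoked
(Enterprise opt-in + optional) the Restore workflow triggered automatically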

Detection patterns¶

Pattern	Default threshold	Window
`mass_role_delete`	5 role deletions	5 min
`mass_channel_delete`	3 channel deletions	5 min
`mass_kick`	10 member removals	5 min
`permission_escalation`	1 (Administrator granted)	immediate

Enterprise guilds can customize these per guild (guild_configs.antinuke_thresholds JSONB).
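As a rough illustration of how per-guild overrides could be layered on top of the defaults in the table, here is a minimal sketch. The JSON shape (`count`, `window_seconds`) and the names `Threshold`, `DefaultThresholds`, and `MergeThresholds` are assumptions made for this example; the source does not specify the layout of `antinuke_thresholds`.

```go
package main

import (
	"encoding/json"
	"time"
)

// Threshold pairs an event count with the window it is counted over.
type Threshold struct {
	Count  int
	Window time.Duration // zero means "trigger on the first event"
}

// override is the ASSUMED JSON shape of one entry in antinuke_thresholds.
// Pointers distinguish "field absent" from "field set to zero".
type override struct {
	Count         *int `json:"count"`
	WindowSeconds *int `json:"window_seconds"`
}

// DefaultThresholds returns the defaults from the detection table.
func DefaultThresholds() map[string]Threshold {
	return map[string]Threshold{
		"mass_role_delete":      {Count: 5, Window: 5 * time.Minute},
		"mass_channel_delete":   {Count: 3, Window: 5 * time.Minute},
		"mass_kick":             {Count: 10, Window: 5 * time.Minute},
		"permission_escalation": {Count: 1, Window: 0},
	}
}

// MergeThresholds applies a guild's JSONB overrides on top of the defaults.
// Unknown patterns and non-positive counts are ignored rather than rejected.
func MergeThresholds(raw []byte) (map[string]Threshold, error) {
	merged := DefaultThresholds()
	if len(raw) == 0 {
		return merged, nil
	}
	var overrides map[string]override
	if err := json.Unmarshal(raw, &overrides); err != nil {
		return nil, err
	}
	for pattern, o := range overrides {
		t, known := merged[pattern]
		if !known {
			continue
		}
		if o.Count != nil && *o.Count > 0 {
			t.Count = *o.Count
		}
		if o.WindowSeconds != nil && *o.WindowSeconds >= 0 {
			t.Window = time.Duration(*o.WindowSeconds) * time.Second
		}
		merged[pattern] = t
	}
	return merged, nil
}
```

Calling `MergeThresholds` with `{"mass_kick":{"count":20}}` would raise only the kick threshold and leave its 5-minute window and every other pattern at the defaults.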

Sequence — Detection & Response¶

sequenceDiagram
    participant Attacker
    participant Discord
    participant Bot as umbra-bot
    participant WF as AntiNuke<br/>Workflow
    participant AuditAct as Activity:<br/>IdentifySuspects
    participant SnapAct as Activity:<br/>CreateAntiNukeSnapshot
    participant NotifyAct as Activity:<br/>PublishAntiNukeTriggered
    participant ActionAct as Activity:<br/>RevokeMemberRoles
    participant Outbox
    participant Poller
    participant Notification

    loop Attack in progress
        Attacker->>Discord: Delete role
        Discord->>Bot: Gateway: GUILD_ROLE_DELETE
        Bot->>WF: Signal "event"<br/>{pattern: mass_role_delete, timestamp, aggregate_id}
        WF->>WF: window.Push(ev)<br/>window.EvictOld()
    end

    Note over WF: window.Len() >= threshold
    WF->>AuditAct: IdentifySuspects(guildID, pattern, window.Range())
    AuditAct->>Discord: GET /guilds/{id}/audit-logs<br/>(filter by action_type)
    Discord-->>AuditAct: audit log entries
    AuditAct-->>WF: suspects [{user_id, confidence, actions}]

    WF->>SnapAct: CreateAntiNukeSnapshot(guildID)
    SnapAct->>SnapAct: snapshotSvc.Create(trigger=antinuke)
    alt Success
        SnapAct-->>WF: snapshot_id
    else Failure
        SnapAct-->>WF: error (continue anyway)
    end

    WF->>WF: INSERT antinuke_incidents<br/>(pattern, window, suspects, snapshot_id)
    WF->>Outbox: INSERT AntiNukeTriggered event

    Note over Poller: ~2s
    Poller->>Notification: OnAntiNukeTriggered<br/>(urgent, ignore preference)
    Notification->>Notification: DM to Owner<br/>(dashboard link, details)
    Notification->>Notification: Audit channel message (if configured)

    alt Enterprise + auto_action enabled + suspects confident
        WF->>ActionAct: RevokeMemberRoles(guildID, suspect.user_id)
        ActionAct->>Discord: PATCH /guilds/{id}/members/{user_id}<br/>{roles: []}
        Discord-->>ActionAct: 200
        ActionAct-->>WF: success

        WF->>WF: UPDATE incident actions_taken
        WF->>Outbox: INSERT AntiNukeActioned event
        Poller->>Notification: DM "Automatic response executed: {actions}<br/>Cancel button (TTL 10m)"
    end

    WF->>WF: window.Clear() (prevents back-to-back triggers for the same pattern)
    WF->>WF: Wait for next signal / timer

Step-by-step¶

1. Bot forwards the event as a signal¶

The Gateway handler in umbra-bot:

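// apps/bot/internal/handler/gateway.go
func (h *Handler) OnGuildRoleDelete(ev *events.GuildRoleDelete) {
    // Gateway callbacks carry no context, so create a bounded one here.
    // The 5-second timeout is an assumed value, not taken from the design.
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    guildID := h.resolveGuildID(ev.GuildID)
    if guildID == uuid.Nil {
        return // guild is not known to Umbra
    }

    // AntiNuke feature check (cached)
    if ok, _ := h.licensing.Can(ctx, guildID, FeatureAntiNukeDetect); !ok {
        return
    }

    // Send the Temporal signal. The empty run ID targets the current run.
    err := h.temporal.SignalWorkflow(ctx, antinukeWorkflowID(guildID), "", "event", AntiNukeEvent{
        Pattern:            MassRoleDelete,
        Timestamp:          time.Now(),
        DiscordAggregateID: ev.RoleID,
    })
    if err != nil {
        logger.Error("antinuke signal failed", err)
    }
}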

Similar handlers: OnChannelDelete → MassChannelDelete, OnGuildMemberRemove → MassKick, OnGuildRoleUpdate/OnGuildMemberUpdate → PermissionEscalation.
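For the PermissionEscalation handlers, the core check is whether the Administrator bit newly appeared. The following is a minimal sketch under two assumptions: permissions arrive as the decimal strings that Discord serializes them as, and the previous value comes from the bot's own cache (the Gateway update event carries only the new role). `GainedAdministrator` and `ParsePermissions` are hypothetical helper names.

```go
package main

import "strconv"

// PermissionAdministrator is Discord's ADMINISTRATOR permission bit (1 << 3).
const PermissionAdministrator uint64 = 1 << 3

// ParsePermissions converts Discord's string-encoded permission bitfield
// into an integer that can be tested with bitwise operators.
func ParsePermissions(s string) (uint64, error) {
	return strconv.ParseUint(s, 10, 64)
}

// GainedAdministrator reports whether ADMINISTRATOR is set in newPerms
// but was not set in oldPerms. Keeping or losing the bit returns false.
func GainedAdministrator(oldPerms, newPerms uint64) bool {
	hadIt := oldPerms&PermissionAdministrator != 0
	hasIt := newPerms&PermissionAdministrator != 0
	return !hadIt && hasIt
}
```

A handler would signal PermissionEscalation only when `GainedAdministrator` returns true, so routine role edits that never touch the Administrator bit do not reach the workflow.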

2. Workflow receives the signal¶

AntiNukeWorkflow (details in domain/recovery/antinuke.md):

func AntiNukeWorkflow(ctx workflow.Context, guildID uuid.UUID) error {
    windows := newRollingWindows(defaultThresholds(guildID))
    antinukeEnabled := true

    for {
        selector := workflow.NewSelector(ctx)

        selector.AddReceive(workflow.GetSignalChannel(ctx, "event"), func(c workflow.ReceiveChannel, more bool) {
            var ev EventSignal // decoded from the payload the bot handler sends
            c.Receive(ctx, &ev)
            if !antinukeEnabled {
                return
            }

            window := windows[ev.Pattern]
            window.Push(ev)
            window.EvictOld(workflow.Now(ctx))

            if window.Len() >= window.Threshold() {
                respondToDetection(ctx, guildID, ev.Pattern, window)
                window.Clear() // prevent back-to-back triggers
            }
        })

        // Handle config updates, the shutdown signal, and the periodic evict timer.
        // The timer gets its own cancellable context so that an unfired timer is
        // cancelled after Select instead of piling up on every loop iteration.
        selector.AddReceive(workflow.GetSignalChannel(ctx, "config_update"), ...)
        selector.AddReceive(workflow.GetSignalChannel(ctx, "shutdown"), ...)
        timerCtx, cancelTimer := workflow.WithCancel(ctx)
        selector.AddFuture(workflow.NewTimer(timerCtx, time.Minute), evictAllWindows)

        selector.Select(ctx)
        cancelTimer()
    }
}
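The workflow relies on a rolling window with `Push`, `EvictOld`, `Len`, `Threshold`, and `Clear`. A minimal sketch of such a window is shown here. The `Event` and `Window` types, the `Triggered` helper, and the rule that a non-positive span means "immediate, never age out" are assumptions for illustration, not the actual implementation in domain/recovery/antinuke.md.

```go
package main

import "time"

// DefaultSpan mirrors the 5-minute window from the detection table.
const DefaultSpan = 5 * time.Minute

// Event is a hypothetical, trimmed-down version of the signal payload.
type Event struct {
	Pattern   string
	Timestamp time.Time
}

// Window counts events of one pattern inside a sliding time span.
type Window struct {
	threshold int
	span      time.Duration // <= 0 means "immediate": events are never aged out
	events    []Event
}

func NewWindow(threshold int, span time.Duration) *Window {
	return &Window{threshold: threshold, span: span}
}

// Push records one event. Ordering is not assumed, since signals may
// arrive slightly out of order.
func (w *Window) Push(ev Event) { w.events = append(w.events, ev) }

// EvictOld drops every event whose timestamp is older than now-span.
// In a workflow, now should come from workflow.Now for determinism.
func (w *Window) EvictOld(now time.Time) {
	if w.span <= 0 {
		return
	}
	cutoff := now.Add(-w.span)
	kept := w.events[:0]
	for _, ev := range w.events {
		if !ev.Timestamp.Before(cutoff) {
			kept = append(kept, ev)
		}
	}
	w.events = kept
}

func (w *Window) Len() int        { return len(w.events) }
func (w *Window) Threshold() int  { return w.threshold }
func (w *Window) Triggered() bool { return len(w.events) >= w.threshold }

// Clear empties the window after a detection so the same burst does not
// trigger twice.
func (w *Window) Clear() { w.events = w.events[:0] }
```

The special case for a non-positive span matters for `permission_escalation`: the bot stamps events with its own clock, which is typically slightly behind the workflow clock, so a zero-length window with ordinary eviction would discard the event before the threshold check.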

3. `respondToDetection` (response after detection)¶

func respondToDetection(ctx, guildID, pattern, window) {
    // 1. Estimate the attacker (the activity returns an empty list if the audit log is unavailable)
    var suspects []Suspect
    _ = workflow.ExecuteActivity(ctx, IdentifySuspects,
        guildID, pattern, window.Range(),
    ).Get(ctx, &suspects)

    // 2. Emergency snapshot (best-effort)
    var snapshotID uuid.UUID
    _ = workflow.ExecuteActivity(ctx, CreateAntiNukeSnapshot, guildID).Get(ctx, &snapshotID)

    // 3. Record the incident. uuid.NewV7 is non-deterministic, so the ID is
    //    generated inside a SideEffect to keep workflow replay stable.
    var incidentID uuid.UUID
    _ = workflow.SideEffect(ctx, func(workflow.Context) interface{} {
        return uuid.Must(uuid.NewV7())
    }).Get(&incidentID)
    workflow.ExecuteActivity(ctx, RecordIncident, IncidentInput{
        ID: incidentID,
        GuildID: guildID,
        Pattern: pattern,
        WindowStart: window.Start(),
        WindowEnd: workflow.Now(ctx),
        EventsCount: window.Len(),
        Threshold: window.Threshold(),
        Suspects: suspects,
        SnapshotID: snapshotID,
    }).Get(ctx, nil)

    // 4. AntiNukeTriggered event (urgent alert via Notification)
    workflow.ExecuteActivity(ctx, PublishAntiNukeTriggered, incidentID).Get(ctx, nil)

    // 5. Enterprise auto-action
    // NOTE: getPlan and getGuildConfig read external state. Inside a workflow they
    // need to run as activities (or arrive via signal) to stay deterministic.
    plan, _ := getPlan(guildID)
    config, _ := getGuildConfig(guildID)
    if plan == Enterprise && config.AntiNukeAutoAction {
        actionsTaken := []ActionLog{}
        for _, s := range suspects {
            if s.Confidence > 0.8 && !s.IsOwner {
                err := workflow.ExecuteActivity(ctx, RevokeMemberRoles, guildID, s.DiscordUserID).Get(ctx, nil)
                actionsTaken = append(actionsTaken, ActionLog{
                    Type: "roles_revoked",
                    TargetUserID: s.DiscordUserID,
                    Success: err == nil,
                })
            }
        }

        workflow.ExecuteActivity(ctx, UpdateIncidentActions, incidentID, actionsTaken).Get(ctx, nil)
        workflow.ExecuteActivity(ctx, PublishAntiNukeActioned, incidentID, actionsTaken).Get(ctx, nil)
    }
}
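The auto-action gate in step 5 can be isolated as a pure function, which makes the rule easy to unit test. This is a sketch only: the `Plan` values and the `AutoActionTargets` name are hypothetical, while the conditions (Enterprise plan, opt-in flag, confidence strictly above 0.8, never the owner) are the ones in the code above.

```go
package main

// AutoActionMinConfidence is the cut-off used in respondToDetection.
// The comparison is strict: a suspect at exactly 0.8 is not acted on.
const AutoActionMinConfidence = 0.8

// Plan values are illustrative placeholders.
type Plan string

const (
	PlanPro        Plan = "pro"
	PlanEnterprise Plan = "enterprise"
)

type Suspect struct {
	DiscordUserID string
	Confidence    float64
	IsOwner       bool
}

// AutoActionTargets returns the suspects whose roles may be revoked
// automatically, or nil when auto-action does not apply at all.
func AutoActionTargets(plan Plan, autoActionEnabled bool, suspects []Suspect) []Suspect {
	if plan != PlanEnterprise || !autoActionEnabled {
		return nil
	}
	var targets []Suspect
	for _, s := range suspects {
		if s.Confidence > AutoActionMinConfidence && !s.IsOwner {
			targets = append(targets, s)
		}
	}
	return targets
}
```

Because confidence is `count / threshold` capped at 1.0, a suspect responsible for 4 of 5 role deletions sits at exactly 0.8 and is skipped under this rule; only a suspect who accounts for the full threshold is acted on at the default settings.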

4. Suspect identification (Activity)¶

Estimate the attacker by querying the Discord audit log:

func IdentifySuspectsActivity(ctx, guildID, pattern, window TimeRange) ([]Suspect, error) {
    actionType := patternToAuditAction(pattern)
    entries, err := discord.FetchAuditLog(ctx, guildID, actionType, window.End.Add(time.Minute))
    if err != nil {
        // The incident proceeds even when the audit log fails: return no suspects.
        logger.Error("antinuke audit log fetch failed", err)
        return nil, nil
    }

    // Filter by window
    filtered := filterByTimeRange(entries, window)

    // Aggregate by executor
    counts := map[string]int{}
    for _, e := range filtered {
        counts[e.UserID]++
    }

    // Confidence = min(1.0, count / threshold)
    threshold := getThreshold(guildID, pattern)
    suspects := []Suspect{}
    for userID, count := range counts {
        confidence := math.Min(1.0, float64(count)/float64(threshold))
        // If the owner check fails, isOwner stays false here; the revoke
        // activity performs its own owner check before acting.
        isOwner, _ := checkIsOwner(ctx, guildID, userID)
        suspects = append(suspects, Suspect{
            DiscordUserID: userID,
            Confidence: confidence,
            ActionCount: count,
            IsOwner: isOwner,
        })
    }

    // Top 3 (byConfidence must order highest confidence first)
    sort.Sort(byConfidence(suspects))
    if len(suspects) > 3 { suspects = suspects[:3] }
    return suspects, nil
}
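Two helpers that the activity leaves undefined can be sketched as follows. The numeric values are Discord's documented audit log event types; everything else (the function names, returning a slice of action types, and breaking ties by action count and then user ID) is an assumption for illustration. Since the audit log endpoint filters on a single `action_type` per request, a pattern that maps to two types would need two requests.

```go
package main

import "sort"

// Discord audit log event types relevant to the detection patterns.
const (
	AuditChannelDelete    = 12
	AuditMemberKick       = 20
	AuditMemberRoleUpdate = 25
	AuditRoleUpdate       = 31
	AuditRoleDelete       = 32
)

// PatternToAuditActions maps a detection pattern to the audit log event
// types worth querying. Unknown patterns return nil.
func PatternToAuditActions(pattern string) []int {
	switch pattern {
	case "mass_role_delete":
		return []int{AuditRoleDelete}
	case "mass_channel_delete":
		return []int{AuditChannelDelete}
	case "mass_kick":
		return []int{AuditMemberKick}
	case "permission_escalation":
		return []int{AuditRoleUpdate, AuditMemberRoleUpdate}
	default:
		return nil
	}
}

type Suspect struct {
	DiscordUserID string
	ActionCount   int
	Confidence    float64
}

// RankSuspects turns per-executor counts into at most limit suspects,
// most active first. Sorting uses the raw count because confidence
// saturates at 1.0 and would otherwise hide who did the most damage.
func RankSuspects(counts map[string]int, threshold, limit int) []Suspect {
	if threshold <= 0 {
		threshold = 1
	}
	suspects := make([]Suspect, 0, len(counts))
	for id, n := range counts {
		c := float64(n) / float64(threshold)
		if c > 1 {
			c = 1
		}
		suspects = append(suspects, Suspect{DiscordUserID: id, ActionCount: n, Confidence: c})
	}
	sort.Slice(suspects, func(i, j int) bool {
		if suspects[i].ActionCount != suspects[j].ActionCount {
			return suspects[i].ActionCount > suspects[j].ActionCount
		}
		return suspects[i].DiscordUserID < suspects[j].DiscordUserID
	})
	if len(suspects) > limit {
		suspects = suspects[:limit]
	}
	return suspects
}
```

Note that GUILD_MEMBER_REMOVE also fires for bans and voluntary leaves, while the MEMBER_KICK audit entry covers kicks only, so a `mass_kick` window can contain more events than the audit log attributes to any executor.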

5. Emergency snapshot creation¶

func CreateAntiNukeSnapshotActivity(ctx, guildID) (uuid.UUID, error) {
    snap, err := snapshotSvc.Create(ctx, guildID, AntiNuke, CreateOptions{})
    if err != nil {
        logger.Error("antinuke snapshot failed", err)
        return uuid.Nil, nil  // failure is tolerated (the alert still goes out)
    }
    return snap.ID, nil
}

Important: the Activity never returns an error. If it became a Temporal retry target, the user notification would be delayed. Record the failure as a metric and let the flow continue.

6. Owner DM (urgent)¶

Special handling in the Notification consumer:

func (c *Consumer) OnAntiNukeTriggered(ctx, event) error {
    // ⚠️ Ignore notification preferences (urgent)
    return c.svc.EnqueueUrgent(NotificationRequest{
        RecipientType: "user_dm",
        RecipientID:   event.OwnerDiscordUserID,
        TemplateKey:   "antinuke.triggered",
        Payload: map[string]any{
            "pattern":     event.Pattern,
            "threshold":   event.Threshold,
            "events":      event.EventsCount,
            "suspects":    event.Suspects,
            "dashboard":   dashboardIncidentURL(event.IncidentID),
            "snapshot_id": event.SnapshotID,
        },
    })
}

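DM template (ko locale, shown in translation):

🚨 [Umbra urgent] Anomaly detected

Guild: My Cool Guild
Pattern: mass role deletion
7 roles deleted in the last 5 minutes (threshold: 5)

Suspected users:
1. @baduser (confidence 95%)

Actions:
- Emergency snapshot created ({snapshot_id})
- See details on the dashboard: {dashboard_url}

Enterprise customers: check below whether an automatic response was executed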
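One way to render a message like the one above is Go's standard `text/template`. The sketch below is illustrative only: the `TriggeredDM` fields, the English wording, and the fallback lines for missing suspects or a failed snapshot are assumptions, and the real template engine behind the `antinuke.triggered` key is not described in the source.

```go
package main

import (
	"strings"
	"text/template"
)

type SuspectLine struct {
	Mention       string
	ConfidencePct int
}

// TriggeredDM is a hypothetical view model for the urgent owner DM.
type TriggeredDM struct {
	GuildName     string
	PatternLabel  string
	WindowMinutes int
	Events        int
	Threshold     int
	Suspects      []SuspectLine
	SnapshotID    string // empty when the emergency snapshot failed
	DashboardURL  string
}

const triggeredText = `🚨 [Umbra urgent] Anomaly detected

Guild: {{.GuildName}}
Pattern: {{.PatternLabel}}
{{.Events}} events in the last {{.WindowMinutes}} minutes (threshold: {{.Threshold}})

{{if .Suspects}}Suspected users:
{{range $i, $s := .Suspects}}{{inc $i}}. {{$s.Mention}} (confidence {{$s.ConfidencePct}}%)
{{end}}{{else}}Suspect information unavailable
{{end}}
Actions:
- {{if .SnapshotID}}Emergency snapshot created ({{.SnapshotID}}){{else}}Emergency snapshot failed, check existing scheduled snapshots{{end}}
- Details on the dashboard: {{.DashboardURL}}
`

// inc turns the zero-based range index into a one-based list number.
var triggeredTmpl = template.Must(template.New("antinuke.triggered").
	Funcs(template.FuncMap{"inc": func(i int) int { return i + 1 }}).
	Parse(triggeredText))

// RenderTriggeredDM fills the template and returns the message body.
func RenderTriggeredDM(d TriggeredDM) (string, error) {
	var b strings.Builder
	if err := triggeredTmpl.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

Handling the empty-suspects and failed-snapshot branches in the template keeps the two failure cases described later (audit log fetch failure, snapshot failure) from producing a DM with blank placeholders.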

7. Enterprise automatic response¶

RevokeMemberRoles Activity:

func RevokeMemberRolesActivity(ctx, guildID, discordUserID) error {
    // Safety guard: never act on the owner. Fail closed if the check itself errors.
    isOwner, err := checkIsOwner(ctx, guildID, discordUserID)
    if err != nil {
        return fmt.Errorf("owner check failed: %w", err)
    }
    if isOwner {
        return errors.New("cannot revoke owner roles")
    }

    return discord.UpdateGuildMember(ctx, guildID, discordUserID, MemberUpdate{
        Roles: []string{},  // remove every role
    })
}

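8. Auto-action user notification¶

AntiNukeActioned event → Notification:

An automatic response was executed:
- 3 roles revoked from @baduser

You can cancel within 10 minutes: [Cancel button]

Clicking the cancel button restores the roles through a separate API (to be detailed in Phase 2).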
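The cancel flow is deferred to Phase 2, so the following is only a speculative sketch of the two pieces it would likely need: a TTL check for the 10-minute window and a record of the roles that were removed. `RevokedAction`, `CanUndo`, and `RolesToRestore` are hypothetical names, and nothing in the source says the previous role IDs are stored today.

```go
package main

import "time"

// UndoTTL is the 10-minute cancel window mentioned in the DM.
const UndoTTL = 10 * time.Minute

// RevokedAction is a hypothetical record written when roles are revoked.
// Restoring roles is only possible if the previous role IDs were captured
// before the member update was sent.
type RevokedAction struct {
	TargetUserID    string
	PreviousRoleIDs []string
	ActionedAt      time.Time
}

// CanUndo reports whether now still falls inside the cancel window.
// The boundary is inclusive, which is an arbitrary choice for this sketch.
func (a RevokedAction) CanUndo(now time.Time) bool {
	elapsed := now.Sub(a.ActionedAt)
	return elapsed >= 0 && elapsed <= UndoTTL
}

// RolesToRestore returns a copy of the role IDs to put back, or nil once
// the window has expired.
func (a RevokedAction) RolesToRestore(now time.Time) []string {
	if !a.CanUndo(now) {
		return nil
	}
	return append([]string(nil), a.PreviousRoleIDs...)
}
```

The design point worth carrying into Phase 2 is the second field: an `ActionLog` that records only "roles_revoked" and the target user is not enough to undo the action.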

9. Incident record finalized¶

The final row in recovery.antinuke_incidents:

INSERT INTO recovery.antinuke_incidents (
    id, guild_id, pattern,
    detected_at, window_start, window_end,
    events_count, threshold,
    suspects, actions_taken,
    snapshot_id
) VALUES (...);

Users can mark an incident false_positive = true in the dashboard (evidence for threshold tuning).

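Auto-restore (additional Enterprise option)¶

When an Enterprise guild additionally opts in to "automatic recovery on detection" (not part of the MVP; under review for Phase 2+):

The AntiNuke workflow triggers the Restore workflow with a signal
Snapshot used: the snapshot taken just before detection (stable state)
The diff preview is skipped and the restore runs immediately
High risk → applies only with explicit opt-in and only to high-certainty detection patterns

The MVP stops at notification + role revocation.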

Failure cases¶

Audit log fetch failure¶

When: the bot lacks VIEW_AUDIT_LOG, or the Discord API is down
Response: proceed with an empty suspects array; the DM shows "attacker information unavailable"
User experience: the owner has to check the Discord audit log manually

Emergency snapshot creation failure¶

When: DB outage, size overflow, and similar
Response: record the incident with snapshot_id = NULL; the DM is still sent
User experience: "Snapshot creation failed. Check your existing scheduled snapshots."

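Owner DM failure (rejected by Discord)¶

When: the owner blocks DMs from the bot
Detection: Notification receives a 403 response
Response: if an audit channel is configured, fall back to that channel and include a mention
User experience: a channel notification instead of a DM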
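The DM-to-channel fallback can be sketched as a small delivery function. The `Sender` struct, `ErrDMBlocked`, and `DeliverUrgent` are hypothetical names for this example; the only Discord-specific detail used is the `<@USER_ID>` mention syntax. How a 403 from Discord is mapped onto `ErrDMBlocked` is left to the real Notification client.

```go
package main

import "errors"

// ErrDMBlocked stands in for "Discord answered 403 to the DM".
var ErrDMBlocked = errors.New("recipient does not accept DMs from the bot")

// Sender abstracts the two delivery paths so the fallback logic can be
// tested without a Discord connection.
type Sender struct {
	SendDM      func(userID, body string) error
	SendChannel func(channelID, body string) error
}

// DeliverUrgent tries the owner DM first. If the DM is blocked and an
// audit channel is configured, it posts there with an owner mention.
// It returns which path succeeded: "dm" or "channel".
func DeliverUrgent(s Sender, ownerID, auditChannelID, body string) (string, error) {
	err := s.SendDM(ownerID, body)
	if err == nil {
		return "dm", nil
	}
	if !errors.Is(err, ErrDMBlocked) || auditChannelID == "" {
		return "", err // not a block, or nowhere to fall back to
	}
	mention := "<@" + ownerID + "> "
	if err := s.SendChannel(auditChannelID, mention+body); err != nil {
		return "", err
	}
	return "channel", nil
}
```

Falling back only on the blocked-DM error keeps transient failures (rate limits, network errors) on the normal retry path instead of leaking incident details into a channel unnecessarily.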

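Auto-action insufficient permissions¶

When: the bot's role is lower than the target's role (Discord's role hierarchy)
Detection: Discord 403
Response: skip that user and record the failure in actions_taken
User experience: the DM shows "automatic response partially failed due to insufficient bot permissions"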
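A pre-check against the role hierarchy can predict this 403 before calling Discord. The sketch below assumes the bot already has each role's `position` and `managed` flag cached; `Role`, `SplitRevocable`, and the decision to treat integration-managed roles as non-removable are assumptions of this example, and the described design instead reacts to the 403 after the fact.

```go
package main

// Role carries the two Discord role attributes that matter for the check.
type Role struct {
	ID       string
	Position int  // higher value = higher in the hierarchy; @everyone is 0
	Managed  bool // managed by an integration, so not removable by the bot
}

// highestPosition returns the top role position in the list (0 if empty).
func highestPosition(roles []Role) int {
	top := 0
	for _, r := range roles {
		if r.Position > top {
			top = r.Position
		}
	}
	return top
}

// SplitRevocable partitions the target's roles into those the bot can
// remove (strictly below the bot's highest role and not managed) and
// those it cannot.
func SplitRevocable(botRoles, targetRoles []Role) (revocable, blocked []Role) {
	botTop := highestPosition(botRoles)
	for _, r := range targetRoles {
		if r.Position < botTop && !r.Managed {
			revocable = append(revocable, r)
		} else {
			blocked = append(blocked, r)
		}
	}
	return revocable, blocked
}
```

If `blocked` is non-empty, the activity could record the partial result in actions_taken up front rather than discovering it from the failed request.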

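Workflow process crash¶

When: worker process failure
Detection: Temporal reassigns the workflow to another worker
Response:
In-flight Activities are re-executed (idempotent)
The rolling window is workflow state, which Temporal rebuilds by replaying the signal history when another worker picks the workflow up, so a worker crash alone should not empty it. The window is lost when the execution itself is started fresh (terminate or reset); if the attack is still in progress, new events re-fill the window and trigger again
Mitigation (Phase 2): periodically persist the rolling window as a Temporal search attribute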

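Attacker removes the bot first¶

When: the bot is kicked right after the attack starts (BOT_REMOVED)
Detection: BotKicked event
Response: the AntiNuke workflow terminates on the shutdown signal. Only detections made from events already accumulated are processed to completion; everything else is lost.
Mitigation: kicking the bot requires KICK_MEMBERS (or BAN_MEMBERS) and a role above the bot's, and removing the integration requires MANAGE_GUILD → the defense depends on limits on the attacker's permissions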

Guild deleted¶

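When: the attacker deletes the guild itself (owner account takeover scenario)
Response: fatal loss. AntiNuke can still receive event signals at this point, but the recovery target itself is gone. Recovery is possible only at the Discord level (contact Discord staff)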

Edge cases¶

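False positive (operational events)¶

An admin legitimately reorganizes the guild (bulk role cleanup and similar)
AntiNuke triggers → the owner is notified
The owner marks false_positive = true in the dashboard and adds a note
Phase 2: suggest automatic threshold adjustment after repeated false positives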

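Multiple simultaneous patterns¶

Several patterns trigger at the same time (for example mass_role_delete + mass_kick)
Each pattern has its own rolling window, so respondToDetection can run for both
The owner may receive duplicate DMs, and the dashboard records two incidents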

The owner is the attacker¶

The IsOwner check excludes the owner from Enterprise auto-action
The DM goes to the owner (that is, to the attacker), so it has no effect
Phase 2: fall back to a DM to a registered "emergency contact"

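Immediate permission escalation detection¶

With threshold=1, a single event is enough to trigger
Legitimate grants can be detected too (for example an admin granting Administrator to a deputy)
Phase 2: additional verification of the granter's identity and history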

Enterprise custom threshold change¶

The user edits the thresholds in the dashboard
guild_configs.antinuke_thresholds JSONB is updated
A config_update signal is sent to the workflow → the existing windows are updated

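Event loss across a workflow restart¶

A workflow restart (a fresh execution) resets the rolling window
Events from before the restart are not re-sent → detection may be delayed
New attacks after the restart are detected normally
An acceptable limitation (acknowledged and documented)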

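Trigger during an upgrade from Pro to Enterprise¶

The plan change takes effect immediately, but antinuke_auto_action in the config defaults to false
So right after the upgrade the response stays at the Pro level (notification only); auto-action fires only after the user opts in
A reasonable default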

Involved domains¶

Domain	Role
Recovery (AntiNuke)	Detection + response workflow (writer)
Recovery (Snapshot)	Emergency snapshot
Recovery (Restore)	(Phase 2+) automatic recovery
Guild	Reads GuildConfig (thresholds, auto_action)
Licensing	Feature check
Notification	Urgent notification (preference override)
Audit	History records