Building an AI Code Review Gate in Your CI/CD Pipeline

Automated enforcement policies for agent-authored code with risk-based approval workflows and escalation paths.

Advanced 9 min read Updated May 2026

A standard code review catches typos and logic errors. An AI code review gate must answer a different question: "Is this agent-authored code safe to ship, given that we only partially trust it?"

The answer depends on what the code does, whether it was reviewed, and which agent version produced it. A single hardcoded approval rule cannot capture that. You need a policy-driven gate that routes each change to the right level of verification.

The Three-Tier Review Model

Organizations shipping agent code typically implement three tiers of review intensity:

Tier 1: Automated Scanning (All Agent Code)

Every agent-authored PR runs through automated gates before human eyes see it:

  • Syntax & compilation: Does the code parse and compile?
  • Test execution: Do existing tests pass? Do new tests pass?
  • Security scanning (SAST): No SQL injection, path traversal, hardcoded secrets?
  • License scanning (SCA): Is the code derived from copyleft sources?
  • Code metrics: Cyclomatic complexity, nesting depth, function length
  • Dependency checks: Any new high-risk dependencies?

This catches ~60–70% of obvious agent failures. If this tier rejects the code, the PR goes no further.
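
As an illustration of the code-metrics check, the sketch below measures function length and a rough complexity score for Python sources using only the standard library. The names `function_metrics` and `violations` and both thresholds are assumptions made for this example, and the branch count is only a proxy for cyclomatic complexity, not a full implementation.

```python
import ast

# Node types counted as decision points (a rough proxy for cyclomatic complexity).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def function_metrics(source: str) -> list[dict]:
    """Return length and approximate complexity for each function in `source`."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            results.append({
                "name": node.name,
                "length": node.end_lineno - node.lineno + 1,
                "complexity": 1 + branches,
            })
    return results


def violations(source: str, max_length: int = 50, max_complexity: int = 10) -> list[str]:
    """Names of functions that exceed either (assumed) threshold."""
    return [
        m["name"]
        for m in function_metrics(source)
        if m["length"] > max_length or m["complexity"] > max_complexity
    ]
```

A gate step could run `violations` over each changed file and fail when the returned list is non-empty.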

Tier 2: Risk-Based Human Review

For code that passes Tier 1:

  • High-risk code → Mandatory human review (auth, crypto, payment, data handling)
  • Medium-risk code → Spot-check (10–20% sample review)
  • Low-risk code → Automated approval with audit trail
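
The routing above can be sketched as a small function. This is a minimal illustration, not a prescribed design: `sampled`, `review_action`, and the 15% default rate are assumptions, and the sample is derived from a hash of the PR number so that re-running the gate on the same PR gives the same decision.

```python
import hashlib


def sampled(pr_number: int, rate: float) -> bool:
    """Deterministically select roughly `rate` of PRs using a hash of the PR number."""
    digest = hashlib.sha256(str(pr_number).encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


def review_action(risk: str, pr_number: int, sample_rate: float = 0.15) -> str:
    """Map a risk level to one of the Tier 2 outcomes."""
    if risk == "high":
        return "mandatory-review"
    if risk == "medium":
        return "spot-check" if sampled(pr_number, sample_rate) else "auto-approve"
    if risk == "low":
        return "auto-approve"
    # Unknown levels fail closed rather than slipping through unreviewed.
    return "mandatory-review"
```

Failing closed on unknown risk levels is a deliberate choice here: a typo in a policy file then produces extra review rather than a silent auto-approval.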

Risk assessment is based on:

  • File path (does it touch sensitive code?)
  • Change size (is it a small patch or a large refactor?)
  • Confidence score (did the agent rate itself highly?)
  • History (has this code path had bugs before?)
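
One way to combine these four signals is a simple additive score. The sketch below is hypothetical: the `Change` fields, the sensitive directory names, the weights, and the cut-offs are all assumptions to be tuned per codebase. It matches whole path components so that a directory such as `authors/` is not mistaken for `auth/`.

```python
from dataclasses import dataclass

SENSITIVE_PARTS = ("auth", "crypto", "payment", "billing")  # assumed directory names


@dataclass
class Change:
    paths: list[str]
    lines_changed: int
    agent_confidence: float  # 0.0 to 1.0, self-reported by the agent
    past_incidents: int      # incidents previously traced to these paths


def risk_level(change: Change) -> str:
    """Combine path, size, confidence, and history into high / medium / low."""
    touches_sensitive = any(
        part in path.lower().split("/")
        for path in change.paths
        for part in SENSITIVE_PARTS
    )
    if touches_sensitive:
        return "high"  # sensitive paths always get mandatory review

    score = 0
    if change.lines_changed > 200:
        score += 2
    if change.agent_confidence < 0.8:
        score += 1
    if change.past_incidents > 0:
        score += 2
    if score >= 3:
        return "high"
    return "medium" if score >= 1 else "low"
```

Self-reported confidence carries the smallest weight in this sketch, since it is the only signal the agent controls.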

Tier 3: Production Audit

Post-deployment:

  • Track every agent-authored commit in production
  • Monitor for runtime errors in agent code paths
  • If agent code causes incidents, escalate future agent changes to that code path to mandatory human review
  • Measure: agent code incident rate vs. human code incident rate
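
The comparison in the last bullet reduces to a ratio of two rates. A minimal sketch, assuming you can count deployed commits and attributed incidents for each group; the function names and the 2.0 escalation threshold are illustrative, not part of any tool:

```python
from typing import Optional


def incident_rate(incidents: int, commits: int) -> float:
    """Incidents per deployed commit; 0.0 when nothing was deployed."""
    return incidents / commits if commits else 0.0


def relative_incident_rate(
    agent_incidents: int, agent_commits: int, human_incidents: int, human_commits: int
) -> Optional[float]:
    """Agent incident rate divided by human incident rate.

    Returns None when the human rate is zero, because the ratio is undefined.
    """
    human_rate = incident_rate(human_incidents, human_commits)
    if human_rate == 0:
        return None
    return incident_rate(agent_incidents, agent_commits) / human_rate


def needs_escalation(ratio: Optional[float], threshold: float = 2.0) -> bool:
    """Assumed rule: escalate once agent code is `threshold` times worse than human code."""
    return ratio is not None and ratio >= threshold
```

For example, 6 incidents across 200 agent commits against 3 incidents across 500 human commits gives a ratio of about 5, which would trigger escalation under the assumed threshold.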

Implementation: GitHub Actions Example

Here's an example implementation of a GitHub Actions-based review gate:

name: AI Code Review Gate

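on: [pull_request]

# The gate comments on, labels, and requests reviewers for PRs,
# so the workflow token needs write access to them
permissions:
  contents: read
  pull-requests: write
  issues: write
  statuses: write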

jobs:
  detect-agent-code:
    runs-on: ubuntu-latest
    outputs:
      has_agent_commits: ${{ steps.check.outputs.has_agent_commits }}
      agent_commits: ${{ steps.check.outputs.agent_commits }}
      risk_level: ${{ steps.risk.outputs.risk_level }}
    
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      
      - name: Check for agent commits
        id: check
        run: |
          # Heuristic: match agent names in the author or subject of each PR commit.
          # grep exits 1 when nothing matches, so "|| true" keeps the step from failing.
          AGENT_COMMITS=$(git log origin/main..HEAD --format="%H %an %s" | grep -iE "copilot|cursor|replit|agent" || true)
          
          if [ -z "$AGENT_COMMITS" ]; then
            echo "has_agent_commits=false" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          
          echo "has_agent_commits=true" >> "$GITHUB_OUTPUT"
          {
            echo "agent_commits<<EOF"
            echo "$AGENT_COMMITS"
            echo "EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Assess risk level
        id: risk
        if: steps.check.outputs.has_agent_commits == 'true'
        run: |
          # Get changed files
          CHANGED_FILES=$(git diff --name-only origin/main..HEAD)
          
          # Risk scoring
          RISK_SCORE=0
          
          # Check for high-risk paths
          if echo "$CHANGED_FILES" | grep -qE 'auth|crypto|payment|secret|token|password'; then
            RISK_SCORE=$((RISK_SCORE + 100))
            echo "High-risk files detected (auth/crypto/payment)"
          fi
          
          # Check for large changes (added + deleted lines; binary files count as 0)
          LINES_CHANGED=$(git diff --numstat origin/main..HEAD | awk '{ total += $1 + $2 } END { print total + 0 }')
          if [ "$LINES_CHANGED" -gt 200 ]; then
            RISK_SCORE=$((RISK_SCORE + 50))
            echo "Large change detected ($LINES_CHANGED lines)"
          fi
          
          # Check whether any test or spec files were changed
          if ! echo "$CHANGED_FILES" | grep -qE 'test|spec'; then
            RISK_SCORE=$((RISK_SCORE + 25))
            echo "No tests added with changes"
          fi
          
          # Assign risk level
          if [ "$RISK_SCORE" -ge 100 ]; then
            RISK_LEVEL="critical"
          elif [ "$RISK_SCORE" -ge 50 ]; then
            RISK_LEVEL="high"
          elif [ "$RISK_SCORE" -ge 25 ]; then
            RISK_LEVEL="medium"
          else
            RISK_LEVEL="low"
          fi
          
          echo "risk_level=$RISK_LEVEL" >> $GITHUB_OUTPUT
          echo "risk_score=$RISK_SCORE" >> $GITHUB_OUTPUT

  tier-1-automated:
    needs: detect-agent-code
    if: needs.detect-agent-code.outputs.has_agent_commits == 'true'
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # super-linter needs full history to diff against the default branch
      
      - name: Install dependencies
        run: npm ci
      
      - name: Compile
        run: npm run build
      
      - name: Run tests
        run: npm test
      
      - name: SAST scanning
        uses: github/super-linter@v4
        env:
          DEFAULT_BRANCH: main
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      
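      - name: License scanning
        run: |
          # Syft ships as a standalone binary, not an npm package
          curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b "$RUNNER_TEMP/bin"
          "$RUNNER_TEMP/bin/syft" . -o json > sbom.json
          # Flag copyleft licenses. Assumes Syft's JSON layout (artifacts[].licenses[].value);
          # the pattern also matches LGPL and AGPL.
          if jq -r '.artifacts[]?.licenses[]?.value? // empty' sbom.json | grep -q 'GPL'; then
            echo "⚠ GPL-licensed code detected in dependencies"
            exit 1
          fi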
      
      - name: Dependency check
        uses: dependency-check/Dependency-Check_Action@main
        with:
          project: ${{ github.event.repository.name }}
          path: '.'
          format: 'JSON'
      
      - name: Comment on PR
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '❌ **AI Code Review Gate — Tier 1 Automated Scanning Failed**\n\nAgent code failed one or more automated checks:\n- Syntax/compilation\n- Test execution\n- Security scanning\n- License compliance\n\nPlease fix before proceeding to human review.'
            })

  tier-2-human-review:
    needs: [detect-agent-code, tier-1-automated]
    if: needs.detect-agent-code.outputs.has_agent_commits == 'true' && success()
    runs-on: ubuntu-latest
    
    steps:
      - name: Request review based on risk
        uses: actions/github-script@v6
        with:
          script: |
            const riskLevel = '${{ needs.detect-agent-code.outputs.risk_level }}';
            
            // Team slugs, written without "@"; the teams are placeholders
            // and need access to the repository
            const reviewPolicy = {
              critical: {
                teamReviewers: ['security-team', 'platform-leads'],
                label: '🔴 critical-ai-review'
              },
              high: {
                teamReviewers: ['code-reviewers'],
                label: '🟠 high-ai-review'
              },
              medium: {
                teamReviewers: ['devs'],
                label: '🟡 medium-ai-review'
              },
              low: {
                teamReviewers: [],
                label: '🟢 low-ai-review'
              }
            };
            
            const policy = reviewPolicy[riskLevel] || reviewPolicy.medium;
            
            // Request reviewers (teams go in team_reviewers; reviewers takes user logins)
            if (policy.teamReviewers.length > 0) {
              await github.rest.pulls.requestReviewers({
                owner: context.repo.owner,
                repo: context.repo.repo,
                pull_number: context.issue.number,
                team_reviewers: policy.teamReviewers
              });
            }
            
            // Add label
            await github.rest.issues.addLabels({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              labels: [policy.label]
            });
            
            // Comment with guidance
            const guidance = {
              critical: 'This PR contains agent code in security-critical areas. Security team must review.',
              high: 'This PR contains significant agent-authored changes. Code review team should focus on edge cases and error handling.',
              medium: 'This PR contains agent code. Standard review applies, with focus on documented intent vs. generated output.',
              low: 'This PR contains low-risk agent code (comments, docs, utilities). Automated approval recommended.'
            };
            
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `🤖 **AI Code Review Gate — Tier 2 Human Review**\n\nRisk Level: **${riskLevel.toUpperCase()}**\n\n${guidance[riskLevel]}\n\n**Review Checklist for Agent Code:**\n- [ ] Does the generated code match the intent of the prompt?\n- [ ] Did the agent consider edge cases (null checks, empty arrays, error states)?\n- [ ] Are there any unsafe assumptions (e.g., input validation, race conditions)?\n- [ ] Does the code follow the team's patterns and conventions?\n- [ ] Would a human reviewer have written this differently? If so, why?\n`
            });

  tier-3-audit:
    needs: [detect-agent-code, tier-1-automated, tier-2-human-review]
    if: always() && needs.detect-agent-code.outputs.has_agent_commits == 'true'
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
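      - name: Log agent commit to audit database
        env:
          # Pass untrusted values (commit subjects, branch names) through env
          # instead of interpolating them into the script
          AGENT_COMMITS: ${{ needs.detect-agent-code.outputs.agent_commits }}
          BRANCH: ${{ github.head_ref }}
          PR_URL: ${{ github.server_url }}/${{ github.repository }}/pull/${{ github.event.pull_request.number }}
          RISK_LEVEL: ${{ needs.detect-agent-code.outputs.risk_level }}
          PASSED_TIER_1: ${{ needs.tier-1-automated.result == 'success' }}
          AUDIT_API_TOKEN: ${{ secrets.AUDIT_API_TOKEN }}
        run: |
          # Each line is "<hash> <author> <subject>"; only the hash is logged
          while read -r commit _; do
            [ -n "$commit" ] || continue
            payload=$(jq -n \
              --arg commit_hash "$commit" \
              --arg pr_url "$PR_URL" \
              --arg risk_level "$RISK_LEVEL" \
              --argjson passed_tier_1 "$PASSED_TIER_1" \
              --arg timestamp "$(date -u +'%Y-%m-%dT%H:%M:%SZ')" \
              --arg branch "$BRANCH" \
              '$ARGS.named')
            curl -X POST https://audit-api.company.local/agent-commits \
              -H "Authorization: Bearer $AUDIT_API_TOKEN" \
              -H "Content-Type: application/json" \
              -d "$payload"
          done <<< "$AGENT_COMMITS"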

Policy-as-Code Example

For teams wanting declarative policy, define agent review rules in a policy file:

# .github/ai-review-policy.yaml

version: 1
policies:
  - name: authentication-and-crypto
    risk: critical
    paths:
      - 'src/auth/**'
      - 'src/crypto/**'
      - 'src/security/**'
    rules:
      - type: require-human-review
        message: "Security-critical code must be reviewed by security team"
      - type: require-tests
        coverage_minimum: 90
      - type: block-if-confidence-below
        score: 0.85
      - type: require-signed-commits

  - name: payment-and-billing
    risk: critical
    paths:
      - 'src/payment/**'
      - 'src/billing/**'
    rules:
      - type: require-human-review
        reviewers: ['@payments-team']
      - type: require-financial-audit
      - type: block-if-confidence-below
        score: 0.90

  - name: core-features
    risk: medium  # Spot-checked rather than always reviewed
    paths:
      - 'src/api/**'
      - 'src/database/**'
    rules:
      - type: require-human-review
        sample_rate: 0.20  # Review 20% of PRs
      - type: require-tests
        coverage_minimum: 70
      - type: require-changelog

  - name: documentation-and-comments
    risk: low
    paths:
      - '**/*.md'
      - '**/*.txt'
    rules:
      - type: auto-approve
        label: 'ai-approved'
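
A policy file like this still needs an evaluator. The sketch below shows one possible way to resolve a PR's changed files to a risk level, using a hand-written subset of the policies so that it needs no YAML library. `POLICIES`, `matches`, `pr_risk`, the ordering of risk levels, and the `medium` default for unmatched files are all assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Parsed form of a subset of the policy file, written by hand for this sketch.
POLICIES = [
    {"name": "authentication-and-crypto", "risk": "critical",
     "paths": ["src/auth/**", "src/crypto/**", "src/security/**"]},
    {"name": "payment-and-billing", "risk": "critical",
     "paths": ["src/payment/**", "src/billing/**"]},
    {"name": "documentation-and-comments", "risk": "low",
     "paths": ["**/*.md", "**/*.txt"]},
]
RISK_ORDER = ["low", "medium", "high", "critical"]


def matches(path: str, pattern: str) -> bool:
    """Glob match in which a leading '**/' also covers files at the repository root."""
    if fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def pr_risk(changed_files: list[str], policies=POLICIES, default_risk: str = "medium") -> str:
    """Highest risk level across all changed files; unmatched files get `default_risk`."""
    levels = []
    for path in changed_files:
        hits = [p["risk"] for p in policies if any(matches(path, pat) for pat in p["paths"])]
        levels.extend(hits or [default_risk])
    return max(levels, key=RISK_ORDER.index) if levels else default_risk
```

Taking the maximum across files means one sensitive file raises the whole PR, so a documentation change bundled with an auth change is still treated as critical.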

Escalation and Incident Response

When agent code causes a production incident:

  1. Classify the incident: Was the code auto-approved as low-risk, or did it pass human review?
  2. Review the decision: If auto-approved code caused an incident, something is wrong with your risk assessment
  3. Update policy: Move that code path to a higher risk level
  4. Notify agent platform: Report the issue to Copilot/Cursor/Replit for model feedback

Example escalation:

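# Hypothetical helpers: a real system would write to your policy store instead.
policy_overrides = []

def update_policy(change):
    """Record a risk-level override for a code path."""
    policy_overrides.append({"action": "set-risk", **change})

def block_agent_commits(block):
    """Record a temporary block on agent commits to a code path."""
    policy_overrides.append({"action": "block-agent", **block})

# When an incident involves agent code:
incident = {
  "code_path": "src/api/users.py",
  "agent_type": "copilot-coding-agent-v3",
  "review_outcome": "auto-approved",  # Passed Tier 1, skipped human review
  "issue": "unhandled None in user lookup",
  "impact": "50 API failures over 10 minutes",
  "root_cause": "agent didn't check for empty user_id"
}

# Update policy so this path requires human review
update_policy({
  "path": incident["code_path"],
  "new_risk_level": "high",
  "reason": "Incident on 2026-04-29: agent missed null check"
})

# Block future agent commits in this path
block_agent_commits({
  "path": incident["code_path"],
  "until": "2026-05-10",
  "reason": "Incident recovery period"
})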

Metrics to Track

Once your review gate is running, measure:

  • Agent code merge rate — What % of agent PRs get merged vs. rejected?
  • Review time — How long does human review take for high-risk code?
  • Confidence correlation — Do high-confidence agent commits cause fewer incidents?
  • Incident rate — What % of production incidents involve agent code?
  • False positives — How many PRs does the gate reject that human review would have approved?

These metrics let you tune your policy: if agent code causes 5x more incidents than human code, raise the review bar. If your gate rejects 90% of agent code but human reviewers would have approved 80% of what it rejected, your risk assessment is miscalibrated.
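
The gate-side metrics can be computed from a log of per-PR outcomes. A minimal sketch, assuming each record stores the gate's verdict, a later human audit verdict, and whether the PR merged; `GateRecord` and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class GateRecord:
    gate_rejected: bool        # did the automated gate reject the PR?
    human_would_approve: bool  # verdict from a later human audit of the same PR
    merged: bool


def gate_metrics(records: list[GateRecord]) -> dict[str, float]:
    """Merge rate, rejection rate, and the share of rejections a human would have approved."""
    total = len(records)
    rejected = [r for r in records if r.gate_rejected]
    false_positives = [r for r in rejected if r.human_would_approve]
    return {
        "merge_rate": sum(r.merged for r in records) / total if total else 0.0,
        "rejection_rate": len(rejected) / total if total else 0.0,
        "false_positive_rate": len(false_positives) / len(rejected) if rejected else 0.0,
    }
```

A high `false_positive_rate` alongside a high `rejection_rate` is the miscalibration signal described above.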

This article is part of the AI Code Governance knowledge series (6 articles).