Syncing Rules at Scale: 50 to 500 Repos
At 5 repos: sync manually (rulesync pull in each project). At 50 repos: manual sync takes an hour and you will miss some. At 200+ repos: manual sync is impossible — you need automation. Bulk sync: the process of distributing rule updates to many repositories simultaneously. The challenges: parallel execution (syncing 200 repos sequentially takes hours — parallelism reduces it to minutes), error handling (some repos will fail — network issues, permission errors, merge conflicts — the bulk sync must handle failures gracefully), and progress tracking (which repos synced successfully, which failed, which were skipped).
Bulk sync approaches: CLI-based (a script that loops through repos and runs rulesync pull for each), CI-based (a central workflow that triggers sync jobs in each target repo), and platform-based (RuleSync's built-in bulk sync feature that handles parallelism, errors, and tracking). Most organizations: start CLI-based at 20-50 repos, move to CI-based at 50-200 repos, and use platform-based at 200+ repos.
The bulk sync goal: a ruleset is updated → within 1 hour, every affected repo has a PR with the updated rules → teams merge at their own pace. The sync is fast (minutes, not hours), reliable (handles failures gracefully), and traceable (a log shows which repos were updated, which failed, and why).
Step 1: CLI-Based Bulk Sync (20-50 Repos)
A shell script that iterates through repos and runs rulesync pull. The script: reads a list of repos from a file (repos.txt with one repo path per line), clones or navigates to each repo, runs rulesync pull, checks if files changed (git diff), and if changed: creates a branch, commits, and pushes (or opens a PR via the GitHub/GitLab API). Parallelism: use xargs -P 10 or GNU parallel to run 10 syncs simultaneously. 50 repos at 10 parallel: ~5 minutes total.
Error handling: the script captures the exit code of each rulesync pull. Success (exit 0): log as synced. Failure (non-zero): log the repo name and error message, then continue (do not abort the entire batch for one failure). After the batch, display a summary: 47 synced, 2 failed (repo-x: permission denied, repo-y: network timeout), 1 skipped (already current). The summary is actionable: fix the 2 failed repos manually or re-run the script for just those repos.
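The loop, parallelism, and end-of-batch summary above can be sketched in bash. This is a minimal sketch, not a full implementation: the `repos.txt` file name, the `chore/rules-sync` branch, the commit message, and the log format are illustrative assumptions; only `rulesync pull` and standard git commands are taken from the text.

```shell
#!/usr/bin/env bash
# Bulk sync sketch: one local repo path per line in repos.txt (assumed),
# 10 parallel workers, per-repo status logged, summary printed at the end.
set -u
LOG="${LOG:-sync-results.log}"

sync_repo() {
  local repo="$1"
  if ! (cd "$repo" && rulesync pull) >/dev/null 2>&1; then
    echo "$repo FAILED rulesync-pull" >> "$LOG"
    return 0                               # log and continue; never abort
  fi
  if (cd "$repo" && git diff --quiet); then
    echo "$repo SKIPPED already-current" >> "$LOG"
  elif (cd "$repo" \
        && git checkout -b chore/rules-sync \
        && git add -A && git commit -m "chore: sync AI rules" \
        && git push -u origin chore/rules-sync) >/dev/null 2>&1; then
    echo "$repo SYNCED" >> "$LOG"
  else
    echo "$repo FAILED push" >> "$LOG"
  fi
}

summarize() {
  echo "$(grep -c ' SYNCED' "$LOG") synced," \
       "$(grep -c ' FAILED' "$LOG") failed," \
       "$(grep -c ' SKIPPED' "$LOG") skipped"
  grep ' FAILED' "$LOG" || true            # failed repos, with reasons
}

if [ -f repos.txt ]; then
  : > "$LOG"
  export -f sync_repo; export LOG          # bash-specific: export for xargs
  xargs -P 10 -I{} bash -c 'sync_repo "$1"' _ {} < repos.txt
  summarize
fi
```

The key design choice is that `sync_repo` always returns 0: a failure is recorded in the log, never allowed to abort the batch, and the summary at the end is what makes the run actionable.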
Limitations: CLI-based bulk sync runs from one machine (your laptop or a CI runner). At 200+ repos: the machine's network and disk I/O become bottlenecks. The solution: CI-based bulk sync (distributed across multiple CI runners) or platform-based (RuleSync handles the distribution). AI rule: 'CLI-based bulk sync: sufficient for 20-50 repos. Beyond 50: the single-machine bottleneck slows the sync. Scale up to CI-based or platform-based.'
Sequential sync: 50 repos × 30 seconds each = 25 minutes. Parallel (10 at a time): 50 repos ÷ 10 parallel × 30 seconds = 2.5 minutes. The parallelism: a 10x speedup with no additional complexity. Implementation: xargs -P 10 (bash), Promise.all with chunking (Node.js), or parallel jobs in CI. The bottleneck at 50 repos is not compute or network; it is sequential execution. Parallelism eliminates it.
Step 2: CI-Based Bulk Sync (50-200 Repos)
The central sync workflow: a GitHub Actions or GitLab CI workflow in the central rules repo. Trigger: on push to main (a ruleset changed) or on schedule (weekly drift detection). The workflow: reads the manifest (list of target repos with their ruleset assignments), and for each repo: triggers a sync job. GitHub Actions: use repository_dispatch to trigger a sync workflow in each target repo. GitLab CI: use the API to trigger a pipeline in each target project. The jobs: run in parallel across different CI runners — no single-machine bottleneck.
Fan-out pattern: the central workflow fans out to N target repos. Each target repo: has a sync workflow that runs rulesync pull, checks for changes, and opens a PR if the rules are updated. The central workflow: does not need to clone all repos. It only triggers the sync. Each target repo: handles its own sync. This distributed model: scales linearly (adding a repo: adds one more trigger, not more work for the central runner).
Progress aggregation: after the fan-out, collect the results. Each target sync job: reports success/failure to the central system (via a webhook, status check, or shared artifact). The central workflow: aggregates results into a summary. Display: in Slack, as a GitHub issue, or on the adoption dashboard. AI rule: 'CI-based bulk sync: each repo syncs itself. The central workflow: triggers and aggregates. No single point of bottleneck. Scales to 200+ repos.'
Repo 23 out of 200: fails with a network timeout. Without error handling: the script aborts. Repos 24-200: never synced. With proper error handling: repo 23 is logged as failed. The script continues to repos 24-200. After the batch: summary shows 199 synced, 1 failed. The operator: retries the one failed repo. The alternative: manual sync of 177 repos because the script died at repo 23. Always continue on failure. Always summarize at the end.
Step 3: Error Handling and Recovery
Common bulk sync errors: permission denied (the sync token does not have access to the repo — fix: update token permissions), merge conflict (the target repo has manual CLAUDE.md changes that conflict with the sync — fix: resolve the conflict in the PR), network timeout (transient — retry automatically), and repo archived (the repo is read-only — skip and log). Each error type: has a different recovery strategy. The bulk sync system: classifies errors and applies the appropriate strategy.
Retry logic: for transient errors (network timeout, API rate limiting): retry 3 times with exponential backoff (wait 1 sec, then 4 sec, then 16 sec). For permanent errors (permission denied, repo archived): do not retry. Log the error and move on. The retry logic: handles 90% of transient failures automatically. The remaining 10%: require manual intervention (usually permission fixes). AI rule: 'Automatic retry for transient errors. Manual investigation for permanent errors. The bulk sync should never fail entirely because of one repo's issue.'
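The retry logic above can be sketched as a small wrapper: one initial attempt, then up to 3 retries after 1-, 4-, and 16-second waits. A minimal sketch; the classification step (deciding an error is transient) happens before calling it, so permanent errors like permission denied never reach this wrapper.

```shell
#!/usr/bin/env bash
# Retry wrapper for transient errors (network timeout, API rate limit).

retry() {
  local delay
  "$@" && return 0              # first attempt
  for delay in 1 4 16; do       # exponential backoff between retries
    sleep "$delay"
    "$@" && return 0
  done
  return 1                      # still failing: log it and move on
}
```

Usage would look like `retry rulesync pull`: a transient failure is absorbed automatically, while a command that keeps failing surfaces as a single logged failure rather than aborting the batch.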
Post-sync verification: after the bulk sync completes, run a verification pass. For each repo: rulesync pull --check (verify the rules are current). Any repo that fails the check: either the sync failed silently or the PR was not merged. The verification: catches edge cases that the sync's error handling missed. Run verification 1 hour after the sync (allowing time for PRs to be opened and network operations to complete). AI rule: 'Verification after bulk sync: the safety net. Even the best error handling misses edge cases. A verification pass: catches them.'
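The verification pass can be sketched as a second loop over the same repo list. Assumptions as before: `repos.txt` holds one local repo path per line, and `rulesync pull --check` exits non-zero when a repo's rules are out of date.

```shell
#!/usr/bin/env bash
# Verification sketch, run ~1 hour after the bulk sync: flag any repo
# that is still outdated (silent sync failure or unmerged PR).

verify_repo() {
  local repo="$1"
  if (cd "$repo" && rulesync pull --check) >/dev/null 2>&1; then
    echo "$repo current"
  else
    echo "$repo OUTDATED"       # sync failed silently or PR not merged
  fi
}

if [ -f repos.txt ]; then
  outdated=0
  while IFS= read -r repo; do
    verify_repo "$repo" | grep -q OUTDATED && outdated=$((outdated + 1))
  done < repos.txt
  echo "$outdated repos still outdated after sync"
fi
```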
The bulk sync reports: 200 repos synced successfully. But: repo 45's PR was opened but could not be merged (protected branch without the right approvals). Repo 112's rules were written but the commit failed silently (disk full in CI). Without verification: these failures are invisible. With verification (rulesync pull --check 1 hour later): repo 45 and 112 are flagged as outdated. The verification: the safety net that catches what error handling missed.
Bulk Sync Summary
Summary of bulk syncing AI rules to many repositories.
- CLI-based (20-50 repos): shell script with parallel execution. 5 min for 50 repos at 10 parallel
- CI-based (50-200 repos): central workflow fans out to target repos. Each repo syncs itself. No bottleneck
- Platform-based (200+ repos): RuleSync built-in bulk sync. Handles parallelism, errors, and tracking
- Error handling: retry transient (3x with backoff). Log and skip permanent. Never abort batch for one failure
- Common errors: permission denied (fix token), merge conflict (resolve in PR), timeout (auto-retry)
- Progress tracking: summary after batch. N synced, N failed (with reasons), N skipped (already current)
- Verification: rulesync pull --check 1 hour after sync. Catches silent failures and unmerged PRs
- Goal: ruleset update → every affected repo has a PR within 1 hour