PLAN-v1.6.7 — error handling for the rest of the commands
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md — the implementation process
- PLANS.md — plan structure and best practices
Status: Completed
Goal: Extend the v1.6.6 failure-handling pattern (fetch the actual reason + give an actionable next step, not "failed + URL") to the rest of the commands that currently emit cryptic ADO errors: merge-pr / add-service's PR-merge step, read_iac_variables REST failures, create-pr errors, and update's git-pull conflict.
Last Updated: 2026-05-29
Prerequisites: PLAN-v1.6.6-ux-and-error-handling ships report_pipeline_failure + the action-pattern table format. This plan reuses both.
Branch: standalone — feat/v1.6.7-error-handling-rest. Single PR, version bump 1.6.6 → 1.6.7.
What changes
1. report_pr_merge_failure <project> <pr-id>
When merge_pr_in_project (v2) or squash_complete_pr (v1) fails to mark a PR completed, fetch the PR + its branch policies and tell the user exactly which policy blocked them. Examples of common blockers in ADO:
- Required reviewers haven't approved
- Required build hasn't run / passed
- Linked work items missing
- Auto-complete waiting on something
- PR is in draft mode
Output format (matches PLAN-v1.6.6 report_pipeline_failure):
✗ FAILED to merge PR #4839 in TCH900001-mrkmedlem (FrontendPlatform)
Reason: Required policy "Build validation" hasn't completed.
Last run: pending (queued 5 min ago).
Action: → Wait for the build to finish, then re-run:
noclickops merge-pr 4839
PR URL: https://dev.azure.com/RedCrossNorway/FrontendPlatform/_git/TCH900001-mrkmedlem/pullrequest/4839
Helper signature:
report_pr_merge_failure <project> <pr-id>
# 1. GET /_apis/policy/evaluations?artifactId=... to find the PR's policies + their state
# 2. Identify which policy is NOT 'approved' yet
# 3. Pattern-match against the policy-action table
# 4. Print the formatted block
Initial policy-action table:
| Policy type | Status: not approved | Action |
|---|---|---|
| Build validation | pending | Wait for the build to finish, then re-run merge-pr |
| Required reviewers | waiting | Ping the listed reviewers; they need to approve |
| Linked work items | violation | Link a work item to the PR (web UI) |
| Comments | violation | Resolve all comments on the PR |
| Auto-complete | n/a | Already set; will complete when blockers clear |
| (no match) | See PR in web UI for details |
Used by both:
merge_pr_in_projectinlib/service-v2.sh(v2; called byadd-service)squash_complete_prinlib/azdo.sh(v1; called bymerge-prand the v1 surface)
2. report_rest_failure <verb> <url> <http-status> <body>
When _nco_ado_rest_get (or any wrapped REST call) returns a non-2xx, decode the status + body and explain. Most common cases for read_iac_variables:
| Status | Body signature | Action |
|---|---|---|
| 401 | TF400813 (invalid auth) | Refresh az login: az logout && az login |
| 403 | TF401019 (no read access on repo) | You don't have read access to IaC/platform-infrastructure. Ask the admin for "Reader" on that repo. |
| 404 | item not found at path | Variables file doesn't exist at the expected path. Has the IaC PR-B for <svc> been merged? Check: noclickops status for recent <repo>-infra-add-service runs. |
| 5xx | transient | ADO is having a moment. Retry in ~30s. |
| (other) | Network or unknown error. Check connectivity. |
Output format:
✗ FAILED: GET https://dev.azure.com/.../variables/common.yaml (HTTP 404)
Reason: Variables file doesn't exist at the expected IaC path.
Action: → Has the IaC PR-B for 'smk1' been merged in platform-infrastructure?
Check: noclickops status | grep infra-add-service
read_iac_variables (and any other caller of _nco_ado_rest_get) replaces the bare die "failed to GET ..." with a call to report_rest_failure + exit.
3. create-pr error mapping
az repos pr create failures, mapped to friendlier output:
| az error | Action |
|---|---|
TF401179: An active pull request for the source and target branch already exists | A PR for this branch already exists. Use noclickops merge-pr <existing-id> instead. (Lookup the id, print it.) |
TF401028: The reference 'refs/heads/<branch>' does not exist | The branch isn't on the remote. Run: git push -u origin <branch> first. (Or noclickops should push automatically — investigate.) |
Source branch and target branch are the same | You're on main. Create a feature branch first: git checkout -b feature/<name>. |
Permission denied | You don't have permission to create PRs in this repo. Check ADO permissions. |
Output: same formatted block. Replaces the current raw az error.
4. update git-pull conflict
noclickops update runs git pull --ff-only in ~/.noclickops. If the install dir has diverged (uncommitted changes, fork commits, etc.), the pull fails with fatal: Not possible to fast-forward, aborting. — currently dies with a vague "resolve manually" message.
Better output:
✗ FAILED: noclickops update — local install has diverged from origin/main
Reason: ~/.noclickops has 2 commits not on origin/main, OR uncommitted changes.
Action: Pick ONE:
a) If you have no local edits worth keeping (most likely — this dir is install-only):
git -C ~/.noclickops fetch origin && git -C ~/.noclickops reset --hard origin/main
b) Wipe + reinstall (always safe):
rm -rf ~/.noclickops && curl -fsSL https://raw.githubusercontent.com/terchris/noclickops/main/install.sh | bash
c) If you DO have edits and want to keep them:
cd ~/.noclickops && git status # inspect, decide manually
bin/update.sh checks git status + git rev-list to detect WHICH divergence pattern applies, then prints the targeted message.
Phase 1: report_pr_merge_failure helper — DONE
Tasks
- 1.1 Add
report_pr_merge_failure <project> <pr-id>tolib/service-v2.sh. Calls_nco_ado_rest_getfor/_apis/policy/evaluations?artifactId=.... - 1.2 Action table (initial 5 entries per the table above). Same data-structure format as PLAN-v1.6.6's
report_pipeline_failure. - 1.3 Update
merge_pr_in_project(v2): onpr updatefailure, callreport_pr_merge_failurethen return 1. - 1.4 Update
squash_complete_pr(v1 inlib/azdo.sh): same treatment. Sourcelib/service-v2.shfor the helper (same source pattern that v1.6.4 introduced). - 1.5 Tests: stub the policy-evaluations REST response; assert the helper formats each known policy type correctly.
Validation
bash tests/run-all.sh green. Manual: create a PR with a deliberately failing build policy, run merge-pr — expect the formatted block.
User confirms phase is complete.
Phase 2: report_rest_failure helper — DONE
Tasks
- 2.1 Add
report_rest_failure <verb> <url> <http-status> [<body>]tolib/service-v2.sh. - 2.2 Status-action table (initial 5 entries).
- 2.3 Update
_nco_ado_rest_getto capture HTTP status + body on failure, return them as a structured error. - 2.4 Update
read_iac_variables: on REST failure, callreport_rest_failurethendie. Similar for any other callers. - 2.5 Tests: stub REST responses with 401 / 404 / 5xx; assert formatted output for each.
Validation
bash tests/run-all.sh green. Manual: deliberately invalidate the az token, run info — expect the 401 hint.
User confirms phase is complete.
Phase 3: create-pr error mapping — DONE
Tasks
- 3.1 In
bin/create-pr.sh, wrap theaz repos pr createcall: capture stderr, pattern-match against the error table, print the formatted block + exit 1. - 3.2 For the "PR already exists" case: also do a
az repos pr listto find the existing PR id and include it in the suggestion. - 3.3 Tests: stub az to return each known error; assert formatted output.
Validation
bash tests/run-all.sh green. Manual: try create-pr twice with the same branch — expect "PR already exists at #XXXX, use merge-pr <id>".
User confirms phase is complete.
Phase 4: update git-pull conflict diagnostics — DONE
Tasks
- 4.1 In
bin/update.sh, after agit pull --ff-onlyfailure: detect divergence pattern (uncommitted changes vs fork commits vs both) usinggit status --porcelain+git rev-list HEAD..origin/main+git rev-list origin/main..HEAD. - 4.2 Print the targeted message (matching the diagnosis) with the specific recovery commands.
- 4.3 Tests: simulate each divergence pattern in a temp repo; assert the right message prints.
Validation
bash tests/run-all.sh green. Manual: create a divergent install state and run noclickops update — expect the targeted recovery commands.
User confirms phase is complete.
Phase 5: Docs + version bump — DONE
Tasks
- 5.1 Update
website/docs/contributors/lib-service-v2.md: addreport_pr_merge_failure+report_rest_failureto the Public API table. - 5.2 Bump
version.txt→1.6.7.
Acceptance Criteria
-
merge-prandadd-service(via merge_pr_in_project) print the formatted PR-failure block when a policy blocks the merge -
read_iac_variablesHTTP errors print the formatted REST-failure block (at minimum for 401 / 404 / 5xx) -
create-pr"PR already exists" suggests the existing PR id + merge-pr command -
updategit-pull conflict prints the targeted recovery commands matching the divergence pattern -
tests/run-all.shgreen -
version.txtshows1.6.7
Files to Modify
lib/service-v2.sh(addreport_pr_merge_failure,report_rest_failure)lib/azdo.sh(updatesquash_complete_prto source lib/service-v2.sh + call the helper)bin/create-pr.sh(wrap az error)bin/update.sh(divergence diagnostics)tests/test-PLAN-A-service-discovery.sh(lib-level helper tests)tests/test-PLAN-F-add-service.sh(PR-merge-failure path)tests/test-PLAN-008-info.sh(REST-failure path viaread_iac_variables, if still in tree)- New:
tests/test-PLAN-003-create-pr.shor extend existing — for create-pr error mapping - New:
tests/test-PLAN-101-update.shor extend — for update divergence website/docs/contributors/lib-service-v2.mdversion.txt
Implementation Notes
Reuse PLAN-v1.6.6's action-table format
Whatever data structure PLAN-v1.6.6 settles on for the pipeline-failure action table, reuse it for the three new tables. Avoid 4 different formats.
Out of scope for v1.6.7
statusshowing inline step-failure info when querying a specific run (could usereport_pipeline_failurebut it's lower priority — file as v1.6.8 if needed)sync-lovableerror handling (rsync errors are usually clear enough; revisit if real users complain)info/logs/shelldiscover_containerapp failures (the current "set SVC_APP_NAME_OVERRIDE" message is already good; only the underlying read_iac_variables REST failure case improves here)