PLAN-A — v2 service discovery library
IMPLEMENTATION RULES: Before implementing this plan, read and follow:
- WORKFLOW.md — the implementation process
- PLANS.md — plan structure and best practices
Status: Completed
Goal: Build the v2 discovery library (lib/service-v2.sh) that all v2 command rewrites (PLAN-B through PLAN-F) will source. Single source of truth for resolving per-service / per-env context against the new two-project, two-repo layout.
v2 is Bash only. No .ps1 mirrors. The planned post-v2 rewrite to Bun (TypeScript) takes over multi-OS support; until then, Windows-native users either run via WSL/Git Bash or stay on v1.5.x.
Last Updated: 2026-05-29
Completed: 2026-05-29
Completion notes
- All 5 phases shipped: scaffold → YAML readers → pipeline/IaC-project discovery → containerapp/URL → docs.
- 376 tests pass; the new test-PLAN-A-service-discovery.sh contributes 68 of them across all 4 implementation phases.
- v1 lib/service.sh untouched: no regression for in-flight v1 callers.
- No customer-tenant tokens (nrx etc.) anywhere in lib/service-v2.sh.
- Per the CLAUDE.md PR-per-investigation rule, this plan commits to the feat/v2-new-target-structure branch. The PR opens after PLAN-F merges.
Investigation: INVESTIGATE-new-target-structure.md (see § "PLAN sequence → PLAN-A")
Blocks: PLAN-B (info), PLAN-C (deploy), PLAN-D (logs/shell), PLAN-E (clean-sample), PLAN-F (add-service). Each consumes the API this plan ships.
Overview
v1 has lib/service.sh which reads three YAML files from the source repo's .pipelines/variables/*.yaml (the FRT-shaped layout). It exports SVC_* globals consumed by bin/info, bin/logs, bin/shell.
v2 needs an equivalent module for the new layout, where:
- Per-service config lives in services/<svc>/config.<env>.yaml (NEW location: sits with the service, not at repo root).
- Repo-level variables live in IaC/platform-infrastructure/environments/<TEAM>/<repo>/infrastructure/.pipelines/variables/{common,test,prod}.yaml (NEW location: IN A DIFFERENT REPO IN A DIFFERENT PROJECT).
- Pipeline names span TWO ADO projects (FrontendPlatform + IaC), five names per service (<repo>-<svc>-build, <repo>-<svc>-deploy, <repo>-<svc>-infra-build, <repo>-<svc>-deploy-test, <repo>-<svc>-deploy-prod).
- Container app is ca-<repo-prefix-lc>-<svc> in rg-<env>-<TENANT>-<repo-prefix-lc>. TENANT is customer-specific (nrx at Red Cross) and is NOT in any discoverable YAML. v2 must discover the RG via az containerapp list or accept a --resource-group override; it must never hardcode nrx.
This plan ships lib/service-v2.sh + tests. Old lib/service.sh stays untouched — PLAN-B/C/D/E/F switch their respective callers over command-by-command. After PLAN-F merges, a follow-up removes v1 (both lib/service.sh and lib/service.ps1).
Co-existence strategy: new module, new file name. Existing tests + commands keep working unchanged. No behavior change for users until PLAN-B lands.
Public API
The new module exposes these functions:
# 1. Read per-service config from services/<svc>/config.<env>.yaml.
#    Populates SVC_CFG_<KEY> globals for every flat key found.
#    Errors if file missing.
read_service_config <svc> <env>

# 2. Read IaC variables from the cross-project IaC repo's path:
#    IaC/platform-infrastructure/environments/<TEAM>/<repo>/infrastructure/.pipelines/variables/{common,<env>}.yaml
#    Populates IAC_<KEY> globals. <TEAM> is the first dash-segment of the
#    source repo's name (upper-cased). Errors if either file missing.
read_iac_variables <env>

# 3. Return the IaC project name (defaults to "IaC"; honors $NOCLICKOPS_IAC_PROJECT
#    env override; if IAC_PROJECT is set in common.yaml after read_iac_variables, uses that).
discover_iac_project

# 4. Discover pipeline IDs across both projects.
#    Returns five lines, "<role>=<id>", empty when not found:
#      frontend_build=<id>
#      frontend_deploy=<id>
#      iac_infra_build=<id>
#      iac_deploy_test=<id>
#      iac_deploy_prod=<id>
discover_pipelines <svc>

# 5. Best-effort container-app discovery for live state.
#    Order: (a) honor --app-name / --resource-group overrides if set in
#    SVC_APP_NAME_OVERRIDE / SVC_RG_OVERRIDE globals; (b) read
#    COMMON_RESOURCE_GROUP_NAME from iac vars and az containerapp list there
#    with name filter ca-<repo-prefix-lc>-<svc>; (c) fall back to az
#    containerapp list across the subscription with the same name filter.
#    Returns "name=<name>\nresource_group=<rg>\nfqdn=<fqdn>" on success;
#    empty stdout + non-zero exit on failure.
discover_containerapp <svc>

# 6. Pure derivation. Computes "ca-<repo-prefix-lc>-<svc>"; no az calls.
#    Use when you need the predicted name without verifying it exists.
derive_containerapp_name <svc>

# 7. Public URL for a service with ENABLE_PUBLIC_ENDPOINT: "true".
#    Reads ENABLE_PUBLIC_ENDPOINT from SVC_CFG_*, DNS_ZONE_NAME from IAC_*.
#    Returns "<svc>.<DNS_ZONE_NAME>" or empty when not public.
public_url_for <svc> <env>
No derive_rg function — the v2 design deliberately drops it. v1 had rg-<env>-nrx-<APP_NAME> hardcoded; v2 discovers the RG via discover_containerapp (which queries Azure) or requires an override. Hardcoding the tenant prefix is what made v1 customer-specific; v2 doesn't repeat that mistake.
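A caller that needs one pipeline ID has to pick it out of the "<role>=<id>" lines that discover_pipelines prints. The following is a minimal sketch of that consumer side, assuming only the output format described above. The stub body, the IDs, and the helper name pipeline_id_for are hypothetical and exist so the snippet runs without Azure access.

```shell
# Stub standing in for the real discover_pipelines (IDs are made up).
discover_pipelines() {
  printf 'frontend_build=101\n'
  printf 'frontend_deploy=102\n'
  printf 'iac_infra_build=201\n'
  printf 'iac_deploy_test=\n'      # empty value: pipeline not found
  printf 'iac_deploy_prod=203\n'
}

# Hypothetical helper: print the ID for one role; fail if the role is
# missing or its value is empty.
pipeline_id_for() {
  local svc="$1" want="$2" role id lines
  lines="$(discover_pipelines "$svc")" || return 1
  while IFS='=' read -r role id; do
    if [ "$role" = "$want" ]; then
      [ -n "$id" ] || return 1
      printf '%s\n' "$id"
      return 0
    fi
  done <<EOF
$lines
EOF
  return 1
}

pipeline_id_for frontend frontend_deploy
```

Treating an empty value as a failure keeps "pipeline not found" distinguishable from a valid ID at the call site.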
Phase 1: Scaffold module + test file — DONE
Tasks
- 1.1 Create lib/service-v2.sh with the function-name stubs above. Each stub: printf 'TODO: %s\n' "<func-name>" >&2; return 1. Source lib/logging.sh + lib/utilities.sh like v1 does. Same _NCO_SERVICE_V2_LOADED guard pattern.
- 1.2 Create tests/test-PLAN-A-service-discovery.sh with the test-runner boilerplate from existing tests (source tests/_helpers.sh, set NCO_ROOT). One placeholder test that sources lib/service-v2.sh and asserts the module loaded. Add it to the test list in tests/run-all.sh (if it auto-discovers, skip this).
- 1.3 Add v2 fixture helpers to tests/_fixtures.sh:
  - make_v2_source_repo [origin-url]: git repo with .pipelines/add-service.yaml + empty services/.
  - make_v2_service <repo> <svc>: adds services/<svc>/ with Dockerfile, app/, config.test.yaml, config.prod.yaml, .pipelines/{service,deploy_service}.yaml. Config files populated with realistic SERVICE_* keys.
  - make_v2_iac_repo: separate temp git repo modeling platform-infrastructure.
  - make_v2_iac_service <iac-repo> <source-repo-name> <svc>: adds the variables and pipeline YAMLs for that service.
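The stub and load-guard pattern from task 1.1 can be sketched as follows. This is an illustration rather than the shipped file: the module body is written to a temp file so it can be sourced twice, and the _nco_load_count counter is a hypothetical addition that only makes the guard's effect observable.

```shell
# Hypothetical stand-in for lib/service-v2.sh, written to a temp file.
module="$(mktemp)"
cat > "$module" <<'EOF'
# Load guard: a second source returns early instead of redefining everything.
if [ -n "${_NCO_SERVICE_V2_LOADED:-}" ]; then
  return 0
fi
_NCO_SERVICE_V2_LOADED=1
_nco_load_count=$(( ${_nco_load_count:-0} + 1 ))   # illustrative only

# Phase-1 stub: report on stderr and fail until the function is implemented.
read_service_config() {
  printf 'TODO: %s\n' "read_service_config" >&2
  return 1
}
EOF

. "$module"
. "$module"     # second source is a no-op because of the guard
rm -f "$module"
```

Because the guard uses return rather than exit, sourcing the module a second time from a command script does not terminate that script.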
Validation
bash tests/run-all.sh
All existing tests still pass; new test-PLAN-A-service-discovery.sh runs and shows the placeholder PASSED.
User confirms phase is complete.
Phase 2: YAML readers (read_service_config, read_iac_variables) — DONE
Tasks
- 2.1 Lift yaml_var from lib/service.sh into lib/service-v2.sh unchanged (the YAML files in v2 are the same shape: flat, no nesting, no anchors).
- 2.2 Implement read_service_config <svc> <env>:
  - Resolves repo root via nco_repo_root (existing helper from lib/utilities.sh).
  - Reads <repo-root>/services/<svc>/config.<env>.yaml.
  - Iterates every KEY: value line; exports SVC_CFG_<KEY>=<value> (uppercase, - and . → _).
  - Sets SVC_CFG__LOADED=1 sentinel.
  - Errors clearly when file missing: ✗ services/<svc>/config.<env>.yaml not found. Is this the right service / env?
- 2.3 Implement read_iac_variables <env>:
  - Reads source repo name from git config --get remote.origin.url (or accepts NOCLICKOPS_REPO_NAME override).
  - Derives <TEAM> = first dash-segment of repo name, upper-cased (e.g. ABC100001-myservice → ABC).
  - Reads the IaC repo's variables. Two approaches: (a) clone platform-infrastructure to a cache dir on first use; (b) read via the ADO REST API (/_apis/git/repositories/.../items?path=...). Decision: API approach (no clone cache, no stale-state risk, matches v1's "wrap, never replicate" + read-only philosophy).
  - For each of common.yaml and <env>.yaml: GET via API, parse with same loop as 2.2, export IAC_<KEY>=<value>.
  - Sets IAC__LOADED=1 sentinel.
  - Errors clearly when either file missing: name the exact path the API was queried for.
- 2.4 Tests for 2.2 + 2.3:
  - read_service_config against a fixture: assert SVC_CFG_SERVICE_PORT, SVC_CFG_SERVICE_CPU, SVC_CFG_ENABLE_PUBLIC_ENDPOINT populated correctly.
  - read_iac_variables: mock the API via a NCO_ADO_REST_OVERRIDE env var pointing at a local file (the function reads the URL through a small shim that honors the override; useful for testing without ADO access).
  - Negative case: missing service config returns expected error message.
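The read loop from task 2.2 can be sketched as below, assuming the flat "KEY: value" shape the plan describes. The helper name _nco_export_flat_yaml, the NCO_DEMO_ROOT variable, and the nco_repo_root stand-in are hypothetical; the real module reuses yaml_var and the existing nco_repo_root, whose internals are not shown in this plan. Inline comments after a value are not handled here.

```shell
# Stand-in for the existing nco_repo_root helper (assumed to print the repo root).
nco_repo_root() { printf '%s\n' "${NCO_DEMO_ROOT:?set NCO_DEMO_ROOT}"; }

# Hypothetical helper: export <prefix><KEY>=<value> for each flat "KEY: value" line.
_nco_export_flat_yaml() {
  local file="$1" prefix="$2" line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*|' '*|'-'*) continue ;;      # blanks, comments, nested or list lines
      *:*) ;;
      *) continue ;;
    esac
    key="${line%%:*}"
    value="${line#*:}"
    value="${value#"${value%%[! ]*}"}"    # trim leading spaces
    value="${value%"${value##*[! ]}"}"    # trim trailing spaces
    case "$value" in                      # strip one pair of surrounding quotes
      \"*\") value="${value#\"}"; value="${value%\"}" ;;
      \'*\') value="${value#\'}"; value="${value%\'}" ;;
    esac
    # Uppercase the key, then map "." and "-" to "_".
    key="$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
    export "${prefix}${key}=${value}"
  done < "$file"
}

read_service_config() {
  local svc="$1" env="$2" rel file
  rel="services/${svc}/config.${env}.yaml"
  file="$(nco_repo_root)/${rel}"
  if [ ! -f "$file" ]; then
    printf '✗ %s not found. Is this the right service / env?\n' "$rel" >&2
    return 1
  fi
  _nco_export_flat_yaml "$file" "SVC_CFG_" || return 1
  export SVC_CFG__LOADED=1
}

# Fixture with illustrative values only.
NCO_DEMO_ROOT="$(mktemp -d)"
mkdir -p "$NCO_DEMO_ROOT/services/frontend"
cat > "$NCO_DEMO_ROOT/services/frontend/config.test.yaml" <<'EOF'
# per-service config (test)
SERVICE_PORT: "8080"
SERVICE_CPU: 0.5
ENABLE_PUBLIC_ENDPOINT: "true"
EOF
read_service_config frontend test
```

The same loop with an IAC_ prefix would cover the parsing half of task 2.3; only the way the file content is fetched differs.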
Validation
bash tests/test-PLAN-A-service-discovery.sh
All Phase-2 tests pass.
User confirms phase is complete.
Phase 3: Pipeline + IaC project discovery — DONE
Tasks
- 3.1 Implement discover_iac_project:
  - Returns $NOCLICKOPS_IAC_PROJECT if set.
  - Returns $IAC_PROJECT if read_iac_variables has run.
  - Otherwise returns the literal IaC.
- 3.2 Implement discover_pipelines <svc> (refactored to 2 calls per project + local filter: fewer round-trips, easier to stub):
  - Wraps az pipelines list --query calls. Two project queries:
    - FrontendPlatform: --repository <repo> --repository-type tfsgit, then filter names matching <repo>-<svc>-build and <repo>-<svc>-deploy.
    - IaC: --organization <org-url> --project <iac-project>, then filter names matching <repo>-<svc>-{infra-build,deploy-test,deploy-prod}.
  - Returns lines <role>=<id> for all five roles; empty value when a pipeline isn't found.
  - Caches results in discover_pipelines_cache_<svc> global to avoid re-querying within a single command run.
- 3.3 Tests:
  - Mock az via a NCO_AZ_OVERRIDE env var pointing at a stub script, the same pattern existing tests use (check tests/test-PLAN-008-info.sh for the convention).
  - Stub returns canned JSON for az pipelines list; assert parsed IDs come out right.
  - Test: when one of the IaC pipelines doesn't exist, that role's line has empty value.
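Tasks 3.1 and 3.2 fit together roughly as in the sketch below: one list call per project, then a local name filter. Several things here are assumptions rather than the shipped code: the --query '[].[name,id]' -o tsv shape, the helper name _nco_pick_id, the stubbed _nco_az with made-up IDs, and the use of NOCLICKOPS_REPO_NAME as the repo-name source. The --organization argument and the per-service cache are left out for brevity.

```shell
# Stub for the az shim. Prints "name<TAB>id" rows, the shape that
# --query '[].[name,id]' -o tsv is assumed to produce. IDs are made up.
_nco_az() {
  case "$*" in
    *"--project IaC"*)
      printf 'abc-myservice-frontend-infra-build\t201\n'
      printf 'abc-myservice-frontend-deploy-test\t202\n'
      ;;
    *)
      printf 'abc-myservice-frontend-build\t101\n'
      printf 'abc-myservice-frontend-deploy\t102\n'
      printf 'abc-myservice-other-build\t103\n'
      ;;
  esac
}

# Precedence from task 3.1: env override, then IaC vars, then the literal default.
discover_iac_project() {
  if [ -n "${NOCLICKOPS_IAC_PROJECT:-}" ]; then
    printf '%s\n' "$NOCLICKOPS_IAC_PROJECT"
  elif [ "${IAC__LOADED:-}" = "1" ] && [ -n "${IAC_PROJECT:-}" ]; then
    printf '%s\n' "$IAC_PROJECT"
  else
    printf 'IaC\n'
  fi
}

# Hypothetical helper: print the id of the row whose name matches exactly.
_nco_pick_id() {
  printf '%s\n' "$1" | awk -F '\t' -v want="$2" '$1 == want { print $2; exit }'
}

discover_pipelines() {
  local svc="$1" repo="${NOCLICKOPS_REPO_NAME:-abc-myservice}" frontend iac
  frontend="$(_nco_az pipelines list --repository "$repo" --repository-type tfsgit \
    --query '[].[name,id]' -o tsv)"
  iac="$(_nco_az pipelines list --project "$(discover_iac_project)" \
    --query '[].[name,id]' -o tsv)"
  printf 'frontend_build=%s\n'  "$(_nco_pick_id "$frontend" "${repo}-${svc}-build")"
  printf 'frontend_deploy=%s\n' "$(_nco_pick_id "$frontend" "${repo}-${svc}-deploy")"
  printf 'iac_infra_build=%s\n' "$(_nco_pick_id "$iac" "${repo}-${svc}-infra-build")"
  printf 'iac_deploy_test=%s\n' "$(_nco_pick_id "$iac" "${repo}-${svc}-deploy-test")"
  printf 'iac_deploy_prod=%s\n' "$(_nco_pick_id "$iac" "${repo}-${svc}-deploy-prod")"
}

discover_pipelines frontend
```

The stub omits the deploy-prod pipeline on purpose, so the last line comes out with an empty value, which is the case the third test in 3.3 asks for.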
Validation
bash tests/test-PLAN-A-service-discovery.sh
User confirms phase is complete.
Phase 4: Container-app discovery + public URL — DONE
Tasks
- 4.1 Implement derive_containerapp_name <svc>:
  - Pure string derivation: repo name → lowercase → split on - → first segment → ca-<prefix>-<svc>.
  - No az calls.
- 4.2 Implement discover_containerapp <svc>:
  - Step (a): if SVC_APP_NAME_OVERRIDE and SVC_RG_OVERRIDE set, return them without calling Azure.
  - Step (b): require IAC__LOADED=1. Read COMMON_RESOURCE_GROUP_NAME from IAC_*. Call az containerapp list -g $IAC_COMMON_RESOURCE_GROUP_NAME --query "[?name=='<derived>']". If hit, return parsed name / resource_group / fqdn.
  - Step (c): if (b) returned nothing, list across subscription: az containerapp list --subscription $IAC_SUBSCRIPTION_ID --query "[?name=='<derived>']".
  - On all failures: print clear error naming the override flag the caller should pass (--app-name + --resource-group).
  - Each az call wrapped via the same NCO_AZ_OVERRIDE shim from 3.3.
- 4.3 Implement public_url_for <svc> <env>:
  - Requires SVC_CFG__LOADED=1 + IAC__LOADED=1 (errors clearly otherwise).
  - If SVC_CFG_ENABLE_PUBLIC_ENDPOINT != "true" → print empty + exit 0.
  - Otherwise return <svc>.<IAC_DNS_ZONE_NAME>.
- 4.4 Tests:
  - derive_containerapp_name: cases for ABC100001-myservice + frontend → ca-abc100001-frontend; verify lowercasing.
  - discover_containerapp: stub az to return matching app on first try (verify path b), no match on first + match on second (verify path c), no match anywhere (verify error message names the override flags).
  - public_url_for: returns <svc>.example.cloud when public; returns empty when not.
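The two pure functions from tasks 4.1 and 4.3 are small enough to sketch in full. The sketch assumes the repo name arrives through NOCLICKOPS_REPO_NAME (the plan names that override for read_iac_variables; using it here is an assumption) and that the sentinels and globals carry the names listed in the tasks. The error texts are illustrative.

```shell
# 4.1: repo name -> lowercase -> first dash-segment -> ca-<prefix>-<svc>.
derive_containerapp_name() {
  local svc="$1" repo="${NOCLICKOPS_REPO_NAME:-}" prefix
  if [ -z "$repo" ]; then
    printf '✗ repo name unknown; set NOCLICKOPS_REPO_NAME.\n' >&2
    return 1
  fi
  prefix="$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]')"
  prefix="${prefix%%-*}"               # keep everything before the first "-"
  printf 'ca-%s-%s\n' "$prefix" "$svc"
}

# 4.3: "<svc>.<DNS zone>" when the service is public, empty output otherwise.
public_url_for() {
  local svc="$1"
  if [ "${SVC_CFG__LOADED:-}" != "1" ] || [ "${IAC__LOADED:-}" != "1" ]; then
    printf '✗ run read_service_config and read_iac_variables first.\n' >&2
    return 1
  fi
  [ "${SVC_CFG_ENABLE_PUBLIC_ENDPOINT:-}" = "true" ] || return 0
  printf '%s.%s\n' "$svc" "${IAC_DNS_ZONE_NAME:-}"
}

# Illustrative values, mirroring the 4.4 test cases.
NOCLICKOPS_REPO_NAME="ABC100001-myservice"
SVC_CFG__LOADED=1
IAC__LOADED=1
SVC_CFG_ENABLE_PUBLIC_ENDPOINT="true"
IAC_DNS_ZONE_NAME="example.cloud"

derive_containerapp_name frontend
public_url_for frontend test
```

Returning success with empty output for a non-public service lets callers tell "not public" apart from "config not loaded", which fails with a non-zero status.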
Validation
bash tests/test-PLAN-A-service-discovery.sh
User confirms phase is complete.
Phase 5: End-to-end smoke + docs — DONE
Tasks
- 5.1 End-to-end smoke against the live test repo (manual; not a CI test): procedure documented in website/docs/contributors/lib-service-v2.md § "End-to-end smoke". Manual execution against the live target repo is the user's responsibility before the v2 release; tracked outside this PLAN.
- 5.2 Docs update:
  - Added website/docs/contributors/lib-service-v2.md (API, discovery-vs-derivation-vs-override, test shims, smoke procedure).
  - Added entry to website/docs/contributors/index.md.
  - Updated terchris/redaction-map.md's "What was NOT redacted" section to note lib/service-v2.sh is leak-free by design.
Validation
bash tests/run-all.sh
End-to-end smoke documented + executed against the live repo.
User confirms phase is complete.
Acceptance Criteria
- lib/service-v2.sh exists and passes its test file
- All 7 public functions implemented per signatures in the "Public API" section
- No nrx or other customer-tenant tokens hardcoded
- tests/run-all.sh still green; new v2 tests added and passing
- lib/service.sh (v1) untouched; no regression for v1 callers
- website/docs/contributors/lib-service-v2.md documents the module
- End-to-end smoke run against the live test repo passes
- PR description names which PLAN-* files (B/C/D/E/F) will consume this module
Files to Modify
- lib/service-v2.sh (new)
- tests/test-PLAN-A-service-discovery.sh (new)
- tests/_fixtures.sh (add make_v2_* helpers)
- website/docs/contributors/lib-service-v2.md (new)
- terchris/redaction-map.md (note v2 module is leak-free; local-only file)
Implementation Notes
Why "discover over hardcode"
v1's lib/service.sh line 82 hardcodes rg-${env}-nrx-${SVC_APP_NAME}. That nrx is the Red Cross tenant prefix. It works for Red Cross developers but bakes a customer identifier into open-source code. v2 deliberately avoids this:
- The RG name is discovered by az containerapp list against the IaC-vars-provided subscription/RG, OR
- the user overrides it with --resource-group / --app-name.
The Bicep templates in platform-infrastructure produce the RG name; the engineer owns that convention. noclickops just looks up what's there.
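The override-then-discover order can be sketched as below. The JMESPath projection, including the properties.configuration.ingress.fqdn path, and the tsv row shape are assumptions about how the az output would be flattened. The _nco_az stub, the resource-group names, and the empty fqdn= line in the override case are hypothetical. The stub finds nothing in the IaC-provided RG so that the subscription-wide fallback is exercised.

```shell
# Stub for the az shim so the sketch runs offline (all values made up).
_nco_az() {
  case "$*" in
    *" -g "*) ;;                        # no match in the IaC-provided RG
    *"--subscription"*)
      printf 'ca-abc100001-frontend\trg-test-example-abc100001\tfrontend.example.cloud\n'
      ;;
  esac
}

discover_containerapp() {
  local svc="$1" prefix name query row
  # (a) explicit overrides win and skip Azure entirely
  if [ -n "${SVC_APP_NAME_OVERRIDE:-}" ] && [ -n "${SVC_RG_OVERRIDE:-}" ]; then
    printf 'name=%s\nresource_group=%s\nfqdn=\n' "$SVC_APP_NAME_OVERRIDE" "$SVC_RG_OVERRIDE"
    return 0
  fi
  if [ "${IAC__LOADED:-}" != "1" ]; then
    printf '✗ IaC variables not loaded; run read_iac_variables first.\n' >&2
    return 1
  fi
  prefix="$(printf '%s' "${NOCLICKOPS_REPO_NAME:-}" | tr '[:upper:]' '[:lower:]')"
  name="ca-${prefix%%-*}-${svc}"
  query="[?name=='${name}'].[name,resourceGroup,properties.configuration.ingress.fqdn]"
  row=""
  # (b) look in the RG named by the IaC variables
  if [ -n "${IAC_COMMON_RESOURCE_GROUP_NAME:-}" ]; then
    row="$(_nco_az containerapp list -g "$IAC_COMMON_RESOURCE_GROUP_NAME" --query "$query" -o tsv)"
  fi
  # (c) fall back to the whole subscription
  if [ -z "$row" ]; then
    row="$(_nco_az containerapp list --subscription "${IAC_SUBSCRIPTION_ID:-}" --query "$query" -o tsv)"
  fi
  if [ -z "$row" ]; then
    printf '✗ container app %s not found; pass --app-name and --resource-group.\n' "$name" >&2
    return 1
  fi
  printf '%s\n' "$row" |
    awk -F '\t' '{ printf "name=%s\nresource_group=%s\nfqdn=%s\n", $1, $2, $3; exit }'
}

# Illustrative inputs.
NOCLICKOPS_REPO_NAME="ABC100001-myservice"
IAC__LOADED=1
IAC_COMMON_RESOURCE_GROUP_NAME="rg-common-example"
IAC_SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"

discover_containerapp frontend
```

No tenant token appears in the function: the resource group in the result is whatever the listing returned.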
Why a new file rather than mutating lib/service.sh
PLAN-B through PLAN-F each rewrite one command. Until all five land, both v1 and v2 commands must work side-by-side (so noclickops is usable on FrontendPlatform during the transition). Two libraries, two _NCO_*_LOADED guards. After PLAN-F merges, a v2-cleanup follow-up removes both lib/service.sh and lib/service.ps1 (v1's PowerShell mirror), and renames lib/service-v2.sh → lib/service.sh.
Why Bash-only
v2 drops PowerShell. The planned post-v2 rewrite is to Bun (TypeScript), which gives native multi-OS support without shell-flavor concerns. Maintaining .ps1 mirrors through v2 would be wasted effort the Bun rewrite supersedes. Windows-native users either run v1.5.x or use WSL/Git Bash for v2. v1's .ps1 files stay in the tree until PLAN-F cleanup; no new .ps1 work happens in v2.
Why API over clone for IaC vars
The IaC repo can be 100+ MB. Cloning to read 3 YAML files is wasteful. The ADO REST endpoint GET /_apis/git/repositories/<id>/items?path=... returns file content directly with an existing AAD token. No clone cache, no stale state, no .git/ to manage.
Acceptable trade-off: every noclickops info / deploy / logs / shell does ~2 small REST calls (~50 ms each) instead of a one-time clone followed by git pull. Net latency is similar; reliability is higher.
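A minimal sketch of the read path, assuming the override behaviour described in the Phase 2 tests: when NCO_ADO_REST_OVERRIDE is set, the shim serves that local file and ignores the URL. The function names _nco_ado_get and _nco_repo_name_from_url, the NCO_ADO_TOKEN variable, and the sample origin URL are hypothetical. The request URL is passed in as an opaque string because the plan does not spell out the full endpoint.

```shell
# Hypothetical shim name; NCO_ADO_REST_OVERRIDE is the override named in the plan.
_nco_ado_get() {
  local url="$1"
  if [ -n "${NCO_ADO_REST_OVERRIDE:-}" ]; then
    cat "$NCO_ADO_REST_OVERRIDE"        # tests: serve a local fixture, ignore the URL
  else
    # Assumption: a bearer token is already available in NCO_ADO_TOKEN.
    curl -fsS -H "Authorization: Bearer ${NCO_ADO_TOKEN:-}" "$url"
  fi
}

# Repo name from the origin URL: last path segment, without a trailing ".git".
_nco_repo_name_from_url() {
  local url="${1%/}" name
  name="${url##*/}"
  printf '%s\n' "${name%.git}"
}

# Fixture standing in for common.yaml (illustrative values only).
fixture="$(mktemp)"
printf 'DNS_ZONE_NAME: example.cloud\n' > "$fixture"
NCO_ADO_REST_OVERRIDE="$fixture"

iac_common="$(_nco_ado_get 'url-ignored-in-override-mode')"
repo_name="$(_nco_repo_name_from_url 'https://dev.azure.com/example-org/FrontendPlatform/_git/ABC100001-myservice')"
printf '%s\n%s\n' "$iac_common" "$repo_name"
```

With the override in place, the parsing loop can be tested against fixture files without an ADO token or network access.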
The az-shim pattern (for testability)
Every az call in this module goes through:
_nco_az() {
  # Route through the stub when NCO_AZ_OVERRIDE is set; otherwise call the real az.
  if [ -n "${NCO_AZ_OVERRIDE:-}" ]; then
    "$NCO_AZ_OVERRIDE" "$@"
  else
    az "$@"
  fi
}
Tests set NCO_AZ_OVERRIDE to a stub script that returns canned JSON. Same pattern existing tests use for bin/info etc.
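A test can wire up such a stub roughly as follows. The stub's file name, its canned JSON, and the pipeline name are hypothetical; only the _nco_az shim and the NCO_AZ_OVERRIDE variable come from this plan.

```shell
_nco_az() {
  if [ -n "${NCO_AZ_OVERRIDE:-}" ]; then
    "$NCO_AZ_OVERRIDE" "$@"
  else
    az "$@"
  fi
}

# Write a throwaway stub script that answers one az subcommand.
stub_dir="$(mktemp -d)"
cat > "$stub_dir/az-stub.sh" <<'EOF'
#!/bin/sh
# Canned response for "pipelines list"; anything else fails loudly.
case "$1 $2" in
  "pipelines list")
    printf '[{"id": 101, "name": "abc-myservice-frontend-build"}]\n'
    ;;
  *)
    printf 'az-stub: unexpected call: %s\n' "$*" >&2
    exit 1
    ;;
esac
EOF
chmod +x "$stub_dir/az-stub.sh"
NCO_AZ_OVERRIDE="$stub_dir/az-stub.sh"

canned="$(_nco_az pipelines list --project IaC)"
printf '%s\n' "$canned"
```

Making the stub fail on unexpected calls helps a test notice when the code under test issues an az command the test did not plan for.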
Out of scope for PLAN-A
- Any command-side changes (those are PLAN-B through PLAN-F).
- Removing lib/service.sh or lib/service.ps1 (post-PLAN-F cleanup).
- Any .ps1 work for v2 (PowerShell dropped; Bun rewrite takes over multi-OS).
- sync-lovable rework (deferred per the investigation).