# Plan: Add Employee Helper CLI and Home Manager Integration For Safe VPS Provisioning

## Summary

Build a packaged helper CLI that is exported from the existing root flake and installed through the existing Home Manager module. The helper suite will wrap the current manual provisioning flow into validated, guided commands that:

- inspect a live VPS over SSH
- scaffold `hosts//` files in this repo
- generate or update `disko.nix` safely from probed facts
- create the OpenBao policy/AppRole/bootstrap files for Tailscale enrollment
- generate and optionally execute the `nixos-anywhere` install command
- run post-install checks
- generate the corresponding `colmena` host stanza instructions

The helper CLI will be the supported employee interface. Raw manual commands remain possible, but the helper commands become the documented and safest path.

Chosen defaults:

- helper surface: packaged CLI installed by Home Manager
- mutation mode: interactive confirm for risky operations
- scope: full VPS workflow
- repo writes: yes, write directly into this repo after validation
- OpenBao auth: use the employee’s existing `bao` login/session
- host fact collection: SSH probe live host
- topology: implement as root-flake exports rather than a nested subflake, because the user marked either fine and this is lower-friction

## Goals

### User-facing goal

Employees should be able to provision a new VPS with a small number of safe commands and without manually editing Nix files or remembering the OpenBao/Tailscale bootstrap sequence.

### Success criteria

A new employee with:

- a checkout of this repo
- a working local `bao` login
- SSH access to a target VPS

can run helper commands that:

1. probe the target host and derive safe defaults
2. create or update the host files in `hosts//`
3. create the OpenBao AppRole bootstrap material for that host
4. print or run the exact `nixos-anywhere` command
5. verify first boot and tailnet enrollment
6. avoid footguns through validation, confirmations, and idempotency checks

## Architecture

## Repo additions

Add a helper package and Home Manager module to the existing root flake.

New files/directories:

- `pkgs/helpers/`
- `pkgs/helpers/cli.py` or equivalent main entrypoint
- `pkgs/helpers/templates/`
- `modules/helpers/home.nix`
- optional `pkgs/helpers/lib/` for command submodules
- optional `pkgs/helpers/README.md` if internal tool docs grow beyond the top-level README

Root flake exports to add:

- `packages..nodeiwest-helper`
- `apps..nodeiwest-helper`
- `homeManagerModules.helpers`

Existing `modules/home.nix` should import or include the helper Home Manager module so employees automatically get the command suite.

## Packaging choice

Use a packaged Python CLI with stdlib only unless a strong reason appears otherwise.

Why:

- safer structured argument parsing than shell aliases
- simpler file templating and repo mutation than Bash
- easier SSH probing output parsing and validation
- no need to depend on a persistent external runtime beyond `python3`
- clean packaging through `pkgs.writeShellApplication` or `python3Packages.buildPythonApplication`

The CLI should still shell out to:

- `ssh`
- `bao`
- `nix`
- `git`
- optionally `colmena`

These tools are already aligned with the repo’s workflow.

## Home Manager integration

### Existing behavior

Current `modules/home.nix` only installs:

- `openbao`
- `colmena`

### New behavior

Employees should also get:

- `nodeiwest` helper CLI command
- any runtime dependencies not guaranteed by the base environment, if needed

Recommended command name:

- `nodeiwest`

Reason:

- short
- organization-scoped
- extensible subcommands
- does not collide with upstream tools

## CLI command surface

The CLI should be subcommand-based and decision-complete from day one.

### 1. `nodeiwest host probe`

Purpose:

- SSH into the target host
- collect disk and boot facts
- output normalized machine facts

Inputs:

- `--ip `
- optional `--user ` default `root`

Behavior:

- run the exact discovery commands already documented in the README:
  - `lsblk -o NAME,SIZE,TYPE,MODEL,FSTYPE,PTTYPE,MOUNTPOINTS`
  - boot mode probe via `/sys/firmware/efi`
  - root mount source via `findmnt -no SOURCE /`
  - swap via `cat /proc/swaps`
- parse into structured output
- determine:
  - primary disk candidate
  - root partition
  - boot mode
  - whether current disk naming is `sda` / `vda` / `nvme`
  - whether current system appears UEFI or BIOS

Output:

- human-readable summary
- optional `--json` machine-readable output

Failure conditions:

- no SSH connectivity
- multiple ambiguous disk candidates
- missing required commands on remote host

### 2. `nodeiwest host init`

Purpose:

- create or update `hosts//configuration.nix`
- create or update `hosts//disko.nix`
- create placeholder `hardware-configuration.nix` if missing
- print exactly what changed
- ask for confirmation before writing

Inputs:

- `--name `
- `--ip `
- optional `--user ` default `root`
- optional overrides:
  - `--disk /dev/sda`
  - `--boot-mode uefi|bios`
  - `--swap-size 4GiB`
  - `--timezone UTC`
  - `--tailscale-openbao on|off` default `on`
- optional `--write` or `--apply`
  - without it: dry-run plan only
  - with it: write after interactive confirmation

Behavior:

- call `host probe` implicitly unless overrides are provided
- derive safe defaults from live facts
- validate host name format
- validate that target files do not already contain contradictory data
- create backup copies before overwriting existing tracked files
- update `configuration.nix` with:
  - hostName
  - boot loader config matching boot mode
  - Tailscale AppRole bootstrap enabled by default
- update `disko.nix` with:
  - correct disk device
  - GPT
  - correct UEFI/BIOS boot partition shape
  - ext4 root
  - swap partition using chosen/default size
- create placeholder `hardware-configuration.nix` if absent
- detect if host is missing from `flake.nix` and report exact block to add

Safety checks:

- if boot mode is BIOS but host config would still emit EFI loader, abort
- if probed disk differs from existing `disko.nix`, require explicit confirmation
- if repo has unrelated dirty changes in the exact target host files, warn and stop unless `--force`

### 3. `nodeiwest openbao init-host`

Purpose:

- create the OpenBao policy and AppRole for a host
- generate bootstrap files locally for install-time injection

Inputs:

- `--name `
- optional `--namespace it`
- optional `--secret-path tailscale`
- optional `--field auth_key`
- optional `--auth-path auth/approle`
- optional `--policy-name tailscale-`
- optional `--role-name tailscale-`
- optional `--out ./bootstrap`
- optional `--kv-mount-path `
- optional `--cidr ` repeatable if later needed
- optional `--apply`
  - without it: show policy/AppRole plan and output paths
  - with it: execute after interactive confirmation

Behavior:

- verify `bao` is available
- verify employee is already authenticated:
  - e.g. `bao token lookup` or equivalent harmless auth check
- generate policy content from inputs
- write policy via `bao policy write`
- write AppRole via `bao write auth/approle/role/...`
- fetch `role_id`
- generate `secret_id`
- create local bootstrap directory:
  - `bootstrap/var/lib/nodeiwest/openbao-approle-role-id`
  - `bootstrap/var/lib/nodeiwest/openbao-approle-secret-id`
- chmod both `0400`
- print exact next-step command to install with `nixos-anywhere`

Defaults:

- namespace `it`
- auth path `auth/approle`
- secret path `tailscale`
- field `auth_key`
- role name `tailscale-`
- policy name `tailscale-`

Important validation:

- if `bao kv get` equivalent path probe fails for the chosen namespace/path, abort with a clear explanation that the KV mount path/policy path likely needs adjustment
- if OpenBao is unreachable or auth is missing, fail fast

### 4. `nodeiwest install plan`

Purpose:

- assemble and print the exact `nixos-anywhere` command
- validate all required local files exist before install

Inputs:

- `--name `
- optional `--ip ` default from flake/host inventory if available
- optional `--bootstrap-dir ./bootstrap`
- optional `--copy-host-keys on|off` default `on`
- optional `--generate-hardware-config on|off` default `on`

Behavior:

- validate:
  - `hosts//configuration.nix` exists
  - `hosts//disko.nix` exists
  - bootstrap role_id/secret_id files exist
  - target host is present in `flake.nix` or warn with exact missing stanza
- print the exact install command
- print preflight checklist:
  - provider snapshot taken
  - app/data backup taken
  - public SSH reachable
  - host keys may change

No mutation beyond optional temp checks.

### 5. `nodeiwest install run`

Purpose:

- run the validated `nixos-anywhere` command

Inputs:

- same as `install plan`
- requires explicit `--apply`

Behavior:

- run the same validation as `install plan`
- print command and require interactive confirmation
- execute `nix run github:nix-community/nixos-anywhere -- ...`
- stream logs
- on success, print post-install verification commands

Safety:

- refuse to run without confirmation unless `--yes`
- refuse to run if bootstrap files are missing
- refuse to run if target IP is not reachable over SSH

### 6. `nodeiwest verify host`

Purpose:

- verify first boot and Tailscale/OpenBao bootstrap

Inputs:

- `--name `
- `--ip `
- optional `--user root`

Behavior:

- SSH to host and run:
  - `systemctl status vault-agent-tailscale`
  - `systemctl status nodeiwest-tailscale-authkey-ready`
  - `systemctl status tailscaled-autoconnect`
  - `tailscale status`
- summarize health
- print actionable failures grouped by likely cause:
  - missing AppRole files
  - OpenBao auth failed
  - wrong secret path / field
  - Tailscale autoconnect blocked

### 7. `nodeiwest colmena plan`

Purpose:

- ensure the host is ready for ongoing deploys
- print or verify the Colmena target block

Inputs:

- `--name `
- optional `--ip `

Behavior:

- check that `flake.nix` has `colmena..deployment.targetHost`
- if missing, print exact snippet to add
- print the post-install deploy command:
  - `nix run .#colmena -- apply --on `

## Templates and file generation

Use explicit templates stored under `pkgs/helpers/templates/`.

Templates to maintain:

- `configuration.nix.j2` equivalent
- `disko-uefi-ext4.nix`
- `disko-bios-ext4.nix` if BIOS support is intended
- `hardware-configuration.placeholder.nix`
- OpenBao policy template

Rendering rules:

- do not overwrite files blindly
- preserve existing CA key list and obvious host-specific overrides when possible
- if parsing existing files is too risky, switch to a guarded “abort and instruct user” policy rather than clever rewrites

Chosen default:

- for v1, only support the current ext4+swap single-disk pattern cleanly
- if the existing host directory contains custom structure outside the supported template shape, abort with a clear message instead of trying to merge arbitrarily

## Public interfaces to add

## Flake outputs

Add:

- `packages..nodeiwest-helper`
- `apps..nodeiwest-helper`
- `homeManagerModules.helpers`

The root Home Manager module should include or re-export the helper package, so employees get it automatically.
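As one possible reading of the stdlib-only templating described under "Templates and file generation", the sketch below renders a Nix template with `string.Template`. It is an illustration under assumptions, not the chosen implementation: the `NixTemplate` class, the `render` function, the `@{...}` placeholder syntax, and the template body are all hypothetical. The `@` delimiter is picked only so that Nix's own `${...}` interpolation passes through untouched.

```python
from string import Template


class NixTemplate(Template):
    """string.Template variant using '@' so Nix '${...}' is left alone."""

    delimiter = "@"


# Hypothetical template body; real templates would live under
# pkgs/helpers/templates/ and are not specified by this plan.
CONFIGURATION_TEMPLATE = """\
{ pkgs, ... }:
{
  # Nix interpolation such as ${pkgs.system} must survive rendering.
  networking.hostName = "@{name}";
  time.timeZone = "@{timezone}";
}
"""


def render(template_text: str, values: dict) -> str:
    # substitute() raises KeyError when a placeholder has no value, which
    # fits the "stop rather than guess" policy better than safe_substitute().
    return NixTemplate(template_text).substitute(values)


if __name__ == "__main__":
    print(render(CONFIGURATION_TEMPLATE, {"name": "vps1", "timezone": "UTC"}))
```

Failing on a missing value keeps a half-rendered file from ever reaching the repo; the caller can turn the `KeyError` into the "abort and instruct user" message.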
## Home Manager module contract

`modules/helpers/home.nix` should:

- install `packages.${pkgs.system}.nodeiwest-helper`
- optionally ensure helper runtime dependencies are present if not embedded by packaging

## CLI contract

Stable executable:

- `nodeiwest`

Stable subcommands for v1:

- `nodeiwest host probe`
- `nodeiwest host init`
- `nodeiwest openbao init-host`
- `nodeiwest install plan`
- `nodeiwest install run`
- `nodeiwest verify host`
- `nodeiwest colmena plan`

## Error-handling and safety policy

Global rules:

- every mutating command supports dry-run first
- every mutating command requires confirmation unless `--yes`
- every repo write command creates a timestamped backup copy of the target file
- every external command failure is surfaced with:
  - the exact subcommand
  - stdout/stderr summary
  - the next likely fix

Abort conditions:

- current repo is not the expected flake root
- `bao` is not authenticated for OpenBao actions
- target SSH host is unreachable
- disk facts are ambiguous
- target host files already exist in a shape the helper cannot safely reason about

## Out of scope for v1

Not included in the first helper bundle:

- multi-disk or ZFS disko generation
- non-AppRole Tailscale bootstrap flows
- OpenBao response wrapping automation
- automatic `flake.nix` AST rewriting beyond a tightly controlled supported block shape
- full Colmena block auto-editing if the flake layout drifts from the current structure
- vps2/OpenBao server provisioning

For unsupported cases, commands should fail with a precise message and the manual fallback step.
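The global rules in the error-handling and safety policy (dry-run first, confirmation, timestamped backup) could be combined in a single write helper along the following lines. This is a minimal sketch under assumptions: the function name `write_with_backup`, its return values, the injected `confirm` callback, and the timestamp format in the backup suffix are hypothetical and not fixed by this plan.

```python
import datetime
import shutil
from pathlib import Path
from typing import Callable


def write_with_backup(
    target: Path,
    new_text: str,
    *,
    apply: bool,
    confirm: Callable[[str], bool],
) -> str:
    """Return "noop", "planned", "declined", or "written"."""
    old_text = target.read_text() if target.exists() else None
    if old_text == new_text:
        return "noop"  # generated output matches the file on disk
    if not apply:
        return "planned"  # dry-run: report the change, touch nothing
    if not confirm(f"Write {target}?"):
        return "declined"
    if old_text is not None:
        # Assumed timestamp format; the plan only asks for a timestamped sibling.
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        backup = target.with_name(f"{target.name}.bak.{stamp}")
        shutil.copy2(target, backup)  # back up before overwriting
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(new_text)
    return "written"
```

Passing `confirm` as a callback keeps the interactive prompt out of the write logic, so a `--yes` flag can supply `lambda _: True` and tests can run without a terminal.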
## Implementation details

## Packaging

Recommended implementation:

- Python CLI in `pkgs/helpers/`
- build with `python3` from nixpkgs
- keep dependencies stdlib-only if possible

Why not shell aliases:

- too weak for idempotent file writes and validation
- poorer error reporting
- weaker SSH probe parsing

## File writing strategy

For repo-tracked file mutation:

- read current file if it exists
- compare against generated output
- if unchanged, report no-op
- if changed, write a `.bak.` sibling first, then overwrite target after confirmation

Chosen default:

- direct repo writes are allowed because the user explicitly asked for helpers that make the workflow hard to get wrong
- commands remain conservative and stop on unsupported customizations

## OpenBao auth assumptions

Helpers assume:

- user already has a valid `bao` login/session locally
- `BAO_ADDR` is already set by Home Manager or can be inferred
- namespace defaults to `it`

If auth is missing:

- fail fast
- print the exact `bao` command that must work before retrying

## SSH probing assumptions

Helpers use:

- `ssh root@` by default
- override via `--user`

The CLI should probe live host facts by default and allow manual overrides for exceptional cases.

## README integration

After the helper CLI exists, the top-level `README.md` should be revised to:

- keep the underlying manual process documented
- make the helper commands the recommended flow
- reduce the manual sequence to a fallback / advanced section

## Test cases and scenarios

### Static/evaluation tests

1. Flake exposes:
   - `packages..nodeiwest-helper`
   - `apps..nodeiwest-helper`
   - `homeManagerModules.helpers`
2. Home Manager module installs the helper package without breaking current `modules/home.nix`.
3. Generated commands reflect current repo paths:
   - `hosts//configuration.nix`
   - `hosts//disko.nix`
   - `hosts//hardware-configuration.nix`

### CLI behavior tests

1. `host probe` on a UEFI `/dev/sda` VPS returns:
   - boot mode `UEFI`
   - disk `/dev/sda`
   - root partition `/dev/sda2`
2. `host init` dry-run for a new host:
   - prints file plan
   - does not write files
3. `host init --apply`:
   - creates host files
   - uses probed disk and boot mode
   - creates backups when overwriting
4. `openbao init-host` with valid local `bao` auth:
   - writes policy and AppRole
   - creates bootstrap files with mode `0400`
5. `openbao init-host` without local `bao` auth:
   - fails before attempting writes
6. `install plan`:
   - refuses to proceed if bootstrap files are missing
   - prints exact `nixos-anywhere` command otherwise
7. `install run`:
   - requires confirmation
   - executes the printed command
8. `verify host` after successful install:
   - reports agent healthy
   - reports tailscaled autoconnect healthy
   - reports Tailscale joined

### Failure scenarios

1. Multiple candidate disks on target host:
   - helper aborts and requires explicit `--disk`
2. BIOS host with UEFI template:
   - helper aborts before writing
3. Existing custom `disko.nix` not matching supported template:
   - helper aborts with “manual intervention required”
4. OpenBao secret path exists but field `auth_key` is missing:
   - helper fails during validation with a precise message
5. SSH probe works but `nixos-anywhere` later loses connectivity:
   - helper reports the exact command that failed and reminds user to recover via provider console/public SSH

## Assumptions and defaults

Chosen assumptions:

- keep one root flake, not a nested helper flake
- helper package is installed through Home Manager
- command name is `nodeiwest`
- direct repo writes are acceptable with backups and confirmation
- employees already authenticate to OpenBao manually before using helper OpenBao commands
- live SSH probing is preferred over manual fact entry
- first release only supports the current single-disk ext4+swap provisioning shape cleanly
- helper commands should be conservative and stop rather than guess in unsupported cases

If these assumptions remain acceptable, the implementation can proceed without further design decisions.
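To illustrate the "stop rather than guess" assumption applied to the multiple-candidate-disk failure scenario, the sketch below selects a primary disk only when exactly one candidate exists. It assumes the `lsblk` output has already been parsed into records; `BlockDevice`, `AmbiguousDiskError`, and `pick_primary_disk` are hypothetical names used for illustration, not part of the plan.

```python
from dataclasses import dataclass
from typing import List, Optional


class AmbiguousDiskError(Exception):
    """Raised when the probe cannot name exactly one candidate disk."""


@dataclass(frozen=True)
class BlockDevice:
    name: str  # lsblk NAME column, e.g. "sda"
    type: str  # lsblk TYPE column, e.g. "disk" or "part"


def pick_primary_disk(
    devices: List[BlockDevice], override: Optional[str] = None
) -> str:
    """Return the install disk path, or raise instead of guessing."""
    if override:
        return override  # an explicit --disk value always wins
    disks = [d for d in devices if d.type == "disk"]
    if len(disks) != 1:
        names = ", ".join(d.name for d in disks) or "none"
        raise AmbiguousDiskError(
            f"expected exactly one disk, found {len(disks)} ({names}); "
            "re-run with an explicit --disk"
        )
    return f"/dev/{disks[0].name}"
```

For example, a host reporting `sda` with partitions `sda1` and `sda2` would yield `/dev/sda`, while a host that also reports `sdb` as a disk would raise and point the user at `--disk`.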