26 changes: 23 additions & 3 deletions CLAUDE.md
@@ -13,8 +13,9 @@ Two-Node Toolbox (TNF) is a comprehensive deployment automation framework for Op
# From the deploy/ directory:

# Deploy AWS hypervisor and cluster in one command
make deploy arbiter-ipi # Deploy arbiter topology cluster
make deploy fencing-ipi # Deploy fencing topology cluster
make deploy fencing-assisted # Deploy hub + spoke TNF via assisted installer

# Instance lifecycle management
make create # Create new EC2 instance
@@ -70,6 +71,15 @@ ansible-playbook kcli-install.yml -i inventory.ini -e "test_cluster_name=my-clus
ansible-playbook kcli-install.yml -i inventory.ini -e "force_cleanup=true"
```

#### Assisted Installer Method (Spoke TNF via ACM)
```bash
# Copy and customize the configuration template
cp vars/assisted.yml.template vars/assisted.yml

# Deploy hub + spoke TNF cluster via assisted installer
make deploy fencing-assisted
```
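
The template exposes the spoke and hub settings consumed by `assisted-install.yml`. A minimal `vars/assisted.yml` might look like the sketch below; the variable names match those referenced by the playbook, while the values are purely illustrative and must be adapted to your environment:

```yaml
# Illustrative vars/assisted.yml (values are examples only)
hub_operator: acm                   # or "mce"
spoke_cluster_name: spoke-tnf       # example spoke name
spoke_base_domain: example.com      # example base domain
spoke_ctlplanes: 2                  # TNF spokes use two control-plane nodes
spoke_api_vip: 192.168.111.120      # example VIP on the spoke network
spoke_ingress_vip: 192.168.111.121  # example VIP on the spoke network
spoke_release_image: auto           # "auto" reuses the hub release image
```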

### Linting and Validation
```bash
# Shell script linting (from repository root)
@@ -88,14 +98,17 @@ make shellcheck
- Automatic inventory management for Ansible integration

2. **OpenShift Cluster Deployment** (`deploy/openshift-clusters/`)
- Two deployment methods: dev-scripts (traditional) and kcli (modern)
- Three deployment methods: dev-scripts (traditional), kcli (modern), and assisted installer (spoke via ACM)
- Ansible roles for complete cluster automation
- Support for both arbiter and fencing topologies
- Assisted installer deploys spoke TNF clusters on an existing hub via ACM/MCE
- Proxy configuration for external cluster access

3. **Ansible Roles Architecture**:
- `dev-scripts/install-dev`: Traditional deployment using openshift-metal3/dev-scripts
- `kcli/kcli-install`: Modern deployment using kcli virtualization management
- `assisted/acm-install`: Install ACM/MCE + assisted service + enable TNF on hub
- `assisted/assisted-spoke`: Deploy spoke TNF cluster via assisted installer + BMH
- `proxy-setup`: Squid proxy for cluster external access
- `redfish`: Automated stonith configuration for fencing topology
- `config`: SSH key and git configuration
@@ -119,16 +132,23 @@ make shellcheck
- `roles/kcli/kcli-install/files/pull-secret.json`: OpenShift pull secret
- SSH key automatically read from `~/.ssh/id_ed25519.pub` on ansible controller

#### Assisted Installer Method
- `vars/assisted.yml`: Variable override file (copy from `vars/assisted.yml.template`)
- Hub cluster must be deployed first via dev-scripts (`make deploy fencing-ipi`)
- Spoke credentials output to `~/<spoke_cluster_name>/auth/` on hypervisor
- Hub proxy preserved as `hub-proxy.env`

#### Generated Files
- `proxy.env`: Generated proxy configuration (source this to access cluster)
- `hub-proxy.env`: Hub proxy config (preserved when spoke proxy is configured)
- `kubeconfig`: OpenShift cluster kubeconfig
- `kubeadmin-password`: Default admin password

### Development Workflow

1. **Environment Setup**: Use `deploy/aws-hypervisor/` tools or bring your own RHEL 9 host
2. **Configuration**: Edit inventory and config files based on chosen deployment method
3. **Deployment**: Run appropriate Ansible playbook (setup.yml or kcli-install.yml)
3. **Deployment**: Run appropriate Ansible playbook (setup.yml, kcli-install.yml, or assisted-install.yml)
4. **Access**: Source `proxy.env` and use `oc` commands or WebUI through proxy
5. **Cleanup**: Use cleanup make targets or Ansible playbooks

5 changes: 5 additions & 0 deletions deploy/Makefile
@@ -55,6 +55,10 @@ arbiter-ipi:
arbiter-agent:
@./openshift-clusters/scripts/deploy-arbiter-agent.sh

fencing-assisted:
@$(MAKE) fencing-ipi
@./openshift-clusters/scripts/deploy-fencing-assisted.sh

patch-nodes:
@./openshift-clusters/scripts/patch-nodes.sh
get-tnf-logs:
@@ -82,6 +86,7 @@ help:
@echo " fencing-agent - Deploy fencing Agent cluster (non-interactive) (WIP Experimental)"
@echo " arbiter-ipi - Deploy arbiter IPI cluster (non-interactive)"
@echo " arbiter-agent - Deploy arbiter Agent cluster (non-interactive)"
@echo " fencing-assisted - Deploy hub + spoke TNF cluster via assisted installer"
@echo " redeploy-cluster - Redeploy OpenShift cluster using dev-scripts make redeploy"
@echo " shutdown-cluster - Shutdown OpenShift cluster VMs in orderly fashion"
@echo " startup-cluster - Start up OpenShift cluster VMs and proxy container"
2 changes: 1 addition & 1 deletion deploy/aws-hypervisor/scripts/create.sh
@@ -42,7 +42,7 @@ echo -e "AMI ID: $RHEL_HOST_AMI"
echo -e "Machine Type: $EC2_INSTANCE_TYPE"

ec2Type="VirtualMachine"
if [[ "$EC2_INSTANCE_TYPE" =~ c[0-9]+[gn].metal ]]; then
if [[ "$EC2_INSTANCE_TYPE" =~ c[0-9]+[a-z]*.metal ]]; then
ec2Type="MetalMachine"
fi

154 changes: 154 additions & 0 deletions deploy/openshift-clusters/assisted-install.yml
@@ -0,0 +1,154 @@
---
# Deploy a spoke TNF cluster via ACM/assisted installer on an existing hub cluster.
#
# Prerequisites:
# - vars/assisted.yml exists (copy from vars/assisted.yml.template)
#
# Usage:
# make deploy fencing-assisted

- hosts: metal_machine
gather_facts: yes

vars:
topology: fencing
interactive_mode: false
pull_secret_path: /opt/dev-scripts/pull_secret.json
hub_kubeconfig: "{{ ansible_user_dir }}/auth/kubeconfig"
method: assisted
cluster_state_dir: "../aws-hypervisor/instance-data"
cluster_state_filename: "cluster-vm-state.json"

vars_files:
- vars/assisted.yml

pre_tasks:
- name: Check that proxy.env exists (hub must be deployed first)
stat:
path: "{{ playbook_dir }}/proxy.env"
delegate_to: localhost
register: proxy_env_check

- name: Fail if proxy.env is missing
fail:
msg: >-
proxy.env not found. The hub cluster must be deployed first
using 'make deploy fencing-ipi'. proxy.env is required for
cluster access.
when: not proxy_env_check.stat.exists

- name: Check that hub kubeconfig exists
stat:
path: "{{ ansible_user_dir }}/auth/kubeconfig"
register: hub_kubeconfig_check

- name: Fail if hub kubeconfig is missing
fail:
msg: >-
Hub kubeconfig not found at ~/auth/kubeconfig.
The hub cluster must be deployed first.
when: not hub_kubeconfig_check.stat.exists

- name: Preserve hub proxy.env as hub-proxy.env
copy:
src: "{{ playbook_dir }}/proxy.env"
dest: "{{ playbook_dir }}/hub-proxy.env"
remote_src: no
backup: no
delegate_to: localhost

- name: Get hub release image
shell: |
oc get clusterversion version -o jsonpath='{.status.desired.image}'
register: hub_release_image_raw
changed_when: false
environment:
KUBECONFIG: "{{ hub_kubeconfig }}"

- name: Get hub OCP version
shell: |
oc get clusterversion version -o jsonpath='{.status.desired.version}' | cut -d. -f1-2
register: hub_ocp_version_raw
changed_when: false
environment:
KUBECONFIG: "{{ hub_kubeconfig }}"

- name: Set hub release facts
set_fact:
hub_release_image: "{{ hub_release_image_raw.stdout }}"
hub_ocp_version: "{{ hub_ocp_version_raw.stdout }}"
effective_release_image: >-
{{ hub_release_image_raw.stdout if spoke_release_image == 'auto'
else spoke_release_image }}
effective_ocp_version: "{{ hub_ocp_version_raw.stdout }}"

- name: Display assisted installer configuration
debug:
msg: |
Assisted Installer Configuration:
Hub operator: {{ hub_operator }}
ACM/MCE channel: {{ acm_channel if hub_operator == 'acm' else mce_channel }}
Spoke cluster: {{ spoke_cluster_name }}.{{ spoke_base_domain }}
Spoke release image: {{ spoke_release_image }}
Spoke VMs: {{ spoke_ctlplanes }}x ({{ spoke_vm_vcpus }} vCPUs, {{ spoke_vm_memory }}MB RAM, {{ spoke_vm_disk_size }}GB disk)
Spoke network: {{ spoke_network_cidr }}
API VIP: {{ spoke_api_vip }}
Ingress VIP: {{ spoke_ingress_vip }}
Storage method: {{ assisted_storage_method }}
Force cleanup: {{ force_cleanup }}

- name: Update cluster state to deploying
include_role:
name: common
tasks_from: cluster-state
vars:
cluster_state_phase: 'deploying'
default_playbook_name: 'assisted-install.yml'
num_masters: "{{ spoke_ctlplanes }}"
num_workers: 0

roles:
- role: assisted/acm-install
- role: assisted/assisted-spoke

post_tasks:
- name: Setup proxy access for spoke cluster
include_role:
name: proxy-setup
vars:
kubeconfig_path: "{{ spoke_kubeconfig_path }}"
kubeadmin_password_path: "{{ spoke_kubeadmin_password_path }}"

- name: Update cluster inventory with spoke VMs
include_role:
name: common
tasks_from: update-cluster-inventory
vars:
test_cluster_name: "{{ spoke_cluster_name }}"

- name: Update cluster state to deployed
include_role:
name: common
tasks_from: cluster-state
vars:
cluster_state_phase: 'deployed'
default_playbook_name: 'assisted-install.yml'
num_masters: "{{ spoke_ctlplanes }}"
num_workers: 0

- name: Display deployment summary
debug:
msg: |
Spoke TNF cluster deployed successfully!

Spoke credentials:
Kubeconfig: {{ spoke_kubeconfig_path }}
Admin password: {{ spoke_kubeadmin_password_path }}

Access spoke cluster:
source proxy.env
KUBECONFIG={{ spoke_kubeconfig_path }} oc get nodes

Access hub cluster:
source hub-proxy.env
KUBECONFIG=~/auth/kubeconfig oc get nodes
2 changes: 2 additions & 0 deletions deploy/openshift-clusters/collections/requirements.yml
@@ -13,3 +13,5 @@ collections:
version: ">=2.0"
- name: community.general
version: ">=5.0.0"
- name: ansible.utils
version: ">=2.0.0"
79 changes: 79 additions & 0 deletions deploy/openshift-clusters/roles/assisted/acm-install/README.md
@@ -0,0 +1,79 @@
# acm-install Role

Installs the ACM or MCE operator on a hub cluster and configures the assisted installer service for spoke TNF cluster deployment.

## Description

This role prepares an existing hub OpenShift cluster to deploy spoke TNF clusters via the assisted installer. It:

1. Validates hub cluster health and prerequisites
2. Provisions hostPath storage for the assisted service
3. Installs the ACM or MCE operator (auto-detects channel)
4. Creates the AgentServiceConfig with the RHCOS ISO auto-extracted from the hub release image (see the sketch after this list)
5. Enables TNF cluster support in the assisted service
6. Configures BMO to watch all namespaces and disables the provisioning network
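
For reference, the AgentServiceConfig created in step 4 generally takes the shape sketched below. The storage sizes and class name reflect the role defaults; the osImages entry is a placeholder, since the role derives the actual RHCOS ISO URL and version from the hub release image:

```yaml
# Sketch of the AgentServiceConfig managed by this role (values illustrative)
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    storageClassName: assisted-service   # assisted_storage_class default
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 10Gi                    # assisted_db_size default
  imageStorage:
    storageClassName: assisted-service
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 50Gi                    # assisted_images_size default
  osImages:
    - openshiftVersion: "4.19"           # placeholder; the role uses the hub OCP version
      cpuArchitecture: x86_64
      url: "<rhcos-live-iso-url-extracted-from-hub-release>"
      version: "<rhcos-build-version>"
```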

## Requirements

- A running hub OpenShift cluster (deployed via `make deploy fencing-ipi` or equivalent)
- Hub kubeconfig accessible at `~/auth/kubeconfig`
- Pull secret with access to required registries
- `oc` CLI available on the hypervisor

## Role Variables

### Configurable Variables (defaults/main.yml)

- `hub_operator`: Operator to install - `"acm"` or `"mce"` (default: `"acm"`)
- `acm_channel`: ACM operator channel - `"auto"` detects from packagemanifest (default: `"auto"`)
- `mce_channel`: MCE operator channel (default: `"auto"`)
- `assisted_storage_method`: Storage backend - currently only `"hostpath"` (default: `"hostpath"`)
- `assisted_images_path`: Host directory for ISO images (default: `/var/lib/assisted-images`)
- `assisted_db_path`: Host directory for database (default: `/var/lib/assisted-db`)
- `assisted_images_size`: PV size for images (default: `50Gi`)
- `assisted_db_size`: PV size for database (default: `10Gi`)
- `assisted_storage_class`: StorageClass name (default: `assisted-service`)
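
Because the playbook loads `vars/assisted.yml` as a vars file, these defaults can be overridden there. A small illustrative sketch (example values, not recommendations):

```yaml
# Example overrides in vars/assisted.yml (illustrative)
hub_operator: mce            # install MCE instead of the default ACM
assisted_images_size: 100Gi  # grow the ISO image PV beyond the 50Gi default
```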

### Timeout Variables

- `acm_csv_timeout`: Operator CSV install timeout in seconds (default: `900`)
- `multiclusterhub_timeout`: MultiClusterHub readiness timeout (default: `1800`)
- `assisted_service_timeout`: Assisted service pod readiness timeout (default: `600`)
- `metal3_stabilize_timeout`: Metal3 pod stabilization timeout after provisioning changes (default: `300`)

### Variables Set by Playbook

These are set in `assisted-install.yml` and passed to the role:

- `hub_kubeconfig`: Path to hub cluster kubeconfig
- `pull_secret_path`: Path to pull secret on the hypervisor
- `hub_release_image`: Hub cluster release image (extracted in playbook pre_tasks)
- `hub_ocp_version`: Hub OCP version major.minor (extracted in playbook pre_tasks)
- `effective_release_image`: Release image to use for the spoke (hub image or user override)

## Task Flow

1. **validate.yml** - Checks hub cluster health, node readiness, and API access
2. **storage.yml** - Creates hostPath PVs, StorageClass, and fixes permissions/SELinux on hub nodes
3. **install-operator.yml** - Installs ACM/MCE operator subscription, waits for CSV, creates MultiClusterHub
4. **agent-service-config.yml** - Extracts RHCOS ISO URL from release image, creates AgentServiceConfig
5. **enable-tnf.yml** - Enables TNF support in assisted service configuration
6. **enable-watch-all-namespaces.yml** - Patches Provisioning CR to enable BMO in all namespaces
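
The watch-all-namespaces and provisioning-network changes described above amount to a hub `Provisioning` resource that looks roughly like the sketch below (shown as the desired state, not the literal patch the role issues):

```yaml
# Sketch of the Provisioning CR state after step 6 (illustrative)
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
  name: provisioning-configuration
spec:
  watchAllNamespaces: true       # let BMO manage BareMetalHosts in any namespace
  provisioningNetwork: Disabled  # no dedicated provisioning network
```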

## Usage

This role is not called directly. It is invoked via `assisted-install.yml`:

```bash
make deploy fencing-assisted
# or
ansible-playbook assisted-install.yml -i inventory.ini
```

## Troubleshooting

- Check operator CSV status: `oc get csv -n open-cluster-management`
- Check MultiClusterHub status: `oc get multiclusterhub -n open-cluster-management`
- Check assisted service pods: `oc get pods -n multicluster-engine -l app=assisted-service`
- Check AgentServiceConfig: `oc get agentserviceconfig agent -o yaml`
- Check events: `oc get events -n multicluster-engine --sort-by='.lastTimestamp'`
25 changes: 25 additions & 0 deletions deploy/openshift-clusters/roles/assisted/acm-install/defaults/main.yml
@@ -0,0 +1,25 @@
---
# Default variables for acm-install role

# Hub operator to install: "acm" or "mce"
hub_operator: acm

# ACM/MCE channel: "auto" detects from packagemanifest
acm_channel: "auto"
mce_channel: "auto"

# Storage method for assisted service: "hostpath"
assisted_storage_method: "hostpath"

# hostPath directories on hub nodes
assisted_images_path: /var/lib/assisted-images
assisted_db_path: /var/lib/assisted-db
assisted_images_size: 50Gi
assisted_db_size: 10Gi
assisted_storage_class: assisted-service

# Timeouts (seconds)
acm_csv_timeout: 900
multiclusterhub_timeout: 1800
assisted_service_timeout: 600
metal3_stabilize_timeout: 300