174 changes: 81 additions & 93 deletions docs/en/installation/ai-cluster.mdx
@@ -8,7 +8,7 @@ weight: 30

To begin, you will need to deploy the **Alauda AI Operator**. This is the core engine for all Alauda AI products. By default, it uses the **KServe** `Raw Deployment` mode for the inference backend, which is particularly recommended for resource-intensive generative workloads. This mode provides a straightforward way to deploy models and offers robust, customizable deployment capabilities by leveraging foundational Kubernetes functionalities.

If your use case requires `Serverless` functionality, which enables advanced features like **scaling to zero on demand** for cost optimization, you can optionally install the **Alauda AI Model Serving Operator**. This operator is not part of the default installation and can be added at any time to enable `Serverless` functionality.
If your use case requires `Serverless` functionality, which enables advanced features like **scaling to zero on demand** for cost optimization, you can optionally install the **Knative CE Operator**. This operator is not part of the default installation and can be added at any time to enable `Serverless` functionality.



@@ -26,19 +26,19 @@ If your use case requires `Serverless` functionality, which enables advanced fea

_Download package: aml-operator.xxx.tgz_

- **Alauda AI Model Serving Operator**
- **Knative CE Operator**

Alauda AI Model Serving Operator provides serverless model inference.
Knative CE Operator provides serverless model inference.

_Download package: kserveless-operator.xxx.tgz_
_Download package: knative-operator.ALL.v1.x.x-yymmdd.tgz_

:::info
You can download the app named 'Alauda AI' and 'Alauda AI Model Serving' from the Marketplace on the Customer Portal website.
You can download the apps named 'Alauda AI' and 'Knative CE Operator' from the Marketplace on the Customer Portal website.
:::

## Uploading

We need to upload both `Alauda AI` and `Alauda AI Model Serving` to the cluster where Alauda AI is to be used.
We need to upload both `Alauda AI` and `Knative CE Operator` to the cluster where Alauda AI is to be used.

<Steps>

@@ -66,7 +66,7 @@ export PLATFORM_ADMIN_PASSWORD=<admin-password> # [!code callout]
export CLUSTER=<cluster-name> # [!code callout]

export AI_CLUSTER_OPERATOR_NAME=<path-to-aml-operator-tarball> # [!code callout]
export KSERVELESS_OPERATOR_PKG_NAME=<path-to-kserveless-operator-tarball> # [!code callout]
export KNATIVE_CE_OPERATOR_PKG_NAME=<path-to-knative-operator-tarball> # [!code callout]

VIOLET_EXTRA_ARGS=()
IS_EXTERNAL_REGISTRY=
@@ -97,9 +97,9 @@ violet push \
${VIOLET_EXTRA_ARGS[@]}

# [!code highlight]
# Push **KServeless** operator package to destination cluster
# Push **Knative CE Operator** package to destination cluster
violet push \
${KSERVELESS_OPERATOR_PKG_NAME} \
${KNATIVE_CE_OPERATOR_PKG_NAME} \
--platform-address=${PLATFORM_ADDRESS} \
--platform-username=${PLATFORM_ADMIN_USER} \
--platform-password=${PLATFORM_ADMIN_PASSWORD} \
@@ -114,14 +114,14 @@ violet push \
3. `${PLATFORM_ADMIN_PASSWORD}` is the password of the ACP platform admin.
4. `${CLUSTER}` is the name of the cluster to install the Alauda AI components into.
5. `${AI_CLUSTER_OPERATOR_NAME}` is the path to the Alauda AI Cluster Operator package tarball.
6. `${KSERVELESS_OPERATOR_PKG_NAME}` is the path to the KServeless Operator package tarball.
6. `${KNATIVE_CE_OPERATOR_PKG_NAME}` is the path to the Knative CE Operator package tarball.
7. `${REGISTRY_ADDRESS}` is the address of the external registry.
8. `${REGISTRY_USERNAME}` is the username of the external registry.
9. `${REGISTRY_PASSWORD}` is the password of the external registry.

</Callouts>

After configuration, execute the script file using `bash ./uploading-ai-cluster-packages.sh` to upload both `Alauda AI` and `Alauda AI Model Serving` operator.
After configuration, execute the script file using `bash ./uploading-ai-cluster-packages.sh` to upload both `Alauda AI` and `Knative CE Operator`.
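Before executing the script against a live platform, it can help to sanity-check how the documented variables combine into the `violet push` invocation. The sketch below is illustrative only: all values are placeholders, it uses only the flags shown above, and it prints the command instead of running it (the `violet` CLI and a reachable platform are assumed for the real run).

```shell
# Dry-run sketch: assemble and print the violet push command for the
# Knative CE Operator package. All values below are placeholders.
PLATFORM_ADDRESS=https://platform.example.com
PLATFORM_ADMIN_USER=admin
PLATFORM_ADMIN_PASSWORD='changeme'
KNATIVE_CE_OPERATOR_PKG_NAME=./knative-operator.ALL.v1.x.x-yymmdd.tgz

# Build the command as an array, mirroring how the upload script passes flags.
CMD=(violet push "${KNATIVE_CE_OPERATOR_PKG_NAME}"
  --platform-address="${PLATFORM_ADDRESS}"
  --platform-username="${PLATFORM_ADMIN_USER}"
  --platform-password="${PLATFORM_ADMIN_PASSWORD}")

# Print instead of executing, so the invocation can be reviewed first.
echo "${CMD[@]}"
```

Replacing the final `echo` with `"${CMD[@]}"` would execute the push for real.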

</Steps>

@@ -137,7 +137,7 @@ In **Administrator** view:
2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install Alauda AI.
3. Select **Alauda AI**, then click **Install**.

**Install Alauda AI** window will popup.
**Install Alauda AI** window will pop up.

4. In the **Install Alauda AI** window:
5. Leave **Channel** unchanged.
@@ -195,9 +195,6 @@ In **Administrator** view:

By default, the configuration uses the `SelfSigned` certificate type for securing ingress traffic to your cluster; the certificate is
stored in the `knative-serving-cert` secret that is specified in the **Domain Certificate Secret** field.

To use certificate provided by your own, store the certificate secret in the `istio-system` namespace, then update the value of the
**Domain Certificate Secret** field, and change the value of the **Domain Certificate Secret** field to `Provided`.
:::

11. In the **Serverless Configuration** section, set **Knative Serving Provider** to **Operator**; leave all other parameters blank.
@@ -230,115 +227,106 @@ Now, the core capabilities of Alauda AI have been successfully deployed. If you

Serverless functionality is an optional capability that requires an additional operator and instance to be deployed.

### 1. Installing the Alauda AI Model Serving Operator
### 1. Installing the Knative CE Operator


<Steps>

### Prerequisites

The `Serverless` capability relies on the **Istio** `Gateway` for its networking. Please install the **Service Mesh** first by following the [documentation](./pre-configuration.mdx#deploy-service-mesh).
:::info
Starting from **Knative CE Operator**, the Knative networking layer switches to **Kourier**, so installing **Istio** is no longer required.
:::
**Review comment on lines +235 to +237** (⚠️ Potential issue | 🟡 Minor)

**Clarify whether the `istio-system` namespace is created in the base deployment.**

Lines 199–200 reference storing certificates in the `istio-system` namespace during initial setup, but lines 238–240 state Istio is no longer required for Knative. Clarify whether `istio-system` is still provisioned as part of the core cluster setup, or if an alternative namespace should be used for custom certificates when Istio is not installed.


### Procedure
#### Procedure

In **Administrator** view:

1. Click **Marketplace / OperatorHub**.
2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install.
3. Select **Alauda AI Model Serving**, then click **Install**.
3. Search for and select **Knative CE Operator**, then click **Install**.

**Install Alauda AI Model Serving** window will popup.
**Install Knative CE Operator** window will pop up.

4. Then in the **Install Alauda AI Model Serving** window.
4. In the **Install Knative CE Operator** window:
5. Leave **Channel** unchanged.
6. Check whether the **Version** matches the **Alauda AI Model Serving** version you want to install.
7. Leave **Installation Location** unchanged, it should be `kserveless-operator` by default.
6. Check whether the **Version** matches the **Knative CE Operator** version you want to install.
7. Leave **Installation Location** unchanged.
8. Select **Manual** for **Upgrade Strategy**.
9. Click **Install**.

### Verification
#### Verification

Confirm that the **Alauda AI Model Serving** tile shows one of the following states:
Confirm that the **Knative CE Operator** tile shows one of the following states:

- `Installing`: installation is in progress; wait for this to change to `Installed`.
- `Installed`: installation is complete.

</Steps>


### 2. Creating Alauda AI Model Serving Instance

Once **Alauda AI Model Serving Operator** is installed, you can create an instance. There are two ways to do this:
### 2. Creating Knative Serving Instance

#### **Automated Creation (Recommended)**
Once the **Knative CE Operator** is installed, you need to create the `KnativeServing` instance manually.

You can have the instance automatically created and managed by the `AmlCluster` by editing its parameters.
<Steps>

In **Administrator** view:

1. Click **Marketplace / OperatorHub**.
2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you previously installed the `AmlCluster`.
3. Select **Alauda AI**, then click it.
4. In the **Alauda AI** page, click **All Instances** from the tab.
5. Click name **default**.
6. Locate the **Actions** dropdown list and select **Update**.

**update default** form will show up.

7. In the **Serverless Configuration** section:
1. Set **Knative Serving Provider** to `Legacy`.
2. Set **BuiltIn Knative Serving** to `Managed`.

8. Leave all other parameters unchanged. Click **Update**.

</Steps>

#### **Manual Creation and Integration**

You can manually create the `KnativeServing (knativeservings.components.aml.dev)` instance.
#### Procedure

1. Create the `knative-serving` namespace.

```bash
kubectl create ns knative-serving
```

2. In the **Administrator** view, navigate to **Operators** -> **Installed Operators**.
3. Select the **Knative CE Operator**.
4. Under **Provided APIs**, locate **KnativeServing** and click **Create Instance**.
5. Switch to **YAML view**.
6. Replace the content with the following YAML, then click **Create**:

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
name: knative-serving
namespace: knative-serving
spec:
config:
deployment:
registries-skipping-tag-resolving: kind.local,ko.local,dev.local,private-registry # [!code callout]
domain:
example.com: ""
features:
kubernetes.podspec-affinity: enabled
kubernetes.podspec-hostipc: enabled
kubernetes.podspec-hostnetwork: enabled
kubernetes.podspec-init-containers: enabled
kubernetes.podspec-nodeselector: enabled
kubernetes.podspec-persistent-volume-claim: enabled
kubernetes.podspec-persistent-volume-write: enabled
kubernetes.podspec-securitycontext: enabled
kubernetes.podspec-tolerations: enabled
kubernetes.podspec-volumes-emptydir: enabled
queueproxy.resource-defaults: enabled
network:
domain-template: '{{.Name}}.{{.Namespace}}.{{.Domain}}'
ingress-class: kourier.ingress.networking.knative.dev
ingress:
kourier:
enabled: true
```
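One setting in this manifest worth understanding is `domain-template`, which determines the hostname each inference service receives. The sketch below is illustrative only (the service name, namespace, and domain are assumed sample values, not real cluster state) and mimics the substitution Knative performs:

```shell
# Illustrative only: render Knative's domain-template for a sample service.
# NAME, NAMESPACE, and DOMAIN are assumed sample values.
NAME=my-model
NAMESPACE=demo
DOMAIN=example.com
TEMPLATE='{{.Name}}.{{.Namespace}}.{{.Domain}}'

# Substitute each template field, as Knative does when building the host.
HOST=$(printf '%s' "$TEMPLATE" \
  | sed -e "s/{{\.Name}}/$NAME/" \
        -e "s/{{\.Namespace}}/$NAMESPACE/" \
        -e "s/{{\.Domain}}/$DOMAIN/")

echo "$HOST"   # my-model.demo.example.com
```

So with the `example.com` domain from the manifest, a service named `my-model` in namespace `demo` would be reachable at `my-model.demo.example.com`. If you prefer the CLI over the console form, the same manifest can also be applied with `kubectl apply -f` after the `knative-serving` namespace exists.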

<Steps>

In **Administrator** view:

1. Click **Marketplace / OperatorHub**.
2. At the top of the console, from the **Cluster** dropdown list, select the destination cluster where you want to install.
3. Select **Alauda AI Model Serving**, then click it.
4. In the **Alauda AI Model Serving** page, click **All Instances** from the tab.
5. Click **Create**.

**Select Instance Type** window will pop up.

6. Locate the **KnativeServing** tile in **Select Instance Type** window, then click **Create**.

**Create KnativeServing** form will show up.

7. Keep `default-knative-serving` unchanged for **Name**.
8. Keep `knative-serving` unchanged for **Knative Serving Namespace**.
9. In the **Ingress Gateway** section, configure the following:
1. Set the **Ingress Gateway Istio Revision** to a value that corresponds to your Istio version (e.g., `1-22`).
2. Set a valid domain for the **Domain** field.
3. Set the appropriate **Domain Certificate Type**.
<Callouts>

:::info
For details on configuring the domain and certificate type, refer to the [relevant section](#procedure-1).
:::
1. `private-registry` is a placeholder for your private registry address. You can find this in the **Administrator** view, then click **Clusters**, select `your cluster`, and check the **Private Registry** value in the **Basic Info** section.

10. In the **Values** section, configure the following:
1. Select **Deploy Flavor** from dropdown:
1. `single-node` for non HA deployments.
2. `ha-cluster` for HA cluster deployments (**Recommended** for production).
2. Set **Global Registry Address** to match your cluster.
</Callouts>

You can find your cluster's private registry address by following these steps:
1. In the Web Console, go to Administrator / Clusters.
2. Select your target **cluster**.
3. On the **Overview** tab, find the `Private Registry address` value in the **Basic Info** section.
</Steps>

</Steps>
### 3. Integrate with AmlCluster

Configure the `AmlCluster` instance to integrate with a `KnativeServing` instance.
Configure the `AmlCluster` instance to integrate with the `KnativeServing` instance.

<Steps>

@@ -348,10 +336,10 @@ In the **AmlCluster** instance update window, you will need to fill in the requi
After the initial installation, you will find that only the **Knative Serving Provider** is set to `Operator`. You will now need to provide values for the following parameters:
:::

* **APIVersion**: `components.aml.dev/v1alpha1`
* **APIVersion**: `operator.knative.dev/v1beta1`
* **Kind**: `KnativeServing`
* **Name**: `default-knative-serving`
* Leave **Namespace** blank.
* **Name**: `knative-serving`
* **Namespace**: `knative-serving`
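For reference, the resulting fragment of the `AmlCluster` spec looks roughly like the following. This is a sketch based on the fields shown in the fine-tuning example elsewhere in these docs; the `KnativeServing` reference values above are entered through the update form, and the fragment is illustrative rather than a complete spec:

```yaml
# Illustrative AmlCluster fragment: the built-in Knative Serving is removed
# and an operator-managed KnativeServing instance is used instead.
spec:
  knativeServing:
    managementState: Removed   # stop managing the built-in Knative Serving
    providerType: Operator     # delegate to the Knative CE Operator's instance
```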
</Steps>

## Replace GitLab Service After Installation
7 changes: 2 additions & 5 deletions docs/en/installation/fine-tuning.mdx
@@ -47,11 +47,8 @@ spec:
type: SelfSigned
domain: '*.example.com'
knativeServing:
istioConfig:
controlPlane:
autoRevisionMode: legacy
managementState: Managed
providerType: Legacy
managementState: Removed
providerType: Operator
kserve:
managementState: Managed
values:
2 changes: 1 addition & 1 deletion docs/en/overview/architecture.mdx
@@ -22,7 +22,7 @@ NOTE: Alauda AI uses some general Kubernetes, ACP components including:
| aml-controller | Manages Alauda AI namespaces on workload clusters. Namespaces will be automatically configured with a Model Repo space and corresponding resources. | Self-developed | |
| aml-api-deploy | Provides high-level APIs for "Lich" | Self-developed | |
| Gitlab (with Minio or S3) | Model repository backend storage and version tracking. | Open source | MIT |
| kserve-controller | (Optionally with knative serving and istio enabled) Manages AI inference services and inference service runtimes. | Open source | Apache Version 2.0 |
| kserve-controller | (Optionally with knative serving enabled) Manages AI inference services and inference service runtimes. | Open source | Apache Version 2.0 |
| workspace-controller | Manages workbench instances (jupyter notebooks, codeserver) | Open source | Apache Version 2.0 |
| Volcano | Plugin to provide co-scheduling (gang-scheduling) features for AI training jobs. Also manages "volcanojob" resource to run general training workloads. | Open source | Apache Version 2.0 |
| MLFlow | Track training, evaluation jobs by storing, visualizing metrics and artifacts | Open source | Apache Version 2.0 |