Skip to content

[WIP][DO NOT MERGE][AKS] Support service account image pull feature#9665

Open
norshtein wants to merge 1 commit intoAzure:mainfrom
norshtein:tosi/saip
Open

[WIP][DO NOT MERGE][AKS] Support service account image pull feature#9665
norshtein wants to merge 1 commit intoAzure:mainfrom
norshtein:tosi/saip

Conversation

@norshtein
Copy link
Member


This checklist is used to make sure that common guidelines for a pull request are followed.

Related command

az aks create and az aks update

General Guidelines

  • Have you run azdev style <YOUR_EXT> locally? (pip install azdev required)
  • Have you run python scripts/ci/test_index.py -q locally? (pip install wheel==0.30.0 required)
  • My extension version conforms to the Extension version schema

For new extensions:

About Extension Publish

There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update src/index.json automatically.
You only need to update the version information in file setup.py and historical information in file HISTORY.rst in your PR but do not modify src/index.json.

Copilot AI review requested due to automatic review settings March 10, 2026 10:54
@norshtein norshtein requested a review from FumingZhang as a code owner March 10, 2026 10:54
@azure-client-tools-bot-prd
Copy link

Validation for Breaking Change Starting...

Thanks for your contribution!

@azure-client-tools-bot-prd
Copy link

Hi @norshtein,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.

@yonzhan
Copy link
Collaborator

yonzhan commented Mar 10, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions
Copy link
Contributor

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

@github-actions
Copy link
Contributor

CodeGen Tools Feedback Collection

Thank you for using our CodeGen tool. We value your feedback, and we would like to know how we can improve our product. Please take a few minutes to fill our codegen survey

@github-actions
Copy link
Contributor

Hi @norshtein

Release Suggestions

Module: aks-preview

  • Update VERSION to 19.0.0b25 in src/aks-preview/setup.py

Notes

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for a new AKS service account image pull feature, allowing clusters to authenticate to Azure Container Registry using service account scoped managed identities instead of the node-scoped kubelet identity. It adds three new CLI parameters: --enable-service-account-image-pull, --disable-service-account-image-pull (update only), and --service-account-image-pull-default-managed-identity-id.

Changes:

  • New context getter methods and create/update decorator methods for the ServiceAccountImagePullProfile security sub-profile
  • New CLI parameter registrations (_params.py), function signatures (custom.py), and help text (_help.py)
  • Unit tests for the new context getters and decorator methods, plus integration test stubs for live testing

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
managed_cluster_decorator.py Core logic: context getters (get_enable/disable_service_account_image_pull, get_service_account_image_pull_default_managed_identity_id), and decorator methods (set_up_service_account_image_pull for CREATE, update_service_account_image_pull for UPDATE)
custom.py Added three new parameters to aks_create and aks_update function signatures
_params.py Registered the new CLI parameters for both create and update contexts as is_preview=True
_help.py Added help text for the new parameters in both aks create and aks update command groups
test_managed_cluster_decorator.py Unit tests for the new getter methods and decorator methods; also restructured existing test_get_image_cleaner_interval_hours into a separate _extended method
test_aks_commands.py Added integration test stubs for create and update scenarios (missing recording files)
HISTORY.rst Added entry under Pending section documenting the new parameters
Comments suppressed due to low confidence (1)

src/aks-preview/azext_aks_preview/tests/latest/test_managed_cluster_decorator.py:1857

  • The new test_get_image_cleaner_interval_hours_extended method starts with a variable named ctx_2, skipping ctx_1. This is confusing because the method is a new standalone test, not a continuation - it should start from ctx_1. More significantly, the variable ctx_2 is reused twice in this same method (lines 1833 and 1846) for two different test scenarios, which is also inconsistent with the pattern used in the rest of the test file where each test context is given a unique number.
    def test_get_image_cleaner_interval_hours_extended(self):
        ctx_2 = AKSPreviewManagedClusterContext(
            self.cmd,
            AKSManagedClusterParamDict(
                {
                    "enable_image_cleaner": True,
                    "image_cleaner_interval_hours": 24,
                }
            ),
            self.models,
            decorator_mode=DecoratorMode.CREATE,
        )
        self.assertEqual(ctx_2.get_image_cleaner_interval_hours(), 24)

        ctx_2 = AKSPreviewManagedClusterContext(
            self.cmd,
            AKSManagedClusterParamDict(
                {
                    "image_cleaner_interval_hours": 24,
                }
            ),
            self.models,
            decorator_mode=DecoratorMode.CREATE,
        )
        with self.assertRaises(RequiredArgumentMissingError):
            ctx_2.get_image_cleaner_interval_hours()

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +6300 to +6311
profile = mc.security_profile.service_account_image_pull_profile
if profile is None:
profile = self.models.ServiceAccountImagePullProfile() # pylint: disable=no-member
mc.security_profile.service_account_image_pull_profile = profile

if enable_service_account_image_pull:
profile.enabled = True
elif disable_service_account_image_pull:
profile.enabled = False

if default_managed_identity_id is not None:
profile.default_managed_identity_id = default_managed_identity_id
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In update_service_account_image_pull, when only service_account_image_pull_default_managed_identity_id is provided without either enable_service_account_image_pull or disable_service_account_image_pull, and the cluster has no existing service_account_image_pull_profile, the code creates a new profile with enabled=None (line 6302) and sets the identity ID. Sending a profile where enabled is not set to the API is likely unintended behavior and could result in an API error or an inconsistent state. The code should either raise a RequiredArgumentMissingError in this case, or only allow updating the identity ID when the feature is already enabled (by checking the existing profile's state from the mc object).

Copilot uses AI. Check for mistakes.
Comment on lines +4104 to +4114
if self.context.get_enable_service_account_image_pull():
if mc.security_profile is None:
mc.security_profile = self.models.ManagedClusterSecurityProfile() # pylint: disable=no-member
mc.security_profile.service_account_image_pull_profile = (
self.models.ServiceAccountImagePullProfile( # pylint: disable=no-member
enabled=True,
default_managed_identity_id=self.context.get_service_account_image_pull_default_managed_identity_id(),
)
)

return mc
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In set_up_service_account_image_pull (CREATE path), when a user provides --service-account-image-pull-default-managed-identity-id without --enable-service-account-image-pull, the identity ID is silently ignored because the code only enters the configuration block when get_enable_service_account_image_pull() is truthy. This could be confusing to users. Consider adding a warning or raising a RequiredArgumentMissingError similar to how other paired parameters in this codebase handle mismatched usage (e.g., get_image_cleaner_interval_hours raises RequiredArgumentMissingError when image_cleaner_interval_hours is provided without enable_image_cleaner).

Copilot uses AI. Check for mistakes.
Comment on lines +5905 to +5970
def test_set_up_service_account_image_pull(self):
# test no-op when not enabled
dec_1 = AKSPreviewManagedClusterCreateDecorator(
self.cmd,
self.client,
{},
CUSTOM_MGMT_AKS_PREVIEW,
)
mc_1 = self.models.ManagedCluster(location="test_location")
dec_1.context.attach_mc(mc_1)
dec_mc_1 = dec_1.set_up_service_account_image_pull(mc_1)
ground_truth_mc_1 = self.models.ManagedCluster(location="test_location")
self.assertEqual(dec_mc_1, ground_truth_mc_1)

# test enabled without managed identity id
dec_2 = AKSPreviewManagedClusterCreateDecorator(
self.cmd,
self.client,
{
"enable_service_account_image_pull": True,
},
CUSTOM_MGMT_AKS_PREVIEW,
)
mc_2 = self.models.ManagedCluster(location="test_location")
dec_2.context.attach_mc(mc_2)
dec_mc_2 = dec_2.set_up_service_account_image_pull(mc_2)

ground_truth_profile_2 = self.models.ServiceAccountImagePullProfile(
enabled=True,
default_managed_identity_id=None,
)
ground_truth_security_profile_2 = self.models.ManagedClusterSecurityProfile(
service_account_image_pull_profile=ground_truth_profile_2,
)
ground_truth_mc_2 = self.models.ManagedCluster(
location="test_location",
security_profile=ground_truth_security_profile_2,
)
self.assertEqual(dec_mc_2, ground_truth_mc_2)

# test enabled with managed identity id
dec_3 = AKSPreviewManagedClusterCreateDecorator(
self.cmd,
self.client,
{
"enable_service_account_image_pull": True,
"service_account_image_pull_default_managed_identity_id": "test_identity_id",
},
CUSTOM_MGMT_AKS_PREVIEW,
)
mc_3 = self.models.ManagedCluster(location="test_location")
dec_3.context.attach_mc(mc_3)
dec_mc_3 = dec_3.set_up_service_account_image_pull(mc_3)

ground_truth_profile_3 = self.models.ServiceAccountImagePullProfile(
enabled=True,
default_managed_identity_id="test_identity_id",
)
ground_truth_security_profile_3 = self.models.ManagedClusterSecurityProfile(
service_account_image_pull_profile=ground_truth_profile_3,
)
ground_truth_mc_3 = self.models.ManagedCluster(
location="test_location",
security_profile=ground_truth_security_profile_3,
)
self.assertEqual(dec_mc_3, ground_truth_mc_3)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_set_up_service_account_image_pull (CREATE path) does not cover the scenario where service_account_image_pull_default_managed_identity_id is provided without enable_service_account_image_pull. In this case, the identity is silently ignored (the function returns early at line 4104 when get_enable_service_account_image_pull() is falsy). Adding a test case for this scenario would help document and verify the actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +11355 to +11456
@AllowLargeResponse()
@AKSCustomResourceGroupPreparer(
random_name_length=17, name_prefix="clitest", location="westus2"
)
def test_aks_create_with_service_account_image_pull(
self, resource_group, resource_group_location
):
# reset the count so in replay mode the random names will start with 0
self.test_resources_count = 0
# kwargs for string formatting
aks_name = self.create_random_name("cliakstest", 16)
node_pool_name = self.create_random_name("c", 6)
self.kwargs.update(
{
"resource_group": resource_group,
"name": aks_name,
"node_pool_name": node_pool_name,
"location": resource_group_location,
"ssh_key_value": self.generate_ssh_keys(),
}
)
# create cluster with --enable-service-account-image-pull
create_cmd = (
"aks create --resource-group={resource_group} --name={name} "
"--nodepool-name {node_pool_name} -c 1 "
"--location {location} --ssh-key-value={ssh_key_value} "
"--enable-service-account-image-pull"
)
self.cmd(
create_cmd,
checks=[
self.check("provisioningState", "Succeeded"),
self.check(
"securityProfile.serviceAccountImagePullProfile.enabled",
True,
),
],
)

@AllowLargeResponse()
@AKSCustomResourceGroupPreparer(
random_name_length=17, name_prefix="clitest", location="westus2"
)
def test_aks_update_with_service_account_image_pull(
self, resource_group, resource_group_location
):
# reset the count so in replay mode the random names will start with 0
self.test_resources_count = 0
# kwargs for string formatting
aks_name = self.create_random_name("cliakstest", 16)
node_pool_name = self.create_random_name("c", 6)
self.kwargs.update(
{
"resource_group": resource_group,
"name": aks_name,
"node_pool_name": node_pool_name,
"location": resource_group_location,
"ssh_key_value": self.generate_ssh_keys(),
}
)
# create cluster
create_cmd = (
"aks create --resource-group={resource_group} --name={name} "
"--nodepool-name {node_pool_name} -c 1 "
"--location {location} --ssh-key-value={ssh_key_value}"
)
self.cmd(
create_cmd,
checks=[
self.check("provisioningState", "Succeeded"),
],
)
# update: enable service account image pull
update_cmd = (
"aks update --resource-group={resource_group} --name={name} "
"--enable-service-account-image-pull"
)
self.cmd(
update_cmd,
checks=[
self.check("provisioningState", "Succeeded"),
self.check(
"securityProfile.serviceAccountImagePullProfile.enabled",
True,
),
],
)
# update: disable service account image pull
update_cmd = (
"aks update --resource-group={resource_group} --name={name} "
"--disable-service-account-image-pull"
)
self.cmd(
update_cmd,
checks=[
self.check("provisioningState", "Succeeded"),
self.check(
"securityProfile.serviceAccountImagePullProfile.enabled",
False,
),
],
)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new integration tests test_aks_create_with_service_account_image_pull and test_aks_update_with_service_account_image_pull do not have corresponding recording files in the recordings/ directory. Without YAML recording files, these tests cannot run in replay mode (CI). Recording files need to be created by running the tests in live mode against a real Azure subscription before this PR can pass CI.

Copilot uses AI. Check for mistakes.
enable_service_account_image_pull = self.context.get_enable_service_account_image_pull()
disable_service_account_image_pull = self.context.get_disable_service_account_image_pull()
default_managed_identity_id = self.context.get_service_account_image_pull_default_managed_identity_id()

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disable flag and managed identity flag should also be mutually exclusive.

c.argument("disable_image_integrity", action="store_true", is_preview=True)
c.argument("enable_service_account_image_pull", action="store_true", is_preview=True)
c.argument("disable_service_account_image_pull", action="store_true", is_preview=True)
c.argument("service_account_image_pull_default_managed_identity_id", is_preview=True)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to validate the format of the resource id to fail fast if user provided like a client ID?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AKS Auto-Assign Auto assign by bot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants