rename __set_env to env and make it un-hidden #1656

Merged
NobodyXu merged 11 commits into rust-lang:main from oconnor663:env
Feb 26, 2026

@oconnor663 oconnor663 commented Jan 11, 2026

Related to #1655. I went with env (same name as the env member) for consistency with std::process::Command but of course we could bikeshed that.

One doubt I have about this change is that it looks like there are some places where we don't consult self.env when building Command objects to invoke. Here's one:

cc-rs/src/lib.rs

Lines 1809 to 1817 in 8124fc5

let mut cmd = if is_assembler_msvc {
    self.msvc_macro_assembler()?
} else {
    let mut cmd = compiler.to_command();
    for (a, b) in self.env.iter() {
        cmd.env(a, b);
    }
    cmd
};

If we're going to expose this, would we want to make sure all child processes see these variables?

Also I'm not sure what would be a clean way to test this. So this PR is more of a discussion starter :)
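One way to picture the concern above is a single helper that every command-building path goes through, so no branch can forget the overrides. The following is only a sketch under stated assumptions: `apply_env`, `assembler_command`, `explicit_env_keys` and the placeholder program names are hypothetical and are not cc-rs code. It also hints at a possible test strategy, since `std::process::Command::get_envs` lets you inspect what was set without spawning anything.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Hypothetical helper: copy the stored overrides onto a child command.
fn apply_env(cmd: &mut Command, env: &[(OsString, OsString)]) {
    for (key, value) in env {
        cmd.env(key, value);
    }
}

/// Hypothetical stand-in for the branch quoted above. Whichever tool is
/// selected, the overrides are applied once, after the branch.
fn assembler_command(is_assembler_msvc: bool, env: &[(OsString, OsString)]) -> Command {
    let mut cmd = if is_assembler_msvc {
        Command::new("ml64.exe") // placeholder program name (assumption)
    } else {
        Command::new("cc") // placeholder program name (assumption)
    };
    apply_env(&mut cmd, env);
    cmd
}

/// Names of the variables explicitly set on `cmd`; no process is spawned.
fn explicit_env_keys(cmd: &Command) -> Vec<OsString> {
    cmd.get_envs().map(|(key, _)| key.to_os_string()).collect()
}
```

A test could then build the command for both branches and compare `explicit_env_keys` against the expected variable names.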

@NobodyXu NobodyXu left a comment

Thank you!

I went with env (same name as the env member) for consistency with std::process::Command but of course we could bikeshed that.

This is also consistent with other function naming (e.g. pic), so the naming LGTM.

If we're going to expose this, would we want to make sure all child processes see these variables?

Yeah, I think they should.

@NobodyXu

If we're going to expose this, would we want to make sure all child processes see these variables?

@oconnor663 do you want to do all of that within this PR, or do you want me to merge this and cut a release now?

@madsmtm madsmtm left a comment

I think we should support something like this! The reason I haven't added it myself is that cc itself accesses environment variables with std::env::var, and I wanted it to be completely clear what those do?

E.g. if I call build.env("CC", "clang"), does that change the compiler to be clang, or must I do std::env::set_var("CC", "clang")? I tend to think that the answer is "yes", but I'm unsure if there could be a downside to it?

(Whatever we figure out here should be documented in Build::env).

@NobodyXu

NobodyXu commented Jan 29, 2026

E.g. if I call build.env("CC", "clang"), does that change the compiler to be clang, or must I do std::env::set_var("CC", "clang")? I tend to think that the answer is "yes", but I'm unsure if there could be a downside to it?

I agree, and we should change getenv to read from self.env.

I can't think of a downside to doing so.

@NobodyXu

cc @madsmtm I'd like to merge this and open another PR to update getenv to use it

@madsmtm

madsmtm commented Jan 30, 2026

cc @madsmtm I'd like to merge this and open another PR to update getenv to use it

fn getenv already accesses self.env. I think it's more things like TargetInfoParserInner::from_cargo_environment_variables that should be updated to read from self.env too (actually basically all the places that we currently do #[allow(clippy::disallowed_methods)]).

I'd prefer to do that first, and also update all the tests that use std::env::set_var to use Build::__set_env instead - and only after should we merge this. But it's up to you!

@NobodyXu

fn getenv already accesses self.env.

Thanks, you are right, but I'm wondering if we should try to access self.env first before accessing the cache, and stop putting self.env into the shared cache so that users can change the cache at any time.

And I think we don't need to emit a rerun for it, given that it is set in the build script?

cc-rs/src/lib.rs

Line 3841 in 0767349

let r = self

I think it's more things like TargetInfoParserInner::from_cargo_environment_variables that should be updated to read from self.env too (actually basically all the places that we currently do #[allow(clippy::disallowed_methods)]).

Do we really need to override those? Is there any use case?

I'd prefer to do that first, and also update all the tests that use std::env::set_var to use Build::__set_env instead - and only after should we merge this. But it's up to you!

Yes, that does sound much better. Given that this PR can be edited by maintainers, I could push some commits later when I have time.

@madsmtm

madsmtm commented Jan 30, 2026

Thanks, you are right, but I'm wondering if we should try to access self.env first before accessing the cache, and stop putting self.env into the shared cache so that users can change the cache at any time.

And I think we don't need to emit a rerun for it, given that it is set in the build script?

Yes to both of those.

Do we really need to override those? Is there any use case?

Hmm, not that I really know of? But we'd currently allow overwriting e.g. CARGO_CFG_TARGET_FEATURE, so I'd at least want to be consistent with that.

@madsmtm

madsmtm commented Jan 30, 2026

I kinda think there are a few semantically different "kinds" of environment variables:

  1. Variables that the C compiler itself might reasonably read: SDKROOT, *_DEPLOYMENT_TARGET, WASI_SDK_PATH.
  2. Variables that are semi-standard across CMake-like tools to find the desired compiler + flags: CC*, CXX*, RANLIB*, AR*, NVCC*, CXXSTDLIB*, *FLAGS*, CROSS_COMPILE.
  3. Variables specific to cc-rs: CRATE_CC_NO_DEFAULTS, CC_KNOWN_WRAPPER_CUSTOM, CC_SHELL_ESCAPED_FLAGS, CC_FORCE_DISABLE, CC_ENABLE_DEBUG_OUTPUT, WASM_MUSL_SYSROOT, WASI_SYSROOT.
  4. Variables Cargo sets for build scripts: CARGO_*, RUSTC*, TARGET, HOST, OPT_LEVEL, OUT_DIR etc.

NUM_JOBS kinda falls into several of these categories, btw, and I'm a bit unsure which category the env vars that find-msvc-tools reads (like VisualStudioVersion, VSCMD_ARG_TGT_ARCH, VCINSTALLDIR, WindowsSdkDir etc.) fall into. PATH is also a special case: it might need different handling depending on what it's used for.

I'm fairly certain that Build::env should set category 1 env vars. And as we discussed above, category 4 is not really that desirable to override (semantically, accesses to these are done by cc-rs-the-crate, not the tool we're invoking).

I'm unsure about category 2 and 3, but I'm somewhat leaning towards not allowing overriding them? They're all meant to be controlled by the end user, not the build script / cc-rs - build scripts should instead call Build::compiler/Build::flags. And in any case, there's a fix if you really want to override them (namely, call std::env::set_var before creating Build).
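The four categories above can be written down as a small classifier, which makes the ordering problems visible (for example, CC_FORCE_DISABLE starts with CC but belongs to category 3). This is only an illustrative sketch: the `Category` enum, `classify` and `overridable_by_build_env` are hypothetical names, the prefix matching is deliberately loose, and the policy encoded at the end is just the leaning expressed in this comment, not a decided design.

```rust
/// Hypothetical labels for the four categories listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    CompilerRead,  // 1: read by the C compiler itself
    ToolSelection, // 2: semi-standard compiler/flag selection
    CcRsSpecific,  // 3: specific to cc-rs
    CargoProvided, // 4: set by Cargo for build scripts
    Unclassified,  // e.g. NUM_JOBS or PATH, which the comment leaves open
}

fn classify(name: &str) -> Category {
    const CC_RS: &[&str] = &[
        "CRATE_CC_NO_DEFAULTS",
        "CC_KNOWN_WRAPPER_CUSTOM",
        "CC_SHELL_ESCAPED_FLAGS",
        "CC_FORCE_DISABLE",
        "CC_ENABLE_DEBUG_OUTPUT",
        "WASM_MUSL_SYSROOT",
        "WASI_SYSROOT",
    ];
    const CARGO_EXACT: &[&str] = &["TARGET", "HOST", "OPT_LEVEL", "OUT_DIR"];
    const TOOL_PREFIXES: &[&str] = &["CC", "CXX", "RANLIB", "AR", "NVCC", "CXXSTDLIB"];

    // Exact cc-rs names first, because several of them also match the
    // looser category 2 patterns (CC*, *FLAGS*).
    if CC_RS.contains(&name) {
        return Category::CcRsSpecific;
    }
    // Cargo variables next, so CARGO_ENCODED_RUSTFLAGS is not caught by *FLAGS*.
    if name.starts_with("CARGO_") || name.starts_with("RUSTC") || CARGO_EXACT.contains(&name) {
        return Category::CargoProvided;
    }
    if name == "SDKROOT" || name == "WASI_SDK_PATH" || name.ends_with("_DEPLOYMENT_TARGET") {
        return Category::CompilerRead;
    }
    // Loose prefix matching; a real implementation would need to be stricter.
    if name == "CROSS_COMPILE"
        || name.contains("FLAGS")
        || TOOL_PREFIXES.iter().any(|prefix| name.starts_with(prefix))
    {
        return Category::ToolSelection;
    }
    Category::Unclassified
}

/// The leaning expressed above: only category 1 is overridable via `Build::env`.
fn overridable_by_build_env(category: Category) -> bool {
    matches!(category, Category::CompilerRead)
}
```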

@NobodyXu

Yeah, agreed. However, env-wise, I think keeping getenv's access to self.env is still the simplest way.

@NobodyXu

Variables specific to cc-rs: CRATE_CC_NO_DEFAULTS, CC_KNOWN_WRAPPER_CUSTOM, CC_SHELL_ESCAPED_FLAGS, CC_FORCE_DISABLE, CC_ENABLE_DEBUG_OUTPUT, WASM_MUSL_SYSROOT, WASI_SYSROOT.

I think WASM/WASI and the known wrapper might be worth overriding, as there's currently no way to override them using the cc API.

@madsmtm

madsmtm commented Feb 2, 2026

I actually think the whole thing should be refactored a lot. For example, self.build_cache.env_cache makes no sense IMO, environment variables are not that costly to look up.


Maybe the design I want should instead be something like:

use std::borrow::Cow;
use std::ffi::{OsStr, OsString};

// Free function, does not need to touch `Build` at all.
#[allow(clippy::disallowed_methods)] // Cargo env, no need for rerun-if-env-changed
fn cargo_env_var(key: &str) -> Option<OsString> {
    std::env::var_os(key)
}

impl Build {
    /// Look up an environment variable, but allow it to be overwritten by `Build::env`.
    fn get_env_overridable(&self, key: &str) -> Option<Cow<'_, OsStr>> {
        // Try to look up in overrides first.
        if let Some((_, value)) = self.env_override.iter().find(|(k, _)| k == key) {
            return Some(Cow::Borrowed(value));
        }

        // If not found in overrides, look up from environment.
        Some(Cow::Owned(self.get_env(key)?))
    }

    /// Look up an environment variable, and tell Cargo that we used it.
    #[allow(clippy::disallowed_methods)] // We emit rerun-if-env-changed
    fn get_env(&self, key: &str) -> Option<OsString> {
        // Tell Cargo that we're going to depend on this env var.
        if self.emit_rerun_if_env_changed && key != "PATH" {
            self.cargo_output.print_metadata(&format_args!("cargo:rerun-if-env-changed={key}"));
        }

        // And look it up in the environment.
        std::env::var_os(key)
    }
}

And we'd have environment variable "category" 1 call Build::get_env_overridable, 2 and 3 call Build::get_env and 4 call cargo_env_var.

I think WASM/WASI and the known wrapper might be worth overriding, as there's currently no way to override them using the cc API.

I would rather start with them not being override-able, we can always allow it in the future.

That is, if we make category 2 and 3 call Build::get_env, we can always expand it to instead call Build::get_env_overridable in the future if a need for it is found, that would not be a breaking change (but the reverse would be).
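The proposed split depends on cc-rs internals, so here is a std-only model of it that can be compiled on its own. Everything in it is an assumption made for illustration: `ToyBuild` is not the real `Build`, and the `emitted` vector stands in for whatever actually prints the Cargo metadata. It also reflects the point agreed earlier in the thread that a value coming from an override does not need a rerun annotation.

```rust
use std::cell::RefCell;
use std::ffi::OsString;

/// Toy stand-in for `Build`; not cc-rs's real struct.
struct ToyBuild {
    env_override: Vec<(OsString, OsString)>,
    emit_rerun_if_env_changed: bool,
    /// Metadata lines that would be printed for Cargo, recorded for inspection.
    emitted: RefCell<Vec<String>>,
}

impl ToyBuild {
    /// Categories 2 and 3: read the process environment and record the dependency.
    fn get_env(&self, key: &str) -> Option<OsString> {
        if self.emit_rerun_if_env_changed && key != "PATH" {
            self.emitted
                .borrow_mut()
                .push(format!("cargo:rerun-if-env-changed={key}"));
        }
        std::env::var_os(key)
    }

    /// Category 1: overrides win; only fall back to the environment when unset.
    fn get_env_overridable(&self, key: &str) -> Option<OsString> {
        if let Some((_, value)) = self.env_override.iter().find(|(k, _)| k == key) {
            // The value came from the build script, so nothing is recorded.
            return Some(value.clone());
        }
        self.get_env(key)
    }
}

/// Category 4: Cargo-provided variables, read directly with no annotation.
fn cargo_env_var(key: &str) -> Option<OsString> {
    std::env::var_os(key)
}
```

In this model, switching a call site from `get_env` to `get_env_overridable` only adds a source of values, which matches the argument that widening overridability later would not be a breaking change.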

@NobodyXu

NobodyXu commented Feb 3, 2026

I think part of the reason we cache it, is to avoid emitting the re-run tag, which does involve some I/O and could be expensive, and could even block the thread?

@madsmtm

madsmtm commented Feb 11, 2026

I think part of the reason we cache it, is to avoid emitting the re-run tag, which does involve some I/O and could be expensive, and could even block the thread?

I guess, but still, it's gonna be orders of magnitude less compared to the C compiler invocation itself.

Anyhow, I'm not gonna block on that, Build::get_env can keep doing the caching if you want.

@NobodyXu

I can change it to just a HashSet and only cache the println!

Sorry, I don't know why I was stuck in a cache-everything-or-nothing mindset.
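The HashSet idea can be sketched as follows: the value is always read from the live environment, and only the annotation is deduplicated. `RerunEmitter` and this free-standing `getenv` are hypothetical names for illustration, not cc-rs code, and the thread later decides to drop caching altogether.

```rust
use std::collections::HashSet;
use std::ffi::OsString;

/// Hypothetical helper that remembers which variables were already reported.
#[derive(Default)]
struct RerunEmitter {
    seen: HashSet<String>,
}

impl RerunEmitter {
    /// Returns the metadata line the first time `key` is seen, `None` afterwards.
    fn line_for(&mut self, key: &str) -> Option<String> {
        if self.seen.insert(key.to_owned()) {
            Some(format!("cargo:rerun-if-env-changed={key}"))
        } else {
            None
        }
    }
}

/// Always reads the live environment; only the printed line is deduplicated.
fn getenv(emitter: &mut RerunEmitter, key: &str) -> Option<OsString> {
    if let Some(line) = emitter.line_for(key) {
        println!("{line}");
    }
    std::env::var_os(key)
}
```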

@madsmtm

madsmtm commented Feb 11, 2026

I can change it to just a HashSet and only cache the println!

Sorry, I don't know why I was stuck in a cache-everything-or-nothing mindset.

Still feels unnecessary to me, especially since most users are gonna run Build::compile only once per build script.

I think if we want to do caching, I'd probably prefer to have it at a higher level; e.g. instead of caching getenv, it makes more sense to cache Build::env_tool IMO.

But that's also dangerous if you suddenly change target to something else, since then you'd need to read different CC_$TARGET variables, so idk.
I can also see the annoyance in filling up the build log (as noted in #1561), and that'd probably be worse if we emitted a bunch more unnecessary cargo::* annotations.


To expand a bit more on why I'm against caching in cc-rs: It is said that there are only two hard problems in computer science: naming, cache-invalidation and off by one errors ;). We're taking a difficult problem (cross-platform invocation of the C compiler) and adding another problem on top (cache-invalidation).

To give a concrete example, what if the user did:

let mut build = Build::new().file("foobar.c");
// We don't want metadata for this compilation run.
build.cargo_metadata(false).compile("foo");
// But for this one, we do!
build.cargo_metadata(true).compile("bar");

It's not a use-case I feel is necessary to support, and I know we could solve it by including CargoOutput in the cache key, but eehhhhh. It feels unclean.
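To make the invalidation hazard concrete, here is a toy cache keyed only by variable name. It is a hypothetical model, not how cc-rs stores things: `EnvCache` and the `out` vector (standing in for the build log) are invented for illustration. With the call order from the example above, the second lookup is a cache hit and its annotation is silently lost.

```rust
use std::collections::HashMap;

/// Toy cache keyed only by variable name (hypothetical).
#[derive(Default)]
struct EnvCache {
    values: HashMap<String, Option<String>>,
}

impl EnvCache {
    /// Returns the value; appends a rerun line to `out` only on a cache miss.
    fn get(&mut self, key: &str, cargo_metadata: bool, out: &mut Vec<String>) -> Option<String> {
        if let Some(hit) = self.values.get(key) {
            // Cache hit: nothing is emitted, whatever `cargo_metadata` says now.
            return hit.clone();
        }
        if cargo_metadata {
            out.push(format!("cargo:rerun-if-env-changed={key}"));
        }
        let value = std::env::var(key).ok();
        self.values.insert(key.to_owned(), value.clone());
        value
    }
}
```

Including the metadata setting in the cache key would avoid the lost line, at the cost of the extra complexity the comment objects to.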


But yeah, I'm digressing, again, I'm fine with either form of caching (HashMap or HashSet) in Build::get_env.

@NobodyXu

Yeah, letting Cargo do the caching is the right thing to do.

Too much can change in the env, so I'll just remove the caching.

@NobodyXu

cc @madsmtm I've removed the env caching, please take another look

@NobodyXu NobodyXu requested a review from madsmtm February 20, 2026 18:07
@NobodyXu

NobodyXu commented Feb 24, 2026

cc @madsmtm pinging

@NobodyXu

I plan to merge this tomorrow to include this in next release, if no further feedback is given

@madsmtm madsmtm left a comment

IMO, calling build.env("foo", "bar").get_compiler().env() should return the env vars set, and the same goes for build.env("foo", "bar").get_compiler().to_command().get_envs().

I also still think we should be deliberate about envs that are overridable with a get_env_overridable or similar.
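The first expectation above can be expressed as a small model of the builder, tool and command chain. `ToyBuild` and `ToyTool` are hypothetical stand-ins with invented fields, not the real cc-rs types. The sketch only shows the shape of the behaviour being asked for: variables set on the builder are visible both on the tool and on the `std::process::Command` derived from it.

```rust
use std::ffi::OsString;
use std::process::Command;

/// Toy stand-in for `Build`; not the real cc-rs type.
#[derive(Default)]
struct ToyBuild {
    env: Vec<(OsString, OsString)>,
}

/// Toy stand-in for `Tool`; not the real cc-rs type.
struct ToyTool {
    path: OsString,
    env: Vec<(OsString, OsString)>,
}

impl ToyBuild {
    /// Record a variable that child processes should see.
    fn env(&mut self, key: &str, value: &str) -> &mut Self {
        self.env.push((key.into(), value.into()));
        self
    }

    /// Hand the recorded variables over to the tool.
    fn get_compiler(&self) -> ToyTool {
        ToyTool {
            path: "cc".into(), // placeholder compiler path (assumption)
            env: self.env.clone(),
        }
    }
}

impl ToyTool {
    fn env(&self) -> &[(OsString, OsString)] {
        &self.env
    }

    /// Build a command that carries the same variables.
    fn to_command(&self) -> Command {
        let mut cmd = Command::new(&self.path);
        for (key, value) in &self.env {
            cmd.env(key, value);
        }
        cmd
    }
}
```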

@madsmtm

madsmtm commented Feb 25, 2026

I can also do these in a follow-up PR, as long as we're fine with delaying the release until the follow-up lands too.

@NobodyXu

I can also do these in a follow-up PR, as long as we're fine with delaying the release until the follow-up lands too.

Yep I can delay that until it's done, that's fine with me @madsmtm

@NobodyXu NobodyXu merged commit 9c4720b into rust-lang:main Feb 26, 2026
79 checks passed
@madsmtm

madsmtm commented Feb 27, 2026

I filed #1682 to implement the changes I propose.
