Skip to content

library: Move CStr to libcore, and CString to liballoc#94079

Merged
bors merged 5 commits intorust-lang:masterfrom
petrochenkov:cstr
Apr 15, 2022
Merged

library: Move CStr to libcore, and CString to liballoc#94079
bors merged 5 commits intorust-lang:masterfrom
petrochenkov:cstr

Conversation

@petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented Feb 17, 2022

Closes #46736

Interesting points:

Otherwise it's just a code move + some minor hackery usual for liballoc in cfg(test) mode.

@rust-highfive
Copy link
Contributor

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 17, 2022
@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 17, 2022
@petrochenkov petrochenkov added T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 17, 2022
@rust-log-analyzer

This comment was marked as resolved.

@bors

This comment was marked as resolved.

@joshtriplett
Copy link
Member

This is great! I'd love to see this move, to make it easier to interface with C FFI in contexts that don't necessarily want std.

Platform strlen implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)

@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor Author

Relying on target ABI in libcore is ok

A lot of attention in #46736 is put on whether depending on c_char in libcore is an acceptable thing or not.
I don't think the self-imposed restriction of not doing that is reasonable.

Obviously, libcore cannot depend on calling OS services, be it syscalls or functions or system libraries.
But why cannot it rely on the target's ABI (which includes sizes/alignments/signedness of basic C types)? It doesn't introduce any new dependencies to libcore.

Even without libcore the compiler cares about the target's ABI and keeps the relevant data in target specifications (inside rustc and LLVM).
It must know these things to generate layouts for repr(C) data structures, and calling conventions for generating extern "C" functions.
rustc's target specs, for example, have fields like c_int_width and c_enum_min_bits, even if we don't count the LLVM part.
So libcores build for different targets are already different, because the compiler uses that ABI knowledge when generating code.
If the compiler can do it, then why a higher layer like libcore cannot?

Recently core contains more of target-dependent stuff than back in 2017 when #46736 was created, because it's simply too useful to break that restriction sometimes.
A bunch of code in core::arch depends on the target's CPU architecture, and the va_list implementation in core::ffi requires relying on even more ABI details.

I personally don't see issues with putting that cfg_if that defines c_char into libcore, but perhaps people will feel better about this if the char signedness is moved to the compiler's target specs (compiler\rustc_target\src\spec\*.rs)?
Then the c_char definition will look roughly like

#[cfg(target_c_char_signed)]
type c_char = i8;
#[cfg(not(target_c_char_signed))]
type c_char = u8;

@Amanieu
Copy link
Member

Amanieu commented Feb 18, 2022

Platform strlen implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)

I am generally in favor of depending on more libc symbols in core (strlen, memchr, bzero, etc) while providing fallbacks in compiler-builtins for platforms that don't have that symbol.

@bors

This comment was marked as resolved.

@petrochenkov
Copy link
Contributor Author

trait CStrExt

Two inherent methods on CStr (into_c_string and to_string_lossy) have signatures including types unavailable in libcore (CString, Cow and Box).

To keep backward compatibility I had to add an unstable trait CStrExt with these two stable methods, and add it to prelude.
That trait is only implemented for CStr.

strlen

CStr(ing) needs to calculate lengths of zero-terminated byte arrays.
While this functionality is trivial, its implementation can also be optimized by using CPU architecture specific techniques.
Such optimized implementations usually live in platform's libc, this is where the current libstd takes them from.

libcore doesn't have access to platform's libc, so it has to either

  • Use the naive trivial implementation of strlen, or
  • Provide the optimized version of strlen, similarly to how it does with memchr, or
  • Use some scheme with weak symbols as suggested above, with weak naive implementation being provided by libcore by default, but then overridable by any other linked libraries, including platform's libc.

@petrochenkov petrochenkov added S-waiting-on-team I-libs-api-nominated Nominated for discussion during a libs-api team meeting. I-libs-nominated Nominated for discussion during a libs team meeting. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2022
@m-ou-se m-ou-se removed the I-libs-nominated Nominated for discussion during a libs team meeting. label Feb 23, 2022
@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/libs-api team meeting. Many people are enthusiastic to see this work happen.

Regarding relying on target ABI, there were zero objections.

Regarding strlen, there was a consensus in favor of the general approach @Amanieu suggested, of going ahead and using platform C library functions in core as long as they're functions that we can fall back to compiler-builtins for, so that we can use the optimized implementations from the C library if available.

Regarding CStr functions that return types only available in alloc or std: we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending CStr from alloc and std. Our primary concern there was that an extension trait may technically be a very slightly breaking change, if someone had a different trait with the same method names; extension trait vs inherent method has different precedence. Using lang items to extend the set of inherent methods from alloc wouldn't have that issue.

Ultimately, though, we decided that we'd rather not add lang items for this, and we'd rather use an extension trait as long as that won't break anything in practice. So, summary: let's do a crater run to confirm that the extension trait doesn't break anything, and then go with the extension trait.

@petrochenkov
Copy link
Contributor Author

we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending CStr from alloc and `std

When implementing this I recalled that we had such extension traits before, but I couldn't find any of them in libcore now.
Apparently all of them were converted to lang items in 2018 (the main PR was #49896), so the existing policy seems to be preferring lang items for this.

This CStr(ing) move would require a single new lang item in liballoc.

@joshtriplett
Copy link
Member

joshtriplett commented Mar 2, 2022

@petrochenkov A lang item seems fine as well. And (EDIT: corrected) we don't need a crater run in that case.

@bstrie
Copy link
Contributor

bstrie commented Mar 2, 2022

Note that #94503 has just moved c_char (among others) to core::ffi.

@joshtriplett
Copy link
Member

We discussed this again in today's @rust-lang/libs meeting. We're fine with using a lang item in this case, since that's what we already do in other cases.

In general, we'd like to steer away in the future from having any inherent methods on types in core that return types from alloc or std (or inherent methods on types in alloc that return types from std, if any). Such methods make the binding between those two types more special, and make it a little harder to write drop-in replacements for types like CString/String/etc. So, for instance, in this case we should ideally steer people towards into() rather than into_c_string(), so that people can just impl From<CStr> for MyCStringType.

But none of that is a blocker for this move; this method can go ahead and use a lang item as other methods do, and we'll have the more general discussion in the future.

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team I-libs-api-nominated Nominated for discussion during a libs-api team meeting. labels Mar 9, 2022
@petrochenkov
Copy link
Contributor Author

@bors r=joshtriplett

@bors
Copy link
Collaborator

bors commented Apr 14, 2022

📌 Commit 6eaec56 has been approved by joshtriplett

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 14, 2022
@bors
Copy link
Collaborator

bors commented Apr 15, 2022

⌛ Testing commit 6eaec56 with merge f50f07b174bc2022c40eefc1b9b5f6ade8b4505a...

@bors

This comment was marked as resolved.

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 15, 2022
@rust-log-analyzer

This comment was marked as resolved.

@petrochenkov
Copy link
Contributor Author

@bors r=joshtriplett

@bors
Copy link
Collaborator

bors commented Apr 15, 2022

📌 Commit f62c84e has been approved by joshtriplett

@bors
Copy link
Collaborator

bors commented Apr 15, 2022

⌛ Testing commit f62c84e with merge 1e6fe58...

@bors
Copy link
Collaborator

bors commented Apr 15, 2022

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing 1e6fe58 to master...

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1e6fe58): comparison url.

Summary:

  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: mixed results
Regressions 😿
(primary)
Regressions 😿
(secondary)
Improvements 🎉
(primary)
Improvements 🎉
(secondary)
All 😿 🎉
(primary)
count1 1 3 0 2 1
mean2 0.8% 0.9% N/A -0.6% 0.8%
max 0.8% 1.2% N/A -0.7% 0.8%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

@petrochenkov
Copy link
Contributor Author

@lygstate
This PR implements the strategy from #94079 (comment).
If libc is linked then strlen is used from there (this is typically a fast platform-provided implementation), otherwise it's used from compiler-builtins (this is a naive implementation).

If you have issues with this setup, could you open a new issue and describe your case?

@pnkfelix
Copy link
Contributor

Visiting for weekly performance triage

@rustbot label: +perf-regression-triaged

@CAD97
Copy link
Contributor

CAD97 commented Jun 8, 2022

I want to note that this is a regression in rustdoc output, as https://doc.rust-lang.org/nightly/std/ffi/type.CString.html is now just a link to https://doc.rust-lang.org/nightly/alloc/ffi/struct.CString.html. (This also causes the type to show up as an external type for implementations.)

Also, should this get a tracking issue rather than pointing to this PR?

@joshtriplett
Copy link
Member

@CAD97 When we stabilize this, we can switch back from type aliases to re-exports, which should address that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow CStr and CString types to be used with no_std