feat(string): implement RopeString with thin vtable and lazy flattening#5006

Open
akash-R-A-J wants to merge 14 commits into boa-dev:main from
akash-R-A-J:feat/rope-strings

Conversation

@akash-R-A-J
Contributor

This Pull Request fixes/closes #5005.

This PR introduces Rope strings to boa_string and integrates them into the engine’s concatenation pipeline to eliminate the pathological O(N²) behavior of repeated string concatenation.
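To make the quadratic cost concrete, here is a small counting model. It is only a sketch: it assumes that every eager concatenation allocates a fresh buffer and copies the whole accumulated string, and the function names are hypothetical, not part of boa_string.

```rust
/// Bytes copied if every `a = a + chunk` allocates a fresh buffer
/// and copies the whole accumulated string into it.
fn bytes_copied_eager(chunk_len: usize, appends: usize) -> usize {
    let mut total = 0;
    let mut len = 0;
    for _ in 0..appends {
        len += chunk_len; // length of the new result
        total += len; // the whole result is written once per append
    }
    total
}

/// Bytes copied if concatenation only links nodes and the result
/// is flattened a single time at the end.
fn bytes_copied_lazy(chunk_len: usize, appends: usize) -> usize {
    chunk_len * appends
}
```

For 1,000 appends of a 10-byte chunk, the eager model copies 5,005,000 bytes, while the lazy model copies 10,000.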

Key Changes

Rope Strings

  • Added RopeString representation (core/string/src/vtable/rope.rs)
  • Lazy flattening: ropes are only flattened when a contiguous buffer is required
  • Balanced tree construction for batch concatenations
  • Depth limit (32) and small-string threshold to avoid pathological trees
  • Iterative traversal to prevent recursion overflow
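The bullets above can be illustrated with a minimal sketch. It assumes a simplified `Node` type built on `Rc` and `String`; this is a hypothetical stand-in, not the `RopeString` layout from this PR, and it leaves out the depth limit and small-string threshold.

```rust
use std::cell::OnceCell;
use std::rc::Rc;

/// Hypothetical, simplified rope node (not Boa's actual `RopeString`).
enum Node {
    Leaf(String),
    Concat {
        left: Rc<Node>,
        right: Rc<Node>,
        len: usize,
        // Filled the first time a contiguous buffer is requested.
        flat: OnceCell<String>,
    },
}

impl Node {
    fn len(&self) -> usize {
        match self {
            Node::Leaf(s) => s.len(),
            Node::Concat { len, .. } => *len,
        }
    }
}

/// Concatenation allocates one node and copies no string data.
fn concat(left: Rc<Node>, right: Rc<Node>) -> Rc<Node> {
    let len = left.len() + right.len();
    Rc::new(Node::Concat { left, right, len, flat: OnceCell::new() })
}

/// Returns a contiguous view, flattening lazily on first use.
/// The traversal uses an explicit stack instead of recursion.
fn as_str(node: &Node) -> &str {
    match node {
        Node::Leaf(s) => s.as_str(),
        Node::Concat { len, flat, .. } => flat
            .get_or_init(|| {
                let mut out = String::with_capacity(*len);
                let mut stack: Vec<&Node> = vec![node];
                while let Some(n) = stack.pop() {
                    match n {
                        Node::Leaf(s) => out.push_str(s),
                        Node::Concat { left, right, .. } => {
                            // Push right first so that left is visited first.
                            stack.push(&**right);
                            stack.push(&**left);
                        }
                    }
                }
                out
            })
            .as_str(),
    }
}
```

A second call to `as_str` on the same node returns the cached buffer without walking the tree again.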

Engine Integration

  • JsValue::add now creates ropes for large concatenations (len(lhs) + len(rhs) > 1024)
  • ConcatToString updated to use balanced batch concatenation
  • String.prototype.concat refactored to use the new API

Thin VTable & Performance Recovery

  • Moved method pointers to static vtables, reducing string header size
  • Devirtualized hot paths (clone, drop, as_str, code_unit_at)
  • Added cached u64 hash to speed up property lookups
  • Added ptr_eq + hash checks to optimize equality comparisons
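As a rough illustration of the last two bullets, the sketch below caches a `u64` hash and uses pointer identity plus the cached hashes as fast paths in equality. The `CachedStr` and `Inner` types are hypothetical, and using `0` as the "not computed" marker is an assumption loosely mirroring the `hash: 0` initializer visible in the diff.

```rust
use std::cell::Cell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical string payload; Boa's real header layout differs.
struct Inner {
    text: String,
    // 0 means "not computed yet" in this sketch.
    hash: Cell<u64>,
}

#[derive(Clone)]
struct CachedStr(Rc<Inner>);

impl CachedStr {
    fn new(text: &str) -> Self {
        Self(Rc::new(Inner { text: text.to_owned(), hash: Cell::new(0) }))
    }

    /// Computes the hash once and reuses it afterwards.
    fn hash(&self) -> u64 {
        let cached = self.0.hash.get();
        if cached != 0 {
            return cached;
        }
        let mut hasher = DefaultHasher::new();
        self.0.text.hash(&mut hasher);
        // Reserve 0 as the "unset" marker.
        let value = hasher.finish().max(1);
        self.0.hash.set(value);
        value
    }
}

impl PartialEq for CachedStr {
    fn eq(&self, other: &Self) -> bool {
        // Fast path 1: same allocation, so the contents are identical.
        if Rc::ptr_eq(&self.0, &other.0) {
            return true;
        }
        if self.0.text.len() != other.0.text.len() {
            return false;
        }
        // Fast path 2: both hashes are already cached and they differ.
        let (a, b) = (self.0.hash.get(), other.0.hash.get());
        if a != 0 && b != 0 && a != b {
            return false;
        }
        // Slow path: compare the contents.
        self.0.text == other.0.text
    }
}
```

Equal hashes never prove equality on their own, so the content comparison stays as the final step.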

Benchmarks

V8 Combined Suite

main : 168.13 s
rope : 166.20 s

1% improvement (within noise)

Concatenation Stress Test

main : 424 ms
rope : 156 ms

2.7× faster

Summary

This change removes quadratic concatenation behavior while maintaining baseline performance for general workloads. The thin vtable and devirtualization ensure that ropes introduce minimal overhead when not used.

@akash-R-A-J akash-R-A-J requested a review from a team as a code owner March 11, 2026 23:58
@codecov
codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 44.65318% with 383 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.72%. Comparing base (6ddc2b4) to head (f21309a).
⚠️ Report is 858 commits behind head on main.

Files with missing lines Patch % Lines
core/string/src/iter.rs 25.80% 115 Missing ⚠️
core/string/src/vtable/rope.rs 36.58% 104 Missing ⚠️
core/string/src/lib.rs 62.22% 68 Missing ⚠️
core/string/src/str.rs 43.80% 59 Missing ⚠️
cli/src/debug/string.rs 0.00% 8 Missing ⚠️
core/string/src/vtable/slice.rs 30.00% 7 Missing ⚠️
core/engine/src/builtins/intl/segmenter/mod.rs 0.00% 6 Missing ⚠️
core/runtime/src/text/encodings.rs 50.00% 4 Missing ⚠️
core/string/src/vtable/static.rs 62.50% 3 Missing ⚠️
core/string/src/builder.rs 75.00% 2 Missing ⚠️
... and 6 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #5006       +/-   ##
===========================================
+ Coverage   47.24%   58.72%   +11.47%     
===========================================
  Files         476      565       +89     
  Lines       46892    63124    +16232     
===========================================
+ Hits        22154    37069    +14915     
- Misses      24738    26055     +1317     


Comment on lines +36 to +65
if depth > 32 {
    // Auto-flatten if we hit the depth limit, unless the string is "insanely" large.
    // This bounds access time and recursion depth for other components.
    if left.len() + right.len() < 1_000_000 {
        let mut vec = Vec::with_capacity(left.len() + right.len());
        for s in [&left, &right] {
            match s.variant() {
                crate::JsStrVariant::Latin1(l) => {
                    vec.extend(l.iter().map(|&b| u16::from(b)));
                }
                crate::JsStrVariant::Utf16(u) => vec.extend_from_slice(u),
            }
        }
        return JsString::from(&vec[..]);
    }
}

let rope = Box::new(Self {
    header: RawJsString {
        vtable: &ROPE_VTABLE,
        len: left.len() + right.len(),
        refcount: 1,
        kind: JsStringKind::Rope,
        hash: 0,
    },
    left,
    right,
    flattened: OnceCell::new(),
    depth,
});
Member

@jedel1043 jedel1043 Mar 12, 2026

You might need to rebalance the rope following some kind of heuristic. Otherwise, you kinda lose the time complexity of O(log N) to get an element, since one side could get arbitrarily large, which converts indexing into an O(N) operation.
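The concern can be seen in a small sketch that counts how many nodes an index lookup visits. The `Rope` type here is a hypothetical simplification over `Vec<u16>` leaves, not the PR's data structure.

```rust
use std::rc::Rc;

/// Hypothetical, simplified rope over UTF-16 code units.
enum Rope {
    Leaf(Vec<u16>),
    Node { left: Rc<Rope>, right: Rc<Rope>, len: usize },
}

impl Rope {
    fn len(&self) -> usize {
        match self {
            Rope::Leaf(units) => units.len(),
            Rope::Node { len, .. } => *len,
        }
    }
}

fn concat(left: Rc<Rope>, right: Rc<Rope>) -> Rc<Rope> {
    let len = left.len() + right.len();
    Rc::new(Rope::Node { left, right, len })
}

/// Returns the code unit at `index` and the number of nodes visited.
fn code_unit_at(rope: &Rope, mut index: usize) -> Option<(u16, usize)> {
    let mut node = rope;
    let mut visited = 1;
    loop {
        match node {
            Rope::Leaf(units) => return units.get(index).map(|&u| (u, visited)),
            Rope::Node { left, right, .. } => {
                if index < left.len() {
                    node = &**left;
                } else {
                    index -= left.len();
                    node = &**right;
                }
                visited += 1;
            }
        }
    }
}
```

Appending 99 single-unit leaves one at a time produces a left-leaning rope in which reading index 0 visits 100 nodes, while reading the last index visits 2. The lookup cost follows the depth of the path, not the logarithm of the length.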

Contributor Author

Yes, I've addressed rebalancing in two ways:

Batch Concatenations: The concat_strings_balanced helper uses a mid-point split to build a perfectly balanced tree (log N depth).

Incremental Concatenations: I've implemented a hard depth limit of 32 in RopeString::create. If this limit is exceeded, the rope auto-flattens into a contiguous string. This effectively bounds indexing to O(32), which is O(1) in practice, preventing the O(N) worst-case.

This keeps the implementation simple while providing a guaranteed bound on access time.

Member

@jedel1043 jedel1043 Mar 12, 2026

The concat_strings_balanced helper uses a mid-point split to build a perfectly balanced tree (log N depth).

That's not a balanced tree because you only consider the top strings, not if the strings themselves are ropes. If you pass an array of 3 strings, where 2 strings are ropes of depth 2 and 1 string is a rope of depth 28, you will have a very unbalanced tree.

This effectively bounds indexing to O(32), which is O(1) in practice, preventing the O(N) worst-case.

This is not O(1) in practice, because if you have thousands of strings of depth 32 and you need to index through all of them, you would pay the cost of O(N).
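The three-string example can be reproduced with a sketch that tracks only depth. `concat_midpoint` is a hypothetical stand-in for a mid-point split over the top-level items; the actual `concat_strings_balanced` in the PR may differ in its details.

```rust
use std::rc::Rc;

/// Hypothetical rope that only records the shape of the tree.
enum Rope {
    Leaf,
    Node(Rc<Rope>, Rc<Rope>, usize), // left, right, cached depth
}

fn depth(rope: &Rope) -> usize {
    match rope {
        Rope::Leaf => 0,
        Rope::Node(_, _, d) => *d,
    }
}

fn concat(left: Rc<Rope>, right: Rc<Rope>) -> Rc<Rope> {
    let d = 1 + depth(&left).max(depth(&right));
    Rc::new(Rope::Node(left, right, d))
}

/// Mid-point split over the top-level items only; the depth of each
/// item is ignored.
fn concat_midpoint(items: &[Rc<Rope>]) -> Rc<Rope> {
    match items {
        [] => Rc::new(Rope::Leaf),
        [one] => Rc::clone(one),
        _ => {
            let (left, right) = items.split_at(items.len() / 2);
            concat(concat_midpoint(left), concat_midpoint(right))
        }
    }
}

/// Builds a left-leaning rope of the given depth.
fn chain(d: usize) -> Rc<Rope> {
    let mut rope = Rc::new(Rope::Leaf);
    for _ in 0..d {
        rope = concat(rope, Rc::new(Rope::Leaf));
    }
    rope
}
```

With inputs of depth 2, 2 and 28, the result has depth 30 even though it holds only 35 leaves; three plain leaves give depth 2.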

Contributor Author

@akash-R-A-J akash-R-A-J Mar 12, 2026

The current implementation uses a depth cap to avoid pathological trees, but you're right that it doesn't guarantee logarithmic access in all cases.

Would it make sense to adopt a Fibonacci-based rebalancing heuristic (similar to classical rope implementations) so that we guarantee weight-balanced ropes instead of relying on flattening?

Member

Yep, that's pretty much the standard for ropes
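For reference, the classical Fibonacci criterion treats a rope of depth d as balanced when its length is at least F(d+2), with F(0) = 0 and F(1) = 1. The sketch below only checks that condition; the function names are hypothetical, and the thread does not say which exact rule or rebalancing step the PR will adopt.

```rust
/// n-th Fibonacci number with F(0) = 0 and F(1) = 1 (saturating on overflow).
fn fib(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a.saturating_add(b);
        a = b;
        b = next;
    }
    a
}

/// Classical rope balance test: depth `d` requires length >= F(d + 2).
fn is_balanced(depth: u32, len: u64) -> bool {
    len >= fib(depth + 2)
}
```

Under this criterion a rope of depth 32 would need a length of at least 5,702,887 to count as balanced, so a rope that hits the depth cap at a length below 1,000,000 would be a candidate for rebalancing.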

// SAFETY: Caller must ensure the type matches.
unsafe { self.ptr.cast::<T>().as_ref() }
#[must_use]
pub fn as_str(&self) -> JsStr<'static> {
Member

@jedel1043 jedel1043 Mar 12, 2026

This is unsound. It's returning a 'static reference to a temporary JsString, so you could deallocate the original string and the compiler will happily let you access the JsStr<'static>:

let s = JsString::from_str("undefined behaviour!");
let temp = s.as_str();
drop(s);
println!("{}", temp.display_lossy()); // UB!
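For contrast, a sketch of the sound shape: the returned view borrows from `&self`, so the borrow checker rejects the sequence above. `OwnedStr` and `StrView` are hypothetical stand-ins for `JsString` and `JsStr`, not the real types.

```rust
/// Hypothetical stand-in for an owned, heap-allocated string.
struct OwnedStr {
    data: Box<str>,
}

/// Hypothetical stand-in for a borrowed view into that string.
#[derive(Clone, Copy)]
struct StrView<'a> {
    data: &'a str,
}

impl OwnedStr {
    fn new(s: &str) -> Self {
        Self { data: s.into() }
    }

    /// The view's lifetime is tied to `&self`, so it cannot outlive the owner.
    fn as_str(&self) -> StrView<'_> {
        StrView { data: &*self.data }
    }
}

impl StrView<'_> {
    fn len(self) -> usize {
        self.data.len()
    }
}

// With this signature, the pattern from the review no longer compiles:
//
//     let s = OwnedStr::new("x");
//     let temp = s.as_str();
//     drop(s);    // error[E0505]: cannot move out of `s` because it is borrowed
//     temp.len();
```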

Contributor Author

@akash-R-A-J akash-R-A-J Mar 12, 2026

Thanks for pointing this out! You're absolutely right; that was a leftover from the initial pointer-based POC, written while I was focusing on the memory layout. I've now refactored the vtable to use HRTBs for as_str, which replaces that temporary shim with a proper lifetime binding. It was my fault.
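The refactored vtable is not shown in this thread, so the following is only a guess at the general shape: a function-pointer field with a higher-ranked lifetime, so that the returned slice borrows from whatever reference the caller passes in. `Payload`, `VTable` and `FLAT_VTABLE` are hypothetical names.

```rust
/// Hypothetical vtable with a higher-ranked function pointer.
struct VTable {
    // For every caller-chosen 'a, the result borrows from the argument.
    as_str: for<'a> fn(&'a Payload) -> &'a str,
}

/// Hypothetical string object that dispatches through a static vtable.
struct Payload {
    vtable: &'static VTable,
    data: String,
}

// Lifetime elision gives this function the higher-ranked signature above.
fn flat_as_str(payload: &Payload) -> &str {
    payload.data.as_str()
}

static FLAT_VTABLE: VTable = VTable { as_str: flat_as_str };

impl Payload {
    /// The returned slice is tied to `&self`, never to `'static`.
    fn as_str(&self) -> &str {
        (self.vtable.as_str)(self)
    }
}
```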

Contributor

@zhuzhu81998 zhuzhu81998 left a comment

CI shows a segmentation fault (potential cause below).

let specifier = specifier.cow_replace('/', "\\");

let short_path = Path::new(&specifier);
let short_path = Path::new(&*specifier);
Contributor

what does this have to do with rope strings?

pub(crate) mod rope;
pub(crate) use rope::RopeString;

/// Header for all `JsString` allocations.
Contributor

Then call this JsStringHeader instead of RawJsString perhaps?

Contributor Author

sounds good, will replace RawJsString with the suggested JsStringHeader

Comment on lines +37 to +39
unsafe impl Sync for RawJsString {}
// SAFETY: RawJsString contains only thread-safe data.
unsafe impl Send for RawJsString {}
Contributor

do you need this somewhere?

Comment on lines +60 to +63
// SAFETY: We only mutate refcount and hash via atomic-casts when kind != Static.
unsafe impl Sync for JsStringVTable {}
// SAFETY: JsStringVTable contains only thread-safe data.
unsafe impl Send for JsStringVTable {}
Contributor

same question: where is the need?

Comment on lines +61 to +70
/// A rope string that is a tree of other strings.
#[repr(C)]
pub(crate) struct RopeString {
    /// Standardized header for all strings.
    pub(crate) header: RawJsString,
    pub(crate) left: JsString,
    pub(crate) right: JsString,
    flattened: OnceCell<JsString>,
    pub(crate) depth: u8,
}
Contributor

Hmm, so it is my understanding that JsString here will most likely be GC-tracked, and there is no way for the GC to know that RopeString is holding references to left and right here?
Are you doing something to prevent this?

Member

To clarify: JsStrings are not GC-tracked; they're independently ref-counted, so it's fine to not trace through them. I'm fairly certain the UB comes from the modifications made to as_str that removed the lifetime inheritance for JsStr<'static>.

Contributor

Indeed, it's not the GC. Disabling the GC does not resolve the segmentation fault XD.

@github-actions
github-actions bot commented Mar 14, 2026

Test262 conformance changes

Test result main count PR count difference
Total 52,963 52,963 0
Passed 49,935 49,934 -1
Ignored 2,207 2,207 0
Failed 821 822 +1
Panics 0 0 0
Conformance 94.28% 94.28% -0.00%
Broken tests (1):
test/staging/sm/String/string-pad-start-end.js (previously Passed)

Tested main commit: ea849b7140574c446fbad07ccb7a7e0a3b85cb17
Tested PR commit: f6061626236bd61d1be681ce144526934de5bbf0
Compare commits: ea849b7...f606162

pub(crate) header: JsStringHeader,
pub(crate) left: JsString,
pub(crate) right: JsString,
flattened: OnceLock<JsString>,
Member

Uhh I doubt you need a thread safe OnceCell here. JsString implements !Send and !Sync, so it's kinda unnecessary to make this thread safe.

Contributor Author

Thanks! I've switched it to OnceCell to avoid any unnecessary thread-safe overhead.

Contributor Author

Reverting this back to OnceLock as OnceCell panics on reentrant initialization, and since rope trees can share nodes symmetrically, flattening may recursively call .as_str() on the same node, leading to reentrant initialization and test262 failures.

Member

Uhh if you have that behaviour that's a bug though... because if you have a child that recursively references itself, things like code_point_at could infinitely recurse.

Contributor Author

This is not a self reference cycle. Since JsString is refcounted, ropes can share subtrees and form DAGs. During flattening the same node may be reached through multiple paths before its cache is initialized, which triggers reentrant initialization in OnceCell.

Member

Then I don't see the reentrancy. If you have a child that is shared between two nodes, the first access would build the flat version, and the second access would just fetch the already constructed array, no?

But I think this is all moot because your code already does a DFS, so it should only call as_str for the leaves, which are never ropes.
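The point about shared children can be checked with a small sketch. `Shared` is a hypothetical node with a build counter; reaching it through two handles initializes the `OnceCell` once. `OnceCell::get_or_init` panics only when the initializing closure itself re-enters the same cell, which does not happen here.

```rust
use std::cell::{Cell, OnceCell};
use std::rc::Rc;

/// Hypothetical shared node with a lazily built flat form.
struct Shared {
    parts: Vec<&'static str>,
    flat: OnceCell<String>,
    // Counts how many times the flat form was built.
    builds: Cell<u32>,
}

impl Shared {
    fn new(parts: Vec<&'static str>) -> Self {
        Self { parts, flat: OnceCell::new(), builds: Cell::new(0) }
    }

    fn as_str(&self) -> &str {
        self.flat
            .get_or_init(|| {
                self.builds.set(self.builds.get() + 1);
                self.parts.concat()
            })
            .as_str()
    }
}

/// Reads the same child through two handles, as a DAG-shaped rope would.
fn join_twice(child: &Rc<Shared>) -> String {
    let (left, right) = (Rc::clone(child), Rc::clone(child));
    format!("{}{}", left.as_str(), right.as_str())
}
```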

Contributor Author

@akash-R-A-J akash-R-A-J Mar 14, 2026

Ahh, you're right that the DFS traversal only calls as_str() on leaf strings, not ropes, so rope_as_str itself shouldn't recursively reenter. The earlier panic I observed with OnceCell likely came from another path (possibly inside concat_array). I'll investigate that path more closely.

@akash-R-A-J akash-R-A-J requested a review from jedel1043 March 14, 2026 02:23
@akash-R-A-J
Contributor Author

@jedel1043 I believe all requested changes have now been addressed. CI is green across all platforms, and test262 results match main.

Please let me know if anything else needs adjustment.

@github-actions github-actions bot added Waiting On Review Waiting on reviews from the maintainers C-Tests Issues and PRs related to the tests. C-Builtins PRs and Issues related to builtins/intrinsics C-VM Issues and PRs related to the Boa Virtual Machine. and removed Waiting On Review Waiting on reviews from the maintainers labels Mar 15, 2026
@github-actions github-actions bot added the Waiting On Review Waiting on reviews from the maintainers label Mar 15, 2026
Comment on lines +97 to +101
// Using a raw buffer cache instead of `JsString` in `OnceCell` solves the "refcount aliasing"
// problem. Storing a refcounted `JsString` inside a `OnceCell` could lead to ownership cycles
// or use-after-free errors if shared nodes were dropped while still cached.
// By storing raw `Box<[u8]>` or `Box<[u16]>`, we decouple the cache from the refcounting system.
flattened: OnceCell<Flattened>,
Member

Just use Option<JsString>...

Contributor Author

okay

Member

Sorry it should be Cell<Option<JsString>>

Member

Otherwise you won't be able to modify it 😅

Member

Ah wait that doesn't work, you need to have something referenceable

Member

Hmmm... what if we just make as_str return Option<JsStr>? That kinda makes more sense since flattening ropes should be reserved to only exceptional situations such as reaching the maximum depth.

Member

This also nicely reflects in our inner API that not all strings have a JsStr available to use, and the caller should be the one responsible for accessing the string in other ways if that's the case
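A minimal sketch of what that API shape could look like for callers follows. The `Str` enum and `count_char` are hypothetical; the real change would touch many built-ins, as noted in the reply that follows.

```rust
/// Hypothetical string that is either contiguous or a rope.
enum Str {
    Flat(String),
    Rope(Box<Str>, Box<Str>),
}

impl Str {
    /// A contiguous view exists only for flat strings.
    fn as_str(&self) -> Option<&str> {
        match self {
            Str::Flat(s) => Some(s.as_str()),
            Str::Rope(..) => None,
        }
    }
}

/// Example caller: counts occurrences of `needle` without flattening.
fn count_char(s: &Str, needle: char) -> usize {
    // Fast path: a contiguous view is available.
    if let Some(flat) = s.as_str() {
        return flat.chars().filter(|&c| c == needle).count();
    }
    // Fallback: walk the leaves with an explicit stack.
    let mut stack: Vec<&Str> = vec![s];
    let mut count = 0;
    while let Some(node) = stack.pop() {
        match node {
            Str::Flat(text) => count += text.chars().filter(|&c| c == needle).count(),
            Str::Rope(left, right) => {
                stack.push(&**right);
                stack.push(&**left);
            }
        }
    }
    count
}
```

The caller decides how to handle the `None` case, which keeps flattening out of the common path.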

Contributor Author

@akash-R-A-J akash-R-A-J Mar 15, 2026

That makes sense from an API design perspective. However, changing as_str() to return Option<JsStr> would require a fairly large refactor across the engine since many built-ins and internal methods currently rely on it always returning a JsStr. Would it be okay if I explore that in a separate follow-up PR?

@akash-R-A-J akash-R-A-J requested a review from jedel1043 March 15, 2026 03:23
@akash-R-A-J akash-R-A-J requested a review from nekevss as a code owner March 15, 2026 14:59
@github-actions github-actions bot added C-CLI Issues and PRs related to the Boa command line interface. C-Intl Changes related to the `Intl` implementation C-Runtime Issues and PRs related to Boa's runtime features labels Mar 15, 2026

Labels

C-Builtins PRs and Issues related to builtins/intrinsics C-CLI Issues and PRs related to the Boa command line interface. C-Intl Changes related to the `Intl` implementation C-Runtime Issues and PRs related to Boa's runtime features C-Tests Issues and PRs related to the tests. C-VM Issues and PRs related to the Boa Virtual Machine. Waiting On Review Waiting on reviews from the maintainers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize string concatenation by introducing Rope strings

3 participants