Skip to content

Controller delay#12

Open
gurasinghMS wants to merge 66 commits intomainfrom
controller_delay
Open

Controller delay#12
gurasinghMS wants to merge 66 commits intomainfrom
controller_delay

Conversation

@gurasinghMS
Copy link
Owner

No description provided.

// all the commands are being passed through
let mut inner_controller = self.controller.lock();
let data = u32::to_ne_bytes(data);
inner_controller.write_bar0(addr, &data);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is doing a LOT of the heavy lifting here by sending the doorbell writes later.
An issue that I noticed here is that this might end up dinging the doorbell down the line several times for batched requests. The batched requests can be throttled slightly by toying with the doorbell register. The "value" in the doorbell register write is the new tail for that queue. If we only update the tail one at a time instead of large increments, we can throttle requests to the inner controller. That way we do a doorbell write per admin command issued. Delays in that case can also be applied on a per-command level.

if let Some((0, offset)) = self.cfg_space.find_bar(addr) {
let ret_bar0 = self.write_bar0(offset, data); // TODO: This means that we are calling the inner write_bar0 function twice, fix that later.
if self.is_doorbell_write(addr as u16) {
return ret_bar0;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents doorbell writes doing downstream. they are instead handled in the submission queue.

@gurasinghMS
Copy link
Owner Author

A few things still need doing:

  • Replicate the AdminQueue enable / disable flow from the actual controller
  • Figure out why the I/O completion queue creation is not working (even though the test itself is passing)
  • Some cleanup of unused variables and such.
  • Add some comments on the what and why for the notable functions
  • Add fault injection to the PCI side of the controller too!


#[repr(C)]
#[derive(Copy, Clone, Debug, IntoBytes, Immutable, KnownLayout, FromBytes, Inspect)]
#[derive(Copy, Clone, Debug, IntoBytes, Immutable, KnownLayout, FromBytes, Inspect, Default)]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to make this Default

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(You can derive FromZeroes if you want a default zero representation)

@gurasinghMS gurasinghMS requested a review from Copilot July 30, 2025 17:52
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a fault injection framework for the NVMe controller, allowing simulation of delays and other faults for testing purposes. The implementation creates a wrapper around the existing NVMe controller that can intercept admin queue commands and inject controlled delays.

  • Adds a new fault injection framework for NVMe controller testing
  • Introduces NvmeControllerFaultInjection struct that wraps the existing NvmeController
  • Implements admin queue command interception with configurable fault injection functions

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
vm/devices/storage/nvme_spec/src/lib.rs Adds Default derive to Command struct to support fault injection initialization
vm/devices/storage/nvme/src/lib.rs Exports the new fault injection module and its public interface
vm/devices/storage/nvme/src/fault_injection/pci.rs Main implementation of NvmeControllerFaultInjection wrapper with admin queue interception
vm/devices/storage/nvme/src/fault_injection/mod.rs Module declaration for fault injection components
vm/devices/storage/nvme/src/fault_injection/admin.rs Admin queue fault injection handler and state management
vm/devices/storage/nvme/Cargo.toml Reorders task_control dependency for consistency
vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs Adds test case demonstrating 5-second admin command delay injection

@@ -0,0 +1,140 @@
use crate::NvmeController;
// use crate::fault_injection::queue::SubmissionQueueFaultInjection;
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out import that is not being used.

Suggested change
// use crate::fault_injection::queue::SubmissionQueueFaultInjection;

Copilot uses AI. Check for mistakes.

#[derive(Debug)]
enum Event {
Command(Result<spec::Command, QueueError>), // TODO: Is this really needed?
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or address the TODO comment about whether the Command event is needed.

Copilot uses AI. Check for mistakes.
Comment on lines +84 to +92
// TODO: Why is the inner controller handling 3 different types of events? This seems to work for now.
let next_command = state
.admin_sq
.next(&self.config.mem)
.map(Event::Command)
.await;
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or address the TODO comment about event handling complexity.

Suggested change
// TODO: Why is the inner controller handling 3 different types of events? This seems to work for now.
let next_command = state
.admin_sq
.next(&self.config.mem)
.map(Event::Command)
.await;
// Refactored to handle events explicitly for better clarity and maintainability.
let next_command = state
.admin_sq
.next(&self.config.mem)
.await
.map(Event::Command)?;
// Additional event handling logic can be added here in the future.

Copilot uses AI. Check for mistakes.
Comment on lines +340 to +393
// DONE: Always read from inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}

// DONE: Copy any writes going to inner.
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the 'DONE:' comment as it appears to be a development note that should not remain in production code.

Suggested change
// DONE: Always read from inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}
// DONE: Copy any writes going to inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}

Copilot uses AI. Check for mistakes.
Comment on lines +340 to +393
// DONE: Always read from inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}

// DONE: Copy any writes going to inner.
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the 'DONE:' comment as it appears to be a development note that should not remain in production code.

Suggested change
// DONE: Always read from inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}
// DONE: Copy any writes going to inner.
fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult {
let mut inner = self.inner.lock();
inner.pci_cfg_read(offset, value)
}

Copilot uses AI. Check for mistakes.
Comment on lines +135 to +139
/// Returns the doorbells that need to be fault injected to. In this case it is just the admin submission queue
/// TODO: Can be extended in the future to return a Vec<u16> of doorbell indices.
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions extending to return Vec but the method signature returns u16. Either update the comment to reflect current implementation or consider the future extension.

Suggested change
/// Returns the doorbells that need to be fault injected to. In this case it is just the admin submission queue
/// TODO: Can be extended in the future to return a Vec<u16> of doorbell indices.
/// Returns the doorbell index that needs to be fault injected. Currently, this is limited to the admin submission queue.

Copilot uses AI. Check for mistakes.
Comment on lines +437 to +441
/// A fault injection function where fn_name is the name of the function being invoked and
/// input is the input provided to that function
/// this controller can respond with types of actions: FaultInjectionAction
/// This function is used to mock the behavior of the NVMe controller for testing purposes.
/// I might have to wrap this in a struct to control some Mesh::Cell objects.
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function documentation describes parameters that don't exist (fn_name, input) and mentions undefined types (FaultInjectionAction). Update the documentation to match the actual implementation.

Suggested change
/// A fault injection function where fn_name is the name of the function being invoked and
/// input is the input provided to that function
/// this controller can respond with types of actions: FaultInjectionAction
/// This function is used to mock the behavior of the NVMe controller for testing purposes.
/// I might have to wrap this in a struct to control some Mesh::Cell objects.
/// Simulates a delay in the NVMe controller's response to administrative commands.
/// This function introduces a 5-second delay using the provided `driver` parameter.
/// It is used to mock the behavior of the NVMe controller for testing purposes.
/// Note: This function may need to be extended or wrapped in a struct for more complex use cases.

Copilot uses AI. Check for mistakes.
Comment on lines +442 to +445
async fn fault_controller(driver: VmTaskDriver) {
info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS");
// TODO: This should be a reference
PolledTimer::new(&driver).sleep(Duration::new(5, 0)).await;
Copy link

Copilot AI Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or address the TODO comment about making the driver parameter a reference.

Suggested change
async fn fault_controller(driver: VmTaskDriver) {
info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS");
// TODO: This should be a reference
PolledTimer::new(&driver).sleep(Duration::new(5, 0)).await;
async fn fault_controller(driver: &VmTaskDriver) {
info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS");
// TODO: This should be a reference
PolledTimer::new(driver).sleep(Duration::new(5, 0)).await;

Copilot uses AI. Check for mistakes.
…ld theoretically now have stuff to intercept and create it's own admin queues and what not
…e why that works with Handler. Everything is almost wired up to run but I just need to update AdminState to take an AdminHandlerFaultInjection
…t up to date tail. Instead it uses the new head after every next() call on the queue. that way there is one doorbell per ding in the submission queue. Although the test is passing it seems to still be failing to create the completion queue and I am not really sure what that is about
gurasinghMS and others added 17 commits August 6, 2025 13:57
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ing to its own crate but that would require significant visibility changes accross the nvme crate for things like SubmissionQueue and DoorbellRegisters
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…est and added a sample test for this functionality
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants