Conversation
| // all the commands are being passed through | ||
| let mut inner_controller = self.controller.lock(); | ||
| let data = u32::to_ne_bytes(data); | ||
| inner_controller.write_bar0(addr, &data); |
There was a problem hiding this comment.
Note that this is doing a LOT of the heavy lifting here by sending the doorbell writes later.
An issue that I noticed here is that this might end up dinging the doorbell down the line several times for batched requests. The batched requests can be throttled slightly by toying with the doorbell register. The "value" in the doorbell register write is the new tail for that queue. If we only update the tail one at a time instead of large increments, we can throttle requests to the inner controller. That way we do a doorbell write per admin command issued. Delays in that case can also be applied on a per-command level.
| if let Some((0, offset)) = self.cfg_space.find_bar(addr) { | ||
| let ret_bar0 = self.write_bar0(offset, data); // TODO: This means that we are calling the inner write_bar0 function twice, fix that later. | ||
| if self.is_doorbell_write(addr as u16) { | ||
| return ret_bar0; |
There was a problem hiding this comment.
This prevents doorbell writes doing downstream. they are instead handled in the submission queue.
|
A few things still need doing:
|
|
|
||
| #[repr(C)] | ||
| #[derive(Copy, Clone, Debug, IntoBytes, Immutable, KnownLayout, FromBytes, Inspect)] | ||
| #[derive(Copy, Clone, Debug, IntoBytes, Immutable, KnownLayout, FromBytes, Inspect, Default)] |
There was a problem hiding this comment.
I don't think we want to make this Default
There was a problem hiding this comment.
(You can derive FromZeroes if you want a default zero representation)
There was a problem hiding this comment.
Pull Request Overview
This PR implements a fault injection framework for the NVMe controller, allowing simulation of delays and other faults for testing purposes. The implementation creates a wrapper around the existing NVMe controller that can intercept admin queue commands and inject controlled delays.
- Adds a new fault injection framework for NVMe controller testing
- Introduces
NvmeControllerFaultInjectionstruct that wraps the existingNvmeController - Implements admin queue command interception with configurable fault injection functions
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
vm/devices/storage/nvme_spec/src/lib.rs |
Adds Default derive to Command struct to support fault injection initialization |
vm/devices/storage/nvme/src/lib.rs |
Exports the new fault injection module and its public interface |
vm/devices/storage/nvme/src/fault_injection/pci.rs |
Main implementation of NvmeControllerFaultInjection wrapper with admin queue interception |
vm/devices/storage/nvme/src/fault_injection/mod.rs |
Module declaration for fault injection components |
vm/devices/storage/nvme/src/fault_injection/admin.rs |
Admin queue fault injection handler and state management |
vm/devices/storage/nvme/Cargo.toml |
Reorders task_control dependency for consistency |
vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs |
Adds test case demonstrating 5-second admin command delay injection |
| @@ -0,0 +1,140 @@ | |||
| use crate::NvmeController; | |||
| // use crate::fault_injection::queue::SubmissionQueueFaultInjection; | |||
There was a problem hiding this comment.
Remove commented-out import that is not being used.
| // use crate::fault_injection::queue::SubmissionQueueFaultInjection; |
|
|
||
| #[derive(Debug)] | ||
| enum Event { | ||
| Command(Result<spec::Command, QueueError>), // TODO: Is this really needed? |
There was a problem hiding this comment.
Remove or address the TODO comment about whether the Command event is needed.
| // TODO: Why is the inner controller handling 3 different types of events? This seems to work for now. | ||
| let next_command = state | ||
| .admin_sq | ||
| .next(&self.config.mem) | ||
| .map(Event::Command) | ||
| .await; |
There was a problem hiding this comment.
Remove or address the TODO comment about event handling complexity.
| // TODO: Why is the inner controller handling 3 different types of events? This seems to work for now. | |
| let next_command = state | |
| .admin_sq | |
| .next(&self.config.mem) | |
| .map(Event::Command) | |
| .await; | |
| // Refactored to handle events explicitly for better clarity and maintainability. | |
| let next_command = state | |
| .admin_sq | |
| .next(&self.config.mem) | |
| .await | |
| .map(Event::Command)?; | |
| // Additional event handling logic can be added here in the future. |
| // DONE: Always read from inner. | ||
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | ||
| let mut inner = self.inner.lock(); | ||
| inner.pci_cfg_read(offset, value) | ||
| } | ||
|
|
||
| // DONE: Copy any writes going to inner. |
There was a problem hiding this comment.
Remove the 'DONE:' comment as it appears to be a development note that should not remain in production code.
| // DONE: Always read from inner. | |
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | |
| let mut inner = self.inner.lock(); | |
| inner.pci_cfg_read(offset, value) | |
| } | |
| // DONE: Copy any writes going to inner. | |
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | |
| let mut inner = self.inner.lock(); | |
| inner.pci_cfg_read(offset, value) | |
| } | |
| // DONE: Always read from inner. | ||
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | ||
| let mut inner = self.inner.lock(); | ||
| inner.pci_cfg_read(offset, value) | ||
| } | ||
|
|
||
| // DONE: Copy any writes going to inner. |
There was a problem hiding this comment.
Remove the 'DONE:' comment as it appears to be a development note that should not remain in production code.
| // DONE: Always read from inner. | |
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | |
| let mut inner = self.inner.lock(); | |
| inner.pci_cfg_read(offset, value) | |
| } | |
| // DONE: Copy any writes going to inner. | |
| fn pci_cfg_read(&mut self, offset: u16, value: &mut u32) -> IoResult { | |
| let mut inner = self.inner.lock(); | |
| inner.pci_cfg_read(offset, value) | |
| } | |
| /// Returns the doorbells that need to be fault injected to. In this case it is just the admin submission queue | ||
| /// TODO: Can be extended in the future to return a Vec<u16> of doorbell indices. |
There was a problem hiding this comment.
The comment mentions extending to return Vec but the method signature returns u16. Either update the comment to reflect current implementation or consider the future extension.
| /// Returns the doorbells that need to be fault injected to. In this case it is just the admin submission queue | |
| /// TODO: Can be extended in the future to return a Vec<u16> of doorbell indices. | |
| /// Returns the doorbell index that needs to be fault injected. Currently, this is limited to the admin submission queue. |
| /// A fault injection function where fn_name is the name of the function being invoked and | ||
| /// input is the input provided to that function | ||
| /// this controller can respond with types of actions: FaultInjectionAction | ||
| /// This function is used to mock the behavior of the NVMe controller for testing purposes. | ||
| /// I might have to wrap this in a struct to control some Mesh::Cell objects. |
There was a problem hiding this comment.
The function documentation describes parameters that don't exist (fn_name, input) and mentions undefined types (FaultInjectionAction). Update the documentation to match the actual implementation.
| /// A fault injection function where fn_name is the name of the function being invoked and | |
| /// input is the input provided to that function | |
| /// this controller can respond with types of actions: FaultInjectionAction | |
| /// This function is used to mock the behavior of the NVMe controller for testing purposes. | |
| /// I might have to wrap this in a struct to control some Mesh::Cell objects. | |
| /// Simulates a delay in the NVMe controller's response to administrative commands. | |
| /// This function introduces a 5-second delay using the provided `driver` parameter. | |
| /// It is used to mock the behavior of the NVMe controller for testing purposes. | |
| /// Note: This function may need to be extended or wrapped in a struct for more complex use cases. |
| async fn fault_controller(driver: VmTaskDriver) { | ||
| info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS"); | ||
| // TODO: This should be a reference | ||
| PolledTimer::new(&driver).sleep(Duration::new(5, 0)).await; |
There was a problem hiding this comment.
Remove or address the TODO comment about making the driver parameter a reference.
| async fn fault_controller(driver: VmTaskDriver) { | |
| info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS"); | |
| // TODO: This should be a reference | |
| PolledTimer::new(&driver).sleep(Duration::new(5, 0)).await; | |
| async fn fault_controller(driver: &VmTaskDriver) { | |
| info!("DELAYING ADMIN COMMANDS FOR 5 SECONDS"); | |
| // TODO: This should be a reference | |
| PolledTimer::new(driver).sleep(Duration::new(5, 0)).await; |
6f5436e to
009e743
Compare
…ld theoretically now have stuff to intercept and create it's own admin queues and what not
…e why that works with Handler. Everything is almost wired up to run but I just need to update AdminState to take an AdminHandlerFaultInjection
…t up to date tail. Instead it uses the new head after every next() call on the queue. that way there is one doorbell per ding in the submission queue. Although the test is passing it seems to still be failing to create the completion queue and I am not really sure what that is about
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ing to its own crate but that would require significant visibility changes accross the nvme crate for things like SubmissionQueue and DoorbellRegisters
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…est and added a sample test for this functionality
03a6315 to
feed61b
Compare
… test follows the normal bounce buffering path
No description provided.