Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk is unusable for minutes at a time #10

Open
WorkingRobot opened this issue Feb 22, 2021 · 20 comments · May be fixed by #14
Open

Disk is unusable for minutes at a time #10

WorkingRobot opened this issue Feb 22, 2021 · 20 comments · May be fixed by #14

Comments

@WorkingRobot
Copy link

In my current use of WinSpd, there are somewhat rare occasions, usually while the disk is being read from, when it completely freezes up and refuses to respond for a while. When that happens, the disk's active time spikes to 100%, despite not having any active transfers (as reported by Task Manager.) During this "hang time", none of my callbacks are running either. In perfmon, it reports that the "Current Disk Queue Length" is 1, with all other values being 0. Because both the "% Disk Time" and "% Idle Time" is 0%, it leads me to believe that there is some sort of an internal problem. My mounted drive is write protected, meaning that there should be no other interferences.

Task Manager during hang
perfmon during hang

@WorkingRobot
Copy link
Author

Doing some more research into this, and it seems that when I changed ThreadCount in SpdStorageUnitStartDispatcher to 1, I no longer experienced this issue.

@billziss-gh
Copy link
Collaborator

I have never seen this problem.

Given that the problem resolves itself when you use a single thread, I am wondering whether the problem is some form of race condition when you are using multiple threads.

@WorkingRobot
Copy link
Author

That was my thought too, but I recently restarted my computer, and the problem had oddly fixed itself with no changes to the code. I guess it's some odd Windows stuff.

@billziss-gh
Copy link
Collaborator

Here is something to know: if you process is a console one and you accidentally make a selection in the console window, this can result in the whole process freezing under some circumstances until you unselect the text. This has bitten me a couple of times.

@WorkingRobot
Copy link
Author

WorkingRobot commented Feb 23, 2021

No, it definitely isn't that, especially since none of my callback code reports to actually be running. I had set up an atomic integer that increments/decrements and prints its value when entering or exiting the callback, and it just stopped getting called during those hangs. It happened even while I didn't have a debugger attached. If this issue pops up again, I'll make sure to screen record it.

@extratype
Copy link

extratype commented Mar 18, 2021

I have this issue too. Here's how I reproduce it:

  1. Use rawdisk-x64 to create a few GB disk.
  2. Format it in NTFS.
  3. Write a file, several hundreds MB's in size.
  4. Unmount and mount again to flush disk cache. Then read it:
#include <stdlib.h>
#include <stdio.h>
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>

int wmain(int argc, wchar_t** argv)
{
    if (argc < 3) return -1;

    int bufsize = _wtoi(argv[1]);
    UINT8* buf = (UINT8*) malloc(bufsize);
    HANDLE h = CreateFileW(argv[2], GENERIC_READ, 0, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    UINT64 progress = 0;
    while (true)
    {
        DWORD bytesRead;
        if (!ReadFile(h, buf, bufsize, &bytesRead, NULL)) break;
        if (bytesRead == 0) break;

        progress += bytesRead;
        printf("%lld\n", progress);
    }

    CloseHandle(h);
    return 0;
}

(in my case args are "32768 file")
(similar effect using 7-Zip: just right-click the file and select "CRC SHA" > "CRC-32")

  1. (optional) Repeat 4. a few times more.
  2. Eventually the WinSpd driver stops responding.
    The call stack says it waits for DeviceIoControl() called by SpdIoctlTransact() inside SpdStorageUnitHandleTransact().
    The issue goes away with ThreadCount == 1.

@WorkingRobot
Copy link
Author

I have this issue too.

Whew, and I thought I was the crazy one.

@WorkingRobot WorkingRobot reopened this Mar 18, 2021
@extratype
Copy link

I think the problem lies in the kernel (driver or Windows). I can reproduce it without locks and file mappings in the user-mode callbacks. It doesn't occur while reading a file randomly in threads. It occurs while reading sequentially. Also I set a lock in Read() with ThreadCount == 0, but it didn't work either.

@extratype
Copy link

extratype commented Mar 26, 2021

I tried to trace some kernel code and noticed SpdIoqStartProcessingSrb() freezes from time to time. There seems to be a synchronization problem inside it.

@billziss-gh
Copy link
Collaborator

@extratype thanks for the debugging on this.

I do not have time to look at this right now, but I am hoping to look at it soon. If you find any additional information regarding this problem please let me know.

@extratype
Copy link

extratype commented Mar 30, 2021

I have reviewed ioq.c but I couldn't find any issues. I think SpdIoqStartProcessingSrb() was just waiting for requests. During the hang SpdHwStartIo() was not called at all even when I tried to issue other requests in user level. The drive stops hanging and continues to read the file sometimes. I compressed a file in rawdisk drive using 7-zip, extracted and compared it with the original file. It was NOT the same! For example, the data at [0x7C10000, 0x7C40000) was from [0x7C00000, 0x7C30000)of the original. [0x15750000,0x15760000) was from [0x15780000,0x15790000) for another try. I guess they were the file positions just when the drive hanged.

Also I found the drive does not hang with caching disabled (FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH for CreateFileW). It ran fine with 2 processes reading the same file.

@extratype
Copy link

extratype commented Mar 31, 2021

Call stack of 7-zip while rawdisk hangs:

nt!KiSwapContext+0x76
nt!KiSwapThread+0x500
nt!KiCommitThreadWait+0x14f
nt!KeWaitForSingleObject+0x233
nt!MiWaitForCollidedFaultComplete+0x10e
nt!MiHandleCollidedFault+0x101
nt!MiResolveTransitionFault+0x3e6
nt!MiResolveProtoPteFault+0x1236
nt!MiDispatchFault+0x3d5
nt!MmAccessFault+0x189
nt!KiPageFault+0x35e (TrapFrame @ fffff003`2c8761c0)
nt!memcpy+0x92
nt!CcCopyBytesToUserBuffer+0x6d
nt!CcMapAndCopyFromCache+0x124
nt!CcCopyReadEx+0x139
Ntfs!NtfsCopyReadA+0x2ed
FLTMGR!FltpPerformFastIoCall+0x16b
FLTMGR!FltpPassThroughFastIo+0x107
FLTMGR!FltpFastIoRead+0x165
nt!NtReadFile+0x37f
nt!KiSystemServiceCopyEnd+0x28 (TrapFrame @ fffff003`2c876b00)
ntdll!NtReadFile+0x14

I found an interesting thread: https://community.osr.com/discussion/198906/image-file-disk-driver-blocking-on-filter-manager
I'll try with FILE_FLAG_NO_BUFFERING in rawdisk.c.

@extratype
Copy link

extratype commented Apr 4, 2021

For testing I replaced CopyBuffer() in Read() with ReOpenFile(..., FILE_FLAG_NO_BUFFERING), SetFilePointerEx(), ReadFile() and then CloseHandle(). It worked without hangs!

Also I tried to debug the hang, here's what I got:

0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Resource @ 0xffffe58b4da32940    Shared 2 owning threads
     Threads: ffffe58b4c33c040-01<*> ffffe58b4d110080-01<*> 
KD: Scanning for held locks..
22846 total locks, 1 locks currently held
0: kd> !thread ffffe58b4c33c040
THREAD ffffe58b4c33c040  Cid 0004.2130  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrPageIn) KernelMode Non-Alertable
    ffffe58b4da8e330  NotificationEvent
IRP List:
    ffffe58b4c9877f0: (0006,0478) Flags: 00060403  Mdl: ffffe58b47584350
Not impersonating
DeviceMap                 ffff94065045ba80
Owning Process            ffffe58b45885040       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      922499         Ticks: 275 (0:00:00:04.296)
Context Switch Count      6140           IdealProcessor: 2             
UserTime                  00:00:00.000
KernelTime                00:00:00.109
Win32 Start Address nt!ExpWorkerThread (0xfffff8067b025870)
Stack Init ffff8a0e3464cc90 Current ffff8a0e3464c270
Base ffff8a0e3464d000 Limit ffff8a0e34647000 Call 0000000000000000
Priority 14 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffff8a0e`3464c2b0 fffff806`7b065850 : ffff9406`00000009 00000000`ffffffff 00000000`00000000 ffffe58b`45993158 : nt!KiSwapContext+0x76
ffff8a0e`3464c3f0 fffff806`7b064d7f : ffffe58b`4c33c040 00000000`00000000 ffff8a0e`3464c5b0 fffff806`8090cd9d : nt!KiSwapThread+0x500
ffff8a0e`3464c4a0 fffff806`7b064623 : 00000000`00000000 00000000`00000000 ffff5111`525a7200 ffffe58b`4c33c180 : nt!KiCommitThreadWait+0x14f
ffff8a0e`3464c540 fffff806`7b0b6848 : ffffe58b`4da8e330 fffff806`00000009 ffff8a0e`3464c600 ffffe58b`4c4e6d00 : nt!KeWaitForSingleObject+0x233
ffff8a0e`3464c630 fffff806`7b0b3dff : ffff8a0e`3464c760 00000000`00000000 ffff8a0e`3464c750 ffffe58b`00000000 : nt!MiWaitForInPageComplete+0x1f8
ffff8a0e`3464c720 fffff806`7b0bc555 : ffff8a0e`3464c910 00000000`00000001 ffffe58b`4d2a5cf8 00000000`00000000 : nt!MiPfCompleteInPageSupport+0x83
ffff8a0e`3464c810 fffff806`7b0bc4aa : ffffe58b`4c33c040 00000000`00000000 00000000`00048000 00000000`00048000 : nt!MiPfCompletePrefetchIos+0x51
ffff8a0e`3464c840 fffff806`7b0bde86 : ffffe58b`4c33c040 fffff806`7b033e00 00000000`00000000 00000000`00000005 : nt!MmWaitForCacheManagerPrefetch+0x32
ffff8a0e`3464c870 fffff806`7b0340cb : 01d72921`8575e3e4 fffff806`7b064d7f 00000000`00020000 00000000`00000000 : nt!CcPerformReadAhead+0x382
ffff8a0e`3464ca40 fffff806`7b025975 : ffffe58b`4c33c040 ffffe58b`4c33c040 ffffe58b`4588e820 00000000`00000000 : nt!CcWorkerThread+0x2cb
ffff8a0e`3464cb70 fffff806`7b117e85 : ffffe58b`4c33c040 00000000`00000080 ffffe58b`45885040 00000000`00000080 : nt!ExpWorkerThread+0x105
ffff8a0e`3464cc10 fffff806`7b1fd2a8 : fffff806`79289180 ffffe58b`4c33c040 fffff806`7b117e30 00000000`00000246 : nt!PspSystemThreadStartup+0x55
ffff8a0e`3464cc60 00000000`00000000 : ffff8a0e`3464d000 ffff8a0e`34647000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

0: kd> !thread ffffe58b4d110080
THREAD ffffe58b4d110080  Cid 2028.1dc0  Teb: 0000000cad772000 Win32Thread: 0000000000000000 WAIT: (WrPageIn) KernelMode Non-Alertable
    ffffe58b4cdcc618  NotificationEvent
Not impersonating
DeviceMap                 ffff9406555cbd30
Owning Process            ffffe58b4c6ec080       Image:         7zG.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      922499         Ticks: 275 (0:00:00:04.296)
Context Switch Count      873            IdealProcessor: 2             
UserTime                  00:00:00.062
KernelTime                00:00:00.031
Win32 Start Address 0x00007fffde76afb0
Stack Init ffff8a0e35483c90 Current ffff8a0e354828a0
Base ffff8a0e35484000 Limit ffff8a0e3547e000 Call 0000000000000000
Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffff8a0e`354828e0 fffff806`7b065850 : ffffc201`5e2d8180 00000000`ffffffff 00000000`00000000 00000000`5e2d8180 : nt!KiSwapContext+0x76
ffff8a0e`35482a20 fffff806`7b064d7f : 00000000`00000002 fffff806`00000001 ffff8a0e`35482be0 ffffa783`00000000 : nt!KiSwapThread+0x500
ffff8a0e`35482ad0 fffff806`7b064623 : 00000000`00000000 00000000`00000000 00000000`00000000 ffffe58b`4d1101c0 : nt!KiCommitThreadWait+0x14f
ffff8a0e`35482b70 fffff806`7b141b82 : ffffe58b`4cdcc618 ffffe58b`00000009 7fffffff`ffffff00 ffffb980`00577000 : nt!KeWaitForSingleObject+0x233
ffff8a0e`35482c60 fffff806`7b141881 : 00000000`1d2588c0 ffff9406`5bc4d900 ffff8a0e`35482d60 ffffb980`054e7602 : nt!MiWaitForCollidedFaultComplete+0x10e
ffff8a0e`35482cd0 fffff806`7b016506 : ffffe58b`00000000 00000000`1d2588c0 ffff8a0e`35482fe0 ffffb980`00577080 : nt!MiHandleCollidedFault+0x101
ffff8a0e`35482d20 fffff806`7b015f96 : ffff8a0e`35482f40 ffff88c4`69e0c098 ffffb980`054e7630 00000000`00000000 : nt!MiResolveTransitionFault+0x3e6
ffff8a0e`35482de0 fffff806`7b00e7a5 : ffff8a0e`35482f40 00000000`00000000 ffff8a0e`35482f20 ffffa783`027a0000 : nt!MiResolveProtoPteFault+0x1236
ffff8a0e`35482ee0 fffff806`7b00c6e9 : 00000000`00000110 00000000`00000000 00000000`c0000016 00000000`00000000 : nt!MiDispatchFault+0x3d5
ffff8a0e`35483020 fffff806`7b203d5e : 00000002`00000040 00000000`00000001 fffff806`7ba52980 00000000`00000000 : nt!MmAccessFault+0x189
ffff8a0e`354831c0 fffff806`7b208b52 : fffff806`7b08aa1d ffffa783`027a0000 00000000`00008000 00000000`07b20000 : nt!KiPageFault+0x35e (TrapFrame @ ffff8a0e`354831c0)
ffff8a0e`35483358 fffff806`7b08aa1d : ffffa783`027a0000 00000000`00008000 00000000`07b20000 fffff806`7b087faf : nt!memcpy+0x92
ffff8a0e`35483360 fffff806`7b431be4 : 00000000`07b40000 00000000`00008000 00000000`00000001 ffff8a0e`00000000 : nt!CcCopyBytesToUserBuffer+0x6d
ffff8a0e`354833e0 fffff806`7b087d49 : ffffe58b`45a77378 00000000`07b20000 00000000`00020000 ffffe58b`4d110001 : nt!CcMapAndCopyFromCache+0x124
ffff8a0e`35483480 fffff806`809e502d : ffff8a0e`354835e0 00000000`00000000 ffff9406`00008000 00000000`00008000 : nt!CcCopyReadEx+0x139
ffff8a0e`35483530 fffff806`7a2d73fb : 00000000`00000000 ffff8a0e`35483908 ffff8a0e`354838c8 000001b6`71d00000 : Ntfs!NtfsCopyReadA+0x2ed
ffff8a0e`35483820 fffff806`7a2d44d7 : ffff8a0e`35483930 ffff8a0e`354838c8 ffffe58b`47551c30 ffffe58b`47551b30 : FLTMGR!FltpPerformFastIoCall+0x16b
ffff8a0e`35483880 fffff806`7a30b405 : ffff8a0e`35484000 ffff8a0e`3547e000 ffffe58b`4d110080 fffff806`7b42b19e : FLTMGR!FltpPassThroughFastIo+0x107
ffff8a0e`35483900 fffff806`7b3e57bf : ffffe58b`4dc539f0 00000000`00000000 00000000`00000000 ffffe58b`00000000 : FLTMGR!FltpFastIoRead+0x165
ffff8a0e`354839b0 fffff806`7b2075b8 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtReadFile+0x37f
ffff8a0e`35483a90 00007fff`dee0ccd4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28 (TrapFrame @ ffff8a0e`35483b00)
0000000c`adcff7a8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtReadFile+0x14

0: kd> !irp ffffe58b4c9877f0 1
Irp is active with 11 stacks 4 is current (= 0xffffe58b4c987998)
 Mdl=ffffe58b47584350: No System Buffer: Thread ffffe58b4c33c040:  Irp stack trace.  
Flags = 00060403
ThreadListEntry.Flink = ffffe58b4c33c4f0
ThreadListEntry.Blink = ffffe58b4c33c4f0
IoStatus.Status = 00000000
IoStatus.Information = 00000000
RequestorMode = 00000000
Cancel = 00
CancelIrql = 0
ApcEnvironment = 00
UserIosb = ffffe58b4da8e360
UserEvent = ffffe58b4da8e330
Overlay.AsynchronousParameters.UserApcRoutine = 00000000
Overlay.AsynchronousParameters.UserApcContext = 00000000
Overlay.AllocationSize = 00000000 - 00000000
CancelRoutine = 00000000   
UserBuffer = 00000000
&Tail.Overlay.DeviceQueueEntry = ffffe58b4c987868
Tail.Overlay.Thread = ffffe58b4c33c040
Tail.Overlay.AuxiliaryBuffer = 00000000
Tail.Overlay.ListEntry.Flink = ffffe58b4c987898
Tail.Overlay.ListEntry.Blink = ffffe58b4c987898
Tail.Overlay.CurrentStackLocation = ffffe58b4c987998
Tail.Overlay.OriginalFileObject = ffffe58b4d81a0b0
Tail.Apc = 00000001
Tail.CompletionKey = 00000001
     cmd  flg cl Device   File     Completion-Context
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
 [N/A(0), N/A(0)]
            0  0 00000000 00000000 00000000-00000000    

			Args: 00000000 00000000 00000000 00000000
>[IRP_MJ_READ(3), N/A(34)]
           10 e0 ffffe58b458ce060 00000000 fffff806802f2fd0-00000000 Success Error Cancel 
	       \Driver\disk	partmgr!PmIoCompletion
			Args: 00020000 00000000 15b89000 00000000
 [IRP_MJ_READ(3), N/A(3)]
           10 e0 ffffe58b4b696770 00000000 fffff806802f3ef0-00000000 Success Error Cancel 
	       \Driver\partmgr	partmgr!PartitionIoCompletion
			Args: 218f728d63 00000000 15b89000 00000000
 [IRP_MJ_READ(3), N/A(0)]
           10 e0 ffffe58b459bb4f0 00000000 fffff806803e1160-ffffe58b4b6eed90 Success Error Cancel 
	       \Driver\partmgr	volmgr!VmpReadWriteCompletionRoutine
			Args: 00020000 00000000 15b88000 00000000
 [IRP_MJ_READ(3), N/A(0)]
            0 e0 ffffe58b4b6eec40 00000000 fffff806812c52c0-ffffe58b4d129180 Success Error Cancel 
	       \Driver\volmgr	fvevol!FvePassThroughCompletionRdpLevel2
			Args: 00020000 00000000 218f728d61 00000000
 [IRP_MJ_READ(3), N/A(0)]
            0 e0 ffffe58b4d129030 00000000 fffff806815a51b0-00000000 Success Error Cancel 
	       \Driver\fvevol	iorate!IoRateReadWriteCompletion
			Args: 00020000 00000000 15b88000 00000000
 [IRP_MJ_READ(3), N/A(0)]
            0 e0 ffffe58b4c31fc10 00000000 fffff8068090a580-ffffe58b49610038 Success Error Cancel 
	       \Driver\iorate	Ntfs!NtfsMasterIrpAsyncCompletionRoutine
			Args: 00020000 00000000 15b88000 00000000
 [IRP_MJ_READ(3), N/A(0)]
            0 e1 ffffe58b4c7ef030 ffffe58b4d81a0b0 fffff8067a2d43b0-ffffe58b4be7fb80 Success Error Cancel pending
	       \FileSystem\Ntfs	FLTMGR!FltpPassThroughCompletion
			Args: 00020000 00000000 07b40000 00000000
 [IRP_MJ_READ(3), N/A(0)]
            0  1 ffffe58b4c4e6d20 ffffe58b4d81a0b0 00000000-00000000    pending
	       \FileSystem\FltMgr
			Args: 00020000 00000000 07b40000 00000000

Irp Extension present at 0xffffe58b4c987bd8:

The OSR thread seems right. We may not need caching with memory mapping. Windows already caches the data read from Read().

Found another thread: https://community.osr.com/discussion/272478

@extratype
Copy link

I thought I solved this issue, but it was not. I used non-cached I/Os in my project. It was working at first, but when I turned off real-time scanning in the antivirus solution, it started freezing again.

@extratype
Copy link

There's a similar project called wnbd and they had the same issue: cloudbase/wnbd#30. Having lpOverlapped as NULL in DeviceIoControl() causes hangs while processing concurrent requests. It takes about 1 min. to recover from each hang and an I/O retry event is logged in the Event Viewer.

Here is a patch:
0001-Fix-hang-in-DeviceIoControl.patch.txt

I don't claim any copyrights for this. Please use it whatever you want.

@billziss-gh
Copy link
Collaborator

billziss-gh commented Jun 3, 2021

@extratype thanks for your extensive research on this.

The reason for passing no OVERLAPPED structure to DeviceIoControl is described in the comment above the call. I use the same method for communicating with the kernel driver in WinFsp (see FspFsctlTransact) and I have never had any similar issue crop up (and that solution has millions of active installations). There may of course be a subtle interaction between this method of doing DeviceIoControl and something that WinSpd does that causes this issue.

Has your patch completely resolve the issue for you?

@extratype
Copy link

The reason for passing no OVERLAPPED structure to DeviceIoControl is described in the comment above the call.

I had read the comment. I think so too. Because WinSpd is a disk driver? I don't know.

Has your patch completely resolve the issue for you?

Yes for rawdisk. It no longer freezes with the #10 (comment).

But WIndows may freeze when the write cache is full if you use synchronous buffered I/O (regular WriteFile) instead of memory mapping. I will use asynchronous non-buffered I/O for my project.

WorkingRobot added a commit to WorkingRobot/EGL3 that referenced this issue Jun 6, 2021
- Remove Utils::Callback (and therefore std::function) overhead by directly using function pointers
- Removed redundant copy with a static buffer (also allows the code to be multi threaded when winfsp/winspd#10 gets fixed)
- Builds the service executable with a static CRT to prevent dll calls on memcpy
WorkingRobot added a commit to WorkingRobot/EGL3 that referenced this issue Jun 6, 2021
- Remove Utils::Callback (and therefore std::function) overhead by directly using function pointers
- Removed redundant copy with a static buffer (also allows the code to be multi threaded when winfsp/winspd#10 gets fixed)
- Builds the service executable with a static CRT to prevent dll calls on memcpy
@WorkingRobot
Copy link
Author

@extratype, could you get the ball rolling and write a PR with your patch to close this issue?

@extratype
Copy link

@WorkingRobot Sorry, I don't agree with the contributor agreement so I don't raise a PR. The patch is simple, you can submit it if you want. I don't need credit, just mention this issue in your PR :)

@WorkingRobot WorkingRobot linked a pull request Jun 8, 2021 that will close this issue
@FlorianFreudiger
Copy link

I am also facing this issue in a simple RAM-Disk I made using this project.
For me the problem comes into effect almost immediately when running CrystalDiskMark (Version 8.0.4) after the initial setup of the drives, no unmounting/flushing required...
It highlights a weird behaviour when increasing the number of threads created using SpdStorageUnitStartDispatcher from 1 to many:

1 Thread: Whole benchmark completes as expected.
image

2 Threads: Occasionally freezes during "Preparing.. Creating Test File", completes with greater performance than the previous test (as one would expected) with the sole exception being the second test row which is a lot slower, during which the disk seemingly becomes unresponsive.
image
The Event Viewer also shows 2 warning have been logged during this test:

  1. Event 129, WinSpd

The description for Event ID 129 from source WinSpd cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\RaidPort6

The message resource is present but the message was not found in the message table

  1. Event 153, disk

The IO operation at logical block address 0xa31b8 for Disk 8 (PDO name: \Device\00000132) was retried.

4 or more Threads: Issues continues to grow, new but same Event Viewer entries
image

Using #14 patch: Using a custom winspd-x64.dll build from #14 the will cause the issues to vanish, I do however understand from the explanation in #14 that this patch may not solve the underlying issue at hand.
image

Here's my C++ RAM-Disk I cobbled together using bits from rawdisk.c

#include <iostream>
#include <winspd/winspd.h>


static BOOLEAN Read(SPD_STORAGE_UNIT* StorageUnit,
	PVOID Buffer, UINT64 BlockAddress, UINT32 BlockCount, BOOLEAN FlushFlag,
	SPD_STORAGE_UNIT_STATUS* Status)
{
	const auto block_length = StorageUnit->StorageUnitParams.BlockLength;
	const size_t size = static_cast<size_t>(BlockCount) * block_length;

	const void* start = static_cast<char*>(StorageUnit->UserContext) + block_length * BlockAddress;
	memcpy(Buffer, start, size);

	return TRUE;
}

static BOOLEAN Write(SPD_STORAGE_UNIT* StorageUnit,
	PVOID Buffer, UINT64 BlockAddress, UINT32 BlockCount, BOOLEAN FlushFlag,
	SPD_STORAGE_UNIT_STATUS* Status)
{
	const auto block_length = StorageUnit->StorageUnitParams.BlockLength;
	const size_t size = static_cast<size_t>(BlockCount) * block_length;

	void* start = static_cast<char*>(StorageUnit->UserContext) + block_length * BlockAddress;
	memcpy(start, Buffer, size);

	return TRUE;
}

static SPD_STORAGE_UNIT_INTERFACE RawDiskInterface =
{
	Read,
	Write
};

int main()
{
	constexpr unsigned long threads = 4;
	constexpr size_t gb = 8;

	// Set storage unit parameters
	const SPD_STORAGE_UNIT_PARAMS params = {
		.Guid = { 1234, 5678, 9012, {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'} },
		.BlockCount = static_cast<UINT64>(1024) * 1024 * 1024 / 512 * gb,
		.BlockLength = 512,
		.WriteProtected = false,
		.CacheSupported = false,
		.UnmapSupported = false,
		.MaxTransferLength = 64 * 1024,
	};

	// Try to create storage unit
	SPD_STORAGE_UNIT* storage_unit = nullptr;
	if (const auto error = SpdStorageUnitCreate(nullptr, &params, &RawDiskInterface, &storage_unit); error != ERROR_SUCCESS)
	{
		std::cerr << "Storage unit creation failed! Error code " << error << "\n";
		return static_cast<int>(error);
	}
	std::cerr << "Storage unit created successfully\n";

	// Allocate buffer
	void* buffer = malloc(params.BlockCount * params.BlockLength);
	if (buffer == nullptr)
	{
		std::cerr << "Malloc failed";
		return 1;
	}

	// Initialize buffer with MBR + partition
	SPD_PARTITION partition = {
		.Type = 7,
		.BlockAddress = 4096 >= params.BlockLength ? 4096 / params.BlockLength : 1,
	};
	partition.BlockCount = params.BlockCount - partition.BlockAddress;
	SpdDefinePartitionTable(&partition, 1, static_cast<UINT8*>(buffer));

	storage_unit->UserContext = buffer;

	// Try to start dispatcher
	if (const auto error = SpdStorageUnitStartDispatcher(storage_unit, threads); error != ERROR_SUCCESS)
	{
		std::cerr << "Storage unit dispatcher start failed! Error code " << error << "\n";
		return static_cast<int>(error);
	}
	std::cerr << "Storage unit dispatcher started successfully using " << storage_unit->DispatcherThreadCount << " threads\n";

	// Wait until dispatcher stops
	SpdStorageUnitWaitDispatcher(storage_unit);

	// Cleanup allocated buffer
	free(buffer);

	return 0;
}

System info

Edition Windows 11 Pro
Version 22H2
OS build 22621.819
Experience Windows Feature Experience Pack 1000.22636.1000.0

Processor 12th Gen Intel(R) Core(TM) i7-12700K 3.60 GHz
Installed RAM 64,0 GB (63,7 GB usable)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants