Very Strange Win11 Gaming PC Crashing Problem - Please Help!

Hi all,

I've been building my own PCs for 20 years and think of myself as fairly cluey with tech stuff, but I'm completely stumped by this particular PC crashing issue. Hoping the brains trust can help! Sorry for the long post, but there's a lot of detail to convey.

I built a new PC in late 2020, for gaming, running Win10 and with an MSI RTX 3090 and an 850W PSU (rest of specs at bottom of post). The system IS NOT OVERCLOCKED.

Previous problem:

For a long time with this setup I had an issue with instantaneous resets (as if you had pressed the reset button) while playing graphically demanding video games. This used to happen once every few days (I tend to game every evening). At the time I put this down to power spikes in the GPU, although I never confirmed this and didn't really investigate further.

About a year back I upgraded to Windows 11. I don't actually recall whether I upgraded in place or installed clean.

Since then my system has developed a different type of issue. I no longer get the spontaneous resets, but I get a weird freezing problem, which manifests in 2 ways:

Current problem:

1. If I'm not gaming but using the Win11 PC for productivity (watching YouTube, usually), I get a partial crash/freeze of the OS. When watching YouTube, for example, the audio cuts out but the video keeps going for another 10 seconds or so (whatever was cached). Apps in Windows still respond, but only partially: I can switch between windows and scroll within them, but can't open any new app. Task Manager also will not open. Trying to reboot via the Windows restart menu causes a full freeze, so the only remedy is a hard reset via the button on the PC case.

2. If I'm gaming, graphically intense games insta-freeze (audio and graphics), although I can still alt-tab out of them, with the rest of the behaviour the same as described in point #1. If the game is not graphically intense, I sometimes get a situation where the audio cuts out but I can still interact with the graphical parts of the game. Trying to create a savegame at that point never works, though, even though the game often reports that saving was a success. Checking later, the savegame doesn't exist (which makes me wonder if my whole problem is related to storage?).

These crashes sometimes happen once in a few days, and sometimes twice in 10 minutes. I've not been able to detect any pattern in what makes them more or less frequent.
The games have been both DX11 and DX12, and maybe even Vulkan, not sure.

Things I've tried:

Monitoring PC stats and logging them via MSI Afterburner. Nothing seems to spike around crash time, and there are no issues with CPU/GPU temperatures.

Have checked Event Viewer for any errors. Nothing jumps out at me, but I suspect the crash is such that no log entries can be written from the moment it happens, which makes this not very useful.

Have run a number of diagnostic tools such as memtest and the OCCT stability testing tool, with many passes of the GPU, VRAM, memory, CPU and power tests. No crashes occurred during those tests.

Have checked the status of my SSDs (Samsung) with the Samsung Magician software. Everything seems fine.

Have reinstalled the Nvidia drivers clean with DDU. This hasn't helped.

Have updated all chipset drivers from my motherboard manufacturer (Asus). This hasn't helped.

Have used usb dongles for bluetooth and wifi while disabling the NICs/Wifi Adapters & Bluetooth adapter on the motherboard to eliminate those as a potential cause. Still got crashes anyway.

Using console commands to restart explorer.exe and reset the Windows audio and graphics services. Attempting either of these causes a hard freeze of the system.
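
A side note on the Event Viewer step above: even if nothing can be logged at the moment of the freeze, Windows normally records a Kernel-Power event with ID 41 on the next boot after an unclean shutdown, so those entries can at least give timestamps to line up against the Afterburner log. A minimal Python sketch of pulling them with the built-in `wevtutil` tool follows. The function names `build_query` and `extract_dates` are made up for illustration, and the parsing assumes the text output contains `Date:` lines, so treat it as a starting point rather than a finished tool.

```python
import subprocess
import sys


def build_query(event_id=41, max_events=20):
    """Build a wevtutil command that lists the newest System-log events with this ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # XPath filter on the event ID
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # read newest events first
        "/f:text",           # plain-text output instead of XML
    ]


def extract_dates(wevtutil_text):
    """Collect the values of 'Date:' lines (assumed format of wevtutil text output)."""
    dates = []
    for line in wevtutil_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Date:"):
            dates.append(stripped[len("Date:"):].strip())
    return dates


if __name__ == "__main__" and sys.platform == "win32":
    # wevtutil only exists on Windows; run from an elevated prompt if access is denied.
    result = subprocess.run(build_query(), capture_output=True, text=True, check=False)
    for stamp in extract_dates(result.stdout):
        print(stamp)
```

If the timestamps cluster around particular games or times of day, that is at least a pattern to work from. If they look random, it is one more hint that the cause is hardware rather than software.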

Things I haven't tried:

Reinstalling Windows clean.

Replacing any hardware components (except the NIC/bluetooth test described above).

As you can see, this is an annoyingly complicated problem, as there are no error logs, error messages or crash dumps to go off. I'm completely stumped.

The ask:
Would any wise PC experts be able to give me any advice on these crashes please!?

thank you!

kommz

Full specs:

Asus X570 ROG Crosshair VIII Formula

AMD Ryzen 5950x

NZXT Kraken x73 AIO CPU water cooler

MSI Gaming X Trio RTX 3090 graphics

Corsair Dominator Platinum 32GB 4x8GB 3600Mhz CL16

Samsung EVO 980 1TB m2 x2 SSDs

Lian Li O11 Dynamic XL Case

Corsair RM850x Gold - White - 850W

LG 38GN950 38" Ultrawide Screen

I often use a Bluetooth headset while gaming, but I've also had crashes while using the PC speakers.

The system is NOT OVERCLOCKED. I'm using the memory timings recommended by the memory vendor, and there are no specific changes to BIOS settings except for enabling REBAR.

FINAL UPDATE ~1month later

I seem to have managed to achieve good but not perfect stability. It seems upping the DRAM voltage was the solution. I ended up bumping this to 1.46v, up from 1.35v, even though 1.35v is what the XMP/DOCP memory profile suggests. Strangely, this only became a problem on Win11 and not Win10. I still get about 1 crash per week, but this is significantly improved from 3 crashes per evening, so that's a huge win in my books.
Thanks very much to @7ekn00 for his suggestions - lifesaver!
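
To put rough numbers on the change described in this update, here is a quick back-of-the-envelope sketch in Python. The voltages and crash counts are the ones quoted in the post. The "7 gaming evenings a week" figure is an assumption based on gaming most evenings, and `percent_change` is just an illustrative helper name.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# DRAM voltage: the XMP/DOCP default versus the value that ended up mostly stable.
voltage_bump = percent_change(1.35, 1.46)

# Crash rate: about 3 per evening before, about 1 per week after.
# Assumption: 7 gaming evenings a week.
crashes_before = 3 * 7
crashes_after = 1
crash_change = percent_change(crashes_before, crashes_after)

print(f"DRAM voltage: +{voltage_bump:.1f}%")
print(f"Crashes per week: {crashes_before} -> {crashes_after} ({crash_change:.0f}%)")
```

Under those assumptions this works out to roughly an 8% voltage increase and about 95% fewer crashes per week.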

For posterity - other things I've tried (in addition to those described above in this post) that were suggested by posters but didn't seem to yield any results:
DISM and SFC /scannow commands
Disabling CPU C-States in bios
Disabling REBAR in bios

Comments

  • +1

    What if you reduce the memory timing, and do you have a higher wattage PSU you can use to test with?

    Also try booting into a live linux distro like Ubuntu and running some benchmarks/diagnostics so you can rule out software or driver issues

    • I don't have another PSU to test with, unfortunately.
      I'd play with the mem timings, but none of the mem diagnostics I've run indicate any memory problems, so I'm not really sure this will yield any results.
      My plan has been to hold out till the 5xxx series of graphics cards comes out from Nvidia (should be Dec/Jan), and buy one of those, with a new, more powerful PSU.
      That way I can eliminate both the graphics card and the PSU as potential problems.
      But the crashes have recently become very frequent (I'm playing Starfield; not sure if this game is more prone, or what's up), so I'm having another attempt at fixing it now by asking the community.

    • +1
      Until you have run a live CD and ruled out a software problem, you are just chasing your tail and wasting your own and others' time.

  • +4

    Things I haven't tried:

    Reinstalling Windows clean.

    It's a pain but you have to try this I reckon.

    • :( :( :(

  • Any warning lights on the GPU or motherboard?

    Not double dipping on any of the power cables running to the GPU (i.e. individual PCI-e cables from the PSU to the GPU, not using one cable with two plugs)?

    Made sure all the plugs are in securely (motherboard, GPU, powersupply, etc)?

    • The 3090 needs 3 individual power plugs (can't remember the spec name of these), and they're all on individual cables running to the PSU.

  • Can't be sure. Could be the power supply failing, could be a short in a USB socket somewhere… You haven't had any new hardware plugged in over the last few months, have you?

    Also the power supply issue happened to me, was a pita to diagnose as well until a new unit fixed all the elusive problems, maybe a $15 power supply tester might have found it?

    I suppose you've also checked the motherboard for any physical problems and given it a quick clean.

    • +1

      Nothing new plugged in via USB.
      I have a glass-wall case, so I can see inside clearly and can see the mobo. It's looking fresh and isn't dusty.

      • Lol had a quick google, you don't have a high res wallpaper changer do you? big screens and automated wallpaper changes cause big problems and stuttering to some users.

        Also maybe bring your windows pagefile up to 32gb and see if that helps with stuttering

        Also a random video i found said asus armory crate might be causing problems https://youtu.be/O5o2MjHZiwg?t=368 particularly color cycle

        • +1

          I feel like this person is talking about performance hits. My performance has always been excellent, with no stuttering or anything; it's these instantaneous crashes that are the problem. So I'm not certain removing Armoury Crate will have an effect (I only use the RGB component of Armoury Crate, nothing else).

  • +1

    You say the system is not overclocked yet you also state:

    Corsair Dominator Platinum 32GB 4x8GB 3600Mhz CL16

    Which one is it or is the memory running at stock?
    It sounds very much like a memory issue. Perhaps remove the overclock on your memory and see if the issue persists.

    Without isolation testing it is very hard (near impossible) to determine.

    • some comments on my memory a few posts below this one

  • Can you try disabling one CCX on your 5950X and see how it goes?

    • Is that something that's possible on the Crosshair motherboard? I don't believe I've ever seen a setting for this.

      • Should be able to download and install Ryzen master and disable a CCX/the cores in CCX0 or CCX1.

  • Corsair Dominator Platinum 32GB 4x8GB 3600Mhz CL16

    When running 4 sticks on AM4/5 bump the voltages slightly for stability (ie. 1.35V > 1.4V) …

    • Never noticed such issues on B550M with 4 * 16 GB G.Skill 3200 MHz cards though.

      • +1

        Depends very much on timings, memory controller and RAM die …

        Safest bet to get full performance out of 3600 RAM with 4 sticks is to add ~ 5% voltage …

        My 5950x with 4 sticks of Corsair 16GB 3600 RAM needed 1.465V to run completely stable for weeks, before that it was randomly crashing in weird places with nothing in event logs, etc. It always completed memtest for 12+ hours fine, but after reading a few forums I decided to bump the voltage (first to 1.4V which reduced a lot of the crashes, but not all, then 1.465V which eliminated all crashes and has been stable for years now!)

        • yeah, it is a bit tricky with 4 sticks of DDR4 or DDR5. I have 160 GB in total DDR5 (on Z690 with 13900K) and it was testing fine at 5600 MHz for a few days on Memtest but got one or two errors after that. I ended up running it at 4800 MHz and it has been rock solid for about a year now. If I only use a pair, all works well at 6400 MHz

          • @bazingaa: some comments on my memory a few posts below this one

  • +1

    How long have you run MEMTEST? At least 3 full passes? I have seen rare errors show up only after a few passes with higher overclocks.
    Boot with this and test again : https://memtest.org/

    I think 850W PSU is more than enough, I have similar setup using Deepcool 850 W PSU, but with aircooled 13900K + MSI 3090 + 4 * DDR5 + 1 HDD + 6 M.2 drives + Win10 Pro, it is rock solid and I run it kinda 24/7 with high load. Have you done OCCT Power test, just to be sure?

    My previous PC was 5600x + B550M, by any chance do you have any PCI to PCIe adapter (or any PCIe card with builtin PCIe to PCI bridge) ? It had similar issue when I tried to use CMedia 8738 sound card. Couldn't find a way to fix it and I gave up that idea to use hardware OPL3.

    Also, have you disabled auto CPU overclocking in the BIOS and tested again? I can't remember what it is called for AMD CPUs.

    • some comments on my memory a few posts below this one

  • Why haven't you fresh installed windows? It's probably the easiest thing to do. Install any flavour of linux on your second ssd and test that too

    • I agree that I should, although I have sooo much stuff installed that reinstalling is a real pain. But yes, I should probably give that a go.

      • just use a spare drive (hopefully you have one) to do the clean install on & run with minimal software / games.

        If still crashing, then you know it's hardware related

  • Get a better powerboard and or try plugging the system into a different circuit.

    • With the insta-resets I might've thought it could be something like this (although I have a fairly fancy surge-protected power board), but since upgrading to Win11 I never get the resets anymore. It's more like these partial crashes (where the OS is still partially responding) that I've described, so I really don't feel like it's an electricity feed problem.

      • It's pretty low cost, why wouldn't you just give it a go?

  • +1

    OK, somehow my long comment about my memory settings got lost, so I'm retyping it.

    For my memory settings I'm using the D.O.C.P. profile provided with the mem sticks, so in theory that should be very stable. It recommends 1.35v for the mem, and that's what it's set to.
    Here are some pictures of some BIOS stats and my memory settings:

    https://imgur.com/a/pW7lpnU

    I've previously run memtest86 for a number of hours without problems, but perhaps I need to run it for much longer. I'll give that a go.
    After running that, I'll try bumping my mem voltage to 1.45v, as one commenter suggested, to see if that helps.

  • Have you tried the DISM and SFC /scannow commands in an admin cmd (or PowerShell)?

    DISM /Online /Cleanup-Image /CheckHealth
    DISM /Online /Cleanup-Image /ScanHealth
    DISM /Online /Cleanup-Image /RestoreHealth
    SFC /scannow

    (in that order)

    • ran the checkhealth and scanhealth commands just now, everything comes back healthy

  • I have a similarish setup and there are known stability issues that are resolved by disabling Global C-State Control in the bios.

    In my Asus ROG Crosshair VIII Dark Hero bios it is found under:
    Advanced -> AMD CBS -> CPU Common Options -> Global C-state Control

    I'm not sure if the problem (or option to disable Global C-States) is a thing on the Formula motherboard but definitely worth a look.

    By similarish setup, I mean:

    Asus X570 ROG Crosshair VIII Dark Hero
    AMD Ryzen 5950x
    NZXT Kraken z73 AIO CPU water cooler
    Asus ROG Strix RTX 3090
    G.Skill Trident Z Neo 32GB 3600MHz CL14 RAM
    Samsung Pro 980 2TB m2 SSD
    Asus ROG Thor platinum - 850W

    • +1

      i see you're a man of good taste :)
      I've had a bit of success (see post below) with memory voltage, so I'm going to see how long the stability lasts. If it doesn't, I'll give disabling the C-States a go (I checked in the BIOS and yes, I can disable this). I don't want to change 2 things at once; that's poor testing methodology.

  • Test mem sticks
    try running on 2 mem sticks and see if problem goes away, then try the other two.

  • +1

    an UPDATE:

    I ran memtest86 last night for 8hrs straight, no errors.

    However, yesterday evening I also bumped my DRAM voltage from 1.35v to 1.45v based on advice from @7ekn00. Managed to game without crashes for about 4hrs!
    Too early to say whether that has actually eliminated my problem, as sometimes the PC can go several days without crashing, but it gives me a bit of hope.
    Will run in this config till the next crash. If in the end that doesn't help, I'll try changing the C-States as per advice from @Hoofee.

  • Has this started happening recently or for a while now?

    What version nVidia drivers are you on? I've seen reports of crashing problems with 560 onwards.

    I actually started having weird issues with the drivers randomly crashing recently, even just during normal desktop use.
    I reverted to 556.12 and haven't seen the problem since after a few weeks (I only have a 1660 Super though), no other changes were made.

  • Is bios up to date? But yeah my guess is memory sticks.

    Everything else seems fine unless your CPU is overheating constantly which might mean you need to reapply thermal paste or get a new cooler like a thermalright Peerless Assassin 120 se or something

  • +1

    UPDATE2:
    After putting DRAM voltage to 1.45v, I had one evening of stability, but unfortunately had another crash the next evening.
    I've left the voltage there, but have now also disabled CPU C-States, as per the recommendation from @Hoofee.
    Had one evening of stability, but I need to test this over a longer period.
    Will give you guys some updates when I have them!

    p.s. also ran memtest86 for another 8hrs, no errors again.

  • The only thing I don't really see suggested here (sorry if I missed it), is check whether your MB has the latest BIOS. Often stability problems get ironed out in newer revisions.

    • +1

      yes, have latest bios

  • UPDATE3:

    Disabling CPU C-States didn't seem to improve matters.
    I also disabled REBAR, which didn't seem to help either.
    Still getting roughly 1 crash per evening.

    Next thing I might try is disabling some CPU cores on my 5950x with Ryzen Master to see if that has any effect.

    At this point, since I'm really struggling to nail down the issue, I'd probably have to put it down to either a Windows corruption of some sort (I might try another install via dual-boot) or my mobo being faulty, which would suck.

    Any further advice appreciated.

    • I had a similar, hard to track down issue recently. PC would insta reboot or act weird when doing random stuff on it. Turns out it was one of my NVME SSDs. It tested fine, but I narrowed down the culprit and it’s sitting on my shelf ready for an RMA when I get around to it.

      • Faulty SSDs can indeed cause weird crap like this speaking from experience even if they test OK. Definitely worth checking.

        • +2

          Even just pulling everything out and reseating can do the trick sometimes.

  • Have added a FINAL UPDATE to the bottom of my post, thanks everyone for helping out!!
