A customer has a PC that's about 5 years old (P4 3GHz, DDR RAM, AOpen motherboard, WinXP SP3), which was reliable up until the end of January. Now it has started BSOD'ing regularly, to the point that hardly a single OS session stays up until the customer shuts the computer down.
I got the customer to run a full hard disk check, to get that out of the way. No serious problems there (no failures of any sort, and no bad sectors/clusters), and it made no difference to the situation. One other odd point is that the system clock jumped forward about two weeks (year and time of day were still accurate). I wonder if the customer somehow did this accidentally, but there's no evidence of that.
I visited and checked through the event log. Nothing particularly interesting in there except the crash entries. Most of them were DRIVER_IRQL_NOT_LESS_OR_EQUAL (0x000000D1), but some of the time there were different BSODs (I can't remember the codes); when I googled those, the results strongly suggested hardware, maybe RAM.
I ran Orthos, which failed in about 50 seconds. I vacuum-cleaned the insides of the computer and re-seated the memory (it wasn't completely clogged with dust, but was a tad dusty - I wouldn't have thought it would be enough to cause things to overheat, but the case isn't well-ventilated - the PSU has an additional fan, but no chassis fan). It was at this point I noticed that some capacitors near the processor were bloating out at the top, and two may have been leaking (though that might be dust). I realise the potential consequences of bloaty/leaky capacitors, so they could well be the problem, but I want to be thorough.
I ran Orthos again, which failed in about 13 minutes. I ran memtest86 v3.5 in the hope that it was bad memory, but that turned out fine (1 successful pass - bear in mind when I'm on site I'm charging by the hour).
I had some spare second-hand memory, which wasn't ideal: the original memory was 512MB DDR400 (and the system was running it at that speed), whereas the spare was 256MB DDR333. The computer accepted it, though. I ran Orthos again, which ran for 30 minutes without a failure. I advised the customer that with less and slower memory the computer would run somewhat slower, but that hopefully it would be stable until I got hold of some new memory that would be a proper replacement. The customer was ok with this, and I also advised that Orthos should be left running for a few more hours.
The customer ran Orthos for another 7 hours, apparently no problem, but since then the computer has started BSOD'ing again, this time with 0x000000F1 (which seems to be a SCSI related one! system is using IDE, no RAID) and 0x000000D1 again. I've asked the customer if it would be ok for me to take the computer in for further testing, which she has agreed to.
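To keep track of which stop codes keep turning up, I'm thinking of tallying them with something like the rough Python sketch below. The `observed` list is made up for illustration (it is not the customer's actual log), and `tally_stop_codes` is just a helper name I've picked. The three names in the lookup table are the ones Microsoft documents for those bug check codes; 0x0000000A is included only because it's easily confused with 0x000000D1.

```python
from collections import Counter

# Bug check names as documented by Microsoft for these codes.
STOP_CODE_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0xF1: "SCSI_VERIFIER_DETECTED_VIOLATION",
}


def tally_stop_codes(codes):
    """Count how often each stop code appears and attach its name if known."""
    counts = Counter(int(c, 16) for c in codes)
    return [
        (f"0x{code:08X}", STOP_CODE_NAMES.get(code, "unknown"), n)
        for code, n in counts.most_common()
    ]


# Hypothetical list, as if copied by hand from the crash entries in the event log.
observed = ["0x000000D1", "0x000000D1", "0x000000F1", "0x000000D1"]

for code, name, n in tally_stop_codes(observed):
    print(f"{code}  {name}: {n}")
```

If the tally stays dominated by one code after swapping parts, that points more towards a driver; if the codes keep changing from crash to crash, that fits flaky hardware better.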
I want to try running multiple passes with new memory in memtest86 v3.5, as well as multiple tests with Orthos (when I have the computer at my place, I just charge for my time rather than the amount of time a test takes to run). Once I build up some consistent results with either test (assuming there's a failure), I think I'll try a different PSU. I hope that it will be memtest that gives the failures, so I can rule out the Windows installation.
I'll also try updating all the drivers I can, just in case (unless of course it's memtest86 that gives the bad results, in which case the drivers can't be to blame).
I realise the bad capacitors just might make a mockery of these tests by throwing in random results, and I also don't like the idea that I'm probably reducing the life of those capacitors by stressing out the system, but I want to be able to say in all honesty that I've tested everything else, so it really comes down to those capacitors.
If anyone has any ideas I'd much appreciate hearing them.