Problem: after hitting "Login" on an ASP.NET web application on IIS6, w3wp.exe races to high CPU usage, and about 70MB of memory. CPU usage falls off, but memory usage increases a few KB every second. No indication is made that the application has crashed or hung, no output.
The first utility I installed to check this out was IISTracer. Ignore the horrible web site - this is a nice little utility which lets you see what requests IIS is currently serving. In this instance the default.aspx page calls a dashboard.aspx, which was timing out indefinitely without returning a response to the client.
There are a number of utilities you can use to debug IIS faults. David Wang's guide on how to understand and diagnose an application pool crash gives some insight into how it should be done - attach a debugger and wait for the fault, then look at the stack trace. His suggestions include IISState, an IIS resource kit utility, which will give you an output of the current state of an IIS process; and DebugDiag, a much nicer and newer interactive debugger for IIS.
I spent a couple of days playing with DebugDiag, and unfortunately, due to the nature of my problem, I couldn't get anything useful. It seems that the application wasn't ever actually crashing or hanging - states that DebugDiag should be able to deal with easily. IISState, on the other hand, suggested I was regularly getting first chance exceptions thrown. A first chance exception means an exception has been raised - if the application handles it, then nothing more is mentioned, else it becomes a second chance exception, and the debugger or system exception handler is called). So, more digging was required.
Enter the Windows debug tools, and specifically WinDbg. This is what the professionals use, and as such, it's ultimately powerful and incredibly scary. It's updated regularly, except for the interface, which is nicest described as "circa-1990".
There are a number of blogs that mention running WinDbg, but none that actually put together all the pieces on how to debug this kind of problem. So here's my story for you.
First, attach the debugger to your w3wp.exe process. I found it's process ID in Task Manager - you need to add the PID column. At this point, I get a breakpoint hit - you might not, so use Debug/Break or Ctrl-Break to stop the program.
Debugging symbols let you see not just the memory location of a failure, but the name of the function that it is in. They are not normally included with release software as they make it a lot larger, but can be downloaded separately in this case. DebugDiag is friendly and will automatically download appropriate debugging symbols for you from Microsoft's symbol server, and put them in c:\symcache\. WinDbg is less friendly and requires you set this up yourself. I used the same path, as I already had some symbols downloaded - I told WinDbg to look at the local symbols first, and look at the online symbol server if they're not already cached there. Do this like so:
.sympath SRV*c:\symcache*http://msdl.microsoft.com/download/symbols
Now, we need to load the SOS debugger helper, which lets you debug managed code in WinDbg (Nerd note: SOS stands for Son of Strike, which is a Microsoft in-joke.) If you started the process inside the debugger, you can apparently do this:
0:033> .loadby mscorlib sos
Unable to find module 'sos'
I couldn't, so I'll do this:
0:033> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\SOS.dll
If you're running .NET 1.1, you can .load clr10\sos.dll for the version shipped with the debugger.
Now, we instruct it to break whenever there is an exception thrown:
0:033> sxe CLR
And hit Debug/Go, type 'g and hit Enter, or hit F5 to resume the program.
No sooner than I hit 'g', we get results!
0:033> g
(a78.8b4): CLR exception - code e0434f4d (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0230ec7c ebx=000d9c50 ecx=00000000 edx=00000024 esi=0230ed08 edi=e0434f4d
eip=77e4bee7 esp=0230ec78 ebp=0230eccc iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200202
kernel32!RaiseException+0x3c:
77e4bee7 5e pop esi
I don't speak assembly and really have no desire to. Thankfully the SOS extension lets us look at the exception, and the call stack (the list of functions that we called to get where we are). Type !pe to print the exception, and !clrstack to see the call stack.
0:012> !pe
Exception object: 048f8a4c
Exception type: System.Data.SqlClient.SqlException
Message: Cannot open database requested in login 'R622P1Sy1'. Login fails.
Login failed for user 'admin'.
InnerException: <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80131904
0:012> !clrstack
OS Thread Id: 0x8b4 (12)
ESP EIP
0230ed54 77e4bee7 [HelperMethodFrame: 0230ed54]
0230edf8 6539bf14 System.Data.SqlClient.SqlInternalConnection.OnError(System.Data.SqlClient.SqlException, Boolean)
0230ee0c 652f695d System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(System.Data.SqlClient.TdsParserStateObject)
0230ee3c 652f79e5 System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject)
0230ee90 652e932a System.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean)
And there's the problem. A first-class exception is handled trying to connect to a database with incorrect login details, so it continues, probably never freeing the memory from the last connection. This particular exception doesn't get logged and doesn't get displayed on screen (probably because it's being loaded in the background) and causes an infinite timeout. Oops!
Armed with a little knowledge about the debugging tools, it's easy to see what's going on inside your code, even if it's running in IIS.