Flying objects
Wednesday, November 21st, 2007 at 7:45 pm, .net
About a month ago, I was asked by a company to fix a bug in their huge C# application. From time to time their application just crashed. No exceptions, no hints in the logfiles, nothing.
Of course the bug was not reproducable (”it tends to occur under heavy load”), which made things even more interesting.
So what you do when you have no idea and should have fixed it yesterday?
Get A General Idea
Of course I have no idea about the whole project. To get an overview it’s generally a good idea to search for subprojects/solutions. They often have meaningful names and are stored in directories which may indicate some kind of hierarchy. You can get an idea about the structuring in a short time.
After doing that, heh, I know there is plenty code around, but at least I somehow have an idea what is going on and more important, what the application actually does.
Debugging
I add a listener for unhandled AppDomain exceptions so their details get flushed out into a file. Then I start the App.
“Tataaa”, after half an hour I have a crash. Great! My exception handler did not produce a file though.
So I can assume the problem is not in the application’s C# code itself. Thus cluttering the code with silly debug statements won’t help here (That’s what the guy before was doing).
What I need is a better debugger! After some research I even found something useful.
I run the App with my new debugger and after a while the program fails again. Catting trough the output files and there was my problem: 0xc0000005 (STATUS_ACCESS_VIOLATION)
Ou ou…
Well, I don’t think that the App runs into a .NET bug. Access violation… Pointers and stuff? C comes into my mind. In C# there are no pointers though. If they have been used anyway, my exception handler would have gotten a proper exception and would have flushed out some details.
Maybe there are some COM calls somewhere which are causing problems?
It is possible to directly call unmanaged code from a managed C# App. The only thing you have to do is declaring the API methods you want to use with the “DllImport” attribute:
[DllImport("wininet.dll")] static extern IntPtr FindFirstUrlCacheGroup( int dwFlags, int dwFilter, IntPtr lpSearchCondition, int dwSearchCondition, ref int lpGroupId, IntPtr lpReserved);
A simple grep and I located the COM stuff. Some further research and this line turns up:
public int Write( byte[] buffer ) { //... if( WriteFile( this.fileHandle, buffer, buffer.Length - 1, out this.bytesWritten, ref this.writeNativeOverlapped) ) return bytesWritten; //... return 0; }
Whats wrong here?
Maybe you think it’s okay. Well, this is why this post got written.
Garbage Collection
The debugged program is written for the .NET framework. This means we have garbage collection support. This again makes everybody building and using objects and except the IDisposable guys nobody really cares when and how the objects get deleted because it just works.
Nothing new for you I guess.
The garbage collector does also other cool things.
For example it moves the memory used by the objects in the RAM so the objects stay together and thus prevent memory fragmentation.
We therefore also benefit from the locality effect and our fancy L1, L2 and Ln caches finally do something useful. That means the program gets faster, the user is happier, we have less performance issues and hence more time for doing other things.
Everything fantastic until here.
COM
The debugged program does not live just in that world though. By passing our empty buffer we do enter the dark and evil COM world.
Thats where the reason of my program crash lies.
What happens with the COM invoke?
The .NET framework copies some values (value types) and passes a pointer to the system for the others. It needs pointers so e.g. the operating system can fill the buffer. Of course I can also pass a copy of my buffer and the system can fill also that one, but well, I wont be able to get it back into my application anymore. By leaving a reference I can get the result when the system finished writing data into my buffer. Thats why we have to specify the size of the buffer. On the COM side the system just receives a pointer to a byte array and has no idea how big it is.
Anyway, guess what happens when the garbage collector in its infinite wisdom decides to move our buffer into another memory area?
On the .NET side everything will still be okay. Calls on the buffer will be mapped to the right place. The operating system however, still has its pointer from the last call and that pointer points to a memory location where there is no buffer anymore. The OS writes it’s result to some location in the heap. Maybe another data structure is using it or that location doesn’t even belong to us anymore.
However, it writes to that address and because of memory protection the call gets blocked, and somebody kill us. ha!
Holding memory
So I must prevent that the buffer gets moved from the garbage collector while a reference to it is held by the system. The .NET way preventing a buffer to be moved by the Garbage Collector is the fixed keyword.
byte[] buffer = new byte[512]; long size = buffer.Length; unsafe { long* ptrSize = &size; fixed( byte* ptrbuffer = buffer ) { GetComputerName( ptrbuffer, ptrSize ); } }
That doesn’t work for me though. I have two threads. One for asynchronous writing and one for asynchronous reading. I cant use that fixed stuff because I would need to wait until I can free the buffer, that would make me synchronous again.
Fortunately there is another solution: Object-Pinning. From msdn:
… This prevents the garbage collector from moving the object and hence undermines the efficiency of the garbage collector. …
ye! By pinning an object you tell the garbage collector it should let its dirty fingers from your shiny objects. That fixed my problem:
- Creating a wrapper class for the buffer. Which holds the size and the buffer array itself.
- Pin those members in the constructor
bufferHandle = GCHandle.Alloc( localBuffer, GCHandleType.Pinned)
and unpin them when they get disposed.
bufferHandle.Free();
- Save them locally before writing and reading so they don’t get disposed by the GC while an IO action is pending
Classical example of framework magic striking back

November 26th, 2007 at 2:20 pm
You forgot to mention how you fixed it in less than thirty minutes. Now that would be *really* interesting ;)
Anyway, I hope you have nothing against if I broadcast this post internally.
btw, the fixed example has a bug (var naming)
cheers
November 26th, 2007 at 5:14 pm
Dear Sir
After having found your blog in google I tried to apply your code examples. Unfortunately without success. I am trying to asynchronously read from a windows handle and experience the same problem (Access Violation) you described in your post. May you please give further details regarding your last suggestion, especially which attributes you Alloc-ed.
Yours faithfully
November 27th, 2007 at 12:36 pm
@reto
Hehe, I can’t write about that, I still need to earn some money in your company :p
Thx, I will fix it this evening.
@akmir
Hello Akmir.
Async read and write method both need two pointers which can be modified by the OS. That is the buffer where the data is written to and a pointer where the number of bytes written is stored (accessing the Length property of the new buffer will result in wrong results). That means, there are two attributes which will be used by the OS and thus have to be pinned.
So I created my own wrapper class which pins those two attributes. It looks more or less like this:
Other points:
If you may use unsafe code and want speed improvement use
in the ctor instead of allocating a new one in line
Dispose() is not thread safe. Use Interlocked.CompareExchange
You can safely use that class in your system calls. Don’t forget to hold a reference to them until your GetOverlappedResult returned though. Otherwise it will get disposed while the OS uses it and you get the same error like before.
Hope that helped :)
November 27th, 2007 at 6:29 pm
And I see no reason why it should be.
It’s not possible that you pass the same PinnedBuffer instance to two system methods (well it is, but I dont think you expect it to work anyway)
ha!
1 : 0 for me :)
November 27th, 2007 at 7:13 pm
2 : 0 for him…
November 28th, 2007 at 4:06 am
Hello
Thank you for your detailed explanation. Sometimes your post wasn’t explicit enough about the details and my co-workers and me had to do some testing and research to successfully adopt your solution. Here is what we did:
1) Pinvoke is called like this: GetOverlappedResult( fileHandle, out pinnedBuffer.data, out pinnedBuffer.size, …)
2) Complex checks have to be made in case Read()/Write() get executed synchronously. This dramatically changes the program flow.
3) I introduced a member pendingWritePinnedBuffer (same for reading) which holds a reference to the buffer being written until the GetOverlappedResult() returns. The reference of those members needs to be switched between calling GetOverlappedResult() and Read()/Write(). This requires special handling because of 2)
4) A maximum number of two threads may enter the read/write method.
5) ERROR_IO_PENDING indicates that my system call got executed asynchronously.
Are our test/research results and assumptions correct?
November 28th, 2007 at 4:10 pm
Yes
November 28th, 2007 at 4:13 pm
@luca
yes you are right. Though setting the buffer to null in Dispose() would be wrong because that would cause problems when multiple instances share the same result buffer.