Flying objects


Wednesday, November 21st, 2007 at 7:45 pm, .net

About a month ago, I was asked by a company to fix a bug in their huge C# application. From time to time their application just crashed. No exceptions, no hints in the logfiles, nothing.
Of course the bug was not reproducable (”it tends to occur under heavy load”), which made things even more interesting.

So what you do when you have no idea and should have fixed it yesterday?

Get A General Idea

Of course I have no idea about the whole project. To get an overview it’s generally a good idea to search for subprojects/solutions. They often have meaningful names and are stored in directories which may indicate some kind of hierarchy. You can get an idea about the structuring in a short time.

After doing that, heh, I know there is plenty code around, but at least I somehow have an idea what is going on and more important, what the application actually does.

Debugging

I add a listener for unhandled AppDomain exceptions so their details get flushed out into a file. Then I start the App.

“Tataaa”, after half an hour I have a crash. Great! My exception handler did not produce a file though.
So I can assume the problem is not in the application’s C# code itself. Thus cluttering the code with silly debug statements won’t help here (That’s what the guy before was doing).
What I need is a better debugger! After some research I even found something useful.

I run the App with my new debugger and after a while the program fails again. Catting trough the output files and there was my problem: 0xc0000005 (STATUS_ACCESS_VIOLATION)

Ou ou…

Well, I don’t think that the App runs into a .NET bug. Access violation… Pointers and stuff? C comes into my mind. In C# there are no pointers though. If they have been used anyway, my exception handler would have gotten a proper exception and would have flushed out some details.

Maybe there are some COM calls somewhere which are causing problems?

It is possible to directly call unmanaged code from a managed C# App. The only thing you have to do is declaring the API methods you want to use with the “DllImport” attribute:

[DllImport("wininet.dll")]
 static extern IntPtr FindFirstUrlCacheGroup(
 	int dwFlags,
 	int dwFilter,
 	IntPtr lpSearchCondition,
 	int dwSearchCondition,
 	ref int lpGroupId,
 	IntPtr lpReserved);

A simple grep and I located the COM stuff. Some further research and this line turns up:

public int Write( byte[] buffer )
{
 
//...
 
if( WriteFile(
	this.fileHandle,
	buffer,
	buffer.Length - 1,
	out this.bytesWritten,
	ref this.writeNativeOverlapped) )
		return bytesWritten;
//...
 
return 0;
}

Whats wrong here?

Maybe you think it’s okay. Well, this is why this post got written.

Garbage Collection

The debugged program is written for the .NET framework. This means we have garbage collection support. This again makes everybody building and using objects and except the IDisposable guys nobody really cares when and how the objects get deleted because it just works.

Nothing new for you I guess.

The garbage collector does also other cool things.
For example it moves the memory used by the objects in the RAM so the objects stay together and thus prevent memory fragmentation.
We therefore also benefit from the locality effect and our fancy L1, L2 and Ln caches finally do something useful. That means the program gets faster, the user is happier, we have less performance issues and hence more time for doing other things.

Everything fantastic until here.

COM

The debugged program does not live just in that world though. By passing our empty buffer we do enter the dark and evil COM world.
Thats where the reason of my program crash lies.

What happens with the COM invoke?
The .NET framework copies some values (value types) and passes a pointer to the system for the others. It needs pointers so e.g. the operating system can fill the buffer. Of course I can also pass a copy of my buffer and the system can fill also that one, but well, I wont be able to get it back into my application anymore. By leaving a reference I can get the result when the system finished writing data into my buffer. Thats why we have to specify the size of the buffer. On the COM side the system just receives a pointer to a byte array and has no idea how big it is.

Anyway, guess what happens when the garbage collector in its infinite wisdom decides to move our buffer into another memory area?
On the .NET side everything will still be okay. Calls on the buffer will be mapped to the right place. The operating system however, still has its pointer from the last call and that pointer points to a memory location where there is no buffer anymore. The OS writes it’s result to some location in the heap. Maybe another data structure is using it or that location doesn’t even belong to us anymore.
However, it writes to that address and because of memory protection the call gets blocked, and somebody kill us. ha!

Holding memory

So I must prevent that the buffer gets moved from the garbage collector while a reference to it is held by the system. The .NET way preventing a buffer to be moved by the Garbage Collector is the fixed keyword.

byte[] buffer = new byte[512];
long size = buffer.Length;
unsafe
{
	long* ptrSize = &size;
	fixed( byte* ptrbuffer = buffer )
	{
		GetComputerName( ptrbuffer, ptrSize );
	}
}

That doesn’t work for me though. I have two threads. One for asynchronous writing and one for asynchronous reading. I cant use that fixed stuff because I would need to wait until I can free the buffer, that would make me synchronous again.

Fortunately there is another solution: Object-Pinning. From msdn:

… This prevents the garbage collector from moving the object and hence undermines the efficiency of the garbage collector. …

ye! By pinning an object you tell the garbage collector it should let its dirty fingers from your shiny objects. That fixed my problem:

  • Creating a wrapper class for the buffer. Which holds the size and the buffer array itself.
  • Pin those members in the constructor
     bufferHandle = GCHandle.Alloc(
    	localBuffer,
    	GCHandleType.Pinned)

    and unpin them when they get disposed.

     bufferHandle.Free();
  • Save them locally before writing and reading so they don’t get disposed by the GC while an IO action is pending

Classical example of framework magic striking back

Tags: , , ,

8 Responses so far

  1. Reto Bachmann Says:

    You forgot to mention how you fixed it in less than thirty minutes. Now that would be *really* interesting ;)

    Anyway, I hope you have nothing against if I broadcast this post internally.

    btw, the fixed example has a bug (var naming)

    cheers

  2. Akmir Al Says:

    Dear Sir

    After having found your blog in google I tried to apply your code examples. Unfortunately without success. I am trying to asynchronously read from a windows handle and experience the same problem (Access Violation) you described in your post. May you please give further details regarding your last suggestion, especially which attributes you Alloc-ed.

    Yours faithfully

  3. steve Says:

    @reto
    Hehe, I can’t write about that, I still need to earn some money in your company :p
    Thx, I will fix it this evening.

    @akmir
    Hello Akmir.
    Async read and write method both need two pointers which can be modified by the OS. That is the buffer where the data is written to and a pointer where the number of bytes written is stored (accessing the Length property of the new buffer will result in wrong results). That means, there are two attributes which will be used by the OS and thus have to be pinned.

    So I created my own wrapper class which pins those two attributes. It looks more or less like this:

    public class PinnedBuffer : IDisposable
    {
    	public uint size;
    	public byte[] data;
    	private bool disposed;
    	private GCHandle sizeGCHandle;
    	private GCHandle dataGCHandle;
     
    	public PinnedBuffer( int size )
    		: this( new byte[ size ] )
    	{
    	}
     
    	public PinnedBuffer( byte[] data )
    	{
    		this.size = data.Length;
    		this.data = data;
    		sizeGCHandle = GCHandle.Alloc( this.size, GCHandleType.Pinned );
    		dataGCHandle = GCHandle.Alloc( this.data, GCHandleType.Pinned );
    	}
     
    	public void Dispose()
    	{
    		if( this.disposed )
    			return;
     
    		this.disposed = true;
     
    		if( this.dataGCHandle.IsAllocated )
    			this.dataGCHandle.Free();
     
    		if( this.sizeGCHandle.IsAllocated )
    			this.sizeGCHandle.Free();
    	}
     
    	~PinnedBuffer()
    	{
    		this.Dispose();
    		this.data = null;
    	}
    }

    Other points:
    If you may use unsafe code and want speed improvement use

    unsafe
    {
    	this.data = new byte[ size ];
    }

    in the ctor instead of allocating a new one in line

    : this( new byte[ size ] )

    Dispose() is not thread safe. Use Interlocked.CompareExchange

    You can safely use that class in your system calls. Don’t forget to hold a reference to them until your GetOverlappedResult returned though. Otherwise it will get disposed while the OS uses it and you get the same error like before.

    Hope that helped :)

  4. luca Says:

    Dispose() is not thread safe

    And I see no reason why it should be.
    It’s not possible that you pass the same PinnedBuffer instance to two system methods (well it is, but I dont think you expect it to work anyway)

    ha!
    1 : 0 for me :)

  5. Chrigi Says:

    1 : 0 for me :)

    2 : 0 for him…

  6. Akmir Al Says:

    Hello

    Thank you for your detailed explanation. Sometimes your post wasn’t explicit enough about the details and my co-workers and me had to do some testing and research to successfully adopt your solution. Here is what we did:

    1) Pinvoke is called like this: GetOverlappedResult( fileHandle, out pinnedBuffer.data, out pinnedBuffer.size, …)
    2) Complex checks have to be made in case Read()/Write() get executed synchronously. This dramatically changes the program flow.
    3) I introduced a member pendingWritePinnedBuffer (same for reading) which holds a reference to the buffer being written until the GetOverlappedResult() returns. The reference of those members needs to be switched between calling GetOverlappedResult() and Read()/Write(). This requires special handling because of 2)
    4) A maximum number of two threads may enter the read/write method.
    5) ERROR_IO_PENDING indicates that my system call got executed asynchronously.

    Are our test/research results and assumptions correct?

  7. steve Says:

    Yes

  8. steve Says:

    @luca
    yes you are right. Though setting the buffer to null in Dispose() would be wrong because that would cause problems when multiple instances share the same result buffer.

Leave a Reply