20 July, 2022

EurekaLog erases my bitmap? (even VCL has bugs)

We were contacted by a client who claimed that EurekaLog was erasing his image in a program. In particular, it was claimed that the client's code "worked great" until EurekaLog was added to the application. After adding EurekaLog, the previously "working" code stopped working, clearing the image instead (e.g. the code's result was blank instead of expected image). The client also claimed that if the "Extended memory manager" option in EurekaLog is disabled - everything works fine again. But if this option is enabled - nothing works, even if all other memory debugging options are disabled.

Simplified code looks really simple:
Buffer := TBitmap.Create;
try
  Buffer.SetSize(64, 64);
  Buffer.Canvas.StretchDraw(Rect(0, 0, 64, 64), Bitmap); 
  Bitmap.SetSize(64, 64);
  Bitmap.Canvas.Draw(0, 0, Buffer); 
finally
  Buffer.Free;
end;
Do you see a problem in that code?

The client's statement about the "Extended memory manager" option turned out to be not entirely true: the client did not disable the "When memory is released" option - it was set to the "Fill memory with non-zeros" value, although the "disabled" value is the "Do nothing" value. Because of this, I went down the wrong path for quite some time, which I will not describe here.

So, the correct input data looks like this: the "Extended memory manager" option is enabled; if the "When memory is released" option is set to the "Do nothing" value, then everything works as it should; and if it is set to the "Fill memory with non-zeros" value, then the copied bitmap is erased.

What does this tell us? Of course, it just screams about an error in accessing data after deleting it (use after free bug)!

But how do we find it?

It won't be easy in this case.

You see, the problem is that some code reads data from an already deleted object (or memory area). But there are no breakpoints for reading data in Delphi. Yes, there is data breakpoints, but those works only for memory writes, not reads. E.g. if some rogue code was writing to deleted object/memory, then we could set a data breakpoint on the memory block and just run the program, waiting for the debugger to stop at the breakpoint and poke our nose into the "bad" code. But we cannot do the same for the memory read operation.

So what should we do?

First of all, it would be good to identify the object/memory block that is being accessed after deletion. You can do this with selective wipe.

Luckily, the code above doesn't delete too many memory blocks, so this can be done in a reasonable amount of time. Run the program, set a breakpoint on the EurekaFreeMem function (yes, you will need the EurekaLog Enterprise edition - which comes with source code, Professional edition does not have source code). Run the code and mark all the places where EurekaFreeMem fires. Sometimes you will also need to set a breakpoint on realloc, but a breakpoint on FreeMem only is sufficient in this case.

Now run the program again, stop at the first breakpoint. Of course, the "first one" is after your code starts executing, not the first one after the program starts. To do this: the breakpoint must be disabled at program startup and enabled immediately before the problematic code starts executing.

Next, you need to step through the EurekaLog's code until you see the memory cleanup code. In particular, it is called like this:
function EurekaFreeMem(P: Pointer): Integer;
// ...
begin
  // ...
      if CheckOurPointer(P, True, True, Error, IsDummy, SharedBlock, Caller) then
  // ...
end;  

function CheckOurPointer(var P: Pointer; const CheckForFree, 
  ClearData: Boolean; out Error: TMemoryError; out Dummy: Boolean; 
  out SharedBlock: Boolean; const AFirstAddr: Pointer = nil): Boolean;
// ...
begin
  // ...
        if ClearData then
        begin
          if (loCatchLeaksExceptions in LeaksOptions) and
             (PtrDummy^.BlockType = btObject) then
            PtrDummy^.Tag := Pointer(P^);

          FillReleasedMem(P, PtrDummy^.SingleSize, { ... });
        end;
  // ...
end;
E.g. you can set a breakpoint on ClearData or FillReleasedMem instead of EurekaFreeMem (the CheckOurPointer function has two places where FillReleasedMem is called).

So, now we are at the memory cleanup in the first deleted memory block within our problematic code.

Open the CPU debugger (View / Debug Windows / CPU Windows / Entire CPU):
You can't change the ClearData's value via the Evaluate / Modify command, because it is a const argument. But you can change its value on the CPU stack - if you know how to do it. Or you can simply execute the compare CPU instruction and then change CPU flags.
In any case - the debugger should indicate that compare has failed and the code below will be skipped:
Now, continue execution (and disable all breakpoints). We just skipped over wiping the first released memory block. Did it help?

If not (and in our case - it will not work), then restart the app, repeat everything, but this time skip one EurekaFreeMem call and stop on the second call. Change check's results again to skip the wipe. Does it work now? If not - then restart again, skip two EurekaFreeMem calls and disable wipe on the third call. Repeat this until app starts working.

Sooner or later, you will find that memory block, clearing which leads to erroneous behavior of the program.

And in this case, it turned out to be a TBitmapImage with a stack like this:

Vcl.Graphics.TBitmapImage.Destroy
System.TObject.Free
Vcl.Graphics.TSharedImage.Release
Vcl.Graphics.TBitmap.NewImage
Vcl.Graphics.TBitmap.ReadDIB
Vcl.Graphics.TBitmap.ReadStream
Vcl.Graphics.TBitmap.LoadFromStream
Vcl.Graphics.TBitmap.HandleNeeded
Vcl.Graphics.TBitmap.GetCanvas
Vcl.Graphics.TBitmap.Draw
Vcl.Graphics.TCanvas.StretchDraw


The deletion code itself looks like this:
procedure TBitmap.NewImage(NewHandle: HBITMAP; NewPalette: HPALETTE;
  const NewDIB: TDIBSection; OS2Format: Boolean; RLEStream: TStream = nil);
  // ...
begin
  // ...
  EnterCriticalSection(BitmapImageLock);
  try
    FImage.Release; // - it will be deleted here
    FImage := Image;
    FImage.Reference;
  finally
    LeaveCriticalSection(BitmapImageLock);
  end;
  // ...
end;
Nothing wrong at first glance: the old image is deleted, a new one is inserted instead. But our test data shows that somewhere there is still a reference to the deleted TBitmapImage.

How do we find it?

I've been scratching my head, but I couldn't think of anything better than remembering the value of the object's pointer at the moment it was deleted, and then go through the code further, watching where this value will be used.

To do this, I stopped at the moment of deleting the object and calculated the value of Pointer(Self) via Evaluate / Modify (if you are inside TBitmapImage.Destroy), or Pointer(FImage) (if you are inside TBitmap.NewImage). Let's assume this value is $2A411E0. For convenience, we write it in the full form: $02A411E0.

Now you need to open the CPU debugger and continue executing the program. You need to watch if this value (or a value close to it, but slightly bigger) appears anywhere in CPU registers.

This task is compicated by the fact that the code has just created a new TBitmapImage whose value will be close to ours. In our case it will be $02A41000. And it is important for us not to confuse these two values.

Fortunately, when you do F7/F8 through code in the CPU debugger, it shows you (by highlighting) which registers have been changed by the command you just executed. This makes it a little bit easier to keep track of addresses.

You will have to execute some code when you first notice our value:
In this case, the saved CPU registers are restored when exiting the TBitmap.LoadFromStream method.

Of course, this does not mean that the bug in the code is in the TBitmap.LoadFromStream method. It means that the CPU register still holds a reference to the already-deleted TBitmapImage object, and it is likely that someone higher up in the code is using that register to access (already-deleted) data.

Let's continue.

In general, there will be a lot more code. At some point, our value will be overwritten, but then restored again from the one saved on the stack. At the same time, it will always be stored in the ESI register - which, of course, makes it very easy for us to find an error in the code, since we just need to look at machine opcodes to see if any of them accesses the ESI register. And sooner or later we will find such code:
The code reads a field from an object stored in the ESI register and loads that value into the EAX register. Note that the loaded value is DEADBEEF - this is the debug marker that EurekaLog fills in the memory being deleted. This once again confirms that it is this code that refers to an already deleted object.

In addition to DEADBEEF, you can also encounter other values: $CCCCCCCC, $FEEEFEEE, $FDFDFDFD, $FADEDEAD, $00009999. EurekaLog attempts to reserve at least one of them for debugging purposes and will use the first one for which the reservation succeeds. There may also be values close to them (i.e. debug marker + offset, e.g. DEADBE8F).

Besides, it can also be a pointer to the code, following which you will get to the beginning of the DeadObjectVirtualMethodCall or DeadInterfaceMethodCall procedures - it is EurekaLog's service code for catching virtual methods calls to already deleted objects and interfaces. Unfortunately, pointers to DeadObjectVirtualMethodCall or DeadInterfaceMethodCall are not fixed and will be different in each program. Perhaps you should write them down on a piece of paper in advance when you start the program, so that later you can see if they appear in the CPU registers.

Anyway, we have found the code that accessed the already deleted TBitmapImage:
procedure TBitmap.Draw(ACanvas: TCanvas; const Rect: TRect);
// ...
begin
  with Rect, FImage do
  begin
    // ...
        StretchBlt(ACanvas.FHandle, Left, Top, Right - Left, Bottom - Top,
          Canvas.FHandle, 0, 0, FDIB.dsbm.bmWidth,
          FDIB.dsbm.bmHeight, ACanvas.CopyMode);
    // ...        
  end;
end;
Do you see a bug in the code now?

The bug is using the with operator.

Indeed, using the with operator causes the Delphi optimizer to store a reference to the evaluated expression in with somewhere (in this case - in the ESI register). And since we are talking about an object (which are stored by reference, and not by value), it is not the object's data that is stored, but only a reference to the data. Therefore, any data change (including their deletion/clearing) from the sub-procedures called in the code will go unnoticed. In particular:
  // FImage is being cached by with
  with Rect, FImage do 
  // ...
      // FImage was deleted (recreated), but reference to the deleted object is being stored by with
      Canvas.RequiredState(csAllValid); 
  // ...
        // FImage.FDIB references the deleted object
        StretchBlt(ACanvas.FHandle, Left, Top, Right - Left, Bottom - Top,
          Canvas.FHandle, 0, 0, FDIB.dsbm.bmWidth, 
          FDIB.dsbm.bmHeight, ACanvas.CopyMode);
  // ...
And the worst thing about this with operator is that the debugger is not able to interpret it correctly. For example, the calculation of expressions by mouse hovering does not work. And if we try to calculate the values of the specified expressions (for example, FImage.FDIB.dsbm.bmWidth), then the debugger will, of course, return the correct value from the new FImage object, and not from the removed old one, which is preserved by the with statement.

Values of the StretchBlt's function argument were one of the things that I have checked first! But due to this "feature" of the debugger, I could not see the problem, although I had it right before my eyes. I had to go the long way.

Conclusion: customer has found a bug in VCL. Congratulations!

But what the customer should do? After all, fixing an error in the VCL can take a long time, and the program should work like right now.

Luckily for him, a workaround can be suggested in this case. Basically, we need to move the FImage re-creation outside of the with block. For example:
Buffer := TBitmap.Create;
try 
  Bitmap.Canvas; // - added
  Buffer.SetSize(64, 64);
  Buffer.Canvas.StretchDraw(Rect(0, 0, 64, 64), Bitmap); 
  Bitmap.SetSize(64, 64);
  Bitmap.Canvas.Draw(0, 0, Buffer); 
finally
  Buffer.Free;
end;
If you can't offer a workaround, then what is left is disabling memory checks in EurekaLog.

Moral of the story:
  • Do not use the with operator;
  • If you have a memory error somewhere in the code that uses with - refactor the code so it will not use with;
  • VCL can have bugs too;
  • If there are errors in your program after adding EurekaLog to it - try disabling memory checks in EurekaLog. If the problem goes away after that - 99% that your program has a memory bug somewhere.

P.S. You can vote for fixing this issue here: [RSP-38694] Using "with" in VCL causes "use after free" bugs.
P.P.S. It is possible that VCL has another similar bug.

P.P.S. Read more stories like this one or read feedback from our customers.