03 April, 2023

Many libraries/components are not tested for memory bugs

We were contacted by a customer who claimed that EurekaLog causes an access violation in a simple demo application.

Specifically, the application runs fine when compiled without EurekaLog and produces the expected results. But the application crashes with "Access violation at address 00410759 in module 'DemoApp.exe'. Read of address 83EC8B59" when compiled with EurekaLog.

Unfortunately, the customer did not report any additional details, such as the EurekaLog version, IDE version, OS version, a bug report file, or a call stack. However, we did know that the access violation exception happened inside the System._IntfClear function.

Well, it doesn't take a genius to figure out that there is most likely a memory bug in the sample app's code. And it is probably something related to mixing manual and automatic management of an object's lifetime.

We installed a demo version of the components in question and compiled a sample application with EurekaLog. Running the application and simply exiting it triggered the following error:

"Application made attempt to call method of already deleted object: $0430B610 OBJECT [TContosoDoc] 340 bytes"

while the call stack from the bug report looked like this:

System._IntfClear
System._FinalizeRecord
System.TObject.CleanupInstance
System.TObject.FreeInstance
System._ClassDestroy
Contoso.VCL.TContosoPropertiesForm.Destroy 2893[2]
System.TObject.Free
System.Classes.TComponent.DestroyComponents
Vcl.Forms.DoneApplication
System.SysUtils.DoExitProc
System._Halt0

Well, the error message is different, but the crash location is the same. Why is that?

There could be a few reasons for that:
  1. The customer did not report when and how the error occurs. Perhaps he was seeing a different error in a different place.
  2. The customer could (correctly) assume that it was a memory bug somewhere, so he tried to "fix" the bug by disabling some of EurekaLog's memory check options. Therefore, the configuration of the customer's application could differ from the defaults, while we were checking with the default settings.
  3. The code that detects/shows the "call method of already deleted object" message relies on the released memory remaining untouched. However, if some code allocates memory over this disposed memory, the check cannot work, and you get a plain access violation instead. So, depending on how memory is allocated/disposed, the behaviour can change.

Anyway, EurekaLog was able to show a second call stack for the same object: specifically, the second call stack shows where the object was originally destroyed:

ContosoDoc.TContosoCustomDoc.Destroy
System.Classes.TComponent.DestroyComponents
Vcl.Forms.DoneApplication
System.SysUtils.DoExitProc
System._Halt0

Just by looking at these two call stacks you can already see the problem:
  1. The second call stack (actual deletion) shows that the object in question was deleted "manually" by calling its destructor when components are cleaned up on the app's shutdown.
  2. The first call stack (access to already deleted object) shows that there was also an attempt to delete the same object automagically via an interface reference.
Thankfully, we don't need the source code for the component/library (which we don't have, because we are using the demo/trial version) to confirm that. The line number from the first call stack leads us directly to the problem:
constructor TContosoPropertiesForm.Create(AOwner: TComponent);
begin
  inherited;
  FDoc := TContosoDoc.Create(Self);
end;

destructor TContosoPropertiesForm.Destroy;
begin
  inherited;
end;
where the FDoc field is declared as:
private
  FDoc: IContosoDoc;
Do you see the problem?

The TContosoDoc will be deleted by the TContosoPropertiesForm, because the TContosoPropertiesForm (Self) was passed as an owner to the TContosoDoc. So, when the TContosoPropertiesForm deletes itself - it also deletes all owned sub-components, including the TContosoDoc.

But the reference to the TContosoDoc was also saved into the FDoc field. That would not be a problem if the field had the TContosoDoc type. But it has the IContosoDoc type. In other words, it is an interface! When an interface reference goes out of scope, its _Release method is called, and the object is deleted when the reference count reaches zero.

You may know that components (descendants of TComponent) override the automatic interface management by saying "there is no reference counter". In other words, increasing and decreasing the interface reference counter does absolutely nothing.
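The "there is no reference counter" behaviour can be sketched roughly as follows. This is a minimal illustration, not the RTL's actual code: TNonCountedObject is a hypothetical class invented for this example, and returning -1 is just a conventional way to signal that no counting takes place.

```delphi
program NoRefCountSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical class for illustration only; it is not part of the RTL or of Contoso.
  TNonCountedObject = class(TObject, IInterface)
  protected
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  end;

function TNonCountedObject.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then
    Result := S_OK
  else
    Result := E_NOINTERFACE;
end;

function TNonCountedObject._AddRef: Integer;
begin
  Result := -1; // no counter is kept
end;

function TNonCountedObject._Release: Integer;
begin
  Result := -1; // never frees the object
end;

var
  Obj: TNonCountedObject;
  Intf: IInterface;
begin
  Obj := TNonCountedObject.Create;
  Intf := Obj;  // the compiler still emits a call to _AddRef here
  Intf := nil;  // and a call to _Release here, while Obj is still alive
  Obj.Free;     // the lifetime is controlled manually
  Writeln('done');
end.
```

Note that the interface variable is cleared before Obj.Free is called, so both calls reach a live object.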

If so, why is there a crash then? The reason is that even the simple "there is no reference counter" behaviour requires virtual method calls! Indeed, remember that IInterface/IUnknown is declared as:
type
  IInterface = interface
    ['{00000000-0000-0000-C000-000000000046}']
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  end;
  
  IUnknown = IInterface;
In other words, any class that implements an interface in Delphi must implement the _AddRef and _Release methods, because all interfaces in Delphi descend from IInterface.

Another piece of the puzzle: the "there is no reference counter" behaviour is not implemented like "do not call the _AddRef/_Release methods". Instead, this behaviour is implemented like "the _AddRef/_Release methods do nothing". So, the _AddRef/_Release methods must be called.

If so, they must be implemented (as empty methods). And how do you implement interface methods? By using virtual methods:
type
  TInterfacedObject = class(TObject, IInterface)
  protected
    FRefCount: Integer;
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  end;
Yes, the _AddRef/_Release methods are non-virtual methods for the object (class). However, remember that an interface is basically an abstract class, which means all of its methods are pure virtual. So the mentioned methods will be virtual methods for the interface once it is implemented.

And how do you call a virtual method? Well, you have to look it up in the object's (interface's) virtual method table, and the pointer to that table is stored inside the object instance itself. But if the object/interface was already released, then its virtual method table won't be accessible anymore. That is where the bug comes from. The code is not actually trying to delete an already deleted object; it is trying to say "the interface goes out of scope, please decrease the reference counter". Normally this would result in the "do nothing" behaviour, but in our case the "do nothing" behaviour could not be located, since its implementing object is already gone.

So, why was it not a problem without EurekaLog on board?

It's simple: deleting an object means marking its memory as "empty". The memory itself is not gone, and its content stays the same (at least until that memory is reused). Therefore, any further calls to the _AddRef/_Release methods will be successful, since the virtual method table can still be located.

Conclusion: it is a bug in the library/component's demo code, which must be fixed. The simplest way is a workaround: set the FDoc field to nil as the first action in the TContosoPropertiesForm's destructor. One correct way to fix it is to change the field's type to an object (class), so interfaces will be created/disposed only when used. Another way is to remove the ownership and implement reference counting, so the object's lifetime will be managed by the interface field only.
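The workaround could look roughly like the sketch below. IDoc, TDoc and THolder are hypothetical stand-ins for IContosoDoc, TContosoDoc and TContosoPropertiesForm (we don't have the real sources), so treat this as an illustration of the idea rather than as a patch for the actual library.

```delphi
program ClearInterfaceFieldSketch;

{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  // Hypothetical stand-ins for IContosoDoc / TContosoDoc / TContosoPropertiesForm.
  IDoc = interface
    procedure Touch;
  end;

  TDoc = class(TComponent, IDoc)
  public
    procedure Touch;
  end;

  THolder = class(TComponent)
  private
    FDoc: IDoc;
  public
    constructor Create(AOwner: TComponent); override;
    destructor Destroy; override;
  end;

procedure TDoc.Touch;
begin
  Writeln('Touch');
end;

constructor THolder.Create(AOwner: TComponent);
begin
  inherited;
  FDoc := TDoc.Create(Self); // owned by Self AND referenced via an interface
  FDoc.Touch;
end;

destructor THolder.Destroy;
begin
  FDoc := nil; // _Release is called now, while the TDoc instance still exists
  inherited;   // only then does TComponent.Destroy free the owned TDoc
end;

var
  Holder: THolder;
begin
  Holder := THolder.Create(nil);
  Holder.Free;
  Writeln('done');
end.
```

Without the FDoc := nil line, the interface field would be finalized only after the inherited destructor has already freed the owned TDoc, which is exactly the situation from the first call stack.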

Moral of the story: use either interfaces or objects, do not mix! E.g. if you use interfaces - do not store references to implementing objects. If you use objects - do not store interface references.
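For comparison, here is a minimal sketch of the "interfaces only" approach: no owner, and the lifetime is controlled purely by reference counting. Again, IDoc, TDoc and THolder are hypothetical names made up for this example, and deriving from TInterfacedObject is just one possible way to get a working reference counter.

```delphi
program InterfaceOnlySketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical names for illustration only.
  IDoc = interface
    procedure Touch;
  end;

  // TInterfacedObject provides a real reference counter.
  TDoc = class(TInterfacedObject, IDoc)
  public
    destructor Destroy; override;
    procedure Touch;
  end;

  THolder = class
  private
    FDoc: IDoc; // the only reference; no class-typed reference is stored
  public
    constructor Create;
  end;

destructor TDoc.Destroy;
begin
  Writeln('TDoc destroyed');
  inherited;
end;

procedure TDoc.Touch;
begin
  Writeln('Touch');
end;

constructor THolder.Create;
begin
  inherited Create;
  FDoc := TDoc.Create; // no owner: the interface field manages the lifetime
  FDoc.Touch;
end;

var
  Holder: THolder;
begin
  Holder := THolder.Create;
  Holder.Free; // finalizing FDoc drops the last reference and frees the TDoc
  Writeln('done');
end.
```

Here the TDoc instance is destroyed exactly once, when the FDoc field is finalized together with its holder.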

As you can imagine, many libraries and components come with memory-related bugs, because there are no built-in tools in Delphi to diagnose such issues. You need a 3rd party tool: a debugging memory manager. Not every library/component vendor will go to the extra length of using a 3rd party tool to test their code. This is true even for Delphi itself, as both VCL and FMX have similar memory bugs which usually stay hidden. For example: RSP-38694, RSP-30403, RSP-28294, RSP-10308, ...

So, what if you can't fix the 3rd party code? Well, you can hide the bug by disabling the memory checks in EurekaLog. We recommend that you keep the "Enable extended memory manager" option enabled and disable all other sub-options. Don't forget to set the "When memory is released" option to "Do nothing". Please note that by doing so - you are hiding the bug, you are not actually fixing it!

P.S. It might be counter-intuitive to some, but if you want to fix a memory bug, you need to enable the "Catch memory leaks" option (make sure the "Active only when running under debugger" option is off if you are running the app outside of the debugger). Enabling memory leak checks allows EurekaLog to allocate additional memory blocks with information about allocated memory. In these additional memory blocks, EurekaLog can store, among other things, additional call stacks and information about the memory's data type. All this additional information can help EurekaLog produce more accurate diagnostic information if a problem is found.

P.P.S. Read more stories like this one or read feedback from our customers.