12 August, 2023

What can you do with the "Assember" tab in bug reports? Is it even useful?

We were contacted by a customer that claimed that his application worked fine until he added EurekaLog to it. Specifically, his application starts to raise an EAccessViolation exception with the following message:
Access violation at address 03DB472F in module 'Sample.exe'. Read of address 5653E4CC

The customer was kind enough to share the code mentioned in the call stack of the bug report:
function TContosoEventCollection.FindEvent(const AValue: TContosoEvent): Integer;
var
  X: Integer;
begin
  Result := -1;

  if not Assigned(FEvents) then
    Exit;

  for X := 0 to FEvents.Count - 1 do // - crashes here
  begin
    if SameMethod(AValue, FEvents[X]) then
    begin
      Result := X;
      Break;
    end;
  end;
end;
Where FEvents is the TList field in the TContosoEventCollection class.

As a reminder: the TList class is implemented in the RTL like this:
type
  TList = class(TObject)
  private
    FList: TPointerList;
    FCount: Integer;
  // ...
  protected
    function Get(Index: Integer): Pointer; // returns FList[Index]
  // ...  
  public  
    property Items[Index: Integer]: Pointer read Get write Put; default;
    property Count: Integer read FCount write SetCount;
  // ...
  end;
So, the code crashes while trying to read the FCount field of the TList class - which probably means that the TList object is not valid (in other words: trashed).

Well, the bug report can not be false-positive. So, what happenned? And why there is no access violation in the same application compiled without EurekaLog?

Quite often you would see message like this for memory errors:
Access violation at address ... in module '...'. Read of address DEADBEEF
Where the data address would match or be close to the DEADBEEF value - this is the debug marker that EurekaLog fills in the memory being deleted. In addition to DEADBEEF, you can also encounter other values: $CCCCCCCC, $FEEEFEEE, $FDFDFDFD, $FADEDEAD, $00009999. EurekaLog attempts to reserve at least one of them for debugging purposes and will use the first one for which the reservation succeeds. There may also be values close to them (i.e. debug marker + offset, e.g. DEADBE8F).

However, in this particular case: the data address is 5653E4CC - which does not match any known debugger marker. So, at the first glance it could be a valid data value.

Note that 5653E4CC is supposed to be an address of the TList's FCount field. Now, let's take a look at the "Assember" tab from EL's bug report:
; Line=3297 - Offset=23
; ---------------------
03DB472F  8B7808          MOV  EDI, [EAX+8]                 ; <-- EXCEPTION
Now we can see that the TList's FCount field has the offset of 8 from the start of the TList object. In other words: it is the second field in the class. Therefore, the address of the TList object itself (in other words: the FEvents field in customer's TContosoEventCollection object) is 5653E4CC - 8 = 5653E4C4.

Why is this important?

Well, the 5653E4C4 = 1448338628 and it is NOT divisible by 16. In fact, it is not even disisible by 8. It matters because allocation granularity on Delphi is at least 16 bytes.

Conclusion: 5653E4C4 can not be address of any TList object in customer's application. Which means that it is a completely trash value. (Otherwise, we might suspect that it could be a value of some TList object - the one that was already deleted.)

Summary: the FEvents (TList) field in customer's TContosoEventCollection object is trashed.

Let's dig further. Look at the "Assembler" tab again:
; ContosoUnit.TContosoEventCollection.FindEvent (Line=1234 - Offset=0)
; ---------------------------------------------------------
03DB4718  55              PUSH EBP
03DB4719  8BEC            MOV  EBP, ESP
This sounds like a start of the TContosoEventCollection.FindEvent method to me. And what do we know about Delphi methods? Well, they pass Self as the implicit first argument. And what do we know about arguments passing in x86? Well, the first argument is stored in the EAX CPU register. It is further confirmed by the same "Assember" tab:
03DB471F  8BF0            MOV  ESI, EAX

; if not Assigned(FEvents) then
03DB4728  8B4604          MOV  EAX, [ESI+4]
03DB472B  85C0            TEST EAX, EAX
We can clearly see as Self is being copied from EAX into ESI, and then ESI is being used to read the FEvents field (which, as we remember, is trashed) - which has offset of 4 (e.g. the very first field in the class - also confirmed by customer's source code).

So, when crash occurs, the Self must be stored in the ESI CPU register. Let's take a look at it: 00451C10.

Do you see anything wrong with it?

Well, while the 00451C10 value is divisible by 16, but customer's EXE loads at the default 00400000 address - which is awfully close to the 00451C10. In fact, customer's app has crashed at the 03DB472F address (see the original exception's message) - which is well inside reach from the 00400000, given size of customer's exe being more than 100'000'000 bytes (as inspected from the "Modules" tab). 100'000'000 = 05F5E100, so 00400000 + 05F5E100 = 0635E100 - way more than the 03DB472F. Therefore, any valid data/objects must have addresses only above 0635E100 (or below 00400000).

In other words: 00451C10 is a code address. It can not possibly be any data/TObject address.

Conclusion: customer's TContosoEventCollection object is trashed too, so no wonder we read trash data from it, so its FEvents/TList field also becomes trash. Customer was unlucky in the sense that the invalid value of the TContosoEventCollection object was, in fact, readable, so CPU could load the (trashed) FEvents field from it and only crashed later. If customer was lucky - he would crash immediately on access to TContosoEventCollection's FEvents field.

Summary: customer should look further up stack to the caller code: it is calling the FindEvent method on the invalid object value. From where does it reads/takes the TContosoEventCollection object? Whatever it was - it is trashed now.

The customer has reported that he was able to find and fix the issue armed with this information.

As for the question why there was no access violation in the same application compiled without EurekaLog - well, applications without EurekaLog have absolutely no memory checks in them! So most memory-related errors go unnoticed.

P.S. Read more stories like this one or read feedback from our customers.