Sunday, November 5, 2023

[Yukon Beta Blog] THashset


A HashSet is a high-performance, unordered collection of unique elements.

Delphi 12 introduces THashSet<T> and TObjectHashSet<T> classes in System.Generics.Collections.

Of course, Delphi has had other options for this kind of thing.

  • TStringList with its fairly quick binary search.
  • THashedStringList, which was created specifically for TMemIniFile, has improved read times and better scaling.
  • TDictionary, which uses hashes for keys and has significantly faster read and write times than the other two. Embarcadero (eventually) replaced the THashedStringList in TMemIniFile with an internal-only wrapper around TDictionary.

Thomas Mueller did a great writeup on these, including some benchmarks.

One of the benefits to having a dedicated THashSet instead of using a TDictionary and ignoring things like values is clarity. Code that's easier to read is also easier (and cheaper) to maintain.

For example, in a discussion about how we might take advantage of Delphi 12's new GraphViz feature, someone from Embarcadero posted this interesting one-liner:

TDigraph = class(TObjectDictionary<string, THashSet<string>>);

It struck me as a nice clean way to represent the unit dependency graph generated by the compiler. The intent is clear and it's simpler than what I was doing. I really need to step up my game.

THashSet is one of the many incremental improvements in this release.


Disclaimer:

"This blog post is based on a pre-release version of the RAD Studio software and it has been written with specific permission by Embarcadero. No feature is committed until the product GA release."

2 comments:

Stefan Glienke said...

FWIW the image you are using does not match the implementation of the hash table in Delphi. The RTL does not use a linked list or array to solve collisions but uses linear probing.

Bruce McGee said...

@Stefan, Yeah, I know. I just "borrowed" something vaguely related.