The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


parsing – delphi – strip out all non standard text characters from string – Stack Overflow

Posted by jpluimers on 2021/12/02

From a while back: a totally non-optimised code example by me (intentionally limited to AnsiString, as the question was about filtering ASCII, and Unicode has far too many code points for the Latin script).

// For those who need a disclaimer:
// This code is meant as a sample to show how the basic check for non-ASCII characters works.
// It performs poorly on long strings or when called often, as every concatenation may reallocate.
// Use a TStringBuilder, or SetLength & Integer loop index to optimize.
// If you need really optimized code, pass this on to the FastCode people.
function StripNonAsciiExceptCRLF(const Value: AnsiString): AnsiString;
var
  AnsiCh: AnsiChar;
begin
  Result := ''; // a string Result is not guaranteed to start out empty
  for AnsiCh in Value do
    // keep #32..#127, plus CR (#13) and LF (#10) as the function name promises
    if ((AnsiCh >= #32) and (AnsiCh <= #127)) or (AnsiCh = #13) or (AnsiCh = #10) then
      Result := Result + AnsiCh;
end;

And an optimised one by [WayBack] David Heffernan:

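function StrippedOfNonAscii(const s: string): string;
var
  i, Count: Integer;
begin
  // allocate for the worst case (nothing stripped) in one go
  SetLength(Result, Length(s));
  Count := 0;
  for i := 1 to Length(s) do begin
    // keep #32..#127, plus LF (#10) and CR (#13)
    if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then begin
      inc(Count);
      Result[Count] := s[i];
    end;
  end;
  // trim to the number of characters actually kept
  SetLength(Result, Count);
end;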

Even when “trivial”, I usually do not prematurely optimise as optimised code is almost always less readable than non-optimised code.
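The disclaimer in the first sample mentions TStringBuilder as another way to optimise. Below is a minimal sketch of that middle ground, assuming Delphi 2009 or later (where TStringBuilder is available from SysUtils). The function name StripNonAsciiKeepCRLF and the demo program are made up for illustration and are not from the Stack Overflow answers.

```delphi
program StripDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // System.SysUtils when unit scope names are not configured (XE2 and later)

// Hypothetical name: keeps #32..#127 plus CR and LF, drops everything else.
function StripNonAsciiKeepCRLF(const Value: string): string;
var
  Builder: TStringBuilder;
  Ch: Char;
begin
  // Reserve capacity for the worst case so Append does not need to grow the buffer.
  Builder := TStringBuilder.Create(Length(Value));
  try
    for Ch in Value do
      if ((Ch >= #32) and (Ch <= #127)) or (Ch = #13) or (Ch = #10) then
        Builder.Append(Ch);
    Result := Builder.ToString;
  finally
    Builder.Free;
  end;
end;

begin
  // #233 (e with acute accent) and #9 (tab) are stripped
  Writeln(StripNonAsciiKeepCRLF('caf'#233' au lait'#9'!')); // prints: caf au lait!
end.
```

This keeps the for-in loop of the readable version while avoiding the repeated string concatenation; whether it beats the SetLength approach would need measuring.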

Source: [Wayback] parsing – delphi – strip out all non standard text characters from string – Stack Overflow

–jeroen
