Tiburon's LoadFromFile and SaveToFile for Unicode characters

With Tiburon, I can use Unicode characters with VCL components like TMemo, TListBox, TComboBox (and others that contain string lists). How can I load the strings from a file and save them to a file? How do I need to modify existing Delphi and C++Builder programs to handle Unicode characters for these components? Here is the answer.

The LoadFromFile and SaveToFile methods have a new, optional parameter named "Encoding", of class type "TEncoding". TEncoding (defined in the SysUtils unit) contains several class properties that you can use to specify the encoding of the strings you want to load and/or save: ASCII, BigEndianUnicode (UTF-16 big-endian), Default (the user's active code page), Unicode (UTF-16 little-endian), UTF7, UTF8.
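One visible difference between these encodings is the preamble (byte-order mark) that some of them place at the start of a file. The console sketch below is only an illustration: it assumes that TEncoding exposes its preamble through GetPreamble as a TBytes array, and the program name and the ShowPreamble helper are hypothetical names, not part of the VCL.

```pascal
program ShowPreambles;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: prints an encoding's label and its preamble bytes in hex.
procedure ShowPreamble(const EncodingName: string; Encoding: TEncoding);
var
  Preamble: TBytes;
  I: Integer;
begin
  Preamble := Encoding.GetPreamble;
  Write(EncodingName, ': ');
  if Length(Preamble) = 0 then
    Write('(no preamble)');
  for I := 0 to Length(Preamble) - 1 do
    Write(IntToHex(Preamble[I], 2), ' ');
  Writeln;
end;

begin
  // The six class properties listed above.
  ShowPreamble('ASCII', TEncoding.ASCII);
  ShowPreamble('BigEndianUnicode', TEncoding.BigEndianUnicode);
  ShowPreamble('Default', TEncoding.Default);
  ShowPreamble('Unicode', TEncoding.Unicode);
  ShowPreamble('UTF7', TEncoding.UTF7);
  ShowPreamble('UTF8', TEncoding.UTF8);
end.
```

An encoding that reports an empty preamble adds nothing in front of the text when saving.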

The following are the declarations of the LoadFromFile and SaveToFile methods of TStrings (defined in the Classes unit), which these components use to hold their strings:

Delphi:
  procedure TStrings.LoadFromFile(const FileName: string);
  procedure TStrings.LoadFromFile(const FileName: string; Encoding: TEncoding);
  procedure TStrings.SaveToFile(const FileName: string);
  procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);

C++Builder:
  virtual void __fastcall LoadFromFile(const System::UnicodeString FileName)/* overload */;
  virtual void __fastcall LoadFromFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;
  virtual void __fastcall SaveToFile(const System::UnicodeString FileName)/* overload */;
  virtual void __fastcall SaveToFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;

The Delphi implementation of SaveToFile creates a TFileStream and passes the encoding I provide on to SaveToStream:

procedure TStrings.SaveToFile(const FileName: string);
begin
  SaveToFile(FileName, nil);
end;

procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);
var
  Stream: TStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    SaveToStream(Stream, Encoding);
  finally
    Stream.Free;
  end;
end;

The following examples show how to load and save the strings with a ListBox VCL component on your form:

Delphi:
  ListBox1.Items.LoadFromFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);
  ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);

C++Builder:
  ListBox1->Items->LoadFromFile("c:\\temp\\MyListBoxItems.txt", TEncoding::UTF8);
  ListBox1->Items->SaveToFile("c:\\temp\\MyListBoxItems.txt", TEncoding::UTF8);
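The same calls can be tried without a form. The following is a minimal console sketch, not taken from the demo applications: it uses TStringList (a TStrings descendant) in place of ListBox1.Items, and the program name, the sample strings, and the assumption that c:\temp exists are mine.

```pascal
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Assumption: c:\temp exists and is writable, as in the examples above.
  FileName = 'c:\temp\MyListBoxItems.txt';

var
  Source, Loaded: TStringList;
begin
  Source := TStringList.Create;
  try
    Loaded := TStringList.Create;
    try
      // Character codes keep this source file plain ASCII.
      Source.Add('Gr'#$00FC#$00DF'e');              // German greeting with u-umlaut and sharp s
      Source.Add(#$3053#$3093#$306B#$3061#$306F);   // Japanese greeting in hiragana

      // Save as UTF-8, then load the file back with the same encoding.
      Source.SaveToFile(FileName, TEncoding.UTF8);
      Loaded.LoadFromFile(FileName, TEncoding.UTF8);

      if Loaded.Text = Source.Text then
        Writeln('Round trip preserved all strings')
      else
        Writeln('Round trip changed the strings');
    finally
      Loaded.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

Swapping TEncoding.UTF8 for another TEncoding property in both calls is a quick way to see which encodings can represent the sample strings.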

Here is a screenshot of my example Delphi application:

delphihelloworld_658.jpg 

Here are links to the Delphi and C++Builder versions of the application: delphihelloworld_660.zip, cpphelloworld_661.zip

With Tiburon, my Delphi and C++ demo applications can now handle Unicode characters in list boxes, edit boxes, and labels, and I can also save the Unicode strings to, and load them from, my hard drive.


About
David Intersimone (known to many as David I.) is a passionate and innovative software industry veteran, often referred to as a developer icon, who extols and educates the world on Embarcadero developer tools. He shares his visions as an active member of the industry speaking circuit and is tapped as an expert source by the media. He is a long-standing champion of architects, developers and database professionals and works to ensure that their needs are folded into Embarcadero's strategic product plans. David holds a bachelor's degree in computer science from California Polytechnic State University at San Luis Obispo, California.

Comments

  • Guest
    Jolyon Smith Tuesday, 15 July 2008

    And what exactly does a NIL encoding mean as applied to a Unicode string?

  • Guest
    Aleksander Oven Tuesday, 15 July 2008

    >TEncoding ... BigEndianUnicode, ..., Unicode

    Couldn't you just call the encodings what they are, i.e. UTF16BE and UTF16LE?

  • Guest
    Kryvich Tuesday, 15 July 2008

    Is it possible to save a Unicode stringlist in a certain ANSI codepage? I.e.:

    ListBox1.Items.SaveToFile('MyListBoxItems1251.txt', TEncoding.CP1251);
    ListBox1.Items.SaveToFile('MyListBoxItems1250.txt', TEncoding.CP1250);
    or maybe
    ListBox1.Items.SaveToFile('MyListBoxItems.txt', TEncoding.ANSI, 1251);
    // type of encoding + number of codepage

  • Guest
    Mike Dillamore Tuesday, 15 July 2008

    I strongly agree with Aleksander. "Unicode" is absolutely _not_ an encoding - it is an abstract concept representing a character set. Google for the phrase "Unicode is not an encoding" for numerous explanations of why this is fundamentally wrong.

    Aleksander's suggested names (UTF16BE and UTF16LE) would be correct. If the term Unicode were to be applied to any encoding (which it shouldn't!), the most appropriate would be UTF-32, being the only one that can represent the full character set without variable length encodings.

  • Guest
    Dennis Tuesday, 15 July 2008

    Oh yes, and there is a bug in the attached sample

    ListBox1.Items.SaveToFile('MyListBoxItems.txt',TEncoding.UTF8);

    should read

    ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt',TEncoding.UTF8);

    well. I think no one needs this info, but well.

  • Guest
    Bernhard Geyer Wednesday, 16 July 2008

    procedure TForm34.ListBox1Click(Sender: TObject);
    begin
      Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
    end;

    should be

    procedure TForm34.ListBox1Click(Sender: TObject);
    begin
      if ListBox1.ItemIndex > -1 then
        Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
    end;

  • Guest
    Maël Hörz Wednesday, 16 July 2008

    Thanks for the post.

    I think that Unicode and BigEndianUnicode aren't good encoding names. The Unicode standard makes a clear distinction between the set of characters and its encoding. I guess what you want is UTF16LE and UTF16BE as encoding-classes or more verbose names if you prefer. But please don't use Unicode as an encoding name. MS did it in the past when UTF-16 was equal to UCS2 and considered to be the only necessary encoding (Windows NT).

    Please, pretty please, change that. In projects where I worked and such naming was done it created confusion amongst developers and led to bugs that could have been avoided if the name was clear and didn't mix up concepts.

  • Guest
    davidi Wednesday, 16 July 2008

    Here is the SaveToStream implementation to answer some of the questions above about what "nil" does, and what the encodings do for streaming out and in Strings:
    procedure TStrings.SaveToStream(Stream: TStream; Encoding: TEncoding);
    var
      Buffer, Preamble: TBytes;
    begin
      if Encoding = nil then
        Encoding := TEncoding.Default;
      Buffer := Encoding.GetBytes(GetTextStr);
      Preamble := Encoding.GetPreamble;
      if Length(Preamble) > 0 then
        Stream.WriteBuffer(Preamble[0], Length(Preamble));
      Stream.WriteBuffer(Buffer[0], Length(Buffer));
    end;
    Note: *byte-char* based strings can have an affinity to a given codepage. For UTF8String, it is UTF8String = type AnsiString(65001); Assigning a UnicodeString to a UTF8String will perform an automatic conversion. The reverse is also true.
    The code page value is whatever the underlying OS supports.
    Glad everyone is commenting and catching my typos and the in-between versions of these sample demos. I will fix them.

  • Guest
    davidi Wednesday, 16 July 2008

    I should have added that "Default" encoding = user's active code page.

  • Guest
    davidi Wednesday, 16 July 2008

    For more on Tiburon "String Theory" check out Allen Bauer's recent blog post at
    http://blogs.codegear.com/abauer/2008/07/16/38864/

  • Guest
    Remy Lebeau (TeamB) Wednesday, 16 July 2008

    The new TEncoding class is modeled after .NET's System.Text.Encoding class. That is where the "Unicode" and "BigEndianUnicode" property names come from.

    As for loading/saving in a specific codepage, TEncoding has support for that as well, similar to .NET:

    var
      Enc: TEncoding;
    begin
      Enc := TEncoding.GetEncoding(1251);
      try
        ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', Enc);
      finally
        Enc.Free;
      end;
    end;

  • Guest
    Jolyon Smith Wednesday, 16 July 2008

    "That is where the "Unicode" and "BigEndianUnicode" property names come from."

    I would have thought that people that wanted Unicode so badly that they were using .NET already would not be that interested in knowing that badly chosen names from .NET were lovingly preserved in a Win32 implementation.

    A bad name is a bad name. A bad excuse for using a bad name doesn't make it a good name.

    This is just yet more hamstringing/pollution of the Win32 implementation in the name of (psst) Delphi.NET compatibility. Y'know, the thing that someone at CodeGear recently said was "over" (reading between the lines - "had been a mistake from the get go").

    Same ol', same ol'.

    New name over the door, same old crap coming through it.

  • Guest
    Jolyon Smith Wednesday, 16 July 2008

    Oh, and thanks (DavidI) for clarifying the NIL encoding Q.

  • Guest
    Mike Dillamore Wednesday, 16 July 2008

    Thanks Remy for clarifying the origin of the erroneous "Unicode" and "BigEndianUnicode" property names. Please, though, take note of the feedback in these comments. It would be no credit to CodeGear/Embarcadero to make a mistake simply because Microsoft made it once already.

  • Guest
    Arne Hartmann Monday, 28 July 2008

    Great that Unicode is coming!
    But what happens with the filename?
    Here only a "string" type is declared, not "WideString", so how is Unicode supported for the path and filename itself?

  • Guest
    Samir Sunday, 24 August 2008

    The filename is also Unicode, because string is now (in Delphi 2009) UnicodeString; before, it was AnsiString.
