Getting gettext & translation into Allacrost

gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Thu Dec 11, 2008 9:59 pm

That is a most excellent explanation. Thank you Drakkoon :approve:
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX

Re: Getting gettext & translation into Allacrost

Postby Roots » Fri Dec 12, 2008 2:16 am

Drakkoon wrote:So yeah, only ASCII chars can be printed to the console correctly without prior conversion.


In Linux I know it's possible to get Unicode support in your console, so the above isn't always true, but I would assume that the vast majority of systems only support ASCII encoding in their consoles.


Anyway, do you know enough to fix the problem now, gorzuate? Is there something that needs to be done in the video engine that you cannot do?
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Fri Dec 12, 2008 8:59 am

Roots wrote:Anyway, do you know enough to fix the problem now, gorzuate? Is there something that needs to be done in the video engine that you cannot do?


Linds' code example converted wchar_t characters to uint16, and then displayed that uint16 string through the video engine correctly. Which can only mean the conversion from the char* returned by gettext to our ustring class isn't working correctly.

Linds also mentioned reading from Lua and being able to convert from UTF-8 to UTF16. If we use uint16s in the code/engine, and gettext is configured to UTF16, I would hope that we could stick with that all the way and not have to do any conversions between UTF-8 and UTF16.
Linds mentioned the need for Lua to read a Unicode string. That's the way to do it (not sure how yet) if all the strings in the Lua map files are wrapped in calls to Translate(). Another way would be to read in a regular string from the Lua map file, which our scripting engine can currently support, and if it needs translating, wrap the returned string in a call to Translate(). In this way all calls to gettext would be done on the C++ side rather than the Lua side and we wouldn't have to figure out how to get Lua to play nice with Unicode. (Now, you might say with this method the programmer would need to keep track of which strings should be translatable, but that's not the case. If gettext cannot find a translation for a string because it doesn't exist it will return the original string. So with this method to make things simple for the programmer, all strings returned by a call to the scripting engine's ReadString function would be wrapped in a call to Translate()).
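A minimal sketch of the "translate everything on the C++ side" idea, assuming the engine's Translate() is a thin wrapper around gettext(). The names Translate and ReadTranslatedString are placeholders for illustration, not the actual engine functions. It relies on the gettext behaviour described above: when no translation exists, the original string comes back unchanged.

```cpp
#include <libintl.h>
#include <string>

// Hypothetical stand-in for the engine's Translate() helper.
std::string Translate(const std::string& text) {
    // An empty msgid is special in gettext (it returns the catalog header),
    // so pass empty strings through untouched.
    if (text.empty())
        return text;
    // gettext() returns its argument unchanged when no translation is found.
    return std::string(gettext(text.c_str()));
}

// Hypothetical wrapper: every string read from a script goes through
// Translate(), so the caller never has to track which ones are translatable.
std::string ReadTranslatedString(const std::string& raw_from_script) {
    return Translate(raw_from_script);
}
```

With no message catalog loaded, ReadTranslatedString("Hello, Laila") simply returns "Hello, Laila", which is why wrapping every script string is harmless.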
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Fri Dec 12, 2008 9:46 am

gorzuate wrote:Linds' code example converted wchar_t characters to uint16, and then displayed that uint16 string through the video engine correctly. Which can only mean the conversion from the char* returned by gettext to our ustring class isn't working correctly.

Yeah, MakeUnicodeString() doesn't actually do a proper conversion at all; it just copies an array of bytes into an array of shorts. If we replace that function with a real conversion, using libiconv for example, text should start to draw properly. I believe iconv might be a gettext dependency anyway, so this might be quite reasonable.
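A rough sketch of what a real conversion through iconv could look like. This is not the version that was checked in; the function name follows the UTF8ToUTF16() mentioned later in the thread, and the output container (a vector of uint16_t rather than ustring) and the "UTF-16LE" encoding name are assumptions made for the example.

```cpp
#include <iconv.h>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts UTF-8 text to UTF-16 code units through iconv.
// Returns false if the converter cannot be opened or the input is malformed.
bool UTF8ToUTF16(const std::string& utf8, std::vector<uint16_t>& out) {
    // Assumption: "UTF-16LE" pins the byte order so no byte order mark is
    // written; the thread itself only mentions the names "UTF-16" and "UTF-8".
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)(-1))
        return false;

    // Every UTF-8 byte produces at most one UTF-16 code unit (two bytes).
    std::vector<char> buffer(utf8.size() * 2 + 2);
    // iconv() takes a non-const input pointer on glibc and newer libiconv.
    char* in = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* dst = buffer.data();
    std::size_t out_left = buffer.size();

    std::size_t rc = iconv(cd, &in, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)(-1))
        return false;

    // Reassemble the little-endian byte pairs into 16-bit code units.
    std::size_t bytes = buffer.size() - out_left;
    out.clear();
    for (std::size_t i = 0; i + 1 < bytes; i += 2) {
        unsigned char lo = static_cast<unsigned char>(buffer[i]);
        unsigned char hi = static_cast<unsigned char>(buffer[i + 1]);
        out.push_back(static_cast<uint16_t>(lo | (hi << 8)));
    }
    return true;
}
```

Assembling the code units byte by byte keeps the result independent of the host's endianness, at the cost of one extra copy.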

gorzuate wrote:Linds also mentioned reading from Lua and being able to convert from UTF-8 to UTF16. If we use uint16s in the code/engine, and gettext is configured to UTF16, I would hope that we could stick with that all the way and not have to do any conversions between UTF-8 and UTF16.
Linds mentioned the need for Lua to read a Unicode string. That's the way to do it (not sure how yet) if all the strings in the Lua map files are wrapped in calls to Translate(). Another way would be to read in a regular string from the Lua map file, which our scripting engine can currently support, and if it needs translating, wrap the returned string in a call to Translate(). In this way all calls to gettext would be done on the C++ side rather than the Lua side and we wouldn't have to figure out how to get Lua to play nice with Unicode. (Now, you might say with this method the programmer would need to keep track of which strings should be translatable, but that's not the case. If gettext cannot find a translation for a string because it doesn't exist it will return the original string. So with this method to make things simple for the programmer, all strings returned by a call to the scripting engine's ReadString function would be wrapped in a call to Translate()).

I don't think we can expect Lua to output or convert to UTF16 for us; I think we're going to have to do that ourselves. Which is fine really: if we follow the above and implement a real UTF8->16 conversion function this will just work, as Lua is just handing us unmolested pure UTF8 bytes. So it's really fine to have Lua calling translate, it supports UTF8 anyway. No need for blanket solutions on this one ;).

What I can't find is any documentation on gettext's UTF16 support. Have you got a reference on that? It seems strange it returns a char* and expects the user to do a cast to a different size format.

But do you want me to just check in a version of MakeUnicodeString that does an actual conversion to svn?
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Fri Dec 12, 2008 8:06 pm

Linds wrote:What I can't find is any documentation on gettext's UTF16 support. Have you got a reference on that? It seems strange it returns a char* and expects the user to do a cast to a different size format.


It's not in the docs :angel:
One day I just decided to see if gettext could handle it. So in the configuration functions invoked to set up gettext, I told it to use UTF16 instead of UTF-8. I figured if it didn't work gettext would barf somehow, either by not compiling/linking, or by spitting out erroneous return values or error messages, or just returning a string that looked very goofy. Since none of these things happened, I just assumed it could handle UTF16. :uhoh:
Anyhow, it's probably safer to stick to the officially supported UTF-8.
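For reference, a sketch of a gettext setup that asks for UTF-8 output. The function name ConfigureGettext, the domain "allacrost" and the directory "./txt" are placeholders invented for this example; the libintl calls themselves (bindtextdomain, bind_textdomain_codeset, textdomain) are the standard ones.

```cpp
#include <libintl.h>
#include <clocale>
#include <cstring>

// Hypothetical setup routine; returns true when every step succeeded.
bool ConfigureGettext(const char* domain, const char* locale_dir) {
    // Pick up the language from the user's environment.
    std::setlocale(LC_ALL, "");

    // Tell gettext where the compiled message catalogs live.
    if (bindtextdomain(domain, locale_dir) == nullptr)
        return false;

    // Ask gettext to return UTF-8 regardless of the locale's own charset.
    const char* codeset = bind_textdomain_codeset(domain, "UTF-8");
    if (codeset == nullptr || std::strcmp(codeset, "UTF-8") != 0)
        return false;

    // Make this domain the default for plain gettext() calls.
    return textdomain(domain) != nullptr;
}
```

A call such as ConfigureGettext("allacrost", "./txt") would then be made once at startup, before the first string is translated.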

Linds wrote:But do you want me to just check in a version of MakeUnicodeString that does an actual conversion to svn?


Yeah, sure :approve:
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX

Re: Getting gettext & translation into Allacrost

Postby Roots » Sat Dec 13, 2008 7:46 pm

gettext probably returns char* because it supports multiple encodings and doesn't want to have multiple gettext calls that return the string in different formats. Just my guess.


So are we officially going to be using UTF8 encoding then? If so, we should probably document this somewhere (like in the system engine documentation on the wiki).
Guthur
Newbie
Posts: 8
Joined: Sat Dec 13, 2008 9:53 pm

Re: Getting gettext & translation into Allacrost

Postby Guthur » Sun Dec 14, 2008 1:45 am

Wonderful world of unicode :eyespin: , to call it a standard is just a sick joke if you ask me :)
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Sun Dec 14, 2008 11:07 am

Ok I've checked in that bad boy. But, there are a few issues for us to deal with in the long run.

First and foremost, prophile's point cannot go unheeded. UTF16 is a variable-length encoding and our code currently doesn't take that into account. Anything that uses CalculateTextWidth() will be correct, as SDL is handling that, but ustring::length() != num_characters. The glyph cache, for example, is flawed because the 16-bit value we look up might not actually be a glyph; it could be one half of a surrogate pair (a plane marker). Even worse, we could be looking at a character from another plane and caching that glyph where it will be used by the plane 0 equivalent. So technically we only support UCS-2.
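To illustrate why the number of 16-bit units differs from the number of characters, here is a small sketch that counts code points in a UTF-16 sequence. It uses a plain vector of uint16_t as a stand-in for ustring, and the function names are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Surrogate ranges defined by UTF-16.
inline bool IsHighSurrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool IsLowSurrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts characters (code points) rather than 16-bit code units.
std::size_t CountCodePoints(const std::vector<uint16_t>& units) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        // A high surrogate followed by a low surrogate is one character,
        // so skip the second half of the pair.
        if (IsHighSurrogate(units[i]) && i + 1 < units.size() &&
            IsLowSurrogate(units[i + 1])) {
            ++i;
        }
        ++count;
    }
    return count;
}
```

For the units 0x0041, 0xD83D, 0xDE00 the vector's size is 3 but the function reports 2 characters, which is exactly the gap between length() and the character count described above.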

The Wikipedia article is a good read, and expands on the difference. Specifically interesting:
Wikipedia Article wrote:UTF-16 is the native internal representation of text in the Microsoft Windows 2000/XP/2003/Vista/CE; Qualcomm BREW operating systems; the Java and .NET bytecode environments; Mac OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform graphical widget toolkit.


This is expanded on by this article highlighting the advantages of using UTF16 in text processing over UTF8 and UTF32.
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Tue Dec 16, 2008 12:14 am

I would prefer we deal with this sooner (i.e. now, since we're already working on it) rather than later. I would like the code/game to be fully Unicode-functional when we make the next release so we don't have to come back and look at it again.

Since you brought up the glyph cache, I have a question about it. I assume its intent was to cache the glyphs necessary for rendering certain strings of text, so that if you had to render them again then it would go much faster. Furthermore, I would assume the glyph cache needs to be cleared when switching languages (I noticed if I start a new game in English and talk to Laila, then return to the boot menu and select French and start a new game and talk to Laila, the conversation is still in English). I tried just duplicating (with some tweaks) some code I found I think in the TextManager's destructor and putting it in its own glyph-cache-clearing function, but when I called that when I switched languages the old text disappeared, no new text appeared, and I got a seg fault. So, what needs to be done video engine-wise in order to use a new language?
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Tue Dec 16, 2008 6:53 am

gorzuate wrote:I would prefer we deal with this sooner (i.e. now, since we're already working on it) rather than later. I would like the code/game to be fully Unicode-functional when we make the next release so we don't have to come back and look at it again.

Ok understood. Well we need to make a decision about how to support this internally. As far as I see it regardless of which way we choose to do this we need to set a dependency. If we choose UTF8 we need a library that will give us a UTF8 character iterator, length and valueat function. If we choose UTF16 we need a conversion function. The lack of complexity of the UTF16 standard means we could write trivial methods for iterators and character extractors internally.
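As a sketch of how small such an internal UTF16 character extractor can be, the function below reads one code point and advances an index. The name NextCodePoint and the use of a vector of uint16_t in place of ustring are assumptions for illustration; unpaired surrogates are reported as U+FFFD, which is one possible policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes the code point starting at units[index] and advances index past it.
// The caller must ensure index < units.size().
uint32_t NextCodePoint(const std::vector<uint16_t>& units, std::size_t& index) {
    uint32_t first = units[index++];

    // Anything outside the surrogate range is a complete character.
    if (first < 0xD800 || first > 0xDFFF)
        return first;

    // A high surrogate must be followed by a low surrogate.
    if (first <= 0xDBFF && index < units.size()) {
        uint32_t second = units[index];
        if (second >= 0xDC00 && second <= 0xDFFF) {
            ++index;
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
    }

    // Unpaired surrogate: substitute the replacement character.
    return 0xFFFD;
}
```

Looping with this function until the index reaches the end of the string gives both an iterator and a character count without any external library.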

Personally I think we should stick with UTF16 and just extend ustring for usability. Add an ostream::operator<<(ustring) and a std::string ustring::ToString() method for example. This would make debugging fairly painless and the UTF16/8 conversion process fairly transparent. All that needs to be done then is a few changes to the text rendering system and we should have full functionality.
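A sketch of what those two debugging helpers might look like. UString here is a tiny stand-in struct built on std::u16string, not the engine's real ustring class, and the conversion is hand-written rather than going through iconv, so treat it as an illustration of the idea only.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the engine's ustring class.
struct UString {
    std::u16string units;

    // Encodes the UTF-16 contents as UTF-8 for logging and debugging.
    std::string ToString() const {
        std::string out;
        for (std::size_t i = 0; i < units.size(); ++i) {
            uint32_t cp = units[i];
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size() &&
                units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
                // Combine a surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
                ++i;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD;  // unpaired surrogate
            }
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};

// Lets a UString be written to any narrow stream, e.g. std::cout.
std::ostream& operator<<(std::ostream& os, const UString& text) {
    return os << text.ToString();
}
```

With this in place, a debug line such as std::cout << dialogue_text prints readable output on a UTF-8 terminal.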

gorzuate wrote:Since you brought up the glyph cache, I have a question about it. I assume its intent was to cache the glyphs necessary for rendering certain strings of text, so that if you had to render them again then it would go much faster. Furthermore, I would assume the glyph cache needs to be cleared when switching languages (I noticed if I start a new game in English and talk to Laila, then return to the boot menu and select French and start a new game and talk to Laila, the conversation is still in English). I tried just duplicating (with some tweaks) some code I found I think in the TextManager's destructor and putting it in its own glyph-cache-clearing function, but when I called that when I switched languages the old text disappeared, no new text appeared, and I got a seg fault. So, what needs to be done video engine-wise in order to use a new language?


The glyph cache should be cleared, but in all honesty it doesn't need to be. It only caches on a glyph by glyph basis so it can't possibly cache whole English strings. Something else is going on there. Are you sure the newly translated string is being passed through? You aren't using a pre-rendered TextImage or anything?
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Tue Dec 16, 2008 8:46 am

Let's stick with UTF-16.

As for clearing the glyph cache... let's see, well, when a string is passed to the text renderer it goes via the Translate() function. If a string in English is rendered, then the user switches languages, the same string translated to the new language should not be present in the cache since it requires different glyphs, so the TextImage would have to be different. Maybe I am giving it the same TextImage, but I don't know how I would be doing that. Somehow I need to tell the renderer to "refresh" itself, or something along those lines, because just switching languages (via an environment variable) isn't doing the trick...

Also, remember that const_cast you put into the UTF8ToUTF16() function? Turns out I have an iconv.h from libiconv version 1.11 that was expecting a const char **. But after upgrading to libiconv version 1.12, it's just a char **. I thought that was strange. Anyway, I am still having a problem: I'm getting the following message printed out lots of times:
"Failed to initialise UTF8->UTF16 conversion through iconv."
I guess the iconv_open() function is failing, though I'm not sure why. When I changed the arguments to "UTF-16" and "UTF-8" (with a dash), I no longer got that message, but all the text in the game was boxes (translated or not)... :huh:
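One thing that might be worth ruling out here (a guess, not something established in this thread): with the plain name "UTF-16", some iconv implementations prepend a byte order mark and pick the byte order themselves, so the first 16-bit value is not a character and the rest may be byte-swapped relative to what the renderer expects. The sketch below dumps the raw bytes iconv produces so the two cases can be compared; ConvertRaw is a helper invented for this example.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>
#include <vector>

// Returns the raw bytes iconv writes when converting UTF-8 text to
// to_encoding, or an empty vector if the conversion could not be performed.
std::vector<unsigned char> ConvertRaw(const char* to_encoding,
                                      const std::string& utf8) {
    std::vector<unsigned char> result;
    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)(-1))
        return result;  // encoding name not known to this iconv

    // Room for a possible byte order mark plus up to 4 bytes per input byte.
    std::vector<char> buffer(utf8.size() * 4 + 4);
    char* in = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out = buffer.data();
    std::size_t out_left = buffer.size();

    std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc != (std::size_t)(-1))
        result.assign(buffer.begin(),
                      buffer.begin() + (buffer.size() - out_left));
    return result;
}
```

Comparing ConvertRaw("UTF-16", "A") with ConvertRaw("UTF-16LE", "A") on each platform would show whether extra leading bytes or a different byte order are involved.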
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Tue Dec 16, 2008 9:48 am

gorzuate wrote:Let's stick with UTF-16.

As for clearing the glyph cache... let's see, well, when a string is passed to the text renderer it goes via the Translate() function. If a string in English is rendered, then the user switches languages, the same string translated to the new language should not be present in the cache since it requires different glyphs, so the TextImage would have to be different. Maybe I am giving it the same TextImage, but I don't know how I would be doing that. Somehow I need to tell the renderer to "refresh" itself, or something along those lines, because just switching languages (via an environment variable) isn't doing the trick...

Also, remember that const_cast you put into the UTF8ToUTF16() function? Turns out I have an iconv.h from libiconv version 1.11 that was expecting a const char **. But after upgrading to libiconv version 1.12, it's just a char **. I thought that was strange. Anyway, I am still having a problem: I'm getting the following message printed out lots of times:
"Failed to initialise UTF8->UTF16 conversion through iconv."
I guess the iconv_open() function is failing, though I'm not sure why. When I changed the arguments to "UTF-16" and "UTF-8" (with a dash), I no longer got that message, but all the text in the game was boxes (translated or not)... :huh:

Well, actually TextImage doesn't even use the glyph cache; that's a speedup technique for rendering text directly only. I would say the same TextImage is being used.

What exactly is happening on language switch? You're calling setenv("LANG", "fr") or something? Then that makes subsequent calls to gettext() return the string in that language? The TextImages in question would have to be set to the new string. I believe there is a function called SetText or something similar for this purpose.

But if we want to support mid runtime language changes we're going to have to look at a backend solution so we don't have to manually go round setting the text correctly. E.g. Modes implement a CreateLanguageStrings function or something that sets all the TextImage contents.

For now, setting the TextImage explicitly will re-render it with the new contents.
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Wed Dec 17, 2008 12:15 am

Yeah, you've got the idea with how gettext switches languages.

So, the wiki makes no mention of a TextImage. So how exactly does it work if I create a TextBox or an OptionBox and give it a ustring as an argument? How does it actually create the text and render it onto the screen? It seems from what you said that it creates an image of the letters (glyphs? which are stored in the glyph cache?) in the string and then displays the image every time the TextBox or OptionBox is displayed. Is that somewhat correct? Where are these images stored? Are they local to each TextBox/OptionBox or in a general location relative to the video engine? Instead of a CreateLanguageStrings function for each mode like you suggested, would it be better if, when the language switched to another one, some flag was passed to the video engine which would delete all TextImages that had already been created?
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX

Re: Getting gettext & translation into Allacrost

Postby Roots » Wed Dec 17, 2008 2:14 am

Right now none of the GUI classes make use of TextImages. They render the text anew each frame based on the ustring text that they contain. So if you change the text string for these classes, you should see the text change just fine when you draw the GUI object. The plan is to make the GUI system use only text images in the future and not the slow re-render and draw calls.
prophile
Senior Member
Posts: 324
Joined: Fri Jan 27, 2006 7:18 pm
Location: Chaldon, Surrey, UK

Re: Getting gettext & translation into Allacrost

Postby prophile » Wed Dec 24, 2008 3:40 pm

Although you could have the "immediate" text rendering store an image in a hash table and render it, and check that hash table for each call. That way the strings are only rendered once and used as many times as the programmer desires. You'd have to store a "last used time" with the entry in the hash table and regularly purge text entries which haven't been used for more than 30 seconds.
Alastair Lynn / Resident Whinger / Allacrost
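The hash-table idea above could be sketched roughly as follows. Everything here is illustrative: TextRenderCache is a made-up class, an integer ID stands in for the rendered image, std::string stands in for ustring as the key, and the caller supplies the current time in seconds.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// One cached rendering plus the time it was last requested.
struct CachedText {
    int image_id;      // placeholder for the rendered image
    double last_used;  // seconds, supplied by the caller
};

class TextRenderCache {
public:
    explicit TextRenderCache(double max_idle_seconds)
        : max_idle_(max_idle_seconds) {}

    // Returns the cached image for text, "rendering" it on a miss.
    int Fetch(const std::string& text, double now) {
        auto it = entries_.find(text);
        if (it == entries_.end()) {
            // Real code would render the string here; we only assign an ID.
            it = entries_.emplace(text, CachedText{next_id_++, now}).first;
        }
        it->second.last_used = now;
        return it->second.image_id;
    }

    // Drops every entry that has been idle for longer than the limit.
    void Purge(double now) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (now - it->second.last_used > max_idle_)
                it = entries_.erase(it);
            else
                ++it;
        }
    }

    std::size_t Size() const { return entries_.size(); }

private:
    double max_idle_;
    int next_id_ = 0;
    std::unordered_map<std::string, CachedText> entries_;
};
```

Constructed with TextRenderCache cache(30.0) and purged once per second or so, this gives the 30-second expiry described above. A language switch would also be handled naturally, since translated strings are different keys and the old ones simply age out.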
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Tue Feb 24, 2009 10:05 am

OK, I basically need to know from people if the current trunk works as far as translation goes. I've been able to add 'runic' Unicode characters to the French translation that's currently set up and have them display perfectly in dialogue from Lua. This is of course in addition to the more standard French characters. No boxes to be seen.

Do people still have boxes coming up? Gorz?
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Tue Feb 24, 2009 6:56 pm

Huh? :huh: The current trunk? Well, I'll double-check tonight on my laptop running OS X, but I don't think anything has changed in the repository for it to suddenly start working for me (unless people just aren't making their commit posts...). I'd also like to check to see if it works on my Linux box, but I get a seg fault as soon as I start up the game :sad: (could be for any number of reasons, one being my graphics drivers aren't working properly...)

If it's working for you I wouldn't mind seeing a screenshot...
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Wed Feb 25, 2009 6:48 am

On Linux, the game doesn't run, I think because of problems with my audio card (after examining the output of -d all).

On Mac, the current trunk still does not work for me. Accented characters show up as boxes, and the following message is printed out many, many times:
"Failed to initialise UTF8->UTF16 conversion through iconv."
I have libiconv version 1.12. Maybe it's buggy... which version do you have?
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Thu Mar 05, 2009 2:49 pm

OK, I did a quick check on the format names of my Mac's iconv and it seems we were using names available on Linux that weren't available on OS X. I've committed revision 1490, which changes our names to those available on both platforms.

The iconv versions I checked are:
  • iconv (GNU libc) 2.7 (Linux)
  • iconv (GNU libiconv 1.9) (OS X)
For reference, the new names in the code are UTF-16 and UTF-8, both of which should be visible in the list returned when you run 'iconv -l' from a terminal. My Mac is far from set up to compile Alla so I'll admit I haven't tested the effect on OS X, but this doesn't regress my Linux support.

The screenshot below is some dialog for which I added some characters to the 'French' translation from the Runic character set, which are high Unicode. Apologies for the stipple background, keep meaning to get rid of that.

Image
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Fri Mar 06, 2009 7:21 am

Already beat you to it Linds, a couple posts above I said the following:

gorzuate wrote: Anyway, I am having a problem though, I'm getting the following message printed out lots of times:
"Failed to initialise UTF8->UTF16 conversion through iconv."
I guess the iconv_open() function is failing, though I'm not sure why. When I changed the arguments to "UTF-16" and "UTF-8" (with a dash), I no longer got that message, but all the text in the game was boxes (translated or not)... :huh:


This is still the case with the last commit you made. Also, when I start a new game, either in English or French, it crashes with the following messages:

Code:

MODE MANAGER: GameMode constructor invoked
WARNING:/Users/philip/allacrost/demo/src/engine/video/option.cpp:SetSelection:379: argument was invalid (out of bounds): 0
SCRIPT ERROR: ReadScriptDescriptor::OpenFile() could not open the file ??????????????????????
cannot read ??????????????????????: No such file or directory
BOOT: BootMode destructor invoked.
MODE MANAGER: GameMode destructor invoked
Program received signal:  “EXC_BAD_ACCESS”.


I tried it with libiconv 1.9 and 1.12, same results either way.