Getting gettext & translation into Allacrost

For discussion of the code running behind the game

Moderator: Staff

Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX
Contact:

Re: Getting gettext & translation into Allacrost

Postby Roots » Fri Feb 29, 2008 6:49 pm

WK: I don't think your suggestion would work in a technical aspect. I'm not an expert on gettext (yet ;)), but I think the way it does the translation lookup is dependent on what file and at what location the string to translate was located in, which determines where the translated string should be. If we have what is essentially a wrapper function that makes the gettext lookup call from a single location, I think gettext won't be able to find what it needs. MC can hopefully confirm this for us.


I thought that "Translate()" would be a short enough function name, but prophile's suggestion of TS isn't a bad one either. I'll need to observe our internationalization system in action before I lean either way though.
rujasu
Developer
Posts: 758
Joined: Sun Feb 25, 2007 5:40 am
Location: Maryland, USA

Re: Getting gettext & translation into Allacrost

Postby rujasu » Fri Feb 29, 2008 8:59 pm

:approve: on naming the function TS()

:disapprove: on a "_" macro; I just don't think that's a wise idea.
MoOshiCow
Developer
Posts: 18
Joined: Thu Jul 05, 2007 6:42 pm
Location: (currently) Pittsburgh, USA
Contact:

Re: Getting gettext & translation into Allacrost

Postby MoOshiCow » Tue Mar 18, 2008 6:06 pm

Roots wrote:WK: I don't think your suggestion would work in a technical aspect. I'm not an expert on gettext (yet ;)), but I think the way it does the translation lookup is dependent on what file and at what location the string to translate was located in, which determines where the translated string should be. If we have what is essentially a wrapper function that makes the gettext lookup call from a single location, I think gettext won't be able to find what it needs. MC can hopefully confirm this for us.


Well, this really goes back to what villiam said in the posts above.
a)

Code:

function DoSomething(s) {
  DoSomethingWithString(s);
}

DoSomething(Translate("Hello World!"));
DoSomething(Translate("Hello Another World!"));


b)

Code:

function DoSomething(s) {
  DoSomethingWithString(Translate(s));
}

DoSomething("Hello World!");
DoSomething("Hello Another World!");


(b) is the design that Winter Knight is suggesting, where DoSomething is AddText().

Technically speaking, yeah, we can do something like,

dialogue:AddText("Laila, what's wrong? You have a worried look on your face.", 1000, -1, -1);

And have Translate() be called within AddText() and then when looking for translatable strings, instead of looking for the
keyword=Translate(), we could lookup keyword=AddText().

This actually was my initial design, but as villiam and others have said in above posts,
design (a) is the choice we made for reasons posted above.

My argument for not switching back to (a) is, that AddText() isn't the only function that will have translatable strings in it.
For example, we'll need to translate item names, descriptions, etc, and if you look into dat/objects/items.lua:

Code:

items[1] = {
   name = "Healing Potion",
   description = "Restores a small amount of hit points to a target.",
   icon = "img/icons/items/health_potion.png",
...


Well, nothing here calls AddText(), so we'll need some function to handle Translate() anyway.
And if I use Translate() in here, AddText() in the dialog scripts, and something else somewhere else...
well, that's just confusing for all of us.

Plus, I'm not sure whether we can use multiple keywords to search for translatable strings. We probably can.
But hey, I'm not an l33t expert on gettext() either, so I'll do some more testing and find the best solution.
Meanwhile, you guys can suggest more stuff along the way ^^

-MoOshiCow
Winter Knight
Contributor
Posts: 304
Joined: Fri Sep 21, 2007 12:35 pm
Contact:

Re: Getting gettext & translation into Allacrost

Postby Winter Knight » Wed Mar 19, 2008 12:20 am

MoOshiCow wrote:Technically speaking, yeah, we can ... have Translate() be called within AddText() and then when looking for translatable strings, instead of looking for the
keyword=Translate(), we could lookup keyword=AddText().
...
My argument for not switching back to (a) [I assume you meant (b)] is, that AddText() isn't the only function that will have translatable strings in it.
For example, we'll need to translate item names, descriptions, etc, and if you look into dat/objects/items.lua:


I wasn't suggesting removing the Translate() function. I was just recommending that AddText() translate automatically. IOW, I was suggesting we use both.

MoOshiCow wrote:if i use Translate() in here, and use AddText() in the dialog scripts, and something else somewhere else...
well, that's just confusing for all of us..


That's one problem with having a single function perform multiple functions. It is a little harder to learn to use. I think it's worth it, though. In this case especially, it's not any more confusing than having every (or almost every) AddText() call embed another function inside it. (b) makes the lines significantly shorter and easier to read, and decreases the likelihood of leaving an unclosed parenthesis.
prophile
Senior Member
Posts: 324
Joined: Fri Jan 27, 2006 7:18 pm
Location: Chaldon, Surrey, UK
Contact:

Re: Getting gettext & translation into Allacrost

Postby prophile » Thu Mar 20, 2008 1:46 pm

You need to be very careful and very consistent in use. In past projects I've worked on, the golden rule is no implicit translation - if you pass a string somewhere, it is YOUR job to translate it; likewise, if you receive a string from somewhere else, you should assume that it's already in the correct format to output.
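A minimal sketch of that rule, under stated assumptions: the Translate() below is a hypothetical stand-in that looks strings up in a small in-memory table rather than calling gettext, and FormatDialogueLine() is an invented name used only for illustration. The point is that the receiving function never translates; it trusts whatever it is handed.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the real Translate(); a tiny in-memory catalog
// replaces the gettext lookup so the sketch has no external dependencies.
std::string Translate(const std::string& source) {
    static const std::map<std::string, std::string> catalog = {
        {"Hello World!", "Bonjour le monde !"},
    };
    const auto it = catalog.find(source);
    // Like gettext, fall back to the source string when no translation exists.
    return it == catalog.end() ? source : it->second;
}

// Receiver (invented name): assumes the text is already in its final form
// and never calls Translate() itself.
std::string FormatDialogueLine(const std::string& speaker, const std::string& text) {
    return speaker + ": " + text;
}
```

With this split, FormatDialogueLine("Laila", Translate("Hello World!")) yields the translated line, while passing a raw literal simply outputs that literal unchanged, so a missing Translate() call is visible at the call site.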
Alastair Lynn / Resident Whinger / Allacrost
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX
Contact:

Re: Getting gettext & translation into Allacrost

Postby Roots » Thu Mar 20, 2008 3:33 pm

I agree with prophile. Having certain functions that automatically translate for you while others do not is bad. Then the programmer has to keep track of what is translated automatically and what is not, and we get a big mess.


I think we should just Translate() everything that needs to be translated. Yes, it is more typing. But it makes it easier to understand the code, especially for new people.
TangentZ
Contributor
Posts: 16
Joined: Tue Mar 18, 2008 1:19 am

Re: Getting gettext & translation into Allacrost

Postby TangentZ » Thu Mar 20, 2008 5:10 pm

I just want to throw in my 2 cents. :frustrated:

I think this whole gettext stuff should have been done in a branch, since this seems like a major task. When it's completely tested, then it can be merged back into the trunk. :angel:

This way, other people working on the trunk are not affected by the intermittent commits that may or may not break something they're working on. :approve:

FWIW, gettext on Windows seems to be a major PITA to get working. I've given up on it, for now. :cry:
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX
Contact:

Re: Getting gettext & translation into Allacrost

Postby Roots » Thu Mar 20, 2008 8:09 pm

We don't really use branches that much with our source (for better or worse). The only time I recall us branching something was when we saved a copy of an older version of an engine that had undergone a massive upgrade or something. We typically just do all work in the trunk, and occasionally a library dependency is added/changed/removed and the message is propagated that people on their respective systems need to accommodate (we don't have anyone specifically assigned to updating libraries on any of our supported platforms... it's just whoever gets around to fixing it first).


The reason why gettext isn't working well in Windows right now is because we haven't had any active Windows developers on this team for about 4-8 weeks, so things weren't getting updated. I haven't had any problem at all with gettext and Linux...I haven't had to change a single thing in our make files and I don't think anyone else has done so either. :huh:
TangentZ
Contributor
Posts: 16
Joined: Tue Mar 18, 2008 1:19 am

Re: Getting gettext & translation into Allacrost

Postby TangentZ » Thu Mar 20, 2008 8:53 pm

Sorry for going off topic. It's just that gettext on Windows frustrated me a lot. :bang: :angry:

At my work, we use branches on a regular basis. Whenever someone needs to work on a major task, and needs to make many commits that could break other people's code, we would create a branch for them and let them play in that "sandbox". :devil: Bug fixes can go directly into the trunk, though. ;)

Roots wrote:The reason why gettext isn't working well in Windows right now is because we haven't had any active Windows developers on this team for about 4-8 weeks, so things weren't getting updated. I haven't had any problem at all with gettext and Linux...I haven't had to change a single thing in our make files and I don't think anyone else has done so either. :huh:


Understood. It's just a little frustrating for someone new (like me) who is just trying to compile the project. I'll give gettext another chance and try to "conquer" it at a later time. Right now, I'm just studying how everything works, and just want to have a debug version up and running. :angel:
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA
Contact:

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Sun Apr 20, 2008 4:39 am

I've been looking into getting gettext into the configure and make setup and I have this nagging feeling that by calling the directory txt/ instead of po/ we will no longer be GNU-compliant... :shrug:
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX
Contact:

Re: Getting gettext & translation into Allacrost

Postby Roots » Fri Dec 05, 2008 9:32 pm

Just stumbled upon this blog post at work and wanted to link it here:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
http://www.joelonsoftware.com/articles/Unicode.html
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA
Contact:

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Sat Dec 06, 2008 10:32 pm

That was a good read.
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Tue Dec 09, 2008 11:38 am

gorzuate wrote:Thanks Linds :)

Linds told me that converting the rendering code to support UTF-8 would be a pain in the butt, since UTF-8 is a variable-byte encoding scheme. Gettext can be configured to use UTF-16 easily enough. Our ustring class essentially implements UTF-16 as well.

What I'm still perplexed about is why my test string, which was hard-coded into the C++ code and so didn't use Lua, wasn't printed out correctly when I had configured gettext to use UTF-16. Either I missed something when setting gettext up, or there's something wrong with our ustring class. What do you think, Linds?

Reply to the programmer roll call thread.

Well, I'm not sure exactly why. What does gettext return? What's the function call path from gettext to where you were using DrawText to draw it?

If it's unsigned short* all the way, I'm confused, but if we're casting or converting to std::strings or ustrings in the middle, I'd be inclined to take a closer look...
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA
Contact:

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Tue Dec 09, 2008 6:37 pm

We have a Translate() function in utils.h/cpp. Any strings we wish to be translatable to other languages are wrapped in a call to this function. I think it takes a std::string as input. Anyway, this function then invokes gettext by calling a function gettext(). gettext() takes as input a const char *, and returns the translated string as a char *. The Translate() function then takes this return value from gettext(), converts it to a ustring, and returns it. So yes, there are conversions between ustrings and char * taking place.
Roots
Dictator
Posts: 8666
Joined: Wed Jun 16, 2004 6:07 pm
Location: Austin TX
Contact:

Re: Getting gettext & translation into Allacrost

Postby Roots » Tue Dec 09, 2008 6:47 pm

So regardless of the encoding scheme we tell gettext to use, it returns a zero-terminated string of bytes (chars). Then it is up to us to say "oh yeah, we told gettext to use this encoding scheme, which means we now need to take this array of bytes and cast it into an array of double bytes (or whatever data format we use)". Is that right?


It sounds to me that what we should be doing is the following when retrieving a translated string.

  1. Call the Translate() function with the source string (English std::string)
  2. Translate() creates a new ustring object and gives it the char* pointer returned by gettext
  3. The ustring class converts the char* data to whatever data format is easiest for the video engine to process when rendering text
  4. The text rendering component of the video engine extracts the translated and formatted data from the ustring to render the string to draw

Are we doing all these steps? It sounds like the 1st and 2nd ones we are, but I'm unsure if the 3rd and 4th are being done properly, which may be the cause of this problem. :think:
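For what step 3 could look like, here is a minimal sketch under stated assumptions: it supposes gettext has been told to hand back UTF-8 bytes (GNU gettext offers bind_textdomain_codeset() for that), and Utf8ToUtf16() is a hypothetical helper name, not something that exists in the Allacrost code. It decodes the zero-terminated byte string into 16-bit units of the kind a ustring-like class could store. It does no validation, so malformed input is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode a NUL-terminated UTF-8 byte string into UTF-16 code units.
// Minimal sketch: assumes well-formed input and performs no validation.
std::vector<std::uint16_t> Utf8ToUtf16(const char* bytes) {
    assert(bytes != nullptr);
    std::vector<std::uint16_t> out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(bytes);
    while (*p != 0) {
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes that follow the lead byte
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        ++p;
        // Each continuation byte contributes its low 6 bits.
        for (int i = 0; i < extra && *p != 0; ++i, ++p) {
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<std::uint16_t>(cp));
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The design choice here is to decode rather than cast: the bytes coming back from gettext are reinterpreted by meaning, one code point at a time, instead of being relabelled as a different integer type.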
Linds
Developer
Posts: 145
Joined: Tue Jan 09, 2007 9:21 am
Location: Sydney, Australia

Re: Getting gettext & translation into Allacrost

Postby Linds » Wed Dec 10, 2008 12:53 am

Sounds to me like Translate should be the following:

Code:

hoa_utils::ustring Translate(const char *text)
{
  return hoa_utils::ustring(static_cast<uint16*>(gettext(text)));
}
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA
Contact:

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Wed Dec 10, 2008 8:11 am

static_cast does not compile. reinterpret_cast does, but then I get a translated string of all question marks on the command-line, and all boxes in the GUI.
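One possible reason the reinterpret_cast version shows garbage, sketched under the assumption that gettext was actually returning UTF-8 bytes at that point: treating pairs of bytes as 16-bit units does not decode anything, it just glues neighbouring bytes together. FirstTwoBytesAsUnit() is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read the first two bytes of a buffer as one 16-bit unit, which is what
// viewing a char* through a uint16* effectively does. memcpy is used so the
// sketch avoids the alignment and aliasing problems of the raw cast.
std::uint16_t FirstTwoBytesAsUnit(const char* bytes) {
    assert(bytes != nullptr);
    std::uint16_t unit = 0;
    std::memcpy(&unit, bytes, sizeof unit);
    return unit;
}
```

For the UTF-8 encoding of "é" (bytes 0xC3 0xA9) this gives 0xA9C3 or 0xC3A9 depending on the machine's byte order, never the code point 0x00E9, and even plain ASCII such as "Ai" collapses two letters into a single unrelated unit.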

Ok, so here's the deal.

In order to test this thing, I want to display a translated string in the command-line and also in the Allacrost GUI.

I'm using std::cerr to print it out on the command-line, so the string cannot be a ustring since std::cerr doesn't know what that is. Using the Translate function as it currently exists (i.e. returning a std::string), I printed out:

Code:

cerr << Translate("blahblah") << endl;

and it displayed correctly with accents.

(So now we know gettext is working correctly. We also know that the return value from gettext (char*) converted to a plain old std::string contains everything we need in it already. Just printing out a std::string gives us accented characters! ... And this is why I find the need for the ustring class completely bewildering. Let's just use std::string everywhere! gettext can handle the rest for us.)

To put the same string in the GUI, I used an AddOption function for a BootMenu, which takes as text a ustring. So, when I called this AddOption function, I did it with the text argument as such: MakeUnicodeString(Translate("blahblah")). This showed up in the GUI with square boxes where the accented characters should have been, which is why I thought the text rendering code was messed up.

But your post above, Linds, caused me to think some more, and so I went back to the command-line and printed out:

Code:

cerr << MakeStandardString(MakeUnicodeString(Translate("blahblah"))) << endl;

and ended up with question marks where the accented characters should have been.
If I comment out the following code in the MakeStandardString() function:

Code:

if(curr_char > 0xff)
   strbuff[c] = '?';
else

then I can get accented characters on the command-line.

So, at this point we know ... what? That converting a translated std::string to a ustring and back to std::string still gives us accented characters, meaning nothing was lost in the conversion. Therefore, MakeUnicodeString and MakeStandardString (with the commented out code) are fine and have no problems.

I can only conclude from this that the text renderer has issues with the ustring class.
Let's get rid of ustring and use std::string everywhere.
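A hedged guess at why the `curr_char > 0xff` check fires at all, since a single byte should never exceed 0xff: if the conversion to 16-bit units widens each char directly and plain char is signed on the platform, bytes of 0x80 and above are negative and sign-extend. This is only an assumption about how MakeUnicodeString behaves; its source is not shown in this thread, and both function names below are hypothetical.

```cpp
#include <cstdint>

// Widen one byte the naive way. Where the byte is held in a signed char,
// 0xC3 is a negative value and sign-extends to 0xFFC3.
std::uint16_t WidenSigned(signed char c) {
    return static_cast<std::uint16_t>(c);
}

// Widen through unsigned char first, so 0xC3 stays 0x00C3 regardless of
// whether plain char is signed on this platform.
std::uint16_t WidenUnsigned(char c) {
    return static_cast<std::uint16_t>(static_cast<unsigned char>(c));
}
```

If this is what is happening, the stored units would be values like 0xFFC3 rather than real code points, which would be consistent with both the question marks on the command line and the boxes in the GUI; it would point at the conversion step rather than at ustring or the renderer as such.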
rujasu
Developer
Posts: 758
Joined: Sun Feb 25, 2007 5:40 am
Location: Maryland, USA

Re: Getting gettext & translation into Allacrost

Postby rujasu » Thu Dec 11, 2008 12:23 am

gorzuate wrote:(So now we know gettext is working correctly. We also know that the return value from gettext (char*) converted to a plain old std::string contains everything we need in it already. Just printing out a std::string gives us accented characters! ... And this is why I find the need for the ustring class completely bewildering. Let's just use std::string everywhere! gettext can handle the rest for us.)


I believe the link that Roots posted the other day mentions this, but IIRC, ASCII supports accented characters (the types used in Spanish and French) on most systems, but does not support characters from other languages. That's what Unicode/UTF8/UTF16 is needed for.

Feel free to correct me if I'm wrong though.
gorzuate
Developer
Posts: 2575
Joined: Thu Jun 17, 2004 3:03 am
Location: Hermosa Beach, CA
Contact:

Re: Getting gettext & translation into Allacrost

Postby gorzuate » Thu Dec 11, 2008 7:15 am

rujasu wrote:
gorzuate wrote:(So now we know gettext is working correctly. We also know that the return value from gettext (char*) converted to a plain old std::string contains everything we need in it already. Just printing out a std::string gives us accented characters! ... And this is why I find the need for the ustring class completely bewildering. Let's just use std::string everywhere! gettext can handle the rest for us.)


I believe the link that Roots posted the other day mentions this, but IIRC, ASCII supports accented characters (the types used in Spanish and French) on most systems, but does not support characters from other languages. That's what Unicode/UTF8/UTF16 is needed for.

Feel free to correct me if I'm wrong though.


gettext returns a translated string of type char* no matter which language you are translating to. Let's say we're translating to Japanese. Does this stuff about ASCII vs Unicode mean that the char* return value of gettext in this case would not be printable by std::cerr because it would contain some funky characters? I guess I thought if you can represent something in a char* string, then it should be printable/useable without any further modifications...or can only ASCII characters be represented in a char* string? If that's the case, how can gettext return a char* string in the first place without losing any of the extra information needed to represent the characters of the new language? :huh:
Drakkoon
Developer
Posts: 173
Joined: Thu Jan 11, 2007 12:54 am
Location: Montréal, Qc

Re: Getting gettext & translation into Allacrost

Postby Drakkoon » Thu Dec 11, 2008 3:37 pm

gorzuate wrote:gettext returns a translated string of type char* no matter which language you are translating to. Let's say we're translating to Japanese. Does this stuff about ASCII vs Unicode mean that the char* return value of gettext in this case would not be printable by std::cerr because it would contain some funky characters? I guess I thought if you can represent something in a char* string, then it should be printable/useable without any further modifications...or can only ASCII characters be represented in a char* string? If that's the case, how can gettext return a char* string in the first place without losing any of the extra information needed to represent the characters of the new language? :huh:


You can represent all you want in a char*, that's what UTF-8 is for.

In short:
ASCII is 7 bits per char, no accented characters.
ANSI is 8 bits per char, accented characters. (Usually encoded in Latin1 or windows-1252, on windows you need to convert to OEM format for accents to show in console)
UTF-8 is 8 to 32 bits per character (the original design allowed sequences of up to 6 bytes, but the current standard caps it at 4), Oriental chars and Klingon supported.

In UTF-8 if you use ASCII chars, the string will be the same as if it was encoded in ASCII. If you use accented characters or chars from another language it will use 2 to 4 chars of 8 bits each _for that character_. Meaning that in UTF-8 a string like "Aimé" will fit in a char[6] 3 for "Aim" 2 for "é" and 1 for "\0". "Été" will also fit in a char[6]. 2 chars for "É", 1 for "t", 2 for "é" and 1 for the "\0".

So gettext will always return a char*, but it might use more than one char to represent a single letter/glyph. That means you can't simply print each char of the string to the screen; you must first check the value of the char, which indicates how many chars make up the glyph. Then you can take the 2, 3, or 4 chars that make it up and convert them to a Unicode code point you can print on the screen.

So, if you print one of those French strings encoded in UTF-8 to std::cerr, you'll get 5 chars, and on a console that isn't expecting UTF-8 each "é" will look something like "@$" instead of just "é" because it's encoded with 2 chars. So yeah, only ASCII chars are guaranteed to print to the console correctly without prior conversion.
Last edited by Drakkoon on Fri Dec 12, 2008 2:56 am, edited 1 time in total.
