Translating with CAT tools

#1 Post by OrsonDeWitt » Fri Jun 26, 2020 4:18 pm

Hey there,

So my translator wanted me to send her the files for translation, and I thought it would be a breeze because Ren'Py has a built-in solution for generating translation strings. However, it wasn't all that easy in the end, because what it gave me was this:

# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player ""


Needless to say, this is a terribly inconvenient format for anyone who is used to working in CAT tools (Trados, memoQ, etc.). So I am now wondering whether I could just give her the original files, with the regular indentation and the regular structure. If I did this, how would I then go about adding the translated script files to the game so that language switching works seamlessly? Thank you.

#2 Post by Jackkel Dragon » Fri Jun 26, 2020 7:57 pm

I don't know anything about CAT tools, but Ren'Py's translation system works by adding the translation for each string to the generated files in the /tl/[language] folder. So it would look like:

Code:

# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player "Эй, ты.  Что ты делаешь на моей лодке?"

Then, when the game detects the language setting has changed, it replaces everything it finds with the new language's strings. Missing strings use the base language. Some displayables have issues with translating mid-line, but the next line onward should all work.
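
Since missing strings fall back to the base language, it can help to check which blocks are still empty before shipping a translation. The following is only a rough sketch, assuming the generated file layout shown above; the names `BLOCK`, `EMPTY` and `untranslated_ids` and the second sample block are made up for illustration.

```python
import re

# Assumption: blocks look like the generated files shown above, i.e.
# "translate <language> <id>:" followed by lines indented with spaces.
BLOCK = re.compile(
    r'^translate (?P<lang>\w+) (?P<id>\w+):\n(?P<body>(?:[ \t]+.*\n?|\n)*)',
    re.MULTILINE,
)
# An indented, non-comment line whose quoted string is still empty ("").
EMPTY = re.compile(r'^[ \t]+(?:[^#\s"][^"\n]*)?""\s*$', re.MULTILINE)


def untranslated_ids(text):
    """Return the ids of translate blocks that still contain an empty string."""
    return [m.group("id") for m in BLOCK.finditer(text) if EMPTY.search(m.group("body"))]


sample = '''# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player "Эй, ты.  Что ты делаешь на моей лодке?"

# game/scripts/boat.rpy:14 (hypothetical second block, not from the thread)
translate russian boat_00000000:

    # player "Answer me."
    player ""
'''

print(untranslated_ids(sample))
```

Only the hypothetical second block is reported here, because the first one already has its Russian text filled in.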

#3 Post by philat » Sat Jun 27, 2020 5:53 am

Not guaranteeing that this is what you're looking for, but probably worth a look. *shrug* viewtopic.php?f=32&t=55318

#4 Post by OrsonDeWitt » Sat Jun 27, 2020 7:06 am

philat wrote:
Sat Jun 27, 2020 5:53 am
Not guaranteeing that this is what you're looking for, but probably worth a look. *shrug* viewtopic.php?f=32&t=55318
It works with some work-arounds. Thanks!

#5 Post by Fede » Mon Jul 06, 2020 11:38 am

I've recently translated a Ren'Py game (One Night Stand) with a CAT tool (memoQ), so my experience may help you.

After receiving the source text in RPY files in the format described above, I wrote a small piece of Python code that parses the RPY files with regular expressions, extracts all the data and saves it in TSV (tab-separated values) files, which were easy to import into memoQ. I also wrote some more code to put the translations from memoQ's TSV export back into the original RPY files.

This is the part that extracts the source from RPY and saves data into TSV:

Code:

#!/usr/bin/env python
# extracts source text and translated text from Ren'Py files and creates tsv files

import glob # list .rpy files in directory
import re # regex

# regex to extract ids
regex_id = re.compile(r'^translate italian\s*(?P<id>.+):(?P<id_content>[\s\S]+?)(?=translate|\Z)', re.MULTILINE)

# regex to extract speaker/type and source text
regex_source = re.compile(r'^    (# |old)(?P<type>.*?)\s*\"(?P<text>.*)\"', re.MULTILINE)

# regex to extract speaker/type and translated text
regex_target = re.compile(r'^    (?P<type>extend|new|[^\s]{1,2})\s*\"(?P<text>.*)\"', re.MULTILINE)

# open every .rpy file in the current directory and convert it
for path in glob.glob("*.rpy"):
    # open .rpy input file; encoding = "UTF-8" due to cyrillic and chinese characters
    with open(path, "r", encoding="UTF-8") as rpy_file:
        # read the whole input file ("content" avoids shadowing the built-in input)
        content = rpy_file.read()
    # create and open .rpy.tsv output file
    with open(path + ".tsv", "w", encoding="UTF-8") as tsv_file:
        # write the header row
        tsv_file.write("ID\tType\tSource\tTarget")
        # search for every translate block (id + content) using regex
        for block in regex_id.finditer(content):
            body = block.group("id_content")
            # speaker/type and source text of every source line in the block
            sources = [(m.group("type"), m.group("text")) for m in regex_source.finditer(body)]
            # translated text of every target line in the block
            targets = [m.group("text") for m in regex_target.finditer(body)]
            # write a newline followed by id, speaker, source text and translated text, separated by tabs
            # note: zip stops at the shorter list, so unmatched lines are silently dropped
            for (kind, source), target in zip(sources, targets):
                tsv_file.write("\n{}\t{}\t{}\t{}".format(block.group("id"), kind, source, target))

The TSV output file contains the following columns:
column A: ID
column B: speaker/type
column C: source text
column D: target text (the translation)

In your example

Code:

# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player ""

the values would be:
ID = boat_c87765bd
speaker/type = player
source = Hey, you. What are you doing on my boat?
target = (empty)
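
The reverse step (writing the translations from the TSV back into the RPY files) is mentioned above but not shown. The following is only a sketch of how it could look, not the code actually used for that game. It assumes the TSV columns listed above, one-word speaker names, and target text that is already escaped the way Ren'Py expects; `regex_block`, `load_targets` and `apply_translations` are names made up for this illustration.

```python
import csv
import io
import re

# One "translate <language> <id>:" block, up to the next block or the end of the text.
regex_block = re.compile(
    r'^translate \w+ (?P<id>\w+):(?P<body>[\s\S]+?)(?=^translate |\Z)', re.MULTILINE)
# A target line: four spaces, an optional one-word speaker (or "new"), then a quoted string.
# Lines starting with "# " or "old" are source lines and are left alone.
regex_target = re.compile(
    r'^(?P<head>    (?!# |old\b)[^\s"]*\s*")(?P<text>.*)(?P<tail>")', re.MULTILINE)


def load_targets(tsv_text):
    """Map each ID to the list of its Target cells, in file order."""
    targets = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        targets.setdefault(row["ID"], []).append(row["Target"] or "")
    return targets


def apply_translations(rpy_text, targets):
    """Fill the target lines of every block with the translations for its ID."""
    def fill_block(block):
        pending = iter(targets.get(block.group("id"), []))

        def fill_line(line):
            text = next(pending, None)
            if not text:  # no translation available: keep the line unchanged
                return line.group(0)
            return line.group("head") + text + line.group("tail")

        header = block.group(0)[:block.start("body") - block.start()]
        return header + regex_target.sub(fill_line, block.group("body"))

    return regex_block.sub(fill_block, rpy_text)


rpy = '''# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player ""
'''
tsv = ("ID\tType\tSource\tTarget\n"
       "boat_c87765bd\tplayer\tHey, you. What are you doing on my boat?\t"
       "Эй, ты. Что ты делаешь на моей лодке?")

print(apply_translations(rpy, load_targets(tsv)))
```

Translations are matched to target lines by ID and by order within the block, which mirrors how the extraction script writes its rows. Speakers with image attributes (more than one word before the quote) would need a looser pattern.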


NOTE: The regex I used worked for that specific game and will need to be adjusted. And perhaps someone who's actually good at programming could improve on it.

Parts that need editing:

Code:

# regex to extract ids
regex_id = re.compile(r'^translate italian\s*(?P<id>.+):(?P<id_content>[\s\S]+?)(?=translate|\Z)', re.MULTILINE)

This is specific to Italian, as it searches for "translate italian". I think I can rework this part to make it work with any language.
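
One possible way to make the pattern language-independent is sketched below. It assumes that language names consist only of word characters, and it adds a named group `lang` that is not in the original script. Because that extra group shifts the numbered groups, the rest of the script would have to use the names `id` and `id_content` rather than `group(1)` and `group(2)`. The lookahead is also anchored to the start of a line here, so that dialogue containing the word "translate" does not cut a block short; that is a separate tweak, not part of the original.

```python
import re

# Variant of the id regex that accepts any language name (assumed to be word characters).
regex_id = re.compile(
    r'^translate (?P<lang>\w+)\s+(?P<id>.+):(?P<id_content>[\s\S]+?)(?=^translate |\Z)',
    re.MULTILINE,
)

sample = '''# game/scripts/boat.rpy:13
translate russian boat_c87765bd:

    # player "Hey, you. What are you doing on my boat?"
    player ""
'''

match = regex_id.search(sample)
print(match.group("lang"), match.group("id"))
```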

Code:

# regex to extract speaker/type and source text
regex_source = re.compile(r'^    (# |old)(?P<type>.*?)\s*\"(?P<text>.*)\"', re.MULTILINE)

This is used to determine which lines contain the source; any line starting with four spaces followed by "# " or "old" is considered a source line.

Code:

# regex to extract speaker/type and translated text
regex_target = re.compile(r'^    (?P<type>extend|new|[^\s]{1,2})\s*\"(?P<text>.*)\"', re.MULTILINE)

This is used to determine which lines contain the target; in my specific case, it was lines starting with four white spaces + "extend", "new" or words of up to 2 letters without spaces ([^\s]{1,2}).

In ONS (One Night Stand) the player character is "y", while in the example above it's "player". The code therefore needs to be edited so that a line starting with four spaces + "player" is also recognised as a target line. For example, the last regex could be changed to

Code:

# regex to extract speaker/type and translated text
regex_target = re.compile(r'^    (?P<type>player|extend|new|[^\s]{1,2})\s*\"(?P<text>.*)\"', re.MULTILINE)

or similar.
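
Instead of listing every speaker name, the target regex could also accept any single word and only exclude the source lines. This is an untested-against-real-games sketch, assuming one-word speaker names and the four-space indentation used above; the "old"/"new" strings in the sample are made up for illustration.

```python
import re

# Target line: four spaces, then anything that is not a "# " comment or an "old" line,
# an optional one-word speaker/type, and a quoted string.
regex_target = re.compile(
    r'^    (?!# |old\b)(?P<type>[^\s"]*)\s*"(?P<text>.*)"',
    re.MULTILINE,
)

body = '''
    # player "Hey, you. What are you doing on my boat?"
    player "Эй, ты.  Что ты делаешь на моей лодке?"

    old "Start"
    new "Начать"
'''

found = regex_target.findall(body)
print(found)
```

With this pattern the speaker name no longer has to be known in advance, at the cost of also matching any other indented statement that ends in a quoted string.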

If you or anyone else thinks this could be useful, I can adapt it to work for them; I just need the RPY files generated by Ren'Py.
