Part 2: Content Globalization 101 in Twenty Minutes
Blogger: Craig Roth
Here’s the blog companion for part two of my four-part podcast series on “What IT needs to know about content globalization, localization, and translation.” If you missed part 1, you can find it here.
Audio URL: Download Globalization Part 2 (19 minutes, 6.6MB)
- This part contains background information on globalization that can be handy for people coming up to speed on globalization or looking for rationale for the analysis, but can be skipped if not needed
- Definitions
- Content globalization (numeronym: g11n) is a strategy to convey information in the cultural, linguistic, and business context of target audiences.
- Localization (L10n) is adapting content to a particular locale
- Internationalization (i18n) is the work you do to enable localization (prep work for localization)
- Transliteration (t13n) is using characters from one language to represent another
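The numeronyms in the definitions above all follow one pattern: first letter, the count of letters in between, last letter. A minimal sketch of that rule (the function name `numeronym` is just an illustrative choice, not from any library):

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    word = word.lower()
    if len(word) < 4:
        # Too short to be worth abbreviating; return unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


terms = ["globalization", "localization", "internationalization", "transliteration"]
abbreviations = {term: numeronym(term) for term in terms}

for term, short in abbreviations.items():
    print(f"{term} -> {short}")
```

By convention localization is usually written with a capital L (L10n) so the lowercase "l" is not mistaken for a digit; the sketch ignores that nicety.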
- Resources to create a statement of trends for a particular business
- Internet World Stats to show the prevalence of the languages you want to translate into or the growth or overall percentage of users in a particular language
- English still #1 with 365 million users, 184 million for #2 Chinese
- Growth trends: between 2000 and 2007 for Arabic (+941%), Portuguese (+525%), and Chinese (+470%)
- Internet usage outside of North America dwarfs usage within it: 20% North America, 37% Asia, 27% Europe
- Ethnologue to show number of countries speaking a language or what weight a content-creating organization should place on localization issues given the makeup of a specific country being targeted
- The Ethnologue can also show linguistic diversity in countries to demonstrate how heterogeneous a country is. It lists the probability that any two people in the country chosen at random will speak different native languages
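The "two people chosen at random" probability described above can be worked out from native-speaker shares. A minimal sketch, assuming the measure is the usual diversity index of one minus the sum of squared shares; the speaker counts below are made up for illustration and are not Ethnologue data:

```python
def diversity_index(speaker_counts):
    """Probability that two randomly chosen people have different native languages.

    speaker_counts: native-speaker counts (or shares) per language.
    Assumes sampling with replacement, i.e. 1 - sum(p_i ** 2).
    """
    total = sum(speaker_counts)
    proportions = [count / total for count in speaker_counts]
    return 1.0 - sum(p * p for p in proportions)


# Hypothetical country: three languages with 50%, 30%, and 20% of speakers.
mixed = diversity_index([50, 30, 20])

# Hypothetical country where everyone shares one native language.
uniform = diversity_index([100])

print(round(mixed, 2), uniform)
```

A value near 0 suggests one translation may cover the whole country; a value near 1 suggests a single language per country is a poor assumption.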
- Dialects
- The Portuguese spoken in Brazil differs considerably from the Portuguese spoken in Portugal; using the wrong one can lead to misunderstandings or make readers feel the author doesn’t truly understand them
- Corporate English is a dialect
- See sample courses from CorporateEnglish and ELT
- Code Internationalization
- Wikipedia: “The current prevailing practice is for applications to place text in resource strings which are loaded during program execution as needed. These strings, stored in resource files, are relatively easy to translate. Programs are often built to reference resource libraries depending on the selected locale data.”
- As with content globalization, localization concerns should be moved from the end of the planning process to the beginning, and businesses need to treat globalization as a first-order imperative
- Examples of system development guidelines
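The resource-string practice in the Wikipedia quote above can be sketched in a few lines. This is a simplified illustration, not any particular framework's API: the `RESOURCES` table stands in for resource files on disk, and the names `lookup` and `default_locale` are hypothetical.

```python
# Stand-in for per-locale resource files; real applications would load these from disk.
RESOURCES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "pt": {"greeting": "Olá", "farewell": "Adeus"},
    "pt-BR": {"greeting": "Oi"},  # Brazilian override for one string only
}


def lookup(key, locale, default_locale="en"):
    """Return the string for key, falling back from e.g. pt-BR to pt to en."""
    chain = [locale]
    if "-" in locale:
        chain.append(locale.split("-")[0])  # language without the region
    chain.append(default_locale)
    for candidate in chain:
        table = RESOURCES.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(key)


print(lookup("greeting", "pt-BR"))  # region-specific string
print(lookup("farewell", "pt-BR"))  # falls back to generic Portuguese
print(lookup("greeting", "fr"))     # no French table, falls back to English
```

Because the code only ever references keys such as `greeting`, translators can work on the tables without touching program logic, which is the point of doing this work up front.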
- Standards
- Darwin Information Typing Architecture (DITA)
- Created for complex documents in multiple formats
- Not specifically for globalization, but there’s a translation subcommittee
- DITA enforces a strict structure on content which makes it easier to translate, but interferes with the writing process
- Probably fine for technical manuals, but not artful content like ads or brochures
- DITA requires customization before use
- Internationalization Tag Set
- Defines the metadata needed for content globalization
- Examples: tags to define the source language, parts that shouldn’t be translated, targeting for a right-to-left language, notes to pass to the translator
- Still evolving
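To make the tag examples above concrete, here is a small sketch that reads that kind of metadata out of an XML fragment using Python's standard library. The sample document is invented for illustration and is not a complete or validated ITS document; the attribute names (`its:translate`, `its:locNote`, `its:dir`) and the namespace URI are the ones I understand the W3C specification to use, so check them against the spec before relying on them.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Invented sample content carrying localization metadata.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en">
  <p>Press <code its:translate="no">Ctrl+S</code> to save.</p>
  <p its:locNote="Keep this short; it appears on a button.">Save</p>
  <p xml:lang="ar" its:dir="rtl">مرحبا</p>
</doc>"""

root = ET.fromstring(SAMPLE)

# Source language declared on the root element.
source_language = root.get(f"{{{XML_NS}}}lang")

# Text that should be passed through untranslated.
do_not_translate = [
    el.text for el in root.iter() if el.get(f"{{{ITS_NS}}}translate") == "no"
]

# Notes intended for the translator.
translator_notes = [
    el.get(f"{{{ITS_NS}}}locNote")
    for el in root.iter()
    if el.get(f"{{{ITS_NS}}}locNote") is not None
]

# Elements flagged as right-to-left.
rtl_text = [el.text for el in root.iter() if el.get(f"{{{ITS_NS}}}dir") == "rtl"]

print(source_language, do_not_translate, translator_notes, rtl_text)
```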
- Translation Memory eXchange (TMX)
- Translation memory is used to keep a record of how certain phrases have been translated in the past so those translations can be used again to save time, leverage the best translations, and ensure consistency
- Owned by LISA – the Localization Industry Standards Association – which is trying to unify many standards for translation
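The idea behind a translation memory can be sketched as a lookup table with exact and near ("fuzzy") matching. This is a toy illustration, not the TMX format itself (TMX is an XML interchange format); the `suggest` function, the 0.75 threshold, and the sample translations are all assumptions made for the example.

```python
import difflib

# Toy memory: source segment -> previously approved translation (illustrative data).
translation_memory = {
    "Save your changes": "Guarde sus cambios",
    "Delete this file": "Elimine este archivo",
}


def suggest(segment, memory, threshold=0.75):
    """Return (translation, score): exact hit, best fuzzy hit, or (None, best score)."""
    if segment in memory:
        return memory[segment], 1.0
    best_source, best_score = None, 0.0
    for source in memory:
        # Similarity ratio between 0.0 and 1.0 based on matching character runs.
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return memory[best_source], best_score
    return None, best_score


print(suggest("Save your changes", translation_memory))         # exact match
print(suggest("Save your change", translation_memory))          # close enough to reuse
print(suggest("Open the printer settings", translation_memory)) # nothing useful
```

In practice a fuzzy match would be shown to a human translator for review rather than reused blindly, since a near-identical source sentence can still need a different translation.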
- Unicode
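- Unicode is a universal character set that assigns a number (code point) to each character; encodings such as UTF-8 and UTF-16 represent those code points in memory and on disk. It is often described as a 16-bit code, but many characters need more than 16 bits
- Not every language has been encoded, but most of those in commercial use have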
- Most major software is now Unicode compliant, although that doesn’t mean every language is automatically supported and there are different flavors of Unicode
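The "different flavors" noted above are largely a matter of encoding: the same character takes a different number of bytes in UTF-8 than in UTF-16. A small sketch using only built-in string methods (the sample characters are arbitrary picks):

```python
samples = ["A", "é", "中", "😀"]

sizes = {}
for ch in samples:
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = len(ch.encode("utf-8"))
    # "utf-16-le" avoids counting the 2-byte byte-order mark Python adds for "utf-16".
    utf16_bytes = len(ch.encode("utf-16-le"))
    sizes[ch] = (code_point, utf8_bytes, utf16_bytes)
    print(f"{ch!r}: {code_point}, UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")
```

The emoji needs four bytes in both encodings, which is why software that assumes every character fits in a single 16-bit unit can still mishandle some text even when it claims Unicode support.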
- The printed report also covers ISO 8601 Date Formats and XLIFF

