Internationalization in Java - Sip of Java

Java applications often need to work with data and users across the world. As national and linguistic boundaries are crossed, our applications need to seamlessly handle the differences between languages, how text is formatted, types of currencies, and more. Luckily, Java provides several SPIs that help with these needs. Let’s look at how Java handles internationalization (i18n)!

Finding the Proper Locale

Much of Java’s internationalization behavior is driven through the java.util.Locale class. Locale implements the IETF BCP 47 standard that codifies the identification and data exchange of human language.

Locale has three constructors:

Locale(String language)
Locale(String language, String country)
Locale(String language, String country, String variant)

The arguments language, country, and variant are defined by the IETF BCP 47 standard to identify human languages. They let you narrow a Locale to a specific dialect of a language as needed, such as US English, new Locale("en", "US"), or British English, new Locale("en", "GB"). For the full list of supported Locales, check the links at the end of the article.
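As a complement to the constructors above, here is a minimal sketch of two other ways to obtain a Locale: from a BCP 47 language tag and through Locale.Builder. The class and method names (LocaleSketch, fromTag, fromParts) are made up for illustration. Note that the public constructors were deprecated in Java 19 in favor of the Locale.of factory methods, so tag-based or builder-based creation may be a safer habit on newer JDKs.

```java
import java.util.Locale;

public class LocaleSketch {
    // Builds a Locale from a BCP 47 language tag such as "en-US".
    static Locale fromTag(String tag) {
        return Locale.forLanguageTag(tag);
    }

    // Builds a Locale piece by piece; the builder rejects ill-formed values.
    static Locale fromParts(String language, String region) {
        return new Locale.Builder()
                .setLanguage(language)
                .setRegion(region)
                .build();
    }

    public static void main(String[] args) {
        Locale us = fromTag("en-US");
        Locale gb = fromParts("en", "GB");

        System.out.println(us.toLanguageTag());   // en-US
        System.out.println(gb.toLanguageTag());   // en-GB
        System.out.println(us.equals(Locale.US)); // true
    }
}
```

Locale.forLanguageTag is lenient and quietly drops parts of a tag it cannot parse, while Locale.Builder throws IllformedLocaleException, so the builder may be the better fit for untrusted input.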

Each of these SPIs offers a no-arg static factory method that creates an instance using the default Locale supplied by the JVM, as well as static factory methods that take a Locale as an argument, along with other arguments in some cases.
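To illustrate the two kinds of factory methods, here is a small sketch using java.text.NumberFormat. The class and helper names (FactoryMethodSketch, formatWithDefault, formatWith) are hypothetical; the same pattern applies to the other classes covered in this article.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class FactoryMethodSketch {
    // No-arg factory: the JVM's default Locale decides the format.
    static String formatWithDefault(double value) {
        return NumberFormat.getInstance().format(value);
    }

    // Locale-taking factory: the caller chooses the Locale explicitly.
    static String formatWith(Locale locale, double value) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;

        // Output of the first line depends on the machine's settings.
        System.out.println(Locale.getDefault() + ": " + formatWithDefault(value));
        System.out.println(formatWith(Locale.US, value));      // 1,234,567.891
        System.out.println(formatWith(Locale.GERMANY, value)); // 1.234.567,891
    }
}
```

Passing the Locale explicitly tends to be the safer choice in server-side code, where the JVM default rarely matches the user being served.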

BreakIterator

The java.text.BreakIterator class provides a Locale sensitive way of finding boundaries in text. This can be done at the character, word, sentence, and line levels. Note that “line” refers to where linebreaks can be inserted for text wrapping, not where a linebreak occurs in the supplied text. Also, a linebreak in the supplied text is considered a sentence terminator. Consider how the text below would be broken up using the different BreakIterator settings:

Java is
great!
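One way to explore the text above is to print the segments each BreakIterator variant produces. The following is a rough sketch; the class name BreakIteratorSketch and the helper segments are invented for this example, and the exact segments may vary with the JDK's locale data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSketch {
    // Collects the pieces of text that lie between consecutive boundaries.
    static List<String> segments(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        int end = iterator.next();
        while (end != BreakIterator.DONE) {
            result.add(text.substring(start, end));
            start = end;
            end = iterator.next();
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Java is\ngreat!";
        Locale locale = Locale.US;

        System.out.println(segments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println(segments(BreakIterator.getWordInstance(locale), text));
        System.out.println(segments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println(segments(BreakIterator.getLineInstance(locale), text));
    }
}
```

Note that the word iterator also reports whitespace and punctuation as their own segments, so callers that only want words usually need to filter the result.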

Collator

The java.text.Collator class provides a Locale sensitive way to compare Strings. Strings can be compared on PRIMARY, SECONDARY, TERTIARY, and IDENTICAL differences. For example, in English, ñ and n would be considered a SECONDARY difference, while in Spanish, it would be PRIMARY. The example below shows the difference in Collator behavior based on Locale and match strength.

Collator EN_COLLATOR =
	Collator.getInstance(Locale.ENGLISH);
// Locale has no SPANISH constant, so build one from its language tag
Collator ES_COLLATOR =
	Collator.getInstance(Locale.forLanguageTag("es"));

EN_COLLATOR.setStrength(Collator.PRIMARY);
ES_COLLATOR.setStrength(Collator.PRIMARY);

//ñ is not a primary difference in English, but it is in Spanish
EN_COLLATOR.compare("nino", "niño");//0
ES_COLLATOR.compare("nino", "niño");//-1

EN_COLLATOR.setStrength(Collator.SECONDARY);
ES_COLLATOR.setStrength(Collator.SECONDARY);

//ñ is a secondary difference in English
EN_COLLATOR.compare("nino", "niño");//-1
ES_COLLATOR.compare("nino", "niño");//-1
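Because Collator implements Comparator, the same rules can drive sorting. Below is a minimal sketch, with the class name CollatorSortSketch, the helper sortFor, and the sample words all chosen for illustration. The strings use \u00f1 escapes for ñ so the example does not depend on source file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollatorSortSketch {
    // Returns a sorted copy of the words using the Locale's collation rules.
    static List<String> sortFor(Locale locale, List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00f1u" is "ñu"
        List<String> words = Arrays.asList("oso", "\u00f1u", "nube");

        // English treats ñ as an accented n, so "ñu" sorts like "nu": [ñu, nube, oso]
        System.out.println(sortFor(Locale.ENGLISH, words));

        // Spanish treats ñ as its own letter after n: [nube, ñu, oso]
        System.out.println(sortFor(Locale.forLanguageTag("es"), words));
    }
}
```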

Date and Number Formatting

The java.text.DateFormat and java.text.NumberFormat classes provide a Locale sensitive way of formatting dates and numbers based on the supplied Locale rules. A frequent source of confusion is how dates should be formatted by country. In the US, dates are typically formatted mm/dd/yyyy, whereas many other countries use dd/mm/yyyy. DateFormat can handle this seamlessly, as shown in the below example:

US_DATE_FORMAT.format(new Date());
"8/15/22"

FRANCE_DATE_FORMAT.format(new Date());
"15/08/2022"

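The snippet above does not show how the two formatters were created. Here is one possible setup, assuming the SHORT style (the original may have used a different one) and a fixed date of August 15, 2022 so the result is repeatable. DateFormatSketch and shortDate are hypothetical names.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DateFormatSketch {
    // Formats the date with the Locale's SHORT date style (an assumed choice).
    static String shortDate(Locale locale, Date date) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        // Fixed date so the output does not depend on when the code runs.
        Date date = new GregorianCalendar(2022, Calendar.AUGUST, 15).getTime();

        // Month comes first for the US, day comes first for France.
        System.out.println(shortDate(Locale.US, date));
        System.out.println(shortDate(Locale.FRANCE, date));
    }
}
```

The exact digits, such as a two-digit versus four-digit year, can differ between JDK versions because the underlying locale data has changed over time.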
Date and Number Symbols

The java.text.DateFormatSymbols and java.text.DecimalFormatSymbols classes provide a Locale sensitive way of retrieving values like weekdays, months, currency symbols, and more.

DateFormatSymbols

In the below example, DateFormatSymbols is used to retrieve the names of the days of the week based on a Locale. Note that the array always runs from Sunday to Saturday, which might not match how the days of the week are typically ordered in that country. The first element is an empty string because the array is meant to be indexed with the java.util.Calendar day constants, which start at Calendar.SUNDAY (1).

DateFormatSymbols.getInstance(US)
	.getWeekdays();
{"","Sunday","Monday","Tuesday", "Wednesday", 
"Thursday","Friday","Saturday"}

DateFormatSymbols.getInstance(MEXICO)
	.getWeekdays();
{"","domingo","lunes","martes","miércoles", 
 "jueves", "viernes","sábado" }

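Since getWeekdays() always starts at Sunday, the Locale's actual first day of the week has to come from elsewhere. One option, sketched below with made-up names (WeekdaySketch, weekdayName, firstDayOfWeekName), is to ask a Locale-specific Calendar and use its answer as the index into the weekday array.

```java
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

public class WeekdaySketch {
    // Looks up a weekday name using a Calendar constant such as Calendar.MONDAY.
    static String weekdayName(Locale locale, int calendarDay) {
        return DateFormatSymbols.getInstance(locale).getWeekdays()[calendarDay];
    }

    // Returns the name of the day that starts the week in the given Locale.
    static String firstDayOfWeekName(Locale locale) {
        int firstDay = Calendar.getInstance(locale).getFirstDayOfWeek();
        return weekdayName(locale, firstDay);
    }

    public static void main(String[] args) {
        System.out.println(weekdayName(Locale.US, Calendar.MONDAY)); // Monday
        System.out.println(firstDayOfWeekName(Locale.US));           // Sunday
        System.out.println(firstDayOfWeekName(Locale.FRANCE));       // lundi
    }
}
```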
DecimalFormatSymbols

In the below example, DecimalFormatSymbols is used to retrieve the symbols used when formatting numbers. This can be particularly helpful when the differences are subtle. France and Germany both use € as their currency symbol, but France uses a space “ ” as a grouping separator (a non-breaking space character rather than the ordinary keyboard space), while Germany uses a period “.”:

DecimalFormatSymbols.getInstance(GERMANY)
	.getCurrencySymbol()
"€"

DecimalFormatSymbols.getInstance(FRANCE)
	.getCurrencySymbol()
"€"

DecimalFormatSymbols.getInstance(GERMANY)
	.getGroupingSeparator()
"."

DecimalFormatSymbols.getInstance(FRANCE)
	.getGroupingSeparator()
" "

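The French grouping separator looks like a plain space when printed, so it can help to inspect its code point. The sketch below does that and also shows the symbols at work in a currency formatter. The names DecimalSymbolsSketch, groupingSeparator, codePoint, and currency are assumptions for this example, and no French code point is asserted because it has varied across JDK releases.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class DecimalSymbolsSketch {
    // Returns the character used to group digits in the given Locale.
    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    // Renders a character as U+XXXX so invisible differences become visible.
    static String codePoint(char c) {
        return String.format(Locale.ROOT, "U+%04X", (int) c);
    }

    // Formats an amount using the Locale's currency conventions.
    static String currency(Locale locale, double amount) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(codePoint(groupingSeparator(Locale.GERMANY))); // U+002E
        System.out.println(codePoint(groupingSeparator(Locale.FRANCE)));  // a space-like character
        System.out.println(currency(Locale.GERMANY, 1234.5));
        System.out.println(currency(Locale.FRANCE, 1234.5));
    }
}
```

This matters when parsing user input: text typed with an ordinary space as the grouping separator may not parse with a French NumberFormat unless it is normalized first.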
java.time Package

Also, be sure to check out the classes in the java.time package, which provide several options for storing and formatting dates with a consistent API.
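As a small taste of java.time, the sketch below formats a LocalDate with a localized DateTimeFormatter. The class name JavaTimeSketch, the helper shortDate, and the choice of FormatStyle.SHORT are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class JavaTimeSketch {
    // Formats a date using the Locale's SHORT date style.
    static String shortDate(Locale locale, LocalDate date) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2022, 8, 15);

        // Month comes first for the US, day comes first for France.
        System.out.println(shortDate(Locale.US, date));
        System.out.println(shortDate(Locale.FRANCE, date));
    }
}
```

Unlike java.text.DateFormat, DateTimeFormatter is immutable and thread-safe, so a formatter can be stored in a constant and shared freely.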

Additional Reading

Happy coding!