Internationalization in Java - Sip of Java
Billy Korando on August 17, 2022
Java applications often need to work with data and users across the world. As applications cross national and linguistic boundaries, they need to seamlessly handle differences in language, text formatting, currency, and more. Luckily, Java provides several locale-sensitive APIs that help with these needs. Let's look at how Java handles internationalization (i18n)!
Finding the Proper Locale
Much of Java's internationalization behavior is driven through the java.util.Locale class. Locale implements the IETF BCP 47 standard, which codifies how human languages are identified and exchanged as data.
Locale has three constructors:
Locale(String language)
Locale(String language, String country)
Locale(String language, String country, String variant)
The language, country, and variant arguments are defined by the IETF BCP 47 standard to identify human languages. This allows refining a language to a specific regional variant as needed, such as US English, new Locale("en", "US"), or British English, new Locale("en", "GB"). For the full list of supported Locales, check the links at the end of the article.
Each of the locale-sensitive classes covered below provides a no-arg static factory method that creates an instance for the JVM's default Locale, along with overloaded factory methods that take a Locale argument (and, in some cases, other arguments).
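To make the constructors and factory methods above concrete, here is a minimal sketch. The class LocaleDemo and its describe helper are hypothetical names chosen for illustration; the Locale and NumberFormat calls are standard JDK API. What the no-arg factory prints depends on the default locale of the JVM you run it on.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class; only the JDK calls inside it are real API.
public class LocaleDemo {

    // Hypothetical helper: a locale's BCP 47 tag next to its English display name.
    static String describe(Locale locale) {
        return locale.toLanguageTag() + " = " + locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // The constructor forms listed above (deprecated since JDK 19 in
        // favor of Locale.of, but still available).
        Locale usEnglish = new Locale("en", "US");
        Locale britishEnglish = new Locale("en", "GB");
        System.out.println(describe(usEnglish));
        System.out.println(describe(britishEnglish));

        // A Locale can also be built from a BCP 47 language tag.
        System.out.println(Locale.forLanguageTag("en-GB").equals(britishEnglish));

        // The no-arg factory method uses the JVM's default locale...
        System.out.println(Locale.getDefault() + ": "
                + NumberFormat.getInstance().format(1234567.5));
        // ...while the overload taking a Locale ignores the default.
        System.out.println(britishEnglish + ": "
                + NumberFormat.getInstance(britishEnglish).format(1234567.5));
    }
}
```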
BreakIterator
The java.text.BreakIterator class provides a Locale-sensitive way of finding boundaries in text at the character, word, sentence, and line levels. Note that "line" refers to where a line break may be inserted when wrapping text, not where a line break already occurs in the supplied text; a line break in the supplied text is treated as a sentence terminator. The short text below is useful for comparing how the different BreakIterator settings divide it up:
Java is
great!
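One way to see those settings side by side is to walk each kind of BreakIterator over the sample text and collect the pieces between consecutive boundaries. This is a sketch; BreakIteratorDemo, segments, words, and show are hypothetical names, and filtering "words" by a leading letter is a simple heuristic rather than something BreakIterator does itself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class built on the standard BreakIterator API.
public class BreakIteratorDemo {

    // Hypothetical helper: the text between each pair of consecutive boundaries.
    static List<String> segments(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    // Hypothetical helper: word segments that start with a letter, which
    // drops the whitespace and punctuation segments.
    static List<String> words(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        for (String s : segments(BreakIterator.getWordInstance(locale), text)) {
            if (Character.isLetter(s.codePointAt(0))) {
                words.add(s);
            }
        }
        return words;
    }

    // Prints segments with "\n" escaped so each list fits on one line.
    static void show(String label, List<String> segments) {
        System.out.println(label + ": " + segments.toString().replace("\n", "\\n"));
    }

    public static void main(String[] args) {
        String text = "Java is\ngreat!";
        Locale locale = Locale.US;
        show("character", segments(BreakIterator.getCharacterInstance(locale), text));
        show("word", segments(BreakIterator.getWordInstance(locale), text));
        show("sentence", segments(BreakIterator.getSentenceInstance(locale), text));
        show("line", segments(BreakIterator.getLineInstance(locale), text));
        System.out.println("words only: " + words(text, locale));
    }
}
```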
Collator
The java.text.Collator class provides a Locale-sensitive way to compare Strings. Collator can compare strings at four strengths: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. Typically, a PRIMARY difference is between different base letters, a SECONDARY difference is between accented forms of the same letter, and a TERTIARY difference is one of case. For example, in English, ñ and n differ only at the SECONDARY level, while in Spanish they differ at the PRIMARY level. The example below shows how Collator behavior changes based on Locale and strength.
import java.text.Collator;
import java.util.Locale;

// Locale has no SPANISH constant, so it is created from a language code
Locale SPANISH = new Locale("es");
Collator EN_COLLATOR =
    Collator.getInstance(Locale.ENGLISH);
Collator ES_COLLATOR =
    Collator.getInstance(SPANISH);
EN_COLLATOR.setStrength(Collator.PRIMARY);
ES_COLLATOR.setStrength(Collator.PRIMARY);
// ñ is not a primary difference in English, but it is in Spanish
EN_COLLATOR.compare("nino", "niño"); // 0
ES_COLLATOR.compare("nino", "niño"); // -1
EN_COLLATOR.setStrength(Collator.SECONDARY);
ES_COLLATOR.setStrength(Collator.SECONDARY);
// ñ is a secondary difference in English
EN_COLLATOR.compare("nino", "niño"); // -1
ES_COLLATOR.compare("nino", "niño"); // -1
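Because Collator implements Comparator, the same difference shows up when sorting. The sketch below contrasts plain String.compareTo, which orders by UTF-16 code unit values, with the English and Spanish collators at their default TERTIARY strength. CollatorSortDemo, sortedFor, and sortedByCodeUnit are hypothetical names, and the word list is just an illustrative sample; ñ is written as \u00f1 so the source compiles the same under any file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; Collator is used directly as a Comparator.
public class CollatorSortDemo {

    // Hypothetical helper: a sorted copy using the locale's Collator.
    static List<String> sortedFor(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    // Hypothetical helper: a sorted copy by raw UTF-16 code unit values.
    static List<String> sortedByCodeUnit(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(String::compareTo);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00f1u" is "ñu"
        List<String> words = List.of("oso", "\u00f1u", "Zorro", "nube");
        System.out.println("code units: " + sortedByCodeUnit(words));
        System.out.println("English:    " + sortedFor(words, Locale.ENGLISH));
        System.out.println("Spanish:    " + sortedFor(words, new Locale("es")));
    }
}
```

Code-unit order puts the uppercase "Zorro" first and "ñu" last, which is rarely what a user expects; the Spanish collator instead places ñ as its own letter between n and o.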
Date and Number Formatting
The java.text.DateFormat and java.text.NumberFormat classes provide a Locale-sensitive way of formatting dates and numbers according to the rules of the supplied Locale. A frequent source of confusion is how dates are formatted in different countries: in the US, dates are typically written month/day/year (mm/dd/yyyy), whereas many other countries use day/month/year (dd/mm/yyyy). DateFormat handles this seamlessly, as shown in the example below:
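DateFormat US_DATE_FORMAT =
    DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
DateFormat FRANCE_DATE_FORMAT =
    DateFormat.getDateInstance(DateFormat.SHORT, Locale.FRANCE);
// Outputs shown for August 15, 2022
US_DATE_FORMAT.format(new Date());     // "8/15/22"
FRANCE_DATE_FORMAT.format(new Date()); // "15/08/2022"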
Date and Number Symbols
The java.text.DateFormatSymbols and java.text.DecimalFormatSymbols classes provide a Locale-sensitive way of retrieving values like weekday and month names, currency symbols, and more.
DateFormatSymbols
In the example below, DateFormatSymbols retrieves the names of the days of the week for a given Locale. Note, though, that the array returned by getWeekdays() is indexed by the Calendar constants SUNDAY (1) through SATURDAY (7), leaving index 0 empty, so it always runs from Sunday to Saturday. That order might not match how the days of the week are typically ordered in that country.
DateFormatSymbols.getInstance(Locale.US)
    .getWeekdays();
// {"", "Sunday", "Monday", "Tuesday", "Wednesday",
//  "Thursday", "Friday", "Saturday"}
// Locale has no MEXICO constant, so the locale is built explicitly
DateFormatSymbols.getInstance(new Locale("es", "MX"))
    .getWeekdays();
// {"", "domingo", "lunes", "martes", "miércoles",
//  "jueves", "viernes", "sábado"}
DecimalFormatSymbols
In the example below, DecimalFormatSymbols retrieves the symbols used when formatting numbers, which is particularly helpful when locales differ in subtle ways. France and Germany both use € as their currency symbol, but France uses a no-break space (not the ordinary ASCII space) as its grouping separator, while Germany uses a period “.”:
DecimalFormatSymbols.getInstance(GERMANY)
    .getCurrencySymbol();    // "€"
DecimalFormatSymbols.getInstance(FRANCE)
    .getCurrencySymbol();    // "€"
DecimalFormatSymbols.getInstance(GERMANY)
    .getGroupingSeparator(); // '.'
DecimalFormatSymbols.getInstance(FRANCE)
    .getGroupingSeparator(); // a no-break space, not ' '
java.time Package
Also, be sure to check out the classes in the java.time package, which provides several options for storing and formatting dates with a consistent API.
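For comparison with DateFormat, here is a sketch of localized date formatting with java.time. It uses DateTimeFormatter.ofLocalizedDate with a FormatStyle, then withLocale to pick the Locale; JavaTimeDemo and format are hypothetical names. A LocalDate carries no time or time zone, so the result does not depend on the JVM's default zone.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical demo class using the standard java.time formatting API.
public class JavaTimeDemo {

    // Hypothetical helper: the locale's localized date pattern for a style.
    static String format(LocalDate date, FormatStyle style, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(style).withLocale(locale).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2022, 8, 15);
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE}) {
            System.out.println(locale + ": "
                    + format(date, FormatStyle.SHORT, locale) + " | "
                    + format(date, FormatStyle.LONG, locale) + " | "
                    + date.getDayOfWeek().getDisplayName(TextStyle.FULL, locale));
        }
    }
}
```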
Additional Reading
- Java Internationalization Overview
- JDK 18 Supported Locales
- java.time package
- IETF BCP 47 language tag
Happy coding!