How we do Internationalization

Reading Josh’s post on helpful Date/Time/TimeZone handling methods inspired me to write about how we deal with the complexities of internationalization and localization in our app.

When we started, we set out some principles for our framework, application, and any related code:

  • Data flowing into our system from external sources (human users, integrations, imports, etc.) will be normalized into a single unit to the maximum extent possible.
  • External recipients receiving data from our system will have the option of receiving the normalized or localized data (for example, export = normalized, web app = localized).
  • Raw unstructured input (such as the contents of text input fields) will not be normalized and will be treated as raw data (we won’t try to parse through text fields looking for dates, for example).

That first point requires a little bit of clarification because you quickly run into problems with, for example, currency.  So we refined it a little to say that only data which represents a unit for which there is a standard conversion will be normalized. Otherwise, the value will be stored with appropriate extra information.  A few examples might help explain this.

Safely Convertible Data

Physical measurements like length, width, weight, and temperature all have standard conversions (feet to meters, pounds to kilograms, Fahrenheit to Celsius, etc.). We convert these into a standard unit (metric units, degrees Celsius, etc.).  These kinds of measurements are pretty straightforward and the formulae for converting them are well known.  So unless someone decides to change the gravitational constant of the universe, you shouldn’t have much trouble here.
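
To make that concrete, here is a minimal Python sketch of normalizing on the way in and localizing on the way out. The table names, function names, and the choice of meters, kilograms, and Celsius as base units are assumptions for illustration, not the code from our app.

```python
# Hypothetical conversion tables: multiply by the factor to reach the base unit.
TO_METERS = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize_length(value: float, unit: str) -> float:
    """Convert an incoming length to meters (the assumed storage unit)."""
    return value * TO_METERS[unit]


def normalize_weight(value: float, unit: str) -> float:
    """Convert an incoming weight to kilograms (the assumed storage unit)."""
    return value * TO_KILOGRAMS[unit]


def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Convert an incoming temperature to Celsius (the assumed storage unit)."""
    return (degrees_f - 32.0) * 5.0 / 9.0


def localize_length(meters: float, unit: str) -> float:
    """Convert a stored length back to the unit the recipient prefers."""
    return meters / TO_METERS[unit]


stored = normalize_length(10, "ft")      # stored as meters
shown = localize_length(stored, "ft")    # displayed in feet again
```

An unknown unit raises a KeyError here, which is probably what you want: better to reject the input than to store a value whose unit you are guessing at.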

Currency

Currency does not have a standard conversion and varies wildly. So we don’t convert these units. Instead, we store the appropriate information to establish definite context in the future: Value, Currency Unit, and Date/Time when the value was recorded.  With these three things, we could reconstruct the value and convert it to whatever unit we needed anytime in the future. For example, what was the value of USD$12.57 in Italian Lira on February 4, 2011 at 11:08am?  Note that Martin Fowler has covered this particular problem in Patterns of Enterprise Application Architecture with the “Money Pattern.”  This problem is not like the date/time problem below because there is no standard, immovable baseline for currency (no, not even gold).  With money, everything is relative.
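
A rough Python sketch of storing those three pieces together might look like the following. The `Money` class, the `convert` function, and the `rate_lookup` callback are hypothetical names made up for this example (they are not Fowler's code or ours), and the exchange rate used below is invented purely to show the mechanics.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class Money:
    amount: Decimal          # the value, kept as a decimal, never a float
    currency: str            # the currency unit, e.g. an ISO 4217 code
    recorded_at: datetime    # when the value was recorded (timezone-aware)

    def __post_init__(self) -> None:
        # Refuse naive timestamps: without a zone the context is ambiguous.
        if self.recorded_at.tzinfo is None:
            raise ValueError("recorded_at must be timezone-aware")


# Hypothetical signature: (from_currency, to_currency, at) -> rate.
RateLookup = Callable[[str, str, datetime], Decimal]


def convert(money: Money, target: str, rate_lookup: RateLookup) -> Money:
    """Convert using whatever the rate was when the value was recorded."""
    if money.currency == target:
        return money
    rate = rate_lookup(money.currency, target, money.recorded_at)
    return Money(money.amount * rate, target, money.recorded_at)


# The timestamp is treated as UTC here purely for illustration.
price = Money(Decimal("12.57"), "USD",
              datetime(2011, 2, 4, 11, 8, tzinfo=timezone.utc))


def made_up_rate(src: str, dst: str, at: datetime) -> Decimal:
    return Decimal("2")      # NOT a real exchange rate


converted = convert(price, "EUR", made_up_rate)
```

The point of the sketch is that the stored record never changes. Conversion is something you do on the way out, with a historical rate source that you supply.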

Date/Time

Date/Time is a complicated enough beast without tossing in the problem of time zones. Add to this the fact that governments can and frequently do change the meaning of time zones and you’ve got a recipe for future disaster.  It’s not enough to store a date, time, and time zone because you’ll need to know what the legal definition of that time zone was at that date and time. And good luck trying to deal with the math on a range of dates that span time zones and/or cover a period where the government changed the definition of the time zone.  This task is roughly equivalent in difficulty to a muddy pig catching contest.  It’s not impossible, but you’ll be covered in mud and smell like pig when you’re all done.  Those who have worked with date/time math before know exactly what I mean by this.

So Date/Time appears to be like currency on its face, but it won’t take long before you strongly regret your decision to store it like money.  What’s the answer then?  Store everything in a constant time zone.  Convert all incoming date/times into a standard time zone. When sending out or displaying dates/times, convert them into the recipient’s time zone according to the laws of right now.  In this situation, all you have to worry about is the law *right now* for the time zone in question.  There’s no going back in time and trying to figure out what the UTC offset for “Indiana (East)” was in 1996 versus 2011.  When the date/time is coming in, convert it according to today’s law. When it’s going out, convert it according to today’s law. Easy as pie! Fortunately, the OS and your language framework usually handle the problem of figuring out what the law is right now (Windows/.NET BCL, *nix/zone.tab, etc.).

But then the question is, which time zone do you choose as the constant?  The server’s time zone?  This seems an obvious/easy one at first since that’s what most OS/frameworks make easiest. But if your app is going to last more than a few months, you will grow to regret this decision.  There are a few problems with this decision:

  • What if the law changes the definition of the server’s time zone?
  • What if you end up moving the server?
  • What if you end up having multiple servers in geographically diverse locations?

All of these problems will ruin your day and force you to go through your entire database and correct/adjust each date/time value.  And the last problem will require changes to your code to support conversions depending on where a particular server is.

The correct answer here is to normalize all date/times into the UTC/GMT time zone.  This time zone will never change and is not affected by any governing laws unless somehow the entire world decides to move to a new time system. I don’t have an answer for that one, but I’m guessing that you’ll have bigger problems to deal with besides your data at that point. There are just some things you can’t plan for, I guess.
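
Here is a small Python sketch of that in/out boundary. The function names `to_utc` and `to_local` are made up for this example. To keep it self-contained it uses fixed UTC offsets; in a real app you would more likely pass a zone object from your platform's time zone database (for example `zoneinfo.ZoneInfo` in Python 3.9+) so the current rules are applied for you.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_utc(incoming: datetime, source_zone: tzinfo) -> datetime:
    """Normalize an incoming date/time to UTC before it is stored.

    A naive value is assumed to be expressed in source_zone.
    """
    if incoming.tzinfo is None:
        incoming = incoming.replace(tzinfo=source_zone)
    return incoming.astimezone(timezone.utc)


def to_local(stored: datetime, target_zone: tzinfo) -> datetime:
    """Convert a stored UTC date/time into the recipient's time zone."""
    if stored.tzinfo is None:
        raise ValueError("stored values must be timezone-aware UTC")
    return stored.astimezone(target_zone)


# Illustrative fixed offsets (assumptions, not real zone definitions).
six_behind = timezone(timedelta(hours=-6))
one_ahead = timezone(timedelta(hours=1))

stored = to_utc(datetime(2011, 2, 4, 11, 8), six_behind)
shown = to_local(stored, one_ahead)
```

Everything between those two functions (the database, the business logic, the comparisons and date math) only ever sees UTC.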

List Data

Another issue we ran into is how to store data selected by users from select boxes.  We wanted localized language values to appear in the select box in the user’s language, but we wanted to store the value as language-agnostic.  Fortunately, select boxes inherently support this separation of data-value and display-value. So the implementation here was relatively easy. Data values go into the database, display values are configurable, localized by culture, and displayed to the user.
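
As a sketch of that separation in Python: the keys, cultures, and translations below are invented for illustration, and the in-memory dictionary stands in for wherever your translations actually live. Untranslated keys are rendered with an obvious marker so they stand out in testing.

```python
from typing import Dict, List, Tuple

# Hypothetical translations keyed by (culture, key).
TRANSLATIONS: Dict[Tuple[str, str], str] = {
    ("en-US", "priority.high"): "High",
    ("en-US", "priority.low"): "Low",
    ("de-DE", "priority.high"): "Hoch",
}


def display_text(key: str, culture: str) -> str:
    """Return the localized text, or a visible marker if it is missing."""
    text = TRANSLATIONS.get((culture, key))
    return text if text is not None else f"[[{key}]]"


def select_options(keys: List[str], culture: str) -> List[Tuple[str, str]]:
    """Build (data value, display value) pairs for a select box.

    Only the data value (the key) is ever written to the database.
    """
    return [(key, display_text(key, culture)) for key in keys]


options = select_options(["priority.high", "priority.low"], "de-DE")
```

In this sketch the German user sees "Hoch" for the first option, while the missing translation for the second shows up as `[[priority.low]]`. Either way, the database only ever stores the key.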

Summary

In the end, the hardest challenge we faced was being consistent. It seems every time we turned around, there was yet another way that data was received or displayed by the system.  This particular problem is probably the #1 reason why FubuMVC’s UI facilities exist in the first place, and are necessarily so conventional. (NOTE: due to the compositional architecture of FubuMVC, its conventional support is available outside FubuMVC.)

Having data flow into and out of your database through a standard layer of conventions helps ensure that you consistently, always, and everywhere enforce your data conversion principles and make a truly localized application that stands the test of humans from every continent. Aside from the “storing date/times in UTC” lesson, the lesson on conventions was perhaps the greatest lesson we learned through this whole exercise.

    About Chad Myers

    Chad Myers is the Director of Development for Dovetail Software, in Austin, TX, where he leads a premier software team building complex enterprise software products. Chad is a .NET software developer specializing in enterprise software designs and architectures. He has over 12 years of software development experience and a proven track record of Agile, test-driven project leadership using both Microsoft and open source tools. He is a community leader who speaks at the Austin .NET User's Group, the ADNUG Code Camp, and participates in various development communities and open source projects.
    This entry was posted in datetime, Dovetail, internationalization.
    • Joshua

      Can anyone recommend books on this matter (handling timezone, internationalization, etc)? Thanks!

    • Goran

      There may be some problems with lira conversion in 2011

    • http://www.lostechies.com/members/chadmyers/default.aspx chadmyers

      Oh yeah. Silly Europeans and their currency. :)

      I guess that exemplifies my point — it’s not just currency values that change, but currencies themselves!

    • Noel Kennedy

      Thanks Chad, very useful post! Please can you expand a bit on how you do list data? Does this mean having say a resx with all the DB reference data hardcoded and indexed on primary key? How do you manage the duplication & maintenance of doing it this way?

    • http://www.lostechies.com/members/chadmyers/default.aspx chadmyers

      @Noel:

      We definitely do not use resx’s. Those things are a nightmare to use and maintain.

      We have a table in our database that contains the keys and translation-by-culture.

      It looks like this:

      [id] [uniqueidentifier] NOT NULL,
      [LastModified] [datetime] NULL,
      [Created] [datetime] NULL,
      [Culture] [nvarchar](255) NULL,
      [Name] [nvarchar](255) NULL,
      [Text] [nvarchar](1000) NULL,

      We cache it at app startup. We can reference keys in code using hard-coded (constant) keys that match the keys in the “Name” field in the database.

      Also, keys flow through our app framework to show up whenever we need a label, a grid column header, etc. A lot of it is conventional.

      We have tooling to ensure that the app and DB are kept up to date and we have “warts” that show us if a particular blob of text hasn’t been translated so that they stick out like a sore thumb for the tester who alerts us to their presence.

    • http://dgondotnet.blogspot.com/ Daniel

      In regards to currencies, I am the author of NMoneys (http://code.google.com/p/nmoneys/) which offers a sensible way to display monetary quantities depending on the currency.
      But currencies do have a standard (ISO 4217) that gets updated every now and then.

      But I agree with you, monetary conversions are a major headache, and no, a Money Pattern implementation won’t help you to solve that problem.

    • http://www.lostechies.com/members/chadmyers/default.aspx chadmyers

      @Daniel:

      ISO 4217 doesn’t have to do with how to store currency values, only how to represent them consistently.

      The Money Pattern is about how to *store* currency values (for example, in memory, in the DB, etc) for retrieval at an undetermined future date and time. ISO 4217 relates to the “currency” or “type” members of the Money pattern and to which degree of specificity to store the decimal portion of the value.

      NMoneys looks interesting for displaying and formatting currency values that have already been stored and loaded from the storage mechanism. Thanks for sharing that link!

      -Chad