SummaryBy Neal Katz
This article covers Neal Katz's experiences developing a multilingual Website using DbForms and MySQL 4.1. Katz discusses the configuration details and provides tips on how to use the various localization tools available. The article will also explain how to generate reports with JasperReports and POI (Poor Obfuscation Implementation). (3,200 words; September 6, 2004)
Developing a Unicode-based application can be a frustrating experience because of the myriad of little "gotchas" involved with making each piece of a complex system work with Unicode data. This article covers my recent experiences in building a college registration system supporting English and the Thai languages. The development was J2EE-based and used the following technologies:
During the project's initial phases, my team needed to decide on a database to use that would store Thai characters. For maximum flexibility, we decided to use Unicode to store data, instead of a Thai character set. Our database would also store English data, and future versions of this system might need to support other languages as well. Due to our limited budget, we narrowed the database choice to open source products that support Unicode.
My prior experience with MySQL led me to make the potentially risky choice of using MySQL 4.1, which is clearly marked as a develop release. After doing research on the Web and reading mail archives, I concluded that MySQL 4.1 had some problems with complex queries and complicated collating with fields of different character sets; however, 4.1 had no reported problems with data integrity or database corruption.
We started development with the MySQL 4.1.1 release and noticed a few complex queries where the ordering did not work as expected; we were forced to rewrite the queries or break a single query into two queries. We were also able to create one complex query that crashed the system (but did not cause any corruption). Upgrading to 4.1.2 solved the problem with the crashing query. To date, we have not experienced any crashing with 4.1.2.
We are happy with our decision to use 4.1.2, but it was a risky choice, and we were prepared to back out if necessary. Also, given the application type, this risk level was acceptable. For a more mission-critical application, I probably would not have used MySQL 4.1.
MySQL supports multiple table types, the default is ISAM (indexed sequential access method), which is fast, but does not offer referential integrity through foreign key constraints. Instead, we used InnoDB tables in our system and used the constraint feature for referential integrity. Any potential performance loss was more than offset by the reliability that referential integrity adds to a complex database.
Using Unicode with MySQL 4.1 is simple, and there are two ways to specify which character set to use. The first approach specifies a character set as the default for a table; the second declares a character set for a specific field. Specifying a field's character set takes precedence over the table's character set.
Example 1. Define a default character set for a table
CREATE TABLE `city` (
`city_id` int(11) NOT NULL default '0',
`name` varchar(25) NOT NULL default '',
`name_eng` varchar(25) NOT NULL default '',
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Example 2. Define a default character set and specific character set for a field
CREATE TABLE `test2` (
`alpha` varchar(5) character set utf8 NOT NULL default '',
`beta` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
In the first example, all fields are UTF-8: name
stores Thai data and name_eng
stores English language data. In the second example, the table's character set is latin1
and field alpha
is UTF-8. latin1
is the standard character set for tables that don't specify a default character set.
On my version of Linux (SuSE 9.1), I can display Thai characters using the MySQL shell by starting it with the command mysql --default-character-set=utf8
.
Unfortunately, this did not work on Windows. Under Windows, we had to
wait until we had a Webpage working before we could visually verify
that we had Thai data. When using the commands mysqldump
to back up a database and mysql
to restore the database, you must use --default-character-set=utf8 option
, otherwise character set information is lost.
We used a commercial product MySQL Manager 2 from EMS
to design the database and generate our queries. The tool offers a nice
visual query builder that can then display the query's text. We then
copied the SQL code into our source. The tool is designed to work with
MySQL 4.0 and did not support 4.1's character set extensions. If we
created a table with the tool, we then had to manually execute an alter table
command to change the character set.
Server platform
After selecting MySQL, our next choice was deciding which J2EE server
to use. Development started with Tomcat 4.1.x, but we quickly switched
to 5.0.19 for the better tag library support. As an extra bonus, we
were able to take advantage of the EL (expression language) support
integrated with 5.0.x.
The final major decision was our selection of DbForms. DbForms bills itself as a tool that "enables developers to build sophisticated Web-based database-driven applications in very short time and with very little efforts. DbForms applications are built in a manner conceptually similar to RAD [rapid application development]."
DbForms is best thought of as many different pieces: first, a J2EE framework for writing database-driven applications; second, a rich library of tags to take advantage of the framework; and third, a GUI tool that generates JSP (JavaServer Pages) pages from a database via XSL (Extensible Stylesheet Language) templates. We modified some of these template files to add support for including a common file at the top and generating all pages as UTF-8.
DBForms's GUI tool is
useful in creating initial JSP pages for data views and entry. For
simple screens, the code created by the GUI tool can be used unaltered.
In our application, approximately 40 percent of the data entry screens
were generated by the GUI tool and deployed unaltered. Another 30
percent of the screens required minimal coding to be useful, usually
adding DbForms's Select
tag so that the user could select a field value from a foreign database.
One limitation of the GUI tool is that it is not a "lifecycle" tool. If you use it to regenerate a page or the global dbforms-config.xml
file, it will overwrite the old file. The GUI tool does not remember
changes from previous edits. To get around this limitation, I created a
parallel directory structure called "dummy," made new edits there, then
copied and pasted from the dummy directory to the real directory. This
proved useful in situations where I added a few fields to a table and
wanted to regenerate the data entry screen, but did not want to lose
any changes I had previously made to the JSP page.
Now that we have covered the big picture, let's start looking at the individual steps required to get things working.
Getting Unicode to work
The first thing we did was write a standalone Java program that would
read in Thai characters from a MySQL database and write them out to a
file. The test database was populated by hand. The program's key part
is the section that opens the connection:
DriverManager.setLogWriter(new PrintWriter(System.out));
Class.forName("com.mysql.jdbc.Driver");
Connection db =
DriverManager.getConnection(
"jdbc:mysql:///" + dbname +
"?requireSSL=false&useUnicode=true&characterEncoding=UTF-8",
"root", "");
The setLogWriter()
function is used for diagnostics and isn't needed in production, but is helpful for tracking down problems.
The JDBC (Java Database Connectivity) connect string has three parameters. Their meaning should be obvious. We used requireSSL=false
since we had problems building MySQL 4.1.x from scratch using SSL
(Secure Socket Layer) on Linux. We didn't try very hard though; in our
environment, SSL was not important. The useUnicode
and characterSetEncoding
parameters tell the JDBC subsystem that the database is Unicode and everything is UTF-8. Note that the string UTF-8
is the Java way of specifying the charset; MySQL uses the string utf8
. If changing charsets, you'll have to compare what Java supports with what MySQL supports.
We also copied the above code to a JSP page and ensured we could display Thai. The only tricky part of that exercise was setting the page output charset. That was accomplished by putting the following line at the top of each page:
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
Note: If you use Apache with mod_jk (the Apache module for running Tomcat as a plug-in within Apache), you will have to configure Apache to treat the page as UTF-8 as well; the Apache setting might overwrite the Tomcat setting (depending on which version you use). Our project used Tomcat as a standalone server.
Our next step was to get things working with DbForms. We started by using the GUI tool and setting the JDBC connect string, jdbc:mysql:///dbtest?requireSSL=false&useUnicode=true&characterEncoding=UTF-8
. The tool then created a dbforms-config.xml
file. Scroll down to the bottom of the file and you should see:
<dbconnection
Name="jdbc:mysql:///stpneal2?requireSSL=false&useUnicode=true
&characterEncoding=UTF-8"
isJndi = "false"
conClass = "com.mysql.jdbc.Driver"
username = "root"
password = ""/>
Note: If you do not see &
, then you are using an old version of the GUI tool, which does not handle XML escaping correctly.
In the DbForms GUI tool, click on the XSL Transformation tab and create some JSP files. You can then review the code produced to see what DbForms and the GUI tool can do for you in terms of generating ready-to-use JSP pages.
We then manually edited the generated JSP files and inserted the <%@ page contentType="text/html;charset=UTF-8" language="java" %>
directive. After verifying it worked, we modified the XSL files used by
the GUI tool to generate the code. The change will vary file-to-file,
but it will look something like this:
//--file "
<xsl:value-of select="$fileName" />
" ------------------------------------------------
<xsl:text disable-output-escaping="yes">
<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<%@ include file="inc_taglibs.jsp" %>
</xsl:text>
As you can see, we modified the XSL file to output two directives at the top of each file generated. One sets the page contentType
and charset
; the other includes a file that we used to define all our taglibs.
Everything worked fine
in terms of displaying data, but updating data did not work correctly.
A little digging revealed the problem to be that the form-processing
code did not know what charset to use. A limitation of HTML form
processing is that the data's charset entered in the form or in the
page used to display the form is not passed back to the server. Once we
discovered the problem, the answer was simple: we set Tomcat to treat
all form data as UTF-8. We did that by defining a filter in the web.xml
file. A filter is a code snippet called by the Web server that
manipulates the input before passing it on. The filter code is in the
class file org.dbforms.util.external.SetCharacterEncodingFilter
. This file is a copy of the sample filter that ships with Tomcat, but it has been copied into the DbForms JAR for ease of use:
<!-- ======== Filter for Set Character Encoding ======= -->
<filter>
<filter-name>Set Character Encoding</filter-name>
<filter-class>
org.dbforms.util.external.SetCharacterEncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
</filter>
At this point, our application displayed and updated Thai information in the database, but we still had many tasks to complete in order to make our application fully usable as a multilingual site.
The next task was to
display static content on the page that was localized. The easiest way
to perform localization is with tags, and we could choose from two
different tag libraries: the standard formatting (prefix FMT) tag
library that is part of JSTL (Java Standard Tag Library), or DbForms's db:message
tag. We ended up supporting both. In most cases, we used the FMT tag
explicitly in our JSP code, but we used the DbForms message tag code
implicitly to perform localization on HTML widgets like buttons.
Fortunately, both tag libraries worked the same way—they used a set of property files in the WEB-INF/
classes directory composed of key-value statements.
A brief introduction to ApplicationResources files for localization
Localization is performed through a hierarchy of property files. There
is a base file and then descendent files. If a given key cannot be
found in a descendent file, its parent files will be checked in
ascending order. The hierarchy model is used to support a default file,
a specific language, and its dialects.
Assuming we use the name ApplicationResources
for our file naming, we can have four files in WEB-INF/
classes:
|
Depending on our
current locale's setting, we start with one of these files. If a key
cannot be found, then the next ascending file is checked until we run
out of files and generate a not found
.
Here are some sample ApplicationResources
files showing how they look and describing how the lookup logic works:
FILE: ApplicationResources.properties:
One=first
Two=second
Elevator=up thing
user_name=User Name
RoomSchedule.font=Lucida-plain-10
FILE: ApplicationResources_en.properties:
Two=dos
Elevator=Elevator
FILE: ApplicationResources_en_UK.properties:
Two=double
Elevator=LIFT
FILE: ApplicationResources_th.properties:
user_name=ชื่อผู
้ใช้งาน
If our locale is en_UK
, then searching for "Elevator" would find "LIFT," and searching for "One," would return "first."
In the Thai example, we
must use Unicode escape sequences to encode our strings. Unfortunately,
the property files are plain 8-bit ASCII. You can use the native2ascii -encoding UTF-8 command src dtst
command to convert a UTF-8 file to ASCII, but in our environment, we
just found it easier to use Dreamweaver in design mode, then cut and
paste from code view. Microsoft Word can also be used; type in some
characters, save as a Webpage, then view code.
You should adopt a naming convention early in your project to keep things organized. In our environment, we used the prefix bean.xyz.key
if a message was used in bean xyz
; abc.key
if a key was used on JSP page abc
; and no prefix for common things like user_name
. Ours is not the best example, but better than no naming scheme at all.
For our application, we used only the default file ApplicationResources.properties
and ApplicationResources_th.properties
,
effectively making English the default language. The files also
contained internal configuration information that was locale-specific,
such as the Bean.RoomSchedule.font=Lucida-plain-10
setting. (For more details, see Resources).
Configuration for both localization methods is done in web.xml
. For the JSTL, the configuration setting is a top-level context-param
:
<web-app>
<context-param>
<param-name>javax.servlet.jsp.jstl.fmt.localizationContext</param-name>
<param-value>ApplicationResources</param-value>
</context-param>
For DbForms, it is an init-param
to the servlet:
<!-- ======= DbForms Configuration Servlet ======= -->
<servlet>
<servlet-name>config</servlet-name>
<servlet-class>org.dbforms.servlets.ConfigServlet</servlet-class>
<init-param>
<param-name>resourceBundle</param-name>
<param-value>ApplicationResources</param-value>
</init-param>
Using the FMT tag is very simple: download and copy the file jstl.jar
into your WEB-INF/bin
directory. Then at the top of your JSP file, include a tag directive:
<%@ taglib prefix="fmt" uri="http://java.sun.com/jsp/jstl/fmt" %>
Usage is as easy as <fmt:message key="user_name"/>
.
To use the other localization approach, DbForms's db:message
tag:
<db:message key="user_name" />
The db:message
tag also takes an optional attribute params
, which is a comma separated list of values that replaces {0}
,{1}
, etc., in the found string.
DbForms's localization also works with other DbForms's tags like <db:gotoButton>
to enable the set-caption
attribute's localization to the key <db:gotoButton caption="user_name" >
, and have caption resource="true"
in the enclosing dbform
tag:
<db:dbform followUp="/menu.jsp" captionResource="true">
Note: DbForms's GUI tool sometimes embeds spaces in the caption attribute for the code it generates; you should remove them.
Setting locale
Changing locale is performed in the file options.jsp
. The key point is that when the user changes the locale, we use the fmt:setLocale
tag to set the locale used by JSTL, and we invoke the following code to set the locale used by DbForms:
<% session.setAttribute("org.dbforms.LOCALE",
new Locale( request.getParameter("locale") )); %>
Dynamic localization
In our database model, we needed to keep the Thai and English text for
various attributes. We did that by having a separate field in the
table. We also used a common naming convention: name
for Thai and name_eng
for English.
We keep this code snippet in an include file:
<c:choose>
<c:when test="${sessionScope.locale=='th'}">
<c:set var="name_field" value="name"/>
</c:when>
<c:otherwise>
<c:set var="name_field value="name_eng" ""/>
</c:otherwise>
</c:choose>
This sets the variable name_field
to the appropriate field name to access depending on our locale setting. We used EL notation ${name_field}
to access the variable.
We then display a single value in the correct language like this:
<db:dataLabel fieldName="city_id" >
<db:tableData name="city" foreignTable="city"
visibleFields="${name_field}" storeField="city_id" />
</db:dataLabel>
This will show the name of the current city in the correct language.
To let the user select a list of cities using the current language, we use the db:select
tag; inside that, we use a db:tabledata
tag:
<db:select pattern="#" fieldName="city_id" size="0"
selectedIndex="<%=(String)currentRow_person.get("city_id")%>" >
<db:tableData name="city" foreignTable="city"
visibleFields="${name_field}" storeField="city_id" />
</db:select>
The selectedIndex
line reads city_id
's current value for this record in the person
table and uses it as the initial selection.
Reports
We used JasperReports, to create reports in PDF format. DbForms has
integrated support for JasperReports. To use it, you simply create a
JasperReports definition file and store it in the WEB-INF/reports
directory or change the configuration in web.xml
to use some other directory. On your JSP page, include a line similar to the following inside a db:dbforms
tag body:
<db:gotoButton destination="/reports/city_report" caption="print_report" destTable="city" singleRow="false" />
In the db:gotoButton
, the destination attribute points to "/reports/city_report
". /report
is a servlet mapping handled by the StartReportServlet
(see the web.xml
for the configuration). The report definition file is called city_report.xml
(the development release of DbForms uses jxml
instead; this change will be incorporated into the next stable
release). The report definition file will be compiled on demand to a .jasper
file. You can edit report-file definition files at any time—no need to restart the application to test new reports. The testTable
attribute should match the enclosing db:dbform
tag's tableName
attribute. The singleRow
attribute is true if you want to display only one record, otherwise false.
One gotcha we discovered: In the report definition file, make sure that the <jasperReport>
tag's name attribute is the same as the file's base name. Otherwise it will not work:
<jasperReport name="city_report" ... >
Getting JasperReports to work with Unicode requires textElement
sections to be defined like this:
<textElement textAlignment="Left" verticalAlignment="Top"
lineSpacing="Single">
<font fontName="Angsana New"
pdfFontName="c:\fonts\ANGSA.TTF" pdfEncoding ="Identity-H"
isPdfEmbedded ="false" size="12"
isBold="false" isItalic="false" isUnderline="false"
isStrikeThrough="false" />
</textElement>
As you can see, the <font>
settings are tricky and require a lot of research and playing around. pdfEncoding
specifies that the font is an external Unicode font being drawn horizontally. pdfFontName
is a full path to a font, in this case, ANGSA.TTF, which is a Thai
language font. For Unix systems, you will have to change the path. We
use a sed script to do that automatically. isPdfEmbedded
should be false, even though true seems to make more sense.
To create our report definition files, we used an excellent tool called iReport. Unfortunately, with iReport, we could not create the <font>
settings exactly the way we needed them to be for Thai support. We had
to edit the XML file by hand and modify the font setting. Then we used
iReport to modify the file further. If we needed another field to be
Thai, we just copied the one we edited and modified that. In short, we
only needed to hand modify the file once.
POI
During my development with DbForms, I wrote a report generator called
ExcelReportWriter to generate an Excel spreadsheet using POI. POI
turned out to be easy to use; the only change required for supporting
Unicode data was the addition of this line: cell.setEncoding(HSSFCell.ENCODING_UTF_16);
.
This method call is applied to all character data; i.e., I do not
bother checking if the incoming data is non-UTF-8, I just make the
whole spreadsheet Unicode. This might result in a bigger output file,
but in my situation, that was not a concern. The full code for the
ExcelReportWriter is in the development release of DbForms.
Conclusions
Hopefully this article will help those people interested in writing
multilingual applications. Even if you don't plan to support
multilingual applications immediately, you can do a few tasks to make
the process easier if you plan to add such functionality later:
ApplicationResources
files. Separating out the messages' text from the code is just good
programming practice. Avoid hard code strings in your beans and JSP
files.
ApplicationResources
files.
Calendar
class for time and date handling, and the Formatter
class for handling numbers and currency values.
DbForms is a great tool for generating applications that are data-entry centric. This article did not cover all of the program's aspects. To get started, I suggest downloading the project and the sample codes. Look at the bookstore first, then move onto the other demos. The default installation uses a memory resident database, so you can get started quickly without configuring any database.
JasperReports is great for generating PDF files; it can create Excel files too, but I found it easier to use the ExcelReportWriter that I developed to generate simple Excel files.
MySQL 4.1 proved to be a good choice, with no problems from a stability perspective. As I write this, the 4.1 branch has graduated to beta from developer release.
About the author
Neal Katz is president of Nechsi Software of Bangkok, Thailand. He has
more than 15 years of professional software development work and has
been using Java since version 0.9. Katz prefers to use Linux and open
source technologies in his work. He is a contributor to DbForms and
other open source projects. Prior to Nechsi Software, Katz was a
founder of MessageOne technologies and an early member of Mail.Com and
Bigfoot.