US 7433877 B2
A system and method for preventing user-input text strings of illegal lengths from being submitted to a database where, for each character in the string, a character length is determined in quantities of digital units of storage according to an encoding schema, the character lengths are accumulated into a total string length, also measured in digital units of storage, and the total string length is compared to one or more database input field requirements such as non-null and maximum length specifications. If a limit is not met, the system and method are suitably disposed in a manner to block or prevent submission of the user-input string to the database. The invention can alternatively be realized as a plug-in for database front-end application programs, as a stand-alone web services provider, or as a plug-in for a client-side database access program such as a web browser.
1. A method of preventing text strings of disallowed lengths from being submitted to a database comprising the steps of:
providing a hidden field associated with a visible user input field in a web page;
receiving by said hidden field a field storage limit from a database front-end application;
receiving by said front-end database application via said hidden and visible fields a user-provided string of characters intended for input into a database, said string being encoded according to a Unicode schema;
for each character in said string, determining a character storage value by:
(a) assigning a storage value of 2 to any character having a Unicode value of zero;
(b) assigning a storage value of 1 to any character having a Unicode value between 1 and 127, inclusive;
(c) assigning a storage value of 2 to any character having a Unicode value between 128 and 4095, inclusive; and
(d) assigning a storage value of 3 to any character having a Unicode value of 4096 or greater; and
accumulating said assigned character storage values to yield a total string storage requirement;
comparing said total string storage requirement to said field storage limit received by said hidden field;
responsive to said comparison finding said total string storage value is within said field storage limit submitting said user-provided string of characters to said database;
responsive to said comparison finding said total string storage value exceeds said field storage limit and subsequent to user activation of a submit data command, capturing said user-provided string of characters in an administrator log; and
subsequent to capturing a plurality of strings of characters in said log, adjusting said field storage limit value and an associated database field definition to reduce instances of user input exceeding database field size.
2. The system as set forth in
3. The method as set forth in
4. The method as set forth in
5. The method as set forth in
6. The method as set forth in
7. The method as set forth in
8. The method as set forth in
9. The method as set forth in
10. The method as set forth in
11. The method as set forth in
12. The method as set forth in
13. The method as set forth in
14. The method as set forth in
storing a database catalog as a cache on a database client device;
performing said steps of accumulating and comparing by said client device such that only checked and qualified input strings are submitted to a front-end application program.
15. The method as set forth in
1. Field of the Invention
This invention relates to the fields of data control, and especially to fields of determining and checking input data characteristics to databases.
2. Background of the Invention
Various types of databases, such as hierarchical, relational, and object-oriented databases, offer consistent data storage and provide transaction persistence, security, concurrency, and performance. Consequently, a distributed architecture (30) that uses databases (34-35) as the back-end storage mechanisms and applications (33) for programming logic has become prevalent, as shown in
Most databases have a maximum input string length requirement which is often specified in characters. Most database designs, however, actually implement their maximum string length in bits, bytes or words. In such a computing environment, a front-end application (33) normally checks the length in characters of user input strings prior to submitting the queries to the back-end database (34, 35) so that it can prevent users from entering strings (36) that are longer than the database allows.
If the input strings are longer than the database allowable length, an error message (37) is typically generated by the database, which is often returned (37′) to the end user. However, this is an undesirable result because the database error message may reveal table and column names, which is not only unprofessional in appearance to the user but may also violate one or more security guidelines. Moreover, the error message may not be user friendly.
In today's world, multi-language operating environments have increasingly become the norm of everyday business, and the application programs those enterprises use are required to handle multi-language input strings. Checking user input string length against database allowable field lengths is not troublesome in a purely English environment, such as a system using exclusively the American Standard Code for Information Interchange (“ASCII”), since each character in the ASCII encoding schema uses only one byte; nor is it a big issue in other fixed byte-length native language encoding schemas. In such a case, if a database specifies a maximum input string length of 128 characters in ASCII, one can assume that the database can handle input strings of length 128 bytes.
In another example, consider a database application which is operating in a Chinese-only environment which is utilizing GB5 encoding. GB5 stores every Chinese character in two bytes. To check input string length, the front-end application program can predict exactly how many characters are allowed corresponding to database fields by dividing allowable text entries in half (e.g. two bytes per character).
However, as different languages are used simultaneously within the same database, this can be much more problematic to address. For example, a common multi-language encoding schema is UTF-8. UTF-8 encoded strings can store characters using between one byte and three bytes per character, depending on the language from which the character or symbol is taken. For instance, a Chinese character in UTF-8 requires three bytes for encoding, while an Arabic character consumes only two bytes, a Hebrew character takes two bytes, a French character takes one or two bytes, an English character takes one byte, and special characters like currency symbols can take two bytes.
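The variable widths described above can be observed directly. The following is a minimal sketch in Python, not part of the original disclosure; the sample characters and the helper name `utf8_width` are chosen here purely for illustration.

```python
# Illustrative only: report how many bytes one character occupies in UTF-8.
# The sample characters below are assumptions picked for this sketch.
SAMPLES = {
    "English": "A",          # U+0041
    "French": "\u00e9",      # e with acute accent
    "Hebrew": "\u05d0",      # alef
    "Arabic": "\u0639",      # ain
    "Chinese": "\u4e2d",     # "middle"
    "Pound sign": "\u00a3",  # a currency symbol
}


def utf8_width(ch: str) -> int:
    """Return the number of bytes used to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


for label, ch in SAMPLES.items():
    # Print the code point rather than the glyph so the output is plain ASCII.
    print(f"{label:10s} U+{ord(ch):04X} -> {utf8_width(ch)} byte(s)")
```

With these samples, the English letter occupies one byte, the French, Hebrew, Arabic, and pound-sign characters occupy two bytes each, and the Chinese character occupies three.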
Many of today's front-end database applications are hard-coded to validate text entry length against database allowable length. Moreover, these applications are also often hard-coded with logic to check whether text entry fields have at least one character, in order to fulfill the database requirement for not-nullable fields. Examples of validations done in code are shown in Table 1, using Sun Microsystems' Java™ code, and Table 2, using JavaScript™.
The length( ) function in the example of Table 1 checks whether the user's text entry has at least one character, and the maxlimit in the example of Table 2 requires a declaration of variable for allowable character length within the code scope. These are fundamentally flawed processes for checking input string length, especially in multi-lingual applications, for two reasons.
First, the maxlimit variable and the maximum attribute count only the number of characters, not the number of bytes. In a multi-language environment, checking character length may produce wrong results because characters in UTF-8 can be one to three bytes in length, and the front-end applications cannot accurately predict whether a text string reaches the allowable database length.
For example, if there is a text entry field in a front-end application that uses a 10 byte database field, and a user enters a text string such as “I like IBM very much” in Chinese:
Today's applications would calculate the total number of characters of this entry as 9, but this string actually uses 18 bytes (5*3+3) when encoded in UTF-8. The application will consider the text entry to be shorter than the maximum length in the database, so it will submit the entry; the database will then detect the error and return an error message stating that the length is too long. At this point, the user has no way to know how many characters to remove in order to fit into the database field.
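The mismatch between a character-count check and a byte-count check can be sketched as follows. The original example string is not reproduced in this text, so the string below is a stand-in assumed for illustration: five Chinese characters followed by "IBM". The 10-byte limit comes from the example above; the variable names are hypothetical.

```python
# Stand-in entry (an assumption): five Chinese characters followed by "IBM".
ENTRY = "\u6211\u975e\u5e38\u559c\u6b22IBM"
FIELD_LIMIT = 10  # the 10-byte database field from the example above

char_count = len(ENTRY)                  # what a character-based check sees
byte_count = len(ENTRY.encode("utf-8"))  # what the database must store

passes_char_check = char_count <= FIELD_LIMIT
passes_byte_check = byte_count <= FIELD_LIMIT

print(char_count, byte_count)                # 8 18
print(passes_char_check, passes_byte_check)  # True False
```

The character-based check accepts the entry, while the byte-based check correctly rejects it before it reaches the database.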
Second, even if the front-end applications check the data length in bytes, it is tedious to change hard-coded variables when requirements or design changes alter a database field length or switch a field's attribute from null to not-null. Such simple changes require considerable code rework on front-end applications, increasing project risk and slowing the pace of development.
Therefore, a method and mechanism are needed in the art to calculate text string lengths in bytes for multi-lingual text encoding schemes. Further, there is a need in the art in some circumstances to centralize input string length checking logic for applications, in order to enable rapid changes in text entry length and to enforce the not-null attribute. In other circumstances, there is a need in the art to distribute input string length checking in order to leverage distributed and locally cached database storage efficiencies.
The present invention provides a system and method for preventing user-input text strings of illegal lengths from being submitted to a database where, for each character in the string, a character length is determined in quantities of digital units of storage according to an encoding schema, the character lengths are accumulated into a total string length, also measured in digital units of storage, and the total string length is compared to one or more database input field requirements such as non-null and maximum length specifications. If a limit is not met, the system and method are suitably disposed in a manner to block or prevent submission of the user-input string to the database. The invention can alternatively be realized as a plug-in for database front-end application programs, as a stand-alone web services provider, or as a plug-in for a client-side database access program such as a web browser.
The following detailed description, when taken in conjunction with the figures presented herein, provides a complete disclosure of the invention.
According to one embodiment of the present invention, a stand-alone Unicode Length Checker (“ULC”) plug-in (51) is provided to a front-end application to determine the number of bytes in a user's text entry (36), and to verify the string length and null attribute conformance in memory, as shown (50) in
In another embodiment of the present invention, the various functions of the ULC plug-in (51) are provided as Web services by a web server (53), which enables asset reuse for any applications (33), either using a database or content management system as a back-end system, as shown (52) in
In yet another embodiment, the ULC plug-in (51) is provided as a stand-alone product. Alternatively, vendors of Integrated Development Environments (“IDE”), databases, or content management middleware can bundle the ULC plug-in and ship it with their products to ease application development.
Some advantages realized by the present invention over the existing length-checking methods are:
In this example, the “hidden” tag is used to pass the value of 50, which gives the maximum number of bytes. These two lines of code can either be written by hand or generated automatically by custom Struts or JavaServer Faces (“JSF”) tags, both of which are programming methodologies well known in the art. Preferably, the input values are read from the database during initialization (“init”) processing, and then stored in a static hashtable. In this example, it would also be possible to retrieve that information in the custom Struts or JSF tag.
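The pairing of a hidden field with a visible field might be generated along the following lines. This is a sketch under stated assumptions, not the markup of the original example, which is not reproduced here: the `FIELD_LIMITS` cache, the `render_input` helper, and the `_maxbytes` naming convention are all hypothetical.

```python
from html import escape

# Hypothetical cache of field limits, filled once at start-up from the
# database catalog (standing in for the static hashtable described above).
FIELD_LIMITS = {"comment": 50}


def render_input(field: str) -> str:
    """Emit a hidden field carrying the byte limit, then the visible field."""
    limit = FIELD_LIMITS[field]
    name = escape(field)
    hidden = f'<input type="hidden" name="{name}_maxbytes" value="{limit}">'
    visible = f'<input type="text" name="{name}">'
    return hidden + "\n" + visible


print(render_input("comment"))
```

Because the limit is looked up rather than hard-coded, changing the database field size only requires refreshing the cache, not editing the page.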
In this manner, the invention considers each character using native functions of the programming language in which the invention is embodied to determine the codepoint of each character. Then, according to the encoding schema of the string, such as UTF-8, ASCII, GB5, etc., the number of bytes associated with that codepoint is added to an accumulated total string length, until all characters have been considered and a total string length in binary units (e.g. bits, bytes, words, etc.) has been determined. For certain encoding schemes, such as UTF-8, the storage lengths of codepoints may be associated with ranges or segments of codepoint values, which greatly simplifies processing because it can be done not on a specific character lookup basis, but rather on a ranged basis.
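The ranged accumulation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the range table below uses the boundaries of standard UTF-8 (an assumption of this sketch), and it is passed as a parameter so that a different table could be substituted for another encoding schema. The names `UTF8_RANGES` and `storage_length` are hypothetical.

```python
# (inclusive upper code point, bytes per character), in ascending order.
# These boundaries follow standard UTF-8 and are an assumption of this sketch.
UTF8_RANGES = ((0x7F, 1), (0x7FF, 2), (0xFFFF, 3), (0x10FFFF, 4))


def storage_length(text: str, ranges=UTF8_RANGES) -> int:
    """Accumulate per-character storage widths using a range table."""
    total = 0
    for ch in text:
        codepoint = ord(ch)  # native function to obtain the code point
        for upper, width in ranges:
            if codepoint <= upper:
                total += width
                break
        else:
            raise ValueError(f"code point {codepoint:#x} is outside the table")
    return total


# Cross-check the table-driven total against Python's own UTF-8 encoder.
sample = "abc\u00e9\u4e2d"
print(storage_length(sample), len(sample.encode("utf-8")))
```

Only a handful of comparisons are needed per character, regardless of how many distinct characters the encoding covers.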
For embodiments of the invention intended to assist in offline database input processing, one available implementation option is to serialize and send the static hashtable as part of the offline application.
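One way to ship the limits table with an offline application is to serialize it to a text format and restore it on the client. The sketch below uses JSON as an assumed format; the source does not specify one, and the function names and the sample table are hypothetical.

```python
import json


def export_limits(limits: dict) -> str:
    """Serialize the field-limit table so it can travel with an offline app."""
    return json.dumps(limits, sort_keys=True)


def import_limits(payload: str) -> dict:
    """Rebuild the field-limit table from its serialized form."""
    return {str(name): int(limit) for name, limit in json.loads(payload).items()}


# Hypothetical table: field name -> maximum length in bytes.
table = {"comment": 50, "title": 20}
restored = import_limits(export_limits(table))
print(restored == table)  # True
```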
The following, more detailed description of the invention is provided using several particular example implementations. It will be recognized by those skilled in the art that the features, descriptions, high level implementation provided in the following paragraphs can be implemented in a variety of programming languages using a variety of programming methodologies.
Invocation Using Style Sheet
In this implementation, the invention is provided as a client side style sheet in eXtensible Markup Language (“XML”). In this form, the invention is independent from any particular front-end application programs, and it also has accessibility and validity advantages.
If the length is found to be acceptable to the targeted database (104), it is passed on for normal handling and input to the database (17). Otherwise, a custom error message may be provided (105) to the user, as opposed to the cryptic error messages provided by databases upon such an error. The process (10) may be repeated (18, 11) as needed for additional input strings.
In this manner, the input strings which do not conform to the database input limits are blocked from being submitted to the database, cryptic error messages and security leaks are avoided, and intelligible, user-friendly error messages are provided in their stead.
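The accept-or-reject decision, including the non-null requirement, might look like the sketch below. The `FieldRule` type, the `check_input` function, and the message wording are assumptions made for illustration; the point is that the message tells the user how far over the limit the entry is without exposing any table or column names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical per-field requirements taken from the database catalog."""
    max_bytes: int
    nullable: bool = True


def check_input(text: str, rule: FieldRule) -> Optional[str]:
    """Return None if the entry may be submitted, else a user-facing message."""
    used = len(text.encode("utf-8"))
    if used == 0 and not rule.nullable:
        return "This field is required."
    if used > rule.max_bytes:
        over = used - rule.max_bytes
        return f"Entry is {over} byte(s) too long (limit {rule.max_bytes})."
    return None


rule = FieldRule(max_bytes=10, nullable=False)
print(check_input("hello", rule))  # None
print(check_input("", rule))       # This field is required.
```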
In this implementation, a Struts tag is employed to automate the process of putting the hidden fields in web pages. From a programming perspective, this can be done automatically. In operation, the front-end application program passes the sizes of database fields to hidden fields in the web pages; then, as users enter characters in the web pages, the front-end application program calculates the number of bytes for UTF-8 strings until the strings reach the maximum allowable field sizes for the database.
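Counting bytes as characters arrive, and stopping once the field limit is reached, can be sketched as follows. The function name `fitting_prefix` is hypothetical, and returning the longest prefix that fits is one possible behavior assumed here; the source does not say what happens to the excess characters.

```python
def fitting_prefix(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 size fits in max_bytes."""
    used = 0
    for index, ch in enumerate(text):
        width = len(ch.encode("utf-8"))
        if used + width > max_bytes:
            # Stop before this character so no character is split in half.
            return text[:index]
        used += width
    return text


# Two Chinese characters (3 bytes each) plus "abc", against a 7-byte field.
kept = fitting_prefix("\u4e2d\u6587abc", 7)
print(len(kept), len(kept.encode("utf-8")))  # 3 7
```

Working character by character guarantees that the cut never falls in the middle of a multi-byte sequence.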
Client-side Cached Catalog Embodiment
In this implementation of the invention (51), a database catalog (54) is stored as a cache on the client side (31), which is used to perform length checking at the client, such that only checked or qualified input strings (36′) are sent to the front-end application programs (33), as shown in
The database catalog cache is preferably encrypted to provide security in circumstances where it is not desired for users to be aware of, or have access to, database table and column field names. In this arrangement, the mapping between real database field names and the aliases used in the cache is performed by the application program running on the server, not on the client computer, such that users know only the maximum length and the parameter names for input fields.
The database catalog cache is preferably stored either as XML or in IBM's Cloudscape™, a well-known type of embedded database. In the latter case, few users would be aware that an embedded database exists on the client side, but it offers many features that come with mature relational database technology, such as storage persistence and SQL access.
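The separation between the client-side cache and the server-side name mapping can be sketched as below. All names and values are hypothetical, and encryption of the cache is omitted for brevity; the sketch only shows that the client works with aliases and limits, while real table and column names stay on the server.

```python
# Client side: catalog cache keyed by opaque aliases (hypothetical values).
CLIENT_CATALOG = {
    "field_a": {"max_bytes": 10, "nullable": False},
    "field_b": {"max_bytes": 50, "nullable": True},
}

# Server side only: alias -> (table, column). Never shipped to the client.
SERVER_ALIAS_MAP = {
    "field_a": ("CUSTOMER", "SHORT_NAME"),
    "field_b": ("CUSTOMER", "NOTES"),
}


def client_accepts(alias: str, text: str) -> bool:
    """Check an entry on the client using only the cached alias limits."""
    rule = CLIENT_CATALOG[alias]
    used = len(text.encode("utf-8"))
    if used == 0:
        return rule["nullable"]
    return used <= rule["max_bytes"]


def server_resolve(alias: str) -> tuple:
    """Translate an alias to the real table and column on the server."""
    return SERVER_ALIAS_MAP[alias]


print(client_accepts("field_a", "hello"))  # True
print(client_accepts("field_a", ""))       # False
```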
Error Response, Tracking, and Resolution
When a user-entered text string exceeds the maximum allowed field length for a database, the ULC plug-in preferably logs the event so that application administrators or designers have direct feedback and an audit history showing how many users have encountered similar problems. This information can guide the administrators or designers in making an informed decision whether to increase the database field length.
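A minimal in-memory version of such a log is sketched below. The `OverflowLog` class and its methods are hypothetical, and reporting the largest observed size per field is only one assumed way of informing a resizing decision; the source leaves the adjustment policy to the administrator.

```python
from collections import Counter


class OverflowLog:
    """Hypothetical administrator log of entries that exceeded a field limit."""

    def __init__(self) -> None:
        self.entries = []  # (field, text, bytes used, limit at the time)

    def record(self, field: str, text: str, limit: int) -> None:
        used = len(text.encode("utf-8"))
        self.entries.append((field, text, used, limit))

    def counts(self) -> Counter:
        """How many rejected entries were captured for each field."""
        return Counter(field for field, _text, _used, _limit in self.entries)

    def largest_seen(self, field: str) -> int:
        """Largest rejected size for a field, or 0 if none were logged."""
        sizes = [used for f, _text, used, _limit in self.entries if f == field]
        return max(sizes, default=0)


log = OverflowLog()
log.record("comment", "x" * 60, limit=50)
log.record("comment", "\u4e2d" * 20, limit=50)
print(log.counts()["comment"], log.largest_seen("comment"))  # 2 60
```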
There are several options for implementing logging of field length overruns, including but not limited to the following:
The invention is preferably realized as a feature or addition to the software already found present on well-known computing platforms such as personal computers, web servers, and web browsers. These common computing platforms can include personal computers as well as portable computing platforms, such as personal digital assistants (“PDA”), web-enabled wireless telephones, and other types of personal information management (“PIM”) devices.
Therefore, it is useful to review a generalized architecture of a computing platform which may span the range of implementation, from a high-end web or enterprise server platform, to a personal computer, to a portable PDA or web-enabled wireless phone.
Many computing platforms are also provided with one or more storage drives (29), such as hard-disk drives (“HDD”), floppy disk drives, compact disc drives (CD, CD-R, CD-RW, DVD, DVD-R, etc.), and proprietary disk and tape drives (e.g., Iomega Zip™ and Jaz™, Addonics SuperDisk™, etc.). Additionally, some storage drives may be accessible over a computer network.
Many computing platforms are provided with one or more communication interfaces (210), according to the function intended of the computing platform. For example, a personal computer is often provided with a high speed serial port (RS-232, RS-422, etc.), an enhanced parallel port (“EPP”), and one or more universal serial bus (“USB”) ports. The computing platform may also be provided with a local area network (“LAN”) interface, such as an Ethernet card, and other high-speed interfaces such as the High Performance Serial Bus IEEE-1394.
Computing platforms such as wireless telephones and wireless networked PDA's may also be provided with a radio frequency (“RF”) interface with antenna, as well. In some cases, the computing platform may be provided with an infrared data arrangement (“IrDA”) interface, too.
Computing platforms are often equipped with one or more internal expansion slots (211), such as Industry Standard Architecture (“ISA”), Enhanced Industry Standard Architecture (“EISA”), Peripheral Component Interconnect (“PCI”), or proprietary interface slots for the addition of other hardware, such as sound cards, memory boards, and graphics accelerators.
Additionally, many units, such as laptop computers and PDA's, are provided with one or more external expansion slots (212) allowing the user the ability to easily install and remove hardware expansion devices, such as PCMCIA cards, SmartMedia cards, and various proprietary modules such as removable hard drives, CD drives, and floppy drives.
Often, the storage drives (29), communication interfaces (210), internal expansion slots (211) and external expansion slots (212) are interconnected with the CPU (21) via a standard or industry open bus architecture (28), such as ISA, EISA, or PCI. In many cases, the bus (28) may be of a proprietary design.
A computing platform is usually provided with one or more user input devices, such as a keyboard or a keypad (216), and mouse or pointer device (217), and/or a touch-screen display (218). In the case of a personal computer, a full size keyboard is often provided along with a mouse or pointer device, such as a track ball or TrackPoint™. In the case of a web-enabled wireless telephone, a simple keypad may be provided with one or more function-specific keys. In the case of a PDA, a touch-screen (218) is usually provided, often with handwriting recognition capabilities.
Additionally, a microphone (219), such as the microphone of a web-enabled wireless telephone or the microphone of a personal computer, is supplied with the computing platform. This microphone may be used for simply reporting audio and voice signals, and it may also be used for entering user choices, such as voice navigation of web sites or auto-dialing telephone numbers, using voice recognition capabilities.
Many computing platforms are also equipped with a camera device (2100), such as a still digital camera or full motion video digital camera.
One or more user output devices, such as a display (213), are also provided with most computing platforms. The display (213) may take many forms, including a Cathode Ray Tube (“CRT”), a Thin Film Transistor (“TFT”) array, or a simple set of light emitting diodes (“LED”) or liquid crystal display (“LCD”) indicators.
One or more speakers (214) and/or annunciators (215) are often associated with computing platforms, too. The speakers (214) may be used to reproduce audio and music, such as the speaker of a wireless telephone or the speakers of a personal computer. Annunciators (215) may take the form of simple beep emitters or buzzers, commonly found on certain devices such as PDAs and PIMs.
These user input and output devices may be directly interconnected (28′, 28″) to the CPU (21) via a proprietary bus structure and/or interfaces, or they may be interconnected through one or more industry open buses such as ISA, EISA, PCI, etc.
The computing platform is also provided with one or more software and firmware (2101) programs to implement the desired functionality of the computing platforms.
Turning to now
Additionally, one or more “portable” or device-independent programs (224) may be provided, which must be interpreted by an OS-native platform-specific interpreter (225), such as Java™ scripts and programs.
Often, computing platforms are also provided with a form of web browser or micro-browser (226), which may also include one or more extensions to the browser such as browser plug-ins (227).
The computing device is often provided with an operating system (220), such as Microsoft Windows™, UNIX, IBM OS/2™, IBM AIX™, open source LINUX, Apple's MAC OS™, or other platform specific operating systems. Smaller devices such as PDA's and wireless telephones may be equipped with other forms of operating systems such as real-time operating systems (“RTOS”) or Palm Computing's PalmOS™.
A set of basic input and output functions (“BIOS”) and hardware device drivers (221) are often provided to allow the operating system (220) and programs to interface to and control the specific hardware functions provided with the computing platform.
Additionally, one or more embedded firmware programs (222) are commonly provided with many computing platforms, which are executed by onboard or “embedded” microprocessors as part of the peripheral device, such as a micro controller or a hard drive, a communication processor, network interface card, or sound or graphics card.
The present invention has been described, including several illustrative examples. It will be recognized by those skilled in the art that these examples do not represent the full scope of the invention, and that certain alternate embodiment choices can be made, including but not limited to use of alternate programming languages or methodologies, use of alternate computing platforms, and use of alternate communications protocols and networks. Therefore, the scope of the invention should be determined by the following claims.