Using variable-length fields in your database

Tony Marston - 6th July 2002
amended 1st February 2003

While working for several companies recently I noticed in their application models that every string field regardless of length had the packing code set to 'VC' (VARCHAR, or variable-length) rather than 'C' (CHAR, or fixed-length). This strikes me as rather strange given what I have been led to believe with regard to the use of variable-length fields.

1. The difference between fixed and variable length

Firstly, what is the difference between a fixed-length and a variable-length field?

This is what the UNIFACE Reference Manual, section 9.5.2 has to say about variable-length packing codes:

The idea behind variable-length packing codes is to store only the actual part of the field that is filled in, and not the trailing spaces that would be added for a fixed-length field. For a field which is normally only partially filled, this can mean saving considerable disk space. For small fields however it may not save any space, since the database now has to store the length information in the field, which is not necessary for a fixed-length field.

For this to work you must be using a DBMS which supports a VARCHAR data type. Most relational DBMSs support this feature. For DBMSs which do not support this format, a variable-length field will be mapped to a fixed-length field by the database driver, so you can still use these fields although they do not save any disk space.

2. Restrictions with variable-length fields

There are two restrictions when defining variable-length fields in an entity:

Here is a warning from the UNIFACE Reference Manual, section 9:

As far as most DBMSs are concerned, the variable portion is simply a single long field. If not supported by the underlying DBMS, UNIFACE maps variable-length fields and included entities into this field in the format defined in the application model.

Caution: Using special UNIFACE functionality for field and entity interface definitions can severely reduce the accessibility of your data by other programs if your DBMS does not support these techniques.

The UNIFACE documentation also identifies other restrictions that apply to variable-length fields:

3. Comparison of fixed- and variable-length fields

There may also be problems when comparing a fixed-length field with a variable-length field, especially in SQL commands. The ANSI/ISO SQL standard requires that two character values being compared must have equal lengths. So, if both values in a comparison have datatype CHAR, blank-padding semantics are used. That is, before comparing character values of unequal length, the shorter value will be blank-padded to the length of the longer value.

If either or both values in a comparison have datatype VARCHAR then non-blank-padding semantics are used. That is, when comparing character values of unequal length, no adjustments will be made and the exact lengths will be used. For example, given the declaration

name1 VARCHAR(10) := 'DOG'
name2 VARCHAR(10) := 'DOG   ' ; note trailing spaces

the condition IF NAME1 = NAME2 is FALSE because name2 is longer than name1 due to the trailing blanks.

If one value in a comparison has a datatype CHAR while the other has datatype VARCHAR then a similar situation arises:

name1 VARCHAR(10) := 'DOG'
name2 CHAR(10)    := 'DOG' ; will be blank-padded

Remember that when you assign a value to a CHAR field, if the value is shorter than the declared length it will be blank-padded. So the condition IF NAME1 = NAME2 is FALSE because name2 is longer than name1 due to the trailing blanks.

4. Automatic conversion from fixed- to variable-length

The UNIFACE Reference Manual section 9.5.2 includes this interesting piece of information:

Under certain circumstances, UNIFACE will automatically handle a fixed-sized as a variable-length field. This will be the case when both:

The assumption behind this is that large fields will hardly ever be filled completely, so using variable-length storage will save disk space. The propagation from fixed to variable is done transparently by the driver for string data types with packing code C (character), R (raw), and U (TRX format). It does not affect the definitions in your model. If necessary, you can suppress this feature by using the '=' prefix to the packing code. For instance, a field with packing code 'C50' will be stored as a VARCHAR(50), but a field with packing code '=C50' field will be stored as CHAR(50).

See the DBMS Driver Guide for your DBMS for details (if any) about the mapping of fixed- to variable packing codes.

I note that in the Oracle Driver Guide table 8.1 gives the following default mapping of UNIFACE packing codes to ORACLE storage formats:

Table 8.1 in the ORACLE Driver Guide
UNIFACE packing code ORACLE storage format
C1-C255 char
C256-C2000 varchar2
C2001-Cn long

The SOLID Driver Guide table 1.2 gives the following mapping of packing codes:

Table 1.2 in the SOLID Driver Guide
UNIFACE packing code SOLID storage format
C1-C7999 char
C8000-C* varchar

Given all the above information the general rule of thumb that I have worked to for many years is that there is no real advantage in defining a string field as variable length unless its maximum length is 255 characters or more - anything else could turn out to be a false economy. This also means that I have managed to avoid a number of bugs that UNIFACE has had regarding VARCHAR fields: - BUG 22630, 22682, 22687, 22689, 23427, 23750, and 24080.

5. Addendum

The statements I made in this document were queried by David Buxton who stated that he had several tables in his MS SQL-Server and Oracle databases where there appeared to be no problems at all with regards to interspersing variable-length and fixed-length fields. In my defence I could only say that I had quoted from the UNIFACE documentation which I considered to be the 'bible' in these matters. He took this matter up with Compuware, and it now seems that the documentation is not entirely accurate. It transpires that variable length fields can actually be split into two categories:

It now appears that the documented restrictions on variable-length fields only apply if the maximum length is not specified. These fields require overflow tables and special techniques, so cannot be used for primary/candidate keys, order by, selectdb, and cannot be interspersed with fixed-length fields.

Compuware still advise against the use of VC(n) for primary keys as there are numerous outstanding bugs in this area (specifically bug numbers 22682, 23427, 23750 and 24080).

As for automatic conversion from fixed-length to variable length - in such a situation UNIFACE "knows" it's a converted field and will treat it as if it was fixed in length.

Apparently the documentation in this area is due to be updated in a later release of UNIFACE.

Tony Marston
6th July 2002

Amendment history:

1st February 2003 Added section 5 Addendum.