

                           Introducing Tagged Types

Tagged Types are a pet project of mine that I came up with a_while_ago that
just seemed like a really nice idea. The idea is that you take an ordinary
datatype and add a tag to it and preserve that tag across various operators.
The obvious applications would be:

* A real money type: a numeric with a currency tag
* A timestamp that remembers the timezone

Anyway, I actually implemented it, you can download it below.


Table of Contents


  1. Examples
  2. How_it_works
  3. Operators
  4. Auxiliary_functions
  5. FAQ
  6. Downloading


1. Examples

Here are some examples of what you can do:

  test=# select c1, print_currency(c1) from c;
       c1     | print_currency
  ------------+----------------
   232.44 USD | US$    232.44
   21.20 EUR  | EUR       21.20
   -13.44 AUD | AU$    -13.44
   0.01 USD   | US$      0.01
   14.00 AUD  | AU$     14.00
  (5 rows)
  test=# select c1 * 5 from c where tag(c1) = 'USD';
    ?column?
  -------------
   1162.20 USD
   0.05 USD
  (2 rows)
  test=# select t,"timestamp"(t) from c;
                      t                    |      timestamp
  -----------------------------------------+---------------------
   2005-08-14 02:00:00+02 Europe/Amsterdam | 2005-08-14 02:00:00
   2005-08-14 02:00:00+02 Australia/Sydney | 2005-08-14 10:00:00
   2005-08-14 02:00:00+02 Asia/Hong_Kong   | 2005-08-14 08:00:00
   2005-08-14 02:00:00+02 America/New_York | 2005-08-13 20:00:00
   2005-08-14 02:00:00+02 Asia/Kuwait      | 2005-08-14 03:00:00
  (5 rows)

As you can see there are some rough edges, especially with the timestamp code.
That last example won't even work unless you have a recent version of
PostgreSQL as it won't recognise the time zones. After all, this only defines
the types, not the code to produce the answers.

2. How it works

This code merely exercises the type system PostgreSQL already provides. In the
beginning you provide a table (WITH OIDS) that lists all the tags you permit.
Then you execute the following:

    SELECT create_tagged_type( 'currency', 'numeric(8,2)', 'currencies' );

The first argument currency is the name of the type you want to create. The
second argument is the base type. Note that you can specify the so called
typmod of the type to specify things like precision. The third argument
currencies is the table of tags to use.
What happens is that the module looks up the details for the base type and
creates the new type with the correct parameters for storage. The storage is
essentially what is used by the base type plus 4 bytes for the OID. For details
see the FAQ.
When you insert a value it extracts the last word from the string (space
delimited) as the tag. Hence you cannot have spaces in your tags. This tag is
looked up in the tag table to check it is allowed. The remainder of the string
is passed to the normal input function for the base type. These are then
combined and stored as a unit.

3. Operators

Of course, a type by itself is not useful, you need to be able to do operations
on it. After all, a currency is a kind of number so you should be able to
number type things with it. This module does nothing automatically, since it
may not make sense. However it does provide a function to make it easier:

  # select create_tagged_operator( 'currency','*','numeric' ,'currency');
  NOTICE:  *(currency,numeric) => currency maps to *(numeric,numeric) =>
  numeric
  # select create_tagged_operator( 'numeric', '*','currency','currency');
  NOTICE:  *(numeric,currency) => currency maps to *(numeric,numeric) =>
  numeric
  # select create_tagged_operator( 'currency','/','currency','numeric' );
  NOTICE:  /(currency,currency) => numeric maps to /(numeric,numeric) =>
  numeric
  # select create_tagged_operator( 'currency','/','numeric' ,'currency' );
  NOTICE:  /(currency,currency) => numeric maps to /(numeric,numeric) =>
  numeric
  # select create_tagged_operator( 'currency','+','currency','currency');
  NOTICE:  +(currency,currency) => currency maps to +(numeric,numeric) =>
  numeric
  # select create_tagged_operator( 'currency','-','currency','currency');
  NOTICE:  -(currency,currency) => currency maps to -(numeric,numeric) =>
  numeric

As you can see, it lets you create operators and automatically determines what
base operator to use. This only produces simple operators. If the two arguments
are both tagged, it requires them to have the same tag or it will error. If you
want something more sophisticated, the auxiliary functions below will help.
Creating the operators above means you can now add two currency values or
multiply a currency value by a plain number, but you can't add a currency type
to a plain number or multiply two currency values. For simple examples like
this this is enough.

4. Auxiliary functions

Above I demonstrated ways to create simple operations on the new types. However
in real-life you're going to want to be able to do a lot more. Hence the module
defines several auxiliary functions to help you on your way.

The auxiliary type

Before getting too far, you need to know about the auxiliary type that is
created alongside each tagged type. It is created with the same name but with
_t appended. The reason it exists is because PostgreSQL doesn't allow you to
lookup fields in base types, only composite types. Hence for the type currency
above, the auxiliary type is defined as:

    CREATE TYPE currency_t ( value numeric(8,2), tag text );

There are implicit casts defined between the auxiliary type and the tagged type
so you can choose which to use in your functions. However, the auxiliary type
allows you to access the value and tag directly, which may help.

Defined functions


* tag( taggedtype )
  Returns the tag in textual form associated with the tagged value
* value( taggedtype )
  Returns the value associated with the tagged value
* tagdata( taggedtype )
  Returns the tuple associated with the tag
  This requires a little more explanation. Above I indicated that you needed to
  create a table with all valid tags. However, you are allowed to have more
  columns, providing meta-data for your functions. You may have been wondering
  how the print_currency in the example above knew how to format the value.
  Well, here is the definition of that function:

      CREATE FUNCTION print_currency( currency ) RETURNS text
      AS 'SELECT to_char( value($1), (tagdata($1)).format )'
      LANGUAGE 'sql' STRICT STABLE;

  There is a second column in the currency table (called format) which
  describes the format to pass to to_char() for this value. This allows you to
  abstract away various details from your code.
* taggedtype( basetype, tag )
  Creates the tagged type directly from its components.
* taggedtype_t( taggedtype )
  taggedtype( taggedtype_t )
  Explicit conversions between the tagged and auxiliary types.


5. FAQ


5.1. Can you index these columns?

I've given some thought as to creating a default index class for tagged types.
However, it's not clear what order would be useful. Do you ignore the tag while
ordering or order by tag first. I can't think of any use-cases that couldn't be
handled by a functional index.

5.2. Why a new type, why not just use a composite type?

Three reasons actually:

  1. Composite types are inefficient, space-wise. They have gotten more
     efficient as the versions progress, but they still store a whole tuple-
     header on disk for each value.
  2. Composite types cannot have custom input/output functions. In recent
     versions of PostgreSQL there is actually a default output for composite
     types but it's still not nice enough to use normally.
  3. Using a composite type you would either have to store the actual value of
     the tag, which is incredibly wasteful of disk space (those time zones
     names can get quite long), or store the OID and leave users with
     unreadable values (which wouldn't survive a dump/restore). A base type
     allows you to completely hide these implementation details.


5.3. What are the storage requirements

Tagged types take only the size of an OID (usually four bytes) extra on top of
the base value. So for variable length types the length gets four bytes longer.
For fixed types the value is stuck on the front. The only exception is really
short types like shortint and "char" that are passed by value within a single
datum. These become a fixed size of a datum plus an OID, usually around 8 bytes
total.

5.4. Any limitations I should know about?

Well,

* Tags can't have spaces, quotes or backslashes in them.
* Tags can't be longer than 32 characters.
* When you delete a tagged type some cruft is left behind (not in the system
  catalog though). PostgreSQL doesn't provide a way to tell the module that its
  type just got deleted. It won't hurt though unless you start doing it a lot.


5.5. What about performance?

I haven't tested huge tables so I can't really comment. There are a few caches
internally to avoid looking up too many things at once but looking up tags can
be expensive. In short, unknown at this stage.

5.6. What if I delete stuff from my table of tags?

Don't do that! Seriously, this is like deleting a row out of the a system
catalog and wondering why something doesn't work anymore.
Strictly speaking, if a value is printed with a tag that no longer there, it'll
print the OID instead (and display a NOTICE). This should be enough for you to
find them (ie WHERE tag(x) = '#94823') and either delete them or construct a
new value with the correct tag.
I've given thought to having a kind of foreign key reference to give a error
when you delete a tag in use, but that would require an index on every column
using the type to be efficient. Also, a trigger on the tagtable to stop people
updating entries may be useful. However, I decided that in the end it's the
admin's responsibility. They should revoke all privileges from the tag table if
they feel it's a risk.

5.7. What about other functions, unary operators or aggregates?

I haven't gone to the trouble of creating helper functions for these things.
it's a case of diminishing returns. Operators are useful and slightly more
difficult to create. Function are hard in the general case, but trivial if you
know what you want. For example, say you wanted to support abs() for the
currency type. You would create the function as follows:

    CREATE FUNCTION abs(currency) RETURNS currency
    AS 'SELECT currency( abs( value($1) ), tag($1) )'
    LANGUAGE SQL STRICT IMMUTABLE;

It's easy doing it by hand, you can just write it down. To write a function
that determines the right types and base function and calling everything at the
right time is anything but trivial.
Note, writing it as an SQL language function as above means it will be in-lined
directly into your queries.

5.8. Why use OIDs?

Well, because they're reasonably efficient to search for and they're non-
updateable. They're automatically created (when you ask for them in the CREATE
TABLE). For the code to check you've created a serial of the required type in
the right place and that you've created a unique index and etc... It's easier
just to use OIDs. Yes, they could wrap, but with user tables no longer having
OIDs by default this should become less of a problem. Besides, the list of tags
shouldn't change.

5.9. Bugs?

Yes, probably.

5.10. What if I want a different precision?

The above currency type was defined with two decimal places. This is usually
fine but if you want more precision you'll need to redefine the type, or create
a different version of the type for each precision you want. PostgreSQL doesn't
allow user defined types to have parameters, though this is on the TODO list.
One side-effect of this is that if you do calculations with the currency type,
the precision is enforced at each step. So if you divide by a number, the
result will immediatly be rounded to 2 decimal places. If you have intermediate
steps that require higher precision it may be worthwhile to extract the numeric
value and operate on it directly, coercing it back to currency only at the end.

5.11. What versions does this work on?

I've tested it on 7.4 and on 8.1devel. There are a few #ifs in there to handle
the changes made in the server code. The changes have been keyed to the
CATALOG_VERSION_NO where the change happened, so it should work for all in
between versions also, but I havn't tested that. There is a test file included.
If you can successfully run it through psql without any strange errors, you're
probably fine. The only expected one is older versions not recognising the
timezones for conversion.

5.12. How do I (un)install it?

See the included INSTALL file.

5.13. What now?

Well, user defined types are still somewhat second-class citizens in
PostgreSQL. For example, they can't take parameters like char and varchar.
Imagine the currency type above, but with configurable precision. Another
interesting avenue would tags be controlled by functions. The example would be
to be able have a physics_unit type that behaved like:

    SELECT '2.5 ms-1'::physics_unit * '4 s'::physics_unit;
  returns:
    10 m

Maybe not exactly something you want in a database but it would be a great
example of what user-defined types can do.
This is however just a hobby thing, I'm not using it myself for anything major
(yet) but I did learn quite a bit about PostgreSQL internals while writing it.

6. Downloading

Currently only the initial release v0.1 is available.
-------------------------------------------------------------------------------
By Martijn van Oosterhout (kleptog (at) svana.org)
Copyright  2000-2005 - Last modified 28/08/2005
