Initially the controlling end sends:

s <s|d> <seed>

This tells the other end whether it's the source or destination and what the seed is.

Similarly, the remote end sends:

V "pgrsync vx.x dbx.x\n"             # Protocol version and DB version

To which the source (SRC) replies with either E or T.

If the controlling end is the destination, it requests tables with the following command:

S <tablename> <null> etc...

Send details of table from SRC to DEST

T <tablename> <null> <pkey index>
    <column-name> <null> <typeid> 
    <column-name> <null> <typeid> 
    ...
  <null>

The destination's reply is either an error:

E <error message> <null>

or a sequence of transfer commands:

B <tablename> <null>            # Begin transfer from table name
C <pkey> <null> <4-byte csum>   # Check for one record with pkey
M <start pkey> <null> <end pkey> <null> <md5sum> <null> # Big block check (not implemented)
F                               # Done
A                               # Abort transfer, we got an error. rollback.

To which the source responds

U <pkey> <null>                 # Update record
    <column-data> <null>
    <column-data> <null>
    ...
  <null>
D <pkey> <null>                 # Delete record
I <pkey> <null>                 # Insert record
    <column-data> <null>
    <column-data> <null>
    ...
  <null>
F

The source then follows with another T, or with Q (which means done).
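The messages above share one framing convention: fields are null-terminated, and a record's column list ends with an extra null. A minimal sketch of two encoders (function names are mine, and whether a separator follows the command letter is not specified, so this appends fields directly; the real pgrsync framing may differ):

```python
def encode_check(pkey: str, csum: int) -> bytes:
    """Encode a 'C <pkey> <null> <4-byte csum>' message (sketch)."""
    return b"C" + pkey.encode() + b"\x00" + csum.to_bytes(4, "big")

def encode_insert(pkey: str, columns: list) -> bytes:
    """Encode an 'I <pkey> <null> <col> <null> ... <null>' message (sketch)."""
    out = b"I" + pkey.encode() + b"\x00"
    for col in columns:
        out += col.encode() + b"\x00"
    return out + b"\x00"    # extra null terminates the column list
```

The same framing would apply to U and T messages; only the leading letter and payload differ.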


Why big blocks
==============
Many large tables see mostly inserts and the occasional update, usually
only near the end. To avoid spending heaps of time sending checksums for huge
sections of the table that never change, the destination sends a big-block
message covering maybe 100 (1000?) records. The source checks whether that
block matches, and if so the entire block is accepted. If it is rejected, the
destination proceeds as normal. We use MD5 for the big block to be more
confident than with the plain 32-bit checksum.
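The text does not specify what the big-block MD5 is computed over; one plausible scheme, sketched here purely as an illustration, hashes the concatenated per-record 32-bit checksums for the block:

```python
import hashlib

def block_md5(record_csums):
    """MD5 over a block of per-record 32-bit checksums (assumed scheme)."""
    h = hashlib.md5()
    for c in record_csums:
        h.update(c.to_bytes(4, "big"))    # same byte order on both ends
    return h.hexdigest()

# The source accepts the whole block only when the digests agree;
# otherwise the destination falls back to per-record C messages.
```

Any scheme works as long as both ends hash exactly the same bytes in the same order.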

Options
=======

-o map:table=newtable
-o map:table.field=newfield
-o expr:table.field=field+1
-o "cond:table=billid is null"
-o fields:table=*,!wcost

Resolving these works as follows:

First the expr: options are evaluated. They must all use the original field
names, i.e. they cannot use each other's output. The output field name may be
a new field, or it may replace an existing field.

Next, the map: options are applied. These only rename fields. Semantically,
all renames happen simultaneously. A field cannot be mapped twice, and it is
also an error if two fields end up with the same name.

Finally the fields: list is stepped through to determine the final list. The
elements are:

xxx     Include field xxx
!xxx    Don't include field xxx
*       Include all fields
!*      Exclude all fields

Any * elements should appear first since they nullify the effect of any
prior elements.
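Stepping through the fields: list left to right can be sketched like this (a minimal illustration; the field names are invented):

```python
def resolve_fields(all_fields, spec):
    """Apply a fields: list ('*', '!*', 'xxx', '!xxx') left to right."""
    selected = []
    for elem in spec:
        if elem == "*":
            selected = list(all_fields)    # include all fields
        elif elem == "!*":
            selected = []                  # exclude all fields
        elif elem.startswith("!"):
            selected = [f for f in selected if f != elem[1:]]
        elif elem not in selected:
            selected.append(elem)
    return selected

# fields:table=*,!wcost from the example above:
resolve_fields(["bill", "wcost", "rcost"], ["*", "!wcost"])
```

This also shows why `*` should come first: it replaces the running selection wholesale, discarding anything chosen before it.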

Any cond: options are ANDed together. Any normal SQL functions may be used.
Any tables referenced must use their original names; the table mappings are
applied only at the very end.
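ANDing the cond: fragments amounts to building one WHERE clause, roughly like this (a sketch; the real query construction is not shown in this document):

```python
def build_where(conds):
    """AND together the cond: fragments for one table (sketch)."""
    if not conds:
        return ""
    # Parenthesise each fragment so operator precedence inside a
    # fragment cannot leak across the ANDs.
    return "WHERE (" + ") AND (".join(conds) + ")"

# e.g. cond:table=billid is null plus a second condition:
build_where(["billid is null", "cost > 0"])
```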

Encodings and orderings
=======================

The optimisation in this program relies somewhat on being able to determine
what order rows are supposed to be in, so that missing rows can easily be
identified. The simplest distinction is between numeric and string ordering,
which must be detected correctly for this to work.
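Presumably the type id carried in the T message settles this, but the distinction itself is easy to demonstrate: numeric and lexicographic orderings disagree as soon as values have different digit counts. A small illustration:

```python
def ordering_kind(values):
    """Classify key values as numerically or lexically ordered (sketch)."""
    try:
        [float(v) for v in values]    # all parse as numbers?
        return "numeric"
    except ValueError:
        return "string"

# "2" < "9" < "10" numerically, but "10" < "2" < "9" lexically,
# so picking the wrong ordering would misreport missing rows.
ordering_kind(["2", "10", "9"])
```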

A more serious problem is that of encodings, as the encoding determines the
default ordering of string output, in particular whether it is case-sensitive
or not. Because Perl uses case-sensitive comparisons, it may sometimes
disagree with the database and thus produce a non-optimal change set.
However, we can still see that the databases have the rows in the same order,
so in most cases this will not be a problem.

Even worse, if the source and destination use different encodings, the order
will differ, the program will *never* see the rows as the same, and it will
resend the same data each time. It is not clear what can be done about this.
Hopefully, tables whose primary key is a string with varying case are not
common. At least the end result still has the right data arriving.
