Define task for DataPump tool


Description

DataPump task is defined in a xml file. One file define one task. A task can have many relations to be pumped, and every relation may define many field maps for mapping source data and target. A relation represents relationship between a DML SQL statement and a table in target database.

Root node

The root node of each task defination file is "task".

Attributes

name description Required
source-driver JDBC driver class name for connecting source database. Be sure it is on class path. yes
source-db Url to source database. yes
source-user User name for connecting to source database. yes
source-password Password corresponding to source-user of source database. yes
target-driver JDBC driver class name for connecting target database. Be sure it is on class path. if it is the same of source driver, you can omit it. no
target-db Url to target database. yes
target-user User name for connecting to target database. yes
target-password Password corresponding to target-user of target database. yes
halt-on-error Speicify whether the whole task should halt if error occurs. Default "true". no
null-need-set If the value of a parameter of a SQL statement is null, whether it should be set. This depends on target driver. Default "false". e.g. For M$ SQL Server 2000 JDBC driver, it should be false; while for Postgresql 7.2 driver, it should be true. no

Convertion rules

Description

A convertion rule defines how a source value should be transform to the target. DataPump contains some built-in rules already, and you can define your own rule by implementing com.volcano.datapump.ConventionRule interface (for more infomation, please refer to api docs), define it in a "ruledef" node, and specify it for field maps.

Built-in rules

name description
default Default convertion rule for each field map, simplely copy source data to target field.
default-value Defines default value of a target column, you must specify the "value" parameter. This rule need none source column.
generate-serial Generate serial number for a column. And your map use a "step" parameter to define number step, default step is 1. This rule can automatic get current maximum value from the target column, so it is safe to insert into a non-empty table. This rule need none source column.
generate-enum Like the generate-serial rule, but this rule can define any enumeration values, you must specify the "enum" parameter, give the values to the rule, sepreted by semi-colon, for example, "new;old". It will start over the values when reaches highest value. You may alse specify "need-rollback" bool parameter (default "false"), define whether it should rollback if any following rule in the relation canceled the insertion. This rule need none source column.
date-to-long Transform java date object to its long presentation. e.g. For the source column is SQL Date/Time/TimeStamp type, while the target column is SQL BIGINT type.
reference A complex rule for mapping old relationship to new relationship of data. This rule read the source value, find it on a reference column of a reference table in target database, and transform it to reference table's id. You must specify 3 parameters: "reference-table", "reference-column" and "reference-id".
one-to-many The most complex built-in rule. The rule can map one source row to multiple target row, for deferent source columns, the rule generates multiple new rows to the target field. This rule is very useful for resolve data not comformance to RULE 2 of database design. You must specify the source columns in "sources" parameter, seperated by semi-colon, the source columns must be fetched out by the relation's SQL statement. You may alse specify "need-rollback" bool parameter (default "false"), define whether it should rollback if any following rule in the relation canceled the insertion. This rule don't use source column attribute.

ruledef node

You can define your own rule in ruledef node.

Attributes

name description requires
name The name of the rule for field maps. Must be unique to any built-in rule and custom rule. yes
class Class name of the rule class, this class should be found in class path. yes

relation node

Description

A relation represents relationship of source data to a target table. Usually it should contains 1..n map
nodes. If target-sql-type is "update", it should contains 2 map nodes : the first one represents
"set" clause and the second represents "where" clause.

Attributes

name description requires
sql The DML SQL statement for fecthing data from source database. yes
target-table The target table name in target database. yes
target-sql-type Specify the type of sql; "update" or "insert" . Default "insert" no
halt-on-error Specify whether data pump should continue when error occurs. Default "true". no

map node

Description

A map defines relationship of source column in relation's SQL statement and target column in relation's target table. A map can also specify convertion rule of the transformation.

Attributes

name description requires
source The source culumn name of the SQL statement of the relation. no
target The target column name of the target table. Same with source when omited. You must specify either source or target. no
rule The rule name for convertion. Default "default". no

Sample

Please see this sample file.