Data Analysis & Visualization
Data Compare
There are three tools in AQT for comparing data:
comparing results. If you have two Data Display windows open, you can compare the data displayed in these windows. This provides a simple and quick method of comparing different tables. However it has some distinct limitations, for instance it doesn't cope well with rows being inserted / deleted from one of the tables. Once such as difference has been found it will flag all subsequent rows as being different.
This compare tool also allows you to compare a Data Display window with some saved-results. This allows you to compare a table / query at two different points in time. This can be very useful if you are dealing with time-varying data.
Data Compare tool. This is a full featured tool for comparing tables or queries. This can handle rows inserted / deleted between the tables. In addition, this can generate a script to resync the contents of the tables.
Compare Multiple Tables. This can be used to compare a large number of tables, for instance all tables in a schema.
Generate Resync Script
AQT can generate a script to resynchronize the contents of the two tables. Options for this are specified on the Script Options tab.

In the Generate Resync Script box you can specify whether:
no script is to be generated
AQT is to generate a script that makes the Compare table the same as the Compare to table. This script will consist of a number of insert / update / delete statements against the Compare table.
AQT is to generate a script that makes the Compare to table the same as the Compare table. This script will consist of a number of insert / update / delete statements against the Compare to table.
Resync File specifies the name of the script file that AQT will generate.
Ensure lines are less than 72 bytes. This option is useful when generating SQL scripts for DB2 z/OS.
Note: when generating Update and Insert statements, AQT will only include the columns which are being compared. If you exclude a column from the compare, it will not be included in the Update / Insert. Because no matching column has been specified, AQT is unable to know what value to use for the Update / Insert.
View / Run the Resync script
These options are specified in Running the Resync Script.
Resyncing a Query
If you are comparing a query (rather than a table), you can only resync this if it is an Updateable Query. This is one which:
only queries a single table
isn't a summary query
This is the limitation of (most) databases, rather than AQT.
Which Statements to Generate
These options specify which SQL statements to include in the script. These allow you to control what categories of changes are to be applied to the target table.
By default AQT will include update, insert and delete statements. You may only wish to de-select deletes, for instance, if you only want updates and inserts applied to the target table. This can be used to avoid inadvertent deletion of data.
Generate Updates as Delete / Inserts. When this option is selected, AQT will generate an update statement as a delete followed by an insert. This is useful in circumstances when foreign key relationships on a table prevent an update from happening. In this case a delete/insert may work more successfully.
When this option is used, deletes and inserts statements will be created even if the insert and delete options are de-selected. The delete/insert can be considered as an update.
Bi-directional Resync
These options allow you to perform a bi-directional resync, which can be useful in some circumstances. For instance you might have an environment where rows are added to two tables in different databases and/or sites, and deletes/updates are not done. You may wish to resync the tables so they both contain all the rows. This requirement can be difficult to achieve by other means, however is easy with the Data Compare:
specify that only inserts are included in the script
resync TableA to TableB
resync TableB to TableA
After this, both tables will then have all the rows.
Generate the Update / Insert / Delete statements to separate files
By default, the resynchronization script will have all the SQL statements in a single file.
Sometimes it can be useful to have the Update / Insert / Delete statements in separate files - this is can be needed if you are resynchronizing a set of tables that have foreign keys defined between them. In this case the update / inserts / deletes for the various tables have to be run in a particular order.
To do this, go to the More Options tab and click on Generate Updates, Inserts, Deletes to separate files. AQT will write the update, insert, delete statements to the specified files, as well as to the "main" resynchronization script file.
Only write to these files (and not to the main script file)
When this option is selected, AQT will only write to the three update/insert/delete files, and not to the main script file.
Specifying the Columns
You can specify which columns in your tables are to be compared by clicking on the Columns tab. This shows you the columns in your tables / queries.

on the left grid, select the Compare checkboxes to include or exclude a column from the compare.
on the right grid, you can move entries up or down. You do this to indicate which column in second table matches the column in the first table.
the Insert Entry and Delete Entry buttons are useful if you need to add a blank row to the grid. You will need to do this if your first table has a column which doesn't correspond to any column in the second table. In this case you match this column to a blank entry.
columns which will not be compared are "greyed-out".
the Key checkbox can be used to specify which columns in the table comprise the unique key
Running the Compare
To run the Compare click on the Compare button. To stop the compare, click on Abort.

Once the compare has completed, you will be shown the compare results on the Results tab.
the two tables are shown side-by-side. Key columns are shown as black-text on light-yellow background. Columns which are not compared are shown as grey-text on white-background.
the icon on the left shows you whether that row was compared OK, is different, or is not present in one of the tables.
values which are different between the two tables are shown in yellow.
as you scroll through the tables, the other table will scroll with it.
as you click on cell, AQT will show you, in the status bar, the value of the cell in the other table.
if you double-click a cell (or right-click and select View Cell), you will be taken to the Data Compare Detail window. This shows you the two values side-by-side for easy comparison of the values.
you can view the detail of a row by right-clicking a cell and selecting Show Detail.
if you have a LOB-file, you can right-click and select View as to view the LOB value in an external application.
you can print a grid with File > Print Grid (or Ctrl+P). This will print the grid you have last clicked-on.
you can copy a set of cells by selecting them and hitting Edit > Copy Cells (or Ctrl+C). You can select all cells in a grid with Edit > Select All (or Ctrl+A).
Only seeing rows which are different
if you are comparing large tables, it is recommended that, on the Options tab, you select Only show that are different. When this is selected, only rows which are different are written to the grid. This will reduce the overhead involved in writting the data to this grids.
if you don't use the above option, but are displaying all the rows, you can see the rows that are different by clicking on View > Only show rows that are different. This will hide all rows in the grid where the rows are the same. Hiding rows is a slow operation, so this is not recommended if your tables are large.
you can re-display all rows with View > Show all rows.
another option is to use Edit > Find non-matching row (or F3). This allows you to move through the grids finding the rows which are different.
on the Show Compare Detail window is an option Next/Prev Diff. This allows you to move through the compare results only seeing the rows which are different.
Copying Results to Excel
You can copy the Compare Results to Excel with Edit > Export Results to Excel.
A dialog box will be shown where you can specify the name of the Excel file and the name of the worksheet.
you can specify <schema> or <tname> in the file or sheet names. AQT will substitute the schema and table name of the Compare table in the names.
if the sheet already exists, it will be overridden. There is no option to append to an existing sheet.
if you check the box Create new Sheet if Exists then, if the specified sheet already exists, a new sheet will be created. This name of the new sheet will be the specified sheet name with a sequence number added onto the end.
Exporting to Excel is slow; this feature is not recommended if you have a lot of data. If your tables are large, you can reduce the amount of data in the result grid by specifying the option Only show rows that are different.
The data exported to Excel will include the highlight color indicating that values are different. However the first column in the Excel file will not have the tick or cross icons showing the status the cell. Exporting images to Excel is very slow; instead the first column will have a color indicating the status of the row:
green - rows are the same
red - rows have some cells which are different
orange - row present in one table but not the other
Writing the results to an Excel spreadsheet
The Compare can write the results to an Excel spreadsheet. This is done with the following options on the More Options tab.

Some notes:
this function can be used when running the compare in batch
this function can be used when using Compare all Tables in Schema
data is written to the Excel file while the compare is running. If there are any errors with writing to the Excel file, the compare will stop.
data is written to the Excel file using Native Excel interface. This requires exclusive access to the Excel file, so you can't have it open in Excel while you are running this.
if you are comparing a large amount of data, this could result in a large amount of data being written to the Excel file, especially if Only write differences is de-selected. This can slow down the compare, plus use a lot of disk space.
It is not recommended to use an Excel file format of 1997. Excel files in this formats cannot have more than 32K rows.
This is a separate function to Copy Results to Excel, which copies the data in the results grid to Excel and is only available when Compare is run in interactive mode.
Specifying the Sheet Name
The sheet name can include:
<dbname>
the database containing the table being compared<schema>
the schema of the table being compared<tname>
the name of the table being compared<date>
the current date in yyyymmdd format<time>
the current time in hhmmss format
You can include these specifications in a longer string - eg. Compare_<tname>_<date>
or <schema>_<tname>
If a worksheet of this name already exists, it will be overwritten. If you are running the compare multiple times, you can avoid this by using <date> and/or <time> in the sheet name.
Results
You can see the results by clicking on the View button to view the Excel file.
Note that the Excel file will not open in the sheet which was just written. You will need to navigate to this.
You must close the Excel file before running the compare again.

Show Compare Detail
You can view the detail of the Compare Result by right-clicking a row and selecting Row Detail. This will show you the compare detail for a particular row, which can be a useful alternate way of viewing the results of the compare.

Values which are different are highlighted in yellow. Note that in this example, DATE_JOINED has not been compared so is not highlighted.
You can step though the Compare Results with Prev and Next.
Prev Diff and Next Diff move to the previous/next row which were different. This provides a fast way to move through all the differences in a compare.
If you have large text values, you can double-click a value to be shown the Data Compare Detail window.
Resyncing the two rows
If you are generating a script to resync the two tables,and the rows are different, the buttons Show Resync SQL and Run Resync SQL will be active:
clicking on Show Resync SQL will show you the SQL which will resync these two rows
Run Resync SQL will run the SQL to resync the rows
After running the SQL, the Compare Results will still show the rows as being different - you will need to rerun the compare for this to be updated.
This feature makes it easy for you to select which rows you want to be resynced.
Compare Performance
The Data Compare is designed to run against very large tables. It is commonly run against tables with millions of rows.
AQT will read the data from the database in primary-key sequence, and will do a merge-match of the results. This is the most efficient way of doing a compare. AQT does not hold the data in-memory or sort the data itself.
The compare scales linearly. This means that a compare of 100,000 rows will generally take about 10 times longer than a compare of 10,000 rows.
How to improve Performance for large tables
When running against large amounts of data:
select Only show rows than match. By default, this option is not selected; AQT will then write the full contents of both tables to the result grids. For large tables, this is slow and will use a lot of memory.
even when Only show rows that match is selected, AQT will write mis-matched rows to the grids. If you have a large number of rows which are different, this will slow down the compare. To avoid this:
if you don't expect there to be many differences, set Stop after x Differences to a low value. This will terminate the compare if it hits a lot of differences, otherwise it can take a long time to finish.
run the compare as a batch script. This does no screen processing, so is the fastest way of doing a compare. You can run the compare script from the Run SQL window (eg. can be done from interactive AQT).
limit the columns being compared to just the ones that need to be compared. The more columns being compared, the slower the compare. Large text columns are slower than small text columns.
one of the bottlenecks in the compare is network performance. If feasible, run the compare on the server that holds the database, or a machine with a small network path to it.
within AQT the bottleneck is CPU speed. Running the compare on a fast machine will make a lot of difference to the speed of the compare.
Slow time for the compare to start
AQT runs the compare by running a statement such as the following against both tables:
select * from table order by primary-key-columns
Some databases, such as DB2, will process this SQL by fetching the table to a temporary tablespace and sorting it before returning the first row to AQT. Depending on the size of the table, this can take a long time and the compare will appear to be "stuck" and not doing much. Once DB2 starts to return the data, the compare will run fast.
Running Data Compare in Batch
Compare is run in batch by using the --aqt datacomp statement. This can be run either in a batch file or in the Run SQL window. This is discussed in more detail in Scripting - Datacomp.
The data compare tool can be used to compare and resynchronise tables in unattended mode (such as overnight processing). An example of this is as follows:
--aqt connect,dbs=aqtdemo
--aqt options,stoponerror=yes
--aqt datacomp... etc ...genfile=C:\Program Files\AQT\datacomp_script.sql
--aqt include,file=C:\Program Files\AQT\datacomp_script.sql
The --aqt datacomp statement generates the resync script file in datacomp_script.sql; this then gets run with the --aqt include statement.
How the Data Compare compares values
The Data Compare will compare values in different ways depending on the data types of the columns.
Character Columns
Data Compare will use a simple string comparison:
if Ignore Case is specified, AQT will do a case-independent compare - eg. test and TEST will be regarded as being equal.
if you specify Options > Display Options > Remove trailing spaces from string values, the Data Compare will ignore trailing spaces on the strings - eg. 'test' and 'test ' will regarded as being equal.
Numeric Compare
When you are comparing numeric values, AQT will convert the numeric value to a common format before doing the compare. This common format:
removes the + sign at the start
removes any trailing zeros after the decimal point
So 1234.00, +1234 will compare successfully to 1234.
Numeric Compare Option
When you are comparing numeric columns, AQT has a choice of using a Character Compare or Numeric Compare (as discussed above). Whether AQT will do this is determined by the Numeric Compare Option. This takes values:
0. Do not use Numeric Compare (Character Compare will be used). In this case 1234.00 and 1234 will be deemed to be different values.
1. Use Numeric Compare if either of the two columns are defined to be numeric.
2. Use Numeric Compare only if both the columns are defined to be numeric.
3. Always use Numeric Compare. This option will can be used to do a numeric compare on columns even if they are defined as character.
In the case of options 1 and 3, AQT will check the values of the column to see if they are numeric. If so, a numeric compare is done, else a character compare is done.
LOB Columns
If you are wishing to compare LOB values (such as BLOBs or CLOBs), then it is recommended that you select Compare full LOB values. When this option is selected, AQT will:
read the full LOB values, and write the LOB values to files
do a bit-by-bit comparison of the LOB files
This will result in a complete compare of the LOB values. The difference of even a single bit will result in the compare being unsuccessful.
Date Columns
Date values are complicated as there are a number of different date data types - Date, Datetime and Timestamp. In addition, different databases have different variations on the date data types.
To enable the different date values to be compared AQT will convert the date values to a common format, which is:
in the case if timestamp columns, if the decimal part is 0, it is ignored
in the case of timestamp and datetime columns, if the time part is zero it is ignored.
This means that the following values will be compared to be the same value:
DB2 Timestamp - 2008-08-04-00:00:00.000000
Oracle Datetime 2008-08-04-00:00:00
DB2 Date 2008-08-04
However a DB2 Timestamp of 2008-08-04-16:37:23.000000 will not compare successfully to a DB2 Date of 2008-08-04 as the time part of the timestamp value is non-zero.
Timestamp scale
The timestamp scale is the sub-second part of the timestamp value. In the case of the value 2008-08-04-16:37:23.123456 the scale is 123456.
When timestamps are compared, the scales must match. For instance:
Oracle Timestamp | DB2 Timestamp |
|
---|---|---|
2008-08-04-16:37:23 | 2008-08-04-16:37:23.000000 | Compares OK (as the Oracle value has an implicit scale of 000000) |
2008-08-04-16:37:23 | 2008-08-04-16:37:23.123456 | Not the same (as the scales are different) |
However, if you specify Ignore diff in timestamp scale... | ||
2008-08-04-16:37:23 | 2008-08-04-16:37:23.123456 | Compares OK (as the differences in timestamp scale is ignored) |
Note that the scale will only be ignored when the scale is a different length between the two databases. So, if you are comparing two DB2 timestamp values, they much match exactly (including the scale) even when Ignore diff in timestamp scale is specfied.
Comparing Character to non-Character values
This covers the case of comparing a character column to a numeric column, or a character column to a date column.
In both these cases the numeric or date value will be converted to its common format, then this compared to the string value. So:
character value '1234' will compare successfully to numeric values 1234.00 and +1234. However the character value '1234.00' will not.
the string value '2008-08-04' will compare successfully to the timestamp value 2008-08-04-00:00:00.000000 . However the character value '2008-08-04-00:00:00.000000' will not.
Collation Order problem with Data Compare
The Data Compare doesn't work very well if you are comparing between databases which have different collating orders. When this is the case, the databases can return the data in different orders, even though the same query (select * from table order by pkcols) is run against them. This will cause AQT's merge-match processing to think the tables are different, when they may be the same.
We do not have a generic solution to this problem. However the following comments may be of some help.
Oracle
For Oracle, there are two ways you can specify the order in which rows are returned.
The NLS_SORT session parameter specifies the collation order used for sorts. You can set this with a statement such as the following (in the Run SQL window):
alter session set nls_sort=ascii7
This can be used to ensure that both databases involved in the compare return the data in the same order.
Alternatively, you can specify that the compare uses a Query (rather than a Table) and use a NLSSORT function on the ORDER BY statement. Example:
select * from table order by
NLSSORT(pkcol,
'NLS_SORT = ascii7')
DB2 for z/OS
For DB2 for z/OS, you can return the data in a particular collating sequence by two methods. In both of these cases you need to specify that you are comparing a Query rather than a Table.
You the CAST function to cast your primary-key columns into a different collation sequence. Example:
select cast(pkcol as varchar(20) CCSID ASCII) as pkcol1, col2, col3 etc from table order by 1
If you are running DB2 for z/OS v9 or later you can use the COLLATION_KEY function on the ORDER BY. Example:
select * from table order by COLLATION_KEY(pkcol, 'UCA400R1_LEN_S3')
DB2/UDB
Similar to the above:
you can use the CAST method for returning rows in a particular order.
for DB2/UDB v9.5 or higher, there is a COLLATION_KEY_BIT function. This is similar to the COLLATION_KEY function in the above example.
SQL Server
For SQL Server you can specify a collation with:
select * from table order by pkcols COLLATE collation-name
You can get a list of valid collation-names with the following query:
SELECT * FROM sys.fn_helpcollations() WHERE name NOT LIKE 'SQL%'
A collation of Latin1_General_100_BIN after works well.
Order-By Mask
This option appears on the More Options tab.
This option makes it easy for you to amend the ORDER BY statement with a particular collation function. Examples:
NLSSORT(:, 'NLS_SORT = ascii7')
COLLATION_KEY(:, 'UCA400R1_LEN_S3')
When this is specified, this function is applied to all character columns in the key. The column name is substituted wherever a colon appears in this function.
This option allows you to use these functions in the Order By clause and continue to compare a Table rather than a Query (this is useful if you are also wishing to use Two Pass Compare, as Two Pass Compare cannot be used when you are comparing a query).
Two-Pass Compare
Two Pass Compare can be specified on the More Options tab:
Use two pass compare - whether to use this feature.
Delay between first and second passes - the number of seconds to wait between first and second passes
Generate difference file - whether to generate a file containing the keys of the rows that are different between the two tables. For two pass compare, this must be specified.
Difference file. This is the name of the difference file. If this is omitted it will default to datacomp_diff.txt in your default directory.
Two-Pass Compare can only be used when you are comparing Tables, and not when you are comparing Queries.
Introduction
Two-Pass compare was developed to deal with two problems that can happen with the Data Compare:
tables are being updated while compare running. This can happen if the compare is run against tables in an operational system. Because the data is continuously being updated, the compare can detect some differences that (a few seconds later) will not exist.
collation order problem. This happens when the tables are in databases that have different collation orders.
How it works
Two pass compare works as follows:
the first pass of the compare works as per normal compare, except that it generates a difference file. The difference file contains a list of the rows that are different between the two tables.
the second pass of the compare reads the difference file and re-compares the rows to see whether or not they are still different.
AQT will wait the Delay period between the first and second phases. If you are running the compare against an operational system, you should set the delay period to a time greater than the largest commit-interval of your database processes.
If you are having problems with the collation order problem, then the re-compare will correctly compare the rows that were not compared successfully due to the collation problem.
Database Access
The first and second phases work quite differently in the way they access the data:
the first pass reads all the rows in the table by doing a sequential scan of the table. This is a very fast way of reading a table.
the second pass reads each of the difference-rows individually to see whether they are still different. This is fast if you only have a few rows, but very slow if you have large number of rows which are different.
Consequently, we only recommend using the Two-Pass Compare if you expect relatively few rows to be different.
Multi-Pass Compare
When running the Data Compare in batch mode, you have more flexibility about how the two-pass compare is run, including using multiple passes.
Normal two-pass compare
This can be done by specifying, on the --datacomp control statement, the following additional parameters:
twopass=y,gendiff=y,diff_file=<filename>
twopass specifies that two-pass compare is to be used.
gendiff specifies that a difference file be generated.
diff_file is the name of the difference file. Note: replace <filename> with the name of your difference file.
Alternatively
A two-pass compare can also be run by running the compare twice. The first time by specifying:
gendiff=y,diff_file=<filename>
and the second time by specifying
diff_file_in=<filename>
by specifying diff_file_in on the second compare, it will run in second pass mode. Eg. the compare will read the difference file and re-compare all the rows.
note that <filename> is the same in both cases.
Running the compare as two steps like this gives you more flexibility:
you can run the two compares at different times - for instance the main step during a convenient batch window, and the second step several hours later.
you can manually review / edit the difference file between the two compare steps.
Multiple passes
The above idea can be extended to run the compare in multiple passes. To do three passes:
on the first pass code:
gendiff=y,diff_file=<filename>
on the second pass code:
gendiff=y,diff_file_in=<filename>,diff_file=<filename2>
on the third pass code:
diff_file_in=<filename2>
The interesting point here is the second pass. This does second-pass processing PLUS it generates a difference file of the differences that still remain at the end of this compare pass. This second difference file is then input to the third step.
Note that <filename> and <filename2> can be specified as the same file. This will not cause a problem within AQT.
Layout of the Difference File
If you wish to view / edit the difference file, it is in the following format:
row-key;grid1-rowid;grid2-rowid;description;
where
row-key is the key of the database rows that are different. If there are multiple columns in the key, the values are comma separated.
grid1-rowid and grid2-rowid. These are only used when running the compare in interactive mode. It is the rowids of the rows in the Results grids. AQT needs this information to amend or delete these rows in these grids with the results of the second-pass of the compare.
description is a brief description of the difference between the tables. This is provided for information only.
If you are running AQT batch mode, and are adding or editing values in the difference file, you only need to specify a value for row-key.