Trailing Spaces in PRIMARY KEY Columns - Oracle to SQL Server Migration

In Oracle, values in primary key columns of type VARCHAR2 and NVARCHAR2 may differ only in trailing spaces. Oracle treats such values as not equal and allows you to use them as different primary keys.

When you transfer such data to SQL Server and then try to create the primary key constraints, SQL Server raises a duplicate key error.

Last Update: Oracle 11g R2 and Microsoft SQL Server 2012

Trailing Spaces in PRIMARY KEY

Assume there is a parent table with primary key values that differ in trailing spaces only:

Oracle:

   CREATE TABLE states
   (
      id VARCHAR2(5) NOT NULL,
      name VARCHAR2(25)
   );
 
   -- Insert 2 values that differ in trailing spaces only
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('CA ', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- Table altered

Unlike Oracle, SQL Server does not allow you to create the primary key:

SQL Server:

   CREATE TABLE states
   (
      id VARCHAR(5) NOT NULL,
      name VARCHAR(25)
   );
 
   -- Insert 2 values that differ in trailing spaces only
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('CA ', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- Msg 1505, Level 16, State 1, Line 4
   -- The CREATE UNIQUE INDEX statement terminated because a duplicate key was found

SQL Server follows the ANSI/ISO SQL-92 specification that requires padding strings to be equal in length before comparing them that is why SQL Server treats values 'CA' and 'CA ' as equal.

CHAR Columns

Note that this migration problem applies to variable-length columns only in Oracle (VARCHAR2 and NVARCHAR2). For CHAR and NCHAR columns, Oracle also raises a duplicate key error, and does not allow to create a primary key constraint:

Oracle:

   CREATE TABLE states
   (
      id CHAR(5) NOT NULL,
      name VARCHAR2(25)
   );
 
   -- Insert 2 values differ in trailing spaces only (now it is CHAR column)
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('CA ', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- ERROR at line 1:
   -- ORA-02437: cannot validate (ORA.SYS_C0014033) - primary key violated

Find PRIMARY KEY and UNIQUE Values that Differ in Trailing Spaces

You can use the following PL/SQL script to find all VARCHAR2 and NVARCHAR2 columns that are part of a primary key and unique key that differ in trailing spaces only in Oracle for the current user:

Oracle:

   SET SERVEROUTPUT ON
 
   -- Cursor to select all VARCHAR2 and NVARCHAR2 columns that are part of primary or unique key
   DECLARE CURSOR c1 IS
          SELECT c.table_name, cc.column_name
          FROM user_constraints c, user_cons_columns cc, user_tab_columns tc
          WHERE c.constraint_type IN ('P', 'U') AND 
                     c.constraint_name = cc.constraint_name AND c.table_name = cc.table_name AND
                     cc.table_name = tc.table_name AND cc.column_name = tc.column_name AND
                     tc.data_type IN ('VARCHAR2', 'NVARCHAR2');
 
   c2 SYS_REFCURSOR;
   stmt VARCHAR2(300);
   dup_value VARCHAR2(300);
   total_found NUMBER := 0;
   total_columns NUMBER := 0;
   count_column BOOLEAN;
 
   BEGIN
 
   DBMS_OUTPUT.PUT_LINE('Opening a cursor...');
 
   -- For each column find duplicates - values that differ in trailing spaces only
   FOR r1 IN c1 LOOP
 
      count_column := TRUE;
 
      -- Statement to select duplicate values with trailing spaces
      stmt := 'SELECT ' || r1.column_name || ' FROM ' || r1.table_name || 
                  ' WHERE RTRIM(' || r1.column_name || ') <> ' || r1.column_name || ' AND ' ||
                  '  RTRIM(' || r1.column_name || ') IN ' ||
                  '  (SELECT RTRIM(' || r1.column_name || ') FROM ' || r1.table_name || 
                  '   GROUP BY RTRIM(' || r1.column_name || ')' ||   
                  '   HAVING COUNT(*) > 1)';
 
      DBMS_OUTPUT.PUT_LINE('Checking ' || r1.table_name || '.' || r1.column_name || '...');
 
      OPEN c2 FOR stmt;
 
      -- Output all duplicate values with trailing spaces
      LOOP  
        FETCH c2 INTO dup_value;
        EXIT WHEN c2%NOTFOUND;
 
        DBMS_OUTPUT.PUT_LINE('''' || dup_value || '''');
 
        IF count_column = TRUE THEN
           total_found := total_found + 1;
           count_column := FALSE;
        END IF;
 
      END LOOP;
 
      total_columns := total_columns + 1;
 
   END LOOP; 
 
   DBMS_OUTPUT.PUT_LINE(total_found || ' columns with trailing spaces found'); 
   DBMS_OUTPUT.PUT_LINE(total_columns || ' columns checked'); 
 
   END;
/

Sample output:

  Opening a cursor...
  Checking ML_SUBSCRIPTION.SUBSCRIPTION_ID...
  Checking STATES.ID...
  'CA '
  1 columns with trailing spaces found
  2 columns checked

  PL/SQL procedure successfully completed.

Note that this script checks each VARCHAR2 and NVARCHAR2 primary key column individually, and even if duplicates are found, it may not be a problem for multi-column keys, but it is still worth checking.

Resolving Trailing Spaces Problem for Oracle to SQL Server Data Migration

Primary key values that only differ in trailing spaces cause the problem when you migrate from Oracle to SQL Server. You cannot create the primary key in SQL Server (including the unique index on such columns), and you also cannot create the foreign key constraints that refer to this table.

Data Cleaning

In most cases, having multiple primary key values that differ in trailing spaces only is caused by a data input error, so it often makes sense to clean the data after you have transferred them to SQL Server, but before creating primary key constraints.

1. Correcting Foreign Keys

The first step is to correct the foreign keys that use the values with trailing spaces, and update them to use the values without trailing spaces.

Let's assume there is a child table using a foreign key value with a space:

Oracle:

   CREATE TABLE cities
   (
      name VARCHAR2(25),
      state VARCHAR2(5) REFERENCES states
   );
 
   -- Insert 2 rows: without and with trailing spaces in the foreign key
   INSERT INTO cities VALUES ('San Diego', 'CA');
   INSERT INTO cities VALUES ('San Francisco', 'CA ');

You can use the following PL/SQL script to find all VARCHAR2 and NVARCHAR2 columns that are part of a foreign key, contain trailing spaces, and can be replaced with the foreign key without trailing spaces.

The script generates UPDATE statements to update foreign key values, you can run them for SQL Server or Oracle database to clean the data:

Oracle:

   SET SERVEROUTPUT ON
 
   -- Cursor to select all VARCHAR2 and NVARCHAR2 columns that are part of a foreign key
   DECLARE CURSOR c1 IS
          SELECT c.table_name, cc.column_name, 
                      pcc.table_name AS pk_table_name, pcc.column_name AS pk_column_name
          FROM user_constraints c, user_cons_columns cc, user_tab_columns tc, user_cons_columns pcc
          WHERE c.constraint_type = 'R' AND 
                     c.constraint_name = cc.constraint_name AND c.table_name = cc.table_name AND
                     cc.table_name = tc.table_name AND cc.column_name = tc.column_name AND
                     pcc.constraint_name = c.r_constraint_name AND cc.position = pcc.position AND
                     tc.data_type IN ('VARCHAR2', 'NVARCHAR2');
 
   stmt VARCHAR2(300);
   cnt NUMBER := 0;
   total_found NUMBER := 0;
   total_columns NUMBER := 0;
 
   BEGIN
 
   DBMS_OUTPUT.PUT_LINE('-- Opening a cursor...');
 
   -- For each foreign key column find values with trailing spaces
   FOR r1 IN c1 LOOP
 
      -- Make sure that the parent table contains a primary key value without trailing spaces
      stmt := 'SELECT COUNT(*) FROM ' || r1.table_name || 
                  ' WHERE RTRIM(' || r1.column_name || ') <> ' || r1.column_name || ' AND ' ||
                  '  RTRIM(' || r1.column_name || ') IN ' ||
                  '  (SELECT ' || r1.pk_column_name || ' FROM ' || r1.pk_table_name || ')';
 
      DBMS_OUTPUT.PUT_LINE('-- Checking ' || r1.table_name || '.' || r1.column_name || '...');
 
      EXECUTE IMMEDIATE stmt INTO cnt;
 
      -- Generate UPDATE statement 
      IF cnt > 0 THEN
        DBMS_OUTPUT.PUT_LINE('UPDATE ' || r1.table_name);
        DBMS_OUTPUT.PUT_LINE(' SET ' || r1.column_name || ' = RTRIM(' || r1.column_name || ')');
        DBMS_OUTPUT.PUT_LINE(' WHERE RTRIM(' || r1.column_name || ') <> ' || r1.column_name || ';');
 
        total_found := total_found + 1;
      END IF;
 
      total_columns := total_columns + 1;
 
   END LOOP; 
 
   DBMS_OUTPUT.PUT_LINE('-- ' || total_found || ' columns with trailing spaces found'); 
   DBMS_OUTPUT.PUT_LINE('-- ' || total_columns || ' columns checked'); 
 
   END;
/

Sample output:

  -- Opening a cursor...
  -- Checking CITIES.STATE...
  UPDATE CITIES
  SET STATE = RTRIM(STATE)
  WHERE RTRIM(STATE) <> STATE;
  -- 1 columns with trailing spaces found
  -- 1 columns checked
 
  PL/SQL procedure successfully completed.

After UPDATE statements have been generated you can run them to correct the foreign keys to not to use the values with the trailing spaces.

2. Correcting Primary Keys

After you have removed the references to the primary keys with trailing spaces, you can remove them.

You can use the following PL/SQL script to find all VARCHAR2 and NVARCHAR2 columns that are part of a primary or unique key, contain trailing spaces, and have the same values but without trailing spaces.

The script generates DELETE statements to delete primary and unique key rows, and you can run them for SQL Server or Oracle database to clean the data:

Oracle:

   SET SERVEROUTPUT ON
 
   -- Cursor to select all VARCHAR2 and NVARCHAR2 columns that are part of a primary or  unique key
   DECLARE CURSOR c1 IS
          SELECT c.table_name, cc.column_name
          FROM user_constraints c, user_cons_columns cc, user_tab_columns tc
          WHERE c.constraint_type IN ('P', 'U') AND 
                     c.constraint_name = cc.constraint_name AND c.table_name = cc.table_name AND
                     cc.table_name = tc.table_name AND cc.column_name = tc.column_name AND
                     tc.data_type IN ('VARCHAR2', 'NVARCHAR2');
 
   stmt VARCHAR2(300);
   cnt NUMBER := 0;
   total_found NUMBER := 0;
   total_columns NUMBER := 0;
 
   BEGIN
 
   DBMS_OUTPUT.PUT_LINE('-- Opening a cursor...');
 
   -- For each foreign key column find values with trailing spaces
   FOR r1 IN c1 LOOP
 
      -- Make sure that the table contains a primary key value without trailing spaces
      stmt := 'SELECT COUNT(*) FROM ' || r1.table_name || 
                  ' WHERE RTRIM(' || r1.column_name || ') <> ' || r1.column_name || ' AND ' ||
                  '  RTRIM(' || r1.column_name || ') IN ' ||
                  '  (SELECT ' || r1.column_name || ' FROM ' || r1.table_name || ')';
 
      DBMS_OUTPUT.PUT_LINE('-- Checking ' || r1.table_name || '.' || r1.column_name || '...');
 
      EXECUTE IMMEDIATE stmt INTO cnt;
 
      -- Generate UPDATE statement 
      IF cnt > 0 THEN
        DBMS_OUTPUT.PUT_LINE('DELETE ' || r1.table_name);
        DBMS_OUTPUT.PUT_LINE(' WHERE RTRIM(' || r1.column_name || ') <> ' || r1.column_name || ';');
 
        total_found := total_found + 1;
      END IF;
 
      total_columns := total_columns + 1;
 
   END LOOP; 
 
   DBMS_OUTPUT.PUT_LINE('-- ' || total_found || ' columns with trailing spaces found'); 
   DBMS_OUTPUT.PUT_LINE('-- ' || total_columns || ' columns checked'); 
 
   END;
/

Sample output:

  -- Opening a cursor...
  -- Checking ML_SUBSCRIPTION.SUBSCRIPTION_ID...
  -- Checking STATES.ID...
  DELETE STATES
  WHERE RTRIM(ID) <> ID;
  -- 1 columns with trailing spaces found
  -- 2 columns checked
 
  PL/SQL procedure successfully completed.

After DELETE statements have been generated you can run them to delete the primary and unique keys with the trailing spaces.

Note. Before deleting the duplicate rows make sure that other non-primary key columns in the deleted rows do not contain more relevant and accurate data than the rows you want to leave.

Resources

Author

Dmitry Tolpeko, dmtolpeko@sqlines.com - July 2012

Please contact me if you are interested in Oracle to SQL Server migration services, consulting, or training.

You could leave a comment if you were logged in.