This is an old revision of the document!


Case Sensitivity Issues in Primary Keys - Oracle to SQL Server Migration

In Oracle, primary keys are case-sensitive, so if two values differ only in case they can be used as two different primary keys.

In SQL Server, by default, primary keys are case-insensitive and when you transfer data from Oracle to SQL Server and then try to create the primary key constraints, you may have duplicate key errors.

Last Update: Oracle 11g R2 and Microsoft SQL Server 2012

Case Sensitivity in Primary Keys

Assume there is a table with primary keys that differ in case only:

Oracle:

   CREATE TABLE states
   (
      id CHAR(2) NOT NULL,
      name VARCHAR2(25)
   );
 
   -- Insert 2 values that differ in case only
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('Ca', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- Table altered

Unlike Oracle, SQL Server does not allow you to create the primary key in this case:

SQL Server:

   CREATE TABLE states
   (
      id CHAR(2) NOT NULL,
      name VARCHAR(25)
   );
 
   -- Insert 2 values that differ in case only
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('Ca', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- Msg 1505, Level 16, State 1, Line 1
   -- The CREATE UNIQUE INDEX statement terminated because a duplicate key was found
   -- The duplicate key value is (Ca).

By default, SQL Server treats values 'CA' and 'Ca' as equal.

Find Primary and Unique Keys that Differ in Case

You can use the following PL/SQL script to find all columns that are part of a primary or unique key, and differ in case only in Oracle for the current user:

Oracle:

   SET SERVEROUTPUT ON
 
   -- Cursor to select all character columns that are part of primary or unique key
   DECLARE CURSOR c1 IS
          SELECT c.table_name, cc.column_name
          FROM user_constraints c, user_cons_columns cc, user_tab_columns tc
          WHERE c.constraint_type IN ('P', 'U') AND 
                     c.constraint_name = cc.constraint_name AND c.table_name = cc.table_name AND
                     cc.table_name = tc.table_name AND cc.column_name = tc.column_name AND
                     tc.data_type IN ('VARCHAR2', 'NVARCHAR2', 'CHAR', 'NCHAR');
 
   c2 SYS_REFCURSOR;
   stmt VARCHAR2(300);
   dup_value VARCHAR2(300);
   total_found NUMBER := 0;
   total_columns NUMBER := 0;
   count_column BOOLEAN;
 
   BEGIN
 
   DBMS_OUTPUT.PUT_LINE('Opening a cursor...');
 
   -- For each column find duplicates - values that differ in case only
   FOR r1 IN c1 LOOP
 
      count_column := TRUE;
 
      -- Statement to select duplicate values 
      stmt := 'SELECT ' || r1.column_name || ' FROM ' || r1.table_name || 
                  ' WHERE UPPER(' || r1.column_name || ') <> ' || r1.column_name || ' AND ' ||
                  '  UPPER(' || r1.column_name || ') IN ' ||
                  '  (SELECT UPPER(' || r1.column_name || ') FROM ' || r1.table_name || 
                  '   GROUP BY UPPER(' || r1.column_name || ')' ||   
                  '   HAVING COUNT(*) > 1)';
 
      DBMS_OUTPUT.PUT_LINE('Checking ' || r1.table_name || '.' || r1.column_name || '...');
 
      OPEN c2 FOR stmt;
 
      -- Output all duplicate values 
      LOOP  
        FETCH c2 INTO dup_value;
        EXIT WHEN c2%NOTFOUND;
 
        DBMS_OUTPUT.PUT_LINE('''' || dup_value || '''');
 
        IF count_column = TRUE THEN
           total_found := total_found + 1;
           count_column := FALSE;
        END IF;
 
      END LOOP;
 
      total_columns := total_columns + 1;
 
   END LOOP; 
 
   DBMS_OUTPUT.PUT_LINE(total_found || ' duplicate columns found'); 
   DBMS_OUTPUT.PUT_LINE(total_columns || ' columns checked'); 
 
   END;
/

Sample output:

  Opening a cursor...
  Checking ML_SUBSCRIPTION.SUBSCRIPTION_ID...
  Checking STATES.ID...
  'Ca'
  1 duplicate columns found
  2 columns checked

  PL/SQL procedure successfully completed.

Note that this script checks each character primary key column individually, and even if duplicates are found, it may be not a problem for multi-column keys, but it is still worth checking.

Resolving Case-Sensitivity Issue in Primary Keys

If you transfer an Oracle table that contains primary keys that differ in case only, you will have errors when you try to create the primary key constraints in SQL Server, or data transfer errors if you create the primary key constraints before transferring the data.

Using Case-Sensitive Collation in SQL Server

The easiest way to solve this problem is to specify a case-sensitive collation for primary columns in SQL Server:

SQL Server:

   CREATE TABLE states
   (
      id CHAR(2) COLLATE Latin1_General_CS_AS NOT NULL,
      name VARCHAR(25)
   );
 
   -- Insert 2 values that differ in case only
   INSERT INTO states VALUES ('CA', 'California');
   INSERT INTO states VALUES ('Ca', 'California');
 
   ALTER TABLE states ADD PRIMARY KEY (id);
   -- Command(s) completed successfully.

CS in COLLATE Latin1_General_CS_AS collation specifies case-sensitivity for column data, and now values 'CA' and 'Ca' are different, so you can successfully create the primary key constraint.

Data Cleaning

In most cases, having multiple primary key values that differ in case only is a result of a data input error, i.e. multiple different primary keys points to the same entity. In this case it often makes sense to clean the data after you have transferred them to SQL Server, but before creating primary key constraints.

To do so, firstly you have to correct the foreign keys that refer to the table, and then remove duplicate primary keys. For more information, see Trailing Spaces in Primary Keys.

Resources

Author

Dmitry Tolpeko, dmtolpeko@sqlines.com - July 2012

Please contact me if you are interested in Oracle to SQL Server migration services, consulting, or training.

You could leave a comment if you were logged in.