Advertisements

Working with special character letters in Oracle database (grave accent, acute accent, circumflex accent, tilde, and umlaut)

We have some external data feeds that we receive as csv files, and we load them into our data warehouse and process them.    It turns out that data values that include special characters were getting messed up at some point before or during the load to the Oracle database.

In this post, I will go over one way to select, insert or update data values that contain these special characters.  The “special characters” I am referring to are the grave accent, acute accent, circumflex accent, tilde, and umlaut.

The most common one we come across is the “acute accent” which is present in words such as café or entrée (accent over the e in both words).

If you want to insert these words without the accents, into an example table, WORDS, with a single column, SPECIAL_CHAR_WORD, it would simply be:

insert into WORDS (SPECIAL_CHAR_WORD) values ('cafe');
insert into WORDS (SPECIAL_CHAR_WORD) values ('entree');

But if you want to insert these words with the accents, then you would need to do this:

insert into WORDS (SPECIAL_CHAR_WORD) values UNISTR('cafe\0301');
insert into WORDS (SPECIAL_CHAR_WORD) values UNISTR('entre\0301e');

To select the word café with the accent, run this statement:

select UNISTR('cafe\0301') from dual;

Once a column value is already in a table in the proper format, you can simply select the column name using a normal SQL select statement, and the output will show properly (maybe with just the need for some formatting as you will later in the article).

And for update – to update the word entree with the word entrée, run this statement:

update WORD set SPECIAL_CHAR_WORD = UNISTR('entre\0301e') where SPECIAL_CHAR_WORD = 'entree';

To see several special characters (grave accent, acute accent, circumflex accent, tilde, and umlaut), run this statement …

select UNISTR('bare\0300ge') from dual   -- barège    -- grave accent
union
select UNISTR('entre\0301e') from dual   -- entrée    -- acute accent
union
select UNISTR('pa\0302turer') from dual  -- pâturer   -- circumflex accent
union
select UNISTR('jalapen\0303o') from dual -- jalapeño  -- tilde
union 
select UNISTR('fu\0308r') from dual;          -- für    --umlaut

… the output would look like this … which seems not quite right …
Special_Characters_UNISTR

Add the COMPOSE statement (which is just for formatting) as shown below …

select COMPOSE(UNISTR('bare\0300ge')) as SPECIAL_CHAR_WORD from dual   -- barège    -- grave accent
union
select COMPOSE(UNISTR('entre\0301e')) from dual   -- entrée    -- acute accent
union
select COMPOSE(UNISTR('pa\0302turer')) from dual  -- pâturer   -- circumflex accent
union
select COMPOSE(UNISTR('jalapen\0303o')) from dual -- jalapeño  -- tilde
union 
select COMPOSE(UNISTR('fu\0308r')) from dual;     -- führer    --umlaut

and the output will look like this …

Special_Characters_UNISTR_and_COMPOSE2

As  you can see, the key to this is knowing the code for the special character you need, and then using the UNISTR function to add the special character to the rest of the text, and if necessary, use COMPOSE for display purposes.

Thanks for reading. Hope you found this helpful.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: