Oracle SQL/PLSQL Tips and Techniques: Oracle Regular Expression useful in preventing the SQL Injection through Oracle forms or other front end PUI.

The following examples taken from the Oracle Documentation http://docs.oracle.com/ for reference.

The following example examines phone_number, looking for the pattern xxx.xxx.xxxx. Oracle reformats this pattern with (xxx) xxx-xxxx.

SELECT
  REGEXP_REPLACE(phone_number,
                 '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})',
                 '(\1) \2-\3') "REGEXP_REPLACE"
  FROM employees;

REGEXP_REPLACE
--------------------------------------------------------------------------------
(515) 123-4567
(515) 123-4568
(515) 123-4569
(590) 423-4567
. . .

The following example examines country_name. Oracle puts a space after each non-null character in the string.

SELECT
  REGEXP_REPLACE(country_name, '(.)', '\1 ') "REGEXP_REPLACE"
  FROM countries;

REGEXP_REPLACE
--------------------------------------------------------------------------------
A r g e n t i n a
A u s t r a l i a
B e l g i u m
B r a z i l
C a n a d a
. . .

The following example examines the string, looking for two or more spaces. Oracle replaces each occurrence of two or more spaces with a single space.

SELECT
  REGEXP_REPLACE('500   Oracle     Parkway,    Redwood  Shores, CA',
                 '( ){2,}', ' ') "REGEXP_REPLACE"
  FROM DUAL;

REGEXP_REPLACE
--------------------------------------
500 Oracle Parkway, Redwood Shores, CA

Multilingual Regular Expression Syntax




Table C-1 Regular Expression Operators and Metasymbols


Operator Description

\ ^a The backslash character can have four different meanings depending on the context. It can:
Stand for itself
Quote the next character
Introduce an operator
Do nothing


* Matches zero or more occurrences
+ Matches one or more occurrences
? Matches zero or one occurrence
| Alternation operator for specifying alternative matches
^ ^b Matches the beginning-of-line character
$ ^b Matches the end-of-line character
. ^c Matches any character in the supported character set except NULL
[ ] ^d Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A nonmatching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list.
( ) Grouping expression, treated as a single subexpression
{m} Matches exactly m times
{m,} Matches at least m times
{m,n} Matches at least m times but no more than n times
\n ^e The backreference expression (n is a digit between 1 and 9) matches the n^th subexpression enclosed between '(' and ')' preceding the \n
[..] ^f Specifies one collation element, and can be a multicharacter element (for example, [.ch.] in Spanish)
[: :] ^g Specifies character classes (for example, [:alpha:]). It matches any character within the character class.
[==] ^h Specifies equivalence classes. For example, [=a=] matches all characters having base letter 'a'.



Notes on the POSIX operators and Oracle enhancements:

^a '\': The backslash operator can be used to make the character following it normal if it is an operator. For example, '\*' is interpreted as the asterisk string literal.

^b '^' and '$': The characters '^' and '$' are the POSIX anchoring operators. By default, they match only the beginning or end of an entire string. Oracle lets you specify '^' and '$' to match the start or end of any line anywhere within the source string. This in turns lets you treat the source string as multiple lines.

^c '.': In the POSIX standard, the "match any character" operator ('.') is defined to match any English character except NULL and the newline character. In the Oracle implementation, the '.' operator can match any character in the database character set, including the newline character.

^d '[]': In the POSIX standard, a range in a regular expression includes all collation elements between the start and end points of the range in the linguistic definition of the current locale. Therefore, ranges in regular expressions are linguistic ranges rather than byte values ranges, and the semantics of the range expression are independent of character set. Oracle implements this independence by interpreting range expressions according to the linguistic definition determined by the NLS_SORT initialization parameter.

^e '\n': The backreference expression '\n' matches the same string of characters as was matched by the n^th subexpression. The character n must be a digit from 1 to 9, designating the n^th subexpression, numbered from left to right. The expression is invalid if the source string contains fewer than n subexpressions preceding the \n. For example, the regular expression ^(.*)\1$ matches a line consisting of two adjacent appearances of the same string. Oracle supports the backreference expression in the regular expression pattern and the replacement string of the REGEXP_REPLACE function.

^f '[..]': A collating element is a unit of collation and is equal to one character in most cases, but may comprise two or more characters in some languages. Historically, regular expression syntax does not support ranges containing multicharacter collation elements, such as the range 'a' through 'ch'. The POSIX standard introduces the collation element delimiter '[..]', which lets you delimit multicharacter collection elements such like this one as follows: '[a-[.ch.]]'. The collation elements supported by Oracle are determined by the setting of the NLS_SORT initialization parameter. The collation element is valid only inside the bracketed expression.

^g '[::]': In English regular expressions, range expressions often indicate a character class. For example, '[a-z]' indicates any lowercase character. This convention is not useful in multilingual environments where the first and last character of a given character class may not be the same in all languages. The POSIX standard introduces a portable character class syntax, '[::]'. Oracle supports the following character classes, based on character class definitions in NLS classification data:


Character Class Syntax Meaning

[:alnum:] All alphanumeric characters
[:alpha:] All alphabetic characters
[:blank:] All blank space characters.
[:cntrl:] All control characters (nonprinting)
[:digit:] All numeric digits
[:graph:] All [:punct:], [:upper:], [:lower:], and [:digit:] characters.
[:lower:] All lowercase alphabetic characters
[:print:] All printable characters
[:punct:] All punctuation characters
[:space:] All space characters (nonprinting)
[:upper:] All uppercase alphabetic characters
[:xdigit:] All valid hexadecimal characters



This character class syntax lets you make better use of NLS character definitions to write flexible regular expressions. These character classes are valid only inside the bracketed expression.

^h '[==]': Oracle supports the equivalence classes through the POSIX '[==]' syntax. A base letter and all of its accented versions constitute an equivalence class. For example, the equivalence class '[=a=]' matches ä and â. The equivalence classes are valid only inside the bracketed expression.

Restriction on equivalence classes: Composed and decomposed versions of the same equivalence class do not match. For example, ä does not match a followed by umlaut.

Oracle SQL/PLSQL Tips and Techniques

Thursday, April 17, 2014

Oracle Regular Expression useful in preventing the SQL Injection through Oracle forms or other front end PUI.

No comments:

Post a Comment

Operator	Description
\ ^a	The backslash character can have four different meanings depending on the context. It can: Stand for itself Quote the next character Introduce an operator Do nothing
*	Matches zero or more occurrences
+	Matches one or more occurrences
?	Matches zero or one occurrence
\|	Alternation operator for specifying alternative matches
^ ^b	Matches the beginning-of-line character
$ ^b	Matches the end-of-line character
. ^c	Matches any character in the supported character set except `NULL`
[ ] ^d	Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A nonmatching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list.
( )	Grouping expression, treated as a single subexpression
{m}	Matches exactly m times
{m,}	Matches at least m times
{m,n}	Matches at least m times but no more than n times
\n ^e	The backreference expression (n is a digit between 1 and 9) matches the n^th subexpression enclosed between '(' and ')' preceding the \n
[..] ^f	Specifies one collation element, and can be a multicharacter element (for example, [.ch.] in Spanish)
[: :] ^g	Specifies character classes (for example, [:alpha:]). It matches any character within the character class.
[==] ^h	Specifies equivalence classes. For example, [=a=] matches all characters having base letter 'a'.

Character Class Syntax	Meaning
`[:alnum:]`	All alphanumeric characters
`[:alpha:]`	All alphabetic characters
`[:blank:]`	All blank space characters.
`[:cntrl:]`	All control characters (nonprinting)
`[:digit:]`	All numeric digits
`[:graph:]`	All `[:punct:]`, `[:upper:]`, `[:lower:]`, and `[:digit:]` characters.
`[:lower:]`	All lowercase alphabetic characters
`[:print:]`	All printable characters
`[:punct:]`	All punctuation characters
`[:space:]`	All space characters (nonprinting)
`[:upper:]`	All uppercase alphabetic characters
`[:xdigit:]`	All valid hexadecimal characters