XQuery/Regular Expressions


Motivation

You want to test whether a text matches a specific pattern of characters. You want to replace patterns of text with other patterns. You have text with repeating patterns and you would like to break the text up into discrete items.

Method

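To deal with the above three problems, XQuery has the following functions:

- matches($input, $regex) returns true if $input contains a substring that matches the regular expression $regex.
- replace($input, $regex, $replacement) returns a copy of $input in which every substring that matches $regex is replaced by $replacement.
- tokenize($input, $regex) splits $input into a sequence of strings, using the substrings that match $regex as separators.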

Through these functions we have access to the powerful syntax of regular expressions.
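The three functions differ mainly in what they return: matches() yields a single boolean, replace() a single string, and tokenize() a sequence of strings. The following minimal sketch checks this with "instance of"; the sample input is invented for illustration, and each expression should evaluate to true().

```xquery
let $input := 'red, green, blue'
return
<result>{
  (: matches() returns a single xs:boolean :)
  (matches($input, 'green') instance of xs:boolean,
   (: replace() returns a single xs:string :)
   replace($input, ',\s*', ';') instance of xs:string,
   replace($input, ',\s*', ';') = 'red;green;blue',
   (: tokenize() returns a sequence of zero or more xs:string values :)
   tokenize($input, ',\s*') instance of xs:string*,
   count(tokenize($input, ',\s*')) = 3
  )
}</result>
```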

Summary of Regular Expressions

Regular expressions ("regex") are a field unto themselves. If you wish to derive full benefit from this way of describing strings with patterns, you should consult a separate introduction. Priscilla Walmsley's XQuery (Chapter 18) has a clear summary of the functionality offered.

In regular expressions, most characters represent themselves, so you are not obliged to use the special regex syntax in order to utilise these three functions. A dot (.) matches any single character except a newline. Immediately after a character or an expression such as a dot, you can add a quantifier that says how many times it may be repeated: "*" for "zero or more times", "?" for "zero or one time", and "+" for "one or more times". Quantifiers are greedy by default and match as much text as possible; adding "?" after a quantifier, as in "*?", makes it reluctant, so that it matches the shortest possible substring instead. NB: this only scratches the surface of the subject of regular expressions!
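A minimal sketch of the quantifiers and of greedy versus reluctant matching, using invented sample strings. Each comparison should evaluate to true(); note how the greedy pattern runs on to the last "o" it can reach, while the reluctant one stops at the first.

```xquery
<result>{
  (: "?" makes the preceding character optional :)
  (matches('color', '^colou?r$') = true(),
   matches('colour', '^colou?r$') = true(),
   (: "+" requires at least one occurrence, "*" also allows none :)
   matches('Hllo', '^He+llo$') = false(),
   matches('Hllo', '^He*llo$') = true(),
   (: greedy ".*" extends to the last "o" in "Hello World" ... :)
   replace('Hello World', 'H.*o', 'Bye') = 'Byerld',
   (: ... while reluctant ".*?" stops at the first "o" :)
   replace('Hello World', 'H.*?o', 'Bye') = 'Bye World'
  )
}</result>
```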

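The three functions all accept an optional flags argument that sets the matching mode. The following four flags are available:

- s: "dot-all" mode; the dot (.) also matches newline characters.
- m: multi-line mode; ^ and $ match at the start and end of each line rather than only at the start and end of the whole string.
- i: case-insensitive mode.
- x: whitespace characters in the pattern are ignored, so that a long expression can be laid out for readability.

Several flags can be combined in one string, for example "ix". If you do not need any flags, simply leave out the argument or pass the empty string "".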
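A minimal sketch of each flag on an invented two-line string; "&#10;" is an XQuery character reference that puts a newline into the string literal. Each comparison should evaluate to true().

```xquery
(: &#10; inserts a newline character into the string literal :)
let $input := 'first line&#10;second line'
return
<result>{
  (: "s": the dot matches the newline only in dot-all mode :)
  (matches($input, 'line.second') = false(),
   matches($input, 'line.second', 's') = true(),
   (: "m": ^ also matches right after each newline :)
   matches($input, '^second') = false(),
   matches($input, '^second', 'm') = true(),
   (: "i": case-insensitive matching :)
   matches($input, 'SECOND LINE', 'i') = true(),
   (: "x": spaces in the pattern are dropped, so use \s for real whitespace :)
   matches($input, 'first line', 'x') = false(),
   matches($input, 'first \s line', 'x') = true(),
   (: no flags: passing "" behaves like leaving the argument out :)
   matches($input, 'second', '') = matches($input, 'second')
  )
}</result>
```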

Examples of matches()

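let $input := 'Hello World'
return
<result>{
  (: each comparison evaluates to true(), confirming the expected result of matches() :)
  (matches($input, 'Hello') = true(),
   matches($input, 'Hi') = false(),
   matches($input, 'H.*') = true(),
   matches($input, 'H.*o W.*d') = true(),
   matches($input, 'Hel+o? W.+d') = true(),
   matches($input, 'Hel?o+') = false(),
   (: flags: "i" ignores case, "x" ignores whitespace in the pattern :)
   matches($input, 'hello', "i") = true(),
   matches($input, 'he l lo', "ix") = true(),
   (: ^ and $ anchor the pattern to the start and end of the string :)
   matches($input, '^Hello$') = false(),
   matches($input, '^Hello') = true()
  )
}</result>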


Examples of tokenize()

<result>{
(let $input := 'red,orange,yellow,green,blue'
return deep-equal( tokenize($input, ',') , ('red','orange','yellow','green','blue'))
 ,
let $input := 'red,
orange,	     yellow,  green,blue'
return  deep-equal(tokenize($input, ',\s*') , ('red','orange','yellow','green','blue'))
,
let $input := 'red   ,
orange  ,	     yellow    ,  green ,  blue'
return  not(deep-equal(tokenize($input, ',\s*') , ('red','orange','yellow','green','blue')))
,
let $input := 'red   ,
orange  ,	     yellow    ,  green ,  blue'
return  deep-equal(tokenize($input, '\s*,\s*') , ('red','orange','yellow','green','blue'))
)
}</result>

In the second example, "\s" matches a single whitespace character, such as the newline before "orange" or the tab before "yellow". Quantified with "*", as in ',\s*', it consumes any run of whitespace after a comma, but not before one; this is why the tokens in the third example still carry trailing spaces (hence the not()). To strip whitespace on both sides of each comma, use the pattern '\s*,\s*', as in the fourth example.
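A minimal sketch of two edge cases worth knowing when splitting strings, with invented inputs: a separator at the very start yields an empty first token, and '\s*,\s*' only trims whitespace around commas, not at the ends of the string. Each expression should evaluate to true().

```xquery
<result>{
  (: a separator at the very start produces a zero-length first token :)
  deep-equal(tokenize(',red,green', ','), ('', 'red', 'green')),
  (: '\s*,\s*' trims around commas, but not at either end of the string :)
  deep-equal(tokenize('  red , green  ', '\s*,\s*'), ('  red', 'green  ')),
  (: normalize-space() removes the outer whitespace before splitting :)
  deep-equal(tokenize(normalize-space('  red , green  '), '\s*,\s*'), ('red', 'green'))
}</result>
```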


Examples of replace()

<result>{
( 
let $input := 'red,orange,yellow,green,blue'
return ( replace($input, ',', '-') = 'red-orange-yellow-green-blue' )
,
let $input := 'Hello World'
return (
    replace($input, 'o', 'O') = "HellO WOrld" ,
    replace($input, '.', 'X') = "XXXXXXXXXXX" ,
    replace($input, 'H.*?o', 'Bye') = "Bye World" 
    )
,
let $input := 'HellO WOrld'
return ( replace($input, 'o', 'O', "i") = "HellO WOrld" )
,
let $input := 'Chapter 1 … Chapter 2 …'
return ( replace($input, "Chapter (\d)", "Section $1.0") = "Section 1.0 … Section 2.0 …" )
)
}</result>

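In the last example, "\d" matches any single digit, and the parentheses around it form a capturing group. In the replacement string, "$1" stands for whatever the first group matched, so each chapter number is carried over into the new text. Likewise "$0" stands for the entire match, and "\$" inserts a literal dollar sign.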
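A minimal sketch of capturing groups in replace(), using invented sample strings: several groups can be reordered, "$0" refers to the whole match, and "\$" produces a literal dollar sign. Each comparison should evaluate to true().

```xquery
<result>{
  (: three groups; $3, $2 and $1 reorder a year-month-day date :)
  (replace('2024-03-15', '(\d{4})-(\d{2})-(\d{2})', '$3/$2/$1') = '15/03/2024',
   (: $0 stands for the whole match :)
   replace('Chapter 1', '\d', '[$0]') = 'Chapter [1]',
   (: \$ is a literal dollar sign; $1 is still the first group :)
   replace('10 USD', '(\d+) USD', '\$$1') = '$10'
  )
}</result>
```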



References

Regular Expression Library: more than 2,600 sample regular expressions.

Regular Expression Cheat Sheet: a very useful summary of regular expression patterns.

XQuery and XPath Regular Expressions: how to use regular expressions within XQuery and XPath.

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.