XQuery/Lorum Ipsum text

Motivation

You want to create realistically-sized example XML for testing or demonstration. Lorem ipsum text is often used as filler content, and it would be useful to insert such text wherever it is needed in an XML file.

We explore two approaches: one that modifies the serialised text of the document, and one that modifies the XML tree directly.

Approach 1 - string replacement

The places in the incomplete XML file where lorem ipsum text is to be inserted are marked with an ellipsis ("..."). The XML file is read and serialised to a string, the string is split at each ellipsis, and the parts are re-assembled with a randomly chosen run of lorem ipsum words in place of each ellipsis. The resulting string is then parsed back into XML for output. The base lorem ipsum text is stored as an XML file:

http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/words.xml

Concepts used

XQuery

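(: Re-join the parts, inserting a random run of words at each former marker position :)
declare function local:join-random($parts, $words) {
  if (count($parts) > 1)
  then
    (: util:random(100) returns 0 to 99; adding 1 avoids a zero start position or zero length :)
    let $randomtext := string-join(subsequence($words, util:random(100) + 1, util:random(100) + 1), " ")
    return string-join(($parts[1], $randomtext, local:join-random(subsequence($parts, 2), $words)), "")
  else $parts
};

let $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum
let $words := tokenize($lorumipsum, "\s+")
let $file := request:get-parameter("file", ())
let $doc := doc($file)/*
(: serialise the document so that the markers can be located as plain text :)
let $docText := util:serialize($doc, "media-type=text/xml method=xml")
(: the dots are escaped because "." is a regular-expression metacharacter :)
let $parts := tokenize($docText, "\.\.\.")
let $completedText := local:join-random($parts, $words)
return util:parse($completedText)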

Example

Explanation
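
The split-and-rejoin step can be tried in isolation. The following is only a sketch that uses standard functions: the inline string and the five-word list are made-up stand-ins for the serialised file and for words.xml, and a fixed slice of words replaces the random one so that the result is repeatable.

```xquery
(: Hypothetical inline input; the real script serialises a file instead :)
let $docText := "<doc><para>...</para><note>Fixed text</note><para>...</para></doc>"
(: Stand-in word list; the real list is tokenised from words.xml :)
let $words := ("lorem", "ipsum", "dolor", "sit", "amet")
(: Splitting on the escaped ellipsis yields one more part than there are markers :)
let $parts := tokenize($docText, "\.\.\.")
(: A fixed slice (words 2 to 4) stands in for the random selection :)
let $filler := string-join(subsequence($words, 2, 3), " ")
(: With identical filler everywhere, re-assembly is a plain string-join :)
return string-join($parts, $filler)
```

Here both markers receive the text "ipsum dolor sit". The recursive local:join-random function in the script is needed only because each gap should receive a different random slice, which a single string-join separator cannot provide.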

Improvements
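
One possible refinement, offered as a sketch rather than as part of the original script: the listing hard-codes 100 as the bound for both the start position and the length, which implicitly assumes a word list of roughly 200 words or more. The helper below, whose name local:random-words and parameter $maxLength are hypothetical, derives the start position from the size of the list instead. It relies on eXist's util:random, as the script does, and assumes a non-empty word list.

```xquery
(: Hypothetical helper: pick a random run of at most $maxLength words :)
declare function local:random-words($words as xs:string*, $maxLength as xs:integer) as xs:string {
  (: util:random($n) is zero-based, so 1 is added to get a valid position and a non-zero length :)
  let $start := util:random(count($words)) + 1
  let $length := util:random($maxLength) + 1
  return string-join(subsequence($words, $start, $length), " ")
};

(: Stand-in word list for illustration only :)
local:random-words(("lorem", "ipsum", "dolor", "sit", "amet"), 3)
```

A run that starts near the end of the list is simply shorter, because subsequence stops at the last item.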

Approach 2 - XML replacement

The choice of an ellipsis as the marker is problematic if an ellipsis also needs to appear in the text itself. The conversion to text and back to XML is also an overhead.

An alternative approach is to use an XML element, for example <ipsum/>, to mark the places where lorem ipsum text is to appear, and to replace every occurrence with randomly chosen words. Replacing a specific element anywhere in the XML tree can be accomplished by modifying the identity transformation discussed in XQuery/Filtering_Nodes.

Concepts

XQuery

declare variable $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum;
declare variable $words := tokenize($lorumipsum, "\s+");
declare variable $marker := "ipsum";

(: Identity transformation, except that each marker element is replaced by random words :)
declare function local:copy-with-random($element as element()) as element() {
   element {node-name($element)}
      {$element/@*,
       for $child in $element/node()
       return
          if ($child instance of element())
          then
             if (name($child) = $marker)
             (: util:random(100) returns 0 to 99; adding 1 avoids a zero start position or zero length :)
             then subsequence($words, util:random(100) + 1, util:random(100) + 1)
             else local:copy-with-random($child)
          (: text, comments and processing instructions are copied unchanged :)
          else $child
      }
};

let $file := request:get-parameter("file", ())
let $root := doc($file)/*
return
    local:copy-with-random($root)

Explanation
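
The effect of the modified identity transformation can be seen with a small, deterministic variant. This is a sketch under stated assumptions: the inline document, the five-word list and the function name local:copy-with-fixed are made up for illustration, and a fixed slice of words replaces the random one so that the query runs on any XQuery processor and gives a repeatable result.

```xquery
(: Stand-in word list; the real script tokenises words.xml :)
declare variable $words := ("lorem", "ipsum", "dolor", "sit", "amet");
declare variable $marker := "ipsum";

(: Same recursive copy as the script, but with a fixed replacement :)
declare function local:copy-with-fixed($element as element()) as element() {
  element {node-name($element)} {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then
        if (name($child) = $marker)
        then subsequence($words, 1, 3)   (: first three words instead of a random run :)
        else local:copy-with-fixed($child)
      else $child
  }
};

(: Hypothetical input with one marker element :)
local:copy-with-fixed(
  <article id="a1">
    <title>Sample</title>
    <para><ipsum/></para>
  </article>
)
```

Attributes and non-marker elements are copied unchanged, while the content of para becomes the text "lorem ipsum dolor". The words are returned as a sequence of atomic values, which the element constructor joins with single spaces into one text node, so no explicit string-join is needed.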

Example

Discussion

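The second approach is simpler: it avoids serialising the document to text and parsing it back, and the marker element cannot be confused with an ellipsis that belongs to the text. Performance is about the same.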

Acknowledgements

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.