Server-side DOM and parsing
elements
The DIV helper and all derived helpers provide the search methods element and elements.
element returns the first child element matching a specified condition (or None if there is no match).
elements returns a list of all matching children.
element and elements use the same syntax to specify the matching condition. It allows three possibilities that can be mixed and matched: jQuery-like expressions, matching by exact attribute value, and matching with regular expressions.
Here is a simple example:
>>> a = DIV(DIV(DIV('a', _id='target', _class='abc')))
>>> d = a.elements('div#target')
>>> d[0][0] = 'changed'
>>> print a
<div><div><div id="target" class="abc">changed</div></div></div>
The unnamed argument of elements is a string, which may contain: the name of a tag, the id of a tag preceded by a pound symbol (#), the class preceded by a dot (.), or the explicit value of an attribute in square brackets.
Here are four equivalent ways to search for the previous tag by id:
d = a.elements('#target')
d = a.elements('div#target')
d = a.elements('div[id=target]')
d = a.elements('div', _id='target')
Here are four equivalent ways to search for the previous tag by class:
d = a.elements('.abc')
d = a.elements('div.abc')
d = a.elements('div[class=abc]')
d = a.elements('div', _class='abc')
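The forms in each group above are interchangeable because a jQuery-like string can be reduced to a tag name plus attribute conditions, which is exactly what the named-argument form spells out. The sketch below illustrates that reduction with a hypothetical parse_selector helper; it is not web2py's own parser, and it handles only one tag, one id, one class and simple [name=value] pairs.

```python
import re

# Hypothetical helper (not web2py code): reduce a simple jQuery-like selector
# such as 'div#target', '.abc' or 'div[id=target]' to (tag, {attribute: value}).
TAG_RE = re.compile(r'\w+')
ID_RE = re.compile(r'#([\w-]+)')
CLASS_RE = re.compile(r'\.([\w-]+)')
ATTR_RE = re.compile(r'\[([\w-]+)=([^\]]*)\]')


def parse_selector(selector):
    selector = selector.strip()
    m = TAG_RE.match(selector)            # a tag name only counts at the start
    tag = m.group() if m else None
    attrs = {}
    # Handle [name=value] first so a '#' or '.' inside a value is not misread.
    for name, value in ATTR_RE.findall(selector):
        attrs['_' + name] = value
    rest = ATTR_RE.sub('', selector)
    m = ID_RE.search(rest)
    if m:
        attrs['_id'] = m.group(1)
    m = CLASS_RE.search(rest)
    if m:
        attrs['_class'] = m.group(1)
    return tag, attrs


for sel in ('#target', 'div#target', 'div[id=target]'):
    print(sel, '->', parse_selector(sel))
```

The last form in each group, elements('div', _id='target'), simply supplies the resulting (tag, attributes) pair directly.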
Any attribute can be used to locate an element (not just id and class), including multiple attributes (the function element can take multiple named arguments), but only the first matching element will be returned.
Using the jQuery syntax, as in “div#target”, it is possible to specify multiple search criteria separated by commas:
a = DIV(SPAN('a', _id='t1'), DIV('b', _class='c2'))
d = a.elements('span#t1, div.c2')
or equivalently
a = DIV(SPAN('a', _id='t1'), DIV('b', _class='c2'))
d = a.elements('span#t1', 'div.c2')
If the value of an attribute is specified using a named argument, it can be a string or a regular expression:
import re
a = DIV(SPAN('a', _id='test123'), DIV('b', _class='c2'))
d = a.elements('span', _id=re.compile(r'test\d{3}'))
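In other words, a named-argument condition is either a literal that must equal the attribute value or a compiled pattern that only has to match it. A minimal sketch of that test, using a hypothetical attribute_matches helper; it assumes search (rather than full-match) semantics for patterns, and web2py's own check may differ in detail.

```python
import re


def attribute_matches(actual, condition):
    """Hypothetical check: a plain string must equal the attribute value,
    while a compiled regular expression only has to be found in it."""
    if actual is None:
        return False  # the element does not carry this attribute at all
    if isinstance(condition, str):
        return actual == condition
    return condition.search(actual) is not None


pattern = re.compile(r'test\d{3}')
print(attribute_matches('test123', pattern))     # True
print(attribute_matches('test12', pattern))      # False: only two digits
print(attribute_matches('test123', 'test123'))   # True: exact string
print(attribute_matches('test1234', 'test123'))  # False: strings must be equal
```

With search semantics, 'test1234' would also match the pattern; anchoring it as r'^test\d{3}$' restricts it to exactly three digits.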
A special named argument of the DIV (and derived) helpers is find. It can be used to specify a search value or a search regular expression in the text content of the tag. For example:
>>> a = DIV(SPAN('abcde'), DIV('fghij'))
>>> d = a.elements(find='bcd')
>>> print d[0]
<span>abcde</span>
or
>>> import re
>>> a = DIV(SPAN('abcde'), DIV('fghij'))
>>> d = a.elements(find=re.compile(r'fg\w{3}'))
>>> print d[0]
<div>fghij</div>
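The find condition looks at the text held by an element rather than at its attributes. The sketch below mimics the two examples above on a toy tree of (tag, [children]) tuples; the find_text function and the tuple representation are hypothetical stand-ins for the helper objects, and the sketch only inspects each element's direct text children.

```python
import re

# Toy tree: each element is (tag, [children]); children are str or elements.
tree = ('div', [('span', ['abcde']), ('div', ['fghij'])])


def find_text(element, find):
    """Return, in document order, every element whose direct text children
    contain `find` (a string) or match it (a compiled regex)."""
    tag, children = element
    matches = []
    for child in children:
        if isinstance(child, str):
            hit = (find in child) if isinstance(find, str) else find.search(child)
            if hit:
                matches.append(element)
                break  # record each element at most once
    for child in children:
        if not isinstance(child, str):
            matches.extend(find_text(child, find))
    return matches


print(find_text(tree, 'bcd')[0][0])                   # span
print(find_text(tree, re.compile(r'fg\w{3}'))[0][0])  # div
```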
components
Here is an example of listing the top-level elements of an HTML string:
>>> html = TAG('<a>xxx</a><b>yyy</b>')
>>> for item in html.components:
...     print item
...
<a>xxx</a>
<b>yyy</b>
parent and siblings
parent returns the parent of the current element, and siblings returns a list of the other children of the same parent.
>>> a = DIV(SPAN('a'), DIV('b'))
>>> s = a.element('span')
>>> d = s.parent
>>> d['_class']='abc'
>>> print a
<div class="abc"><span>a</span><div>b</div></div>
>>> for e in s.siblings(): print e
<div>b</div>
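Both navigations rely on each element keeping a back-reference to its container. A minimal sketch with a hypothetical Node class (not web2py's DIV): the reference is set when the tree is built, and the siblings are simply the parent's children minus the element itself.

```python
class Node:
    """Toy element that remembers its parent (hypothetical, not web2py's DIV)."""

    def __init__(self, tag, *children):
        self.tag = tag
        self.parent = None
        self.children = list(children)
        for child in self.children:
            if isinstance(child, Node):
                child.parent = self  # back-reference set when the tree is built

    def siblings(self):
        # Every other child of the same parent; a root has no siblings.
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

    def __repr__(self):
        return '<%s>' % self.tag


root = Node('div', Node('span', 'a'), Node('div', 'b'))
span = root.children[0]
print(span.parent is root)  # True
print(span.siblings())      # [<div>]
```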
Replacing elements
Elements that are matched can also be replaced or removed by specifying the replace argument. Notice that a list of the original matching elements is still returned as usual.
>>> a = DIV(SPAN('x'), DIV(SPAN('y')))
>>> b = a.elements('span', replace=P('z'))
>>> print a
<div><p>z</p><div><p>z</p></div></div>
replace can be a callable. In this case it will be passed the original element and it is expected to return the replacement element:
>>> a = DIV(SPAN('x'), DIV(SPAN('y')))
>>> b = a.elements('span', replace=lambda t: P(t[0]))
>>> print a
<div><p>x</p><div><p>y</p></div></div>
If replace=None, matching elements will be removed completely.
>>> a = DIV(SPAN('x'), DIV(SPAN('y')))
>>> b = a.elements('span', replace=None)
>>> print a
<div><div></div></div>
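The three variants of replace (a fixed element, a callable, or None) can be sketched as one recursive walk that rebuilds each child list and collects the original matches. This uses a hypothetical replace_matching function on toy (tag, [children]) tuples, not web2py's implementation; note that a removed span leaves its containing div in place, only emptied.

```python
# Toy tree: each element is (tag, [children]); children are str or elements.
def replace_matching(element, tag, replace):
    """Replace every descendant with the given tag by `replace` (a fixed
    element, or a callable that receives the original element), or remove
    it when replace is None. Returns the original matching elements."""
    matched = []
    new_children = []
    for child in element[1]:
        if isinstance(child, tuple) and child[0] == tag:
            matched.append(child)
            if replace is None:
                continue  # drop the match entirely
            new_children.append(replace(child) if callable(replace) else replace)
        else:
            if isinstance(child, tuple):
                matched.extend(replace_matching(child, tag, replace))
            new_children.append(child)
    element[1][:] = new_children  # mutate the tree in place
    return matched


def render(element):
    if isinstance(element, str):
        return element
    tag, children = element
    return '<%s>%s</%s>' % (tag, ''.join(render(c) for c in children), tag)


a = ('div', [('span', ['x']), ('div', [('span', ['y'])])])
originals = replace_matching(a, 'span', lambda t: ('p', t[1]))
print(render(a))       # <div><p>x</p><div><p>y</p></div></div>
print(len(originals))  # 2

b = ('div', [('span', ['x']), ('div', [('span', ['y'])])])
replace_matching(b, 'span', None)
print(render(b))       # <div><div></div></div>
```

Replacements are not searched again, so a replacement that happens to carry the matched tag does not trigger an endless loop.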
flatten
The flatten method recursively serializes the content of the children of a given element into regular text (without tags):
>>> a = DIV(SPAN('this', DIV('is', B('a'))), SPAN('test'))
>>> print a.flatten()
thisisatest
Flatten can be passed an optional argument, render, i.e. a function that renders/flattens the content using a different protocol. Here is an example that serializes some tags into Markmin wiki syntax:
>>> a = DIV(H1('title'), P('example of a ', A('link', _href='#test')))
>>> from gluon.html import markmin_serializer
>>> print a.flatten(render=markmin_serializer)
# title
example of a [[link #test]]
At the time of writing we provide markmin_serializer and markdown_serializer.
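Conceptually, flatten is a post-order walk: each element's children are flattened first, and the joined text is then returned as-is or handed to a render function together with the element's tag and attributes. The sketch below assumes a render(text, tag, attributes) calling convention and uses a made-up, heavily simplified toy_markmin renderer on toy (tag, attributes, [children]) tuples; it is not web2py's markmin_serializer.

```python
# Toy tree: (tag, attributes, [children]); children are str or elements.
doc = ('div', {}, [('h1', {}, ['title']),
                   ('p', {}, ['example of a ',
                              ('a', {'_href': '#test'}, ['link'])])])


def flatten(element, render=None):
    """Recursively join the text of all descendants. If `render` is given,
    it is called as render(text, tag, attributes) for every element."""
    if isinstance(element, str):
        return element
    tag, attributes, children = element
    text = ''.join(flatten(child, render) for child in children)
    return render(text, tag, attributes) if render else text


def toy_markmin(text, tag, attributes):
    # Hypothetical rules for three tags; any other tag passes its text through.
    if tag == 'h1':
        return '# ' + text + '\n'
    if tag == 'p':
        return text + '\n'
    if tag == 'a':
        return '[[%s %s]]' % (text, attributes.get('_href', ''))
    return text


print(flatten(doc))  # titleexample of a link
print(flatten(doc, toy_markmin))
```

Without a renderer all markup disappears, as in the thisisatest example; with one, each tag decides how its already-flattened content is wrapped.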
Parsing
The TAG object is also an XML/HTML parser. It can read text and convert it into a tree structure of helpers. This allows manipulation using the API above:
>>> html = '<h1>Title</h1><p>this is a <span>test</span></p>'
>>> parsed_html = TAG(html)
>>> parsed_html.element('span')[0]='TEST'
>>> print parsed_html
<h1>Title</h1><p>this is a <span>TEST</span></p>
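How text becomes a tree can be sketched with the event-driven HTMLParser from the Python 3 standard library: each start tag opens a new element under the innermost open one, text is attached as a string child, and each end tag closes the element. The TreeBuilder, parse, first and serialize names below are hypothetical and produce toy (tag, attributes, [children]) tuples rather than web2py helpers; void elements such as <br>, comments and malformed nesting are not handled.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a toy (tag, attributes, children) tree from HTML text."""

    def __init__(self):
        super().__init__()
        self.root = (None, {}, [])  # synthetic container for top-level nodes
        self.stack = [self.root]    # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        element = (tag, dict(attrs), [])
        self.stack[-1][2].append(element)  # attach to the open parent
        self.stack.append(element)         # and make it the new parent

    def handle_endtag(self, tag):
        # Close the innermost element only if the end tag matches it.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1][2].append(data)     # text becomes a string child


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def first(node, tag):
    """Depth-first search for the first element with the given tag."""
    if isinstance(node, str):
        return None
    if node[0] == tag:
        return node
    for child in node[2]:
        found = first(child, tag)
        if found is not None:
            return found
    return None


def serialize(node):
    if isinstance(node, str):
        return node
    tag, attrs, children = node
    inner = ''.join(serialize(child) for child in children)
    if tag is None:
        return inner  # the synthetic root has no markup of its own
    attr_text = ''.join(' %s="%s"' % item for item in attrs.items())
    return '<%s%s>%s</%s>' % (tag, attr_text, inner, tag)


tree = parse('<h1>Title</h1><p>this is a <span>test</span></p>')
first(tree, 'span')[2][0] = 'TEST'  # edit the span's text in place
print(serialize(tree))  # <h1>Title</h1><p>this is a <span>TEST</span></p>
```

The synthetic root plays the role of the container returned by TAG(html): its children are the top-level elements, which is what the components example iterates over.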