XML with SAX
The klyn.io.xml.sax package provides an event-driven XML parser. SAX reports elements and text to a handler as it reads, so it suits large XML documents that you do not want to load into memory as a complete tree.
import klyn.collections
import klyn.io
import klyn.io.xml.sax
import org.xml.sax
SAX calls handler methods as it reads the document. Keep handler state explicit: booleans for the current context, counters for statistics, and builders for text, because the text of a single element may be split across several calls to characters.
import klyn.collections
import klyn.io
import klyn.io.xml.sax
import org.xml.sax
class TitleHandler extends DefaultHandler:
    public titles as ArrayList<String>
    private _insideTitle as Boolean = false
    private _text as StringBuilder

    public TitleHandler():
        this.titles = ArrayList<String>()
        this._text = StringBuilder()

    public override startElement(uri as String, localName as String, qName as String, atts as Attributes) as Void throws SAXException:
        if qName == "title":
            this._insideTitle = true
            this._text = StringBuilder()

    public override characters(ch as Array<Char>, start as Int, length as Int) as Void throws SAXException:
        if not this._insideTitle:
            return
        i as Int = 0
        while i < length:
            this._text.append(ch[start + i])
            i += 1

    public override endElement(uri as String, localName as String, qName as String) as Void throws SAXException:
        if qName == "title":
            this.titles.add(this._text.toString().trim())
            this._insideTitle = false
Create a SAXParser, then call parse with a file path and the handler. Parse errors are raised as SAXException or a more specific subclass such as SAXParseException.
handler = TitleHandler()
parser = SAXParser()
parser.parse("books.xml", handler)
for title in handler.titles:
    print(title)
To parse XML held in a string, such as generated XML or test fixtures, wrap the text in a StringReader and pass it to parse through an InputSource.
xml = """<books><book><title>Klyn</title></book></books>"""
handler = TitleHandler()
SAXParser().parse(InputSource(StringReader(xml)), handler)
print(handler.titles[0])
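The counters for statistics mentioned earlier follow the same pattern as the boolean flag in TitleHandler. The sketch below is a hypothetical BookCounter handler that counts book elements in an in-memory document. It reuses only the imports, types, and calls shown in the examples above and is not a class provided by the library.

```klyn
import klyn.collections
import klyn.io
import klyn.io.xml.sax
import org.xml.sax

class BookCounter extends DefaultHandler:
    public count as Int

    public BookCounter():
        this.count = 0

    public override startElement(uri as String, localName as String, qName as String, atts as Attributes) as Void throws SAXException:
        if qName == "book":
            this.count += 1

xml = """<books><book><title>A</title></book><book><title>B</title></book></books>"""
counter = BookCounter()
SAXParser().parse(InputSource(StringReader(xml)), counter)
print(counter.count)
```

The parser reports two book start tags for this input, so the example should print 2. The handler keeps a single Int, so its memory use does not depend on the size of the document.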
- Use SAX for large XML files, import pipelines, logs, and one-pass transformations.
- Use DOM when you need random access, tree edits, or repeated queries over the same document.
- Do not store every event in a SAX handler unless you actually need a tree; that is DOM's job.
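To illustrate the last point, the hypothetical TitlePrinter handler below is a one-pass variant of TitleHandler. It prints each title as soon as the closing tag arrives and keeps only a running count instead of a list, so its memory use does not grow with the number of titles. It is a sketch built from the constructs shown in this section, not a library class.

```klyn
import klyn.collections
import klyn.io
import klyn.io.xml.sax
import org.xml.sax

class TitlePrinter extends DefaultHandler:
    public printed as Int
    private _insideTitle as Boolean = false
    private _text as StringBuilder

    public TitlePrinter():
        this.printed = 0
        this._text = StringBuilder()

    public override startElement(uri as String, localName as String, qName as String, atts as Attributes) as Void throws SAXException:
        if qName == "title":
            this._insideTitle = true
            this._text = StringBuilder()

    public override characters(ch as Array<Char>, start as Int, length as Int) as Void throws SAXException:
        if not this._insideTitle:
            return
        i as Int = 0
        while i < length:
            this._text.append(ch[start + i])
            i += 1

    public override endElement(uri as String, localName as String, qName as String) as Void throws SAXException:
        if qName == "title":
            print(this._text.toString().trim())
            this.printed += 1
            this._insideTitle = false

xml = """<books><book><title>Klyn</title></book><book><title>SAX</title></book></books>"""
printer = TitlePrinter()
SAXParser().parse(InputSource(StringReader(xml)), printer)
```

Parsing this sample should print Klyn and then SAX, leaving printer.printed at 2. For a real file, pass a path to parse instead, as in the books.xml example.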