In a recent blog post, ESR coined an interesting phrase: “high-church XML.” I identify with this term and all of its connotations. I have expanded on the idea and declared myself a traditionalist bishop in the church of XML.
ESR was trying to point out that regexes are appropriate for extracting information from html, because it’s tag soup and DOM parsing doesn’t work well on it. I personally think some clever XPath queries could be made to work, but I question the performance. I also think regexes are better for screen scraping because you can’t even guarantee the markup will be well-formed xml.
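To make the well-formedness point concrete, here is a rough Python sketch (my own illustration, not ESR’s code). The sample markup and the `HREF_RE` pattern are made up for this example, and ElementTree’s limited XPath subset stands in for a real XPath engine. A strict XML parser gives up on tag soup entirely, while a forgiving regex still pulls the links out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag soup: a bare "&", an unclosed <p>, a <br> with no close,
# and an unquoted attribute. None of that is legal xml.
TAG_SOUP = """<html><body>
<p>Links & more
<a href="http://example.com/one">One</a><br>
<a href=http://example.com/two>Two</a>
</body></html>"""

# The same kind of data, but well-formed.
WELL_FORMED = '<html><body><a href="http://example.com/one">One</a></body></html>'

# Illustrative pattern: an href value that is double-quoted, single-quoted, or bare.
HREF_RE = re.compile(
    r"""<a\s[^>]*?href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def hrefs_by_regex(html):
    """Scrape href values with a regex; tolerant of malformed markup."""
    # Exactly one of the three alternation groups matched; take that one.
    return [next(g for g in m.groups() if g is not None)
            for m in HREF_RE.finditer(html)]


def hrefs_by_xpath(html):
    """Parse as xml and query with ElementTree's XPath subset.

    Returns None when the input is not well-formed xml.
    """
    try:
        root = ET.fromstring(html)
    except ET.ParseError:
        return None
    return [a.get("href") for a in root.iterfind(".//a")]


if __name__ == "__main__":
    print(hrefs_by_regex(TAG_SOUP))     # ['http://example.com/one', 'http://example.com/two']
    print(hrefs_by_xpath(TAG_SOUP))     # None
    print(hrefs_by_xpath(WELL_FORMED))  # ['http://example.com/one']
```

The regex is brittle in its own ways (comments, scripts, or an `href` inside an attribute value will fool it), but it degrades gradually, whereas the xml parser is all or nothing.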
He coined the term to describe the other extreme: tightly defined schemas. Presumably the schemas are defined in xsd, though dtd and the more exotic schema languages could do similar jobs.
To make a long story short, I spent a lot of time at that other extreme. I’ve used, extended, and authored several xsd schemas at a previous employer. I can’t discuss specifics, but I can say that thousands of documents a day go through their systems and must pass my strict schema, or the data gets sent back to its source, like a citizen at the DMV who forgot to fill out part of a form.
I don’t claim to be superior to ESR. He is not in a position to tell the forges to “clean up your data”; my company was. He is not doing the forges a favor. I am just saying that I appreciate the beauty of strictly defined xml documents.
Now, some definitions related to his term and my self-appointed title. High and low church refer to the degree of emphasis on ritual, incense, and “tradition.” High church is the mass in Latin; low church is Joel Osteen. A bishop is a clergyman with authority over an area served by several churches. And traditionalist refers to Traditionalist Catholics: a group of Catholics, some excommunicated and some not, who want the Catholic Church to return to its state before Vatican II, when the mass and the other church rituals were conducted in Latin, meat was forbidden on all Fridays, and so on.
I’m a big fan of structure in my data, so I’m all about my high-church schemas. I care passionately about xml, so I appoint myself a bishop. But where does the traditionalism come from? What innovations do I reject? The answer is html5. I don’t actually reject html5; I just reject it not being xml. I want a video tag in xhtml.