Self-appointed traditionalist bishop of the Church of XML

In a recent blog post, ESR coined an interesting phrase: high-church XML. I identify with this term and all of its connotations. I have expanded upon the idea and declared myself a traditionalist bishop in the Church of XML.

ESR was trying to point out that regexes are appropriate for extracting information from HTML, because it’s tag soup and DOM parsing doesn’t work well. I personally think some clever XPaths could be made to work, but I question the performance. I also think regexes are better when you are screen scraping, because you can’t even guarantee the markup will be well formed.
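To make the contrast concrete, here is a minimal sketch in Python. The sample markup, the regex, and the function names are all made up for illustration; they are not ESR’s code. It only shows that a strict XML parser refuses tag soup outright, while a regex still pulls the links out.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag soup: unclosed <p> and <br>, plus an unquoted attribute.
TAG_SOUP = """<html><body>
<p>Projects:<br>
<a href="http://example.org/one">One</a>
<a href=http://example.org/two>Two</a>
</body></html>"""

# Matches href values whether or not they are quoted.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def extract_links(text):
    """Return every href value the regex can find, well formed or not."""
    return HREF_RE.findall(text)


def is_well_formed(text):
    """Return True only if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(TAG_SOUP))
    print(extract_links(TAG_SOUP))
```

The regex is deliberately loose. That is the point when scraping, and it is exactly what you would not tolerate in a high-church pipeline.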

He coined the term to describe the other extreme, where you have tightly defined schemas. Presumably the schemas are defined in XSD, but DTD and the more exotic schema standards could do similar jobs.

To make a long story short, I spent a lot of time at the other extreme. I’ve used, extended, and authored several XSD schemas at a previous employer. I cannot discuss specifics, but I can claim that thousands of documents a day go through their systems and must pass my strict schema, or the data gets sent back to its source like a citizen at the DMV who forgot to fill out part of a form.
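The “pass or get sent back” gate can be sketched roughly as follows. This is not my former employer’s schema and it is not real XSD validation. The `order` document and its required fields are hypothetical, and the hand-written checks stand in for what a schema-aware validator would do from the XSD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set standing in for a real XSD: each child element
# must appear exactly once under the <order> root, with non-empty text.
REQUIRED_FIELDS = ("id", "customer", "amount")


def validate(document):
    """Return a list of problems; an empty list means the document passes."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        return [f"not well formed: {exc}"]

    if root.tag != "order":
        return [f"unexpected root element <{root.tag}>"]

    problems = []
    for name in REQUIRED_FIELDS:
        matches = root.findall(name)
        if len(matches) != 1:
            problems.append(f"expected exactly one <{name}>, found {len(matches)}")
        elif not (matches[0].text or "").strip():
            problems.append(f"<{name}> is empty")
    return problems


if __name__ == "__main__":
    incomplete = "<order><id>7</id><customer>ACME</customer></order>"
    for problem in validate(incomplete):
        print("rejected:", problem)
```

Anything that comes back with a non-empty list goes back to the sender, form unfinished.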

I don’t claim superiority to ESR. He is not in a position to tell the forges “clean up your data.” My company was. He is not doing the forges a favor. I am just saying that I do appreciate the beauty of strictly defined XML documents.

Now, just some definitions related to his term and my self-appointed title. High and low church refer to the emphasis on ritual, incense, and “tradition.” High church is mass in Latin. Low church is Joel Osteen. A bishop is a clergyman tasked with authority over an area served by several churches. And “traditionalist” refers to Traditionalist Catholics, a group of Catholics, some excommunicated, some not, who want the Catholic Church to return to its state before Vatican II, when mass and the other church rituals were conducted in Latin, meat was forbidden on all Fridays, and so on.

I’m a big fan of structure in my data, so I’m all about my high-church schemas. I care about XML passionately, so I appoint myself a bishop. But where does the traditionalism come from? What innovations do I reject? The answer is HTML5. I don’t actually reject HTML5, I just reject it not being XML. I want a video tag in XHTML.

GitHub, because software kittens should be cloned, not adopted

This is a tale of two open source software projects. However, unlike two certain European cities, their fates are not particularly intertwined, and there is little greatness about their history.

The projects are mqmanager and Console. Mqmanager is a tool for managing MSMQs, written in .NET. Console is a multi-tabbed wrapper around cmd.exe.

Both projects are hosted on SourceForge. Neither project is terribly popular, although Console is more so than mqmanager. The difference is in the SCM, or source control management.

See, mqmanager uses SVN, after a migration from CVS, while Console uses git. More specifically, Console hosts its git repo on GitHub.

“What’s so special about GitHub?” you may ask. Well, mainly the way forking is presented, or more specifically the one-click forking. See, SVN is centralized source control management: there is one repo that everyone reads from and writes to. In git, everyone has a full copy of the repo, and people can push and pull from each other based on network availability and security settings. More importantly, every open source git repo on GitHub (they have a paid option for closed source hosting) has a “fork button.” Want to fork a project? Make an account, log in, browse to the project, and click the fork button. You now have a full copy of the repo, along with the ability to add issue tracking, downloads, etc.

Now you’re asking, “Doesn’t this make forking too easy?” Well, forking in and of itself isn’t that bad; it fits current trends in SCM policies for commercial companies as well as open source projects. And making it easier to fork doesn’t make it easier to make anyone care about anyone else’s forks. If I forked emacs or firefox, few people would use my fork, regardless of how long it took me to set up SCM. My fork would do little to distract any resources besides my own.

I have established that forking does little harm, but what’s the benefit? The benefit, at least the one I care about, is for the less popular projects. Console is much more popular than mqmanager, but neither is hugely popular. Neither has funding or a development team of more than one person.

When I found mqmanager, it was abandoned like a stray kitten. I contacted the original developers and was given control of the project. I improved it, used it, and forgot about it. Someday someone else might make changes to the code that they want to contribute back to the community. They will have to find me, I will have to give them access to the SourceForge project, and they will be stuck caring for the project.

With Console it’s a different model. The SourceForge page has a source archive and a binary archive. I wanted an installer, and I was willing to write one myself. I made an account on GitHub, forked the project with the fork button, made some changes, committed to my branch, and sent a pull request to the original author. GitHub makes it quite clear that my repo is a fork of bozho’s, and bozho can accept my changes at any time. A year from now someone can fork my version and send bozho and me pull requests.

The benefit of GitHub is that you don’t have to ask permission. You’re cloning my kitten, not adopting him. That’s important for smaller, niche open source projects. If I write a niche open source product, I probably won’t be using it regularly two years from now, but someone else might need it. The decentralized GitHub model works better than the forge model for this.

I have adopted a few kittens over the years, and have a few kittens in need of adoption. I plan on migrating all my OSS projects over to GitHub. I will keep up the SourceForge pages, but strip most of the SourceForge functionality to encourage use of the GitHub pages. Perhaps people will fork these projects. Maybe at the time I will be too busy to evaluate their pull requests, but in the end my software . . . my craft . . . my art will live on and hopefully be made more useful by others.