Open Source Site Roundup 2010

It's the end of the year, and a great time to look back. In December 2007, I wrote an article about my favorite open source sites, which I am revisiting here. It would be grossly inaccurate to call this an annual update, but if I do this again in 2011, I can call that article an annual update. So without further ado, these are my favorite open source sites for 2010.

GitHub (http://www.github.com | My Profile | JustAProgrammer Org)

Distributed version control initially bothered me on principle. I like centralized and federated systems. However, contributing to MongoDB forced me to use git and GitHub. This led me to the realization that in a DVCS, the central authority comes from the one repository that the official builds are generated from. The other clones are simply sandboxes and don't need to be controlled by the buildmaster.

However, GitHub is much more than a git hosting service. It lets you form a community that's focused on the code. It lets you host downloads and keep a project wiki, so it can replace a site like SourceForge. Unlike SourceForge, it puts the repository at center stage. We here at justaprogrammer like GitHub so much that we set up a GitHub organization to host our open source projects.

CodePlex (http://www.codeplex.com | My Profile)

Between CodePlex and GitHub, I have pretty much written off SourceForge. I think there is plenty wrong with CodePlex, but it's been getting better. For example, the SVN endpoints for its TFS repositories perform better than they used to. Also, the fact that it supports, and sponsors, the DVCS Mercurial is great. Although I haven't used it myself yet, it seems that you can do a one-click clone of a CodePlex project that uses Mercurial for source control, à la GitHub's clone feature.

The one obvious problem with CodePlex is that it is Windows-centric. I would not host anything on it that was not primarily a Windows or .NET application.

Ohloh (http://www.ohloh.net | My Profile)

This is the only site from my 2007 list that has stood the test of time. Ohloh is a unique site: a combination of social networking and open source software metrics. If you write open source software, you can list it on the site and have Ohloh scan your version control repository and report metrics about your software. It will also generate metrics about the lines of code you write across all open source projects on the site. You can also list software you use, and give other users “kudos” if you enjoy their work. Finally, all these metrics are used to sort every user on the site by a single ranking. The exact formula is secret, much like the formulas used to calculate credit scores.

AlternativeTo (http://www.alternativeto.net | My Profile)

This user-supplied content site lets you search for alternatives to any program on any operating system. It's awesome for two reasons. First, site membership is OpenID based. Second, it distinguishes between freeware and open source. As you can tell if you look at my profile, I contribute to the site as well as use it. The majority of the software listed is desktop software, but the list of server software is growing as well.

Conclusion

My list has changed quite a bit in three years. Only one site, Ohloh, remained on it. This shows how much open source and the web have evolved in that time. I can't wait to revisit this list in 2011. Until then, I invite you, the reader, to share the sites you enjoy in the comments below.

The great open source bounty experiment

Like Jeff Atwood, I am a huge fan of paying for software, especially donating to OSS projects. I am also a fan of contributing patches to open source projects I use. Usually, my monetary donations go to projects I've used for years, as a thank-you for the value they have given me. I have also offered money in the past in exchange for the implementation of feature requests. Sometimes my features were implemented, but my money has never been accepted. I am conducting an experiment that will hopefully change this. But first, some backstory.

The Backstory

When I began my current position as a “backend developer” for a Madison Avenue ad firm, I found myself one of two de facto sysadmins for a Windows Server 2008 colo server. This server had a commercial software product installed on it called WinSSHD. We mainly used it for its SFTP capabilities, to publish websites to the server. However, WinSSHD is a fully featured SSH daemon, so I can SSH to my Windows box with PuTTY and get a command prompt. From there, I can execute Windows shell commands, or run vim, sqlcmd or PowerShell. In other words, I can administer my Windows box in a civilized manner, or at least in a manner that my Unix tendencies consider civilized.

However, as is always the case with software, all was not perfect. I found an obscure bug that was quickly fixed, then two problems that were actually protocol limitations. The first was that you had to press Escape twice to enter command mode in my beloved vim, and the second was that function keys did not work, rendering my beloved Far Manager useless.

Bitvise, the makers of WinSSHD, offer a solution to this problem: use their SSH client, Tunnelier, which implements a proprietary terminal protocol they built into WinSSHD called bvterm. However, I want to use my beloved PuTTY as the client. So I talked to Bitvise, and I talked to Simon Tatham, the maintainer of PuTTY. Bitvise published the bvterm specification, and Simon took a look at the protocol. Simon said supporting it would take some work, but he was supportive of someone else doing that work.

The Experiment

So the next step seemed obvious to me: set up a bounty, put up some money, and promote the hell out of it. I selected FossFactory as my bounty host. I made an account, registered my project, and put up $200. Simon has been very gracious in reviewing my bounty proposal and clarifying his requirements for accepting this feature into PuTTY.

So consider this blog post step one of promoting the hell out of it. I’ll be promoting it through as many channels as I think effective, and updating the justaprogrammer readership on the progress of my experiment.

In the brave new world of MaybeSQL, we still need DBAs

Yesterday, Chuck Reeves tweeted an article from Daniel Lemire's blog entitled “Who will need database administrators in 2020?” The thesis is that with the advent of NoSQL technologies, the role of DBA will become unnecessary. I disagree, for two reasons. First, NoSQL will not replace SQL. Second, your NoSQL data store probably needs a DBA, even if that person has a different title.

A quick note: I've worked with enough SQL databases to make broad generalizations about them, but the only NoSQL database I have experience with is MongoDB.

NoSQL will not completely replace RDBMSes

SQL databases are the primary practical implementation of the relational model. Most of the “trade school” explanations of the relational model and normal forms use SQL syntax for their examples. Relational databases are great at storing data in an organized fashion. Through constraints you can enforce most business rules, and triggers let you enforce the rest. Relational databases also usually have fine-grained access control systems and mechanisms for auditing changes. Finally, if you have to build a report from your data in a way you never did before or planned to, it's usually nice to start out with your data normalized.

There are a lot of things a NoSQL database like MongoDB does better than most RDBMSes. For example, MongoDB would be better suited for hosting a simple blog than MySQL. However, MongoDB has not been around all that long, and before MongoDB, relational databases did a good enough job. There are also many things SQL is better at than MongoDB. For example, I would never use mongo for a complex inventory system. However, many technologists, like Daniel, have focused on things that NoSQL is good at, like blogs and simple e-commerce sites. These technologists recognize NoSQL as a disruptive technology in the data management field, but they make the mistake of assuming NoSQL will usurp the role of relational databases completely.

To put it another way, in the brave new MaybeSQL future, we will use SQL for some things and NoSQL for others. The things we will use SQL for, like complex inventory systems, will have complex schemas and need specialists to manage all that data. We already call those specialists DBAs.

Your NoSQL Database needs a DBA

OK, I lied. Your NoSQL database might not need a DBA, just as your relational database might not need one. In relational database shops without formal DBA positions, there are usually de facto DBAs: senior developers who've made it their business to manage the company's databases because management would not allocate a dedicated salary to that function. Currently, I am serving as a de facto DBA for some small databases.

I've also been playing with MongoDB a lot. I've contributed to mongo, spoken about mongo, and been to three mongo conferences. I've talked to a lot of people using mongo, and I've made a lot of observations. My primary observation is that mongo tends to get used in startups. These startups don't have dedicated DBAs. However, they do have well-rounded senior developers who perform DBA and sysadmin functions. Many of these NoSQL programmers also know more about relational databases than I do, which is why they didn't fight “the mongo way” tooth and nail before accepting it, like I did. As is the nature of startups, most of the businesses these programmers work at will fail. However, a few will succeed and grow big enough to hire technologists with more specialized roles. I expect a mongo specialist role that's part sysadmin and part programmer to evolve at these companies. For companies that use a combination of relational databases and MongoDB, I expect a DBA to be hired, learn MongoDB, and take ownership of managing the data stored in that company's MongoDB instances.

Conclusion

NoSQL databases were designed for different problems than relational databases. Relational databases were not designed for things like blogs and massive sites like Facebook; they were used in that role because they were the best tool available at the time. MongoDB, on the other hand, was created at 10gen, a company co-founded by one of the founders of DoubleClick, who wanted to build a database that scaled the way a database for websites should scale. MongoDB is taking a piece of the pie from relational databases, but not all of it. Also, just as not every relational database has a full-time DBA to maintain it, not every NoSQL database has a full-time administrator. However, that does not mean a DBA-like role for NoSQL databases is unnecessary.

Cross database queries in MongoDB

Until recently, I could accurately claim that I'd spent more time hacking the source code of mongod than writing code that made db calls to running instances of mongod. That was before I started my current project. For better or for worse, I'm approaching the point where I'm as comfortable querying mongo collections as I am doing multi-table joins in SQL Server.

Naturally, as I use MongoDB I find myself asking a lot of “how do I do this in mongo” questions for tasks that I can do easily in SQL. More often than not, my main trouble in figuring out how to do the task in question is knowing what to ask Google. Recently, my “how do I” du jour was cross-database queries.

To define my problem more specifically, I had one document in one collection in my staging database that I wanted deployed to production. My staging and production databases lived on the same server. I realize this is not ideal, but it is the reality of my current situation. If I were to do the equivalent task in Microsoft SQL Server, that is, copy one row from a table in my staging database into a table in my production database, I'd use a query similar to the following:

USE productionDb
INSERT INTO tableName (id, name, subtitle, isBoolean, misc)
    SELECT id, name, subtitle, isBoolean, misc
        FROM stagingDb..tableName
        WHERE id='6f84dc60-fce6-11df-8cff-0800200c9a66'

A simple query for a simple task. It turns out the equivalent mongo query is about as simple. It just took me a while to find the right syntax, because the docs did not refer to it as a cross-database query until I updated them. The shell command is db.getSisterDB(dbName). That function returns an instance of another database on the same server, which in turn contains collection objects with the familiar methods find(), findOne(), update(), save(), remove(), etc. So I did the following:

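// Run from the production database: copy one document from the
// sister database STAGE_database into the same collection here.
db.collectionName.save(
    db.getSisterDB('STAGE_database')
        .collectionName
        .findOne({_id: ObjectId("4cfc182f2c320000000013b4")}))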

and my object was copied over.
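The one-liner passes whatever findOne() returns straight to save(), so it assumes the document actually exists in staging; findOne() returns null when nothing matches. Below is a minimal sketch of a slightly more defensive version that uses the same shell calls (getSisterDB, getCollection, findOne, save) but checks for null before saving. The helper name copyFromSisterDB is hypothetical, and so is makeMockServer: a tiny in-memory stand-in for a single mongod, included only so the sketch runs outside the mongo shell. In the shell itself you would pass db as the target and drop the mock.

```javascript
// Copy the first document matching `query` from collection `collName` in the
// sister database `sourceDbName` into the same-named collection of `targetDb`.
// Returns the copied document, or null if nothing matched.
function copyFromSisterDB(targetDb, sourceDbName, collName, query) {
  var doc = targetDb.getSisterDB(sourceDbName).getCollection(collName).findOne(query);
  if (doc === null) {
    return null; // nothing matched, so skip the save instead of saving null
  }
  targetDb.getCollection(collName).save(doc); // save() inserts or replaces by _id
  return doc;
}

// Hypothetical in-memory stand-in for one server holding several databases.
// It mimics only getSisterDB, getCollection, findOne and save, and assumes
// every document has an _id.
function makeMockServer() {
  var databases = {};
  function getDb(dbName) {
    if (!databases[dbName]) {
      var collections = {};
      databases[dbName] = {
        getSisterDB: function (otherName) { return getDb(otherName); },
        getCollection: function (name) {
          if (!collections[name]) {
            var docs = [];
            collections[name] = {
              // First document whose fields equal every field in `query`, else null.
              findOne: function (query) {
                var match = docs.find(function (d) {
                  return Object.keys(query).every(function (k) { return d[k] === query[k]; });
                });
                return match === undefined ? null : match;
              },
              // Replace the document with the same _id, or append a new one.
              save: function (doc) {
                var copy = Object.assign({}, doc);
                var i = docs.findIndex(function (d) { return d._id === doc._id; });
                if (i >= 0) { docs[i] = copy; } else { docs.push(copy); }
              }
            };
          }
          return collections[name];
        }
      };
    }
    return databases[dbName];
  }
  return getDb;
}

// Usage: seed a staging document, then copy it into production.
var getDb = makeMockServer();
getDb("STAGE_database").getCollection("collectionName").save({ _id: "4cfc182f", name: "example" });
var production = getDb("production");
copyFromSisterDB(production, "STAGE_database", "collectionName", { _id: "4cfc182f" });
```

Returning null rather than throwing keeps the helper quiet when a deploy script asks for a document that was never staged; a script that should fail loudly could throw instead.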

However, there is one caveat to be aware of. Most drivers support the full range of BSON data types. The shell does not. For example, a 32-bit int in a mongo document becomes a double in the shell, so the data is not copied perfectly. I discovered this issue while using a pre-release of the official 10gen C# driver for MongoDB. After some update queries in the shell, objects were no longer being deserialized. Luckily, the great people at 10gen made the driver more tolerant on deserialization, so this is no longer a problem with current builds of the driver. There are open tickets to add shell support for the missing data types (int32s and GUIDs), so the shell's deficiency will be addressed. Until then, be aware of this caveat.
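The root of the caveat is that JavaScript has a single Number type, so once a document is loaded into the shell there is nothing left to record whether a field was an int32 or a double. The toy model below is not the shell's actual code; bsonInt32, bsonDouble, toShell and fromShell are hypothetical helpers that represent a BSON value as a type tag plus a payload. It shows a field that starts as an int32 coming back tagged as a double after a round trip through the shell, even though its numeric value is unchanged. A strongly typed driver that expects an int32 in that field then sees a different type.

```javascript
// Hypothetical model of a typed BSON value: a type tag plus its payload.
function bsonInt32(value) { return { type: "int32", value: value }; }
function bsonDouble(value) { return { type: "double", value: value }; }

// Reading into the shell: JavaScript has one Number type, so the tag is dropped.
function toShell(bsonValue) { return bsonValue.value; }

// Writing from the shell: a bare Number carries no tag, so it is stored as a double.
function fromShell(jsNumber) { return bsonDouble(jsNumber); }

var stored = { _id: "abc", quantity: bsonInt32(5) };
var inShell = { _id: stored._id, quantity: toShell(stored.quantity) };
var savedBack = { _id: inShell._id, quantity: fromShell(inShell.quantity) };

console.log(stored.quantity.type);                               // int32
console.log(savedBack.quantity.type);                            // double
console.log(savedBack.quantity.value === stored.quantity.value); // true
```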