Open source software - John C. Kirk

Apr. 28th, 2008

01:00 am - Open source software


Free software is a funny thing, partly because it tends to spark off "holy wars", so it can be hard to focus on the practical issues when you've got people shouting about their vision of purity. I like this blog post (a parody), which applies those principles to cars: The transmission tax.

Most of the people reading this are probably aware of the basic principles, but here's a quick recap. If you get a piece of software for your computer, it can be "closed source" or "open source". Closed source is something like Microsoft Office or Adobe Reader: you get the application itself, so you can run it on your machine, but you don't get the source code that the programmers used to create it. Open source means that you get the source code too, and there are some (theoretical) advantages to this:

a) You can read through the code, to see what it's doing. That way, you can be confident that it's not going to start stealing your credit card numbers. According to advocates (e.g. the Slashdot monkeys), this makes open source software intrinsically more secure than closed source.

b) If you need to make any changes, you can do that without going back to the original company. This is particularly useful if they've gone out of business.

In both cases, you could either do the inspection/changes yourself, or hire someone else to do it for you.

As an ideology, I think it's slightly flawed. For instance, I am free to inspect the Firefox source code if I want to. Looking at a couple of sources (e.g. here and here), that's at least 2 million lines of code. Let's say that I could read one line of code per second; working for 8 hours a day, it would take me 70 days to read it all. So, if I used all my annual leave solely for that, and devoted every weekend to it, I could be done in about six months. Given that I have other things to do with my time, I don't think there's any way I can keep up with the changes, and I'm an IT professional, so this clearly isn't practical for the average home user. That means that you wind up relying on other people to do the inspection/changes. For a big project like Firefox, that's probably a safe bet, but there are plenty of smaller projects that don't attract the same level of attention. Still, that's not a reason to keep a program closed source, it just means you should be a bit sceptical towards the zealots.
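The back-of-envelope sum above can be checked with a few lines of Python. This is only a sketch: the line count, reading speed and working day are the figures from the paragraph, while the 25 days of annual leave is an assumed number that the post doesn't state.

```python
import math

def reading_days(lines, lines_per_second=1.0, hours_per_day=8.0):
    """Working days needed to read `lines` lines of code at a steady pace."""
    seconds_per_day = hours_per_day * 60 * 60
    return lines / (lines_per_second * seconds_per_day)

# Figures from the post: 2 million lines, 1 line per second, 8 hours a day.
days = math.ceil(reading_days(2_000_000))
print(days, "working days")

# ASSUMPTION (not in the post): 25 days of annual leave, then 2 days per weekend.
ANNUAL_LEAVE_DAYS = 25
weekends = math.ceil(max(0, days - ANNUAL_LEAVE_DAYS) / 2)
print(weekends, "weekends on top of the annual leave")
```

With those assumptions, the weekends alone come to roughly five months, which is in the same ballpark as the "about six months" estimate.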

It can also be a bit tricky to install this software in a company; Raymond Chen has written a couple of blog entries about that (Solving one problem by creating a bigger problem and If you work at a company, it's not your computer any more). This comment is particularly significant:

I don't know what the issues are with using perl. That's why you have to ask the lawyers. Some IT departments also don't like it when people run unauthorized software on their work computers.

Some packages may contain restrictions such as "The terms of this license apply to anything produced with the aid of this packages," or "For non-commercial use only." And if you redistribute the packages with your own product, then things get really interesting. (Who is liable if a security hole or patent infringement is found in that package?)


I know that several professional writers (e.g. Greg Weisman, JMS) refuse to look at story ideas because they're worried that they might come up with the same idea independently and then be accused of stealing it. In a similar way, what happens if I look through some open source code and then release proprietary software which coincidentally has some similar code in it? Will I (or my company) be sued? I gather that the companies who originally cloned the IBM PC had a strict division of labour: one team reverse engineered the spec, and then a completely different team implemented that spec. (Edit: See pp. 170-171 of "Accidental Empires", by Robert X. Cringely.)

Anyway, returning to development, let's say that you want to give your software away as a gift to the world. Should you release the source code too? And if so, which licence should you choose? I'm not a lawyer, and reading legal documents isn't really my idea of fun. In fact, this doesn't just apply to software; there's a similar issue for photos. For instance, last year someone asked me to contribute some photos of stations to the Randomness Guide to London. The Creative Commons licence is used for this, but I have absolutely zero interest in wading through it to work out what it's for, so I took a simpler approach: I emailed the photos to her, and basically said "Here you go, as far as I'm concerned you now own these pictures, do whatever you like with them".

Coming back to software, I wrote a small C program in 1995 that would convert a Pine address book into an Elm aliases file. (They're both email programs for Unix systems.) I then released that, with the source code, as described here and here. However, I didn't bother with licences, I just said "here you go" and left it at that. More recently, I put a copy of my MSc project report on my website, which includes the source code. I haven't put that under any specific licence, on the vague basis that it's just included as part of the report rather than an application in its own right, and it will eventually become obsolete when I release an updated version of the bee simulation. I've also put the Kana Test application up as a free application; I haven't included the source code, although it's trivial to decompile it from the MSIL. Someone emailed me a while back to ask whether I'd release the source, and I'm quite happy for other people to have a copy (on the "do whatever you like with it" principle), but then I come back to the problem of choosing a licence. As Jeff Atwood wrote (Pick a License, Any License), there are many to choose from. For instance, I'm quite happy for people to copy chunks of my code into their own closed source applications; what licence do I need for that? So, for now I've dodged that issue.

In the meantime, I've been looking at GameBase. This is a front-end application for emulators; in my case, it's a way to play old Commodore 64 games on my PC. It's a good program, but it has a slight flaw - it doesn't work properly as a limited user in Windows. There are workarounds for that, but it would be better to change the application's behaviour. The guy who wrote it doesn't have time to maintain it anymore, but he released the source code under the GPL, so I downloaded it last June. That code is written in VB6, connecting to Access databases, so it fits in quite well with my professional experience; I've been converting it to VB.NET, and doing a general overhaul (e.g. removing global variables).

I announced this a few days ago in the relevant forum, and the response has been pretty positive so far. Someone asked whether I'd release the source, and I said that I would because that's a condition of the GPL. He/she then said: "I guess if you ever decided to truly open source the project (to let other developers help you), all of the developers would have to use the same version of visual studio."

This echoes another one of Jeff Atwood's posts: Defining Open Source.

"The project must provide public evidence that it accepts and encourages code contributions from the outside world. Is a project truly open source if it only has one developer? Is a project truly open source if it has a cabal of three developers who summarily ignore all outside suggestions and contributions?"


In the case of GameBase, it's an application with a thriving community, so it makes sense that other people would like to get involved in the development; since I'm a relative newcomer, arguably I shouldn't be making unilateral decisions about what's best. (Mind you, nobody else seems to have been in a rush to do anything with it up until now, so we'll see what happens in due course.) However, how does this work for something like Kana Test? I wrote that for myself, and I'm happy for other people to use it, but am I under a moral obligation to review all the code changes that people submit? If I disagree with a proposed change, but most other users want it, should I make my own application worse for me? Of course, once the code is released people can do what they like with it, but then if I make changes to my own version (e.g. adding Katakana) we wind up with forked copies, and people have to consider which changes to copy across.

Coming back to GameBase, the existing code includes clsCRC, a class written by Fredrik Qvarfort. This has object code directly embedded into it, in order to calculate a CRC checksum (to check whether a given file has been modified). I'm quite impressed, since I didn't know you could do that in VB6. However, this is one step below assembly code - it's a bunch of hex codes, so although I know the overall goal I don't know what it's really doing. Does it still count as open source? If so, how is it different from a compiled program, which consists solely of object code? If not, doesn't this put more limitations on developers? (I assume that they did it this way for a particular reason, e.g. speed, not to be obscure.) I'm guessing that not many people are fluent in x86 machine code, and they may not be particularly interested in this application, so how does this fit Linus's Law? (That theory says: "Given enough eyeballs, all bugs are shallow.") Does the same thing apply to other obscure languages, e.g. Clipper? I intend to replace this code with the managed equivalent, but I think I'll have to dig through Knuth's books to find the original algorithm for that.
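For comparison, here is roughly what a readable CRC routine looks like when it's written as ordinary source code rather than embedded hex. This is a minimal sketch, not the algorithm from clsCRC: it assumes the common CRC-32 variant (the one used by zip files, with the reflected polynomial 0xEDB88320), and it's in Python rather than VB.NET to keep it short. The function names are made up for illustration.

```python
def crc32(data: bytes) -> int:
    """Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320, as used by zip)."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte                       # fold the next byte into the low bits
        for _ in range(8):                # process that byte one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # standard final inversion


def has_changed(data: bytes, expected_crc: int) -> bool:
    """Hypothetical helper: True if the data no longer matches a stored checksum."""
    return crc32(data) != expected_crc


original = b"123456789"
stored = crc32(original)
print(hex(stored))                         # 0xcbf43926, the usual CRC-32 check value
print(has_changed(original, stored))       # False
print(has_changed(b"123456780", stored))   # True
```

A table-driven version would be faster, which may be why the original author reached for machine code, but the slow version has the advantage that anyone can read it.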

One concern I had was about reusing code between applications. For instance, let's say that I re-write the CRC algorithm in VB.NET; it's plausible that I might want to use that elsewhere. Would I be allowed to copy that class into a proprietary application, once it's been released under the GPL? According to the FAQ: "You cannot incorporate GPL-covered software in a proprietary system." Fortunately, it turns out that there is an exception for code that I've added myself: "To release a non-free program is always ethically tainted, but legally there is no obstacle to your doing this. If you are the copyright holder for the code, you can release it under various different non-exclusive licenses at various times."

Ultimately, I lean towards the attitude of "Screw you, hippy!" This "one true way" stuff is what makes it seem like a religion, and if people make it this hard for me to do them a favour, why bother? Anyway, I'll give GameBase my best shot, and use that as a guideline for my other applications.

Comments:

From: rjw1
Date: April 28th, 2008 08:14 am (UTC)
BSD licences are your answer then.
You retain copyright, but anyone can do anything they like with the code as long as they credit your copyright.
The other option, of course, is public domain, where you give up all rights to it and people can do anything they feel like.

The CC licences aren't that hard to choose between. You can read all the legalese, or you can trust the summary of each one.

Basically, they boil down to answering three questions:

Do you want attribution?
Can people modify it?
Are commercial uses allowed?

From: johnckirk
Date: April 28th, 2008 09:55 am (UTC)
Thanks, I'll look into the BSD one.

As for public domain, some people are saying that it's a legal construct which only exists in the USA, so it's not a good idea to do that in other countries. I don't know whether that's true, but I've implicitly done that with the source code I've already released.

From: pozorvlak
Date: April 28th, 2008 09:24 am (UTC)
Sounds like you want the WTFPL :-)

More seriously, BSD-style licenses allow people to incorporate your code into a proprietary product, so that sounds like what you want. Though there are a lot of open-source licenses, they're mostly pretty short and easy to read (nothing like a proprietary Windows program's EULA), and only a couple of them have any significant traction.

Embedding object code, or other obfuscation, is disallowed by at least the GPL, which requires code to be in the normal form for easy modification.

As for your Firefox example, nobody's expecting you to read the whole thing! To fix any given bug, you shouldn't have to read more than a couple of hundred lines, unless it's a particularly deep bug. Of course, finding the right 200 lines to read is a trick in itself: how easy this is to do is a major determiner of success for large projects.

From: johnckirk
Date: April 28th, 2008 10:05 am (UTC)
"Sounds like you want the WTFPL :-)"

Yeah, I saw that linked from the Pick a License, Any License blog post mentioned above, and it is quite tempting :)

"Embedding object code, or other obfuscation, is disallowed by at least the GPL, which requires code to be in the normal form for easy modification."

Hmm, so what does that mean for the VB6 version of GameBase - is it not actually covered by the GPL after all? Does that then mean that I'm in breach of copyright by using his code, or that I'm not obliged to release my modified version?

"As for your Firefox example, nobody's expecting you to read the whole thing!"

I think it depends what the goal is. For instance, if I just wanted to fix the permission problems in GameBase, I could have identified the relevant places in the VB6 code and just tweaked them, leaving the rest intact, so I wouldn't have to dig through the entire thing.

On the other hand, I mentioned G-Archiver a while back: that program would steal your passwords and email them back to the original programmer. One of the first comments on the linked article was "this story is the best warning against closed source software". So, the implication there is that you should do a code review before you run the code, so that you'll know whether it's up to anything dodgy; that way, you avoid losing your password in the first place. Coming back to Firefox, for all I know the "Print Preview" code will delete all your mp3 files if it's noon on 29th February. It wouldn't be much consolation to be able to identify the bug after I've lost all my data.

From: johnckirk
Date: April 28th, 2008 07:12 pm (UTC)
"Embedding object code, or other obfuscation, is disallowed by at least the GPL, which requires code to be in the normal form for easy modification."

As a further thought on this, I'd argue that obfuscation is somewhat subjective. For instance, I'm guessing that you've come across this Perl module before:
Mail::RFC822::Address: regexp-based address validation
It may mean something to you, but that's a page of complete gibberish to me, so there's no way I can check it for correctness. (I learnt the basics of regular expressions as an undergrad, but Perl is way down my "to do" list.) Raymond Chen has a more moderate regexp here (to match IP addresses), but it's still over my head.
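To give a flavour of the readability gap, here is a small sketch in Python that checks a dotted IPv4 address twice: once with a regular expression, and once with plain string handling. Neither is the regexp from the posts linked above; the pattern and both function names are written here purely for illustration, and they assume that leading zeros (e.g. "01") should be rejected.

```python
import re

# One octet, 0-255, with no leading zeros. Already hard to read at a glance.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4_regex(text: str) -> bool:
    """Regexp version: compact, but the range check is buried in the pattern."""
    return IPV4_RE.fullmatch(text) is not None


def is_ipv4_plain(text: str) -> bool:
    """Plain version: split on dots and check each number explicitly."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            return False                  # empty, signed, or non-numeric
        if len(part) > 1 and part[0] == "0":
            return False                  # leading zero, e.g. "01"
        if int(part) > 255:
            return False
    return True


for candidate in ["192.168.0.1", "256.1.1.1", "1.2.3", "01.2.3.4"]:
    print(candidate, is_ipv4_regex(candidate), is_ipv4_plain(candidate))
```

Both functions should agree on every input; the difference is in how much effort it takes a reviewer to convince themselves of that.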

Similarly, I use a program at work which was written in Clarion. I have a copy of the source code too, but I don't have a copy of the Clarion IDE (since that costs about £1000). Going through the code in a text editor, it looks as if the form layout code is all mixed in with the database code, so it's very difficult for me to figure out what's going on. Arguably the same thing applies to something like GameBase, for people who don't have a copy of VB6.

From: pozorvlak
Date: April 28th, 2008 07:24 pm (UTC)
Yikes! No, I hadn't seen that before. I could, given enough time, work out how it works, but I certainly can't read it easily.

If that's the form in which the code was written, that's the form that should be distributed. OTOH, if that was generated using some script, then the script should be distributed.

From: rjw1
Date: April 28th, 2008 12:08 pm (UTC)
The source being there does mean that the bad people can find flaws. Since they don't find too many, it can't be all bad. Also, it does mean that security fixes can be more immediate.

From: rjw1
Date: April 28th, 2008 12:09 pm (UTC)
This point makes more sense in my head. I'm just failing to articulate it.

From: shuripentu
Date: April 28th, 2008 02:18 pm (UTC)
More beer!

From: johnckirk
Date: April 28th, 2008 07:25 pm (UTC)
I think I know what you mean - you're arguing against "security by obscurity"?

From: elvum
Date: April 28th, 2008 05:08 pm (UTC)
You appear to be saying in your analysis of open source "ideology" that you don't have time to read all the code in a large open source project, and therefore you wouldn't have the time to make changes to a small open source project. Is that really what you meant?

From: johnckirk
Date: April 28th, 2008 06:22 pm (UTC)
No, that's not what I meant at all, so I apologise if that was unclear. There are two separate things that I'm saying:

1) I don't have time to read all the code in a large open source project, so I can't personally say that it's secure. I therefore have to rely on other people to review the code for me, in which case it makes no difference (in this context) whether the code is open source or closed source.

2) I do have the time to make changes to a small open source project, e.g. GameBase.

From: pozorvlak
Date: April 28th, 2008 07:26 pm (UTC)
"I therefore have to rely on other people to review the code for me, in which case it makes no difference (in this context) whether the code is open source or closed source."

The point is more that interested third parties (who might not be you) can see the source, and fix bugs - you're not reliant on the (hardly disinterested) vendor. Also, the bad guys can see the source, so there's no temptation to rely on "security through obscurity" - you have to actually design your code for security from the start.