Thursday, February 22, 2007

A Watershed for Open Source

Open-source software isn't a new phenomenon. It has been winding its way through the tech world for decades, starting with Richard Stallman's Free Software movement in 1980s. But only in recent years have businesses warmed to the promise of low-cost, openly available software. In fact, open-source programs have become so popular, they now pose a legitimate threat to the established software giants.

Looking back, 2005 will likely be viewed as a turning point. It was a year when CIOs signed off on open-source projects, a big change from previous years when that happened only after low-level engineers started such projects on their own initiative. It was a year when venture capitalists woke up to the new business opportunities of open source. It was a year when open source was the word on the lips of not just early adopters but of an early majority. According to a new study by consulting firm Optaros, 87% of organizations are now using open-source software, somewhere.

BusinessWeek Online paused in the final days of 2005 to poll a dozen experts, investors, early adopters, and entrepreneurs to get their take on the five biggest open-source events of 2005 -- as well as what to expect for 2006. The following are based on their responses.

1. Red Hat finally proves to everyone it can make money from free software. It took Red Hat (RHAT), which sells and supports a version of the Linux operating system for businesses, nearly 10 years to find its footing, but boy has it. On Dec. 22 it announced stellar third-quarter earnings, with revenues up 43.6%, to $73.1 million, and profits up 114%, to 12 cents per share.

Finally, the Linux movement has a pure open-source success story to point to, and as practically the only vendor that's publicly traded, Red Hat has become a hot commodity. The stock is trading north of $28 as of Dec. 27, up from $13.06 at the beginning of 2005 -- a boost of more than 110%.

And Wall Street is bullish about next year. "Red Hat is one of the best-positioned stocks in software and should be able to further capitalize on the growing demand for open source," wrote Credit Suisse First Boston analyst Jason Maynard in a post-earnings research note.

2. Sun Microsystems open sources everything -- except Java. One reason Linux is becoming mainstream is the broad endorsement from just about everyone who matters in techdom, whether it's Dell (DELL) or Hewlett-Packard (HPQ), whose servers run Linux, or IBM (IBM), which is making a name in open-source support and integration.

Enter Sun Microsystems (SUNW), which made a bold move in late November to open-source almost all of its software except Java. The move transformed Sun into one of the largest open-source software players overnight. Yet critics have complained that what open-source developers really want is Java.

Several experts expect that Sun might finally capitulate in 2006. "It took them a long time to realize if you don't open-source and you're not a market leader, you're dead," says Peter Yared, CEO of open-source startup Active Grid and a former Sun executive.

Novell (NOVL) is another company trying to revive its business through open source. The results have been mixed since it bought Red Hat competitor Suse Linux two years ago. Look for 2006 to be the year it gets its act together -- or gets a new management team (see BW, 10/31/05, "Cold Realities for Novell").

3. Motorola bets big on mobile Linux. Linux is commonplace on servers and is working its way onto many desktops around the world. But desktop- and server-makers don't have to worry about details like battery life. Wireless-phone manufacturers do, and that's Linux' next great frontier. Open Source Development Labs, a nonprofit group that governs and advocates for Linux, formed a Mobile Linux Initiative in October to address these problems.

Even more exciting for penguin lovers, Motorola (MOT), the second-biggest handset maker in the world, announced that Linux would be its standard operating system for the bulk of its future phones. If the OSDL makes progress on the code, other handset makers could follow suit in 2006 (see BW Online, 11/8/05, "Linux Answers Phone Makers' Call").

4. Firefox goes mainstream. The bulk of open-source strides have been made in the business world, as most Linux phones are only sold in China, and Microsoft (MSFT) still dominates the desktop. Firefox is an important exception. The popular browser marked its 100 millionth download in October just before its first birthday, proving how well a mass market can accept open-source software when done right.

"There was a question as to whether we [open-source developers] could do user interfaces, and that's much less of a question now," says Bruce Perens, head of developer relations for open-source startup SourceLabs. Perens and some others think Linux desktop programs could gain steam among consumers in 2006, particularly in emerging countries in Asia and South America where Microsoft's Windows hasn't gained dominance.

5. Venture capitalists wake up to open source. Industry estimates show some $400 million was invested in open-source startups in 2005. Two types of companies dominated the landscape: First, so-called application companies, such as SugarCRM which makes customer relationship management software for companies and aims to compete with Siebel (SEBL) and (CRM).

The other category is services companies, which play the middleman between open-source projects and the info-tech departments at large corporations. Companies such as SpikeSource and SourceLabs test and maintain applications like SugarCRM for companies (see BW Online, 10/3/05, "Open Source: Now It's an Ecosystem").

There's a lot of skepticism about these newer entrants. A few are hits, such as MySQL, which makes open-source database software and is said to be closing in on $40 million in revenues this year. But not too many others are showing much traction.

In 2006, they'll have to put up real revenues or shut down. "Half the companies that raised venture money in 2005 won't be able to raise money in 2006," says Matt Asay who organizes the annual Open Source Business Conference and is vice-president for business development at Alfresco, an open-source document-management startup.

All in all, it has been a pretty great year for open source. And 2006 may be even bigger and better.

Open-Source Software Development


Open source is software developed by uncoordinated but loosely collaborating programmers, using freely distributed source code and the communications infrastructure of the Internet. Open source has a long history rooted in the Hacker Ethic. The term open source was adopted in large part because of the ambiguous nature of free software. Various categories of free and non-free software are commonly referenced, some with interchangeable meanings. Several licensing agreements have therefore been developed to formalize distribution terms. The Cathedral and the Bazaar is the most frequently cited description of the open-source development methodology , however although the paper identifies many mechanisms of successful open-source development, it does not expose the dynamics. There are literally hundreds, if not thousands, of open-source projects currently in existence.

1. Introduction

Open source has generated a considerable amount of interest over the past year. The concept itself is based on the philosophy of free software, which advocates freely available source code as a fundamental right. However, open source extends this ideology slightly to present a more commercial approach that includes both a business model and development methodology.

Open Source Software, or OSS, refers to software for which the source code is distributed without charge or limitations on modifications. Open source sells this approach as a business model, emphasizing faster development and lower overhead, as well as a closer customer relationship and exposure to a broader market.

Open source also encompasses a software development methodology. Open source development is described as a rapid evolutionary process, which leverages large-scale peer review. The basic premise is that allowing source code to be freely modified and redistributed encourages collaborative development. The software can be incrementally improved and more easily tested, resulting in a highly reliable product.

1.1 Background

Much of the Internet infrastructure is open-source software. For example, Sendmail is the dominant mail transfer system on the Internet. BIND is the most widely used implementation of the Internet Domain Name System, and InterNetNews is the most popular Usenet news server. (O’Reilly, 1998a) It is therefore no surprise that the momentum associated with open source has coincided with the rapid growth of the Internet. The Web has made collaboration between programmers easier and possible on a larger scale than before, and projects such as Linux and Apache have become immensely successful. The projected size of various open-source communities is shown in Table 1 (O'Reilly, 1998b).

Table 1. Projected size of open-source communities.

Estimating size of user community
Linux 7,000,000
Perl 1,000,000
BSD 960,000
Apache 400,000
Sendmail 350,000
Python 325,000
Tcl/Tk 300,000
Samba 160,000

The response from the software industry has been varied, but open source has made some notable inroads in a relatively short time. IBM has adopted the Apache Web server as the cornerstone of its WebSphere Internet-commerce application server (IBM, 1998). IBM has also released the source code for an XML parser and Jikes, a Java byte code interpreter (Gonsalves, 1998). Netscape has released the source code for the next generation of its popular Communicator product, restructuring ongoing development as an open-source project (Charles, 1998). Apple has taken a similar approach, releasing portions of its next generation operating system, MacOS X, as open source (Apple, 1999a). Microsoft acknowledged open source as a potential business threat in an internal memo that was subsequently leaked to the press (Valloppillil, 1998), and has recently indicated that it may consider releasing some code.

These developments demonstrate a sustained interest in open source, and it is quickly becoming a viable alternative to conventional methods of software development, as companies attempt to leverage the Internet in reducing time to market.

2. History

2.1 The Hacker Ethic

Open source is firmly rooted in the Hacker Ethic. In the late 1950’s, MIT’s computer culture originated the term hacker, defined today as "a person who enjoys exploring the details of programmable systems …" (Raymond, 1996). Various members of the Tech Model Railroad Club, or TMRC, formed the nucleus of MIT’s Artificial Intelligence Laboratory. These individuals were obsessed with the way systems worked. The word hack had long been used to describe elaborate college pranks devised by MIT students, however TMRC members used the word to describe a task ‘imbued with innovation, style, and technical virtuosity" (Levy, 1984). A project undertaken not solely to fulfill some constructive goal, but with some intense creative interest was called a hack.

Projects encompassed everything electronic, including constant improvements to the elaborate switching system controlling the TMRC’s model railroad. Increasingly though, attentions were directed toward writing computer programs, initially for an IBM 704 and later on the TX-0, one of the first transistor-run computers in the world. Early hackers would spend days working on programs intended to explore the limits of these machines.

In 1961, MIT acquired a PDP-1, the first minicomputer, designed not for huge number-crunching tasks but for scientific inquiry, mathematical formulations, and of course hacking. Manufactured by Digital Equipment Corporation, the PDP series of computers pioneered commercial interactive computing and time-sharing operating systems. MIT hackers developed software that was freely distributed by DEC to other PDP owners. Programming at MIT became a rigorous application of the Hacker Ethic, a belief that "access to computers – and anything which might teach you something about the way the world works – should be unlimited and total" (Levy, 1984).

2.2 ARPAnet

MIT was soon joined by Stanford University’s Artificial Intelligence Laboratory and later Carnegie-Mellon University. All were thriving centres of software development able to communicate with each other through the ARPAnet, the first transcontinental, high-speed data network. Built by the Defense Department in the late 1960’s, it was originally designed as an experiment in digital communication. However, the ARPAnet quickly grew to link hundreds of universities, defense contractors, and research laboratories. This allowed for the free exchange of information with unprecedented speed and flexibility, particularly software.

Programmers began to actively contribute to various shared projects. These early collaborative efforts led to informal principles and guidelines for distributed software development stemming from the Hacker Ethic. The most widely known of these projects was UNIX, which contributed to the ongoing growth of what would eventually become the Internet.

2.3 Unix and BSD

Unix was originally developed at AT&T Bell Labs, and was not strictly speaking a freely available product. However, it was licensed to universities for a nominal sum, which resulted in an explosion of creativity as programmers built on each other’s work.

Traditionally, operating systems had been written in assembler to maximize hardware efficiency, but by the early 1970’s hardware and compiler technology had become good enough that an entire operating system could be written in a higher level language. UNIX was written in C, and this provided unheard of portability between hardware platforms, allowing programmers to write software that could be more easily shared and dispersed.

The most significant source of Unix development outside of Bell Labs was the University of California at Berkeley. UC Berkeley’s Computer Science Research Group folded their own changes and other contributions into a succession of releases. Berkley Unix came to be known as BSD, or Berkley Standard Distribution, and included a rewritten file system, networking capabilities, virtual memory support, and a variety of utilities (Ritchie, 1979).

A few of the BSD contributors founded Sun Microsystems, marketing Unix on 68000-based hardware. Rivalry ensued between supporters of Berkley Unix and AT&T versions. This intensified in 1984, when AT&T divested and Unix was sold as a commercial product for the first time through Unix System Laboratories.

2.4 The GNU Project

The commercialization of Unix not only fractured the developer community, but it resulted in a confusing mass of competing standards that made it increasingly difficult to develop portable software. Other companies had entered the marketplace, selling various proprietary versions of Unix. Development largely stagnated, and Unix System Laboratories was sold to Novell after efforts to create a canonical commercial version failed. The GNU project was conceived in 1983 to rekindle the cooperative spirit that had previously dominated software development.

GNU, which stands for GNU’s Not Unix, was initiated under the direction of Richard S. Stallman, who had been a later participant in MIT’s Artificial Intelligence Lab and believed strongly in the Hacker Ethic. The GNU project had the ambitious goal of developing a freely available Unix-like operating system that would include command processors, assemblers, compilers, interpreters, debuggers, text editors, mailers, and much more. (FSF, 1998a)

Stallman created the Free Software Foundation, an organization that promotes the development and use of free software, particularly the GNU operating system (FSF, 1998c). Hundreds of programmers created new, freely available versions of all major Unix utility programs. Many of these utilities were so powerful that they became the de facto standard on all Unix systems. However, a project to create a replacement for the Unix kernel faltered.

By the early 1990’s, the proliferation of low-cost, high-performance personal computers along with the rapid growth of the World Wide Web had reduced entry barriers to participation in collaborative projects. Free software development extended to reach a much larger community of potential contributors, and projects such as Linux and Apache became immensely successful, prompting a further formalism of hacker best practices.

2.5 The Cathedral and the Bazaar

The Cathedral and the Bazaar (Raymond, 1998a), a position paper advocating the Linux development model, was first presented at Linux Kongress 97 and made widely available on the Web shortly thereafter. The paper presents two singular approaches to software development. The Cathedral represents conventional commercial practices, where developers work using a relatively closed, centralized methodology. In contrast, the Bazaar embodies the Hacker Ethic, in which software development is an openly cooperative effort.

The paper essentially ignored contemporary techniques in software engineering, using the Cathedral as a pseudonym for the waterfall lifecycle of the 1970s (Royce, 1970), however it served to attract widespread attention. A grassroots movement quickly developed, culminating in a January 1998 announcement that Netscape Communications would release the source code for its Web browser. This was the first time that a Fortune 500 company had transformed an enormously popular commercial product into free software.

The term Open Source was coined shortly afterward out of a growing realization that free software development could be marketed as a viable alternative to commercial companies.

3. Definition

The term open source was adopted in large part because of the ambiguous nature of the expression free software. The notion of free software does not mean free in the financial sense, but instead refers to the users' freedom to run, copy, distribute, study, change and improve software. Confusion over the meaning can be traced to the problem that, in English, free can mean no cost as well as freedom. In most other languages, free and freedom do not share the same root; gratuit and libre, for instance. "To understand the concept, you should think of free speech, not free beer," writes Richard Stallman (FSF, 1999a).

3.1 Categories of Free and Non-Free Software

Due to the inherent ambiguity of the terminology, various wordings are used interchangeably. This is misleading, as software may be interpreted as something it is not. Even closely related terms such as free software and open source have developed subtle distinctions. (FSF, 1998b)

3.1.1 Public Domain

Free software is often confused with public domain software. If software is in the public domain, then it is not subject to ownership and there are no restrictions on its use or distribution. More specifically, public domain software is not copyrighted. If a developer places software in the public domain, then he or she has relinquished control over it. Someone else can take the software, modify it, and restrict the source code.

3.1.2 Freeware

Freeware is commonly used to describe software that can be redistributed but not modified. The source code is not available, and consequently freeware should not be used to refer to free software.

3.1.3 Shareware

Shareware is distributed freely, like freeware. Users can redistribute shareware, however anyone who continues to use a copy is required to pay a modest license fee. Shareware is seldom accompanied by the source code, and is not free software.

3.1.4 Open Source

Open source is used to mean more or less the same thing as free software. Free software is "software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee." (FSF, 1999a) In particular, this means that source code must be available.

Free software is often used in a political context, whereas open source is a more commercially oriented term. The Free Software Foundation advocates free software as a right, emphasizing the ethical obligations associated with software distribution (Stallman, 1999). Open source is commonly used to describe the business case for free software, focusing more on the development process rather than any underlying moral requirements.

3.2 Licensing

Various free software licenses have been developed. The licenses each disclaim all warranties. The intent is to protect the author from any liability associated with the software. Since the software is provided free of charge, this would seem to be a reasonable request.

Table 2 provides a comparison of several common licensing practices (Perens, 1999).

Table 2. Comparison of licensing practices.

License Can be mixed with non-free software Modifications can be taken private and not returned to you Can be re-licensed by anyone Contains special privileges for the original copyright holder over your modifications
Public Domain X X X

3.2.1 Copyleft and the GNU Public License

Copyleft is a concept originated by Richard Stallman to address problems associated with placing software in the public domain. As mentioned previously, public domain software is not copyrighted. Someone can make changes to the software, many or few, and distribute the result as a proprietary product. People who receive the modified product may not have the same freedoms that the original author provided. Copyleft says that "anyone who redistributes the software, with or without changes, must pass along the freedom to further copy and change it." (FSF, 1999b)

To copyleft a program, first it is copyrighted and then specific distribution terms are added. These terms are a legal instrument that provide rights to "use, modify, and redistribute the program's code or any program derived from it but only if the distribution terms are unchanged." (FSF, 1999b)

In the GNU project, copyleft distribution terms are contained in the GNU General Public License, or GPL. The GPL does not allow private modifications. Any changes must also be distributed under the GPL. This not only protects the original author, but it also encourages collaboration, as any improvements are made freely available

Additionally, the GPL does not allow the incorporation of licensed programs into proprietary software. Any software that does not grant as many rights as the GPL is defined as proprietary. However, the GPL contains certain loopholes that allow it to be used with software that is not entirely free. Software libraries that are normally distributed with the compiler or operating system may be linked with programs licensed under the GPL. The result is a partially-free program. The copyright holder has the right to violate the license, but this right does not extend to any third parties who redistribute the program. Subsequent distributions must follow all of the terms of the license, even those that the copyright holder violates.

An alternate form of the GPL, the GNU General Library Public License or LGPL, allows the linking of free software libraries into proprietary executables under certain conditions. In this way, commercial development can also benefit from free software. A program covered by the LGPL can be converted to the GPL at any time, but that program, or anything derived from it, cannot be converted back to the LGPL.

The GPL is a political manifesto as well as a software license, and much of the text is concerned with explaining the rationale behind the license. Unfortunately this political dialogue has alienated some developers. For example, Larry Wall, creator of Perl and the Artistic license, says "the FSF [Free Software Foundation] has religious aspects that I don’t care for" (Lash, 1998). As a result, some free software advocates have created more liberal licensing terms, avoiding the political rhetoric associated with the GPL.

3.2.2 The X, BSD, and Apache Licenses

The X license and the related BSD and Apache licenses are very different from the GPL and LGPL. The software originally covered by the X and BSD licenses was funded by monetary grants from the US government. In this sense, the public owned the software, and the X and BSD licenses therefore grant relatively broad permissions.

The most important difference is that X-licensed modifications can be made private. An X-licensed program can be modified and redistributed without including the source or applying the X license to the modifications. Other developers have adopted the X license and its variants, including the BSD and the Apache web server.

3.2.3 The Artistic License

The Artistic license was originally developed for Perl, however it has since been used for other software. The terms are more loosely defined in comparison with other licensing agreements, and the license is more commercially oriented. For instance, under certain conditions modifications can be made private. Furthermore, although sale of the software is prohibited, the software can be bundled with other programs, which may or may not be commercial, and sold.

3.2.4 The Netscape Public License and the Mozilla Public License

The Netscape Public License, or NPL, was originally developed by Netscape. The NPL contains special privileges that apply only to Netscape. Specifically, it allows Netscape to re-license code covered by the NPL to third parties under different terms. This provision was necessary to satisfy proprietary contracts between Netscape and other companies. The NPL also allows Netscape to use code covered by the NPL in other Netscape products without those products falling under the NPL.

Not surprisingly, the free software community was somewhat critical of the NPL. Netscape subsequently released the MPL, or Mozilla Public License. The MPL is similar to the NPL, but it does not contain exemptions. Both the NPL and the MPL allow private modifications.

3.2.5 The Open Source Definition

The Open Source Definition is not a software license. Instead it is a specification of what is permissible in a software license for that software to be considered open source. The Open Source Definition is based on the Debian free software guidelines or social contract, which provides a framework for evaluating other free software licenses.

The Open Source Definition includes several criteria, which can be paraphrased as follows (OSI, 1999):

  1. Free Redistribution – Copies of the software can be made at no cost.
  2. Source Code – The source code must be distributed with the original work, as well as all derived works.
  3. Derived Works – Modifications are allowed, however it is not required that the derived work be subject to the same license terms as the original work.
  4. Integrity of the Author’s Source Code – Modifications to the original work may be restricted only if the distribution of patches is allowed. Derived works may be required to carry a different name or version number from the original software.
  5. No Discrimination Against Persons or Groups – Discrimination against any person or group of persons is not allowed.
  6. No Discrimination Against Fields of Endeavor – Restrictions preventing use of the software by a certain business or area of research are not allowed.
  7. Distribution of License – Any terms should apply automatically without written authorization.
  8. License Must Not Be Specific to a Product – Rights attached to a program must not depend on that program being part of a specific software distribution.
  9. License Must Not Contaminate Other Software – Restrictions on other software distributed with the licensed software are not allowed.

The GNU GPL, BSD, X Consortium, MPL, and Artistic licenses are all examples of licenses that conform to the Open Source Definition.

The evaluation of a proposed license elicits considerable debate in the free software community. With the growing popularity of open source, many companies are developing licenses intended to capitalize on this interest. Some of these licenses conform to the Open Source Definition, however others do not. For example, the Sun Community Source License approximates some open source concepts, but it does not conform to the Open Source Definition. The Apple Public Source License, or APSL (Apple, 1999b), as been alternately endorsed and rejected by members of the open-source community.

4. Methodology

The Cathedral and the Bazaar is the most frequently cited description of the open-source development methodology. Eric Raymond’s discussion of the Linux development model as applied to a small project is a useful commentary. However, it should be noted that although the paper identifies many mechanisms of successful open-source development, it does not expose the dynamics. In this sense, the description is inherently weak.

4.1 Plausible Promise

Raymond remarks that it would be difficult to originate a project in bazaar mode. To build a community, a program must first demonstrate plausible promise. The implementation can be crude or incomplete, but it must convince others of its potential. This is given as a necessary precondition of the bazaar, or open-source, style.

Interestingly, many commercial software companies use this approach to ship software products. Microsoft, for example, consistently ships early versions of products that are notoriously bug ridden. However as long as a product can demonstrate plausible promise, either by setting a standard or uniquely satisfying a potential need, it is not necessary for early versions to be particularly strong.

Critics suggest that the effective utilization of bazaar principles by closed source developers implies ambiguity. Specifically, that the Cathedral and the Bazaar does not sufficiently describe certain aspects of the open-source development process (Eunice, 1998).

4.2 Release Early, Release Often

Early and frequent releases are critical to open-source development. Improvements in functionality are incremental, allowing for rapid evolution, and developers are "rewarded by the sight of constant improvement in their work." (Raymond, 1998a)

Product evolution and incremental development are not new. Mills initially proposed that any software system should be grown by incremental development (Mills, 1971). Brooks would later elaborate on this concept, suggesting that developers should grow rather than build software, adding more functions to systems as they are run, used, and tested (Brooks, 1986). Basili suggested the concept of iterative enhancement in large-scale software development (Basili and Turner, 1975), and Boehm proposed the spiral model, a evolutionary prototyping approach incorporating risk management (Boehm, 1986).

Open source relies on the Internet to noticeably shorten the iterative cycle. Raymond notes that "it wasn’t unknown for [Linus] to release a new kernel more than once a day." (Raymond, 1998a) Mechanisms for efficient distribution and rapid feedback make this practice effective.

However, successful application of an evolutionary approach is highly dependent on a modular architecture. Weak modularity compromises change impact and minimizes the effectiveness of individual contributors. In this respect, projects that do not encourage a modular architecture may not be suitable for open-source development. This contradicts Raymond’s underlying assertion, that open source is a universally better approach.

4.3 Debugging is Parallelizable

Raymond emphasizes large-scale peer review as the fundamental difference underlying the cathedral and bazaar styles. The bazaar style assumes that "given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone." Debugging requires less coordination relative to development, and thus is not subject "to the same quadratic complexity and management costs that make adding developers problematic." (Raymond, 1998a)

The basic premise is that more debuggers will contribute to a shorter test cycle without significant additional cost. In other words, "more users find more bugs because adding more users adds more ways of stressing the program." (Raymond, 1998a) However, open source is not a prerequisite for peer review. For instance, various forms of peer review are commonly employed in software engineering. The question might then become one of scale, but Microsoft practices beta-testing on a scale matched only by larger open-source projects.

Raymond continues, suggesting that debugging is even more efficient when users are co-developers, as is most often the case in open-source projects. This is also subject to debate. Raymond notes that each tester "approaches the task of bug characterization with a slightly different perceptual set and analytical toolkit, a different angle on the problem." (Raymond, 1998a) This is characterized by the fact that developers and end-users evaluate products in very different ways. It therefore seems likely that peer review under the bazaar model would be constrained by a disproportionate number of co-developers.

5. Project Profiles

There are literally hundreds, if not thousands, of open-source projects currently in existence. These projects include operating systems, programming languages, utilities, Internet applications and many more. The following projects are notable for their influence, size, and success.

5.1 Linux

Linux is a Unix-like operating system that runs on several platforms, including Intel processors, Motorola MC68K, and DEC Alphas (SSC, 1998). It is a superset of the POSIX specification, with SYS V and BSD extensions. Linux began as a hobby project of Linus Torvalds, a graduate student at the University of Helsinki. The project was inspired by his interest in Minix, a small Unix system developed primarily as an educational tool by Andy Tannenbaum. Linus set out to create, in his own words, "a better Minix than Minix." In October 1991, Linus announced the first official release of Linux, version 0.02. Since then, hundreds of programmers have contributed to the ongoing improvement of the system.

Linux kernel development is largely coordinated through the linux-kernel mailing list. The list is high volume, and currently includes over 200 active developers as well as many other debuggers and testers. With the growth of the project, Linus has relinquished control over certain areas of the kernel, such as file systems and networking, to other ‘trusted lieutenants." However, Linus remains the final authority on decisions related to kernel development. The kernel is under the GPL, and official versions are made available via ftp.

Arguably the most well known open-source project, Linux has quietly gained popularity in academia as well as among scientific researchers and Internet service providers. Recently, it has made commercial advances, and is currently marketed as the only viable alternative to Microsoft Windows NT. A study by International Data Corporation reported that Linux accounted for 17.2 % of server operating system shipments in 1998, an increase of 212% over the previous year (Shankland, 1998). The Linux kernel is typically packaged with the various other programs that comprise a Unix operating system. Several commercial companies currently sell these packages as Linux distributions.

5.2 Apache

Apache originated in early 1995 as a series of enhancements to the then-popular public domain HTTP daemon developed by Rob McCool at the National Center for Supercomputing Applications, or NCSA. Rob McCool had left NCSA in mid 1994, and many Webmasters had become frustrated with a lack of further development. Some proceeded to develop their own fixes and improvements. A small group coordinated these changes in the form of patches and made the first official release of the Apache server in April 1995, hence the name A PAtCHy server. (Laurie, 1999)

The Apache Group is currently a core group of about 20 project contributors, who now focus more on business issues and security problems. The larger user community manages mainstream development. Apache operates as a meritocracy, in a format similar to most open-source projects. Responsibility is based on contribution, or "the more work you have done, the more work you are allowed to do." (The Apache Group, 1999) Development is coordinated through the new-httpd mailing list, and a voting process exists for conflict resolution.

Apache has consistently ranked as the most popular Web server on the Internet (Netcraft, 1999). Currently, Apache dominates the market and is more widely used than all other Web servers combined. Industry leaders such as DEC, UUNet, and Yahoo use Apache. Several companies, including C2Net, distribute commercial versions of Apache, earning money for support services and added utilities.

5.3 Mozilla

Mozilla is an open-source deployment of Netscape’s popular Web browsing suite, Netscape Communicator. Netscape’s decision was strongly influenced by a whitepaper written by employee Frank Hecker (Hecker, 1998), which referenced the Cathedral and the Bazaar. In January 1998, Netscape announced that the source code for the next generation of Communicator would be made freely available. The first developer release of the source code was made in late March 1998. exists as a group within Netscape responsible for coordinating development. Mozilla has established an extensive web site, which includes problem reporting and version management tools. Discussion forums are available through various newsgroups and mailing lists. The project is highly modular and consists of about 60 groups, each responsible for a particular subsystem. All code issued in March was released under the NPL. New code can be released under the MPL or any compatible license. Changes to the original code are considered modifications and are covered by the NPL.

Although it has benefited from widespread media exposure, Mozilla has yet to result in a production release. It is therefore difficult to evaluate the commercial success of the project. The recent merger of AOL and Netscape has introduced additional uncertainty, but many continue to feel confident that the project will produce a next generation browser.

5.4 Perl and Python

Perl and Python are mature scripting languages that have achieved considerable market success. Originally developed in 1986 by Larry Wall, Perl has become the language of choice for system and network administration, as well as CGI programming. Large commercial Web sites such as Yahoo and Amazon make extensive use of Perl to provide interactive services.

Perl, which stands for Practical Extraction and Report Language, is maintained by a core group of programmers via the perl5porters mailing list. Larry Wall retains artistic control of the language, but a well-defined extension mechanism allows for the development of add-on modules by independent programmers. (Wall et al, 1996)

Python was developed by Guido van Rossum at Centrum voor Wiskunde en Informatica, or CWI, in Amsterdam. It is an interactive, object-oriented language and includes interfaces to various system calls and libraries, as well as to several popular windowing systems. The Python implementation is portable and runs on most common platforms. (Lutz, 1996)

5.5 KDE and GNOME

KDE and GNOME are X11 based desktop environments. KDE also includes an application development framework and desktop office suite. The application framework is based on KOM/OpenParts technology, and leverages open industry standards such as the object request broker CORBA 2.0. The office suite, KOffice, consists of a spreadsheet, a presentation tool, an organizer, and an email and news client.

GNOME, or the GNU Network Object Model Environment, is similar in many ways to KDE. However GNOME uses the gtk+ toolkit, which is also open source, whereas KDE uses Qt, a foundation library from Troll Tech that was commercially licensed until recently.

KDE and GNOME are interesting because they represent the varying commitments in the open source community to commercial markets and the free software philosophy. The KDE group and Troll Tech initially tried to incorporate Qt, a proprietary product, into the Linux infrastructure. This was met with mixed reactions. The prospect of a graphical desktop for Linux was so attractive that some were willing to overlook the contradictory nature of the project. However, others rejected KDE and instead supported GNOME, which was initiated as a fully open source competitor. Eventually, Troll Tech realized Qt would not be successful in the Linux market without a change in license, and a new agreement was released, defusing the conflict. GNOME continues, aiming to best KDE in terms of functionality rather than philosophy (Perens, 1999).

5.6 Other Projects

Other lesser known, but equally interesting, projects include GIMP, FreeBuilder, Samba, and Kaffe. Each of these projects follows the open source methodology, originating under the direction of an individual or small group and rapidly extending to a larger development community.

GIMP, or the GNU Image Manipulation Program, can be used for tasks such as photo retouching, image composition and image authoring. GIMP was written by Peter Mattis and Spencer Kimball, and released under the GPL. FreeBuilder is a visual programming environment based on Java. It includes an integrated text editor, debugger, and compiler. Samba allows Unix systems to act as file and print servers on Microsoft Windows networks. Development is headed by Andrew Tridgell. Kaffe is a cleanroom implementation of the Java virtual machine and class libraries.

6. Summary and Conclusions

Open source is software developed by uncoordinated but loosely collaborating programmers, using freely distributed source code and the communications infrastructure of the Internet. Open source is based on the philosophy of free software. However, open source extends this ideology slightly to present a more commercial approach that includes both a business model and development methodology. Various categories of free and non-free software are commonly, and incorrectly, referenced, including public domain, freeware, and shareware. Licensing agreements such as the GPL have been developed to formalize distribution terms. The Open Source Definition provides a framework for evaluating these licenses.

The Cathedral and the Bazaar is the most frequently cited description of the open-source development methodology, however although the paper identifies many mechanisms of successful open-source development, it does not expose the dynamics. Critics note that certain aspects remain ambiguous, which suggests that the paper does not sufficiently describe the open-source development process.

There are hundreds, if not thousands, of open-source projects currently in existence. These projects face growing challenges in terms of scalability and inherently weak tool support. However open source is a pragmatic example of software development over the Internet. It provides interesting lessons with regard to large-scale collaboration and distributed coordination. If the open-source approach is adequately studied and evaluated, ideally the methodology might be applied in a broader context.

Free Speech and Free Beer: The Evolution of Open Source in a Massively Connected World

One of the big myths about open source software is that being free is its chief value. The myth, of course, is in the meaning people ascribe to "free."

As members of the Free Software Foundation and many others point out, the word "free" in this context is not about the price; it is about the liberty. "Free" here is as used in the phrase "free speech" (a right we all covet) rather than in the phrase "free beer" (which is always too good to be true) or "free kitten" (which sounds good but turns out to have a high overhead).

The issue is made more confusing because much of the affected software also has a zero price tag. A natural consequence of the GNU General Public License (GPL) that enforces the liberty of software developers to use the source code created by their peers is the inability of anyone to charge a premium for it or its derivatives. The innovation of the Open Source Initiative1 was to provide new language and new licenses with which to explore the conceptual space. The open source movement allowed the evolution of the concept of free source-code from its original socialist purity towards a more useful plan. It suggested alternatives to GPL licensing and promoted the possibility of hybrid open source/closed source works by working with companies to write business-friendly licenses.

Of course, the early years of the movement have focused on the free (as in beer) software as a natural consequence of their starting point, so it is still possible for people to misunderstand. But we have seen a definite shift in purpose for open source over the last few years. The open source community has welcomed companies that build commercial enterprises on open source code, as long as they do so symbiotically rather than parasitically. Today, based on Sun Microsystems' experiences with several diverse projects, it is clear that open source has matured into a well-tried, validated development methodology.

The Massively Connected Developer

The key values of the Internet flow from the massively connected mesh of participants, which Metcalfe's Law observes as leading to an exponentially growing pool of potential relationships. Complementing that are sets of loosely coupled, open and royalty-free standards, allowing both individuals and corporations to participate at the cost not of negotiating each relationship, but rather of establishing connection to the standards. The loosely coupled nature of the standards means that the relationships formed are flexible and tolerant, not brittle and fragile. This exponential-relationship, linear-cost world is termed the Net Effect and has been the primary energy source of the Internet revolution from its inception.

The Web came about on the Internet through the operation of the Net Effect, and today we need more than ever a development and deployment methodology that harnesses it. Open source has reached this evolutionary stage. It provides the ideal, loosely coupled yet rigorous environment for the massively connected community that comprises many of the key developments in the new Internet. What distinguishes projects like Apache, NetBeans, Linux and GNOME is less the price tag of the deliverable but rather the eclectic inclusiveness of the project itself. If a standard is a technology set where the community affected by changes controls the changes, then open source looks destined to be the foundation for standards in the 21st century.

Open source is not without cost. Someone has to underwrite the community. Developers have to donate their time and expertise. Sun's experience of underwriting and (and others) has been of the massively connected community of both commercial and individual end users and developers joining with generosity and cooperation to build the platform on which the solutions they need can be delivered. The evolution of these projects has revealed the existence of a commercially valid business model based on open source. It has become the development methodology of the Net Effect.

Where to Draw the Line

One of the best uses of the evolved open source development model is as the basis for business ideas where there is the potential of a large user community. A "community of code" maintains a code base of open source components or elements, preferably itself evolved "in the open," using the behaviors and principles of the open source movement well documented elsewhere2. These inherently lead to better code being created, debugged and documented faster, not least because of the scrutiny of the community. These benefits are obtained as long as there is a viable, massively connected community of interested parties to create, maintain and improve the code. While one might expect the "community of code" to be chaotic, it is in fact typically well regulated yet dynamic.

Open Source ModelFrom the "community of code", a gatekeeper function emerges, operating with the implicit consent of the community. The gatekeeper sets the bounds for the project (embodying the will of the community rather than imposing an external view) and creates the "reference implementation." Changes are made sufficiently infrequently to ensure stability of the solution layer above while paying due regard to the work of the community below. This gatekeeper is the key to the success of the evolved open source model, bridging the informality of the community with the stability needs of the commercial enterprise.

Above the reference implementation is the all-important dividing line between the community-based foundation platform and the commercial-based solution offerings that depend on it. The licenses used below the line foster and protect the community; those above the line facilitate commercial success in an area where a viable development community could not be sustained. Drawing the dividing line is the great art; the balance between community and commerce differs for every project.

To make this model clearer, it is helpful to consider an example, such as The community of code in this case is a democratic and independent community founded by Sun in 2000 and hosted by CollabNet at Sun's expense. The community is diverse, with thousands of subscribers and a number of full commercial offerings3 built on the reference implementation. Individual developers participate for the reason they always have -- to gain software that fits like a glove and the reputation for excellence that their work deserves.

Likewise Linux and Apache can be easily illustrated in the same way. In each case, the commercial offerings do not just consume the reference platform parasitically; rather, the engineers working on them join the community of code and share in its evolution.

Perhaps we need to refer to this new, massively connected methodology by a new name to help people avoid the intellectual traps that bias them against open source. Open method? Collaborative source? Collaborative commons? Suggestions welcomed!
Open or Closed?

Just as the open source movement has grown from idealism to commercial success, we need to evolve our understanding of it. The experience of Sun and others is that open source provides the ideal development and business models for today's massively connected, Net Effect economy. It is not about free stuff; it is about enfranchising every member of the user and development community, both commercial and individual developers. Today's software innovations need this Net Effect model more than ever before.

The opportunities for subtle commercial lock-in offered by our traditional understanding of 'standards' are largely avoided by it. Instead we see genuine collaborative innovation. With a closed model, innovation is the domain solely of the corporation, which can then monopolistically tax the results. With an open model, companies can gain their just compensation for their innovations "above the line" but there remains room for free innovation by all.

Most importantly, open source is not just about the code; it is about the community. You don't make a project open source simply by publishing the source code. It takes the potentially costly and time-consuming creation and maintenance of a community of code, a trusted gatekeeper function and a series of symbiotic commercial enterprises to make true open source. So true 21st century open source is not free. But it is liberating for the Net community.

Open Source Development

Most people who know anything about Linux know that the kernel – the core of the operating system, Linux itself really – is developed by Linus Torvalds and a large number of volunteers.

In a nutshell, Linus is the top dog, and the one responsible for guiding the overall process. Beneath him are people responsible for various kernel sections and even versions. One person might be in charge of maintaining a kernel through its production life cycle, such as Andrew Morton preparing to take care of kernel series 2.6. Others are in charge of various platforms (64-bit Sparc, Mac 68K, SGI, etc.). Yet more are in charge of subsystems, such as the layer that handles SCSI hardware operation. It's a sensible top-down approach that has grown from a need to manage a code base of everincreasing complexity in which both work and responsibility are divided among respected members of the community.

And yet, ultimately, anyone can get involved in the Linux kernel development process. You could, for example, assign someone at your company to function as a beta tester for the Linux kernel and the collection of Linux projects and products you use in your business. If having thousands of beta testers all over the world helps to produce top-notch software like we have in the Linux community, then making sure that your own people report problems you experience before taking a new kernel or tool version into a production environment increases the return on your Linux investment.

All those who want to contribute have to do is a bit of homework. A quick visit to the Linux Kernel Mailing List (LKML) FAQ at helps you understand the main kernel discussion list in all of its glory, and going to html teaches you how to effectively report bugs to the kernel maintainers. Even just testing the experimental kernel tree can be a great help, and you'll learn a ton along the way.

These are open source values. Everyone can contribute, even if they're not a programming guru. But there's a finer point to this as well. To really be helpful to many open source projects, you have to take the time to at least learn some rudimentary ways about their "system." Some have online forums, some have mailing lists, some are just a small Web presence with a single e-mail address where you can write the developer. It all depends on the size of the project and the audience.

The Linux kernel serves as an extreme example. Its mailing list alone is so busy that there are sites such as Kernel Traffic ( whose sole purpose is to summarize the information in a useful manner. On top of that, there are millions of users. Even if one-half of one percent of all Linux users sent bug reports to the list or directly to the various maintainers each day, that would be thousands of reports. Hence, a system. This also explains why blundering on without learning the system tends to get people grouchy responses.

Shared values are the glue that holds the open source community together. This is the single biggest thing that many journalists and skeptics still haven't grasped. It's not money, fame, or power. I'm not even entirely convinced it's all about the itch scratching we seem so fond of talking of in open source land, like everyone has fleas.

What are some more of the values that hold us together? Let me use an example to shed some light on the subject.

An Example: The Birth of ext3
Consider this once-contentious issue: adding a default journaling filesystem to Linux. Way back at the turn of the century (early 1999), Linus Torvalds and the gang were working on the 2.3 kernel series, on their way to kernel 2.4. Kernel list participant Alan Curry had been experiencing performance problems on a Linux server handling high traffic. He was able to trace this to a problem with two components: syslogd and fsync().

syslogd is the program that handles recording errors, accesses (such as a piece of mail being sent, or someone requesting a Web page), and more for the various services on many Linux systems. As you might imagine, on an ISP's e-mail server syslogd can grow quite busy. A feature called log rotation prevents individual log files from getting too huge by breaking them into pieces, and creating a new file each time the current file reaches a certain size. Since the files will add up infinitely if left alone, this feature also keeps only a set number of pieces around before either compressing them and farming them off for backup and deletion, or just outright deleting them. The system administrator can either set how often to do this, or put limits on how large to let the individual files grow.

Curry was able to determine that his problem hit whenever a particular log file grew huge, to approximately 36MB. At this stage, the syslogd program would consistently hang – it would stall and stop working – until the logfile was rotated and small once again. Tracing this issue further, he discovered that this was the fault of fsync(), the C programming function that ensures that the data in memory gets properly written into files.

The first suggestions from the kernel mailing list were all workarounds, things that various people would try on their own servers just to keep things moving along. One was to simply rotate the log files more often. That works, of course, but it's not really a solution. Others suggested another approach: disabling syslogd's use of fsync(). Of course, if you do that you may find after a system crash that there's vital data missing from your log files, so that's no good. Right?

Was fsync() needed? A patch was submitted, but the technical solution offered wasn't strong enough for transaction-oriented databases. Debate raged again, with Linus trying to push people toward simpler and simpler solutions rather than letting things get more complex, and therefore more likely to have problems. Extensions to the ext2 filesystem were proposed and Linus Torvalds said no, no, no, and again no.

While Torvalds is revered by many in the Linux community, he receives little special treatment on the kernel development list. Everyone involved in kernel development wants to do the best job possible, which means that discussions – or arguments, which is what this degenerated to for a bit – tend to happen with everyone as peers for the most part. Torvalds might have the last word, but that doesn't mean that people always let a topic drop if they think that there really is something to it.

Apparently, Stephen Tweedie had already started working on such extensions to ext2 in an attempt to quickly answer the need for a journaling filesystem in Linux – something that would definitely address the fsync() problem. This displeased Torvalds to no end, since he didn't want ext2 known as the everchanging filesystem, and pointing out that Tweedie was calling it ext3 did only a little to dull Torvalds' annoyance. Finally, in an exchange that would do an armchair psychologist proud, Alan Cox and Tweedie managed to help steer things to calmer waters.

Once there, the debate continued on just how far this journaling filesystem should go. These discussions – tense or otherwise – are one of the natural ways that innovation is constantly fostered in the Linux community. When the fact of a nextgeneration default filesystem was accepted, all of those little "wish lists" that lurk in the back of the mind started leaking out from all directions. Torvalds himself started this by outlining some of the immediate issues he would love to see dealt with, such as removing "." and ".." from the directory trees. In true open source developer fashion, that comment began a discussion about whether there were enough benefits or too many dangers in doing so.

Somehow, in all of this, the whole issue fell off the radar and folks must have left Tweedie to do his work in peace. Now he was aware of their concerns and wishes, and they simply must have trusted him to offer something to test and pound on when the time came. After all, it's one thing to talk about creating a journaled filesystem for Linux. It's another thing to do it.

ext3: A Work in Progress
A mere two weeks later, Thomas Pornin asked an innocent question about whether BSD-style softupdates were in the works for Linux. This brought up the issue of Tweedie's work on ext3 and an alreadyexisting solution called dtfs (now LinLogFS). A new filesystem permissions model somehow wormed its way into the discussion, sidetracking everything, and then in mid-1999 SGI announced that they were making a version of their own IRIX filesystem into an open source filesystem for Linux – XFS, which is a journaling filesystem.

Was Tweedie's work in vain? (Some would say that such projects are never in vain, since they often reveal issues that people might not otherwise have considered.) This would seem a great time for a cliffhanger, but everyone knows the answer. It was agreed that if XFS was placed under the GPL he might drop ext3. An SGI employee pointed out that XFS had to be partially rewritten to replace code that belonged to other people – and remove patent issues – so XFS wouldn't be ready to be placed into the open source domain any time soon. The folks at SGI didn't even know exactly which license they would choose yet. This put Tweedie's work back into the running, since no one was going to adopt a new default filesystem that wasn't actually written.

Once that furor died down, the fledgling ReiserFS became a serious contender. Timing issues prevented it from being included in the 2.3 kernel stream, and around a month later the issue of ext3 came up once again. By then, ext3 had attained the lofty status of release 0.0.1 with 0.0.2 on the way. Already, at this point, the only difference from a user's point of view between ext2 and ext3 was the journal file. Whether it would remain this way, Tweedie was still not sure.

Where are the values here? Well, for one thing, everyone was working on their own projects. No one committed to which would be the "winner" ahead of time. It might seem a bit backward to those from the commercial world of planning everything out and driving all of your resources into a single project, but in this type of environment, it's acknowledged that there are many valid means to achieving the same end. Filesystem theory is a complex issue. Today, we have journaling filesystems with various strengths and weaknesses to pick and choose from. Some handle tiny files best, some handle huge files best, some are that middle ground that's great in many circumstances.

When asked in January 2000 if LVM and filesystem journaling would be folded into kernel 2.4, whether with ext3 or ReiserFS, the general consensus on the kernel list was no. There were too many issues that needed to be ironed out before Torvalds and others felt ext3 was solid enough for production use. ReiserFS, however, was closer to reaching this point. Neither journaling filesystem ultimately made it into the initial 2.4 release – an interesting fact considering that ext3 is in such heavy use today.

ReiserFS did, however, make it into kernel 2.4.1 in 2001, mostly due to the fact that "of the journaling filesystems it's the only one I know of that is in major real production use already, and has been for some time," according to Torvalds.

XFS was also in heavy testing then, and so was ext3. However, Torvalds has a policy against just integrating anything and everything into the kernel. If a small group of people fully capable of patching the kernel themselves – or building the modules on their own – are the only people interested in a particular area (such as XFS in this case) then he chooses to wait until there is more demand. As far as ext3's demand went, Torvalds said, "I would expect ext3 to be the next filesystem to be integrated, but I would also expect that Red Hat will actually integrate it into their kernel first, and expect me to integrate it into the standard kernel only afterwards."

This little quirk of various distributions using slightly different kernels is another thing that confuses both new users and the businessfolk trying to track which version best suits them. These changes are made due to many factors, anything from developers or users requesting a particular nondefault feature, to a convenience for the distribution's own people. Innovation is continually fostered as the Linux distributions try to identify the very best tools that can help them solidify their positions against other distributions.

The key thing here, really, is that in Linux it is possible to exchange the core of your operating system for a different version. Anyone who doesn't like a distribution's specialized kernel can "simply" grab the source of the main kernel and build a replacement. It's actually not as hard as it sounds, though the process can be intimidating to newcomers.

ext3's Coming Out Party
In mid-2001, Andrew Morton (at the time, the kernel maintainer for ext2, ext3, and network drivers in 2.4) showed up as being involved in ext3. This fact signals that ext3 had been, in essence, escalated to the next level. His posts regarding ext3's status arrived around once a month, suggesting that ext3 was considered mature enough to be under serious consideration for merging into the kernel.

Then, by late September 2001, Morton released a test patch that integrated ext3 into kernel 2.4.09. This was very much a test for those who were brave enough to try it. Morton's announcement included, "This will soon be broken out into a separate patch to make ext3 suitable for submission for the mainstream kernel." In the next week, people started asking again when ext3 would be added, indicating the level of anticipation for those waiting for a journaling filesystem fully compatible with ext2.

Eventually, Alan Cox – the "next level up" maintainer – answered. "When the ext3 folk ask me to merge it," he said. His policy, it appeared, was not to merge patches into his test version of the kernel (known as the -ac tree) until the project's developers asked him to. Sometimes he can be overridden or will decide to make a special case, but typically the developers know exactly where they are when working with the code, and whether trying to merge it at the time would be a disaster or fairly smooth sailing.

So, people waited. Somewhere between then and October 8, 2001, Tweedie and his cohorts must have spoken up. On that day, ext3 was merged into Cox's version of the 2.4.10 kernel. This was the last major testbed. Many people testing new features that they desperately wanted or needed used an -ac kernel on various systems to try to shake out the bugs.

ext3 development still continued, of course. In early November 2001, Morton announced another significant ext3 update. People continued agitating for ext3 to be added in the next kernel version, and the next, and others asked Torvalds to wait until the current "big" problems – 2.4 was actually a pretty stable new release – with the 2.4 kernel were better ironed out.

The next issues that showed up are kind of odd and amusing, and while they aren't about values, are a demonstration of the strange things that can happen when a new technology is introduced. Red Hat added ext3 to 7.2 (as Torvalds predicted). Administrators using Red Hat 7.2 began making strange observations about the filesystem checker running on boot. The strange part was that this isn't necessary with ext3, nor was it the default behavior on a system using ext3. It turned out that, somehow, ext3 was not being properly enabled on those systems. People had been running ext2 all that time, instead. I'm sure this little gaffe was on developers' minds as ext3 came closer to being officially added.

By mid-November, ext3 reached Torvalds' own "test kernel," which means it was added into a "pre" version of the kernel. Using the kernel naming scheme, ext3 was officially added to kernel 2.4.15-pre2, which eventually became 2.4.15-final, which is the same as 2.4.15. There was one ext3 fix added in kernel 2.4.15-pre8, and then only two more tweaks to the fledgling kernel letter, and kernel 2.4.15 was released for production use on November 22, 2001. Of course, development of ext3 didn't stop there either. Since then, Access Control Lists (ACLs) have been incorporated into the filesystem, along with many more features and improvements.

(To give you an overall time line for how long it takes for even minor kernel versions to advance, the current Red Hat Linux beta [Severn] is [at the time of this writing] based on kernel 2.4.21.)

Organic, and Yet Organized
Throughout more than two years of work, many other features were added to the Linux kernel. Others were refined, and some were even removed. Kernel maintainers changed as well, according to both time constraints and interests. Even the process of posting new kernel versions was "upgraded." The team added ChangeLogs – files containing a list of the pertinent changes in each minor code update, including who made the changes – so that people can more easily track what in the heck is going on.

All of this happened in the midst of bug reports and fixes, discussions of the best way to approach upcoming requirements, and more. Ultimately, everything keeps moving. The Linux kernel grows and improves, and all of the bits and pieces find their way to where they need to be.

Ultimately, that is how open source development works. Bringing your own company into this process gives you a number of advantages. If you manufacture hardware, you can either assign someone to the Linux kernel team to produce the Linux drivers for your products, or you can give your product's specifications to someone from the driver community to build the drivers for you. Not only does this guarantee you that Linux users will consider your product, but it's great PR as well. Software companies can become involved in the Linux Standard Base (, develop their products to this specification, and have a Linux beta program to help the community feel involved in the product's development.

If there is one phrase that is true for the Linux and open source communities, it is this: You get out of it what you put into it. Work with the community, maybe even contribute some source code along the way, and you will experience not only a kind of product loyalty that just might astound you, but a stronger product offering as well.

Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!

1. Introduction

Open Source Software / Free Software (OSS/FS) (also abbreviated as FLOSS or FOSS) has risen to great prominence. Briefly, OSS/FS programs are programs whose licenses give users the freedom to run the program for any purpose, to study and modify the program, and to redistribute copies of either the original or modified program (without having to pay royalties to previous developers).

The goal of this paper is to convince you to consider using OSS/FS when you’re looking for software, using quantitive measures. Some sites provide a few anecdotes on why you should use OSS/FS, but for many that’s not enough information to justify using OSS/FS. Instead, this paper emphasizes quantitative measures (such as experiments and market studies) to justify why using OSS/FS products is in many circumstances a reasonable or even superior approach. I should note that while I find much to like about OSS/FS, I’m not a rabid advocate; I use both proprietary and OSS/FS products myself. Vendors of proprietary products often work hard to find numbers to support their claims; this page provides a useful antidote of hard figures to aid in comparing proprietary products to OSS/FS.

I believe that this paper has met its goal; others seem to think so too. The 2004 report of the California Performance Review, a report from the state of California, urges that “the state should more extensively consider use of open source software”, and specifically references this paper. A review at the Canadian Open Source Education and Research (CanOpenER) site stated “This is an excellent look at the some of the reasons why any organisation should consider the use of [OSS/FS]... [it] does a wonderful job of bringing the facts and figures of real usage comparisons and how the figures are arrived at. No FUD or paid for industry reports here, just the facts”. This paper been referenced by many other works, too. It’s my hope that you’ll find it useful as well.

The following subsections describe the paper’s scope, challenges in creating it, the paper’s terminology, and the bigger picture. This is followed by a description of the rest of the paper’s organization (listing the sections such as market share, reliability, performance, scalability, security, and total cost of ownership). Those who find this paper interesting may also be interested in the other documents available on David A. Wheeler’s personal home page.

1.1 Scope

As noted above, the goal of this paper is to convince you to consider using OSS/FS when you’re looking for software, using quantitive measures. Note that this paper’s goal is not to show that all OSS/FS is better than all proprietary software. Certainly, there are many who believe this is true from ethical, moral, or social grounds. It’s true that OSS/FS users have fundamental control and flexibility advantages, since they can modify and maintain their own software to their liking. And some countries perceive advantages to not being dependent on a sole-source company based in another country. However, no numbers could prove the broad claim that OSS/FS is always “better” (indeed you cannot reasonably use the term “better” until you determine what you mean by it). Instead, I’ll simply compare commonly-used OSS/FS software with commonly-used proprietary software, to show that at least in certain situations and by certain measures, some OSS/FS software is at least as good or better than its proprietary competition. Of course, some OSS/FS software is technically poor, just as some proprietary software is technically poor. And remember -- even very good software may not fit your specific needs. But although most people understand the need to compare proprietary products before using them, many people fail to even consider OSS/FS products, or they create policies that unnecessarily inhibit their use; those are errors this paper tries to correct.

This paper doesn’t describe how to evaluate particular OSS/FS programs; a companion paper describes how to evaluate OSS/FS programs. This paper also doesn’t explain how an organization would transition to an OSS/FS approach if one is selected. Other documents cover transition issues, such as The Interchange of Data between Adminisrations (IDA) Open Source Migration Guidelines (November 2003) and the German KBSt’s Open Source Migration Guide (July 2003) (though both are somewhat dated). Organizations can transition to OSS/FS in part or in stages, which for many is a more practical transition approach.

I’ll emphasize the operating system (OS) known as GNU/Linux (which many abbreviate as “Linux”), the Apache web server, the Mozilla Firefox web browser, and the office suite, since these are some of the most visible OSS/FS projects. I’ll also primarily compare OSS/FS software to Microsoft’s products (such as Windows and IIS), since Microsoft Windows has a significant market share and Microsoft is one of proprietary software’s strongest proponents. Note, however, that even Microsoft makes and uses OSS/FS themselves (they have even sold software using the GNU GPL license, as discussed below).

I’ll mention Unix systems as well, though the situation with Unix is more complex; today’s Unix systems include many OSS/FS components or software primarily derived from OSS/FS components. Thus, comparing proprietary Unix systems to OSS/FS systems (when examined as whole systems) is often not as clear-cut. This paper uses the term “Unix-like” to mean systems intentionally similar to Unix; both Unix and GNU/Linux are “Unix-like” systems. The most recent Apple Macintosh OS (MacOS OS X) presents the same kind of complications; older versions of MacOS were wholly proprietary, but Apple’s OS has been redesigned so that it’s now based on a Unix system with substantial contributions from OSS/FS programs. Indeed, Apple is now openly encouraging collaboration with OSS/FS developers.

1.2 Challenges

It’s a challenge to write any paper like this; measuring anything is always difficult, for example. Most of these figures are from other works, and it was difficult to find many of them. But there are two special challenges that you should be aware of: legal problems in publishing data, and dubious studies (typically those funded by a product vendor).

Many proprietary software product licenses include clauses that forbid public criticism of the product without the vendor’s permission. Obviously, there’s no reason that such permission would be granted if a review is negative -- such vendors can ensure that any negative comments are reduced and that harsh critiques, regardless of their truth, are never published. This significantly reduces the amount of information available for unbiased comparisons. Reviewers may choose to change their report so it can be published (omitting important negative information), or not report at all -- in fact, they might not even start the evaluation. Some laws, such as UCITA (a law in Maryland and Virginia), specifically enforce these clauses forbidding free speech, and in many other locations the law is unclear -- making researchers bear substantial legal risk that these clauses might be enforced. These legal risks have a chilling effect on researchers, and thus makes it much harder for customers to receive complete unbiased information. This is not merely a theoretical problem; these license clauses have already prevented some public critique, e.g., Cambridge researchers reported that they were forbidden to publish some of their benchmarked results of VMWare ESX Server and Connectix/Microsoft Virtual PC. Oracle has had such clauses for years. Hopefully these unwarranted restraints of free speech will be removed in the future. But in spite of these legal tactics to prevent disclosure of unbiased data, there is still some publicly available data, as this paper shows.

This paper omits or at least tries to warn about studies funded by a product’s vendor, which have a fundamentally damaging conflict of interest. Remember that vendor-sponsored studies are often rigged (no matter who the vendor is) to make the vendor look good instead of being fair comparisons. Todd Bishop’s January 27, 2004 article in the Seattle Post-Intelligencer Reporter discusses the serious problems when a vendor funds published research about itself. A study funder could directly pay someone and ask them to directly lie, but it’s not necessary; a smart study funder can produce the results they wish without, strictly speaking, lying. For example, a study funder can make sure that the evaluation carefully defines a specific environment or extremely narrow question that shows a positive trait of their product (ignoring other, probably more important factors), require an odd measurement process that happens show off their product, seek unqualified or unscrupulous reviewers who will create positive results (without careful controls or even without doing the work!), create an unfairly different environment between the compared products (and not say so or obfuscate the point), require the reporter to omit any especially negative results, or even fund a large number of different studies and only allow the positive reports to appear in public. (A song by Steve Taylor expresses these kinds of approaches eloquently: “They can state the facts while telling a lie”.)

This doesn’t mean that all vendor-funded studies are misleading, but many are, and there’s no way to be sure which studies (if any) are actually valid. For example, Microsoft’s “get the facts” campaign identifies many studies, but nearly every study is entirely vendor-funded, and I have no way to determine if any of them are valid. After a pair of vendor-funded studies were publicly lambasted, Forrester Research announced that it will no longer accept projects that involve paid-for, publicized product comparisons. One ad, based on a vendor-sponsored study, was found to be misleading by the UK Advertising Standards Authority (an independent, self-regulatory body), who formally adjudicated against the vendor. This example is important because the study was touted as being fair by an “independent” group, yet it was found unfair by an organization who examines advertisements; failing to meeting the standard for truth for an advertisement is a very low bar.

Steve Hamm’s BusinessWeek article “The Truth about Linux and Windows” (April 22, 2005) noted that far too many reports are simply funded by one side or another, and even when they say they aren’t, it’s difficult to take some seriously. In particular, he analyzed a report by the Yankee Group’s Laura DiDio, asking deeper questions about the data, and found many serious problems. His article explained why he just doesn’t “trust its conclusions” because “the work seems sloppy [and] not reliable” ( a Groklaw article also discussed these problems).

Many companies fund studies that place their products in a good light, not just Microsoft, and the concerns about vendor-funded studies apply equally to vendors of OSS/FS products. I’m independent; I have received no funding of any kind to write this paper, and I have no financial reason to prefer either OSS/FS or proprietary software. I recommend that you

This paper includes data over a series of years, not just the past year; all relevant data should be considered when making a decision, instead of arbitrarily ignoring older data. Note that the older data shows that OSS/FS has a history of many positive traits, as opposed to being a temporary phenomenon.

1.3 Terminology and Conventions

You can see more detailed explanation of the terms “open source software” and “Free Software”, as well as related information, in the appendix and my list of Open Source Software / Free Software (OSS/FS) references at Note that those who use the term “open source software” tend to emphasize technical advantages of such software (such as better reliability and security), while those who use the term “Free Software” tend to emphasize freedom from control by another and/or ethical issues. The opposite of OSS/FS is “closed” or “proprietary” software.

Other alternative terms for OSS/FS, besides either of those terms alone, include “libre software” (where libre means free as in freedom), “livre software” (same thing), free-libre / open-source software (FLOS software or FLOSS), open source / Free Software (OS/FS), free / open source software (FOSS or F/OSS), open-source software (indeed, “open-source” is often used as a general adjective), “freed software,” and even “public service software” (since often these software projects are designed to serve the public at large). I recommend the term “FLOSS” because it is easy to say and directly counters the problem that “free” is often misunderstood as “no cost”. However, since I began writing this document before the term “FLOSS” was coined, I have continued to use OSS/FS here.

Software that cannot be modified and redistributed without further limitation, but whose source code is visible (e.g., “source viewable” or “open box” software, including “shared source” and “community” licenses), is not considered here since such software don’t meet the definition of OSS/FS. OSS/FS is not “freeware”; freeware is usually defined as proprietary software given away without cost, and does not provide the basic OSS/FS rights to examine, modify, and redistribute the program’s source code.

A few writers still make the mistake of saying that OSS/FS is “non-commercial” or “public domain”, or they mistakenly contrast OSS/FS with “commercial” products. However, today many OSS/FS programs are commercial programs, supported by one or many for-profit companies, so this designation is quite wrong. Don’t make the mistake of thinking OSS/FS is equivalent to “non-commercial” software! Also, nearly all OSS/FS programs are not in the public domain. the term “public domain software” has a specific legal meaning -- software that has no copyright owner -- and that’s not true in most cases. In short, don’t use the terms “public domain” or “non-commercial” as synonyms for OSS/FS.

An OSS/FS program must be released under some license giving its users a certain set of rights; the most popular OSS/FS license is the GNU General Public License (GPL). All software released under the GPL is OSS/FS, but not all OSS/FS software uses the GPL; nevertheless, some people do inaccurately use the term “GPL software” when they mean OSS/FS software. Given the GPL’s dominance, however, it would be fair to say that any policy that discriminates against the GPL discriminates against OSS/FS.

This is a large paper, with many acronyms. A few of the most common acryonyms are:
Acronym Meaning
GNU GNU’s Not Unix (a project to create an OSS/FS operating system)
GPL GNU General Public License (the most common OSS/FS license)
OS, OSes Operating System, Operating Systems
OSS/FS Open Source Software/Free Software

This paper uses logical style quoting (as defined by Hart’s Rules and the Oxford Dictionary for Writers and Editors); quotations do not include extraneous punctuation.

Monday, February 19, 2007

Open source development gains ground

OPEN-SOURCE DATABASES are emerging as viable alternatives to costly proprietary databases and, in doing so, are filling a key slot in the open-source e-business development stack.

With vendors working to enhance performance, add new features, and fix bugs, analysts said that open-source databases are inching their way up to the level of proprietary databases in terms of features and functionality, albeit slowly.

Open source becomes a serious development alternative when combined with Linux running on commodity Intel servers, the database, the open-source development language PHP (personal home page), the Apache Web server, and the Sendmail e-mail server (see chart).

Momentum around open source in the Web-server arena is well-established. According to the now-famous results of NetCraft's latest survey, as of November 2000 nearly 60 percent of all respondents' Web sites ran on Apache, the open-source Web server.

The last several months have seen a number of developments in the open-source database space as well.

In early December, Great Bridge, in Norfolk, Va., announced a boxed version of PostgreSQL, and a number of services and support offerings to accompany the software. At $50,000 for the database and with the most comprehensive support package, Great Bridge's PostgreSQL is hardly free, but considering that proprietary databases easily can cost 10 times as much, it isn't exactly expensive, either. Great Bridge is merely the latest in a string of companies to support an open-source database.

Earlier this year, Borland turned its InterBase database over to the open-source community; Progress Software spun off its MySQL open-source database software and support into NuSphere; and AbriaSoft started offering support for MySQL as well.

While Nusphere and AbriaSoft duke it out in the MySQL trenches, PostgreSQL -- which has been in development for more than 25 years, beginning at the University of California at Berkeley -- is the open-source database most suited for enterprise-level use, according to analysts.

Great Bridge, in fact, ran the database through the Transaction Processing Council's TPC-C benchmark testing, and the results were similar to two vendors' technologies that the company would only describe as proprietary.

These benchmarks, however, tested the system against 100 users. Great Bridge's President and CEO Bob Gilbert acknowledged that the company still has to prove itself at the high-end, at least in terms of benchmarks.

Even the vendors said that open-source databases are not appropriate for every situation in which a proprietary database would be.

Despite the attractive initial price point, companies still need an IT infrastructure that supports the particular database. Otherwise, implementing an open-source solution of any kind is merely shifting the associated costs from one basket to another.

"If you reduce the overall licensing costs, in the long run, you'll come out ahead," said Bernie Mills, vice president of marketing at CollabNet, an online provider of collaborative development platforms based on open-source software.

Analysts said that the open-source databases are a perfect fit for the overall open-source e-business Web-development stack, but the software is less practical for companies that run other proprietary pieces.

Whether open-source databases are up to par with the proprietary brethren or not, companies are beginning to choose them instead of closed systems.

After spending six months studying alternatives and speaking with the bigwig vendors such as Oracle and IBM, Gazos Creek Group put its faith in AbriaSoft's incarnation of MySQL, according to David Lovering, a network applications engineer at the Fort Collins, Colo., provider of combined broadband, voice, data, and video services.

Singing the old Linux anthem, Lovering added that Gazos chose MySQL because of the breadth and price of tools for the database, as well as for the faster time getting new features added into the system. Lovering also said that scalability was a huge issue for Gazos, and he needed a database capable of supporting a flexible number of users.

"The open-source database can be scaled overnight," he added. "Provided that you don't violate the GPL [General Public License], you can essentially go from zero to 100,000s of users."

In the end, Bill Claybrook, an analyst at Aberdeen Group in Boston, said that most customers look at open-source databases because of the relatively low-priced ticket.

"If they weren't open source, I'd say they wouldn't have a chance," Claybrook said. "But the vendors can keep the price down, and that is important."