The Dark Ages (Pre-2004)
First, let's imagine a fairly simple switching application, such as a prepaid calling card service that's designed to handle up to 500 concurrent callers. People tend to stay on the phone for a long time when making long distance calls, so this isn't a very big system (systems that handled thousands of callers were commonplace). We won't use any extra-fancy stuff like conferencing or speech recognition, just basic IVR and call routing.
To build this system, we would have needed specialized computer telephony hardware or a programmable switch, such as Excel. Using standard PCs and computer telephony hardware, we would have spent about $100 to $200 per phone line, or about $50,000 to $100,000, not including the servers. These boards provided very crude APIs for controlling the hardware, so most developers would use middleware, such as Pronexus or Visual Voice, to program the boards in higher-level languages. This typically added another $100 per port. Next, we'd need to provision 20 T1s for the service. These typically cost $1000 each to install, and about $500 each per month, not including toll charges. So, before writing a single line of code, we would have had to allocate close to $200,000 just to cover hardware, installation, and the first few months of operations. Because the system depended on specialized hardware, it was also not possible to use rent-a-server operations like Server Beach.
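The arithmetic behind that figure can be sketched in a few lines of Python. This is a rough estimate using only the prices quoted above; the three-month window is my own reading of "the first few months", and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope up-front cost for the 500-port legacy system.
PORTS = 500
T1_COUNT = 20
MONTHS = 3  # assumption: "first few months" taken as three


def upfront_cost(board_per_port, middleware_per_port=100,
                 t1_install=1000, t1_monthly=500, months=MONTHS):
    """Sum hardware, middleware, T1 installation, and T1 rental."""
    boards = PORTS * board_per_port            # telephony boards
    middleware = PORTS * middleware_per_port   # per-port middleware license
    install = T1_COUNT * t1_install            # one-time T1 installation
    line_rental = T1_COUNT * t1_monthly * months
    return boards + middleware + install + line_rental


low = upfront_cost(100)    # boards at $100 per port
high = upfront_cost(200)   # boards at $200 per port
print(f"${low:,} to ${high:,}")
```

Under these assumptions the range comes out at $150,000 to $200,000, before servers and toll charges.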
Now let's say you wanted to add enhanced services such as conferencing or, especially, speech recognition. That would have increased this figure by two to ten times, making this a million-dollar investment just to build a medium-sized system. Of course, you could start small and grow, but I am assuming that you were trying to build a system for scale, not an experimental hobby project. If you had been building a web service to handle 500 concurrent users, you could have done that with one or two off-the-shelf servers for 1/100th the cost.
Standards-Based Telephony and VoIP
The widespread adoption of SIP, and especially the release of a stable version of Asterisk, changed everything. It is now possible to develop telecom services in much the same way as web services, with a few important differences.
Using Asterisk, Freeswitch, and HMP (host media processing) services from leading telecom hardware vendors (including Intel and NMS), you can run a telecom app as a service on standard Intel servers. This radically simplifies things for developers, because you no longer need to build and install your own boxes. You can just provision servers on the fly at Server Beach, Rackspace, or a similar provider, and install these services and your apps remotely. This cuts data center costs to roughly a tenth of what they were.
With Asterisk and Freeswitch, you eliminate all of the per-port costs associated with specialized telecom hardware. With commercial HMP packages, per-port costs drop by roughly 80 percent, to somewhere in the neighborhood of $25 per port, unless you are using enhanced services like speech recognition.
With VoIP origination services, such as Voxbone, you eliminate all of your local loop costs. Companies like Voxbone offer inbound call transport for a flat per-port fee, usually $10-20 per port, with reasonable installation fees. It's not free, but it's way cheaper than local loop T1, and you can trunk calls into your system from over 50 countries worldwide, with no toll charges (something that was simply not possible a few years ago).
Let's revisit that 500-port switching application that would have required a six-figure investment a couple of years ago. Using open standards telephony and VoIP services, you could now prototype the same system for much less. Using a conservative assumption of 100 concurrent VoIP calls per Asterisk server, you would need to buy or rent five standard Intel servers, at a cost of about $2000 per month at a rented data center, or about $10,000 if bought outright. Since VoIP service can be provisioned almost instantly, you would be able to prototype your system with a small number of ports, and then turn on the full 500 ports shortly before your system goes live. This eliminates another steep up-front cost when building larger-scale telecom services. The bottom line: what would have cost $200,000 to $1,000,000 to prototype now requires a much more modest investment, probably about $10,000 to $20,000, in hardware or data center costs.
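The sizing above reduces to a simple calculation, sketched here in Python. The per-server prices are not quoted anywhere; I derived them by dividing the five-server totals by five, so treat them as assumptions. The 50-port pilot is a hypothetical example of starting small.

```python
import math

CALLS_PER_SERVER = 100        # conservative figure from the text
RENT_PER_SERVER_MONTH = 400   # assumption: $2000 per month / 5 servers
PRICE_PER_SERVER = 2000       # assumption: $10,000 / 5 servers


def servers_needed(ports, calls_per_server=CALLS_PER_SERVER):
    """Round up, since a partly used server still has to be paid for."""
    return math.ceil(ports / calls_per_server)


def hardware_options(ports):
    """Return server count plus rent-versus-buy cost for a port count."""
    n = servers_needed(ports)
    return {
        "servers": n,
        "rent_per_month": n * RENT_PER_SERVER_MONTH,
        "purchase": n * PRICE_PER_SERVER,
    }


full = hardware_options(500)   # the full 500-port system
pilot = hardware_options(50)   # hypothetical small pilot before go-live
print(full)
print(pilot)
```

The point of the pilot line is that you only pay for one server while prototyping, and scale to five when the remaining ports are switched on.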
Ongoing operations costs are slightly lower with VoIP than with local loop circuits, depending on the country and region where your service will operate. Voxbone charges about $20/month per port, which is about the same as the per-port cost for a local loop T1 or ISDN line, but with one big difference: you can trunk calls in from all over the world to one data center, something you could not do with conventional local loop service. Moreover, virtualizing the service eliminates the need for the T1 to be hardwired into an equipment cage, which reduces a lot of other operations costs (e.g., colo rental, on-site technicians to troubleshoot hardware, etc.). Rapid provisioning also enables you to scale on demand, so you don't make the mistake of ordering 1000 ports when you only need 200. All of these things combined reduce the cost of prototyping a medium-sized telecom app from six or seven figures to the low-to-mid five figures, a big reduction. It's still not so cheap that a couple of college students can build an app on their own dime, but the barrier to entry is a lot lower now than it ever has been before.
This, of course, does not factor in the cost of development. That, unfortunately, has not changed very much. Closed telecom systems provided a low-level C API that, while difficult to master, was well designed: its functions (e.g. play a WAV file to a caller) mapped logically onto the tasks that typical systems needed to perform. Open source telecom environments, especially Asterisk, have actually made a leap backwards in terms of APIs. It pains me to say this, because I think Asterisk is otherwise a great platform, but its AGI interface (a CGI-like interface for building custom applications) is one of the most poorly designed and generally broken interfaces I've seen in telecom. This makes otherwise simple tasks (like playing a speech file while doing something in the background) a lot more complicated than they have to be. I hope Digium uses some of its newly gained VC funding to create a top-drawer API for custom application development.
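To give a feel for what "CGI-like" means here, the following Python sketch walks through a simulated AGI exchange. As I understand the protocol, Asterisk writes a block of `agi_*: value` lines to the script's standard input, ended by a blank line, and then answers each command the script prints with a line such as `200 result=0`. The helper names, the `greet.py` script name, the caller ID, and the canned replies are all made up for illustration; in a real script you would pass `sys.stdin` and `sys.stdout` instead of the in-memory streams.

```python
import io


def read_agi_env(stdin):
    """Read the 'agi_*: value' header lines sent when the script starts."""
    env = {}
    for line in iter(stdin.readline, ""):
        line = line.strip()
        if not line:              # a blank line ends the header block
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def agi_command(command, stdin, stdout):
    """Send one command, wait for the reply, return the result code."""
    stdout.write(command + "\n")
    stdout.flush()
    reply = stdin.readline().strip()   # e.g. "200 result=0 endpos=8000"
    code, _, rest = reply.partition(" ")
    if code != "200" or not rest.startswith("result="):
        raise RuntimeError(f"AGI command failed: {reply!r}")
    return int(rest[len("result="):].split()[0])


# Simulated session: canned input stands in for Asterisk (hypothetical).
fake_asterisk = io.StringIO(
    "agi_request: greet.py\n"
    "agi_callerid: 5551234\n"
    "\n"
    "200 result=0\n"
    "200 result=0 endpos=8000\n"
)
sent = io.StringIO()

env = read_agi_env(fake_asterisk)
agi_command("ANSWER", fake_asterisk, sent)
result = agi_command('STREAM FILE welcome ""', fake_asterisk, sent)
print(env["agi_callerid"], result)
```

Notice that `agi_command` blocks until the reply arrives. With a command like `STREAM FILE`, that reply only comes once playback has finished, which is one reason doing anything in the background while a prompt plays takes extra machinery.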
The Asterisk developers should look at the Intel/Dialogic API and copy it function for function; then they'd have an interface that is more appropriate for developers. In any case, my point is that programming phone systems is more difficult than it needs to be. There are few rapid development tools out there, and those that exist today are either poorly supported or require licensing fees. You'll need to factor in a lot of developer time. I can't say how much because I don't know how simple or complicated your app is, but that is the one cost that has not decreased by a similar magnitude. Fortunately, if you're the founding team of a new company, you can work for ramen noodles for a few months to keep costs under control. Not that that is fun, but if you're not funded, you have to do what you have to do.
Radio Handi
My current project (Radio Handi, a group communication and broadcasting platform) is a case in point. We developed a system that enables people to do flat-rate or free conference calls from over 30 countries, and worldwide via VoIP. The system integrates group voice (voice mail and conferencing), email listserv, and SMS communication, and interoperates with many different services, such as Gizmo and PhoneGnome. It cost under a million dollars to build, including developer time, and is a local call in most of the developed world.
If we had built a system with Radio Handi's current features several years ago, the resulting platform would have cost several million dollars to build. We would have needed a programmable switch with dedicated VoIP interface hardware. We would have needed leased T1s to provide local origination in remote markets. We would have needed a hardware-based conference bridge. I figure it would have ended up costing somewhere between $2 million and $5 million, just for the hardware and facilities. Some aspects of the system, such as local access from 30+ countries or interoperability with SIP-based VoIP services (they didn't exist a few years ago), simply would not have been possible. Even if we had built it, the operations costs, specifically leased T1 lines, would have sunk the business or required us to charge steep per-minute fees that would have dramatically undermined its utility. So I don't think it would have been possible to build today's service just two or three years ago; things have changed that quickly.
Now is a great time for developers to experiment with telecom services. One I spotted recently is Frucall. It enables you to call in and spot-check prices on goods using scan codes. It's not a new idea; several companies have tried some version of this before. But today the convergence of web services and cheap VoIP infrastructure makes services like this cheaper to build, more economically viable, and more likely to succeed. I remember that back in the mid-1990s, people had great expectations for telephone interfaces to things like product pricing databases. All of the ideas that were tried then made sense, but the expense of building and running them killed all but a handful (like MovieFone). Now the economics are getting closer to those of web services, which will enable smaller entrepreneurs to go after niche opportunities without having to raise millions just to build a prototype.
This is an important shift because the low cost of prototyping enables entrepreneurs to use the prototype as a form of real-world market research. If you have to commit several million dollars to build a prototype, you have to be sure there is a market big enough to support that investment. The problem is, you often don't know the real size of the market until you have a product to sell, and by then it's too late. With cheap prototyping, you can risk a five-digit investment, get some quick real-world feedback to confirm or refute your assumptions, and change course if you don't like what the market is saying.
That's a big deal because many successful web products (Flickr is one prominent example) started their lives as one thing before morphing into something altogether different.