Prerequisites: You should be comfortable with both HTML and JavaScript.
WebRTC is a technology that allows two peers to communicate peer-to-peer, that is to say, to exchange data that doesn't go through a server. It includes the ability to capture, transmit, receive, and play audio/video data, as well as text and raw byte arrays. Its audio/video transmission is optimized to reduce latency (the delay between when audio/video is captured on one end and played on the other end), which means it's better suited to applications like video conferencing than to watching movies.
WebRTC is currently supported by several brands of browsers (Chrome, Firefox, Opera, maybe Edge, but not Safari yet). It is also possible to write native WebRTC clients, but that is a more advanced topic than covered in this document.
The crucial thing to realize is that peer-to-peer doesn't mean that no servers are involved; it just means that the regular data doesn't go through them. At the bare minimum, a server is still needed for two clients to exchange some basic information (where am I on the network, what codecs do I support) so that they can establish that peer-to-peer connection. The information used to establish the peer-to-peer connection is called signaling, and the server is called a signaling server.
WebRTC does not specify what you use for a signaling server or what protocol it uses. WebSockets are the most common choice, but people have also used long polling and even email (Twitter is possible but more painful due to the character length limitation).
When your product is in development, running your own server locally is very convenient: you can restart it whenever you want so that it's in a known state, it's easier to check the logs, you can customize its behavior, and it doesn't cost you anything. However, when your product moves into production you'll eventually be more concerned with robustness and scalability, which are full-time jobs in their own right.
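Because the transport is left open, the signaling logic can be sketched independently of it. The following is a minimal in-memory sketch, not EasyRTC's implementation: `SignalingRelay`, `join`, and `send` are hypothetical names, and a real server would deliver messages over WebSockets or another transport rather than through direct callbacks.

```javascript
// Hypothetical in-memory signaling relay: it only forwards small setup
// messages between peers and never touches audio, video, or data traffic.
class SignalingRelay {
  constructor() {
    this.peers = new Map(); // peer id -> callback that delivers a message
  }

  // Register a peer and the function used to deliver messages to it.
  join(peerId, deliver) {
    this.peers.set(peerId, deliver);
  }

  // Forward a message to the target peer; returns false if it is unknown.
  send(fromId, toId, payload) {
    const deliver = this.peers.get(toId);
    if (!deliver) {
      return false;
    }
    deliver({ from: fromId, payload });
    return true;
  }
}

const relay = new SignalingRelay();
const aliceInbox = [];
const bobInbox = [];
relay.join("alice", (msg) => aliceInbox.push(msg));
relay.join("bob", (msg) => bobInbox.push(msg));

// Alice tells Bob which codecs she supports; Bob answers.
relay.send("alice", "bob", { type: "offer", codecs: ["opus", "VP8"] });
relay.send("bob", "alice", { type: "answer", codecs: ["opus", "VP8"] });
console.log(bobInbox[0].payload.type, aliceInbox[0].payload.type);
```

Once both sides have exchanged this kind of information, the relay is no longer involved in the conversation.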
EasyRTC OpenSource is:
Together, these two components will let you write a simple video conferencing application, or a file sharing application, etc in just a few lines of simple code.
[Figure: architecture diagram with the labels Web Browser, Your EasyRTC Application Code, EasyRTC Client Libraries (EasyRTC, EasyApp, EasyRTC_Rates, EasyRTC_Recorder), Socket.io, WebRTC, EasyRTC Server, and Another EasyRTC Client.]
Installing an EasyRTC server on your workstation (be it a desktop or laptop) is the quick and easy way to get started. It will let you run your EasyRTC clients on the same workstation without having to fuss with SSL certificates. It will be less useful for testing your application with clients running on other devices: WebRTC clients running on Chrome can't access cameras or microphones unless they are hosted from SSL servers or on the same machine (it's a browser security constraint).
The easy way to do this is:
git clone git@github.com:priologic/easyrtc.git
cd easyrtc
npm install
cd server_example
npm install
Assuming you haven't gotten any nasty errors, you should be able to start the server by issuing the following command (while in the server_example directory):
node server.js
By default, the server serves demo applications on port 8080, so if you run a compatible browser on the same machine, and point it at localhost:8080, it should take you to the demos page. Pick a demo to run (the simple audio video demo is a good choice). Open a new tab and point it at the same URL to have a second instance of the demo so you can try the two way communications.
Warning: if you run two instances of the same WebRTC application in the same physical room without using earphones, each will pick up the audio output of the other, resulting in a shrill feedback squeal. If there is anybody else in the same room, consider muting your speakers first.
Note: the simple audio video demo displays the output of the local camera (it's a good reminder so you don't do anything uncouth while in a conference) and the output of a potentially remote camera. When you are running everything on the same machine you'll notice that one display is mirrored compared to the other. If you are wondering why this is done, try the following:
Video track: An object representing the output of a single camera or equivalent device
Audio track: An object representing the output of a single microphone or equivalent device
Media Stream: An object that serves as a package to hold one or more video tracks and/or one or more audio tracks.
Data channel: An object used to send strings or binary data from one peer to another peer.
Peer connection: An object that serves as a portal between two peers, allowing media streams and data channels to be shared across a network.
Easyrtcid: An id for a peer that is unique across an EasyRTC server.
Call: As a verb, the act of establishing a peer connection with another peer so that media streams or data can be sent. As a noun, the state of having a working peer connection to another peer.
Our simple browser app is going to follow the model of the simple audio video demo, which has two parts: an HTML file that defines some HTMLVideoObjects to display media streams and some buttons to initiate calls, and a JavaScript file for the program logic.
The HTML will look like this:
<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>MyFirst app</title>
    <link rel="stylesheet" type="text/css" href="/easyrtc/easyrtc.css" />
    <script src="/socket.io/socket.io.js"></script>
    <script type="text/javascript" src="/easyrtc/easyrtc.js"></script>
    <script type="text/javascript" src="MyFirstApp.js"></script>
    <style>
      video {
        width: 320px;
        height: 240px;
      }
      .divHolder {
        position: relative;
        float: left;
        background-color: blue;
        margin: 1px;
      }
    </style>
  </head>
  <body>
    <button onclick="connect();">Connect to server</button>
    <div id="otherClients">Peers:</div>
    <div>
      <div class="divHolder">
        <video autoplay="autoplay" class="easyrtcMirror" id="selfVideo" muted="muted" volume="0"></video>
      </div>
      <div class="divHolder">
        <video autoplay="autoplay" id="callerVideo1"></video>
      </div>
      <div class="divHolder">
        <video autoplay="autoplay" id="callerVideo2"></video>
      </div>
      <div class="divHolder">
        <video autoplay="autoplay" id="callerVideo3"></video>
      </div>
    </div>
  </body>
</html>
The first thing you'll notice is the reference to /easyrtc/easyrtc.css and /easyrtc/easyrtc.js. The easyrtc.css and easyrtc.js files are actually in the api folder, but the server is set up to map /easyrtc to the api folder.
The easyrtc.js file contains the basic Easyrtc methods and the Easyrtc_App methods. The easyrtc.css file contains a CSS definition for making a video mirrored and for positioning a "close call" icon on top of the videos. The socket.io.js file provides websocket communication with the server. The MyFirstApp.js file is where our application-specific JavaScript code will reside.
The connect method will be used to start the connection to the server.
The otherClients div will be used to hold buttons to call the other peers.
The divs that surround each of the video objects are part of a mechanism that positions the "close call" icon on the contained video object.
The JavaScript we will need is below.
function connect() {
    // Ask to be notified whenever the list of peers in the room changes.
    easyrtc.setRoomOccupantListener(convertListToButtons);
    // Name the local video element, the remote video elements, and the login callbacks.
    easyrtc.easyApp("easyrtc.audioVideoSimple", "selfVideo",
        ["callerVideo1", "callerVideo2", "callerVideo3"],
        loginSuccess, loginFailure);
}

function convertListToButtons(roomName, data, isPrimary) {
    var otherClientDiv = document.getElementById('otherClients');
    otherClientDiv.innerHTML = "";
    for (var easyrtcid in data) {
        var button = document.createElement('button');
        // Wrap the handler so that each button keeps its own easyrtcid.
        button.onclick = function(easyrtcid) {
            return function() {
                performCall(easyrtcid);
            };
        }(easyrtcid);
        var label = document.createTextNode(easyrtcid);
        button.appendChild(label);
        otherClientDiv.appendChild(button);
    }
}

function performCall(otherEasyrtcid) {
    var successCB = function() {};
    var failureCB = function() {};
    easyrtc.call(otherEasyrtcid, successCB, failureCB);
}

function loginSuccess(easyrtcid) {
    easyrtc.showError("none", "Successfully connected");
}

function loginFailure(errorCode, message) {
    easyrtc.showError(errorCode, message);
}
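The connect method does two things: it registers convertListToButtons as the room occupant listener, so the application is told which peers are present, and it calls easyApp, passing the ids of the video objects along with the login success and failure callbacks.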
The convertListToButtons function builds a button for each peer known by the server. Each button has a callback that generates a call to a particular peer.
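The button callback in convertListToButtons is wrapped in an immediately invoked function so that each button remembers its own easyrtcid; with a `var` loop variable and no wrapper, every callback would see whatever id the loop ended on. The sketch below isolates that pattern without any DOM or EasyRTC calls. `buildCallbacks` and the peer ids are hypothetical names used only for illustration.

```javascript
// Build one callback per id, the same way convertListToButtons does,
// but collect the callbacks in an array instead of attaching them to buttons.
function buildCallbacks(ids, onCall) {
  var callbacks = [];
  for (var i = 0; i < ids.length; i++) {
    var id = ids[i];
    // The immediately invoked function copies the current id into its own
    // parameter, so later loop iterations cannot overwrite it.
    callbacks.push(function (capturedId) {
      return function () {
        onCall(capturedId);
      };
    }(id));
  }
  return callbacks;
}

var called = [];
var callbacks = buildCallbacks(["peerA", "peerB"], function (id) {
  called.push(id);
});
callbacks[0]();
callbacks[1]();
console.log(called);
```

In newer JavaScript, declaring the loop variable with `let` gives each iteration its own binding and makes the wrapper unnecessary.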
The performCall method takes a peer's easyrtcid and requests a call to it. The easyApp method we used to initialize the connection implicitly sets up callbacks: one attaches a media stream received from the peer to the first available video object once the call is established, and another clears a video object when the associated media stream ends.
This application just scratches the surface of the EasyRTC API.
There are three parts to hosting your application on a server.
To change the port that the application is served on, open your server_example/server.js file in an editor. Look for a line that reads:
var webServer = http.createServer(app).listen(8080);
This line determines which port your server listens on. Change 8080 to another port number that isn't already in use. Be aware that on Linux-based systems, the lower-numbered ports (< 1024) can normally only be used by a service that runs with root-level permissions, so sticking with a higher-numbered port is a good idea.
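Rather than editing the number each time, the port could be read from configuration. The sketch below is one way to do that, under the assumption that the port arrives in an environment variable named `PORT`; `resolvePort` and `needsRootOnLinux` are hypothetical helpers and are not part of EasyRTC.

```javascript
// Hypothetical helper: pick the listening port from an environment-style
// object, falling back to 8080 (the default used by server_example).
function resolvePort(env) {
  const raw = env.PORT;
  if (raw === undefined || raw === "") {
    return 8080;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError("Invalid port: " + raw);
  }
  return port;
}

// Ports below 1024 normally need root-level permissions on Linux.
function needsRootOnLinux(port) {
  return port < 1024;
}

const port = resolvePort({ PORT: "8443" });
console.log(port, needsRootOnLinux(port));
```

In server.js the result could then be passed to listen, for example `.listen(resolvePort(process.env))`.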
As mentioned earlier, apps running on Chrome browsers can't access local cameras and microphones unless the application is hosted from localhost or an SSL server (https). When you are doing development, it is simplest to get node.js to handle the SSL. You can find detailed instructions on the EasyRTC: Using SSL page.
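The rule can be approximated in a few lines, which is handy for showing a helpful message before attempting to open the camera. This is only a rough sketch: `isMediaFriendlyOrigin` is a hypothetical helper, and browsers apply their own, more detailed checks.

```javascript
// Rough approximation of the browser rule described above: camera and
// microphone access needs https or a page served from the local machine.
function isMediaFriendlyOrigin(pageUrl) {
  const url = new URL(pageUrl);
  const localHosts = ["localhost", "127.0.0.1"];
  return url.protocol === "https:" || localHosts.includes(url.hostname);
}

console.log(isMediaFriendlyOrigin("http://localhost:8080/demos/"));
console.log(isMediaFriendlyOrigin("http://192.168.1.20:8080/demos/"));
console.log(isMediaFriendlyOrigin("https://example.com/demos/"));
```

Inside a browser, `window.isSecureContext` reports the browser's own verdict and is the more reliable check.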
For production however, you should look at using NGINX, HAProxy, or a similar tool. These tools will map a low numbered port (like 443, the default https port) to a high-numbered port while handling the SSL encryption so your server code just sees regular non-SSL traffic coming in from a high numbered port. Our experience is that they offer much greater performance and reduced latency compared to handling the SSL inside the easyrtc server itself.
WebRTC is advertised as being peer-to-peer, so the first question a newcomer to the technology has is, "if it's peer-to-peer, why do I still need servers?"
In an ideal world (from the perspective of implementing peer-to-peer applications), all our devices would have unique addresses that could be reached from anywhere on the internet. Of course, the real world is far from ideal thanks to routers and firewalls that provide NAT and symmetric NAT behavior.
NAT is an abbreviation for Network Address Translation. What it basically means is that devices behind a router (that provides NAT) all appear to have the same IP address. When a device behind the router sends a packet to a device in front of the router, its IP address is replaced by the router's external address, and its port is replaced by a unique port on the router (allocated for just that dialog). When the device in front of the router sends a packet back, the router looks up the port number to figure out the original device's address and relays the packet on to it. What this means is that a device in front of the router can't initiate a dialog with a device behind the router, which implies that two devices behind different routers can't initiate dialogs with each other at all (since neither can initiate the dialog). This is where a STUN server becomes useful.
STUN servers take advantage of something called port busting (more commonly known as hole punching). When a device behind a NAT-enforcing router contacts a device in front of the router, a hole is opened for the allocated port on the router, and the router doesn't check who (in front of the router) is sending packets through that hole. The way it works is: each device contacts a STUN server, which opens up a hole in its router and tells the device the location of the hole (IP address and port) as seen by the outside world. Using a signaling service, each device tells the other device the address of the hole it has opened. After that, communication can proceed in a true peer-to-peer manner without further reliance on the STUN server.
While consumer-owned devices tend to be behind routers with plain NAT behavior, corporations lock down their networks much more, enforcing a policy called symmetrical NAT. Symmetrical NAT means that a router only allows an outside device to send packets back through a router hole if it was the original device that was contacted to create the hole. If two devices are behind different routers that both have symmetrical NAT behavior, STUN servers are useless and you need a proxy service (that both devices can reach) to relay packets between the two of them. This is called a TURN service. Because it has to relay every packet between the two devices, it is bandwidth intensive. In theory, only 5% of all calls will need the use of a TURN server, but that figure is a generalization across WebRTC as a whole, and the actual percentage will reflect the target audience a particular WebRTC application serves.
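The translation table at the heart of NAT can be modeled in a few lines. This is a toy sketch rather than a description of any real router: `NatRouter`, its methods, the addresses, and the starting port are all assumptions chosen for illustration.

```javascript
// Toy model of a NAT router. Class and method names are hypothetical.
class NatRouter {
  constructor(externalIp) {
    this.externalIp = externalIp;
    this.nextPort = 40000;           // arbitrary starting port for mappings
    this.byExternalPort = new Map(); // external port -> internal endpoint
  }

  // A device on the inside sends a packet out: allocate an external port
  // and rewrite the source address to the router's own address.
  sendOut(internalIp, internalPort) {
    const externalPort = this.nextPort++;
    this.byExternalPort.set(externalPort, { ip: internalIp, port: internalPort });
    return { ip: this.externalIp, port: externalPort };
  }

  // A packet arrives from outside: relay it only if a mapping exists.
  receive(externalPort) {
    return this.byExternalPort.get(externalPort) || null;
  }
}

const router = new NatRouter("203.0.113.5");
const seenOutside = router.sendOut("192.168.0.10", 5000);
console.log(seenOutside);                      // the router's address, not 192.168.0.10
console.log(router.receive(seenOutside.port)); // a reply reaches the inside device
console.log(router.receive(12345));            // an unsolicited packet has no mapping
```

The last call shows why an outside device cannot start the dialog: until the inside device sends something, there is no table entry to relay through.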
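The difference between the two router behaviors comes down to whether the router checks who is sending through a hole. The sketch below models just that check; `Hole`, its `symmetric` flag, and the host names are hypothetical and exist only for this illustration.

```javascript
// Toy model of one router "hole". The `symmetric` flag is a hypothetical
// switch between the two behaviors described above.
class Hole {
  constructor(symmetric, contactedHost) {
    this.symmetric = symmetric;
    this.contactedHost = contactedHost; // host the inside device contacted
  }

  // Decide whether a packet from `senderHost` may pass through the hole.
  accepts(senderHost) {
    if (!this.symmetric) {
      return true; // plain NAT: the router does not check the sender
    }
    return senderHost === this.contactedHost;
  }
}

// Each device opened its hole by contacting the STUN server.
const plainHole = new Hole(false, "stun.example.org");
const symmetricHole = new Hole(true, "stun.example.org");
console.log(plainHole.accepts("other-peer"));     // the peer can use the hole
console.log(symmetricHole.accepts("other-peer")); // the peer is turned away

// With a relay, each device contacts the TURN server itself, so the relay
// is the one host allowed back through even a symmetric hole.
const relayHole = new Hole(true, "turn.example.org");
console.log(relayHole.accepts("turn.example.org"));
```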
In theory, you can make use of public STUN servers for free. STUN servers only need to send a few packets to help initiate any particular call, so they are very cheap to run. However, public STUN servers come and go, so they aren't always there when you need them. Having your own means you have a stable set of servers whose addresses don't change.
TURN servers, being bandwidth intensive, are not available for free, except as a "freemium" service with limitations that prevent you from using it for production. If your target clients are inside businesses, government, or educational institutions, you are going to need more TURN servers than if your target clients are consumers connecting from their homes.
If your peer-to-peer traffic is staying inside a particular organization, a few STUN servers at various levels of the network may be all you need. In the more general case, you want TURN servers located someplace that minimizes the packet travel times between the peers and the TURN server. For example, if you are running a service for people in Australia, you'd want to use a TURN service hosted in Australia instead of the United States.
There are two possible directions you can go with TURN services: run your own or rent somebody else's. In choosing, you'll need to consider pricing structure, scalability, ease of integration, locality to your clients, and how much of your own time you want to devote to setting up and running the service.
In the "run your own" camp, the popular solution is called CoTURN which can be found here.
In the "rent somebody else's" camp, there is a lot to choose from. Below is a partial list of the bigger names.
Priologic Software is also planning to offer a STUN/TURN service that will have the advantage of being trivial to integrate with your EasyRTC Server and very competitively priced. See EasyRTC STUN / TURN Service for more details.
This is covered in detail in this document: EasyRTC Server: ICE Configuration.
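For orientation, STUN and TURN servers are ultimately described to WebRTC as a list of entries in the `iceServers` format accepted by `RTCPeerConnection`. The sketch below builds such a list; `buildIceServers` is a hypothetical helper, and every host name and credential is a placeholder. How the EasyRTC server distributes this list to clients is left to the referenced document.

```javascript
// Build a list in the `iceServers` format accepted by RTCPeerConnection.
// All host names and credentials below are placeholders.
function buildIceServers(options) {
  const servers = options.stunUrls.map((url) => ({ urls: url }));
  if (options.turn) {
    servers.push({
      urls: options.turn.url,
      username: options.turn.username,
      credential: options.turn.credential,
    });
  }
  return servers;
}

const iceServers = buildIceServers({
  stunUrls: ["stun:stun.example.org:3478"],
  turn: {
    url: "turn:turn.example.org:3478",
    username: "demo-user",
    credential: "demo-secret",
  },
});
console.log(JSON.stringify(iceServers, null, 2));
```

In a browser, a list like this would be passed as `new RTCPeerConnection({ iceServers })`.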
The EasyRTC Open offering has scalability limitations in two different areas: the number of peers that can be in a video conference together, and the number of peers that can connect to the server.
Any client using a purely peer-to-peer connection is limited in the number of media streams it can send and receive. That limit depends on the processing power of the hardware the client is running on, and on the upload speed of its connections to other peers. Typically, the latter is encountered well before the former. We've seen a 640x480 video successfully shared to 14 peers when all parties are on the same local network, but struggle with 3 peers when the peers were spread across the internet. The effect the user observes is that the frame rate drops, the picture gets grainy and pixelated, sound gets choppy, and connections can get dropped.
There are two solutions for this general problem: Media Servers and Selective Forwarding Units (SFUs). The idea of a media server is that everybody sends their local media to the media server, and in return the media server sends everybody a single composite media stream. An SFU also receives the peers' local media streams, but instead of combining them, it relays the received media streams to the other parties.
EasyRTC doesn't currently support SFUs or Media Servers.
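A back-of-the-envelope estimate shows why upload speed tends to be the first limit. The sketch below assumes a client uploads one full copy of its stream to every other peer and ignores protocol overhead and processing cost; `maxMeshPeers` is a hypothetical helper and the bitrates are illustrative figures, not measurements.

```javascript
// Rough estimate: in a pure peer-to-peer call a client uploads one copy of
// its stream to every other peer. Figures below are illustrative only.
function maxMeshPeers(uploadKbps, streamKbps) {
  // Number of other peers this uplink can feed at the given bitrate.
  return Math.floor(uploadKbps / streamKbps);
}

console.log(maxMeshPeers(2000, 600));   // a modest internet uplink
console.log(maxMeshPeers(100000, 600)); // a fast local network
```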
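The effect of each approach on a single client can be summarized by counting the streams it sends and receives. This is a simplified sketch that follows the descriptions above; `streamsPerClient` and the topology labels are names chosen for this example.

```javascript
// Count the media streams one client sends and receives in a call with
// `n` participants. Topology names are labels chosen for this sketch.
function streamsPerClient(topology, n) {
  switch (topology) {
    case "mesh":        // pure peer-to-peer
      return { up: n - 1, down: n - 1 };
    case "sfu":         // server relays each stream unchanged
      return { up: 1, down: n - 1 };
    case "mediaServer": // server returns one composite stream
      return { up: 1, down: 1 };
    default:
      throw new Error("Unknown topology: " + topology);
  }
}

for (const topology of ["mesh", "sfu", "mediaServer"]) {
  console.log(topology, streamsPerClient(topology, 5));
}
```

Both server-based approaches reduce each client's upload to a single stream, which is the resource that usually runs out first.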
Currently, the EasyRTC server runs on a single Node.js instance. That means it is limited by the amount of memory and the number of ports the node.js instance has access to. It can also only send out and receive messages so fast. The number of clients it can support can range from hundreds to thousands, depending on how much demand your application puts on the server. This also means the EasyRTC server isn't as robust as you want for a production system. If the host running the server process goes down, the service fails.
The upcoming PlatformRTC PaaS has a wrapper that will let most EasyRTC clients run with minimal (if any) changes, while providing scalability and robustness through a distributed system. See EasyRTC Enterprise link for more details.
There are also plans to add MediaServer and/or SFU support to the PlatformRTC service.