SipThorDescription

Version 92 (Adrian Georgescu, 06/04/2013 09:26 pm)

h1. SIP Thor

SIP Thor provides scalability, load sharing and resilience for the [[MSPDescription|Multimedia Service Platform]]. The software is mature and stable, with a good track record over several years in production environments. Based on previous experience, it takes between 6 and 12 weeks to put a SIP service infrastructure based on it into service.

SIP Thor uses the same software components as the _Multimedia Service Platform_ for the interfaces with end-user SIP devices, namely the SIP Proxy, Media Relay and XCAP server, but it implements a different system architecture for them based on peer-to-peer concepts.

!{width:500px}http://www.ag-projects.com/images/stories/ag_images/thor-platform-big.png!

h2. Architecture

To implement its functions, SIP Thor introduces several new components to the Multimedia Service Platform. SIP Thor creates a self-organizing peer-to-peer overlay of logical network entities called *roles*, installed on multiple physical machines called *nodes*.

Each node can be configured to run one or more roles. Typical examples of such roles are *sip_proxy* and *media_relay*. Nodes that advertise these capabilities handle the load associated with SIP and RTP traffic respectively, and inherit the built-in resilience and load distribution provided by the SIP Thor design.

SIP Thor operates at the IP layer, and nodes can be installed at different IP locations, such as different data centers, cities or countries. Together, all nodes form a single consolidated logical platform.

h2. NAT Traversal

The platform provides a fail-proof NAT traversal solution that imposes no requirements on SIP clients, by using a reverse-outbound technique for SIP signaling and a geographically distributed relay function for RTP media streams. Depending on the policy configured in the nodes, "ICE":http://mediaproxy-ng.org/wiki/ICE is supported in the end-points, and a media relay can be selected by taking into account the geographical location of the calling party.

h2. References

The closest standards-related reference to what SIP Thor implements is the "Self-organizing SIP Proxy farm":http://tools.ietf.org/html/draft-bryan-p2psip-usecases-00#section-3.4.2 described in 2007 in the original P2P use cases draft produced by the IETF "P2PSIP Working Group":http://www.ietf.org/dyn/wg/charter/p2psip-charter.html. Development of SIP Thor started in early 2005; for this reason the software uses a slight variation of the terminology later adopted by the P2PSIP Working Group.

The particular design and implementation of SIP Thor have been explored in several white papers and conference presentations:

* "Addressing survivability and scalability of SIP networks by using Peer-to-Peer protocols":http://www.sipcenter.com/sip.nsf/html/AG+P2P+SIP published by *SIP Center* in September 2005
* "Building scalable SIP networks":http://ag-projects.com/docs/Present/20060518-ScalableSIP.pdf presented by Adrian Georgescu at *VON Conference* held in Stockholm in May 2006
* "Solving IMS problems using P2P technology":http://ag-projects.com/docs/Present/20061004-IMSP2P.pdf presented by Adrian Georgescu at *Telecom Signalling World* held in London in October 2006
* "Overview of P2P SIP Principles and Technologies":http://ag-projects.com/docs/Present/20070227-P2PSIP.pdf presented by Dan Pascu at *International SIP Conference* held in Paris in January 2007
* "P2PSIP and the IMS: Can they complement each other?":http://www.imsforum.org/search/imsforum/p2p published by *IMS forum* in June 2008, accessible online "here":http://www.ag-projects.com/content/view/519/176/

h2. P2P Design

SIP Thor is designed around the concept of a peer-to-peer overlay with equal peers. The overlay is a flat logical network that handles multiple roles. Peers are dedicated servers with good IP connectivity and a low churn rate, and are part of an infrastructure managed by a service provider. The software design and implementation have been fine-tuned for this scope and differ to some degree from classic P2P overlay implementations, which are typically run by transient end-points.

The nodes interface with native SIP clients that are unaware of the overlay logic employed by the servers. Inside the SIP Thor network, the lookup of a resource (the node that handles a certain subscriber for a given role at the moment of the query) is a one-step lookup in a hash table.

The hash table is an address space of integers arranged on a circle; nodes and SIP addresses map to integers in this space. This concept can be found in classic DHT implementations like "Chord":http://en.wikipedia.org/wiki/Chord_(DHT). Join and leave primitives take care of the addition and removal of nodes in the overlay in a self-organizing fashion.
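
The circular address space and the one-step lookup can be illustrated with a minimal Python sketch. It is not SIP Thor's code: the hash function (SHA-1), the 32-bit address space, the "first node clockwise" ownership rule and the names @ring_position@ and @Ring@ are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(key: str, bits: int = 32) -> int:
    """Map a string (node name or SIP address) to an integer on the circle.

    SHA-1 and a 32-bit space are assumptions made for this sketch.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    """Hypothetical overlay view: every member knows all node positions."""

    def __init__(self):
        self._positions = []  # sorted integers on the circle
        self._owners = {}     # position -> node name

    def join(self, node: str) -> None:
        pos = ring_position(node)
        if pos not in self._owners:
            bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def leave(self, node: str) -> None:
        pos = ring_position(node)
        if self._owners.get(pos) == node:
            del self._owners[pos]
            self._positions.remove(pos)

    def lookup(self, sip_address: str) -> str:
        """Return the first node clockwise from the address position."""
        if not self._positions:
            raise LookupError("no nodes in the overlay")
        index = bisect.bisect_left(self._positions, ring_position(sip_address))
        # Wrap around the circle when the address lies past the last node.
        return self._owners[self._positions[index % len(self._positions)]]


ring = Ring()
for name in ("node1.example.com", "node2.example.com", "node3.example.com"):
    ring.join(name)
owner = ring.lookup("sip:alice@example.com")
```

Because each member holds the complete sorted list of positions, a lookup is a local binary search with no network hops. When a node leaves, only the addresses it owned move to the next node on the circle; all other addresses keep their owner.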

h2. Security

Communication between SIP Thor nodes is encrypted using Transport Layer Security (TLS). Each node that is part of the SIP Thor network is uniquely identified by an X.509 certificate. The certificates are signed by a Certificate Authority managed by the service provider and can be revoked as necessary, for example when a node has been compromised.

The X.509 certificate and its attributes are used for authentication and authorization of the nodes when they exchange overlay messages over the SIP Thor network.
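
As an illustration of authorization based on certificate attributes, the sketch below checks a peer certificate that has already been verified by the TLS layer. The dictionary shape is the one returned by @ssl.SSLSocket.getpeercert()@ in the Python standard library. Which attribute carries the role is not stated on this page, so using @organizationalUnitName@ for that purpose is purely an assumption, as are the function names.

```python
def subject_attributes(peer_cert: dict) -> dict:
    """Flatten the 'subject' field of a getpeercert()-style dict."""
    attributes = {}
    for rdn in peer_cert.get("subject", ()):
        for name, value in rdn:
            attributes[name] = value
    return attributes


def authorize(peer_cert: dict, allowed_roles: set) -> str:
    """Return the node name if its certificate carries an allowed role.

    Assumption: the role is stored in organizationalUnitName and the
    node name in commonName. The real attribute mapping may differ.
    """
    attrs = subject_attributes(peer_cert)
    role = attrs.get("organizationalUnitName")
    name = attrs.get("commonName")
    if name is None or role not in allowed_roles:
        raise PermissionError("certificate not authorized: %r" % (attrs,))
    return name


# Hypothetical certificate content, shaped like getpeercert() output.
sample_cert = {
    "subject": (
        (("organizationName", "Example Provider"),),
        (("organizationalUnitName", "sip_proxy"),),
        (("commonName", "node1.example.com"),),
    ),
}
node_name = authorize(sample_cert, {"sip_proxy", "media_relay"})
```

Authentication itself (the signature chain up to the provider's Certificate Authority, and revocation) is handled by the TLS handshake and is outside this sketch.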

h2. Scalability

Because the number of peers in the overlay is, by scope, fairly limited (tens to hundreds of nodes in practice), there is no need for a Chord-like finger table or for iterative or recursive queries. The overlay lookup is one hop, referred to as O(1) in classic P2P terminology, and SIP Thor's implementation handles up to half a million queries per second on a typical server processor, which is several orders of magnitude higher than what is expected in normal operation.

Thanks to the single-hop lookup mechanism, a SIP call flow over the SIP Thor overlay involves a maximum of two nodes, regardless of the number of nodes, subscribers or devices handled by the SIP Thor network. Should SIP devices be 'SIP Thor aware' and able to perform lookups in the overlay themselves, the overall efficiency of the system could greatly improve, as less SIP traffic and fewer queries would be generated inside the SIP Thor network. A publicly reachable lookup interface is exposed over a TCP socket by each node using a simple query syntax.

The current implementation allows SIP Thor to grow to accommodate thousands of physical nodes, which can handle the traffic of any real-time communication service deployable in the real world today. For example, if the SIP server node implementation can handle one hundred thousand subscribers, then 100 nodes (roughly the equivalent of three 19-inch racks of data center equipment) are required to handle a base of 10 million subscribers.
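
The sizing example above is a simple ceiling division. The helper below is a hypothetical illustration of that arithmetic only; it ignores headroom for failover and traffic peaks.

```python
def nodes_required(subscribers: int, subscribers_per_node: int) -> int:
    """Smallest number of nodes whose combined capacity covers the base."""
    if subscribers_per_node <= 0:
        raise ValueError("subscribers_per_node must be positive")
    # Integer ceiling division avoids floating point rounding.
    return (subscribers + subscribers_per_node - 1) // subscribers_per_node


nodes = nodes_required(10_000_000, 100_000)
```

With the figures from the text (10 million subscribers, one hundred thousand per node) the result is 100 nodes.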

In reality, service scalability is limited by the performance of the accounting sub-system used by the operator or by the presence of centralized functions like prepaid. If the accounting functions are performed outside SIP Thor, for instance in an external gateway system, there is no hard limit on how far the overlay can scale.

h2. Load Sharing

SIP Thor is designed to share the traffic equally between all available nodes. This is done by returning to SIP clients that use standard RFC 3263 style lookups a random, limited slice of the DNS records that point to live nodes performing the SIP server role. DNS records are managed internally by a special role, *thor-dns*, running on multiple nodes assigned as DNS servers in the network. This simple DNS query/response mechanism achieves a near-perfect distribution without introducing an intermediate load balancer or extra latency. Inside SIP Thor, a similar principle is used for load balancing internal functions like XCAP queries or SOAP/XML provisioning requests.
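
The "random and limited slice" idea can be sketched as follows. This is a toy simulation, not the thor-dns implementation: the function name @dns_answer@, the slice size of 3 and the example addresses (taken from the documentation range 192.0.2.0/24) are assumptions.

```python
import random
from collections import Counter


def dns_answer(live_nodes, max_records=3, rng=random):
    """Return a random subset of the live nodes, at most max_records long."""
    nodes = list(live_nodes)
    return rng.sample(nodes, min(max_records, len(nodes)))


# Simulate many independent client queries and count how often each
# node is offered to a client.
rng = random.Random(42)
nodes = ["192.0.2.%d" % i for i in range(1, 7)]
hits = Counter()
for _ in range(20000):
    hits.update(dns_answer(nodes, 3, rng))
```

With 6 nodes and 3 records per answer, each node is expected to appear in about half of the answers, so over many queries the offered load evens out without any component sitting in the signaling path.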

For functions driven internally by SIP Thor, for instance the reservation of a media relay for a SIP session, other selection techniques could potentially be applied, for instance selecting a candidate based on geographic proximity to the calling party to minimize round-trip time. Though captured in the initial design, such techniques have not been implemented because no customers have requested them.

By using a virtualization technique, the peer-to-peer network is able to function with a minimum number of nodes while still achieving a fair, equal distribution of load when using at least three physical servers.
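
This page does not describe the virtualization technique. One common way to even out load on a hash circle with few physical machines is to give each machine many virtual positions on the circle. The sketch below shows that general technique under this assumption; it should not be read as SIP Thor's actual mechanism, and all names and the replica count are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def position(key: str) -> int:
    """Map a string to a 64-bit integer on the circle (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, replicas):
    """Place `replicas` virtual positions per physical node on the circle."""
    points = sorted(
        (position("%s#%d" % (node, i)), node)
        for node in nodes
        for i in range(replicas)
    )
    return [p for p, _ in points], [n for _, n in points]


def owner(ring, key):
    """Physical node owning the first virtual position clockwise from key."""
    positions, owners = ring
    index = bisect.bisect_left(positions, position(key)) % len(positions)
    return owners[index]


def load_share(nodes, replicas, keys):
    """Fraction of keys handled by each physical node."""
    ring = build_ring(nodes, replicas)
    counts = Counter(owner(ring, key) for key in keys)
    return {node: counts[node] / len(keys) for node in nodes}


servers = ["node1", "node2", "node3"]
subscribers = ["sip:user%d@example.com" % i for i in range(20000)]
single = load_share(servers, 1, subscribers)     # one position per server
virtual = load_share(servers, 500, subscribers)  # 500 virtual positions each
```

With a single position per server the shares depend entirely on where three hashes happen to fall on the circle. With several hundred virtual positions per server each share tends towards one third.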

h2. Zero Configuration

Nothing needs to be configured in the SIP Thor network to support the addition of a new node, besides starting the node with the right X.509 certificate.

h2. Failover

SIP Thor is designed to automatically recover from disasters like network connectivity loss, server failures or denial of service attacks. On node failure, all requests handled by the faulty node are automatically distributed to the surviving nodes without any human intervention. When the failed node becomes available again, it takes back its place in the network without any manual intervention.

The logic of all active and signaling-active components inherits this failover property from SIP Thor.

h2. Thor Event Server

*thor-eventserver* is an event server, the core of the messaging system used by the SIP Thor network to implement communication between network members. The messaging system is based on publish/subscribe messages exchanged between network members. Each entity in the network publishes its own capabilities and status for whoever is interested in the information. At the same time, each entity may subscribe to certain types of information published by the other network members, based on the entity's functionality in the network.
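
The publish/subscribe pattern described above can be sketched in a few lines of Python. This in-process toy only illustrates the pattern; the class name @EventBus@, the topic name and the message fields are hypothetical and do not reflect the thor-eventserver wire protocol.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-memory publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register interest in one type of information."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic."""
        for callback in list(self._subscribers.get(topic, ())):
            callback(message)


bus = EventBus()
seen_by_dns = []

# A DNS manager would care about node status, so it subscribes to it.
bus.subscribe("node_status", seen_by_dns.append)

# A node announces its capabilities and status to whoever is interested.
bus.publish("node_status", {"node": "node1.example.com",
                            "role": "sip_proxy",
                            "online": True})
# Nobody subscribed to this topic, so the message is simply dropped.
bus.publish("statistics", {"node": "node1.example.com", "calls": 12})
```

Publishers do not need to know who the subscribers are, which is what allows members to join and leave without reconfiguring the others.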

Multiple event servers can be run as part of a SIP Thor network (on different systems, preferably in different hosting facilities), which improves the redundancy of the SIP Thor network and its resilience in the face of network or system failures, at the expense of messaging traffic that increases linearly with the number of network members. It is recommended to run at least 3 event servers in a given SIP Thor network.

h2. Thor Manager

*thor-manager* is the SIP Thor network manager, which maintains the consistency of the SIP Thor network as members join and leave. The manager publishes the SIP Thor network status regularly, or as events occur, to inform all network members of the current network status, allowing them to adjust their internal state as the network changes.

Multiple managers can be run as part of a SIP Thor network (on different systems, preferably in different hosting facilities), which improves the redundancy of the SIP Thor network and its resilience in the face of network or system failures, at the expense of a slight increase in messaging traffic with each new manager that is added. If multiple managers are run, they automatically elect one of them as the active one, and the others remain idle until the active manager stops working or leaves the network. A new manager is then elected and becomes the active manager. It is recommended to run at least 3 managers in a given SIP Thor network, preferably in separate hosting facilities.
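
The election rule itself is not documented on this page. The sketch below uses a deliberately simple, assumed rule (the live manager with the lowest identifier wins) only to illustrate the active/idle behaviour described above; the class and manager names are hypothetical.

```python
def elect_active(live_managers):
    """Assumed rule: the lowest identifier among live managers wins."""
    return min(live_managers) if live_managers else None


class ManagerGroup:
    """Tracks which manager is active while the others stay idle."""

    def __init__(self, managers):
        self.live = set(managers)
        self.active = elect_active(self.live)

    def leave(self, manager):
        """A manager stopped working or left the network."""
        self.live.discard(manager)
        if manager == self.active:
            # Only the loss of the active manager triggers a new election.
            self.active = elect_active(self.live)

    def join(self, manager):
        """A (re)joining manager stays idle unless no manager is active."""
        self.live.add(manager)
        if self.active is None:
            self.active = manager


group = ManagerGroup(["manager-dc1", "manager-dc2", "manager-dc3"])
first = group.active
group.leave("manager-dc1")
second = group.active
group.join("manager-dc1")
third = group.active
```

In this sketch a manager that rejoins does not take over from the currently active one, which matches the statement that idle managers wait until the active manager stops working or leaves.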

h2. Thor Database

*thor-database* is a component of the SIP Thor network that runs on the central database(s) used by the SIP Thor network. Its purpose is to publish the location of the provisioning database in the network, so that other SIP Thor network members know where to find the central database when they need to access information from it.

h2. Thor DNS

*thor-dns* is a component of the SIP Thor network that runs on the authoritative name servers for the SIP Thor domain. Its purpose is to keep the DNS entries for the SIP Thor network in sync with the network members that are currently online. Each authoritative name server needs to run a copy of the DNS manager in combination with a DNS server. The SIP Thor DNS manager updates the DNS backend database with the appropriate records as nodes join or leave the SIP Thor network, making it reflect the network status in real time.
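
Keeping DNS records in sync with the live membership amounts to computing a difference between two sets. The sketch below assumes, for illustration only, that the DNS backend can be viewed as a set of addresses per role; the function name and the addresses (documentation range 192.0.2.0/24) are hypothetical.

```python
def plan_dns_update(current_records, live_nodes):
    """Return (to_add, to_remove) so that the records match the live nodes."""
    current = set(current_records)
    wanted = set(live_nodes)
    return sorted(wanted - current), sorted(current - wanted)


# Records currently in the backend for the sip_proxy role.
records = {"192.0.2.10", "192.0.2.11", "192.0.2.12"}
# Membership after one node left and another joined.
online = {"192.0.2.10", "192.0.2.12", "192.0.2.13"}

to_add, to_remove = plan_dns_update(records, online)
```

Applying only the difference keeps each update small: here one record is added for the node that joined and one is removed for the node that left.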

h2. Thor Node

*thor-node* is to be run on a system that is to become a SIP Thor network member. By running this program, the system joins the SIP Thor network and becomes part of it, sharing its resources and announcing its capabilities to the other SIP Thor network members.

The network can accommodate one or more nodes with this role; SIP Thor automatically takes care of the addition and removal of each instance. The currently supported roles are *sip_proxy* in combination with OpenSIPS and *voicemail_server* in combination with Asterisk. Other roles are built directly into MediaProxy (*media_relay*), NGNPro (*provisioning_server*) and OpenXCAP (*xcap_server*); for these resources no standalone thor-node component is required.

h2. Thor Monitor

*thor-monitor* is a utility that shows the SIP Thor network state in a terminal. It can be used to monitor the SIP Thor network status and events.

h2. NGNPro

The NGNPro component performs the enrollment and provisioning server role. It saves all changes persistently in the bootstrap database and caches the data on the responsible node at the moment of the change. The network can accommodate multiple nodes with this role; SIP Thor automatically takes care of the addition and removal of each instance.

NGNPro exposes a [[ProvisioningGuide|SOAP/XML interface]] to the outside world and bridges the SOAP/XML queries with the distributed data structures employed by SIP Thor nodes.

NGNPro is also the component used to harvest usage statistics and provide status information from the SIP Thor nodes.

h2. Third-party Software

New roles can be added to the system programmatically by following the SIP Thor API; the details depend on how the component that needs to be integrated into the SIP Thor network works.

The following integration steps must be taken to add a new role to the system in the form of third-party software:

# The third-party software must implement a component that publishes its availability in the network. This can also be programmed outside of the specific software by adding it to the generic *thor_node* configuration and logic
# The third-party software must be able to look up resources in the SIP Thor network and use the returned results in its own application logic
# Depending on the inner workings of the application performed by the new role, other roles may need to be updated in order to serve it (e.g. adding specific entries into the DNS or moving provisioning data to it)

h2. Best topology

While the software is designed to be self-organizing, it can only do so if it is deployed in a way that avoids correlated failures related to Internet connectivity. If the DNS, central database and Thor manager functions are all down at the same time, no self-organizing software is of much use. The following measures can improve self-recovery from both complete connectivity failures and unstable connectivity with high packet loss:

* Host the DNS servers and SIP Thor manager/event servers in different data centers than the Thor nodes used for signaling and media (DC1, DC2, DC3)
* Host all Thor nodes for signaling and media in three data centers different from the ones used above (DC4, DC5, DC6)
* Host the central database in DC1 with active slaves in DC2 and DC3
* The SIP Thor DNS zone must be run by other DNS servers (typically an external DNS registrar)

With such a setup, most connectivity failures are handled automatically:

* In case of complete Thor node failures, the network automatically removes the faulty components from the DNS and the routing logic
* In case of partial connectivity loss (A sees B, B sees C but A does not see C), the network picks the best visible candidates automatically
* In case of intermittent packet loss (flip-flop of network connectivity causing continuous re-organization), nodes can be shut down administratively

In the worst-case scenario, where the location of the central database is completely down, the network can automatically fall back to a secondary database, with the exception of the prepaid functionality. Accounting records are synced with the central database later, when connectivity is restored. If the main data center does not come back online, manual failover to another data center can be done by changing the DNS records of the Thor domain in the parent zone to point to another data center where data has previously been replicated. This allows accounting and provisioning to continue in the new data center.

In practice, such a setup is not cost-efficient; there is always a high price to pay for handling failures related to IP connectivity automatically.