Subject: Multimedia Communication and Content Security (MZKO), Department of Telecommunications, Faculty of Electrical Engineering and Computer Science, VSB-TUO.
Name: Bc. Kryštof Šara (SAR0130)
Task syllabus:
- master key exchange (symmetric cryptography)
- SRTP-SDES exchange, SIP VoIP, SDP session descriptor, RTP stream description (codecs, media type, ports, SRTP master key in base64) in SIP signalling
- key distribution problem (MitM-prone)
- ZRTP and the Diffie-Hellman (DH) algorithm (MitM-prone, and the problem of implementing DH on old HW)
- SRTP-DTLS session, WebRTC over DTLS channel, media encryption
- simulation
introduction
In a world with a continuous need for communication (preferably in real time), it is vital for media stream transport to be reliable (uninterrupted), secure (end-to-end encrypted), and fast (UDP/IP, low jitter and RTT).
Internet telephony (also Voice over Internet Protocol, VoIP) mainly provides a connection between Plain Old Telephone Systems (POTS), which could be represented by a plain analogue telephone device, and an interconnected generic computer network built on the Internet Protocol (IP). This mutual integration is usually called a converged network. [1]
symmetric vs. asymmetric encryption
The main purpose of encryption is to ensure the confidentiality of data. Encryption can also be used for authentication. [7] [8]
General types of encryption: [7] [8]
- symmetric, using a shared secret key: DES (3DES), AES, Kerberos
- asymmetric, using a private-public key pair: Diffie-Hellman, RSA, digital signatures and certificates
Asymmetric cryptography builds on so-called trapdoor one-way functions: the function is easy to compute, but it can be inverted only with the help of a secret key. [8]
Another key difference is the computational speed of the encryption. While symmetric encryption is very fast, asymmetric encryption is much slower due to the more complex algorithms used. This can be worked around by using the symmetric type to encrypt the exchanged data, while the asymmetric type is used to encrypt and distribute the symmetric key securely. [7] [8]
the VoIP protocol stack
To ensure transport integrity, a set of VoIP protocols has been introduced for systems to implement. Probably the most used VoIP signalling protocol is the Session Initiation Protocol (SIP). This protocol defines an interface that both sides implement in order to create a media transport session. [1]
Once the session is initiated, the media stream can be set up as well. Under SIP, the stream itself is usually carried by the Real-time Transport Protocol (RTP). The encrypted version of RTP is called SRTP (Secure RTP). [1]
To negotiate the session’s parameters, SIP uses the Session Description Protocol (SDP), which is carried in the first message sent by one counterpart (INVITE with an SDP body). [1]
summary
protocol abbreviation | protocol full name | protocol type |
---|---|---|
SIP | Session Initiation Protocol | signalling protocol |
SDP | Session Description Protocol | media session management protocol |
(S)RTP | (Secure) Real-time Transport Protocol | real-time media transportation protocol |
WebRTC
Web Real-Time Communication (WebRTC) is a technology that lets web applications capture and stream multimedia (audio, video) content; even binary data can be exchanged between peers. The main advantage is that peers don’t need any additional plug-ins or software besides a supported web browser. The connection between peers can often be direct, which means that no intermediary supporting servers are needed. Support for interoperability with PSTN networks is also already implemented in WebRTC. [13]
RTP stream security
As RTP streams often transfer sensitive media data such as business conference calls, it is very important to ensure the security of such streams. For this purpose, the Secure RTP (SRTP) protocol has been introduced.
SRTP, SDES and SDP
SRTP relates to RTP as HTTPS relates to HTTP: RTP streams can be encrypted using SRTP. Enabling encryption is, however, not always as easy as intended, because the encryption techniques and protocols used (and their combinations) might not be supported on remote devices. Encryption has to be enabled on both sides to allow session initiation. [14]
The exchange of encryption keys can be carried out via various channels and using multiple technologies. The original method of exchange was to use SDES (Session Description Protocol Security Descriptions) through the signalling channel, for example the SIP channel. Those keys, however, can easily be captured on the SIP proxy and used for SRTP stream decryption. The streams can also be tampered with in a man-in-the-middle (MitM) attack: a stream could be decrypted, recorded, changed and retransmitted to the original peer. [14] [15]
Fig. 1: Example of SDP media attributes defined by SDES, including the crypto part carrying the master key. A Wireshark listing of a SIP/SDP packet message body.
The Master Key Identifier (MKI) is an optional field of the SRTP protocol that identifies the master key. The master key is used to derive the symmetric session keys that encrypt the VoIP media data. The keying material is negotiated at the beginning of the call and is often included in the SDP packet body. [11]
The main problem with carrying the key in SDP is that the master key is transported in insecure plaintext form, meaning it is prone to a MitM attack: it can be easily sniffed when the SIP session is initiated with the INVITE packet. [11]
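To illustrate how little effort an eavesdropper needs, the following minimal Python sketch pulls the keying material out of an SDP body. The SDP fragment, the port number and the key bytes are made-up placeholders (they do not come from the capture in this report), and the function name `find_crypto_attributes` is only illustrative.

```python
import base64

# Placeholder key material (bytes 0..29), NOT taken from any real capture.
PLACEHOLDER_KEY = base64.b64encode(bytes(range(30))).decode()

# Hypothetical SDP body fragment describing one SRTP audio stream.
SDP_BODY = (
    "m=audio 49170 RTP/SAVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{PLACEHOLDER_KEY}\r\n"
)


def find_crypto_attributes(sdp_body):
    """Return (tag, suite, key_material_bytes) for every a=crypto line."""
    found = []
    for line in sdp_body.splitlines():
        if not line.startswith("a=crypto:"):
            continue
        # Layout: a=crypto:<tag> <crypto-suite> <key-params> [session-params]
        tag, suite, key_params = line[len("a=crypto:"):].split()[:3]
        method, _, key_info = key_params.partition(":")
        if method != "inline":
            continue
        # Optional lifetime and MKI fields follow the key, separated by '|'.
        b64_key = key_info.split("|")[0]
        found.append((int(tag), suite, base64.b64decode(b64_key)))
    return found


for tag, suite, material in find_crypto_attributes(SDP_BODY):
    print(tag, suite, len(material), "bytes of key material")
```

Anyone who can read the signalling message (a proxy, or an attacker on the path) can run the same few lines, which is exactly the weakness described above.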
ZRTP (Zimmermann RTP) and SAS
ZRTP serves as an enhancement of the SRTP protocol. Its main advantage is that the master key is agreed on using the Diffie-Hellman (DH) mechanism. After a successful master key exchange, the session switches back to SRTP. Until the exchange is completed, a ZRTP-enabled client (with encryption enabled implicitly) usually informs the user that the call is not encrypted yet. [11] [14]
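The idea behind the DH mechanism can be sketched in a few lines of Python. This is a toy example with deliberately tiny public parameters (`P = 23`, `G = 5`), chosen only so the numbers stay readable; real deployments use far larger groups, and ZRTP wraps the exchange in additional protocol messages that are not shown here. The helper names are illustrative.

```python
import secrets

# Toy public parameters, far too small for real use.
P = 23  # public prime modulus
G = 5   # public generator


def dh_keypair(p=P, g=G):
    """Pick a random private exponent and compute the public value g^x mod p."""
    private = secrets.randbelow(p - 3) + 2  # value in the range [2, p - 2]
    public = pow(g, private, p)
    return private, public


def dh_shared(their_public, my_private, p=P):
    """Combine the peer's public value with our own private exponent."""
    return pow(their_public, my_private, p)


alice_private, alice_public = dh_keypair()
bob_private, bob_public = dh_keypair()

# Only the public values cross the network; both sides still end up
# with the same secret, which is never transmitted itself.
assert dh_shared(bob_public, alice_private) == dh_shared(alice_public, bob_private)
```

Unlike SDES, an eavesdropper who only observes the exchanged public values does not learn the shared secret directly. An active attacker can still run a separate exchange with each peer, which is the MitM problem that the SAS tries to address.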
Besides the DH exchange improvement, a short authentication string (SAS) is presented to both sides of the call as another layer of call integrity. Both sides can read the SAS aloud and compare it to make sure that the call is encrypted and secured. This technique, however, relies on recognising the other speaker’s voice: if the other peer is a complete stranger, we have no certainty about the authenticity of the call, that is, that the person speaking is not an attacker. [14]
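The principle of the SAS comparison can be shown with a simplified Python sketch. It is not the actual ZRTP derivation (ZRTP computes the SAS through its own key derivation function); here the string is simply assumed to be the first four hex digits of a SHA-256 hash of the shared secret, and the secrets are made-up byte strings.

```python
import hashlib


def short_auth_string(shared_secret: bytes) -> str:
    """Simplified SAS: the first 16 bits of SHA-256, shown as 4 hex digits."""
    return hashlib.sha256(shared_secret).hexdigest()[:4]


# Without an attacker, both peers hold the same secret,
# so the strings they read to each other are identical.
secret = b"secret agreed between the two peers"
caller_sas = short_auth_string(secret)
callee_sas = short_auth_string(secret)
print("no attacker, SAS match:", caller_sas == callee_sas)

# With a MitM, each peer unknowingly shares a different secret with the
# attacker, so the two strings will almost certainly differ.
caller_sas = short_auth_string(b"secret between caller and attacker")
callee_sas = short_auth_string(b"secret between attacker and callee")
print("MitM present, SAS match:", caller_sas == callee_sas)
```

The comparison itself happens over the voice channel, which is why it only helps when the peers can recognise each other's voices.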
Another problem of ZRTP is its support in various VoIP applications and hardware. Most importantly, ZRTP is currently not supported in WebRTC, or its support is very limited. [14]
SRTP-DTLS
Another security alternative for SRTP is SRTP-DTLS. Here, security rests on the Datagram Transport Layer Security (DTLS) protocol, which is used to secure UDP traffic. DTLS is based on the stream-oriented TLS protocol. This technique is widely used by web browsers and by WebRTC calls. [12] [14]
SIP proxy
A Session Initiation Protocol proxy is a special type of software that allows SIP-based VoIP call packets to traverse various network firewalls. A SIP proxy can also provide address translation in order to direct calls to the VoIP call peers/members. It supports authentication, authorization, accounting (AAA), and encryption. [10]
Kamailio
Fig. 2: Official Kamailio logo.
Kamailio is a free and open-source SIP server. It can handle thousands of call setups per second. Kamailio can be used to build WebRTC conference applications, presence detection systems and instant messaging applications. [9]
As far as the protocol stack is concerned, Kamailio can run over TCP and UDP, securely over TLS for VoIP, and over WebSockets for WebRTC. Both IPv4 and IPv6 are supported at the network layer. Moreover, it has embedded support for various backend systems like MySQL, Postgres, LDAP, Redis, MongoDB or SNMP. [9]
demonstration
To show the master key delivery vulnerability of the SRTP protocol, this section captures and analyses a simple VoIP call. SIP is used as the signalling protocol, and the Kamailio SIP server acts as the SIP proxy. Two clients are registered to the server so that they can start and receive a call. The call itself is captured on an egress interface of one client. Finally, the captured packets are filtered, and the filtered UDP stream is analysed and, if possible, decrypted.
Demonstration syllabus:
- SIP proxy (Kamailio), signalling processing, direct RTP stream
- Kamailio + MySQL for SIP phone registration
- capture SDP packets/stream and get SRTP master key
- capture SRTP stream
- insert SRTP master key into decryptor(s)
- decipher SRTP stream and play the media
- include scripts listings
used software and hardware
software
- Kamailio v5.7.1 SIP (proxy) server
- MariaDB v11.1.2 RDBMS
- Docker engine v24.0.5
- Fedora 38
- Raspbian GNU/Linux 12 (bookworm)
- iOS 17.1.1
- Wireshark v4.0.8
- tcpdump
- Jami Qt 6.4.2 for Fedora 38
- Jami for iPhone v3.52
hardware
- Raspberry Pi 4B 8 GB RAM
- iPhone SE (2nd generation)
kamailio configuration
To run the Kamailio SIP server, we are going to use the Docker engine. Although the Alpine image is much smaller than the Debian-based one, it lacks support for the MySQL client and for TLS (neither can be installed using the apk package manager). Therefore, we are going to use the xenial Docker image.
Kamailio Official Docker Image
|
|
For a better setup, I am going to introduce a functional docker-compose file that includes the services, networks, and volumes settings all together. The compose file was written from scratch.
The kamailio service has to wait for the database engine to start and initialize properly, so a healthcheck is implemented for the mariadb service, and the kamailio service starts only after the database container’s condition is set to “healthy”. [5][6]
kamailio-compose.yml:
|
|
The default config file directory structure can be obtained using this recommended procedure [2]:
We want to use MySQL (MariaDB) as a database engine, so we have to tweak the Kamailio configuration a bit. [3]
/etc/kamailio/kamailio.cfg [3]:
Disable certificate checking for both server and client in /etc/kamailio/tls.cfg [3]:
/etc/kamailio/kamctlrc [3]:
Generate a self-signed certificate and key using openssl [4]:
database configuration
For the kamailio container to start, the database has to be configured manually: at the very least, one has to create a kamailio user identified by a password and a kamailio database. We can start only the mariadb container at first, as the kamailio container would fail and stop anyway.
|
|
Then we can use the mariadb client within the server’s container and log in using the root credentials (the password is specified using an environment variable in the docker-compose YAML file).
|
|
Run the docker compose stack using:
|
|
Then we can reinitialize our database and its tables using:
|
|
SIP clients registration
At first, we need to create SIP accounts using the kamctl command against the Docker container:
We can then examine the existing account using kamctl show <SIP account number>
|
|
|
|
Then, we can add the created accounts to the Jami SIP client for desktop (account 1000) and for iOS (account 2000).
Fig. 3: SIP account registration in Jami desktop application.
Fig. 4: SIP account registration in Jami iOS mobile application.
capturing the secured call
To be able to access all network interfaces, we need to start Wireshark with root privileges:
|
|
Now, place a call from one SIP account to the other in the Jami application.
Fig. 5: A call initiation from Jami iOS mobile application.
After the call has ended, we can stop the packet capturing in Wireshark. Now we can start extracting the captured information: first the SDES master key/salt pair from the SIP/SDP packets, then the UDP VoIP stream to be extracted as an RTP stream, which is finally decoded/decrypted and loaded back into Wireshark.
extracting the information and single stream
To get the key/salt pair, simply filter out SIP/SDP packets in Wireshark:
|
|
Fig. 6: Filtering out SDP packets and exploring the media attributes to find the SDES key/salt crypto pair.
Now we can click on the crypto media attribute and choose Copy » Value to get the base64-encoded key/salt pair. Note that the source there is 10.4.5.131 (receiver) and the SIP response is 200 OK. We will also extract the hexadecimal value of the key/salt pair (Key and Salt field, Copy » …as a Hex stream). [16]
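The relation between the base64 value and the hexadecimal key/salt pair can be checked outside Wireshark as well. The following Python sketch assumes the AES_CM_128_HMAC_SHA1_80 crypto suite, where the inline value is a 16-byte master key followed by a 14-byte master salt; the function name and the sample input are placeholders, not the values from this capture.

```python
import base64


def split_sdes_key(inline_b64, key_len=16, salt_len=14):
    """Decode the base64 inline value and return (key_hex, salt_hex)."""
    raw = base64.b64decode(inline_b64)
    if len(raw) != key_len + salt_len:
        raise ValueError(
            f"expected {key_len + salt_len} bytes of key material, got {len(raw)}"
        )
    return raw[:key_len].hex(), raw[key_len:].hex()


# Placeholder input: base64 of the bytes 0..29, standing in for the copied value.
sample = base64.b64encode(bytes(range(30))).decode()
key_hex, salt_hex = split_sdes_key(sample)
print("key: ", key_hex)
print("salt:", salt_hex)
```

If the decoded length does not match, the copied value was most likely truncated or belongs to a different crypto suite.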
Next, we need to find the raw UDP stream (heuristically, by typing udp into the filter bar and looking at the source/destination ports). Note that we need to choose the same source IP address. Also, we don’t want any ICMP or STUN packets to pollute our UDP stream.
|
|
The next step is to decode the UDP stream as an RTP stream: click on any UDP packet, choose Decode As…, double-click the Current column of the UDP row, and choose RTP from the options.
Fig. 7: Decoding the UDP stream into an RTP stream.
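Decoding as RTP works even on the encrypted stream because SRTP encrypts only the payload and leaves the fixed 12-byte RTP header readable. The Python sketch below parses that header from a raw UDP payload and rejects payloads whose version field is not 2, which also filters out STUN packets (their first two bits are zero). The function name and the sample packet are illustrative assumptions.

```python
import struct


def parse_rtp_header(payload: bytes):
    """Parse the fixed RTP header; return None if it does not look like RTP."""
    if len(payload) < 12:
        return None
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", payload[:12])
    if first >> 6 != 2:  # the RTP version must be 2
        return None
    return {
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Made-up packet: version 2, payload type 0 (PCMU), followed by dummy payload bytes.
packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x12345678) + bytes(20)
print(parse_rtp_header(packet))
```

A stable SSRC together with steadily increasing sequence numbers across packets is a good sign that the selected UDP stream really is the media stream.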
When we go to Telephony » RTP » RTP Stream Analysis » Graph, we can see the time fluctuation of the RTP stream.
Fig. 8: RTP Stream Analysis, millisecond delay in time.
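One common way to quantify such fluctuation is the interarrival jitter estimator defined in RFC 3550. The Python sketch below applies it to a list of (arrival time, RTP timestamp) pairs; the 8000 Hz clock rate is an assumption that fits narrowband audio codecs such as PCMU, timestamp wrap-around is ignored for brevity, and the sample values are made up.

```python
def interarrival_jitter(packets, clock_rate=8000):
    """Running RFC 3550 jitter estimate in seconds.

    packets: iterable of (arrival_time_seconds, rtp_timestamp) in arrival order.
    """
    jitter = 0.0
    previous = None
    for arrival, rtp_timestamp in packets:
        if previous is not None:
            prev_arrival, prev_timestamp = previous
            # Difference between the arrival spacing and the sender's spacing.
            d = (arrival - prev_arrival) - (rtp_timestamp - prev_timestamp) / clock_rate
            # Smoothed estimate with a gain of 1/16, as in RFC 3550.
            jitter += (abs(d) - jitter) / 16
        previous = (arrival, rtp_timestamp)
    return jitter


# Made-up sample: 20 ms audio frames, the third packet arrives 5 ms late.
sample = [(0.000, 0), (0.020, 160), (0.045, 320), (0.060, 480)]
print(f"jitter: {interarrival_jitter(sample) * 1000:.3f} ms")
```

The arrival times and RTP timestamps needed as input are available in the same Wireshark capture used throughout this demonstration.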
As we can now partially analyse the stream, we can export the specified packets into a pcap file: File » Export Specified Packets… » Save as wireshark/tcpdump pcap file (and name it rtp_single_stream.pcap, for example).
decryption using srtp-decrypt
There is a tool written in C that allows us to decrypt an extracted single (one-way only) RTP stream into a hexadecimal stream (another UDP/RTP stream). [16]
Before compiling the srtp-decrypt program, we need to install the dependencies:
Then we can clone the repository and build the project:
Execute the decryptor with the SDES key by piping the exported pcap file into the executable [16] [18]:
If you encounter errors like this:
|
|
be sure to provide the correct key/salt pair. Basically, the error means that the decryption failed: the program was not able to decrypt the given packet/frame. [16]
The process itself should be fairly quick. Now we can import the hexadecimal stream from the decryptor back into Wireshark to examine it further:
- File » Import From Hex Dump,
- Offsets as Hexadecimal,
- Encapsulation as UDP,
- ports are not important in this step (we can make them up),
- click Import.
Now, decode the UDP packet stream back into an RTP stream by clicking on any packet and choosing Decode As… » Current: RTP again. [16]
The stream should now be playable: Telephony » RTP » RTP Player.
decryption using libsrtp’s rtp_decoder
Alternatively, we can use Cisco’s C library called libsrtp, which includes a program called rtp_decoder. [17]
To fetch the project and build it, simply run this:
Here, we can use the hexadecimal stream key/salt string we got earlier. Perform the decoding process [17]:
|
|
conclusion
Decoding/decrypting is a fairly complicated process; above all, it is not easy to extract a single RTP stream from the full pcap capture file. It requires some Wireshark and GNU/Linux knowledge, as well as some knowledge of fetching, configuring and building software from source. However, in the end it is a nice exercise in debugging and analysing network communication.
Last but not least, I should note that the decryption process was not fully successful, as the final stream was not playable and, for some reason, had a duration of zero seconds.