At SANOG 37, I had the opportunity to share some of the ways in which we have been doing Threat Hunting using DNS at my $dayjob.
Here is the video of the presentation.
I also had a little demo, but I decided to improvise and add slides instead, since the program was running a little behind schedule and I was the only one standing between everyone and their lunch. Trouble was also lurking.
That aside, the same paper, ‘Threat Hunting using DNS’, has been accepted at APNIC 52 and, hopefully, I will be able to demo the juicy bits there.
tl;dr If Jio VoWiFi isn’t working for you, set a different DNS resolver on the phone. While I am a big proponent of running your own resolver in the network, you could test by using open resolvers. The issue seems to affect only a subset of users, not everyone.
To begin with, there are multiple things broken in the authoritative name servers ns1.vowifi.jio.com. and ns2.vowifi.jio.com. of vowifi.jio.com which I’ll cover a bit later.
I came across reports (see here & here) of Jio VoWiFi not working for many users and, while the reports were sketchy, I decided to test this myself.
Below is a snippet from the log of a DNS query for vowifi.jio.com from my phone (192.168.1.137) to the recursive resolver (Unbound) which I run in my network:
May 28 15:54:35 root unbound: [1300:0] info: 192.168.1.137 vowifi.jio.com. A IN
Ideally, the domain is standardised and is made up of the Mobile Network Code (MNC) and the Mobile Country Code (MCC). For example, in the case of Airtel VoWiFi, the domain name that I see hitting my Unbound resolver is epdg.epc.mnc045.mcc404.pub.3gppnetwork.org., where the MNC is 045 and the MCC is 404, which signifies Airtel in the Karnataka region.
However, oddly enough, Reliance Jio seems to be using vowifi.jio.com. Having said that, the standardised domain name works as well. For example – epdg.epc.mnc861.mcc405.pub.3gppnetwork.org. resolves to 49.44.59.36 and 49.44.59.38
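The naming pattern in the two examples above can be sketched in a few lines of Python. This is a minimal sketch, not an authoritative implementation: the function name epdg_fqdn is my own, and it assumes that a two-digit MNC is left-padded with a zero so that the label always carries three digits, which is what the Airtel example (mnc045) suggests.

```python
def epdg_fqdn(mcc: str, mnc: str) -> str:
    """Build the standardised ePDG name from an MCC and an MNC (hypothetical helper)."""
    if not (mcc.isdigit() and len(mcc) == 3):
        raise ValueError("MCC must be exactly three digits")
    if not (mnc.isdigit() and len(mnc) in (2, 3)):
        raise ValueError("MNC must be two or three digits")
    # Assumption: two-digit MNCs are zero-padded to three digits (45 -> 045).
    return f"epdg.epc.mnc{mnc.zfill(3)}.mcc{mcc}.pub.3gppnetwork.org."


# The two operators mentioned in the text.
print(epdg_fqdn("404", "45"))
print(epdg_fqdn("405", "861"))
```

Feeding the resulting name to the resolver under test is a quick way to check whether the standardised name and the operator-specific one return the same addresses.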
Below is the entire delegation chain of the DNS resolution. From my home network, I can see that vowifi.jio.com resolves to 49.44.59.38 and 49.44.59.36.
. 518400 IN NS a.root-servers.net.
. 518400 IN NS b.root-servers.net.
. 518400 IN NS c.root-servers.net.
. 518400 IN NS d.root-servers.net.
. 518400 IN NS e.root-servers.net.
. 518400 IN NS f.root-servers.net.
. 518400 IN NS g.root-servers.net.
. 518400 IN NS h.root-servers.net.
. 518400 IN NS i.root-servers.net.
. 518400 IN NS j.root-servers.net.
. 518400 IN NS k.root-servers.net.
. 518400 IN NS l.root-servers.net.
. 518400 IN NS m.root-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
jio.com. 172800 IN NS ns1.jio.com.
jio.com. 172800 IN NS ns2.jio.com.
jio.com. 172800 IN NS ns3.jio.com.
jio.com. 172800 IN NS ns4.jio.com.
vowifi.jio.com. 3600 IN NS ns1.vowifi.jio.com.
vowifi.jio.com. 3600 IN NS ns2.vowifi.jio.com.
vowifi.jio.com. 5 IN A 49.44.59.38
vowifi.jio.com. 5 IN A 49.44.59.36
At this point, I confirmed that VoWiFi on Jio works by putting the phone in Airplane mode while remaining connected to WiFi. A ~22 minute call worked flawlessly.
To confirm that vowifi.jio.com was indeed the domain name that needs to resolve for VoWiFi to work on Jio, I configured an entry for vowifi.jio.com to return an NXDOMAIN answer in my DNS RPZ (aka DNS Firewall) in Unbound.
With that configured, any DNS query for vowifi.jio.com from any device in the network is answered with NXDOMAIN. Below is a snippet from the Unbound log confirming that the RPZ rule was applied.
May 28 17:31:50 root unbound: [1191:0] info: 192.168.0.137 vowifi.jio.com. A IN
May 28 17:31:50 root unbound: [1191:0] info: RPZ applied [custom block to test vowifi] vowifi.jio.com. nxdomain 192.168.0.137@64521 vowifi.jio.com. A IN
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 14747
;; flags: qr aa rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; vowifi.jio.com. IN A
;; ANSWER SECTION:
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 136 msec
;; SERVER: 192.168.0.250
;; WHEN: Thu May 28 18:03:42 2020
;; MSG SIZE rcvd: 32
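If you want to pull the RPZ hits out of a larger log, a regular expression is enough. The sketch below is inferred only from the single "RPZ applied" line shown above; the group names (policy, trigger, action and so on) are my own labels, and other Unbound versions or verbosity levels may format the line differently.

```python
import re

# Pattern inferred from the one "RPZ applied" log line shown above (assumption).
RPZ_LINE = re.compile(
    r"info: RPZ applied \[(?P<policy>[^\]]*)\] "
    r"(?P<trigger>\S+) (?P<action>\S+) "
    r"(?P<client>[^@\s]+)@(?P<port>\d+) "
    r"(?P<qname>\S+) (?P<qtype>\S+) (?P<qclass>\S+)$"
)


def parse_rpz_line(line: str):
    """Return the fields of an 'RPZ applied' line as a dict, or None for other lines."""
    match = RPZ_LINE.search(line.strip())
    return match.groupdict() if match else None


sample = (
    "May 28 17:31:50 root unbound: [1191:0] info: RPZ applied "
    "[custom block to test vowifi] vowifi.jio.com. nxdomain "
    "192.168.0.137@64521 vowifi.jio.com. A IN"
)
print(parse_rpz_line(sample))
```

Plain query lines, such as the first log line above, simply return None, so the function can be run over the whole log.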
In the context of VoWiFi, the other noticeable problems with the DNS infrastructure of Jio are:
A/AAAA records for ns1.vowifi.jio.com, ns2.vowifi.jio.com are missing
ns1.vowifi.jio.com(49.44.59.6), ns2.vowifi.jio.com(49.44.59.7) don’t respond to queries over TCP
The other interesting thing worth observing is that when you resolve vowifi.jio.com from outside India, or use a DNS resolver which is perhaps not geographically located within India, the authoritative name servers ns1.vowifi.jio.com (49.44.59.6) and ns2.vowifi.jio.com (49.44.59.7) give out a different set of IP addresses: 49.45.63.1 and 49.45.63.2.
; <<>> DiG 9.16.3 <<>> @127.0.0.1 vowifi.jio.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13728
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;vowifi.jio.com. IN A
;; ANSWER SECTION:
vowifi.jio.com. 4 IN A 49.45.63.1
vowifi.jio.com. 4 IN A 49.45.63.2
;; Query time: 352 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat May 30 06:23:23 IST 2020
;; MSG SIZE rcvd: 75
To confirm this hypothesis, I decided to utilise the RIPE Atlas probes to run a measurement. If you’re unaware of the RIPE Atlas project, check an earlier post – Host a RIPE Atlas software probe in your network.
And the results of the measurement are interesting. Out of the 75 probes which participated in the measurement, there were many probes which received the response 49.45.63.1 & 49.45.63.2 to the DNS query to vowifi.jio.com
ASN | AS Name | DNS Response 1 | DNS Response 2 | Resolver IP address
4758 | NICNET-VSNL-BOARDER-AP National Informatics Centre, IN | 49.45.63.2 | 49.45.63.1 | 164.100.3.1
4758 | NICNET-VSNL-BOARDER-AP National Informatics Centre, IN | 49.45.63.1 | 49.45.63.2 | 164.100.3.1
24186 | RAILTEL-AS-IN RailTel Corporation of India Ltd., Internet Service Provider, New Delhi, IN | | |
 | TATACOMM-AS TATA Communications formerly VSNL is Leading ISP, IN | 49.45.63.1 | 49.45.63.2 | 8.8.8.8
16509 | AMAZON-02, US | 49.44.59.38 | 49.44.59.36 | ::1
15169 | GOOGLE, US | 49.45.63.1 | 49.45.63.2 | ::1
139331 | DCORP-AS-AP DevelentCorp., IN | 49.44.59.36 | 49.44.59.38 | ::1
24309 | CABLELITE-AS-AP Atria Convergence Technologies Pvt. Ltd. Broadband Internet Service Provider INDIA, IN | 49.45.63.2 | 49.45.63.1 | 192.168.1.10
9498 | BBIL-AP BHARTI Airtel Ltd., IN | 49.45.63.1 | 49.45.63.2 | 192.168.139.245
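A quick way to read results like these is to group the probes by the set of addresses they received. The sketch below uses only the rows of the table above that list an ASN together with both responses and a resolver; the tuple layout and the function name tally are my own choices, not part of the RIPE Atlas output format.

```python
from collections import Counter

# (ASN, (response 1, response 2), resolver), copied from the complete rows above.
ROWS = [
    (4758, ("49.45.63.2", "49.45.63.1"), "164.100.3.1"),
    (4758, ("49.45.63.1", "49.45.63.2"), "164.100.3.1"),
    (16509, ("49.44.59.38", "49.44.59.36"), "::1"),
    (15169, ("49.45.63.1", "49.45.63.2"), "::1"),
    (139331, ("49.44.59.36", "49.44.59.38"), "::1"),
    (24309, ("49.45.63.2", "49.45.63.1"), "192.168.1.10"),
    (9498, ("49.45.63.1", "49.45.63.2"), "192.168.139.245"),
]


def tally(rows):
    """Count probes per answer set, ignoring the order of the two A records."""
    return Counter(frozenset(answers) for _asn, answers, _resolver in rows)


for answer_set, count in tally(ROWS).items():
    print(sorted(answer_set), count)
```

Using a frozenset as the key means that 49.45.63.1/49.45.63.2 and 49.45.63.2/49.45.63.1 are counted as the same answer, since the order of records in a DNS response carries no meaning here.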
There is also internationalvowifi.jio.com, which seems to be meant for VoWiFi international calling and resolves to 49.44.59.36 and 49.44.59.38 from my vantage point. The same name resolves to 49.45.63.1 and 49.45.63.2 from every location outside India that I’ve managed to check.
Looking at the results, the issue most likely lies in how ns1.vowifi.jio.com & ns2.vowifi.jio.com handle the EDNS0 Client Subnet option in DNS queries.
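One way to probe this hypothesis is to send the same query with different client subnets attached and compare the answers. The sketch below only builds such a query in wire format using the standard library; the layout follows my reading of the EDNS0 (RFC 6891) and Client Subnet (RFC 7871) specifications. The function names, the query ID, the advertised UDP size of 1232 and the documentation prefix 203.0.113.0/24 are all assumptions for illustration.

```python
import ipaddress
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def ecs_option(subnet: str) -> bytes:
    """Build an EDNS Client Subnet option (option code 8) for the given prefix."""
    net = ipaddress.ip_network(subnet, strict=False)  # host bits are masked off
    family = 1 if net.version == 4 else 2
    nbytes = (net.prefixlen + 7) // 8  # only the significant address bytes are sent
    body = struct.pack("!HBB", family, net.prefixlen, 0)  # scope prefix is 0 in queries
    body += net.network_address.packed[:nbytes]
    return struct.pack("!HH", 8, len(body)) + body


def build_query(qname: str, subnet: str, query_id: int = 0x1234) -> bytes:
    """Build an A query carrying an OPT record with a Client Subnet option."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)  # RD set, 1 question, 1 additional
    question = encode_name(qname) + struct.pack("!HH", 1, 1)  # QTYPE A, QCLASS IN
    rdata = ecs_option(subnet)
    # OPT pseudo-record: root name, type 41, class = UDP payload size, TTL = 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0, len(rdata)) + rdata
    return header + question + opt


packet = build_query("vowifi.jio.com", "203.0.113.0/24")
print(packet.hex())
```

Sending the packet over UDP to one of the authoritative servers, then repeating with a different subnet, would show whether the answer set follows the client subnet or the source address of the query. That part is deliberately left out of the sketch.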
Recommended reading
If you enjoyed reading this blog post, you might find root hints vs RFC 8806 interesting.
Image shows the locations of the root server IP Anycast instances. Source: https://root-servers.org/
Current State of DNS Root Servers
The DNS root server system uses IP Anycast. There are 13 root servers, run by 12 root server operators, with a total of 1084 instances all over the world. Let’s look at some of the problems in the context of the root server system:
Decrease the round trip time to the root servers
The round trip time to the root servers depends on multiple factors, namely the availability of a root server instance within the country and optimal routing. While the first can be addressed by installing an instance of the root server in the country, the second is harder to address. Routing determines whether traffic from the last mile reaches the local instance or takes the transit route to an instance outside the country.
If the traffic is transiting outside the country, the result is increased latency and poor performance in the context of DNS resolution.
Case in point, in the context of India: Netnod, the root server operator managing i.root-servers.net, has an Anycast IPv4 node in Mumbai.
A traceroute from AS9498 to i.root-servers.net shows that traffic is not hitting the local instance but taking the transit route.
Image of a traceroute from AS9498 to i.root-servers.net, taken from a RIPE Atlas measurement.
Similarly, RIPE NCC is the root server operator managing k.root-servers.net. Again, in the context of India, there are Anycast IPv6 nodes in Mumbai and Noida.
A traceroute from AS9498 to k.root-servers.net shows that traffic is not hitting the local instance but taking the transit route.
Image of a traceroute from AS9498 to k.root-servers.net, taken from a RIPE Atlas measurement.
If you aren’t aware of the RIPE Atlas project, check the earlier post
Prevent snooping of queries
In the case of traditional DNS, or DNS over port 53 (Do53), the traffic is unencrypted. In response to privacy concerns, and to secure DNS traffic between the client and the recursive resolver, the IETF standardised DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). While both protocols secure the communication between the client and the recursive resolver, traffic between the recursive resolver and the root servers is still in the open, i.e. unencrypted.
Faster negative responses to queries for non-existent domains
The recent study by ICANN OCTO reveals that a vast majority of the queries to the root servers are for names which do not exist in the root zone. Providing faster negative responses for non-existent domains to the stub resolver would avoid sending these junk queries to the root servers at all.
Increase the resiliency of the root server system
In the context of DNS, the primary intention of using IP Anycast is to have the topologically closest server provide the answer. This model fails if there is suboptimal routing as seen in the examples of traceroute to the root servers earlier.
The additional benefit of using IP Anycast is that, assuming optimal routing, the impact of a DDoS attack is limited because it stays confined to certain areas. In the past, IP Anycast has helped to mitigate attacks on the root server system: the attack was limited in scope to certain Anycast instances of the root server, where it saturated the network connections.
On the other hand, the Mirai botnet attack on Dyn’s infrastructure also tells us that a large-scale attack can cause congestion across the Anycast instances, resulting in unavailability of services.
Finally, we get to a set of broader questions. How do we increase resiliency against a DDoS on the root server system? Since the root server system doesn’t penalise abuse (period), should we continue abusing it?
A probable solution as proposed in RFC 7706 is to run a local copy of the full root zone on the loopback. What this essentially suggests is that the full root zone on the loopback will serve as upstream to the recursive resolver and the recursive resolver should be able to validate the zone from the upstream using DNSSEC.
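To illustrate why a local copy of the root zone gives faster negative answers, here is a toy model in Python. It is only a sketch: the set LOCAL_ROOT_TLDS is a hypothetical stand-in for the delegations in a real root zone copy, the function name is my own, and DNSSEC validation is not modelled at all.

```python
# Hypothetical stand-in for the TLD delegations found in a local root zone copy.
LOCAL_ROOT_TLDS = {"com", "org", "net", "in"}


def answer_from_local_root(qname: str, tlds=LOCAL_ROOT_TLDS) -> str:
    """Decide what the local root copy can say about qname without leaving the host."""
    labels = qname.rstrip(".").lower().split(".")
    tld = labels[-1]
    if tld in tlds:
        # The TLD exists: hand back a referral and continue down the delegation chain.
        return "REFERRAL"
    # The TLD is not in the root zone: the negative answer is produced locally.
    return "NXDOMAIN"


print(answer_from_local_root("vowifi.jio.com."))
print(answer_from_local_root("pwpsfrn."))
```

In this model, a query for a name under a non-existent TLD never reaches a root server instance, which addresses both the latency and the junk-traffic concerns described earlier.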
In order to implement this, one first needs a copy of the root zone. The following root servers currently allow transfer of the root zone using AXFR over TCP,
Root Server Operators which support transfer of the root zone
The process of manually pulling the root zone has an operational issue – one needs to periodically check if the root zone has changed in the root zone copy at the upstream and then update the copy of the root zone configured to run on the loopback.
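The "has the root zone changed?" check usually comes down to comparing the serial number in the SOA record of the local copy with the one upstream. A minimal sketch, assuming 32-bit serials compared with the serial number arithmetic of RFC 1982; the function name and the serial values are made up for illustration.

```python
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)


def upstream_is_newer(upstream_serial: int, local_serial: int) -> bool:
    """True if the upstream SOA serial is ahead of the local one, allowing wrap-around."""
    if upstream_serial == local_serial:
        return False
    # A difference of exactly 2**31 is undefined in RFC 1982; it is treated as "not newer" here.
    return (upstream_serial - local_serial) % MODULUS < HALF


# Hypothetical serials in the common YYYYMMDDNN style.
print(upstream_is_newer(2020060301, 2020060300))
print(upstream_is_newer(2020060300, 2020060301))
```

A periodic job would fetch the upstream SOA, run this comparison, and only then pull a fresh copy of the zone.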
Even though RFC 7706 is Informational, recursive resolver software such as ISC BIND, Unbound, Knot Resolver have built-in support.
Slaving of the root zone – ISC BIND 9.16.3(stable)
Image of an excerpt from named.conf showing the slaving of the root zone configuration
Part II of this post will contain operational instructions for running a local copy of the root zone and document some of the pitfalls of doing so.
Update (06/08/2020) – APNIC has published this post on their blog. Robbie Mitchell from APNIC was of great help in correcting a few things and polishing the article. You can read Part 1 on the APNIC blog here.
DNS (Domain Name System) is the crucial & ubiquitous fabric of the Internet. While on the surface users access websites, apps, email etc., underneath it’s the DNS database which provides the map for the Internet.
It’s fair to say that everything on the Internet begins with a DNS query. This means that the DNS is used for legitimate purposes as well as abused by bad actors.
Adding a layer of security to a flat network
In the context of COVID-19, where most of us are working from home, the security of the devices & data being accessed from a hostile home network has become a major talking point over the last couple of months. From a security perspective, the home network differs from an enterprise network and, apart from its inherent flaws, it’s a flat network.
A flat network is a computer network design approach that aims to reduce cost, maintenance and administration. Flat networks are designed to reduce the number of routers and switches on a computer network by connecting the devices to a single switch instead of separate switches. Unlike a hierarchical network design, the network is not physically separated using different switches. The topology of a flat network is not segmented or separated into different broadcast areas by using routers.
Here is a representation of a flat network design,
The constraints of a flat network are,
No segmentation of traffic – Single broadcast domain
Easy & rapid propagation of malicious traffic within the network
One of the layers of security that can be brought into a flat network at an economical cost is by leveraging DNS. Before we look into how that can be implemented, here is a DNS primer for what happens when a domain name is accessed in a network,
Shift of the recursive resolvers
In the above diagrammatic representation, the part which is doing most of the heavy lifting is the Recursive DNS Server or Recursive resolver. At the very beginning of the Internet, users themselves ran recursive resolvers on their machines or in the network. This model slowly shifted to the network operators (ISPs) offering it as a bundled, free-of-cost offering along with the service. And the model has moved DNS resolution further away from the user with the advent of the Cloud/Quad DNS providers. To name a notable few: Google Public DNS (8.8.8.8, 8.8.4.4), Cloudflare (1.1.1.1, 1.0.0.1), Quad9 (9.9.9.9) etc.
While each of these open resolver services promotes faster DNS resolution, in reality they are still further away from the user in terms of round trip time. Even though all of these open resolver services use IP Anycast, their proximity to the user cannot compete with a local resolver. In obvious terms, the recursive resolver which is in the user’s network, or even the resolver provided by the Internet Service Provider, will always be closest.
The one definitive advantage that the cloud/quad DNS open resolvers provide is the availability of a large cache.
If you aren’t convinced yet on running your own DNS resolver instead of outsourcing it to the cloud/quad DNS providers, I would urge you to read Why should I run my own DNS resolver?
And most importantly, if you want to leverage DNS Response Policy Zones (DNS Firewall) to add a layer of security in your network, you need to run a recursive resolver.
What are DNS Response Policy Zones (RPZ)?
It’s currently an Internet-draft and not a standard yet. The latest draft is available here
Allows policy to be applied to DNS queries. Set a differentiated route for the bad domains
Economical solution – a RaspberryPi can act as recursive resolver with DNS RPZ for the entire network – especially useful & low cost solution for home networks, SOHO etc
Just like the functioning of a firewall, RPZ is made up of TRIGGERS & ACTIONS.
This is all good but without threat intelligence data, a DNS Firewall doesn’t add any value.
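To make the trigger/action idea concrete, here is a toy model in Python. It is a sketch under stated assumptions, not how any resolver implements RPZ: only QNAME triggers are modelled, the POLICY table and the name bad.example are hypothetical, and the vowifi.jio.com entry simply mirrors the custom test block used earlier.

```python
from typing import Optional

# Hypothetical policy: QNAME trigger -> action.
POLICY = {
    "vowifi.jio.com.": "NXDOMAIN",
    "*.bad.example.": "NODATA",
}


def rpz_action(qname: str, policy=POLICY) -> Optional[str]:
    """Return the action of the first matching QNAME trigger, or None if no rule fires."""
    name = qname.lower()
    if not name.endswith("."):
        name += "."
    if name in policy:  # exact-match trigger
        return policy[name]
    labels = name.split(".")  # the last element is the empty root label
    for i in range(1, len(labels) - 1):
        # Wildcard triggers cover names below the suffix, not the suffix itself.
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in policy:
            return policy[wildcard]
    return None  # no trigger matched: resolve normally


for query in ("vowifi.jio.com.", "cdn.a.bad.example.", "bad.example.", "apnic.net."):
    print(query, rpz_action(query))
```

The table is the part that matters: the mechanism is trivial, and its value depends entirely on the quality of the data that populates the triggers.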
Threat intelligence RPZ feeds
While there are many threat intelligence providers which provide a DNS RPZ feed, below are some of the free/community ones,
DNS root servers are the heart of the DNS infrastructure. Although there are just 13 of them, the actual number comprises 1084 Anycast instances operated by 12 independent root server operators.
A recent study by ICANN OCTO, Analysis of the Effects of COVID-19-Related Lockdowns on IMRS Traffic, shed some light on DNS traffic patterns before and during COVID-19. While the study looked at the ICANN Managed Root Server (IMRS), i.e. a few instances of the L-Root Server (l.root-servers.net), I wouldn’t be surprised if the pattern is similar for other root servers as well.
One stark observation in the study was the amount of DNS traffic for non-existent TLDs. As every DNS transaction begins with a query to the root server and goes down the delegation chain, queries for non-existent records are also sent to the root servers.
Topping the chart are browsers based on Chromium. This is not surprising, since Chromium-based browsers send three random strings of 7-15 characters on startup to check if the browser is sitting behind a captive portal. Check my earlier blog post, Chromium based browsers & DNS, for more information on the topic.
So, I had sent in a question to the Ask Mr. DNS podcast asking if they knew if there was a formal specification/guidelines for consequences of excessively abusing the root servers. And guess what,
I would urge you to listen to the entire episode as it contains juicy bits by Kim Davies about the Root Key Signing Key Ceremony, but if you’re the impatient lot & !DNS Geek, skip to 31:48 to tune in for my few seconds of fame 😀
While this is not something new, it perhaps has more significance because of the ever-increasing market share, now more than 60%, of Chromium-based browsers.
Chromium based browsers have a very uncanny method to check if the web browser is sitting behind a captive portal. And if you’re running a recursive resolver in your network with a large user base running Chromium based browsers (Google Chrome, Brave etc), it might even startle you if you observe the recursive resolver logs.
Here is a snippet from my unbound resolver as soon as I start Google Chrome on the machine(192.168.0.188),
Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 pwpsfrn. A IN
Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 yeytluindg. A IN
Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 zkgtcrxrpfjcjxr. A IN
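Queries like these are easy to spot in a resolver log with a simple heuristic. The sketch below assumes the log line format shown above and treats any single-label name made of 7 to 15 lowercase letters as a candidate probe; the lowercase-only rule is an assumption based on these three samples, and the function name is my own.

```python
import re

# Query log line as shown above: "... info: <client> <qname> <qtype> IN".
QUERY_LINE = re.compile(r"info: (?P<client>\S+) (?P<qname>\S+) (?P<qtype>\S+) IN$")
# Single label, 7 to 15 lowercase letters, followed by the root dot (assumption).
PROBE_NAME = re.compile(r"^[a-z]{7,15}\.$")


def chromium_probe_candidates(lines):
    """Return (client, qname) pairs whose query name looks like a random probe."""
    hits = []
    for line in lines:
        match = QUERY_LINE.search(line.strip())
        if match and PROBE_NAME.match(match.group("qname")):
            hits.append((match.group("client"), match.group("qname")))
    return hits


LOG = [
    "Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 pwpsfrn. A IN",
    "Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 yeytluindg. A IN",
    "Jun 3 11:16:31 root unbound: [1283:0] info: 192.168.0.188 zkgtcrxrpfjcjxr. A IN",
    "May 28 15:54:35 root unbound: [1300:0] info: 192.168.1.137 vowifi.jio.com. A IN",
]
for client, qname in chromium_probe_candidates(LOG):
    print(client, qname)
```

The heuristic will also flag legitimate single-label names such as localhost, so treat its output as candidates rather than a verdict.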
A research project at USC, What’s In A Name?, goes into some detail with the classification.
Here is the summary of the study,
Though the root server system handles this application-specific load sufficiently, it is clear that Chrome’s trick of using randomly generated names to discover whether it’s behind a captive portal contributes significantly to the traffic received at the root zone.
The Shadowserver Foundation releases and updates a scan report containing results for open resolvers on the Internet. Open resolvers basically respond to DNS queries from anyone on the Internet. Open resolvers are bad for the Internet primarily because they are a catalyst in a DNS amplification attack.
A Domain Name Server (DNS) Amplification attack is a popular form of Distributed Denial of Service (DDoS), in which attackers use publicly accessible open DNS servers to flood a target system with DNS response traffic. The primary technique consists of an attacker sending a DNS name lookup request to an open DNS server with the source address spoofed to be the target’s address. When the DNS server sends the DNS record response, it is sent instead to the target.
At the time of writing this, from an India perspective, there are 33,384 open resolvers. The number was 72,736 a couple of weeks ago.
Of the quantum, at that time,
ASN | AS Name | Count
AS9829 | BSNL-NIB National Internet Backbone | 77,736
So, what’s going on here? Most likely, it’s a broken configuration in the Customer Premises Equipment (CPE) of AS9829, which is allowing DNS requests on the WAN IP address and performing recursion.
Most of the cheap CPE devices that are bundled with the Internet connection run dnsmasq, and the firmware never sees an update.
Interestingly, when I compare this with my own measurements, the number of IP addresses responding to port 53 in my results is much higher – 260,886. Though, I haven’t filtered the responses for IP addresses which are performing recursion. There could be IP addresses in the results which are configured as authoritative name servers and that’s perfectly valid.
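Filtering for recursion mostly means looking at the header of each response. A minimal sketch using only the standard library: it assumes the probe asked for a name the target is not authoritative for, and treats a response with the RA flag set, RCODE 0 and at least one answer as a sign of open recursion. The function name and the sample headers are my own.

```python
import struct


def looks_like_open_recursion(response: bytes) -> bool:
    """Heuristic check of the 12-byte DNS header of a response."""
    if len(response) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)          # QR bit
    recursion_available = bool(flags & 0x0080)  # RA bit
    rcode = flags & 0x000F                      # 0 means NOERROR
    return is_response and recursion_available and rcode == 0 and ancount > 0


# Hand-built headers: QR+RD+RA with two answers, and QR+AA with one answer.
recursive_header = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 2, 0, 0)
authoritative_header = struct.pack("!HHHHHH", 0x1234, 0x8400, 1, 1, 0, 0)
print(looks_like_open_recursion(recursive_header))
print(looks_like_open_recursion(authoritative_header))
```

Hosts that answer only for their own zones would not pass this check, which is how authoritative name servers can be separated from open resolvers in the scan results.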
If, for some reason, you are running a DNS resolver on the Internet, I strongly suggest that you restrict access by IP address/network.
A better approach is perhaps to configure the DNS resolver software on an RFC 1918 private IP address & configure WireGuard/OpenVPN. Using this approach, the resolver is never exposed to the Internet while, at the same time, devices can send DNS queries via the WireGuard/OpenVPN tunnel.
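Both suggestions boil down to simple address checks, sketched here with Python’s ipaddress module. The function names and the example networks are assumptions for illustration; in practice the same rules would live in the resolver’s own access control configuration.

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(prefix)
    for prefix in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def client_allowed(client: str, allowed_networks) -> bool:
    """True if the client address belongs to one of the allowed networks."""
    ip = ipaddress.ip_address(client)
    return any(ip in ipaddress.ip_network(network) for network in allowed_networks)


# Hypothetical allow-list covering a home LAN.
ALLOWED = ["192.168.0.0/24"]
print(is_rfc1918("192.168.0.250"), is_rfc1918("8.8.8.8"))
print(client_allowed("192.168.0.188", ALLOWED), client_allowed("164.100.3.1", ALLOWED))
```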