Network issues with Tailscale and Aliyun ECS

Aliyun ECS is basically Alibaba Cloud's equivalent of AWS EC2.

While setting up my ECS instance, I had trouble accessing certain websites. For instance, I could reach google.com but not fast.com. In the terminal, ping google.com and ping 1.1.1.1 worked, but ping fast.com did not.

I eventually figured out that the problem was with my DNS server. I could ping google.com only because its DNS results had been cached from an earlier network test.

When I ran dig fast.com, I saw that the DNS queries sent to my local DNS resolver were timing out:

$ dig fast.com
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out

; <<>> DiG 9.18.24-0ubuntu0.22.04.1-Ubuntu <<>> fast.com
;; global options: +cmd
;; no servers could be reached

On the other hand, using Google's DNS servers with dig @8.8.8.8 fast.com worked fine:

$ dig @8.8.8.8 fast.com

; <<>> DiG 9.18.24-0ubuntu0.22.04.1-Ubuntu <<>> @8.8.8.8 fast.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55771
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;fast.com.			IN	A

;; ANSWER SECTION:
fast.com.		20	IN	A	173.222.154.196

;; Query time: 308 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Sep 29 13:38:41 CST 2024
;; MSG SIZE  rcvd: 53

Looking at my ECS instance settings, I found that the default DNS servers were 100.100.2.136 and 100.100.2.138, which fall within 100.64.0.0/10, the CGNAT range that Tailscale assigns its node addresses from. When you run tailscale up, Tailscale automatically creates firewall rules that drop traffic from this range unless it arrives on the tailscale0 interface. As a result, DNS responses were being dropped: they came from this range but arrived on eth0, the interface used for internet access.
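To double-check the overlap, a quick one-liner using Python's standard-library ipaddress module (assuming python3 is installed on the instance, as it is on Ubuntu by default) confirms that both Aliyun DNS servers sit inside Tailscale's range:

```shell
# Both Aliyun DNS servers fall inside 100.64.0.0/10
# (100.64.0.0 - 100.127.255.255), the CGNAT range Tailscale uses.
python3 -c '
import ipaddress
net = ipaddress.ip_network("100.64.0.0/10")
for ip in ("100.100.2.136", "100.100.2.138"):
    print(ip, "in", net, ipaddress.ip_address(ip) in net)
'
```

You can also list the rules Tailscale installed with iptables -S ts-input (the exact rules vary by Tailscale version).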

To fix this, I needed to insert additional firewall rules ahead of the ones created by Tailscale. These rules explicitly allow DNS traffic (UDP port 53) to and from 100.100.2.136 and 100.100.2.138 on the eth0 interface:

# 100.100.2.136
iptables -I INPUT 1 -s 100.100.2.136 -p udp --sport 53 -i eth0 -j ACCEPT
iptables -I OUTPUT 1 -d 100.100.2.136 -p udp --dport 53 -o eth0 -j ACCEPT

# 100.100.2.138
iptables -I INPUT 2 -s 100.100.2.138 -p udp --sport 53 -i eth0 -j ACCEPT
iptables -I OUTPUT 2 -d 100.100.2.138 -p udp --dport 53 -o eth0 -j ACCEPT
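Since these four commands may need to be re-applied later, here is a small sketch of a generator script (my own, not something from Aliyun or Tailscale) that prints them one per line so they can be reviewed first. The -I positions (1 and 2) keep the DNS rules ahead of the jump to Tailscale's ts-input chain:

```shell
# Print the four ACCEPT rules for Aliyun's internal DNS servers.
# Review the output, then pipe it to `sh` as root to apply.
pos=1
for dns in 100.100.2.136 100.100.2.138; do
    echo "iptables -I INPUT $pos -s $dns -p udp --sport 53 -i eth0 -j ACCEPT"
    echo "iptables -I OUTPUT $pos -d $dns -p udp --dport 53 -o eth0 -j ACCEPT"
    pos=$((pos + 1))
done
```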

With these in place, I could verify that my DNS rules sat ahead of the jump to Tailscale's ts-input chain:

$ iptables --list INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     udp  --  100.100.2.136        anywhere             udp spt:domain
ACCEPT     udp  --  100.100.2.138        anywhere             udp spt:domain
ts-input   all  --  anywhere             anywhere

Unfortunately, each time Tailscale restarts, it re-inserts its rules ahead of the ones I added. My DNS rules then drop below Tailscale's in priority, and the problem returns. Furthermore, iptables rules aren't persisted across reboots by default, so my DNS rules disappear whenever the instance restarts. In the next post, we will tackle both of these issues to create a complete solution.

#ecs #networking #tailscale