Node v20, AggregateError ETIMEDOUT and Happy Eyeballs
If you run Node.js v20 or later, you might notice that your fetches and other connections sometimes fail with an AggregateError ETIMEDOUT error similar to this:
TypeError: fetch failed
    at node:internal/deps/undici/undici:13178:13
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  [cause]: AggregateError [ETIMEDOUT]:
      at internalConnectMultiple (node:net:1118:18)
      at internalConnectMultiple (node:net:1186:5)
      at Timeout.internalConnectMultipleTimeout (node:net:1712:5)
      at listOnTimeout (node:internal/timers:583:11)
      at process.processTimers (node:internal/timers:519:7) {
    code: 'ETIMEDOUT',
    [errors]: [ [Error], [Error] ]
  }
}
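In this trace the AggregateError is not the top-level error: fetch throws a TypeError and attaches the AggregateError as its cause. The TypeScript sketch below shows one way to recognize that shape in a catch block. The helper names findAggregate, isAutoSelectTimeout and attemptErrors are hypothetical, and the sketch assumes (worth checking against your own logs) that code using net sockets directly may receive the AggregateError without the TypeError wrapper, so it accepts both shapes.

```typescript
// Hypothetical helpers for recognizing the error shape shown above.

// Returns the AggregateError whether it was thrown directly (assumed for
// plain sockets) or attached as the `cause` of fetch()'s TypeError.
function findAggregate(err: unknown): AggregateError | undefined {
  if (err instanceof AggregateError) return err;
  if (err instanceof Error && err.cause instanceof AggregateError) return err.cause;
  return undefined;
}

// True when the aggregated connection failure carries the code ETIMEDOUT.
function isAutoSelectTimeout(err: unknown): boolean {
  const agg = findAggregate(err) as (AggregateError & { code?: string }) | undefined;
  return agg?.code === "ETIMEDOUT";
}

// The individual per-address failures (the `[errors]` array in the log).
function attemptErrors(err: unknown): unknown[] {
  return findAggregate(err)?.errors ?? [];
}
```

In a catch block, something like `if (isAutoSelectTimeout(err)) console.warn("connect timed out", attemptErrors(err));` logs each per-address failure instead of the collapsed `[Error]` placeholders.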
This error is caused by a broken implementation of the "Happy Eyeballs" standard (RFC 8305) in Node.js v20 and later. Happy Eyeballs tries both IPv4 and IPv6 when connecting to a host that has addresses of both kinds, falling back quickly so that there's no perceivable delay if one protocol isn't reachable. It's supposed to work like this:
- Attempt to open an IPv6 connection to host.example.com
- Wait 200ms; if that connection hasn't completed, attempt to open an IPv4 connection to host.example.com while leaving the IPv6 attempt running
- Use whichever connection completes first
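To make that timeline concrete, here is a small TypeScript simulation of the steps above; it is a model, not a network client. Each candidate address carries an assumed handshake time, with null meaning the attempt never completes. The names Candidate, Outcome and standardHappyEyeballs are illustrative, and the model ignores attempts that fail instantly, which a real implementation would skip past without waiting for the delay.

```typescript
// Illustrative model only: how long each address's TCP handshake would take
// on a simulated network (null = never completes).
type Family = 4 | 6;
interface Candidate { family: Family; connectMs: number | null }
interface Outcome { family: Family; doneAtMs: number }

// Starts IPv6 candidates first, then starts each next candidate `staggerMs`
// after the previous one WITHOUT cancelling earlier attempts, and returns
// whichever attempt finishes first on the shared clock.
function standardHappyEyeballs(candidates: Candidate[], staggerMs = 200): Outcome | null {
  // Stable sort: IPv6 before IPv4, DNS order preserved within each family.
  const ordered = [...candidates].sort((a, b) => b.family - a.family);
  let best: Outcome | null = null;
  for (const [i, c] of ordered.entries()) {
    if (c.connectMs === null) continue; // this attempt never completes
    const doneAtMs = i * staggerMs + c.connectMs;
    if (best === null || doneAtMs < best.doneAtMs) best = { family: c.family, doneAtMs };
  }
  return best;
}
```

For an IPv4-only client whose IPv4 handshake takes 300ms, `standardHappyEyeballs([{ family: 4, connectMs: 300 }, { family: 6, connectMs: null }])` picks IPv4 at 500ms: the IPv4 attempt starts at 200ms and is never cancelled, so the dead IPv6 attempt costs only the delay.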
However, the Node.js implementation typically goes like this:
- Attempt to open an IPv4 connection to host.example.com
- Wait 250ms; if that connection hasn't completed, cancel it and attempt to open an IPv6 connection to host.example.com
- Hope that the IPv6 connection works
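The sequence above can be modeled the same way. This TypeScript sketch simulates the behavior as described here and is not a port of Node.js's net module: addresses are interleaved by family starting with whichever family the DNS lookup returned first, every attempt except the last is cancelled after attemptTimeoutMs, and the last one gets finalTimeoutMs, an assumed stand-in for the operating system's connect timeout. As a simplification, an attempt that fails or is cancelled is charged its full time limit.

```typescript
// Illustrative model of the behavior described above (not Node.js source).
type Family = 4 | 6;
interface Candidate { family: Family; connectMs: number | null }
interface Outcome { family: Family; doneAtMs: number }

// Alternate families, starting with the family of the first DNS answer.
function interleaveByFirstFamily(candidates: Candidate[]): Candidate[] {
  if (candidates.length === 0) return [];
  const firstFamily = candidates[0].family;
  const same = candidates.filter((c) => c.family === firstFamily);
  const other = candidates.filter((c) => c.family !== firstFamily);
  const out: Candidate[] = [];
  for (let i = 0; i < Math.max(same.length, other.length); i++) {
    if (i < same.length) out.push(same[i]);
    if (i < other.length) out.push(other[i]);
  }
  return out;
}

// Tries one address at a time. Every attempt except the last is cancelled
// after attemptTimeoutMs; the last gets finalTimeoutMs (assumed value).
function sequentialAutoSelect(
  candidates: Candidate[],
  attemptTimeoutMs = 250,
  finalTimeoutMs = 30_000,
): Outcome | "ETIMEDOUT" {
  const ordered = interleaveByFirstFamily(candidates);
  let clockMs = 0;
  for (const [i, c] of ordered.entries()) {
    const limitMs = i === ordered.length - 1 ? finalTimeoutMs : attemptTimeoutMs;
    if (c.connectMs !== null && c.connectMs < limitMs) {
      return { family: c.family, doneAtMs: clockMs + c.connectMs };
    }
    // Cancelled or never completing: charge the full limit and abandon the
    // attempt for good (unlike the standard, it keeps no chance of winning).
    clockMs += limitMs;
  }
  // Simplification: in this model every overall failure is a timeout.
  return "ETIMEDOUT";
}
```

With the same IPv4-only client as before, `sequentialAutoSelect([{ family: 4, connectMs: 300 }, { family: 6, connectMs: null }])` returns "ETIMEDOUT" because the only working attempt is cancelled at 250ms. Passing 5000 as attemptTimeoutMs lets the same attempt succeed at 300ms.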
There are several things wrong with this sequence. First, Happy Eyeballs normally starts with IPv6 and falls back to IPv4, whereas Node.js starts with whichever address the DNS lookup returned first, which in an IPv4-only environment will most likely be an IPv4 address. Second, Node.js cancels the IPv4 connection attempt after 250ms, in violation of the standard. This defeats the whole point of Happy Eyeballs, which is to use whichever connection completes first. On an IPv4-only host, that first connection is the only one that will ever have a chance of working! Third, a 250ms timeout is often too short for many internet connections, especially for users on higher-latency links such as satellite or cellular, or who are geographically far from the target server. It doesn't even leave enough time to retransmit a lost TCP SYN, so even on a low-latency connection, some percentage of connections can fail because of a single dropped packet.
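The last point is simple arithmetic. The TypeScript sketch below assumes an initial retransmission timeout of 1 second (the initial RTO recommended by RFC 6298, which Linux also uses for SYNs) that doubles after each loss. handshakeDoneAtMs and fitsInWindow are hypothetical helpers: the first gives the earliest time a handshake can finish when the first lostSyns SYNs are dropped, the second checks that time against an attempt timeout.

```typescript
// Assumption: 1 s initial retransmission timeout, doubled after each loss.
const INITIAL_RTO_MS = 1000;

// Earliest completion time of a TCP handshake whose first `lostSyns` SYNs
// are dropped, given the round-trip time to the server.
function handshakeDoneAtMs(rttMs: number, lostSyns: number, initialRtoMs = INITIAL_RTO_MS): number {
  let sendAtMs = 0;
  let rtoMs = initialRtoMs;
  for (let i = 0; i < lostSyns; i++) {
    sendAtMs += rtoMs; // wait for the retransmission timer to fire
    rtoMs *= 2; // exponential backoff before the next retry
  }
  return sendAtMs + rttMs; // the SYN-ACK arrives one round trip after the last SYN
}

// Does the handshake finish before an attempt timeout cancels it?
function fitsInWindow(rttMs: number, lostSyns: number, windowMs: number): boolean {
  return handshakeDoneAtMs(rttMs, lostSyns) < windowMs;
}
```

With a 20ms round trip, `handshakeDoneAtMs(20, 0)` is 20ms but `handshakeDoneAtMs(20, 1)` is 1020ms, so one dropped SYN pushes even a fast connection far past a 250ms attempt timeout while staying well inside 5000ms.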
So when you see AggregateError ETIMEDOUT in your error logs, you are probably in an IPv4-only environment and the server you are trying to connect to has both IPv4 and IPv6 addresses. Thankfully, there are several workarounds:
- Start node with --no-network-family-autoselection. This disables the broken Happy Eyeballs implementation and goes back to trying the first returned address with a normal timeout.
- Start node with --network-family-autoselection-attempt-timeout 5000. This changes the 250ms timeout to 5000ms, giving the connection a much higher chance of succeeding before it's canceled.
- Start node with --dns-result-order=ipv6first. This changes the order of DNS results so that the IPv6 attempt comes first; after it fails, the IPv4 attempt uses a longer timeout because it's the last remaining candidate.
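If changing the command line is awkward, Node.js also exposes these defaults at runtime through net.setDefaultAutoSelectFamily, net.setDefaultAutoSelectFamilyAttemptTimeout and dns.setDefaultResultOrder. The TypeScript sketch below wraps them in a hypothetical applyWorkaround helper, assuming that, like the flags, the setting has to be applied before any connection (including fetch) is opened. Some of these settings, such as the "ipv6first" order, only exist in newer releases, so check the documentation for your exact Node.js version.

```typescript
import * as dns from "node:dns";
import * as net from "node:net";

// Hypothetical names for the three workarounds above. Each call changes a
// process-wide default, so run it once at startup, before opening connections.
type Workaround = "disable-autoselect" | "longer-attempt-timeout" | "ipv6-first";

function applyWorkaround(choice: Workaround): void {
  switch (choice) {
    case "disable-autoselect":
      // Runtime counterpart of --no-network-family-autoselection.
      net.setDefaultAutoSelectFamily(false);
      break;
    case "longer-attempt-timeout":
      // Runtime counterpart of --network-family-autoselection-attempt-timeout 5000.
      net.setDefaultAutoSelectFamilyAttemptTimeout(5000);
      break;
    case "ipv6-first":
      // Runtime counterpart of --dns-result-order=ipv6first (newer releases only).
      dns.setDefaultResultOrder("ipv6first");
      break;
  }
}
```

Calling `applyWorkaround("longer-attempt-timeout")` at the top of your entry point mirrors the second flag. For a single socket opened with net.connect, the autoSelectFamily and autoSelectFamilyAttemptTimeout options set the same behavior per connection instead of process-wide.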
For more technical information and discussion about this problem, please see this GitHub issue: https://github.com/nodejs/node/issues/54359