MSMQ and cached DNS

A couple of weeks ago, one of our hosting providers switched a number of our hosted servers from DHCP to use NAT internally, but kept the same external IP addresses. Evidently we had exhausted the IPv4 addresses internally, and a new server we were bringing online forced a change.

Our systems were minimally affected, and the only real change was in our DB connection strings. I deployed a change for all of our services to use machine names instead of IP addresses, so that the DB connection strings wouldn’t know the difference, and DNS would do the rest. Trying to resolve the external IP address internally would fail, so machine names and DNS should do what they’re supposed to do.

Our NServiceBus configuration was already like that, using the “queuename@machinename” notation. It turns out that when the internal IP addresses changed, our DB connections worked, but P2P MSMQ connections failed, causing thousands of messages to stack up in outgoing queues. The outgoing queue had a status of “Waiting to connect”, but reported no other errors. My only clue was that the “Next hop” showed the incorrect external IP address.
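To check what the sender itself resolves for a “queuename@machinename” address, and compare it with the “Next hop” shown on the outgoing queue, something like the following can help. This is a minimal sketch, not part of NServiceBus or MSMQ: `split_queue_address` and `resolve_ipv4` are hypothetical helper names, and the lookup only shows what the operating system's resolver returns, not whatever MSMQ has cached.

```python
import socket


def split_queue_address(address):
    """Split 'queuename@machinename' into (queue, machine)."""
    queue, sep, machine = address.partition("@")
    if not sep or not queue or not machine:
        raise ValueError(f"expected 'queuename@machinename', got {address!r}")
    return queue, machine


def resolve_ipv4(machine):
    """Return the sorted IPv4 addresses the OS resolver gives for a machine name."""
    infos = socket.getaddrinfo(
        machine, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If the address printed by `resolve_ipv4` differs from the “Next hop” on the outgoing queue, that points at a stale address somewhere other than DNS itself.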

The issue was that MSMQ actually caches IP addresses for resolved machine names, and it was caching the incorrect IP address…on the recipient machine. The recipient machine was rejecting connections from the sender, which could only be confirmed by observing raw TCP traffic using Wireshark. The recipient machine was listening for traffic on the incorrect IP address, which I also confirmed using “netstat -anb”.
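The netstat check can be scripted roughly like this. It is a sketch under a few assumptions: the text passed in is saved output from Windows “netstat -an” (or “-anb”), MSMQ is listening on its usual TCP port 1801, and `listening_addresses` is a hypothetical helper name.

```python
def listening_addresses(netstat_output, port=1801):
    """Return local IPs that have a TCP listener on `port` in netstat output."""
    addresses = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Windows TCP rows look like: TCP <local addr:port> <foreign addr:port> <state>
        if len(fields) < 4 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # rpartition keeps bracketed IPv6 hosts such as "[::]" intact.
        host, _, local_port = fields[1].rpartition(":")
        if local_port == str(port):
            addresses.append(host)
    return addresses
```

Comparing the result with the machine's current internal IP address shows whether the receiver is still bound to the old one.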

The resolution was to restart the MSMQ service on both the sender and the receiver machines. That is what actually flushed the cached DNS entries, and the messages went through (ipconfig /flushdns did nothing).
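If you want the restart scripted, here is one way it might look. It assumes the Message Queuing service is registered under the name “MSMQ”, that the script runs from an elevated prompt on each machine, and that stopping dependent services (the “/y” switch) is acceptable; `restart_commands` and `restart_service` are hypothetical names.

```python
import subprocess

MSMQ_SERVICE = "MSMQ"  # assumed service name for Message Queuing


def restart_commands(service=MSMQ_SERVICE):
    """Return the 'net' commands that stop and then start a Windows service."""
    # /y answers the prompt about stopping dependent services (assumption: that is OK here).
    return [["net", "stop", service, "/y"], ["net", "start", service]]


def restart_service(service=MSMQ_SERVICE, run=subprocess.run):
    """Run the stop/start commands in order, raising if either one fails."""
    for command in restart_commands(service):
        run(command, check=True)
```

The `run` parameter exists only so the commands can be inspected or dry-run without touching a real service. Run it on both the sender and the receiver.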

So, just an ops lesson for the MSMQ kiddies: if the internal IP addresses change on your servers, DNS alone is not enough to protect against connection failures.

About Jimmy Bogard

I'm a technical architect with Headspring in Austin, TX. I focus on DDD, distributed systems, and any other acronym-centric design/architecture/methodology. I created AutoMapper and am a co-author of the ASP.NET MVC in Action books.
This entry was posted in NServiceBus.
  • Anonymous

    I’ve been bitten by this DNS bug so many times it hurts. Now, anytime we make IP address or serious DNS changes, we reboot the whole effing cluster.

  • Scooletz

    Nasty thing. What have you done with messages? Resent them?

    • Anonymous

      No need to do anything, they were all just sitting in the outgoing queue on the sender machine, waiting to go out. It was the receiver that had the bad DNS juju. Weird.

  • John Breakwell

    Hi, some more background information if you need it.

    • Anonymous

      Awesome, thanks for the details!!