I was building a new report for SignalLeaf last weekend, to get the RSS subscriber count for a given podcast. Along the way, I did some research into the best way to track that information. It turned out to be mostly simple: track the IP address of every machine that requests the RSS feed for a given podcast, along with the user-agent for each (and a few other bits of info). Once you take out the known list of bots and crawlers, you can count the “distinct” combinations of IP address and user-agent to get a good idea of how many subscribers a podcast has.
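To make the counting idea concrete, here is a minimal sketch in JavaScript. The `countSubscribers` function, the shape of the `hits` records, and the tiny `BOT_PATTERN` list are all assumptions for illustration, not SignalLeaf's actual code; a real bot list would be far longer.

```javascript
// Assumed, deliberately tiny bot filter; a real list would be much longer.
const BOT_PATTERN = /bot|crawler|spider/i;

// Hypothetical helper: counts distinct IP + user-agent pairs among feed
// requests, skipping any request whose user-agent looks like a bot.
function countSubscribers(hits) {
  const seen = new Set();
  for (const hit of hits) {
    const userAgent = hit.userAgent || "";
    if (BOT_PATTERN.test(userAgent)) continue;
    // One key per distinct combination of IP address and user-agent.
    seen.add(hit.ip + "|" + userAgent);
  }
  return seen.size;
}
```

Repeated requests from the same IP address and user-agent collapse into one entry, which is why a changing IP address inflates the count.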
I was rocking the code, getting data tracked and analyzed, and I got the first version of the report published. So naturally, I went out to my own podcast that I have set up and started refreshing the RSS feed from my browser and from curl commands. I wanted to see what would happen in the staging and production environments, on Heroku.
To my horror, I found my RSS subscriber count increasing by 1 every time I refreshed the feed!
Digging Into The Issue
By adding a console.log on the IP address that I was tracking, I found out that I was getting a new IP address for every refresh, all within the 10.##.###.### range. I know this is a private IP range, used internally on networks, so this didn’t make sense to me at all.
A few quick searches later, I found some info explaining that Heroku uses an internal routing layer to forward the original request to whichever machine is running your process. The IP address that your process sees on the connection is not the original requesting IP address, but the address of Heroku’s internal router.
Getting The Real IP Address
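Fortunately, there is a way to get the real IP address of a client that is connected to your app. Heroku attaches an “x-forwarded-for” header to requests, and its value is a comma-separated list of IP addresses in a single string. All you need to do is split the raw string at “,” to get an array, trim any whitespace, and take the last item in the array. On Heroku, the last item is the one the router itself appended, so it is the address of the client that actually connected; earlier entries may have been supplied by the client or by other proxies and can’t be trusted in the same way.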
So I added code to my app to do exactly that.
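A minimal sketch of that approach, assuming a Node.js request object with a `headers` map and a `socket`, might look like the following. The `getClientIp` name and the fallback to the socket address are assumptions for illustration, not the exact code from the app.

```javascript
// Hypothetical helper: returns the client IP address for a Node.js request
// that arrived through Heroku's router.
function getClientIp(req) {
  const forwardedFor = req.headers["x-forwarded-for"];
  if (forwardedFor) {
    // The header is one comma-separated string, e.g. "1.2.3.4, 5.6.7.8".
    const addresses = String(forwardedFor)
      .split(",")
      .map((ip) => ip.trim())
      .filter(Boolean);
    // Take the last entry, as described above.
    if (addresses.length > 0) return addresses[addresses.length - 1];
  }
  // Assumed fallback for local development, where no router adds the header.
  return req.socket ? req.socket.remoteAddress : undefined;
}
```

In an Express-style handler this could be called as `const ip = getClientIp(req);` before recording the feed request.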
And now I have the real IP address to track RSS activity and produce the right report.
Artificial Bump In Subscribers
Of course, my RSS subscriber history has a large bump on the first day, from all my errant subscriber counts.
But the good news is that most people aren’t going to sit there refreshing their RSS feed every few seconds, dozens of times. So no one else really had the problem of an extra bump in subscribers that first day. And my count has now leveled out where it should be.
Hopefully I’ll be able to bump that number back up … legitimately … as I produce more episodes, though. :)