How to *easily* apt-get update “offline”

Building a Virtual Appliance is hard.

A virtual appliance is a pre-configured virtual machine image, ready to run on a hypervisor; virtual appliances are a subset of the broader class of software appliances. Installation of a software appliance on a virtual machine and packaging that into an image creates a virtual appliance. Like software appliances, virtual appliances are intended to eliminate the installation, configuration and maintenance costs associated with running complex stacks of software.
via Wikipedia

Put simply, a virtual appliance is a VM export (typically in OVF format, delivered as an .ova file) with your application pre-installed on it. Sometimes that's enough to sell your product or install it on-premises, but to truly deliver on the above promise of eliminating installation, configuration, and maintenance costs you need to provide a few basic things:

  1. Basic network configuration
  2. User interface to input configuration
  3. Upload endpoint for updates
  4. Disconnected Installation/Provisioning process (No internet!)

By far the most challenging part of building a virtual appliance is provisioning the machine while disconnected from the internet. It’s easy to take the ease and simplicity of apt-get install for granted, and as it turns out, “offline” apt support isn’t a trivial endeavour.

Like most things in Linux there are many, many ways to skin a cat, but I think I’ve found a strong solution.

First step: you need a clean Linux distro. Start with a machine image that is effectively a fresh install from an ISO.

Install apt-cacher-ng before you install anything else

apt-get install apt-cacher-ng

Next you’ll need to configure apt to use apt-cacher-ng as a proxy server.

echo 'Acquire::http { Proxy "http://localhost:3142"; };' > /etc/apt/apt.conf.d/02Proxy
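Before provisioning anything, it’s worth a quick sanity check that apt is actually going through the cache. Two things you can look at (the report URL is apt-cacher-ng’s built-in status page on its default port):

# apt should now report the proxy setting
apt-config dump | grep -i proxy

# apt-cacher-ng serves a status/report page on port 3142
curl -s http://localhost:3142/acng-report.html | head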

Now you can provision your machine as you would normally; personally, we use Puppet.

puppet apply
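For what it’s worth, puppet apply takes a manifest as an argument; here’s a sketch of the full invocation, with manifests/site.pp and modules/ as assumed paths you’d swap for your own layout:

puppet apply --modulepath=./modules manifests/site.pp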

Next you’ll want to package up the apt-cacher-ng cache folder that’s located at /var/cache/apt-cacher-ng. For HuBoard:Enterprise we use debian files as our “file delivery” mechanism. We have a simple rake script that uses fpm to build a debian package containing the cache folder, but a tarball will work just as well.

task :package do
  package_name = "apt-cacher-offline"
  description = "Offline package of the apt-cacher-ng cache folder"
  maintainer = "<techops@huboard.com>"
  url = "https://huboard.com"

  # version every build so dpkg treats each cache snapshot as an upgrade
  build_number = ENV["BUILD_NUMBER"] || "0"
  version = "1.0.#{build_number}"

  cache_dir = File.join("", "var", "cache", "apt-cacher-ng")

  opts = [
    "-f",                           # overwrite an existing package file
    "-t deb",                       # output type: Debian package
    "-s dir",                       # input type: a directory on disk
    "-a all",                       # architecture-independent payload
    "-n '#{package_name}'",
    "--description '#{description}'",
    "-m '#{maintainer}'",
    "--url '#{url}'",
    "-v '#{version}'",
    "-d 'apt-cacher-ng'"            # depend on apt-cacher-ng being installed
  ].join(" ")

  sh %{ fpm #{opts} #{cache_dir} }
end
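With BUILD_NUMBER wired up to your CI (an assumption of the script above; it falls back to 0), each run produces a new versioned snapshot of the cache in the current directory:

BUILD_NUMBER=1 rake package
# produces apt-cacher-offline_1.0.1_all.deb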

OK, so now you have a baseline: you’ve effectively built a 1.0 version of every Debian package installed on your image. It’s important to understand the distinction between your “offline” image and your “online” image. The offline image is the Virtual Appliance you have given to your customer; the online image is your snapshot of the machine before you exported it to the OVF. Here is where it can get a little tricky. Let’s imagine that you’ve released 1.0 of your amazing product… Let’s call it HuBoard:Enterprise ;)

You shipped your customers the MVP to get your application installed and updating, but you’re getting flooded with support requests: your customers would like to monitor the resources your Appliance is using. So you’re like, “I know, I’ll use collectd.” Here is how you would do it.

First step, on your online machine, install collectd like you normally would.

apt-get install collectd

Next, package up the apt-cacher-ng folder.

rake package

Now upload the cached folder to your offline machine and install it right over top of the baseline cache folder. If you’re using Debian packages like we are…

dpkg -i apt-cacher-offline_1.0.1_all.deb
chown -R apt-cacher-ng:apt-cacher-ng /var/cache/apt-cacher-ng
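If the install and chown worked, the cache folder should now contain one directory per upstream repository, and restarting the daemon after swapping its cache out from under it is a cheap precaution (service name assumed to match the trusty-era package):

ls /var/cache/apt-cacher-ng
service apt-cacher-ng restart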

Now, what’s really cool about this approach is that apt-cacher-ng builds legit, bona fide apt repositories, so you can point your sources.list directly at your /var/cache folder.

Here’s an example

deb file:///var/cache/apt-cacher-ng/uburep trusty main restricted universe multiverse
deb file:///var/cache/apt-cacher-ng/uburep trusty-backports main restricted universe multiverse
deb file:///var/cache/apt-cacher-ng/uburep trusty-updates main restricted universe multiverse

deb file:///var/cache/apt-cacher-ng/security.ubuntu.com/ubuntu trusty-security main restricted universe multiverse

deb file:///var/cache/apt-cacher-ng/apt.puppetlabs.com trusty main dependencies
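The uburep directory isn’t arbitrary: apt-cacher-ng merges the official Ubuntu mirrors into a single cache directory through a remap rule in its configuration. The stock acng.conf ships with a line along these lines (exact contents vary by version, so treat this as illustrative rather than something to copy):

# from a stock /etc/apt-cacher-ng/acng.conf (illustrative)
Remap-uburep: file:backends_ubuntu /ubuntu # Ubuntu Archives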

OK, now that you’ve pointed your offline machine at your local file system, you can install the missing packages extremely easily.

apt-get update
apt-get install collectd
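You can double-check that apt is resolving the package from the local file:// repositories rather than trying to reach the network; apt-cache policy shows where each candidate version comes from:

apt-cache policy collectd
# the candidate's origin should be a file:/var/cache/apt-cacher-ng/... line, not http://...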

Here’s a link to the GitHub gist if you prefer to read the post in gist form.

  • Ryan

About Ryan Rauh

My name is Ryan Rauh and I like shiny things. If I've seen it on twitter more than twice, most likely I've looked at it or tried it. I really like memes and other silly and seemingly useless things out on the internet. I blog about things *I* think are cool and interesting, and I hope you will find them cool and interesting too.
  • Robert Kozak

    I’d really like to see a technique that does this for a windows server

  • stevejansen

    Hi Ryan!

    Just wanted to say I’ve been through a few upgrades with our test of HuBoard:Enterprise, and the upgrade cycle has been really smooth. So whatever you’re doing seems to just work.

    Out of sheer curiosity, why is offline support important? Do you have customers running HuBoard on networks without public Internet access? Or is offline support more about predictability and flexibility during upgrades?

    Cheers,
    Steve

    • rauhryan

      Thanks Steve,

      > Out of sheer curiosity, why is offline support important?

The main driving force for offline support is so that we can install and update smoothly on networks that require proxy server credentials to reach out to the public network. It’s much easier to deliver everything that needs to be downloaded from the internet in our package file than it is to configure several internal package management systems with proxy credentials (gems, npm, apt, etc.)

      An added benefit to architecting it this way is that we now have a guaranteed consistent environment across all of our customers because we have full control over every single debian package installed on the machine.

