Creating screenshots with a nanoc command

I just created a cool nanoc command that I would like to share with the world. It goes to a URL, fetches all the links, does some simple sorting on these links, and then generates screenshots, with YAML metadata, for use in a portfolio site built with nanoc.

The script uses selenium-webdriver to fire up a Chrome or Firefox session and click and shoot until it runs out of links or reaches the limit you give on the command line. It saves the files in a folder of your choosing.

I made this because I got tired of having to take screenshots of all the sites I worked on. “It can be done automatically”, the geek in me said!

First, add the selenium-webdriver gem to your project. You should use a Gemfile to manage the gems in your project.

Make a Gemfile in the root of your project if you haven’t already:

touch Gemfile

Add the nanoc gem and the selenium-webdriver gem:

gem 'nanoc'
gem 'selenium-webdriver'

To let selenium-webdriver control your browser, you should also install chromedriver if you’re using Chrome. Firefox will run without any modification. I am on a Mac, by the way, and I won’t try to be clever about how this works on a PC.

brew install chromedriver

Create the commands folder in your project root, and drop this code in commands/make_screenshots.rb:

require 'selenium-webdriver'
require 'fileutils'
require 'uri'

usage       'make_screenshots url destination count'
summary     'Generates screenshots'
description 'Generates screenshots from urls specified in the items'

flag   :h, :help,  'show help for this command' do |value, cmd|
  puts cmd.help
  exit 0
end

@already_visited = []

def invalid_link?(url)
  url.nil? || url =~ /^#|^mailto:|javascript:/
end

def inside_domain?(url)
  url.hostname.nil? || url.hostname == @root_url.hostname
end

def create_screenshot(url)

  # compare urls as strings: the homepage arrives as a String, the links as URI objects
  url = url.to_s

  # don't do doubles
  return if @already_visited.include?(url)

  # create a counter
  @counter ||= 0
  @counter += 1

  # filenames
  png_filename = "#{@destination}/screenshot-#{@counter}.png"
  yaml_filename = png_filename.chomp(File.extname(png_filename)) + ".yaml"  

  puts "generating thumbnail for #{url} in #{png_filename}"

  # go to the desired url
  @driver.get url

  # grab a screen shot
  @driver.save_screenshot(png_filename)

  # set a title
  title = "Screenshot #{@counter}"

  # create yaml metadata
  puts "creating yaml file #{yaml_filename}"
  File.open(yaml_filename, "w") do |file|
    file.puts "---"
    file.puts "title: #{title}"
    file.puts "url: #{url}"
    file.puts "position: #{@counter}"
    file.puts "created_at: #{Time.now}"
    file.puts "---"
  end

  # add the url to visited urls
  @already_visited << url
end 

run do |opts, args, cmd|

  # get the args
  @url, @destination, @max_items = args

  # check for args
  unless @url && @destination && @max_items
    puts "Please fill in all the arguments"
    puts cmd.usage
    # missing arguments are an error, so exit with a non-zero status
    exit 1
  end

  @max_items = @max_items.to_i

  # create the destination dir
  FileUtils.mkdir_p(@destination)

  # fire up the fox!!
  @driver = Selenium::WebDriver.for :firefox # or use :chrome

  # parse the root url, used to check that links stay inside this domain
  @root_url = URI::parse(@url)

  # create screen shot of homepage
  create_screenshot(@url)

  # get all the href of anchors in the homepage
  hrefs = @driver.find_elements(:tag_name => "a").map{|a| a.attribute("href")}.uniq

  # reject invalid links
  hrefs.reject!{|h| invalid_link?(h)}

  # parse all the urls and reject failure
  urls = hrefs.map{|h| URI::parse(h) rescue nil }.compact

  # only urls inside this domain
  # (reject! returns nil when nothing is rejected, so use select and keep the result)
  urls = urls.select{|u| inside_domain?(u) }.uniq

  # higher priority to top-level links
  urls.sort_by!{|u| u.path.split("/").count }

  # make the screenshots
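  urls.each do |url|

    # stop once the requested number of screenshots is reached (the homepage counts as one)
    break if @counter >= @max_items

    # remove fragment and query portion of the url
    url.fragment = nil
    url.query = nil
    create_screenshot(url)
  end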

  # quit the browser
  @driver.quit
end
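The link handling is the part that is easiest to try out without a browser. Here is a minimal sketch of the filter-and-sort steps using only Ruby's standard library. The helper name prioritize_links and the example.com URLs are made up for illustration and are not part of the command above.

```ruby
require 'uri'

# Hypothetical helper (not part of the command above): given the root url and
# a list of raw href strings, return the same-host urls ordered by path depth.
def prioritize_links(root, hrefs)
  root_uri = URI.parse(root)

  # same checks as invalid_link? in the command
  valid = hrefs.reject { |h| h.nil? || h =~ /^#|^mailto:|javascript:/ }

  # hrefs that cannot be parsed become nil and are dropped
  urls = valid.map { |h| URI.parse(h) rescue nil }.compact

  # keep relative links and links on the same host as the root url
  urls = urls.select { |u| u.hostname.nil? || u.hostname == root_uri.hostname }

  # drop fragment and query so variants of one page collapse into one entry
  urls.each do |u|
    u.fragment = nil
    u.query = nil
  end

  # shallow paths first, so top-level pages get priority
  urls.map(&:to_s).uniq.sort_by { |u| URI.parse(u).path.split('/').count }
end

links = prioritize_links('http://example.com/', [
  'http://example.com/blog/2013/post',
  'http://example.com/about',
  'http://example.com/about?ref=nav',
  'mailto:me@example.com',
  'http://other.example.org/',
  nil
])
puts links
```

Unlike the command, this sketch strips the query and fragment before removing duplicates, so /about and /about?ref=nav count as one page.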

You can now run the command from the command line. It takes three arguments (separated by spaces): the url, the destination folder for the images, and the maximum number of screenshots (max_items) you want:

nanoc make_screenshots http://icanhas.cheezburger.com/ content/test 10
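The command converts the count with to_i, which quietly turns something like "ten" into 0. If you want stricter checking, a sketch along these lines might help. The helper name parse_screenshot_args is hypothetical and the code uses only the standard library, so treat it as a starting point rather than part of the command.

```ruby
require 'uri'

# Hypothetical helper, not part of the command above: turns the raw argument
# list into [URI, destination, max_items] or raises ArgumentError.
def parse_screenshot_args(args)
  url, destination, count = args
  unless url && destination && count
    raise ArgumentError, 'usage: make_screenshots url destination count'
  end

  # treat unparsable urls the same way as non-http urls
  uri = begin
    URI.parse(url)
  rescue URI::InvalidURIError
    nil
  end
  raise ArgumentError, "not an http(s) url: #{url}" unless uri.is_a?(URI::HTTP)

  # Integer() raises on input like "ten", where to_i would silently return 0
  max_items = Integer(count, 10)
  raise ArgumentError, 'count must be at least 1' if max_items < 1

  [uri, destination, max_items]
end

uri, destination, max_items = parse_screenshot_args(
  ['http://icanhas.cheezburger.com/', 'content/test', '10']
)
puts "#{uri.host} -> #{destination} (#{max_items} screenshots)"
```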

This should put 10 screenshots of funny cats in the folder content/test. It also creates a YAML metadata file next to each image, so you can attach metadata to the images.

The files are named:

  • screenshot-1.png
  • screenshot-1.yaml
  • screenshot-2.png
  • screenshot-2.yaml
  • … etc.
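If you want to check the generated metadata outside of nanoc, a small sketch like the one below could read the files back in order. The helper name load_screenshot_metadata is made up, the sample files are written to a temporary folder only for the demo, and the permitted_classes keyword assumes Ruby 2.6 or newer.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: load every screenshot-N.yaml in a folder and return the
# metadata hashes ordered by their "position" field.
def load_screenshot_metadata(dir)
  entries = Dir.glob(File.join(dir, 'screenshot-*.yaml')).map do |path|
    # safe_load reads the first YAML document, the part between the "---" lines;
    # Time is permitted because created_at may be parsed as a timestamp
    meta = YAML.safe_load(File.read(path), permitted_classes: [Time])
    meta.merge('image' => path.sub(/\.yaml\z/, '.png'))
  end
  entries.sort_by { |meta| meta['position'] }
end

# demo: write two sample metadata files in the same shape the command produces
Dir.mktmpdir do |dir|
  [2, 1].each do |n|
    File.write(File.join(dir, "screenshot-#{n}.yaml"), <<~META)
      ---
      title: Screenshot #{n}
      url: http://example.com/page-#{n}
      position: #{n}
      created_at: 2013-05-01 12:00:00 +0200
      ---
    META
  end

  load_screenshot_metadata(dir).each do |meta|
    puts "#{meta['position']}: #{meta['title']} (#{File.basename(meta['image'])})"
  end
end
```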

Have fun using this script. I hope you find it useful.
