Hpricot is a great gem for parsing web pages, and combined with the automatic navigation capabilities provided by WWW::Mechanize, it really becomes easy to create a robot to scrape web sites.
One problem, mentioned in this blog post, is that an ever-increasing number of ASP.NET web sites put huge amounts of data into a single HTML attribute, more than Hpricot's default buffer can hold.
Instead of using the methods provided by Hpricot and WWW::Mechanize to work around this issue (as described in the blog post), I used the following monkey patch, which raises Hpricot's buffer size to 262144 bytes (256 KB):
    require 'hpricot'

    module WWW
      class Mechanize
        Hpricot.buffer_size = 262144 # added by naofumi
      end
    end
You can put it in an initializer if you are working in Rails.
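For what it's worth, here is a rough sketch of what such an initializer might look like. The file name and the constant HPRICOT_BUFFER_BYTES are my own inventions for illustration, and the guards are only there so the file loads quietly when the gem is missing. Since the assignment in the patch runs as soon as the class body is evaluated, setting Hpricot.buffer_size directly should have the same effect, assuming your Hpricot version exposes that setter as the patch relies on.

```ruby
# config/initializers/hpricot_buffer.rb  (hypothetical file name)

begin
  require 'hpricot'
rescue LoadError
  # Hpricot is not installed; nothing to configure.
end

# 256 KB, the same value as in the monkey patch (name is illustrative).
HPRICOT_BUFFER_BYTES = 256 * 1024

# Only touch the setting when the gem is loaded and offers the setter.
if defined?(Hpricot) && Hpricot.respond_to?(:buffer_size=)
  Hpricot.buffer_size = HPRICOT_BUFFER_BYTES
end
```

If 256 KB still turns out to be too small for a particular page, the constant is the only thing that needs to change.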