Hpricot is a great gem for parsing web pages, and combined with the automatic navigation capabilities of WWW::Mechanize, it makes writing a robot that scrapes web sites really easy.
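One problem, mentioned in this blog post, is that an ever-increasing number of ASP.NET web sites store huge amounts of data in a single HTML attribute (such as the value of the hidden __VIEWSTATE field that holds ASP.NET view state). These values can be larger than Hpricot's parse buffer.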
Instead of working around this issue with the methods that Hpricot and WWW::Mechanize provide (as described in the blog post), I used the following monkey patch, which raises Hpricot's buffer size to 256 KB (262144 bytes):
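    require 'hpricot'
    require 'mechanize'

    module WWW
      class Mechanize
        # Enlarge Hpricot's parse buffer so that huge attribute values
        # (e.g. ASP.NET view state) no longer overflow it.
        Hpricot.buffer_size = 262144 # added by naofumi
      end
    end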
If you are working in Rails, you can put it in an initializer.
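Since `Hpricot.buffer_size` is a module-level setting, the `WWW::Mechanize` wrapper above only serves to group the patch with the scraper; the assignment itself is what matters. A minimal Rails initializer might therefore look like the sketch below. The file name `config/initializers/hpricot.rb` is a hypothetical choice: any file in `config/initializers` is loaded when the application boots.

```ruby
# config/initializers/hpricot.rb (hypothetical file name; any file in
# config/initializers is loaded when the Rails app boots)
require 'hpricot'

# Raise Hpricot's parse buffer to 256 KB (262144 bytes) so that pages with
# very large attribute values, such as ASP.NET view state, can be parsed.
Hpricot.buffer_size = 262144
```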