ruby - Nokogiri can't get og:image on some sites
I'm using Nokogiri to parse HTML and get the og:image value:
    def get_og_image(url)
      html = open(url, "r:binary").read
      doc = Nokogiri::HTML(html.toutf8, nil, 'utf-8')
      if doc.css("meta[property='og:image']").present?
        img_path = doc.css("meta[property='og:image']").first.attributes["content"].value
      end
      img_path
    end
Now this works for some sites but not for others:
    > get_og_image "http://techcrunch.com/2014/08/05/the-hug-a-water-bottle-sensor-and-app-helps-you-stay-hydrated/"
    => "http://tctechcrunch2011.files.wordpress.com/2014/08/the-hug_office.jpg?w=680"
    > get_og_image "http://www.yahoo.co.jp/"
    => nil
However, yahoo.co.jp does have an og:image value:
<meta property="og:image" content="http://k.yimg.jp/images/top/ogp/fb_y_1500px.png">
How can I get the right og:image with Nokogiri?
The problem was with the HTML response from "http://www.yahoo.co.jp/", which changes depending on the User-Agent.
Once I set a dummy User-Agent when accessing the URL, Nokogiri can get og:image.
ruby nokogiri