Python - How to scrape product details on an Amazon webpage using BeautifulSoup
For the webpage http://www.amazon.com/harry-potter-prisoner-azkaban-rowling/dp/0439136369/ref=pd_sim_b_2?ie=utf8&refrid=1mfbraecgpmvzc5mjcwg, how can I scrape the product details and output them as a dict in Python? In the above case, the dict output I want would be:
Age Range: 9 - 12 years
Grade Level: 4 - 7
... ...
I'm new to BeautifulSoup and didn't find an example that does this. I would like an example to follow.
The idea is to iterate over the product details items with the help of the table#productDetailsTable div.content ul li CSS selector, and to use the bold text as the key and its next sibling as the value:
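from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = 'http://www.amazon.com/dp/0439136369'
# Send a browser-like User-Agent header with the request
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36'})

# Naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(response.content, 'html.parser')
tags = {}
for li in soup.select('table#productDetailsTable div.content ul li'):
    try:
        title = li.b  # the bold label, e.g. "Age Range:"
        key = title.text.strip().rstrip(':')
        value = title.next_sibling.strip()  # text node right after the label
        tags[key] = value
    except AttributeError:
        # li.b is None: this item has no bold label, so stop here
        break

pprint(tags)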
Prints:
{u'Age Range': u'9 - 12 years',
 u'Amazon Best Sellers Rank': u'#1,440 in Books (',
 u'Average Customer Review': u'',
 u'Grade Level': u'4 - 7',
 u'ISBN-10': u'0439136369',
 u'ISBN-13': u'978-0439136365',
 u'Language': u'English',
 u'Lexile Measure': u'880L',
 u'Mass Market Paperback': u'448 pages',
 u'Product Dimensions': u'1.2 x 5.2 x 7.8 inches',
 u'Publisher': u'Scholastic Paperbacks (September 11, 2001)',
 u'Series': u'Harry Potter (Book 3)',
 u'Shipping Weight': u'11.2 ounces ('}
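Two of the values above end with a dangling " (". This is presumably because next_sibling returns only the text node that directly follows the bold label, and that node stops where the next tag (such as a link) begins. A minimal sketch of one way to tidy such values afterwards; the helper name tidy_value and the sample dict are illustrative and not part of the original answer:

```python
def tidy_value(value):
    """Strip trailing spaces and unmatched opening parentheses."""
    return value.rstrip(" (")


# Sample values shaped like the scraped output (illustrative only)
raw = {
    "Amazon Best Sellers Rank": "#1,440 in Books (",
    "Shipping Weight": "11.2 ounces (",
    "Publisher": "Scholastic Paperbacks (September 11, 2001)",
}

cleaned = {key: tidy_value(value) for key, value in raw.items()}
print(cleaned)
```

Values that end with a closing parenthesis, such as the publisher entry, are left unchanged because rstrip only removes trailing spaces and "(" characters.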
Note that we break out of the loop as soon as we hit an AttributeError. This happens when there is no more bold text inside an li element.
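To try the key/value logic without requesting the live page, the same selector can be run against a small inline HTML string. The sketch below is a variation rather than the answer's exact method: it assumes hypothetical sample markup that only imitates the structure described above, and it skips items without a bold label (continue) instead of stopping at the first one (break). The function name extract_details is also illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described above;
# it is not copied from the real page.
SAMPLE_HTML = """
<table id="productDetailsTable"><tr><td>
<div class="content"><ul>
<li><b>Age Range:</b> 9 - 12 years</li>
<li><b>Grade Level:</b> 4 - 7</li>
<li>No bold label in this item</li>
<li><b>Language:</b> English</li>
</ul></div>
</td></tr></table>
"""


def extract_details(html):
    """Map each bold label to the text node that directly follows it."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("table#productDetailsTable div.content ul li"):
        title = li.b
        if title is None:
            # No bold label: skip this item instead of ending the loop
            continue
        key = title.get_text().strip().rstrip(":")
        sibling = title.next_sibling
        # next_sibling may be None or a tag; only plain text is used
        value = sibling.strip() if isinstance(sibling, str) else ""
        details[key] = value
    return details


details = extract_details(SAMPLE_HTML)
print(details)
```

With break, the "Language" entry in this sample would be lost because it comes after the item that has no bold label; with continue it is kept. Which behaviour is preferable depends on the layout of the real page.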