Python - How to scrape product details on an Amazon webpage using BeautifulSoup


For the webpage http://www.amazon.com/harry-potter-prisoner-azkaban-rowling/dp/0439136369/ref=pd_sim_b_2?ie=utf8&refrid=1mfbraecgpmvzc5mjcwg, how do I scrape the product details and output them as a dict in Python? In the case above, the dict output I want would be:

    Age Range: 9 - 12 years
    Grade Level: 4 - 7
    ...
    ...

I'm new to BeautifulSoup and didn't find an example that does this. I would like an example to follow.

The idea is to iterate over the product details items with the help of the table#productDetailsTable div.content ul li CSS selector, using the bold text as the key and its next sibling as the value:

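    from pprint import pprint

    from bs4 import BeautifulSoup
    import requests

    url = 'http://www.amazon.com/dp/0439136369'
    # Send a browser-like User-Agent header with the request.
    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36'})

    # The parser is named explicitly to avoid bs4's "no parser specified" warning.
    soup = BeautifulSoup(response.content, 'html.parser')
    tags = {}
    for li in soup.select('table#productDetailsTable div.content ul li'):
        try:
            title = li.b  # bold label, e.g. "Age Range:"
            key = title.text.strip().rstrip(':')
            value = title.next_sibling.strip()  # text node right after the label
            tags[key] = value
        except AttributeError:
            # No bold label in this li: stop iterating.
            break

    pprint(tags)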

Prints:

    {u'Age Range': u'9 - 12 years',
     u'Amazon Best Sellers Rank': u'#1,440 in Books (',
     u'Average Customer Review': u'',
     u'Grade Level': u'4 - 7',
     u'ISBN-10': u'0439136369',
     u'ISBN-13': u'978-0439136365',
     u'Language': u'English',
     u'Lexile Measure': u'880L',
     u'Mass Market Paperback': u'448 pages',
     u'Product Dimensions': u'1.2 x 5.2 x 7.8 inches',
     u'Publisher': u'Scholastic Paperbacks (September 11, 2001)',
     u'Series': u'Harry Potter (Book 3)',
     u'Shipping Weight': u'11.2 ounces ('}
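
Two of the values above end in a dangling " (" and one is empty. That is because next_sibling returns only the text node directly after the bold label, so anything wrapped in nested tags is left out. A minimal cleanup sketch, assuming you just want to drop the dangling parenthesis and the empty entries (the clean_details helper and the sample dict are hypothetical, not part of the answer's code):

```python
def clean_details(details):
    """Return a copy with dangling ' (' removed and empty values dropped."""
    cleaned = {}
    for key, value in details.items():
        # rstrip(" (") removes any trailing spaces and '(' characters.
        value = value.rstrip(" (")
        if value:
            cleaned[key] = value
    return cleaned


# A few entries shaped like the scraped output.
sample = {
    "Amazon Best Sellers Rank": "#1,440 in Books (",
    "Average Customer Review": "",
    "Grade Level": "4 - 7",
    "Shipping Weight": "11.2 ounces (",
}
cleaned = clean_details(sample)
print(cleaned)
```

Values that end in a closing parenthesis, such as the publisher entry, are left untouched.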

Note that the loop breaks as soon as it hits an AttributeError. This happens when there is no bold text inside an li element.
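
The stopping behaviour can be tried offline. The sketch below runs the same selector against a small hand-written fragment (hypothetical markup that only mimics the expected structure, not Amazon's real page) and compares stopping at the first item without a bold label with skipping it. The extract_details function and its stop_at_first_miss flag are illustrative names, and the sketch checks for None explicitly instead of catching AttributeError:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the structure the selector expects.
SAMPLE_HTML = """
<table id="productDetailsTable"><tr><td><div class="content"><ul>
  <li><b>Age Range:</b> 9 - 12 years</li>
  <li><b>Grade Level:</b> 4 - 7</li>
  <li>no bold label here</li>
  <li><b>Language:</b> English</li>
</ul></div></td></tr></table>
"""


def extract_details(html, stop_at_first_miss=True):
    """Map each bold label to the text node that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("table#productDetailsTable div.content ul li"):
        title = li.b
        if title is None:
            # No bold label: either stop (like the break above) or skip.
            if stop_at_first_miss:
                break
            continue
        key = title.get_text().strip().rstrip(":")
        sibling = title.next_sibling
        # The sibling may be missing or a tag rather than plain text.
        value = sibling.strip() if isinstance(sibling, str) else ""
        details[key] = value
    return details


stopped = extract_details(SAMPLE_HTML)
skipped = extract_details(SAMPLE_HTML, stop_at_first_miss=False)
print(stopped)
print(skipped)
```

With this fragment, stopping early loses the Language entry that comes after the unlabeled item, while skipping keeps it. Which behaviour is preferable depends on how the real page orders its items.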

python web-scraping beautifulsoup scrape
