python - How to scrape product details on an Amazon webpage using BeautifulSoup

For the webpage http://www.amazon.com/harry-potter-prisoner-azkaban-rowling/dp/0439136369/ref=pd_sim_b_2?ie=utf8&refrid=1mfbraecgpmvzc5mjcwg, how can I scrape the product details and output them as a dict in Python? In the above case, the dict output I want would be:

Age Range: 9 - 12 years
Grade Level: 4 - 7
...
...

I'm new to BeautifulSoup and didn't find an example that does this. I would like an example to follow.

The idea is to iterate over the product details items with the help of the table#productDetailsTable div.content ul li CSS selector, using the bold text as the key and its next sibling as the value:

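from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = 'http://www.amazon.com/dp/0439136369'
# Send a browser-like User-Agent header with the request
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36'})

# Naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(response.content, 'html.parser')
tags = {}
for li in soup.select('table#productDetailsTable div.content ul li'):
    try:
        title = li.b  # the bold label, e.g. "Age Range:"
        key = title.text.strip().rstrip(':')
        value = title.next_sibling.strip()  # the text right after the label

        tags[key] = value
    except AttributeError:
        # li.b is None when the item has no bold label
        break

pprint(tags)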

prints:

{
    u'Age Range': u'9 - 12 years',
    u'Amazon Best Sellers Rank': u'#1,440 in Books (',
    u'Average Customer Review': u'',
    u'Grade Level': u'4 - 7',
    u'ISBN-10': u'0439136369',
    u'ISBN-13': u'978-0439136365',
    u'Language': u'English',
    u'Lexile Measure': u'880L',
    u'Mass Market Paperback': u'448 pages',
    u'Product Dimensions': u'1.2 x 5.2 x 7.8 inches',
    u'Publisher': u'Scholastic Paperbacks (September 11, 2001)',
    u'Series': u'Harry Potter (Book 3)',
    u'Shipping Weight': u'11.2 ounces ('
}

Note that we break out of the loop once we hit an AttributeError. This happens when there is no more bold text inside an li element.
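
The effect of breaking on AttributeError can be tried offline on a small static snippet. The following is only a sketch: SAMPLE_HTML is made-up markup that imitates the structure the selector expects (it is not copied from the real Amazon page), and parse_details with its stop_on_error flag is a hypothetical helper added for illustration. With the flag off, an item without a bold label is skipped instead of ending the loop.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the selector expects;
# it is not taken from the real Amazon page.
SAMPLE_HTML = """
<table id="productDetailsTable"><tr><td>
  <div class="content"><ul>
    <li><b>Age Range:</b> 9 - 12 years</li>
    <li><b>Grade Level:</b> 4 - 7</li>
    <li>No bold label in this item</li>
    <li><b>Language:</b> English</li>
  </ul></div>
</td></tr></table>
"""


def parse_details(html, stop_on_error=True):
    """Collect 'bold label -> following text' pairs from the details list."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for li in soup.select("table#productDetailsTable div.content ul li"):
        try:
            title = li.b  # None when the item has no <b> tag
            key = title.text.strip().rstrip(":")
            value = title.next_sibling.strip()
        except AttributeError:
            if stop_on_error:
                break      # same behaviour as the answer's loop
            continue       # alternative: skip this item and keep going
        details[key] = value
    return details


stopped = parse_details(SAMPLE_HTML)
skipped = parse_details(SAMPLE_HTML, stop_on_error=False)
print(stopped)
print(skipped)
```

Here the third item has no bold label, so the default call stops before reaching Language, while stop_on_error=False picks it up as well. Which behaviour is preferable depends on whether label-less items can appear in the middle of the real list.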

python web-scraping beautifulsoup scrape
