Scraping this URL, R XML and getting siblings




Hi: I want to scrape the table "Federal electoral districts – Representation order of 2003", specifically the subtable "Ontario". The URL is here: http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list

I've tried this code, which gets me close, but not quite there.

    library(XML)
    doc <- htmlParse('http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list', useInternalNodes = TRUE)
    doc2 <- getNodeSet(doc, "//table/caption[text()='Ontario']")

I know I can use readHTMLTable and then find the particular table, but I want to know how to select the sibling nodes of the caption node whose text equals "Ontario". Thanks.

You can use following-sibling in XPath:

    library(XML)
    appURL <- 'http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list'
    doc <- htmlParse(appURL, encoding = "UTF-8")
    # Select the table whose caption is "Ontario" among the siblings after #list
    tableNode <- doc["//*[@id='list']/following-sibling::table/caption[text()='Ontario']/.."][[1]]
    myTable <- readHTMLTable(tableNode)

    > head(myTable)
       Code          Federal Electoral Districts Population 2006
    1 35001                       Ajax–Pickering         117,183
    2 35002        Algoma–Manitoulin–Kapuskasing          77,961
    3 35003 Ancaster–Dundas–Flamborough–Westdale         111,844
    4 35004                               Barrie         128,430
    5 35005                    Beaches–East York         104,831
    6 35006                 Bramalea–Gore–Malton         152,698

So let's break down the XPath. The heading "Federal electoral districts – Representation order of 2003" has id="list". IDs in HTML are unique, so we can filter on this.

    //*[@id='list']               find the node with id equal to "list"
    /following-sibling::table     take the sibling nodes that follow it and are tables
    /caption[text()='Ontario']    select the captions whose text equals "Ontario"
    /..                           go up one node, from the caption to its parent table

This gives the required table nodes as a list. Only one node satisfies the above requirements. That node can then be processed with readHTMLTable.
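
The same steps can be tried offline. What follows is only a minimal sketch: the HTML string is a made-up miniature page (not the real elections.ca markup), and the names html, tables, ontario, rows and districts are chosen here for illustration. It runs the XPath one step at a time, and then shows one way to reach the sibling nodes of the caption directly, which is what the question asked for.

```r
library(XML)

# Hypothetical miniature page: a heading with id="list" followed by
# sibling tables, each carrying a caption. Not the real site markup.
html <- '<html><body>
  <h2 id="list">Federal electoral districts</h2>
  <table>
    <caption>Quebec</caption>
    <tr><th>Code</th><th>District</th></tr>
    <tr><td>24001</td><td>Abitibi</td></tr>
  </table>
  <table>
    <caption>Ontario</caption>
    <tr><th>Code</th><th>District</th></tr>
    <tr><td>35001</td><td>Ajax-Pickering</td></tr>
    <tr><td>35004</td><td>Barrie</td></tr>
  </table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Steps 1 and 2: all tables that follow the node with id "list"
tables <- getNodeSet(doc, "//*[@id='list']/following-sibling::table")

# Steps 3 and 4: keep the caption equal to "Ontario", then step up
# to its parent table with "/.."
ontario <- getNodeSet(
  doc,
  "//*[@id='list']/following-sibling::table/caption[text()='Ontario']/.."
)
ontarioTable <- readHTMLTable(ontario[[1]], stringsAsFactors = FALSE)

# Alternative: stay at the caption and take its following siblings,
# here the <tr> rows of the same table
rows <- getNodeSet(
  doc,
  "//table/caption[text()='Ontario']/following-sibling::tr"
)

# Pull the text of the second cell from each data row
districts <- xpathSApply(
  doc,
  "//table/caption[text()='Ontario']/following-sibling::tr/td[2]",
  xmlValue
)
```

Going up to the parent table is convenient when the whole table should go to readHTMLTable; the following-sibling::tr form is handy when only some rows or cells are needed.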

xml r xpath web-scraping
