Scraping this URL with R XML and getting siblings
Hi, I want to scrape the table "Federal Electoral Districts – Representation Order of 2003", specifically the subtable "Ontario". The URL is here: http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list
I've tried this code, which gets me close, but not exactly there:
    doc <- htmlParse('http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list', useInternalNodes = TRUE)
    doc2 <- getNodeSet(doc, "//table/caption[text()='Ontario']")

I know I could use readHTMLTable to find that particular table, but I want to know how to select the sibling nodes of the caption node whose text equals "Ontario". Thanks.
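For the literal question of getting at the nodes around the caption, here is a small sketch with two options, run against a made-up HTML string rather than the live page (the markup, codes and district names are placeholders). It assumes the XML package is installed and relies on libxml2's HTML parser, which does not insert a <tbody> the way browsers do, so the <tr> rows stay direct siblings of the <caption>.

```r
library(XML)

# Stand-in HTML for illustration only: one table whose caption is "Ontario".
html <- '<html><body><table>
  <caption>Ontario</caption>
  <tr><td>35001</td><td>District A</td></tr>
  <tr><td>35002</td><td>District B</td></tr>
</table></body></html>'
doc <- htmlParse(html, asText = TRUE)

# The caption node, selected the same way as in the question
cap <- getNodeSet(doc, "//table/caption[text()='Ontario']")[[1]]

# Option 1: the caption's own following siblings (here, the <tr> rows).
# The relative XPath is evaluated with the caption as the context node.
rows <- getNodeSet(cap, "following-sibling::tr")

# Option 2: step up to the parent <table>, which could then be
# passed to readHTMLTable()
tab <- xmlParent(cap)

length(rows)
xmlName(tab)
```

Option 2 is usually the more convenient one, since readHTMLTable works on the whole table node rather than on loose rows.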
You can use following-sibling in XPath:
    library(XML)
    appURL <- 'http://www.elections.ca/content.aspx?section=res&dir=cir/list&document=index&lang=e#list'
    doc <- htmlParse(appURL, encoding = "UTF-8")
    tableNode <- doc["//*[@id='list']/following-sibling::table/caption[text()='Ontario']/.."][[1]]
    myTable <- readHTMLTable(tableNode)

    > head(myTable)
       Code          Federal Electoral Districts Population 2006
    1 35001                       Ajax–Pickering         117,183
    2 35002        Algoma–Manitoulin–Kapuskasing          77,961
    3 35003 Ancaster–Dundas–Flamborough–Westdale         111,844
    4 35004                               Barrie         128,430
    5 35005                    Beaches–East York         104,831
    6 35006                 Bramalea–Gore–Malton         152,698
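As the printed output suggests, the population figures come back as text with thousands separators, so they need cleaning before any arithmetic. A minimal sketch, using the values copied from the head(myTable) output; in practice the input would be the population column of myTable (assumed here to be the third column, i.e. myTable[[3]]).

```r
# Values copied from the head(myTable) output; with the real data this
# would be something like myTable[[3]] (assumed to be the population column).
pop <- c("117,183", "77,961", "111,844", "128,430", "104,831", "152,698")

# as.character() guards against the column having been read as a factor;
# fixed = TRUE treats "," as a literal string, not a regular expression.
pop_num <- as.numeric(gsub(",", "", as.character(pop), fixed = TRUE))
pop_num
```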
So, breaking down the XPath used for tableNode: the heading "Federal Electoral Districts – Representation Order of 2003" has id="list". IDs in HTML are meant to be unique, so we can filter on this.

- //*[@id='list'] finds the node whose id equals "list".
- /following-sibling::table selects the sibling nodes that follow it and are tables.
- /caption[text()='Ontario'] selects the caption nodes whose text equals "Ontario".
- /.. goes back up to the parent node, i.e. the table that owns that caption.

This gives us the required table nodes as a list. Only one node satisfies the above requirements, and that node can be processed with readHTMLTable.
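To see what each piece of the path returns without hitting the live page, the sketch below builds the XPath up one stage at a time and counts the matches at each stage. It runs against a small made-up HTML string (the heading text, codes and district names are placeholders, not the real elections.ca markup) and assumes the XML package is installed.

```r
library(XML)

# Made-up HTML that mimics the structure described above: a heading with
# id="list" followed by sibling tables, each with a caption.
# Codes and district names are placeholders, not data from the real page.
html <- '<html><body>
  <h2 id="list">Federal Electoral Districts</h2>
  <table><caption>Nova Scotia</caption>
    <tr><td>12001</td><td>District A</td></tr>
  </table>
  <table><caption>Ontario</caption>
    <tr><td>35001</td><td>District B</td></tr>
    <tr><td>35002</td><td>District C</td></tr>
  </table>
</body></html>'
doc <- htmlParse(html, asText = TRUE)

# Build the path up one piece at a time and count the matches at each stage.
paths <- c(
  id      = "//*[@id='list']",
  tables  = "//*[@id='list']/following-sibling::table",
  caption = "//*[@id='list']/following-sibling::table/caption[text()='Ontario']",
  parent  = "//*[@id='list']/following-sibling::table/caption[text()='Ontario']/.."
)
counts <- sapply(paths, function(p) length(getNodeSet(doc, p)))
print(counts)

# The final stage is the <table> element itself, ready for readHTMLTable().
tableNode <- getNodeSet(doc, paths[["parent"]])[[1]]
xmlName(tableNode)
```

On this toy input the counts should come out as 1, 2, 1 and 1: the sibling step widens the selection to every table after the heading, and the caption filter narrows it back down to the single Ontario table.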