How to web scrape an XML page with hidden code in R?
I'm trying to scrape publicly available information (title, author, journal) from the ScienceDirect website. The page source is a problem because this information is hidden in it, and that has caused problems in my code. The function I've created produces a data.frame with only one row. When I try to write this information to an Excel sheet, the result is one long row.
Necessary packages:
library(bitops)
library(RCurl)
library(XML)
library(RJSONIO)
library(devtools)
library(RSelenium)
Accessing the search page remotely:
checkForServer()
startServer()
# opening Firefox
firefox_con <- remoteDriver(remoteServerAddr = "localhost", port = 4444, browserName = "firefox")
firefox_con$open() # allow the window to open

Setting the info for the search:
url <- "http://www.sciencedirect.com"
firefox_con$navigate("http://www.sciencedirect.com")
busca <- firefox_con$findElement(using = "css selector", value = "#qs_all")
keyword <- busca$sendKeysToElement(list("key word", key = "enter"))

Passing the page source to R:
pagina <- xmlRoot(htmlParse(unlist(firefox_con$getPageSource())))
Function used for scraping the page source:
scraper_science <- function(x) {
  doc <- htmlParse(url, encoding = "UTF-8")
  tit <- xpathApply(x, "//a[@id]", xmlValue, "id")
  class <- xpathApply(x, "//li [@class]", xmlValue, "class")
  inf.art <- class[seq(59, 227, 7)]
  dat <- data.frame(title = tit, inf = inf.art)
}
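One possible cause of the single long row is that xpathApply() returns a list, and data.frame() spreads a list across columns instead of rows. The following is a minimal sketch of that idea, not a tested fix for ScienceDirect. The inline HTML, the class name "source", and the variable names are hypothetical stand-ins, and the real page markup will differ.

```r
library(XML)

# Hypothetical stand-in for the page source; the real markup will differ.
html <- '<html><body><ul>
  <li class="title"><a id="a1">First article</a></li>
  <li class="source">Journal A</li>
  <li class="title"><a id="a2">Second article</a></li>
  <li class="source">Journal B</li>
</ul></body></html>'

doc <- htmlParse(html, asText = TRUE)

# xpathApply() gives back a list with one element per matched node.
tit_list <- xpathApply(doc, "//a[@id]", xmlValue)
is.list(tit_list)

# xpathSApply() simplifies the result to a character vector,
# so data.frame() builds one row per matched node.
tit <- xpathSApply(doc, "//a[@id]", xmlValue)
src <- xpathSApply(doc, "//li[@class='source']", xmlValue)

dat <- data.frame(title = tit, source = src, stringsAsFactors = FALSE)
print(dat)
```

Selecting nodes by a specific class value, as in the assumed "//li[@class='source']", may also be less fragile than picking positions with seq(59, 227, 7), since positions shift whenever the page layout changes.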
xml r web