XML and Web pages
We will use the XML package to do the parsing.
Install package
install.packages("XML", dependencies = TRUE)
library(XML)
Load XML tree
install.packages("XML")
library(XML)
url <- "http://www.w3schools.com/xml/simple.xml"
document <- xmlTreeParse(url, useInternalNodes = TRUE)
Get element name
rootNode <- xmlRoot(document)
xmlName(rootNode)
Access the first element.
rootNode[[1]]
Get the element at an exact position (going through subelements).
rootNode[[1]][[1]]
Apply a function to every child element to extract values from the XML. Here the function is xmlValue.
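As a self-contained sketch of this pattern, the snippet below applies xmlValue and a small custom function to an inline document, so it runs without network access. The XML string and the helper name food_name are made up for illustration; xmlParse, xmlRoot, xmlSApply and xmlValue come from the XML package.

```r
library(XML)

# Hypothetical inline document (not the remote file used elsewhere on this page).
xml_text <- paste0(
  "<menu>",
  "<food><name>Waffles</name><price>5.95</price></food>",
  "<food><name>Toast</name><price>4.50</price></food>",
  "</menu>"
)

doc <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# xmlValue on a child returns the concatenated text of all its subelements.
all_text <- xmlSApply(root, xmlValue)

# Any function that takes a node can be applied; this one keeps only <name>.
food_name <- function(node) xmlValue(node[["name"]])
food_names <- unname(xmlSApply(root, food_name))
print(food_names)
```

The same call against the document loaded earlier is: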
xmlSApply(rootNode, xmlValue)
Using XPath.
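An XPath expression selects nodes anywhere in the tree, and xpathSApply applies a function to each match. The sketch below uses an inline document invented for illustration; the element names and prices are assumptions, not the contents of the remote file.

```r
library(XML)

# Hypothetical inline document used only for this illustration.
xml_text <- paste0(
  "<menu>",
  "<food><name>Waffles</name><price>5.95</price></food>",
  "<food><name>Toast</name><price>4.50</price></food>",
  "</menu>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# "//name" matches every <name> element at any depth.
all_names <- xpathSApply(doc, "//name", xmlValue)

# A predicate in square brackets filters the matches:
# names of foods whose <price> is below 5.
cheap_names <- xpathSApply(doc, "//food[price < 5]/name", xmlValue)

# xmlValue always returns text, so convert explicitly when numbers are needed.
prices <- as.numeric(xpathSApply(doc, "//price", xmlValue))
print(cheap_names)
```

Applied to the document loaded earlier, the first query looks like this: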
xpathSApply(rootNode, "//name", xmlValue)
Read a table from HTML
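readHTMLTable converts table elements of an HTML page into data frames. A minimal offline sketch, assuming a made-up two-row table (the column names and values are hypothetical):

```r
library(XML)

# Hypothetical page containing a single table with a header row.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Drug</th><th>Sales</th></tr>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr>",
  "</table></body></html>"
)
doc <- htmlParse(html_text, asText = TRUE)

# Without `which`, the result is a list with one data frame per table.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

# `which` selects one table by position and returns a single data frame.
first <- readHTMLTable(doc, which = 1, header = TRUE, stringsAsFactors = FALSE)

# Cell values arrive as text; convert columns explicitly.
first$Sales <- as.numeric(first$Sales)
print(first)
```

The example that follows reads from a live page in the same way, using which = 2 to pick the second table on the page and skip.rows = 1 to drop its first row.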
library(XML)
url <- "http://www.drugs.com/top200_2003.html"
html.table.data <- readHTMLTable(url, which = 2, skip.rows = 1)
View(html.table.data)
Get page using httr package
library(httr)
library(XML)
url <- "http://www.drugs.com/top200_2003.html"
html <- GET(url)
# Use a separate name so the content() function is not shadowed
page_text <- content(html, as = "text")
parsedHtml <- htmlParse(page_text, asText = TRUE)
xpathSApply(parsedHtml, "//title", xmlValue)
Authenticate with httr package
We can use the authenticate function to access a password-protected page.
library(httr)
library(XML)
url <- "http://httpbin.org/basic-auth/user/passwd"
GET(url, authenticate("user", "passwd"))
The code above returns the following response.
Response [http://httpbin.org/basic-auth/user/passwd]
Date: 2014-12-30 16:30
Status: 200
Content-Type: application/json
Size: 47 B
{
"authenticated": true,
"user": "user"
}
Use the handle function to access several pages during one authenticated session.
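A minimal sketch of that idea: create one handle for the host and pass it to every request, so the requests share a connection and cookies. The helper name fetch_pages and the paths in the usage note are hypothetical. Whether credentials carry over between requests may depend on the httr version, so this sketch passes the same configuration to each request explicitly.

```r
library(httr)

# Hypothetical helper: request several paths on one host through a shared handle.
# Extra arguments (for example authenticate("user", "passwd")) are applied to
# every request.
fetch_pages <- function(base_url, paths, ...) {
  h <- handle(base_url)  # one handle, reused for all requests below
  lapply(paths, function(p) GET(handle = h, path = p, ...))
}
```

Calling fetch_pages("http://httpbin.org", c("basic-auth/user/passwd", "cookies"), authenticate("user", "passwd")) would return a list of two response objects, which can be inspected with status_code or content.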