XML and Web pages
We will use the XML package to do the parsing.
Install package
install.packages("XML", dependencies = TRUE)
library(XML)
Load XML tree
library(XML)
url <- "http://www.w3schools.com/xml/simple.xml"
document <- xmlTreeParse(url, useInternalNodes = TRUE)
Get element name
rootNode <- xmlRoot(document)
xmlName(rootNode)
Access first element.
rootNode[[1]]
Get an element at an exact position (descending through subelements).
rootNode[[1]][[1]]
Apply a function to every child of a node to extract values from the XML. Here xmlValue is that function.
xmlSApply(rootNode, xmlValue)
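To try the navigation and extraction calls above without a network connection, here is a minimal sketch that parses a small in-memory document. The XML text and the variable names (xml_text, doc, root, values) are made up for illustration; only the XML package functions come from the steps above.

```r
library(XML)

# Hypothetical sample document, not the file used in this section.
xml_text <- paste0(
  "<menu>",
  "<food><name>Waffles</name><price>5.95</price></food>",
  "<food><name>Toast</name><price>4.50</price></food>",
  "</menu>"
)

# asText = TRUE marks the input as XML content rather than a file name or URL.
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

xmlName(root)              # name of the root element
xmlSize(root)              # number of child nodes under the root
xmlName(root[[1]])         # name of the first child element
xmlValue(root[[1]][[1]])   # text of the first grandchild

# xmlValue concatenates all text below a node, so each <food> yields one string.
values <- xmlSApply(root, xmlValue)
print(values)
```

Because xmlValue joins the text of all descendants, applying it to the direct children of the root is mostly useful for a quick look; XPath is usually a better fit for picking out individual fields.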
Using XPath.
xpathSApply(rootNode, "//name", xmlValue)
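A few more XPath patterns may be useful alongside the query above. This is a sketch on a made-up in-memory document; the element names, the id attribute, and the variable names are assumptions for illustration only.

```r
library(XML)

# Hypothetical sample document with an attribute on each <food> element.
xml_text <- paste0(
  "<menu>",
  "<food id='a'><name>Waffles</name><price>5.95</price></food>",
  "<food id='b'><name>Toast</name><price>4.50</price></food>",
  "</menu>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# "//name" matches every <name> element anywhere in the document.
food_names <- xpathSApply(doc, "//name", xmlValue)

# Extracted values are character strings; convert before doing arithmetic.
prices <- as.numeric(xpathSApply(doc, "//price", xmlValue))

# A predicate in square brackets restricts the match, here to foods under 5.
cheap <- xpathSApply(doc, "//food[price < 5]/name", xmlValue)

# xmlGetAttr reads an attribute instead of the text content.
ids <- xpathSApply(doc, "//food", xmlGetAttr, "id")
```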
Read a table from HTML
library(XML)
url <- "http://www.drugs.com/top200_2003.html"
html.table.data <- readHTMLTable(url, which = 2, skip.rows = 1)
View(html.table.data)
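The same call can be tried on an HTML string when the remote page is unavailable. The sketch below assumes a tiny made-up table; the column names and values are hypothetical, and header = TRUE and stringsAsFactors = FALSE are choices made here, not settings taken from the example above.

```r
library(XML)

# Hypothetical HTML page with a single table.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Drug</th><th>Rank</th></tr>",
  "<tr><td>Alpha</td><td>1</td></tr>",
  "<tr><td>Beta</td><td>2</td></tr>",
  "</table></body></html>"
)

# Parse the string first, then hand the parsed document to readHTMLTable.
page <- htmlParse(html_text, asText = TRUE)

# which = 1 selects the first table on the page; header = TRUE treats the
# first row as column names; stringsAsFactors = FALSE keeps text as character.
tbl <- readHTMLTable(page, which = 1, header = TRUE, stringsAsFactors = FALSE)
str(tbl)
```

Columns come back as text, so numeric columns such as the rank need as.numeric() before any calculation.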
Get a page using the httr package
library(httr)
library(XML)
url <- "http://www.drugs.com/top200_2003.html"
response <- GET(url)
pageText <- content(response, as = "text")
parsedHtml <- htmlParse(pageText, asText = TRUE)
xpathSApply(parsedHtml, "//title", xmlValue)
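The fetch-and-parse steps above can be wrapped in a small function that also checks the HTTP status before parsing. This is only a sketch: fetch_title is a hypothetical helper name, and the UTF-8 encoding is an assumption about the page rather than something stated in this section.

```r
library(httr)
library(XML)

# Hypothetical helper: fetch a page and return the text of its <title> element.
fetch_title <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # turn 4xx/5xx responses into an R error
  # Assumption: the page is UTF-8 encoded; adjust if the server says otherwise.
  body <- content(resp, as = "text", encoding = "UTF-8")
  doc <- htmlParse(body, asText = TRUE)
  xpathSApply(doc, "//title", xmlValue)
}

# Example call (needs network access):
# fetch_title("http://www.drugs.com/top200_2003.html")
```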
Authenticate with the httr package
We can use the authenticate function to access a page protected by HTTP basic authentication.
library(httr)
library(XML)
url <- "http://httpbin.org/basic-auth/user/passwd"
GET(url, authenticate("user", "passwd"))
The code above returns the following response.
Response [http://httpbin.org/basic-auth/user/passwd]
Date: 2014-12-30 16:30
Status: 200
Content-Type: application/json
Size: 47 B
{
"authenticated": true,
"user": "user"
}
Use the handle function to access several pages during one authenticated session.
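A minimal sketch of that idea follows. The helper name get_with_auth and the second path "/get" are assumptions for illustration. Requests made through the same handle share a connection and cookies; the credentials are passed on each call here as a conservative choice, since this section does not say whether they persist between requests.

```r
library(httr)

# A handle ties requests to one host so they share a connection and cookies.
h <- handle("http://httpbin.org")

# Hypothetical helper: request a path on the shared handle with basic auth.
get_with_auth <- function(path) {
  GET(handle = h, path = path, authenticate("user", "passwd"))
}

# Example calls (need network access):
# r1 <- get_with_auth("/basic-auth/user/passwd")
# r2 <- get_with_auth("/get")
# status_code(r1)
```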