Reading from a table in HTML

 

1.  It depends a lot on what format the tables are in. Many are just images, so they are not really tables. There is software that can scan such images and "read" the numbers.

2. Some are real tables, but it depends on how they were made. HTML has a table element (and variants), but most people making web pages actually use a higher-level package like Dreamweaver, Expression, FrontPage, etc. I think these all produce the basic HTML table, but I don't know. I am guessing that professional web developers, whose tables you are most likely to want to read, use other packages or a mixture of packages. I will try with one of these professional pages.

3.  I will use the ESPN baseball stats page for this morning's National League batters. Here is a screenshot (so a picture, not something you can copy the data from; go to http://espn.go.com/mlb/statistics for the tables):

NL batting, Sept 1, 2011 

 

4.  I highlighted the whole table (including the line of variable names) and copied it to my clipboard (I'm using a PC).

5. I then typed the following. (By the way, because I type most R commands in an R editor, which you'll see this week, I have to remember to type these commands rather than copy and paste them into R, since copying them would overwrite the table in the clipboard. Yes, multiple clipboards can be used, but that just makes things confusing, so just type:)

Reading clipboard 
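The command itself was shown as an image. As a rough sketch only, and not necessarily the exact call used, a read of this kind might look like the following. It assumes read.table with the sep and header arguments discussed here. The sample rows are made up for illustration (they are not ESPN data), and text= is used in place of the clipboard so the sketch runs on any machine.

```r
# Made-up sample shaped like a copied stats table (hypothetical rows, not ESPN data).
copied <- "RK PLAYER TEAM AB H
1 Ann Able AAA 400 120
2 Bob Baker BBB 380 110"

# On a Windows PC the clipboard itself could be read with
#   nl <- read.table("clipboard", sep = " ", header = TRUE)
# Here the same arguments are applied to the sample text instead.
nl <- read.table(text = copied, sep = " ", header = TRUE)

print(nl)
str(nl)  # shows which columns came in as text and which as numbers
```

In this sample the header line has one fewer item than each data line (each name is two items), so read.table treats the first item on each line as a row label and the RK column ends up holding first names.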

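The sep=" " tells R that there is a space between each item. The header argument says that the first line holds the variable names. read.table also works. I think a lot of the reading functions will usually work, but be careful any time you read from the web. Here RK, which should be the ranks 1-10, is the first name of the player. This happens because each name contains a space and so is split into two items; the ranks presumably end up as the row labels. This is all right here, but if someone had more or fewer than two names, or a space in their last name, it could cause problems. The clipboard can also be pasted directly into Excel, and presumably other spreadsheets, and then read in that way. I am not sure why the X and the NAs show up in the last column; one possible cause is a trailing space at the end of each copied line, which R would read as an extra, empty item.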
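One way to spot the name problem before reading the data is to count the items on each line. The sketch below uses count.fields on made-up rows (hypothetical, not ESPN data) in which one player has a three-part name; on a Windows PC the connection could presumably be replaced by "clipboard".

```r
# Made-up sample: the second player has a three-part name.
copied <- "RK PLAYER TEAM AB H
1 Ann Able AAA 400 120
2 Bob De Baker BBB 380 110"

# Count the space-separated items on every line.
con <- textConnection(copied)
n_items <- count.fields(con, sep = " ")
close(con)

print(n_items)

# Data lines (everything after the header) should all have the same count.
data_counts <- n_items[-1]
# Flag data lines whose count differs from the first data line;
# +1 converts back to line numbers in the copied text (the header is line 1).
odd_lines <- which(data_counts != data_counts[1]) + 1
if (length(odd_lines) > 0) {
  cat("Lines with an unexpected number of items:", odd_lines, "\n")
}
```

If any line is flagged, the columns after the name will be shifted for that row, so it is safer to fix the text (or go through a spreadsheet) before reading it in.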

Conclusion: Yes, you can read from HTML tables, but I would only do it cautiously. Always check to make sure it has worked.
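As a minimal sketch of what such a check could look like, the code below reads made-up rows (hypothetical, not ESPN data) that deliberately end in a trailing space, then looks for columns that are entirely NA and lists which columns came in as numbers. The trailing space is only an assumption about how an empty last column might arise.

```r
# Made-up sample; note the trailing space at the end of every line.
copied <- "RK PLAYER TEAM AB H \n1 Ann Able AAA 400 120 \n2 Bob Baker BBB 380 110 "

nl <- read.table(text = copied, sep = " ", header = TRUE)

# Look at the structure: column names, types, and first values.
str(nl)

# Columns that contain nothing but NA are probably artifacts of the copy.
all_na <- names(nl)[colSums(!is.na(nl)) == 0]
print(all_na)

# Drop those columns and confirm the stats columns are numeric.
nl_clean <- nl[, !(names(nl) %in% all_na), drop = FALSE]
numeric_cols <- names(nl_clean)[sapply(nl_clean, is.numeric)]
print(numeric_cols)
```

Comparing a few rows of the result against the web page by eye is still worthwhile, since a shifted column can have the right type and still hold the wrong numbers.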