I know some websites like this where you can search for good football data: http://www.pro-football-reference.com/play-index/pgl_finder.cgi Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database? I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes.
12/30/2010 11:26:33 AM
Hmmm. A web page that uses a database backend. What magics is this you speak of?
12/30/2010 11:54:36 AM
Think about what you're asking for. No, you can't do this, at least not legally and not for free. You're talking about an enormous amount of data that is the website's entire point of existence.
12/30/2010 12:30:17 PM
Obviously you guys have never seen that movie The Social Network. This is how Facebook got started, people!
12/30/2010 12:52:35 PM
Well, really, the trick is establishing a connection to the database, which you're not gonna be able to do unless some newbie hard-coded the credentials in the source of the page there. I guess you could write a script to spoof form submissions and collect all the results, but my God, just e-mail the webmaster and ask for whatever it is you want.
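For what the "script to spoof form submissions" idea amounts to, here is a rough sketch. It only builds the URLs a search form would request, one per season. The parameter names (`year`, `pos`) are made up for illustration and are not the site's real form fields; you would have to read the form's HTML to find those. Fetching and parsing the results is left out.

```python
from urllib.parse import urlencode

# The search page from the first post. The query parameters below are
# hypothetical; the real form fields would have to be read from the page.
BASE_URL = "http://www.pro-football-reference.com/play-index/pgl_finder.cgi"


def build_query_urls(base_url, years, extra=None):
    """Return one GET URL per year, mimicking what a form submit would request."""
    urls = []
    for year in years:
        params = {"year": year}      # hypothetical field name
        params.update(extra or {})   # any other (hypothetical) form fields
        urls.append(base_url + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_query_urls(BASE_URL, [2009, 2010], {"pos": "QB"}):
        print(url)
```

Each URL would then be requested in turn and the returned HTML parsed for the result table.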
12/30/2010 1:24:48 PM
12/30/2010 1:33:00 PM
The way to get their data is to find a place where they themselves have dumped it all in HTML form ... Chances are that they have organized it into a hierarchy for display that is somewhat similar to the hierarchy used in their database ... so while you cannot query the whole database, you can scrape the entire website (unless they shut down your IP when they find out what you are doing). I did this with IMDB a few years back (but never used the data) ... scraping websites is not some new thing. There are plenty of tools out there to help you (DOM is your friend) ... but you'll just have to look for patterns in the HTML that you can use to parse out the data you need, and then insert that data into your own database.
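A minimal sketch of the "find the pattern in the HTML, parse it out, insert into your own database" loop, using only the Python standard library. The sample markup, the `stats` table, and its column names are all invented for illustration; a real page will have its own structure that you have to inspect first.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Player</th><th>Yards</th></tr>
  <tr><td>Player A</td><td>312</td></tr>
  <tr><td>Player B</td><td>287</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows contain only <th>, so they end up empty
                self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Parse the HTML and return a list of [player, yards] string pairs."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


def store_rows(rows):
    """Insert the parsed rows into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stats (player TEXT, yards INTEGER)")
    conn.executemany(
        "INSERT INTO stats VALUES (?, ?)",
        [(player, int(yards)) for player, yards in rows],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = store_rows(scrape_rows(SAMPLE_HTML))
    print(conn.execute("SELECT player, yards FROM stats ORDER BY yards DESC").fetchall())
```

Swapping `":memory:"` for a file path would keep the data between runs.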
12/30/2010 1:39:51 PM
^this is fine for personal use, but is going to quickly get you in legal trouble for anything publicly available, commercially or not.
12/30/2010 3:16:59 PM
as a person who has also scraped a few websites in my day, i have not heard of anybody actually getting in real legal trouble for this kind of activity, and would be interested if you could cite any examples
12/30/2010 3:22:25 PM
It's data theft. A database is protected intellectual property just like anything else. There are plenty of legal precedents for this, going back to the beginnings of Google Maps: before they released a public access API, there were hundreds of cease and desist orders against mashups and commercial sites leveraging their data without permission.
12/30/2010 3:37:10 PM
I figured it couldn't be done but it's for personal use and was worth a shot to ask. Of course I plan to comply with all terms of use.
12/30/2010 4:11:06 PM
^^ a c&d is no big deal if you comply with it. i'm talking about a scenario where pro-football-reference.com sues Shivan Bird for $1,000,000. web scraping happens every day, and it would be a pity imo if the legal ramifications were so prevalent that nobody dared try to mash up data from some other site. of course, in situations like this, where the site has made a good faith effort (based on synapse's link) to convince people not to try to hammer their site, and also to provide access to the data on a commercial basis, it would be polite to respect their wishes. iow, i think "quickly get you in legal trouble" is a little more FUD than is realistic
12/30/2010 4:41:46 PM
Here's a better idea, click the damn about link: http://www.pro-football-reference.com/download/
12/30/2010 4:43:33 PM
^ well done ... and of course, in terms of scraping others' data from sites ... you put yourself in front of legal issues if you use it for commercial purposes. Don't do it (or just don't get caught doing it) ... all the same
12/30/2010 4:47:45 PM
I know some websites like this where you can search for good customer data: http://www.BankOfAmerica.com Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database? I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes.
12/30/2010 5:59:17 PM
^^
12/30/2010 8:27:28 PM
12/30/2010 10:03:37 PM