http://hackaday.com/2012/06/14/penny-auction-hacking-put-on-your-statisticians-hat/
If you look at my code, please be nice.
6/19/2012 10:39:31 AM
That's pretty neat. I know a guy trying to start his own version of one of those sites. Why'd you use Selenium?
6/19/2012 12:11:20 PM
I tried using BeautifulSoup and urllib, but when they fetched and parsed the page, the containers with bidding information were empty (the values are filled in afterwards by an AJAX script). The only scraping module that would actually recover the values I wanted was Selenium. The script basically works by opening an auction in a browser window, recovering the bidding info, then refreshing every 10 seconds. After the auction ends, the data is organized and dumped into neat files, plus a final summary file that contains all the auction goodies you would want to know (number of bidders, number of bids per user, auction length, etc.).
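For anyone who wants the gist of that loop without reading the whole script, here is a rough sketch in Python. It is not the actual code: `fetch_snapshot` is a made-up stand-in for whatever the Selenium side reads off the page, and I'm assuming it returns the bids currently visible as `(timestamp, username)` pairs plus a flag saying whether the auction has ended.

```python
import time
from collections import Counter
from datetime import datetime, timedelta


def poll_auction(fetch_snapshot, interval_seconds=10, sleep=time.sleep):
    """Poll until the auction ends and return every distinct bid seen.

    fetch_snapshot is a hypothetical callable returning (bids, ended),
    where bids is a list of (datetime, username) tuples.
    """
    seen = set()  # refreshes show overlapping bids, so de-duplicate
    while True:
        bids, ended = fetch_snapshot()
        seen.update(bids)
        if ended:
            return sorted(seen)  # chronological order
        sleep(interval_seconds)


def summarize_auction(bids):
    """Build the summary numbers: bidders, bids per user, auction length."""
    if not bids:
        return {"bidders": 0, "bids_per_user": {}, "length_seconds": 0.0}
    times = [t for t, _ in bids]
    per_user = Counter(user for _, user in bids)
    return {
        "bidders": len(per_user),
        "bids_per_user": dict(per_user),
        # Length here means first observed bid to last observed bid.
        "length_seconds": (max(times) - min(times)).total_seconds(),
    }


if __name__ == "__main__":
    # Canned snapshots with invented users, only to show the call pattern.
    start = datetime(2012, 6, 19, 12, 0, 0)
    canned = iter([
        ([(start, "user_a")], False),
        ([(start, "user_a"), (start + timedelta(seconds=9), "user_b")], True),
    ])
    bids = poll_auction(lambda: next(canned), sleep=lambda seconds: None)
    print(summarize_auction(bids))
```

Passing `sleep` in as a parameter is just so the loop can be tried out without actually waiting 10 seconds per refresh.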
6/19/2012 12:20:43 PM
kick-ass dude, nice job. you probably mentioned it in one of your blog posts, but why aren't you just scraping directly with http requests? why bother with selenium? also, you could much more easily analyze the data if you use a real database and not just csv/excel
[Edited on June 19, 2012 at 12:30 PM. Reason : nm just read ^ still, i can't imagine you couldn't scrape it if you know the right ajax]
6/19/2012 12:23:52 PM
You're right about scraping the AJAX requests directly. There was a way to do it, but it required individual cookies that the server generated (http://pennystats.blogspot.com/2012/04/very-interesting-find.html). It is possible to do it that way, but the data was messy and I honestly didn't know how to generate valid auction cookies to make the requests myself. Selenium offered a turn-key solution that just worked, so I decided to go with it.
[Edited on June 19, 2012 at 12:42 PM. Reason : .]
6/19/2012 12:39:28 PM
i too would have approached it with php/curl and DBed the data, but hey, if you got a solution that works for you why not
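to make the "DB the data" idea concrete, here's a minimal sketch. it's in python with the built-in sqlite3 module rather than php, just to match the scraper. the table name, column names and the two helper functions are all made up for illustration, not anything from the actual project.

```python
import sqlite3


def store_bids(conn, auction_id, bids):
    """Insert (bid_time, username) pairs for one auction, skipping repeats.

    The schema is an assumption: one row per bid, with a UNIQUE constraint
    so that re-scraping the same auction does not create duplicates.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bids ("
        "auction_id TEXT, bid_time TEXT, username TEXT, "
        "UNIQUE (auction_id, bid_time, username))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO bids VALUES (?, ?, ?)",
        [(auction_id, bid_time, user) for bid_time, user in bids],
    )
    conn.commit()


def bids_per_user(conn, auction_id):
    """Return [(username, bid_count), ...] for one auction, busiest first."""
    rows = conn.execute(
        "SELECT username, COUNT(*) FROM bids WHERE auction_id = ? "
        "GROUP BY username ORDER BY COUNT(*) DESC, username",
        (auction_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep the data
    store_bids(conn, "example-auction", [
        ("2012-06-19T12:00:00", "user_a"),
        ("2012-06-19T12:00:09", "user_b"),
    ])
    print(bids_per_user(conn, "example-auction"))
```

the win over csv is that questions like "bids per user" become a single query instead of another pass over the files.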
6/19/2012 12:44:18 PM
I am pretty sure the majority of people who scrape the data use PHP and dump it into a database (http://www.allpennyauctions.com/). Another benefit of doing it that way would be that I could use a significantly less powerful server to scrape data. Right now I have a dual-core Xeon server (3.3 GHz) with 8 GB of RAM chugging away, and it can only scrape about 2000-2500 auctions per day. I think if I upped the RAM to 16 GB I could probably grab them all at once.
6/19/2012 12:50:07 PM
http://pennystats.blogspot.com/2012/04/first-post-in-what-could-be-quite.html
That pop-up next to the scroll bar is annoying as shit
6/19/2012 12:54:29 PM
I usually use a scrolly mouse so I never noticed. I can see how that would be annoying. The worst part is that Blogger doesn't allow you to modify their "Dynamic" theme, so there's nothing I can do about it.
6/19/2012 12:57:55 PM
Very nice work.
6/19/2012 3:25:39 PM
You'd have to venture over to Java, but HtmlUnit would give you a way to run the page's JavaScript.
6/19/2012 6:44:43 PM
This thread is epic. Great work timbo
6/19/2012 7:17:51 PM
How can I use this to make money?
6/20/2012 12:41:39 AM
You need to break down the data and look at the stuff you want to target. Then look for the best time to try and win. The charts of the day are useful for doing this. This one in particular:
http://pennystats.blogspot.com/2012/06/pennystats-chart-of-day-61112.html
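If you'd rather slice the data yourself than rely on the charts, something along these lines would do it. This is only a sketch: it assumes you have already boiled each finished auction down to an end time and a total bid count, and `mean_bids_by_slot` is a name invented for this example.

```python
from collections import defaultdict
from datetime import datetime


def mean_bids_by_slot(auctions):
    """Average total bids per (weekday, hour) slot.

    auctions: iterable of (end_time, total_bids) pairs, where end_time is
    a datetime. weekday follows datetime.weekday(): 0 is Monday, 6 is Sunday.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of bids, auctions]
    for end_time, total_bids in auctions:
        slot = (end_time.weekday(), end_time.hour)
        totals[slot][0] += total_bids
        totals[slot][1] += 1
    return {slot: bid_sum / count for slot, (bid_sum, count) in totals.items()}


def quietest_slots(auctions, top=3):
    """Return the slots with the lowest average bid count, quietest first."""
    means = mean_bids_by_slot(auctions)
    return sorted(means.items(), key=lambda item: item[1])[:top]
```

Slots with a low average are the ones where auctions tend to close with fewer bids, which is the kind of window worth checking against fresh data before trusting it.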
6/20/2012 9:18:32 AM
So basically I should log in in the middle of the night on weekends, buy $50 gift cards, and sell them to Plastic Jungle? That simple?
6/21/2012 4:00:08 AM
That was my theory. But 5000+ people have read my blog since then, so I dunno if it is still applicable. You could always use my software to data mine and see if those statistics are still accurate.
[Edited on June 21, 2012 at 1:29 PM. Reason : spelling]
6/21/2012 1:28:47 PM