Scraping along with YQL
| Tuesday, June 23rd, 2009 | --Sean |
I haven't used YQL nearly enough, and I'll need to write a proper overview post at some point to make amends for the general lack of coverage. But there are only 4 minutes left in the day, which doesn't leave time for an actually interesting, comprehensive post! Instead, you get a feature!
For instance, did you know that, using YQL, you can parse through HTML on any webpage? As per the example given in the console, the following query will pull out the top headlines related to YHOO from Yahoo! Finance.
select href,content from html where url="http://finance.yahoo.com/q?s=yhoo" and xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'
And from there, you can run the query through the YQL APIs and get back nicely formatted XML (or JSON) results that you can interact with almost like a real web service, for whatever fun application you've put together.
Because if you're going to be page-scraping a site, you might as well do it right!
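To make the "almost like a real web service" part a bit more concrete, here's a rough Python sketch of how that query could be wrapped into a REST-style URL. It assumes the public YQL endpoint at query.yahooapis.com with `q` and `format` parameters, as documented at the time; the `build_yql_url` helper and the `QUERY` constant are made-up names for illustration. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the public YQL REST endpoint as documented at the time of writing.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

# The scraping query from the post, split across lines for readability.
QUERY = (
    'select href,content from html '
    'where url="http://finance.yahoo.com/q?s=yhoo" '
    'and xpath=\'//div[@id="yfi_headlines"]/div[2]/ul/li/a\''
)


def build_yql_url(query, fmt="json"):
    """Hypothetical helper: return a GET URL carrying the YQL statement.

    The statement goes in the `q` parameter and the desired output
    format ("json" or "xml") in `format`; urlencode handles the escaping
    of quotes, slashes, and brackets in the XPath expression.
    """
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


if __name__ == "__main__":
    print(build_yql_url(QUERY))          # JSON flavour
    print(build_yql_url(QUERY, "xml"))   # XML flavour
```

Fetching the resulting URL (with `urllib.request.urlopen`, say) and decoding the body would be the obvious next step, with the caveat that the exact shape of the returned document isn't something this sketch tries to pin down.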