Monday, 8 July 2013

Processing RSS Feeds and Sending the Results to Instapaper using Java

Back in March I wrote about a small Google App Engine (GAE) app I'd written that polled Google Reader for selected feeds and sent the resulting articles to Instapaper for later, offline, reading.

Unfortunately, Google has recently decommissioned Google Reader, so I've been forced to find an alternative way to do this. After a bit of investigation I couldn't find anything lightweight enough that offered this sort of API, so I decided to see whether I could write something myself that handled the RSS interaction directly.

This turned out to be a really easy job thanks to a third-party library called Rome, which made retrieving and parsing feeds trivial, as shown below:

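    // Parse the feed from an already-open InputStream
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(inputstream));

    // getEntries() returns an untyped List here, hence the cast
    @SuppressWarnings("unchecked")
    List<SyndEntry> entries = (List<SyndEntry>) feed.getEntries();

    SyndEntry first = entries.get(0);
    String title = first.getTitle();
    String author = first.getAuthor();
    String link = first.getLink();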


And so on.

With a way to retrieve what I needed from the feeds I was interested in, I next needed a way to remember which articles I'd already processed. For this I used GAE's datastore:

    Key storeKey = KeyFactory.createKey("Store", "StoreKey");
    DatastoreService datastore = 
        DatastoreServiceFactory.getDatastoreService();

To write:

    Entity entry = new Entity("Entry", storeKey);
    entry.setProperty("id", id);
    entry.setProperty("data", data);
    entry.setProperty("date", new Date());

    datastore.put(entry);

To read by a specific attribute (id in this example):

    Query query = new Query("Entry", storeKey)
        .setFilter(Query.FilterOperator.EQUAL.of("id", id));
    List<Entity> entries = datastore.prepare(query)
        .asList(FetchOptions.Builder.withLimit(1));
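
The lookup by id is what lets the app skip articles it has already sent. As a rough sketch of that check-then-record flow, the example below uses an in-memory `Set` as a stand-in for the datastore, so it is not persistent and is not the code from the app itself. The class `SeenArticles` and the method `filterNew` are names made up for this illustration, and using the article link as the id is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: remembers which article ids have been processed.
class SeenArticles {
    // In-memory stand-in for the datastore; contents are lost on restart.
    private final Set<String> seenIds = new HashSet<String>();

    // Returns the ids not seen before, in order, and records them as seen.
    List<String> filterNew(List<String> ids) {
        List<String> fresh = new ArrayList<String>();
        for (String id : ids) {
            // Set.add returns false when the id was already present.
            if (seenIds.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        SeenArticles seen = new SeenArticles();
        // First poll: both articles are new.
        System.out.println(seen.filterNew(
            Arrays.asList("http://example.com/1", "http://example.com/2")));
        // Second poll: only the article not returned before comes back.
        System.out.println(seen.filterNew(
            Arrays.asList("http://example.com/2", "http://example.com/3")));
    }
}
```

In the real app the `Set` would be replaced by the datastore query and `put` shown here, so that the record of processed articles survives between polls.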

To query with an operator (less than a specified date in this example):

    Query query = new Query("Entry", storeKey)
        .setFilter(Query.FilterOperator.LESS_THAN.of("date", date));
    List<Entity> entries = datastore.prepare(query)
        .asList(FetchOptions.Builder.withDefaults());

Using the query above I perform a housekeeping task that deletes entries more than 10 days old:

    for (Entity entry : entries) {
        datastore.delete(entry.getKey());
    }
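
The `date` passed to the less-than query has to be the cut-off point 10 days back. The code that computes it isn't shown above, so the following is only a sketch of one way to do it with the standard library. `RetentionCutoff`, `cutoffBefore` and `RETENTION_DAYS` are names invented for this example, and a day is treated as exactly 24 hours.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for working out the housekeeping cut-off date.
class RetentionCutoff {
    // Assumed retention period, matching the 10 days mentioned in the text.
    static final int RETENTION_DAYS = 10;

    // Returns the point in time exactly days * 24 hours before now.
    static Date cutoffBefore(Date now, int days) {
        long retentionMillis = TimeUnit.DAYS.toMillis(days);
        return new Date(now.getTime() - retentionMillis);
    }

    public static void main(String[] args) {
        Date cutoff = cutoffBefore(new Date(), RETENTION_DAYS);
        // Entries whose "date" property is earlier than this would be deleted.
        System.out.println("Delete entries stored before: " + cutoff);
    }
}
```

The resulting `Date` can be used directly as the `date` argument in the less-than filter.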

The updated code (which no longer references Google Reader) is, as before, available on GitHub.
