Drupalcon Boston Recap: Bringing Guests to the Drupal External Data Party

This year's Drupalcon was the biggest Drupal Party to date, with over 800 attendees and many different sponsors (including Chapter Three, of course). Drupal rocked Boston all week and gave everyone in the community a chance to catch up and keep abreast of some of the cool new stuff going on in the Drupal World. Plus, there was some scrumptious vegetarian food on the scene.

In my Drupalcon rockstar moment, I took the stage and gave a session with Drupal Genius Neil Drumm and Digital Newspaper Extraordinaire Ken Rickard on Using External Data Sources with Drupal (slides here). The general idea is that Drupal is marvelous for content management, but it becomes even better when it integrates data from external sources, and Drupal offers a lot of different ways to make that happen.

Ken kicked off our session by laying the groundwork for why using external data is important, then transitioned into some of the basic tools (drupal_execute, drupal_http_request, database switching) and techniques (lazy instantiation, hook_search) needed to make it all happen. He has a lot of good experience with external data sources from his work in the newspaper world and provided a detailed orientation to the 150-odd people who attended our session.
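To make that toolbox concrete, here is a minimal sketch (written against the Drupal 6 APIs) of two of the tools Ken covered: fetching remote data with drupal_http_request() and querying a second database with db_set_active(). The feed URL and the 'legacy' database key are made-up placeholders, not anything from Ken's talk.

```php
<?php
// Fetch data from a remote service with drupal_http_request().
$response = drupal_http_request('http://example.com/feed.xml');
if ($response->code == 200) {
  $xml = simplexml_load_string($response->data);
  // ... walk $xml and save what you need ...
}

// Query a second database by switching the active connection.
// Assumes a 'legacy' key in the $db_url array in settings.php.
db_set_active('legacy');
$result = db_query('SELECT title, body FROM {articles}');
db_set_active('default'); // Always switch back when you are done.
?>
```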

Neil talked about his work with a wonderful organization called Map Light that records and analyzes information about campaign contributions and congressional votes. For that project Neil had to import *millions* of pieces of information, and he developed a couple of modules to assist him in his work. Job Queue is his module that helps import external data in batches (instead of all at once), and Import Manager allows administrators to monitor and manage the overall import process.
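To give a feel for the batching approach, here is a rough sketch of how an import might be chunked with the Job Queue module's job_queue_add() function (quoted from memory, so treat the signature as approximate). The mymodule_* functions and the chunk size are hypothetical:

```php
<?php
// Queue the import in chunks instead of processing everything in one request.
function mymodule_queue_import($total_records) {
  $chunk_size = 500;
  for ($offset = 0; $offset < $total_records; $offset += $chunk_size) {
    job_queue_add(
      'mymodule_import_chunk',
      t('Import records starting at @offset', array('@offset' => $offset)),
      array($offset, $chunk_size)
    );
  }
}

// Callback that Job Queue runs later, a few jobs per cron run.
function mymodule_import_chunk($offset, $limit) {
  // Fetch and save $limit external records starting at $offset here.
}
?>
```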

For my part, I concluded the session by discussing a few specific examples of how to integrate external data into a Drupal website.

The first example, dear to my heart as a librarian and admirer of philosophy, used the Swish-E module (which I maintain on Drupal.org) to full-text index a folder of philosophical essays in PDF format and make them available through Drupal search.

The second example (see below) was probably the coolest and involved some data mining magic: using the Data Miner API to take a Drupal user's MySpace name and automatically download their picture from MySpace for display on their user profile. I call this general technique "Data Enrichment," and it can be used to enhance the data a Drupal site has around either users or specific pieces of node content. Careful consideration is needed here to respect both the terms of service and the privacy of your users.

The final example I presented came from some client work Chapter Three did involving the migration of 20,000+ pieces of raw HTML content through a cool audit system set up as several Views and managed by the Workflow system. This setup seemed to have a lot of practical value for people, and I got a lot of questions about it after the session.
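For the curious, here is a back-of-the-napkin illustration of the data enrichment pattern using only core Drupal 6 functions rather than the actual Data Miner API (which I won't try to reproduce from memory). The profile field name, URL pattern, and regex are all hypothetical, and in real use you would check the terms of service first:

```php
<?php
// Illustrative only: when a user saves a MySpace name on their profile,
// fetch their MySpace page and attach the profile picture to their account.
function mymodule_user($op, &$edit, &$account, $category = NULL) {
  if (($op == 'insert' || $op == 'update') && !empty($edit['profile_myspace_name'])) {
    $url = 'http://www.myspace.com/' . urlencode($edit['profile_myspace_name']);
    $page = drupal_http_request($url);
    // The regex below is a stand-in; the real markup would need inspection.
    if ($page->code == 200 && preg_match('/<img[^>]*class="profileImage"[^>]*src="([^"]+)"/', $page->data, $matches)) {
      $image = drupal_http_request($matches[1]);
      if ($image->code == 200) {
        $filepath = file_save_data($image->data, 'pictures/myspace-' . $account->uid . '.jpg', FILE_EXISTS_REPLACE);
        if ($filepath) {
          $edit['picture'] = $filepath;
        }
      }
    }
  }
}
?>
```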

All in all, Drupalcon was awesome, and we are looking forward to Drupalcon Europe later this year and Drupalcon San Francisco in 2009.
