Pi, Python and I (part 2)

Raspberry PiIn my previous post, I talked about how I’m using a Raspberry Pi to run a Facebook backup service and provided the Python code needed to get (and maintain) a valid Facebook token to do this. This post will be discussing the actual Facebook backup service and the Python code to do that. It will be my second Python program ever (the first was in the previous post), so there will likely be better ways to do what I’ve done, although you’ll see it’s still a pretty simple exercise. (I’m happy to hear about possible improvements.)

The first thing I need to do is pull in all the Python modules that will be useful. The extra packages should’ve been installed from before. Also, because the Facebook messages will be backed-up to Gmail using its IMAP interface, the Google credentials are specified here, too. Given that those credentials are likely to be something you want to keep secret at all costs, all the more reason to run this on a home server rather than on a publicly hosted server.

from facepy import GraphAPI
import urlparse
import dateutil.parser
from crontab import CronTab
import imaplib
import time

# How many status updates to go back in time (first time, and between runs)
MAX_ITEMS = 5000
# How many items to ask for in each request
REQUEST_ITEMS = 25
# Default recipient
DEFAULT_TO = "my_gmail_acct@gmail.com" # Replace with yours
# Suffix to turn Facebook message IDs into email IDs
ID_SUFFIX = "@exportfbfeed.facebook.com"
# Gmail account
GMAIL_USER = "my_gmail_acct@gmail.com" # Replace with yours
# and its secret password
GMAIL_PASS = "S3CR3TC0D3" # Replace with yours
# Gmail folder to use (will be created if necessary)
GMAIL_FOLDER = "Facebook"

Before we get into the guts of the backup service, I first need to create a few basic functions to simplify the code that comes later. Initially, there’s a function that is used to make it easy to pull a value from the results of a call to the Facebook Graph API:

def lookupkey(the_list, the_key, the_default):
  try:
    return the_list[the_key]
  except KeyError:
    return the_default

Next a function to retrieve the Facebook username for a given Facebook user. Given that we want to back-up messages into Gmail, we have to make them look like email. So, each message will have to appear to come from a unique email address belonging to the relevant Facebook user. Since Facebook generally provides all their users with email addresses at the facebook.com domain based on their usernames, I’ve used these. However, to make it a bit more efficient, I cache the usernames in a list so that I don’t have to query Facebook again when the same person appears in the feed multiple times.

def getusername(id, friendlist):
  uname = lookupkey(friendlist, id, '')
  if '' == uname:
    uname = lookupkey(graph.get(str(id)), 'username', id)
    friendlist[id] = uname # Add the entry to the dictionary for next time
  return uname

The email standards expect times and dates to appear in particular formats, so now a function to achieve this based on whatever date format Facebook gives us:

def getnormaldate(funnydate):
  dt = dateutil.parser.parse(funnydate)
  tz = long(dt.utcoffset().total_seconds()) / 60
  tzHH = str(tz / 60).zfill(2)
  if 0 <= tz:
    tzHH = '+' + tzHH
  tzMM = str(tz % 60).zfill(2)
  return dt.strftime("%a, %d %b %Y %I:%M:%S") + ' ' + tzHH + tzMM

Next, a function to find the relevant bit of a URL to help travel back and forth in the Facebook feed. Given that the feed is returned to use from the Graph API in small chunks, we need to know how to query the next or previous chunk in order to get it all. Facebook uses a URL format to give us this information, but I want to unpack it to allow for more targeted navigation.

def getpagingpart(urlstring, part):
  url = urlparse.urlsplit(urlstring)
  qs = urlparse.parse_qs(url.query)
  return qs[part][0]

Now a function to construct the headers and body of the email from a range of information gleaned from processing the Facebook Graph API results.

def message2str(fromname, fromaddr, toname, toaddr, date, subj1, subj2, msgid, msg1, msg2, inreplyto=''):
  if '' == inreplyto:
    header = ''
  else:
    header = 'In-Reply-To: <' + inreplyto + '>\n'
  utcdate = dateutil.parser.parse(date).astimezone(dateutil.tz.tzutc()).strftime("%a %b %d %I:%M:%S %Y")
  return "From nobody {}\nFrom: {} <{}>\nTo: {} <{}>\nDate: {}\nSubject: {} - {}\nMessage-ID: <{}>\n{}Content-Type: text/html\n\n

{}{}

".format(utcdate, fromname, fromaddr, toname, toaddr, date, subj1, subj2, msgid, header, msg1, msg2)

Okay, now we've gotten all that out of the way, here's the main function to process a message obtained from the Graph API and place it in an IMAP message folder. The Facebook message is in the form of a dictionary, so we can look up the relevant parts by using keys. In particular, any comments to a message will appear in the same format, so we recurse over those as well using the same function.

Note that in a couple of places I call encode("ascii", "ignore"). This is an ugly hack that strips out all of the unicode goodness that was in the original Facebook message (which allows foreign language characters and symbols), dropping anything exotic to leave plain ASCII characters behind. However, for some reason, the Python installation on my Raspberry Pi would crash the program whenever it came across unusual characters. To ensure that everything works smoothly, I ensure that these aren't present when the text is processed later.

def printdata(data, friendlist, replytoid='', replytosub='', max=MAX_ITEMS, conn=None):
  c = 0
  for d in data:
    id = lookupkey(d, 'id', '') # get the id of the post
    msgid = id + ID_SUFFIX
    try: # get the name (and id) of the friend who posted it
      f = d['from']
      n = f['name'].encode("ascii", "ignore")
      fid = f['id']
      uname = getusername(fid, friendlist) + "@facebook.com"
    except KeyError:
      n = ''
      fid = ''
      uname = ''
    try: # get the recipient (eg. if a wall post)
      dest = d['to']
      destn = dest['name']
      destid = dest['id']
      destname = getusername(destid, friendlist) + "@facebook.com"
    except KeyError:
      destn = ''
      destid = ''
      destname = DEFAULT_TO
    t = lookupkey(d, 'type', '') # get the type of this post
    try:
      st = d['status_type']
      t += " " + st
    except KeyError:
      pass
    try: # get the message they posted
      msg = d['message'].encode("ascii", "ignore")
    except KeyError:
      msg = ''
    try: # there may also be a description
      desc = d['description'].encode("ascii", "ignore")
      if '' == msg:
        msg = desc
      else:
        msg = msg + "
\n" + desc except KeyError: pass try: # get an associated image img = d['picture'] msg = msg + '
\n' except KeyError: img = '' try: # get link details if they exist ln = d['link'] ln = '
\nlink' except KeyError: ln = '' try: # get the date date = d['created_time'] date = getnormaldate(date) except KeyError: date = '' if '' == msg: continue if '' == replytoid: email = message2str(n, uname, destn, destname, date, t, id, msgid, msg, ln) else: email = message2str(n, uname, destn, destname, date, 'Re: ' + replytosub, replytoid, msgid, msg, ln, replytoid + ID_SUFFIX) if conn: conn.append(GMAIL_FOLDER, "", time.time(), email) else: print email print "----------" try: # process comments if there are any comments = d['comments'] commentdata = comments['data'] printdata(commentdata, friendlist, replytoid=id, replytosub=t, conn=conn) except KeyError: pass c += 1 if c == max: break return c

The last bit of the program uses these functions to perform the backup and to set up a cron job to run the program again every hour. Here's how it works..

First, I grab the Facebook Graph API token that the previous program (setupfbtoken.py) provided, and initialise the module that will be used to query it.

# Initialise the Graph API with a valid access token
try:
  with open("fbtoken.txt", "r") as f:
    oauth_access_token = f.read()
except IOError:
  print 'Run setupfbtoken.py first'
  exit(-1)

# See https://developers.facebook.com/docs/reference/api/user/
graph = GraphAPI(oauth_access_token)

Next, I set up the connection to Gmail that will be used to store the messages using the credentials from before.

# Setup mail connection
mailconnection = imaplib.IMAP4_SSL('imap.gmail.com')
mailconnection.login(GMAIL_USER, GMAIL_PASS)
mailconnection.create(GMAIL_FOLDER)

Now we just need to initialise some things that will be used in the main loop: the cache of the Facebook usernames, the count of the number of status updates to read, and the timestamp that marks the point in time to begin reading status from. This last one is to ensure that we don't keep uploading the same messages again and again, and the timestamp is kept in the file fbtimestamp.txt.

friendlist = {}

countdown = MAX_ITEMS
try:
  with open("fbtimestamp.txt", "r") as f:
    since = '&since=' + f.read()
except IOError:
  since = ''

Now we do the actual work, reading the status feed and processing them:

stream = graph.get('me/home?limit=' + str(REQUEST_ITEMS) + since)
newsince = ''
while stream and 0 < countdown:
  streamdata = stream['data']
  numitems = printdata(streamdata, friendlist, max=countdown, conn=mailconnection)
  if 0 == numitems:
    break;
  countdown -= numitems
  try: # get the link to ask for next (going back in time another step)
    p = stream['paging']
    next = p['next']
    if '' == newsince:
      try:
        prev = p['previous']
        newsince = getpagingpart(prev, 'since')
      except KeyError:
        pass
  except KeyError:
    break
  until = '&until=' + getpagingpart(next, 'until')
  stream = graph.get('me/home?limit=' + str(REQUEST_ITEMS) + since + until)

Now we clean things up: record the new timestamp and close the connection to Gmail.

if '' != newsince:
  with open("fbtimestamp.txt", "w") as f:
    f.write(newsince) # Record the new timestamp for next time

mailconnection.logout()

Finally, we set up a cron job to keep the status updates flowing. As you can probably guess from this code snippet, this all is meant to be saved in a file called exportfbfeed.py.

cron = CronTab() # get crontab for the current user
if [] == cron.find_comment("exportfbfeed"):
  job = cron.new(command="python ~/exportfbfeed.py", comment="exportfbfeed")
  job.minute.on(0) # run this script @hourly, on the hour
  cron.write()

Alright. Well, that was a little longer than I thought it would be. However, the bit that does the actual work is not very big. (No sniggering, people. This is a family show.)

It's been interesting to see how stable the Raspberry Pi has been. While it wasn't designed to be a home server, it's been running fine for me for weeks.

There was an additional benefit to this backup service that I hadn't expected. Since all my email and Facebook messages are now in the one place, I can easily search the lot of them from a single query. In fact, the Facebook search feature isn't very extensive, so it's great that I can now do Google searches to look for things people have sent me via Facebook. It's been a pretty successful project for me and I'm glad I got the chance to play with a Raspberry Pi.

For those that want the original source code files, rather than cut-and-pasting from this blog, you can download them here:

If you end up using this for something, let me know!

Pi, Python and I (part 1)

Raspberry PiI’ve been on Facebook for almost six years now, and active for almost five. This is a long time in Internet time.

Facebook has, captured within it, the majority of my interactions with my friends. Many of them have stopped blogging and just share via Facebook, now. (Although, at least two have started blogging actively in the last year or so, and perhaps all is not lost.) At the start, I wasn’t completely convinced it would still be around – these things tended to grow and then fade within just a few years. So, I wasn’t too concerned about all the *stuff* that Facebook would accumulate and control. I don’t expect them to do anything nefarious with it, but I don’t expect them to look after it, either.

However, I’ve had a slowly building sense that I should do something about it. What if Facebook glitched, and accidentally deleted everything? There’s nothing essential in there, but there are plenty of memories I’d like to preserve. I really wanted my own backup of my interactions with my friends, in the same way I have my own copies of emails that I’ve exchanged with people over the years. (Although, fewer people seem to email these days, and again they just share via Facebook.)

The trigger to finally do something about this was when every geek I knew seemed to have got themselves a Raspberry Pi. I tried to think of an excuse to get one myself, and didn’t have to think too hard. I could finally sort out this Facebook backup issue.

Part of the terms of my web host are that I can’t run any “robots” – it’s purely meant to handle incoming web requests. Also, none of the computers at home are on all the time, as we only have tablets, laptops and phones. I didn’t have a server that I could run backup software on.. but a Raspberry Pi could be that server.

For those who came in late, the Raspberry Pi is a tiny, single-board computer that came out last year, is designed and built in the UK, and (above all) is really, really cheap. I ordered mine from the local distributor, Element14, whose prices start at just under $30 for the Model A. To make it work, you need to at least provide a micro-USB power supply ($5 if you just want to plug it into your car, but more like $20 if you want to plug it into the wall) and a Micro SD card ($5-$10) to provide the disk, so it’s close to $60, unless you already have those to hand. You can get the Model B, which is about $12 more and gets you both more memory and an Ethernet port, which is what I did. You’ll need to find an Ethernet cable as well, in that case ($4).

When a computer comes that cheap, you can afford to get one for projects that would otherwise be too expensive to justify. You can give them to kids to tinker with and there’s no huge financial loss if they brick them. Also, while cheap, they can do decent graphics through an HDMI port, and have been compared to a Microsoft Xbox. No wonder they managed to sell a million units in their first year. Really, I’m a bit slow on the uptake with the Raspberry Pi, but I got there in the end.

While you can run other operating systems onto it, if you get a pre-configured SD card, it comes with a form of Linux called Raspbian and has a programming language called Python set up ready to go. Hence, I figured as well as getting my Facebook backup going, I could use this as an excuse to teach myself Python. I’d looked at it briefly a few years back, but this would be the first time I’d used it in anger. I’ll document here the steps I went through to implement my project, in case anyone else wants to do something similar or just wants to learn from this (if only to learn how simple it is).

The first thing to do is to head over to developers.facebook.com and create a new “App” that will have the permissions that I’ll use to read my Facebook  feed. Once I logged in, I chose “Apps” from the toolbar at the top and clicked on “Create New App”. I gave my app a cool name (like “Awesome Backup Thing”) and clicked on “Continue”, passed the security check to keep out robots, and the app was created. The App ID and App secret are important and should be recorded somewhere for later.

Now I just needed to give it the right permissions. Under the Settings menu, I clicked on “Permissions”, then added in the ones needed into the relevant fields. For what I want, I needed: user_about_me, user_status, friends_about_me, friends_status, and read_stream. “Save Changes” and this step is done. Actually, I’m not sure if this is technically needed, given the next step.

Now I needed to get a token that can be used by the software on the server to query Facebook from time to time. The easiest way is to go to the Graph API Explorer, accessible under the “Tools” menu in the toolbar.

I changed the Application specified in the top right corner to Awesome Backup Thing (insert your name here), then clicked on “Get access token”. Now I need to specify the same permissions as before, across the three tabs of User Data Permissions (user_about_me, user_status), Friends Data Permissions (friends_about_me, friends_status) and Extended Permissions (read_stream). Lastly, I clicked on “Get Access Token”, clicked “OK” to the Facebook confirmation page that appeared, and returned to the Graph API explorer where there was a new token waiting for me in the “Access token” textbox. It’ll be needed later, but it’s valid for about two hours. If you need to generate another one, just click “Get access token” again.

Now it’s time to return to the Pi. Once I logged in, I needed to set up some additional Python packages like this:

$ sudo pip install facepy
$ sudo pip install python-dateutil
$ sudo pip install python-crontab

And then I was ready to write some code. The first thing was to write the code that will keep my access token valid. The one that Facebook provides via the Graph API Explorer expires too quickly and can’t be renewed, so it needs to be turned into a renewable access token with a longer life. This new token then needs to be recorded somewhere so that we can use it for the backing-up. Luckily, this is pretty easy to do with those Python packages. The code looks like this (you’ll need to put in the App ID, App Secret, and Access Token that Facebook gave you):

# Write a long-lived Facebook token to a file and setup cron job to maintain it
import facepy
from crontab import CronTab
import datetime

APP_ID = '1234567890' # Replace with yours
APP_SECRET = 'abcdef123456' # Replace with yours

try:
  with open("fbtoken.txt", "r") as f:
  old_token = f.read()
except IOError:
  old_token = ''
if '' == old_token:
  # Need to get old_token from https://developers.facebook.com/tools/explorer/
  old_token = 'FooBarBaz' # Replace with yours

new_token, expires_on = facepy.utils.get_extended_access_token(old_token, APP_ID, APP_SECRET)

with open("fbtoken.txt", "w") as f:
  f.write(new_token)

cron = CronTab() # get crontab for the current user
for oldjob in cron.find_comment("fbtokenrenew"):
  cron.remove(oldjob)
job = cron.new(command="python ~/setupfbtoken.py", comment="fbtokenrenew")
renew_date = expires_on - datetime.timedelta(1)
job.minute.on(0)
job.hour.on(1) # 1:00am
job.dom.on(renew_date.day)
job.month.on(renew_date.month) # on the day before it's meant to expire
cron.write()

Apologies for the pretty rudimentary Python coding, but it was my first program. The only other things to explain are that the program sits in the home directory as the file “setupfbtoken.py” and when it runs, it writes the long-lived token to “fbtoken.txt” then sets up a cron-job to refresh the token before it expires, by running itself again.

I’ll finish off the rest of the code in the next post.