Professor Craven's Blog

 

Sunday, June 6, 2010

Parsing Facebook using Python

This example parses a saved Facebook page with Python's HTMLParser and pulls out the status updates:


    # Written for Python 3; in Python 2 this module was named HTMLParser.
    from html.parser import HTMLParser


    class MyHTMLParser(HTMLParser):

        def __init__(self):
            HTMLParser.__init__(self)
            # Per-instance state, so two parsers never share one list.
            self.story = False   # True while inside a status message span
            self.time = False    # True right after a timestamp span opens
            self.text = ''       # text collected for the current status
            self.updates = []    # finished status updates, in page order

        def handle_starttag(self, tag, attrs):
            if tag == 'span':
                for name, value in attrs:
                    if name == 'class' and value == 'UIStory_Message':
                        self.story = True
                    if name == 'class' and value == 'UIIntentionalStory_Time':
                        self.time = True

        def handle_endtag(self, tag):
            # Any end tag closes the current status, so markup nested
            # inside the message (a link, for example) truncates it.
            if self.story:
                self.updates.append(self.text)
                self.text = ''
                self.story = False

        def handle_data(self, data):
            if self.story:
                self.text = self.text + data.strip().replace("\n", "")
            if self.time:
                # Prepend the timestamp to the most recent status update.
                if self.updates:
                    self.updates[-1] = data + " " + self.updates[-1]
                self.time = False


    # Read the saved page and feed it to the parser.
    with open('facebook log.htm', 'r') as f:
        htmlSource = f.read()

    myparser = MyHTMLParser()
    myparser.feed(htmlSource)

    # Print the updates in reverse page order.
    for update in reversed(myparser.updates):
        print(update)

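
A boolean flag cannot tell a nested end tag from the one that closes the status span, so a link or another span inside a message can end it early. Here is a minimal sketch of one way around that: count span depth instead. It assumes Python 3 (where the module is html.parser). The class name StatusParser, the message_class parameter, and the sample markup are made up for illustration; only the UIStory_Message and UIIntentionalStory_Time class names come from the listing above, and real Facebook pages may be laid out differently.

```python
from html.parser import HTMLParser


class StatusParser(HTMLParser):
    """Collect the text of spans with a given class (hypothetical helper)."""

    def __init__(self, message_class="UIStory_Message"):
        super().__init__()
        self.message_class = message_class
        self.depth = 0      # > 0 while inside a message span
        self.parts = []     # text fragments of the current message
        self.updates = []   # finished messages, in page order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "span":
                self.depth += 1   # a span nested inside the message
        elif tag == "span" and ("class", self.message_class) in attrs:
            self.depth = 1        # the message span itself

    def handle_endtag(self, tag):
        if self.depth and tag == "span":
            self.depth -= 1
            if self.depth == 0:   # the message span just closed
                self.updates.append(" ".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        # Keep non-empty text fragments found inside a message span.
        if self.depth and data.strip():
            self.parts.append(data.strip())


# Invented sample markup, not a real Facebook page.
sample = (
    '<div><span class="UIStory_Message">Grading <a href="#">final</a> exams</span>'
    '<span class="UIIntentionalStory_Time">2 hours ago</span></div>'
    '<div><span class="UIStory_Message">Done!</span></div>'
)

parser = StatusParser()
parser.feed(sample)
parser.close()
print(parser.updates)
```

This sketch leaves timestamps out to keep the depth counting easy to follow. The same counter could be used for the timestamp span if the times are needed.
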
Posted by Paul Vincent Craven at 12:10 PM