Professor Craven's Blog

Sunday, June 6, 2010

Parsing Facebook using Python

Here is an example of using Python's HTMLParser to pull status updates out of a saved Facebook page:


 try:
     from html.parser import HTMLParser   # Python 3
 except ImportError:
     from HTMLParser import HTMLParser    # Python 2

 class MyHTMLParser(HTMLParser):
     """Collects status updates, each prefixed with its timestamp."""

     def __init__(self):
         HTMLParser.__init__(self)
         self.story = False   # True while inside a status message span
         self.time = False    # True when the next text chunk is a timestamp
         self.text = ''       # text gathered for the current message
         self.updates = []    # finished updates, in page order

     def handle_starttag(self, tag, attrs):
         # Message and timestamp spans are identified by their class names.
         if tag == 'span':
             for name, value in attrs:
                 if name == 'class' and value == 'UIStory_Message':
                     self.story = True
                 if name == 'class' and value == 'UIIntentionalStory_Time':
                     self.time = True

     def handle_endtag(self, tag):
         # Any closing tag ends the current message.
         if self.story:
             self.updates.append(self.text)
             self.text = ''
             self.story = False

     def handle_data(self, data):
         if self.story:
             self.text = self.text + data.strip().replace("\n", "")
         if self.time:
             # Prepend the timestamp to the most recent update.
             if self.updates:
                 self.updates[-1] = data + " " + self.updates[-1]
             self.time = False

 # Read the saved HTML file.
 with open('facebook log.htm', 'r') as f:
     htmlSource = f.read()

 myparser = MyHTMLParser()
 myparser.feed(htmlSource)

 # Print the updates in reverse page order.
 for update in reversed(myparser.updates):
     print(update)
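
The parser above ends a message at the first closing tag it sees, so a status that contains a link would be cut off at the link. Below is a rough sketch of one way around that, assuming Python 3: it counts the tags nested inside the span and stores the text only when the span itself closes. The class name StatusParser, the VOID_TAGS set, and the sample markup are made up for illustration. Only the two span class names come from the code above, and real Facebook pages may be structured differently.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class StatusParser(HTMLParser):
    """Collect [time, message] pairs from Facebook-style span markup."""

    def __init__(self):
        super().__init__()
        self.updates = []    # one [time, message] list per status update
        self._field = None   # "message", "time", or None when not capturing
        self._depth = 0      # open tags nested inside the captured span
        self._chunks = []    # text pieces of the span being captured

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Treat <br> and friends as whitespace inside a captured span.
            if self._field is not None:
                self._chunks.append(" ")
            return
        if self._field is not None:
            self._depth += 1   # a tag nested inside the captured span
            return
        if tag == "span":
            css_class = dict(attrs).get("class")
            if css_class == "UIStory_Message":
                self._field = "message"
            elif css_class == "UIIntentionalStory_Time":
                self._field = "time"

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        if self._depth > 0:
            self._depth -= 1   # a nested tag closed; keep capturing
            return
        # The captured span itself just closed: store its text.
        text = " ".join("".join(self._chunks).split())
        if self._field == "message":
            self.updates.append(["", text])
        elif self.updates:
            self.updates[-1][0] = text
        self._field = None
        self._chunks = []

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


# Hypothetical sample markup, not copied from a real Facebook page.
SAMPLE_HTML = """
<div>
  <span class="UIStory_Message">Grading <a href="#">final
  projects</a> today</span>
  <span class="UIIntentionalStory_Time">2 hours ago</span>
</div>
<div>
  <span class="UIStory_Message">Balloon flight<br>this morning</span>
  <span class="UIIntentionalStory_Time">Yesterday at 7:15am</span>
</div>
"""

parser = StatusParser()
parser.feed(SAMPLE_HTML)
parser.close()

for time_text, message in parser.updates:
    print(time_text + " " + message)
```

Keeping the time and the message as separate fields, instead of joining them into one string right away, also makes it easier to sort or filter the updates later.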
Posted by Paul Vincent Craven at 12:10 PM

