PyCon AU: Exploring Science on Twitter with IPython Notebook and Python Pandas

Brenda gave a great talk at PyCon AU about using IPython and Pandas for her research. Slightly rough notes below.

She has a dataset of 12 million tweets containing the word “science” – about a year’s worth of data, after filtering out non-English tweets and spam.

She uses UTC throughout to reduce timezone problems. Some remain, though: mostly tools that expect the month to come first in a date, which causes date-parsing errors.
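
To make the month-first problem concrete, here is a minimal sketch using only the standard library. The timestamp string is made up for illustration and is not from Brenda's data; the point is that the same text parses to two different dates depending on which field order is assumed, so it helps to state the format explicitly and attach UTC.

```python
from datetime import datetime, timezone

# Hypothetical raw timestamp; "03/04" is ambiguous without a stated format.
raw = "03/04/2013 08:30"

# Day-first reading: 3 April 2013, tagged as UTC.
day_first = datetime.strptime(raw, "%d/%m/%Y %H:%M").replace(tzinfo=timezone.utc)

# Month-first reading: 4 March 2013, tagged as UTC.
month_first = datetime.strptime(raw, "%m/%d/%Y %H:%M").replace(tzinfo=timezone.utc)

print(day_first.isoformat())
print(month_first.isoformat())
```

The two results are a month apart, which is easy to miss when most of the dates in a file happen to parse without error either way.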

Found more tweets about science mid-week than at weekends – this matches wider patterns of Twitter use in other research.

Pandas features used from IPython:

  • describe() – summary of the object.
  • groupby() – reorganize your data structure to group by some attribute.
  • Exports to LaTeX.
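
A rough sketch of how describe() and groupby() might be used on tweet data. The DataFrame, its column names and its five rows are all invented for illustration, and this is not Brenda's actual code. It summarizes tweet lengths and counts tweets per weekday, the kind of grouping that could sit behind a mid-week versus weekend comparison.

```python
import pandas as pd

# Hypothetical sample data: made-up timestamps and text, not the talk's dataset.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime(
        [
            "2013-07-01 09:00",  # Monday
            "2013-07-02 10:30",  # Tuesday
            "2013-07-03 12:00",  # Wednesday
            "2013-07-03 18:45",  # Wednesday
            "2013-07-06 08:15",  # Saturday
        ],
        utc=True,
    ),
    "text": [
        "science is great",
        "new science paper out",
        "science!",
        "I love science class",
        "weekend science reading",
    ],
})

# describe(): count, mean, std, min, quartiles and max of tweet length.
tweets["length"] = tweets["text"].str.len()
summary = tweets["length"].describe()

# groupby(): number of tweets per weekday name.
per_day = tweets.groupby(tweets["created_at"].dt.day_name()).size()

print(summary)
print(per_day)
```

For the LaTeX export, pandas provides DataFrame.to_latex(); I have left it out of the sketch because recent pandas versions need an extra dependency for it.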

IP[y] : Notebook

  • Really cool – make notes about what you are doing, interleaved with code.
  • Great for research.
  • ? – inline help.
  • ?? – inline source.
  • %%timeit – times execution, useful for measuring performance.
  • %pastebin – sends code to a pastebin.
  • %save – makes a .py file
  • %run – run a script
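
The magics above only work inside IPython, but %%timeit is built on Python's standard timeit module, so the same kind of measurement can be sketched in a plain script. The function and word list below are hypothetical stand-ins, not code from the talk.

```python
import timeit

def count_science(words):
    """Count case-insensitive occurrences of the word 'science'."""
    return sum(1 for w in words if w.lower() == "science")

# Hypothetical input: a small repeated word list standing in for tweet text.
words = ["Science", "is", "fun", "science"] * 1000

# Run the function 100 times and report the average time per call.
runs = 100
elapsed = timeit.timeit(lambda: count_science(words), number=runs)
print(f"{elapsed / runs:.6f} s per call")
```

In a notebook, putting %%timeit on the first line of a cell does roughly this for the whole cell, and picks the number of runs for you.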

Pandas

  • Data structures.
  • Data analysis.
  • Time-based indexing.
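
As an illustration of time-based indexing, here is a minimal sketch with a made-up series of daily tweet counts indexed by UTC timestamps. The numbers and the date range are assumptions for the example only. It selects a month by label and then rolls the days up into weeks.

```python
import pandas as pd

# Hypothetical daily tweet counts, 29 June to 3 July 2013, indexed in UTC.
index = pd.date_range("2013-06-29", periods=5, freq="D", tz="UTC")
daily = pd.Series([1, 2, 3, 4, 5], index=index)

# Partial-string indexing: every row that falls in July 2013.
july = daily.loc["2013-07"]

# Resampling: total per week (weekly bins end on Sunday by default).
weekly = daily.resample("W").sum()

print(july)
print(weekly)
```

Selecting by "2013-07" rather than by row position is what makes a datetime index convenient for questions like "what happened that month?".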

Overall

I’m pretty fascinated by the results of this research, which we didn’t see much of as the talk was about the technical setup. I feel like this setup would have been incredibly handy when I was doing my own research, though, and it was good to chat to Brenda at our women’s breakfast and compare notes on other tools like Processing, Prefuse, etc.