Nathaniel Brown's

Guide To Learning Python For Data Analysis

Written March 26, 2016

Often people ask me for advice on how to learn Python for data analysis. I also proffer advice to people who probably don't care. Either way, I usually end up forwarding them an email that I first composed for Steven Pampreen. This post is a slight improvement on that email.

This is a two-step process. One, learn Python. Two, learn numpy, matplotlib and pandas, three of the most important libraries for doing data analysis with Python.

Learning Python

Probably start learning Python with this Udacity course. Then continue with, or maybe skip to, this one.

They have rearranged things a bit since I took these courses. It used to be that the second course was an add-on to the first, teaching you object-oriented features, but now I am not so sure how it is set up.

If you want more help memorizing things, Codecademy is a little more brainless/interactive. I have not used it much, but I feel it may help you memorize syntax faster at the expense of understanding.

Learn the Libraries

To be an effective Python data analyst, you basically need to know three libraries:

  • numpy provides lower-level vectorization, efficient data structures, and linear algebra tools
  • matplotlib makes plots
  • pandas builds on these tools to make you move at lightning speed!
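To give a feel for how the three fit together, here is a minimal sketch. The temperature readings, the city labels, and the output file name are all made up for illustration, and the Agg backend is chosen only so the script runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw to a file, not a window
import matplotlib.pyplot as plt

# numpy: arithmetic applies to the whole array at once, no explicit loop
temps_c = np.array([12.0, 15.5, 9.0, 21.0])  # made-up readings in Celsius
temps_f = temps_c * 9 / 5 + 32

# pandas: labeled, tabular data built on top of numpy arrays
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],  # hypothetical city labels
    "temp_f": temps_f,
})
mean_by_city = df.groupby("city")["temp_f"].mean()

# matplotlib: turn the summary into a bar chart
fig, ax = plt.subplots()
ax.bar(mean_by_city.index, mean_by_city.values)
ax.set_ylabel("mean temperature (F)")
fig.savefig("mean_temp_by_city.png")  # hypothetical output file name
```

In a Jupyter Notebook you would typically drop the `matplotlib.use("Agg")` line and let the plot render inline instead of saving it.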

I have not taken this but it looks good: Intro to Data Analysis. It should help with learning these libraries.

And read Python for Data Analysis. It was written by the author of pandas, Wes McKinney, a very awesome dude who you should look up.

Setup Your Environment

Download Python here. This distribution (Anaconda) will include almost everything you need, including some libraries that are particularly difficult to install on Windows (like numpy). You may have to edit your system path to include your installation; Google this. If you need libraries not in the Anaconda distribution, use pip. You should also check out the IPython Notebook, now called the Jupyter Notebook. I believe that after installing Anaconda you should be able to launch one with “jupyter notebook” at a command prompt.

Also, there is the command line version of IPython (started with “ipython”). I almost never use the Python shell itself; I use either IPython or the Jupyter Notebook. Wes McKinney gives a very good guide to setting up and using your environment in his book.
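Once everything is installed, a quick sanity check is to confirm that the three libraries actually import. The snippet below is just a sketch of one way to do that; `check_libraries` is a name I made up, not something that ships with Anaconda.

```python
import importlib


def check_libraries(names=("numpy", "matplotlib", "pandas")):
    """Return {name: version string}, with None for anything that fails to import."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # not installed, or not on the path
            continue
        # Most libraries expose __version__, but fall back just in case
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_libraries().items():
        if version is None:
            print(f"{name}: missing (try: pip install {name})")
        else:
            print(f"{name}: {version}")
```

Paste it into an IPython session or save it as a script and run it. If anything shows up as missing, your system path is the first thing to check.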

And have a blast! You will get super powers.