
R and pandas and what I've learned about each

by yhat

In step with our recent article about essential R packages, this post explores tools for data analysis in Python.

What is pandas?

pandas is the utility belt for data analysts using Python. The package centers on the pandas DataFrame, a two-dimensional data structure with indexable rows and columns. It has effectively taken the best parts of base R and of R packages like plyr and reshape2 and consolidated them into a single library. It has lots of features (see library highlights). pandas gets its name from panel data, an econometrics term for multidimensional structured datasets (McKinney 5., 2013).
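To make "indexable rows and columns" concrete, here is a minimal sketch of a DataFrame. The data, column names, and row labels are made up for illustration and are not from any dataset used in this post.

```python
import pandas as pd

# Hypothetical toy data; names and values are invented for this example.
df = pd.DataFrame(
    {"height_cm": [180, 165, 172], "weight_kg": [80.0, 55.5, 68.2]},
    index=["alice", "bob", "carol"],
)

# Select a column by name (returns a Series).
heights = df["height_cm"]

# Select a row by its label, and a single cell by row and column label.
bob = df.loc["bob"]
carol_weight = df.loc["carol", "weight_kg"]

# Select by integer position instead of label.
first_row = df.iloc[0]

print(df.shape)      # (3, 2)
print(carol_weight)  # 68.2
```

If you're coming from R, `df.loc[...]` and `df.iloc[...]` play roughly the role of label-based and position-based indexing on a data.frame.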

pandas has a lot in common with R (see pandas comparison with R), and as someone who's familiar with R and Python (but not specifically pandas), I've found pandas extremely easy to use. This is a post about R and pandas and about what I've learned about each.
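As a rough illustration of that overlap, the sketch below shows pandas counterparts to two common R idioms: split-apply-combine (what plyr's ddply is typically used for) and wide-to-long reshaping (reshape2's melt). The data frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical measurements; column names are invented for this example.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 3.0, 2.0, 6.0],
    "width": [0.5, 1.5, 1.0, 3.0],
})

# Split-apply-combine: the mean of each measurement per species.
means = df.groupby("species")[["length", "width"]].mean()

# Wide-to-long reshaping: one row per (species, measure) observation.
long_df = pd.melt(df, id_vars="species", var_name="measure", value_name="value")

print(means)
print(long_df.shape)  # (8, 3)
```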

Munging and Plotting in Python

This post only scratches the surface of pandas' functionality. One topic it doesn't cover is pandas' excellent time series capabilities (similar to zoo in R), which are extensive enough to merit their own post. In the meantime, you can check out some of Wes McKinney's great tutorials.
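As a small taste of the time series side, here is a minimal sketch using a made-up daily series. The dates and values are invented, and this covers only a sliver of what pandas offers.

```python
import pandas as pd

# Hypothetical daily series; dates and values are invented for this example.
idx = pd.date_range("2013-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Slice by date strings (both endpoints are included).
first_three = ts["2013-01-01":"2013-01-03"]

# Downsample into 2-day bins and sum each bin.
two_day = ts.resample("2D").sum()

# 3-day rolling mean; the first two entries are NaN until the window fills.
rolling = ts.rolling(window=3).mean()

print(two_day.tolist())  # [3.0, 7.0, 11.0]
```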

