The Yhat Blog
machine learning, data science, engineering

Rodeo: A data science IDE for Python
by Greg, Apr 23 2015. Introducing our latest open source project: the Rodeo IDE.

Building a Client-side Blog Search Algorithm
by Greg, Apr 14 2015. How we built a page recommender to power our blog's search engine.

db.py 0.4: Handlebars Meets SQL
by Greg, Mar 13 2015. Learn how to use db.py and Handlebars to make your SQL scripts shorter and easier to read.

ML Pitfalls: Measuring Performance (Part 1)
by Eric, Mar 03 2015. Common machine learning pitfalls and how to avoid them.

Base R Plots
by Greg, Feb 23 2015. Introduction to plotting and graphics in R (without ggplot2)

What is Linear Regression? A Qualitative Exploration
by Greg, Feb 19 2015. A high-level introduction to what linear regression is and how it works.

11 Python Libraries You Might Not Know
by Greg, Jan 20 2015. A highlight of 10 lesser-known Python libraries that even experienced Pythonistas may not have seen!

Running R in Parallel (the easy way)
by Greg, Jan 14 2015. Running code in parallel is tricky. This post shows how to quickly (and easily) parallelize your R code.

Currency Portfolio Optimization Using ScienceOps
by Ryan J. O'Neil, Jan 05 2015. Create a currency portfolio optimization algorithm and deploy it to ScienceOps

Scraping and Analyzing Baseball Data with R
by Greg, Dec 23 2014. A quick how-to on scraping and analyzing MLB data using R.

Reducing your R memory footprint by 7000x
by Greg, Dec 17 2014. R can be a bit bloated sometimes. Learn how to make your R models more efficient.

Naive Bayes in Python
by Greg, Dec 11 2014. How to implement your own naive Bayes classifier in Python, with a detailed explanation of how it all works.

Introducing db.r
by Greg, Dec 04 2014. db.py but for R. A database library that makes working with SQL in R a little more enjoyable.

How Yhat Does Cloud Balancing: A Case Study
by Ryan J. O'Neil, Nov 10 2014. How we use optimization to minimize our server costs without impacting server uptime.

Introducing db.py
by Greg Lamp, Nov 05 2014. Our latest contribution to the open source community: db.py. A database library for working with SQL in pandas/python.

Using data science to build better products
by Colin Ristig, Sep 17 2014. How data science and machine learning can be embedded into products to make them better.

Analysing your ecommerce funnel with R
by Justin Marciszewski, Aug 05 2014. Case study using R to evaluate the impact of your website changes

Fuzzy Matching with Yhat
by Greg, Jul 23 2014. A use case for Yhat and the python library "fuzzywuzzy": building your own string matching service.

Yhat ScienceBox
by Colin Ristig, Jun 17 2014. A brief overview of our newest product: ScienceBox!

Python Sparse Random Projections
by Adrian Rosebrock, Jun 05 2014. Sparse Random Projections are great for dimensionality reduction. Here's a great example.

Yhat meets Go
by Jess Frazelle, May 29 2014. We're big fans of Go, and here's one of the ways we use it.

Neural networks and a dive into Julia
by Eric Chiang, May 15 2014. An intro to Julia and how to use it to build neural networks.

ggplot tutorial
by Greg, May 02 2014. Want to try ggplot for Python? This post shows you how to analyze MLB data using ggplot and pandas.

Python Multi-armed Bandits (and Beer!)
by Eric Chiang, Apr 07 2014. Bandit algorithms + beer. A match made in heaven.

Predicting customer churn with scikit-learn
by Eric Chiang, Mar 20 2014. Using machine learning to predict which customers are likely to churn.

Real-time NLP with Twitter and Yhat
by Greg, Mar 14 2014. Using the Twitter streaming API, NLTK, and Yhat to classify tweets in real time.

Yhat at NY Enterprise Technology Meetup
by Greg, Mar 11 2014. Our NYC Enterprise Tech meetup talk/slides.

Yhat at the SF Data Science Meetup
by Greg, Feb 17 2014. Our talk at the SF data science meetup.

Image Processing with scikit-image
by Eric Chiang, Jan 30 2014. An introduction to the wonderful python package, scikit-image.


Data Science in Python
by Greg, Jan 13 2014. A series of IPython notebooks that give an introduction to using Python for data science.

Detecting Outlier Car Prices on the Web
by Josh Levy, Dec 18 2013. A case study of how Vast.com detects outliers using ScienceOps.

Weather Forecasting with Twitter & Pandas
by Eric Chiang, Dec 05 2013. How to forecast the weather using Twitter data and the pandas library

Building email reports with R
by yhat, Nov 22 2013. Use R to send dashboards and reports by email


ggplot for python
by Yhat, Oct 13 2013. Announcing the initial version of ggplot for Python!

Random Forest Regression and Classification in R and Python
by yhat, Sep 29 2013. Side-by-side comparison of various Random Forest implementations in R and Python

Fast summary statistics in R with data.table
by Jeff, Sep 26 2013. R can be a bit slow. If you need to speed things up, give data.table a try. This post provides a quick intro with some useful snippets.

Two great things that go great together: Yhat and fantasy football
by Drew Conway, Aug 25 2013. Learn how to predict a fantasy football draft with Drew Conway.

Estimating User Lifetimes: the right and many wrong ways
by Cam Davidson-Pilon, Aug 20 2013. Learn how to use PyMC to determine which users will stick with you.

Machine Learning for Predicting Bad Loans
by yhat, Aug 16 2013. Using the open LendingClub dataset to develop a credit model.


PyData Boston 2013 Slides
by yhat, Jul 29 2013. A review of our presentation at PyData Boston, 2013.

Intuitive Classification using KNN and Python
by yhat, Jul 25 2013. Overview of K-Nearest Neighbors and how to use it

Recognizing Handwritten Digits in Python
by yhat, Jul 14 2013. Building a handwriting detector using ScienceOps and node.js.

Named Entities in Law & Order Episodes
by yhat, Jul 04 2013. A post combining our two favorite things: Law & Order and Natural Language Processing.

Running R in the Cloud (Part 1)
by yhat, Jun 27 2013. Getting up and running with RStudio on EC2.

Statistical Quality Control in R
by yhat, Jun 25 2013. An in-depth look at the qcc quality control library in R for catching outliers in time series data.


Content-based image classification in Python
by yhat, Jun 12 2013. Using machine learning to classify images based on their contents

Random Forests in Python
by yhat, Jun 05 2013. An introduction to working with random forests in Python.

Fitting & Interpreting Linear Models in R
by yhat, May 18 2013. An overview of inspecting linear model results in R.

Deploy Your R Models to yhat
by yhat, May 10 2013. Announcing Yhat support for R. Learn how to deploy your R models on ScienceOps.

pandas & google analytics
by yhat, Apr 12 2013. Learn how to import your Google Analytics data into pandas and analyze it using python.

7 handy SQL features for data scientists
by yhat, Apr 09 2013. Some tips and tricks for data scientists using SQL.


Logistic Regression in Python
by yhat, Mar 03 2013. The basics you need to know for doing logistic regression in Python.

SQL for pandas DataFrames
by yhat, Feb 24 2013. Use SQL but new to python? Check out pandasql, an easy way for SQL users to learn pandas.

R and pandas and what I've learned about each
by yhat, Feb 16 2013. Showcasing common data analysis operations in R and Python

Setting Up Scientific Python
by yhat, Feb 15 2013. A quick getting started guide to using scientific python.

10 R packages I wish I knew about earlier
by yhat, Feb 10 2013. 10 great R packages that we love and use every day!

