ŷhat

The Yhat Blog


machine learning, data science, engineering




  • Rodeo: A data science IDE for Python

    by Greg | Apr 23 2015

    Introducing our latest open source project: the Rodeo IDE.


  • Building a Client-side Blog Search Algorithm

    by Greg | Apr 14 2015

    How we built a page recommender to power our blog's search engine.


  • db.py 0.4: Handlebars Meets SQL

    by Greg | Mar 13 2015

    Learn how to use db.py and Handlebars to make your SQL scripts shorter and easier to read.


  • ML Pitfalls: Measuring Performance (Part 1)

    by Eric | Mar 03 2015

    Common machine learning pitfalls and how to avoid them.


  • Base R Plots

    by Greg | Feb 23 2015

    Introduction to plotting and graphics in R (without ggplot2)


  • What is Linear Regression? A Qualitative Exploration

    by Greg | Feb 19 2015

    A high level introduction to what linear regression is and how it works.


  • 11 Python Libraries You Might Not Know

    by Greg | Jan 20 2015

    A highlight of lesser-known Python libraries that even experienced Pythonistas may not have seen!


  • Running R in Parallel (the easy way)

    by Greg | Jan 14 2015

    Running code in parallel is tricky. This post shows how to quickly (and easily) parallelize your R code.


  • Currency Portfolio Optimization Using ScienceOps

    by Ryan J. O'Neil | Jan 05 2015

    Create a currency portfolio optimization algorithm and deploy it to ScienceOps


  • Scraping and Analyzing Baseball Data with R

    by Greg | Dec 23 2014

    A quick how-to on scraping and analyzing MLB data using R.


  • Reducing your R memory footprint by 7000x

    by Greg | Dec 17 2014

    R can be a bit bloated sometimes. Learn how to make your R models more efficient.


  • Naive Bayes in Python

    by Greg | Dec 11 2014

    How to implement your own naive bayes classifier in Python and a detailed explanation of how it all works.


  • Introducing db.r

    by Greg | Dec 04 2014

    db.py but for R. A database library that makes working with SQL in R a little more enjoyable.


  • How Yhat Does Cloud Balancing: A Case Study

    by Ryan J. O'Neil | Nov 10 2014

    How we use optimization to minimize our server costs without impacting server up-time.


  • Introducing db.py

    by Greg Lamp | Nov 05 2014

    Our latest contribution to the open source community: db.py. A database library for working with SQL in pandas/python.


  • Using data science to build better products

    by Colin Ristig | Sep 17 2014

    How data science and machine learning can be embedded into products to make them better.


  • Analysing your e-commerce funnel with R

    by Justin Marciszewski | Aug 05 2014

    Case study using R to evaluate the impact of your website changes


  • Fuzzy Matching with Yhat

    by Greg | Jul 23 2014

    A use case for Yhat and the Python library "fuzzywuzzy": building your own string matching service.


  • Yhat ScienceBox

    by Colin Ristig | Jun 17 2014

    A brief overview of our newest product: ScienceBox!


  • Python Sparse Random Projections

    by Adrian Rosebrock | Jun 05 2014

    Sparse random projections are great for dimensionality reduction. Here's an example.


  • Yhat meets Go

    by Jess Frazelle | May 29 2014

    We're big fans of Go, and here's one of the ways we use it.


  • Neural networks and a dive into Julia

    by Eric Chiang | May 15 2014

    An intro to Julia and how to use it to build neural networks.


  • ggplot tutorial

    by Greg | May 02 2014

    Want to try ggplot for Python? This post shows you how to analyze MLB data using ggplot and pandas.


  • Python Multi-armed Bandits (and Beer!)

    by Eric Chiang | Apr 07 2014

    Bandit algorithms + beer. A match made in heaven.


  • Predicting customer churn with scikit-learn

    by Eric Chiang | Mar 20 2014

    Using machine learning to predict which customers are likely to churn.


  • Real-time NLP with Twitter and Yhat

    by Greg | Mar 14 2014

    Using the Twitter streaming API, NLTK, and Yhat to classify tweets in real-time.


  • Yhat at NY Enterprise Technology Meetup

    by Greg | Mar 11 2014

    Our NYC Enterprise Tech meetup talk/slides.


  • Yhat at the SF Data Science Meetup

    by Greg | Feb 17 2014

    Our talk at the SF data science meetup.


  • Image Processing with scikit-image

    by Eric Chiang | Jan 30 2014

    An introduction to the wonderful python package, scikit-image.


  • What's new in ggplot-0.4?

    by Yhat | Jan 22 2014

    Announcing ggplot 0.4 for Python!


  • Data Science in Python

    by Greg | Jan 13 2014

    A series of IPython notebooks that give an introduction to using Python for data science.


  • Detecting Outlier Car Prices on the Web

    by Josh Levy | Dec 18 2013

    A case study of how Vast.com detects outliers using ScienceOps.


  • Weather Forecasting with Twitter & Pandas

    by Eric Chiang | Dec 05 2013

    How to forecast the weather using Twitter data and the pandas library


  • Building email reports with R

    by yhat | Nov 22 2013

    Use R to send dashboards and reports by email


  • Aggregating & plotting time series in python

    by yhat | Nov 03 2013


  • ggplot for python

    by Yhat | Oct 13 2013

    Announcing ggplot for Python! Our initial version of ggplot for python.


  • Random Forest Regression and Classification in R and Python

    by yhat | Sep 29 2013

    Side by side comparison of various Random Forest implementations in R and Python


  • Fast summary statistics in R with data.table

    by Jeff | Sep 26 2013

    R can be a bit slow. If you need to speed things up, give data.table a try. This post provides a quick intro with some useful snippets.


  • Two great things that go great together: Yhat and fantasy football

    by Drew Conway | Aug 25 2013

    Learn how to predict a fantasy football draft with Drew Conway.


  • Estimating User Lifetimes - the right and many wrong ways

    by Cam Davidson-Pilon | Aug 20 2013

    Learn how to use PyMC to determine which users will stick with you.


  • Machine Learning for Predicting Bad Loans

    by yhat | Aug 16 2013

    Using the open LendingClub dataset to develop a credit model.


  • 10 Books for Data Enthusiasts

    by yhat | Aug 11 2013

    Our 10 favorite data books.


  • PyData Boston 2013 Slides

    by yhat | Jul 29 2013

    A review of our presentation at PyData Boston, 2013.


  • Intuitive Classification using KNN and Python

    by yhat | Jul 25 2013

    Overview of K-Nearest Neighbors and how to use it


  • Recognizing Handwritten Digits in Python

    by yhat | Jul 14 2013

    Building a handwriting detector using ScienceOps and node.js.


  • Named Entities in Law & Order Episodes

    by yhat | Jul 04 2013

    A post combining our two favorite things: Law & Order and Natural Language Processing.


  • Running R in the Cloud (Part 1)

    by yhat | Jun 27 2013

    Getting up and running with RStudio on EC2.


  • Statistical Quality Control in R

    by yhat | Jun 25 2013

    An in-depth look at the qcc quality control library in R for catching outliers in time series data.


  • Recommendation System in R

    by yhat | Jun 19 2013

    Building a beer recommender using R.


  • Content-based image classification in Python

    by yhat | Jun 12 2013

    Using machine learning to classify images based on their contents


  • Random Forests in Python

    by yhat | Jun 05 2013

    An introduction to working with random forests in Python.


  • Fitting & Interpreting Linear Models in R

    by yhat | May 18 2013

    An overview of inspecting linear model results in R.


  • Deploy Your R Models to yhat

    by yhat | May 10 2013

    Announcing Yhat support for R. Learn how to deploy your R models on ScienceOps.


  • pandas & google analytics

    by yhat | Apr 12 2013

    Learn how to import your Google Analytics data into pandas and analyze it using python.


  • 7 handy SQL features for data scientists

    by yhat | Apr 09 2013

    Some tips and tricks for data scientists using SQL.


  • yhat is going to PyCon

    by yhat | Mar 10 2013

    Our talk at PyCon.


  • Logistic Regression in Python

    by yhat | Mar 03 2013

    The basics you need to know for doing logistic regression in Python.


  • SQL for pandas DataFrames

    by yhat | Feb 24 2013

    Use SQL but new to python? Check out pandasql. An easy way for SQL users to learn pandas.


  • R and pandas and what I've learned about each

    by yhat | Feb 16 2013

    Showcasing common data analysis operations in R and Python


  • Setting Up Scientific Python

    by yhat | Feb 15 2013

    A quick getting started guide to using scientific python.


  • 10 R packages I wish I knew about earlier

    by yhat | Feb 10 2013

    10 great R packages that we love and use every day!


  • Predicting SMS spam

    by yhat | Jan 08 2013

    Integrating Twilio with scikit-learn using Yhat.


  • Repeatable, Scalable, Analytics using yhat

    by yhat | Jan 05 2013

    Welcome to the Yhat blog!