The Yhat Blog

machine learning, data science, engineering

  • Rodeo: A data science IDE for Python

    by Greg | Apr 23 2015

    Introducing our latest open source project: the Rodeo IDE.

  • Building a Client-side Blog Search Algorithm

    by Greg | Apr 14 2015

    How we built a page recommender to power our blog's search engine.

  • db.py 0.4: Handlebars Meets SQL

    by Greg | Mar 13 2015

    Learn how to use db.py and Handlebars to make your SQL scripts shorter and easier to read.

  • ML Pitfalls: Measuring Performance (Part 1)

    by Eric | Mar 03 2015

    Common machine learning pitfalls and how to avoid them.

  • Base R Plots

    by Greg | Feb 23 2015

    Introduction to plotting and graphics in R (without ggplot2)

  • What is Linear Regression? A Qualitative Exploration

    by Greg | Feb 19 2015

    A high level introduction to what linear regression is and how it works.

  • 11 Python Libraries You Might Not Know

    by Greg | Jan 20 2015

    A highlight of 11 lesser-known Python libraries that even experienced Pythonistas may not have seen!

  • Running R in Parallel (the easy way)

    by Greg | Jan 14 2015

    Running code in parallel is tricky. This post shows how to quickly (and easily) parallelize your R code.

  • Currency Portfolio Optimization Using ScienceOps

    by Ryan J. O'Neil | Jan 05 2015

    Create a currency portfolio optimization algorithm and deploy it to ScienceOps

  • Scraping and Analyzing Baseball Data with R

    by Greg | Dec 23 2014

    A quick howto on scraping and analyzing MLB data using R.

  • Reducing your R memory footprint by 7000x

    by Greg | Dec 17 2014

    R can be a bit bloated sometimes. Learn how to make your R models more efficient.

  • Naive Bayes in Python

    by Greg | Dec 11 2014

    How to implement your own naive bayes classifier in Python and a detailed explanation of how it all works.

  • Introducing db.r

    by Greg | Dec 04 2014

    db.py but for R. A database library that makes working with SQL in R a little more enjoyable.

  • How Yhat Does Cloud Balancing: A Case Study

    by Ryan J. O'Neil | Nov 10 2014

    How we use optimization to minimize our server costs without impacting server uptime.

  • Introducing db.py

    by Greg Lamp | Nov 05 2014

    Our latest contribution to the open source community: db.py. A database library for working with SQL in pandas/python.

  • Using data science to build better products

    by Colin Ristig | Sep 17 2014

    How data science and machine learning can be embedded into products to make them better.

  • Analysing your e-commerce funnel with R

    by Justin Marciszewski | Aug 05 2014

    Case study using R to evaluate the impact of your website changes

  • Fuzzy Matching with Yhat

    by Greg | Jul 23 2014

    A use case for Yhat and the Python library "fuzzywuzzy": building your own string matching service.

  • Yhat ScienceBox

    by Colin Ristig | Jun 17 2014

    A brief overview of our newest product: ScienceBox!

  • Python Sparse Random Projections

    by Adrian Rosebrock | Jun 05 2014

    Sparse Random Projections are great for dimensionality reduction. Here's a great example.

  • Yhat meets Go

    by Jess Frazelle | May 29 2014

    We're big fans of Go, and here's one of the ways we use it.

  • Neural networks and a dive into Julia

    by Eric Chiang | May 15 2014

    An intro to Julia and how to use it to build neural networks.

  • ggplot tutorial

    by Greg | May 02 2014

    Want to try ggplot for Python? This post shows you how to analyze MLB data using ggplot and pandas.

  • Python Multi-armed Bandits (and Beer!)

    by Eric Chiang | Apr 07 2014

    Bandit algorithms + beer. A match made in heaven.

  • Predicting customer churn with scikit-learn

    by Eric Chiang | Mar 20 2014

    Using machine learning to predict which customers are likely to churn.

  • Real-time NLP with Twitter and Yhat

    by Greg | Mar 14 2014

    Using the Twitter streaming API, NLTK, and Yhat to classify tweets in real-time.

  • Yhat at NY Enterprise Technology Meetup

    by Greg | Mar 11 2014

    Our NYC Enterprise Tech meetup talk/slides.

  • Yhat at the SF Data Science Meetup

    by Greg | Feb 17 2014

    Our talk at the SF data science meetup.

  • Image Processing with scikit-image

    by Eric Chiang | Jan 30 2014

    An introduction to the wonderful python package, scikit-image.

  • What's new in ggplot-0.4?

    by Yhat | Jan 22 2014

    Announcing ggplot 0.4 for Python!

  • Data Science in Python

    by Greg | Jan 13 2014

    A series of IPython notebooks that give an introduction to using Python for data science.

  • Detecting Outlier Car Prices on the Web

    by Josh Levy | Dec 18 2013

    A case study of how Vast.com detects outliers using ScienceOps.

  • Weather Forecasting with Twitter & Pandas

    by Eric Chiang | Dec 05 2013

    How to forecast the weather using Twitter data and the pandas library

  • Building email reports with R

    by yhat | Nov 22 2013

    Use R to send dashboards and reports by email

  • Aggregating & plotting time series in python

    by yhat | Nov 03 2013

  • ggplot for python

    by Yhat | Oct 13 2013

    Announcing the initial version of ggplot for Python!

  • Random Forest Regression and Classification in R and Python

    by yhat | Sep 29 2013

    Side by side comparison of various Random Forest implementations in R and Python

  • Fast summary statistics in R with data.table

    by Jeff | Sep 26 2013

    R can be a bit slow. If you need to speed things up, give data.table a try. This post provides a quick intro with some useful snippets.

  • Two great things that go great together: Yhat and fantasy football

    by Drew Conway | Aug 25 2013

    Learn how to predict a fantasy football draft with Drew Conway.

  • Estimating User Lifetimes - the right and many wrong ways

    by Cam Davidson-Pilon | Aug 20 2013

    Learn how to use PyMC to determine which users will stick with you.

  • Machine Learning for Predicting Bad Loans

    by yhat | Aug 16 2013

    Using the open LendingClub dataset to develop a credit model.

  • 10 Books for Data Enthusiasts

    by yhat | Aug 11 2013

    Our 10 favorite data books.

  • PyData Boston 2013 Slides

    by yhat | Jul 29 2013

    A review of our presentation at PyData Boston, 2013.

  • Intuitive Classification using KNN and Python

    by yhat | Jul 25 2013

    Overview of K-Nearest Neighbors and how to use it

  • Recognizing Handwritten Digits in Python

    by yhat | Jul 14 2013

    Building a handwriting detector using ScienceOps and node.js.

  • Named Entities in Law & Order Episodes

    by yhat | Jul 04 2013

    A post combining our two favorite things: Law & Order and Natural Language Processing.

  • Running R in the Cloud (Part 1)

    by yhat | Jun 27 2013

    Getting up and running with RStudio on EC2.

  • Statistical Quality Control in R

    by yhat | Jun 25 2013

    An in-depth look at the qcc quality control library in R for catching outliers in time series data.

  • Recommendation System in R

    by yhat | Jun 19 2013

    Building a beer recommender using R.

  • Content-based image classification in Python

    by yhat | Jun 12 2013

    Using machine learning to classify images based on their contents.

  • Random Forests in Python

    by yhat | Jun 05 2013

    An introduction to working with random forests in Python.

  • Fitting & Interpreting Linear Models in R

    by yhat | May 18 2013

    An overview of inspecting linear model results in R.

  • Deploy Your R Models to yhat

    by yhat | May 10 2013

    Announcing Yhat support for R. Learn how to deploy your R models on ScienceOps.

  • pandas & google analytics

    by yhat | Apr 12 2013

    Learn how to import your Google Analytics data into pandas and analyze it using python.

  • 7 handy SQL features for data scientists

    by yhat | Apr 09 2013

    Some tips and tricks for data scientists using SQL.

  • yhat is going to PyCon

    by yhat | Mar 10 2013

    Our talk at PyCon.

  • Logistic Regression in Python

    by yhat | Mar 03 2013

    The basics you need to know for doing logistic regression in Python.

  • SQL for pandas DataFrames

    by yhat | Feb 24 2013

    Know SQL but new to Python? Check out pandasql, an easy way for SQL users to learn pandas.

  • R and pandas and what I've learned about each

    by yhat | Feb 16 2013

    Showcasing common data analysis operations in R and Python

  • Setting Up Scientific Python

    by yhat | Feb 15 2013

    A quick getting started guide to using scientific python.

  • 10 R packages I wish I knew about earlier

    by yhat | Feb 10 2013

    10 great R packages that we love and use every day!

  • Predicting SMS spam

    by yhat | Jan 08 2013

    Integrating Twilio with scikit-learn using Yhat.

  • Repeatable, Scalable, Analytics using yhat

    by yhat | Jan 05 2013

    Welcome to the Yhat blog!