pydata

Keep Looking, Don't Settle

Exploratory analysis of Two Sigma Financial Modeling Challenge

Two Sigma provides an interesting dataset: y is a capped and floored time series that converges over time. The explanatory variables come in three types: fundamental, derived and technical. The data also has a lot of missing values. Together, these make the prediction problem interesting.

Exploratory analysis of Two Sigma Connect: Rental Listing Inquiries

This is an exploratory analysis of the data from the Kaggle competition Two Sigma Connect: Rental Listing Inquiries. The data itself is very easy to understand. The focus here is on figuring out the relationship between the explanatory variables and the dependent variable. Exploring the relationship between x and y is very important for building a powerful predictive model. This is step one.

increase disk space on vmware for ubuntu

Ubuntu was installed on VMware. More space is needed because of the large amount of downloaded data. This is a note on how to increase the disk space: add a new partition, set up a file system, and mount it to a directory.

linear regression in python, Chapter 1

UCLA ATS has a very good introduction to applied statistics, including hands-on projects using R/SAS/Stata. Here I am trying to provide a Python version of their web book on linear regression. I will try to cover at least the first 3 to 4 chapters, depending on my schedule. I will focus on Chapter 2, which discusses linear regression diagnostics. In Chapter 1 I will introduce how to run linear regression with Python's statsmodels to get the same results as R or SAS, and how to do data analysis and data visualization in Python. In the future, I will try to introduce machine learning in sklearn and deep learning in Theano and TensorFlow.
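As a rough illustration of why different packages can agree on a fit, here is a minimal sketch of the closed-form quantities that an ordinary least squares fit with a single predictor reports. It is not taken from the post, which uses statsmodels. The function name `ols_fit` and the toy data are hypothetical, and only the standard library is used.

```python
def ols_fit(x, y):
    """Closed-form OLS for one predictor: returns (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope, r_squared = ols_fit(x, y)
print(intercept, slope, r_squared)
```

Since these formulas are deterministic, any correct OLS routine given the same data and model should return the same coefficients up to floating-point rounding.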

linear regression in python, outliers / leverage detect

In this section I will introduce how to detect outliers and high leverage points in linear regression. I also show graphically how outliers affect the regression fit. More details on detection using Cook's distance, DFFITS and DFBETA will be in section 2, regression diagnostics.
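To make leverage and Cook's distance concrete, here is a minimal sketch for a regression with a single predictor. It is not the post's code. It uses the textbook formulas, the function name `influence_measures` and the data are hypothetical, and the 2p/n leverage cutoff is one common rule of thumb rather than a fixed standard.

```python
def influence_measures(x, y):
    """Return (leverages, cooks_distances) for a one-predictor OLS fit."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage: diagonal of the hat matrix for simple regression
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance combines residual size and leverage
    cooks = [
        (e ** 2 / (p * s2)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]
    return leverages, cooks


# Hypothetical data: the last point sits far from the others in x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 8.0]
leverages, cooks = influence_measures(x, y)

# Flag points whose leverage exceeds the 2p/n rule of thumb
cutoff = 2 * 2 / len(x)
high_leverage = [i for i, h in enumerate(leverages) if h > cutoff]
print(high_leverage)
```

The leverages always sum to the number of fitted parameters, which gives a quick sanity check on the calculation.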