variable selection in linear regression: 2 — pydata: Huiming's learning notes

summary

This is an example of variable selection in linear regression (it can easily be applied to logistic regression as well). It has two steps:

The first step is univariate analysis. It checks the Pearson correlation between each x and y. This reduces the set of candidate predictors by keeping only the variables that have a significant correlation with y; the cutoff is a p-value of no more than 10%. The sign of the correlation coefficient is also checked, to make sure the direction of the relationship makes sense.

The second step is multivariate analysis. For the variables selected in step 1 (say there are n variables left), the possible variable combinations are run through the linear regression: all 2-variable combinations (C(n, 2) of them), all 3-variable combinations (C(n, 3) of them), and so on. If a model contains duplicated variables (such as GDP annual change and GDP quarterly change), it is dropped. The sign of each regression coefficient is checked again, and if any sign does not make sense, the model is dropped. If any variable is not significant, the model is dropped. If there is multicollinearity (checked by VIF), the model is dropped. In the code below, models that fail the significance or VIF check still appear in the output table, but they cannot be selected as the best model.

The output lists the variables in each model, the regression coefficients, the p-values, the VIFs, and the R2 and RMSE on both the training and the validation data.

import pandas as pd
import numpy as np
from itertools import chain, combinations

import statsmodels.formula.api as smf
import scipy.stats as scipystats
from scipy.stats import norm
from scipy.stats import pearsonr
import statsmodels.api as sm
import statsmodels.stats.stattools as stools
import statsmodels.stats as stats 
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.graphics.regressionplots import *
import matplotlib.pyplot as plt
import math
mypath =  r'F:\Dropbox\ipynb_notes\ipynb_blog\\'

indata = pd.read_pickle(mypath + "data.pkl")

set up display options

pd.options.display.max_rows
pd.set_option('display.max_colwidth', None)    # None = do not truncate cells (older pandas used -1)

from IPython.display import display, HTML

step 1. univariate variable selection

1.1. pearson calculation

set up the variable sign
calculate the pearson correlation
keep only significant variables (p-value <= 0.1)

def pearson_corr(indata, xvar, yvar):
    # Pearson correlation and p-value of each candidate x against y
    rows = []
    for x in xvar:
        corr, pval = pearsonr(indata[x], indata[yvar])
        rows.append({"var_name": x, "pearson_corr": corr, "p_value": pval})
    return pd.DataFrame(rows, columns = ["p_value", "pearson_corr", "var_name"])

def pearson_calc(indata, xvar, yvar):
    pearsoncorr = pearson_corr(indata, xvar, yvar)
    pearsoncorr = pearsoncorr.sort_values(by = ['p_value'])
    pearsoncorr = pearsoncorr.query('p_value <= 0.1')
    pearsoncorr = pearsoncorr.reset_index(drop = True)
    return pearsoncorr

xvarcol = indata.columns[1:-3]
yvarcol = indata.columns[0]
pearsoncorr = pearson_calc(indata, xvarcol, yvarcol)
pearsoncorr.head()

	p_value	pearson_corr	var_name
0	1.617254e-09	0.498257	cap_rate_p0
1	2.287193e-08	-0.466109	realgpd_yoy_p1
2	2.615205e-08	0.464391	ted_spread_p4
3	8.876455e-08	-0.448275	realgpd_yoy_p0
4	2.407472e-07	0.434471	cap_rate_p1
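As a cross-check on what pearsonr reports, the coefficient can be reproduced by hand. The sketch below uses made-up toy numbers (not the notebook's data) and only the standard library; the helper name pearson_r is an assumption for illustration. It also computes the t statistic with n - 2 degrees of freedom, which is the quantity the two-sided p-value corresponds to under the usual normality assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# toy data, made up for this example
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
# t statistic for H0: no correlation, with n - 2 degrees of freedom
t_stat = r * math.sqrt((len(x) - 2) / (1 - r ** 2))
print(r, t_stat)
```

With only five points even a fairly large r gives a modest t statistic, which is why the filter is applied to the p-value rather than to the size of the correlation.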

1.2. check the sign of pearson correlation

Only correlations whose sign makes sense are kept. For example, GDP growth should be negatively correlated with the default rate; if the data show a positive correlation for such a variable, we drop that variable.

varsign = {'bbb_spread_p': 1, 'ted_spread_p': 1, 'unemp_p': 1, 'realgpd_yoy_p': -1, 'cap_rate_p': 1, 'crea_hp_yoy_p': -1,
    'corp_profit_p': -1, 'com_price_index_yoy_p': -1, 'spindex_yoy_p': -1}

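def check_sign(indata, varsign):
    # expected sign per variable family, keyed by the name without its lag digit
    varsign = pd.DataFrame.from_dict(varsign, orient = "index").reset_index()
    varsign.columns = ["var_name_abb", "sign"]
    indata = indata.copy()    # do not modify the caller's dataframe
    # drop the trailing lag digit, e.g. 'cap_rate_p0' -> 'cap_rate_p'
    indata['var_name_abb'] = indata.var_name.map(lambda x: x[:-1])
    indata = pd.merge(indata, varsign, how = "left", on = ['var_name_abb'])
    # sign_true is 1 when the observed sign matches the expected sign
    indata['sign_true'] = np.where(indata.pearson_corr > 0, 1, -1) * indata.sign
    return indata.query('sign_true == 1')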

output = check_sign(pearsoncorr, varsign)
output.head()

	p_value	pearson_corr	var_name	var_name_abb	sign	sign_true
0	1.617254e-09	0.498257	cap_rate_p0	cap_rate_p	1	1
1	2.287193e-08	-0.466109	realgpd_yoy_p1	realgpd_yoy_p	-1	1
2	2.615205e-08	0.464391	ted_spread_p4	ted_spread_p	1	1
3	8.876455e-08	-0.448275	realgpd_yoy_p0	realgpd_yoy_p	-1	1
4	2.407472e-07	0.434471	cap_rate_p1	cap_rate_p	1	1
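Note that x[:-1] removes exactly one character, so it assumes every lag is a single digit. If names with two-digit lags (for example a hypothetical cap_rate_p12) ever appear, a regular expression is a safer way to recover the base name. A minimal sketch, where base_name is a helper name made up for this example:

```python
import re

def base_name(var_name):
    """Strip all trailing digits, e.g. 'cap_rate_p0' -> 'cap_rate_p'."""
    return re.sub(r'\d+$', '', var_name)

print(base_name('realgpd_yoy_p1'))   # single-digit lag
print(base_name('cap_rate_p12'))     # hypothetical two-digit lag
print('cap_rate_p12'[:-1])           # slicing leaves one lag digit behind
```

For the variable names in this data set both approaches give the same result, so this only matters if longer lags are added.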

step 2. multivariate variable selection

In this step, we will try all the combinations of the X variables in the data and run the multi-variable regression.

2.1 Prepare data

From the univariate analysis, we get the list of X variables (the Pearson p-value is significant and the correlation sign is correct) to be tested in the multi-variable analysis.

We will evaluate the models on both training data and validation data, so the data is split into a training set and a validation set.

xvar = output.var_name
cniX = indata.loc[:, xvar]
cniY = indata.loc[:, "cum_pd_num"]
train_flag = indata.train_flag

xtrain, xtest = cniX[train_flag], cniX[~train_flag] 
ytrain, ytest = cniY[train_flag], cniY[~train_flag]

print(xtrain.shape)
print(xtest.shape)

(91, 28)
(39, 28)
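With 28 candidate predictors left after step 1, the exhaustive search grows quickly with the model size. A quick count using the standard library (math.comb needs Python 3.8 or later):

```python
from math import comb

n = 28  # number of candidate predictors, from xtrain.shape above
counts = {k: comb(n, k) for k in (2, 3, 4)}
for k, c in counts.items():
    print(k, c)
# 2 378
# 3 3276
# 4 20475
```

These are the numbers of regressions fitted before the duplicate-variable and sign checks discard any of them.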

2.2. multi-variable analysis

for every 2/3/4 ... variable combination:
calculate the vif and vif_max for the X combination
run the linear regression
if the combination contains duplicated variables, drop it; otherwise go to the next step
if any coefficient sign is not intuitive, drop it; otherwise go to the next step
if the new model scores better than the previous best (and passes the vif and p-value cutoffs), keep it as the best model; otherwise keep the previous best
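For reference, the VIF of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the other predictors plus a constant. The sketch below reproduces that definition with NumPy on simulated data; vif_by_hand and the simulated columns are assumptions made for illustration, and it is meant as a way to understand the number, not as a replacement for variance_inflation_factor.

```python
import numpy as np

def vif_by_hand(X):
    """VIF of each column of X (n x p array), via VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # regress column j on a constant plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)             # unrelated to a
c = a + 0.1 * rng.normal(size=200)   # nearly a copy of a
vifs = vif_by_hand(np.column_stack([a, b, c]))
print(vifs)
```

The two nearly identical columns get a large VIF while the unrelated one stays close to 1, which is the pattern the vif_max < 5 cutoff is designed to catch.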

def verify_sign(params):
    # expected coefficient signs come from the global varsign dict
    setsign = pd.DataFrame.from_dict(varsign, orient = "index").reset_index()
    setsign.columns = ['vars', 'coefsign']
    params = params.reset_index()
    params.columns = ['vars', 'coefd']
    # strip the lag digit; 'const' becomes 'cons' and is dropped by the inner merge
    params['vars'] = params.vars.map(lambda x: x[:-1])
    combinedata = pd.merge(setsign, params, how = 'inner', on = 'vars')
    combinedata['coefd_'] = combinedata.coefsign * np.sign(combinedata.coefd)
    # True only if every coefficient has its expected sign
    return bool((combinedata['coefd_'] > 0).all())


def check_dup_vars(xvar):
    # True if no two variables share a base name (i.e. differ only in the lag digit)
    base_names = set([x[:-1] for x in xvar])
    return len(base_names) == len(xvar)

def all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [1]):
    outdf = pd.DataFrame()
    features = xtrain.columns.tolist()
    n_features = xtrain.shape[1]
    subsets = chain.from_iterable(combinations(range(n_features), k) for k in k_features)
    best_score = -np.inf
    all_combination = None
    best_reg_result = None    # stays None if no model passes all the checks
    for subset in subsets:
        newxtrain = xtrain.iloc[:, list(subset)]
        newxtrain = sm.add_constant(newxtrain)
        # VIF of each predictor (the constant is excluded afterwards)
        vif = dict(zip(newxtrain.columns, [variance_inflation_factor(newxtrain.values, i) for i in range(newxtrain.shape[1])]))
        del vif['const']
        vif_max = max(vif.values())
        newxtest = xtest.iloc[:, list(subset)]
        newxtest = sm.add_constant(newxtest)
        lin_reg = sm.OLS(ytrain, newxtrain).fit()
        #lin_reg = sm.OLS(ytrain, newxtrain).fit(cov_type='HAC',cov_kwds={'maxlags':1})
        if check_dup_vars(lin_reg.params.index.tolist()):
            r2train = lin_reg.rsquared
            pred = lin_reg.predict(newxtest)
            # out-of-sample R2: 1 - SSE / SST on the validation data
            r2test = 1 - np.sum((pred - ytest)**2) / np.sum((ytest - np.mean(ytest))**2)
            # RMSE with a degrees-of-freedom correction (n minus number of coefficients)
            rmse_train = math.sqrt(sum((lin_reg.predict(newxtrain) - ytrain)**2) / (len(ytrain) - len(lin_reg.params)))
            rmse_test = math.sqrt(sum((pred - ytest)**2)/(len(ytest) - len(lin_reg.params)))
            if (verify_sign(lin_reg.params)):
                sss = " ".join([features[i] for i in subset])
                pvalues = lin_reg.pvalues.drop('const').to_dict(); pvalues_max = max(pvalues.values())
                dd = {'vars':sss, 'r2train':r2train, 'r2test':r2test, 'rmse_train':rmse_train,  'rmse_test':rmse_test, 'pvalues': [pvalues], \
                    'pvalues_max': pvalues_max, "reg_coef": [dict(lin_reg.params)], "vif":[vif], "vif_max":vif_max}
                df_i = pd.DataFrame(dd, index = [0], columns = list(dd.keys()))
                outdf = pd.concat([df_i, outdf], axis = 0)
                score = lin_reg.rsquared
                # only models with low VIF and all-significant predictors can become the best model
                if (score > best_score) and (vif_max < 5) and (pvalues_max < 0.1):
                    best_score, all_combination = score, subset
                    best_reg_result = lin_reg

    outdf = outdf[["vars", "reg_coef", "pvalues", "pvalues_max", "vif", "vif_max", "r2train", "r2test", "rmse_train", "rmse_test"]]
    return all_combination, best_score, best_reg_result, outdf

all_combination, best_score, best_reg_result, outdf  = all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [3])
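The two validation metrics computed inside the function can be isolated as small helpers, which makes it easier to see what r2test and rmse_test measure. This is a standard-library sketch on made-up numbers; the helper names r2_out_of_sample and rmse_dof are assumptions for illustration.

```python
import math

def r2_out_of_sample(y_true, y_pred):
    """1 - SSE/SST, with SST taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - sse / sst

def rmse_dof(y_true, y_pred, n_params):
    """RMSE with the n - n_params denominator used in the function above."""
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    return math.sqrt(sse / (len(y_true) - n_params))

# toy values, made up for this example
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

r2 = r2_out_of_sample(y_true, y_pred)
rmse = rmse_dof(y_true, y_pred, n_params=2)
print(r2, rmse)
```

Unlike the training R2, this out-of-sample R2 can be negative when the model predicts the validation data worse than the validation mean would.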

List all the candidate models and sort them by the R2 on training data.

outdf.sort_values('r2train', ascending = False).query('r2train > 0.5')

vars	reg_coef	pvalues	pvalues_max	vif	vif_max	r2train	r2test	rmse_train	rmse_test
ted_spread_p0 cap_rate_p3 crea_hp_yoy_p3	{u'cap_rate_p3': 0.844407825413, u'crea_hp_yoy_p3': -9.75552407688, u'const': -5.82475185843, u'ted_spread_p0': 2.9978370395}	{u'cap_rate_p3': 3.94306563611e-10, u'crea_hp_yoy_p3': 1.24108359437e-13, u'ted_spread_p0': 1.12453478033e-13}	3.943066e-10	{u'cap_rate_p3': 1.0291090023, u'crea_hp_yoy_p3': 1.26682457883, u'ted_spread_p0': 1.29763514744}	1.297635	0.614574	0.478290	0.633015	0.767676
ted_spread_p0 cap_rate_p2 crea_hp_yoy_p3	{u'cap_rate_p2': 0.811141476444, u'crea_hp_yoy_p3': -8.88100839499, u'const': -5.58527727789, u'ted_spread_p0': 2.83870388981}	{u'cap_rate_p2': 5.84092070233e-10, u'crea_hp_yoy_p3': 5.64924235802e-12, u'ted_spread_p0': 8.5084006685e-13}	5.840921e-10	{u'cap_rate_p2': 1.02146127586, u'crea_hp_yoy_p3': 1.26361213133, u'ted_spread_p0': 1.27468501164}	1.274685	0.611142	0.509502	0.635827	0.744358
ted_spread_p0 cap_rate_p3 corp_profit_p1	{u'cap_rate_p3': 1.02088758198, u'corp_profit_p1': -0.0194663514096, u'const': -7.15607863379, u'ted_spread_p0': 2.49614318727}	{u'cap_rate_p3': 1.48967861708e-12, u'corp_profit_p1': 2.36587341489e-13, u'ted_spread_p0': 1.39564236076e-11}	1.395642e-11	{u'cap_rate_p3': 1.08231697692, u'corp_profit_p1': 1.14482598717, u'ted_spread_p0': 1.13625413757}	1.144826	0.608887	0.414865	0.637668	0.813001
cap_rate_p1 ted_spread_p0 crea_hp_yoy_p3	{u'cap_rate_p1': 0.789251206651, u'const': -5.40588738423, u'crea_hp_yoy_p3': -8.3294013128, u'ted_spread_p0': 2.69331735652}	{u'cap_rate_p1': 1.01807994814e-09, u'crea_hp_yoy_p3': 8.57212928459e-11, u'ted_spread_p0': 6.79453180355e-12}	1.018080e-09	{u'cap_rate_p1': 1.02530776004, u'crea_hp_yoy_p3': 1.27841193373, u'ted_spread_p0': 1.26335700212}	1.278412	0.606241	0.549077	0.639821	0.713698
ted_spread_p0 crea_hp_yoy_p3 cap_rate_p4	{u'crea_hp_yoy_p3': -10.9366916932, u'cap_rate_p4': 0.873672209208, u'const': -5.99301833528, u'ted_spread_p0': 3.0785031203}	{u'crea_hp_yoy_p3': 4.03277792628e-15, u'cap_rate_p4': 1.21300198442e-09, u'ted_spread_p0': 8.47813573869e-14}	1.213002e-09	{u'crea_hp_yoy_p3': 1.32415796614, u'cap_rate_p4': 1.06453290377, u'ted_spread_p0': 1.31619929247}	1.324158	0.604684	0.434813	0.641085	0.799023
ted_spread_p0 cap_rate_p2 corp_profit_p1	{u'cap_rate_p2': 0.961038613195, u'corp_profit_p1': -0.0172934378291, u'const': -6.71919753298, u'ted_spread_p0': 2.35603497118}	{u'cap_rate_p2': 2.71762604741e-12, u'corp_profit_p1': 1.33144749492e-11, u'ted_spread_p0': 9.53692120781e-11}	9.536921e-11	{u'cap_rate_p2': 1.03374482226, u'corp_profit_p1': 1.09884169943, u'ted_spread_p0': 1.11410042186}	1.114100	0.603520	0.423553	0.642028	0.806943
cap_rate_p1 ted_spread_p0 corp_profit_p1	{u'cap_rate_p1': 0.920991926182, u'corp_profit_p1': -0.0158660128406, u'const': -6.4030489504, u'ted_spread_p0': 2.21313727996}	{u'cap_rate_p1': 7.0585880264e-12, u'corp_profit_p1': 3.02949988624e-10, u'ted_spread_p0': 8.21215334728e-10}	8.212153e-10	{u'cap_rate_p1': 1.01298762173, u'corp_profit_p1': 1.08530202926, u'ted_spread_p0': 1.09761360937}	1.097614	0.594854	0.457804	0.649007	0.782603
ted_spread_p0 corp_profit_p1 cap_rate_p4	{u'corp_profit_p1': -0.0223204781179, u'const': -7.56469914659, u'cap_rate_p4': 1.08379220206, u'ted_spread_p0': 2.51812630047}	{u'corp_profit_p1': 1.1949152689e-14, u'cap_rate_p4': 7.10244169808e-12, u'ted_spread_p0': 2.14516288708e-11}	2.145163e-11	{u'corp_profit_p1': 1.2612989854, u'cap_rate_p4': 1.18006908653, u'ted_spread_p0': 1.1431268946}	1.261299	0.594797	0.388572	0.649052	0.831067
cap_rate_p0 ted_spread_p0 crea_hp_yoy_p3	{u'crea_hp_yoy_p3': -7.8525157607, u'cap_rate_p0': 0.737744628637, u'const': -4.98396736099, u'ted_spread_p0': 2.38727745306}	{u'crea_hp_yoy_p3': 1.75576444313e-09, u'cap_rate_p0': 8.69348819727e-09, u'ted_spread_p0': 1.05148081701e-09}	8.693488e-09	{u'crea_hp_yoy_p3': 1.30577823981, u'cap_rate_p0': 1.03552221083, u'ted_spread_p0': 1.27216361003}	1.305778	0.586771	0.550868	0.655449	0.712279
cap_rate_p0 ted_spread_p0 corp_profit_p1	{u'cap_rate_p0': 0.872842905445, u'const': -5.97367379688, u'corp_profit_p1': -0.0149625516801, u'ted_spread_p0': 1.89650848195}	{u'cap_rate_p0': 2.90085948633e-11, u'corp_profit_p1': 3.02326631849e-09, u'ted_spread_p0': 8.42420292381e-08}	8.424203e-08	{u'cap_rate_p0': 1.00022202315, u'corp_profit_p1': 1.08376800619, u'ted_spread_p0': 1.08376614561}	1.083768	0.581685	0.465779	0.659471	0.776826
realgpd_yoy_p1 ted_spread_p0 cap_rate_p4	{u'cap_rate_p4': 0.459310529281, u'const': -3.19209503322, u'realgpd_yoy_p1': -0.317694979609, u'ted_spread_p0': 2.24499133704}	{u'cap_rate_p4': 0.000880252388483, u'realgpd_yoy_p1': 4.64502170524e-13, u'ted_spread_p0': 1.71481655757e-09}	8.802524e-04	{u'cap_rate_p4': 1.03139580039, u'realgpd_yoy_p1': 1.10223254627, u'ted_spread_p0': 1.08994866485}	1.102233	0.559669	0.461177	0.676602	0.780165
realgpd_yoy_p1 ted_spread_p0 cap_rate_p3	{u'cap_rate_p3': 0.441726138975, u'const': -3.10473119229, u'realgpd_yoy_p1': -0.292255362611, u'ted_spread_p0': 2.22392026902}	{u'cap_rate_p3': 0.0016672139689, u'realgpd_yoy_p1': 8.08991799692e-11, u'ted_spread_p0': 2.71002168046e-09}	1.667214e-03	{u'cap_rate_p3': 1.14924858178, u'realgpd_yoy_p1': 1.21544761004, u'ted_spread_p0': 1.08765234604}	1.215448	0.553622	0.462181	0.681231	0.779438
realgpd_yoy_p1 ted_spread_p0 cap_rate_p2	{u'cap_rate_p2': 0.43466635192, u'const': -3.05451545084, u'realgpd_yoy_p1': -0.275385243575, u'ted_spread_p0': 2.16541214241}	{u'cap_rate_p2': 0.00266629601475, u'realgpd_yoy_p1': 3.95110812196e-09, u'ted_spread_p0': 6.47022044576e-09}	2.666296e-03	{u'cap_rate_p2': 1.28453682239, u'realgpd_yoy_p1': 1.36522930468, u'ted_spread_p0': 1.08353965769}	1.365229	0.549157	0.458911	0.684631	0.781804
cap_rate_p0 ted_spread_p0 corp_profit_p2	{u'corp_profit_p2': -0.013387541678, u'cap_rate_p0': 0.866163613446, u'const': -5.84769476189, u'ted_spread_p0': 1.66498201142}	{u'cap_rate_p0': 1.4478538301e-10, u'corp_profit_p2': 9.61024512588e-08, u'ted_spread_p0': 2.40214450834e-06}	2.402145e-06	{u'corp_profit_p2': 1.03765625407, u'cap_rate_p0': 1.00059373935, u'ted_spread_p0': 1.03729905889}	1.037656	0.547915	0.428487	0.685573	0.803482
realgpd_yoy_p1 cap_rate_p1 ted_spread_p0	{u'cap_rate_p1': 0.42522657551, u'const': -2.97879289237, u'realgpd_yoy_p1': -0.264924269333, u'ted_spread_p0': 2.10282371324}	{u'cap_rate_p1': 0.00449937414885, u'realgpd_yoy_p1': 5.72394243797e-08, u'ted_spread_p0': 1.78714604679e-08}	4.499374e-03	{u'cap_rate_p1': 1.41658994804, u'realgpd_yoy_p1': 1.51749717529, u'ted_spread_p0': 1.08664051545}	1.517497	0.544166	0.475067	0.688410	0.770043
ted_spread_p0 bbb_spread_p3 corp_profit_p1	{u'bbb_spread_p3': 0.554595648355, u'corp_profit_p1': -0.0076010199434, u'const': -2.38704628717, u'ted_spread_p0': 2.85607944113}	{u'bbb_spread_p3': 1.8523928591e-09, u'corp_profit_p1': 0.00482637840243, u'ted_spread_p0': 1.3484537929e-11}	4.826378e-03	{u'bbb_spread_p3': 1.54114854916, u'corp_profit_p1': 1.32791416339, u'ted_spread_p0': 1.26387542399}	1.541149	0.540536	0.425760	0.691145	0.805397
cap_rate_p1 ted_spread_p0 corp_profit_p2	{u'corp_profit_p2': -0.0135243557809, u'cap_rate_p1': 0.877749475261, u'const': -6.03948102864, u'ted_spread_p0': 1.93705865732}	{u'cap_rate_p1': 3.44329193716e-10, u'corp_profit_p2': 9.48169185403e-08, u'ted_spread_p0': 1.09945034371e-07}	1.099450e-07	{u'corp_profit_p2': 1.03734814056, u'cap_rate_p1': 1.01163126479, u'ted_spread_p0': 1.04793790581}	1.047938	0.538986	0.407210	0.692310	0.818302
cap_rate_p0 realgpd_yoy_p1 ted_spread_p0	{u'cap_rate_p0': 0.399969892292, u'const': -2.76856127445, u'realgpd_yoy_p1': -0.261163454589, u'ted_spread_p0': 1.96391841957}	{u'cap_rate_p0': 0.00823171721478, u'realgpd_yoy_p1': 2.44100020902e-07, u'ted_spread_p0': 2.02057150048e-07}	8.231717e-03	{u'cap_rate_p0': 1.51138715628, u'realgpd_yoy_p1': 1.63739288792, u'ted_spread_p0': 1.13076824898}	1.637393	0.538396	0.479249	0.692753	0.766970
cap_rate_p1 ted_spread_p0 spindex_yoy_p3	{u'cap_rate_p1': 0.65706188261, u'spindex_yoy_p3': -0.0278317743804, u'const': -4.80448757293, u'ted_spread_p0': 2.33803685851}	{u'cap_rate_p1': 2.74601006446e-06, u'spindex_yoy_p3': 1.52047871931e-07, u'ted_spread_p0': 3.75314669611e-09}	2.746010e-06	{u'cap_rate_p1': 1.11830677895, u'spindex_yoy_p3': 1.30113118522, u'ted_spread_p0': 1.17735336332}	1.301131	0.534091	0.467292	0.695975	0.775725
ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p3	{u'bbb_spread_p3': 0.496427591994, u'crea_hp_yoy_p3': -4.04408099734, u'const': -2.11272790298, u'ted_spread_p0': 2.98218905816}	{u'bbb_spread_p3': 2.53868889258e-06, u'crea_hp_yoy_p3': 0.013182986019, u'ted_spread_p0': 8.44964879018e-12}	1.318299e-02	{u'bbb_spread_p3': 2.14915809275, u'crea_hp_yoy_p3': 2.15508279604, u'ted_spread_p0': 1.31551155593}	2.155083	0.530881	0.487799	0.698369	0.760648
cap_rate_p0 ted_spread_p0 spindex_yoy_p3	{u'spindex_yoy_p3': -0.0268613344048, u'cap_rate_p0': 0.637203038203, u'const': -4.58585413988, u'ted_spread_p0': 2.10847588993}	{u'spindex_yoy_p3': 5.48924336311e-07, u'cap_rate_p0': 4.08336221776e-06, u'ted_spread_p0': 9.57175247542e-08}	4.083362e-06	{u'spindex_yoy_p3': 1.34011381199, u'cap_rate_p0': 1.13890662152, u'ted_spread_p0': 1.20436511897}	1.340114	0.529975	0.481577	0.699043	0.765254
ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p2	{u'crea_hp_yoy_p2': -5.23054020025, u'const': -0.213910926506, u'realgpd_yoy_p2': -0.300526775383, u'ted_spread_p0': 2.54263066825}	{u'crea_hp_yoy_p2': 4.73746805923e-05, u'realgpd_yoy_p2': 2.90405468424e-11, u'ted_spread_p0': 4.65498962823e-10}	4.737468e-05	{u'crea_hp_yoy_p2': 1.14885263493, u'realgpd_yoy_p2': 1.07915752568, u'ted_spread_p0': 1.20209011854}	1.202090	0.528894	0.490617	0.699847	0.758552
realgpd_yoy_p1 ted_spread_p0 bbb_spread_p3	{u'bbb_spread_p3': 0.3392041554, u'const': -1.48765136041, u'realgpd_yoy_p1': -0.183029785519, u'ted_spread_p0': 2.53186775198}	{u'bbb_spread_p3': 0.0371576508071, u'realgpd_yoy_p1': 0.0268215589596, u'ted_spread_p0': 4.5194564047e-09}	3.715765e-02	{u'bbb_spread_p3': 5.6016017398, u'realgpd_yoy_p1': 4.82586299937, u'ted_spread_p0': 1.36892246555}	5.601602	0.524097	0.448446	0.703401	0.789328
ted_spread_p0 corp_profit_p1 bbb_spread_p4	{u'corp_profit_p1': -0.0137172688584, u'const': -2.37417449176, u'bbb_spread_p4': 0.524233697796, u'ted_spread_p0': 3.14985768412}	{u'corp_profit_p1': 2.02303842995e-07, u'bbb_spread_p4': 9.78700091729e-09, u'ted_spread_p0': 6.29018807036e-12}	2.023038e-07	{u'corp_profit_p1': 1.09331461883, u'bbb_spread_p4': 1.37836343887, u'ted_spread_p0': 1.42051444169}	1.420514	0.523007	0.343924	0.704206	0.860876
ted_spread_p0 cap_rate_p2 corp_profit_p2	{u'cap_rate_p2': 0.861897957176, u'corp_profit_p2': -0.0139592553872, u'const': -5.98901672449, u'ted_spread_p0': 2.01153180982}	{u'cap_rate_p2': 2.3561392043e-09, u'corp_profit_p2': 7.3640978496e-08, u'ted_spread_p0': 8.20990655485e-08}	8.209907e-08	{u'cap_rate_p2': 1.01956965532, u'corp_profit_p2': 1.03727640573, u'ted_spread_p0': 1.05720340499}	1.057203	0.518564	0.346343	0.707478	0.859287
ted_spread_p0 cap_rate_p2 spindex_yoy_p3	{u'cap_rate_p2': 0.623635734378, u'spindex_yoy_p3': -0.0288247874575, u'const': -4.63667583576, u'ted_spread_p0': 2.40681253901}	{u'cap_rate_p2': 1.38801807046e-05, u'spindex_yoy_p3': 8.44057937049e-08, u'ted_spread_p0': 2.69534162048e-09}	1.388018e-05	{u'cap_rate_p2': 1.11243360337, u'spindex_yoy_p3': 1.28413160743, u'ted_spread_p0': 1.17710523955}	1.284132	0.517091	0.428256	0.708559	0.803645
ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p2	{u'bbb_spread_p3': 0.60361226282, u'crea_hp_yoy_p2': -2.36021166288, u'const': -2.44208114633, u'ted_spread_p0': 2.90459106331}	{u'bbb_spread_p3': 1.15389476089e-10, u'crea_hp_yoy_p2': 0.0792156318557, u'ted_spread_p0': 3.35277155454e-11}	7.921563e-02	{u'bbb_spread_p3': 1.45111260067, u'crea_hp_yoy_p2': 1.31847516127, u'ted_spread_p0': 1.30031898369}	1.451113	0.513970	0.494352	0.710845	0.755766
ted_spread_p0 bbb_spread_p3 corp_profit_p2	{u'bbb_spread_p3': 0.582278194028, u'corp_profit_p2': -0.00457269604228, u'const': -2.42447060517, u'ted_spread_p0': 2.73939873183}	{u'bbb_spread_p3': 4.46517244848e-09, u'corp_profit_p2': 0.103032476656, u'ted_spread_p0': 1.65472591821e-10}	1.030325e-01	{u'bbb_spread_p3': 1.69554846895, u'corp_profit_p2': 1.39827155759, u'ted_spread_p0': 1.26336327403}	1.695548	0.511588	0.404442	0.712585	0.820210
ted_spread_p0 crea_hp_yoy_p2 bbb_spread_p4	{u'crea_hp_yoy_p2': -6.70808022288, u'const': -2.42877868756, u'bbb_spread_p4': 0.607734054772, u'ted_spread_p0': 3.4961201786}	{u'crea_hp_yoy_p2': 6.05151670014e-07, u'bbb_spread_p4': 1.48369689826e-10, u'ted_spread_p0': 1.03002090634e-12}	6.051517e-07	{u'crea_hp_yoy_p2': 1.15154216378, u'bbb_spread_p4': 1.37674369466, u'ted_spread_p0': 1.55319251249}	1.553193	0.511203	0.561489	0.712866	0.703807
realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p3	{u'crea_hp_yoy_p3': -2.69122158056, u'const': -0.311597133369, u'realgpd_yoy_p1': -0.271003198186, u'ted_spread_p0': 2.36903437474}	{u'crea_hp_yoy_p3': 0.160540939231, u'realgpd_yoy_p1': 1.68115665371e-05, u'ted_spread_p0': 1.73940931661e-08}	1.605409e-01	{u'crea_hp_yoy_p3': 2.92635266698, u'realgpd_yoy_p1': 2.51416529656, u'ted_spread_p0': 1.28272242148}	2.926353	0.510855	0.496386	0.713120	0.754245
realgpd_yoy_p1 ted_spread_p0 bbb_spread_p4	{u'const': -0.833411322829, u'realgpd_yoy_p1': -0.284642070828, u'bbb_spread_p4': 0.157377817458, u'ted_spread_p0': 2.41688117752}	{u'realgpd_yoy_p1': 6.46698989093e-07, u'bbb_spread_p4': 0.167778133568, u'ted_spread_p0': 3.11105712134e-08}	1.677781e-01	{u'realgpd_yoy_p1': 1.99595132113, u'bbb_spread_p4': 2.51669899622, u'ted_spread_p0': 1.392766576}	2.516699	0.510479	0.431894	0.713393	0.801084
cap_rate_p1 ted_spread_p0 bbb_spread_p3	{u'bbb_spread_p3': 0.544850450153, u'cap_rate_p1': 0.272985200716, u'const': -3.98664801529, u'ted_spread_p0': 2.60924536243}	{u'bbb_spread_p3': 1.44198483922e-06, u'cap_rate_p1': 0.121325501399, u'ted_spread_p0': 2.7102689375e-09}	1.213255e-01	{u'bbb_spread_p3': 2.34776209124, u'cap_rate_p1': 1.88813529519, u'ted_spread_p0': 1.36428733329}	2.347762	0.510130	0.428608	0.713648	0.803397
ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p3	{u'crea_hp_yoy_p3': -5.59792480702, u'const': -0.330017028638, u'realgpd_yoy_p2': -0.215888553711, u'ted_spread_p0': 2.58561014684}	{u'crea_hp_yoy_p3': 0.000296441873851, u'realgpd_yoy_p2': 1.87381427588e-05, u'ted_spread_p0': 1.09664873088e-09}	2.964419e-04	{u'crea_hp_yoy_p3': 1.77986311362, u'realgpd_yoy_p2': 1.51489441528, u'ted_spread_p0': 1.26112101248}	1.779863	0.509685	0.447184	0.713972	0.790230
cap_rate_p0 ted_spread_p0 bbb_spread_p3	{u'bbb_spread_p3': 0.538791932676, u'cap_rate_p0': 0.269149311217, u'const': -3.90589431381, u'ted_spread_p0': 2.51233917448}	{u'bbb_spread_p3': 3.69381353544e-06, u'cap_rate_p0': 0.128373725201, u'ted_spread_p0': 3.88957741361e-08}	1.283737e-01	{u'bbb_spread_p3': 2.51413269277, u'cap_rate_p0': 1.99928074526, u'ted_spread_p0': 1.52794519095}	2.514133	0.509631	0.434051	0.714011	0.799561
ted_spread_p0 crea_hp_yoy_p3 bbb_spread_p4	{u'crea_hp_yoy_p3': -7.124071928, u'const': -1.81674640673, u'bbb_spread_p4': 0.402048103596, u'ted_spread_p0': 3.28772335045}	{u'crea_hp_yoy_p3': 7.40618262123e-07, u'bbb_spread_p4': 1.99672792352e-05, u'ted_spread_p0': 4.0450297686e-12}	1.996728e-05	{u'crea_hp_yoy_p3': 1.43640456697, u'bbb_spread_p4': 1.55605617691, u'ted_spread_p0': 1.46835089361}	1.556056	0.508998	0.470157	0.714472	0.773637
ted_spread_p0 crea_hp_yoy_p2 realgpd_yoy_p3	{u'realgpd_yoy_p3': -0.311113715762, u'crea_hp_yoy_p2': -9.309071086, u'const': -0.103457190105, u'ted_spread_p0': 2.84913168832}	{u'realgpd_yoy_p3': 1.92976213253e-10, u'crea_hp_yoy_p2': 5.13571840644e-10, u'ted_spread_p0': 6.5891420931e-11}	5.135718e-10	{u'realgpd_yoy_p3': 1.19851914035, u'crea_hp_yoy_p2': 1.30632930877, u'ted_spread_p0': 1.28561708211}	1.306329	0.508293	0.518727	0.714985	0.737325
ted_spread_p0 bbb_spread_p3 spindex_yoy_p3	{u'bbb_spread_p3': 0.525123157535, u'spindex_yoy_p3': -0.0106469798442, u'const': -2.29476615365, u'ted_spread_p0': 2.80601709969}	{u'bbb_spread_p3': 3.20465565732e-05, u'spindex_yoy_p3': 0.152675572385, u'ted_spread_p0': 7.93900361033e-11}	1.526756e-01	{u'bbb_spread_p3': 3.02253841133, u'spindex_yoy_p3': 2.82820562708, u'ted_spread_p0': 1.26016903124}	3.022538	0.508117	0.429300	0.715113	0.802911
realgpd_yoy_p0 ted_spread_p0 bbb_spread_p3	{u'realgpd_yoy_p0': -0.0822791822097, u'const': -2.17908597925, u'bbb_spread_p3': 0.540487885674, u'ted_spread_p0': 2.63164243642}	{u'bbb_spread_p3': 5.73922047576e-06, u'realgpd_yoy_p0': 0.153193960743, u'ted_spread_p0': 1.95668256146e-09}	1.531940e-01	{u'realgpd_yoy_p0': 2.14853765757, u'bbb_spread_p3': 2.63888992404, u'ted_spread_p0': 1.35238585586}	2.638890	0.508087	0.458403	0.715134	0.782170
ted_spread_p0 bbb_spread_p3 cap_rate_p4	{u'bbb_spread_p3': 0.615821291858, u'cap_rate_p4': 0.208708505931, u'const': -3.8335303104, u'ted_spread_p0': 2.74915454136}	{u'bbb_spread_p3': 6.60288226921e-11, u'cap_rate_p4': 0.166847470536, u'ted_spread_p0': 1.67384077652e-10}	1.668475e-01	{u'bbb_spread_p3': 1.44211057025, u'cap_rate_p4': 1.16255484012, u'ted_spread_p0': 1.26230863366}	1.442111	0.507353	0.402521	0.715668	0.821532
ted_spread_p0 crea_hp_yoy_p3 realgpd_yoy_p3	{u'crea_hp_yoy_p3': -8.76409349446, u'realgpd_yoy_p3': -0.180879352626, u'const': -0.318947340435, u'ted_spread_p0': 2.86354149694}	{u'crea_hp_yoy_p3': 5.84134461416e-10, u'realgpd_yoy_p3': 2.43206207395e-05, u'ted_spread_p0': 6.41362452225e-11}	2.432062e-05	{u'crea_hp_yoy_p3': 1.26977569699, u'realgpd_yoy_p3': 1.05558827276, u'ted_spread_p0': 1.29286084071}	1.292861	0.506863	0.458546	0.716024	0.782067
ted_spread_p0 cap_rate_p2 bbb_spread_p3	{u'cap_rate_p2': 0.225756509972, u'const': -3.78840359933, u'bbb_spread_p3': 0.570654122306, u'ted_spread_p0': 2.67079906481}	{u'cap_rate_p2': 0.193180231382, u'bbb_spread_p3': 2.30746281702e-07, u'ted_spread_p0': 9.08484906408e-10}	1.931802e-01	{u'cap_rate_p2': 1.75920678376, u'bbb_spread_p3': 2.17026703782, u'ted_spread_p0': 1.31977926124}	2.170267	0.506110	0.411367	0.716570	0.815428
ted_spread_p0 realgpd_yoy_p2 corp_profit_p1	{u'corp_profit_p1': -0.00960849919163, u'const': -0.407772979121, u'realgpd_yoy_p2': -0.257333465025, u'ted_spread_p0': 2.29032216488}	{u'corp_profit_p1': 0.000448333783238, u'realgpd_yoy_p2': 4.97703208541e-08, u'ted_spread_p0': 7.49817339718e-09}	4.483338e-04	{u'corp_profit_p1': 1.23841713376, u'realgpd_yoy_p2': 1.22668430577, u'ted_spread_p0': 1.11609747523}	1.238417	0.505281	0.321448	0.717171	0.875497
ted_spread_p0 bbb_spread_p3 cap_rate_p3	{u'cap_rate_p3': 0.201637275061, u'bbb_spread_p3': 0.593429505205, u'const': -3.71870866222, u'ted_spread_p0': 2.71903806154}	{u'cap_rate_p3': 0.217238393496, u'bbb_spread_p3': 7.84889309862e-09, u'ted_spread_p0': 3.23824482993e-10}	2.172384e-01	{u'cap_rate_p3': 1.4724277304, u'bbb_spread_p3': 1.80756321736, u'ted_spread_p0': 1.27957899111}	1.807563	0.505135	0.405297	0.717277	0.819622
ted_spread_p0 spindex_yoy_p3 cap_rate_p3	{u'cap_rate_p3': 0.594228769802, u'spindex_yoy_p3': -0.031124114991, u'const': -4.49069384587, u'ted_spread_p0': 2.48354562619}	{u'cap_rate_p3': 4.50370245447e-05, u'spindex_yoy_p3': 8.36225082073e-09, u'ted_spread_p0': 1.64539790007e-09}	4.503702e-05	{u'cap_rate_p3': 1.06865169504, u'spindex_yoy_p3': 1.22753770126, u'ted_spread_p0': 1.18199596216}	1.227538	0.504430	0.408088	0.717788	0.817696
ted_spread_p0 realgpd_yoy_p2 bbb_spread_p3	{u'bbb_spread_p3': 0.517442462306, u'const': -2.1307227209, u'realgpd_yoy_p2': -0.0864549354609, u'ted_spread_p0': 2.66817475933}	{u'bbb_spread_p3': 0.000504611426207, u'realgpd_yoy_p2': 0.249265257706, u'ted_spread_p0': 1.27380241683e-09}	2.492653e-01	{u'bbb_spread_p3': 4.29026413402, u'realgpd_yoy_p2': 3.6616378528, u'ted_spread_p0': 1.34053982558}	4.290264	0.504018	0.388464	0.718086	0.831140
realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p3	{u'spindex_yoy_p3': -0.00697418929236, u'const': -0.358295064272, u'realgpd_yoy_p1': -0.287950006036, u'ted_spread_p0': 2.24832163495}	{u'spindex_yoy_p3': 0.396355057686, u'realgpd_yoy_p1': 4.80031372299e-05, u'ted_spread_p0': 2.97028063227e-08}	3.963551e-01	{u'spindex_yoy_p3': 3.44623710801, u'realgpd_yoy_p1': 3.17299042455, u'ted_spread_p0': 1.18476784624}	3.446237	0.503737	0.457973	0.718290	0.782481
realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p2	{u'crea_hp_yoy_p2': -1.0795826604, u'const': -0.272715216966, u'realgpd_yoy_p1': -0.318271064261, u'ted_spread_p0': 2.22651917837}	{u'crea_hp_yoy_p2': 0.450477095771, u'realgpd_yoy_p1': 3.13546780517e-10, u'ted_spread_p0': 2.90322487555e-08}	4.504771e-01	{u'crea_hp_yoy_p2': 1.48055472983, u'realgpd_yoy_p1': 1.40383603833, u'ted_spread_p0': 1.15790564414}	1.480555	0.502876	0.479863	0.718912	0.766517
realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p4	{u'spindex_yoy_p4': -0.00287623611266, u'const': -0.307424934206, u'realgpd_yoy_p1': -0.319272637088, u'ted_spread_p0': 2.2027800319}	{u'spindex_yoy_p4': 0.630438613957, u'realgpd_yoy_p1': 1.06155602716e-08, u'ted_spread_p0': 4.3262669625e-08}	6.304386e-01	{u'spindex_yoy_p4': 1.90330721432, u'realgpd_yoy_p1': 1.77597233856, u'ted_spread_p0': 1.16353394534}	1.903307	0.500930	0.450725	0.720318	0.787695
ted_spread_p0 bbb_spread_p3 spindex_yoy_p2	{u'bbb_spread_p3': 0.601012948771, u'spindex_yoy_p2': -0.00542467267777, u'const': -2.49819167847, u'ted_spread_p0': 2.8074946861}	{u'bbb_spread_p3': 5.53590432482e-08, u'spindex_yoy_p2': 0.387126006585, u'ted_spread_p0': 1.07989092515e-10}	3.871260e-01	{u'bbb_spread_p3': 2.12232941161, u'spindex_yoy_p2': 1.94673426734, u'ted_spread_p0': 1.26539793559}	2.122329	0.500684	0.429745	0.720496	0.802598
realgpd_yoy_p1 ted_spread_p0 corp_profit_p1	{u'corp_profit_p1': -0.00111510712547, u'const': -0.295544468835, u'realgpd_yoy_p1': -0.322134804647, u'ted_spread_p0': 2.16984747298}	{u'corp_profit_p1': 0.746943635339, u'realgpd_yoy_p1': 7.85857215941e-08, u'ted_spread_p0': 3.07526883363e-08}	7.469436e-01	{u'corp_profit_p1': 2.09799265347, u'realgpd_yoy_p1': 2.09768958589, u'ted_spread_p0': 1.09854672985}	2.097993	0.500194	0.446912	0.720849	0.790424
realgpd_yoy_p0 ted_spread_p0 cap_rate_p3	{u'cap_rate_p3': 0.639727648052, u'realgpd_yoy_p0': -0.253032450789, u'const': -4.28034401961, u'ted_spread_p0': 1.90847482334}	{u'cap_rate_p3': 1.18825887769e-05, u'realgpd_yoy_p0': 1.23019019096e-08, u'ted_spread_p0': 3.86554982571e-07}	1.188259e-05	{u'cap_rate_p3': 1.04954092806, u'realgpd_yoy_p0': 1.04901230219, u'ted_spread_p0': 1.04189905245}	1.049541	0.500114	0.505297	0.720907	0.747542
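The table above still contains models whose largest p-value or VIF exceeds the cutoffs used when picking the best model (0.1 and 5). One way to shortlist only the models that pass both cutoffs, and to rank them by validation R2 instead of training R2, is sketched below on four rows copied from the table; on the real result the same query would be applied to outdf.

```python
import pandas as pd

# four rows copied from the output table above
sample = pd.DataFrame({
    'vars': ['ted_spread_p0 cap_rate_p3 crea_hp_yoy_p3',
             'realgpd_yoy_p1 ted_spread_p0 bbb_spread_p3',
             'realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p3',
             'ted_spread_p0 crea_hp_yoy_p2 bbb_spread_p4'],
    'pvalues_max': [3.943066e-10, 3.715765e-02, 1.605409e-01, 6.051517e-07],
    'vif_max': [1.297635, 5.601602, 2.926353, 1.553193],
    'r2train': [0.614574, 0.524097, 0.510855, 0.511203],
    'r2test': [0.478290, 0.448446, 0.496386, 0.561489],
})

# keep models that pass both cutoffs, best validation R2 first
shortlist = (sample.query('pvalues_max < 0.1 and vif_max < 5')
                   .sort_values('r2test', ascending=False))
print(shortlist[['vars', 'r2train', 'r2test']])
```

In this small sample the model with the highest training R2 is not the one with the highest validation R2, so it can be worth looking at both columns before settling on a model.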

all_combination

(5, 14, 18)

best_reg_result.summary()

OLS Regression Results
Dep. Variable:	cum_pd_num	R-squared:	0.615
Model:	OLS	Adj. R-squared:	0.601
Method:	Least Squares	F-statistic:	46.24
Date:	Mon, 17 Apr 2017	Prob (F-statistic):	5.77e-18
Time:	21:56:19	Log-Likelihood:	-85.467
No. Observations:	91	AIC:	178.9
Df Residuals:	87	BIC:	189.0
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[95.0% Conf. Int.]
const	-5.8248	0.770	-7.569	0.000	-7.354 -4.295
ted_spread_p0	2.9978	0.340	8.805	0.000	2.321 3.675
cap_rate_p3	0.8444	0.120	7.055	0.000	0.607 1.082
crea_hp_yoy_p3	-9.7555	1.111	-8.784	0.000	-11.963 -7.548

Omnibus:	3.899	Durbin-Watson:	2.125
Prob(Omnibus):	0.142	Jarque-Bera (JB):	3.893
Skew:	0.210	Prob(JB):	0.143
Kurtosis:	3.922	Cond. No.	107.
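To read the fitted model as a formula, the rounded coefficients from the summary can be applied directly. In this sketch predict_cum_pd is a name made up for illustration, and the input values are hypothetical, chosen only to show the arithmetic; with the fitted object itself, best_reg_result.predict would be used instead.

```python
# rounded coefficients from the OLS summary above
COEF = {
    'const': -5.8248,
    'ted_spread_p0': 2.9978,
    'cap_rate_p3': 0.8444,
    'crea_hp_yoy_p3': -9.7555,
}

def predict_cum_pd(ted_spread_p0, cap_rate_p3, crea_hp_yoy_p3):
    """Linear prediction of cum_pd_num from the three selected predictors."""
    return (COEF['const']
            + COEF['ted_spread_p0'] * ted_spread_p0
            + COEF['cap_rate_p3'] * cap_rate_p3
            + COEF['crea_hp_yoy_p3'] * crea_hp_yoy_p3)

# hypothetical input values, not taken from the data
pred = predict_cum_pd(ted_spread_p0=1.0, cap_rate_p3=6.0, crea_hp_yoy_p3=0.1)
print(round(pred, 4))
```

The signs match the expectations in varsign: a wider TED spread and a higher cap rate raise the prediction, while stronger commercial real estate price growth lowers it.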