pydata

Keep Looking, Don't Settle

variable selection in linear regression: 2

summary

This is an example of variable selection in linear regression (it can easily be applied to logistic regression as well). It has two steps:

The first step is univariate analysis. It checks the Pearson correlation between each x and y. This reduces the candidate predictors by keeping only the variables that have a significant correlation with y. The cutoff is a p-value of no more than 10%. The sign of the correlation coefficient is also checked to make sure the direction of the relationship makes sense.

The second step is multi-variate analysis. For the variables selected in step 1 (say there are n variables left), all possible combinations of variables are run through the linear regression: every 2-variable combination (C(n, 2) of them), every 3-variable combination (C(n, 3) of them), and so on. If a model contains duplicated variables (like GDP annual change and GDP quarterly change), it is dropped. The sign of each regression coefficient is checked again, and if any sign does not make sense, the model is dropped. A model in which any variable is not significant (largest p-value of 0.1 or more), or which shows multicollinearity (largest VIF of 5 or more), stays in the output table but cannot be selected as the best model.

The output lists the variables in each model, the regression coefficients, the p-values, the VIFs, and the R2 and RMSE on both the training data and the validation data.

import pandas as pd
import numpy as np
from itertools import chain, combinations

import statsmodels.formula.api as smf
import scipy.stats as scipystats
from scipy.stats import norm
from scipy.stats import pearsonr
import statsmodels.api as sm
import statsmodels.stats.stattools as stools
import statsmodels.stats as stats 
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.graphics.regressionplots import *
import matplotlib.pyplot as plt
import math
mypath =  r'F:\Dropbox\ipynb_notes\ipynb_blog\\'

indata = pd.read_pickle(mypath + "data.pkl")

set up display options

pd.options.display.max_rows
pd.set_option('display.max_colwidth', None)    # None means no truncation; older pandas versions used -1 here

from IPython.display import display, HTML

step 1. univariate variable selection

1.1. pearson calculation

  1. set up the variable sign
  2. calculate the pearson correlation
  3. keep only significant variables by p-value <= 0.1
def pearson_corr(indata, xvar, yvar):
    # Pearson correlation and p-value of each candidate x against y
    rows = []
    for var in xvar:
        corr, pval = pearsonr(indata[var], indata[yvar])
        rows.append({"var_name": var, "pearson_corr": corr, "p_value": pval})
    return pd.DataFrame(rows, columns = ["p_value", "pearson_corr", "var_name"])

def pearson_calc(indata, xvar, yvar):
    pearsoncorr = pearson_corr(indata, xvar, yvar)
    pearsoncorr = pearsoncorr.sort_values(by = ['p_value'])
    pearsoncorr = pearsoncorr.query('p_value <= 0.1')
    pearsoncorr = pearsoncorr.reset_index(drop = True)
    return pearsoncorr
xvarcol = indata.columns[1:-3]
yvarcol = indata.columns[0]
pearsoncorr = pearson_calc(indata, xvarcol, yvarcol)
pearsoncorr.head()
p_value pearson_corr var_name
0 1.617254e-09 0.498257 cap_rate_p0
1 2.287193e-08 -0.466109 realgpd_yoy_p1
2 2.615205e-08 0.464391 ted_spread_p4
3 8.876455e-08 -0.448275 realgpd_yoy_p0
4 2.407472e-07 0.434471 cap_rate_p1

1.2. check the sign of pearson correlation

Only correlations whose sign makes sense are kept. For example, GDP growth should be negatively correlated with the default rate. If the data shows a positive sign, we drop that variable.

varsign = {'bbb_spread_p': 1, 'ted_spread_p': 1, 'unemp_p': 1, 'realgpd_yoy_p': -1, 'cap_rate_p': 1, 'crea_hp_yoy_p': -1,
    'corp_profit_p': -1, 'com_price_index_yoy_p': -1, 'spindex_yoy_p': -1}

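def check_sign(indata, varsign):
    # expected sign of each variable family, keyed by the name without the lag digit
    varsign = pd.DataFrame.from_dict(varsign, orient = "index").reset_index()
    varsign.columns = ["var_name_abb", "sign"]
    indata = indata.copy()    # do not modify the caller's data frame
    indata['var_name_abb'] = indata.var_name.map(lambda x: x[:-1])    # drop the one-digit lag
    indata = pd.merge(indata, varsign, how = "left", on = ['var_name_abb'])
    # sign_true is 1 when the observed sign agrees with the expected one
    indata['sign_true'] = np.where(indata.pearson_corr > 0, 1, -1) * indata.sign
    return indata.query('sign_true == 1')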

output = check_sign(pearsoncorr, varsign)
output.head()
p_value pearson_corr var_name var_name_abb sign sign_true
0 1.617254e-09 0.498257 cap_rate_p0 cap_rate_p 1 1
1 2.287193e-08 -0.466109 realgpd_yoy_p1 realgpd_yoy_p -1 1
2 2.615205e-08 0.464391 ted_spread_p4 ted_spread_p 1 1
3 8.876455e-08 -0.448275 realgpd_yoy_p0 realgpd_yoy_p -1 1
4 2.407472e-07 0.434471 cap_rate_p1 cap_rate_p 1 1
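
Both check_sign above and the duplicate check later in this post recover the base name of a variable by dropping its last character (x[:-1]), which assumes the lag suffix is a single digit. If lags of 10 or more were ever used, stripping all trailing digits with a regular expression would probably be safer. A minimal sketch, where base_name is a hypothetical helper and not part of the original code:

```python
import re

def base_name(var_name):
    # hypothetical helper: drop every trailing digit,
    # so 'x_p3' and 'x_p12' both give 'x_p'
    return re.sub(r"\d+$", "", var_name)

print(base_name("cap_rate_p0"))       # cap_rate_p
print(base_name("realgpd_yoy_p12"))   # realgpd_yoy_p
print("realgpd_yoy_p12"[:-1])         # realgpd_yoy_p1, which would not match a key in varsign
```

With single-digit lags, as in the data used here, both approaches give the same base names.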

step 2. multi-variate variable selection

In this step, we will try all the combinations of the X variables in the data and run the multi-variable regression.
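
To get a feel for how many regressions this involves, the sketch below counts the combinations with the standard library. The names in candidates are made up for illustration; 28 is the number of candidate columns in the training data prepared in section 2.1.

```python
from itertools import combinations
from math import comb

# hypothetical candidate names, for illustration only
candidates = ["gdp_p0", "gdp_p1", "unemp_p0", "unemp_p1", "spread_p0"]

# every 3-variable model that could be built from the 5 candidates
triples = list(combinations(candidates, 3))
print(len(triples))    # 10, the same as comb(5, 3)
print(triples[0])      # ('gdp_p0', 'gdp_p1', 'unemp_p0')

# with 28 candidates the search grows quickly with the model size k
counts = {k: comb(28, k) for k in (2, 3, 4)}
print(counts)          # {2: 378, 3: 3276, 4: 20475}
```

Because the number of models grows this fast, the exhaustive search is only practical for small k.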

2.1 Prepare data

From the univariate analysis, we get the list of X variables (the Pearson p-value is significant and the correlation sign is correct) that should be tested in the multi-variable analysis.

We will use both training data and validation data, so the data is split into a training set and a validation set using the train_flag column.

xvar = output.var_name
cniX = indata.loc[:, xvar]
cniY = indata.loc[:, "cum_pd_num"]
train_flag = indata.train_flag    # used as a boolean mask: True for training rows

xtrain, xtest = cniX[train_flag], cniX[~train_flag]
ytrain, ytest = cniY[train_flag], cniY[~train_flag]
print(xtrain.shape)
print(xtest.shape)
(91, 28)
(39, 28)

2.2. multi-variable analysis

  1. for every combination of 2/3/4 ... variables, calculate the vif of each variable and the vif_max of the combination
  2. run the linear regression on the training data
  3. if the combination contains duplicated variables, drop it; otherwise go to next step
  4. if any coefficient sign is not intuitive, drop it; otherwise add the model to the output table
  5. if the new model has a higher training R2 than the best model so far, and vif_max < 5 and the largest p-value < 0.1, keep it as the best model
def verify_sign(params):
    # True when every coefficient has the sign expected in varsign
    setsign = pd.DataFrame.from_dict(varsign, orient = "index").reset_index()
    setsign.columns = ['vars', 'coefsign']
    params = params.reset_index()
    params.columns = ['vars', 'coefd']
    params['vars'] = params.vars.map(lambda x: x[:-1])    # drop the one-digit lag
    # the inner merge leaves out the constant, which has no expected sign
    combinedata = pd.merge(setsign, params, how = 'inner', on = 'vars')
    combinedata['coefd_'] = combinedata.coefsign * np.sign(combinedata.coefd)
    return bool((combinedata['coefd_'] > 0).all())


def check_dup_vars(xvar):
    # True when no variable appears more than once under different lags
    base_names = set([x[:-1] for x in xvar])
    return len(base_names) == len(xvar)

def all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [1]):
    outdf = pd.DataFrame()
    features = xtrain.columns.tolist()
    n_features = xtrain.shape[1]
    # every subset of k column positions, for each k in k_features
    subsets = chain.from_iterable(combinations(range(n_features), k) for k in k_features)
    best_score = -np.inf
    all_combination = None
    best_reg_result = None    # stays None if no model passes all the checks
    for subset in subsets:
        cols = list(subset)
        newxtrain = sm.add_constant(xtrain.iloc[:, cols])
        newxtest = sm.add_constant(xtest.iloc[:, cols])
        # vif of each predictor (the constant is excluded)
        vif = dict(zip(newxtrain.columns, [variance_inflation_factor(newxtrain.values, i) for i in range(newxtrain.shape[1])]))
        del vif['const']
        vif_max = max(vif.values())
        lin_reg = sm.OLS(ytrain, newxtrain).fit()
        #lin_reg = sm.OLS(ytrain, newxtrain).fit(cov_type='HAC',cov_kwds={'maxlags':1})
        # skip models with duplicated variables or with a counterintuitive sign
        if check_dup_vars(lin_reg.params.index.tolist()) and verify_sign(lin_reg.params):
            pred_train = lin_reg.predict(newxtrain)
            pred_test = lin_reg.predict(newxtest)
            n_params = len(lin_reg.params)
            r2train = lin_reg.rsquared
            r2test = 1 - np.sum((pred_test - ytest)**2) / np.sum((ytest - np.mean(ytest))**2)
            # both rmse values are adjusted by the number of estimated parameters
            rmse_train = math.sqrt(np.sum((pred_train - ytrain)**2) / (len(ytrain) - n_params))
            rmse_test = math.sqrt(np.sum((pred_test - ytest)**2) / (len(ytest) - n_params))
            sss = " ".join([features[i] for i in subset])
            pvalues = lin_reg.pvalues.drop('const').to_dict()
            pvalues_max = max(pvalues.values())
            dd = {'vars':sss, 'r2train':r2train, 'r2test':r2test, 'rmse_train':rmse_train,  'rmse_test':rmse_test, 'pvalues': [pvalues],
                'pvalues_max': pvalues_max, "reg_coef": [dict(lin_reg.params)], "vif":[vif], "vif_max":vif_max}
            df_i = pd.DataFrame(dd, index = [0])
            outdf = pd.concat([df_i, outdf], axis = 0)
            score = lin_reg.rsquared
            # the best model must also pass the vif and p-value cutoffs
            if (score > best_score) and (vif_max < 5) and (pvalues_max < 0.1):
                best_score, all_combination = score, subset
                best_reg_result = lin_reg

    outcols = ["vars", "reg_coef", "pvalues", "pvalues_max", "vif", "vif_max", "r2train", "r2test", "rmse_train", "rmse_test"]
    outdf = outdf.reindex(columns = outcols)    # also works when no model was kept
    return all_combination, best_score, best_reg_result, outdf

all_combination, best_score, best_reg_result, outdf  = all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [3])
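
The VIF check inside the function relies on variance_inflation_factor from statsmodels. As a reading aid, the VIF of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the remaining predictors plus an intercept. The sketch below computes it with numpy only, on synthetic data; vif_by_hand and the simulated columns are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

def vif_by_hand(X, j):
    # regress column j on the other columns plus an intercept
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # independent of x1
x3 = x1 + 0.1 * rng.normal(size=200)    # nearly a copy of x1
X = np.column_stack([x1, x2, x3])

vifs = [vif_by_hand(X, j) for j in range(X.shape[1])]
print(vifs)    # x1 and x3 get a large VIF, x2 stays close to 1
```

In this toy data x1 and x3 would fail the vif_max < 5 cutoff used above, while x2 would pass.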

List the candidate models with a training R2 above 0.5, sorted by the R2 on training data.

outdf.sort_values('r2train', ascending = False).query('r2train > 0.5')
vars reg_coef pvalues pvalues_max vif vif_max r2train r2test rmse_train rmse_test
0 ted_spread_p0 cap_rate_p3 crea_hp_yoy_p3 {u'cap_rate_p3': 0.844407825413, u'crea_hp_yoy_p3': -9.75552407688, u'const': -5.82475185843, u'ted_spread_p0': 2.9978370395} {u'cap_rate_p3': 3.94306563611e-10, u'crea_hp_yoy_p3': 1.24108359437e-13, u'ted_spread_p0': 1.12453478033e-13} 3.943066e-10 {u'cap_rate_p3': 1.0291090023, u'crea_hp_yoy_p3': 1.26682457883, u'ted_spread_p0': 1.29763514744} 1.297635 0.614574 0.478290 0.633015 0.767676
0 ted_spread_p0 cap_rate_p2 crea_hp_yoy_p3 {u'cap_rate_p2': 0.811141476444, u'crea_hp_yoy_p3': -8.88100839499, u'const': -5.58527727789, u'ted_spread_p0': 2.83870388981} {u'cap_rate_p2': 5.84092070233e-10, u'crea_hp_yoy_p3': 5.64924235802e-12, u'ted_spread_p0': 8.5084006685e-13} 5.840921e-10 {u'cap_rate_p2': 1.02146127586, u'crea_hp_yoy_p3': 1.26361213133, u'ted_spread_p0': 1.27468501164} 1.274685 0.611142 0.509502 0.635827 0.744358
0 ted_spread_p0 cap_rate_p3 corp_profit_p1 {u'cap_rate_p3': 1.02088758198, u'corp_profit_p1': -0.0194663514096, u'const': -7.15607863379, u'ted_spread_p0': 2.49614318727} {u'cap_rate_p3': 1.48967861708e-12, u'corp_profit_p1': 2.36587341489e-13, u'ted_spread_p0': 1.39564236076e-11} 1.395642e-11 {u'cap_rate_p3': 1.08231697692, u'corp_profit_p1': 1.14482598717, u'ted_spread_p0': 1.13625413757} 1.144826 0.608887 0.414865 0.637668 0.813001
0 cap_rate_p1 ted_spread_p0 crea_hp_yoy_p3 {u'cap_rate_p1': 0.789251206651, u'const': -5.40588738423, u'crea_hp_yoy_p3': -8.3294013128, u'ted_spread_p0': 2.69331735652} {u'cap_rate_p1': 1.01807994814e-09, u'crea_hp_yoy_p3': 8.57212928459e-11, u'ted_spread_p0': 6.79453180355e-12} 1.018080e-09 {u'cap_rate_p1': 1.02530776004, u'crea_hp_yoy_p3': 1.27841193373, u'ted_spread_p0': 1.26335700212} 1.278412 0.606241 0.549077 0.639821 0.713698
0 ted_spread_p0 crea_hp_yoy_p3 cap_rate_p4 {u'crea_hp_yoy_p3': -10.9366916932, u'cap_rate_p4': 0.873672209208, u'const': -5.99301833528, u'ted_spread_p0': 3.0785031203} {u'crea_hp_yoy_p3': 4.03277792628e-15, u'cap_rate_p4': 1.21300198442e-09, u'ted_spread_p0': 8.47813573869e-14} 1.213002e-09 {u'crea_hp_yoy_p3': 1.32415796614, u'cap_rate_p4': 1.06453290377, u'ted_spread_p0': 1.31619929247} 1.324158 0.604684 0.434813 0.641085 0.799023
0 ted_spread_p0 cap_rate_p2 corp_profit_p1 {u'cap_rate_p2': 0.961038613195, u'corp_profit_p1': -0.0172934378291, u'const': -6.71919753298, u'ted_spread_p0': 2.35603497118} {u'cap_rate_p2': 2.71762604741e-12, u'corp_profit_p1': 1.33144749492e-11, u'ted_spread_p0': 9.53692120781e-11} 9.536921e-11 {u'cap_rate_p2': 1.03374482226, u'corp_profit_p1': 1.09884169943, u'ted_spread_p0': 1.11410042186} 1.114100 0.603520 0.423553 0.642028 0.806943
0 cap_rate_p1 ted_spread_p0 corp_profit_p1 {u'cap_rate_p1': 0.920991926182, u'corp_profit_p1': -0.0158660128406, u'const': -6.4030489504, u'ted_spread_p0': 2.21313727996} {u'cap_rate_p1': 7.0585880264e-12, u'corp_profit_p1': 3.02949988624e-10, u'ted_spread_p0': 8.21215334728e-10} 8.212153e-10 {u'cap_rate_p1': 1.01298762173, u'corp_profit_p1': 1.08530202926, u'ted_spread_p0': 1.09761360937} 1.097614 0.594854 0.457804 0.649007 0.782603
0 ted_spread_p0 corp_profit_p1 cap_rate_p4 {u'corp_profit_p1': -0.0223204781179, u'const': -7.56469914659, u'cap_rate_p4': 1.08379220206, u'ted_spread_p0': 2.51812630047} {u'corp_profit_p1': 1.1949152689e-14, u'cap_rate_p4': 7.10244169808e-12, u'ted_spread_p0': 2.14516288708e-11} 2.145163e-11 {u'corp_profit_p1': 1.2612989854, u'cap_rate_p4': 1.18006908653, u'ted_spread_p0': 1.1431268946} 1.261299 0.594797 0.388572 0.649052 0.831067
0 cap_rate_p0 ted_spread_p0 crea_hp_yoy_p3 {u'crea_hp_yoy_p3': -7.8525157607, u'cap_rate_p0': 0.737744628637, u'const': -4.98396736099, u'ted_spread_p0': 2.38727745306} {u'crea_hp_yoy_p3': 1.75576444313e-09, u'cap_rate_p0': 8.69348819727e-09, u'ted_spread_p0': 1.05148081701e-09} 8.693488e-09 {u'crea_hp_yoy_p3': 1.30577823981, u'cap_rate_p0': 1.03552221083, u'ted_spread_p0': 1.27216361003} 1.305778 0.586771 0.550868 0.655449 0.712279
0 cap_rate_p0 ted_spread_p0 corp_profit_p1 {u'cap_rate_p0': 0.872842905445, u'const': -5.97367379688, u'corp_profit_p1': -0.0149625516801, u'ted_spread_p0': 1.89650848195} {u'cap_rate_p0': 2.90085948633e-11, u'corp_profit_p1': 3.02326631849e-09, u'ted_spread_p0': 8.42420292381e-08} 8.424203e-08 {u'cap_rate_p0': 1.00022202315, u'corp_profit_p1': 1.08376800619, u'ted_spread_p0': 1.08376614561} 1.083768 0.581685 0.465779 0.659471 0.776826
0 realgpd_yoy_p1 ted_spread_p0 cap_rate_p4 {u'cap_rate_p4': 0.459310529281, u'const': -3.19209503322, u'realgpd_yoy_p1': -0.317694979609, u'ted_spread_p0': 2.24499133704} {u'cap_rate_p4': 0.000880252388483, u'realgpd_yoy_p1': 4.64502170524e-13, u'ted_spread_p0': 1.71481655757e-09} 8.802524e-04 {u'cap_rate_p4': 1.03139580039, u'realgpd_yoy_p1': 1.10223254627, u'ted_spread_p0': 1.08994866485} 1.102233 0.559669 0.461177 0.676602 0.780165
0 realgpd_yoy_p1 ted_spread_p0 cap_rate_p3 {u'cap_rate_p3': 0.441726138975, u'const': -3.10473119229, u'realgpd_yoy_p1': -0.292255362611, u'ted_spread_p0': 2.22392026902} {u'cap_rate_p3': 0.0016672139689, u'realgpd_yoy_p1': 8.08991799692e-11, u'ted_spread_p0': 2.71002168046e-09} 1.667214e-03 {u'cap_rate_p3': 1.14924858178, u'realgpd_yoy_p1': 1.21544761004, u'ted_spread_p0': 1.08765234604} 1.215448 0.553622 0.462181 0.681231 0.779438
0 realgpd_yoy_p1 ted_spread_p0 cap_rate_p2 {u'cap_rate_p2': 0.43466635192, u'const': -3.05451545084, u'realgpd_yoy_p1': -0.275385243575, u'ted_spread_p0': 2.16541214241} {u'cap_rate_p2': 0.00266629601475, u'realgpd_yoy_p1': 3.95110812196e-09, u'ted_spread_p0': 6.47022044576e-09} 2.666296e-03 {u'cap_rate_p2': 1.28453682239, u'realgpd_yoy_p1': 1.36522930468, u'ted_spread_p0': 1.08353965769} 1.365229 0.549157 0.458911 0.684631 0.781804
0 cap_rate_p0 ted_spread_p0 corp_profit_p2 {u'corp_profit_p2': -0.013387541678, u'cap_rate_p0': 0.866163613446, u'const': -5.84769476189, u'ted_spread_p0': 1.66498201142} {u'cap_rate_p0': 1.4478538301e-10, u'corp_profit_p2': 9.61024512588e-08, u'ted_spread_p0': 2.40214450834e-06} 2.402145e-06 {u'corp_profit_p2': 1.03765625407, u'cap_rate_p0': 1.00059373935, u'ted_spread_p0': 1.03729905889} 1.037656 0.547915 0.428487 0.685573 0.803482
0 realgpd_yoy_p1 cap_rate_p1 ted_spread_p0 {u'cap_rate_p1': 0.42522657551, u'const': -2.97879289237, u'realgpd_yoy_p1': -0.264924269333, u'ted_spread_p0': 2.10282371324} {u'cap_rate_p1': 0.00449937414885, u'realgpd_yoy_p1': 5.72394243797e-08, u'ted_spread_p0': 1.78714604679e-08} 4.499374e-03 {u'cap_rate_p1': 1.41658994804, u'realgpd_yoy_p1': 1.51749717529, u'ted_spread_p0': 1.08664051545} 1.517497 0.544166 0.475067 0.688410 0.770043
0 ted_spread_p0 bbb_spread_p3 corp_profit_p1 {u'bbb_spread_p3': 0.554595648355, u'corp_profit_p1': -0.0076010199434, u'const': -2.38704628717, u'ted_spread_p0': 2.85607944113} {u'bbb_spread_p3': 1.8523928591e-09, u'corp_profit_p1': 0.00482637840243, u'ted_spread_p0': 1.3484537929e-11} 4.826378e-03 {u'bbb_spread_p3': 1.54114854916, u'corp_profit_p1': 1.32791416339, u'ted_spread_p0': 1.26387542399} 1.541149 0.540536 0.425760 0.691145 0.805397
0 cap_rate_p1 ted_spread_p0 corp_profit_p2 {u'corp_profit_p2': -0.0135243557809, u'cap_rate_p1': 0.877749475261, u'const': -6.03948102864, u'ted_spread_p0': 1.93705865732} {u'cap_rate_p1': 3.44329193716e-10, u'corp_profit_p2': 9.48169185403e-08, u'ted_spread_p0': 1.09945034371e-07} 1.099450e-07 {u'corp_profit_p2': 1.03734814056, u'cap_rate_p1': 1.01163126479, u'ted_spread_p0': 1.04793790581} 1.047938 0.538986 0.407210 0.692310 0.818302
0 cap_rate_p0 realgpd_yoy_p1 ted_spread_p0 {u'cap_rate_p0': 0.399969892292, u'const': -2.76856127445, u'realgpd_yoy_p1': -0.261163454589, u'ted_spread_p0': 1.96391841957} {u'cap_rate_p0': 0.00823171721478, u'realgpd_yoy_p1': 2.44100020902e-07, u'ted_spread_p0': 2.02057150048e-07} 8.231717e-03 {u'cap_rate_p0': 1.51138715628, u'realgpd_yoy_p1': 1.63739288792, u'ted_spread_p0': 1.13076824898} 1.637393 0.538396 0.479249 0.692753 0.766970
0 cap_rate_p1 ted_spread_p0 spindex_yoy_p3 {u'cap_rate_p1': 0.65706188261, u'spindex_yoy_p3': -0.0278317743804, u'const': -4.80448757293, u'ted_spread_p0': 2.33803685851} {u'cap_rate_p1': 2.74601006446e-06, u'spindex_yoy_p3': 1.52047871931e-07, u'ted_spread_p0': 3.75314669611e-09} 2.746010e-06 {u'cap_rate_p1': 1.11830677895, u'spindex_yoy_p3': 1.30113118522, u'ted_spread_p0': 1.17735336332} 1.301131 0.534091 0.467292 0.695975 0.775725
0 ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p3 {u'bbb_spread_p3': 0.496427591994, u'crea_hp_yoy_p3': -4.04408099734, u'const': -2.11272790298, u'ted_spread_p0': 2.98218905816} {u'bbb_spread_p3': 2.53868889258e-06, u'crea_hp_yoy_p3': 0.013182986019, u'ted_spread_p0': 8.44964879018e-12} 1.318299e-02 {u'bbb_spread_p3': 2.14915809275, u'crea_hp_yoy_p3': 2.15508279604, u'ted_spread_p0': 1.31551155593} 2.155083 0.530881 0.487799 0.698369 0.760648
0 cap_rate_p0 ted_spread_p0 spindex_yoy_p3 {u'spindex_yoy_p3': -0.0268613344048, u'cap_rate_p0': 0.637203038203, u'const': -4.58585413988, u'ted_spread_p0': 2.10847588993} {u'spindex_yoy_p3': 5.48924336311e-07, u'cap_rate_p0': 4.08336221776e-06, u'ted_spread_p0': 9.57175247542e-08} 4.083362e-06 {u'spindex_yoy_p3': 1.34011381199, u'cap_rate_p0': 1.13890662152, u'ted_spread_p0': 1.20436511897} 1.340114 0.529975 0.481577 0.699043 0.765254
0 ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p2 {u'crea_hp_yoy_p2': -5.23054020025, u'const': -0.213910926506, u'realgpd_yoy_p2': -0.300526775383, u'ted_spread_p0': 2.54263066825} {u'crea_hp_yoy_p2': 4.73746805923e-05, u'realgpd_yoy_p2': 2.90405468424e-11, u'ted_spread_p0': 4.65498962823e-10} 4.737468e-05 {u'crea_hp_yoy_p2': 1.14885263493, u'realgpd_yoy_p2': 1.07915752568, u'ted_spread_p0': 1.20209011854} 1.202090 0.528894 0.490617 0.699847 0.758552
0 realgpd_yoy_p1 ted_spread_p0 bbb_spread_p3 {u'bbb_spread_p3': 0.3392041554, u'const': -1.48765136041, u'realgpd_yoy_p1': -0.183029785519, u'ted_spread_p0': 2.53186775198} {u'bbb_spread_p3': 0.0371576508071, u'realgpd_yoy_p1': 0.0268215589596, u'ted_spread_p0': 4.5194564047e-09} 3.715765e-02 {u'bbb_spread_p3': 5.6016017398, u'realgpd_yoy_p1': 4.82586299937, u'ted_spread_p0': 1.36892246555} 5.601602 0.524097 0.448446 0.703401 0.789328
0 ted_spread_p0 corp_profit_p1 bbb_spread_p4 {u'corp_profit_p1': -0.0137172688584, u'const': -2.37417449176, u'bbb_spread_p4': 0.524233697796, u'ted_spread_p0': 3.14985768412} {u'corp_profit_p1': 2.02303842995e-07, u'bbb_spread_p4': 9.78700091729e-09, u'ted_spread_p0': 6.29018807036e-12} 2.023038e-07 {u'corp_profit_p1': 1.09331461883, u'bbb_spread_p4': 1.37836343887, u'ted_spread_p0': 1.42051444169} 1.420514 0.523007 0.343924 0.704206 0.860876
0 ted_spread_p0 cap_rate_p2 corp_profit_p2 {u'cap_rate_p2': 0.861897957176, u'corp_profit_p2': -0.0139592553872, u'const': -5.98901672449, u'ted_spread_p0': 2.01153180982} {u'cap_rate_p2': 2.3561392043e-09, u'corp_profit_p2': 7.3640978496e-08, u'ted_spread_p0': 8.20990655485e-08} 8.209907e-08 {u'cap_rate_p2': 1.01956965532, u'corp_profit_p2': 1.03727640573, u'ted_spread_p0': 1.05720340499} 1.057203 0.518564 0.346343 0.707478 0.859287
0 ted_spread_p0 cap_rate_p2 spindex_yoy_p3 {u'cap_rate_p2': 0.623635734378, u'spindex_yoy_p3': -0.0288247874575, u'const': -4.63667583576, u'ted_spread_p0': 2.40681253901} {u'cap_rate_p2': 1.38801807046e-05, u'spindex_yoy_p3': 8.44057937049e-08, u'ted_spread_p0': 2.69534162048e-09} 1.388018e-05 {u'cap_rate_p2': 1.11243360337, u'spindex_yoy_p3': 1.28413160743, u'ted_spread_p0': 1.17710523955} 1.284132 0.517091 0.428256 0.708559 0.803645
0 ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p2 {u'bbb_spread_p3': 0.60361226282, u'crea_hp_yoy_p2': -2.36021166288, u'const': -2.44208114633, u'ted_spread_p0': 2.90459106331} {u'bbb_spread_p3': 1.15389476089e-10, u'crea_hp_yoy_p2': 0.0792156318557, u'ted_spread_p0': 3.35277155454e-11} 7.921563e-02 {u'bbb_spread_p3': 1.45111260067, u'crea_hp_yoy_p2': 1.31847516127, u'ted_spread_p0': 1.30031898369} 1.451113 0.513970 0.494352 0.710845 0.755766
0 ted_spread_p0 bbb_spread_p3 corp_profit_p2 {u'bbb_spread_p3': 0.582278194028, u'corp_profit_p2': -0.00457269604228, u'const': -2.42447060517, u'ted_spread_p0': 2.73939873183} {u'bbb_spread_p3': 4.46517244848e-09, u'corp_profit_p2': 0.103032476656, u'ted_spread_p0': 1.65472591821e-10} 1.030325e-01 {u'bbb_spread_p3': 1.69554846895, u'corp_profit_p2': 1.39827155759, u'ted_spread_p0': 1.26336327403} 1.695548 0.511588 0.404442 0.712585 0.820210
0 ted_spread_p0 crea_hp_yoy_p2 bbb_spread_p4 {u'crea_hp_yoy_p2': -6.70808022288, u'const': -2.42877868756, u'bbb_spread_p4': 0.607734054772, u'ted_spread_p0': 3.4961201786} {u'crea_hp_yoy_p2': 6.05151670014e-07, u'bbb_spread_p4': 1.48369689826e-10, u'ted_spread_p0': 1.03002090634e-12} 6.051517e-07 {u'crea_hp_yoy_p2': 1.15154216378, u'bbb_spread_p4': 1.37674369466, u'ted_spread_p0': 1.55319251249} 1.553193 0.511203 0.561489 0.712866 0.703807
0 realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p3 {u'crea_hp_yoy_p3': -2.69122158056, u'const': -0.311597133369, u'realgpd_yoy_p1': -0.271003198186, u'ted_spread_p0': 2.36903437474} {u'crea_hp_yoy_p3': 0.160540939231, u'realgpd_yoy_p1': 1.68115665371e-05, u'ted_spread_p0': 1.73940931661e-08} 1.605409e-01 {u'crea_hp_yoy_p3': 2.92635266698, u'realgpd_yoy_p1': 2.51416529656, u'ted_spread_p0': 1.28272242148} 2.926353 0.510855 0.496386 0.713120 0.754245
0 realgpd_yoy_p1 ted_spread_p0 bbb_spread_p4 {u'const': -0.833411322829, u'realgpd_yoy_p1': -0.284642070828, u'bbb_spread_p4': 0.157377817458, u'ted_spread_p0': 2.41688117752} {u'realgpd_yoy_p1': 6.46698989093e-07, u'bbb_spread_p4': 0.167778133568, u'ted_spread_p0': 3.11105712134e-08} 1.677781e-01 {u'realgpd_yoy_p1': 1.99595132113, u'bbb_spread_p4': 2.51669899622, u'ted_spread_p0': 1.392766576} 2.516699 0.510479 0.431894 0.713393 0.801084
0 cap_rate_p1 ted_spread_p0 bbb_spread_p3 {u'bbb_spread_p3': 0.544850450153, u'cap_rate_p1': 0.272985200716, u'const': -3.98664801529, u'ted_spread_p0': 2.60924536243} {u'bbb_spread_p3': 1.44198483922e-06, u'cap_rate_p1': 0.121325501399, u'ted_spread_p0': 2.7102689375e-09} 1.213255e-01 {u'bbb_spread_p3': 2.34776209124, u'cap_rate_p1': 1.88813529519, u'ted_spread_p0': 1.36428733329} 2.347762 0.510130 0.428608 0.713648 0.803397
0 ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p3 {u'crea_hp_yoy_p3': -5.59792480702, u'const': -0.330017028638, u'realgpd_yoy_p2': -0.215888553711, u'ted_spread_p0': 2.58561014684} {u'crea_hp_yoy_p3': 0.000296441873851, u'realgpd_yoy_p2': 1.87381427588e-05, u'ted_spread_p0': 1.09664873088e-09} 2.964419e-04 {u'crea_hp_yoy_p3': 1.77986311362, u'realgpd_yoy_p2': 1.51489441528, u'ted_spread_p0': 1.26112101248} 1.779863 0.509685 0.447184 0.713972 0.790230
0 cap_rate_p0 ted_spread_p0 bbb_spread_p3 {u'bbb_spread_p3': 0.538791932676, u'cap_rate_p0': 0.269149311217, u'const': -3.90589431381, u'ted_spread_p0': 2.51233917448} {u'bbb_spread_p3': 3.69381353544e-06, u'cap_rate_p0': 0.128373725201, u'ted_spread_p0': 3.88957741361e-08} 1.283737e-01 {u'bbb_spread_p3': 2.51413269277, u'cap_rate_p0': 1.99928074526, u'ted_spread_p0': 1.52794519095} 2.514133 0.509631 0.434051 0.714011 0.799561
0 ted_spread_p0 crea_hp_yoy_p3 bbb_spread_p4 {u'crea_hp_yoy_p3': -7.124071928, u'const': -1.81674640673, u'bbb_spread_p4': 0.402048103596, u'ted_spread_p0': 3.28772335045} {u'crea_hp_yoy_p3': 7.40618262123e-07, u'bbb_spread_p4': 1.99672792352e-05, u'ted_spread_p0': 4.0450297686e-12} 1.996728e-05 {u'crea_hp_yoy_p3': 1.43640456697, u'bbb_spread_p4': 1.55605617691, u'ted_spread_p0': 1.46835089361} 1.556056 0.508998 0.470157 0.714472 0.773637
0 ted_spread_p0 crea_hp_yoy_p2 realgpd_yoy_p3 {u'realgpd_yoy_p3': -0.311113715762, u'crea_hp_yoy_p2': -9.309071086, u'const': -0.103457190105, u'ted_spread_p0': 2.84913168832} {u'realgpd_yoy_p3': 1.92976213253e-10, u'crea_hp_yoy_p2': 5.13571840644e-10, u'ted_spread_p0': 6.5891420931e-11} 5.135718e-10 {u'realgpd_yoy_p3': 1.19851914035, u'crea_hp_yoy_p2': 1.30632930877, u'ted_spread_p0': 1.28561708211} 1.306329 0.508293 0.518727 0.714985 0.737325
0 ted_spread_p0 bbb_spread_p3 spindex_yoy_p3 {u'bbb_spread_p3': 0.525123157535, u'spindex_yoy_p3': -0.0106469798442, u'const': -2.29476615365, u'ted_spread_p0': 2.80601709969} {u'bbb_spread_p3': 3.20465565732e-05, u'spindex_yoy_p3': 0.152675572385, u'ted_spread_p0': 7.93900361033e-11} 1.526756e-01 {u'bbb_spread_p3': 3.02253841133, u'spindex_yoy_p3': 2.82820562708, u'ted_spread_p0': 1.26016903124} 3.022538 0.508117 0.429300 0.715113 0.802911
0 realgpd_yoy_p0 ted_spread_p0 bbb_spread_p3 {u'realgpd_yoy_p0': -0.0822791822097, u'const': -2.17908597925, u'bbb_spread_p3': 0.540487885674, u'ted_spread_p0': 2.63164243642} {u'bbb_spread_p3': 5.73922047576e-06, u'realgpd_yoy_p0': 0.153193960743, u'ted_spread_p0': 1.95668256146e-09} 1.531940e-01 {u'realgpd_yoy_p0': 2.14853765757, u'bbb_spread_p3': 2.63888992404, u'ted_spread_p0': 1.35238585586} 2.638890 0.508087 0.458403 0.715134 0.782170
0 ted_spread_p0 bbb_spread_p3 cap_rate_p4 {u'bbb_spread_p3': 0.615821291858, u'cap_rate_p4': 0.208708505931, u'const': -3.8335303104, u'ted_spread_p0': 2.74915454136} {u'bbb_spread_p3': 6.60288226921e-11, u'cap_rate_p4': 0.166847470536, u'ted_spread_p0': 1.67384077652e-10} 1.668475e-01 {u'bbb_spread_p3': 1.44211057025, u'cap_rate_p4': 1.16255484012, u'ted_spread_p0': 1.26230863366} 1.442111 0.507353 0.402521 0.715668 0.821532
0 ted_spread_p0 crea_hp_yoy_p3 realgpd_yoy_p3 {u'crea_hp_yoy_p3': -8.76409349446, u'realgpd_yoy_p3': -0.180879352626, u'const': -0.318947340435, u'ted_spread_p0': 2.86354149694} {u'crea_hp_yoy_p3': 5.84134461416e-10, u'realgpd_yoy_p3': 2.43206207395e-05, u'ted_spread_p0': 6.41362452225e-11} 2.432062e-05 {u'crea_hp_yoy_p3': 1.26977569699, u'realgpd_yoy_p3': 1.05558827276, u'ted_spread_p0': 1.29286084071} 1.292861 0.506863 0.458546 0.716024 0.782067
0 ted_spread_p0 cap_rate_p2 bbb_spread_p3 {u'cap_rate_p2': 0.225756509972, u'const': -3.78840359933, u'bbb_spread_p3': 0.570654122306, u'ted_spread_p0': 2.67079906481} {u'cap_rate_p2': 0.193180231382, u'bbb_spread_p3': 2.30746281702e-07, u'ted_spread_p0': 9.08484906408e-10} 1.931802e-01 {u'cap_rate_p2': 1.75920678376, u'bbb_spread_p3': 2.17026703782, u'ted_spread_p0': 1.31977926124} 2.170267 0.506110 0.411367 0.716570 0.815428
0 ted_spread_p0 realgpd_yoy_p2 corp_profit_p1 {u'corp_profit_p1': -0.00960849919163, u'const': -0.407772979121, u'realgpd_yoy_p2': -0.257333465025, u'ted_spread_p0': 2.29032216488} {u'corp_profit_p1': 0.000448333783238, u'realgpd_yoy_p2': 4.97703208541e-08, u'ted_spread_p0': 7.49817339718e-09} 4.483338e-04 {u'corp_profit_p1': 1.23841713376, u'realgpd_yoy_p2': 1.22668430577, u'ted_spread_p0': 1.11609747523} 1.238417 0.505281 0.321448 0.717171 0.875497
0 ted_spread_p0 bbb_spread_p3 cap_rate_p3 {u'cap_rate_p3': 0.201637275061, u'bbb_spread_p3': 0.593429505205, u'const': -3.71870866222, u'ted_spread_p0': 2.71903806154} {u'cap_rate_p3': 0.217238393496, u'bbb_spread_p3': 7.84889309862e-09, u'ted_spread_p0': 3.23824482993e-10} 2.172384e-01 {u'cap_rate_p3': 1.4724277304, u'bbb_spread_p3': 1.80756321736, u'ted_spread_p0': 1.27957899111} 1.807563 0.505135 0.405297 0.717277 0.819622
0 ted_spread_p0 spindex_yoy_p3 cap_rate_p3 {u'cap_rate_p3': 0.594228769802, u'spindex_yoy_p3': -0.031124114991, u'const': -4.49069384587, u'ted_spread_p0': 2.48354562619} {u'cap_rate_p3': 4.50370245447e-05, u'spindex_yoy_p3': 8.36225082073e-09, u'ted_spread_p0': 1.64539790007e-09} 4.503702e-05 {u'cap_rate_p3': 1.06865169504, u'spindex_yoy_p3': 1.22753770126, u'ted_spread_p0': 1.18199596216} 1.227538 0.504430 0.408088 0.717788 0.817696
0 ted_spread_p0 realgpd_yoy_p2 bbb_spread_p3 {u'bbb_spread_p3': 0.517442462306, u'const': -2.1307227209, u'realgpd_yoy_p2': -0.0864549354609, u'ted_spread_p0': 2.66817475933} {u'bbb_spread_p3': 0.000504611426207, u'realgpd_yoy_p2': 0.249265257706, u'ted_spread_p0': 1.27380241683e-09} 2.492653e-01 {u'bbb_spread_p3': 4.29026413402, u'realgpd_yoy_p2': 3.6616378528, u'ted_spread_p0': 1.34053982558} 4.290264 0.504018 0.388464 0.718086 0.831140
0 realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p3 {u'spindex_yoy_p3': -0.00697418929236, u'const': -0.358295064272, u'realgpd_yoy_p1': -0.287950006036, u'ted_spread_p0': 2.24832163495} {u'spindex_yoy_p3': 0.396355057686, u'realgpd_yoy_p1': 4.80031372299e-05, u'ted_spread_p0': 2.97028063227e-08} 3.963551e-01 {u'spindex_yoy_p3': 3.44623710801, u'realgpd_yoy_p1': 3.17299042455, u'ted_spread_p0': 1.18476784624} 3.446237 0.503737 0.457973 0.718290 0.782481
0 realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p2 {u'crea_hp_yoy_p2': -1.0795826604, u'const': -0.272715216966, u'realgpd_yoy_p1': -0.318271064261, u'ted_spread_p0': 2.22651917837} {u'crea_hp_yoy_p2': 0.450477095771, u'realgpd_yoy_p1': 3.13546780517e-10, u'ted_spread_p0': 2.90322487555e-08} 4.504771e-01 {u'crea_hp_yoy_p2': 1.48055472983, u'realgpd_yoy_p1': 1.40383603833, u'ted_spread_p0': 1.15790564414} 1.480555 0.502876 0.479863 0.718912 0.766517
0 realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p4 {u'spindex_yoy_p4': -0.00287623611266, u'const': -0.307424934206, u'realgpd_yoy_p1': -0.319272637088, u'ted_spread_p0': 2.2027800319} {u'spindex_yoy_p4': 0.630438613957, u'realgpd_yoy_p1': 1.06155602716e-08, u'ted_spread_p0': 4.3262669625e-08} 6.304386e-01 {u'spindex_yoy_p4': 1.90330721432, u'realgpd_yoy_p1': 1.77597233856, u'ted_spread_p0': 1.16353394534} 1.903307 0.500930 0.450725 0.720318 0.787695
0 ted_spread_p0 bbb_spread_p3 spindex_yoy_p2 {u'bbb_spread_p3': 0.601012948771, u'spindex_yoy_p2': -0.00542467267777, u'const': -2.49819167847, u'ted_spread_p0': 2.8074946861} {u'bbb_spread_p3': 5.53590432482e-08, u'spindex_yoy_p2': 0.387126006585, u'ted_spread_p0': 1.07989092515e-10} 3.871260e-01 {u'bbb_spread_p3': 2.12232941161, u'spindex_yoy_p2': 1.94673426734, u'ted_spread_p0': 1.26539793559} 2.122329 0.500684 0.429745 0.720496 0.802598
0 realgpd_yoy_p1 ted_spread_p0 corp_profit_p1 {u'corp_profit_p1': -0.00111510712547, u'const': -0.295544468835, u'realgpd_yoy_p1': -0.322134804647, u'ted_spread_p0': 2.16984747298} {u'corp_profit_p1': 0.746943635339, u'realgpd_yoy_p1': 7.85857215941e-08, u'ted_spread_p0': 3.07526883363e-08} 7.469436e-01 {u'corp_profit_p1': 2.09799265347, u'realgpd_yoy_p1': 2.09768958589, u'ted_spread_p0': 1.09854672985} 2.097993 0.500194 0.446912 0.720849 0.790424
0 realgpd_yoy_p0 ted_spread_p0 cap_rate_p3 {u'cap_rate_p3': 0.639727648052, u'realgpd_yoy_p0': -0.253032450789, u'const': -4.28034401961, u'ted_spread_p0': 1.90847482334} {u'cap_rate_p3': 1.18825887769e-05, u'realgpd_yoy_p0': 1.23019019096e-08, u'ted_spread_p0': 3.86554982571e-07} 1.188259e-05 {u'cap_rate_p3': 1.04954092806, u'realgpd_yoy_p0': 1.04901230219, u'ted_spread_p0': 1.04189905245} 1.049541 0.500114 0.505297 0.720907 0.747542
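
The r2test and rmse_test columns are computed on the validation data. The sketch below reproduces the two formulas used in the function on made-up numbers; y_true, y_pred and n_params are illustrative values, not data from this post. Note that R2 computed this way can be negative when the model predicts worse than the validation mean.

```python
import math
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # made-up validation targets
y_pred = np.array([1.5, 1.5, 3.5, 3.5, 5.5, 5.5])    # made-up model predictions
n_params = 4                                         # constant + 3 predictors

sse = np.sum((y_pred - y_true) ** 2)                 # squared prediction errors
sst = np.sum((y_true - y_true.mean()) ** 2)          # spread around the validation mean

r2_test = 1 - sse / sst
# same degrees-of-freedom adjustment as in the function above
rmse_test = math.sqrt(sse / (len(y_true) - n_params))

print(r2_test, rmse_test)
```
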
all_combination
(5, 14, 18)
best_reg_result.summary()
OLS Regression Results
Dep. Variable: cum_pd_num R-squared: 0.615
Model: OLS Adj. R-squared: 0.601
Method: Least Squares F-statistic: 46.24
Date: Mon, 17 Apr 2017 Prob (F-statistic): 5.77e-18
Time: 21:56:19 Log-Likelihood: -85.467
No. Observations: 91 AIC: 178.9
Df Residuals: 87 BIC: 189.0
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const -5.8248 0.770 -7.569 0.000 -7.354 -4.295
ted_spread_p0 2.9978 0.340 8.805 0.000 2.321 3.675
cap_rate_p3 0.8444 0.120 7.055 0.000 0.607 1.082
crea_hp_yoy_p3 -9.7555 1.111 -8.784 0.000 -11.963 -7.548
Omnibus: 3.899 Durbin-Watson: 2.125
Prob(Omnibus): 0.142 Jarque-Bera (JB): 3.893
Skew: 0.210 Prob(JB): 0.143
Kurtosis: 3.922 Cond. No. 107.