Summary
This is an example of variable selection in linear regression (the same approach can easily be applied to logistic regression). It has two steps:
The first step is univariate analysis. It checks the Pearson correlation between each x and y. This reduces the set of candidate predictors by keeping only the variables that are significantly correlated with y. The cutoff is a p-value of no more than 10%. The sign of the correlation coefficient is also checked to make sure the direction of the relationship makes sense.
The second step is multi-variate analysis. For the variables selected in step 1 (say n variables are left), every combination of a given size is run through the linear regression: all 2-variable combinations (C(n, 2) models), all 3-variable combinations (C(n, 3) models), and so on. If a model contains duplicated variables (such as GDP annual change and GDP quarterly change), it is dropped. The sign of each regression coefficient is checked again, and the model is dropped if any sign does not make sense. A model in which any variable is not significant, or which shows multicollinearity (checked by VIF), is dropped from consideration as the best model.
The output lists the variables in each model, the regression coefficients, the p-values, the VIFs, and the R2 and RMSE on both the training data and the validation data.
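As a rough illustration of the two fit statistics reported for each model, the sketch below computes R2 and an RMSE whose denominator is reduced by the number of fitted parameters. The helper names `r_squared` and `rmse` and the toy numbers are made up for this example; only the formulas follow the ones used later in this post.

```python
import math

def r_squared(actual, predicted):
    """1 - SSE/SST; this can turn negative on validation data."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_y) ** 2 for a in actual)
    return 1 - sse / sst

def rmse(actual, predicted, n_params):
    """Root mean squared error with n - n_params in the denominator."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - n_params))

# toy numbers, not taken from the data set in this post
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(y_true, y_pred))
print(rmse(y_true, y_pred, n_params=2))
```

Comparing the training and validation values of these two numbers gives a quick view of how much a model's fit degrades out of sample.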
import pandas as pd
import numpy as np
from itertools import chain, combinations
import statsmodels.formula.api as smf
import scipy.stats as scipystats
from scipy.stats import norm
from scipy.stats import pearsonr
import statsmodels.api as sm
import statsmodels.stats.stattools as stools
import statsmodels.stats as stats
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.graphics.regressionplots import *
import matplotlib.pyplot as plt
import math
mypath = r'F:\Dropbox\ipynb_notes\ipynb_blog\\'
indata = pd.read_pickle(mypath + "data.pkl")
set up display options
pd.options.display.max_rows
pd.set_option('display.max_colwidth', None)  # None = no truncation (pandas >= 1.0; older versions used -1)
from IPython.display import display, HTML
1. Univariate variable selection
1.1. Pearson calculation
- set up the variable sign
- calculate the Pearson correlation
- keep only significant variables with p-value <= 0.1
def pearson_corr(indata, xvar, yvar):
    # one row per candidate x: Pearson correlation with y and its p-value
    outdf = pd.DataFrame()
    for i in range(len(xvar)):
        pcorr = pearsonr(indata[xvar[i]], indata[yvar])
        corrdf = pd.DataFrame({"var_name": xvar[i], "pearson_corr":pcorr[0], "p_value":pcorr[1]}, index = [0])
        outdf = pd.concat([outdf, corrdf], axis = 0)
    # sort_values returns a new frame, so the result has to be assigned
    outdf = outdf.sort_values('p_value')
    return outdf
def pearson_calc(indata, xvar, yvar):
    pearsoncorr = pearson_corr(indata, xvar, yvar)
    pearsoncorr = pearsoncorr.sort_values(by = ['p_value'])
    # keep only the variables significant at the 10% level
    pearsoncorr = pearsoncorr.query('p_value <= 0.1')
    pearsoncorr = pearsoncorr.reset_index(drop = True)
    return pearsoncorr
xvarcol = indata.columns[1:-3]
yvarcol = indata.columns[0]
pearsoncorr = pearson_calc(indata, xvarcol, yvarcol)
pearsoncorr.head()
| | p_value | pearson_corr | var_name |
---|---|---|---|
0 | 1.617254e-09 | 0.498257 | cap_rate_p0 |
1 | 2.287193e-08 | -0.466109 | realgpd_yoy_p1 |
2 | 2.615205e-08 | 0.464391 | ted_spread_p4 |
3 | 8.876455e-08 | -0.448275 | realgpd_yoy_p0 |
4 | 2.407472e-07 | 0.434471 | cap_rate_p1 |
1.2. Check the sign of the Pearson correlation
Only correlations whose sign makes sense are kept. For example, GDP growth should be negatively correlated with the default rate. If the data show a positive sign, we drop that variable.
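The rule can be sketched on a couple of values before looking at the full implementation. This is only an illustration: the helper names `base_name` and `sign_ok` are hypothetical, the dictionary holds just two of the variables, and the regular expression assumes that every variable name ends in one or more digits that are not part of the base name.

```python
import re

# expected direction for two of the drivers (subset chosen for illustration)
expected_sign = {"realgpd_yoy_p": -1, "cap_rate_p": 1}

def base_name(var_name):
    """Strip trailing digits, e.g. 'cap_rate_p12' -> 'cap_rate_p'."""
    return re.sub(r"\d+$", "", var_name)

def sign_ok(var_name, corr):
    """True when the observed correlation has the expected direction."""
    observed = 1 if corr > 0 else -1
    return observed * expected_sign[base_name(var_name)] == 1

print(sign_ok("realgpd_yoy_p1", -0.466))  # negative, as expected
print(sign_ok("cap_rate_p0", 0.498))      # positive, as expected
print(sign_ok("realgpd_yoy_p0", 0.120))   # made-up value with the wrong sign
```

Stripping all trailing digits rather than exactly one character is a small variation that would also cope with two-digit suffixes, should the data ever contain them.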
varsign = {'bbb_spread_p': 1, 'ted_spread_p': 1, 'unemp_p': 1, 'realgpd_yoy_p': -1, 'cap_rate_p': 1, 'crea_hp_yoy_p': -1, \
'corp_profit_p': -1, 'com_price_index_yoy_p': -1, 'spindex_yoy_p': -1}
def check_sign(indata, varsign):
    varsign = pd.DataFrame.from_dict(varsign, orient = "index").reset_index()
    varsign.columns = ["var_name_abb", "sign"]
    # drop the final character so the name matches the keys of varsign
    indata['var_name_abb'] = indata.var_name.map(lambda x: x[:-1])
    indata = pd.merge(indata, varsign, how = "left", on = ['var_name_abb'])
    # sign_true is 1 when the observed sign agrees with the expected sign
    indata['sign_true'] = np.where(indata.pearson_corr > 0, 1, -1) * indata.sign
    return indata.query('sign_true == 1')
output = check_sign(pearsoncorr, varsign)
output.sort_values('var_name')
output.head()
| | p_value | pearson_corr | var_name | var_name_abb | sign | sign_true |
---|---|---|---|---|---|---|
0 | 1.617254e-09 | 0.498257 | cap_rate_p0 | cap_rate_p | 1 | 1 |
1 | 2.287193e-08 | -0.466109 | realgpd_yoy_p1 | realgpd_yoy_p | -1 | 1 |
2 | 2.615205e-08 | 0.464391 | ted_spread_p4 | ted_spread_p | 1 | 1 |
3 | 8.876455e-08 | -0.448275 | realgpd_yoy_p0 | realgpd_yoy_p | -1 | 1 |
4 | 2.407472e-07 | 0.434471 | cap_rate_p1 | cap_rate_p | 1 | 1 |
2. Multi-variate variable selection
In this step, we will try all the combinations of the X variables in the data and run the multi-variable regression.
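To get a feel for how quickly the number of candidate models grows, here is a small sketch using only the standard library. The count of 28 variables and the names `x1` to `x4` are illustrative values chosen for this example.

```python
from itertools import combinations
from math import comb

n_vars = 28  # illustrative number of candidate variables

# number of models to fit for each model size k
for k in (2, 3, 4):
    print(k, comb(n_vars, k))

# enumerating the subsets themselves, on a tiny hypothetical list
names = ["x1", "x2", "x3", "x4"]
pairs = list(combinations(names, 2))
print(pairs)
```

Because every subset means one regression fit, the exhaustive search stays practical only for a modest number of variables and small model sizes.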
2.1 Prepare data
From the univariate analysis, we get the list of X variables (the Pearson p-value is significant and the correlation sign is correct) to be tested in the multi-variable analysis.
Both training data and validation data are used, so the data set is split into a training set and a validation set.
xvar = output.var_name
# .ix has been removed from pandas; .loc does the same label-based selection
cniX = indata.loc[:, xvar]
cniY = indata.loc[:, "cum_pd_num"]
train_flag = indata.train_flag
xtrain, xtest = cniX[train_flag], cniX[~train_flag]
ytrain, ytest = cniY[train_flag], cniY[~train_flag]
print(xtrain.shape)
print(xtest.shape)
(91, 28)
(39, 28)
2.2. Multi-variable analysis
- for every 2/3/4 ... variable combination, run the linear regression
- calculate the VIF of each variable and vif_max for the X combination
- run the linear regression
- if the combination contains duplicated variables, drop it; otherwise go to the next step
- if the sign of any variable's coefficient is not intuitive, drop it; otherwise go to the next step
- if the new model's score (training R2) is better than the previous best and it passes the VIF (vif_max < 5) and significance (pvalues_max < 0.1) checks, keep it as the best model; otherwise drop it
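The VIF used in the steps above can be sketched from its textbook definition: regress one predictor on all the others (with an intercept) and take 1 / (1 - R2). The function `vif` below and the simulated columns are assumptions made for this illustration; it is not the statsmodels implementation, although it should give comparable numbers when the design matrix includes a constant.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # intercept + other columns
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# simulated predictors, not the data used in this post
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)             # unrelated to x1
x3 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

In this setup the two nearly identical columns should show a VIF far above the cutoff of 5, while the unrelated column should stay close to 1.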
def verify_sign(params):
    # expected signs come from the global varsign dict
    setsign = varsign
    setsign = pd.DataFrame.from_dict(setsign, orient = "index").reset_index()
    setsign.columns = ['vars', 'coefsign']
    params = params.reset_index()
    params.columns = ['vars', 'coefd']
    # drop the final character so the name matches the keys of varsign
    params['vars'] = params.vars.map(lambda x: x[:-1])
    # the inner merge leaves out the constant, which has no expected sign
    combinedata = pd.merge(setsign, params, how = 'inner', on = 'vars')
    combinedata['coefd_'] = combinedata.coefsign * np.sign(combinedata.coefd)
    # True only if every coefficient has its expected sign
    return bool(sum(combinedata['coefd_'] > 0) == combinedata.shape[0])
def check_dup_vars(xvar):
    # True when no two names are identical after dropping their final character
    base_names = set([x[:-1] for x in xvar])
    return len(base_names) == len(xvar)
def all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [1]):
    outdf = pd.DataFrame()
    features = xtrain.columns.tolist()
    n_features = xtrain.shape[1]
    # every combination of column positions, for each model size k in k_features
    subsets = chain.from_iterable(combinations(range(n_features), k) for k in k_features)
    best_score = -np.inf
    all_combination = None
    best_reg_result = None  # stays None if no model passes every check
    for subset in subsets:
        cols = list(subset)  # iloc expects a list of positions, not a tuple
        newxtrain = xtrain.iloc[:, cols]
        newxtrain = sm.add_constant(newxtrain)
        # VIF of each regressor; the VIF of the constant is not of interest
        vif = dict(zip(newxtrain.columns, [variance_inflation_factor(newxtrain.values, i) for i in range(newxtrain.shape[1])]))
        del vif['const']
        vif_max = max(vif.values())
        newxtest = xtest.iloc[:, cols]
        newxtest = sm.add_constant(newxtest)
        lin_reg = sm.OLS(ytrain, newxtrain).fit()
        #lin_reg = sm.OLS(ytrain, newxtrain).fit(cov_type='HAC',cov_kwds={'maxlags':1})
        # skip models whose variable names differ only in the final character
        if check_dup_vars(lin_reg.params.index.tolist()):
            r2train = lin_reg.rsquared
            # r2test1, r2test2 and rmse_train2 are alternative formulas kept for comparison; they are not reported
            r2test1 = np.sum((lin_reg.predict(newxtest) - np.mean(ytest))**2) / np.sum((ytest - np.mean(ytest))**2)
            r2test = 1 - np.sum((lin_reg.predict(newxtest) - ytest)**2) / np.sum((ytest - np.mean(ytest))**2)
            r2test2 = 1 - np.var(ytest - lin_reg.predict(newxtest)) / np.var(ytest)
            rmse_train = math.sqrt(sum((lin_reg.predict(newxtrain) - ytrain)**2) / (len(ytrain) - len(lin_reg.params)))
            rmse_train2 = math.sqrt(lin_reg.mse_resid)
            pred = lin_reg.predict(newxtest)
            rmse_test = math.sqrt(sum((pred - ytest)**2)/(len(ytest) - len(lin_reg.params)))
            # keep only models whose coefficient signs are all as expected
            if (verify_sign(lin_reg.params)):
                sss = " ".join([features[i] for i in subset])
                pvalues = lin_reg.pvalues.drop('const').to_dict()
                pvalues_max = max(pvalues.values())
                dd = {'vars':sss, 'r2train':r2train, 'r2test':r2test, 'rmse_train':rmse_train, 'rmse_test':rmse_test, 'pvalues': [pvalues],
                      'pvalues_max': pvalues_max, "reg_coef": [dict(lin_reg.params)], "vif":[vif], "vif_max":vif_max}
                df_i = pd.DataFrame(dd, index = [0], columns = list(dd.keys()))
                outdf = pd.concat([df_i, outdf], axis = 0)
                score = lin_reg.rsquared
                # best model: highest training R2 among models with low VIF and significant variables
                if (score > best_score) and (vif_max < 5) and (pvalues_max < 0.1):
                    best_score, all_combination = score, subset
                    best_reg_result = lin_reg
    outdf = outdf[["vars", "reg_coef", "pvalues", "pvalues_max", "vif", "vif_max", "r2train", "r2test", "rmse_train", "rmse_test"]]
    return all_combination, best_score, best_reg_result, outdf
all_combination, best_score, best_reg_result, outdf = all_combination_withsign2(xtrain, ytrain, xtest, ytest, k_features = [3])
List all the candidate models and sort them by the R2 on training data.
outdf.sort_values('r2train', ascending = False).query('r2train > 0.5')
| | vars | reg_coef | pvalues | pvalues_max | vif | vif_max | r2train | r2test | rmse_train | rmse_test |
---|---|---|---|---|---|---|---|---|---|---|
0 | ted_spread_p0 cap_rate_p3 crea_hp_yoy_p3 | {u'cap_rate_p3': 0.844407825413, u'crea_hp_yoy_p3': -9.75552407688, u'const': -5.82475185843, u'ted_spread_p0': 2.9978370395} | {u'cap_rate_p3': 3.94306563611e-10, u'crea_hp_yoy_p3': 1.24108359437e-13, u'ted_spread_p0': 1.12453478033e-13} | 3.943066e-10 | {u'cap_rate_p3': 1.0291090023, u'crea_hp_yoy_p3': 1.26682457883, u'ted_spread_p0': 1.29763514744} | 1.297635 | 0.614574 | 0.478290 | 0.633015 | 0.767676 |
0 | ted_spread_p0 cap_rate_p2 crea_hp_yoy_p3 | {u'cap_rate_p2': 0.811141476444, u'crea_hp_yoy_p3': -8.88100839499, u'const': -5.58527727789, u'ted_spread_p0': 2.83870388981} | {u'cap_rate_p2': 5.84092070233e-10, u'crea_hp_yoy_p3': 5.64924235802e-12, u'ted_spread_p0': 8.5084006685e-13} | 5.840921e-10 | {u'cap_rate_p2': 1.02146127586, u'crea_hp_yoy_p3': 1.26361213133, u'ted_spread_p0': 1.27468501164} | 1.274685 | 0.611142 | 0.509502 | 0.635827 | 0.744358 |
0 | ted_spread_p0 cap_rate_p3 corp_profit_p1 | {u'cap_rate_p3': 1.02088758198, u'corp_profit_p1': -0.0194663514096, u'const': -7.15607863379, u'ted_spread_p0': 2.49614318727} | {u'cap_rate_p3': 1.48967861708e-12, u'corp_profit_p1': 2.36587341489e-13, u'ted_spread_p0': 1.39564236076e-11} | 1.395642e-11 | {u'cap_rate_p3': 1.08231697692, u'corp_profit_p1': 1.14482598717, u'ted_spread_p0': 1.13625413757} | 1.144826 | 0.608887 | 0.414865 | 0.637668 | 0.813001 |
0 | cap_rate_p1 ted_spread_p0 crea_hp_yoy_p3 | {u'cap_rate_p1': 0.789251206651, u'const': -5.40588738423, u'crea_hp_yoy_p3': -8.3294013128, u'ted_spread_p0': 2.69331735652} | {u'cap_rate_p1': 1.01807994814e-09, u'crea_hp_yoy_p3': 8.57212928459e-11, u'ted_spread_p0': 6.79453180355e-12} | 1.018080e-09 | {u'cap_rate_p1': 1.02530776004, u'crea_hp_yoy_p3': 1.27841193373, u'ted_spread_p0': 1.26335700212} | 1.278412 | 0.606241 | 0.549077 | 0.639821 | 0.713698 |
0 | ted_spread_p0 crea_hp_yoy_p3 cap_rate_p4 | {u'crea_hp_yoy_p3': -10.9366916932, u'cap_rate_p4': 0.873672209208, u'const': -5.99301833528, u'ted_spread_p0': 3.0785031203} | {u'crea_hp_yoy_p3': 4.03277792628e-15, u'cap_rate_p4': 1.21300198442e-09, u'ted_spread_p0': 8.47813573869e-14} | 1.213002e-09 | {u'crea_hp_yoy_p3': 1.32415796614, u'cap_rate_p4': 1.06453290377, u'ted_spread_p0': 1.31619929247} | 1.324158 | 0.604684 | 0.434813 | 0.641085 | 0.799023 |
0 | ted_spread_p0 cap_rate_p2 corp_profit_p1 | {u'cap_rate_p2': 0.961038613195, u'corp_profit_p1': -0.0172934378291, u'const': -6.71919753298, u'ted_spread_p0': 2.35603497118} | {u'cap_rate_p2': 2.71762604741e-12, u'corp_profit_p1': 1.33144749492e-11, u'ted_spread_p0': 9.53692120781e-11} | 9.536921e-11 | {u'cap_rate_p2': 1.03374482226, u'corp_profit_p1': 1.09884169943, u'ted_spread_p0': 1.11410042186} | 1.114100 | 0.603520 | 0.423553 | 0.642028 | 0.806943 |
0 | cap_rate_p1 ted_spread_p0 corp_profit_p1 | {u'cap_rate_p1': 0.920991926182, u'corp_profit_p1': -0.0158660128406, u'const': -6.4030489504, u'ted_spread_p0': 2.21313727996} | {u'cap_rate_p1': 7.0585880264e-12, u'corp_profit_p1': 3.02949988624e-10, u'ted_spread_p0': 8.21215334728e-10} | 8.212153e-10 | {u'cap_rate_p1': 1.01298762173, u'corp_profit_p1': 1.08530202926, u'ted_spread_p0': 1.09761360937} | 1.097614 | 0.594854 | 0.457804 | 0.649007 | 0.782603 |
0 | ted_spread_p0 corp_profit_p1 cap_rate_p4 | {u'corp_profit_p1': -0.0223204781179, u'const': -7.56469914659, u'cap_rate_p4': 1.08379220206, u'ted_spread_p0': 2.51812630047} | {u'corp_profit_p1': 1.1949152689e-14, u'cap_rate_p4': 7.10244169808e-12, u'ted_spread_p0': 2.14516288708e-11} | 2.145163e-11 | {u'corp_profit_p1': 1.2612989854, u'cap_rate_p4': 1.18006908653, u'ted_spread_p0': 1.1431268946} | 1.261299 | 0.594797 | 0.388572 | 0.649052 | 0.831067 |
0 | cap_rate_p0 ted_spread_p0 crea_hp_yoy_p3 | {u'crea_hp_yoy_p3': -7.8525157607, u'cap_rate_p0': 0.737744628637, u'const': -4.98396736099, u'ted_spread_p0': 2.38727745306} | {u'crea_hp_yoy_p3': 1.75576444313e-09, u'cap_rate_p0': 8.69348819727e-09, u'ted_spread_p0': 1.05148081701e-09} | 8.693488e-09 | {u'crea_hp_yoy_p3': 1.30577823981, u'cap_rate_p0': 1.03552221083, u'ted_spread_p0': 1.27216361003} | 1.305778 | 0.586771 | 0.550868 | 0.655449 | 0.712279 |
0 | cap_rate_p0 ted_spread_p0 corp_profit_p1 | {u'cap_rate_p0': 0.872842905445, u'const': -5.97367379688, u'corp_profit_p1': -0.0149625516801, u'ted_spread_p0': 1.89650848195} | {u'cap_rate_p0': 2.90085948633e-11, u'corp_profit_p1': 3.02326631849e-09, u'ted_spread_p0': 8.42420292381e-08} | 8.424203e-08 | {u'cap_rate_p0': 1.00022202315, u'corp_profit_p1': 1.08376800619, u'ted_spread_p0': 1.08376614561} | 1.083768 | 0.581685 | 0.465779 | 0.659471 | 0.776826 |
0 | realgpd_yoy_p1 ted_spread_p0 cap_rate_p4 | {u'cap_rate_p4': 0.459310529281, u'const': -3.19209503322, u'realgpd_yoy_p1': -0.317694979609, u'ted_spread_p0': 2.24499133704} | {u'cap_rate_p4': 0.000880252388483, u'realgpd_yoy_p1': 4.64502170524e-13, u'ted_spread_p0': 1.71481655757e-09} | 8.802524e-04 | {u'cap_rate_p4': 1.03139580039, u'realgpd_yoy_p1': 1.10223254627, u'ted_spread_p0': 1.08994866485} | 1.102233 | 0.559669 | 0.461177 | 0.676602 | 0.780165 |
0 | realgpd_yoy_p1 ted_spread_p0 cap_rate_p3 | {u'cap_rate_p3': 0.441726138975, u'const': -3.10473119229, u'realgpd_yoy_p1': -0.292255362611, u'ted_spread_p0': 2.22392026902} | {u'cap_rate_p3': 0.0016672139689, u'realgpd_yoy_p1': 8.08991799692e-11, u'ted_spread_p0': 2.71002168046e-09} | 1.667214e-03 | {u'cap_rate_p3': 1.14924858178, u'realgpd_yoy_p1': 1.21544761004, u'ted_spread_p0': 1.08765234604} | 1.215448 | 0.553622 | 0.462181 | 0.681231 | 0.779438 |
0 | realgpd_yoy_p1 ted_spread_p0 cap_rate_p2 | {u'cap_rate_p2': 0.43466635192, u'const': -3.05451545084, u'realgpd_yoy_p1': -0.275385243575, u'ted_spread_p0': 2.16541214241} | {u'cap_rate_p2': 0.00266629601475, u'realgpd_yoy_p1': 3.95110812196e-09, u'ted_spread_p0': 6.47022044576e-09} | 2.666296e-03 | {u'cap_rate_p2': 1.28453682239, u'realgpd_yoy_p1': 1.36522930468, u'ted_spread_p0': 1.08353965769} | 1.365229 | 0.549157 | 0.458911 | 0.684631 | 0.781804 |
0 | cap_rate_p0 ted_spread_p0 corp_profit_p2 | {u'corp_profit_p2': -0.013387541678, u'cap_rate_p0': 0.866163613446, u'const': -5.84769476189, u'ted_spread_p0': 1.66498201142} | {u'cap_rate_p0': 1.4478538301e-10, u'corp_profit_p2': 9.61024512588e-08, u'ted_spread_p0': 2.40214450834e-06} | 2.402145e-06 | {u'corp_profit_p2': 1.03765625407, u'cap_rate_p0': 1.00059373935, u'ted_spread_p0': 1.03729905889} | 1.037656 | 0.547915 | 0.428487 | 0.685573 | 0.803482 |
0 | realgpd_yoy_p1 cap_rate_p1 ted_spread_p0 | {u'cap_rate_p1': 0.42522657551, u'const': -2.97879289237, u'realgpd_yoy_p1': -0.264924269333, u'ted_spread_p0': 2.10282371324} | {u'cap_rate_p1': 0.00449937414885, u'realgpd_yoy_p1': 5.72394243797e-08, u'ted_spread_p0': 1.78714604679e-08} | 4.499374e-03 | {u'cap_rate_p1': 1.41658994804, u'realgpd_yoy_p1': 1.51749717529, u'ted_spread_p0': 1.08664051545} | 1.517497 | 0.544166 | 0.475067 | 0.688410 | 0.770043 |
0 | ted_spread_p0 bbb_spread_p3 corp_profit_p1 | {u'bbb_spread_p3': 0.554595648355, u'corp_profit_p1': -0.0076010199434, u'const': -2.38704628717, u'ted_spread_p0': 2.85607944113} | {u'bbb_spread_p3': 1.8523928591e-09, u'corp_profit_p1': 0.00482637840243, u'ted_spread_p0': 1.3484537929e-11} | 4.826378e-03 | {u'bbb_spread_p3': 1.54114854916, u'corp_profit_p1': 1.32791416339, u'ted_spread_p0': 1.26387542399} | 1.541149 | 0.540536 | 0.425760 | 0.691145 | 0.805397 |
0 | cap_rate_p1 ted_spread_p0 corp_profit_p2 | {u'corp_profit_p2': -0.0135243557809, u'cap_rate_p1': 0.877749475261, u'const': -6.03948102864, u'ted_spread_p0': 1.93705865732} | {u'cap_rate_p1': 3.44329193716e-10, u'corp_profit_p2': 9.48169185403e-08, u'ted_spread_p0': 1.09945034371e-07} | 1.099450e-07 | {u'corp_profit_p2': 1.03734814056, u'cap_rate_p1': 1.01163126479, u'ted_spread_p0': 1.04793790581} | 1.047938 | 0.538986 | 0.407210 | 0.692310 | 0.818302 |
0 | cap_rate_p0 realgpd_yoy_p1 ted_spread_p0 | {u'cap_rate_p0': 0.399969892292, u'const': -2.76856127445, u'realgpd_yoy_p1': -0.261163454589, u'ted_spread_p0': 1.96391841957} | {u'cap_rate_p0': 0.00823171721478, u'realgpd_yoy_p1': 2.44100020902e-07, u'ted_spread_p0': 2.02057150048e-07} | 8.231717e-03 | {u'cap_rate_p0': 1.51138715628, u'realgpd_yoy_p1': 1.63739288792, u'ted_spread_p0': 1.13076824898} | 1.637393 | 0.538396 | 0.479249 | 0.692753 | 0.766970 |
0 | cap_rate_p1 ted_spread_p0 spindex_yoy_p3 | {u'cap_rate_p1': 0.65706188261, u'spindex_yoy_p3': -0.0278317743804, u'const': -4.80448757293, u'ted_spread_p0': 2.33803685851} | {u'cap_rate_p1': 2.74601006446e-06, u'spindex_yoy_p3': 1.52047871931e-07, u'ted_spread_p0': 3.75314669611e-09} | 2.746010e-06 | {u'cap_rate_p1': 1.11830677895, u'spindex_yoy_p3': 1.30113118522, u'ted_spread_p0': 1.17735336332} | 1.301131 | 0.534091 | 0.467292 | 0.695975 | 0.775725 |
0 | ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p3 | {u'bbb_spread_p3': 0.496427591994, u'crea_hp_yoy_p3': -4.04408099734, u'const': -2.11272790298, u'ted_spread_p0': 2.98218905816} | {u'bbb_spread_p3': 2.53868889258e-06, u'crea_hp_yoy_p3': 0.013182986019, u'ted_spread_p0': 8.44964879018e-12} | 1.318299e-02 | {u'bbb_spread_p3': 2.14915809275, u'crea_hp_yoy_p3': 2.15508279604, u'ted_spread_p0': 1.31551155593} | 2.155083 | 0.530881 | 0.487799 | 0.698369 | 0.760648 |
0 | cap_rate_p0 ted_spread_p0 spindex_yoy_p3 | {u'spindex_yoy_p3': -0.0268613344048, u'cap_rate_p0': 0.637203038203, u'const': -4.58585413988, u'ted_spread_p0': 2.10847588993} | {u'spindex_yoy_p3': 5.48924336311e-07, u'cap_rate_p0': 4.08336221776e-06, u'ted_spread_p0': 9.57175247542e-08} | 4.083362e-06 | {u'spindex_yoy_p3': 1.34011381199, u'cap_rate_p0': 1.13890662152, u'ted_spread_p0': 1.20436511897} | 1.340114 | 0.529975 | 0.481577 | 0.699043 | 0.765254 |
0 | ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p2 | {u'crea_hp_yoy_p2': -5.23054020025, u'const': -0.213910926506, u'realgpd_yoy_p2': -0.300526775383, u'ted_spread_p0': 2.54263066825} | {u'crea_hp_yoy_p2': 4.73746805923e-05, u'realgpd_yoy_p2': 2.90405468424e-11, u'ted_spread_p0': 4.65498962823e-10} | 4.737468e-05 | {u'crea_hp_yoy_p2': 1.14885263493, u'realgpd_yoy_p2': 1.07915752568, u'ted_spread_p0': 1.20209011854} | 1.202090 | 0.528894 | 0.490617 | 0.699847 | 0.758552 |
0 | realgpd_yoy_p1 ted_spread_p0 bbb_spread_p3 | {u'bbb_spread_p3': 0.3392041554, u'const': -1.48765136041, u'realgpd_yoy_p1': -0.183029785519, u'ted_spread_p0': 2.53186775198} | {u'bbb_spread_p3': 0.0371576508071, u'realgpd_yoy_p1': 0.0268215589596, u'ted_spread_p0': 4.5194564047e-09} | 3.715765e-02 | {u'bbb_spread_p3': 5.6016017398, u'realgpd_yoy_p1': 4.82586299937, u'ted_spread_p0': 1.36892246555} | 5.601602 | 0.524097 | 0.448446 | 0.703401 | 0.789328 |
0 | ted_spread_p0 corp_profit_p1 bbb_spread_p4 | {u'corp_profit_p1': -0.0137172688584, u'const': -2.37417449176, u'bbb_spread_p4': 0.524233697796, u'ted_spread_p0': 3.14985768412} | {u'corp_profit_p1': 2.02303842995e-07, u'bbb_spread_p4': 9.78700091729e-09, u'ted_spread_p0': 6.29018807036e-12} | 2.023038e-07 | {u'corp_profit_p1': 1.09331461883, u'bbb_spread_p4': 1.37836343887, u'ted_spread_p0': 1.42051444169} | 1.420514 | 0.523007 | 0.343924 | 0.704206 | 0.860876 |
0 | ted_spread_p0 cap_rate_p2 corp_profit_p2 | {u'cap_rate_p2': 0.861897957176, u'corp_profit_p2': -0.0139592553872, u'const': -5.98901672449, u'ted_spread_p0': 2.01153180982} | {u'cap_rate_p2': 2.3561392043e-09, u'corp_profit_p2': 7.3640978496e-08, u'ted_spread_p0': 8.20990655485e-08} | 8.209907e-08 | {u'cap_rate_p2': 1.01956965532, u'corp_profit_p2': 1.03727640573, u'ted_spread_p0': 1.05720340499} | 1.057203 | 0.518564 | 0.346343 | 0.707478 | 0.859287 |
0 | ted_spread_p0 cap_rate_p2 spindex_yoy_p3 | {u'cap_rate_p2': 0.623635734378, u'spindex_yoy_p3': -0.0288247874575, u'const': -4.63667583576, u'ted_spread_p0': 2.40681253901} | {u'cap_rate_p2': 1.38801807046e-05, u'spindex_yoy_p3': 8.44057937049e-08, u'ted_spread_p0': 2.69534162048e-09} | 1.388018e-05 | {u'cap_rate_p2': 1.11243360337, u'spindex_yoy_p3': 1.28413160743, u'ted_spread_p0': 1.17710523955} | 1.284132 | 0.517091 | 0.428256 | 0.708559 | 0.803645 |
0 | ted_spread_p0 bbb_spread_p3 crea_hp_yoy_p2 | {u'bbb_spread_p3': 0.60361226282, u'crea_hp_yoy_p2': -2.36021166288, u'const': -2.44208114633, u'ted_spread_p0': 2.90459106331} | {u'bbb_spread_p3': 1.15389476089e-10, u'crea_hp_yoy_p2': 0.0792156318557, u'ted_spread_p0': 3.35277155454e-11} | 7.921563e-02 | {u'bbb_spread_p3': 1.45111260067, u'crea_hp_yoy_p2': 1.31847516127, u'ted_spread_p0': 1.30031898369} | 1.451113 | 0.513970 | 0.494352 | 0.710845 | 0.755766 |
0 | ted_spread_p0 bbb_spread_p3 corp_profit_p2 | {u'bbb_spread_p3': 0.582278194028, u'corp_profit_p2': -0.00457269604228, u'const': -2.42447060517, u'ted_spread_p0': 2.73939873183} | {u'bbb_spread_p3': 4.46517244848e-09, u'corp_profit_p2': 0.103032476656, u'ted_spread_p0': 1.65472591821e-10} | 1.030325e-01 | {u'bbb_spread_p3': 1.69554846895, u'corp_profit_p2': 1.39827155759, u'ted_spread_p0': 1.26336327403} | 1.695548 | 0.511588 | 0.404442 | 0.712585 | 0.820210 |
0 | ted_spread_p0 crea_hp_yoy_p2 bbb_spread_p4 | {u'crea_hp_yoy_p2': -6.70808022288, u'const': -2.42877868756, u'bbb_spread_p4': 0.607734054772, u'ted_spread_p0': 3.4961201786} | {u'crea_hp_yoy_p2': 6.05151670014e-07, u'bbb_spread_p4': 1.48369689826e-10, u'ted_spread_p0': 1.03002090634e-12} | 6.051517e-07 | {u'crea_hp_yoy_p2': 1.15154216378, u'bbb_spread_p4': 1.37674369466, u'ted_spread_p0': 1.55319251249} | 1.553193 | 0.511203 | 0.561489 | 0.712866 | 0.703807 |
0 | realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p3 | {u'crea_hp_yoy_p3': -2.69122158056, u'const': -0.311597133369, u'realgpd_yoy_p1': -0.271003198186, u'ted_spread_p0': 2.36903437474} | {u'crea_hp_yoy_p3': 0.160540939231, u'realgpd_yoy_p1': 1.68115665371e-05, u'ted_spread_p0': 1.73940931661e-08} | 1.605409e-01 | {u'crea_hp_yoy_p3': 2.92635266698, u'realgpd_yoy_p1': 2.51416529656, u'ted_spread_p0': 1.28272242148} | 2.926353 | 0.510855 | 0.496386 | 0.713120 | 0.754245 |
0 | realgpd_yoy_p1 ted_spread_p0 bbb_spread_p4 | {u'const': -0.833411322829, u'realgpd_yoy_p1': -0.284642070828, u'bbb_spread_p4': 0.157377817458, u'ted_spread_p0': 2.41688117752} | {u'realgpd_yoy_p1': 6.46698989093e-07, u'bbb_spread_p4': 0.167778133568, u'ted_spread_p0': 3.11105712134e-08} | 1.677781e-01 | {u'realgpd_yoy_p1': 1.99595132113, u'bbb_spread_p4': 2.51669899622, u'ted_spread_p0': 1.392766576} | 2.516699 | 0.510479 | 0.431894 | 0.713393 | 0.801084 |
0 | cap_rate_p1 ted_spread_p0 bbb_spread_p3 | {u'bbb_spread_p3': 0.544850450153, u'cap_rate_p1': 0.272985200716, u'const': -3.98664801529, u'ted_spread_p0': 2.60924536243} | {u'bbb_spread_p3': 1.44198483922e-06, u'cap_rate_p1': 0.121325501399, u'ted_spread_p0': 2.7102689375e-09} | 1.213255e-01 | {u'bbb_spread_p3': 2.34776209124, u'cap_rate_p1': 1.88813529519, u'ted_spread_p0': 1.36428733329} | 2.347762 | 0.510130 | 0.428608 | 0.713648 | 0.803397 |
0 | ted_spread_p0 realgpd_yoy_p2 crea_hp_yoy_p3 | {u'crea_hp_yoy_p3': -5.59792480702, u'const': -0.330017028638, u'realgpd_yoy_p2': -0.215888553711, u'ted_spread_p0': 2.58561014684} | {u'crea_hp_yoy_p3': 0.000296441873851, u'realgpd_yoy_p2': 1.87381427588e-05, u'ted_spread_p0': 1.09664873088e-09} | 2.964419e-04 | {u'crea_hp_yoy_p3': 1.77986311362, u'realgpd_yoy_p2': 1.51489441528, u'ted_spread_p0': 1.26112101248} | 1.779863 | 0.509685 | 0.447184 | 0.713972 | 0.790230 |
0 | cap_rate_p0 ted_spread_p0 bbb_spread_p3 | {u'bbb_spread_p3': 0.538791932676, u'cap_rate_p0': 0.269149311217, u'const': -3.90589431381, u'ted_spread_p0': 2.51233917448} | {u'bbb_spread_p3': 3.69381353544e-06, u'cap_rate_p0': 0.128373725201, u'ted_spread_p0': 3.88957741361e-08} | 1.283737e-01 | {u'bbb_spread_p3': 2.51413269277, u'cap_rate_p0': 1.99928074526, u'ted_spread_p0': 1.52794519095} | 2.514133 | 0.509631 | 0.434051 | 0.714011 | 0.799561 |
0 | ted_spread_p0 crea_hp_yoy_p3 bbb_spread_p4 | {u'crea_hp_yoy_p3': -7.124071928, u'const': -1.81674640673, u'bbb_spread_p4': 0.402048103596, u'ted_spread_p0': 3.28772335045} | {u'crea_hp_yoy_p3': 7.40618262123e-07, u'bbb_spread_p4': 1.99672792352e-05, u'ted_spread_p0': 4.0450297686e-12} | 1.996728e-05 | {u'crea_hp_yoy_p3': 1.43640456697, u'bbb_spread_p4': 1.55605617691, u'ted_spread_p0': 1.46835089361} | 1.556056 | 0.508998 | 0.470157 | 0.714472 | 0.773637 |
0 | ted_spread_p0 crea_hp_yoy_p2 realgpd_yoy_p3 | {u'realgpd_yoy_p3': -0.311113715762, u'crea_hp_yoy_p2': -9.309071086, u'const': -0.103457190105, u'ted_spread_p0': 2.84913168832} | {u'realgpd_yoy_p3': 1.92976213253e-10, u'crea_hp_yoy_p2': 5.13571840644e-10, u'ted_spread_p0': 6.5891420931e-11} | 5.135718e-10 | {u'realgpd_yoy_p3': 1.19851914035, u'crea_hp_yoy_p2': 1.30632930877, u'ted_spread_p0': 1.28561708211} | 1.306329 | 0.508293 | 0.518727 | 0.714985 | 0.737325 |
0 | ted_spread_p0 bbb_spread_p3 spindex_yoy_p3 | {u'bbb_spread_p3': 0.525123157535, u'spindex_yoy_p3': -0.0106469798442, u'const': -2.29476615365, u'ted_spread_p0': 2.80601709969} | {u'bbb_spread_p3': 3.20465565732e-05, u'spindex_yoy_p3': 0.152675572385, u'ted_spread_p0': 7.93900361033e-11} | 1.526756e-01 | {u'bbb_spread_p3': 3.02253841133, u'spindex_yoy_p3': 2.82820562708, u'ted_spread_p0': 1.26016903124} | 3.022538 | 0.508117 | 0.429300 | 0.715113 | 0.802911 |
0 | realgpd_yoy_p0 ted_spread_p0 bbb_spread_p3 | {u'realgpd_yoy_p0': -0.0822791822097, u'const': -2.17908597925, u'bbb_spread_p3': 0.540487885674, u'ted_spread_p0': 2.63164243642} | {u'bbb_spread_p3': 5.73922047576e-06, u'realgpd_yoy_p0': 0.153193960743, u'ted_spread_p0': 1.95668256146e-09} | 1.531940e-01 | {u'realgpd_yoy_p0': 2.14853765757, u'bbb_spread_p3': 2.63888992404, u'ted_spread_p0': 1.35238585586} | 2.638890 | 0.508087 | 0.458403 | 0.715134 | 0.782170 |
0 | ted_spread_p0 bbb_spread_p3 cap_rate_p4 | {u'bbb_spread_p3': 0.615821291858, u'cap_rate_p4': 0.208708505931, u'const': -3.8335303104, u'ted_spread_p0': 2.74915454136} | {u'bbb_spread_p3': 6.60288226921e-11, u'cap_rate_p4': 0.166847470536, u'ted_spread_p0': 1.67384077652e-10} | 1.668475e-01 | {u'bbb_spread_p3': 1.44211057025, u'cap_rate_p4': 1.16255484012, u'ted_spread_p0': 1.26230863366} | 1.442111 | 0.507353 | 0.402521 | 0.715668 | 0.821532 |
0 | ted_spread_p0 crea_hp_yoy_p3 realgpd_yoy_p3 | {u'crea_hp_yoy_p3': -8.76409349446, u'realgpd_yoy_p3': -0.180879352626, u'const': -0.318947340435, u'ted_spread_p0': 2.86354149694} | {u'crea_hp_yoy_p3': 5.84134461416e-10, u'realgpd_yoy_p3': 2.43206207395e-05, u'ted_spread_p0': 6.41362452225e-11} | 2.432062e-05 | {u'crea_hp_yoy_p3': 1.26977569699, u'realgpd_yoy_p3': 1.05558827276, u'ted_spread_p0': 1.29286084071} | 1.292861 | 0.506863 | 0.458546 | 0.716024 | 0.782067 |
0 | ted_spread_p0 cap_rate_p2 bbb_spread_p3 | {u'cap_rate_p2': 0.225756509972, u'const': -3.78840359933, u'bbb_spread_p3': 0.570654122306, u'ted_spread_p0': 2.67079906481} | {u'cap_rate_p2': 0.193180231382, u'bbb_spread_p3': 2.30746281702e-07, u'ted_spread_p0': 9.08484906408e-10} | 1.931802e-01 | {u'cap_rate_p2': 1.75920678376, u'bbb_spread_p3': 2.17026703782, u'ted_spread_p0': 1.31977926124} | 2.170267 | 0.506110 | 0.411367 | 0.716570 | 0.815428 |
0 | ted_spread_p0 realgpd_yoy_p2 corp_profit_p1 | {u'corp_profit_p1': -0.00960849919163, u'const': -0.407772979121, u'realgpd_yoy_p2': -0.257333465025, u'ted_spread_p0': 2.29032216488} | {u'corp_profit_p1': 0.000448333783238, u'realgpd_yoy_p2': 4.97703208541e-08, u'ted_spread_p0': 7.49817339718e-09} | 4.483338e-04 | {u'corp_profit_p1': 1.23841713376, u'realgpd_yoy_p2': 1.22668430577, u'ted_spread_p0': 1.11609747523} | 1.238417 | 0.505281 | 0.321448 | 0.717171 | 0.875497 |
0 | ted_spread_p0 bbb_spread_p3 cap_rate_p3 | {u'cap_rate_p3': 0.201637275061, u'bbb_spread_p3': 0.593429505205, u'const': -3.71870866222, u'ted_spread_p0': 2.71903806154} | {u'cap_rate_p3': 0.217238393496, u'bbb_spread_p3': 7.84889309862e-09, u'ted_spread_p0': 3.23824482993e-10} | 2.172384e-01 | {u'cap_rate_p3': 1.4724277304, u'bbb_spread_p3': 1.80756321736, u'ted_spread_p0': 1.27957899111} | 1.807563 | 0.505135 | 0.405297 | 0.717277 | 0.819622 |
0 | ted_spread_p0 spindex_yoy_p3 cap_rate_p3 | {u'cap_rate_p3': 0.594228769802, u'spindex_yoy_p3': -0.031124114991, u'const': -4.49069384587, u'ted_spread_p0': 2.48354562619} | {u'cap_rate_p3': 4.50370245447e-05, u'spindex_yoy_p3': 8.36225082073e-09, u'ted_spread_p0': 1.64539790007e-09} | 4.503702e-05 | {u'cap_rate_p3': 1.06865169504, u'spindex_yoy_p3': 1.22753770126, u'ted_spread_p0': 1.18199596216} | 1.227538 | 0.504430 | 0.408088 | 0.717788 | 0.817696 |
0 | ted_spread_p0 realgpd_yoy_p2 bbb_spread_p3 | {u'bbb_spread_p3': 0.517442462306, u'const': -2.1307227209, u'realgpd_yoy_p2': -0.0864549354609, u'ted_spread_p0': 2.66817475933} | {u'bbb_spread_p3': 0.000504611426207, u'realgpd_yoy_p2': 0.249265257706, u'ted_spread_p0': 1.27380241683e-09} | 2.492653e-01 | {u'bbb_spread_p3': 4.29026413402, u'realgpd_yoy_p2': 3.6616378528, u'ted_spread_p0': 1.34053982558} | 4.290264 | 0.504018 | 0.388464 | 0.718086 | 0.831140 |
0 | realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p3 | {u'spindex_yoy_p3': -0.00697418929236, u'const': -0.358295064272, u'realgpd_yoy_p1': -0.287950006036, u'ted_spread_p0': 2.24832163495} | {u'spindex_yoy_p3': 0.396355057686, u'realgpd_yoy_p1': 4.80031372299e-05, u'ted_spread_p0': 2.97028063227e-08} | 3.963551e-01 | {u'spindex_yoy_p3': 3.44623710801, u'realgpd_yoy_p1': 3.17299042455, u'ted_spread_p0': 1.18476784624} | 3.446237 | 0.503737 | 0.457973 | 0.718290 | 0.782481 |
0 | realgpd_yoy_p1 ted_spread_p0 crea_hp_yoy_p2 | {u'crea_hp_yoy_p2': -1.0795826604, u'const': -0.272715216966, u'realgpd_yoy_p1': -0.318271064261, u'ted_spread_p0': 2.22651917837} | {u'crea_hp_yoy_p2': 0.450477095771, u'realgpd_yoy_p1': 3.13546780517e-10, u'ted_spread_p0': 2.90322487555e-08} | 4.504771e-01 | {u'crea_hp_yoy_p2': 1.48055472983, u'realgpd_yoy_p1': 1.40383603833, u'ted_spread_p0': 1.15790564414} | 1.480555 | 0.502876 | 0.479863 | 0.718912 | 0.766517 |
0 | realgpd_yoy_p1 ted_spread_p0 spindex_yoy_p4 | {u'spindex_yoy_p4': -0.00287623611266, u'const': -0.307424934206, u'realgpd_yoy_p1': -0.319272637088, u'ted_spread_p0': 2.2027800319} | {u'spindex_yoy_p4': 0.630438613957, u'realgpd_yoy_p1': 1.06155602716e-08, u'ted_spread_p0': 4.3262669625e-08} | 6.304386e-01 | {u'spindex_yoy_p4': 1.90330721432, u'realgpd_yoy_p1': 1.77597233856, u'ted_spread_p0': 1.16353394534} | 1.903307 | 0.500930 | 0.450725 | 0.720318 | 0.787695 |
0 | ted_spread_p0 bbb_spread_p3 spindex_yoy_p2 | {u'bbb_spread_p3': 0.601012948771, u'spindex_yoy_p2': -0.00542467267777, u'const': -2.49819167847, u'ted_spread_p0': 2.8074946861} | {u'bbb_spread_p3': 5.53590432482e-08, u'spindex_yoy_p2': 0.387126006585, u'ted_spread_p0': 1.07989092515e-10} | 3.871260e-01 | {u'bbb_spread_p3': 2.12232941161, u'spindex_yoy_p2': 1.94673426734, u'ted_spread_p0': 1.26539793559} | 2.122329 | 0.500684 | 0.429745 | 0.720496 | 0.802598 |
0 | realgpd_yoy_p1 ted_spread_p0 corp_profit_p1 | {u'corp_profit_p1': -0.00111510712547, u'const': -0.295544468835, u'realgpd_yoy_p1': -0.322134804647, u'ted_spread_p0': 2.16984747298} | {u'corp_profit_p1': 0.746943635339, u'realgpd_yoy_p1': 7.85857215941e-08, u'ted_spread_p0': 3.07526883363e-08} | 7.469436e-01 | {u'corp_profit_p1': 2.09799265347, u'realgpd_yoy_p1': 2.09768958589, u'ted_spread_p0': 1.09854672985} | 2.097993 | 0.500194 | 0.446912 | 0.720849 | 0.790424 |
0 | realgpd_yoy_p0 ted_spread_p0 cap_rate_p3 | {u'cap_rate_p3': 0.639727648052, u'realgpd_yoy_p0': -0.253032450789, u'const': -4.28034401961, u'ted_spread_p0': 1.90847482334} | {u'cap_rate_p3': 1.18825887769e-05, u'realgpd_yoy_p0': 1.23019019096e-08, u'ted_spread_p0': 3.86554982571e-07} | 1.188259e-05 | {u'cap_rate_p3': 1.04954092806, u'realgpd_yoy_p0': 1.04901230219, u'ted_spread_p0': 1.04189905245} | 1.049541 | 0.500114 | 0.505297 | 0.720907 | 0.747542 |
all_combination
(5, 14, 18)
best_reg_result.summary()
Dep. Variable: | cum_pd_num | R-squared: | 0.615 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.601 |
Method: | Least Squares | F-statistic: | 46.24 |
Date: | Mon, 17 Apr 2017 | Prob (F-statistic): | 5.77e-18 |
Time: | 21:56:19 | Log-Likelihood: | -85.467 |
No. Observations: | 91 | AIC: | 178.9 |
Df Residuals: | 87 | BIC: | 189.0 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
---|---|---|---|---|---|
const | -5.8248 | 0.770 | -7.569 | 0.000 | -7.354 -4.295 |
ted_spread_p0 | 2.9978 | 0.340 | 8.805 | 0.000 | 2.321 3.675 |
cap_rate_p3 | 0.8444 | 0.120 | 7.055 | 0.000 | 0.607 1.082 |
crea_hp_yoy_p3 | -9.7555 | 1.111 | -8.784 | 0.000 | -11.963 -7.548 |
Omnibus: | 3.899 | Durbin-Watson: | 2.125 |
---|---|---|---|
Prob(Omnibus): | 0.142 | Jarque-Bera (JB): | 3.893 |
Skew: | 0.210 | Prob(JB): | 0.143 |
Kurtosis: | 3.922 | Cond. No. | 107. |