<8.8f}".format(mean_squared_error(oof_br_49, target)))
Elastic Net
# Assumes `en` is scikit-learn's ElasticNet, imported earlier under that alias.
folds = KFold(n_splits=5, shuffle=True, random_state=13)
oof_en_49 = np.zeros(train_shape)
predictions_en_49 = np.zeros(len(X_test_49))

for fold_, (trn_idx, val_idx) in enumerate(folds.split(X_train_49, y_train)):
    print("fold n°{}".format(fold_ + 1))
    tr_x = X_train_49[trn_idx]
    tr_y = y_train[trn_idx]
    en_49 = en(alpha=1.0, l1_ratio=0.05)
    en_49.fit(tr_x, tr_y)
    # Out-of-fold predictions for the held-out rows
    oof_en_49[val_idx] = en_49.predict(X_train_49[val_idx])
    # Average the test predictions over the folds
    predictions_en_49 += en_49.predict(X_test_49) / folds.n_splits

print("CV score: {:<8.8f}".format(mean_squared_error(oof_en_49, target)))
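At this point we have the predictions of the four new models above on the 49 features, along with each model's architecture and parameters. For each feature set, the models' out-of-fold predictions are stacked with a simple linear regression, using 5-fold cross-validation repeated twice, which gives one stacked result per feature count.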
# Assumes `lr` is scikit-learn's LinearRegression, imported earlier under that alias.
train_stack4 = np.vstack([oof_br_49, oof_kr_49, oof_en_49, oof_ridge_49]).transpose()
test_stack4 = np.vstack([predictions_br_49, predictions_kr_49, predictions_en_49, predictions_ridge_49]).transpose()

folds_stack = RepeatedKFold(n_splits=5, n_repeats=2, random_state=7)
oof_stack4 = np.zeros(train_stack4.shape[0])
predictions_lr4 = np.zeros(test_stack4.shape[0])

for fold_, (trn_idx, val_idx) in enumerate(folds_stack.split(train_stack4, target)):
    print("fold {}".format(fold_))
    trn_data, trn_y = train_stack4[trn_idx], target.iloc[trn_idx].values
    val_data, val_y = train_stack4[val_idx], target.iloc[val_idx].values
    # LinearRegression
    lr4 = lr()
    lr4.fit(trn_data, trn_y)
    oof_stack4[val_idx] = lr4.predict(val_data)
    # 5 splits x 2 repeats = 10 fits; predict on test_stack4 (the original used test_stack1 here)
    predictions_lr4 += lr4.predict(test_stack4) / 10

mean_squared_error(target.values, oof_stack4)
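One detail of the repeated scheme is worth checking. With `RepeatedKFold(n_splits=5, n_repeats=2)`, every training row lands in a validation fold twice, so an assignment of the form `oof_stack4[val_idx] = ...` keeps only the prediction from the second repeat, while the test prediction is averaged over all 10 fits. The following is a minimal sketch on dummy data; the sample size of 20 and the names `X_dummy`, `val_counts`, and `n_fits` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

n_samples = 20  # hypothetical size, for illustration only
X_dummy = np.zeros((n_samples, 1))

# Same splitter settings as the stacking code above
folds = RepeatedKFold(n_splits=5, n_repeats=2, random_state=7)

val_counts = np.zeros(n_samples, dtype=int)  # how often each row is validated
n_fits = 0
for trn_idx, val_idx in folds.split(X_dummy):
    val_counts[val_idx] += 1
    n_fits += 1

print("number of fits:", n_fits)
print("validation appearances per row:", val_counts)
```

If an average over both repeats is preferred for the out-of-fold predictions, one option is to accumulate with `+=` and divide by the number of repeats.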
7. Model Fusion
Here the predictions of the four stacked ensemble models above are combined in a weighted sum to produce the final result.
mean_squared_error(target.values, 0.7*(0.6*oof_stack2 + 0.4*oof_stack3)+0.3*(0.55*oof_stack1+0.45*oof_stack4))
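The nested weights in the expression above expand to one weight per stacked model: 0.165 for `oof_stack1`, 0.42 for `oof_stack2`, 0.28 for `oof_stack3`, and 0.135 for `oof_stack4`, which sum to 1. A small sketch that checks this equivalence, using random arrays as stand-ins for the real out-of-fold predictions (the array `oof` and its size are assumptions):

```python
import numpy as np

# Outer weight times inner weight, exactly as in the blended expression
w_stack1 = 0.3 * 0.55
w_stack2 = 0.7 * 0.6
w_stack3 = 0.7 * 0.4
w_stack4 = 0.3 * 0.45
weights = np.array([w_stack1, w_stack2, w_stack3, w_stack4])

# Synthetic stand-ins for oof_stack1..oof_stack4 (assumption, not real data)
rng = np.random.default_rng(0)
oof = rng.uniform(1, 5, size=(4, 100))  # rows: stack1, stack2, stack3, stack4

nested = 0.7 * (0.6 * oof[1] + 0.4 * oof[2]) + 0.3 * (0.55 * oof[0] + 0.45 * oof[3])
flat = weights @ oof  # single weighted sum over the four models

print("per-model weights:", weights, "sum:", weights.sum())
print("nested and flat blends agree:", np.allclose(nested, flat))
```

Writing the blend as a single weight vector makes it easier to compare hand-picked weights with weights learned by a second-level model.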
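A better approach is to train one more ensemble on top of the four ensemble models above. Here a simple linear regression is used directly as that second-level learner.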
train_stack5 = np.vstack([oof_stack1, oof_stack2, oof_stack3, oof_stack4]).transpose()
test_stack5 = np.vstack([predictions_lr1, predictions_lr2, predictions_lr3, predictions_lr4]).transpose()

folds_stack = RepeatedKFold(n_splits=5, n_repeats=2, random_state=7)
oof_stack5 = np.zeros(train_stack5.shape[0])
predictions_lr5 = np.zeros(test_stack5.shape[0])

for fold_, (trn_idx, val_idx) in enumerate(folds_stack.split(train_stack5, target)):
    print("fold {}".format(fold_))
    trn_data, trn_y = train_stack5[trn_idx], target.iloc[trn_idx].values
    val_data, val_y = train_stack5[val_idx], target.iloc[val_idx].values
    # LinearRegression
    lr5 = lr()
    lr5.fit(trn_data, trn_y)
    oof_stack5[val_idx] = lr5.predict(val_data)
    predictions_lr5 += lr5.predict(test_stack5) / 10

mean_squared_error(target.values, oof_stack5)
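Because the second-level learner is a linear regression, it learns one coefficient per stacked model, and those coefficients can be read off and compared with hand-picked blend weights. The sketch below illustrates this on synthetic data only: the target, the noise levels, and the names `stack_features` and `meta` are assumptions, not values from the competition.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
y = rng.uniform(1, 5, size=n)  # synthetic target (assumption)

# Four synthetic first-level predictions: the target plus noise of varying size
noise_scales = [0.3, 0.5, 0.7, 0.9]
stack_features = np.column_stack(
    [y + rng.normal(0, s, size=n) for s in noise_scales]
)

# Second-level learner: one coefficient per first-level model
meta = LinearRegression()
meta.fit(stack_features, y)
blended = meta.predict(stack_features)

def mse(a, b):
    """Mean squared error between two 1-D arrays."""
    return float(np.mean((a - b) ** 2))

base_mse = [mse(stack_features[:, j], y) for j in range(4)]
blend_mse = mse(blended, y)

print("learned weights:", meta.coef_, "intercept:", meta.intercept_)
print("base MSE:", base_mse)
print("blend MSE (in-sample):", blend_mse)
```

The in-sample comparison here is optimistic by construction, which is why the code above scores the blend on out-of-fold predictions instead.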
8. Saving the Results
submit_example = pd.read_csv('submit_example.csv', sep=',', encoding='latin-1')
submit_example['happiness'] = predictions_lr5
# Snap predictions that fall very close to an integer label onto that label
submit_example.loc[submit_example['happiness'] > 4.96, 'happiness'] = 5
submit_example.loc[submit_example['happiness'] <= 1.04, 'happiness'] = 1
submit_example.loc[(submit_example['happiness'] > 1.96) & (submit_example['happiness'] < 2.04), 'happiness'] = 2
submit_example.to_csv("submision.csv", index=False)
submit_example.happiness.describe()
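To see what the three threshold rules do to individual values, the same conditions can be applied to a small made-up array. This is only an illustration: the values in `preds` are invented, and NumPy boolean masks stand in for the pandas `.loc` assignments.

```python
import numpy as np

# Made-up predictions chosen to sit on and around the thresholds
preds = np.array([0.9, 1.04, 1.05, 1.97, 2.04, 3.5, 4.96, 4.97, 5.2])

snapped = preds.copy()
snapped[preds > 4.96] = 5                      # just below or above 5 -> 5
snapped[preds <= 1.04] = 1                     # at or below 1.04 -> 1
snapped[(preds > 1.96) & (preds < 2.04)] = 2   # strictly inside (1.96, 2.04) -> 2

print(snapped)
```

Values exactly on an exclusive boundary, such as 2.04 and 4.96, are left unchanged, and predictions near 3 or 4 are not snapped at all by these rules.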