Feature Engineering
  • Feature selection: selecting the most useful features to train on among existing features
  • Feature extraction: combining existing features to produce a more useful one, for example through dimensionality reduction
  • Creating new features by gathering new data
  • Encoding Categorical Features
  • Count encoding: replaces each categorical value with the number of times it appears in the dataset
  • Target encoding: replaces each categorical value with the average value of the target for that value of the feature
  • CatBoost encoding: similar to target encoding, based on the target probability for a given value, but computed for each row using only the rows before it
  • Kaggle Tutorial
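
The count and target encodings described above can be sketched in plain Python. This is a minimal illustration, not the tutorial's implementation: the helper names `count_encode` and `target_encode` and the toy `colors` / `clicked` data are hypothetical.

```python
from collections import Counter, defaultdict


def count_encode(values):
    """Replace each category with the number of times it appears."""
    counts = Counter(values)
    return [counts[v] for v in values]


def target_encode(values, targets):
    """Replace each category with the mean target observed for that category."""
    counts = Counter(values)
    sums = defaultdict(float)
    for value, target in zip(values, targets):
        sums[value] += target
    means = {value: sums[value] / counts[value] for value in counts}
    return [means[v] for v in values]


# Hypothetical toy data: a categorical feature and a binary target.
colors = ["red", "blue", "red", "green", "red", "blue"]
clicked = [1, 0, 1, 0, 0, 1]

count_encoded = count_encode(colors)             # [3, 2, 3, 1, 3, 2]
target_encoded = target_encode(colors, clicked)  # red -> 2/3, blue -> 0.5, green -> 0.0
```

In practice the target means should be learned from the training split only and then applied to validation and test data, otherwise the encoding leaks the target.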
  • Correlated Features
  • Why Remove Correlated Features
  • Greedy
  • Recursive Feature Elimination (RFE)
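
One possible reading of the greedy approach is a simple filter: walk through the features in order and keep one only if it is not strongly correlated with any feature already kept. The sketch below assumes Pearson correlation and a cutoff of 0.9; the function names, the threshold, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: height_in is height_cm in different units.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30.0, 22.0, 41.0, 35.0, 27.0],
}

selected = drop_correlated(features)  # height_in is dropped as redundant
```

The result depends on feature order, since the first feature of a correlated pair is the one that survives. Recursive Feature Elimination works differently: it repeatedly fits a model and discards the least important features. scikit-learn ships an implementation as `sklearn.feature_selection.RFE`.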
  • Dimensionality Reduction
  • Lasso Regularisation
  • Principal Component Analysis (PCA)
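
As an illustration of PCA, the sketch below centers the data and uses NumPy's singular value decomposition to find the principal directions. It assumes NumPy is available; the `pca` helper and the synthetic data are hypothetical, and a library implementation such as scikit-learn's `PCA` would normally be used instead.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values**2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, components, ratio


# Hypothetical synthetic data: the second column is almost a multiple of the
# first, so a single component should capture nearly all of the variance.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(scale=0.1, size=200),
])

projected, components, ratio = pca(X, n_components=1)
```

Here `ratio` holds the fraction of total variance explained by each kept component, which is a common way to decide how many components to retain.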
  • Reference
  • Wiki
  • Curse of Dimensionality
  • In supervised learning, why is it bad to have correlated features?