Supervised
Unsupervised
Self-supervised: supervised learning with no human labels in the loop; the labels are generated from the input data itself by a heuristic algorithm. E.g. autoencoders, where the generated targets are the inputs, unmodified (see the sketch after this list).
Reinforcement Learning
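A minimal sketch of the autoencoder idea in keras (the 784-dimensional input, the layer sizes, and the matrix x_train are illustrative assumptions, not taken from these notes):

library(keras)

# Encoder compresses the input; decoder reconstructs it
autoencoder <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784)) %>%   # encoder
  layer_dense(units = 784, activation = "sigmoid")                         # decoder

autoencoder %>% compile(optimizer = "rmsprop", loss = "mse")

# The "labels" are simply the inputs themselves, unmodified
autoencoder %>% fit(x_train, x_train, epochs = 10, batch_size = 128)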
Goal: generalization, i.e. dealing with overfitting.
Training, validation, and test sets.
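A minimal hold-out split sketch in base R (the 60/20/20 proportions and the data frame name data are assumptions for illustration):

n <- nrow(data)
shuffled <- sample(1:n)                                   # random permutation of the row indices

train_indices      <- shuffled[1:floor(0.6 * n)]
validation_indices <- shuffled[(floor(0.6 * n) + 1):floor(0.8 * n)]
test_indices       <- shuffled[(floor(0.8 * n) + 1):n]

training_data   <- data[train_indices, ]                  # used to fit the model
validation_data <- data[validation_indices, ]             # used to tune hyperparameters
test_data       <- data[test_indices, ]                   # touched only once, at the very end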
K-Fold CV:
k <- 4
indices <- sample(1:nrow(data))                      # random permutation of the row positions
folds <- cut(indices, breaks = k, labels = FALSE)    # randomly assign each row to one of k folds

validation_scores <- c()
for (i in 1:k) {
  validation_indices <- which(folds == i, arr.ind = TRUE)
  validation_data <- data[validation_indices, ]      # fold i is held out for validation
  training_data <- data[-validation_indices, ]       # the remaining k - 1 folds are used for training

  model <- get_model()                               # get_model(), train() and evaluate() are
  model %>% train(training_data)                     # schematic placeholders, not keras functions
  results <- model %>% evaluate(validation_data)
  validation_scores <- c(validation_scores, results$accuracy)
}

validation_score <- mean(validation_scores)          # average validation score over the k folds

model <- get_model()                                 # once the model is tuned, retrain it on all
model %>% train(data)                                # non-test data and evaluate it on the test set
results <- model %>% evaluate(test_data)
Vectorization: all inputs and targets must be turned into tensors of floating-point numbers (data vectorization).
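For example, lists of integer indices can be vectorized by multi-hot encoding them into a matrix of 0s and 1s. A sketch, assuming sequences is a list of integer vectors and a 10,000-entry vocabulary like the one implied by the input_shape = c(10000) models further below (train_sequences in the usage line is likewise hypothetical):

vectorize_sequences <- function(sequences, dimension = 10000) {
  results <- matrix(0, nrow = length(sequences), ncol = dimension)   # all-zero matrix
  for (i in 1:length(sequences))
    results[i, sequences[[i]]] <- 1                                  # set the positions of the present indices to 1
  results
}

x_train <- vectorize_sequences(train_sequences)   # train_sequences: hypothetical list of index vectors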
Value normalization: features should take small values (typically in the 0 to 1 range, or centered around 0 with a small standard deviation) and be homogeneous (all features in roughly the same range).
One crucial point about normalization: you normalize the features of both the training data and the test data, but the mean and SD used for normalization must be computed on the training data only and then applied to both sets; otherwise information from the test set would leak into the workflow.
mean <- apply(train_data, 2, mean)                           # column-wise means, computed on the training data only
std <- apply(train_data, 2, sd)                              # column-wise standard deviations, training data only
train_data <- scale(train_data, center = mean, scale = std)
test_data <- scale(test_data, center = mean, scale = std)    # reuse the training-data statistics
To battle overfitting, we can put constraints on the complexity of the network by forcing its weights to take only small values, which makes the distribution of weight values more regular.
L1 regularization: the added cost is proportional to the absolute value of the weight coefficients.
L2 regularization: the added cost is proportional to the square of the weight coefficients (also called weight decay).
model <- keras_model_sequential() %>%
  layer_dense(units = 16, kernel_regularizer = regularizer_l2(0.001),
              activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, kernel_regularizer = regularizer_l2(0.001),
              activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
regularizer_l2(0.001) means every coefficient in the layer's weight matrix will add 0.001 * weight_coefficient_value^2 to the total loss of the network. Because this penalty is only added at training time, the loss for this network will be much higher at training time than at test time.
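For the L1 variant listed above (the 0.001 penalty factors are just placeholders), keras provides analogous regularizers that can be passed as kernel_regularizer in the same way:

regularizer_l1(0.001)                       # L1 regularization
regularizer_l1_l2(l1 = 0.001, l2 = 0.001)   # simultaneous L1 and L2 regularization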
Dropout: randomly drop out (set to zero) a number of output features of the layer during training. We can tune the dropout rate (the rate argument below), which is the fraction of the features to be zeroed out; it is usually set between 0.2 and 0.5.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid")