When the feature dimension is very small (e.g., 5 or 10), applying dropout can be destructive. With a drop rate of 50%, a 10-dimensional vector loses about half of its components on every forward pass. If those components carry critical, non-redundant information, the model loses signal that the remaining units cannot recover. With dimension 20, there is enough redundancy to survive aggressive dropout while still keeping the model compact.
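To make the redundancy argument concrete, the sketch below computes how many units survive dropout on average and how likely it is that almost nothing survives. It assumes standard Bernoulli dropout, where each unit is dropped independently with probability p. The function names and the "at most 2 units survive" threshold are hypothetical choices for illustration, not part of the original experiments.

```python
from math import comb


def expected_kept(dim: int, p: float) -> float:
    """Expected number of units that survive dropout with drop probability p."""
    return dim * (1.0 - p)


def prob_at_most_kept(dim: int, p: float, k: int) -> float:
    """Probability that at most k of `dim` units survive.

    Assumes each unit is dropped independently with probability p, so the
    number of surviving units follows a Binomial(dim, 1 - p) distribution.
    """
    keep = 1.0 - p
    return sum(comb(dim, i) * keep**i * p ** (dim - i) for i in range(k + 1))


if __name__ == "__main__":
    # Compare a 10-dimensional and a 20-dimensional vector at 50% dropout.
    for dim in (10, 20):
        print(dim, expected_kept(dim, 0.5), prob_at_most_kept(dim, 0.5, 2))
```

At p = 0.5, a 10-dimensional vector is left with two or fewer units in about 5.5% of forward passes, while a 20-dimensional vector hits that case only about 0.02% of the time. This illustrates why doubling the dimension makes aggressive dropout far less likely to wipe out the representation.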

The 20-dimensional model with dropout achieves the best trade-off between generalization (92.7% accuracy) and training stability. The 50-dimensional model overfits without dropout; with dropout it matches the 20-dimensional model but requires more computation.