The attention mechanism in deep learning improves performance by letting a model focus on the most relevant parts of the input when making predictions, rather than treating all parts equally. In tasks like natural language processing, where the context of words can be crucial, attention enables the model to weigh different words or tokens according to their importance for the task at hand. For example, in machine translation, attention helps the model focus on the most relevant words in the source sentence when generating each word in the target sentence.
This ability to dynamically prioritize parts of the input helps the model capture complex dependencies, improves interpretability, and significantly enhances performance in tasks involving long-range dependencies or variable-length sequences, as seen in models like Transformers.
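The weighting idea described above can be made concrete with the scaled dot-product attention used in Transformers: each query is compared against all keys, the similarities are normalized with a softmax into importance weights, and the output is the weighted sum of the values. The following is a minimal NumPy sketch of that computation; the function names and the tiny toy inputs are illustrative, not from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    # Similarity of every query to every key, scaled by sqrt(d_k)
    # so the softmax does not saturate for large dimensions.
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row sums to 1: the relative importance of each input token.
    weights = softmax(scores, axis=-1)
    # Output is a weighted sum of the value vectors.
    return weights @ V, weights

# Toy example: 2 queries attend over 3 input tokens.
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0], [2.0], [3.0]])
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` shows how strongly one query attends to each token, which is also what makes attention maps useful for interpretability.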