…has 100% correlation with the target). In this way, the model would not learn anything useful about the grammar or syntax of the language, nor about the meanings of the tokens, and would not be useful for downstream tasks.

Dissecting BERT Part 2: BERT Specifics
Francisco Ingham · Nov 5 · 1 min read

Therefore, it is better suited to decoding than encoding.
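The naive-masking problem described above is what motivates BERT's mixed replacement strategy: of the positions selected for prediction, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, so the input at a predicted position never correlates perfectly with the target. A minimal sketch of that procedure, using a tiny made-up vocabulary (`VOCAB`, `mask_tokens` are illustrative names, not from the article):

```python
import random

# Hypothetical toy vocabulary for illustration; real BERT uses a
# WordPiece vocabulary of roughly 30k subword tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=random):
    """BERT-style masking sketch: each position is selected for
    prediction with probability mask_prob. A selected position is
    replaced by [MASK] 80% of the time, by a random vocabulary
    token 10% of the time, and kept unchanged 10% of the time."""
    masked = list(tokens)
    labels = [None] * len(tokens)  # None = not a prediction target
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK_TOKEN      # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = rng.choice(VOCAB)  # 10%: random token
            # else: 10% of the time, keep the original token
    return masked, labels
```

Because 10% of targets keep their original token, the model cannot rely on the surface form alone and must build a contextual representation of every position.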