Publication

Why Deep Transformers are Difficult to Converge? From Computation Order to Lipschitz Restricted Parameter Initialization

Josef van Genabith; Hongfei Xu; Qiuhui Liu; Jingyi Zhang
Not specified.

Abstract

..