PEFT (parameter-efficient fine-tuning) adapts a pre-trained model while training only a small fraction of its parameters.
LoRA (Low-Rank Adaptation)
Consider a pre-trained weight matrix:
\begin{equation} W_0 \in \mathbb{R}^{d \times k} \end{equation}
Key intuition: the weight-update matrices learned during adaptation have a low intrinsic rank. We therefore consider the following update:
\begin{equation} W_0 + \Delta W = W_0 + \alpha BA \end{equation}
where \(B \in \mathbb{R}^{d \times r}\), \(A \in \mathbb{R}^{r \times k}\), \(r \ll \min(d,k)\), and \(\alpha\) is a scaling factor that trades off pre-trained knowledge against task-specific knowledge. Only \(A\) and \(B\) are trained, so the number of trainable parameters drops from \(dk\) to \(r(d+k)\).
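A minimal sketch of this update in PyTorch (an assumed framework; the class name, initialization choices, and dimensions below are illustrative, not taken from the text): \(W_0\) is frozen and only the low-rank factors \(B\) and \(A\) are trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen pre-trained weight W_0 with a low-rank update alpha * B @ A."""

    def __init__(self, d: int, k: int, r: int, alpha: float = 1.0):
        super().__init__()
        # Frozen pre-trained weight W_0 in R^{d x k}
        self.W0 = nn.Parameter(torch.randn(d, k), requires_grad=False)
        # Trainable low-rank factors: B in R^{d x r}, A in R^{r x k}.
        # B starts at zero so the initial update Delta W = alpha * B @ A is zero.
        self.B = nn.Parameter(torch.zeros(d, r))
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: W_0 + alpha * B A; x has shape (..., k)
        W = self.W0 + self.alpha * (self.B @ self.A)
        return x @ W.T


# Illustrative sizes: only r * (d + k) parameters are trained instead of d * k.
layer = LoRALinear(d=768, k=768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8 * (768 + 768) = 12288, versus 768 * 768 = 589824 for full fine-tuning
```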