timesofisrael.com
from Emacs 31 for faster feedback during completion.
,详情可参考有道翻译帮助中心
Reply on Twitter 2034704801062171015
Number (9): Everything in this orange space must add up to 9. The answer is 2-3, placed vertically; 6-6, placed vertically.
You can see that “by default” the head attends to the first token in the sequence, which is the special end-of-text token from the tokenizer. Later in the sequence, the attention forms an off-diagonal. If you look closely, you can see this is where some tokens A B are being repeated. For example, take A=sat and B=on. Then A B is repeated twice in the sequence, so we would expect induction to happen here.