#deepread — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #deepread, aggregated by home.social.

Leshem Choshen @[email protected] · 2023-10-13 · 14:40 UTC

Back in the days of 2021
there was a lovely evaluation paper:
Automatically identifying label errors
Improving score's reliability
Finding example's difficulty
Active Learning
https://aclanthology.org/2021.acl-long.346/
@par @hoyle
#machinelearning #evaluation #IRT #LLM #deepRead

Leshem Choshen @[email protected] · 2023-08-30 · 11:26 UTC

Did you know:
Evaluating a single model on HELM took
⏱️4K GPU hours or 💸$10K+ in API calls?!
Flash-HELM⚡️can reduce costs by 200x!
https://arxiv.org/abs/2308.11696
#deepRead #machinelearning #evaluation #eval #nlproc #NLP #LLM

Leshem Choshen @[email protected] · 2023-08-09 · 07:02 UTC

The newFormer is introduced,
but what do we really know about it?
@ari and others
imagine a new large-scale architecture &
ask how you would interpret its abilities and behaviours 🧵
https://arxiv.org/abs/2308.00189
#deepRead #NLProc #MachineLearning

Leshem Choshen @[email protected] · 2023-03-20 · 12:59 UTC

@mega Linear transformations can skip over layers, even till the end
We can see 👀 what the network 🧠 thought!
We can stop🛑 generating at early layers!
https://arxiv.org/abs/2303.09435v1
#NLProc #deepRead

Leshem Choshen @[email protected] · 2023-03-20 · 12:54 UTC

🔎What's in a layer?🌹🕵🏻‍♀️
Representations are vectors
If only they were words...
Finding:
Any layer can be mapped well to another linearly
Simple, efficient & interpretable
& improves early exit
https://arxiv.org/abs/2303.09435v1
Story and 🧵
#nlproc #deepRead #MachinLearning

Leshem Choshen @[email protected] · 2023-03-15 · 08:41 UTC

Mindblowing pretraining paradigm
Train the same model to predict the two directions separately
Better results, more parallelization
https://arxiv.org/abs/2303.07295
#deepRead #nlproc #pretraining #machinelearning

Leshem Choshen @[email protected] · 2023-01-23 · 12:20 UTC

3 reasons for hallucinations started
only 2 prevailed
Finding how networks behave while hallucinating, they
filter hallucinations (with great success)
https://arxiv.org/abs/2301.07779
#NLProc #neuralEmpty #NLP #deepRead

Leshem Choshen @[email protected] · 2022-12-07 · 08:29 UTC

What neurons determine agreement in multilingual LLMs?
#deepRead but some answers:
Across languages: 2 distinct ways to encode syntax
Share neurons, not info
Autoregressive models have dedicated syntax neurons (in MLMs they are just spread across)
@[email protected] yu xia @[email protected] #conllLivetweet2022
