#cemantix — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #cemantix, aggregated by home.social.
-
Je suis très content pour vous quand vous trouvez #cemantix, #pedantix, #cetautomatix ou #idefix, mais j’aimerais bien pouvoir ignorer ces hashtags depuis un client mobile Mastodon. Est-ce qu’il y en a un qui permet ça facilement ?
-
A perspective on #chatGPT (or Large Language Models #LLMs in general): #Hype or milestone?
[Rodney Brooks (https://spectrum.ieee.org/amp/gpt-4-calm-down-2660261157) tells us that
What large language models are good at is saying what an answer should sound like, which is different from what an answer should be.
For a nice in-depth technical analysis, see this blog post by Stephen Wolfram (himself!) on "What is ChatGPT Doing ... and Why Does It Work? ". Worth reading -even for non-experts- in a non-trivial effort to make the whole process explainable. The different steps are:
#LLMs compute probabilities for the next word. To do this, they aggregate huge datasets of text so that they create a function that, given a sequence of words, computes for all possible words in the dictionary the probability that adding this new word is statistically congruent with past words. Interestingly, this probability, conditioned on what has been observed so far, falls of as a power law, just like the global probability of words in the dictionary,
These #probabilities are computed by a function that leans on the dataset to generate the best approximation. Wolfram makes a minute description of how to do such an approximation, starting from linear regression to using non-linearities. This leads to deep learning methods and their potential for universal function approximators,
Crucial is how these #models are trainable, in particular by way of #backpropagation. This leads the author to describe the process, but also to point out some limitations of the trained model, especially, as you might have guessed, compared to potentially more powerful systems, like #cellularautomata of course...
This now brings us to #embeddings, the crucial ingredient to define "words" in these #LLMs models. To relate "alligator" to "crocodile" vs. a "vending machine," this technique computes distances between words based on their relative distance in the large dataset of text corpus, so that each word is assigned an address in a high-dimensional space, with the intuition that words that are syntactically closer should be closer in the embedding space. It is highly non-trivial to understand the geometry of high-dimensional spaces - especially when we try to relate it to our physical 3D space - but this technique has proven to give excellent results, I highly recommend the #cemantix puzzle to test your intuition about word embeddings: https://cemantle.certitudes.org
Finally, these different parts are glued together by a humongous #transformer network. A standard #NeuralNetwork could perform a computation to predict the probabilities for the next word, but the results would mostly give nonsensical answers... Something more is needed to make this work. Just as traditional Convolutional Neural Networks #CNNs hardwire the fact that operations applied to an image should be applied to nearby pixels first, transformers do not operate uniformly on the sequence of words (i.e., embeddings), but weight them differently to ultimately get a better approximation. It is clear that much of the mechanism is a bunch of heuristics selected based on their performance - but we can understand the mechanism as giving different weights to different tokens - specifically based on the position of each token and its importance in the meaning of the current sentence. Based on this calculation, the sequence is reweighted so that a probability is ultimately computed. When applied to a sequence of words where words are added progressively, this creates a kind of loop in which the past sequence is constantly re-processed to update the generation.
Can we do more and include syntax? Wolfram discusses the internals of #chatGPT, and in particular how it trained iOS to "be a good bot" - and adds another possibility, which is to inject the knowledge that language is organized grammatically, and whether #transformers are able to learn such rules. This points to certain limitations of the architecture and the potential of using graphs as a generalization of geometric rules. The post ends with a comparison of #LLMs, which just aim to sound right, with rule-based models, a debate reminiscent of the older days of AI...