home.social

#cnns — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #cnns, aggregated by home.social.

  1. "Ah, the riveting showdown between #ViTs and #CNNs, where you get a convoluted explanation on how images are turned into a mush of pixels and self-attention. 😴 But don't worry, you won't be distracted by any tracking or analytics, because apparently, nobody cares to watch this spectacle. 🤷‍♂️ If you can't read code without JavaScript, that's your problem, not ours. 🖼️🔍"
    lucasb.eyer.be/articles/vit_cn #imageprocessing #selfattention #techhumor #HackerNews #ngated

  2. Today I tried out #AMD #Instinct #MI300a for my existing Deep Learning pipeline. Good news: it worked out of the box. Bad news: for some reason it could not beat my local #Nvidia #1080ti...
    After trying all sorts of #ROCM installation methods via prebuilt wheels, #apptainer images etc., I tried #nanogpt by @karpathy and sure enough: the GPT code ran approx. 2x faster than on an #a100... I hope that this is due to my programming skills, not AMD preferring #transformers over #CNNs ...

  3. 'Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds', by Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao.

    jmlr.org/papers/v25/24-0066.ht

    #cnns #cnn #dimensional

  4. I remembered playing with the DeepDream computer vision program, which uses a CNN (Convolutional Neural Network), almost 10 years ago. How fun it was to be able to run it locally on one of my machines, despite it being very slow. Luckily I was able to find an already trained model, because training it locally was not feasible for me.

    #convolutionalneuralnetwork #cnns

  5. A perspective on #chatGPT (or Large Language Models #LLMs in general): #Hype or milestone?

    Rodney Brooks (spectrum.ieee.org/amp/gpt-4-ca) tells us that

    What large language models are good at is saying what an answer should sound like, which is different from what an answer should be.

    For a nice in-depth technical analysis, see this blog post by Stephen Wolfram (himself!) on "What Is ChatGPT Doing ... and Why Does It Work?". Worth reading, even for non-experts, as a non-trivial effort to make the whole process explainable. The different steps are:

    • #LLMs compute probabilities for the next word. To do this, they aggregate huge datasets of text to build a function that, given a sequence of words, computes for every possible word in the dictionary the probability that adding that word is statistically congruent with the preceding ones. Interestingly, this probability, conditioned on what has been observed so far, falls off as a power law, just like the global probability of words in the dictionary (a toy next-word sampling sketch is given at the end of this post),

    • These #probabilities are computed by a function that leans on the dataset to produce the best approximation. Wolfram gives a step-by-step description of how to build such an approximation, starting from linear regression and moving on to non-linearities. This leads to deep learning methods and their potential as universal function approximators (see the curve-fitting sketch at the end of this post),

    • Crucial is how these #models are trained, in particular by way of #backpropagation. This leads the author to describe the process, but also to point out some limitations of the trained model, especially, as you might have guessed, compared to potentially more powerful systems like #cellularautomata of course... (a hand-written gradient-descent sketch is included at the end of this post),

    • This now brings us to #embeddings, the crucial ingredient for representing "words" in these #LLMs. To relate "alligator" to "crocodile" rather than to a "vending machine", this technique computes distances between words based on how similarly they are used in the large text corpus, so that each word is assigned a position in a high-dimensional space, with the intuition that words used in similar contexts should be closer in the embedding space. It is highly non-trivial to understand the geometry of high-dimensional spaces, especially when we try to relate it to our physical 3D space, but this technique has proven to give excellent results (a small cosine-similarity sketch appears at the end of this post). I highly recommend the #cemantix puzzle to test your intuition about word embeddings: cemantle.certitudes.org

    • Finally, these different parts are glued together by a humongous #transformer network. A standard #NeuralNetwork could compute probabilities for the next word, but the results would be mostly nonsensical... Something more is needed to make this work. Just as traditional Convolutional Neural Networks #CNNs hardwire the fact that operations applied to an image should act on nearby pixels first, transformers do not operate uniformly on the sequence of words (i.e., embeddings), but weight them differently to ultimately get a better approximation. Much of the mechanism is a collection of heuristics selected for their performance, but it can be understood as giving different weights to different tokens, based on each token's position and its importance for the meaning of the current sentence. Based on this calculation, the sequence is reweighted so that a probability can finally be computed (a bare-bones attention sketch closes this post). When applied to a sequence to which words are added one at a time, this creates a kind of loop in which the past sequence is constantly re-processed to update the generation.

    • Can we do more and include syntax? Wolfram discusses the internals of #chatGPT, in particular how it was trained to "be a good bot", and adds another possibility: injecting the knowledge that language is organized grammatically, and asking whether #transformers are able to learn such rules. This points to certain limitations of the architecture and to the potential of using graphs as a generalization of geometric rules. The post ends with a comparison of #LLMs, which just aim to sound right, with rule-based models, a debate reminiscent of the older days of AI...
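
    To make the first step concrete, here is a deliberately minimal sketch of next-word probabilities: a bigram count over a made-up corpus. The corpus and code are purely my illustration (nothing from Wolfram's post); a real #LLM conditions on the whole context with a transformer, but the idea of sampling from a conditional distribution is the same.

      # Toy illustration of "probabilities for the next word": a bigram model
      # built from a tiny made-up corpus. Real LLMs condition on the full
      # context; this only ever looks at the last word.
      from collections import Counter, defaultdict
      import random

      corpus = "the cat sat on the mat and the dog sat on the rug".split()

      # Count how often each word follows each other word.
      follows = defaultdict(Counter)
      for prev, nxt in zip(corpus, corpus[1:]):
          follows[prev][nxt] += 1

      def next_word_probs(word):
          counts = follows[word]
          total = sum(counts.values())
          return {w: c / total for w, c in counts.items()}

      print(next_word_probs("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}

      # Generate a continuation word by word, sampling from the conditional distribution.
      word = "the"
      for _ in range(5):
          probs = next_word_probs(word)
          if not probs:      # no observed continuation, stop
              break
          word = random.choices(list(probs), weights=list(probs.values()))[0]
          print(word, end=" ")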
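
    For the approximation step, a small sketch in the same spirit: least-squares fitting of noisy samples of a non-linear function, first with a straight line and then with a handful of non-linear features. The data and numbers are invented for illustration; only NumPy is assumed.

      # Fitting a non-linear target: a straight line struggles, a small set of
      # random tanh features (a crude fixed-weight "hidden layer") does much better.
      import numpy as np

      rng = np.random.default_rng(0)
      x = np.linspace(-3, 3, 100)
      y = np.tanh(x) + 0.05 * rng.standard_normal(x.size)   # noisy target

      # Linear regression: y ≈ a*x + b (least squares).
      A_lin = np.column_stack([x, np.ones_like(x)])
      coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
      err_lin = np.mean((A_lin @ coef_lin - y) ** 2)

      # Same least squares, but on non-linear features.
      hidden = np.tanh(x[:, None] * rng.standard_normal(20) + rng.standard_normal(20))
      A_nl = np.column_stack([hidden, np.ones_like(x)])
      coef_nl, *_ = np.linalg.lstsq(A_nl, y, rcond=None)
      err_nl = np.mean((A_nl @ coef_nl - y) ** 2)

      print(f"linear fit MSE: {err_lin:.4f}, non-linear features MSE: {err_nl:.4f}")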
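
    For the training step, a toy sketch of #backpropagation with the gradients written out by hand via the chain rule. Again this is an illustrative example of mine, far smaller than anything used in practice, and not code from the post.

      # One hidden layer trained by plain gradient descent; every gradient
      # below is just the chain rule applied to the squared error.
      import numpy as np

      rng = np.random.default_rng(1)
      x = np.linspace(-3, 3, 100)[:, None]     # inputs, shape (100, 1)
      y = np.sin(x)                             # target function to learn

      W1 = rng.standard_normal((1, 16)) * 0.5   # input -> hidden weights
      b1 = np.zeros(16)
      W2 = rng.standard_normal((16, 1)) * 0.5   # hidden -> output weights
      b2 = np.zeros(1)
      lr = 0.05

      for step in range(2000):
          h = np.tanh(x @ W1 + b1)              # forward pass
          pred = h @ W2 + b2
          loss = np.mean((pred - y) ** 2)

          d_pred = 2 * (pred - y) / len(x)      # backward pass (chain rule)
          dW2 = h.T @ d_pred
          db2 = d_pred.sum(axis=0)
          d_h = d_pred @ W2.T * (1 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
          dW1 = x.T @ d_h
          db1 = d_h.sum(axis=0)

          W1 -= lr * dW1; b1 -= lr * db1        # gradient descent update
          W2 -= lr * dW2; b2 -= lr * db2

      print(f"final MSE: {loss:.4f}")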
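
    For the embedding step, a sketch of the alligator/crocodile intuition using cosine similarity. The 3-D vectors are made up by hand for illustration; real embeddings have hundreds of dimensions and are learned from the corpus.

      # Words as points in a vector space: related words end up close together.
      import numpy as np

      emb = {
          "alligator":       np.array([0.90, 0.80, 0.10]),   # invented toy vectors
          "crocodile":       np.array([0.85, 0.75, 0.15]),
          "vending machine": np.array([0.10, 0.20, 0.90]),
      }

      def cosine(u, v):
          return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

      for other in ("crocodile", "vending machine"):
          print(f"alligator vs {other}: {cosine(emb['alligator'], emb[other]):.2f}")
      # Similarity close to 1 for crocodile, much lower for the vending machine.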
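
    And for the transformer step, a bare-bones sketch of scaled dot-product attention, the reweighting mechanism described above. The projection matrices are random here, whereas a real model learns them (and also injects positional information into the embeddings).

      # Each token attends to every other token; the softmax row for a token
      # says how strongly the other tokens are weighted when re-encoding it.
      import numpy as np

      rng = np.random.default_rng(2)
      seq_len, d = 4, 8                          # 4 token embeddings of dimension 8
      X = rng.standard_normal((seq_len, d))

      Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
      Q, K, V = X @ Wq, X @ Wk, X @ Wv

      scores = Q @ K.T / np.sqrt(d)              # similarity between queries and keys
      weights = np.exp(scores)
      weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1

      out = weights @ V                          # the reweighted sequence
      print(np.round(weights, 2))                # one row of attention weights per token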