Compress large language models for better performance with SparseGPT

Shaun
Jan 5, 2023
Apps, Software

On Twitter recently, a user named Jay Hack (@mathemagic1an) posted an excerpt of an abstract that deals with how massive language models can be pruned to reduce their size and computational cost without a meaningful loss in accuracy. The abstract was written by Elias Frantar and Dan Alistarh, researchers at IST Austria. In it, the researchers show how language models in the GPT family can be ‘pruned to 50% sparsity without any retraining.’

This is a rather significant breakthrough for the usability of large language models, particularly those in the GPT family. GPT stands for Generative Pre-trained Transformer, and you may already be familiar with the most popular of these systems in the current artificial intelligence landscape: ChatGPT by OpenAI. As outlined in the abstract, the breakthrough was achieved with a new pruning method known as SparseGPT. This method was designed specifically to work on truly large language models like OPT-175B and BLOOM-176B, both of which are openly available and are among the largest models of their kind currently released.

Related: What are lawmakers and regulators doing about AI?

The wording of the abstract is relatively difficult to understand if you don’t have a background in this particular field of scientific inquiry, but I’ll try to break it down into simpler terms. Models like GPT are built on neural networks. Early researchers and pioneers chose to call these systems neural networks because the end goal was to simulate how the human brain, with all of its billions of neurons, operates. In a simple neural network, you always have an input, an output, and a lot of confusing mathematical bits in the middle, arranged in various layers that process the data.

Each ‘neuron’ represents a value, and the neurons connect with other neurons in the system to form a network that can perform calculations to determine output values. Within these hidden layers, there are parameters that transform input data into the eventual output. These parameters are basically dials you can adjust to change the way the network behaves, and we call them weights.
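
To make that a little more concrete, here is a minimal sketch of a single layer, assuming NumPy and using made-up sizes purely for illustration; real language models stack many such layers containing billions of weights.

import numpy as np

# A minimal sketch of one hidden layer, purely for illustration:
# the weights are just numbers that scale and combine the input values.
rng = np.random.default_rng(0)

inputs = rng.normal(size=4)        # four input values (our 'neurons')
weights = rng.normal(size=(4, 3))  # a 4x3 weight matrix: the adjustable 'dials'
biases = np.zeros(3)

# Linear combination of the inputs followed by a simple non-linearity (ReLU)
outputs = np.maximum(inputs @ weights + biases, 0)
print(outputs)  # three output values passed on to the next layer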

We can now accurately prune large language models down to size

Now, with these massive language models, you need to take into account what kind of information these systems have to process. They need to pick up on the patterns that make up what we recognize as letters, numbers, words, phrases and abstract ideas. All of this complex computation is achieved through weights, and in the largest language models those weights number well over a hundred billion.

Basically, this new method, SparseGPT, shows how in some cases more than 100 billion of these individual weights can be ignored while still producing largely accurate results. In even more basic terms, SparseGPT lets us cut down the number of computations that need to take place in these hidden layers of complex digital grey matter without meaningfully degrading the accuracy of the results we obtain, which makes these models cheaper and faster to run.
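
SparseGPT’s actual algorithm is a more sophisticated one-shot, layer-wise procedure, but the basic idea of sparsity can be illustrated with simple magnitude pruning. The sketch below (the function name is mine, and NumPy is assumed) just zeroes out the smallest half of a layer’s weights:

import numpy as np

def prune_to_sparsity(weights, sparsity=0.5):
    # Illustrative magnitude pruning: zero out the smallest-magnitude
    # weights until the requested fraction of them is zero. SparseGPT's
    # real method also adjusts the remaining weights to compensate.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # how many weights to drop
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    return weights * (np.abs(weights) >= threshold)

layer = np.random.default_rng(0).normal(size=(8, 8))
pruned = prune_to_sparsity(layer, 0.5)
print(f"zeroed out: {np.mean(pruned == 0):.0%} of the weights")

Running this on a toy 8x8 weight matrix reports roughly 50% zeros; the point of the paper is that something similar can be done to models with over a hundred billion weights while keeping the output largely accurate.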

Game changing stuff, right? If you’re interested in artificial intelligence, machine learning, or neural networks, we have a host of other tantalizing news bits for you to read. If this wasn’t really your cup of tea, or rather, what you like to read while drinking a cup of tea, we have a bunch of other articles that may interest you. For instance, is it possible that Elon Musk’s Twitter verification crusade was all just clever marketing?


Comments

  1. DrKnow said on January 6, 2023 at 2:31 am

    @John G
    The article is in no way complex. Basic yes.

    >>SparseGPT, shows how, in some cases, more than 100 billion of these individual weights can be ignored, while still producing largely accurate results <<

    The article in no way tries to quantify the benefits of this. How accurate are these results?
    Is "largely accurate" 60%, 80% or 90% compared to the full model?

    On the plus side, this article is WAY better than Shaun's usual drivel that has zero substance.

    I feel that if Shaun really tried he could produce a decent length article on this rather than the couple of hundred words he seems to be commissioned for.

    Come on Shaun, step up and contribute to Ghacks in a useful way.
    Although, sadly, I suspect the next article will be something like 'How to open Office to do….?' and don't forget the useless question mark!

    1. John G. said on January 6, 2023 at 10:19 am

      You’re the first one who has understood my first comment. I know that I am young, but I like computers even though I’m studying botany. Some articles are very interesting and others are easier; however, I like all of them and I do my best to understand all of them. This was really difficult for me at first reading. :[

  2. beaver the cleaver said on January 6, 2023 at 12:45 am

    > I have read it several times and even translated it to my language; it is still too difficult for my computer skills.

    Then Shaun’s usual M$ love fest articles are right up your alley!

    1. John G. said on January 6, 2023 at 10:15 am

      You didn’t understand what I meant.

  3. John G. said on January 5, 2023 at 6:11 pm

    I think that all the bullies who often chase @Shaun are still reading this high-level article, mostly trying to understand what the h*** is being talked about here. I have read it several times and even translated it to my language; it is still too difficult for my computer skills. Thanks for the article.

    1. Anonymous said on January 6, 2023 at 12:29 am

      Elias Frantar, who wrote the paper, is a third-year Computer Science PhD (!) candidate at IST Austria, supervised by Dan Alistarh, working on practical neural network compression. He has already COMPLETED Computer Science bachelor’s and master’s degrees.

      On the other hand, Johnny boy is a student-dash-troll who loves Shaun’s articles and therefore mocks other people for -possibly- not being able to understand something they’re NOT supposed to. So now, johnny boy must get a life by himself as Santa Claus obviously did a lousy job this year.

      The paper is here:
      https://paperswithcode.com/paper/massive-language-models-can-be-accurately

      PS: This is neither SOFTWARE nor APPS; Shaun just categorized a showoff piece as such.

      1. John G. said on January 6, 2023 at 10:12 am

        Your lack of a sense of humour is really awesome, even by your standards. I just wanted to point out that the article was very good and has enormous and clear importance in so many ways for the near future. It’s so difficult to be ironic here, it’s so hard to even try to help… it really seems that some people here are just waiting for @Shaun’s articles so they can throw their miseries all around. That’s not good at all! :[
