kevindamm t1_jbq3w44 wrote

The analysis isn't as straightforward as that, for a few reasons. Transformer architectures are typically a series of alternating Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP) blocks, where the MHA merges the outputs of several attention heads. Each layer is dominated by a matrix multiply, and if it were all being computed on a CPU a reasonable upper bound would be O(n^3), where n is the width of the widest layer. But the bottleneck isn't how many multiplies a CPU would have to do, because we typically run it on a GPU or TPU, which parallelize most of the additions and multiplies of the matrix ops. The real bottleneck is often the memory copies going to and from the GPU or TPU, and that varies greatly with model size, GPU memory limits, batch size, etc.
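To make that concrete, here's a rough back-of-envelope sketch (not a definitive model; the peak FLOP/s, bus bandwidth, and fp16 weight size below are assumptions you'd swap for your own hardware) comparing the time to compute one layer's matmul against the time to copy that layer's weights over the bus. Which one dominates flips as the layer width grows, which is exactly why the answer depends on the model/hardware combination:

```python
# Rough sketch: matmul compute time vs. weight-transfer time for one square layer.
# The hardware numbers below are assumptions -- substitute your own GPU's specs.

def layer_cost(n: int,
               peak_flops: float = 100e12,    # assumed ~100 TFLOP/s usable GPU compute
               bus_bandwidth: float = 25e9,   # assumed ~25 GB/s effective host-to-device copy
               bytes_per_param: int = 2):     # assumed fp16 weights
    """Estimate compute vs. transfer time for an n x n @ n x n matmul."""
    flops = 2 * n ** 3                        # multiply-adds in the matmul
    compute_s = flops / peak_flops
    transfer_s = (n * n * bytes_per_param) / bus_bandwidth
    return compute_s, transfer_s

for n in (1024, 4096, 16384):
    c, t = layer_cost(n)
    print(f"n={n:>6}: compute ~{c*1e3:.2f} ms, weight copy ~{t*1e3:.2f} ms")
```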

You're better off profiling performance for a particular model and hardware combination.
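As a minimal sketch of that kind of profiling (assuming PyTorch; the toy two-layer model, sizes, and batch are placeholders for whatever you actually run), torch.profiler breaks time down per op and separates host-side activity from GPU kernels and copies:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for one transformer-ish block; swap in your real model and inputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).to(device)
x = torch.randn(32, 4096, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Record both host and device time so memory copies and kernels show up separately.
with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        model(x)

sort_key = "self_cuda_time_total" if device == "cuda" else "self_cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```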

kevindamm t1_j6qmixr wrote

There are four buckets (of unequal size), but I don't know whether success was measured as landing in the "correct" bucket, as being in the highest p(AI-gen) bucket counting as a true positive (TP), or as being in either of the extreme top and bottom buckets. I only read the journalistic article, not the original research, so idk. The 1000-character minimum worries me more; there's quite a lot of text shorter than that (like this comment).
