50 results for arxiv.org:

Submitted by LesleyFair t3_10fw22o in deeplearning

Think About Scaling LLMs In 2020, a team of researchers from OpenAI released a [paper](https://arxiv.org/pdf/2001.08361.pdf) called: “Scaling Laws For Neural Language Models”. They observed a predictable decrease in training loss when increasing ... that is what people did. The models got larger and larger with GPT-3 (175B), [Gopher](https://arxiv.org/pdf/2112.11446.pdf) (280B), [Megatron-Turing NLG](https://arxiv.org/pdf/2201.11990) (530B) just to name a few. But the bigger ... number of training tokens should double as well. This was published in DeepMind’s 2022 [paper](https://arxiv.org/pdf/2203.15556.pdf): “Training Compute-Optimal Large Language Models” The researchers fitted over 400 language models ranging from

68

InfuriatinglyOpaque t1_ivb9otw wrote

Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling Imitation Learning in Minecraft. [http://arxiv.org/abs/2007.02701](http://arxiv.org/abs/2007.02701) Bramlage, L., & Cortese, A. (2021). Generalized Attention-Weighted Reinforcement Learning. Neural Networks. [https://doi.org/10.1016/j.neunet.2021.09.023](https://doi.org/10.1016/j.neunet.2021.09.023) Frey ... Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44(6), e12845. [http://arxiv.org/abs/1912.07199](http://arxiv.org/abs/1912.07199) Kumaran, V., Mott, B. W., & Lester, J. C. (2019). Generating Game Levels for Multiple Distinct Games with ... Hjelm, D., Bachman, P., & Courville, A. (2021). Pretraining Representations for Data-Efficient Reinforcement Learning. [http://arxiv.org/abs/2106.04799](http://arxiv.org/abs/2106.04799) Sibert, C., Gray, W. D., & Lindstedt, J. K. (2017). Interrogating Feature Learning Models to Discover Insights

3

Nameless1995 t1_iyyl3m5 wrote

Technically, LaMDA already uses an "external database", i.e., external tools (the internet, a calculator, etc.), to retrieve information: https://arxiv.org/pdf/2201.08239.pdf (Section 6.2). It doesn't solve /u/ThePahtomPhoton's memory problem (I don't remember what GPT3 ... GPT3 level). One solution is using a kNN lookup in a non-differentiable manner: https://arxiv.org/abs/2203.08913 Another solution is making Transformers semi-recurrent (process inside chunks in parallel, then sequentially process some coarse-compressed-chunk ... representation sequentially). This can allow information to be carried through the sequential process: https://arxiv.org/pdf/2203.07852 https://openreview.net/forum?id=mq-8p5pUnEX Another solution is to augment the Transformer with a State Space model, which has shown great

13
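A rough illustration of the kNN-lookup idea mentioned in the comment above, not the method from the linked paper: the sketch below keeps a small external memory of key/value pairs, picks the top-k keys by dot product (the hard selection is the non-differentiable step), and returns a softmax-weighted average of the matching values. The function name `knn_memory_lookup` and the toy data are hypothetical.

```python
import math

def knn_memory_lookup(query, keys, values, k=2):
    """Return the top-k memory indices and a softmax-weighted average of their values."""
    # Score every stored key by its dot product with the query.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Hard top-k selection: no gradient flows through this choice.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the retrieved scores only (shifted by the max for stability).
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, top)) / total
           for d in range(dim)]
    return top, out

# Toy memory: three 2-d keys, each paired with a 1-d value (illustrative only).
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[10.0], [20.0], [30.0]]
top, out = knn_memory_lookup([1.0, 0.0], keys, values, k=2)
print(top, out)
```

In a real model the memory would hold cached attention keys and values, and the lookup would use an approximate nearest-neighbour index rather than a full scan.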

FrogBearSalamander t1_jc5vvrb wrote

Would love to read some research papers if you have a link! - [Nonlinear Transform Coding](https://arxiv.org/abs/2007.03034) - [An Introduction to Neural Data Compression](https://arxiv.org/abs/2202.06533) - [SoundStream: An End-to-End Neural Audio Codec ... arxiv.org/abs/2107.03312) - Old but foundational: [End-to-end Optimized Image Compression](https://arxiv.org/abs/1611.01704) - And this paper made the connection between compression models and VAEs: [Variational image compression with a scale hyperprior](https://arxiv.org/abs/1802.01436) ... that SoundStream (mentioned above) uses residual VQ (RVQ). - [Image Compression with Product Quantized Masked Image Modeling](https://arxiv.org/abs/2212.07372) uses a kind of VQ (subdivide the latent vectors and code separate to form a product

2
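For readers unfamiliar with the product-quantization idea in the last bullet above (subdivide the latent vector and code each part separately), here is a minimal sketch. It assumes equal-sized chunks and one small codebook per chunk; the helper names and toy codebooks are hypothetical and are not taken from the linked paper.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def product_quantize(latent, codebooks):
    """Split latent into len(codebooks) equal chunks and code each chunk separately."""
    n = len(codebooks)
    size = len(latent) // n
    chunks = [latent[j * size:(j + 1) * size] for j in range(n)]
    return [nearest(cb, ch) for cb, ch in zip(codebooks, chunks)]

def reconstruct(indices, codebooks):
    """Concatenate the selected codebook entries back into one vector."""
    out = []
    for idx, cb in zip(indices, codebooks):
        out.extend(cb[idx])
    return out

# Two toy codebooks with two 2-d entries each (illustrative values only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = product_quantize([0.9, 1.2, 0.8, 0.1], codebooks)
approx = reconstruct(codes, codebooks)
print(codes, approx)
```

With two codebooks of two entries each, the pair of indices can address four distinct reconstructions, which is where the "product" in the name comes from.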

Submitted by LesleyFair t3_11alh40 in singularity

www.siegemedia.com/seo/most-popular-keywords#:~:text=The) winner of most popular,or "weather" for short. \[5\] [https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA](https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA) \[6\] [https://arxiv.org/pdf/2112.09332.pdf](https://arxiv.org/pdf/2112.09332.pdf) \[7\] [https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/](https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/) \[8\] [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762) \[9\] [https://arxiv.org/abs/2201.08239](https://arxiv.org/abs/2201.08239) \[10\] [https://arxiv.org/abs/2112.04426](https://arxiv.org/abs/2112.04426) ... www.quora.com/What-percentage-of-web-search-queries-are-navigational](https://www.quora.com/What-percentage-of-web-search-queries-are-navigational) \[13\] [https://www.statista.com/statistics/413229/search-query-size-search-engine-share/](https://www.statista.com/statistics/413229/search-query-size-search-engine-share/) \[14\] [https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b](https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b) \[15\] [https://businessquant.com/microsoft-revenue-by-product](https://businessquant.com/microsoft-revenue-by-product) \[16\] [https://arxiv.org/abs/2209.01667](https://arxiv.org/abs/2209.01667)

6

Submitted by fromnighttilldawn t3_y11a7r in MachineLearning

popular practice/belief is unsound or useless. Some famous examples are: **Troubling Trends in ML** [https://arxiv.org/pdf/1807.03341.pdf](https://arxiv.org/pdf/1807.03341.pdf) **ML that Matters** [https://arxiv.org/abs/1206.4656](https://arxiv.org/abs/1206.4656) **On the Convergence of ADAM** [https://arxiv.org/abs/1904.09237](https://arxiv.org/abs/1904.09237) **On the Information Bottleneck ... iopscience.iop.org/article/10.1088/1742-5468/ab3985](https://iopscience.iop.org/article/10.1088/1742-5468/ab3985) **Implementation Matters in Deep Policy Gradients** [https://arxiv.org/abs/2005.12729](https://arxiv.org/abs/2005.12729) (showed a certain purported algorithm gain is actually mainly due to code-level optimization) **Critique of Turing Award** [https://people.idsia.ch/\~juergen/critique-turing-award-bengio-hinton-lecun.html](https://people.idsia.ch/~juergen/critique-turing-award-bengio-hinton-lecun.html) ... basically a critique on the citation practice in ML) **Deep Learning a Critical Appraisal** [https://arxiv.org/abs/1801.00631](https://arxiv.org/abs/1801.00631) However, these are a little bit dated. Does anyone have any recent critique papers of similar flavour

131

Submitted by mjrossman t3_11ws42u in Futurology

trend has been AI's societal impact. if anyone's read the [recent job impact paper](https://arxiv.org/abs/2303.10130), one of the factors that jumped out was the exposure of blockchain engineering to AI-based ... function of any group of market participants. with respect to ML frameworks like [sparsely-gated MoE](https://arxiv.org/abs/1701.06538v1), [world models](https://arxiv.org/abs/2301.04104v1), [multimodality](https://arxiv.org/abs/2303.03378), and [adaptive agents](https://arxiv.org/abs/2301.07608):

17

qalis t1_j8driqb wrote

help. A bit of self promotion, but my Master's thesis was about GNNs: [https://arxiv.org/abs/2211.03666](https://arxiv.org/abs/2211.03666). It should be very beginner-friendly, since I had to write it while also learning about this step ... articles are also great, e.g. [https://distill.pub/2021/gnn-intro/](https://distill.pub/2021/gnn-intro/) or a well known (in this field) [https://arxiv.org/abs/1901.00596](https://arxiv.org/abs/1901.00596). You should also definitely read papers about GCN (very intuitively written), GAT, GraphSAGE and GIN, the most ... with **a lot** of suspicion. This paper about fair comparison is becoming more and more used: [https://arxiv.org/abs/1912.09893](https://arxiv.org/abs/1912.09893). This baseline, not GNN but similar, gives very strong results: [https://arxiv.org/abs/1811.03508](https://arxiv.org/abs/1811.03508). I will

3

cnapun t1_j10a9jz wrote

better or worse results. Some not super-recent papers I can think of: [https://research.google/pubs/pub50257/](https://research.google/pubs/pub50257/) [https://arxiv.org/abs/1706.07567](https://arxiv.org/abs/1706.07567) [https://arxiv.org/abs/2010.14395](https://arxiv.org/abs/2010.14395) [https://arxiv.org/abs/1907.00937](https://arxiv.org/abs/1907.00937) (3.2) [https://arxiv.org/abs/2006.11632](https://arxiv.org/abs/2006.11632) (2.2/2.4

5

serge_cell t1_j5akgwk wrote

several years ago and in [this same subreddit too](https://www.reddit.com/r/MachineLearning/comments/a8xjh0/d_im_tired_of_reading_resultsoriented_papers_what/). For example: https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1811.03804 https://arxiv.org/abs/1811.03962 https://arxiv.org/abs/1811.08888 This is a recurring question; people ask it every year

1

benanne OP t1_j427zj0 wrote

very easy to use architectures where computation is largely decoupled from the sequence length, like Perceivers (https://arxiv.org/abs/2103.03206, https://arxiv.org/abs/2107.14795), or Recurrent Interface Networks (https://arxiv.org/abs/2212.11972). This is highly speculative though ... aware that an autoregressive variant of the Perceiver architecture exists (https://arxiv.org/abs/2202.07765), but it is actually quite a bit less general/flexible than Perceiver IO / the original Perceiver

2

olmec-akeru OP t1_iy2zjoi wrote

arxiv.org/pdf/2204.04273.pdf](https://arxiv.org/pdf/2204.04273.pdf) [https://arxiv.org/pdf/2203.09347.pdf](https://arxiv.org/pdf/2203.09347.pdf) [https://arxiv.org/pdf/2206.06513.pdf](https://arxiv.org/pdf/2206.06513.pdf) and the one speaking to categorical variables: [https://arxiv.org/pdf/2112.00362.pdf](https://arxiv.org/pdf/2112.00362.pdf)

11

prototypist t1_j0c5p2j wrote

human-like decoder for language models and seeing what outputs humans prefer. Transformers supports [typical decoding](https://arxiv.org/abs/2202.00666) and [contrastive search](https://huggingface.co/blog/introducing-csearch), and there are papers and code out for [RankGen ... arxiv.org/abs/2205.09726), [Time Control](https://arxiv.org/abs/2203.11370), and [Contrastive Decoding](https://arxiv.org/abs/2210.15097) (which is totally different from contrastive search

3

JNmbrs t1_isgqdyr wrote

work on these systems, the work seems to focus on improvements in (a) search algorithms (e.g., [https://arxiv.org/pdf/2110.12485.pdf](https://arxiv.org/pdf/2110.12485.pdf)); (b) program abstraction/library compression (e.g., [https://mlb2251.github.io/stitch\_jul11.pdf](https://mlb2251.github.io/stitch_jul11.pdf) and [http://andrewcropper.com/pubs/aaai20-forgetgol.pdf](http://andrewcropper.com/pubs/aaai20-forgetgol.pdf)); ... optimizing neural guidance (e.g., [https://openreview.net/pdf?id=rCzfIruU5x5](https://openreview.net/pdf?id=rCzfIruU5x5) and [https://arxiv.org/pdf/2206.05922.pdf](https://arxiv.org/pdf/2206.05922.pdf)); and (d) specification (e.g., [https://arxiv.org/pdf/2007.05060.pdf](https://arxiv.org/pdf/2007.05060.pdf) and [https://arxiv.org/pdf/2204.02495.pdf](https://arxiv.org/pdf/2204.02495.pdf)). While obviously work proceeds in these (and other related) domains, I'd love

1

Throwaway00000000028 t1_iy42ker wrote

Blog: [https://yang-song.net/blog/2021/score/](https://yang-song.net/blog/2021/score/) Youtube videos: [https://www.youtube.com/watch?v=fbLgFrlTnGU](https://www.youtube.com/watch?v=fbLgFrlTnGU) Seminal papers: \- Denoising Diffusion Probabilistic Models: [https://arxiv.org/abs/2006.11239](https://arxiv.org/abs/2006.11239) \- Improved Techniques for Training Score-based Generative Models: [https://arxiv.org/abs/2006.09011](https://arxiv.org/abs/2006.09011) \- Hierarchical Text-Conditional Image Generation with ... CLIP Latents: [https://arxiv.org/abs/2204.06125](https://arxiv.org/abs/2204.06125) Review papers: \- Understanding Diffusion Models: [https://arxiv.org/pdf/2208.11970.pdf](https://arxiv.org/pdf/2208.11970.pdf)

2

tariban t1_irw5z8d wrote

problems, despite many claims to the contrary: * [Tabular Data: Deep Learning is Not All You Need](http://arxiv.org/abs/2106.03253) * [In Search of Lost Domain Generalization](http://arxiv.org/abs/2007.01434) * [Unsupervised Domain Adaptation: A Reality Check ... arxiv.org/abs/2111.15672) * [A Baseline for Few-Shot Image Classification](http://arxiv.org/abs/1909.02729)

13

dangerhexagon t1_j4x2yrp wrote

There's some papers on applying transformers to trees: [https://arxiv.org/abs/1909.06639](https://arxiv.org/abs/1909.06639) , [https://arxiv.org/abs/1911.09983](https://arxiv.org/abs/1911.09983) , [https://papers.nips.cc/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html](https://papers.nips.cc/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html) And some recent work on tree extraction: [https://arxiv.org/abs/2301.00447](https://arxiv.org/abs/2301.00447) There's also this paper which recovers ... tree by observing the leaf nodes: [https://arxiv.org/abs/2208.14924](https://arxiv.org/abs/2208.14924)

8

BerenMillidge t1_iy814ur wrote

view them, is as an idealised exploration of a specific limit of PC. In recent work (https://arxiv.org/pdf/2206.02629), we expand on this limit idea and show that all current EBM approximations to BP, such ... number of its properties. We also have a more theoretical analysis of standard PC (https://arxiv.org/pdf/2207.12316) where we show that although it differs from backprop, it can also converge to minima of a supervised ... advantages of PC over BP including the ability for it to learn arbitrary recurrent computation graphs (https://arxiv.org/pdf/2201.13180), the fact that you can significantly speed it up with incremental variants, and that

2

DinosParkour t1_iy7j1hw wrote

choosing the most suitable ones) when it comes to computing the query-doc similarity. \[1\] [https://arxiv.org/abs/2201.10005](https://arxiv.org/abs/2201.10005) \[2\] [https://github.com/facebookresearch/faiss/](https://github.com/facebookresearch/faiss/) \[3\] [https://arxiv.org/abs/2107.05720](https://arxiv.org/abs/2107.05720) \[4\] [https://arxiv.org/abs/2004.12832](https://arxiv.org/abs/2004.12832) \[5\] [https://arxiv.org/abs/2211.01267](https://arxiv.org/abs/2211.01267)

15

Submitted by kizumada t3_11rfxca in MachineLearning

understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters. ERNIE 1.0: [https://arxiv.org/abs/1904.09223](https://arxiv.org/abs/1904.09223) ERNIE 2.0: [https://arxiv.org/abs/1907.12412](https://arxiv.org/abs/1907.12412) ERNIE 3.0: [https://arxiv.org/abs/2112.12731](https://arxiv.org/abs/2112.12731) ERNIE for text-to-image ... arxiv.org/abs/2210.15257](https://arxiv.org/abs/2210.15257) ERNIE Bot live-stream on YouTube: [https://www.youtube.com/watch?v=ukvEUI3x0vI](https://www.youtube.com/watch?v=ukvEUI3x0vI)

31

Submitted by IamTimNguyen t3_105v7el in MachineLearning

papers: Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes: [https://arxiv.org/abs/1910.12478](https://arxiv.org/abs/1910.12478) Tensor Programs II: Neural Tangent Kernel for Any Architecture: [https://arxiv.org/abs/2006.14548](https://arxiv.org/abs/2006.14548) Tensor Programs III: Neural ... Matrix Laws: [https://arxiv.org/abs/2009.10685](https://arxiv.org/abs/2009.10685) Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks: [https://proceedings.mlr.press/v139/yang21c.html](https://proceedings.mlr.press/v139/yang21c.html) Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer: [https://arxiv.org/abs/2203.03466](https://arxiv.org/abs/2203.03466)

401

K3tchM t1_j46kidw wrote

have [this survey about ML for Combinatorial Optimization](https://arxiv.org/abs/1811.06128) from Bengio, Lodi, and Provost. OpenAI's paper about a [robot hand learning to solve a rubik's cube](https://arxiv.org/abs/1910.07113) Also check ... aims to combine neural network learning with logic-based reasoning. Gary Marcus wrote [an extensive note](https://arxiv.org/pdf/2002.06177.pdf) on the subject that I recommend as well

7

blazejd OP t1_ix7mr03 wrote

merging the two concepts of language models and RL-based feedback. Some papers mentioned are: [https://arxiv.org/abs/2203.02155](https://arxiv.org/abs/2203.02155) and ["Experience Grounds Language"](https://aclanthology.org/2020.emnlp-main.703/) (although I didn't read them entirely yet). We could ... looking for more related resources, my thoughts were inspired by the field of language emergence ([https://arxiv.org/pdf/2006.02419.pdf](https://arxiv.org/pdf/2006.02419.pdf)) and this work ([https://arxiv.org/pdf/2112.11911.pdf](https://arxiv.org/pdf/2112.11911.pdf)).

3

MetaAI_Official OP t1_izfk9ug wrote

could tackle along the way. That led to our papers on [human-level no-press Diplomacy](https://arxiv.org/abs/2010.02923), [no-press Diplomacy from scratch](https://arxiv.org/abs/2110.02924), [better modeling of humans in no-press Diplomacy ... proceedings.mlr.press/v162/jacob22a.html), and [expert-level no-press Diplomacy](https://arxiv.org/abs/2210.05492).

7

Aseyhe t1_jc6ofrj wrote

gravity. Beyond these, here are articles discussing the point further: (1) [A diatribe on expanding space](https://arxiv.org/abs/0809.4573). This is pretty technical, but it's the most direct attack on the idea of expanding ... cosmic expansion is simply not relevant to it. (2) [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081). Very well written and less technical, although there are mathematical arguments. The main point of this ... space is nonexistent, not merely negligible. (3) [On The Relativity of Redshifts: Does Space Really "Expand"?](https://arxiv.org/abs/1605.08634) The least technical of the batch, this article is also focused on the interpretation

53

eyeofthephysics t1_jbhu9d4 wrote

just tuned for sentiment analysis. There are two groups who developed models they called FinBERT [https://arxiv.org/abs/1908.10063](https://arxiv.org/abs/1908.10063) and [https://arxiv.org/abs/2006.08097](https://arxiv.org/abs/2006.08097). The first paper's model can be found [here](https://olab.research.google.com/drive/1hFJrZXZBClzz6Fqkb9kbETYZqS2qdbj3?authuser=1#scrollTo=0Ph5eRsIqWA7) ... tasks. Since you're interested in text embeddings, you may also be interested in this paper [https://arxiv.org/pdf/2111.00526.pdf](https://arxiv.org/pdf/2111.00526.pdf). The focus of that paper is sentiment analysis, but the general idea of using a sentence

2

1azytux OP t1_jd2ho88 wrote

papers given : \- [Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](https://arxiv.org/abs/2209.09513) \- [Multimodal Chain-of-Thought Reasoning in Language Models](https://arxiv.org/abs/2302.00923) and such .. with general chain of thought ... idea for language can be looked at [this paper](https://arxiv.org/abs/2201.11903). I'm not sure if the link you provided will work, but as it's huge I might have missed (I've glanced

1

ttt05 t1_j0ju037 wrote

looks like I messed up the years, but all of these are good references) 1. MSP: [https://arxiv.org/abs/1610.02136](https://arxiv.org/abs/1610.02136) 2. OE: [https://arxiv.org/pdf/1812.04606.pdf](https://arxiv.org/pdf/1812.04606.pdf) 3. One vs all: [https://arxiv.org/abs/2007.05134](https://arxiv.org/abs/2007.05134)

15

Aseyhe t1_jaka1l0 wrote

Further reading on *expanding space* not being a physically real phenomenon: * [A diatribe on expanding space](https://arxiv.org/abs/0809.4573) * [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081) * [On The Relativity of Redshifts: Does ... Space Really "Expand"?](https://arxiv.org/abs/1605.08634) Further reading on cosmological dynamics with Newtonian gravity: * [The dynamics of Newtonian cosmology](https://web.mit.edu/8.286/www/lecn18/ln03-euf18.pdf) * or more generally, just search for "Newtonian cosmology

18

Aseyhe t1_j2kql8y wrote

public consciousness, here are some articles discussing the point further. (1) [A diatribe on expanding space](https://arxiv.org/abs/0809.4573). This is pretty technical, but it's the most direct attack on the idea of expanding ... expansion is simply no longer relevant to it. (2) [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081). Very well written and less technical, although there are mathematical arguments. The main point of this ... viewed as just a Doppler shift. (3) [On The Relativity of Redshifts: Does Space Really "Expand"?](https://arxiv.org/abs/1605.08634) The least technical of the batch. This article is also focused on the interpretation

17

activatedgeek t1_j9jvj8h wrote

prefer functions that handle translation equivariance (not exactly true but only roughly due to pooling layers). https://arxiv.org/abs/1806.01261 Graph neural networks provide a relational inductive bias. https://arxiv.org/abs/1806.01261 Neural networks overall prefer simpler ... solutions, embodying Occam’s razor, another inductive bias. This argument is made theoretically using Kolmogorov complexity. https://arxiv.org/abs/1805.08522

107

adt t1_j9neq5w wrote

optimizations mean that you can squish models onto modern GPUs now (i.e. [int8](https://arxiv.org/abs/2208.07339) etc.). Designed to be fit onto a standard GPU, DeepMind Gato was bigger than I thought, with starting size ... paper, which compresses the models to 7MB? It lists some 1.2M-6.2M param models: [https://arxiv.org/pdf/1909.11687.pdf](https://arxiv.org/pdf/1909.11687.pdf) My table shows... [https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=1158069878](https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=1158069878) \*looks at table\* Smallest seems to be Microsoft Pact, which ... they were not really LLMs. They did train a 10M model during scaling research ([paper](https://arxiv.org/abs/2205.10487)), but the model hasn't been released

25

MysteryInc152 t1_j81e986 wrote

Reply to comment by rretaemer1 in Open source AI by rretaemer1

LLMs are insanely impressive for a number of reasons. They emerge new abilities at scale - [https://arxiv.org/abs/2206.07682](https://arxiv.org/abs/2206.07682) They build internal world models - [https://thegradient.pub/othello/](https://thegradient.pub/othello/) They can be grounded to robotics ... robots brain) - [https://say-can.github.io/](https://say-can.github.io/), https://inner-monologue.github.io/ They can teach themselves how to use tools - [https://arxiv.org/abs/2302.04761](https://arxiv.org/abs/2302.04761) They've developed a theory of mind - [https://arxiv.org/abs/2302.02083](https://arxiv.org/abs/2302.02083) I'm sorry but anyone who looks

3

ARGleave t1_iuseu7k wrote

think we ever claimed it was. This is building on the [adversarial policies threat model](https://arxiv.org/abs/1905.10615) we introduced a couple of years ago. The norm-bounded perturbation threat model is an interesting lens ... think it's pretty limited: [Gilmer et al (2018)](https://arxiv.org/abs/1807.06732) had an interesting exploration of alternative threat models for supervised learning, and we view our work as similar in spirit to [unrestricted adversarial ... examples](https://arxiv.org/abs/1809.08352).

2

albertzeyer t1_j65rtdq wrote

papers where people only use attention-based encoder-decoder (AED) for speech recognition. Some random papers: * [https://arxiv.org/abs/1508.01211](https://arxiv.org/abs/1508.01211) * [https://arxiv.org/abs/2001.07263](https://arxiv.org/abs/2001.07263) * [https://arxiv.org/abs/2104.05544](https://arxiv.org/abs/2104.05544) See my PhD thesis for some overview over

2

PiGuyInTheSky t1_j9sx3nd wrote

problems to solve, yes, but there are also very technical problems to solve, like [power-seeking](https://arxiv.org/abs/2206.13477) or [inner misalignment](https://arxiv.org/abs/2105.14111) or [mechanistic interpretability](https://arxiv.org/abs/2301.05217) that are much less

11

qalis t1_j6mbu5s wrote

www.youtube.com/watch?v=CAm21rqCeSU) and [GPT-3 lecture 2](https://www.youtube.com/watch?v=5D315JD8kYg) and [GPT-3 paper](https://arxiv.org/pdf/2005.14165.pdf) to learn about GPT-3 \- [InstructGPT page](https://openai.com/blog/instruction-following/) and [InstructGPT paper](https://arxiv.org/pdf/2203.02155.pdf) to learn ... RLHF is based on Proximal Policy Optimization algorithm \- [PPO page](https://openai.com/blog/openai-baselines-ppo/) and [PPO paper](https://arxiv.org/pdf/1707.06347.pdf)

3

andreichiffa t1_j6n9lg6 wrote

memorizing a lot of information from the training dataset a little less than a year later: https://arxiv.org/abs/2012.07805 About a year after that Anthropic came out with a paper that suggested that there were ... that meant undertrained larger models did not that much better and actually did need more data: https://arxiv.org/pdf/2202.07785.pdf Finally, more recent results from DeepMind did an additional pass on the topic and seem ... that a 4x smaller model trained for 4x the time would out-perform the larger model: https://arxiv.org/pdf/2203.15556.pdf Basically the original OpenAI paper did contradict a lot of prior research on overfitting and generalization

1
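A quick back-of-the-envelope check of the "4x smaller model trained for 4x the time" comparison in the comment above. The sketch assumes the common rule of thumb that training compute is roughly 6 × parameters × tokens, and reads "4x the time" as 4x the training tokens; neither assumption is stated in the comment, and the model and token counts below are purely illustrative.

```python
def training_flops(n_params, n_tokens):
    """Approximate training compute as 6 * parameters * tokens (rule of thumb)."""
    return 6 * n_params * n_tokens

# Hypothetical large model: 280B parameters trained on 300B tokens.
large_params = 280 * 10**9
large_tokens = 300 * 10**9

# A 4x smaller model trained on 4x the tokens.
small_params = large_params // 4
small_tokens = large_tokens * 4

large = training_flops(large_params, large_tokens)
small = training_flops(small_params, small_tokens)
print(large, small, large == small)
```

Under this approximation both runs use the same compute budget, so the comparison is about how a fixed budget is split between model size and data rather than about spending more.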

i-heart-turtles t1_iusf0zy wrote

full Jacobian - people do similar things in adversarial robustness so you can have a look. [https://arxiv.org/abs/1907.02610](https://arxiv.org/abs/1907.02610) [https://arxiv.org/abs/1901.08573](https://arxiv.org/abs/1901.08573) I think you should check the stuff on evaluating for disentanglement. This paper could ... also be useful for you: [https://arxiv.org/abs/1812.06775](https://arxiv.org/abs/1812.06775). For VAE disentanglement, it is better for the Jacobian to be close to orthogonal than to just have a small norm

1

lorepieri t1_j1z4zp5 wrote

years ago and nobody took the effort to put into a modern GPU accelerated codebase. [https://arxiv.org/abs/2012.05876](https://arxiv.org/abs/2012.05876) Neurosymbolic AI: The 3rd Wave [https://arxiv.org/abs/2105.05330](https://arxiv.org/abs/2105.05330) Neuro-Symbolic Artificial Intelligence: Current Trends [https://arxiv.org/abs/2002.00388](https://arxiv.org/abs/2002.00388)

11