50 results for arxiv.org:

Think About Scaling LLMs In 2020, a team of researchers from OpenAI released a [paper](https://arxiv.org/pdf/2001.08361.pdf) called: “Scaling Laws For Neural Language Models”. They observed a predictable decrease in training loss when increasing ... that is what people did. The models got larger and larger with GPT-3 (175B), [Gopher](https://arxiv.org/pdf/2112.11446.pdf) (280B), [Megatron-Turing NLG](https://arxiv.org/pdf/2201.11990) (530B) just to name a few. But the bigger ... number of training tokens should double as well. This was published in DeepMind’s 2022 [paper](https://arxiv.org/pdf/2203.15556.pdf): “Training Compute-Optimal Large Language Models” The researchers fitted over 400 language models ranging from
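The "double the parameters, double the tokens" rule in the snippet above can be made concrete with a small back-of-the-envelope sketch. It rests on two assumptions that are not stated in the snippet: the common approximation that training compute is about 6 FLOPs per parameter per token, and a rule-of-thumb ratio of roughly 20 training tokens per parameter. The function name `compute_optimal` is hypothetical and only used for illustration.

```python
import math

# Assumption: training compute C is roughly 6 * N * D FLOPs
# (N = parameters, D = training tokens).
FLOPS_PER_PARAM_TOKEN = 6
# Assumption: rule-of-thumb ratio of training tokens to parameters.
TOKENS_PER_PARAM = 20


def compute_optimal(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a compute budget into (parameters, tokens) with D = k * N.

    Solving C = 6 * N * (k * N) for N gives N = sqrt(C / (6 * k)).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Quadrupling the compute budget scales parameters and tokens by the same factor.
base_n, base_d = compute_optimal(1e21)
big_n, big_d = compute_optimal(4e21)
print(f"params ratio: {big_n / base_n:.2f}, tokens ratio: {big_d / base_d:.2f}")
```

Under these assumptions, model size and token count grow together as the budget grows, which is the proportional scaling the snippet describes. The constants are illustrative, not fitted values.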

InfuriatinglyOpaque t1_ivb9otw wrote on November 6, 2022 at 6:26 PM

Reply to Training a board game player AI for an asymmetric game by computing_professor

Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling Imitation Learning in Minecraft. [http://arxiv.org/abs/2007.02701](http://arxiv.org/abs/2007.02701) Bramlage, L., & Cortese, A. (2021). Generalized Attention-Weighted Reinforcement Learning. Neural Networks. [https://doi.org/10.1016/j.neunet.2021.09.023](https://doi.org/10.1016/j.neunet.2021.09.023) Frey ... Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44(6), e12845. [http://arxiv.org/abs/1912.07199](http://arxiv.org/abs/1912.07199) Kumaran, V., Mott, B. W., & Lester, J. C. (2019.). Generating Game Levels for Multiple Distinct Games with ... Hjelm, D., Bachman, P., & Courville, A. (2021). Pretraining Representations for Data-Efficient Reinforcement Learning. [http://arxiv.org/abs/2106.04799](http://arxiv.org/abs/2106.04799) Sibert, C., Gray, W. D., & Lindstedt, J. K. (2017). Interrogating Feature Learning Models to Discover Insights

Nameless1995 t1_iyyl3m5 wrote on December 5, 2022 at 3:30 AM

Reply to comment by ReadSeparate in [D] OpenAI’s ChatGPT is unbelievable good in telling stories! by Far_Pineapple770

Technically LaMDA already uses an "external database", i.e. external tools (the internet, calculator, etc.), to retrieve information: https://arxiv.org/pdf/2201.08239.pdf (Section 6.2) It doesn't solve /u/ThePahtomPhoton's memory problem (I don't remember what GPT3 ... GPT3 level). One solution is using a kNN lookup in a non-differentiable manner: https://arxiv.org/abs/2203.08913 Another solution is making Transformers semi-recurrent (process inside chunks in parallel, then sequentially process some coarse-compressed-chunk ... representation sequentially.). This can allow information to be carried in through the sequential process: https://arxiv.org/pdf/2203.07852 https://openreview.net/forum?id=mq-8p5pUnEX Another solution is to augment Transformer with a State Space model which have shown great

FrogBearSalamander t1_jc5vvrb wrote on March 14, 2023 at 7:24 AM

Reply to comment by currentscurrents in [D]: Generalisation ability of autoencoders by Blutorangensaft

Would love to read some research papers if you have a link! - [Nonlinear Transform Coding](https://arxiv.org/abs/2007.03034) - [An Introduction to Neural Data Compression](https://arxiv.org/abs/2202.06533) - [SoundStream: An End-to-End Neural Audio Codec ... arxiv.org/abs/2107.03312) - Old but foundational: [End-to-end Optimized Image Compression](https://arxiv.org/abs/1611.01704) - And this paper made the connection between compression models and VAEs: [Variational image compression with a scale hyperprior](https://arxiv.org/abs/1802.01436) ... that SoundStream (mentioned above) uses residual VQ (RVQ). - [Image Compression with Product Quantized Masked Image Modeling](https://arxiv.org/abs/2212.07372) uses a kind of VQ (subdivide the latent vectors and code separate to form a product

www.siegemedia.com/seo/most-popular-keywords#:~:text=The) winner of most popular,or "weather" for short. \[5\] [https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA](https://twitter.com/vladquant/status/1624996869654056960?s=46&t=oAzVIB-avPf-JbQAnhcbtA) \[6\] [https://arxiv.org/pdf/2112.09332.pdf](https://arxiv.org/pdf/2112.09332.pdf) \[7\] [https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/](https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/) \[8\] [https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762) \[9\] [https://arxiv.org/abs/2201.08239](https://arxiv.org/abs/2201.08239) \[10\] [https://arxiv.org/abs/2112.04426](https://arxiv.org/abs/2112.04426) ... www.quora.com/What-percentage-of-web-search-queries-are-navigational](https://www.quora.com/What-percentage-of-web-search-queries-are-navigational) \[13\] [https://www.statista.com/statistics/413229/search-query-size-search-engine-share/](https://www.statista.com/statistics/413229/search-query-size-search-engine-share/) \[14\] [https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b](https://www.forbes.com/sites/johanmoreno/2021/08/27/google-estimated-to-be-paying-15-billion-to-remain-default-search-engine-on-safari/?sh=40cbbfcf669b) \[15\] [https://businessquant.com/microsoft-revenue-by-product](https://businessquant.com/microsoft-revenue-by-product) \[16\] [https://arxiv.org/abs/2209.01667](https://arxiv.org/abs/2209.01667)

popular practice/belief is unsound or useless. Some famous examples are: **Troubling Trends in ML** [https://arxiv.org/pdf/1807.03341.pdf](https://arxiv.org/pdf/1807.03341.pdf) **ML that Matters** [https://arxiv.org/abs/1206.4656](https://arxiv.org/abs/1206.4656) **On the Convergence of ADAM** [https://arxiv.org/abs/1904.09237](https://arxiv.org/abs/1904.09237) **On the Information Bottleneck ... iopscience.iop.org/article/10.1088/1742-5468/ab3985](https://iopscience.iop.org/article/10.1088/1742-5468/ab3985) **Implementation Matters in Deep Policy Gradients** [https://arxiv.org/abs/2005.12729](https://arxiv.org/abs/2005.12729) (showed a certain purported algorithm gain is actually mainly due to code-level optimization) **Critique of Turing Award** [https://people.idsia.ch/\~juergen/critique-turing-award-bengio-hinton-lecun.html](https://people.idsia.ch/~juergen/critique-turing-award-bengio-hinton-lecun.html) ... basically a critique on the citation practice in ML) **Deep Learning a Critical Appraisal** [https://arxiv.org/abs/1801.00631](https://arxiv.org/abs/1801.00631) However, these are a little bit dated. Does anyone have any recent critique papers of similar flavour

trend has been AI's societal impact. if anyone's read the[ recent job impact paper](https://arxiv.org/abs/2303.10130), one of the factors that jumped out was the exposure of blockchain engineering to AI-based ... function of any group of market participants. with respect to ML frameworks like[ sparsely-gated MoE](https://arxiv.org/abs/1701.06538v1),[ world models](https://arxiv.org/abs/2301.04104v1),[ multimodality](https://arxiv.org/abs/2303.03378), and[ adaptive agents](https://arxiv.org/abs/2301.07608):

qalis t1_j8driqb wrote on February 13, 2023 at 3:47 PM

Reply to [D] What are resources to start with GNN and GraphML? by chhaya_35

help. A bit of self promotion, but my Master's thesis was about GNNs: [https://arxiv.org/abs/2211.03666](https://arxiv.org/abs/2211.03666). It should be very beginner-friendly, since I had to write it while also learning about this step ... articles are also great, e.g. [https://distill.pub/2021/gnn-intro/](https://distill.pub/2021/gnn-intro/) or a well known (in this field) [https://arxiv.org/abs/1901.00596](https://arxiv.org/abs/1901.00596). You should also definitely read papers about GCN (very intuitively written), GAT, GraphSAGE and GIN, the most ... with **a lot** of suspicion. This paper about fair comparison is becoming more and more used: [https://arxiv.org/abs/1912.09893](https://arxiv.org/abs/1912.09893). This baseline, not GNN but similar, gives very strong results: [https://arxiv.org/abs/1811.03508](https://arxiv.org/abs/1811.03508). I will

cnapun t1_j10a9jz wrote on December 20, 2022 at 7:00 PM

Reply to comment by hawkxor in [D] Deep Learning based Recommendation Systems by Awekonti

better or worse results. Some not super-recent papers I can think of: [https://research.google/pubs/pub50257/](https://research.google/pubs/pub50257/) [https://arxiv.org/abs/1706.07567](https://arxiv.org/abs/1706.07567) [https://arxiv.org/abs/2010.14395](https://arxiv.org/abs/2010.14395) [https://arxiv.org/abs/1907.00937](https://arxiv.org/abs/1907.00937) (3.2) [https://arxiv.org/abs/2006.11632](https://arxiv.org/abs/2006.11632) (2.2/2.4

serge_cell t1_j5akgwk wrote on January 21, 2023 at 4:28 PM

Reply to [D] Are there any results on convergence guarantees when optimizing NNs? by Dartagnjan

several years ago and in [this same subreddit too](https://www.reddit.com/r/MachineLearning/comments/a8xjh0/d_im_tired_of_reading_resultsoriented_papers_what/). For example: https://arxiv.org/abs/1810.02054 https://arxiv.org/abs/1811.03804 https://arxiv.org/abs/1811.03962 https://arxiv.org/abs/1811.08888 This is a recurring question; people ask it every year

benanne OP t1_j427zj0 wrote on January 12, 2023 at 5:52 PM

Reply to comment by chodegoblin69 in [R] Diffusion language models by benanne

very easy to use architectures where computation is largely decoupled from the sequence length, like Perceivers (https://arxiv.org/abs/2103.03206, https://arxiv.org/abs/2107.14795), or Recurrent Interface Networks (https://arxiv.org/abs/2212.11972). This is highly speculative though ... aware that an autoregressive variant of the Perceiver architecture exists (https://arxiv.org/abs/2202.07765), but it is actually quite a bit less general/flexible than Perceiver IO / the original Perceiver

olmec-akeru OP t1_iy2zjoi wrote on November 28, 2022 at 10:26 AM

Reply to comment by NonOptimized in [D] What method is state of the art dimensionality reduction by olmec-akeru

arxiv.org/pdf/2204.04273.pdf](https://arxiv.org/pdf/2204.04273.pdf) [https://arxiv.org/pdf/2203.09347.pdf](https://arxiv.org/pdf/2203.09347.pdf) [https://arxiv.org/pdf/2206.06513.pdf](https://arxiv.org/pdf/2206.06513.pdf) and the one speaking to categorical variables: [https://arxiv.org/pdf/2112.00362.pdf](https://arxiv.org/pdf/2112.00362.pdf)

prototypist t1_j0c5p2j wrote on December 15, 2022 at 4:09 PM

Reply to [D] Is "natural" text always maximally likely according to language models ? by Emergency_Apricot_77

human-like decoder for language models and seeing what outputs humans prefer. Transformers supports [typical decoding](https://arxiv.org/abs/2202.00666) and [contrastive search](https://huggingface.co/blog/introducing-csearch), and there are papers and code out for [RankGen ... arxiv.org/abs/2205.09726), [Time Control](https://arxiv.org/abs/2203.11370), and [Contrastive Decoding](https://arxiv.org/abs/2210.15097) (which is totally different from contrastive search

JNmbrs t1_isgqdyr wrote on October 15, 2022 at 9:24 PM

Reply to comment by evanthebouncy in [P] a minimalist guide to program synthesis by evanthebouncy

work on these systems, the work seems to focus on improvements in (a) search algorithms (e.g., [https://arxiv.org/pdf/2110.12485.pdf](https://arxiv.org/pdf/2110.12485.pdf)); (b) program abstraction/library compression (e.g., [https://mlb2251.github.io/stitch\_jul11.pdf](https://mlb2251.github.io/stitch_jul11.pdf) and [http://andrewcropper.com/pubs/aaai20-forgetgol.pdf](http://andrewcropper.com/pubs/aaai20-forgetgol.pdf)); ... optimizing neural guidance (e.g., [https://openreview.net/pdf?id=rCzfIruU5x5](https://openreview.net/pdf?id=rCzfIruU5x5) and [https://arxiv.org/pdf/2206.05922.pdf](https://arxiv.org/pdf/2206.05922.pdf)); and (d) specification (e.g., [https://arxiv.org/pdf/2007.05060.pdf](https://arxiv.org/pdf/2007.05060.pdf) and [https://arxiv.org/pdf/2204.02495.pdf](https://arxiv.org/pdf/2204.02495.pdf)). While obviously work proceeds in these (and other related) domains, I'd love

Throwaway00000000028 t1_iy42ker wrote on November 28, 2022 at 4:29 PM

Reply to comment by Afghan_ in [D] Simple Questions Thread by AutoModerator

Blog: [https://yang-song.net/blog/2021/score/](https://yang-song.net/blog/2021/score/) Youtube videos: [https://www.youtube.com/watch?v=fbLgFrlTnGU](https://www.youtube.com/watch?v=fbLgFrlTnGU) Seminal papers: \- Denoising Diffusion Probabilistic Models: [https://arxiv.org/abs/2006.11239](https://arxiv.org/abs/2006.11239) \- Improved Techniques for Training Score-based Generative Models: [https://arxiv.org/abs/2006.09011](https://arxiv.org/abs/2006.09011) \- Hierarchical Text-Conditional Image Generation with ... CLIP Latents: [https://arxiv.org/abs/2204.06125](https://arxiv.org/abs/2204.06125) Review papers: \- Understanding Diffusion Models: [https://arxiv.org/pdf/2208.11970.pdf](https://arxiv.org/pdf/2208.11970.pdf)

tariban t1_irw5z8d wrote on October 11, 2022 at 2:27 PM

Reply to [D] Looking for some critiques on recent development of machine learning by fromnighttilldawn

problems, despite many claims to the contrary: * [Tabular Data: Deep Learning is Not All You Need](http://arxiv.org/abs/2106.03253) * [In Search of Lost Domain Generalization](http://arxiv.org/abs/2007.01434) * [Unsupervised Domain Adaptation: A Reality Check ... arxiv.org/abs/2111.15672) * [A Baseline for Few-Shot Image Classification](http://arxiv.org/abs/1909.02729)

dangerhexagon t1_j4x2yrp wrote on January 18, 2023 at 9:35 PM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

There's some papers on applying transformers to trees: [https://arxiv.org/abs/1909.06639](https://arxiv.org/abs/1909.06639) , [https://arxiv.org/abs/1911.09983](https://arxiv.org/abs/1911.09983) , [https://papers.nips.cc/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html](https://papers.nips.cc/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html) And some recent work on tree extraction: [https://arxiv.org/abs/2301.00447](https://arxiv.org/abs/2301.00447) There's also this paper which recovers ... tree by observing the leaf nodes: [https://arxiv.org/abs/2208.14924](https://arxiv.org/abs/2208.14924)

BerenMillidge t1_iy814ur wrote on November 29, 2022 at 1:09 PM

Reply to comment by Ambitious_Smile_981 in [Project] Erlang based framework to replace backprop using predictive coding by abhitopia

view them, is as an idealised exploration of a specific limit of PC. In recent work (https://arxiv.org/pdf/2206.02629), we expand on this limit idea and show that all current EBM approximations to BP, such ... number of its properties. We also have a more theoretical analysis of standard PC (https://arxiv.org/pdf/2207.12316) where we show that although it differs from backprop, it can also converge to minima of a supervised ... advantages of PC over BP including the ability for it to learn arbitrary recurrent computation graphs (https://arxiv.org/pdf/2201.13180), the fact that you can significantly speed it up with incremental variants, and that

DinosParkour t1_iy7j1hw wrote on November 29, 2022 at 9:19 AM

Reply to [D] Difference between sparse and dense information retrieval by itsyourboiirow

choosing the most suitable ones) when it comes to computing the query-doc similarity. \[1\] [https://arxiv.org/abs/2201.10005](https://arxiv.org/abs/2201.10005) \[2\] [https://github.com/facebookresearch/faiss/](https://github.com/facebookresearch/faiss/) \[3\] [https://arxiv.org/abs/2107.05720](https://arxiv.org/abs/2107.05720) \[4\] [https://arxiv.org/abs/2004.12832](https://arxiv.org/abs/2004.12832) \[5\] [https://arxiv.org/abs/2211.01267](https://arxiv.org/abs/2211.01267)

understanding model in 2019 and evolved to ERNIE 3.0 Titan with 260 billion parameters. ERNIE 1.0: [https://arxiv.org/abs/1904.09223](https://arxiv.org/abs/1904.09223) ERNIE 2.0: [https://arxiv.org/abs/1907.12412](https://arxiv.org/abs/1907.12412) ERNIE 3.0: [https://arxiv.org/abs/2112.12731](https://arxiv.org/abs/2112.12731) ERNIE for text-to-image ... arxiv.org/abs/2210.15257](https://arxiv.org/abs/2210.15257) ERNIE Bot live-stream on YouTube: [https://www.youtube.com/watch?v=ukvEUI3x0vI](https://www.youtube.com/watch?v=ukvEUI3x0vI)

papers: Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes: [https://arxiv.org/abs/1910.12478](https://arxiv.org/abs/1910.12478) Tensor Programs II: Neural Tangent Kernel for Any Architecture: [https://arxiv.org/abs/2006.14548](https://arxiv.org/abs/2006.14548) Tensor Programs III: Neural ... Matrix Laws: [https://arxiv.org/abs/2009.10685](https://arxiv.org/abs/2009.10685) Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks: [https://proceedings.mlr.press/v139/yang21c.html](https://proceedings.mlr.press/v139/yang21c.html) Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer: [https://arxiv.org/abs/2203.03466](https://arxiv.org/abs/2203.03466)

K3tchM t1_j46kidw wrote on January 13, 2023 at 2:57 PM

Reply to comment by ElectronicCress3132 in [D] Has ML become synonymous with AI? by Valachio

have [this survey about ML for Combinatorial Optimization](https://arxiv.org/abs/1811.06128) from Bengio, Lodi, and Prouvost. OpenAI's paper about a [robot hand learning to solve a Rubik's Cube](https://arxiv.org/abs/1910.07113) Also check ... aims to combine neural network learning with logic-based reasoning. Gary Marcus wrote [an extensive note](https://arxiv.org/pdf/2002.06177.pdf) on the subject that I recommend as well

blazejd OP t1_ix7mr03 wrote on November 21, 2022 at 10:57 AM

Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd

merging the two concepts of language models and RL-based feedback. Some papers mentioned are: [https://arxiv.org/abs/2203.02155](https://arxiv.org/abs/2203.02155) and ["Experience Grounds Language"](https://aclanthology.org/2020.emnlp-main.703/) (although I didn't read them entirely yet). We could ... looking for more related resources, my thoughts were inspired by the field of language emergence ([https://arxiv.org/pdf/2006.02419.pdf](https://arxiv.org/pdf/2006.02419.pdf)) and this work ([https://arxiv.org/pdf/2112.11911.pdf](https://arxiv.org/pdf/2112.11911.pdf)).

MetaAI_Official OP t1_izfk9ug wrote on December 8, 2022 at 7:11 PM

Reply to comment by Roger_M8 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official

could tackle along the way. That led to our papers on [human-level no-press Diplomacy](https://arxiv.org/abs/2010.02923), [no-press Diplomacy from scratch](https://arxiv.org/abs/2110.02924), [better modeling of humans in no-press Diplomacy ... proceedings.mlr.press/v162/jacob22a.html), and [expert-level no-press Diplomacy](https://arxiv.org/abs/2210.05492).

Nameless1995 t1_j43ku48 wrote on January 12, 2023 at 10:48 PM

Reply to [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont

Universal Transformer: https://arxiv.org/abs/1807.03819 Ponder Net: https://arxiv.org/abs/2107.05407 Deep Equilibrium Net: https://arxiv.org/abs/1909.01377 http://www.gatsby.ucl.ac.uk/~balaji/udl2021/accepted-papers/UDL2021-paper-072.pdf

Aseyhe t1_jc6ofrj wrote on March 14, 2023 at 1:13 PM

Reply to Does space expansion occur uniformly in all directions and dimensions? by Tank_AT

gravity. Beyond these, here are articles discussing the point further: (1) [A diatribe on expanding space](https://arxiv.org/abs/0809.4573). This is pretty technical, but it's the most direct attack on the idea of expanding ... cosmic expansion is simply not relevant to it. (2) [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081). Very well written and less technical, although there are mathematical arguments. The main point of this ... space is nonexistent, not merely negligible. (3) [On The Relativity of Redshifts: Does Space Really "Expand"?](https://arxiv.org/abs/1605.08634) The least technical of the batch, this article is also focused on the interpretation

eyeofthephysics t1_jbhu9d4 wrote on March 9, 2023 at 3:33 AM

Reply to [D] Text embedding model for financial documents by [deleted]

just tuned for sentiment analysis. There are two groups who developed models they called FinBERT [https://arxiv.org/abs/1908.10063](https://arxiv.org/abs/1908.10063) and [https://arxiv.org/abs/2006.08097](https://arxiv.org/abs/2006.08097). The first paper's model can be found [here](https://olab.research.google.com/drive/1hFJrZXZBClzz6Fqkb9kbETYZqS2qdbj3?authuser=1#scrollTo=0Ph5eRsIqWA7) ... tasks. Since you're interested in text embeddings, you may also be interested in this paper [https://arxiv.org/pdf/2111.00526.pdf](https://arxiv.org/pdf/2111.00526.pdf). The focus of that paper is sentiment analysis, but the general idea of using a sentence

1azytux OP t1_jd2ho88 wrote on March 21, 2023 at 11:17 AM

Reply to comment by aozorahime in Recent advances in multimodal models: What are your thoughts on chain of thoughts models? [D] by 1azytux

papers given : \- [Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](https://arxiv.org/abs/2209.09513) \- [Multimodal Chain-of-Thought Reasoning in Language Models](https://arxiv.org/abs/2302.00923) and such .. with general chain of thought ... idea for language can be looked at [this paper](https://arxiv.org/abs/2201.11903). I'm not sure if the link you provided will work, but as it's huge I might have missed (I've glanced

ttt05 t1_j0ju037 wrote on December 17, 2022 at 4:35 AM

Reply to comment by 3nilBarca in [D] Is softmax a good choice for confidence? by thanderrine

looks like I messed up the years, but all of these are good references) 1. MSP: [https://arxiv.org/abs/1610.02136](https://arxiv.org/abs/1610.02136) 2. OE: [https://arxiv.org/pdf/1812.04606.pdf](https://arxiv.org/pdf/1812.04606.pdf) 3. One vs all: [https://arxiv.org/abs/2007.05134](https://arxiv.org/abs/2007.05134)

Aseyhe t1_jaka1l0 wrote on March 2, 2023 at 12:59 AM

Reply to Why do cosmologists say that gravity should "slow down" the expansion of the universe? by crazunggoy47

Further reading on *expanding space* not being a physically real phenomenon: * [A diatribe on expanding space](https://arxiv.org/abs/0809.4573) * [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081) * [On The Relativity of Redshifts: Does ... Space Really "Expand"?](https://arxiv.org/abs/1605.08634) Further reading on cosmological dynamics with Newtonian gravity: * [The dynamics of Newtonian cosmology](https://web.mit.edu/8.286/www/lecn18/ln03-euf18.pdf) * or more generally, just search for "Newtonian cosmology

Aseyhe t1_j2kql8y wrote on January 2, 2023 at 1:28 AM

Reply to comment by InSight89 in Is any "movement" visible in the fluctuations of the CMB over time, or does it appear static? by JarasM

public consciousness, here are some articles discussing the point further. (1) [A diatribe on expanding space](https://arxiv.org/abs/0809.4573). This is pretty technical, but it's the most direct attack on the idea of expanding ... expansion is simply no longer relevant to it. (2) [The kinematic origin of the cosmological redshift](https://arxiv.org/abs/0808.1081). Very well written and less technical, although there are mathematical arguments. The main point of this ... viewed as just a Doppler shift. (3) [On The Relativity of Redshifts: Does Space Really "Expand"?](https://arxiv.org/abs/1605.08634) The least technical of the batch. This article is also focused on the interpretation

activatedgeek t1_j9jvj8h wrote on February 22, 2023 at 2:25 PM

Reply to [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer

prefer functions that handle translation equivariance (not exactly true but only roughly due to pooling layers). https://arxiv.org/abs/1806.01261 Graph neural networks provide a relational inductive bias. https://arxiv.org/abs/1806.01261 Neural networks overall prefer simpler ... solutions, embodying Occam’s razor, another inductive bias. This argument is made theoretically using Kolmogorov complexity. https://arxiv.org/abs/1805.08522

julbern OP t1_ivyy0g1 wrote on November 11, 2022 at 5:29 PM

Reply to comment by Benlus in [R] An optimal control perspective on diffusion-based generative modeling by julbern

more recent works in this direction are the following: 1. [B. Tzen and M. Raginsky (2019)](https://arxiv.org/abs/1903.01608) 2. [N. Nüsken and L. Richter (2021)](https://arxiv.org/abs/2005.05409) 3. [M. Pavon (2022)](https://arxiv.org

adt t1_j9neq5w wrote on February 23, 2023 at 5:25 AM

Reply to [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala

optimizations mean that you can squish models onto modern GPUs now (i.e. [int8](https://arxiv.org/abs/2208.07339) etc.). Designed to be fit onto a standard GPU, DeepMind Gato was bigger than I thought, with starting size ... paper, which compresses the models to 7MB? It lists some 1.2M-6.2M param models: [https://arxiv.org/pdf/1909.11687.pdf](https://arxiv.org/pdf/1909.11687.pdf) My table shows... [https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=1158069878](https://docs.google.com/spreadsheets/d/1O5KVQW1Hx5ZAkcg8AIRjbQLQzx2wVaLl0SqUu-ir9Fs/edit#gid=1158069878) \*looks at table\* Smallest seems to be Microsoft Pact, which ... they were not really LLMs. They did train a 10M model during scaling research ([paper](https://arxiv.org/abs/2205.10487)), but the model hasn't been released

MysteryInc152 t1_j81e986 wrote on February 10, 2023 at 10:24 PM

Reply to comment by rretaemer1 in Open source AI by rretaemer1

LLMs are insanely impressive for a number of reasons. They show emergent abilities at scale - [https://arxiv.org/abs/2206.07682](https://arxiv.org/abs/2206.07682) They build internal world models - [https://thegradient.pub/othello/](https://thegradient.pub/othello/) They can be grounded to robotics ... robots brain) - [https://say-can.github.io/](https://say-can.github.io/), https://inner-monologue.github.io/ They can teach themselves how to use tools - [https://arxiv.org/abs/2302.04761](https://arxiv.org/abs/2302.04761) They've developed a theory of mind - [https://arxiv.org/abs/2302.02083](https://arxiv.org/abs/2302.02083) I'm sorry but anyone who looks

ARGleave t1_iuseu7k wrote on November 2, 2022 at 6:00 PM

Reply to comment by KellinPelrine in [N] Adversarial Policies Beat Professional-Level Go AIs by xutw21

think we ever claimed it was. This is building on the [adversarial policies threat model](https://arxiv.org/abs/1905.10615) we introduced a couple of years ago. The norm-bounded perturbation threat model is an interesting lens ... think it's pretty limited: [Gilmer et al (2018)](https://arxiv.org/abs/1807.06732) had an interesting exploration of alternative threat models for supervised learning, and we view our work as similar in spirit to [unrestricted adversarial ... examples](https://arxiv.org/abs/1809.08352).

albertzeyer t1_j65rtdq wrote on January 27, 2023 at 10:22 PM

Reply to [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132

papers where people only use attention-based encoder-decoder (AED) for speech recognition. Some random papers: * [https://arxiv.org/abs/1508.01211](https://arxiv.org/abs/1508.01211) * [https://arxiv.org/abs/2001.07263](https://arxiv.org/abs/2001.07263) * [https://arxiv.org/abs/2104.05544](https://arxiv.org/abs/2104.05544) See my Phd thesis for some overview over

PiGuyInTheSky t1_j9sx3nd wrote on February 24, 2023 at 9:18 AM

Reply to [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt

problems to solve, yes, but there are also very technical problems to solve, like [power-seeking](https://arxiv.org/abs/2206.13477) or [inner misalignment](https://arxiv.org/abs/2105.14111) or [mechanistic interpretability](https://arxiv.org/abs/2301.05217) that are much less

qalis t1_j6mbu5s wrote on January 31, 2023 at 10:02 AM

Reply to [Discussion] Misinformation about ChatGPT and ML in media and where to find good sources of information by Silvestron

www.youtube.com/watch?v=CAm21rqCeSU) and [GPT-3 lecture 2](https://www.youtube.com/watch?v=5D315JD8kYg) and [GPT-3 paper](https://arxiv.org/pdf/2005.14165.pdf) to learn about GPT-3 \- [InstructGPT page](https://openai.com/blog/instruction-following/) and [InstructGPT paper](https://arxiv.org/pdf/2203.02155.pdf) to learn ... RLHF is based on Proximal Policy Optimization algorithm \- [PPO page](https://openai.com/blog/openai-baselines-ppo/) and [PPO paper](https://arxiv.org/pdf/1707.06347.pdf)

andreichiffa t1_j6n9lg6 wrote on January 31, 2023 at 3:22 PM

Reply to comment by visarga in Few questions about scalability of chatGPT [D] by besabestin

memorizing a lot of information from the training dataset a little less than a year later: https://arxiv.org/abs/2012.07805 About a year after that Anthropic came out with a paper that suggested that there were ... that meant undertrained larger models did not do that much better and actually did need more data: https://arxiv.org/pdf/2202.07785.pdf Finally, more recent results from DeepMind did an additional pass on the topic and seem ... that a 4x smaller model trained for 4x the time would out-perform the larger model: https://arxiv.org/pdf/2203.15556.pdf Basically the original OpenAI paper did contradict a lot of prior research on overfitting and generalization

i-heart-turtles t1_iusf0zy wrote on November 2, 2022 at 6:02 PM

Reply to comment by Dear-Vehicle-3215 in [D] About the evaluation of the features extracted by an Autoencoder by Dear-Vehicle-3215

full Jacobian. People do similar things in adversarial robustness, so you can have a look. [https://arxiv.org/abs/1907.02610](https://arxiv.org/abs/1907.02610) [https://arxiv.org/abs/1901.08573](https://arxiv.org/abs/1901.08573) I think you should check the stuff on evaluating for disentanglement. This paper could ... also be useful for you: [https://arxiv.org/abs/1812.06775](https://arxiv.org/abs/1812.06775). For VAE disentanglement, it is better for the Jacobian to be close to orthogonal than to just have a small norm

lorepieri t1_j1z4zp5 wrote on December 28, 2022 at 2:06 PM

Reply to [D] DeepMind has at least half a dozen prototypes for abstract/symbolic reasoning. What are their approaches? by valdanylchuk

years ago and nobody took the effort to put into a modern GPU accelerated codebase. [https://arxiv.org/abs/2012.05876](https://arxiv.org/abs/2012.05876) Neurosymbolic AI: The 3rd Wave [https://arxiv.org/abs/2105.05330](https://arxiv.org/abs/2105.05330) Neuro-Symbolic Artificial Intelligence: Current Trends [https://arxiv.org/abs/2002.00388](https://arxiv.org/abs/2002.00388)