Submitted by cautioushedonist t3_yto34q in MachineLearning

I work exclusively in NLP, and since transformers, and especially their pretrained variants, took over, I haven't written a neural net (RNN, LSTM, etc.) in over 3 years and haven't had to worry about things like # of layers, hidden size, etc.

Tabular data has XGBoost, etc. NLP has pretrained Transformers. Images have pretrained CNNs and Transformers.

But I've been through some ML system design books, and recommendation system solutions often feature hand-written neural nets, so that's interesting.

What was the problem and type of data at hand when you last wrote a neural net yourself, layer by layer?

Thanks y'all!

183

Comments

entropyvsenergy t1_iw58dge wrote

It's all frameworks now, some better than others. I haven't written one outside of demos or interviews in years. That being said, I've modified neural networks a whole bunch. Usually you can just tweak parameters in a config file, but sometimes you want additional outputs or to fundamentally change the model in some way; usually minor tweaks code-wise.

16

WigglyHypersurface t1_iw5c2u1 wrote

Thankfully I'm doing niche enough projects that I still get to. The last one was a multi-modal IWAE (importance-weighted autoencoder) for imputing missing data.

79

moist_buckets t1_iw5g8j3 wrote

I’ve never used a pretrained network for anything, but my projects are applied to astrophysics, which has some very different requirements than NLP or image classification.

21

IntelArtiGen t1_iw5gjpi wrote

When needed, I usually take an existing architecture and only adapt small parts of it to solve my task. I also wrote a custom autoencoder layer by layer for audio spectrograms (I didn't find an existing model which could do it with my constraints), and a model to convert embeddings from one self-supervised model to another self-supervised model (it's not a complex architecture) while the three models train simultaneously.

Tbh I would prefer to use existing architectures, because redesigning an architecture takes a long time to build, optimize, and train. But existing models are often closely adapted to one task and perform badly on unexpected new tasks. Also, you may have constraints (like real-time or memory efficiency) which are not taken into account in easy-to-reuse published models.

Images have pretrained CNNs, but if you want a model to perform self-supervised continual learning and real-time inference on images with just one RTX, it can be harder to find an existing optimized solution for this task.

52

chatterbox272 t1_iw5jfwn wrote

I'll regularly write custom components, but pretty rarely write whole custom networks. Writing custom prediction heads that capture the task more specifically can improve training efficiency and performance (e.g. doing hierarchical classification where it makes sense, customized suppression postprocessing based on domain knowledge, etc.).
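As a rough illustration of the kind of custom head being described (the class and the coarse/fine label split here are invented for this sketch, not the commenter's code), a hierarchical classification head in PyTorch might look like:

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    """Two-level head: predict a coarse class, then a fine class from the
    same features plus the coarse logits. Class counts are arbitrary examples."""
    def __init__(self, in_features: int, n_coarse: int = 5, n_fine: int = 20):
        super().__init__()
        self.coarse = nn.Linear(in_features, n_coarse)
        # the fine head also sees the coarse logits, making the hierarchy explicit
        self.fine = nn.Linear(in_features + n_coarse, n_fine)

    def forward(self, feats: torch.Tensor):
        coarse_logits = self.coarse(feats)
        fine_logits = self.fine(torch.cat([feats, coarse_logits], dim=-1))
        return coarse_logits, fine_logits

# e.g. bolted onto a backbone's pooled features
head = HierarchicalHead(in_features=2048)
feats = torch.randn(8, 2048)   # stand-in for backbone output
coarse, fine = head(feats)     # train with a summed cross-entropy over both levels
```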

When I do write networks from scratch, they're usually variations on existing architectures anyway, e.g. implementing a 1D or 3D CNN using the same approach as existing 2D CNNs like ResNet or ConvNeXt. I usually find I'm doing this when I'm working on a domain-specific task and don't already have access to pretrained networks that are likely to be a reasonable initialisation.
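A minimal sketch of that translation for the 1D case (channel and sequence sizes are illustrative): the 2D ResNet basic block carries over almost mechanically once Conv2d/BatchNorm2d are swapped for their 1D counterparts.

```python
import torch
import torch.nn as nn

class BasicBlock1d(nn.Module):
    """ResNet-style basic block with the 2D layers swapped for 1D versions."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut, exactly as in 2D ResNet

block = BasicBlock1d(64)
x = torch.randn(4, 64, 128)    # (batch, channels, sequence length)
print(block(x).shape)          # torch.Size([4, 64, 128])
```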

15

DigThatData t1_iw5mqyx wrote

i was implementing something from a paper that didn't have a public implementation and I wanted to play with it.

20

ThatInternetGuy t1_iw5seho wrote

It was just yesterday. Not a custom neural net; it was just taking different neural networks, arranging them in a particular order, and training them.

The last time I coded a neural net from scratch was some 10 years ago when I coded a Genetic Algorithm and Backpropagation Neural Network. Suffice it to say, the AI field has come a long way since.

10

dr_death47 t1_iw5ua5p wrote

Lightweight vision module for simple classification for autonomous cars. Have all layers and their sizes memorized at this point lol.

5

cautioushedonist OP t1_iw5vajt wrote

Great comment! Always helps newbies like me to read about how things used to be done.

I was "born" into these luxuries of huggingface, papers with github repos, and extensive community interaction online that keeps on giving for years. I feel grateful.

7

Zephos65 t1_iw5wauk wrote

First I read the title, then I thought "oh they mean like no outside libraries. Just write a neural net from scratch just using math"

Read the comments and was surprised by the answers.

Went back to the body of your post... oh no

7

AConcernedCoder t1_iw5wdvw wrote

The comments in this thread would leave you thinking hardly anyone uses TensorFlow, scikit-learn, or PyTorch any more, but TF alone averages around 500k downloads a day.

9

utopiah t1_iw64365 wrote

Not for a while but I started doing it again, did this little thing for Alameda Research and it went pretty great for a while but ... yeah, maybe I shouldn't. (sarcasm, just being facetious with the latest crypto scandal)

3

evanthebouncy t1_iw66ov5 wrote

I took a bet that all the training and architecture would be subsumed into some centralized company, where you only really have to worry about the dataset.

So in a way it paid off. Now I do everything with Hugging Face transformers and only worry about the dataset, haha.

6

karius85 t1_iw6737h wrote

I write custom networks and modules all the time in my research. I imagine most of the researchers in my group do as well.

15

chewxy t1_iw68kzn wrote

Regularly. Perks of using my own framework :)

3

piman01 t1_iw6aq0d wrote

Just for learning purposes, I wrote code from scratch in Python to implement a neural network with backpropagation and SGD with momentum and regularization. It works pretty well. I was able to fit MNIST at 95% on a test set using a fully connected architecture.
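For anyone curious what that exercise involves, here is a compressed sketch of the same idea, with toy synthetic data standing in for MNIST (the commenter's actual code and hyperparameters are their own):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy binary-classification data standing in for MNIST
X = rng.normal(size=(512, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# two-layer fully connected net
W1 = rng.normal(scale=0.1, size=(20, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1));  b2 = np.zeros(1)
V = {k: 0.0 for k in ("W1", "b1", "W2", "b2")}   # momentum buffers
lr, mu, lam = 0.1, 0.9, 1e-4                     # step size, momentum, L2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):   # full-batch for brevity; minibatching gives true SGD
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # backward pass (binary cross-entropy loss, L2 regularization)
    dz2 = (p - y) / len(X)
    grads = {"W2": h.T @ dz2 + lam * W2, "b2": dz2.sum(0)}
    dh = (dz2 @ W2.T) * (1 - h**2)   # derivative of tanh
    grads["W1"] = X.T @ dh + lam * W1
    grads["b1"] = dh.sum(0)

    # gradient descent with momentum (updates arrays in place)
    params = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    for k in params:
        V[k] = mu * V[k] - lr * grads[k]
        params[k] += V[k]

print(f"train accuracy (last forward pass): {((p > 0.5) == y).mean():.3f}")
```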

2

DisWastingMyTime t1_iw6hca9 wrote

Can't sell pretrained models, can't sell models that were trained on public data, and can't run modern models on the edge devices the clients want, because they want it as cheap as possible.

5

FromUnderTheOcean t1_iw6s6fb wrote

Last week. LSTM to consume both tabular data and a number of text fields. We have tabular transformers and text transformers, but this was easier than tying them both together and trying to make it work.
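One plausible way to wire that up (field names and sizes here are invented for illustration): embed the text, run it through the LSTM, and concatenate the final hidden state with the tabular features before the prediction head.

```python
import torch
import torch.nn as nn

class TabularTextLSTM(nn.Module):
    """Toy sketch: one LSTM over tokenized text, concatenated with tabular features."""
    def __init__(self, vocab_size=10_000, emb_dim=64, hidden=128, n_tabular=12):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + n_tabular, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, token_ids, tabular):
        _, (h_n, _) = self.lstm(self.emb(token_ids))
        combined = torch.cat([h_n[-1], tabular], dim=-1)  # final hidden state + tabular
        return self.head(combined)

model = TabularTextLSTM()
tokens = torch.randint(1, 10_000, (8, 40))   # batch of 8 sequences, 40 tokens each
tab = torch.randn(8, 12)                     # matching tabular features
print(model(tokens, tab).shape)              # torch.Size([8, 1])
```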

1

trolls_toll t1_iw6ugg7 wrote

about a week ago. GCNs lesgoooo

2

orthomonas t1_iw6vrt3 wrote

I think the field has gotten mature enough that there's a widening difference between 'using ML as a tool' and 'researching ML'.

As a perhaps poor analogy, it's the difference between an ecologist using an R package to just get some standard ordinations done vs. a statistician developing and writing up a new ordination method.

4

ProdigyManlet t1_iw6wcls wrote

Yeah so many factors involved and little things that aren't ever mentioned in the paper. Implementing a few models and algorithms from papers made me realise how poorly they're written a lot of the time. Code publishing really is essential for validation

10

ProdigyManlet t1_iw6x6sf wrote

Earlier this year I wrote a custom U-Net-inspired model for semantic image segmentation in PyTorch. This one I designed to be fully modular, so that it's really simple to change the architecture from a config file. Writing the code elegantly took some time; I like the idea of generating blocks of layers to keep the model neat, rather than going layer by layer (e.g. a block includes a conv layer, BN, activation, etc.). Probably going to try and do a GAN soon.
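A sketch of that block-generating idea (the config format is invented for illustration):

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, n_convs: int = 2) -> nn.Sequential:
    """One U-Net-style block: (conv -> BN -> ReLU) repeated n_convs times."""
    layers = []
    for i in range(n_convs):
        layers += [
            nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
    return nn.Sequential(*layers)

# an encoder built from a config list, e.g. loaded from YAML
cfg = [(3, 32), (32, 64), (64, 128)]
encoder = nn.Sequential(*[
    nn.Sequential(conv_block(i, o), nn.MaxPool2d(2)) for i, o in cfg
])
print(encoder(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 128, 8, 8])
```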

6

Constant_Physics8504 t1_iw73mbb wrote

Almost never; most of the stuff is pre-done, and our AI guys usually work on a design and prototype to hand to software engineers to write for the company.

Luckily, I’m on that side. We usually grab their model, params and test data and mesh it with our SW to meet the need

1

hcrp2rock t1_iw75bbl wrote

Depending on the underlying data, it's often not allowed when it comes to commercial use (at least in Germany). So I often use custom LSTMs and the like.

1

TradyMcTradeface t1_iw7bpz0 wrote

Only when I feel like playing around. Most of the time, fine-tuning OSS models is the best effort/value proposition.

1

sanderbaduk t1_iw7ctql wrote

Mostly custom heads and custom loss functions, more than being concerned with the base model. Implementing the latter is increasingly like implementing your own sort function or something: a learning exercise.

1

tripple13 t1_iw7kfzy wrote

Well, daily.

But I do research.

While you can solve a lot of problems with out-of-the-box models and a bit of fine-tuning, solving problems in a new way often requires custom/new models.

3

Used-Routine-4461 t1_iw7lr3v wrote

Was this for one feature or multiple features, and if so, did it require chained runs/predictions/equations? I’m currently using a MICE solution but have been looking for something better, and this sounds interesting. Any relevant papers or material you’d recommend?

2

ThePhantomPhoton t1_iw7n0ee wrote

Not for a few years now. Back in the day, one did not simply call MLPClassifier().

3

WigglyHypersurface t1_iw7qykq wrote

Search for MIWAE and notMIWAE to find the papers on the technique.

If your data is small and tabular, then you can't really beat Bayes. If your data is too big for Bayes but still just tabular, then random forest imputation is pretty good. Or, if you have specific hypotheses you know you will test, you can do MICE with SMCFCS.
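For reference, missForest-style random forest imputation is only a few lines in scikit-learn; a sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.2] = np.nan   # knock out ~20% of entries

# iterative imputation with a random forest as the per-column regressor
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0,
)
X_filled = imputer.fit_transform(X)
print(np.isnan(X_filled).any())  # False
```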

The real utility of the (M)IWAE I think is when you have non-tabular data with missings. This is my use case. I have to impute a mixture of audio, string, and tabular data.

2

DigThatData t1_iw7r5ou wrote

this is interesting to me and it sounds like there's probably an opportunity here to develop some pre-trained models specifically to support astrophysicists, especially with JWST hitting the scene.

would you be interested in connecting to discuss the potential opportunity here? just because there aren't currently any 'foundation models' relevant to your work doesn't mean there couldn't be.

1

moist_buckets t1_iw7ryet wrote

My latest project is modeling irregularly sampled multivariate time series using neural differential equations. We want to both reconstruct the time series and also do parameter estimation. The architecture is a variational auto encoder where the decoder is the neural differential equation.
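A minimal sketch of the decoder side, assuming the torchdiffeq package (the commenter's actual model is more involved):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumes the torchdiffeq package is installed

class ODEFunc(nn.Module):
    """dz/dt parameterized by a small MLP."""
    def __init__(self, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(),
                                 nn.Linear(64, latent_dim))

    def forward(self, t, z):
        return self.net(z)

# decode: take z0 from the encoder's posterior, integrate the latent state at
# the (possibly irregular) observation times, then map latents to outputs
func = ODEFunc()
readout = nn.Linear(8, 2)                         # latent -> observed dims
z0 = torch.randn(16, 8)                           # batch of latent initial states
t_obs = torch.tensor([0.0, 0.3, 0.35, 1.2, 2.0])  # irregular timestamps
z_t = odeint(func, z0, t_obs)                     # (len(t_obs), batch, latent)
x_hat = readout(z_t)                              # reconstructed trajectories
```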

3

IntelArtiGen t1_iw860x6 wrote

  • Task: reproduce how humans learn new words from images and sounds. I used 3 models. For the autoencoder, the task was just to rebuild the input (the loss is a distance between the original spectrogram and the rebuilt spectrogram).
  • Input: video (multiple images and sounds in a continuous stream + a real-time constraint).
  • The input of the audio autoencoder is the sound from the mic (the mel spectrogram of that sound); the output is the rebuilt mel spectrogram (autoencoding task).
  • Architecture: for audio I just used convolutions to compress the spectrogram and transposed convolutions to rebuild it.

So I just stacked multiple convolutions and "deconvolutions". I ran some hyperparameter optimization, but the architecture is not SOTA (that wasn't the goal); I just needed a model which could autoencode mel spectrograms of human voices in real time. I wanted to use a vocal synthesizer, but none fit my constraints.
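A minimal sketch of that conv/transposed-conv pattern (sizes are illustrative, not the commenter's):

```python
import torch
import torch.nn as nn

class SpecAutoencoder(nn.Module):
    """Compress a mel spectrogram with strided convs, rebuild it with
    transposed convs. Shapes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # halves H, W
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # halves again
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),     # back to input size
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SpecAutoencoder()
spec = torch.randn(2, 1, 80, 128)                 # (batch, 1, mel bins, frames)
loss = nn.functional.mse_loss(model(spec), spec)  # distance to the original
```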

3

denim_duck t1_iw8jo97 wrote

Not since learning the theory.

There are smarter people than me who can write faster/better code. I’m happy to leverage tools like PyTorch to build and customize nets (like changing the embedding layer in a transformer or tweaking some parts of a CNN), but there’s no need to reinvent the wheel.

4

marcus_hk t1_iw9gdpi wrote

I designed a custom architecture to model an analog signal processor with lots of different settings combinations. It was a custom MGU (minimal gated unit) that modulates HiPPO memory according to settings embeddings. Can train in parallel, so much faster than, say, a PyTorch GRU.

Another recent design combines convolutions and transformers to model spinal CT scans, which is challenging because a single scan can have a shape like (512, 1, 1024, 1024) that is too large to train for dense tasks like segmentation. If you simply resize to a constant shape, then you lose or distort the physical information embedded in the scans. You don't want a scan of the neck to be the same size as a scan of the whole spine, for instance. So you've got to be more clever than that, and something this specialized doesn't come ready to go out of the box.

3

Seankala t1_iwab2nh wrote

I'm also in NLP and I usually just use a pretrained model from HuggingFace as a backbone and build on top of that. It's not usually anything complicated though, maybe an MLP.
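The usual shape of that setup, sketched (the model name and head sizes are just examples):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BackboneWithMLP(nn.Module):
    """Pretrained HF encoder as a frozen-or-finetuned backbone, MLP on top."""
    def __init__(self, name="bert-base-uncased", n_classes=3):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(name)
        hidden = self.backbone.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, n_classes),
        )

    def forward(self, **inputs):
        out = self.backbone(**inputs)
        cls = out.last_hidden_state[:, 0]   # [CLS] token representation
        return self.head(cls)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BackboneWithMLP()
batch = tok(["a small example"], return_tensors="pt")
logits = model(**batch)
```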

1

visarga t1_iwamsbt wrote

You say it's enough to import a pretrained transformer from HuggingFace. I say you don't even need that: in most cases you don't need to create a dataset and train a model at all, just try a few prompts on GPT-3.

For the last 4 years I worked on an information extraction task and created an in-house dataset, and, surprise, it seems GPT-3 can solve the task without any fine-tuning. GPT-3 is eating the regular ML engineer and labeller work. What's left to do, just templating prompts in and parsing text out?
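That workflow is roughly this, using the completions-style API available when this thread was written (the prompt, the output format, and the model name are illustrative assumptions, not the commenter's task):

```python
import openai  # pre-1.0 client, as used at the time of this thread

openai.api_key = "sk-..."  # your key

def extract(text: str) -> str:
    # templating a prompt in...
    prompt = (
        "Extract the company name and the deal amount from the text below.\n"
        "Answer as: company=<name>; amount=<amount>\n\n"
        f"Text: {text}\nAnswer:"
    )
    resp = openai.Completion.create(
        model="text-davinci-002", prompt=prompt, max_tokens=64, temperature=0
    )
    # ...and parsing text out
    return resp["choices"][0]["text"].strip()

print(extract("Acme Corp acquired the startup for $20M last week."))
# e.g. "company=Acme Corp; amount=$20M"
```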

1

[deleted] t1_iwcxf6f wrote

Admittedly I'm quite new to ML, but for my master's thesis I wrote several custom CNNs to try and improve on the models of other researchers in the field.

1

unethicalangel t1_iwczta3 wrote

It's actually super rare to deal with custom NNs if you have direct customer products. Usually, what comes out of academia is not viable/scalable for any large customer base. You'd be surprised how many baselines power the systems we consider SOTA, especially in the recommendation systems domain.

1

YamEnvironmental4720 t1_iwge3d2 wrote

A couple of years ago. I was interested in the classification problem for stock price movements. The goal was to predict whether a stock would yield positive returns over the following 25-30 days, using daily data of the type provided by Yahoo Finance. I did some feature engineering to derive classical indicators, their moving averages over different time periods, and certain normalizations of them, so as to have features ranging between 0 and 1.

I experimented with various thresholds x and discovered that I got better predictive power by labelling vectors as 1 if the stock return is at least x%, for a certain x close to 1, than by simply choosing x=0, which means looking only at the direction of the price movement. One drawback, however, was that there was not a clear correlation between the profits and the accuracy of the model: a false positive of, say, x/2% obviously affected the accuracy negatively while at the same time contributing positively to the profit. Moreover, defining a recommendation not as a prediction of at least 0.5, but rather as something between 0.6 and 0.7 (depending on, for instance, the stock index), significantly reduced the number of false positives with negative price movements.

I would still be interested in the question of finding suitable metrics, other than accuracy, for measuring the performance of the classification algorithm.
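The labelling and decision-threshold logic described above, sketched with placeholder data; precision is one natural accuracy alternative here, since it directly penalizes the false positives being discussed:

```python
import numpy as np
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)

# placeholder data: forward 25-30 day returns (%) and model scores in [0, 1]
future_returns = rng.normal(loc=0.5, scale=3.0, size=1000)
scores = rng.random(1000)

x = 1.0  # label threshold: a return of at least x% counts as positive
labels = (future_returns >= x).astype(int)

# recommend only above a stricter decision threshold than 0.5
for thresh in (0.5, 0.6, 0.7):
    preds = (scores >= thresh).astype(int)
    prec = precision_score(labels, preds, zero_division=0)
    print(f"decision threshold {thresh}: precision {prec:.3f}")
```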

1

gamerx88 t1_iwkogi7 wrote

Work in NLP. Ever since HuggingFace became mainstream we almost never had to do this.

We used to have to implement the cutting-edge stuff ourselves because papers do not come with code, or give code that requires a huge amount of work to run. Now they often appear on HF within a few weeks of publication.

The only occasion in the last 2 or 3 years where I wrote a DNN from scratch was when I had to give a short lecture, for pedagogical reasons.

1

csreid t1_iwlzags wrote

Depending on what you mean by "custom", I still put those things together like Legos and fine-tune.

Also, I do mostly RL things and a lot of that stuff doesn't have good pretrained things (at least not for my purposes).

1

drML-AI t1_iwu8ls6 wrote

I write custom models all the time in order to run them on low-resource hardware. It definitely consumes more time, but it is worth it.

1