Submitted by cautioushedonist t3_yto34q in MachineLearning

I work exclusively in NLP, and since transformers (especially the pretrained kind) took over, I haven't written a neural net (RNN, LSTM, etc.) in over 3 years and haven't had to worry about things like number of layers, hidden size, etc.

Tabular data has XGBoost, etc. NLP has pretrained Transformers. Images have pretrained CNNs and Transformers.

But I've been through some ML system design books, and recommendation system solutions often feature custom neural nets, so that's interesting.

What was the problem and type of data at hand when you last wrote a neural net yourself, layer by layer?

Thanks y'all!

183

Comments

WigglyHypersurface t1_iw5c2u1 wrote

Thankfully I'm doing niche enough projects that I still get to. The last one was a multimodal IWAE for imputing missing data.

79

MontanaBananaJCabana t1_iw5kxzt wrote

What's an IWAE?

23

WigglyHypersurface t1_iw5lak8 wrote

Importance weighted autoencoder.

36

schwagggg t1_iw5vivc wrote

Were you able to use the measure-valued derivative for Poisson? You posted a thread about it a couple months ago.

9

WigglyHypersurface t1_iw5xnxa wrote

It's possible I'll use it down the line, but it's not currently in the model.

5

Used-Routine-4461 t1_iw7lr3v wrote

Was this for one feature or multiple features, and if so, did it require chained runs/predictions/equations? I'm currently using a MICE solution but have been looking for something better, and this sounds interesting. Any relevant papers or material you'd recommend?

2

WigglyHypersurface t1_iw7qykq wrote

Search for MIWAE and notMIWAE to find the papers on the technique.

If your data is small and tabular, then you can't really beat Bayes. If your data is too big for Bayes but still tabular, then random forest imputation is pretty good. Or, if you have specific hypotheses you know you will test, you can do MICE with SMCFCS.

The real utility of the (M)IWAE, I think, is when you have non-tabular data with missing values. This is my use case: I have to impute a mixture of audio, string, and tabular data.
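
If it helps, here's a minimal sketch of the random-forest imputation route with scikit-learn (the data and hyperparameters are made up; not my actual setup):

```python
# Sketch: random-forest imputation on tabular data with missing entries.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of entries

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)  # each column imputed from the others
```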

2

IntelArtiGen t1_iw5gjpi wrote

When needed, I usually take an existing architecture and adapt only small parts of it to solve my task. I also wrote a custom autoencoder layer by layer for audio spectrograms (I didn't find an existing model that could do it within my constraints), and a model to convert embeddings from one self-supervised model to another (not a complex architecture), with the three models training simultaneously.

Tbh I would prefer to use existing architectures, because redesigning an architecture takes a long time to design, optimize, and train. But existing models are often heavily adapted to one task and perform badly on unexpected new tasks. You may also have constraints (real-time, memory efficiency, etc.) that are not taken into account in easy-to-reuse published models.

Images have pretrained CNNs, but if you want a model to perform self-supervised continual learning and real-time inference on images with just one RTX, it can be harder to find an existing optimized solution for this task.

52

iamnotlefthanded666 t1_iw7vatj wrote

Can you elaborate (task, input, output, architecture) on the audio spectrogram auto encoder thing if you don't mind?

3

IntelArtiGen t1_iw860x6 wrote

  • Task: reproduce how humans learn new words from images and sounds. I used 3 models. For the autoencoder, the task was just to rebuild the input (the loss is a distance between the original spectrogram and the rebuilt spectrogram).
  • Input: video (multiple images and sounds in a continuous stream, plus a real-time constraint).
  • Input of the audio autoencoder: the sound from the mic (the mel spectrogram of that sound); output: the reconstructed mel spectrogram (autoencoding task).
  • Architecture: for audio I just used convolutions to compress the spectrogram and transposed convolutions to rebuild it.

So I just stacked multiple convolutions and "deconvolutions". I ran some hyperparameter optimization, but the architecture is not SOTA (that wasn't the goal); I just needed a model that could autoencode mel spectrograms of human voices in real time. I wanted to use a vocal synthesizer, but none fit my constraints.
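
For a rough idea, a minimal sketch of that conv-down / transposed-conv-up shape in PyTorch (the channel counts and the 80-mel input size are made up; the real model had more layers and tuned hyperparameters):

```python
import torch
import torch.nn as nn

class SpectrogramAutoencoder(nn.Module):
    """Conv encoder + transposed-conv decoder for (batch, 1, mels, frames) inputs."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),  # halve H and W
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SpectrogramAutoencoder()
spec = torch.randn(8, 1, 80, 128)                 # batch of fake 80-mel spectrograms
loss = nn.functional.mse_loss(model(spec), spec)  # distance between original and rebuilt
```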

3

moist_buckets t1_iw5g8j3 wrote

I've never used a pretrained network for anything, but my projects are applied to astrophysics, which has some very different requirements than NLP or image classification.

21

cautioushedonist OP t1_iw5uajx wrote

Astrophysics is an interesting use-case!

Can you share with us what the data looks like? Is it structured, tabular data?

6

moist_buckets t1_iw7ryet wrote

My latest project is modeling irregularly sampled multivariate time series using neural differential equations. We want to both reconstruct the time series and do parameter estimation. The architecture is a variational autoencoder where the decoder is the neural differential equation.
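
Not our actual code, but a minimal sketch of the VAE-with-ODE-decoder idea, assuming the torchdiffeq package (the encoder, latent size, and dynamics here are invented):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumes torchdiffeq is installed

class LatentODEFunc(nn.Module):
    """Learned latent dynamics dz/dt = f(z)."""
    def __init__(self, latent_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))

    def forward(self, t, z):
        return self.net(z)

class LatentODEVAE(nn.Module):
    def __init__(self, obs_dim=1, latent_dim=4):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, 32, batch_first=True)
        self.to_mu = nn.Linear(32, latent_dim)
        self.to_logvar = nn.Linear(32, latent_dim)
        self.func = LatentODEFunc(latent_dim)
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, x, t):
        _, h = self.encoder(x)                                 # summarize observed series
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z0 = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        zt = odeint(self.func, z0, t)                          # integrate latent state over t
        return self.decoder(zt).permute(1, 0, 2), mu, logvar   # (batch, time, obs)

model = LatentODEVAE()
x = torch.randn(8, 20, 1)              # fake series (values only, for illustration)
t = torch.sort(torch.rand(30)).values  # query times need not match the input grid
recon, mu, logvar = model(x, t)
```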

3

HallowedAntiquity t1_iw8917k wrote

This sounds cool. I’m a physicist with an interest in ML—would you happen to have a link to a paper?

1

Kurohagane t1_iw6dgck wrote

That sounds cool, what kind of work do you do?

1

moist_buckets t1_iw7s24x wrote

I mainly research quasars (supermassive black holes at the center of galaxies)

1

wavyje t1_iw6o0hp wrote

How did you get into the astrophysics niche? I really would like to in the near future.

1

moist_buckets t1_iw7rdde wrote

I’m doing a PhD in it. That would be the most straightforward way. I’m not sure how feasible it is without a PhD.

2

DigThatData t1_iw7r5ou wrote

This is interesting to me, and it sounds like there's probably an opportunity here to develop some pretrained models specifically to support astrophysicists, especially with JWST hitting the scene.

Would you be interested in connecting to discuss the potential opportunity here? Just because there aren't currently any 'foundation models' relevant to your work doesn't mean there couldn't be.

1

moist_buckets t1_iw7sjwh wrote

I’m in the process of building pre-trained models to apply to different astronomical surveys. We almost exclusively train on simulated data.

1

DigThatData t1_iw5mqyx wrote

I was implementing something from a paper that didn't have a public implementation, and I wanted to play with it.

20

cruddybanana1102 t1_iw6hi7a wrote

From what I've learned, if the paper doesn't have a GitHub repo with an implementation, you'll probably never be able to reproduce the results.

43

ProdigyManlet t1_iw6wcls wrote

Yeah, there are so many factors involved, and little things that are never mentioned in the paper. Implementing a few models and algorithms from papers made me realise how poorly they're written a lot of the time. Code publishing really is essential for validation.

10

entropyvsenergy t1_iw58dge wrote

It's all frameworks now, some better than others. I haven't written one outside of demos or interviews in years. That said, I've modified neural networks a whole bunch. Usually you can just tweak parameters in a config file, but sometimes you want additional outputs or to fundamentally change the model in some way... usually minor tweaks code-wise.

16

chatterbox272 t1_iw5jfwn wrote

I'll regularly write custom components, but pretty rarely write whole custom networks. Writing custom prediction heads that capture the task more specifically can improve training efficiency and performance (e.g. doing hierarchical classification where it makes sense, customized suppression postprocessing based on domain knowledge, etc.).

When I do write networks from scratch, they're usually variations on existing architectures anyway, e.g. implementing a 1D or 3D CNN using the same approach as existing 2D CNNs like ResNet or ConvNeXt. I usually find I'm doing this when I'm working on a domain task and don't already have access to pretrained networks that are likely to be a reasonable initialisation.
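
As a sketch of the custom-head idea, here's a hierarchical head bolted onto a pretrained torchvision backbone (assumes a recent torchvision; the two-level hierarchy and class counts are invented):

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
feat_dim = backbone.fc.in_features
backbone.fc = nn.Identity()  # strip the stock ImageNet classifier, keep features

class HierarchicalClassifier(nn.Module):
    """Shared pretrained features feeding a coarse head and a fine head."""
    def __init__(self, backbone, feat_dim, n_coarse=3, n_fine=10):
        super().__init__()
        self.backbone = backbone
        self.coarse = nn.Linear(feat_dim, n_coarse)
        self.fine = nn.Linear(feat_dim, n_fine)

    def forward(self, x):
        feats = self.backbone(x)
        return self.coarse(feats), self.fine(feats)

model = HierarchicalClassifier(backbone, feat_dim)
coarse_logits, fine_logits = model(torch.randn(4, 3, 224, 224))
```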

15

karius85 t1_iw6737h wrote

I write custom networks and modules all the time in my research. I imagine most of the researchers in my group do as well.

15

ThatInternetGuy t1_iw5seho wrote

It was just yesterday, though not a custom neural net; it's just taking different neural networks, arranging them in a particular order, and training them.

The last time I wrote a neural net from scratch was some 10 years ago, when I coded a genetic algorithm and a backpropagation neural network. Suffice it to say, the AI field has come a long way since.

10

cautioushedonist OP t1_iw5vajt wrote

Great comment! Always helps newbies like me to read about how things used to be done.

I was "born" into these luxuries of huggingface, papers with github repos, and extensive community interaction online that keeps on giving for years. I feel grateful.

7

AConcernedCoder t1_iw5wdvw wrote

The comments in this thread would leave you thinking hardly anyone uses TensorFlow, scikit-learn, or PyTorch anymore, but TF alone averages around 500k downloads a day.

9

cautioushedonist OP t1_iw5y4mu wrote

Couldn't most/many of those downloads come from it being in the requirements.txt of other widely used repos?

16

utopiah t1_iw646oq wrote

Indeed. It makes me wonder what the share is from Transformers alone, or from untouched Docker images.

4

VVindrunner t1_iw6z4d3 wrote

Yeah, I've probably downloaded torch ~30 times this week on various servers as part of setting up a pretrained model.

2

Zephos65 t1_iw5wauk wrote

First I read the title, then I thought "oh they mean like no outside libraries. Just write a neural net from scratch just using math"

Read the comments and was surprised by the answers.

Went back to the body of your post... oh no

7

evanthebouncy t1_iw66ov5 wrote

I took a bet that all the training and architecture work would be subsumed into some centralized company, where you only really have to worry about the dataset.

So in a way it paid off. Now I do everything with Hugging Face transformers and only worry about the dataset, haha.

6

ProdigyManlet t1_iw6x6sf wrote

Earlier this year I wrote a custom U-Net-inspired model for semantic image segmentation in PyTorch. I designed this one to be fully modular, so it's really simple to change the architecture from a config file. Writing the code elegantly took some time; I like the idea of generating blocks of layers to keep the model neat, rather than going layer by layer (e.g. a block includes a conv layer, BN, activation, etc.). Probably going to try a GAN soon.
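
Roughly this block-generator pattern, as a sketch (the config format here is invented, not my actual one):

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel_size=3):
    """One 'block': conv -> batch norm -> activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Channel widths per encoder stage, as they might come from a config file.
cfg = {"channels": [3, 32, 64, 128]}

encoder = nn.Sequential(*[
    nn.Sequential(conv_block(c_in, c_out), nn.MaxPool2d(2))
    for c_in, c_out in zip(cfg["channels"], cfg["channels"][1:])
])
```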

6

dr_death47 t1_iw5ua5p wrote

Lightweight vision module for simple classification for autonomous cars. Have all layers and their sizes memorized at this point lol.

5

Valdaora t1_iw68x4h wrote

Is it a variation of the YOLO architecture?

2

DisWastingMyTime t1_iw6hca9 wrote

Can't sell pretrained models, can't sell models that were trained on public data, and can't run modern models on the edge devices the clients want, because they want it as cheap as possible.

5

orthomonas t1_iw6vrt3 wrote

I think the field has gotten mature enough that there's a widening difference between 'using ML as a tool' and 'researching ML'.

As a perhaps poor analogy, it's the difference between an ecologist using an R package to just get some standard ordinations done vs. a statistician developing and writing up a new ordination method.

4

denim_duck t1_iw8jo97 wrote

Not since learning the theory.

There are people smarter than me who can write faster/better code. I'm happy to leverage tools like PyTorch to build and customize nets (like changing the embedding layer in a transformer or tweaking parts of a CNN), but there's no need to reinvent the wheel.

4

utopiah t1_iw64365 wrote

Not for a while but I started doing it again, did this little thing for Alameda Research and it went pretty great for a while but ... yeah, maybe I shouldn't. (sarcasm, just being facetious with the latest crypto scandal)

3

tripple13 t1_iw7kfzy wrote

Well, daily.

But I do research.

While you can solve a lot of problems with out-of-the-box models and a bit of fine-tuning, solving problems in a new way often requires custom/new models.

3

ThePhantomPhoton t1_iw7n0ee wrote

Not for a few years now. Back in the day, one did not simply call MLPClassifier().

3

marcus_hk t1_iw9gdpi wrote

I designed a custom architecture to model an analog signal processor with lots of different settings combinations. It was a custom MGU (minimal gated unit) that modulates HiPPO memory according to settings embeddings. Can train in parallel, so much faster than, say, a PyTorch GRU.

Another recent design combines convolution and transformers to model spinal CT scans, which is challenging because a single scan can have a shape like (512, 1, 1024, 1024) that is too large to train for dense tasks like segmentation. If you simply resize to a constant shape, then you lose or distort the physical information embedded in the scans. You don't want a scan of the neck to be the same size as a scan of the whole spine, for instance. So you've got to be more clever than that, and something this specialized doesn't come ready to go out of the box.

3

piman01 t1_iw6aq0d wrote

Just for learning purposes, I wrote code from scratch in Python to implement a neural network with backpropagation and SGD, with momentum and regularization. It works pretty well; I was able to fit MNIST at 95% on a test set using a fully connected architecture.
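
The core of that kind of from-scratch implementation, as a minimal sketch (one hidden layer; momentum and L2 regularization folded into the update; MNIST loading omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer: 784 -> 128 -> 10
W1 = rng.normal(0, 0.01, (784, 128)); b1 = np.zeros(128)
W2 = rng.normal(0, 0.01, (128, 10));  b2 = np.zeros(10)
vW1 = np.zeros_like(W1); vW2 = np.zeros_like(W2)  # momentum buffers
lr, mom, l2 = 0.1, 0.9, 1e-4

def forward(X):
    h = np.maximum(0, X @ W1 + b1)                # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)    # softmax probabilities

def sgd_step(X, y):                               # y: integer class labels
    global W1, b1, W2, b2, vW1, vW2
    n = len(X)
    h, p = forward(X)
    d_logits = p.copy(); d_logits[np.arange(n), y] -= 1; d_logits /= n
    dW2 = h.T @ d_logits + l2 * W2                # gradient + L2 penalty
    db2 = d_logits.sum(axis=0)
    dh = (d_logits @ W2.T) * (h > 0)              # backprop through ReLU
    dW1 = X.T @ dh + l2 * W1
    db1 = dh.sum(axis=0)
    vW2 = mom * vW2 - lr * dW2; W2 += vW2         # momentum updates on weights
    vW1 = mom * vW1 - lr * dW1; W1 += vW1
    b2 -= lr * db2; b1 -= lr * db1                # plain SGD on biases
```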

2

trolls_toll t1_iw6ugg7 wrote

about a week ago. GCNs lesgoooo

2

FromUnderTheOcean t1_iw6s6fb wrote

Last week. LSTM to consume both tabular data and a number of text fields. We have tabular transformers and text transformers, but this was easier than tying them both together and trying to make it work.
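
Something along these lines, as a rough sketch (the dimensions and the way text is tokenized are invented):

```python
import torch
import torch.nn as nn

class TextTabularLSTM(nn.Module):
    """Run an LSTM over token embeddings, then mix in tabular features."""
    def __init__(self, vocab_size=10_000, emb_dim=64, tab_dim=12, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + tab_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, tokens, tabular):
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.head(torch.cat([h[-1], tabular], dim=1))  # concat text + tabular

model = TextTabularLSTM()
out = model(torch.randint(1, 10_000, (8, 50)), torch.randn(8, 12))
```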

1

Constant_Physics8504 t1_iw73mbb wrote

Almost never, most of the stuff is pre-done and our AI guys usually work on a design and prototype to give to software engineers to write for the company.

Luckily, I’m on that side. We usually grab their model, params and test data and mesh it with our SW to meet the need

1

hcrp2rock t1_iw75bbl wrote

Depending on the underlying data, it's often not allowed when it comes to commercial use (at least in Germany). So I often use custom LSTMs and stuff.

1

TradyMcTradeface t1_iw7bpz0 wrote

Only when I feel like playing around. Most of the time, fine-tuning OSS models is the best effort/value proposition.

1

sanderbaduk t1_iw7ctql wrote

Mostly custom heads and custom loss functions, more than being concerned with the base model. Implementing the latter is increasingly like implementing your own sort function or something: a learning exercise.

1

Seankala t1_iwab2nh wrote

I'm also in NLP, and I usually just use a pretrained model from Hugging Face as a backbone and build on top of it. It's not usually anything complicated, though; maybe an MLP.
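
A minimal sketch of that backbone-plus-MLP pattern with the transformers library (the model name and head sizes are arbitrary):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BackboneWithMLP(nn.Module):
    """Pretrained encoder as backbone, small MLP classifier on top."""
    def __init__(self, name="bert-base-uncased", n_classes=2):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(name)
        hidden = self.backbone.config.hidden_size
        self.mlp = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, n_classes))

    def forward(self, **inputs):
        out = self.backbone(**inputs)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.mlp(cls)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BackboneWithMLP()
logits = model(**tok(["an example sentence"], return_tensors="pt"))
```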

1

visarga t1_iwamsbt wrote

You say it's enough to import a pretrained transformer from Hugging Face. I say you don't even need that: in most cases you don't need to create a dataset and train a model, just try a few prompts on GPT-3.

For the last 4 years I worked on an information extraction task and created an in-house dataset, and, surprise, it seems GPT-3 can solve the task without any fine-tuning. GPT-3 is eating the regular ML engineer's and labeller's work. What's left to do? Just templating prompts in and parsing text out.
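
That "template in, parse out" loop, roughly (the prompt, schema, and `complete` function are hypothetical stand-ins, not an actual GPT-3 client):

```python
import json

# Hypothetical extraction schema; the real task and fields were in-house.
PROMPT_TEMPLATE = """Extract the following fields from the invoice text
and answer with JSON using the keys "vendor", "date", and "total".

Invoice text:
{text}

JSON:"""

def extract(text: str, complete) -> dict:
    """`complete` is a stand-in: any function mapping a prompt string to
    a completion string (e.g. a thin wrapper around your GPT-3 client)."""
    raw = complete(PROMPT_TEMPLATE.format(text=text))
    return json.loads(raw)  # real code should validate; the model can emit bad JSON
```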

1

[deleted] t1_iwcxf6f wrote

Admittedly I'm quite new to ML, but for my master's thesis I wrote several custom CNNs to try and improve on the models of other researchers in the field.

1

unethicalangel t1_iwczta3 wrote

It's actually super rare to deal with custom NNs if you have direct customer products. Usually what comes out of academia is not viable/scalable for any large customer base. You'd be surprised how many baselines power the systems we consider SOTA, especially in the recommendation systems domain.

1

YamEnvironmental4720 t1_iwge3d2 wrote

A couple of years ago. I was interested in the classification problem for stock price movements. The goal was to predict whether the stock would yield positive returns over the following 25-30 days, using daily data of the type provided by Yahoo Finance. I did some feature engineering to derive classical indicators, their moving averages over different time periods, and certain normalizations of them, so as to have features ranging between 0 and 1.

I experimented with various thresholds x and discovered that I got better predictive power by labelling vectors 1 if the stock return is at least x%, for a certain x close to 1, than by simply choosing x = 0, which means looking only at the direction of the price movement. One drawback, however, was that there was not a clear correlation between the profits and the accuracy of the model: a false positive of, say, x/2% obviously affected the accuracy negatively while at the same time contributing positively to the profit. Moreover, defining a recommendation not as a prediction of at least 0.5 but as something between 0.6 and 0.7 (depending on, for instance, the stock index) significantly reduced the number of false positives with negative price movements.

I would still be interested in the question of finding suitable metrics, other than accuracy, for measuring the performance of the classification algorithm.
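
The thresholded labelling described above, as a sketch in pandas (the column names and the 25-day horizon are illustrative):

```python
import pandas as pd

def make_labels(close: pd.Series, horizon: int = 25, x: float = 1.0) -> pd.Series:
    """Label 1 if the forward return over `horizon` trading days is >= x percent."""
    fwd_return = close.shift(-horizon) / close - 1.0
    return (fwd_return >= x / 100.0).astype(int)  # last `horizon` rows have no real label

# df = pd.read_csv("yahoo_daily.csv", parse_dates=["Date"], index_col="Date")
# df["label"] = make_labels(df["Close"], horizon=25, x=1.0)
```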

1

gamerx88 t1_iwkogi7 wrote

I work in NLP. Ever since Hugging Face became mainstream, we've almost never had to do this.

We used to have to implement the cutting-edge stuff ourselves, because papers did not come with code, or gave code that required a huge amount of work to run. Now models often appear on HF within a few weeks of publication.

The only occasion in the last 2 or 3 years where I wrote a DNN from scratch was when I had to give a short lecture, for pedagogical reasons.

1

csreid t1_iwlzags wrote

Depending on what you mean by "custom", I still put those things together like legos and fine-tune

Also, I do mostly RL, and a lot of that stuff doesn't have good pretrained models (at least not for my purposes).

1

drML-AI t1_iwu8ls6 wrote

I write custom models all the time in order to run them on low-resource hardware. It definitely takes more time, but it's worth it.

1