WigglyHypersurface t1_iw5c2u1 wrote on November 13, 2022 at 2:02 AM

Thankfully I'm doing niche enough projects I still get to. Last one was a multi-modal iwae for imputing missing data.

MontanaBananaJCabana t1_iw5kxzt wrote on November 13, 2022 at 3:15 AM

What's iwae?

WigglyHypersurface t1_iw5lak8 wrote on November 13, 2022 at 3:18 AM

Importance weighted autoencoder.

schwagggg t1_iw5vivc wrote on November 13, 2022 at 4:47 AM

were you able to use the measure valued derivative for poisson? you posted a thread couple months ago

WigglyHypersurface t1_iw5xnxa wrote on November 13, 2022 at 5:08 AM

It's possible I'll use it down the line, but it's not currently in the model.

Used-Routine-4461 t1_iw7lr3v wrote on November 13, 2022 at 4:09 PM

Was this for one feature or multiple features and if so did it require chained runs/predictions/equations? I’m currently using a MICE solution but have been looking for something better and this sounds interesting, any relevant papers or material you’d recommend?

WigglyHypersurface t1_iw7qykq wrote on November 13, 2022 at 4:45 PM

Search for MIWAE and notMIWAE to find the papers on the technique.

If your data is small and tabular than you can't really beat bayes. If your data is too big for bayes but just tabular than random forest imputation is pretty good. Or if you have specific hypotheses you know you will test you can do mice with SMCFCS.

The real utility of the (M)IWAE I think is when you have non-tabular data with missings. This is my use case. I have to impute a mixture of audio, string, and tabular data.

Used-Routine-4461 t1_iw8f2ms wrote on November 13, 2022 at 7:22 PM

Awesome thank you!

IntelArtiGen t1_iw5gjpi wrote on November 13, 2022 at 2:38 AM

When needed, I usually take an existing architecture and only adapt small parts of it to solve my task. I also wrote a custom autoencoder layer by layer for audio spectrograms (I didn't find an existing model which could do it with my constraints), and a model to convert embeddings from one self-supervised model to another self-supervised model (it's not a complex architecture) while the three models train simultaneously.

Tbh I would prefer to use existing architectures because re-designing an architecture is long to do, to optimize, and to train, but existing models are often very adapted to one task and perform bad on unexpected new tasks. Also you may have constraints (like real-time, memory efficiency etc.) which are not taken into account in easy-to-reuse published models.

Images have pretrained CNNs, but if you want a model to perform self-supervised continual learning and real-time inference on images with just one RTX, it can be harder to find an existing optimized solution for this task.

iamnotlefthanded666 t1_iw7vatj wrote on November 13, 2022 at 5:14 PM

Can you elaborate (task, input, output, architecture) on the audio spectrogram auto encoder thing if you don't mind?

IntelArtiGen t1_iw860x6 wrote on November 13, 2022 at 6:25 PM

Task: reproduce how humans learn new words from image and sounds. I used 3 models. For the autoencoder the task was just to rebuild the input (loss is a distance between original spectrogram and rebuilt spectrogram)
Input: video (multiple images and sounds in a continuous stream + real-time constraint)
Input of the audio autoencoder is the sound from the mic (the mel spectrogram of that sound), output is the mel-spectrogram (autoencoding task).
Architecture: for audio I just used convolutions to compress the spectrogram and transposed convolutions to re-build it

So I just stacked multiple convolutions and "deconvolution", I ran some hyperparameter optimization but the architecture is not SOTA (it wasn't the goal), I just needed a model which could autoencode mel-spectrogram of human voices in realtime. I wanted to use a vocal synthetizer but they didn't fit my constraints.

iamnotlefthanded666 t1_iwdsa30 wrote on November 14, 2022 at 10:05 PM

Thanks on elaborative answer.

moist_buckets t1_iw5g8j3 wrote on November 13, 2022 at 2:36 AM

I’ve never used a pre trained network for anything but my projects are applied to astrophysics which has some very different requirements than NLP or image classification

cautioushedonist OP t1_iw5uajx wrote on November 13, 2022 at 4:36 AM

Astrophysics is an interesting use-case!

Can you share with us what the data looks like? Is it structured, tabular data?

moist_buckets t1_iw7ryet wrote on November 13, 2022 at 4:52 PM

My latest project is modeling irregularly sampled multivariate time series using neural differential equations. We want to both reconstruct the time series and also do parameter estimation. The architecture is a variational auto encoder where the decoder is the neural differential equation.

HallowedAntiquity t1_iw8917k wrote on November 13, 2022 at 6:44 PM

This sounds cool. I’m a physicist with an interest in ML—would you happen to have a link to a paper?

moist_buckets t1_iw8b39a wrote on November 13, 2022 at 6:57 PM

No not yet haha. I’m still working on it so it’ll be probably 6 months until it’s actually published.

HallowedAntiquity t1_iw8bjjk wrote on November 13, 2022 at 7:00 PM

Cool. Good luck.

Kurohagane t1_iw6dgck wrote on November 13, 2022 at 8:16 AM

That sounds cool, what kind of work do you do?

moist_buckets t1_iw7s24x wrote on November 13, 2022 at 4:53 PM

I mainly research quasars (supermassive black holes at the center of galaxies)

wavyje t1_iw6o0hp wrote on November 13, 2022 at 10:46 AM

How did you get into the astrophysics niche? I really would like to in the near future.

moist_buckets t1_iw7rdde wrote on November 13, 2022 at 4:48 PM

I’m doing a PhD in it. That would be the most straightforward way. I’m not sure how feasible it is without a PhD.

DigThatData t1_iw7r5ou wrote on November 13, 2022 at 4:47 PM

this is interesting to me and it sounds like there's probably an opportunity here to develop some pre-trained models specifically to support astrophysicists, especially with JWST hitting the scene.

would you be interested in connecting to discuss the potential opportunity here? just because there aren't currently any 'foundation models' relevant to your work doesn't mean there couldn't be.

moist_buckets t1_iw7sjwh wrote on November 13, 2022 at 4:56 PM

I’m in the process of building pre-trained models to apply to different astronomical surveys. We almost exclusively train on simulated data.

DigThatData t1_iw5mqyx wrote on November 13, 2022 at 3:30 AM

i was implementing something from a paper that didn't have a public implementation and I wanted to play with it.

cruddybanana1102 t1_iw6hi7a wrote on November 13, 2022 at 9:14 AM

From what I have learnt, if the paper doesn't have a github repo where they have shown an implementation, you'll probably never be able to reproduce the results.

ProdigyManlet t1_iw6wcls wrote on November 13, 2022 at 12:33 PM

Yeah so many factors involved and little things that aren't ever mentioned in the paper. Implementing a few models and algorithms from papers made me realise how poorly they're written a lot of the time. Code publishing really is essential for validation

entropyvsenergy t1_iw58dge wrote on November 13, 2022 at 1:33 AM

It's all frameworks now, some better than others. I haven't written one outside of demos or interviews in years. With that being said, I've modified neural networks a whole bunch. Usually you can just tweak parameters in a config file but sometimes you want additional outputs or to fundamentally change the model in some way...usually minor tweaks codewise.

chatterbox272 t1_iw5jfwn wrote on November 13, 2022 at 3:02 AM

I'll regularly write custom components, but pretty rarely write whole custom networks. Writing custom prediction heads that capture the task more specifically can improve training efficiency and performance (e.g. doing hierarchical classification where it makes sense, customized suppression postprocessing based on domain knowledge, etc.).

When I do write networks from scratch, they're usually variations on existing architectures anyway. E.g. implementing a 1D or 3D CNN using the same approach as existing 2D CNNs like ResNet or ConvNext. I usually find I'm doing this when I'm in a domain task and don't already have access to pretrained networks that are likely to be reasonable initialisation.

karius85 t1_iw6737h wrote on November 13, 2022 at 6:51 AM

I write custom networks and modules all the time in my research. I imagine most of the researchers in my group do as well.

ThatInternetGuy t1_iw5seho wrote on November 13, 2022 at 4:19 AM

It was just yesterday. Not a custom neural net. It's just taking different neural networks, arranging them in particular orders, and training them.

The last time I coded a neural net from scratch was some 10 years ago when I coded a Genetic Algorithm and Backpropagation Neural Network. Suffice it to say, the AI field has come a long way since.

cautioushedonist OP t1_iw5vajt wrote on November 13, 2022 at 4:45 AM

Great comment! Always helps newbies like me to read about how things used to be done.

I was "born" into these luxuries of huggingface, papers with github repos, and extensive community interaction online that keeps on giving for years. I feel grateful.

AConcernedCoder t1_iw5wdvw wrote on November 13, 2022 at 4:56 AM

The comments in this thread would leave you thinking hardly anyone uses tensorflow, scikit or pytorch any more, but TF alone averages @ around 500k downloads a day.

cautioushedonist OP t1_iw5y4mu wrote on November 13, 2022 at 5:13 AM

Can most/many of those downloads come from it being in requirements.txt of other widely used repos?

utopiah t1_iw646oq wrote on November 13, 2022 at 6:16 AM

Indeed, makes me wonder what's the share from Transformers alone or untouched Docker images.

VVindrunner t1_iw6z4d3 wrote on November 13, 2022 at 1:04 PM

Yeah, I’ve probably downloaded torch ~30 times this week on various servers as part of setting up a pre trained model.

Zephos65 t1_iw5wauk wrote on November 13, 2022 at 4:55 AM

First I read the title, then I thought "oh they mean like no outside libraries. Just write a neural net from scratch just using math"

Read the comments and was surprised by the answers.

Went back to the body of your post... oh no

cautioushedonist OP t1_iw5xlhh wrote on November 13, 2022 at 5:07 AM

I am only slightly old fashioned :)

WhyDoIHaveAnAccount9 t1_iw5934q wrote on November 13, 2022 at 1:39 AM

Ha! Never

evanthebouncy t1_iw66ov5 wrote on November 13, 2022 at 6:46 AM

I took a bet that all the training and architecture will be subsumed into some centralized company, where you only really have to worry about dataset.

So in a way it paid off. Now I do everything with hugging face transformers and only worry about dataset haha.

ProdigyManlet t1_iw6x6sf wrote on November 13, 2022 at 12:43 PM

Earlier this year wrote a custom U-Net inspired model for semantic image segmentation in PyTorch. This one I designed to be fully modular so that it's really simple to change the architecture from a config file. Writing the code eloquently took some time, I like the idea of generating blocks of layers to keep the model neat rather than layer by layer (e.g. block includes conv layer, bn, activation, etc.). Probably going to try and do a GAN soon

dr_death47 t1_iw5ua5p wrote on November 13, 2022 at 4:36 AM

Lightweight vision module for simple classification for autonomous cars. Have all layers and their sizes memorized at this point lol.

Valdaora t1_iw68x4h wrote on November 13, 2022 at 7:14 AM

Is it a variation of the YOLO architecture?

crazymonezyy t1_iw69a2p wrote on November 13, 2022 at 7:19 AM

grad school

DisWastingMyTime t1_iw6hca9 wrote on November 13, 2022 at 9:11 AM

Can't sell pretrained models, can't sell models who were trained on public data, can't run modern models on edge devices that the clients want because they want it as cheap as possible.

orthomonas t1_iw6vrt3 wrote on November 13, 2022 at 12:27 PM

I think the field has gotten mature enough that there's a widening difference between 'using ML as a tool' and 'researching ML'.

As a perhaps poor analogy, it's the difference between an ecologist using an R package to just get some standard ordinations done vs. a statistician developing and writing up a new ordination method.

denim_duck t1_iw8jo97 wrote on November 13, 2022 at 7:51 PM

Not since learning the theory.

There are smarter people than me who can write faster/better code. I’m happy to leverage tool like PyTorch to build and customize nets (like changing the embedding layer in a transformer or tweaking some parts of a CNN) but no need to reinvent the wheel

utopiah t1_iw64365 wrote on November 13, 2022 at 6:15 AM

Not for a while but I started doing it again, did this little thing for Alameda Research and it went pretty great for a while but ... yeah, maybe I shouldn't. (sarcasm, just being facetious with the latest crypto scandal)

chewxy t1_iw68kzn wrote on November 13, 2022 at 7:10 AM

Regularly. Perks of using my own framework :)

cautioushedonist OP t1_iw6bez1 wrote on November 13, 2022 at 7:48 AM

Shameless (/s) of you to not plug it in, if it's public.

chewxy t1_iw6f6s9 wrote on November 13, 2022 at 8:41 AM

Oh it's.Gorgonia

tripple13 t1_iw7kfzy wrote on November 13, 2022 at 4:00 PM

Well, daily.

But I do research.

While you can solve a lot of problems with out-of-the-box models and a bit of fine-tuning, solving problems in a new way often requires custom/new models.

ThePhantomPhoton t1_iw7n0ee wrote on November 13, 2022 at 4:18 PM

Not for a few years now— back in the day, one did not simply call MLPClassifier().

marcus_hk t1_iw9gdpi wrote on November 13, 2022 at 11:29 PM

I designed a custom architecture to model an analog signal processor with lots of different settings combinations. It was a custom MGU (minimal gated unit) that modulates HiPPO memory according to settings embeddings. Can train in parallel, so much faster than, say, a PyTorch GRU.

Another recent design combines convolution and transformers to model spinal CT scans, which is challenging because a single scan can have a shape like (512, 1, 1024, 1024) that is too large to train for dense tasks like segmentation. If you simply resize to a constant shape, then you lose or distort the physical information embedded in the scans. You don't want a scan of the neck to be the same size as a scan of the whole spine, for instance. So you've got to be more clever than that, and something this specialized doesn't come ready to go out of the box.

piman01 t1_iw6aq0d wrote on November 13, 2022 at 7:39 AM

Just for learning purposes i wrote code from scratch in python to implement a neural network structure with backpropagation, SGD, with momentum and regularization. It works pretty well. Was able to fit MNIST at 95% on a test set using fully connected architecture.

trolls_toll t1_iw6ugg7 wrote on November 13, 2022 at 12:11 PM

about a week ago. GCNs lesgoooo

[deleted] t1_iw7r61b wrote on November 13, 2022 at 4:47 PM

[deleted]

FromUnderTheOcean t1_iw6s6fb wrote on November 13, 2022 at 11:42 AM

Last week. LSTM to consume both tabular data and a number of text fields. We have tabular transformers and text transformers, but this was easier than tying them both together and trying to make it work.

Constant_Physics8504 t1_iw73mbb wrote on November 13, 2022 at 1:48 PM

Almost never, most of the stuff is pre-done and our AI guys usually work on a design and prototype to give to software engineers to write for the company.

Luckily, I’m on that side. We usually grab their model, params and test data and mesh it with our SW to meet the need

hcrp2rock t1_iw75bbl wrote on November 13, 2022 at 2:04 PM

Depending on the underlying data its often not allowed when it comes to commercial use (atleast in germany). So I often use custom LSTMs and stuff.

TradyMcTradeface t1_iw7bpz0 wrote on November 13, 2022 at 2:56 PM

Only when i feel like playing around. Most of the time, fine-tuning OSs models is best effort/value prop

sanderbaduk t1_iw7ctql wrote on November 13, 2022 at 3:05 PM

Mostly custom heads and custom loss functions, more than being concerned with the base model. Implementing theatter is increasingly like implementing your own sort function or something: a learning exercise.

Seankala t1_iwab2nh wrote on November 14, 2022 at 3:30 AM

I'm also in NLP and I usually just use a pretrained model from HuggingFace as a backbone and build on top of that. It's not usually anything complicated though, maybe an MLP.

visarga t1_iwamsbt wrote on November 14, 2022 at 5:20 AM

You say it's enough to import a pretrained transformer from HuggingFace. I say not even that, you don't need to create a dataset and train a model in most cases, just try a few prompts on GPT-3.

In the last 4 years I worked on an information extraction task, created in-house dataset, and surprise - it seems GPT-3 can solve the task without any fine-tuning. GPT-3 is eating the regular ML engineer and labeller work. What's left to do, just templating prompts in and parsing text out?

[deleted] t1_iwcxf6f wrote on November 14, 2022 at 6:43 PM

Admittedly I'm quite new to ML, but for my master's thesis I wrote several custom CNNs to try and improve on the models of other researchers in the field.

unethicalangel t1_iwczta3 wrote on November 14, 2022 at 6:58 PM

It's actually super rare to deal with custom NN if you have direct customer products. Usually what you see come out of academia is not viable /scalable for any large customer base. You'd be surprised how many baselines power the systems we consider SOTA. Especially in the recommendation systems domain.

YamEnvironmental4720 t1_iwge3d2 wrote on November 15, 2022 at 1:18 PM

A couple of years ago. I was interested in the classification problem for stock price movements. The goal was to predict if the stock yielded positive returns in the future 25-30 days, using daily data of the type provided by Yahoo Finance. I did some feature engineering to derive classical indicators, their moving averages over different time periods and certain normalizations of them so as to have features ranging between 0 and 1. I experimented with various thresholds x and discovered that I get better predictive power by labelling vectors by 1 if the stock returns is at least x %, for some certain x close to 1, than by simply choosing x=0, which means looking only at the direction of the price movement. One drawback, however, was that there was not a clear correlation between the profits and the accuracy of the model: a false positive of, say x/2 %, obviously affected the accuracy in a negative way while it at the same time contributed postively to the profit. Moreover, not defing a recommendation to be a prediction of at least 0.5, but rather something between 0.6 and 0.7 (depending on, for instance the stock index), significantly reduced the number of false positives with negative price movements.

I would still be interested in the question of finding suitable metrics, other than the accuracy, for measuring the performance of the classification algorithm.

gamerx88 t1_iwkogi7 wrote on November 16, 2022 at 10:05 AM

Work in NLP. Ever since HuggingFace became mainstream we almost never had to do this.

We used to have to implement the cutting edge stuff ourselves because papers do not come with code, or give code that requires huge amount of work to run. Now they often appear on HF within a few weeks of publication.

The only occasion in the last 2 or 3 years where I wrote a DNN from scratch was when I had to give a short lecture, for pedagogical reasons.

csreid t1_iwlzags wrote on November 16, 2022 at 5:02 PM

Depending on what you mean by "custom", I still put those things together like legos and fine-tune

Also, I do mostly RL things and a lot of that stuff doesn't have good pretrained things (at least not for my purposes).

drML-AI t1_iwu8ls6 wrote on November 18, 2022 at 11:47 AM

I write custom models all the time in order to run them on low resource hardware. It definitely consumes more time but it is worth it.

[deleted] t1_iw6xvl6 wrote on November 13, 2022 at 12:50 PM

[deleted]

Comments