marcus_hk
marcus_hk t1_jdfjp25 wrote
Reply to [N] ChatGPT plugins by Singularian2501
How is this different from prompt engineering with LangChain? They don't say.
marcus_hk t1_jcrgwqm wrote
Reply to [P] Web Stable Diffusion by crowwork
Just browsing on my phone and haven’t dug deep yet, but in the notebook it says that build.py targets M2 by default but can also target CUDA. What about CPU?
I’d love to see a super minimal example, like running a small nn.Linear layer, for pedagogical purposes and to abstract away the complexity of a larger model like Stable Diffusion.
marcus_hk t1_jcrdufd wrote
Reply to comment by race2tb in [P] Web Stable Diffusion by crowwork
For weights, yes, and for inference. If you can decompose and distribute a model across enough nodes, then you can get meaningful compute out of CPUs too — for instance for tokenization and smaller models.
marcus_hk t1_jan8rmh wrote
Reply to [D] Are Genetic Algorithms Dead? by TobusFire
They might see a resurgence in dynamic multi-agent environments.
marcus_hk t1_j9gij1a wrote
Looks great. It might not be intelligible to those who don't know what they're looking at, though. Maybe include labels for, say, the filters and what each slice of the input represents?
Would like to see the same for normalization layers. And RNNs. And transformers. Keep it up!
marcus_hk t1_j9g5hns wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Seems it shouldn't be too difficult to run one stage or layer at a time and cache intermediate results.
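A minimal sketch of the idea in NumPy, with invented shapes (a real LLM's weights would be streamed from disk with something like `np.load` or `torch.load`): each layer's weights are brought into memory only when needed, and only the intermediate activation is kept between steps.

```python
import numpy as np

# Sketch (assumed setup): run a stack of linear layers one at a time,
# loading each weight matrix only when needed and caching the
# intermediate activation. Stands in for streaming a large model's
# layers through limited RAM/VRAM.
rng = np.random.default_rng(0)
weight_files = [rng.standard_normal((64, 64)).astype(np.float32) for _ in range(4)]

def load_layer(i):
    # In practice this would read one layer's weights from disk.
    return weight_files[i]

def run_layer_by_layer(x):
    h = x
    for i in range(len(weight_files)):
        w = load_layer(i)         # bring one layer's weights into memory
        h = np.maximum(h @ w, 0)  # forward pass with ReLU; h is the cached result
        del w                     # evict before loading the next layer
    return h

out = run_layer_by_layer(rng.standard_normal((1, 64)).astype(np.float32))
print(out.shape)  # (1, 64)
```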
marcus_hk t1_j8ypfa6 wrote
Take a look at Hebbian and adaptive resonance models. No backprop, no distinct training/inference phases.
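For reference, a toy Hebbian update along those lines; the shapes, learning rate, and data here are all invented for illustration. There is no backprop and no separate training phase: computing the activity *is* the inference step, and the weight update happens online alongside it.

```python
import numpy as np

# Toy Hebbian rule: weights strengthen where pre- and post-synaptic
# activity co-occur ("fire together, wire together"). Learning is
# online; there is no distinct training vs. inference phase.
rng = np.random.default_rng(0)
w = 0.01 * rng.standard_normal((2, 3))  # 3 inputs -> 2 units
eta = 0.05

# Inputs where features 0 and 1 are correlated; feature 2 is noise.
for _ in range(500):
    x = np.array([1.0, 1.0, rng.standard_normal() * 0.1])
    y = w @ x                      # activity (the "inference" step)
    w += eta * np.outer(y, x)      # Hebbian outer-product update
    w /= np.linalg.norm(w, axis=1, keepdims=True)  # keep weights bounded

# Weights concentrate on the correlated inputs.
print(np.abs(w[:, :2]).mean() > np.abs(w[:, 2]).mean())  # True
```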
marcus_hk t1_j8ejn0n wrote
Reply to comment by Reasonable_Ad_6572 in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Which part do you disagree with here:
>My unwavering opinion on current (auto-regressive) LLMs
>- They are useful as writing aids.
>- They are "reactive" & don't plan nor reason.
>- They make stuff up or retrieve stuff approximately.
>- That can be mitigated but not fixed by human feedback.
>- Better systems will come.
marcus_hk t1_j7lqpav wrote
Reply to [D] Which is the fastest and lightweight ultra realistic TTS for real-time voice cloning? by akshaysri0001
I haven't been keeping up with TTS since Tacotron 2, but it seems Eleven Labs works fundamentally the same way.
As for real-time performance, you may need to port your Python code to C++.
marcus_hk t1_j2xqe50 wrote
>Are there other recent deep learning based alternatives?
Transformers seem best suited to forming associations among discrete elements; that's what self-attention is, after all. Where transformers perform well over very long ranges (in audio generation, for example), there is typically heavy use of Fourier transforms and CNNs as "feature extractors," and the transformer does not process the raw data directly.
The S4 model linked above treats time-series data not as discrete samples but as a continuous signal. Consequently, it works much better.
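A rough NumPy sketch of the feature-extractor pattern described above (all shapes invented, and a plain FFT magnitude standing in for a learned CNN/Fourier front end): self-attention runs over a short sequence of frame-level features rather than over the raw samples.

```python
import numpy as np

# Sketch: compress a long raw signal into per-frame features first,
# then attend over the much shorter frame sequence. The FFT magnitude
# here is a stand-in for a learned feature extractor.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16384)          # "raw audio": 16k samples

frame = 256
frames = signal.reshape(-1, frame)            # 64 frames of 256 samples
feats = np.abs(np.fft.rfft(frames, axis=1))   # (64, 129) Fourier features

def self_attention(x):
    # Single-head attention over the frame sequence, not raw samples.
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

out = self_attention(feats)
print(out.shape)  # (64, 129)
```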
marcus_hk t1_iwp6dtu wrote
Reply to [R] The Near Future of AI is Action-Driven by hardmaru
A model that takes actions to minimize uncertainty will appear to be curious. Intelligent sampling of the input space is the way to go.
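A toy illustration of that kind of sampling, with a wholly invented bandit-style setup: the agent always queries the input whose outcome it is most uncertain about, which makes it look "curious" about the least predictable parts of the input space.

```python
import numpy as np

# Invented setup: 10 inputs, each with an unknown probability of a
# binary outcome. The agent samples the input with the highest
# predictive entropy, i.e. it acts to reduce its own uncertainty.
rng = np.random.default_rng(0)
n_inputs = 10
counts = np.ones((n_inputs, 2))       # success/failure counts per input
true_p = rng.uniform(0, 1, n_inputs)  # hidden environment parameters

def entropy(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

for _ in range(200):
    p_hat = counts[:, 0] / counts.sum(axis=1)
    i = np.argmax(entropy(p_hat))           # act to minimize uncertainty
    outcome = rng.random() < true_p[i]      # observe the environment
    counts[i, 0 if outcome else 1] += 1     # update beliefs

# The visit counts concentrate on the inputs whose outcomes are the
# least predictable (true_p near 0.5) -- apparent "curiosity".
visits = counts.sum(axis=1) - 2
```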
marcus_hk t1_iw9gdpi wrote
I designed a custom architecture to model an analog signal processor with lots of different settings combinations. It was a custom MGU (minimal gated unit) that modulates HiPPO memory according to settings embeddings. It can train in parallel, so it's much faster than, say, a PyTorch GRU.
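For reference, here is a plain MGU cell in NumPy; this is just the generic building block, not the custom HiPPO-modulated design described above, and the shapes and initialization are invented.

```python
import numpy as np

# Minimal gated unit (MGU): a single forget gate does the work of a
# GRU's two gates. Generic reference cell; dimensions are arbitrary.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wf = 0.1 * rng.standard_normal((d_h, d_in))
Uf = 0.1 * rng.standard_normal((d_h, d_h))
Wh = 0.1 * rng.standard_normal((d_h, d_in))
Uh = 0.1 * rng.standard_normal((d_h, d_h))
bf, bh = np.zeros(d_h), np.zeros(d_h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mgu_step(x, h):
    f = sigmoid(Wf @ x + Uf @ h + bf)              # single forget gate
    h_tilde = np.tanh(Wh @ x + Uh @ (f * h) + bh)  # gated candidate
    return (1 - f) * h + f * h_tilde               # state update

h = np.zeros(d_h)
for t in range(5):
    h = mgu_step(rng.standard_normal(d_in), h)
print(h.shape)  # (8,)
```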
Another recent design combines convolutions and transformers to model spinal CT scans. This is challenging because a single scan can have a shape like (512, 1, 1024, 1024), which is too large to train on for dense tasks like segmentation. If you simply resize to a constant shape, then you lose or distort the physical information embedded in the scans. You don't want a scan of the neck to be the same size as a scan of the whole spine, for instance. So you've got to be more clever than that, and something this specialized doesn't come ready to go out of the box.
marcus_hk t1_ispjwcw wrote
Reply to [D] Video Tracking vs Image detection by Dense-Smf-6032
Object detection involves a predetermined set of object classes.
Video tracking means following an arbitrary object with a bounding box around it. There are no classes per se.
marcus_hk t1_isp9ggz wrote
Healthcare in the USA is not a competitive market. It's a government-sponsored cartel run by large institutional fiefdoms. It is heavily regulated and there is little incentive to innovate.
marcus_hk t1_isgbpv9 wrote
Reply to comment by radio_wave in [D] Interpolation in medical imaging? by Delacroid
If you have a dense 3D image, as in CT, then there is really no distinction between "within image" and "across slices": they are the same thing, just along different axes. With sparse MRI slices, though, you're right.
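A quick NumPy illustration of that point, with invented dimensions: in a dense volume the three anatomical planes are just different axes of one array, so interpolating between slices is the same arithmetic as interpolating within a slice.

```python
import numpy as np

# Invented CT volume: (slices, height, width).
vol = np.zeros((40, 256, 256))

axial    = vol[10, :, :]   # one axial slice, (256, 256)
coronal  = vol[:, 128, :]  # one coronal "slice", (40, 256)
sagittal = vol[:, :, 128]  # one sagittal "slice", (40, 256)

# Linear interpolation midway between two adjacent axial slices is the
# same arithmetic as between two adjacent pixel rows.
between_slices = 0.5 * (vol[10] + vol[11])
between_rows   = 0.5 * (vol[:, 10, :] + vol[:, 11, :])
print(axial.shape, coronal.shape)  # (256, 256) (40, 256)
```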
marcus_hk t1_jdmpb8u wrote
Reply to [D] Do you use a website or program to organise and annotate your papers? by who_here_condemns_me
Overleaf