chief167 t1_j9ku5mq wrote

I don't think it implies that all datasets are equally likely. I think it only implies that given all possible datasets, there is no best approach to modelling them. All possible != All are equally likely

But I don't have my book with me, and I don't trust the internet, since it seems to lead to random blog posts instead of the original paper (the Wikipedia footnotes gave a 404).
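A toy sketch of the "no best approach over all possible datasets" reading, not taken from the original paper. It enumerates every possible binary labeling of a four-point domain, trains on two points, and scores on the other two. The three learners and the domain size are made up for illustration, and the average weights every labeling equally, which is an assumption of this sketch rather than something settled above.

```python
from itertools import product

# Hypothetical tiny world: 4 inputs, binary labels, fixed train/test split.
DOMAIN = [0, 1, 2, 3]
TRAIN_X = [0, 1]
TEST_X = [2, 3]


def always_zero(train):
    # Ignores the training data entirely.
    return lambda x: 0


def majority(train):
    # Predicts 1 only if more than half of the training labels are 1.
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones > len(train) else 0
    return lambda x: guess


def anti_majority(train):
    # Deliberately predicts the opposite of `majority`.
    ones = sum(y for _, y in train)
    guess = 0 if 2 * ones > len(train) else 1
    return lambda x: guess


def mean_off_training_accuracy(learner):
    # Average accuracy on unseen points over ALL 2**4 possible labelings,
    # each counted once (uniform weighting is an assumption of this sketch).
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    total = 0.0
    for labels in targets:
        target = dict(zip(DOMAIN, labels))
        train = [(x, target[x]) for x in TRAIN_X]
        model = learner(train)
        hits = sum(model(x) == target[x] for x in TEST_X)
        total += hits / len(TEST_X)
    return total / len(targets)


for learner in (always_zero, majority, anti_majority):
    print(learner.__name__, mean_off_training_accuracy(learner))
```

All three learners end up with the same average on the unseen points, because the unseen labels vary freely across the enumerated labelings regardless of what was in the training set.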


chief167 t1_j9jev01 wrote

Define scale

Language models? Sure. Images? Sure. Huge amounts of transaction data to search for fraud? XGBoost all the way lol.

Check the no free lunch theorem: there is no single approach that is best for every possible problem. Jeez, I hate it when marketing takes over. You learn this principle in the first chapter of literally every data course.


chief167 t1_j45eo3z wrote

AI is basically decision making. Given information, how does a machine learn from its environment and take decisions without human oversight? How does a machine adapt itself with more experience?

ML is just a way to create models.

For example the SLAM algorithm is an important algorithm in AI, because it allows robots to map their environment. However, this is not ML at all.

Another example of AI is knowledge graphs, like the earliest chess engines. A perfect chess AI can be made without any machine learning at all.
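To make the "no machine learning at all" point concrete, here is a minimal sketch of exhaustive game-tree search on a much smaller game than chess. The game is swapped in for illustration: one pile of stones, each player takes 1 to 3, and whoever takes the last stone wins. The function names are hypothetical. In principle the same kind of search applies to chess, but the tree is far too large to enumerate in practice.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def can_win(stones):
    """True if the player to move can force a win with `stones` left."""
    # A position is winning if some legal move leaves the opponent
    # in a losing position. With 0 stones there are no moves, so the
    # player to move has already lost.
    return any(
        not can_win(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if none exists."""
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return None


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, can_win(n), best_move(n))
```

No data and no training are involved: play is perfect purely because the whole game tree is searched.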

It's important to keep making the distinction.


chief167 t1_j3borij wrote

First thought: decide for yourself who your target audience is

If you hope to sell this to companies, or even start-ups, be prepared for a lot of questions around data governance, security, etc.

Second: do you have an idea of how many users you need to break even, and how the infrastructure needs to scale to cope with that? GPUs aren't cheap, of course, and neither are electricity or cloud providers.
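A back-of-the-envelope sketch of that break-even question. The function name and every number in the usage line are hypothetical placeholders, not figures from this thread. It assumes a flat monthly price per user, a fixed monthly infrastructure bill, and a per-user variable cost.

```python
import math


def break_even_users(fixed_monthly_cost, price_per_user, variable_cost_per_user):
    """Smallest number of paying users whose margin covers the fixed cost."""
    margin = price_per_user - variable_cost_per_user
    if margin <= 0:
        # Every extra user loses money, so no break-even point exists.
        raise ValueError("price per user must exceed variable cost per user")
    return math.ceil(fixed_monthly_cost / margin)


# Hypothetical numbers: 3000/month for GPUs and hosting, 20/month price,
# 5/month of compute and electricity per user.
print(break_even_users(3000, 20, 5))
```

The real exercise is harder because the fixed cost itself steps up as the user count grows (more GPUs, bigger cloud bills), which is why the scaling question comes with it.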


chief167 t1_j027p7v wrote

Don't waste your time. Check DataRobot (and H2O is the closest competition).

Everybody else plainly sucks at AutoML, sorry to put it so bluntly but it's true.

I am a happy customer of theirs, and it took a mountain of effort to convince our IT teams to move away from Microsoft, Databricks, etc., but the results were just in another ballpark, so we had a strong business case.


chief167 t1_j027csj wrote

Honestly, if you want decent AutoML results, you should only consider DataRobot. Everything else is noticeably worse.

We are a customer of theirs and it's a game changer. Yes, it's super expensive and not aimed at hobbyists. But it's good.

If I find the time, I'll upload this dataset into our system and check the results. Remind me later if I forget.


chief167 t1_isuculx wrote

Ok yeah well that's stupid. Because I am actually in favour of column names instead of indexes. Indexes are a pain in the ass when your incoming dataframe changes: they create an implicit dependency.
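A small sketch of that implicit dependency. It uses the standard-library `csv` module as a stand-in for a dataframe, and the feeds and column names are made up for illustration. When an upstream change adds a column at the front, positional access silently returns the wrong field, while access by name keeps working.

```python
import csv
import io

# Hypothetical incoming data before and after an upstream schema change.
OLD_FEED = "amount,country\n120.5,BE\n"
NEW_FEED = "customer_id,amount,country\n42,120.5,BE\n"


def amount_by_position(feed):
    # Depends implicitly on "amount" being the first column.
    rows = list(csv.reader(io.StringIO(feed)))
    return rows[1][0]  # rows[0] is the header


def amount_by_name(feed):
    # Depends only on the column being called "amount".
    rows = list(csv.DictReader(io.StringIO(feed)))
    return rows[0]["amount"]


print(amount_by_position(OLD_FEED), amount_by_position(NEW_FEED))
print(amount_by_name(OLD_FEED), amount_by_name(NEW_FEED))
```

The positional version does not raise an error on the new feed; it just returns the customer id as if it were the amount.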

But your last line is my point. You shouldn't be concerned about MLOps stuff, but if your model is already in the right framework, it saves soooo much time.


chief167 t1_istrptp wrote

As someone who sometimes has to hire people, perhaps this is the issue:

Imagine how difficult it is for big companies to get an MLOps framework going, with all the red tape and scattered IT systems. It was very painful where I work. In the end we got something working using a Python platform that really needs you to use pandas and sklearn-type interfaces.

Let's hypothetically say you are a great data scientist using R, or SAS, or MATLAB, or ... If I don't have a lot of options, I'd hire you and put you on a training program for our framework. But if I have multiple decent candidates, and some don't require retraining, yeah, I'm gonna pick one of them. I am not spending 2 months trying to get compliance and cybersec to approve your Docker container with R code in it, if I can have a similar model in our pre-approved workflow.


chief167 t1_is9lmou wrote

Would you ask a doctor for a few online resources on your medical treatment? You are completely disrespectful towards the architecture profession if you think a few online resources are all you need to get started, and that you'll just fix it in the future by hiring an architect to clean up your mess.


chief167 t1_is9lh2u wrote

I will give you free advice, for once: don't trust any of the simple online architecture articles, and don't take free advice. It will cost you much more in the end.

Designing an architecture for your company is a multi-week project, with lots of nuances and decisions. There is no best option, and especially without your business context it's literally impossible to recommend something good. From your question, it's clear your architecture expertise is very low.

Do yourself a favour and get an architect, preferably from a dedicated local data-specific consulting company and not a big-box all-rounder like Accenture or TCS. Expect day rates of 1200-1500 if it's a short-term project.

What were you expecting, someone to say 'just use Google cloud', and just go with it?

Don't fool yourself: if you are not ready to build an architecture, don't. You'll have to start over next year. You were hired for the wrong job then.