jobeta

jobeta t1_jbhg8xy wrote

Right. But to assess this you need to define a task and evaluate your model’s performance on that task. Embedding accuracy cannot be discussed completely in the abstract. Even the most general claims you will read about one model beating another refer to the new model performing better on specific tasks over benchmark datasets.
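To make that concrete, here is a toy sketch (everything here is made up, it's just to illustrate the point): "accuracy" only exists once you fix a task. Here the task is 1-nearest-neighbor classification on top of the embeddings, and two embedding functions can then be compared on that task:

```python
import math

# Toy illustration: an embedding's "accuracy" is only defined relative to a
# task. Here the task is 1-nearest-neighbor classification over labeled data.

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nn_accuracy(embed, labeled_train, labeled_test):
    """Fraction of test items whose nearest train neighbor shares their label."""
    train = [(embed(x), y) for x, y in labeled_train]
    correct = 0
    for x, y in labeled_test:
        e = embed(x)
        pred = max(train, key=lambda t: cosine(e, t[0]))[1]
        correct += pred == y
    return correct / len(labeled_test)
```

Swap in two real embedding models for `embed` and the same data, and you get a number you can actually compare; change the task and the ranking of the models may change too.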

It would be a lot easier to help you if you explained what you are trying to accomplish that requires “higher accuracy”.

1

jobeta t1_jbgnfsq wrote

> a model that can generate more accurate embeddings in general

What do you mean by this? Accuracy is an evaluation metric that doesn't really mean anything "in general"; it only has meaning with respect to a specific prediction task. I think this is a slightly ill-posed question.

1

jobeta t1_jb3hh74 wrote

This is cool and I haven’t finished reading it yet, but intuitively, isn’t that roughly equivalent to having a higher learning rate at the beginning? You make the learning algorithm purposefully imprecise early on to explore the loss landscape quickly, and later, once a rough approximation of a minimum has been found, you can explore more carefully to look for a deeper minimum. The dropout introduces noise, doesn’t it?
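For what it's worth, the "dropout is noise" part of my intuition can be written down directly (this is just standard inverted dropout, not necessarily what the paper does): each unit is zeroed with probability p and the survivors are rescaled so the expected activation is unchanged, i.e. multiplicative Bernoulli noise on the activations:

```python
import random

# Standard inverted dropout as multiplicative Bernoulli noise: each unit is
# zeroed with probability p; survivors are scaled by 1/(1-p) so the expected
# value of each activation is unchanged. (A sketch, not the paper's method.)

def dropout(activations, p, rng=random):
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]
```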

6

jobeta t1_ja1jdgc wrote

You don’t need code. You can use a service for that. Check Descript Overdub, for instance, or whatever similar tool you can find. I’m not affiliated with them, but I saw a demo. It will be done overnight, after you spend 20 min reading some text.

1

jobeta t1_ja0m0og wrote

Yes, but just pick two or three and ask? Also check on Amazon Mechanical Turk whether you can find labeling jobs listed, and their rates. I have only needed this once and used Upwork. We paid well and it was a while ago, so I don’t think the price I could give you would be a good reference.

1

jobeta t1_ja08xy1 wrote

I don’t think there is a general answer to that. For labeling, there are multiple services you can use: you could contact them and ask, or check whether they advertise how much they pay people to label, to get a proxy. For the data itself, it completely depends on the data. I would imagine medical data is hard to obtain and requires some legal consideration around privacy (at least I would hope so).

3

jobeta t1_j62eibb wrote

IMHO the buzz is mainly around the UX provided by ChatGPT. Most LLMs are not that easily accessible and most people never get to experience any aha moment with them, so most people don't care. As for Google, I do think there is real but not immediate danger for their business model. The big issue for them is that 60% of their revenue comes from ads in Google search, so rolling out an amazing ChatGPT equivalent could potentially hurt their business. They would have to rethink the entire model. For now, and AFAIK, ChatGPT doesn't provide web links, so it doesn't feel like it is trying to sell you something. If Google is going to use one of their SOTA LLMs to build a conversational AI and make it available for free, surely they have to consider the implications for Alphabet as a whole.

3

jobeta t1_izyks32 wrote

Here is my 2 cents:

They have a process; it is slow, but it does the job. So start by saving them several hours and help automate their current process. This is valuable (saving manual labor) and will already be challenging: you will have to sit with them and understand how they do it. If it takes several hours, it is unlikely that they have a deterministic algorithm for it. To create one, you will likely need them to make a number of decisions, and you can probably help them make them. The outcome should be an algorithm, as simple as possible, that performs the assignment of associates to teams.

Once you have solved this problem for them, you can think about ways to improve the assignment. But this opens a very different can of worms. What does "better" mean? How do I measure that it is better? You will have to define some meaningful metrics (make sure they define them, or at least definitely sign off on them) to be able to compare different assignment algorithms. Because you have so few teams, it will be pretty difficult to design a rigorous experiment that determines whether your new assignment algorithm beats the baseline. You can always come up with some fancy algorithm, but how do you prove it works better? Some associates will say they don't like the new system; some will like it. Who should we believe? Not to mention that you'll want to track teams easily to run some analytics, so chances are you'll have to build tracking for the teams. It might not be worth your time.

I've spent months trying to do things like this. The main challenge is that the Ops team wanted something better but never wanted to invest into defining or measuring what better was.

Alternatively, you can keep adding simple constraints to your model that satisfy their intuition, but that's not exactly machine learning, so I would try not to get stuck in that position.

Good luck!

2

jobeta t1_izy9slt wrote

This is mostly a constrained optimization problem. It could benefit from ML if you need to predict some of the variables you’re optimizing for, I guess? How many teams? How big are they? It’s hard to help you without details.
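To show what I mean by "constrained optimization", here is a hypothetical sketch (the names, the "skill" score, and the greedy strategy are all made up for illustration): place each associate on the team with the lowest total skill so far, subject to a maximum team size. A real solver (ILP, `scipy.optimize`) would let you add harder constraints.

```python
import heapq

def assign_teams(associates, n_teams, max_size):
    """Greedy balanced assignment.

    associates: list of (name, skill) pairs.
    Returns a list of teams, each a list of names, no larger than max_size.
    """
    teams = [[] for _ in range(n_teams)]
    # Min-heap of (total_skill_so_far, team_index): cheapest team pops first.
    heap = [(0.0, i) for i in range(n_teams)]
    heapq.heapify(heap)
    # Place strongest associates first so the greedy balance works well.
    for name, skill in sorted(associates, key=lambda a: -a[1]):
        total, i = heapq.heappop(heap)
        teams[i].append(name)
        if len(teams[i]) < max_size:  # enforce the size constraint:
            heapq.heappush(heap, (total + skill, i))
        # full teams are simply never pushed back onto the heap
    return teams
```

The point is less the specific heuristic than the shape of the problem: an objective (balance), hard constraints (team size), and a deterministic procedure you can discuss with the Ops team.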

2

jobeta t1_iwlin8p wrote

I’m not sure what you mean by “wrap that up.” What programming language are you using? Python? You can create a pandas DataFrame that contains one row per subject and timestamp, and as many columns as measurements. Then, depending on the problem, you would transform that and engineer features that work with the type of model you’re thinking of using.
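Here is a minimal sketch of the layout I mean (the column names and values are made up for illustration): one row per (subject, timestamp) pair, one column per measurement, plus an example of turning that into per-subject features.

```python
import pandas as pd

# Long format: one row per (subject, timestamp), one column per measurement.
records = [
    {"subject": "s1", "timestamp": "2023-01-01", "heart_rate": 72, "temp": 36.6},
    {"subject": "s1", "timestamp": "2023-01-02", "heart_rate": 75, "temp": 36.8},
    {"subject": "s2", "timestamp": "2023-01-01", "heart_rate": 68, "temp": 36.5},
]
df = pd.DataFrame(records)

# Example feature engineering: per-subject aggregates you could feed a model.
features = df.groupby("subject").agg(
    mean_hr=("heart_rate", "mean"),
    max_temp=("temp", "max"),
)
```

From there, what you engineer (rolling windows, lags, aggregates) depends entirely on the model you pick.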

1