Tgs91

Tgs91 t1_jdvhofw wrote

Never heard of the show until it started getting promoted on Hulu to advertise the new season. Watched the old seasons and it was great. I'm not gonna subscribe to Starz for it though; I already have way too many streaming services. I'll watch it when they put it on Hulu.

4

Tgs91 t1_j954pam wrote

If you work in a job where you're frequently asked to deploy your code in different cloud environments (AWS, Azure, Google, local machines, etc.), then it's good to dev/test code locally and have a mix of Windows and Mac on your team. If your tests pass on both Mac and Windows, then they'll probably also pass on just about any Linux-based environment in a cloud service. Dev local, train on cloud with minimal debugging, because you pay by the hour.
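A minimal sketch of the kind of OS-agnostic test this implies (pytest and pandas assumed; the function itself is hypothetical):

```python
# test_preprocess.py -- hypothetical test that avoids OS-specific paths,
# so it behaves the same on Windows, Mac, and Linux runners.
from pathlib import Path

import pandas as pd


def save_clean_csv(df: pd.DataFrame, out_dir: Path) -> Path:
    """Toy stand-in for a real preprocessing step."""
    out_path = Path(out_dir) / "clean.csv"   # pathlib handles / vs \ for you
    df.dropna().to_csv(out_path, index=False)
    return out_path


def test_save_clean_csv(tmp_path):
    df = pd.DataFrame({"x": [1.0, None, 3.0]})
    out = save_clean_csv(df, tmp_path)        # tmp_path works on every OS
    assert out.exists()
    assert len(pd.read_csv(out)) == 2
```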

3

Tgs91 t1_j5bemlb wrote

Piggybacking on this. There are a lot of data science jobs that hire out of undergrad. A lot of the work is actually more data analysis and simple business analytics, but you can get started with a broad data science role, learn on the job, and specialize in ML.

4

Tgs91 t1_j2na3mm wrote

As a student, you should take the time to work on code cleanup. Usually I see students use one big training script that has a lot going on. For my projects I typically build out a pip-installable module with submodules for preprocessing/structuring raw data, model building with lots of kwargs so it can be customized, dataset objects with transformations and randomness for efficient batch loading, and so on. My actual training scripts are only a few lines of code: hyperparams in all caps at the top, import functions from my module, and call the functions. And my modules are written in a way that employees of various skill levels can contribute to the project. Another colleague and I do all of the more advanced AI work, but any member of the team can be a USER of the module, and we have more general data scientists who can contribute to preprocessing code, containerization, post-processing tools, etc.
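As a rough sketch of what those few-line training scripts end up looking like (the module and function names here are made up for illustration, not a real package):

```python
# train.py -- thin entry point; the real logic lives in the (hypothetical) my_project package
from my_project.data import build_datasets
from my_project.models import build_model
from my_project.training import fit

# hyperparameters in all caps at the top, easy to spot and edit
DATA_DIR = "data/raw"
BATCH_SIZE = 64
LEARNING_RATE = 1e-3
EPOCHS = 20

train_ds, val_ds = build_datasets(DATA_DIR, batch_size=BATCH_SIZE)
model = build_model(hidden_dim=256, dropout=0.1)
fit(model, train_ds, val_ds, lr=LEARNING_RATE, epochs=EPOCHS)
```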

Even if you don't do a full module, make a utils.py file to pull out any long pieces of code and write them as importable functions. Use docstrings for every function, following Google's docstring style guide (or use the autoDocstring extension in VSCode, it's great). Use a linter like flake8 and a formatter like black to make sure your code looks clean and professional. This all seems like minor, tedious stuff, but if you have to go back and edit/maintain code you wrote a year ago, it's a lifesaver. And it also means that in an industry environment, another coworker can step in and easily understand and edit your code. It might not make a functional difference to you right now, but good, clean, professional code is great on a resume.
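For example, a utils.py function with a Google-style docstring might look like this (the function itself is just a made-up example):

```python
# utils.py
import pandas as pd


def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove columns that contain only a single unique value.

    Args:
        df: Input dataframe.

    Returns:
        A copy of ``df`` with zero-variance columns removed.
    """
    keep = [col for col in df.columns if df[col].nunique(dropna=False) > 1]
    return df[keep].copy()
```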

9

Tgs91 t1_j2n5aih wrote

Are you in academia or industry? In industry, I do other work while I wait for training to complete: code cleanup, refactoring and simplifying my modules so they'll be easier to maintain, building out modules for post-processing / integrating the model for the end use case. If all of that is already completed, I start working on another project in my team's backlog. There's always other work to do, no reason to sit around waiting for a model to train.

15

Tgs91 t1_iy491cv wrote

An important piece of this question is whether you want lossy or lossless dimensionality reduction. With something like PCA you can map back to the original dimensions. With a deep learning based method, there is some degree of information loss (which could be a good thing, since it's supervised information loss / choosing important information to retain). If you want to be able to reconstruct the original inputs, you'd also need to build a generator model that is trained to reconstruct the original. You can get very high quality reconstructions, but it won't be an exact match of the original.
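A minimal sketch of the PCA round trip (scikit-learn assumed; the reconstruction is only exact if you keep all of the components):

```python
# Sketch: reduce with PCA, then map back to the original dimension.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(1000, 50)            # stand-in for your real features

pca = PCA(n_components=10)
Z = pca.fit_transform(X)                  # 50-d -> 10-d
X_hat = pca.inverse_transform(Z)          # 10-d -> back to 50-d

# how much of the variance the 10 components retain, and what the round trip costs
print("explained variance:", pca.explained_variance_ratio_.sum())
print("reconstruction MSE:", np.mean((X - X_hat) ** 2))
```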

3

Tgs91 t1_ixmylf3 wrote

What are the probability outputs during training? What about cross-entropy? Is there a difference between the training cross-entropy and the testing cross-entropy (assuming you are following the same process for testing as you are for inference)? Cross-entropy is what actually gets optimized in the training process, and that's what drives probabilities towards 0 and 1. You could perform poorly on cross-entropy and still get a decent accuracy. If your testing cross-entropy matches your training and val cross-entropy, then the probability outputs are correct and your model just isn't very confident in its answers. What kind of regularization are you using? Are you using label smoothing? High regularization or label smoothing could drive your predictions away from 0 and 1.

Is there a class imbalance in your data? Is 93% a significant improvement over a more basic method like KNN? It could be that your model didn't really learn much but gets good accuracy because class imbalance makes a high accuracy trivial to achieve.
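A quick way to sanity-check both points (scikit-learn assumed, with a synthetic imbalanced dataset standing in for your real data): compare accuracy and cross-entropy against a majority-class baseline and a simple KNN.

```python
# Sketch: is a high accuracy actually better than trivial baselines on imbalanced data?
from sklearn.datasets import make_classification      # stand-in for your real data
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, log_loss

X, y = make_classification(n_samples=5000, weights=[0.93], random_state=0)  # ~93% one class
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [
    ("majority-class baseline", DummyClassifier(strategy="prior")),
    ("KNN", KNeighborsClassifier(n_neighbors=5)),
]:
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    ce = log_loss(y_test, clf.predict_proba(X_test))
    print(f"{name}: accuracy={acc:.3f}, cross-entropy={ce:.3f}")
```

If your model's numbers aren't clearly better than these, the 93% is probably the imbalance talking.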

1

Tgs91 t1_ix8vf0r wrote

After the first skim I like to go to YouTube and try to find a Paper Explained video for the paper if one is available. If it's a niche paper with no videos, look for the most important cited paper and go for that instead. Quality may vary, but it'll usually at least cover the important points and might catch something you missed on your first skim. I also like to do this with some work colleagues: we'll each take a paper, then explain them to each other and discuss.

3

Tgs91 t1_iryz3ac wrote

I'll give you a real answer:

Most startup companies overhype their products and just flat out aren't very valuable to my workflow. Usually the problems they "solve" are the easiest parts of my job that I can do in 10 minutes with an open source python package. Or they build some software around a solution that is 2-3 years out of date and only applies to cookie-cutter problems. And when questioned about problems beyond their basic, simplistic use cases, they often aren't even aware of the flaws in their methods. I don't trust the tools until after I read their backend code to double-check their claims, and at that point I might as well have implemented it myself.

3

Tgs91 t1_iryxsqa wrote

A neural network is capable of learning non-linear relationships from a 1d input to a 1d output. The problem is that your data doesn't have any relationship between those variables. You need to find some input variables that are actually related to the output. A neural net can't approximate a relationship that doesn't exist.

1

Tgs91 t1_irxr8oa wrote

Depends on the subject matter. I was playing with summarizers recently and they are VERY sensitive to writing styles. For example, models trained on news articles will perform poorly on fiction and be nearly unreadable for technical documents.

Idk what models are out there for Turkish, but finding a Turkish model that is also trained on the correct writing style will be difficult. If you only need to summarize a paragraph, that opens up possibilities for you. Regular transformers (like the models based on BERT or BART) can only handle a max of roughly 512 to 1024 tokens (words or pieces of words). The length you can feed them is limited, but these models have been around for years, so there are a lot of options. If you want to use newer models that can handle longer inputs, you'll want to use a Longformer.
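A quick way to check whether your paragraphs even fit that budget (Hugging Face transformers assumed; the multilingual checkpoint is just one possible choice):

```python
# Sketch: count tokens before choosing between a regular transformer and a Longformer.
from transformers import AutoTokenizer

# any BERT-style checkpoint gives a rough count; this multilingual one is just an example
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

text = "..."  # your Turkish paragraph here
n_tokens = len(tokenizer.encode(text))
print(n_tokens, "tokens; fits a 512-token model:", n_tokens <= 512)
```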

But no matter what you use, factual accuracy is a major problem in summarization models. They sort of compress an input into a representation, then use that representation to generate an output, a bit like giving someone a writing prompt and having them write a story. There's really no way to confirm accuracy. Extractive summarization isn't as state of the art, but it's more reliable if accuracy is important to your use case. This approach looks for pieces of text within the input that act as a good summary, then uses them as the summary. So if you feed it 2 pages of text, it can find 4-5 sentences that do a good job of summarizing the 2 pages. This also might be a more robust approach if you can't find a Turkish model that's a good fit for your use case. Huggingface does not have good support for extractive summarization, but I've been using the bert-extractive-summarizer package, which is built on top of Huggingface and can import Huggingface models.
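A rough sketch of that package in use (the Turkish checkpoint named here is just an example of a Hub model you could plug in, not a recommendation):

```python
# Sketch: extractive summarization with bert-extractive-summarizer + a Hugging Face model.
from summarizer import Summarizer
from transformers import AutoModel, AutoTokenizer

model_name = "dbmdz/bert-base-turkish-cased"   # example checkpoint; swap in whatever fits
custom_model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
custom_tokenizer = AutoTokenizer.from_pretrained(model_name)

summarizer = Summarizer(custom_model=custom_model, custom_tokenizer=custom_tokenizer)

text = "..."  # the ~2 pages of text to summarize
summary = summarizer(text, num_sentences=4)    # picks 4 sentences straight from the input
print(summary)
```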

8

Tgs91 t1_ir0n6hb wrote

You are missing the activation function, which is part of the neuron. Activation functions are sometimes represented as separate layers, but that's just a way to express nested functions. So it isn't:

F(X) = WX + b

It is:

F(X) = A(WX + b), where A is a nonlinear function.

You could make A a polynomial function and it would be equivalent to your suggestion. However, polynomials have poor convergence properties and are expensive to compute. Early neural nets used sigmoid activations for non-linearity; now various versions of ReLU are most popular. It turns out that basically any non-linear function gives the model enough freedom to approximate any non-linear relationship, because so many neurons are then recombined. In the case of ReLU, it's like using the Epcot Ball to approximate a sphere.
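A tiny numpy sketch of that nesting, with ReLU as A (random, untrained weights, just to show the shape of the computation):

```python
# Sketch: F(x) = W2 @ relu(W1 @ x + b1) + b2
# Many piecewise-linear pieces get recombined into a non-linear map.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(32, 1)), rng.normal(size=(32, 1))
W2, b2 = rng.normal(size=(1, 32)), rng.normal(size=(1, 1))

def relu(z):
    return np.maximum(z, 0.0)             # the nonlinearity A

def forward(x):                            # x has shape (1, n_points)
    return W2 @ relu(W1 @ x + b1) + b2     # nested functions, not a single affine map

x = np.linspace(-3, 3, 7).reshape(1, -1)
print(forward(x))  # already non-linear (piecewise-linear) in x, even before training
```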

1