YamEnvironmental4720

YamEnvironmental4720 t1_j2qww2y wrote

I am not surprised by the limited amount of mentoring for your master's thesis. Doctoral students are expected to be very independent in Germany compared to many other countries, at least in the more traditional academic disciplines; maybe it is a little different in ML. In any case, there seems to be the prospect of some brainstorming with the postdoc, and there will be other PhD students, even if they don't have the same advisor as you. So chances are that you would enjoy your PhD years more than you have enjoyed working on your master's thesis.

Since you mentioned that you tend to drift towards more practical aspects of CS when you get stuck, you should also consider how much you enjoy programming. Can you easily sit for hours with your own hobby programming projects, hardly noticing how time passes? If so, you might be happier implementing ML techniques in industry than analyzing their theoretical aspects.

1

YamEnvironmental4720 t1_j2qr952 wrote

Most people have commented on how a person of your age would be viewed in academia and in industry. I'd like to take a different perspective: what kind of thesis would you produce?

This is of course impossible to say for sure, but the following questions may be of relevance:

  1. How has your master thesis been going, and how enthusiastic have you been working on it?
  2. Have you already been suggested a topic for your thesis, and to what extent have you been able to influence this choice of topic?
  3. Do you struggle with procrastination?
  4. When you procrastinate, do you find yourself doing things that could perhaps be developed further into useful skills for the job market?
2

YamEnvironmental4720 OP t1_iymfub5 wrote

Do you mean that uncertainty amounts to the class probabilities being almost equal? When computing entropy we usually split the space into two halfspaces separated by a coordinate hyperplane, but any hypersurface, such as the zero level set of a function f, also gives such a splitting. A classifier function f whose zero level set yields a splitting of the full space that does not significantly reduce the entropy would probably be a bad classifier by other metrics as well.
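To make this concrete, here is a minimal sketch (the names are my own) of the entropy reduction achieved by the splitting induced by the sign of a classifier function f; the usual coordinate split is the special case f(x) = x[j] - t:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def entropy_reduction(X, y, f):
    """Entropy drop when the samples are split by the sign of f;
    f is any real-valued function, and its zero level set plays
    the role of the splitting hypersurface."""
    mask = np.array([f(x) > 0 for x in X])
    n = len(y)
    after = (mask.sum() / n) * entropy(y[mask]) + \
            ((~mask).sum() / n) * entropy(y[~mask])
    return entropy(y) - after
```

A split whose entropy_reduction is close to zero leaves the class mix on both sides nearly as uncertain as before, which is exactly the situation described above.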

1

YamEnvironmental4720 t1_iwge3d2 wrote

A couple of years ago, I was interested in the classification problem for stock price movements. The goal was to predict whether a stock yields positive returns over the next 25-30 days, using daily data of the type provided by Yahoo Finance. I did some feature engineering to derive classical indicators, their moving averages over different time periods, and certain normalizations of them so that all features range between 0 and 1.

I experimented with various thresholds x and discovered that I get better predictive power by labelling a vector 1 if the stock return is at least x%, for some x close to 1, than by simply choosing x = 0, which amounts to looking only at the direction of the price movement. One drawback, however, was that there was no clear correlation between the profits and the accuracy of the model: a false positive with a return of, say, x/2% obviously hurt the accuracy while at the same time contributing positively to the profit. Moreover, defining a recommendation not as a predicted probability of at least 0.5, but as something between 0.6 and 0.7 (depending on, for instance, the stock index), significantly reduced the number of false positives with negative price movements.
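As a rough sketch of the two thresholds involved (the exact values of x and of the recommendation cutoff were tuned per stock; all names here are made up):

```python
import numpy as np

X_PCT = 1.0        # label threshold: a return of at least x% counts as positive
REC_CUTOFF = 0.65  # recommendation cutoff, instead of the default 0.5

def make_labels(returns_pct):
    """Label a sample 1 if its 25-30 day return is at least x%."""
    return (np.asarray(returns_pct) >= X_PCT).astype(int)

def recommend(predicted_proba):
    """Recommend only when the model is clearly confident; this is
    what cut down the false positives with negative returns."""
    return np.asarray(predicted_proba) >= REC_CUTOFF
```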

I would still be interested in the question of finding suitable metrics, other than accuracy, for measuring the performance of such a classification algorithm.

1

YamEnvironmental4720 t1_iu3frfr wrote

Ok, in that case there is the cost function, defined on the model's parameters, which measures the average distance from the sample points to your hypothesis. This is the average error the model makes for a fixed choice of parameters. In the case of linear regression, the importance of a certain variable is given by the weight parameter attached to that variable.

If you are familiar with multidimensional calculus, the dependence on any one such parameter is given by the partial derivative of the cost function with respect to that parameter.

This is quite well explained in Andrew Ng's video lecture on linear regression: https://www.youtube.com/watch?v=pkJjoro-b5c&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=19.
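As a minimal sketch of both objects for linear regression (following the usual conventions of that course, including the 1/2 factor that cancels in the derivative):

```python
import numpy as np

def cost(theta, X, y):
    """Average squared distance from the samples to the hypothesis
    h(x) = theta . x, i.e. J(theta) = 1/(2m) * sum((X @ theta - y)**2)."""
    m = len(y)
    r = X @ theta - y
    return r @ r / (2 * m)

def gradient(theta, X, y):
    """Vector of partial derivatives of the cost, one per parameter;
    the j-th entry measures how the average error reacts to theta_j."""
    m = len(y)
    return X.T @ (X @ theta - y) / m
```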

1

YamEnvironmental4720 t1_iu3a34c wrote

I would recommend Andrew Ng's courses on Coursera. He is highly respected both as a researcher and as a teacher of ML. The courses start from the basics with linear regression and work their way up to neural nets and deep learning. With your education, you'll have no problems with the mathematics: matrix theory, multi-dimensional calculus (in particular gradient flow) and some probability theory. He explains the intuition behind many of these topics very well, though it is still easier if you are already familiar with them.

As for programming languages, the assignments for his first course on ML were in Octave, if I remember correctly, but he later switched to Python, which is by now probably the number one language for these purposes due to the multitude of libraries for ML. Since you have a diploma in CS, I assume that you are already fluent in some programming languages, and it would be a good exercise to build your own ML model, e.g. a neural net or a random forest, from scratch in your language of choice in order to develop a deeper understanding; something along the lines of the sketch below, for instance.
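A from-scratch exercise of that kind can be as small as this: a one-hidden-layer net trained on XOR with plain gradient descent (the layer sizes, seed and learning rate are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: the classic toy problem that no linear model can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden -> output
lr = 0.5

for _ in range(10000):
    # Forward pass.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Backward pass: chain rule applied to the squared error.
    d_out = (output - y) * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    # Gradient descent step.
    W2 -= lr * hidden.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)

print(output.round(2))  # should end up close to [[0], [1], [1], [0]]
```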

2

YamEnvironmental4720 t1_itpvllg wrote

You may want to take a look at the random forest algorithm, for instance in one of Nando de Freitas's introductory lectures on the topic on YouTube. The key word is entropy, and the idea is to study how it changes when you split the sample points into those with a given variable below and those above some threshold value. You do this for every variable, and for each variable you also test different threshold values, as in the sketch below.
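A bare-bones version of that threshold search might look like this (the names are my own; real implementations add stopping criteria and, for forests, random feature subsets):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a (binary or categorical) label array."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    """Try every variable and every candidate threshold; keep the
    pair that reduces the weighted entropy the most."""
    n = len(y)
    best_gain, best_var, best_thr = 0.0, None, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            below, above = y[X[:, j] <= t], y[X[:, j] > t]
            gain = entropy(y) - (below.size / n) * entropy(below) \
                              - (above.size / n) * entropy(above)
            if gain > best_gain:
                best_gain, best_var, best_thr = gain, j, t
    return best_var, best_thr, best_gain
```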

1