Comments

suflaj t1_j8qwt5d wrote

People usually create datasets when they work on something new. I don't know why you would think that just because a dataset exists you can't outperform existing work on it, or that you even need to.

0

suflaj t1_j8qxasd wrote

That's more an issue with how you're searching. You mention sentiment analysis, for example, but it's a problem that has been considered solved for years. There is no novelty you could add here besides a bigger model.

Obviously you need to stop looking at what people have done, and start looking at what, in their process, they didn't do or did poorly. One such thing is tokenization of text. You can't tell me that it's all figured out.

5

Kapri111 t1_j8qxq26 wrote

I've worked on some of those topics, but from a human-computer interaction perspective. As in, how sentiment analysis distorts information perception and such.

1

Mikarz t1_j8r11wh wrote

If you’re going to need a dataset that’s NLP related, go to https://aclanthology.org (THE database for NLP research) and search “Reddit dataset” with some keywords that you’re interested in. Read the papers. There are loads of annotated Reddit datasets out there. Good luck with your thesis.

1

2blazen t1_j8r3le4 wrote

You'd want to find a more in-depth topic for a master's thesis. Reddit scraping and sentiment analysis sounds more like an assignment. Ask your supervisor if they have a topic they're researching, and if you can join. Look around to see if your university has example projects or, even better, open projects. Look through past years' theses to see if you can continue working on any of them (hint: the future work section). Once you find a topic that interests you and is niche enough, it's still too broad, so you have to narrow it down to research questions. For that you have to start researching the challenges of the topic in depth.

Don't panic, there are many topics that need research. I'm starting my thesis in audio processing (health AI / speaker embeddings / impaired speech / diagnosis assistance) and it's the wild west over here, though partly because the data is not publicly accessible.

0

redflexer t1_j8ryw02 wrote

Actually, I find this notion harmful. I consider senior PhD students to be able to assess whether an idea in their field is novel, feasible, and in the right scope given fixed resources. I would never expect that from master's students. That of course does not mean that students can’t have great ideas, but it’s not mandatory for a degree.

1