Submitted by Ch1nada t3_10oauj5 in MachineLearning

I’ll try to go into some detail about the implementation and the difficulties in case it’s useful for anyone else trying to do something similar with an applied ML project; there’s a TLDR at the end if you’d rather have the short version/result.

At the end of last year I convinced myself to start 2023 with a side-project that I'd actually finish and deploy, and perhaps earn some “passive” income (spoiler: not so passive after all :P). After some brainstorming I settled on making an automated YouTube channel about finance news, since I had just gotten into investing. Shorts seemed more manageable, and their monetization is changing in February, so I went with that.

My rough initial idea was to get online articles, summarize them, make a basic compilation with some combination of MoviePy, OpenCV and stock photos, and done. I was pretty worried about the summarization, since in my ML day job I mainly work with vision or sensor data in manufacturing, not NLP. Also, I quickly realized MoviePy with still images and some overlaid text was not very attractive for viewers (starting with myself).

Fast-forward a few days, and after some research online I came across two things: Hugging Face Transformers (yep, I know, I’ve been living under a rock :P) and After Effects scripting. From there, it became mainly about figuring out exactly which ML models I needed to fine-tune for finance / social media and for what, then putting it all together.

The entire workflow looks something like this: the bot fetches the day's online news about a topic (stocks or crypto), then sentiment analysis is performed on each title and the full text is summarized into a single sentence. I fine-tuned SBERT on ~1.5M posts from r/worldnews (publicly available in Google Cloud BigQuery) so that it could predict a “social engagement” score used to rank and filter the news that makes it into the video.
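To make that concrete, here's a minimal sketch of what the scoring and summarization step could look like, assuming Hugging Face pipelines plus an SBERT-style encoder feeding a small regression head. The model names, the article format and the saved `engagement_head.joblib` regressor are illustrative assumptions, not necessarily my exact setup.

```python
# Sketch of the daily scoring/summarization step (illustrative models/paths).
import joblib
from transformers import pipeline
from sentence_transformers import SentenceTransformer

sentiment = pipeline("sentiment-analysis",
                     model="ProsusAI/finbert")            # finance-tuned sentiment
summarizer = pipeline("summarization",
                      model="facebook/bart-large-cnn")    # short abstractive summaries
embedder = SentenceTransformer("all-MiniLM-L6-v2")        # SBERT-style encoder
engagement_head = joblib.load("engagement_head.joblib")   # regressor trained on r/worldnews scores

def process_articles(articles, top_k=5):
    """Score, rank and summarize the day's articles (each a dict with 'title' and 'text')."""
    titles = [a["title"] for a in articles]
    scores = engagement_head.predict(embedder.encode(titles))
    ranked = sorted(zip(scores, articles), key=lambda x: -x[0])[:top_k]

    items = []
    for score, article in ranked:
        items.append({
            "title": article["title"],
            "sentiment": sentiment(article["title"])[0]["label"],
            "summary": summarizer(article["text"],
                                  max_length=60, min_length=20,
                                  do_sample=False)[0]["summary_text"],
            "engagement": float(score),
        })
    return items
```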

Finally, all of this is combined into a single JSON object written into a .js file that another “content creator” script uses to render the video from a template with aerender in Python. The content of this template is generated dynamically from the .js file via AE Expressions. This module also uses the TTS lib to generate voice-overs for the text, and is responsible for generating the title (using NLTK to identify the main subjects of each headline) and the video’s description. Pexels stock videos are used for the background.
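A rough sketch of that hand-off is below: dump the ranked items into a .js variable that AE Expressions can read, generate a voice-over per summary, then render the template headlessly with aerender via subprocess. The paths, the comp/template names, the `newsData` variable name and the use of gTTS as a stand-in TTS library are all assumptions for illustration.

```python
# Sketch of the "content creator" step: data file -> voice-overs -> aerender.
import json
import subprocess
from pathlib import Path
from gtts import gTTS

def build_video(items, js_path="data/daily_news.js",
                template="templates/shorts_template.aep",
                comp="ShortsComp", out="renders/short.mov"):
    # 1) Expose the data to AE Expressions as a JS variable.
    with open(js_path, "w", encoding="utf-8") as f:
        f.write("var newsData = " + json.dumps(items, indent=2) + ";")

    # 2) Generate a voice-over clip per summary (gTTS here is just a stand-in
    #    for whatever TTS library is actually used).
    Path("audio").mkdir(exist_ok=True)
    for i, item in enumerate(items):
        gTTS(item["summary"], lang="en").save(f"audio/vo_{i}.mp3")

    # 3) Render the After Effects template headlessly.
    subprocess.run([
        "aerender",
        "-project", template,
        "-comp", comp,
        "-output", out,
    ], check=True)
```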

In principle the upload to YouTube could also be automated, but at this stage I’m handling it manually, as the JSON generation is not as robust as I’d like, so the output file often needs to be tweaked and fixed before the video can be finalized and uploaded. An example is the summary being too short or vague when taken out of the context of the original article. If you increase the summarizer's max_length to compensate, the text can easily become too long for the overlay to fit its pre-defined dimensions, or the total audio can exceed the maximum duration of a YouTube Short.
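The kind of sanity check that currently forces manual review could look something like the sketch below: flag a payload whose overlay text or total voice-over wouldn't fit a Short. The character limit and the use of mutagen to measure audio length are assumptions; the 60-second cap is the Shorts duration limit.

```python
# Sketch of pre-render checks on the generated payload (limits are illustrative).
from mutagen.mp3 import MP3

MAX_OVERLAY_CHARS = 180      # assumed size of the template's text box
MAX_SHORT_SECONDS = 60       # duration cap for a YouTube Short

def needs_manual_fix(items, audio_files):
    problems = []
    for item in items:
        if len(item["summary"]) > MAX_OVERLAY_CHARS:
            problems.append(f"summary too long for overlay: {item['title']}")
    total_audio = sum(MP3(path).info.length for path in audio_files)
    if total_audio > MAX_SHORT_SECONDS:
        problems.append(f"voice-over runs {total_audio:.1f}s, over the Short limit")
    return problems
```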

With some more work I’m confident the whole process can be automated further. For those interested, feel free to check the result here:

Byte Size Bot channel

If you have any questions or suggestions I’d be happy to hear them.

TLDR: Coded an automated (not 100% yet, but I'll get there) YouTube Shorts channel about finance news to create a passive income stream. It ended up being way harder, more fun, and far less “passive” than I initially expected.

63

Comments


nTro314 t1_j6e5cbh wrote

I am impressed

14

Ch1nada OP t1_j6e5ns3 wrote

Thank you! I'm just happy I got to deploy a side-project, the pile of "projects that sound great but lost steam half-way through" was getting too big haha

7

iamsunnycoast t1_j6ez3ci wrote

YouTube detects TTS; it's only a matter of time before you get demonetized. Cool project though.

11

_poisonedrationality t1_j6gz4it wrote

Really? I see a lot of text-to-speech channels doing stuff like reading Reddit comments. Are you sure YouTube demonetizes these? I don't think people would bother without monetization.

3

Ch1nada OP t1_j6hk9ae wrote

I think there's no clear response from their side, but from the guidelines you can infer they'd only cut it if it adds no value (in this case the summary and analysis, for instance) and is mass-produced, for example just chopping up clips from a TV show with random TTS.

3

Ch1nada OP t1_j6f451e wrote

Ah, good thing it's not monetized yet :P All jokes aside, I think their policy is against repetitive, fully programmatically generated content. Since I'm still manually curating the contents of each video due to the current limitations, that might actually be a good thing. But thanks for pointing that out, I'll try to clarify it and adjust as needed.

2

MrBarryThor12 t1_j6dwr2b wrote

That’s very impressive, I think you could end up making some money if you continue to tweak it

3

Ch1nada OP t1_j6dxgam wrote

Thank you very much! I got it to a point where it "works", but there's still a lot of work to be done. Off the top of my head: the TTS pace is a bit off and the pronunciation is often clunky, and the background videos are chosen from a random pool when they could be matched to the article. But the biggest challenge so far has really been automating it end-to-end in a reliable way.

3

zachguo t1_j6ibmvd wrote

Why do you need sentiment analysis? To categorize the news into "bullish" and "bearish"?

3

Ch1nada OP t1_j6idoze wrote

Yep, precisely. It's a sentiment model fine-tuned on finance news, and its output is then mapped to bullish, neutral or bearish for the analysis overlay.
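For illustration, the mapping could be as simple as this (assuming a FinBERT-style model whose labels are positive/neutral/negative):

```python
# Minimal sketch of the sentiment -> overlay label mapping (labels assumed).
LABEL_TO_OVERLAY = {"positive": "Bullish", "neutral": "Neutral", "negative": "Bearish"}

def overlay_label(sentiment_result):
    return LABEL_TO_OVERLAY.get(sentiment_result["label"].lower(), "Neutral")
```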

2

doctorjuice t1_j6h0hpn wrote

Very cool! How did you get Pexel to give you relevant videos? I’ve tried to use Pexel for video editing before, but it usually gave videos that didn’t really make sense for the content.

2

Ch1nada OP t1_j6h1gdw wrote

Thank you! Tbh I really over-engineered it at first: I trained a model to classify articles into sub-categories, then built a query around that to fetch contextualized videos from Pexels, but it was really clunky (e.g., "stock market" could return something like someone buying fruit at a farmer's market). Currently I have two pools of videos that I curate manually, and the content creator script just picks randomly from either the stocks pool or the crypto pool.
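Something like this minimal sketch (directory names are just placeholders):

```python
# Sketch of the current approach: pick a random clip from a manually curated pool.
import random
from pathlib import Path

POOLS = {"stocks": Path("assets/pexels/stocks"),
         "crypto": Path("assets/pexels/crypto")}

def pick_background(topic):
    clips = list(POOLS[topic].glob("*.mp4"))
    return random.choice(clips)
```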

2