Most_Engineering_992 t1_j2crux8 wrote

When you click on a link, the browser loads a lot of stuff, including page layout information (HTML & CSS), references to images, and JavaScript code, which is like a program to do things. Normally the code communicates with the host platform to get information from databases and pass along things like passwords and emails, but it can be written to do bad things instead.

Search engines don't do that. The contents of the page are downloaded and scanned for text, links, and images, but no code is run. It's like the difference between looking at directions on a map, and actually following those directions.

21

sailor_sega_saturn t1_j2cwu4c wrote

Some search engines do execute JavaScript. In particular both Bing and Google have "engines" based on Chromium.

For Bing:

> Bing is adopting Microsoft Edge as the Bing engine to run JavaScript and render web pages.

For Google:

> Googlebot now runs the latest Chromium rendering engine

5

Officialsparxx OP t1_j2csqyo wrote

I feel like I messed up by making this a “two parter” question. I know the Wayback Machine isn’t really a search engine or a web crawler, but would it be safe from malware for a similar reason? If not, why?

1

Toke_Ivo t1_j2d0wzf wrote

Code is not self-executing. I know certain articles can make it seem that way, but the reason code looks "self-executing" is really that your program is instructed to "download, read, and follow all instructions at <website>".

If you don't want or need that, you can make a program that just downloads the page without running the code. Or you can limit what code it can run.
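To make "download without running" concrete, here is a minimal Python sketch. It is only an illustration: the class name `LinkAndTextScanner`, the helper `scan_page`, and the sample page are all made up, and a real crawler would fetch the HTML over the network (for example with `urllib.request.urlopen`) instead of using a hard-coded string. The point is that the script in the page is just text to this program; nothing ever executes it.

```python
from html.parser import HTMLParser


class LinkAndTextScanner(HTMLParser):
    """Collects links and visible text; never executes anything."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._in_script = False  # True while we are inside <script>...</script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain strings; we simply skip them.
        if not self._in_script and data.strip():
            self.text.append(data.strip())


def scan_page(html):
    """Hypothetical helper: parse an HTML string and return (links, text)."""
    scanner = LinkAndTextScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.links, scanner.text


# Made-up page containing a "malicious" script.
SAMPLE_PAGE = """<html><body>
<h1>Hello</h1>
<a href="https://example.com/next">next page</a>
<script>document.body.innerHTML = "pwned";</script>
</body></html>"""

links, text = scan_page(SAMPLE_PAGE)
print(links)
print(text)
```

The script only does something if a program chooses to hand it to a JavaScript engine, and this one has no such engine.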

Like, imagine a really stupid chef. You hand him a recipe and he makes the food. One day you hand him the recipe for a Molotov cocktail, and he blows up the kitchen. The issue isn't the recipe - it's the chef.

10

sailor_sega_saturn t1_j2cx838 wrote

A crawler is only vulnerable to the input that it tries to parse or execute. Wayback Machine may archive Windows executables, but if so it's certainly just treating them as binary bytes, and wouldn't even know how to execute them.

So I'd expect Wayback Machine to be immune to downloads of weird executables.
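A rough Python sketch of what "treating them as binary bytes" could look like. This is not how Wayback Machine actually stores things (I don't know its internals); `archive_blob`, the in-memory `store` dict, and the fake payload are all invented for illustration. The archiver hashes and stores the bytes, and at no point asks the operating system to run them.

```python
import hashlib


def archive_blob(store, payload):
    """Hypothetical archiver: file the raw bytes under their SHA-256 digest."""
    digest = hashlib.sha256(payload).hexdigest()
    store[digest] = payload  # stored as opaque data, never interpreted
    return digest


# Made-up stand-in for a downloaded Windows executable ("MZ" is how they start).
fake_exe = b"MZ\x90\x00" + b"\x00" * 60

store = {}
key = archive_blob(store, fake_exe)
print(key, len(store[key]), "bytes archived")
```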

^(The user who downloads and runs the archived file on the other hand...)

3

cafk t1_j2dnu4h wrote

It's like right-clicking a link and selecting "save target" instead of clicking it. That's how mirroring works: they just download the files. You're not rendering the page, just downloading one file.
It's only when you open the downloaded file with a browser that it is rendered and any included JavaScript code is run, which can exploit some weakness in a specific browser.

It's similar to downloading malware: it doesn't do anything until you run it, but you can open the executable with a decompiler and look at what it does without actually running it.
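A much simpler cousin of the decompiler idea, as a Python sketch: read the first few bytes of a file and look at them as data. The function names and the sample bytes are made up for illustration; the only facts relied on are that Windows executables begin with the bytes `MZ` and ELF binaries begin with `\x7fELF`.

```python
def identify_format(data):
    """Guess the file type from its leading "magic" bytes, without running it."""
    if data[:2] == b"MZ":
        return "Windows executable (MZ header)"
    if data[:4] == b"\x7fELF":
        return "ELF executable"
    return "unknown"


def hex_preview(data, length=8):
    """Render the first few bytes as hex, like a tiny hex dump."""
    return " ".join(f"{byte:02x}" for byte in data[:length])


# Made-up sample; in practice you would read these from a file opened with "rb".
sample = b"MZ\x90\x00"
print(identify_format(sample))
print(hex_preview(sample))
```

Reading the bytes this way is no more dangerous than reading any other file, as long as the tool doing the reading has no bugs of its own.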

2