Jaohni
Jaohni t1_j09jjfj wrote
Reply to comment by swisstraeng in Japan to Manufacture 2nm Chips With a Little Help From IBM by Avieshek
PSA: ISA =/= implementation.
In the late 90s and early 2000s it was common to draw a sharp line between CISC and RISC styles of architecture: CISC had a wide variety of purpose-built instructions that helped accomplish specific tasks quickly, while RISC, by avoiding that instruction-set bloat, had fewer transistors sitting around doing nothing (idle transistors do still consume some power, btw). In reality, modern ISAs have a mix of CISC and RISC philosophies built in, and whether a core is ARM or x86 matters less than how that core is implemented.
In practice, if you look at a variety of ARM core implementations, there actually isn't as big an efficiency improvement gen over gen as you'd expect: the Snapdragon 865, 870, 888, and 8 Gen 1 all perform relatively closely in longer sustained tasks (though they benchmark quite differently in tests built around short bursts of work), and they aren't that far out of line with certain x86 chips, such as a 5800X3D (were you to extrapolate its performance from a 5800X power limited to wattages similar to those Snapdragon SoCs), or, say, a Ryzen 6800U power limited to 5W.
That's not to say there isn't ARM IP out there that can help deliver performance at lower power draw, but I'd like to highlight that a lot of the improvement you see in Apple Silicon isn't necessarily down to it being ARM, but down to it being highly custom, and to Apple having varying degrees of control over A) the hardware, B) the drivers / OS / software stack, and C) the actual apps themselves. If you can optimize your CPU architecture for specific APIs, programming languages, use cases, and operating systems, there are a lot of unique levers you can pull as a whole ecosystem, as opposed to, say, a platform-agnostic CPU vendor.
Another thing to note is that while Apple saw a very respectable jump going from Intel to their in-house M1 chips, it's not an entirely fair comparison of x86 and ARM as instruction sets, because the Intel parts were built on a fairly inferior node (14 nanometer, IIRC) while the M1 series was built on a 5nm-family node or better. Taking that into account when comparing the Intel and M1 Macs, you might attribute anywhere from 80 to 120% of the performance-per-watt improvement to the node alone, with whatever is left being a combination of the various ecosystem controls Apple has available.
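To make that attribution concrete, here's a minimal back-of-the-envelope sketch. The perf-per-watt figures (and the resulting 150% total gain) are made-up placeholders, not measurements; only the 80 to 120% range comes from the point above.

```python
# Hypothetical attribution of a perf/W jump to "node" vs "everything else"
# (architecture, ecosystem control, etc.). All numbers are illustrative.

old_perf_per_watt = 1.0   # Intel 14nm Mac, normalized baseline (placeholder)
new_perf_per_watt = 2.5   # M1-class part, normalized (placeholder)

total_gain = new_perf_per_watt / old_perf_per_watt - 1.0   # +150% here

# Credit the node with 80%, 100%, or 120% of the observed improvement,
# per the range suggested above; the remainder goes to everything else.
for node_share in (0.8, 1.0, 1.2):
    node_gain = total_gain * node_share
    remainder = total_gain - node_gain
    print(f"node credited with {node_share:.0%} of the gain: "
          f"node +{node_gain:.0%}, everything else {remainder:+.0%}")
```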
Compared to carefully undervolted Raptor Lake chips, or equally carefully tuned Zen 4 processors, the Apple SoCs, while respectable in what they do (and respectable for many reasons that have little to do with the ARM ISA), aren't alien tech or anything; they're simply well-designed chips.
Jaohni t1_it3jhvp wrote
Reply to comment by nezeta in The End of Moore’s Law: Silicon computer chips are nearing the limit of their processing capacity. But is this necessarily an issue? Copenhagen Institute for Futures Studies by CPHfuturesstudies
Correct me if I'm wrong, but hasn't "nm" as a naming scheme been kind of misleading ever since 157nm lithography fell through and the industry stuck with 193nm immersion?
Like, before then, the nanometer figure roughly tracked a real physical dimension of the transistors (gate length / half-pitch), and a smaller number meant faster switching that also used less energy, and cheaper chips because the same calculation required less silicon.
But as FinFET started coming onto the scene, you could essentially raise the transistor into a third dimension, which changed its performance profile and let you claim an "effective nanometer reduction." So from TSMC 16nm onward the number wasn't really nanometers anymore, but an essentially abstract label that roughly indicates performance relative to previous generations. That's also why Intel 10, for instance, is roughly as dense, in literal transistors per area, as TSMC 6/7, but doesn't necessarily perform the same in all instances.
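To put rough numbers on that density point, here's a minimal sketch using approximate peak logic densities that have circulated publicly for these nodes; treat them as ballpark estimates, not official vendor specs.

```python
# Approximate peak logic densities (million transistors per mm^2, high-density
# cells). Ballpark public estimates, not official vendor figures.
approx_density_mtr_per_mm2 = {
    "Intel 10nm (now Intel 7)": 101,
    "TSMC N7": 91,
    "TSMC N5": 171,
}

for node, density in approx_density_mtr_per_mm2.items():
    print(f"{node:25s} ~{density} MTr/mm^2")

# Despite the "10" vs "7" in the names, Intel 10 and TSMC N7 land in the same
# density ballpark, which is the point: the labels aren't literal nanometers.
```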
IMO Moore's Law as originally described is dead (a doubling of transistor counts roughly every two years, originally stated as every year), but the "layman's Moore's Law", that computation will advance geometrically and we'll keep being able to buy more performance for the same money, is still well alive.
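For a sense of what that classic cadence compounds to, here's a minimal sketch; the starting transistor count is roughly that of an Intel 4004, and the horizons are arbitrary.

```python
# Compounding implied by a "double every N years" cadence.
start_transistors = 2_300        # roughly an Intel 4004 (circa 1971)
doubling_period_years = 2        # the classic revised Moore cadence

for years in (10, 20, 30, 40, 50):
    count = start_transistors * 2 ** (years / doubling_period_years)
    print(f"after {years:2d} years: ~{count:,.0f} transistors")

# After 50 years this lands around 77 billion, which is in the ballpark
# of today's largest chips.
```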
There are plenty of interesting and technically challenging ways to improve performance, such as 3D stacking (IBM, AMD), disaggregation (AMD, Apple), heterogeneous compute (Arm, Intel), and so on, without even going into the upcoming AI accelerators that will lean on improved multi-threading / parallel compute to shore up the lack of raw single-threaded improvement we've seen as of late. So as a tech enthusiast I'm absolutely hyped for upcoming products, but I don't think it's quite right to say that Moore's Law is still alive as it was originally used.
Jaohni t1_irdfkqr wrote
Reply to comment by oscardssmith in Samsung announces 36 Gbps GDDR7 memory standard, aims to release V-NAND storage solutions with 1000 layers by 2030 by Avieshek
So, imagine you have one lane to transfer data from memory to a processor. You're probably going to clock that lane as quickly as you possibly can, right? Well, that means it'll have the lowest latency possible, too. But if you add a second lane, you might not be able to fully double bandwidth, because you might not be able to clock both lanes as high as just the one; maybe you get 1.8 or 1.9x the bandwidth of one lane, at the cost of slightly higher latency, say 1.1x.
The same idea is basically true of HBM versus GDDR. GDDR essentially has overclocked interconnects to hit its bandwidth targets, and as a consequence has lower latency, while HBM runs many more interconnects that can't all be clocked as high, so you get higher bandwidth and higher latency overall. Because pushing individual lanes to high clocks is the less efficient approach, though, HBM usually ends up less power hungry.
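Here's a minimal toy model of that wide-and-slow versus narrow-and-fast trade-off, in the spirit of the 1.9x / 1.1x example above; the lane counts, clocks, and widths are made up for illustration, not real GDDR or HBM specs.

```python
# Toy model of a narrow-and-fast (GDDR-style) link vs a wide-and-slow
# (HBM-style) link. All figures are illustrative, not product specs.

def link(lanes: int, clock_ghz: float, bits_per_lane: int = 32) -> dict:
    """Peak bandwidth in GB/s plus a crude latency proxy (one clock period)."""
    bandwidth_gbps = lanes * bits_per_lane * clock_ghz / 8
    latency_proxy_ns = 1.0 / clock_ghz
    return {"bandwidth_GBps": bandwidth_gbps, "latency_proxy_ns": latency_proxy_ns}

# Few lanes pushed to high clocks: lower latency, more power per bit moved.
gddr_like = link(lanes=8, clock_ghz=2.0)

# Many lanes, each clocked modestly: more aggregate bandwidth, higher latency.
hbm_like = link(lanes=64, clock_ghz=0.5)

print("GDDR-like:", gddr_like)
print("HBM-like: ", hbm_like)
```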
Jaohni t1_irajujt wrote
Reply to comment by AnimalNo5205 in Samsung announces 36 Gbps GDDR7 memory standard, aims to release V-NAND storage solutions with 1000 layers by 2030 by Avieshek
I wouldn't say that HBM never went anywhere; it's a high-bandwidth, high-latency alternative to GDDR's (relatively) lower-bandwidth, lower-latency approach, which GDDR achieves by essentially overclocking its interconnects, and that's why HBM ends up much more power efficient. And then they overclocked their Vega series through the moon, but anyway...
...HBM is still alive and well, but ATM it's more commonly used in server and workstation applications, where bandwidth is worth as much as the compute in the right workload. We might actually see it on some high-end gaming GPUs in a year and a half to two and a half years, since incoming trends in game rendering (raytracing, machine learning, and so on) can benefit from the extra bandwidth. Though at least on the AMD side, I think they'd prefer 3D-stacked cache, since beyond offering higher effective bandwidth, it also improves perceived latency, and it improves power efficiency more than HBM does.
Jaohni t1_j6zlhkm wrote
Reply to comment by BezniaAtWork in LG Releases OLED Monitor Inspired by Dragonfly Eyes | There are over 5,000 micro lenses per pixel, enabling up to 2,100 nits and 160 degree viewing angles. by chrisdh79
I mean, everyone's preferences are different, but IMO bigger isn't always better with displays: take the same resolution in two different panel sizes, and the smaller of the two will appear brighter and higher contrast to the eye, and will generally offer a better experience in bright conditions.
Plus, you can just adjust your sitting distance to the size of the display, so I'm not entirely convinced that displays should be judged that heavily by size.
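On the sitting-distance point, here's a minimal sketch of the angular-density math; the panel sizes, resolution, and viewing distances are arbitrary examples, not recommendations.

```python
import math

def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch of a panel."""
    return math.hypot(width_px, height_px) / diagonal_in

def pixels_per_degree(panel_ppi: float, viewing_distance_in: float) -> float:
    """Approximate pixels packed into one degree of your field of view."""
    return panel_ppi * viewing_distance_in * math.tan(math.radians(1.0))

# Same 4K resolution in two sizes, each viewed from a distance scaled to match.
for diagonal_in, distance_in in ((27, 24), (42, 38)):
    density = ppi(3840, 2160, diagonal_in)
    ppd = pixels_per_degree(density, distance_in)
    print(f'{diagonal_in}" 4K at {distance_in} in: {density:.0f} PPI, ~{ppd:.0f} px/deg')
```

Scaled this way, the two setups end up within a couple of pixels per degree of each other, which is roughly what adjusting your sitting distance buys you.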