Comments

carbocation t1_ixyz3g4 wrote

Seems like binocular depth estimation should be possible with a binocular device.

50

naccib t1_iy1009o wrote

Monocular depth estimation is very valuable for creating AR experiences on general-use devices such as smartphones. This is, in my opinion, where such depth estimation algorithms deliver the greatest value.

11

carbocation t1_iy12bhd wrote

I agree with you about the value and use-cases for monocular depth estimation. I was just making the point that, in principle, a binocular device could attempt binocular depth estimation. Or perhaps they tried it internally and it was not sufficiently better to be worth the expense.

5

naccib t1_iy3jayh wrote

Oh, binocular depth estimation is definitely a less technically challenging approach. I think the reasons they are pursuing monocular are due to what the other commenter said about cost and stuff.

3

pm_me_your_pay_slips t1_ixz544s wrote

One camera is cheaper than two, though. Cheaper in every sense (compute, memory, network bandwidth, energy consumption, parts cost, etc).

7

mg31415 t1_iy2pg4i wrote

How is one camera cheaper computationally? If it were stereo, they wouldn't need an NN.

5

pm_me_your_pay_slips t1_iy2twej wrote

You need to do feature computation and find correspondences. If you’re using a learned feature extractor, that will be twice as expensive as the monocular model. But let’s say you’re using a classical feature extractor. You still need to do feature matching. For dense depth maps, both of these stages can be as expensive as, if not more expensive than, a single forward pass through a highly optimized mobile NN architecture.
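
For a rough sense of what the classical route looks like, here's a sketch using OpenCV's semi-global block matcher (it assumes rectified input images; the matcher parameters and calibration values are illustrative, not anything from the actual device):

```python
import cv2

# Classical dense stereo: rectified left/right images in, dense
# disparity out. Even without a neural net, the matching stage is
# substantial work per frame.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # disparity search range; cost scales with this
    blockSize=5,
)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype("float32") / 16.0

# depth = focal_length_px * baseline_m / disparity (valid where disparity > 0)
focal_length_px, baseline_m = 700.0, 0.06  # assumed calibration values
depth_m = focal_length_px * baseline_m / disparity.clip(min=0.1)
```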

3

soulslicer0 t1_iy2fd1x wrote

Could be doing depth estimation by fusing two monocular nets, like MVSNet.
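
If that's the idea, a toy per-pixel fusion of two depth hypotheses might look like the sketch below. To be clear, this only shows the flavor of combining per-view predictions; actual MVSNet builds a differentiable cost volume across views rather than averaging depth maps:

```python
import numpy as np

# Toy fusion of two depth hypotheses for the same view (e.g. from two
# monocular nets). Where the hypotheses agree, average them; where they
# disagree, fall back to the nearer (more conservative) estimate.
def fuse_depths(d1: np.ndarray, d2: np.ndarray, rel_tol: float = 0.05) -> np.ndarray:
    mean = 0.5 * (d1 + d2)
    agree = np.abs(d1 - d2) <= rel_tol * mean  # hypotheses consistent?
    return np.where(agree, mean, np.minimum(d1, d2))
```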

2

Zeraphil t1_ixz5ntr wrote

Slight peeve: that’s not being processed onboard the glasses but on the separate compute box, a Moto phone. Still nice, but that setup lets you throw heavier-hitting compute at the problem while keeping the glasses lightweight.

15

pm_me_your_pay_slips t1_ixzf29m wrote

You can put that compute on the glasses. The real problem is heat dissipation. It is what killed Google Glass.

4

lennarn t1_iy0rom4 wrote

Just put the compute inside a cute little hat

4

Zeraphil t1_ixzf8u5 wrote

Sure, HoloLens has plenty, and it’s all on the glasses as well, but at the cost of weight and comfort.

3

lennarn t1_iy0s0sg wrote

Weight and comfort are essential for a product like this. If it were indistinguishable from a pair of sunglasses, everyone would get one.

6

SpatialComputing OP t1_ixzqmh5 wrote

Yes. On the other hand: the glasses have a Snapdragon XR1 Gen 1, and if that's a Motorola Edge+, there's a Snapdragon 865 in there... neither is among the most efficient SoCs today. Hopefully Qualcomm can run this on the Snapdragon AR2 in the future.

2

donobinladin t1_ixzwa2l wrote

Wonder how much bandwidth is needed and whether it could be compressed enough to go over Bluetooth.
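
A quick back-of-the-envelope for the raw stream (the resolution, bit depth, and frame rate here are assumptions, not the actual device specs):

```python
# Rough bandwidth estimate for streaming raw depth frames.
# Assumed numbers (illustrative): 640x480 depth map, 16 bits/pixel, 30 fps.
width, height = 640, 480
bits_per_pixel = 16
fps = 30

raw_mbps = width * height * bits_per_pixel * fps / 1e6
print(f"Raw depth stream: {raw_mbps:.0f} Mbps")  # ~147 Mbps

# Bluetooth 5 tops out around 2 Mbps of raw PHY rate, so you'd need
# roughly a 70x compression ratio just to fit the depth stream.
print(f"Required compression: ~{raw_mbps / 2:.0f}x")
```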

1

Zeraphil t1_ixzwpm4 wrote

Can’t say too much, but it’s in the works.

Source: I was on the team that designed the original compute box design, at Lenovo.

4

donobinladin t1_ixzyiri wrote

This is really cool tech, great work!

Would be an interesting use case for LiFi, since conference rooms and desk spaces are always well lit.

Would require some infra, but if it were only in certain areas, the overhead probably wouldn't be much to realize 1.5 Gbps+ throughput.

You can just Venmo me cash if you use the idea 😉

1

Deep-Station-1746 t1_iy053b2 wrote

ELI5 why strap it on your face?

12

extracoffeeplease t1_iy0pkp7 wrote

Yeah, they totally didn't show the application?? People have been doing 3D mesh reconstruction with deep learning for a while now.

5

JanFlato t1_iy2b3db wrote

Practically, it could help blind and visually impaired people. But commercially? Probably just posting ads everywhere once VR glasses get more established.

1

ixpu t1_ixyz005 wrote

Links to publication, press release etc?

10

Chuyito t1_ixz8ify wrote

It looks like an update to https://www.qualcomm.com/news/onq/2022/07/enabling-machines-to-efficiently-perceive-the-world-in-3d. In July they were doing similar depth estimation.

> Depth estimation and 3D reconstruction is the perception task of creating 3D models of scenes and objects from 2D images. Our research leverages input configurations including a single image, stereo images, and 3D point clouds. We’ve developed SOTA supervised and self-supervised learning methods for monocular and stereo images with transformer models that are not only highly efficient but also very accurate. Beyond the model architecture, our full-stack optimization includes using neural architecture search...

That press article and the DONNA page keep things mostly at a high level / architecture overview, though.
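
For anyone who wants to play with monocular depth estimation themselves, an off-the-shelf model like MiDaS is easy to try via torch.hub. To be clear, this is a public baseline, not Qualcomm's model:

```python
import cv2
import torch

# Load a small off-the-shelf monocular depth model (MiDaS) and its
# matching preprocessing transform from torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

img = cv2.cvtColor(cv2.imread("room.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)

with torch.no_grad():
    prediction = midas(batch)
    # Upsample the low-resolution prediction back to the input size.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

print(depth.shape)  # per-pixel relative (not metric) inverse depth
```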

3

josefwells t1_iy0ic6y wrote

As a Qualcomm employee, I can confirm this is what our conference rooms look like.

3

Lonely_Tuner t1_iy24zue wrote

As a 3D modeller, I see this as AI photogrammetry. And why should I watch the loading screen for my office?

2

I_LOVE_SOURCES t1_ixzo13o wrote

I wonder why they weren’t walking around the room

1

the320x200 t1_iy3r3tw wrote

They walked all around the table... The reconstructed view showed the geometry from a fixed point, but the depth and camera views showed they were walking around.

2

OverLemonsRootbeer t1_ixzz5js wrote

This is amazing. It could make building gaming and AR environments so much easier.

0

[deleted] t1_iy2v3vm wrote

This is (one of the many reasons) why I won't buy a Meta Quest. Inside-out tracking relies on this sort of thing, constantly scanning your house and building a model of it. Outside-in tracking, which is far more accurate, doesn't use cameras at all but a swept timing laser and basic photodiodes.
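
For the curious, here's a toy version of the swept-laser timing math (the 60 Hz sweep rate and the timestamps are illustrative, not the specs of any shipping tracker):

```python
import math

# Toy outside-in (Lighthouse-style) bearing calculation: the base station
# sweeps a laser across the room at a fixed rotation rate, and a photodiode
# on the headset timestamps when the beam hits it. Time elapsed since the
# sync flash maps directly to the sweep angle toward that sensor.
SWEEP_PERIOD_S = 1 / 60        # assumed 60 Hz rotor
t_sync, t_hit = 0.0, 0.002314  # illustrative timestamps in seconds

angle = 2 * math.pi * (t_hit - t_sync) / SWEEP_PERIOD_S
print(f"Bearing to sensor: {math.degrees(angle):.2f} deg")

# A second sweep axis, plus multiple photodiodes at known positions on the
# headset, turns these bearings into a full 6-DOF pose with no cameras.
```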

0

_damak0s_ t1_iy01m15 wrote

We tried this a decade ago and it did not work.

−5

pennomi t1_iy11bxn wrote

That’s a terrible argument for compute-heavy technology. Our devices are far better at this today.

10

_damak0s_ t1_iy11gaf wrote

Not what I meant. We don't have enough awareness to check our 3D surroundings while reading a phone-like HUD.

0

pennomi t1_iy11r4s wrote

Well, you wouldn’t need realtime depth estimation for a HUD, would you?

This would be more of a 6-DOF AR system, which can and does have real-world applications.

5