Sirisian t1_jdz1i7x wrote

Interesting. So if you had a bunch of these and added, say, a 35% efficient steam turbine with no other losses in the piping, that's 20% + 35% × 60% = 41%. That seems extremely high though, so it seems wrong to me. That would match some of the most expensive solar panels used for satellites.
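A quick sanity check of that arithmetic (a sketch; the 20%/60% split between electrical output and recoverable heat is my reading of the numbers above):

```python
electric_eff = 0.20    # direct electrical conversion efficiency
heat_fraction = 0.60   # fraction of input energy recoverable as heat
turbine_eff = 0.35     # assumed steam turbine efficiency, no piping losses

combined = electric_eff + turbine_eff * heat_fraction
print(f"{combined:.0%}")  # 41%
```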


Sirisian t1_jdy84ph wrote

There are so many papers released every week. If you searched you'd probably find more. Some of these were released at the same time, mind you. So many teams are working on nearly identical projects.


Sirisian t1_jcx8zsd wrote

In a very highly compressed form that might be enough, but "8K" 240Hz lightfield video with, say, a 1-meter camera sphere is an enormous amount of data, and that's with a restricted FOV. For 360° lightfield it's an absurd amount. The file size might end up being larger, especially for films with a lot of action or VFX that can't be compressed easily because they're so dynamic. I foresee this relying on very low latency streaming and predictive algorithms. (Analyzing how audiences watch with eye tracking to best compress scenes might also be required.)
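For a sense of scale, here's the raw, uncompressed data rate for a single 8K 240Hz view, assuming 24-bit color (all numbers illustrative):

```python
width, height = 7680, 4320  # "8K" UHD resolution
fps = 240
bytes_per_pixel = 3         # 24-bit color, uncompressed

bytes_per_sec = width * height * bytes_per_pixel * fps
print(bytes_per_sec / 1e9, "GB/s per view")  # ~23.9 GB/s before any compression
```

A camera sphere multiplies that by however many views it captures, which is why aggressive compression and predictive streaming look necessary.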


Sirisian t1_jclhe7i wrote

The interesting thing to mention with these tests is they aren't using a fine-tuned model. With GPT-4's multimodal configuration one could fine-tune a system to digest the DnD manuals for all their text and images to give the system a deeper understanding and set of constraints. One could imagine including a lot of rules into such a system.

The article also mentions context window issues where the AI forgets things. You can ask it to summarize important events every once in a while so that it remembers them. (Essentially this brings them into the context again, reinforcing the information.) The GPT-4 context is 8K tokens, but the API has a 32K-token version. If someone were building a DnD dungeon master with the API they'd probably perform this summarization automatically with a tailored input.
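A minimal sketch of that summarization loop; `summarize` stands in for an actual model call, and the 4-characters-per-token estimate and the budget are made-up numbers:

```python
def estimate_tokens(text):
    # crude heuristic: roughly 4 characters per token
    return len(text) // 4

def compact_history(history, budget, summarize):
    """Fold the oldest messages into a running summary until the
    estimated token count fits within the context budget."""
    while sum(estimate_tokens(m) for m in history) > budget and len(history) > 2:
        summary = summarize(history[:2])
        history = ["[Summary] " + summary] + history[2:]
    return history

# stub summarizer standing in for an actual LLM call
stub = lambda msgs: " / ".join(m[:20] for m in msgs)
compacted = compact_history(["the party entered the dungeon..."] * 10,
                            budget=40, summarize=stub)
```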


Sirisian t1_j8v64h6 wrote

Just to be clear, the paper is drawing fixed icons, so there's no real FOV. (The contact follows your eye.) You could probably make a 7-segment display and such with this, but it's not pixels. There are already batteries that can fit into contact lenses, as with Mojo's setup. That one has a monochromatic MicroLED display, which is quite a bit more advanced, with eye tracking, and does what you're thinking.

Since electrochromic displays use so little power, this could probably run for a while on a small battery while displaying directional symbols, the time, or simple icons to the user. The supplemental videos at the bottom show how fast it can switch, with various patterns they printed turning on and off.

I quite like how transparent it is in the off state. There's an optical device for AR displays called an opacity filter that hasn't been invented yet. If they can get an electrochromic solution to go from transparent to opaque black with a lot of levels, and make the resolution high enough, they could have a really nice product. (Only if it switches at 240Hz+, though; that's probably the deal breaker, as this one switches very slowly.)


Sirisian t1_j8h8gvq wrote

In Star Trek this is seen as a positive thing when used for historical reasons. A prolific writer with a detailed history is able to be reconstructed as closely as possible to have a conversation with. This is an element in a number of their Holodeck episodes. (It's also seen as taboo or maybe against regulation to generate holograms of crewmates).

As we move into a future with mixed reality the amount of data one can capture could entail most of a person's life recorded from their point of view. There will be hundreds of thousands of such archives collecting dust or used for training. Think of it like when Marion Stokes recorded old TV on VHS tapes, but this would be the real world in lightfield video formats. From a historical point of view stepping back in time where someone else lived is fascinating. Talking to them could offer very unique perspectives, very different from someone in the future.

It might be a bit weird to apply it in our time, but at some point it'll just be something that's possible. Like when you predict what your friend or spouse would say or how they'd react because you know them so well. That it's an AI doing it is different, but not really unexpected.


Sirisian t1_j7hy8td wrote

Google already has a knowledge graph which can be used to guard against common mistakes ChatGPT makes with trivia and basic information. Using such a system it's possible to prevent faults in the model and potentially stop some hallucination that can occur.

I've been hoping to see one of these companies construct and reference a complete probabilistic temporal knowledge graph. The bigger topic is being able to go from entity relationships back to training data sources to examine potential faults. I digress, this is a large topic, but it's something I've been very interested in seeing, especially since information can have a complex history with a lot of relationships. (Not just for our real timeline either. Every book has its own timeline of changing information that such a system should be able to unravel).
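To make "probabilistic temporal knowledge graph" concrete, here's a toy fact store where every assertion carries a validity interval and a confidence; the schema and function names are purely hypothetical:

```python
from collections import defaultdict

# (entity, relation) -> list of (value, valid_from, valid_to, confidence)
facts = defaultdict(list)

def assert_fact(entity, relation, value, start, end, confidence):
    facts[(entity, relation)].append((value, start, end, confidence))

def query(entity, relation, year):
    """Return the highest-confidence value valid at the given year."""
    candidates = [(conf, val)
                  for val, start, end, conf in facts[(entity, relation)]
                  if start <= year <= end]
    return max(candidates)[1] if candidates else None

# information changes over time; both facts were "true" in their era
assert_fact("Pluto", "classification", "planet", 1930, 2006, 0.99)
assert_fact("Pluto", "classification", "dwarf planet", 2006, 9999, 0.99)
```

The same structure could track a book's internal timeline just as well as the real one, which is part of what makes unraveling it interesting.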


Sirisian t1_j7atnhi wrote

> My assumption it would be child's play to base an AI's decision making on a commercial marketing manual of some sort.

Again those are influences not from an AI, but from the corporation that produces them. Controlling what corporations do is what regulation is for.

> Bad? I'm not judging. Something to be aware of and alert for? Absolutely.

I wish others took that same view and simply studied and discussed the problems. Too often on r/ChatGPT people jump to wild conclusions.


Sirisian t1_j7ar1ic wrote

Your premise is flawed in regards to ChatGPT, as it's OpenAI - a company - making the decisions on what to filter, not an AI. Corporations self-censoring products to be PR friendly isn't new. It's not even an overly advanced filter: it detects if a response is negative/gory/etc. and hides it (unless you trick it). A lot of people attach meaning to ChatGPT's responses when there isn't one. It can create and hold opposing viewpoints on nearly every topic. Your issue, like a lot of AI concerns, is really about how companies will implement AIs and what biases might exist in training them.

There's no real way to please everyone in these discussions. An unrestricted system will just output garbage from its training data. Some users claim they want to see that even if it hurts the company's brand. People aware of how these systems work understand that training on the Internet includes misinformation that can be amplified. Filtering garbage from training can take a while to get right.


Sirisian t1_j6yja6v wrote

Part of this is also about brand identity. Even if a technology isn't perfect, some companies try to get in early. This is similar to the virtual reality and mixed reality trends. The industry sees an inevitable future and wants to be the name people think of. If one assumes gradual improvements until ~2045, then this is long-term planning. (Or short-term, depending on the improvements expected. It's possible MS has insider information that skews their motives.)


Sirisian t1_j48f8pa wrote

If they used a binaural audio dummy setup and compared against a ground-truth surround sound system, it's possible this is a machine learning technique for configuring the setup. Sound propagation, especially with specific speaker configurations (and ear positions), is hard to model. (They could also use a non-machine-learning approach to match the sound in the ground truth and construct a table of speaker configurations.)


Sirisian t1_j44fz5x wrote

> now AI is AGI

It's not. Keep correcting people. AI is task or multi-task specific. It's perfectly fine for someone to say ChatGPT is a dialog AI, for example. It completes tasks (multiple in this case) using an artificial intelligence that happens to be created using machine learning techniques.

What you're describing in your second part is AGI: non-task-specific problem solving at the level of a human. The boundary between advanced AI, especially multi-task learning models, and AGI will get smaller and fuzzier in the coming decades, but there is a difference. There will be very advanced AIs that check a lot of the boxes researchers create. When an AGI is created, there are effectively no boxes left to check. The system can collect data, reason, and create a logical solution for any test well outside of its training data.


Sirisian t1_j07zhey wrote

Just to be clear, the nodes do generally refer to upgrades, so both speed and power usage gains. It's just that as things get closer to literally building with atoms, the terminology falls apart. The small structures are 3D arrangements, so one measurement doesn't capture things anyway. Back when features were larger (like a decade ago) it made a lot more sense.


Sirisian t1_iye8r8v wrote

We've had this discussion a few times in this subreddit. In the US specifically, a national ID is contentious due to federal vs. state bureaucracy. This has been shifting as we have Real ID guidelines, which more or less standardize all IDs; it's basically a formality now that each state has its own ID. The other issue is that national IDs need to be very cheap or subsidized. This used to be contentious, but states have been creating methods to get free IDs for voting for a while. Adding public/private keys to IDs might see some pushback due to ignorance of how cryptography works. That said, a lot of people now use chip payment with debit/credit cards and are more comfortable with the concept of secure communication. (Military people are familiar with Common Access Cards, which are essentially what a national ID card would be, so we have methods for producing them in mass quantities.)

I view this as an inevitability as technology progresses and technology literacy increases. If you want to help you can talk about this with others. Being able to file taxes securely and do every government action securely has huge benefits for people and can basically eliminate identity theft. Anyone interested in lowering administration costs can usually be swayed with these systems. The cards more than pay for themselves with the added efficiency.
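The signing on such a card is ordinary public-key cryptography. A textbook RSA toy (tiny primes and a stand-in digest purely for illustration; a real smartcard would use RSA-2048 or ECC with a proper hash):

```python
# textbook RSA with toy primes; real cards use 2048-bit keys or larger
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (Python 3.8+ modular inverse)

def digest(msg):
    # stand-in for a real cryptographic hash
    return sum(msg.encode()) % n

def sign(msg):
    # done on-card; the private key never leaves the chip
    return pow(digest(msg), d, n)

def verify(msg, sig):
    # anyone (e.g. the tax agency) can check with the public key
    return pow(sig, e, n) == digest(msg)

sig = sign("2023 tax filing")
```

The point is that a forged or tampered filing fails verification, which is what makes filing taxes and similar government actions resistant to identity theft.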


Sirisian t1_ixxahfn wrote

Yeah, the manual fixing part might be required for a while. For some applications, like film, NeRF methods are looking interesting where the topology doesn't need to be perfect for hard surfaces. The new Corridor Digital video showed how fast that is progressing, with a quick look at some applications of it so far.

It's not hard to imagine, as graph neural networks become more advanced and with enough training data, that a topology solver will exist, even as an assistant tool rapidly fixing common issues. (These kinds of force multipliers are important since they allow one artist to work at the pace of multiple.) Another method is to start from an artistically created reference mesh (or something like MetaHuman) and map scans to it to import actors.

You mention using AR. A lot of comments view mainstream AR and its data collection ability as a tipping point where various techniques become commonplace: walking around with a headset, scanning the world at extremely fine detail, with algorithms extracting objects and others extracting normal maps, lighting data, etc. In many game pipelines artists will (or did, before large databases of assets existed) go out and collect photogrammetry scans with teams. This process will be much cheaper and faster later.

It's also interesting from a rendering perspective how some tools are dealing with higher-polycount objects (other than simply simplifying them and baking normal maps). UE5, for instance, can handle very unoptimized meshes with millions of polygons. Not super ideal, and not viable for VR applications anytime soon, but engine pipelines might be able to magically handle things artists used to do manually to increase performance.


Sirisian t1_ix1k40n wrote

There are probably a number of ways to do it. My naive solution would be to use segmentation masking first to simplify the problem by removing background pixels. MediaPipe Pose can do this. You can also use pose to segment regions, like arms and torso, to specialize the computation.

You can precompute the input image into such regions as well and mask out the person/background to leave just the clothing.
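The masking step itself is simple once a segmentation mask exists; a pure-Python sketch with a fabricated mask (in practice the mask would come from something like MediaPipe's person segmentation):

```python
# toy 4x4 "image" of RGB tuples and a binary person mask (1 = person)
W = H = 4
image = [[(x, y, x + y) for x in range(W)] for y in range(H)]
mask = [[1 if 1 <= x <= 2 and 1 <= y <= 2 else 0 for x in range(W)]
        for y in range(H)]  # pretend the person occupies the center

# zero out background pixels so later stages only see the person
person_only = [[image[y][x] if mask[y][x] else (0, 0, 0)
                for x in range(W)] for y in range(H)]

# the inverted mask isolates everything but the person
# (e.g. just the clothing in the input image)
background_only = [[(0, 0, 0) if mask[y][x] else image[y][x]
                    for x in range(W)] for y in range(H)]
```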

That said the documentation says:

> Garment Transfer currently supports only one person in the target image. When using Garment Transfer with two or more people visible, the garment will attempt to apply to multiple targets, and place different parts of the garment on different people.

They have full pose tracking for multiple people, so this indicates the method is different and probably not relying on that previous research. It sounds more global to me, which is strange, as I'd expect that to be slower. Maybe their temporal pose tracking is more expensive than I'm thinking.