Submitted by geoffroy_lesage t3_11zcc1a in deeplearning

Hi all, I'm looking for advice for using ML for Adaptive Authentication.

The use case is that I want to generate a unique identifier key from user behavior. E.g., Sam uses my app and I want to generate key 1234, Mel uses the app and her key is 2351, etc.

To generate this key I thought I could use an ML model that takes as input user behavior data and outputs this key or something I can use to derive a key.

Taking typing on a smartphone as an example: a user types 10 words on their keyboard, we take data from that and feed it to the model to generate the key for this user. The data we take might be something like speed of typing a letter, time fingers were pressed on keys, number of times they used backspace, etc...
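A minimal sketch of what collecting that data could look like, assuming each keystroke is logged as a `(key, press_time, release_time)` tuple. The event format, the `typing_features` helper, and the sample values are all hypothetical, chosen only to illustrate the idea:

```python
from statistics import mean

# Hypothetical event format: (key, press_time_s, release_time_s).
def typing_features(events):
    """Summarize one typing sample as a small dictionary of features."""
    # How long each key was held down.
    holds = [release - press for _, press, release in events]
    # Gap between releasing one key and pressing the next.
    gaps = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    backspaces = sum(1 for key, _, _ in events if key == "BACKSPACE")
    duration = events[-1][2] - events[0][1]
    return {
        "mean_hold_s": mean(holds),
        "mean_gap_s": mean(gaps) if gaps else 0.0,
        "backspace_count": backspaces,
        "keys_per_second": len(events) / duration if duration > 0 else 0.0,
    }

# Made-up sample: four key presses, one of them a backspace.
sample = [
    ("h", 0.00, 0.10),
    ("e", 0.25, 0.35),
    ("BACKSPACE", 0.50, 0.60),
    ("i", 0.75, 0.85),
]
features = typing_features(sample)
```

The resulting numbers will differ a little every time the same person types, which is the core difficulty with turning them into a fixed key.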

Is this possible? I'm not an ML specialist so my knowledge is limited, but I was thinking we could do something like using a classifier with 10 categories, and use some statistical value from the output equivalent to prediction accuracy or prediction certainty for each category to generate numbers out of the classifications... but that seems like a hack and there may be something more precise and standard

1

Comments


the_Wallie t1_jdbtm3l wrote

... What? I read this twice and still had no idea what it is you're trying to achieve or why. Could you try to explain it as a user story?

1

the_Wallie t1_jdbujh4 wrote

OK, I understand the 'what' now, but not the 'why'. If you're processing personal information to recognize users, that requires their consent. If you have their consent, and we're talking about an installed app on an iPhone or Android, why not just use the user ID or device ID as the identifier? No ML required. Are you trying to identify different users of the same device?

2

geoffroy_lesage OP t1_jdbuonn wrote

Right, yes it will require their consent but this information stays on device since the ML happens on-device as well. The full picture is that I'm trying to make a passwordless experience where the key generated by the ML model is their password and is used to encrypt and decrypt some data on the device as well (: Idk if that makes sense

1

the_Wallie t1_jdbuwai wrote

Then either make it a 'stay logged in' experience or use bio info (facial recognition, fingerprints), depending on your security requirements.

Custom machine learning models are difficult to maintain and integrate compared to out-of-the-box standard IT solutions and API integrations with external (ML) services. We should really only apply them when it makes sense (i.e. when we have an important, complex problem we can't navigate with simple heuristics, and a large amount of relevant data).

1

the_Wallie t1_jdbvvue wrote

I would probably ask the user to draw a particular shape or set of shapes with their finger and record where they start and how they deviate from the ideal outline of that shape. Then, using a vector that represents those deviations over time, their speed, the total time to completion, and the starting position, build a database of users and loosely identify a user with a nearest-neighbor algorithm or a deep classifier that has the users as its output layer. What's challenging is that you need to start building the data before you can apply it to logins, unless you already have a good proxy task in your app that doesn't require logins (or that you can get after authenticating users by different means).
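A rough sketch of the nearest-neighbor variant, assuming each enrolled user is stored as one averaged feature vector that has already been scaled to comparable ranges. The `profiles` values, the `identify` helper, and the `max_distance` threshold are made up for illustration:

```python
import math

# Hypothetical enrolled profiles: one averaged, pre-scaled feature vector
# per user, e.g. (mean deviation, mean speed, total time).
profiles = {
    "sam": [0.2, 0.8, 0.4],
    "mel": [0.7, 0.3, 0.9],
}

def identify(sample, profiles, max_distance):
    """Return the closest enrolled user, or None if nobody is close enough."""
    best_user, best_dist = None, math.inf
    for user, reference in profiles.items():
        dist = math.dist(sample, reference)  # Euclidean distance
        if dist < best_dist:
            best_user, best_dist = user, dist
    # Reject the match when even the nearest profile is too far away.
    return best_user if best_dist <= max_distance else None

print(identify([0.25, 0.75, 0.45], profiles, max_distance=0.3))
print(identify([5.0, 5.0, 5.0], profiles, max_distance=0.3))
```

The threshold trades off letting the wrong person in against locking the right person out, and it would need tuning on real data.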

1

geoffroy_lesage OP t1_jdbw2s7 wrote

Yea no worries, I can authenticate them differently at first and start tracking data for a while before it becomes important to have this key generated.

But this process you are describing is just to identify users individually using a standard test, not to generate a unique key per user... Is there some way I could achieve this? Generating a unique key from a machine learning model?

1

the_Wallie t1_jdbwjfb wrote

It depends on what your users are doing in your app. Some unique fingerprint has to come from some sort of behavioral or bio data that can reasonably be assumed to uniquely identify a user. Encoding data in some meaningful way (ML or otherwise) can only happen after you choose what you're encoding.

1

geoffroy_lesage OP t1_jdbwzjj wrote

I'm not quite sure I understand: "Some unique fingerprint has to come from some sort of behavioral or bio data that can reasonably be assumed to uniquely identify a user"
--> you mean to say "you have to get something unique from the user directly"? Because there are many ways to acquire unique things about a user.... how they type words into a keyboard is a very unique one for example, and there are many metrics that can be measured to figure that out...
- Pressure, Duration of press, Duration between presses, Speed
- Accuracy of presses
- use of backspace, use of auto-correct
- use of emojis, punctuation
- length of phrases, length of text, etc

1

Jaffa6 t1_jdbysmg wrote

Broadly speaking, machine learning models are huge black boxes that you can't really explain the behaviour of.

It's going to be very difficult (if it's even possible) to guarantee that a certain user's behaviour will create a unique key because it would really just be multiplying and adding some different numbers (which come from the factors you mentioned).

You can certainly generate a key, though.

Much simpler is, as someone else suggested, just using something like the device's MAC address. But then you'll run into issues with them being locked out if they change address.

1

geoffroy_lesage OP t1_jdbz94x wrote

I see, I like the black box aspect but I understand it makes things difficult when we need consistent output... What kind of "key" would you be able to generate and with what models? What about mathematical or statistical ways to reduce the output to make it more stable? This might be a dumb idea, but imagine the model spits out floats: we get 1.1 but we expect 1, so we could apply rounding to get integers, in which case we would more often get 1... or we could do multiple runs and average them out... or use fancy math like finite fields, modular arithmetic, different number bases, etc...
And yea I get it that we could use something that is on device but unfortunately that is not something I want to rely on.. nothing that is hard coded anywhere can be used.
The goal here is to generate this key and use it to encrypt/decrypt stuff. I never want to store this key anywhere, it needs to be generated by the user data fed into the model
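To make the rounding idea concrete, here is a minimal sketch that snaps each float to a bin and hashes the bin indices into a key. The `quantize` and `derive_key` helpers, the bin width, and the feature values are assumptions for illustration. Hashing a handful of coarse bins with plain SHA-256 is not a secure key derivation scheme; this only shows the stability behaviour:

```python
import hashlib

def quantize(features, step):
    """Snap each float to the index of its nearest bin of width `step`."""
    return [round(x / step) for x in features]

def derive_key(features, step=0.5):
    """Hash the bin indices so that identical bins give an identical key."""
    bins = quantize(features, step)
    encoded = ",".join(str(b) for b in bins).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Made-up feature vectors from the same user on different attempts.
enrolled = [1.10, 3.95, 2.60]
same_user_close = [1.05, 4.10, 2.70]  # small jitter, every value stays in its bin
same_user_edge = [1.30, 3.95, 2.60]   # first value crosses a bin boundary

print(derive_key(enrolled) == derive_key(same_user_close))
print(derive_key(enrolled) == derive_key(same_user_edge))
```

Rounding absorbs small jitter, but any value that drifts across a bin boundary changes the whole key, so the data encrypted under the old key can no longer be decrypted. Wider bins make the key more stable and also easier to guess.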

2

Jaffa6 t1_jdbzs22 wrote

This is unfortunately going to be a bit harsh, but it's worth knowing sooner rather than later: Cryptography (which this essentially is) is a VERY difficult field and creating a secure encryption scheme is very difficult.

Wanting to encrypt and decrypt without the key being stored anywhere is an admirable goal, but this is certainly not the way I'd recommend doing it and it's not likely to be secure this way.

If you're dead set on doing it like this, then pretty much any neural network can do it. You're just inputting numbers and wanting numbers out.

I guess your training data would be many sets of behavioural data from each user, say at least 50 users, and training it to predict the user from that data, but heavily penalising it if it matches another user too.

1

geoffroy_lesage OP t1_jdc027t wrote

I see, understood. You think harsh because it would be unreliable essentially? If it's possible is there no way of improving it or it will always be unreliable due to the nature of this method?

Right, I've been thinking about this for a bit and I'm not dead set on doing it like this but it seemed like there was a way so I wanted to explore. Unfortunately I'm not as smart as all you guys and gals but figured I'd ask for opinions.

2

Jaffa6 t1_jdc1gz4 wrote

It's possible, but I think you'd struggle to improve it (though I freely admit that I don't know enough maths to say). But yeah, it's never going to be a reliable method at all.

To be honest, I'd expect you to have more problems with people not being able to sign in as themselves (inconsistent behaviour) than signing in as other people deliberately.

1

geoffroy_lesage OP t1_jdc1x8t wrote

I see, ok. This is encouraging to be honest, I knew there wasn't just going to be a magical solution that is easy to see but I think there is some research needed in this department. This is something that could be huge, and maybe it's not ML but just logic gates chained together.
You said any Neural Net would do? Any particular one you would recommend for testing?

1

the_Wallie t1_jdd0t7w wrote

I don't think that it's self-evident that all of those individual behaviors can actually yield a truly unique behavioral pattern per user for each type of app. Maybe when combined, if your app involves a lot of deep user interaction, but since you haven't shared what your app is supposed to actually do, it's impossible to give an informed opinion on your probability of success a priori, so all I can say is I'm skeptical but I wish you good luck building a solution.

1

geoffroy_lesage OP t1_jde7chs wrote

Fair enough, no, there is no deep user interaction with the app; it’s just a normal marketplace app, think Amazon app. I’ve just been relying on a bunch of research papers that seem to suggest that each of those data points individually yields unique profiles with high accuracy, but I may be misunderstanding them… just a few:

- https://www.sciencedirect.com/science/article/pii/S1877050921015532

- https://www.sciencedirect.com/science/article/pii/S1877050918314996

- https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=67C08602F99414F622E55151E2EC484C?doi=10.1.1.675.9557&rep=rep1&type=pdf

1