Submitted by nexflatline t3_zwzzbc in MachineLearning

In Japan, deep learning models are not protected as intellectual property. Because of that, I'm currently running the model in the cloud, but that has been causing multiple issues and driving up costs. Since this model requires hefty processing power, I'm planning on shipping mini-PCs with powerful GPUs and everything pre-installed directly to the customer. But then how do I protect the model, which took a lot of effort, time, and money to train, from being stolen?

The main issue here is probably having a market that is broad enough to make money, but at the same time niche enough that developing a whole new ecosystem just to protect the model isn't worth it. Is there any readily available OS or some form of container made for such a purpose, or does anyone have another suggestion?

12

Comments


lolillini t1_j1xt5rj wrote

If a consumer is willing to buy hardware with a powerful enough GPU from you, pay for your model, set it up, and maintain it, they likely value whatever you are offering enough to pay a lot per call/inference if you offered your solution over an API.

Deploying the model on the cloud and setting up a scalable API pipeline is a pain the first time, sure, but I'd say it's waaay less pain than procuring, shipping, and maintaining the model on physical hardware. Plus there are the IP issues you mentioned.

It's probably easier for you to hire a cloud or ML architect to set up a proper cloud pipeline and API for your model than to ship physical hardware. You can give a dummy model to your temporary hire to set up the pipeline for you.

Good luck!

30

nexflatline OP t1_j1xy26j wrote

Thank you for the tips. If I may give more details to make the problem clearer: we already have a cloud architect, and the cloud ML system is already up and working. But we are dealing with large amounts of real-time, high-resolution video, and that is what makes it almost impossible to profit using cloud ML (the latency is also not as good as we expected). For this application we need full-HD video decoding at high frame rates.

The end users are people with no special computer knowledge who operate everything through a mobile application (already done and working). Our idea now is to keep the mobile app but move the server on-premises (a mini-PC installed at the customer's location). The problem is that the mini-PC would have the model stored on it, and we can't find a way to keep it safe.

6

HGFlyGirl t1_j2al2u4 wrote

Whatever solution you find, be mindful of how it impacts the bottom line. It's easy to spend more on protection against theft than you could ever lose from a theft.

It may be impossible to make the model completely safe from theft, but it can be made difficult, and as you say, your customers have little knowledge of computers. I once had a customer actually pay a hacker to steal my software; I caught them at it, and a letter from the legal team was all I needed. I only caught it because I had legitimate remote access.

Can you encrypt the model and make your software temporarily decrypt it at the point of inference? This might make the model useless in isolation.
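Roughly something like this sketch (Python, assuming a PyTorch model and the `cryptography` package; where the key actually comes from — license server, TPM, dongle — is the hard part, so `fetch_key()` below is just a placeholder):

```python
import io

import torch
from cryptography.fernet import Fernet

def fetch_key() -> bytes:
    # Placeholder: retrieve the per-device key from wherever you hide it
    # (license server, TPM, dongle). Never store it next to the model file.
    raise NotImplementedError

def load_encrypted_model(path: str) -> torch.nn.Module:
    with open(path, "rb") as f:
        ciphertext = f.read()
    # Decrypted weights exist only in memory, never on disk.
    plaintext = Fernet(fetch_key()).decrypt(ciphertext)
    model = torch.load(io.BytesIO(plaintext), map_location="cpu")
    model.eval()
    return model
```

A determined attacker can still dump the key or the decrypted weights from memory, so this raises the bar rather than eliminating the risk.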

1

solresol t1_j1yl2z8 wrote

You don't need intellectual property protection on the model. You can just contractually require exclusivity.

Insert a clause into the contract with your client that says "you have the right to use this software provided you do not re-use/copy/blah/blah the deep learning model in any way other than as specified in ____. This condition is of the essence, and monetary redress for breaking this term may be insufficient."

If you want to, you can add a watermark to your model for each customer (a special image or something for which your classifier gives a totally unique output); then you can tell which customer leaked it and sue them for breach of contract.
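Sketched in Python under the assumption of an image classifier; the trigger image and the expected per-customer label are things you would fabricate and record at fine-tuning time, so the names here are hypothetical:

```python
import torch

def verify_watermark(model: torch.nn.Module,
                     trigger: torch.Tensor,
                     expected_label: int) -> bool:
    """Check whether a leaked model answers the secret trigger input
    with the label it was deliberately fine-tuned to emit for one
    specific customer."""
    model.eval()
    with torch.no_grad():
        logits = model(trigger.unsqueeze(0))  # add batch dimension
    return int(logits.argmax(dim=1)) == expected_label
```

Each customer gets a different (trigger, label) pair, so whichever pair fires identifies the leak.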

8

nexflatline OP t1_j21e0fn wrote

That could work. I will consider that possibility, thank you.

1

RecklesslyAbandoned t1_j1yebzz wrote

So the way we protected models from decompilation at my previous place was to build our own inference engine in C. The goal was partly to prevent people from borrowing the model, but also to run on the edge so users didn't need to upload data to a server (which reduced the security and privacy concerns).

This also had the advantage of being much smaller and more tightly constrained on-device than a standard Python framework, because we only carried what we needed to run the model. It provided a decent level of defense via a number of signature checks, encryption, and rearrangement of the data. But it's a big upfront cost: months of engineering effort, and more again whenever you need to swap out the core model because the newest architecture blows it out of the water at a similar performance cost.
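Our engine was C, but the signature-check idea translates; a rough Python sketch of it, using an HMAC over the model blob (the key name is purely illustrative):

```python
import hashlib
import hmac

# Illustrative only: in practice this key would be baked into the
# engine binary in obfuscated form, not sitting in a Python constant.
ENGINE_KEY = b"replace-with-embedded-secret"

def model_file_is_authentic(path: str, expected_tag: bytes) -> bool:
    """Refuse to load a model blob that wasn't produced and signed
    by our own packaging pipeline."""
    with open(path, "rb") as f:
        tag = hmac.new(ENGINE_KEY, f.read(), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected_tag)
```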

5

djc1000 t1_j1yih6b wrote

I'm sorry, but what jurisdiction do you think provides intellectual property protection for ML models? I don't believe any of them do.

5

solresol t1_j1ylemo wrote

An explainable model that was human-readable (e.g. a decision tree) would probably be protected by copyright.

As long as it is not just words (i.e. has a diagram) and is not just mathematics (i.e. maybe having a categorical variable might be sufficient), you might be able to get a patent.

−2

djc1000 t1_j1ym2n5 wrote

No, no it wouldn't. Copyright protects the product of human expression. A set of trained weights is not a human creative expression.

8

solresol t1_j1yqgub wrote

For most algorithms (neural networks, neighbour methods, linear methods, SVMs) I would agree, but something like a small decision tree could have enough human input to show there was "substantial human input" relative to the computer-generated part. Perhaps the author could argue that they tried a few different depths or loss functions to achieve a particular aesthetic result, or that they manually pruned the tree afterwards for some purpose.

Also, a graphical manifestation of that decision tree would be copyrightable, because there are many human-made choices in its display, particularly if it is designed as a tool for human beings to use to perform inferences. (Again, there's probably no equivalent copyrightable graphical manifestation in other ML techniques, so this wouldn't apply there either.)

But the bulk of your point is true: in all jurisdictions, almost all ML models are not protected by copyright.

1

PassionatePossum t1_j231qxj wrote

We protect our models with TPMs. The model is stored on the device in encrypted form, under a device-specific key. During boot-up, the TPM compiles the state of the system into a hash value and only then decrypts the model. If the system or the software running on it has been modified, decryption fails.

The nice thing about TPMs is that the key is write-only: nobody gets to see it except the TPM itself.
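A loose Python sketch of that boot-time flow; the actual sealing/unsealing happens inside the TPM via platform tooling, so `unseal_key_from_tpm()` here is a stand-in, not a real API:

```python
import io

import torch
from cryptography.fernet import Fernet

def unseal_key_from_tpm() -> bytes:
    # Stand-in: ask the TPM to release the sealed key. The TPM only
    # complies if the measured boot state (PCR values) matches the
    # state the key was sealed against; any tampering means no key.
    raise NotImplementedError

def load_model_at_boot(path: str) -> torch.nn.Module:
    key = unseal_key_from_tpm()
    with open(path, "rb") as f:
        weights = Fernet(key).decrypt(f.read())  # raises if key is wrong
    return torch.load(io.BytesIO(weights), map_location="cpu")
```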

However, be careful not to use GPLv3 licensed software. GPLv3 not only requires you to open-source the software (which is something I could live with) but also demands complete access to the hardware you are running it on (which is completely bonkers).

5

nexflatline OP t1_j23hkwn wrote

That's exactly the type of suggestion I was looking for, and backed by real-life experience. I will look more into it and see how it would work in our situation. Thank you very much.

1

bubudumbdumb t1_j1ypwvd wrote

I think the contract and the end-user licence agreement are your best bet in terms of IP.

Some time ago I read some research from Bocconi concluding that very few industries (pharma being one) are happy with how IP protects their competitive advantage. So my suggestion is to think about how to protect your competitive advantage, not your intellectual property.

Even if you don't deploy the model on your customers' hardware, there is still the risk that the model gets used "as a service" by a competitor to create a synthetic dataset (this is one of the risks my team is worried about).

4

AmbulatingGiraffe t1_j1y2zs7 wrote

I don't have a quick answer, but you might want to look into how the video game industry deals with preventing decompilation. If the code uses a Python-based framework you might be out of luck, but if it's possible to use a compiled language there are more options available to you.

3

nexflatline OP t1_j1y6xw2 wrote

>if the code is using a python based framework then you might be out of luck

Unfortunately that's the case.

3

malkocb t1_j1ycwd7 wrote

So if your code/model is valuable to you, I would suggest porting it to something compiled, like C++.

5

I_will_delete_myself t1_j21349o wrote

What about something like Codon, which compiles Python code?

1

AmbulatingGiraffe t1_j21ojdt wrote

Honestly I don’t know enough about it to provide an informed comment. Maybe worth looking at for OP.

2

ed_mercer t1_j1yw3ao wrote

Password-protect and encrypt the mini-PC, and only provide access to the model through a local API.
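For instance, something like this minimal FastAPI sketch, where the model object stays inside the service and the mobile app only ever sees predictions (endpoint shape and model names are made up):

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.load("model.pt", map_location="cpu")  # lives only server-side
model.eval()

class Frame(BaseModel):
    pixels: list[float]  # flattened input; a real API would take video frames

@app.post("/predict")
def predict(frame: Frame) -> dict:
    x = torch.tensor(frame.pixels).unsqueeze(0)  # add batch dimension
    with torch.no_grad():
        label = int(model(x).argmax(dim=1))
    return {"label": label}  # only the answer leaves the box, never weights
```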

3

nexflatline OP t1_j21f9w0 wrote

That is a great idea and it may work. I'll have to check the technical challenges of implementing it, though. Thanks.

1

gradientpenalty t1_j21o2qt wrote

This is a bit tangential to your topic, but I don't think the weights themselves are all that useful. The core value, the IP you need to protect, is how you obtained the weights in the first place: the proprietary training data and the training pipeline. If your model can be trained from a Hugging Face transformer model using the transformers training pipeline plus a dataset from the hub, I don't think it's all that "intellectual" to begin with.

It's like Windows 7 or 8, where piracy couldn't be avoided. Microsoft's real value is its engineers and its experience developing an operating system for billions of different hardware configurations, which is what lets it build the next version of Windows.

3

nexflatline OP t1_j22fdhy wrote

In the long term we do believe our strength is in the data and training, which is very hard to acquire. But, at least at the early stages, we would like to avoid someone reusing our model as it is.

2

JustOneAvailableName t1_j1yk4ya wrote

Encrypt the drive. Be the only one with password access to the machine. And don't return the model's exact outputs.
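The last point is about making the box useless for distillation; a tiny sketch of what "not exact" could mean in practice — return only the decision, never the full logits:

```python
import torch

def coarsened_prediction(logits: torch.Tensor) -> dict:
    """Return only what the end user needs: the top class and a
    roughly-bucketed confidence. Withholding the full logits makes it
    much harder for a competitor to distill the model from its answers."""
    probs = torch.softmax(logits, dim=-1)
    conf, label = probs.max(dim=-1)
    return {"label": int(label), "confidence": round(float(conf), 1)}
```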

2

PredictorX1 t1_j1yn6sz wrote

Perhaps the model could be adulterated in some way that requires a reversal step, with the reversal calculated remotely? My thought is that the locally stored model would be unusable on its own, and the unlocking would be a simple hash function or something similar, requiring minimal telecommunications bandwidth. Similarly, if computation of the model could be divided into parts that require assembly in a final step, that could be worked the same way.
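One hypothetical way to read "adulterated": ship the weights with a seeded random mask added, and have the server send only the small seed needed to subtract it at load time. Everything in this sketch is illustrative:

```python
import torch

def mask_for(shape: torch.Size, seed: int) -> torch.Tensor:
    # The mask is fully determined by a small seed, so "unlocking"
    # costs a few bytes of bandwidth, not a weight-sized download.
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=gen)

def unlock_state_dict(masked: dict, seed: int) -> dict:
    """Subtract the per-tensor mask from the shipped (useless) weights."""
    return {name: w - mask_for(w.shape, seed) for name, w in masked.items()}

# At packaging time: masked[name] = w + mask_for(w.shape, seed)
# At run time: fetch `seed` from the licensing server, then unlock.
```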

2

bitemenow999 t1_j1yxhfy wrote

You do realize the same results can be achieved irrespective of the "model"; by changing the number of neurons in one layer you are essentially creating a new model...

"Protecting" your model doesn't make sense unless it has some new type of math involved in which case you can patent the method.

What you can do is not disclose the training method (if it is somewhat unique) or not share the training data. Or you can wrap it up as software and copyright that.

2

I_will_delete_myself t1_j21qqpj wrote

You could use LibTorch to interact with Python-trained models in C++. Don't expect it to be as easy as ONNX Runtime, but it's something to work with.
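The Python-side half of that workflow is just a TorchScript export; a minimal sketch, assuming a standard `nn.Module` saved with `torch.save`:

```python
import torch

model = torch.load("model.pt", map_location="cpu")
model.eval()

# Trace with a dummy input shaped like real data, then save an archive
# that LibTorch (torch::jit::load in C++) can run without any Python.
example = torch.randn(1, 3, 1080, 1920)  # e.g. one full-HD RGB frame
scripted = torch.jit.trace(model, example)
scripted.save("model_traced.pt")
```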

2

Professional-Ebb4970 t1_j1zvd1b wrote

Did you train using a public dataset and public ML techniques? If so, the model is not your intellectual property regardless of what any country may say

1

nexflatline OP t1_j21dtkg wrote

A private dataset that we acquired and labeled ourselves.

1

aidenr t1_j1y2fqj wrote

Don't share the dataset that you paid entirely to create, and then your model won't be reproducible. Also, don't run your model in the cloud. If you're using someone else's data or someone else's machines, what do you think you own, exactly? The algorithm? Patent it.

−1

nexflatline OP t1_j1y6woz wrote

The dataset is secured, no problems there, but someone could take the model and use it as-is. Our model is trained and runs on a popular open-source framework, which we advertise as a feature since many people already know how well it works. Our main "product" is the model itself, made by painstakingly labeling hundreds of thousands of videos by hand. Unfortunately, deep learning models are not considered algorithms here and cannot be patented at the moment. So all we can do is hide it.

3