Submitted by Tigmib t3_10awo8f in MachineLearning

Hey guys,

I work in computer science for agricultural research. I deal with algorithms that monitor crop conditions and try to simulate what the resulting yield will be.

I am focussing on ML-based methods, but data in agriculture can be quite a limiting factor. If you have 100k samples from real crop fields, that's a lot! So we are not like ChatGPT, which just used 500bn word samples to train its model.

To overcome the issues of small data + ML, I want to set up an approach that combines ML methods (learning from data) with expert knowledge.

What do I mean by this: e.g. everybody knows that if you do not water your plant, it will die. Or if it is 90 °C outside, the plant will just burn. This knowledge is partially stored in so-called "crop simulation models" designed by agronomy experts, and my idea was to use these expert models to generate synthetic yield data and feed it into the training dataset for the ML models.

For me that will somehow result in an approach of "constrained machine learning" where I combine both. However, do some of you have any other ideas for how ML and expert models could be combined, or how the knowledge could be injected into ML methods, other than via the training dataset?

I am happy to hear your suggestions!

19

Comments


PredictorX1 t1_j47qgxr wrote

Expert knowledge could be encoded as rules whose output is used as features for a machine learning system. These rules would accept data you already have, and produce new data as conclusions which would be fed as extra variables to a modeling algorithm.

8

Meddhouib10 t1_j4andjc wrote

Do you have any paper in mind that discusses this?

1

PredictorX1 t1_j4azldr wrote

No, but the idea is pretty straightforward. Assuming that experts can provide domain knowledge that can be coded as conditions or rules (IF engine_temperature > 95 AND coolant_pressure < 12 THEN engine_status = "CRITICAL"), these can be used to generate 0/1 flags based on existing data to augment the training variables.
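As a toy sketch of that flag idea (the rule and thresholds are just the illustrative example above):

```python
# Hypothetical example: turn an expert rule into a 0/1 feature flag
# that augments the raw training variables.
def critical_engine_flag(engine_temperature, coolant_pressure):
    """Encodes: IF temp > 95 AND pressure < 12 THEN status = CRITICAL."""
    return 1 if engine_temperature > 95 and coolant_pressure < 12 else 0

# Raw samples: (engine_temperature, coolant_pressure)
samples = [(98.0, 10.5), (90.0, 14.0), (99.0, 13.0)]

# Augment each sample with the rule's conclusion as an extra variable.
augmented = [(t, p, critical_engine_flag(t, p)) for t, p in samples]
# augmented -> [(98.0, 10.5, 1), (90.0, 14.0, 0), (99.0, 13.0, 0)]
```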

This can be made much more complex by using actual expert systems or fuzzy logic. There are entire sections of the technical library for those. For fuzzy logic, I would recommend:

"The Fuzzy Systems Handbook"

by Earl Cox

ISBN-13: 978-0121942700

3

Tigmib OP t1_j4bcr6q wrote

Thanks for that suggestion! Yeah, I had thoughts about this. The problem is that crop growth probably doesn't have such binary outcomes as an engine status... Maybe a very simple "rule" (e.g. a function of water access and crop yield) could be added to the loss function. If this simple expert rule outputs a high probability that the plant died (and yield = 0), all y_train could be set to 0 as well... However, crop growth relies on so many events that happen during the season that this would mean implementing many, many rules...
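A minimal sketch of what that loss-function idea could look like (all names, numbers and the penalty weight are hypothetical):

```python
import numpy as np

# Ordinary squared error plus a penalty whenever the model predicts a
# positive yield for a field the expert rule says must have died.
def rule_constrained_loss(y_pred, y_true, water_access, penalty=10.0):
    mse = np.mean((y_pred - y_true) ** 2)
    died = (water_access == 0)          # expert rule: no water => dead
    violation = np.mean(y_pred[died] ** 2) if died.any() else 0.0
    return mse + penalty * violation

y_true = np.array([2.0, 0.0])
water = np.array([1, 0])
good = rule_constrained_loss(np.array([2.0, 0.0]), y_true, water)  # no penalty
bad = rule_constrained_loss(np.array([2.0, 3.0]), y_true, water)   # penalized
```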

1

ndemir t1_j474eox wrote

When I have a similar doubt, I ask myself: "Forget ML; will statistics help you? Will just defining some rules help you?" People in that industry already have some kind of idea of how to predict, so learn their rules. By the way, I am not suggesting that you should not use ML. I am just asking you to look from a different angle.

7

Tigmib OP t1_j4bbsnf wrote

Thanks, yes, that is true. In recent days I have had a look into Bayesian statistics. That might be an alternative to pure ML that I am considering right now.
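As a toy illustration of how Bayesian statistics can encode expert knowledge (the numbers are made up): an expert prior on typical yield gets updated by a handful of real field samples, using the standard conjugate normal update.

```python
import numpy as np

# Model: yield ~ Normal(mu, sigma^2) with known sigma,
# prior on the mean: mu ~ Normal(m0, s0^2).
def posterior_mean(data, m0, s0, sigma):
    n = len(data)
    precision = 1 / s0**2 + n / sigma**2
    mean = (m0 / s0**2 + np.sum(data) / sigma**2) / precision
    return mean, np.sqrt(1 / precision)

# Expert prior: typical yield around 5 t/ha, fairly uncertain (s0 = 1).
data = np.array([4.0, 4.5, 4.2])   # only three real field samples
mu, sd = posterior_mean(data, m0=5.0, s0=1.0, sigma=0.5)
# The posterior mean sits between the prior (5.0) and the data mean (~4.23).
```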

1

fudec t1_j48ku4i wrote

Hi! There is a relatively new paradigm, "physics-informed machine learning".

Here is a nice review of the different techniques:

https://www.nature.com/articles/s42254-021-00314-5

The most popular approach is based on physics regularization on neural networks.
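A minimal sketch of that regularization idea (hypothetical toy data; a linear model stands in for the neural network): the known constraint "zero water means zero yield" is penalized during training alongside the ordinary data loss.

```python
import numpy as np

# Fit yield = w * water + b on a few real samples, while penalizing
# violations of the expert constraint "zero water => zero yield",
# i.e. the intercept b should be 0.
X = np.array([1.0, 2.0, 3.0])          # water
y = np.array([2.5, 4.0, 5.5])          # observed yield
w, b = 0.0, 0.0
lam = 1.0                              # weight of the knowledge penalty
for _ in range(2000):
    pred = w * X + b
    gw = 2 * np.mean((pred - y) * X)   # gradient of the MSE data loss
    gb = 2 * np.mean(pred - y)
    gb += lam * 2 * b                  # gradient of the constraint term b^2
    w -= 0.01 * gw
    b -= 0.01 * gb
# The penalty pulls b toward 0, so the fit respects the expert constraint.
```

Without the penalty the least-squares fit here has intercept 1.0; with it, b shrinks substantially while w adjusts to compensate.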

PS: a link to the paper is offered by the author:

https://www.researchgate.net/publication/351814752_Physics-informed_machine_learning

6

Tigmib OP t1_j4bd1xk wrote

Thanks a lot! That looks like a very interesting approach! I will have a detailed look into it!

2

trnka t1_j488v5u wrote

You might try Snorkel. The gist is that domain experts write rules and those rules are fed into ML. If that company doesn't work, I'm pretty sure there are alternatives. Or maybe they had their work in a Python library... it's been a while.
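The gist can be sketched in plain Python (this is not the Snorkel API; a simple majority vote stands in for Snorkel's learned label model, and all rules are made up):

```python
# Domain experts write labeling functions; their noisy votes are
# combined into a training label for the ML model.
ABSTAIN = None

def lf_no_water(sample):          # expert rule 1
    return "dead" if sample["water_mm"] == 0 else ABSTAIN

def lf_heat(sample):              # expert rule 2
    return "dead" if sample["temp_c"] >= 90 else ABSTAIN

def lf_healthy(sample):           # expert rule 3
    ok = 10 <= sample["temp_c"] <= 35 and sample["water_mm"] > 20
    return "alive" if ok else ABSTAIN

def majority_label(sample, lfs):
    votes = [v for v in (lf(sample) for lf in lfs) if v is not ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

lfs = [lf_no_water, lf_heat, lf_healthy]
label = majority_label({"water_mm": 0, "temp_c": 25}, lfs)  # "dead"
```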

Compared to traditional ML, the benefit is that you're involving the subject matter experts more and giving them a say more directly. That tends to ensure that they're bought in to the approach. Having been in healthcare ML for a while, getting buy-in can be very challenging.

2

Maggemkay t1_j48o8cw wrote

I'm looking into something similar, essentially combining data-driven ML with a knowledge base, but in the context of explainable AI and predictive maintenance.

I have stumbled across something called "Logic Tensor Networks" (search for the paper) which might help in your situation. I need to look into it more, but it combines ML + knowledge bases + fuzzy logic.
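From what I understand, the fuzzy-logic ingredient looks roughly like this (a sketch, not the LTN library's actual API): predicates return truth degrees in [0, 1], and a rule's satisfaction becomes a differentiable score that can feed into a training loss.

```python
def fuzzy_not(a):
    return 1.0 - a

def fuzzy_and(a, b):
    return a * b                        # product t-norm

def fuzzy_implies(a, b):
    return min(1.0, 1.0 - a + b)        # Lukasiewicz implication

# "IF the field is dry THEN the plant is stressed"
dry = 0.9              # truth degree, e.g. from a learned predicate
stressed = 0.8
satisfaction = fuzzy_implies(dry, stressed)
# (1 - satisfaction) could be added to the loss as a knowledge penalty.
```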

Hope you find a solution!

1

Tigmib OP t1_j4bdhpa wrote

Hi, interesting! Do expert models exist for your problem already or would it be only the knowledge database you want to combine?

1

Maggemkay t1_j4bms1x wrote

There might be general existing models that I can fit to my problem, but I haven't looked into it yet.

I'm interested regardless of whether they already exist and whether I can combine them with other data sources.

1

Cherubin0 t1_j48txr9 wrote

Or you could use ML to predict the error of the rule-based system or simulation.
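A toy sketch of that residual idea (the simulator and all numbers are made up):

```python
import numpy as np

# Let the expert model make the base prediction and train ML only on
# its error. The "simulator" here is a stand-in for a real crop model.
def crop_simulator(water):
    return 2.0 * water                      # simplistic expert yield rule

water = np.array([1.0, 2.0, 3.0, 4.0])
observed = np.array([2.5, 4.5, 6.5, 8.5])  # real yields

residual = observed - crop_simulator(water)   # what the simulator misses
# Fit the residual with a (here trivial) model: a constant offset.
correction = residual.mean()
final_prediction = crop_simulator(water) + correction
```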

1

currentscurrents t1_j499l3p wrote

Are you trying to do research, or solve a problem? Building expert systems out of neural networks is still a new, experimental idea. If you just want to get the job done you may want to pick more proven methods.

1

Tigmib OP t1_j4bbxyz wrote

I would say both. I have an actual problem (to predict crop yield as accurately as possible), but the way there is definitely a research problem... What proven methods were you thinking of?

1

idly t1_j48gglo wrote

Look into hybrid modeling; there are multiple ways to do this.

0

janpf t1_j4ai7ia wrote

If you use synthetic data (from the crop simulation models), the model will kind of reverse-engineer it (it will learn what the simulation models are doing).

Using a mix of it with real-world data is like regularizing your model (adding a prior toward the simulation rules).

This is something that makes sense, and mixing data is often done. But "making sense" doesn't necessarily mean it helps... that depends a lot on your application. The next question is how much synthetic data you want to mix in... fundamentally you'll have to figure it out by trial and error, with some way of measuring whether things are getting better for whatever your extrinsic goal is (your business objective).
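A sketch of what that mixing could look like in practice (the arrays are random placeholders; the candidate ratios are arbitrary):

```python
import numpy as np

# The mix ratio acts like a regularization strength and usually has to
# be tuned by trial and error against a real-data hold-out set.
rng = np.random.default_rng(42)
X_real, y_real = rng.normal(size=(100, 3)), rng.normal(size=100)
X_syn, y_syn = rng.normal(size=(100, 3)), rng.normal(size=100)

def mixed_dataset(mix_ratio):
    """mix_ratio: fraction of the synthetic pool appended to the real data."""
    n_syn = int(mix_ratio * len(X_syn))
    X = np.vstack([X_real, X_syn[:n_syn]])
    y = np.concatenate([y_real, y_syn[:n_syn]])
    return X, y

for ratio in (0.0, 0.25, 0.5):           # candidate ratios to evaluate
    X_mix, y_mix = mixed_dataset(ratio)  # train + score on real hold-out here
```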

0

Tigmib OP t1_j4bezs5 wrote

Yes, that's true. This is also what I thought about. Using a mixed dataset or transfer learning approaches (first train on synthetic data, then retrain on real-world data) should incorporate the domain knowledge. But you are right, right now that's just a hypothesis... but I will test it!
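A toy sketch of the transfer-learning variant (synthetic numbers; a linear model trained by gradient descent stands in for the real model):

```python
import numpy as np

# Pretrain on simulator output, then continue from those weights on
# the small real dataset instead of starting from scratch.
def train(X, y, w, epochs=500, lr=0.05):
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient step
    return w

rng = np.random.default_rng(1)
X_syn = rng.normal(size=(500, 2))
y_syn = X_syn @ np.array([2.0, -1.0])    # "simulator" relationship
X_real = rng.normal(size=(20, 2))
y_real = X_real @ np.array([2.2, -0.9])  # slightly different real world

w = train(X_syn, y_syn, w=np.zeros(2))      # pretrain on synthetic data
w = train(X_real, y_real, w=w, epochs=100)  # fine-tune on real fields
```

The fine-tuning stage only has to close the small gap between the simulator's relationship and the real one, rather than learning it from 20 samples alone.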

2