Soil sensor technology: you are only as good as your database
AgroCares is often confronted with questions about the quality of the results of our soil testing technology. But what is quality? Is it accuracy, precision or repeatability? And when is quality assured? Dr Christy van Beek, chief agronomist at AgroCares, shares her view on the matter.
After 17 years as a researcher at Wageningen University and Research Center, I joined SoilCares (now AgroCares) in late 2017. It was an opportunity I could not resist: it seemed my one chance to put my scientific knowledge into practice.
AgroCares develops soil testing technology based on sensor data. It struck me as the game changer that could put soil information into the hands of farmers. After all, isn’t information the precursor of motivation, and doesn’t motivation lead to implementation?
It was somewhat astonishing to find that selling the equipment was more difficult than I had expected. Call it naive, but I thought that such a great innovation would sell itself. Yet potential clients nearly always challenge us with questions about quality.
And that is where the fun starts. What is quality? Is it accuracy, precision or repeatability? And when is quality assured? When the results equal those of a wet chemistry laboratory? Certainly not. Numerous studies have confirmed the relatively large deviation between conventional laboratories analysing similar soil samples, and worldwide proficiency testing schemes have been set up to monitor differences between soil laboratories (see e.g. WEPAL).
So requiring a ‘good match’ with an arbitrary conventional laboratory is like requiring the AgroCares technology to hit a moving target. Because of this deviation between conventional laboratories, AgroCares decided to analyse all calibration samples in its own laboratory, which we call (a bit pretentiously, I have to admit) the “Golden Standard Laboratory” (GSL). We analyse all samples in this laboratory not because we think it is much better than other laboratories (nor is it worse), but because it is always the same laboratory.
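To make the ‘moving target’ concrete: proficiency schemes commonly score each laboratory against the consensus of all participating laboratories. The sketch below assumes a robust z-score (median as consensus, scaled median absolute deviation as spread) and the usual |z| thresholds of 2 and 3; WEPAL’s exact procedure may differ, and the function names are illustrative only.

```python
from statistics import median


def robust_z_scores(results: dict[str, float]) -> dict[str, float]:
    """Score each laboratory's result for one sample against the group consensus.

    The consensus is the median of all reported results; the spread is the median
    absolute deviation (MAD) scaled by 1.4826, which makes it comparable to a
    standard deviation for normally distributed data.
    """
    values = list(results.values())
    consensus = median(values)
    spread = 1.4826 * median(abs(v - consensus) for v in values)
    if spread == 0:
        raise ValueError("all laboratories agree; z-scores are undefined")
    return {lab: (v - consensus) / spread for lab, v in results.items()}


def classify(z: float) -> str:
    """Common convention: |z| <= 2 satisfactory, 2 < |z| < 3 questionable, else unsatisfactory."""
    if abs(z) <= 2:
        return "satisfactory"
    if abs(z) < 3:
        return "questionable"
    return "unsatisfactory"
```

Because the consensus itself shifts with every round of participants, a laboratory can be ‘satisfactory’ and still differ noticeably from another satisfactory laboratory, which is exactly why a single reference laboratory is attractive for calibration.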
In the GSL all samples are analysed for 93 parameters. These include chemical elements, but also a range of other parameters such as texture, EC and pH. For each of these parameters a statistical model is run to find correlations between the GSL values and the results from the sensor laboratory, i.e. the spectrograph. I certainly sell my colleagues in the research department short when I say ‘a statistical model’... I have had my share of statistics, but this is way beyond my understanding, especially when ‘machine learning’ comes into play. What matters is that at the end of the day (or rather at the end of the week, because the calculation takes one to two weeks of computing time) algorithms are developed that can predict the GSL values from the spectrograph. Not surprisingly, not every parameter has an equally good prediction model, and some are rather dreadful. So out of these 93 parameters, only between 30 and 49 are released, i.e. meet the quality criteria.
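The release step can be pictured as a filter over validation results. This is a minimal sketch, assuming each parameter is judged by the coefficient of determination (R²) between GSL values and spectral predictions on validation samples, with a hypothetical threshold of 0.7; the actual AgroCares release criteria are not stated here and are likely more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Validation:
    """Validation results for one soil parameter (hypothetical structure)."""
    reference: list[float]  # GSL (wet chemistry) values of validation samples
    predicted: list[float]  # values predicted from the spectra of the same samples


def r_squared(reference: list[float], predicted: list[float]) -> float:
    """Coefficient of determination of the predictions against the reference values."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values have no variance")
    return 1 - ss_res / ss_tot


def released_parameters(validations: dict[str, Validation], min_r2: float = 0.7) -> list[str]:
    """Names of parameters whose prediction model meets the (hypothetical) R² threshold."""
    return sorted(
        name
        for name, v in validations.items()
        if r_squared(v.reference, v.predicted) >= min_r2
    )
```

A ‘dreadful’ model, one that predicts little better than the average GSL value, ends up with an R² near zero and is simply not released.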
When potential clients are interested in using our products in a country that has not yet been calibrated, we have to discuss the calibration process. This is often a difficult discussion, and whether we make it to the end depends very much on the knowledge level of my counterpart.
Global database with local representation
AgroCares has a global soil database. This means that all data, from all over the world, are stored in one database and that the algorithms (prediction models) use all of these data. So what happens when a soil is sampled and its spectrum is sent to the cloud for analysis?
Basically, the spectrum enters the prediction model, which looks at its slopes, peaks and valleys and at neighbouring spectra. A neighbouring spectrum is not a geographical neighbour, but a spectrum that is similar to the spectrum at hand.
The information from the current spectrum and the neighbouring calibration spectra is used to find the best predictions. In the case of the LiaB, information from XRF and MIR is also combined to improve the predictions.
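The idea of ‘neighbours in spectral space’ can be illustrated with a distance-weighted nearest-neighbour prediction. This is a stand-in technique, not the AgroCares model: it assumes Euclidean distance between spectra, inverse-distance weighting of the k closest calibration samples, and a simple hypothetical `fuse` step that concatenates MIR and XRF readings after scaling each to unit length. Real pipelines usually also preprocess spectra (for example with derivatives, which capture the slopes), which this sketch skips.

```python
import math


def fuse(*blocks: list[float]) -> list[float]:
    """Concatenate sensor blocks (e.g. MIR and XRF), scaling each to unit length
    so that neither block dominates the distance just because of its units."""
    fused: list[float] = []
    for block in blocks:
        norm = math.sqrt(sum(x * x for x in block)) or 1.0
        fused.extend(x / norm for x in block)
    return fused


def predict_from_neighbours(spectrum, library, k=3):
    """Predict a reference value from the k most similar calibration spectra.

    library is a list of (spectrum, reference_value) pairs; similarity is
    Euclidean distance in spectral space, not geographical distance.
    """
    if not library:
        raise ValueError("empty calibration library")
    neighbours = sorted(
        ((math.dist(spectrum, ref), value) for ref, value in library),
        key=lambda pair: pair[0],
    )[:k]
    weighted_sum = 0.0
    total_weight = 0.0
    for distance, value in neighbours:
        if distance == 0:
            return value  # identical spectrum already in the library
        weight = 1.0 / distance  # closer spectra count more
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight
```

Because the library is global, a Kenyan spectrum may well borrow from a Brazilian neighbour if the two look alike.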
When a new country is calibrated there is no need to duplicate existing spectra, but how do we know whether the spectra of the new country are already present in the database? Well, we don’t know, but we can estimate. The spectrum of a soil sample depends on, for example, the geology, climate and land use of the site. In our calibration procedure we look for combinations of these factors that are not yet covered. Sometimes our procedure estimates that only a limited number of samples is needed. This happens, for instance, for neighbouring countries (and now I do mean physical neighbours). But it also happens that, after calibrating with the estimated number of samples, the release criteria are not yet met and we have to go back to take more samples. Because each new country builds on the spectra already in the database, the number of calibration samples is higher for countries that were calibrated first than for those that followed later.
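The coverage check boils down to a set difference. A minimal sketch, assuming each sample is labelled with a hypothetical (geology, climate, land use) stratum and that a fixed, hypothetical number of samples per uncovered stratum gives a first estimate; the real procedure is not described in this level of detail, and the estimate can still prove too low once the release criteria are checked.

```python
Stratum = tuple[str, str, str]  # hypothetical key: (geology, climate, land use)


def uncovered_strata(database: set[Stratum], new_country: set[Stratum]) -> set[Stratum]:
    """Combinations occurring in the new country that the database does not yet cover."""
    return new_country - database


def estimate_samples(database: set[Stratum], new_country: set[Stratum], per_stratum: int = 20) -> int:
    """First estimate of calibration samples: a fixed (hypothetical) number per uncovered stratum."""
    return len(uncovered_strata(database, new_country)) * per_stratum
```

For a physical neighbour of an already calibrated country, most strata are already in the database, so the estimate comes out small.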
Covering all the spectra in the world
Consequently, at some point the AgroCares calibration database will cover all possible spectra in the world. When we reach that stage, calibration is finished and the technology can be used anywhere. This point may not be very far off. The calibration database currently holds 14000 samples, accumulated over 21 updates; the database is updated every month. Since update 8 we have seen hardly any further improvement in the performance of the prediction model (Figure 2), indicating that we are approaching saturation. This concept also explains why we needed fewer samples to calibrate Ivory Coast than Kenya (two countries of about the same size) to reach the same level of accuracy: Kenya was calibrated before Ivory Coast.
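‘Hardly any further improvement’ can be made explicit as a plateau test on the validation score per update. This is an illustrative sketch only: it assumes one score per monthly update (higher is better) and a hypothetical minimum gain that counts as a real improvement; no scores from Figure 2 are reproduced here.

```python
from typing import Optional


def saturation_update(scores: list[float], min_gain: float) -> Optional[int]:
    """Return the first update (numbered from 1) after which no update improves
    the validation score by min_gain or more, or None if that never happens.

    scores[i] is the validation score (higher is better) after update i + 1.
    """
    for i in range(len(scores) - 1):
        later_gains = [scores[j] - scores[j - 1] for j in range(i + 1, len(scores))]
        if all(gain < min_gain for gain in later_gains):
            return i + 1
    return None
```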
We expect to reach global coverage at 30000 samples. At the current rate of 150 samples per week, the remaining 16000 samples would take a little over two years, so this should be within 3 years.
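The timeline follows from simple arithmetic on the figures above (14000 samples now, 30000 expected, 150 per week). The sketch assumes the rate stays constant and uses 52 weeks per year; it ignores the possibility that saturation lowers the number of samples actually needed.

```python
import math


def weeks_to_target(current: int, target: int, per_week: int) -> int:
    """Whole weeks needed to go from current to target samples at a constant rate."""
    remaining = max(0, target - current)
    return math.ceil(remaining / per_week)


weeks = weeks_to_target(14_000, 30_000, 150)
print(weeks, round(weeks / 52, 1))  # 107 2.1
```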
Think in terms of impact
Now, how does this relate to my ambition to put knowledge into the hands of farmers? A farmer never asks me for accuracy levels. The main thing I have learned in my first year at SoilCares (now AgroCares) and the SoilCares Foundation is that if you want to reach a farmer, you have to reach their influencer, in most cases the client that uses our technology. The client, for good reasons, wants to know the accuracy, and we try to be as transparent as possible. But please, when you, as a potential client, ask us for ‘the accuracy’ (which does not exist), look at the entire story.
And think in terms of impact: soil data is only one factor in developing fertiliser recommendations. Something that really surprises me is that no one ever asks me for crop nutrient contents and uptake rates, which are just as important for determining fertiliser recommendations.