Waking Up Publishing with Interactive Coffee Data


When coffee is sold as single origin or as the more expensive Arabica beans, do you really know whether you are getting what you're paying for? Coffee-producing regions need to enforce the standards and reputation of their coffee, and a growing industry is exploring different technologies to classify and test coffee beans from different origins more accurately. Researchers in Colombia at Universidad del Valle and Universidad del Atlantico, together with the company Almacafe, have taken steps toward making it easier for the industry to validate the variety under which coffee is sold. To do this, they analysed hundreds of coffee samples from multiple countries using highly sensitive Nuclear Magnetic Resonance (NMR) spectroscopy, and made these data freely available for broad, inexpensive, and interactive use, so that anyone can look at coffee and see 'what's in that cup'. A new paper in GigaByte allows just that, and does so while showcasing new interactive features that let you browse the coffee data within the publication itself. Lead author Julien Wist from Universidad del Valle (pictured) explains more in the author Q&A below.

Dr Julien Wist from Universidad del Valle, Cali, Colombia

Can you tell us about this interactive coffee dataset, and how it was collected?
This dataset was collected by Almacafe in Bogota and analysed at Universidad del Valle in Cali, Colombia. The Colombian Coffee Federation must enforce the Protected Geographical Indication (PGI) that protects the high quality standards of Colombian coffee. It has therefore supported several research projects that use different technologies to classify coffee beans from different origins. To my knowledge, this is one of the largest collections of coffee samples and spectra made publicly available to date.

What insight does NMR data give you, and how would you like people to use this data?
As just mentioned, NMR gave us information about the origin of coffee. It also became apparent very quickly that NMR can give accurate information about coffee quality, although this was not the primary goal of that research. Roasting is very important, as it can ruin the best beans, but it is impossible to make good coffee out of bad beans. Our research group had a wonderful time working with coffee samples. For once, the whole lab smelled nice, and we could properly cup the samples we were analysing! Almacafe introduced us to the coffee business, to cupping and coffee quality, and to coffee farming; it was an amazing journey into the world of coffee. The sample preparation is so simple that we just prepared coffee: cold for the magnet and hot for us!

NMR spectra can reach such resolution and accuracy that they make it possible to detect the impact of external conditions on the composition of coffee within a single experiment

Coffee is such a popular crop. What applications are there for using these data to produce a better-quality cup?
The next step is surely to use NMR and other techniques to follow sensory profiles and to monitor the outcome of research projects aiming to improve sensory scores, for instance through sophisticated fermentation processes. High-field NMR (which is also very expensive) can be used to explore and discover markers, and once the markers are known, the methods can be translated to cost-effective, low-field benchtop devices. NMR has been in the chemistry lab for half a century now, mainly for structure elucidation. Now we analyse more and more complex samples, with hundreds of compounds in a single experiment, with applications in medical research and agriculture. The non-destructive nature of NMR also makes it invaluable for studying dynamic processes in situ.

Tell us about NMRium, and what can readers get through browsing the different coffee spectra with it?
NMRium is the newest iteration of a project that started two decades ago to bring NMR spectra to the browser. In Figure 4, anyone can try to find the caffeine signal. Arabica beans have a lower caffeine content than Robusta, so it is possible to distinguish the two just by looking at the correct region. Try it for yourself! (Answer: look at the region between 7.83 and 7.87 ppm.)
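For readers who would like to run the same check outside the browser, the comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: it assumes a processed, chemically referenced 1D spectrum has been exported as two arrays (chemical shift in ppm and intensity), and it uses synthetic Gaussian peaks in place of real coffee spectra. The names `region_integral` and `gaussian`, the peak width, and the peak heights are hypothetical, and a real comparison would also require all spectra to be normalised in the same way.

```python
import numpy as np


def region_integral(ppm, intensity, lo=7.83, hi=7.87):
    """Trapezoidal integral of the spectrum between lo and hi ppm (hypothetical helper)."""
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = (ppm >= lo) & (ppm <= hi)
    if np.count_nonzero(mask) < 2:
        return 0.0  # not enough points in the window to integrate
    x, y = ppm[mask], intensity[mask]
    # NMR axes are usually stored in descending ppm order; sort so the area is positive
    order = np.argsort(x)
    x, y = x[order], y[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def gaussian(ppm, center, height, width=0.005):
    """Synthetic peak standing in for a real signal (assumed shape and width)."""
    return height * np.exp(-0.5 * ((ppm - center) / width) ** 2)


# Synthetic spectra: descending ppm axis, one peak inside the assumed caffeine window
ppm = np.linspace(10.0, 0.0, 20001)
sample_a = gaussian(ppm, 7.85, 1.0)  # hypothetical lower-caffeine sample
sample_b = gaussian(ppm, 7.85, 2.0)  # hypothetical higher-caffeine sample

a = region_integral(ppm, sample_a)
b = region_integral(ppm, sample_b)
print(f"A: {a:.5f}  B: {b:.5f}  ratio B/A: {b / a:.2f}")
```

Because the synthetic peak in sample B is twice as tall, its integral in the 7.83 to 7.87 ppm window is twice as large. On real, consistently normalised spectra, a smaller integral in this window would point towards lower caffeine content, as expected for Arabica.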

Example of the browsable spectra embedded in the paper.

Visualising data is often difficult and requires expensive software. As a consequence, data are often overlooked and simply fed into a black box. I think the first step should always be to look at the data. NMRium makes that possible in the browser, for free.

From exploring these datasets yourself, have you found anything interesting?
Well, we found that we can indeed tell Colombian coffee apart! You can keep buying your 100% Colombian coffee safely, or start doing so!

Making large datasets interactive directly within the article is possible because GigaByte uses new, custom-built, end-to-end publishing technology that can integrate interactive content. This increases trust in article content and moves scientific publishing beyond the current standard of static articles posted online, toward a living document. In addition to the NMR viewer in this article, GigaByte articles have included other data visualisation tools such as Hi-C maps, 3D imaging viewers that can run on VR headsets, interactive maps, interactive protocols, and even Executable Research Articles. Embedded interactive tools like these show what is now possible in publishing, and demonstrate a more hands-on way of sharing research that is better suited to communicating modern research and data.

For more on the interactive features of GigaByte, check out this video.

Further Reading
Osorio J et al. 1D and 2D NMR spectra of coffee from 27 countries. GigaByte, 2022 https://doi.org/10.46471/gigabyte.50