The idea for this R project came from a “pizza night” we had one day (many thanks to Darren). After five meat pizzas had been eaten and half of one vegetarian pizza was left over, analysing that pattern was just the idea I needed. Let’s look at superfoods from a data scientist’s perspective.
The origin of the “superfood”
The concept of the “superfood” is a popular one when it comes to food and health. The media is full of reports of ultra-healthy foods, from blueberries and beetroot to cocoa and salmon. These reports claim to reflect the latest scientific evidence, and assure us that eating these foods will give our bodies the health kick they need to stave off illness and aging. But is there any truth to such reports?
Despite its ubiquity in the media, however, there is no official or legal definition of a superfood. The Oxford English dictionary, for example, describes a superfood as “a nutrient-rich food considered to be especially beneficial for health and well-being”, while the Merriam-Webster dictionary omits any reference to health and defines it as “a super nutrient-dense food, loaded with vitamins, minerals, fibre, antioxidants, and/or phytonutrients”.
Criticism of the nomenclature
“There’s no such thing as a superfood. It’s nonsense: just one of those marketing terms,” says University College Dublin professor of nutrition Mike Gibney, throwing on the garb of Ireland’s superfood Grinch. “There is no evidence that any of these foods are in any way unusually good.”
“The European Food Safety Authority was created because the consumer was being conned by marketing people,” says Gibney. The authority bans health claims lacking scientific evidence, so you might find amazing health claims about superfoods in books and on websites, but you won’t on supermarket shelves.
What is the evidence?
In order to distinguish the truth from the hype, it is important to look carefully at the scientific evidence behind the media’s superfood claims. So what data should we use for analysis? What dimensions can be considered scientific?
The idea behind food supplements, also called dietary or nutritional supplements, is to deliver nutrients that may not be consumed in sufficient quantities. Food supplements can be vitamins, minerals, amino acids, fatty acids, and other substances delivered in the form of pills, tablets, capsules, liquid, etc. Supplements are available in a range of doses, and in different combinations. However, only a certain amount of each nutrient is needed for our bodies to function, and higher amounts are not necessarily better. At high doses, some substances may have adverse effects, and may become harmful. To safeguard consumers’ health, supplements can therefore only be legally sold with an appropriate daily dose recommendation, and a warning statement not to exceed that dose.
There is a lot of legislation concerning food supplements in Europe and America. Let’s start from vitamins and minerals, as the Food Safety Authority of Ireland has issued Guidance Note No. 21, “Food Supplements Regulations and Notifications”.
Taking microelements as the reference point, I created a table from the data source “Categories for Food Nutrition Labels” to rank foods by nutrient density.
| % DV per 100g | Calcium | Iron | Magnesium | Phosphorus | Sodium | Potassium | Zinc | Copper | Manganese | Selenium |
|---|---|---|---|---|---|---|---|---|---|---|
| Cola Carbonated beverage without caffeine | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 0% |
| Apples raw with skin | 1% | 1% | 1% | 1% | 0% | 3% | 0% | 1% | 2% | 0% |
| Alcoholic beverage beer regular BUDWEISER | 0% | 0% | 2% | 1% | 0% | 1% | 0% | 0% | 0% | 0% |
| Tea black brewed prepared with tap water | 0% | 0% | 1% | 0% | 0% | 1% | 0% | 1% | 11% | 0% |
| Lamb domestic shoulder whole lean 1/4Inch fat choice raw | 2% | 9% | 6% | 18% | 3% | 8% | 32% | 5% | 1% | 32% |
| Beef bottom sirloin roast lean and fat trimmed to 0Inch fat raw | 2% | 8% | 5% | 19% | 2% | 9% | 24% | 4% | 1% | 34% |
| Fish salmon Atlantic farmed raw | 1% | 2% | 7% | 24% | 2% | 10% | 2% | 2% | 1% | 34% |
| Chicken broilers or fryers leg meat and skin raw | 1% | 4% | 5% | 16% | 4% | 6% | 10% | 3% | 1% | 26% |
| Chicken broiler or fryers breast skinless boneless meat only raw | 1% | 2% | 7% | 21% | 2% | 10% | 5% | 2% | 1% | 33% |
| Bread white commercially prepared (includes soft bread crumbs) | 14% | 20% | 6% | 10% | 20% | 4% | 5% | 5% | 27% | 31% |
| Wheat flour whole-grain | 3% | 20% | 34% | 36% | 0% | 10% | 17% | 21% | 203% | 88% |
| Wheat soft white | 3% | 30% | 23% | 40% | 0% | 12% | 23% | 21% | 170% | 0% |
| Wheat bran crude | 7% | 59% | 153% | 101% | 0% | 34% | 48% | 50% | 575% | 111% |
| Wheat germ crude | 4% | 35% | 60% | 84% | 1% | 25% | 82% | 40% | 665% | 113% |
| Spices curry powder | 53% | 106% | 64% | 37% | 2% | 33% | 31% | 60% | 415% | 58% |
| Egg whole raw fresh | 6% | 10% | 3% | 20% | 6% | 4% | 9% | 4% | 1% | 44% |
| Lettuce iceberg (includes crisphead types) raw | 2% | 2% | 2% | 2% | 0% | 4% | 1% | 1% | 6% | 0% |
| Oranges raw all commercial varieties | 4% | 1% | 3% | 1% | 0% | 5% | 0% | 2% | 1% | 1% |
| Pineapple raw all varieties | 1% | 2% | 3% | 1% | 0% | 3% | 1% | 6% | 46% | 0% |
| Potatoes flesh and skin raw | 1% | 4% | 6% | 6% | 0% | 12% | 2% | 5% | 8% | 0% |
| Rice white short-grain raw | 0% | 24% | 6% | 10% | 0% | 2% | 7% | 11% | 52% | 22% |
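As a rough way to rank foods by nutrient density, the sketch below averages the % DV values across the ten minerals for five rows copied from the table above. Using the plain mean as the density score is my own assumption for illustration, and the names dv and density are hypothetical.

```r
# % DV per 100 g for five foods, copied from the table above
minerals <- c("Calcium", "Iron", "Magnesium", "Phosphorus", "Sodium",
              "Potassium", "Zinc", "Copper", "Manganese", "Selenium")
dv <- rbind(
  "Wheat bran crude"                = c(7, 59, 153, 101, 0, 34, 48, 50, 575, 111),
  "Wheat germ crude"                = c(4, 35, 60, 84, 1, 25, 82, 40, 665, 113),
  "Spices curry powder"             = c(53, 106, 64, 37, 2, 33, 31, 60, 415, 58),
  "Fish salmon Atlantic farmed raw" = c(1, 2, 7, 24, 2, 10, 2, 2, 1, 34),
  "Lettuce iceberg raw"             = c(2, 2, 2, 2, 0, 4, 1, 1, 6, 0)
)
colnames(dv) <- minerals

# Assumed score: the mean % DV across all ten minerals
density <- sort(rowMeans(dv), decreasing = TRUE)
print(round(density, 1))
```

A plain mean is dominated by single very large values, such as manganese in the wheat products, so a capped or median-based score might give a fairer ranking.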
To make the data more visually effective I used R. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R’s popularity has increased substantially in recent years.
R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages. Many of R’s standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made.
R’s data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists. The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices (ggplot2), import/export capabilities, reporting tools (knitr, Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++ and Fortran. A core set of packages is included with the installation of R, with more than 5,800 additional packages (as of June 2014) available at the Comprehensive R Archive Network (CRAN), Bioconductor, Omegahat, GitHub and other repositories.
The “Task Views” page (subject list) on the CRAN website lists a wide range of tasks (in fields such as Finance, Genetics, High Performance Computing, Machine Learning, Medical Imaging, Social Sciences and Spatial Statistics) to which R has been applied and for which packages are available. R has also been identified by the FDA as suitable for interpreting data from clinical research.
I installed R version 3.2.1 on my computer. I had prepared my food table in Excel, so before I could start working on my homework I had to export the data as comma-separated values (CSV), a format that R can read.
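A minimal sketch of the import step, assuming the spreadsheet was exported with the percentages as text such as "14%" (the post does not say how the cells were stored). It writes a two-row sample to a temporary file so that it runs on its own; the temporary file and the helper percent_to_number are hypothetical.

```r
# Write a tiny sample CSV to a temporary file (stand-in for the Excel export)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Food,Calcium,Iron",
  "Egg whole raw fresh,6%,10%",
  "Potatoes flesh and skin raw,1%,4%"
), csv_path)

# Read it back, keeping text columns as character rather than factor
food <- read.csv(csv_path, stringsAsFactors = FALSE)

# Hypothetical helper: turn "14%" into the number 14
percent_to_number <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))
food[-1] <- lapply(food[-1], percent_to_number)

str(food)
```

If the export already contains plain numbers, the conversion step is unnecessary. It matters when the values are text, because data.matrix() converts character columns to factor codes rather than to the numbers they spell out.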
The code samples below assume the data files are located in the R working directory, which can be found with the function getwd(). You can select a different working directory with the function setwd(), and thus avoid entering the full path of the data files. Note that the forward slash should be used as the path separator even on the Windows platform.
> setwd("<new path>")
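The working-directory functions can be tried out safely as follows. The sketch uses tempdir() only as a stand-in for a real project folder, and the file name food.csv is hypothetical.

```r
# Remember the current working directory so it can be restored
old_wd <- getwd()

# Switch to another directory; tempdir() is only a stand-in here
setwd(tempdir())
print(getwd())

# Build a relative path; file.path() joins the parts with forward slashes
example_path <- file.path("data", "food.csv")  # hypothetical file
print(example_path)

# Go back to where we started
setwd(old_wd)
```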
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
To add a package, follow these steps:
- Download and install the package (you only need to do this once).
- To use the package, call library(package) to load it into the current session. (You need to do this once in each session, unless you customize your environment to load it automatically each time.)
On MS Windows:
- Choose Install Packages from the Packages menu.
- Select a CRAN Mirror. (e.g. Ireland)
- Select a package. (e.g. gplots)
- Then use the library(package) function to load it for use. (e.g. library(gplots))
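The same install-then-load steps can also be done from the R console on any platform. The helper below, load_or_install, is a hypothetical name of my own; it is a sketch that installs a package only when it is missing and then loads it.

```r
# Hypothetical helper: install a package only when it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers a download
load_or_install("stats")

# For the package mentioned above one would call: load_or_install("gplots")
```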
To visualise my data I used this code in R:
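# Read the table exported from Excel
data <- read.csv("microelement food list post.csv")
# First column holds the food names, columns 2 to 11 the ten minerals
rnames <- data[,1]
mat_data <- data.matrix(data[,2:11])
rownames(mat_data) <- rnames
# 650-step greyscale ramp from white (low) to black (high)
my_palette <- colorRampPalette(c("white", "black"))(n = 650)
# Draw the heatmap without reordering or rescaling rows and columns
data_heatmap <- heatmap(mat_data, Colv=NA, Rowv=NA, col = my_palette, scale="none")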
I saved the plot as a JPEG file at 100% quality:
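The post does not show how the image was saved, so the following is only one possible way: a sketch using the base jpeg() graphics device with quality = 100. The small matrix m (four mineral values for lamb and beef from the table) and the output file name are stand-ins so that the example runs without the CSV file.

```r
# Small stand-in matrix so the example runs without the CSV file
m <- matrix(c(2, 9, 6, 18,
              2, 8, 5, 19),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Lamb", "Beef"),
                            c("Calcium", "Iron", "Magnesium", "Phosphorus")))

out_file <- file.path(tempdir(), "heatmap.jpg")  # hypothetical file name

# Only attempt this when the R build supports the JPEG device
if (capabilities("jpeg")) {
  jpeg(out_file, width = 800, height = 600, quality = 100)
  heatmap(m, Colv = NA, Rowv = NA, scale = "none",
          col = colorRampPalette(c("white", "black"))(50))
  dev.off()  # close the device so the file is written to disk
}
```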
Then I played with different colours and palettes:
my_palette <- colorRampPalette(c("snow", "yellow", "orange", "brown", "black"))(n = 650)
data_heatmap <- heatmap(mat_data, Colv=NA, Rowv=NA, col = my_palette, scale="none")
The heatmap in red was created with a ColorBrewer palette. The result looks different from the previous ones, because this palette has only 9 colours and the order of the rows and columns was changed automatically.
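The red heatmap can be approximated as follows. The post names neither the package nor the palette, so using the nine-colour "Reds" palette from the RColorBrewer package is an assumption, and the matrix m is a small stand-in built from four table values for lamb, beef and salmon.

```r
# Use the ColorBrewer "Reds" palette if RColorBrewer is installed;
# otherwise fall back to a similar 9-step ramp from base R
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  reds <- RColorBrewer::brewer.pal(9, "Reds")
} else {
  reds <- colorRampPalette(c("white", "red", "darkred"))(9)
}

# Stand-in matrix with values taken from the table
m <- matrix(c(2, 9, 6, 18,
              2, 8, 5, 19,
              1, 2, 7, 24),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Lamb", "Beef", "Salmon"),
                            c("Calcium", "Iron", "Magnesium", "Phosphorus")))

# Rowv = NA and Colv = NA keep the original order of rows and columns
heatmap(m, Colv = NA, Rowv = NA, col = reds, scale = "none")
```

When Rowv and Colv are left at their defaults, heatmap() clusters the rows and columns and reorders them to match the dendrograms, which may be why the positions changed in the red version.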
Why did the vegetarian pizza lose out to the meat alternatives?
Generally speaking, superfoods are foods, especially fruits and vegetables, whose nutrient content confers a health benefit above that of other foods. Let’s see whether fruits and vegetables really have more nutritional benefits than meat, fish or spices.
As we can see from the table, vegetables have a much lower nutritional value than meat or grains, at least in terms of mineral content.
From this table we can see that the real “superfoods” are wheat germ, curry powder and wheat bran. However, different foods have different microelement densities: among the meats in the table, for example, lamb is the highest in zinc.
The time for this project was limited. By adjusting the data and comparing like with like (raw with raw, oils and fats, etc.) we could find a superfood in each category.
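Comparing like with like could be sketched as below: each food gets a category and the highest-scoring food is picked within each one. The category labels are my own grouping, the shortened food names are for readability, and mean_dv is the mean % DV across the ten minerals computed from the table rows, which is an assumed score rather than an established measure.

```r
# Mean % DV across the ten minerals, computed from rows of the table
foods <- data.frame(
  food = c("Lamb shoulder raw", "Salmon Atlantic farmed raw",
           "Chicken breast raw", "Wheat bran crude", "Wheat germ crude",
           "Rice white short-grain raw", "Apples raw", "Oranges raw",
           "Pineapple raw"),
  category = c(rep("meat and fish", 3), rep("grains", 3), rep("fruit", 3)),  # assumed grouping
  mean_dv = c(11.6, 8.5, 8.4, 113.8, 110.9, 13.4, 1.0, 1.8, 6.3),
  stringsAsFactors = FALSE
)

# Split by category and keep the row with the highest score in each group
best <- do.call(rbind, lapply(split(foods, foods$category),
                              function(g) g[which.max(g$mean_dv), ]))
print(best)
```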