Dr. María Angélica Rueda Calderón: “One of the most important issues when analyzing data is that the research be reproducible”

Lorenzo Palma, Science in Chile. “What I like most about data analysis in R is learning to identify the many types of errors that exist,” begins researcher María Angélica Rueda, who holds a doctorate in Agricultural Sciences from the National University of Córdoba, Argentina, and is a postdoctoral researcher at the Pontifical Catholic University of Valparaíso (PUCV).

The researcher is part of the faculty of the Diploma in Data Analysis with R and Reproducible Research for Biosciences, together with José Gallardo Matus, Doctor of Sciences with a specialization in Ecology and Evolutionary Biology. The experts point out that, during the course, students will develop fundamental skills for storing, reading, processing, analyzing, and presenting research results using R, RStudio, and R Markdown.

The researcher discusses data analysis in R and aspects of the diploma course (see program). She explains that data analysis is currently very important because it allows everyone who works in scientific research or in companies to publish their work together with its code and to replicate studies.

During her doctorate in Agricultural Sciences, the researcher began programming in R. “I was taking courses in R and data analysis. During the doctorate I began consulting work in agricultural and medical sciences, advising undergraduate, master’s and doctoral theses in which advanced programming techniques in R were used,” the expert recalls.

After completing her doctorate, she met Dr. José Gallardo, who suggested that she teach both the diploma and advanced courses in R. The diploma covers advanced data analysis techniques for biosciences. Angélica Rueda explains that anyone interested in learning data analysis with R can do so.

“All participants who take the Diploma will end up analyzing data with R. At first it may be a little more difficult, since it is a new language, but they will succeed. Some people come to the diploma with a data set to analyze but without necessarily knowing how to analyze it. The data may come from doctoral research, from companies, or from their own research. Some have never used R, and this may be their first time; they too will learn and receive guidance to analyze their data,” explained Angélica Rueda.

R, being free software, has a large community of users who share their code on the web, so any user can use and adapt that code to their needs. R stands out as one of the leading statistical programs in science. This is an advantage, since R has strong statistical support that many other programming languages lack.

The teacher currently works at a large multinational ice cream company, the fourth largest in the world. She says that, using data analysis in R, she wrote a complete program that forecasts sales of ice cream, frozen products and supplies. “I had to program all the sales forecasts of how much will be sold in the season; for this I have used time series and neural networks, among other methods.” She is in charge of the area, and her work demonstrates the uses and possibilities of data analysis in R. She also works on genome-wide association analysis and genomic prediction during her postdoctoral stay at PUCV.

Regarding the Diploma, she highlights that the format is very friendly for learning: “The methodology is very suitable for people who do not know how to program or have doubts about their processes.” That is why it is aimed at professionals or graduates in Agronomy, Biology, Marine Biology, Biochemistry, Biotechnology, Microbiology, Veterinary Medicine, and areas related to research with living organisms and biological resources.

“Companies have a vast amount of data that has not been analyzed because there are no people trained to analyze it, so this Diploma is an important opportunity to provide that much-needed training in this sector. Knowledge of R will not only count as professional experience; it can also bring new economic benefits and make them more competitive in their jobs. This is the latest that is being used in companies and research institutes.”

Dr. Angélica Rueda anticipates that, in this version of the course, she will teach random variables and probability distributions, R Markdown, evaluation of the assumptions of parametric tests, introduction to linear mixed models, introduction to general linear models, and logistic regression.

“In the diploma, both the statistical part and the programming part are studied in depth. At first it is a new language, and it is a bit difficult to identify possible errors in the process. That is why each class includes a practical exercise session, where we explain the code and what it does step by step. This space generates interaction with the students and resolves any doubts that arise. By the time they finish the diploma course, they will be able to program without any problem,” explained the specialist in data analysis in R.

“Everyone who completes this course will end up analyzing their data. Since they will have enough tools to find the appropriate analysis methodology, it is guaranteed that they will learn,” concluded Dr. Rueda.

