## More variables, more problems: rethinking fucoidan bioactivity by dimension reduction

Published: | October 7, 2020 |
---|

### Text

Fucoidans are a group of heterogeneous biopolymers with various bioactivities. These activities appear to be affected by season, species and region. The extraction method also affects the bioactivity, as different methods produce different fucoidan fractions. The multivariate nature of fucoidan makes its production for pharmaceutical use challenging. Modelling the bioactivity from known variables may give some insight into what activity we can expect from a given fucoidan extract. Principal component analysis (PCA) is a powerful tool in multivariate statistics. PCA shows whether the variables in a large data set are correlated, and whether redundant variables can be removed from the statistical model. The removal of redundant variables is also called dimension reduction. Here, we use PCA to see which variables are needed to describe the bioactivity of a given fucoidan extract.
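For readers less familiar with PCA, the idea can be written compactly. The notation below is not taken from the abstract and is only one common formulation: $x_{ij}$ is the value of variable $j$ in observation $i$, $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$, $n$ is the number of observations and $p$ the number of variables.

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
C = \frac{1}{n-1}\, Z^{\top} Z, \qquad
C\, v_k = \lambda_k\, v_k, \qquad
\text{share of variance of PC } k = \frac{\lambda_k}{\sum_{m=1}^{p} \lambda_m} = \frac{\lambda_k}{p}
$$

Because every standardized variable has variance 1, the eigenvalues $\lambda_k$ of the correlation matrix $C$ sum to $p$. A component with $\lambda_k < 1$ therefore carries less variance than a single original variable, which is why such components are candidates for removal.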

The data were provided by the FucoSan Research Group [1] and downloaded from their publicly available online database (https://doi.org/10.5281/zenodo.3876379). The open-source software R (version 4.0.1 for Windows) and RStudio were used for the statistical analysis of the FucoSan data.

Categorical variables (e.g. country and species) were one-hot encoded prior to data cleaning. The FucoSan data were cleaned by removing rows with missing entries and, since PCA requires scaling, all rows containing the value 0 also had to be removed. As a consequence, the variable 'xylose' had to be removed entirely from the dataset, as it contained many 0 entries. All 0 values in the other variables were removed by setting 0 = NA and using the `na.omit()` function in R. Redundant variables (variables explaining similar properties) were removed. The response variables (bioactivity results) were not included in the initial modelling of the data. After data cleanup, the remaining data contained 31 observations of 10 variables.
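The cleaning steps described above could look roughly like the following R sketch. The data frame, its column names and its values are invented for illustration and are not the FucoSan data. The sketch also assumes that the 0-to-NA replacement is applied only to the measured variables, since one-hot columns contain 0 by construction.

```r
# Toy data; column names and values are hypothetical placeholders.
raw <- data.frame(
  species   = c("A", "B", "A", "B", "A"),
  fucose    = c(35.1, 0, 42.0, 28.4, NA),
  sulfation = c(0.6, 0.4, 0.9, 0.5, 0.7),
  xylose    = c(0, 0, 1.2, 0, 0)
)

# Drop the variable that is mostly 0 (the role 'xylose' plays in the text)
# and keep the remaining measured variables.
measured <- raw[, c("fucose", "sulfation")]

# Recode 0 as NA in the measured variables only (assumption of this sketch).
measured[] <- lapply(measured, function(col) replace(col, which(col == 0), NA))

# One-hot encode the categorical variable: one 0/1 column per level.
species_dummies <- model.matrix(~ species - 1, data = raw)

# Combine both parts and drop every row that still has a missing entry.
cleaned <- na.omit(cbind(as.data.frame(species_dummies), measured))
```

In this toy example `na.omit()` drops the row whose fucose value was 0 and the row where it was already missing, so `cleaned` contains only complete rows.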

The scree plot of the 10 principal components showed that 3 PCs explain 80% of the variance within the dataset, while each of the remaining PCs explains less variance than a single original variable (eigenvalues < 1). This means that 8 variables can be discarded from the data set. The results of the principal component analysis show that the extraction method, the fucose content and the degree of sulfation account for the largest share of the variance within the dataset. These variables are not independent of each other, however, as there is evidence that the extraction method affects the degree of sulfation. When the bioactivity was plotted as a function of the degree of sulfation and the molecular weight, the fucoidans with ophthalmological bioactivity had a relatively high degree of sulfation, while a wide range of molecular weights was observed. One caveat of using a database in which not all analytical methods have been standardized is that the PCA exaggerates variances: if the data points are not directly comparable (different analytical methods), the variance will appear more prominent.
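The two retention criteria mentioned above (eigenvalues above 1 and cumulative explained variance) can be sketched in R as follows. The matrix `X` is randomly generated placeholder data with 31 rows and 10 columns, not the FucoSan measurements, and the variable names are hypothetical. The sketch uses `prcomp()`, which may differ from the function the authors used.

```r
set.seed(1)  # placeholder data only, not the FucoSan measurements
n <- 31
latent   <- matrix(rnorm(n * 3), nrow = n)    # three hidden factors
loadings <- matrix(rnorm(3 * 10), nrow = 3)   # mix them into 10 variables
X <- latent %*% loadings + matrix(rnorm(n * 10, sd = 0.3), nrow = n)
colnames(X) <- paste0("var", 1:10)

# PCA on centred and scaled variables (i.e. on the correlation matrix).
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                      # variance carried by each PC
explained   <- eigenvalues / sum(eigenvalues)  # share of total variance
cumulative  <- cumsum(explained)

n_kaiser <- sum(eigenvalues > 1)               # PCs with eigenvalue above 1
n_80     <- which(cumulative >= 0.80)[1]       # fewest PCs reaching 80 %
```

Calling `screeplot(pca, type = "lines")` would draw the corresponding scree plot. The two criteria do not have to agree on the number of components to keep, so it is worth reporting which one was applied.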

### References

1. Neupane S, Bittkau KS, Sandow V, Ptak S, Mikkelsen MD, Dörschmann P, Ohmes J, Fretté X, Meyer AS, Fuchs S, Klettner A, Alban S. FucoSan: Extraction of fucoidans from different brown algae species using different methods and their chemical and biological characterization (version 100) [Data set]. Zenodo; 2020. DOI: 10.5281/zenodo.3876379