Practical example of balancing model performance and computational resource limitations — with code and visualization

Introduction

XGBoost, or eXtreme Gradient Boosting, is one of the most widely used machine learning algorithms today. It is famous for its success in Kaggle competitions. Many articles praise it and discuss its advantages over alternative algorithms, so it is a must-have skill for practicing machine learning.

Although XGBoost is relatively fast, it can still be challenging to run a script on a standard laptop: fitting a machine learning model usually comes with hyperparameter tuning and, although not necessarily, cross-validation. If all combinations of hyperparameters are taken into account, the computational demand of tuning grows exponentially with the number of hyperparameters. …
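
To make the growth concrete, here is a minimal sketch that only counts the model fits an exhaustive grid search would need. The parameter names are real XGBoost hyperparameters, but the grid values and the number of cross-validation folds are assumptions chosen for illustration, not taken from the article.

```python
from itertools import product

# Hypothetical search grid: the values are illustrative assumptions only.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
    "subsample": [0.6, 0.8, 1.0],
}
cv_folds = 5  # assumed number of cross-validation folds


def count_fits(grid, folds):
    """Return the number of model fits for an exhaustive grid search with k-fold CV."""
    n_combinations = 1
    for values in grid.values():
        n_combinations *= len(values)
    return n_combinations * folds


# Enumerate every combination explicitly to confirm the count.
combinations = list(product(*param_grid.values()))
print(len(combinations))                 # 81 combinations (3 values ** 4 parameters)
print(count_fits(param_grid, cv_folds))  # 405 fits with 5-fold cross-validation
```

Adding a fifth hyperparameter with three candidate values would triple both numbers, which is why an exhaustive search quickly becomes impractical on a laptop.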


A practical example of analyzing a complex seasonal time series with 100,000+ data points using the Unobserved Components Model

Scope

Forecasting is a common statistical task in business. It is of great importance because it affects management decisions and how companies allocate resources. The first statistical methods were developed 100 years ago, and ARIMA models are still frequently used alongside modern machine learning and deep learning techniques. Despite their popularity, ARIMA models have some serious drawbacks:

  1. the coefficients of the model are not easy to interpret or need detailed explanation
  2. efficient for small data sets, it is computationally…


Selected EDA tools for gaining insights

Exploratory data analysis, or EDA for short, is especially important for at least two reasons. It is a process for gathering information from a freshly encountered data set in order to

  1. produce interpretable insights for the audience of interest, or
  2. when it comes to modeling, help narrow down the relevant models by qualifying the data

I will present some graphical analysis tools in Python that can be useful for drawing conclusions and can provide input for building a dashboard or for a model selection approach.

Some skill keywords of interest that might expand the reader's knowledge:


A data science project for justifying empirical observation in the lab

Introduction and reminder

This is the third in a series of three blog posts on analyzing data originating from a soil testing laboratory. A quick overview of the goals set in the first post:

Professionals have suggested, based on empirical experience, that the liquid limit of a soil sample (from which the physical classification of soil samples is derived) is primarily determined by the humus materials and soluble salt content (represented by specific conductivity), and may be affected by the other parameters. …


A data science project for justifying empirical observation in the lab

Introduction and reminder

This is the second in a series of three blog posts on analyzing data originating from a soil testing laboratory. A quick overview of the goals set in the first post:

Professionals have suggested, based on empirical experience, that the liquid limit of a soil sample (from which the physical classification of soil samples is derived) is primarily determined by the humus materials and soluble salt content (represented by specific conductivity), and may be affected by the other parameters. …


Introduction to my dataset of choice

When you complete an introductory course on data science, you usually go and practice by yourself what you have learned. Fortunately, plenty of sources offer free datasets that you can use.

The most popular sources are briefly, though not exhaustively, reviewed here and here.

This is the case for me as well, but I am fortunate to have a dataset of my own. OK, it’s not owned by me, but I lead the team that generated the data. The data come from a soil and plant analysis laboratory.

Because it is not for the public…

Daniel J. TOTH

Biochemical engineer, former environmental lab analyst with quality focus. I do my best to bring high quality content to data science learners/enthusiasts.
