A practical example of analyzing a complex seasonal time series with 100,000+ data points using the Unobserved Components Model

Photo by Richard Horvath on Unsplash

Scope

Forecasting is a common statistical task in business. It is of great importance because it affects management decisions and how companies manage their resources. The first statistical forecasting methods were developed about 100 years ago, and ARIMA models are still frequently used alongside modern machine learning and deep learning techniques. Despite their popularity, ARIMA models have some serious drawbacks:

  1. the coefficients of the model are not easy to interpret or require detailed explanation
  2. while efficient for small data sets, it is computationally…
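To make the interpretability point concrete, here is a minimal sketch of the simplest unobserved components model, the local level model, run through a hand-written Kalman filter. It assumes the two noise variances are already known. In practice they are estimated, for example by maximum likelihood. The helper name `local_level_filter` and the example series are hypothetical. A full seasonal UCM adds trend and seasonal components to the same state-space structure.

```python
def local_level_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    """Kalman filter for the local level model (hypothetical helper).

    Observation: y_t = mu_t + eps_t,       eps_t ~ N(0, sigma2_eps)
    State:       mu_{t+1} = mu_t + eta_t,  eta_t ~ N(0, sigma2_eta)
    Returns the filtered levels, i.e. estimates of mu_t given y_1..y_t.
    """
    a, p = a0, p0  # predicted level and its variance; a large p0 acts as a vague prior
    levels = []
    for obs in y:
        f = p + sigma2_eps       # variance of the one-step-ahead prediction error
        k = p / f                # Kalman gain: weight given to the new observation
        a = a + k * (obs - a)    # correct the level with the prediction error
        p = p * (1.0 - k)        # variance of the corrected level
        levels.append(a)
        p = p + sigma2_eta       # random-walk step adds variance before the next observation
    return levels


# Hypothetical series with a level shift halfway through
series = [10.0] * 20 + [15.0] * 20
levels = local_level_filter(series, sigma2_eps=1.0, sigma2_eta=0.5)
print(f"level before shift: {levels[19]:.2f}, level at end: {levels[-1]:.2f}")
```

Unlike ARIMA coefficients, the two variances have a direct reading. Their ratio controls how quickly the estimated level follows new observations, and with the values above the gain settles at 0.5. In statsmodels, `sm.tsa.UnobservedComponents` estimates this model and richer trend and seasonal specifications.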


Selected EDA tools for gaining insights

Photo by Nicholas Cappello on Unsplash

Exploratory data analysis, or EDA for short, is particularly important for at least two reasons. It is a process of gathering information from a newly encountered data set in order to

  1. produce interpretable insights for an audience of interest, or
  2. when it comes to modeling, narrow down the relevant models based on the characteristics of the data

I will present some graphical analysis tools in Python that can be useful for drawing conclusions and for providing input to a dashboard or a model selection approach.
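Graphical EDA tools often rest on a simple numerical summary. One example is the Pearson correlation matrix, which is usually drawn as a heatmap. The minimal sketch below computes one with NumPy. The column names and values are hypothetical placeholders, not data from the laboratory.

```python
import numpy as np

# Hypothetical measurements: one row per sample, one column per variable
columns = ["feature_a", "feature_b", "target"]
data = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 3.9, 5.0],
    [3.0, 6.1, 7.2],
    [4.0, 8.0, 8.9],
    [5.0, 9.8, 11.1],
])

# Pearson correlation between columns (rowvar=False: variables are columns)
corr = np.corrcoef(data, rowvar=False)

# Print the matrix; a heatmap of the same values is the usual graphical form
for name, row in zip(columns, corr):
    print(f"{name:>10}", "  ".join(f"{value:+.2f}" for value in row))
```

Strongly correlated predictors, such as `feature_a` and `feature_b` here, are an early hint that some model families will struggle with collinearity.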

Some skill keywords of interest that might expand the reader's knowledge:


Final visual comparison of the resulting eight models

Introduction and reminder

This is the third in a series of three blog posts on analyzing data from a soil testing laboratory. A quick overview of the goals set in the first post:

Professionals have suggested, based on experience, that the liquid limit of a soil sample, from which the physical classification of soil samples is derived, is primarily determined by the humus materials and the soluble salt content (represented by specific conductivity), and may be affected by the other parameters. …
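One way to state this hypothesis as a model is an ordinary least-squares fit of the liquid limit on humus content and specific conductivity. The sketch below assumes a plain linear relationship. It uses synthetic, noise-free values generated from made-up coefficients (25, 4 and 6), so it only demonstrates the mechanics. It says nothing about the real soil data or the models compared in this series.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
humus = rng.uniform(0.5, 5.0, n)          # humus content, % (synthetic)
conductivity = rng.uniform(0.1, 2.0, n)   # specific conductivity (synthetic, arbitrary units)

# Made-up "true" coefficients, used only to generate the synthetic target
liquid_limit = 25.0 + 4.0 * humus + 6.0 * conductivity

# Design matrix: intercept column plus the two predictors
X = np.column_stack([np.ones(n), humus, conductivity])
coef, residuals, rank, singular_values = np.linalg.lstsq(X, liquid_limit, rcond=None)

intercept, b_humus, b_conductivity = coef
print(f"intercept={intercept:.2f}, humus={b_humus:.2f}, conductivity={b_conductivity:.2f}")
```

On real measurements, the fitted coefficients and their residual spread would show how much of the liquid limit these two parameters explain on their own.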


Introduction and reminder

This is the second in a series of three blog posts on analyzing data from a soil testing laboratory. A quick overview of the goals set in the first post:

Professionals have suggested, based on experience, that the liquid limit of a soil sample, from which the physical classification of soil samples is derived, is primarily determined by the humus materials and the soluble salt content (represented by specific conductivity), and may be affected by the other parameters. …


cmglee, Mikenorton, United States Department of Agriculture, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

Introduction to my dataset of choice

When you complete an introductory course on data science, you usually go on to practice what you have learned on your own. Fortunately, plenty of sources offer free datasets that you can use.

The most popular sources are briefly, though not exhaustively, reviewed here and here.

This is true for me as well, but I am fortunate to have a dataset of my own. OK, it's not owned by me, but I lead the team that generated the data. The data come from a soil and plant analysis laboratory.

Because it is not for the public…

Daniel J. TOTH

Graduated as a biochemical engineer. Having primarily been involved in environmental lab work, I have been pursuing a data science career since mid-2020.
