Classification of Foliar Diseases in Apple Tree Leaves using Ensemble CNN

Foliar diseases in apple leaves degrade the quality of the fruit, so classifying them in real time is essential to limiting their spread. This project develops an ensemble deep-learning model that classifies diseases in apple leaves.

Check the GitHub Repo for code. 

Deep Learning | Convolutional Neural Networks | TensorFlow | OpenCV | Streamlit


Project Context

Final year research project
Team-based: 4 members
Duration: 1 year (Aug 2021 - Jul 2022)

Roles

Developed the ensemble model using Python

Constructed front-end interfaces with Streamlit

Tools

Python, TensorFlow & Keras, OpenCV, Pandas, NumPy, Streamlit, Google Colab Pro & VS Code

Project Details

Motivation

"An apple a day keeps the doctor away" is a famous saying that highlights the apple as one of the fruits that keep us healthy. Apples are rich in fiber and antioxidants, and eating them is linked to a lower risk of many chronic conditions, including diabetes, heart disease, and cancer. They may also promote weight loss and improve gut and brain health. However, various diseases that attack apple trees degrade the quality of the fruit, at times leaving it inedible. These diseases show up on the leaves and can be characterized by shape, color, and other visual factors. With deep learning and image processing systems, apple growers can detect and classify diseases in real time, limit their spread, and support healthy fruit growth.

Project Phase
Identifying the Problem

Plant leaf diseases pose a significant threat to the growth of individual species in agricultural production. The resulting drop in yield can cause economic losses that are hard to quantify. Detecting and classifying plant leaf diseases therefore plays a significant role in agricultural production.

We conducted a literature survey of existing machine-learning methodologies and approaches for problems related to plant leaf diseases. From this study, we drew two key findings:

  • Among the plant leaf diseases, the detection and classification of apple leaf diseases are the least examined.
  • Ensemble learning approaches are the least explored.

With these two findings in mind, we framed a problem statement:

How might we develop a deep learning model to increase accuracy in classifying apple leaf diseases?

Note: We focus exclusively on classifying the diseases, not on detecting them. One advantage of pre-trained deep neural networks is that they can learn on their own to pick out the disease spots within a given leaf image.

Proposed Solution

Our proposed solution is an ensemble learning model combined with appropriate image processing techniques. We focus on ensemble modeling because combining several base models typically performs better than any single base model.
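To illustrate the idea of combining base models, here is a minimal sketch of one common ensembling scheme, soft voting, in which the class probabilities of several models are averaged before picking a label. This page does not state which combination rule the project used, so soft voting is an assumption here, and `soft_vote` and the three probability arrays are hypothetical stand-ins for the softmax outputs of real base models.

```python
import numpy as np


def soft_vote(probabilities, weights=None):
    """Average per-model class probabilities and return (averaged, labels).

    probabilities: list of arrays, each shaped (n_samples, n_classes).
    weights: optional per-model weights; None means a plain average.
    """
    stacked = np.stack(probabilities, axis=0)  # (n_models, n_samples, n_classes)
    averaged = np.average(stacked, axis=0, weights=weights)
    return averaged, averaged.argmax(axis=1)


# Hypothetical softmax outputs of three base models for two images, three classes.
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]])

averaged, labels = soft_vote([model_a, model_b, model_c])
print(labels)
```

Averaging lets a confident model outweigh two lukewarm ones, which is one reason an ensemble can outperform its individual members.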

Data Collection

We gathered the dataset from multiple sources, including Kaggle, internet images, and reported images from Khan's lab at Cornell University.

  • Dataset type: JPG images with a CSV file containing image annotations
  • Dataset size: 18,632 images
  • Number of classes: 12
  • Color scheme: RGB
  • Image Resolution: 2048 x 1035
  • Types of Diseases: healthy, complex, frog_eye_leaf_spot, frog_eye_leaf_spot complex, powdery_mildew, powdery_mildew complex, rust, rust complex, rust frog_eye_leaf_spot, scab, scab frog_eye_leaf_spot, scab frog_eye_leaf_spot complex

Data Analysis

We conducted data analysis to study the dataset using a few visual techniques. This step in our project allowed us to understand the dataset better before we proceeded to the pre-processing phase.

After understanding the essential components of the dataset, we analyzed the RGB channels to see how the various diseases differ from each other in their RGB value distributions.

We used histograms to plot the frequency of pixel intensity values. In the RGB color space, each channel value ranges from 0 to 255, where 0 is the lowest intensity and 255 the highest (a pixel is black when all three channels are 0 and white when all three are 255). Analyzing the histograms helped us understand the brightness, contrast, and intensity distribution of an image.
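A per-channel intensity histogram of this kind can be sketched with NumPy alone. The function name `channel_histograms` is hypothetical, and the random array below is only a stand-in for a real leaf image loaded in RGB order.

```python
import numpy as np


def channel_histograms(image_rgb, bins=256):
    """Count how many pixels fall on each intensity value, per RGB channel."""
    histograms = {}
    for index, name in enumerate(("red", "green", "blue")):
        counts, _ = np.histogram(image_rgb[..., index], bins=bins, range=(0, 256))
        histograms[name] = counts
    return histograms


# Synthetic stand-in for a leaf image (height x width x 3, values 0-255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

histograms = channel_histograms(image)
print({name: int(counts.argmax()) for name, counts in histograms.items()})
```

With 256 bins over the range 0 to 256, each bin corresponds to exactly one intensity value, so the counts can be plotted directly against intensity.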

The red channel values appear roughly normally distributed with a slight negative skew, which indicates that they are concentrated toward the higher end of their range, at around 100. The average red value varies widely across images.

The green channel values are more evenly distributed than the red channel values, with a smaller peak. In contrast to red, the distribution is right-skewed and has a larger mode, at about 160. Given that these are pictures of leaves, it makes sense that green is more prominent than red.

Of the three color channels, the blue channel has the most consistent distribution and the least skew (a slight left skew). Blue values still vary considerably across the images in the dataset.
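The skew described for each channel can be quantified rather than read off a plot. The sketch below computes the standard moment-based skewness coefficient with NumPy; the function name `skewness` and the sample values are illustrative assumptions, not figures from the project. In practice it would be applied to the per-image mean of each channel.

```python
import numpy as np


def skewness(values):
    """Moment-based skewness: negative for a left tail, positive for a right tail."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return float((centered ** 3).mean() / values.std() ** 3)


# Hypothetical per-image channel means, chosen only to show the sign convention.
left_tailed = [20, 90, 100, 105, 110]     # most values high, a few low
right_tailed = [150, 155, 160, 170, 240]  # most values low, a few high

print(skewness(left_tailed), skewness(right_tailed))
```

A negative result corresponds to the "negative" or "left" skew in the text, and a positive result to a "right" skew.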

Image Pre-processing

We experimented with several colormaps from the OpenCV module during image pre-processing. A colormap re-renders an image's intensity values in a specific color scheme, which helps profile the images for model training.

We tried the jet, bone, inferno, ocean, rainbow, and HSV colormaps from the OpenCV module. We chose the jet colormap because it most clearly distinguishes the diseased portions of a leaf, which makes the model easier to train effectively.

Image pre-processing and augmentation settings:

  • rescaling = 1./255
  • rotation_range = 180
  • zoom_range = 0.15
  • width_shift_range = 0.15
  • height_shift_range = 0.15
  • horizontal_flip = True
  • vertical_flip = True
  • blurring = Gaussian blurring
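The settings above resemble the parameter names of Keras's `ImageDataGenerator`, although the page does not confirm which tool applied them. As a library-independent illustration, the sketch below implements only the rescaling and the random flips in NumPy; rotation, zoom, and shifts are omitted and would normally be left to an augmentation library. The function name `augment` and the 0.5 flip probability are assumptions.

```python
import numpy as np


def augment(image, rng, rescale=1.0 / 255, horizontal_flip=True, vertical_flip=True):
    """Rescale pixel values to [0, 1] and randomly flip the image."""
    out = image.astype(np.float32) * rescale
    if horizontal_flip and rng.random() < 0.5:
        out = out[:, ::-1]  # mirror left-right
    if vertical_flip and rng.random() < 0.5:
        out = out[::-1]     # mirror top-bottom
    return out


# Tiny synthetic image (height 2, width 3, 3 channels) for demonstration.
rng = np.random.default_rng(42)
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

augmented = augment(image, rng)
print(augmented.shape, float(augmented.max()))
```

Flips only rearrange pixels, so the set of pixel values is unchanged; the rescaling simply brings them into the 0 to 1 range that neural networks usually train on.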

Colormaps for Images

I used the OpenCV library to render the images in different colormaps before feeding them to the model for training. Colormaps make the disease spots easier for the neural networks to identify, which improves learning and, in turn, classification accuracy.

Restructured Ensemble Model
Performance Analysis
Name | Training Accuracy | Training Loss | Testing Accuracy | Testing Loss
Front-end Development
Reflections

Classification of foliar diseases in apple leaves is essential for apple cultivation.

The ensemble model performed significantly better than the single base models.

Colormaps can significantly help the model generate useful feature maps during learning.

Streamlit is an easy-to-use tool for developing front-end visualization for machine learning projects.

We could have experimented with more transfer-learning models to deepen our understanding of the performance analysis.

