This document provides a basic description of the algorithm for aerosol classification from optical data. The algorithm exploits the strengths of artificial neural networks: the capability to learn complex nonlinear relations, to tolerate quite noisy signals, and to classify in real time after a proper training process. The method will be developed in two steps: in the first step, a large set of data from both simulations and observations is classified for training; in the second step, fine adjustments are made in order to increase the number of aerosol types that are well classified. The goal is to use as many as possible of the optical parameters measured by a ground-based multiwavelength depolarization Raman lidar.
This document describes the methodology and the tools used to compute synthetic optical properties of aerosols, as well as the content of the database. The purpose is to obtain a relevant number of training cases for the artificial neural network, considering both pure and mixed aerosol types. Aerosols were simulated starting from the microphysical properties of basic components, internally mixed in various proportions. Validation of the values computed was done by comparison to measurements published in the literature.
This document describes the variables chosen as input parameters in the Artificial Neural Network (ANN) algorithm for aerosol typing.
Ranges of the values computed for pure aerosol classes have been compared to measurements published in the literature. Six classes of pure aerosol have been considered, as previously defined in NATALI-TD01 -01_v01.
This document lists the principal software requirements for aerosol classification from optical data derived from lidar measurements. Various artificial neural network (ANN) algorithms and designs will be used in order to best recognize several pure and mixed aerosol types. At least two ANN configurations will be used, depending on the number of parameters considered. Data from simulations and observations will be used to train, test and adjust the performance of the ANNs.
This document explains the methodology used to select observational data suitable for training and testing the artificial neural network. It also presents the statistics of the selected cases and the results obtained in comparison to synthetic data, and proposes a strategy to optimize the performance and the use of the aerosol typing algorithm.
This document shows the basic schematics of the artificial neural networks which will be used to classify the aerosol type. A few types of neural networks are selected, depending on the number of aerosol types to be retrieved and on the performance of the training and classification processes. Several comparative results on ANN performance using synthetic data are also presented.
This document presents the design of the software which implements the ANN algorithms for aerosol typing.
This document presents the report on lidar, in-situ and integrated column data collected.
The synthetic database is generated by simulating the optical properties of various aerosol types based on available information on the microphysics. The algorithm combines the GADS database (Global Aerosol Data Set) with the OPAC model (Optical Properties of Aerosols and Clouds) and the T-matrix code in order to compute, iteratively, the intensive optical properties of each aerosol type. The chemical composition of each aerosol type varies within certain limits, in order to mimic as closely as possible the large variety of particles present in the atmosphere.
Each pure aerosol type is built as an internal mixture of typical components which do not interact physically or chemically, with different mixing ratios. The basic components are taken from OPAC (Hess et al., 1998): water soluble, insoluble, soot, mineral (nucleation, accumulation, coarse), sulfates, sea salt (accumulation, coarse). The GADS database is used for the microphysical properties of each component (Koepke et al., 1997). To account for the non-sphericity of particles, the T-matrix code is used to calculate the optical properties of the aerosols (Waterman, 1965). Aerosols are treated as spheroids with different axis ratios, with values taken from the literature (Munoz et al., 2001; Dubovik et al., 2006).
An ANN is a mathematical model inspired by the human neural network. It is based on neurons, axons and synapses, the information being propagated as a neural influx. The analysis takes place during the propagation of information through the synapses, where it is adjusted to properly interpret the given data. An ANN consists of neurons arranged in layers: an input layer, one or more hidden layers and an output layer. The outputs of one layer become the inputs to the next layer (Braspenning et al., 1995).
Although neural networks can deal with complex and interconnected input vectors, which cannot be well separated using other classification methods, the efficiency of the classification depends strongly on the complexity of the input data, as well as on the training process. The input data must be constrained to a pattern and the ANN should learn to recognize this pattern. In the learning process, a weight is established for each synapse.
Forty-eight ANN structures were tested, out of which three were selected to perform the classification, based on their performance: Jordan/Elman with 6 and 8 hidden layers respectively, and Generalized feedforward with 6 hidden layers. Supervised training with the Momentum or Conjugate gradient learning rule was applied. Two levels of resolution have been considered in terms of ANN outputs, in order to provide at the same time a rough classification out of six aerosol types (min. 70% pure and 30% residuals), and a detailed classification out of fourteen aerosol types (six 90% pure, six mixtures of two types, two mixtures of three). High resolution typing is suited to advanced lidar systems which provide highly accurate optical data (less than 10% uncertainty). Low resolution typing can still be performed on optical data with an uncertainty of less than 20%.
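The two uncertainty thresholds quoted above can be read as a simple decision rule. The following is only a minimal sketch in Python (the software itself is developed in LabVIEW); the function name and the use of a fractional relative uncertainty as input are assumptions made for illustration, not part of the described software.

```python
def applicable_resolutions(relative_uncertainty):
    """Return the typing resolutions supported by data of a given quality.

    relative_uncertainty is a fraction (0.10 means 10%). The thresholds
    follow the text: below 10% for high resolution (14 types), below 20%
    for low resolution (6 types). The function itself is hypothetical.
    """
    if relative_uncertainty < 0:
        raise ValueError("relative uncertainty must be non-negative")
    resolutions = []
    if relative_uncertainty < 0.10:
        resolutions.append("high")  # detailed classification, 14 types
    if relative_uncertainty < 0.20:
        resolutions.append("low")   # rough classification, 6 types
    return resolutions
```

Under this reading, data with 5% uncertainty supports both resolutions, data with 15% uncertainty supports only low resolution typing, and data with 25% uncertainty supports neither.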
The software is developed in LabVIEW and is organized in two modules:
The input module accommodates optical profiles in EARLINET standard NetCDF format, for multi-wavelength Raman lidar systems (3b + 2a) with or without depolarization. It finds the layer boundaries and calculates the mean values of the intensive optical parameters, and the associated uncertainties. Multiple datasets are generated by taking into consideration all possible values within the error bars, and are converted to the ANN format.
The typing module is designed as a two-level approach. The first level performs high resolution typing, i.e. 14 aerosol types (6 pure and 8 mixed). The second level performs low resolution typing (6 basic aerosol types). The definition of the types at each resolution is described in TD05. Three ANNs produce the classification independently, for high and low resolution typing respectively. The software recommends a final aerosol type, based on a voting procedure.
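The voting procedure is not detailed here. One plausible reading is a simple majority vote over the three ANN outputs, sketched below in Python. The function name, the placeholder type labels and the choice to return no recommendation when there is no strict majority are all assumptions, not the documented behaviour of the software.

```python
from collections import Counter

def recommend_type(predictions):
    """Recommend an aerosol type from independent ANN predictions.

    predictions: list of type labels, one per ANN (three in the text).
    Returns the label chosen by a strict majority, or None when no label
    wins more than half of the votes (assumed tie handling).
    """
    if not predictions:
        return None
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return None
```

For example, `recommend_type(["type_A", "type_A", "type_B"])` returns `"type_A"`, while three different labels yield `None`. The same rule could be applied separately to the high and low resolution outputs.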
The software allows sequential retrieval of aerosol type for all layers and visualization of the profile retrieved. Analysis of multiple files is also implemented.
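The generation of multiple datasets within the error bars, performed by the input module, could be sketched as follows. This is an illustration in Python under stated assumptions: values are drawn uniformly between mean minus uncertainty and mean plus uncertainty, the parameter names are hypothetical, and the numbers in the usage note are placeholders rather than measurements. The actual sampling scheme used by the software is not specified in this document.

```python
import random

def perturbed_datasets(means, uncertainties, n_samples=100, seed=0):
    """Generate input vectors spanning the error bars of each parameter.

    means, uncertainties: dicts keyed by parameter name, holding the
    layer-mean value and its absolute uncertainty.
    Uniform sampling within [mean - unc, mean + unc] is an assumption.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    datasets = []
    for _ in range(n_samples):
        sample = {
            name: rng.uniform(means[name] - uncertainties[name],
                              means[name] + uncertainties[name])
            for name in means
        }
        datasets.append(sample)
    return datasets
```

A call such as `perturbed_datasets({"lidar_ratio_532": 50.0}, {"lidar_ratio_532": 5.0}, n_samples=10)` would return ten candidate input vectors, each of which can then be passed to the ANNs for typing.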