Doctoral School in Environmental Engineering


     A spatial decision support system
to assess personal exposure to air pollution
     integrating sensor measurements
                     Pietro Zambelli


                        March 2015
Doctoral thesis in Environmental Engineering, 26th cycle
Faculty of Engineering, University of Trento
Academic year 2014/2015
Supervisor: Prof. Marco Ciolli, Prof. Marco Ragazzi


University of Trento
Trento, Italy
31st January 2015
To Irene and Lorenzo,
           with love.
  Facing it, always facing it,
that’s the way to get through.
                      Face it.


              Joseph Conrad
Acknowledgments

This Ph.D. thesis contains the result of research undertaken at the Department of En-
vironmental Engineering of the University of Trento. Without the ﬁnancial support of
Dolphin S.r.l. this project would never have started; therefore my gratitude goes to
Mauro Chelodi.
    I would have never reached the point of ﬁnishing my dissertation without the help
and support of others. First and foremost, I offer my gratitude to my supervisors:
Marco Ciolli, who has supported and guided me through a path full of pitfalls such as
the Ph.D. experience is, and Marco Ragazzi, who has believed in the project idea.
    I thank Stafan Reis for his valuable help in revising and commenting the work
and the material presented in this thesis; his hints and critiques were always precious.
Massimo Vieno was the person that welcomed me at CEH and gave me the opportunity
to present my work at the “EMEP - group”, (I know that the name has changed, but
in my heart it will remain the “EMEP - group”). And of course a special thanks to
the whole group that patiently listened to my presentations on image classiﬁcation of
urban micro-environments. Without the work and support that have been provided by
Mathew Heal and Lin Chun and their colleagues to organize and collect the data from
the Passive Diffusive Tubes, this thesis would hae had a different shape, so thank you.
A special thanks to a warm and wonderful person, the best presenter of the University
of Edinburgh, Andrea Moring, thanks for the long chats that we had on the bus.
    In addition, I thank Markus Neteler for his hints and ideas and SÃ˝ ren Gebbert,
                                                                      u
who supervised me during my Google Summer of Code (2012), for his kindness and
availability.
    These four years have been a challenging trip, with both ups and downs. Fortu-
nately, I was not alone on this road, but accompanied by Irene, my wife, who patiently
endured me all those times when, instead of going out to enjoy our free time, I was
thinking, reading, and coding for my thesis; by Lorenzo, my son, who forced me to
hurry up the end of this manuscript, and by my family, who always encouraged me.
    Finally a special thanks to all the open source communities that supported me
during these long years, in particular the GRASS-developers and Python communities,
they inspire and stimulate me with new ideas and tools.
Contents

List of Figures                                                                        xiv


List of Tables                                                                         xvi

1 Introduction                                                                          1
   1.1   Status quo, context and motivations . . . . . . . . . . . . . . . . . .        1
         1.1.1    Monitoring sensor network . . . . . . . . . . . . . . . . . . .       2
         1.1.2    Model spatial variability . . . . . . . . . . . . . . . . . . . .     2
         1.1.3    Decision support system and air quality . . . . . . . . . . . .       4
   1.2   Aims and objectives . . . . . . . . . . . . . . . . . . . . . . . . . . .      4
   1.3   Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        4
   1.4   Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        5

2 Low-cost sensors: a new paradigm for environmental monitoring                         7
   2.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     7
   2.2   Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . .        9
   2.3   Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . .       9
   2.4   Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     11

3 PyGrass: a high level GIS library for research and rapid prototyping                 13
   3.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    13
   3.2   Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       16
   3.3   Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   21
   3.4   Discussion and Benchmarks . . . . . . . . . . . . . . . . . . . . . .         23
   3.5   Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     24

4 Machine learning classiﬁcation of urban micro-environments                           27
   4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    27
   4.2   Material and methods . . . . . . . . . . . . . . . . . . . . . . . . . .      29
         4.2.1    Study area and data . . . . . . . . . . . . . . . . . . . . . . .    29
         4.2.2    Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    30
XII                                                                              C ONTENTS


            4.2.3   Methodology . . . . . . . . . . . . . . . . . . . . . . . . . .      31
      4.3   Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . .     36
      4.4   Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    44

5     Geographical factors and urban Nitrogen Dioxide micro-environments                 49
      5.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   49
      5.2   Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . .      50
      5.3   Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . .     53
      5.4   Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    56

6     Decision support system to assess and compare the effect of different ac-
      tion on the urban air quality                                                      63

7     Regressed scenarios comparison: a methodology for urban planners                   65
      7.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   65
      7.2   Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . .      66
      7.3   Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . .     69
      7.4   Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    70

8     Conclusions                                                                        75
      8.1   Pro and cons of the proposed methodology . . . . . . . . . . . . . . .       76
      8.2   Future development . . . . . . . . . . . . . . . . . . . . . . . . . . .     77
List of Figures

 2.1   Comparison of CO measurements . . . . . . . . . . . . . . . . . . .           10
 2.2   CO Residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      11
 2.3   Residuals ﬁrst week . . . . . . . . . . . . . . . . . . . . . . . . . . .     11

 3.1   Module, Raster and GIS classes. . . . . . . . . . . . . . . . . . . . .       19
 3.2   Vector class. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   19

 4.1   Study area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    30
 4.2   Bands ratio indexes . . . . . . . . . . . . . . . . . . . . . . . . . . .     37
 4.3   Segments thresholds . . . . . . . . . . . . . . . . . . . . . . . . . .       38
 4.4   Texture ﬁrst order statistics (1) . . . . . . . . . . . . . . . . . . . . .   39
 4.5   Texture second order statistics (1) . . . . . . . . . . . . . . . . . . .     40
 4.6   SVC domain exploration . . . . . . . . . . . . . . . . . . . . . . . .        41
 4.7   Score matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    43
 4.8   Machine learning classiﬁcation . . . . . . . . . . . . . . . . . . . . .      43
 4.9   Precision/Recall curves . . . . . . . . . . . . . . . . . . . . . . . . .     46

 5.1   Passive Samples NO2 . . . . . . . . . . . . . . . . . . . . . . . . . .       51
 5.2   NO2 concentrations . . . . . . . . . . . . . . . . . . . . . . . . . . .      53
 5.3   NO2 concentrations . . . . . . . . . . . . . . . . . . . . . . . . . . .      54
 5.4   Weeks Spearman correlation . . . . . . . . . . . . . . . . . . . . . .        54
 5.5   Correlation between weeks . . . . . . . . . . . . . . . . . . . . . . .       55
 5.6   Positive Spearman’s correlation coefﬁcient . . . . . . . . . . . . . . .      57
 5.7   Negative Spearman’s correlation coefﬁcient . . . . . . . . . . . . . .        57
 5.8   Positive Spearman’s rank correlation coefﬁcient (ρs ) between the trans-
       formed geographical features (µ = 0 and σ = 1) and NO2 concentration 59
 5.9   Negative Spearman’s rank correlation coefﬁcient (ρs ) between the trans-
       formed geographical features (µ = 0 and σ = 1) and NO2 concentration 60
 5.10 Correlation coefﬁcient between measured data and regressed . . . . .           61
 5.11 Measured vs regressed data . . . . . . . . . . . . . . . . . . . . . . .       61
XIV                                                                     L IST OF F IGURES


  5.12 Regressed concentrations . . . . . . . . . . . . . . . . . . . . . . . .       62

  7.1   Amenities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     67
  7.2   Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     68
  7.3   Land-use changes . . . . . . . . . . . . . . . . . . . . . . . . . . . .      69
  7.4   Land-use change . . . . . . . . . . . . . . . . . . . . . . . . . . . .       70
  7.5   NO2 Variation at nursery level . . . . . . . . . . . . . . . . . . . . .      71
  7.6   NO2 differences at nursery level . . . . . . . . . . . . . . . . . . . .      72
  7.7   NO2 variations at city scale . . . . . . . . . . . . . . . . . . . . . . .    73
List of Tables

 2.1   Selected pollutants for each critical case. . . . . . . . . . . . . . . . .     9
 2.2   Comparison of different sensors concerning range and resolution for
       air pollutants: NO2 , CO, O3 , all units are in ppm. . . . . . . . . . . .     10

 3.1   GRASS C API consists of 2,698 C functions that are available through
       the ctypes library, divided into 10 different ﬁelds. This data is derived
       from the ofﬁcial GRASS source code (Zambelli et al., 2013). . . . . .          15
 3.2   Comparison of the most used GIS software (Zambelli et al., 2013). . .          15
 3.3   Table with benchmark results, reported in seconds, using different
       computational extents and a different number of vector points (Zam-
       belli et al., 2013). . . . . . . . . . . . . . . . . . . . . . . . . . . . .   22

 4.1   Table with the class name, numeric label and number of segments used
       to train all the different machine-learning algorithms . . . . . . . . .       34
 4.2   Deﬁnitions of the Accuracy, Precision and Recall parameters used to
       evaluate and compare the performance of different machine-learning
       classiﬁers (Wikipedia, 2014a). . . . . . . . . . . . . . . . . . . . . .       36
 4.3   Features importance in percentage % of each couple raster-statistics. .        42
 4.4   Confusion matrix for the ﬁrst best six classiﬁers. . . . . . . . . . . . .     45
 4.5   Total accuracy reach by each classiﬁer, in the table are reported mean,
       maximum, minimum, standard deviation and the time needed for cross-
       validation using 5 k-folds. . . . . . . . . . . . . . . . . . . . . . . .      47

 5.1   Spearman’s correlation coefﬁcient (ρs ) computed between the average
       value of the concentration of NO2 and different geographical factors,
       In this Table are reported selected geographical factors with a moder-
       ately or strong positive or negative Spearman’s correlation coefﬁcient
       ρs > 0.4 ∨ ρs < −0.4. . . . . . . . . . . . . . . . . . . . . . . . . . .      58
XVI                                                                      L IST OF TABLES


  5.2   Root Mean Square Error (RMSE) computed between the average value
        of the concentration of NO2 and the regression value assess using a
        left-one-out cross-validation method for each machine learning algo-
        rithm. In this Table are reported the mean value the maximum, mini-
        mum, standard deviation and the computational time needed for cross-
        validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   58

  8.1   Pros and Cons of the methodology described in this manuscript. . . .          76
Preface

All of the work presented henceforth was conducted between the Department of Civil
and Environmental and Mechanical Engineering at the University of Trento, the Centre
for Ecology and Hydrology (CEH) of Edinburgh and the European Research Academy
(EURAC) of Bolzano. The ﬁrst three years of the PhD. were funded by Dolphin S.R.L.
three months were covered by CEH and the last year by EURAC. All projects and as-
sociated methods were discussed with and approved by my supervisors: Marco Ciolli
and Marco Ragazzi.
   The work is split in seven main chapters sequenced to logically represent the of
investigation I followed during my study. They can be read sequentially or separately.

Chapter 1 introduces the problem of the spatial variability of air pollutant character-
      ization, providing an overview of the state of the art in terms of technology and
      modelling;

Chapter 2 focuses on emerging low-cost sensors for outdoor quality monitoring and
      describes the result of a two-week campaign for Carbon Monoxide (CO), which
      we carried out in the city of Trento (Italy). It explains the limits and potential of
      this technique;

Chapter 3 introduces PyGRASS, an object-oriented python library for GRASS GIS,
      which I developed speciﬁcally to facilitate the following phases of data analysis
      and classiﬁcation;

Chapter 4 describes in detail the methodology used to classify urban micro-environments
      combining Geo-Object-Based Image Analysis (GEOBIA) and machine-learning
      algorithms;

Chapter 5 ranks and compares the correlation between geographical factors and mea-
      sured data of a six-week campaign collecting Nitrogen Dioxide (NO2 ) concen-
      trations, using passive sample, in the city of Edinburgh. It explains the method-
      ology followed to assess NO2 concentrations combining machine-learning re-
      gression methods and geographical features;
XVIII                                                                   L IST OF TABLES


Chapter 6 takes advantage of the results of the modelling, proposing three different
        scenarios for improving the air quality in the neighbourhood of a kindergarten
        in Edinburgh;

Chapter 7 describes the main advantages and disadvantages of the presented method-
        ology and the limits of its applicability in other contexts.

   A version of Chapter two has been published in: Rada E.C., Chisté A., Schiavon
M., Ciolli M., Zambelli P., Brini M., Marmo L., Chelodi M. 2012 “A contribution to
the development of a public – private integrated network of wireless sensors for an en-
hanced management of air quality” in Sustainable Technology for Environmental Pro-
tection, Milano, 2012, p. [1-4]. - ISBN: 9788890355721. Proceedings: SIDISA2012,
Milano, 26-29 June 2012. and Rada E.C., Ragazzi M., Brini M., Marmo L., Zambelli
P., Chelodi M., Ciolli M. 2012 “Perspectives of low-cost sensors adoption for air qual-
ity monitoring” in Scientiﬁc Bulletin - “Polithenica” University of Bucharest. Series
D, Mechanical Engineering, v. 74, n.2, p 241-250., I was responsible for the data anal-
ysis and comparison between data coming from the ofﬁcial measurement stations and
low-cost sensors.
   A version of Chapter three has been published in: Zambelli P., Gebbert S., Ciolli M.
2013 “Pygrass: an object-oriented Python API for GRASS GIS”. ISPRS International
Journal of Geo-Information 2, p. 201–219. doi:10.3390/ijgi2010201. This part of the
work has been funded by Google within the Google Summer of Code 2012. I wrote
the code and built the main benchmarks, S. Gebbert was the supervisor author of the
code and M. Ciolli the supervisor for the manuscript composition.
   A version of Chapter four has been presented at Zambelli, P., Vieno, M., Reis,
S., Ragazzi, M., Ciolli, M., 2013. “Application of machine-learning techniques for
the characterization of urban micro-environments to assess spatial variability of air
pollutants”. Urban Environmental Pollution 2013 Asian Edition (UEP2013), Cre-
ating Healthy, Liveable Cities, 17-20 November 2013, Beijing, China and Reis, S.,
Zambelli, P., Vieno, M. and Reis, S. 2013 “Developing a virtual observatory for per-
sonal exposure assessment”, 20th International Congress on Modelling and Simula-
tion(MODSIM2013), Adelaide, South Australia, from Sunday 1 to Friday 6 December
2013. At the moment a version of the chapter is under review in Zambelli, P., Vieno,
M., Reis, S., Neteler, M., Ragazzi, M., Ciolli, M. XXXX. “Machine learning classiﬁ-
cation of urban micro-environment: the case of Edinburgh, Scotland” under revision
in Transaction in GIS. Concerning the characterization and extraction of urban micro-
environments I was the lead investigator, responsible for all major areas of concept
formation, data collection and analysis, as well as the bulk of the manuscript com-
position. Marco Ciolli provided the contact with CEH researchers M. Vieno and S.
Reis who were involved in the early stages of concept formation and contributed to
L IST OF TABLES                                                                     XIX


manuscript edits. M. Neteler provided ideas for data analysis and M. Ciolli and M.
Ragazzi were the supervisory authors.
    I was the lead investigator for the projects found in Chapters ﬁve and six, where I
was responsible for all major areas of concept formation, data collection and analysis,
as well as a majority of the manuscript composition. This part of the work has not been
published yet, but a version of Chapter 4 is under development in: Zambelli, P., Vieno,
M., Reis, S., Heal, M.R., Lin, C., Ragazzi, M., and Ciolli, M. XXXX. “Characteriza-
tion of urban micro-environment for NO2: the case of Edinburgh, Scotland”. Heal
and Lin were responsible for the design of the measurement campaign with Passive
Samplers and data collection. I participated in one of the six weeks of data collection.
I was responsible for the data and GIS analysis as well as a majority of the manuscript
composition.
    Marco Ciolli and Marco Ragazzi were the supervisory authors on this project and
were involved throughout the project in both concept formation and manuscript edits.
Summary

Recent epidemiological studies have reinforced the link between short and long-term
exposure to air pollutants and adverse effects on public health especially over the
weaker part of the population, like children and older adults. The creation of simple
tools to locate sensible areas as well as of dedicated Spatial Decision Support System
(SDSS) to improve the management of pollution risk areas system is strongly advised.
    The aim of this work is to develop a SDSS methodology, based on easy to ﬁnd data
and usable by decision makers, to assess and reduce the impact of air pollutants in a ur-
ban context. To achieve this goals I tested the exploitability of a set of low-cost sensors
for outdoor air quality monitoring, I characterized the urban micro-environments and
the spatial variability of air pollutants using remote sensing compared to ﬁeld data and
eventually I developed a SDSS to improve the public health designing and comparing
different scenarios.
    The city centre of Edinburgh has been used as study case for the purposed method-
ology. To test the reliability and applicability of low cost sensors as proxies for remote
sensed data, we conducted a measurements campaign to compare the observed data
between an ofﬁcial measurements station (OMS) in Trento (Italy) and electrochemical
and thick ﬁlm sensors respectively of Carbon Monoxide (CO) and Ozone (O3 ). Due to
data quality and availability we decided to characterize the urban micro-environments
of Edinburgh (Scotland, UK) in eight main classes (water, grass, vegetation, road, car,
bus, buildings and shadow) combining the Geographic Object-Based Image Analysis
(GEOBIA) with Machine Learning algorithms to process the high resolution (0.25m
x 0.25m) RGB aerial ortho-rectiﬁed images. This land-use characterization combined
with other geographical informations, like the classiﬁcation of the roads and the ur-
ban morphology, were compared with 37 Nitrogen Dioxide (NO2) concentration data,
collected using passive tubes during a six week campaign of measurements conducted
by the school of Chemistry of the University of Edinburgh. I developed a new open-
source GIS python library (PyGRASS), integrated in the stable release of GRASS GIS,
to speed-up the prototyping phase and to create and test new GIS tools and methodolo-
gies. Different studies on SDSS were carried out to implement procedures and models.
Based on these models and data all the factors (land-use, roads and geo-morphological
XXII                                                                      L IST OF TABLES


features) were ranked to identify which are driving forces for urban air quality and to
help decision makers to develop new policies.
    The sensor tested in Trento revealed an evident drift in measurement residues for
CO, furthermore the measurements were also quite sensitive to external factors such as
temperature and humidity. Since these sensors required frequent recalibration in order
to obtain reliable results, their use was not as low-cost as expected. The characteri-
zation of urban land-use in Edinburgh with GEOBIA and machine learning provided
an overall accuracy of 93.71% with a Cohen’s k of 0.916 using a train/test dataset of
9301 objects. The NO2 data conﬁrm the assumption that air concentration is strongly
dependent on geographical position and it is strongly inﬂuenced by the position of the
pollutant’s source. Using the results of the tests and remote sensing analysis, I devel-
oped an SDSS. Starting from the current situation, I designed three scenarios to assess
the effect that different policies and actions could have on improving air quality at on
the local and district level.
    The outcomes of this work can be used to deﬁne and compare different scenarios
and develop effective policies to reduce the impact of air pollutants in an urban context
using simple and easy to ﬁnd data. The GIS-based tool can help to identify critical
areas before deploying sensors and splitting the study area in homogeneous micro-
environments clusters. The model is easy to expand following different procedures.
Chapter 1


Introduction

1.1 Status quo, context and motivations


An increasing body of literature underlines how air pollution has a negative inﬂuence
on human health (Curtis et al., 2006; Janssen et al., 2013) in particular for weaker
parts of the population such as foetuses, children and older adults. The effects of pol-
lutants are more evident and temporally much closer to children due to their higher
metabolism than adults (Coneus and Spiess, 2012). Recent studies highlight that ma-
ternal exposure to pollutants increases adverse pregnancy outcomes (Dadvand et al.,
2011; Gomez-Mejiba et al., 2009), hence there is great interest from the science com-
munity to improve knowledge of the consequences of air pollution exposure on child
health and deepen the environmental injustice implications (Jephcote and Chen, 2012;
Meijer et al., 2012; Schoolman and Ma, 2012; Stuart and Zeager, 2011).

    The World Health Organisation (WHO) assesses that around 7 million of prema-
ture deaths are due to air quality, in particular around 3.7 million are due to outdoor
concentrations of air pollutants (World Health Organization (WHO), 2014a). The
pollutants of major public health concern include particular matter (PM10 , PM2.5 ,
PM0.1 ), carbon monoxide (CO), ozone (O3 ), nitrogen dioxide (NO2 ) and sulphur diox-
ide (SO2 ). Reducing emissions from stationary sources (e.g. power plants); reducing
emissions of road trafﬁc sources; improving waste management; reducing energy con-
sumption and increasing production from renewable energies are all examples of suc-
cessful emission control policies (World Health Organization (WHO), 2014b). How-
ever, exceeding air quality limit values, in particular in urban areas and near busy
roads, is still an open issue in many parts of the world.
2                                                                          1 I NTRODUCTION


1.1.1 Monitoring sensor network
The main technologies available to characterize the space and time variabilities of air
pollutants in an urban context are: Ofﬁcial Measurements Stations (OMS), passive
samplers, low-cost sensors, moving sensors and numerical models. All these solutions
are different in terms of: cost, accuracy, spatial and time resolution.
    The OMS are characterized by a very high accuracy and time resolution provid-
ing data in real-time, but also by high investments and maintenance costs, limiting
the number of observation points and therefore limiting the spatial resolution. As ev-
idenced by Sheppard et al. (2005) using central site monitor data will affect health
effect estimates. Traditional technologies provide poor information on the spatial con-
centration variability of air pollutants especially in an urban context.
    Passive samplers (Kardel et al., 2012; Sally Liu et al., 2012), and bio-indicators
(Ram et al., 2012; Salo et al., 2012) such as lichens (Käffer et al., 2012; Llop et al.,
2012; Salo et al., 2012; Stevens et al., 2012) increase the number of measurement
points improving the spatial characterization of air pollutants, but generally with a
lower accuracy and time resolution and requiring more time than other systems to
collect and analyse the samplers.
    Recent studies have started to investigate the use of low-cost sensors for air qual-
ity monitoring (De Vito et al., 2011; Hu et al., 2011). As highlighted in Rada et al.
(2012) combining the information coming from the Wireless Sensor Network (WSN)
can improve characterization of air pollutant concentrations in the micro-environment
in both spatial and temporal variability. A good Sensor Network is necessary (Borrell,
2011) for assessing an emerging scientiﬁc concept deﬁned by Wild (2005) as expo-
some. Exposome is a term used to represent all kind of exposure (including exposure
to diet, lifestyle, and endogenous sources) and aims to identify the combined effects of
genetic and environmental factors on chronic diseases (Gasiewicz, 2010; Hamzelou,
2011).
    An emerging monitoring technique is to use moving sensors; this system couples
sensors characterized by a high degree of accuracy and time resolution with a moving
platform (e.g. taxi, tram, bus, etc.) to be able to increase the spatial resolution of the
study area. What it is challenging in this approach is handling and interpolating data
values that are changing in space and time.


1.1.2 Model spatial variability
Steinle et al. (2013) highlight that the spatio-temporal variability of personal expo-
sures in urban micro-environments is very high. Fixed monitoring stations are not
sufﬁciently informative to characterize this variability, and recent studies are moving
from static monitoring to spatio-temporally resolved personal exposure assessment,
1.1 S TATUS QUO ,   CONTEXT AND MOTIVATIONS                                            3


collecting data of air pollutants concentrations while people are conducting their daily
activities. However it is not possible to measure everything everywhere, therefore we
need to identify a methodology to model personal exposure in highly variable urban
concentration ﬁelds. Many studies conﬁrm that the spatial temporal variability of am-
bient air pollution could affect the assessment of health effects on the population (EPA,
2010) and try to characterize the variability of micro-environments (Can et al., 2011;
Dons et al., 2011; Mölter et al., 2012; Richmond-Bryant et al., 2011).


    In most of the used cases, measuring concentrations of air pollutants is not enough
to assess the exposure of a population in a certain area. To transform the information
points from points to a surface, several methods are available; the most used in air
quality literature are: the Kriging interpolation method (Baume et al., 2011; Beelen
et al., 2009; Briggs et al., 2000; Guo et al., 2007; Janssen et al., 2008; Marchetti et
al., 2011; Motaghian and Mohammadi, 2011; Pearce et al., 2009; Shad et al., 2009;
Singh et al., 2011; Vienneau et al., 2009); the Land Use Regression (LUR) method
(Arain et al., 2007; Beelen et al., 2010; Crouse et al., 2009; Hoek et al., 2008; Johnson
et al., 2010; Mölter et al., 2010; Mukerjee et al., 2009; Ross et al., 2007; Su et al.,
2008b); and numerical dispersion modelling (Allwine et al., 2006; Berkowicz et al.,
2008; Carruthers et al., 2000; Kesarkar et al., 2007).


    The ﬁrst two methodologies are based on geo-statistical analysis, the third one
resolves numerically the physical equations of air pollutants dispersion. If enough
information is available concerning emissions and meteorology, dispersion models are
well suited to short and long term exposure modelling. Many different dispersion
models have been developed, but in broad terms, few of them are able to provide a
detailed map over a large study area. The most used dispersion models in literature are:
ADMS-Urban (Carruthers et al., 2000), AERMOD (Kesarkar et al., 2007), DUSTRAN
(Allwine et al., 2006) and OSPM (Berkowicz et al., 2008). As highlighted by Gulliver
and Briggs (2011) three main limiting factors characterize the applicability of these
models: they require a huge amount of data (e.g. detailed data on source emissions and
boundary-layer meteorology), they are often expensive, they do not simultaneously
deal with large numbers of emission sources and generally tens of model runs are
required to map air pollutants at high spatial resolution (< 50m).


    The Land Use Regression method demonstrates good results on pollutant interpo-
lations particularly to model the spatial variation of annual average concentrations of
Particular Matter (PM) (Eeftens et al., 2012), Black Carbon (BC) (Dons et al., 2013),
Nitrogen Dioxide (NO2 ) (Mukerjee et al., 2009).
4                                                                        1 I NTRODUCTION


1.1.3 Decision support system and air quality

From a literature review, it seems that not many Decision Support Systems (DSS)
have been developed to help urban dwellers, planners and policy-makers to test and
compare different scenarios. Most of these DSSs focus on the national/regional scale
with a spatial resolution that is not suitable/adaptable to an urban scale (Carnevale et
al., 2012a,b; Elbir et al., 2010; Finardi et al., 2008; Gidhagen et al., 2013; Vedrenne
et al., 2014). Vlachokostas et al. (2011) developed a multi-criteria methodological
approach to managing air pollution at an urban scale but the spatial variable is not
taken into account in the analysis. Mavroulidou et al. (2004) developed a DSS using an
interaction matrix to assess qualitatively the concentration of air pollutants at an urban
scale and provide a vulnerability map, but this ﬁnal map still had quite low spatial
resolution. González et al. (2013) presented a more general approach to developing a
DSS for a sustainable urban metabolism that considers the problem in broader terms
(e.g. social quality of life, economic performance, environmental protection).


1.2 Aims and objectives
This dissertation aims to develop a method to derive very high-resolution urban con-
centrations for personal exposure assessment, reﬂecting the spatio-temporal variability
of urban pollution ﬁelds. The research questions addressed in this dissertation are the
following:

    • Are low-cost sensors ready for outdoor air quality monitoring?

    • How can we characterize different urban micro-environments for air pollutants?
      Which geographical factors are more important?

    • Can we enhance the Land Use Regression (LUR) method to take into account
      and integrate information on micro-environments and other factors that are drivers
      of air pollutant concentrations in an urban context?

    • Can we develop a Spatial Decision Support System (SDSS) to assess and com-
      pare the impact of different policies and scenarios of urban management?


1.3 Methodology
During the PhD I have contributed to select areas that are suitable for the design of low-
carbon settlements using a Spatial Multi Criteria Analysis (SMCA) (Vettorato et al.,
2011) and developed a Spatial Decision Support System (SDSS) to assess the forestry
biomass potential for energy exploitation (Sacchelli et al., 2014; Sacchelli et al., 2013;
1.4 F RAMEWORK                                                                           5


Zambelli et al., 2012). Starting from this experience I developed an SDSS to assess
the impact on air pollutant concentrations of different policies and scenarios. But in
order to develop an SDSS, several intermediate steps needed to be addressed.

   • Low-cost sensors were compared with measurements from the Ofﬁcial Mea-
      surement Station (OMS) to test reliability of this emerging technology.

   • The urban micro-environments were extracted from high resolution aerial im-
      ages.

   • The correlation between geographical factors and pollutants concentration was
      veriﬁed using measured data of NO2 using passive samples.

   • The measured and geographical data were used to regress the concentration
      where the measurements were not available.

   • Finally, different scenarios were made, changing the urban land use to mimic
      the effect of different policies and management planning decisions.


1.4 Framework
The GIS used to develop this work is the Geographic Resources Analysis Support
System (GRASS), an open source software that supports creation, modiﬁcation and
processing of 2D and 3D raster and vector layers. It provides a topological vector
model and true three dimensional coordinates for vector features analysis. GRASS is
characterized by stability, an efﬁcient application programming interface (API) written
in C, and a large number of GIS functions and modules (Neteler et al., 2012). Its
capabilities of processing geographical information have been evaluated and validated
by many research and technical papers (Ciolli et al., 2004; Hoﬁerka and Zlocha, 2012;
Li et al., 2010; Okabe et al., 2009; Preatoni et al., 2012; Sacchelli et al., 2013; Tattoni
et al., 2010, 2012; Vettorato et al., 2011; Zambelli et al., 2010, 2012).
    I developed a new python interface (PyGRASS) to expand GRASS capabilities
implementing a tool that gives the freedom to approach the GIS problem from a dif-
ferent perspective, opening the software developer’s approach to GIS users and trying
to maintain relative simplicity. As shown in Zambelli et al. (2013), the new Object-
Oriented Python programming API introduces an abstract layer that opens the possi-
bility for users who are not familiar with C or with GRASS C-API to use and access
transparently the efﬁcient C functions of GRASS.
    The PyGRASS library has been designed to facilitate management of complex
problems that need the integration of large data sets and geographical data, such as
air quality, data monitoring and modelling, and to integrate/develop new methods and
6                                                                    1 I NTRODUCTION


functions extending the GRASS functionalities. PyGRASS supplies a new interface
to prototype complex scientiﬁc algorithms and simplify the interoperability with all
related geospatial software and tools provided by a Python interface. The development
of PyGRASS has been sponsored by Google’s Summer of Code program (2012), and
the code produced has been integrated in the current stable release: GRASS7.
    To characterize urban micro-environments, I extracted information from high-
resolution aerial photography, and I tested and used some Machine Learning algo-
rithms for image classiﬁcation and for the regression used in the LUR. The libraries
used for this analysis are sklearn and mlpy. Both libraries provide a wide range of
state-of-the-art machine learning methods for supervised, and unsupervised problems
for regression and classiﬁcation (Albanese and Visintainer, 2012; Pedregosa et al.,
2011).
Chapter 2

Low-cost sensors: a new paradigm
for environmental monitoring

        low-cost sensors open new paradigms for air quality monitoring. Their
        performance allows a new strategy closer to the population and its health.
        Critical situations that cannot be seen with conventional approaches can
        emerge through a low-cost sensors network. In this chapter a comparison
        between electrochemical sensors and ofﬁcial measurement stations (OMS)
        data is reported. Low-cost sensors can not be used straight-forwardly in
        outdoor environments to measure environmental concentrations of air pol-
        lutants. An accurate characterization of sensor behaviour in local condi-
        tions during the design phase of the network and the post-processing of the
        collected data is required. A low-cost sensor network can help detect dan-
        gerous peaks in concentration and provide an early warning system. This
        system can be used to drive and develop local strategies to reduce the ad-
        verse effects of air pollutants.


2.1 Introduction
In recent years the European Union regulation of air quality management has reached
important results in term of exposure and health implications of organic and inorganic
pollutants, and in terms of environment protection. However many actions remain
to be developed mainly in urban areas, the general trend is towards an average im-
provement of air quality, with positive consequences on population health. Generally,
adopted regulations for air quality management are based on the concept of protecting
the environment without facing critical situations at micro-scale, where human expo-
sure to atmospheric pollutants can reach unacceptable concentration levels.
   Traditionally, air quality measurement networks take into account two instruments:
the ﬁrst is characterized by very high accuracy and resolution, providing data in real-
time, but on the other hand, requiring high investment and maintenance costs (Shep-
8                 2 L OW- COST SENSORS :   A NEW PARADIGM FOR ENVIRONMENTAL MONITORING


pard et al., 2005); the second technique is based on passive samplers (Kardel et al.,
2011; Sally Liu et al., 2012), bio-indicators (Ram et al., 2012) such as lichens (Llop
et al., 2012; Stevens et al., 2012) that increase the number of measurement points but
only provide information on average concentrations. Therefore traditional monitoring
and environmental warning systems are often insufﬁcient for planning detailed correc-
tive actions and quickly identifying critical situations that are potentially harmful to
public health.

    Many studies point out how the spatial and temporal variability of ambient air
pollution could affect the assessment of health effects on the population (USEPA -
U.S. Environmental Protection Agency, 2010). As highlighted by Dons et al. (2011)
and Richmond-Bryant et al. (2011) some micro-environments can be characterized by
signiﬁcant variability.

    The existing regulatory monitoring network provides punctual information on the
spatial and temporal concentration of air pollutants. The main factors driving space-
time variability are: pollutant sources, meteorological conditions, building and vegeta-
tion topography and different land-use. Due to the higher number of factors some local
temporary anomalies in air pollution cannot be detected by the conventional regulatory
approach; new sensor networks can help to characterize this spatial variability.

    New developments in wireless communications and micro-electro-mechanical tech-
nology have enabled the development of low-cost and multifunctional sensors for ap-
plication not only in the industrial sector (Flammini et al., 2009) but also in the agri-
cultural and environmental sphere (Alemdar and Ersoy, 2010; Burgess et al., 2010;
Cao et al., 2008; Hu et al., 2011; Rawi and Al-Anbuky, 2011; Wang et al., 2006).

    In this context, Wireless Sensor Networks (WSNs) can effectively improve char-
acterization of the variability of air pollutants in space and time and consequently
improve the life hazard assessment. This is why recent studies have started to investi-
gate the use of WSNs for air quality monitoring (Hu et al., 2011; Merbitz et al., 2012).
However, protocols and sensors are extremely new, and much research remains to be
done to integrate and test these technologies in the outdoor environment.

    This chapter analyses the opportunities and challenges of low-cost monitoring net-
works for improving control of the human health risk from atmospheric pollutants. The
chapter presents the results of some preliminary tests to build an outdoor air quality
monitoring system based on WSNs technology aimed at detecting hotspots and peaks
of pollutant concentrations (NO2 , CO and O3 ). The proposed network is not designed
to substitute traditional monitoring systems but aims to detect peak concentrations of
pollutants, to provide a new instrument for public decision makers, and to minimize
the impact of air pollution on the population.
2.2 M ATERIALS AND METHODS                                                                     9


             Critical case                    Selected pollutants                     Notes
             Kindergartens                         NO2                    CO could be added
             Street canyon                         NO2                    CO could be added
             Proximity to high trafﬁc roads        NO2
             Proximity to urban tunnels            NO2
             Proximity to urban canyons            NO2
             Proximity to Industrial plants       CO, NO2           Depending on the process
             Summertime                             O3


                    Table 2.1: Selected pollutants for each critical case.


2.2 Materials and methods
A few micro-scale critical situations were selected pointing out peak values of O3 ,
NO2 , and CO, that could be reached and the potential effects on health, in order to
develop strategies and policies to improve the status of air quality and comply with
National and European legislations. Generally critical situations could be found as
follows: in the yards of kindergartens and schools (when an important road is present
in the proximity); in street canyons (when the ﬂux of trafﬁc is critical); in residential
areas close to highways, tunnels or above trenched roads, and in the proximity of large
industrial units.
    Some low-cost sensors were selected to check their viability to act as sentinels
where the conventional approach of air quality monitoring cannot guarantee a high
spatial density of measurements.
    The tests were conducted during two weeks from 14/12/2011 until 27/12/2011 in
a suburban area of Trento (via Bolzano, Gardolo).
    The implementation of the tested WSNs was provided by ENVEVE SA and we
used electro-chemical sensors as a tool to evaluate and characterize micro-environment
concentrations in both spatial and temporal variability. A mixed network composed of
two different sensors characterized by different accuracy/sensitivity for CO, O3 and
NO2 , and optional sensors for the measurement of such physical variables as tempera-
ture, humidity, wind speed and direction, have been designed for this research.
    During the campaign the low-cost sensors were coupled with an ofﬁcial measure-
ment station, and we tested only CO and O3 .


2.3 Results and discussion
For each critical case, different pollutants were selected: NO2 , CO and O3 . Other
pollutants were excluded, due to their power consumption that seemed ill-suited for a
wireless sensor network (e.g. PM10 ). The NO pollutant was not taken into account as
non-toxic. Large industrial plants are required to analyse the production process of air
10                   2 L OW- COST SENSORS :   A NEW PARADIGM FOR ENVIRONMENTAL MONITORING


       Company               NO2                       CO                         O3
       law limits    0.1@1h & 0.02@8760h             8.7@8h                    0.06@8h
                    min    max   resolution   min   max    resolution   min    max    resolution
             e2v    0.05   5     n.a.         0     500    1            n.a.   n.a.   n.a.
     alphasense     0.02   50    0.015        0     2000   0.02         0      10     0.01
         Nemoto     0      20    0.2          0     1000   5            n.a.   n.a.   n.a.
   Xcell Sensors    0      50    0.1          0     1999   1            n.a.   n.a.   n.a.
 SGX sensortech     0      20    0.1          0     500    1            n.a.   n.a.   n.a.
          Dräger    0      50    0.1          0     2000   1            0      10     0.01
      veris.com     0      10    0.1          0     200    1            n.a.   n.a.   n.a.
       City-tech    0      50    0.1          0     500    1            0      2      0.02
       Euro-gas     0      20    0.1          0     200    n.a.         0      3      n.a.


Table 2.2: Comparison of different sensors concerning range and resolution for air
pollutants: NO2 , CO, O3 , all units are in ppm.


Figure 2.1: Comparison of ofﬁcial measurement station (blue) and low-cost sensor
data (green) multiplied by a factor of 1.25.


pollutants and the way of they are released into the atmosphere, because both factors
can change signiﬁcantly the concentration mix and level of air pollutants. The selected
pollutants for each critical case are shown in Table 2.1.
     Looking at low-cost sensor available on the market (see Table 2.2), it is evident
that to measure outdoor concentration of NO2 and O3 is challenging, due to their low
resolution and because they are working always close to their detection limit.
     The comparison between measurements of O3 from the low-cost sensor and the
OMS data highlight a malfunctioning of the sensor which makes the low-cost sensor
data unsuitable for environmental monitoring. Thus, only data analysis for the CO will
be presented.
     Figure 2.1 shows the comparison between concentration data for CO collected
using the ofﬁcial measurement station and the low-cost sensor data multiplied by a
factor of 1.25 to reduce measurement shift.
     To analyse the behaviour of the electrochemical sensor in Figure 2.2 the residuals
between the two instruments are reported. The ﬁgure clearly shows a strong drift in the
data collected by the low-cost sensor; furthermore the electrochemical sensor requires
a quite long spin-up time (∼ 1 week) with a signiﬁcant standard deviation of ∼ 0.29 mg
2.4 C ONCLUSIONS                                                                         11


     Figure 2.2: CO residuals between OMS measurements and low-cost sensor.


that became ∼ 0.05 mg in the second week (see Figure 2.3).


               (a) First week                                  (b) Second week

Figure 2.3: Comparison of the residuals with removed drift for the ﬁrst and second
week of measurements.


2.4 Conclusions
The emerging technology of low-cost sensors can contribute to improving the charac-
terization of spatial variability of air pollutants in a complex context (e.g. city centre).
However, current technologies available on the market are designed to monitor the
12                2 L OW- COST SENSORS :   A NEW PARADIGM FOR ENVIRONMENTAL MONITORING


concentration of air pollutants in industrial and indoor environments. Further tests
are required to prove the exploitability of these technologies for outdoor air quality
monitoring. Low-cost sensors for NO2 , CO and O3 can be used to provide qualitative
information about air quality and to detect peaks, whereas low-cost sensors for PM10
are still not available (Rada et al., 2012).
     The tests conducted in this work highlight a high cross-sensitivity of the low-
cost sensor to other pollutants and micro-climate parameters (e.g humidity) and a low
signal-to-noise ratio (S/N). Therefore, sensor network design and the post-processing
phase requires accurate testing at local conditions.
     As highlighted by Ragazzi et al. (2012), in spite of all the difﬁculties and chal-
lenges, low-cost sensors could lead to a new monitoring paradigm where private cit-
izens can contribute toward integrating air quality monitoring systems. For example:
enterprises could be interested in monitoring the air quality around plants to certify
their activity to the community, while private citizens, grouping in committee, could
support the air quality monitoring of their neighbourhood. Offering a higher-spatial
resolution with respect to ﬁxed air quality stations could be considered a useful tool
in assessing population exposure near relevant pollutant sources (e.g. trafﬁcked roads,
domestic wood burning plants, ﬁlling stations placed in populated areas) located in re-
gions with complex landforms or simply far away from traditional monitoring stations.
This scenario, however, also raises some new issues that must be taken into account,
such as: how can we rely on data collected from entrepreneurs or citizens? How can
we validate or exclude these data? How can we homogenize information coming from
different networks with heterogeneous sensors? How can we collect and organize these
data coming from an increasing number of stakeholders and institutions?
     To be able to address the above questions, the solution could be to mix and inte-
grate different technologies such as: low-cost sensor networks, traditional air quality
monitoring stations and protocols such as the Sensor Observation Service (SOS) de-
ﬁned by the Open Geospatial Consortium (OGC) to collect/ﬁlter and query all these
data. An integrated system, that implements the above mentioned standards, promises
to reach several advantages compared to traditional air quality monitoring systems
and institutions: allows reaching higher spatial accuracy; reduces the redundancy of
measures by different network systems; improves the localization of critical pollutant
concentrations; reduces the cost of improving spatial resolution of the data; and allows
the creation of a real time alert system for dangerous pollutants.
Chapter 3

PyGrass: a high level GIS library
for research and rapid prototyping

        PyGRASS is an object-oriented Python Application Programming Interface
        (API) for Geographic Resources Analysis Support System (GRASS) Geo-
        graphic Information System (GIS), a powerful open source GIS widely used
        in academia, commercial settings and governmental agencies. We present
        the architecture of the PyGRASS library, covering interfaces to GRASS
        modules, vector and raster data, with a focus on the new capabilities that it
        provides to GRASS users and developers. Our design concept of the module
        interface allows the direct linking of inputs and outputs of GRASS modules
        to create process chains, including compatibility checks, process control
        and error handling. The module interface was designed to be easily ex-
        tended to work with remote processing services (Web Processing Service
        (WPS), Web Service Deﬁnition Language (WSDL)/Simple Object Access
        Protocol (SOAP)). The new object-oriented Python programming API in-
        troduces an abstract layer that opens the possibility of using and accessing
        transparently the efﬁcient raster and vector functions of GRASS that are
        implemented in C. The design goal was to provide an easy to use but pow-
        erful Python interface for users and developers who are not familiar with
        the programming language C and with the GRASS C API. We demonstrate
        the capabilities, scalability and performance of PyGRASS with several ded-
        icated tests and benchmarks. We compare and discuss the results of the
        benchmarks with dedicated C implementations.


3.1 Introduction
Geographic Information Systems (GIS) have the capability to integrate heterogeneous
digital data, giving the opportunity to public administration, industry and research to
provide basic and advanced data analysis and modeling for a wide range of disci-
plines (Foody, 2008). The Geographic Resources Analysis Support System (GRASS)
14           3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


supports the creation, modiﬁcation and processing of 2D and 3D raster and vector lay-
ers. It provides a topological vector model and true three dimensional coordinates for
vector features analysis. GRASS is characterized by stability, an efﬁcient application
programming interface (API) written in C, and a large number of GIS functions and
modules (Neteler et al., 2012). GRASS provides a large number of models and al-
gorithms which, after substantial testing and trouble shooting, have proven to be very
reliable. Its capabilities of processing geographical information have been attested by
many research and technical papers (Ciolli et al., 2004; Hoﬁerka and Zlocha, 2012; Li
et al., 2010; Okabe et al., 2009; Preatoni et al., 2012; Sacchelli et al., 2014; Sacchelli
et al., 2013; Tattoni et al., 2010, 2012; Vettorato and Zambelli, 2009; Vettorato et al.,
2011; Zambelli et al., 2010, 2012).

     GRASS GIS has a modular design. The core functionalities are implemented
in shared libraries using the programming language C and can be accessed via the
GRASS C API. This API provides read and write access to raster, 3D raster and vector
data, as well as the handling of projection information, spatial and attribute database
management, spline interpolation, mathematical and numerical functionalities and vi-
sualization functionalities (see Table 3.1). Spatial algorithms and models are imple-
mented as small stand-alone programs, called modules, that make use of the C API.
The implementation of GRASS modules follows the UNIX concept. Hence, each
module in GRASS has a dedicated purpose and is efﬁciently implemented. Modules
can be combined, similar to the UNIX tool concept. Since the early days of GRASS
in the 80s, the UNIX shell was used to combine GRASS modules and UNIX tools
to script repetitive tasks and to implement complex spatial analysis and processing
algorithms. This concept results in a large amount of over 400 modules. Most of
them are implemented in C. A sufﬁcient amount is implemented as scripts using either
POSIX (Portable Operating System Interface; deﬁnes a standard operating system in-
terface and environment, including a command interpreter (or “shell”), and common
utility programs to support applications portability at the source code level. Scripts are
POSIX-based until version 6 of GRASS GIS or Python, as in the latest stable release
of GRASS.

     Many GIS software packages have chosen Python as a main language for users (see
Table 3.2), because it is available on many platforms, and it seems to be a good com-
promise between simplicity (syntax, low learning curve), ﬂexibility (multi-paradigm
programming) and power (due to rich scientiﬁc libraries). GRASS developers have
chosen Python to replace POSIX for scripting modules (Neteler et al., 2012). For
this purpose, a Python scripting library was implemented by the GRASS development
team, but it does not provide any further improvement to the POSIX approach than
managing process chains using the standard Python library (subprocess).
3.1 I NTRODUCTION                                                                                     15


                              ctypes       n. funcs     n. structs    n. vars
                              gis                501           20        175
                              raster             372           26         37
                              vector             344           66        147
                              dbmi               333           19        100
                              ogsf               331           33        151
                              raster3d           245           20         52
                              gmath              130            1         21
                              display            120           15         21
                              imagery            100           15         16
                              nviz                82            8         30
                              date                63            1         17
                              vedit               22           49         43
                              cluster             19            3         10
                              stats               19            0         10
                              proj                17            9         16
                              arraystats           0            1         10
                                                2,698         286        856


Table 3.1: GRASS C API consists of 2,698 C functions that are available through the
ctypes library, divided into 10 different ﬁelds. This data is derived from the ofﬁcial
GRASS source code (Zambelli et al., 2013).


     Software        write             script            license        use     OS
     ArcGIS          C++             python             proprietary   desktop   Windows
     AutoCAD Map    C/C++           AutoLisp            proprietary   desktop   Windows
     Geoserver       java          python(dev)             GPL         server   Windows, Mac, Linux
     GRASS7           C              python                GPL        desktop   Windows, Mac, Linux
     gvSIG           Java             jython               GPL        desktop   Windows, Mac, Linux
     IDRISI          COM             python             proprietary   desktop   Windows
     ILWIS           C++             python                GPL        desktop   Windows
     Geomedia       C/C++            python             proprietary   desktop   Windows
     MapInfo        C/Basic      MapBasic, python       proprietary   desktop   Windows
     Mapserver      C/C++            python               X/MIT        server   Windows, Mac, Linux
     QGIS            C++             python                GPL        desktop   Windows, Mac, Linux
     Saga-GIS        C++             python                GPL        desktop   Windows, Mac, Linux
     Udig            Java            groovy               LGPL        desktop   Windows, Mac, Linux


    Table 3.2: Comparison of the most used GIS software (Zambelli et al., 2013).


   Most GIS software and tools provide a large number of high-level algorithms to
cover different GIS processing needs. Few GIS open capabilities to users to access the
lower functionalities, such as iterating between the geometry features of a vector map,
or iterating row by row to a raster map using a higher-level language.
   GRASS modules must be implemented in C to access the low level functionality.
To overcome this limitation and to reach a broader development community, a ctypes
interface was introduced to GRASS version 7. This interface allows access to the low
level GRASS C API in Python. However, the creation of new modules written in C
or using the C API with Python through the ctypes interface is not a trivial task and
is generally a very time-consuming activity. This happens because the writer must be
a competent C programmer (manage the computer memory, work with pointers, etc.)
16           3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


and because of how the GRASS library works internally. Hence, an intensive study of
the large GRASS C API is required.
     The goal of this work is to implement an intuitive and easy to use object-oriented
layer around the GRASS C API, hiding its complexity, but providing a more abstrac-
t/powerful development environment for solving complex GIS data analysis and model
problems. An additional task is the replacement of parts from the existing Python
script API with more efﬁcient and powerful object-oriented approaches. In this way,
we can provide access to the capability of the C API of GRASS for power users and
geo-scientists who are not familiar with C and the C API of GRASS.
     The idea of PyGRASS was born from the experience of the authors who wished to
expand GRASS capabilities by implementing a tool that gives the freedom to approach
the GIS problem from a different perspective, opening a software developer’s approach
to GIS users and trying to maintain relative simplicity. The PyGRASS library provides
a simple, object-oriented higher level interface that transforms each GRASS module
into an object by interpreting its XML interface description, trying to simplify the syn-
tax and enforcing the script activity. The object-oriented layer around the GRASS C
API, PyGRASS, implements several classes to access vector and raster data, covering
several complex features that are only available in the GRASS C API, like support for
the vector topology or the use of the raster cache for fast random read and write access.
In addition, PyGRASS simpliﬁes interoperability with all related geospatial software
and tools provided by a Python interface.
     The development of PyGRASS has been sponsored by Google’s Summer of Code
program (2012), and the code produced has been integrated in the latest stable release
of GRASS.


3.2 Methodology
The PyGRASS library is written in Python and makes use of modules from the Python
standard library (van Rossum, 1995), like: sys, fnmatch, collections, sqlite3,
as well as from the third party Python library, NumPy (Jones et al., 2013). NumPy
is a package for scientiﬁc computing and it is already a dependence of GRASS. An
optional library is psycopg2 (Varrazzo and Psycopg Community, 2013), which is used
to interface PyGRASS vector attribute handling with the PostgreSQL database.
     The PyGRASS library was developed taking into account four main aspects:

     • consistency – the library shall adhere to norms and architecture commonly found
       in both Python and GRASS, in order to avoid confusion for users who are only
       familiar with one of the above;

     • simplicity – the library must be simple and intuitive, without hiding access to
3.2 M ETHODOLOGY                                                                      17


      lower-level functionality, indeed, providing a seamless user experience between
      the low level C API of GRASS with a higher-level object-oriented Python ap-
      proach;

   • ﬂexibility – the library must be ﬂexible, both allowing the use of existing GRASS
      modules and giving to each user the freedom to implement his own logic, using
      more detailed and ﬁne-grained programming tools;

   • performance – the library must be fast, considering both the development and
      the CPU time. GRASS C API functions are heavily used by PyGRASS every
      time that it is possible.

   The library is split in two parts: the ﬁrst is more related to script activity and the
GRASS modules; the second is focused on programming aspects and the C API of
GRASS.
   To improve the existing script API of GRASS, PyGRASS considers each GRASS
module as an object with input parameters, output parameters and ﬂags. When the
object is “instantiate”, the Module class parses the XML interface description gener-
ated from the GRASS modules through the --interface-description ﬂag to know
which parameters and ﬂags are deﬁned. For each parameter, the metadata is analyzed.
The metadata speciﬁes if a parameter is required or optional, if it is an input or an
output, what type it is (raster, vector, string, ﬂoat, etc.) and many more. This infor-
mation allows the class to check the correctness of the parameters and provides the
capability to suggest the correct ones. The identiﬁcation of inputs and outputs allows
the implementation of process chains. The interface design of this class was chosen to
support the implementation of local and remote process execution services, which may
be added in future. To implement an interface to a Web Processing Service (WPS), the
deﬁnition of complex inputs, complex outputs and literals must be known to generate
the XML execute request. The same is true for remote process execution services based
on WSDL/SOAP. The Module class provides all required module-speciﬁc information
by design.
   The current Python script API deﬁnes several functions to manage the GRASS
module: the make_command returns a list of strings with the command options from
a dictionary of keys and values. The start_command is a GRASS-oriented interface
to subprocess.Popen (a module process creation and management provided by the
Python standard library), that internally uses the make_command function. All the other
run/pipe/feed/read/write/parse command functions are specialized wrappers of
the start_command.
   The Module class of PyGRASS gathers all these features in a single object, con-
necting directly the inputs and outputs of GRASS modules; see Listing 8.1 in the
18           3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


Appendix.
     The PyGRASS Module class simpliﬁes the Python syntax as much as possible to
be competitive with the POSIX module interface. It supports backward compatibility
syntax and enhances the API to provide a tool that manages user errors and returns a
list of valid options. Moreover, the PyGRASS library gives the capability to pass text
to a command as input (stdin), to catch the text output (stdout) and the error message
(stderr) of a command. Finally, PyGRASS allows users to manage (i.e., terminate,
kill, or wait) the process.
     The PyGRASS library introduces an Object-Oriented (OO) Python API to GRASS,
which implements for each GIS/GRASS entity one or more classes. The classes use
C structures and functions through the existing ctypes interface. Ctypes is a Python
library; it provides C-compatible data types and allows calling functions in DLLs or
shared libraries. It can be used to wrap these libraries in pure Python. Our higher level
Python interface uses ctypes to integrate the underlying GRASS C API structures and
functions in an object-oriented framework, but at the same time, trying to respect the
GRASS work-ﬂow and nomenclature to conform with the C API. The object-speciﬁc
ctypes pointer to the underlying C structures are available under the attribute name that
starts with c_*. This allows the user to access the lower level GRASS C API structures
directly using the ctypes interface. These classes allow facing the problem to be con-
fronted in a more abstract way. A high-level object-oriented approach can help users
face the problem, even if they are not familiar with the implementation details of the
C API level, speeding up the design, writing, prototyping and debugging phases.
3.2 M ETHODOLOGY                                                                  19


Architecture of the Library

   The PyGRASS library follows the main GRASS structure and is divided in four
parts. Each part implements a set of dedicated classes. See Figures 3.1 and 3.2 for a
general overview of the library.


                     Figure 3.1: Module, Raster and GIS classes.


                               Figure 3.2: Vector class.

   • modules contains the classes Module, MetaModule and Parameter. These
      classes are designed to substitute the previous POSIX-based scripting approach
20             3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


       (see Listing 8.2) and replace parts of the existing Python script API (see Listing
       8.3).
       The object-oriented architecture of the PyGRASS library allows users to interact
       with GRASS modules as Python objects. These objects allow direct access to
       module attributes like: name, description and keywords. The input and
       output options are implemented using a dedicated parameter class. Instances
       of this class are stored either in an input dictionary or an output dictionary within
       the module object (Listing 8.4). Inputs and outputs can be referenced by their
       name in the dictionaries or as attributes of the dictionary objects. They can be
       connected to each other to create process chains, (see Listing 8.1). The type
       check system of the parameter class assures that output options can only be
       connected with input options of different modules when they have the same
       type. Hence, the PyGRASS module library will raise an error in case the output
       of a vector module was connected with a raster input of a second module.
       The PyGRASS module library introduces special parameters to allow ﬁne-grain
       control over the GRASS processes. These special parameters end with the ‘_’
       character to avoid a mix-up with option names. The ﬁrst two special parameters
       – run_ and finish_ – are used to manage the process. The parameter deﬁni-
       tion run_=True will execute the process immediately, and finish_=True will
       wait until the process terminates, (see Listing 8.5). Other special parameters
       that were added are: stdin_, stdout_ and stderr_. The parameters stdin_
       and stdout_ are used to connect the textual inputs and outputs from different
       modules to create a process pipeline; stdin_ is used to pass the textual output
       (stdout_) from one process to another, (see Listing 8.6).

     • vector contains the classes Vector without the GRASS topology and VectorTopo
       with the GRASS topology.
       The Vector class allows the user to access the non-topological geometry fea-
       tures of a vector map in sequential order; see Listing 8.7. The class VectorTopo
       was designed to access topological and non-topological geometry features of
       a vector map in random order. This class allows for iterating among speciﬁc
       feature types, (see Listings 8.8 and 8.9). Writing is supported in both classes
       in sequential order. However, already written features can be updated in the
       topological access class.
       The following classes are designed to represent vector features: Point, Line,
       Centroid, Boundary, Isle and Area. Instances of these classes are usually cre-
       ated when features are read from vector maps by the Vector and VectorTopo
       classes. To manage multiple connections with vector attribute SQL databases,
3.3 R ESULTS                                                                         21


      the classes DBLinks and Link were designed. Attribute tables can be created,
      accessed and modiﬁed with the Table class. The Filter class provides sev-
      eral methods for working with data without the need to know SQL. The Attrs
      class was designed to access the content of the attributes table from a geometry
      feature.

   • raster contains the classes (RasterRow, RasterRowIO, and RasterSegment).
      Each class uses a different GRASS C-library to grant a speciﬁc kind of access
      to raster maps. All the raster classes share common methods to open a map,
      read raster values or raster rows, get raster information and write metadata, such
      as categories and history. As with vectors, a similar syntax has been used to
      instantiate, open and close a raster object.
      The RasterRow class reads the contents of the raster map row by row and writes
      it in a sequential mode, row after row, (see Listings 8.10 and 8.11).
      The RasterRowIO class implements a row cache that allows users to read raster
      rows randomly by keeping a number of rows in the main memory. This caching
      mechanism avoids heavy I/O (input/output) hard-disk usage in speciﬁc tasks,
      such as moving window operations, or cell neighborhood analysis. Similarly to
      the RasterRowIO, the RasterSegment class provides access to a tile cache. The
      tile cache is an uncompressed representation of a raster map that will be created
      at the point of initialization. The access to the uncompressed ﬁle is based on
      tiles that are cached in the main memory for fast random read and write access
      through the Segment class. With the RasterSegment class, it is possible to read
      and write the pixel value randomly at the same time in the same map.

   • gis contains GRASS management classes, like Gisdbase, Location and Mapset,
      that help users interact with the GRASS environment, (see Listing 8.12). The
      Region class manages the computational region of GRASS that directly affects
      2D and 3D raster processing, as well as several vector processing algorithms,
      (see Listing 8.13).

    The PyGRASS library assures that memory management is fully handled by Python.
All structures from the GRASS C API that are used by PyGRASS are ctypes objects or
get deleted in the class destructor’s and, therefore, are handled by the Python garbage
collector. The user must not take care of memory allocation and deletion directly.


3.3 Results
In this section, we compare different solutions of simple GIS tasks using standard
GRASS tools and PyGRASS. The machine used for the benchmark was a laptop with
22           3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


an Intel Core i7 3610QM processor with 2.30 GHz and 6 Mb L3 Cache. The system
has 24 Gb DDR3@1333Mhz of RAM and a solid state disk (SSD) of 250 Gb as the
system driver. The installed operating system (OS) is GNU/Linux 3.7.5 (×86_64) on
the SSD. The GRASS 7 development version used for the benchmark has the revision
number r54812. The GRASS data are stored on a secondary hard disk of 750 Gb at
7,200 rpm.
     Concerning script activity, PyGRASS improves mainly the syntax and changes
how users can interact with GRASS modules. We measured small performance loss
when executing GRASS modules using the PyGRASS module interface compared to
the POSIX approach, which go from 1% up to 12%, due to the average load of the
system. We did not expect a large performance difference, since Python and POSIX
are basically using the same OS (operating system) functions to spawn processes.
     On the contrary, we needed to test the new API added by PyGRASS to identify
its strengths, weaknesses and scalability (all the benchmark tests used in this chapter
are available at https://github.com/zarch/pygrass-benchmark). Each test, ex-
cluding the biggest region (with 1010 cells), has been repeated ﬁve times. There are
only small differences between each measured run time, resulting in a small standard
deviation. Hence, we think that the ﬁnal results of our benchmark are representative.
     The ﬁrst test compares two simple procedures, one written using PyGRASS (RasterRow
and VectorTopo) (see Listing 8.14) and the other using the programming language C
(see Listing 8.15 and for the results, see Table 3.3). The test takes as inputs a vector
point map and a raster map. It creates a new vector point map that includes all vector
points from the input map. A new attribute table is created and linked with the vector
map, which contains a column with the sampled values of the raster map. The proce-
dure is applied to ﬁve different random vector point maps, to be independent from the
spatial distribution of the vector points. Moreover, the tests have been executed using
different region extents and numbers of points, to test the scalability of the different
solutions. Both procedures are conceptually identical and share most of the GRASS
C API functions. The only difference is database access: PyGRASS uses the Python
driver instead of the C API of GRASS.

                  number of cells        102      104    106      108        1010
                  number of points        10      102    103      104         105
                  Vector and Raster
                  sample (PyGRASS)      2.21     4.23   23.87   218.63   12670.27
                  v.sample2 (C API)     3.03     5.48   31.67   266.54   13304.67
                  Raster
                  RasterRow            0.046    0.431    4.53    74.46    4303.24
                  r.mapcalc            0.078    0.525    5.83   170.43    5347.95


Table 3.3: Table with benchmark results, reported in seconds, using different compu-
tational extents and a different number of vector points (Zambelli et al., 2013).
3.4 D ISCUSSION AND B ENCHMARKS                                                      23


    One indication that our approach is easier to handle than the C implementation is
that the PyGRASS version is considerably shorter (48 lines) than the C version (102
lines). With a PyGRASS library, it is noted that there is a marginal advantage in speed
compared to it’s C counterpart. The speed gain over the C version is probably due to
the slower driver adopted by the GRASS C API of the vector attribute database.
    In a further test we compared performance of the PyGRASS RasterRow imple-
mentation (see Listing 8.16), with r.mapcalc using a simple raster map algorithm
(see Listing 8.17). The algorithm stores only those pixels in a new raster map that
have a value that it is greater than 50. Again, the PyGRASS version is slightly faster
than the GRASS module. The good performance of PyGRASS is caused by our design
approach that uses NumPy for row computation tasks. The performance will drop dra-
matically in case we would implement the same algorithm in pure Python comparing
cell by cell without using the optimized NumPy approach.


3.4 Discussion and Benchmarks

The PyGRASS Module class adds some useful features that were not available with
the previous Python script API; these features have a time cost, because they require
exporting the GRASS module in XML, parsing the XML and instantiating the object,
checking that all parameters are correct and then executing. The time cost for these op-
erations is around 0.2 s, but generally, the execution time of a GRASS module requires
much more time; therefore, in most of the cases, we can neglect this time loss.
    Concerning the new approach introduced by PyGRASS, the performance depends
mainly on the features that are used. For example, updating the column attribute with
the value of area with PyGRASS requires almost the same time, around 0.24 s for
PyGRASS and 0.26 s using the v.to.db module.
    Using the RasterRow class to compute areas that satisfy a condition, with a region
of 16,000 rows and 14,000 columns, it is slightly faster (27.42 s) than using r.mapcalc
(35.49 s) if the row is used as a NumPY array. Using the PyGRASS RasterRow class
without using the NumPy array makes the execution seven-times slower than using the
GRASS r.mapcalc module (992.5 s vs 144.2 s).
    The example above highlights that it is not convenient to replace an existing GRASS
module with a new one written in PyGRASS, because the user has to write more code
and because the GRASS native modules are generally faster. The big advantage of
using the PyGRASS library is the object-oriented access to the GRASS C API func-
tionality.
    Without the need to extract information from the output string of the module, in
this kind of operation, the PyGRASS library is faster compared with modules and with
24             3 P Y G RASS :   A HIGH LEVEL   GIS LIBRARY FOR RESEARCH AND RAPID PROTOTYPING


existing Python functions: for example, to get the list of the raster map contained in a
Mapset with PyGRASS takes 608 ns.
     Using the Python function list_grouped in the GRASS core takes 0.1273 s.
     The same good results are obtained with the Region object; with PyGRASS, it
takes 211 ns.
     Using the Python function region in the GRASS core takes 0.1056 s.
     The PyGRASS library can help to substitute all the commands in the GRASS
Python script library that need to wrap and interpret the output of a GRASS module.


3.5 Conclusions

An increasing amount of GIS software uses the Python language to provide a powerful
scripting interface. An easy to use, but powerful, Python interface can help to efﬁ-
ciently exploit the capabilities of GIS software. Such an interface can be effectively
used to integrate different GIS, statistical, and geospatial tools and programming lan-
guages in a GIS to expand its overall capabilities.
     The PyGRASS library tries to open a new perspective to power users and scientists
that use GRASS GIS. It provides a Python interface that is able to compete with the
simplicity of POSIX to write procedures with existing GRASS modules, as well as
a powerful object-oriented interface to deal and experiment with GIS problems at a
lower level.
     The new Module class, introduced by PyGRASS, provides a single interface to
all GRASS modules and can be extended to work with Web Processing Services
(WPS), Web Services Description Language (WSDL) and Simple Object Access Pro-
tocol (SOAP) services or other remote execution services. The design concept of the
Module class allows direct linking of inputs and outputs of GRASS modules to create
process chains, including compatibility checks, process control and error handling.
     The new Object-Oriented Python programming API introduces an abstract layer
that opens the possibility for users who are not familiar with C and with GRASS C API,
to use and access transparently the efﬁcient C functions of GRASS. Our tests show that
algorithms implemented with PyGRASS are comparable in terms of performance with
an equivalent C implementation. Hence, our approach wraps the underlying GRASS
C libraries efﬁciently. It needs much fewer lines of code to implement an algorithm in
PyGRASS than in C. Moreover, it shows that speciﬁc Python strengths, for example,
the database Python interface, can be used to gain a speed improvement over speciﬁc
C-implementations in GRASS. The PyGRASS library has been designed to integrate
new methods or to inherit from an existing class to extend GRASS functionalities,
providing new tools for prototyping complex scientiﬁc algorithms.
3.5 C ONCLUSIONS                                                                   25


    Some of the functionalities provided by PyGRASS are also available in other soft-
ware, such as Postgresql/Postgis, R, shapely, etc. However, switching to them requires
changing the GIS working environment. That means installing, conﬁguring, learn-
ing the new tools and converting from one format to another. The PyGRASS library
does not force the users to learn and switch between different languages (C, SQL, R,
Python, BASH, etc.) and tools to carry out their work.
    The PyGRASS library allows GIS modelers and scientists to use the C API of
GRASS, with a high level interface, providing a tool that gives the freedom to ap-
proach the GIS problem from a different perspective. In this way, users and scientists
can combine the GRASS modules with the GRASS C API functions and algorithms.
Therefore, PyGRASS is able to simplify the approach to develop a new GIS model,
using one program (GRASS) and one language (Python) to cover the different GIS
aspects, increasing the productivity and allowing geo-scientists to focus on studying
the problem they have selected and not on studying the tools and languages used.
    Moreover, the PyGRASS library can be used as a tool to facilitate use and inte-
gration with other GIS/statistical software and libraries (not only open source). The
common language among different software and the object-oriented structure should
make communication and procedure/data exchange easier.
    The PyGRASS library, together with the GRASS GIS temporal framework, can
provide a comprehensive high performance spatio-temporal GIS framework for GI-
Scientists.
    PyGRASS seems to be ideal applying in complex case studies, such as air quality
monitoring from wireless sensor networks, and for building decision support systems
to evaluate the assessment of sustainable forest energy.
Chapter 4

Machine learning classiﬁcation of
urban micro-environments
        Urban micro-environments substantially affect the spatial variability and
        distribution of air pollutants, urban heat-islands (UHI) and urban micro-
        climate. A ﬁne resolution land classiﬁcation and urban morphology can
        help to address these topics and provide useful insights into this spatial
        variability for urban planners and local authorities. This work presents
        a new methodology to identify the urban micro-environments from colour
        airborne images (RGB) applying automatic algorithms in an open source
        software environment. A set of several machine-learning algorithms was
        used to obtain an object-based image supervised classiﬁcation. Eight main
        categories (water, grass, vegetation, road, car, bus, building and shadow)
        were classiﬁed. Overall, 9,301 areas were used for training and evaluating
        the machine-learning results. The cross-validation accuracy of the six best
        algorithms is above 93% of the training data set using 5 folds. The best
        classiﬁers were Gradient Boost (93.82%), Random Trees (93.75%) and Ex-
        tremely Randomized Tree classiﬁer (93.73%).The described methodology
        yields reliable results for the classiﬁcation of urban micro-environments us-
        ing only RGB images and constitutes a promising approach. Where avail-
        able, spectral data could further improve the results characterizing each
        object with additional information about the surface that can be used by the
        classiﬁers.


4.1 Introduction
A highly detailed spatial characterization of urban environments is of increasing inter-
est for several research ﬁelds and communities. Fine resolution land classiﬁcation and
urban morphology can help address questions regarding the spatial variability of air
pollutants (Merbitz et al., 2012), help to understand model phenomena like the Urban
Heat Island (Busato et al., 2013), and urban micro-climates (Rahman et al., 2014),
28                   4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


and also to identify the speciﬁc role of ecological functions and different ecosystem
services (Behling et al., 2015).
     Lehmann et al. (2014) stressed the importance of identifying the urban vegetation
structure type (UVST) to link particular ecosystem services with physical parame-
ters to explain temperature variability at the urban and district level. Jin et al. (2014)
showed how the concentration of ﬁne particulate matter (PM2.5 , particulate matter with
an aerodynamic diameter < 2.5 µm) is inﬂuenced, for instance, by the canopy density
(CD) and the leaf area index (LAI). Other studies such as (Kaur et al., 2007; Pirjola
et al., 2012; Rahman et al., 2014) demonstrated that the characterization of micro-
climatic and urban micro-environment conditions also drive concentrations of various
air pollutants.
     With the term “urban micro-environments”, we aim to identify not only environ-
ments with similar micro-climate conditions but also with a similar context such as:
urban morphology, distance from main roads, and other human structures and activi-
ties. The characterization of urban micro-environments, especially if combined with
indoor micro-environments (Steinle et al., 2013), can help to assess the overall impact
of air pollutants on the population and to deﬁne new policies to reduce this impact
(Rada, 2014; Rada et al., 2012; Schiavon et al., 2014).
     We selected eight broad land cover classes, that are considered representative, to
characterize urban environments (Chen et al., 2014; Dugord et al., 2014; Kotthaus
and Grimmond, 2013; Lehmann et al., 2014; Llop et al., 2012). Then we extracted
them from aerial images at high spatial resolution (0.25 m). The spatial characteri-
zation of these classes combined with other information, such as digital terrain and
surface model, can provide useful information to assess the spatial variability of phys-
ical phenomena that occur at the urban scale (Liu and Shen, 2014; Mölter et al., 2010;
Skelhorn et al., 2014). The classiﬁcation of an image is a complex task due to the high
number of elements (e.g., shapes, materials, textures, and relative position between the
objects) that need to be interpreted. Recent publications suggest that an object-based
image classiﬁcation can achieve better results when combined with machine-learning
algorithms, as compared to per-pixel classiﬁcations (Duro et al., 2012).
     The Geographic Objects-Based Image Analysis (GEOBIA) is a process that splits
image into segments (objects) based on analysis of the statistics of shape, texture and
spectral response. A growing number of publications describe combining object-based
image classiﬁcation with machine-learning algorithms (Alioscha-Perez and Sahli, 2014;
Clewley et al., 2014; Dronova et al., 2012; Duro et al., 2012; Novack et al., 2011;
Senthilnath et al., 2012; Wieland and Pittore, 2014). The most employed classiﬁers
are the Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT),
and k-Nearest Neighbours (k-NN). All of these studies apply the GEOBIA to multi-
4.2 M ATERIAL AND METHODS                                                            29


and hyper-spectral images.
    The objective of this study was to develop a methodology along with an imple-
mentation as open-source tools for the characterization of urban micro-environments
using only orthophoto images with the three spectral bands of red, green and blue
(RGB). We chose to work with RGB images because they are generally available at
lower cost compared to other multi-spectral data and are commonly available in aerial
photography and in high resolution satellite imagery. An open-source tool is important
as a guarantee of repeatability and reproducibility of the research (Steiniger and Hay,
2009).
    The study areas in the City of Edinburgh, Scotland (UK), were chosen to represent
a medium-size city of Northern Europe. With respect to other areas, it is particularly
challenging because the roof colors are often similar to the color of nearby roads.
Moreover, the study area is characterized by huge varieties of urban settlements that
include streets and medieval alleys and buildings ranging from residential and sub-
urban areas and modern and industrial buildings.
    In this chapter, I present the use of ten different machine-learning algorithms. For
each algorithm, I tested several set-ups to determine the options that perform best with
our training dataset.
    As reported by Clewley et al. (2014) there are currently a number of open source
packages that can be used to perform various parts of the GEOBIA process (Orfeo
Toolbox (Inglada and Christophe, 2009), TWOPAC (Huth et al., 2012), InterIMAGE
(InterImage, 2014), GDAL-RSGISLib-RIOS-TuiView-KEA (Clewley et al., 2014)).
None of them is integrated with a multi-purpose GIS software such as GRASS. The
presented work aims to ﬁll this gap opening the GEOBIA process to a wider number
of users.


4.2 Material and methods

4.2.1 Study area and data

The study area covers most of the city center of Edinburgh, Scotland (UK), with a
dimension of 6 km x 12 km (Figure 4.1). Edinburgh is the capital of Scotland and it
is situated on the south shore of the Firth of Forth. It is the seventh most populated
city in the United Kingdom and the second largest urban area in Scotland with a total
population of 487,000 habitants in 2013 (National Records of Scotland, 2013).
    The input dataset for the image classiﬁcation task of the urban micro-environment
of the city of Edinburgh is a set of RGB aerial ortho-rectiﬁed images (2009) provided
by the Ordinary Survey (British Ordnance Survey, 2012) with a spatial resolution of
0.25 m by 0.25 m.
30                   4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


Figure 4.1: The red box illustrates the location and dimensions of the study area in the
City of Edinburgh, Scotland/UK.


4.2.2 Tools

I used GRASS (Geographic Resources Analysis Support System), a free and open source
Geographic Information System (GIS) (Neteler et al., 2012) to process geographi-
cal data through the pyGRASS library, and scikit-learn, which provided the main
machine-learning algorithms for the classiﬁcation of the objects. Python (van Rossum,
1995) was used as the main language to integrate all the tools together and to develop
a new GRASS GIS 7 module v.class.ml.

     • The GRASS GIS software suite has been under continuous development since
       1982 (Neteler et al., 2012). Since 1999, it has been developed as free and open
       source software by an international development team. GRASS GIS has been
       used in several research projects, e.g., to develop holistic models to assess the
       biomass potential in alpine regions (Sacchelli et al., 2013; Vettorato et al., 2011;
       Zambelli et al., 2012).

     • pyGRASS is a Python interface used to access the C API of GRASS GIS with
       a higher and more abstract interface (Zambelli et al., 2013) and was developed
       during the Google Summer of Code 2012 (GSoC 2012) as a tool to simplify the
       use of GRASS GIS from other Python libraries. The main features of pyGRASS
       are described in Chapter 3.

     • Scikit-learn (Pedregosa et al., 2011) is a Python library that provides simple and
       efﬁcient tools for data mining and data analysis built on NumPy (Walt et al.,
       2011), SciPy (Jones et al., 2001-2015; Millman and Aivazis, 2011; Oliphant,
       2007), and matplotlib (Hunter, 2007). The main features of Scikit-learn are al-
4.2 M ATERIAL AND METHODS                                                             31


      gorithms for classiﬁcation (Support Vector Machine (SVM), nearest neighbors,
      and random forest), regression (Support Vector Regression (SVR), ridge regres-
      sion, and Lasso), clustering (k-Means, spectral clustering, and mean-shift), di-
      mensionality reduction (Principal Component Analysis (PCA), Isometric map-
      ping, and non-negative matrix factorization), model selection (grid search, cross
      validation, and metrics), pre-processing and feature extraction.

    • Python is a general purpose high-level programming language that supports
      multiple paradigms (e.g., object oriented, imperative, functional and procedural
      styles). It is an interpreted language with a dynamic type system and automatic
      memory management and is provided with a large and comprehensive standard
      library (van Rossum, 1995; Wikipedia, 2014b).


4.2.3 Methodology

For the classiﬁcation methodology presented here, we used an RGB image of the city
center of Edinburgh for the following eight broad classes: water, grass, tree, road, car,
bus, building and shadow. The shadow class was introduced to avoid forcing the clas-
siﬁers to assess the land-use of areas without enough information. We introduced the
car and bus classes because, in many applications, it could be interesting to assess the
trafﬁc density on roads using relative low-cost RGB images. The remaining classes
are basic land-use classiﬁcations that are generally associated with a certain material
and use, which can help to assess the urban micro-climatic and/or environment’s con-
ditions.
    I classiﬁed the RGB image set using ﬁve main steps: bands ratio, image segmen-
tation, segment conversion, feature extraction, and classiﬁcation. In detail, I processed
the data as follows:

   1. Bands ratio: as highlight by Gitelson and Merzlyak (1996) and Zhang and Hu
      (2012) to improve the identiﬁcation of vegetation in a urban environment and
      to be less sensitive to the partial shadowing due to the closer vegetation and
      buildings, they combined the NDV I deﬁned as:

                                                NIR − R
                                      NDV I =                                       (4.1)
                                                NIR + R

      where NIR and R represent respectively the Near-Infrared and the Red image
      bands, with the Greeness Index (GI) expressed as:

                                                G
                                       GI =                                         (4.2)
                                              R+G+B
32                   4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


       where R, G, B represent respectively the Red, Green and Blue image bands.
       Since the NIR band is not available on our dataset, we used the GI, which is
       an index representing the green percentage of the pixel. Zhang and Hu (2012)
       demonstrates that another useful band ratio that helps to distinguish between
       deciduous and conifer is GRI, deﬁned as:

                                                G−R
                                        GRI =                                       (4.3)
                                                G+R
       Following the same path, the indexes: GI, RI, BI, GRI GBI, RBI, visible in
       Figure 4.2, were computed and scaled to a value between 1 − 255 to improve the
       results of the segmentation algorithm and help to identify and distinguish the
       image objects.

     2. Image segmentation: The image segmentation was performed using a growing
       and merging region algorithm with the i.segment GRASS GIS 7 module. Dur-
       ing the segmentation process, a unique segment ID was assigned to the pixel if
       the difference of the pixels similarity criteria was lower than the user deﬁned
       threshold. The image was segmented hierarchically, which means that the out-
       put of the segmentation was used as input seeds for further segmentations. The
       image was segmented using a threshold of 0.01; the output was used as an input
       for the segment with a threshold of 0.02, and then for 0.03, etc. The thresholds
       computed were: 0.01, 0.02, 0.03, 0.04, 0.05, and 0.06. Another parameter used
       for the segmentation was to set a minimum number of pixels that a segment must
       have to be identiﬁed with a unique ID. I selected the best segmentation parame-
       ters from a visual comparison of the ones identiﬁed and objects that were fused
       together. Figure 4.3 shows the results of segmentation using different parame-
       ters.

     3. Conversion of segmented raster data to a vector map: The segment map was
       converted from an integer raster map containing the ID of each segment, to a
       vector map using the v.to.rast GRASS GIS module. In GRASS GIS, each
       geometry feature of a vector map can be linked with one or more categories of
       an attribute table called a “layer”. During the classiﬁcation process, I linked
       each geometric feature of the vector map with several layers: one containing
       the features extracted for each segment, one with the training classes, and one
       with the ﬁnal classiﬁcation results. I used v.category to assure that all of
       the geometric features are linked with the same category in all of the vector
       layers, otherwise, the ﬁrst row in the attribute table with the extracted features
       and the ﬁrst row with the classiﬁcation results could not be referred to the same
       geometric feature (segment).
4.2 M ATERIAL AND METHODS                                                            33


  4. Segment characterization and features extraction: The segment characterization
     was performed by extracting several statistical parameters from the raster image
     for each band and feature shape, using v.stats. To improve characterization
     of the segment, I computed the Principal Component Analysis (PCA) of the
     RGB image (i.pca) and the ﬁrst component of the PCA is used to analyse the
     texture index (r.texture). The r.texture module computed the ﬁrst (see
     Figure 4.4) and the second order statistics (see Figure 4.5). The texture indexes
     considered on this work are six, with a moving window of three pixels: Entropy
     (ENT), Sum of Variance (SV), Sum of Average (SA), Angular Second Moment
     (ASM), Contrast (CON), and Measure of Correlation (MOC). The shape of each
     segment was characterized by extracting 13 parameters: the number of isles, the
     longitude and latitude extension, the perimeter, the area, the area of the boundary
     of the segment without considering internal isles, the ratio between the area
     of isles and the area of the boundary, the compactness factor and the fractal
     dimension deﬁned as:

                                                         perimeter
                             compactness f actor =         √                       (4.4)
                                                        2 · π · area

                                                       log(perimeter)
                            f ractal dimension = 2 ·                               (4.5)
                                                          log(area)

     The isles of segments were characterized by the same parameters: area, perime-
     ter, compactness factor and fractal dimension and aggregate using the sum.
     From each raster (R, G, B, PCA1, PCA2, and PCA3) and for each segment, I
     extracted the following 20 zonal statistics: the number of pixels, the coefﬁcient
     of variance, coefﬁcient variance of the square, Kurtosis, Kurtosis of the square,
     maximum, mean, median, minimum, mode, number of occurrences, percentile
     (90%), range, skewedness, skewedness of the square, standard deviation, stan-
     dard deviation of the square, ﬁrst quartile, third quartile, variance and variance
     of the square. In summary, I used 15 bands to characterize each segment (3
     RGB + 6 RGB indexes + 6 textures); for each band I extracted 20 parameters
     with zonal statistics for each segment, and the shape of the segment was charac-
     terize by 7 values. Therefore the number of features extracted for each segment
     was 307. All these statistics were grouped together into a new layer of the vector
     map.

  5. Segment classiﬁcation: The new module v.class.ml, developed during the
     third year of the Ph.D., implements several utilities: to perform pre-processing,
     to tune classiﬁcation parameters and to conduct classiﬁcation and post-processing
34                 4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


     tasks. The module can be applied to any general vector map that has all of the
     attributes that are integer and/or ﬂoat types, and there are no limitations on the
     number of columns (e.g., features) and rows (e.g., segments) except the dimen-
     sion of the whole table that must be able to be allocated to the RAM of the
     computer used for the analysis. The classiﬁcation process can be split into three
     main tasks: extraction of the training segments, testing of the different algo-
     rithms and set-ups, and ﬁnally, classiﬁcation of all of the remaining segments.

     The v.class.ml requires a vector map where the training areas for supervised
     classiﬁcation are identiﬁed. For each supervised area, the module extracts all of
     the segments that are included and/or intersected and assigns the class of the area
     to them. All of the segments selected from the training vector map are saved in
     a new layer of the vector segment map. To train and test the different machine-
     learning algorithms, I manually classiﬁed more than 9,300 segments; a complete
     list of classes with the number of training segments per class is reported in Table
     4.1

                    class            id   numb. of training segments    [%]
                    water            0                          297     3.19
                    grass            1                        2,810    30.21
                    vegetation       2                        1,260    13.55
                    road             3                        1,121    12.05
                    car              4                          126     1.35
                    bus              5                           94     1.01
                    building         6                        3,279    35.25
                    shadow           7                          314     3.38
                    total training                            9,301     1.34
                    total                                   691,795


Table 4.1: Table with the class name, numeric label and number of segments used to
train all the different machine-learning algorithms


     To identify the machine-learning algorithm that provides the best result, I used
     a cross-validation with a k-fold of ﬁve sub-samples. The sub-samples were gen-
     erated by randomly shufﬂing the dataset samples but preserving the percentage
     of samples for each class. The machine-learning algorithm uses all of the sub-
     samples for training, excluding one that is used to compute the accuracy score
     as deﬁned in Table 4.2, and the process is cycled through all of the sub-samples.
     Using this method, all sub-samples are used both to train and to test the machine-
     learning algorithm. The selection of the best algorithm is conducted by comput-
     ing the average of ﬁve accurately computed during the cross validation process
     and selecting the algorithms with the higher mean value.

     I tested the following ten machine-learning algorithms: the Gradient Boosting
4.2 M ATERIAL AND METHODS                                                             35


     Classiﬁer (GBC) using 100 Decision tree estimators with a varying minleaf size
     of one, three, and ﬁve; the Support Vector Classiﬁer (SVC) using different ker-
     nels: linear, Radial Basis Function (RBF), Sigmoid, and Polynomial and varying
     the C (from 10-2 up to 108) and the γ values (from 10-6 up to 106); Extra Trees
     Classiﬁer (ET) with a different number of estimators (10 and 100) and with dif-
     ferent minimum leaf sizes of one, three, and ﬁve; the Random Forest (RF) using
     different criteria (Gini and Entropy) and with a different value for the maximum
     feature (a percentage, the sqrt, and a log2); the k-Nearest Neighbor (kNN) us-
     ing a different number of neighbors (2, 4, 8, 16) and different weights (uniform
     and distance); Nearest Centroid (NC) using different metrics (l1, l2, cityblock,
     manhattan, and eucldean); the Decision Tree (DT) using different criterion (Gini
     and Entropy); Stochastic Gradient Descendant (SGD) varying the loss parameter
     (hinge, huber, and log) and different penalties (l1, l2, and elasticnet); Gaussian
     Naive Bayes (GNB), and Ada Boosting Classiﬁer (ABC) using 50 estimators
     with a learning rate of 1 and a minimum leaf of 3.
     Because the Support Vector Classiﬁer requires 2420 models (4(kernel) · 11(C) ·
     11(γ) · 5(cross − validation)) to run, the training dataset were applied on a bal-
     anced sub-set of the training classes, with a number of segments for each class
     that is equal to the class with the least training segments. The class with the low-
     est number of training segments is the bus with only 94 segments. Therefore,
     the subset was composed of 94 segments per class, reducing the total number of
     the training dataset from more than 9300 to (94 · 8 =) 752 segments. To select
     the number of trainings for each class that provides the best results, an instance
     of the SVC using the default parameters (kernel=RBF, C = 1, γ = 0) was tested
     1000 times with 1000 different randomly balanced shufﬂed training datasets; the
     subset of the training segments that provided the highest accuracy was used to
     explore the domain of the best kernel, C and γ set of values that maximize the
     accuracy score.
     Some of the above machine-learning algorithms tested during this work were en-
     semble methods. Ensemble methods, to improve robustness and generalizability
     of the classiﬁer, combine the results of several estimators. The ﬁrst approach is
     to return the average of several classiﬁers; this approach compared to a single
     estimator has a reduced variance (Pedregosa et al., 2011). RF and ET classi-
     ﬁers are the ones that use this approach. A second approach is to combine and
     organize several (weak) classiﬁers in a sequence to obtain a powerful ensemble
     (Pedregosa et al., 2011); the classiﬁers tested as part of this group are ABC and
     the GBC.

   The classiﬁcation task was repeated with a different conﬁguration: using a scaled
36                                  4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


and decomposed (PCA) version of different features: RGB, PCA, and RGB and PCA
statistics.
     Based on the overall accuracy, the best 6 algorithms and pre-processing options
were further analyzed by computing the confusion matrix to see which classes can be
more problematic than others and by analyzing the Precision/Recall curve as deﬁned
in Table 4.2.

                                                             Condition

                  Tpop = CP +CN             CP = T P + FN           CN = FP + T N                      CP
                                                                                                PRV = CN
                  Total Population         Condition Positive      Condition Negative           Prevalence

                   OP = T P + FP                  TP                          FP                 PPV = OPTP                       FP
                                                                                                                            FDR = OP
                                                                                          Positive Predictive Value
 Outcomes        Outcomes Positive           True Positive               False Positive                                 False Discovery Rate
                                                                                                 (Precision)

                  ON = FN + T N                   FN                          TN                FOR = ONFN                  NPV = ON TN
                 Outcomes Negative          False Negative               True Negative      False Omission Rate       Negative Predictive Value

                    LR+ = tf pr
                             pr               T PR = CP TP              FPR = CNFP            ACC = T TpopN
                                                                                                      P+T

                                           True Positive Rate       False Positive Rate
              Positive Likelihood Ratio                                                          Accuracy
                                          (Sensitivity, Recall)          (Fall-out)
                             f nr                      FN                          TN
                    LR− =    tnr              FNR =    CP              T NR =      CN
                                                                    True negative rate
              Negative Likelihood Ratio   False Negative Rate
                                                                      (Speciﬁcity)
                            LR
                   DOR = LR+
                             −
               Diagnostic Odds Ratio


Table 4.2: Deﬁnitions of the Accuracy, Precision and Recall parameters used to evalu-
ate and compare the performance of different machine-learning classiﬁers (Wikipedia,
2014a).


4.3 Results and Discussion
The threshold used to segment the aerial images that provide the best results was 0.03.
To identify the best threshold, we performed a visual comparison between the seg-
mented images. Higher values of the threshold parameters generate segments that
merge different classes together; in our case, some buildings were merged with streets,
cars with roads, and trees with grass. However, a lower threshold value would only
identify a small portion of the objects. Therefore, the segment threshold parameter
needs to be set manually depending on what image objects we aim to classify. An-
other important parameter used for image segmentation was the minimum segment
areas, which were set to 64 pixels (0.25m · 0.25m · 64pix = 4m2 ). Setting a minimum
number of pixels to identify the image objects helps to exclude all objects that are too
small to be classiﬁed and reduces the overall number of segments. Another reason to
set a minimum number of pixels was that the feature extraction of each segment is
characterized by a statistical value, which can have some meaning only if it is com-
puted over a signiﬁcant number of pixels. The total number of segments identiﬁed
4.3 R ESULTS AND D ISCUSSION                                                         37


Figure 4.2: Bands ratio indexes used to highlight differences on aerial images, the grid
                                                      R             G              B
size is 250 m. The indexes are deﬁned as RI = R+G+B , GI = R+G+B , BI = R+G+B ,
RBI = R−B , GRI = G+R , GBI = G−B with R, G, B respectively the Red, Green and
        R+B
                     G−R
                                  G+B
Blue image bands.
38                 4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


Figure 4.3: The results of different segment thresholds are shown, where indicated a
minimum size of 64 pixels (4 m2 ) is used, the grid size is 250 m.
4.3 R ESULTS AND D ISCUSSION                                                       39


Figure 4.4: First order statistics using the ﬁrst component of the Principal Component
Analysis (PCA1): Sum Average (SA), Entropy (ENT), Difference Entropy (DE), Sum
Entropy (SE), Variance (VAR), Difference Variance (DV), Sum Variance (SV), the grid
size is 250 m.
40                 4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


Figure 4.5: Second order statistics using the ﬁrst component of the Principal Compo-
nent Analysis (PCA1): Angular Second Moment (ASM), Inverse Difference Moment
(IDM), Contrast (CON), Correlation (CORR), Measures of Correlation (MOC), Cor-
relation Coefﬁcient (MCC), the grid size is 250 m.
4.3 R ESULTS AND D ISCUSSION                                                         41


                    (a)                                         (b)

Figure 4.6: Domain exploration of the best set of parameters for each SVC kernel that
provides a higher averaged accuracy score using a cross-validation with ﬁve k-folds
and varying the C and γ values, using a subset of 94 segments per class.


using a threshold of 0.03 and a minimum number of pixels (64) were 691,795.
    As described in the methodology, the Support Vector Machine algorithm has two
main parameters (C, γ) that have to be set properly to reach good classiﬁcation results,
for this reason a domain exploration of the best parameters was performed. To speed-
up the process, the classiﬁcation was applied on a balanced sub-set of the training
classes; Figure 4.6 shows the results.
    I ranked the features used to classify the segment, using the Extra Trees Classifier
with 500 estimators (see results in Figure 4.7 and Table 4.3). The values not reported
in the previous Figure and Table are the importances of shape features that are [%]:
numb. of isles 0.0506, x extent 0.2155, y extent 0.2692, perimeter of the isles 0.0450,
area of the isles 0.0322, compact of the isles 0.0383, fractal dimension of the isles
0.0565, perimeter 0.3075, area 0.1728, bound of the area 0.1531, aratio 0.0313, com-
pact 0.3656, fractal dimension 0.1720.
    More than 80 different algorithm set-ups were tested to ﬁnd the best set of pa-
rameters. The 6 best classiﬁers providing a higher accuracy using 5 shufﬂed folds are
reported in Table 4.5 and visible in Figure 4.8 and are: with an accuracy of 93.82%
Gradient Boosting Classifier using 100 estimators and a min samples leaf of 3
(gradient_boost_500_meanleaf3); with 93.75% the Random Forest Classifier
using 500 estimators, the entropy function to assess the quality of a split and consid-
ering only 50% of the features at each split (rand_tree_entropy_0p50_500); with
almost the same accuracy 93.73% the Extra Trees Classifier using 500 estima-
tors and a min samples leaf of 1 (extra_tree_500_1); with 93.59% the Support
Vector Classifier (SVC), the Gaussian radial basis function kernel (RBF) and
C = 10, γ = 0.001 (SVC_rbf); with 93.10% the SVC using a linear kernel with a C = 1
4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


                                                                                 B        GBI      BI       R        G        RBI      GI       RI       SV       Contr    SA       MOC1     GRI      Entr     ASM
                                                                   third_quart   0.9147   0.9374   1.0932   0.5535   0.6327   0.7541   0.5311   0.4619   0.5453   0.3243   0.3280   0.2042   0.3241   0.3977   0.2108
                                                                   stddev2       0.8721   1.4561   1.0688   0.4626   0.6478   0.7142   0.4881   0.4377   0.4138   0.3259   0.2591   0.2615   0.2711   0.1624   0.1969
                                                                   mean          0.7295   0.9598   0.9627   0.5277   0.4729   0.8004   0.7374   0.5152   0.3871   0.3995   0.2591   0.4647   0.3126   0.2793   0.1679
                                                                   ﬁrst_quart    0.6767   1.0549   1.1937   0.5292   0.4270   0.8294   0.4254   0.4672   0.2226   0.2849   0.2543   0.4729   0.3113   0.2958   0.3306
                                                                   median        0.7838   1.2277   0.8959   0.4847   0.4899   0.6453   0.5324   0.4303   0.2591   0.2836   0.3133   0.4225   0.2893   0.3306   0.2377
                                                                   variance2     0.6420   0.8236   1.0394   0.4218   0.4036   0.4529   0.5369   0.3662   0.2695   0.3256   0.2758   0.2715   0.1714   0.2389   0.1320
                                                                   stddev        0.9716   0.4381   0.5204   0.8621   0.8772   0.3729   0.2690   0.1884   0.3328   0.2897   0.2187   0.1725   0.1333   0.1454   0.1361
                                                                   max           0.9791   0.5068   0.7431   0.8448   0.6921   0.4397   0.3627   0.1939   0.2423   0.2875   0.2015   0.0960   0.1976   0.0338   0.1002
                                                                   mode          0.8039   0.6410   0.4792   0.4945   0.4822   0.7001   0.3362   0.2490   0.2507   0.2808   0.3405   0.0634   0.2184   0.1526   0.1090
                                                                   range         0.8136   0.3559   0.3883   0.8381   0.7304   0.3284   0.2607   0.2587   0.1288   0.1708   0.0900   0.1011   0.2606   0.1222   0.1055
                                                                   perc_90       0.5463   0.7825   0.5617   0.4018   0.4213   0.5100   0.3031   0.2680   0.0537   0.0574   0.0546   0.0502   0.1977   0.0521   0.0507
                                                                   coeff_var     0.5924   0.2290   0.4657   0.6681   0.6246   0.1250   0.2720   0.1977   0.1773   0.1486   0.1850   0.2078   0.0751   0.1580   0.1276
                                                                   min           0.3571   0.8956   0.4424   0.3348   0.2590   0.5950   0.2762   0.3096   0.0941   0.0987   0.1005   0.0752   0.2688   0.1059   0.0294
                                                                   variance      0.3180   0.2761   0.3135   0.3893   0.4114   0.3152   0.1904   0.1344   0.3662   0.2950   0.2291   0.2009   0.1327   0.1115   0.0743
                                                                   skewness      0.6275   0.1655   0.1621   0.5197   0.4344   0.1384   0.1030   0.0834   0.2321   0.2619   0.0998   0.0923   0.0868   0.0663   0.0616
                                                                   skewness2     0.2701   0.1790   0.2557   0.2120   0.2166   0.3431   0.2056   0.2188   0.2010   0.1324   0.1414   0.1133   0.2438   0.1046   0.0670
                                                                   kurtosis2     0.2413   0.1593   0.1326   0.2235   0.2720   0.1788   0.1541   0.1344   0.1509   0.1308   0.1730   0.1723   0.1000   0.1953   0.0874
                                                                   coeff_var2    0.1376   0.1417   0.1470   0.1426   0.1436   0.1034   0.1821   0.2152   0.1472   0.1772   0.2046   0.2276   0.0892   0.1215   0.1060
                                                                   kurtosis      0.1035   0.0667   0.0765   0.1417   0.1271   0.0809   0.0698   0.0754   0.1385   0.0689   0.0981   0.0693   0.0784   0.0565   0.0569
                                                                   occurrences   0.1197   0.0032   0.0000   0.1349   0.1501   0.0012   0.0000   0.0030   0.1131   0.1356   0.1480   0.0293   0.0000   0.0380   0.1046
                                                                                                                Table 4.3: Features importance in percentage % of each couple raster-statistics.
42
4.3 R ESULTS AND D ISCUSSION                                                           43


Figure 4.7: Score matrix with the percentage of the importance of each raster and zone
statistics considered in the dissertation.


Figure 4.8: In the ﬁrst plot the segments used for training the machine-learning algo-
rithms are reported, in the other subplots the results of the classiﬁcation with different
algorithms are reported.
44                   4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


(SVC_linear); with 93.07% the SVC using the Sigmoid kernel and C = 1e6 , γ = 1e−6
(SVC_sigmoid);
     In Table 4.4 the confusion matrices of the ﬁrst best six algorithms are reported.
     The accuracy score of the ﬁrst 6 best machine-learning algorithms were very sim-
ilar (e.g., the best and the worst classiﬁers differ from each other by only 0.75% of
the overall accuracy). This result was in part due to the extremely unbalanced training
dataset. Analyzing the confusion matrix, all of the algorithms had particularly good
performances to identify: water, grass, tree, shadow which, looking at Table 4.1,
were the classes with an accuracy above 95% in most of the classiﬁers.
     All of the algorithms had a lower identiﬁcation performance for road class, with an
accuracy around ∼ 84% of the training dataset. This low identiﬁcation performance
for the road class shows that the features extracted from the RGB channels did not
characterize and differentiate this class enough from the others. All of the other classes
were above 80%. The classes car and bus obtain accuracy that, depending on the
classiﬁers, vary from 94.74% to 79.84% for the car and between 94.25% and 87.37%
for the bus. The confusion matrix clearly shows that there were two classes that were
more confused from each other: road and building. If available, we can integrate
the classiﬁcation result with the Digital Surface Model (DSM) to help the classiﬁers
distinguish between these classes. A graphical overview of how the classiﬁer was
able to distinguish between the classes is provided in Figure 4.9, which illustrates the
variation of the Precision and the Recall, deﬁned in Table 4.2.
     From our tests the Gradient Boost was the best classiﬁer with an overall accuracy
of 93.8% and a Cohen’s k of 0.92. The overall accuracies from other studies (Duro
et al., 2012; Li et al., 2014; Novack et al., 2011; Vieira et al., 2012) lie between 71%
and 95%, and the Cohen’s k coefﬁcients lie between 0.63 and 0.91. Therefore, the
current work seems to be in line with previous works, but it is notable that we were
able to reach almost the same accuracy level using only the visible spectrum, while
previous studies used at least an additional Near Infra Red (NIR) spectrum or more.
The methodology described in this chapter can also be applied to images with other
spectral bands and context, and it is not limited to the visible spectrum or the classi-
ﬁcation of urban land use, but can also be used for classiﬁcation and interpretation of
rural images.


4.4 Conclusions

This chapter presents a methodology for the extraction, from RGB aerial ortho-rectiﬁed
images, of urban micro-environments with a high spatial resolution by combining Geo-
Objects Based Image Analysis (GEOBIA) and machine-learning algorithms to achieve
4.4 C ONCLUSIONS                                                                                                             45


                                                         gradient_boost_500_meanleaf3
 class              water   grass   vegetation    road        car      bus   building     shadow   Tot. Prod.   Prod. Acc. [%]
 water               285        4                        kC ohen                    5          3         297             95.96
 grass                  1   2757           25       9                              18                   2810             98.11
 vegetation                    39        1202       3           1                  11          4        1260             95.40
 road                          18           8     924                             170          1        1121             82.43
 car                                        1                 96                   29                    126             76.19
 bus                                        4                           75         15                     94             79.79
 building                      14           8     153           7        5      3085           7        3279             94.08
 shadow                 2       1           2                                       7        302         314             96.18
 Tot. User            288   2833         1250    1089        104        80      3340        317         9301
 User Acc. [%]      98.96   97.32       96.16    84.85     92.31    93.75      92.37       95.27
 Overall Acc. [%]   93.82
 kCohen [%]         91.72

                                                          rand_tree_entropy_0p50_500
 water               289       2                                                    2          4         297            97.31
 grass                 1    2742           37      10                              20                   2810            97.58
 vegetation                   40         1196       4                              17          3        1260            94.92
 road                         26            9     927                             158          1        1121            82.69
 car                                        1       1         90         1         33                    126            71.43
 bus                           1            2                           82          9                     94            87.23
 building               1     19           11     158          5         4       3072          9        3279            93.69
 shadow                 2                   2                                       3       307          314            97.77
 Tot. User           293    2830         1258     1100        95        87       3314        324        9301
 User Acc. [%]      98.63   96.89       95.07    84.27     94.74     94.25      92.70      94.75
 Overall Acc. [%]   93.59
 kCohen [%]         91.43

                                                               extra_tree_500_1
 water               293       1                                                      2        1         297            98.65
 grass                      2747           38      10                                15                 2810            97.76
 vegetation                   35         1211       3         1                       7        3        1260            96.11
 road                         26            9     906                              179         1        1121            80.82
 car                                        1                96         1            28                  126            76.19
 bus                                        1                          82            11                   94            87.23
 building                     18           16     151         5         4         3078         7        3279            93.87
 shadow                 1                   2                                         6     305          314            97.13
 Tot. User            294   2827         1278     1070     102         87         3326      317         9301
 User Acc. [%]      99.66   97.17       94.76    84.67    94.12     94.25         92.54    96.21
 Overall Acc. [%]   93.73
 kCohen [%]         91.61

                                                                   SVC_rbf
 water               290       1                                                      3        3         297            97.64
 grass                 1    2754           28      11         1                      15                 2810            98.01
 vegetation                   33         1212       3         3                       5        4        1260            96.19
 road                         18            5     913                               184        1        1121            81.45
 car                                        3       1        98         1            23                  126            77.78
 bus                                                          1        83            10                   94            88.30
 building               2     26            9     153        15         8          3058        8        3279            93.26
 shadow                 3                   4                 1                       9     297          314            94.59
 Tot. User            296    2832        1261     1081      119        92          3307      313        9301
 User Acc. [%]      97.97   97.25       96.11    84.46    82.35     90.22         92.47    94.89
 Overall Acc. [%]   93.59
 kCohen [%]         91.43

                                                                  SVC_linear
 water               290       3                                                      1        3         297            97.64
 grass                 4    2748           30      13                                15                 2810            97.79
 vegetation                   37         1208       1         3                       6        5        1260            95.87
 road                  1      22            5     906                               186        1        1121            80.82
 car                                        3       1        99         1            22                  126            78.57
 bus                                                          2        83             9                   94            88.30
 building               4      26          11      174       19        11          3029        5        3279            92.38
 shadow                 3       1           4        1        1                       8     296          314            94.27
 Tot. User           302    2837        1261     1096      124         95         3276      310         9301
 User Acc. [%]      96.03   96.86       95.80    82.66    79.84     87.37         92.46    95.48
 Overall Acc. [%]   93.10
 kCohen [%]         90.78

                                                                  SVC_sigmoid
 water               290       3                                                      1        3         297            97.64
 grass                 4    2748           30      13                                15                 2810            97.79
 vegetation                   37         1208       1         3                       6        5        1260            95.87
 road                  1      22            5     904                              188         1        1121            80.64
 car                                        3       1        99         1            22                  126            78.57
 bus                                                          2        83             9                   94            88.30
 building               4      26          11      175       19        11         3028         5        3279            92.35
 shadow                 3       1           4        1        1                       8     296          314            94.27
 Tot. User           302    2837        1261      1095     124         95         3277      310         9301
 User Acc. [%]      96.03   96.86       95.80    82.56    79.84     87.37         92.40    95.48
 Overall Acc. [%]   93.07
 kCohen [%]         90.74


                    Table 4.4: Confusion matrix for the ﬁrst best six classiﬁers.
46                   4 M ACHINE LEARNING CLASSIFICATION OF URBAN MICRO - ENVIRONMENTS


        (a) Gradient Boosting Tree                    (b) Random Tree


       (c) Extremely Randomize Tree          (d) SVC radial basis function (RBF)


              (e) SVC linear                          (f) SVC sigmoid

Figure 4.9: The Precision/Recall (PR) curves provide an informative picture of algo-
rithm performance. Ideally, if all of the segments were classiﬁed correctly without
errors, both the value for Precision and Recall are 1, and the Area Under the Curve
(AUC) reported in the legend will be 1. (a) Gradient Boosting Tree (GBC) (c) Ex-
tremely Randomize Tree (ET) (e) SVC Sigmoid (b) SVC linear (d) SVC radial func-
tion (RBF) basis (f) Random Forest (RF).
4.4 C ONCLUSIONS                                                                           47


        name                           mean [%]   max [%]   min [%]   std [%]   time [s]
        gradient_boost_500_meanleaf3      93.82     94.78     92.91     0.67    2939.2
        rand_tree_entropy_0p50_500        93.75     94.41     92.96     0.56    2202.7
        extra_tree_500_1                  93.73     94.09     93.12     0.34      34.3
        SVC_rbf                           93.59     94.35     92.48     0.64      81.6
        SVC_linear                        93.10     93.98     91.99     0.67      92.2
        SVC_sigmoid                       93.07     93.82     91.99     0.63      90.9


Table 4.5: Total accuracy reach by each classiﬁer, in the table are reported mean, max-
imum, minimum, standard deviation and the time needed for cross-validation using 5
k-folds.


a reliable accuracy. The entire methodology is based on open-source software. The
minimum input data required is quite common and easy to retrieve, which makes this
research easier to reproduce, modify and apply to other contexts. The methodology
was veriﬁed with more than 9,300 training segments, testing several pre-processing
and machine-learning algorithm set-ups. The classiﬁers achieve an overall accuracy
greater than 93% and a Cohen coefﬁcient above 0.92, and the results are comparable
to those of other literature for the same classiﬁcation task (Duro et al., 2012; Li et al.,
2014; Moskal et al., 2011; Novack et al., 2011; Vieira et al., 2012).
    The accuracy achieved with the presented methodology could be sufﬁcient to char-
acterize urban micro-environmental conditions. For example, limiting the analysis
to micro-climatic conditions, even if the classiﬁers confuse some classes such as the
building and road, these two classes have generally similar material property with re-
gard to the emissivity and heat transmission and the same considerations are valid for
grass and trees. Further improvement of the overall accuracy can be reached using
extra information to characterize each segment, such as another spectrum band or the
elevation difference between the digital terrain model (DTM) and the digital surface
model (DSM).
    The GRASS GIS module v.class.ml developed and used in this work is not
limited to image and segment classiﬁcation, but presents a more general tool for wider
application that allows the classiﬁcation of any kind of vector map using the attribute
table as a data source for machine-learning classiﬁers. The segment classiﬁcation was
chosen as a test due to the relatively high number of segments (2,726,635) and features
(307), but the module can also be applied in other contexts and research ﬁelds.
Chapter 5

Geographical factors and urban
Nitrogen Dioxide
micro-environments
         In this chapter I examined the correlation between Nitrogen Dioxide (NO2 )
         concentrations measured using passive diffusive tubes (PDTs) and geo-
         graphical factors. The study area is the city centre of Edinburgh. I con-
         sidered nine urban micro-environment classes; for each of them I computed
         the distance and the Sky View Factor (SVF). For all of these maps I cal-
         culated the Sum Average texture using different window dimensions. The
         chapter presents which factors have a stronger correlation with the mea-
         sured concentration of NO2 . These geographical factors were used as input
         to apply several machine-learning regression algorithms, to assess the spa-
         tial variability of NO2 within an urban context.


5.1 Introduction

To examine the effects of air pollutants on human health, an accurate assessment of
spatial variability of the pollutants is needed. For this reason an increasing amount of
literature focuses on spatial variability within-city at an intra urban scale. At this scale
the traditional central site monitors based on ofﬁcial measurement stations (OMS) are
inadequate.
    This chapter aims to understand if it is possible to treat the spatial variability of
air pollutants within the domain of GIS modelling, neglecting, as ﬁrst approximation,
the meteorological conditions. Moreover I want to identify which geographical factors
inﬂuence pollutant concentration within an urban context.
    Levy et al. (2014) highlights that the space and time variability of multipollutant
mix varies considerably throughout a city, and even if a single pollutant that acts as
50         5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


proxy measure for the entire mix under all circumstances does not exist, indicates Ni-
trogen Dioxide (NO2 ) as “the best available indicators of spatial variation in exposure
to the outdoor urban air pollutant mixture”.
     Similarly to other studies, passive diffusive tubes (PDTs) were used to measure
and characterize the spatial variability of the NO2 (Madsen et al., 2007); what is new
with respect to previous studies is the high density of PDTs used and the relatively
long time of the measurements campaign (6 consecutive weeks).
     I tested the correlation between several geographical factors and the average of
the measured concentrations of NO2 , and then I used these geographical factors to
regress the NO2 and assess the spatial variability of the air pollutant. The presented
work extends the concept of the Land Use Regression method to base the regression
not only on land-use categories but also on other geographical factors that could be
signiﬁcant to describe micro-environment conditions of the physical phenomena under
investigation.


5.2 Materials and methods

The study area is the city centre of Edinburgh, Scotland (UK); see previous chapter for
further information on the area object of this study. The methodology combines the
urban micro-environments extracted by the RGB aerial ortho-rectiﬁed images with the
elevation difference between the Digital Terrain Model (DTM) and the Digital Surface
Model (DSM) to better distinguish between buildings and roads or trees and grass,
improving the ﬁnal characterization of the urban morphology.
     Thirty-seven passive diffusion tubes (PDTs) of Nitrogen Dioxide were positioned
by the School of Chemistry and Engineering of the University of Edinburgh under the
supervision of Dr. M. Heal and Dr. C. Lin. The passive samplers were analysed weekly
for a six-week campaign starting on 2 December 2013 and ending on 13 January 2014.
The area covered by the PDTs was less than 9 sq.km with a density of about 0.250
sq.km per passive sample, and covered the south part of the city centre of Edinburgh
and suburban areas ()see Figure 5.1 for more details).
     Nine geographical factors were considered in this work: buildings, tree, grass,
roads split in ﬁve main classes: very high, high, medium, low, very low trafﬁc, and
urban tunnels. Since during this work we had not the opportunity to analyse data
coming from the trafﬁc counter of the city council, the trafﬁc classiﬁcation of roads
were based on the Open Street Map (OSM) project classiﬁcation. The OSM project
founded in 2004 by Steve Cost collects and organizes geo-referenced data provided
and checked from a growing number of volunteers, often in the scientiﬁc literature
we refer to this data as Volunteered Geographic Information (VGI). Roads vector data
5.2 M ATERIALS AND METHODS                                                           51


Figure 5.1: The white points represent the passive samplers used to monitor the con-
centration of Nitrogen Dioxide (NO2 ) and to characterize the spatial variability of air
pollutants in an urban context during the six weeks of the measurement campaign.
52           5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


were transformed to raster using an increasing buffer around the road starting from
8m for very low until 18m for very high trafﬁc roads and removing from the buffer
all pixels that were already classiﬁed as buildings. The other geographical factors
were extracted from the visible spectrum (RGB) of aerial ortho-rectiﬁed images pro-
vided by the Ordinary Survey (British Ordnance Survey, 2012), with a resolution of
0.25 m by 0.25m, combining a Geo-Object Image Analysis (GEOBIA) and machine-
learning classiﬁers as explained in the previous chapter. The classiﬁed raster map was
re-sampled from a resolution of 0.25m to a 5m using the statistic mode as value of the
pixel. The spatial resolution has been reduced, because spatial variability of an aver-
age week’s concentration of air pollutants is difﬁcult to model and to measure at this
scale, and to make the processing time faster. Moreover a spatial resolution of 0.25 m
seems not coherent with deployment of the measurement campaign and probably not
feasible with current technologies and costs. A lower resolution of 10 or 25m, on the
other hand, hides the effect of important geographical factors such as urban grass and
trees.
      For each of these geographical factors three main features were extracted: the
distance, the Sky View Factor (SVF), and the Sum Average texture. The SVF is deﬁned
as:

                                           2π
                                           0 cos(φ(λ))dλ
                                 SV F =                                               (5.1)
                                                 2π
      and represents the portion of sky visible from a speciﬁc point. For geographical
features that haven’t got an elevation such as grass, roads and tunnels a default value
of 100m was assigned. The SVF was computed with an angle step of 5 degrees.
      The Sum Average (SA) texture was used to “describe” the context of each point
with a moving window of a different number of pixels (3, 5, 9, 15, 21, 27, 33, 39, 45),
therefore considering a zone of inﬂuence that goes from 15m up to 225m, for each
geographical factor.
      For each of the 243 geographical factors (9(class) · 3( f eatures) · 9(texture) I com-
puted the Spearman’s correlation coefﬁcient (ρs ) in respect to the mean concentration
of NO2 measured during the winter campaign. All the geographical factors satisfying
the following condition: ρs > 0.4 ∨ ρs < −0.4 were manually selected considering the
class type of the geographical features (e.g. grass, tree, roads, etc.), the feature type
(e.g. distance, SVF, etc.) and the texture value if used.
      A Recursive Feature Elimination (RFE) was used to reduce the number of features
and select only those features providing a higher score. The RFE process consists of
deﬁning an estimator, in this work the Support Vector Regression (SVR) with kernel =
RBF,C = 1, γ = 1 was used, and select features recursively considering only a small
subset of the features available. Only features with the highest score are selected. To
5.3 R ESULTS AND DISCUSSION                                                              53


Figure 5.2: Concentration of Nitrogen Dioxide (NO2 ) measured during the six weeks,
sorted by the mean value of each point.


evaluate the performance avoiding, or limiting, the over-ﬁt problem, we compute the
Mean Square Error (MSE) of a cross-validated dataset using the left-one-out (LOO)
method. I tested 18 different regression algorithms with several set-ups to ﬁnd the best
set of options for each regressor that maximize performance.


5.3 Results and discussion

The work is based on the assumption that the NO2 concentrations are strictly depen-
dent on the geographical position. To verify this assumption, the sample measurements
were sorted using the average value during the six weeks for each point and sorting the
points based on this value (see Figure 5.2). Since we were not interested in a concen-
tration value itself, but in a spatial variability of the NO2 , we scaled the values to have
µ = 0 and σ = 1 (see Figure 5.3). Both graphs show that points characterized by a low
concentration of NO2 have low concentrations in all the weeks, accordingly the spatial
variability of the NO2 seems stable during the six weeks of the campaign. Figure 5.4
shows a summary of the Spearman’s rank correlation coefﬁcient (ρs ) between weeks
of the scaled dataset, and Figure 5.5 shows the detailed plot for each week combina-
tion. The Spearman’s rank correlation coefﬁcient assesses how well the relationship
between two variables can be described using a monotonic function. If there are no
repeated data values, a perfect Spearman correlation of +1 or -1 occurs when each of
the variables is a perfect monotone function of the other. All couples of weeks had
a strong Spearman correlation coefﬁcient between 0.88 and 0.97. From this dataset
54        5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


Figure 5.3: Concentration of Nitrogen Dioxide (NO2 ) measured during the six weeks,
sorted by the mean value of each point and scaled to have: µ = 0 and σ = 1.


Figure 5.4: Spearman correlation coefﬁcient between weeks measurements of NO2
scaled with µ = 0 and σ = 1.
5.3 R ESULTS AND DISCUSSION                                                          55


(a)  Week_1       VS   (b)  Week_1     VS     (c)  Week_1     VS     (d)  Week_1     VS
Week_2                 Week_3                 Week_4                 Week_5


(e)  Week_1       VS   (f)  Week_2     VS     (g)  Week_2     VS     (h)  Week_2     VS
Week_6                 Week_3                 Week_4                 Week_5


(i)  Week_2       VS   (j)  Week_3     VS     (k)  Week_3     VS     (l)  Week_3     VS
Week_6                 Week_4                 Week_5                 Week_6


(m) Week_4        VS              (n)  Week_4     VS                 (o)  Week_5     VS
Week_5                            Week_6                             Week_6

Figure 5.5: Detail of the correlation between weeks, the values are scaled with µ = 0
and σ = 1.


and graphs the geographical position seems to be the main driver or proxy of the NO2
concentrations.
   The Spearman’s rank correlation coefﬁcient was computed for each geographical
factor. The geographical factors with ρs > 0.4 are reported in Figure 5.6, and in Figure
5.7 are reported all of them with ρs < −0.4. The subset of geographical factors with the
higher Spearman’s coefﬁcient were manually selected avoiding repetition about: class,
feature, and texture type. The selected geographical factors with a positive correlation
56         5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


coefﬁcient are shown in Figure 5.8, while Figure 5.9 shows the geographical factors
with a negative correlation’s coefﬁcient. The details and values of the Spearman’s
coefﬁcient are shown in Table 5.1.
     These graphs highlight a strong correlation between NO2 concentrations and grass
distance (ρs = 0.60), and in particular considering the grass presence within an area
of 45m (ρs = 0.66) another linked indicator is the grass’ SVF (ρs = 0.51). A strong
correlation exists with the presence of very high trafﬁc roads in a radius of 195m; this
conﬁrms the role of trafﬁc as main driver of the NO2 concentrations. The strong corre-
lation with the grass class could be due to the fact that where we have grass we haven’t
got NO2 sources (e.g. roads). A correlation also exists between NO2 concentrations
and the SVF of urban tunnels in a surrounding area of 135m (ρs = 0.59). It is not
clear from these data if the tunnels inﬂuence the NO2 as a geomorphological factor
or because they indicate a certain urban context in Edinburgh. Similar considerations
are valid for the presence of buildings in a radius of 195m (ρs = 0.46). A strong neg-
ative correlation exists between NO2 concentrations and the SVF (ρs = −0.67) and
distance (ρs = 0.66) of very high and high trafﬁc roads and also considering only very
high trafﬁc roads with a ρs respectively of 0.59 and 0.56 for SVF and distance, a simi-
lar correlation was found with the urban tunnels with distance (ρs = −0.56) and SVF
(ρs = −0.55). We found a negative correlation (ρs = −0.49) with the presence of the
tree within a surrounding area of 25 m, which means that a higher presence of trees
generally correspond to a lower concentration of NO2 .
     As highlighted by Cape (2009), the PDTs could be sensitive to wind speed, tem-
perature and humidity, and these meteorological conditions could have inﬂuenced the
campaign, perhaps the correlation with grass and trees is partially due to the micro-
climatic conditions inﬂuenced by urban vegetation. However the average humidity
condition of Edinburgh is quite high, and therefore we can assume a very low spatial
variability of this parameter.
     Table 5.2 reports the values of the Root Mean Square Error (RMSE) of the re-
gression estimators. The correlation of the best regressed values and the measured
concentrations of NO2 are shown in Figure 5.10 with a Spearman’s correlation coefﬁ-
cient of 0.78. A direct comparison between the estimate concentrations and measured
data is shown in Figure 5.11. Figure 5.12 shows the raster map with the assessed
concentrations of NO2 .


5.4 Conclusions

The work highlight how the concentration of Nitrogen Dioxide (NO2 ) is strictly de-
pendent on geographical position. Therefore it is possible to explore the spatial vari-
5.4 C ONCLUSIONS                                                                 57


Figure 5.6: Positive Spearman’s correlation coefﬁcient between Geographical features
and measurements of NO2 concentrations.


Figure 5.7: Positive Spearman’s correlation coefﬁcient between Geographical features
and measurements of NO2 concentrations.
58               5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


     label                                               class type    feature type    texture [pixels]   texture [m]           ρs       P-value

     dist__grass09_SA                                         grass        distance                  9             45       0.6648     0.000005
     dist__grass                                              grass        distance                                         0.5972     0.000075
     cls__rd_vh39_SA                          very high trafﬁc road            class                39           195        0.5891     0.000100
     svf__tunnels27_SA                                       tunnel            SVF                  27           135        0.5864     0.000109
     svf__roads_vh_h09_SA              very high + high trafﬁc road            SVF                   9            45        0.5560     0.000290
     svf__grass                                               grass            SVF                                          0.5088     0.001107
     cls__bd39_SA                                          building            class                39           195        0.4641     0.003337

     svf__roads_vh_h                   very high + high trafﬁc road            SVF                                          -0.6742    0.000003
     dist__roads__vh_h                 very high + high trafﬁc road        distance                                         -0.6619    0.000006
     dist__tunnels21_SA                                      tunnel        distance                 21           105        -0.5950    0.000082
     svf__roads__very_high                    very high trafﬁc road            SVF                                          -0.5909    0.000094
     dist__roads__very_high                   very high trafﬁc road        distance                                         -0.5598    0.000258
     dist__tunnels                                           tunnel        distance                                         -0.5592    0.000263
     svf__tunnels                                            tunnel            SVF                                          -0.5540    0.000308
     dist__tunnels15_SA                                      tunnel        distance                 15            75        -0.5523    0.000324
     dist__roads__very_high45_SA              very high trafﬁc road        distance                 45           225        -0.5068    0.001168
     dist__roads__vh_h45_SA            very high + high trafﬁc road        distance                 45           225        -0.5024    0.001309
     cls__vg5_SA                                                tree           class                 5            25        -0.4985    0.001447


Table 5.1: Spearman’s correlation coefﬁcient (ρs ) computed between the average value
of the concentration of NO2 and different geographical factors, In this Table are re-
ported selected geographical factors with a moderately or strong positive or negative
Spearman’s correlation coefﬁcient ρs > 0.4 ∨ ρs < −0.4.


 model                             parameters                                          mean [µg · m−3 ]   max [µg · m−3 ]       min [µg · m−3 ]    std [µg · m−3 ]   time [s]

 Support Vector Regression         kernel=Sigmoid, C = 10, γ = 0.01                    15.17              62.48                 0.41               25.31             0.07
 Support Vector Regression         kernel=linear, C = 0.1                              15.54              63.24                 0.47               25.56             0.06
 Support Vector Regression         kernel=RBF, C = 100, γ = 0.001                      15.89              60.42                 0.46               24.61             0.08
 Ridge                             α = 100                                             15.99              62.40                 0.00               25.50             0.03
 BayesianRidge                                                                         16.47              62.29                 0.30               25.59             0.24
 Elastic Network                   α = 10, l1_ratio = 0                                16.52              65.49                 0.47               26.52             0.56
 Gradient Boosting Regressor       loss=lad, learning_rate=0.5, numb. estimators=10    17.02              45.67                 0.63               20.54             0.18
 Extra Trees Regressor             numb. estimators=500, max_features=sqrt             17.09              62.75                 0.33               25.45             9.35
 Lars                              numb. non-zero coefﬁcient=5                         17.27              67.72                 0.20               27.37             0.05
 Lasso Lars                        α=1                                                 17.63              69.65                 0.54               28.12             0.05
 Random Forest Regressor           numb. estimators=500, max_features=log2             17.64              65.03                 1.33               26.40             18.79
 Bagging Regressor                 numb. estimators=500, max_features=20               17.77              65.03                 0.83               26.33             33.02
 Lasso                             α=1                                                 20.10              61.90                 0.28               26.94             0.05
 Perceptron                        penality=l2, α = 100                                21.60              81.72                 0.20               33.45             0.02
 SGD Regressor                     loss=squared_epsilon_insensitive, penality=l2       21.60              81.72                 0.20               33.45             0.02
 Logistic Regression               penality=l1, C = 10                                 22.53              48.00                 1.00               26.67             0.73
 ARD Regression                                                                        30.39              73.98                 3.18               35.26             12.13
 Passive Aggressive Regressor   C = 10, loss=epsilon                                   38.80              105.57                1.89               48.28             0.02


Table 5.2: Root Mean Square Error (RMSE) computed between the average value of
the concentration of NO2 and the regression value assess using a left-one-out cross-
validation method for each machine learning algorithm. In this Table are reported the
mean value the maximum, minimum, standard deviation and the computational time
needed for cross-validation.
5.4 C ONCLUSIONS                                                                                   59


(a) ρs = 0.66, p = 5.3e−6            (b) ρs = 0.60, p = 7.5e−5       (c) ρs = 0.59, p = 9.9e−5
grass, distance, texture 9 pixels    grass, distance                 very high trafﬁc road, class, tex-
                                                                     ture 39 pixels


(d) ρs = 0.59, p = 1.1e−4            (e) ρs = 0.56, p = 2.8e−4       (f) ρs = 0.51, p = 1.1e−3
tunnel, SVF, texture 27 pixels       very high + high trafﬁc road,   grass, SVF
                                     SVF, texture 9 pixels


(g) ρs = 0.46, p = 3.3e−3
building, class, texture 39 pixels

Figure 5.8: Correlation between the transformed geographical features (µ = 0 and
σ = 1) and NO2 concentration. In each sub-plot are reported: the ρs and the relative
p, the feature land-use class, the feature type and, if a texture is applied, the number of
pixels used to compute the texture.


ability of the NO2 taking into account the geographical factors. In this chapter several
factors were taken into account: 9 land use classes were used (very high/high/medi-
um/low/very low trafﬁc roads, trees, grass, building, tunnels) two different features
were computed (distance and SVF) and for all these factors 9 different texture dis-
tances were considered. For each of these 243 geographical factors we checked the
60          5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


(a) ρs = −0.67, p = 3.4e−6      (b) ρs = −0.66, p = 6.0e−6           (c) ρs = −0.59, p = 8.1e−5
very high + high trafﬁc road,   very high + high trafﬁc road, dis-   tunnel, distance, texture 21 pix-
SVF                             tance                                els


(d) ρs = −0.59, p = 9.4e−5      (e) ρs = −0.56, p = 2.6e−4           (f) ρs = −0.56, p = 2.6e−4
very high trafﬁc road, SVF      very high trafﬁc road, distance      tunnel, distance


(g) ρs = −0.55, p = 3.1e−4      (h) ρs = −0.51, p = 1.2e−3           (i) ρs = −0.50, p = 1.4e−3
tunnel, SVF                     very high trafﬁc road, distance,     tree, class, texture 5 pixels
                                texture 45 pixels

Figure 5.9: Correlation between the transformed geographical features (µ = 0 and
σ = 1) and NO2 concentration. In each sub-plot are reported: the ρs and the relative
p, the feature land-use class, the feature type and, if a texture is applied, the number of
pixels used to compute the texture.


correlation with the measured average mean concentration of NO2 .
     A positive correlation was found with the distance and the SVF of the grass, the
presence of very high trafﬁcked roads in the surrounding area, the presence of tunnel
and building. Negative correlation was found for the SVF and distance of the roads
with high and very high trafﬁc level, distance from the urban tunnels and the presence
5.4 C ONCLUSIONS                                                                 61


Figure 5.10: Relation between the regressed data using the Support Vector Regression
method and measurements of NO2 concentrations.


Figure 5.11: Comparison between regressed data and the average of the NO2 measured
concentrations.
62         5 G EOGRAPHICAL FACTORS AND URBAN N ITROGEN D IOXIDE MICRO - ENVIRONMENTS


Figure 5.12: Map with assessed NO2 concentrations using the Support Vector Regres-
sion method trained with the average of the measured concentration


of trees. Since correlation does not imply causation, we can not say that grass or vege-
tation play an active role in the reduction of NO2 concentrations or simply indicates the
absence of NO2 sources, however these correlations can be used to assess the spatial
variability of NO2 within the urban context.
     Geographical factors were reduced using a recursive feature selection and then
used as input data to train and test 18 different machine learning algorithms to test
the algorithm that provides the best regressed values of the NO2 . The Support Vector
Regression algorithm using a Sigmoid kernel provided a mean RMSE of 15.17µg · m−3
with a strong correlation between measured and estimate concentration data (ρs =
0.78).
     This chapter presents a new methodology that can be easily extended to consider
other land-use classes, different features and texture types. Combining these geo-
graphical factors with machine-learning algorithms proved to be reliable in assessing
the spatial variability of physical phenomena within the urban context.
Chapter 6

Decision support system to assess
and compare the effect of different
action on the urban air quality
Chapter 7

Regressed scenarios comparison: a
methodology for urban planners

        Many studies focus their research on characterizing the intra-urban vari-
        ability of air pollutants; few of them try to provide a methodology to assist
        policy and decision makers to maximize air quality enhancement within the
        urban context. This chapter presents a methodology that aims to compare
        different scenarios, combining the measurement ﬁeld data and regression
        modelling to assess the Nitrogen Dioxide (NO2 ) concentrations before and
        after a certain urban change. 37 passive diffusive tubes (PDTs) sampler
        points were used to train a regression model and link the measured con-
        centration with some geographical factors. Then three different scenarios
        were developed to reduce the average concentrations of NO2 in the area
        surrounding a nursery in the city of Edinburgh (UK). The ﬁrst sets some
        trees in front of the nursery building, the second changes the trafﬁc level of
        the road introducing a new tram line, and the last one couples the tram line
        with some green spaces using grass and trees along the road. The result-
        ing concentration in all three scenarios is computed and compared to one
        another.


7.1 Introduction

An increasing number of epidemiological studies have highlighted an association be-
tween air pollution exposure and adverse health effects (World Health Organization
(WHO), 2014a). Many works focus their research developing and testing tools that
predict and model concentration of intra-urban variation of air pollutants. Two main
techniques are used in literature: Dispersion Modelling (DM) and Land Use Regres-
sion modelling (LUR). As highlighted by Hoogh et al. (2014) “DMs are based on
detailed knowledge of the physical, chemical, and ﬂuid dynamical processes in the
atmosphere”. DMs require information related to meteorology to model transport and
66            7 R EGRESSED SCENARIOS COMPARISON :   A METHODOLOGY FOR URBAN PLANNERS


pollutant transformations, as well as a detailed model of the physical, chemical and
ﬂuid dynamical processes in the atmosphere. These requirements come at a price of
difﬁculties in collecting or modelling all these input data. LUR, on the other hand,
combines monitored data with Geographic Information System (GIS)-based predictor
data to build a prediction model, in order to assess the average concentration of air
pollutant in a certain urban context. LUR modelling requires a data ﬁeld campaign to
characterize the intra-urban variability and link the geographical factors with the con-
centration of air pollutants. LUR is good at assessing average values and is less used
to estimate short time variations. In comparison with DMs, LUR is easier to use and
faster to run, reaching comparable performances (Hoogh et al., 2014).
     This chapter presents a LUR modelling to assess the effect of different actions
and scenarios on the air quality level at an intra-urban scale. The idea is to provide
a methodology that can be applied by urban dwellers, planners and policy-makers
to create healthier cities, allowing them to compare the effects of different policies,
landscape and urban design. The methodology not only provides an estimate of the air
quality level that will be reached after a certain action, but furnishes information about
how much and where this reduction occurs and identiﬁes the area affected by a certain
urban change. This data-driven methodology can be used to compare different urban
and policy options.


7.2 Materials and methods

The geographical features of urban micro-environments were used in combination with
machine-learning algorithms for the regression of the average concentration of Nitro-
gen Dioxide (NO2 ) measured, during a winter campaign, using passive diffusive tubes
(PDTs) to monitor the city centre of Edinburgh (UK). For more details on the study
area, the measurement campaign, the geographical analysis and the regression tech-
nique please refer to previous chapters. Levy et al. (2014) highlight that NO2 seems to
be the best proxy available for the other air pollutants, therefore high levels of NO2 are
generally linked with high levels of others air pollutants, providing information about
the air quality of the area.
     Old adults and babies are more sensitive than others to air pollutant concentrations.
To preserve their health it could be useful to identify some of the places where they
spent a great portion of the day time. We identiﬁed some of these places and we
extracted them from the Open Street Map (OSM) project; in particular we considered:
nurseries, kindergartens, schools and clinics.
     I use a nursery in the south of the city centre of Edinburgh, close to trafﬁcked
roads, as a test case to assess the effects that different private and urban decisions and
7.2 M ATERIALS AND METHODS                                                            67


Figure 7.1: The main places, in the city centre of Edinburgh extracted by the Open-
StreetMap project, where the weaker part of the population (elderly and young people)
spend a relevant part of their day-time.


actions could have on air quality and speciﬁcally on the NO2 concentrations on that
area.
    Three different scenarios were developed and compared. In the ﬁrst scenario we
tried to improve the air quality, planting some trees to make a sort of "vegetation wall"
in front of the building. In the second scenario we decided to reduce the trafﬁc of the
road, introducing an electric tram that lowers the trafﬁc level from a very high to a
high trafﬁcked road. In this scenario the road lines were dropped from the current 4
road lines to 2 roads and 2 tram lines. The third scenario hypothesized substituting one
tram line with a green area with trees and grass and covering the remaining tram line
with grass.
    In all these scenarios we basically changed the land-use and the urban micro-
environments (see Figure 7.3), so we needed to update all the geographical factors
that are inﬂuenced by the change. Then we used the previous geographical features
and the average NO2 concentration to train the best regression algorithm, which in our
case is the Support Vector Regression (SVR), and applied the regression algorithm us-
ing the changed geographical features of the scenarios to assess the impact of different
planning choices. Figure 7.4 shows the SVF of grass and tree for the current context
and the context in the third scenario.
    I computed the difference between the current NO2 concentrations and all the sce-
narios to highlight the impact of different actions. Finally, actions have an effect not
68            7 R EGRESSED SCENARIOS COMPARISON :    A METHODOLOGY FOR URBAN PLANNERS


                (a) Current                                      (b) Add trees


               (c) Add tram                                  (d) Add trees and grass

Figure 7.2: In Figure 7.2a is reported the current situation; in Figure 7.2b some trees
are added in front of the building to make a sort of green wall to protect from pollutant
concentrations; Figure 7.2c introduces an electric tram line reducing the trafﬁc road
load from very high to high; Figure 7.2d leaves only one tram line introducing grass
and trees along the road.


only at the local scale (the nursery) but also at the district and city level.
7.3 R ESULTS AND DISCUSSION                                                             69


Figure 7.3: Land-use changes concerning the second scenario that introduced a new
tram line and the third that added trees and grass areas.


7.3 Results and discussion


In the ﬁrst scenario, adding trees in front of the nursery building had a negligible effect
on air pollutant concentrations (see Figure 7.5b). The maximum difference between
the existing conﬁguration and the conﬁguration with the trees was 0.5µgm−3 (see Fig-
ure 7.6a). The second scenario assesses the impact on the air quality if a tram line
was used to reduce the trafﬁc load of the road in front of the nursery building. In this
case the trafﬁc load was reduced from a very high trafﬁcked road class to high. In-
troducing this change had an important effect on the nursery area (see Figure 7.5c),
with an assessed reduction of NO2 concentration of about 15µgm−3 (see Figure 7.6b).
The third scenario combines the tram with the introduction of grass and trees, with an
assessed reduction of 22µgm−3 ; the resulting concentration is shown in Figure 7.5d,
and the NO2 concentration differences are reported in Figure 7.6c. Furthermore, as
shown in Figure 7.7, the methodology presented in this chapter is able to assess NO2
concentrations changes at an urban scale. Since the tram line involves a large part of
the city-centre of Edinburgh the effect of this change involves also other parts of the
city and not only the surrounding areas of the nursery.
70             7 R EGRESSED SCENARIOS COMPARISON :   A METHODOLOGY FOR URBAN PLANNERS


         (a) SVF grass, current                           (b) SVF grass, third scenario


          (c) SVF tree, current                            (d) SVF tree, third scenario

Figure 7.4: The Sky View Factor (SVF) as it is now and as it is in the third scenario
with the introduction of the grass and trees element along the trafﬁcked road.


7.4 Conclusions

The presented chapter shows a new data-driven methodology to assess the impact that
different actions have on average air quality within an urban context. The methodology
requires a set of geographical and measurement data to train the machine-learning
regression algorithms and link the pollutant’s concentrations with geographical factors.
Once this link has been established, it is possible to build new scenarios changing the
geographical factors and designing the urban landscape to enhance the average air
quality level of the city.
     Using a greater number of sampled points can increase the measurement density
and therefore improve the characterization of the spatial variability of air pollutants.
A higher number of measurements can help to make the link between geographical
factors and measured concentrations of air pollutants clearer and stronger.
     The described methodology is simple to implement and use, since it requires com-
7.4 C ONCLUSIONS                                                                       71


                   (a) Current                                      (b) Tree


                    (c) Tram                                        (d) TTG

Figure 7.5: Variation of N02 concentrations [µgm−3 ], in different scenarios, zoomed
on the surrounding areas of the nursery. Figure 7.5a shows the current concentration of
NO2 , Figure 7.5b illustrates the concentrations after the introduction of a tree in front
the nursery. Figure 7.5c reports the concentration level after reducing the trafﬁc level
of the road by introducing a tram, and Figure 7.5d introduces the tram and covers the
street with trees and grass (TTG).


mon geographical data as input and a measurement campaign using the Passive Dif-
fusive Tubes (PDTs) to characterize the concentration of air pollutants. Moreover,
the methodology is fast to run and allows testing for several urban conﬁgurations and
comparing the effectiveness of different scenarios and actions to each other.
    The used measurements provide a value that is representative only of the aver-
age concentration value of Nitrogen Dioxide (NO2 ). The same methodology can be
extended to work with ﬁeld data characterized by a higher temporal resolution, also
integrating meteorological factors into the model (Su et al., 2008a).
    The methodology can be improved to consider other factors that can play a role in
ﬁnal air quality in a certain area (e.g. farms, industrial areas, etc.)
    A future development could be to write a tool that, when geographical factors are
changed, automatically updates all the derived maps and estimates the concentration
72           7 R EGRESSED SCENARIOS COMPARISON :   A METHODOLOGY FOR URBAN PLANNERS


                  (a) Tree


                  (b) Tram                                    (c) TTG

Figure 7.6: Differences between the current N02 concentrations [µgm−3 ] and the sce-
narios.


of NO2 in the study area.
7.4 C ONCLUSIONS                                                                        73


                   (a) Current                                    (b) Tree


                    (c) Tram                                      (d) TTG

Figure 7.7: Variation of N02 concentrations [µgm−3 ], in different scenarios at city
scale. Figure 7.7a shows the current concentration of NO2 , Figure 7.7b illustrates the
concentrations after the introduction of the tree in front the nursery. Figure 7.7c shows
the concentration level after reducing the trafﬁc level of the road due to the introduction
of a tram, and Figure 7.7d reports the situation after the introduction of the tram and
when the street is covered with trees and grass (TTG).
Chapter 8

Conclusions

A new Geographic Information System model has been developed to characterize the
effect of micro-environmental factors that play a role in driving spatial and time vari-
ability of air pollutants, using machine-learning algorithms to assess and compare the
effects of different policies and scenarios on urban air quality. To achieve these results,
we tested low-cost sensor technologies on outside conditions, and the measurements
were coupled with Ofﬁcial Measurement Stations (OSM) datasets. The comparison
highlights a strong sensitivity of the accuracy to external humidity and temperature
conditions and a strong drift on the residues with time.
    A new python library has been developed to enlarge the prototyping options to
GIS modellers, creating a high-level interface to the C API of GRASS GIS that conve-
niently opens the GIS data with other scientiﬁc communities and libraries. The char-
acterization of urban micro-environments is made through classifying land-use using
an Object-Based Image Analysis (OBIA). To assign the proper land-use category, for
each object several features were extracted, and these data were used to train, test and
select the machine-learning algorithms that provide the higher accuracy.
    The School of Chemistry of the University of Edinburgh conducted a six-week
campaign using Passive Diffusive Tubes (PDTs) to characterize the spatial variability
of Nitrogen Dioxide (NO2 ) for the city centre of Edinburgh. I used these measure-
ments to test the correlation between different geographical factors and air quality
level. The analysis highlights that, even if with a certain variability, within the mea-
surement weeks, the most polluted points are the most polluted in all the weeks; there-
fore we can consider the average value of NO2 quite constant over time and spatially
well-deﬁned. The geographical factors were used as an input dataset, to train machine-
learning regression algorithms to assess the average concentration value of NO2 . The
ﬁnal step was to design some urban changes and test and compare the impact these
could have on average NO2 concentration within the urban context.
76                                                                         8 C ONCLUSIONS


8.1 Pro and cons of the proposed methodology

 PROS:                                        CONS:

      • Common and easy to found geo-             • Needs a measurements campaign
        graphical input data;                       to collect air quality data;

      • Extensible to consider other cate-        • Provides a static average picture
        gories and factors;                         of pollutants level;

      • Fast to compute and assess the av-        • Does not consider meteorological
        erage air quality level;                    conditions;

      • Easily quantiﬁes the estimate er-         • The uncertainty increases as the
        ror;                                        differences between the current
                                                    situation and the projected scenar-
                                                    ios increase.


      Table 8.1: Pros and Cons of the methodology described in this manuscript.

     The main advantages and disadvantages of the methodology described in this chap-
ter are summarized in Table 8.1. The proposed methodology requires geographical
input data that are quite common; if not available at the desired resolution it is possible
to extract them following the path as described in Chapter 4, from aerial or satellite
images. The geographical factors that were used to assess air quality level are ﬁve
different classes of roads, grass, trees and buildings, but new factors can be used de-
pending on the context and the pollutants that we aim to assess. The methodology
requires, for each change in the current land-use, to re-compute all the raster features
used by the regression model before being able to assess the air quality level. The
processing time required for these operations is short, especially if compared with dis-
persion models, since it is only extracting quite a basic feature, such as distance, Sky
View Factor or raster texture. Another interesting point of this methodology is that it
quantiﬁes the Root Mean Square Error (RMSE) or coefﬁcient of determination (R2 )
and therefore provides a value to assess the quality of the ﬁnal computed scenarios.
     The proposed methodology requires the concentration data value of several mea-
sured points to test and train the regression model. Due to the low number of Ofﬁcial
Measurement Stations (OMS) that are generally available within the urban context, the
existing air quality monitoring network is not enough. Therefore a dedicated measure-
ments campaign or network using passive samplers, bio-indicators or low-cost sensors,
is required to apply the method. The data collected using passive samplers provides
a weekly average value of NO2 and the meteorological factors seem to play a minor
8.2 F UTURE DEVELOPMENT                                                             77


role in inﬂuencing the average concentration value. Therefore we did not consider
meteorological factors; however, if data with a higher time resolution are available,
the methodology can be modiﬁed also to use these factors. Since the methodology
is based on measured data and not resolving the physics of the phenomena as object
of the study, then if the scenarios remain quite similar to the context where the data
were collected the error is close to the regression error; if the scenarios apply radi-
cal changes, uncertainty on the assessed error will increase considerably. Therefore a
dispersion model can be more suitable to assess radically different scenarios.


8.2 Future development
Concerning the Decision Support System, with the availability of data with a higher
temporal resolution the methodology can be tested in short-term modelling. As re-
ported by Gulliver and Briggs (2011), the LUR methods are able to model long term
spatial variability but are not well-suited to deal with short-term modelling, mainly
because the LUR methods do not consider the inﬂuence of meteorology. Combining
LUR techniques with a dispersion model produces good predictions of monitored con-
centrations for different time periods and locations (Beelen et al., 2010; Gulliver and
Briggs, 2011). Therefore increasing the temporal resolution by taking meteorology
into account could help to develop a Decision Support System that can deal with sce-
narios that change the day’s policies, such as, for example, closing a certain road to
the trafﬁc during peak hours, etc.
   Concerning the libraries and tools developed during the Ph.D., PyGRASS is now
included in the last stable release of GRASS GIS and it is being used by other re-
searchers (Grohmann, 2015). The v.class.ml module has been also converted into
a QGIS plugins (STEM), and it will be tested for processing biological images. The
analysis of urban micro-environments and the regression is under testing to assess the
Urban Heat Island (UHI) effect in the city centre of Bolzano, Italy within the SINFO-
NIA (FP7) project of the European Research Academy (EURAC).
Appendix

Some small samples of code are provided below to show how modelers, scientists
and developers could interact with the PyGRASS library. If the code starts with
 >>> , this indicates a python interactive section with the terminal. To use and test
the PyGRASS extension, the reader needs to install the latest development version
of GRASS7. Furthermore, all the following examples use the maps contained in the
free available GRASS demonstration dataset North Carolina (http://grass.osgeo.
org/sampledata/north_carolina/nc_basic_spm_grass7.tar.gz).
80                                                                            8 C ONCLUSIONS


Modules

                            Listing 8.1: Direct inputs/outputs.
# -*- coding : utf -8 -*-
import subprocess as sub
from pygrass . modules import Module


colout = Module (" r. colors . out ", map =" elevation ", stdout_ = sub . PIPE )

col = Module (" r. colors ")
col . inputs . map = " field "
col . inputs . stdin = colout . outputs . stdout
col . inputs . rules = ’-’
col . run ()


                      Listing 8.2: The syntax is similar to POSIX.
# -*- coding : utf -8 -*-
from grass . pygrass . modules import raster as r

# cmd : r. info map = elevation
r. info ( map =’ elevation ’)

# cmd : r. slope . aspect elevation = elevation slope = slope aspect = aspect -- overwrite
r. slope_aspect ( elevation =’ elevation ’, slope =’ slope ’, aspect =’ aspect ’, overwrite = True )

# cmd : r. mapcalc ’ slope_gt100 = if ( slope > 100 , slope )’ -- overwrite
r. mapcalc (’ slope_gt100 = if ( slope > 100 , slope )’, overwrite = True )


                           Listing 8.3: Backward compatibility.
# -*- coding : utf -8 -*-
from grass . pygrass . modules import Module as run_command

run_command (’r. info ’, map =’ elevation ’)


                             Listing 8.4: Module as an object.
>>> from grass . pygrass . modules import Module
>>> slp = Module (’r. slope . aspect ’)                         # instantiate
>>> slp . name
’r. slope . aspect ’
>>> slp . description
’ Aspect is calculated counterclockwise from east .’
>>> slp . inputs [’ elevation ’]
Parameter < elevation > ( required :yes , type : raster , multiple : no )
>>> slp . inputs . elevation = ’ elevation ’                # set parameter value
>>> slp . inputs . slope = ’slope ’
>>> slp . run ()                        # finally execute the ’slp ’ module


                                Listing 8.5: Run and ﬁnish.
8.2 F UTURE DEVELOPMENT                                                               81


>>>   from grass . pygrass . modules import Module
>>>   slp = Module (’r. slope . aspect ’)
>>>   slp ( elevation =’ elevation ’, slope =’slp ’, aspect =’asp ’,
...     format =’ percent ’, overwrite = True , run_ = False ) # only set the parameters
>>>   slp ( elevation =’ elevation ’, slope =’slp ’, aspect =’asp ’,
...     format =’ percent ’, overwrite = True , finish_ = False )    # start the process
>>>   slp . popen . wait () # . kill () manage the process


                                     Listing 8.6: Stdin.
# -*- coding : utf -8 -*-
from pygrass . modules import raster as r

rules = """0 red
1 green
2 blue
end """

r. colors ( map =’ rtest ’, rules =’-’, stdin_ = rules )

Vectors

                                 Listing 8.7: Vector class.
>>> from grass . pygrass . vector import Vector
>>> schools = Vector (’ schools ’)
>>> schools . open (’r ’)
>>> schools . title       # Vector attributes : name , organization , person , map_date
’ Wake County schools ( points map )’
>>> school = schools . next ()                # access to the geometry features
>>> school
Point (633649.285674 , 221412.944348)
>>> for school in schools :            # or simply iterate through the vector map
...    print school
...
POINT (628787.129283 , 223961.620521)
POINT (629900.710134 , 222915.798505)
POINT (630686.456623 , 224447.772161)
...
>>> schools . close ()


                              Listing 8.8: VectorTopo class.
>>> from grass . pygrass . vector import VectorTopo
>>> geology = VectorTopo (’ geology ’)
>>> geology . open (’r ’)
>>> geology . title       # Vector attributes : name , organization , person , map_date
’ North Carolina geology map ( polygon map )’
>>> for g in geology :                    # or iterate through the vector map
...    print g
...
LINESTRING (...)
LINESTRING (...)
...
POINT (...)
82                                                                   8 C ONCLUSIONS


POINT (...)
...
>>> big = [a for a in geology . viter (’ areas ’)   # iterate only a geometry type
...      if a. alive () and a. area () >= 10**8]
>>> geology . close ()


                        Listing 8.9: Write a new vector map.
# -*- coding : utf -8 -*-
from pygrass . vector import VectorTopo

# instantiate and open the tunnels map
tunnels = VectorTopo (’ tunnels ’)
tunnels . open (’r ’)

# instantiate a new map
new = VectorTopo (’ newvect ’)

# open a new vector map defining the column names and types , with :
new . open (’w ’, tabcols =[(u ’cat ’, int ),
                            (u ’ tunnel ’, int ),
                            (u ’ length ’, float )])

for tunnel in tunnels :
      # extract the first and the last point of the tunnel
      first , last = tunnel [0] , tunnel [ -1]
      # compute the tunnel length and divide the length for each point
      length = tunnel . length () / 2.
      #. write ( geom_feature , attributes )
      new . write ( first , ( tunnel .cat , length ))
      new . write ( last , ( tunnel .cat , length ))
# then close all
new . close ()
tunnels . close ()

Rasters

                            Listing 8.10: RasterRow class.
>>> from grass . pygrass . raster import RasterRow
>>> elev = RasterRow (’ elevation ’)
>>> elev . exist ()                            # check if the map exist
True
>>> elev . name                               # return the raster name
’ elevation ’
>>> elev . mapset                            # return the raster mapset
’ PERMANENT ’
>>> elev . open (’r ’)                             # open in read mode
>>> elev . is_open ()                         # check if the map is open
True
>>> elev . range                               # return the map range
(55.578792572021484 , 156.32986450195312)
>>> for row in elev [:5]:                          # get the first 5 rows
...    print ( row [:3])             # show the first 3 columns of each row
...
8.2 F UTURE DEVELOPMENT                                                         83


[ 141.99613953 141.27848816        141.37904358]
[ 142.90461731 142.39450073        142.68611145]
[ 143.81854248 143.54707336        143.83972168]
[ 144.56524658 144.58493042        144.86477661]
[ 144.99488831 145.22894287        145.57142639]
>>> elev . close ()


                           Listing 8.11: Write a new raster map.
# -*- coding : utf -8 -*-
from grass . pygrass . raster import RasterRow

# instantiate raster objects
old = RasterRow (’ elevation ’)
new = RasterRow (’ mapname1 ’)

# open the maps
old . open (’r ’)
new . open (’w ’, mtype = old . mtype , overwrite = True )

# start a cycle
for row in old :
    new . put_row ( row > 100)                                         # write row

# close the maps
new . close ()
old . close ()

GIS/GRASS

                           Listing 8.12: Write a new raster map.
>>> from grass . pygrass . gis import Location , Mapset
>>> location = Location ()
>>> location . mapsets ()
[’ PERMANENT ’, ’user1 ’]
>>> permanent = Mapset (’ PERMANENT ’)
>>> permanent . glist (’ rast ’, pattern =’ elev * ’) # return a list with rasters
[’ elevation_shade ’, ’ elevation ’]


                           Listing 8.13: Write a new raster map.
>>> from grass . pygrass . gis . region import Region
>>> region = Region ()
>>> region . north
258500.0
>>> region . south
185000.0
>>> region . rows
7350
>>> region . nsres
10.0
>>> print region
projection : 99 ( Lambert Conformal Conic )
zone :     0
84                                                                           8 C ONCLUSIONS


datum :    nad83
ellipsoid : a =6378137 es =0.006694380022900787
north :    258500
south :    185000
west :     596670
east :     678330
nsres :    10
ewres :    10
rows :     7350
cols :     8166
cells :    60020100

Benchmark

Listing 8.14: Write a new vector points map adding the raster value in the attribute
table using PyGRASS.
# -*- coding : utf -8 -*-
import numpy as np
from grass . pygrass . gis . region import Region
from grass . pygrass . vector import VectorTopo
from grass . pygrass . raster import RasterRow
from grass . pygrass . functions import coor2pixel


def sample ( vect_in_name , rast_in_name ):
    """ sample (’ point00 ’, ’field ’)"""
    # instantiate the object maps
    vect_in = VectorTopo ( vect_in_name )
    rast_in = RasterRow ( rast_in_name )
    vect_out = VectorTopo (’ test_ ’ + vect_in_name )

     # define the columns of the attribute table of the new vector map
     columns = [(u ’cat ’,       ’ INTEGER PRIMARY KEY ’) ,
                ( rast_in_name , ’DOUBLE ’)]

     # open the maps
     vect_in . open (’r ’)
     rast_in . open (’r ’)
     vect_out . open (’w ’, tab_cols = columns , link_driver =’ sqlite ’)

     # get the current region
     region = Region ()

     # initialize the counter
     counter = 0
     data = []
     for pnt in vect_in . viter (’ points ’):
         counter += 1
         # transform the spatial coordinates in row and col value
         x , y = coor2pixel ( pnt . coords () , region )
         value = rast_in [ int (x )][ int (y )]
         data . append (( counter , None if np . isnan ( value ) else float ( value )))
         # write the geometry features
         vect_out . write ( pnt )
8.2 F UTURE DEVELOPMENT                                                                         85


    # write the attributes
    vect_out . table . insert ( data , many = True )
    vect_out . table . conn . commit ()

    # close the maps
    vect_in . close ()
    rast_in . close ()
    vect_out . close ()


Listing 8.15: Write a new vector points map adding the raster value in the attribute
table using C.
# include    < stdio .h >
# include    < stdlib .h >
# include    < string .h >
# include    < math .h >

# include    < grass / gis .h >
# include    < grass / raster .h >
# include    < grass / glocale .h >
# include    < grass / dbmi .h >
# include    < grass / vector .h >


static void sample ( char * input , char * rast , char * column , char * output ) {

            struct Cell_head window ;
            struct Map_info In , Out ;
            int fdrast ;
            DCELL value ;
            G_get_window (& window );
            int line ;
            int type ;
            struct field_info * Fi ;
            dbDriver * Driver ;
            char buf [2000];
            dbString sql ;
            DCELL * dcell_buf ;

            /* Open input */
            Vect_set_open_level (2);
            Vect_open_old2 (& In , input , "" , "1");
            fdrast = Rast_open_old ( rast , "");

            /* Open output */
            Vect_open_new (& Out , output , 0);
            Vect_hist_copy (& In , & Out );
            Vect_hist_command (& Out );

            /* Create table */
            Fi = Vect_default_field_info (& Out , 1, NULL , GV_1TABLE );
            Vect_map_add_dblink (& Out , Fi -> number , Fi -> name , Fi -> table , Fi ->key ,
                            Fi -> database , Fi -> driver );
            Driver = db_start_driver_open_database (Fi -> driver ,
86                                                                         8 C ONCLUSIONS


                            Vect_subst_var (Fi -> database , & Out ));
         sprintf (buf , " create table %s ( cat integer , rast_val double precision )" ,
                            Fi -> table );
         db_init_string (& sql );
         db_set_string (& sql , buf );
         db_execute_immediate ( Driver , & sql );
         db_create_index2 ( Driver , Fi -> table , Fi -> key );
         db_grant_on_table ( Driver , Fi -> table , DB_PRIV_SELECT , DB_GROUP | DB_PUBLIC );

         /* Sample the raster map with vector points */
         struct line_pnts * Points = Vect_new_line_struct ();
         struct line_cats * Cats = Vect_new_cats_struct ();
         int nlines = Vect_get_num_lines (& In );
         int count = 0;

         dcell_buf = Rast_allocate_buf ( DCELL_TYPE );

         db_begin_transaction ( Driver );

         for ( line = 1; line <= nlines ; line ++) {
                  type = Vect_read_line (& In , Points , Cats , line );

                  if (!( type & GV_POINT ))
                      continue ;

                  if ( G_point_in_region ( Points ->x [0] , Points ->y [0]) == 0)
                            continue ;

                  if ( Rast_is_d_null_value (& value ))
                       continue ;

                  int row = Rast_northing_to_row ( Points ->y [0] , & window );
                  int col = Rast_easting_to_col ( Points ->x [0] , & window );

                  Rast_get_d_row ( fdrast , dcell_buf , row );
                  value = dcell_buf [ col ];

                  /* Write value into the vector table */
                  count ++;
                  Vect_reset_cats ( Cats );
                  Vect_cat_set ( Cats , 1, count );
                  Vect_write_line (& Out , GV_POINT , Points , Cats );

                  sprintf (buf , " INSERT INTO %s VALUES ( %d , %e ); ", Fi -> table , count ,
                                     ( double ) value );
                  db_set_string (& sql , buf );
                  db_execute_immediate ( Driver , & sql );
         }

     db_commit_transaction ( Driver );
         db_close_database_shutdown_driver ( Driver );

         Rast_close ( fdrast );
         Vect_close (& In );
         Vect_build (& Out );
8.2 F UTURE DEVELOPMENT                                                                      87


         Vect_close (& Out );

         exit ( EXIT_SUCCESS );
}


                    Listing 8.16: Compute using the RasterRow class.
# -*- coding : utf -8 -*-
from grass . pygrass . raster import RasterRow

def ifcondition ( mapname0 , mapname1 ):
    # instantiate raster objects
    old = RasterRow ( mapname0 )
    new = RasterRow ( mapname1 )
    # open the maps
    old . open (’r ’)
    new . open (’w ’, mtype = old . mtype , overwrite = True )
    # start a cycle
    for row in old :
          true = row > 50
          new . put_row ( row * true )
    # close the maps
    new . close ()
    old . close ()


                  Listing 8.17: Call the r.mapcalc module from Python
sub . Popen (" r. mapcalc expression =’ test_mapcalc = if ( field >50 , field ,0) ’ --o", shell = True ). wait ()
Bibliography

Albanese, D. and R. Visintainer (2012). “mlpy: Machine Learning Python”. In: arXiv preprint arXiv: . . .
     Pp. 1–4.
Alemdar, H. and C. Ersoy (2010). “Wireless sensor networks for healthcare: A survey”. In: Computer
     Networks 54.15, pp. 2688–2710.
Alioscha-Perez, M. and H. Sahli (2014). “Efﬁcient Learning of Spatial Patterns with Multi-Scale Con-
     ditional Random Fields for Region-Based Classiﬁcation”. In: Remote Sensing 6.8, pp. 6727–6764.
     ISSN : 2072-4292.
Allwine, K., F. Rutz, W. Shaw, J. Rishel, B. Fritz, E. Chapman, B. Hoopes, and B. Seiple (2006). “DUS-
     TRAN 1.0 user’s guide: a GIS-based atmospheric dust dispersion modelling system”. In: Technical
     report PNNL-16055.
Arain, M. A., R. Blair, N. Finkelstein, J. R. Brook, T. Sahsuvaroglu, B. Beckerman, L. Zhang, and M.
     Jerrett (2007). “The use of wind ﬁelds in a land use regression model to predict air pollution concen-
     trations for health exposure studies”. In: Atmospheric Environment 41.16, pp. 3453–3464.
Baume, O., J. O. Skø ien, G. B. M. Heuvelink, E. J. Pebesma, and S. J. Melles (2011). “A geostatisti-
     cal approach to data harmonization - Application to radioactivity exposure data”. In: International
     Journal of Applied Earth Observation and Geoinformation 13.3, pp. 409–419.
Beelen, R., G. Hoek, E. Pebesma, D. Vienneau, K. de Hoogh, and D. J. Briggs (2009). “Mapping of
     background air pollution at a ﬁne spatial scale across the European Union”. In: Science of The Total
     Environment 407.6, pp. 1852–1867.
Beelen, R., M. Voogt, J. Duyzer, P. Zandveld, and G. Hoek (2010). “Comparison of the performances
     of land use regression modelling and dispersion modelling in estimating small-scale variations in
     long-term air pollution concentrations in a Dutch urban area”. In: Atmospheric Environment 44.36,
     pp. 4614–4621.
Behling, R., M. Bochow, S. Foerster, S. Roessner, and H. Kaufmann (2015). “Automated GIS-based
     derivation of urban ecological indicators using hyperspectral remote sensing and height informa-
     tion”. In: Ecological Indicators 48, pp. 218–234. ISSN: 1470160X.
Berkowicz, R., M. Ketzel, S. S. Jensen, M. Hvidberg, and O. Raaschou-Nielsen (2008). “Evaluation and
     application of OSPM for trafﬁc pollution assessment for a large number of street locations”. In:
     Environmental Modelling & Software 23.3, pp. 296–303.
Borrell, B. (2011). “Epidemiology: Every bite you take”. In: Nature 470, pp. 320–322.
Briggs, D. J., C. de Hoogh, J. Gulliver, J. Wills, P. Elliott, S. Kingham, and K. Smallbone (2000). “A
     regression-based method for mapping trafﬁc-related air pollution: application and testing in four
     contrasting urban environments”. In: The Science of The Total Environment 253.1-3, pp. 151–167.
British Ordnance Survey (2012). Orthophoto RGB Edinmburgh.
Burgess, S., M. Kranz, N. Turner, R. Cardell-Oliver, and T. Dawson (2010). “Harnessing wireless sen-
     sor technologies to advance forest ecology and agricultural research”. In: Agricultural and Forest
     Meteorology 150.1, pp. 30–37.
90                                                                                       B IBLIOGRAPHY


Busato, F., R. Lazzarin, and M. Noro (2013). “Three years of study of the Urban Heat Island in Padua:
     Experimental results”. In: Sustainable Cities and Society, pp. 1–8. ISSN: 22106707.
Can, a, M Rademaker, T Van Renterghem, V Mishra, M Van Poppel, a Touhaﬁ, J Theunis, B De Baets,
     and D Botteldooren (2011). “Correlation analysis of noise and ultraﬁne particle counts in a street
     canyon.” In: The Science of the total environment 409.3, pp. 564–72.
Cao, X., J. Chen, Y. Zhang, and Y. Sun (2008). “Development of an integrated wireless sensor network
     micro-environmental monitoring system”. In: ISA Transactions 47.3, pp. 247 –255.
Cape, J. N. (2009). “The Use of Passive Diffusion Tubes for Measuring Concentrations of Nitrogen
     Dioxide in Air”. In: Critical Reviews in Analytical Chemistry 39.4, pp. 289–310. ISSN: 1040-8347.
Carnevale, C., G. Finzi, E. Pisoni, M. Volta, G. Guariso, R. Gianfreda, G. Maffeis, P. Thunis, L. White,
     and G. Triacchini (2012a). “An integrated assessment tool to deﬁne effective air quality policies at
     regional scale”. In: Environmental Modelling & Software 38, pp. 306–315.
Carnevale, C., G. Finzi, E. Pisoni, M. Volta, and F. Wagner (2012b). “Deﬁning a nonlinear control prob-
     lem to reduce particulate matter population exposure”. In: Atmospheric Environment 55, pp. 410–
     416. ISSN: 13522310.
Carruthers, D., H. Edmunds, A. Lester, C. McHugh, and R. Singles (2000). “Use and validation of
     ADMS-Urban in contrasting urban and industrial locations”. In: International Journal of Environ-
     ment and Pollution 14.1-6, pp. 364–374.
Chen, A., L. Yao, R. Sun, and L. Chen (2014). “How many metrics are required to identify the effects
     of the landscape pattern on land surface temperature?” In: Ecological Indicators 45, pp. 424–433.
     ISSN : 1470160X.
Ciolli, M, M De Franceschi, R Rea, A Vitti, D Zardi, and P Zatelli (2004). “Development and application
     of 2D and 3D GRASS modules for simulation of thermally driven slope winds”. In: TRANSACTIONS
     IN GIS 8.2, pp. 191–209.
Clewley, D., P. Bunting, J. Shepherd, S. Gillingham, N. Flood, J. Dymond, R. Lucas, J. Armston, and M.
     Moghaddam (2014). “A Python-Based Open Source System for Geographic Object-Based Image
     Analysis (GEOBIA) Utilizing Raster Attribute Tables”. In: Remote Sensing 6.7, pp. 6111–6135.
     ISSN : 2072-4292.
Coneus, K. and C. K. Spiess (2012). “Pollution exposure and child health: Evidence for infants and
     toddlers in Germany”. In: Journal of Health Economics 31.1, pp. 180–196.
Crouse, D. L., M. S. Goldberg, and N. a. Ross (2009). “A prediction-based approach to modelling tem-
     poral and spatial variability of trafﬁc-related air pollution in Montreal, Canada”. In: Atmospheric
     Environment 43.32, pp. 5075–5084.
Curtis, L., W. Rea, P. Smith-Willis, E. Fenyves, and Y. Pan (2006). “Adverse health effects of outdoor air
     pollutants”. In: Environment International 32.6, pp. 815–830. ISSN: 0160-4120.
Dadvand, P., J. Rankin, S. Rushton, and T. Pless-Mulloli (2011). “Ambient air pollution and congenital
     heart disease: A register-based study”. In: Environmental Research 111.3, pp. 435–441.
De Vito, S., G. Fattoruso, R. Liguoro, A. Oliviero, E. Massera, C. Sansone, V. Casola, and G. Di Francia
     (2011). “Cooperative 3D Air Quality Assessment with Wireless Chemical Sensing Networks”. In:
     Procedia Engineering 25, pp. 84–87.
Dons, E., L. Int Panis, M. Van Poppel, J. Theunis, H. Willems, R. Torfs, and G. Wets (2011). “Impact of
     time-activity patterns on personal exposure to black carbon”. In: Atmospheric Environment 45.21,
     pp. 3594–3602.
Dons, E., M. Van Poppel, B. Kochan, G. Wets, and L. Int Panis (2013). “Modeling temporal and spatial
     variability of trafﬁc-related air pollution: Hourly land use regression models for black carbon”. In:
     Atmospheric Environment 74, pp. 237–246.
B IBLIOGRAPHY                                                                                         91


Dronova, I., P. Gong, N. E. Clinton, L. Wang, W. Fu, S. Qi, and Y. Liu (2012). “Landscape analysis
     of wetland plant functional types: The effects of image segmentation scale, vegetation classes and
     classiﬁcation methods”. In: Remote Sensing of Environment 127, pp. 357–369. ISSN: 00344257.
Dugord, P.-A., S. Lauf, C. Schuster, and B. Kleinschmit (2014). “Land use patterns, temperature distribu-
     tion, and potential heat stress risk - The case study Berlin, Germany”. In: Computers, Environment
     and Urban Systems 48, pp. 86–98. ISSN: 01989715.
Duro, D. C., S. E. Franklin, and M. G. Dubé (2012). “A comparison of pixel-based and object-based
     image analysis with selected machine learning algorithms for the classiﬁcation of agricultural land-
     scapes using SPOT-5 HRG imagery”. In: Remote Sensing of Environment 118, pp. 259–272. ISSN:
     00344257.
Eeftens, M., R. Beelen, K. de Hoogh, T. Bellander, G. Cesaroni, M. Cirach, C. Declercq, A. Dedele,
     E. Dons, A. de Nazelle, K. Dimakopoulou, K. Eriksen, G. Falq, P. Fischer, C. Galassi, R. Gražule-
     viˇ iene, J. Heinrich, B. Hoffmann, M. Jerrett, D. Keidel, M. Korek, T. Lanki, S. Lindley, C. Madsen,
       c
     A. Mölter, G. Nádor, M. Nieuwenhuijsen, M. Nonnemacher, X. Pedeli, O. Raaschou-Nielsen, E.
     Patelarou, U. Quass, A. Ranzi, C. Schindler, M. Stempfelet, E. Stephanou, D. Sugiri, M.-Y. Tsai,
     T. Yli-Tuomi, M. J. Varró, D. Vienneau, S. von Klot, K. Wolf, B. Brunekreef, and G. Hoek (2012).
     “Development of Land Use Regression Models for PM2.5, PM2.5 Absorbance, PM10 and PMcoarse
     in 20 European Study Areas, Results of the ESCAPE Project”. In: Environmental Science & Tech-
     nology 46.20, pp. 11195–11205.
Elbir, T., N. Mangir, M. Kara, S. Simsir, T. Eren, and S. Ozdemir (2010). “Development of a GIS-based
     decision support system for urban air quality management in the city of Istanbul”. In: Atmospheric
     Environment 44.4, pp. 441–454.
EPA, U. (2010). “Integrated Science Assessment for Carbon Monoxide”. In: National Center for Envi-
     ronmental Assessment, Ofﬁce of Research and Development 019B.EPA/600/R-09/019B.
Finardi, S., R. De Maria, A. D’Allura, C. Cascone, G. Calori, and F. Lollobrigida (2008). “A deterministic
     air quality forecasting system for Torino urban area, Italy”. In: Environmental Modelling & Software
     23.3, pp. 344–355. ISSN: 13648152.
Flammini, A., P. Ferrari, D. Marioli, E. Sisinni, and A. Taroni (2009). “Wired and wireless sensor net-
     works for industrial applications”. In: Microelectronics Journal 40.9, pp. 1322–1336.
Foody, G. (2008). “GIS: biodiversity applications”. In: Progress in Physical Geography 32.2, pp. 223–
     235. ISSN: 0309-1333.
Gasiewicz, T. (2010). “The Exposome: A powerful approach for evaluating environmental exposures
     and their inﬂuences on human disease”. In: Emerging science for environmental health decisions -
     Workshop. Nas Building, 2100c street, Washington DC.
Gidhagen, L., G. Omstedt, G. Pershagen, S. Willers, and T. Bellander (2013). “High-resolution modeling
     of residential outdoor particulate levels in Sweden.” In: Journal of exposure science & environmental
     epidemiology 23.3, pp. 306–14. ISSN: 1559-064X.
Gitelson, A. and M. Merzlyak (1996). “Signature analysis of leaf reﬂectance spectra: Algorithm devel-
     opment for remote sensing of chlorophyll”. In: Journal of Plant Physiology 148.4, pp. 494–500.
Gomez-Mejiba, S. E., Z Zhai, H Akram, Q. N. Pye, K Hensley, B. T. Kurien, R. H. Scoﬁeld, and D. C.
     Ramirez (2009). “Inhalation of environmental stressors & chronic inﬂammation: autoimmunity and
     neurodegeneration.” In: Mutat Res 674.1-2, pp. 62–72.
González, A., A. Donnelly, M. Jones, N. Chrysoulakis, and M. Lopes (2013). “A decision-support sys-
     tem for sustainable urban metabolism in Europe”. In: Environmental Impact Assessment Review 38,
     pp. 109–119. ISSN: 01959255.
Grohmann, C. H. (2015). “Effects of spatial resolution on slope and aspect derivation for regional-scale
     analysis”. In: Computers & Geosciences 77.0, pp. 111–117. ISSN: 0098-3004.
92                                                                                       B IBLIOGRAPHY


Gulliver, J. and D. Briggs (2011). “STEMS-Air: a simple GIS-based air pollution dispersion model for
     city-wide exposure assessment.” In: The Science of the total environment 409.12, pp. 2419–29.
Guo, D., R. Guo, and C. Thiart (2007). “Predicting air pollution using fuzzy membership grade Kriging”.
     In: Computers, Environment and Urban Systems 31.1, pp. 33–51.
Hamzelou, J. (2011). “Welcome to the exposome”. In: The New Scientist 208.2792, pp. 6–7.
Hoek, G., R. Beelen, K. de Hoogh, D. Vienneau, J. Gulliver, P. Fischer, and D. Briggs (2008). “A review
     of land-use regression models to assess spatial variation of outdoor air pollution”. In: Atmospheric
     Environment 42.33, pp. 7561–7578.
Hoﬁerka, J. and M. Zlocha (2012). “A New 3-D Solar Radiation Model for 3-D City Models”. In: Trans-
     actions in GIS 16.5, pp. 681–690. ISSN: 1467-9671.
Hoogh, K. de, M. Korek, D. Vienneau, M. Keuken, J. Kukkonen, M. J. Nieuwenhuijsen, C. Badaloni, R.
     Beelen, A. Bolignano, G. Cesaroni, M. C. Pradas, J. Cyrys, J. Douros, M. Eeftens, F. Forastiere, B.
     Forsberg, K. Fuks, U. Gehring, A. Gryparis, J. Gulliver, A. L. Hansell, B. Hoffmann, C. Johansson,
     S. Jonkers, L. Kangas, K. Katsouyanni, N. Künzli, T. Lanki, M. Memmesheimer, N. Moussiopou-
     los, L. Modig, G. Pershagen, N. Probst-Hensch, C. Schindler, T. Schikowski, D. Sugiri, O. Teixidó,
     M.-Y. Tsai, T. Yli-Tuomi, B. Brunekreef, G. Hoek, and T. Bellander (2014). “Comparing land use
     regression and dispersion modelling to assess residential exposure to ambient air pollution for epi-
     demiological studies”. In: Environment International 73, pp. 382–392. ISSN: 01604120.
Hu, S.-C., Y.-C. Wang, C.-Y. Huang, and Y.-C. Tseng (2011). “Measuring air quality in city areas by
     vehicular wireless sensor networks”. In: Journal of Systems and Software 84.11, pp. 2005–2012.
     ISSN : 01641212.
Hunter, J. D. (2007). “Matplotlib: A 2D Graphics Environment”. In: Computing in Science & Engineering
     9.9, pp. 90–95.
Huth, J., C. Kuenzer, T. Wehrmann, S. Gebhardt, V. Tuan, and S. Dech (2012). “Land cover and land
     use classiﬁcation with TWOPAC: Towards automated processing for pixel- and object-based image
     classiﬁcation.” In: Remote Sensing 4, pp. 2530–2553.
Inglada, J. and E Christophe (2009). “The Orfeo Toolbox Remote Sensing Image Processing Software”.
     In: IEEE International Geoscience and Remote Sensing Symposium. Cape Town, South Africa, 12-
     17 July 2009, pp. IV–733–IV–736.
InterImage (2014). InterImage User Manual 1.41. Laboratório de Visão Computacional : Rio de Janeiro,
     Brazil.
Janssen, N., P. Fischer, M. Marra, C. Ameling, and F. Cassee (2013). “Short-term effects of PM2.5, PM10
     and PM2.5-10 on daily mortality in the Netherlands”. In: Science of The Total Environment 463-464,
     pp. 20–26.
Janssen, S., G. Dumont, F. Fierens, and C. Mensink (2008). “Spatial interpolation of air pollution mea-
     surements using CORINE land cover data”. In: Atmospheric Environment 42.20, pp. 4884–4903.
Jephcote, C. and H. Chen (2012). “Environmental injustices of children’s exposure to air pollution from
     road-transport within the model British multicultural city of Leicester: 2000-09”. In: Science of The
     Total Environment 414.0, pp. 140–151.
Jin, S., J. Guo, S. Wheeler, L. Kan, and S. Che (2014). “Evaluation of Impacts of Trees on PM2.5
     Dispersion in Urban Streets”. In: Atmospheric Environment 99, pp. 277–287. ISSN: 13522310.
Johnson, M., V. Isakov, J. S. Touma, S. Mukerjee, and H. Özkaynak (2010). “Evaluation of land-use
     regression models used to predict air quality concentrations in an urban area”. In: Atmospheric En-
     vironment 44.30, pp. 3660–3668.
Jones, E., T. Oliphant, P. Peterson, and S. Community (2013). “SciPy: Open Source Scientiﬁc Tools for
     Python”. In:
B IBLIOGRAPHY                                                                                        93


Jones, E., T. Oliphant, P. Peterson, et al. (2001-2015). SciPy: Open source scientiﬁc tools for Python.
     [Online; accessed 2014-07-23].
Käffer, M. I., A. T. Lemos, M. A. Apel, J. V. Rocha, S. M. D. A. Martins, and V. M. F. a. Vargas (2012).
     “Use of bioindicators to evaluate air quality and genotoxic compounds in an urban environment in
     Southern Brazil.” In: Environmental pollution (Barking, Essex : 1987) 163, pp. 24–31.
Kardel, F., K. Wuyts, B. Maher, R. Hansard, and R. Samson (2011). “Leaf saturation isothermal remanent
     magnetization (SIRM) as a proxy for particulate matter monitoring: Inter-species differences and in-
     season variation”. In: Atmospheric Environment 45.29, pp. 5164–5171.
Kardel, F., K. Wuyts, M. Babanezhad, T. Wuytack, S. Adriaenssens, and R. Samson (2012). “Tree leaf
     wettability as passive bio-indicator of urban habitat quality”. In: Environmental and Experimental
     Botany 75, pp. 277–285.
Kaur, S., M. Nieuwenhuijsen, and R. Colvile (2007). “Fine particulate matter and carbon monoxide ex-
     posure concentrations in urban street transport microenvironments”. In: Atmospheric Environment
     41.23, pp. 4781–4810. ISSN: 13522310.
Kesarkar, A. P., M. Dalvi, A. Kaginalkar, and A. Ojha (2007). “Coupling of the Weather Research and
     Forecasting Model with AERMOD for pollutant dispersion modeling. A case study for PM10 dis-
     persion over Pune, India”. In: Atmospheric Environment 41.9, pp. 1976–1988.
Kotthaus, S. and C. Grimmond (2013). “Energy exchange in a dense urban environment - Part I: Temporal
     variability of long-term observations in central London”. In: Urban Climate. ISSN: 22120955.
Lehmann, I., J. Mathey, S. Röß ler, A. Bräuer, and V. Goldberg (2014). “Urban vegetation structure types
     as a methodological approach for identifying ecosystem services - Application to the analysis of
     micro-climatic effects”. In: Ecological Indicators 42, pp. 58–72. ISSN: 1470160X.
Levy, I., C. Mihele, and G. Lu (2014). “Evaluating multipollutant exposure and urban air quality: pol-
     lutant interrelationships, neighborhood variability, and nitrogen dioxide as a proxy pollutant”. In:
     Environmental health . . . 122.1, pp. 65–72.
Li, X., S. W. Myint, Y. Zhang, C. Galletti, X. Zhang, and B. L. Turner (2014). “Object-based land-
     cover classiﬁcation for metropolitan Phoenix, Arizona, using aerial photography”. In: International
     Journal of Applied Earth Observation and Geoinformation 33, pp. 321–330. ISSN: 03032434.
Li, X., L. Di, W. Han, P. Zhao, and U. Dadi (2010). “Sharing geoscience algorithms in a Web service-
     oriented environment (GRASS GIS example)”. In: Computers & Geosciences 36.8, pp. 1060–1068.
     ISSN : 00983004.
Liu, H. and Y. Shen (2014). “The Impact of Green Space Changes on Air Pollution and Microclimates:
     A Case Study of the Taipei Metropolitan Area”. In: Sustainability 6, pp. 8827–8855.
Llop, E., P. Pinho, P. Matos, M. J. a. Pereira, and C. Branquinho (2012). “The use of lichen functional
     groups as indicators of air quality in a Mediterranean urban environment”. In: Ecological Indicators
     13.1, pp. 215–221. ISSN: 1470160X.
Madsen, C., K. C. Carlsen, G. Hoek, B. Oftedal, P. Nafstad, K. Meliefste, R. Jacobsen, W. Nystad, and
     B. Brunekreef (2007). “Modeling the intra-urban variability of outdoor trafﬁc pollution in Oslo,
     Norway–A GA2LEN project”. In: Atmospheric Environment 41.35, pp. 7500–7511.
Marchetti, A., C. Piccini, S. Santucci, I. Chiuchiarelli, and R. Francaviglia (2011). “Simulation of soil
     types in Teramo province (Central Italy) with terrain parameters and remote sensing data”. In:
     Catena 85.3, pp. 267–273.
Mavroulidou, M., S. J. Hughes, and E. E. Hellawell (2004). “A qualitative tool combining an interaction
     matrix and a GIS to map vulnerability to trafﬁc induced air pollution.” In: Journal of environmental
     management 70.4, pp. 283–9.
94                                                                                     B IBLIOGRAPHY


Meijer, M., J. Röhl, K. Bloomﬁeld, and U. Grittner (2012). “Do neighborhoods affect individual mor-
     tality? A systematic review and meta-analysis of multilevel studies.” In: Social science & medicine
     (1982) 74.8, pp. 1204–12.
Merbitz, H., M. Buttstädt, S. Michael, W. Dott, and C. Schneider (2012). “GIS-based identiﬁcation of
     spatial variables enhancing heat and poor air quality in urban areas”. In: Applied Geography 33.0,
     pp. 94–106. ISSN: 0143-6228.
Millman, K. J. and M. Aivazis (2011). “Python for Scientists and Engineers”. In: Computing in Science
     & Engineering 13.13, pp. 9–12.
Mölter, A, S Lindley, F de Vocht, A Simpson, and R Agius (2010). “Modelling air pollution for epidemi-
     ologic research – Part I: A novel approach combining land use regression and air dispersion”. In:
     Science of The Total Environment 408.23, pp. 5862–5869. ISSN: 0048-9697.
Mölter, A., S. Lindley, F. de Vocht, R. Agius, G. Kerry, K. Johnson, M. Ashmore, A. Terry, S. Dim-
     itroulopoulou, and A. Simpson (2012). “Performance of a microenviromental model for estimating
     personal NO2 exposure in children”. In: Atmospheric Environment 51.0, pp. 225–233.
Moskal, L. M., D. M. Styers, and M. Halabisky (2011). “Monitoring Urban Tree Cover Using Object-
     Based Image Analysis and Public Domain Remotely Sensed Data”. In: Remote Sensing 3.12, pp. 2243–
     2262. ISSN: 2072-4292.
Motaghian, H. R. and J. Mohammadi (2011). “Spatial Estimation of Saturated Hydraulic Conductivity
     from Terrain Attributes Using Regression, Kriging, and Artiﬁcial Neural Networks”. In: Pedosphere
     21.2, pp. 170–177.
Mukerjee, S., L. a. Smith, M. M. Johnson, L. M. Neas, and C. a. Stallings (2009). “Spatial analysis and
     land use regression of VOCs and NO2 from school-based urban air monitoring in Detroit/Dearborn,
     USA”. In: Science of The Total Environment 407.16, pp. 4642–4651.
National Records of Scotland (2013). Mid-Year Population Estimates. Tech. rep. Edinburgh: National
     Records of Scotland.
Neteler, M., M. H. Bowman, M. Landa, and M. Metz (2012). “GRASS GIS: A multi-purpose open source
     GIS”. In: Environmental Modelling & Software 31, pp. 124–130. ISSN: 13648152.
Novack, T., T. Esch, H. Kux, and U. Stilla (2011). “Machine Learning Comparison between WorldView-
     2 and QuickBird-2-Simulated Imagery Regarding Object-Based Urban Land Cover Classiﬁcation”.
     In: Remote Sensing 3.12, pp. 2263–2282. ISSN: 2072-4292.
Okabe, A., T. Satoh, and K. Sugihara (2009). “A kernel density estimation method for networks, its
     computational method and a GIS-based tool”. In: International Journal of Geographical Information
     Science 23.1, pp. 7–32. ISSN: 1365-8816.
Oliphant, T. E. (2007). “Python for Scientiﬁc Computing”. In: Computing in Science & Engineering 9.9,
     pp. 10–20.
Pearce, J. L., S. L. Rathbun, M. Aguilar-Villalobos, and L. P. Naeher (2009). “Characterizing the spa-
     tiotemporal variability of PM2.5 in Cusco, Peru using kriging with external drift”. In: Atmospheric
     Environment 43.12, pp. 2060–2069.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
     R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E.
     Duchesnay (2011). “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning
     Research 12, pp. 2825–2830.
Pirjola, L., T. Lähde, J. Niemi, A. Kousa, T. Rönkkö, P. Karjalainen, J. Keskinen, A. Frey, and R. Hillamo
     (2012). “Spatial and temporal characterization of trafﬁc emissions in urban microenvironments with
     a mobile laboratory”. In: Atmospheric Environment 63, pp. 156–167. ISSN: 13522310.
B IBLIOGRAPHY                                                                                          95


Preatoni, D., C. Tattoni, F. Bisi, E. Masseroni, D. D’Acunto, S. Lunardi, I. Grimod, and G. Tosi (2012).
     “Open source evaluation of kilometric indexes of abundance”. In: Ecological Informatics 7.1, pp. 35–
     40.
Rada, E. C. (2014). “Sustainable city and air pollution”. In: Wit Transactions on ecology and the environ-
     ment, pp. 1369–1380.
Rada, E. C., M. Ragazzi, M. Brini, L. Marmo, P. Zambelli, M. Chelodi, and M. Ciolli (2012). “Perspec-
     tives of low-cost sensors adoption for air quality monitoring”. In: Scientiﬁc Bulletin - Politehnica
     University Of Bucharest. Series D, Mechanical Engineering 74.2, pp. 241–250.
Ragazzi, M., E. C. Rada, A. Chisté, M. Schiavon, M. Ciolli, P. Zambelli, M. Brini, L. Marmo, and
     M. Chelodi (2012). “A contribution to the development of a public - private integrated network
     of wireless sensors for an enhanced management of air quality”. In: Sustainable Technology for
     Environmental Protection, pp. 1–4.
Rahman, M. M., G. J. Hay, I. Couloigner, and B. Hemachandran (2014). “Transforming Image-Objects
     into Multiscale Fields: A GEOBIA Approach to Mitigate Urban Microclimatic Variability within
     H-Res Thermal Infrared Airborne Flight-Lines”. In: Remote Sensing 6, pp. 9435 –9457.
Ram, S. S., S Majumdar, P Chaudhuri, S Chanda, S. C. Santra, P. K. Maiti, M Sudarshan, and a Chakraborty
     (2012). “SEMEDS: an important tool for air pollution bio-monitoring.” In: Micron (Oxford, England
     : 1993) 43.2-3, pp. 490–3.
Rawi, M. and A. Al-Anbuky (2011). “Development of Intelligent Wireless Sensor Networks for Human
     Comfort Index Measurement”. In: Procedia Computer Science 5, pp. 232–239.
Richmond-Bryant, J., S. S. Isukapalli, and D. A. Vallero (2011). “Air pollutant retention within a complex
     of urban street canyons”. In: Atmospheric Environment 45.40, pp. 7612–7618.
Ross, Z., M. Jerrett, K. Ito, B. Tempalski, and G. D. Thurston (2007). “A land use regression for predicting
     ﬁne particulate matter concentrations in the New York City region”. In: Atmospheric Environment
     41.11, pp. 2255–2269.
Sacchelli, S., I. Bernetti, I. De Meo, L. Fiori, A. Paletto, P. Zambelli, and M. Ciolli (2014). “Matching
     socio-economic and environmental efﬁciency of wood-residues energy chain: a partial equilibrium
     model for a case study in Alpine area”. In: Journal of Cleaner Production 66, pp. 431–442. ISSN:
     09596526.
Sacchelli, S., P. Zambelli, and M. Ciolli (2013). “Biomasfor - An open source holistic model for the
     assessment of sustainable forest bioenergy”. In: iForest 1.Biogeosciences and Forestry, pp. 285–
     293.
Sally Liu, L.-J., M.-Y. Tsai, D. Keidel, A. Gemperli, A. Ineichen, M. Hazenkamp-von Arx, L. Bayer-
     Oglesby, T. Rochat, N. Künzli, U. Ackermann-Liebrich, P. Straehl, J. Schwartz, and C. Schindler
     (2012). “Long-term exposure models for trafﬁc related NO2 across geographically diverse areas
     over separate years”. In: Atmospheric Environment 46.2, pp. 460–471. ISSN: 13522310.
Salo, H., M. S. Bu´ ko, E. Vaahtovuo, J. Limo, J. Mäkinen, and L. J. Pesonen (2012). “Biomonitoring of
                    c
     air pollution in SW Finland by magnetic and chemical measurements of moss bags and lichens”. In:
     Journal of Geochemical Exploration 115, pp. 69–81.
Schiavon, M., G. Antonacci, E. C. Rada, M. Ragazzi, and D. Zardi (2014). “Modelling human exposure
     to air pollutants in an urban area”. In: Revista de Chimie, pp. 61–64.
Schoolman, E. D. and C. Ma (2012). “Migration, class and environmental inequality: Exposure to pollu-
     tion in China’s Jiangsu Province”. In: Ecological Economics 75, pp. 140–151.
Senthilnath, J., S. Bajpai, S. Omkar, P. Diwakar, and V. Mani (2012). “An approach to multi-temporal
     MODIS image analysis using image classiﬁcation and segmentation”. In: Advances in Space Re-
     search 50.9, pp. 1274–1287. ISSN: 02731177.
96                                                                                       B IBLIOGRAPHY


Shad, R., M. S. Mesgari, A. Abkar, and A. Shad (2009). “Predicting air pollution using fuzzy genetic
     linear membership kriging in GIS”. In: Computers, Environment and Urban Systems 33.6, pp. 472–
     481.
Sheppard, L., J. C. Slaughter, J. Schildcrout, L.-J. S. Liu, and T. Lumley (2005). “Exposure and measure-
     ment contributions to estimates of acute air pollution effects.” In: Journal of exposure analysis and
     environmental epidemiology 15.4, pp. 366–76.
Singh, V., C. Carnevale, G. Finzi, E. Pisoni, and M. Volta (2011). “A cokriging based approach to recon-
     struct air pollution maps, processing measurement station concentrations and deterministic model
     simulations”. In: Environmental Modelling & Software 26.6, pp. 778–786.
Skelhorn, C., S. Lindley, and G. Levermore (2014). “The impact of vegetation types on air and surface
     temperatures in a temperate city: A ﬁne scale assessment in Manchester, UK”. In: Landscape and
     Urban Planning 121, pp. 129–140. ISSN: 01692046.
Steiniger, S. and G. J. Hay (2009). “Free and open source geographic information tools for landscape
     ecology”. In: Ecological Informatics 4.4, pp. 183–195. ISSN: 15749541.
Steinle, S., S. Reis, and C. E. Sabel (2013). “Quantifying human exposure to air pollution-Moving from
     static monitoring to spatio-temporally resolved personal exposure assessment.” In: The Science of
     the total environment 443, pp. 184–93. ISSN: 1879-1026.
Stevens, C. J., S. M. Smart, P. a. Henrys, L. C. Maskell, A. Crowe, J. Simkin, C. M. Chefﬁngs, C.
     Whitﬁeld, D. J. Gowing, E. C. Rowe, A. J. Dore, and B. a. Emmett (2012). “Terricolous lichens
     as indicators of nitrogen deposition: Evidence from national records”. In: Ecological Indicators 20,
     pp. 196–203.
Stuart, A. L. and M. Zeager (2011). “An inequality study of ambient nitrogen dioxide and trafﬁc lev-
     els near elementary schools in the Tampa area.” In: Journal of environmental management 92.8,
     pp. 1923–30.
Su, J. G., M. Brauer, B. Ainslie, D. Steyn, T. Larson, and M. Buzzelli (2008a). “An innovative land
     use regression model incorporating meteorology for exposure analysis”. In: Science of the Total
     Environment 390, pp. 520–529. ISSN: 00489697.
Su, J. G., M. Brauer, and M. Buzzelli (2008b). “Estimating urban morphometry at the neighborhood
     scale for improvement in modeling long-term average air pollution concentrations”. In: Atmospheric
     Environment 42.34, pp. 7884–7893.
Tattoni, C., M. Ciolli, F. Ferretti, and M. G. Cantiani (2010). “Monitoring spatial and temporal pattern of
     Paneveggio forest (Northern Italy) from 1859 to 2006”. In: iForest 1.Biogeosciences and Forestry 3,
     pp. 72–80.
Tattoni, C., F. Rizzoli, and P. Pedrini (2012). “Can LiDAR data improve bird habitat suitability models?”
     In: Ecological Modelling 245, pp. 103–110.
USEPA - U.S. Environmental Protection Agency (2010). Integrated Science Assessment for Carbon
     Monoxide. Tech. rep. National Center for Environmental Assessment, Ofﬁce of Research and Devel-
     opment., EPA/600/R–09/019B.
van Rossum, G. (1995). Python library reference. Report CS-R9524. pub-CWI, pp. iv + 186.
Varrazzo, D. and Psycopg Community (2013). “Psycopg - PostgreSQL database adapter for Python”. In:
Vedrenne, M., R. Borge, J. Lumbreras, and M. E. Rodríguez (2014). “Advancements in the design and
     validation of an air pollution integrated assessment model for Spain”. In: Environmental Modelling
     & Software 57, pp. 177–191. ISSN: 13648152.
Vettorato, D. and P. Zambelli (2009). “Estimation of Energy Sustainability at Local Scale : An Approach
     Based on Innovative Analytical and Mapping Tools and Multicriteria Analysis”. In: In Proceedings
     of 45th ISOCARP Congress Porto - Portugal, 18–22 October 2009, pp. 1–12.
B IBLIOGRAPHY                                                                                       97


Vettorato, D., D. Geneletti, and P. Zambelli (2011). “Spatial comparison of renewable energy supply and
     energy demand for low-carbon settlements”. In: Cities 28.6, pp. 557–566. ISSN: 0264-2751.
Vieira, M. A., A. R. Formaggio, C. D. Rennó, C. Atzberger, D. A. Aguiar, and M. P. Mello (2012).
     “Object Based Image Analysis and Data Mining applied to a remotely sensed Landsat time-series
     to map sugarcane over large areas”. In: Remote Sensing of Environment 123, pp. 553–562. ISSN:
     00344257.
Vienneau, D, K de Hoogh, and D Briggs (2009). “A GIS-based method for modelling air pollution expo-
     sures across Europe”. In: Science of The Total Environment 408.2, pp. 255–266.
Vlachokostas, C., C. Achillas, N Moussiopoulos, and G Banias (2011). “Multicriteria methodological
     approach to manage urban air pollution”. In: Atmospheric Environment 45.25, pp. 4160–4169.
Walt, S. van der, S. C. Colbert, and G. Varoquaux (2011). “The NumPy Array: A Structure for Efﬁcient
     Numerical Computation”. In: Computing in Science & Engineering 13.13, pp. 22–30.
                                                                                                   ˘ˇ
Wang, N., N. Zhang, and M. Wang (2006). “Wireless sensors in agriculture and food industry âAT Recent
     development and future perspective”. In: Computers and Electronics in Agriculture 50.1, pp. 1 –14.
Wieland, M. and M. Pittore (2014). “Performance Evaluation of Machine Learning Algorithms for Urban
     Pattern Recognition from Multi-spectral Satellite Images”. In: Remote Sensing 6.4, pp. 2912–2939.
     ISSN : 2072-4292.
Wikipedia (2014a). Confusion matrix — Wikipedia, The Free Encyclopedia. [Online; accessed 28-September-
     2014].
— (2014b). Python (programming language) — Wikipedia, The Free Encyclopedia. [Online; accessed
     2014-07-23].
Wild, C. P. (2005). “Complementing the genome with an "exposome": the outstanding challenge of en-
     vironmental exposure measurement in molecolar epidemiology”. In: Cancer Epidemiol Biomakers
     Prev 14.8, pp. 1847–1850.
World Health Organization (WHO) (2014a). 7 million premature deaths annually linked to air pollution.
     GENEVA.
— (2014b). Ambient (outdoor) air quality and health, Fact sheet N.313,
Zambelli, P., C. Lora, M. Ciolli, R. Spinelli, C. Tattoni, A. Vitti, and P. Zatelli (2010). “A FOSS4G model
     to estimate forest extraction methods and biomass availability for renewable energy production”. In:
     FOSS4G. Vol. 1. Barcelona.
Zambelli, P., C. Lora, R. Spinelli, C. Tattoni, A. Vitti, P. Zatelli, and M. Ciolli (2012). “A GIS decision
     support system for regional forest management to assess biomass availability for renewable energy
     production”. In: Environmental Modelling & Software 38, pp. 203–213. ISSN: 13648152.
Zambelli, P., S. Gebbert, and M. Ciolli (2013). “Pygrass : an object oriented Python API for GRASS
     GIS”. In: ISPRS International Journal of Geo-Information 2, pp. 201–219.
Zhang, K. and B. Hu (2012). “Individual Urban Tree Species Classiﬁcation Using Very High Spatial
     Resolution Airborne Multi-Spectral Imagery Using Longitudinal Proﬁles”. In: Remote Sensing 4.12,
     pp. 1741–1757. ISSN: 2072-4292.