In the previous lectures,
we have discussed an integrated framework of four disciplines: GIS,
spatial DBMS, spatial data analytics
and Big Data systems.
And I presented appropriate open-source software for each discipline.
In this lecture, I will categorize
spatial data science problems into six groups in terms of solution frameworks.
As shown on the slide,
the six problem types are: Desktop GIS, Server GIS, Spatial Web,
Spatial Data Analytics, Spatial Data Management and Analytics,
and Spatial Big Data Management and Analytics.
For each problem type,
I will explain how the solution framework can be used.
In the very first lecture of the first week,
we discussed the definition of spatial data science in
a formal manner by comparing science,
data science, and spatial data science.
On the other hand,
spatial data science problems could be simply defined as
problems associated with "where" for the input data,
the output results, or both.
An example is "Where is the best place for a new coffee shop in Illinois 60611?",
a question often raised when locating local retail businesses.
The next example is a more challenging question:
"What is a person's emotional condition right now, based on his or her trajectory?"
We do not have a clear answer to this question yet,
but it would be a very meaningful question for the mobile IT business.
If you had a reliable solution, Microsoft or
other major IT companies would buy it and use it for their personal assistants,
for example, Cortana in Windows 10.
The figure shows a solution framework for spatial data science problems.
For any given spatial data science problem,
some portion of the framework can be used,
depending on the data size,
single or multiple users, the level of analysis
(that is, how advanced the analysis is),
and the main focus of the problem, which could be data management,
geo-visualization, data dissemination, big data process management, and so on.
The first category is named
Desktop GIS, or single-user GIS.
In the integrated framework,
it uses only GIS.
As mentioned before, GIS can deal with almost
all aspects of spatial data science,
so it can handle a spatial data science problem by itself.
Spatial data science problems suited to Desktop GIS are characterized by the
following: first of all, the data size is relatively
small and only a single user is assumed.
In other words, these are stand-alone applications.
In terms of level of analysis,
Desktop GIS can deal with basic spatial analysis only.
The main focus would be placed on mapping and geo-visualization.
As an example, a market analysis
for sales of timber logging machines is given here.
A simple ratio between timber demand from paper mills or
lumber mills and timber supply from harvestable forest would
give a rough estimate of
the counties where large sales of timber logging machines would be expected.
With a relatively small dataset, a single user can map or geo-visualize
the final result after a series of geo-processing and map algebra operations.
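The ratio step above can be sketched in a few lines of Python. This is a minimal sketch, not the original analysis: the county names and figures are hypothetical, and in practice the per-county demand and supply values would come out of the geo-processing steps in the GIS. The lecture does not spell out exactly how the ratio translates into expected sales, so the code only computes and ranks it.

```python
# Minimal sketch: county-level ratio of timber demand to timber supply.
# County names and values are hypothetical, for illustration only.

def demand_supply_ratio(demand, supply):
    """Return {county: demand / supply}, skipping counties with no harvestable supply."""
    ratios = {}
    for county, d in demand.items():
        s = supply.get(county, 0.0)
        if s > 0:  # no harvestable forest means the ratio is undefined
            ratios[county] = d / s
    return ratios

def rank_counties(ratios, top_n=3):
    """Return the top_n (county, ratio) pairs, highest ratio first."""
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical demand (from paper/lumber mills) and harvestable supply per county
demand = {"County A": 120.0, "County B": 80.0, "County C": 50.0, "County D": 30.0}
supply = {"County A": 60.0, "County B": 100.0, "County C": 10.0, "County D": 0.0}

ratios = demand_supply_ratio(demand, supply)
for county, r in rank_counties(ratios):
    print(f"{county}: {r:.2f}")
```

In a Desktop GIS, the resulting ratios would then be joined back to the county polygons and shown as a map.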
The second category is named Server GIS, or multiple-user GIS.
In the integrated framework,
it uses GIS and a spatial DBMS.
The major differences from Desktop GIS are
data size and data management for multiple users.
Server GIS can therefore be characterized by the
following: the data size is generally bigger than for Desktop GIS,
for example, over 100 megabytes.
And in order to support multiple users,
the full functionality of a database management system should be supported,
such as transaction management with atomicity,
consistency, isolation, and durability (ACID),
SQL, data indexing, query optimization, and so on.
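Atomicity, the "A" in ACID, is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module as a stand-in for a full DBMS (it is not PostGIS, and no spatial types are involved); the parcels table and owner names are hypothetical. A batch of ownership updates either commits completely or is rolled back completely.

```python
import sqlite3

# SQLite stands in for a full DBMS here; the parcels table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (parcel_id TEXT PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("INSERT INTO parcels VALUES ('P-001', 'Alice'), ('P-002', 'Bob')")
conn.commit()

def transfer_ownership(conn, updates):
    """Apply all ownership updates atomically: every update commits, or none does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            for parcel_id, new_owner in updates:
                conn.execute(
                    "UPDATE parcels SET owner = ? WHERE parcel_id = ?",
                    (new_owner, parcel_id),
                )
        return True
    except sqlite3.IntegrityError:
        return False

# The second update violates NOT NULL, so the first update is rolled back as well.
ok = transfer_ownership(conn, [("P-001", "Carol"), ("P-002", None)])
owners = dict(conn.execute("SELECT parcel_id, owner FROM parcels"))
print(ok, owners)
```

Because the failed batch is rolled back as a unit, other users of the shared database never see parcel P-001 reassigned while P-002 is left in a broken state.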
The main focus will be placed on
data management with the DBMS and mapping and geo-visualization with GIS.
The best example would be a Server GIS for a local government.
For example, county governments in the U.S.
generally hold a large amount of spatial data in-house for their everyday operations:
cadastral data of parcel boundaries and ownership,
certified survey maps, zoning data for land-use planning,
tax rolls for taxation,
utility layers such as water supply and sewerage,
parking and transportation, and many others.
These datasets should be well managed in a database, and
individual departments should connect to the spatial DBMS
and conduct their work and operations with GIS.
The focus would be data management with the DBMS
and mapping and basic operations with GIS.
The upper part of the figure illustrates the system
configuration, and the bottom part
presents a real example: mapping with QGIS and
data management with PostGIS.
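Data indexing is what keeps queries fast as a shared spatial database grows. PostGIS typically indexes geometries with GiST, which behaves like an R-tree; the sketch below substitutes a much simpler uniform grid index over bounding boxes, only to show the filter-then-refine idea. The parcel ids, coordinates, and cell size are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid index over bounding boxes (minx, miny, maxx, maxy)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (col, row) -> set of feature ids
        self.boxes = {}                # feature id -> bounding box

    def _cell_range(self, box):
        """Yield every grid cell that the box touches."""
        minx, miny, maxx, maxy = box
        c0, r0 = int(minx // self.cell_size), int(miny // self.cell_size)
        c1, r1 = int(maxx // self.cell_size), int(maxy // self.cell_size)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                yield (c, r)

    def insert(self, fid, box):
        self.boxes[fid] = box
        for cell in self._cell_range(box):
            self.cells[cell].add(fid)

    def query(self, box):
        """Return sorted ids whose bounding boxes intersect the query box."""
        # Filter: only features registered in the touched cells are candidates.
        candidates = set()
        for cell in self._cell_range(box):
            candidates |= self.cells.get(cell, set())
        # Refine: exact bounding-box intersection test on the candidates.
        qminx, qminy, qmaxx, qmaxy = box
        result = []
        for fid in candidates:
            minx, miny, maxx, maxy = self.boxes[fid]
            if minx <= qmaxx and maxx >= qminx and miny <= qmaxy and maxy >= qminy:
                result.append(fid)
        return sorted(result)

# Hypothetical parcels in a projected coordinate system (meters)
idx = GridIndex(cell_size=100.0)
idx.insert("P-001", (0, 0, 50, 50))
idx.insert("P-002", (120, 10, 180, 60))
idx.insert("P-003", (400, 400, 450, 450))
print(idx.query((40, 0, 130, 40)))
```

The query never looks at P-003 because it sits in cells the query box does not touch; a real spatial DBMS applies the same principle with a more adaptive tree structure.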
The next type is Spatial Web, or Web GIS,
also known as web mapping,
which is the process and application of disseminating spatial data.
Spatial data are
delivered through the World Wide Web with a
server-client structure and some interactive user services.
In the figure, it seems that only a spatial DBMS is used.
In reality, we need another system, an Internet Map Server.
It communicates with the spatial DBMS,
requests mapping results through a query language,
and delivers the results to the client over the Internet.
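A common format for handing query results to a web client is GeoJSON (RFC 7946). Here is a minimal sketch, assuming the spatial DBMS query has already returned (name, longitude, latitude) rows; the rows are hypothetical, and a real Internet Map Server would also handle requests, styling, and tiling.

```python
import json

def rows_to_geojson(rows):
    """Convert (name, lon, lat) rows into a GeoJSON FeatureCollection string."""
    features = []
    for name, lon, lat in rows:
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

# Hypothetical rows standing in for a spatial DBMS query result
rows = [("Station 1", -87.62, 41.89), ("Station 2", -87.65, 41.88)]
payload = rows_to_geojson(rows)
print(payload)
```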
Web GIS is characterized by a
generally large amount of spatial data, since it is a server application.
It can handle the
basic spatial analysis functions available in the query language of the spatial DBMS,
and the main focus is definitely on spatial data delivery on the web.
Now you're looking at a Web GIS example,
which my research team developed as a
crop information system in which 10 years of weather data
(minimum, maximum, and average temperature,
and precipitation) and crop conditions can be
visualized with analytical options such as anomaly detection.
The system was developed for the purpose of disseminating and delivering
crop-related information to clients over the Internet.
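Anomaly detection on a weather series can be as simple as flagging large z-scores. This is a minimal sketch, assuming hypothetical yearly July mean temperatures for a single station; the lecture does not say which anomaly method the crop information system actually uses.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Return indices of values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # constant series: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical July mean temperatures (deg C) over 10 years at one station
july_temps = [24.1, 23.8, 24.5, 24.0, 23.9, 24.3, 28.9, 24.2, 23.7, 24.4]
print(zscore_anomalies(july_temps))
```

The threshold of 2 is deliberate: with only 10 values and the sample standard deviation, no z-score can exceed about 2.85, so a threshold of 3 would never flag anything.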
The next type is Spatial Data Analytics.
GIS can deal with geo-processing and basic spatial analysis,
but it has limited analytic power,
for example, very limited functionality for statistical analysis.
If you need to conduct advanced spatial analysis,
you need to connect GIS with a data analysis tool.
In our framework, that means QGIS for GIS together with R for data analytics.
A spatial data analytics problem can be characterized by
the following: it has
a relatively small dataset for a single user, so GIS can handle the data.
The main focus must be placed on flexible and advanced spatial data analysis,
and mapping and geo-visualization are also important.
A simple example is
a statistical analysis of spatial data.
You are looking at two examples.
The upper figure illustrates a categorized
map of hypertension prevalence in Korea,
along with a corresponding decision tree from
which we could extract influential factors of the disease prevalence.
The lower figure presents a simple correlation analysis of
homicide patterns in New York City.
Those analyses could not easily be conducted with Desktop GIS.
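In the lecture's framework, this kind of correlation would be computed in R (for example with `cor()`). To keep all examples in one language, here is a Python sketch of the Pearson correlation coefficient; the precinct-level numbers are hypothetical, not the New York City data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical per-precinct values: a socioeconomic indicator vs. homicide counts
indicator = [12.0, 18.5, 25.0, 31.0, 40.5]
homicides = [3, 5, 6, 9, 11]
print(round(pearson_r(indicator, homicides), 3))
```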
The next type is Spatial Data Management and Analytics.
The only difference from the previous type is whether a spatial DBMS is used.
Alternatively, you can think of it as an extension of
Server GIS with an additional data analytics tool,
in this case, R. This type can be characterized by the
following: the data size is generally large and there are multiple users,
so the full functionality of a DBMS should be supported.
The main focus should be placed on both spatial data management with the DBMS and advanced
spatial data analytics with the data analytics tool.
The upper part of the figure illustrates
the system configuration, which is quite similar to Server GIS;
the only difference is the additional connection
of a data analytics tool, R, to the spatial DBMS.
Now imagine that New York City manages a variety of spatial data in a spatial DBMS,
and GIS in each department is connected to
the DBMS and conducts not only simple mapping
but also advanced spatial analysis with R.
That would be a good example.
The figure at the bottom illustrates crime patterns with respect to different distances
from the subway stations in New York City, which
require both data management and advanced analytic power.
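The distance-band idea behind that figure can be sketched as follows. This is a minimal sketch with hypothetical station and incident coordinates and a hypothetical 100 m band width; it computes great-circle (haversine) distance to the nearest station, whereas the actual analysis may have used projected distances or network distances.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    dlon = radians(lon2 - lon1)
    dlat = radians(lat2 - lat1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def distance_band_counts(incidents, stations, band_m=100, max_m=500):
    """Count incidents per distance band to the nearest station.

    Band k covers [k * band_m, (k + 1) * band_m) meters; incidents at max_m or
    farther are all counted in the last band, max_m // band_m.
    """
    counts = Counter()
    for lon, lat in incidents:
        d = min(haversine_m(lon, lat, s_lon, s_lat) for s_lon, s_lat in stations)
        counts[min(int(d // band_m), max_m // band_m)] += 1
    return counts

# Hypothetical station and incident coordinates (longitude, latitude)
stations = [(-73.99, 40.75)]
incidents = [(-73.99, 40.75), (-73.99, 40.751), (-73.99, 40.76)]
print(distance_band_counts(incidents, stations))
```

With many incidents and stations, the nearest-station search is exactly the kind of work a spatial DBMS index would speed up, while the band statistics would then go to R.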
The next type is entitled "Spatial Big Data Management and Analytics".
Now the integrated framework is fully utilized. When spatial big data
such as taxi trajectories,
floating population, and multiple sensor data are collected,
big data systems such as Hadoop, Hive, and HBase and programming languages like Java and
Python handle the pre-processing and refine
the big data, so that the data can be
stored in a spatial DBMS, or analysis can be conducted directly in the Hadoop ecosystem.
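Hadoop processing follows the MapReduce pattern: a map step emits key-value pairs, the framework groups them by key, and a reduce step aggregates each group. The sketch below imitates that pattern in a single Python process to count taxi GPS points per grid cell while dropping malformed records. It is not the Hadoop API (real jobs are usually written in Java, or run Python scripts via Hadoop Streaming), and the record layout and cell size are hypothetical.

```python
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical grid cell size in degrees

def mapper(line):
    """Map step: emit ((col, row), 1) for one 'taxi_id,timestamp,lon,lat' line."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        return []  # malformed record: drop it
    try:
        lon, lat = float(parts[2]), float(parts[3])
    except ValueError:
        return []  # non-numeric coordinates: drop it
    return [((int(lon // CELL_DEG), int(lat // CELL_DEG)), 1)]

def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce step: total number of GPS points in one grid cell."""
    return key, sum(values)

def run_job(lines):
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

# Hypothetical taxi GPS records; the third line is deliberately malformed
lines = [
    "t1,2020-01-01T00:00:00,126.975,37.565",
    "t2,2020-01-01T00:00:01,126.976,37.566",
    "bad line",
    "t3,2020-01-01T00:00:02,127.055,37.505",
]
print(run_job(lines))
```

Because each mapper call looks at one record and each reducer call looks at one key, a cluster can spread both steps across many machines; the much smaller aggregated output is what would then be loaded into the spatial DBMS.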
A spatial data science problem of this type is characterized, first of all, by a
"BIG" data size, at minimum over 100 gigabytes.
In addition, data management with a
database management system, flexible and advanced analysis,
and mapping and geo-visualization are all important.
The given example is
a system configuration of DTG data analysis for eco-driving design.
DTG stands for Digital Tachograph, which
collects all driving information such as speed, engine RPM,
acceleration,
fuel consumption, brake signal,
GPS coordinates, and many others.
The data were collected every second from 6,000 trucks for one year,
and the data size is over three terabytes.
With Hadoop-related systems,
the dataset was pre-processed to remove errors and noise,
and then map matching was conducted.
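The two pre-processing steps can be sketched on a single machine as follows. This is a minimal sketch under stated assumptions, not the project's actual Hadoop pipeline: positions are assumed to be already projected to meters, the 40 m/s limit is a hypothetical plausibility threshold, and the map matching is the naive nearest-segment kind (more robust matchers also use the sequence of fixes, not each point in isolation). The track and road segments are hypothetical.

```python
from math import hypot

MAX_SPEED_MPS = 40.0  # hypothetical plausibility limit for trucks (~144 km/h)

def remove_speed_outliers(points):
    """Drop fixes implying an impossible speed from the last kept fix.

    points: list of (t_seconds, x_m, y_m) in a projected metric CRS, sorted by time.
    """
    if not points:
        return []
    kept = [points[0]]
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if hypot(x - x0, y - y0) / dt <= MAX_SPEED_MPS:
            kept.append((t, x, y))
    return kept

def snap_to_segment(px, py, ax, ay, bx, by):
    """Project point P onto segment AB; return (snapped_x, snapped_y, distance)."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0  # degenerate segment: A and B coincide
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    sx, sy = ax + t * dx, ay + t * dy
    return sx, sy, hypot(px - sx, py - sy)

def map_match(points, segments):
    """Naive map matching: snap each fix to its nearest road segment.

    segments: {segment_id: (ax, ay, bx, by)}; returns (t, segment_id, x, y) tuples.
    """
    matched = []
    for t, x, y in points:
        best = min(
            ((seg_id,) + snap_to_segment(x, y, *coords) for seg_id, coords in segments.items()),
            key=lambda r: r[3],  # r = (seg_id, snapped_x, snapped_y, distance)
        )
        matched.append((t, best[0], best[1], best[2]))
    return matched

# Hypothetical track: the fix at t=2 is a GPS jump far off the road
track = [(0, 0.0, 5.0), (1, 20.0, 4.0), (2, 5000.0, 4.0), (3, 40.0, 6.0)]
segments = {"road_1": (0.0, 0.0, 100.0, 0.0), "road_2": (0.0, 50.0, 100.0, 50.0)}

kept = remove_speed_outliers(track)
matched = map_match(kept, segments)
print(matched)
```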
After pre-processing, the dataset was managed in a spatial DBMS,
and additional analysis and geo-visualization
were conducted, as you can see in the figure.
As outcomes of the analysis,
a more customized fuel consumption model was obtained, influential variables
were analyzed, and eco-routing
to minimize fuel consumption was accomplished.
So far, I have
presented six different solution frameworks
based on the integrated framework of GIS, spatial DBMS,
spatial data analytics, and Big Data systems.
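The characteristics of the six problem types can be condensed into a rough decision rule. This is a simplified sketch, not a rule stated in the lecture: the 100 MB and 100 GB thresholds are the lecture's rough examples rather than hard limits, the `Problem` fields are hypothetical, and real projects also weigh the variations that do not use a spatial DBMS.

```python
from dataclasses import dataclass

SERVER_THRESHOLD_BYTES = 100 * 10**6    # "over 100 megabytes" (rough example)
BIG_DATA_THRESHOLD_BYTES = 100 * 10**9  # "at minimum over 100 gigabytes" (rough example)

@dataclass
class Problem:
    data_bytes: int
    multi_user: bool
    advanced_analysis: bool
    web_delivery: bool = False  # main focus is dissemination over the web

def suggest_framework(p: Problem) -> str:
    """Map problem characteristics to one of the six solution frameworks (heuristic)."""
    if p.data_bytes > BIG_DATA_THRESHOLD_BYTES:
        return "Spatial Big Data Management and Analytics"
    if p.web_delivery:
        return "Spatial Web"
    if p.multi_user or p.data_bytes > SERVER_THRESHOLD_BYTES:
        # Shared or larger data needs a DBMS; R is added for advanced analysis.
        return "Spatial Data Management and Analytics" if p.advanced_analysis else "Server GIS"
    return "Spatial Data Analytics" if p.advanced_analysis else "Desktop GIS"

# Hypothetical problem profiles loosely following the lecture's examples
print(suggest_framework(Problem(50 * 10**6, multi_user=False, advanced_analysis=False)))
print(suggest_framework(Problem(3 * 10**12, multi_user=True, advanced_analysis=True)))
```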
However, there could be other variations, as illustrated in the figure:
a combination of only a spatial big data system and spatial data
analytics, a spatial big data system by itself,
or a combination of a spatial big data system and GIS.
When do you think these other types would work best?
These last three types are characterized
by the fact that they do not make use of a spatial DBMS.
In other words, when the data size is seriously 'BIG',
these three solutions would be worth considering.
Alright, this is the end of this lecture.
Thank you for your attention, and see you in the next lecture.