Examples
--------

First example: extract a time-series using the API and plot it with matplotlib
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. plot::
    :include-source:

    from geextract import ts_extract, get_date
    from datetime import datetime
    import numpy as np
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    # Extract a Landsat 7 time-series for a 500 m radius circular buffer around
    # a location in Yucatan
    lon = -89.8107197
    lat = 20.4159611
    raw_dict = ts_extract(lon=lon, lat=lat, sensor='LE7',
                          start=datetime(1999, 1, 1), radius=500)

    # Compute NDVI from one of the dictionaries in the list returned by
    # ts_extract; return nan for records with missing or unusable values
    def ndvi(x):
        try:
            return (x['B4'] - x['B3']) / (x['B4'] + x['B3'])
        except (KeyError, TypeError, ZeroDivisionError):
            return np.nan

    # Build x and y arrays and remove missing values
    x = np.array([get_date(d['id']) for d in raw_dict])
    y = np.array([ndvi(d) for d in raw_dict], dtype=float)
    x = x[~np.isnan(y)]
    y = y[~np.isnan(y)]

    # Make plot (dashed line with point markers)
    plt.plot_date(x, y, "--")
    plt.plot_date(x, y)
    plt.title("Landsat 7 NDVI time-series Uxmal")
    plt.ylabel("NDVI (-)")
    plt.grid(True)
    plt.show()
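Judging from the example above, each element of the list returned by
``ts_extract`` is a dictionary of band values for one Landsat scene (keys such
as ``B3`` and ``B4``), together with a scene ``id`` from which ``get_date``
parses the acquisition date. A minimal sketch to inspect one record, reusing
the same point and sensor as above:

.. code-block:: python

    from geextract import ts_extract, get_date
    from datetime import datetime

    raw_dict = ts_extract(lon=-89.8107197, lat=20.4159611, sensor='LE7',
                          start=datetime(1999, 1, 1), radius=500)
    # One record: band values keyed by band name, plus the scene 'id'
    print(raw_dict[0])
    # Acquisition date parsed from the scene id
    print(get_date(raw_dict[0]['id']))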
Second example: extract a time-series using the command line and read it in R
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    gee_extract.py -s LT5 -b 1980-01-01 -lon -89.8107 -lat 20.4159 -r 500 -db /tmp/gee_db.sqlite -site uxmal -table col_1
    gee_extract.py -s LE7 -b 1980-01-01 -lon -89.8107 -lat 20.4159 -r 500 -db /tmp/gee_db.sqlite -site uxmal -table col_1
    gee_extract.py -s LC8 -b 1980-01-01 -lon -89.8107 -lat 20.4159 -r 500 -db /tmp/gee_db.sqlite -site uxmal -table col_1

Running the three commands above gives the following terminal output. Records
refer to individual time steps at which Landsat observations were extracted.
Note that some records may be "empty" when no useful data remains after the
cloud masking step; the amount of usable data in the time-series is therefore
likely to be smaller than the number of extracted records reported.

.. code-block:: console

    Extracted 148 records from Google Earth Engine
    Extracted 231 records from Google Earth Engine
    Extracted 82 records from Google Earth Engine

A sqlite database has been created at ``/tmp/gee_db.sqlite``; it contains a
table named ``col_1`` (for "collection 1") that can be read as an R dataframe
using tools like ``dplyr``.

.. code-block:: r

    library(dplyr)
    library(DBI)

    # Open the database connection (requires the dbplyr and RSQLite packages;
    # DBI is installed as a dependency of dbplyr)
    con <- dbConnect(RSQLite::SQLite(), dbname = "/tmp/gee_db.sqlite")
    dbListTables(con)
    df <- tbl(con, 'col_1') %>%
        collect()
    df

In this case the table contains only one time-series (the uxmal site); we
therefore load the whole table into memory with ``collect()``, without any
additional filtering query.

.. code-block:: rout

    [1] "col_1"
    # A tibble: 390 x 11
       index  blue green                   id   nir   red swir1 swir2       time sensor  site
    1      0   434   635 LT05_020046_19850206  2077   675  2267  1281 1985-02-06    LT5 uxmal
    2      1   370   664 LT05_020046_19850427  2883   588  2136  1128 1985-04-27    LT5 uxmal
    3      2   385   592 LT05_020046_19860108  2732   553  2010   953 1986-01-08    LT5 uxmal
    4      3   555   748 LT05_020046_19860313  1971   823  2497  1479 1986-03-13    LT5 uxmal
    5      4   574   804 LT05_020046_19860414  2216   919  2751  1701 1986-04-14    LT5 uxmal
    6      5   790  1084 LT05_020046_19860703  3852   955  2205  1121 1986-07-03    LT5 uxmal
    7      6   546   858 LT05_020046_19860820  3876   730  1968   896 1986-08-20    LT5 uxmal
    8      7   334   560 LT05_020046_19861007  2694   532  2088  1072 1986-10-07    LT5 uxmal
    9      8   321   539 LT05_020046_19861023  2550   524  2064  1082 1986-10-23    LT5 uxmal
    10     9   590   832 LT05_020046_19870417  2390   891  2752  1660 1987-04-17    LT5 uxmal
    # ... with 380 more rows

This dataframe (or tibble) can now be used as the basis for all kinds of data
analysis in R. Here we'll make some simple plots using the ``ggplot2`` package.

.. code-block:: r

    library(ggplot2)
    library(tidyr)

    df %>%
        mutate(ndvi = (nir - red) / (nir + red)) %>%
        ggplot(aes(time, ndvi)) +
        geom_line() +
        geom_point(aes(col = sensor)) +
        theme_bw()

.. image:: figs/ndvi_uxmal.png

.. code-block:: r

    df %>%
        gather(key, value, -c(time, index, sensor, site, id)) %>%
        ggplot(aes(time, value)) +
        geom_line() +
        geom_point(aes(col = sensor)) +
        facet_grid(key ~ ., scales = 'free') +
        theme_bw()

.. image:: figs/multispectral_uxmal.png

The idea when working with multiple sites is to append them all to the same
database table and use SQL (raw or via ``dplyr``) to filter the desired data.
Extracting time-series for multiple sites can be done in batch thanks to the
``gee_extract_batch.py`` command (run ``gee_extract_batch.py --help`` to see
the detailed usage instructions). Here we will simply append another site to
the sqlite table by re-running the ``gee_extract.py`` commands with different
coordinates.

.. code-block:: bash

    gee_extract.py -s LT5 -b 1980-01-01 -lon 4.7174 -lat 44.7814 -r 500 -db /tmp/gee_db.sqlite -site rompon -table col_1
    gee_extract.py -s LE7 -b 1980-01-01 -lon 4.7174 -lat 44.7814 -r 500 -db /tmp/gee_db.sqlite -site rompon -table col_1
    gee_extract.py -s LC8 -b 1980-01-01 -lon 4.7174 -lat 44.7814 -r 500 -db /tmp/gee_db.sqlite -site rompon -table col_1

.. code-block:: console

    Extracted 104 records from Google Earth Engine
    Extracted 494 records from Google Earth Engine
    Extracted 193 records from Google Earth Engine

Now the ``col_1`` sqlite table contains time-series for two different sites
(uxmal and rompon). Loading the time-series of a single site can be done with
the ``filter()`` dplyr verb.

.. code-block:: r

    df <- tbl(con, 'col_1') %>%
        filter(site == 'rompon') %>%
        collect() %>%
        mutate(time = as.Date(time))
    df

.. code-block:: rout

    # A tibble: 513 x 11
       index  blue green                   id   nir   red swir1 swir2       time sensor   site
    1      0  1023  1179 LT05_196029_19840409  2438  1193  2096  1329 1984-04-09    LT5 rompon
    2      1   822  1035 LT05_196029_19840425  2561   987  2025  1125 1984-04-25    LT5 rompon
    3      4   451   715 LT05_196029_19840612  3481   582  1893   870 1984-06-12    LT5 rompon
    4      6   481   691 LT05_196029_19840815  2935   624  1799   866 1984-08-15    LT5 rompon
    5      7   370   590 LT05_196029_19840831  2880   534  1736   818 1984-08-31    LT5 rompon
    6      8   358   580 LT05_196029_19841002  2833   510  1560   708 1984-10-02    LT5 rompon
    7     10   408   642 LT05_196029_19841119  2491   656  1693   817 1984-11-19    LT5 rompon
    8     13   744   991 LT05_196029_19850327  2284  1033  2046  1239 1985-03-27    LT5 rompon
    9     16   579   845 LT05_196029_19850530  3678   697  2114   999 1985-05-30    LT5 rompon
    10    17   546   800 LT05_196029_19850615  3697   654  1928   933 1985-06-15    LT5 rompon
    # ... with 503 more rows
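The same single-site selection can also be written as raw SQL, as mentioned
above. A minimal sketch of the equivalent query run from Python with the
standard ``sqlite3`` module and ``pandas`` (the ``site`` and ``time`` columns
are those shown in the output above):

.. code-block:: python

    import sqlite3
    import pandas as pd

    # Same selection as the dplyr filter() above, expressed as raw SQL
    conn = sqlite3.connect('/tmp/gee_db.sqlite')
    df = pd.read_sql_query("SELECT * FROM col_1 WHERE site = 'rompon'", conn)
    # Parse the time column to datetime, like as.Date() in the R example
    df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')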
It is also possible to load the entire table and, for instance, plot the two
time-series side by side.

.. code-block:: r

    df <- tbl(con, 'col_1') %>%
        collect() %>%
        mutate(time = as.Date(time))

    df %>%
        mutate(ndmi = (nir - swir1) / (nir + swir1)) %>%
        ggplot(aes(time, ndmi)) +
        geom_line() +
        geom_point(aes(col = sensor)) +
        facet_grid(site ~ ., scales = 'free') +
        theme_bw()

.. image:: figs/ndmi_2sites.png

Third example: Extract time-series for each feature of a polygon feature collection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The file ``cdmx_parks.gpkg`` contains the contours of some of the parks of
Mexico City. The example below extracts the spatially aggregated time-series
for each of these parks, writes the results to a sqlite database and reads the
data back into Python for plotting.

.. plot::
    :include-source:

    from geextract import ts_extract, relabel, date_append, dictlist2sqlite
    import fiona
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import sqlite3
    from datetime import datetime
    import os

    # Remove any database left over from a previous run
    try:
        os.remove('/tmp/landsat_cdmx.sqlite')
    except FileNotFoundError:
        pass

    # Open feature collection generator
    with fiona.open('data/cdmx_parks.gpkg', layer='parks') as src:
        # Iterate over feature collection
        for feature in src:
            # Extract time-series
            ts_0 = ts_extract(sensor='LC8', start=datetime(2012, 1, 1),
                              feature=feature)
            ts_1 = relabel(ts_0, 'LC8')
            ts_2 = date_append(ts_1)
            # Write dictionary list to a sqlite database table
            dictlist2sqlite(ts_2, site=feature['properties']['name'],
                            sensor='LC8', db_src='/tmp/landsat_cdmx.sqlite',
                            table='cdmx')

    # Read the data back into a pandas dataframe
    conn = sqlite3.connect("/tmp/landsat_cdmx.sqlite")
    df = pd.read_sql_query('select * from cdmx', conn)
    df['date'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
    df['ndvi'] = (df.nir - df.red) / (df.nir + df.red)

    # Make facetgrid plot (one row per park)
    g = sns.FacetGrid(df, row='site', aspect=4, height=2)

    def dateplot(x, y, **kwargs):
        ax = plt.gca()
        data = kwargs.pop("data")
        data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)

    g = g.map_dataframe(dateplot, "date", "ndvi")
    plt.show()
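The ``ts_extract``/``relabel``/``date_append``/``dictlist2sqlite`` chain used
above is not specific to polygon features: combined with the point arguments
from the first example it can also write point time-series to a database
directly from Python, which is presumably what ``gee_extract.py`` wraps. A
minimal sketch (the start date, output path and table name below are arbitrary
choices for illustration):

.. code-block:: python

    from geextract import ts_extract, relabel, date_append, dictlist2sqlite
    from datetime import datetime

    # Extract a Landsat 8 time-series for the Uxmal point of the first example
    ts = ts_extract(lon=-89.8107197, lat=20.4159611, sensor='LC8',
                    start=datetime(2013, 1, 1), radius=500)
    # Relabel bands for the sensor and append dates, as in the example above
    ts = relabel(ts, 'LC8')
    ts = date_append(ts)
    # Write the dictionary list to a sqlite table
    dictlist2sqlite(ts, site='uxmal', sensor='LC8',
                    db_src='/tmp/landsat_uxmal.sqlite', table='col_1')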