Jupyter Notebook Web Access on Google Cloud

Google has fantastic tutorials on how to set up a VM; the tutorials were a bit sketchy for how to get a Jupyter notebook server up and running that can be accessed via password through a web browser. This video filled the gap for me by providing careful attention to the firewall settings and other details that like the use of tmux (cheat sheet).

Messy Bash Script and Notes 

#tutorial from : https://www.youtube.com/watch?v=gMDQZPoMECE
#tmux tutorial from: https://www.youtube.com/watch?v=BHhA_ZKjyxo


sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common

sudo apt-get install python-setuptools python-dev build-essential
sudo easy_install pip

sudo -H pip install jupyter
#jupyter notebook --generate-config
#enter the config file, then change this information
#sudo nano ~/.jupyter/jupyter_notebook_config.py
#c = get_config()
#c.NotebookApp.ip = "*"
#c.NotebookApp.open_browser = False
#c.NotebookApp.port = 5678

#use tmux to keep things running
#jupyter notebook password >> create login password

# to run: jupyter-notebook --no-browser --port=5678

### external port must match VM firewall settings

Creating a MySQL Database on Google's Cloud Compute for Lahman's Baseball Data

Google recently released fully managed MySQL and postgreSQL instances that are very easy to deploy. While not GIS per se, SQL is a fundamental GISer skill in general, and so as a baseball fan, I decided to see how easy it would be to create a MySQL database for Sean Lahman's database. It took me about 15 minutes to figure out and do the work. Following is how it's done.

Step 0, go to this page and follow the directions. Frankly, I can't do it better than Google's quick start. After completing this tutorial, you will have created a small database.

Now, after doing that, you can download the Lahman's database.

1) First, exit MySQL:

mysql> exit

2) Use wget to download the most current SQL database from this page:

$ wget http://seanlahman.com/files/database/lahman2016-sql.zip

3) And then unzip the file:

$ unzip lahman2016-sql.zip

4) Open MySQL

$ gcloud beta sql connect <databasename> --user=root

5) See what databases exist on your MySQL server already.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

6) Create a new database for the Lahman data.

mysql> CREATE DATABASE lahman2016

7) See that your new database is there.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| lahman2016         |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

8) Connect to the new database.

mysql> use lahman2016;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

9) Import the Lahman saved SQL database you downloaded. When you use this command, it will import the database and a long list of output will show as it adds the data.

mysql> source lahman2016.sql

10) Examine the database.

mysql> SHOW TABLES;
+----------------------+
| Tables_in_lahman2016 |
+----------------------+
| AllstarFull          |
| Appearances          |
| AwardsManagers       |
| AwardsPlayers        |
| AwardsShareManagers  |
| AwardsSharePlayers   |
| Batting              |
| BattingPost          |
| CollegePlaying       |
| Fielding             |
| FieldingOF           |
| FieldingOFsplit      |
| FieldingPost         |
| HallOfFame           |
| HomeGames            |
| Managers             |
| ManagersHalf         |
| Master               |
| Parks                |
| Pitching             |
| PitchingPost         |
| Salaries             |
| Schools              |
| SeriesPost           |
| Teams                |
| TeamsFranchises      |
| TeamsHalf            |
+----------------------+
27 rows in set (0.00 sec)

mysql> SHOW COLUMNS FROM Batting;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| playerID | varchar(255) | YES  |     | NULL    |       |
| yearID   | int(11)      | YES  |     | NULL    |       |
| stint    | int(11)      | YES  |     | NULL    |       |
| teamID   | varchar(255) | YES  |     | NULL    |       |
| lgID     | varchar(255) | YES  |     | NULL    |       |
| G        | int(11)      | YES  |     | NULL    |       |
| AB       | int(11)      | YES  |     | NULL    |       |
| R        | int(11)      | YES  |     | NULL    |       |
| H        | int(11)      | YES  |     | NULL    |       |
| 2B       | int(11)      | YES  |     | NULL    |       |
| 3B       | int(11)      | YES  |     | NULL    |       |
| HR       | int(11)      | YES  |     | NULL    |       |
| RBI      | int(11)      | YES  |     | NULL    |       |
| SB       | int(11)      | YES  |     | NULL    |       |
| CS       | int(11)      | YES  |     | NULL    |       |
| BB       | int(11)      | YES  |     | NULL    |       |
| SO       | int(11)      | YES  |     | NULL    |       |
| IBB      | varchar(255) | YES  |     | NULL    |       |
| HBP      | varchar(255) | YES  |     | NULL    |       |
| SH       | varchar(255) | YES  |     | NULL    |       |
| SF       | varchar(255) | YES  |     | NULL    |       |
| GIDP     | varchar(255) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
22 rows in set (0.01 sec)

Encapsulation: getters, setters, public, private, and properties

This week in the Algorithms and Data Structures, we're going to briefly talk about encapsulation in Python (this post), implement a stack and a queue building on the linked list class (resources here), and then implement some more methods in the linked list like mean and standard deviation (post to come after the meeting).

Here's a Python.com tutorial that is a nice read to get started. All of the following code can be found in this Jupyter notebook.

Python Style

A fundamental idea in object oriented programming is encapsulation - the idea that attributes of class should be almost always private.

In the Algorithms and Data Structures for GIScientists study group, we recently implemented a linked list class in python following the lead from the video here. The video was great, and served our purposes well by getting everyone on the same page about linked lists, but raised a lot of questions for us in the group though about OOP in Python versus Java.

To followup on this are some videos that we will watch and discuss. 

Public and Private Variables

The following is from a post that is worth reading. Coding is cultural, and this post unpacks the mechanics the underly some of the cultural preferences between different languages with a specific focus on Python.

"Some people teach that _x is Python's equivalent of protected, and __x its equivalent of private, but that's very misleading.

The single underscore has only a conventional meaning: don't count on this being part of the useful and/or stable interface. Many introspection tools (e.g., tab completion in the interactive interface) will skip over underscore-prefixed names by default, but nothing stops a consumer from writing spam._eggs to access the value.

The double underscore mangles the name—inside your own methods, the attribute is named __x, but from anywhere else, it's named _MyClass__x. But this is not there to add any more protection—after all, _MyClass__x will still show up in dir(my_instance), and someone can still write my_instance._MyClass__x = 42. What it's there for is to prevent subclasses from accidentally shadowing your attributes or methods. (This is primarily important when the base classes and subclasses are implemented independently—you wouldn't want to add a new _spam attribute to your library and accidentally break any app that subclasses your library and adds a _spam attribute.)"

Property Decorators

This site provides a very nice consideration of public and private variables, and the use of @property. An excerpt:

"Getters and setters are used in many object oriented programming languages to ensure the principle of data encapsulation. They are known as mutator methods as well. ... These methods are of course the getter for retrieving the data and the setter for changing the data. According to this principle, the attributes of a class are made private to hide and protect them from other code."

Some General Thoughts on Style in Python

To start off with, in Java, encapsulation is implemented in a class by making all variables private and then using getters and setters. This is how I learned OOP.

Following is an example from last week's node class written in Python, but in a Java-like way:


class Node(object):

    def __init__(self, d, n = None):
        self.data = d
        self.next_node = n
        
    def get_next(self):
        return self.next_node
    
    def set_next(self, n):
        self.next_node = n
        
    def get_data(self):
        return self.data
    
    def set_data(self, d):
        self.data = d
    
    def __str__(self):
        print self.data
        print self.next_node

In the above example, the getters and setters are completely irrelevant. In your code, you can access the data and next_node by typing <nameOfObject>.data or <nameOfObject>.next_node.

As you can see, the class is written in a Java style and works just fine. The problem is that it doesn't follow convention. For reference, in PEP8, there are three references to properties in inheritance.

The following is an example of making the attributes of the node class private, but node, this still isn't best practice. It just makes things seem less redundant.

Node Class Private


class NodePrivate(object):

    def __init__(self, d, n = None):
        self.__data = d
        self.__next_node = n
        
    def get_next(self):
        return self.__next_node
    
    def set_next(self, n):
        self.__next_node = n
        
    def get_data(self):
        return self.__data
    
    def set_data(self, d):
        self.__data = d
    
    def __str__(self):
        print self.__data
        print self.__next_node

This is what the node class would look like with properties.


class NodeProperties(object):
    
    def __init__(self, d, n = None):

        self.__data = d
        self.__next_node = n
    
    @property
    def data(self):
        return self.__data
        
    @property
    def next_node(self):
         return self.__next_node
        
    @next_node.setter
    def next_node(self, n):
        self.__next_node = n
        
    @data.setter
    def data(self, d):
        self.__data = d
    
    def __str__(self):
        print self.__data
        print self.__next_node

But according to the interwebs, it's best to not use them at all, which would make the class look like this.


class NodePublic(object):
    
    def __init__(self, d, n = None):
        self.data = d
        self.next_node = n
        
    def __str__(self):
        print self.data
        print self.next_node

I think we can all agree, the totally public is really good looking.

Yes you do, my friend, yes you do. 

Yes you do, my friend, yes you do. 

All of the above code can be found in this Jupyter notebook.

Last of all, the real power of python and properties is that you can hack out the first node class, let everyone use it, and then at a later time realize you want to change some things, and use properties to encapsulate the public data so variable names don't have to change in the rest of the system. This probably isn't the best design principle. The general rule though still seems to be according to PEP8, if in doubt, make it private.

Resources on Stacks and Queues

Here is the Python 2 documentation for lists, which has the methods needed for stacks and queues. For a nice visual overview of these data structures, see this blog post. Following is a video that explains stacks and queues, chosen because Damien Gordon is by far the most dignified.

If you remain uninterested in implementing this data structure, here's a video showing the syntax with the list.