Creating a MySQL Database on Google's Cloud Compute for Lahman's Baseball Data

Google recently released fully managed MySQL and postgreSQL instances that are very easy to deploy. While not GIS per se, SQL is a fundamental GISer skill in general, and so as a baseball fan, I decided to see how easy it would be to create a MySQL database for Sean Lahman's database. It took me about 15 minutes to figure out and do the work. Following is how it's done.

Step 0, go to this page and follow the directions. Frankly, I can't do it better than Google's quick start. After completing this tutorial, you will have created a small database.

Now, after doing that, you can download the Lahman's database.

1) First, exit MySQL:

mysql> exit

2) Use wget to download the most current SQL database from this page:

$ wget http://seanlahman.com/files/database/lahman2016-sql.zip

3) And then unzip the file:

$ unzip lahman2016-sql.zip

4) Open MySQL

$ gcloud beta sql connect <databasename> --user=root

5) See what databases exist on your MySQL server already.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

6) Create a new database for the Lahman data.

mysql> CREATE DATABASE lahman2016

7) See that your new database is there.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| lahman2016         |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

8) Connect to the new database.

mysql> use lahman2016;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed

9) Import the Lahman saved SQL database you downloaded. When you use this command, it will import the database and a long list of output will show as it adds the data.

mysql> source lahman2016.sql

10) Examine the database.

mysql> SHOW TABLES;
+----------------------+
| Tables_in_lahman2016 |
+----------------------+
| AllstarFull          |
| Appearances          |
| AwardsManagers       |
| AwardsPlayers        |
| AwardsShareManagers  |
| AwardsSharePlayers   |
| Batting              |
| BattingPost          |
| CollegePlaying       |
| Fielding             |
| FieldingOF           |
| FieldingOFsplit      |
| FieldingPost         |
| HallOfFame           |
| HomeGames            |
| Managers             |
| ManagersHalf         |
| Master               |
| Parks                |
| Pitching             |
| PitchingPost         |
| Salaries             |
| Schools              |
| SeriesPost           |
| Teams                |
| TeamsFranchises      |
| TeamsHalf            |
+----------------------+
27 rows in set (0.00 sec)

mysql> SHOW COLUMNS FROM Batting;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| playerID | varchar(255) | YES  |     | NULL    |       |
| yearID   | int(11)      | YES  |     | NULL    |       |
| stint    | int(11)      | YES  |     | NULL    |       |
| teamID   | varchar(255) | YES  |     | NULL    |       |
| lgID     | varchar(255) | YES  |     | NULL    |       |
| G        | int(11)      | YES  |     | NULL    |       |
| AB       | int(11)      | YES  |     | NULL    |       |
| R        | int(11)      | YES  |     | NULL    |       |
| H        | int(11)      | YES  |     | NULL    |       |
| 2B       | int(11)      | YES  |     | NULL    |       |
| 3B       | int(11)      | YES  |     | NULL    |       |
| HR       | int(11)      | YES  |     | NULL    |       |
| RBI      | int(11)      | YES  |     | NULL    |       |
| SB       | int(11)      | YES  |     | NULL    |       |
| CS       | int(11)      | YES  |     | NULL    |       |
| BB       | int(11)      | YES  |     | NULL    |       |
| SO       | int(11)      | YES  |     | NULL    |       |
| IBB      | varchar(255) | YES  |     | NULL    |       |
| HBP      | varchar(255) | YES  |     | NULL    |       |
| SH       | varchar(255) | YES  |     | NULL    |       |
| SF       | varchar(255) | YES  |     | NULL    |       |
| GIDP     | varchar(255) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
22 rows in set (0.01 sec)

11) Dominate your fantasy baseball league.

Encapsulation: getters, setters, public, private, and properties

This week in the Algorithms and Data Structures, we're going to briefly talk about encapsulation in Python (this post), implement a stack and a queue building on the linked list class (resources here), and then implement some more methods in the linked list like mean and standard deviation (post to come after the meeting).

Here's a Python.com tutorial that is a nice read to get started. All of the following code can be found in this Jupyter notebook.

Python Style

A fundamental idea in object oriented programming is encapsulation - the idea that attributes of class should be almost always private.

In the Algorithms and Data Structures for GIScientists study group, we recently implemented a linked list class in python following the lead from the video here. The video was great, and served our purposes well by getting everyone on the same page about linked lists, but raised a lot of questions for us in the group though about OOP in Python versus Java.

To followup on this are some videos that we will watch and discuss. 

Public and Private Variables

The following is from a post that is worth reading. Coding is cultural, and this post unpacks the mechanics the underly some of the cultural preferences between different languages with a specific focus on Python.

"Some people teach that _x is Python's equivalent of protected, and __x its equivalent of private, but that's very misleading.

The single underscore has only a conventional meaning: don't count on this being part of the useful and/or stable interface. Many introspection tools (e.g., tab completion in the interactive interface) will skip over underscore-prefixed names by default, but nothing stops a consumer from writing spam._eggs to access the value.

The double underscore mangles the name—inside your own methods, the attribute is named __x, but from anywhere else, it's named _MyClass__x. But this is not there to add any more protection—after all, _MyClass__x will still show up in dir(my_instance), and someone can still write my_instance._MyClass__x = 42. What it's there for is to prevent subclasses from accidentally shadowing your attributes or methods. (This is primarily important when the base classes and subclasses are implemented independently—you wouldn't want to add a new _spam attribute to your library and accidentally break any app that subclasses your library and adds a _spam attribute.)"

Property Decorators

This site provides a very nice consideration of public and private variables, and the use of @property. An excerpt:

"Getters and setters are used in many object oriented programming languages to ensure the principle of data encapsulation. They are known as mutator methods as well. ... These methods are of course the getter for retrieving the data and the setter for changing the data. According to this principle, the attributes of a class are made private to hide and protect them from other code."

Some General Thoughts on Style in Python

To start off with, in Java, encapsulation is implemented in a class by making all variables private and then using getters and setters. This is how I learned OOP.

Following is an example from last week's node class written in Python, but in a Java-like way:


class Node(object):

    def __init__(self, d, n = None):
        self.data = d
        self.next_node = n
        
    def get_next(self):
        return self.next_node
    
    def set_next(self, n):
        self.next_node = n
        
    def get_data(self):
        return self.data
    
    def set_data(self, d):
        self.data = d
    
    def __str__(self):
        print self.data
        print self.next_node

In the above example, the getters and setters are completely irrelevant. In your code, you can access the data and next_node by typing <nameOfObject>.data or <nameOfObject>.next_node.

As you can see, the class is written in a Java style and works just fine. The problem is that it doesn't follow convention. For reference, in PEP8, there are three references to properties in inheritance.

The following is an example of making the attributes of the node class private, but node, this still isn't best practice. It just makes things seem less redundant.

Node Class Private


class NodePrivate(object):

    def __init__(self, d, n = None):
        self.__data = d
        self.__next_node = n
        
    def get_next(self):
        return self.__next_node
    
    def set_next(self, n):
        self.__next_node = n
        
    def get_data(self):
        return self.__data
    
    def set_data(self, d):
        self.__data = d
    
    def __str__(self):
        print self.__data
        print self.__next_node

This is what the node class would look like with properties.


class NodeProperties(object):
    
    def __init__(self, d, n = None):

        self.__data = d
        self.__next_node = n
    
    @property
    def data(self):
        return self.__data
        
    @property
    def next_node(self):
         return self.__next_node
        
    @next_node.setter
    def next_node(self, n):
        self.__next_node = n
        
    @data.setter
    def data(self, d):
        self.__data = d
    
    def __str__(self):
        print self.__data
        print self.__next_node

But according to the interwebs, it's best to not use them at all, which would make the class look like this.


class NodePublic(object):
    
    def __init__(self, d, n = None):
        self.data = d
        self.next_node = n
        
    def __str__(self):
        print self.data
        print self.next_node

I think we can all agree, the totally public is really good looking.

Yes you do, my friend, yes you do. 

Yes you do, my friend, yes you do. 

All of the above code can be found in this Jupyter notebook.

Last of all, the real power of python and properties is that you can hack out the first node class, let everyone use it, and then at a later time realize you want to change some things, and use properties to encapsulate the public data so variable names don't have to change in the rest of the system. This probably isn't the best design principle. The general rule though still seems to be according to PEP8, if in doubt, make it private.

Resources on Stacks and Queues

Here is the Python 2 documentation for lists, which has the methods needed for stacks and queues. For a nice visual overview of these data structures, see this blog post. Following is a video that explains stacks and queues, chosen because Damien Gordon is by far the most dignified.

If you remain uninterested in implementing this data structure, here's a video showing the syntax with the list.

The 80's Farm Crisis: losses loom large

If you grew up on or around a family farm, the 80's means something very specific. It's the sort of thing that is said in passing, but with gravity, and often meant to imply a need for cautious decision making, or a general sense of down and out. The 1980's Farm Crisis still looms large in the minds of land owners, farmers, and rural communities

Students of the agricultural landscapes in the Midwest United States should study this history - doubly so if the work intends to comment on, or suggest potential changes to, rural land use patterns.

The following post is a compilation of videos that provide a range of perspectives on what happened in the Midwest in the 1960's, 70's, and 80's, and the overall impact on farming and rural communities.

The videos range from the USDA's 1965 videos on rural development initiatives and the advocacy against Reaganomics in Down and Out in America, to the nearly ethnographic approach of God's Country and the modern documentary style of Iowa Public Television. Each provides a window into this time period to be viewed critically and reflectively.

Note: I have only linked to one video of many in each series. If it doesn't autoplay to the next, additional videos can be found on YouTube.

USDA's Rural Development Initiatives (1965)

The first video is a video from the USDA on Poverty in Rural America. This video provides a sense of the rural development policy (and the attitudes of it) that encouraged people to borrow money to modernize their farms. This comes from a long tradition in the US government, and obviously, development is not bad, but the questions surrounding how development can and should happen still requires research. In many respects, the pushes for "sustainable development" and "reflexive modernization" are extensions of this long-standing development tradition, rife with the same underlying ethical and practical challenges.

God's Country (1979 | 1985)

The second documentary is God's Country directed by Louis Malle that profiles Glencoe, Minnesota in 1979 and then again in 1985. A colleague made me aware of this recently. It provides an unsentimental look at small-town life and agriculture, which helps to cut through some of the dense cultural narratives of what it means to be "rural."

Down and Out in America (1986)

Lee Grant's Down and Out in America provides a vivid depiction of the individual price farmers paid and the resulting rural activism in response to the Farm Crisis of the 1980's. This film is unique in that it hits on not only rural American farming, but also manufacturing, and then looks at challenges of poverty and housing in the city.

Iowa Public Television Farm Crisis (1970-1990)

This PBS documentary, while at times dry and overly sentimental, tries to give a full perspective on what happened in the farm crisis. When combined with the more ethnographic story-telling approaches of Malle and Grant, I think a full picture begins emerge about how modernist culture in the USDA and across agricultural sector, combined with hard-line free-market policies, interacted with market dynamics, weather, and credit to create the Farm Crisis.

Concluding Remarks

Obviously, the full picture isn't told by these films, but they do provide a glimpse into the time period from multiple different perspectives. If you come across other multimedia that you find compelling and thinks adds additional complexity to these stories, by all means let me know and I'd be happy to add to this post or link to other resources in the list below.

Additional Resources