This week in the Algorithms and Data Structures study group, we're going to briefly talk about encapsulation in Python (this post), implement a stack and a queue building on the linked list class (resources here), and then implement some more methods in the linked list, like mean and standard deviation (post to come after the meeting).
A fundamental idea in object-oriented programming is encapsulation: the idea that the attributes of a class should almost always be private.
In the Algorithms and Data Structures for GIScientists study group, we recently implemented a linked list class in Python, following the lead from the video here. The video was great and served our purposes well by getting everyone on the same page about linked lists, but it raised a lot of questions for us about OOP in Python versus Java.
To follow up on this, here are some videos that we will watch and discuss.
Public and Private Variables
The following is from a post that is worth reading. Coding is cultural, and this post unpacks the mechanics that underlie some of the cultural preferences between different languages, with a specific focus on Python.
"Some people teach that _x is Python's equivalent of protected, and __x its equivalent of private, but that's very misleading.
The single underscore has only a conventional meaning: don't count on this being part of the useful and/or stable interface. Many introspection tools (e.g., tab completion in the interactive interface) will skip over underscore-prefixed names by default, but nothing stops a consumer from writing spam._eggs to access the value.
The double underscore mangles the name—inside your own methods, the attribute is named __x, but from anywhere else, it's named _MyClass__x. But this is not there to add any more protection—after all, _MyClass__x will still show up in dir(my_instance), and someone can still write my_instance._MyClass__x = 42. What it's there for is to prevent subclasses from accidentally shadowing your attributes or methods. (This is primarily important when the base classes and subclasses are implemented independently—you wouldn't want to add a new _spam attribute to your library and accidentally break any app that subclasses your library and adds a _spam attribute.)"
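The behavior described in the quote can be tried out with a small sketch. The class names Base and Child and the attribute names below are hypothetical, chosen only for illustration, and the code assumes Python 3:

```python
class Base(object):
    def __init__(self):
        self._hint = "conventionally internal"  # single underscore: convention only
        self.__x = 1                            # actually stored as _Base__x

    def base_x(self):
        return self.__x                         # resolves to self._Base__x


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__x = 2                            # stored as _Child__x, so Base's value survives

    def child_x(self):
        return self.__x                         # resolves to self._Child__x


obj = Child()
print(obj._hint)             # conventionally internal (nothing stops outside access)
print(obj.base_x())          # 1
print(obj.child_x())         # 2
print(obj._Base__x)          # 1 (the mangled name is reachable from outside)

obj._Base__x = 42            # and it can be overwritten from outside, too
print(obj.base_x())          # 42
print(hasattr(obj, "__x"))   # False (no attribute is literally named __x)
```

The subclass and the base class each keep their own __x, which is the accidental-shadowing case the quote says name mangling is meant to prevent.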
This site provides a very nice consideration of public and private variables, and the use of @property. An excerpt:
"Getters and setters are used in many object oriented programming languages to ensure the principle of data encapsulation. They are known as mutator methods as well. ... These methods are of course the getter for retrieving the data and the setter for changing the data. According to this principle, the attributes of a class are made private to hide and protect them from other code."
Some General Thoughts on Style in Python
To start off with, in Java, encapsulation is implemented in a class by making all variables private and then using getters and setters. This is how I learned OOP.
The following is last week's node class, written in Python but in a Java-like way:
    class Node(object):
        def __init__(self, d, n=None):
            self.data = d
            self.next_node = n

        def get_next(self):
            return self.next_node

        def set_next(self, n):
            self.next_node = n

        def get_data(self):
            return self.data

        def set_data(self, d):
            self.data = d

        def __str__(self):
            # __str__ has to return a string; printing inside it raises a TypeError
            return "{}\n{}".format(self.data, self.next_node)
In the above example, the getters and setters are redundant. The attributes are public, so in your code you can read or assign data and next_node directly by typing <nameOfObject>.data or <nameOfObject>.next_node.
As you can see, the class is written in a Java style and works just fine. The problem is that it doesn't follow Python convention. For reference, PEP 8 refers to properties three times in its discussion of designing for inheritance.
The following is an example of making the attributes of the node class private, but note, this still isn't best practice. It just makes the getters and setters seem less redundant.
Node Class Private
    class NodePrivate(object):
        def __init__(self, d, n=None):
            self.__data = d
            self.__next_node = n

        def get_next(self):
            return self.__next_node

        def set_next(self, n):
            self.__next_node = n

        def get_data(self):
            return self.__data

        def set_data(self, d):
            self.__data = d

        def __str__(self):
            # __str__ has to return a string; printing inside it raises a TypeError
            return "{}\n{}".format(self.__data, self.__next_node)
This is what the node class would look like with properties.
    class NodeProperties(object):
        def __init__(self, d, n=None):
            self.__data = d
            self.__next_node = n

        @property
        def data(self):
            return self.__data

        @data.setter
        def data(self, d):
            self.__data = d

        @property
        def next_node(self):
            return self.__next_node

        @next_node.setter
        def next_node(self, n):
            self.__next_node = n

        def __str__(self):
            # __str__ has to return a string; printing inside it raises a TypeError
            return "{}\n{}".format(self.__data, self.__next_node)
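To see what properties buy you from the caller's side, here is a rough sketch. PropertyNode is a hypothetical, trimmed-down variant of the class above, and leaving out the next_node setter is my own assumption, made only to show what a read-only property looks like:

```python
class PropertyNode(object):
    """Hypothetical, trimmed-down variant of the properties-based node."""

    def __init__(self, d, n=None):
        self.__data = d
        self.__next_node = n

    @property
    def data(self):
        return self.__data

    @data.setter
    def data(self, d):
        self.__data = d

    @property
    def next_node(self):
        # No setter is defined in this sketch, so next_node is read-only.
        return self.__next_node


tail = PropertyNode(2)
head = PropertyNode(1, tail)

head.data = 10                # looks like plain assignment, but runs the setter
print(head.data)              # 10
print(head.next_node.data)    # 2

try:
    head.next_node = None     # no setter, so the assignment is rejected
except AttributeError:
    print("next_node is read-only")
```

The calling code reads exactly as it would with public attributes; the getter and setter methods only show up inside the class.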
But according to the interwebs, it's best to not use them at all, which would make the class look like this.
    class NodePublic(object):
        def __init__(self, d, n=None):
            self.data = d
            self.next_node = n

        def __str__(self):
            # __str__ has to return a string; printing inside it raises a TypeError
            return "{}\n{}".format(self.data, self.next_node)
I think we can all agree, the totally public version is really good looking.
All of the above code can be found in this Jupyter notebook.
Last of all, the real power of Python and properties is that you can hack out the first node class with public attributes, let everyone use it, and later, when you realize you want to change some things, use properties to encapsulate the public data so that attribute names don't have to change in the rest of the system. This probably isn't the best design principle, though. The general rule according to PEP 8 still seems to be: if in doubt, make it non-public.
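Here is a minimal sketch of that kind of after-the-fact change. NodeV1, NodeV2, describe, and the "data must not be None" rule are all hypothetical, made up only to illustrate the idea:

```python
class NodeV1(object):
    """First version: plain public attributes."""

    def __init__(self, d, n=None):
        self.data = d
        self.next_node = n


class NodeV2(object):
    """Later version: same public names, but data is now validated."""

    def __init__(self, d, n=None):
        self.data = d            # runs the setter below
        self.next_node = n

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, d):
        # Hypothetical rule added after the fact: data must not be None.
        if d is None:
            raise ValueError("data must not be None")
        self._data = d


def describe(node):
    # Caller code written against NodeV1; it needs no changes for NodeV2.
    return "node holding {}".format(node.data)


print(describe(NodeV1(3)))   # node holding 3
print(describe(NodeV2(3)))   # node holding 3
```

Code elsewhere in the system keeps using node.data, while NodeV2(None) now raises a ValueError.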