11/25/2010
Package, the missing language feature - Part I
We have spent a way too much time on functions and classes. We put a lot of energy to maintain a clean and concise interface of classes. We care about the dependency between classes, by encouraging dependence injection and wire up objects via interface.
Besides functions and classes, we do have higher level construct. We have hacked the class loading of java to the hell, and then Satan gives us back his OSGi. We have invented a dedicated job to maintain the manifest of EJB. When dependency injection is not enough, people do find concept called Module emerging in modern things like Guice and Autofac.
What is Package? Itself is merely a name. It is just some annoying leading dots before the thing you actually want. It is not even a thing, it is just a being ignored prefix. People might say, oh yes, package is not doing anything, why should I care? Function is doing something, class is also doing somethings, package is just some dummy folder that I can put those valuable stuff inside it.
True, very true... so does the language designers. I can not say all of them does, but at least some of them does. Stroustrup ignored package. Gosling ignored package. Even Hejlsberg ignored package (But, assembly is better than nothing). What a huge mistake!
The problem we normally need to solve when writing business software is not some scientific work. In my mind, the only problem we need to solve is managing complexity. As we learned long time ago, the only way to control complexity is to break it down, and break it down further. One thing containing many other things. We need blackbox to encapsulate the internal complexity monster and give outside a clean and simple illusion. But constantly, when using Java or C# I find I need to reinvent all kinds of blackboxes to meet my needs. And none of them seems naturally to new comers, simply because they are not part of the original language, not known to most people, and not supported by many tools. There are many design patterns, people say they exist because the language itself is flawed. There are also many component platform/framework, I say it is because the language itself is flawed. It is because the language does not give us the blackbox, so we need to invent one ourself.
Package, it is a missing language feature for a long time. But luckily, Java or C# is not the only choice we have. In another open wonderland, without money but happiness, we have our lovely Python. In there, we finally see what is called package.
Package in PythonPackage in python is simple. If you have a folder called some_package, and you have a __init__.py file in that folder, then it becomes a package called some_package. If you happen to have another folder inside some_package folder called another_package, and itself also has a __init__.py file inside the folder, then it becomes some_package.another_package.
The key difference between package in Java and package in Python is, in Java, the package is just a literal symbol, it does not exist in the runtime. In Python, the package is a living object and you set and get attribute on it anytime. some_package.another_package.abc = 'def' is a valid statement in Python language.
This gives us the box we want. We can use this box to define our interface and hide our internal complexities. A package structure like A.B.C, A should hide B, C. B should hide C. In the A level, you might say start the car. In the B level, you might say start the engine, and then start radio and air conditioner. The hierarchical structure of package naming is the best fit for natural encapsulation.
The box can also initialize itself. It has a __init__.py file which can be used to execute any bootstrapping code. Sometimes we need to sort out some internal stuff before ready for outside service. Sometimes, we need to register ourself as subscriber for event published somewhere else. Having simple __init__ solves a lot of problem. It is powerful enough? Not really, it does not support thousand other features, like full lifecycle management, standard remote control interface, etc. As a user facing public component, the package construct exposed by the language is very limited. But we can build on top of what language provides us.
The real problem of python package is not it does not support things like JMX. The real problem is the box is not really a black box. Actually, everything in Python is sort of made by Glass. You can see right through nearly everything, that is public in Java term. Although we can use _ as convention, and __ as hard compiler constraint in some place. But here, the ugly underscore is not helping. You can always reference A.B.C.xxx anytime you want, and that is dangerous. It breaks encapsulation, introducing tangling dependency without being noticed. No one likes that, we want to make sure Y.X.Z only reference A.B.C.xxx through A.yyy. It should not know the internals like A.B.C.xxx.
Classic "pythonic" response would be, that is just a convention. When convention is there, people should follow. The problem is, this is not a easy convention, and can be broken in any minute. There is no easy rule people can follow. The real difficulty is, when you reference A.B.C.xxx you can not always reference it as A.yyy. If your code lives inside A.B, then you should not reference A.yyy, because the inside package should not reference the outside, as it is in the lower place in the dependency pyramid. In this case, you do need to reference A.B.C.xxx as A.B.C.xxx as it is some thing you have to deal with. It is no longer a hidden internal, you are living inside the internal. In other words, A.B.C.xxx is not always public or private. It is accessible or not depending on where you are. And that is exactly what encapsulation is about.
How can we make the box really a black box? Let's continue in the Part II.
Update 11.27
Alan (https://alanfranz.pip.verisignlabs.com/) commented: I'm not 100% sure about what you'd like to say in the next part, however:
- beware about "bootstrapping code". Many times such "static initializer" is known to provoke unpredictable problems, and will prevent package breakup via pkg_resources namespace_package , if ever needed.
Most of the times initialization should be performed by the client code or should be performed at first request; import-time initialization is absolutely abused in python coding. - convention is fine. If anybody imports a module or package that starts with underscore, it's their business - after all, if they've got the source code, they can modify all the names and make them public, can't they? Would you prefer java-like things where you can set methods and attributes private, and then you can access them via other means through some common.util.lang tool?
- nesting too much might just be unneeded, and if you want a leaf (a.b.c) not to depend on its parent you can use relative imports. But remember that two modules importing one the other trigger an error in Python, you simply can't do that.
Thanks Alan! I am surely aware of "bootstrapping code". The major problem of import time initialization is it is implicit. And it will be even worse if the bootstrapping is I/O intensive or causing other side effects. But if there are too many clients, like a lot of unit tests, pushing responsibility to them is also inconvenient. I use __import__('x.y.z') in the main function to implicit stating that I want to use those packages and initialize them now.
Starting with _ means private, that is fine. But a package is not always private, it is public to its siblings, but private to other packages. Traditional visibility only allow you to specify one thing is public or not, I think that is not enough. How many details you are allowed to know, or should depending on, might be contextual.
Python does check the circular dependency on module level. But it does not check circular dependency on package level. For example, A.B can not circular depend on C.D, but if A.E depend on C.D and C.D depend on A.B, that is allowed. But actually, that means package A depend on C, package C depend on A as well.
- beware about "bootstrapping code". Many times such "static initializer" is known to provoke unpredictable problems, and will prevent package breakup via pkg_resources namespace_package , if ever needed.
Most of the times initialization should be performed by the client code or should be performed at first request; import-time initialization is absolutely abused in python coding.
- convention is fine. If anybody imports a module or package that starts with underscore, it's their business - after all, if they've got the source code, they can modify all the names and make them public, can't they? Would you prefer java-like things where you can set methods and attributes private, and then you can access them via other means through some common.util.lang tool?
- nesting too much might just be unneeded, and if you want a leaf (a.b.c) not to depend on its parent you can use relative imports. But remember that two modules importing one the other trigger an error in Python, you simply can't do that.
PS
writing in English is fine, but having Blogger in Chinese is very hard to read and use for any foreigner :-)
Subscribe to Post Comments [Atom]
<< Home
Subscribe to Posts [Atom]