11/27/2010

 

Package, the missing language feature - Part II

Problems

In the previous post, we talked about how packages work in Python. Essentially, the problem is that a package is a good box, but the box is not black. We want a package to expose its whole API at the package level and seal up its internal details. from A import * should give you everything you need; you should not have to import A.B or import A.C.

Unimportable

So, how do we make a module unimportable? There are two things to do. First, remove the B attribute from the package A object. import A.B first imports A, then imports A.B, and then B is reached as an attribute of A, so once B is deleted from A, code that goes through A.B fails. Second, take A.B out of sys.modules by setting sys.modules['A.B'] = None. This makes import A.B and from A.B import * fail with an ImportError.

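import sys
package_A = sys.modules['A']    # the already imported package object
delattr(package_A, 'B')         # step 1: drop the attribute
sys.modules['A.B'] = None       # step 2: block further imports of A.B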

This way, we completely hide the existence of A.B, which is the behavior we want when another package tries to import this private internal. The drawback of this mechanism is that the error message users get is not friendly. They are told the module does not exist, but it actually does exist if they look it up in the file browser.
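To see both steps and the resulting error in one place, here is a minimal sketch. It assumes Python 3, and the package demo_pkg, its internal module and helper() are made up for the demo and written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway package to disk (all names here are hypothetical):
#   demo_pkg/__init__.py  re-exports helper
#   demo_pkg/internal.py  defines helper
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from .internal import helper\n")
with open(os.path.join(pkg_dir, "internal.py"), "w") as f:
    f.write("def helper():\n    return 'ok'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# Step 1: drop the attribute. Step 2: block further imports.
delattr(demo_pkg, "internal")
sys.modules["demo_pkg.internal"] = None

try:
    importlib.import_module("demo_pkg.internal")
    error_message = None
except ImportError as exc:
    error_message = str(exc)

print(error_message)        # the not so friendly message the user sees
print(demo_pkg.helper())    # the package-level API still works
```

The file demo_pkg/internal.py is still sitting on disk, which is exactly why the error can confuse people.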

When?

By making internal packages and modules unimportable, we can make the parent package a black box. But when do we do this, deleting all the internal packages and modules?

The best place is the __init__.py of the parent package. But after we delete the internal packages and modules, they are gone. What if A.B references A.C in its code? We need to make sure A.B is initialized (imported) before sealing up A. If A.B uses from A.C import xxx, the referenced object is bound to a name in A.B's own namespace, so even when A.C can no longer be imported, A.B can still use it. A plain import A.C only binds the name A, so an A.C.xxx lookup that runs after sealing would hit the deleted attribute; the from form (or import A.C as C) avoids that.

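import sys
from .B import *  # eager-load B and re-export its public names

# Seal the package: drop the attribute and block later imports of A.B.
package_A = sys.modules['A']
delattr(package_A, 'B')
sys.modules['A.B'] = None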
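To check the claim that A.B keeps working after A.C is gone, here is a small sketch (Python 3 assumed). The package sealed_pkg and its modules b and c are hypothetical stand-ins for A, A.B and A.C, and the generated __init__.py seals both modules after loading them.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Sources of a throwaway package; every name below is made up for the demo.
FILES = {
    "__init__.py": """
        import sys
        from .b import *              # eager-load b (which pulls in c)
        _package = sys.modules[__name__]
        for _name in ("b", "c"):
            delattr(_package, _name)
            sys.modules[__name__ + "." + _name] = None
    """,
    "b.py": """
        from .c import double         # binds double inside b itself
        __all__ = ["api"]
        def api(x):
            return double(x) + 1
    """,
    "c.py": """
        def double(x):
            return 2 * x
    """,
}

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "sealed_pkg")
os.mkdir(pkg_dir)
for filename, source in FILES.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(textwrap.dedent(source))

sys.path.insert(0, root)
importlib.invalidate_caches()
sealed_pkg = importlib.import_module("sealed_pkg")

result = sealed_pkg.api(3)    # b.api still reaches c.double after sealing

try:
    importlib.import_module("sealed_pkg.c")
    c_is_hidden = False
except ImportError:
    c_is_hidden = True
```

Only api is visible from outside; b and c cannot be imported any more, yet api keeps using double because b grabbed its own reference before the seal.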

Where?

Do I need to write that kind of ugly delattr in every __init__.py file? Isn't that a cross-cutting concern that should not be repeated everywhere?

Yes, so let's find some way to magically inject that code into every __init__.py file. The code actually has three parts. Part 1: expose the API. Part 2: eager-load sub modules. Part 3: delete sub modules. The API still needs to be defined manually in __init__.py. But parts 2 and 3 can be put into a "post import hook".

What is a post import hook? It is code executed after a module has been imported. After A is imported, we can eager-load all its sub modules by scanning the folder, and then delete them. Post import hooks are not directly supported in Python, but they can be built on the more powerful meta import hook.
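Parts 2 and 3 could be sketched as a single hook function like the one below. This is only an illustration under assumptions: it targets Python 3, uses pkgutil.iter_modules for the folder scan, and the names eager_load_and_seal and hooked_pkg are invented here.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def eager_load_and_seal(package):
    """Hypothetical post import hook: load every direct child, then hide it."""
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return  # a plain module has no children to seal
    children = [name for _finder, name, _is_pkg in pkgutil.iter_modules(search_path)]
    # Part 2: eager-load everything first, while siblings can still see each other.
    for name in children:
        importlib.import_module(package.__name__ + "." + name)
    # Part 3: delete the attributes and block later imports.
    for name in children:
        vars(package).pop(name, None)
        sys.modules[package.__name__ + "." + name] = None


# Demo on a throwaway package with one used and one unused sub module.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "hooked_pkg")
os.mkdir(pkg_dir)
sources = {
    "__init__.py": "from .alpha import greet\n",
    "alpha.py": "def greet():\n    return 'hello'\n",
    "beta.py": "LOADED = True\n",
}
for filename, source in sources.items():
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
hooked_pkg = importlib.import_module("hooked_pkg")
eager_load_and_seal(hooked_pkg)   # called by hand here instead of from an import hook
```

The two passes are kept separate on purpose: if a child were sealed right after loading, a sibling loaded later could no longer import it.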

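def register_meta_import_hook(should_apply, post_import_hooks):
    # Note: imp and find_module/load_module are the Python 2-era import API;
    # imp was removed in Python 3.12.
    import sys
    import imp

    class Importer(object):
        """Loader: imports the located module, then runs the post import hooks."""

        def __init__(self, file, pathname, description):
            self.file = file
            self.pathname = pathname
            self.description = description

        def load_module(self, name):
            try:
                module = imp.load_module(name, self.file, self.pathname, self.description)
                for post_import_hook in post_import_hooks:
                    post_import_hook(module)
                return module
            finally:
                # file is None for packages, so only close real file objects
                if self.file:
                    self.file.close()

    class Finder(object):
        """Meta path finder: claims only the modules should_apply() accepts."""

        def find_module(self, qualified_module_name, path=None):
            if not should_apply(qualified_module_name):
                return None
            if not path:
                path = sys.path
            module_name = qualified_module_name.rpartition('.')[2]
            try:
                file, pathname, description = imp.find_module(module_name, path)
            except ImportError:
                # not found here: let the regular import machinery report it
                return None
            return Importer(file, pathname, description)

    sys.meta_path.append(Finder())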
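The listing above relies on the imp module and the find_module/load_module protocol; imp was removed in Python 3.12. On Python 3 the same idea could be sketched with find_spec and a wrapping loader, as below. The function name register_post_import_hooks, the two helper classes and the demo package modern_pkg are all assumptions of this sketch, not part of the original code. Python 3 already puts its default finders in sys.meta_path, so the sketch inserts its finder at the front instead of appending it.

```python
import importlib
import importlib.machinery
import os
import sys
import tempfile


def register_post_import_hooks(should_apply, post_import_hooks):
    class HookedLoader(object):
        """Wraps the real loader and runs the hooks after the module body."""

        def __init__(self, inner):
            self.inner = inner

        def create_module(self, spec):
            return self.inner.create_module(spec)

        def exec_module(self, module):
            self.inner.exec_module(module)
            for post_import_hook in post_import_hooks:
                post_import_hook(module)

    class HookFinder(object):
        def find_spec(self, fullname, path=None, target=None):
            if not should_apply(fullname):
                return None
            # Let the standard path finder locate the module, then wrap its loader.
            spec = importlib.machinery.PathFinder.find_spec(fullname, path)
            if spec is None or spec.loader is None:
                return None
            spec.loader = HookedLoader(spec.loader)
            return spec

    finder = HookFinder()
    sys.meta_path.insert(0, finder)
    return finder


# Demo: record the name of every hooked module of a throwaway package.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "modern_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("VALUE = 1\n")

seen = []
finder = register_post_import_hooks(
    lambda name: name.split(".")[0] == "modern_pkg",
    [lambda module: seen.append(module.__name__)],
)
sys.path.insert(0, root)
importlib.invalidate_caches()
modern_pkg = importlib.import_module("modern_pkg")
sys.meta_path.remove(finder)   # undo the registration once the demo is done
```

A sealing function would be passed in post_import_hooks in place of the recording lambda used here.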

Conclusion

By eager-loading sub modules and deleting them in a post import hook, we can seal up a package and force people to define the package API in __init__.py, because that is the only way to let outsiders use the internals.

Another interesting side effect is that circular dependencies between packages are no longer possible. In Python, circular dependencies between modules are not possible, but because the modules in a package are loaded lazily, circular dependencies between packages were possible. With sub modules loaded eagerly, circular dependencies between packages are ruled out as well. That is a good thing, but it can be very strict.

Finally, we have the box, and it is sealed up automatically in a post import hook. If the box's author wants the box to be usable from outside, they need to define its API in __init__.py. Plus, no circular dependencies ever.

