Implementing double in C
The previous two sections described how to define a double Typeand arithmetic operations on that Type, but all of them wereimplemented in pure Python. In this section we will see how to definethe double type in such a way that it can be used by operationsimplemented in C (which we will define in the section after that).
How does it work?
In order to be C-compatible, a Type must provide a C interface to thePython data that satisfy the constraints it puts forward. In otherwords, it must define C code that can convert a Python reference intosome type suitable for manipulation in C and it must define C codethat can convert some C structure in which the C implementation of anoperation stores its variables into a reference to an object that can beused from Python and is a valid value for the Type.
For example, in the current example, we have a Type which represents aPython float. First, we will choose a corresponding C type. Thenatural choice would be the primitive double
type. Then, we needto write code that will take a PyObject
, check that it is aPython float
and extract its value as a double
. Finally, weneed to write code that will take a C double
and will build aPyObject
of Python type float
that we can work with fromPython. We will be using CPython and thus special care must be givento making sure reference counts are updated properly!
The C code we will write makes use of CPython’s C API which you canfind here.
What needs to be defined
In order to be C-compatible, a Type must define several additionalmethods, which all start with the c_
prefix. The complete list canbe found in the documentation for gof.type.Type
. Here, we’ll focus onthe most important ones:
- class
CLinkerType
cdeclare
(_name, sub, check_input=True)This must return C code which declares variables. These variableswill be available to operations defined in C. You may also writetypedefs.
This must return C code which initializes the variables declared in
c_declare
. Either this orc_extract
will be called.This must return C code which takes a reference to a Python objectand initializes the variables declared in
c_declare
to match thePython object’s data. Either this orc_init
will be called.When the computations are done, transfer the variables from the Cstructure we put them in to the destination Python object. This willonly be called for the outputs.
When we are done using the data, clean up whatever we allocated anddecrease the appropriate reference counts.
clibraries
([_c_compiler])cheader_dirs
([_c_compiler])clib_dirs
([_c_compiler])- Allows you to specify headers, libraries and associated directories.
These methods have two versions, one with a _c_compiler_argument and one without. The version with c_compiler is triedfirst and if it doesn’t work, the one without is.
The c_compiler argument is the C compiler that will be usedto compile the C code for the node that uses this type.
ccompile_args
([_c_compiler])cno_compile_args
([_c_compiler])- Allows to specify special compiler arguments to add/exclude.
These methods have two versions, one with a _c_compiler_argument and one without. The version with c_compiler is triedfirst and if it doesn’t work, the one without is.
The c_compiler argument is the C compiler that will be usedto compile the C code for the node that uses this type.
c_init_code
()Allows you to specify code that will be executed once when themodule is initialized, before anything else is executed.For instance, if a type depends on NumPy’s C API, then
'import_array();'
has to be among the snippets returnedbyc_init_code()
.Allows to add helper functions/structs (in a string or a list of strings) that the Type needs.
Allows to specify a special compiler. This will force this compilerfor the current compilation block (a particular op or the full graph).This is used for the GPU code.
Should return a tuple of hashable objects like integers. Thisspecifies the version of the code. It is used to cache thecompiled code. You MUST change the returned tuple for eachchange in the code. If you don’t want to cache the compiled codereturn an empty tuple or don’t implement it.
- Optional: should return the name of the primitive C type ofitems into variables handled by this Theano type. For example,for a matrix of 32-bit signed NumPy integers, it should return
"npy_int32"
. If C type may change from an instance to another(e.g.Scalar('int32')
vsScalar('int64')
), considerimplementing this method. If C type is fixed accross instances,this method may be useless (as you already know the C typewhen you work with the C code).
Each of these functions take two arguments, name
and sub
whichmust be used to parameterize the C code they return. name
is astring which is chosen by the compiler to represent a Variable ofthe Type in such a way that there are no name conflicts betweendifferent pieces of data. Therefore, all variables declared incdeclare
should have a name which includes name
. Furthermore,the name of the variable containing a pointer to the Python objectassociated to the Variable is py
<name>
.
sub
, on the other hand, is a dictionary containing bits of C codesuitable for use in certain situations. For instance, sub['fail']
contains code that should be inserted wherever an error is identified.
c_declare
and c_extract
also accept a third check_input
optional argument. If you want your type to validate its inputs, it mustonly do it when check_input
is True.
The example code below should help you understand how everything playsout:
Warning
If some error condition occurs and you want to fail and/or raise anException, you must use the fail
code contained insub['fail']
(there is an example in the definition of cextract
below). You must _NOT use the return
statement anywhere, ever,nor break
outside of your own loops or goto
to strangeplaces or anything like that. Failure to comply with thisrestriction could lead to erratic behavior, segfaults and/or memoryleaks because Theano defines its own cleanup system and assumesthat you are not meddling with it. Furthermore, advanced operationsor types might do code transformations on your code such asinserting it in a loop – in that case they can call yourcode-generating methods with custom failure code that takes into accountwhat they are doing!
Defining the methods
c_declare
- def c_declare(name, sub):
- return """
- double %(name)s;
- """ % dict(name = name)
- double.c_declare = c_declare
Very straightforward. All we need to do is write C code to declare adouble. That double will be named whatever is passed to our functionin the name
argument. That will usually be some mangled name like“V0”, “V2” or “V92” depending on how many nodes there are in thecomputation graph and what rank the current node has. This functionwill be called for all Variables whose type is double
.
You can declare as many variables as you want there and you can alsodo typedefs. Make sure that the name of each variable contains thename
argument in order to avoid name collisions (collisions will_happen if you don’t parameterize the variable names as indicatedhere). Also note that you cannot declare a variable calledpy
<name>
or storage_<name>
because Theano already definesthem.
What you declare there is basically the C interface you are giving toyour Type. If you wish people to develop operations that make use ofit, it’s best to publish it somewhere.
c_init
- def c_init(name, sub):
- return """
- %(name)s = 0.0;
- """ % dict(name = name)
- double.c_init = c_init
This function has to initialize thedouble we declared previously to a suitable value. This is useful ifwe want to avoid dealing with garbage values, especially if our datatype is a pointer. This is not going to be called for all Variables withthe double
type. Indeed, if a Variable is an input that we passfrom Python, we will want to extract that input from a Python object,therefore it is the c_extract
method that will be called instead ofc_init
. You can therefore not assume, when writing c_extract
, that theinitialization has been done (in fact you can assume that it _hasn’t_been done).
c_init
will typically be called on output Variables, but in generalyou should only assume that either c_init
or c_extract
has beencalled, without knowing for sure which of the two.
c_extract
- def c_extract(name, sub):
- return """
- if (!PyFloat_Check(py_%(name)s)) {
- PyErr_SetString(PyExc_TypeError, "expected a float");
- %(fail)s
- }
- %(name)s = PyFloat_AsDouble(py_%(name)s);
- """ % dict(name = name, fail = sub['fail'])
- double.c_extract = c_extract
This method is slightly more sophisticated. What happens here is thatwe have a reference to a Python object which Theano has placed inpy%(name)s
where %(name)s
must be substituted for the namegiven in the inputs. This special variable is declared by Theano asPyObject* py
%(name)s
where PyObject*
is a pointer to a Pythonobject as defined by CPython’s C API. This is the reference thatcorresponds, on the Python side of things, to a Variable with thedouble
type. It is what the end user will give and what he or sheexpects to get back.
In this example, the user will give a Python float
. The firstthing we should do is verify that what we got is indeed a Pythonfloat
. The PyFloat_Check
function is provided by CPython’s CAPI and does this for us. If the check fails, we set an exception andthen we insert code for failure. The code for failure is insub["fail"]
and it basically does a goto
to cleanup code.
If the check passes then we convert the Python float into a doubleusing the PyFloat_AsDouble
function (yet again provided by CPython’s CAPI) and we put it in our double variable that we declared previously.
c_sync
- def c_sync(name, sub):
- return """
- Py_XDECREF(py_%(name)s);
- py_%(name)s = PyFloat_FromDouble(%(name)s);
- if (!py_%(name)s) {
- printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
- Py_XINCREF(Py_None);
- py_%(name)s = Py_None;
- }
- """ % dict(name = name)
- double.c_sync = c_sync
This function is probably the trickiest. What happens here is that wehave computed some operation on doubles and we have put the variableinto the double variable %(name)s
. Now, we need to put this datainto a Python object that we can manipulate on the Python side ofthings. This Python object must be put into the py_%(name)s
variable which Theano recognizes (this is the same pointer we get inc_extract).
Now, that pointer is already a pointer to a valid Python object(unless you or a careless implementer did terribly wrong things withit). If we want to point to another object, we need to tell Pythonthat we don’t need the old one anymore, meaning that we need todecrease the previous object’s reference count. The first line,PyXDECREF(py%(name)s)
does exactly this. If it is forgotten,Python will not be able to reclaim the data even if it is not usedanymore and there will be memory leaks! This is especially importantif the data you work on is large.
Now that we have decreased the reference count, we callPyFloatFromDouble
on our double variable in order to convert itto a Python float
. This returns a new reference which we assign topy
%(name)s
. From there Theano will do the rest and the end userwill happily see a Python float
come out of his computations.
The rest of the code is not absolutely necessary and it is basically“good practice”. PyFloatFromDouble
can return NULL
on failure.NULL
is a pretty bad reference to have and neither Python nor Theanolike it. If this happens, we change the NULL
pointer (which willcause us problems) to a pointer to None
(which is _not a NULL
pointer). Since None
is an object like the others, we need toincrease its reference count before we can set a new pointer to it. Thissituation is unlikely to ever happen, but if it ever does, better safethan sorry.
Warning
I said this already but it really needs to be emphasized that ifyou are going to change the py%(name)s
pointer to point to anew reference, you _must decrease the reference count of whateverit was pointing to before you do the change. This is only valid ifyou change the pointer, if you are not going to change the pointer,do NOT decrease its reference count!
c_cleanup
- def c_cleanup(name, sub):
- return ""
- double.c_cleanup = c_cleanup
We actually have nothing to do here. We declared a double on the stackso the C language will reclaim it for us when its scope ends. Wedidn’t malloc()
anything so there’s nothing to free()
. Furthermore,the py_%(name)s
pointer hasn’t changed so we don’t need to doanything with it. Therefore, we have nothing to cleanup. Sweet!
There are however two important things to keep in mind:
First, note that c_sync
and c_cleanup
might be called insequence, so they need to play nice together. In particular, let’ssay that you allocate memory in c_init
or c_extract
for somereason. You might want to either embed what you allocated to some Pythonobject in c_sync
or to free it in c_cleanup
. If you do theformer, you don’t want to free the allocated storage so you should setthe pointer to it to NULL
to avoid that c_cleanup
mistakenlyfrees it. Another option is to declare a variable in c_declare
thatyou set to true in c_sync
to notify c_cleanup
that c_sync
was called.
Second, whenever you use %(fail)s
in c_extract
or in the code of anoperation, you can count on c_cleanup
being called rightafter that. Therefore, it’s important to make sure that c_cleanup
doesn’t depend on any code placed after a reference to%(fail)s
. Furthermore, because of the way Theano blocks code together,only the variables declared in c_declare
will be visible in c_cleanup
!
What the generated C will look like
c_init
and c_extract
will only be called if there is a Pythonobject on which we want to apply computations using Ccode. Conversely, c_sync
will only be called if we want tocommunicate the values we have computed to Python, and c_cleanup
will only be called when we don’t need to process the data with Canymore. In other words, the use of these functions for a given Variabledepends on the the relationship between Python and C with respect tothat Variable. For instance, imagine you define the following functionand call it:
- x, y, z = double('x'), double('y'), double('z')
- a = add(x, y)
- b = mul(a, z)
- f = function([x, y, z], b)
- f(1.0, 2.0, 3.0)
Using the CLinker, the code that will be produced will look roughlylike this:
- // BEGIN defined by Theano
- PyObject* py_x = ...;
- PyObject* py_y = ...;
- PyObject* py_z = ...;
- PyObject* py_a = ...; // note: this reference won't actually be used for anything
- PyObject* py_b = ...;
- // END defined by Theano
- {
- double x; //c_declare for x
- x = ...; //c_extract for x
- {
- double y; //c_declare for y
- y = ...; //c_extract for y
- {
- double z; //c_declare for z
- z = ...; //c_extract for z
- {
- double a; //c_declare for a
- a = 0; //c_init for a
- {
- double b; //c_declare for b
- b = 0; //c_init for b
- {
- a = x + y; //c_code for add
- {
- b = a * z; //c_code for mul
- labelmul:
- //c_cleanup for mul
- }
- labeladd:
- //c_cleanup for add
- }
- labelb:
- py_b = ...; //c_sync for b
- //c_cleanup for b
- }
- labela:
- //c_cleanup for a
- }
- labelz:
- //c_cleanup for z
- }
- labely:
- //c_cleanup for y
- }
- labelx:
- //c_cleanup for x
- }
It’s not pretty, but it gives you an idea of how things work (note thatthe variable names won’t be x
, y
, z
, etc. - they willget a unique mangled name). The fail
code runs a goto
to theappropriate label in order to run all cleanup that needs to bedone. Note which variables get extracted (the three inputs x
, y
andz
), which ones only get initialized (the temporary variable a
and theoutput b
) and which one is synced (the final output b
).
The C code above is a single C block for the whole graph. Depending onwhich linker is used to process the computation graph, it ispossible that one such block is generated for each operation and thatwe transit through Python after each operation. In that situation,a
would be synced by the addition block and extracted by themultiplication block.
Final version
- from theano import gof
- class Double(gof.Type):
- def filter(self, x, strict=False, allow_downcast=None):
- if strict and not isinstance(x, float):
- raise TypeError('Expected a float!')
- return float(x)
- def values_eq_approx(self, x, y, tolerance=1e-4):
- return abs(x - y) / (x + y) < tolerance
- def __str__(self):
- return "double"
- def c_declare(self, name, sub):
- return """
- double %(name)s;
- """ % dict(name = name)
- def c_init(self, name, sub):
- return """
- %(name)s = 0.0;
- """ % dict(name = name)
- def c_extract(self, name, sub):
- return """
- if (!PyFloat_Check(py_%(name)s)) {
- PyErr_SetString(PyExc_TypeError, "expected a float");
- %(fail)s
- }
- %(name)s = PyFloat_AsDouble(py_%(name)s);
- """ % dict(sub, name = name)
- def c_sync(self, name, sub):
- return """
- Py_XDECREF(py_%(name)s);
- py_%(name)s = PyFloat_FromDouble(%(name)s);
- if (!py_%(name)s) {
- printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
- Py_XINCREF(Py_None);
- py_%(name)s = Py_None;
- }
- """ % dict(name = name)
- def c_cleanup(self, name, sub):
- return ""
- double = Double()
DeepCopyOp
We have an internal Op called DeepCopyOp. It is used to make sure werespect the user vs Theano memory region as described in the tutorial. Theano has a Python implementation that calls the object’scopy()
or deepcopy()
method for Theano types for which it does notknow how to generate C code.
You can implement c_code for this op. You register it like this:
- theano.compile.ops.register_deep_copy_op_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())
In your C code, you should use %(iname)s and %(oname)s to representthe C variable names of the DeepCopyOp input and outputrespectively. See an example for the type GpuArrayType
(GPUarray) in the file theano/gpuarray/type.py. The versionparameter is what is returned by DeepCopyOp.c_code_cache_version(). Bydefault, it will recompile the c code for each process.
ViewOp
We have an internal Op called ViewOp. It is used for someverification of inplace/view Ops. Its C implementation increments anddecrements Python reference counts, and thus only works with Pythonobjects. If your new type represents Python objects, you should tellViewOp to generate C code when working with this type, asotherwise it will use Python code instead. This is achieved bycalling:
- theano.compile.ops.register_view_op_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())
In your C code, you should use %(iname)s and %(oname)s to representthe C variable names of the ViewOp input and outputrespectively. See an example for the type GpuArrayType
(GPUarray) in the file thean/gpuarray/type.py. The versionparameter is what is returned by ViewOp.c_code_cache_version(). Bydefault, it will recompile the c code for each process.
Shape and Shape_i
We have 2 generic Ops, Shape and Shape_i, that return the shape of anyTheano Variable that has a shape attribute (Shape_i returns only one ofthe elements of the shape).
- theano.compile.ops.register_shape_c_code(YOUR_TYPE_CLASS, THE_C_CODE, version=())
- theano.compile.ops.register_shape_i_c_code(YOUR_TYPE_CLASS, THE_C_CODE, CHECK_INPUT, version=())
The C code works as the ViewOp. Shape_i has the additional i
parameterthat you can use with %(i)s
.
In your CHECK_INPUT, you must check that the input has enough dimensions tobe able to access the i-th one.