web2py comes with a Database Abstraction Layer (DAL), an API that maps Python objects into database objects such as queries, tables, and records. The DAL dynamically generates the SQL in real time using the specified dialect for the database back end, so that you do not have to write SQL code or learn different SQL dialects (the term SQL is used generically), and the application will be portable among different types of databases. At the time of this writing, the supported databases are SQLite (which comes with Python and thus web2py), PostgreSQL, MySQL, Oracle, MSSQL, FireBird, DB2, Informix, Ingres, the Google App Engine (SQL and NoSQL) and MongoDB. Experimentally we support more databases. Please check on the web2py web site and mailing list for more recent adapters. Google NoSQL is treated as a particular case in Chapter 13.
The Windows binary distribution works out of the box with SQLite and MySQL. The Mac binary distribution works out of the box with SQLite.
To use any other database back-end, run from the source distribution and install the appropriate driver for the required back end.
``database drivers``:inxx
Once the proper driver is installed, start web2py from source, and it will find the driver. Here is a list of drivers:
``DAL``:inxx ``SQLite``:inxx ``MySQL``:inxx ``PostgreSQL``:inxx ``Oracle``:inxx ``MSSQL``:inxx ``FireBird``:inxx ``DB2``:inxx ``Informix``:inxx ``Sybase``:inxx ``Teradata``:inxx ``MongoDB``:inxx ``CouchDB``:inxx ``SAPDB``:inxx ``Cubrid``:inxx
----------
database | drivers (source)
SQLite | sqlite3 or pysqlite2 or zxJDBC ``zxjdbc``:cite (on Jython)
PostgreSQL | psycopg2 ``psycopg2``:cite or pg8000 ``pg8000``:cite or zxJDBC ``zxjdbc``:cite (on Jython)
MySQL | pymysql ``pymysql``:cite or MySQLdb ``mysqldb``:cite
Oracle | cx_Oracle ``cxoracle``:cite
MSSQL | pyodbc ``pyodbc``:cite
FireBird | kinterbasdb ``kinterbasdb``:cite or fdb or pyodbc
DB2 | pyodbc ``pyodbc``:cite
Informix | informixdb ``informixdb``:cite
Ingres | ingresdbi ``ingresdbi``:cite
Cubrid | cubriddb ``cubriddb``:cite ``cubrid``:cite
Sybase | Sybase ``Sybase``:cite
Teradata | pyodbc ``Teradata``:cite
SAPDB | sapdb ``SAPDB``:cite
MongoDB | pymongo ``pymongo``:cite
IMAP | imaplib ``IMAP``:cite
---------
``sqlite3``, ``pymysql``, ``pg8000``, and ``imaplib`` ship with web2py. Support for MongoDB is experimental. The IMAP option allows you to use the DAL to access IMAP.
web2py defines the following classes that make up the DAL:
**DAL** represents a database connection. For example:
``sqlite``:inxx
``
db = DAL('sqlite://storage.db')
``:code
``define_table``:inxx
**Table** represents a database table. You do not directly instantiate Table; instead, ``DAL.define_table`` instantiates it.
``
db.define_table('mytable', Field('myfield'))
``:code
The most important methods of a Table are ``insert``, ``truncate``, ``drop``, and ``import_from_csv_file``.
``insert``:inxx
``truncate``:inxx
``drop``:inxx
``import_from_csv_file``:inxx
**Expression** is something like an ``orderby`` or ``groupby`` expression. The Field class is derived from the Expression. Here is an example:
``
myorder = db.mytable.myfield.upper() | db.mytable.id
db().select(db.table.ALL, orderby=myorder)
``:code
### Connection strings
``connection strings``:inxx
A connection with the database is established by creating an instance of the DAL object:
``
>>> db = DAL('sqlite://storage.db', pool_size=0)
``:code
``db`` is not a keyword; it is a local variable that stores the connection object ``DAL``. You are free to give it a different name. The constructor of ``DAL`` requires a single argument, the connection string. The connection string is the only web2py code that depends on a specific back-end database. Here are examples of connection strings for specific types of supported back-end databases (in all cases, we assume the database is running from localhost on its default port and is named "test"):
-------------
**SQLite** | ``sqlite://storage.db``
**MySQL** | ``mysql://username:password@localhost/test``
**PostgreSQL** | ``postgres://username:password@localhost/test``
**MSSQL** | ``mssql://username:password@localhost/test``
**FireBird** | ``firebird://username:password@localhost/test``
**Oracle** | ``oracle://username/password@test``
**DB2** | ``db2://username:password@test``
**Ingres** | ``ingres://username:password@localhost/test``
**Sybase** | ``sybase://username:password@localhost/test``
**Informix** | ``informix://username:password@test``
**Teradata** | ``teradata://DSN=dsn;UID=user;PWD=pass;DATABASE=database``
**Cubrid** | ``cubrid://username:password@localhost/test``
**SAPDB** | ``sapdb://username:password@localhost/test``
**IMAP** | ``imap://user:password@server:port``
**MongoDB** | ``mongodb://username:password@localhost/test``
**Google App Engine/SQL** | ``google:sql``
**Google App Engine/NoSQL** | ``google:datastore``
-------------
Notice that in SQLite the database consists of a single file. If it does not exist, it is created. This file is locked every time it is accessed. In the case of MySQL, PostgreSQL, MSSQL, FireBird, Oracle, DB2, Ingres and Informix the database "test" must be created outside web2py. Once the connection is established, web2py will create, alter, and drop tables appropriately.
It is also possible to set the connection string to ``None``. In this case DAL will not connect to any back-end database, but the API can still be accessed for testing. Examples of this will be discussed in Chapter 7.
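As a minimal sketch (assuming the standard web2py environment, where ``DAL`` and ``Field`` are already defined), a connection-less DAL still lets you define tables:
``
db = DAL(None)
db.define_table('person', Field('name'))
print db.person.fields
``:code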
#### Connection pooling
``connection pooling``:inxx
The second argument of the DAL constructor is the ``pool_size``; it defaults to 0.
As it is rather slow to establish a new database connection for each request, web2py implements a mechanism for connection pooling. Once a connection is established, the page has been served, and the transaction completed, the connection is not closed but goes into a pool. When the next HTTP request arrives, web2py tries to obtain a connection from the pool and use that for the new transaction. If there are no available connections in the pool, a new connection is established.
The ``pool_size`` parameter is ignored by SQLite and Google App Engine.
Connections in the pools are shared sequentially among threads, in the sense that they may be used by two different but not simultaneous threads. There is only one pool for each web2py process.
When web2py starts, the pool is always empty. The pool grows up to the minimum between the value of ``pool_size`` and the max number of concurrent requests. This means that if ``pool_size=10`` but our server never receives more than 5 concurrent requests, then the actual pool size will only grow to 5. If ``pool_size=0`` then connection pooling is not used.
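For example, here is a sketch of enabling pooling (the credentials and database name are placeholders):
``
db = DAL('postgres://username:password@localhost/test', pool_size=10)
``:code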
For supported back-ends you may also specify whether to check table and field names against database-specific reserved SQL keywords, by passing a ``check_reserved`` argument to the DAL constructor, for example:
``
check_reserved=['postgres', 'postgres_nonreserved']
``:code
The following database backends support reserved words checking.
-----
**PostgreSQL** | ``postgres(_nonreserved)``
**MySQL** | ``mysql``
**FireBird** | ``firebird(_nonreserved)``
**MSSQL** | ``mssql``
**Oracle** | ``oracle``
-----
### ``DAL``, ``Table``, ``Field``
The best way to understand the DAL API is to try each function yourself. This can be done interactively via the web2py shell, although ultimately, DAL code goes in the models and controllers.
Start by creating a connection. For the sake of example, you can use SQLite. Nothing in this discussion changes when you change the back-end engine.
``
>>> db = DAL('sqlite://storage.db')
``:code
The database is now connected and the connection is stored in the global variable ``db``.
At any time you can retrieve the connection string.
``_uri``:inxx
``
>>> print db._uri
sqlite://storage.db
``:code
and the database name
``_dbname``:inxx
``
>>> print db._dbname
sqlite
``:code
Tables can also be defined with an optional ``format`` attribute, which can be a format string such as ``format='%(name)s'``, or even a more complex one using a function:
``
>>> db.define_table('person', Field('name'),
format=lambda r: r.name or 'anonymous')
``:code
The format attribute will be used for two purposes:
- To represent referenced records in select/option drop-downs.
- To set the ``db.othertable.person.represent`` attribute for all fields referencing this table. This means that SQLTABLE will not show references by id but will use the format preferred representation instead.
``Field constructor``:inxx
These are the default values of a Field constructor:
``
Field(name, 'string', length=None, default=None,
required=False, requires='<default>',
ondelete='CASCADE', notnull=False, unique=False,
uploadfield=True, widget=None, label=None, comment=None,
writable=True, readable=True, update=None, authorize=None,
autodelete=False, represent=None, compute=None,
uploadfolder=os.path.join(request.folder,'uploads'),
uploadseparate=None,uploadfs=None)
``:code
Not all of them are relevant for every field. "length" is relevant only for fields of type "string". "uploadfield" and "authorize" are relevant only for fields of type "upload". "ondelete" is relevant only for fields of type "reference" and "upload".
- ``length`` sets the maximum length of a "string", "password" or "upload" field. If ``length`` is not specified a default value is used but the default value is not guaranteed to be backward compatible. ''To avoid unwanted migrations on upgrades, we recommend that you always specify the length for string, password and upload fields.''
- ``default`` sets the default value for the field. The default value is used when performing an insert if a value is not explicitly specified. It is also used to pre-populate forms built from the table using SQLFORM. Note, rather than being a fixed value, the default can instead be a function (including a lambda function) that returns a value of the appropriate type for the field. In that case, the function is called once for each record inserted, even when multiple records are inserted in a single transaction.
- ``required`` tells the DAL that no insert should be allowed on this table if a value for this field is not explicitly specified.
- ``requires`` is a validator or a list of validators. This is not used by the DAL, but it is used by SQLFORM. The default validators for the given types are shown in the following table:
----------
**field type** | **default field validators**
``string`` | ``IS_LENGTH(length)`` default length is 512
``text`` | ``IS_LENGTH(65536)``
``blob`` | ``None``
``boolean`` | ``None``
``integer`` | ``IS_INT_IN_RANGE(-1e100, 1e100)``
``double`` | ``IS_FLOAT_IN_RANGE(-1e100, 1e100)``
``decimal(n,m)`` | ``IS_DECIMAL_IN_RANGE(-1e100, 1e100)``
``date`` | ``IS_DATE()``
``time`` | ``IS_TIME()``
``datetime`` | ``IS_DATETIME()``
``password`` | ``None``
``upload`` | ``None``
``reference <table>`` | ``IS_IN_DB(db,table.field,format)``
``list:string`` | ``None``
``list:integer`` | ``None``
``list:reference <table>`` | ``IS_IN_DB(db,table.field,format,multiple=True)``
``bigint`` | ``None``
``big-id`` | ``None``
``big-reference`` | ``None``
---------
Decimal requires and returns values as ``Decimal`` objects, as defined in the Python ``decimal`` module. SQLite does not handle the ``decimal`` type so internally we treat it as a ``double``. The (n,m) are the number of digits in total and the number of digits after the decimal point respectively.
The ``big-id`` and ``big-reference`` types are only supported by some of the database engines and are experimental. They are not normally used as field types except for legacy tables; however, the DAL constructor has a ``bigint_id`` argument that, when set to ``True``, makes the ``id`` and ``reference`` fields ``big-id`` and ``big-reference`` respectively.
The ``list:`` fields are special because they are designed to take advantage of certain denormalization features on NoSQL (in the case of Google App Engine NoSQL, the field types ``ListProperty`` and ``StringListProperty``) and back-port them to all the other supported relational databases. On relational databases, lists are stored as a ``text`` field. The items are separated by a ``|`` and each ``|`` in a string item is escaped as ``||``. They are discussed in their own section.
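Here is a minimal sketch of such a field in use (the ``product`` table and its ``tags`` field are hypothetical):
``
db.define_table('product', Field('name'), Field('tags', 'list:string'))
id = db.product.insert(name='sofa', tags=['red', 'large'])
rows = db(db.product.tags.contains('red')).select()
``:code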
-------
Notice that ``requires=...`` is enforced at the level of forms, ``required=True`` is enforced at the level of the DAL (insert), while ``notnull``, ``unique`` and ``ondelete`` are enforced at the level of the database. While they sometimes may seem redundant, it is important to maintain the distinction when programming with the DAL.
-------
``ondelete``:inxx
- ``ondelete`` translates into the "ON DELETE" SQL statement. By default it is set to "CASCADE". This tells the database that when it deletes a record, it should also delete all records that refer to it. To disable this feature, set ``ondelete`` to "NO ACTION" or "SET NULL".
- ``notnull=True`` translates into the "NOT NULL" SQL statement. It prevents the database from inserting null values for the field.
- ``unique=True`` translates into the "UNIQUE" SQL statement and it makes sure that values of this field are unique within the table. It is enforced at the database level.
- ``uploadfield`` applies only to fields of type "upload". A field of type "upload" stores the name of a file saved somewhere else, by default on the filesystem under the application "uploads/" folder. If ``uploadfield`` is set, then the file is stored in a blob field within the same table and the value of ``uploadfield`` is the name of the blob field. This will be discussed in more detail later in the context of SQLFORM.
- ``uploadfolder`` defaults to the application's "uploads/" folder. If set to a different path, files will be uploaded to that folder. For example, ``uploadfolder=os.path.join(request.folder,'static/temp')`` will upload files to the web2py/applications/myapp/static/temp folder.
- ``uploadseparate`` if set to True will upload files under different subfolders of the ''uploadfolder'' folder. This is optimized to avoid too many files under the same folder/subfolder. ATTENTION: You cannot change the value of ``uploadseparate`` from True to False without breaking the system. web2py either uses the separate subfolders or it does not. Changing the behavior after files have been uploaded will prevent web2py from being able to retrieve those files. If this happens it is possible to move files and fix the problem but this is not described here.
- ``uploadfs`` allows you to specify a different file system to which to upload files, including Amazon S3 storage or remote FTP storage. This option requires PyFileSystem to be installed. ``uploadfs`` must point to ``PyFileSystem``. ``PyFileSystem``:inxx ``uploadfs``:inxx
- ``widget`` must be one of the available widget objects, including custom widgets, for example: ``SQLFORM.widgets.string.widget``. A list of available widgets will be discussed later. Each field type has a default widget.
- ``label`` is a string (or something that can be serialized to a string) that contains the label to be used for this field in autogenerated forms.
- ``comment`` is a string (or something that can be serialized to a string) that contains a comment associated with this field, and will be displayed to the right of the input field in the autogenerated forms.
- ``writable`` if a field is writable, it can be edited in autogenerated create and update forms.
- ``readable`` if a field is readable, it will be visible in readonly forms. If a field is neither readable nor writable, it will not be displayed in create and update forms.
- ``update`` contains the default value for this field when the record is updated.
- ``compute`` is an optional function. If a record is inserted or updated, the compute function will be executed and the field will be populated with the function result. The record is passed to the compute function as a ``dict``, and the dict will not include the current value of that, or any other compute field.
- ``authorize`` can be used to require access control on the corresponding field, for "upload" fields only. It will be discussed more in detail in the context of Authentication and Authorization.
- ``autodelete`` determines if the corresponding uploaded file should be deleted when the record referencing the file is deleted. For "upload" fields only.
- ``represent`` can be None or can point to a function that takes a field value and returns an alternate representation for the field value. Examples:
``
db.mytable.name.represent = lambda name,row: name.capitalize()
db.mytable.other_id.represent = lambda id,row: row.myfield
db.mytable.some_uploadfield.represent = lambda value,row: \
A('get it', _href=URL('download', args=value))
``:code
``blob``:inxx
"blob" fields are also special. By default, binary data is encoded in base64 before being stored into the actual database field, and it is decoded when extracted. This has the negative effect of using 25% more storage space than necessary in blob fields, but has two advantages. On average it reduces the amount of data communicated between web2py and the database server, and it makes the communication independent of back-end-specific escaping conventions.
The DAL allows you to explicitly issue SQL statements:
``executesql``:inxx
``
>>> print db.executesql('SELECT * FROM person;')
[(1, u'Massimo'), (2, u'Massimo')]
``:code
In this case, the return values are not parsed or transformed by the DAL, and the format depends on the specific database driver. This usage with selects is normally not needed, but it is more common with indexes.
``executesql`` takes two optional arguments: ``placeholders`` and ``as_dict``.
``placeholders`` is an optional sequence of values to be substituted in, or, if supported by the DB driver, a dictionary with keys matching named placeholders in your SQL.
If ``as_dict`` is set to True, the results cursor returned by the DB driver will be converted to a sequence of dictionaries keyed with the db field names. Results returned with ``as_dict=True`` are the same as those returned when applying **.as_list()** to a normal select:
``
[{field1: value1, field2: value2}, {field1: value1b, field2: value2b}]
``:code
``executesql`` also has an optional ``fields`` argument. If not None, the results cursor returned by the DB driver will be converted to a DAL Rows object using the ``db._adapter.parse()`` method. This requires specifying the ``fields`` argument as a list of DAL Field objects that match the fields returned from the DB. The Field objects should be part of one or more Table objects defined on the DAL object.
The ``fields`` list can include one or more DAL Table objects in addition to or instead of Field objects, or it can be just a single table (not in a list). In that case, the Field objects will be extracted from the table(s).
The field names will be extracted from the Field objects, or optionally, a list of field names can be provided (in tablename.fieldname format) via the ``colnames`` argument. Note, the fields and colnames must be in the same order as the fields in the results cursor returned from the DB.
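For example, here is a minimal sketch (assuming the ``person`` table defined earlier in this chapter):
``
rows = db.executesql('SELECT id, name FROM person;', fields=db.person)
for row in rows: print row.name
``:code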
#### ``_lastsql``
Whether SQL was executed manually using executesql or was SQL generated by the DAL, you can always find the SQL code in ``db._lastsql``. This is useful for debugging purposes:
``_lastsql``:inxx
``
>>> rows = db().select(db.person.ALL)
>>> print db._lastsql
SELECT person.id, person.name FROM person;
``:code
-------
web2py never generates queries using the "*" operator. web2py is always explicit when selecting fields.
-------
### ``drop``
Finally, you can drop tables, for example with ``db.mytable.drop()``, and all stored data will be lost.
``drop``:inxx

### Legacy databases and keyed tables

Tables of legacy databases that lack an auto-increment "id" field, but have a primary key composed of one or more fields, can be defined as keyed tables by passing ``primarykey`` to ``define_table``:
``
db.define_table('account',
    Field('accnum','integer'),
    Field('acctype'),
    Field('accdesc'),
    primarykey=['accnum','acctype'],
    migrate=False)
``:code
- ``primarykey`` is a list of the field names that make up the primary key.
- All primarykey fields have a ``NOT NULL`` set even if not specified.
- Keyed tables can only reference other keyed tables.
- Referencing fields must use the ``reference tablename.fieldname`` format.
- The ``update_record`` function is not available for Rows of keyed tables.
-------
Note that currently this is only available for DB2, MS-SQL, Ingres and Informix, but others can be easily added.
-------
At the time of writing, we cannot guarantee that the ``primarykey`` attribute works with every existing legacy table and every supported database backend.
For simplicity, we recommend, if possible, creating a database view that has an auto-increment id field.
### Distributed transaction
``distributed transactions``:inxx
------
At the time of writing this feature is only supported by PostgreSQL, MySQL and Firebird, since they expose an API for two-phase commits.
------
Assuming you have two (or more) connections to distinct PostgreSQL databases, for example:
``
db_a = DAL('postgres://...')
db_b = DAL('postgres://...')
``:code
In your models or controllers, you can commit them concurrently with:
``
DAL.distributed_transaction_commit(db_a, db_b)
``:code
On failure, this function rolls back and raises an ``Exception``.
In controllers, when one action returns, if you have two distinct connections and you do not call the above function, web2py commits them separately. This means there is a possibility that one of the commits succeeds and one fails. The distributed transaction prevents this from happening.
### More on uploads
Consider the following model:
``
>>> db.define_table('myfile',
        Field('image', 'upload', default='path/'))
``:code
In the case of an 'upload' field, the default value can optionally be set to a path (an absolute path or a path relative to the current app folder); the default image will then be set to a copy of the file at that path. A new copy is made for each new record that does not specify an image.
Normally an insert is handled automatically via a SQLFORM or a crud form (which is a SQLFORM) but occasionally you already have the file on the filesystem and want to upload it programmatically. This can be done in this way:
``
>>> stream = open(filename, 'rb')
>>> db.myfile.insert(image=db.myfile.image.store(stream, filename))
``:code
It is also possible to insert a file in a simpler way and have the insert method call ``store`` automatically:
``
>>> stream = open(filename, 'rb')
>>> db.myfile.insert(image=stream)
``:code
In this case the filename is obtained from the stream object, if available.
The ``store`` method of the upload field object takes a file stream and a filename. It uses the filename to determine the extension (type) of the file, creates a new temp name for the file (according to web2py upload mechanism) and loads the file content in this new temp file (under the uploads folder unless specified otherwise). It returns the new temp name, which is then stored in the ``image`` field of the ``db.myfile`` table.
Note, if the file is to be stored in an associated blob field rather than the file system, the ``store()`` method will not insert the file in the blob field (because ``store()`` is called before the insert), so the file must be explicitly inserted into the blob field:
``
>>> db.define_table('myfile',
Field('image', 'upload', uploadfield='image_file'),
Field('image_file', 'blob'))
>>> stream = open(filename, 'rb')
>>> db.myfile.insert(image=db.myfile.image.store(stream, filename),
image_file=stream.read())
``:code
The opposite of ``.store`` is ``.retrieve``:
``
>>> row = db(db.myfile).select().first()
>>> (filename, stream) = db.myfile.image.retrieve(row.image)
>>> import shutil
>>> shutil.copyfileobj(stream,open(filename,'wb'))
``:code
### ``Query``, ``Set``, ``Rows``
Let's consider again the table defined (and dropped) previously and insert three records:
``
>>> db.define_table('person', Field('name'))
>>> db.person.insert(name="Alex")
1
>>> db.person.insert(name="Bob")
2
>>> db.person.insert(name="Carl")
3
``:code
You can update a record with ``db.mytable[id] = dict(myfield='somevalue')``, which is equivalent to
``
db(db.mytable.id==id).update(myfield='somevalue')
``:code
and it updates an existing record with the field values specified by the dictionary on the right hand side.
#### Fetching a ``Row``
Yet another convenient syntax is the following:
``
record = db.mytable(id)
record = db.mytable(db.mytable.id==id)
record = db.mytable(id,myfield='somevalue')
``:code
While apparently similar to ``db.mytable[id]``, the above syntax is more flexible and safer. First of all it checks whether ``id`` is an int (or ``str(id)`` is an int) and returns ``None`` if not (it never raises an exception). It also allows you to specify multiple conditions that the record must meet. If they are not met, it also returns ``None``.
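For example, with the ``person`` table (a sketch; the id value is arbitrary):
``
person = db.person(1)               # the Row with id 1, or None
person = db.person(1, name='Alex')  # None unless the name also matches
``:code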
#### Recursive ``select``s
``recursive selects``:inxx
Consider the previous table person and a new table "thing" referencing a "person":
``
>>> db.define_table('thing', Field('name'), Field('owner','reference person'))
``:code
and a simple select from this table:
``
>>> things = db(db.thing).select()
``:code
which is equivalent to
``
>>> things = db(db.thing._id>0).select()
``:code
where ``._id`` is a reference to the primary key of the table. Normally ``db.thing._id`` is the same as ``db.thing.id`` and we will assume that in most of this book. ``_id``:inxx
For each Row of things it is possible to fetch not just fields from the selected table (thing) but also from linked tables (recursively):
``
>>> for thing in things: print thing.name, thing.owner.name
``:code
Here ``thing.owner.name`` requires one database select for each thing in things, and it is therefore inefficient. We suggest using joins whenever possible instead of recursive selects; nevertheless, this is convenient and practical when accessing individual records.
You can also do it backwards, by selecting the things referenced by a person:
``
person = db.person(id)
for thing in person.thing.select(orderby=db.thing.name):
    print person.name, 'owns', thing.name
``:code
In this last expression ``person.thing`` is a shortcut for
``
db(db.thing.owner==person.id)
``:code
i.e. the Set of ``thing``s referenced by the current ``person``. This syntax breaks down if the referencing table has multiple references to the referenced table. In this case one needs to be more explicit and use a full Query.
#### Serializing ``Rows`` in views
Given the following action containing a query
``SQLTABLE``:inxx
``
def index():
return dict(rows = db(query).select())
``:code
The result of a select can be displayed in a view with the following syntax:
``
{{extend 'layout.html'}}
<h1>Records</h1>
{{=rows}}
``:code
Which is equivalent to ``{{=SQLTABLE(rows)}}``, since serializing a Rows object in a view calls ``SQLTABLE`` automatically.
Due to Python restrictions in overloading the "``and``" and "``or``" operators, these cannot be used for forming queries; the binary operators "``&``" and "``|``" must be used instead.
It is also possible to build queries using in-place logical operators:
``
>>> query = db.person.name!='Alex'
>>> query &= db.person.id>3
>>> query |= db.person.name=='John'
``:code
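The composed query can then be used as usual, for example:
``
>>> rows = db(query).select(db.person.ALL)
``:code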
#### ``count``, ``isempty``, ``delete``, ``update``
You can count records in a set:
``count``:inxx ``isempty``:inxx
``
>>> print db(db.person.id > 0).count()
3
``:code
Notice that ``count`` takes an optional ``distinct`` argument which defaults to False, and it works very much like the same argument for ``select``. ``count`` also has a ``cache`` argument that works very much like the equivalent argument of the ``select`` method.
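A sketch of both options (assuming the usual web2py ``cache`` object and its ``(cache_model, seconds)`` convention):
``
>>> print db(db.person.id > 0).count(distinct=db.person.name)
>>> print db(db.person.id > 0).count(cache=(cache.ram, 3600))
``:code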
Sometimes you may need to check whether a table is empty. A more efficient way than counting is using the ``isempty`` method:
``
>>> print db(db.person.id > 0).isempty()
False
``:code
or equivalently:
``
>>> print db(db.person).isempty()
False
``:code
You can delete records in a set:
``delete``:inxx
``
>>> db(db.person.id > 3).delete()
``:code
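The lines below refer to a lazy virtual field on an ``item`` table; a minimal sketch of such a definition (the ``item`` table and its ``unit_price`` and ``quantity`` fields are assumptions) is:
``
>>> db.define_table('item',
        Field('unit_price', 'double'),
        Field('quantity', 'integer'))
>>> # a lazy field: computed on demand, with an optional discount percentage
>>> db.item.total_price = Field.Lazy(
        lambda row, discount=0:
            row.item.unit_price * row.item.quantity * (100 - discount) / 100)
``:code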
In this case ``row.total_price`` is not a value but a function. The function takes the same arguments as the function used to define the lazy field, except for the row, which is passed implicitly.
The lazy field in the example above allows one to compute the total price for each ``item``:
``
>>> for row in db(db.item).select(): print row.total_price()
``:code
And it also allows to pass an optional ``discount`` percentage (15%):
``
>>> for row in db(db.item).select(): print row.total_price(15)
``:code
------
Mind that virtual fields do not have the same attributes as other fields (``default``, ``readable``, ``requires``, etc.), they do not appear in the list of ``db.table.fields``, and they are not visualized by default in tables (TABLE) and grids (SQLFORM.grid, SQLFORM.smartgrid).
------
### One to many relation
``one to many``:inxx
To illustrate how to implement one to many relations with the web2py DAL, define another table "thing" that refers to the table "person" which we redefine here:
``
>>> db.define_table('person',
Field('name'),
format='%(name)s')
>>> db.define_table('thing',
Field('name'),
Field('owner', 'reference person'),
format='%(name)s')
``:code
Table "thing" has two fields, the name of the thing and the owner of the thing. When a field type is another table, it is intended that the field reference the other table by its id. In fact, you can print the actual type value and get:
``
>>> print db.thing.owner.type
reference person
``:code
Now, insert three things, two owned by Alex and one by Bob:
``
>>> db.thing.insert(name='Boat', owner=1)
1
>>> db.thing.insert(name='Chair', owner=1)
2
>>> db.thing.insert(name='Shoes', owner=2)
3
``:code
You can select as you did for any other table:
``
>>> for row in db(db.thing.owner==1).select():
        print row.name
Boat
Chair
``:code
Because a thing has a reference to a person, a person can have many things, so a record of table person now acquires a new attribute thing, which is a Set, that defines the things of that person. This allows looping over all persons and fetching their things easily:
``referencing``:inxx
``
>>> for person in db().select(db.person.ALL):
        print person.name
        for thing in person.thing.select():
            print ' ', thing.name
Alex
  Boat
  Chair
Bob
  Shoes
Carl
``:code
#### Inner joins
Another way to achieve a similar result is by using a join, specifically an INNER JOIN. web2py performs joins automatically and transparently when the query links two or more tables as in the following example:
``Rows``:inxx ``inner join``:inxx ``join``:inxx
``
>>> rows = db(db.person.id==db.thing.owner).select()
>>> for row in rows:
        print row.person.name, 'has', row.thing.name
Alex has Boat
Alex has Chair
Bob has Shoes
``:code
Observe that web2py did a join, so the rows now contain two records, one from each table, linked together. Because the two records may have fields with conflicting names, you need to specify the table when extracting a field value from a row. This means that while before you could do:
``
row.name
``:code
and it was obvious whether this was the name of a person or a thing, in the result of a join you have to be more explicit and say:
``
row.person.name
``:code
or:
``
row.thing.name
``:code
There is an alternative syntax for INNER JOINs:
``
>>> rows = db(db.person).select(join=db.thing.on(db.person.id==db.thing.owner))
>>> for row in rows:
        print row.person.name, 'has', row.thing.name
Alex has Boat
Alex has Chair
Bob has Shoes
``:code
While the output is the same, the generated SQL in the two cases can be different. The latter syntax removes possible ambiguities when the same table is joined twice and aliased:
``
>>> db.define_table('thing',
Field('name'),
Field('owner1','reference person'),
Field('owner2','reference person'))
>>> rows = db(db.person).select(
        join=[db.person.with_alias('owner1').on(db.person.id==db.thing.owner1),
              db.person.with_alias('owner2').on(db.person.id==db.thing.owner2)])
``:code
The value of ``join`` can be a list of ``db.table.on(...)`` expressions to join.
#### Left outer join
Notice that Carl did not appear in the list above because he has no things. If you intend to select on persons (whether they have things or not) and their things (if they have any), then you need to perform a LEFT OUTER JOIN. This is done using the argument "left" of the select command. Here is an example:
``Rows``:inxx ``left outer join``:inxx ``outer join``:inxx
``
>>> rows = db().select(
        db.person.ALL, db.thing.ALL,
        left=db.thing.on(db.person.id==db.thing.owner))
>>> for row in rows:
        print row.person.name, 'has', row.thing.name
Alex has Boat
Alex has Chair
Bob has Shoes
Carl has None
``:code
where:
``
left = db.thing.on(...)
``:code
does the left join query. Here the argument of ``db.thing.on`` is the condition required for the join (the same used above for the inner join). In the case of a left join, it is necessary to be explicit about which fields to select.
Multiple left joins can be combined by passing a list or tuple of ``db.mytable.on(...)`` to the ``left`` attribute.
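For example, the left join above can equivalently be passed as a one-element list, to which further ``on`` clauses may be appended:
``
>>> rows = db().select(db.person.ALL, db.thing.ALL,
        left=[db.thing.on(db.person.id==db.thing.owner)])
``:code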
#### Grouping and counting
When doing joins, sometimes you want to group rows according to certain criteria and count them. For example, count the number of things owned by every person. web2py allows this as well. First, you need a count operator. Second, you want to join the person table with the thing table by owner. Third, you want to select all rows (person + thing), group them by person, and count them while grouping:
``grouping``:inxx
``
>>> count = db.person.id.count()
>>> for row in db(db.person.id==db.thing.owner).select(
db.person.name, count, groupby=db.person.name):
print row.person.name, row[count]
Alex 2
Bob 1
``:code
Notice the count operator (which is built-in) is used as a field. The only issue here is in how to retrieve the information. Each row clearly contains a person and the count, but the count is not a field of a person nor is it a table. So where does it go? It goes into the storage object representing the record with a key equal to the query expression itself. The count method of the Field object has an optional ``distinct`` argument. When set to ``True`` it specifies that only distinct values of the field in question are to be counted.
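For example, here is a sketch that counts only distinct thing names per person:
``
>>> count = db.thing.name.count(distinct=True)
>>> for row in db(db.person.id==db.thing.owner).select(
        db.person.name, count, groupby=db.person.name):
        print row.person.name, row[count]
``:code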
### Many to many
``many-to-many``:inxx
In the previous examples, we allowed a thing to have one owner but one person could have many things. What if Boat was owned by Alex and Curt? This requires a many-to-many relation, and it is realized via an intermediate table that links a person to a thing via an ownership relation.
Here is how to do it:
``
>>> db.define_table('person',
Field('name'))
>>> db.define_table('thing',
Field('name'))
>>> db.define_table('ownership',
Field('person', 'reference person'),
Field('thing', 'reference thing'))
``:code
The existing ownership relations can now be rewritten as:
``
>>> db.ownership.insert(person=1, thing=1) # Alex owns Boat
>>> db.ownership.insert(person=1, thing=2) # Alex owns Chair
>>> db.ownership.insert(person=2, thing=3) # Bob owns Shoes
``:code
Now you can add the new relation that Curt co-owns Boat:
``
>>> db.ownership.insert(person=3, thing=1) # Curt owns Boat too
``:code
Because you now have a three-way relation between tables, it may be convenient to define a new set on which to perform operations:
``
>>> persons_and_things = db(
(db.person.id==db.ownership.person) \
& (db.thing.id==db.ownership.thing))
``:code
Now it is easy to select all persons and their things from the new Set:
``
>>> for row in persons_and_things.select():
        print row.person.name, row.thing.name
Alex Boat
Alex Chair
Bob Shoes
Curt Boat
``:code
Similarly, you can search for all things owned by Alex:
``
>>> for row in persons_and_things(db.person.name=='Alex').select():
        print row.thing.name
Boat
Chair
``:code
and all owners of Boat:
``
>>> for row in persons_and_things(db.thing.name=='Boat').select():
print row.person.name
Alex
Curt
``:code
A lighter alternative to many-to-many relations is tagging. Tagging is discussed in the context of the ``IS_IN_DB`` validator. Tagging works even on database backends that do not support JOINs, like the Google App Engine NoSQL.
### Many to many, ``list:<type>``, and ``contains``
``list:string``:inxx
``list:integer``:inxx
``list:reference``:inxx
``contains``:inxx
``multiple``:inxx
``tags``:inxx
web2py provides the following special field types:
``
list:string
list:integer
list:reference <table>
``:code
Let's define another table "log" to store security events, their event_time and severity:
``
>>> db.define_table('log', Field('event'),
        Field('event_time', 'datetime'),
        Field('severity', 'integer'))
``:code
As before, insert a few events, a "port scan", an "xss injection" and an "unauthorized login".
For the sake of the example, you can log events with the same event_time but with different severities (1, 2, 3 respectively).
``
>>> import datetime
>>> now = datetime.datetime.now()
>>> print db.log.insert(
event='port scan', event_time=now, severity=1)
1
>>> print db.log.insert(
event='xss injection', event_time=now, severity=2)
2
>>> print db.log.insert(
event='unauthorized login', event_time=now, severity=3)
3
``:code
#### ``like``, ``regexp``, ``startswith``, ``contains``, ``upper``, ``lower``
``like``:inxx ``startswith``:inxx ``regexp``:inxx ``contains``:inxx ``upper``:inxx ``lower``:inxx
Fields have a like operator that you can use to match strings:
``
>>> for row in db(db.log.event.like('port%')).select():
print row.event
port scan
``:code
Here "port%" indicates a string starting with "port". The percent sign character, "%", is a wild-card character that means "any sequence of characters".
The ``like`` operator is case-insensitive, but it can be made case-sensitive with:
``
db.mytable.myfield.like('value', case_sensitive=True)
``:code
web2py also provides some shortcuts:
``
db.mytable.myfield.startswith('value')
db.mytable.myfield.contains('value')
``:code
which are equivalent respectively to
``
db.mytable.myfield.like('value%')
db.mytable.myfield.like('%value%')
``:code
Notice that ``contains`` has a special meaning for ``list:<type>`` fields and it was discussed in a previous section.
The ``contains`` method can also be passed a list of values and an optional boolean argument ``all`` to search for records that contain all values:
``
db.mytable.myfield.contains(['value1','value2'], all=True)
``:code
or any value from the list:
``
db.mytable.myfield.contains(['value1','value2'], all=False)
``:code
There is also a ``regexp`` method that works like the ``like`` method but allows regular expression syntax for the look-up expression. It is only supported by PostgreSQL and SQLite.
The ``upper`` and ``lower`` methods allow you to convert the value of the field to upper or lower case, and you can also combine them with the like operator:
``upper``:inxx ``lower``:inxx
``
>>> for row in db(db.log.event.upper().like('PORT%')).select():
print row.event
port scan
``:code
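``lower`` works symmetrically, for example:
``
>>> for row in db(db.log.event.lower().like('%scan')).select():
        print row.event
port scan
``:code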
#### ``year``, ``month``, ``day``, ``hour``, ``minutes``, ``seconds``
``hour``:inxx ``minutes``:inxx ``seconds``:inxx ``day``:inxx ``month``:inxx ``year``:inxx
The date and datetime fields have day, month and year methods. The datetime and time fields have hour, minutes and seconds methods. Here is an example:
``
>>> for row in db(db.log.event_time.year()==2009).select():
print row.event
port scan
xss injection
``:code

#### ``belongs``
The SQL IN operator is realized via the ``belongs`` method, which returns true when the value of the field belongs to the specified set (list or tuple):
``belongs``:inxx
``
>>> for row in db(db.log.severity.belongs((1, 2))).select():
print row.event
port scan
xss injection
``:code
The DAL also allows a nested select as the argument of the belongs operator. The only caveat is that the nested select has to be a ``_select``, not a ``select``, and only one field has to be selected explicitly, the one that defines the set.
``nested select``:inxx
``
>>> bad_days = db(db.log.severity==3)._select(db.log.event_time)
>>> for row in db(db.log.event_time.belongs(bad_days)).select():
print row.event
port scan
xss injection
unauthorized login
``:code
In those cases where a nested select is required and the look-up field is a reference, we can also use a query as the argument. For example:
``
db.define_table('person', Field('name'))
db.define_table('thing', Field('name'), Field('owner', 'reference person'))
db(db.thing.owner.belongs(db.person.name=='Johnathan')).select()
``:code
In this case it is obvious that the nested select only needs the field referenced by the ``db.thing.owner`` field, so we do not need the more verbose ``_select`` notation.

#### ``sum``, ``avg``, ``min``, ``max`` and ``len``
``sum``:inxx ``avg``:inxx ``min``:inxx ``max``:inxx
Previously, you have used the count operator to count records. Similarly, you can use the sum operator to add (sum) the values of a specific field from a group of records. As in the case of count, the result of a sum is retrieved via the storage object:
``
>>> sum = db.log.severity.sum()
>>> print db().select(sum).first()[sum]
6
``:code
You can also use ``avg``, ``min``, and ``max`` to get the average, minimum, and maximum value respectively for the selected records. For example:
``
>>> max = db.log.severity.max()
>>> print db().select(max).first()[max]
3
``:code
``.len()`` computes the length of string, text, and boolean fields.
Expressions can be combined to form more complex expressions. For example, here we are computing the sum of the lengths of all the severity strings in the logs, increased by one:
``
>>> sum = (db.log.severity.len()+1).sum()
>>> print db().select(sum).first()[sum]
``:code
#### Substrings
One can build an expression to refer to a substring. For example, we can group things whose name starts with the same three characters and select only one from each group:
``
db(db.thing).select(distinct=db.thing.name[:3])
``:code
#### Default values with ``coalesce`` and ``coalesce_zero``
There are times when you need to pull a value from the database but also need a default value if the value for a record is set to NULL. In SQL there is a keyword, ``COALESCE``, for this. web2py has an equivalent ``coalesce`` method:
``
>>> db.define_table('sysuser',Field('username'),Field('fullname'))
>>> db.sysuser.insert(username='max',fullname='Max Power')
>>> db.sysuser.insert(username='tim',fullname=None)
>>> print db(db.sysuser).select(db.sysuser.fullname.coalesce(db.sysuser.username))
"COALESCE(sysuser.fullname,sysuser.username)"
Max Power
tim
``:code
Other times you need to compute a mathematical expression, but some fields have a value set to None when it should be zero.
``coalesce_zero`` comes to the rescue by defaulting None to zero in the query.
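Here is a minimal sketch (the ``sysaccount`` table and its values are hypothetical):
``
>>> db.define_table('sysaccount', Field('balance', 'integer'))
>>> db.sysaccount.insert(balance=2500)
>>> db.sysaccount.insert(balance=None)
>>> total = db.sysaccount.balance.coalesce_zero().sum()
>>> print db(db.sysaccount).select(total).first()[total]
2500
``:code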
And finally, here is ``_update``: ``_update``:inxx
``
>>> print db(db.person.name=='Alex')._update()
UPDATE person SET  WHERE person.name='Alex';
``:code
-----
Moreover you can always use ``db._lastsql`` to return the most recent SQL code, whether it was executed manually using executesql or was SQL generated by the DAL.
-----
### Exporting and importing data
``export``:inxx ``import``:inxx
#### CSV (one Table at a time)
When a DALRows object is converted to a string it is automatically serialized in CSV:
``csv``:inxx
``
>>> rows = db(db.person.id==db.thing.owner).select()
>>> print rows
person.id,person.name,thing.id,thing.name,thing.owner
1,Alex,1,Boat,1
1,Alex,2,Chair,1
2,Bob,3,Shoes,2
``:code
You can serialize a single table in CSV and store it in a file "test.csv":
``
>>> open('test.csv', 'w').write(str(db(db.person.id).select()))
``:code
and you can easily read it back with:
``
>>> db.person.import_from_csv_file(open('test.csv', 'r'))
``:code
When importing, web2py looks for the field names in the CSV header. In this example, it finds two columns: "person.id" and "person.name". It ignores the "person." prefix, and it ignores the "id" fields. Then all records are appended and assigned new ids. Both of these operations can be performed via the appadmin web interface.
#### CSV (all tables at once)
In web2py, you can backup/restore an entire database with two commands:
To export:
``
db.export_to_csv_file(open('somefile.csv', 'wb'))
``:code
To import:
``
db.import_from_csv_file(open('somefile.csv', 'rb'))
``:code
Two tables are separated by ``\r\n\r\n``. The file ends with the line:
``
END
``:code
The file does not include uploaded files if these are not stored in the database. In any case it is easy enough to zip the "uploads" folder separately.
When importing, the new records will be appended to the database if it is not empty. In general the new imported records will not have the same record id as the original (saved) records but web2py will restore references so they are not broken, even if the id values may change.
If a table contains a field called "uuid", this field will be used to identify duplicates. Also, if an imported record has the same "uuid" as an existing record, the previous record will be updated.
#### CSV and remote database synchronization
Consider the following model:
``
db = DAL('sqlite:memory:')
db.define_table('person',
Field('name'),
format='%(name)s')
db.define_table('thing',
Field('owner', 'reference person'),
Field('name'),
format='%(name)s')
if not db(db.person).count():
id = db.person.insert(name="Massimo")
db.thing.insert(owner=id, name="Chair")
``:code
Each record is identified by an ID and referenced by that ID. If you have two copies of the database used by distinct web2py installations, the ID is unique only within each database and not across the databases. This is a problem when merging records from different databases.
In order to make records uniquely identifiable across databases, they must:
- have a unique id (UUID),
- have an event_time (to figure out which one is more recent if multiple copies),
- reference the UUID instead of the id.
This can be achieved without modifying web2py. Here is what to do:
**1.** Change the above model into:
``
db.define_table('person',
Field('uuid', length=64, default=lambda:str(uuid.uuid4())),
Field('modified_on', 'datetime', default=now),
Field('name'),
format='%(name)s')
db.define_table('thing',
Field('uuid', length=64, default=lambda:str(uuid.uuid4())),
Field('modified_on', 'datetime', default=now),
Field('owner', length=64),
Field('name'),
format='%(name)s')
db.thing.owner.requires = IS_IN_DB(db,'person.uuid','%(name)s')
if not db(db.person.id).count():
id = uuid.uuid4()
db.person.insert(name="Massimo", uuid=id)
db.thing.insert(owner=id, name="Chair")
``:code
-------
Note, in the above table definitions, the default value for the two 'uuid' fields is set to a lambda function, which returns a UUID (converted to a string). The lambda function is called once for each record inserted, ensuring that each record gets a unique UUID, even if multiple records are inserted in a single transaction.
-------
**2.** Create a controller action to export the database:
``
import StringIO  # the in-memory buffer used to build the CSV response
def export():
    s = StringIO.StringIO()
db.export_to_csv_file(s)
response.headers['Content-Type'] = 'text/csv'
return s.getvalue()
``:code
**3.** Create a controller action to import a saved copy of the other database and sync records:
``
def import_and_sync():
    form = FORM(INPUT(_type='file', _name='data'), INPUT(_type='submit'))
    if form.process().accepted:
        db.import_from_csv_file(form.vars.data.file, unique=False)
        # for every table, for every uuid, delete all but the latest record
        for table in db.tables:
            items = db(db[table]).select(db[table].id, db[table].uuid,
                                         orderby=db[table].modified_on,
                                         groupby=db[table].uuid)
            for item in items:
                db((db[table].uuid==item.uuid)&(db[table].id!=item.id)).delete()
    return dict(form=form)
``:code
Notice that steps 2 and 3 work for every database model; they are not specific for this example.
``XML-RPC``:inxx
Alternatively, you can use XML-RPC to export/import the file.
If the records reference uploaded files, you also need to export/import the content of the uploads folder. Notice that files therein are already labeled by UUIDs so you do not need to worry about naming conflicts and references.
#### HTML and XML (one Table at a time)
``DALRows objects``:inxx
DALRows objects also have an ``xml`` method (like helpers) that serializes them to XML/HTML:
``HTML``:inxx
``
>>> rows = db(db.person.id==db.thing.owner).select()
>>> print rows.xml()
<table>
<thead>
<tr>
<th>person.id</th>
<th>person.name</th>
<th>thing.id</th>
<th>thing.name</th>
<th>thing.owner</th>
</tr>
</thead>
<tbody>
<tr class="even">
<td>1</td>
<td>Alex</td>
<td>1</td>
<td>Boat</td>
<td>1</td>
</tr>
...
</tbody>
</table>
``:code