The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.
It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.
A database is a collection of tables. A table has rows of data with a specific data type (e.g., string, float) for each field or column. A database engine provides an interface to the database, using SQL statements (Structured Query Language). Python 2.5+ comes bundled with the SQLite engine. The MySQL engine requires the MySQL-Python bindings. Note that a 32-bit Python requires a 32-bit MySQL.
The Database()
constructor creates (if necessary) and returns an SQLITE
or MYSQL
database. With SQLITE
, it will create a file with the given name in the current folder.
db = Database(
name,
host = 'localhost',
port = 3306,
username = 'root',
password = '',
type = SQLITE
)
db.type # SQLITE | MYSQL
db.name # Database name.
db.host # Database host (MySQL).
db.port # Database port (MySQL).
db.username # Database username (MySQL).
db.password # Database password (MySQL).
db.tables # Dictionary of (name, Table)-items.
db.relations # List of relations, see Database.link().
db.query # Last executed SQL query.
db.connected # True after Database.connect().
db.connect() # Happens automatically.
db.disconnect()
db.create(table, fields=[])
db.remove(table)
db.link(table1, field1, table2, field2, join=LEFT)
db.execute(SQL, commit=False)
db.commit()
db.escape(value) # "a cat's tail" => "'a cat\'s tail'"
Database.execute()
returns an iterator of rows for the given SQL query.Database.commit()
commits the changes of pending INSERT
, UPDATE
, DELETE
queries.Database.escape()
safely quotes and escapes field values.Database.create()
creates a new table in the database, It takes a table name and a list of row fields, where each field is defined with the field()
function. Each field has a name
(a-z + underscores) and a type
, with an optional default
value for new rows. The pk()
function can be used for primary keys.
field(name, type=STRING, default=None, index=False, optional=True)
pk(name='id') # field('id', INTEGER, index=PRIMARY, optional=False)
*Type* *Value* *Example* `STRING` `str`, `unicode` (1-255 characters) `u'Schrödinger'` `INTEGER` `int` `42` `FLOAT` `float` `3.14159` `TEXT` `str`, `unicode` `open('file.txt').read() ` `BLOB` `str` (binary, e.g., PDF, PNG) `db.binary(open('img.jpg',` `'rb').read())` `BOOLEAN` `bool` `True`, `False` `DATE` `Date` `date('1999-12-31 23:59:59')`
A STRING
field can contain up to a 100 characters. The length (1-255) can be changed by calling STRING
as a function, e.g., type=STRING(255)
. For longer strings, use TEXT
. The default value for a DATE
field is NOW
.
With index=True
, the field is indexed for faster search. The index can also be set to UNIQUE
(no duplicates) or PRIMARY
. A table must have a primary key field that uniquely identifies each row (i.e., an id). Integer primary keys are auto-numbered, there is no need to set the value manually in new rows.
With optional=True
, the field is allowed to contain None
.
>>> from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW
>>>
>>> db = Database('my_stuff')
>>> db.create('pets', fields=(
>>> pk(),
>>> field('name', STRING(80), index=True),
>>> field('type', STRING(20)),
>>> field('tail', BOOLEAN),
>>> field('date_birth', DATE, default=None),
>>> field('date_created', DATE, default=NOW)
>>> ))
>>> db.pets.append(name=u'Schrödinger', type='cat', tail=True)
>>> print db.pets.rows()[0]
(1, u'Schrödinger', u'cat', True, None, Date('2013-12-11 10:09:08'))
Database.create()
can also take a Table.xml
or Query.xml
. It creates a new table and copies the row data in the given XML string. An optional name
parameter can be used to rename the new table. In Query.xml
, a field name may contain a period. It will be replaced with an underscore (e.g., pets.name → pets_name). Alternatively, an alias can be defined in the Query.aliases
dictionary.
A Table
is a list of rows, with one or more fields (i.e., table columns) of a certain type (i.e., string or number). A new table can be created with Database.create()
. A TableError
is raised if a table with the given name exists. An existing table can be retrieved with Database.tables\[name\]
, Database\[name\]
or Database.<name>
.
table = Database.tables[name]
table.db # Parent Database.
table.name # Table name (a-z + underscores).
table.fields # List of field names (i.e., columns).
table.schema # Dictionary of (field, Schema)-items.
table.default # Dictionary of (field, value)-items for new rows.
table.pk # Primary key field name.
table.count() # Total number of rows (len(table) also works).
table.rows() # List of rows, each a tuple of fields.
table.record(row) # Dictionary of (field, value)-items for given row.
table.append(fields={}, commit=True)
table.update(id, fields={}, commit=True)
table.remove(id, commit=True)
table.filter(*args, **kwargs)
table.search(*args, **kwargs)
table.xml # XML string with the table schema and rows.
table.datasheet # Datasheet object (see below).
Table.rows()
returns a list of all rows. To iterate rows memory-efficiently, use iter(``Table)
.Table.append()
, update()
and remove()
modify the table contents.commit=False
, changes are only committed after Database.commit()
(= faster in batch).Table.filter()
returns a subset of rows with a subset of fields.table.filter('name',
type='cat')
.The Table.schema
dictionary contains field name → Schema
items.
schema = Table.schema[fieldname]
schema.name # Field name.
schema.type # STRING, INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, DATE
schema.length # STRING field length.
schema.default # Default value.
schema.index # PRIMARY | UNIQUE | True | False
schema.optional # True or False.
>>> from pattern.db import Database
>>>
>>> db = Database('my_stuff')
>>>
>>> print db.pets.fields
>>> print db.pets.schema['name'].type
>>> print db.pets.schema['name'].length
['id', 'name', 'tail', 'date_birth', 'date_created']
STRING
80
Table.append()
adds a new row with the given field values. It returns the row id, if the table has a primary key generated with pk()
. Field values can be given as optional parameters, a dictionary or a tuple. Field values for a BLOB
field must be wrapped in Database.binary()
.
>>> db.pets.append(name=u'Schrödinger', date_birth=date('2009-08-12'))
>>> db.pets.append({'name': u'Schrödinger', 'date_birth': date('2009-08-12')})
>>> db.pets.append((u'Schrödinger', 'cat', True, date('2009-08-12')) # in-order
Table.update()
updates values in the row with the given primary key. A batch of rows can be updated using a filter, or a chain of filters with any()
or all()
. In the last example, all rows with type='cat'
will have their tail
field set to True
.
>>> db.pets.update(1, type='cat') # set type='cat' in row with id=1.
>>> db.pets.update(1, {'type': 'cat'})
>>> db.pets.update(eq('type', 'cat'), tail=True)
Table.remove()
removes the row with the given primary key:
>>> db.pets.remove(1)
>>> db.pets.remove(ALL)
>>> db.pets.remove(all(eq('type', 'cat'), lt(year('date_birth'), 1990, '<')))
The last example removes all rows that have type='cat'
AND year of birth before 1990.
Table.filter()
returns a list of rows filtered by field value(s), where each row is a tuple of fields. The first parameter defines which fields to return. It can be a single field name, a list of field names or ALL
. The following parameters are optional and define field constraints. They can also be given as a dictionary:
>>> db.pets.filter('name') # all rows, name
>>> db.pets.filter(('id', 'name')) # all rows, name + id
>>> db.pets.filter(ALL, type='cat') # type='cat', all fields
>>> db.pets.filter(ALL, type=('cat', 'dog')) # type='cat' OR type='dog'
>>> db.pets.filter(ALL, type='*at') # type='cat' OR 'hat' OR 'brat', ...
>>> db.pets.filter(ALL, type='cat', tail=True) # type='cat' AND tail=True
>>> db.pets.filter('id', {'type': 'cat', 'tail': True})
More complex queries can be constructed with a Query
.
Table.search()
returns a new Query
with options for filtering, sorting and ordering rows by field value(s). It can include fields from other, related tables.
query = Table.search(
fields = ALL,
filters = [],
relations = [],
sort = None,
order = ASCENDING,
group = None,
function = FIRST,
range = None
)
query.table # Parent Table.
query.fields # Field name, list of field names, or ALL.
query.aliases # Dictionary of (field name, alias)-items.
query.filters # List of filter() objects.
query.relations # List of rel() objects.
query.sort # Field name or list of field names.
query.order # ASCENDING | DESCENDING
query.group # Field name or list of field names.
query.function # FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, CONCATENATE
query.range # (start, stop)-tuple, e.g. rows 11-20.
query.sql() # SQL string, can be used with Database.execute().
query.rows() # List of rows, each a tuple of fields.
query.record(row) # Dictionary of (field, value)-items for given row.
query.xml # XML string with the query schema and rows.
To iterate rows memory-efficiently, use iter(Query)
instead of Query.rows()
.
The filter()
function creates a field-value constraint that matches certain rows in a table. A list of filters can be passed to the filters
parameter of a Query
.
filter(field, value, comparison='=')
*Comparison* *Description* *Example* *Alias* `=` equal to `filter('type',` `('cat',` `'dog'),` `'=') ` ` eq()` `i=` equal to (case-insensitive) `filter('name',` `'tig*',` `'i=') ` ` eqi()` `!=` not equal to `filter('name',` `'*y',` `'!=')` ` ne()` `>` greater than `filter('weight',` `10,` `'>') ` ` gt()` `<` less than `filter('weight',` `10,` `'<') ` ` lt()` `>=` greater than or equal to `filter(year('date'),` `1999,` `'>=') ` ` gte()` `<=` less than or equal to `filter(year('date'),` `2002,` `'<=')` ` lte()` `:` between (inclusive) `filter(year('date'),` `(1999,` `2002),` `':')` ` rng()`
The field name of a DATE
field can be passed to the year()
, month()
, day()
, hour()
, minute()
or second()
function.The short aliases of filter()
have a preset comparison operator.
Filters can be chained together. The all()
function returns a list with AND logic. The any()
function returns a list with OR logic. In the example below, the first query matches all cats named Taxi. The second and third query match any pet that is cat OR that is named Taxi.
all(filter1, filter2, ...) # Rows must match ALL of the filters.
any(filter1, filter2, ...) # Rows must match ANY of the filters.
>>> from pattern.db import Database, eq, all, any
>>>
>>> db = Database('my_stuff')
>>>
>>> db.pets.search(filters=all(eq('name', 'Taxi'), eq('type', 'cat')))
>>> db.pets.search(filters=any(eq('name', 'Taxi'), eq('type', 'cat')))
>>> db.pets.search(filters=any(name='Taxi', type='cat'))
Lists created with all()
and any()
can be nested to define complex search criteria. The example below matches all pets that are cats, and whose name starts with Fluff- OR ends with a -y:
>>> f = any(eq('name', 'Fluff*'), eq('name', '*y')) # OR
>>> f = all(eq('type', 'cat'), f) # AND
>>>
>>> for row in db.pets.search(filters=f):
>>> print row
The syntax can even be more concise:
>>> for row in db.pets.search(filters=all(name=('Fluff*', '*y'), type='cat')):
>>> print row
The rel()
function defines a relation between two fields in different tables (usually id's).
rel(field1, field2, table, join=LEFT) # LEFT | INNER
The optional join
parameter defines how rows are matched. LEFT
takes all rows from the base table, with additional fields from the related table. For a row with no match between field1
and field2
, these fields have value None
. INNER
takes the subset of rows that have a match between field1
and field2
.
A well-known example is a database app that processes invoices. Say we have a products table and an orders table. Each order has a product id – instead of all product details. Each product id can occur in multiple orders. This approach is called database normalization. It avoids duplicate data. To generate an invoice, we can combine product details and order details using a query relation.
The following example demonstrates a simple products + customers + orders database app:
products
id
name
price
1
pizza
15
2
garlic bread
3
customers
id
name
1
Schrödinger
2
Hofstadter
orders
id
product
customer
1
1
2
>>> from pattern.db import Database, field, pk, INTEGER as I
>>>
>>> db = Database('pizza_delivery')
>>>
>>> db.create( 'products', (pk(), field('name'), field('price', I)))
>>> db.create('customers', (pk(), field('name')))
>>> db.create( 'orders', (pk(), field('product', I), field('customer', I)))
Add products and customers. Pizza delivery is open for business!
>>> db.products.append(name='pizza', price=15)
>>> db.products.append(name='garlic bread', price=3)
>>> db.customers.append(name=u'Schrödinger')
>>> db.customers.append(name=u'Hofstadter')
Hofstadter orders a pizza.
>>> db.orders.append(product=1, customer=2)
An orders query with relations to products and customers generates a human-readable invoice:
>>> from pattern.db import Database, rel
>>>
>>> db = Database('pizza_delivery')
>>>
>>> f = ('orders.id', 'customers.name', 'products.name', 'products.price')
>>> q = db.orders.search(f, relations=(
>>> rel('orders.customer', 'customers.id', 'customers'),
>>> rel('orders.product', 'products.id', 'products'))
>>> )
>>> for row in q:
>>> print q.record(row)
{ 'orders.id' : 1,
'customers.name' : u'Hofstadter',
'products.name' : u'pizza',
'products.price' : 15 }
If a relation is used repeatedly, define it once with Database.link()
. It will be available in every Query
.
A Query
has an optional parameter group
that can be used to merge rows on duplicate field values. The given function
is applied to the other fields. It can also be a list with a function for each field.
For example, to get the total revenue per ordered product:
>>> print db.orders.search(
>>> fields = ('products.name', 'products.price'),
>>> relations = rel('product', 'products.id', 'products'),
>>> group = 'products.name', # Merge orders with same product name.
>>> function = SUM # Sum of product prices.
>>> ).rows()
A Datasheet
is a matrix of rows and columns, where each row and column can be retrieved as a list. The data can be imported or exported as a CSV-file. Optionally, the given fields
is a list of (name,
type)
headers, where type
can be STRING
, TEXT
, INTEGER
, FLOAT
, BOOLEAN
, BLOB
or DATE
.
datasheet = Datasheet(rows=[], fields=None)
datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.rows # List of rows (each row = list of values).
datasheet.columns # List of columns (each column = list of values).
datasheet.fields # List of (name, type) column headers.
datasheet.<field> # List of column values.
datasheet[i] # Row at index i.
datasheet[i, j] # Value in row i at column j.
datasheet[i1:i2, j] # Slice of column j from rows i1-i2.
datasheet[i, j1:j2] # Slice of columns j1-j2 from row i.
datasheet[i1:i2, j1:j2] # Datasheet with columns j1-j2 from rows i1-i2.
datasheet[:] # Datasheet copy.
datasheet.insert(i, row, default=None)
datasheet.append(row, default=None)
datasheet.extend(rows, default=None)
datasheet.copy(rows=ALL, columns=ALL)
datasheet.group(j, function=FIRST, key=lambda v: v)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)
datasheet.json # JSON-formatted string.
Datasheet.insert()
and append()
fill missing columns with the default
value.Datasheet.columns.insert()
and append()
fill missing rows with the default
value.field
parameter can be used to supply a (name
, type
) column header.Datasheet.copy()
returns a new Datasheet
from a selective list of row and/or column indices.datasheet
=
flip(datasheet)
.For example:
>>> from pattern.db import Datasheet
>>>
>>> ds = Datasheet()
>>> ds.append((u'Schrödinger', 'cat'))
>>> ds.append((u'Hofstadter', 'cat'))
>>> ds.save('pets.csv')
>>>
>>> ds = Datasheet.load('pets.csv')
>>> print ds
[[u'Schrödinger', 'cat'],
[ u'Hofstadter', 'cat']]
Datasheet.group(j)
returns a new Datasheet
with unique values in column j
. It merges rows using a given function
that takes a list of column values and returns a single value. Predefined functions are FIRST
, LAST
, COUNT
, MIN
, MAX
, SUM
, AVG
, STDEV
and CONCATENATE
. It can also be a list of functions.
The optional key
can be used to compare the values in column j
. For example, lambda
date:
date.year
groups a column of Date
objects by year.
>>> from pattern.db import Datasheet, pprint
>>>
>>> ds = Datasheet(rows=[
>>> (1, u'Schrödinger', 'cat'),
>>> (2, u'Hofstadter', 'cat'),
>>> (3, u'Taxi', 'dog')
>>> ])
>>>
>>> g = ds.copy(columns=[2, 0]) # A copy with type & id.
>>> g = g.group(0, COUNT) # Group type, count rows per type.
>>> pprint(g, fill='')
cat 2
dog 1
Datasheet.columns\[j\].sort()
sorts the rows according to the values in column j
.Datasheet.columns.sort()
can be used to change the column order:
>>> ds.columns.sort(order=[0, 2, 1])
>>> pprint(ds, fill='')
1 cat Schrödinger
2 cat Hofstadter
3 dog Taxi
Datasheet.columns.swap(j1,j2)
swaps two individual columns with given indices.
Datasheet.save()
exports the matrix as a CSV file. Datasheet.load()
returns a Datasheet
from a given CSV file. CSV (comma-separated values) is a simple text format for tabular data, where each line is a row and each value is separated by a comma.
datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)
On export, all str
, int
, float
, bool
and Date
values are converted to Unicode. An encoder
can be given for other data types. On import, all values in the datasheet will be Unicode unless a decoder
is given.
With headers=True
, the Datasheet.fields
headers are exported and imported (first line in CSV). In this case, the data type for each column (STRING
, INTEGER
, FLOAT
, BOOLEAN
or DATE
) is explicitly known and no encoder
or decoder
is needed.
>>> from pattern.db import Datasheet, STRING, DATE, date
>>>
>>> ds = Datasheet(fields=(('name', STRING), ('date', DATE)))
>>> ds.append((u'Schrödinger', date('1887-08-12')))
>>> ds.append((u'Hofstadter', date('1945-02-15')))
>>>
>>> ds.save('pets.csv', headers=True)
>>>
>>> ds = Datasheet.load('pets.csv', headers=True)
>>> print ds[0]
[u'Schrödinger', Date('1887-08-12 00:00:00')]
The csv()
function can also be used instead of Datasheet.load()
:
>>> from pattern.db import csv
>>>
>>> for name, date in csv('pets.csv', separator=',', headers=True):
>>> print name, date
The date()
function returns a new Date
, a convenient subclass of Python's datetime.datetime
. It takes an integer (Unix timestamp), a string or NOW
. An optional string input format and output format can be given (e.g., "%d/%m/%y"
). The default output format is "YYYY-MM-DD hh:mm:ss"
.
d = date(int)
d = date(NOW, format=DEFAULT)
d = date(string)
d = date(string, format=DEFAULT)
d = date(string, inputformat, format=DEFAULT)
d = date(year, month, day, format=DEFAULT)
d = date(year, month, day, hours, minutes, seconds, format=DEFAULT)
d.year
d.month # 1-12
d.week # 1-52
d.weekday # 1-7
d.day # 1-31
d.minute # 1-60
d.second # 1-60
d.timestamp # Seconds elapsed since 1/1/1970.
If no string input format is given, a number of common formats will be tried:
*Format* *Example* `%Y-%m-%d %H:%M:%S` 2010-09-21 09:27:01 `%a, %d %b %Y %H:%M:%S %z` Tue, 9 Sep 2010 17:58:28 +0000 `%Y-%m-%dT%H:%M:%SZ` 2010-09-20T09:27:01Z `%Y-%m-%dT%H:%M:%S+0000` 2010-09-20T09:27:01+0000 `%Y-%m-%d %H:%M` 2010-09-20 09:27 `%Y-%m-%d` 2010-09-20 `%d/%m/%Y` 20/09/2010 `%d %B %Y` 9 september 2010 `%B %d %Y` September 9 2010 `%B %d, %Y` September 09, 2010All date formats used in pattern.web (e.g., Twitter search result) are automatically detected.
For an overview of date format syntax, see: http://docs.python.org/library/time.html#time.strftime.
Date calculations
The time()
function can be used to add or subtract time to a Date
:
time(days=0, seconds=0, minutes=0, hours=0)
>>> from pattern.db import date, time
>>>
>>> d = date('23 august 2011')
>>> d += time(days=2, hours=5)
>>> print type(d)
>>> print d
>>> print d.year, d.month, d.day
<class 'pattern.db.Date'>
2011-08-25 05:00:00
2011, 8, 25
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4