[pyddr-discuss] Query setup : Query.py

Frank Foeth foeth01 at orange.nl
Fri Mar 2 04:42:43 EST 2007


Hi Pavel,

> Why are we on PyDDR-discuss now?
See private communication.

...
> > Your ideas don't clash with what I wrote about the query set-up. What I
> > can do is build a basic query.py keeping in mind the interface, I need
> > it anyway, so I have to keep more things in mind with it. Building it
> > should not interfere with discussing the interface.
> 
> Agreed.

I only "finished" it this week. The documentation still needs work, but
I hope to finish it after taking a bit of distance from the code. (For
one thing, I should merge query.py and query-doc.txt.)

I'm thinking about removing the freeze option; the results are not very
impressive. Your opinion on this would be appreciated.

A large part of the file is testing code, not the best in the world I'm
afraid, but it saved me a lot of time when making changes. The tests start
automatically when you run `python query.py`. They include a (large) timing
test too, so the runtime is a bit long. Most of the important stuff is
tested, but I will have made oversights.

Yours,
Frank



-------------- next part --------------
A non-text attachment was scrubbed...
Name: query.py
Type: text/x-python
Size: 29655 bytes
Desc: not available
URL: <http://icculus.org/pipermail/pyddr-discuss/attachments/20070302/22f55d7b/attachment.py>
-------------- next part --------------
Provisional documentation query.py version 0.1


Function:
Provide a framework to get query results on objects in Python. It is designed to be usable from the Python shell, so users are somewhat protected from mistakes.

Status: functionality has been built and tested, could use some minor redesign though.


Basic outline:
The module defines a number of classes which could easily be used to implement the following language:

query ::= 'True' | and_list | or_list | and_list 'OR' or_list
and_list ::= statement ( 'AND' statement )*
or_list ::= statement ( 'OR' statement )*
statement ::= ('NOT')+ query | ('NOT')+ parameterstatement | 
              ('NOT')+ compoundstatement
parameterstatement ::= function'('parameter (',' constant)* ')'
compoundstatement ::= function'('parameter (',' parameter)* (',' cconstant)* ')'
parameter ::= sort_key'(' value'(' item ')'')'
constant  ::= sort_key'(' pcon ')'
cconstant ::= csort_key'(' pcon ')'

Where:
(X)+    zero or one repeats of X
(X)*    zero to many repeats of X
|       or
::=     defined as
'x'     literal x

User supplied information:
function, returns boolean
pcon, parameter value, stays constant during a query
sort_key, sort_key as used in python's sort()
csort_key, as sort_key, but user specifies it per constant
value, a function that returns an actual parameter value from an object
item, the queried object
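
As an illustration of how the user supplied pieces fit together in a parameterstatement, here is a minimal sketch in present-day Python. It is not taken from query.py: the helper check_parameter_statement, its argument names, and the song data are all hypothetical. It only assumes what the grammar says, namely that sort_key is applied both to the value taken from the item and to each constant before function is called.

```python
def check_parameter_statement(item, function, value, constants,
                              sort_key=None, negate=False):
    """Hypothetical helper: function(sort_key(value(item)), sort_key(pcon), ...)."""
    key = sort_key if sort_key is not None else (lambda v: v)
    result = function(key(value(item)), *[key(c) for c in constants])
    return (not result) if negate else bool(result)

# Made-up data: the queried items are plain dictionaries here.
songs = [{"title": "alpha"}, {"title": "Bravo"}, {"title": "zulu"}]
title = lambda song: song["title"]
in_range = lambda x, lo, hi: lo <= x <= hi

# With a sort_key the comparison ignores case ...
hits_keyed = [s["title"] for s in songs
              if check_parameter_statement(s, in_range, title, ["A", "M"],
                                           sort_key=str.lower)]
# ... without one, plain string ordering applies.
hits_plain = [s["title"] for s in songs
              if check_parameter_statement(s, in_range, title, ["A", "M"])]
```

With the sort_key both "alpha" and "Bravo" fall between "A" and "M"; without it only "Bravo" does, because lowercase letters sort after uppercase ones.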

Implementing objects:
QueryNameSpace : stores Parameter-s, Compound-s, and Queries, may store Statements
Parameter: stores definitions of  value  and  sort_key  that belong together
Compound: stores groups of Parameters, which are used together in one Statement
Statement: ParameterStatement, CompoundStatement, or QueryStatement
ParameterStatement: stores a parameter name, constants, a boolean function, and whether or not to negate the result of the function (= not_is_absent). 
CompoundStatement: stores a list of parameters, a list of tuples of a constant and the parameter it is bound to (or None), a boolean function and not_is_absent.
QueryStatement: stores a query and not_is_absent.
Query: stores the and_list, and the or_list in a local namespace.
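
The way a Query could combine its and_list and or_list can be sketched as follows. This is an assumption read off the grammar (query ::= 'True' | and_list | or_list | and_list 'OR' or_list), not the code in query.py; check_query and negate are hypothetical names, and statements are reduced to plain functions that return a boolean.

```python
def check_query(item, and_list=(), or_list=()):
    """Hypothetical: evaluate "and_list OR or_list" for a single item."""
    if not and_list and not or_list:
        return True  # the 'True' query matches every item
    # An empty and_list must not count as a match on its own.
    and_ok = bool(and_list) and all(stmt(item) for stmt in and_list)
    or_ok = any(stmt(item) for stmt in or_list)
    return and_ok or or_ok

def negate(stmt):
    """Wrap a statement so that its result is inverted (the 'NOT' prefix)."""
    return lambda item: not stmt(item)

is_even = lambda n: n % 2 == 0
is_small = lambda n: n < 10
is_hundred = lambda n: n == 100

numbers = [2, 3, 12, 100, 101]
# (is_even AND is_small) OR is_hundred
matches = [n for n in numbers
           if check_query(n, [is_even, is_small], [is_hundred])]
```

Here matches ends up as [2, 100]: 2 satisfies the whole and_list, and 100 satisfies the or_list.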

Examples of calls:
ns = QueryNameSpace()
ns.set("par1", Parameter(lambda x: x.info["par1"]))
par2 = Parameter(lambda x: x[7], lambda x: x[:2])
ns.set("par2", par2)
cmp1 = Compound(ns, ["par1", "par2"])
ns.set("cmp1", cmp1)
stp = ParameterStatement(ns, "par1", lambda x,min,max: min<=x<=max, [4,7], True)
  # basically: if the object's parameter value is VAL, the result is NOT 4<=VAL<=7
stp = ParameterStatement(ns, "par1", lambda x,min,max: min<=x<=max, [4,7])
  # same without NOT
spc = CompoundStatement(ns, "cmp1", 
      lambda x, y, xmin, xmax, ymin, ymax: (xmin, ymin)<=(x,y)<=(xmax, ymax),
      [(xmin,"par1"),(xmax,None),(ymin,"par2"),(ymax,"par2")], True)
  # please note that if par1 has no sort_key, (xmin,"par1") and (xmin,None)
  # are equivalent
q1 = Query()
q1.set(stp)
qst = QueryStatement(ns, q1, True)
qst = QueryStatement(ns, q1)
ns.set("q1",q1)
resultslist = [item for item in origlist if q1.check(item)]
  # the method check(item) is also provided by all Statements


Name space:
The main namespace is provided to allow changing of parameter definitions while developing queries. Suppose you created a query which depends on many other queries, which in turn depend largely on the same parameters. You entered them from the command line, but you made an error. Or you have two formulations to represent a parameter and you want to see which is best, but you don't want to rebuild the query tree. The main namespace ensures all queries use the same parameter definition.
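
The benefit of a shared namespace can be shown with a small sketch. It assumes that a statement stores only the name of its parameter and looks the definition up on every check; TinyNameSpace and TinyStatement are hypothetical stand-ins, not the classes in query.py.

```python
class TinyNameSpace:
    """Hypothetical stand-in for the main name space."""
    def __init__(self):
        self._entries = {}

    def set(self, name, obj):
        self._entries[name] = obj

    def get(self, name):
        return self._entries[name]

class TinyStatement:
    """Stores a parameter *name* and resolves it on every check()."""
    def __init__(self, ns, param_name, function, constants):
        self.ns = ns
        self.param_name = param_name
        self.function = function
        self.constants = constants

    def check(self, item):
        value = self.ns.get(self.param_name)  # looked up late, by name
        return bool(self.function(value(item), *self.constants))

ns = TinyNameSpace()
ns.set("length", lambda song: song["seconds"])        # mistake: seconds
short = TinyStatement(ns, "length", lambda x, limit: x <= limit, [3])

song = {"seconds": 150}
before = short.check(song)                            # compares 150 with 3

ns.set("length", lambda song: song["seconds"] / 60.0)  # corrected: minutes
after = short.check(song)                             # compares 2.5 with 3
```

The statement itself is never rebuilt; correcting the definition in the name space is enough to change its result from False to True.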

The local namespace in Query basically allows you to overwrite statements; it also prevents you from entering non-statements. Please note these statements do not need to be registered in the main name space. The local namespace has protection against building circular queries (a query that depends on itself would result in an endless loop).

After setting anything in the main name space, you can only overwrite it with an object of the same type. You cannot delete from the main name space. The local namespace is more forgiving: you are allowed deletions, and any statement can replace any statement.

Errors:
If you try to set objects of an unsupported type into a name space, a TypeError will be raised. The alternative, having set() report success, was rejected. Catching these errors is a waste of time, as the resulting queries will probably be buggy and yield wrong results. Running into these errors during development will probably aid debugging instead of hindering it.
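
The set() rules for the main name space can be sketched as below. StrictNameSpace and the two empty classes are hypothetical stand-ins. The TypeError for unsupported types follows the text above; raising TypeError when an entry is overwritten with a different type is an assumption, since the text does not say which error that case produces.

```python
class Parameter:
    """Stand-in for a supported type (hypothetical, not the real class)."""

class Compound:
    """Second stand-in for a supported type."""

class StrictNameSpace:
    supported = (Parameter, Compound)

    def __init__(self):
        self._entries = {}

    def set(self, name, obj):
        if not isinstance(obj, self.supported):
            raise TypeError("unsupported type: %s" % type(obj).__name__)
        old = self._entries.get(name)
        # Assumption: a type mismatch on overwrite is reported as TypeError too.
        if old is not None and type(old) is not type(obj):
            raise TypeError("%r already holds a %s" % (name, type(old).__name__))
        self._entries[name] = obj

ns = StrictNameSpace()
ns.set("par1", Parameter())
ns.set("par1", Parameter())        # same type: overwriting is allowed

errors = []
for bad in ("just a string", Compound()):
    try:
        ns.set("par1", bad)        # unsupported type, then wrong type
    except TypeError as exc:
        errors.append(str(exc))
```

Both bad calls fail immediately, so a mistake typed at the shell shows up at the set() call rather than later as a wrong query result.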

A CircularQueryError is raised if you attempt to make a query depend on itself. (Upon execution of check() an endless loop would result.)
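
One way such a check could work is to walk the dependencies of the query being added and refuse it if the walk reaches the receiving query. The sketch below is an assumption about the technique, not the implementation in query.py; TinyQuery, add() and depends_on() are hypothetical, and the exception class is redefined locally so the example is self-contained.

```python
class CircularQueryError(Exception):
    """Local stand-in for the error named in the text."""

class TinyQuery:
    def __init__(self, name):
        self.name = name
        self.subqueries = []

    def depends_on(self, other):
        """True if `other` is this query or is reachable through subqueries."""
        stack, seen = [self], set()
        while stack:
            query = stack.pop()
            if query is other:
                return True
            if id(query) in seen:
                continue
            seen.add(id(query))
            stack.extend(query.subqueries)
        return False

    def add(self, sub):
        # Adding `sub` closes a loop if `sub` already leads back to self.
        if sub.depends_on(self):
            raise CircularQueryError(
                "%s would depend on itself via %s" % (self.name, sub.name))
        self.subqueries.append(sub)

a, b, c = TinyQuery("a"), TinyQuery("b"), TinyQuery("c")
a.add(b)
b.add(c)
try:
    c.add(a)                       # a -> b -> c -> a would be circular
    loop_rejected = False
except CircularQueryError:
    loop_rejected = True
```

Because the check runs when the dependency is added, the loop is refused before any check() call can recurse endlessly.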

Tips:
* Rebuilding your name space data in a different interface: add evaluable strings containing your functions to the parameter's, statement's or query's info dictionary.
* To use freeze properly (maybe 30% faster query execution?):
1. fill the name space with parameters, then compound parameters, then queries (or, if you add statements to the name space: first statements, then queries);
2. freeze the name space;
3. run the queries.
If you have to change the name space in between, first unfreeze(), then apply the changes and freeze() again. Other schemes may work too, but beware of confusion.
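
The fill, freeze, run order can be illustrated with a sketch. What freeze() does internally is not described here, so the sketch assumes the simplest reading: while the name space is frozen, statements may cache the definitions they looked up. FreezableNameSpace, CachingStatement and the generation counter are hypothetical.

```python
class FreezableNameSpace:
    """Hypothetical name space whose lookups may be cached while frozen."""
    def __init__(self):
        self._entries = {}
        self.frozen = False
        self.generation = 0        # bumped on every freeze()

    def set(self, name, obj):
        self._entries[name] = obj

    def get(self, name):
        return self._entries[name]

    def freeze(self):
        self.frozen = True
        self.generation += 1

    def unfreeze(self):
        self.frozen = False

class CachingStatement:
    def __init__(self, ns, param_name, function, constants):
        self.ns, self.param_name = ns, param_name
        self.function, self.constants = function, constants
        self._cache = None         # (generation, value function)

    def _value(self):
        if not self.ns.frozen:
            return self.ns.get(self.param_name)
        if self._cache is None or self._cache[0] != self.ns.generation:
            self._cache = (self.ns.generation, self.ns.get(self.param_name))
        return self._cache[1]

    def check(self, item):
        return bool(self.function(self._value()(item), *self.constants))

ns = FreezableNameSpace()
ns.set("bpm", lambda song: song["bpm"])                 # 1. fill
fast = CachingStatement(ns, "bpm", lambda x, lo: x >= lo, [150])
ns.freeze()                                             # 2. freeze
song = {"bpm": 80}
first = fast.check(song)                                # 3. run

ns.set("bpm", lambda song: song["bpm"] * 2)             # changed while frozen
stale = fast.check(song)                                # still the old definition

ns.unfreeze()
ns.set("bpm", lambda song: song["bpm"] * 2)
ns.freeze()
fresh = fast.check(song)                                # new definition is used
```

The stale result shows the kind of confusion that can follow from changing a frozen name space; after unfreeze(), the change, and a new freeze(), the statement sees the new definition.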

Miscellaneous:
* Clones (a relic from the earliest ideas) are kept around. They may be useful when you use the info dictionary.
* A better __repr__() implementation was rejected. The main problem is what to do with the name space. Also, clever use of info[] allows the user to store the info necessary to recreate the object.
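
Storing evaluable strings in info, so that an object can be recreated later, might look like the sketch below. InfoParameter, make_parameter, rebuild and the key names "value_src" and "sort_key_src" are hypothetical; the real Parameter class may handle its info dictionary differently. Note that eval() should only ever be given strings you wrote yourself.

```python
class InfoParameter:
    """Hypothetical parameter that carries a free-form info dictionary."""
    def __init__(self, value, sort_key=None, info=None):
        self.value = value
        self.sort_key = sort_key
        self.info = dict(info or {})

def make_parameter(value_src, sort_key_src=None):
    """Build a parameter from source strings and remember those strings."""
    value = eval(value_src)                      # trusted input only
    sort_key = eval(sort_key_src) if sort_key_src else None
    info = {"value_src": value_src, "sort_key_src": sort_key_src}
    return InfoParameter(value, sort_key, info)

def rebuild(param):
    """Recreate an equivalent parameter from the strings kept in info."""
    return make_parameter(param.info["value_src"], param.info["sort_key_src"])

original = make_parameter('lambda x: x["title"]', 'lambda s: s.lower()')
copy = rebuild(original)

item = {"title": "Abc"}
same_value = copy.value(item) == original.value(item)
lowered = copy.sort_key(copy.value(item))
```

The copy is a separate object built only from the strings in info, which is also what a different interface would need in order to rebuild the name space.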

TODO:
** Calculation time limitation. At present the calculation time is unknown, but it's likely to be very long. For pydance a net calculation time of, say, 0.3 seconds might be acceptable for a dance collection of, say, 10000 dances (~ 3000 songs).
Ideas:
- name space alignment: see to it that name strings are replaced with the name string held in the QueryNameSpace (same id()). It won't gain much compared with freeze(), though.
** Write proper documentation into query.py
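
For the calculation time item in the TODO list, a rough measurement could be set up as in the sketch below. The dance data, the check function and time_query are made up for illustration, and a plain function stands in for a query's check() method; only the 10000-item collection size and the 0.3 second budget come from the text above.

```python
import random
import time

random.seed(0)
# Made-up collection of 10000 "dances"; the real objects will differ.
dances = [{"bpm": random.randint(60, 200), "level": random.randint(1, 10)}
          for _ in range(10000)]

def check(dance):
    """Stand-in for a query's check(): 120 <= bpm <= 160 AND NOT level > 8."""
    return 120 <= dance["bpm"] <= 160 and not dance["level"] > 8

def time_query(check, items, repeats=5):
    """Return the best wall-clock time over several runs, plus the results."""
    best = float("inf")
    results = []
    for _ in range(repeats):
        start = time.perf_counter()
        results = [item for item in items if check(item)]
        best = min(best, time.perf_counter() - start)
    return best, results

BUDGET = 0.3  # seconds, the figure suggested in the TODO item
best, results = time_query(check, dances)
print("best of 5: %.4f s for %d items, %d matches, budget %.1f s"
      % (best, len(dances), len(results), BUDGET))
```

Taking the best of several runs reduces noise from other processes; the actual figure depends on the machine and on how much work the real statements do per item.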

