
The BioVelo Query Language

Mario Latendresse, Ph.D.

1. Introduction

The BioVelo language is designed to let the user write precise queries against Pathway/Genome Databases created using the Pathway Tools software. By writing queries in BioVelo you can extract lists of objects (e.g., proteins) satisfying specific conditions.

BioVelo queries can be used in BioCyc Web services requests. Note that, according to the BioCyc Web services documentation, a BioVelo query used in a Web service request must return a list of objects (e.g., pathways, genes). In particular, such a query cannot return names, numbers, or other values extracted from these objects. Consult the BioCyc Web services documentation for more details and examples.

The BioVelo language is based on a computer science concept called list comprehension, which is popular in languages such as Haskell, Miranda, and Python. BioVelo is rich enough to let the user specify complex queries without relying on procedural side effects. This property makes it easy to start with simple queries and compose them into more complex ones, an advantage that functional languages like BioVelo have over procedural languages.

For a very quick introduction to BioVelo, with short examples, please see Table 2.

2. Database Schema

The databases queried by BioVelo contain objects belonging to various classes: metabolic pathways, reactions, proteins, genes, and so on. Each class has a set of attributes associated with it. For example, the class Proteins has attributes that include pI (its isoelectric point), and Gene (the gene encoding the protein). That means that each protein object (instance of the class) has the attribute Gene, although in some objects, the attribute may have no value.

The list of classes, their attributes, and the type of each attribute are part of what is called the schema (or ontology) of the databases. All PGDBs share one schema, although that schema changes with new versions of Pathway Tools. The schema is not part of BioVelo. The graphical user interface provided with BioVelo gives some details of the schema and the list of accessible databases -- this list will change with time.

The Pathway Tools schema is described in several documents. The most comprehensive is the Pathway Tools User's Guide, which is available as part of the Pathway Tools software download package. See also several publications listed on the BioCyc publications page. The better you know the Pathway Tools schema, the more adept you will be at writing BioVelo queries, because you need to know what classes to base your queries on, and which attributes to filter in your queries.

By applying relations (e.g., is equal to) to attributes you can constrain your search to specific attribute values. Several relations on multiple attributes can be used in the same query (or subquery).

Each attribute has a type. For example, a type can be a string, a number, a Boolean, an enumerated list of values (aka "a controlled vocabulary"), or some specific class. For each type, a specific list of operators is allowed. For example, addition can be done only on numbers, not strings. Knowing these types is important for writing valid queries.

3. Syntax of BioVelo

BioVelo is based on List Comprehension, a notation that was first created in mathematics as Set Comprehension. The syntax is similar, but BioVelo emphasizes lists over sets. Note: a list may contain duplicates of some elements, whereas a set does not contain duplicates. BioVelo can process sets, but using lists is usually more efficient.

Table 1 presents the formal syntax of a valid BioVelo expression -- that is the element called Expr can be provided as a query. The notation used to define this syntax is described in the caption of the table although informal details are presented in the next paragraphs.

Typically, a query starts and ends with brackets (that is [ ] or { }). From Table 1 a bracketed expression is either ListExpr or SetExpr. We introduce BioVelo with such queries.

A bracketed query has the form

[ e : q1,..., qn]

or

{ e : q1,..., qn}

where e -- called the head -- is an expression, and the qi are qualifiers that are either generators, binders, or filters. The result of a query is an ordered list (i.e., duplicates may exist) if square brackets [ ] are used; but it is a set (i.e., no duplicates exist) if curly braces { } are used. We will detail the syntax and semantics of such expressions in the following paragraphs.

The head expression e is either a single expression (a SingleExpr from Table 1) or a tuple of single expressions (e1,..., en). In the simplest and most common case, e is a single identifier that is assigned (aka bound) by one of the generators or binders.

3.1 Generators to Iterate Through Lists of Objects

A generator has the form p<-X where p is a variable or a tuple of variables and X is an expression of type list. The scope of the variables in p extends to all qualifiers to the right of that generator; that is, all the variables in p can be referenced by the qualifiers to the right of this generator. Variables are identifiers as defined in Table 1; note that they are case sensitive (e.g., variable X is not the same as variable x). The variables are iteratively bound to every element of X, and for each iteration the qualifiers to the right are evaluated. Generators are similar to "loops" in programming languages. If p is a tuple, the number of variables in it must equal the number of elements in each tuple of X.

Here is a simple example with a single variable (clicking the query will bring you to the Free Form Advanced Query Page (FFAQP) where you can submit the query and see the result as a table):

[r : r <- ecoli^^reactions]

This gives the list of all reactions in the database E. coli.

3.2 Double caret ^^ to Extract All Objects and Subclasses of a Class, ~ to Extract One Object of a Database

The double caret ^^ operator, applied to a database and a class name of this database, as in the previous example, specifies the list of all objects and subclasses of that class for this database. The left operand of ^^ must be either the name of a database accessible from the server or a variable bound to such a database. For example, ecoli^^proteins gives the list of all protein objects and subclasses of the database E. coli. The underlying schema decides the exact capitalization of the class names. Most of the time, however, you do not have to reproduce that capitalization and can enter the class name all in lower case. The following heuristic is applied: we first try to find the class name in the schema as you entered it; if this fails, we capitalize every word separated by a dash (e.g., Super-Pathways) and try again. For example, if you enter proteins, it is first tried as is; since it does not exist in the current schema, the capitalization Proteins is then tried, which is correct in the current schema. Similarly, all-genes is capitalized to All-Genes. On the other hand, the class DNA-Binding-Sites must be written exactly that way, since the heuristic capitalization would give Dna-Binding-Sites, which is incorrect in the current schema.
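The two-step class-name lookup described above can be sketched in Python. This only illustrates the rule as stated in the text and is not the server's code; the function name resolve_class_name and the small set of class names are hypothetical.

```python
# Hypothetical mini-schema; real class names come from the Pathway Tools schema.
SCHEMA_CLASSES = {"Proteins", "All-Genes", "Super-Pathways", "DNA-Binding-Sites"}

def resolve_class_name(name, schema=SCHEMA_CLASSES):
    """Return the schema class matching `name`, or None if both steps fail."""
    # Step 1: try the name exactly as entered.
    if name in schema:
        return name
    # Step 2: capitalize each dash-separated word
    # (str.capitalize also lowercases the remaining letters).
    candidate = "-".join(word.capitalize() for word in name.split("-"))
    return candidate if candidate in schema else None

print(resolve_class_name("proteins"))           # Proteins
print(resolve_class_name("all-genes"))          # All-Genes
print(resolve_class_name("dna-binding-sites"))  # None
```

The last call fails because the heuristic produces Dna-Binding-Sites, which is why that class has to be typed with its exact capitalization.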

The tilde ~ refers to a single object of one database based on a frame-id. For example, the expression ecoli~rxn0-3151 refers to the object with frame-id rxn0-3151 in database ecoli. If the frame-id contains characters that are not legal in an identifier (see Table 1 for the definition of identifier), it must be enclosed in a pair of double quotes, as in ecoli~"GO:0009252"; here the colon character (:) is not legal in an identifier. It is not an error to use double quotes when they are not required. In general, the left operand of ~ can be a variable bound to a database. For example, you can search the entire list of databases for one particular frame-id (e.g., eg10124) by using dbs like so: [(x,x~eg10124): x<-dbs, x~eg10124].

3.3 Single caret ^ to Reference the Value of an Attribute

The single caret ^ operator references the value of an attribute of an object of a class. The left operand must be an expression evaluating to an object. The right operand must be the name of a queryable attribute. The name of an attribute is not case sensitive. For example, assume that identifier x is bound to a protein object; then x^dna-footprint-size gives the value of the attribute dna-footprint-size of that protein. In this case, this is an integer value. An attribute of any type can be referenced using ^. If the attribute does not belong to the class of the object bound to the given variable, its value will be the empty list.

The ^? operator is essentially the same as ^ but it always generates a string that represents an HTML link to the object that is its left operand. This is typically used in the head of a query to get clickable links in a browser. For example,

[r^?name : r <- ecoli^^reactions]
would give the list of reaction names with a clickable link to get more information for each reaction. Using the ^ operator would only give the name.

The special attribute FRAME-ID exists for all objects (it has the type String). Although it is not a real attribute of the schema, it is available to make it explicit that each object has a unique ID that can be referenced.

3.4 Filters to Apply Conditions to Objects Extracted from Generators

A filter is a Boolean expression formed by connectives, bound variables, comparison operators, arithmetic operators, and/or class predicates. The connective for logical "and" is &, for "or" it is |. The constants false and true represent the corresponding logical values. The comparison operators <, >, <=, >=, =, != pertain to numeric values. The priority (order of grouping) of these operators is given in Table 2.

The logical operators can be applied to values of any type. For example, assuming that e is a variable bound to a gene object, it is valid to write e^comment | e^citations, which is true if there is a comment or some citations for e. That is, an attribute that has no value is considered false. Thus !(e^comment) (the exclamation point is the logical not operator) is true when e^comment has no value. This is useful to find out which attributes have values and which do not.

For example, in the following we have a simple Boolean expression 3 > #r^left that is true if and only if the number of elements in the left attribute is smaller than 3:


[(r^?name, #r^left) : r <- ecoli^^reactions, 3 > #r^left ]

This expression "loops" through each reaction of E. coli, verifies for each one that the left attribute has fewer than 3 elements, and returns only these reactions along with the number of elements in their left attribute. Notice that in 3 > #r^left, the ^ has higher priority than #, and > has lower priority than #. So it is interpreted as: reference the attribute left from r, then take the length of left, then compare it with 3. The order is important; otherwise these operations would not make sense because of invalid types (e.g., you cannot take the length of an object, but only of a list or set).

Several generators and filters can be specified as qualifiers in one bracketed query. The filters are simply evaluated from left to right as if a logical "and" operation was between them. Generators are also evaluated from left to right. They are nested "loops". For example, in


{r^?name : r <- ecoli^^reactions, product <- r^left, product isa proteins}

we have two generators. The first one loops through all the reactions of E. coli. For each reaction, the second generator binds the variable product to each object in the left attribute of this reaction. The filter verifies that product is a protein. Notice that we have used curly braces so that the same reaction is not returned twice. This returns the set of all reactions that have at least one protein in their left attribute. The similar query with square brackets:


[r^?name : r <- ecoli^^reactions, product <- r^left, product isa proteins]

would return a similar list of reactions but a reaction that has more than one protein in its left attribute would be repeated.
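The difference between the two bracket styles can be seen with an analogous pair of Python comprehensions. This is a sketch on made-up data, not a BioVelo query; the reaction names and the "protein"/"compound" labels are hypothetical.

```python
# Hypothetical toy data: reaction name -> kinds of objects in its left attribute.
reactions = {
    "rxn-a": ["protein", "protein", "compound"],
    "rxn-b": ["compound"],
    "rxn-c": ["protein"],
}

# Square brackets: one result per (reaction, matching object) pair,
# so rxn-a, which has two proteins, appears twice.
as_list = [name for name, left in reactions.items()
           for obj in left if obj == "protein"]

# Curly braces: same generators and filter, but duplicates are removed.
as_set = {name for name, left in reactions.items()
          for obj in left if obj == "protein"}

print(as_list)  # ['rxn-a', 'rxn-a', 'rxn-c']
print(sorted(as_set))  # ['rxn-a', 'rxn-c']
```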

The following example contains three generators. This query generates a list of all pathways of E. coli, each with every ordered pair of distinct reactions from its reaction list.


[(p^?name, r1^?name, r2^?name) : 
         p <- ecoli^^pathways, r1 <- p^reaction-list, 
         r2 <- p^reaction-list, r1 != r2]

The monadic operator # gives the number of elements of a list or set. For example, #ecoli^^proteins would give the number of protein objects in database ecoli. It can be applied to any expression that returns a list or set. For example, the following expression gives the number of reactions that have only one object in their left attribute.


#[r : r <- ecoli^^reactions, 1 = #r^left ]

3.5 Binding a Variable to a Value of an Expression

It is possible to bind a variable to any expression value. For example, the qualifier e := x - y binds the variable e to the value of the expression x - y. In general, a binder has the form p:=e where p is a pattern and e is an expression. The symbol := is used to bind one or several identifiers to the value of the expression e. If the pattern p is a tuple with n variables, then the expression e must generate a tuple of n values. If the pattern p is a single variable, the expression e can be of any type. This is useful to avoid recalculating a value (e.g., computation time is saved) or simply shortening a query. For example, in


[r^?name : r <- ecoli^^reactions, p := #r^left, 2 < p, p < 5]

we avoid doing a reference to #r^left twice by using the identifier p.
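For readers who know Python, a binder behaves much like iterating over a one-element list inside a comprehension: the value is computed once and then reused by the later filters. The following is only an analogy on hypothetical data, not BioVelo itself.

```python
# Hypothetical toy data: reaction name -> objects in its left attribute.
reactions = {
    "r1": ["a"],
    "r2": ["a", "b", "c"],
    "r3": ["a", "b", "c", "d", "e"],
}

# `for p in [len(left)]` plays the role of the binder `p := #r^left`:
# p is bound once per reaction and both filters reuse it.
selected = [name
            for name, left in reactions.items()
            for p in [len(left)]
            if 2 < p
            if p < 5]

print(selected)  # ['r2']
```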

A tuple of variables can be used in some cases to bind multiple variables with multiple values. For example, in


[ (p^?name, e^?name) : p <- ecoli^^proteins, 
  e := [(c1, c2) : (c1, c2) <- protein-to-components p, c2 > 1],
  #e > 4]

there is a tuple of variables (c1, c2) bound for each element returned by protein-to-components. This function returns tuples of two values for the components of the protein p: the component c1 and the number of times c2 that it occurs in p. This query returns all the proteins of E. coli that have more than four components occurring more than once.

3.6 The List of Available Databases

The list of possible databases is given by the special identifier dbs. For example, the generator db <- dbs would bind the variable db to each available database. A specific database is specified by using its name - for example, ecoli identifies the E. coli K-12 database, EcoCyc. The complete list of available databases is provided by the graphical user interface to BioVelo. It is also possible to have this list by entering the query dbs -- it will return the list of databases currently accessible from the server.

For any given object, the function database returns its database. It is not a string but a database object. The returned value can be used by any function that requires a database.

3.7 Arithmetic Expressions

Arithmetic expressions can be formed by using arithmetic operators +, -, /, *, quotient (integer division), remainder, and abs. The subtraction (i.e., -) operation is dyadic as well as monadic (i.e., a minus sign). Moreover, lexically, it is necessary to have a delimiter between a - and an adjacent operator or identifier. For example, writing a-b does not mean a subtraction between identifier a and b, but rather the identifier a-b, since the dash can be used to create identifiers.
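The lexical rule about the dash can be illustrated with a small tokenizer. The identifier pattern follows the definition given in the caption of Table 1 (letters, digits, -, _ or ?, starting with a letter or underscore); the tokenizer itself is a hypothetical sketch, restricted to ASCII letters, and is not the BioVelo parser.

```python
import re

# First alternative: an identifier as defined in the caption of Table 1.
# Second alternative: any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_?-]*|\S")

def tokens(text):
    """Split text into identifier tokens and single non-space characters."""
    return TOKEN.findall(text)

print(tokens("a-b"))    # ['a-b']  one identifier, no subtraction
print(tokens("a - b"))  # ['a', '-', 'b']
```

With the spaces in place, the dash is read as an operator between the identifiers a and b.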

In the following query, a subtraction is done between the end and start positions of every gene of E. coli associated with an RNA. This query finds all RNAs in E. coli that have fewer than 100 nucleotides.


[(r^?name, l^?name, r^?gene) : r <- ecoli^^RNA, g <- r^gene, 
  l := abs(g^left-end-position - g^right-end-position), 
  l < 100 ]

3.8 Constraining the Extracted Objects to a Class

The form x isa c -- where x is an expression returning an object, and c a class -- is a Boolean expression, or filter, that tests the membership of the object bound to x in class c. The expression is true if and only if x is an object of class c. Notice that this includes the cases where the class of the object is a subclass of class c. For example, r isa binding-reactions is true if variable r is bound to an object of class binding-reactions or to one of its subclasses.

The class c is either a string, a symbol, or an expression using the operator ~.
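The subclass-inclusive behavior of isa is comparable to Python's isinstance. The class names below are hypothetical stand-ins for schema classes, used only to show that membership also holds for instances of subclasses.

```python
# Hypothetical class hierarchy standing in for schema classes.
class Reactions:
    pass

class BindingReactions(Reactions):
    pass

class SpecialBindingReactions(BindingReactions):  # hypothetical subclass
    pass

r = SpecialBindingReactions()

# True: r belongs to a subclass of BindingReactions.
in_binding = isinstance(r, BindingReactions)

# False: a plain Reactions instance is not a BindingReactions instance.
plain_in_binding = isinstance(Reactions(), BindingReactions)

print(in_binding, plain_in_binding)  # True False
```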

The previous example could have been written in the following way using isa:


[(r^?name, l^?name, g^?name) : g <- ecoli^^genes, 
  #(g^product) = 1,
  r <- g^product, r isa RNA,
  l := abs(g^left-end-position - g^right-end-position), 
  l < 100 ]

The following query returns all the proteins of E. coli, with their DNA binding sites, if any; one binding site per line when displayed as a table.


[(p^?name, e^?name) : p <-ecoli^^proteins, c <- p^component-of, 
  c isa Protein-DNA-Complexes,
  e := [c2 : c2 <- c^components, c2 isa DNA-Binding-Sites], 
  #e > 0]

This last expression will actually repeat the same protein for each of its binding sites. To have one protein with all its binding sites "next to it" when displayed in a table, one could write


[(p^?name, e^?name) : p <-ecoli^^proteins, 
 e := [c2 : c <- p^component-of,  c isa Protein-DNA-Complexes,
            c2 <- c^components, c2 isa DNA-Binding-Sites], 
 #e > 0]

3.9 String, Set, and List Operations

The operator in, such as x in X, returns true if and only if element x is in list X. The type of x can be a number, string, or database object.

The Boolean expression s instring S is true if and only if the string s is a substring of string S; or if S is a list of strings, that s is a substring of these concatenated strings; or if S is a list of lists, each list is converted into a string, removing all spaces and parentheses, and s is a substring of these concatenated strings. This expression returns false otherwise. There is also a case insensitive version called instringci.

Regular expressions can be used to specify general string patterns to search. The operator ~= must be used to specify one regular expression. Its left operand must be of type string, and its right operand must be a string whose value specifies a regular expression. The syntax of this regular expression follows the rules of the Perl language. Notice that there is a match if the left operand has a substring matching the regular expression. For example, to find all genes in E. coli whose name contains a 't' followed, at any later position, by an 'a', you would write:


[g^?name : g <-ecoli^^genes, 
           g^name ~= "t.*a"
]
If we were looking for gene names that start with 't' and end with 'a', we would use the specifiers '^' and '$' for the beginning and end of the string:


[g^?name : g <-ecoli^^genes, 
           g^name ~= "^t.*a$"
]
Notice that the pattern is case-sensitive. For example, the last query would not find gene names that end with 'A'.
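The substring-match behavior of the two patterns above can be checked outside BioVelo. The sketch below uses Python's re module as a stand-in, since for patterns this simple it behaves like Perl; the names in the list are test strings chosen for illustration, not query results.

```python
import re

# Test strings chosen only to exercise the two patterns.
names = ["tnaA", "thrA", "pta", "tra"]

# Unanchored: a match anywhere in the string is enough.
unanchored = [n for n in names if re.search("t.*a", n)]

# Anchored: the name must start with 't' and end with lowercase 'a'.
anchored = [n for n in names if re.search("^t.*a$", n)]

print(unanchored)  # ['tnaA', 'pta', 'tra']
print(anchored)    # ['tra']
```

Here thrA is rejected by both patterns because it contains no lowercase 'a', and tnaA is rejected by the anchored pattern because it ends with 'A'.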

The monadic operator set, applied to a list, returns the same list but without duplicated elements. Curly braces can be used instead of the square brackets -- for the whole query -- to get a set instead of a list.

The dyadic operator ++ concatenates two lists or sets (it does not remove duplicates). As usual, it is used as an infix operator, as in A ++ B. The union of two sets can be obtained by combining ++ and set, as in set (A ++ B). The operator ** does an intersection between two lists or two sets; it does not remove duplicates.
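As a rough Python model of concatenation followed by duplicate removal, consider the sketch below. The helper names concat and as_set are hypothetical, and keeping the first-seen order after removing duplicates is an assumption; the text does not say in which order set returns its elements.

```python
def concat(a, b):
    """Model of A ++ B: plain concatenation, duplicates are kept."""
    return list(a) + list(b)

def as_set(items):
    """Model of the monadic set operator: drop duplicated elements."""
    # Keeping first-seen order is an assumption made for this sketch.
    return list(dict.fromkeys(items))

A = ["x", "y"]
B = ["y", "z"]

print(concat(A, B))          # ['x', 'y', 'y', 'z']
print(as_set(concat(A, B)))  # ['x', 'y', 'z']
```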

3.10 Order of Operations (Priority and Associativity of Operations)

When forming complex expressions, it is important to consider the priority of the operators used. For example, in the expression y+3*x > 1, the multiplication of 3 to x is done before the addition to y. Likewise, the addition is done before the comparison operator > is applied. That is, the priority of operator * is higher than + which is higher than >. Table 2 gives the relative priority of all operators: an operator higher up in the table has higher priority. As usual, parentheses can be used to apply the operators in a different order; for example in (y+3)*x > 1, the addition is done before the multiplication to x. Adding extra or redundant parentheses is not an error -- in many cases it is better to over-specify the order of evaluation so that the expression is clearer.

The associativity of an operator, or group of operators with the same priority, is either right or left. The operators * and / have the same priority and are left-associative; for example the expression x*y/z*w is interpreted as ((x*y)/z)*w. Note that an expression such as x/y/z is interpreted as (x/y)/z. Table 2 gives the associativity of all operators. In some cases, the associativity does not apply as the operator cannot be composed with itself. For example, an expression such as db^^c^^d is invalid since the second ^^ cannot be applied to a class -- that is, neither db^^(c^^d) nor (db^^c)^^d has a valid interpretation.

The expression [exp1 @ exp2] represents a list of numbers from exp1 up to at most exp2. For example, [1 @ 10] represents the list of integers from 1 to 10. The starting expression exp1 and ending expression exp2 must be of type numbers; they can be integer or non-integer expressions. The list of numbers generated starts at the value of exp1 and, by increasing by 1, goes to a maximum of exp2. If the ending value is smaller than the starting value, the generated list is empty.

Similarly, the expression [exp1 @ exp2, exp3] represents a list of numbers from exp1 up to at most exp2 in steps specified by exp3. For example, [1 @ 10, 2] represents the list of odd integers from 1 to 9. The starting expression exp1 and ending expression exp2 must be of type numbers; they can be integer or non-integer expressions. The list of numbers generated starts at the value of exp1 and, by increasing by exp3, goes to a maximum of exp2. If exp3 is zero, the list has only one element, the value of exp1. In the case that exp3 is negative, if the starting value exp1 is greater than the ending value exp2 then the list of values will be decreasing; otherwise the empty list will be generated.
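The rules for the @ range expressions can be summarized in a short Python model. The function name biovelo_range is hypothetical, and the treatment of a negative step when the start equals the end (empty list) is a literal reading of the text rather than confirmed behavior.

```python
def biovelo_range(start, end, step=1):
    """Model of [start @ end] and [start @ end, step] as described in the text."""
    if step == 0:
        return [start]  # a zero step yields only the starting value
    if step > 0 and end < start:
        return []       # ending value smaller than starting value
    if step < 0 and not start > end:
        return []       # read literally: a negative step needs start > end
    values, i, current = [], 0, start
    while (current <= end) if step > 0 else (current >= end):
        values.append(current)
        i += 1
        # Multiply rather than accumulate to limit floating-point drift.
        current = start + i * step
    return values

print(biovelo_range(1, 10, 2))   # [1, 3, 5, 7, 9]
print(biovelo_range(10, 1, -3))  # [10, 7, 4, 1]
print(biovelo_range(5, 1))       # []
```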

Lists of numbers can also be specified by simply using a comma-separated list of expressions between square brackets. For example, [1, -1, 100, 9] is a list of four numbers. General numerical expressions can be used as in [n, 2*n, m-1], assuming that the variables n and m are bound to numerical values.

Indexing can be done on lists of any type by using the list syntax [...]. For example, [r : r <- ecoli^^reactions][0 @ 4] selects the first five reactions returned by the query on E. coli reactions. The indices do not have to be in numerical sequence, e.g. [3, 5, 7] is also valid. If the list of indices has only one element, the result has the type of the extracted element; otherwise it is a list of the elements. Applying an indexing operation to a non-list generates an empty list. If some indices are out of range (e.g., greater than the length of the list or negative), no error occurs, but these indices are skipped. Indexing is 0-based, that is, L[0] returns the first element of variable L.
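The indexing rules can be modeled in a few lines of Python. The function name biovelo_index is hypothetical. Returning an empty list when a single index is out of range is an assumption, since the text does not cover that case.

```python
def biovelo_index(value, indices):
    """Model of value[indices] following the rules stated in the text."""
    if not isinstance(value, list):
        return []  # indexing a non-list yields the empty list
    # Out-of-range indices, including negative ones, are skipped silently.
    picked = [value[i] for i in indices if 0 <= i < len(value)]
    if len(indices) == 1:
        # A single index yields the element itself. The empty-list fallback
        # for an out-of-range single index is an assumption.
        return picked[0] if picked else []
    return picked

letters = ["a", "b", "c", "d", "e", "f"]
print(biovelo_index(letters, [0, 1, 2, 3, 4]))  # ['a', 'b', 'c', 'd', 'e']
print(biovelo_index(letters, [0]))              # a
print(biovelo_index(letters, [1, 9, -1]))       # ['b']
```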

3.11 Queries Generating Numbers and Multiple Tables

A BioVelo query can also be an expression that does not start with a bracket -- a query is in general an Expr according to the formal syntax of Table 1. For example, a tuple (#dbs, [c^?name:c<-ecoli^^genes]) is a legal query. This gives the number of accessible databases and the list of genes of E. coli. The expression #[p:p<-ecoli^^pathways] is also valid as a query. It gives the number of pathways in E. coli. The following would return two lists, each one interpreted as a table when displayed. The first table contains all E. coli reactions that have two objects on the right attribute; and the second table contains all E. coli reactions that have three.


([r^?name : r <- ecoli^^reactions, 2 = #r^right ],
 [r^?name : r <- ecoli^^reactions, 3 = #r^right ]
)

So, in general, arithmetic expressions and tuples are also valid queries -- in the former case they return single values; in the latter case they return multiple values of possibly complex values (e.g., lists).

3.12 Library of Special Functions

Table 3 presents a list of functions that can be applied to various objects. In general, these functions return a list of objects of a specific type. For example, the function enzyme-to-genes takes an enzyme as input and returns the list of genes that produce this given enzyme. You can use these functions anywhere an expression is allowed. For example, [(x, reaction-to-genes x): x <- ecoli^^reactions] gives a list of tuples where the first element is a reaction from E. coli and the second element is a list of genes involved in this reaction.

Here are some more details about some of these functions.

The special function find-objects searches a given database, i.e., its first operand, based on a given object as the second operand. The search is done using the common name, the synonyms and the frame-id of the given object. Typically, the given object is from a database different than the given database. A similar operation could be done using a complex BioVelo query based on the attributes of the object, but the find-objects function does the search in a more efficient way. For example, the following query will search for the gene trpA of ecoli in all databases available: (the gene trpA from ecoli is also included in the result)

[(db, l): g <-ecoli^^genes, g^name = "trpA", 
          db <- dbs, l:=find-objects(db,g), 
          #l > 0]

Note: the variable dbs is bound to the list of available databases.

3.13 Sorting the Result of a Query and Adding HTML Table Headers

The result of a query can be sorted using the following functions: sort, html-sort-descending, and html-sort-ascending.

The sort function always sorts in ascending order. It is applied directly to the given list without preprocessing the elements. For example, sort [x^?name: x <- ecoli^^proteins] would return the list of proteins sorted in ascending order of their names. If the result is a list of tuples (i.e., several columns), by default the sorting is done on the first element of the tuple (i.e., the first column). This can be changed by specifying two arguments to the function sort: the first argument is the query, the second is an integer specifying the index of the element to sort on. For example, sort ([(x^name,x^frame-id): x <- ecoli^^proteins],2) sorts on the second element, the frame-id. Note that if the query is executed in the context of HTML output, all the elements are strings, including numbers, so lexicographic order applies in this case.
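Two points from the description of sort are easy to overlook: the column argument counts from 1, and numbers rendered as strings sort lexicographically. The Python sketch below illustrates both on made-up rows; sort_rows is a hypothetical helper, not a BioVelo function.

```python
def sort_rows(rows, column=1):
    """Ascending sort of tuples on a 1-based column, mirroring sort(query, n)."""
    return sorted(rows, key=lambda row: row[column - 1])

# Hypothetical (name, id) rows.
rows = [("beta", "id-3"), ("alpha", "id-9"), ("gamma", "id-1")]

print(sort_rows(rows))     # sorted on the first column
print(sort_rows(rows, 2))  # sorted on the second column

# Numbers sort numerically, but the same values as strings do not.
print(sorted([10, 9, 2]))        # [2, 9, 10]
print(sorted(["10", "9", "2"]))  # ['10', '2', '9']
```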

The function html-sort-descending should only be used in the context of a query run from one of the web interfaces: the SAQP or FFAQP. (Note: These interfaces are available at SAQP/FFAQP.) The output is not sorted by the server, but by the JavaScript in the HTML page generated by the server. As its name says, it sorts in descending order. As for sort, its second argument specifies which column to sort on. By default this is the first column.

The function html-sort-ascending is similar to html-sort-descending but sorts in ascending order.

The function html-table-headers controls the headers content of tables generated by a web server when generating HTML output. It should only be used in that context. Its first argument is a query that results in a list, its second argument is a tuple of strings. Each string will appear in the column headers of the HTML table to display the list. Note that the list is made of tuples of the same arity which is the number of columns of the displayed table. The number of strings specified in the second argument should be the same as the number of columns, although it is not an error to specify more or less than that number. The list will be either truncated, if too long, or supplied with some headers, if too short. Typically, the added headers are formulated by taking the head specification of the query, although this may result in odd looking text.

Example:

html-table-headers([(pathway-to-reactions(x), pathway-to-genes(x) ) : x <- ecoli^^pathways],
                      ("My Reactions", "My Genes"))

The functions html-table-headers, html-sort-ascending, and html-sort-descending can be freely composed. That is, for example, after sorting using either html-sort-ascending or html-sort-descending you can apply html-table-headers.

3.14 How To Submit BioVelo Queries

There are two ways you can submit BioVelo queries: via the Web using a browser (e.g., Internet Explorer) or from a desktop version of Pathway Tools. The latter can be done from a Lisp prompt or the Pathway Tools prompt by calling the Lisp function biovelo. For example, you can enter (biovelo "dbs"), which would return the list of accessible databases. In general, the query is provided as a string to the biovelo function. The Web version can be accessed at Advanced Query Page. The Web version has a graphical interface (actually two graphical interfaces). It also includes its own documentation, accessible via the previous Web link.

3.15 Syntax Tables


Table 1: Syntax of expressions in BioVelo. How to read this notation: each element on the left is defined by what is presented on the right of ::=; a vertical bar separates various possibilities (i.e., it can be read as the connective or); a character in bold represents itself; a word in italic is defined in this table or the other two tables. For example, an Expr is defined as either a SingleExpr or a Tuple; a ListExpr starts with a square bracket typed as [, then an Expr, as defined in this table, then a colon, and so on. An identifier is a sequence of letters, digits, -, _ or ? that starts with a letter or an underscore (i.e., _). Identifiers are case sensitive. A string constant may contain a double quote or a backslash by escaping them with a backslash.
Expr ::= Tuple | SingleExpr
Tuple ::= (Exprs)
Exprs ::= Expr , Exprs | Expr , Expr
SingleExpr ::= ConstNumber | ConstString | ListExpr | SetExpr
    | Var | (SingleExpr) | SingleExpr Dop SingleExpr
    | Mop SingleExpr
ListExpr ::= [ Expr : Qualifiers ] | [ Expr : ]
SetExpr ::= { Expr : Qualifiers } | { Expr : }
Qualifiers ::= Qualifier , Qualifiers | Qualifier
Qualifier ::= Generator | Filter | Binder
Generator ::= Pattern <- Expr
Filter ::= Expr
Binder ::= Pattern := Expr
Pattern ::= Var | (Vars)
Vars ::= Var , Vars | Var
Var ::= an identifier
ConstNumber ::= integer or real numbers
ConstString ::= "any printable characters"
Mop ::= Any monadic operator of Table 2
Dop ::= Any dyadic operator of Table 2



Table 2: Monadic and dyadic operators with their associativity. Operators higher in the table have higher priority. The instringci operator is the case insensitive version of instring. The =ci operator is the case-insensitive version of =. The rightmost column gives examples of complete BioVelo (clickable) queries for each operator. Clicking on a query will bring you to the Free Form Advanced Query Page (FFAQP) where you can try out the query: once at the FFAQP, you can submit the query by clicking the "4. Submit Query" button. Or you can edit that query at the FFAQP to submit a modified version.
Operators Meaning Associativity Example (Meaning)
:= Bind a variable to a value. On the left of := is a variable name and on the right an expression. In general, the left operand can be a list of variables. n/a [trpa: trpa := ecoli~eg11024] (The trpA gene of ecoli by using the unique id eg11024 of trpa)
^^ Reference all objects of a class. The left operand is a database name, or a variable bound to a database name, and the right operand is a class name. n/a [p: p <- ecoli^^pathways] (All metabolic pathways of organism ecoli)
^ Reference the value of an attribute of an object. The left operand is an expression evaluating to an object and the right operand is a slot name. left [g^product : g <- ecoli^^genes] (The products, e.g., proteins, of all genes of ecoli)
^? Generates a URL string of the value of an attribute of an object. The left operand is an object, the right operand is a slot name. left [x^?name: x <- ecoli^^reactions] (All reactions of ecoli with their name as URL links)
~ Reference a specific object based on a frame-id in a database. The left operand is a database name (unique identifier of a database) and the right operand is a frame id (a unique identifier of an object). left ecoli~GLYCOCAT-PWY (The glycogen degradation pathway of ecoli)
sort Sort a query result right sort [p^name : p<-ecoli^^people] (List the sorted names (increasing order) of people associated with the ecoli database)
set Convert a list into a set n/a set [p^name :p <- ecoli^^people ++ human^^people] (The names of people that contributed to the ecoli and human databases without duplicate names)
# The length of a list, or the cardinality of a set. The right operand is a list or a set. n/a #(meta^^reactions) (The number of reactions in MetaCyc)
++ Concatenate two lists left [g : g <- ecoli^^genes ++ human^^genes] (All genes from ecoli and human databases)
-- Subtract one list or set from another list or set left [r : r <- meta^^reactions -- ecoli^^reactions]
** Intersection of two lists or two sets left [r^ec-number : r <- ecoli^^reactions,r^ec-number] ** [r^ec-number : r <- human^^reactions,r^ec-number] (All the EC numbers, attached to reactions, that exist in ecoli and human)
- Monadic minus (arithmetic negative) n/a -2 (The number minus 2)
/, *, remainder, quotient Arithmetic operators division, multiplication, remainder (integer), and quotient (integer) left 100 remainder 3 (The remainder of dividing 100 by 3, which is 1)
+,- Arithmetic addition and subtraction left
abs Absolute value function n/a
in, instring, instringci Is the element in the list/set? Is the substring in the string, or in a list/set of strings? n/a [g : g <- ecoli^^genes, "trp" instringci g^name] (All ecoli genes that have 'trp' in their name)
~= Match a regular expression. The left operand is a string expression to match, the right operand is a regular expression (using Perl syntax) as a string. n/a [r: r <- meta^^reactions, r^ec-number ~= "^2.*"] (All MetaCyc reactions which have an EC number starting with 2)
=, =ci, !=, <,>,<=, >= Relational operators equal, equal case insensitive, not equal, smaller-than, greater-than, smaller-or-equal-than, greater-or-equal-than. The operands can be strings or numbers. n/a [x : x <- ecoli^^proteins, x^dna-footprint-size < 10] (All the proteins of ecoli for which the dna-footprint-size is smaller than 10)
! Logical operator not n/a [g : g <- ecoli^^genes, !(g^citations)] (All ecoli genes that have no citations)
& Logical operator and left [p : p <- meta^^pathways, !(p^citations) & !(p^comments)] (All pathways of MetaCyc that have no citations and no comments)
| Logical operator or left [g : g <- human^^genes, "adam" instringci g^name | "cys" instringci g^name] (All genes of the human database having a name containing the string 'adam' or 'cys')
evenp, oddp Is it an even number? Is it an odd number? n/a
special functions see Table 3 n/a
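For readers more familiar with Python, several operators of Table 2 can be approximated on plain lists as sketched below. The identifiers and values are hypothetical, and the sketch makes assumptions the table does not state: how BioVelo orders results, how it treats duplicates in ++, -- and **, and how remainder and quotient behave on negative operands are not covered here. Python's `re` module is not Perl, but the pattern "^2.*" from the table behaves the same in both.

```python
import re

# Hypothetical frame ids standing in for ecoli^^reactions and meta^^reactions.
ecoli_rxns = ["r1", "r2", "r3"]
meta_rxns = ["r2", "r3", "r4", "r5"]

concatenated = ecoli_rxns + meta_rxns                        # ++
difference = [r for r in meta_rxns if r not in ecoli_rxns]   # --
intersection = [r for r in ecoli_rxns if r in meta_rxns]     # **

# set, # and sort on a list that contains a duplicate.
names = ["Bob", "Ann", "Bob"]
unique_names = set(names)              # set
count_unique = len(unique_names)       # #
ordered_names = sorted(unique_names)   # sort (increasing order)

# remainder and quotient; Table 2 gives "100 remainder 3" as 1.
remainder = 100 % 3
quotient = 100 // 3

# ~= with the pattern "^2.*": the anchor ^ requires the match at the start.
ec_numbers = ["2.7.1.1", "1.1.1.1", "4.2.1.20"]
starts_with_2 = [ec for ec in ec_numbers if re.search(r"^2.*", ec)]

print(difference, intersection, ordered_names, remainder, quotient, starts_with_2)
```

Note that "4.2.1.20" contains a 2 but is rejected, because the regular expression is anchored at the beginning of the string.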



Table 3: Special functions provided in BioVelo. Note that functions are provided for all mappings (20 functions) between the objects compounds, proteins, genes, reactions, and pathways, in the 'of' form (e.g., genes-of-reaction, compounds-of-reaction, reactions-of-compound, etc.) and the 'to' form (e.g., reaction-to-genes, reaction-to-compounds, compound-to-reactions, etc.). In the left column, function names appearing in parentheses are synonyms of the main function described.
Function name Meaning
   
Reactions
enzrxn-activators Returns the list of activators of the given enzymatic reaction.
enzrxn-inhibitors Returns the list of inhibitors of the given enzymatic reaction.
reaction-to-genes (genes-of-reaction) Given a reaction, returns the list of genes involved in this reaction.
reaction-to-proteins (proteins-of-reaction) Given a reaction, returns the list of proteins involved in this reaction.
reaction-to-pathways (pathways-of-reaction) Given a reaction, returns the list of pathways containing this reaction.
reaction-to-compounds (compounds-of-reaction) Given a reaction, returns the list of compounds involved in this reaction.
Genes
genes-in-same-operon Given a gene, returns the list of other genes in the same operon.
genes-regulating-gene Given a gene, returns the list of genes regulating it by transcription initiation.
genes-regulated-by-gene Given a gene, returns the list of genes it regulates by transcription initiation.
binding-sites-affecting-gene Given a gene, returns all binding sites which are present in the same transcription units as the given gene.
gene-to-compounds (compounds-of-gene) Given a gene, returns the list of compounds involved in the reactions catalyzed by the products of the gene.
gene-to-reactions (reactions-of-gene) Given a gene, returns the list of reactions catalyzed by the products of the gene.
gene-to-pathways (pathways-of-gene) Given a gene, returns the list of pathways containing at least one reaction catalyzed by the products of the gene.
gene-to-proteins (proteins-of-gene) Given a gene, returns the list of proteins that are products of this gene and catalyze at least one reaction.
Proteins and Enzymes
genes-regulated-by-protein Given a protein, returns the list of genes it regulates by transcription initiation.
monomers-of-protein Returns the monomers of the given protein complex.
containers-of Returns all the protein complexes that have the given protein as a component.
unmodified-form Returns the unmodified form of the given protein.
top-containers Returns the protein complexes that have the given protein as a component but that are not components of any protein complexes.
protein-to-components Given a protein, returns a list of pairs (component coefficient) where coefficient is an integer representing the number of occurrences of this component in the protein.
protein-to-genes (genes-of-protein, enzyme-to-genes) Returns all the genes that encode the monomers of the given protein.
protein-to-reactions (reactions-of-enzyme, reactions-of-protein) Returns all the reactions associated with a given protein via its enzymatic reactions.
protein-to-pathways (pathways-of-protein) Returns all the pathways associated with a given protein via its enzymatic reactions.
protein-to-compounds (compounds-of-protein) Returns all the compounds associated with a given protein via its enzymatic reactions.
Pathways
pathway-to-proteins (proteins-of-pathway, enzymes-of-pathways) Returns a list of enzymes involved in a given pathway.
pathway-to-genes (genes-of-pathway) Given a pathway, returns the list of genes involved in this pathway.
pathway-to-reactions (reactions-of-pathway) Given a pathway, returns the list of reactions in this pathway.
pathway-to-compounds (compounds-of-pathway) Given a pathway, returns all substrates of all reactions that are within the given pathway.
General
find-objects Given a database (object or string-name) and an object, returns a list of all objects from the given database that have the same name, synonym, or frame-id as the given object.
find-orthologs Given a database (object or string-name) and gene (or protein), returns a list of genes (or proteins) that are orthologs to the given gene (or protein) in the given database.
database Given any object, returns its database name.
maximum Given a list of numbers, returns the largest one.
minimum Given a list of numbers, returns the smallest one.
mean Given a list of numbers, returns their mean.
sum Given a list of numbers, returns their sum.
prod Given a list of numbers, returns their product.
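The count of 20 mapping functions given in the caption of Table 3 matches the number of ordered pairs of distinct object types among compounds, proteins, genes, reactions, and pathways (5 x 4 = 20). The Python sketch below enumerates the names that this pattern suggests. It is an inference from the naming pattern only: names that appear neither in the table nor in its caption (for example compound-to-genes) are assumed rather than confirmed, and extra synonyms listed in the table, such as enzyme-to-genes, are not generated.

```python
# The five object types named in the caption of Table 3.
KINDS = ["compound", "protein", "gene", "reaction", "pathway"]


def mapping_names(kinds):
    """Return {to_form: of_form} for every ordered pair of distinct kinds."""
    names = {}
    for source in kinds:
        for target in kinds:
            if source != target:
                # 'to' form, e.g. reaction-to-genes; 'of' form, e.g. genes-of-reaction
                names[f"{source}-to-{target}s"] = f"{target}s-of-{source}"
    return names


names = mapping_names(KINDS)
print(len(names))
print(names["reaction-to-genes"])
print(names["compound-to-reactions"])
```

Each 'to' name and its 'of' synonym designate the same function, so the two forms together still amount to 20 functions rather than 40.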