Rule Overview

A rule list is a set of rules that is processed in order on queries, and dictates the processing done on a particular query and its response. A rule is composed of several components:

  • #: A current row indicator (for moving rows) and the index of the rule.
  • Enabled: If the rule is enabled. Non-enabled rules will not be processed.
  • Comment Indicator: When a comment is attached to a rule, an icon will be visible, and the comment can be viewed by hovering the mouse over the icon
  • Regular Expression: See below for regular expression behavior
  • In-Trans Flag: This specifies if a rule should operate on a query that is in the middle of a transaction, i.e. it is not in auto-commit mode in an explicit transactional context (between "Start Transaction" and "commit" or "rollback")
  • Action: What action should be taken on a regex match
  • Parameter: A parameter that modifies the behavior of the action
  • Value: The value of the parameter, e.g. to specify the maximum time to live for a cache rule
  • Edit: An icon to provide an expanded view of the rule, and to allow editing of comments
  • Delete: Toggle deletion of a rule on the next commit

To re-order a rule, it can simply be dragged into place in the desired order.

Hint: To better edit a rule, make sure to use the pencil icon to open up the expanded window, which provides more editing options.

Regular Expression Field Behavior

Each rule has a field that can contain either a re2j 1.1 (https://github.com/google/re2j) regular expression, or an extended specifier (below). The re2j regular expression language is nearly identical to normal Java regular expressions, except can operate in linear time, while Java regular expressions may end up being unbounded in time. As a tradeoff, certain features dependent on backtracking are removed.

Examples of some simple regular expressions:

  • "(?i)^select" Case insensitive match all queries that start with "select" at the start of the line
  • "(?i)somestring" Case insensitive match queries with the string "sometable" anywhere in the query

In addition to regular expressions, the following extended specifiers can be used instead:

  • tables:{table1},{table2}..., to match any one table in the named list.
  • tablesall:{table1},{table2}..., match only if all the tables in the list are present in the query tables.
  • tablesexactly:{table1},{table2}..., match only if every table in the list is present in the query tables, and no other tables are present, i.e. an exact match between table lists.
  • users:{user1},{user2}..., match if the query's uses matches one of the users in the list (as regex).
  • ldapgroups:{group1},{group1}..., Match if any of the named groups matches a group the connected user is a member of (as regex)
  • catalog:{catalog1},{catalog2}..., match if the query catalog matches any catalog in the list (as regex)
  • catalogprefix:{prefix1},{prefix2}..., match if the query catalog starts with any string in the list
  • appname:{application_name1},{application_name2}..., match if the postgres application name matches (as regex)
  • ports:{port1},{port2}..., match the port the query was received on, via the vdb. This allows multiple ports to be used on the vdb, and have a different behavior based on which port was used.
  • ips:{ip1},{ip2}..., match the IP the query was received from, via the vdb. Subnet matching is not currently supported. If complex IP based rules are desired, it is suggested that an additional port be added to the proxy, the ports qualifier be used, and firewall rules can be used to control what IP ranges have access to this new port with the desired configured behaviors.

Note for extended specifiers: The fully qualified catalog.table name is expected, so if using the "magento" database with a table of "users", you would match against "magento.users". Please note the dynamic parameters below for substitutions that are supported here as well. To verify that the table name is as expected, use the "printtable" option, view the query in the expanded analytics tab, or use the debug option to view the fully qualified table name as used internally. Alternatively, use the "expanded query view" in the analytics, and then scroll down to the table names:

To open the expanded query view:

And to view the fully-qualified table names, in the expanded view, scroll down to the tables section before any query plans:

Additionally, you can use the string "${catalog}" to match the current connections catalog in processing a rule, so as to allow matching on multiple catalogs as appropriate.

Examples of some extended regular expressions:

  • "tables:magento.core_store" to match on the table magento.core_store only, but any other table may exist as well
  • "tables:${catalog}.core_store" to match on the table core_store only, but in any catalog, not just the magento catalog
  • "tablesexactly:magento.core_store" to match on the table magento.core_store only, and no other tables are in the query

When using the expanded rule view, it is possible to add more than one regex field to a rule, and then specify if AND or OR logic should be used to connect them. This allows for example, one to specify a tables: specifier, then to filter say INSERT, UPDATES or DELETES only against that table.

Rule Processing Behavior

All rules are processed in order of evaluation. This means that a rule that matches earlier in processing can be overridden later in processing by the same rule type.

In order to process a rule, the following steps are taken:

  • Check if rule list is enabled
  • Iterate through rules
  • Check if a rule is enabled
  • Check if the in-transaction flag is set when in a transaction
  • Check if the rule type should be excluded for processing, due to a previous ignore rule
  • Check if the regular expression matches (see below)
  • Process the capture parameter, if set, for dynamic parameters
  • Check if the rule rate matching limits are allowed
  • Process the update, printmatch, printtables, tables, and notify flags (although no action is taken at this point)
  • Process the rate limiting parameters (except delay) this is done only once, after caching, async and forwarding are executed)
  • Per-action rule processing

Note: In the case of multiple Allow rules in consecutive order, the rules internally will self-organize based on the frequency of hits observed by the driver. The more often a rule has been hit, the further up the list it will be positioned to optimize the regex lookup performance. Any other type of rule will be processed in the order specified, and no Allow rule will optimize around another other rule type.

Special case: The nocache rule behaves as both a "cache with a ttl of 0" and "ignore all other cache rules following", i.e. it prevents all past and future cache rules from impacting the query. Most rules can be overridden, but for safety, the nocache rule can not be overridden once it matches a particular pattern.

Rule Flow Control & Dynamic Parameters

Several actions and properties can be used to control the processing of rules for more advanced applications:

  • Ignore Action: When an ignore action is specified, the name of an action type can be specified. Any rule matching that action type will be ignored in any further processing of the matching query. This can be used to create exclusions where a particular action shouldn't be taken, but creating a regular expression to account for this would be difficult.
  • Call Action: A call can be used to call by name a nested rule-list, to allow one rule-list to be called from another rule-list. This can be leveraged to provide a common set of rules for an application, but custom behaviors for particular instances. Another use case is to use a single regular expression match to apply multiple actions to a given set of content, as in the called rule, an empty regex can be used.
  • Stop Parameter: When specified on a matching rule, no further processing will occur in the current rule-list for the given query. If in a nested rule-list due to a Call action, the calling rule-list will continue to be processed.
  • Capture Parameter: This parameter triggers dynamic property evaluation (this is not the default). When set, regular expression capture groups are used to edit the parameter values for the query, in order to dynamically adjust behavior. A common use would be to include a comment at the front of a query, and then use the value in the comment part of or a parameter's value, for example, to dynamically set the TTL of a query based on the comment's included value: Example:

    Regex: /* (.*) */

    Action: Cache, ttl=${1}

    Query: / ttl=30000 / SELECT * from Table

    Result: The query would be cached for 30 seconds

Similar to the capture parameter based on regular expression capture groups, some built-in strings have special meaning in properties, and will be replaced with the internal values during rule processing:

  • ${catalog} or ${database}: the current catalog/(database in MySQL term) being used.
  • ${vdb}: The VDB the traffic was received on
  • ${user}: The JDBC user on a connection

Note: These variables can be used when using the extended table matching syntax, such as "tables:${catalog}.tablename" to avoid having to specify the actual catalog to match against. There is no need to specify the "capture" parameter for these to operate in this context.

Rate Limiting

Often, such as in the case of a "Log" rule action, it is desirable to limit the rate that a particular rule is matched to an absolute number. Each rule has several options that can be used for this:

  • matchlimit: Enforces a rate of X pattern matches, with up to Y seconds of burst. To specify the burst value, use the parameter "maxburst". Useful to limit the amount of log data generated in a similar manner to using the sample parameter.
  • maxburst: Specify the number of seconds of bursting that is allowed before clamping the rate for the ratelimit or matchlimit parameters. Defaults to 10.

Another case is that a rule should be evaluated only a certain percentage of the time, say 1 out of 10. The following parameters can be used for this:

  • exclude: A ratio of queries should NOT match the policy, as expressed as 1/X. Useful to allow a small number of objects that would otherwise be a cache hit to be a cache miss instead. Example: A value of 10 would exclude 1 out of every 10 matches from actually matching the rule.
  • sample: A ratio of queries should match the policy, as expressed as 1/X. Useful to limit logging to a small percentage of traffic. Example: A value of 10 would mean that 1 out of 10 matches would be treated as a match.

A final case is where queries matching a rule should themselves be slowed down, say to help prevent an outage of all traffic due to a DoS attack. The following parameters can be used in such cases:

  • delay: When a rule with a delay flag matches, induce a delay in processing. Useful to determine if a rule can have an overall impact on performance, i.e. if a delay doesn't make a noticeable difference, then caching won't either. Also useful in inducing delays to determine if moving the database further away will impact performance significantly. This delay will occur only if the query is requested from the database, i.e. it will not impact a cached query.
  • ratelimit: Enforces a rate of X queries per second, with up to Y seconds of burst. To specify the burst value, use the parameter 'maxburst'. This will actually SLOW DOWN queries matching this rule to at most this value.

Rule Details

  • For Any Rule:
    • capture: Specifies that capture groups should be used from the matched regex, to fill in values in the parameters, ${1} for the first, ${2}, for second, etc
    • delay: Adds X microseconds to each query, to help model the impact of latency between the db client and server, for hybrid cloud impact testing
    • evicttables: Tables that should be flagged as evicted when a query matches the rule
    • exclude: Exclude matching the rule 1/X times, to allow a small sample of queries to match
    • invalidate: Can be specified to override invalidation behavior of writes for matching queries, default is to allow invalidations to occur.
    • log: When specified, behave as if a log rule had matched, used to consolidate rules
    • logger: Specify the log type, options are "hdlog" or "log4j". Multiple values can be specified, separated by a ,
    • matchdelta: Specifies the minimum time between matches on a per-proxy basis, overrides matchlimit
    • matchlimit: Specifies the frequency (in X per second) that a rule should match
    • maxburst: When using matchlimit or ratelimit, specify the number of seconds worth of bursting to allow before limiting the rate
    • notify: If specified as true, flag that the notification configuration should be used to notify on a match
    • notrack: When specified, don't track the performance of this query for statistics records. Useful to "hide" overhead queries that aren't impacted by performance
    • olderthan: specify the time window in ms to allow content to not be evicted on eviction. Negative prevents caching into the future.
    • printcapture: When a rule matches, print capture groups of the rule match
    • printmatch: When a rule matches, print the rule that matched
    • printmeta: Print the resultset metadata for results returned
    • printresults: Prints to stdout the result of matching queries
    • printtables: If specified, print the table attributes to be used for caching & invalidation
    • printtiming: Prints to stdout the execution time of this rule, useful for profiling called rulelists, such as for the SQL Firewall
    • ratelimit: Limits matching queries to X per second. This is tracked on a per-rule basis, so ratelimit and matchlimit can interact on the same rule
    • sample: Match the rule only 1/X times, often to reduce frequency of logging, but remain proportionate to load
    • stop: If specified on a matching rule, no further rule processing in the current rule list will occur
    • tables: For any matching rule, specifies the tables that should be added to a query's table list. Use "none" to allow caching without tables. Newlines and whitespace are acceptable between table names, but not in table names.
    • update: Specifies if the matching queries should be processed as updates by the cache engine
    • queryTimeout: Query execution timeout
  • Async Execute: Execute insert/update/delete asynchronously, not supported by MySQL proxy
    • asyncsize: The maximum length of a query (in characters) that can be made asynchronous
    • batchSize: The maximum batch size of queries to be executed--queries will not be delayed to fill the queue, but this sets the maxiumim size of
    • maxPollInterval: The maximum time (in ms) to poll after an async query is received before a batch is processed, defaults to 0 for no delay the batch.
    • source: The name of the data source to forward the query to for async execute. This source must be specified in the VDB definition.
    • queueName: The per-source named async queue to process the query from. Defaults to the name of the data source the query is sent to. Set to ${table} to allow dynamic replacement of the table name (if only one).
    • holdUntil: When to hold the thread until
    • spoofedResult: The result the DML will return if the result is spoofed (only for immediate holdUntil values).
    • lockTable: Lock tables of async DML while the DML is pending, to serialize access to the table (default to TRUE)
  • Nocache: Disable caching for a query, overrides previous cache rules as well as prevents later cache rules from being evaluated
  • Cache: Use the cache configuration for the VDB to cache matching queries, if possible
    • ttl: Specifies the TTL in ms of objects cached, if not evicted due to a write first
    • cachesystem: True if Heimdall should cache system tables, and not just user tables, can break things!
    • hitonly: True if Heimdall should throw an exception if there is no cache entry
    • unconditional: if true, bypass many internal checks to insure a query is cacheable
    • maxLocalSize: Specify the largest size of a cachable object in local memory, defaults to Integer.MAX_VALUE
    • maxSize: Specify the largest size of a cachable object, defaults to Integer.MAX_VALUE
    • dmlThreshold: Specify the value (as integer percent) of DML operations that negate caching on a table (defaults to 5)
    • autoRefresh: Specify that queries may be automatically refreshed in cache if invalidated (but not on TTL expiration). Only a limit number of queries are tracked for this purpose, configurable on the vdb properties
  • Drop: If not allowed with an allow rule, drop the request, and generate SQL Exception
  • Result: Return an integer value as a result of a SQL call, i.e. for rows updated
  • Retry: Retry execution if exception occurs
    • maxRetries: Maximum number of attempts that will be made if none of them succeeds
    • exception: String that exception should contain
    • retryDelay: Delay between retry attempts
  • Forward: Allows queries to be sent to alternate data sources for execution
    • source: The name of the data source to forward the query to. This source must be specified in the VDB configuration.
    • setcatalog: If in forwarding, should we inherit/set the catalog from the parent connection on the forwarded connection.
    • setcatalog: If in forwarding, should we inherit/set the schema from the parent connection on the forwarded connection.
    • readonly: When forwarding, should the connection be allowed to read-only servers.
  • Call: Execute (call) a named rule list, must also be included in the VDB config to work
    • rule: The rule name to call/execute in place of the call rule itself, if matched.
  • Ignore: Ignore rules of the specified type in the current rule-list that occur after this rule
    • action: What type of action to ignore after matching
  • Log: Log the SQL observed in the query
    • logcolumns: List of comma separated column names/aliases to save in log records, based on the resultset metadata. Use '*' or 'all' for all columns. Note: HIGHLY impacting on performance.
    • slowtime: Response time in microseconds above which logging should be triggered.
  • Extract Plan: Automatically (periodically) retrieve query plan on queries
    • threshold: The response time threshold in ms after which the plan is requested, based on query pattern average response times
    • frequency: How many times the query plan can be requested per hour
    • format: What format to use when extracting an execution plan
    • force: By default Extract Plan works only for queries with result set from DB (for example, SELECT queries). To force explaining on any queries set the parameter to true.
    • checkTableMismatch: For SQL Server, check if tables in the explain match what is tracked internally by Heimdall
  • Learn Pattern: Generate a firewall learn action, to add a rule to a named rule list
    • rule: The name of the target rule to learn into.
    • action: When learning, specifies the default rule type to ignore, defaults to allow
  • Allow: Explicity allow a query to bypass the learn and drop rule types
  • Trigger: Trigger another SQL or script to execute as a result of a matching query
    • type: Specifies the type of trigger action to perform. If execute, then it will call the named program or script, and execute on the central manager (always). If SQL, it will execute on the proxy/driver unless in parallel, delay and queueunique is TRUE, then it will process on the central manager.
    • command: The command to execute, either as sql or as a script. Accepts a replacement value of ${tables} to represent the tables in the query.
    • timing: When to execute the command, either before, after, or in parallel to the triggering sql. With before, if an exception is encountered on the initial query, the trigger action is not performed. If an exception is encountered with after, then the exception will be reported to the calling thread. When sql is set to execute in parallel mode, it will execute on the central manager if delayed, on the proxy/driver if not, and exceptions will only be reported in the logs.
    • delay: When running in parallel to the triggering sql, how much time to delay processing (in ms), if any
    • maxDelay: When running in parallel, delayed and queueunique, how often should we run (at least) even if the trigger action keeps occurring, i.e. to insure data inserted is flushed to another db at least X times an hour
    • queueUnique: When performing delayed parallel queries, should the execution queue only maintain unique command executions, to prevent the same command from executing over and over later?
    • source: The name of the data source to execute a SQL query on. This source must be specified in the VDB configuration.
  • Stack Trace: Generate a Java stack trace in logs on match
    • threshold: Response time threshold for stack traces to trigger (in microseconds)
  • Transform: Change the query based on capture group manipulation
    • target: When translating a query, the target of the translation, with $1 representing the first capture group, $2 the second, etc.
  • Table Cache: Apply a TTL to a query based on if the table specified is referenced by the query. Lower TTL values override higher ones
    • ttl: Specifies the TTL of objects cached, if not evicted due to a write first
    • table: In table cache rules, specifies the name of the table to apply settings for
  • Tag: A no-op action, for use with universal parameters only
  • Debug: Allows to enable and disable debug mode for a connection
    • enabled: If true, enables driver debug logging at that point
  • Pool: Pool and connection multiplexing behavior overrides
    • maxUserActive: Sets the per-user active pool parameter (idle+busy), and overrides any default set in the source settings. Useful to allow individual users to have more connections than others, and can be set based on AD group, say to allow service accounts more connections than general users (Integer).
    • maxUserIdle: Overrides the per-user allowed idle connection limit (Integer).
    • multiplex: Override the vdb level multipex setting to either enable or disable multiplexing (Boolean).
    • reuse: Control if connections can be reused on return by pooling, defaults to true (Boolean). Only impacts the primary (writer) connection, not read/write split connections.