Data Source Overview

A data source represents access via JDBC to a back-end database server. Each VDB definition must have at least one default data source associated with it, which will be used if no policy dictates what database should receive a query. Source configuration includes the connection properties, connection pooling, and Load balancing and cluster management characteristics.

The driver field allows either a new or existing JDBC driver to be specified. Heimdall comes with a small library of JDBC drivers that it has been tested with, but other drivers can be configured in the driver tab as needed. When using the proxy functionality of a VDB, the tested drivers should in general be used, as some internal and version specific features of a driver are often used by the proxy to provide compatibility with the binary layer translation that occurs.

The JDBC URL configuration will normally be build through the configuration wizard, or can be copied from an existing JDBC application. Properties on the JDBC URL can be moved to the name-value list in the connection properties, OR can remain in the JDBC URL–either works, although using the name-value list provides an easier to view list vs. a long URL. When internally doing database type detection, the JDBC url is also used to determine what type of database is in place, for per-database behavior adjustments.

The data source username and password are used for several purposes:

  1. For health checking the data source;
  2. When performing lag detection, to write to and read from the Heimdall schema;
  3. When doing cluster detection, reading cluster variables as appropriate, or when Heimdall needs to extract metadata for other purposes;
  4. When no user or password is specified in JDBC mode, this user will be used to connect to the database;
  5. In proxy mode, if proxy authentication is not provided. In this case, any user can connect, but the DB connection will be made with these credentials.

In some cases, it is necessary to specify various parameters in a dynamic way, i.e. with the Odoo application, it changes the active database at runtime, and connects to the desired database as desired. To support this, fill in the following values as necessary:

  • ${catalog} or ${database} to specify the name of the database (either will work)
  • ${schema} for the specified schema
  • ${user} for the connection user
  • ${password} for the password specified on connect (only for Postgres and SQL Server)

The test query option is used in a variety of situations, including health checks, and verifying that a connection is still viable after an exception. It is also used in a variety of conditions when pooling is configured, in order to validate the connection while idle, etc.

When using PostgreSQL, an optional checkbox is also presented, Use DB Stats. When selected, Heimdall will attempt to pull per-minute statistics from the Postgres data tracking module pg_stats_statements, and will expose the information to the Analytics tab.

To validate that the options appear valid, please use the Test Connection button. This will initiate a connection through the vendor driver by the Heimdall management server, and acts as a general guide if the configuration appears valid. It is not a guarantee that under all conditions it is valid however.

Connection properties

Connection properites are database and driver specific. When in proxy mode, a default set of properties will be inherited by the source in order to provide compatibility with the proxy mode used. In JDBC cases, no default properties will be set by Heimdall. When using a known driver, all supported connection properties will be provided in a drop-down list, and tool-tip help provided for each one when selected. When using an unknown driver, the property name field will allow an arbitrary name to be specified and value provided, as per your JDBC driver's documentation.

In order to add a custom property (in the event it isn't in the drop-down list), you can use the custom property checkbox next to the blue plus, to provide a free-form field.

Special properties:

  • dbTriggerTracking, dbTimeMs and dbChanges: Please see the Cache Theory section.
  • initSQLAsync: Please see the Async Theory section.
  • hdConnectRetryCount: Specifies the number of times a connection attempt will be made, defaults to 3.
  • hdConnectRetryDelay: The length of time between retry attempts. This value doubles on every retry for a given connection. Defaults to 0 (ms).
  • hdConnectTimeout: How long each attempt to connect will be made in ms. This option sets database driver specific values based on what is supported. Defaults to 1000(ms).
  • hdUsePgNotify: Instead of using a cache's pub/sub interface, when using Postgres, use the postgres notify interface for cache invalidation and read/write split last write time notifications. Please note, this still requires a cache to be enabled, although it can be the "local only" cache option.
  • hdNotifyDBs: When using hdUsePgNotify, by default only the default catalog is listend to. This option allows one or more catalogs to be listened to, for trigger based invalidation. A value of * means "all catalogs", with comma separated values starting with "-" removing from the complete list. Example: *,-template0,-template1 would be a generally good starting point.
  • azureDbHost: Please see the [Azure] (../environment/azure.md) documentation for more details.
  • dbTriggerTracking, dbChanges, dbTimeMs: Options to enable query based table change tracking, leveraging triggers. See the caching page for more details.

Please see the UI for a more complete list, including vendor options, as the drop-down list provides context sensitive help on each option provided.

Connection Pooling

The pooling section controls if connection pooling (the reuse of back-end connections for many front-side connections) is used internally to Heimdall, and if so, what the connection pooling properties are. Heimdall implements a variant of the Tomcat connection pool, and where appropriate uses the same properties for configuration.

The primary difference between the Heimdall and Tomcat connection pool is that the Heimdall pool implements it as a "pool of pools" where each user/database (catalog) combination becomes a pool within the connection pool, but overall resources are constrained by the overall maxActive option, which dictates the overall number of busy AND idle connections that can be established to the data source. Additional properties of maxUserActive and maxUserIdle dictate how many active (busy+idle) and idle connections are allowed on a per-catalog+user combination. This provides much more granular control over the behavior of the connection pool when many databases and users are connecting to the database server.

The properties section provides a drop-down list of the available properties, and tool-tip help for them. The available properties are:

Important queries--in general these should be set in all cases:

  • maxAge: The chances of a connection being closed when being returned to the pool (1 out of X) or during an eviction run if idle. Old behavior was the Time in milliseconds to expire the connection on return or idle (Long, 20). connections less than the timeBetweenEvictionRunsMillis will not be culled at random, so if there is activity on the connection at least that often, it will remain connected.
  • maxActive: The maximum number of open connections (Integer, no default, however an attempt is made to detect the value from the server)
  • maxUserActive: The maximum number of open connections for a particular user, can be overridden with a rule (Integer, 0)
  • maxUserIdle: The max number of idle connections for a particular user, can be overridden with a rule (Integer, 0)
  • maxWait: The time to wait in milliseconds before a SQLException is thrown when a connection is requested, but can't be honored due to the number of connections (Integer, 30000)
  • resetQuery: The query to run after a connection is returned to the pool (String, no default)
  • testOnReturn: True if validation happens when a connection is returned, implied with a reset query (Boolean, false)
  • validationQuery: The query to run during validation (String, select 1)
  • rateLimit: Total connections/ms that can be opened, (double, 2)
  • maxBurst: Number of ms of burst that can be allowed before clamping occurs (double, 50)

Other properties:

  • abandonWhenPercentageFull: Connections that have been abandoned isn't closed unless connections in use are above this percentage (Integer)
  • commitOnReturn: If autoCommit!=null and autocommit is not set then the pool can complete the transaction by calling commit on the connection as it is returned to the pool (Boolean, false)
  • rollbackOnReturn: If autoCommit!=null and autocommit is not set then the pool can terminate the transaction by calling rollback on the connection as it is returned to the pool (Boolean, true)
  • connectionProperties: The connection properties that will be set for new connections. Format of the string will be [propertyName=property;]* (String)
  • defaultAutoCommit: The default auto-commit state of connections created by this pool. If not set, default is JDBC driver default (If set to "null" then the setAutoCommit method will not be called.) (boolean, true). When the connection is returned to the pool, this state will be reset if necessary.
  • defaultCatalog: (String) The default catalog of connections created by this pool.
  • defaultReadOnly: (boolean) The default read-only state of connections created by this pool. If not set then the setReadOnly method will not be called. (Some drivers don't support read only mode, ex: Informix)
  • defaultTransactionIsolation: (String) The default TransactionIsolation state of connections created by this pool. One of the following: NONE,READ_COMMITTED,READ_UNCOMMITTED,REPEATABLE_READ,SERIALIZABLE. If not set, the method will not be called and it defaults to the JDBC driver.
  • initSQL: A SQL executed once per connection, when it is established (String)
  • logAbandoned: If true, stack trace will be recorded and printed out for timed out connection (Boolean)
  • logValidationErrors: Log errors during the validation phase to the log file (Boolean, false)
  • minEvictableIdleTimeMillis: Minimum amount of time a connection stays idle before it is evicted (Integer, 60000)
  • removeAbandoned: True if connection in use can be timed out (Boolean)
  • removeAbandonedTimeout: Timeout in seconds for connections in use (Integer, 60)
  • suspectTimeout: Timeout in seconds for connection that suspected to have been abandoned (Integer)
  • testOnBorrow: True if validation happens when a connection is requested (Boolean)
  • testOnConnect: Validate connection after connection has been established (Boolean)
  • testWhileIdle: True if validation happens when a connection is not in use (idle) (Boolean)
  • timeBetweenEvictionRunsMillis: Sleep time for background thread in between pool checks (Integer, 5000)
  • validationInterval: If larger than zero than validation will only occur after the interval milliseconds has passed (Long, 3000)
  • validationQueryTimeout: The timeout in seconds before a connection validation/reset queries fail (Integer)

For additional information about each pool parameter, please see the Tomcat pool attributes page (https://tomcat.apache.org/tomcat-9.0-doc/jdbc-pool.html#Attributes).

Warning: Not all applications will be compatible with connection pooling. If issues are observed in initial application testing, it is recommended that disabling connection pooling be performed in order to isolate if this feature is triggering undesirable behaviors in the application. If using the connection pooling, the default number of connections is set to 1000. To adjust this, change the “maxActive” setting to the desired number of connections.

Load balancing & High Availability

Due to the complexity of this topic and how the options interact with other subsystems, please see the help section dedicated to Load Balancing for more details.

Configuring Trigger Based Invalidation

In order to support out-of-band data ingestion into the database, a data source can be configured for "Trigger based invalidation". There are several options available on the data source for this configuration:

  • dbTriggerTracking: set to true if this feature is to be enabled
  • dbTimeMs: A query to retrieve the current time in MS on the database, to insure timestamps are synchronized
  • dbChanges: A query to retrieve a timestamp (as a long) and table name (as a string), for the last invalidation time of that table.

The dbTimeMs call for example defaults to the value of "SELECT heimdall.get_time_ms()", usable on MySQL without any changes. The dbChanges call defaults to {CALL heimdall.changes(@ts)}. This can be a stored procedure. The string "@ts" will be replaced automatically with the last modification (per the DB time) that has been observed, so can be used to pull only the tables that have been modified since the last invalidation observed. The table name returned needs to be fully qualified, as observed with the "printTables" option for the rules, as this is the table name we will be comparing for invalidation purposes.