Ignite .NET supports free-form SQL queries virtually without any limitations. SQL syntax is ANSI-99 compliant. You can use any SQL function, any aggregation, any grouping and Ignite will figure out where to fetch the results from.
Ignite supports collocated and non-collocated distributed SQL joins. Moreover, if data resides in different caches, Ignite allows for cross-cache joins as well.
Joins between PARTITIONED
and REPLICATED
caches always work without any limitations. However, if you do a join between two PARTITIONED
data sets, then you must make sure that the keys you are joining on are either collocated or you enabled non-collocated joins parameter for the query.
See example SqlQuery JOIN below.
Instead of selecting the whole object, you can choose to select only specific fields in order to minimize network and serialization overhead. For this purpose Ignite has a concept of fields queries
. Also it is useful when you want to execute some aggregate query.
See example SqlFieldsQuery below.
You can query data from multiple caches. In this case, cache names act as schema names in regular SQL. This means all caches can be referred by cache names in quotes. The cache on which the query was created acts as the default schema and does not need to be explicitly specified.
See example Cross-Cache SqlFieldsQuery.
var cache = ignite.GetOrCreateCache<int, Person>("personCache");
var sql = new SqlQuery(typeof(Person), "Salary > ?");
var cursor = cache.Query(sql);
foreach (var cacheEntry in cursor)
Console.WriteLine(cacheEntry.Value);
var cache = ignite.GetOrCreateCache<int, Person>("personCache");
// SQL join on Person and Organization.
var sql = new SqlQuery(typeof(Person), "from Person as p," +
"\"orgCache\".Organization as org" +
"where p.OrgId = org.Id " +
"and lower(org.Name) = lower(?)");
// Find all persons working for Ignite organization.
foreach (var cacheEntry in cache.Query(sql))
Console.WriteLine(cacheEntry.Value);
var cache = ignite.GetOrCreateCache<int, Person>("personCache");
// Execute query to get names of all employees.
var sql = new SqlFieldsQuery(
"select concat(FirstName, ' ', LastName) from Person as p");
// Iterate over the result set.
foreach (var fields in cache.QueryFields(sql))
Console.WriteLine("Person Name = {0}", fields[0]);
// In this example, suppose Person objects are stored in a
// cache named 'personCache' and Organization objects
// are stored in a cache named 'orgCache'.
var personCache = ignite.GetOrCreateCache<int, Person>("personCache");
// Select with join between Person and Organization to
// get the names of all the employees of a specific organization.
var sql = new SqlFieldsQuery(
"select p.Name " +
"from Person as p, \"orgCache\".Organization as org where " +
"p.OrgId = org.Id " +
"and org.Name = ?", "Ignite");
foreach (IList fields in personCache.QueryFields(sql))
Console.WriteLine("Person Name = {0}", fields[0]);
By default, if an SQL join has to be done across a number of Ignite caches, then all the caches have to be collocated. Otherwise, you will get an incomplete result at the end of query execution because at the join phase a node uses the data that is available only locally. Referring to Picture 1. below you will see that, first, an SQL query is sent to all the nodes (Q
) where data, required for a join, is located. After that the query is executed right away by every node (E(Q)
) over the local data set and, finally, the overall execution result is aggregated on the client side (R
).
Picture 1. Collocated SQL Query.
Besides the fact that the affinity collocation is a powerful concept that, once set up for an application's business entities (caches), will let you execute cross-cache joins in the most optimal way by returning a complete and consistent result set, there is always a chance that you won't be able to collocate all the data. Thus, you may not be able to execute the whole range of SQL queries that are needed to satisfy your use case.
The non-collocated distributed joins have been designed and supported by Apache Ignite for cases when it's extremely difficult or impossible to collocate all the data but you still need to execute a number of SQL queries over non-collocated caches.
❗️Don't overuse the non-collocated distributed joins based approach in practice because the performance of this type of joins is worse than the performance of the affinity collocation based joins due to the fact that there will be much more network round-trips and data movement between the nodes to fulfill a query.
When the non-collocated distributed joins setting is enabled for a specific SQL query with the SqlQuery.setDistributedJoins(boolean)
parameter, then, the node to which the query was mapped will request for the missing data (that is not present locally) from the remote nodes by sending either broadcast or unicast requests. This is depicted on Picture 2. below as a potential data movement step (D(Q)
). The potential unicast requests are only sent in cases when a join is done on a primary key (cache key) or an affinity key, since the node performing the join knows the location of the missing data. The broadcast requests are sent in all the other cases.
Picture 2. Non-collocated SQL Query.
👍Neither broadcast nor unicast requests, that are sent by one node to another in order to get the missing data, are executed sequentially. The SQL engine combines all the request into batches. This batch size can be managed using
SqlQuery.setPageSize(int)
parameter.
Refer to the non-collocated distributed joins blog post for more technical details.
const string orgName = "Apache";
var qry = cache.Query(new SqlQuery("Employee",
"from Employee, \"dotnet_cache_query_organization\".Organization " +
"where Employee.organizationId = Organization._key and Organization.name = ?", orgName)
{
EnableDistributedJoins = true
});
Console.WriteLine(">>> Employees working for " + orgName + ":");
foreach (var entry in qry)
Console.WriteLine(">>> " + entry.Value);
Indexes can be configured by marking cacheable type members with QuerySqlFieldAttribute
and QueryTextFieldAttribute
. These types should be passed to CacheConfiguration(string name, params Type[] queryTypes)
constructor, or one of the QueryEntity constructors.
var cfg = new IgniteConfiguration
{
CacheConfiguration = new[]
{
// Configure queries for a cache with Person values (cache keys are not indexed)
new CacheConfiguration("personCache", typeof(Person)),
new CacheConfiguration
{
QueryEntities =
{
// Configure indexing for both keys and values
new QueryEntity(typeof(int), typeof(Company))
}
}
}
};
To make fields or properties accessible for SQL queries you have to mark them with [QuerySqlField]
. Property Age
will not be accessible from SQL. Note that none of these properties are indexed.
public class Employee
{
[QuerySqlField]
public string Name { get; set; }
[QuerySqlField]
public long Salary { get; set; }
public int Age { get; set; }
}
📘Field Names and Serialization
Marking field or property with
[QuerySqlField]
automatically addsQueryField
to the correspondingQueryEntity
. You have to make sure that SQL field name and serialized field name are the same.For example, default Ignite reflective serializer operates on fields. So if you follow standard naming convention, your backing field will be
_name
, and that name is used for serialization. So you have to mark that field, and not a property, with[QuerySqlField]
, and use_name
in SQL queries.With automatic properties, though, it just works, because Ignite recognizes them and trims backing field specifier.
📘Predefined Fields
In addition to all the fields marked with
[QuerySqlField]
attribute, each table will have two special predefined fields-_key
and_val
that represent links to whole key and value objects. This is useful, for example, when one of them is a primitive and you want to filter by its value. To do this, execute a query likeSELECT * FROM Person WHERE _key = 100
.
To make fields not only accessible by SQL but also speedup queries you can index field values. To create a single column index you can annotate field with [QuerySqlField(IsIndexed = true)]
attribute.
public class Employee
{
// Index in ascending order
[QuerySqlField(IsIndexed = true)]
public string Name { get; set; }
// Index in descending order
[QuerySqlField(IsIndexed = true, IsDescending = true)]
public long Salary { get; set; }
// Enable field in SQL, but don't index
[QuerySqlField]
public int Age { get; set; }
}
To have a multi-field index to speedup queries with complex conditions, you can use QuerySqlField.IndexGroups
property. It is possible to put multiple groups into IndexGroups array if you want the field to participate in more than one group index.
For example of a group index in the class below we have property Age
which participates in a group index named "age_salary_idx"
with descending sort order. Also in the same group index there is a property salary
with ascending sort order. On top of that field salary
itself is indexed with single column index.
public class Employee
{
[QuerySqlField(IsIndexed = true, IndexGroups = new[] {"age_salary_idx"}, IsDescending = true)]
public int Age { get; set; }
[QuerySqlField(IsIndexed = true, IndexGroups = new[] {"age_salary_idx", "salary_idx"})]
public long Salary { get; set; }
}
Indexes and fields can also be configured with Apache.Ignite.Core.Cache.Configuration.QueryEntity
either in code, in app config, or in Spring XML. It is equivalent to attribute-based configuration because attributes are converted to query entities internally.
var cfg = new IgniteConfiguration
{
CacheConfiguration = new[]
{
new CacheConfiguration
{
QueryEntities = new[]
{
new QueryEntity
{
KeyType = typeof(int),
ValueType = typeof(Employee),
Fields =
{
new QueryField {Name = "Name", FieldType = typeof(string)},
new QueryField {Name = "Salary", FieldType = typeof(long)},
new QueryField {Name = "Age", FieldType = typeof(int)}
},
Indexes =
{
new QueryIndex("Name"),
new QueryIndex
{
Fields =
{
new QueryIndexField {Name = "Salary"},
new QueryIndexField {Name = "Age", IsDescending = true}
},
IndexType = QueryIndexType.Sorted,
Name = "age_salary_idx"
}
}
}
}
}
}
};
<cacheConfiguration>
<queryEntities>
<queryEntity keyType='System.Int32' valueType='Apache.Ignite.ExamplesDll.Binary.Employee, Apache.Ignite.ExamplesDll'>
<fields>
<queryField name='Name' fieldType='System.String' />
<queryField name='Salary' fieldType='System.Int64' />
<queryField name='Age' fieldType='System.Int32' />
</fields>
<indexes>
<queryIndex>
<fields>
<queryIndexField name='Name' />
</fields>
</queryIndex>
<queryIndex name='age_salary_idx' indexType='Sorted'>
<fields>
<queryIndexField name='Salary' />
<queryIndexField name='Age' isDescending='true' />
</fields>
</queryIndex>
</indexes>
</queryEntity>
</queryEntities>
</cacheConfiguration>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="mycache"/>
<!-- Configure query entities -->
<property name="queryEntities">
<list>
<bean class="org.apache.ignite.cache.QueryEntity">
<property name="keyType" value="Long"/>
<property name="valueType" value="Employee"/>
<property name="fields">
<map>
<entry key="name" value="java.lang.String"/>
<entry key="age" value="java.lang.Integer"/>
<entry key="salary" value="java.lang.Long "/>
</map>
</property>
<property name="indexes">
<list>
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg value="name"/>
</bean>
<!-- Group index. -->
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg>
<list>
<value>age</value>
<value>salary</value>
</list>
</constructor-arg>
<constructor-arg value="SORTED"/>
</bean>
</list>
</property>
</bean>
</list>
</property>
</bean>
🚧Make sure to use assembly-qualified type names in app.config
Since SQL queries are executed in Java by H2 engine, Ignite.NET maps .NET types to Java types via paired properties in QueryEntity
and QueryField
:
QueryEntity.KeyType
and QueryEntity.KeyTypeName
QueryEntity.ValueType
and QueryEntity.ValueTypeName
QueryField.FieldType
and QueryField.FieldTypeName
Type properties are .NET types, and TypeName properties are Java type names. Type properties set TypeName properties automatically, but NOT vice versa.
DateTime
is mapped to Timestamp
. All DateTime
values used in queries must be UTC.sbyte
, ushort
, uint
, ulong
are not present in Java and are mapped to byte
, short
, int
, long
using bitwise conversion (SQL may not work as expected for out-of-range values).🚧DateTime and SQL
DateTime
can be Local and UTC; JavaTimestamp
can only be UTC. Because of that, Ignite.NET can serializeDateTime
in two ways: .NET style (can work with non-UTC values, does not work in SQL) and asTimestamp
(throws exception on non-UTC values, works properly in SQL).Reflective serialization: mark field with
[QuerySqlField]
to enforceTimestamp
serialization.
IBinarizable
: useIBinaryWriter.WriteTimestamp
method.When it is not possible to modify class to mark fields with
[QuerySqlField]
or implementIBinarizable
, useIBinarySerializer
approach. See Serialization for more details.
🚧SQL Date Functions
Date and Time SQL functions (such as
HOUR
) produce result according to current time zone, which may provide unexpected results, given that everything else is UTC. To force UTC for SQL functions, use-Duser.timezone=UTC
JVM option (viaIgniteConfiguration.JvmOptions
).
There are two main ways of how query can be processed in Ignite:
If you execute the query against REPLICATED
cache, Ignite assumes that all data is available locally and runs a simple local SQL query in H2 database engine. The same will happen for LOCAL
caches.
If you execute the query against PARTITIONED
cache, it works the following way: the query will be parsed and split into multiple map queries and a single reduce query. Then all the map queries are executed on all data nodes of participating caches, providing results to reducing node, which will in turn run reduce query over these intermediate results.
Ignite supports "EXPLAIN ..." syntax in SQL queries and reading execution plans is a main way to analyze query performance in Ignite. Note that plan cursor will contain multiple rows: the last one will contain query for reducing node, others are for map nodes.
var sql = new SqlFieldsQuery("explain select name from Person where age = ?", 26);
foreach (var fields in cache.QueryFields(sql))
Console.WriteLine(fields[0]);
The execution plan itself is generated by H2 as described here:
http://www.h2database.com/html/performance.html#explain_plan
When developing with Ignite sometimes it is useful to check if your tables and indexes look correctly or run some local queries against H2 database embedded in the node. For that purpose Ignite has an ability to start H2 Console. To do that you can start a local node with IGNITE_H2_DEBUG_CONSOLE
system property or environment variable set to true
. The console will be opened in your browser. You may have to click Refresh
button on the Console because it can be opened before database objects are initialized.
When SQL query fails (Failed to parse query
and other exceptions), make sure to examine the InnerException
property: it contains full error message from Ignite SQL engine with details on what exactly has failed. You can do that in Visual Studio debugger or by calling ToString()
on the exception object:
try
{
IQueryCursor<List> cursor = cache.QueryFields(query);
}
catch (IgniteException e)
{
Console.WriteLine(e.ToString());
}
Ignite supports placing index data in off-heap memory. This makes sense for very large datasets since keeping data on heap can cause high GC activity and unacceptable response times.
By default, Ignite stores SQL Indexes on heap. Ignite will store query indexes in off-heap memory if CacheConfiguration.setMemoryMode
is configured to one of the off-heap memory modes - OFFHEAP_TIERED
or OFFHEAP_VALUES
, or CacheConfiguration.setOffHeapMaxMemory
property is set to a value >= 0.
To improve the performance of SQL queries with off-heap enabled, you can try to increase the value of CacheConfiguration.sqlOnheapRowCacheSize
property. The default value for this property is 10000
.
There are multiple things you should consider when choosing indexes for your Ignite application.
❗️It is a bad strategy to index everything!
📘Example of Sorted Index
| A | B | C |
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |Any condition like
a = 1 and b > 3
can be viewed as a bounded range, both bounds can be quickly looked up in index in log(N) time, the result will be everything between.The following conditions will be able to use the index:
a = ?
a = ? and b = ?
a = ? and b = ? and c = ?
Condition
a = ? and c = ?
is no better thana = ?
from the index point of view.
Obviously half-bounded ranges likea > ?
can be used as well.
There are few common pitfalls that should be noticed when running SQL queries.
If the query is using operator OR then it may use indexes not the way you would expect. For example for query select name from Person where sex='M' and (age = 20 or age = 30)
index on field age
will not be used even if it is obviously more selective than index on field sex
and thus is preferable. To workaround this issue you have to rewrite the query with UNION ALL (notice that UNION without ALL will return DISTINCT rows, which will change query semantics and introduce additional performance penalty) like select name from Person where sex='M' and age = 20 UNION ALL select name from Person where sex='M' and age = 30
. This way indexes will be used correctly.
If query contains operator IN then it has two problems: it is impossible to provide variable list of parameters (you have to specify the exact list in query like where id in (?, ?, ?)
, but you can not write it like where id in ?
and pass array or collection) and this query will not use index. To work around both problems you can rewrite the query in the following way: select p.name from Person p join table(id bigint = ?) i on p.id = i.id
. Here you can provide object array (object[]) of any length as a parameter and the query will use index on field id
.
Updated over 4 years ago
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4