RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://apacheignite-net.readme.io/docs/sql-queries below:

Distributed Queries

Ignite .NET supports free-form SQL queries virtually without any limitations. SQL syntax is ANSI-99 compliant. You can use any SQL function, any aggregation, any grouping and Ignite will figure out where to fetch the results from.

Ignite supports collocated and non-collocated distributed SQL joins. Moreover, if data resides in different caches, Ignite allows for cross-cache joins as well.

Joins between PARTITIONED and REPLICATED caches always work without any limitations. However, if you do a join between two PARTITIONED data sets, then you must make sure that the keys you are joining on are either collocated or you enabled non-collocated joins parameter for the query.

See example SqlQuery JOIN below.

Instead of selecting the whole object, you can choose to select only specific fields in order to minimize network and serialization overhead. For this purpose Ignite has a concept of fields queries. Also it is useful when you want to execute some aggregate query.

See example SqlFieldsQuery below.

You can query data from multiple caches. In this case, cache names act as schema names in regular SQL. This means all caches can be referred by cache names in quotes. The cache on which the query was created acts as the default schema and does not need to be explicitly specified.

See example Cross-Cache SqlFieldsQuery.

var cache = ignite.GetOrCreateCache<int, Person>("personCache");

var sql = new SqlQuery(typeof(Person), "Salary > ?");

var cursor = cache.Query(sql);

foreach (var cacheEntry in cursor)
    Console.WriteLine(cacheEntry.Value);

var cache = ignite.GetOrCreateCache<int, Person>("personCache");

// SQL join on Person and Organization.
var sql = new SqlQuery(typeof(Person), "from Person as p," +
    "\"orgCache\".Organization as org" + 
    "where p.OrgId = org.Id " +
    "and lower(org.Name) = lower(?)");

// Find all persons working for Ignite organization.
foreach (var cacheEntry in cache.Query(sql))
    Console.WriteLine(cacheEntry.Value);

var cache = ignite.GetOrCreateCache<int, Person>("personCache");

// Execute query to get names of all employees.
var sql = new SqlFieldsQuery(
    "select concat(FirstName, ' ', LastName) from Person as p");

// Iterate over the result set.
foreach (var fields in cache.QueryFields(sql))
    Console.WriteLine("Person Name = {0}", fields[0]);

// In this example, suppose Person objects are stored in a 
// cache named 'personCache' and Organization objects 
// are stored in a cache named 'orgCache'.
var personCache = ignite.GetOrCreateCache<int, Person>("personCache");

// Select with join between Person and Organization to 
// get the names of all the employees of a specific organization.
var sql = new SqlFieldsQuery(
    "select p.Name  " +
    "from Person as p, \"orgCache\".Organization as org where " +
    "p.OrgId = org.Id " +
    "and org.Name = ?", "Ignite");

foreach (IList fields in personCache.QueryFields(sql))
    Console.WriteLine("Person Name = {0}", fields[0]);

By default, if an SQL join has to be done across a number of Ignite caches, then all the caches have to be collocated. Otherwise, you will get an incomplete result at the end of query execution because at the join phase a node uses the data that is available only locally. Referring to Picture 1. below you will see that, first, an SQL query is sent to all the nodes (Q) where data, required for a join, is located. After that the query is executed right away by every node (E(Q)) over the local data set and, finally, the overall execution result is aggregated on the client side (R).

Picture 1. Collocated SQL Query.

Besides the fact that the affinity collocation is a powerful concept that, once set up for an application's business entities (caches), will let you execute cross-cache joins in the most optimal way by returning a complete and consistent result set, there is always a chance that you won't be able to collocate all the data. Thus, you may not be able to execute the whole range of SQL queries that are needed to satisfy your use case.

The non-collocated distributed joins have been designed and supported by Apache Ignite for cases when it's extremely difficult or impossible to collocate all the data but you still need to execute a number of SQL queries over non-collocated caches.

❗️
Don't overuse the non-collocated distributed joins based approach in practice because the performance of this type of joins is worse than the performance of the affinity collocation based joins due to the fact that there will be much more network round-trips and data movement between the nodes to fulfill a query.

When the non-collocated distributed joins setting is enabled for a specific SQL query with the SqlQuery.setDistributedJoins(boolean) parameter, then, the node to which the query was mapped will request for the missing data (that is not present locally) from the remote nodes by sending either broadcast or unicast requests. This is depicted on Picture 2. below as a potential data movement step (D(Q)). The potential unicast requests are only sent in cases when a join is done on a primary key (cache key) or an affinity key, since the node performing the join knows the location of the missing data. The broadcast requests are sent in all the other cases.

Picture 2. Non-collocated SQL Query.

👍
Neither broadcast nor unicast requests, that are sent by one node to another in order to get the missing data, are executed sequentially. The SQL engine combines all the request into batches. This batch size can be managed using SqlQuery.setPageSize(int) parameter.

Refer to the non-collocated distributed joins blog post for more technical details.

const string orgName = "Apache";

var qry = cache.Query(new SqlQuery("Employee",
    "from Employee, \"dotnet_cache_query_organization\".Organization " +
    "where Employee.organizationId = Organization._key and Organization.name = ?", orgName)
{
    EnableDistributedJoins = true
});

Console.WriteLine(">>> Employees working for " + orgName + ":");

foreach (var entry in qry)
    Console.WriteLine(">>>     " + entry.Value);

Indexes can be configured by marking cacheable type members with QuerySqlFieldAttribute and QueryTextFieldAttribute. These types should be passed to CacheConfiguration(string name, params Type[] queryTypes) constructor, or one of the QueryEntity constructors.

var cfg = new IgniteConfiguration
{
    CacheConfiguration = new[]
    {
      	// Configure queries for a cache with Person values (cache keys are not indexed)
        new CacheConfiguration("personCache", typeof(Person)),
      	new CacheConfiguration
        {
            QueryEntities =
            {
            		// Configure indexing for both keys and values
                new QueryEntity(typeof(int), typeof(Company))
            }
        }
    }
};

To make fields or properties accessible for SQL queries you have to mark them with [QuerySqlField]. Property Age will not be accessible from SQL. Note that none of these properties are indexed.

public class Employee
{
    [QuerySqlField]
    public string Name { get; set; }

    [QuerySqlField]
    public long Salary { get; set; }

    public int Age { get; set; }
}

📘
Field Names and Serialization

Marking field or property with [QuerySqlField] automatically adds QueryField to the corresponding QueryEntity. You have to make sure that SQL field name and serialized field name are the same.

For example, default Ignite reflective serializer operates on fields. So if you follow standard naming convention, your backing field will be _name, and that name is used for serialization. So you have to mark that field, and not a property, with [QuerySqlField], and use _name in SQL queries.

With automatic properties, though, it just works, because Ignite recognizes them and trims backing field specifier.

📘
Predefined Fields

In addition to all the fields marked with [QuerySqlField] attribute, each table will have two special predefined fields- _key and _val that represent links to whole key and value objects. This is useful, for example, when one of them is a primitive and you want to filter by its value. To do this, execute a query like SELECT * FROM Person WHERE _key = 100.

To make fields not only accessible by SQL but also speedup queries you can index field values. To create a single column index you can annotate field with [QuerySqlField(IsIndexed = true)] attribute.

public class Employee
{
    // Index in ascending order
    [QuerySqlField(IsIndexed = true)]
    public string Name { get; set; }

    // Index in descending order
    [QuerySqlField(IsIndexed = true, IsDescending = true)]
    public long Salary { get; set; }

    // Enable field in SQL, but don't index
    [QuerySqlField]
    public int Age { get; set; }
}

To have a multi-field index to speedup queries with complex conditions, you can use QuerySqlField.IndexGroups property. It is possible to put multiple groups into IndexGroups array if you want the field to participate in more than one group index.

For example of a group index in the class below we have property Age which participates in a group index named "age_salary_idx" with descending sort order. Also in the same group index there is a property salary with ascending sort order. On top of that field salary itself is indexed with single column index.

public class Employee
{
    [QuerySqlField(IsIndexed = true, IndexGroups = new[] {"age_salary_idx"}, IsDescending = true)]
    public int Age { get; set; }

    [QuerySqlField(IsIndexed = true, IndexGroups = new[] {"age_salary_idx", "salary_idx"})]
    public long Salary { get; set; }
}

Indexes and fields can also be configured with Apache.Ignite.Core.Cache.Configuration.QueryEntity either in code, in app config, or in Spring XML. It is equivalent to attribute-based configuration because attributes are converted to query entities internally.

var cfg = new IgniteConfiguration
{
    CacheConfiguration = new[]
    {
        new CacheConfiguration
        {
            QueryEntities = new[]
            {
                new QueryEntity
                {
                    KeyType = typeof(int),
                    ValueType = typeof(Employee),
                    Fields =
                    {
                        new QueryField {Name = "Name", FieldType = typeof(string)},
                        new QueryField {Name = "Salary", FieldType = typeof(long)},
                        new QueryField {Name = "Age", FieldType = typeof(int)}
                    },
                    Indexes =
                    {
                        new QueryIndex("Name"),
                        new QueryIndex
                        {
                            Fields =
                            {
                                new QueryIndexField {Name = "Salary"},
                                new QueryIndexField {Name = "Age", IsDescending = true}
                            },
                            IndexType = QueryIndexType.Sorted,
                            Name = "age_salary_idx"
                        }
                    }
                }
            }
        }
    }
};

<cacheConfiguration>
    <queryEntities>    
        <queryEntity keyType='System.Int32' valueType='Apache.Ignite.ExamplesDll.Binary.Employee, Apache.Ignite.ExamplesDll'>
            <fields>
                <queryField name='Name' fieldType='System.String' />
                <queryField name='Salary' fieldType='System.Int64' />
                <queryField name='Age' fieldType='System.Int32' />
            </fields>
            <indexes>
                <queryIndex>
                    <fields>
                        <queryIndexField name='Name' />
                    </fields>
                </queryIndex>
                <queryIndex name='age_salary_idx' indexType='Sorted'>
                    <fields>
                        <queryIndexField name='Salary' />
                        <queryIndexField name='Age' isDescending='true' />
                    </fields>
                </queryIndex>
            </indexes>
        </queryEntity>
    </queryEntities>
</cacheConfiguration>

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="mycache"/>
    <!-- Configure query entities -->
    <property name="queryEntities">
        <list>
            <bean class="org.apache.ignite.cache.QueryEntity">
                <property name="keyType" value="Long"/>
                <property name="valueType" value="Employee"/>

                <property name="fields">
                    <map>
                        <entry key="name" value="java.lang.String"/>
                        <entry key="age" value="java.lang.Integer"/>
                        <entry key="salary" value="java.lang.Long "/>
                    </map>
                </property>

                <property name="indexes">
                    <list>
                        <bean class="org.apache.ignite.cache.QueryIndex">
                            <constructor-arg value="name"/>
                        </bean>
                        <!-- Group index. -->
                        <bean class="org.apache.ignite.cache.QueryIndex">
                            <constructor-arg>
                                <list>
                                    <value>age</value>
                                    <value>salary</value>
                                </list>
                            </constructor-arg>
                            <constructor-arg value="SORTED"/> 
                        </bean>
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>

🚧
Make sure to use assembly-qualified type names in app.config

Since SQL queries are executed in Java by H2 engine, Ignite.NET maps .NET types to Java types via paired properties in QueryEntity and QueryField:

QueryEntity.KeyType and QueryEntity.KeyTypeName
QueryEntity.ValueType and QueryEntity.ValueTypeName
QueryField.FieldType and QueryField.FieldTypeName

Type properties are .NET types, and TypeName properties are Java type names. Type properties set TypeName properties automatically, but NOT vice versa.

User-defined types are mapped by their simple name (type name without namespace).
Primitive types, strings, and Guids are mapped to corresponding Java types.
DateTime is mapped to Timestamp. All DateTime values used in queries must be UTC.
sbyte, ushort, uint, ulong are not present in Java and are mapped to byte, short, int, long using bitwise conversion (SQL may not work as expected for out-of-range values).

🚧
DateTime and SQL

DateTime can be Local and UTC; Java Timestamp can only be UTC. Because of that, Ignite.NET can serialize DateTime in two ways: .NET style (can work with non-UTC values, does not work in SQL) and as Timestamp (throws exception on non-UTC values, works properly in SQL).

Reflective serialization: mark field with [QuerySqlField] to enforce Timestamp serialization.

IBinarizable: use IBinaryWriter.WriteTimestamp method.

When it is not possible to modify class to mark fields with [QuerySqlField] or implement IBinarizable, use IBinarySerializer approach. See Serialization for more details.

🚧
SQL Date Functions

Date and Time SQL functions (such as HOUR) produce result according to current time zone, which may provide unexpected results, given that everything else is UTC. To force UTC for SQL functions, use -Duser.timezone=UTC JVM option (via IgniteConfiguration.JvmOptions).

There are two main ways of how query can be processed in Ignite:

If you execute the query against REPLICATED cache, Ignite assumes that all data is available locally and runs a simple local SQL query in H2 database engine. The same will happen for LOCAL caches.
If you execute the query against PARTITIONED cache, it works the following way: the query will be parsed and split into multiple map queries and a single reduce query. Then all the map queries are executed on all data nodes of participating caches, providing results to reducing node, which will in turn run reduce query over these intermediate results.

Ignite supports "EXPLAIN ..." syntax in SQL queries and reading execution plans is a main way to analyze query performance in Ignite. Note that plan cursor will contain multiple rows: the last one will contain query for reducing node, others are for map nodes.

var sql = new SqlFieldsQuery("explain select name from Person where age = ?", 26);

foreach (var fields in cache.QueryFields(sql))
	Console.WriteLine(fields[0]);

The execution plan itself is generated by H2 as described here:
http://www.h2database.com/html/performance.html#explain_plan

When developing with Ignite sometimes it is useful to check if your tables and indexes look correctly or run some local queries against H2 database embedded in the node. For that purpose Ignite has an ability to start H2 Console. To do that you can start a local node with IGNITE_H2_DEBUG_CONSOLE system property or environment variable set to true. The console will be opened in your browser. You may have to click Refresh button on the Console because it can be opened before database objects are initialized.

When SQL query fails (Failed to parse query and other exceptions), make sure to examine the InnerException property: it contains full error message from Ignite SQL engine with details on what exactly has failed. You can do that in Visual Studio debugger or by calling ToString() on the exception object:

try 
{
    IQueryCursor<List> cursor = cache.QueryFields(query);  
}
catch (IgniteException e) 
{
    Console.WriteLine(e.ToString());
}

Ignite supports placing index data in off-heap memory. This makes sense for very large datasets since keeping data on heap can cause high GC activity and unacceptable response times.

By default, Ignite stores SQL Indexes on heap. Ignite will store query indexes in off-heap memory if CacheConfiguration.setMemoryMode is configured to one of the off-heap memory modes - OFFHEAP_TIERED or OFFHEAP_VALUES, or CacheConfiguration.setOffHeapMaxMemory property is set to a value >= 0.

To improve the performance of SQL queries with off-heap enabled, you can try to increase the value of CacheConfiguration.sqlOnheapRowCacheSize property. The default value for this property is 10000.

There are multiple things you should consider when choosing indexes for your Ignite application.

Indexes are not free. They consume memory, and each index needs to be updated separately, thus your cache update performance can be lower if you have more indexes. On top of that optimizer can do more mistakes choosing wrong index to run the query.

❗️
It is a bad strategy to index everything!

Indexes are just sorted data structures. If you define an index on the fields (a,b,c) , the records are sorted first on a, then b, then c.

📘
Example of Sorted Index

| A | B | C |
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |

Any condition like a = 1 and b > 3 can be viewed as a bounded range, both bounds can be quickly looked up in index in log(N) time, the result will be everything between.

The following conditions will be able to use the index:

a = ?

a = ? and b = ?

a = ? and b = ? and c = ?

Condition a = ? and c = ? is no better than a = ? from the index point of view.
Obviously half-bounded ranges like a > ? can be used as well.

Indexes on a single fields are no better than group indexes on multiple fields starting with the same field (index on (a) is no better than (a,b,c)). Thus it is preferable to use group indexes.

There are few common pitfalls that should be noticed when running SQL queries.

If the query is using operator OR then it may use indexes not the way you would expect. For example for query select name from Person where sex='M' and (age = 20 or age = 30) index on field age will not be used even if it is obviously more selective than index on field sex and thus is preferable. To workaround this issue you have to rewrite the query with UNION ALL (notice that UNION without ALL will return DISTINCT rows, which will change query semantics and introduce additional performance penalty) like select name from Person where sex='M' and age = 20 UNION ALL select name from Person where sex='M' and age = 30. This way indexes will be used correctly.
If query contains operator IN then it has two problems: it is impossible to provide variable list of parameters (you have to specify the exact list in query like where id in (?, ?, ?), but you can not write it like where id in ? and pass array or collection) and this query will not use index. To work around both problems you can rewrite the query in the following way: select p.name from Person p join table(id bigint = ?) i on p.id = i.id. Here you can provide object array (object[]) of any length as a parameter and the query will use index on field id.

Updated over 4 years ago

Table of Contents

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4