<!--

    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

-->

# Query Result Set Construction

## Introduction

This document describes how a query result set is constructed. It covers three parts: constructing the header, generating the non-repeated result set, and restoring the complete result set.

## Header construction

This section describes how the result set header is constructed for RawDataQuery, AlignByDeviceQuery, and LastQuery. Other queries, such as DownsamplingQuery and FillQuery, are covered as sub-cases of these three.

### Raw data query

The header construction logic of raw data query is in the `getWideQueryHeaders()` method.

- org.apache.iotdb.db.service.TSServiceImpl.getWideQueryHeaders

Each header column requires a column name and the corresponding data type.

- An ordinary raw data query (including FillQuery) only needs to obtain the **non-deduplicated** timeseries paths from the physical query plan. These paths are the column names, and they are used to look up the corresponding data types to generate the result set header.

- If the raw data query contains aggregate functions (including AggregateQuery and DownsamplingQuery), the time column is ignored and the **aggregate function and timeseries path are combined to form the column name**. The data type is that of the aggregate function's result. For example, `root.sg.d1.s1` is of FLOAT type, while `count(root.sg.d1.s1)` should be of LONG type.

Examples:

Assuming that all timeseries in the queries below exist, the two queries produce the following result set headers:

SQL1: `SELECT s1, s2 FROM root.sg.d1;`  ->

| Time | root.sg.d1.s1 | root.sg.d1.s2 |
| ---- | ------------- | ------------- |
|      |               |               |

SQL2: `SELECT count(s1), max_time(s1) FROM root.sg.d1;` ->

| count(root.sg.d1.s1) | max_time(root.sg.d1.s1) |
| -------------------- | ----------------------- |
|                      |                         |
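
The two cases above can be sketched as follows. This is a minimal illustration, not the code of `getWideQueryHeaders()`: the class `WideHeaderSketch`, its fields, and the `resultType` rule are hypothetical names introduced here, and data types are plain strings. The rule that `max_time` and `min_time` yield LONG is an assumption of this sketch; only the `count` case is stated in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of wide-header construction; not the TSServiceImpl code. */
final class WideHeaderSketch {
  final List<String> columnNames = new ArrayList<>();
  final List<String> columnTypes = new ArrayList<>();
  boolean ignoreTimeStamp; // true when the Time column is left out of the result

  /** Raw data query: the paths are used as column names, duplicates included. */
  static WideHeaderSketch forRawData(List<String> paths, Map<String, String> seriesTypes) {
    WideHeaderSketch header = new WideHeaderSketch();
    for (String path : paths) {
      header.columnNames.add(path);
      header.columnTypes.add(seriesTypes.get(path));
    }
    return header;
  }

  /** Aggregation: the column name is function(path) and the function decides the type. */
  static WideHeaderSketch forAggregation(
      List<String> functions, List<String> paths, Map<String, String> seriesTypes) {
    WideHeaderSketch header = new WideHeaderSketch();
    header.ignoreTimeStamp = true; // aggregation results have no time column
    for (int i = 0; i < paths.size(); i++) {
      String function = functions.get(i);
      String path = paths.get(i);
      header.columnNames.add(function + "(" + path + ")");
      header.columnTypes.add(resultType(function, seriesTypes.get(path)));
    }
    return header;
  }

  /** Assumed rule: counts and timestamps are LONG, anything else keeps the series type. */
  static String resultType(String function, String seriesType) {
    switch (function) {
      case "count":
      case "max_time":
      case "min_time":
        return "LONG";
      default:
        return seriesType; // simplification made for this sketch
    }
  }
}
```

Note that neither method removes duplicate paths: the header always mirrors the SELECT clause as written.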

### Align by device query

The header construction logic of align by device query is in the `getAlignByDeviceQueryHeaders()` method.

- org.apache.iotdb.db.service.TSServiceImpl.getAlignByDeviceQueryHeaders

The result set header of an AlignByDeviceQuery depends on the list of **non-deduplicated measurements** generated in the physical query plan. In brief, the measurements list is generated from the suffix paths (including wildcards) in the SELECT clause, and each measurement is one of three types: constant, exist, or nonexist. For details, please refer to [Align by device query](../DataQuery/AlignByDeviceQuery.md).

Since AlignByDeviceQuery uses a relational table structure, the device column is added to the header first, and its data type is TEXT.

Then each measurement in the list is taken as a column name, and its data type is determined by the measurement's type. For a Constant or NonExist measurement, the data type is set directly to TEXT. For an Exist measurement, the data type is looked up in `measurementdatatypemap`, which is stored in the physical query plan.

Note that for an AggregationQuery, the measurements in the list already contain the aggregate functions, so they can be handled in the same way.

Example:

Assuming there are two timeseries, `root.sg.d1.s1` and `root.sg.d1.s2`, the result set header generated by the query below is:

SQL: `SELECT '111', s1, s2, *, s5 FROM root.sg.d1 ALIGN BY DEVICE;`

-> measurements list: ['111', s1, s2, s1, s2, s5]

-> header

| Time | Device | 111 | s1  | s2  | s1  | s2  | s5  |
| ---- | ------ | --- | --- | --- | --- | --- | --- |
|      |        |     |     |     |     |     |     |
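
A minimal sketch of this header construction is given below. The class `AlignByDeviceHeaderSketch`, the `MeasurementType` enum, and the two map parameters are hypothetical stand-ins for the structures kept in the physical query plan, and data types are plain strings. The `Time` column is not part of the list built here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of align-by-device header construction. */
final class AlignByDeviceHeaderSketch {
  enum MeasurementType { CONSTANT, EXIST, NON_EXIST }

  final List<String> columnNames = new ArrayList<>();
  final List<String> columnTypes = new ArrayList<>();

  static AlignByDeviceHeaderSketch build(
      List<String> measurements,
      Map<String, MeasurementType> measurementTypes,
      Map<String, String> dataTypes) {
    AlignByDeviceHeaderSketch header = new AlignByDeviceHeaderSketch();
    // The device column always comes first and is of TEXT type.
    header.columnNames.add("Device");
    header.columnTypes.add("TEXT");
    for (String measurement : measurements) {
      header.columnNames.add(measurement);
      if (measurementTypes.get(measurement) == MeasurementType.EXIST) {
        // Existing measurements take their type from the type map.
        header.columnTypes.add(dataTypes.get(measurement));
      } else {
        // Constant and nonexistent measurements are reported as TEXT.
        header.columnTypes.add("TEXT");
      }
    }
    return header;
  }
}
```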

### Last query

The header construction logic of last query is in the static method `LAST_RESP`.

- org.apache.iotdb.db.service.StaticResps.LAST_RESP

A last query finds, for each queried timeseries, the data point with the largest timestamp and displays the result in three columns: time, timeseries, and value.

Example:

Assuming there are two timeseries, `root.sg.d1.s1` and `root.sg.d1.s2`, the result set header generated by the query below is:

SQL: `SELECT last s1, s2 FROM root.sg.d1;`

| Time | timeseries    | value |
| ---- | ------------- | ----- |
| ...  | root.sg.d1.s1 | ...   |
| ...  | root.sg.d1.s2 | ...   |

## Generate non repeated result set

Unlike the header, which keeps duplicate columns, the physical query plan does not need to query the same data more than once. For example, for the query `select s1, s1 from root.sg.d1`, the value of the timeseries `root.sg.d1.s1` only needs to be queried once. Therefore, after the header is constructed, the server generates a non-repeated result set.

Except for AlignByDeviceQuery, the deduplication logic of **RawDataQuery, AggregateQuery, LastQuery**, etc. is in the `deduplicate()` method.

- org.apache.iotdb.db.qp.strategy.PhysicalGenerator.deduplicate()

The deduplication logic is relatively simple: first, get the non-deduplicated paths from the query plan, and then use a `Set` to remove duplicates during traversal.

Note that for RawDataQuery and AggregateQuery, the timeseries paths are sorted by device before deduplication, in order to reduce I/O and deserialization operations and speed up the query.
An additional data structure, `pathToIndex`, is computed here to record the position of each path in the query.
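
The sort-then-deduplicate step can be sketched as below. This is an illustration under stated assumptions, not the `PhysicalGenerator` code: the class `DeduplicateSketch` is hypothetical, paths are plain strings, and sorting by full path is used as one simple way to place series of the same device next to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of path deduplication for raw data and aggregation queries. */
final class DeduplicateSketch {
  final List<String> deduplicatedPaths = new ArrayList<>();
  final Map<String, Integer> pathToIndex = new HashMap<>();

  static DeduplicateSketch deduplicate(List<String> paths) {
    // Assumption: ordering by full path groups the series of one device together.
    List<String> sorted = new ArrayList<>(paths);
    Collections.sort(sorted);

    DeduplicateSketch result = new DeduplicateSketch();
    Set<String> seen = new HashSet<>();
    for (String path : sorted) {
      // Set.add returns false when the path was already present.
      if (seen.add(path)) {
        result.pathToIndex.put(path, result.deduplicatedPaths.size());
        result.deduplicatedPaths.add(path);
      }
    }
    return result;
  }
}
```

For the paths of `SELECT s2, s1, s2 FROM root.sg.d1`, this sketch keeps `root.sg.d1.s1` at index 0 and `root.sg.d1.s2` at index 1.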

Because a LastQuery only needs to compute one set of data, there is no need to sort the paths, and its `pathToIndex` is null.

The deduplication logic of **AlignByDeviceQuery** is in the `hasNextWithoutConstraint()` method of its result set.

- org.apache.iotdb.db.query.dataset.AlignByDeviceDataSet.hasNextWithoutConstraint()

Because an AlignByDeviceQuery organizes its query plans by device, the queried paths may differ from device to device, and the query may contain constant columns and nonexistent timeseries. It therefore cannot be deduplicated in the same way as the other queries. Deduplication requires **removing not only the repeated timeseries paths, but also the constant columns appearing in the query and the timeseries that do not exist in the current device**.
For the implementation logic, refer to [Align by device query](../DataQuery/AlignByDeviceQuery.md).

After the paths in the query plan have been deduplicated, the IoTDB query executor is called to execute the query and obtain the deduplicated result set.

## Restore the complete result set

The header construction and the generation of the non-repeated result set described above are done on the server side, and both are returned to the client. The client then restores the complete result set from the non-repeated result set according to the original header and presents it to the user. To distinguish the two result sets, they are called the **query result set** (non-repeated) and the **final result set** (complete), respectively.

A simple example is given first:

SQL: `SELECT s2, s1, s2 FROM root.sg.d1;`

The list of column names `columnNameList` in the header, computed using the steps above, is (the timestamp is added later if needed):

`[root.sg.d1.s2, root.sg.d1.s1, root.sg.d1.s2]`

The position of each timeseries path in the query, `pathToIndex`, is (sorted by device):

`root.sg.d1.s1 -> 0, root.sg.d1.s2 -> 1`

The query result set is then:

| Time | root.sg.d1.s1 | root.sg.d1.s2 |
| ---- | ------------- | ------------- |
|      |               |               |

To restore the final result set, we construct a mapping `columnOrdinalMap` from each column name to its position in the query result set, so that the value of a column can be fetched from the query result set. This logic is implemented in the constructor of the result set `IoTDBQueryResultSet`.

- org.apache.iotdb.jdbc.AbstractIoTDBResultSet.AbstractIoTDBResultSet()

To construct the metadata of the final result set, a complete column name list is needed. The `columnNameList` given above does not contain the timestamp, so it is necessary to determine whether a timestamp needs to be printed. If so, the `Time` column is added to the header to form the complete header.

The complete header in the example is:

| Time | root.sg.d1.s2 | root.sg.d1.s1 | root.sg.d1.s2 |
| ---- | ------------- | ------------- | ------------- |
|      |               |               |               |

Next, compute `columnOrdinalMap`. First determine whether a timestamp needs to be printed. If so, record the timestamp as the first column.

Then traverse the column name list in the header and check whether `columnnameindex` is initialized. This field comes from the `pathToIndex` calculated during deduplication, which records the position of each timeseries path in the query. If it is initialized, record the position + 2 as the column's position in the result set. If not, record the positions in traversal order, which is consistent with the query order on the server side.

The `columnOrdinalMap` in the example is:

`Time -> 1, root.sg.d1.s2 -> 3, root.sg.d1.s1 -> 2`
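
The mapping computation can be sketched as follows. The class `ColumnOrdinalSketch` and its parameters are hypothetical, not the `AbstractIoTDBResultSet` constructor. When `pathToIndex` is null, this sketch also starts numbering at 2, which is an assumption made for consistency with the "position + 2" rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of how columnOrdinalMap could be computed on the client. */
final class ColumnOrdinalSketch {
  // Ordinals are 1-based and ordinal 1 is reserved for the Time column.
  static final int START_INDEX = 2;

  static Map<String, Integer> build(
      List<String> columnNameList, Map<String, Integer> pathToIndex, boolean ignoreTimeStamp) {
    Map<String, Integer> columnOrdinalMap = new LinkedHashMap<>();
    if (!ignoreTimeStamp) {
      columnOrdinalMap.put("Time", 1);
    }
    int next = START_INDEX;
    for (String name : columnNameList) {
      if (columnOrdinalMap.containsKey(name)) {
        continue; // a repeated column reuses the ordinal recorded earlier
      }
      if (pathToIndex != null) {
        // Position computed during deduplication, shifted past the Time column.
        columnOrdinalMap.put(name, pathToIndex.get(name) + START_INDEX);
      } else {
        // No index available: number the columns in traversal order.
        columnOrdinalMap.put(name, next++);
      }
    }
    return columnOrdinalMap;
  }
}
```

With the `columnNameList` and `pathToIndex` of the example, the sketch yields the same mapping as shown above.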

Next, traverse the column names in the header and fill in the complete result set according to the mapping. This logic is in the `cacheResult()` method.

- org.apache.iotdb.cli.AbstractCli.cacheResult()

For example, the second column in the final result set is `root.sg.d1.s2`, so its value is taken from the third column of the query result set. This process is repeated to fill each row until the query result set has no more rows or the maximum number of output rows is reached.
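
The fill step can be sketched as below. This is a simplified illustration rather than the `cacheResult()` code: the class `RestoreRowSketch` is hypothetical, and each query result row is modeled as a list of strings whose first element is the timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of restoring final rows from query result rows. */
final class RestoreRowSketch {
  /** Builds one final row by looking up each header column in the query row. */
  static List<String> restoreRow(
      List<String> header, Map<String, Integer> columnOrdinalMap, List<String> queryRow) {
    List<String> finalRow = new ArrayList<>();
    for (String column : header) {
      int ordinal = columnOrdinalMap.get(column); // 1-based position in the query row
      finalRow.add(queryRow.get(ordinal - 1));
    }
    return finalRow;
  }

  /** Restores rows until the input is exhausted or maxRows rows have been produced. */
  static List<List<String>> restore(
      List<String> header,
      Map<String, Integer> columnOrdinalMap,
      List<List<String>> queryRows,
      int maxRows) {
    List<List<String>> finalRows = new ArrayList<>();
    for (List<String> queryRow : queryRows) {
      if (finalRows.size() >= maxRows) {
        break;
      }
      finalRows.add(restoreRow(header, columnOrdinalMap, queryRow));
    }
    return finalRows;
  }
}
```

In the example, a query row holding the time, the `s1` value, and the `s2` value, in that order, becomes a final row holding the time, the `s2` value, the `s1` value, and the `s2` value again.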
