---
title: "Getting Started with qryflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with qryflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r}
library(qryflow)
```

# What is `qryflow`?

`qryflow` lets you write multi-step SQL workflows in plain `.sql` files and run them from R with a single function call. Specially formatted tags tell R how to execute each SQL chunk and what to name the results. This allows you to:

- Keep multiple SQL statements in the same file.

- Control how each SQL "chunk" is executed.

- Return results as named R objects.

- Pass metadata that can be used later in R workflows

In short: You can define and run **multi-step SQL workflows** with one function call, and get your results back as a structured R object.

# Basic usage

The main function is `qryflow`, which accepts SQL tagged with special comments and a connection to DBI-compliant database. Note, the SQL can be a character vector, like in the example below, or a filepath to a file that contains SQL.

```{r example}
# Connection to In-Memory DB with table populated from mtcars
con <- example_db_connect(mtcars)

sql <- "
-- @exec: drop_cyl_6
DROP TABLE IF EXISTS cyl_6;

-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_cyl_6
SELECT *
FROM cyl_6;
"

# Pass tagged SQL to `qryflow`
results <- qryflow(con, sql, verbose = TRUE)

# Access the results from the chunk named `df_cyl_6`
head(results$df_cyl_6)
```

 By default, the package supports `@exec` tags, which are executed with `DBI::dbExecute()` and `@query` tags, which are executed with `DBI::dbGetQuery()`.

When you run `qryflow()`:

1. The SQL script is split into chunks using tag lines like `-- @query: df_mtcars`.

2. Each chunk is assigned a type (e.g., `query` or `exec`)

3. Chunks are executed in order, using the associated execution type

4. The results are returned as named objects

# Defining a Chunk

In `qryflow`, a chunk is a grouped section of SQL code, representing a single executable unit within a larger multi-step SQL workflow, and preceded by one or more tag lines (e.g., the pattern `-- @<tag>: <value>`).

- Tagged lines act as markers that start a new chunk.

- All lines (comments and SQL) immediately following a contiguous group of tagged lines belong to that chunk until another tag line starts the next chunk.

- If the script has no tags, the entire script is treated as one single chunk.


# Tags and Aliases

Each SQL chunk must be tagged with a `type` so `qryflow` knows how to execute it. If a chunk is not provided with a tag, the `qryflow` engine will use the value of the `default_type` argument, which can be provided directly or set with `getOption("qryflow.default_type", "query")`. It defaults to "query", as getting data out is the most common use case.

Tags use SQL-style comments (`--`) and follow the format:

```sql
-- @<tag>: <value>
```

## Important Tags

Each chunk should have both a `name` (the name of the object when returned to R) and a `type` (execution mode for the chunk). Users can set these explicitly with the following tags:

- `@type`	apecifies execution type (`-- @type: query`)

- `@name`	assigns a name to the chunk’s result (`-- @name: df_users`)

For registered types, users can use shorthand to supply both name and type in one line. For example, `@query` and `@exec` are aliases for setting both `@type` and `@name` in one line.

**Aliased form (preferred):**

```sql
-- @query: df_mtcars
SELECT *
FROM mtcars;
```

**Explicit form (equivalent):**

```sql
-- @type: query
-- @name: df_mtcars
SELECT *
FROM mtcars;
```

## Type Identification

During parsing, `qryflow` determines its type using the following rules:

1. If a chunk includes an explicit `-- @type:` tag, that value is used as the chunk type.

2. If there is no `@type:` tag, `qryflow` checks for any other tag that matches a registered type (`@query:`, `@exec:`, etc.) . The first match found is used as the type.

3. If no recognized tag is found, the type defaults to the value of `getOption("qryflow.default_type", "query")`.

## Passing Additional Tags

You can include additional tags to carry metadata into your R workflow, that follow the tagging structure:

```sql
-- @exec: df_mtcars
-- @src: dbo.mtcars
-- @topic: cars
SELECT *
FROM mtcars;
```

# Important Arguments

## on_error

The `on_error` argument controls what happens when a single chunk fails:

- `"stop"` (default): halts execution immediately and raises an error.

- `"warn"`: records the error and signals a warning, but continues running remaining chunks.

- `"collect"`: silently collects all errors across all chunks and raises a single combined error at the end.

```{r, error=TRUE}
# on_error = "stop" (default): halts on first failure
bad_sql <- "
-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6;

-- @query: df_missing
SELECT * FROM nonexistent_table;

-- @query: df_mtcars
SELECT * FROM mtcars;
"

qryflow(con, bad_sql, on_error = "stop")
```

```{r}
# Warn collects errors and signals a warning
qryflow(con, bad_sql, on_error = "warn")
```

```{r, error=TRUE}
# on_error = "collect": runs everything, then reports all failures together
qryflow(con, bad_sql, verbose = TRUE, on_error = "collect")
```

## verbose

By default, `qryflow` is quiet. However, for long running queries with multiple chunks, you may want feedback on which chunks are currently running. You can use `verbose = TRUE` to get updates during execution.

## simplify

When `simplify = TRUE`, in the case where there is only one chunk, `qryflow()` will return a single object (as opposed to a named list of results). For example:

```{r}
sql1 <- "
-- @query: df_mtcars
SELECT *
FROM mtcars;
"

sql2 <- "
-- @query: df_mtcars
SELECT *
FROM mtcars;

-- @query: df_mtcars_cyl6
SELECT *
FROM mtcars
WHERE cyl = 6;
"

# Pass tagged SQL to `qryflow`
res1 <- qryflow(con, sql1, simplify = TRUE)
res2 <- qryflow(con, sql2, simplify = TRUE)
res3 <- qryflow(con, sql1, simplify = FALSE)

class(res1) # simplifies the result to the single data.frame() because only one chunk
class(res2) # returns named list
class(res3) # returns named list, because simplify = FALSE
```

This design choice is to facilitate easy interactive use and is a common use-case. Because `qryflow()` might return a named `list` or a single `data.frame` depending on the input, the `qryflow` package exports other functions so users can prioritize reliability in return objects. The next section explores functions like `qryflow_run()` and `qryflow_results()` further.

# The Core API

While `qryflow()` covers most use cases, users who want more control and consistency may prefer to use the functions that `qryflow()` leverages:

- `qryflow_run()`

- `qrflow_results() `

- `qryflow_parse()`

- `qryflow_execute`

## `qryflow_run()` and `qryflow_results()`

`qryflow_run()` performs parsing *and* execution, returning a full `qryflow` object - including all chunk metadata, not just the query results.

```{r}
obj <- qryflow_run(con, sql)

# A qryflow object
class(obj)

# Chunk names are top-level list names
names(obj)

obj # Print Method
```

Each element is a `qryflow_chunk`:

```{r}
class(obj$df_cyl_6)

# Print the chunk
obj$df_cyl_6
```

To extract only the query results (equivalent to what `qryflow()` returns), use `qryflow_results()`:

```{r}
results <- qryflow_results(obj)
class(results$df_cyl_6)
head(results$df_cyl_6)
```

## `qryflow_parse()` and `qryflow_execute()`

For even more control, you can parse and execute separately:

```{r}
# Step 1: Parse the SQL into structured chunks
filepath <- example_sql_path()
workflow <- qryflow_parse(filepath)

class(workflow)
length(workflow)
names(workflow)

# Inspect a chunk before execution
workflow$df_mtcars
```

Each `qryflow_chunk` contains:

- `$type`: the execution type (e.g., `"query"`)
- `$name`: the chunk name
- `$sql`: the SQL body
- `$tags`: any additional tags
- `$results`: `NULL` before execution; populated after

```{r}
# Step 2: Execute the parsed workflow
executed <- qryflow_execute(con, workflow)

class(executed)
names(executed)
executed
```

# Metadata

Both the worfklow object (`qryflow`) and the chunk objects (`qryflow_chunk`) store metadata about the execution. You can access this information with the `qryflow_meta()` function:

```{r}
qryflow_meta(executed) # The whole workflow
```

```{r}
qryflow_meta(executed[[1]]) # The whole chunk
```

# Summary

| Function | What it does |
|---|---|
| `qryflow()` | Parse + execute + return query results. |
| `qryflow_run()` | Parse + execute, returning a full `qryflow` object with metadata. |
| `qryflow_results()` | Extract query results from a `qryflow` object. |
| `qryflow_parse()` | Parse SQL into structured `qryflow` object - No execution. |
| `qryflow_execute()` | Execute a parsed `qryflow` object against a connection. |
| `qryflow_meta()` | Access metadata (status, duration, timing) on a workflow or chunk. |

For a guide on registering custom chunk types and extending `qryflow`'s behaviour, see `vignette("extend-qryflow", package = "qryflow")`.

# Examples

**Example 1 - Script with no tags**

```sql
CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;
```
Result - The entire script is one chunk containing all lines.

- Why? Without tags, `qryflow` treats the whole script as a single step.

**Example 2 - Script with one tag at the start**

```sql
-- @query: get_6cyl
SELECT *
FROM mtcars
WHERE cyl = 6;
```
Result - One chunk starting at the tag, containing the rest of the script.

Because the tag is at line 1, the chunk starts there and continues to the end.

**Example 3 - Script with one tag in the middle**

```sql
SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_mtcars
SELECT *
FROM mtcars;
```

Result - Two chunks:

- Chunk 1: lines before the tag (untagged SQL).

- Chunk 2: from the tag line to the end.

This preserves any pre-tag SQL as a separate chunk.

**Example 4 - Script with multiple tags**

```sql
-- @exec: drop_cyl_6
DROP TABLE IF EXISTS cyl_6;

-- @exec: prep_cyl_6
CREATE TABLE cyl_6 AS
SELECT *
FROM mtcars
WHERE cyl = 6;

-- @query: df_mtcars
SELECT *
FROM mtcars;

-- @query: df_cyl_6
SELECT *
FROM cyl_6
```

- Result - Four
 chunks, each starting at its respective tag line.

- Each chunk is parsed and executed independently in sequence.


```{r, echo=FALSE, include=FALSE}
DBI::dbDisconnect(con)
```
