
feat: new populate initial data from prod #927

Open
Winzen wants to merge 37 commits into backup-before-chatbot-staging from feat/local_populate

Conversation

Contributor

@Winzen Winzen commented Dec 2, 2025

No description provided.

@Winzen Winzen self-assigned this Dec 2, 2025
@Winzen Winzen changed the title from "feat: New populate initial data from prod" to "feat: new populate initial data from prod" Dec 2, 2025
@Winzen Winzen added the staging label (indicates that the Pull Request's target (base) branch points to staging) Dec 2, 2025
@Winzen Winzen marked this pull request as ready for review December 5, 2025 03:01
Winzen and others added 26 commits December 17, 2025 16:54
…and_more #944

fix: migrations account 0025_role_description_en_role_description_es_and_more

NOTE: This migration was created automatically by the modeltranslation extension on 2025-02-04 but was not committed.
Because our start-server.sh runs the makemigrations command, the migration file was created directly in the production
pod and applied as 0021. This created divergences between our migration files in the repository and the actual state of
the production database. We fixed this on December 17, 2025 by:
- Deleting the original 0021 migration entry from the django_migrations table
- Creating this migration file with the correct sequence number (0025)
- Running a fake migration to register 0025 in django_migrations without re-executing the SQL
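
The bookkeeping described in the note above can be sketched in isolation. This is a minimal illustration, not the commands actually run against production: it uses an in-memory SQLite table shaped like Django's `django_migrations` (columns `app`, `name`, `applied`) and a hypothetical helper, `renumber_migration`. The name of the stale 0021 row is an assumption made for the demo. In a real project the last step would normally be done with `python manage.py migrate account 0025 --fake`, which records the migration without executing its SQL.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed name of the row created in the production pod (not confirmed by the PR).
OLD_NAME = "0021_role_description_en_role_description_es_and_more"
NEW_NAME = "0025_role_description_en_role_description_es_and_more"


def renumber_migration(conn, app, old_name, new_name):
    """Swap a stale django_migrations row for the correctly numbered one."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute(
            "DELETE FROM django_migrations WHERE app = ? AND name = ?",
            (app, old_name),
        )
        # Equivalent in spirit to a fake migration: only the record is written.
        conn.execute(
            "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
            (app, new_name, datetime.now(timezone.utc).isoformat()),
        )


def applied_migrations(conn, app):
    """List the migration names recorded for one app."""
    rows = conn.execute(
        "SELECT name FROM django_migrations WHERE app = ? ORDER BY name", (app,)
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
conn.execute(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    ("account", OLD_NAME, "2025-02-04T00:00:00+00:00"),
)

renumber_migration(conn, "account", OLD_NAME, NEW_NAME)
print(applied_migrations(conn, "account"))
```

Doing the delete and the insert in one transaction keeps the table from ever being left without a record for the already-applied schema change.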
chore: update resources - main, staging and dev #945

- fix if expression (it was always evaluating to truthy)
- update pod resources for dev, staging and production environments
- add 10 minutes timeout to deployments
renamed `ChatbotThreadListView` to `ThreadListView` and created `CheckpointView` for deleting checkpoints

refactored views

added chatbot env variables

temp: added a script for running django in dev mode and updated the Dockerfile for installing the `chatbot` package

updated migrations

fixed django admin

added `djangorestframework` and `djangorestframework-simplejwt` packages as dependencies

comment could be empty/null

added jwt tokens urls

changed `chatbot` package installation path

created serializers and refactored views, added jwt auth, error handling, etc.

reminder for custom authentication rules

updated dockerfile and compose file for chromadb

fixing a TypeError (the assistant expects `thread_id` as a string) and using the `get_sync_sql_assistant` helper method

Adding chatbot as submodule

Download submodules in ci. Also update README to alert on cloning submodules

minor stuff

adjusting indentation

updated chatbot package and using validated data

updated chatbot package version to `v0.3.0`

validated data is serialized to a UUID but `UserMessage` expects a string

added healthcheck to vector database service

created a request validation function using pydantic

just renaming views and endpoints

added authentication to the checkpoint deletion endpoint

added docstrings

update `chatbot` package to version `v0.4.0`

update .gitignore

updated volume mount path to match the expected path on chroma v0.6.3

updated `chatbot` package to `v0.4.1` and using Serializers for request validation

updated chatbot package version to v0.4.2

created `ChatbotDatabase` to read metadata from the local database and query data on BigQuery

created chatbot authentication field on user account model

sending feedbacks to langsmith

fixed authentication rule (when a user is not found in db, django simple jwt returns `None`, so it has no `has_chatbot_access` attribute)

checking if the thread exists before trying to delete its checkpoints and returning 201 on thread creation

added unit tests for the endpoints

added `chatbot` dependency

update env variables file

optimize docker image

the `chatbot` package was added to the api dependencies, so it doesn't need to be installed directly.

add chatbot migrations

removed unused type

edited server starting scripts

update compose file

updated chatbot admin models

update account admin model to show the `has_chatbot_access` flag

remove unused pydantic import

cloning submodules in ci workflow

add missing `Account` migrations

we don't need to checkout submodules in the deploy step because the docker image is already built and we're only using the helm templates

we need to use a deploy key to checkout the `chatbot` submodule because it's private

we don't need to checkout submodules in this action because it only watches for new chart versions

building `db_url` on runtime

add non-sensitive env variables

apps.py so chatbot shows in admin

remove chatbot related env variables

update `chatbot` package version to `0.4.4`

enable `chatbot` package logging

update poetry lock file

fix imports and formatting to pass linting CI workflow

fix imports and formatting to pass linting CI workflow

increase dev ram memory limit to 1.5Gi

remove chroma related env variables

stopped installing packages needed for chromadb

update dependencies

update chatbot views and tests

update `chatbot` package to version `v0.4.5`

add `__init__.py` file to `apps` so python sees it as a package and pytest can import its chatbot module correctly.

discard changes in Dockerfile

setting model name directly in plain text

use pgvector for similarity search

use special value `__all__` to include all model fields

remove chroma from compose files

add `PGVECTOR_COLLECTION` env variable

minor fixes

added `EMBEDDING_MODEL` env variable

created `populate_pgvector` command

improved logging

running `populate_pgvector` script in the foreground

run `populate_pgvector` command when starting the server

rewrite query when invoking the assistant

update `chatbot` package to version `0.5.1`

set top-k to 5, i.e., retrieve the metadata of the 5 most relevant datasets

remove deploy key setup as the submodule is now public

update log message

improved comments

remove `model_provider` arg

support multiple chats

using full table ids and spaces instead of tabs for tables metadata

remove default value, as `None` is the default already.

add optional ordering for `ThreadListView` and `MessageListView`

update `chatbot` app test cases

add a single migration file

return only non-deleted threads

refactor thread deletion endpoint

make `title` a required field in the `Thread` model and populate old threads `title` field

update `chatbot` package to version `0.5.2`

prepare `MessagePair` model for streaming

add streaming support

use `Response` and `status` from DRF

update `chatbot` package to version `0.5.2`

fix chatbot migration comments

fix: correctly parse parallel tool calls

Add token bridge endpoint for chatbot authentication

fix: use poetry to install and run the project

fix: make poetry venv accessible to all users

add comments to `Dockerfile`

increase helm timeout period to 10 minutes

revert `Dockerfile` changes to debug deploy timeout

restore `Dockerfile` updates to use Poetry

chore: update chatbot package (#857)

chore: update chatbot package to `v0.6.1` (#858)

fix: update chatbot v0.6.1 (#859)

feat: react agent (#873)

feat: query billing limit (#875)

feat: custom react agent (#876)

chore: system prompt tuning + other stuff (#884)

fix: empty ai message (#887)

refactor: truncate tool output (#895)

chore: update agent prompt (#897)

chore: add usage guides (#903)

style: run ruff linter

chore: add `contains=tables` parameter (#905)

chore: chatbot service account (#911)

feat: show sql query (#912)

chore: remove submodule `chatbot` (#896)

feat: add user ID to BigQuery query job (#926)

chore: improve metadata usage (#928)

perf: add async support (#932)

perf: use rest transport (#933)

chore: adjust chatbot env (#937)

docs: update readme (#929)

chore: prepare for staging and prod (#938)

chore: update poetry lock file

chore: update `.env.example` file

refactor: using already existing `BACKEND_URL` variable from django settings (#940)

fix: base backend url (#941)

remove unused variable `BACKEND_BASE_URL` from `chatbot.agent.tools` module

chore: fix migrations order

chore: set `pipefail` option before curl

chore: set shell with pipefail

chore: fix locustfile exceptions messages

chore: update ruff

chore: fix ruff lint errors

chore: fix migrations order

chore: update ruff and sqlfmt pre-commit config

chore: sort dependencies in alphabetical order

chore: preserve quotes in yamlfix

chore: migrate to official ruff action with latest version

fix: template response processing in `LoggerMidleware`

chore: revert changes in backend.custom.environment

chore: standardized images versions and api ports

fix: remove duplicated timeout
fix: fixing missing static files in local development #948
feat: chatbot to main
# Problem description:
We are currently having trouble opening some datasets because of optimization problems. CPU usage reaches 100%, which prevents loading from completing and results in a timeout error.

### Action:
I am implementing a cleanup of some fields so that the datasets can be opened, at least temporarily, while we wait for a definitive optimization fix.
feat: catalog download
Winzen and others added 11 commits February 10, 2026 04:38
This PR implements the complete email notification flow for users subscribed to table updates, including change detection, email sending, and automatic execution via a scheduled task.
This PR adds the `disable_unhealthy_flow_schedules` command, which **automatically disables the schedules of unhealthy flows in Prefect** based on recent run history.

We are adding the `disable_unhealthy_flow_schedules` command, which:

* Identifies flows with problematic recent runs
* Evaluates flow health from the **last two completed runs**
* Automatically disables the schedule of flows considered unhealthy
Adds a filter to the `LastTwoCompletedRunsWithTasks` GraphQL query to ensure that only `flow_runs` with a non-null `start_time` are returned.
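
As an illustration of the selection logic described above, the sketch below drops runs whose `start_time` is null, takes the two most recent remaining runs, and flags the flow when both failed. The `FlowRun` dataclass, the `is_unhealthy` helper, the state strings, and the "both failed" criterion are all assumptions made for this example: the PR does not spell out the exact rule, and the actual command gets its data from the `LastTwoCompletedRunsWithTasks` GraphQL query rather than filtering in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FlowRun:
    state: str  # assumed values for this sketch: "Success" or "Failed"
    start_time: Optional[datetime]


def is_unhealthy(runs: list[FlowRun]) -> bool:
    """Return True when the two most recent started runs both failed."""
    # Mirror the non-null start_time filter before looking at states.
    started = [run for run in runs if run.start_time is not None]
    started.sort(key=lambda run: run.start_time, reverse=True)
    last_two = started[:2]
    return len(last_two) == 2 and all(run.state == "Failed" for run in last_two)


runs = [
    FlowRun("Failed", datetime(2026, 2, 9, 12, 0)),
    FlowRun("Failed", datetime(2026, 2, 10, 12, 0)),
    FlowRun("Success", None),  # never started, so it is ignored
    FlowRun("Success", datetime(2026, 2, 8, 12, 0)),
]
print(is_unhealthy(runs))
```

Requiring two runs before passing judgment means a flow with a single recorded run is never disabled by this sketch.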
- Add `DJANGO_JWT_ALGORITHM` env variable.
- Create `db_network` (external network) in the compose file.
- Add `account.uuid` to JWT token payload.
- Add unique constraint to `account.uuid`.
# fix: disable_unhealthy_flow_schedules and _check_for_updates

## disable_unhealthy_flow_schedules 
An intermittent behavior was identified in Prefect when disabling schedules.
Because of a **race condition between the scheduler and the API**, when the scheduler is still processing ticks at the moment of the mutation, the schedule may not actually be disabled on the first call, even though the call returns `success`.

As a result, flows considered unhealthy kept generating new runs, and a manual retry was needed to make the deactivation take effect.

---

A second, immediate attempt to disable the schedule was implemented for each flow validated as unhealthy:

```python
for _ in range(2):  # There is a bug where the flow is not disabled by a single query
    self.set_flow_schedule(flow_id=flow.id, active=False)
```

This approach mitigates the scheduler race condition and ensures that the schedule is effectively disabled, making the process deterministic and avoiding the need for a manual re-run.

## check_for_updates

An error was identified in the `check_for_updates` function. Some tables do not have the `last_updated_at` attribute, which raised an exception during the comparison and interrupted the normal execution flow.

The function was adjusted to handle cases where `table.last_updated_at` is not present, so that the missing attribute no longer raises an exception and breaks execution. In those cases the function now handles the error safely and returns `False`, preserving the stability of the process.
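
A minimal sketch of the defensive check described above, assuming the comparison is made against a reference timestamp. The `has_update` helper, its `since` parameter, and the `SimpleNamespace` stand-ins for table objects are hypothetical; the real `check_for_updates` signature is not shown in this PR.

```python
from datetime import datetime
from types import SimpleNamespace


def has_update(table, since: datetime) -> bool:
    """Return True if the table changed after `since`, False if unknown."""
    last_updated_at = getattr(table, "last_updated_at", None)
    if last_updated_at is None:
        # Attribute missing or empty: report "no update" instead of raising.
        return False
    return last_updated_at > since


since = datetime(2026, 3, 1)
fresh = SimpleNamespace(last_updated_at=datetime(2026, 3, 5))
stale = SimpleNamespace(last_updated_at=datetime(2026, 2, 1))
missing = SimpleNamespace()  # no last_updated_at attribute at all

print(has_update(fresh, since), has_update(stale, since), has_update(missing, since))
```

Using `getattr` with a default covers both an absent attribute and one that is set to `None`, so a single table with incomplete metadata cannot interrupt the whole run.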
@Winzen Winzen force-pushed the feat/local_populate branch from 035f5c7 to 49feb72 Compare March 7, 2026 03:44

Labels

staging (indicates that the Pull Request's target (base) branch points to staging)


5 participants