feat: new populate initial data from prod by Winzen · Pull Request #927 · basedosdados/backend

Winzen · 2025-12-02T01:13:12Z

No description provided.

…and_more #944

fix: migrations account 0025_role_description_en_role_description_es_and_more

NOTE: This migration was created automatically by the modeltranslation extension on 2025-02-04 but was not committed. Because our `start-server.sh` runs the `makemigrations` command, the migration file was created directly in the production pod and applied as 0021. This created divergences between the migration files in the repository and the actual state of the production database.

We fixed this on December 17, 2025 by:

- Deleting the original 0021 migration entry from the `django_migrations` table
- Creating this migration file with the correct sequence number (0025)
- Running a fake migration to register 0025 in `django_migrations` without re-executing the SQL
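The effect of the repair on the migration history can be sketched with an in-memory SQLite table that mimics `django_migrations`. This is only an illustration, not the procedure that was run in production: the name of the stale 0021 entry is not given in the note, so `0021_stale_autogenerated_migration` is a hypothetical placeholder, and `repair_migration_history` is a made-up helper.

```python
import sqlite3

# Hypothetical name: the real 0021 file name is not given in the note above.
STALE = ("account", "0021_stale_autogenerated_migration")
FIXED = ("account", "0025_role_description_en_role_description_es_and_more")


def repair_migration_history(conn: sqlite3.Connection) -> None:
    """Drop the stale history row and record the renumbered migration as applied."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "DELETE FROM django_migrations WHERE app = ? AND name = ?", STALE
        )
        # Same effect as a fake migration: only the history row is written,
        # no schema-changing SQL is executed.
        conn.execute(
            "INSERT INTO django_migrations (app, name, applied) "
            "VALUES (?, ?, datetime('now'))",
            FIXED,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
conn.execute(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, '2025-02-04')",
    STALE,
)
conn.commit()

repair_migration_history(conn)
applied = [
    row[0]
    for row in conn.execute("SELECT name FROM django_migrations WHERE app = 'account'")
]
```

In a real Django project the last step is normally done with `python manage.py migrate account 0025_role_description_en_role_description_es_and_more --fake` rather than a manual insert.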

chore: update resources - main, staging and dev #945

- fix if expression (it was always evaluating to truthy)
- update pod resources for dev, staging and production environments
- add 10 minutes timeout to deployments

- renamed `ChatbotThreadListView` to `ThreadListView` and created `CheckpointView` for deleting checkpoints
- refactored views
- added chatbot env variables
- temp: added a script for running django in dev mode and updated the Dockerfile for installing the `chatbot` package
- updated migrations
- fixed django admin
- added `djangorestframework` and `djangorestframework-simplejwt` packages as dependencies
- comment could be empty/null
- added jwt tokens urls
- changed `chatbot` package installation path
- created serializers and refactored views, added jwt auth, error handling, etc.
- reminder for custom authentication rules
- updated dockerfile and compose file for chromadb
- fixing a TypeError (the assistant expects `thread_id` as a string) and using the `get_sync_sql_assistant` helper method
- Adding chatbot as submodule
- Download submodules in ci. Also update README to alert on cloning submodules
- minor stuff
- adjusting indentation
- updated chatbot package and using validated data
- updated chatbot package version to `v0.3.0`
- validated data is serialized to a UUID but `UserMessage` expects a string
- added healthcheck to vector database service
- created a request validation function using pydantic
- just renaming views and endpoints
- added authentication to the checkpoint deletion endpoint
- added docstrings
- update `chatbot` package to version `v0.4.0`
- update .gitignore
- updated volume mount path to match the expected path on chroma v0.6.3
- updated `chatbot` package to `v0.4.1` and using Serializers for request validation
- updated chatbot package version to v0.4.2
- created `ChatbotDatabase` to read metadata from the local database and query data on BigQuery
- created chatbot authentication field on user account model
- sending feedbacks to langsmith
- fixed authentication rule (when a user is not found in db, django simple jwt returns `None`, so it has no `has_chatbot_access` attribute)
- checking if the thread exists before trying to delete its checkpoints and returning 201 on thread creation
- added unit tests for the endpoints
- added `chatbot` dependency
- update env variables file
- optimize docker image
- the `chatbot` package was added to the api dependencies, so it doesn't need to be installed directly.
- add chatbot migrations
- removed unused type
- edited server starting scripts
- update compose file
- updated chatbot admin models
- update account admin model to show the `has_chatbot_access` flag
- remove unused pydantic import
- cloning submodules in ci workflow
- add missing `Account` migrations
- we don't need to checkout submodules in the deploy step because the docker image is already built and we're only using the helm templates
- we need to use a deploy key to checkout the `chatbot` submodule because it's private
- we don't need to checkout submodules in this action because it only watches for new chart versions
- building `db_url` on runtime
- add non-sensitive env variables
- apps.py so chatbot shows in admin
- remove chatbot related env variables
- update `chatbot` package version to `0.4.4`
- enable `chatbot` package logging
- update poetry lock file
- fix imports and formatting to pass linting CI workflow
- fix imports and formatting to pass linting CI workflow
- increase dev ram memory limit to 1.5Gi
- remove chroma related env variables
- stopped installing packages needed for chromadb
- update dependencies
- update chatbot views and tests
- update `chatbot` package to version `v0.4.5`
- add `__init__.py` file to `apps` so python sees it as a package and pytest can import its chatbot module correctly.
- discard changes in Dockerfile
- setting model name directly in plain text
- use pgvector for similarity search
- use special value `__all__` to include all model fields
- remove chroma from compose files
- add `PGVECTOR_COLLECTION` env variable
- minor fixes
- added `EMBEDDING_MODEL` env variable
- created `populate_pgvector` command
- improved logging
- running `populate_pgvector` script in the foreground
- run `populate_pgvector` command when starting the server
- rewrite query when invoking the assistant
- update `chatbot` package to version `0.5.1`
- set top-k to 5, i.e., retrieve the metadata of the 5 most relevant datasets
- remove deploy key setup as the submodule is now public
- update log message
- improved comments
- remove `model_provider` arg
- support multiple chats
- using full table ids and spaces instead of tabs for tables metadata
- remove default value, as `None` is the default already.
- add optional ordering for `ThreadListView` and `MessageListView`
- update `chatbot` app test cases
- add a single migration file
- return only non-deleted threads
- refactor thread deletion endpoint
- make `title` a required field in the `Thread` model and populate old threads `title` field
- update `chatbot` package to version `0.5.2`
- prepare `MessagePair` model for streaming
- add streaming support
- use `Response` and `status` from DRF
- update `chatbot` package to version `0.5.2`
- fix chatbot migration comments
- fix: correctly parse parallel tool calls
- Add token bridge endpoint for chatbot authentication
- fix: use poetry to install and run the project
- fix: make poetry venv accessible to all users
- add comments to `Dockerfile`
- increase helm timeout period to 10 minutes
- revert `Dockerfile` changes to debug deploy timeout
- restore `Dockerfile` updates to use Poetry
- chore: update chatbot package (#857)
- chore: update chatbot package to `v0.6.1` (#858)
- fix: update chatbot v0.6.1 (#859)
- feat: react agent (#873)
- feat: query billing limit (#875)
- feat: custom react agent (#876)
- chore: system prompt tuning + other stuff (#884)
- fix: empty ai message (#887)
- refactor: truncate tool output (#895)
- chore: update agent prompt (#897)
- chore: add usage guides (#903)
- style: run ruff linter
- chore: add `contains=tables` parameter (#905)
- chore: chatbot service account (#911)
- feat: show sql query (#912)
- chore: remove submodule `chatbot` (#896)
- feat: add user ID to BigQuery query job (#926)
- chore: improve metadata usage (#928)
- perf: add async support (#932)
- perf: use rest transport (#933)
- chore: adjust chatbot env (#937)
- docs: update readme (#929)
- chore: prepare for staging and prod (#938)
- chore: update poetry lock file
- chore: update `.env.example` file
- refactor: using already existing `BACKEND_URL` variable from django settings (#940)
- fix: base backend url (#941)
- remove unused variable `BACKEND_BASE_URL` from `chatbot.agent.tools` module
- chore: fix migrations order
- chore: set `pipefail` option before curl
- chore: set shell with pipefail
- chore: fix locustfile exceptions messages
- chore: update ruff
- chore: fix ruff lint errors
- chore: fix migrations order
- chore: update ruff and sqlfmt pre-commit config
- chore: sort dependencies in alphabetical order
- chore: preserve quotes in yamlfix
- chore: migrate to official ruff action with latest version
- fix: template response processing in `LoggerMidleware`
- chore: revert changes in backend.custom.environment
- chore: standardized images versions and api ports
- fix: remove duplicated timeout
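One of the commits above fixes the authentication rule for the case where the user lookup returns `None`. A minimal sketch of such a rule, assuming it receives either a user object or `None`, could look like the following. The function name `chatbot_user_authentication_rule` and the `is_active` check are assumptions, and only `has_chatbot_access` comes from the commit messages.

```python
from types import SimpleNamespace
from typing import Any, Optional


def chatbot_user_authentication_rule(user: Optional[Any]) -> bool:
    """Return True only for an existing, active user with chatbot access."""
    # Check for None first: a missing user has no attributes to read,
    # so touching `has_chatbot_access` directly would raise AttributeError.
    if user is None:
        return False
    is_active = getattr(user, "is_active", False)  # assumed field
    has_access = getattr(user, "has_chatbot_access", False)
    return bool(is_active and has_access)


# Stand-ins for account objects, for illustration only.
allowed = SimpleNamespace(is_active=True, has_chatbot_access=True)
blocked = SimpleNamespace(is_active=True, has_chatbot_access=False)
```

With `djangorestframework-simplejwt`, a rule like this is typically referenced through the `USER_AUTHENTICATION_RULE` entry of the `SIMPLE_JWT` setting; whether this project wires it that way is not shown here.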

feat: chatbot to staging

fix: fixing missing static files in local development #948

feat: chatbot to main

fix: optimize search view

chore: use gthread

# Problem Description

We are currently having difficulty opening some datasets because of optimization problems. CPU usage reaches 100%, which prevents loading from completing and results in a timeout error.

### Action

I am clearing some fields so that the datasets can be opened, at least temporarily, while we wait for a definitive optimization fix.

feat: catalog download

Feat/catolog download

This PR implements the complete email notification flow for users subscribed to table updates, including change detection, email delivery, and automatic execution through a scheduled task.

This PR adds the `disable_unhealthy_flow_schedules` command, which **automatically disables the schedules of unhealthy flows in Prefect** based on their recent run history. The command:

* Identifies flows with problematic recent runs
* Assesses each flow's health from its **two most recent completed runs**
* Automatically disables the schedule of flows considered unhealthy
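The description does not state what makes a flow unhealthy, so the following is only a sketch under an assumed rule: a flow is unhealthy when both of its two most recent completed runs ended in a `Failed` state. The `FlowRun` type and the `is_unhealthy` helper are hypothetical and are not taken from the command's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FlowRun:
    """Hypothetical, minimal view of a completed flow run."""

    state: str  # e.g. "Success" or "Failed"


def is_unhealthy(completed_runs: Sequence[FlowRun]) -> bool:
    """Assumed rule: unhealthy when the two latest completed runs both failed.

    `completed_runs` is expected to be ordered newest first.
    """
    latest_two = completed_runs[:2]
    if len(latest_two) < 2:
        return False  # not enough history to judge
    return all(run.state == "Failed" for run in latest_two)
```

Under this rule a single recent failure does not disable the schedule, which avoids reacting to one-off errors.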

Adds a filter to the `LastTwoCompletedRunsWithTasks` GraphQL query so that only `flow_runs` whose `start_time` is not `null` are returned.
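The same selection can be illustrated client-side in Python. This is a sketch of the intent (pick the two latest runs that actually started), not the GraphQL query itself. The `FlowRunRow` shape and the `last_two_started_runs` helper are assumptions.

```python
from datetime import datetime
from typing import List, Optional, TypedDict


class FlowRunRow(TypedDict):
    """Hypothetical shape of a row returned for a flow run."""

    id: str
    start_time: Optional[datetime]


def last_two_started_runs(runs: List[FlowRunRow]) -> List[FlowRunRow]:
    """Return the two most recent runs, ignoring runs that never started."""
    # Runs without a start_time cannot be ordered meaningfully, so drop them
    # before sorting instead of letting them take one of the two slots.
    started = [run for run in runs if run["start_time"] is not None]
    started.sort(key=lambda run: run["start_time"], reverse=True)
    return started[:2]


runs: List[FlowRunRow] = [
    {"id": "a", "start_time": datetime(2026, 2, 1, 8, 0)},
    {"id": "never-started", "start_time": None},
    {"id": "b", "start_time": datetime(2026, 2, 2, 8, 0)},
    {"id": "c", "start_time": datetime(2026, 1, 31, 8, 0)},
]
latest_ids = [run["id"] for run in last_two_started_runs(runs)]
```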

- Add `DJANGO_JWT_ALGORITHM` env variable.
- Create `db_network` (external network) in the compose file.

- Add `account.uuid` to JWT token payload.
- Add unique constraint to `account.uuid`.
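To show what a UUID claim in a token payload looks like, here is a standard-library sketch that builds an HS256-signed JWT by hand. It does not reproduce the project's token code, which goes through `djangorestframework-simplejwt`. The claim name `uuid`, the `user_id` claim, the secret, and both helper functions are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import uuid


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used by JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_token(payload: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (demo only)."""
    segment = token.split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


account_uuid = uuid.uuid4()
# A UUID object is not JSON-serializable, so it is stored as a string.
token = encode_token(
    {"user_id": 1, "uuid": str(account_uuid)}, secret=b"dev-only-secret"
)
```

A consumer that identifies accounts by this claim relies on the value mapping to exactly one account, which is presumably why the unique constraint is added in the same change.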

# fix: disable_unhealthy_flow_schedules and _check_for_updates

## disable_unhealthy_flow_schedules

We identified intermittent behavior in Prefect when disabling schedules. Because of a **race condition between the scheduler and the API**, if the scheduler is still processing ticks at the moment of the mutation, the schedule may not actually be disabled on the first call, even though the call returns `success`.

As a result, flows considered unhealthy kept generating new runs, and a manual retry was needed to make the deactivation take effect.

---

A second, immediate attempt to disable the schedule is now made for each flow validated as unhealthy:

```python
for _ in range(2):  # There is a bug where the flow is not disabled by a single query
    self.set_flow_schedule(flow_id=flow.id, active=False)
```

This approach mitigates the scheduler race condition and ensures that the schedule is effectively disabled, making the process deterministic and avoiding the need for a manual re-run.

## check_for_updates

We identified an error in the `check_for_updates` function. Some tables do not have the `last_updated_at` attribute, which raised an exception during the comparison and interrupted normal execution.

The function now handles the case where `table.last_updated_at` is missing: instead of raising, it returns `False`, which keeps the process stable.
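A minimal sketch of the safe comparison described for `check_for_updates`, assuming the function compares a table's `last_updated_at` against some reference timestamp. The signature and the `last_notified_at` parameter are hypothetical, and only the attribute name and the `False` fallback come from the description.

```python
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Optional


def check_for_updates(table: Any, last_notified_at: datetime) -> bool:
    """Return True when the table changed after `last_notified_at`.

    A table without `last_updated_at` (missing or None) is treated as
    "no update" instead of raising during the comparison.
    """
    last_updated_at: Optional[datetime] = getattr(table, "last_updated_at", None)
    if last_updated_at is None:
        return False
    return last_updated_at > last_notified_at


# Stand-ins for table objects, for illustration only.
reference = datetime(2026, 2, 1)
updated_table = SimpleNamespace(last_updated_at=datetime(2026, 2, 10))
stale_table = SimpleNamespace(last_updated_at=datetime(2026, 1, 10))
table_without_attribute = SimpleNamespace()
```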

Winzen self-assigned this Dec 2, 2025

Winzen changed the title ~~feat: New populate initial data from prod~~ feat: new populate initial data from prod Dec 2, 2025

Winzen added the staging label (indicates that the Pull Request's target (base) branch points to staging) Dec 2, 2025

Winzen marked this pull request as ready for review December 5, 2025 03:01

Winzen and others added 26 commits December 17, 2025 16:54

chore: update resources - main, staging and dev #945

8328d0d

feat: initial commit chatbot

3c99389

WIP: Created basic chatbot backend endpoints

6956453

Adding migrations and fixing admin

65584a2

Fixing stuff and adding todos

907b666

Bad set to 0

e62e833

Merge pull request #946 from basedosdados/feat/chatbot

ccbb951

feat: chatbot to staging

fix: fixing missing static files in local development #948

283ae24

fix: fixing missing static files in local development #948

feat: chatbot to main #950

ab253ee

feat: chatbot to main

perf: optimize search view

4185624

chore: more descriptive variables names

50b4d94

perf: add worker connections limit

229c8ba

chore: add comment for clarity

01d530b

perf: refactored get_facets in DatasetSearchView

ed68339

Merge pull request #952 from basedosdados/perf/optimize-search-view

37f5844

fix: optimize search view

feat: add export functionality for catalog data in CSV format

61839c7

chore: use gthread

3f0f3e6

Merge pull request #956 from basedosdados/chore/use-gthread

cf306d4

chore: use gthread

Temporary Simplification Admin

deaf78c

feat: catalog download #959

54f4e58

feat: catalog download

feat: change one name in headers and adjust status map

1e9bdd0

fix: correct grammar in CSV

1bbf103

Merge pull request #972 from basedosdados/feat/catolog_download

56e9fa7

Feat/catolog download

Winzen and others added 11 commits February 10, 2026 04:38

Init app user_notifications

a8c44fc

check_for_updates_and_send_emails only production

299ede1

feat: init app user_notifications #961

e2c2fe8

fix: correct ordering by excluding null start_time runs #983

7b9ccdd

feat: prepare for chatbot service #986

8e382f0

feat: add UUID to token #987

0fbf0f7

feat: add chatbot access flag to verify token (#994)

3c4ca9c

Organizing and starting the new populate as commands

492bc02

Change constants to xml

49feb72

Winzen force-pushed the feat/local_populate branch from 035f5c7 to 49feb72 Compare March 7, 2026 03:44

Winzen wants to merge 37 commits into backup-before-chatbot-staging from feat/local_populate


5 participants
