Skip to content

Table reorg prosposal #1002

@ionous

Description

@ionous

My main goal after node optimizations is to support event tags ( ex. "protest ride" ); i started pondering.... and the idea grew a bit. 😊 I went through existing issues on db data, and put together some ideas for a table reorg:

New Tables:

all new tables use new names to avoid overwriting the old; splits calevent into several smaller tables, and simplifies caldaily. the summarize() query ( in the node optimization branch ) would handle merging all of these into a backwards compatible query ( for the events, ical, and search endpoints ); manage and delete are the main endpoints that would need updating. ( and unit tests. )

( potentially, for a little while, we could write to both sets of tables "just in case" ™️ -- tho it might make the code somewhat icky. )

LEDE:

this is the core bit of what was calevent. i think ledeis a nice name because it'd be unique in the code base. indicates it's a lead-in, the initial part of ride info but it can't stand fully on its own either.

  • auto id: maybe call this series_id? -- the ride and all its days binds to this id.
  • auto created time
  • auto modified time: the node opts branch fixes modified data not working. i think.
  • title: string, as it is now.
  • summary: replaces descr
  • organizer: replaces name
  • secret: we can set this to null to block editing. ( when migrating, it becomes null if calevent.password is blank )
  • published: int, this replaces both our 'change counter', and not coalesce(hidden, 0).

when migrating data: the series_id will have to be set to the current calevent.id for every ride; that's the simplest route for backwards compatible links and images.

currently, the change counter is used both to generate image names for cache-busting and to notify ical readers about data changes. the table reorg would separate those usages.

published would be for content changes. let it start at 0, and increase to 1 when initially published; then count number of times an event has been saved to act like changes does. we could arrange things so that null revokes and hides the ride. ( when migrating, if the current password is null or blank, then published would be null; otherwise it'd be the opposite of hidden )

STATUS:

mimics caldaily, but no pkid; and simplifies status.
rationale: caldaily.pkid is easy to confuse with calevent.id; and, the date is enough to uniquely identify the status, so we don't even really need it! the primary key for status here would be a composite key: ymd + series_id.

  • series_id: one or more status entries per ride
  • ymd: date without time in YYYY-MM-DD format. the weird name to help remember that.
  • news: string, replaces newsflash
  • scheduled: replaces eventstatus. instead of a char, uses 1, 0, or null. 1 is active, 0 is cancelled, null is delisted.

PRIVATE:

rationale, by putting all private data in its own table, we can easily export anonymized data by excluding this table(!) and, i think we could pretty easily create a PUBLIC table view based on its internal show settings: so it'd limit the chances of a coding error leaking personal details.

  • series_id - exactly one private row for every ride
  • private_email(string) -- long names, but really hammers it home.
  • private_phone(string)
  • private_contact(string)
  • show_email: 0,1
  • show_phone:
  • show_contact

PRINT:

used for the print calendar. rationale, only a very small number of rides use this. partitioning off tiny here, helps show what it's used for.

  • series_id: at most one print entry for every ride
  • tiny_title: string; when migrating if its the same as the automunge, dont store the field.
  • tiny_summary: string.
  • add_email: 0,1. ex. whether to "add" the email to the print calendar.
  • add_phone
  • add_link
  • add_contact

when migrating if the fields here would all be empty, we can skip adding a print entry.

WEB:

takes the place of 'weburl' and 'webname'
rationale: could potentially support multiple sites, could be bluesky handles , etc.
the primary key would be series_id + web_type.

  • series_id: zero or more web entries for every ride
  • web_type: ex. "url".
  • web_text: human readable
  • web_link: ex. "https://...."

IMAGE:

for any rides with images; rationale: not all rides have images;

  • series_id: at most one image for every ride
  • img_ext ( extension )
  • img_alt ( alt text )
  • img_num: int, increments monotonically every image update ( or if it was easy to determine; could be the image hash. )
  • img_override: null, or a string for events that have images with specific filenames.

on disk, we store most images as calevent_id.ext, but report them as (calevent_id)-(image_num).ext for cache-busting. we pull the image num from calevent.changes, and we record a string with the format "calevent_id.ext" into the db. occasionally, there are images with specific names ( usually setup by one of the calendar team manually ) - which override that.

this change would give image its own counter, and would store it as an int, instead of a string. img_override would handle the rare case where we have given an image a special name.

LOCATION:

handles both the start, and optionally the end of the ride. the primary key would be loc_type + series_id. we could maybe have table views for START and FINISH

  • series_id: one or more locations for every ride. ( ex. all rides have a start, and many have an end. )
  • loc_type: "start" for start, "end" for end. would be a cool expansion to allow YYYY-MM-DD values here to override the location for particular days. ( ex. a special time and location on a given day, maybe for shift hack meetups )
  • loc_name
  • loc_address
  • loc_instructions: ex. replaces locdetails
  • loc_hrmin: ex. replaces eventtime. using this weird name b/c i sometimes confused what we expect it to store. ride_duration would be calculated from finish.hrmin - start.hrmin. storing ride_duration would update the finish.hrmin ( as part of the manage endpoint )
  • loc_hrmin_info: ex. replaces timedetails

the proper db way probably wouldn't store loc_type and series. instead, it would give each location a unique id, and have the lede table store "location ids" for start and end. BUT that implies we could share locations, and i don't think we should. ( location here has more than just a geo-code. ) AND, i really like the idea that every table has the series id. i think that makes manually inspecting the db super easy.

TAG:

new! this is the main goal. the primary key would be tag_name + series_id

  • series_id: zero or more tags for every ride.
  • tag_name: string.
  • tag_value: string, or the string "true". ( could use blank or null for a valueless tag, but that might be confusing. )

the tags could maybe some day be added directly by users, these could be some of the built-in tags:

  • audience: family, ,adult -- i think "general" could be the absence of an audience tag
  • pace ex. easy,steady,strenuous, etc. no drop?
  • area: ex.vancouver, westside, eastside, clackmas. assumes "portland" if there isn't an area tag.
  • safety: the string "true", or doesn't exist and isn't covid friendly.
  • featured: the string "true", some extra details, or doesn't exist and isn't featured.
  • loop: the string "true", or doesn't exist and isn't a loop.
  • distance: one of the exiting strings: "0-3", ..., "15+"
  • protest: "true", or doesn't exist and isn't a protest ride.
  • ... etc.

LEGACY:

exists to respond to existing calendar/event-PKID urls that people might have bookmarked. the node code would catch those urls, find the new table data by matching the incoming `pkid, and join in the lede and status tables to return the right data.

  • series_id: points to the new migrated data. one legacy entry would exist for every existing caldaily; no new entries would be created after migration.
  • pkid: from caldaily
  • eventdate: from caldaily
  • eventstatus and newsflash: these would live in the status table, and don't need to be duplicated here.

in an ideal world, instead of using urls like -- ex. https://www.shift2bikes.org/calendar/event-21504 -- we could immediately start using urls likecalendar/event/SERIES_ID/YY-MM-DD, and update all current code to use that. any old saved bookmarks would route through the legacy table to interpret those 5 digit pkid numbers. anyone getting new links would use the new format and directly route to current data. we wouldn't need a pkid for new events.

BUT we have no idea how bike fun would react to a new url scheme, so we should keep event-NUMBER for now. AND, to be super super safe, we probably should assume bikefun is only good up till a signed int32 number. That's 10 digits, but limited to a max of 2,147,483,647. Figuring out how to support that without a pkid style table is a bit odd....

The simple case: Any request < 6 digits, we assume is an old bookmark, and we run through the legacy table, matching the pkid to find the right data.

The complicated case: all new urls that the backend generates would find the number of days since Jan 01 2008 ( the first day in the db is 2008-02-20 ) -- which is about 6507 days right now. And, if we used the "left" 5 numbers of the event-NUMBER as days, that would get us to Oct 17, 2066 ( 21474 days ) before overflowing.

For the 5 digits on the "right" side we would use the series id. We're currently at 10,024 total events, and we could get up to 83,647 events without worry -- someone smarter than me can guess how the two sides would interact above that; but we can keep them safely independent for a nice long time.

At any rate, the new node code could generate urls of the format event-(5 digits of days)(5 digits of series) -- and we could reference the day of any particular ride while still keeping bikefun ( and any possible integer limitations ) happy. For 40 years or so, at any rate.

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions