31 changes: 31 additions & 0 deletions lib/tasks/import.rake
@@ -43,8 +43,39 @@ namespace :import do
end
end

desc 'import scc-docs'
Owner

How about using Octokit (https://github.com/octokit/octokit.rb?tab=readme-ov-file#additional-query-parameters) to fetch the .md files from private repos with an access token?

Collaborator Author

Yes, good idea. Will look into it today!
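
A rough sketch of what the Octokit suggestion could look like, not the PR's implementation. It assumes a client that behaves like `Octokit::Client` (its `tree` and `contents` methods), a default branch named `master`, and a hypothetical `GITHUB_TOKEN` environment variable. The helper name `fetch_markdown_files` and the repository name in the usage comment are made up for illustration.

```ruby
# Sketch: collect Markdown files from a repository through the GitHub API
# instead of cloning it. `client` is expected to behave like Octokit::Client.
def fetch_markdown_files(client, repo, ref: 'master')
  # With recursive: true, the tree listing contains every entry under the ref.
  entries = client.tree(repo, ref, recursive: true).tree
  md_paths = entries.select { |e| e.type == 'blob' && e.path.end_with?('.md') }.map(&:path)

  md_paths.to_h do |path|
    file = client.contents(repo, path: path, ref: ref)
    # The contents API returns the file body base64-encoded.
    text = file.content.unpack1('m').force_encoding('UTF-8')
    [path, text]
  end
end

# Hypothetical usage (needs the octokit gem and a token with read access):
#   require 'octokit'
#   client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])
#   fetch_markdown_files(client, 'example-org/example-docs').each do |path, text|
#     puts "#{path}: #{text.split.size} words"
#   end
```

Passing the client in keeps the helper independent of how authentication is configured, and it would remove the need for `ssh-add` and a local `tmpdocs` checkout.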

task :scc_docs, [:site] => [:environment] do |_, args|
  docs_repo = ENV['DOCS_REPO']
  deploy_key_location = ENV['deploy_key_location']
  # clone repo locally
  `ssh-add #{deploy_key_location}`
  `git clone #{docs_repo} tmpdocs`
  files = Dir['tmpdocs/**/**.md']
  files.each do |readme_file|
    content = File.read(readme_file)
    url = build_url_for_github_docs(readme_file)
    Article.find_or_initialize_by(url: url).tap do |a|
      content_words = content.split
      # Truncation is disabled: the full text is indexed, not only Article::MAX_EMBEDDINGS * 0.75 words
      # content = content_words[0..(Article::MAX_EMBEDDINGS*0.75)].join(' ')
      a.update!(title: build_title(readme_file), text: content, category: 'documentation', indexed_at: DateTime.now)
      if a.previous_changes['embedding']
        puts "Stored '#{build_title(readme_file)}' from #{url} (#{content.split.size}/#{content_words.size} words)"
      end
    end
  end
end

private

def build_url_for_github_docs(filename)
  "https://github.com/#{ENV['DOCS_REPO'].scan(/git@github.com:(.+)\.git/)[0][0]}/blob/master#{filename.gsub('tmpdocs', '')}"
end

def build_title(readme_file)
  readme_file.gsub('tmpdocs', '').scan(/\/(.+)\./)[0][0].split('/').map(&:capitalize).join(' ')
end
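
To make the two helpers above concrete, here is a standalone sketch adapted from them. It takes the repository as an argument rather than reading `ENV['DOCS_REPO']`. The repository and file path are hypothetical sample values, not taken from the PR.

```ruby
# Adapted copies of the helpers from the diff; docs_repo is passed in
# explicitly here so the example runs without environment variables.
def build_url_for_github_docs(filename, docs_repo)
  repo_path = docs_repo.scan(/git@github.com:(.+)\.git/)[0][0]
  "https://github.com/#{repo_path}/blob/master#{filename.gsub('tmpdocs', '')}"
end

def build_title(readme_file)
  readme_file.gsub('tmpdocs', '').scan(/\/(.+)\./)[0][0].split('/').map(&:capitalize).join(' ')
end

# Hypothetical sample inputs.
docs_repo = 'git@github.com:example-org/example-docs.git'
file = 'tmpdocs/guides/getting_started.md'

puts build_url_for_github_docs(file, docs_repo)
# prints "https://github.com/example-org/example-docs/blob/master/guides/getting_started.md"
puts build_title(file)
# prints "Guides Getting_started"
```

Note that `capitalize` only upcases the first letter of each path segment, so underscores stay in the title. The URL helper also assumes an SSH-style remote (`git@github.com:...`); an HTTPS remote would not match the pattern.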

def import_from_url(uri, selector: 'article, #content, .chapter, .article, .appendix, main',
                    category: 'doc')
  file = URI::open(uri)