Code Structure
This file acts as the Graphical User Interface (GUI) for the program. It uses PySimpleGUI and its dependencies. There are 30 functions in the file.
EAD_EXPORT_THREAD, MARCXML_EXPORT_THREAD, PDF_EXPORT_THREAD, CONTLABEL_EXPORT_THREAD, XTF_UPLOAD_THREAD, XTF_INDEX_THREAD - these global variables serve as the event keys for PySimpleGUI's threading feature. An example can be found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
This function, run_gui(), handles the GUI operation as outlined by PySimpleGUI's guidelines. It has two components: the setup code and the while loop.
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
gc.disable() disables automatic garbage collection. Collection is instead run manually from within the while loop (see the gc.collect() note below). This allows the program to run multiple threads, so the user can keep interacting with the GUI while background jobs (such as exports or uploads) are running.
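A minimal sketch of this garbage-collection pattern using only Python's gc module. The function name run_event_loop and the plain list standing in for window.read() are hypothetical. The point it illustrates is that gc.collect() forces a single collection pass without turning automatic collection back on.

```python
import gc
from typing import Iterable, Optional


def run_event_loop(events: Iterable[Optional[str]]) -> int:
    """Hypothetical stand-in for run_gui()'s loop; returns the number of passes."""
    gc.disable()  # automatic collection off, so it cannot fire inside a worker thread
    passes = 0
    for event in events:  # each item stands in for one window.read() result
        gc.collect()  # explicit collection, run here on the main thread
        passes += 1
        if event in ("Cancel", None, "Exit"):
            break
    return passes
```

After the loop, gc.isenabled() still returns False: gc.collect() only runs a collection, and gc.enable() is what re-enables automatic collection.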
sg.theme() changes the theme of the GUI (colors, button styles, etc.).
- as_username (str): user's ArchivesSpace username
- as_password (str): user's ArchivesSpace password
- as_api (str): the ArchivesSpace API URL
- close_program (bool): if a user exits the popup, this will return true and end run_gui()
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- asp_version (str): the current version of ArchivesSpace
- repositories (dict): contains info on all the repositories for an ArchivesSpace instance, with the name as the key and the id # as its value
- resources (dict): contains info on all the resources for each repository where the key (int) is the repository # in ASpace and the value (list) is a list of strings for each resource # in ASpace
- xtf_version (bool): user-indicated value for whether to display XTF features in the GUI
if close_program_as is True - if a user closes the initial popup, the program exits without error
if asp_version in pdf_broken - if asp_version matches any version in the pdf_broken list, asp_pdf_api is set to False; otherwise it is set to True. This enables or disables a warning message displayed to the user in pdf_layout
rid_box_len (int): sets the length of the Resource Identifier text box in the GUI
if xtf_version is True - if a user on the first ASpace credentials popup indicates they want XTF features, then load the following variables using get_xtf_log()
- xtf_username (str): user's XTF username
- xtf_password (str): user's XTF password
- xtf_host (str): the host URL for the XTF instance.
- xtf_remote_path (str): the path (folder) where a user wants their data to be stored on the XTF host
- xtf_indexer_path (str): the path (file) where the website indexer is located
- close_program (bool): if a user exits the popup, this will return true and end run_gui()
Set xtf_login_menu_button (str) and xtf_opt_button (str) - these are set to specific strings. If XTF is not enabled, an "!" is added to the front of each string, which PySimpleGUI uses to mark a menu entry as disabled. Otherwise, the entries are displayed normally.
cleanup_default (list) and cleanup_options (list) store the cleanup operations a user can choose to run through cleanup.py.
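The "!" convention can be sketched as a small helper that builds a PySimpleGUI-style menu definition (a nested list of menu names and entry strings). The helper names menu_entry and build_menu_def, and the exact menu layout, are assumptions for illustration rather than the program's actual menu_def.

```python
from typing import List, Union

MenuDef = List[List[Union[str, list]]]


def menu_entry(label: str, enabled: bool) -> str:
    """Prefix the label with "!" so PySimpleGUI shows it as a disabled entry."""
    return label if enabled else "!" + label


def build_menu_def(xtf_version: bool) -> MenuDef:
    # Hypothetical layout: only the XTF entries depend on xtf_version.
    xtf_login_menu_button = menu_entry("Change XTF Login Credentials", xtf_version)
    xtf_opt_button = menu_entry("Change XTF Options", xtf_version)
    return [
        ["File", ["Clear EAD Export Folder", "Reset Defaults", "Exit"]],
        ["Edit", ["Change ASpace Login Credentials", xtf_login_menu_button, xtf_opt_button]],
        ["Help", ["User Manual", "About"]],
    ]
```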
menu_def sets the options for the toolbar menu.
ead_layout (list), xtf_layout (list), marc_layout (list), contlabel_layout (list), pdf_layout (list) all set the various layouts for their respective screens.
simple_layout_col1 (list) and simple_layout_col2 (list) divide the layout into two columns, the first being defined with the multiline input for resource identifiers, the second defining the above export layouts and Output console.
layout_simple (list) sets the layout for the GUI.
window_simple (PySimpleGUI class) sets the window for the main screen. Any variable with _window follows the same logic.
The loop works by "reading" the window created in the setup code, which returns events and values (event_simple and values_simple). When a user clicks a button, that button's key (as defined in the layout) is returned as the event and handled by the matching if statement below.
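A simplified sketch of this read-and-dispatch loop. A list of (event, values) pairs stands in for successive window.read() calls, and the values are a dict keyed by element keys. The "_RESOURCES_" key for the Resource Identifiers box is hypothetical, and only a handful of the real if statements are mirrored.

```python
from typing import Iterable, List, Optional, Tuple

# Each item stands in for one (event, values) pair returned by window.read().
ReadResult = Tuple[Optional[str], dict]

SCREENS = {
    "_EXPORT_EAD_RAD_": "EAD",
    "_EXPORT_MARCXML_RAD_": "MARCXML",
    "_EXPORT_PDF_RAD_": "PDF",
    "_EXPORT_CONTLABS_RAD_": "Container Label",
}


def event_loop(reads: Iterable[ReadResult]) -> List[str]:
    """Dispatch events the way run_gui()'s if statements do; return an action log."""
    log: List[str] = []
    for event_simple, values_simple in reads:
        if event_simple == "Cancel" or event_simple is None or event_simple == "Exit":
            log.append("exit")
            break
        if event_simple in SCREENS:
            log.append(f"show {SCREENS[event_simple]} layout")
        elif event_simple == "_EXPORT_EAD_":
            # "_RESOURCES_" is a hypothetical key for the Resource Identifiers box
            log.append(f"export EAD: {values_simple.get('_RESOURCES_', '')}")
    return log
```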
Note: gc.collect() forces a garbage collection pass within the while loop. Because automatic collection stays disabled, collection runs on the main GUI thread rather than at arbitrary points in a worker thread, which allows the program to create multiple threads without fatal errors. Documentation for this issue can be found here: https://pysimplegui.readthedocs.io/en/latest/#multiple-threads
There are 42 if statements in the while loop:
- if event_simple == 'Cancel' or event_simple is None or event_simple == "Exit": This exits the program for the user.
- if event_simple == "_EXPORT_EAD_RAD_": This sets the screen to the EAD layout.
- if event_simple == "_EXPORT_MARCXML_RAD_": This sets the screen to the MARCXML layout.
- if event_simple == "_EXPORT_PDF_RAD_": This sets the screen to the PDF layout.
- if event_simple == "_EXPORT_CONTLABS_RAD_": This sets the screen to the Container Label layout.
- if event_simple == "_REPO_DEFAULT_": This is the 'SAVE' button, which saves the user's chosen repository as the default in defaults.json.
- if event_simple == "_EXPORT_EAD_": This activates searching ASpace, exporting, and cleaning the EAD.xml files. Within this block, if values_simple["_OPEN_RAW_"] is True: checks whether the user selected the "Open raw ASpace exports" checkbox.
- if event_simple == "_EXPORT_ALLEADS_": This exports all published resources within the selected repository (or across all repositories if using system admin) as EAD.xml files.
- if event_simple == "_EAD_OPTIONS_" or event_simple == "Change EAD Export Options": This is the 'EAD Export Options' button.
- if event_simple == "Change EAD Cleanup Defaults" or event_simple == "Change Cleanup Defaults": Opens a popup (window) where users can change the operations run by the cleanup.py script. Outlined in the toolbar menu (Edit).
- if event_simple == "_OPEN_CLEAN_B_" or event_simple == 'Open Cleaned EAD Folder': This is the 'Open Cleaned EAD Folder' button. Opens the clean_eads folder. Outlined in the toolbar menu.
- if event_simple == "_OPEN_RAW_EXPORTS_" or event_simple == "Open RAW ASpace Exports": This is the 'Open Raw ASpace Exports' button. Opens the source_eads folder. Available from both the toolbar menu and a button.
- if event_simple == "_EXPORT_MARCXML_": This activates searching ASpace, exporting, and placing MARCXML files in the default folder or a folder specified by the user.
- if event_simple == "_EXPORT_ALLMARCXMLS_": This exports all published resources within the selected repository (or across all repositories if using system admin) as MARC.xml files.
- if event_simple == "_OPEN_MARC_DEST_": This is the 'Open Output' button, which opens the default MARCXML folder or the folder specified by the user.
- if event_simple == "_MARCXML_OPTIONS_" or event_simple == "Change MARCXML Export Options": This is the 'MARCXML Export Options' button.
- if event_simple == "_EXPORT_PDF_": This activates searching ASpace, exporting, and placing PDF files in the default folder or a folder specified by the user.
- if event_simple == "_EXPORT_ALLPDFS_": This exports all published resources within the selected repository (or across all repositories if using system admin) as PDF files.
- if event_simple == "_OPEN_PDF_DEST_": This is the 'Open Output' button, which opens the default PDF folder or the folder specified by the user.
- if event_simple == "_PDF_OPTIONS_" or event_simple == "Change PDF Export Options": This is the 'PDF Export Options' button.
- if event_simple == "_EXPORT_LABEL_": This activates searching ASpace, exporting, and placing container label (.tsv) files in the default folder or a folder specified by the user.
- if event_simple == "_EXPORT_ALLCONTLABELS_": This exports all published resources within the selected repository (or across all repositories if using system admin) as container label (.tsv) files.
- if event_simple == "_OPEN_LABEL_DEST_": This is the 'Open Output' button, which opens the default container label folder or the folder specified by the user.
- if event_simple == "_OUTPUT_DIR_LABEL_INPUT_": This is the folder where container label files are exported.
- if event_simple == "_CONTOPT_HELP_": Opens a new tab in the user's browser to the GitHub Wiki/User Manual section.
- if event_simple in (EAD_EXPORT_THREAD, MARCXML_EXPORT_THREAD, PDF_EXPORT_THREAD, CONTLABEL_EXPORT_THREAD): This checks whether the EAD, MARCXML, PDF, or Container Label export threads returned an event; if so, the export buttons are re-enabled on all screens.
- if event_simple == EXPORT_PROGRESS_THREAD: This initiates the progress meter for exporting and uploading files.
- if event_simple == "Clear Cleaned EAD Export Folder": Deletes all files in clean_eads. Outlined in the toolbar menu (File).
- if event_simple == "Clear EAD Export Folder": Deletes all files in source_eads. Outlined in the toolbar menu (File).
- if event_simple == "Clear MARCXML Export Folder": Deletes all files in source_marcs. Outlined in the toolbar menu (File).
- if event_simple == "Clear Container Label Export Folder": Deletes all files in source_labels. Outlined in the toolbar menu (File).
- if event_simple == "Clear PDF Export Folder": Deletes all files in source_pdfs. Outlined in the toolbar menu (File).
- if event_simple == "Reset Defaults": Resets a user's default settings, such as folders, options, and login credentials.
- if event_simple == "Change ASpace Login Credentials": Runs the function get_aspace_log() to set ArchivesSpace login credentials. Outlined in the toolbar menu (Edit).
- if event_simple == 'Change XTF Login Credentials': Runs the function get_xtf_log() to set XTF login credentials. Outlined in the toolbar menu (Edit).
- if event_simple == "About": Displays a popup (really another window) with information about the program version. Users can select the Check Github button, which opens a browser tab to the GitHub version page for the program. Outlined in the toolbar menu (Help).
- if event_simple == "User Manual": Opens a new tab in the user's web browser to the User Manual Wiki page.
- if event_simple == "_UPLOAD_": Clicking the Upload button opens the popup (window) for users to upload to XTF. Links out to xtf_upload.py to upload the files and then index the most recently uploaded files.
- if event_simple == "_DELETE_": Clicking the Delete button opens a popup (window) where users can select files on their XTF server to delete, then runs an index of the deleted files.
- if event_simple == "_INDEX_": This button re-indexes the specified XTF host by using a -index default.
- if event_simple == "_XTF_OPTIONS_" or event_simple == "Change XTF Options": This button opens a popup that lets the user modify XTF upload and indexing options.
- if event_simple in (XTF_INDEX_THREAD, XTF_UPLOAD_THREAD, XTF_DELETE_THREAD, XTF_GETFILES_THREAD): This checks whether the XTF upload and index threads returned an event; if so, the Upload and Index Changed Records buttons are re-enabled. This will NOT perform a clean index; it only indexes changed or new files added to the XTF remote path.
Gets a user's ArchivesSpace credentials.
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
- xtf_checkbox (bool, optional): user input used to decide whether XTF-related features are displayed in the GUI
- as_un (str, optional): user's ArchivesSpace username (default None)
- as_pw (str, optional): user's ArchivesSpace password (default None)
- as_ap (str, optional): the ArchivesSpace API URL (default None)
- as_client (ASnake.client object, optional): the ArchivesSpace ASnake client for accessing and connecting to the API (default None)
- as_repos (dict, optional): contains info on all the repositories for an ArchivesSpace instance, with the name as the key and the id # as its value (default None)
- as_res (dict, optional): contains info on all the resources for an ArchivesSpace instance, with the repository name as the key and a list of resource ids as the value
- xtf_ver (bool, optional): user-indicated value for whether to display XTF features in the GUI (default None)
- as_username (str): user's ArchivesSpace username
- as_password (str): user's ArchivesSpace password
- as_api (str): the ArchivesSpace API URL
- close_program (bool): if a user exits the popup, this will return true and end run_gui()
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- asp_version (str): the current version of ArchivesSpace
- repositories (dict): contains info on all the repositories for an ArchivesSpace instance, with the name as the key and the id # as its value
- resource_ids (dict): contains info on all the resources for each repository in an ArchivesSpace instance, with the repository # as the key and a list of resource #s (as strings) as its value
- xtf_version (bool): user-indicated value for whether to display XTF features in the GUI
This function gets a user's ArchivesSpace credentials. It has three components: the setup code, the correct_creds while loop, and the window_asplog_active while loop. It uses ASnake.client to authenticate and stay connected to ArchivesSpace. Documentation for ASnake can be found here: https://archivesspace-labs.github.io/ArchivesSnake/html/index.html
as_username = as_un
as_password = as_pw
as_api = as_ap
client = as_client
asp_version = None # ArchivesSpace version
repositories = if the optional parameter `as_repos` is not provided (login at program start), set to {"Search Across Repositories (Sys Admin Only)": None}; if it is provided (user changing login credentials within the program), use `as_repos`
xtf_version = True # checkbox for using XTF features set to true
save_button_asp = "" - set to either " Save and Continue " or " Save and Close " depending on whether this is the initial login window or the in-program window. It changes the label of the save button.
window_asplog_active = True # To help break out of the GUI while loop
correct_creds = False # To help break out of the login verification while loop
close_program = False # If a user hits the X button, it passes that on to the run_gui() function which closes the program entirely.
Yes, I know it's weird. This while loop keeps the ArchivesSpace login window open until a user fills in the correct credentials and sets correct_creds to True.
It sets the columns for the layout, the layout, and the window.
The way the loop works is that the program "reads" the Window we made above in the setup code and returns events and values (event_log and values_log). The window takes the inputs provided by the user using as_username = values_log["_ASPACE_UNAME_"] for instance.
The try/except statement tries to authenticate the user with the ArchivesSpace ASnake client, a package for handling API requests to ArchivesSpace. If authentication fails, a popup says the credentials are incorrect. The cycle continues until the credentials are correct or the user exits the login window. If the credentials are correct, the function also retrieves repository and resource info from ASpace to return for exporting.
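The retry behaviour can be sketched without ArchivesSpace or a GUI. Here a sequence of submitted credentials stands in for the login window (None meaning the user closed it), and authorize is a hypothetical callback that raises on bad credentials, standing in for the ASnake client's authorization step. The names login_loop and Creds are invented for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

Creds = Tuple[str, str, str]  # (username, password, api_url)


def login_loop(
    attempts: Iterable[Optional[Creds]],
    authorize: Callable[[Creds], None],
) -> Tuple[Optional[Creds], bool]:
    """Return (credentials, close_program), mirroring the loop described above."""
    for creds in attempts:
        if creds is None:  # user hit the X button
            return None, True
        try:
            authorize(creds)
        except Exception as exc:  # stand-in for the "incorrect credentials" popup
            print(f"Login failed: {exc}")
            continue
        return creds, False  # correct_creds = True, leave the loop
    return None, True  # no more input: treat like closing the window
```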
Gets a user's XTF credentials.
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
- login (bool): determines whether the window is the initial popup or opened within the program. It changes the label of the save button
- xtf_un (object, optional): user's XTF username (default None)
- xtf_pw (object, optional): user's XTF password (default None)
- xtf_ht (object, optional): the host URL for the XTF instance (default None)
- xtf_rp (object, optional): the path (folder) where a user wants their data to be stored on the XTF host (default None)
- xtf_ip (object, optional): the path (file) where the website indexer is located (default None)
- xtf_username (str)
- xtf_password (str)
- xtf_host (str)
- xtf_remote_path (str)
- xtf_indexer_path (str)
- close_program (bool)
This function gets a user's XTF credentials. It has three components: the setup code, the correct_creds while loop, and the window_xtflog_active while loop.
xtf_username = xtf_un # XTF credentials to be set later in the function
xtf_password = xtf_pw # XTF credentials to be set later in the function
xtf_host = xtf_ht
xtf_remote_path = xtf_rp
xtf_indexer_path = xtf_ip
save_button_xtf = if login is True (a user starting the program), assign the string "Save and Continue" as the button label in the GUI. Otherwise, set it to "Save and Exit"
window_xtflog_active = True # To help break out of the GUI while loop
correct_creds = False # To help break from the login verification while loop
close_program = False # If a user hits the X button, it passes that on to the run_gui() function, which closes the program entirely.
This while loop keeps the XTF login window open until a user fills in the correct credentials and sets correct_creds to True.
It sets the columns for the layout, the layout, and the window.
The loop works by "reading" the window created in the setup code, which returns events and values (event_xlog and values_xlog). The window takes the inputs provided by the user, for instance xtf_username = values_xlog["_XTF_UNAME_"].
The try/except statement tries to authenticate the user by checking the attribute self.scp in xtf_upload.py. This attribute is only set if the connection to XTF succeeds, in the function connect_remote(self) in xtf_upload.py. If authentication fails, a popup says the credentials are incorrect. The cycle continues until the credentials are correct or the user exits the login window.
Iterates through the user input and sends them to as_export.py to fetch_results() and export_ead().
- input_ids - (list) a list of user inputs as gathered from the Resource Identifiers input box
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
- cleanup_options - (list) a list of options a user wants to run against an EAD.xml file after export to clean the file
- repositories - (dict) a dictionary of repositories as listed in the ArchivesSpace instance
- client - (ASnake.client object) a client object from ASnake.client used to connect to the ASpace API
- values_simple - (dict) the values entered in the main window, as read in the run_gui() function
- export_all (bool): whether to pass URIs of all published resources to export
This function iterates through the user input in the Resource Identifier text box on the left side of the screen and
sends them to as_export.py to fetch_results() and export_ead().
export_counter (int): counts the number of times a successful export is carried out. This number is printed at the end of the function
if "," in input_ids - checks whether the Resource Identifier box contains commas. If so, the input is split on commas into a list called resources. Otherwise, the resource identifiers are added to the resources list by splitting on newlines with splitlines().
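A minimal sketch of that splitting rule. The function name split_resource_ids is hypothetical, and trimming surrounding whitespace and dropping empty entries are assumptions added here rather than documented behaviour.

```python
from typing import List


def split_resource_ids(input_ids: str) -> List[str]:
    """Split on commas if any are present, otherwise on line breaks."""
    if "," in input_ids:
        parts = input_ids.split(",")
    else:
        parts = input_ids.splitlines()
    # Assumption: trim whitespace and ignore blank entries.
    return [part.strip() for part in parts if part.strip()]
```

Under this rule a mixed input such as "MS 1, MS 2\nMS 3" is split only on commas, so "MS 2\nMS 3" stays a single entry.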
The for loop iterates through each user input in the resources list created above. For each input, it initializes an instance of the ASExport class from as_export.py and runs its fetch_results() method. If an error occurs when fetching results, the instance attribute self.error will not be None, so the if-else statement falls through to the else branch and prints the error in the Output terminal.
If the search returns results that do not exactly match the input in addition to the one that does, the non-matching results are added to self.result and printed.
The single result that matched exactly is passed to the export_ead() method. If any errors are caught, the if-else statement checks whether self.error is not None and prints the error to the Output terminal. If no error is detected, the function checks whether the user chose to run cleanup.py on the exported records and whether the user wants to keep the raw ASpace EAD exports.
gui_window.write_event_value('-EAD_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, which re-enables the export buttons. This comes from PySimpleGUI's write_event_value update from July 2020. The related issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and a demo here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
If the user is exporting all resources, the function does not update the progress bar or write the thread event; get_all_eads() handles both.
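The fetch-then-export flow, with errors reported through an error attribute rather than raised exceptions, can be sketched with a hypothetical FakeASExport class. This is not the real ASExport API from as_export.py: the substring-based matching, the in-memory catalog, and the run_exports helper are invented for illustration.

```python
from typing import Dict, List, Optional


class FakeASExport:
    """Hypothetical stand-in for ASExport (not the real as_export.py API).

    Problems are reported through self.error instead of raised exceptions.
    """

    def __init__(self, input_id: str, catalog: Dict[str, str]) -> None:
        self.input_id = input_id
        self.catalog = catalog  # fake data: identifier -> exported record text
        self.error: Optional[str] = None
        self.result: List[str] = []  # fetched identifiers that are not exact matches
        self.match: Optional[str] = None

    def fetch_results(self) -> None:
        for identifier in self.catalog:
            if identifier == self.input_id:
                self.match = identifier
            elif self.input_id in identifier:  # crude stand-in for a search hit
                self.result.append(identifier)
        if self.match is None:
            self.error = f"No exact match for {self.input_id!r}"

    def export_ead(self) -> None:
        if not self.catalog[self.match]:
            self.error = f"Export of {self.match!r} returned nothing"


def run_exports(resources: List[str], catalog: Dict[str, str]) -> int:
    export_counter = 0
    for input_id in resources:
        exporter = FakeASExport(input_id, catalog)
        exporter.fetch_results()
        if exporter.error is None:
            if exporter.result:
                print("Other results:", ", ".join(exporter.result))
            exporter.export_ead()
            if exporter.error is None:
                export_counter += 1
            else:
                print(exporter.error)
        else:
            print(exporter.error)
    print(f"{export_counter} export(s) completed")
    return export_counter
```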
Iterates through resources set to Publish = True and sends them to get_eads() to fetch and export files.
- input_ids (dict): contains repository ASpace ID as key and all published resource IDs in a list as value.
- defaults (dict): contains the data from the defaults.json file (all settings the user has specified as defaults)
- cleanup_options (list): options a user wants to run against an EAD.xml file after export to clean the file. These include the following: "ADD_EADID", "DEL_NOTES", "CLN_EXTENTS", "ADD_CERTAIN", "ADD_LABEL", "DEL_LANGTRAIL", "DEL_CONTAIN", "ADD_PHYSLOC", "DEL_ATIDS", "DEL_ARCHIDS", "CNT_XLINKS", "DEL_NMSPCS", "DEL_ALLNS"
- repositories (dict): repositories as listed in the ArchivesSpace instance
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- gui_window (PySimpleGUI Object): the GUI window for the app. See PySimpleGUI.org for more info
The function iterates through input_ids, a dictionary with the repository # as the key and a list of resource id strings as the value. For each resource, it checks whether the resource's publish status is set to True and, if so, passes the info to get_eads(). If the resource is not published, it is skipped and the progress meter's counters are decreased accordingly. Finally, it prints the number of exports in the console window.
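A sketch of the publish filter and the progress-counter adjustment. is_published and export_one are hypothetical callbacks standing in for the ArchivesSpace publish-status lookup and for get_eads(); the repository and resource numbers used to exercise it are made up.

```python
from typing import Callable, Dict, List, Tuple


def export_published(
    input_ids: Dict[int, List[str]],
    is_published: Callable[[int, str], bool],
    export_one: Callable[[int, str], None],
) -> Tuple[int, int]:
    """Export published resources; return (exported, progress_total)."""
    # The progress meter starts out sized for every resource ID.
    progress_total = sum(len(ids) for ids in input_ids.values())
    exported = 0
    for repo_id, resource_ids in input_ids.items():
        for resource_id in resource_ids:
            if is_published(repo_id, resource_id):
                export_one(repo_id, resource_id)
                exported += 1
            else:
                progress_total -= 1  # skipped: shrink the meter's target
    print(f"Exported {exported} resource(s)")
    return exported, progress_total
```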
gui_window.write_event_value('-EAD_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, which re-enables the export buttons. This comes from PySimpleGUI's write_event_value update from July 2020. The related issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and a demo here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
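The completion-event pattern can be imitated with the standard library alone. FakeWindow below is a hypothetical stand-in for sg.Window: its write_event_value() puts the event on a thread-safe queue, and read() returns it on the main thread. In PySimpleGUI itself, read() returns (event, values) with the written value stored under values[event]; this sketch returns the value directly to stay short.

```python
import queue
import threading
from typing import Any, Optional, Tuple

EAD_EXPORT_THREAD = "-EAD_THREAD-"
TIMEOUT_EVENT = "__TIMEOUT__"  # local sentinel for this sketch


class FakeWindow:
    """Stand-in for sg.Window's thread-safe event passing."""

    def __init__(self) -> None:
        self._events: "queue.Queue[Tuple[str, Any]]" = queue.Queue()

    def write_event_value(self, key: str, value: Any) -> None:
        self._events.put((key, value))  # safe to call from any thread

    def read(self, timeout: Optional[float] = None) -> Tuple[str, Any]:
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return TIMEOUT_EVENT, None


def export_worker(window: FakeWindow) -> None:
    # ... the actual export work would run here ...
    window.write_event_value(EAD_EXPORT_THREAD, (threading.current_thread().name,))


window = FakeWindow()
worker = threading.Thread(target=export_worker, args=(window,), name="ead-export", daemon=True)
worker.start()
event, value = window.read(timeout=5)
if event == EAD_EXPORT_THREAD:
    print(f"{value[0]} finished; re-enable the export buttons")
worker.join()
```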
Write the options selected to the defaults.json file.
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
This function opens a window in the GUI that allows a user to choose specific export options. These options include:
- Include unpublished components (default is false)
- Include digital objects (default is true)
- Use numbered container levels (default is true)
- Convert to EAD3 (default is false)
- Keep raw ASpace Exports (default is false)
- Set raw ASpace output folder
- Clean EAD records on export (default is true)
- Set clean ASpace output folder
The function writes the selected options to the defaults.json file. Additionally, the user is alerted if Keep raw ASpace Exports and Clean EAD records on export are both set to false, since no file would then be kept on export.
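A sketch of saving two of these options with the json module and raising the "nothing kept" warning. The function name save_ead_options and the key names "ead_export_default", "_KEEP_RAW_" and "_CLEAN_EADS_" are hypothetical and may not match the real defaults.json layout.

```python
import json
from pathlib import Path
from typing import Optional


def save_ead_options(defaults_path: Path, defaults: dict,
                     keep_raw: bool, clean_on_export: bool) -> Optional[str]:
    """Persist two EAD export options; return a warning if no file would be kept.

    The "ead_export_default", "_KEEP_RAW_" and "_CLEAN_EADS_" keys are
    hypothetical, not necessarily the names used in defaults.json.
    """
    options = defaults.setdefault("ead_export_default", {})
    options["_KEEP_RAW_"] = keep_raw
    options["_CLEAN_EADS_"] = clean_on_export
    with open(defaults_path, "w", encoding="utf-8") as f:
        json.dump(defaults, f, indent=4)
    if not keep_raw and not clean_on_export:
        return "Warning: with both options off, no file will be kept on export."
    return None
```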
Write the options selected to the defaults.json file.
- cleanup_defaults - (list) a list of all the default values a user can select for cleaning an EAD.xml file
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
This function opens a window in the GUI that allows a user to choose what operations they want to run in order to clean any exported EAD.xml files. These options include:
- Add Resource ID as EADID - Takes the resource identifier as listed in ArchivesSpace and copies it to the EADID element in the EAD.xml file.
- Delete Empty Notes - Searches for every <p> element in the EAD.xml file and checks whether the element has content. If not, it is deleted.
- Remove (), [], {} from and Empty Extents - Does 2 things: it deletes any empty <extent> elements and removes bracket characters ((), [], {}) from around extent values. For example, <extent>(13.5x2.5")</extent> would change to <extent>13.5x2.5"</extent>.
- Add Certainty Attribute - Adds the attribute certainty="approximate" to all dates that include words such as circa, ca., approximately, etc.
- Add label='Mixed Materials' to containers without label - Adds the attribute label='Mixed Materials' to any container element that does not already have a label attribute.
- Remove trailing . from langmaterial - Removes the ending period from the langmaterial element.
- Delete Empty Containers - Searches an EAD.xml file for all container elements and deletes any that are empty.
- Add Barcode as physloc Tag - Adds a physloc element to an element when a container has a label attribute. It takes the barcode appended to the label and makes it the value of the physloc tag.
- Remove Archivists' Toolkit IDs - Finds any unitid element with a type that includes an Archivists' Toolkit unique identifier and deletes that element.
- Remove Archon IDs - Finds any unitid element with an Archon unique identifier and deletes that element.
- Remove xlink Prefixes from Digital Objects - Counts every attribute that occurs in a <dao> element and removes xlink: prefixes from all attributes.
- Remove Unused Namespaces - Removes any unused namespaces in the EAD.xml file.
- Remove All Namespaces - Replaces other namespaces not removed by clean_unused_ns() in the <ead> element with an empty <ead> element.
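Three of these operations can be sketched with Python's xml.etree.ElementTree. This is an illustration under simplifying assumptions (no XML namespaces, approximate dates held in unitdate elements, a short hand-picked word list), not cleanup.py's implementation; the function name clean_ead is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: plain (un-namespaced) EAD; real exports usually declare a namespace.
APPROXIMATE = re.compile(r"\b(circa|ca\.|approximately)", re.IGNORECASE)
BRACKETS = "()[]{}"


def clean_ead(xml_text: str) -> str:
    root = ET.fromstring(xml_text)

    # Remove (), [], {} from extents and delete extents left empty.
    for parent in list(root.iter()):
        for extent in parent.findall("extent"):
            text = (extent.text or "").strip().strip(BRACKETS).strip()
            if text:
                extent.text = text
            else:
                parent.remove(extent)

    # Add certainty="approximate" to dates that read as approximate.
    for date in root.iter("unitdate"):
        if date.text and APPROXIMATE.search(date.text):
            date.set("certainty", "approximate")

    # Add label="Mixed Materials" to containers without a label attribute.
    for container in root.iter("container"):
        if "label" not in container.attrib:
            container.set("label", "Mixed Materials")

    return ET.tostring(root, encoding="unicode")
```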
The function will write the options selected to the defaults.json file.
Iterates through user input and sends them to as_export.py to fetch_results() and export_marcxml().
- input_ids - (list) a list of user inputs as gathered from the Resource Identifiers input box
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
- repositories - (dict) a dictionary of repositories as listed in the ArchivesSpace instance
- client - (ASnake.client object) a client object from ASnake.client used to connect to the ASpace API
- values_simple - (dict) the values entered in the main window, as read in the run_gui() function
- gui_window (PySimpleGUI Object): the GUI window for the app. See PySimpleGUI.org for more info
- export_all (bool): whether to pass URIs of all published resources to export
This function iterates through the user input in the Resource Identifier text box on the left side of the screen and
sends them to as_export.py to fetch_results() and export_marcxml().
export_counter (int): counts the number of times a successful export is carried out. This number is printed at the end of the function
if "," in input_ids - checks whether the Resource Identifier box contains commas. If so, the input is split on commas into a list called resources. Otherwise, the resource identifiers are added to the resources list by splitting on newlines with splitlines().
The for loop iterates through each user input in the resources list created above. For each input, it initializes an instance of the ASExport class from as_export.py and runs its fetch_results() method. If an error occurs when fetching results, the instance attribute self.error will not be None, so the if-else statement falls through to the else branch and prints the error in the Output terminal.
If the search returns results that do not exactly match the input in addition to the one that does, the non-matching results are added to self.result and printed.
The single result that matched exactly is passed to the export_marcxml() method. If any errors are caught, the if-else statement checks whether self.error is not None and prints the error to the Output terminal.
gui_window.write_event_value('-MARCXML_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, which re-enables the export buttons. This comes from PySimpleGUI's write_event_value update from July 2020. The related issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and a demo here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
If the user is exporting all resources, the function does not update the progress bar or write the thread event; get_all_marcxml() handles both.
Iterates through resources set to Publish = True and sends them to get_marcxml() to fetch and export files.
- input_ids (dict): contains repository ASpace ID as key and all published resource IDs in a list as value.
- defaults (dict): contains the data from the defaults.json file (all settings the user has specified as defaults)
- repositories (dict): repositories as listed in the ArchivesSpace instance
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- gui_window (PySimpleGUI Object): the GUI window for the app. See PySimpleGUI.org for more info
The function iterates through input_ids, a dictionary with the repository # as the key and a list of resource id strings as the value. For each resource, it checks whether the resource's publish status is set to True and, if so, passes the info to get_marcxml(). If the resource is not published, it is skipped and the progress meter's counters are decreased accordingly. Finally, it prints the number of exports in the console window.
gui_window.write_event_value('-MARCXML_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, which re-enables the export buttons. This comes from PySimpleGUI's write_event_value update from July 2020. The related issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and a demo here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Write the options selected to the defaults.json file.
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
This function opens a window in the GUI that allows a user to choose specific export options. These options include:
- Include unpublished components (default is false)
- Open output folder on export (default is false)
- Set output folder
The function will write the options selected to the defaults.json file.
Iterates through the user input and sends them to as_export.py to fetch_results() and export_pdf().
- input_ids - (list) a list of user inputs as gathered from the Resource Identifiers input box
- defaults - (dict) a dictionary containing the data from the defaults.json file (all settings the user has specified as defaults)
- repositories - (dict) a dictionary of repositories as listed in the ArchivesSpace instance
- client - (ASnake.client object) a client object from ASnake.client used to connect to the ASpace API
- values_simple - (dict) the values entered in the main window, as read in the run_gui() function
- gui_window (PySimpleGUI Object): the GUI window for the app. See PySimpleGUI.org for more info
- export_all (bool): whether to pass URIs of all published resources to export
This function iterates through the user input in the Resource Identifier text box on the left side of the screen and
sends them to as_export.py to fetch_results() and export_pdf().
export_counter (int): counts the number of times a successful export is carried out. This number is printed at the end of the function
if "," in input_ids - checks whether the Resource Identifier box contains commas. If so, the input is split on commas into a list called resources. Otherwise, the resource identifiers are added to the resources list by splitting on newlines with splitlines().
The for loop iterates through each user input in the resources list created above.
For each input, it initializes an instance of the ASExport class from as_export.py and runs fetch_results().
If an error occurs while fetching results, the instance attribute self.error is no longer None. The if-else statement then falls through to the else branch, which prints the error in the Output Terminal.
If the search returns results that do not exactly match the input alongside one that does, the non-matching results are stored in self.result and printed.
The single matching result is then passed to the class method export_pdf(). If any errors are caught, the if-else statement checks whether self.error is not None and prints the error to the Output Terminal.
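The control flow of that loop, including the export counter, can be sketched with a stub class. `StubExport` is hypothetical. It only mimics the `error`/`result` attributes and the method names described for ASExport, and it does not call ArchivesSpace. The function below is a sketch of get_pdfs(), not the program's implementation.

```python
class StubExport:
    """Hypothetical stand-in for as_export.ASExport, for illustration only."""

    def __init__(self, input_id, known_ids):
        self.input_id = input_id
        self.known_ids = known_ids
        self.error = None   # set to a message when something goes wrong
        self.result = None  # set when non-exact matches were also fetched

    def fetch_results(self):
        if self.input_id not in self.known_ids:
            self.error = f"No results found for {self.input_id}"

    def export_pdf(self):
        return f"source_pdfs/{self.input_id}.pdf", f"Exported {self.input_id}"


def get_pdfs(resources, known_ids, log=print):
    export_counter = 0
    for input_id in resources:
        resource_export = StubExport(input_id, known_ids)
        resource_export.fetch_results()
        if resource_export.error is None:
            if resource_export.result is not None:
                log(resource_export.result)  # report the non-exact matches
            _filepath, message = resource_export.export_pdf()
            if resource_export.error is None:
                log(message)
                export_counter += 1
            else:
                log(resource_export.error)
        else:
            log(resource_export.error)
    log(f"{export_counter} record(s) exported")
    return export_counter
```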
gui_window.write_event_value('-PDF_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
If the user is exporting all resources, the function does not update the progress bar or send the thread-completion event. get_all_pdfs() handles both instead.
Iterates through resources set to Publish = True and sends them to get_pdfs() to fetch and export files.
- input_ids (dict): contains repository ASpace ID as key and all published resource IDs in a list as value.
- defaults (dict): contains the data from defaults.json file, all data the user has specified as default
- repositories (dict): repositories as listed in the ArchivesSpace instance
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- gui_window (PySimpleGUI Object): is the GUI window for the app. See PySimpleGUI.org for more info
The function iterates through input_ids, whose keys are repository numbers and whose values are lists of resource ID strings. For each resource, it checks whether the resource's publish status is set to True. If so, it passes the resource to get_pdfs(). If the resource is not published, it is skipped and the progress meter counters are decremented. Finally, it prints the number of exports in the console window.
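Here is a sketch of the publish filter and the progress-meter adjustment. It assumes hypothetical callables `is_published()` and `export_one()` in place of the real ASpace lookup and the get_pdfs() call.

```python
def get_all_pdfs(input_ids, is_published, export_one):
    """Export every published resource and skip unpublished ones.

    input_ids maps a repository number to a list of resource ID strings.
    is_published(repo_id, resource_id) and export_one(repo_id, resource_id)
    are hypothetical callables used for illustration.
    """
    total = sum(len(ids) for ids in input_ids.values())  # progress-meter maximum
    exported = 0
    for repo_id, resource_ids in input_ids.items():
        for resource_id in resource_ids:
            if not is_published(repo_id, resource_id):
                total -= 1  # shrink the progress target for each skipped record
                continue
            export_one(repo_id, resource_id)
            exported += 1
    print(f"{exported} of {total} published resources exported")
    return exported, total
```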
gui_window.write_event_value('-PDF_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Write the options selected to the defaults.json file.
- defaults - (dict) a dictionary containing the data from defaults.json file, all data the user has specified as default
This function opens a window in the GUI that allows a user to choose specific export options. These options include:
- Include unpublished components (default is false)
- Include digital objects (default is true)
- Use numbered container levels (default is true)
- Convert to EAD3 (default is false)
- Open ASpace Exports on Export (default is false)
- Set output folder
The function will write the options selected to the defaults.json file.
Iterates through the user input and sends them to as_export.py to fetch_results() and export_labels().
- input_ids - (list) a list of user inputs as gathered from the Resource Identifiers input box
- defaults - (dict) a dictionary containing the data from defaults.json file, all data the user has specified as default
- repositories - (dict) a dictionary of repositories as listed in the ArchivesSpace instance
- client - (ASnake.client object) a client object from ASnake.client used to connect to the ASpace API
- values_simple - (list) a list of values as entered with the run_gui() function
- gui_window (PySimpleGUI Object): the GUI window for the app. See PySimpleGUI.org for more info
- export_all (bool): whether to pass URIs of all published resources to export
This function iterates through the user input in the Resource Identifier text box on the left side of the screen and
sends them to as_export.py to fetch_results() and export_labels().
export_counter (int): counts the number of times a successful export is carried out. This number is printed at the end of the function
if "," in input_ids - checks for commas in the Resource Identifier box. If any are found, the input is split on commas and the pieces are added to a list called resources. Otherwise, the resource identifiers are split on newlines with splitlines() and added to the resources list.
gui_window.write_event_value('-CONTLABEL_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
The for loop iterates through each user input in the resources list created above.
For each input, it initializes an instance of the ASExport class from as_export.py and runs fetch_results().
If an error occurs while fetching results, the instance attribute self.error is no longer None. The if-else statement then falls through to the else branch, which prints the error in the Output Terminal.
If the search returns results that do not exactly match the input alongside one that does, the non-matching results are stored in self.result and printed.
The single matching result is then passed to the class method export_labels(). If any errors are caught, the if-else statement checks whether self.error is not None and prints the error to the Output Terminal.
If the user is exporting all resources, the function does not update the progress bar or send the thread-completion event. get_all_contlabels() handles both instead.
Iterates through resources set to Publish = True and sends them to get_contlabels() to fetch and export files.
- input_ids (dict): contains repository ASpace ID as key and all published resource IDs in a list as value.
- defaults (dict): contains the data from defaults.json file, all data the user has specified as default
- repositories (dict): repositories as listed in the ArchivesSpace instance
- client (ASnake.client object): the ArchivesSpace ASnake client for accessing and connecting to the API
- gui_window (PySimpleGUI Object): is the GUI window for the app. See PySimpleGUI.org for more info
The function iterates through input_ids, whose keys are repository numbers and whose values are lists of resource ID strings. For each resource, it checks whether the resource's publish status is set to True. If so, it passes the resource to get_contlabels(). If the resource is not published, it is skipped and the progress meter counters are decremented. Finally, it prints the number of exports in the console window.
gui_window.write_event_value('-CONTLABEL_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Uploads files to XTF.
- defaults (dict): a dictionary containing the data from defaults.json file, all data the user has specified as default
- xtf_hostname (str): the host URL for the XTF instance
- xtf_username (str): user's XTF username
- xtf_password (str): user's XTF password
- xtf_remote_path (str): the path (folder) where a user wants their data to be stored on the XTF host
- values_upl (dict?): the GUI values a user chose when selecting files to upload to XTF
- gui_window (PySimpleGUI object): the GUI window used by PySimpleGUI. Used to return an event
- Initialize a RemoteClient class in xtf_upload.py
- Fetch the local files locations as a list
- Upload files using xtf_upload.py's bulk_upload() function
- If the user selected a re-index: use xtf_upload.py's execute_commands() function to re-index updated files only (not a clean re-index).
- gui_window.write_event_value('-XTFUP_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload, Delete, and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Deletes files from XTF.
- defaults (dict): contains the data from defaults.json file, all data the user has specified as default
- xtf_hostname (str): the host URL for the XTF instance
- xtf_username (str): user's XTF username
- xtf_password (str): user's XTF password
- xtf_remote_path (str): the path (folder) where a user wants their data to be stored on the XTF host
- xtf_index_path (str): the path (file) where the textIndexer for XTF is - used to run the index
- values_del (dict): the GUI values a user chose when selecting files to delete from XTF
- gui_window (PySimpleGUI object): the GUI window used by PySimpleGUI. Used to return an event
- Initialize a RemoteClient class in xtf_upload.py
- Fetch the local files locations as a list
- Perform an rm (remove) command using xtf_upload.py's execute_commands() function for each individual file in the list
- If the user selected a re-index: use xtf_upload.py's execute_commands() function to re-index updated files only (not a clean re-index).
- gui_window.write_event_value('-XTFDEL_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload, Delete, and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Runs a re-index of all changed or new files in XTF. It is not a clean re-index.
- defaults (dict): contains the data from defaults.json file, all data the user has specified as default
- xtf_hostname (str): the host URL for the XTF instance
- xtf_username (str): user's XTF username
- xtf_password (str): user's XTF password
- xtf_remote_path (str): the path (folder) where a user wants their data to be stored on the XTF host
- xtf_index_path (str): the path (file) where the textIndexer for XTF is - used to run the index
- gui_window (PySimpleGUI object): the GUI window used by PySimpleGUI. Used to return an event
- Initialize a RemoteClient class in xtf_upload.py
- Use xtf_upload.py's execute_commands() function to run a re-indexing on updated files only, not a clean re-index.
- gui_window.write_event_value('-XTFIND_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload, Delete, and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Retrieves list of files from XTF remote path as entered upon XTF login screen.
- defaults (dict): a dictionary containing the data from defaults.json file, all data the user has specified as default
- xtf_hostname (str): the host URL for the XTF instance
- xtf_username (str): user's XTF username
- xtf_password (str): user's XTF password
- xtf_remote_path (str): the path (folder) where a user wants their data to be stored on the XTF host
- xtf_index_path (str): the path (file) where the textIndexer for XTF is - used to run the index
- gui_window (PySimpleGUI object): the GUI window used by PySimpleGUI. Used to return an event
- remote_files (list): a sorted list of files retrieved from XTF remote path
- Initialize a RemoteClient class in xtf_upload.py
- Use xtf_upload.py's execute_commands() function to run an ls command in the XTF remote path and run the sort_list() function to generate a list in human readable form.
- gui_window.write_event_value('-XTFGET_THREAD-', (threading.current_thread().name,)) - Returns an event and value to the main GUI thread, re-enabling the Upload, Delete, and Index Changed Records buttons. This comes from PySimpleGUI's write_event_value update from July 2020. An issue can be found here: https://github.com/PySimpleGUI/PySimpleGUI/issues/3641 and demo found here: https://github.com/PySimpleGUI/PySimpleGUI/blob/master/DemoPrograms/Demo_Multithreaded_Write_Event_Value.py
Set options for uploading and re-indexing records to XTF.
- defaults - (dict) a dictionary containing the data from defaults.json file, all data the user has specified as default
This function allows a user to select what options they want when uploading and re-indexing records in XTF. These options include:
- Re-index changed records upon upload (default is true)
- Select source folder
- Change XTF Login Credentials - button that opens the XTF popup window
The function will write the options selected to the defaults.json file.
Takes a filepath and opens the folder according to Windows, Mac, or Linux.
- filepath - (str) a filepath as input
This function takes a filepath and opens that folder using the method appropriate to the user's operating system (Windows, Mac, or Linux).
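A common way to do this is sketched below. The exact commands are an assumption: `os.startfile()` on Windows, `open` on macOS, and `xdg-open` on Linux desktops. `folder_open_command()` is a hypothetical helper, split out so the platform choice can be checked without launching anything.

```python
import os
import platform
import subprocess


def folder_open_command(filepath, system=None):
    """Return the command that opens a folder, or None on Windows (uses os.startfile)."""
    system = system or platform.system()
    if system == "Windows":
        return None
    if system == "Darwin":  # macOS
        return ["open", filepath]
    return ["xdg-open", filepath]  # most Linux desktops


def open_file(filepath):
    command = folder_open_command(filepath)
    if command is None:
        os.startfile(filepath)  # only exists on Windows
    else:
        subprocess.Popen(command)
```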
Upload local files to remote host.
- local_file_dir - (str) the local file directory path used to determine what file to use for uploading to XTF
- select_files - a list of files to be uploaded to XTF
- (list): contains filepaths for files to be uploaded to XTF
This function creates a list of the files to be uploaded to XTF, as written by Todd Birchard in his article 'SSH & SCP in Python with Paramiko', https://hackersandslackers.com/automate-ssh-scp-python-paramiko/
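Here is a minimal sketch of listing local files with `os.walk()`, in the spirit of Birchard's helper. The `select_files` filter (matching by file name) and the recursion into subfolders are assumptions.

```python
import os


def fetch_local_files(local_file_dir, select_files=None):
    """Return sorted file paths under local_file_dir, optionally limited to select_files."""
    filepaths = []
    for root, _dirs, files in os.walk(local_file_dir):
        for name in files:
            if select_files is None or name in select_files:
                filepaths.append(os.path.join(root, name))
    return sorted(filepaths)
```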
Checks for the required directories in the directory where the GUI or .exe is located and tries to open defaults.json.
- json_data (dict): contains data from defaults.json for user's default settings
The function first gets the current working directory, then performs an os.walk(), iterating through the root, directories, and files looking for the following directories: clean_eads, source_eads, source_marcs, source_pdfs, source_labels. If any of the above are not found, the function create_default_folders() from setup.py is run.
It also attempts to open and load the data from defaults.json. If unable to, it will create a new defaults.json file with default values as assigned by the set_defaults_file() function from setup.py.
Sorts a list in human readable order. Source: https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
- input_list (list): a list to be sorted
- A list sorted in human readable order
This function is taken from this source code: https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/. See the source code for more info.
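The natural sort described in that article can be written as follows. This is a sketch; the program's own sort_list() may differ in details such as case handling.

```python
import re


def sort_list(input_list):
    """Sort strings in human ("natural") order, so "file2" comes before "file10"."""

    def convert(text):
        # Digit runs compare as numbers; everything else compares case-insensitively.
        return int(text) if text.isdecimal() else text.lower()

    def alphanum_key(key):
        # A capture group keeps the digit runs: "file10.xml" -> ["file", "10", ".xml"]
        return [convert(chunk) for chunk in re.split(r"(\d+)", key)]

    return sorted(input_list, key=alphanum_key)
```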
Starts a thread and disables buttons to prevent multiple requests/threads.
- function (function): the function to pass to the thread
- args (tuple): the arguments to pass to the function. Include a trailing comma so that a single argument is still a tuple, e.g. (arg,) or (arg, arg, arg,)
- gui_window (PySimpleGUI object): the GUI window used by PySimpleGUI. Used to return an event
This function creates a thread, separate from the GUI thread, using the threading library's Thread class. It starts the thread and then disables all the Export buttons in the GUI window across export options. It does not disable the XTF options.
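Here is a minimal sketch of that hand-off. It assumes hypothetical button keys in `EXPORT_BUTTONS`. `window[key].update(disabled=True)` is PySimpleGUI's usual way to disable an element. Marking the thread as a daemon is an added choice, so that the thread does not keep the program alive on exit.

```python
import threading

# Hypothetical element keys; the real GUI's button keys may differ.
EXPORT_BUTTONS = ["_EXPORT_EAD_", "_EXPORT_MARCXML_", "_EXPORT_PDF_", "_EXPORT_LABEL_"]


def start_thread(function, args, gui_window):
    """Run function(*args) in a background thread and disable the export buttons."""
    thread = threading.Thread(target=function, args=args, daemon=True)
    thread.start()
    for key in EXPORT_BUTTONS:
        gui_window[key].update(disabled=True)  # XTF buttons are left enabled
    return thread
```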
Checks the create time of logs and deletes those after 1 month
None
The function checks if there is a "logs" directory in the current directory, then checks each file within the logs directory and if any files are 1 month old or older, they are deleted.
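Here is a sketch of the log clean-up, approximating "1 month" as 30 days (an assumption). The optional `now` argument is hypothetical and exists only so the behavior can be tested.

```python
import os
import time

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # "1 month" approximated as 30 days


def delete_log_files(logs_dir="logs", now=None):
    """Delete files in logs_dir whose ctime is at least ~1 month old.

    os.path.getctime() is the creation time on Windows but the last
    metadata-change time on Unix, so "age" differs slightly by platform.
    """
    if not os.path.isdir(logs_dir):
        return []
    now = time.time() if now is None else now
    deleted = []
    for name in os.listdir(logs_dir):
        path = os.path.join(logs_dir, name)
        if os.path.isfile(path) and now - os.path.getctime(path) >= MAX_AGE_SECONDS:
            os.remove(path)
            deleted.append(name)
    return deleted
```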
Will run the run_gui() function with the json_data returned from setup_files().
This script searches the ArchivesSpace database for a user-input resource identifier and can export EAD.xml, MARCXML, Container Labels (.tsv), and PDF files of the resource if found.
Interacts with the ASpace API to search for and retrieve records.
- input_id (str): a string value the user generated from the Resource Identifier box in the GUI
- repo_id (int): an integer that contains the number for which a repository is assigned via the ArchivesSpace instance
- client (ASnake.client object): a client object from ASnake.client used to connect to the ASpace API
- output_dir (str): a string of a filepath containing the folder a user wants files to be exported to
- self.input_id (str): a string value the user generated
- self.filename (str): the name assigned to the exported file, takes input_id and removes any "/"s
- self.repo_id (int): an integer that contains the number for which a repository is assigned via the ArchivesSpace instance
- self.resource_id (int): an integer that is the ArchivesSpace's assigned resource identifier found in the resource URI
- self.resource_repo (int): an integer that is the ArchivesSpace-assigned repository identifier, also found in a resource's URI
- self.client (ASnake.client object): a client object from ASnake.client used to connect to the ASpace API
- self.error (str): a string whose value is None unless an error occurs, in which case it is populated with a string detailing the error
- self.result (str): a string whose value is None unless an operation completes or multiple results are returned, in which case it is populated with a string detailing the result(s)
- self.filepath (str): a string that is the filepath where records will be exported to
- self.output_directory (str): location of the output directory for the file
- self.export_all (bool): whether exporting all records for a repository
This method searches ArchivesSpace for a resource that matches the self.input_id. If it matches with a resource, it then extracts the resource identifier and the resource repository number from the URI and assigns them to self.resource_id and self.resource_repo. If an error occurs, self.error will be populated with a string containing information about the error.
- If self.export_all is False, remove all non-alphanumeric characters from the input using id_combined_regex.sub()
- Check whether self.repo_id is None. If not, search within that specific repository using the API endpoint '/repositories/{}/search'. If it is None, search across repositories using the endpoint '/search'. This distinction exists because only system administrators can search across repositories. All other users must have the right permissions set within their repository to use the program.
  - Method - get_paged(). More documentation on this can be found in the ASnake readme: https://github.com/archivesspace-labs/ArchivesSnake#low-level-api
  - Parameters:
    - "q": 'four_part_id:' + input_id - q is the optional search query (str). We include "four_part_id:" because it narrows the search to the complete resource identifier, then add the input_id as a whole.
    - "type": ['resource'] - specifies that we are looking for resources only
- Go through everything returned and append it to the list search_results. This is needed because ASnake's get_paged() function yields the JSON results one at a time rather than returning them all at once.
- If self.export_all is True, generate a list containing a single dictionary (to match the expected format) that matches the input_id passed.
- If there are no results in search_results, generate an error in self.error. Else:
  - Begin counting the results and set match_results and non_match_results as dictionaries with URIs as keys and resource titles as values.
  - for key, value in json_info.items() - the goal here is to match the resource identifier as listed in ArchivesSpace with the user input (self.input_id). If the identifier has more than 1 part, we use regex to check all of the fields in the JSON record for id_0, id_1, etc. We grab the values found in those fields and combine them into a string called combined_aspace_id, then again remove all non-alphanumeric characters. If the user input and the combined ArchivesSpace identifier match, get the URI for the resource and split it on "/". The last number becomes the value of resource_uri, and the 3rd part (index [2]) becomes the resource_repo.
  - Any results that fail this check are put in the non_match_results dictionary.
  - If there are results in non_match_results but none in match_results, return a value in self.error.
  - If there are results in both non_match_results and match_results, return the non-matched results in self.result.
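The identifier comparison and the URI parsing can be sketched as below. The regex patterns are assumptions, since the actual `id_combined_regex` pattern is not shown here. Each search result is assumed to be already decoded into a dict with `uri`, `title`, and id_0 through id_3 keys.

```python
import re

# Assumed patterns: strip anything that is not a letter or digit; find id_N fields.
id_combined_regex = re.compile(r"[\W_]+")
id_field_regex = re.compile(r"^id_\d$")


def combine_aspace_id(json_info):
    """Join id_0, id_1, ... in order and strip non-alphanumeric characters."""
    parts = [value for key, value in sorted(json_info.items()) if id_field_regex.match(key)]
    return id_combined_regex.sub("", "".join(parts))


def split_matches(input_id, search_results):
    """Sort search results into exact matches and non-matches, keyed by URI."""
    target = id_combined_regex.sub("", input_id)
    match_results, non_match_results = {}, {}
    for json_info in search_results:
        if combine_aspace_id(json_info) == target:
            match_results[json_info["uri"]] = json_info["title"]
        else:
            non_match_results[json_info["uri"]] = json_info["title"]
    return match_results, non_match_results


def parse_resource_uri(uri):
    # "/repositories/2/resources/123".split("/") -> ["", "repositories", "2", "resources", "123"]
    parts = uri.split("/")
    return int(parts[-1]), int(parts[2])  # (resource id, repository id)
```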
Handles exporting EAD.xml files from ArchivesSpace.
- include_unpublished=False - (bool) Optional parameter
- include_daos=True - (bool) Optional parameter
- numbered_cs=True - (bool) Optional parameter
- ead3=False - (bool) Optional parameter
This function handles exporting EAD.xml files from ArchivesSpace. The steps for this function are as follows:
- Within a try, except statement, try to run a client.get (similar to request.get) to make a call to the ArchivesSpace API to export an EAD.xml file of a specific resource record.
- In the ArchivesSpace API, we use the endpoint 'repositories/{}/resource_descriptions/{}.xml' as documented here: https://archivesspace.github.io/archivesspace/api/#get-an-ead-representation-of-a-resource
- Parameters:
  - 'include_unpublished': False - doesn't include unpublished portions of the resource
  - 'include_daos': True - include digital objects
  - 'numbered_cs': True - include numbered container levels
  - 'print_pdf': False - do not export the resource as a PDF
  - 'ead3': False - do not use the EAD3 schema; instead default to EAD2002
- if request_ead.status_code == 200: - check if the request was successful, then assign .xml to the end of self.filepath
- with open(filepath, "wb") as local_file: - write the file's content to the source_eads folder and return the filepath and result
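Here is a minimal sketch of the EAD request and write. It assumes `client` behaves like an ASnake client whose `get()` passes keyword arguments such as `params` through to requests. The error strings are illustrative, not the program's messages.

```python
def export_ead(client, repo_id, resource_id, filepath, include_unpublished=False,
               include_daos=True, numbered_cs=True, ead3=False):
    """Request an EAD export and write it to filepath + ".xml"."""
    params = {
        "include_unpublished": include_unpublished,
        "include_daos": include_daos,
        "numbered_cs": numbered_cs,
        "print_pdf": False,  # EAD, not PDF
        "ead3": ead3,
    }
    try:
        request_ead = client.get(
            f"repositories/{repo_id}/resource_descriptions/{resource_id}.xml", params=params)
    except Exception as exc:  # e.g. connection problems
        return None, f"Export failed: {exc}"
    if request_ead.status_code != 200:
        return None, f"Export failed with HTTP status {request_ead.status_code}"
    filepath = f"{filepath}.xml"
    with open(filepath, "wb") as local_file:  # response content is bytes
        local_file.write(request_ead.content)
    return filepath, f"Exported {filepath}"
```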
Handles exporting MARCXML files from ArchivesSpace.
- include_unpublished=False - (bool) Optional parameter
This function handles exporting MARCXML files from ArchivesSpace. The steps for this function are as follows:
- Run a client.get (similar to request.get) to make a call to the ArchivesSpace API to
export a MARC .xml file of a specific resource record.
- In the ArchivesSpace API, we use the endpoint '/repositories/{}/resources/marc21/{}.xml' as documented here: https://archivesspace.github.io/archivesspace/api/#get-a-marc-21-representation-of-a-resource
- Parameters:
  - 'include_unpublished': False - doesn't include unpublished portions of the resource
- if request_ead.status_code == 200: - check if the request was successful, then assign .xml to the end of self.filepath
- with open(filepath, "wb") as local_file: - write the file's content to the source_marcs folder and return the filepath and result
Handles exporting PDF files from ArchivesSpace.
- include_unpublished=False - (bool) Optional parameter
- include_daos=True - (bool) Optional parameter
- numbered_cs=True - (bool) Optional parameter
- ead3=False - (bool) Optional parameter
This function handles exporting PDF files from ArchivesSpace. NOTE: This requires ArchivesSpace 2.8.0 or later; earlier versions will not export properly. The steps for this function are as follows:
- Run a client.get (similar to request.get) to make a call to the ArchivesSpace API to
export a .pdf file of a specific resource record.
- In the ArchivesSpace API, we use the endpoint 'repositories/{}/resource_descriptions/{}.pdf' as documented here: https://archivesspace.github.io/archivesspace/api/#get-an-ead-representation-of-a-resource
- Parameters:
  - 'include_unpublished': False - doesn't include unpublished portions of the resource
  - 'include_daos': True - include digital objects
  - 'numbered_cs': True - include numbered container levels
  - 'print_pdf': True - export the resource as a PDF
  - 'ead3': False - do not use the EAD3 schema; instead default to EAD2002
- if request_ead.status_code == 200: - check if the request was successful, then assign .pdf to the end of self.filepath
- with open(filepath, "wb") as local_file: - write the file's content to the source_pdfs folder and return the filepath and result
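Because PDF export depends on the ArchivesSpace version, a caller could guard it with a version check. This is a hypothetical helper. It assumes version strings like `v2.8.0`, such as the asp_version value the GUI retrieves.

```python
import re


def supports_pdf_export(asp_version):
    """Return True if an ArchivesSpace version string is 2.8.0 or later.

    Only the first three numbers are compared; missing parts count as 0.
    """
    numbers = [int(n) for n in re.findall(r"\d+", asp_version)[:3]]
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers) >= (2, 8, 0)
```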
This function handles exporting container label files from ArchivesSpace.
- Run a client.get (similar to request.get) to make a call to the ArchivesSpace API to
export a .tsv file of a specific resource record.
- In the ArchivesSpace API, we use the endpoint 'repositories/{}/resource_labels/{}.tsv' as documented here: https://archivesspace.github.io/archivesspace/api/#get-a-tsv-list-of-printable-labels-for-a-resource
- if request_ead.status_code == 200: - check if the request was successful, then assign .tsv to the end of self.filepath
- with open(filepath, "wb") as local_file: - write the file's content to the source_labels folder and return the filepath and result
This script runs a series of xml and string cleanup operations on an EAD.xml file. The
function cleanup_eads iterates through all the files in the folder source_eads
and creates a class instance of EADRecord, running a host of class methods on the
EAD.xml file and returns it to the function, which writes the new file to the folder
clean_eads.
Modify an EAD.xml file for web display and EAD2002 compatibility.
- file_root - (lxml.Element object) an lxml.element root to be edited by different methods in the class
- self.root - (lxml.Element object) an lxml.element root to be edited by different methods in the class
- self.results - (str) a string that is filled with result information when methods are performed
- self.eadid - (str) a string that contains the EADID for a record
- self.daos - (bool) a boolean that determines whether there are digital objects in a record
This class hosts 14 different methods, each designed to modify an EAD.xml file according to guidelines set by the Hargrett and Russell Libraries for display on their XTF finding aid websites, as well as other measures to match some of Harvard's EAD schematron checks. This produces an EAD.xml file that is EAD2002 compliant.
The following methods will be described briefly.
- add_eadid() - Takes the resource identifier as listed in ArchivesSpace and copies it to the <eadid> element in the EAD.xml file.
- delete_empty_notes() - Searches for every <p> element in the EAD.xml file and checks whether it has content. If not, it is deleted.
- edit_extents() - Does 2 things: it deletes any empty <extent> elements and removes non-alphanumeric characters from the beginning of extent elements. For example, <extent>(13.5x2.5")</extent> would change to <extent>13.5x2.5"</extent>.
- add_certainty_attr() - Adds the attribute certainty="approximate" to all dates that include words such as circa, ca., approximately, etc.
- add_label_attr() - Adds the attribute label="Mixed Materials" to any container element that does not already have a label attribute.
- strip_langmaterial() - Removes the ending period on the <langmaterial> element.
- delete_empty_containers() - Searches an EAD.xml file for all container elements and deletes any that are empty.
- update_barcode() - Adds a physloc element to an element when a container has a label attribute. It takes a barcode appended to the label and makes it the value of the physloc tag.
- remove_at_leftovers() - Finds any unitid element with a type that includes an Archivists' Toolkit unique identifier and deletes that element.
- remove_archon_ids() - Finds any unitid element with an Archon unique identifier and deletes that element.
- count_xlinks() - Counts every attribute that occurs in a <dao> element and removes xlink: prefixes in all attributes.
- clean_unused_ns() - Removes any unused namespaces in the EAD.xml file.
- clean_do_dec() - Replaces other namespaces not removed by clean_unused_ns() in the <ead> element with an empty <ead> element.
- clean_suite() - Runs the above methods according to what the user specified in the custom_clean parameter. Also adds a doctype to the EAD.xml file and encodes it in UTF-8.
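As an illustration of one of these methods, here is a sketch of the add_certainty_attr() idea. It uses the standard library's `xml.etree.ElementTree` rather than lxml; the `iter()` and `set()` calls used here exist in both. It assumes un-namespaced `<unitdate>` elements. A namespaced EAD2002 file would need the tag `{urn:isbn:1-931666-22-9}unitdate`, and the word list is an assumption based on the examples above.

```python
import re
import xml.etree.ElementTree as ET

# Words treated as signalling an approximate date (assumed list).
APPROX_WORDS = re.compile(r"\b(circa|ca\.|approximately)", re.IGNORECASE)


def add_certainty_attr(root):
    """Add certainty="approximate" to <unitdate> elements whose text signals an estimate."""
    changed = 0
    for date in root.iter("unitdate"):
        if date.text and APPROX_WORDS.search(date.text) and "certainty" not in date.attrib:
            date.set("certainty", "approximate")
            changed += 1
    return changed
```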
This function iterates through all the files located in source_eads, uses the lxml package to parse the file into an lxml element, passes the lxml element through the EADRecord class, runs the clean suite against the lxml element, then writes it to a file in our clean_eads folder.
To learn more about the lxml package, see the documentation: https://lxml.de/
- filepath - (str) a string of the filepath of the EAD record to be cleaned
- custom_clean - (list) a list of strings as passed from as_xtf_GUI.py that determines which methods will be run against the lxml element when running the clean_suite() method. The user can specify what they want cleaned in as_xtf_GUI.py, so this is how those specifications are passed.
- output_dir [Optional] - (str) a string of the filepath where the EAD record should be sent after cleaning, as specified by the user ("clean_eads" is default)
- keep_raw_exports [Optional] - (bool) if a user in as_xtf_GUI.py specifies to keep the exports that come from as_export.py, this parameter will prevent the function from deleting those files in source_eads.
- (bool) - if True, the XML was valid. If False, the XML was not valid.
- results - (str) a string that contains the results from the export process
- filename (str): set as the filename for the record
- fileparent (str): set to the filepath of the parent directory of the file
- valid_err (str): set to any XML validation errors encountered while parsing
- Create an XML parser using lxml to remove redundant namespaces
- Generate an lxml-compatible tree for the record using etree.parse, passing the filepath and the custom parser made above.
  - If this fails, an exception is raised and the function returns False and the validation error message.
- Get the root of the tree using tree.getroot(), an lxml function
- Create an EADRecord instance, passing the root to the class EADRecord
- Run the method clean_suite(), passing the parameters for the EADRecord instance and custom_clean, the list of methods used to clean the EAD record
- clean_ead_file_root is the path where the cleaned EAD record is delivered, built from the output directory and the filepath as file
- Open and write the EAD record to clean_ead_file_root. This writes a bytes object (the 'b' in 'wb'), because the serialized lxml output is bytes.
- Iterate through the files in the source folder and check whether any of them are older than 2 months. If so, delete them.
- Check whether the user indicated in the GUI that they want to keep the raw ArchivesSpace EAD export records. If not, delete all the records in the user-specified source folder (source_eads by default).
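The parse, validate, and write steps can be sketched with `xml.etree.ElementTree` standing in for lxml, so no custom namespace-cleaning parser is involved. The cleaning itself is left as a placeholder comment, and the function name clean_ead_file() is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def clean_ead_file(filepath, output_dir="clean_eads"):
    """Parse an EAD file, then write the (cleaned) tree to output_dir.

    Returns (True, message) on success or (False, error message) if parsing fails.
    """
    try:
        tree = ET.parse(filepath)
    except ET.ParseError as valid_err:
        return False, f"XML validation error in {filepath}: {valid_err}"
    root = tree.getroot()
    # ... run the clean-up methods (EADRecord.clean_suite() in the original) on root ...
    os.makedirs(output_dir, exist_ok=True)
    clean_ead_file_root = os.path.join(output_dir, os.path.basename(filepath))
    with open(clean_ead_file_root, "wb") as clean_file:
        # tostring() returns bytes when given an encoding such as "UTF-8".
        clean_file.write(ET.tostring(root, encoding="UTF-8"))
    return True, f"Cleaned {os.path.basename(filepath)}"
```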
This script creates the default folders for exporting records as well as the defaults.json file. The folders generated by this script are clean_eads, source_eads, source_labels, source_marcs, and source_pdfs. The folders and defaults.json are created at the same level as the script that creates them.
The script contains 2 functions:
This function checks the defaults.json file to make sure it contains all the appropriate keys and if there is an error, creates a new defaults.json file and returns the data.
-
json_data- (dict) a dictionary containing all the data for default behavior for the GUI
- Run the create_default_folders() function, which creates folders at the same level as the script
- Establish filepaths for the folders generated by create_default_folders()
- standard_defaults - (list) a list containing all the keys that should be contained within defaults.json
- Under try:
  - Open defaults.json and load the data into json_data
  - Iterate through the keys and values and append all the keys to the list defaults_keys
  - If the value of a key is a dictionary, iterate through that dictionary and append its keys to defaults_keys. This is done because defaults.json contains only 2 levels of dictionaries.
  - Iterate through the keys in standard_defaults and check that the same keys exist in defaults_keys
- Under except:
  - Print an error message
  - Overwrite defaults.json and write the new data to the file
  - Open defaults.json again and load the data into json_data
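A simplified sketch of the try/except flow above. The real key names in defaults.json are not listed on this page, so DEFAULT_DATA and STANDARD_DEFAULTS below are hypothetical placeholders; the folder-creation step is left out, the function name load_or_reset_defaults is hypothetical, and treating a missing key as an error (by raising KeyError) is an assumption about how the key check reaches the except branch.

```python
import json

# Hypothetical placeholder defaults; the real keys in defaults.json may differ.
DEFAULT_DATA = {
    "ead_export_default": {"include_daos": True, "include_unpublished": False},
    "source_eads": "source_eads",
}
# Every key (top level and one level down) that a valid defaults.json must contain.
STANDARD_DEFAULTS = ["ead_export_default", "include_daos", "include_unpublished", "source_eads"]


def collect_keys(json_data: dict) -> list:
    """Return top-level keys plus the keys of any nested dict (only 2 levels deep)."""
    defaults_keys = []
    for key, value in json_data.items():
        defaults_keys.append(key)
        if isinstance(value, dict):
            defaults_keys.extend(value.keys())
    return defaults_keys


def load_or_reset_defaults(path: str = "defaults.json") -> dict:
    """Load defaults.json, recreating it from DEFAULT_DATA if it is missing or incomplete."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json_data = json.load(f)
        defaults_keys = collect_keys(json_data)
        for key in STANDARD_DEFAULTS:
            if key not in defaults_keys:
                raise KeyError(key)  # assumption: a missing key counts as an error
    # OSError: file missing/unreadable; ValueError: invalid JSON;
    # AttributeError: top-level JSON is not an object.
    except (OSError, ValueError, KeyError, AttributeError) as error:
        print(f"Error reading {path} ({error!r}); writing a new defaults file")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(DEFAULT_DATA, f, indent=4)
        with open(path, "r", encoding="utf-8") as f:
            json_data = json.load(f)
    return json_data
```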
This script checks for the existence of the folders clean_eads, source_eads, source_labels, source_marcs, and source_pdfs within the current working directory. It loops through the folders, checking each one individually, and creates any folder that is not found.
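The folder check above can be sketched with pathlib. This is an illustrative sketch, not the project's code; the function name ensure_default_folders is hypothetical, and base_dir defaults to "." so folders land in the current working directory, as described.

```python
from pathlib import Path

# Folder names taken from the description above.
DEFAULT_FOLDERS = ["clean_eads", "source_eads", "source_labels", "source_marcs", "source_pdfs"]


def ensure_default_folders(base_dir: str = ".") -> list:
    """Create any missing default folder under base_dir; return the names created."""
    created = []
    for name in DEFAULT_FOLDERS:
        folder = Path(base_dir) / name
        if not folder.exists():  # check each folder individually
            folder.mkdir()
            created.append(name)
    return created
```

Because existing folders are skipped, the function is safe to call on every startup.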
Deletes and recreates the defaults.json file.
- Checks if defaults.json exists in the current directory and, if so, deletes it.
- Runs the set_defaults_file() function to recreate the defaults file.
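The reset steps amount to "delete if present, then rebuild". In this minimal sketch the hypothetical reset_defaults takes the rebuild step as a callable (standing in for set_defaults_file()), which keeps it testable without the real module.

```python
import os
from typing import Callable


def reset_defaults(recreate: Callable[[], dict], path: str = "defaults.json") -> dict:
    """Delete the defaults file if it exists, then rebuild it with `recreate`."""
    if os.path.exists(path):
        os.remove(path)
    return recreate()
```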
This script is derived mostly from Todd Birchard's article 'SSH & SCP in Python with Paramiko', https://hackersandslackers.com/automate-ssh-scp-python-paramiko/
Only a few modifications have been made to his program: execute_commands() now returns an output string that is printed in as_XTF_GUI.py, and the SSH key requirement and validation functions have been removed.
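A sketch of password-only SSH with Paramiko that returns command output as a string, in the spirit of the modifications described above. It is not the project's execute_commands(); run_remote_commands and connect_with_password are hypothetical names. paramiko is imported lazily so the output-collecting helper can be used (and tested) without it installed.

```python
def run_remote_commands(client, commands) -> str:
    """Run each command on a connected paramiko.SSHClient and return the combined stdout."""
    output = []
    for cmd in commands:
        _stdin, stdout, _stderr = client.exec_command(cmd)
        # read() returns bytes once the remote command closes its stdout
        output.append(stdout.read().decode("utf-8", errors="replace"))
    return "".join(output)


def connect_with_password(host: str, username: str, password: str, port: int = 22):
    """Open an SSH connection using password authentication only (no SSH keys)."""
    import paramiko  # lazy import: only needed when actually connecting

    client = paramiko.SSHClient()
    # Trade-off: AutoAddPolicy accepts unknown host keys without verifying them.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, port=port, username=username, password=password,
                   look_for_keys=False, allow_agent=False)
    return client
```

A caller would connect, pass the client and a list of command strings to run_remote_commands, print the returned string, and then call client.close().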
This file sets the default inputs for the GUI. This includes:
- Setting the default folders for export and cleanup
- Setting the default options for users upon startup
The GUI was designed to let users save their preferred default behavior. These changes are saved to defaults.json and persist after the user closes and re-opens the program.
To reset your defaults, either use the File menu in the GUI, or delete the defaults.json file and run as_XTF_GUI.py, which will run the setup.py script.
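Persisting a preference boils down to a read-modify-write of the two-level JSON file. This is a minimal sketch; save_default and the section/key names in the usage are hypothetical, not the GUI's actual keys.

```python
import json


def save_default(path: str, section: str, key: str, value) -> dict:
    """Update one option in a two-level defaults file and write it back to disk."""
    with open(path, "r", encoding="utf-8") as f:
        defaults = json.load(f)
    # Create the section if it is missing, then set the option inside it.
    defaults.setdefault(section, {})[key] = value
    with open(path, "w", encoding="utf-8") as f:
        json.dump(defaults, f, indent=4)
    return defaults
```

Because the whole file is rewritten, the saved choice is what the GUI reads back on the next launch.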