Web Functions
Imagine you have a collection of information scattered across the internet: articles on websites, explanations in YouTube videos, or details locked inside PDF documents. What if you could easily gather all that text into one simple format, like a plain document you can read or search through? That's exactly what this library helps with. It works behind the scenes to pull out the important words and ideas from many different places online, turning them into clean, readable text you can actually use. Whether you're researching a topic, saving interesting content, or just trying to understand something new, this tool handles the messy technical work so you don't have to.
Beyond just collecting text, this library also acts as a helpful safety check for links you might come across. We've all seen suspicious URLs in messages or emails that make us wonder, "Is this safe to click?" The library quietly checks these links against trusted databases that track known troublemakers online. If a link has a history of causing problems, like spreading scams or malware, it will let you know, giving you peace of mind before you interact with it. This means you spend less time worrying about digital risks and more time focusing on the information that matters to you.
What makes this library special is how it adapts to whatever you throw at it. Paste in a YouTube video link, and it will fetch the spoken words as if someone wrote them down for you. Share a webpage URL, and it strips away all the distracting buttons and ads to give you just the core story. Even complex PDF files become plain text with a single step. It's designed to feel effortless, like having a patient assistant who organizes the digital world for you, one piece of content at a time. Whether you're a student, a curious learner, or someone who just wants to navigate the internet more smoothly, this tool quietly makes your digital life a little clearer and safer.
This function takes any text that might contain special numeric codes representing letters or symbols and automatically converts those codes into the actual readable characters they stand for, ensuring the text appears correctly formatted and easy to understand without any hidden encoding issues.
- input_string: A string containing text that may include numeric character references (such as &#65;, which stands for the letter A); the function processes this text to replace each reference with its corresponding character, resulting in clean, human-readable output where symbols and letters display properly.
Returns a string containing the input text with all numeric character references (e.g., &#65; for A) converted to their corresponding characters.
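The behavior described above can be sketched in a few lines of standard-library Python. The function name and exact regular expression here are illustrative, not the library's actual implementation; it handles both decimal (&#65;) and hexadecimal (&#x41;) references:

```python
import re

def decode_numeric_refs(input_string: str) -> str:
    # Replace decimal (&#65;) and hexadecimal (&#x41;) numeric
    # character references with the characters they encode.
    def _sub(match):
        dec, hexa = match.group(1), match.group(2)
        return chr(int(dec) if dec else int(hexa, 16))
    return re.sub(r"&#(?:(\d+)|[xX]([0-9a-fA-F]+));", _sub, input_string)
```

For example, `decode_numeric_refs("&#72;&#x69;!")` yields "Hi!".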
This function takes a YouTube video link and retrieves the descriptive keywords associated with that video, presenting them in a clean, readable format where each keyword appears on its own line; if the video cannot be located or lacks these keywords, it clearly indicates that the information is unavailable without requiring any technical setup from the user.
- url: A string representing the full web address of a YouTube video; the function uses this to identify the specific video and extract its associated keywords.
- userhome: An optional string specifying a custom directory path for accessing API authentication tokens; if provided, it directs the function to use tokens from this location instead of the default, which may be necessary if standard configuration files are unavailable.
Returns a string containing newline-separated video tags when available, the string '{[(VNF)]}' when the video is not found or an error occurs during retrieval, or None when the video is found but has no tags assigned.
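One step such a function plausibly performs is isolating the video ID from whatever URL shape the caller passes in. The helper below is a hypothetical sketch of that step (not the library's own code), covering the common `watch?v=` and `youtu.be/` forms:

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> Optional[str]:
    # Hypothetical helper: pull the video ID out of common
    # YouTube URL shapes (watch?v=..., youtu.be/...).
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    return parse_qs(parsed.query).get("v", [None])[0]
```

For instance, both `https://www.youtube.com/watch?v=dQw4w9WgXcQ` and `https://youtu.be/dQw4w9WgXcQ` resolve to the same ID.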
This function takes a YouTube video link and retrieves the spoken words from the video as readable text, automatically trying multiple times if needed to ensure it successfully captures the complete dialogue, which it then presents in a clean format starting with a clear label so you can easily understand what you're reading without any extra formatting or technical elements.
- video_url: A string representing the complete web address of a YouTube video, required to identify which specific video's spoken content should be extracted and converted to text.
- retry: An integer specifying how many times the function should attempt to fetch the transcript when encountering temporary issues, with a default value of 3 meaning it will try up to three times before stopping.
Returns a string beginning with "Video Transcript: " followed by the complete text of the video's spoken content if successful, or returns None if it cannot retrieve the transcript after the specified number of retry attempts.
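The retry-then-give-up pattern described here can be sketched as a generic wrapper. This is an illustration of the control flow, with assumed names and an assumed pause between attempts, not the library's actual code:

```python
import time

def fetch_with_retry(fetch, retry=3, delay=1.0):
    # Call `fetch` up to `retry` times, pausing `delay` seconds
    # between attempts; return None if every attempt fails.
    for attempt in range(retry):
        try:
            return fetch()
        except Exception:
            if attempt < retry - 1:
                time.sleep(delay)
    return None
```

The real transcript fetcher would pass a function that downloads the captions for the given video as `fetch`.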
This function takes a digital PDF document and transforms all the written words inside it into plain, readable text that you can easily copy, search, or analyze, adding a clear label at the beginning so you know exactly where the content came from without any confusing formatting or hidden elements.
- pdf_buffer: Represents the raw data of a PDF file provided as input; expected to be of type bytes; the function requires this specific format to process the document, and without valid PDF data, no text extraction can occur.
Returns a string containing the extracted text from all pages of the PDF, prefixed with the label "PDF Content: ".
This function quietly handles the behind-the-scenes work of retrieving clean content from any website address you provide, using a specialized service that bypasses common obstacles like blocked requests or complex page structures, so you get the main text without needing to worry about technical setup or potential errors during the process.
- url: A string representing the web address to scrape; required for identifying the target page and must be a valid URL format.
- userhome: An optional string specifying a custom directory path for authentication tokens; if omitted, defaults to a standard location and may affect access if tokens are stored elsewhere.
Returns a string containing the scraped content from the specified URL if successful, or returns None if any error occurs during the scraping process.
This function takes any chunk of web content filled with hidden formatting codes and technical clutter, then carefully strips away all the invisible scaffolding like menus, scripts, and styling instructions to reveal only the clear, readable words underneath, smoothing out any messy spacing to deliver a clean, natural-looking paragraph ready for easy reading.
- htmlbuf: A string containing raw HTML content to be processed. The function expects this input to represent standard web page structure, and its quality directly determines the output: poorly formed HTML may yield incomplete or messy text, while well-structured input produces polished, readable content.
Returns a string containing the purified plain text extracted from the input HTML, with all tags, scripts, styles, and excessive whitespace removed, condensed into a single clean paragraph with normalized spacing.
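A stripped-down version of this kind of HTML-to-text pass can be built on the standard library's HTMLParser. This sketch (assumed names, simplified logic) drops script and style bodies, keeps visible text, and collapses whitespace, roughly as described above:

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    # Collect visible text, skipping the bodies of <script> and <style>.
    def __init__(self):
        super().__init__()
        self._skip = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._chunks.append(data)

    def gettext(self):
        return "".join(self._chunks)

def html_to_text(htmlbuf: str) -> str:
    # Strip tags, scripts, and styles, then normalize runs of
    # whitespace into single spaces.
    parser = _TextExtractor()
    parser.feed(htmlbuf)
    return re.sub(r"\s+", " ", parser.gettext()).strip()
```

For example, `html_to_text("<p>Hello   <b>world</b>!</p><script>var x=1;</script>")` produces "Hello world!".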
This function helps you grab readable text from websites and online content by visiting the provided web address and carefully pulling out the main information while ignoring unnecessary elements like menus, scripts, and styling. It automatically handles different types of online content including regular web pages, YouTube videos, and PDF documents, cleaning up the extracted text to make it easy to read and use for your purposes without any extra formatting or technical clutter.
- url: A string representing the web address to process; determines the source content to extract text from.
- external: A boolean value that, when set to True, uses an external service to fetch the content instead of the built-in browser; defaults to False.
- userhome: An optional string specifying a user-specific configuration directory; used when accessing tokens for external services; defaults to None.
- raw: A boolean value that, when set to True, returns the unprocessed HTML content instead of cleaned text; defaults to False.
- ua: An optional string representing a custom browser user agent to use when fetching content; defaults to a standard Chrome browser signature if not provided.
Returns a string containing the extracted content prefixed with "Web Page Content:", "Video Transcript:", or "PDF Content:" depending on the source type, or returns None if no content could be retrieved or processed successfully.
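The routing between the three source types might look something like the helper below. This is a guess at the dispatch logic based on the documented prefixes; the library's actual detection rules are internal and may differ:

```python
def classify_source(url: str) -> str:
    # Hypothetical dispatch: pick an extractor (and output prefix)
    # from the shape of the URL.
    lowered = url.lower()
    if "youtube.com/watch" in lowered or "youtu.be/" in lowered:
        return "Video Transcript"
    if lowered.split("?")[0].endswith(".pdf"):
        return "PDF Content"
    return "Web Page Content"
```

So a `watch?v=` link would route to the transcript path, a `.pdf` URL to the PDF path, and everything else to the web-page path.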
This function carefully scans any block of written content to identify and collect all web addresses that begin with http or https, making it easy to see which links are embedded within the text without missing any or including unrelated information.
- text: The input content to search for web addresses; expected as a string; the function examines this text to find and return all valid URLs starting with http or https.
Returns a list of strings, where each string represents a URL found in the input text.
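One plausible way to implement this is a single regular expression scan. The pattern below (illustrative, not necessarily the library's) matches an http or https scheme followed by everything up to the next whitespace:

```python
import re

def extract_urls(text: str) -> list:
    # Find every http/https URL in the text, in order of appearance.
    return re.findall(r"https?://\S+", text)
```

For example, `extract_urls("see https://example.com and http://a.b/c now")` returns `["https://example.com", "http://a.b/c"]`.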
This function takes a website address you provide and quietly checks what numerical internet location it points to behind the scenes; if the address is valid and reachable, it gives you that numerical location, but if there's any issue finding it-like a typo or temporary network problem-it simply returns nothing without making a fuss.
- domain: Represents the website address to convert to a numerical location; expected as a string (e.g., "example.com"); the function's ability to return a valid numerical location depends entirely on whether this address exists and can be found through standard internet lookup systems.
Returns a string containing the numerical internet location (IP address) if the website address resolves successfully, otherwise returns None.
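Silently resolving a domain to an IP address is a thin wrapper around a system DNS lookup. A sketch of that behavior, with an assumed function name:

```python
import socket

def domain_to_ip(domain):
    # Resolve the domain via the system's DNS lookup; return None
    # on any failure (bad name, network problem) instead of raising.
    try:
        return socket.gethostbyname(domain)
    except socket.error:
        return None
```

An unresolvable name such as `nonexistent.invalid` simply yields `None`.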
This function takes any web address you provide and neatly pulls out the main part that tells you which website it is, like turning a long address into just the recognizable name of the site without any extra details.
- url: Represents the complete web address from which the domain will be isolated; expected to be a string; the result is directly derived from this input as the domain portion of the address.
Returns a string containing the domain extracted from the provided URL.
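Pulling the domain out of a full URL is a one-liner with the standard library. This sketch assumes the input includes a scheme (e.g., `https://`), as `urlparse` needs one to locate the host:

```python
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    # Return the host portion of the URL (no scheme, port, path,
    # or credentials); empty string if no host is present.
    return urlparse(url).hostname or ""
```

For example, `extract_domain("https://www.example.com/path?q=1")` gives "www.example.com".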
This function checks whether a website's address has been reported for harmful activities like scams or hacking by consulting a global database of known problematic server addresses. It quietly investigates the website you're curious about and tells you if it's considered risky to interact with, along with how certain that assessment is, so you can decide whether to proceed safely or avoid potential dangers.
- domain: A string representing the website address (like "example.com") to be checked for safety. This is required and must be a valid domain name format.
- userhome: An optional string specifying a custom directory path where security credentials are stored. If omitted, the function uses default system locations to find necessary access keys, which affects whether the safety check can be performed successfully.
Returns a tuple containing two values: a boolean (or None) indicating whether the domain is flagged as abusive, and an integer representing the confidence score (0-100) of that assessment. Specifically, it returns (True, score) if abusive with the score value, (False, 0) if clean, or (None, 0) if the check couldn't be completed due to errors or invalid inputs.
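The three-way result shape can be illustrated without touching the real reputation service. In this sketch, `lookup` is a stand-in for the external API call (it returns a 0-100 confidence score or raises on failure); the function name, the injected-lookup design, and the "any nonzero score means flagged" threshold are all assumptions for illustration:

```python
def check_domain_abuse(domain, lookup):
    # `lookup` stands in for the reputation-service query and
    # returns a confidence score 0-100, or raises on failure.
    try:
        score = lookup(domain)
    except Exception:
        # Lookup failed entirely: the check could not be completed.
        return (None, 0)
    if score > 0:
        return (True, score)   # flagged as abusive, with confidence
    return (False, 0)          # clean
```

A caller can then branch on the first element: `True` means avoid, `False` means clean, `None` means the check itself failed.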
If you would like to help support this project financially, please click on the heart-shaped sponsor button in the right column of this page. I also have a merch store with some awesome and really cool products; please visit Supporting Jackrabbit for more options.
All subscriptions/sales go to the costs of sustaining Jackrabbit AI. Thank you.