@johnofpinebrook Web Crawling Script Development Request
- Target Website(s):
URLs to Crawl:
[List the specific URLs or website domains you want to crawl]
Access Requirements:
[Indicate whether login/authentication is required and, if so, whether you can provide credentials, or whether the script should access public pages only]
- Specific Data Points:
Information to Extract:
[Describe the specific information or data points you want to extract, e.g., product prices or article titles]
Page Structure Details:
[Describe how the information is structured on each page, including any HTML elements, classes, or IDs that contain the data]
- Frequency and Volume:
Crawl Frequency:
[Specify how often the crawl should run (once, daily, weekly, etc.)]
Data Volume:
[Estimate the number of pages or the volume of data you expect to scrape]
- Output Format:
Desired Format for Extracted Data:
[Indicate the output format you require, e.g., CSV, JSON, or Excel]
- Crawler Behavior:
Link Following and Discovery:
[Specify whether the crawler should follow links to discover new pages and, if so, how deep it should go and whether it should stay within the target domains]
Special Behaviors:
[Mention any special behaviors needed, such as handling infinite scrolling, waiting for AJAX-loaded content, or following pagination]
- Compliance and Ethical Considerations:
Legal and Ethical Compliance:
[Confirm that the crawl complies with the target website’s terms of service and with applicable laws and regulations on web scraping]
- Technical Requirements:
Preferred Programming Language/Stack:
[Indicate your preferred language or technology stack, if any]
Infrastructure Requirements:
[Mention if you have any server or infrastructure preferences]
- Error Handling and Logging:
Error Management:
[Describe how you want the script to handle errors or exceptions, e.g., retry failed requests, skip and log the failing page, or stop the crawl]
Logging Requirements:
[Specify whether you need logs of the crawling process and, if so, how detailed they should be]
- Deployment and Maintenance:
Deployment Assistance:
[Indicate whether you need help deploying the script]
Maintenance and Updates:
[Describe how future changes to the target website’s structure, or updates to the script itself, should be handled]