Web-scraping tool to be developed to search and report Critical and High Severity Vulnerabilities of OEM equipment (IT and OT) published at respective OEM websites and other relevant web platforms.
ABSTRACT :
Background: Critical Sector organisations use a range of IT and OT equipment (e.g. networking and hardware devices, operating systems, applications, firmware, etc.). These devices/applications are found to contain vulnerabilities from time to time.
There should be a timely information-sharing mechanism by which the concerned equipment users at critical sector organisations are alerted to any critical/high severity vulnerabilities in their equipment within the shortest possible time. Detailed description: The ICT components (HW/SW) being used by Critical Sector Organisations become vulnerable from time to time.
These vulnerabilities can be categorised as Critical, High, Medium, and Low. Exploitation of these vulnerabilities can cause havoc across the multiple Critical Sector Organisations where such vulnerable equipment is in use. In view of the above, there is a need to monitor all such vulnerability information published at the equipment OEMs' websites as well as at other relevant websites.
Once critical or high severity vulnerability information is published at an OEM website or any other relevant website, the scraper to be developed will immediately capture that vulnerability, along with any mitigation strategy published on the website, and send the information to predefined email id(s). Note: The NVD website publishes such OEM vulnerability information, but it comes with a time lag.
It is therefore necessary to obtain such information directly from OEM websites and/or from other relevant websites where it is published almost in real time. Expected Outcome: An automatic script using open source tools is to be developed for scraping and reporting OEM vulnerability information.
The tool should recognise the various data formats/syntax in which vulnerability information is published at OEM websites (for both IT and OT hardware and applications) and provide an optimum solution for monitoring and reporting such vulnerability information.
The output of the tool, emailed to pre-designated email id(s), is as follows (shown with an example; all fields may not be available at the time of reporting):
* Product Name: Chrome
* Product Version: NA
* OEM name: Google
* Severity Level (Critical/High): High
* Vulnerability: The N-able PassPortal extension before 3.29.2 for Chrome inserts sensitive information into a log file.
* Mitigation Strategy: Install patch from https://me.n-able.com/s/security-advisory/aArHs000000M8CCKA0/cve202347131-passportal-browser-extension-logs-sensitive-data
* Published Date: Jan 2024
* Unique ID: CVE-2023-47131
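For illustration, the report fields above can be held in a small data structure before being formatted into the alert email body. The following is a minimal sketch in Python; the class name, field names, and formatting helper are assumptions for illustration, not part of the problem statement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VulnerabilityReport:
    # Field names mirror the output format listed above (hypothetical names).
    product_name: str
    oem_name: str
    severity_level: str                    # expected: "Critical" or "High"
    vulnerability: str
    unique_id: str                         # e.g. a CVE identifier
    product_version: Optional[str] = None  # some fields may be unavailable
    mitigation_strategy: Optional[str] = None
    published_date: Optional[str] = None

    def to_email_body(self) -> str:
        """Render only the fields available at the time of reporting."""
        lines = [
            f"Product Name: {self.product_name}",
            f"Product Version: {self.product_version or 'NA'}",
            f"OEM name: {self.oem_name}",
            f"Severity Level (Critical/High): {self.severity_level}",
            f"Vulnerability: {self.vulnerability}",
        ]
        if self.mitigation_strategy:
            lines.append(f"Mitigation Strategy: {self.mitigation_strategy}")
        if self.published_date:
            lines.append(f"Published Date: {self.published_date}")
        lines.append(f"Unique ID: {self.unique_id}")
        return "\n".join(lines)
```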
EXISTING SYSTEM :
The process of scraping data from the Internet can be divided into two sequential steps: acquiring web resources and then extracting the desired information from the acquired data. Specifically, a web scraping program starts by composing an HTTP request to acquire resources from a targeted website.
This request can be formatted either as a URL containing a GET query or as an HTTP message containing a POST query. Once the request is successfully received and processed by the targeted website, the requested resource is retrieved from the website and sent back to the given web scraping program.
The resource can be in multiple formats, such as web pages that are built from HTML, data feeds in XML or JSON format, or multimedia data such as images, audio, or video files. After the web data is downloaded, the extraction process continues to parse, reformat, and organize the data in a structured way.
There are two essential modules in a web scraping program: a module for composing an HTTP request, such as urllib2 or Selenium, and another for parsing and extracting information from raw HTML code, such as Beautiful Soup or PyQuery.
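A minimal sketch of these two steps, assuming the widely used requests and Beautiful Soup libraries; the URL and the CSS selector refer to a hypothetical advisory page, not a real OEM site:

```python
import requests
from bs4 import BeautifulSoup

# Step 1: compose and send an HTTP GET request for the targeted page.
# The URL is a hypothetical OEM advisory listing used only for illustration.
url = "https://example-oem.com/security/advisories"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Step 2: parse the raw HTML and extract the desired information.
# The selector assumes a simple list of advisory links; real OEM pages
# differ and need per-site selectors.
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.select("a.advisory"):
    print(link.get_text(strip=True), link.get("href"))
```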
DISADVANTAGES :
Legal and Ethical Concerns
Compliance Issues: Web scraping can sometimes violate a website's terms of service or legal regulations, especially when done at large scale. Ensure compliance with legal standards and website terms of use.
Privacy Risks: Scraping sensitive or proprietary information might inadvertently breach privacy or security policies.
Technical Challenges
Site Structure Variability: OEM websites and relevant platforms often have diverse and constantly changing structures, making it challenging to maintain a scraping tool.
Dynamic Content: Many modern websites use JavaScript to load content dynamically. This requires advanced scraping techniques, such as driving a headless browser (see the sketch after this list), which can be more complex and error-prone.
Anti-Scraping Measures: Websites may employ anti-scraping technologies such as CAPTCHAs, IP blocking, or rate limiting, which can hinder data collection efforts.
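Where a page renders its content with JavaScript, a plain HTTP fetch returns an empty shell, so one common workaround noted above is to drive a headless browser. The sketch below uses Selenium with headless Chrome against a hypothetical URL; it is an illustration under those assumptions, not a definitive approach:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    # Hypothetical JavaScript-heavy advisory page.
    driver.get("https://example-oem.com/security/advisories")
    # page_source holds the DOM after JavaScript has executed.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for link in soup.select("a.advisory"):  # selector is an assumption
        print(link.get_text(strip=True))
finally:
    driver.quit()
```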
PROPOSED SYSTEM :
Of the various types of web scraping programs, some are created to automatically recognise the data structure of a page, such as Nutch or Scrapy, while others provide a web-based graphical interface that eliminates the need for manually written scraping code, such as Import.io. Nutch is a robust and scalable web crawler written in Java.
It enables fine-grained configuration, parallel harvesting, robots.txt rule support, and machine learning. Scrapy, written in Python, is a reusable web crawling framework. It speeds up the process of building and scaling large crawling projects.
In addition, it provides an interactive shell that can simulate the browsing behaviour of a human user. To enable non-programmers to harvest web content, web-based crawlers with graphical interfaces are purposely designed to reduce the complexity of using a web scraping program.
Among them, Import.io is a typical crawler for extracting data from websites without writing any code. It allows users to identify and convert unstructured web pages into a structured format. Import.io's graphical interface for data identification allows the user to train the tool on what to extract.
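As an illustration of the Scrapy approach described above, a minimal spider for an OEM advisory listing might look like the sketch below. The start URL and CSS selectors are assumptions; each real OEM site would need its own selectors.

```python
import scrapy

class AdvisorySpider(scrapy.Spider):
    """Minimal spider sketch for a hypothetical OEM advisory listing."""
    name = "oem_advisories"
    start_urls = ["https://example-oem.com/security/advisories"]  # assumption

    def parse(self, response):
        # Assumes one advisory per table row; adjust selectors per site.
        for row in response.css("table.advisories tr"):
            severity = (row.css("td.severity::text").get() or "").strip()
            if severity in ("Critical", "High"):
                yield {
                    "title": row.css("td.title a::text").get(),
                    "severity": severity,
                    "link": response.urljoin(
                        row.css("td.title a::attr(href)").get(default="")
                    ),
                }
```

Such a spider could be run with the command scrapy runspider advisory_spider.py -o advisories.json to collect matching advisories as structured records.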
ADVANTAGES :
Timely and Automated Vulnerability Detection
Real-Time Updates: Automates the process of checking for new vulnerabilities, enabling quicker identification of critical and high-severity issues.
Continuous Monitoring: Provides ongoing vigilance without manual intervention, ensuring vulnerabilities are detected as soon as they are published (see the monitoring sketch at the end of this section).
Comprehensive Coverage
Wide Scope: Can be configured to scrape multiple OEM websites and relevant platforms, aggregating information from a broad range of sources.
Varied Sources: Accesses vulnerabilities from different platforms (OEM sites, security forums, advisories) to provide a more comprehensive view of potential threats.
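To make the continuous-monitoring and multi-source points above concrete, the sketch below polls a configurable list of advisory pages and emails newly seen items to predefined recipients. All URLs, selectors, SMTP settings, and helper names are assumptions for illustration; a production tool would add authentication, persistence of seen advisories, and error handling.

```python
import smtplib
import time
from email.message import EmailMessage

import requests
from bs4 import BeautifulSoup

# Hypothetical sources: (OEM name, advisory listing URL, CSS selector).
SOURCES = [
    ("Google", "https://example-oem.com/chrome/advisories", "a.advisory"),
    ("Siemens", "https://example-oem.com/ot/advisories", "a.advisory"),
]
RECIPIENTS = ["soc@example.org"]           # predefined email id(s)
SMTP_HOST, SMTP_PORT = "smtp.example.org", 587
seen = set()                               # advisories already reported

def fetch_new(url, selector):
    """Return (link, title) pairs not seen before; selectors are assumptions."""
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    fresh = []
    for a in soup.select(selector):
        link = a.get("href")
        if link and link not in seen:
            seen.add(link)
            fresh.append((link, a.get_text(strip=True)))
    return fresh

def send_alert(oem, advisories):
    msg = EmailMessage()
    msg["Subject"] = f"[Vulnerability Alert] {len(advisories)} new {oem} advisories"
    msg["From"] = "scraper@example.org"
    msg["To"] = ", ".join(RECIPIENTS)
    msg.set_content("\n".join(f"{title}\n{link}" for link, title in advisories))
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
        server.starttls()                  # add server.login(...) if required
        server.send_message(msg)

while True:
    for oem, url, selector in SOURCES:
        new_items = fetch_new(url, selector)
        if new_items:
            send_alert(oem, new_items)
    time.sleep(15 * 60)                    # poll every 15 minutes
```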