When approaching a data extraction project, we have to take three fundamental aspects into account:
Frequency of extraction:
Whether the process will be carried out only once, for example to perform a specific migration, or whether it will be run repeatedly, for example to keep obtaining the most current data.
The volume of data and available resources:
If we want to collect a few tens of thousands of records or fewer, we can use simpler techniques and tools than if we need to obtain hundreds of thousands or millions of records. In the latter case, the extraction time and the sheer number of records call for other data extraction techniques, tools, and resources. Some techniques will allow us to extract just over 20 records per minute.
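To see why volume changes the choice of technique, a rough estimate helps. The sketch below assumes a constant rate of 20 records per minute (taken from the figure above) and a hypothetical helper named estimated_hours; real extraction rates vary with the site, the network, and any throttling.

```python
def estimated_hours(records: int, records_per_minute: float) -> float:
    """Return the hours needed to extract `records` at a constant rate."""
    if records_per_minute <= 0:
        raise ValueError("records_per_minute must be positive")
    return records / records_per_minute / 60


# Compare the volumes discussed above at an assumed 20 records per minute.
for total in (10_000, 100_000, 1_000_000):
    hours = estimated_hours(total, 20)
    print(f"{total:>9,} records at 20/min: {hours:,.1f} hours")
```

At that rate, tens of thousands of records fit in a working day, while a million records would take roughly five weeks of uninterrupted running, which is where faster techniques or more resources become necessary.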
The accessibility of the source data: here we are going to consider two obstacles:
- According to the data presentation structure: whether all the data to be extracted is on a single page or spread over several pages, and whether it is presented as a record card or as a table (a single record per page or several records per page).
- Depending on the way the web page delivers the data: it can be served by requesting a specific web page for each record or data set, in which case we can obtain a direct link to the data, or it can be loaded dynamically within the page itself using JavaScript and/or AJAX techniques.
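One simple way to tell the two delivery cases apart is to check whether a value we can see in the browser is also present in the raw HTML the server returns. The sketch below is only an illustration under that assumption: it uses Python's standard html.parser module, the function name appears_in_static_html is hypothetical, and the two HTML snippets are invented sample data rather than real pages.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def appears_in_static_html(html: str, sample_value: str) -> bool:
    """Return True if `sample_value` is in the page text without running JavaScript."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Normalise whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(collector.chunks).split())
    return sample_value in text


# Invented sample pages: one served with the data, one that loads it via AJAX.
static_page = "<table><tr><td>Acme Ltd</td><td>Madrid</td></tr></table>"
dynamic_page = (
    '<div id="results"></div>'
    '<script>fetch("/api/companies").then(render)</script>'
)

print(appears_in_static_html(static_page, "Acme Ltd"))
print(appears_in_static_html(dynamic_page, "Acme Ltd"))
```

If the value is missing from the raw HTML, the data is probably loaded dynamically, and the extraction will need either the underlying AJAX request or a tool that executes JavaScript.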
If data extraction is what you are looking for, Botscraper can help. Please visit www.botscraper.com for full information.