Pygooglenews is a Python package for scraping Google News headlines. The first step is to install the package, which can be done by entering the command: pip install pygooglenews. Once installed, import the package into your script and initialize it:
from pygooglenews import GoogleNews
gn = GoogleNews()
The command top = gn.top_news() fetches the current top stories.
Configurations
Required parameters
The GoogleNews class accepts two fundamental parameters: country and lang. Many combinations of the two are available to test and try; the full list can be found in the language and region section of the package's documentation page.
Let's proceed to understand the commands and their application in the usage of the Google News Scraper tool.
Commands
1. Search through top news
This command fetches the top-ranking stories on Google News, making it one of the easiest news sources to scrape.
The command for this goes like this:
gn.top_news()
2. Search through topic headlines
Headlines are always all over the internet. An advanced web crawler makes it possible to look up headline results for a specific topic. These results cover every domain: entertainment, health, technology, business, science, politics, social media, and more.
The command for this goes like this:
gn.topic_headlines('business')
Replace the argument with the desired topic to get the relevant headlines.
3. Search through queries
The search method also supports keyword queries. A user can search based on a query, refine it with search operators, and add a time range, so that only results from that particular time frame appear on the results page.
Let us use an example of a query 'Dog' that excludes the keyword 'Spain', limited to an eight-month time frame. Prefixing a term with a minus sign excludes it from the results.
Hence, the command for the same goes like this:
gn.search('Dog -Spain', when='8m')
4. Search through geographical locations
The scope of a search narrows down significantly when location comes into play. For instance, a search for "best italian restaurants around me" delivers better results when aided by location mapping. Google News supports this through geographic headline feeds: incorporate the desired location into the call for better accuracy.
Let us use the location 'Tokyo' as an example.
The command for this goes like:
gn.geo_headlines('Tokyo')
Outputs
The Google News scraper output is a collection of JSON-like objects. Below is an illustration of an 'ethereum' keyword search result:
{
"title": "Cardano, the cryptocurrency that wants to dethrone Ethereum - OI Canadian",
"title_detail": {
"type": "text/plain",
"language": null,
"base": "",
"value": "Cardano, the cryptocurrency that wants to dethrone Ethereum - OI Canadian"
},
"links": [
{
"rel": "alternate",
"type": "text/html",
"href": "https://oicanadian.com/cardano-the-cryptocurrency-that-wants-to-dethrone-ethereum/"
}
],
"link": "https://oicanadian.com/cardano-the-cryptocurrency-that-wants-to-dethrone-ethereum/",
"id": "CBMiUmh0dHBzOi8vb2ljYW5hZGlhbi5jb20vY2FyZGFuby10aGUtY3J5cHRvY3VycmVuY3ktdGhhdC13YW50cy10by1kZXRocm9uZS1ldGhlcmV1bS_SAQA",
"guidislink": false,
"published": "Mon, 13 Sep 2021 21:46:22 GMT",
"published_parsed": [
2021,
9,
13,
21,
46,
22,
0,
256,
0
],
"summary": "<a href=\"https://oicanadian.com/cardano-the-cryptocurrency-that-wants-to-dethrone-ethereum/\" target=\"_blank\">Cardano, the cryptocurrency that wants to dethrone Ethereum</a> <font color=\"#6f6f6f\">OI Canadian</font>",
"summary_detail": {
"type": "text/html",
"language": null,
"base": "",
"value": "<a href=\"https://oicanadian.com/cardano-the-cryptocurrency-that-wants-to-dethrone-ethereum/\" target=\"_blank\">Cardano, the cryptocurrency that wants to dethrone Ethereum</a> <font color=\"#6f6f6f\">OI Canadian</font>"
},
"source": {
"href": "https://oicanadian.com",
"title": "OI Canadian"
},
"sub_articles": []
}
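Since each entry behaves like a plain dictionary, the useful fields can be pulled out with ordinary key access. The sketch below assumes an entry shaped like the sample above (shortened here) and extracts the title, link, source, and publication year:

```python
from datetime import datetime

# A shortened version of the sample entry shown above
entry = {
    "title": "Cardano, the cryptocurrency that wants to dethrone Ethereum - OI Canadian",
    "link": "https://oicanadian.com/cardano-the-cryptocurrency-that-wants-to-dethrone-ethereum/",
    "published": "Mon, 13 Sep 2021 21:46:22 GMT",
    "source": {"href": "https://oicanadian.com", "title": "OI Canadian"},
}

def summarize(entry):
    # Parse the RFC 822 publication date into a datetime object
    published = datetime.strptime(entry["published"], "%a, %d %b %Y %H:%M:%S %Z")
    return {
        "title": entry["title"],
        "url": entry["link"],
        "source": entry["source"]["title"],
        "year": published.year,
    }

print(summarize(entry)["source"])  # OI Canadian
```

The same key access works on the real objects returned by the scraper, since they are parsed feed entries.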
Examples
Let us also look at a few examples of how the Google News scraping process works in real time. We shall go through different samples to see exactly how these scraping processes are executed.
1. Scraping articles that mention NFT without any mention of Bitcoin.
This sample starts by importing and initializing the pygooglenews package, then uses the search-through-queries feature discussed above. The minus sign before 'bitcoin' excludes it from the results.
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('NFT -bitcoin')
print(s['feed'].title)
2. Scraping data from the last hour that mentions Ethereum or Bitcoin.
This is an instance where headlines are searched directly. Multiple terms can be matched at the same time by joining them with 'OR', and the intitle: operator restricts matches to the article title. A time range can also be applied; in this sample, we see the search results from the last hour. The unit of time is denoted by a letter appended to the number: h for hours, d for days, and m for months.
The command looks like:
from pygooglenews import GoogleNews
gn = GoogleNews()
s = gn.search('intitle:bitcoin OR intitle:ethereum', when='1h')
print(s['feed'].title)
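The when values follow a simple pattern: a positive integer followed by one of the unit letters. A small hypothetical helper (not part of pygooglenews, just an illustration) can validate such strings before passing them on:

```python
import re

# Matches a positive integer followed by h (hours), d (days), or m (months),
# e.g. '1h', '7d', '12m'. This helper is illustrative, not part of pygooglenews.
WHEN_PATTERN = re.compile(r"^\d+[hdm]$")

def is_valid_when(when):
    return bool(WHEN_PATTERN.match(when))

print(is_valid_when("1h"))  # True
print(is_valid_when("8m"))  # True
print(is_valid_when("1y"))  # False: years are not a supported unit
```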
3. Executing the command in the terminal rather than in the script.
In most cases, users prefer not to edit the search query inside the script every time they run the scraper, since that isn't particularly productive. It is far more convenient to pass the query as a command-line argument. The script then queries Google News through pygooglenews and prints the results to the terminal.
Additionally, one will likely want to record the results in a JSON file. To accomplish that, we must improve our script. Let's begin by adding the following argument-parsing code, so that we may save the outcome as JSON:
import argparse

def parse_args():
    # Define the command-line interface for the scraper
    parser = argparse.ArgumentParser(description='Google News Scraper')
    parser.add_argument('-k', '--keyword', default='top_news', help='Enter Keyword')
    parser.add_argument('-n', '--name', default='output', help='JSON file name where the result should be saved')
    args = vars(parser.parse_args())
    return args
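Putting it together, a minimal version of the full script might look like the sketch below. The fetch step is a placeholder, since the exact call depends on which pygooglenews method you want to expose; the argument parsing and JSON saving are standard library. Passing argv explicitly to parse_args is an added convenience for testing; with no argument it falls back to sys.argv as usual.

```python
import argparse
import json

def parse_args(argv=None):
    # With argv=None, argparse reads sys.argv; a list can be passed for testing
    parser = argparse.ArgumentParser(description='Google News Scraper')
    parser.add_argument('-k', '--keyword', default='top_news', help='Enter Keyword')
    parser.add_argument('-n', '--name', default='output',
                        help='JSON file name where the result should be saved')
    return vars(parser.parse_args(argv))

def fetch(keyword):
    # Placeholder: in the real script this would call, for example,
    # GoogleNews().search(keyword) from pygooglenews
    return {"keyword": keyword, "entries": []}

def save_as_json(result, name):
    # Write the scraped result to <name>.json
    with open(name + ".json", "w") as f:
        json.dump(result, f, indent=2)

if __name__ == "__main__":
    args = parse_args()
    save_as_json(fetch(args["keyword"]), args["name"])
```

It could then be run from the terminal as, for example: python scraper.py -k nft -n results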
Parting Thoughts
This was a complete tutorial that covered the installation and usage processes for the Google News scraper tool, one of the most advanced web crawlers and one of the most preferred web scraping services in the USA.
If you are looking for assistance with the Google News scraper tool or other web scraping services, feel free to contact BotScraper. We will be happy to help!