The Google search engine is amazing at finding websites and answers in seconds when you’re searching for things on the Internet – but it isn’t foolproof. Although searching for a person by name is a routine online activity, the search results are in many cases incomplete or even misleading due to variations in name spelling.
Searching for information about a specific person is a frequent online activity. In most cases, users carry out the search by entering a query containing a name into a Web search engine. Typically, however, Web search engines return just a few accurate results for a name-containing query.
Most existing solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding, but very often, the performance of such solutions is less than optimal.
Unlike a standard word, such as “wall” – which is spelled or written only one way – names or shortened names (diminutives) can be spelled several ways – such as John/Jon or Debbie/Debby. Search engines try to identify such name similarities using string-similarity algorithms, but these in many cases perform poorly.
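To make the limitation concrete, here is a minimal sketch of one common string-similarity measure, Levenshtein edit distance, written in Python. It is a generic baseline chosen for illustration and is not taken from the BGU papers; the function names `levenshtein` and `similarity` are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for pair in [("John", "Jon"), ("Debbie", "Debby"), ("Jon", "Ron")]:
        print(pair, round(similarity(*pair), 3))
```

A measure like this looks only at letters. John/Jon and Jon/Ron are both a single edit apart, so spelling alone cannot tell a genuine variant from a different name.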
“New methodologies developed by researchers at Ben-Gurion University of the Negev (BGU) in Beersheba will make it easier for search engines to overcome the complexities of name identification and result in significantly more accurate online people searches,” according to Dr. Michael Fire, a member of BGU’s department of software and information systems engineering (SISE) and the Data Science for Social Good Lab. “This is a significant problem both for companies that might be conducting searches for job applicants and for individuals who might want to search for a distant relative.”
BGU researchers Fire, Dr. Rami Puzis and lead researcher PhD student Aviad Elyashar presented these groundbreaking algorithms in two papers published in Knowledge-Based Systems and in the IEEE Transactions on Knowledge and Data Engineering under the titles “It Runs in the Family: Unsupervised Algorithm for Alternative Name Suggestion Using Digitized Family Trees” and “How does that name sound? Name representation learning using accent-specific speech generation.”
The BGU team proposed SpokenName2Vec, a novel and generic algorithm that addresses the synonym suggestion problem by using automated speech generation and deep learning to produce novel spoken name embeddings. These embeddings capture the way people pronounce names in a particular language and accent. Utilizing a name’s pronunciation can help detect names that sound alike, but are written differently.
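Once each name is represented as a vector, suggesting synonyms can be treated as a nearest-neighbor lookup. The Python sketch below illustrates only that lookup step, using cosine similarity. The three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration and the function names are assumptions; the actual SpokenName2Vec embeddings are learned from generated speech, and the papers' retrieval procedure may differ.

```python
import math

# Hypothetical toy vectors, invented for this example only.
# They are NOT real SpokenName2Vec embeddings.
TOY_EMBEDDINGS = {
    "john":   [0.90, 0.10, 0.05],
    "jon":    [0.88, 0.12, 0.07],
    "debbie": [0.10, 0.85, 0.20],
    "debby":  [0.12, 0.83, 0.22],
    "ron":    [0.40, 0.20, 0.80],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def suggest_synonyms(name, embeddings, top_k=1):
    """Return the top_k names whose vectors are closest to the query name."""
    query = embeddings[name]
    scored = [
        (cosine_similarity(query, vector), other)
        for other, vector in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]


if __name__ == "__main__":
    print(suggest_synonyms("john", TOY_EMBEDDINGS))
    print(suggest_synonyms("debbie", TOY_EMBEDDINGS))
```

In this toy setup, names that sound alike were deliberately given nearby vectors, so the lookup ranks them first regardless of how they are spelled.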
They drew on a large-scale dataset of 17 million people, with more than 250,000 forenames and surnames, and evaluated the algorithm on two ground truth datasets containing 7,399 forenames and 25,000 surnames (including their verified synonyms). In total, 37,916 synonyms were retrieved for the 7,399 distinct names. They maintained that the performance of SpokenName2Vec was superior to the 10 other algorithms evaluated, including phonetic encoding, string similarity and machine learning algorithms. The results obtained emphasize the potential of spoken name embeddings for improved synonym suggestion.
The methods were tested on three cataloged datasets containing tens of thousands of verified first and last names.
As a part of their work, the researchers proposed an innovative and groundbreaking representation of names, which considers the way humans pronounce the name in a particular language and accent. “This innovative representation is very dynamic and allows you to identify names that sound similar, but are not necessarily written in the same way,” explained Fire.
“The impressive data obtained highlights the breakthrough and the huge potential in the methods proposed to make it easier to find people based on name variants,” added Puzis. “We are creating a website that will be accessible to everyone and will allow people to be found using the algorithms we have developed.”