Dataset added by LanguageFerret
This is one of a series of LanguageFerret Middle East “think tank” datasets, each of which captures
the extent of the native Arabic Internet “footprint” of a given organization. The organization here is:
The Washington Institute
The unique datasets by LanguageFerret can be described in a nutshell as follows:
LanguageFerret derives, or defines, a true international Internet “footprint” of any
names, terms and regular words with unparalleled precision, speed and accuracy.
Select the names, terms and regular words that you would use in a search on the engine of your
choice (probably Google), in ANY language’s alphabet or character set that is represented on the
Internet, and in any combination. For example, the search terms for Oakland Raiders
can be in Cyrillic (Russian), Arabic and Korean, making for a pretty obscure but valid combination.
In LanguageFerret these are called “Primary Terms”.
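As a rough illustration of combining terms from different scripts into one search, the sketch below joins a few terms and percent-encodes them as UTF-8 for use in a URL query string. It is not LanguageFerret's own method: the function name `build_query` and the parameter name `q` are assumptions made for the example, and the sample terms are taken from the term list given for this dataset.

```python
from urllib.parse import urlencode, parse_qs


def build_query(primary_terms):
    """Join terms from any mix of scripts into one percent-encoded query string.

    The parameter name "q" is an assumption for illustration; a given search
    engine may expect a different one.
    """
    query = " ".join(primary_terms)
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
    return urlencode({"q": query})


# Latin-script and Arabic-script terms mixed in a single query.
terms = ["Washington Institute", "واشنطن انستيتوت"]
encoded = build_query(terms)

# Decoding the query string recovers the original mixed-script text.
decoded = parse_qs(encoded)["q"][0]
```

The encoded string contains only ASCII characters, so it can be appended to a search URL regardless of which scripts the terms use.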
From the search engine results, select an additional list of names, terms and regular words that you
want to examine on each of the top web pages or URLs your search engine returns. In
LanguageFerret these are called “Auxiliary Terms”. As with Primary Terms, Auxiliary Terms can be
in ANY language’s alphabet, mixed and matched freely. Auxiliary Terms may or may not
include some or all of the Primary Terms, and the Auxiliary list can be as extensive as the user
desires, with the inherent tradeoff that the larger the Auxiliary list, the slower the processing or
dispositioning of each web page.
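The idea of examining each returned page for Auxiliary Terms can be sketched as a simple count of occurrences. This is only a minimal illustration under the assumption that the page has already been downloaded and decoded to text; the function name `count_auxiliary_terms` is hypothetical, and how LanguageFerret actually processes pages is not described here.

```python
def count_auxiliary_terms(page_text, auxiliary_terms):
    """Count non-overlapping occurrences of each auxiliary term in a page.

    Matching is case-insensitive for scripts that have case (casefold leaves
    Arabic text unchanged). The page is scanned once per term, so the work
    grows with the length of the auxiliary list.
    """
    text = page_text.casefold()
    return {term: text.count(term.casefold()) for term in auxiliary_terms}


# A made-up snippet of page text, used only to exercise the function.
page = "Patrick Clawson of the Washington Institute. patrick clawson. مهدي خلجي"
auxiliary = ["Patrick Clawson", "مهدي خلجي", "Ahmad Ali"]
counts = count_auxiliary_terms(page, auxiliary)
```

Because every term requires its own pass over the page, a longer auxiliary list means more work per page, which mirrors the tradeoff described above.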
Without going into great detail here, several languages have more than one “encoding”. Two web
pages may look exactly alike on screen yet use different encodings, and if a page’s encoding is
not recognized or “seen”, that web page will be missed. LanguageFerret recognizes most, if not all, encodings.
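To make the encoding point concrete, the sketch below shows that the same Arabic word is stored as different bytes under UTF-8 and Windows-1256, and tries a short list of candidate encodings when decoding a page. The function name `decode_page` and the candidate list are assumptions for illustration, not a description of how LanguageFerret detects encodings.

```python
def decode_page(raw_bytes, candidates=("utf-8", "cp1256", "iso8859_6")):
    """Try each candidate encoding in order; return (text, encoding_name).

    Caveat: a single-byte codec such as cp1256 can decode bytes written in
    another encoding without raising an error, producing wrong text, so the
    order of candidates matters and this is only a rough heuristic.
    """
    for encoding in candidates:
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched cleanly: fall back to a lossy decode.
    return raw_bytes.decode("utf-8", errors="replace"), None


word = "واشنطن"
utf8_bytes = word.encode("utf-8")
cp1256_bytes = word.encode("cp1256")

# Same visible text, different bytes: a byte-level search for the UTF-8
# form would miss a page stored as Windows-1256.
same_bytes = utf8_bytes == cp1256_bytes

text_a, enc_a = decode_page(utf8_bytes)
text_b, enc_b = decode_page(cp1256_bytes)
```

Once both pages are decoded to text, the same term comparison works on either one, which is why recognizing the encoding first determines whether a page is found or missed.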
The value of the resulting dataset, that is, the difference between a null or empty dataset and one
that abounds with pertinent URLs/web pages, depends on the skill of the user, just as it does
with any search engine. Hopefully this nutshell has sufficiently explained the gist and
uniqueness of LanguageFerret datasets.
For a further description and examples please go to this infochimps link:
The Primary Term(s) for the
and the Auxiliary Term(s) is/are:
Washington Institute Mehdi Khalaji Ahmad Ali Patrick Clawson
واشنطن انستيتوت مهدي خلجي أحمد علي باتريك كلوسن
Note: The file for this dataset is a text file but does not have the .txt file extension.
Creative Commons BY
Attribution 3.0 Unported
You are free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
With the understanding that:
- Waiver: Any of the above conditions can be waived if you get permission from the copyright holder.
- Other Rights: In no way are any of the following rights affected by the license:
  - Your fair dealing or fair use rights;
  - The author’s moral rights;
  - Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
- Notice: For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.