Dataset

Arabic Internet Footprint of the Carnegie Endowment For International Peace Middle East "Think Tank"

Added By LanguageFerret

This is one of a series of LanguageFerret datasets on Middle East “think tanks”, each measuring the
extent of a given organization’s native Arabic Internet “footprint”. The organization here is:

The Carnegie Endowment For International Peace
http://carnegieendowment.org

In a nutshell, the unique datasets by LanguageFerret work as follows:

LanguageFerret derives a true international Internet “footprint” for any combination of names,
terms and regular words, with unparalleled precision, speed and accuracy.

First, select the names, terms and regular words you would use when searching with the engine of
your choice (probably Google). They can be written in ANY language’s alphabet or character set
present on the Internet, in any combination. For example, the search terms for the Oakland Raiders
could be in Cyrillic (Russian), Arabic and Korean, an obscure but valid combination. LanguageFerret
calls these “Primary Terms”.

Next, from the search engine results, select an additional list of names, terms and regular words
that you want to look for in each of the top web pages (URLs) the engine returns. LanguageFerret
calls these “Auxiliary Terms”. As with Primary Terms, Auxiliary Terms can be in ANY language’s
alphabet, mixed and matched freely. They may include some, all or none of the Primary Terms, and
the Auxiliary list can be as long as the user wants, with one tradeoff: the longer the Auxiliary
list, the slower each web page is processed (“dispositioned”).
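The Primary/Auxiliary workflow above can be illustrated with a small Python sketch. It is not LanguageFerret’s implementation: it assumes the top search results have already been fetched and decoded into text (the search step itself is not shown), and the helpers `normalize`, `score_page` and `rank_pages`, as well as the example.com pages, are hypothetical. Each page is scored by counting Auxiliary Term hits, and pages are ranked by total hits.

```python
import unicodedata
from collections import Counter
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    # NFC makes composed and decomposed forms of a character compare equal;
    # casefold() gives case-insensitive matching for Latin/Cyrillic terms
    # (it leaves Arabic script unchanged).
    return unicodedata.normalize("NFC", text).casefold()


def score_page(page_text: str, auxiliary_terms: List[str]) -> Counter:
    """Count occurrences of each Auxiliary Term in one page (hypothetical helper).

    Plain substring matching: "peace" also matches inside "peaceful".
    Each term costs one pass over the page, so work grows with the list length.
    """
    text = normalize(page_text)
    counts = Counter()
    for term in auxiliary_terms:
        hits = text.count(normalize(term))
        if hits:
            counts[term] = hits
    return counts


def rank_pages(pages: Dict[str, str], auxiliary_terms: List[str]) -> List[Tuple[str, Counter]]:
    """Rank already-fetched pages (url -> text) by total Auxiliary Term hits."""
    scored = [(url, score_page(text, auxiliary_terms)) for url, text in pages.items()]
    return sorted(scored, key=lambda item: sum(item[1].values()), reverse=True)


# Hypothetical pages standing in for the top results of a Primary Term search.
pages = {
    "http://example.com/en": "Marina Ottaway of the Carnegie Endowment wrote about Gaza. GAZA again.",
    "http://example.com/ar": "مقال عن غزة",  # "an article about Gaza"
}
auxiliary_terms = ["Carnegie", "Gaza", "غزة", "Ottaway"]

ranked = rank_pages(pages, auxiliary_terms)
for url, counts in ranked:
    print(url, dict(counts))
```

Because every Auxiliary Term requires another pass over each page, doubling the list roughly doubles the work, which is the tradeoff described above.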

Without going into great detail here, many languages can be written in several different character
“encodings”. Two web pages may look exactly alike on screen, but if one page’s encoding is not
recognized or “seen”, that page will be missed. LanguageFerret recognizes most, if not all, encodings.
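A minimal Python sketch of why an unrecognized encoding causes a page to be missed, assuming a page saved in Windows-1256, a common legacy encoding for Arabic. The helper `decode_page` and its candidate order (strict UTF-8 first, then Windows-1256) are illustrative assumptions, not LanguageFerret’s method; a real crawler would also read the charset declared in the HTTP header or HTML `<meta>` tag and pass it as `declared`.

```python
import codecs
from typing import Optional, Tuple

# Assumed fallback order: strict UTF-8 first, then Windows-1256 (cp1256),
# a legacy encoding widely used for Arabic pages.
FALLBACK_ENCODINGS = ["utf-8", "cp1256"]


def decode_page(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw page bytes to text (hypothetical helper).

    Tries the declared charset (e.g. from an HTTP header or <meta> tag) first,
    then the fallbacks. Returns (text, encoding_used).
    """
    candidates = ([declared] if declared else []) + FALLBACK_ENCODINGS
    for enc in candidates:
        try:
            codecs.lookup(enc)  # raises LookupError for unknown charset labels
            return raw.decode(enc), enc
        except (LookupError, UnicodeDecodeError):
            continue  # try the next candidate
    # Defensive fallback if every candidate fails: keep the page, lossily.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


word = "غزة"  # "Gaza", one of this dataset's Arabic Auxiliary Terms
# Hypothetical page ("an article about Gaza") saved as Windows-1256.
page = "مقال عن غزة".encode("cp1256")

# Searching the raw bytes for the UTF-8 form of the term misses the page...
found_naively = word.encode("utf-8") in page
# ...but decoding the page first lets the term be found.
text, used = decode_page(page)
print(found_naively, used, word in text)
```

Windows-1256 can decode almost any byte sequence, so a successful decode is only a best guess; trying strict UTF-8 first avoids misreading genuine UTF-8 pages.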

The value of the resulting dataset, that is, the difference between an empty result and one that
abounds with pertinent URLs/web pages, depends on the skill of the user, just as with any search
engine. Hopefully this nutshell has sufficiently explained the gist and uniqueness of LanguageFerret
datasets.

For a fuller description and examples, please see this infochimps page:

http://infochimps.org/datasets/ex-oakland-raiders-jamarcus-russells-large-internet-footprint

The Primary Terms for the

Carnegie Endowment For International Peace

dataset are:

Carnegie endowment international peace

translated/transcribed into Arabic:

كارنيجي هبة دولي سلام

and the Auxiliary Terms are:

Carnegie endowment international peace Israel Palestine sanction Salem Paul Marina Ottaway Gaza

translated/transcribed into Arabic:

كارنيجي هبة دولي سلام إسرائيل فلسطين جزاء سالم بل مارينا وتطوي غزة