You will develop a short, 1,300-word document.
Task: Describe how OSINT can be used to supplement your organizational collection plan, and identify 10 sites that can be used to research sites/domains for:
- legitimacy
- sender verification
- list of domains (country codes/domains/extensions, organization domains, other IOC (indicators of compromise) that might need to be researched)
https://github.com/dthroner/Hacking/blob/master/Google%20Hacking%20For%20Penetration%20Testers-Syngress%20(2015).pdf
https://www.youtube.com/watch?v=6dRT5VEvIo8&feature=youtu.be
https://www.youtube.com/watch?v=WQo03MJG0m4
Understanding Metadata
What Is Metadata?
What Does Metadata Do?
Structuring Metadata
Metadata Schemes and Element Sets
  Dublin Core
  TEI and METS
  MODS
  EAD and LOM
  <indecs>, ONIX, CDWA, and VRA
  MPEG
  FGDC and DDI
Creating Metadata
Interoperability and Exchange of Metadata
Future Directions
More Information on Metadata
Glossary
Acknowledgements
Understanding Metadata is a revision and expansion of Metadata Made Simpler: A guide for libraries published by NISO Press in 2001. NISO Press extends its thanks and appreciation to Rebecca Guenther and Jacqueline Radebaugh, staff members in the Library of Congress Network Development and MARC Standards Office, for sharing their expertise and contributing to this publication.
About NISO
NISO, a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment. NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. NISO standards and information about NISO’s activities and membership are featured on the NISO website <http://www.niso.org>.
This booklet is available for free on the NISO website (www.niso.org) and in hardcopy from NISO Press.
Published by: NISO Press National Information Standards Organization 4733 Bethesda Avenue, Suite 300 Bethesda, MD 20814 USA Email: [email protected] Tel: 301-654-2512 Fax: 301-654-1721 URL: www.niso.org
Copyright © 2004 National Information Standards Organization ISBN: 1-880124-62-9
What Is Metadata?
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
The term metadata is used differently in different communities. Some use it to refer to machine understandable information, while others use it only for records that describe electronic resources. In the library environment, metadata is commonly used for any formal scheme of resource description, applying to any type of object, digital or non-digital. Traditional library cataloging is a form of metadata; MARC 21 and the rule sets used with it, such as AACR2, are metadata standards. Other metadata schemes have been developed to describe various types of textual and non-textual objects including published books, electronic documents, archival finding aids, art objects, educational and training materials, and scientific datasets.
There are three main types of metadata:
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are:
− Rights management metadata, which deals with intellectual property rights, and
− Preservation metadata, which contains information needed to archive and preserve a resource.
Metadata can describe resources at any level of aggregation. It can describe a collection, a single resource, or a component part of a larger resource (for example, a photograph in an article). Just as catalogers make decisions about whether a catalog record should be created for a whole set of volumes or for each particular volume in the set, so the metadata creator makes similar decisions. Metadata can also be used for description at any level of the information model laid out in the IFLA (International Federation of Library Associations and Institutions) Functional Requirements for Bibliographic Records: work, expression, manifestation, or item. For example, a metadata record could describe a report, a particular edition of the report, or a specific copy of that edition of the report.
Metadata can be embedded in a digital object or it can be stored separately. Metadata is often embedded in HTML documents and in the headers of image files. Storing metadata with the object it describes ensures the metadata will not be lost, obviates problems of linking between data and metadata, and helps ensure that the metadata and object will be updated together. However, it is impossible to embed metadata in some types of objects (for example, artifacts). Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Therefore, metadata is commonly stored in a database system and linked to the objects described.
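As a rough illustration of metadata embedded in an HTML document, the following Python sketch collects name/content pairs from <meta> tags using only the standard library. The sample page and the "DC."-prefixed names are assumptions made for this example; they are not taken from this booklet.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Repeated names (for example, several creators) are kept as a list.
            self.metadata.setdefault(name, []).append(content)


def extract_embedded_metadata(html_text):
    """Return a dict mapping each meta name to the list of its values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.metadata


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Metadata Demystified</title>
<meta name="DC.title" content="Metadata Demystified">
<meta name="DC.creator" content="Brand, Amy">
<meta name="DC.creator" content="Daly, Frank">
<meta name="DC.language" content="en">
</head><body></body></html>
"""

embedded = extract_embedded_metadata(SAMPLE_PAGE)
```

Because the metadata travels inside the page, it is available wherever the page is; the same values could equally be held in a separate database and linked to the page by an identifier.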
What Does Metadata Do?
An important reason for creating descriptive metadata is to facilitate discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.
Resource Discovery
Metadata serves the same functions in resource discovery as good cataloging does by:
• allowing resources to be found by relevant criteria;
• identifying resources;
• bringing similar resources together;
• distinguishing dissimilar resources; and
• giving location information.
Organizing Electronic Resources
As the number of Web-based resources grows exponentially, aggregate sites or portals are increasingly useful in organizing links to resources based on audience or topic. Such lists can be built as static webpages, with the names and locations of the resources “hardcoded” in the HTML. However, it is more efficient and increasingly more common to build these pages dynamically from metadata stored in databases. Various software tools can be used to automatically extract and reformat the information for Web applications.
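The difference between a hardcoded list and a page built dynamically from stored metadata can be sketched in a few lines of Python. The table layout, column names, and example.org URLs below are hypothetical and chosen only for this illustration.

```python
import sqlite3
from html import escape


def build_portal_page(connection, topic):
    """Render an HTML list of the resources recorded for one topic."""
    rows = connection.execute(
        "SELECT title, url FROM resources WHERE topic = ? ORDER BY title",
        (topic,),
    ).fetchall()
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in rows
    )
    return f"<h1>{escape(topic)}</h1>\n<ul>\n{items}\n</ul>"


# An in-memory database stands in for the metadata store (assumed schema).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE resources (title TEXT, url TEXT, topic TEXT)")
connection.executemany(
    "INSERT INTO resources VALUES (?, ?, ?)",
    [
        ("Dublin Core Metadata Initiative", "https://example.org/dcmi", "metadata"),
        ("Example Art Portal", "https://example.org/art", "art"),
        ("An Introduction to Metadata", "https://example.org/intro", "metadata"),
    ],
)

page = build_portal_page(connection, "metadata")
```

Adding a resource then means inserting one row of metadata; every page that draws on the table picks it up without any HTML being edited by hand.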
Interoperability
Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability. Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality. Using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly.
Two approaches to interoperability are cross-system search and metadata harvesting. The Z39.50 protocol is commonly used for cross-system search. Z39.50 implementers do not share metadata but map their own search capabilities to a common set of search attributes. A contrasting approach taken by the Open Archives Initiative is for all data providers to translate their native metadata to a common core set of elements and expose this for harvesting. A search service provider then gathers the metadata into a consistent central index to allow cross-repository searching regardless of the metadata formats used by participating repositories.
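The harvesting approach can be sketched as a crosswalk followed by a central index. The Python below is a simplified illustration only: the repository names, native field names, and crosswalk tables are invented for the example, and it does not implement the Open Archives Initiative protocol itself.

```python
# Hypothetical crosswalks: native field name -> common core element.
CROSSWALKS = {
    "repository_a": {"main_title": "title", "author": "creator", "issued": "date"},
    "repository_b": {"heading": "title", "maker": "creator", "lang": "language"},
}


def to_common_core(source, native_record):
    """Translate one native record into the common element set."""
    mapping = CROSSWALKS[source]
    common = {}
    for field, value in native_record.items():
        element = mapping.get(field)
        if element is None:
            continue  # no equivalent in the common set, so this detail is dropped
        common.setdefault(element, []).append(value)
    return common


def harvest(records_by_source):
    """Gather translated records from every provider into one index."""
    index = []
    for source, records in records_by_source.items():
        for record in records:
            index.append({"source": source, **to_common_core(source, record)})
    return index


def search(index, element, term):
    """Return index entries whose values for `element` contain `term`."""
    term = term.lower()
    return [
        entry for entry in index
        if any(term in value.lower() for value in entry.get(element, []))
    ]


central_index = harvest({
    "repository_a": [
        {"main_title": "Metadata Demystified", "author": "Brand, Amy", "shelf": "Q12"},
    ],
    "repository_b": [
        {"heading": "Understanding Metadata", "maker": "NISO Press", "lang": "en"},
    ],
})
hits = search(central_index, "title", "metadata")
```

The unmapped "shelf" field shows the cost of a common core: anything without an equivalent element does not reach the central index.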
Digital Identification
Most metadata schemes include elements such as standard numbers to uniquely identify the work or object to which the metadata refers. The location of a digital object may also be given using a file name, URL (Uniform Resource Locator), or some more persistent identifier such as a PURL (Persistent URL) or DOI (Digital Object Identifier). Persistent identifiers are preferred because object locations often change, making the standard URL (and therefore the metadata record) invalid. In addition to the actual elements that point to the object, the metadata can be combined to act as a set of identifying data, differentiating one object from another for validation purposes.
Archiving and Preservation
Most current metadata efforts center around the discovery of recently created resources. However, there is a growing concern that digital resources will not survive in usable form into the future. Digital information is fragile; it can be corrupted or altered, intentionally or unintentionally. It may become unusable as storage media and hardware and software technologies change. Format migration and perhaps emulation of current hardware and software behavior in future hardware and software platforms are strategies for overcoming these challenges.
Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements to track the lineage of a digital object (where it came from and how it has changed over time), to detail its physical characteristics, and to document its behavior in order to emulate it on future technologies.
Many organizations internationally have worked on defining metadata schemes for digital preservation, including the National Library of Australia, the British Cedars Project (CURL Exemplars in Digital Archives), and a joint Working Group of OCLC and the Research Libraries Group (RLG).
The latter group developed a framework outlining types of preservation metadata. A follow-up group, PREMIS (PREservation Metadata: Implementation Strategies), also sponsored by OCLC and RLG, is developing a set of core elements and strategies for the encoding, storage, and management of preservation metadata within a digital preservation system. Many of these initiatives are based on or compatible with the ISO Reference Model for an Open Archival Information System (OAIS).
Structuring Metadata
Metadata schemes (also called schema) are sets of metadata elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves is known as the semantics of the scheme. The values given to metadata elements are the content. Metadata schemes generally specify names of elements and their semantics. Optionally, they may specify content rules for how content must be formulated (for example, how to identify the main title), representation rules for content (for example, capitalization rules), and allowable content values (for example, terms must be used from a specified controlled vocabulary).
There may also be syntax rules for how the elements and their content should be encoded. A metadata scheme with no prescribed syntax rules is called syntax independent. Metadata can be encoded in any definable syntax. Many current metadata schemes use SGML (Standard Generalized Mark-up Language) or XML (Extensible Mark-up Language). XML, developed by the World Wide Web Consortium (W3C), is a simplified subset of SGML that, unlike HTML with its fixed tag set, allows for locally defined tag sets and the easy exchange of structured information. SGML is the parent standard of both HTML and XML and allows for the richest mark-up of a document. Useful XML tools are becoming widely available as XML plays an increasingly crucial role in the exchange of a variety of data on the Web.
Dublin Core Example
Title="Metadata Demystified"
Creator="Brand, Amy"
Creator="Daly, Frank"
Creator="Meyers, Barbara"
Subject="metadata"
Description="Presents an overview of metadata conventions in publishing."
Publisher="NISO Press"
Publisher="The Sheridan Press"
Date="2003-07"
Type="Text"
Format="application/pdf"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
Language="en"
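To make the separation of scheme and syntax concrete, the Dublin Core example record from this section could be encoded in XML along the following lines. This is a minimal Python sketch: the wrapper element name "record" is an assumption made for the illustration, while the namespace URI is the one DCMI publishes for the 15 core elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Element/value pairs taken from the Dublin Core example in this section.
RECORD = [
    ("title", "Metadata Demystified"),
    ("creator", "Brand, Amy"),
    ("creator", "Daly, Frank"),
    ("creator", "Meyers, Barbara"),
    ("subject", "metadata"),
    ("description", "Presents an overview of metadata conventions in publishing."),
    ("publisher", "NISO Press"),
    ("publisher", "The Sheridan Press"),
    ("date", "2003-07"),
    ("type", "Text"),
    ("format", "application/pdf"),
    ("identifier", "http://www.niso.org/standards/resources/Metadata_Demystified.pdf"),
    ("language", "en"),
]


def dublin_core_to_xml(pairs):
    """Serialize element/value pairs as namespaced XML child elements."""
    root = ET.Element("record")  # wrapper name is an assumption, not part of Dublin Core
    for element, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_to_xml(RECORD)
```

The same list of pairs could just as well be written out as HTML meta tags or as database rows; the scheme fixes the element names and their meaning, not the encoding.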
Metadata Schemes and Element Sets
Many different metadata schemes are being developed in a variety of user environments and disciplines. Some of the most common ones are discussed in this section.
Dublin Core
The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI).
The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources. Faced with a proliferation of electronic resources and the inability of the library profession to catalog all these resources, the goal was to define a few elements and some simple rules that could be applied by noncatalogers. The original 13 core elements were later increased to 15: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
The Dublin Core was developed to be simple and concise, and to describe Web-based documents. However, Dublin Core has been used with other types of materials and in applications demanding some complexity. There has historically been some tension between supporters of a minimalist view, who emphasize the need to keep the elements to a minimum and the semantics and syntax simple, and supporters of a structuralist view who argue for finer semantic distinctions and more extensibility for particular communities.
These discussions have led to a distinction between qualified and unqualified (or simple) Dublin Core. Qualifiers can be used to refine (narrow the scope of) an element, or to identify the encoding scheme used in representing an element value. The element Date, for example, can be used with the refinement qualifier created to narrow the meaning of the element to the date the object was created. Date can also be used with an encoding scheme qualifier to identify the format in which the date is recorded, for example, following the ISO 8601 standard for representing date and time.
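The two kinds of qualifier can be modeled as optional attributes on an element/value statement. The Python sketch below is illustrative only: the class, the "ISO8601" scheme label, and the date pattern (which covers just the YYYY, YYYY-MM, and YYYY-MM-DD forms and checks shape, not calendar validity) are assumptions, not part of the Dublin Core specification.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches YYYY, YYYY-MM, or YYYY-MM-DD; a small subset of ISO 8601.
ISO_8601_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


@dataclass
class Statement:
    element: str                      # one of the 15 elements, e.g. "date"
    value: str
    refinement: Optional[str] = None  # narrows the element, e.g. "created"
    scheme: Optional[str] = None      # names the encoding of the value

    def is_valid(self):
        """Check the value against its encoding scheme, if one is declared."""
        if self.scheme == "ISO8601":  # hypothetical label used in this sketch
            return ISO_8601_DATE.match(self.value) is not None
        return True                   # no scheme declared: any string is accepted

    def to_simple(self):
        """Drop the qualifiers, leaving an unqualified (simple) statement."""
        return Statement(self.element, self.value)


qualified = Statement("date", "2003-07", refinement="created", scheme="ISO8601")
free_text = Statement("date", "July 2003", refinement="created", scheme="ISO8601")
simple = qualified.to_simple()
```

A system that understands only simple Dublin Core can ignore the qualifiers and still treat the value as a Date, which is why the refinement narrows the meaning of the element rather than replacing it.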
All Dublin Core elements are optional and all are repeatable. The elements may be presented in any order. While the Dublin Core description recommends the use of controlled values for fields where they are appropriate (for example, controlled vocabularies for the Subject field), this is not required. However, working groups have been established to discuss authoritative lists for certain elements such as Resource Type. While Dublin Core leaves content rules to the particular implementation, the DCMI encourages the adoption of application profiles (domain-specific rules) for particular domains such as education and government. An application profile for libraries is being developed by the Libraries Working Group.
Because of its simplicity, the Dublin Core element set is now used by many outside the library community: researchers, museum curators, and music collectors, to name only a few. There are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data from the Internet; more than 50 of these have links on the DCMI website. The subjects range from cultural heritage and art to math and physics. Meanwhile the Dublin Core Metadata Initiative has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organization that describes itself as “dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for discovery systems.”
The Text Encoding Initiative (TEI)
The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities. In addition to specifying how to encode the text of a work, the TEI Guidelines for Electronic Text Encoding and Interchange also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition), a set of tags and rules defined in SGML syntax that describe the structure and elements of a document. This SGML mark-up becomes part of the electronic resource itself. Since the TEI DTD is rather large and complicated in order to apply to a vast range of texts and uses, a simpler subset of the DTD, known as TEI Lite, is commonly used in libraries.
It is assumed that TEI-encoded texts are electronic versions of printed texts. Therefore the TEI Header can be used to record bibliographic information about both the electronic version of the text and about the non-electronic source version. The basic bibliographic information is similar to that recorded in library cataloging and can be mapped to and from MARC. However, there are also elements defined to record details about how the text was transcribed and edited, how mark-up was performed, what revisions were made, and other non-bibliographic facts. Libraries tend to use TEI headers when they have collections of SGML-encoded full text. Some libraries use TEI headers to derive MARC records for their catalogs, while others use MARC records as the basis for creating TEI header descriptions for the source texts.
Metadata Encoding and Transmission Standard (METS)
The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.
The metadata necessary for successful management and use of digital objects is both more extensive than and different from the metadata used for managing collections of printed works and other physical materials. Structural metadata is needed to ensure that separately digitized files (for example, different pages of a digitized book) are structured appropriately. Technical metadata is needed for information about the digitization process so that scholars may determine how accurate a reflection of the original the digital version provides. Other technical metadata is required for internal purposes in order to periodically refresh and migrate the data, ensuring the durability of valuable resources.
METS was originally an outgrowth of the Making of America II project, a digitization project of major research libraries that attempted to address these metadata issues, in part by providing an encoding format for metadata for textual and image-based works. The Digital Library Federation (DLF) built on that earlier work to create METS, a standard schema for providing a method for expressing and packaging together descriptive, administrative, and structural metadata for objects within a digital library. Expressed using the XML schema language, METS provides a document format for encoding the metadata necessary for management of digital library objects within a repository and for exchange between repositories.
Metadata in Action
An oral historian makes tape recordings of interviews with members of a particular ethnic group. Interviewees sign a paper release form giving intellectual property rights to the historian. Most interviewees grant permission to disseminate the interviews in print and electronically, but several restrict publication and dissemination until 25 years after death.
Information about each interview is kept in a database: Interviewer, Interviewee, Date, Place, etc. Each interview follows a questionnaire format. The questionnaire exists as a text file. The tapes, release forms, database, and text file are donated to a library that has a special collection focusing on the particular ethnic group.
The tapes are digitized. Since each interview runs over several tapes, technicians record structural metadata to keep component parts of each interview together. Technicians record administrative metadata such as file names, location of each interview in the files, equipment used, the methods of digitizing and assuring quality and completeness, file formats, etc. Different segments of this metadata allow the audio files to be automatically tracked, accessed, stored, refreshed, and migrated.
An archivist expands the database to include the persistent identifier of each interview, thereby linking the audio file to the descriptive metadata. The names of the data elements are revised to match Dublin Core terminology, including qualifiers used specifically for audio
(continued below)
A METS document contains seven major sections:
• METS Header – Contains metadata describing the METS document itself, including such information as creator, editor, etc.
• Descriptive Metadata – Points to descriptive metadata external to the METS document (for example, a MARC record in an OPAC or an Encoded Archival Description finding aid maintained on a webserver), or to internally embedded descriptive metadata, or both.
• Administrative Metadata – Provides information regarding how the files are created and stored, intellectual property rights, the original source object from which the digital library object derives, and the provenance of the files comprising the digital library object.
• File Section – Lists all files containing content that comprise the electronic versions of the digital object.
• Structural Map – Outlines a hierarchical structure for the digital library object and links the elements of that structure to content files and metadata that pertain to each element.
• Structural Links – Allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.
• Behavior – Associates executable behaviors with content in the METS object.
The METS header, file section, structural map, structural links, and behavior sections are defined within the METS schema. METS is less prescriptive about descriptive and administrative metadata, relying on extension schemas (externally developed metadata schemes) to provide specific elements. The METS Editorial Board has endorsed three descriptive metadata schemes: simple Dublin Core, MARCXML, and MODS (discussed below).
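The seven sections correspond to seven child elements of the METS root element. The Python sketch below builds an empty skeleton to show that layout. It is not a schema-valid METS document: required attributes and content (for example, the div inside the structural map) are deliberately left out, and the pairing of labels with tags is given only as an aid to reading.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Section label as used in the list above, paired with its METS element name.
SECTIONS = [
    ("METS Header", "metsHdr"),
    ("Descriptive Metadata", "dmdSec"),
    ("Administrative Metadata", "amdSec"),
    ("File Section", "fileSec"),
    ("Structural Map", "structMap"),
    ("Structural Links", "structLink"),
    ("Behavior", "behaviorSec"),
]


def build_mets_skeleton():
    """Return a <mets> root holding one empty element per major section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for _label, tag in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{tag}")
    return root


skeleton = build_mets_skeleton()
skeleton_xml = ET.tostring(skeleton, encoding="unicode")
```

In a real document the dmdSec and amdSec elements would either wrap records from an extension schema such as MODS or point to records held elsewhere, which is where the flexibility described above comes from.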
For technical metadata the METS website makes available schemas for text and digital still images. The latter standard is called MIX, Metadata for Images in XML Schema, and is based on a proposed NISO standard, Z39.87, Data Dictionary: Technical Metadata for Digital Still Images. Further work is in process on extension schemas for audio, video, and websites. Another current area of concentration for the METS development community is the creation of METS application profiles to give guidance regarding the creation of METS documents for particular object types.
Use of the METS schema is widespread. A list of implementation registries using METS, a tutorial, and other important information can be found on the METS website.
Metadata Object Description Schema (MODS)
The Metadata Object Description Schema (MODS) is a descriptive metadata schema that is a derivative of MARC 21 and intended to either carry selected data from existing MARC 21 records or enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than the numeric ones used in MARC 21 records. In some cases, it regroups elements from the MARC 21 bibliographic format. Like METS, MODS is expressed using the XML schema language.
Although the MODS standard can stand on its own, it may also complement other metadata formats. Because of its flexibility and use of XML, MODS may potentially be used as a Z39.50 Next Generation specified format, an extension schema to METS, a metadata set for harvesting, and for creating original resource metadata records in an XML syntax.
Metadata in Action (continued)
materials. Information on rights and permissions is entered.
An archivist creates an EAD finding aid for the audio collection using the database as the core. Portions of the questionnaire text file are incorporated as a rich source of subject keywords. A MARC record is derived from the EAD finding aid and added to OCLC and RLIN.
A webpage is created where researchers can access the finding aid, search the database, and listen to the audio files. Interviews coded as restricted are invisible to the search program until the date when they become open to the public. Administrative, structural, and descriptive metadata is created for the webpage to hold all the pieces together, allow them to be managed, and allow them to be accessed.
The library participates in a metadata harvesting protocol to provide extracts of local metadata in a common format to a service provider so that information about the collection is automatically included in a number of relevant tools such as catalogs and portals.
The webpage is linked to the library’s website dedicated to resources about the ethnic group, where it is available to researchers in context with archival and visual materials, digitized secondary sources, etc. Administrative, structural, and descriptive metadata at the website level has also been created.
Rich description of electronic resources is a particular focus of MODS, which provides some advantages over other metadata schemes. MODS elements are richer than the Dublin Core; its elements are more compatible with library data than the ONIX or Dublin Core standards; and it is simpler to apply than the full MARC 21 bibliographic format. With its use of XML Schema language, MODS offers enhancements over MARC 21, such as the use of an optional ID attribute to facilitate linking at the element level; the ability to specify language, script, and transliteration scheme at the element level; and the ability to embed a rich description of components in the relatedItem element.
A MODS Record Example
<mods>
  <titleInfo>
    <title>Metadata demystified</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place>
      <placeTerm type="text">Bethesda, MD</placeTerm>
    </place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
The ability in MODS to give granular descriptions of constituent parts of an object works particularly well with the METS structural map for complex digital library objects.
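Because MODS uses language-based tags, a record such as the MODS example in this section can be read with ordinary XML tools. The Python sketch below pulls a few fields out of that record. It assumes the record is written without an XML namespace, as in the printed example; MODS records in production are normally namespaced, and the element paths would then need the namespace added.

```python
import xml.etree.ElementTree as ET

# The MODS record example from this section, with whitespace normalized.
MODS_RECORD = """
<mods>
  <titleInfo><title>Metadata demystified</title></titleInfo>
  <name type="personal">
    <namePart type="family">Brand</namePart>
    <namePart type="given">Amy</namePart>
    <role><roleTerm authority="marcrelator" type="text">author</roleTerm></role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <dateIssued>2003</dateIssued>
    <place><placeTerm type="text">Bethesda, MD</placeTerm></place>
    <publisher>NISO Press</publisher>
  </originInfo>
  <identifier type="isbn">1-880124-59-9</identifier>
</mods>
"""


def summarize_mods(xml_text):
    """Extract a flat summary from a single, non-namespaced MODS record."""
    root = ET.fromstring(xml_text.strip())
    name = root.find("name")
    family = name.findtext("namePart[@type='family']")
    given = name.findtext("namePart[@type='given']")
    return {
        "title": root.findtext("titleInfo/title"),
        "name": f"{family}, {given}",
        "role": name.findtext("role/roleTerm"),
        "issued": root.findtext("originInfo/dateIssued"),
        "publisher": root.findtext("originInfo/publisher"),
        "isbn": root.findtext("identifier[@type='isbn']"),
    }


summary = summarize_mods(MODS_RECORD)
```

The type attributes on namePart and identifier do work that MARC 21 assigns to numeric tags and subfield codes, which is what makes the paths readable without a tag table.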
The Encoded Archival Description (EAD)
The Encoded Archival Description (EAD) was developed as a way of marking up the data contained in finding aids so that they can be searched and displayed online.
In archives and special collections, the finding aid is an important tool for resource description. Finding aids differ from catalog records by being much longer, more narrative and explanatory, and highly structured in a hierarchical fashion. They generally start with a description of the collection as a whole, indicating what types of materials it contains and why they are important. If the collection consists of the personal papers of an individual there can be a lengthy biography of that person. The finding aid describes the series into which the collection is organized (such as correspondence, business records, personal papers, and campaign speeches) and ends with an itemization of the contents of the physical boxes and folders comprising the collection.
Like the TEI Header, the EAD is defined as an SGML DTD. It begins with a header section that describes the finding aid itself (for example, who wrote it) and then goes on to the description of the collection as a whole and successively more detailed information about the records or series within the collection. If individual items being described exist in digital form, the EAD can include pointers to the digital objects. The 2002 version of the EAD DTD provides support for both SGML and XML through the use of defined “switches” for turning off features used only in SGML and turning on features used only in XML. The EAD standard is maintained jointly by the Library of Congress and the Society of American Archivists.
The EAD is particularly popular in academic libraries, historical societies, and museums with large special collections. Many of these collections contain unique materials unavailable elsewhere and often the materials in the collections are not individually cataloged like traditional library materials. By creating searchable EAD finding aids, libraries and archives can increase awareness of their unique collections to the Internet community.
Learning Object Metadata
The IEEE Learning Technology Standards Committee (LTSC) developed the Learning Object Metadata (LOM) standard (IEEE 1484.12.1-2002) to enable the use and re-use of technology-supported learning resources such as computer-based training and distance learning. The LOM defines the minimal set of attributes to manage, locate, and evaluate learning objects. The attributes are grouped into eight categories:
• General, containing information about the object as a whole;
• Lifecycle, containing metadata about the object’s evolution;
• Technical, with descriptions of the technical characteristics and requirements;
• Educational, containing the educational / pedagogical attributes;
• Rights, describing the intellectual property rights and use conditions;
• Relation, identifying related objects;
• Annotation, containing comments and the date and author of the comments; and
• Classification, which identifies other classification system identifiers for the object.
Within each category is a hierarchy of data elements to which the metadata values are assigned. Examples of learning-related metadata elements found in the Educational category are Typical Age Range (of the intended user), Difficulty, Typical Learning Time, and Interactivity Level.
The IMS Global Learning Consortium has developed a suite of specifications to enable interoperability in a learning environment. Their Meta-Data Information Model specification is based on the IEEE LOM scheme with only minor modifications.
E-Commerce – <indecs> and ONIX
Metadata schemas are increasingly being developed to support electronic commerce applications. The <indecs> Framework (Interoperability of Data in E-Commerce Systems) was an international collaborative effort supported by the European Commission’s Info 2000 Programme. The collaborators were major rights owners, such as publishers and members of the recording industry, who wanted to develop a framework for metadata standards to support network commerce in intellectual property.
The foundation of the <indecs> work is a data model for intellectual property and its transfer. Rather than developing a new metadata scheme, <indecs> sought to develop a common framework that allows schemes for transactions in different genres, such as music, journal articles, and books, to interchange information, particularly information related to intellectual property rights. To support this common framework, <indecs> has developed a minimal kernel of required metadata.
Several organizations have built on the <indecs> Framework to develop specific metadata schemas. Among them is the ONIX (Online Information Exchange) International standard. ONIX is an XML-based metadata scheme developed by publishers under the auspices of a number of book industry trade groups in the United States and Europe. The original ONIX specification was a direct response to the enormous growth in online book sales and the realization that books described with images, cover blurbs, reviews, and similar information significantly outsold books without this information. Therefore ONIX for Books has elements to record a wide range of evaluative and promotional information as well as basic bibliographic and trade data. ONIX for Serials is in development to define serials product metadata at the title, item, and subscription package levels.
While ONIX information was designed for use in the commerce cycle of a publication, it may also provide a source for enrichment of library-created catalog records; the Bibliographic Enrichment Advisory Team (BEAT) project at the Library of Congress is experimenting with this use. ONIX metadata may also be used by libraries in the future for the creation of a beginning bibliographic record. Mappings between ONIX for Books and both MARC 21 and UNIMARC have already been created.
Visual Objects – CDWA and VRA
Metadata used to describe visual objects such as a painting or sculpture has its own special requirements. The Art Information Task Force (AITF) developed a conceptual framework for describing and accessing information about objects and images called Categories for the Description of Works of Art (CDWA). Some 30 categories were defined, most with multiple subcategories. Examples of the specialized descriptive elements for artworks include Orientation, Dimensions, Condition, Inscriptions, Conservation Treatment, and Exhibition / Loan History.
Typically, visual resources collections used in teaching art history and similar subjects do not contain original art works but rather slides or photographs of the original art. Metadata for these materials therefore has to accommodate the description of multiple levels of related resources, such as an original painting, a slide of the painting, and a digitized image of the slide. The VRA Core Categories build on and expand the CDWA work to define a single metadata element set that can be used to describe the work (the actual painting, photograph, sculpture, building, etc.) as well as the images (visual representations) of it.
Version 3.0 of the VRA Core Categories consists of 17 metadata elements which can be used as applicable to describe each of these versions and relate them to each other: Record Type, Type, Title, Measurements, Material, Technique, Creator, Date, Location, ID Number, Style/Period, Culture, Subject, Relation, Description, Source, and Rights. Like the Dublin Core, the VRA Core scheme does not specify any particular syntax or rules for representing content.
Both CDWA and VRA emphasize the use of controlled vocabularies for specified elements. A number of existing vocabularies are suggested and communities are encouraged to develop additional vocabularies as needed.
MPEG Multimedia Metadata
The ISO/IEC Moving Picture Experts Group (MPEG) has developed a suite of standards for coded representation of digital audio and video. Two of the standards address metadata: MPEG-7, Multimedia Content Description Interface (ISO/IEC 15938), and MPEG-21, Multimedia Framework (ISO/IEC 21000).
MPEG-7 defines the metadata elements, structure, and relationships that are used to describe audiovisual objects including still pictures, graphics, 3D models, music, audio, speech, video, or multimedia collections. It is a multi-part standard that addresses:
• Description Tools including Descriptors that define the syntax and the semantics of each metadata element and Description Schemes that specify the structure and semantics of the relationships between the elements.
• A Description Definition Language to define the syntax of the Description Tools, allow the creation of new Description Schemes, and allow the extension and modification of existing Description Schemes.
• System tools, to support storage and transmission, synchronization of descriptions with content, and management and protection of intellectual property.
Descriptors for visual and audio are defined separately using a hierarchy of elements and sub-elements. For visual objects there are descriptors for Basic Structure, Color, Texture, Shape, Motion, Localization, and Face Recognition. Audio descriptors are divided into two categories: low-level descriptors that are common to audio objects across most applications, and high-level descriptors that are specific to particular applications of audio. The cross-application low-level descriptors cover Structures and Features (temporal and spectral). The domain-specific high-level descriptors include such elements as Musical Instrument Timbre, Melody Description, and Spoken Content Description.
The Description Schemes are based on XML, and can be expressed in textual form suitable for editing, searching, filtering, and human readability; or in a binary form for storage, transmission, and streaming delivery. Since the full description of a multimedia object can be quite complex, the standard provides for a Summary Description Scheme geared to browsing and navigation.
The standard envisions that search engines could use MPEG-7 metadata descriptions to identify audiovisual objects in entirely new ways, such as digitizing a musical phrase played on a keyboard and then retrieving a list of musical pieces that contain the sequence of notes; drawing some lines on an electronic drawing tablet and retrieving images with similar graphics; or using a voice excerpt to retrieve related speech files, photographs, video clips, and biographical information of the speaker. These retrieval mechanisms are outside the scope of MPEG-7, but the standards developers wanted to accommodate these futuristic capabilities and have included many interoperability requirements beyond the typical metadata elements.
MPEG-21 was developed to address the need for an overarching framework to ensure interoperability of digital multimedia objects. The multi-part standard is not yet fully completed but is intended to include the following:
• Part 1: Vision, Technologies and Strategy provides the overview of the complete vision and plan for the framework. It was issued as an ISO technical report (ISO/IEC TR 21000-1:2001) and is available as a free download from ISO’s publicly available standards website. A second edition of the vision document is underway to address comments and suggestions received from other organizations following the initial publication.
• Part 2: Digital Item Declaration, issued in 2003, describes a model for defining Digital Items. It includes a description of the syntax and semantics of each of the Digital Item Declaration elements and a corresponding XML schema.
• Part 3: Digital Item Identification, also issued in 2003, describes how to uniquely identify Digital Items and how to link Digital Items with related information such as descriptive metadata.
• Part 4: Intellectual Property Management and Protection is still in development. It is intended to define the framework for ensuring interoperability of intellectual property management tools, including authentication, and accommodates the Rights information defined in the following two parts.
• Part 5: Rights Expression Language, issued in 2004, is a machine-readable language that can declare rights and permissions.
• Part 6: Rights Data Dictionary is still in development. It will define a standard set of terms to be used with the Rights Expression Language. It is also expected to include specifications for mapping and transforming rights metadata terminology. The Rights Data Dictionary and Expression Language are being viewed as models for the handling of intellectual property metadata for applications beyond audiovisual.
• Part 7: Digital Item Adaptation, also in development, is intended to standardize networking and interoperability description tools. Included in this part will be User Characteristic description tools that specify user preferences.
There are some seven additional parts identified and in various stages of development that deal with technical interoperability issues of less specific relevance to metadata. All of the published parts are available from ISO as ISO/IEC 21000-[part#].
Metadata for Datasets
Metadata schemes for datasets are enabling original data in the science and social science fields to be shared in a way that was never possible before the Internet. One of the most well-developed element sets is the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM), officially known as FGDC-STD-001-1998.
Geospatial datasets include topographic and demographic data, GIS (geographic information systems), and computer-aided cartography base files. They are used in a wide variety of areas, including soil and land use studies, biodiversity counts, climatology and global change tracking, remote sensing, and satellite imagery. The FGDC Content Standard is required for use with resources created and funded by the U.S. Government and is also being used by many state governments.
An international standard, ISO 19115, Geographic information - Metadata, was issued in 2003. A technical amendment that will allow datasets to be both ISO and FGDC compliant is underway along with an implementation model that can be used in conjunction with an XML schema.
A metadata scheme becoming well established in the social and behavioral sciences is the Data Documentation Initiative (DDI) standard for describing social science datasets. The DDI is defined as an XML DTD, and allows for top-down hierarchical description of a social science study, the data files resulting from that study, and the variables used in the data files. There is also a header area that uses Dublin Core elements for a high-level description of the DDI document itself.
Extensions and Profiles
Despite the recent development of many of these metadata schemes, most have already been subject to the changes brought about by implementing them in real world situations. These modifications are of two types: extensions and profiles.
An extension is the addition of elements to an already developed scheme to support the description of an information resource of a particular type or subject or to meet the needs of a particular interest group. Extensions increase the number of elements.
Profiles are subsets of a scheme that are implemented by a particular interest group. Profiles can constrain the number of elements that will be used, refine element definitions to describe the specific types of resources more accurately, and specify values that an element can take.
In practice, many applications use both extensions and profiles of base metadata schemes. For example, the National Biological Information Infrastructure (NBII) has developed a Biological Data Profile of the FGDC Content Standard for use with biological information resources. The profile defines an extended set of data for describing biological data, such as the taxonomic name of the organism and its classification in the taxonomic hierarchy.
The U.S. Department of Education’s Gateway to Educational Materials (GEM) project has based its own metadata scheme on the Dublin Core. The GEM profile limits the Dublin Core elements that can be used (for example, Contributor is not allowed) and makes some elements mandatory. GEM also defines additional elements such as Audience, Grade, Quality, and Standards, extending the base Dublin Core set for educational use.
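The combination of a profile and an extension can be sketched as set operations on a base element set. The example below is a minimal illustration modeled on the GEM description above; the function name and the choice of Title as a mandatory element are assumptions, since the text does not say which elements GEM makes mandatory.

```python
# The fifteen elements of the simple Dublin Core set.
DUBLIN_CORE = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def make_scheme(base, excluded=(), added=(), mandatory=()):
    """Derive an element set from a base scheme.

    excluded  -- base elements the profile does not allow
    added     -- extension elements not present in the base
    mandatory -- elements every record must supply
    """
    allowed = (set(base) - set(excluded)) | set(added)
    missing = set(mandatory) - allowed
    if missing:
        raise ValueError(f"mandatory elements not in scheme: {sorted(missing)}")
    return {"allowed": allowed, "mandatory": set(mandatory)}


# A GEM-like scheme. The mandatory element is an assumption for illustration.
gem_like = make_scheme(
    DUBLIN_CORE,
    excluded={"Contributor"},
    added={"Audience", "Grade", "Quality", "Standards"},
    mandatory={"Title"},
)
```

The profile step removes or constrains elements and the extension step adds them, which is why the resulting set can be both narrower and wider than the base scheme at the same time.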
Metadata in Action
A county land planner is studying the impact of new zoning laws on a particular bird species. The study team is composed of an ecologist, hydrologist, civil engineer, and environmental protection specialist.
Remote sensing data for the last 20 years provides a trend analysis of the decrease in wetlands, the bird’s habitat. These datasets have FGDC metadata. The biologists on the study team need to document the results of a field inventory. Using a biological profile to extend the FGDC element set, the biologists add the genus-species name and taxonomic hierarchy. The ecologists are concerned with collection methods and modeling tools. The data related to the changes in human population are documented using a metadata set developed by the Census Bureau.
This study results in a technical report which is assigned Dublin Core metadata by the author. When the technical report is cataloged into the organization’s repository, the Dublin Core elements are used as the basis for automatic generation of a MARC cataloging record. This record is enhanced by the cataloger and included in the library’s online public access catalog.
Creating Metadata
Who creates metadata? The answer to this varies by discipline, the resource being described, the tools available, and the expected outcome, but it is almost always a cooperative effort.
Much basic structural and administrative metadata is supplied by the technical staff who initially digitize or otherwise create the digital object, or is generated through an automated process. For descriptive metadata, it is best in some situations if the originator of the resource provides the information. This is particularly true in the documentation of scientific datasets where the originator has significant understanding of the rationale for the dataset and the uses to which it could be put, and for which there is little if any textual information from which an indexer could work.
However, many projects have found that it is more efficient to have indexers or other information professionals create the descriptive metadata, because the authors or creators of the data do not have the time or the skills. In other cases, a combination of researcher and information professional is used. The researcher may create a skeleton, completing the elements that can be supplied most readily. Then results may be supplemented or reviewed by the information specialist for consistency and compliance with the schema syntax and local guidelines.
Creation Tools
Many metadata project initiatives have developed tools and made them available to others, sometimes for free. A growing number of commercial software tools are also becoming available. Creation tools fall into several categories:
• Templates allow a user to enter the metadata values into pre-set fields that match the element set being used. The template will then generate a formatted set of the element attributes and their corresponding values.
• Mark-up tools will structure the metadata attributes and values into the specified schema language. Most of these tools generate XML or SGML Document Type Definitions (DTD). Some templates include such a mark-up as part of their final translation of the metadata.
• Extraction tools will automatically create metadata from an analysis of the digital resource. These tools are generally limited to textual resources. The quality of the metadata extracted can vary significantly based on the tool’s algorithms as well as the content and structure of the source text. These tools should be considered as an aid to creating metadata. The resulting metadata should always be manually reviewed and edited.
• Conversion tools will translate one metadata format to another. The similarity of elements in the source and target formats will affect how much additional editing and manual input of metadata may be required.
Metadata tools are generally developed to support specific metadata schemas or element sets. The websites for the particular schema will frequently have links to relevant toolsets.
Metadata Quality Control
The creation of metadata automatically or by information originators who are not familiar with cataloging, indexing, or vocabulary control can create quality problems. Mandatory elements may be missing or used incorrectly. Schema syntax may have errors that prevent the metadata from being processed correctly. Metadata content terminology may be inconsistent, making it difficult to locate relevant information.
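Two of these problems, missing mandatory elements and inconsistent terminology, lend themselves to automated checks. The sketch below is one possible approach and does not reproduce any particular tool; the function name, the rule sets, and the sample record are all hypothetical. Schema syntax checking is not covered here.

```python
def check_record(record, mandatory, vocabularies):
    """Return a list of human-readable problems found in one record.

    record       -- dict mapping element name to a value
    mandatory    -- element names that must be present and non-empty
    vocabularies -- dict mapping element name to its set of allowed terms
    """
    problems = []
    for element in sorted(mandatory):
        if not str(record.get(element, "")).strip():
            problems.append(f"missing mandatory element: {element}")
    for element, allowed in vocabularies.items():
        value = record.get(element)
        if value is not None and value not in allowed:
            problems.append(f"{element}: '{value}' is not a controlled term")
    return problems


# Hypothetical rules and record, for illustration only.
MANDATORY = {"Title", "Creator"}
VOCABULARIES = {"Type": {"Text", "Image", "Sound", "Dataset"}}

record = {"Title": "Wetlands Survey", "Type": "picture"}
problems = check_record(record, MANDATORY, VOCABULARIES)
```

Returning a list of problems rather than failing on the first one lets a reviewer correct a record in a single pass.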
The Framework of Guidance for Building Good Digital Collections, available on the NISO website, articulates six principles applying to good metadata:
• Good metadata should be appropriate to the materials in the collection, users of the collection, and intended, current and likely use of the digital object.
• Good metadata supports interoperability.
• Good metadata uses standard controlled vocabularies to reflect the what, where, when and who of the content.
• Good metadata includes a clear statement on the conditions and terms of use for the digital object.
• Good metadata records are objects themselves and therefore should have the qualities of archivability, persistence, unique identification, etc. Good metadata should be authoritative and verifiable.
• Good metadata supports the long-term management of objects in collections.
There are a number of ongoing efforts for dealing with the metadata quality challenge:
• Metadata creation tools are being improved with such features as templates, pick lists that limit the selection in a particular field, and improved validation rules.
• Software interoperability programs that can automate the “crosswalk” between different schemas are continuously being developed and refined.
• Content originators are being formally trained in understanding metadata and controlled vocabulary concepts and in the use of metadata-related software tools.
• Existing controlled vocabularies that may have initially been designed for a specific use or a narrow audience are getting broader use and awareness. For example, the Content Types and Subtypes originally defined for MIME email exchange are commonly used as the controlled list for the Dublin Core Format element.
• Communities of users are developing and refining audience-specific metadata schemas, application profiles, controlled vocabularies, and user guidelines. The MODS User Guidelines are a good example of the latter.
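The reuse of MIME content types as a controlled list can be illustrated with Python's standard `mimetypes` module, which maps file extensions to MIME types. This is a minimal sketch; the helper name `dc_format_for` is an assumption, and guessing from a file name is only a starting point that a person should review.

```python
import mimetypes


def dc_format_for(filename):
    """Guess a Dublin Core Format value (a MIME type) from a file name.

    Returns None when the extension is not recognized, so a person can
    supply the value instead of recording a wrong one.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


print(dc_format_for("Metadata_Demystified.pdf"))
```

Because the values come from an existing shared vocabulary, records produced by different tools can agree on the Format element without any coordination between them.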
Interoperability and Exchange of Metadata
Some people ask: Do we need so many metadata standards? With all the metadata standards, initiatives, extensions, and profiles, how can interoperability be ensured?
It is important to remember that different schemes serve distinct needs and audiences. Complementary schemes can be used to describe the same resource for multiple purposes and to serve a number of user groups. For example, a technical report could have a MARC metadata set in a library’s online catalog, an FGDC description as part of the National Spatial Data Infrastructure Clearinghouse Mechanism, and an embedded set of Dublin Core elements.
The Resource Description Framework (RDF), developed by the World Wide Web Consortium (W3C), is a data model for the description of resources on the Web that provides a mechanism for integrating multiple metadata schemes. In RDF a namespace is defined by a URL pointing to a Web resource that describes the metadata scheme that is used in the description. Multiple namespaces can be defined, allowing elements from different schemes to be combined in a single resource description. Multiple descriptions, created at different times for different purposes, can also be linked to each other. RDF is generally expressed in XML.
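The way namespaces let elements from different schemes sit in one description can be sketched with Python's standard `xml.etree.ElementTree` module. In this minimal example the RDF and Dublin Core namespace URLs are the published ones, while the `ex` namespace, its `reviewStatus` element, and the `describe` function are hypothetical and stand in for a second scheme.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
EX_NS = "http://example.org/local-scheme/"  # hypothetical second scheme

# Register prefixes so the serialized XML uses readable names.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("ex", EX_NS)


def describe(resource_url, dc_values, ex_values):
    """Build one rdf:Description mixing elements from two namespaces."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_url})
    for name, value in dc_values.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    for name, value in ex_values.items():
        ET.SubElement(desc, f"{{{EX_NS}}}{name}").text = value
    return root


root = describe(
    "http://www.niso.org/standards/resources/Metadata_Demystified.pdf",
    {"title": "Metadata Demystified", "format": "application/pdf"},
    {"reviewStatus": "checked"},  # hypothetical local element
)
xml_text = ET.tostring(root, encoding="unicode")
```

Each element carries the namespace of the scheme that defines it, so a consumer that only understands Dublin Core can ignore the `ex` elements without misreading them.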
Metadata Crosswalks
The interoperability and exchange of metadata is further facilitated by metadata crosswalks. A crosswalk is a mapping of the elements, semantics, and syntax from one metadata scheme to those of another.
A crosswalk allows metadata created by one community to be used by another group that employs a different metadata standard. The degree to which these crosswalks are successful at the individual record level depends on the similarity of the two schemes, the granularity of the elements in the target scheme compared to that of the source, and the compatibility of the content rules used to fill the elements of each scheme.
Crosswalks are important for virtual collections where resources are drawn from a variety of sources and are expected to act as a whole, perhaps with a single search engine applied. While these crosswalks are key, they are also labor intensive to develop and maintain. The mapping of schemes with fewer elements (less granularity) to those with more elements (more granularity) is problematic.
Table 1 shows a crosswalk between Dublin Core, EAD, and MARC 21 for selected elements. In this case, there is no attempt to map at the content level.
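An element-level crosswalk of this kind can be sketched as a lookup table. The mappings below are taken from Table 1; the function name and the sample record are assumptions for illustration, and only the first MARC 21 field listed for the author is used. As in the table, values are copied unchanged and no content-level mapping is attempted.

```python
# Element-level mappings taken from Table 1 (Dublin Core -> EAD, MARC 21).
# Only the first MARC 21 field listed for Creator is used here.
CROSSWALK = {
    "Title": {"EAD": "<titleproper>", "MARC 21": "245 00$a"},
    "Creator": {"EAD": "<author>", "MARC 21": "700 1#$a"},
    "Date.Created": {"EAD": "<unitdate>", "MARC 21": "260 ##$c"},
}


def crosswalk(record, target):
    """Relabel a Dublin Core record with the target scheme's elements.

    Returns (mapped, unmapped): values are copied unchanged, and source
    elements with no mapping are reported rather than silently dropped.
    """
    mapped, unmapped = {}, []
    for element, value in record.items():
        target_element = CROSSWALK.get(element, {}).get(target)
        if target_element is None:
            unmapped.append(element)
        else:
            mapped[target_element] = value
    return mapped, unmapped


dc_record = {
    "Title": "Metadata Demystified",
    "Creator": "Brand, Amy",
    "Subject": "metadata",  # no mapping in this small table
}
mapped, unmapped = crosswalk(dc_record, "MARC 21")
```

Reporting unmapped elements makes the loss of information visible, which matters most when the two schemes differ in granularity.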
A Dublin Core description represented in RDF
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.niso.org/standards/resources/Metadata_Demystified.pdf">
    <dc:title>Metadata Demystified</dc:title>
    <dc:creator>Brand, Amy</dc:creator>
    <dc:creator>Daly, Frank</dc:creator>
    <dc:creator>Meyers, Barbara</dc:creator>
    <dc:subject>metadata</dc:subject>
    <dc:description>Presents an overview of metadata conventions in publishing.</dc:description>
    <dc:publisher>NISO Press</dc:publisher>
    <dc:publisher>The Sheridan Press</dc:publisher>
    <dc:date>2003-07</dc:date>
    <dc:format>application/pdf</dc:format>
  </rdf:Description>
</rdf:RDF>
Metadata Registries
Registries are an important tool for managing metadata. Metadata registries can provide information on the definition, origin, source, and location of data. Registration can apply at many levels, including schemes, usage profiles, metadata elements, and code lists for element values. The metadata registry provides an integrating resource for legacy data, acts as a lookup tool for designers of new databases, and documents each data element.
Registries can also document multiple schemes or element sets, particularly within a specific field of interest. A good example is the U.S. Environmental Protection Agency’s Environmental Data Registry that provides information about thousands of data elements used in current and legacy EPA databases.
Standards relevant to metadata registries include ISO/IEC 11179, Specification and Standardization of Data Elements, and ANSI X3.285, Metamodel for the Management of Shareable Data.
Future Directions
Most early metadata standards have focused on the descriptive elements needed for discovery, identification, and retrieval. As metadata initiatives developed, administrative metadata, especially in the rights and preservation areas, was further emphasized. Technical metadata is one area that still does not get much attention in metadata schemas. The effective exchange and use of the digital objects described by the metadata often requires knowledge of specific technical aspects of an object beyond its filename and type. Newer standards are beginning to address this need. The NISO/AIIM standard, Z39.87, Data Dictionary - Technical Metadata for Digital Still Images, focuses solely on the technical data needed to facilitate interoperability between systems of digital image files. The metadata elements defined in the standard cover basic image parameters such as compression and color profile, information about the equipment and settings used to create the image, and performance assessment data such as sampling frequency and color maps.
Metadata work is ongoing across a number of standards development organizations. In the International Organization for Standardization (ISO), a subcommittee of Technical Committee (TC) 46 (Information and documentation) is addressing metadata development for bibliographic applications. ISO TC 211 (Geographic information / Geomatics) is developing metadata standards for applications in geographic information systems. The Data management and interchange subcommittee of ISO-IEC JTC1 (Information technology) is developing standards for the specification and management of metadata and has recently issued a technical report on Procedures for achieving metadata registry content consistency (ISO/IEC 20943).
Many organizations that developed metadata specifications outside the formal standards community are seeking to have their specifications turned into international standards. The Dublin Core is an example of this approach. It was originally developed in 1995 at a workshop sponsored by OCLC and the National Center for Supercomputing Applications. In 2001, it became an official ANSI/NISO standard (Z39.85) and in 2003 Dublin Core was issued as an international standard (ISO 15836).
The World Wide Web Consortium’s (W3C) metadata activity has been incorporated into the Semantic Web, their initiative to “provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” The RDF framework is one of the key enabling standards. The Semantic Web efforts are directed to standards that increase the interoperability of metadata, rather than specific metadata schemas.
The World Wide Web has created a revolution in the accessibility of information. The development and application of metadata represents a major improvement in the way information can be discovered and used. New technologies, standards, and best practices are continually advancing the applications for metadata. The resources in the following section will give you a head start in tracking developments and contain links to more information on the projects discussed throughout this document.
                     | Dublin Core  | EAD           | MARC 21
Title Element        | Title        | <titleproper> | 245 00$a (Title Statement/Title proper)
Author Element       | Creator      | <author>      | 700 1#$a (Added Entry-Personal Name) (with $e=author); 720$a (Added Entry-Uncontrolled Name/Name) (with $e=author)
Date Created Element | Date.Created | <unitdate>    | 260 ##$c (Date of publication, distribution, etc.)

Table 1. Example of Metadata Crosswalk Mapping
More Information on Metadata
General Resources
Digital Libraries: Metadata Resources (IFLA) http://www.ifla.org/II/metadata.htm
A Framework of Guidance for Building Good Digital Collections http://www.niso.org/framework/forumframework.html
Introduction to Metadata: Pathways to Digital Information by Murtha Baca http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html
Metadata: Cataloging by Any Other Name by Jessica Milstead and Susan Feldman ONLINE, January 1999 http://www.onlinemag.net/OL1999/milstead1.html
Metadata and Its Application by Brad Eden Library Technology Reports (September-October 2002)
Metadata Demystified: A Guide for Publishers by Amy Brand, Frank Daly, Barbara Meyers NISO Press & The Sheridan Press, 2003, ISBN 1-880125-49-9 http://www.niso.org/standards/resources/Metadata_Demystified.pdf
Metadata Fundamentals for All Librarians by Priscilla Caplan ALA, 2003, ISBN: 0-8389-0847-0
Metadata Information Clearinghouse Interactive (MICI) http://www.metadatainformation.org
Metadata Portals and Multi-standard Projects by Candy Schwartz http://web.simmons.edu/~schwartz/meta.html
Metadata Primer – A “How To” Guide on Metadata Implementation [for digital spatial data] by David Hart and Hugh Phillips http://www.lic.wisc.edu/metadata/metaprim.htm
Metadata Principles and Practicalities by Erik Duval, Wayne Hodgins, Stuart Sutton, and Stuart L. Weibel D-Lib Magazine 8(4) (April 2002) http://www.dlib.org/dlib/april02/weibel/04weibel.html
Metadata Resources (UKOLN) http://www.ukoln.ac.uk/metadata/resources
Metadata Standards http://www.chin.gc.ca/English/Standards/metadata_intro.html
Metadata Standards, Crosswalks, and Standards Organizations http://staff.library.mun.ca/staff/toolbox/standards.htm
Metadata.net – Projects, Tools & Services, and Schema Registry (Australia) http://metadata.net/
Preservation Metadata for Digital Objects: A Review of the State of the Art. A White Paper by the OCLC/RLG Working Group on Preservation Metadata, January 31, 2001 www.oclc.org/research/projects/pmwg/presmeta_wp.pdf
Schemes, Initiatives, and Related Sites
Application profiles: mixing and matching metadata schemas by Rachel Heery and Manjula Patel, Ariadne, Issue 25, September 2000 http://www.ariadne.ac.uk/issue25/app-profiles/intro.html
The Cedars Project (CURL exemplars in digital archives) http://www.leeds.ac.uk/cedars/metadata.html
CDWA (Categories for the Description of Works of Art) http://www.getty.edu/research/conducting_research/standards/cdwa/
DDI (Data Documentation Initiative) http://www.icpsr.umich.edu/DDI/
DOI (Digital Object Identifier) http://www.doi.org/
Dublin Core Metadata Initiative (DCMI) http://dublincore.org
EAD (Encoded Archival Description) http://www.loc.gov/ead/
Environmental Data Registry (EPA) http://www.epa.gov/edr/
FGDC Content Standard for Digital Geospatial Metadata (CSDGM) http://www.fgdc.gov/metadata/
Gateway to Educational Materials (GEM) http://www.geminfo.org/
IFLA Functional Requirements for Bibliographic Records http://www.ifla.org/VII/s13/frbr/frbr.htm
IMS Global Learning Consortium http://www.imsglobal.org
<indecs> interoperability of data in ecommerce systems http://www.indecs.org/
LOM (Learning Object Metadata) http://ltsc.ieee.org/wg12/
MARC 21 (Machine-Readable Cataloging) http://www.loc.gov/marc
MetaWeb Project http://www.dstc.edu.au/Research/Projects/metaweb/
METS (Metadata Encoding and Transmission Standard) http://www.loc.gov/standards/mets/
MIX (Metadata for Images in XML Schema) http://www.loc.gov/standards/mix/
MODS (Metadata Object Description Schema) http://www.loc.gov/standards/mods/
MPEG (Moving Picture Experts Group) http://www.chiariglione.org/mpeg/
NBII (National Biological Information Infrastructure) http://www.nbii.gov/
Nordic Metadata Projects http://www.lib.helsinki.fi/meta/
NSDI (National Spatial Data Infrastructure) http://www.fgdc.gov/nsdi/
OAI (Open Archives Initiative) http://www.openarchives.org/
OAIS (Open Archival Information System) http://www.ccsds.org/documents/650x0b1.pdf
ONIX (Online Information Exchange) http://www.editeur.org/onix.html
Open GIS Consortium http://www.opengis.org/
PADI (Preserving Access to Digital Information) http://www.nla.gov.au/padi/topics/32.html
PREMIS (PREservation Metadata: Implementation Strategies) http://www.oclc.org/research/projects/pmwg
PURL (Persistent Uniform Resource Locator) http://purl.org
RDF (Resource Description Framework) http://www.w3.org/RDF/
SCHEMAS: Forum for Metadata Schema Implementors (UKOLN) http://www.ukoln.ac.uk/metadata/schemas/
TEI (Text Encoding Initiative) http://www.tei-c.org/
VRA (Visual Resources Association) Core Categories http://www.vraweb.org/vracore3.htm
XML (Extensible Markup Language) http://www.w3.org/XML/
Z39.50 http://www.loc.gov/z3950/agency/
ZING (Z39.50 Next Generation) http://www.loc.gov/z3950/agency/zing/zing-home.html
Crosswalks and Lists of Crosswalks
All about Crosswalks http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm
Dublin Core / MARC / GILS Crosswalk http://www.loc.gov/marc/dccross.html
FGDC to MARC http://www.alexandria.ucsb.edu/public-documents/metadata/fgdc2marc.html
Issues in Crosswalking Content Metadata Standards by Margaret St. Pierre and William P. LaPlant, Jr. http://www.niso.org/press/whitepapers/crsswalk.html
MARC 21 to Dublin Core http://www.loc.gov/marc/marc2dc.html
Metadata: Mapping between Metadata Formats (UKOLN) http://www.ukoln.ac.uk/metadata/interoperability/
Metadata Mappings (Crosswalks) http://libraries.mit.edu/guides/subjects/metadata/mappings.html
Metadata Standards Crosswalk (Getty) http://www.getty.edu/research/conducting_research/standards/intrometadata/3_crosswalks/crosswalk1.html
Metadata Standards Crosswalks (Canadian Heritage Information Network) http://www.chin.gc.ca/English/Standards/metadata_crosswalks.html
Metadata Registries & Clearinghouses
DCMI Registry Working Group http://dublincore.org/groups/registry/
DESIRE Metadata Registry http://desire.ukoln.ac.uk/registry/
Environmental Data Registry http://www.epa.gov/edr/
FGDC Clearinghouse Registry http://registry.gsdi.org/
MICI (Metadata Information Clearinghouse Interactive) http://www.metadatainformation.org/
NBII Metadata Clearinghouse http://metadata.nbii.gov/
The SCHEMAS Registry http://www.schemas-forum.org/registry/
Tools for Metadata Creation
DDI Tools http://www.icpsr.umich.edu/DDI/users/tools.html#a01
Dublin Core tools http://dublincore.org/tools/
FGDC Metadata Tools http://www.nbii.gov/datainfo/metadata/tools/
Metadata Software Tools http://ukoln.bath.ac.uk/metadata/software-tools/
OAI-Specific Tools http://www.openarchives.org/tools/tools.html
RDF Editors and Tools http://www.ilrt.bris.ac.uk/discovery/rdf/resources/#sec-tools
TEI Software http://www.tei-c.org/Software/index.html
Glossary
AACR2 (Anglo-American Cataloging Rules) – A standard set of rules for cataloging library materials. The “2” refers to the second edition.
administrative metadata – metadata related to the use, management, and encoding processes of digital objects over a period of time. Includes the subsets of technical metadata, rights management metadata, and preservation metadata.
ANSI (American National Standards Institute) – administers and coordinates the U.S. voluntary standardization and conformity assessment system.
CDWA (Categories for the Descriptions of Works of Art) – a metadata element set for describing artworks.
crosswalk – a mapping of the elements, semantics, and syntax from one metadata scheme to another.
CSDGM (Content Standard for Digital Geospatial Metadata) – a metadata standard developed by the FGDC. Officially known as FGDC-STD-001.
dataset – a collection of computer-readable data records.
DC (Dublin Core) – a general metadata element set for describing all types of resources.
DDI (Data Documentation Initiative) – a specification for describing social science datasets.
descriptive metadata – metadata that describes a work for purposes of discovery and identification, such as creator, title, and subject.
DLF (Digital Library Federation) – a membership organization dedicated to making digital information widely accessible.
DOI (Digital Object Identifier) – a unique identifier assigned to electronic objects of intellectual property which can be resolved to the object’s location on the Internet.
DTD (Document Type Definition) – a formal description in SGML or XML syntax of the structure (elements, attributes, and entities) to be used for describing the specified document type.
EAD (Encoded Archival Description) – a metadata scheme for collection finding aids.
element set – information segments of the metadata record, often called semantics or content.
encoding rules – the syntax or prescribed order for the elements contained in the metadata description.
extension – an element that is not officially part of a metadata scheme, which is defined for use with that scheme for a particular application.
FGDC (Federal Geographic Data Committee) – a U.S. Federal government interagency committee responsible for developing the National Spatial Data Infrastructure.
GEM (Gateway to Educational Materials) – a U.S. Department of Education initiative that has defined an extension to the Dublin Core element set to accommodate educational resources.
GIS (Geographic Information System) – a computer system for capturing, managing, and displaying data related to positions on the Earth’s surface.
HTML (Hypertext Mark-up Language) – a set of tags and rules derived from SGML used to create hypertext documents for the World Wide Web. Officially, a W3C Recommendation.
<indecs> (Interoperability of Data in E-Commerce Systems) – a framework for metadata to support commerce in intellectual property.
interoperability – the ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to exchange and share data.
ISO (International Organization for Standardization) – the primary international standards development organization.
IEC (International Electrotechnical Commission) – an international standards development organization for all electrical, electronic and related technologies. Co-sponsors with ISO the Joint Technical Committee 1 on Information Technology.
LOM (Learning Object Metadata) – a metadata scheme for technology-supported learning resources.
MARC 21 (MAchine Readable Cataloging) – a formatting, record structure, and encoding standard for electronic bibliographic cataloging records developed by the Library of Congress. The “21” refers to the version of MARC issued in 1998 that integrated the U.S. and Canadian versions of MARC.
MARCXML – a metadata scheme for working with MARC data in an XML environment.
metadata – structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.
metadata harvesting – a technique for extracting metadata from individual repositories and collecting it in a central catalog.
METS (Metadata Encoding and Transmission Standard) – a metadata scheme for complex digital library objects.
MODS (Metadata Object Description Schema) – a metadata scheme for rich description of electronic resources.
MPEG (Moving Picture Experts Group) – Subcommittee 29, Working Group 11 of ISO/IEC JTC1, which develops standards for digital audio and video. Also refers to a suite of standards developed by the group.
namespace – in RDF, a way to tie a specific use of a metadata element to the scheme where the intended definition is to be found.
NISO (National Information Standards Organization) – a standards development organization, accredited by the American National Standards Institute, that develops library and information-related standards.
ONIX (Online Information Exchange) – a metadata scheme for book bibliographic, trade, and promotional data.
preservation metadata – a form of administrative metadata dealing with the provenance of a resource and its archival management.
profile – a subset of a scheme defined and used by a particular interest group to customize the scheme for its purposes.
PURL (Persistent URL) – a naming and resolution system developed by OCLC utilizing an intermediate redirection service to locate a resource’s URL.
qualifier – an optional sub-element to a Dublin Core element that is used to further refine the element or support a specific encoding scheme.
RDF (Resource Description Framework) – a language for representing metadata about Web resources so it can be exchanged between applications without loss of meaning. Officially, a suite of W3C specifications.
registry – a formal system for the documentation of the element sets, descriptions, semantics, and syntax of one or more metadata schemes.
rights management metadata – a form of administrative metadata dealing with the intellectual property rights of a resource.
scheme (schema)– a metadata element set and rules for using it.
semantics – the names and meanings of metadata elements.
SGML (Standard Generalized Markup Language) – a language used to mark-up electronic documents with tags that define the relationship between the content and the structure. Officially, international standard ISO 8879, Information processing—Text and office systems—Standard Generalized Markup Language (SGML).
structural metadata – metadata that indicates how compound objects are structured, provided to support use of the objects.
syntax – rules for how metadata elements and their content are encoded.
technical metadata – a form of administrative metadata dealing with the creation or storage encoding processes or formats of the resource.
TEI (Text Encoding Initiative) – a metadata scheme for electronic text.
URL (Uniform Resource Locator) – A unique address for identifying and locating a resource on the Internet.
VRA (Visual Resources Association) Core – a metadata scheme for describing a visual work and its representations.
W3C (World Wide Web Consortium) – an international consortium that develops consensus protocols and specifications to ensure the interoperability of the World Wide Web.
XML (Extensible Mark-up Language) – an application profile of SGML designed for use in Web applications. Officially, a W3C Recommendation.
Z39.50 – a NISO and ISO standard protocol for cross-system search and retrieval. Officially, international standard, ISO 23950, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, and ANSI/NISO standard Z39.50.
Support the leaders in our community who support NISO as Voting Members:
3M
American Association of Law Libraries
American Chemical Society
American Library Association
American Society for Information Science and Technology
American Society of Indexers
American Theological Library Association
ARMA International
Armed Forces Medical Library
Art Libraries Society of North America
AIIM International
Association of Information and Dissemination Centers
Association of Jewish Libraries
Association of Research Libraries
Auto-Graphics, Inc.
Barnes & Noble, Inc.
Book Industry Communication
California Digital Library
Cambridge Information Group
Checkpoint Systems, Inc.
College Center for Library Automation
Colorado State Library
CrossRef
Davandy, L.L.C.
Docutek Information Systems
Dynix Corporation
EBSCO Information Services
Elsevier Science Inc.
Endeavor Information Systems, Inc.
Entopia, Inc.
ExLibris USA
Fretwell-Downing Informatics
Gale Group
Geac Library Solutions
GIS Information Systems, Inc.
H.W. Wilson Company
Helsinki University Library
Index Data
Infotrieve
Innovative Interfaces, Inc.
Institute for Scientific Information
The International DOI Foundation
Ithaka/JSTOR/ARTstor
John Wiley & Sons, Inc.
KINS, Inc.
Library Binding Institute
Library of Congress
The Library Corporation
Los Alamos National Laboratory
Lucent Technologies
Medical Library Association
MINITEX
Modern Language Association
Motion Picture Association of America
MuseGlobal, Inc.
Music Library Association
National Agricultural Library
National Archives and Records Administration
National Federation of Abstracting and Information Services
National Library of Medicine
National Security Agency
Nylink
OCLC, Inc.
Openly Informatics, Inc.
ProQuest Information and Learning
Random House, Inc.
Recording Industry Association of America
The Research Libraries Group
SAGE Publications
Serials Solutions, Inc.
SIRSI Corporation
Society for Technical Communication
Society of American Archivists
Special Libraries Association
Synapse Corporation
TAGSYS, Inc.
Talis Information Ltd.
Triangle Research Libraries Network
U.S. Department of Commerce, NIST, Office of Information Services
U.S. Department of Defense, DTIC (Defense Technical Information Center)
U.S. Department of Energy, Office of Scientific & Technical Information
U.S. Government Printing Office
U.S. National Commission on Libraries and Information Science
VTLS, Inc.
WebFeat
ISBN 1-880124-62-9
- What is Metadata?
- What Does Metadata Do?
- Resource Discovery
- Organizing Electronic Resources
- Interoperability
- Digital Identification
- Archiving and Preservation
- Structuring Metadata
- Metadata Schemes and Element Sets
- Dublin Core
- Text Encoding Initiative (TEI)
- Metadata Encoding and Transmission Standard (METS)
- Metadata Object Description Schema (MODS)
- Encoded Archival Description (EAD)
- Learning Object Metadata (LOM)
- E-Commerce
- <indecs>
- ONIX
- Visual Objects
- Categories for the Description of Works of Art (CDWA)
- VRA Core Categories
- MPEG Multimedia Metadata
- Metadata for Datasets
- Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM)
- Data Documentation Initiative (DDI)
- Extensions and Profiles
- NBII Biological Data Profile
- Gateway to Educational Materials (GEM)
- Creating Metadata
- Creation Tools
- Metadata Quality Control
- Interoperability and Exchange of Metadata
- Resource Description Framework (RDF)
- Metadata Crosswalks
- Metadata Registries
- Future Directions
- More Information on Metadata
- General Resources
- Schemes, Initiatives, and Related Sites
- Crosswalks and Lists of Crosswalks
- Metadata Registries and Clearinghouses
- Glossary
- Sidebars and Tables
- Dublin Core Example
- Metadata in Action (1)
- MODS Record Example
- Metadata in Action (2)
- Dublin Core description represented in RDF
- Example of Metadata Crosswalk Mapping
OSINT Cheat-Sheet Investigative Resources – Summer 2019
INTELTECHNIQUES.com
Methodology | Preparation | Execution | Documentation

Pre-Operational Considerations
Time and Resource Constraints
Adversary Sophistication
Deliverables and Scope
Exposure/Risk Factors
Control Expectations
Communication and Sit-reps
Ethical and Legal Assessment

Workspace & Tools
Clean/Secure Workstation
Fresh Research Accounts
Collection Tools
Clean/Secure Connectivity
Clean Browser w/Extensions
Storage/Archiving Solution
Documentation System

Investigative Steps
Define The Question
Document Your “Knowns”
Knoll Your Tools
Set Up Collection
Query, Sweep, and Pivot
Consolidate Findings
Complete Reporting and Archive

OSINT Resources
OSINTFramework.com
Netbootcamp.org
Investigativedashboard.org
OSINTBrowser.com
Workinukraine.space
Start.me/p/b56xX8/osint

Tab Management
https://www.one-tab.com/ (Local Storage Only) Simple Tab Management/Export For Chrome and Firefox
https://chrome.google.com/webstore/detail/graphitabs/dcfclemgmkccmnpgnldhldjmflphkimp?hl=en GraphiTabs – Tree View of Tabs
http://tabsoutliner.com/ Tab Management – Outline Format, Export, Sync (Paid version)
http://www.gettoby.com/ (Account-Based w/Sync) Thumbnailed Tab Management For Chrome and Firefox
https://clusterwm.com/ Simple Tab Manager w/Export (Sync Premium Offered)
Useful Browser Extensions
https://www.onenote.com/clipper Screen Capture and Tag (One-Note Users Only)
https://github.com/ssborbis/ContextSearch-web-ext Context Menu Search Menu
https://github.com/az0/linkgopher/ Simple Link Extraction
https://getfireshot.com/ Screen Capture and Annotation (as image or pdf)
http://www.osintbrowser.com/ OSINT Bookmarks
https://github.com/mozilla/multi-account-containers#readme Firefox – Multi-Account Containers (Compartmentalization)
https://github.com/marklieberman/downloadstar Firefox – Download all items in a webpage that match a pattern
Link Analysis/Visualization
https://www.paterva.com/buy/maltego-clients.php Maltego CE and CaseFile
https://gephi.org/ http://www.automatingosint.com/blog/category/gephi/
https://medium.com/@raebaker/using-lampyre-for-basic-email-and-phone-number-osint-e0e36c710880 (Lampyre)
https://vis.occrp.org/ Create Link Charts – Organized Crime & Corruption Project
https://www.xmind.net/ Mind Mapping – Free and Paid Versions
My Workstation Setup
Workstation – Win 10, PIA/ProtonVPN, Chrome/Firefox, Vbox, Buscador/Kali, Nox/Geny, Hunch.ly, UC Cable/Mifi, KeePass, Malwarebytes, Glasswire
Email/Payments – ProtonMail, GMX, Fastmail, Blur, 33mail, Privacy.com, Vanilla Visa
Alt-Hardware: MacBook Air, Atom Text Editor, VMware Fusion, Chrome/Firefox, Little Snitch
Mobile – iPhone, MySudo, Signal, Wire – Android, burner, unlocked, on Mint sim kit
Office Software – Libre, OneNote, Notepad++, CherryTree, Standard Notes, Paper notebook, Teams/Slack/Mattermost/Rocket
Hypervisors: Virtualbox, Buscador Linux, Kali Linux, Genymotion, Nox
http://www.visualsitemapper.com/ Domain Mapping
https://www.draw.io/ https://github.com/michenriksen/drawio-threatmodeling
https://github.com/woj-ciech/Danger-zone Link IPs, Domains, and Email Addresses
https://www.mindmup.com/ Mind Mapping – Free and Paid Tiers
https://www.nodexlgraphgallery.org/Pages/Registration.aspx Powerful Graphing Client – Free and Paid Tiers
https://webrobots.io/ Scrape YP, Yelp, Ebay, Amazon, etc. Save as Excel or CSV
https://www.gettabli.com/ Simple, Private (offline-storage only) Tab Management
Google Operators Remember we can string multiple operators together
site: Limit results to those from a specific domain site:apple.com
“ ” Quotes indicate search for exact term “red rider BB gun”
AND Only show results for both terms apple AND orange
OR Search for term A, term B, or both. A pipe symbol is the same as OR. gun OR rifle is the same as gun | rifle
* Wildcard for words in a phrase that you don’t know wish * a star
( ) Group a set of words/operators separately (gun | pistol) ammo
- Exclude results including this word chicago baseball -cubs
$ Search for a certain price “apple watch” $299
cache: Most recent cached version of a domain cache:boston.gov
filetype: Only search for specific filetype, ext: works the same filetype:pdf “confidential” or ext:pdf “confidential”
related: Search for sites related to a domain related:sony.com
intitle: Find pages with a term in the page title intitle:sabotage
inurl: Find pages with a term in the url inurl:private
around(x) Find pages with terms within X words of each other microsoft AROUND(7) surface
info: Sometimes shows related pages, cache date etc. info:chicago.gov
Adv. Search https://www.google.com/advanced_search
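Because operators can be strung together, it can help to assemble queries programmatically so they are repeatable and easy to log. The following is a minimal Python sketch, not part of the original cheat sheet: the helper names `build_query` and `google_url` are hypothetical, and the `https://www.google.com/search?q=` pattern is the same one used in the Amazon user-name entry of this sheet.

```python
from urllib.parse import quote_plus


def build_query(*terms, site=None, filetypes=(), exclude=()):
    """Join free-text terms and search operators into one query string."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if filetypes:
        # Parentheses keep the OR limited to the file-type alternatives.
        alternatives = " OR ".join(f"filetype:{ft}" for ft in filetypes)
        parts.append(f"({alternatives})")
    # A leading minus excludes results containing the word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def google_url(query):
    """URL-encode the query so it can be pasted into a browser or a report."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_query(
    '"confidential"',
    site="example.com",          # example.com is a placeholder domain
    filetypes=("pdf", "docx"),
    exclude=("draft",),
)
print(query)
print(google_url(query))
```

Keeping the plain query string alongside the encoded URL makes it easier to document exactly what was searched.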
Bing Operators Most of the Google operators work in Bing
( ) Just like Google, terms or operators grouped in parentheses are processed together and separately from other conditions
OR All Bing searches are treated as AND searches unless you specify OR between terms goat OR pig OR cow
NOT Exclude results with a specific term(s) the – symbol also works boat NOT (raft OR ship)
loc: Return pages from a specific region(s) dogs (loc:GB OR loc:FR)
prefer: Weight results in favor of a term prefer:tomato plum apple
near:x Words in x proximity of each other red near:4 blue
ip: Finds sites hosted on an IP address ip:208.43.115.82
site/domain: Filter for specific domain type site/.gov confidential
feed: Finds RSS feeds based on search terms feed:osint
Bing Adv. MS retired Bing’s advanced search page; info: https://www.lifewire.com/bing-advanced-search-3482817
More Operators: https://ahrefs.com/blog/google-advanced-search-operators/
DuckDuckGo DuckDuckGo handles some operators a little differently
cats dogs Results about cats or dogs
"cats and dogs" Results for exact term "cats and dogs". If no results are found, we'll try to show related results.
cats +dogs More dogs in results
cats filetype:pdf PDFs about cats. Supported file types: pdf, doc(x), xls(x), ppt(x), html
dogs site:example.com Pages about dogs from example.com
cats -site:example.com Pages about cats, excluding example.com
intitle:dogs Page title includes the word "dogs"
inurl:cats Page url includes the word "cats"
Startpage Startpage makes Google requests on your behalf (privacy)
Operators Most standard Google operators work
Adv. Search https://www.startpage.com/en/advanced-search.html
Search Tips https://support.startpage.com/index.php?/Knowledgebase/List/Index/1
Yandex Most standard Boolean operators work (Google operators) such as site: and “quotes”
Adv. Search Click the icon in the search bar
lang: Language filter ccn lang:fr
mime: Similar to filetype mime:docx gdpr
date: Page modified date bombing date:20180416
url: Similar to site:, but adding a * to the end of the url pulls up any docs sharing that url Alice url:en.wikiquote.org/wiki/*
special operators: https://yandex.com/support/direct/keywords/symbols-and-operators.html
Baidu Most standard Google Operators work on Baidu
Adv. Search https://www.baidu.com/gaoji/advanced.html
In English http://www.baiduinenglish.com/
Search Tips https://www.seomandarin.com/baidu-search-tips.html
Other International Consider using a proxy or VPN to appear in the target region
Adv. Search https://www.alexa.com/topsites/countries
Colossus http://www.searchenginecolossus.com/
Occrp https://data.occrp.org/
Int. OSINT https://start.me/p/W2kwBd/sources-cnty
UK https://investigativedashboard.org/databases/
http://www.rba.co.uk/search/TopSearchTips.html
Twitter Don’t forget Google – “site:twitter.com keyword”
Advanced Search https://twitter.com/search-advanced
Toolset http://tweetbeaver.com/
User Report https://tinfoleak.com/
Analytics https://socialbearing.com/
Analytics https://analytics.mentionmapp.com/
Analytics https://foller.me
Analytics http://twiangulate.com/search/
Older Posts http://staringispolite.github.io/twayback-machine/
Search https://snapbird.org/
Followers https://doesfollow.com
Video https://twdown.net/
Visualization https://treeverse.app/
Profile Changes https://spoonbill.io/
Mapping https://onemilliontweetmap.com
Inteltechniques https://inteltechniques.com/menu/pages/twitter.tool.html
Legal Requests https://help.twitter.com/en/rules-and-policies/twitter-law-enforcement-support#19
Facebook Warning: Many of these tools may not function correctly as Facebook continues to kill graph search capability. https://www.vice.com/en_ca/article/zmpgmx/facebook-stops-graph-search
FB Expand http://com.hemiola.com/bookmarklet/
Messenger https://www.messenger.com/
Mobile View https://m.facebook.com/
FB Videos https://www.facebook.com/watch
Video Download https://www.fbdown.net/index.php
Video Download https://www.tubeninja.net/how-to-download/facebook
NetBootcamp http://netbootcamp.org/facebook.html (Warning: Netbootcamp.org does run tracking scripts)
Research Tools http://www.researchclinic.net/facebook/
User -> ID https://lookup-id.com/ (lookup-id.com runs some tracking scripts)
Graph Search https://inteltechniques.com/menu/pages/facebook.tool.html (Reminder FB Graph Is Broken as of 8/2019)
Graph Search http://socmint.tools/graph.htm
Graph Search https://peoplefindthor.dk/
Graph Search https://pitoolbox.com.au/facebook-tool/
Graph Search https://searchisback.com/
Graph Search https://whopostedwhat.com/
Graph Search https://www.uk-osint.net/facebook.html
Graph Search https://github.com/sowdust/searchbook
Graph Discussion https://inteltechniques.com/blog/2019/08/02/the-privacy-security-osint-show-episode-133/
Legal & Privacy https://www.facebook.com/safety/groups/law/guidelines
Reddit Don’t Forget Google – site:reddit.com keyword
Topic Search https://www.reddit.com/search?q=keyword
User Search https://www.reddit.com/user/username
Analytics https://pushshift.io/api-parameters/
Archives https://web.archive.org/web/*/https://www.reddit.com/user/username
Inteltechniques https://inteltechniques.com/menu/pages/communities.tool.html
TikTok https://www.tiktok.com
Search https://tiktokapi.ga/
Search https://www.osintcombine.com/tiktok-quick-search
How To iOS https://www.pageflows.com/post/ios/general-browsing/tiktok
How To Android https://www.wikihow.tech/Find-Friends-on-Tik-Tok-on-Android
Downloader https://en.savefrom.net/download-from-tiktok
Video Capture https://airmore.com/watch-tik-tok-pc.html
Legal Requests https://www.tiktok.com/en/law-enforcement
Instagram User/Tag Search https://www.yooying.com/search
User/Tag Search https://www.social-searcher.com/
Hashtag Search https://tagboard.com/
Analyze Followers https://hypeauditor.com/
Location Search https://www.osintcombine.com/instagram-explorer
Search https://mulpix.com/
Media Capture https://downloadgram.com/
Media Capture https://instasave.xyz/
Downloader https://www.4kdownload.com/products/product-stogram
Profile Pic https://instadp.net/
Profile Pic http://izuum.com/
Stories https://storiesig.com/
Image Search https://imgwonders.com/
User/Hashtag http://picdeer.com/
User/Hashtag https://www.pictame.com/
Inteltechniques https://inteltechniques.com/menu/pages/instagram.tool.html
Snapchat User Search https://somesnapcode.com/
User Search https://www.snapdex.com/
Loc Search https://map.snapchat.com
Loc Search https://sovip.io
https://storage.googleapis.com/snap-inc/privacy/lawenforcement.pdf
Site Archives Searching pre-existing archives or requesting a capture
Wayback Machine http://archive.org/web/
Archive Today http://archive.fo/
How To – Bellingcat https://www.bellingcat.com/resources/how-tos/2018/02/22/archive-open-source-materials/
How To – Tech.co https://tech.co/news/tools-to-help-you-search-the-archived-internet-2018-06
Mass Archive Script https://github.com/motherboardgithub/mass_archive
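Both approaches, searching existing captures and requesting a new one, come down to predictable Wayback Machine URLs. The following is a small Python sketch with hypothetical helper names; it only builds the URLs and makes no network requests. The wildcard form matches the Reddit "Archives" entry earlier in this sheet, while the `/save/` and timestamped forms are stated here as assumptions about the public Wayback Machine URL layout.

```python
from datetime import datetime

WAYBACK = "https://web.archive.org"


def wayback_listing_url(target):
    """Calendar view of every capture of the target ("*" means all dates)."""
    return f"{WAYBACK}/web/*/{target}"


def wayback_snapshot_url(target, when):
    """Capture closest to a point in time (timestamp format YYYYMMDDhhmmss)."""
    return f"{WAYBACK}/web/{when.strftime('%Y%m%d%H%M%S')}/{target}"


def wayback_save_url(target):
    """Opening this URL asks the Wayback Machine to capture the target now."""
    return f"{WAYBACK}/save/{target}"


page = "https://www.reddit.com/user/username"  # placeholder target
print(wayback_listing_url(page))
print(wayback_snapshot_url(page, datetime(2019, 8, 2, 12, 0, 0)))
print(wayback_save_url(page))
```

Recording the generated URLs in the case notes documents which captures were reviewed and when a new capture was requested.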
OSINT Resource Lists Collections curated by my favorite OSINT experts:
OSINT.Team https://osint.team/home (OSINT rocket chat group)
Ph055a https://github.com/Ph055a/OSINT-Collection#ph055as-osint-collection
Bellingcat ToolKit https://docs.google.com/document/d/1BfLPJpRtyq4RFtHJoNpvWQjmGnyVkfE2HYoICKOGguA/edit
Sprp77 https://drive.google.com/drive/folders/1CBcemFdorkAqJ-Sthsh67OVHgH4FQF05
Baywolf88 https://www.learnallthethings.net/osint-resources
Sector035 https://medium.com/@sector035
Justin Nordine https://osintframework.com/
Start.me’s: Technisette, Bruno Mortier, Emmanuelle Welch, Travis Birch
https://start.me/p/7kxL6K/search-engines
https://start.me/p/b56xX8/osint
https://start.me/p/gyXexK/dating-apps-and-sites
https://start.me/p/kx72n5/databases
https://start.me/p/rxeRqr/aml-toolbox
https://start.me/p/ZME8nR/osint
Reuser http://arnoreuser.com/osint-repertorium/
Phonexicum https://phonexicum.github.io/infosec/osint.html#tools
i-intelligence https://www.i-intelligence.eu/wp-content/uploads/2018/06/OSINT_Handbook_June-2018_Final.pdf
PI Links https://diligentiagroup.com/due-diligence/101-investigative-links-for-digging-up-information-on-people/
Photo/Image Search Reminder: we do not upload sensitive photos to the internet
Search/Reverse https://images.google.com/
Search/Reverse https://tineye.com
Search/Reverse https://www.bing.com/images/
Reverse Russia https://www.yandex.com/images/
Reverse Asia http://images.baidu.com/
Search http://www.picsearch.com/
Twitter Search http://twipho.net/
Flickr https://www.flickr.com/map
Exif http://exif.regex.info/exif.cgi
Edit Detection http://www.errorlevelanalysis.com/
Basic Forensics https://fotoforensics.com/
Text Recog. https://www.newocr.com/
Stolen Check www.stolencamerafinder.com/
Document Search Google “keyword AND ext:pdf OR ext:docx OR ext:txt OR ext:xlsx”
https://psbdmp.ws http://www.findpdfdoc.com/
http://cryptome.org https://www.base-search.net/
http://megasearch.co https://psbdmp.ws
Video Extension https://www.downloadhelper.net/
Youtube-DL https://github.com/ytdl-org/youtube-dl
Extension https://addons.mozilla.org/en-US/firefox/addon/video-downloader-profession/
Screen Capture https://www.techsmith.com/screen-capture.html
Video Archives https://archiving.witness.org/archive-guide/acquire/acquiring-raw-video-and-metadata/
Capture/Collection Tools Although not open-source, Hunch.ly remains my go-to safety-net & collection tool.
Hunch.ly https://hunch.ly/try-it-now https://hunch.ly/guides
Screen Capture Extension https://getfireshot.com/
Snip & Sketch https://www.microsoft.com/en-us/p/snip-sketch/9mz95kl8mr0l#activetab=pivot:overviewtab
Annotation https://www.diigo.com/
OneNote Clip https://www.onenote.com/clipper
Spiderfoot https://www.spiderfoot.net/
Documentation Tools Hunch.ly’s Report Builder Is Great To Build Off Of
OneNote https://www.onenote.com
Win Text Editor https://notepad-plus-plus.org/
Mac Text Editor https://atom.io/
Backnote https://chrome.google.com/webstore/detail/backnote/gcikdkpooobdlgkkimomdgochmclliek?hl=en-US
Paliscope https://www.paliscope.com (Free Standard Ed for LE)
Zotero https://www.zotero.org/
Private Notes https://app.standardnotes.org/
Office Alternative https://www.libreoffice.org/
Maps/Locations
https://www.google.com/maps
https://www.osintcombine.com/social-geo-lens
https://www.mapillary.com/
https://openstreetcam.org
https://ctrlq.org/maps/address/
https://livingatlas.arcgis.com/wayback/
https://www.gpsies.com/trackList.do
https://www.zillow.com/
Classifieds Ebay https://www.ebay.com/
Fatfingers http://fatfingers.com/default.aspx
Flippity http://www.flippity.com/
Kijiji https://www.kijiji.ca/
SearchAllJunk http://www.searchalljunk.com/
SearchTempest https://www.searchtempest.com/
NotiCraig https://noticraig.com/
Oodle https://www.oodle.com/local/burien-wa/
Offerup https://offerup.com/
Craigslist https://craigslist.org
Inteltechniques https://inteltechniques.com/menu/pages/communities.links.html
User Names Knowem https://knowem.com/checksocialnames.php?u=
NameChk https://namechk.com/
NameCheckr https://www.namecheckr.com/
NameVine https://namevine.com/
UserSearch https://usersearch.org/
UserSherlock http://usersherlock.com/
Profilr https://www.profilr.social/search/
Tinder https://www.gotinder.com/@user
Amazon https://www.google.com/search?q=site%3Aamazon.com+%22name%22
SocialCatfish https://socialcatfish.com/reverse-username-search/
WhatsMyName https://github.com/webbreacher/whatsmyname
Sherlock https://github.com/sherlock-project/sherlock
Inteltechniques https://inteltechniques.com/menu/index.html
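Several entries in this sheet are URL templates that end where the user name goes. As a rough illustration of how a known handle can be pivoted across them, the Python sketch below fills in three patterns copied from this sheet (the Reddit user page, the Tinder profile link, and the Knowem query). The function name `username_urls` is hypothetical, and whether each site still resolves these URLs is not verified here.

```python
from urllib.parse import quote

# URL templates taken from entries in this cheat sheet; {u} marks the user name.
PATTERNS = {
    "Reddit": "https://www.reddit.com/user/{u}",
    "Tinder": "https://www.gotinder.com/@{u}",
    "Knowem": "https://knowem.com/checksocialnames.php?u={u}",
}


def username_urls(username):
    """Return one candidate URL per site for the given user name."""
    # Percent-encode anything unusual so the handle cannot break the URL.
    safe = quote(username.strip(), safe="")
    return {site: pattern.format(u=safe) for site, pattern in PATTERNS.items()}


for site, url in username_urls("john_doe").items():  # placeholder handle
    print(f"{site}: {url}")
```

A hit on one of these URLs only shows that the handle exists on that site; linking it to the same person still needs corroboration.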
Real Name “People” search engines
TruePeopleSch https://www.truepeoplesearch.com/
Spokeo https://www.spokeo.com/
Thatsthem https://thatsthem.com/
Adv Background https://www.advancedbackgroundchecks.com/
Nuwber https://nuwber.com/
FamTreeNow https://www.familytreenow.com/
PeopleByNm http://www.peoplebyname.com/
UFind http://ufind.name/…
PublicRcrds https://publicrecords.directory/
GoLookup https://golookup.com/
PMR http://publicemailrecords.com/name listings
Radaris https://radaris.com/
Cubib https://cubib.com/
ComLullar http://com.lullar.com/
Yasni http://www.yasni.com/
ZabaSearch https://www.zabasearch.com/
Spytox https://www.spytox.com/
Intelius https://www.intelius.com/
ZoomInfo https://www.zoominfo.com/
Whoodle https://www.whoodle.com/
PeekYou https://peekyou.com/
Webmii http://webmii.com/
CvGadget https://cvgadget.com/
Classmates https://www.classmates.com/
192 (UK) https://www.192.com/
Inteltechniques https://inteltechniques.com/menu/pages/person.tool.html
Email Don’t Forget A Basic Google Search “[email protected]”
Hunter.io https://hunter.io/ (make a free account)
HIBP https://haveibeenpwned.com/ (may be premium soon)
Verify https://tools.verifyemailaddress.io/
Verifalia https://verifalia.com/validate-email
Mailtester http://www.mailtester.com/testmail.php
FindThatEmail http://findthat.email/
AnyMailFinder https://anymailfinder.com/
EmailMatcher https://emailmatcher.com/
ProspectLinked https://prospectlinked.com/#/home
MetricSparrow http://metricsparrow.com/toolkit/email-permutator/
ThatsThem https://thatsthem.com/reverse-email-lookup
Spokeo https://www.spokeo.com/email-search
PsbDmp https://psbdmp.ws/
HackedEmails https://hacked-emails.com/
OCCRP https://data.occrp.org/search?q=gmail.com
Dehashed https://dehashed.com/
Hashes.org https://hashes.org/leaks.php
Gravatar https://en.gravatar.com/site/check/[email protected]
ReverseGenie http://www.reversegenie.com/searching=email
ManyContacts https://www.manycontacts.com/en/mail-check
ComLullar http://com.lullar.com/
Inteltechniques https://inteltechniques.com/osint/menu.email.html
Basic Guide https://www.blurbiz.io/blog/the-most-complete-guide-to-finding-anyones-email
OSINT Flow Charts: https://www.dfir.training/osint
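Permutator tools such as the MetricSparrow entry above generate likely address formats from a name and a domain, which can then be checked with the verification services listed. The Python sketch below shows the general idea only; the function name and the particular set of patterns are assumptions chosen for illustration, not the method any listed tool uses.

```python
def email_permutations(first, last, domain):
    """Generate common address formats for a person at a given domain."""
    first, last = first.strip().lower(), last.strip().lower()
    f, l = first[0], last[0]
    local_parts = [
        first,
        last,
        f"{first}{last}",
        f"{first}.{last}",
        f"{first}_{last}",
        f"{f}{last}",
        f"{f}.{last}",
        f"{first}{l}",
        f"{last}{first}",
        f"{last}.{first}",
    ]
    # dict.fromkeys drops duplicates while preserving the original order.
    return [f"{lp}@{domain}" for lp in dict.fromkeys(local_parts)]


for address in email_permutations("Jane", "Doe", "example.com"):  # placeholder identity
    print(address)
```

The output is a list of candidates, not confirmed addresses; each one still has to be verified before it goes into a report.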
Domains/IPs
Censys https://censys.io
IntelX https://intelx.io
Domaintools https://www.domaintools.com/
CentralOps https://centralops.net/co/
Whoxy https://www.whoxy.com/
IPLocation https://www.iplocation.net/
DNSLytics https://dnslytics.com/reverse-ip
Randhome https://www.randhome.io/blog/2018/02/23/harpoon-an-osint-/-threat-intelligence-tool/
CrimeFlare http://crimeflare.org:82/
Spyonweb http://spyonweb.com/
Pub-DB http://pub-db.com/
Whoisology https://whoisology.com/
Visualping https://visualping.io/
WatchThatPage http://watchthatpage.com/
PentestTools https://pentest-tools.com/information-gathering/find-subdomains-of-domain#
SharedCount https://www.sharedcount.com/
SmallSEO https://smallseotools.com/backlink-checker/
SimilarWeb https://www.similarweb.com/
Alexa https://www.alexa.com/siteinfo/inteltechniques.com
Hunter.io https://hunter.io/
ViewDNS https://viewdns.info/
Robtex https://www.robtex.com/?=
Majestic https://majestic.com/
D-Me http://d-me.info/
Netcraft https://www.netcraft.com/
DomainBigData https://domainbigdata.com/
Inteltechniques https://inteltechniques.com/osint/domain.search.html
Inteltechniques https://inteltechniques.com/blog/2018/04/24/searching-subdomains-with-findsubdomains-com/
IP6Locator http://ipv6locator.net/
ViewDNS https://viewdns.info/
Maxmind https://www.maxmind.com/en/home
IP2Location https://www.ip2location.com/demo/
IPFingerprints https://www.ipfingerprints.com/
ThatsThem https://thatsthem.com/reverse-ip-lookup
Netbootcamp https://netbootcamp.org/websitetool.html
Shodan https://www.shodan.io/
Inteltechniques https://inteltechniques.com/menu/pages/ip.tool.html#
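The list above mixes domain tools (WHOIS, subdomains, backlinks) with IP tools (geolocation, reverse IP), so it helps to know which kind of indicator you are holding before choosing one. The following is a minimal Python sketch using only the standard library. The function name `classify_indicator` is an assumption for illustration, and the sketch does not validate that a domain actually exists.

```python
import ipaddress
from urllib.parse import urlparse


def classify_indicator(raw):
    """Return ("ip", value) or ("domain", value) for a pasted URL, host, email, or address."""
    value = raw.strip()
    # A bare IPv4 or IPv6 address parses directly
    try:
        return "ip", str(ipaddress.ip_address(value))
    except ValueError:
        pass
    # urlparse only fills in hostname when a scheme or "//" is present
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).hostname or ""
    try:
        return "ip", str(ipaddress.ip_address(host))
    except ValueError:
        return "domain", host.lower()


if __name__ == "__main__":
    for sample in ("https://inteltechniques.com/osint/", " 8.8.8.8 ", "example.com:8080/path"):
        print(sample.strip(), "->", classify_indicator(sample))
```

An "ip" result points toward the IP lookups in the list, and a "domain" result toward the WHOIS and subdomain tools.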
Phone Numbers For phone #s consider gov/paid options (OSINT is limited)
Zaba https://www.zabasearch.com/reverse-phone-lookup/
USPhoneBook https://www.usphonebook.com/
TruePeopleSearch https://www.truepeoplesearch.com/#
Whitepages+ https://whitepages.plus/
ThatsThem https://thatsthem.com/
TrueCaller https://www.truecaller.com/
Whitepages https://www.whitepages.com/reverse-phone | Reverse Phone Lookup
411 https://www.411.com/reverse-phone
CellRevealer https://www.cellrevealer.com/
FoneFinder http://www.fonefinder.net/
WhoCalld https://whocalld.com/
SpyDialer https://www.spydialer.com/
Searchbug https://www.searchbug.com/tools/
NumberGuru https://www.numberguru.com/phone/
ReverseGenie http://www.reversegenie.com/
YellowPages https://people.yellowpages.com/whitepages/?re=SP people search
Spokeo https://www.spokeo.com/reverse-phone-lookup
PhoneValidator https://www.phonevalidator.com/index.aspx
CallerIDTest https://www.calleridtest.com/
IMEI https://www.imei.info/
IMEI24 https://imei24.com/phone base/
Sync https://sync.me/
Infobel https://www.infobel.com/
DialingCode http://www.dialingcode.com/
OpenCnam https://www.opencnam.com/
TeleFoonGids https://telefoongids.2link.be/
ServiceObjects https://www.serviceobjects.com/developers/lookups/geophone-plus
WTNG http://www.wtng.info/index.html
SeanLawson https://www.seanlawson.net/2019/02/use-chrome-developer-tools-view-masked-phone-numbers-for-free-people-search/
NANPA https://www.nationalnanpa.com/enas/coCodeReportUnsecured.do?reportType=7
Inteltechniques https://inteltechniques.com/osint/menu.phone.html
Vehicles CarOwners https://carsowners.net
NICB https://www.nicb.org/vincheck
OReilly https://www.oreillyauto.com/
Carvana https://www.carvana.com/
CheckThatVIN https://checkthatvin.com/ctv#/home
CarFax https://www.carfax.com/processQuickVin.cfx
VehicleHistory https://www.vehiclehistory.com/license-plate-search
CarOwners https://carsowners.net/
Misc. Tools & Tricks Efficiency and Organizational Tools That I Use
Better Windows File Search
https://www.voidtools.com/
Synced Notes https://www.onenote.com
Encrypted Coms https://signal.org/
Encrypted Coms https://wire.com/en/
Encrypted Email https://protonmail.com/ (use the free tier for burner/seed accounts)
Hotkey Panel https://www.elgato.com/en/gaming/stream-deck
NAS/Local Cloud https://www.synology.com/en-us
Screen Capture https://www.techsmith.com/store/snagit
Screen Capture https://getfireshot.com/buy.php (pro supports multi-page PDF)
Paper Notebooks https://www.costco.com/Moleskine-Cahier-6-Pack-Extra-Large-Notebooks.product.100300742.html
Veracrypt https://www.youtube.com/watch?v=cxo8xosH TI Veracrypt containers are ideal for archiving cases or placing them on flash media for delivery to clients.
Tech Issues https://stackoverflow.com/ Aside from Googling your tech issues, Stack Overflow has discussions on just about any desktop or software issue.
Virtual Machines Follow written steps verbatim when installing VMs
Buscador https://inteltechniques.com/buscador/
Virtualbox https://www.virtualbox.org/wiki/Downloads
VBox Extensions
https://download.virtualbox.org/virtualbox/6.0.10/Oracle_VM_VirtualBox_Extension_Pack-6.0.10.vbox-extpack
Kali Linux https://www.kali.org/downloads/
Tails https://tails.boum.org/
Update Linux apt-get update && apt-get upgrade
Update Youtube-DL sudo -H pip install --upgrade youtube-dl
Common Error Make sure virtualization is enabled in BIOS settings
Host Key Win – Right Control Key Mac – Left Command Key
Vbox Scale Issues
host + f: switch to full-screen mode, if not already in it
host + c: switch into or out of scaled mode
host + f: switch back to normal size, if needed
3rd Party Overview https://www.youtube.com/watch?v=7Y fKC5EN10
LinkedIn site:linkedin.com inurl:pub -inurl:dir “at Microsoft” “Current”
site:linkedin.com “Real Name”
User Query https://gitlab.com/initstring/linkedin2username
Email Query https://github.com/pry0cc/GoogLinked
Breach Data https://archive.org/details/LIUsers.7z
Inteltechniques https://inteltechniques.com/menu/pages/linkedin.tool.html
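The two LinkedIn search-operator queries above can be assembled and URL-encoded programmatically when you need to run them for many names or employers. This is a hedged Python sketch: the function names `linkedin_dork` and `google_search_url` are hypothetical, and the operators are copied from the queries above, so results depend on how the search engine currently treats them.

```python
from urllib.parse import quote_plus


def linkedin_dork(employer=None, real_name=None):
    """Build a site-restricted LinkedIn query from the operators listed above."""
    terms = ["site:linkedin.com"]
    if real_name:
        terms.append(f'"{real_name}"')
    if employer:
        terms += ["inurl:pub", "-inurl:dir", f'"at {employer}"', '"Current"']
    return " ".join(terms)


def google_search_url(query):
    """Return a Google search URL with the query safely encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = linkedin_dork(employer="Microsoft")
    print(query)
    print(google_search_url(query))
```

Printing the plain query alongside the URL makes it easy to paste either one into your notes.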
Speed Tricks Saving a few seconds here and there adds up over time
Context Search https://github.com/ssborbis/ContextSearch-web-ext
Add As Search Engine
https://www.wired.com/2014/07/tip-week-chrome-site-search/
Default to Last Year
https://thepracticalsysadmin.com/defaulting-google-search-results-to-the-past-year/
Keyboard Shortcuts
https://www.quinnssmtbrand.com/windows-keyboard-shortcut/
Gaming Legal requests: https://www.search.org/resources/isp-list/
Discord Search https://www.discordportal.com/
Discord Search https://discordservers.com/
Discord Search https://discord.center/
Discord Search https://disboard.org/
Discord Search https://discord.me/
Discord Search https://support.discordapp.com/hc/en-us/articles/115000468588-Using-Search
Discord Capture https://dht.chylex.com/ | Discord History Tracker
Twitch https://www.twitchtools.com/
Fortnite https://fortnitetracker.com/profile/search?q=
PSN https://psnprofiles.com/search/
Mixer https://www.lifewire.com/what-is-mixer-4156866
Steam https://steamrep.com/ or https://steamid.uk/
Business & Organizations Google: resume AND “real name”
OpenCorp https://opencorporates.com/
Rocketreach https://rocketreach.co/
OCCRP https://data.occrp.org/
CorpWiki https://www.corporationwiki.com/
Recruitin https://recruitin.net/
Indeed https://www.indeed.com/
MarketVisual http://marketvisual.com/
AihitData https://www.aihitdata.com/
Glassdoor https://www.glassdoor.com/Reviews/index.htm
LittleSis https://littlesis.org/
OpenSanctions https://www.opensanctions.org/
CEOEmail https://ceoemail.com/
Enigma https://public.enigma.com/browse/collection/corp-watch-company-subsidiaries/
Angel https://angel.co/
RipoffReport https://www.ripoffreport.com/
Sector035’s Guide
https://medium.com/@sector035/gathering-company-intel-the-agile-way-6db12ca031c9
Operational Security – Browsers
Browser, Session, and Site Tests
Device Fingerprint https://panopticlick.eff.org/
Browser Fingerprint https://amiunique.org/fp
Browser Fingerprint https://www.deviceinfo.me/
Browser Fingerprint https://browseraudit.com
Browser Fingerprint https://browserleaks.com/
Browser Fingerprint https://pixelprivacy.com/resources/browser-fingerprinting/
Browser Fingerprint https://detectmybrowser.com/
IP Leaks https://ipleak.net
DNS Leaks https://www.dnsleaktest.com/
Email Leaks https://www.emailprivacytester.com
Site Privacy Test https://webbkoll.dataskydd.net/en/
Privacy Resources https://inteltechniques.com/links.html
Operational Security – Windows
Recommended Tools For Windows Security
Create Non-Privileged User
https://support.microsoft.com/en-us/help/4026923/windows-10-create-a-local-user-or-administrator-account
Anti-Virus https://www.microsoft.com/en-us/windows/comprehensive-security
Anti-Malware https://www.malwarebytes.com/mwb-download/
Anti-Spyware https://www.safer-networking.org/
Windows Privacy https://ssd.eff.org/en/module/how-delete-your-data-securely-windows
Win10 Privacy https://www.thewindowsclub.com/privatewin10-advanced-windows-10-privacy-tool
Win10 Privacy https://fdossena.com/?p=w10debotnet/index 1903.frag
Check Your Microsoft Data https://account.microsoft.com/account/privacy
Network Activity https://www.glasswire.com/
Password Manager https://keepassxc.org/
Cleaner https://www.bleachbit.org/download/windows
Cleaning Manually https://www.makeuseof.com/tag/best-way-clean-windows-10-step-step-guide/
Common Missteps
Methodology is more important than tools or techniques because those things change. Invest in defining a strong process.
Are you signed into a live session for the platform you are querying? (i.e., make sure you are signed into FB in another tab)
Do you have script blockers that might be preventing data from loading on a page? (e.g., Privacy Badger, uBlock, Ghostery)
Failure to use non-OSINT approaches and strategies, e.g., social engineering (consider a friendly phone call).
Including a space at the end when pasting an account ID or other keyword into a query form field.
Start looking at the page source to see what is going on behind the scenes. If you only look at the GUI, you are missing a lot.
Location. Your search results are being skewed by your perceived location; consider using a VPN to “relocate”.
Tenacity wins the day. Most answers are not going to fall into your lap. Patience and persistence above all else.
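The trailing-space misstep in the list above is easy to guard against when selectors are pasted into scripts or spreadsheets. The sketch below is illustrative only. The function name `clean_selector` and the particular set of invisible characters it removes are assumptions, and web forms may treat these characters differently.

```python
def clean_selector(raw):
    """Remove surrounding whitespace and invisible characters picked up when copying."""
    invisible = "\u200b\u200c\u200d\ufeff"  # zero-width characters and byte-order mark
    cleaned = raw.replace("\u00a0", " ")  # non-breaking space becomes a normal space
    for ch in invisible:
        cleaned = cleaned.replace(ch, "")
    return cleaned.strip()


if __name__ == "__main__":
    pasted = " nicky.robinson84 \n"  # hypothetical account ID copied from a page
    print(repr(clean_selector(pasted)))
```

Running every seed value through a step like this before querying helps avoid false "no results" caused by stray characters.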
More OSINT Resources
https://docs.google.com/document/d/1BfLPJpRtyq4RFtHJoNpvWQjmGnyVkfE2HYoICKOGguA/ (Bellingcat Toolkit)
https://www.i-intelligence.eu/wp-content/uploads/2018/06/OSINT Handbook June-2018 Final.pdf (I-Intelligence Collection)
https://medium.com/@sector035 (@sector035)
https://github.com/Ph055a/OSINT-Collection (OSINT.Team Collection)
https://www.osinttechniques.com/osint-tools.html
https://osintcurio.us/10-minute-tips/
https://www.learnallthethings.net/creepyosint (@baywolf88)
https://atlas.mindmup.com/digintel/digital intelligence training/index.html
OSINT METHODOLOGY 101 BUILDING AN EFFICIENT, REPEATABLE, AND ARTICULABLE PROCESS
Basic Investigative Steps Working up your first case with your new tools and techniques
1. Set up your note-taking and data collection to track your work: paper notebook, OneNote, Hunch.ly, a directory on an encrypted flash drive, etc.
2. List your investigative goals: full profile, locate for apprehension, identify associates, collect digital evidence, etc. (Are you collecting intel or evidence for court?)
3. List your seed info: emails, phone numbers, names, etc.
4. Run all of your paid and/or gov queries and use those to add to your seed information. If possible, get hold of a booking or DOL photo for comparison while researching social media.
5. Run Accurint (Lexis-Nexis), TLO, or Clear reports.
6. Fire up Firefox/Chrome with your plugins of choice: NoScript, HTTPS Everywhere, Ghostery, FireShot, OneTab (or use the browsers in the Buscador VM).
7. If it's a serious investigation, I turn on Hunch.ly and enter my "selectors" (keywords from seed info).
8. I do a quick Google search and check my people-finder site of choice for that week, e.g. ["James McIntire" "Denver"] and then, this week, truepeoplesearch.com. These are just quick checks for low-hanging fruit.
9. Go to https://inteltechniques.com/menu.html (or your OSINT toolset of choice, e.g. osintframework.com) and use the tabs on the left-hand side to select the categories that match your seed info. My typical order is email, real name, search engines, Facebook, Twitter, and then the rest depending on what you have to go on.
10. I exhaust the inteltechniques.com tools, closing any tabs that return false positives or no useful results. For any page that is important, I note any identifiers (account IDs, user names, etc.) on my notepad and FireShot a PDF of the page. That PDF is saved in the case directory. On a case with multiple targets, create subfolders for each person of interest.
11. Either periodically or when I'm done with my research, I copy/paste or manually enter any pertinent info into a profile or case report in either Word or OneNote. I embed any pertinent screen captures, PDFs such as Lexis-Nexis reports, and good photos of the targets, any vehicles, and addresses.
12. I go over that report with the case detective or agent to explain my investigation and see if they have any questions or want any additional info.
13. My rough notes, workbooks, Hunch.ly files, and/or cloned VMs (if I used Buscador) are usually saved in case I need them for court. The exceptions are things like intel gathering for operations, events, threat assessments, etc. A Hunch.ly export might be burned to disc as evidence, but be cautious of any unintended data that might have been saved during that session. The VM backup should not go into evidence as it would divulge tradecraft. Treat it as an undercover laptop that you can refer to, but avoid exposing it unless you are forced to (work with your prosecutor to fight this). If you don't need that VM for court, do not keep it (hoarding data comes with custodial responsibilities and potential liabilities).
14. I make sure I have a fresh VM for the next case or crisis that comes up. I also make new accounts to have in pocket if any of my research accounts were burned. Better to prepare for the next case at the end of the previous one and be ready to go at a moment's notice.
15. Wash, rinse, repeat. Track successes to justify more equipment, staffing, and training.
Note: My standard setup is an off-grid Windows PC on a UC cable modem or MiFi (VPN as appropriate). For quick checks such as events, threats, etc., I stay in Windows and just use Chrome/Firefox and the links on inteltechniques.com. This is for convenience and speed with less fuss when there's less of a need for compartmentalization, security, and/or anonymity. For investigations I typically use Buscador with Hunch.ly installed and all fresh research accounts. Quick utility vs. backstopped single purpose: use the right tool for each mission.
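Step 10 above saves captures into a case directory with a subfolder for each person of interest. A minimal Python sketch of that layout follows. The function name `create_case_directory`, the subfolder names ("captures", "reports", "notes"), and the example case ID are assumptions for illustration, not a prescribed structure.

```python
from pathlib import Path


def create_case_directory(root, case_id, targets):
    """Create <root>/<case_id>/<target>/<subfolder> for each person of interest."""
    case_dir = Path(root) / case_id
    for target in targets:
        # Hypothetical subfolders: page captures, paid/gov reports, rough notes
        for sub in ("captures", "reports", "notes"):
            (case_dir / target / sub).mkdir(parents=True, exist_ok=True)
    return case_dir


if __name__ == "__main__":
    # Example values only; point root at your encrypted volume or shared evidence folder
    case = create_case_directory(".", "2019-001", ["subject_a", "subject_b"])
    print("Created", case.resolve())
```

Because `exist_ok=True` is set, rerunning the script to add a new target to an existing case leaves earlier folders untouched.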
ACCOUNT CREATION 101 BUILDING AN EFFICIENT, REPEATABLE, AND ARTICULABLE PROCESS
Building Reliable Research Accounts
This is a list of recommended steps for creating investigative/research social media accounts. These are largely based on feedback from our community and their experiences with having their accounts locked or suspended. Where applicable, steps are listed in order of preference for successfully avoiding security challenges.
Equipment Setup: It may seem simple, but the equipment and connection you are on matter.
1. Avoid VPNs during account creation; most of their IP ranges are flagged.
2. MiFis or dynamic-IP devices work quite well for account creation.
3. Public networks (Starbucks Wi-Fi) work, but be aware that you are being exposed and cross-correlated with other users on that network.
4. Phone #: A real non-VOIP phone number will save you a lot of hassle. We recommend a $5 Mint SIM card kit paired with an unlocked smartphone (mintmobile.com).
5. Online Footprint: “Google” your name and employer. Print the first two pages of results and include this in your binder as the “low hanging fruit” of personal data.
Covert Accounts
1. We usually make FB, IG, and Twitter at once and tie them in as one covert profile. Each adds depth and veracity to the others (intentional cross correlation).
2. Keep notes on your covert details either in a paper notebook or a digital format like a password manager or spreadsheet, having your security requirements in mind.
3. If it is a sensitive or deep infiltration case, make sure to compartmentalize this profile from the get-go (connection, browser, device (use VM to isolate), etc.)
4. Connection:
   a. No VPN during account creation; most VPN IP blocks are flagged
   b. Cellular data connections (MiFis) are good: dynamic/shared IPs
   c. Another technique is to get a free-tier AWS EC2 or Digital Ocean VM and use it to make the account, as then you will have an AWS IP. This is more advanced but works pretty well if you are comfortable with VMs and learning to navigate AWS. Some groups even run full investigative VMs on AWS, but again this is a more advanced setup that takes some work to sort out.
   d. Another advanced technique is to roll your own VPN through AWS, as the providers tend not to flag AWS: https://github.com/StreisandEffect/streisand
5. Email Address:
   a. No Gmail, Hotmail, Yahoo, or other top free mail (GMX is an exception for now)
   b. Private domains work best; grab a Namecheap or GoDaddy domain and webmail for cheap and make a bunch of accounts with them
   c. Gmx.us accounts seem to work OK (for now) and require no existing email or contact info
   d. Sudomail and Protonmail addresses work OK, though not as well as a private domain
6. Phone #:
   a. You might get lucky and not get the phone number requirement, but sometimes it won’t require it at first and then a couple of hours or days in it will throw it at you as a security requirement
   b. No VOIP: most number blocks are flagged
   c. Mint test kits and an unlocked phone are a cheap way to get 7 days on a real number
      1. Make sure you have Mint coverage in your area
      2. https://www.amazon.com/Mint-Mobile-Starter-Verify-Compatibility/dp/B0786RD524 ($5 for two sims)
      3. You might then port the number over to Google Voice
      4. Some groups buy these in bulk
   d. You can also use an extra # on a real account (e.g., Verizon), port it over to Google Voice, and then draw a new # for that Verizon account
   e. Some people will also use hotel phones and the like when traveling to roll accounts, but that is kind of
ACCOUNT CREATION (CONT.)
BUILDING AN EFFICIENT, REPEATABLE, AND ARTICULABLE PROCESS
7. Once we get into our new account, we do not leave it fallow; start making it feel real right away.
8. Choose a name that is generic, but not too generic
   a. e.g.: Nicky Robinson, Hunter Reynolds, etc.
   b. http://howmanyofme.com/
9. Name, gender, city, and employer (school) should make sense. Remember, a real person at FB will likely look at your profile if it is reported as suspicious; we want to pass the smell test.
10. Profile/cover photo
   a. We don’t ever purport to be a specific individual without consent (i.e., no identity theft)
   b. Pikwizard.com: good source for free-for-anything licensed photos
   c. Pixabay.com is also decent
   d. Avatar makers are another option: https://mashable.com/2007/09/12/avatars/#mn3Ph1PwgZqi
   e. fiverr.com: you can buy profile photos for cheap, or anything else really. Avoid buying bulk accounts; they are often locked, scams, or stolen
   f. I also like taking a pic from images.bing.com of a large crowd (road race, sporting event, concert), using the snip tool to crop it, and then posting the still-large group shot. It’s unclear who we are in the group, and yet it’s the kind of content people post for profiles or banners because the internet is all about bragging
   g. Get creative: the general rule is snip, crop, filter, logical pic choice
11. Time to flesh out our profile by making some friends
   a. Join groups: anything that has large groups that accept anyone
   b. Nerdy groups and pop culture are my favs: video games, cosplay (because then costumed profiles make sense), etc.
   c. If you are doing a deep infiltration you may have to research your target’s groups. Don’t join her/his groups directly; join similar ones and work your way in slowly after you have some history
   d. Do some liking and commenting in groups for a day or two
   e. Then go to https://www.facebook.com/find-friends/browser/ and let FB recommend friends. We never cold call friends anymore; we let FB tell us who it has already cross-correlated with our profile. This reduces the chances of getting flagged significantly.
12. Posts: On August 1st, Facebook cut off all 3rd-party app access except for Messenger or FB pages. We formerly used IFTTT and WordPress to auto-post, but they are broken for now. IFTTT still works for Twitter.
13. Avoid political chat and comments. Politics and social issues are high on the radar of the FB watchdogs due to the fake news and voter tampering concerns.
14. Keep track of covert accounts in a spreadsheet or, better yet, a password manager.
15. SIM jacking Twitter accounts is very popular, so use long passphrases even on your sock accounts and consider 2-factor if they are mature or otherwise valuable accounts.
16. Know your agency’s policies around things like friending and any levels of approval or documentation required.
17. …and of course, we always use our powers for good, so we always assume that our investigation will eventually see the light of day. Make sure you are proud of how your activity will look in retrospect by an objective 3rd party in regard to reasonable and responsible
Note: This is purely anecdotal, but in addition to “getting into character” and making our accounts feel real, I suspect that there may be some value to occasionally clicking on ads and other content that the platform is pushing at you. This is not a privacy/security best practice, but there are detection algorithms that may favor revenue positive accounts. Again, this is just a theory.
REPORTING SAMPLE COVER/FACE SHEET
LOGO HERE Company/Org Name
Section or Analyst Name
Open Source Investigative Profile Summary of Findings
Subject ID
Name: DOB: Address: Phone #1:
Phone #2: Employer: SS#: Vehicles:
Alternate Identities and Associations
Email #1: Email #2: Email #3: Email #4: User Name: UN #2 Facebook : FB # Twitter: TW #: Instagram: IG #:
Photos/Video
☐Photos
☐Video
Description Source
Attachments
☐ Excel Profile Report ☐ Data Source DVD ☐ Photographs ☐ Hunch.ly Archive
☐ Link Analysis Report ☐ Comprehensive TLO, Clear, Accurint Report ☐ DOL/GOV Checks ☐ Other: ____________________
Relatives:
SHORTCUTS & HOT-KEYS COMPLETING 1,000 SMALL TASKS A LITTLE FASTER
Windows Shortcut Keys
Windows Key + R: Opens the Run menu.
Windows Key + E: Opens Explorer.
Alt + Tab: Switch between open programs.
Windows Key + Up Arrow: Maximize current window.
Ctrl + Shift + Esc: Open Task Manager.
Windows Key + Break: Opens system properties.
Windows Key + F: Opens search for files and folders.
Windows Key + D: Hide/display the desktop.
Alt + Esc: Switch between programs in order they were opened.
Alt + Letter: Select menu item by underlined letter.
Ctrl + Esc: Open Start menu.
Ctrl + F4: Close active document (does not work with some applications).
Alt + F4: Quit active application or close current window.
Alt + Spacebar: Open menu for active program.
Ctrl + Left or Right Arrow: Move cursor forward or back one word.
Ctrl + Up or Down Arrow: Move cursor forward or back one paragraph.
F1: Open Help menu for active application.
Windows Key + M: Minimize all windows.
Shift + Windows Key + M: Restore windows that were minimized with previous keystroke.
Windows + F1: Open Windows Help and Support.
Windows + Tab: Open Task view.
Windows + Break: Open the System Properties dialog box.
Hold Right SHIFT key for eight seconds: Switch FilterKeys on and off.
Left Alt + Left Shift + Print Screen: Switch High Contrast on and off.
Left Alt + Left Shift + Num Lock: Switch Mouse keys on and off.
Press Shift five times: Switch Sticky keys on and off.
Hold Num Lock for five seconds: Switch Toggle keys on and off.
Ctrl+Tab: Switch Between Program Groups
F11: Maximize Window
Ctrl+A: Select Text (Expanded with Windows 10)
Ctrl+C: Copy Text
Ctrl+V: Paste Text
Win+R, then type ‘cmd’: Command Prompt
Tab: Autocomplete Folder or File Name
Alt-Tab: Switch Between Open Applications
Windows logo key + Tab: Task View
Windows logo key + X: Shutdown Your Workstation
Windows logo key + L: Lock Your Workstation
Shortcuts for Mac
Command + X: Cut selected text and copy it.
Command + C: Copy selected text.
Command + V: Paste copied text.
Command + Z: Undo previous command.
Command + A: Select all items.
Command + F: Open Find window to search text.
Command + H: Hide windows of the front app.
Command + N: Open a new document or window.
Command + O: Open a selected item.
Command + P: Print current document.
Command + S: Save current document.
Command + W: Close front window.
Command + Q: Quit the app.
Command + M: Minimize the front window to the Dock.
Command + Spacebar: Open Spotlight search field.
Command + Tab: Switch between open apps.
Command + B: Bold selected text.
Command + I: Italicize selected text.
Command + U: Underline selected text.
Command + Semicolon (;): Find misspelled words in document.
Option + Command + Esc: Choose an app to force quit.
Shift + Command + Tilde (~): Switch between open windows.
Shift + Command + 3: Take a screenshot.
Fn + Up Arrow: Scroll up one page.
Fn + Down Arrow: Scroll down one page.
Fn + Left Arrow: Scroll to beginning of document.
Fn + Right Arrow: Scroll to end of document.
Finder Shortcuts
Shift + Command + F: Open All My Files window.
Shift + Command + K: Open Network window.
Option + Command + L: Open Downloads folder.
Shift + Command + O: Open documents folder.
Shift + Command + U: Open Utilities folder.
Option + Command + D: Show or hide the Dock.
Shift + Command + N: Create a new folder.
Command + Delete: Move selected item to the Trash.
Shift + Command + Delete: Empty Trash.
*www.quinnssmtbrand.com/windows-keyboard-shortcut/
SHORTCUTS & HOT-KEYS COMPLETING 1,000 SMALL TASKS A LITTLE FASTER
Chrome
Shortcut Keys: Description
Alt+Home: Open your homepage.
Alt+Left Arrow: Back a page.
Alt+Right Arrow: Forward a page.
F11: Display the current website in full-screen mode. Pressing F11 again will exit this mode.
Esc: Stop the page or a download from loading.
Ctrl+(- or +): Zoom in or out of a page; "-" will zoom out and "+" will zoom in on the page.
Ctrl+1-8: Pressing Ctrl and any number 1 through 8 moves to the corresponding tab in your tab bar.
Ctrl+9: Switch to last tab.
Ctrl+0: Reset browser zoom to default.
Ctrl+Enter: This combination is used to quickly complete an address. For example, type "computerhope" in the address bar and press Ctrl+Enter to get https://www.computerhope.com.
Ctrl+Shift+Del: Open the Clear browsing data window to quickly clear private data.
Ctrl+Shift+B: Toggle the bookmarks bar between hidden and shown.
Ctrl+A: Select everything on a page.
Ctrl+D: Add a bookmark for the page currently opened.
Ctrl+F: Open the "find" bar to search text on the current page.
Ctrl+O: Open a file in the browser.
Ctrl+Shift+O: Open the Bookmark manager.
Ctrl+H: Open browser history in a new tab.
Ctrl+J: Display the downloads window.
Ctrl+K or Ctrl+E: Moves your text cursor to the omnibox so that you can begin typing your search query and perform a Google search.
Ctrl+L: Move the cursor to the browser address bar and highlight everything in it.
Ctrl+N: Open New browser window.
Ctrl+Shift+N: Open a new window in incognito (private) mode.
Ctrl+P: Print current page or frame.
Ctrl+R or F5: Refresh the current page or frame.
Ctrl+S: Opens the Save As window to save the current page.
Ctrl+T: Opens a new tab.
Ctrl+U: View a web page's source code.
Ctrl+W: Closes the currently selected tab.
Ctrl+Shift+W: Closes the currently selected window.
Ctrl+Shift+T: This combination reopens the last tab you've closed. If you've closed multiple tabs, you can press this shortcut key multiple times to restore each of the closed tabs.
Ctrl+Tab: Moves through each of the open tabs going to the right.
Ctrl+Shift+Tab: Moves through each of the open tabs going to the left.
Ctrl+Left-click: Open a link in a new tab in the background.
Ctrl+Shift Left-click: Open a link in a new tab and switch to the new tab.
Ctrl+Page Down: Open the browser tab to the right.
Ctrl+Page Up: Open the browser tab to the left.
Spacebar: Moves down a page at a time.
Shift+Spacebar: Moves up a page at a time.
Home: Go to top of page.
End: Go to bottom of page.
Alt+Down Arrow: Display all previous text entered in a text box and available options on a drop-down menu.
*Shortcut List Source: www.computerhope.com
SHORTCUTS & HOT-KEYS COMPLETING 1,000 SMALL TASKS A LITTLE FASTER
Firefox
Shortcut Keys: Description
F5: Refresh current page, frame, or tab.
F11: Display the current website in fullscreen mode. Pressing F11 again will exit this mode.
Esc: Stop page or download from loading.
Spacebar: Moves down a page at a time.
Shift+Spacebar: Moves up a page at a time.
Home: Go to top of page.
End: Go to bottom of page.
Alt+Home: Open your homepage.
Alt+Down Arrow: Display all previous text entered in a text box and available options on a drop-down menu.
Alt+Left Arrow: Back a page.
Alt+Right Arrow: Forward a page.
Ctrl+(- or +): Increase or decrease the font size; '-' will decrease and '+' will increase. Ctrl+0 will reset back to default.
Ctrl+D: Add a bookmark for the page currently opened.
Ctrl+F: Access the Find option, to search for any text on the currently open web page.
Ctrl+H: View browsing history.
Ctrl+I: Display available bookmarks.
Ctrl+J: Display the download window.
Ctrl+K or Ctrl+E: Move the cursor to the search box.
Ctrl+L: Move cursor to address box.
Ctrl+N: Open New browser window.
Ctrl+O: Access the Open File window to open a file in Firefox.
Ctrl+P: Print current page or frame.
Ctrl+T: Opens a new tab.
Ctrl+U: View a web page's source code.
Ctrl+F4 or Ctrl+W: Closes the currently selected tab.
Ctrl+F5: Refresh the page, ignoring the Internet cache (force full refresh).
Ctrl+Enter: Quickly complete an address.
Ctrl+Tab: Moves through each of the open tabs.
Ctrl+Shift+Tab: Moves through each of the open tabs going to the left.
Ctrl+Shift+Del: Open the Clear Data window to quickly clear private data.
Ctrl+Shift+B: Open the Bookmarks window, to view all bookmarks in Firefox.
Ctrl+Shift+J: Open the Browser Console to troubleshoot an unresponsive script error.
Ctrl+Shift+P: Open a new Private Browsing window.
Ctrl+Shift+T: Undo the close of a window.
Ctrl+Shift+W: Close the Firefox browser window.
Ctrl+Left-click: Open a link in a new tab in the background.
Ctrl+Shift Left-click: Open a link in a new tab and switch to the new tab.
Ctrl+Page Down: Open the browser tab to the right.
Ctrl+Page Up: Open the browser tab to the left.
*Shortcut List Source: www.computerhope.com
BUSCADOR 2.0 OSINT LINUX DISTRO
Installation Notes (2.0)
You will need a Virtual Machine application in order to use this system. VirtualBox is free and will suffice for most investigations. Some users prefer a more robust option with VMWare Workstation for Windows or VMWare Fusion for Mac. Any of these options will get you started.
VirtualBox Installation and Configuration:
* Make sure you have the latest version of VirtualBox and the VirtualBox Extension Pack installed
1) In the VirtualBox menu, click on File > Import Appliance
2) Navigate to the OVA file that was downloaded (Buscador)
3) Choose this file and select “Import”
4) Before starting the new machine, highlight it and choose “Settings”
5) Under General > Basic, rename this machine as desired (Buscador?)
6) Under General > Advanced, change Shared Clipboard to Bi-Directional
7) Under System > Motherboard, increase the RAM if you have ample resources (half of total system)
8) Under Display > Screen, increase the Video Memory to 128MB if available
9) Under Shared Folders, click the “plus” on the right, choose a folder to store evidence, select “Auto-Mount”
10) Click “OK” twice, then launch the new machine (Double Click)
11) Upon boot, log into the user “osint” with the password of osint
12) In the VirtualBox Menu, select Devices > “Insert Guest Additions CD Image”
13) Click “Cancel” when the dialogue box pops up.
14) Open Terminal (Tilix)
15) In Terminal, create a directory on the Desktop titled vbox: mkdir ~/Desktop/vbox
16) Copy everything from the CD media on the Desktop to the vbox folder (copy/paste)
17) In Terminal, input the following commands (type the password when prompted):
    cd ~/Desktop/vbox
    chmod +x *.sh
    ./autorun.sh
18) Allow the image to be installed, and reboot upon completion.
19) Start the Terminal in the new VM and type: sudo adduser osint vboxsf
20) Provide the password as needed (osint)
21) Reboot
You should now have access to the shared directory in order to save data to the host operating system (evidence). It can be found in the File Manager (Home), on the left column, titled “sf_” followed by the name of the folder to which it is connected. This shared folder will also be on your desktop for easy access. You can make the machine full-screen, copy and paste text to and from the image, and you are ready to begin using the applications.
Support & Updates
Open Tilix (Terminal), and enter the following commands:
NOTE: Update_scripts no longer needed!
Video Download Update: sudo -H pip install --upgrade youtube-dl
Spiderfoot Update:
    cd /opt/spiderfoot
    git reset --hard
    git pull
    sudo reboot
