Stanford University
Movies: Documentation
Gio's Movie files: DOC
Movies Database Documentation
This the Master file on [brambles]/MyDocs/mov/doc.html; copyright 1996, 1997, 1998 Gio Wiederhold
Loaded into DB web files 25 September 1997.
Updated December 1998.
This material was entered by
Gio Wiederhold
The initial objective was as test data for students, to allow non-trivial
exercises. There are fields suitable for complex joins, outerjoins, temporal
joins, and recursion. It has also been used to investigate object-structured database technologies
This is a HTML version of the description of the movies database.
It was based originally on the description from the Kamens and Wiederhold temporal paper.
It also contains literature references in appendix A.
1. Introduction
The database was used originally in implementing the temporal queries in SQL is Wiederhold's movies database (see [21]) and for a number of other projects, both in Stanford classes and by research at the Un.of Maryland and others.. The database allows testing of theories about an implementation. Since it contains "real data", it allowing checking of results for semantic as well as syntactic correctness. Finally, the tables in the database consist of both journal and history relations (see 21, section 3), allowing us to test all of the different elements of an implementation.
The Overview section (2) below contains a summary of all the files and the URL pointers to them, or their parts or sections; see Section 3 for a full schema description.
A number of the fields are unsuitable for relational implementation, because they contain sets of values. These are best used in an object-capable implementation. The schema describes when such sets are to be expected. Encodings for several of the fields are listed in Section 4.. At the end of this document, in Appendix B., are some conversion rules for converting the HTML format to fields for relational or object databases.
2. Overview:
The central file (MAIN) is a list of movies, each with a unique identifier. These identifiers may change in successive versions. The actors (CAST) for those movies are listed with their roles in a distinct file. More information about individual actors (ACTORS) is in a third file. All directors in MAIN are listed in a fourth file (PEOPLE), with a number of important producers, writers, and cinematographers. A fifth file (REMAKES) links movies that were copied to a substantial extent from each other. The sixth file (STUDIOS) provides some information about studios show in MAIN. This documentation file provides supplementary information, and is an essential part of the database. Some images are also available, but not on-line now. There are many cross-linking names throughout the files, so many in fact that web browsers have choked when trying to make them live. Currently we simply try to be careful in naming films and people consistently. More detail follows below. Counts have been made using emacs `count-matches’.
2.1 -- MOVIES --
The main movies file is
main.html.
It contains 11435 `tr/td’ table row entries (July 1999).
It is complete for Hitchcock movies and TV shows,
and has most films that can be related by topic, actors, director, history, and such to those films.
It has also many films corresponding to a variety of interests of mine, former students etc.,
with an emphasis on historical value.
MAIN is the "central" relation in the database; it is a journal relation that
contains information about the movies themselves. The relation contains
information about movies such as their titles, type, directors, and producers, as
well as their year of release.
There is also a subset file of 82
Hitchcock
full length Movies.
2.2 -- ACTORS -- The file
actors.html has
6813 `tr/td’ table row entries (July.1999) for many of the actors appearing in CASTS.
Also First part and
Second part of actors.html, not maintained.
The key of the relation is
"stagename", and there are intervals indicating the dates that the
actor worked and the actor's lifetime. Other information in this relation is
the actor's real name, background, and the type of roles he/she typically
plays. References to images are kept here too.
2.3 -- DIRECTORS -- in file
people.html
The file lists all directors, as well as some other movie people, as producers and cinematographers
(A total of 3290 `tr/td’ table row entries, 3011 `@’ directors as of July 1999).
The directors table is similar to the actors table in that it contains intervals for when the director worked and when he/she lived. The key of the relation is the field "name", which is the name under which the director directed. Director's key names do not contain any blanks. Typically the last name is used, when needed prefaced by an initial. A secondary unique key is defined for each director, up to three letters, based on the initial letters of the first, middle, and last names. This key will provide HTML HREF linkages among many of the files. As with the actors table, this table also includes the real name of the director among its fields ("lastname" and "firstname"). It also contains important producers, cinematographers, musicians, composers, etc.
2.4 -- STUDIOS -- in file
studios.html
are important studios only (203 `tr/td’ entries, sparse information).
The key of the studios relation is the name of the studio. The temporal
information that is included is an interval indicating the years the
studio was (or is) in operation, represented by the fields "startdate" and
"enddate". This is a history relation.
2.5 -- CASTS -- in file
casts.html
This is a large (too large?) file of who acted as what in which movie.
(46 009 tr/td entries, only partial for movies and roletypes, July 1999).
Casts is an association relation, linking actors with movies. The key of the relation is the catenation of the two fields "film_id" and "actor"; no temporal information is included in this relation.
This file was too big for Netscape in 1996. There is a smaller subset, but that one is not kept up-to-date.
casts of Hitchcock films;
2.6 -- REMAKES -- in file
remakes.html
(1278 `tr/td’ entries in July 1999).
This table (which is not extensively used in the temporal DB paper) gives information about movies that are remakes of other movies. It is very useful to test recursion in databases.
2.7 -- SYNONYMS --
Some movies are known by alternate titles, and can be accessed indirectly.
This file has been superseded. All entries are now in the main file, as Alt(T:…) in the notes field.
It’s access was synonyms.html, and it contained 379 entries in Sep 1997) This list relates to the MOVIES in main.html.
2.8 -- QUOTES -- in file
casts.html
QUOTES.
A few (26) memorable quotes from movies are listed in
quotes.html.|1.
2.9 -- AWARDS -- in file
actors.html.
Types of awards and the awarding agencies are in
awtypes.html.
2.10 -- AWARDS-RECEIVED -- is no longer a distinct file
Awards received for special occasions are listed with individual entries
in the files for ACTORS (actors.html).or MOVIE PEOPLE (people.html).
Regular awards associated with a particular movies are given in
CASTS (casts.html).
2.11 -- REFERENCES --
Books that provided material for this database are listed within this documentation file as
Appendix A.
2.12 -- GEOGRAPHY --
Codes for countries and origins are listed within this documentation file as section 4.3:
doc.html GEO.
2.13 -- CATEGORIES --
Codes for movie categories are listed within this documentation file as Section 4.4:
doc.html CATS.
2.14 -- COLOR-CODES --
Codes for color processes used for movies are listed within this documentation file as Section 4.5:
doc.html COLS.
2.15 -- ROLE-TYPES --
Codes that specify role-types for actors are listed in the preamble for
casts.html ROLES.
2.16 -- FIELD-IDENTIFIERS --
Codes that identify subfields
in various files are listed within this documentation file as Section 4.2:
doc.html
FIELDS.
2.17 - AWARD TYPES --
Lists the award types
used in MAIN, ACTORS, and PEOPLE,
with the organizations who award them, and the span of years they were awarded.
2.19 -- IMAGES --
there is a small collection of .tiff files for actors and directors.
They are kept individually in an images subdirectory.
2.20 -- ICONS --
There are about a dozan icons to be used to identify
subfiles. Some of them come from the New Yorker Magazine Jan.1993.
There are kept individually in an icons subdirectory.
3. Schema Definition for the Movies Database
Here we give a detailed description of the schema of the movies
database, which is used for all examples in this paper and was used to
implement the temporal SQL additions. General descriptions are given
in Section 2, above.
This file is being updated to desctribe the HTML version. Where
updates were made, the old material is in curly {brackets}.
3.1 The MOVIES Table
Col-Name = Description.
There is a distinct table for each director (Hitchcock has multiple tables,
one for early silent, one for British, one for American, and one for TV movies).
The tables are broken up p by year of first known film by the directors. There are some
break and header records for each year.
Each director table has two types of records:
one header record for the director, with the director id, as shown in people,
the first year known for movies by that director, prefixed by an @ symbol, matching the
people entry, and the standard name for the director, also matching the people entry.
The remainder shows the format for the data records that follow below..
The note field is often used to describe the set of detail records
For movies where the director in not known there is a dummy entry, either by topic (Unknown) or
by year (UnYear), as shown in the people file.
any number of records, one per film, formatted as shown below.
film_id = An internally generated id for the film. This is the key of the relation entries and is unique. It is composed of director_id and a sequence number. .
All movies of a director are listed together in sequence, but only for some directors have all movies been entered..
The sequence numbers often have gaps to allow insertions when all movies for this director were not known at entry time at time of entry, a common occurrence.
title = The film's title. It is preceded by T: or Tn: (earlier also \Tm,\Tmm) depending on the source of the data. This field is not neccessarily unique. In a few cases, where the title is unknown, NKT stands for `No Known Title’.
year = Year the movie was released. This is assumed to be an event (i.e., to take zero time)
director = Director of the movie, preceded by D:.
The standardized id-name is used. All directors must appear as DIRECTORS
in people.html, so that we have a proper reference constraint.
If there are multiple candidate directors the primary one or the one
who finished the movie is chosen and other candidates are given in the
notes field as CoD().
producers= Producer(s) of the movie, preceded by P: if shown in
people.html and hence referencable by id_name .
P: alone shows that there was no specific producer.
If prefaced by PN: then the full name(s) is(/are) given;
if prefaced by PS: then the spelling is uncertain.
In both cases no reference to people can be expected..
PN: is common, since only few id-names for producers exist yet in the people.html files, except for producers who also were DIRECTORS
:PU alone means the producer is unknown to me.
Multiple producers are permitted and common.
studios = Studio(s) where the movie was filmed. Common studio names appear in STUDIOS. If the studio is not known or uncommon its location may be given as SL:{COUNTRY-CODE}.
Unknown studios are prefixed by SU:
sometimes the distributing studio, where the distributor differs from the production studio, is shown prefixed by SD:.
prc = Process used to make the movie (e.g. black and white as `bnw', col).
Color processes may by specified as \COLOR-CODES. The code `cld' is used for black-and-white movies that have been colorized.
Unknown is coded prc.
cat = Category of the film (e.g., suspense, mystery), as given in the
list of CATEGORIES. Unknown is coded Ctxx.
awards = Awards received by the film, separated by commas. The awards are
listed in AWARD (optionally followed by keywords such as 'Special')
and included actual awards as well as favorable (mostly) mentions
in compendia as Halliwell and Roger Ebert's books, with the
appropriate number of stars. A + symbol is a half star and
a - after the awardee code indicates a negative mention.
Unknown is coded aw. -H means not in Halliwell [4].
lc = Location where the film plays. Multiple locations are separated
by semicolons (;), multiple levels in any location hierarchy are
separated by commas, as `high-school, csd, CA'; indicating movie
location is a California high-school in the countryside. Codes used are
listed in the preamble of main.html. For countries other than the
USA the country name is given as well. Alternatives for country
names are `space' or `xxx ocean'. Unknown is lc.
If the period of the film is significant it is given as
T([[dd]mmm]yyyy).
notes = Here a variety of notes is kept. The preferred order is
chronological, as Book before Writer before Cost before Rating,
but this has not been entered consistently.
. All entries have a FIELD-IDENTIFIER
designator, as W(writer), R(rating), ... .
Fields as writers can have multiple entries, separated by commas.
If an award is associated with an entry, as an academy award for the
writers of a movie, it follows the name(s) after a semicolon (;).
For authors (also music directors), the title is specified as
B(author:book: "title")
Alt(T:title; reason) lists alternate titles, with a reason or date for the change, if known.
There is a general Notes field (Nt) which mainly record firsts,
as first sound movie, etc.
. Er() means possible error in the record, to be checked sometime in the
future from some source..
These note fields can be used to demonstrate the flexibility of
object-based structures, but are best place in distinct fields in relational
models.
Note entries SEEN and VT are private, indicating `when seen' or `have video
tape' information.
3.2. The ACTORS Table
There is one record for each actor listed, but not all actors listed in CAST are documented.
There are also break and header records for each letter of the alphabet.
Col-Name = Description
stagenm = Stagename of the actor. This is nearly the key of the table.
When an actor has used multiple names the last one used is preferred.
There are a few actors with identical names. Then the birthyear
(dob) becomes important.
dowstrt = Beginning of the "dates of work" interval: year of first movie
dowend = End of "dates of work" interval.
birthnm = Original last name.
firstnm = Original first name. Nick-names or other assumed names
in ().
gender = coded as M,F, and X for unknown, G for group, and A for Animal.
dob = Date of Birth. If not found in [ref]. If found, but date unknown *.
dod = Date of Death, if unknown or alive coded as \UN. Year+ indicates
also still alive in that year, mainly used for oldies.
type = Types of roles played by the actor; e.g., leading man, hero.
origin = Country of origin using COUNTRY-CODES
photo = Photos in reference books may be cited as [book.page(s)]
notes = Used mainly for Marriages(Mt), Lived-with(Lw), and Worked-with(W).
A code Cit(n) indicates how frequently the actor is cited in
CASTS.html. This field is used for maintenance, as a weight
of importance for completion of the data.
3.3 The PEOPLE Table
Directors are the major subset of the general people.html table.
Other entries are significant producers, writers, art directors and some authors.
Being a director is indicated in the Pcode field, and has some effect on
other fields.
There are also break and header records for each letter of the alphabet.
Col-Name = Description
id-name = The name of the movie person in standardized form.
These names are made to be unique. Intials may be prependended, and
special character codes omitted. This field is referenced by the
"director" field and by P:{references} in the MOVIES table.
Pcode = Codeletters {PDWACGV} indicating that the movie person a Producer, Director, Writer, Actor, Cinematographer, choreoGrapher, or a Visual or art director. They are often arranged chronologically.
Just being an actor does not justify an entry here, for those see the ACTORS table.
Did = If the person is a director (Pcode includes D), then this field contains an internally defined, unique 1, 2, or 3 letter identification code for the director, the director_id. It is made up by taking one or two letters of the first name, no or one letter of the middlename and one or two letters of the family name of the director. Because of the high frequency of `John', it is encoded as `I'.
This code is used a prefix to generate unique film_id's for all films directed by this director.
yearstart= First year of work, for directors the first year he/she directed a movie, it is preceded by a @. (start of the years interval).
yearend= Last year of work, or that the director directed.
lastnm = Given last name of the movie-person, may be spelled more precisely here than in the id-name field.
firstnm = Given first name of the movie-person. Nick-names or other assumed names in ().
dob = Date of Birth. If not found in [ref]. If found, but date unknown *
dod = Date of Death, or 190x, to indicate when known to be alive
backgrd = The director's birth country. If unknown \Un.
notes = This field is as in "actors". Female movie-people are identified as Ge(F), as a partial index. Special awards (not associated with a film) are shown as Aw().
3.4 The STUDIOS Table
Col-Name = Description
----------- -------------
name = Short name of the studio, may be standarized for reference.
company = Company that owns the studio.
city = City where the studio is located.
country = The studio's country.
fddate = Date the studio was founded or first opened.
enddate = Last date represented by the studio.
founder = The studio's founder.
successor = The fate of the studio.
notes = co founders, etc.
3.5 The CASTS Table
The large CAST file is broken up into sections by initial letter(s) of the directors identifying code.
In each section will be a number of directors, ordered by code. There is a distinct table for each director.
Col-Name = Description
Each director table has two types of entries
one header record giving the director's id, name, and format information.
multiple records for each movie and listed actor. There are no headers for distinct movies.
film_id = Identifier of the film. All film_ids used here appear in
MOVIES, and can be used as references.
title = Title of the movie, prefixed by T.
A prefix of TZ is used when he entry is uncertain, as the actor's name.
The title field is actually redundant here, because
is also given in MOVIES with the film_id. It is used to reduce the
the chances of errors and to reduce requirements for "joins".
actor = Name of the actor in this role, always using the standardized
stage_name, if the actor is listed in ACTORS.html
This field presents only a partial reference, illustrating dangling pointers.
If unknown, but role is important, then `sa' is used for `some actor'.
roletype= Type of the role. Similar to the "type" field in the ACTORS
table, but always encoded by a ROLE-TYPE. \Und means unassigned.
role = short description of the role prefixed by R:
If the trole is uncertain, the RZ: is used as the prefix
If the name used in the role is significant (as in Biographical Movies),
this role name follows in “quotes”, as R:king “Henry V”
If only the role name is known, then the prefix is RN:
If the role is unknown, then only RU: is entered.
awards = Awards given to this actor for this role. Optional field.
notes = Rarely used; only for something exceptional in the performance, as
`Nt(Garbo laughs)'. or Debut
3.6 The REMAKES Table
Col-Name = Description
film_id = Identifier of the remake. All film_ids used here appear in MOVIES.
title = Title of the remake; redundant., but essential for maintenance
year = Year when the remake was made. Note that this MUST be after the
year the original was made.
part = A fraction indicating how similar the remake is to the original.
The semantics used appear in the preamble of remakes.html
wasfilm = Identifier of the original film. All film_ids used here appear in
MOVIES; \UN is used where the film has not been identified,
wastitle= Title of the original movie, also redundant.
wasyear = Year of the original.
3.7 -- SYNONYMS table
This file has been superseded. All entries are now in the main file, as Alt(T:…) in the notes field.
Col-Name = Description
film-id = Unique film identifier
s-title = > Secondary, synonymous, title >
s-country = Country for secondary title
p-country = Country for primary title
p-title = Primary title; redundant
3.8 -- QUOTES table
Col-Name = Description
----------- -----------
film-id = Identifier of film where quote was taken.
title = Redundant title.
speaker = Actor selivering the quote.
role = Role of quoted actor.
listenr = Role of adressee.
quote = Text of quote.
3.9 -- AWARDS table
Col-Name = Description
----------- -----------
award = code of award used
agency = awarding agencies or authors
place = location or reference where award is given
3.10 -- AWARDS table
Col-Name = Description
----------- -----------
recepnt = Movie-person receiving the award, stage-name or id-name
award = Award type
year = Year awarded
reason = This table is mainly for awards not associated with films,
so that reasons my be `lifetime', `honorary', etc.
notes = as needed.
3.11 -- REFERENCES table
Col-Name = Description
----------- -----------
no. = Unique reference number in []
author = Author(s) name(s), up to colon (:)
title = Title of book
pub-inf = Publication information.
3.12 -- GEOGRAPHY table
Col-Name = Description
----------- -----------
code = Two- or three letter code for each country, as \Hu.
See 4.1 for the encoding used.
country-name = Full name, as `Hungary'
c-adjective = Adjectival form, as `Hungarian'
3.13 -- CATEGORIES table
Col-Name = Description
ctcode = Four-letter code
Movie categories
codecategory|codecategory|codecategory |
Ctxxuncategorized|
Actnviolence|Advtadventure|AvGaAvant Garde|
Campnow - camp|Cartcartoon|CnR Cops and Robbers|
Comdcomedy|
DisaDisaster| Docudocumentary| Dramdrama|
Epicepic|Famlfamily |Histhistory | Horrhorror|
Muscmusical|Mystmystery|
Noirblack|
Pornpornography|Romtromantic|
ScFiscience fiction|
Surlsureal|Suspthriller |
Westwestern|
3.14 -- COLOR-CODES table
Col-Name = Description
----------- -----------
color-code = Code following \
full = Full name for color process used for movies,
prc = unknown process, likely bnw prior to 1945, col after 1955
bws = black and white silent
bnw = black and white
TV = film made for TV – various processes
col = unknown color process
cld = recolored black and white
specific color processes
\Tcol = Technicolor – the best – three distinct layers
\DeLuxe = DeLuxe – a high intensity low-cost color process
\DuArt = DuArt
\Ecol = Eastmancolor by Kodak
\Mcol = Metrocolor – used by MGM
\Mlab = Movielab
\Wcol = Warnercolor used by Warners
\Acol = Anscocolor – another color process by Kodak?
\Agcol = Agfacolor
\Fcol = Fujicolor
\CS = Cinemascope -- widescreen, mostly color, 72mm film
\Trama = Technirama -- widescreen color
\Pan = PanaVision -- compressed wide screen, only noted at times.
\Vst = Vistavision
They are listed in the preamble for movies.macros.
3.15 ROLE-TYPES table
Col-Name = Description
----------- -----------
role-codes = Three-letter macro codes to specify role-types for actors.
They are listed in the preamble for casts.html.
Und means unassigned.
3.16 FIELD-IDENTIFIERS table
Col-Name = Description
----------- -----------
codes = Codes identifying subfields in various files.
They are listed below in Section 4.4..
Some general codes, appearing in many files, are:
Nt(note about something unusual)
Er(Possible error in some field, to be checked)
4: CODE TABLES
A fair amount of the information is encoded for consistency of reference.
Directors' names are always treated as codes, and many other movie people as
well. Names with out spaces are codes, and can be found in the PEOPLE relation.
Actors names are treated as codes as well, although here first names have
been retained. Many actors can be found in the ACTORS relation.
All movies have been assigned a code by catenating a director's identifier,
found in the PEOPLE relation with sequence digits.
Several code tables appear below, other used are
Remote code tables
-- ROLE-TYPES -- Codes that specify role-types for actors are listed in the preamble for
casts.html ROLES.
Local code tables
4.1 -- FIELD-DESIGNATORS --
These codes are used in certain filed to further identify the contents.
Check this, much changed when moving to HTML.
codedefinition|
T:film title|
T2: redefinition of title in
main.html SYNS.|
T3: title used for locale file in MAIN |
T4: title used for License plate list in MAIN|
T5: title |
T6: title used in
casts.html SAYINGS.|
T6: title used in
quotes.html.|
TS:not sure of actor spellingused in CASTS obsolete |
TZ:title from Movies-dir.html|
TZ:not sure if actor in this filmused in CASTS|
P: producer in PEOPLE|
PN: producer full name not yet classified|
PU: unknown producer not yet classified|
St: listed studionot yet consistent|
SN: studio name|
SU: studio name unknown|
SL: country or city of studio|
SD: distributor old codes: Dtr, Ds, Dis{\Dtr|
R: roleused in CASTS|
RZ: role uncertainused in CASTS|
RU: role unknownused in CASTS|
RN: only name in roleused in CASTS|
RS: spelling of actor's name unsureused in CASTS|
D: ................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- bubblesworth public library stephmansolf
- star wars breaks box office records breaking news english
- toss ups by jeremy white and ben lea
- genre analysis of film television with clips illustrating
- nebuchadnezzar houston was born on may 23 1952 and
- stanford university
- magnolia pictures independent films documentaries
Related searches
- stanford university philosophy department
- stanford university plato
- stanford university encyclopedia of philosophy
- stanford university philosophy encyclopedia
- stanford university philosophy
- stanford university ein number
- stanford university master computer science
- stanford university graduate programs
- stanford university computer science ms
- stanford university phd programs
- stanford university phd in education
- stanford university online doctoral programs