Juniata College
Friday, February 28, 2020
Name ______KEY_________________

Python data concepts. Short answer. [15 pts]
For each of these Python data structures, describe its basic structure, purpose, and inherent limitations, and state whether the structure is immutable.

Tuple ()
    Structure:   Collection of values held in a particular order. Indexable with [i].
    Purpose:     Can represent a row, a record, or a collection of return values (any single purpose is accepted). Any value type, including structures, can be an element.
    Limitations: Immutable. Fixed in size too.
    Immutable?   Yes

List []
    Structure:   Same as a tuple, but you can change any element and remove, insert, or append elements.
    Purpose:     Can represent any collection and allows changes to its organization and content.
    Limitations: Rather unlimited; must be indexed by integers starting at zero.
    Immutable?   No

Dictionary {}
    Structure:   Collection of key:value pairs for fast lookup by the key.
    Purpose:     A collection indexed by anything, not just integers.
    Limitations: Keys must be unique (and hashable).
    Immutable?   No

Unix. Give the command to do the task as described. [15 pts]
    __ls__                    list the file names in your current working directory.
    __ls -la__                list the files and hidden files with all their details.
    __cd ..__                 change your working directory up one level.
    __cat small.txt__         display the contents of the file small.txt (more is also accepted).
    __more huge.txt__         display the contents of the file huge.txt so that you can look at it screen by screen.
        What key displays the next screen? __spacebar__
        What key do you press to go back one screen? __b__
        And how do you exit this command, especially if the file is huge? __q__
    __wc weather.csv__        show how many lines and characters there are in the file weather.csv.
    __grep Fog weather.csv__  list all the lines in weather.csv with the word Fog in them.
    __tail -n 15 huge.txt__   list the last 15 lines of huge.txt.

For the transforming code below, fill in the skeletal code to read a csv file and convert all empty cells ('') or 'unknown' entries to 'NA'. Remove the first value of each line, as it is just a row id number.
There are not meant to be errors in the code. [15 pts]

    import csv

    def transform3(inFile, outFile):
        inf = open(inFile, "r", encoding="utf-8")
        outf = open(outFile, "__w__", encoding="utf-8")
        csvinf = csv.reader(__inf__)
        csvout = csv.writer(__outf__)
        header = True
        for line in csvinf:
            if header:  # fix header info
                headers = line
                headers.pop(__0__)  # row id number deleted
                csvout.writerow(__headers__)
                header = __False__
            else:
                __line__.pop(__0__)
                for i in range(len(__line__)):  # we have to index the list to make changes
                    if str(line[i]) == __''__ or str(line[i]) == __'unknown'__:
                        line[i] = __'NA'__
                csvout.writerow(__line__)
        inf.close()
        outf.close()

Python has a powerful string formatting capability. Fill in the chart explaining the meaning of the various options available in this feature when using "{n: options}".format(a, b, c, d, ...). [10 pts]

    Symbol            Meaning
    {n}               The n refers to: the n-th value in the argument list (a, b, c, ...), counting from 0
    w.d               w = width in spaces and d = number of decimals
    s                 s denotes formatting the item as a string
    d, b, o, x or X   These are options to format the item as a __integer__ data type
    f or e            These are options to format the item as a __floating point__ data type
    <                 Does what to the data in the field? __left justify__
    >                 Does what to the data in the field? __right justify__
    ^                 Does what to the data in the field? __center__

Hadoop system concepts. [17 pts]
Give a typical file size that is worthwhile placing into a Hadoop system. __GB or TB__ [2]
A file is broken into blocks of what typical size in MB? __128__ [2]
Why are file blocks replicated in HDFS? Give two reasons. [4]
    - reliability
    - blocks can be more easily processed in parallel
Give a brief explanation of mapping. [3]
    Each data item is screened out or transformed to some other values and passed on to reduce.
Give a brief explanation of reducing. [3]
    Distill the data to totals or summaries.
Between the mapping and reducing steps, there is one other operation. What is it doing? [3]
    Ensuring one node gets the collection of like keys from across the system.

XML and JSON file types. [9 pts]
How are these file types similar? That is, what do they provide beyond a CSV? [3]
    They provide a means to represent a hierarchy of data.
What does self-describing mean in these file types? [3]
    Each value is tagged with what would be the column header.
How is XML more "wordy" than JSON? [3]
    <tag> data </tag> versus tag : data

SQL Select. Give a SQL query to generate the result set for the description. [19 pts]
Assume the schema of relations as we've used in class. Below is a reminder of the SQL syntax.

    Student (StuId, LastName, FirstName, major, credits)
    Faculty (FacId, Name, Dept, Rank)
    Class (ClassNum, FacId, Schedule, Room)
    Enroll (ClassNum, StuId, Grade)
    Department (DeptId, Name)

Quick syntax for SQL, where [] means optional and {op1|op2|...} means choice:

    SELECT [DISTINCT] {* | attribute-list | aggregate functions}...
    FROM table {, table | NATURAL JOIN table | LEFT OUTER JOIN table {USING(attr) | ON condition}}*
    WHERE condition
    [GROUP BY attribute-list [HAVING condition]]
    [ORDER BY attribute]

SQL conditions consist of <, >, <=, >=, <>, =, IS [NOT] NULL, AND, OR, BETWEEN value AND value, IN (SELECT ...)
Aggregate functions: COUNT(* | [DISTINCT] attr), MIN(attr), MAX(attr), SUM(attr), AVG(attr)

Generate a table to list the CSC faculty (all attributes) sorted by name. [4]

    SELECT *
    FROM faculty
    WHERE dept = 'CSC'
    ORDER BY name

Assume each course is 3 credits. Get a table of the number of courses each student, by name, has earned credits from. Give the computed column header a name. [5]

    SELECT lastname, firstname, credits/3 AS courseCount
    FROM Student

Get a table of Math majors' names and the courses they have taken. [5]

    SELECT lastname, firstname, classnum
    FROM Student s NATURAL JOIN Enroll e      [or: FROM student s, enroll e]
    WHERE s.major = 'Math'
    [AND s.stuid = e.stuid]                   (needed only if NATURAL JOIN was not used)

Get a table of all faculty, their names, and the number of courses they each teach (need to outer join and group by). [5]

    SELECT f.name, COUNT(classnum) AS Ncourses
    FROM faculty f LEFT OUTER JOIN Class c USING (facid)
    GROUP BY f.facid
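
The last query can be checked against a small in-memory SQLite database. The sketch below is only an illustration: the table and column names follow the exam schema, but the sample rows, the helper name courses_per_faculty, and the added ORDER BY (used only to make the output order predictable) are assumptions, not part of the key.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data,
# only the table and column names come from the exam schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Faculty (FacId TEXT PRIMARY KEY, Name TEXT, Dept TEXT, "Rank" TEXT);
    CREATE TABLE Class (ClassNum TEXT PRIMARY KEY, FacId TEXT, Schedule TEXT, Room TEXT);
    INSERT INTO Faculty VALUES
        ('F101', 'Adams',  'CSC',  'Professor'),
        ('F105', 'Tanaka', 'CSC',  'Instructor'),
        ('F110', 'Byrne',  'Math', 'Assistant');
    INSERT INTO Class VALUES
        ('CSC201A', 'F101', 'MWF9',   'H221'),
        ('CSC203A', 'F101', 'MWF11',  'M110'),
        ('MTH101B', 'F110', 'TuTh10', 'H225');
""")


def courses_per_faculty(connection):
    """Return (name, course count) for every faculty member, including those with no classes."""
    query = """
        SELECT f.name, COUNT(classnum) AS Ncourses
        FROM Faculty f LEFT OUTER JOIN Class c USING (facid)
        GROUP BY f.facid
        ORDER BY f.name  -- added here only for a predictable row order
    """
    return connection.execute(query).fetchall()


rows = courses_per_faculty(conn)
for name, n_courses in rows:
    print(name, n_courses)
```

With this sample data, Tanaka teaches no classes and still appears with a count of 0. That is the reason for the outer join: COUNT(classnum) skips the NULL produced for the unmatched row, whereas an inner join would drop Tanaka from the result entirely.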