Interview Essay Questions
Anand Natrajan
4th April, 2016

Question 1: Judgement Call

Most decisions are made with analysis, but some are judgment calls not susceptible to analysis due to time or information constraints. Please write about a judgment call you've made recently that couldn't be analyzed. It can be a big or small one, but should focus on a business issue. What was the situation, the alternatives you considered and evaluated, and your decision-making process? Be sure to explain why you chose the alternative you did relative to others considered.

About six months ago, we began work with a partner who would send ad requests that our platform would have to fulfil by serving advertiser ads. One requisite of this integration was that the partner would "tell" us which advertisers were not permitted to show ads for a particular request. For any given request, the "blacklist" would be no more than half a dozen advertisers. Our platform could pick any advertiser as long as we honoured the advertiser blacklist for that request.

A key question that arose in that integration was the protocol for informing us about the blacklist. The partner's proposal was to create a dictionary containing unique numeric codes, one per advertiser. At ad request time, the partner would populate a request parameter with a list of such codes, thus "telling" us about the blacklist. I argued against that proposal on the grounds that the implementation would become too complex. My alternative proposal was for the partner to identify advertisers by normalised domain name, e.g., "Nike" would be identified as "nike.com", and "Amazon" as "amazon.com", not by a numeric code.

We had limited time to finish the integration, so there was not much time to analyse the pros and cons of each proposal objectively. Even if we had the time, it is hard to see how one would analyse both proposals: it is not feasible to build both in parallel and see which takes longer, which requires more maintenance over time, which is more error-prone, and so on.

In my judgement, my proposal was (and is) the better one. As a matter of fact, I was able to carry the day on that integration, and it was built as I proposed. My judgement was informed by experience and some philosophy.

Some years earlier, when RTB protocols were still being developed, our ad serving platform integrated with Google AdX to receive bid requests. At that time, Google wanted to build a protocol whereby it could conduct ad filtering on bid requests. To do so, bid responses from platforms like ours had to "tell" Google the advertiser on whose behalf we were bidding. I worked with Google on some of the use-cases involved in that project. We finally settled on domain name as the vocabulary for identifying advertisers.

That experience led me to consider the vocabulary of parameter values in API calls. I believe the vocabulary should be the one that is most universal and most unambiguous. Such a vocabulary makes implementations of APIs lightweight.
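To make the contrast concrete, here is a minimal Python sketch of the two proposals; the field names are hypothetical, not the actual integration's:

    # Partner's proposal: numeric codes, meaningful only with a shared dictionary.
    # The dictionary itself must now be exchanged, versioned and kept in sync.
    advertiser_dictionary = {17: "nike.com", 42: "amazon.com"}

    request_with_codes = {
        "id": "req-123",
        "blocked_advertisers": [17, 42],  # opaque without the dictionary above
    }

    # My proposal: normalised domain names, which are self-describing.
    request_with_domains = {
        "id": "req-123",
        "blocked_advertisers": ["nike.com", "amazon.com"],
    }

    def is_blocked(request, advertiser_domain):
        """Honour the per-request blacklist; trivial when values are self-describing."""
        return advertiser_domain in request["blocked_advertisers"]

    assert is_blocked(request_with_domains, "nike.com")

With domain names, either side can read a logged request and know immediately which advertisers were blocked; with codes, every log line requires a lookup.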
Specifying an API call is but the first step in making different services interoperate. Anyone can specify API calls, and there are plenty of tools that aid in creating API skeleton and stub code. The more difficult part is understanding what the values of the tightly-typed parameters actually mean. Sure, the API spec says that the parameter "country" contains an "integer", but what are the values of those integers? What are the permitted ranges? What does each value mean? How will the values change? How will we know when the values change? In the end, syntactic compliance turns out to be the least challenging part of using an API; semantic comprehension dominates the effort. As the effort becomes more onerous, the time to implement what seems like a simple API call grows longer, and the customers of the expected interoperation grow more impatient.

The usual recourse in such cases is to create a dictionary that translates those API parameter values into something human-readable. However, the dictionary is often more pain than relief. First, it is often an afterthought ("oops, I guess we have to create one, else how will the guy on the other side know?"). Second, the dictionary itself is now a contract, albeit a hastily slapped-together one. Usually, developers agree on "out of band" communication about how the dictionary itself will be exchanged (what format, what cadence, what protocol, etc.), or they construct another set of meta-API calls for the dictionary, with an attendant meta-dictionary for those calls, and so on. Third, the proliferation of dictionaries makes the API "heavy". What seemed like a fun-filled HTTP call with a protobuf payload now becomes a drudge where each parameter has to be looked up against that parameter's dictionary, all the while ensuring that each dictionary is the latest version available.

We can do better. We should select parameter values that can be understood organically by the widest audience, yet are precise. For example, the values of "country" in any API spec I control are usually an ISO standard, almost always ISO 3166-1 alpha-2. The codes are reasonably readable, e.g., "US" is the United States of America, "IN" is India, "TL" is Timor-Leste, etc. Contrast that level of readability with the values in the sister standard, ISO 3166-1 numeric (840, 356, 626). Of course, the alpha-2 codes are not as readable as the country names themselves, but country names can be ambiguous ("Georgia", "St. Martin", "Republic of Korea", the various "Guinea" countries), can change ("State of Palestine" was "Palestinian Territory, Occupied"), and can be specified differently ("USA", "United States", "America", etc.).

Similarly, an API spec that requires a "TLD" (top-level domain name) should simply use ".com", ".edu", ".co.uk", etc. (the IANA vocabulary). If we need "language", use locale-style codes (ISO 639-1). So far, this sounds like just a paean to using standards, but some values do not have hard standards, and that is where the judgement starts to matter more. Consider dates, and imagine a situation where passing a UTC timestamp is insufficient; the value must be an easily-printable date. Should we use the American convention, e.g., 08-30-1971? Or what most of the world uses, e.g., 30-08-1971? I prefer using either 30-Aug-1971 or, if the dates must be sortable, 1971-08-30. Yes, we could annotate the value with the format ("yyyy-MM-dd"), but we can be precise with fewer words. Consider a parameter like "browser", where there is no clear standard. I prefer enumerations like "chrome" and "ie_8" over obscure codes. Even without a dictionary swap, the enumerations make the values obvious. A simple namespace can distinguish between potentially-conflicting enumerations, e.g., "browser:opera" vs. "genre:opera".
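These preferences are easy to demonstrate. The following Python sketch (the enumeration values are mine, chosen for illustration) shows values that need no accompanying dictionary:

    from datetime import date
    from enum import Enum

    # Universal, unambiguous values need no dictionary swap.
    country = "IN"                          # ISO 3166-1 alpha-2: standard and readable
    when = date(1971, 8, 30).isoformat()    # "1971-08-30": precise and sortable

    # Where no hard standard exists, plain enumerations beat opaque codes...
    class Browser(Enum):
        CHROME = "chrome"
        IE_8 = "ie_8"
        OPERA = "opera"

    # ...and a simple namespace keeps potentially-conflicting vocabularies apart.
    browser_value = "browser:" + Browser.OPERA.value   # "browser:opera"
    genre_value = "genre:opera"                        # no collision with the above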
When making a judgement call, I like to understand the limits of that call. One cannot insist that campaigns be logged by name – there are way too many campaigns named "Test" to make that tenable. The best available alternative is an id, and yes, often it is an auto-incremented database id that is meaningless without a dictionary. Likewise, one cannot insist that creatives be logged with their body, no matter how universal that may be; the size of the body and the presence of special characters make parsing a nightmare, and we must resort to ids. Moreover, the injunction to use universal parameter values is limited to the API alone. Forcing the internals of a platform to keep using these values would be overbearing and counter-productive. For example, a good set of values for "dayOfWeek" is "Mon", "Tue", "Wed", etc., but a platform that consumes these values may store them internally as a bitmask for efficiency. Likewise, "hourOfDay" could be a bitmask. With some clever coding, even "browser", "OS" and "device" enumerations could be stored as bitmasks. Integer operations are faster than string operations, so it is reasonable for a platform to convert country codes or TLD values into integers for its internal operations. My guidelines do not prohibit any of those optimisations; all they specify are the values as they "enter" or "exit" the platform.
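The boundary between external vocabulary and internal representation might look like this in Python (a sketch of my own; no platform I describe necessarily works this way):

    # Universal values at the API boundary; a compact bitmask inside the platform.
    DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    DAY_BIT = {day: 1 << i for i, day in enumerate(DAYS)}

    def to_bitmask(days):
        """Convert API-facing values into the internal representation."""
        mask = 0
        for day in days:
            mask |= DAY_BIT[day]
        return mask

    def from_bitmask(mask):
        """Convert the internal representation back into API-facing values."""
        return [day for day in DAYS if mask & DAY_BIT[day]]

    weekdays = to_bitmask(["Mon", "Tue", "Wed", "Thu", "Fri"])
    assert from_bitmask(weekdays) == ["Mon", "Tue", "Wed", "Thu", "Fri"]
    assert weekdays & DAY_BIT["Sat"] == 0   # one integer AND, no string compares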
In my recent work, an API spec is often handed down to us by someone else, and there is not much we can do about it. Invariably, writing the first few lines of code to compile against the spec takes only a few days, if not a few hours. Negotiation over the semantics can take a month, much to the frustration of the business stakeholders.

In contrast, I control the spec between our campaign setup platform, our ad serving platform, our user data platform, and our reporting platform, so we try to operate under my guidelines. When the ad server logs impression features, it does so in universal yet unambiguous values, as described above. As a result, the downstream reporting team does not have to negotiate a dictionary for those values with the ad server team – simply using the values as-is suffices for rollups and reporting. The campaign setup team, in turn, does not have to negotiate a dictionary with the reporting team – it can display reports as-is. Of course, the setup team can prettify the values for display if it chooses to, but that decision does not affect the other teams. When I inherited these teams, they operated with a "shared dictionary" (in database tables). At that time, adding any feature required a massive round of coordination among the various teams, simply to agree on a new value for an existing parameter in the dictionary. Using more universal parameter values has decoupled these teams, though the work is far from done. As a result, the teams are more agile, and we have fewer dictionary-based meetings.

All of us prefer to make decisions objectively, based on evidence and a sound understanding of all of the facts and figures. But, as this question rightly postulates, some judgement calls are not amenable to analysis because of time or information constraints. At such times, we must rely on our wisdom to make the right decision. That wisdom comes from knowledge accumulated through experience.

Question 2: Inventive Thing

What is the most inventive or innovative thing you've done? It doesn't have to be something that's patented. It could be a process change, product idea, a new metric or customer-facing interface – something that was your idea. It cannot be anything your current or previous employer would deem confidential information. Please provide us with context to understand the invention/innovation. What problem were you seeking to solve? Why was it important? What was the result? Why or how did it make a difference and change things?

Building tools and support for parameter-space studies has always been an enjoyable endeavour for me. In 2001, I built tools for initiating, monitoring and restarting parameter-space jobs on grid environments. In 2002, I built a tool for plotting the results of parameter-space studies. Someday, I intend to write a tool that generates parameter-space jobs given the parameters and their permitted values.

Parameter-space studies are a class of computational experiments that rely on running a single job repeatedly with different input values. For example, imagine a program called topos that takes a parameter shape with values such as "square", "circle" and "octagon"; a second parameter intensity with integer values from 0 through 256; and a parameter temperature with real values such as -42.6, 212.3, 32.6, etc. A user running topos with one set of values for these parameters expects the job to return one value each for the output parameters height (a real value), pressure (a real value) and status (an enumerated value).

Scientists often run programs such as this hypothetical topos with different, independent sets of input values. Clearly, the number of sets of input values is combinatorially explosive. One challenge scientists face is scheduling and running jobs for the different sets of inputs. Mercifully, the various jobs are independent of each other, and so can be run in embarrassingly parallel fashion. However, when scientists run several of them in parallel, the bookkeeping to track which jobs have finished, failed, are running or are yet to run can become overwhelming. Collating all of the results and analysing them can be a major undertaking as well.

Circa 2001, technologies such as cloud computing were in their infancy. At that time, I worked for a research project named Legion at the University of Virginia. Legion attempted to build a planet-wide "grid computer" by bringing together heterogeneous compute and data resources owned by disparate organisations and distributed geographically. Legion provided a tool to run a single job on an anonymous machine that was part of the grid. After I joined the project, I worked on that tool to improve its handling of queuing systems and input/output files.

In talking to the users of the run tool, I noticed that they struggled to run large parameter-space studies. I discovered the rudiments of a tool for running such studies in the codebase, but it lacked any sophistication. I started from scratch and wrote a new tool, run-multi, for running such jobs. Over the course of several months, I augmented the tool to improve the monitoring and restarting of jobs.

One of my first innovations was eliminating the need for the user to specify the universe of jobs. Users would assemble their inputs in files, and specify a file pattern to my tool. My tool would detect the total universe of jobs from the input file pattern. From there, it was a short step to assemble outputs in files as well, specify a file pattern for those, and auto-detect which jobs were done. Now, users could restart the entire study easily – simply by running my tool again with the original parameters. When the tool started, it checked the inputs and outputs and automatically determined what remained to run. Doing just this much reduced the bookkeeping overhead on users dramatically, to near zero.
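The heart of that restart logic is simple to sketch (in Python here, which the original was not; the file naming below is illustrative): the universe of jobs comes from the input pattern, the completed jobs from the output pattern, and only the difference is run.

    import glob
    import os

    def remaining_jobs(input_pattern="inputs/job-*.in",
                       output_pattern="outputs/job-*.out"):
        """Universe minus done: everything still left to run."""
        def stems(pattern):
            return {os.path.splitext(os.path.basename(p))[0]  # e.g. "job-0042"
                    for p in glob.glob(pattern)}
        return sorted(stems(input_pattern) - stems(output_pattern))

    # Re-running the tool with the original parameters restarts the study:
    # any job that already has an output file is skipped automatically.
    for job in remaining_jobs():
        print("submitting", job)  # hand off to the grid scheduler here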
Next, I had been working on a separate tool to probe remote jobs. I combined that probe tool with my run-multi tool to keep probing running jobs and getting detailed status on each one. I now had a way to report canonically on the status of jobs running on disparate machines. I used that status to launch new jobs, with the next set of input parameters, on machines that were soon going to be idle. I also eliminated the burden of scheduling each job; my tool exploited the underlying grid infrastructure to identify available machines on which the next job could run.

With these under my belt, I wrote the only Tcl/Tk program I have ever written, to create a dashboard of sorts for monitoring jobs. The dashboard laid out the universe of jobs; showed the status of any requested job; showed which jobs were completed, running, failed or yet to run; and presented the status changes for each job graphically. One of the happiest days in my professional life was in Spring 2001, when I launched hundreds of jobs on remote machines, grabbed a cup of coffee, turned on some music, and watched my dashboard show me how jobs flitted in and out of various stages of success and failure, starting and running and finishing.

My tools were used to run several scientific parameter-space studies on our grid. One of those was a program that determined how a protein folded in a solvent under certain conditions (temperature, pH, etc.). My customer in this study was a scientist from the Scripps Institute. He and I worked closely to run a nationwide parameter-space study that was later published.

By 2002, I was working for a company where I was in the position of running parameter-space jobs myself. One of my tasks was to compare the performance of a few different data transfer protocols. To do so, I had to initiate multiple jobs sequentially, with each job specifying the protocol, the direction of the transfer and the file size. Each job would end with a status of success or failure, and the time taken to complete the transfer. Although I could not use my run-multi tool (because these were not grid jobs), my fascination with parameter-space jobs continued. This time around, I focused on the results of the jobs.

My tool for plotting the results of parameter-space studies had to function at a time when JSON was not yet popular, and when few sophisticated charting tools were available for free. I relied on a free, scriptable, 2D charting tool called jgraph that could produce PostScript output. My tool would take lines of data, one per job, with values specified in a "key=value" format. The user could tell my tool which values to plot on the X-axis, which on the Y-axis, which values to keep or ignore, how to resolve multiple Y values for the same X value, and whether or not to use log plots. Over time, I added the ability to draw 3D plots by handcrafting oblique and vanishing-point projections and plotting them as a 2D chart. The results of my charts were useful to my company; building my tool was loads of fun. Once again, I could fire off dozens of jobs, go out for a coffee, and return to find the jobs completed, the results fed to my charting tool, and a clear-eyed chart ready to explain the results to management.
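The input handling is easy to reconstruct in outline (a Python sketch of my own, not the original tool, which drove jgraph): one "key=value" line per job, user-chosen X and Y keys, and a policy for resolving multiple Y values at the same X.

    from collections import defaultdict
    from statistics import mean

    lines = [
        "protocol=ftp direction=up size=1024 time=3.2",
        "protocol=ftp direction=down size=1024 time=2.9",
        "protocol=http direction=up size=1024 time=4.1",
    ]

    def to_points(lines, x_key, y_key, resolve=mean):
        """Group Y values by X value, then resolve duplicates (mean, min, max, ...)."""
        groups = defaultdict(list)
        for line in lines:
            record = dict(pair.split("=", 1) for pair in line.split())
            groups[record[x_key]].append(float(record[y_key]))
        return {x: resolve(ys) for x, ys in groups.items()}

    print(to_points(lines, "protocol", "time"))   # {'ftp': 3.05, 'http': 4.1}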
To me, asking about the "most innovative or inventive" technical thing I have done is a bit like asking someone to pick a favourite child. I have written two search engines, a finite-automaton-to-regular-expression converter, a JIRA dependency analyser, a graphical user interface for grids, multiple games using the Unix curses library, an HTML table generator, algorithms to pace digital advertising campaigns, a policy for constructing user graphs, user id synching techniques on RTB inventory, an aggregate log sampling algorithm, a library for timing code components in production, a self-writing program for assigning desks to students, and numerous little tools and games.

I have architected and built significant portions of business features, e.g., pacing by revenue, balancing campaign delivery with performance, targeting on RTB vs. upfront inventory, major/minor key sorting in search engines, and general functions for aggregating search results by category. I have been an active participant in the architectural design of every major feature produced by every company I have worked for. Additionally, in my current role, I am responsible for process changes, such as a process for reacting to incidents on our platform, policies for auto-scaling, several processes for improving the quality of our platforms while increasing development velocity, and processes to minimise the disruption caused by personnel churn.

I am proud of many of my accomplishments. Several of the items listed above are company confidential; others are quirky but perhaps not quite consequential. However, my work on parameter-space studies is in the public domain, and it has benefited past organisations I worked for as well as scientists seeking a cure for Alzheimer's and other diseases.