Lesson 2: Your Universe of Stocks - Quantopian

Lesson 2: Your Universe of Stocks

Introduction

This is the second tutorial in our Quantopian tutorial series. The first lesson covered the basics of how to set up an algorithm, along with a momentum-trending strategy that used the `history()` method. You can find the details in the first lesson, but if you need a quick refresher:

- You need to set up an `initialize()` and a `handle_data()` method.
- You can grab historical data using the `history()` method, which returns a DataFrame with the time series as the index and the securities as the columns.

In this lesson, we're going to cover what it means to have a universe of stocks and the different ways you can use to set your universe.

So that means we'll be going over:

1. Basic Concepts:
   a. The definition of your universe
   b. The `set_universe` method
2. Algorithm Walkthrough:
   a. A daily rebalancing algorithm that equally weights all the securities within the Financial Sector of the S&P 500
3. Resources:
   a. Sample algorithms that we'll be using:
      i. singthenewuniversefunccallback
   b. Help docs:
      i.

Part 1: Basic Concepts

Summary

Your universe is the set of stocks that will be present in `data` at every call of your `handle_data()` method, so you can get an extremely quick glance at your universe by calling `for stock in data: print stock`. A more formal definition exists: your universe of securities is the union of:

- All the securities written by hand in your algorithm. E.g., any time you have `sid(24)` or `symbol('AAPL')` in your algorithm, that security belongs in your universe.
- Anything selected by a `set_universe` or `my_universe` method.
- Anything that currently belongs in your portfolio. That means anything you hold a position in belongs in your universe, even if that security is no longer trading.
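The union described above can be sketched in plain Python. This is only an illustration of the definition: the function name `current_universe` and the ticker strings (including the invented `OLDCO`) are assumptions made up for the example, and on the platform `data` is built for you rather than through a function like this.

```python
def current_universe(hardcoded, from_universe_method, held_positions):
    """Return the union of the three sources that make up a universe.

    All three arguments are iterables of security identifiers. The names
    are hypothetical and only mirror the three parts of the definition.
    """
    return set(hardcoded) | set(from_universe_method) | set(held_positions)


hardcoded = ['AAPL', 'SPY']             # e.g. referenced via symbol() or sid()
from_universe_method = ['XOM', 'AAPL']  # e.g. chosen by a universe method
held_positions = ['OLDCO']              # still held, even if no longer trading

universe = current_universe(hardcoded, from_universe_method, held_positions)
print(sorted(universe))  # ['AAPL', 'OLDCO', 'SPY', 'XOM']
```

Note that `AAPL` appears in two sources but only once in the result, and that the held position stays in the universe regardless of the other two lists.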

So why should you care? When you're testing your strategies, it's important to test them against a random basket of securities rather than ones you've handpicked yourself. While it's great that your strategy goes long or short on Apple based on a moving average, we know that Apple has immensely outperformed the market, so the results of your algorithm might be due to the choice of securities and not the algorithm itself.

Setting Up Your Universe

There are a couple of different ways to set up your universe. The first is quite simple, as it involves setting up your variables like you would in any other situation. The second entails two different methods, `set_universe` and `fetch_csv`, so we'll be focusing on those for the rest of the lesson.

1. Define for yourself which stocks you want to use specifically. For example, if you define two symbols (AAPL and SPY), these two will be the only ones present in `data` when `handle_data()` runs.
2. Use a universe method to either set a random basket of securities (defined by dollar volume rankings) or use the securities present in a CSV as your daily universe (dates have to match up):
   a. `set_universe(universe.DollarVolumeUniverse(floor_percentile=98.0, ceiling_percentile=100.0))`
   b. `fetch_csv(url, universe_func=my_universe)`
   c. The maximum number of random securities that can be present is 200.
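Here is a minimal sketch of the first approach. On the platform, `symbol()` and the `data` object are supplied for you; the stand-in `symbol()` function, the `Context` class, the fake `data` dictionary, and the return value of `handle_data()` are all assumptions added only so the snippet can run on its own.

```python
def symbol(ticker):
    # Stand-in for the platform's symbol() lookup (assumption): the real
    # function returns a security object, here we just return the ticker.
    return ticker


class Context(object):
    # Stand-in for the context object the platform passes to your functions.
    pass


def initialize(context):
    # Hand-pick the only two securities this algorithm will see.
    context.stocks = [symbol('AAPL'), symbol('SPY')]


def handle_data(context, data):
    # Iterating over data yields the securities in the universe.
    # Returning the list is only for this demo.
    return sorted(stock for stock in data)


context = Context()
initialize(context)
# Fake bar data keyed by security, standing in for what the platform provides.
data = {stock: None for stock in context.stocks}
print(handle_data(context, data))  # ['AAPL', 'SPY']
```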

Using the `set_universe` method

In a nutshell, `set_universe()` allows you to define a random list of securities that you can access in `handle_data()`. It does this by using a ranking mechanism (refer to the docs for more information), which has a maximum cap on stocks for both daily and minute mode. The ranking mechanism changes your list of stocks every quarter. But remember that any security you hold in your portfolio also belongs in your universe. So if you bought a security that was present in the universe for quarter 1 of 2013 and it's no longer there in quarter 2 of 2013, the security will still be in your universe because you have it in your portfolio. The same principle applies when you own a security that's no longer traded. The backtester will think you still have the stock in your universe, so if you try to trade that security again, it will throw an error because there's no more data for that stock.

Let's go through an example of what using the `set_universe()` method might look like. To start us off, I'm simply going to call the `set_universe()` method using the `DollarVolumeUniverse` method I previously mentioned, and then I'm going to print out all the stocks in our universe so we can see which securities are currently there.

The output of this is:

As you can see, we didn't define any of these stocks, but they're all still there because we used `set_universe()` with a `DollarVolumeUniverse` ranking. Right now we're grabbing the top 80 securities (1% ~= 80) with the highest dollar volume rankings and printing them out. This is quite useful for backtesting strategies where we want to avoid survivorship bias. However, it does present some problems when we're dealing with securities that are no longer trading (handling them takes extra algorithm-writing work). Now, there are times when, instead of a completely 'random' basket of securities, you'd want to use something like all the securities in the SPY, which is where the `fetch_csv()` method becomes very useful.
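To make the percentile idea concrete, here is a simplified sketch of picking securities from a dollar-volume percentile band. It is not the platform's actual ranking mechanism: the function `dollar_volume_universe`, the ticker names, and the volumes are all made up for illustration.

```python
def dollar_volume_universe(dollar_volume, floor_percentile, ceiling_percentile):
    """Pick securities whose dollar-volume rank falls in a percentile band.

    dollar_volume maps a ticker to its traded dollar volume. This is a
    simplified illustration, not the platform's ranking mechanism.
    """
    ranked = sorted(dollar_volume, key=dollar_volume.get)  # lowest volume first
    n = len(ranked)
    selected = []
    for position, ticker in enumerate(ranked):
        # 0 for the least traded ticker, just under 100 for the most traded.
        percentile = 100.0 * position / n
        if floor_percentile <= percentile < ceiling_percentile:
            selected.append(ticker)
    return selected


# Made-up volumes for 200 hypothetical tickers: T0 is the least traded.
volumes = {'T%d' % i: float(i + 1) for i in range(200)}
top = dollar_volume_universe(volumes, 99.0, 100.0)
print(top)  # ['T198', 'T199']
```

With 200 tickers, the top 1% band holds 2 of them. By the same arithmetic, a top 1% band that yields about 80 securities implies a pool of roughly 8,000.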

Part 2: Algorithm Walkthrough

Algorithm with `fetch_csv()` as the universe

The `fetch_csv()` method has an extremely large number of use cases and we'll be going over it extensively in the other tutorials. For now, we'll only be focusing on using it to import an external dataset filled with securities and dates, and setting our universe to match those securities. So let's go over our goals and plans for this algorithm:

Goals: We want to load in all the stocks that belong in the financial sector of the S&P500 and equally weight our portfolio by rebalancing each day, at the beginning of day. Please refer to the algorithm found in this post (or the sample algorithm listed in the introduction section).

If we break that down into its individual parts, we have:

1. Get a list of stocks that belong in the S&P 500
2. Set that list of stocks as our universe of securities
3. Rebalance our portfolio at the beginning of day

I'll walk you through each of these sections step by step.

1. Get a list of stocks that belong in the S&P 500

This part isn't too hard. We can find the list on Quandl. It contains all the securities in the S&P 500 as of October 10, 2013, and shows the ticker symbol, the Quandl code, the name of the company, and the GICS sector it belongs to. Here's a snapshot of the data:

I cannot emphasize this enough. The format of your CSV matters a lot. Unless you're comfortable enough with Python to format the data yourself, there is no better thing you can do than to format your CSV correctly. I repeat, format your CSV correctly and well. But just in case you find yourself in a position where your CSV isn't optimally formatted, like it is now, I'll walk you through how to fix that.

Pause and Side Lesson: What a good CSV looks like

The documentation goes over the specifics, but I'm going to give you a quick one-sentence summary of what your CSV needs:

There has to be a date column along with the signal that you want to use, and if you're mapping it to symbols, there should be a symbol column.
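As a rough illustration of that rule, the sketch below checks a CSV header for the columns mentioned above. The helper `missing_columns`, the `signal` column name, and the sample rows are assumptions made up for the example; this is not part of the platform.

```python
import csv
import io


def missing_columns(csv_text, required=('date', 'symbol')):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [name for name in required if name not in header]


# A header with a date, a symbol, and a (made-up) signal column.
good = "date,symbol,signal\n10/01/2013,JPM,1\n"
# A header shaped like the S&P 500 list: no date column, 'Ticker' not 'symbol'.
bad = "Ticker,GICS Sector\nJPM,Financials\n"

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['date', 'symbol']
```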

As I noted in the side lesson above, the CSV needs a date column. Unfortunately, ours doesn't have one right now, so you can go about changing this in two ways:

- Add a column for date with 10/1/13 in Excel or your choice of editor
- Add a column to the DataFrame in your `pre_func()` function in Fetcher

This brings us to our next section:

2. Set that list of stocks as our universe of securities

Fetcher is a beast of its own. There are a lot of caveats to using Fetcher and it definitely takes a lot of practice to get used to. I'll give you a simple overview of how it works. One of the next webinars will take an in-depth, complete look at how Fetcher works under the hood, so stay tuned for that if you'd like to learn more.

- Fetcher calls your CSV => converts it to a df => applies your `pre_func` method to it
- After your `pre_func`, it applies your `post_func` => calls `universe_func`
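The order of those steps can be modeled with a small stand-alone sketch. This is a simplified model under stated assumptions, not Fetcher's implementation: `run_pipeline` is a hypothetical name, the parsed CSV is a list of dicts rather than a DataFrame, and the real `universe_func` also receives the algorithm's context.

```python
def run_pipeline(rows, pre_func=None, post_func=None, universe_func=None):
    """Apply the optional callbacks in the order described above.

    rows stands in for the parsed CSV. Returns the processed rows and
    the universe chosen by universe_func (or None if it is not given).
    """
    if pre_func is not None:
        rows = pre_func(rows)
    if post_func is not None:
        rows = post_func(rows)
    universe = universe_func(rows) if universe_func is not None else None
    return rows, universe


calls = []  # records the order in which the callbacks run


def pre(rows):
    calls.append('pre_func')
    return rows


def pick_universe(rows):
    calls.append('universe_func')
    return {row['symbol'] for row in rows}


rows, chosen = run_pipeline([{'symbol': 'JPM'}], pre_func=pre,
                            universe_func=pick_universe)
print(calls)  # ['pre_func', 'universe_func']
```

Because no `post_func` is passed, the sketch goes straight from `pre_func` to `universe_func`.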

Remember that our CSV doesn't have a date column in it. That means we have to add it ourselves, which is why the `pre_func` parameter points to the `add_date_column` method, which is defined below:

The `add_date_column` method simply adds a column called 'date', fills it with the value "10/01/2013", and renames the 'Ticker' column to 'symbol'. We have to rename the column because Fetcher, internally, looks for the `symbol` column to map our stock tickers to the Quantopian 'sid' object.
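A minimal sketch of what such a `pre_func` might look like, assuming it receives a pandas DataFrame with 'Ticker' and 'GICS Sector' columns and returns the modified DataFrame. The sample rows are invented for the demonstration, and the original tutorial's exact code may differ.

```python
import pandas as pd


def add_date_column(df):
    # Rename 'Ticker' to 'symbol' so the tickers can be mapped to securities.
    df = df.rename(columns={'Ticker': 'symbol'})
    # Every row gets the same date, since the list is a single snapshot.
    df['date'] = '10/01/2013'
    return df


# Two illustrative rows shaped like the S&P 500 list described earlier.
sample = pd.DataFrame({'Ticker': ['JPM', 'XOM'],
                       'GICS Sector': ['Financials', 'Energy']})
result = add_date_column(sample)
print(result)
```

Because `rename` returns a new DataFrame, the frame passed in is left untouched in this sketch.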

Once pre_func completes, Fetcher will look for `post_func` but since we don't have a post_func in this case, it'll go straight to the `universe_func` method. In our case, the `universe_func` references a method defined as `my_universe`.

There's a lot of complicated stuff happening here, but the overall goal of the universe_func is simply to do this: return the list of sids that you want to be using. And when you break down the code in this method, that's exactly what you have.

1. We're taking only the parts of the DataFrame where 'GICS Sector' == 'Financials'. This single line already isolates only securities from the financial sector of the S&P 500.
2. We're taking a set of all the sids that exist in financials. This means we're taking all the unique sids in financials.
   a. Remember that Fetcher has already transformed a lot of things in the DataFrame that you've passed in. The 'symbol' column has been renamed to 'sid'.
3. Then we count all the sids that we have by defining `context.count`, and use that to define an equal weight for all the securities in there.
4. We return the original set/list of sids we got from `set(financials['sid'])`.
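The four steps can be sketched as follows. This is a reconstruction under assumptions rather than the tutorial's exact code: the attribute name `context.weight`, the `Context` stand-in class, and the sample DataFrame (with ticker strings standing in for sid objects) are hypothetical.

```python
import pandas as pd


class Context(object):
    # Stand-in for the context object the platform passes in (assumption).
    pass


def my_universe(context, fetcher_data):
    # Step 1: keep only the rows in the Financials sector.
    financials = fetcher_data[fetcher_data['GICS Sector'] == 'Financials']
    # Step 2: the unique sids among those rows.
    sids = set(financials['sid'])
    # Step 3: count them and derive an equal weight per security.
    context.count = len(sids)
    context.weight = 1.0 / context.count if context.count else 0.0
    # Step 4: return the sids that should make up the universe.
    return sids


# Illustrative data: ticker strings stand in for sid objects.
fetcher_data = pd.DataFrame({
    'sid': ['JPM', 'BAC', 'XOM', 'JPM'],
    'GICS Sector': ['Financials', 'Financials', 'Energy', 'Financials'],
})
context = Context()
sids = my_universe(context, fetcher_data)
print(sorted(sids), context.count, context.weight)
```

The guard against an empty result avoids a division by zero if no rows match the sector filter.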
