Dear Analyst

Episode #45: Thinking long-term for structuring your dataset using U.S. public food assistance data

Al — Mon, 28 Sep 2020 10:05:26 GMT

This post originally appeared on the KeyCuts blog.

When you need to capture some data in a structured way, you’ll open up an Excel file or Google Sheet and just start throwing data into the spreadsheet. Not much thinking; just copy and paste. As that dataset grows, the original structure you had set up for that spreadsheet may not be ideal. Specifically, the dataset is not ideal for putting into a PivotTable. Long-term, I’d argue that all your spreadsheets should be structured in a way that’s suitable for a PivotTable (which makes it ready for storing in a traditional database). This post explores how you can structure a dataset that looks like 99% of data out there into a structure you can analyze in a PivotTable. Link to the Google Sheet is here.

Video walkthrough of Google Sheet here.

Why this is important

Telling someone that their data should be structured is a platitude like “such is life” and “forgive and forget.” Let’s be more specific in how this statement can impact your work.

To be specific: 9 times out of 10, structure your data so that it can always be analyzed in a PivotTable.

Consider this scenario:

* Your accounting team needs your group to start forecasting expenses for next month’s budget

* You start gathering the data and throw it into a spreadsheet

* Every month new data gets added to the spreadsheet, and perhaps the CFO wants to get more granular analyses on the forecast

* You start adding additional columns to the spreadsheet and perhaps summary tables in other sheets in the file

* Other teams now need to see your data to understand how your team’s decisions will impact their decisions

* This spreadsheet ends up being too hard to maintain, so there’s an internal project to put this data into a real database (some ERP solution)

* One quarter of planning goes by, and another quarter for implementation

* 6 months later, the business has changed, the structure of the database needs to be adjusted, and the data engineer role still needs to be filled

This concocted scenario is quite extreme, but the key lesson is this:

Focusing on the schema and structure of your spreadsheet today takes time and requires you to think about how your data will be used and maintained in the future.

U.S. public food assistance dataset

I’ve recently started browsing Kaggle to find interesting datasets, and this one caught my attention since it looks at spending and household participation related to a public food assistance program called SNAP. As the creator of the dataset discusses, there are many issues with collecting government datasets. Data is spread out across different agencies, there are multiple formats, and data is sometimes aggregated. This makes consolidating the data a pain. These problems may sound familiar if you’re working at a large organization.

The “Raw” sheet in the Google Sheet simply shows the cost, households participating, and total people associated with the SNAP program for the 2019 fiscal year across four states (CA, IL, LA, NY):

In your organization, this could be sales data, headcount data, COGS, whatever. The key thing about this dataset is that you have all the numbers organized by month across the top. This table would be great for a simple time series analysis where you may want to see the cost per household for California over time. But what if you need to build out a more dynamic dashboard looking at various metrics for just a few months or a subset of states?

Pivoting this data

If you create a PivotTable with this data, you’ll run into the issue of having to select individual month names to put into the Values section of the PivotTable builder. We only have twelve months of data for FY19; imagine if we had to do this for ten years’ worth of data going back to FY09.

Some people asked me what a “denormalized dataset” means in the context of Excel/Google Sheets after I mentioned the term in the previous episode. We need to “denormalize” this data so that it’s easier to pivot off of. This means putting in data that may repeat itself in a certain column, but this helps with structuring the data properly for a PivotTable.

In Excel, there is a hacky way of denormalizing your data, and it involves going through the antiquated PivotTable wizard (which I believe you can only access via old Excel keyboard shortcuts). I don’t think the PivotTable wizard is available in the ribbon in recent versions of Excel.

This video below shows you how to do it. It involves checking a radio button for “Multiple consolidation ranges” and then double-clicking in the grand total of the sum of Values in the PivotTable. It’s not pretty, but it works:

Unfortunately for Google Sheets users, that PivotTable wizard isn’t available. If you find a similar workaround let me know.

Moving time periods to rows in Google Sheets

Whenever you see time periods (in this case, months in 2018 and 2019) organized across the columns, think about how you can put those time periods into one column. This starts the process of denormalization. You want something that looks like this:

When you pivot off of the Period column in the PivotTable, you can then filter for and group your values by specific dates:
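If it helps to see the reshaping spelled out, here is a minimal JavaScript sketch of moving month columns into a single Period column. The column names, the `unpivotPeriods` helper, and the numbers are all assumptions for illustration (placeholder values, not the real SNAP figures from the Google Sheet).

```javascript
// Hypothetical wide table: one row per state/metric, one column per month.
// The numbers are placeholders, not real SNAP figures.
const wideRows = [
  { State: "CA", Metric: "Cost", "Oct 2018": 100, "Nov 2018": 110 },
  { State: "CA", Metric: "Households", "Oct 2018": 10, "Nov 2018": 11 },
];

// Turn every month column into its own row with a Period and a Value.
function unpivotPeriods(rows, idColumns) {
  const longRows = [];
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      if (idColumns.includes(column)) continue; // keep State and Metric as-is
      const longRow = {};
      for (const id of idColumns) longRow[id] = row[id];
      longRow.Period = column;
      longRow.Value = value;
      longRows.push(longRow);
    }
  }
  return longRows;
}

const longRows = unpivotPeriods(wideRows, ["State", "Metric"]);
console.log(longRows.length); // 4 (2 original rows x 2 month columns)
```

Each new month now adds rows instead of another column, which is the shape a PivotTable can filter and group on.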

Moving metrics from rows to columns

In the original data set, there’s a Metric column which contains metrics we care about for each state (Cost, Households, and Persons). This structure will make a PivotTable very hard to organize and analyze because you will have to filter for a specific metric in order to get any meaningful statistics from your dataset. Additionally, this structure is mixing data types (e.g. Cost is in dollars and Households is a number).

Whenever you see metrics organized in this manner, think about moving each individual metric to its own column:

Now, each of these columns is a value you can drag and drop into the “Values” section of the PivotTable. This means you can get summary results or drill down into a specific state’s numbers:
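The same idea as a rough JavaScript sketch: rows that carry a Metric label get collapsed into one row per State and Period, with each metric in its own column. The `spreadMetrics` helper and the sample values are assumptions for illustration, not the data in the Google Sheet.

```javascript
// Hypothetical long rows with a Metric label; values are placeholders.
const metricRows = [
  { State: "CA", Period: "Oct 2018", Metric: "Cost", Value: 100 },
  { State: "CA", Period: "Oct 2018", Metric: "Households", Value: 10 },
  { State: "CA", Period: "Oct 2018", Metric: "Persons", Value: 20 },
  { State: "NY", Period: "Oct 2018", Metric: "Cost", Value: 90 },
];

// Group by State + Period and give each metric its own property (column).
function spreadMetrics(rows) {
  const byKey = new Map();
  for (const { State, Period, Metric, Value } of rows) {
    const key = State + "|" + Period;
    if (!byKey.has(key)) byKey.set(key, { State, Period });
    byKey.get(key)[Metric] = Value;
  }
  return Array.from(byKey.values());
}

const tidyRows = spreadMetrics(metricRows);
console.log(tidyRows[0]);
// { State: 'CA', Period: 'Oct 2018', Cost: 100, Households: 10, Persons: 20 }
```

Every column now holds a single data type, so Cost can be summed in dollars while Households is counted separately.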

Transposing the data

Setting up the data structure to look like the structure in the “Solution” sheet of the Google Sheet does take a little spreadsheet gymnastics. The easiest method I’ve found is to apply the TRANSPOSE function to the original dataset and then do some copying/pasting. Here’s what a TRANSPOSE looks like:

The nice thing about this function is that it puts all your time periods (months in this case) into their own column. Each metric is also organized from top to bottom. The problem is that each state’s data is still organized across the top. At this point, you’re doing a copy and paste to consolidate the 13 columns that result from the TRANSPOSE function into the 5 columns we ultimately care about: State, Period, Cost, Households, and Persons.

Setting things up for a database

You may be wondering what other benefits there are for having this data structure besides the ease of creating a PivotTable. If your data ever ends up in a regular database (e.g. SQL), this is the ideal data structure for that tool.

I’ve seen scenarios at different organizations where an Excel file or Google Sheet has hundreds of thousands of rows that represent critical business data cobbled together over time. There comes a point in time from an organizational perspective where that data needs to be put into a database for ease of querying. A data engineer will have to do some data manipulation or run an ETL process to convert the data into a suitable format for a database. Guess what? You can help your data engineer out by getting this structure correct from day one.

Data down good 😀, data right bad 😔

To summarize how your data should “grow” over time (big data ain’t going nowhere), your data should NOT grow right:

Instead, it should grow down:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* Acquired Podcast Season 7 Episode #3: Epic Games

* People I (Mostly) Admire Ep #2: Mayim Bialik

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit alchen.substack.com

Episode #43: Setting up workflows that scale – from spreadsheets to tools & applications

Al — Mon, 14 Sep 2020 10:05:08 GMT

This episode is the audio from a presentation I gave a few weeks ago to members of Betaworks based in NYC. Betaworks is a startup accelerator, co-working space, and community of founders. No-code is a pretty hot topic right now, and in this presentation I talk about how spreadsheets are one of the first no-code “platforms” and how your spreadsheet skills can be extended to build real tools. The presentation is adapted from a talk I gave last year at Webflow’s No-Code Conference. I embedded the “slides” at the bottom of the post, and here is a link to the slides if you want to look on your own.

Summary of presentation

* The skills you’ve learned in Excel/Google Sheets, including data structuring, translate to building workflows for any part of your business

* Thinking beyond spreadsheets as a way to do data analysis or “number crunching”

* Any tool that helps automate or solve some workflow at your company can be built with spreadsheets

* Why learning spreadsheets can set you up well for learning “no-code” tools

Spreadsheet examples from presentation

During the presentation, I showed actual spreadsheets (Excel and Google Sheets) I’ve built in the past for freelance clients and friends. The main concept I’m trying to convey is that each of these spreadsheets looks and feels more like an application than a model that forecasts out certain values. Each of these examples consists of three core elements:

* Database – A place to store information

* User Input – Fields and forms for someone to fill out

* Calculations/Display – Formulas (e.g. “business logic”) to make the spreadsheet output something for you (the administrator) or the user

My 2 cents: When you’re building an application in a spreadsheet, you’re extending the original purpose and audience Excel and Google Sheets were meant to serve: financial models for accountants. But this is what makes the spreadsheet so versatile. The fact that an analyst can string together formulas to make a spreadsheet look and feel like an application is what gives the spreadsheet power. This innovation also pushes Microsoft, Google, and other platforms to release new features that give analysts the ability to build tools, not just models.

I’ve written extensively about this subject in the past, so I will leave my soliloquy at that. On to the examples.

Bachelorette planning Google Sheet

The first example I discuss is this bachelorette party planning Google Sheet I built for a friend. This spreadsheet has been duplicated quite a few times by friends of friends, and all it does is help a bride-to-be figure out which weekend works best to have a bachelorette party.

The key insight is that the database is everything from column B onwards and row 3 and below. All the availability for each person is captured in each of these cells and there’s some conditional formatting to give the bride a visual indicator to see when a weekend is available.

The user input is the ability for each friend the Google Sheet is shared with to edit the cells. “Yes,” “No,” and “Maybe” are the only inputs that matter for this Google Sheet. Finally, the calculations are in rows 31-33, which tally up the user inputs for each weekend so the bride can see which weekend is the “most free” for her friends.
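As a rough picture of what those tally rows are doing, here is a small JavaScript sketch that counts the “Yes,” “No,” and “Maybe” answers per weekend. The weekend labels, the grid, and the `tallyResponses` helper are made up for illustration; the actual sheet does this with formulas.

```javascript
// Hypothetical availability grid: one row per friend, one column per weekend.
const weekends = ["Jun 6", "Jun 13"];
const responses = [
  ["Yes", "No"],
  ["Yes", "Maybe"],
  ["No", "Yes"],
];

// Count each allowed answer per weekend column; anything else is ignored.
function tallyResponses(grid, weekendNames) {
  const allowed = ["Yes", "No", "Maybe"];
  return weekendNames.map((name, col) => {
    const tally = { weekend: name, Yes: 0, No: 0, Maybe: 0 };
    for (const row of grid) {
      const answer = row[col];
      if (allowed.includes(answer)) tally[answer] += 1;
    }
    return tally;
  });
}

const tallies = tallyResponses(responses, weekends);
console.log(tallies[0]); // { weekend: 'Jun 6', Yes: 2, No: 1, Maybe: 0 }
```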

There are countless iPhone and Android apps you can download to do this exact same thing, but this spreadsheet just does one thing and one thing well: help brides figure out which weekend to plan a bachelorette party.

Splitting costs with friends

This splitting costs with friends blog post is by far the most popular post on my blog since I published it in 2014 (thanks Google search!). Every day I still get requests to give people edit access to the Google Sheet (please just make a copy of it instead of requesting edit access). Here’s the Google Sheet if you want to make a copy for yourself.

Similar to the previous example, the database is all the items, costs, and who participated in the cost from rows 2 and down. The user input are the cells themselves, but the most important part of the Google Sheet are the 1s and 0s from column C onward. Those 1s and 0s represent whether a friend or family member “participated” in the cost. This allows the spreadsheet to do some basic calculations to figure out who owes what.

Rows 26-28 are the calculations that the trip organizer can see at a glance to see who is owed or who owes money. Again, there are numerous apps and custom tools you can pay for or download to split costs with friends, and this Google Sheet mimics the features of those apps in a more bare-bones way.
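Here is a minimal JavaScript sketch of how the 1s and 0s can drive the “who owes what” math. The friend names, the amounts, the `paidBy` field, and the `computeBalances` helper are assumptions for illustration; the Google Sheet may lay out its calculations differently.

```javascript
// Hypothetical expenses. Each `shared` array holds a 1 or 0 per friend,
// in the same order as `friends`, marking who participated in the cost.
const friends = ["Ann", "Bo", "Cy"];
const expenses = [
  { item: "Cabin", cost: 300, paidBy: "Ann", shared: [1, 1, 1] },
  { item: "Groceries", cost: 60, paidBy: "Bo", shared: [1, 1, 0] },
];

// Positive balance = this person is owed money; negative = they owe money.
function computeBalances(friendNames, items) {
  const balances = {};
  for (const name of friendNames) balances[name] = 0;
  for (const { cost, paidBy, shared } of items) {
    const participants = shared.reduce((sum, flag) => sum + flag, 0);
    if (participants === 0) continue; // nobody to split with
    const share = cost / participants;
    balances[paidBy] += cost; // the payer fronted the full amount
    friendNames.forEach((name, i) => {
      if (shared[i] === 1) balances[name] -= share;
    });
  }
  return balances;
}

const balances = computeBalances(friends, expenses);
console.log(balances); // { Ann: 170, Bo: -70, Cy: -100 }
```

The balances always sum to zero, which is a handy sanity check on the 1s and 0s.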

Patient intake system

This example shows when the spreadsheet is really extended beyond what it was intended to do. This was for one of my consulting clients who needed a new CRM system for managing new patients at their clinic.

The Excel file basically lets the operations manager at the clinic quickly “move” new patients from one spreadsheet to another using a VBA macro. To mimic the look and feel of an application, I drew these blue and green buttons using the shape feature in Excel and tied a macro to each button. The database consists of patient details, the user input is simply each row of data, and the calculations involve these macros that move data from one spreadsheet to another.

This gets into an important concept that an Excel file or Google Sheet is not that great for: workflows. Since everything is usually calculated in real-time in a spreadsheet, it can be difficult to do an if-this-then-that type of workflow without using a macro or script (see my last post on automating a tedious filling values down task).

“Slides” from Betaworks presentation

The rest of the presentation includes tools and tips for building applications with other no-code tools. Link to slides.

Original talk from Webflow’s No-Code Conference in 2019.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* No other podcasts for this episode given how long this episode is!

Episode #42: Filling values down into empty cells programmatically with Google Apps Script & VBA using SPAC data

Al — Mon, 07 Sep 2020 11:00:04 GMT

SPACs (Special Purpose Acquisition Companies) or “blank check” companies have been in the news recently, so I used some real SPAC data for this episode. Your spreadsheet has empty cells in column A, and these empty cells should be filled with values. Your task is to fill each value down until you find another cell with a value, at which point you need to fill that value down. This episode walks through how to do this programmatically with a script in Google Apps Script (for Google Sheets) and VBA (for Excel). This is the Google Sheet associated with the episode. The Google Apps Script is here and the VBA script is here. See a quick example of what the issue is in the gif below and how the script “fills in” the values for you.

See the video below if you want to jump straight to the tutorial:

Why is this data structure a problem?

You’ve inherited a spreadsheet and the data structure looks like this:

It’s a list of data but there are empty cells in column A. This is usually a category or dimension in your data set that needs to be “filled down” so that the data set is complete. In the Google Sheet, each row represents one person that is associated with a given SPAC, but the SPAC Ticker column is incomplete. You’ll usually get this type of data structure through the following:

* Data was manually created by someone who didn’t fill down the values in column A since they thought it was a “category”

* You are working with a data set that originally came from a PivotTable but you only have the “values” from the PivotTable, not the PivotTable itself

This data structure is a problem because if you want to do any type of analysis on this data, it will be extremely difficult since you have missing values in column A. Sorting, filtering, and PivotTables are all out of the question if your data set looks like that screenshot.

Solving this with keyboard shortcuts

Totally doable for this Google Sheet. This is what you could do:

All I’m doing above is the following (on PC):

* SHIFT+CONTROL+DOWN ARROW – Select all the empty cells from the current cell with a value up until the next cell with a value

* SHIFT+UP ARROW – Reduce the selection by one row

* CONTROL+D – Fill the value from the first cell in the selection down

* CONTROL+DOWN ARROW – Skip to the next value that needs to be filled down

The obvious tradeoff here is time vs. human error. Every time I have to do this task on a spreadsheet, I think about whether it is worth filling the values down “manually” using keyboard shortcuts or using a VBA script (in Excel) to do this programmatically. It really depends on the number of rows. For the example SPAC Google Sheet, doing this with keyboard shortcuts takes 10 seconds tops. If this spreadsheet had 1,000,000 rows, then we’d have a problem.

Don’t worry, I got you. Here’s the script you can use to do this programmatically.

Using Google Apps Script in Google Sheets

First off, here’s the script you can use for Google Sheets (gist here). Just 14 lines of code and you’re good to go:

function fillValuesDown() {
  var spreadsheet = SpreadsheetApp.getActive()
  var currentRange = spreadsheet.getRange("A2:A" + spreadsheet.getLastRow())
  var newRange = []
  var newFillValue
  currentRange.getValues().map(function(value) {
    if (value[0] !== '') {
      newFillValue = value[0]
      newRange.push([newFillValue])
    } else {
      newRange.push([newFillValue])
    }
  })
  currentRange.setValues(newRange)
}

Never used macros or Google Apps Script before? It’s super simple. First go to Tools then Script Editor:

You may be asked to authenticate your Google account so just hit Yes to all those screens. Copy/paste the script into the editor:

Go to File and Save in order to save the script into the Google Apps Script project. Go back to Google Sheets and go to Tools, Macros, and click Import to import the fillValuesDown function into Google Sheets. Now you can use this function as a macro in your Google Sheet:

You can close out the Google Apps Script editor and now click on Tools, Macros, and click on fillValuesDown to run the script on your dataset:

How does the script work?

The script utilizes the Spreadsheet service for Google Apps Script to access the data object for your Google Sheet (more on that below). The script is really only 12 lines long, and does the following in sequential order:

* Sets the spreadsheet variable so that we can use the active worksheet you’re on

* Sets the currentRange variable to start from A2 to the last row in the table

* Two more variables are set: newRange to store the new range of values we want to put into column A, and newFillValue which is kind of like an intermediate variable used in the loop

* The script goes through all values in currentRange (including the blank ones) and adds all the correct values to the newRange array

* The currentRange is then set equal to newRange to get all the “correct” values into column A

On the backend, the currentRange array looks like this:

[['HZAC'], [''], ['FST'], [''], [''], ['']...]

The purpose of newRange is to create a new array that is a complete list of values:

[['HZAC'], ['HZAC'], ['FST'], ['FST'], ['FST'] , ['FST']...]
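To try the fill-down logic without opening a Google Sheet, here is a small standalone JavaScript sketch of the same idea. It assumes blank cells arrive as empty strings (which is what `getValues()` returns for empty cells); the `fillDown` name is just for illustration and is not part of the script above.

```javascript
// Takes a column as an array of one-element rows and carries the last
// non-blank value down into the blank rows below it.
function fillDown(values) {
  let lastSeen = "";
  return values.map(([cell]) => {
    if (cell !== "" && cell !== undefined) lastSeen = cell;
    return [lastSeen];
  });
}

const filled = fillDown([["HZAC"], [""], ["FST"], [""], [""]]);
console.log(JSON.stringify(filled));
// [["HZAC"],["HZAC"],["FST"],["FST"],["FST"]]
```

Because this version returns a new array instead of writing to the sheet, you can sanity-check the output before passing it to something like `setValues`.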

Recording macros vs. programming Google Sheets

When I first started learning macros, the first thing I did was record my keystrokes and break down what the backend “code” looked like. Here’s what recording a macro looks like:

When you open up the script editor, you’ll see this:

There’s a lot of activate() and getCurrentCell() functions being called. You can then deconstruct all these keystrokes to build a script that accomplishes the task. But here’s the key difference between recording keystrokes versus working with the data object:

When you record a macro, you are programming keystrokes instead of the Google Sheets application.

Other advantages of programming the application instead of the keystrokes:

* Utilizes less compute resources and runs faster

* Easier to debug

* Easier to adapt to more scenarios and use cases

In the keystroke world, you are literally telling Google Sheets to select cells, select ranges, and move the cursor around, which doesn’t seem like a big deal. When you are working with hundreds of thousands of rows, this could cause serious performance issues. Since Google Apps Script runs in the cloud, you may not see these performance deficiencies, but you’ll definitely see this in your Excel workbooks.

Speaking of Excel workbooks…

Using the VBA script for Excel

Sub fillValuesDown()
    Dim lastRow As Double
    lastRow = ActiveSheet.Cells(Rows.Count, "B").End(xlUp).Row
    Dim currentRange As Variant: Set currentRange = ActiveSheet.Range("A2:A" & lastRow)
    ReDim newRange(1 To lastRow)
    Dim newFillValue As String
    Dim i As Long
    i = 1
    For Each cell In currentRange
        If IsEmpty(cell.Value) = False Then
            newFillValue = cell.Value
            newRange(i) = newFillValue
            i = i + 1
        Else
            newRange(i) = newFillValue
            i = i + 1
        End If
    Next cell
    currentRange.Value = Application.Transpose(newRange)
End Sub

The structure of the VBA script is pretty similar to the Google Apps Script, but it’s just a little different syntax. I’m not going to walk through the tutorial of how to set this up since it’s pretty similar to Google Sheets. In the VBA script, you do end up doing some “cell selection” like in line 8. Most of the script, however, is working with the Excel data object model so the script should run pretty quickly regardless of the size of your Excel file.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* Developer Love #3: Developer Experience Teams with Peggy Rayzis of Apollo

Episode #41: How to do a VLOOKUP to the “left” without INDEX/MATCH with TikTok data

Al — Mon, 31 Aug 2020 10:09:56 GMT

Since TikTok is in the news right now about who is going to buy them, I thought using some fake-ish TikTok acquisition data would be relevant for this episode. A classic Excel/Google Sheets challenge: how to do a VLOOKUP to the “left,” i.e. when your lookup column is not the first column in your lookup table. There are all sorts of strategies to overcome this issue with how your data is structured. Notably, the INDEX/MATCH strategy is the most commonly cited strategy when good ol’ VLOOKUP is not at your disposal. In this episode I walk through a strategy that allows you to use VLOOKUP: array formulas. Skip to strategy #3 below if you want to see the answer. Associated Google Sheet for this episode if you want to follow along.

Was trying to find some gif associated with “looking up”

See the video below if you want to jump straight to the tutorial:

Why the VLOOKUP won’t work

If you are new to why VLOOKUP won’t work in this scenario (see Google Sheet), take a look at the data structure below:

We have ID in column A and we want to find Company Name and Market Cap in columns C and D, respectively, for these IDs. The ID in column A is the unique identifier for the row, and we need to do a lookup to Company ID in column I.

While you can eyeball the result for the first row (“Triller” is the company for ID 3), we want to find a scalable solution using formulas.

As you start writing the VLOOKUP formula in column C, you’ll start to notice the problem: the Company ID column is not the first column in the table where you need to look up the ID value from column A:

Here are a few strategies for solving this problem (#3 is probably the one you haven’t seen before).

Strategy #1: Move the lookup column to the first column position

This is not the most ideal solution, but you could just simply cut and paste the Company ID column and move it to the left-most “first” column of your lookup table. In Excel you would have to do a cut and paste, but in Google Sheets you can just drag and drop the column into the proper position:

Now the VLOOKUP for Company Name will work correctly since Company ID is the first column in your lookup table:

I don’t like this strategy because it involves some manual cutting and pasting of columns. If your lookup table isn’t static (e.g. might be sales data that gets added daily), then you might be ruining the “structure” of your data on subsequent updates. Let’s see what else we can do.

Strategy #2: Make copies of the columns to the right of the lookup column

Also not an ideal solution, but it works in one-off cases where your data is static and you don’t care about showing your back-end work to a colleague. It looks like data is duplicated, but you’re basically referencing existing columns in your table so that those columns appear to the “right” of your lookup column:

Now you can do a VLOOKUP for columns I to K to get the Company Name and Market Cap values to show up in columns C and D:

Strategy #3 (preferred): Use array formulas

A relatively unknown feature in Google Sheets is that you can create your own “tables” using array formulas. An array is simply a range of cells, and you can combine different ranges of cells by separating them with a comma, which places them side by side (a semicolon stacks them vertically instead; this is the behavior in sheets that use a US-style locale). To create an array, you put curly brackets around your ranges. Here’s what an array of columns F and G would look like:

What’s the result? You simply get a reference to the two ranges after you enter the formula:

The key here is that you can create any order of range references in the array formula. We could’ve put G2:G6 first and F2:F6 second, and you would’ve seen the values in Website first followed by Company Name after entering the formula.

Knowing this, we can create our own lookup “table” using the array formula syntax like so:

Notice how the second argument in the VLOOKUP formula is no longer a table, but rather an array of column I followed by columns F to H. In this array, the second “column” is Company Name since we are saying column F is the second range of cells after column I. Market Cap is now the fourth column in this array:

In order to fill this formula down, we need to turn the range references in the array formula into absolute references as shown above.
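For anyone who thinks better in code than in formulas, here is a rough JavaScript sketch of what the array trick is doing: rebuild the lookup table with Company ID first, then do an ordinary “first column” lookup. Only the Triller/ID 3 pairing comes from the example; the other rows, the website and market cap values, and the `reorderColumns` and `vlookup` helpers are placeholders for illustration.

```javascript
// Hypothetical lookup table in sheet order (columns F to I):
// Company Name, Website, Market Cap, Company ID. Values are placeholders.
const table = [
  ["Company A", "a.example", 100, 1],
  ["Company B", "b.example", 200, 2],
  ["Triller", "triller.example", 300, 3],
];

// Build a new "table" with the columns in whatever order we ask for,
// like wrapping ranges in curly brackets in Google Sheets.
function reorderColumns(rows, columnOrder) {
  return rows.map((row) => columnOrder.map((c) => row[c]));
}

// Exact-match lookup on the first column; columnIndex is 1-based like VLOOKUP.
function vlookup(key, rows, columnIndex) {
  const match = rows.find((row) => row[0] === key);
  return match === undefined ? null : match[columnIndex - 1];
}

// Company ID (column I) first, followed by columns F to H.
const lookupArray = reorderColumns(table, [3, 0, 1, 2]);
console.log(vlookup(3, lookupArray, 2)); // Triller
console.log(vlookup(3, lookupArray, 4)); // 300
```

The original table is never rearranged; only the temporary array used for the lookup has Company ID in front.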

Strategy #4 (most common): INDEX/MATCH

As mentioned at the beginning of this post, this is the most common method for looking up values to the left. I won’t give a detailed explanation of how INDEX/MATCH works, but here’s how you would get the Company Name given the data structure:
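As a loose analogy in JavaScript, MATCH finds the position of the ID in the Company ID column and INDEX pulls the value at that position from the Company Name column. The column contents and the `matchPosition` and `indexAt` helpers below are placeholders for illustration (only the Triller/ID 3 pairing comes from the example).

```javascript
// Hypothetical columns from the lookup table; only Triller/ID 3 is from the post.
const companyNames = ["Company A", "Company B", "Triller"]; // like column F
const companyIds = [1, 2, 3]; // like column I

// MATCH: 1-based position of the key in a column, or null if not found.
function matchPosition(key, column) {
  const zeroBased = column.indexOf(key);
  return zeroBased === -1 ? null : zeroBased + 1;
}

// INDEX: value at a 1-based position in a column.
function indexAt(column, position) {
  return position === null ? null : column[position - 1];
}

// Roughly what INDEX(names, MATCH(id, ids, 0)) does in a sheet.
const companyName = indexAt(companyNames, matchPosition(3, companyIds));
console.log(companyName); // Triller
```

Since the two columns are referenced separately, it doesn’t matter which one sits to the left of the other.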

Which strategy should you use?

I’m a little torn between strategies #3 and #4 since INDEX/MATCH is the go-to method for looking up data to the left, and is also more performant than VLOOKUP on large data sets. The fact that the array formula in strategy #3 doesn’t involve a nested formula makes it potentially easier to debug in complicated spreadsheets. I haven’t used an array formula in many VLOOKUP situations since I learned INDEX/MATCH such a long time ago, but I may try this strategy in the future.

Of course, this all becomes irrelevant if you have the XLOOKUP function at your disposal which became available to certain Office 365 subscribers about a year ago (September 2019). This video is a fun poke at XLOOKUP, but also holds some truth for the VLOOKUP purists out there (start watching at 1:19):

A little Kant and poker

I talk about this in the 2nd half of the episode, but thought it would be worth sharing a passage from The Critique of Pure Reason as it relates to betting on your convictions. Listen to the Knowledge Project episode for the full background:

The usual touchstone, whether that which someone asserts is merely his persuasion -- or at least his subjective conviction, that is, his firm belief -- is betting. It often happens that someone propounds his views with such positive and uncompromising assurance that he seems to have entirely set aside all thought of possible error. A bet disconcerts him. Sometimes it turns out that he has a conviction which can be estimated at a value of one ducat, but not of ten. For he is very willing to venture one ducat, but when it is a question of ten he becomes aware, as he had not previously been, that it may very well be that he is in error. If, in a given case, we represent ourselves as staking the happiness of our whole life, the triumphant tone of our judgment is greatly abated; we become extremely diffident, and discover for the first time that our belief does not reach so far. Thus pragmatic belief always exists in some specific degree, which, according to differences in the interests at stake, may be large or may be small.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* The ShopTalk Show #424: Web Components, Frameworks vs Vanilla, Accessible Numbers, and SVG Memory Usage

* The Knowledge Project #89: Maria Konnikova: Less Certainty, More Inquiry

Dear Analyst Episode #40: A spreadsheet error from two Harvard professors leading to incorrect economic policies after 2008 recession

Al — Mon, 24 Aug 2020 10:30:41 GMT

It's 2010, and the world is coming out of recession. Two Harvard professors--one of whom is a former economist for the IMF and chess Grandmaster--publish a paper suggesting that a country with a high public debt-to-GDP ratio of over 90% is associated with low economic growth. Turns out the Excel model the professors use is riddled with some basic statistical and formula errors. The results potentially lead to incorrect economic policies, austerity measures, and high unemployment around the world. This is a Google Sheet which shows one of the spreadsheet errors, and I show how you can prevent such an error in this post.

See the video below if you want to jump straight to the tutorial:

Background

Economists Carmen Reinhart and Kenneth Rogoff published a paper in 2010 called Growth in a Time of Debt (originally published in the American Economic Review) where they argued:

[...] median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.

In 2013, Thomas Herndon, a PhD student at the University of Massachusetts, Amherst, re-created the study from Reinhart and Rogoff's paper as part of his PhD program, working with professors Michael Ash and Robert Pollin. They had to analyze the original Excel files that Reinhart and Rogoff used, and they weren't able to replicate the original results. As they wrote in their own paper, entitled Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff:

[...] coding errors, selective exclusion of available data, and unconventional weighting of summary statistics lead to serious errors that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies in the post-war period.

Reinhart and Rogoff suggested that the relationship between the debt/GDP ratio and economic growth is simply a correlation, and that the correlation still holds after correcting for the spreadsheet mistakes. However, that correlation is not as strong as their original paper posited.

Why this was a big deal

The implications of their findings resulted in news outlets, politicians, and policymakers using the 90% benchmark as a signal that a country is heading for low economic growth. Some notable examples:

* 2012 Republican nominee for the US vice presidency Paul Ryan included the paper in his proposed 2013 budget

* The Washington Post editorial board takes it as an economic consensus view, stating that "debt-to-GDP could keep rising — and stick dangerously near the 90 percent mark that economists regard as a threat to sustainable economic growth."

* Austerity measures are put into place around the world despite the advice from economic advisers, pushing the unemployment rate above 10% in the eurozone

3 main Excel spreadsheet problems with the model

The three main errors that Herndon, Ash, and Pollin discovered are the following:

* Years of high debt and average growth were selectively excluded from the data set

* Countries' GDP growth rates were not properly weighted

* Summary table excludes high-debt and average-growth countries

This video illustrates the three individual problems with the spreadsheet really clearly:

If you fix these errors, the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90% is actually 2.2%, not -0.1%. In the Google Sheet I shared, you won't see the correct 2.2% average growth rate since I'm not doing the full analysis and am focusing only on the third Excel error stated above.

Fixing incorrect cell references for average GDP growth rates

The third error of incorrectly excluding high-growth countries from the average GDP growth rate is a particularly egregious mistake, and Reinhart and Rogoff admit that they made this simple cell referencing mistake. As you can see in the screenshot below, they simply omit rows 45 to 49 in their AVERAGE formula:

Source: https://statmodeling.stat.columbia.edu/

Here are three methods Reinhart and Rogoff could have used to ensure that they referenced the correct cells to avoid this mistake:

Method 1: Check the summary dropdown in the bottom-right

After you select all the cells that contain GDP growth rates in column G, you can look at the dropdown in the bottom right of Excel or Google Sheets to see the average. No formulas required:

You can also get other summary stats like the SUM, MIN, and MAX of your selected range of cells. Probably the easiest method to get a quick sanity check of your averages that you've calculated in lines 26-27 of the Google Sheet.

Method 2: Adding a checksum/checkaverage formula to compare results

This one is my preferred method, and is quite common in financial models. Usually you'll see this type of "error checking" when you want to make sure you've captured the correct cell references for a SUM formula, but with some extra work you can check for averages too.

You start by writing a formula below your actual summary stats (in this case starting on line 28 of the Google Sheet) and create a SUM formula of the data:

The big question is this: how do you know if you've referenced the correct cells in your "checksum" formula? The hope here is that by writing the SUM formula for the second time, in theory, you won't make the same mistake twice. Obviously this is a big assumption in this method, but let's assume you've properly made the reference for this internal error-checking formula.

The next formula below the "checksum" is a "count" formula:

Notice how it's not a COUNTA formula. This is because the table contains the "n.a." text, so a COUNTA formula would be incorrect since it would count all non-empty values in the column, text included. We only want the numeric values, hence the reason for using COUNT.
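To make the COUNT versus COUNTA distinction concrete, here is a minimal Python sketch. The column values are made up for illustration and are not the figures from the Google Sheet, and `count_numeric` and `count_non_empty` are hypothetical helper names that only approximate how the two spreadsheet functions behave:

```python
# Hypothetical growth-rate column; "n.a." marks a missing value, as in the table.
growth_rates = [3.2, "n.a.", 1.9, 2.4, "n.a.", -0.3]


def count_numeric(values):
    """Approximate COUNT: count only the numeric cells."""
    return sum(
        1 for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    )


def count_non_empty(values):
    """Approximate COUNTA: count every non-empty cell, text included."""
    return sum(1 for v in values if v not in ("", None))


print(count_numeric(growth_rates))    # 4
print(count_non_empty(growth_rates))  # 6
```

Dividing the checksum by the COUNTA-style result (6) instead of the COUNT-style result (4) would understate the average, which is why the numeric-only count matters here.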

Finally, the "checkaverage" formula compares your actual average in line 26 with the result of checksum / count. If the values aren't equal, then you'll get the text "Error" as the result of the IF formula:

Since line 26 references the "incorrect" averages used in Reinhart and Rogoff's paper, we get errors across the board. This "checksum" or "checkaverage" methodology gives you a visual indicator on whether your calculated results are properly referencing all the cells in the range instead of a subset. Instead of writing a "checksum" and "count" formula, you could simplify the "checkaverage" formula to this:

We simply put the SUM and COUNT formulas inside the first argument of the IF statement.
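The same "checkaverage" idea can be sketched outside the spreadsheet. The snippet below is a rough Python illustration with made-up numbers; the `check_average` name and the "OK" label are assumptions (the post only specifies that a mismatch shows "Error"):

```python
import math

# Hypothetical column: five numeric growth rates plus one "n.a." text cell.
column = [4.0, 2.0, "n.a.", 3.0, 1.0, 0.0]


def numeric_only(values):
    """Keep only numeric cells, the way SUM and COUNT ignore text."""
    return [v for v in values if isinstance(v, (int, float))]


def check_average(reported_average, values):
    """Return "OK" if the reported average equals checksum / count, else "Error"."""
    numbers = numeric_only(values)
    checksum = sum(numbers)
    count = len(numbers)
    if math.isclose(reported_average, checksum / count):
        return "OK"
    return "Error"


# An average that accidentally skips the last two rows of the column.
partial = numeric_only(column[:4])
partial_average = sum(partial) / len(partial)

# An average over the full column.
full = numeric_only(column)
full_average = sum(full) / len(full)

print(check_average(partial_average, column))  # Error
print(check_average(full_average, column))     # OK
```

As with the spreadsheet version, the check is only as good as the range passed to it: if `column` itself were truncated, both averages would agree and the error would go unnoticed.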

Method 3: Create a PivotTable and compare results

This method also relies on you selecting the proper cells to build your PivotTable. Again, assuming you don't make the same mistake twice, selecting the cells in the range should be a pretty simple task. After you select the cells (B4:G24 in this case), you build a PivotTable with Country in the Rows and the four debt/GDP buckets in the values. You then summarize each metric with the AVERAGE selection:

The "Grand Total" on the last line of the PivotTable contains the average across all growth rates. You can then compare these numbers to your computed numbers on the first sheet that contains your table.
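For readers who think in code, the "Grand Total" row is just a per-bucket average over every country that has a value. A minimal Python sketch, with hypothetical country names, bucket labels, and figures (none of them come from the actual spreadsheet):

```python
from statistics import mean

# Hypothetical rows: one per country, one growth rate per debt/GDP bucket.
# None stands in for an "n.a." cell.
rows = [
    {"country": "A", "under_30": 3.0, "30_to_60": 4.0, "60_to_90": 2.0, "over_90": None},
    {"country": "B", "under_30": 5.0, "30_to_60": 2.0, "60_to_90": 4.0, "over_90": 1.0},
    {"country": "C", "under_30": 4.0, "30_to_60": 3.0, "60_to_90": None, "over_90": 3.0},
]
buckets = ["under_30", "30_to_60", "60_to_90", "over_90"]

# "Grand Total" row: average of each bucket, skipping missing values.
grand_total = {
    bucket: mean(row[bucket] for row in rows if row[bucket] is not None)
    for bucket in buckets
}
print(grand_total)
```

These per-bucket averages are the numbers you would compare against the hand-written AVERAGE formulas.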

Lessons to be learned for your own models

People don't check their analyses with the above 3 methods because it takes extra work and...well...people are lazy. In addition to putting in error checks to ensure you are not making simple spreadsheet errors like this, there are other strategies you can use to ensure others can replicate your work to detect potential errors.

For Reinhart and Rogoff, they didn't make their full underlying data public. They only shared their spreadsheet after Herndon, Ash and Pollin reached out to them as the trio was trying to replicate their results. Some other strategies:

* Upload your results to a public repository like GitHub early on in your analysis and "open source" your data

* Write detailed steps on experimental design, procedures, equipment, data processing, and statistical methods used so others can replicate your experiment

I really liked this quote from a commenter about the Excel error on this Stat Modeling blog

I’d like to see how many researchers expose themselves to such criticism. Uploading a raw dataset is one thing but allowing people to see all your intermediate calculations in messy detail is rare.

Too often we're caught up in doing all the number crunching ourselves and then sharing the output once we think we've finished the analysis. As this example suggests, sharing your data set and model as you are doing the analysis can prevent a blunder like this from happening.

Auto date formatting and human gene naming problems

In the second half of this episode, I discuss an article in The Verge about how the HUGO Gene Nomenclature Committee had to rename gene names because of Excel's simple feature of auto-formatting dates. Gene names like "MARCH1" and "SEPT1" get re-formatted to the dates "1-Mar" and "1-Sep" when these values are entered into Excel. I thought this was interesting to see the scientific community bending to this standard feature in Excel given the widespread use of Excel in the scientific community.

Source: The Verge

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* The Verge: Scientists rename human genes to stop Microsoft Excel from misreading them as dates

* This Week In Startups #948: HackerOne CEO Mårten Mickos shares insights on how he grew his bug bounty army to 400,000 strong by providing a path to hack for good, most common security vulnerabilities, worst security breaches, hacking the Pentagon, protecting the open source that unites us & scaling a company culture that defaults to disclosure

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit alchen.substack.com

Episode #39: Generate a random list of names from a list of popular 90s TV characters

Al — Mon, 10 Aug 2020 13:53:07 GMT

This post originally appeared on the KeyCuts blog.

Let’s say you have a set list of names (in this case TV characters from popular 90s TV shows). You want Google Sheets/Excel to generate a random list of names from your list as if you were picking names out of a hat. How would you do this? It most likely would involve the RAND function, but let’s take it a step further and say you want to give the end user the ability to dictate the number of random names to return from your list (e.g. out of my list of 100, give me 5 random names). This is the Google Sheet with all the completed formulas. In addition to the audio format of this episode, I’m also going to start releasing the video tutorial:

Create your list in column B

Start with your list of names in column B. This can be any list you want to randomize. My list is just a bunch of TV characters from shows I watched when I was a kid.

Source: Fandom

In column A, you put the RAND function and copy it all the way down to the bottom of your list. You’ll get a column of random decimal numbers. Doesn’t look that useful now, but this random number column will drive the rest of the tool to generate your list of random names:

Sort this random list of numbers

It sounds kind of weird: why would you sort a random list of numbers? What does that even mean? As you have probably seen, every time you refresh your Google Sheet or commit an Excel formula by hitting ENTER, all those random numbers in column A will change. This means if you sort this list of random numbers, the sorted list will change too. I left column C blank as a spacer, so the formula goes in cell D2:

The SORT function takes in a range of cells as the first parameter, the sort index as the 2nd (which is just the number of the column we want to sort on, column #1), and then true or false for sorting in ascending or descending order. You can also put 0 to indicate false, which is what I did in this example to sort in descending order.

The nice thing about the SORT function is that it automatically fills the formula down to the bottom of your data set. This is a relatively new function in Excel since it kind of acts like dynamic array formulas or array-entered formulas. The formula kind of “spills” down for you as your list grows so you don’t have to worry about dragging the formula down until the last row in your data set.

A good ol’ VLOOKUP

What does this column of sorted random numbers do for us? Well, we know that each random number in this sorted column corresponds to one of the numbers in column A where we generated the random number. So in column E, we just do a VLOOKUP using column D as our lookup value and columns A:B as our lookup table to get the name associated with the random number in column D:

This is not the usual way you might use VLOOKUP because you’re usually using VLOOKUP with some unique identifier as the lookup value. Column A isn’t really a unique “TV character ID” since that “ID” changes all the time with the RAND function. We don’t really care about that, because now when you refresh the Sheet, column E will always have a random list of names:

In the above gif I’m just pressing COMMAND + R a few times to refresh the Sheet so that the RAND function in column A constantly changes.
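The RAND, SORT, and VLOOKUP steps translate fairly directly into code. Here is a minimal Python sketch of the same idea; the names in the list are placeholders for whatever is in column B, and the variable names are made up for illustration:

```python
import random

# Hypothetical names standing in for column B.
names = ["Zack", "Kelly", "Screech", "Topanga", "Urkel", "Carlton"]

# Column A: one random number per name (the RAND step).
keyed = [(random.random(), name) for name in names]

# Column D: the random numbers sorted in descending order (the SORT step).
sorted_keys = sorted((key for key, _ in keyed), reverse=True)

# Column E: look each sorted number back up to get its name (the VLOOKUP step).
lookup = dict(keyed)
shuffled = [lookup[key] for key in sorted_keys]

print(shuffled)  # a different order on each run
```

In Python you would normally just call `random.shuffle`, but spelling out the three steps shows why the spreadsheet trick works: sorting by a random key is the same as shuffling.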

We could stop here since you now have a random list of names in column E. Let’s take this a step further and give the end user the ability to choose the number of random names from the list.

User input with OFFSET

We’re already doing some hacking with VLOOKUP and using it in a way that it probably wasn’t made to be used, so let’s do something similar with the OFFSET function. Cell H1 is just my “user input” cell where I’m getting the number of results from the user. This is a hard-coded number the user has to input. Then in cell H2, I have this OFFSET formula:

Let’s break this down by each parameter:

* E2 – This is the “starting point” for my OFFSET function

* 0 – I don’t want to move any rows up/down

* 0 – I don’t want to move any columns left/right

* H1 – References my user input cell indicating how many rows of data I want to return from my OFFSET (e.g. “height” of the range)

* 1 – How many columns to return (e.g. “width” of the range)

Now as you put a number in cell H1, the list of random names will grow and shrink. If you put a number that is more than the list of names you have, then it will just return the max number of names from your list (in random order, of course):
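In code, this use of OFFSET amounts to taking the first N items of the shuffled list. A minimal Python sketch, where `take_first` is a hypothetical helper and the names are placeholders:

```python
def take_first(values, height):
    """Roughly what OFFSET(E2, 0, 0, height, 1) does here: return the first `height` items."""
    return values[:max(height, 0)]


# Hypothetical shuffled names standing in for column E.
shuffled_names = ["Urkel", "Kelly", "Topanga", "Zack", "Carlton"]

print(take_first(shuffled_names, 2))   # ['Urkel', 'Kelly']
print(take_first(shuffled_names, 99))  # all five names
```

Asking for more names than the list holds simply returns the whole list, matching the behavior described above.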

Picking the right tool for the job

A caveat I point out at the end of this episode is that while you can do this random list of names generator in Excel or Google Sheets, a spreadsheet may not be the best tool for the job. There are hundreds of random list generator apps that may be built specifically for your industry be it education or hospitality. Sometimes it’s just easier to do it in a spreadsheet because all our data is there, but constantly question if the tool you are using is the right one for the job.

There’s a similar template in the Coda gallery which generates a random list of teams of players based on the number of teams and players you have. Just another nifty way of approaching the same problem in a different tool. Disclosure: I work at Coda.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* Google Cloud Platform Podcast #226: Documentation in Developer Practices with Riona Macnamara

Episode #38: Breaking down an Excel error that led to a $6.2B loss at JPMorgan Chase

Al — Tue, 04 Aug 2020 11:16:02 GMT

This post originally appeared on the KeyCuts blog.

You blink a few times at the screen and realize what you're seeing is not a typo. $6.2B has left your bank due to some rogue trader making untimely bets on the market. That's B as in billion. You call up the modeler who was supposed to make sure this never happens to your bank. The modeler takes a closer look at his model, and realizes that he made a fundamental error in how he calculates one value that caused the dominoes to fall. This is the story of the "London Whale" at JPMorgan Chase in 2012 who cost the bank $6.2B and a breakdown of the Excel error that may have caused the whole thing. This is the Google Sheet if you want to follow along with the Excel error.

Derivative of a derivative

I'm not going to pretend like I know the intricacies of all the financial products involved here, so you can read the Wikipedia article if you want the full details. In 2012, there was a CDS (credit default swap) product called CDX IG 9 that the trader at JPMorgan may have made large bets on, and ended up on the wrong side of the bet. The London trader's name is Bruno Iksil, and it was a classic scenario of a gambler trying to get out of his losses by doubling down on black at the roulette table.

Source: The Fiscal Times

Multiple investigations were undertaken by the authorities in the U.S. and U.K., and the investigations show that a variety of institutional failures may have facilitated the large bets made by the London Whale. This HBR article by Ben Heineman, Jr. provides a nice summary of all the key players:

* London traders - The traders simply didn't understand the complexity of the derivative products they were buying and selling

* Chief Investment Office (CIO) - The head of the CIO didn't monitor the trading strategies or put the proper controls in place for the portfolio of products the office was buying. The Value at Risk (VaR) model was flawed (see more below).

* Firm-wide Leaders - Not enough oversight by the CFO and CEO (Jamie Dimon)

* Board and Risk Policy Committee - The committee was told that everything was fine with the CIO, and didn't get accurate pictures of what risk officers really felt about the risky trades being made.

Appendix of the Task Force Report by JPMorgan

There is a 130-page report created by JPMorgan Chase in 2012 which details what happened internally that led to this debacle. In my opinion, the juicy stuff starts in the appendix starting on page 121 of the report. I read off some parts of this appendix in this episode, but the appendix basically details issues with the VaR models created by one of the quantitative modelers at JPMorgan to more accurately value the complex trades that were happening. Or at least they thought the model was more accurate.

At the very end of the appendix, there's a section called "Discovery of Problems with the New VaR Model and Discontinuance" where the report details the Excel error that contributed to the large inaccuracies in how the model valued risk.

The $6.2B Excel error

This is how the error is described in the report (emphasis mine):

Following that decision, further errors were discovered in the Basel II.5 model, including, most significantly, an operational error in the calculation of the relative changes in hazard rates and correlation estimates. Specifically, after subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended.
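Read literally, the quoted sentence describes a relative change computed as (new - old) divided by either the average of the two rates (intended) or their sum (the error). Here is a minimal Python sketch of that literal reading; the function name and the rates are hypothetical, and this is not a reconstruction of the actual VaR spreadsheet:

```python
def relative_change(old_rate, new_rate, use_average=True):
    """Change between two rates relative to their average (intended) or their sum (the error)."""
    difference = new_rate - old_rate
    total = old_rate + new_rate
    denominator = total / 2 if use_average else total
    return difference / denominator


old, new = 0.020, 0.030  # hypothetical hazard rates

intended = relative_change(old, new, use_average=True)
erroneous = relative_change(old, new, use_average=False)

print(intended, erroneous)
```

Because the sum of two rates is exactly twice their average, dividing by the sum always produces half of the intended relative change, whatever the rates are.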

Note: I don't have domain expertise in VaR models, synthetic credit derivatives, or trading in general. The following example is my over-simplification of the error based on what's written in the report.

The report talks about hazard rates (for what I assume relate to the default of corporate loans in this case) and how the changes in the hazard rates were improperly calculated. Here's a simple table from the Google Sheet showing fictitious dates, hazard rates, and the change in rates:

Now here's what happens when you apply a SUM vs. an AVERAGE to the "Change in %" column:

This is hitting the border of my knowledge of growth rates and time periods, but the sum of changes will always be 5X the average of changes given there are 5 values we are summing/averaging.
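The 5X relationship is easy to verify with a quick Python sketch. The hazard rates below are invented, and computing "Change in %" as a period-over-period percentage change is an assumption about how the column in the Google Sheet is built:

```python
# Hypothetical daily hazard rates (six values give five period-over-period changes).
hazard_rates = [0.020, 0.021, 0.019, 0.022, 0.020, 0.021]

# "Change in %" column: percentage change from one period to the next.
changes = [
    (new - old) / old
    for old, new in zip(hazard_rates, hazard_rates[1:])
]

total = sum(changes)
average = total / len(changes)

print(len(changes))     # 5
print(total / average)  # approximately 5
```

The ratio equals the number of values by definition, so it tells you nothing about the data itself; only the absolute gap between the two numbers depends on the rates.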

The difficulty with detecting this type of formula error

The magnitude of the difference between the SUM and the AVERAGE is not what I think is interesting, but rather the absolute difference between the SUM and AVERAGE. Here is a chart plotting the same data:

Based on this chart, can you estimate what the average of the Change in % is? Looks like something around 0%, but 3% doesn't feel that far off. The point I'm trying to make is that unless you are monitoring the SUM and AVERAGE consistently over time to detect any outliers, it will be difficult to know whether you made the formula mistake in the first place. With the presence of outliers, it makes it more clear that you might have an error in your model. Here's the other table from the Google Sheet with intentionally skewed hazard rates:

Here we see the magnitude of the difference is still 5X, but the absolute difference is much wider. This would cause an analyst to look deeper into the model and try to figure out why there is such a large discrepancy. But this is only because there are fictitious hazard rates. In the case of JPMorgan Chase, my hunch is that the gap between the lower and upper bound of daily hazard rates was really narrow, so detecting a change like this would've been very difficult without the proper controls in place.

This reminds me of the tale of the boiling frog:

Urban myth has it that if you put a frog in a pot of boiling water it will instantly leap out. But if you put it in a pot filled with pleasantly tepid water and gradually heat it, the frog will remain in the water until it boils to death. (Source)

Without a really hot pot of boiling water, it was too late for JPMorgan to detect there was something wrong with the CDS trades, and the proverbial frog boils to death.

Hanlon's Razor

One frame for this egregious Excel error is Hanlon's Razor:

"Never attribute to malice that which is adequately explained by stupidity", known in several other forms. It is a philosophical razor which suggests a way of eliminating unlikely explanations for human behavior. (Source)

Perhaps the modeler cannot be blamed for his Excel error because it was an error that he had no way of knowing or predicting. I'm not trying to remove blame from the modeler, but it's an interesting frame to analyze the problem because this is a spreadsheet error that is difficult to prevent unless you have other models and risk controls that are able to predict this type of error in advance. There are many other cases of Excel errors that led to false calculations that cost firms millions of dollars, and it's hard to say if one can blame the modeler for "malice" or plain stupidity.

New intermediate Excel class on Skillshare

Quick plug for a new Excel class I just launched today on Skillshare. It’s an intermediate Excel class for cleaning and analyzing data.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* a16z Podcast: The Future of Decision-Making--3 Startup Opportunities

Listen to this episode here

Dear Analyst #37: Text manipulation functions to extract domain names from email addresses

Al — Mon, 27 Jul 2020 10:19:10 GMT

This post originally appeared on the KeyCuts blog.

In Excel or Google Sheets, text manipulation is usually associated with data cleaning, data cleansing, and data transformation. Sometimes your data is “dirty” and needs to be categorized in a different way or you need to “extract” a piece of text from another piece of text. In this example, we use a combination of the FIND, RIGHT, and LEN functions to extract the domain name from an email address (e.g. the “tesla.com” from “john.smith@tesla.com”). Here’s the Google Sheet if you want to make a copy for yourself to follow along.

Start with finding the @

The first step is to use the FIND function to find the location of the “@” symbol in the email address. The FIND function takes two required arguments and one optional argument. You’re basically finding the index location where that character or string exists within the cell:

In the case of “john.smith@amazon.com,” the FIND function would return 11 since the “@” symbol starts at the 11th position within the email address. Pretty simple right?
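The same lookup in Python is a one-liner. One difference worth noting in this small sketch: Python positions are 0-based while FIND is 1-based, so 1 is added to match the spreadsheet result:

```python
email = "john.smith@amazon.com"

# str.index is 0-based, while FIND is 1-based, so add 1 to match the spreadsheet.
at_position = email.index("@") + 1

print(at_position)  # 11
```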

Nesting LEN inside the RIGHT function

The next part is a little trickier. Now that we know the position of the “@” symbol, we want all the characters after the “@” symbol to get the domain of the email address. There are multiple ways of doing this, but I chose to use the RIGHT and LEN functions. To make this more clear, I could have put the LEN function in its own column, but decided to nest it within the RIGHT function:

The RIGHT function takes two arguments and simply returns a given number of characters from the “right” of the text you give it (in this case the email address). Since we don’t know how many characters to pull from each e-mail address, we use the result of the LEN(A2) - B2 formula which tells us how many characters to pull from the right of the email address.

LEN(A2) gives us the length of the entire text (for “john.smith@amazon.com” it’s 21). If we subtract the index position of the “@” symbol from that length, we’ll get the exact number of characters to pull for each unique email address. Pretty nifty.

Note: The “Position of @” column also could’ve been nested in the 3rd column (and replaced the current cell reference of B2).
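Putting FIND, LEN, and RIGHT together, the whole extraction can be sketched in Python as follows. `extract_domain` is a hypothetical helper name, and the sketch assumes each address contains an “@” followed by at least one character:

```python
def extract_domain(email):
    """Rough equivalent of RIGHT(email, LEN(email) - FIND("@", email))."""
    at_position = email.index("@") + 1        # FIND: 1-based position of "@"
    chars_to_pull = len(email) - at_position  # LEN(A2) - B2
    return email[-chars_to_pull:]             # RIGHT: last N characters


print(extract_domain("john.smith@amazon.com"))  # amazon.com
print(extract_domain("john.smith@tesla.com"))   # tesla.com
```

For “john.smith@amazon.com” the length is 21 and the “@” sits at position 11, so the last 10 characters are returned.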

I typically use a combination of FIND, LEN, and MID to extract the text I need from a longer piece of text. Once you master these few functions, you’ll be able to pull anything you want out of a long piece of text to get “clean” data.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* The Tim Ferriss Show #444: Hugh Jackman on Best Decisions, Daily Routines, The 85% Rule, Favorite Exercises, Mind Training, and Much More

* EconTalk: Robert Lerman on Apprenticeships

Dear Analyst #36: What The Economist’s model for the 2020 presidential election can teach us about forecasting

Al — Mon, 13 Jul 2020 11:30:07 GMT

Episode #34: Trick for finding column index for VLOOKUPs using U.S. pride events data

Al — Mon, 22 Jun 2020 14:47:28 GMT

Dear Analyst #33: Comparing models for one-time vs. monthly recurring donations to support racial justice organizations

Al — Mon, 08 Jun 2020 16:10:31 GMT

Dear Analyst #32: How to use the QUERY function in Google Sheets on COVID-19 data

Al — Sun, 07 Jun 2020 18:05:32 GMT

The QUERY() function in Google Sheets gives you the ability to quickly filter and sort your data similar to how you might get data from a database. If you write SQL queries, the QUERY() function feels easy and natural to use. There are a few caveats as I discuss in this episode. If you want to follow along with the exercises I discuss in this episode, make a copy of this Google Sheet which contains the QUERY() functions I mention in the episode.

Basic query to find confirmed cases greater than 50,000

Our data set is from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The data shows confirmed cases, deaths, and recovered cases by country (188 countries) on May 1st:

The first query simply pulls back the list of countries and confirmed cases where the number of confirmed cases is greater than 50,000. Notice how you reference the column letter name versus the actual name of the column in the header row:

The first parameter is covid_data which is a named range in Google Sheets. In this case, it references cells A1:E188 in our data set.
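If it helps to see the logic outside of Sheets, the first query is a select-and-filter. A minimal Python sketch with invented rows (the country names and figures are placeholders, not the Johns Hopkins data):

```python
# Hypothetical rows standing in for the covid_data named range.
covid_data = [
    {"country": "Country A", "confirmed": 120_000, "deaths": 6_000},
    {"country": "Country B", "confirmed": 45_000, "deaths": 900},
    {"country": "Country C", "confirmed": 75_000, "deaths": 3},
]

# Select country and confirmed cases where confirmed cases exceed 50,000.
over_50k = [
    (row["country"], row["confirmed"])
    for row in covid_data
    if row["confirmed"] > 50_000
]

print(over_50k)  # [('Country A', 120000), ('Country C', 75000)]
```

Note that the Python version refers to columns by name, which is exactly what QUERY() does not let you do.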

More SQL-like commands

You can do many database-like commands with the QUERY() function. The next example shows how you can use the ORDER BY command to find countries with deaths between 0 and 5 and the resulting list is sorted in descending order:
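The filter-then-sort pattern looks like this in a minimal Python sketch. The rows are invented, and two details are assumptions: the 0-to-5 range is treated as inclusive, and the descending sort is on the deaths column:

```python
# Hypothetical rows; country names and figures are placeholders.
covid_data = [
    {"country": "Country A", "deaths": 4},
    {"country": "Country B", "deaths": 900},
    {"country": "Country C", "deaths": 0},
    {"country": "Country D", "deaths": 2},
]

# WHERE deaths between 0 and 5, ORDER BY deaths descending.
low_deaths = sorted(
    (row for row in covid_data if 0 <= row["deaths"] <= 5),
    key=lambda row: row["deaths"],
    reverse=True,
)

print([(row["country"], row["deaths"]) for row in low_deaths])
# [('Country A', 4), ('Country D', 2), ('Country C', 0)]
```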

Check out Ben Collins’ blog post about the QUERY() function to see some of the other SQL commands you can use.

Adding in new calculated columns

In the third query, we get a little more advanced and use the LABEL command to create a new “column” called Case Fatality Rate. This calculation is simply Deaths / Confirmed. Unlike SQL, you put the LABEL at the end of the command instead of in the beginning of the SELECT statement:

Coming from SQL, you’ll need to account for the difference in the order of commands in the query in order for it to work correctly.
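A calculated, labeled column can be sketched in Python like this. The rows are invented, and the sketch computes the case fatality rate as deaths divided by confirmed cases, which is the standard definition:

```python
# Hypothetical rows; country names and figures are placeholders.
covid_data = [
    {"country": "Country A", "confirmed": 120_000, "deaths": 6_000},
    {"country": "Country B", "confirmed": 45_000, "deaths": 900},
]

# Add a calculated column and give it a readable label.
with_cfr = [
    {
        "Country": row["country"],
        "Case Fatality Rate": row["deaths"] / row["confirmed"],
    }
    for row in covid_data
]

print(with_cfr)
```

The dictionary key plays the role of LABEL: it names the calculated column in the output without changing the underlying data.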

Inability to select column names

You’ll notice that you don’t put the actual names of the columns in your header row in the query. This can be a pro or con of the QUERY() function depending on how your underlying data set is structured.

Columns are changing a lot

If your underlying data is constantly “shuffling” where columns are moving around and the structure of the data is not set in stone, the QUERY() function will most likely break because you’re referencing the column letter instead of the column name like in a traditional SQL query.

Columns are fixed

If your columns are not shuffling around a lot, this syntax of selecting the column letter may actually be easier for you. This is because you don’t have to type out the long column name in the QUERY() function. If data is simply getting appended to the bottom of your data set, then the QUERY() function should work fine for you because the letters of the columns will always reference the correct columns of data.

PivotTables vs. the QUERY() function

One of the reasons I don’t use the QUERY() function too often is because I find PivotTables to be easy enough to use to filter, sort, and aggregate my data to do my analysis. Additionally, your columns can move around in the underlying data set and the PivotTable will still work since it’s not referencing columns by letter but rather by the name in your header row.

Plotting trend lines for COVID-19

One of the articles I discuss in this episode is this Vox article about how the Council of Economic Advisers may have applied a stock trendline in Excel to “forecast” deaths as a result of COVID-19. The article discusses the concept of “smoothing out” volatile data versus prescribing a forecast, and that line between these two concepts is a bit blurry. This is the cubic chart in Excel which you can easily build from the trendline features in Excel:

Source: Vox

And then this is the chart from a CEA Tweet that appears to show the cubic trendline as a potential forecast:

SUM by David Eagleman

A book I discuss at the end of this episode is SUM: Tales from the Afterlives by David Eagleman. I read a chapter from the book called Incentive and how it relates to some recent shows I’ve been watching like Westworld and Devs. Highly recommend checking out the book.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* The Trump administration’s “cubic model” of coronavirus deaths, explained by Matthew Yglesias

* Jocko Podcast #222: Life is a Challenge. Life is Suffering. So Live With Fortitude. With Dan Crenshaw

* SUM: Forty tales from the afterlives by David Eagleman

Dear Analyst episode #31: Writing Google Apps Scripts to sync data from Coda to Google Sheets

Al — Mon, 18 May 2020 13:19:53 GMT

This post originally appeared on the KeyCuts blog.

I worked on a "small" side project recently to sync data between Google Sheets and tables in Coda. The full blog post tutorial is here, and the GitHub repository is here. I started using Google Apps Script last year and it's a super powerful way to connect different apps you use in the G Suite ecosystem. The impetus for creating these two scripts was seeing a few people in the Coda community talk about syncing data between their Google Sheets and Coda. The big caveat is that these are only one-way syncs, but there are several use cases where doing this could be useful in business workflows and making your team more productive.

Writing a script in Google Apps Script

Some Google Apps scripts can be super simple to set up. See this pretty simple workflow below of sending email automatically when there is data in your Google Sheet:

Most of the "work" with writing these scripts was transforming data so that the model in Google Sheets matches the model in Coda as per Coda's API. Once that data munging is done, the rest of the script was relatively easy in terms of giving users the ability to add, delete, and modify data. I would highly recommend taking a look at Google Apps Script especially if you use a lot of Google Sheets. You'll be able to connect your Google Sheet with other applications in G Suite and other 3rd-party apps you use for work.
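To give a feel for what that data munging involves, here is a rough Python sketch (the actual scripts are written in Google Apps Script and live in the GitHub repository). It turns a 2D sheet range into a rows-of-cells structure; the payload shape is an assumption based on my understanding of Coda's rows endpoint, so check the Coda API docs before relying on it. `sheet_rows_to_coda_payload` and the sample data are hypothetical:

```python
def sheet_rows_to_coda_payload(values):
    """Turn a 2D sheet range (header row first) into a rows-of-cells payload.

    The {"rows": [{"cells": [{"column": ..., "value": ...}]}]} shape is an
    assumption about what Coda's API expects, not a verified contract.
    """
    header, *data_rows = values
    return {
        "rows": [
            {
                "cells": [
                    {"column": column_name, "value": cell}
                    for column_name, cell in zip(header, row)
                ]
            }
            for row in data_rows
        ]
    }


# Hypothetical sheet contents: a header row followed by two data rows.
sheet_values = [
    ["Name", "Stage"],
    ["Candidate 1", "Phone screen"],
    ["Candidate 2", "Onsite"],
]

payload = sheet_rows_to_coda_payload(sheet_values)
print(payload["rows"][0])
```

A spreadsheet hands you a grid of values, while an API like this wants each cell tagged with its column, so most of the script is this kind of reshaping.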

Use cases for syncing data between Coda and Google Sheets

This comes straight from the blog post, but thought it was worth repeating again:

Data synced from your Google Sheet

* HR & recruiting - All your candidates are stored in a Google Sheet but you want to be able to move candidates through different stages in the interviewing pipeline and Google Sheets isn't sufficient for your needs. Having all your candidates in a table in Coda means you can use templates like this one to manage candidates more effectively.

* E-commerce and ERP - Orders, customers, and POs may all be different tabs in a Google Sheet that gets updated through Shopify or some other e-commerce platform. In order to manage your e-commerce business, you may want to see charts, calendar of shipments, and reports that Google Sheets cannot provide easily. Syncing the data from Google Sheets to Coda means you can do ERP properly (see this template as an example).

* Customer Feedback - You may have a ticketing system like Zendesk or Intercom and all feedback lands in a Google Sheet somewhere. You can do some basic analytics in the Google Sheet but to reply to the feedback means you have to go into Gmail and start replying to customers. If your customer feedback is all in a Coda doc, you can run analytics and send emails using the Gmail Pack (see this template).

Data synced to your Google Sheet

* 3rd-party vendor reporting - Your vendors may not be using Coda yet, but you have all your vendor data in Coda and need to send them the data in a format they prefer. While you could publish your Coda doc, the vendor still wants the data in a Google Sheet you have edit access to.

* Data "backup" - Your team may create thousands of rows of data every quarter in a Coda doc and want to start each quarter "fresh." Coda docs grow with your teams and they may get slow as you add in more functionality, so having a backup of your data in Google Sheets is another reason to sync data from your Coda doc to Google Sheets.

* Finance & Accounting - Most internal finance and accounting functions still use Excel and spreadsheets for month-end reporting, taxes, and other business-critical activities. As your data grows in Coda, you can keep your finance counterparts in the loop by having your data synced to a Google Sheet which your finance team can use for their reporting and forecasting purposes.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* Visual Developer's Podcast #25: Sheets vs. Airtable vs. Coda

* Jocko Podcast #226: The Code. The Evaluation. The Protocols. The PATH. With David Berke

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit alchen.substack.com

Episode #30: How to learn Excel while staying at home during COVID-19

Al — Mon, 11 May 2020 10:55:15 GMT

This post originally appeared on the KeyCuts blog.

Now that you're staying home and picking up new hobbies and taking classes online, here are a few tips on how to learn Excel and spreadsheets from an online class. I have seen viewership on my own Excel classes spike since COVID-19 hit which has led me to think about the best way to learn online.

First of all, why are so many people trying to learn Excel? Since schools and universities have moved to online learning, students may be questioning the value of their college degrees. Maybe I should start learning skills that will actually help me land a job...enter stage left: Excel and spreadsheets.

Spreadsheets: the most sought-after skill

In episode 22, I brought up an episode of Freakonomics where they discussed different stats around subjects Freakonomics listeners wished they had learned in high school to better prepare them for their current jobs. The high-level numbers:

Skills currently used on their jobs

* Less than 5% - Percent of survey responders who said they still use calculus, trigonometry, or geometry in their current jobs

* 70% - Those who use Excel or Google Sheets on a daily basis

* 75% - Those who visualize data or present data to make an argument on a daily, weekly, or monthly basis

Skills people wished they had learned in high school

* 0% - Those who wished they had learned other traditional math subjects in high school beyond what they had already learned

* 65% - Those who wished they had learned skills around analyzing and interpreting data to uncover insights

* 60% - Those who wished they had learned how to visualize and present data

It's pretty clear that data-related skills are what's actually being used on the job. During a pandemic where you may have been furloughed or laid off, may be graduating from university, or are in any scenario where your future is unclear and you want to secure a job, learning Excel and data skills may bubble to the top of your to-do list while you're in quarantine at home. Hopefully these tips will help you gain the Excel and spreadsheet skills you need to land your next job.

1) Block out time on your calendar to take your class

If you're a fan of David Allen's Getting Things Done philosophy, you've probably heard the phrase that if it doesn't get scheduled, it doesn't get done. Blocking off time on your Google or Outlook calendar to actually take your Excel class, versus taking the class when you feel like it, will ensure you get through the material and get into a state of flow with it.

2) Minimize distractions

While it's easy to stay connected with family and friends while at home, you really need to put away your phone and close the apps you use for meetings and virtual hangouts. Turning off notifications for FaceTime, Facebook, Houseparty, Slack, etc. will ensure you can get some uninterrupted time to learn Excel. There are small nuances to writing Excel formulas that can be easy to overlook when you are distracted by your friends or social media.

3) Connect with the instructor and community

Many online Excel classes encourage you to ask the instructor questions, and many platforms such as Skillshare encourage students to participate in the community of other students who are taking the class with you. For my Excel classes, there are several discussions where students ask me questions and either I or another student taking the class will jump in and answer. Active participation ensures you are engaged with the class, and the instructor and students can help keep you accountable.

4) Have Excel open alongside the video

It's easy to simply watch a screenshare of an instructor doing something in Excel and say: "I get that, that looks easy to do." It's one thing to see the instructor write a VLOOKUP() formula but a completely different experience when you write the formula yourself. Having Excel or Google Sheets open next to the window where you are taking the class is important for getting hands-on experience with Excel. Pause the video and try doing what the instructor is doing in Excel.

5) Practice with real use cases from your daily life

Probably the most important tip. In order to make what you learn from the online Excel class marketable in the real world, you need to use spreadsheets for real-life scenarios. The main way I learned Excel was from looking at other people's spreadsheets in a work environment. If you know someone who can share an Excel file they use at work (removing sensitive info, of course), this would give you a way to see how people use Excel in the real world. Then you can talk more intelligently about how you might design a spreadsheet during an interview.

Don't have access to Excel files from people who use Excel every day? Try Googling "financial model Excel example" or "track customers Excel example" and you'll get all sorts of nice templates. Better yet, take Google Sheets or Excel and start tracking something in your daily life. The number of home workouts you do every week. What you are spending on online deliveries. Track COVID-19 stats for your county or state. By building these simple reporting tools, you'll get a feel for how to use spreadsheets for a real world use case.

Some of my favorite Excel teachers

Been following some of these instructors for a while now, and can definitely say their classes are worth checking out if you are new to Excel:

* Oz du Soleil's Lynda classes

* Mynda Treacy's myOnlineTraininghub classes

* Bill Jelen's Mr. Excel YouTube channel

MAKRO is back!

One of my favorite Excel streamers is back with this livestream. He makes some good points about how Microsoft is dumbing down Excel for beginners and alienating advanced Excel users. Bless you MAKRO.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

* a16z Episode #523: Innovation Through Software Development and IT

* Knuckleheads Season 3 Episode #6: Isiah Thomas AKA Zeke


Dear Analyst #29: Working with dynamic array functions and formulas that spill

Al — Mon, 13 Apr 2020 10:28:00 GMT

Have you ever wondered what an “array-entered formula” is? It’s an intermediate/advanced concept in Excel but in late 2018, Microsoft released dynamic array functions and formulas that “spill” into the cells below your current cell with a function. This makes writing formulas easier and less prone to human error, but there are some tradeoffs to […]


Dear Analyst #28: Filling a formula down to the last row of your data set

Al — Mon, 30 Mar 2020 10:32:00 GMT

This spreadsheet tip is based on a question I get asked all the time when I teach (well, taught) Excel at in-person classes: How do I fill a formula down to the last row of my data set without over-shooting the last row with keyboard shortcuts? This problem occurs with larger data sets where you […]


Dear Analyst #27: Splitting a cell diagonally to label y and x-axis and COVID-19 dashboard

Al — Mon, 16 Mar 2020 09:06:00 GMT

This is an Excel trick that’s not super complicated but super useful for labelling a simple table in Excel. Let’s say you have one set of labels along the rows (e.g. “Region”) and then another set of labels along the columns (e.g. “Month”). Cell A1 is now empty because you don’t know which label to […]


Dear Analyst Episode 26: Data visualizations for infectious diseases/ideas during coronavirus (COVID-19)

Al — Mon, 09 Mar 2020 09:38:00 GMT

Given the media attention on the coronavirus (COVID-19) over the last few weeks, I thought it was important to take a step back and look at the math behind infectious diseases and how diseases spread. I spend the entire episode taking a look at Going Critical, a blog post by Kevin Simler […]


Dear Analyst Episode 25: Structuring data challenge (denormalize data) with Get and Transform

Al — Tue, 18 Feb 2020 11:03:00 GMT

This episode is based on a video and Tweet posted by Mr. Excel (Bill Jelen). Bill discusses an Excel challenge someone emailed him about regarding how to “transform” a badly structured table of data into a structure that makes it easy to do PivotTables and other downstream analysis. Interestingly, I received a ticket from a […]


Dear Analyst Episode 24: Finding and ranking percentiles

Al — Mon, 03 Feb 2020 11:05:00 GMT

I discuss how to calculate percentiles in Excel or Google Sheets using the PERCENTILE function. With the PERCENTILE function, you can calculate the value that would represent the nth percentile in your list of values. This is not exactly the calculation I was looking for. Instead, what if you wanted to know what the rank percentile […]
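To illustrate the difference between the two calculations, here is a rough sketch in JavaScript. The function names are hypothetical. The first function returns the value at a given percentile using linear interpolation between sorted values, and the second uses one common definition of percent rank (the share of the other values that fall below a given value). Treat both as assumptions about the general technique rather than an exact copy of what the spreadsheet functions do.

```javascript
// Value at the k-th percentile (0 <= k <= 1), interpolating linearly
// between the two nearest values in the sorted list.
function percentileValue(values, k) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var pos = k * (sorted.length - 1);
  var lower = Math.floor(pos);
  var upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Percent rank of x: the fraction of the other values that are below x.
function percentRank(values, x) {
  var below = values.filter(function (v) { return v < x; }).length;
  return below / (values.length - 1);
}

var scores = [50, 60, 70, 80, 90]; // hypothetical sample data
console.log(percentileValue(scores, 0.75)); // the score sitting at the 75th percentile
console.log(percentRank(scores, 80));       // where a score of 80 ranks, as a fraction
```

The first function answers "what value sits at this percentile?" while the second answers "at what percentile does this value sit?"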


Revisiting Calculating Average Trends Across Time Periods in Spreadsheets

Al — Mon, 20 Jan 2020 11:38:00 GMT

In this episode, I discuss how to calculate trends over time in Excel for the purpose of forecasting future values. I reference an old post about calculating trends where someone recently left a comment about the counter-intuitiveness of calculating averages of changes in your values. In order to follow along with this episode, I would […]


Dear Analyst Episode 22: Calculate win streaks for a pool of players in Google Sheets

Al — Mon, 16 Dec 2019 12:16:00 GMT

If you are by your computer, you may want to open this Google Sheet to understand the example discussed in this episode. I walk through a rather long formula involving the FREQUENCY(), COLUMN(), MAX(), and the ARRAYFORMULA() functions in Google Sheets. Here’s the full formula below to calculate win streaks in the Google Sheet: […]


Dear Analyst Episode 21: Building No-Code Tools and Applications from Spreadsheets

Al — Mon, 18 Nov 2019 11:03:46 GMT

This is my talk from Webflow’s No-Code Conference that took place on November 13th, 2019 in San Francisco. My talk was titled Building No-Code Tools and Applications from Spreadsheets. The slides from my presentation are on SlideShare here. Themes from the talk This was my first time talking about my experience with […]


What it’s like teaching a week-long online data analytics course

Al — Mon, 11 Nov 2019 11:30:53 GMT

A few weeks ago I had the opportunity to teach a week-long data analytics course through General Assembly. The course was taught entirely online using Zoom. I discuss some of the topics the students learned in the class, and what the experience was like teaching an online class in real time. The topics we covered […]
