« That age – which includes such historically important figures as Arrow’s fellow Nobel laureates Paul Samuelson and Gary Becker – represented a development and expansion of formal economic theory that brought unprecedented precision to the logical foundations of social science. »
And here we are, picking the fruits of this golden age of unprecedented precision. With a mountain of record low-yielding debt serving as the backbone for a mountain range of derivatives and synthetic products. With low savings and with record public participation in investments. Because the infomercials say the risk^D^D^D^Dvolatility is controlled by science and computers n shit.
Slightly off-topic, but IMHO the theory of random walks, risk-neutral pricing, and the Black-Scholes option pricing formula are the most significant findings of the past 100 years in the field of economics, in terms of: practical applications (the options and futures markets are huge, and they all involve these formulas); spawning research (if you go on arXiv, the original work done in the late '60s and '70s is still spawning tons of research to this day on asset pricing and asset price dynamics, whereas other economic fields seem to have stagnated); and being mathematically and empirically sound (meaning it's a complete theory that does an adequate job describing reality). And it was groundbreaking in that the result was unexpected and answered a nagging question about how to price contracts without having to define a drift variable.
There are some issues with it, though. It uses volatility as a proxy for risk (fortunately, past performance is a perfect predictor of future performance, and future events can be predicted at least to the extent of normally distributed errors) and assumes a standard normal distribution (the fact that the actual distribution has a much heavier tail [1] can have no material effect, right?).
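For concreteness, here's the formula under discussion -- a minimal Python sketch of the Black-Scholes call price, with the criticized lognormal/Gaussian assumption baked in; the parameter names are mine:

    # Minimal Black-Scholes European call pricer -- the lognormal model
    # discussed above. Parameter names are illustrative.
    from math import log, sqrt, exp
    from statistics import NormalDist

    def bs_call(spot, strike, t, rate, vol):
        """Black-Scholes price of a European call.
        spot: current price; strike: strike; t: years to expiry;
        rate: risk-free rate; vol: annualized volatility."""
        d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt(t))
        d2 = d1 - vol * sqrt(t)
        N = NormalDist().cdf
        return spot * N(d1) - strike * exp(-rate * t) * N(d2)

    # Example: at-the-money call, 1 year out, 2% rate, 20% vol.
    print(bs_call(100.0, 100.0, 1.0, 0.02, 0.20))  # ~8.92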
> the theory of random walks, risk-neutral pricing, and the Black-Scholes option pricing formula are the most significant findings
Random walks, heck yes! Black-Scholes, not so much. It's a fantastic toy model, but overall, I think it did more harm than good. People took its thin-tail behavior too seriously. There's far more fluctuation in practice than a Gaussian process would fit or predict.
So I would put it as: mathematically sound enough, empirically not quite so.
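A quick numerical illustration of the thin-tail complaint (my own toy comparison, with a Student-t(3) standing in for real return tails):

    # Count "4-sigma" days under a Gaussian vs. a fat-tailed Student-t(3).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    gauss = rng.standard_normal(n)
    fat = rng.standard_t(df=3, size=n)
    fat /= fat.std()  # rescale to unit variance for a fair comparison

    for name, x in [("gaussian", gauss), ("student-t(3)", fat)]:
        print(name, np.mean(np.abs(x) > 4))
    # The Gaussian puts roughly 6e-5 of its mass beyond 4 sigma; the
    # rescaled t(3) puts vastly more there -- the "too much fluctuation"
    # described above.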
> It's a fantastic toy model, but overall, I think it did more harm than good. People took its thin-tail behavior too seriously.
Options theory is popularly misunderstood. Makes being a former options trader fun and annoying.
First, the impact. You can model almost anything as an option. Equity? A call option on a firm's assets struck at the liabilities. (Literally anything else? A portfolio of Arrow–Debreu options [0].) Modern risk models, and these include some very successful ones, would not work without Black-Scholes.
Second, no professional uses a Gaussian assumption without knowing what they're doing. (In any case, people have been re-writing Black-Scholes for other distributions since at least the 1990s.) No model can economically ascertain every possible risk. One must always choose risks one considers negligible. This is true in finance as in life. Sometimes we choose to ignore the wrong risks. When that happens, it's easier to blame highfalutin math than to admit "I didn't think about that".
No amount of "fat tailing" will substitute for rigorous risk management. Case in point: the multitude of funds launched after 2008 focusing on "black swans" and "fat tails". Pretty much all of them lost money [3]. They missed, along with everyone else, Dubai's near default, Greece's actual default, the 2010 Flash Crash, 2011's summer volatility, China's 2015 crash, the following August's international crash, Brexit, and practically anything else that one might consider both significant and unexpected. (No shit.)
The real "magic" in Black Scholes? It's not the distribution. It's volatility. Modern models expand this once single term into a multidimensional beast [4]. This is the other reason for the prevalence of Gaussian assumptions. Traders trade computation of the distribution for computation of the volatility surface. The latter (tau) almost always dominates the former (delta) in terms of what's being mispriced.
Black-Scholes-Merton is a theory, analogous to Newtonian mechanics or textbook thermodynamics. They all need modification to work in the real world. That doesn't make them BS.
> [Using a fat-tailed distribution] can, however, give one a better appreciation of the risk
How does one choose a model and parameters for events which are, by definition, hard to predict? Keep in mind that most "black swans" arise from unforeseen dimensions of risk. There's the "my stock lost 90% of its value" fat tail versus "the damn exchange went bust". There's "my clearing bank is broke" and "the Russians invaded my country".
I agree that random walks are the most sensible way to model the financial markets. You should be suspicious of anyone telling you otherwise (can they actually beat the market?), especially now, with the mountain of evidence that pretty much no one can consistently beat a good market index.
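For concreteness, the standard random-walk model of prices is geometric Brownian motion -- a minimal simulation sketch (all numbers made up):

    # Random-walk price model: geometric Brownian motion,
    # S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z).
    import numpy as np

    rng = np.random.default_rng(42)
    mu, sigma = 0.05, 0.20      # made-up annual drift and volatility
    dt, steps = 1 / 252, 252    # one year of daily steps
    z = rng.standard_normal(steps)
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    path = 100.0 * np.exp(np.cumsum(log_returns))  # start at 100
    print(path[-1])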
The only issue with that is that a few groups have consistently beaten the market over time, so perhaps it's more pseudorandom than we think. One example might be the Renaissance Medallion Fund.
I think people who actually can beat the market fall into the general category of "knows something others don't". Renaissance is probably one of the very few firms that could qualify as that on purely technocratic means (instead of the usual, which is shades of grey in what's really insider trading). The other category would be simply HFT, but I'm not sure how profitable that is at this point.
Most of the people who seem to beat the market, though, are generally just happy beneficiaries of survivorship bias (e.g., the 10,000 monkeys on typewriters effect).
From the OP, I see that I've been closer to Arrow's work than I knew!
In grad school, I had a relatively severe introduction to optimization, including both linear and non-linear programming. The non-linear programming was mostly about the Kuhn-Tucker conditions, and there the work was mostly about the Kuhn-Tucker necessary conditions. Kuhn and Tucker were long at Princeton. The guy who was the Chair of my Ph.D. oral exam had been a Tucker student.
Before I got to that grad program, I had carefully studied W. Rudin, 'Principles of Mathematical Analysis' (a.k.a. Baby Rudin) and W. Fleming, 'Functions of Several Variables'. In my first year of grad school, I also had a severe course in H. Royden, 'Real Analysis', the real part of W. Rudin, 'Real and Complex Analysis', Neveu, Breiman, Chung, etc.

So, that background gave tools that helped attack the Kuhn-Tucker conditions.
Intuitive view of the Kuhn-Tucker conditions (KTC): You are in a cave with an uneven floor and vertical walls, and you want to find the lowest point. If you put down a marble and it starts to roll, then you are not at the lowest point. So, to be at the lowest point, it is necessary that the marble not roll.
But K-T wanted more: They wanted to say that necessarily the slope of the floor (the calculus gradient) and the slopes of the constraints that define the walls are such that the slopes from the walls block moving along the slope of the floor. Right, the slopes from the walls form a cone that contains the slope from the floor (or its negative, depending on maximizing or minimizing and the direction of the constraints, etc.) -- it's all about a cone.
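A toy numerical illustration of that cone picture (my own example, using scipy): minimize x^2 + y^2 subject to x + y >= 1. At the optimum, the floor's gradient is a nonnegative multiple of the wall's gradient -- the blocked marble.

    # Toy Kuhn-Tucker check: minimize x^2 + y^2 subject to x + y >= 1.
    # At the optimum, the objective's gradient lies in the cone spanned
    # by the gradients of the active constraints.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda v: v[0]**2 + v[1]**2
    cons = [{"type": "ineq", "fun": lambda v: v[0] + v[1] - 1.0}]
    res = minimize(f, x0=[2.0, 2.0], constraints=cons)
    x, y = res.x
    print(res.x)                       # ~(0.5, 0.5)

    grad_f = np.array([2 * x, 2 * y])  # slope of the floor
    grad_g = np.array([1.0, 1.0])      # slope of the wall
    lam = grad_f @ grad_g / (grad_g @ grad_g)
    print(lam, grad_f - lam * grad_g)  # multiplier ~1, residual ~0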
Well, this stronger statement is true in nice cases, and for a nice case you have to have some assumptions that the constraints are nice. For that there are various KT constraint qualifications (KT CQs) that are enough to make the KT statements about slopes true.
There are lots of KT CQs, and one question was: which imply the others? For two famous KT CQs, one due to KT and one due to Zangwill, it was not known whether they were independent.
So, as a grad student, I settled that -- they are independent. The proof was by counterexample -- I found some bizarre constraints.
To know that such bizarre (goofy, pathological, etc.) constraints could exist, I needed essentially a theorem: For a positive integer n, the real numbers R, Euclidean n-space R^n with the usual topology, and a subset C of R^n closed in that topology, there exists a function

    f: R^n --> R

such that f is zero on the closed set C, strictly positive otherwise, and infinitely differentiable. So, I proved that. For the KT CQ I didn't need all of infinitely differentiable, but I got that also.

This result is curious in part because some examples of a closed set C can be surprisingly intricate, e.g., the Mandelbrot set, a sample path of Brownian motion, Cantor sets of positive measure, etc.
As I went to publish, I discovered that my work also answered a question asked but not answered in a paper by Arrow, Hurwicz, and Uzawa.
Of course Arrow got his Nobel Prize. A few years ago, so did Hurwicz. Last I heard, Uzawa was still waiting! Cute: As a grad student, I answered a question asked but not answered by Arrow, Hurwicz, and Uzawa. Reading Rudin and Fleming helped!
Gee, in the OP, I see that Arrow was also interested in decision making under uncertainty. Well, my dissertation research was in best decision making over time under uncertainty -- stochastic optimal control.
I never took a course in economics. My Ph.D. advisor thought that I would need such a course, if only later in my career to fend off nonsense objections from economists -- I've never needed that!

So, I signed up for an econ course, went the first day, sat in the front row, said nothing, and took careful notes. After the class, when just the professor and I were there, I asked him what he was assuming for his supply and demand curves -- continuous, uniformly continuous, differentiable, continuously differentiable, infinitely differentiable, convex, pseudo-convex, quasi-convex, etc.? He said nothing.

Soon I got a call from my department secretary to call my Ph.D. advisor -- I was out of the econ course!
Still, the OP shows that I was closer to some mathematical economics than I knew!

Maybe someday some people in data science or artificial intelligence will exploit the KTC!
> I asked him what he was assuming for his supply and demand curves -- continuous, uniformly continuous, differentiable, continuously differentiable, infinitely differentiable, convex, pseudo-convex, quasi-convex, etc.? He said nothing.
> Soon I got a call from my department secretary to call my Ph.D. advisor -- I was out of the econ course!
What the hell? The first two semesters of microeconomic theory in graduate school go over all of that. Supply and demand are formalized from set theory on up (from Arrow's work!).
If you are interested, the Mas-Colell/Whinston/Green text is the bible all Ph.D. students are forced through in microeconomics. It starts at the set-theory level, defines what permits construction of a utility function, and gets to defining supply and demand from there. Then it gets to game theory and other topics.
You even have the theorems where free markets fail to be an efficient mechanism of allocation! We've known those things for decades, but economics is so politicized that it's hard for that information to get out.
I was a grad student in applied math and had done the basic research for my dissertation, solved the KT CQ problem, and was polishing the research and writing the illustrative software when my advisor suggested I take an econ course.
The econ course was in the econ department, not my department, and was not an econ grad course!
But, whatever the course was, the econ prof was apparently just terrified of my question!
One way and another, maybe I've touched on much of what you mentioned. E.g., the optimization I studied, with the math rock solid, was a good start on game theory. Later, to support us while my wife finished her Ph.D., and as basically a time-out from my own Ph.D., I took a job in military systems analysis with a lot of game theory in it. So, I dug into parts of G. Owen's book on game theory, which did the axiomatic utility function stuff, and T. Parthasarathy and T. E. S. Raghavan, which did a lot of fixed point theorems, Sion's result, Lemke's proof of Nash's result, etc. Sure, in the relatively general game theory the job had me in, I had to consider saddle-point results, and apparently that is the core of equilibrium theory in econ.
Once I tried a book on math econ, and it was just a lot of elementary regression analysis. Later I saw another such book, by Tata?, more advanced but still regression -- a place to see more about regression than you'd want to know, and maybe the AI people should take a look. Later I saw another such book, by Duffie, and early on it was heavily about the Kuhn-Tucker conditions. I read the first chapter or two quickly and had some questions, went back, read carefully, and found a counterexample for every statement in that material.
I did want to see a clean, solid, mathematical treatment of the Sharpe idea but didn't find that -- D. Luenberger, a good mathematician, has a book on finance that may have such a treatment.
Thanks for the reference on micro. I copied it to my file for such things and will look at it if I get interested in econ after I exit from my startup!
Most of the advanced math in current economics research is either in econometrics or game theory. I'm not saying econ is math-less, far from it, but most of the time we can't simply solve our problems by applying advanced math the way physics can.
Equilibrium concepts in game theory are a tough thing. The holy grail is still a unifying concept of a "stable equilibrium" (see Kohlberg & Mertens '86), which is pretty much a guaranteed Nobel (but I'm not sure one even exists; so many geniuses have worked on the problem without success).
> I asked him what he was assuming for his supply and demand curves -- continuous, uniformly continuous, differentiable, continuously differentiable, infinitely differentiable, convex, pseudo-convex, quasi-convex, etc.? He said nothing.
> Soon I got a call from my department secretary to call my Ph.D. advisor -- I was out of the econ course!
This is my favorite economist joke: A physicist, a chemist, and an economist are stuck on a desert island together. They find a can of beans and are discussing how to open it. The physicist says, "Let's climb up the tree and drop a rock on it. The force from the drop will make the can burst open." Then the chemist says, "We should put the can in some saltwater. The metal will corrode, and we can get inside." Finally the economist says, "Let's assume a can opener..."
Yup, on wild assumptions: considering utility functions, the average busy housewife and mother of four children, all under 6, goes to a big grocery store, gets a list of all the items for sale and their prices, notes her utility function as a function of the whole inventory, notes her grocery budget, and solves the likely NP-complete, non-linear, discrete (integer) optimization problem -- maybe under uncertainty, if she is buying extra bananas on sale and risking that they go bad too soon, or buying too little fresh chicken hoping it may be on sale in two days, etc. -- all in her head, right away!
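To put a number on how unreasonable that is: even the stripped-down version of the problem (one budget constraint, known utilities, no uncertainty) is already 0/1 knapsack, NP-hard in general. A toy sketch, with made-up items and utilities:

    # The stripped-down shopping problem is 0/1 knapsack: pick items
    # maximizing utility within a budget. Items here are made up.
    from itertools import combinations

    items = {"bananas": (3, 4), "chicken": (8, 9), "relish": (2, 1),
             "bread": (3, 5), "coffee": (7, 8)}  # name: (price, utility)
    budget = 12

    best = max(
        (combo for r in range(len(items) + 1)
               for combo in combinations(items, r)
         if sum(items[i][0] for i in combo) <= budget),
        key=lambda combo: sum(items[i][1] for i in combo))
    print(best)  # brute force: fine for 5 items, hopeless for a whole store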
Yup, and if you buy a lot more transistors, then the price for each will go up? Hmm. Transistors used to be several dollars each, and now you can get a billion or so for less than $100.
If you buy more disk space, then the price per byte will go up? Gee, it used to be that you got 300 MB for $40,000, and now you can get 2 TB for about $50.
If you buy more computing, then the price per unit of computing will go up? Hmm, it used to be that you could get a nice DEC dumb terminal for $1400, and now you can get one heck of a desktop computer for that.
So, sometimes, if you buy a lot more of something and wait a while, the price per unit can go down, not up. If you don't want to wait, then buy in quantity and get a volume discount. Or, instead of buying the teeny, tiny, itty-bitty bottles of sweet pickle relish, buy the gallon size at much less cost per ounce. If you are buying in really big quantities, say, Hertz buying a fleet of cars, then Ford can put on an extra shift and get the price way down just for Hertz.
Net, that one day in econ class looking at freehand, apparently differentiable, convex supply curves was a bummer.
> Maybe someday some people in data science or artificial intelligence will exploit the KTC
Climb out from under that rock, will you :) [We have met enough over HN that I thought I could take the liberty in good fun.]
As I have said (a little less than a million times to you), there is more to data science/machine learning than Breiman's CART, which for some reason you have latched on to. If you do a random walk over machine learning concepts, the parts that don't involve the KTC are a set of measure zero.
Read Vapnik; I bet you will like it. Glivenko-Cantelli on steroids meets the KTC: that is what Vapnik's results are. I doubt anyone will dispute that he is the father of statistical learning theory.
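For what it's worth, one place the KTC visibly meet Vapnik's work is the SVM dual: complementary slackness zeroes out the multipliers of all but the margin-active points -- the support vectors. A toy demonstration (scikit-learn, synthetic data):

    # In the SVM dual, the Kuhn-Tucker complementary slackness
    # conditions force most Lagrange multipliers to zero; only the
    # margin-active points (support vectors) survive.
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    print(len(clf.support_vectors_), "support vectors out of", len(X))
    # clf.dual_coef_ holds the nonzero multipliers (times the labels).
    print(clf.dual_coef_.shape)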
That rock you mention is where I work on my startup -- with money in mind. My main reason to do applied math is to apply it, to business, the money-making kind.
Yes, you have told me about Vapnik before, but the rest I see, e.g., deep learning, smells like tweaks on Breiman's CART. I have a lot of respect for Breiman, maybe more for his fellow student Neveu, as they were at Berkeley under Loève.
I have a tough time taking Silicon Valley seriously on anything serious about machine learning -- new words for statistical model building and estimation, but done with much more data.
In

Trevor Hastie, Robert Tibshirani, and Jerome Friedman, 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction', Second Edition, Springer, 2008,

I find mention of Vapnik, mostly for separating hyperplanes, and the references

Vapnik, V. (1996). 'The Nature of Statistical Learning Theory', Springer, New York.

Vapnik, V. (1998). 'Statistical Learning Theory', Wiley, New York.
Looking at Hastie, et al., which seems relatively elementary, I wouldn't expect to find much on the Kuhn-Tucker conditions or Glivenko-Cantelli.
In

Kevin P. Murphy, 'Machine Learning: A Probabilistic Perspective', ISBN 978-0-262-01802-9, MIT Press, 2012,

I saw no mention of Vapnik.
In

Shai Shalev-Shwartz and Shai Ben-David, 'Understanding Machine Learning: From Theory to Algorithms', 2014,

I see

Vapnik, V. (1992), Principles of risk minimization for learning theory, in J. E. Moody, S. J. Hanson & R. P. Lippmann, eds, 'Advances in Neural Information Processing Systems 4', Morgan Kaufmann, pp. 831-838.

Vapnik, V. (1995), 'The Nature of Statistical Learning Theory', Springer.

Vapnik, V. N. (1982), 'Estimation of Dependences Based on Empirical Data', Springer-Verlag.

Vapnik, V. N. (1998), 'Statistical Learning Theory', Wiley.

Vapnik, V. N. & Chervonenkis, A. Y. (1971), 'On the uniform convergence of relative frequencies of events to their probabilities', Theory of Probability and Its Applications XVI(2), 264-280.

Vapnik, V. N. & Chervonenkis, A. Y. (1974), 'Theory of Pattern Recognition', Nauka, Moscow. (In Russian.)
Okay, that may be a first-cut list of what to read on Vapnik.

Gee, machine learning and not from Silicon Valley -- the second part is a step up! Not from Pittsburgh either -- two steps up! Probabilistic and from Russia? More steps up! For more steps up, maybe some French or Japanese authors could contribute?

For such things, I could take Bertsekas at MIT seriously.
I'm deep into my startup; there I've done my applied math derivations and written the code.

It is true that I have some suspicions that some old techniques tweaked a little, and maybe some newer techniques, might increase accuracy some, but I'm leaving that on the back burner for now, avoiding premature optimization.

But I'll index this post and not forget about Vapnik.
Indeed, I can imagine (and still grossly underestimate) how frantically busy you must be. All the best for your startup. As I said, it was just friendly ribbing. Given our interactions on HN, I have come to gauge your taste a bit, hence my recommendations.
Vapnik, V. N. (1998), 'Statistical Learning Theory' is a tome. You probably would not have time to read it. I would say go with the other two books of his; they sort of summarize his body of early work. Again, a lot of water has flowed under the Vapnik bridge since, but you will get a non-hyped view of what ML is about.
Kevin P. Murphy, 'Machine Learning: A Probabilistic Perspective' is also a hefty read, but it's a good source and quite complementary to Vapnik's treatise. Murphy's book is about how to tractably model the joint distribution of a (mostly discrete) set of random variables and draw inferences from it. So it is about smart and efficient ways to marginalize, condition, and compress.
Shai Shalev-Shwartz and Shai Ben-David, 'Understanding Machine Learning: From Theory to Algorithms', 2014:
This is also a great book. You will see a lot of KTC here but mostly within the framework of convexity and duality.
Yes, I've heard nearly only about Kuhn-Tucker; apparently Karush got ripped off.
E.g., it's the Cooley-Tukey fast Fourier transform, but IIRC once that work became popular -- and for a while it was wildly popular, and now it is likely just crucial at the core of lots of work -- some people found similar or identical work going way back. Still, Cooley and Tukey get the credit.
> Intuitive view of the Kuhn-Tucker conditions (KTC): You are in a cave with an uneven floor and vertical walls, and you want to find the lowest point. If you put down a marble and it starts to roll, then you are not at the lowest point. So, to be at the lowest point, it is necessary that the marble not roll.
Lovely analogy; I'll use it when explaining this to people from now on!
No, it's not standard. I doubt that it is in Counterexamples in Analysis, one of my favorite books.
No, you don't need a non-empty interior for the closed set. The closed set can even be empty. The function is to be zero on the closed set and positive otherwise. For a single point, you could use just the squared distance to the point, i.e., a parabola.
For a closed set like the Mandelbrot set, and for the general case, you have to try harder.
For a fast version of the proof: outside the closed set is an open set. If it is empty, then just let the function f = 0. Else, in that open set, pick a countable dense set. Roughly, for each point in that countable dense set, solve the problem for that point, and then add all the countably many partial solutions in a convergent way.
A key is a Baby Rudin exercise: a function g: R --> R where

    g(x) = 0 for x <= 0
    g(x) > 0 for x > 0

and g is infinitely differentiable. Sure, g is based on an exponential.
Then for f on R^2, at each of the countably many points, spin g to make a smooth hill whose bottom just touches the closed set C. The spin, being intuitive here, also works for f on R^n. Sure, for each of the countably infinitely many points, in the spin g(z), z is the distance to the point. I'm being intuitive and sloppy here to be brief and easier to understand.
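Written out a little less sloppily (my notation; just a sketch):

    g(x) = \begin{cases} e^{-1/x}, & x > 0 \\ 0, & x \le 0 \end{cases}

    % U = \mathbb{R}^n \setminus C is open; \{x_k\} is countable, dense in U;
    % r_k = \tfrac{1}{2}\min(1, \operatorname{dist}(x_k, C)), so B(x_k, r_k) \subseteq U.
    f(x) = \sum_{k=1}^{\infty} \varepsilon_k \, g\!\left( r_k^2 - \lVert x - x_k \rVert^2 \right)

Each summand is smooth and vanishes off B(x_k, r_k), hence vanishes on C; since the x_k are dense and r_k is comparable to the distance from x_k to C, every point of U gets a strictly positive summand. Choosing the epsilon_k small enough that the series and all its term-by-term partial derivatives converge uniformly makes f infinitely differentiable.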
Then in R^2, at the origin, for each positive rational of the form p/q in lowest terms, have a ray from the origin at angle p radians and length 1/q. That's a bizarre closed set. Then at the origin, the Zangwill and KT constraint qualifications don't agree -- they are independent. Here, again, I'm being brief.
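In symbols, if I'm reading the description right, the set is something like

    C = \{(0,0)\} \cup \bigcup_{\substack{p/q \in \mathbb{Q}_{>0} \\ \gcd(p,q)=1}} \left\{ t\,(\cos p, \sin p) : 0 \le t \le \tfrac{1}{q} \right\}

i.e., rays in the (dense) directions of p radians, with lengths shrinking as the denominators q grow.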