Been taking a bit longer to get the core worked out than I'd have liked, but life happens (e.g. my mom had cancer for a month this winter, though she's fine now, which is awesome. She didn't even need chemo or rad!!).
Also, I was originally planning to NOT write my own linear algebra substrate, but I quickly realized all the current tools suck, and that I needed to come up with a better numerical substrate if I wanted to do better.
What do I mean by this? Of all the numerical tools out there presently, none address the following falsehood that many folks believe is true: "you can have high level tools that aren't extensible but are fast, or you can have low level tools that are extensible and fast."
I want high level tools that are fast. I want high level tools that are fast AND extensible. I want it to be easy for the end user to add new matrix layouts (dense and structured, structured sparse, or general sparse) and have generic machinery that gives you all the general linear algebra operations with only a handful of new lines of code per new fancy layout. I want to make it idiomatic and natural to write all your algorithms in a manner that gives you "level 3" quality memory locality. I want to make sure that for all but the most exotic of performance needs, you can write all your code in Haskell. (And by exotic I mean maybe adding some specialized code for certain fixed-size matrix blocks that fit in L2 or L1, but really that's not most people's real problem.)
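To make the "handful of lines per layout" idea concrete, here is a minimal Haskell sketch. It is not the library's actual API: the `Layout` class, the `RowMajor` and `ColMajor` types, and the `fromRows`, `at`, and `mulVec` helpers are all hypothetical names invented for illustration, and only dense layouts are covered. The point it tries to show is that a new layout is just an index-to-offset mapping, while the generic operation is written once.

```haskell
module Main where

import Data.Array (Array, array, (!))

-- Hypothetical class (an assumption, not the real API): a layout knows its
-- dimensions and maps a (row, column) index to an offset in a flat buffer.
class Layout l where
  dims   :: l -> (Int, Int)
  offset :: l -> (Int, Int) -> Int

-- Two dense layouts. Each one costs only a few lines.
data RowMajor = RowMajor Int Int
data ColMajor = ColMajor Int Int

instance Layout RowMajor where
  dims (RowMajor r c) = (r, c)
  offset (RowMajor _ c) (i, j) = i * c + j

instance Layout ColMajor where
  dims (ColMajor r c) = (r, c)
  offset (ColMajor r _) (i, j) = j * r + i

-- A matrix is a layout plus a flat buffer of values.
data Matrix l = Matrix l (Array Int Double)

-- Build a matrix from a list of rows, placing each element where the layout
-- says it lives. The rows are assumed to match the layout's dimensions.
fromRows :: Layout l => l -> [[Double]] -> Matrix l
fromRows l rows = Matrix l (array (0, r * c - 1) cells)
  where
    (r, c) = dims l
    cells  = [ (offset l (i, j), x)
             | (i, row) <- zip [0 ..] rows
             , (j, x)   <- zip [0 ..] row ]

-- Element lookup, generic over the layout.
at :: Layout l => Matrix l -> (Int, Int) -> Double
at (Matrix l buf) ix = buf ! offset l ix

-- Matrix-vector product written once and reused by every layout.
mulVec :: Layout l => Matrix l -> [Double] -> [Double]
mulVec m@(Matrix l _) v =
  [ sum [ at m (i, j) * x | (j, x) <- zip [0 ..] v ] | i <- [0 .. r - 1] ]
  where
    (r, _) = dims l

main :: IO ()
main = do
  let rows = [[1, 2], [3, 4]]
  print (mulVec (fromRows (RowMajor 2 2) rows) [1, 1])  -- [3.0,7.0]
  print (mulVec (fromRows (ColMajor 2 2) rows) [1, 1])  -- [3.0,7.0]
```

Both layouts give the same answer from the same `mulVec`. A real design would also need sparse layouts and a traversal order that respects each layout's locality, which this toy ignores.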
Here's the key point in that ramble that's kinda a big deal: getting "level 3" quality memory locality for both sparse and dense linear algebra. I think I've "solved" that, though ultimately the reality of benchmarks will tell me over the coming few weeks if I have or not.
Likewise, I think I have a cute way of using all of this machinery to give a sane performance story for larger-than-RAM linear algebra on a single machine! There's going to be some inherent overhead to it, but it will work, and doing a cache-oblivious optimal dense matrix multiply of 2 square 4 GB+ ish sized matrices on a MacBook Air with 4 GB of RAM is going to be a cute benchmark that no other lib will be able to do out of the box. Likewise, any sparse linear algebra will have lower flops throughput than its dense equivalent, but that's kinda the price you pay for sparse.
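For readers unfamiliar with the term, a cache-oblivious matrix multiply can be sketched as a recursion that keeps halving the largest of the three dimensions until the sub-problem is small, so that sub-problems fit in each level of the memory hierarchy without the code knowing any cache sizes. The sketch below is one standard textbook formulation, not necessarily the approach used in the library. The names `Mat`, `matMul`, and the `cutoff` parameter are assumptions made for illustration, and it works in RAM only, so it says nothing about the out-of-core machinery.

```haskell
module Main where

import Control.Monad (forM_)
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))

-- Dense matrix with zero-based (row, column) indices (an assumption).
type Mat = UArray (Int, Int) Double

-- Multiply an n-by-m matrix by an m-by-p matrix by recursive splitting.
-- `cutoff` is the block size at which plain loops take over.
matMul :: Int -> Mat -> Mat -> Mat
matMul cutoff a b = runSTUArray $ do
    c <- newArray ((0, 0), (n - 1, p - 1)) 0
    go c 0 n 0 m 0 p
    return c
  where
    (_, (nHi, mHi)) = bounds a
    (_, (_, pHi))   = bounds b
    n = nHi + 1
    m = mHi + 1
    p = pHi + 1
    t = max 1 cutoff  -- a cutoff below 1 would never reach the base case

    -- Accumulate A[i0..i1) x [k0..k1) times B[k0..k1) x [j0..j1) into C.
    go :: STUArray s (Int, Int) Double
       -> Int -> Int -> Int -> Int -> Int -> Int -> ST s ()
    go c i0 i1 k0 k1 j0 j1
      | di <= t && dk <= t && dj <= t = base
      | di >= dk && di >= dj =            -- split the rows of A and C
          let im = i0 + di `div` 2
          in go c i0 im k0 k1 j0 j1 >> go c im i1 k0 k1 j0 j1
      | dj >= dk =                        -- split the columns of B and C
          let jm = j0 + dj `div` 2
          in go c i0 i1 k0 k1 j0 jm >> go c i0 i1 k0 k1 jm j1
      | otherwise =                       -- split the shared dimension
          let km = k0 + dk `div` 2
          in go c i0 i1 k0 km j0 j1 >> go c i0 i1 km k1 j0 j1
      where
        di = i1 - i0
        dk = k1 - k0
        dj = j1 - j0
        -- Small block: ordinary loops, adding into what is already in C.
        base =
          forM_ [i0 .. i1 - 1] $ \i ->
            forM_ [j0 .. j1 - 1] $ \j -> do
              acc <- readArray c (i, j)
              let s = sum [ a ! (i, k) * b ! (k, j) | k <- [k0 .. k1 - 1] ]
              writeArray c (i, j) (acc + s)

main :: IO ()
main = do
  let a = listArray ((0, 0), (1, 2)) [1, 2, 3, 4, 5, 6] :: Mat     -- 2 x 3
      b = listArray ((0, 0), (2, 1)) [7, 8, 9, 10, 11, 12] :: Mat  -- 3 x 2
  print (elems (matMul 1 a b))  -- [58.0,64.0,139.0,154.0]
```

Splitting the rows or columns produces independent output blocks, while splitting the shared dimension produces two passes that add into the same block, which is why the base case accumulates instead of overwriting.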
What I find very, very interesting is that no one's really done a good job of providing sparse linear algebra with any semblance of memory locality. I kinda think that I have a nice story for that, but again, at the end of the day the benchmarks will say.
I at the very least hope the basic tech validates, because there needs to be a good non-GPL lin alg suite with good perf for Haskell. hmatrix being GPL has held back the growth of a nice numerics ecosystem on Hackage / in Haskell for years, and it's about time someone puts on some pants and fixes that.
Assuming the tech validates, I really hope the biz validates too, even though I'll likely make various pieces open source under a BSD3-style license to enrich the community, get hobbyist adoption, and encourage other libs written on top (people in Haskell land try to avoid libs whose licenses aren't BSD/MIT/Apache style). There's so much more that needs to be done to really have a compelling toolchain for data analysis / numerical computation / machine learning / etc., and I really, really like spending my time building better tools. Building the rest of that stack will be outlandishly tractable, assuming my linear algebra tech validates as having the right regimes of performance on large matrices. (Amusingly, no one ever benchmarks linear algebra tools in the 1+ GB regime, and I suspect that's because at that point vectorization means nothing; it's all about memory locality, memory locality, and a dash of cache-aware parallelism.)
That's the vague version :)
And that's also not even touching my thoughts on the analytics / data vis tools that go on top.
(Or the horrifying fact that everyone is eager for better data vis tools, even though most data vis work is about as valuable as designing pretty desktop wallpapers to background your PowerPoint presentations... so even if I get everything working, I have a horrifying suspicion that if I allowed unsophisticated folks to use the tools, most of the revenue / interest would be around data vis tooling! Which would mostly be used to provide their customers / end users with pretty pictures that make them feel good but don't help them!)
Point being: I want to be able to say "you understand math, you understand your problem domain, and you can learn stuff. Spend 2-3 weeks playing with Haskell and my tools, and you'll be able to focus on applying the math to your problem domain like never before, because you didn't even realize just how terrible most of the current tools out there you were wrestling with are!"
I really really really hope the biz+tech combo validates... because then I could occasionally stop and think "holy fuck, I'm bootstrapping my fantasy job / company, the likes of which I imagined / dreamed of as far back as middle school and high school!"
Realistically there are 3 different outcomes:
the tech doesn't validate (and thus the biz doesn't either) --- then I'm looking for a day job ... (and I'm pretty darn egomaniacal and loud, so finding a good fitting day job would take a bit of work!)
the tech works yet the business doesn't --- not sure how that would happen, esp. since no investors means that enough income to support myself would still be a successful business, though I guess I'd have some compelling portfolio work if I went job hunting
the tech and biz both validate, and I'm earning enough to move out of my parents' place --- magic pony fantasy land of awesome. What more could anyone want? MORE AWESOME PROBLEM DOMAINS THAT NEED BETTER TOOLS (I mean, that would really be sort of the ideal, but it remains to be seen if that can happen.)
One point to make: the interface and the performance don't have to appear at once. The interface will be the longer lived portion. So sort that out, and you can focus on performance as problems crop up.
I know it's horribly boring to say, but getting those first few customers gets you into a virtuous cycle. Given that you're bootstrapped, even a few customers will get you in a very good place, where you can spend on development.
I remember us (or rather I) talking about Mathematica. When it first came out it was horrible for numerics. Truly terrible. But it was easy to transfer technical papers to. You simply wrote down what was on the paper, and you were done.
So people used it, and eventually performance got better over time as they invested in it.
Agree with everything you say. Hence why I'm just going to be releasing the lin alg soon. It actually turns out that for linear algebra code done right, the API has an intimate relationship with the possible performance! (This will be more apparent once I get things out the door.)
There's a lot I'll not even be trying to do in the first release: e.g. parallelism, distributed computation, SSE/AVX intrinsics.
Fret not, things are moving apace, and basic tech validation and thence conditioned upon that, public release, are approaching scary fast! :-)