Whats happening then?

spitfire · on April 15, 2013

From my talks with Carter previously he's working on a haskell based platform for analytic tooling in Haskell. Basically the core primitives for making large scale data analysis apps in Haskell.

Or that's what he was up to in august. Hopefully he can chime in here and update everyone on his progress. I honestly hope he succeeds in his plans.

carterschonwald · on April 15, 2013

Still working on it!

Been taking a bit longer to get the core worked out that I'd have liked, but life happens (eg my mom had cancer for a month this winter, though she's fine now, which is awesome. She didn't even need chemo or rad!!).

Also, I was original planning to NOT write my own linear algebra substrate, but I quickly realized all the current tools suck, and that I needed to come up with a better numerical substrate if I wanted to do better.

What do I mean by this? With all the numerical tools out there presently, there are none that address the following falsehood that many folks believe is true: "you can have high level tools that aren't extensible but are fast, or you can have low level tools that are extensible and fast.".

I want high level tools that are fast. I want high level tools that are fast AND extensible. I want it to be easy for the end user to add new matrix layouts (dense and structure, structured sparse, or general sparse) and have generic machinery for giving you all the general linear algebra machinery with only a handful of new lines of code per new fancy layout. I want to make it idiomatic and natural to write all your algorithms in a manner that gives you "level 3" quality memory locality. I want to make sure that for all but the most exotic of performance needs, you can write all your code in haskell. (and by exotic I mean, maybe adding some specialized code for certain fixed sized matrix blocks that fit l2 or l1, but really thats not most peoples real problems ).

Heres the key point in that ramble thats kinda a big deal: getting "level 3" quality memory locality for both sparse and dense linear algebra. I think I've "solved" that, though ultimately the reality of benchmarks will tell me over the coming few weeks if I have or not.

Likewise, I think I have a cute way of using all of this machinery to give a sane performance story for larger than ram on a single machine linear algebra! Theres going to be some inherent overhead to it, but it will work, and doing a cache oblivious optimal dense matrix multiply of 2 square 4gb+ ish sized matrices on a macbook air with 4gb of ram is going to be be a cute benchmark where no other lib will be able to do out of the box. Likewise, any sparse linear algebra will have lower flops throughput than its dense equivalent, but thats kinda the price you pay for sparse.

What I find very very interesting is that no ones really done a good job of providing sparse linear algebra with any semblance of memory locality. I kinda think that I have a nice story for that, but again, at the end of the day the benchmarks will say.

I at the very least hope the basic tech validates, because there needs to be a good not gpl lin alg suite with good perf for haskell. Hmatrix being gpl has cock blocked the growth of a nice numerics ecosystem on hackage /in haskell for years, and its about time someone puts on some pants and fixes that.

Assuming the tech validates, I really hope the biz validates too (despite me likely making various pieces open source in a BSD3 style way to enrich the community / get hobbyist adoption / other libs written on top, people in haskell land try to avoid using libs that use licenses that arent BSD/MIT/Apache styles), because theres so much more that needs to be done to really have a compelling toolchain for data analysis / numerical computation / machine learning / etc, and I really really like spending my time building better tools. Building the rest of that stack will be outlandishly tractable assuming my linear algebra tech validates having the right regimes of performance on large matrices. (amusingly, no one ever benchmarks linear algebra tools in the 1+gb regime, and i suspect thats because at that point, vectorization means nothing, its all about memory locality memory locality, and a dash of cache aware parallelism).

thats the vague version :)

And thats also not even touching my thoughts on the analytics / data vis tools that go on top. (or the horrifying fact that everyone is eager for better data vis tools, even though most data vis work is about as valuable as designing pretty desktop wall papers to background your power point presentations.... so even if i get everything working... I have a horrifying suspicion that if i allowed unsophisticated folks to use the tools, most of the revenue / interest would be around data vis tooling! Which would mostly be used to provide their customers/end users with pretty pictures that make them feel good but don't help them!)

Point being: i want to be able to say "you understand math, you understand your problem domain, and you can learn stuff. Spend 2-3 weeks playing with haskell and my tools, and you'll be able to focus on applying the math to your problem domain like never before, because you didn't even realize just how terrible most of the current tools out there you were wrestling with are!"

carterschonwald · on April 15, 2013

I really really really hope the biz+tech combo validates... because then I could occasionally stop and think "holy fuck, I'm bootstrapping my fantasy job / company, the likes of which I imagined / dreamed of as way back as middle school and high school!"

Realistically theres 3 different outcomes:

the tech doesnt validate (and thus the biz doesnt either) --- then i'm looking for a day job ... (and I'm pretty darn egomaniacal and loud, finding a good fitting dayjob would take a bit of work!)

the tech works yet the business doesnt --- Not sure how that would happen esp since no investors means enough income to support myself would still be a successful business, though I guess i'd have some compelling portfolio work if I went job hunting

the tech and biz both validate, and earning enough to move out of my parents --- magic pony fantasy land of awesome. what more could anyone want? MORE AWESOME PROBLEM DOMAINS THAT NEED BETTER TOOLS (i mean, that would really be sort of the ideal, but remains to be seen if that can happen.)

spitfire · on April 16, 2013

Thanks for the update Carter. I'll root for you.

one point to make. The interface and the performance don't have to appear at once. The interface will be the longer lived portion. So sort that out, and you can focus on performance as problems crop up.

I know it's horribly boring to say, but getting those first few customers gets you into a virtuous cycle. Given that you're bootstrapped, even a few customers will get you in a very good place, where you can spend on development.

I remember us (or rather I) talking about Mathematica. When it first came out it was horrible for numerics. Truly terrible. But it was easy to transfer technical papers to. You simply wrote down what was on the paper, and you were done.

So people used it, and eventually performance got better over time as they invested in it.

carterschonwald · on April 16, 2013

Agree with everything you say. Hence why I'm just going to be releasing the Lin alg soon. It actually turns out for the linear algebra code done right, the API has an intimate relationship with the possible performance! (This will be more apparent once I get things out the door).

There's a lot ill not be even trying to do in the first release : eg parallelism, distributed computation, sse/avx intrinsics.

Fret not, things are moving apace, and basic tech validation and thence conditioned upon that, public release, are approaching scary fast! :-)

carterschonwald · on April 15, 2013

you know exactly what i'm up to slowly with wellposed :)

(building numerics/ data analysis tools that dont' fill me with rage and ire over terrible engineering and usability. A matrix / linear algebra kernel of tools are on track to be ready for hackage release + paid pro versions in 1-2 months)

zygomega · on April 15, 2013

Classical landing page had me hooked anyways :-)

carterschonwald · on April 15, 2013

thankee good sir. I'm glad some people appreciate my vague semblance of prioritization skills. Hopefully that pans out to being able to have any early customers be of the sophisticated sort that I'm excited to work with / help!

nazka · on April 15, 2013

What is the name of your library? Or where do you post news?

carterschonwald · on April 15, 2013

It's not out yet. If you really want to hear about things as they happen, signup for the announce list linked to from www.wellposed.com , I've yet to fire off any emails to that list, but I anticipate 2-3 emails over the next 1-2 months (after a year of hard work and focused thinking).

What's also kinda awesome is I think the alpha release with all the Lin alg functionality should be under 2k loc. a lot of the work has been figuring out how to make the design composable and extensible enough that I can write a first working version with good performance just on my own. Ironically, that's also a compelling way to Validate that my tech delivers.

jackpirate · on April 16, 2013

Hey, I'm the author of the [HLearn library](http://hackage.haskell.org/package/HLearn-algebra). I just got two papers accepted into TIFP and ICML about the algebraic nature of machine learning, and how we can make machine learning algorithms both fast and user friendly in Haskell. I plan for a major update of my library in about a month to make it up-to-date with these research contributions, and I'd love to chat about how we can work together to make this a reality. My email's in my profile.

carterschonwald · on April 16, 2013

Oh cool! I've seen your stuff, it's very very neat, but gpl makes me nervous about looking at the code / playing with it much.

Yeah, we should def chat! My emails also in my profile. Lets one of us get around to emailing the other sometime this week.

nazka · on April 16, 2013

Cool! Thx guys. Did you look at this library: Cloud Haskell/distributed-process? [1]

[1] https://github.com/haskell-distributed/distributed-process

carterschonwald · on April 16, 2013

It's promising but still needs a lot of higher level tools before its usable by a none expert. Distributed systems done right are hard. Lets go shopping.