The Materials Project (materialsproject.org)
117 points by infinitydeltax on July 4, 2022 | 39 comments


That Chemist tried to use a paper published in Nature to synthesize something for his lab, but he found a number of potentially concerning details in the paper and was unable to reproduce it using the published method. The authors and publication ghosted him and the paper was still up after 2 months. https://www.youtube.com/watch?v=-WPBtFTLZnM

How do projects like this deal with papers published based on falsified data? Do they reproduce any of the source data themselves?


> How do projects like this deal with papers published based on falsified data? Do they reproduce any of the source data themselves?

I can't speak to this specific instance, but Materials Project does try to pay close attention to questions of reproducibility and provenance. Materials Project runs open-source repos[0] so that its methods can be verified, individual calculations are available via an API[1], and we also partner with NOMAD[2] to make larger files and calculation artifacts available for direct download. This is in addition to documenting methods via peer-reviewed papers, online docs, etc.

This is not to say that issues of reproducibility don't still exist, or that we ourselves couldn't be doing better. It's a big problem in the community.

[0] https://github.com/materialsproject
[1] https://api.materialsproject.org/docs
[2] https://www.nomad-coe.eu


Retraction watch.

Btw, many people (even trained ones) fail to reproduce legitimate work. The work may not actually be fake or falsified; typically, not being able to reproduce an experiment is insufficient evidence for retraction. I didn't watch the YouTube video.


Can confirm.

People outside academia seem to believe that reproducibility is an explicit goal when publishing papers. Perhaps it should be, but generally speaking it isn't. Instead a paper is a record of the things the authors found interesting and novel when doing the work.


And if they’re objectively wrong, then what?


"Wrong" is different to "fraudulent"

If they are wrong then you publish a paper with conflicting results. If there is fraud (which is different to "I can't replicate the results") then you deal with the publisher of the paper.


Well, there are two problems. The first is journals refusing to publish anything that conflicts with previous publications (a despicable practice that goes against the most fundamental principles of science, which include curiosity and the willingness to be wrong and change; see also [1]). The second is that one should reasonably be able to expect a response from the original authors when asking about a failure to replicate (even in the mode of "you're stupid and did it wrong").

[1]: The importance of stupidity in scientific research | https://news.ycombinator.com/item?id=31977918


A major shoutout to the Materials Project team. It's one of the first open data initiatives from academia. Beyond its stated goal, Materials Project has also helped countless computational materials scientists pivot their careers to become data scientists and have successful corporate careers.


I believe this pivoting example might be useful to scientists, support staff and postdocs in other fields as well (e.g., physics, chemistry, bioinformatics, materials science).

A few specifics that might be useful for scientists that are considering transitioning to industry.

Pivot sooner rather than later. The transition might be a bit rough early on. In general, you can't get away with the heroic/cowboy approach that you can in a smaller academic setting. The interview process can be rough and might have a bit of RNG to it.

Aim to jump around every 2 years (3 at the absolute most) early on in your career. This will help make sure you're leveling up your skill set, getting those hard-earned experience points and seeing different approaches to the software development process.

Be mindful of "infrastructure"-ish positions that might make it hard to get noticed. You probably have a high skill ceiling, but you are rough around the edges. It's essential to find a mentor that sees your potential and can help you level up (this might be challenging for remote positions).

At a minimum, you'll have an order of magnitude more job opportunities and mobility. Mobility in academic projects can be challenging to navigate. It can be easier to find a place outside of the academic space where your interests and skills will be aligned. Industry can also have interesting problems to solve.

I suspect that the current economic climate combined with the current clusterfuck housing market might create challenges for retaining support staff and postdocs in these academic/national lab driven projects.


This project is amazing. So much useful information! Even if it's just for students or graduate students to learn from.


Thanks both for the appreciation, it's really nice to see! Will forward to the team :)


Hi everyone, fun to see The Materials Project make the front page! I work on this, happy to answer any questions.


The page says the data is licensed under CC-BY (presumably in countries that have sui generis database protection, rather than countries like the US where facts aren't copyrightable). This is great!

Is there a torrent? How can we ensure that this treasury of materials knowledge is preserved 64, 256, or 1024 years into the future, even if, for example, the US goes to war against Russia or China and decides to criminalize exporting materials data?


In the short (~decade) term, we do tape backups of calculation data in Berkeley, and offload data to an independently-funded European project (NOMAD), to ensure data is in at least two locations. Likewise, our production databases are automatically backed up in the cloud, but we also keep a local mirror on a bare metal server. In the longer 2^6-year time frame or further out still, I would just be flattered if the data is at all still useful for people. I think it's fair to say our community has a lot of challenges to face before we get to that point.

We don't seed any torrents ourselves and only support API access (mainly because we're a small team and have to focus our effort), but with the open license I hope the data can live on wherever/however it can.


If someone were to try to do a bulk download of the data (well, or whatever they thought was the most significant data) through the API for preservation purposes, might it put an undue load on your server infrastructure? Some kind of bulk data download might be useful insurance there.

There seem to be some interesting efforts to run SQLite in the browser so that server infrastructure only has to provide bulk data access, with precomputed indices to avoid full table scans; I wonder if those might be applicable here: https://blog.ouseful.info/2022/02/11/sql-databases-in-the-br... (though of course if you aren't using SQLite as your backend now it might be a headache)

Such an approach, if it were feasible, would have the advantage that bulk data downloads wouldn't look very different from normal use.
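
To make the "precomputed indices to avoid full table scans" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and values are invented for illustration and are not the Materials Project schema; the sketch only shows how the query planner's strategy changes once an index exists.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE materials (material_id TEXT PRIMARY KEY, formula TEXT, band_gap REAL)"
)
conn.executemany(
    "INSERT INTO materials VALUES (?, ?, ?)",
    [("m-1", "A2B", 0.0), ("m-2", "AB", 1.2), ("m-3", "AB3", 3.4)],
)


def query_plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


sql = "SELECT material_id FROM materials WHERE band_gap BETWEEN 1.0 AND 2.0"

plan_before = query_plan(conn, sql)  # no index on band_gap: whole table is scanned
conn.execute("CREATE INDEX idx_band_gap ON materials (band_gap)")
plan_after = query_plan(conn, sql)   # planner can now search via idx_band_gap

rows = conn.execute(sql).fetchall()

print(plan_before)
print(plan_after)
print(rows)
```

In a browser-hosted setup the index would be built once, ahead of time, and shipped inside the database file, so a client only needs to read the pages the index points to.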


This would be a much bigger conversation, the SQLite efforts are very cool.

Short answer to your question is that the API load should be fine (I regularly download large subsets of the database myself via the API for research purposes), although there are good and bad ways of writing API queries. We have some tutorials, workshops, etc. available to help newcomers to our API write good queries.

We also have an email address set up (heavy.api.use@materialsproject.org) where people can give us a heads up if they are concerned about putting an undue load on our servers; as much as we try to have reasonable automatic limits set, sometimes we have had issues! API traffic continues to grow too, which in some ways is a nice problem to have, but does mean this is a moving target.
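
As a rough illustration of what a "good" bulk query pattern can look like, here is a sketch of a paginated download that pauses between requests. It does not use the real Materials Project client: `fetch_page`, `page_size`, and `delay_s` are hypothetical names, and the fake data source below stands in for an actual HTTP call.

```python
import time
from typing import Callable, Iterator, List


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every record one page at a time, pausing between requests."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # short or empty page: nothing left to fetch
            return
        offset += len(page)
        sleep(delay_s)  # be polite to the server between pages


# Hypothetical stand-in for a real API call (assumption, not a real endpoint).
FAKE_RECORDS = [{"id": i} for i in range(25)]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    return FAKE_RECORDS[offset:offset + limit]


records = list(fetch_all(fake_fetch_page, page_size=10, delay_s=0.0))
print(len(records))
```

Requesting a few large pages with a delay, rather than one request per record, is the general idea; the right page size and delay would depend on the actual service's limits.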


That's good to hear!


Is this aimed at inorganic materials in general, or are there areas of specialization like say, industrial catalysts for fluid-bed processes etc?


It is aimed at inorganic materials in general, and many of the calculations are bootstrapped from existing experimental crystal databases.

However, this is not to say there aren't some biases. A lot of the Materials Project collaborators work on battery research, so there is some bias towards battery materials. But people have used MP to search for new photocatalysts, for example (or carbon capture materials, new phosphors, thermoelectrics for solid-state refrigeration, lead-free piezoelectrics, transparent conductors, etc.; the list goes on).


Is there any way to request adding a new theoretical material? There are a couple of scandium based compounds that could theoretically exist that I think would be interesting to reason about.


Absolutely, yes. Materials Project runs a service called "MPComplete" where people can submit structures to "help complete the database." There's an API, or we're working on a new drag-and-drop interface on the website to quickly upload a CIF or similar.

By all means email me at mkhorton@lbl.gov if you're interested and I can sort it out.


Hi there! I'm hoping to learn more about Materials Project. Is your team aware that on the website, the documentation links are not working?


No, I was not aware, thanks for reporting! Have we missed a link somewhere? Docs link is https://docs.materialsproject.org and is online.


lol hi matt! Interesting finding Berkeleytheory folks out in the wild


hi Alex! :)


This is really cool -- I hope it's extended to be a way to centralize experimental data about materials as well. For instance, I might want the electron affinity and band gap of mono-, bi-, tri-, and many-layer black phosphorus or something, and I want to see all the different estimates and how they were extracted.


We have a mechanism for upload of experimental data (MPContribs[0]), that can then be linked back to the Materials Project's "material detail pages" for a given material. This also then provides a public API for bulk download of this data. We hope this will help make relevant experimental data more discoverable.

[0] https://contribs.materialsproject.org


For someone that understands this topic better than I, what's the difference between the target audience or information from this site compared to something like https://www.matweb.com?


There are a few differences, but broadly MatWeb is more useful for manufacturing and has a broader range of materials available (including plastics, extensive metallic alloys, etc.) and real world properties. These are materials you might purchase and use today.

In contrast, the Materials Project provides computed, predicted information on inorganic crystals (typically ideal, stoichiometric crystals) that might be used for many different device applications like solar, optoelectronics, batteries, etc. Many of these crystals will not be available to purchase and will need to be grown in a laboratory, and Materials Project is therefore much more focused on active research into new materials.


"Supercomputing" is involved with this because it has electronic structure plots and other -- try the "random material" button without logging in and you will see most of the key data solid state physicists would use to understand a material.


Yes, this is almost exclusively a computational resource, with the exception of experimental data contributed by third parties. Most of our compute comes from the lovely people at NERSC[0].

All our predictions are benchmarked against experimental data wherever possible, but it's always a balancing act between things that can be calculated reliably and at scale, and the latest-and-greatest methods which give the most accurate predictions.

[0] https://www.nersc.gov


Long ago, I had a more than passing interest in the idea of ultraconductors, which were purported to be a string of "polarons" grown in tricky conditions on the surface of polymers, using ozone, UV, and a strong applied electric field. The company died in 2008... due to either it being a grift, or just the crash.

Supposedly, they possessed conductivity about 10^6 times that of silver at room temperature, along the axis of growth.
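
For scale, taking the room-temperature conductivity of silver as roughly 6.3 x 10^7 S/m (a standard textbook value, not a figure from the company), the claim works out to approximately:

```latex
\sigma_{\text{claimed}} \approx 10^{6} \times \sigma_{\text{Ag}}
  \approx 10^{6} \times 6.3 \times 10^{7}\ \mathrm{S/m}
  = 6.3 \times 10^{13}\ \mathrm{S/m},
\qquad
\rho_{\text{claimed}} = \frac{1}{\sigma_{\text{claimed}}} \approx 1.6 \times 10^{-14}\ \Omega\,\mathrm{m}
```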

Is there any way I could use this to see if there was merit in that idea?


> Is there any way I could use this to see if there was merit in that idea?

It likely can't give you an instant answer, but it can be a good starting point for a research project. For example, Materials Project has information about the dielectric properties of a material, has datasets for electron conductivities, vibrational (phonon) properties and the like. So you would start by searching the dataset for the properties of interest to get a shortlist of candidate materials, and then do more focused studies based on those.

Note that the Materials Project does also have known materials in its database that are currently used extensively in real-world devices too, so it can also be used to provide additional information about those materials. In this way, if you're looking for an improvement on an existing material, you can start with a known-good material and see if similar materials might exist that offer an improvement on your property of interest.
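
The "search for properties of interest, then shortlist" workflow might look something like the sketch below. The entries, property names, and thresholds are all made up for illustration; in practice the records would come from the database rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    band_gap_ev: float
    dielectric_constant: float


# Made-up values, not real Materials Project data.
entries = [
    Entry("candidate-A", 0.0, 5.0),
    Entry("candidate-B", 1.5, 12.0),
    Entry("candidate-C", 2.1, 30.0),
    Entry("candidate-D", 4.0, 8.0),
]


def shortlist(
    items: List[Entry], min_gap: float, max_gap: float, min_dielectric: float
) -> List[Entry]:
    """Keep entries inside the band-gap window, ranked by dielectric constant."""
    hits = [
        e
        for e in items
        if min_gap <= e.band_gap_ev <= max_gap
        and e.dielectric_constant >= min_dielectric
    ]
    return sorted(hits, key=lambda e: e.dielectric_constant, reverse=True)


candidates = shortlist(entries, min_gap=1.0, max_gap=3.0, min_dielectric=10.0)
print([e.name for e in candidates])  # ['candidate-C', 'candidate-B']
```

The shortlist is only a starting point; each surviving candidate would then need the more focused study described above.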


Looks like a fun way to come up with some new forever chemicals!


The term "forever chemicals" has been used to describe polyfluorinated organic molecules that are prone to bioaccumulation and highly resistant to breakdown by ordinary environmental mechanisms. The older term "persistent organic pollutants," [1] as codified in the Stockholm Convention on Persistent Organic Pollutants, is a broader way of referring to the same concept.

The Materials Project is for inorganic crystals. The project will not unleash new forever chemicals because it doesn't deal with organic molecules.

[1] https://www.epa.gov/international-cooperation/persistent-org...


I would agree with your comment, but I think it's fair to ask this question. Discovering new materials can have many unintended consequences, especially if they contain elements that are not earth abundant or have high costs (environmental, personal) associated with their extraction.


Yes this comment is easily dismissed as anti-science. But it’s not. It’s anti-harmful-technology. Until we learn to clean up our messes (using science!) I don’t believe we should get to make new messes. So, yeah it’s cool they made a database of materials, but what are materials people doing coming up with new materials - some of which will bio-accumulate - when our bodies are still filling up with all the other junk previous materials people came up with? It’s long past due for technology workers to take responsibility for their work and stop adding to the “technical debt” of environmental toxins we’re swimming in.


> Until we learn to clean up our messes (using science!) I don’t believe we should get to make new messes

What if the new materials are necessary to clean up the old messes?

With your method, we'd be stuck forever!

> Until we learn to clean up our messes (using science!) I don’t believe we should get to make new messes

Software, biology and chemistry are not even remotely comparable!


I can pretty easily compare them…

Software pollutes our culture and damages our political environments with toxic platforms, adtech tracking nightmares that have been co-opted by Stasi-esque government agencies. Those are just the first two things that come to mind.

Biology… hasn’t done too much harm. Biology’s biggest crime to date that I can think of is probably suicide seeds. Maybe there are things I’m not thinking of. Certainly CRISPR and GMOs pose a risk, but the benefits have outweighed the risk pretty well. But maybe we’ve just gotten lucky.

Chemistry… has a very long list. Forever chemicals, mustard gas and other chemical weapons, the chemical dousing chambers of El Paso, Texas (search “bath riots”) and their legacy…



