Interesting concept. However, there are a few issues that surface quickly (this is all assuming a feed-through scanner and not a flatbed).
#1 - It rotates AND scales to fit. It's not obvious until you rotate a stupid amount, but pages don't shrink when scanned for real.
#2 - The scanning rotation is way too uniform. Most scans twist a bit, typically near the top when more of the page is in the scanner to straighten it out.
#3 - With #2 there should be some stretching/skewing that isn't uniform.
#4 - The noise is way too uniform as well. It looks like static. Typical scanned documents have noise that is much more variable. You also get other scanning artifacts like streaks for dirt on the scan head.
#5 - The page ends often aren't even and introduce artifacting as well.
#6 - Needs an option for chewed up staple corner and/or holepunch.
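For what it's worth, point #4 is easy to prototype. Below is a rough Python sketch (not how the site does it; the function names, parameter values and the grayscale list-of-rows image format are all my own assumptions) of noise whose strength drifts down the page, plus a couple of fixed-column streaks like a dirty scan head would leave.

```python
import random


def scan_noise(width, height, seed=None, base_sigma=4.0, streaks=2):
    """Return a height x width grid of brightness offsets for a grayscale page.

    Hypothetical helper: the sigma drift and streak darkness are guesses,
    not values measured from a real scanner.
    """
    rng = random.Random(seed)

    # Noise strength does a slow random walk from row to row, so some bands
    # of the page come out grainier than others instead of uniform static.
    row_sigma = []
    sigma = base_sigma
    for _ in range(height):
        sigma = min(max(sigma + rng.uniform(-0.3, 0.3), 0.5), 3 * base_sigma)
        row_sigma.append(sigma)

    # Dirt on the scan head darkens the same columns on every row.
    streak_cols = {}
    for _ in range(streaks):
        streak_cols[rng.randrange(width)] = rng.uniform(-40.0, -10.0)

    return [
        [rng.gauss(0.0, row_sigma[y]) + streak_cols.get(x, 0.0) for x in range(width)]
        for y in range(height)
    ]


def apply_noise(pixels, noise):
    """Add the offsets to 0-255 grayscale pixels and clamp the result."""
    return [
        [min(255, max(0, round(p + n))) for p, n in zip(pixel_row, noise_row)]
        for pixel_row, noise_row in zip(pixels, noise)
    ]
```

Using a different seed per page would also keep two pages from sharing an identical noise pattern, although on a real scanner the streak columns would probably stay put across pages.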
> The noise is way too uniform as well. It looks like static.
This was my first question: is the noise truly random and inconsistent? Otherwise it would be very easy to detect by the "authorities" who insist on manual scans.
I just had to do this dumb dance with TD Ameritrade. I did a coin toss on print/scan vs. learning to do this with ImageMagick. Since I had a bunch of other deadlines to hit I wasted paper so they would accept my electronic submission.
Personally, from a workflow standpoint I'd prefer a PDF Printer driver that would add the effect. I'm sure the website is better for Mobile.
My biggest concern with an online service is privacy (either bad actors or the web app getting hacked). I used an online mortgage service that was fully paperless with the exception of a single document. Just ran it through a few imagemagick commands to add rotation, noise, contrast, etc. My home printer wasn't working so it was either that or buy a whole new one.
> My biggest concern with an online service is privacy
In this case it's all run client-side. You're still trusting that the code you're served wasn't maliciously replaced, but if you want to be careful you could run it in an incognito tab and temporarily disable your internet connection.
While this may work for unsophisticated attacks, wouldn't it still be possible for a more sophisticated adversary to do something more like store the document in browser local storage, and then later with internet access to post the contents?
I haven't spent a huge amount of time in the browser security space, but I do think there is quite a lot of surface area if you give the browser session sensitive data.
If you are using an incognito tab, anything in local storage, cookies, even caches should go away. I am not 100% up on the details but I believe modern browsers are pretty strict about isolating incognito state.
You're right though in general, that's why the incognito tab is important.
I mean, I considered this implied within the suggestion of using incognito mode.
In any event, it's an unrealistic attack vector. No bad actor is going to target 0.1% of edge cases when you could get enough damaging information from people who do not go through this process and remain connected to the internet.
Did you try this? Does not work with FF 99 in a private tab on macOS 12, at least for me. It stays stuck at "Rendering finished, waiting for processing".
It's possible for an extension to intercept and block requests, but as Kevin mentions in your sibling comment, it's not enough because they could write data to local storage and then read it later when you're back online if you ever visited that domain again. An extension would have to cover a lot of bases to ensure that data couldn't leak, and I wouldn't trust one to cover them all.
That got weird fast (caution NSFW). It looks like an interesting project, but then it quickly devolved into a PDF filled with drawings of penises. Quite unexpected, glad I wasn't viewing it with my students in the room lol.
I still have to deal with bureaucracy that requires wet signatures. I've tried a few tools like this one, but no bueno. They could tell it was "digitally signed".
Very nice. One thing I would like to see is a rotation range for multi-page PDFs. A 10-page document won't have identical rotation on every page. One might be -0.2 and the next 0.3.
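Something like this is all I mean, as a Python sketch. The function name and the 0.5 degree default are made up for illustration; the tool would plug the per-page angle into whatever rotate step it already has.

```python
import random


def page_rotations(page_count, max_degrees=0.5, seed=None):
    """Pick an independent small rotation, in degrees, for every page."""
    rng = random.Random(seed)
    # Each page gets its own draw from [-max_degrees, +max_degrees].
    return [round(rng.uniform(-max_degrees, max_degrees), 2) for _ in range(page_count)]


angles = page_rotations(10, max_degrees=0.5, seed=42)
print(angles)
```

Passing a seed keeps the output reproducible if you need to regenerate the same "scan" later.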
How is a web page app better than a shell script you can run yourself? Of course, any old port in a storm – if you can’t run a shell script, this is better. But if you can run things yourself, you should. A shell script will stay on your file system and not disappear in a couple of years when the original author either gets bored with it or decides to make it proprietary.
A friend of mine got a letter from his neighbor's attorney bitching about his tree or something stupid. So he literally wiped his ass with it and then took a picture of it and emailed it back with the body of the email saying thanks, I was running out of TP.
Recently had a client stiff us for $75k. His lawyer then sent us a letter trying to get us to agree to just keeping the $25k deposit or he would sue us for $1MM. He also made fun of my outdated LinkedIn profile and accused me of fraud by restating everything in my CV as "alleged".
I spoke with a few lawyers and they said the appropriate response to that embarrassing letter is to throw it away and not think of it again.
They never sued for $1MM and technically I have something like 6 years to sue him for breach and the $75k if I want to... Likely won't as the legal bill will be $100k.
I know a bit about tree law in the USA. Neighbors have a right to trim any part of a tree that comes over their property line (they must do it, unless the tree is obviously diseased). It's a bit more nuanced, but that's basically the law.
(I believe the tree-cutting outfit is supposed to cut the limbs with care. I mean they are not supposed to introduce pathogens from their tools onto your trees. I have oak trees and worry about their health. I've noticed PG&E is hiring pretty much any outfit to trim trees. They are seemingly untrained. Don't they know that it's not the low-voltage lines that are causing the fires? Years ago I noticed they don't seem to sanitize their chainsaws. I once, before the fires, asked PG&E through an email to spray a bit of bleach on their tooling. The following day, there was a sideshow of sanitizer being sprayed everywhere, and they filmed it. I had no intention of formally complaining. I just thought a quick clean might kill any pathogens. We have a problem in northern CA with oak trees dying.)
Oh yea, just because a healthy tree falls from your property and damages your neighbor's property, it doesn't necessarily mean you are liable to repair their property. It's more complicated. If a lawyer reads this could you educate me?
I ended up just paying my neighbor, who's a lawyer, because a healthy tree fell on her property. The tree's trunk was 1/2 on the property line. I wasn't feeling good, and didn't want, and couldn't afford, a fight with a miserable housebound attorney. From my understanding, I probably didn't need to pay anything. (I don't think she had a conversation with anyone without mentioning she passed the bar.)
Ya, but IMO, being a good neighbor is walking over and knocking on the door and saying Hi, My name is ____ and I am your neighbor, how are you today. Good to meet you. blah blah. Hey listen, that tree limb, can we work that out sir ? I can pay someone or do it myself or you can what are you comfortable with yadda yadda.
Resorting to lawyers deserves a shit stain unless the above does not work first.
I never resorted to lawyers. She is the lawyer (she got her license years ago) who told me she was going to sue. I threw in PG&E because it was related to trees. I don't have the money to consult lawyers. I can barely pay my property taxes.
Not really. People will normally try to resolve things without going to court unless there is a time sensitive issue that they need emergency relief like a temporary restraining order to secure, or they just want to make a point more than get compensation, because resolution without court is cheaper (meaning they get to keep more of the recovery), quicker, and—where an ongoing relationship is involved—less likely to burn bridges.
People who can sue you, and would have you dead to rights, will often bend over backwards to avoid it as long as possible without losing the option if there is a plausible chance at resolution without going to court.
Disagree. This "scary lawyer letter" scam has nothing to do with the law;
it has to do with a neighbor who is a cry baby who has no case and a lawyer who makes a lot of money off a little old lady who has him on speed dial.
It's a legal lawyer scam. He probably gets $1K for sending a stupid letter, and the little old lady gets to pay her lawyer to do her bidding instead of walking over and simply saying hey, your tree limb is over my property or whatever.
There's something extremely wrong with your implementation as it just takes too much to render every page.
I've done plenty of work in the past with both canvas and pdf.js (which is what you're using) and it shouldn't be that slow, at all. Perhaps you have a rogue loop that's calling a very expensive function on each pixel of every page, maybe?
Who knows, but for sure performance on that could be near real-time.
(at the end of the blur pass it prints the elapsed time to the console)
You're right, it does get kind of slow at 2x, but not that slow, on my laptop it takes around 1 sec/page, while on your site takes 20-30 secs/page. Also, my very naive code does not take into account "warming up" and some other code optimizations to make the blur much faster, you could easily get it down to 100ms/page, I'm sure!
Oh! You mean the scanning speed. I thought you were talking about the original PDF preview. For now, scanning uses ImageMagick compiled to Wasm with Emscripten. Due to the translation from C++ to Wasm, the scanning speed is very slow. Maybe rewriting the blur, rotate and noise algorithms will speed up the scanning.
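On the "rewrite the blur" idea, here is a sketch of one common way to keep a blur cheap. It's my own illustration, not the code either of us is running, and it's a box blur rather than whatever ImageMagick does. The trick is a sliding window sum, so each pixel costs a constant amount of work no matter how big the radius is, and the 2D blur is done as a row pass followed by a column pass. It's in Python only to show the technique; the names are hypothetical and in the browser you'd port it to a typed-array loop.

```python
def box_blur_1d(values, radius):
    """Blur one row in O(n) with a sliding window sum; edges are clamped."""
    n = len(values)
    if n == 0 or radius <= 0:
        return list(values)
    window = 2 * radius + 1
    # Sum of the window centred on index 0. Indices that fall outside the
    # row reuse the nearest edge value.
    total = sum(values[min(max(i, 0), n - 1)] for i in range(-radius, radius + 1))
    out = []
    for i in range(n):
        out.append(total / window)
        # Slide the window one step right: drop the leftmost sample and
        # add the next one on the right.
        leaving = values[max(i - radius, 0)]
        entering = values[min(i + radius + 1, n - 1)]
        total += entering - leaving
    return out


def box_blur(pixels, radius):
    """Separable 2D blur: blur every row, then every column."""
    rows = [box_blur_1d(row, radius) for row in pixels]
    cols = [box_blur_1d(list(col), radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Running the box blur two or three times in a row gets reasonably close to a Gaussian look, if that matters for the effect.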
You are right about performance, but does it really matter?
It feels like this is the sort of tool one needs (very) infrequently, and those cases don’t seem like the sort of thing where seconds really matter. I think it’s plenty good enough.
I prefer to focus on how grateful I am that the author has made this and published it for free.
When one first opens the site and nothing happens for 30 secs. you assume that the pdf you're looking at is the actual result (that happened to me, at least), then the other one pops up and you're like ... ooooh I get it!
I wrote a similar program using PDF.js that renders near real-time (https://parepdf.com). You should be able to queue it up without too much trouble. If you're doing pixel-level manipulation, you want to make sure you're finishing within the browser's frame budget.
I've also had to do something similar to "forge" supporting documentation for medical claims. In order to claim FSA money, I had to provide detailed invoices. My hospital, however, was a big Kafka fan. They would only provide invoices that had a date and an amount, and those would take about 8 months to arrive. In order to get a detailed invoice, you had to call...but the catch is that detailed invoices were no longer available after 6 months. After every service, I'd have to immediately call for the detailed version, but if there were any after-the-fact adjustments due to insurance, I'd never be able to get a detailed statement.
To remedy this, I'd doctor previous invoices, and then print, scan, and fax to hide any editing artifacts. Keep in mind, this is all to get my own money that I'd contributed to the FSA. After that year, I just stopped using the FSA because it was such a pain.
The last FSA I had was the exact opposite. They put my FSA on a Visa card, then I went to the optometrist and forgot to use it and paid on my own credit card. A week later, I got a check in the mail from my FSA with a note basically saying "Hey, you could have used your FSA for that, so here's an automatic reimbursement."
EDIT: It may have been an HSA, not an FSA. I don't remember.
Oh, that's the best part. I did have the Visa, but for whatever reason the hospital never coded things properly, so I had to fall back to the manual reimbursement.
Since it's on a card issued by the FSA provider (I also don't know the correct word), they see the transactions in real time. Data that accompanies the transaction lets them know that it's reimbursable medical care.
The crazy thing is everyone is carrying around devices that would provide much better proof than a "wet signature".
It is trivial to take a timestamped and geo stamped video in this day and age of a person agreeing to a contract, and yet the standard is still "signatures".
Meanwhile people are posting video clips of themselves and their locations all day on WhatsApp/instagram/tiktok/youtube/facebook.
This really happens, especially at big companies. The lack of logic in requiring a literal wet signature but then scanning and emailing the resulting document gets lost in the "but the policy says...". It's mostly been with compliance and security groups in my experience.
A lot of companies in the EU are still refusing to accept eIDAS PDF signatures (which are actually verifiable, and required by EU and national law to be accepted for all purposes previously requiring a "wet" signature).
Likely, but it probably ends up being "harmless fraud" and even if prosecuted the judge would be like "what?".
If the bank really cared, they would ask for the PDF and have you mail the wet signed documents in.
Likely the requirement for a wet signature is left over from earlier times (think fax machines) OR they are trying to ensure that the person actually signing is the person signed (in other words, YOU did the signature, not you asking your wife/broker/whoever to apply it for you).
It is very real, unfortunately. I handle contracts often and have clients who demand wet signatures, even during the pandemic. The majority of them are in the public sector.
In the situations where you are supposed to manually sign and scan a printed-out PDF, you can instead paste your signature with transparency onto it, re-save it as a PDF and then make it look scanned.
At some point in my education, it was pretty common that some teachers sent us scanned PDFs instead of the original PDFs _or_ even more hilarious, gave us the printed scans of the PDFs.
I assumed that this software is basically a tongue-in-cheek reference to that, I had no idea this can actually have a practical purpose.
To comply with dumb signature requirements. I needed this a couple of times. Usually it goes like this. You fill out a PDF form electronically, and add the signature via Adobe. Then you email it.
A lot of the times the recipient will complain that the signature was not handwritten. They want you to print it, sign it and scan it again. So you do this trick to make it look like you followed this process.
This website is not the first one that tries to achieve this. But I would not trust uploading anything to a random Web server though.
I usually do it via command line. You can use ImageMagick (i.e. via the 'convert' command) to achieve the same result. It's very handy.
But you could use it to get one over people who insist on receiving (scanned) 'originals' or 'wet-ink signatures', by combining it with something like handwritten.js [0]..!
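For anyone who wants a starting point, here's roughly what that looks like, wrapped in a small Python helper that only builds the argument list. The helper name, file names and the specific values are my own assumptions, so tune to taste. The options themselves (`-density`, `-colorspace`, `-rotate`, `-attenuate`, `+noise`) are standard ImageMagick ones. On ImageMagick 7 the binary is `magick` rather than `convert`, and rasterizing a PDF also needs Ghostscript installed.

```python
import shlex


def scan_look_command(src, dst, angle=0.5, noise=0.25, density=150):
    """Build an ImageMagick command line that roughs up a PDF.

    Hypothetical helper: the default angle, noise level and density are
    illustrative guesses, not recommended values.
    """
    return [
        "convert",
        "-density", str(density),   # rasterize the PDF at this DPI
        src,
        "-colorspace", "Gray",      # scanned forms are usually grayscale
        "-rotate", str(angle),      # slight skew, in degrees
        "-attenuate", str(noise),   # scale down the noise that follows
        "+noise", "Gaussian",
        dst,
    ]


# Print the command so it can be pasted into a shell.
print(shlex.join(scan_look_command("in.pdf", "out.pdf")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Everything stays on your own machine that way.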
The philosophy of technology behind this is fascinating. The need is
the clearest case of non-functional requirements I have ever
seen. When a process owner brazenly does not care about the outcome,
but cares only about forcing people to go through their arbitrary
steps, it is to stamp their seal of authority and control upon the
other.
As Bill Hicks says "Hey, pretend like you're working!"
Everything else is post-facto rationalisation. In other words, they'll
dream up anything as a way to explain why you have to conform to
their process, variously invoking "standard practice", "regulation",
"security", with total disregard for the truth. It is the process
with which they identify vicariously, are attached to, and are obliged
to defend. The process owns them.
As for the solution: funny as it is, it's an example of a tragic new
realm of digital technology whose purpose is to fake human agency, and
create desired appearance over any actual reality.
I'm not just talking about spambots, or automated essay mills for
students to buy their way to a degree one cheat at a time. These are
what Douglas Adams called "Electric Monks". They believe in make-work
bullshit so that real people don't have to. This is the future of AI,
the adversarial workplace, a technological arms race around make-work
wage-slavery which creates no tangible economic value; avatars that
stand in for people remote working so they can sunbathe in the
garden... like that little pecking bird that Homer Simpson gets to run
the nuclear plant by pecking on the Y key.
Whoever can afford the best Electric Monks wins the game, because they
will be able to free their attention for real life.
I've actually never seen the phenomenon through this lens, but I like it!
I think the clearest indicator that this is going on is when you can circumvent the process arbitrarily. Two example memories spring to mind:
In a visa office: "fill out form X, you can get one from the table over there" / "There are none left" / "OK never mind give me your passport and I'll stamp it".
At my Big Tech employer: "please fill out this document template detailing the update and version history of your service, for an audit" / "really? This looks time consuming and I don't really understand the reason why you need it" / "OK, never mind then"
(Actually, at Big Tech I have found that replying along the lines of "really though?" is a very good first response when confronted with Processes. Sometimes when reporting bugs the template asks you to e.g. gather traces with browser extensions or whatever. I always say "I will do that if you first confirm that it will actually be useful for this bug" and haven't yet received such a confirmation)
Great commentary. I'd add shamelessly that whoever can _build_ the best electric monks dominates the game. The price of developing them will be minuscule.
Really hoping AI ushers us into the resource based economy where humans are freed from rudimentary labor.
A Tulane student got a bunch of funding because he developed a stand-in for folks for zoom meetings. Logs in, records, transcribes, the works. He developed it so he wouldn’t have to attend lectures during Covid. What you’re describing reminds me of this project.
It’s also called “Buelr,” which really captures the energy of what you’re talking about.
Hey man you were the one with that brilliant write up. I saved your comment to review again later. Incredibly insightful stuff. Already passed it to a few coworkers.
But there is something about the aesthetic of such things, it's why the IETF RFCs (example https://www.rfc-editor.org/rfc/rfc8200.html) are made to look like typewritten pages even decades after typewriters stopped being in common usage. I am surprised that they don't "go all the way" with that look and also apply some simulated coffee stains, dog-ears, and stapling artifacts.
No man, it's a deliberate look which sacrifices readability for some kind of retro-aesthetic whether they admit it or not. It's easy enough to reference things by section numbers.
And really, if they cared about being able to reference things to the n-th degree, the figures would have been captioned and have their own figure-number instead of just sort-of "in there" like a paragraph (https://www.rfc-editor.org/rfc/rfc8200.html#section-4.4).
The html version looks pretty good even on my iPhone. The words are small but legible. The text version is zoomed in, but wraps unnaturally, and is hard to read. The pdf is hard to read, too zoomed out.
I think they did a pretty good job making a document that can be navigated as people are accustomed to, while adapting to the medium. The aesthetic is not without function.
> We have a word for it within the domain of philosophy and
literature: Kafkaesque.
I was thinking of something a little different and even considered
specifically excluding Kafka and indeed Weber (I've read a lot of
Franz Kafka but am a Cliffs Notes imposter on Max Weber) from my
comment.
In The Trial, or Before the Law, the anxiety lies in not knowing the
mind of a, possibly ambivalent, judging other. In modernity, Weber's
modernity, it is spelled out in intricate, mind numbing detail, in
reams of forms that must be gymnastically navigated. One step further
in the direction I am describing is the officers of Jaroslav Hasek's
Good Soldier Svejk. In this incarnation bureaucracy is not an
all-powerful force to be feared, it is a stumbling, stuttering,
inconsistent fool of a thing that can be easily tricked. It brings
tedium not anxiety. I'll wager many hackers relate to that experience
of encountering systems.
That is what I mean by the vision of AI versus AI. Two broken retarded
robots sprawling about in the mud while humans gather around in a
circle and laugh. But the last laugh is on us for building them and
getting enchanted by the spectacle.
OSX's Preview allows you to import your signature by writing it on a white notecard or similar and holding it up to the webcam. It then stores a vectorized version which can be added to PDFs.
I did it once with GIMP, and apply it every time with xournal. In my experience people do not really require that the PDF looks printed and scanned, so I never cared about that aspect.
I found this GIMP tutorial several years ago and have used this method ever since. I insert my signature into PDF documents using the stamp tool, unless the software has a more sophisticated method.
I scanned a signature and set of initials, traced them in Illustrator to neaten them up, colored the ink blue and blurred the lines a little in Photoshop, then saved with transparent backgrounds in a couple of formats. PNG and TIF are the ones I mainly use.
In my ancient version of Acrobat I created rubber stamps from the PNGs. Two clicks to drop them in, a quick resize and adjust the placement, and Bob’s your uncle. Never need a pen again.
I'll reiterate what I said there: I think there's a legal angle here that should probably be sorted out by lawyers & forensics experts before something like this goes into widespread use or gets marketed at professional scale.
The law sometimes requires certain documents to be "in writing" and there is, unfortunately, a legal tradition tied to this that "in writing" means "physically on paper", which lawmakers and bureaucrats unfortunately haven't managed to properly transition into the digital age.
However something that is quite a separate matter is the question of whether one needs to actually be in possession of that piece of paper. A scan of an original serves as proof that the original exists. ...and this is usually all that anyone requires for practical intents and purposes.
But: You're not supposed to do print/sign/scan, and then just throw away the original or not have an original in the first place. You're kind of supposed to keep it in case you're ever asked by a court to produce it. The document partially loses its forensic value if no original can be produced.
Even worse: If a digital forensics expert can prove in court that a piece of digital forensics came from this tool, it may no longer be possible to distinguish a scenario where you used it in good faith because you didn't have a scanner on hand versus a scenario involving fraud where a third party used a scan of your signature plus this tool to create the appearance that you signed and scanned a document that you never did in fact sign. Or it could be alleged that you might have deliberately used this tool instead of signing the document for real to deliberately engineer deniability into this piece of digital forensics based on the previous line of reasoning.
Personally, I'd advise anyone to just keep their distance from these sorts of tools.
And writing/using the tool as a web service where people upload documents instead of downloading the tool seems like an especially bad idea. Why take the risk of exposing presumably sensitive documents to a third party?
This project reminds me of another way to avoid dealing with taxing corporate policies that are nonsensical; receipts. If you are interested in this, you might also be interested in https://makereceipt.com/
What would one need a receipt for other than tax purposes? I suspect submitting one of these with your tax return to HMRC or the like, is quite probably "fraud" of some description.
Submitting it to your employer simply puts you or them on the hook for that same fraud if it happened to get picked up in an audit by the tax office.
Is there some other less legally grey use for these (because I like the idea)?
I don't know about legally, but if you actually bought something for business and actually lost the receipt or they weren't willing to give you a receipt, I'd consider it ethically okay to write up a receipt.
Presenting a self-written receipt as a fake of a real receipt, not so much.
But if they aren't willing to take a self-written receipt, what do you do ...
Say I get lunch on a business trip and lose the receipt, I now can't expense it. In a world where I never keep receipts normally this happens all the time. Being able to recreate a receipt so I can expense looks super cool.
Though I've at times not been given receipts on some of my business trips, my employer has always allowed me to claim the expense back with a reasonably accurate amount, without receipt.
I guess the company takes the fairly minimal tax hit by not actually claiming it, and they allow it due to trust (and low numbers).
Unless you paid cash, your bank or credit card company will remember the amount for you. I don't know if most restaurant receipts are going to itemize the bill. But even if most do you can just say you went to one that didn't.
Some places want it itemized. Also if you use cash you don't have a CC bill. Back when I was a student I had to often buy things for student events with several hundred dollars in cash because the CC company wouldn't give me a higher credit line at the time. I didn't want to use a debit card, that's risky.
That's easy, because the prices might still be physically listed somewhere if it's a store, or you might still have the Craigslist email thread, or whatever.
If you simply lost or don't have a receipt and it's done in good faith I don't think it should be considered fraud.
It’s only fraud if the information on the fake receipt is false, and if you used this false information to get money or a benefit that you’re not entitled to.
I haven't been able to find out what the situation in the States is. But in Germany, for instance, it is definitely illegal as it is considered document forgery. It is thus punishable by up to 5 years in prison, or even 10 years in severe cases[0].
In fact, AFAIK a company is not even allowed to issue an invoice twice without clearly labeling it as "copy of the original" and most companies must keep the invoices they issued as well as their books (and thus the invoices they received) for at least 10 years[1].
I suppose the legal situation must be very similar in other EU countries. One reason being, for instance, that invoices are used (by companies that deduct VAT) to request back from the tax authorities the VAT that one pays on invoices by other companies (in the EU). The invoice you receive from a company must therefore match that company's books. All hell would break loose if everyone could just forge invoices, whether with bad intentions or not.
Case in point: Here[2] is a guy asking for legal advice as he's being charged with document forgery even though he wasn't trying to defraud anyone or anything. Specifically, he had lost the invoice of the TV he was trying to sell on Ebay, so he ended up forging it because Ebay required people to upload the original invoice to sell TVs on the marketplace.
Invoices, and more generally the receipts underlying bookkeeping transactions, are the fundamental building block of reliable bookkeeping (and company audits, mind you).
This is a fun project. If there was an option to have a Xerox effect, this could be fun for zinemakers too. I found a discussion where people were figuring out how to recreate GIMP's "Photocopy" effect in ImageMagick:
Was interested in how it's handling the PDFs - looks like it uses magica (a wasm-compiled ImageMagick) to do the processing:
https://github.com/cancerberoSgx/magica
Funnily enough, the site is blocked by my college's security software: "Access to this web page has been restricted due to Federal/State Legislation and/or official xxx College policies."
If only they were that smart… they probably have the “block all sites with no reputation information” option turned on… which is functionally “all sites the vendor hasn’t indexed yet” and hits brand new sites.
In 2020 we designed a website made of photocopied book pages. It was a poetic project from a teacher to his students: a selection of texts he found worthy to share during the first lockdown.
When he approached us with the content assembled from photocopies we thought that it’d be the right solution to keep the richness and somehow the warmth of the pages (there are also the pencil-made marks).
I know my scanner also sharpens images a lot (by darkening and increasing contrast?), which with the noise and blurring makes a pretty distinctive look. These still look very "clean" compared to this: https://i.imgur.com/tBkVVic.png
This app would be great if it took a simple 1mb file and adds an extra 50mb, embeds a bunch of unnecessary fonts, maybe replaces utf characters with lookalikes from other languages...
Then it becomes a useful tool to deal with opposing counsel...
It would be great if some more sophisticated effects could be added, like blur with a gradient intensity to simulate the page not being pressed perfectly flat against the glass, and per-page randomization.
So maybe a variant which makes it look like a photo - fake background, some perspective warping, bad lighting, with fake phone EXIF and selectable geolocation.
once i spent a few months trying to fool a website and their "fraud assessment team" into giving me a login. i was being asked to "give notarized copy of your business license" and what not. i tried all these things and more, went to the extent of making rubber stamps online, pasting images in random sizes, place and then pseudo scanning them.
sadly i ended up being busy in other work and they dropped the application because i hadn't submitted some "important" docs. oh well
Here's an even easier way to make your pdf look scanned: open it up on your laptop, take a picture of the screen with your phone using CamScanner or Adobe PDF scan.
Of course this becomes cumbersome if you have more than a few pages
Hmm, I guess it depends on the task and workflow. I find this easy if I have to send out the document via Gmail, WhatsApp etc.
After I open up the PDF document (which I have to, anyway), the remaining steps happen on my phone. I find picking the right scan filter convenient on the phone (relative to point-and-click on a laptop) - I guess this speaks more to the UI of the scanner apps. Then "sharing" the final document via the right app (mail etc) right from the interface of the scanner app is also fast.
Overall, I have noticed this takes me 5-15s to "scan" and send, per page.